EGC 2007 (September 22nd 2006 / January 23-26 2007)

The largest French Data Mining conference (EGC 2007) has a new CFP. It will be held in Belgium.

WWW2007 (November 20, 2006 / May 8-12, 2007)

The WWW2007 call for papers is out.

WWW2007 seeks original papers describing research in all areas of the Web. Papers should not have been published or be in submission at another conference or journal. Topics include but are not limited to: Browsers and User Interfaces, Data Mining, Performance and Scalability, Search, Web Services, etc.

Steven got his M.Sc.

Steven’s presentation on Efficient Storage Methods for a Literary Data Warehouse went well and he got his M.Sc. Congratulations!

(I drove 12 hours to get to Saint John… and 12 hours back… I’m exhausted!)

STXXL: C++ Standard Template Library for Extra Large Data Sets

I haven’t tried it, but the description is cool:

The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

They also have external memory suffix arrays.

Source: geomblog.
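Here is a quick explanation of suffix arrays for readers who have not met them. A suffix array is the list of a text's suffix start positions, sorted lexicographically. Every occurrence of a pattern then falls in one contiguous range, and binary search can find that range. The Python sketch below is an in-memory illustration with naive construction (roughly O(n^2 log n)). It has nothing to do with the external memory construction that STXXL implements, and the function names are hypothetical.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions by the suffix each one begins."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of `pattern` in `text`, using binary search on `sa`.

    Truncating sorted suffixes to len(pattern) characters keeps them in order,
    so the suffixes that start with `pattern` form one contiguous block of `sa`.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("banana")
print(sa)                                       # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))    # [1, 3]
```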

Update: Hazel pointed me to TPIE:

TPIE is a software environment (written in C++) that facilitates the implementation of external memory algorithms. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving problems on very large data sets. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domains.
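External merge sort is the textbook external memory algorithm, and it makes "minimizing I/O" concrete. The Python sketch below shows it. This is not STXXL or TPIE code, since both are C++ libraries with their own APIs. The sketch sorts chunks that fit in memory and writes each sorted run to a temporary file. It then merges all runs in one streaming pass with `heapq.merge`. The `chunk_size` parameter is an assumed stand-in for available memory. Real libraries work in large disk blocks and bound the merge fan-in, which this sketch does not do.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_chunk: List[int]) -> str:
    """Write one sorted run to a temporary file, one integer per line; return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def external_sort(values: Iterable[int], chunk_size: int) -> Iterator[int]:
    """Yield `values` in ascending order, holding at most `chunk_size` items in memory
    while forming runs; the merge phase keeps one pending item per run."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    run_paths: List[str] = []
    try:
        # Phase 1: cut the input into memory-sized chunks, sort each, spill it to disk.
        chunk: List[int] = []
        for x in values:
            chunk.append(x)
            if len(chunk) == chunk_size:
                run_paths.append(_write_run(sorted(chunk)))
                chunk = []
        if chunk:
            run_paths.append(_write_run(sorted(chunk)))
        # Phase 2: k-way merge of all runs in a single sequential pass over each file.
        files = [open(p) for p in run_paths]
        try:
            runs = ((int(line) for line in f) for f in files)
            yield from heapq.merge(*runs)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files.
        for p in run_paths:
            os.remove(p)


print(list(external_sort([5, 3, 9, 1, 7], chunk_size=2)))  # [1, 3, 5, 7, 9]
```

Each input item is written once and read once, so the I/O cost is two sequential passes over the data. This pattern is what external memory libraries optimize.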

Peter Norvig on solving every Sudoku puzzle

Peter Norvig shows us how to solve every Sudoku problem in 100 lines of code using constraint propagation and search.
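Norvig's essay is the place to read the real thing. To illustrate the two ideas roughly, here is a compact Python sketch in the same spirit, though it is not his code and the names are illustrative. Each square keeps a string of candidate digits. Constraint propagation applies two rules: a square with only one candidate removes that digit from its peers, and a unit with only one place left for a digit puts the digit there. Search then tries each candidate of the square with the fewest options. The input format is an assumption: an 81-character string with 0 or . for blanks.

```python
from typing import Dict, List, Optional

DIGITS = "123456789"
ROWS = "ABCDEFGHI"


def cross(a: str, b: str) -> List[str]:
    return [x + y for x in a for y in b]


SQUARES = cross(ROWS, DIGITS)  # "A1" .. "I9"
UNITS_LIST = ([cross(ROWS, c) for c in DIGITS] +                 # columns
              [cross(r, DIGITS) for r in ROWS] +                 # rows
              [cross(rs, cs) for rs in ("ABC", "DEF", "GHI")
               for cs in ("123", "456", "789")])                 # 3x3 boxes
UNITS = {s: [u for u in UNITS_LIST if s in u] for s in SQUARES}
PEERS = {s: set(sum(UNITS[s], [])) - {s} for s in SQUARES}

Grid = Dict[str, str]  # square -> digits still possible there


def assign(values: Grid, s: str, d: str) -> Optional[Grid]:
    """Eliminate every digit except d from square s; None on contradiction."""
    for other in values[s].replace(d, ""):
        if eliminate(values, s, other) is None:
            return None
    return values


def eliminate(values: Grid, s: str, d: str) -> Optional[Grid]:
    """Remove d from square s and propagate; None on contradiction."""
    if d not in values[s]:
        return values
    values[s] = values[s].replace(d, "")
    if not values[s]:
        return None
    if len(values[s]) == 1:  # rule 1: a decided square clears its digit from peers
        only = values[s]
        for p in PEERS[s]:
            if eliminate(values, p, only) is None:
                return None
    for unit in UNITS[s]:    # rule 2: a digit with one possible place goes there
        places = [sq for sq in unit if d in values[sq]]
        if not places:
            return None
        if len(places) == 1 and assign(values, places[0], d) is None:
            return None
    return values


def parse_grid(grid: str) -> Optional[Grid]:
    chars = [c for c in grid if c in DIGITS or c in "0."]
    if len(chars) != 81:
        raise ValueError("expected 81 cells")
    values = {s: DIGITS for s in SQUARES}
    for s, c in zip(SQUARES, chars):
        if c in DIGITS and assign(values, s, c) is None:
            return None
    return values


def search(values: Optional[Grid]) -> Optional[Grid]:
    """Depth-first search, branching on the undecided square with fewest candidates."""
    if values is None:
        return None
    if all(len(values[s]) == 1 for s in SQUARES):
        return values
    _, s = min((len(values[sq]), sq) for sq in SQUARES if len(values[sq]) > 1)
    for d in values[s]:
        result = search(assign(dict(values), s, d))  # copy: strings are immutable
        if result is not None:
            return result
    return None


def solve(grid: str) -> Optional[str]:
    values = search(parse_grid(grid))
    return None if values is None else "".join(values[s] for s in SQUARES)


print(solve("003020600900305001001806400008102900700000008006708200002609500800203009005010300"))
```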

Efficient Storage Methods for a Literary Data Warehouse

FACULTY OF COMPUTER SCIENCE
NOTICE OF ORAL DEFENCE

MCS Degree

Efficient Storage Methods for a Literary Data Warehouse
By
Steven W. Keith
Examining Committee:
Supervisors: Dr. Owen Kaser (UNBSJ),
Dr. Daniel Lemire (Adjunct Prof., Univ. of Quebec)
Chairperson: Dr. Larry Garey (UNBSJ)
Internal Reader: Dr. Weichang Du
External Reader: Dr. George Stoica (UNBSJ)

Monday, May 29, 2006
10:30 a.m.
VIDEO CONFERENCE
UNBSJ LOCATION: MacMurray Room, Oland Hall, Rm 203
UNBF LOCATION: Multimedia Center (1st floor, room 126), Marshall D'Avray Hall

ABSTRACT

Computer-assisted reading and analysis of text has applications in the
humanities and social sciences. Ever-larger electronic text archives have
the advantage of allowing a more complete analysis but the disadvantage
of forcing longer waits for results. This thesis addresses the issue of
efficiently storing data in a literary data warehouse. The method in which
the data is stored directly influences the ability to extract useful,
analytical results from the data warehouse in a timely fashion.
A variety of storage methods including mapped files, trees, hashing,
and databases are evaluated to determine the most efficient method
of storing cubes in the data warehouse. Each storage method's ability
to insert and retrieve data points as well as slice, dice, and roll-up a
cube is evaluated. The amount of disk space required to store
the cubes is also considered. Five test cubes of various sizes are used to
determine which method being evaluated is most efficient. The results lead
to various storage methods being efficient, depending on properties of the
cube and the requirements of the user.

ALL GRADUATE STUDENTS ARE ENCOURAGED TO ATTEND

*********************
Linda Sales
Graduate Studies Program
Administrative Assistant
Faculty of Computer Science
University of New Brunswick
540 Windsor Street
Fredericton, NB
E3B 5A3

Phone: 506-458-7285
Fax: 506-453-3566
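The abstract above compares physical storage methods, but the operations being timed are simple to state. The Python sketch below is a minimal, in-memory illustration of the hashing approach and is not the thesis's code. It stores a sparse cube as a dictionary keyed by coordinate tuples and supports insert, retrieve, slice, dice, and a roll-up. The class name, the sum aggregate, and the example dimensions are assumptions made for illustration. Here roll-up sums away a whole dimension, whereas a real warehouse would keep the data on disk and roll up along hierarchies such as year to decade.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[str, ...]


class HashCube:
    """Hypothetical sparse cube: only non-empty cells are stored, keyed by coordinates."""

    def __init__(self, dims: Tuple[str, ...]):
        self.dims = dims
        self.cells: Dict[Cell, float] = {}

    def insert(self, coords: Cell, value: float) -> None:
        """Add value to the cell (measures accumulate under sum)."""
        self.cells[coords] = self.cells.get(coords, 0.0) + value

    def get(self, coords: Cell) -> float:
        """Retrieve one data point; absent cells read as 0."""
        return self.cells.get(coords, 0.0)

    def slice(self, dim: str, member: str) -> "HashCube":
        """Fix one dimension to a single member; the result has one fewer dimension."""
        i = self.dims.index(dim)
        out = HashCube(self.dims[:i] + self.dims[i + 1:])
        for coords, v in self.cells.items():
            if coords[i] == member:
                out.insert(coords[:i] + coords[i + 1:], v)
        return out

    def dice(self, selection: Dict[str, Set[str]]) -> "HashCube":
        """Keep cells whose members lie in the given subsets; dimensions are unchanged."""
        idx = {self.dims.index(d): allowed for d, allowed in selection.items()}
        out = HashCube(self.dims)
        for coords, v in self.cells.items():
            if all(coords[i] in allowed for i, allowed in idx.items()):
                out.insert(coords, v)
        return out

    def roll_up(self, dim: str) -> "HashCube":
        """Aggregate away one dimension entirely by summing over its members."""
        i = self.dims.index(dim)
        out = HashCube(self.dims[:i] + self.dims[i + 1:])
        for coords, v in self.cells.items():
            out.insert(coords[:i] + coords[i + 1:], v)
        return out


# Made-up word counts by author, word, and decade.
cube = HashCube(("author", "word", "decade"))
cube.insert(("Austen", "love", "1810s"), 3)
cube.insert(("Austen", "love", "1800s"), 2)
cube.insert(("Dickens", "fog", "1850s"), 5)
print(cube.roll_up("decade").get(("Austen", "love")))  # 5.0
```

Hashing makes point inserts and lookups cheap. However, slice, dice, and roll-up all scan every stored cell, which hints at why the best storage method depends on the workload.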

Curse of Dimensionality and intuition

Yaroslav has this thought-provoking article on the Curse of Dimensionality:

(…) consider a cube of width 1. As dimension increases, the volume stays the same. But (…) eventually almost all the mass is concentrated in the corners (meaning outside of the inscribed sphere).

The plot is especially shocking: at d=8, the sphere of diameter 1 inscribed in the unit cube fills only about 1.6% of the cube’s volume!
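The numbers behind the plot are easy to reproduce. A d-dimensional ball of radius r has volume π^(d/2) r^d / Γ(d/2 + 1). The unit cube has volume 1, so with r = 1/2 that formula also gives the fraction of the cube that the inscribed ball fills. Here is a short sketch that uses only Python's standard library:

```python
import math


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit cube [0,1]^d occupied by its inscribed ball (radius 1/2).

    Uses V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d; the cube's volume is 1.
    """
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d


for d in (1, 2, 3, 5, 8, 10, 20):
    print(f"d={d:2d}  fraction={inscribed_ball_fraction(d):.6f}")
```

For d=8 the exact value is π^4/6144, roughly 0.016. The fraction keeps shrinking as d grows, so almost all of the cube's volume ends up outside the ball.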

Some academic research tricks for the web

I really shouldn’t share these tricks since they are secret weapons for massive research productivity (SWMRP), but I hope some of you will share your own research productivity tricks also, so that academic bloggers will quickly dominate the research world.

  • Use Firefox! If you do, you can add CiteSeer, Google Scholar and Wikipedia to your quick search bar. When (academic) research becomes intensive, make Google Scholar or CiteSeer your default Firefox search engine! If you do this, your research productivity will jump. I hope you will then give me credit in all future papers you write! I could use the fame.
  • (From Suresh) You can use CiteULike to collect references for papers and export the result to BibTeX at the end. Don’t forget to add the bookmarklet to your bookmarks for extra convenience. I have my own CiteULike library, though so far I haven’t used it to collect references for a paper.
  • Similarly, for tracking random web sites, del.icio.us is a must. Don’t forget to install the Firefox plugin for extra convenience.
  • A little-known fact is that Google Scholar can be configured (see “preferences”) to export search results to BibTeX. Again, this is the sort of thing that can make your productivity jump so that you’ll be tempted to start all your papers by “Daniel Lemire made this paper possible…” Ok, maybe not, but close.
  • Subscribe to lots and lots of mailing lists and newsletters, but instead of reading them all, have your mail client flag those whose text contains keywords you are interested in. This is especially powerful for monitoring interesting calls for papers.
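Most mail clients can do the keyword flagging from the last tip with a filter rule. For those who would rather script it, here is a hypothetical Python sketch built on the standard library's `email` package. It parses raw messages, gathers the subject and the plain-text parts, and returns the subjects of the messages that mention any keyword. The function names and the case-insensitive substring match are assumptions. A real setup would hook into your client or mail server.

```python
import email
from email.message import Message
from typing import Iterable, List


def message_text(msg: Message) -> str:
    """Concatenate the subject and every text/plain part of a parsed message."""
    parts = [msg.get("Subject", "")]
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def flag_messages(raw_messages: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the subjects of messages that mention any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    flagged = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        text = message_text(msg).lower()
        if any(k in text for k in wanted):
            flagged.append(msg.get("Subject", ""))
    return flagged
```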

Disclaimer: if you use these tricks, you ought to be able to easily write 15 papers a year. Ah! But why don’t I write so many papers? Because I’ve got bad habits such as constantly changing my field of research, thinking for a very long time about non-paperable ideas or writing code, sometimes pointless code, for months at a time, just because I like playing with live algorithms. Also, I’m a little bit dumb and ignorant. ;-)

Beyond the algorithmization of the sciences

Thomas Easton promotes algorithms as a higher form of science in his paper Beyond the algorithmization of the sciences:

Algorithms have thus made biology as useful a science as physics, chemistry, and computer science. But are algorithms enough to move biology closer to the throne? Are they math?
(…)
Five decades ago, most mathematicians would have said no. Then in the 1970s, they discovered the value of computers for “proving” theorems (…)

I haven’t had much time to think about it myself, but it is clear that I value algorithms at least as highly as theorems.

Google Web Toolkit – Build AJAX apps in the Java language

There might be hope for Java after all. Google just published its AJAX-based web toolkit in Java. Cross-browser compatibility is a major pain with AJAX, but this toolkit solves it all. Or so says Google.
