EGC 2007 (September 22nd 2006 / January 23-26 2007)
The largest French Data Mining conference (EGC 2007) has a new CFP. It will be held in Belgium.
The WWW2007 call for papers is out.
WWW2007 seeks original papers describing research in all areas of the Web. Papers should not have been published or be in submission at another conference or journal. Topics include but are not limited to: Browsers and User Interfaces, Data Mining, Performance and Scalability, Search, Web Services, etc.
Steven’s presentation on Efficient Storage Methods for a Literary Data Warehouse went well and he got his M.Sc. Congratulations!
(I drove 12 hours to get to Saint John… and 12 hours back… I’m exhausted!)
I haven’t tried it, but the description is cool:
The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.
They also have external memory suffix arrays.
Source: geomblog.
Update: Hazel pointed me to TPIE:
TPIE is a software environment (written in C++) that facilitates the implementation of external memory algorithms. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving problems on very large data sets. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domains.
Peter Norvig shows us how to solve every Sudoku problem in 100 lines of code using constraint propagation and search.
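To give a feel for what "constraint propagation and search" means here, the following is a minimal sketch in the spirit of that approach. It is not Norvig's code: the function names (`assign`, `eliminate`, `search`, `solve`) and the grid format (81 characters, row by row, with `0` for an empty square) are assumptions made for illustration. Each square keeps a string of candidate digits; removing a candidate propagates to the square's peers and units, and a depth-first search guesses only when propagation stalls.

```python
DIGITS = "123456789"
ROWS = "ABCDEFGHI"

def cross(a, b):
    """All two-character labels formed from one element of a and one of b."""
    return [x + y for x in a for y in b]

SQUARES = cross(ROWS, DIGITS)
UNITLIST = ([cross(ROWS, c) for c in DIGITS] +            # columns
            [cross(r, DIGITS) for r in ROWS] +            # rows
            [cross(rs, cs) for rs in ("ABC", "DEF", "GHI")
                           for cs in ("123", "456", "789")])  # 3x3 boxes
UNITS = {s: [u for u in UNITLIST if s in u] for s in SQUARES}
PEERS = {s: set(sum(UNITS[s], [])) - {s} for s in SQUARES}

def assign(values, s, d):
    """Fix square s to digit d by eliminating every other candidate."""
    others = values[s].replace(d, "")
    if all(eliminate(values, s, d2) for d2 in others):
        return values
    return None

def eliminate(values, s, d):
    """Remove candidate d from square s and propagate the consequences."""
    if d not in values[s]:
        return values                      # already eliminated
    values[s] = values[s].replace(d, "")
    if not values[s]:
        return None                        # contradiction: no candidates left
    if len(values[s]) == 1:                # s is decided: clear it from peers
        d2 = values[s]
        if not all(eliminate(values, p, d2) for p in PEERS[s]):
            return None
    for unit in UNITS[s]:                  # d must still fit somewhere in each unit
        places = [sq for sq in unit if d in values[sq]]
        if not places:
            return None
        if len(places) == 1 and not assign(values, places[0], d):
            return None
    return values

def parse(grid):
    """Turn an 81-character grid into a candidates dict, propagating the givens."""
    values = {s: DIGITS for s in SQUARES}
    for s, ch in zip(SQUARES, grid):
        if ch in DIGITS and not assign(values, s, ch):
            return None
    return values

def search(values):
    """Depth-first search, branching on the square with the fewest candidates."""
    if values is None:
        return None
    if all(len(values[s]) == 1 for s in SQUARES):
        return values
    _, s = min((len(values[s]), s) for s in SQUARES if len(values[s]) > 1)
    for d in values[s]:
        result = search(assign(values.copy(), s, d))
        if result:
            return result
    return None

def solve(grid):
    return search(parse(grid))
```

Calling `solve(grid)` returns a dictionary mapping each square label (`"A1"` to `"I9"`) to its digit, or `None` if the grid has no solution.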
FACULTY OF COMPUTER SCIENCE
NOTICE OF ORAL DEFENCE
MCS Degree
Efficient Storage Methods for a Literary Data Warehouse
By Steven W. Keith
Examining Committee:
Supervisors: Dr. Owen Kaser (UNBSJ), Dr. Daniel Lemire (Adjunct Prof., Univ. of Quebec)
Chairperson: Dr. Larry Garey (UNBSJ)
Internal Reader: Dr. Weichang Du
External Reader: Dr. George Stoica (UNBSJ)
Monday, May 29, 2006, 10:30 a.m.
VIDEO CONFERENCE
UNBSJ LOCATION: MacMurray Room, Oland Hall, Rm 203
UNBF LOCATION: Multimedia Center (1st floor, room 126), Marshall D'Avray Hall
ABSTRACT
Computer-assisted reading and analysis of text has applications in the humanities and social sciences. Ever-larger electronic text archives have the advantage of allowing a more complete analysis but the disadvantage of forcing longer waits for results. This thesis addresses the issue of efficiently storing data in a literary data warehouse. The method in which the data is stored directly influences the ability to extract useful, analytical results from the data warehouse in a timely fashion. A variety of storage methods including mapped files, trees, hashing, and databases are evaluated to determine the most efficient method of storing cubes in the data warehouse. Each storage method's ability to insert and retrieve data points as well as slice, dice, and roll-up a cube is evaluated. The amount of disk space required to store the cubes is also considered. Five test cubes of various sizes are used to determine which method being evaluated is most efficient. The results lead to various storage methods being efficient, depending on properties of the cube and the requirements of the user.
ALL GRADUATE STUDENTS ARE ENCOURAGED TO ATTEND
*********************
Linda Sales
Graduate Studies Program Administrative Assistant
Faculty of Computer Science
University of New Brunswick
540 Windsor Street
Fredericton, NB E3B 5A3
Phone: 506-458-7285
Fax: 506-453-3566
Yaroslav has this thought provoking article on the Curse of Dimensionality:
(…) consider a cube of width 1. As dimension increases, the volume stays the same. But (…) eventually almost all the mass is concentrated in the corners (meaning outside of the inscribed sphere).

The plot is especially shocking: at d=8, the sphere of diameter 1 inscribed in the unit cube has a negligible volume!
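The numbers are easy to check. Here is a small sketch (the function name `inscribed_ball_fraction` is mine, not from Yaroslav's article) that uses the standard formula for the volume of a d-dimensional ball of radius r, namely π^(d/2) / Γ(d/2 + 1) · r^d, with r = 1/2 for the ball inscribed in the unit cube. Since the cube has volume 1, the result is directly the fraction of the cube's mass inside the ball.

```python
import math

def inscribed_ball_fraction(d):
    """Fraction of the unit cube's volume lying inside the inscribed ball.

    The inscribed ball has radius 1/2, and the cube has volume 1,
    so the fraction is simply the volume of that d-dimensional ball.
    """
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

if __name__ == "__main__":
    for d in (1, 2, 3, 8, 20):
        print(f"d={d:2d}  fraction inside ball = {inscribed_ball_fraction(d):.6g}")
```

For d=2 this gives π/4 (about 79%), for d=3 it gives π/6 (about 52%), and at d=8 it is already down to roughly 1.6%: nearly everything is in the corners.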
I really shouldn’t share these tricks since they are secret weapons for massive research productivity (SWMRP), but I hope some of you will share your own research productivity tricks also, so that academic bloggers will quickly dominate the research world.
Disclaimer: if you use these tricks, you ought to be able to easily write 15 papers a year. Ah! But why don’t I write so many papers? Because I’ve got bad habits such as constantly changing my field of research, thinking for a very long time about non-paperable ideas or writing code, sometimes pointless code, for months at a time, just because I like playing with live algorithms. Also, I’m a little bit dumb and ignorant.
Thomas Easton promotes algorithms as a higher form of science in his paper Beyond the algorithmization of the sciences:
Algorithms have thus made biology as useful a science as physics, chemistry, and computer science. But are algorithms enough to move biology closer to the throne? Are they math?
(…)
Five decades ago, most mathematicians would have said no. Then in the 1970s, they discovered the value of computers for “proving” theorems (…)
Myself, I haven’t had much time to think about it, but it is quite clear that I value algorithms at least as highly as I value theorems.
There might be hope for Java after all. Google just published its AJAX-based web toolkit in Java. Cross-browser compatibility is a major pain with AJAX, but this toolkit solves it all. Or so says Google.
© 2004-2009, Daniel Lemire (lemire at acm dot org). This work is licensed under a Creative Commons License.