Wednesday, May 31st, 2006

EGC 2007 (September 22nd 2006 / January 23-26 2007)

Filed under: Passed CFP — Daniel Lemire @ 20:31

The largest French Data Mining conference (EGC 2007) has a new CFP. It will be held in Belgium.

WWW2007 (November 20, 2006 / May 8-12, 2007)

Filed under: Passed CFP — Daniel Lemire @ 6:04

The WWW2007 call for papers is out.

WWW2007 seeks original papers describing research in all areas of the Web. Papers should not have been published or be in submission at another conference or journal. Topics include but are not limited to: Browsers and User Interfaces, Data Mining, Performance and Scalability, Search, Web Services, etc.

Tuesday, May 30th, 2006

Steven got his M.Sc.

Filed under: Academia/Research — Daniel Lemire @ 20:45

Steven’s presentation on Efficient Storage Methods for a Literary Data Warehouse went well and he got his M.Sc. Congratulations!

(I drove 12 hours to get to Saint John… and 12 hours back… I’m exhausted!)

Friday, May 26th, 2006

STXXL: C++ Standard Template Library for Extra Large Data Sets

Filed under: — Daniel Lemire @ 16:50

I haven’t tried it, but the description is cool:

The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

They also have external memory suffix arrays.

Source: geomblog.

Update: Hazel pointed me to TPIE:

TPIE is a software environment (written in C++) that facilitates the implementation of external memory algorithms. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving problems on very large data sets. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domains.
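
To make the external-memory idea concrete, here is a minimal sketch of an external merge sort in Python. It uses only the standard library and does not reflect the STXXL or TPIE APIs (neither of which I have tried); the function name external_sort and the chunk_size parameter are my own choices for illustration. Sorted runs are written to temporary files, then merged while reading one line per run at a time.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(numbers, chunk_size=1000):
    """Yield the integers of `numbers` in sorted order.

    Only about `chunk_size` items are held in memory while the sorted
    runs are created; the runs themselves live in temporary files.
    """
    runs = []
    it = iter(numbers)
    while True:
        # Phase 1: read a chunk that fits in memory, sort it, spill it to disk.
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            break
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{x}\n" for x in chunk)
        run.seek(0)
        runs.append(run)
    try:
        # Phase 2: k-way merge, pulling lines lazily from each run.
        yield from heapq.merge(*(map(int, run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

For example, list(external_sort(data, chunk_size=300)) returns the same result as sorted(data), but the sorting work is split into runs of 300 items. A real library would also tune block sizes and limit the number of runs merged at once, which this sketch ignores.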

Thursday, May 25th, 2006

Peter Norvig on solving every Sudoku puzzle

Filed under: — Daniel Lemire @ 17:33

Peter Norvig shows us how to solve every Sudoku problem in 100 lines of code using constraint propagation and search.
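
As a rough illustration of how constraint propagation and search fit together, here is a simplified sketch in Python. It is not Norvig's code: it applies only one propagation rule (once a square is settled, remove its digit from every peer) and then does a depth-first search on the square with the fewest remaining candidates. All names (solve, assign, eliminate, PEERS, and so on) are my own.

```python
DIGITS = "123456789"
ROWS = "ABCDEFGHI"
SQUARES = [r + c for r in ROWS for c in DIGITS]


def _peers(square):
    """All squares sharing a row, column, or 3x3 box with `square`."""
    r, c = square
    box_rows = next(b for b in ("ABC", "DEF", "GHI") if r in b)
    box_cols = next(b for b in ("123", "456", "789") if c in b)
    related = {r + x for x in DIGITS} | {x + c for x in ROWS}
    related |= {a + b for a in box_rows for b in box_cols}
    return related - {square}


PEERS = {s: _peers(s) for s in SQUARES}


def eliminate(values, s, d):
    """Remove candidate d from square s and propagate; False on contradiction."""
    if d not in values[s]:
        return True
    values[s] = values[s].replace(d, "")
    if not values[s]:
        return False  # no candidates left for s
    if len(values[s]) == 1:
        settled = values[s]
        # s is settled, so its digit cannot appear in any peer.
        return all(eliminate(values, p, settled) for p in PEERS[s])
    return True


def assign(values, s, d):
    """Fix square s to digit d by eliminating every other candidate."""
    return all(eliminate(values, s, other) for other in values[s] if other != d)


def search(values):
    """Depth-first search over the remaining candidates."""
    if all(len(v) == 1 for v in values.values()):
        return values
    # Branch on the unsettled square with the fewest candidates.
    s = min((q for q in SQUARES if len(values[q]) > 1),
            key=lambda q: len(values[q]))
    for d in values[s]:
        trial = dict(values)
        if assign(trial, s, d):
            solved = search(trial)
            if solved:
                return solved
    return None


def solve(grid):
    """Solve an 81-character grid ('0' or '.' for blanks); None if impossible."""
    if len(grid) != 81:
        raise ValueError("grid must have exactly 81 characters")
    values = {s: DIGITS for s in SQUARES}
    for s, ch in zip(SQUARES, grid):
        if ch in DIGITS and not assign(values, s, ch):
            return None
    return search(values)
```

solve returns a dictionary mapping each square ("A1" to "I9") to its digit, or None when the givens contradict each other. With only one propagation rule, this sketch leans on search more than a solver with stronger propagation would.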

Wednesday, May 24th, 2006

Efficient Storage Methods for a Literary Data Warehouse

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 9:43

FACULTY OF COMPUTER SCIENCE
NOTICE OF ORAL DEFENCE

MCS Degree

Efficient Storage Methods for a Literary Data Warehouse
By
Steven W. Keith
Examining Committee:
Supervisors: Dr. Owen Kaser (UNBSJ),
Dr. Daniel Lemire (Adjunct Prof., Univ. of Quebec)
Chairperson: Dr. Larry Garey (UNBSJ)
Internal Reader: Dr. Weichang Du
External Reader: Dr. George Stoica (UNBSJ)

Monday, May 29, 2006
10:30 a.m.
VIDEO CONFERENCE
UNBSJ LOCATION: MacMurray Room, Oland Hall, Rm 203
UNBF LOCATION: Multimedia Center (1st floor, room 126), Marshall D'Avray Hall

ABSTRACT

Computer-assisted reading and analysis of text has applications in the
humanities and social sciences. Ever-larger electronic text archives have
the advantage of allowing a more complete analysis but the disadvantage
of forcing longer waits for results. This thesis addresses the issue of
efficiently storing data in a literary data warehouse. The method in which
the data is stored directly influences the ability to extract useful,
analytical results from the data warehouse in a timely fashion.
A variety of storage methods including mapped files, trees, hashing,
and databases are evaluated to determine the most efficient method
of storing cubes in the data warehouse. Each storage method's ability
to insert and retrieve data points as well as slice, dice, and roll-up a
cube is evaluated. The amount of disk space required to store
the cubes is also considered. Five test cubes of various sizes are used to
determine which method being evaluated is most efficient. The results lead
to various storage methods being efficient, depending on properties of the
cube and the requirements of the user.

ALL GRADUATE STUDENTS ARE ENCOURAGED TO ATTEND

*********************
Linda Sales
Graduate Studies Program
Administrative Assistant
Faculty of Computer Science
University of New Brunswick
540 Windsor Street
Fredericton, NB
E3B 5A3

Phone: 506-458-7285
Fax: 506-453-3566
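
For readers unfamiliar with the cube operations named in the abstract, here is a toy sketch of slice, dice, and roll-up in Python. It stores a cube as a plain dictionary (a hash table), which is only one of the storage methods the thesis compares, and it is not the thesis code. The dimensions (author, word, decade), the cell values, and the function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical dimensions and made-up counts, for illustration only.
DIMS = ("author", "word", "decade")
cube = {
    ("author_a", "love", 1810): 12,
    ("author_a", "pride", 1810): 7,
    ("author_b", "love", 1850): 5,
    ("author_b", "love", 1860): 3,
}


def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it from the keys."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, allowed):
    """Keep cells whose coordinates fall in the allowed sets.

    `allowed` maps a dimension name to the set of values to keep.
    """
    idx = {DIMS.index(d): vals for d, vals in allowed.items()}
    return {k: v for k, v in cube.items()
            if all(k[i] in vals for i, vals in idx.items())}


def roll_up(cube, dim):
    """Sum over one dimension, removing it from the keys."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for k, v in cube.items():
        out[k[:i] + k[i + 1:]] += v
    return dict(out)
```

Here roll_up(cube, "decade") merges the two author_b cells for "love" into a single cell holding their sum. This is the simplest kind of roll-up, summing away a whole dimension; rolling up along a hierarchy (say, years into decades) would need a mapping between levels.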

Monday, May 22nd, 2006

Curse of Dimensionality and intuition

Filed under: — Daniel Lemire @ 19:52

Yaroslav has this thought-provoking article on the Curse of Dimensionality:

(…) consider a cube of width 1. As dimension increases, the volume stays the same. But (…) eventually almost all the mass is concentrated in the corners (meaning outside of the inscribed sphere).

The plot is especially shocking: at d=8, the sphere of diameter 1 inscribed in the unit cube fills only about 1.6% of the cube's volume!
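
The effect is easy to check numerically. The sketch below uses the standard formula for the volume of a d-dimensional ball of radius r, namely pi^(d/2) / Gamma(d/2 + 1) * r^d, with r = 1/2. Since the unit cube has volume 1, the result is also the fraction of the cube inside the sphere. The function name is my own.

```python
import math


def inscribed_ball_volume(d):
    """Volume of the d-dimensional ball of diameter 1 (radius 1/2)."""
    r = 0.5
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


for d in (1, 2, 3, 4, 8, 16):
    # The unit cube has volume 1, so this is the fraction inside the sphere.
    print(f"d={d:2d}  fraction inside sphere = {inscribed_ball_volume(d):.6f}")
```

In two dimensions this gives pi/4, in three dimensions pi/6, and the fraction keeps shrinking quickly from there, which is the "mass in the corners" effect from the quote.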
