Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes

This paper won the Best Paper Award!

Abstract

Data mining and related applications often rely on extensive range sum queries and thus, it is important for these queries to scale well. Range sum queries in data cubes can be achieved in time O(1) using prefix sum aggregates but prefix sum update costs are proportional to the size of the data cube O(n^d). Using the Relative Prefix Sum (RPS) method, the update costs can be reduced to the root of the size of the data cube O(n^d/2). We present a new family of base b wavelet algorithms further reducing the update costs to O(n^d/B) for B as large as we want while preserving constant-time queries. We also show that this approach leads to O(log^d n) query and update methods twice as fast as Haar-based methods. Moreover, since these new methods are pyramidal, they provide incrementally improving estimates.

Keywords

B-adic wavelets, multidimensional databases, knowledge discovery, OLAP, MOLAP, data cube, orthogonal range queries.

Reference

Daniel Lemire, Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes. Proceedings of CASCON 2002, Toronto, Canada, October 2002. (NRC 44967)

Download

Hint : It is sometimes necessary to hold down shift while clicking in order to save a document.

ACM

This paper is available through the ACM Digital Library.

Software

Java source code (zip)

The software available here is the one used to generate the data for the paper. Please note that a better, faster implementation in C++ called Lemur OLAP is available from the author (lemire arobas ondelette dot com) and might soon be released with an open source license.

Citeseer

This paper is listed on citeseer. This can be useful to find quickly related papers.

BibTeX

@inproceedings{LemireCASCON2002,
   author    = {Daniel Lemire},
   title     = {Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes},
   booktitle = {Proceedings of CASCON 2002},
   organization = {IBM},
   month     = {October},
   year      = {2002},
   url = {http://www.daniel-lemire.com/fr/documents/publications/multihaar_final.pdf},
}

Author

Related work

Cited by

This paper is cited by

Errata

The original paper submitted to CASCON 2002 had a typo pointed to my by professor C.K. Poon. The error was in the complexity analysis where instead of having updates in time log^d, the paper wrongly stated that updates where in time d log. Unfortunately, it is likely that version of this paper with the typo will remain on the web somewhere even though I did what I could to correct the typo throughout. Sorry.

Valid XHTML 1.0! Valid CSS!