Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes
This paper won the Best Paper Award!Abstract
Data mining and related applications often rely on extensive range sum queries and thus, it is important for these queries to scale well. Range sum queries in data cubes can be achieved in time O(1) using prefix sum aggregates but prefix sum update costs are proportional to the size of the data cube O(n^d). Using the Relative Prefix Sum (RPS) method, the update costs can be reduced to the root of the size of the data cube O(n^d/2). We present a new family of base b wavelet algorithms further reducing the update costs to O(n^d/B) for B as large as we want while preserving constant-time queries. We also show that this approach leads to O(log^d n) query and update methods twice as fast as Haar-based methods. Moreover, since these new methods are pyramidal, they provide incrementally improving estimates.
Keywords
B-adic wavelets, multidimensional databases, knowledge discovery, OLAP, MOLAP, data cube, orthogonal range queries.
Reference
Daniel Lemire, Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes. Proceedings of CASCON 2002, Toronto, Canada, October 2002. (NRC 44967)
Download
Hint : It is sometimes necessary to hold down shift while clicking in order to save a document.
ACM
This paper is available through the ACM Digital Library.
Software
Java source code (zip)The software available here is the one used to generate the data for the paper. Please note that a better, faster implementation in C++ called Lemur OLAP is available from the author (lemire arobas ondelette dot com) and might soon be released with an open source license.
Citeseer
This paper is listed on citeseer. This can be useful to find quickly related papers.
BibTeX
@inproceedings{LemireCASCON2002,
author = {Daniel Lemire},
title = {Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes},
booktitle = {Proceedings of CASCON 2002},
organization = {IBM},
month = {October},
year = {2002},
url = {http://www.daniel-lemire.com/fr/documents/publications/multihaar_final.pdf},
}
Author
- Daniel Lemire: lemire at acm.org
Related work
Cited by
This paper is cited by
- M. Jahangiri, D. Sacharidis, and C. Shahabi. SHIFT-SPLIT: I/O efficient maintenance of wavelet-transformed multidimensional data. In Proceedings of the 2005 ACM SIGMOD international Conference on Management of Data, June 2005.
Cyrus Shahabi, Mehrdad Jahangiri, Dimitris Sacharidis, Hybrid Query and Data Ordering for Fast and Progressive Range-Aggregate Query Answering, International Journal of Data Warehousing and Mining, April 2005, Vol. 1, No.2, Pages 49-69, Idea Group Publishing.
Francesco Buccafurri, Gianluca Lax, Fast range query estimation by N-level tree histograms, Data & Knowledge Engineering, Volume 51 , Issue 2, pp. 257-275, 2004.
Owen Kaser and Daniel Lemire, Attribute Value Reordering for Efficient Hybrid OLAP, In DOLAP'03, New Orleans, Louisiana, November 7, 2003. NRC 46510.
Errata
The original paper submitted to CASCON 2002 had a typo pointed to my by professor C.K. Poon. The error was in the complexity analysis where instead of having updates in time log^d, the paper wrongly stated that updates where in time d log. Unfortunately, it is likely that version of this paper with the typo will remain on the web somewhere even though I did what I could to correct the typo throughout. Sorry.