Attribute Value Reordering for Efficient Hybrid OLAP
Abstract
The normalization of a data cube is the process of choosing an ordering for the attribute values, and the chosen ordering will affect the physical storage of the cube's data. For large multidimensional arrays, proper normalization can lead to more efficient storage in hybrid OLAP contexts that store dense and sparse chunks differently. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When attributes are nearly statistically independent, we show that an optimal normalization is given by dimension-wise attribute frequency sorting, which can be done in time O(d n log(n)) for data cubes of size n^d. When attributes are not independent, we propose and evaluate a number of heuristics.
Our optimized hybrid OLAP storage mechanism was observed to be 44% more storage efficient than ROLAP, and the gains due to normalization alone accounted for 45% of this increase in efficiency.
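To make the idea of dimension-wise attribute frequency sorting concrete, here is a minimal Python sketch. It is not taken from the paper or from the Lemur OLAP library. It assumes a cube represented as a set of coordinate tuples for its allocated cells, and it uses the number of non-empty chunks as a simplified stand-in for storage cost. All function names are hypothetical.

```python
from collections import Counter

def frequency_sort_normalization(cells, d):
    """For each of the d dimensions, map old attribute index -> new index,
    placing the most frequently used attribute values first."""
    perms = []
    for dim in range(d):
        # Count how many allocated cells use each attribute value.
        counts = Counter(cell[dim] for cell in cells)
        # Descending frequency; ties broken by original index for determinism.
        ordered = sorted(counts, key=lambda v: (-counts[v], v))
        perms.append({old: new for new, old in enumerate(ordered)})
    return perms

def apply_normalization(cells, perms):
    """Relabel every cell coordinate according to the per-dimension mappings."""
    return {tuple(perms[dim][v] for dim, v in enumerate(cell)) for cell in cells}

def count_nonempty_chunks(cells, chunk_shape):
    """Number of chunks holding at least one allocated cell
    (assumed proxy for storage cost, not the paper's cost model)."""
    return len({tuple(v // s for v, s in zip(cell, chunk_shape)) for cell in cells})

# Toy 4x4 cube whose four allocated cells are scattered over four 2x2 chunks.
cells = {(0, 0), (0, 2), (2, 0), (2, 2)}
perms = frequency_sort_normalization(cells, d=2)
normalized = apply_normalization(cells, perms)
before = count_nonempty_chunks(cells, (2, 2))
after = count_nonempty_chunks(normalized, (2, 2))
```

In this toy cube, reordering packs the four allocated cells into a single 2x2 chunk. Attribute values that never occur are absent from the mappings here; a fuller version would place them last. As the abstract notes, frequency sorting is shown to be optimal only when attributes are nearly statistically independent.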
Keywords
Multidimensional Databases, Data Cubes, Multidimensional Binary Arrays, OLAP, MOLAP, HOLAP, Normalization, Chunking
Reference
Owen Kaser and Daniel Lemire, Attribute Value Reordering for Efficient Hybrid OLAP, In DOLAP'03, New Orleans, Louisiana, November 7, 2003. NRC 46510.
Software
We used the Lemur OLAP C++ library for the experimental part of the paper. This library is available to the public under the GPL.
BibTeX
@inproceedings{LemireDOLAP2003,
author = {Owen Kaser and Daniel Lemire},
title = {Attribute Value Reordering for Efficient Hybrid OLAP},
booktitle = {Proceedings of DOLAP'03},
organization = {ACM},
month = {November},
year = {2003},
url = {http://www.daniel-lemire.com/fr/documents/publications/p19-kaser-nrc.pdf},
}
Authors
- Owen Kaser: owenATunbsjDOTca
- Daniel Lemire: lemire at acm.org
Related work
- Owen Kaser and Daniel Lemire, Attribute Value Reordering For Efficient Hybrid OLAP, accepted by Information Sciences in September 2005, to appear.
- Owen Kaser's publications
- Daniel Lemire's publications