Database Questions for 2010: What’s On My Mind

I started 2009 with an interest in Web  2.0 OLAP and collaborative data processing. The field of collaborative data processing has progressed tremendously. Last year, we got Google Fusion Tables and data warehousing products are getting more collaborative.

In 2010, my research might focus more on database theory—while maintaining a strong experimental bias. Specifically, I am currently thinking about:

  • Lightweight compression. The goal of lightweight compression is save CPU cycles—not storage! Of course, the CPU architecture is critical. Thus, you have to worry about instruction-level parallelism. Measuring the quality of the compression by the compression ratio is wrong.
  • Row reordering. Some compression formats, such as run-length encoding, are sensitive to the order of the objects in a database. Reordering them is NP-hard. The efficiency of the heuristics depends on the compression format. I will continue my earlier work on this topic.
  • Concurrency and parallelism. Some believe that multicore CPUs can be used to compress data even more aggressively. It might be misguided. Instead, we must focus on embarrassingly parallel problems. Already, we can scan a large in-memory table using several CPU cores quite fast. In 2010, we should empower our users so that they can explore their data more freely.
  • String hashing. I have argued on this blog that universal hashing of long strings is impossible. While hashing strings is textbook material, our understanding of hashing can be improved further.

Further reading: Search Questions for 2010: What’s On My Mind by Daniel Tunkelang

8 Comments

  1. Don’t bother!
    There are cheaper ways to improve software performance ;-)
    CS research is mostly another form of “mathematical disease”.
    Oh! Happy New Year nevertheless…

    Comment by Kevembuangga — 4/1/2010 @ 13:24

  2. @Kevembuangga

    Good points.

    You may enjoy this blog post by Maarten van Emden on “What is Computer Science?”

    http://vanemden.wordpress.com/2009/12/27/what-is-computer-science/

    Comment by Daniel Lemire — 4/1/2010 @ 13:54

  3. I just love these arrogant people who think that CS education and academia are useless. Of course, only they — real warriors — save the world and write the code. Unfortunately, quite often their code contains such horrible jokes as bubble sort. Which, in turn, other folks have to debug and to fix.

    Comment by Itman — 4/1/2010 @ 16:44

  4. Unfortunately, quite often their code contains such horrible jokes as bubble sort. Which, in turn, other folks have to debug and to fix.

    Debugging bubble sort?
    What are the chances that a bug show up in bubble sort relative to any other sort?
    It’s ugly and horrendously inefficient but if you have a dozen items to sort it could be that it IS the right solution.
    You didn’t really grok the points in the linked comment, did you?

    Comment by Kevembuangga — 5/1/2010 @ 1:06

  5. Chances are huge! I had to fix this once myself. The premature optimization problem arises usually, when there are two comparable algorithms and you chose a more complicated. Then, of course, it is a waste of time. But to chose the right algorithm, you should first understand the problem. That requires some CS knowledge. Moreover, Moore’s law would not hold forever. Sooner or later you have to optimize to get this 10 or 50 percent increase. In some applications such as large-scale applications (e.g. web search), you have to do this already.

    Comment by Itman — 5/1/2010 @ 1:20

  6. More exactly, by “lightweight” you mean “compute-cost-effective” compression. Can compression save CPU cycles? Maybe. Sometimes. Would be good to explore and map that dimension somewhat.

    Comment by Preston L. Bannister — 5/1/2010 @ 1:39

  7. @Bannister

    Yes, you are quite right. By lightweight, I mean that it reduces the computational cost.

    Comment by Daniel Lemire — 5/1/2010 @ 8:24

  8. I had to fix this once myself.

    OK, fix that C one:

    for (i = size – 1; i;) {
        for (j = i; ++j < size && a[j-1] > a[j]; swap(a[j-1], a[j]));
        while (–i && a{i] <= a[i+1]);
    }
    :-)

    Comment by Kevembuangga — 5/1/2010 @ 19:17

Sorry, the comment form is closed at this time.

« Blog's main page

23 queries. 0.365 seconds. Valid XHTML

Powered by WordPress

Subscribe to this blog in a reader or by Email.