Thursday, February 21st, 2008

When a terabyte is small

Filed under: Science and Technology — Daniel Lemire @ 11:54

With Kamel and Owen, I am working on a paper involving database indexes. We had over a terabyte of space, and yet, in the middle of the production of the paper, we ran out of space. Only a year ago, I thought that one terabyte was large.

So I asked our technician about getting a new drive. He came back with a small 500 GB drive. I asked how much these cost, and he said “$200.”

This is a new frontier for me. Producing a simple research paper required us to generate more than one terabyte of data. Moreover, we will generate much more data before the paper is finished.

Assuming I write, say, 4 research papers a year, I will generate over 4 terabytes of data a year at my current rate. At $200 per 500 GB drive, that is going to cost me about $1600 a year in storage.
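The estimate above can be reproduced with a quick back-of-the-envelope sketch. The function name and its parameters are made up for illustration. It assumes exactly 1 TB per paper (the real figure is "more than" that, so this is a lower bound), 1 TB = 1000 GB, and that drives are bought whole at the quoted price of $200 per 500 GB.

```python
import math

def yearly_storage_cost(papers_per_year, tb_per_paper, drive_gb=500, drive_price=200):
    """Return (drives_needed, total_cost_in_dollars) for one year of papers.

    Hypothetical helper: the drive size and price are the ones quoted in the
    post; 1 TB is taken to be 1000 GB.
    """
    total_gb = papers_per_year * tb_per_paper * 1000
    # Drives are bought whole, so round up.
    drives_needed = math.ceil(total_gb / drive_gb)
    return drives_needed, drives_needed * drive_price

drives, cost = yearly_storage_cost(papers_per_year=4, tb_per_paper=1.0)
print(f"{drives} drives, ${cost}")  # 8 drives, $1600
```

Since each paper actually needs somewhat more than 1 TB, the real bill would likely come out higher than this figure.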

3 Comments »

  1. I think this is one big obstacle for current research in IR. The time spent dealing with “infrastructure” keeps growing, which leaves less time for real research. I think that, in the broad field of IR, “industry research” is going to produce many more results in the coming years than “academic research”.

    Google’s Peter Norvig is quoted as saying that Google does not have the best minds; they have a great infrastructure that allows them to experiment much faster.

    How can academia deal with this?

    Comment by Sérgio Nunes — 21/2/2008 @ 15:00

  2. LOL!!!
    You are probably not old enough to know this rule:
    No matter the size of the drive, it is ALWAYS 95 to 98% full, so for the “next run” (whatever this is) you first have to upgrade.
    This is probably even more “solid” than Moore’s law.
    In the very early 70s, a 5-megabyte drive was “large”…

    Comment by Kevembuangga — 21/2/2008 @ 15:59

  3. BTW, why not use outsourced storage and computation power?
    The NYT did it:
    http://open.blogs.nytimes.com/tag/hadoop/

    (via Lukas Biewald http://www.lukasbiewald.com/?p=134 )

    Comment by Kevembuangga — 21/2/2008 @ 16:20
