When a terabyte is small
With Kamel and Owen, I am working on a paper involving database indexes. We had over a terabyte of space, and yet, in the middle of the production of the paper, we ran out of space. Only a year ago, I thought that one terabyte was large.
So I asked our technician about getting a new drive. He came back with a small 500 GB drive. I asked how much they cost, and he said “$200.”
This is a new frontier for me. Producing a simple research paper required us to generate more than one terabyte of data. Moreover, we will generate much more data before the paper is finished.
Assuming I write, say, 4 research papers a year, I will generate over 4 terabytes of data a year at my current rate, which is going to cost me about $1600 in storage.
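The estimate can be checked with a few lines of Python. This is only a rough sketch: the function name `yearly_storage_cost` is made up for illustration, it assumes exactly 1 TB per paper (the post says "more than" one, so this is a lower bound), decimal units (1 TB = 1000 GB), and the quoted price of $200 per 500 GB drive.

```python
import math

def yearly_storage_cost(papers_per_year, tb_per_paper, drive_gb=500, drive_price=200):
    """Return (terabytes, drives, dollars) needed for one year of papers."""
    total_tb = papers_per_year * tb_per_paper
    # Assumption: decimal units, so 1 TB = 1000 GB.
    drives = math.ceil(total_tb * 1000 / drive_gb)
    return total_tb, drives, drives * drive_price

total_tb, drives, dollars = yearly_storage_cost(papers_per_year=4, tb_per_paper=1)
print(f"{total_tb} TB -> {drives} drives -> ${dollars}")
```

Under these assumptions, 4 TB takes eight 500 GB drives, which matches the $1600 figure.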
I think this is one big obstacle for current research in IR. The time spent dealing with “infrastructure” keeps growing, which leaves less time for real research. I think that, in the broad field of IR, “industry research” is going to produce many more results in the coming years than “academic research”.
Google’s Peter Norvig is quoted as saying that Google does not have the best minds; it has a great infrastructure that allows them to experiment much faster.
How can academia deal with this?
Comment by Sérgio Nunes — 21/2/2008 @ 15:00
LOL!!!
You are probably not old enough to know this rule:
No matter the size of the drive, it is ALWAYS 95 to 98% full, so for the “next run” (whatever this is) you first have to upgrade.
This is probably even more “solid” than Moore’s law.
In the very early 70s a 5-megabyte drive was “large”…
Comment by Kevembuangga — 21/2/2008 @ 15:59
BTW, why not use outsourced storage and computation power?
The NYT did it:
http://open.blogs.nytimes.com/tag/hadoop/
(via Lukas Biewald http://www.lukasbiewald.com/?p=134 )
Comment by Kevembuangga — 21/2/2008 @ 16:20