Friday, May 21st, 2004

WWW 2004

Filed under: Data Warehousing and OLAP, Science and Technology — Daniel Lemire @ 8:31

OK, I’m not attending WWW 2004, but Elliotte Rusty Harold (ERH) is, and he reports very well on what he sees and hears. Anna and I are really kicking ourselves for not going to WWW2004… but this is a long story. Damn.

Data storage issues?

According to ERH, Rick Rashid, who I think is head of Microsoft Research, asked an interesting question… what happens when everyone has 1 TeraByte of storage? It now costs US$1000 to have that much storage, but it will be dirt cheap soon enough.

First, how much is 1 TeraByte… I think it is 1024 GB… OK… How many 1MB pictures can I store… that’s 1024*1024… so about a million pictures… spread over a year… that’s about a picture every 30 seconds. So, yes, that’s a lot.
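To double-check the arithmetic, here is a minimal sketch in Python. It assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB) and a 365-day year; the variable names are mine, just for illustration.

```python
# Back-of-the-envelope check, assuming binary units and a 365-day year.
MB_PER_GB = 1024
GB_PER_TB = 1024
PICTURE_SIZE_MB = 1  # picture size assumed in the text

# Number of 1 MB pictures that fit in 1 TB.
pictures_per_tb = GB_PER_TB * MB_PER_GB // PICTURE_SIZE_MB

# How often you would have to take a picture to fill 1 TB in one year.
seconds_per_year = 365 * 24 * 60 * 60
seconds_per_picture = seconds_per_year / pictures_per_tb

print(pictures_per_tb)                 # 1048576
print(round(seconds_per_picture, 1))   # 30.1
```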

I’m reminded of this cool interview with Jim Gray, also from Microsoft Research… and a Turing award recipient… and a model for me as a researcher… Jim wrote this…

Today disk-capacity growth continues at this blistering rate, maybe a little slower. But disk access, which is to say, “Move the disk arm to the right cylinder and rotate the disk to the right block,” has improved about tenfold. The rotation speed has gone up from 3,000 to 15,000 RPM, and the access times have gone from 50 milliseconds down to 5 milliseconds. That’s a factor of 10. Bandwidth has improved about 40-fold, from 1 megabyte per second to 40 megabytes per second. Access times are improving about 7 to 10 percent per year. Meanwhile, densities have been improving at 100 percent per year.

So, we have to be careful: yes, we’ll all have 1TB of storage soon. In fact, geeks like me will soon have far more, I think, simply because I have always had slightly more storage and CPU power than the average Joe (not much, but a little more). However, this unlimited storage world brings its own flock of problems. If you can record a million 1MB pictures a year… what if you want to track down something in this giant collection of pictures? You can’t possibly look at all of them, it would be boring… plus, your computer might not have the bandwidth to display all of this in a very short time, even if it were useful.
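To put rough numbers on the bandwidth worry, here is a small sketch that plugs Jim Gray’s figures (40 megabytes per second, 5 milliseconds per access) into the 1TB scenario. The function names are hypothetical, and the model is deliberately crude: it assumes a single disk, ignores caching, and counts only seek time for the random-access case.

```python
# Rough estimate only: single disk, figures taken from the Jim Gray quote.
TOTAL_MB = 1024 * 1024        # 1 TB expressed in MB (binary units)
BANDWIDTH_MB_PER_S = 40       # sequential bandwidth from the quote
ACCESS_TIME_S = 0.005         # 5 ms per random access, from the quote
PICTURES = TOTAL_MB           # one million-ish pictures of 1 MB each


def sequential_scan_hours(total_mb, bandwidth_mb_per_s):
    """Hours needed to read everything once, front to back."""
    return total_mb / bandwidth_mb_per_s / 3600


def random_access_minutes(count, access_time_s):
    """Minutes spent only on positioning the disk arm, one access per item."""
    return count * access_time_s / 60


print(round(sequential_scan_hours(TOTAL_MB, BANDWIDTH_MB_PER_S), 2))  # 7.28
print(round(random_access_minutes(PICTURES, ACCESS_TIME_S), 2))       # 87.38
```

So, under these assumptions, merely reading the whole terabyte once takes over seven hours, which is why capacity growing faster than bandwidth matters.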

Other people have blogged about this because it has other consequences. For the last few years, for example, I’ve been storing all of my important files, and their revision history, using CVS, on daniel-lemire.com. I can afford this because my host has affordable storage. This means that in several years, I’ll be able to go back in time and precisely see what I had in terms of research results on a given day. This is not exactly new: experimental researchers keep lab books. However, this is electronic data… which means that software can go, parse it, analyse it. Here’s a dream: a piece of software will go over my logs, learn how I do research and assist me in some way.

Is there anything solid to Semantic Web?

Here’s an interesting quote from ERH:

RDF and OWL enable standard, machine interpretable semantics. XML enables only syntax. Or at least so he claims. I agree about XML, but so far I’ve yet to see evidence that there’s more semantics in RDF/OWL than in plain XML. There’s a huge number of big words being tossed around this morning (business inferencing, .NET tier, dynamic applications, reclassify corporate data, proprietary metadata markup, “align the semantics of federated distributed sources”, “rich, automatic, service orchestration”, etc.) which mostly seems to obscure the fact that none of this does anything.

or

They’re investing in and exploring a broad range of applications: Semantic blogging, information portals, and SMILE Joint. He’s describing several application areas, but again I don’t see how RDF/OWL have anything to do with what he’s talking about. I realized part of the problem: no one is showing any code. I feel like I’m a mechanical engineer in 1904 listening to a bunch of other engineers talk about airplanes, but nobody’s willing to show me how they actually expect to get their flying machines into the air. Maybe they can do it, but I won’t believe it until I see a plane in the air, and even then I really want to take the machine apart before I believe it isn’t a disguised hot air balloon. A lot of what I’m hearing this morning sounds like it could float a few balloons.

I think ERH sees this exactly the way I do. You want to semantically mark up something using XML, you do this…

<team>
<player>joe</player>
<player>jill</player>
</team>

I’ve seen no evidence that there is anything more out there you can do. Oh… there are extensive specifications, nice diagrams, but like ERH says, I haven’t seen the plane fly yet, just engineers talking about it.
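For what it’s worth, consuming the little team markup takes only a few lines of code. Here is a minimal sketch using Python’s standard xml.etree.ElementTree module; the function name player_names is mine, and the “semantics” (a team is a list of players) lives in the program reading the markup, not in any extra layer.

```python
import xml.etree.ElementTree as ET

# The team document from the post, inlined so the sketch is self-contained.
TEAM_XML = """<team>
<player>joe</player>
<player>jill</player>
</team>"""


def player_names(xml_text):
    """Return the text of every <player> element directly under <team>."""
    root = ET.fromstring(xml_text)
    return [player.text for player in root.findall("player")]


print(player_names(TEAM_XML))  # ['joe', 'jill']
```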

Of course, I should point to this article, which goes in the same direction. And I’m currently talking with a new colleague, Jean Robillard, who has his own arguments.
