What is an effective social network?

Many democratic systems require vote diversity. You do not get elected prime minister of Canada merely by rallying the largest number of voters. You also need to have your votes spread out over several regions.

Similarly, Scott Karp argues that completely open social networks fail. He takes two examples: Digg and Wikipedia.

Digg recommends web sites based on user votes. They recently modified their algorithm:

The algorithm change effectively holds back from the homepage any story that is Dugg by the same groups of friends, i.e. a group that is not “diverse,” (…)

As for Wikipedia, Karp points out that it is not really an open system, since a group of editors has a great deal of control.

Stephen Downes asks an interesting question: what constraints make a network effective?

The wisdom of crowds is not obtained by mere voting. What is required — as the new Digg algorithm explicitly recognizes — is diversity.

I would like to formalize this problem. You are given a set of users and their votes on several issues, as in the Digg community. You are not told explicitly what the cliques, or sets of friends, are. Is there a canonical way to take diversity into account when counting votes?
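To make the question concrete, here is a minimal sketch of one naive heuristic. It is only an illustration, not Digg's actual algorithm and not a canonical answer: each vote is discounted by how much the voter's history overlaps with the histories of the other people who voted for the same item. The function names `jaccard` and `diverse_scores`, and the choice of Jaccard similarity as the overlap measure, are my own assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of voted-for items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diverse_scores(votes):
    """Score each item, discounting votes from users with similar histories.

    votes maps a user name to the set of items that user voted for.
    The cliques are never given; they are only implied by the overlaps.
    """
    items = set().union(*votes.values())
    scores = {}
    for item in items:
        voters = [u for u, voted in votes.items() if item in voted]
        total = 0.0
        for u in voters:
            # Redundancy: overlap between this voter and the item's other voters.
            redundancy = sum(jaccard(votes[u], votes[v]) for v in voters if v != u)
            # A voter with no overlap counts fully; identical voters share one vote.
            total += 1.0 / (1.0 + redundancy)
        scores[item] = total
    return scores


votes = {
    "ann": {"a", "b"}, "bob": {"a", "b"}, "cat": {"a", "b"},  # one clique
    "dan": {"c", "x"}, "eve": {"c", "y"}, "fay": {"c", "z"},  # diverse voters
}
scores = diverse_scores(votes)
print(scores["a"], scores["c"])
```

Items "a" and "c" both receive three raw votes, but under this heuristic the three identical voters behind "a" count roughly as one vote, while the three voters behind "c" count as about 1.8. Whether any such weighting deserves to be called canonical is exactly the open question.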

Who should be buying expensive commercial database systems?

According to Curt Monash, few people should be buying high-end database management systems:

There are relatively few applications that wouldn’t run perfectly well on PostgreSQL or EnterpriseDB today. (…)

What’s more, these mid-range database management systems can have significant advantages over their high-end brethren. The biggest is often price, for licenses and maintenance alike. Beyond that, they can be much easier to administer than their more complex counterparts. (…)

And what these mid-range DBMS don’t do today, they likely will do soon. (…)

EnterpriseDB is equal or superior in every way I can think of to Oracle7, a few security certifications perhaps excepted.

If you work for an organization that has expensive contracts with Oracle or Microsoft for their DBMS, the expense is most certainly in vain.

Meanwhile, the world of open source Business Intelligence is getting more interesting every day. We now have Pentaho Mondrian, Jedox, Birt, Enhydra Octopus, and so on. In 2005, I asked whether open source was ready for Business Intelligence. The question seems less controversial in 2008, doesn’t it?

Most of the database industry has been commoditized. If you stick around with these old schemas, you lose.

Research questions about… tag clouds?

Tag clouds are graphical representations of attributes and their relative importance. In a recent paper, we have argued that tag clouds might help bridge the gap between collaboration and Business Intelligence.

Here are some fun things to do with tag clouds:

  • In our paper, a tag cloud computation is the equivalent of an approximate orthogonal top-k range query. There has been little work in this area. We propose error measures for this problem. Our own approach is based on the pre-computation of icebergs.
  • Unlike bar charts, a tag cloud can have 50, 100 or 150 attributes. This makes it easier to collaborate because you do not need to rely on hierarchies as often. However, tag clouds tend to mix badly with non-nominal dimensions such as time or price. More generally, more work is needed on multidimensional tag clouds.
  • The problem of optimally drawing tag clouds is still very much open.
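To illustrate the first point, here is a minimal sketch of a tag cloud computed as an exact top-k query over a selected range of rows. It is the brute-force version that scans everything: the approximate, iceberg-based approach from the paper is not shown. The function name `tag_cloud`, its parameters, and the linear font scaling are assumptions made for the example.

```python
from collections import Counter


def tag_cloud(rows, tag_key, in_range, k=50, min_size=10, max_size=36):
    """Return (tag, count, font_size) triples for the k most frequent tags.

    rows is an iterable of dicts; in_range is a predicate selecting the
    rows of interest (the "range" part of the query).
    """
    counts = Counter(row[tag_key] for row in rows if in_range(row))
    top = counts.most_common(k)  # the "top-k" part of the query
    if not top:
        return []
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo
    cloud = []
    for tag, count in top:
        # Linear scaling of the font size; if all counts are equal, use max_size.
        frac = (count - lo) / span if span else 1.0
        cloud.append((tag, count, round(min_size + frac * (max_size - min_size))))
    return cloud


rows = (
    [{"city": "Montreal", "year": 2007}] * 3
    + [{"city": "Toronto", "year": 2007}]
    + [{"city": "Toronto", "year": 2006}] * 5
)
print(tag_cloud(rows, "city", lambda r: r["year"] == 2007, k=2))
```

Restricting the range to 2007 changes which tag dominates, which is why the cloud cannot simply be computed once and reused: every new range is a new top-k query.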

My top blog posts in 2007

Now that January 2008 is coming to an end, maybe it is time to give 2007 a final look. According to my logs, my most popular blog posts in 2007 are:

Database indexes are less useful than you think

An index helps you find an item without scanning all of the data. David DeWitt and Michael Stonebraker have made comments opposing index-light systems such as MapReduce, SimpleDB, and CouchDB.

But David DeWitt and Michael Stonebraker failed to tell us about schemas falling apart as you scale up. To them, database theory took us out of the dark ages and these new kids are taking us back into the caves. I have a different take:

  • Initially, you have a messy start-up. You do the accounting, Joe takes care of hiring the new staff and your wife answers the phone. This is analogous to the early database days, before schemas and relational models.
  • The company grows and you organize it clearly. You now have an IT department, an accounting department, and so on. This is analogous to the classical database technology David and Michael say we should respect.
  • Eventually, you have 1500 employees, half of them working from home in India. Nobody knows how many IT departments you have or whether you have one at all. By analogy, as you scale up, the classical database schemas and indexes become much less useful.

Update: Here is a comment by Mark C. Chu-Carroll:

(…) indexing is a great tool if your data is tabular, and you have a central index that you can work with. But if your task isn’t fundamentally relational, and what you really need is computation then indexes aren’t going to help.
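The distinction can be sketched in a few lines. This is a toy illustration with made-up records and hypothetical helper names (`build_index`, `lookup`, `total`), not a model of any real database engine: a lookup goes through the index and touches only the matching records, whereas an aggregate has to read every record no matter how many indexes exist.

```python
def build_index(records, key):
    """Map each key value to the positions of the records holding it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[key], []).append(pos)
    return index


def lookup(records, index, value):
    """Fetch matching records through the index, without scanning."""
    return [records[pos] for pos in index.get(value, [])]


def total(records, field):
    """An aggregate reads every record; the index is of no help here."""
    return sum(rec[field] for rec in records)


records = [
    {"user": "ann", "bytes": 10},
    {"user": "bob", "bytes": 5},
    {"user": "ann", "bytes": 7},
]
by_user = build_index(records, "user")
print(lookup(records, by_user, "ann"))
print(total(records, "bytes"))
```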

Solid-state drives: when external memory becomes as fast as internal memory

Steve Jobs just introduced the MacBook Air. The MacBook Air is thin and light, but what matters to me is that it uses a solid-state drive:

Using technology similar to that in the iPod nano and other Flash-based products, MacBook Air introduces a solid-state drive. This drive has no moving parts and can access data more quickly than standard hard drives, so you’ll enjoy a boost in performance when starting up your computer and opening files and applications. In addition, solid-state drives offer greater durability and improved resistance to data loss in the event of an accidental drop.

This follows recent announcements by storage vendors such as IBM and EMC who have started offering solid-state drives for enterprise needs.

Solid-state drives are compelling:

  • Solid-state drives have access speeds about 250 times faster than hard drives.
  • Solid-state drives use less power (over 30% less).
  • Solid-state drives are silent.
  • Solid-state drives are typically much smaller.
  • Solid-state drives are between 15 and 20 times more expensive, but prices are coming down.

I estimate that typical RAM is now only 10 or 20 times faster to access than a solid-state drive. These new drives narrow the gap between internal and external memory.

So, external memory becomes internal memory? Maybe not. For example, solid-state drives tend to have poor random write performance. You had better write the data sequentially.

Disclaimer. I wish I was an expert on solid-state drives, but I am not. Please correct me if I am wrong.
