Monday, August 25th, 2008

Proceedings of the Large-Scale Recommender Systems workshop

Filed under: Science and Technology — Daniel Lemire @ 7:50

We have made available a PDF copy of the proceedings for the second Netflix/Large-Scale KDD Recommender workshop. It includes the following papers:

  • Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering
  • Gavin Potter. Putting the collaborator back into collaborative filtering
  • Andreas Toescher, Michael Jahrer and Robert Legenstein. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems
  • Oscar Celma and Pedro Cano. From hits to niches? or how popular artists can bias music recommendations
  • Domonkos Tikk, Gabor Takacs, Istvan Pilaszy and Bottyan Nemeth. Investigation of Various Matrix Factorization Methods for Large Recommender Systems

Friday, August 22nd, 2008

Cool software design insight #4

Filed under: Science and Technology — Daniel Lemire @ 15:27

Mathematicians and philosophers often make terrible programmers. They also tend to write gibberish even in English. (Ok, I do not know if it is a fact, but stay with me.)

A terrible way of programming is to try to hold the entire problem in your head and to put it into code in one shot. Why? Because you are almost certainly overestimating your brain. Your mind can only cope with few parameters at a time.

Here is how you have to program:

  • If you have the luxury to start fresh, start small. Otherwise, make sure you understand and respect the code you start with.
  • Identify specific changes you want to implement. Small changes! Then implement them. Then test them!

You should never redesign working software from the ground up without incremental testing. Never. Even if you work alone.

Interestingly, I also write my papers incrementally, fixing small things one after the other. No other way works for me.

How to select even or odd rows in a table using CSS 3

Filed under: Science and Technology — Daniel Lemire @ 14:08

CSS 3 is around the corner. Already we are seeing some benefits. The latest versions of Safari and Opera, as well as the beta version of Firefox allow you to select even or odd rows in a table using only CSS:


tr:nth-child(2n+1) {
background-color: blue;
}
tr:nth-child(2n) {
background-color: red;
}

See? No ECMAScript, no server-side programming. Alas, no sign of support for this in Internet Explorer.

Thursday, August 21st, 2008

Quick CSS quiz

Filed under: Science and Technology — Daniel Lemire @ 21:49

Given these CSS instructions,


z[x] > a[i] {color: blue;}
z z[x] a {text-decoration: underline;}
z > z a , z z z + a { color: red ;}

what will be the color of the text in the following XML file?

<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/css"
href="test.css"?>
<z><z x="x">
<z />
<a i="x">my text</a>
</z>
</z>

Wednesday, August 20th, 2008

The mythical bitmap index

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 8:50

A bitmap index is a popular data structure to speed up the retrieval of matching rows in a table. (It is also used in Information Retrieval, to retrieve matching words.)

Building a bitmap index is not hard. You match each possible value with a vector of bits. When the value occur, you insert the value “true” in the vector, otherwise you insert the value “false”. Hence, you get a set of vectors containing binary values. To find out when a particular value occur, you just load up the corresponding vector of bits. (You still need a B-tree or a hash table to map values of the corresponding vector.) In practice, programming with (compressed) vector of bits leads to much faster software than working of sets of row IDs.

The size of such a matrix is the number of distinct values times the number of rows. Hence, some people conclude that bitmap indexes are mostly good to index data were we have few distinct values. In fact, the first sentence of the bitmap index article on wikipedia tells us that bitmap indexes are meant to index data such as “gender” were only two values are possible.

Of course, this reasoning is wrong. Why?

Because bitmap indexes (the big matrix of zeroes and ones) are never stored uncompressed. You always manipulate them in compressed form.

How big is the compressed matrix? Let us see. By run-length encoding (RLE), the simplest compression algorithm I know, each vector of bits has a compressed size at most proportional to the number of occurrences of the corresponding value. If you sum it up, you have that the compressed size of a bitmap index is at most proportional to the size of your table! Irrespective of the number of distinct values!

For a wide range of decision-support applications, bitmap indexes can match or surpass the performance of most indexes, irrespective of the number of distinct values.

You don’t believe me? Read these benchmarks using an Oracle database: “In summary, bitmap indexes are best suited for DSS regardless of cardinality (…)”.

With my very own bitmap index library, I have scaled up to hundreds of millions of attribute values without a problem and tables having several gigabytes of data. And my budget is not comparable to Oracle (I have one developer, me).

I am not saying that bitmap indexes are always the best solution. Of course not! But there is no published evidence that the number of distinct values is a practical criterion.

Then why does the bitmap index article on wikipedia suggests otherwise? Because it is all over the blogosphere and posting boards… because when I tried to fix the wikipedia article, my changes got reverted. So, I post my rebuttal here. If you have practical evidence that bitmap indexes mostly work well when you have 2-3 attribute values, let us see it. Otherwise, help me kill this myth!

Friday, August 1st, 2008

A good word for Nintendo repair services

Filed under: Science and Technology — Daniel Lemire @ 8:59

A few hours a month, I like to play video games. I never get very far because I lack when teenagers have in abundance: time. However, I make it up by complaining that in my time, we had 4 colors—when we had colors at all—and a 80×25-pixel resolution, and we programmed our games ourselves in BASIC—if not assembly.

In any case, the Nintendo Wii is a great toy, but mine had recurring troubles with something called the “black screen of death” (BSD). Basically, you insert a disk and the Wii will fail loading it displaying a screen that goes like so: “disk error, press eject.” The first few times this happens, you think there is a scratch on the disk or some dust over your laser. Unfortunately, as times went by, my Wii got worse and worse. Eventually, a week ago, I got a BSD whenever I tried loading any game up.

I am not a man who likes to call up a company or send a product up for repair. However, I decided to call Nintendo. Turns out that they are very nice about it. They gave me a repair order over the phone, I shipped the Wii (for free) and now, 3 days later, I got an email saying that they are returning a working Wii to me.

A few things that Nintendo does right:

  • You can call them 7 days a week.
  • When you call them up, they do not make you wait with annoying messages like “your call is important to us.” No. They answer the phone almost immediately.
  • The support staff does not assume you are an idiot. They do not ask you whether you forgot to plug in the Wii.
  • At all times, you can track the repair order with their Web sites, and you receive automated emails.

My experience with Sony and my PS2 was much worse when I got similar problems. They had me call up California and an annoying lady took me on a ride. After 15 minutes on the phone, all she told me was that she was sorry that she could not help me. With Nintendo, there are no excuses. They take charge of my problem. Period.

Before I conclude: why are we still using disks at all? Most games are under 400 MB. I can easily download 10 or 20 such games a year without taxing my Internet bandwidth at all. Storing that much data is not an issue: a 10 GB Flash drive is dirt cheap. If the Flash drive fails, I may have to redownload the games, but who cares? And wouldn’t a Flash drive be much faster than an optical disk (at least for random reads)? I wish I could at least copy the optical drives on a Flash drive and have my game console load it up from there. I could easily put 10 games on a small drive. The Wii has USB ports, so there is no reason why this could not work. Maybe hackers out there even made it work already… any clue?

Update: I got the Wii back today (Friday), while I shipped my unit on Monday night. They changed the disk drive.

Thursday, July 31st, 2008

Given up on Eclipse, now with NetBeans

Filed under: Science and Technology — Daniel Lemire @ 15:21

I write most of my code using vim. This winter, Kamel made me discover Eclipse.

I dislike IDEs in general because they have a tendency to force me to work in certain ways that are suboptimal. For example, if I need to remember to go to menu X and set option Y to build my project correctly, then that is simply not portable. Everytime I will go to a new machine, I will need to remember these precise steps. Moreover, if I cannot build my code without a GUI, then I cannot test my code on a remote machine under low bandwidth conditions. Finally, IDEs tend to do several operations silently and when things go wrong, you have layers and layers of abstraction before you can correct the problem.

However, Eclipse allowed me to import my project using subversion and use my very own Makefile! What a great idea. And it worked too!

Up until two days ago. For some reason, Eclipse stopped building my code. Hitting the “build” button simply does nothing. I never changed anything in the settings, but playing with the options did not help. I have no way of knowing what went wrong and after hours spent on the Web chasing the problem, I gave up.

I just downloaded NetBeans, and surprise, surpise! There is a C/C++ NetBeans that will use your makefiles too! Wow!

Not everything is rosy however:

  • Under MacOS NetBeans is much uglier than Eclipse. I guess NetBeans must be using Swing or some other horrible Java GUI system. It really feel like a cheap application.
  • NetBeans was unable to detect my subversion binary. It allowed me to tell it where to find subversion but I had to reboot the application for this setting to work! What?!? Eclipse worked right out of the box with subversion.

My main concern is just how ugly and unprofessional NetBeans look. In comparison, vim is great looking! Sun software people need to learn a thing or two about design.

« Previous PageNext Page »

32 queries. 0.388 seconds. Valid XHTML

Powered by WordPress

Subscribe to this blog in a reader or by Email.