Monday, August 25th, 2008

If you claim high scalability…

Filed under: Academia/Research — Daniel Lemire @ 14:56

I just reviewed a paper where the authors come up with a nice highly scalable algorithm. And it is really scalable too! But to prove just how fast it is, they process 2,000 data points.

This is correct, strictly speaking. Their algorithm runs in O(n) time, so to know how long it would take to process 1000 times more data, just multiply by 1000.

But where is the fun in that?
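
For what it is worth, both the extrapolation and the actual test are easy to write down. The following is a minimal Python sketch, not the code from the reviewed paper: `linear_scan` is a hypothetical stand-in for an O(n) algorithm, and `extrapolate_linear` and `measure` are names made up for this illustration. It times the stand-in on 2,000 points, multiplies by 1000, and then actually runs on 1000 times more data so the two numbers can be compared.

```python
import time

def linear_scan(data):
    """Hypothetical stand-in for an O(n) algorithm: one pass over the data."""
    total = 0
    for x in data:
        total += x
    return total

def extrapolate_linear(seconds, n, target_n):
    """Predict the running time at target_n, assuming time grows linearly with n."""
    return seconds * (target_n / n)

def measure(algorithm, n):
    """Time one run of the algorithm on n data points, in seconds."""
    data = list(range(n))
    start = time.perf_counter()
    algorithm(data)
    return time.perf_counter() - start

if __name__ == "__main__":
    small_n = 2_000
    large_n = 1000 * small_n
    t_small = measure(linear_scan, small_n)
    predicted = extrapolate_linear(t_small, small_n, large_n)
    actual = measure(linear_scan, large_n)
    print(f"predicted for n={large_n}: {predicted:.4f} s")
    print(f"measured  for n={large_n}: {actual:.4f} s")
```

The predicted and measured times may well disagree: a run on 2,000 points is short enough that timer resolution and caching effects can dominate, which is one reason to run the large experiment.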

The insane world of academic publishing

Filed under: Academia/Research — Daniel Lemire @ 8:24

Stephen Few wrote a post on how insane academic publishing is. If you publish academic papers, his post is worth your time. Don’t miss the comments!

Stephen is not in academia. From his point of view, what is required of him makes no sense:

  • While he does not expect to get paid for publishing a paper, he expects some kind of symbolic reward, like a free subscription to the journal: “Is there really any question that someone who takes the time to write an article and go through the lengthy process of working with a publisher, deserves a gesture of thanks equaling the cost of postage?”
  • Stephen is surprised that reviewers remain anonymous through the entire process: “Cloaking the process in anonymity seemed to indicate a level of discomfort with critique that I didn’t expect to find to this degree in academia.”
  • He is upset with how IEEE handles copyright: “I have worked with several publishers and I have never had to give up my rights as author. Most modern publishers know that they don’t need to strip authors of their rights in order to do their job.”

My own answers:

  • Anonymous review is just a system we refuse to question. Speaking your mind is certainly a dangerous thing—more so in some countries than others. However, I believe a scholar should have the backbone to speak out in the open. Do something else with your life if you are afraid to sign your opinion pieces.
  • The copyright issue is a shame. However, Stephen should also ask why so many employers ask for non-compete clauses. He should also ask why musicians routinely sign away their souls. I have always been puzzled by how easily TV series are killed: clearly the authors lose their copyright along the way. Fortunately, scholars are pretty bad at reading the contracts they sign…

Proceedings of the Large-Scale Recommender Systems workshop

Filed under: Science and Technology — Daniel Lemire @ 7:50

We have made available a PDF copy of the proceedings for the second Netflix/Large-Scale KDD Recommender workshop. It includes the following papers:

  • Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering
  • Gavin Potter. Putting the collaborator back into collaborative filtering
  • Andreas Toescher, Michael Jahrer and Robert Legenstein. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems
  • Oscar Celma and Pedro Cano. From hits to niches? or how popular artists can bias music recommendations
  • Domonkos Tikk, Gabor Takacs, Istvan Pilaszy and Bottyan Nemeth. Investigation of Various Matrix Factorization Methods for Large Recommender Systems

Friday, August 22nd, 2008

Cool software design insight #4

Filed under: Science and Technology — Daniel Lemire @ 15:27

Mathematicians and philosophers often make terrible programmers. They also tend to write gibberish even in English. (Ok, I do not know if it is a fact, but stay with me.)

A terrible way of programming is to try to hold the entire problem in your head and to put it into code in one shot. Why? Because you are almost certainly overestimating your brain. Your mind can only cope with a few parameters at a time.

Here is how you have to program:

  • If you have the luxury to start fresh, start small. Otherwise, make sure you understand and respect the code you start with.
  • Identify specific changes you want to implement. Small changes! Then implement them. Then test them!

You should never redesign working software from the ground up without incremental testing. Never. Even if you work alone.

Interestingly, I also write my papers incrementally, fixing small things one after the other. No other way works for me.

How to select even or odd rows in a table using CSS 3

Filed under: Science and Technology — Daniel Lemire @ 14:08

CSS 3 is around the corner. Already we are seeing some benefits. The latest versions of Safari and Opera, as well as the beta version of Firefox, allow you to select even or odd rows in a table using only CSS:


tr:nth-child(2n+1) {
  background-color: blue;
}
tr:nth-child(2n) {
  background-color: red;
}

See? No ECMAScript, no server-side programming. Alas, no sign of support for this in Internet Explorer.
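
To see why `2n+1` picks the odd rows and `2n` the even ones, it may help to spell out the matching rule: an element matches `:nth-child(an+b)` when its 1-based position among its siblings equals a·n+b for some integer n ≥ 0. Here is a small Python sketch of that rule. The function name `matches_nth_child` is made up for this illustration, and it only models the position test, not a real CSS engine.

```python
def matches_nth_child(a, b, position):
    """Return True if a 1-based sibling position matches :nth-child(an+b)."""
    if a == 0:
        # With a == 0 the formula only ever produces b.
        return position == b
    # Solve a*n + b == position for n; n must be a non-negative integer.
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0

rows = range(1, 7)  # positions of six table rows
odd_rows = [p for p in rows if matches_nth_child(2, 1, p)]   # tr:nth-child(2n+1)
even_rows = [p for p in rows if matches_nth_child(2, 0, p)]  # tr:nth-child(2n)
print(odd_rows)   # [1, 3, 5]
print(even_rows)  # [2, 4, 6]
```

CSS 3 also accepts the keywords `odd` and `even` as shorthand for `2n+1` and `2n`, so `tr:nth-child(odd)` and `tr:nth-child(even)` select the same rows.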

Thursday, August 21st, 2008

Quick CSS quiz

Filed under: Science and Technology — Daniel Lemire @ 21:49

Given these CSS instructions,


z[x] > a[i] {color: blue;}
z z[x] a {text-decoration: underline;}
z > z a , z z z + a { color: red ;}

what will be the color of the text in the following XML file?

<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/css"
href="test.css"?>
<z><z x="x">
<z />
<a i="x">my text</a>
</z>
</z>
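
One ingredient of the quiz is selector specificity. For selectors like these, which have no IDs, classes or pseudo-classes, when two matching rules set the same property, the rule whose selector has more attribute selectors wins, with the number of element names as the tie-breaker and source order after that. The Python sketch below counts both for the selectors above. It is a rough illustration that only understands element names, attribute selectors and combinators, the function name `specificity` is made up here, and it does not check whether a selector matches the XML file: that part is still left to you.

```python
import re

def specificity(selector):
    """Return (attribute_selectors, element_names) for a simple selector.

    Only handles element names, [attr] selectors and the combinators
    ' ', '>' and '+'; IDs, classes and pseudo-classes are out of scope.
    """
    attributes = re.findall(r"\[[^\]]*\]", selector)
    # Remove the attribute selectors so their contents are not counted as elements.
    without_attributes = re.sub(r"\[[^\]]*\]", " ", selector)
    elements = re.findall(r"[A-Za-z][\w-]*", without_attributes)
    return (len(attributes), len(elements))

selectors = ["z[x] > a[i]", "z z[x] a", "z > z a", "z z z + a"]
for selector in selectors:
    print(selector, specificity(selector))
```

Python compares tuples from left to right, which mirrors how specificity is compared, so `sorted(selectors, key=specificity)` orders the selectors from least to most specific.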

Peer review is an honor-based system

Filed under: Academia/Research — Daniel Lemire @ 8:28

It would take too long to expose all of the flaws of peer review, but here are some:

  • some work is just flat wrong because the reviewers cannot analyze all of the mathematical results, and because they cannot redo the experiments;
  • numerous researchers cheat, sometimes in small ways (“2 out of 3 experiments agree with my theory, let us drop the third one”), sometimes in big ways (“I don’t have time to run these experiments, so let me make up some data”);
  • peer review may perpetuate some biases and prevent researchers from questioning some fundamental assumptions (“we decided that this is the right way, if you question it, you are a loony”).

However, for all its faults, peer review remains essential in science. I want other researchers to read and criticize my work. I enjoy it very much when people try to find flaws in my work. I think that my work is serious enough that when people point out flaws, I am usually aware of them at some level and I can respond easily (and enjoy the process).

The type of peer review I do not enjoy is the country-club approach: 1) does the paper agree with the goals and views of the reviewers? 2) is the paper written by someone we can respect? Fortunately, you can navigate the system and stay away (mostly) from country-club peer review.

But why do I still like peer review despite its obvious flaws? Because I see it as an honor-based system. In such a system, you have to accept that there will be cheaters. A lot of them. And there will be lots of mistakes. All we have to do is be open about it. That is, you cannot say “but my work was peer reviewed so you cannot question it!” or “I am very good, look at these prestigious publications!”. The peer review is there to help the authors. It is not, however, an insurance against fraud or mistakes. I like peer review because it helps me become better, but I do not use the system to determine how good someone else is.

So, what do we do if we want to know how good someone is? You read his work. You reproduce his experiments. You redo his math. Of course, this scales poorly. If you have to hire someone, you cannot read the work of 50 or 500 candidates. So? I think we have to be realistic. It is hard to know how good someone is. You can get to know 10 or 20 researchers in your life. That is about all.

Hiring processes are flawed. You will hire cheaters. Get over it.
