Monday, July 21st, 2008

Google makes me smarter

Filed under: Science and Technology — Daniel Lemire @ 8:54

I am a bit late to the show, but I would like to comment on Carr’s Is Google Making Us Stupid? Carr’s observation is simple:

Once I was a scuba diver in the sea of words. Now I zip along the surface like a guy on a Jet Ski.

Here are my thoughts:

  • Quite often, as a teenager, I would read long-winded technical books and conclude “Oh! That’s what he meant to say”. Unavoidably, I would find a very concise way to represent the same information. I am not surprised if people read fewer books, assuming that is even true, because large textbooks are not an optimal communication channel. Books have several deficiencies: they are static, they are not interactive, and they are often not concise. It would not do to try to publish a 12-page book, so authors have a strong incentive to elaborate (sometimes uselessly).
  • My research has grown better thanks to the Web, not worse. I can quickly survey a field, cross-reference statements, drill down on an issue, roll up to get an overview, and so on. Anyone who claims researchers were better off without the Web should try cutting off his net connection for a decade, and see what happens. I doubt very much that the research would be any deeper; it might just become narrower.
  • At all times throughout history, few people have given serious thought to any one topic. The fact that you, as an individual, spend your time facing issues that you cannot think through does not mean that humanity as a whole has become shallow.
  • You must not let yourself be overwhelmed. There are proper ways to use the Web. What you do not want to do is to try to stay afloat by skimming the new events. Set up filters and remain firm in your dedication to a few objects. Learn to focus in the chaos. Be rude: if something is outside the scope of your interests, say so. Technology can extend its coverage infinitely; you cannot.

We need a more negative culture

Filed under: Academia/Research — Daniel Lemire @ 7:58

There is a strong bias in science, at least in Computer Science, toward positive results. For example, showing that algorithm A is better than algorithm B will get you published. Reporting the opposite result is likely to get your paper rejected.

One justification for the value of positive results is that they give you more information. Indeed, there is an infinite number of possibilities. Listing all the cases that are of no interest would take too long. We had better focus on what works!

This argument is fallacious since it ignores one of the pillars of science: reproducibility. By taking away the possibility of publishing negative results, we basically throw away the most important reason why we require reproducibility: to verify what others have done.

Time and time again, I come across falsehoods in science. Typically, they occur when reporting experimental results that are either badly interpreted or badly implemented. Here is a typical scenario:

  • Researcher A publishes some paper where he makes some false statement.
  • The statement is compelling. It matches people’s intuition.
  • The work becomes well known and is repeatedly cited.
  • Other researchers build upon the falsehood. They either do not verify the statement (where is the profit in that?) or if they do, they avoid denouncing the falsehood.

Eventually, the statement becomes an accepted fact. Anyone who wants to challenge it has the burden of proof, and it is easy to cast doubts on any experimental procedure. I claim that this happens often. As someone who crafts my own experiments, I see it all the time. I am repeatedly unable to reproduce “accepted facts”. Yet, I never (or almost never) report these problems because trying to do so would ensure that whatever paper I produce is frowned upon. Moreover, I believe few people ever attempt to verify published results. What makes matters worse is that trying to reproduce experiments is never considered serious work in Computer Science. Often, it is quite a difficult task too: either the data or the code is missing or barely available.

What bothers me is not so much the falsehoods, but the fact that they tend to feed into the biases of entire communities. People expect certain things, and they filter out any “negative” result, and protect “positive” results even when such results are not solid. Entire fields are therefore being built on shaky foundations.

We have made some progress recently in Computer Science regarding reproducibility. There are more conferences and journals asking researchers to make their data and code available. However, I believe that culturally, we still have a long way to go.

Friday, July 11th, 2008

Do you think because you write, or write because you think?

Filed under: Academia/Research — Daniel Lemire @ 9:33

I used to believe that the pressure to publish what you did in research was inherently bad. About four years ago or so, I started to change my mind.

I now believe that the more you write, the more you think about the issues, and the more ideas you have. In short, productive researchers do not write a lot because they are brilliant; they are brilliant because they write a lot.

This statement has counterexamples, however. We all know of some researchers who produce paper after paper, all of them toying with the same set of narrow ideas, or all of them misguided. Hence, I will add a constraint. You must write a lot about different things.

But clearly, that is not enough. Many people who write textbooks, for example, happen to write a lot, and they write about different things, yet, they are not automatically brilliant researchers (though, I submit to you that they probably are brilliant individuals). Hence, I will add a final constraint: you must be ambitious and go where nobody has gone before.

So, let me summarize my recipe:

  • write a lot…
  • about different things…
  • and be bold.

My final point for the day: When I say that you must write a lot, I do not mean that you must publish a lot in peer-reviewed journals and conferences. Getting continual and high-quality feedback is essential, but I see no evidence that getting formally reviewed frequently is essential. In fact, it may even prove counterproductive as it may encourage you to become more conservative.

How do you get feedback, if not through peer review? For one thing, you can run experiments: nature will tell you whether you are wrong. For another, informal review of your work by friends or collaborators can be as good as or better than formal peer review.

I also think that posting your work on the Web might be a very valid form of publication, especially if you have job security. Sometimes you know that your work is correct. At the very least, you know as well as any reviewer might. Or sometimes, your result might just not warrant the process. Maybe we should all create our own personal journals.

Thursday, July 10th, 2008

A small graph-theory puzzle

Filed under: Science and Technology — Daniel Lemire @ 13:23

I like to think about graph theory problems these days. Here is one:

What type of graph has minimal diameter for a given number of vertices, given an upper bound on the in-degree and another upper bound on the out-degree?

I will give eternal fame (among the readership of this blog) to anyone who can provide a practical algorithm to construct such graphs. Pointing me to a reference counts.

(No, I have not even tried to solve the problem. I am just interested in the answer.)
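
As a starting point for the puzzle above, a standard counting argument (the Moore bound for directed graphs) gives a lower bound on the diameter. The symbols are assumptions introduced here: n is the number of vertices, k the diameter, and d ≥ 2 the smaller of the two degree bounds. From any vertex, at most d^i vertices lie at distance exactly i, so:

```latex
n \;\le\; \sum_{i=0}^{k} d^{\,i} \;=\; \frac{d^{\,k+1}-1}{d-1}
\quad\Longrightarrow\quad
k \;\ge\; \left\lceil \log_d\bigl(n(d-1)+1\bigr) \right\rceil - 1,
\qquad d = \min(d_{\mathrm{in}}, d_{\mathrm{out}}) \ge 2 .
```

This only bounds the answer; it does not construct the graph. De Bruijn graphs, with d^k vertices, in-degree and out-degree d, and diameter k, are one known family that comes close to the bound.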

Monday, July 7th, 2008

I still don’t have the multiplication tables memorized

Filed under: Academia/Research — Daniel Lemire @ 17:21

I read this on slashdot:

I have a PhD in math, and I still don’t have the multiplication tables memorized

Now I know I am not the only one!

In other news,

  • I still deduce my age from my birth date (takes me a minute or so each time);
  • I was identified as having a learning disability when I entered school (since I could not recite my phone number nor tie my shoes) and put in a special class;
  • I still don’t know my office phone number;
  • I don’t know my bank account number, nor how much money there is in it;
  • I don’t know my Social Insurance Number;
  • I get the birthdays of my sons mixed up.

But I know what a soliton is, I can solve nonlinear differential equations by multiscale methods, and I can program my very own bitmap index from scratch in C++. Oh! and I can grow coreopsis and echinacea from seeds.

Let us face it: the purpose of school should not be to teach specifics. And you should never judge kids by what you expect them to achieve. Let them surprise you!

Sorting 1 terabyte in 209 seconds

Filed under: Science and Technology — Daniel Lemire @ 8:22

Yahoo! managed to sort 10 billion 100-byte elements in 209 seconds. This was done in Java using Hadoop.

As a basis for comparison, on a fast and recent Mac Pro, it takes 6000 seconds to sort a 2 GB text file using Unix file utilities. Yahoo!’s problem is 500 times larger, and they solve it about 30 times faster: in terms of throughput, they are about 4 orders of magnitude faster! Of course, they have fixed-length records, which helps tremendously.

However, I wonder how much energy (power usage) was spent on the sort operation?
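
The comparison with the Mac Pro can be checked with a few lines of shell. This is a rough sketch: it assumes 1 TB means 10^12 bytes and 2 GB means 2×10^9 bytes, and the variable and function names are made up for the illustration.

```shell
#!/bin/sh
# Figures quoted in the post; the byte counts assume decimal units.
yahoo_bytes=1000000000000   # 10 billion records x 100 bytes
yahoo_secs=209
mac_bytes=2000000000        # 2 GB text file (assumed 2 x 10^9 bytes)
mac_secs=6000

# Print the ratio of the two throughputs (bytes per second), rounded.
speedup() {
    awk -v yb="$yahoo_bytes" -v ys="$yahoo_secs" \
        -v mb="$mac_bytes" -v ms="$mac_secs" \
        'BEGIN { printf "%.0f\n", (yb / ys) / (mb / ms) }'
}

speedup
```

The ratio comes out a little above 14,000, which is where the "4 orders of magnitude" comes from.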

Friday, July 4th, 2008

Backing up your Mac on an external disk

Filed under: Science and Technology — Daniel Lemire @ 19:28

A couple of weeks ago, I needed to back up my MacBook Pro to an external disk (a firewire G-Drive) because my hard drive was failing. I started shopping for a good backup solution, but none of them had all of the following features:

  • support for incremental backups: if a change is made, you only back up the files that differ;
  • adequate handling of IO errors (no all-out abort);
  • inexpensive.

Indeed, I tried two different tools, but they refused to back up my disk due to numerous IO errors. They would not even tell me how to fix my problem.

As it turns out, your Mac already has all it needs, by default, to do just that. First, create a file called “backup.sh”, make it executable (chmod +x backup.sh) and copy the following content to it:


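#!/bin/sh
RSYNC="/usr/bin/rsync -E"
# my external disk is located
# at /Volumes/G-DRIVE\ MINI/
# copy the boot volume (/) to the external disk;
# any arguments given to this script are passed on to rsync
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" / /Volumes/G-DRIVE\ MINI/
# make the copy bootable
sudo bless -folder /Volumes/G-DRIVE\ MINI/System/Library/CoreServices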

Then run it! Go to a shell and type “./backup.sh”. It will ask for your administrator password.
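
The script reads its exclusion patterns from backup_excludes.txt, a file the post does not show. Here is a minimal sketch of how one might create it. The patterns are assumptions chosen for illustration, not the author's list, and write_excludes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a starting exclusion list for rsync.
# $1 is the file to create (defaults to backup_excludes.txt).
# The patterns are illustrative assumptions; adjust them to your system.
write_excludes() {
    cat > "${1:-backup_excludes.txt}" <<'EOF'
/private/tmp/*
/private/var/vm/*
/Volumes/*
.Trashes
.Spotlight-V100
EOF
}

write_excludes backup_excludes.txt
```

Patterns that start with a slash are anchored at the top of the directory being copied; the others match at any depth. Keep the file in the directory you run backup.sh from, since the script refers to it by a relative path.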

If you ever need to restore your files, then create a file called “restore.sh” with the following content:


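#!/bin/sh
RSYNC="/usr/bin/rsync -E"
# copy the external disk back onto the internal disk;
# any arguments given to this script are passed on to rsync
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" /Volumes/G-DRIVE\ MINI/ /Volumes/Macintosh\ HD/
# make the internal disk bootable again
sudo bless -folder /Volumes/Macintosh\ HD/System/Library/CoreServices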

Executing restore.sh may prove dangerous. Make sure you have tried booting from the external disk first. To boot from an external disk, I think you have to hold down the Option key while rebooting.
