Monday, July 21st, 2008

Google makes me smarter

Filed under: Science and Technology — Daniel Lemire @ 8:54

I am a bit late to the show, but I would like to comment on Carr’s Is Google Making Us Stupid? Carr’s observation is simple:

Once I was a scuba diver in the sea of words. Now I zip along the surface like a guy on a Jet Ski.

Here are my thoughts:

  • Quite often, as a teenager, I would read long-winded technical books and conclude “Oh! That’s what he meant to say”. Unavoidably, I would find a very concise way to represent the same information. I am not surprised if people read fewer books, assuming that is even true, because large textbooks are not an optimal communication channel. Books have several deficiencies: they are static, they are not interactive, and they are often not concise. It would not do to try to publish a 12-page book, so authors have a strong incentive to elaborate (sometimes uselessly).
  • My research has grown better thanks to the Web, not worse. I can quickly survey a field, cross-reference statements, drill down on an issue, roll up to get an overview, and so on. Anyone who claims researchers were better off without the Web should try cutting off his net connection for a decade, and see what happens. I doubt very much that the research would be any deeper; it might just become narrower.
  • At all times throughout history, few people have given serious thought to any one topic. The fact that you, as an individual, spend your time facing issues that you cannot think through does not mean that humanity as a whole has become shallow.
  • You must not let yourself be overwhelmed. There are proper ways to use the Web. What you do not want to do is to try to stay afloat by skimming the new events. Set up filters and remain firm in your dedication to a few objects. Learn to focus in the chaos. Be rude: if something is outside the scope of your interests, say so. Technology can extend its coverage infinitely; you cannot.

Thursday, July 10th, 2008

A small graph-theory puzzle

Filed under: Science and Technology — Daniel Lemire @ 13:23

I like to think about graph theory problems these days. Here is one:

What type of graph has minimal diameter for a given number of vertices, given an upper bound on the in-degree and another upper bound on the out-degree?

I will give eternal fame (among the readership of this blog) to anyone who can provide a practical algorithm to construct such graphs. Pointing me to a reference counts.

(No, I have not even tried to solve the problem. I am just interested in the answer.)
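
As a starting point (not an answer to the construction question), a standard counting argument known as the Moore bound limits how small the diameter can be. The symbols are introduced here for illustration: n is the number of vertices, d ≥ 2 is the smaller of the two degree bounds, and k is the diameter. Since at most d^i vertices can sit at distance exactly i from any given vertex:

```latex
n \le \sum_{i=0}^{k} d^{\,i} = \frac{d^{\,k+1} - 1}{d - 1}
\quad\Longrightarrow\quad
k \ge \left\lceil \log_d\bigl(n(d-1) + 1\bigr) \right\rceil - 1
```

For example, with n = 8 and d = 2 the bound gives k ≥ 3, which the binary de Bruijn graph on 8 vertices attains. De Bruijn and Kautz graphs are commonly cited families that come close to this bound, though that alone does not settle the question for every number of vertices.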

Monday, July 7th, 2008

Sorting 1 terabyte in 209 seconds

Filed under: Science and Technology — Daniel Lemire @ 8:22

Yahoo! managed to sort 10 billion 100-byte elements in 209 seconds. This was done in Java using Hadoop.

As a basis for comparison, on a fast and recent Mac Pro, it takes 6000 seconds to sort a 2 GB text file using Unix file utilities. Yahoo!’s problem is 500 times larger, and they solve it about 30 times faster: their throughput is roughly 4 orders of magnitude higher! Of course, they have fixed-length records, which helps tremendously.

However, I wonder how much energy (power usage) was spent on the sort operation?
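
The comparison above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch using only the figures quoted in the post; the function name is made up for illustration, and 1 TB and 2 GB are assumed to be decimal units.

```shell
#!/bin/sh
# Back-of-the-envelope check of the comparison, using only the figures
# quoted in the post (1 TB and 2 GB are taken as decimal units).
# throughput_ratio is a hypothetical helper name.
throughput_ratio() {
    awk 'BEGIN {
        size_ratio = (10e9 * 100) / 2e9   # 10 billion 100-byte records vs 2 GB
        time_ratio = 6000 / 209           # Mac Pro seconds vs Hadoop seconds
        printf "%d %.1f %.0f\n", size_ratio, time_ratio, size_ratio * time_ratio
    }'
}

throughput_ratio   # prints: 500 28.7 14354
```

The last number is the combined ratio: a bit over 10^4, hence the 4 orders of magnitude.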

Friday, July 4th, 2008

Backing up your Mac on an external disk

Filed under: Science and Technology — Daniel Lemire @ 19:28

A couple of weeks ago, I needed to back up my MacBook Pro to an external disk (a FireWire G-Drive) because my hard drive was failing. I started shopping for a good backup solution, but none of them had all of the following features:

  • support for incremental backups: if a change is made, you only back up the files that differ;
  • adequate handling of IO errors (no all-out abort);
  • inexpensive.

Indeed, I tried two different tools, but they refused to back up my disk due to numerous IO errors. They would not even tell me how to fix my problem.

As it turns out, your Mac already has all it needs, by default, to do just that. First, create a file called “backup.sh”, make it executable (chmod +x backup.sh) and copy the following content to it:


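#!/bin/sh
# -E: on the rsync shipped with Mac OS X, copy extended attributes and resource forks
RSYNC="/usr/bin/rsync -E"
# my external disk is located
# at /Volumes/G-DRIVE\ MINI/
# copy the boot volume (/) to the external disk; extra arguments are passed to rsync
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" / /Volumes/G-DRIVE\ MINI/
# mark the system folder on the copy so the disk can be booted from
sudo bless -folder /Volumes/G-DRIVE\ MINI/System/Library/CoreServices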

Then run it! Go to a shell and type “./backup.sh”. It will ask for your administrator password.
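
The script expects a file named backup_excludes.txt in the current directory, which is not shown here. The following is a minimal sketch of what such a file might contain: the patterns and the write_excludes helper are assumptions for illustration, not my actual list. rsync reads one pattern per line, ignores lines starting with # and treats a leading slash as anchored at the root of the transfer.

```shell
#!/bin/sh
# Write a hypothetical exclude list to the file given as first argument.
# The patterns are illustrative assumptions, not the author's actual file.
write_excludes() {
    cat > "$1" <<'EOF'
# temporary files, swap files and per-volume metadata
/private/tmp/*
/private/var/vm/*
/.Trashes
/.Spotlight-V100
EOF
}

# create the file where backup.sh looks for it
write_excludes backup_excludes.txt
```

Because the script hands its arguments to rsync, you can preview a run with “./backup.sh -n -v”: the -n flag asks rsync for a dry run and -v lists what would be transferred.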

If you ever need to restore your files, then create a file called “restore.sh” with the following content:


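#!/bin/sh
RSYNC="/usr/bin/rsync -E"
# copy the external disk back onto the internal disk; --delete removes
# anything on the internal disk that is absent from the backup
sudo $RSYNC -a -x -S --delete --exclude-from backup_excludes.txt "$@" /Volumes/G-DRIVE\ MINI/ /Volumes/Macintosh\ HD/
# make the restored system folder bootable again
sudo bless -folder /Volumes/Macintosh\ HD/System/Library/CoreServices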

Executing restore.sh may prove dangerous. Make sure you have tried booting from the external disk first. To boot from an external disk, hold down the Option key while the Mac restarts, then pick the external disk.
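
Since the restore deletes whatever is not on the backup, a small guard before running it may be worthwhile. This is only a sketch: the has_system_folder helper is hypothetical, and finding the folder is a necessary sign of a usable backup, not a guarantee that the disk boots.

```shell
#!/bin/sh
# Succeeds only if the given volume contains the system folder that the
# scripts above pass to bless. has_system_folder is a hypothetical name.
has_system_folder() {
    [ -d "$1/System/Library/CoreServices" ]
}

if has_system_folder "/Volumes/G-DRIVE MINI"; then
    echo "backup volume has a system folder"
else
    echo "backup volume is missing or incomplete; do not restore" >&2
fi
```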

Friday, June 27th, 2008

List of Accepted Papers to Large-Scale Recommender Systems Workshop

Filed under: Science and Technology — Daniel Lemire @ 17:03

We just posted the list of accepted papers to the second workshop on Large-Scale Recommender Systems and the Netflix Prize Competition. Here are the titles:

  • Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering
  • Gavin Potter. Putting the collaborator back into collaborative filtering
  • Andreas Toescher, Michael Jahrer and Robert Legenstein. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems
  • Tamas Kiss, Miklos Kurucz, István Nagy and Andras A. Benczur. Large-scale recommenders based on Association Rule Mining
  • Oscar Celma and Pedro Cano. From hits to niches? or how popular artists can bias music recommendations
  • Domonkos Tikk, Gabor Takacs, Istvan Pilaszy and Bottyan Nemeth. Investigation of Various Matrix Factorization Methods for Large Recommender Systems

Thursday, June 12th, 2008

Proof that I am a stubborn bastard

Filed under: Science and Technology, Academia/Research — Daniel Lemire @ 9:01

  • I have not used Microsoft Office in over 5 years. I use Mac OS and Linux.
  • I never use my employer’s email service. Prior to Google Mail, I used a private provider and forwarded my work email there.
  • I have not driven to work in the last 4 years.
  • As a researcher, I do not belong to any one community.
  • I keep teaching a university-level XML course, even though I have been ridiculed for teaching such lowly technical issues.

Tuesday, June 10th, 2008

From Graph Drawing to Tag-Cloud drawing?

Filed under: Science and Technology — Daniel Lemire @ 9:17

Tag clouds are an interesting visualization technique because, unlike bar charts, you can easily display 30 or 50 weights in a compact figure. Moreover, because they are a 2D structure, you can more easily cluster similar tags together. The Tag-Cloud Drawing problem is the optimization of the layout of the tag clouds. It is somewhat related to the Graph Drawing problem.

Recently, Fujimura et al. showed how to scale tag clouds further… up to 5,000 attributes!

We use a topographical image that helps users to grasp the relationship among tags intuitively as a background to the tag clouds. We apply this interface to a blog navigation system and show that the proposed method enables users to find the desired tags easily even if the tag clouds are very large, 5,000 and above tags. Our approach is also effective for understanding the overall structure of a large amount of tagged documents.

I really think that tag-cloud drawing is a topic deserving of more attention. It is both a fun and practical problem.
