Wednesday, June 11th, 2008

The Purity Scale in Science

Filed under: Academia/Research — Daniel Lemire @ 13:58

This is how most people understand purity in Science:

As for myself, I measure purity on a bandwidth scale: the more feedback the researchers get, the less pure they are. I should maybe use another term.

(Thanks to Steven for pointing this comic out to me.)

Distractions make you dumb

Filed under: Academia/Research — Daniel Lemire @ 8:30

Sufficient focus is necessary to be smart. The corollary is that distractions may turn your brain into mulch. There are several conditions to sufficient focus:

  • a sense of urgency: without a strong need to get the task done, long term focus is difficult;
  • the dismissal of external stimuli: either you make sure not to be disturbed, or you can filter out the distractions;
  • mental readiness: sometimes your mind will simply not focus before you rest.

Tuesday, June 10th, 2008

From Graph Drawing to Tag-Cloud drawing?

Filed under: Science and Technology — Daniel Lemire @ 9:17

Tag clouds are an interesting visualization technique because, unlike bar charts, you can easily display 30 or 50 weights in a compact figure. Moreover, because they are a 2D structure, you can more easily cluster similar tags together. The Tag-Cloud Drawing problem is the optimization of the layout of the tag clouds. It is somewhat related to the Graph Drawing problem.
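To make the idea of "weights in a compact figure" concrete, here is a minimal sketch of the simplest ingredient of a tag cloud: turning tag weights into font sizes. The function name `font_sizes`, the point-size range and the logarithmic scaling are my own assumptions for illustration, not something prescribed by any of the papers mentioned here. It ignores the layout problem entirely.

```python
import math

def font_sizes(weights, min_pt=10, max_pt=36):
    """Map positive tag weights to font sizes (in points) on a log scale.

    Hypothetical helper: the log scaling and the 10-36 pt range are
    illustrative assumptions, not a standard.
    """
    logs = {tag: math.log(w) for tag, w in weights.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for tag, lw in logs.items():
        # If all weights are equal, place every tag in the middle of the range.
        frac = 0.5 if span == 0 else (lw - lo) / span
        sizes[tag] = round(min_pt + frac * (max_pt - min_pt))
    return sizes

# Made-up weights, for illustration only.
sizes = font_sizes({"science": 120, "java": 45, "tags": 8, "xml": 3})
```

The heaviest tag gets the largest font and the lightest gets the smallest; a log scale keeps a few very popular tags from dwarfing everything else. Deciding where each tag goes on the page is the actual Tag-Cloud Drawing problem.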

Recently, Fujimura et al. showed how to scale tag clouds further… up to 5,000 attributes!

We use a topographical image that helps users to grasp the relationship among tags intuitively as a background to the tag clouds. We apply this interface to a blog navigation system and show that the proposed method enables users to find the desired tags easily even if the tag clouds are very large, 5,000 and above tags. Our approach is also effective for understanding the overall structure of a large amount of tagged documents.

I really think that tag-cloud drawing is a topic deserving of more attention. It is both a fun and practical problem.

Monday, June 9th, 2008

Grounded versus Pure Theory

Filed under: Academia/Research — Daniel Lemire @ 8:34

My previous blog post generated quite a number of comments and much criticism. Let me summarize the main objections:

  • What I describe is not pure theory but bad research.
  • Pure theory is useful: consider the n log n lower bound on sorting.

My replies:

  • Our brains are bandwidth-driven machines, not standalone computers. You will only thrive given sufficient feedback. And peer review is a low-bandwidth high-latency feedback system.
  • Pure theory is low-bandwidth Science: few results depend on it, and whether it is useful or powerful is entirely a matter of opinion. It is pure because it is not tainted by external feedback.
  • Theoretical results are the reason why we do Science.
  • Pure theorists are likely to describe themselves as engineers.
  • I have done and will do pure theory work. It is a very tempting trap.
  • If a new Engineering concept seems like a good idea, wait before you make a book out of it. Try it out in practice first.
  • If a theorem seems useful to you, wait before you make a career out of it. Can you relate it to anything in the world out there?

Thursday, June 5th, 2008

Why pure theory is wasteful

Filed under: Academia/Research — Daniel Lemire @ 21:51

Pure theory is like exploring the universe by staying on Earth. Sure, it seems expensive at first to build space ships, but our brains are at their best when facing reality up close. Too many scientists work exclusively over models in their mind. Then they are surprised that nobody outside their clique finds what they do interesting.

And I am not thinking about Mathematics: Mathematics was founded by people who wanted to sell land by area, not perimeter… modern Mathematics came to be with Newton, who wanted to help the state manage its money better. I am thinking about Software Engineering researchers who never write software and never study people who write software. I am thinking about Semantic Web researchers who have been building models and ontologies for ten years, but who have never tested their ideas against the harsh reality. I am thinking about Algorithm Design people who claim one algorithm is better than another, but who never bothered to implement either. I am thinking about Machine Learning researchers who never bother to test their schemes with the terabytes of data we find everywhere.

All topics warrant research, but pure theory is not an acceptable methodology.

It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong. (Attributed to Feynman)

Tuesday, June 3rd, 2008

A short review of Collective Intelligence in Action

Filed under: Science and Technology — Daniel Lemire @ 22:03


I was recently asked by the publisher to review Collective Intelligence in Action. The author is Satnam Alag, a Bay area engineer with a Ph.D. from the University of California, Berkeley. Dr. Alag is VP of NextBio, a specialized search engine.

The first chapter is free and so is the source code used in the book.

The book is for Java developers who want to implement “Collective Intelligence” applications in Java. It tells us about extracting and applying data from blogs, wikis and social network applications. People who read this blog know that I am not one to praise, but this book succeeds brilliantly. If you are a Java engineer and work with Web technologies, you must get this book. It covers topics such as computing similarity measures using vector models, Naïve Bayes Classifiers, inverse document frequency (idf), Machine Learning (using the Weka API), building a crawler with regular expressions, collaborative filtering (with links to open source tools), and so on.
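To give a flavor of two of these topics, here is a minimal sketch of similarity between documents using a vector model weighted by inverse document frequency. It is my own illustration, written in Python rather than the book's Java, and it is not taken from the book. The function names, the whitespace tokenizer, the tf × log(N/df) weighting and the sample documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document.

    Weight = raw term count * log(N / document frequency).
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up documents, for illustration only.
docs = [
    "tag clouds display tag weights",
    "tag clouds cluster similar tags",
    "sorting needs comparisons",
]
vecs = tf_idf_vectors(docs)
```

With these toy documents, `cosine(vecs[0], vecs[1])` is positive because the first two share terms, while `cosine(vecs[0], vecs[2])` is zero because they share none. A real application would need proper tokenization, stemming and stop words.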

Even if you do not work with Java, if you care for high-end Web applications, this book is for you. It reminds me of Lyon’s Java Digital Signal Processing book. It offers the gist of what academia knows, but focuses on what people (engineers and researchers) do in practice.

The book is not meant for academia, however. There are references, but no theorems.

The book is available for preorder on Amazon for $30. Go order it.

Disclaimer. I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the author.

Further reading. A competing book is Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran. It uses Python instead of Java.

The ten-minute rule for presentations

Filed under: Academia/Research — Daniel Lemire @ 7:47

Mike gives us 3 rules to improve our presentations. Two of the rules I knew: you have to practice and you should present pictures, not text, on your slides. The other rule is the 10-minute rule: you have to insert a break in your presentation every 10 minutes to refresh the audience.

I must admit that I am really bad at attending presentations. I usually fall asleep within 5 minutes. But, at least, if you try to start fresh every 10 minutes, you may catch me when I randomly wake up. But do not mind me: I must be an outlier. For one thing, I really prefer to read your papers rather than listen to a 50-minute talk. I have this strange belief that lectures are leftovers from an era when paper and ink were expensive. But, yes, I know that talks reach many people who would not otherwise read the papers.
