Thursday, July 31st, 2008

Given up on Eclipse, now with NetBeans

Filed under: Science and Technology — Daniel Lemire @ 15:21

I write most of my code using vim. This winter, Kamel introduced me to Eclipse.

I dislike IDEs in general because they tend to force me to work in suboptimal ways. For example, if I need to remember to go to menu X and set option Y to build my project correctly, then that is simply not portable. Every time I move to a new machine, I will need to remember these precise steps. Moreover, if I cannot build my code without a GUI, then I cannot test my code on a remote machine under low-bandwidth conditions. Finally, IDEs tend to do several operations silently, and when things go wrong, you have to dig through layers and layers of abstraction before you can correct the problem.

However, Eclipse allowed me to import my project using subversion and use my very own Makefile! What a great idea. And it worked too!

Up until two days ago. For some reason, Eclipse stopped building my code. Hitting the “build” button simply does nothing. I never changed anything in the settings, but playing with the options did not help. I have no way of knowing what went wrong and after hours spent on the Web chasing the problem, I gave up.

I just downloaded NetBeans, and surprise, surprise! There is a C/C++ NetBeans that will use your makefiles too! Wow!

Not everything is rosy however:

  • Under MacOS, NetBeans is much uglier than Eclipse. I guess NetBeans must be using Swing or some other horrible Java GUI system. It really feels like a cheap application.
  • NetBeans was unable to detect my subversion binary. It allowed me to tell it where to find subversion, but I had to restart the application for this setting to work! What?!? Eclipse worked right out of the box with subversion.

My main concern is just how ugly and unprofessional NetBeans looks. In comparison, vim is great looking! Sun software people need to learn a thing or two about design.

Scientific productivity tips from Hartley and Branthwaite

Filed under: Academia/Research — Daniel Lemire @ 7:27

Hartley and Branthwaite (1989) conducted a questionnaire study of productive psychologists. They make the following recommendations for best productivity:

  • Make a rough plan;
  • Complete sections at a time;
  • Use a word processor if possible;
  • Revise and redraft at least twice;
  • Spend about 2–5 hours writing each week;
  • Find quiet conditions in which to write;
  • Set goals and targets;
  • Get colleagues and friends to comment on early drafts;
  • Collaborate with trusted friends.

Source: Sylvie Noël.

Wednesday, July 30th, 2008

Cool software design insight #1

Filed under: Software design — Daniel Lemire @ 16:20

I plan to progressively discuss a few things I have learned about software design during the rest of the year. Trivial things that make a big difference in your productivity. I do not claim that any of these insights will be novel in any way.

As a college professor, I do not code full time. Usually, I build dirty software that will last just long enough to make a point. I do not need to build industrial-strength software. I have no business needs to satisfy. I can afford to throw away code and never look at it again once a research project is completed. None of my code needs to run for more than a few days at a time.

With this disclaimer in place, here is insight #1:

Remove features as often as you can.

Repeatedly, I have observed that my software is too complex for its own good as months go by. Often, I thought that my code would need to do X when, in reality, the need never arises. For example, maybe you wrote code that could sort strings or integers, and you realize that you never sort integers.

It is tempting to leave these extra functions in place. After all, what is the harm? And maybe I will need the extra power some day.

However, I have learned that I systematically underestimate the cognitive overhead of these useless features. I always think that this little extra template parameter is harmless. It is only after removing it and working with my code some more that I realize how much easier my work has become.
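A minimal sketch of the idea in C++, using hypothetical function names made up for illustration: the first version is generic over the element type and the comparison "just in case", while the second keeps only what is actually used (sorting strings).

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Before (hypothetical): generic over the element type and the comparison,
// in case integers or custom orderings are ever needed.
template <typename T, typename Compare = std::less<T>>
void sort_values_generic(std::vector<T>& values, Compare cmp = Compare()) {
    std::sort(values.begin(), values.end(), cmp);
}

// After (hypothetical): only strings are ever sorted, always in ascending
// order, so both template parameters are gone.
void sort_names(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end());
}
```

Both versions do the same work for strings. The second one simply leaves the reader with fewer questions to answer about which types and orderings are in play.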

So, drop useless flags and parameters. Do your brain a favor!

Tuesday, July 29th, 2008

Some myths about online teaching

Filed under: Academia/Research — Daniel Lemire @ 8:32

Last year, I launched an online graduate course on Information Retrieval. This summer, I am preparing an online data warehousing course, my fourth online course. It will cover what data warehouses are, multidimensional indexing techniques, the MDX language, Mondrian, JPivot, and so on.

Chatting with a few colleagues who have never taught online, I was reminded of how mythical online teaching still is in 2008. Here are a few myths:

  • Videocasting classroom lectures works. No, it does not. A few lecturers are good enough to keep you watching a YouTube video for 50 minutes. Chances are that you are not among them, at least not always. (Video lectures may work, but only if they are carefully prepared and edited. And that is a lot of work.)
  • Posting lecture notes is pretty much good enough. Not really. There is an insane amount of detail making up a course, beyond pedagogically correct notes. Also, you must organize and divide the student work in small chunks. Self-assessment is also very important: you must prepare solved problems for the student to do on his own.
  • Online teaching is mostly good for introductory or low-level courses. Actually, online learning requires a lot of maturity from the students. For this reason, it works better with advanced topics or with more mature students. While teaching calculus online may certainly work, it will work with a very small fraction of the students. You would expect graduate students to have enough maturity to learn on their own, but do not count on it too much.
  • Online courses are ok for learning Microsoft Word, but you cannot possibly teach real science. Think again. Actually, an online course can be much tougher than a traditional course because you do not have to waste time with reminders: just offer a link to a refresher and the students are all set. You also do not waste time with questions about when such and such an assignment is due: the student is expected to read.
  • I do not have time for such nonsense as online teaching as I must focus on my research. Actually, if you have time at all for teaching, online teaching is probably more research-friendly. For one thing, there are fewer unwanted disruptions with online teaching.
  • Online courses will empty the classrooms. That is very unlikely. Universities have been offering bachelor and graduate degrees online for years. How many graduates do you know? Many, many students feel that they need 3 hours of classroom lectures per week to learn. Let us not forget that the classroom plays a role in the country-club model of the university: you go on campus to meet people, socialize, and so on. Online courses miss all that, mostly. With the current technology, online learning is a complement to what is already done on campus, not a replacement.

Monday, July 28th, 2008

Coverage of the cuil search engine

Filed under: Science and Technology — Daniel Lemire @ 12:58

It seems that the Cuil search engine is getting reactions from almost everyone. On my blog roll, about ten people have commented on it.

Here is my verdict:

  • Cuil.com claims to have outdone Google as far as recall goes: Cuil searches more pages on the Web than anyone else—three times as many as Google and ten times as many as Microsoft;
  • It is not difficult to find queries for which Cuil returns nothing, or returns far fewer pages than Google does (See what Daniel had to write about this).

They also do some semantic analysis. For example, if you type my name, then it knows that I am a famous comedian (hint: I am not!), and helps you find other famous comedians. I think they made a mistake in tying the semantics to the query, and not to particular results. Telling me more about a given Web page? That’s useful. Guessing what my query meant? That’s reckless.

So:

  • Cuil needs more tuning,
  • semantic analysis should not be so tied to the query terms,
  • they do not appear to outdo Google in any significant way, and certainly not with respect to recall (but this may change).

Update: Sylvie actually likes it?

Updating your model as a researcher

Filed under: Academia/Research — Daniel Lemire @ 10:09

Doing research is hard work. Most people make their life easier by following a model. This model is made of a series of recipes used to carry forward research projects.

There are a few reasons why a researcher may want to update his research model:

  • you want to do research on a new topic: some of your recipes no longer apply;
  • you have new collaborators: they may not follow your rules;
  • you want to increase your long-term productivity.

This winter, I updated my research model. Here are a few changes I have made:

  • I relaxed my focus on exploration: I used to spend months toying with random ideas with no precise purpose in mind;
  • I decided to spend more time managing my research by keeping track of precise tasks I need to accomplish;
  • I set up stronger filters: anything that is not closely related to my main research theme is ignored with higher probability than ever.

I did not follow through entirely on my new model. For example, I cannot resist exploring random research ideas. However, my focus has definitely become much narrower.

In a sense, this is a step backward. Indeed, as a Ph.D. student I used to focus on the subject of my thesis to the exclusion of everything else. After my Ph.D. was finished, I started learning about entirely different fields. For example, I know a thing or two about geophysics or image processing, whereas I completed a somewhat theoretical thesis on Wavelets. Now? I work on databases.

However, exploratory research is expensive. Yes, I have learned that I can pick any topic at random and eventually make a small contribution, but the effort is considerable. Moreover, carrying several unrelated research issues is difficult for another reason, beyond the obvious cost of becoming familiar with a new field: you cannot keep your different projects alive if you have too many and they are too unrelated. Hence, as you open a new front, you drop another. After several years of this process, you have proven that you can learn fast and be creative, but you are still not standing on firm ground. Things do not become easier as time passes.

Hence, I now spend a lot of time choosing which battles I am going to ignore. Am I more productive? Hard to tell. The catch is that, as a researcher, it is very difficult to establish solid grounding since you are constantly picking at it yourself. However, I feel less overwhelmed than I did a few years ago.

Wednesday, July 23rd, 2008

Encouraging diversity in science

Filed under: Academia/Research — Daniel Lemire @ 9:04

Science follows a conservative process. It takes a long time for a fact or a law to be accepted. Several scientists must verify and reproduce the same results before acceptance is granted.

So goes the theory.

In practice, science is not such a clean process. Routinely, facts and theories become widely accepted quickly, without criticism. Mostly because they are convenient. Other proposals get shot down immediately: perhaps for good reasons, perhaps not. Negative results, including any challenge to the convenient—but poorly reviewed—facts, are frowned upon.

In some fields, there is a bias against simplicity. If you show that a simple technique works well, even if it works better than more complicated or expensive techniques, people will dismiss your work as too easy. I believe we should have the opposite bias: we should try to steer away from complicated solutions. Complicated techniques should bear the burden of proof: do we need something so difficult? But complexity is often convenient: it raises the barrier to entry to a field. If anyone can do your work using simple techniques, then why are you getting paid?

I believe that to minimize the effects of such biases, we should encourage diversity in science. Here are a few clues on how to get more diversity:

  • pick numerous and different reviewers: the composition of program committees should be different year after year;
  • encourage the multiplication of conferences, journals and workshops;
  • provide funding to more researchers (spread the money more evenly);
  • mix researchers from different organizations (universities, government, industry);
  • do not reward researchers who always publish in the same small set of conferences or journals (the same where they often act as reviewers);
  • mix researchers having different backgrounds.

Finally, I believe that we need to stress reproducibility a lot more. Researchers need to open up their data and their code. This will ensure that more people can check the facts. It should lead to better science and more diversity.
