Given up on Eclipse, now with NetBeans

I write most of my code using vim. This winter, Kamel made me discover Eclipse.

I dislike IDEs in general because they have a tendency to force me to work in certain ways that are suboptimal. For example, if I need to remember to go to menu X and set option Y to build my project correctly, then that is simply not portable. Every time I go to a new machine, I will need to remember these precise steps. Moreover, if I cannot build my code without a GUI, then I cannot test my code on a remote machine under low bandwidth conditions. Finally, IDEs tend to do several operations silently and when things go wrong, you have layers and layers of abstraction to dig through before you can correct the problem.

However, Eclipse allowed me to import my project using subversion and use my very own Makefile! What a great idea. And it worked too!

Up until two days ago. For some reason, Eclipse stopped building my code. Hitting the “build” button simply does nothing. I never changed anything in the settings, but playing with the options did not help. I have no way of knowing what went wrong and after hours spent on the Web chasing the problem, I gave up.

I just downloaded NetBeans, and surprise, surprise! There is a C/C++ NetBeans that will use your makefiles too! Wow!

Not everything is rosy however:

  • Under MacOS NetBeans is much uglier than Eclipse. I guess NetBeans must be using Swing or some other horrible Java GUI system. It really feels like a cheap application.
  • NetBeans was unable to detect my subversion binary. It allowed me to tell it where to find subversion but I had to reboot the application for this setting to work! What?!? Eclipse worked right out of the box with subversion.

My main concern is just how ugly and unprofessional NetBeans looks. In comparison, vim is great looking! Sun software people need to learn a thing or two about design.

Scientific productivity tips from Hartley and Branthwaite

Hartley and Branthwaite (1989) have done a questionnaire study of productive psychologists. They make the following recommendations for best productivity:

  • Make a rough plan;
  • Complete sections at a time;
  • Use a word processor if possible;
  • Revise and redraft at least twice;
  • Spend about 2–5 hours writing each week;
  • Find quiet conditions in which to write;
  • Set goals and targets;
  • Get colleagues and friends to comment on early drafts;
  • Collaborate with trusted friends.

Source: Sylvie Noël.

Cool software design insight #1

I plan to progressively discuss a few things I have learned about software design during the rest of the year. Trivial things that make a big difference in your productivity. I do not claim that any of these insights will be novel in any way.

As a college professor, I do not code full time. Usually, I build dirty software that will last just long enough to make a point. I do not need to build industrial-strength software. I have no business needs to satisfy. I can afford to throw away code and never look at it again once a research project is completed. None of my code needs to run for more than a few days at a time.

With this disclaimer in place, here is insight #1:

Remove features as often as you can.

Repeatedly, I have observed that my software is too complex for its own good as months go by. Often, I thought that my code would need to do X when, in reality, the need never arises. For example, maybe you wrote code that could sort strings or integers, and you realize that you never sort integers.

It is tempting to leave these extra functions in place. After all, what is the harm? And maybe I will need the extra power some day.

However, I have learned that I systematically underestimate the cognitive overhead of these useless features. I always think that this little extra template parameter is harmless. It is only after removing it, and working with my code some more, that I realize how much easier my work has become.
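
To make the idea concrete, here is a minimal C++ sketch. The function names `sort_values` and `sort_names` are hypothetical and chosen only for illustration; they are not taken from any real project. The first helper carries a template parameter, a comparator and a flag "just in case"; the second keeps only the one case that is assumed to be actually used, namely sorting strings in ascending order.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Before (hypothetical): a "flexible" helper that accepts any element type,
// a custom comparator, and a flag to reverse the order.
template <typename T, typename Compare = std::less<T>>
void sort_values(std::vector<T>& values, bool descending = false,
                 Compare cmp = Compare()) {
    std::sort(values.begin(), values.end(), cmp);
    if (descending) {
        // Reverse the sorted range to obtain descending order.
        std::reverse(values.begin(), values.end());
    }
}

// After: only the case that is assumed to be needed in practice,
// strings sorted in ascending (lexicographic) order.
void sort_names(std::vector<std::string>& names) {
    std::sort(names.begin(), names.end());
}
```

Both functions give the same result for the default string case, but a reader of the second one has no template parameter, no comparator and no flag to reason about. If the need to sort integers ever does arise, the extra generality can be added back at that point.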

So, drop useless flags and parameters. Do your brain a favor!

Some myths about online teaching

Last year, I launched an online graduate course on Information Retrieval. This summer, I am preparing an online data warehousing course, my fourth online course. It will cover topics ranging from multidimensional indexing techniques, the MDX language, what data warehouses are, Mondrian, JPivot, and so on.

Chatting with a few colleagues who have never taught online, I was reminded of how mythical online teaching still is in 2008. Here are a few myths:

  • Videocasting classroom lectures works. No, it does not. A few lecturers are good enough to keep you watching a YouTube video for 50 minutes. Chances are that you are not among them, at least not always. (Video lectures may work, but only if they are carefully prepared and edited. And that is a lot of work.)
  • Posting lecture notes is pretty much good enough. Not really. There is an insane amount of detail making up a course, beyond pedagogically correct notes. Also, you must organize and divide the student work in small chunks. Self-assessment is also very important: you must prepare solved problems for the student to do on his own.
  • Online teaching is mostly good for introductory or low-level courses. Actually, online learning requires a lot of maturity from the students. For this reason, it works better with advanced topics or with more mature students. While teaching calculus online may certainly work, it will work with a very small fraction of the students. You would expect graduate students to have enough maturity to learn on their own, but do not count on it too much.
  • Online courses are ok for learning Microsoft Word, but you cannot possibly teach real science. Think again. Actually, an online course can be much tougher than a traditional course because you do not have to waste time with reminders: just offer a link to a refresher and the students are all set. You also do not waste time with questions about when such and such an assignment is due: the student is expected to read.
  • I do not have time for such nonsense as online teaching as I must focus on my research. Actually, if you have time at all for teaching, online teaching is probably more research-friendly. For one thing, there are fewer unwanted disruptions with online teaching.
  • Online courses will empty the classrooms. That is very unlikely. Universities have been offering bachelor and graduate degrees online for years, yet how many such graduates do you know? Many, many students feel that they need 3 hours of classroom lectures per week to learn. Let us not forget that the classroom plays a role in the country-club model of the university: you go on campus to meet people, socialize, and so on. Online courses miss all that, mostly. With the current technology, online learning is a complement to what is already done on campus, not a replacement.

Coverage of the cuil search engine

It seems that the Cuil search engine is getting reactions from almost everyone. On my blog roll, about ten people have commented on it.

Here is my verdict:

  • Cuil.com claims to have outdone Google as far as recall goes: Cuil searches more pages on the Web than anyone else—three times as many as Google and ten times as many as Microsoft;
  • It is not difficult to find queries for which Cuil returns nothing, or returns far fewer pages than Google does (See what Daniel had to write about this).

They also do some semantic analysis. For example, if you type my name, then it knows that I am a famous comedian (hint: I am not!), and helps you find other famous comedians. I think they made a mistake in tying the semantics to the query, and not to particular results. You can tell me more about a given Web page, that’s useful. Guessing what my query meant? That’s reckless.

So:

  • Cuil needs more tuning,
  • semantic analysis should not be so tied to the query terms,
  • they do not appear to outdo Google in any significant way, and certainly not with respect to recall (but this may change).

Update: Sylvie actually likes it?

Updating your model as a researcher

Doing research is hard work. Most people make their life easier by following a model. This model is made of a series of recipes used to carry forward research projects.

There are a few reasons why a researcher may want to update his research model:

  • you want to do research on a new topic: some of your recipes no longer apply;
  • you have new collaborators: they may not follow your rules;
  • you want to increase your long term productivity.

This winter, I updated my research model. Here are a few changes I have made:

  • I relaxed my focus on exploration: I used to spend months toying with random ideas with no precise purpose in mind;
  • I decided to spend more time managing my research by keeping track of precise tasks I need to accomplish;
  • I set up stronger filters: anything that is not closely related to my main research theme is ignored with higher probability than ever.

I did not follow through entirely on my new model. For example, I cannot resist exploring random research ideas. However, my focus has definitely become much narrower.

In a sense, this is a step backward. Indeed, as a Ph.D. student I used to focus on the subject of my thesis to the exclusion of everything else. After my Ph.D. was finished, I started learning about entirely different fields. For example, I know a thing or two about geophysics and image processing, whereas I completed a somewhat theoretical thesis on Wavelets. Now? I work on databases.

However, exploratory research is expensive. Yes, I have learned that I can pick any topic at random and eventually make a small contribution, but the effort is considerable. Moreover, carrying several unrelated research issues is difficult for another reason, beyond just the obvious cost of becoming familiar with a new field. The problem is that you cannot keep your different projects alive if you have too many and they are too unrelated. Hence, as you open a new front, you drop another. After several years of this process, you have proven that you can learn fast and be creative, but you are still not standing on firm ground. Things do not become easier as time passes.

Hence, I now spend a lot of time choosing which battles I am going to ignore. Am I more productive? Hard to tell. The catch is that, as a researcher, it is very difficult to establish solid grounding since you are constantly picking at it yourself. However, I feel less overwhelmed than I did a few years ago.

Encouraging diversity in science

Science follows a conservative process. It takes a long time for a fact or a law to be accepted. Several scientists must verify and reproduce the same results before acceptance is granted.

So goes the theory.

In practice, science is not such a clean process. Routinely, facts and theories become widely accepted quickly, without criticism. Mostly because they are convenient. Other proposals get shot down immediately: perhaps for good reasons, perhaps not. Negative results, including any challenge to the convenient—but poorly reviewed—facts, are frowned upon.

In some fields, there is a bias against simplicity. If you show that a simple technique works well, even if it works better than more complicated or expensive techniques, people will dismiss your work as too easy. I believe we should have the opposite bias: we should try to steer away from complicated solutions. Complicated techniques should have the burden of the proof: do we need something so difficult? But complexity is often convenient: it raises the barrier of entry to a field. If anyone can do your work using simple techniques, then why are you getting paid?

I believe that to minimize the effects of such biases, we should encourage diversity in science. Here are a few clues on how to get more diversity:

  • pick numerous and different reviewers: the composition of program committees should be different year after year;
  • encourage the multiplication of conferences, journals and workshops;
  • provide funding to more researchers (spread the money more evenly);
  • mix researchers from different organizations (universities, government, industry);
  • do not reward researchers who always publish in the same small set of conferences or journals (the same where they often act as reviewers);
  • mix researchers having different backgrounds.

Finally, I believe that we need to stress reproducibility a lot more. Researchers need to open up their data and their code. This will ensure that more people can check the facts. It should lead to better science and more diversity.

Google makes me smarter

I am a bit late to the show, but I would like to comment on Carr’s Is Google Making Us Stupid? Carr’s observation is simple:

Once I was a scuba diver in the sea of words. Now I zip along the surface like a guy on a Jet Ski.

Here are my thoughts:

  • Quite often, as a teenager, I would read long-winded technical books and conclude “Oh! That’s what he meant to say”. Unavoidably, I would find a very concise way to represent the same information. I am not surprised if people read fewer books, assuming that is even true, because large textbooks are not an optimal communication channel. Books have several deficiencies: they are static, they are not interactive, and they are often not concise. It would not do to try to publish a 12-page book, so authors have a strong incentive to elaborate (sometimes uselessly).
  • My research has grown better thanks to the Web, not worse. I can quickly survey a field, cross-reference statements, drill-down on an issue, roll-up to get an overview, and so on. Anyone who claims researchers were better off without the Web should try cutting off his net connection for a decade, and see what happens. I doubt very much that the research would be any deeper; it might just become narrower.
  • At all times throughout history, few people have given serious thought to any one topic. The fact that you, as an individual, spend your time facing issues that you cannot think through, does not mean that as a whole, humanity has become shallow.
  • You must not let yourself be overwhelmed. There are proper ways to use the Web. What you do not want to do is to try to stay afloat by skimming the new events. Set up filters and remain firm in your dedication to a few objects. Learn to focus in the chaos. Be rude: if something is outside the scope of your interests, say so. Technology can extend its coverage infinitely; you cannot.

We need a more negative culture

There is a strong bias in science, at least in Computer Science, toward positive results. For example, showing that algorithm A is better than algorithm B, will get you published. Reporting the opposite result is likely to get your paper rejected.

One justification for the value of positive results is that they give you more information. Indeed, there is an infinite number of possibilities. Listing all the cases that are of no interest would take too long. We had better focus on what works!

This argument is fallacious since it ignores one of the pillars of science: reproducibility. By taking away the possibility of publishing negative results, we basically throw away the most important reason why we require reproducibility: to verify what others have done.

Time and time again, I come across falsehoods in science. Typically, they occur when reporting experimental results that are either badly interpreted or badly implemented. Here is a typical scenario:

  • Researcher A publishes some paper where he makes some false statement.
  • The statement is compelling. It matches people’s intuition.
  • The work becomes well known and is repeatedly cited.
  • Other researchers build upon the falsehood. They either do not verify the statement (where is the profit in that?) or if they do, they avoid denouncing the falsehood.

Eventually, the statement becomes an accepted fact. Anyone who wants to challenge it has the burden of proof, and it is easy to cast doubts on any experimental procedure. I claim that this happens often. As someone who crafts my own experiments, I see it all the time. I am repeatedly unable to reproduce “accepted facts”. Yet, I never (or almost never) report these problems because trying to do so would ensure that whatever paper I produce is frowned upon. Moreover, I believe few people ever attempt to verify published results. What makes matters worse is that trying to reproduce experiments is never considered serious work in Computer Science. Often, it is quite a difficult task too: either the data or the code is missing or barely available.

What bothers me is not so much the falsehoods, but the fact that they tend to feed into the biases of entire communities. People expect certain things, and they filter out any “negative” result, and protect “positive” results even when such results are not solid. Entire fields are therefore being built on shaky foundations.

We have made some progress recently in Computer Science regarding reproducibility. There are more conferences and journals asking researchers to make their data and code available. However, I believe that culturally, we still have a long way to go.

Do you think because you write, or write because you think?

I used to believe that the pressure to publish what you did in research was inherently bad. About four years ago or so, I started to change my mind.

I now believe that the more you write, the more you think about the issues, and the more ideas you have. In short, productive researchers do not write a lot because they are brilliant, they are brilliant because they write a lot.

This statement has counterexamples, however. We all know of some researchers who produce paper after paper, all of them toying with the same set of narrow ideas, or all of them misguided. Hence, I will add a constraint. You must write a lot about different things.

But clearly, that is not enough. Many people who write textbooks, for example, happen to write a lot, and they write about different things, yet, they are not automatically brilliant researchers (though, I submit to you that they probably are brilliant individuals). Hence, I will add a final constraint: you must be ambitious and go where nobody has gone before.

So, let me summarize my recipe:

  • write a lot…
  • about different things…
  • and be bold.

My final point for the day: When I say that you must write a lot, I do not mean that you must publish a lot in peer-reviewed journals and conferences. Getting continual and high-quality feedback is essential, but I see no evidence that getting formally reviewed frequently is essential. In fact, it may even prove counterproductive as it may encourage you to become more conservative.

How do you get feedback, if not through peer review? For one thing, you can run experiments: nature will tell you whether you are wrong. For another, informal review of your work by friends or collaborators can be as good as or better than formal peer review.

I also think that posting your work on the Web might be a very valid form of publication, especially if you have job security. Sometimes you know that your work is correct. At the very least, you know as well as any reviewer might. Or sometimes, your result might just not warrant the process. Maybe we should all create our own personal journals.
