Trading latency for quality in research

I am not opposed to the Publish or Perish mantra. I am an academic writer. I am what I publish. We all think of researchers as people wearing laboratory coats, working on exotic devices. And my own laboratory includes a one-million-dollar computer cluster with a SAN server as large as a fridge. I also write a lot of software. But you know what? The writing is what matters.

And publishing is easy. Write and submit many papers that conform to the editors’ expectations. Eventually, some of your work will be accepted. And there are thousands of journals, conferences and workshops. Just write a lot.

Yet, don’t publish everything you write, even when what you wrote looks like a research paper. Hold on to it, because publishing everything that looks like a research paper leads to what Feynman famously described as Cargo Cult Science. Indeed, there is a real danger that we become so good at faking science that we are no longer doing science at all! We become dishonest.

In our haste to be published…

  • we cut corners in our experiments, when we validate our ideas at all;
  • we pretend that our work is applicable in the real world, when it isn’t;
  • we don’t take the time to reproduce and reflect on known results;
  • we present the positive aspects of our research while failing to mention the negatives;
  • we make the issues look more complex than they are so that our research looks fancier;
  • we get lost in abstract nonsense.

If you want your work to really matter, you should be honest. You should not fool yourself and others. So what do we do? Maybe we should publish carefully. While barely reducing our output rate as academic writers, we can introduce extra steps to keep us more honest. What do we need?

  • Diverse points of view: it is easy to fool a small group of like-minded experts, but comparatively more difficult to fool the readers of my blog.
  • Time to reflect: if you read what you wrote months ago, and you don’t feel the urgency to communicate it more broadly, maybe it wasn’t all that good to begin with?

The problem is that once a paper is published in a journal or a conference, we tend to move on. Anyhow, we cannot easily revise our published work. Are there other models? Economists regularly publish working papers, commonly known in Computer Science as technical reports. But the difference between computer scientists and economists is that economists revise their working papers. And only when their work has stood the test of time, that is, has been freely available for months or years, do they submit it to conventional peer review.

This year, I will try the following experiment. Both on this blog and on my publication page, I will “publish” working papers and specifically ask readers to be critical of my work. Only after a couple of months have passed (or more) will I submit my work to a journal or conference.

This will introduce some latency in my publication output. Can I trade latency for quality? I plan to report back in a year on this (very public) experiment.

Further reading: Time for computer science to grow up by Lance Fortnow.

My best blog posts (2009)

As the year 2009 comes to an end, I selected a few of my best blog posts.

Database, compression and column stores:

Hashing (contrarian blog posts):

Academia and Research:

Become independent of peer review

When I asked the director of a large—and successful—British software house his most serious problem, he said without hesitation “how to prevent clusters of incompetence from emerging”. I was reminded of that when I noticed the —for me unusual— weight given to the “peer review”. What, if the peers aren’t any better? The mechanism does not protect us from harbouring fragments that are too shallow, too speculative, or—as the case may be—too fraudulent to merit the name of science. (And let us have no illusions: such topics abound! We are fortunate in not having professors in software metrics, animation or key-wording!)

Not only does the mechanism of peer review fail to protect us from disasters, in a certain way it guarantees mediocrity: the genius has no peers. And to make matters worse, his publication record does not reflect his work either. At the time it is done, truly original work —which, in the scientific establishment, is as welcome as unwanted baby— is very hard to publish as it takes at least another ten years for the appropriate journal to be founded. (I sooner blame someone for his publication list being too long than being too short.)

The moral is that we cannot delegate our responsibility to judge ourself. We can forsake it, but not delegate it. By hiding behind the excuse “But that is not my specialty” we degrade ourselves to lame ducks, and we should not do so. A good young scientist is able to explain

  • what he is trying to achieve
  • why he is tackling this in the way he is
  • why he believes he can do it
  • the criterion by which he will decide whether he has succeeded or failed.

He is, in fact, able to explain this to his next-door neighbour. If we are too lazy or too stupid to follow such an explanation, we should resign. By urging young scientists to submit papers for publication and to apply for grants so that we can rely on the judgements of others we make ourselves ridiculous.

Source: Edsger W. Dijkstra, EWD1018, Nuenen, 21 December 1987

Disclaimer: I did not write a single word of this blog post (not even the title). But I agree with all of it.

Why I am not publishing in PLoS One, yet

PLoS One is a new peer-reviewed journal (2006) with many interesting features:

Unfortunately, for a Computer Scientist, it is not yet attractive:

  • The Computer Science section is filled with biology and medicine papers making use of Information Technology. In other words, the PLoS One taxonomy confuses Information Technology and Computer Science! Thankfully, I could find one article in Natural Language Processing which might be the first and only Computer Science paper published in PLoS One. So there is hope.
  • As a related point, PLoS One is not indexed in the usual places as a Computer Science journal (DBLP, ACM DL, and so on). Of course, no Computer Science indexing is possible until PLoS One correctly classifies its Computer Science articles.

If they could fix these problems, I would gladly submit some of my work to them. PLoS One could become a useful journal in Computer Science over time. What about prestige? PLoS One uses article-level metrics. Instead of trying to be a prestigious journal, PLoS One helps you measure the impact of your own papers.

    A recipe for interesting Computer Science research papers

    In “Are your research papers telling original stories?”, I claimed that the main benefits of the typical research paper were that:

    • the contribution to the state-of-the-art is clear (what did you invent?);
    • we can quickly quantify the value of the contribution (how well does it work?).

    Basically, research papers are fitted to the needs of the current peer review system.

    The current breed of research paper is also convenient. There are millions of ways of improving any given process. Each improvement can become a research paper. You can even proceed systematically. Pick any given solution to a problem and add a twist to it. Can you solve the problem faster? Can you solve the problem by using less memory? Can you solve the problem incrementally? And so on. You can manufacture countless research papers without ever learning anything new. And because you measured and categorized all of your contributions, you are likely to get a lot of recognition! Moreover, because you invented many new things, you may get your name attached to a framework, algorithm or problem! If any of what you did is useful to industry, you may even get rich!

    But I may not find your work interesting. I would like to propose an alternative recipe that should produce more interesting research papers:

    • Pick any process followed by practitioners or by nature. How do human beings or ants solve a given problem? What heuristics do successful engineers follow?
    • Explain, model or reproduce the process in question.

    There are endless puzzles out there. For example, I have no satisfactory explanation of why Wikipedia worked. Had I been asked about a project like Wikipedia in the nineties, I would have predicted failure. Admit it: you would have done the same. Yet, it worked. Why?

    Look at how the best programmers work. They have many clever tricks (algorithms, processes, strategies) that you will never find in any textbook. Sometimes these tricks work unreasonably well. But we have no explanation.

    Remember: Nature is the best coauthor.

    Further reading: Write good papers.

    Is Open Access publishing the solution? Really?

    Back when I was a consultant, I had a client who was convinced that Microsoft Windows was free software. So he insisted that all applications run on Microsoft’s web server. To him, the Apache server was an expensive proposition. Yet Microsoft is not at all in the business of free software; its costs are simply hidden from the consumer.

    Similarly, for professors and many graduate students, the costs of academic publishing are hidden. UQAM pays for my unrestricted access to research papers. Open Access research papers might have marginally more impact. However, the costs of Open Access are significant for me, just like the costs of Apache were important for my client:

    • There are far fewer Open Access journals to choose from.
    • On average, Open Access journals have lower standing.

    Open access to research papers is the responsible thing to do. How do we change the system? Do we boycott restricted journals? No. There is nothing wrong with restricted journals. We should not force them to close, we should evolve so that they become irrelevant. For now, they serve their purpose. There is no adequate drop-in replacement.

    Disruption is the solution. Younger folks may not remember this, but in the nineties, Microsoft had a tight grip on the software market. Right now, Microsoft’s monopoly is irrelevant as far as I am concerned. Anyone can buy a PC, install Linux on it and access everything that matters. Of course, the real story is not that Linux has beaten Microsoft Windows. Instead, it is the operating system that has lost relevance.

    How do we generate disruption? By providing alternatives. It is important to realize that these alternatives do not have to be better. Instead, they have to be more convenient and simpler. Unfortunately, I do not believe that Open Access journals are disruptive. They are challengers, certainly, but due to economics, they may fail to subvert the current system.

    Several years ago, I decided to publish all my preprints on arXiv. You can even grab an Atom feed of my publications. arXiv is indexed by Google Scholar and DBLP. arXiv is well managed. Its web site is usable. Before I used arXiv, I would merely post my papers on my web site. This is an individual choice. While it is not apolitical, it does not require me to change anybody’s mind.

    To me, the single most important recent event in academic publishing has been Perelman’s publication on arXiv of his solution to the Poincaré conjecture. This is truly a historic event.

    Self-publishing is both simpler and more convenient than traditional publishing. It is disruptive. As is often the case with disruptive solutions, it lacks some important features. For example, reputation, peer review, quality control, validation and authentication are difficult with self-publishing. But that is to be expected. The solution is not to try to emulate these features one by one. Indeed, we may find that many of these important missing features are not relevant.

    Further reading: Peer Review is Vanity Publishing
