Death to the 3-hour exam

As an undergraduate student, I hated the 3-hour exams. But I knew how to do well on them. The secret? Get your hands on all exams from the last ten years for this class. Sit down for a couple of days and grind through all questions. It works because a 3-hour exam is a very specific context. 

But wait… as a professor, why would I care about how my students do on a 3-hour exam? Does it measure what I care about? Jon Dron said it best: “So, I have been thinking about what exams taught me: (…) that the most important things in life generally take around three hours to complete.”

We need novelists, NASA engineers, and researchers. People who can work for days, weeks, months, on the same project. What I want from my students is an ability to sit down for hours and days, and work out difficult problems. I see no evidence that training specifically for exams is the right type of training.

What are the alternatives? At the University of Toronto, we only had take-home exams in higher mathematics classes. The problems were difficult, but satisfying. What about cheating? I will do whatever I can in my classes to prevent cheating; however, my primary function cannot be to thwart cheaters.

(Starting in September 2009, I am switching all my classes to take-home exams.)

Why senior researchers and managers should analyze data themselves…

Scientists, businessmen and even spies are supposed to analyze data collaboratively. Are they?

If you are a scientist, you are familiar with the following type of research collaboration: a lowly student collects the data, crunches the numbers and plots the data. Other collaborators—such as the professor—merely comment on the tables and plots. Similarly, the CEO sees the pie chart, while the assistant crunches the numbers. That is vertical collaboration: you clean the basement and I will clean the main floor.

Yet, reliable data analysis requires horizontal collaboration.  Indeed, there are downsides to task specialization:

  • By never looking at the data, senior scientists and managers rely on experience and hearsay. Their incoming bandwidth is dramatically reduced. Nature is the best coauthor. Consider how the best American spies were fooled prior to 9/11 while all the data to catch the terrorists was available. Bandwidth is a requirement to be smart.
  • When a single person crunches the numbers, hard-to-detect errors creep in. The problem is serious: Ioannidis argued that most published research findings are false.
  • With nobody to review the source data, the sole data analyst is more likely to cheat. Why rerun these tests properly, when you can just randomly dismiss part of the data? People are lazy: when given no incentive, we take the easy way out.

The common justification for task specialization is that senior researchers and managers do not have the time. Yet, 30 years ago, researchers and managers did not type their own letters. Improve the tools, and reduce task specialization.

With Sylvie Noël, I decided to have a closer look. My preliminary conclusions are as follows:

  • There are adequate tools to support rich collaboration over data analysis. Collaboratories have been around for a long time. We have the technology! Yet, we may need a disruption: inexpensive, accessible and convenient tools. The current migration toward Web-based applications might help.
  • Given a chance, everyone will pitch in. To make our demonstration, we collected user data from sites such as IBM Many Eyes and StatCrunch. We then ran an Ochoa-Duval analysis. We find that the network of users within web-based data analysis tools is comparable to other Web 2.0 sites.

As a database researcher, I think that further progress lies with loosely coupled data (no big tables! no centralized tool!) and flexible visualization tools (stop the pie charts! go with tag clouds!). I am currently looking for new research directions on this problem. Any ideas?

The roots of plagiarism are deep

William Meehan, president of Jacksonville State University, got his Ph.D. by copying, largely word for word, the dissertation of another student. He did not even copy an obscure thesis published in some remote country. In fact, he copied the thesis of a fellow University of Alabama graduate. And wait for it: they graduated at nearly the same time. And three professors were on both dissertation committees.

Call me naïve, but I am surprised.  We all know there are bad apples. Students will cheat. But cheating on a Ph.D. dissertation must be extremely difficult. It takes guts to copy a dissertation submitted recently, at the same school. It should not be possible. The University of Alabama seems like a respectable school, with actual professors and Ph.D. programs. What happened?

The thesis supervisor ought to know. A supervisor must provide feedback throughout the student’s work, from the proposal stage to the final revision. Either he knew about the plagiarism (I doubt it), or else he played no role in supervising the student: the student came to him with a complete thesis, he read it over, made some minor comments, and approved it. Rubber-stamping a thesis should be considered as bad as plagiarism.

(It seems that Professor Howard Jones was his supervisor, though I am unsure.)

Further reading: Alabama college president accused of plagiarism (USA Today)

Stop generating metadata and access the full content!

Many researchers advocate the use of metadata to help find or recommend content automatically. Metadata is certainly useful when aggregating content for human beings: I first read the titles of research papers before reading them. However, machines do better when they access at least some of the content (Lin, 2009). Moreover, metadata is of little value in ranking answers (Hawking and Zobel, 2007).

I think that researchers cling to metadata because that is how we have indexed books for so long. When I was a kid, full-text search in a library was unthinkable. Yet, there is no escape: everything is miscellaneous. Folksonomies and ontologies will not save the day. When working with machines, let go of metadata and embrace the full content.

I am particularly puzzled by a common research approach. Take an object. Extract metadata. Then compare objects with one another using the metadata, or use the metadata for retrieval. I understand that this may constitute a useful form of dimensionality reduction. Yet, researchers frequently fail to check whether it is necessary to extract metadata at all.
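Here is a minimal sketch of the kind of sanity check I have in mind: score the same query against the metadata alone and against the full content, then compare. Everything in it is an assumption made for illustration. The two documents are made up, the title stands in for "metadata", and plain bag-of-words cosine similarity stands in for whatever retrieval model you actually use. It is a toy, not evidence for or against metadata.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def scores(query, docs, field):
    """Score every document against the query using a single field."""
    q = Counter(tokenize(query))
    return {doc_id: cosine(q, Counter(tokenize(doc[field])))
            for doc_id, doc in docs.items()}


# Hypothetical documents: "title" plays the role of metadata,
# "body" plays the role of the full content.
DOCS = {
    "a": {"title": "On the geometry of our planet",
          "body": "We show that the earth is round using ship observations."},
    "b": {"title": "Notes on gardening",
          "body": "Growing perennials from seeds takes patience."},
}

if __name__ == "__main__":
    query = "round earth"
    print("metadata only:", scores(query, DOCS, "title"))
    print("full content: ", scores(query, DOCS, "body"))
```

In this toy case, the title of document "a" shares no word with the query, so the metadata-only score cannot separate the two documents, while the full-content score can. On real collections the outcome may differ, which is exactly why the comparison is worth running before building a metadata extraction pipeline.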

References

  • David Hawking and Justin Zobel, Does topic metadata help with Web search? Journal of the American Society for Information Science 58 (5), 2007.
  • Jimmy Lin, Is searching full text more effective than searching abstracts? BMC Bioinformatics 10:46, 2009.

Credit: Thanks to Andre Vellino for motivating this post.

Make your research papers easy to skim

Claus Metzner asked us how often we read research papers carefully. He fully reads fewer than 1% of the research papers he comes across. This must be true of nearly everyone. We read a few titles, fewer abstracts, even fewer introductions, and we skim a few papers, but we rarely read entire papers carefully.

We could blame information overload and its academic companion: publish or perish. However, when I read a research paper, I most often only need to know its main contribution. As Claus puts it:

I don’t care too much if the arguments, methods and results of a paper are 100% sound or not. Mostly I am hunting for small reusable items

Hence, we should:

  • Pick good titles giving away the main insight (“The Earth is round” and not “On the geometry of our planet”);
  • Pick good section headers giving away the conclusions of the section (replace “Discussion” by “this drug fails to work”);
  • Use bullet points to outline our results;
  • Use simple schemas and figures.

Ultimately, we should write research papers expecting our readers to barely skim them.

A researcher’s garden

I love gardening. I get good results too. However, my wife is very critical of my techniques. 

  • While I work hard, my work is often obtuse. Who grows his perennials from seeds these days? The result matters less to me than what I learn in the process.
  • I do not care for uniformity; I prefer diversity. For example, I currently have 7 types of Morning Glory. I am always experimenting.