Stop generating metadata and access the full content!

Many researchers advocate the use of metadata to help find or recommend content automatically. Metadata is certainly useful when aggregating content for human beings: I read the titles of research papers before reading the papers themselves. However, machines do better when they access at least some of the content (Lin, 2009). Moreover, metadata is of little value in ranking answers (Hawking and Zobel, 2007).

I think that researchers cling to metadata because that is how we have indexed books for so long. When I was a kid, full-text search in a library was unthinkable. Yet, there is no escape: everything is miscellaneous. Folksonomies and ontologies will not save the day. When working with machines, let go of metadata and embrace the full content.

I am particularly puzzled by a common research approach. Take an object. Extract metadata. Then compare objects with one another using the metadata, or use the metadata for retrieval. I understand that this may constitute a useful form of dimensionality reduction. Yet, researchers frequently fail to check whether it is necessary to extract metadata at all.
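
The check I have in mind can be sketched in a few lines of Python. This is only a toy illustration, not an experiment: the three documents, their "metadata" strings, and the helper names (bag_of_words, cosine, similarity) are all made up for the example, and bag-of-words cosine similarity is just one convenient assumption for how objects get compared.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents: (metadata, full text).
docs = {
    "a": ("notes on storage", "we compress bitmap indexes to speed up database queries"),
    "b": ("a study of speed", "compressed bitmap indexes make database queries faster"),
    "c": ("notes on storage", "how to keep tulip bulbs in a cool cellar over winter"),
}


def similarity(x: str, y: str, field: int) -> float:
    """Compare two documents using field 0 (metadata) or field 1 (full text)."""
    return cosine(bag_of_words(docs[x][field]), bag_of_words(docs[y][field]))


if __name__ == "__main__":
    for other in ("b", "c"):
        print(other,
              "metadata:", round(similarity("a", other, 0), 2),
              "full text:", round(similarity("a", other, 1), 2))
```

On this contrived data, the metadata says that document a is indistinguishable from c and unrelated to b, whereas the full text ranks b well ahead of c. Real collections may behave differently; the point is only that the comparison is cheap to run before committing to a metadata-extraction pipeline.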

References

  • David Hawking and Justin Zobel, Does topic metadata help with Web search? Journal of the American Society for Information Science 58 (5), 2007.
  • Jimmy Lin, Is searching full text more effective than searching abstracts? BMC Bioinformatics 10:46, 2009.

Credit: Thanks to Andre Vellino for motivating this post.

Make your research papers easy to skim

Claus Metzner asked us how often we read research papers carefully. He fully reads less than 1% of the research papers he comes across. This must be true of nearly everyone. We read a few titles, fewer abstracts, and even fewer introductions; we skim a few papers, but we rarely read entire papers carefully.

We could blame information overload and its academic companion: publish or perish. However, when I read a research paper, I most often need to know only its main contribution. As Claus puts it:

I don’t care too much if the arguments, methods and results of a paper are 100% sound or not. Mostly I am hunting for small reusable items

Hence, we should:

  • Pick good titles giving away the main insight (“The Earth is round” and not “On the geometry of our planet”);
  • Pick good section headers giving away the conclusions of the section (replace “Discussion” by “this drug fails to work”);
  • Use bullet points to outline our results;
  • Use simple schemas and figures.

Ultimately, we should write research papers expecting our readers to barely skim them.

A researcher’s garden

I love gardening. I get good results too. However, my wife is very critical of my techniques. 

  • While I work hard, my work is often obtuse. Who grows his perennials from seeds these days? The result matters less to me than what I learn in the process.
  • I do not care for uniformity; I prefer diversity. For example, I currently have 7 types of Morning Glory. I am always experimenting.

Promoted to full professor

At least in North America, professors are usually first hired at the rank of assistant professor. Your salary is poor and you have little job security. Once you get tenure, you become associate professor. However, if you can convince a set of your peers—including professors from other universities—that you have done an exceptional job as a professor, you may get promoted to the rank of full professor. The salary is better. As you grow older, the salary difference becomes large.

As of yesterday, my promotion to the rank of full professor has been officially approved. 

As my students will tell you, I am not always good at explaining strange concepts. Some weeks ago, I called my mother:

  • (me) I am promoted!
  • (my mother) to what?
  • (me) … as a professor
  • (my mother) <silence>
  • (me) I am promoted to the rank of full professor.
  • (my mother) <silence>
  • (me) Ok. I will keep the same job, I will just make a lot more money.
  • (my mother) Ah! Ok. Good.

Reinventing university education? Practical ideas…

Yesterday, John stressed that education is about helping people discover their passion. I have many brilliant students, but few passionate students. 

Success is more a matter of hard work than talent. We need to humble our students with difficult problems and long assignments. However, we should find ways to do it without turning our students into bureaucrats.

To fight industrialized teaching, I have been using cognitive stress: make sure part of your course goes beyond what any one student can grasp. I think that you learn best when your brain is forced to rewire itself. Cognitive stress is my way to force students to think differently. If I could, I would take my students on a flight to Mars and I would teach in Klingon.

(Don’t worry about taking my courses: 90% of all my students would recommend my courses to other students. Just make sure you don’t belong to the 10% who hate me.)

I will change the way I teach in the coming years. Maybe I also need some cognitive stress to evolve as a professor. And, I feel that professors must evolve—quickly—if they are to remain relevant in this new century.

  • Do we need grades? Grades are convenient for compiling statistics about students: you can compute averages and standard deviations without getting to know the individuals. On this blog, Mike Stiber described grades as an interface from academia to the rest of the world. The Rancourt case has convinced me to reevaluate this assumption. I want students, even undergraduate students, to challenge and surprise me. For the good students, I want them to go beyond getting an A. For the weak students, I want them to try to turn their weaknesses into strengths. The Rancourt solution is to give an A to all students who pass your course. You take the railroad formed by tests and standard assignments, and you simply remove it.
  • Do we even need formal classes? While I resisted project-based courses for numerous years, I might have been misguided. I am thinking about setting up courses with broad guidelines, where students must build something. My role would be to set the frame of reference, tell them what sort of skills I want them to learn, and provide them with feedback. I have had great luck supervising senior B.Sc. theses. For example, one of my students did a survey on fast shortest path computations over the DBLP database. Many of the more specialized courses could probably be project-based. I think that this helps to do away with the role of the professor as a provider of content. I may know a lot about Databases, Information Retrieval or XML. But I can’t spill the content into my students’ brains. And I am not the sole source of content. Nor even the best source.
  • Automated personalized teaching is probably underutilized. For teaching technical skills, such as computing derivatives or programming in Java, I think that a human instructor is wasted. Automated tests can provide faster feedback, and allow for more ambitious courses through personalization.
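
To make the last point concrete, here is a minimal sketch of an automated check for a derivative exercise. It is an assumption about what such a tool might look like, not a description of any existing system: the function name check_derivative, the tolerance, and the sample exercise are all hypothetical. The student's answer is compared against a central-difference estimate at a few sample points.

```python
import math
from typing import Callable, List, Sequence


def check_derivative(
    f: Callable[[float], float],
    student_df: Callable[[float], float],
    points: Sequence[float],
    tol: float = 1e-4,
) -> List[str]:
    """Return one feedback message per point where the student's derivative
    disagrees with a central-difference estimate of f'."""
    h = 1e-5  # step size for the numerical estimate
    feedback = []
    for x in points:
        estimate = (f(x + h) - f(x - h)) / (2 * h)
        answer = student_df(x)
        # Relative tolerance, with a floor of 1.0 so values near zero still work.
        if abs(answer - estimate) > tol * max(1.0, abs(estimate)):
            feedback.append(
                f"At x = {x}: expected about {estimate:.4f}, got {answer:.4f}"
            )
    return feedback


# Hypothetical exercise: differentiate f(x) = x * sin(x).
def f(x: float) -> float:
    return x * math.sin(x)


def correct_answer(x: float) -> float:
    return math.sin(x) + x * math.cos(x)


def forgot_product_rule(x: float) -> float:
    return math.cos(x)


if __name__ == "__main__":
    sample_points = [0.5, 1.0, 2.0]
    print(check_derivative(f, correct_answer, sample_points))
    for message in check_derivative(f, forgot_product_rule, sample_points):
        print(message)
```

A correct answer produces an empty list, while a wrong one produces immediate, point-by-point feedback. A numerical check like this cannot tell the student why the answer is wrong; it only shows that the feedback loop can run without an instructor.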

These measures will challenge the weak or lazy students, but also the best students. Without a clear path in a given course, some students will feel cheated. After all, it is much easier to be told what to do. Ah! But that’s precisely what I want: students who dislike being told what to do!

Research interests should be short-lived?

How did I come to Computer Science? Through geophysics! I was once given data sets spanning several CD-ROMs. Back then, this was a lot of data! To this day, my research is still inspired by this short gig in geophysics. I keep trying to bridge mathematics and software implementations.

This warped path was beneficial to me. I still feel the need to keep my brain on its toes. In Mapping the evolution of scientific ideas, Herrera et al. suggest that this strategy might be sound:

(…) communities that are more willing to reinvent themselves tend to be the ones that have most impact per paper (…) our analysis shows that communities with a higher fitness tend to be short-lived.

Good researchers need to be exposed to strange or surprising ideas and problems. Attending the same conference year after year does not count.

How peer review is supposed to help you!

Malicious authors know how to get past peer review without effort:

  • Pretend to have run extensive experiments supporting your theories. When the experiments contradict you or are merely difficult to explain, conveniently discard them. Nobody will try to reproduce your experiments in the short run.
  • Do not think through the deep and complicated issues: reviewers only have a few days at most to review your papers anyhow!
  • Pick your problems and experiments so as to make the problem as elegant as possible. Do not bother yourself with nasty (but important) details: they will merely get in the way of getting your paper accepted.

Peer review is meant to help you generate better results. Listen to the reviewers. Peers are (potentially nasty and ill-tempered) advisors. Convince yourself that your work is good, even under some scrutiny.

Remember: your research program is more than the sum of your papers. Many useless researchers wrote many more papers (and got larger grants) than Shannon or Feynman. Don’t write papers whose only virtue is that they may eventually get past peer review. It is a depressing goal.
