AAAI 2008 (January 25, 2007 / July 13-17, 2008)
AAAI-08 will be held in Chicago. It is one of the largest annual conferences in AI. It also has a special AI and the Web track.
I stumbled upon this nice paper, Social-Organizational Characteristics of Work and Publication Productivity among Academic Scientists in Doctoral-Granting Departments (Journal of Higher Education, 2007). I skimmed it and here are some sketchy conclusions:
I often read that good research papers should tell a story. There should be a continuous flow. We should care about the story, we should be eager to learn about what will happen in the next section.
I do not know about you, but I do not come across many such research papers. Mathemagenic points us to some references, going back to Plato, as to why storytelling is not taken seriously. It seems to me that our current approach to research papers assumes that knowledge is axiomatic: you can decompose your paper into facts that can be laid out in a formal language. Whether this is true or not, I do not care. The fact of the matter is… I do not write research papers to lay out facts. I wish it were that simple.
In any case, I decided to do some Googling on the topic to see if I could find new clever techniques to make my research papers more exciting (irrespective of the quality of the science), and I found this related piece of advice:
A good way [to describe your results] is to tell a story, an interesting one that puts everything into perspective re the existing literature and conveys how it is you succeeded where others failed. What was the key idea which nobody else spotted? It should not reflect the actual historical progress of your research (which may have been long and winding) but rather based on how your thinking should have gone with the benefit of hindsight. This is not quite the same as the shortest logical path (which would not be understood until after the paper is read), but rather involves an historical element with reference to works and ideas that the reader might already be familiar with.
What I could not find were good examples of storytelling in research papers. Does anyone have a pointer?
Reference: Hints for New PhD students on How to Write Papers (Shahn Majid)
Web search engines such as Google look at which pages link to which to determine which Web pages are authoritative. A good algorithm in this context is one that is hard to fool: if you and your friends decide to link to each other’s pages, it should not make much of a difference. Sérgio commented earlier on this blog that PageRank is known to be just marketing. So I decided to go hunting. Up until now, I thought PageRank was a clever idea because it feels like it would be harder to fool than simply counting how many inbound links a page has. It was not long before I found a reference that supported Sérgio’s claim:
Log of indegree was highly correlated with Google-reported PageRank scores, and just as effective when predicting desirable company attributes. Further, we found that PageRank scores for sites within a known spam network were no lower than would be expected on the basis of their indegree. We encounter no compelling evidence to support the use of PageRank over indegree.
Reference: Upstill, T. and Craswell, N. and Hawking, D., Predicting fame and fortune: Pagerank or indegree, ADCS2003, 2003.
Does anyone know of any demonstrated benefit of PageRank over merely counting the number of inbound links? Is PageRank more resilient at all?
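To make the comparison concrete, here is a minimal sketch in Python of the two measures on a made-up graph. The page names, the graph itself, the damping factor of 0.85 and the uniform handling of pages without outgoing links are all my assumptions; this is the textbook power-iteration formulation, not whatever Google actually runs. In this toy graph, "good" and "spam" both have two inbound links, but one of the pages linking to "good" is itself well linked.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank on {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for source, targets in links.items():
            for target in targets:
                # Each page splits its rank evenly among the pages it links to.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def indegree(links):
    """Count inbound links for every page in {page: [pages it links to]}."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    counts = {p: 0 for p in pages}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical toy graph: "hub" is linked by three pages and links to "good";
# "spam" is linked only by two pages that nobody links to.
toy_links = {
    "x1": ["hub"], "x2": ["hub"], "x3": ["hub"],
    "hub": ["good"],
    "y": ["good"],
    "f1": ["spam"], "f2": ["spam"],
}

ranks = pagerank(toy_links)
counts = indegree(toy_links)
for page in ("good", "spam"):
    print(page, "indegree:", counts[page], "pagerank:", round(ranks[page], 4))
```

On this graph the two pages tie on indegree while PageRank favors "good". That only shows the two measures can disagree in principle; it says nothing about whether the difference matters on the real Web, which is exactly what the paper above calls into question.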
Update: do read the comments! They are more interesting than my post.
Véronis discovered something very interesting. About a third of the time, Google’s results include the Wikipedia link as the first link. His explanation is insightful:
How can this sudden interest in Wikipedia by both engines be explained? It is undoubtedly connected with the increasing difficultly engines have in calculating satisfactory ranking. The good old days of PageRank algorithms are over. (…) The explosion of blogs and news sites has changed the situation considerably.
If Web topology cannot cope anymore, this means we need to introduce time as a factor. Any takers for a hypergraph version of PageRank? What do you call a time-varying Markov process?
I have stated before that researchers should focus on new problems or on providing solutions that are at least an order of magnitude better than previous solutions. There is a catch to this statement: it says that if you are within an order of magnitude of the ultimate answer, then you should stop, unless, maybe, you can prove that you have achieved the ultimate solution. Proving that you have the best possible solution where others only provided approximations does constitute a significant gain, certainly worth publishing, but this is rarely possible. Most real problems are too complex to allow our puny brains to prove that a solution is ultimate.
So do we just accept that being within an order of magnitude of the answer is good enough? If you are within an order of magnitude of perfection with respect to all indicators, simultaneously, then maybe you ought to stop. Yes?
Another catch to this is that you may not know exactly how far off you are from the best solution. It might be very difficult to study the characteristics of the ideal solution. What then? Do we still hold off on publishing incremental improvements to existing solutions? Do you call the problem solved if, over a long period of time, nobody was able to improve the state-of-the-art by an order of magnitude?
Food for thought: Recently, John Riedl asked on his blog whether we could tell when spam filters would get to be good enough. My immediate answer was to apply the Turing test: a spam filter is good enough when it has achieved a human level of performance. Yet, I know this is not the answer. Nothing is ever perfect, but my level of performance is far from the ultimate goal. I doubt spam filters will ever pass my Turing test, but even if they did, I am likely not to be satisfied. One false positive is still one too many.
Disclaimer: this is not meant to be a scientific survey. However, if you disagree with my survey, please do add a comment!
Disclaimer 2: I drink a lot of coffee. I almost certainly reach a point where it negatively impacts my performance because I get too tense to focus. However, I find it preferable to boredom.
© 2004-2008, Daniel Lemire (lemire at acm dot org). This work is licensed under a Creative Commons License.