<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2enclosuresfull.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>Daniel Lemire's blog</title><link>http://www.daniel-lemire.com/blog</link><description>Daniel Lemire's blog is about life in academia, research in Computer Science, wondering how we can reconcile fast databases and algorithms with the informal and asemantic nature of the world around us. It is broadcasted from Montreal (Canada).</description><language>en</language><lastBuildDate>Mon, 01 Dec 2008 23:06:19 -0600</lastBuildDate><generator>WordPress http://wordpress.org/</generator><itunes:explicit>no</itunes:explicit><itunes:subtitle>Daniel Lemire's blog is about life in academia, research in Computer Science, wondering how we can reconcile fast databases and algorithms with the informal and asemantic nature of the world around us. 
It is broadcasted from Montreal (Canada).</itunes:subtitle><creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/2.0/</creativeCommons:license><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/daniel-lemire/atom" type="application/rss+xml" /><feedburner:emailServiceId>1396075</feedburner:emailServiceId><feedburner:feedburnerHostname>http://www.feedburner.com</feedburner:feedburnerHostname><item><title>Put your lectures online easily and for free with Panopto</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/471984072/</link><category>Academia/Research</category><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Mon, 01 Dec 2008 20:51:09 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1578</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><img style="margin:2px;float:right" src="http://panopto.com/images/screenshots/panopto_notes_small.png" alt="Panopto screen shot" /> I saw an impressive online course this morning using <a href="http://panopto.com">Panopto</a>. The asynchronous videocasting was really convincing. Basically, the PowerPoint slides are synced with the video, and you can move up or down in the slide deck, with the video syncing automatically. Students can annotate your slides. You can add secondary video feeds or screen capture.</p>
<p>What is more, a trusted colleague said it was really easy: you can do it on your own given a good camera. The catch is that Windows is required. The price is <a href="http://panopto.com/solutions_state-local-gov_cicero_faq.aspx">free</a> or <a href="http://panopto.com/products_panopto-hosted.aspx">relatively cheap</a>.</p>
<p><strong>Reference</strong>: The <a href="http://digitalproducer.digitalmedianet.com/articles/viewarticle.jsp?id=578925">November press</a> release where Panopto announces the free version of their product.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/471984072" height="1" width="1"/>]]></content:encoded><description>I saw an impressive online course this morning using Panopto. The asynchronous videocasting was really convincing. Basically, the PowerPoint slides are synced with the video, and you can move up or down in the slide deck, with the video syncing automatically. Students can annotate your slides. You can add secondary video feeds or screen [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F12%2F01%2Fput-your-lectures-only-easily-and-for-free-with-panopto%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/12/01/put-your-lectures-only-easily-and-for-free-with-panopto/</feedburner:origLink></item><item><title>Are you really running out of time?</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/470285469/</link><category>Academia/Research</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Sun, 30 Nov 2008 08:20:09 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1498</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>A common feeling among creative workers is the <strong>lack of time</strong>. Yet, most people will run out of energy before they run out of time. A single task that takes you 5 minutes (asking a BDO for IP rights) can drain you out for a week. Another task, like lecturing for 3 hours, can energize you for the rest of the week. Highly productive people do not have more time, but they may have more energy, more method and better feedback on their progress.</p>
<p>I believe that three problems lead us to conclude we lack time:</p>
<ul>
<li><strong>You are spending too much time on boring tasks.</strong> To be productive, you need to work on projects you love. For this reason, creative people should pick their projects.</li>
<li><strong>You fail to manage your projects</strong>. Without help, you can only keep track of about 7 projects or tasks at any one time. If you want to do more, you need a method. Myself, I use <a href="http://en.wikipedia.org/wiki/Getting_Things_Done">GTD</a>, but any method that scales up to a large number of projects will do. Without a method, you will drift to inessential tasks and then blame the lack of time to explain why important tasks went unattended.</li>
<li><strong>You do not measure your progress</strong>. You need to get feedback about the quality and quantity of your work. Myself, I put my work under <a href="http://en.wikipedia.org/wiki/Subversion">subversion</a> and get daily emails listing the files that changed. It is a crude but effective measure of my work. Also, tracking your projects carefully, at the task level, helps. Finally, having coworkers who react to your work is a blessing. Without a measure of your progress, you may realize too late that your projects did not progress and then blame the lack of time.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/470285469" height="1" width="1"/>]]></content:encoded><description>A common feeling among creative workers is the lack of time. Yet, most people will run out of energy before they run out of time. A single task that takes you 5 minutes (asking a BDO for IP rights) can drain you out for a week. Another task, like lecturing for 3 hours, can energize [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F30%2Fare-you-really-running-out-of-time%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/30/are-you-really-running-out-of-time/</feedburner:origLink></item><item><title>Social Networking for Scientists: Mendeley</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/468432408/</link><category>Academia/Research</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Fri, 28 Nov 2008 09:27:08 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1587</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Among scientists-bloggers, the new buzz word is <a href="http://www.mendeley.com/">Mendeley</a>: a social networking platform for scientists (<a href="http://my.biotechlife.net/2008/08/13/mendeley-goes-public-and-gets-some-love-from-ex-skype-and-lastfm/">Ricardo Vidal</a>, <a href="http://www.sylvienoel.ca/blog/?p=1004">Sylvie Noël</a>, <a href="http://lemeshko.blogspot.com/2008/11/mendeley-some-suggestions.html">Misha Lemeshko</a>, <a href="http://blog.mckuhn.de/2008/08/mendeley-mekentosj-papers-web-20.html">Michael Kuhn</a>, &#8230;). The site is barely getting started and is still in early beta, there are bugs and limitations. 
However, the London-based company has funding and a solid staff.</p>
<p>Their vision statement is compelling:</p>
<blockquote><p>Mendeley is free <strong>social</strong> <strong>software</strong> for managing and sharing research papers. It is also a <strong>Web 2.0 site</strong> for discovering research trends and connecting to like-minded academics. To achieve our long-term vision of a “<a href="http://www.mendeley.com/blog/2008/07/an-excellent-euroscience-adventure-part-ii/">Last.fm for research</a>“, we’re working with the former founding engineers of <a href="http://www.skype.com/">Skype</a> and <a href="http://www.last.fm/">Last.fm</a>’s former chairman.</p></blockquote>
<p>Last night I created a <a href="http://www.mendeley.com/profiles/daniel-lemire">profile</a>. I got tired of entering my papers and I stopped entering them around 2005-2006. If you have 100 published papers, you are going to swear a lot. It is sad that you cannot just link to your existing pub. list (such as arxiv.org).</p>
<p>Where I see the potential is in the social networking. It seems that all of the scientific networking I do is &#8220;hand-crafted&#8221;. I am hoping for more!</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/468432408" height="1" width="1"/>]]></content:encoded><description>Among scientists-bloggers, the new buzz word is Mendeley: a social networking platform for scientists (Ricardo Vidal, Sylvie Noël, Misha Lemeshko, Michael Kuhn, &amp;#8230;). The site is barely getting started and is still in early beta, there are bugs and limitations. However, the London-based has funding and a solid staff.
Their vision statement is compelling:
Mendeley is free [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F28%2Fsocial-networking-for-scientists-mendeley%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/28/social-networking-for-scientists-mendeley/</feedburner:origLink></item><item><title>Innovative ideas are indistinguishable from crackpot ones</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/467499977/</link><category>Academia/Research</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Thu, 27 Nov 2008 10:59:13 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1494</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>It is impossible to distinguish objectively and systematically bogus work from high quality work. You can sort work based on external attributes such as <em>quality of the presentation</em>, <em>length</em>, <em>logical correctness</em>, <em>prestige of the authors</em>, and <em>methodology</em>, but not on the significance of the work. Significance cannot be disproved at the time of the review. Even technical details end up being fundamental ideas: this happens frequently in mathematics where lemmas often outshine theorems on the long term.</p>
<p>I review several research papers every month, and several research funding proposals every year. At best, I can determine that something is badly presented. I can find logical or mathematical errors. Beyond this, my opinion is probably often wrong.</p>
<p>Here are a few things I would have or I have categorized as crackpot ideas:</p>
<ul>
<li>Back in 1990, I would have predicted that the WWW was impractical. How can you deal efficiently with broken links? Who is going to maintain all these links? Yet, it works. I almost never encounter a 404 (missing page) error.</li>
<li>Back in 1991, I would have laughed had anyone claimed that you could efficiently index and categorize over 8 billion dynamic Web pages, many of which appear and disappear frequently. Yet Google, Yahoo and many other search engines are able to index the content of my posts daily. They differentiate my content from webspam. They also determine the authority of my page. Yet, there is no central registry, no form of quality control, and so on. While they use technically sophisticated techniques, much of it works simply by brute force: keep revisiting and reindexing the sites you expect to change.</li>
<li>Not long ago, I had concluded that <a href="http://twitter.com/lemire/">Twitter</a> was a useless idea. Months later, I realize that Twitter offers <a href="http://www.sylvienoel.ca/blog/?p=1001">ambient collaboration</a>. I believe it caters to an essential need that  had gone mostly unnoticed previously. (If you are not on Twitter, you ought to be.)</li>
<li>The first time I read about <a href="http://en.wikipedia.org/wiki/Bitmap_index">bitmap indexes</a>, I thought it was a limited clever technical trick with little scientific interest. (I just published two papers on bitmap indexes and I have more on the way!)</li>
<li>Jim Gray&#8217;s <a href="http://research.microsoft.com/~gray/DataCube.doc">data cube idea</a> is to work with a lattice of 2<em><sup>d</sup></em> cuboids. Since, in data warehouses, we often have <em>d</em> large (<em>d</em>&gt;15), the materialization of even a small fraction of these cuboids is impractical. Yet, it has been very fruitful both in industry and in academia.</li>
</ul>
<p>Fortunately, if you merely discard the papers that fail to follow my <a href="http://www.daniel-lemire.com/blog/rules-to-write-a-good-research-paper/">guidelines</a>, you already discard quite a number! Requiring papers to be well written and free of logical flaws is often quite harsh!</p>
<p>Anyhow, there must be some link to evolution theory. I am sure that there have been new species that initially presented little interest, but ended up being of crucial importance.</p>
<p>For an entertaining take on this problem, see:  Simone Santini, <a href="http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&amp;pName=computer_level1_article&amp;TheCat=1015&amp;path=computer/homepage/1205&amp;file=profession.xml&amp;xsl=article.xsl&amp;;jsessionid=JnTlsb5LGDp3hmyxW2WD9R5J2fJJrtbhRR2nbGsBwcnzNCYdRK9X!1817242887">We are sorry to inform you&#8230;</a>, IEEE Computer, December 2005.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/467499977" height="1" width="1"/>]]></content:encoded><description>It is impossible to distinguish objectively and systematically bogus work from high quality work. You can sort work based on external attributes such as quality of the presentation, length, logical correctness, prestige of the authors, and methodology, but not on the significance of the work. Significance cannot be disproved at the time of the review. [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><media:content url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/467499978/DataCube.doc" fileSize="144896" type="application/msword" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>It is impossible to distinguish objectively and systematically bogus work from high quality work. You can sort work based on external attributes such as quality of the presentation, length, logical correctness, prestige of the authors, and methodology, bu</itunes:subtitle><itunes:summary>It is impossible to distinguish objectively and systematically bogus work from high quality work. You can sort work based on external attributes such as quality of the presentation, length, logical correctness, prestige of the authors, and methodology, but not on the significance of the work. Significance cannot be disproved at the time of the review. 
[...]</itunes:summary><itunes:keywords>Academia/Research</itunes:keywords><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F27%2Finnovative-ideas-are-indistinguishable-from-crackpot-ones%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/27/innovative-ideas-are-indistinguishable-from-crackpot-ones/</feedburner:origLink><enclosure url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/467499978/DataCube.doc" length="144896" type="application/msword" /><feedburner:origEnclosureLink>http://research.microsoft.com/~gray/DataCube.doc</feedburner:origEnclosureLink></item><item><title>Diversity in recommender systems: sketch of a bibliography</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/463920852/</link><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Tue, 25 Nov 2008 06:49:21 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1570</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>I have been arguing <a href="http://www.daniel-lemire.com/blog/archives/2007/12/22/collaborative-filtering-why-working-on-static-data-sets-is-not-enough/">on this blog</a> that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and diversity. I mean this list to be complete.</p>
<ul>
<li> L. McGinty, B. Smyth, <a href="http://www.cs.pitt.edu/~mrotaru/comp/rs/McGinty%20ICCBR%202003.pdf">On the Role of Diversity in Conversational Recommender Systems</a>, in: Proc. ICCBR 2003, 2003.</li>
<li>Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen, <a href="http://www.informatik.uni-freiburg.de/%7Edbis/Publications/05/WWW05.html">Improving Recommendation Lists Through Topic Diversification</a>,  <em>Proceedings of the 14th International World Wide Web Conference (WWW &#8216;05),</em> May 10-14, 2005, Chiba, Japan.  (Thanks to Daniel Haran for pointing me this one.)</li>
<li>D. Fleder, K. Hosanagar, <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=955984">Blockbuster culture’s next rise or fall: The effect of recommender systems on sales diversity</a>, in: Proc. WISE 2006, 2006. </li>
<li>S. M. McNee, J. Riedl, J. A. Konstan, <a href="http://portal.acm.org/citation.cfm?id=1125451.1125659">Being accurate is not enough: how accuracy  metrics have hurt recommender systems</a>, in: Proc. CHI &#8216;06 (2006) 1097 – 1101.</li>
<li>Zhang, M. and Hurley, N. 2008. <a href="http://doi.acm.org/10.1145/1454008.1454030">Avoiding monotony: improving the diversity of recommendation lists</a>. In <em>Proceedings of the 2008 ACM Conference on Recommender Systems</em> (Lausanne, Switzerland, October 23 - 25, 2008). RecSys &#8216;08. ACM, New York, NY, 123-130.</li>
<li>Quoc Le, Alexander Smola, <a href="http://arxiv.org/abs/0704.3359">Direct Optimization of Ranking Measures</a>, published online, 2008. (Thanks Mark Reid.)</li>
</ul>
<p>You can find a few more references and some analysis in our technical report:</p>
<blockquote><p>Daniel Lemire, Stephen Downes, Sébastien Paquet, <a href="http://www.daniel-lemire.com/fr/abstracts/DIVERSITY2008.html">Diversity in open social networks</a>, published online, October 2008.</p></blockquote>
<p>If I am missing any paper, tell me!</p>
<p>Maybe this warrants a Wikipedia page?</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/463920852" height="1" width="1"/>]]></content:encoded><description>I have been arguing on this blog that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and diversity. I mean this list to be complete.

 L. McGinty, B. Smyth, [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><media:content url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/463920853/McGinty%20ICCBR%202003.pdf" fileSize="1803465" type="application/pdf" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>I have been arguing on this blog that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and di</itunes:subtitle><itunes:summary>I have been arguing on this blog that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and diversity. I mean this list to be complete. L. McGinty, B. Smyth, [...]</itunes:summary><itunes:keywords>Science and Technology</itunes:keywords><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F24%2Fdiversity-in-recommender-systems-sketch-of-a-bibliography%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/24/diversity-in-recommender-systems-sketch-of-a-bibliography/</feedburner:origLink><enclosure url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/463920853/McGinty%20ICCBR%202003.pdf" length="1803465" type="application/pdf" /><feedburner:origEnclosureLink>http://www.cs.pitt.edu/~mrotaru/comp/rs/McGinty%20ICCBR%202003.pdf</feedburner:origEnclosureLink></item><item><title>Recommender systems: where are we headed?</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/461258987/</link><category>Favorite</category><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel 
Lemire</dc:creator><pubDate>Fri, 21 Nov 2008 23:35:51 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1565</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Daniel  Tunkelang <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">comments</a> on the recent progress in collaborative filtering:</p>
<blockquote><p>(&#8230;) the machine learning community, much like the information retrieval community, generally prefers black box approaches, (&#8230;) If the goal is to optimize one-shot recommendations, they are probably right. But I maintain that the process of picking a movie, like most information seeking tasks, is inherently interactive, (&#8230;)</p></blockquote>
<p>I disagree with him. Even for non-interactive recommendations, the Machine Learning community is off-track for two reasons:</p>
<ul>
<li>They fail to take into account diversity. In Information Retrieval, we know that if precision is high (all documents are relevant) but recall is low (few of the relevant documents are presented), then the system is poor. There is no such balance in collaborative filtering. Precision above all else is the goal. This is wrong. <a href="http://www.daniel-lemire.com/blog/archives/2008/11/14/measuring-the-diversity-of-recommended-lists-at-last/">Diversity metrics must be used</a>.</li>
<li>They work over static data sets. A system like Netflix is not static, and so accuracy on a static data set might not be a good predictor of real-world performance. The problem is intrinsically nonlinear. People will rate different items, and they will rate differently, if you change the recommender system. The feedback loop may work against you or in your favour. The effect might be large or small. As far as I can tell, I am <a href="http://www.daniel-lemire.com/blog/archives/2007/12/22/collaborative-filtering-why-working-on-static-data-sets-is-not-enough/">the only one</a> who keeps pointing out this fundamental, but never addressed, limitation of working over static data sets. <strong>Update: This has absolutely nothing to do with online versus batch algorithms.</strong></li>
</ul>
<p>See also my post <a href="http://www.daniel-lemire.com/blog/archives/2007/12/13/netflix-an-interesting-machine-learning-game-but-is-it-good-science/">Netflix: an interesting Machine Learning game, but is it good science?</a></p>
<p><strong>Disclaimer</strong>: I organized the ACM KDD <a href="http://netflixkddworkshop2008.info/pc.html">Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition</a> along with people like Yehuda Koren. Yehuda is among the candidates to win the Netflix prize. I do not encourage the Netflix competition. I just do not think that it will solve our big problems.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/461258987" height="1" width="1"/>]]></content:encoded><description>Daniel  Tunkelang comments on the recent progress in collaborative filtering:
(&amp;#8230;) the machine learning community, much like the information retrieval community, generally prefers black box approaches, (&amp;#8230;) If the goal is to optimize one-shot recommendations, they are probably right. But I maintain that the process of picking a movie, like most information seeking tasks, is [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">8</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F21%2Frecommender-systems-where-are-we-headed%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/21/recommender-systems-where-are-we-headed/</feedburner:origLink></item><item><title>Tim Bray on solving the economic crisis</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/460927299/</link><category>Business / Economics / Politics</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Fri, 21 Nov 2008 10:39:11 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1562</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>For reasons I will not go into, this quote feels very satisfying today:</p>
<blockquote><p>Solution to economic crisis: sack everyone who has an MBA. (<a href="http://twitter.com/timbray/statuses/1016736621">Tim Bray</a>)</p></blockquote>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/460927299" height="1" width="1"/>]]></content:encoded><description>For reasons I will not go into, this quote feels very satisfying today:
Solution to economic crisis: sack everyone who has an MBA. (Tim Bray)</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F21%2Ftim-bray-on-solving-the-economic-crisis%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/21/tim-bray-on-solving-the-economic-crisis/</feedburner:origLink></item><item><title>How to speed up retrieval without any index?</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/460294526/</link><category>Data Warehousing and OLAP</category><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Tue, 25 Nov 2008 16:11:54 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1555</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>John Cook gives us a nice recipe to <a href="http://www.johndcook.com/blog/2008/11/17/fast-way-to-test-whether-a-number-is-a-square/">quickly find all squares in a set of integers</a>. For example, given 3, 4, 9, 15, you want your algorithm to identify 4 and 9 as squares.</p>
<p>The naïve way to solve this problem goes as follows:</p>
<ol>
<li>For each element&#8230;</li>
<li>check whether sqrt(x) is an integer.</li>
</ol>
<p>This may prove too expensive since the square-root operation must be computed using a floating-point algorithm. </p>
<p>A better way is to look at the last 4 bits of each integer (its last hexadecimal digit). If the integer is a square, then these 4 bits must have value 0, 1, 4, or 9. If you have a random distribution of numbers, this means that you can quickly discard 3 out of 4 numbers.</p>
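<p>Here is a minimal Python sketch of that fast-dismissal test. It is my own illustration, not John's code: the names <code>is_square</code> and <code>find_squares</code> are made up, and I use the integer square root <code>math.isqrt</code> (Python 3.8 or later) as the expensive exact test instead of a floating-point square root.</p>

```python
import math

# Values that the low 4 bits (last hex digit) of a perfect square can take.
SQUARE_LOW_NIBBLES = frozenset((i * i) % 16 for i in range(16))  # {0, 1, 4, 9}


def is_square(x: int) -> bool:
    """Return True if x is a perfect square, trying a cheap filter first."""
    if x < 0:
        return False
    # Cheap test: roughly 3 out of 4 random integers are dismissed here.
    if x % 16 not in SQUARE_LOW_NIBBLES:
        return False
    # Expensive exact test, run only on the survivors.
    r = math.isqrt(x)
    return r * r == x


def find_squares(values):
    """Keep only the perfect squares, preserving the input order."""
    return [v for v in values if is_square(v)]


print(find_squares([3, 4, 9, 15]))  # [4, 9]
```

<p>Note that a number such as 17 passes the cheap test (its low 4 bits are 1) and is only rejected by the exact test: the filter never dismisses a true square, but it lets some non-squares through.</p>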
<p>It is not immediately obvious that you will speed up the retrieval because inserting this check will add some overhead. However, it doubles the speed according to John. It is even less obvious that checking the last 8 or 16 bits, instead of just the last 4 bits, can help. John says it does not help in one C++ implementation, but it does in a C# implementation.</p>
<p>This sort of strategy is entirely general. The research question is how much work should you do on fast dismissal? Too much effort toward dismissing lots of candidates might be counterproductive. Too little and your performance might not improve optimally.</p>
<p>Recently I started to wonder whether we could make it multipass: you first dismiss a few candidates with a cheap test, then on the survivors you use a more expensive test and so on. For example, you first check the last 4 bits, and if you cannot dismiss the candidate, you check the next 4 bits and so on. It is not a surprising idea, but figuring out whether it is worth the effort is a research question.</p>
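<p>To make the multipass idea concrete, here is a small Python sketch under my own assumptions. Pass 1 looks at the low 4 bits, pass 2 looks at the low 8 bits (a lookup of the residues that squares can take modulo 256, rather than literally "the next 4 bits" in isolation), and only the survivors reach the exact test. The names <code>classify</code> and <code>tally</code> are hypothetical; the tally merely shows how many candidates each pass dismisses, which is the quantity you would want to measure before deciding whether the extra pass pays off.</p>

```python
import math

# Residues that perfect squares can take modulo 16 and modulo 256.
SQUARES_MOD_16 = frozenset((i * i) % 16 for i in range(16))
SQUARES_MOD_256 = frozenset((i * i) % 256 for i in range(256))


def classify(x: int) -> str:
    """Report which stage of the cascade settles the question for x >= 0."""
    if x % 16 not in SQUARES_MOD_16:
        return "dismissed by pass 1"
    if x % 256 not in SQUARES_MOD_256:
        return "dismissed by pass 2"
    r = math.isqrt(x)
    return "square" if r * r == x else "dismissed by exact test"


def tally(values):
    """Count how many candidates end up at each stage."""
    counts = {}
    for v in values:
        label = classify(v)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(tally([3, 4, 9, 15, 17, 20]))
```

<p>In this sample, 3 and 15 fall at the first pass, 20 at the second, 17 survives both cheap passes and is only rejected by the exact test, and 4 and 9 are reported as squares.</p>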
<p>To make my point, I have worked on fast retrieval under the <a href="http://en.wikipedia.org/wiki/Dynamic_time_warping">Dynamic Time Warping</a> (DTW) distance, a nonlinear distance measure between time series. The DTW does not satisfy a triangle inequality. It is commonly used as a pattern recognition technique when comparing time series. It was initially designed to compare voice samples, allowing for changes in voice rhythm.</p>
<p><a href="http://www.cs.ucr.edu/~eamonn/">Eamonn Keogh</a> from <del datetime="2008-11-25T22:10:13+00:00">UCI</del>UCR has come up with a simple but nearly optimal way to compute a lower bound to the DTW between any two time series, called LB_Keogh (named after himself). Just like in the John Cook algorithm, this lower bound <strong>quickly discards candidates that cannot be matches</strong>. If you are interested, Eamonn has applied LB_Keogh to just about every time series problem you can think of. (Update: one hundred people or more also used LB_Keogh in their work, see comments below.) </p>
<p>I improved over LB_Keogh as follows. If LB_Keogh is not good enough (and only if it is not good enough), I compute a tighter lower bound (called LB_Improved). Surprisingly, in many cases, I can improve the retrieval time by a factor of two or more. </p>
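<p>For readers who want to see the shape of the two-pass search, here is a compact Python sketch. It is not the code from my library: it assumes equal-length series, the l<sub>1</sub> cost, and a warping window of half-width <code>w</code>, and the function names are chosen for this post only. The second bound follows my reading of the paper: clamp the candidate into the envelope of the query, then add the LB_Keogh bound between the query and that clamped series. A tuned implementation would reuse the first-pass computation instead of redoing it as this sketch does.</p>

```python
def envelope(s, w):
    """Upper and lower envelopes of s over a window of half-width w."""
    n = len(s)
    upper = [max(s[max(0, i - w):i + w + 1]) for i in range(n)]
    lower = [min(s[max(0, i - w):i + w + 1]) for i in range(n)]
    return upper, lower


def project(x, y, w):
    """Clamp each x[i] into the envelope of y at position i."""
    upper, lower = envelope(y, w)
    return [min(max(xi, lo), up) for xi, up, lo in zip(x, upper, lower)]


def lb_keogh(x, y, w):
    """First-pass bound: distance from x to the envelope of y."""
    h = project(x, y, w)
    return sum(abs(xi - hi) for xi, hi in zip(x, h))


def lb_improved(x, y, w):
    """Second-pass bound: first-pass term plus LB_Keogh(y, projection)."""
    h = project(x, y, w)
    first = sum(abs(xi - hi) for xi, hi in zip(x, h))
    return first + lb_keogh(y, h, w)


def dtw(x, y, w):
    """l1 dynamic time warping restricted to |i - j| <= w."""
    n = len(x)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def nearest(query, candidates, w):
    """Nearest neighbour under DTW, skipping candidates the bounds rule out."""
    best, best_dist, full_dtw_calls = None, float("inf"), 0
    for c in candidates:
        if lb_keogh(c, query, w) >= best_dist:
            continue  # dismissed by the cheap first pass
        if lb_improved(c, query, w) >= best_dist:
            continue  # dismissed by the tighter second pass
        full_dtw_calls += 1
        d = dtw(c, query, w)
        if d < best_dist:
            best, best_dist = c, d
    return best, best_dist, full_dtw_calls
```

<p>The point of the design is that the tighter bound is only computed when the cheap one fails to dismiss the candidate, so you pay for it only where it can help.</p>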
<p>I have published my work as a <a href="http://code.google.com/p/lbimproved/">software library</a>, but also as the following paper:</p>
<blockquote><p>Daniel Lemire, <a href="http://arxiv.org/abs/0811.3301">Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound</a>, to appear in Pattern Recognition.</p></blockquote>
<p>This sort of work is much more difficult than it appears. I could have easily made my method look good by optimizing it, while leaving the competing methods unoptimized. By publishing my implementation, I go a long way toward keeping myself honest. If I fooled myself and the reviewers, someone might find out by surveying my source code. </p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/460294526" height="1" width="1"/>]]></content:encoded><description>John Cook gives us a nice recipe to quickly find all squares in a set of integers. For example, given 3, 4, 9, 15, you want your algorithm to identify 4 and 9 as squares.
The naïve way to solve this problem goes as follows:

For each element&amp;#8230;
check whether sqrt(x) is an integer.

This may prove too expensive [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F20%2Fhow-to-speed-up-retrieval-without-any-index%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/20/how-to-speed-up-retrieval-without-any-index/</feedburner:origLink></item><item><title>Why am I not working on world hunger?</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/459158110/</link><category>Academia/Research</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Thu, 20 Nov 2008 20:00:10 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1467</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>My wife sometimes asks me why I am not working on important problems like world hunger. Instead, I am one of the world's top experts in <a href="http://arxiv.org/abs/cs.DS/0703109">tag-cloud drawing</a>. I am sure she thinks that I just fool around, faking serious research.</p>
<p>I actually take my research very seriously. </p>
<p>I like to distinguish  abstract from concrete research. Concrete research is when you seek to obtain results in special cases. For example, an AI researcher may try to first understand how we can detect spam. Eventually he might move on to even more sophisticated tasks. In such a form of research, there are no overarching formal plans. You could say it is inductive, maybe. Researchers are often driven to this form of research because the deeper problems are simply too difficult to address directly. (I define a problem to be too difficult when you cannot make noticeable progress in a matter of months.) They hope for a breakthrough to an important problem to come as they work on a narrow issue.</p>
<p>Abstract research derives from a formal plan. The Semantic Web is one such plan. Tim Berners-Lee even drew diagrams early on of what the beast should look like. The research issues are clearly laid out. As a researcher you are tackling an extremely difficult problem, unsure whether you will ever make any noticeable progress. Researchers follow this path because they believe that only a focused effort in a definite direction can solve the difficult problems. Funding agencies love abstract research.</p>
<p>It might be a matter of biology, but my brain has always been much more productive in concrete research. I resist the inductive/deductive classification because it feels wrong. However, time and time again, working on a tractable, but possibly insignificant problem, has led me to understand a deeper issue. When the problems are too big, my brain gets into circular and incorrect arguments. I need to chop down the problems to a manageable size. The problems need to be hard enough to push me to the limits, but easy enough that I can make weekly progress. Moreover, as a researcher, I can never know exactly what I will be doing a month later. </p>
<p>I will make a stronger claim: abstract research is never done. Researchers will give the illusion that they are working directly on some grand problem (like world hunger), but, in reality, they will work at a much smaller scale. And when a researcher solves a grand problem in what seems like a short time, and with few concrete, possibly irrelevant steps, I attribute it to luck or lies.</p>
<p>See also my post <a href="http://www.daniel-lemire.com/blog/archives/2007/11/19/my-research-process/">my research process</a>.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/459158110" height="1" width="1"/>]]></content:encoded><description>My wife sometimes asks me why I am not working on important problems like world hunger. Instead, I am one of the world's top experts in tag-cloud drawing. I am sure she thinks that I just fool around, faking serious research.
I actually take my research very seriously. 
I like to distinguish  abstract from concrete [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">12</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F19%2Fwhy-am-i-not-work-on-world-hunger%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/19/why-am-i-not-work-on-world-hunger/</feedburner:origLink></item><item><title>Is what I do technical?</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/457836589/</link><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Tue, 18 Nov 2008 20:44:41 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1536</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>We are trying to design a master's degree in Information Technology. To me, this sort of program should be a professional master's degree, that is, it does not lead naturally to a research career or a Ph.D.</p>
<p>My business colleagues argue in favour of research methodology courses. Apparently, students need to learn how to conduct interviews and such. In any case, I then pointed out that my master's degree did not contain any such course. One of my business colleagues then said a deadly thing:</p>
<blockquote><p>Of course, you got a technical master's degree!</p></blockquote>
<p>This got me really angry. Really, really, really angry. I do not think I ever got so angry in my life.</p>
<p>For the record, my master's degree was in <strong>Mathematics</strong> at the University of Toronto. Is Mathematics technical? If technical is to have a &#8220;practical&#8221; connotation, I can tell you that none of my graduate courses were technical. Are <a href="http://search.barnesandnoble.com/Fewnomials/A-Khovanskii/e/9780821845479">fewnomials</a> practical? I think not.</p>
<p>But the deeper implication was that anything having to do with Science was technical. That is, it deals with nuts and bolts. And I think that it is squarely wrong. From my viewpoint, business is far more technical. And I ran my own business for several years. The business side of things was always the boring-but-easy component.</p>
<p>There is a distinct feeling in North America that <strong>business is king, and science &amp; technology are things monkeys or foreigners can do</strong>. Yet, in my experience, it is a lot harder to design a usable web application than negotiate a business deal. I believe that India and China are getting a sweet deal by doing our science &amp; technology while we focus on business. A very sweet deal indeed.</p>
<p>I think that Amazon, Google, Cisco, Microsoft and so on, thrive because many of their engineers have a deep knowledge of Computer Science. Kill the science and you kill the business.</p>
<p>But even if you discard science, writing good source code is hard. Very hard. And it is not hard for technical reasons, not any more than painting, movie-making and sculpture are technical challenges.</p>
<p>In any case, I believe that North America is headed for a wall if it fails to recognize that its prosperity is due to culture, science and technology. And given that 40% of all students at my school go for a business degree, I am nervous.</p>
<p>See also my post <a href="http://www.daniel-lemire.com/blog/archives/2005/09/16/career-swings/">Career Swings</a> where I wrote:</p>
<blockquote><p>I cannot believe that in 2015, we’ll all be lawyers, business managers, salesman, and medical doctors. I cannot believe that technology will stand still and mathematics beyond basic algebra will be a lost art. I cannot believe my two sons will have business degrees and make three times my salary by managing a bunch of underpaid Indian programmers.</p></blockquote>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/457836589" height="1" width="1"/>]]></content:encoded><description>We are trying to design a master's degree in Information Technology. To me, this sort of program should be a professional master's degree, that is, it does not lead naturally to a research career or a Ph.D.
My business colleagues argue in favour of research methodology courses. Apparently, students need to learn how to conduct interviews [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">5</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F18%2Fis-what-i-do-technical%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/18/is-what-i-do-technical/</feedburner:origLink></item><item><title>SciFi book review: Spin by Robert Charles Wilson</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/456581749/</link><category>Science and Technology</category><category>scifi review</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Mon, 17 Nov 2008 18:21:07 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1534</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>The novel <a href="http://en.wikipedia.org/wiki/Spin_(novel)">Spin</a> won the <a title="Hugo Award" href="http://en.wikipedia.org/wiki/Hugo_Award">Hugo Award</a> for <a title="Hugo Award for Best Novel" href="http://en.wikipedia.org/wiki/Hugo_Award_for_Best_Novel">Best Novel</a> in 2006.</p>
<p>It is what I would call a &#8220;temporal disparity&#8221; novel. Earth suddenly becomes surrounded by a temporal shield that slows time down for human beings. Alas, the Sun is aging very fast for the poor human beings. Are we going to die? Who is creating this field?</p>
<p>This is almost exactly the reverse story from <a href="http://fr.wikipedia.org/wiki/Georges-Jean_Arnaud">Georges-Jean Arnaud</a>&#8217;s <em>La grande  séparation</em> (1971-1973). In Arnaud&#8217;s story, a planet has a similar temporal field, but it accelerates time on the planet. Even though the planet has primitive technology, it is constantly surveyed for any sign of technological development. Spin offers the counterpart story.</p>
<p>A temporal disparity leads to a technological disparity: a small band of savages can evolve into a technologically superior race while you are having coffee.</p>
<p><strong>Pros</strong></p>
<p>The novel is very good. The author writes with good scientific rigour. The writing is supported by repeatedly introducing new mysteries in every chapter&#8230; to keep you coming back for more. The characters are believable and well drawn.</p>
<p><strong>Cons</strong></p>
<p>The author tried to limit the scope of the story to a few characters, but not all of them are good characters. The writing style reminds me a bit of Card&#8217;s Ender&#8217;s Game series. There is the extra smart kid who grows up to be the only one able to see through what is happening. I found this particular element of the novel irritating. A major catastrophe hits the Earth and only one man seems to be able to put it all together? I am a bit disappointed by how the author dealt with anything outside the Earth, including the Martians. He could have done so much more! </p>
<p>Sequels are upcoming.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/456581749" height="1" width="1"/>]]></content:encoded><description>The novel Spin won the Hugo Award for Best Novel in 2006.
It is what I would call a &amp;#8220;temporal disparity&amp;#8221; novel. Earth becomes suddenly surrounded in a temporal shield that slows time down for human beings. Alas, the Sun is aging very fast for the poor human beings. Are we going to die? Who is [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F17%2Fscifi-book-review-spin-by-robert-charles-wilson%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/17/scifi-book-review-spin-by-robert-charles-wilson/</feedburner:origLink></item><item><title>The most active blogs I follow…</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/456538279/</link><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Mon, 17 Nov 2008 17:15:07 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1530</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>A very active feed that has remained in my list for a long time is a good feed (for me). My top 3 (in decreasing order of activity):</p>
<ul>
<li><a href="http://thenoisychannel.com/">The Noisy Channel</a>: Daniel Tunkelang, chief Scientist at <a href="http://www.endeca.com/">Endeca</a>. He works in information retrieval.</li>
<li><a href="http://www.sylvienoel.ca/blog/">Population of One</a>: Sylvie Noël, research scientist at the government of Canada. She works in <a href="http://en.wikipedia.org/wiki/Human-computer_interaction">HCI</a>.</li>
<li><a href="http://www.tbray.org/ongoing/">Ongoing</a>: Tim Bray, director of Web Technologies at Sun Microsystems. He helped create XML and Atom.</li>
</ul>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/456538279" height="1" width="1"/>]]></content:encoded><description>A very active feed that has remained in my list for a long time is a good feed (for me). My top 3 (in decreasing order of activity):

The Noisy Channel: Daniel Tunkelang, chief Scientist at Endeca. He works in information retrieval.
Population of One: Sylvie Noël, research scientist at the government of Canada. She works in [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F17%2Fthe-most-active-blogs-i-follow%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/17/the-most-active-blogs-i-follow/</feedburner:origLink></item><item><title>Full text search in SQL with LuSql</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/456096072/</link><category>Data Warehousing and OLAP</category><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Mon, 17 Nov 2008 08:30:53 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1526</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>MySQL supports natively <a href="http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html">full text search</a>; many database engines do. However, few databases can match a dedicated search engine library like <a href="http://lucene.apache.org/java/docs/">Lucene</a>. Moreover, even if you do not need the power of Lucene, sometimes you are forced to use a database engine that does not support full text search (like raw <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> files). </p>
<p>It would be nice to be able to combine a true search engine with any database engine. </p>
<p>If you are willing to use Java, then Glen Newton from NRC has the solution: <a href="http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql">LuSql</a>. It allows you to index with Lucene any database accessible by Java (through JDBC). He says it has been extensively tested. It is open source and free.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/456096072" height="1" width="1"/>]]></content:encoded><description>MySQL supports natively full text search; many database engines do. However, few databases can match a dedicated search engine library like Lucene. Moreover, even if you do not need the power of Lucene, sometimes you are forced to use a database engine that does not support full text search (like raw CSV files). 
It would [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F17%2Ffull-text-search-in-sql-with-lusql%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/17/full-text-search-in-sql-with-lusql/</feedburner:origLink></item><item><title>Toward the Commoditization of Natural Language Processing</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/456096073/</link><category>Science and Technology</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Fri, 14 Nov 2008 17:52:51 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1503</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>In a remarkable <a href="http://iit-iti.nrc-cnrc.gc.ca/publications/nrc-50398_e.html">paper</a>, <a href="http://apperceptual.wordpress.com/">Peter Turney</a> shows that using a simple family of algorithms and  freely available software, one can determine analogies, synonyms, antonyms, and relations between words automatically. Here is the beginning of the abstract:</p>
<blockquote><p>
Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach.</p></blockquote>
<p>I do not work in Natural Language Processing (NLP) per se, but this sounds like commoditization to me in the sense that you no longer need to design, learn and tweak custom algorithms. <strong>If you have enough data, you can do NLP after learning one (remarkably simple) family of algorithms</strong>. <a href="http://norvig.com/">Peter Norvig</a> might approve.</p>
<p>In the database research world, commoditization is already an accomplished fact. Database researchers have been wondering about their relevance for about ten years. </p>
<p><a href="http://apperceptual.wordpress.com/">Peter</a> might argue that in such a context, researchers should become bold and daring. Computer Science researchers should choose crazy problems. </p>
<p><strong>Reference</strong>: Peter Turney, <a href="http://iit-iti.nrc-cnrc.gc.ca/publications/nrc-50398_e.html">A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations</a>, Coling 2008, August 2008.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/456096073" height="1" width="1"/>]]></content:encoded><description>In a remarkable paper, Peter Turney shows that using a simple family of algorithms and  freely available software, one can determine analogies, synonyms, antonyms, and relations between words automatically. Here is the beginning of the abstract:

Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F14%2Ftoward-the-commoditization-of-natural-language-processing%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/14/toward-the-commoditization-of-natural-language-processing/</feedburner:origLink></item><item><title>Do not trust financial experts</title><link>http://feeds.feedburner.com/~r/daniel-lemire/atom/~3/456096074/</link><category>Business / Economics / Politics</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Daniel Lemire</dc:creator><pubDate>Fri, 14 Nov 2008 15:47:09 -0600</pubDate><guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1500</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>One expert predicted the recession. He was ridiculed. Watch and draw your conclusions.</p>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/2I0QN-FYkpw&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><embed src="http://www.youtube.com/v/2I0QN-FYkpw&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p><strong>Source</strong>: <a href="http://parand.com/say/index.php/2008/11/13/this-guy-called-the-recession/">Standard Deviations</a>.</p>
<img src="http://feeds.feedburner.com/~r/daniel-lemire/atom/~4/456096074" height="1" width="1"/>]]></content:encoded><description>One expert predicted the recession. He was ridiculed. Watch and draw your conclusions.

Source: Standard Deviations.</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><media:content url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/456096075/2I0QN-FYkpw&amp;" fileSize="882" type="application/x-shockwave-flash" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>One expert predicted the recession. He was ridiculed. Watch and draw your conclusions. Source: Standard Deviations. </itunes:subtitle><itunes:summary>One expert predicted the recession. He was ridiculed. Watch and draw your conclusions. Source: Standard Deviations. </itunes:summary><itunes:keywords>Business / Economics / Politics</itunes:keywords><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetItemData?uri=daniel-lemire/atom&amp;itemurl=http%3A%2F%2Fwww.daniel-lemire.com%2Fblog%2Farchives%2F2008%2F11%2F14%2Fdo-not-trust-financial-experts%2F</feedburner:awareness><feedburner:origLink>http://www.daniel-lemire.com/blog/archives/2008/11/14/do-not-trust-financial-experts/</feedburner:origLink><enclosure url="http://feeds.feedburner.com/~r/daniel-lemire/atom/~5/456096075/2I0QN-FYkpw&amp;" length="882" type="application/x-shockwave-flash" /><feedburner:origEnclosureLink>http://www.youtube.com/v/2I0QN-FYkpw&amp;#038;color1=0xb1b1b1&amp;#038;color2=0xcfcfcf&amp;#038;hl=en&amp;#038;fs=1</feedburner:origEnclosureLink></item><media:rating>nonadult</media:rating><feedburner:awareness>http://api.feedburner.com/awareness/1.0/GetFeedData?uri=daniel-lemire/atom</feedburner:awareness></channel></rss>
