<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The missing research tool&#8230;</title>
	<atom:link href="http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/feed/" rel="self" type="application/rss+xml" />
	<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/</link>
	<description>Computer Scientist and Open Scholar: Databases, Information Retrieval, Business Intelligence.</description>
	<lastBuildDate>Thu, 24 May 2012 04:42:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Francois Rivest</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50760</link>
		<dc:creator>Francois Rivest</dc:creator>
		<pubDate>Fri, 06 Mar 2009 01:14:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50760</guid>
		<description>ISI Web of Science, although incomplete, allows you to trace papers citing a specific paper. 

Also, I don&#039;t know about computer science, but in the health sciences in general, many publishers let you set citation alerts on papers of interest. That is, each time a paper important to your literature (including your own) is cited, the service e-mails you the reference.

I find these tools very valuable for staying informed about what is going on in the specific domain I am working on.</description>
		<content:encoded><![CDATA[<p>ISI Web of Science, although incomplete, allows you to trace papers citing a specific paper. </p>
<p>Also, I don&#8217;t know about computer science, but in the health sciences in general, many publishers let you set citation alerts on papers of interest. That is, each time a paper important to your literature (including your own) is cited, the service e-mails you the reference.</p>
<p>I find these tools very valuable for staying informed about what is going on in the specific domain I am working on.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50759</link>
		<dc:creator>Steven</dc:creator>
		<pubDate>Thu, 05 Mar 2009 20:28:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50759</guid>
		<description>After reading this post, I wrote a little shell script that polls Google Scholar for new citations to my papers. I used wget with the Google Scholar URL and the full paper title, ran egrep -o &quot;Cited by [0-9]+&quot; on the result, and stored the counts in a file. If the counts change, the script e-mails me.

Of course, this misses any citations that Google Scholar doesn&#039;t pick up.

The link by Suresh, WhatToSee, looks very useful.</description>
		<content:encoded><![CDATA[<p>After reading this post, I wrote a little shell script that polls Google Scholar for new citations to my papers. I used wget with the Google Scholar URL and the full paper title, ran egrep -o &#8220;Cited by [0-9]+&#8221; on the result, and stored the counts in a file. If the counts change, the script e-mails me.</p>
<p>Of course, this misses any citations that Google Scholar doesn&#8217;t pick up.</p>
<p>The link by Suresh, WhatToSee, looks very useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: santhosh</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50753</link>
		<dc:creator>santhosh</dc:creator>
		<pubDate>Wed, 04 Mar 2009 17:56:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50753</guid>
		<description>Hi Daniel,
Look at http://silverfish.iiitb.ac.in, it&#039;s a web-based semantics extraction and aggregation engine for academic documents.

You can find related authors, related papers, and papers citing a given paper.</description>
		<content:encoded><![CDATA[<p>Hi Daniel,<br />
Look at <a href="http://silverfish.iiitb.ac.in" rel="nofollow">http://silverfish.iiitb.ac.in</a>, it&#8217;s a web-based semantics extraction and aggregation engine for academic documents.</p>
<p>You can find related authors, related papers, and papers citing a given paper.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ali Shams</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50751</link>
		<dc:creator>Ali Shams</dc:creator>
		<pubDate>Wed, 04 Mar 2009 15:55:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50751</guid>
		<description>Daniel,
Look at www.scientificcommons.org . I think this is a very good start. 

I think that such a tool should be social networking software, not a natural language processing system.</description>
		<content:encoded><![CDATA[<p>Daniel,<br />
Look at <a href="http://www.scientificcommons.org" rel="nofollow">http://www.scientificcommons.org</a> . I think this is a very good start. </p>
<p>I think that such a tool should be social networking software, not a natural language processing system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Krishnan</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50742</link>
		<dc:creator>Krishnan</dc:creator>
		<pubDate>Tue, 03 Mar 2009 04:44:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50742</guid>
		<description>Hi Daniel

We have built such a tool at HP Labs India. We plan to expose it as a service in the future.

Krishnan</description>
		<content:encoded><![CDATA[<p>Hi Daniel</p>
<p>We have built such a tool at HP Labs India. We plan to expose it as a service in the future.</p>
<p>Krishnan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mat Todd</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50741</link>
		<dc:creator>Mat Todd</dc:creator>
		<pubDate>Mon, 02 Mar 2009 20:23:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50741</guid>
		<description>I use Web of Science for this. I have saved searches for relevant terms, and one for any papers that cite key papers.  Weekly emails summarise it all.</description>
		<content:encoded><![CDATA[<p>I use Web of Science for this. I have saved searches for relevant terms, and one for any papers that cite key papers.  Weekly emails summarise it all.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Preston L. Bannister</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50740</link>
		<dc:creator>Preston L. Bannister</dc:creator>
		<pubDate>Mon, 02 Mar 2009 19:43:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50740</guid>
		<description>This is a symptom of a general problem. As long as academic writings are hidden in a maze of pay-for-access ghettos, access to information (and the tools used) will be poor.

The same base problem makes academic writings less useful (and thus less meaningful) to the entire community.

Solve the underlying problem.</description>
		<content:encoded><![CDATA[<p>This is a symptom of a general problem. As long as academic writings are hidden in a maze of pay-for-access ghettos, access to information (and the tools used) will be poor.</p>
<p>The same base problem makes academic writings less useful (and thus less meaningful) to the entire community.</p>
<p>Solve the underlying problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andre Vellino</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50739</link>
		<dc:creator>Andre Vellino</dc:creator>
		<pubDate>Mon, 02 Mar 2009 19:30:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50739</guid>
		<description>That&#039;s true but the problem with publisher tools is they apply only to the content owned by the publisher (usually).  What&#039;s needed is publisher-neutral tools that don&#039;t care who owns the intellectual property. (Which is related to, in spirit anyway, Daniel&#039;s desideratum &quot;The tool promotes open access content when possible&quot;)</description>
		<content:encoded><![CDATA[<p>That&#8217;s true but the problem with publisher tools is they apply only to the content owned by the publisher (usually).  What&#8217;s needed is publisher-neutral tools that don&#8217;t care who owns the intellectual property. (Which is related to, in spirit anyway, Daniel&#8217;s desideratum &#8220;The tool promotes open access content when possible&#8221;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50737</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Mon, 02 Mar 2009 18:20:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50737</guid>
		<description>@Haran

The data is most certainly not available in structured format. Nor is it available from one place only. Even if you can parse the papers to recognize the citations, you have to link the citations to the papers. That is not easy. There are many ways to cite a paper, and several papers have almost the same titles and almost the same authors.

There are places to get you started. For example, in Computer Science, DBLP makes available a rather large list of papers as XML. The papers in the arxiv database can also, I presume, be indexed somehow.

Recognizing similarities between what the researcher is doing and a given paper is also not trivial. It is probably similar to spam filtering. There may even be people who will try to cheat the system to get their papers recommended more often!

So, it is a difficult challenge, for many reasons. But it seems that as years go by, no progress is being made. I have seen zero progress in the last two years on this problem. None. Nada.

And it is not just the challenge in getting access to the data. For example, even open access archives (such as arxiv) are hard to monitor!</description>
		<content:encoded><![CDATA[<p>@Haran</p>
<p>The data is most certainly not available in structured format. Nor is it available from one place only. Even if you can parse the papers to recognize the citations, you have to link the citations to the papers. That is not easy. There are many ways to cite a paper, and several papers have almost the same titles and almost the same authors.</p>
<p>There are places to get you started. For example, in Computer Science, DBLP makes available a rather large list of papers as XML. The papers in the arxiv database can also, I presume, be indexed somehow.</p>
<p>Recognizing similarities between what the researcher is doing and a given paper is also not trivial. It is probably similar to spam filtering. There may even be people who will try to cheat the system to get their papers recommended more often!</p>
<p>So, it is a difficult challenge, for many reasons. But it seems that as years go by, no progress is being made. I have seen zero progress in the last two years on this problem. None. Nada.</p>
<p>And it is not just the challenge in getting access to the data. For example, even open access archives (such as arxiv) are hard to monitor!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Haran</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50736</link>
		<dc:creator>Daniel Haran</dc:creator>
		<pubDate>Mon, 02 Mar 2009 17:46:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50736</guid>
		<description>That would be a trivial mashup, if the data were available in machine-readable format. What&#039;s the challenge here? Is the raw data available? Is it parsing the papers to recognize citations?</description>
		<content:encoded><![CDATA[<p>That would be a trivial mashup, if the data were available in machine-readable format. What&#8217;s the challenge here? Is the raw data available? Is it parsing the papers to recognize citations?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50735</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Mon, 02 Mar 2009 16:59:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50735</guid>
		<description>@Anonymous

Citeseer was good back before Google Scholar. Now it is irrelevant. Its coverage is ridiculous.</description>
		<content:encoded><![CDATA[<p>@Anonymous</p>
<p>Citeseer was good back before Google Scholar. Now it is irrelevant. Its coverage is ridiculous.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50734</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Mon, 02 Mar 2009 16:23:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50734</guid>
		<description>Isn&#039;t this what http://citeseerx.ist.psu.edu/ already does?</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t this what <a href="http://citeseerx.ist.psu.edu/" rel="nofollow">http://citeseerx.ist.psu.edu/</a> already does?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Suresh</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50733</link>
		<dc:creator>Suresh</dc:creator>
		<pubDate>Mon, 02 Mar 2009 16:17:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50733</guid>
		<description>try this:

http://www.cs.utah.edu/~hal/WhatToSee/

and it&#039;s from an NLP researcher, to boot !</description>
		<content:encoded><![CDATA[<p>try this:</p>
<p><a href="http://www.cs.utah.edu/~hal/WhatToSee/" rel="nofollow">http://www.cs.utah.edu/~hal/WhatToSee/</a></p>
<p>and it&#8217;s from an NLP researcher, to boot !</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50732</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Mon, 02 Mar 2009 16:12:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50732</guid>
		<description>@Dupuis


&lt;i&gt; Scopus and/or Web of Science do a few of the things you&#039;re looking for, not perfectly, but it&#039;s a start.&lt;/i&gt;


I would argue it is a *bad* start. 

Scopus knows about 11 of my papers, and one of them is not from me, so it knows about 10 of my papers. It thinks that my 2000 paper &quot;Wavelet time entropy&quot; is my most cited work (with 21 citations).

Should I trust Scopus? Is this an accurate picture? No. Not even close.

Google Scholar tells me otherwise. My RACOFI paper from 2003 is cited 39 times according to Google Scholar, yet it does not even exist on Scopus! My Slope One paper from 2005 is cited 36 times according to Google Scholar and it does not exist on Scopus! My Tag-cloud drawing paper from 2007 is cited 20 times according to Google and... it does not even exist on Scopus.

Even if you don&#039;t trust the numbers Google Scholar gives you, these 3 papers I just gave you do exist. They have been repeatedly cited and there is even a wikipedia page about one of these papers. Yet, as far as Scopus is concerned, I have hardly been cited for my work after 2000... except for the Scale and translation invariant collaborative filtering systems paper...

Coverage matters a lot more to a researcher than precision. Missing 3 of my most important contributions is a big deal to me. I don&#039;t care that it reports only 10 of my papers... I care that it misses my most important work though!!!

A tool that does not know about my important work can&#039;t possibly help me monitor upcoming papers efficiently.</description>
		<content:encoded><![CDATA[<p>@Dupuis</p>
<p><i> Scopus and/or Web of Science do a few of the things you&#8217;re looking for, not perfectly, but it&#8217;s a start.</i></p>
<p>I would argue it is a *bad* start. </p>
<p>Scopus knows about 11 of my papers, and one of them is not from me, so it knows about 10 of my papers. It thinks that my 2000 paper &#8220;Wavelet time entropy&#8221; is my most cited work (with 21 citations).</p>
<p>Should I trust Scopus? Is this an accurate picture? No. Not even close.</p>
<p>Google Scholar tells me otherwise. My RACOFI paper from 2003 is cited 39 times according to Google Scholar, yet it does not even exist on Scopus! My Slope One paper from 2005 is cited 36 times according to Google Scholar and it does not exist on Scopus! My Tag-cloud drawing paper from 2007 is cited 20 times according to Google and&#8230; it does not even exist on Scopus.</p>
<p>Even if you don&#8217;t trust the numbers Google Scholar gives you, these 3 papers I just gave you do exist. They have been repeatedly cited and there is even a wikipedia page about one of these papers. Yet, as far as Scopus is concerned, I have hardly been cited for my work after 2000&#8230; except for the Scale and translation invariant collaborative filtering systems paper&#8230;</p>
<p>Coverage matters a lot more to a researcher than precision. Missing 3 of my most important contributions is a big deal to me. I don&#8217;t care that it reports only 10 of my papers&#8230; I care that it misses my most important work though!!!</p>
<p>A tool that does not know about my important work can&#8217;t possibly help me monitor upcoming papers efficiently.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Duval</title>
		<link>http://lemire.me/blog/archives/2009/03/02/the-missing-research-tool/comment-page-1/#comment-50731</link>
		<dc:creator>Erik Duval</dc:creator>
		<pubDate>Mon, 02 Mar 2009 16:11:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.daniel-lemire.com/blog/?p=1850#comment-50731</guid>
		<description>Seems like we have similar concerns - I blogged about this more than 18 months ago: http://erikduval.wordpress.com/2007/10/04/i-need-help/.

Can&#039;t believe that progress in this area is so slow, though there are some new initiatives (http://www.mendeley.com/).

We&#039;re working to get something off the ground too. Would love to get more pointers to related work!</description>
		<content:encoded><![CDATA[<p>Seems like we have similar concerns &#8211; I blogged about this more than 18 months ago: <a href="http://erikduval.wordpress.com/2007/10/04/i-need-help/" rel="nofollow">http://erikduval.wordpress.com/2007/10/04/i-need-help/</a>.</p>
<p>Can&#8217;t believe that progress in this area is so slow, though there are some new initiatives (<a href="http://www.mendeley.com/" rel="nofollow">http://www.mendeley.com/</a>).</p>
<p>We&#8217;re working to get something off the ground too. Would love to get more pointers to related work!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

