Wednesday, December 19th, 2007

How many Computer Science researchers are there?

Filed under: Academia/Research — Daniel Lemire @ 20:00

In current work we do on database indexes, we decided to use DBLP as a data source. Among other things, we use the authors’ names as a dimension. From one plot, I noticed that there must be half a million distinct authors. I doubted this number, and Kamel was nice enough to investigate further. It turns out that there are 531,480 different authors in DBLP! (As a basis for comparison, there are about 945,000 articles.)

I don’t know about you, but this feels like a large number. We started to look for explanations. I already reported that the USA is producing 1,500 new Computer Science Ph.D.s a year. Still, there cannot be many more than 100,000 recently active Computer Science authors holding a Ph.D.

Owen pointed us to the recent CACM article Are your citations clean? by Lee et al. Alas, while DBLP is certainly dirty, in that some researchers will appear under two or more different names, it cannot explain why we end up with half a million authors!

The best explanation so far is that many undergraduate or M.Sc. students have papers on DBLP. So much so that they make up the majority of the authors in DBLP.

Do you buy this theory? If not, do you have a better explanation?

(As a side-effect, it should not be very hard to be in the top 10% among the most prolific DBLP authors!)

5 Comments »

  1. You should also take into account papers from industrial research lab and industry in general. It would be very interesting to see a grouping of those 500K authors by affiliation-at-time-of-publication (or even current affiliation).

    Comment by Muli Ben-Yehuda — 20/12/2007 @ 1:10

  2. I am almost sure that there are hordes of non-PhD authors. For example, I’m about to hold an M.Sc. and have co-authored an IEEE paper with two other non-PhD co-workers, and our names magically appeared on DBLP.

    Comment by Ricardo NIederberger Cabral — 20/12/2007 @ 6:51

  3. There might also be a substantial number of authors from math/physics/econ/whatever other field who have published in a CS journal/conference at some point.

    Comment by Andris — 22/12/2007 @ 23:18

  4. On the other hand, they’re not cataloging CS researchers who publish in non-CS journals. OK, that probably only reduces the number of hits for some strange folks, not the number of authors.

    Comment by Michael Stiber — 25/12/2007 @ 18:12

  5. Just stumbled across this blog entry. Here are some statistics. As of 31 Dec 2007, DBLP lists 588,150 different authors, of whom 48,126 have 10 or more publications, 20,345 have 20 or more publications, and 1,178 have 100 or more publications.

    BTW, check out http://dblp.mpi-inf.mpg.de now linked from every DBLP author page, providing convenient prefix search, faceted search, etc. You probably have noticed it already. Feedback very welcome.

    Comment by Holger Bast — 30/12/2007 @ 21:34
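
The figures in comment 5 make it easy to check the remark in the post about the top 10% of DBLP authors. Here is a minimal sketch in Python that turns the quoted counts into shares of all authors. The variable and function names are my own, chosen for illustration, and the only data used are the numbers quoted in that comment.

```python
# Counts quoted in comment 5 (DBLP as of 31 Dec 2007).
TOTAL_AUTHORS = 588_150

# Minimum publication count -> number of authors with at least that many.
AUTHORS_WITH_AT_LEAST = {10: 48_126, 20: 20_345, 100: 1_178}


def share_of_authors(count, total=TOTAL_AUTHORS):
    """Return the fraction of all listed authors that `count` represents."""
    return count / total


# Share of all authors at each publication threshold.
shares = {
    threshold: share_of_authors(count)
    for threshold, count in AUTHORS_WITH_AT_LEAST.items()
}

for threshold, share in shares.items():
    print(f"{threshold}+ publications: {share:.2%} of authors")
```

By these counts, authors with 10 or more publications make up a bit over 8% of the total, so the cutoff for the top 10% has to sit somewhere below 10 publications. The comment does not give counts for lower thresholds, so the exact cutoff cannot be pinned down from these numbers alone.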
