Wednesday, October 10th, 2007

Disambiguate words using Wikipedia

Filed under: Science and Technology — Daniel Lemire @ 12:54

A common problem in information retrieval is that words are ambiguous. That is a fancy way of saying that you cannot tell the meaning of a word when you take it out of context. Some people claim that this problem must be solved by using the Semantic Web. I have long argued that the Semantic Web is more of a solution in search of a problem.

We already have some good strategies for disambiguation, but I have wondered recently why we can’t use Wikipedia to disambiguate words. After all, Wikipedia knows the difference between Java (the island) and Java (the programming language). It turns out that Google has implemented and patented this very idea!

Bunescu, R. and Pasca, M., Using Encyclopedic Knowledge for Named Entity Disambiguation, EACL-06, 2006.
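
To make the idea concrete, here is a minimal sketch in Python. It is not the method from the paper above, just a toy illustration of the general principle: compare the words surrounding an ambiguous term with the text of each candidate Wikipedia article, and pick the closest match. The two article snippets are hand-written stand-ins (an assumption on my part), and the names ARTICLES, bag_of_words, cosine and disambiguate are hypothetical. A real system would load actual article text and would probably weight or filter common words.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia article text, one per sense.
# A real system would load the actual articles instead.
ARTICLES = {
    "Java (island)": (
        "Java is an island of Indonesia between Sumatra and Bali "
        "with volcanoes and the city of Jakarta"
    ),
    "Java (programming language)": (
        "Java is an object-oriented programming language whose code is "
        "compiled to bytecode and runs on a virtual machine"
    ),
}


def bag_of_words(text):
    """Lowercase the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, articles=ARTICLES):
    """Return the article title whose text is most similar to the context."""
    ctx = bag_of_words(context)
    return max(articles, key=lambda title: cosine(ctx, bag_of_words(articles[title])))


if __name__ == "__main__":
    print(disambiguate("I compiled my Java code and ran it on the virtual machine"))
    print(disambiguate("We visited volcanoes on the island of Java in Indonesia"))
```

The first sentence shares words such as "compiled", "code" and "virtual" with the programming-language snippet, while the second shares "island", "volcanoes" and "Indonesia" with the island snippet, so each is matched to the expected article. No ontology is involved, only word overlap.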

See? Who needs RDF to disambiguate words?

(Source.)

1 Comment »

  1. And even more recently:

    Cucerzan, S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. EMNLP-CoNLL Joint Conference. Prague, 2007.

    (Silviu Cucerzan works for Microsoft)

    Comment by David — 11/10/2007 @ 7:57
