Tuesday, January 17th, 2006

Technorati allows time-based text mining

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 23:03

Matthew reports that Technorati now lets you plot how often a word is used in the blogosphere over time. Here’s the usage of the word “segmentation” over time:

[Technorati chart: usage of the word “segmentation” over time]

I think BlogPulse has been offering this sort of thing for some time. I’m confused by the relationship between these various services. In any case, these services could benefit from OLAPish concepts (shameless plug):

Steven Keith, Owen Kaser, Daniel Lemire, Analyzing Large Collections of Electronic Text Using OLAP, APICS 2005, Wolfville, Canada, October 2005.
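To make the “OLAPish” idea a bit more concrete, here is a minimal sketch of one such concept: counting a word per day, then rolling the counts up to months along the time dimension. The sample posts and the function names (`daily_counts`, `roll_up_to_month`) are made up for illustration. This is not Technorati’s API or data, and it is not the method from the paper above.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: (publication date, post text).
# These are invented posts, not real Technorati or BlogPulse output.
posts = [
    (date(2006, 1, 15), "Image segmentation is hard"),
    (date(2006, 1, 15), "Market segmentation and time series segmentation"),
    (date(2006, 1, 17), "No relevant words here"),
    (date(2006, 2, 2), "Segmentation again"),
]


def daily_counts(posts, word):
    """Count occurrences of `word` per day (case-insensitive, whitespace tokens)."""
    target = word.lower()
    counts = Counter()
    for day, text in posts:
        counts[day] += sum(1 for token in text.lower().split() if token == target)
    return counts


def roll_up_to_month(daily):
    """Aggregate daily cells into (year, month) cells: a roll-up on the time dimension."""
    monthly = Counter()
    for day, n in daily.items():
        monthly[(day.year, day.month)] += n
    return monthly


daily = daily_counts(posts, "segmentation")
monthly = roll_up_to_month(daily)

for (year, month), n in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {n}")
```

A real system would presumably add more dimensions (author, language, topic) and precompute the aggregates, but the day-to-month roll-up is the basic operation behind a chart like the one above.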

2 Comments »

  1. You can do this and more yourself using their API, a Unix shell account, and a PHP/Perl script running on cron. I’m planning on doing it… (I just reached your blog accidentally.)

    Comment by saurab — 21/6/2006 @ 8:00
