Saturday, November 11th, 2006

RSS hygiene and Google Reader

Filed under: — Daniel Lemire @ 11:49

It used to be that Google’s RSS reader was a catastrophe, but it is now pretty good and pretty smart. Google has turned things around. I switched to it two days ago. The UI still tries a bit too hard for my taste, but I guess you have to account for the fact that it is still somewhat experimental.

Oh! And can we, please, get over the rounded corners? I actually like rectangular shapes. The Web 2.0 look, where everything is round and smooth, is really a fad, and in five years we will look back on these Web designs with a fair amount of disgust.

The really nice thing about Google Reader is that when you scroll past an item, it automatically marks it as read. At least, this is the default behavior. One annoying thing about most readers, I found, is that they either required me to mark items as read manually or used a “delay then mark read” approach. I much prefer the Google Reader approach.

See http://www.google.com/reader/.

In general, I have learned that you should not follow more than 25 or so RSS/Atom feeds. It is quite tempting, especially when you are bored or stressed out, to add more and more feeds. I find it better to migrate your list of feeds slowly, changing it gradually. Let us call this “RSS hygiene”. You don’t keep 50 books open on your desk, do you? You do not follow 50 TV shows, do you? You do not read 50 novels simultaneously, do you? Why is 25 a good number? Because, on an average day, you will then have no more than 5 new posts to read, maybe 10 at the most.
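
The reasoning behind the number 25 quietly assumes something about how often each feed publishes. A minimal Python sketch makes that assumption explicit; the function names are hypothetical, and the only inputs are the 25 feeds and the 5 to 10 daily posts mentioned above:

```python
def implied_posts_per_feed(feed_count: int, daily_posts: float) -> float:
    """Average posts per feed per day needed to produce `daily_posts` in total."""
    return daily_posts / feed_count


def days_between_posts(feed_count: int, daily_posts: float) -> float:
    """Average interval, in days, between two posts from the same feed."""
    return 1.0 / implied_posts_per_feed(feed_count, daily_posts)


if __name__ == "__main__":
    for daily in (5, 10):
        print(f"25 feeds, {daily} posts/day: one post per feed every "
              f"{days_between_posts(25, daily):.1f} days")
```

In other words, 5 posts a day from 25 feeds means a typical feed posts about once every five days; 10 posts a day means about once every two and a half days.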

One side effect of limiting yourself to a few feeds is that you tend to go for quality. What about diversity? What about the long tail, you ask? I think you simply outgrow some feeds over time and naturally replace them. If you are like me, your interests change over time, and so will your feeds.

This whole argument also explains why “feed recommender systems” have never really picked up steam. Most of the time, people don’t want lots of feed suggestions. They want to choose each new feed carefully, and that is not something that can be done automatically.

Louka’s first birthday

Filed under: Video podcast — Daniel Lemire @ 11:00

My son Louka is 1.

Here’s what you get when you try to put a hat on a baby:

My wife and my first son Lohan know how to present a birthday cake:

Oh! And my wife is pretty, isn’t she?

I’m such a lucky man!

Thursday, November 9th, 2006

Advances in Querying Non-Conventional Data Sources — IADC 2007 (December 15, 2006 / April 4-7)

Filed under: Passed CFP — Daniel Lemire @ 20:47

IADC 2007 has a very interesting special track on “Advances in Querying Non-Conventional Data Sources”. The conference will be held in San Diego. Selected papers from the track will be invited for submission to a special issue of JDIM.

Nowadays, the heterogeneity of data-intensive information systems poses new challenges for querying non-conventional data sources beyond relational databases. Non-conventional data sources arise in many fields: Web/XML data in massive Web repositories (e.g., B2B and B2C e-commerce systems), RDF data in ontological databases, text data in digital libraries, peer-to-peer data in innovative scenarios drawn from Web and Grid service-based architectures, data streams and RFID data in emerging sensor network applications, DW/OLAP data in very large data warehouses, spatial data in advanced GIS applications, temporal data in sequence and genomic databases, spatio-temporal data in mobile computing applications, log data in data and process mining tools, scientific data in e-science applications, biological data in biobanks, etc.

In these contexts, traditional DBMS query technologies are inadequate, so novel models, algorithms and paradigms are needed to efficiently support query answering against non-conventional data sources. Despite some recent advances, several aspects need further investigation, including: formal foundations, advanced query models and techniques, design of innovative query algorithms, query optimization models and techniques, query translation solutions, query rewriting schemes, view-based query answering, design and implementation of advanced query operators/predicates, indexing strategies for efficient query answering, imprecise/incomplete query answering, visualization techniques for complex query results, fusion techniques for multiple query results, security and privacy-preserving issues in query answering, etc.

Wednesday, November 8th, 2006

Querying the Library of Congress using Search/Retrieve via URL

Filed under: — Daniel Lemire @ 21:26

SRU (Search/Retrieve via URL) is an interesting REST Web Service protocol.

Enough technobabble. Let’s run an example.

Suppose you want to retrieve the data that the Library of Congress has on a book called “First Impressions of the New World” by “Trotter Isabella Strange”. You issue the following query (the result comes back as XML):

(dc.title="First Impressions of the New World") and (dc.creator all "Trotter Isabella Strange")

You want to use this in software? Download my corresponding Perl and Python code examples: srucodeexamples.zip.
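
As a rough idea of what such code does, here is a minimal Python sketch using only the standard library. It turns the CQL query above into an SRU searchRetrieve request with the SRU 1.1 parameters operation, version, query and maximumRecords. SRU_BASE_URL is a placeholder, not the Library of Congress address: substitute the endpoint of the SRU server you want to query. The function names are hypothetical, and this is not the code from the zip file.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder (hypothetical): replace with the base URL of a real SRU server.
SRU_BASE_URL = "http://sru.example.org/search"


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10,
                  version: str = "1.1") -> str:
    """Encode an SRU searchRetrieve request as a GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,  # CQL phrases need straight double quotes
        "maximumRecords": str(max_records),
    }
    # urlencode escapes spaces, quotes, '=' and parentheses in the query.
    return base_url + "?" + urlencode(params)


def fetch_sru_xml(url: str, timeout: float = 30.0) -> bytes:
    """Issue the HTTP GET and return the raw XML response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    query = ('(dc.title="First Impressions of the New World") and '
             '(dc.creator all "Trotter Isabella Strange")')
    url = build_sru_url(SRU_BASE_URL, query)
    print(url)
    # print(fetch_sru_xml(url).decode("utf-8"))  # requires a real endpoint
```

The XML that comes back can then be handed to any XML parser, such as Python's xml.etree.ElementTree.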

Further reading: see the Wikipedia entry or, even better, the refbase entry.

(Special thanks to Owen Kaser for making me discover this exciting new technology.)

Monday, November 6th, 2006

Looking for an all-recording plug-in for Firefox

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 19:04

Dear reader,

I’m looking to record everything I ever browse on the Web using Firefox. That is right: I want a copy of every single document, Web page, query and so on that I ever encounter. I also want to record the content of every single form I submit. The result should be adequately protected so that nobody can access the data without access to my machine. It would be akin to the Wayback Machine, except that it would cover only the stuff I have seen. I have not yet decided whether it should record all the YouTube videos I watch, but I guess it should.

Ideally, it should be able to write the data to a portable disk.

Why do I want to do this? Why not? Actually, I would then like to use this data to build a fancy database supporting drill-down and roll-up queries (à la OLAP).
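
To make the roll-up and drill-down idea concrete, here is a minimal Python sketch over a tiny, hypothetical browsing log of (timestamp, URL) pairs. The log entries, the example.com/example.org URLs and the function names are all made up for illustration; a real tool would need a proper data store rather than an in-memory list.

```python
from collections import Counter
from datetime import date, datetime
from urllib.parse import urlparse

# Hypothetical browsing log: (visit time, URL) pairs.
LOG = [
    (datetime(2006, 11, 6, 9, 15), "http://www.example.com/news"),
    (datetime(2006, 11, 6, 9, 40), "http://www.example.org/video"),
    (datetime(2006, 11, 6, 21, 5), "http://www.example.com/sports"),
    (datetime(2006, 11, 7, 8, 30), "http://www.example.org/wiki"),
]


def visits_by_day_and_site(log):
    """Finest granularity here: visit counts per (day, host name)."""
    return Counter((ts.date(), urlparse(url).netloc) for ts, url in log)


def roll_up_to_day(fine):
    """Roll-up: aggregate away the site dimension, keeping only the day."""
    coarse = Counter()
    for (day, _site), n in fine.items():
        coarse[day] += n
    return coarse


def drill_down(fine, day):
    """Drill-down: for one day, break the total back down by site."""
    return Counter({site: n for (d, site), n in fine.items() if d == day})


if __name__ == "__main__":
    fine = visits_by_day_and_site(LOG)
    print(roll_up_to_day(fine))
    print(drill_down(fine, date(2006, 11, 6)))
```

Rolling up moves from (day, site) to day; drilling down goes the other way for a chosen day. A real system would add more dimensions (time of day, content type, form submissions) and let you navigate between them.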

If you know how to build this, want to help, or know of such a plug-in, please drop me a line.

I’m also running a competition to decide what to call such a thing. I initially thought about calling it a webex, but it turns out that’s a trademark. Then I decided that Hammerspace might be a better name. Or maybe Magic Satchel? What do you think?

90% accuracy for translation software?

Filed under: — Daniel Lemire @ 18:21

USA Today reports on research projects whose goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95% accuracy. I’m no natural language researcher, but I’d be interested in knowing how they measure accuracy. I have seen some recent results on automated translation (French-English), and the results were all right some of the time. But even if you only screw up one word out of ten, is that good enough?

To anyone who has spent weeks crafting a ten-page paper: do you think you could get away with screwing up one word out of ten? Communication is a difficult task. It requires high accuracy most of the time.
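
A back-of-the-envelope Python sketch shows why one word in ten adds up. It assumes, unrealistically, that every word is translated correctly and independently with the same probability; real translation errors are correlated, and this is not necessarily how the projects above measure accuracy. The 500-words-per-page figure is also an assumption, and the function names are hypothetical.

```python
def prob_error_free(word_accuracy: float, n_words: int) -> float:
    """Chance that all n_words are right, assuming independent per-word errors."""
    return word_accuracy ** n_words


def expected_wrong_words(word_accuracy: float, n_words: int) -> float:
    """Expected number of mistranslated words under the same assumption."""
    return n_words * (1.0 - word_accuracy)


if __name__ == "__main__":
    WORDS_PER_PAGE = 500  # assumed typical page length
    paper = 10 * WORDS_PER_PAGE
    print(f"P(20-word sentence error-free at 90%): {prob_error_free(0.9, 20):.3f}")
    print(f"Expected wrong words in a {paper}-word paper: "
          f"{expected_wrong_words(0.9, paper):.0f}")
```

Under these assumptions, a 20-word sentence comes out perfect only about 12% of the time, and a ten-page paper would contain roughly 500 wrong words.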

Yes, I’m being harsh. Automated translation is good and interesting work, but let us remain critical of its possibilities. As long as we lack strong AI, we cannot hope to replace human translators.

Wednesday, November 1st, 2006

ACL Wiki for Computational Linguistics

Filed under: — Daniel Lemire @ 18:29

The Association for Computational Linguistics (ACL) has created a wiki for the Computational Linguistics community. This is a great initiative!

(Source: Peter Turney.)
