Tuesday, February 21st, 2006

Firefox Extension Development Tutorial

Filed under: — Daniel Lemire @ 17:58

Through Downes, I got to this Firefox Extension Development Tutorial. Maybe I ought to try my hand at fixing the “browsers lack a good text editor” problem.

Well, this will have to wait until after I prepare my lectures for CS 6905, prepare week 7 of INF 6460, mark the homeworks for INF 6450, finish my next paper, finish sorting out the papers for CSWWS 2006 (we got some really good ones), register for Curves and Surfaces 2006 and apply for another grant. Oh! And with some luck, finish the paperwork so that this super smart guy from France can come do a post-doc with us.

Oh! Yeah! I’ll be writing lots of Firefox extensions this year!

The Grouplens Research Papers Recommender System

Filed under: — Daniel Lemire @ 15:22

Sure, there is CiteULike, but now the respectable Grouplens research group has come up with a Paper Recommender System. If you have papers appearing on the ACM Digital Library, try it out, it is cool. It will match your papers with others and ask you how you feel about the recommendation lists. By trying it out, you are helping science!

This post is ad-supported. If you like this post, you must also like these papers:

And by the way, it is a joke, my posts are not really ad-supported. But yes, I do use my blog to promote my research papers. I’m pretending that my blog is supporting my academic work. Or so says my activity report and my last promotion case.

Friday, February 17th, 2006

When only 2.5% of your students are female

Filed under: Academia/Research, Science and Technology — Daniel Lemire @ 15:28

The head of the CS Department at Purdue University is reporting that only 6 out of 155 freshmen are females. That’s a meagre 2.5%.

Does it matter? Yes it does. When half the population thinks that a given field is without any interest, you have a serious problem.

Elsewhere on this blog, I reported that at a school like the University of Montreal, there are more new students in Physics than in Computer Science. Should I remind you that many Physics Departments have closed down since we’ve stopped sending people to the moon?

I get the feeling that we will look back on the current years as being historical for Computer Science. Maybe the pendulum will swing back, maybe it will break and fall on the floor. Who knows?

If you need me, I’ll be in the Mathematics Department before dropping by the Business School, on my way to meet the Environmental Sciences people. See yah!

Wednesday, February 15th, 2006

Strange keywords point people to my site

Filed under: — Daniel Lemire @ 22:05

The following query in Google, “warcraft book manuel”, gives the PDF file of my INF 6450 lecture notes as one of the first 10 hits. Turns out that I use Warcraft, or rather the Warcraft customization engine, as an example and Google picks up on that.

I find it totally amazing because I had no recollection of this mention of Warcraft and thought that Google was buggy, but then, when I read the short in-context summary Google gives, I immediately recognized my own wording. Google has such a good UI!

This reminds me of the observation I once made about people getting to my cat’s home page (yes, my cat has a home page, try to find it for extra points). She is called “Jolie” and I wrote she was “coquine” or maybe my wife wrote so. In any case, all sorts of perverts get to my cat’s home page looking for dirty pictures.

Of course, to get better, what Google would need is “context”. Google would need to know what you are doing as you are typing your query and possibly lots of data about who you are (whether you have played Warcraft or not and so on). Quite possibly, we could use various strategies to preserve privacy. For example, Google could issue an answer to the request and leave the last steps of the filtering to an AJAX application. This reminds me of lazy evaluation techniques.
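Here is a minimal sketch of that split, written in Python rather than browser JavaScript to keep it short. Everything in it is made up for illustration: the tiny index, the function names `server_search` and `client_filter`, and the idea of describing a user by a set of interests. The server only ever sees the query; the private profile stays on the client, and candidates are produced lazily through generators.

```python
from typing import Iterable, Iterator

# Hypothetical server side: it sees the raw query, not who is asking.
def server_search(query: str) -> Iterator[dict]:
    index = [
        {"title": "Warcraft strategy guide", "topics": {"games"}},
        {"title": "INF 6450 lecture notes", "topics": {"xml", "courses"}},
        {"title": "Warcraft map editor manual", "topics": {"games", "xml"}},
    ]
    words = query.lower().split()
    for doc in index:
        if any(word in doc["title"].lower() for word in words):
            yield doc  # candidates are produced one at a time, on demand

# Hypothetical client side: the last filtering step uses data that
# never leaves the user's machine.
def client_filter(candidates: Iterable[dict], private_interests: set) -> Iterator[dict]:
    for doc in candidates:
        if doc["topics"] & private_interests:
            yield doc

results = client_filter(server_search("warcraft manual"), {"xml"})
titles = [doc["title"] for doc in results]
print(titles)
```

Nothing is computed until the list comprehension pulls results through both generators, which is the lazy evaluation flavour I had in mind.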

Wednesday, February 8th, 2006

ICDT 2007 (July 10, 2006 / January 10-12, 2007)

Filed under: Data Warehousing and OLAP, Passed CFP — Daniel Lemire @ 15:48

The ICDT 2007 call for papers is out. The 11th International Conference on Database Theory (ICDT 2007) will be held in Barcelona, Spain, on January 10-12, 2007.

Suggested, but not exclusive, topics of interest for submissions include: Access methods and physical design; Active databases; Complexity and performance; Constraint databases; Data integration and interoperability; Data mining; Data models; Database programming and query languages; Databases and information retrieval; Probabilistic Databases; Databases and workflow; Databases and the Semantic Web; Databases in e-commerce; Databases in e-services; Deductive databases and knowledge bases; Distributed databases; Integrity and security; Logic and databases; Multimedia databases; Query optimization; Query processing; Real-time databases; Semi-structured, XML, and Web data; Spatial data; Temporal data; Concurrency and recovery; Transaction management; Views and data warehousing.

Monday, February 6th, 2006

Looking for a sane “probabilistic models” tutorial

Filed under: — Daniel Lemire @ 22:33

I’m preparing a course in Information Retrieval and Filtering. One topic I cover is the probabilistic models in Information Retrieval. I must admit it is not a topic I know well. So far, I’m not very impressed with the few documents I found. It seems everyone copies from someone else, and they all end up with the same type of explanations, which are a bit obscure to me. It seems awfully contrived.

Don’t get me wrong, I can follow the logic, it is just not clear to me how probabilistic models in Information Retrieval can be better than Vector Models. I can see some “mathematical foundation”, but it hasn’t gotten me excited so far. I wouldn’t say it is awfully elegant.

Can anyone help me?

Most frequently asked question about XML

Filed under: — Daniel Lemire @ 22:30

I teach XML. It is neither a glorious nor a prestigious task, but it is fun. I must admit that I am quite a bit of a hacker. While I’m a trained mathematician and some of my papers contain highly nontrivial mathematical results, I also enjoy the elegance and the simplicity of something like XML because, when viewed in the perspective of everything that came before it, it is simply a very nice solution. Yes, developers do respect elegance and, to a large extent, are driven to it.

Anyhow, XML is not hard. Actually, it is very hard to come up with hard questions about XML. XSLT has the reputation of being very hard, but most students learn it rather easily, which is why, I think, it is not worth your time to learn easier, less powerful languages (such as XQuery).

Here’s a challenge to you, my reader: if you had to write an exam question for an XML class, a really difficult one, but one where you can express the answer in simple terms, what would it be? You are not allowed to take a hard problem from, say, graph theory, and put it in XML terms. Your problem must be a naturally occurring XML problem. You also cannot extend the realm of XML to include Web Services.

Most of the semi-difficult problems I found have to do with XSLT programming. In particular, aggregation problems (sums, averages and so on) are not trivial in XSLT, though once you’ve solved one hard one, you’ve solved all of them. Some of them are interesting, like automatically extracting data (say Dublin Core) from documents (say XHTML) and formatting it appropriately (say XML/RDF). Topics like AJAX are mostly difficult because of the JavaScript-in-the-browser issue: that’s not really worthy of a university-level course, in my humble opinion. Interacting with XML from other languages is slightly interesting, but it grows boring after a while: while XOM is much better than DOM, how much mileage can you get from such an issue at the university level?
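To make the aggregation problem concrete, here is a small sketch of a grouped average, written in Python instead of XSLT. The `grades` document, its attribute names and the function `average_by_course` are invented for the example. In a general-purpose language the grouping is a dictionary and a loop; the XSLT exercise is getting the same result out of templates and keys.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample document: one <grade> element per marked homework.
doc = """<grades>
  <grade course="INF6450" value="80"/>
  <grade course="INF6460" value="70"/>
  <grade course="INF6450" value="90"/>
</grades>"""

def average_by_course(xml_text):
    """Return {course: average value} over all <grade> elements."""
    totals = defaultdict(lambda: [0.0, 0])  # course -> [sum, count]
    for grade in ET.fromstring(xml_text).iter("grade"):
        entry = totals[grade.get("course")]
        entry[0] += float(grade.get("value"))
        entry[1] += 1
    return {course: total / count for course, (total, count) in totals.items()}

averages = average_by_course(doc)
print(averages)
```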

Maybe we can derive the hard questions from what puzzles the students the most? So I thought at first. The most often asked question has to do with the fact that the Firefox browser is a non-validating browser. What this means is that when loading XML, Firefox doesn’t process the external DTD and doesn’t check the document against it. If you spent a lot of time crafting crazy DTDs, you basically wasted your time as far as Firefox is concerned. Note that this doesn’t mean Firefox can’t read DTDs: it will process the internal part of the DTD (the one contained in your XML document). Alas, I don’t see how I can exploit the fact that it causes a surprise among the students to derive a hard question from it.
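The same behaviour can be observed outside the browser. The sketch below uses Python’s `xml.etree.ElementTree`, which is also a non-validating parser, as a stand-in for Firefox; it says nothing about Firefox’s internals, and the `note` document is made up. The entity declared in the internal part of the DTD is expanded, while the element declaration is never enforced.

```python
import xml.etree.ElementTree as ET

# Made-up document with an internal DTD subset. The DTD says <note>
# may only contain text, yet the document sneaks in an <extra/> child.
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY author "Daniel">
]>
<note>Written by &author;<extra/></note>"""

root = ET.fromstring(doc)

# The internal subset was read: the entity reference is expanded.
print(root.text)

# The document was not validated: the undeclared child is accepted.
print(root.find("extra") is not None)
```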

In fact, with experience, you learn that DTDs are not very useful. If the XML document is well formed and contains the tags you need, then why would you care about the DTD? A good example is XHTML. Most browsers will do just fine with well formed but slightly invalid XHTML and that’s the right behavior to have. Why should the browser or any application choke because I have extra attributes? Or a missing element whose content can be safely assumed to be empty?

So, in an XML course, you begin by teaching students how to define a formal grammar for their XML vocabulary, and then, you hope that they will learn on their own that formal grammars are not so useful in practice. Of course, you can’t quite put it this way because you’ll always find a colleague to object that, surely, you don’t mean it. But I do.
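Here is roughly what such a tolerant application looks like, as a Python sketch. The `post` vocabulary and the function `read_post` are hypothetical. The reader insists on well-formedness only (the parser raises an error otherwise), ignores attributes it does not know about, and treats a missing element as empty.

```python
import xml.etree.ElementTree as ET

# Made-up document: it carries an unexpected "mood" attribute and
# has no <summary> element at all.
doc = """<post mood="cheerful">
  <title>Most frequently asked question about XML</title>
</post>"""

def read_post(xml_text):
    """Read the fields we need; anything else in the document is ignored."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well formed
    return {
        "title": root.findtext("title", default=""),
        "summary": root.findtext("summary", default=""),  # missing means empty
    }

post = read_post(doc)
print(post)
```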
