ICDM’06 (July 5, 2006 / December 18-22, 2006)

ICDM 2006 will be held in Hong Kong.

The 2006 IEEE International Conference on Data Mining (ICDM-06) provides a premier forum for the dissemination of innovative, practical development experiences as well as original research results in data mining, spanning applications, algorithms, software and systems. The conference draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems and high performance computing.

KDD 2006 (March 3rd, 2006 / August 23-26, 2006)

KDD 2006 will be held in Philadelphia.

During the past years, the ACM SIGKDD conference has established itself as the premier international conference on knowledge discovery and data mining with an attendance of 600-900 people.

I’m leaving for Houston (ICDM’05)

From Nov. 26th to Dec. 1st 2005, I’ll be in Houston for ICDM’05 where I’ll present our paper An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation. For a limited time, my slides are available on the Web.

(If you are a thief, I’ve got two sons guarding my house, so don’t bother.)

Tabs are evil

I thought I had written a piece about this, but no. So, there you go. Tabs are evil in text files. Why? Because the tab character (\t) has vaguely defined semantics. It means “insert x spaces” where x depends on the text editor and the preferences of the user.

For example, the following two lines of code will appear perfectly aligned to me, because I prefer short indentations (only 2 characters):

[space][space]print "dog"
[tab]print "cat"

But if someone types the following code

[space][space][space][space]print "dog"
[tab]print "cat"

then it won’t look aligned at all for me.
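To make the ambiguity concrete, here is a small Python sketch of my own (the helper names render and indent_width are made up for this illustration). It uses the built-in str.expandtabs to show how the first pair of lines would look in editors configured with different tab widths.

```python
def render(lines, tab_width):
    """Return the lines as an editor with the given tab width would display them."""
    return [line.expandtabs(tab_width) for line in lines]


def indent_width(line):
    """Count the leading spaces of a line whose tabs were already expanded."""
    return len(line) - len(line.lstrip(" "))


# Two spaces on the first line, one tab character on the second.
code = ['  print "dog"', '\tprint "cat"']

for width in (2, 4, 8):
    widths = [indent_width(line) for line in render(code, width)]
    status = "aligned" if len(set(widths)) == 1 else "misaligned"
    print(width, widths, status)
```

Only the width of 2 gives the same indentation for both lines; with 4 or 8, the same file looks misaligned.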

The solution? Tell your text editor to dynamically replace tabs with spaces. For vim, you can achieve this by putting the line “set expandtab” in your file “~/.vimrc” or by typing “:set expandtab” while vim is running. The equivalent should be possible with all good text editors.

Now, go do it. Configure your text editor properly.

Disclaimer: There is one case where this brings you trouble. Makefiles, for some odd reason, require actual tabs. But I write a lot more code than makefiles and so should you.
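Editor settings only affect what you type from now on. For files that already contain tabs, something like the following Python sketch could do the cleanup. It is only an illustration: the function names, the default width of 2 and the list of makefile names are my assumptions, and it touches only the indentation at the start of each line.

```python
import os


def expand_leading_tabs(text, tab_width=2):
    """Replace tabs in each line's leading whitespace with spaces."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        # Tabs after the indentation (inside strings, say) are left alone.
        out.append(indent.expandtabs(tab_width) + body)
    return "".join(out)


def should_convert(path):
    """Skip makefiles, which need real tabs (the name list is an assumption)."""
    name = os.path.basename(path).lower()
    return name not in ("makefile", "gnumakefile") and not name.endswith(".mk")
```

You would read each file, check should_convert on its path, and write back the result of expand_leading_tabs.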

Java OLAP Interface (JOLAP) is dead?

It looks like JOLAP is dead. The final specification was approved on June 15th 2004. However, to this day, except for Mondrian and Xelopes, I know of no implementation of JOLAP. According to this thread, Oracle has no intention of ever supporting JOLAP.

On the other hand, Oracle doesn’t support nor does it plan to support MDX or derived technologies such as XML for Analysis (XMLA) and more recent specifications. But, you can get MDX support in Mondrian and in SQL Server standard edition or better. I am pretty sure IBM supports MDX and maybe XMLA, but with recent changes in their OLAP product line, I must admit I’m a bit confused.

This leaves us with no cross-platform OLAP query standard. After all these failed attempts, it is very depressing.

Update: Daniel Guerrero from Ideasoft correctly pointed out to me that the current JOLAP spec. has not been published yet as a Final Release, but only as a Final Draft. The Final Draft was approved in June 2004 (though IBM abstained), and normally, the Final Draft ought to be a Final Release by now, but this didn’t happen. The difference is significant because, right now, the JOLAP license, granted by Hyperion, is for evaluation purposes only. This means you can’t go out and implement JOLAP without risking legal trouble. We can imagine many scenarios for what is happening, but I’ll vote for an Intellectual Property issue.

IBM, Oracle and Microsoft freeing their databases

Oracle has recently made available their Oracle Database 10g Express Edition. It is limited to one processor, 4GB of disk space and 1GB of memory. It is not sufficient for even a small data warehousing project, but it is great for teaching a class. It is available for Linux and Windows.

Microsoft recently made available for free its SQL Server 2005 Express Edition. Obviously only available under Windows. It lacks enterprise features, it is limited to one CPU, 1GB of memory and 4GB of disk space: basically the same limitations as the Oracle Database 10g Express Edition.

IBM is thinking about doing the same with DB2. Currently, it offers the free Java-based Cloudscape database running on any standard Java Virtual Machine (JVM). They also offer a free PHP-bound version of DB2 called Zend Core available for Linux and AIX, and to be available for Windows.

However, it is not like you are limited to what IBM, Oracle and Microsoft have to offer or have to accept the limitations of their “free” products. There are many good free and open source databases such as MySQL, PostgreSQL, MaxDB, Firebird or Ingres. None of these free alternatives is as powerful as an Oracle database, but if you compare what you can buy with zero dollars, the big guys don’t necessarily come out on top.

Idea for a cool AJAX-based project: a web-based slide projector

Thanks to tools like HTML Slidy and S5, you can build nice slide shows using HTML, CSS and some Javascript.

But what if you are lecturing at a distance? Imagine people get to watch you by videoconference while watching your slides. How do they know when to go to the next slide and so on? The solution, I believe, is to use AJAX. With a little PHP or Python based server-side script, you could control which slide must be displayed and anyone looking at your slide show would be automatically moved to the right slide.

That’s it. So simple. Anyone dare try it out? I bet it can be done with 200 lines of PHP and 200 lines of JavaScript, no more.

For extra points, make it so that the administrator can upload a picture (maybe using a webcam) which gets displayed in the right top corner, so that it feels a bit more like a videoconference.
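To give an idea of how little is needed on the server side, here is a rough Python sketch using only the standard library. Everything in it is an assumption made for illustration: the /current and /goto paths, the SECRET key that protects the presenter's command, and the names SlideState, SlideHandler and make_server. Viewers would poll /current; the presenter would request /goto?n=5&key=... to move everybody to slide 5.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class SlideState:
    """Thread-safe holder for the slide number everyone should be viewing."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slide = 1

    def get(self):
        with self._lock:
            return self._slide

    def set(self, n):
        with self._lock:
            self._slide = max(1, n)  # slide numbers start at 1


STATE = SlideState()
SECRET = "change-me"  # hypothetical shared secret known only to the presenter


class SlideHandler(BaseHTTPRequestHandler):
    def _reply(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        url = urlparse(self.path)
        query = parse_qs(url.query)
        if url.path == "/current":
            # Polled by every viewer's browser.
            self._reply(200, {"slide": STATE.get()})
        elif url.path == "/goto":
            # Used by the presenter only; rejected without the right key.
            if query.get("key", [""])[0] != SECRET:
                self._reply(403, {"error": "forbidden"})
                return
            try:
                STATE.set(int(query.get("n", [""])[0]))
            except ValueError:
                self._reply(400, {"error": "bad slide number"})
                return
            self._reply(200, {"slide": STATE.get()})
        else:
            self._reply(404, {"error": "not found"})

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(port=8000):
    """Build the server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), SlideHandler)
```

Running make_server().serve_forever() starts it locally. The JavaScript in the slide show would then fetch /current every second or so and jump to the returned slide whenever it changes.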

Cross-platform videoconferencing/slides sharing: still a long way to go?

With Owen Kaser and Yuhong Yan, I am organizing our eBusiness Technologies course for this winter.

Now, Yuhong is in Fredericton, Owen is in Saint John and I’m in Montreal. How do we give one course all together? Last time was through expensive videoconferencing equipment, but this time, we are looking at cheaper, PC-based solutions. Owen and I are Linux users, Yuhong is a Windows user. Yuhong suggested we use IBM Sametime. I thought “Great! IBM is very pro-Linux!” At first, it looked great because the preferred Sametime client is Java-based with an added browser plugin.

Well, after 4 hours of fun, the results are so-so. My experience with Sametime, both under Linux and Windows, wasn’t great. Under both OSes, Firefox crashed on the first attempt to use Sametime. I didn’t get this problem using Internet Explorer. Restarting Firefox after the first crash fixed the problem. Then, the desktop sharing feature was, at first, completely disabled for me under Linux. Restarting Firefox a third time fixed the problem. However, desktop sharing under Linux was not great: in order to share a window, it has to be entirely visible, which means you must keep the window above the others. You don’t have this problem under Windows.

Finally, videoconferencing is entirely disabled under Linux whereas it worked well under Windows. The only glitch with videoconferencing under Windows is that if you want to select an input device other than the default one, you have to restart Sametime for the changes to take effect, but you get no dialog box warning you about it. Oh! And the license for Sametime was around $20k, though the client is available for free as a Java applet.

So, why not use gnomemeeting (Linux) and netmeeting (Windows)? Because gnomemeeting makes no effort to support the full T.120 protocol: this means no desktop sharing features under gnomemeeting for the foreseeable future. Of course, you can use VNC, but if all you want to do is broadcast slides remotely, it is overkill, and since you don’t have integrated chatting and videoconferencing, this means a lot of fiddling around.

What I’m looking for is simple. I want basic videoconferencing and slides sharing (to display my PDF slides remotely) between Windows and Linux (and MacOS). It is sad to see that in 2005, I still can’t get this to work.

As I was writing this, I was reminded of a post by Harold on Marratech which is available for Linux, Windows and MacOS. The best thing is that you can try Marratech for free (though without the desktop sharing feature) and even pay as you go at a rate of around $36 a day. Maybe that’s the solution I’m looking for? (I’m not related in any way to Marratech and I don’t even claim their product works. I haven’t tried it.)

Me with my new son Louka!

Finally, a picture of me with my new son, Louka, by my fireplace, no less. I had to remove red eyes using gimp.

Can you infer tags from text?

The buzz is all about tags these days. Tagyu is an interesting tool which claims to suggest tags based on the text content of the page. I’d like to see a description of the algorithm, but I see none.

  • http://www.daniel-lemire.com/ gets the tags “firefox” and “web2.0”.
  • http://www.daniel-lemire.com/en/ gets the tag “job”.
  • http://www.daniel-lemire.com/fr/ gets the tags “france” and “uqam”.

It seems like the tags for my blog make sense, but the tags for my home pages (French and English) are really bad. Tagging my French home page with “france”? Maybe because I use the French language? It is a bit of a stretch. Tagging my English home page with “job”? No. I don’t think so.

The problem is interesting and I bet there are solid solutions, but we are not there yet.

I also question whether collaborative tags have a future. I must admit I don’t use them, so I won’t comment much further, but it is a bit too empirical for my taste.
