Thursday, October 20th, 2005

Oracle Java Applications on Linux

Filed under: — Daniel Lemire @ 18:09

A nameless university is using Oracle’s JInitiator applets on some management web sites. JInitiator is just Oracle’s version of the JVM, but you can use any recent JVM and be happy. The trick under Linux is to fool the browser into interpreting the MIME type “application/x-jinit-applet” (specific to Oracle) as just an ordinary applet. As it turns out, you just have to edit a small text file called pluginreg.dat.

Reference: Oracle Apps on Linux - AVallark.

See also my posts Oracle buys Hyperion, JOLAP versus the Oracle Java API, IBM, Oracle and Microsoft freeing their databases and Oracle and MySQL — is MySQL in a weak position?


Spoofing your user agent - When Firefox tells the world it is Internet Explorer

Filed under: — Daniel Lemire @ 18:04

Some nameless university has some management web site requiring Internet Explorer. If you ask me, that’s a lot like requiring GM cars on some highways. Such a web site is no longer a web site, but an Internet Explorer site.

You can often get around these problems by using a Firefox extension called “User Agent Switcher”. It adds a menu and a toolbar button to switch the user agent of the browser. In effect, the web site will be fooled into thinking it is dealing with Internet Explorer.

My only regret is that unlike Konqueror, it seems Firefox cannot spoof only specific web sites. You switch your user agent for all sites at once.
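To make the idea concrete, here is a minimal Python sketch of per-site spoofing, the behaviour I wish Firefox had. It only builds the HTTP request and never touches the network. The host name `admin.example.edu` and both user agent strings are made up for illustration; they are assumptions, not values taken from any real site or from the extension.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Illustrative user agent strings (assumptions, not authoritative values).
DEFAULT_UA = "Mozilla/5.0 (X11; Linux i686; rv:1.8) Gecko/20051020 Firefox/1.5"
IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

# Hypothetical host that insists on Internet Explorer.
SPOOFED_HOSTS = {"admin.example.edu": IE6_UA}


def user_agent_for(url):
    """Pick the user agent to send, based only on the host name."""
    host = urlsplit(url).hostname or ""
    return SPOOFED_HOSTS.get(host, DEFAULT_UA)


def build_request(url):
    """Build (but do not send) a request carrying the chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent_for(url)})


if __name__ == "__main__":
    for url in ("http://admin.example.edu/payroll", "http://example.org/"):
        req = build_request(url)
        # urllib stores header names capitalized as "User-agent".
        print(url, "->", req.get_header("User-agent"))
```

The server only sees the header, which is why the trick works: nothing about the browser itself changes.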

Spam bots got to me: no more comments

Filed under: — Daniel Lemire @ 16:48

Spam bots killed my server. I had fancy spam filtering code in place, but it was taking too much juice to filter all the crap being sent at me. This blog is now read-only. There are just too many people buying penis enhancers and falling for get-rich-quick scams. Stop wasting your money.

Wednesday, October 19th, 2005

DEXA 2006 (February 21, 2006 / September 4-8, 2006)

Filed under: Data Warehousing and OLAP, Passed CFP — Daniel Lemire @ 6:53

The 17th International Conference on Database and Expert Systems Applications (DEXA 2006) call for papers is out. It will be held in Krakow, Poland.

The aim of DEXA 2006 is to present both research contributions in the area of data base and intelligent systems and a large spectrum of already implemented or just being developed applications. DEXA will offer the opportunity to extensively discuss requirements, problems, and solutions in the field. The workshop and conference should inspire a fruitful dialogue between developers in practice, users of database and expert systems, and scientists working in the field.

Tuesday, October 18th, 2005

Oracle and MySQL — is MySQL in a weak position?

Filed under: Data Warehousing and OLAP — Daniel Lemire @ 22:45

Oracle has recently bought Innobase, which makes one of the libraries MySQL relies upon for storing its tables. One user on Slashdot had the following insightful comment:

Among the technologies that MySQL licenses from third parties under commercial redistribution licenses:

Berkeley DB (Sleepycat Software)
InnoDB (Oracle, formerly Innobase)
MaxDB (SAP AG)

See the problem? MySQL itself is largely a language parser and a simple and technically inadequate storage engine (for anything where data integrity matters). In other words they don’t own any of the foundations of their technologies.

This is interesting. We always encourage developers to use and reuse existing libraries. Should MySQL be blamed for doing so?

The comparison with PostgreSQL is interesting. PostgreSQL is developed in a decentralized way, whereas MySQL is developed by a single company, using libraries.

I think that MySQL could definitely be a fragile product whose development could be impaired by various business decisions. However, I think this has nothing to do with the fact that MySQL relies on libraries it hasn’t written, but rather with the fact that there is no community of MySQL developers.

Free Software is not a cure for the world’s hunger. However, building software using a highly distributed community might very well be the best possible way to develop generic software.

Research versus Teaching versus Development versus Blogging versus Consulting

Filed under: — Daniel Lemire @ 22:21

I’m working rather intensively on a new course (Information Retrieval and Filtering) which should be offered in 2006 or 2007. This course is really a pleasure. Normally, teaching is something you do seriously, while you either do as much consulting or as much research as you can. You won’t see many university professors spending 60 hours a week preparing a single course. However, sometimes, teaching is something that you can really become passionate about. While I have published work in Information Retrieval, I never paid much attention to the field: I was too busy with my research to stop and fiddle with more elementary concepts such as Zipf’s law, where it comes from and what you can do with it. Thanks to Will Fitzgerald, I now know how to use n-grams and Shannon’s information value to determine the language a text is written in. As a researcher, this is highly enjoyable and likely to help my research.
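For the curious, here is one way the n-gram idea can be sketched in Python. This is my own toy illustration, not necessarily the method Will Fitzgerald described: it counts character trigrams per language, then scores a text by its average Shannon information (in bits per trigram) under each model and picks the cheapest one. The two training sentences, the add-one smoothing and all function names are assumptions made for the example; a real system would train on far more text.

```python
import math
from collections import Counter


def trigrams(text):
    """Return the character trigrams of a lowercased, space-normalized text."""
    text = " ".join(text.lower().split())
    return [text[i:i + 3] for i in range(len(text) - 2)]


def train(samples):
    """Map each language name to a Counter of its trigram frequencies."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def bits_per_trigram(text, counts, vocab_size):
    """Average information content of the text under one trigram model."""
    grams = trigrams(text)
    total = sum(counts.values())
    bits = 0.0
    for gram in grams:
        # Add-one smoothing so unseen trigrams keep a small probability.
        p = (counts[gram] + 1) / (total + vocab_size)
        bits -= math.log2(p)
    return bits / max(len(grams), 1)


def guess_language(text, models):
    """Pick the language whose model needs the fewest bits per trigram."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # +1 slot for "any unseen trigram"
    return min(models, key=lambda lang: bits_per_trigram(text, models[lang], vocab_size))


# Tiny made-up training texts, for illustration only.
MODELS = train({
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "sat on the mat with the other animals",
    "french": "le renard brun rapide saute par dessus le chien paresseux "
              "et le chat est assis sur le tapis avec les autres animaux",
})

if __name__ == "__main__":
    for sentence in ("the dog and the cat", "le chien et le chat"):
        print(sentence, "->", guess_language(sentence, MODELS))
```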

Monday, October 17th, 2005

Where does the logarithm of the standard deviation come from in model selection?

Filed under: — Daniel Lemire @ 10:02

Update: This is a failed experiment. Online TeX to MathML simply doesn’t work fast enough to be usable. What is needed is server-side support, but I don’t trust current WordPress plugins.

(This post requires MathML and JavaScript support: use Firefox or a MathML plugin such as MathPlayer. The inline MathML will also not display in an RSS aggregator.)

In several signal processing and data mining applications, when people use a probabilistic model, the logarithm of the standard deviation appears, the rest being a standard error measure. Until recently, I had been too lazy to work out where the logarithm comes from, but I finally figured it out, in part thanks to my friend Yuhong Yan.

The Normal Distribution can be defined by the following density function:

`f(x;\mu,\sigma)= \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x- \mu)^2}{2\sigma^2} }`.

Ah! You see this exponential function? That’s where the logarithm will come from!

Suppose you have `m` (independent) samples from a normal distribution: `a_1,a_2, \ldots, a_m`. Because the samples are independent, the joint density is the product of the individual densities:

`f(a_1,a_2, \ldots, a_m;\mu,\sigma,m)= \frac{1}{(\sigma\sqrt{2\pi})^m} e^ { -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2} }`.

The logarithm of the joint normal distribution is

`m \log \frac{1}{\sigma\sqrt{2\pi}} -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2}`

or

`-m \log (\sigma\sqrt{2\pi}) - \frac{\sum_{i=1}^{m} (a_i- \mu)^2}{2\sigma^2}`.

You see the last bit? `\sum_{i=1}^{m} (a_i- \mu)^2`? That’s the squared `l_2` error!

Hence, whenever you see the `l_2` error combined with the logarithm of the standard deviation, chances are that you are looking at the logarithm of the normal distribution!
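If you want to convince yourself numerically, the following Python sketch compares the logarithm of the product of the individual densities with the closed form "minus `m` times the log of the scaled standard deviation, minus the squared error over twice the variance". The sample values and the choices of `mu` and `sigma` are arbitrary numbers I made up for the check.

```python
import math


def normal_density(x, mu, sigma):
    """Density of the normal distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def log_likelihood_closed_form(samples, mu, sigma):
    """-m log(sigma sqrt(2 pi)) - (sum of squared errors) / (2 sigma^2)."""
    m = len(samples)
    squared_error = sum((a - mu) ** 2 for a in samples)
    return -m * math.log(sigma * math.sqrt(2 * math.pi)) - squared_error / (2 * sigma ** 2)


# Arbitrary illustrative values (assumptions, not data from any experiment).
samples = [1.2, 0.7, 2.9, 1.8, 2.1]
mu, sigma = 1.5, 0.8

# Logarithm of the joint density, computed the long way.
direct = math.log(math.prod(normal_density(a, mu, sigma) for a in samples))
closed = log_likelihood_closed_form(samples, mu, sigma)

if __name__ == "__main__":
    print("log of product of densities:", direct)
    print("closed form:                ", closed)
```

The two numbers agree up to floating-point rounding, which is all the derivation claims.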

In particular, this trick applies to the Bayesian information criterion (BIC), which selects a model by minimizing a penalized log-likelihood: `-2 \log L + k \log(n)`, where `L` is the maximized likelihood, `k` represents the number of parameters and `n` the number of observations in the fitted model. The log-likelihood component can sometimes be computed using the above analysis.
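As a rough illustration, here is a Python sketch that uses the Gaussian log-likelihood to compute the BIC of two competing models on made-up data: a constant mean versus a straight line. The data, the noise values and the way I count parameters (`k=2` for mean and standard deviation, `k=3` for slope, intercept and standard deviation) are all assumptions for the example. The standard deviation is replaced by its maximum-likelihood estimate, the square root of the mean squared residual.

```python
import math


def gaussian_bic(residuals, k):
    """BIC = -2 log-likelihood + k log(n), assuming Gaussian residuals."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    sigma2 = sse / n  # maximum-likelihood estimate of the variance
    # -n log(sigma sqrt(2 pi)) written as -(n/2) log(2 pi sigma^2)
    log_likelihood = -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)
    return -2 * log_likelihood + k * math.log(n)


# Made-up data: a line plus small deterministic "noise".
xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Model 1: constant mean.
const_residuals = [y - mean_y for y in ys]

# Model 2: least-squares line (closed-form simple linear regression).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
line_residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

bic_const = gaussian_bic(const_residuals, k=2)
bic_line = gaussian_bic(line_residuals, k=3)

if __name__ == "__main__":
    print("BIC, constant model:", bic_const)
    print("BIC, linear model:  ", bic_line)
```

The line pays for one extra parameter through the `k log(n)` term, but its much smaller squared error more than makes up for it, so it gets the lower BIC.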

Reference: Schwarz, G. (1978) “Estimating the Dimension of a Model”, Annals of Statistics, 6, 461-464
