Monday, October 17th, 2005

Where does the logarithm of the standard deviation come from in model selection?

Filed under: — Daniel Lemire @ 10:02

Update: This is a failed experiment. Online TeX-to-MathML conversion simply isn’t fast enough to be usable. What is needed is server-side support, but I don’t trust current WordPress plugins.

(This post requires MathML and JavaScript support: use Firefox or a MathML plugin such as MathPlayer. The inline MathML will also not display in an RSS aggregator.)

In several signal processing and data mining applications, when people use a probabilistic model, the logarithm of the standard deviation appears next to what is otherwise a standard error measure. Until recently, I had been too lazy to work out where the logarithm comes from, but I finally figured it out, in part thanks to my friend Yuhong Yan.

The Normal Distribution can be defined by the following density function:

`f(x;\mu,\sigma)= \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x- \mu)^2}{2\sigma^2} }`.

Ah! You see this exponential function? That’s where the logarithm will come from!

Suppose you have `m` independent samples from a normal distribution: `a_1,a_2, \ldots, a_m`. Because the samples are independent, their joint density is the product of the individual densities:

`f(a_1,a_2, \ldots, a_m;\mu,\sigma,m)= \frac{1}{(\sigma\sqrt{2\pi})^m} e^ { -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2} }`.

The logarithm of this joint density is

`m \log \frac{1}{\sigma\sqrt{2\pi}} -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2}`

or

`-m \log (\sigma\sqrt{2\pi}) - \frac{\sum_{i=1}^{m} (a_i- \mu)^2}{2\sigma^2}`.

You see the last bit? `\sum_{i=1}^{m} (a_i- \mu)^2`? That’s the squared `l_2` error! And the first term? It equals `-m \log \sigma - \frac{m}{2} \log (2\pi)`: that’s the logarithm of the standard deviation, plus a constant!

Hence, whenever you see the `l_2` error combined with the logarithm of the standard deviation, chances are that you are looking at the logarithm of a normal density!
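To make this concrete, here is a minimal Python sketch that checks the identity numerically: it sums the logarithms of the individual normal densities and compares the result with the closed form (a log-of-sigma term minus the scaled squared `l_2` error). The function names, the sample values and the parameters `mu` and `sigma` are made up for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood_direct(samples, mu, sigma):
    """Sum of the log-densities of independent samples."""
    return sum(math.log(normal_pdf(a, mu, sigma)) for a in samples)

def log_likelihood_closed_form(samples, mu, sigma):
    """-m log(sigma sqrt(2 pi)) minus the squared l_2 error over 2 sigma^2."""
    m = len(samples)
    squared_l2_error = sum((a - mu) ** 2 for a in samples)
    return -m * math.log(sigma * math.sqrt(2 * math.pi)) - squared_l2_error / (2 * sigma ** 2)

# Illustrative data only (not from any real experiment).
samples = [1.2, 0.7, 2.1, 1.5, 0.9]
mu, sigma = 1.0, 0.8

direct = log_likelihood_direct(samples, mu, sigma)
closed = log_likelihood_closed_form(samples, mu, sigma)
print(direct, closed)  # the two values should agree up to rounding error
```

The direct version is only there as a check. In practice you would use the closed form, since multiplying many small densities together underflows quickly.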

In particular, this trick applies to the Bayesian information criterion (BIC), which is used to select a model by minimizing (or, with the opposite sign convention, maximizing) a penalized log-likelihood such as `-2 \log L + k \log(n)`, where `L` is the maximized likelihood, `k` represents the number of parameters in the fitted model and `n` the number of observations. When the model is normal, the log-likelihood component can be computed using the above analysis.
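As a rough sketch of how this plays out, the Python snippet below computes the BIC of a plain normal model fitted by maximum likelihood. It rests on assumptions of my own: both `mu` and `sigma` are estimated from the data (so `k = 2`), and the function names and sample values are hypothetical. With the maximum-likelihood `sigma`, the squared-error term collapses to a constant, so the criterion is essentially `2 n \log \sigma` plus terms that depend only on `n` and `k`.

```python
import math

def fit_gaussian(samples):
    """Maximum-likelihood estimates of mu and sigma (sigma divides by n, not n - 1)."""
    n = len(samples)
    mu_hat = sum(samples) / n
    sse = sum((a - mu_hat) ** 2 for a in samples)  # squared l_2 error
    return mu_hat, math.sqrt(sse / n)

def gaussian_bic(samples, k=2):
    """BIC = -2 log-likelihood + k log(n) for a fitted normal model.

    k = 2 assumes both mu and sigma were estimated (an assumption of this sketch).
    """
    n = len(samples)
    mu_hat, sigma_hat = fit_gaussian(samples)
    sse = sum((a - mu_hat) ** 2 for a in samples)
    log_lik = -n * math.log(sigma_hat * math.sqrt(2 * math.pi)) - sse / (2 * sigma_hat ** 2)
    return -2 * log_lik + k * math.log(n)

# Illustrative data only.
samples = [1.2, 0.7, 2.1, 1.5, 0.9, 1.1]
n = len(samples)
_, sigma_hat = fit_gaussian(samples)

bic = gaussian_bic(samples)
# Same quantity written so that the log of the standard deviation is explicit.
bic_via_log_sigma = 2 * n * math.log(sigma_hat) + n * math.log(2 * math.pi) + n + 2 * math.log(n)
print(bic, bic_via_log_sigma)
```

When comparing models fitted to the same `n` observations, the terms `n \log(2\pi) + n` are shared by every candidate, which is why what you typically see in practice is just the logarithm of the standard deviation plus a penalty on the number of parameters.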

Reference: Schwarz, G. (1978) “Estimating the Dimension of a Model”, Annals of Statistics, 6, 461-464
