Update: This is a failed experiment. Online TeX to MathML conversion simply doesn’t work fast enough to be usable. What is needed is server-side support, but I don’t trust current WordPress plugins.
(This post requires MathML and JavaScript support: use Firefox or a MathML plugin such as MathPlayer. The inline MathML will also not display in an RSS aggregator.)
In several signal processing and data mining applications, when people use a probabilistic model, the logarithm of the standard deviation appears alongside what is otherwise a standard error measure. Until recently, I had been too lazy to work out where the logarithm comes from, but I finally figured it out, in part thanks to my friend Yuhong Yan.
The Normal Distribution can be defined by the following density function:
`f(x;\mu,\sigma)= \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x- \mu)^2}{2\sigma^2} }`.
Ah! You see this exponential function? That’s where the logarithm will come from!
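Suppose you have `m` (independent) samples from a normal distribution: `a_1,a_2, \ldots, a_m`. Because the samples are independent, their joint density is the product of the individual densities, so the joint normal distribution has the following density function: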
`f(a_1,a_2, \ldots, a_m;\mu,\sigma,m)= \frac{1}{(\sigma\sqrt{2\pi})^m} e^ { -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2} }`.
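As a quick sanity check, here is a minimal Python sketch comparing the product of the individual densities with the closed-form joint density. The function names, the sample values, and the parameters `mu` and `sigma` are made up for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a single normal sample."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def joint_pdf(samples, mu, sigma):
    """Closed-form joint density of m independent normal samples."""
    m = len(samples)
    squared_error = sum((a - mu) ** 2 for a in samples)
    return math.exp(-squared_error / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi)) ** m

# Hypothetical data and parameters, chosen only for illustration.
samples = [0.5, 1.2, -0.3, 2.0]
mu, sigma = 1.0, 1.5

# Independence means the joint density is the product of the individual ones.
product = math.prod(normal_pdf(a, mu, sigma) for a in samples)
joint = joint_pdf(samples, mu, sigma)
print(product, joint)  # the two values should agree up to rounding
```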
The logarithm of the joint normal distribution is
`m \log \frac{1}{\sigma\sqrt{2\pi}} -\sum_{i=1}^{m} \frac{(a_i- \mu)^2}{2\sigma^2}`
or
`-m \log (\sigma\sqrt{2\pi}) - \frac{\sum_{i=1}^{m} (a_i- \mu)^2}{2\sigma^2}`.
You see the last bit? `\sum_{i=1}^{m} (a_i- \mu)^2`? That’s the squared `l_2` error!
Hence, whenever you see the `l_2` error combined with the logarithm of the standard deviation, chances are that you are looking at the logarithm of the normal distribution!
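To make this concrete, here is a small Python sketch that computes the log-likelihood as "log of the standard deviation plus squared error" and checks it against the sum of the logarithms of the individual densities. The helper names and the data are assumptions for illustration only.

```python
import math

def normal_log_likelihood(samples, mu, sigma):
    """-m log(sigma sqrt(2 pi)) minus the squared l_2 error over 2 sigma^2."""
    m = len(samples)
    squared_error = sum((a - mu) ** 2 for a in samples)
    return -m * math.log(sigma * math.sqrt(2 * math.pi)) - squared_error / (2 * sigma ** 2)

def normal_log_pdf(x, mu, sigma):
    """Logarithm of the density of a single normal sample."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical data and parameters, chosen only for illustration.
samples = [0.5, 1.2, -0.3, 2.0]
mu, sigma = 1.0, 1.5

closed_form = normal_log_likelihood(samples, mu, sigma)
summed = sum(normal_log_pdf(a, mu, sigma) for a in samples)
print(closed_form, summed)  # the two values should agree up to rounding
```

Working with the logarithm rather than the density itself also avoids numerical underflow when `m` is large, since a product of many small densities quickly becomes too small to represent.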
In particular, this trick applies to the Bayesian information criterion (BIC), which is used to select a model by minimizing `-2 \log(L) + k \log(n)`, where `L` is the maximized likelihood of the fitted model, `k` is the number of parameters and `n` is the number of observations. When the errors are assumed to be normally distributed, the log-likelihood component can sometimes be computed using the above analysis.
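As an illustration, here is a hedged Python sketch of the BIC for the simplest possible model: the data are assumed to be independent draws from a single normal distribution whose mean and standard deviation are both estimated by maximum likelihood, so `k = 2`. The function names and the data are hypothetical, and a real model would plug in its own residuals and parameter count.

```python
import math

def normal_log_likelihood(samples, mu, sigma):
    """Log of the joint normal density of independent samples."""
    m = len(samples)
    squared_error = sum((a - mu) ** 2 for a in samples)
    return -m * math.log(sigma * math.sqrt(2 * math.pi)) - squared_error / (2 * sigma ** 2)

def gaussian_bic(samples):
    """BIC of a single normal distribution fitted by maximum likelihood."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # Maximum-likelihood estimate of sigma (divides by n, not n - 1).
    sigma_hat = math.sqrt(sum((a - mu_hat) ** 2 for a in samples) / n)
    k = 2  # assumption: two fitted parameters, the mean and the standard deviation
    return -2 * normal_log_likelihood(samples, mu_hat, sigma_hat) + k * math.log(n)

# Hypothetical data, chosen only for illustration.
samples = [0.5, 1.2, -0.3, 2.0, 1.1]
bic = gaussian_bic(samples)
print(bic)
```

At the maximum-likelihood estimates the squared-error term collapses to `n/2`, so the BIC reduces to `2n \log(\hat\sigma\sqrt{2\pi}) + n + k \log(n)`: the logarithm of the standard deviation is all that is left of the fit.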
Reference: Schwarz, G. (1978) “Estimating the Dimension of a Model”, Annals of Statistics, 6, 461-464