Fisher information is defined in terms of the likelihood function and the score function.
Given a family of probability density functions $f(x;\theta)$ parameterized by a parameter $\theta$, the function $\theta \mapsto f(x;\theta)$ is called the likelihood function and denoted by $L(\theta)$ or $L(\theta; x)$.
The likelihood function is not a probability density function in the parameter space, but it can sometimes be normalized to one and interpreted as such.
The likelihood function is related to the Bayesian point of view on statistics. Its most fundamental application is to deriving maximum likelihood estimators. For instance, for a normally distributed population, the sample mean is the maximum likelihood estimator of the true mean.
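The maximum likelihood claim can be checked numerically. The following is a minimal sketch, assuming a normal model with known standard deviation; the data values, the grid, and the function names are made up for illustration and are not part of the original text.

```python
import math
import statistics


def normal_log_likelihood(mu, data, sigma=1.0):
    """Log-likelihood of the mean mu for i.i.d. N(mu, sigma^2) observations."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)


def grid_mle(data, lo=0.0, hi=5.0, steps=5000):
    """Maximize the log-likelihood over an evenly spaced grid of candidate means."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda mu: normal_log_likelihood(mu, data))


if __name__ == "__main__":
    data = [2.1, 1.9, 3.4, 2.8, 2.3]  # hypothetical sample
    print("grid maximizer:", grid_mle(data))
    print("sample mean:   ", statistics.fmean(data))
```

The grid search is deliberately naive; it only illustrates that the maximizer of the likelihood lands on the sample mean up to the grid resolution.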
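The score function is the derivative of the log-likelihood with respect to the parameter:
$$s(\theta; x) = \frac{\partial}{\partial\theta} \ln L(\theta; x) = \frac{\partial}{\partial\theta} \ln f(x;\theta)$$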
We note that this is a one-parameter family of random variables because of the dependence on the outcome of the experiment.
The likelihood function is a function of the parameter $\theta$ and of the outcome $x$.
Thus, it is a family of random variables indexed by $\theta$. Hence, it makes sense to talk about the expected value and variance of $L(\theta; X)$, where the probability measure is that corresponding to the parameter $\theta$. Similarly, we may ask questions about the expected value and variance of the score function.
Under mild regularity conditions, it is easy to see that the expected value of the score function is zero.
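A short derivation of this fact, written for a density $f(x;\theta)$ and assuming that differentiation with respect to $\theta$ may be exchanged with integration over $x$ (the notation $\operatorname{E}_\theta$ for expectation under the parameter $\theta$ is an assumption of this sketch):

$$\operatorname{E}_\theta\left[\frac{\partial}{\partial\theta} \ln f(X;\theta)\right] = \int \frac{\partial_\theta f(x;\theta)}{f(x;\theta)}\, f(x;\theta)\, dx = \int \frac{\partial}{\partial\theta} f(x;\theta)\, dx = \frac{\partial}{\partial\theta} \int f(x;\theta)\, dx = \frac{\partial}{\partial\theta}\, 1 = 0$$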
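The Fisher information is the variance of the score function. It is denoted by $I(\theta)$.
Thus, explicitly written, and using that the score has expected value zero, Fisher information is defined as:
$$I(\theta) = \operatorname{E}_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X;\theta)\right)^2\right]$$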
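For calculations, the following formula is important:
$$I(\theta) = -\operatorname{E}_\theta\left[\frac{\partial^2}{\partial\theta^2} \ln f(X;\theta)\right]$$
which can be justified under some reasonable integrability and differentiability assumptions, so that one can differentiate twice under the integral sign.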
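The agreement between the variance of the score and the negative expected second derivative can be checked numerically. The sketch below assumes a Poisson family with mean `lam`, chosen here purely for illustration, and sums over a truncated support (`k_max` is an arbitrary cutoff). For this family the log-density is $k \ln\lambda - \lambda - \ln k!$, so the score is $k/\lambda - 1$ and its derivative is $-k/\lambda^2$.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass function, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def score(k, lam):
    """Derivative of the log-pmf with respect to lam."""
    return k / lam - 1.0


def second_derivative(k, lam):
    """Second derivative of the log-pmf with respect to lam."""
    return -k / lam ** 2


def fisher_information_both_ways(lam, k_max=200):
    """Return (mean of score, variance of score, minus expected second derivative)."""
    ks = range(k_max + 1)
    mean_score = sum(score(k, lam) * poisson_pmf(k, lam) for k in ks)
    var_score = sum((score(k, lam) - mean_score) ** 2 * poisson_pmf(k, lam) for k in ks)
    neg_exp_second = -sum(second_derivative(k, lam) * poisson_pmf(k, lam) for k in ks)
    return mean_score, var_score, neg_exp_second


if __name__ == "__main__":
    print(fisher_information_both_ways(3.0))
```

For the Poisson family both expressions should agree with $1/\lambda$, and the mean of the score should be numerically zero.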
Exercises:
- If then
- Calculate the Fisher information for the Bernoulli process.
- Calculate the Fisher information for other distributions.
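The Cramér-Rao inequality yields a lower bound on the variance of an estimator $T(X)$ of a function $\psi(\theta)$ of a parameter $\theta$ of a probability density function $f(x;\theta)$. If $\operatorname{E}_\theta[T(X)] = \psi(\theta)$ for all $\theta$, then:
$$\operatorname{Var}_\theta(T) \ge \frac{\left(\psi'(\theta)\right)^2}{I(\theta)}$$
In particular, if $T$ is an unbiased estimator of $\theta$, i.e.
$$\operatorname{E}_\theta[T(X)] = \theta$$
for all $\theta$, then
$$\operatorname{Var}_\theta(T) \ge \frac{1}{I(\theta)}$$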
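The bound for unbiased estimators can be illustrated by simulation. The sketch below assumes `n` independent draws from a normal distribution with unknown mean and known standard deviation `sigma`, and uses the standard fact that the Fisher information about the mean in such a sample is `n / sigma**2`. The function names, parameter values, and seed are illustrative choices, not part of the original text.

```python
import random
import statistics


def cramer_rao_bound_normal_mean(sigma, n):
    """Reciprocal of the Fisher information n / sigma^2 about the mean."""
    return sigma ** 2 / n


def simulated_variance_of_sample_mean(mu, sigma, n, trials, seed=0):
    """Monte Carlo estimate of the variance of the sample mean of n normal draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        draws = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(draws))
    return statistics.variance(means)


if __name__ == "__main__":
    bound = cramer_rao_bound_normal_mean(sigma=2.0, n=10)
    simulated = simulated_variance_of_sample_mean(mu=1.0, sigma=2.0, n=10, trials=20000)
    print("Cramer-Rao bound:  ", bound)
    print("simulated variance:", simulated)
```

In this model the sample mean attains the bound, so the simulated variance should be close to it up to Monte Carlo error; an estimator that does not attain the bound would show a strictly larger variance.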
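The Hammersley–Chapman–Robbins inequality goes in the same direction, but it is often stronger and does not require differentiability of the density function. For an estimator $T$ with $\operatorname{E}_\theta[T(X)] = \psi(\theta)$:
$$\operatorname{Var}_\theta(T) \ge \sup_{\Delta \neq 0} \frac{\left(\psi(\theta+\Delta) - \psi(\theta)\right)^2}{\operatorname{E}_\theta\left[\left(\frac{f(X;\theta+\Delta)}{f(X;\theta)} - 1\right)^2\right]}$$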
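The Cramér-Rao bound generalizes to random vectors, where the variance is replaced with the covariance matrix and the Fisher information becomes a symmetric matrix. The inequality is interpreted as positive semidefiniteness of the difference of two symmetric matrices: for an unbiased estimator $T$ of a vector parameter $\theta$, the matrix $\operatorname{Cov}_\theta(T) - I(\theta)^{-1}$ is positive semidefinite.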