In estimation theory and statistics, the Cramér–Rao bound (CRB) or Cramér–Rao lower bound (CRLB), named in honor of Harald Cramér and Calyampudi Radhakrishna Rao who were among the first to derive it,[1][2][3] expresses a lower bound on the variance of estimators of a deterministic parameter. The bound is also known as the Cramér–Rao inequality or the information inequality.

In its simplest form, the bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information. An unbiased estimator which achieves this lower bound is said to be efficient. Such a solution achieves the lowest possible mean squared error among all unbiased methods, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased technique exists which achieves the bound. This may occur even when an MVU estimator exists.

The Cramér–Rao bound can also be used to bound the variance of biased estimators. In some cases, a biased approach can result in both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.

## Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.

### Scalar unbiased case

Suppose θ is an unknown deterministic parameter which is to be estimated from measurements x, distributed according to some probability density function f(x;θ). The variance of any unbiased estimator $\hat{\theta}$ of θ is then bounded below by the inverse of the Fisher information I(θ):

$\mathrm{var}(\hat{\theta})\geq\frac{1}{I(\theta)}$

where the Fisher information I(θ) is defined by

$I(\theta) = \mathrm{E} \left[ \left( \frac{\partial \ell(x;\theta)}{\partial\theta} \right)^2 \right] = -\mathrm{E}\left[ \frac{\partial^2 \ell(x;\theta)}{\partial\theta^2} \right]$

and $\ell(x;\theta)=\log f(x;\theta)$ is the natural logarithm of the likelihood function and E denotes the expected value.
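For a discrete model the expectation is a finite sum, so the two expressions for I(θ) can be checked against each other exactly. The sketch below uses a single Bernoulli(p) observation. This example was chosen here for illustration, and its probability mass function plays the role of f. Both expressions should equal 1/(p(1 − p)). The function name is hypothetical.

```python
import math


def bernoulli_fisher_information(p):
    """Return I(p) for one Bernoulli(p) observation, computed two ways.

    Expectations are exact sums over the support x in {0, 1}, using
    l(x; p) = x log p + (1 - x) log(1 - p).
    """
    pmf = {0: 1.0 - p, 1: p}

    def score(x):
        # First derivative of the log-likelihood with respect to p
        return x / p - (1 - x) / (1 - p)

    def curvature(x):
        # Second derivative of the log-likelihood with respect to p
        return -x / p**2 - (1 - x) / (1 - p) ** 2

    i_from_score = sum(w * score(x) ** 2 for x, w in pmf.items())
    i_from_curvature = -sum(w * curvature(x) for x, w in pmf.items())
    return i_from_score, i_from_curvature


p = 0.3
i_score, i_curv = bernoulli_fisher_information(p)
print(i_score, i_curv, 1 / (p * (1 - p)), math.isclose(i_score, i_curv))
```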

The efficiency of an unbiased estimator $\hat{\theta}$ measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as

$e(\hat{\theta}) = \frac{I(\theta)^{-1}}{{\rm var}(\hat{\theta})}$

that is, the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives $e(\hat{\theta}) \le 1.$
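When the bound is known, efficiency can be estimated by simulation. The sketch below assumes n independent N(θ, σ²) observations with σ known. For this model the bound on unbiased estimators of θ is σ²/n (derived in the Examples section). The sketch compares two estimators that are unbiased for this symmetric model: the sample mean and the sample median. The mean's efficiency should come out near 1. The median's should come out near 2/π ≈ 0.64, its well-known large-sample value. The function name, sample size and replication count are illustrative choices.

```python
import random
import statistics


def mc_efficiency(estimator, n=101, sigma=1.0, theta=0.0, reps=4000, seed=0):
    """Monte Carlo estimate of e = (1/I) / var(estimator) for N(theta, sigma^2) data.

    For n i.i.d. normal observations with known sigma, 1/I(theta) = sigma^2 / n.
    A fixed seed means every estimator is evaluated on the same simulated samples.
    """
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    crb = sigma**2 / n
    return crb / statistics.variance(estimates)


e_mean = mc_efficiency(statistics.fmean)
e_median = mc_efficiency(statistics.median)
print(f"mean: {e_mean:.3f}, median: {e_median:.3f}")
```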

### General scalar case

A more general form of the bound can be obtained by considering an unbiased estimator T(X) of a function ψ(θ) of the parameter θ. Here, unbiasedness is understood as stating that E{T(X)} = ψ(θ). In this case, the bound is given by

$\mathrm{var}(T)\geq\frac{[\psi'(\theta)]^2}{I(\theta)}$

where ψ'(θ) is the derivative of ψ(θ), and I(θ) is the Fisher information defined above.

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator $\hat{\theta}$ with bias $b(\theta) = E\{\hat{\theta}\} - \theta$, and let ψ(θ) = b(θ) + θ. By the result above, any estimator whose expectation is ψ(θ) (that is, any unbiased estimator of ψ(θ)) has variance greater than or equal to $[\psi'(\theta)]^2 / I(\theta)$. Thus, any estimator $\hat{\theta}$ whose bias is given by a function b(θ) satisfies

$\mathrm{var} \left(\hat{\theta}\right)\geq\frac{[1+b'(\theta)]^2}{I(\theta)}.$

Clearly, the unbiased version of the bound is a special case of this result, with b(θ) = 0.
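The following illustration of the biased bound was chosen here and is not taken from the sources. Consider the shrinkage estimator c·x̄ of a normal mean θ, computed from n observations with known σ². Its bias is b(θ) = (c − 1)θ, so 1 + b′(θ) = c. The bound is therefore c²σ²/n, which is exactly the variance of c·x̄. Its mean squared error is c²σ²/n + (c − 1)²θ², which can fall below the unbiased bound σ²/n when θ is close to zero. The simulation below checks both statements. The parameter values and the function name are arbitrary.

```python
import random
import statistics


def shrinkage_check(c=0.5, theta=0.2, sigma=1.0, n=25, reps=20000, seed=1):
    """Simulate the biased estimator c * mean(X) for X_i ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    estimates = [
        c * statistics.fmean([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    fisher = n / sigma**2  # I(theta) for n observations
    b_prime = c - 1        # derivative of the bias b(theta) = (c - 1) * theta
    return {
        "biased_bound": (1 + b_prime) ** 2 / fisher,
        "mc_variance": statistics.variance(estimates),
        "unbiased_bound": 1 / fisher,
        "mc_mse": statistics.fmean([(e - theta) ** 2 for e in estimates]),
    }


shrink = shrinkage_check()
print(shrink)
```

With these values the biased bound is 0.01, the expected mean squared error is 0.02, and the unbiased bound is 0.04.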

### Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector

$\boldsymbol{\theta} = \left[ \theta_1, \theta_2, \dots, \theta_d \right]^T \in \mathbb{R}^d$

with probability density function $f(x; \boldsymbol{\theta})$ which satisfies the two regularity conditions below.

The Fisher information matrix is a $d \times d$ matrix with element Im,k defined as

$I_{m, k} = \mathrm{E} \left[ \frac{\partial}{\partial\theta_m} \log f\left(x; \boldsymbol{\theta}\right) \frac{\partial}{\partial\theta_k} \log f\left(x; \boldsymbol{\theta}\right)\right].$

Let $\boldsymbol{T}(X)$ be an estimator of any vector function of parameters, $\boldsymbol{T}(X) = (T_1(X), \ldots, T_n(X))^T$, and denote its expectation vector $\mathrm{E}[\boldsymbol{T}(X)]$ by $\boldsymbol{\psi}(\boldsymbol{\theta})$. The Cramér–Rao bound then states that the covariance matrix of $\boldsymbol{T}(X)$ satisfies

$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\geq \frac {\partial \boldsymbol{\psi} \left(\boldsymbol{\theta}\right)} {\partial \boldsymbol{\theta}}[I\left(\boldsymbol{\theta}\right)]^{-1}\left( \frac {\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)} {\partial \boldsymbol{\theta}}\right)^T$

where

• The matrix inequality $A \ge B$ is understood to mean that the matrix $A - B$ is positive semidefinite, and
• $\partial \boldsymbol{\psi}(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ is a matrix whose ijth element is given by $\partial \psi_i(\boldsymbol{\theta})/\partial \theta_j$.

If $\boldsymbol{T}(X)$ is an unbiased estimator of $\boldsymbol{\theta}$ (i.e., $\boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}$), then the Cramér–Rao bound reduces to

$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\geq I\left(\boldsymbol{\theta}\right)^{-1}.$
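As a concrete case of the multivariate bound, chosen here for illustration, take θ = (μ, σ²) for N(μ, σ²) observations. The sketch estimates the per-observation Fisher information matrix as the average outer product of the score vector. The result should be close to diag(1/σ², 1/(2σ⁴)). The sketch then inverts n times this matrix to obtain the bound on the covariance matrix of any unbiased estimator of (μ, σ²) from n observations. Function names are hypothetical.

```python
import random


def mc_fisher_matrix(mu, v, draws=100_000, seed=2):
    """Monte Carlo estimate of the 2x2 Fisher information of one N(mu, v)
    observation, with theta = (mu, v), as E[score score^T]."""
    rng = random.Random(seed)
    sd = v ** 0.5
    info = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(draws):
        x = rng.gauss(mu, sd)
        # Partial derivatives of log f(x; mu, v) with respect to mu and v
        score = ((x - mu) / v, (x - mu) ** 2 / (2 * v**2) - 1 / (2 * v))
        for m in range(2):
            for k in range(2):
                info[m][k] += score[m] * score[k] / draws
    return info


def inverse_2x2(a):
    """Closed-form inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


mu, v, n = 1.0, 2.0, 50
info = mc_fisher_matrix(mu, v)
# Bound on the covariance of unbiased estimators from n observations: (n I)^{-1}
crb = inverse_2x2([[n * x for x in row] for row in info])
print(info)  # analytic value: [[1/v, 0], [0, 1/(2 v^2)]] = [[0.5, 0], [0, 0.125]]
print(crb)   # analytic value: [[v/n, 0], [0, 2 v^2/n]] = [[0.04, 0], [0, 0.16]]
```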

### Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, f(x;θ), and the estimator T(X):

• The Fisher information is always defined; equivalently, for all x such that f(x;θ) > 0,
$\frac{\partial}{\partial\theta} \ln f(x;\theta)$
exists, and is finite.
• The operations of integration with respect to x and differentiation with respect to θ can be interchanged in the expectation of T; that is,
$\frac{\partial}{\partial\theta} \left[ \int T(x) f(x;\theta) \,dx \right] = \int T(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right] \,dx$
whenever the right-hand side is finite.
This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases holds:
1. The function f(x;θ) has bounded support in x, and the bounds do not depend on θ;
2. The function f(x;θ) has infinite support, is continuously differentiable, and the integral converges uniformly for all θ.
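A standard textbook example, not taken from the sources above, shows what goes wrong when the support depends on θ. Take X uniform on [0, θ]. Formally differentiating log f = −log θ gives a "score" of −1/θ, whose expectation is not zero. Plugging its second moment into the bound would suggest var ≥ θ²/n. However, the unbiased estimator (n + 1)/n · max Xᵢ has variance θ²/(n(n + 2)), which is smaller. The simulation below illustrates this for arbitrary parameter choices. The function name is hypothetical.

```python
import random
import statistics


def uniform_max_estimator(theta=1.0, n=10, reps=20000, seed=3):
    """Simulate the unbiased estimator (n + 1) / n * max(X_i), X_i ~ U(0, theta)."""
    rng = random.Random(seed)
    estimates = [
        (n + 1) / n * max(rng.uniform(0, theta) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.fmean(estimates), statistics.variance(estimates)


theta, n = 1.0, 10
mean_est, var_est = uniform_max_estimator(theta, n)
naive_bound = theta**2 / n            # invalid "bound" from the formal score -1/theta
exact_var = theta**2 / (n * (n + 2))  # true variance of (n + 1)/n * max(X_i)
print(mean_est, var_est, exact_var, naive_bound)
```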

### Simplified form of the Fisher information

Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of f(x;θ) as well, i.e.,

$\frac{\partial^2}{\partial\theta^2} \left[ \int T(x) f(x;\theta) \,dx \right] = \int T(x) \left[ \frac{\partial^2}{\partial\theta^2} f(x;\theta) \right] \,dx.$

In this case, it can be shown that the Fisher information equals

$I(\theta)= -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \log f(X;\theta) \right].$

The Cramér–Rao bound can then be written as

$\mathrm{var} \left(\widehat{\theta}\right)\geq\frac{1}{I(\theta)}=\frac{1}{ -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \log f(X;\theta) \right]}.$

In some cases, this formula gives a more convenient technique for evaluating the bound.
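When the log-likelihood is awkward to differentiate by hand, the curvature form can be evaluated numerically. The sketch below is a hypothetical helper, not a library routine. It approximates the second derivative by a central finite difference and sums over a truncated discrete support. For a Poisson(λ) observation, used here as an illustrative model, the result should be close to the known value I(λ) = 1/λ.

```python
import math


def poisson_logpmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fisher_info_from_curvature(logpmf, theta, support, h=1e-3):
    """I(theta) = -E[d^2/dtheta^2 log f(X; theta)].

    The second derivative is approximated by a central finite difference, and
    the expectation is a sum over a (truncated) discrete support.
    """
    total = 0.0
    for k in support:
        d2 = (logpmf(k, theta + h) - 2 * logpmf(k, theta) + logpmf(k, theta - h)) / h**2
        total += math.exp(logpmf(k, theta)) * d2
    return -total


lam = 4.0
i_numeric = fisher_info_from_curvature(poisson_logpmf, lam, range(100))
print(i_numeric, 1 / lam)
```

For λ = 4, the support is truncated at 100. The probability mass beyond that point is negligible.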

## Single-parameter proof

The following is a proof of the general scalar case of the Cramér–Rao bound, which was described above; namely, that if the expectation of T is denoted by ψ(θ), then, for all θ,

${\rm var}(t(X)) \geq \frac{[\psi^\prime(\theta)]^2}{I(\theta)}.$

Let X be a random variable with probability density function f(x;θ). Here T = t(X) is a statistic, which is used as an estimator for ψ(θ). If V is the score, i.e.,

$V = \frac{\partial}{\partial\theta} \ln f(X;\theta)$

then the expectation of V, written E(V), is zero. This follows from the second regularity condition with T ≡ 1: $\mathrm{E}(V)=\int \frac{\partial}{\partial\theta} f(x;\theta)\,dx=\frac{\partial}{\partial\theta}\int f(x;\theta)\,dx=\frac{\partial}{\partial\theta}1=0$. If we consider the covariance cov(V,T) of V and T, we have cov(V,T) = E(VT), because E(V) = 0. Expanding this expression we have

${\rm cov}(V,T)={\rm E}\left( T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta)\right)$

This may be expanded using the chain rule

$\frac{\partial}{\partial\theta} \ln Q = \frac{1}{Q}\frac{\partial Q}{\partial\theta}$

and the definition of expectation gives, after cancelling f(x;θ),

${\rm E} \left( T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta)\right)=\int t(x) \left[ \frac{\partial}{\partial\theta} f(x;\theta) \right]\, dx=\frac{\partial}{\partial\theta}\left[ \int t(x)f(x;\theta)\,dx\right]=\psi^\prime(\theta)$

because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality, applied to the random variables V and T, shows that

$\sqrt{ {\rm var} (T) {\rm var} (V)} \geq \left| {\rm cov}(V,T) \right| = \left | \psi^\prime (\theta)\right |$

therefore, since E(V) = 0 implies var(V) = E(V²) = I(θ),

${\rm var\ } T \geq \frac{[\psi^\prime(\theta)]^2}{{\rm var} (V)}=\frac{[\psi^\prime(\theta)]^2}{I(\theta)}=\left[ \frac{\partial}{\partial\theta} {\rm E} (T)\right]^2\frac{1}{I(\theta)}$

which proves the proposition.
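The two key steps of the proof can be checked by simulation: cov(V, T) = ψ′(θ), and the Cauchy–Schwarz bound. The sketch assumes X ~ N(θ, 1), for which the score is V = X − θ and I(θ) = 1. It uses the illustrative statistic T = X², whose expectation is ψ(θ) = θ² + 1. Then ψ′(θ) = 2θ, and var(T) = 4θ² + 2 should exceed the bound 4θ². The function name is hypothetical.

```python
import random
import statistics


def proof_check(theta=1.5, draws=100_000, seed=4):
    rng = random.Random(seed)
    xs = [rng.gauss(theta, 1.0) for _ in range(draws)]
    v = [x - theta for x in xs]  # score of N(theta, 1): d/dtheta log f = x - theta
    t = [x * x for x in xs]      # statistic T = X^2, with E[T] = theta^2 + 1
    v_bar, t_bar = statistics.fmean(v), statistics.fmean(t)
    cov_vt = sum((a - v_bar) * (b - t_bar) for a, b in zip(v, t)) / (draws - 1)
    return {
        "mean_score": v_bar,               # should be near E(V) = 0
        "cov_vt": cov_vt,                  # should be near psi'(theta) = 2 * theta
        "var_t": statistics.variance(t),   # should be near 4 * theta**2 + 2
        "var_v": statistics.variance(v),   # should be near I(theta) = 1
        "bound": (2 * theta) ** 2 / 1.0,   # [psi'(theta)]^2 / I(theta)
    }


checks = proof_check()
print(checks)
```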

## Examples

### Multivariate normal distribution

For the case of a d-variate normal distribution

$\boldsymbol{x}\sim \mathcal{N}_d\left( \boldsymbol{\mu} \left( \boldsymbol{\theta} \right) , C \left( \boldsymbol{\theta} \right)\right)$
$f\left( \boldsymbol{x}; \boldsymbol{\theta} \right)=\frac{1}{\sqrt{ (2\pi)^d \left| C \right| }}\exp\left( -\frac{1}{2} \left( \boldsymbol{x} - \boldsymbol{\mu} \right)^{T} C^{-1} \left( \boldsymbol{x} - \boldsymbol{\mu} \right)\right).$

The Fisher information matrix has elements

$I_{m, k}=\frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m}C^{-1}\frac{\partial \boldsymbol{\mu}}{\partial \theta_k}+\frac{1}{2}\mathrm{tr}\left( C^{-1} \frac{\partial C}{\partial \theta_m} C^{-1} \frac{\partial C}{\partial \theta_k}\right)$

where "tr" is the trace.
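Once ∂μ/∂θ_m and ∂C/∂θ_m are known, the formula above can be evaluated mechanically. The following dependency-free sketch computes I_{m,k}. Its helper names are hypothetical. The matrix inverse is a plain Gauss–Jordan elimination, suitable only for small, well-conditioned C. The usage example takes N = 5 observations, θ = (mean, variance) and C = σ²I. It should reproduce N/σ² for the mean, as in the white-noise example below, and N/(2(σ²)²) for the variance.

```python
def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_inv(a):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def gaussian_fisher(dmu, c, dc):
    """I[m][k] for x ~ N(mu(theta), C(theta)).

    dmu[m] is the vector d mu / d theta_m; dc[m] is the matrix d C / d theta_m.
    """
    c_inv = mat_inv(c)
    d, n = len(dmu), len(c)
    info = [[0.0] * d for _ in range(d)]
    for m in range(d):
        for k in range(d):
            mean_term = sum(dmu[m][i] * c_inv[i][j] * dmu[k][j]
                            for i in range(n) for j in range(n))
            p = mat_mul(mat_mul(c_inv, dc[m]), mat_mul(c_inv, dc[k]))
            info[m][k] = mean_term + 0.5 * sum(p[i][i] for i in range(n))
    return info


N, var = 5, 2.0
identity = [[float(i == j) for j in range(N)] for i in range(N)]
C = [[var * x for x in row] for row in identity]
zeros = [[0.0] * N for _ in range(N)]
dmu = [[1.0] * N, [0.0] * N]  # d mu / d(mean), d mu / d(variance)
dC = [zeros, identity]        # d C / d(mean), d C / d(variance)
I = gaussian_fisher(dmu, C, dC)
print(I)  # expected [[N/var, 0], [0, N/(2 var^2)]] = [[2.5, 0.0], [0.0, 0.625]]
```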

Let w[n] be white Gaussian noise (a sample of N independent observations) with variance σ²:

$w[n] \sim \mathcal{N}_N \left(\boldsymbol{\mu}(\theta), \sigma^2 I \right).$

where

$\boldsymbol{\mu}(\theta)_i = \theta = \text{mean},$

and $\boldsymbol{\mu}(\theta)$ has N (the number of independent observations) terms.

Then the Fisher information matrix is 1 × 1, and the trace term vanishes because C = σ²I does not depend on θ:

$I(\theta)=\left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T C^{-1}\left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum^N_{i=1}\frac{1}{\sigma^2} = \frac{N}{\sigma^2},$

and so the Cramér–Rao bound is

$\mathrm{var}\left(\hat \theta\right)\geq\frac{\sigma^2}{N}.$

### Normal variance with known mean

Suppose $X_1, \ldots, X_n$ are independent observations of a normally distributed random variable X with known mean μ and unknown variance σ². Consider the following statistic:

$T=\frac{\sum_{i=1}^n\left(X_i-\mu\right)^2}{n}.$

Then T is unbiased for σ², as E(T) = σ². What is the variance of T?

$\mathrm{var}(T) = \frac{\mathrm{var}\left[(X-\mu)^2\right]}{n}=\frac{1}{n}\left[E\left\{(X-\mu)^4\right\}-\left(E\left\{(X-\mu)^2\right\}\right)^2\right]$

The first equality uses the independence of the observations, and the second follows directly from the definition of variance. The first term is the fourth moment about the mean and has value 3(σ²)²; the second is the square of the variance, or (σ²)². Thus

$\mathrm{var}(T)=\frac{2(\sigma^2)^2}{n}.$

Now, what is the Fisher information in the sample? Recall that the score V is defined as

$V=\frac{\partial}{\partial\sigma^2}\log L(\sigma^2,X)$

where L is the likelihood function. Thus in this case,

$V=\frac{\partial}{\partial\sigma^2}\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(X-\mu)^2/{2\sigma^2}}\right]=\frac{(X-\mu)^2}{2(\sigma^2)^2}-\frac{1}{2\sigma^2}$

where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or

$I=-E\left(\frac{\partial V}{\partial\sigma^2}\right)=-E\left(-\frac{(X-\mu)^2}{(\sigma^2)^3}+\frac{1}{2(\sigma^2)^2}\right)=\frac{\sigma^2}{(\sigma^2)^3}-\frac{1}{2(\sigma^2)^2}=\frac{1}{2(\sigma^2)^2}.$

Thus the information in a sample of n independent observations is just n times this, or $\frac{n}{2(\sigma^2)^2}$.

The Cramér–Rao bound, applied with the information nI of the whole sample, states that

$\mathrm{var}(T)\geq\frac{1}{nI}=\frac{2(\sigma^2)^2}{n}.$

In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.
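The conclusion can be checked by simulation. The sketch below draws repeated samples of size n with known mean μ and computes T for each. It then compares the empirical mean and variance of T with σ² and with the bound 2(σ²)²/n. The parameter values are arbitrary, and the function name is hypothetical.

```python
import math
import random
import statistics


def simulate_t(mu=0.0, sigma2=1.5, n=30, reps=20000, seed=6):
    """Sampling distribution of T = mean((X_i - mu)^2) for X_i ~ N(mu, sigma2)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    ts = [
        statistics.fmean([(rng.gauss(mu, sd) - mu) ** 2 for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.fmean(ts), statistics.variance(ts)


sigma2, n = 1.5, 30
mean_t, var_t = simulate_t(sigma2=sigma2, n=n)
crb = 2 * sigma2**2 / n  # 1 / (n I) with I = 1 / (2 (sigma^2)^2)
print(mean_t, var_t, crb)
```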

## References and notes

1. ^ Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton Univ. Press. ISBN 0-691-08004-6.
2. ^ Rao, Calyampudi (1945). "Information and the accuracy attainable in the estimation of statistical parameters". Bull. Calcutta Math. Soc. 37: 81–89.
3. ^ Rao, Calyampudi (1994). in S. Das Gupta: Selected Papers of C. R. Rao. Wiley. ISBN 978-0470220917.