Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data. The parameters describe an underlying physical setting in such a way that their values affect the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements; in statistics, an estimator is a function of the observable sample data that is used to estimate an unknown population parameter.

For example, it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the unobservable parameter; the estimate is based on a small random sample of voters.

Or, for example, in radar the goal is to estimate the location of objects (airplanes, boats, etc.) by analyzing the received echo, and a possible question to be posed is "where are the airplanes?" To answer where the airplanes are, it is necessary to estimate the distance of the airplanes from the radar station, which can provide an absolute location if the absolute location of the radar station is known.

In estimation theory, it is assumed that the desired information is embedded in a noisy signal. Noise adds uncertainty; if there were no uncertainty, there would be no need for estimation.

## Estimation process

The entire purpose of estimation theory is to arrive at an estimator, and preferably an implementable one that could actually be used. The estimator takes the measured data as input and produces an estimate of the parameters.

It is also preferable to derive an estimator that exhibits optimality. An optimal estimator would indicate that all available information in the measured data has been extracted, for if there were unused information in the data then the estimator would not be optimal.

These are the general steps to arrive at an estimator:

• In order to arrive at a desired estimator for estimating a single or multiple parameters, it is first necessary to determine a model for the system. This model should incorporate the process being modeled as well as points of uncertainty and noise. The model describes the physical scenario in which the parameters apply.
• After deciding upon a model, it is helpful to find the limitations placed upon an estimator. Such a limitation can be found, for example, through the Cramér–Rao bound.
• Next, an estimator needs to be developed or applied if an already known estimator is valid for the model. The estimator needs to be tested against the limitations to determine if it is an optimal estimator (if so, then no other estimator will perform better).
• Finally, experiments or simulations can be run using the estimator to test its performance.
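The four steps above can be sketched in a short simulation; here the voter-poll setting from the introduction is used, and `p_true`, `N`, and `trials` are illustrative assumptions, not values from the text:

```python
import random

random.seed(0)

# Step 1: model -- each voter independently favors the candidate
# with unknown probability p (a Bernoulli model).
p_true = 0.6          # hypothetical ground truth, unknown in practice
N = 1000
sample = [1 if random.random() < p_true else 0 for _ in range(N)]

# Step 2: limitation -- the Cramer-Rao bound for the variance of an
# unbiased estimator of a Bernoulli proportion is p*(1-p)/N.
crlb = p_true * (1 - p_true) / N

# Step 3: estimator -- the sample proportion (also the MLE here).
p_hat = sum(sample) / N

# Step 4: test performance over repeated simulated experiments.
trials = 2000
estimates = []
for _ in range(trials):
    s = sum(1 for _ in range(N) if random.random() < p_true)
    estimates.append(s / N)
mean_est = sum(estimates) / trials
var_est = sum((e - mean_est) ** 2 for e in estimates) / trials
# var_est should come out close to crlb, step 3's estimator being optimal.
```

This is only a minimal sketch of the workflow; in practice the model and bound must be derived for the specific physical scenario.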

After arriving at an estimator, real data might show that the model used to derive the estimator is incorrect, which may require repeating these steps to find a new estimator. A non-implementable or infeasible estimator may need to be scrapped and the process started anew.

In summary, the estimator estimates the parameters of a physical model based on measured data.

## Basics

To build a model, several statistical "ingredients" need to be known. These are needed to ensure the estimator has some mathematical tractability instead of being based on "good feel".

The first is a set of statistical samples taken from a random vector (RV) of size N. Put into a vector,

$\mathbf{x} = \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix}.$

Secondly, we have the corresponding M parameters

$\mathbf{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_M \end{bmatrix}$,

which need to be established with their probability density function (pdf) or probability mass function (pmf)

$p(\mathbf{x} | \mathbf{\theta})$.

It is also possible for the parameters themselves to have a probability distribution (e.g., Bayesian statistics). It is then necessary to define the epistemic probability

$\pi( \mathbf{\theta})$.

After the model is formed, the goal is to estimate the parameters, commonly denoted $\hat{\mathbf{\theta}}$, where the "hat" indicates the estimate.

One common estimator is the minimum mean squared error (MMSE) estimator, which utilizes the error between the estimated parameters and the actual value of the parameters

$\mathbf{e} = \hat{\mathbf{\theta}} - \mathbf{\theta}$

as the basis for optimality. This error term is then squared, and the expected value of the squared error is minimized for the MMSE estimator.
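A minimal sketch of the MMSE idea in the simplest jointly Gaussian scalar case, where the MMSE estimator is the posterior mean (a weighted average of the prior mean and the observation); all numeric values here are illustrative assumptions:

```python
import random

random.seed(1)

# Assumed scalar setup: prior theta ~ N(mu0, s0_sq),
# observation x = theta + w with noise w ~ N(0, s_sq).
mu0, s0_sq = 0.0, 4.0
s_sq = 1.0

def mmse_estimate(x):
    # For a Gaussian prior and Gaussian noise, the MMSE estimator
    # is the posterior mean: a weighted average of the prior mean
    # and the observation, weighted by their relative precisions.
    w = s0_sq / (s0_sq + s_sq)
    return w * x + (1 - w) * mu0

# Monte Carlo check: the posterior mean yields a smaller mean
# squared error than using the raw observation x by itself.
trials = 5000
mse_mmse = mse_raw = 0.0
for _ in range(trials):
    theta = random.gauss(mu0, s0_sq ** 0.5)
    x = theta + random.gauss(0.0, s_sq ** 0.5)
    mse_mmse += (mmse_estimate(x) - theta) ** 2
    mse_raw += (x - theta) ** 2
mse_mmse /= trials
mse_raw /= trials
```

Under these assumptions the theoretical MSE of the posterior mean is s0_sq·s_sq/(s0_sq + s_sq) = 0.8, versus 1.0 for the raw observation, which the simulation should roughly reproduce.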

## Estimators

Commonly-used estimators, and topics related to them:

• Maximum likelihood estimators
• Bayes estimators
• Method of moments estimators
• Cramér-Rao bound
• Minimum mean squared error (MMSE), also known as Bayes least squared error (BLSE)
• Maximum a posteriori (MAP)
• Minimum variance unbiased estimator (MVUE)
• Best linear unbiased estimator (BLUE)
• Unbiased estimators — see estimator bias.
• Particle filter
• Markov chain Monte Carlo (MCMC)
• Kalman filter
• Ensemble Kalman filter (EnKF)
• Wiener filter

## Example: DC gain in white Gaussian noise

Consider a received discrete signal, x[n], of N independent samples that consists of a DC gain A with additive white Gaussian noise w[n] of known variance σ² (i.e., $w[n] \sim \mathcal{N}(0, \sigma^2)$). Since the variance is known, the only unknown parameter is A.

The model for the signal is then

$x[n] = A + w[n] \quad n=0, 1, \dots, N-1$

Two possible (of many) estimators are:

• $\hat{A}_1 = x[0]$
• $\hat{A}_2 = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$ which is the sample mean

Both of these estimators have a mean of A, which can be shown by taking the expected value of each estimator

$\mathrm{E}\left[\hat{A}_1\right] = \mathrm{E}\left[ x[0] \right] = A$

and

$\mathrm{E}\left[ \hat{A}_2 \right]=\mathrm{E}\left[ \frac{1}{N} \sum_{n=0}^{N-1} x[n] \right]=\frac{1}{N} \left[ \sum_{n=0}^{N-1} \mathrm{E}\left[ x[n] \right] \right]=\frac{1}{N} \left[ N A \right]=A$

At this point, these two estimators would appear to perform the same. However, the difference between them becomes apparent when comparing the variances.

$\mathrm{var} \left( \hat{A}_1 \right) = \mathrm{var} \left( x[0] \right) = \sigma^2$

and

$\mathrm{var} \left( \hat{A}_2 \right)=\mathrm{var} \left( \frac{1}{N} \sum_{n=0}^{N-1} x[n] \right)\overset{independence}{=}\frac{1}{N^2} \left[ \sum_{n=0}^{N-1} \mathrm{var} (x[n]) \right]=\frac{1}{N^2} \left[ N \sigma^2 \right]=\frac{\sigma^2}{N}$

It would seem that the sample mean is a better estimator since, as $N \to \infty$, the variance goes to zero.
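The variance comparison can be verified by simulation; the values of A, σ, and N below are illustrative assumptions:

```python
import random

random.seed(2)

# Assumed parameters for the demonstration.
A, sigma, N = 5.0, 2.0, 100
trials = 3000

est1 = []  # A_hat_1 = x[0], the first sample alone
est2 = []  # A_hat_2 = the sample mean
for _ in range(trials):
    x = [A + random.gauss(0.0, sigma) for _ in range(N)]
    est1.append(x[0])
    est2.append(sum(x) / N)

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

var1 = variance(est1)   # should be near sigma^2 = 4
var2 = variance(est2)   # should be near sigma^2 / N = 0.04
```

Both empirical variances should land near their theoretical values, with the sample mean's variance smaller by roughly a factor of N.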

### Maximum likelihood

Main article: Maximum likelihood

Continuing the example using the maximum likelihood estimator, the probability density function (pdf) of the noise for one sample w[n] is

$p(w[n]) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(- \frac{1}{2 \sigma^2} w[n]^2 \right)$

and the probability of x[n] becomes (x[n] can be thought of as $\mathcal{N}(A, \sigma^2)$)

$p(x[n]; A) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(- \frac{1}{2 \sigma^2} (x[n] - A)^2 \right)$

By independence, the probability of $\mathbf{x}$ becomes

$p(\mathbf{x}; A)=\prod_{n=0}^{N-1} p(x[n]; A)=\frac{1}{\left(\sigma \sqrt{2\pi}\right)^N}\exp\left(- \frac{1}{2 \sigma^2} \sum_{n=0}^{N-1}(x[n] - A)^2 \right)$

Taking the natural logarithm of the pdf

$\ln p(\mathbf{x}; A)=-N \ln \left(\sigma \sqrt{2\pi}\right)- \frac{1}{2 \sigma^2} \sum_{n=0}^{N-1}(x[n] - A)^2$

and the maximum likelihood estimator is

$\hat{A} = \arg \max_A \ln p(\mathbf{x}; A)$

Taking the first derivative of the log-likelihood function

$\frac{\partial}{\partial A} \ln p(\mathbf{x}; A)=\frac{1}{\sigma^2} \left[ \sum_{n=0}^{N-1}(x[n] - A) \right]=\frac{1}{\sigma^2} \left[ \sum_{n=0}^{N-1}x[n] - N A \right]$

and setting it to zero

$0=\frac{1}{\sigma^2} \left[ \sum_{n=0}^{N-1}x[n] - N A \right]=\sum_{n=0}^{N-1}x[n] - N A$

This results in the maximum likelihood estimator

$\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1}x[n]$

which is simply the sample mean. From this example, it was found that the sample mean is the maximum likelihood estimator for N samples of AWGN with a fixed, unknown DC gain.
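The closed-form result can be checked by maximizing the log-likelihood numerically; the parameter values and the crude grid search below are illustrative assumptions:

```python
import math
import random

random.seed(3)

# Simulated data under the model x[n] = A + w[n]; values assumed for the demo.
A_true, sigma, N = 3.0, 1.5, 50
x = [A_true + random.gauss(0.0, sigma) for _ in range(N)]

def log_likelihood(A):
    # ln p(x; A) from the derivation above, constants included.
    return (-N * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((xn - A) ** 2 for xn in x) / (2 * sigma ** 2))

# Crude grid search over candidate values of A.
grid = [i / 1000 for i in range(0, 6000)]   # 0.000 .. 5.999
A_ml = max(grid, key=log_likelihood)

sample_mean = sum(x) / N
# The numerical maximizer agrees with the closed-form result
# (the sample mean) up to the grid resolution.
```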

### Cramér–Rao lower bound

Main article: Cramér–Rao bound

To find the Cramér–Rao lower bound (CRLB) of the sample mean estimator, it is first necessary to find the Fisher information

$\mathcal{I}(A)=\mathrm{E}\left( \left[ \frac{\partial}{\partial A} \ln p(\mathbf{x}; A) \right]^2\right)=-\mathrm{E}\left[ \frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A)\right]$

and copying from above

$\frac{\partial}{\partial A} \ln p(\mathbf{x}; A)=\frac{1}{\sigma^2} \left[ \sum_{n=0}^{N-1}x[n] - N A \right]$

Taking the second derivative

$\frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A)=\frac{1}{\sigma^2} (- N)=\frac{-N}{\sigma^2}$

and finding the negative expected value is trivial since the second derivative is a deterministic constant

$-\mathrm{E}\left[ \frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A)\right]=\frac{N}{\sigma^2}$

Finally, putting the Fisher information into

$\mathrm{var}\left( \hat{A} \right)\geq\frac{1}{\mathcal{I}(A)}$

results in

$\mathrm{var}\left( \hat{A} \right)\geq\frac{\sigma^2}{N}$

Comparing this to the variance of the sample mean (determined previously) shows that the variance of the sample mean is equal to the Cramér–Rao lower bound for all values of N and A. The sample mean is thus the minimum variance unbiased estimator (MVUE), in addition to being the maximum likelihood estimator.
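That the sample mean attains the bound can also be checked empirically with a Monte Carlo run; the parameter values are illustrative assumptions:

```python
import random

random.seed(4)

# Assumed parameters for the Monte Carlo check.
A, sigma, N = 1.0, 1.0, 25
crlb = sigma ** 2 / N     # = 0.04

trials = 4000
means = []
for _ in range(trials):
    x = [A + random.gauss(0.0, sigma) for _ in range(N)]
    means.append(sum(x) / N)

m = sum(means) / trials
empirical_var = sum((v - m) ** 2 for v in means) / trials
# empirical_var should be close to crlb, confirming the sample
# mean is efficient (its variance attains the bound).
```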

## Fields that use estimation theory

There are numerous fields that require the use of estimation theory. Some of these fields include (but are by no means limited to):

• Interpretation of scientific experiments
• Signal processing
• Clinical trials
• Opinion polls
• Quality control
• Software engineering
• Control theory
• Network intrusion detection systems

The measured data is likely to be subject to noise or uncertainty, and it is through statistical probability that optimal solutions are sought to extract as much information from the data as possible.
