In statistics, the mean squared error or MSE of an estimator is one of many ways to quantify the amount by which an estimator differs from the true value of the quantity being estimated. As a loss function, MSE is called squared error loss. MSE measures the average of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator doesn't account for information that could produce a more accurate estimate.[1]

The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. Like the variance, MSE has the same unit of measurement as the square of the quantity being estimated. In an analogy to standard deviation, taking the square root of MSE yields the root mean squared error or RMSE, which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard error.

## Definition and basic properties

The MSE of an estimator $\hat{\theta}$ with respect to the estimated parameter θ is defined as

$\operatorname{MSE}(\hat{\theta})=\operatorname{E}((\hat{\theta}-\theta)^2).$

The MSE can be written as the sum of the variance and the squared bias of the estimator

$\operatorname{MSE}(\hat{\theta})=\operatorname{Var}\left(\hat{\theta}\right)+ \left(\operatorname{Bias}(\hat{\theta},\theta)\right)^2.$
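The decomposition follows from adding and subtracting $\operatorname{E}(\hat{\theta})$ inside the square, with $\operatorname{Bias}(\hat{\theta},\theta)=\operatorname{E}(\hat{\theta})-\theta$. The cross term vanishes because $\operatorname{E}(\hat{\theta})-\theta$ is a constant and $\hat{\theta}-\operatorname{E}(\hat{\theta})$ has mean zero:

$$
\begin{aligned}
\operatorname{MSE}(\hat{\theta})
&= \operatorname{E}\Big(\big[(\hat{\theta}-\operatorname{E}(\hat{\theta})) + (\operatorname{E}(\hat{\theta})-\theta)\big]^2\Big)\\
&= \operatorname{E}\big((\hat{\theta}-\operatorname{E}(\hat{\theta}))^2\big)
 + 2\,(\operatorname{E}(\hat{\theta})-\theta)\,\operatorname{E}\big(\hat{\theta}-\operatorname{E}(\hat{\theta})\big)
 + (\operatorname{E}(\hat{\theta})-\theta)^2\\
&= \operatorname{Var}(\hat{\theta}) + 0 + \left(\operatorname{Bias}(\hat{\theta},\theta)\right)^2 .
\end{aligned}
$$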

The MSE thus assesses the quality of an estimator in terms of both its variance and its bias. Note that the MSE is not equivalent to the expected value of the absolute error.

Because the MSE is an expectation over the distribution of the estimator, it usually cannot be computed exactly and must itself be estimated. When the true value θ is known, as in a simulation study, this is usually done by the sample mean

$\operatorname{\widehat{MSE}}(\hat{\theta}) = \frac{1}{n} \sum_{j=1}^n \left(\hat{\theta}_j-\theta\right)^2$

with $\hat{\theta}_j$ being n realizations of the estimator $\hat{\theta}$.
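A minimal simulation sketch of this sample-mean estimate, using only Python's standard library. It assumes a normal population with arbitrarily chosen μ = 5, σ = 2 and sample size 10, and uses the sample mean as the estimator; `monte_carlo_mse` and `draw` are hypothetical helper names. The estimate should land close to σ²/n = 0.4, and the empirical variance of the simulated estimates plus their squared empirical bias reproduces the estimated MSE.

```python
import random
import statistics


def monte_carlo_mse(estimator, draw_sample, theta, reps=20_000, seed=0):
    """Estimate MSE = E[(theta_hat - theta)^2] by simulation (theta must be known).

    `estimator` maps a list of observations to an estimate; `draw_sample`
    takes a random.Random and returns one simulated data set.
    """
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    # Average of the squared errors over the simulated realizations
    mse_hat = sum((est - theta) ** 2 for est in estimates) / reps
    # Empirical bias and variance (divisor reps, so var + bias^2 equals mse_hat)
    bias_hat = statistics.fmean(estimates) - theta
    var_hat = statistics.pvariance(estimates)
    return mse_hat, bias_hat, var_hat


mu, sigma, n = 5.0, 2.0, 10  # assumed population and sample size


def draw(rng):
    """One hypothetical sample of size n from a normal population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


mse_hat, bias_hat, var_hat = monte_carlo_mse(statistics.fmean, draw, theta=mu)
print(f"estimated MSE = {mse_hat:.4f}, theory sigma^2/n = {sigma**2 / n:.4f}")
print(f"variance + bias^2 = {var_hat + bias_hat**2:.4f}")
```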

## Examples

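Suppose we have a random sample of size n from a population, $X_1,\dots,X_n$, that is, independent and identically distributed observations.

Some commonly used estimators of the true parameters of the population, μ and σ², are listed below; the mean squared errors shown for the three variance estimators assume that the population is normally distributed.[2]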

| True value | Estimator | Mean squared error |
|---|---|---|
| $\theta=\mu$ | $\hat{\theta}=\overline{X}=\frac{1}{n}\sum_{i=1}^n X_i$, the sample mean (unbiased) | $\operatorname{MSE}(\overline{X})=\operatorname{E}((\overline{X}-\mu)^2)=\left(\frac{\sigma}{\sqrt{n}}\right)^2$ |
| $\theta=\sigma^2$ | $\hat{\theta}=S^2_{n-1}=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2$, the unbiased sample variance | $\operatorname{MSE}(S^2_{n-1})=\operatorname{E}((S^2_{n-1}-\sigma^2)^2)=\frac{2}{n-1}\sigma^4$ |
| $\theta=\sigma^2$ | $\hat{\theta}=S^2_{n}=\frac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2$, a biased sample variance | $\operatorname{MSE}(S^2_{n})=\operatorname{E}((S^2_{n}-\sigma^2)^2)=\frac{2n-1}{n^2}\sigma^4$ |
| $\theta=\sigma^2$ | $\hat{\theta}=S^2_{n+1}=\frac{1}{n+1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2$, a biased sample variance | $\operatorname{MSE}(S^2_{n+1})=\operatorname{E}((S^2_{n+1}-\sigma^2)^2)=\frac{2}{n+1}\sigma^4$ |

Note that:

1. Unbiased estimators may not produce estimates with the smallest total variation (as measured by MSE): $S^2_{n-1}$'s MSE is larger than $S^2_{n+1}$'s MSE.
2. Estimators with the smallest total variation may produce biased estimates: on average, $S^2_{n+1}$ underestimates σ² by $\frac{2}{n+1}\sigma^2$.
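Both notes can be checked by simulation. For a normal population the closed forms are 2σ⁴/(n−1), (2n−1)σ⁴/n² and 2σ⁴/(n+1) for the divisors n−1, n and n+1. The sketch below assumes standard normal data (σ² = 1), n = 5 and 100,000 replications, all arbitrary choices, and uses a hypothetical helper `variance_estimator_mse`.

```python
import random


def variance_estimator_mse(n, sigma2=1.0, reps=100_000, seed=1):
    """Monte Carlo MSE of sum((X_i - Xbar)^2) / d for d = n-1, n, n+1.

    Assumes a normal population, which the closed-form MSEs rely on.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    divisors = {"S2_{n-1}": n - 1, "S2_n": n, "S2_{n+1}": n + 1}
    sq_err = dict.fromkeys(divisors, 0.0)
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
        for name, d in divisors.items():
            sq_err[name] += (ss / d - sigma2) ** 2
    return {name: total / reps for name, total in sq_err.items()}


n = 5
simulated = variance_estimator_mse(n)
# Closed forms with sigma^4 = 1
theory = {"S2_{n-1}": 2 / (n - 1), "S2_n": (2 * n - 1) / n**2, "S2_{n+1}": 2 / (n + 1)}
for name in simulated:
    print(f"{name}: simulated {simulated[name]:.4f}, formula {theory[name]:.4f}")
```

The unbiased divisor n−1 gives the largest MSE, while the biased divisor n+1 gives the smallest.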

## Interpretation

An MSE of zero, meaning that the estimator $\hat{\theta}$ predicts observations of the parameter θ with perfect accuracy, is the ideal, and minimizing the MSE forms the basis of the least squares method of regression analysis.

While particular values of MSE other than zero are meaningless in and of themselves, they may be used for comparative purposes. Two or more statistical models may be compared using their MSEs as a measure of how well they explain a given set of observations: the unbiased model with the smallest MSE is generally interpreted as best explaining the variability in the observations.

Both analysis of variance and linear regression techniques estimate MSE as part of the analysis and use the estimated MSE to determine the statistical significance of the factors or predictors under study. The goal of experimental design is to construct experiments in such a way that, when the observations are analyzed, the MSE is close to zero relative to the magnitude of at least one of the estimated treatment effects.
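In a one-way analysis of variance, the estimated MSE is the within-group mean square: the pooled sum of squared deviations from each group's mean, divided by its N − k degrees of freedom (N observations, k groups). The F statistic compares the between-group mean square with it. A minimal standard-library sketch, assuming hypothetical toy measurements for three treatments; `one_way_anova` is a hypothetical helper name.

```python
import statistics


def one_way_anova(groups):
    """Return (MS_between, MSE, F) for a one-way ANOVA on lists of observations."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_obs)
    means = [statistics.fmean(g) for g in groups]
    k, total_n = len(groups), len(all_obs)
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares: deviations from each group's own mean
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    mse = ss_within / (total_n - k)  # estimated MSE on N - k degrees of freedom
    return ms_between, mse, ms_between / mse


# Hypothetical toy measurements for three treatments (illustration only)
groups = [[4.1, 5.0, 4.6, 4.8], [5.9, 6.3, 5.5, 6.1], [4.9, 5.2, 5.6, 5.0]]
ms_between, mse, f_stat = one_way_anova(groups)
print(f"MS_between = {ms_between:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
```

The significance of the treatment factor would then be judged against an F distribution with k − 1 and N − k degrees of freedom, for example with `scipy.stats.f.sf(f_stat, k - 1, N - k)` if SciPy is available.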

MSE is also used in several stepwise regression techniques as part of determining how many predictors from a candidate set to include in a model for a given set of observations.

## Applications

• Minimizing MSE is a key criterion in selecting estimators. Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the estimator that does so is the minimum-variance unbiased estimator (MVUE). However, a biased estimator may have lower MSE, as the variance estimators in the examples above illustrate.
• In statistical modelling, the MSE, the average squared difference between the actual observations and the responses predicted by the model, is used to determine whether the model does not fit the data or whether the model can be simplified by removing terms.

## Criticism

The MSE is one of the most widely used loss functions in statistics. Its widespread use stems more from mathematical convenience than from considerations of actual loss in applications. Carl Friedrich Gauss, who introduced the use of mean squared error, was aware of its arbitrariness and was in agreement with objections to it on these grounds.[1] The mathematical benefits of mean squared error are particularly evident in its use in analyzing the performance of linear regression, as it allows one to partition the variation in a dataset into variation explained by the model and variation explained by randomness.

The uncritical use of mean squared error has been criticized by the decision theorist J. O. Berger. Mean squared error conflicts with most losses derived from utility functions: mean squared error is convex everywhere, whereas most losses derived from utility theory have concave tails (and may be concave everywhere). There are, however, some scenarios where mean squared error can serve as a good approximation to a loss function occurring naturally in an application.[3]

Like variance, mean squared error has the disadvantage of heavily weighting outliers.[4] This is a result of the squaring of each term, which effectively weights large errors more heavily than small ones. This property, undesirable in many applications, has led researchers to use alternatives such as the mean absolute error, or those based on the median.
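The effect of squaring shows up on a handful of made-up errors: adding one large error inflates the MSE far more than the mean absolute error, while the median absolute error does not move at all. A small sketch with hypothetical error values; `error_summaries` is a hypothetical helper name.

```python
import statistics


def error_summaries(errors):
    """Return (MSE, mean absolute error, median absolute error) of a list of errors."""
    abs_err = [abs(e) for e in errors]
    mse = statistics.fmean(e * e for e in errors)
    return mse, statistics.fmean(abs_err), statistics.median(abs_err)


clean = [1.0, -1.0, 1.0, -1.0]
with_outlier = clean + [10.0]  # one hypothetical large error

for label, errs in [("clean", clean), ("one outlier", with_outlier)]:
    mse, mae, medae = error_summaries(errs)
    print(f"{label}: MSE={mse:.1f}, MAE={mae:.1f}, median |e|={medae:.1f}")
# clean: MSE=1.0, MAE=1.0, median |e|=1.0
# one outlier: MSE=20.8, MAE=2.8, median |e|=1.0
```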

## References

1. George Casella and E. L. Lehmann, *Theory of Point Estimation*. Springer, 1999.
2. Morris DeGroot, *Probability and Statistics*, 2. Addison-Wesley, 1980.
3. J. O. Berger, *Statistical Decision Theory and Bayesian Analysis*, 2nd ed. Springer-Verlag, 1985, section 2.4.2. ISBN 3540960988.
4. Sergio Bermejo and Joan Cabestany, "Oriented principal component analysis for large margin classifiers", *Neural Networks*, Vol. 14, No. 10 (Dec. 2001), pp. 1447-1461.