Variance - Population Variance and Sample Variance

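In general, the population variance of a finite population of size N with values x_1, ..., x_N is given by

\sigma^2 = \frac{1}{N} \sum_{i=1}^N \left(x_i - \mu \right)^2,

where

\mu = \frac{1}{N} \sum_{i=1}^N x_i

is the population mean, and the sum of squared deviations expands as
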
\begin{align}
\sum_{i=1}^N \left(x_i - \mu \right)^2 &= \sum_{i=1}^N \left(x_i^2 - 2 x_i \mu + \mu^2 \right) \\
&= \sum_{i=1}^N \left(x_i^2 + \mu^2 \right) - 2 \mu \sum_{i=1}^N x_i \\
&= \sum_{i=1}^N \left(x_i^2 + \mu^2 \right) - 2 N \mu^2 \\
&= \sum_{i=1}^N \left(x_i^2 + \mu^2 - 2 \mu^2 \right) \\
&= \sum_{i=1}^N \left(x_i^2 - \mu^2 \right)
\end{align}
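
As a quick check of the expansion above, the following minimal sketch computes the population variance both from the squared deviations and from the expanded form (mean of squares minus squared mean). The function names and the data are illustrative assumptions, not part of the original text; exact rational arithmetic is used so the two results can be compared without rounding.

```python
from fractions import Fraction


def population_variance(values):
    """Population variance: mean squared deviation from the population mean."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_expanded(values):
    """Same quantity via the expanded sum: mean of squares minus squared mean."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return Fraction(sum(x * x for x in values), n) - mu ** 2


# Illustrative integer data (hypothetical, chosen so the mean is a whole number).
population = [2, 4, 4, 4, 5, 5, 7, 9]

direct = population_variance(population)
expanded = population_variance_expanded(population)
print(direct, expanded)
```

Both routes agree because the cross term in the expansion collapses once the sum of the x_i is replaced by N times the mean.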

In many practical situations, the true variance of a population is not known a priori and must be estimated. When dealing with extremely large populations, it is usually not feasible to measure every member of the population.

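A common task is to estimate the variance of a population from a sample. We take a sample with replacement of n values y_1, ..., y_n from the population, where n < N, and estimate the variance on the basis of this sample. There are several good estimators. Two of them are well known:

\begin{align}
s_n^2
& = \frac{1}{n} \sum_{i=1}^n\left(y_i - \overline{y} \right)^ 2 \\
& = \frac{1}{n}\sum_{i=1}^n y_i^2-\overline{y}^2
\end{align}

and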

\begin{align}
s^2
& = \frac{1}{n-1} \sum_{i=1}^n\left(y_i - \overline{y} \right)^ 2 \\
& = \frac{1}{n-1}\sum_{i=1}^n\left(y_i^2-\overline{y}^2\right) \\
& = \frac{1}{n-1}\sum_{i=1}^n y_i^2-\frac{n}{n-1}\overline{y}^2
\end{align}
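
The two divisors can be compared on a small sample. This is a minimal sketch: the function names and the sample values are assumptions made for illustration. Python's standard `statistics` module is used only as a cross-check, where `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1.

```python
import statistics


def biased_sample_variance(sample):
    """Divide the sum of squared deviations by n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((y - mean) ** 2 for y in sample) / n


def unbiased_sample_variance(sample):
    """Divide the sum of squared deviations by n - 1 (needs at least 2 values)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(sample) / n
    return sum((y - mean) ** 2 for y in sample) / (n - 1)


# Hypothetical sample, not taken from the text.
sample = [1.0, 2.0, 4.0, 7.0]

biased = biased_sample_variance(sample)
unbiased = unbiased_sample_variance(sample)

# Cross-check against the standard library.
print(biased, statistics.pvariance(sample))
print(unbiased, statistics.variance(sample))
```

For this sample the two values differ by the factor n/(n − 1) = 4/3, which shrinks toward 1 as n grows.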

The first estimator, also known as the second central moment, is called the biased sample variance. The second estimator is called the unbiased sample variance. Either estimator may be referred to simply as the sample variance when the version can be determined from context. Here, \overline{y} denotes the sample mean:

\overline{y} = \frac{1}{n} \sum_{i=1}^n y_i

The two estimators differ only by the factor n/(n − 1), so for large sample sizes n the difference is negligible. While the first may be seen as the variance of the sample considered as a population, the second is the unbiased estimator of the population variance, meaning that its expected value E[s^2] is equal to the true variance of the sampled random variable; the use of n − 1 in place of n is called Bessel's correction. In particular,

\operatorname{E}[s^2] = \sigma^2,

while, in contrast, the biased estimator with divisor n, written s_n^2, satisfies

\operatorname{E}[s_n^2] = \frac{n-1}{n} \sigma^2.
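
The difference in expected values can be checked exactly on a tiny example. The sketch below enumerates every equally likely ordered sample of size n drawn with replacement from a small population and averages each estimator over all of them. The population, the sample size, and the helper names are assumptions chosen for illustration; exact fractions avoid any rounding or simulation noise.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean."""
    return Fraction(sum(values), len(values))


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / divisor


# Hypothetical population and sample size.
population = [1, 2, 3, 4]
n = 2

sigma2 = variance(population, len(population))  # true population variance

# All ordered samples of size n drawn with replacement, each equally likely.
samples = list(product(population, repeat=n))

expected_biased = mean([variance(s, n) for s in samples])
expected_unbiased = mean([variance(s, n - 1) for s in samples])

print(sigma2, expected_unbiased, expected_biased)
```

Averaged over all samples, the estimator with divisor n − 1 reproduces the population variance, while the estimator with divisor n falls short by the factor (n − 1)/n.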

The unbiased sample variance is a U-statistic for the function ƒ(x_1, x_2) = (x_1 − x_2)^2/2, meaning that it is obtained by averaging a 2-sample statistic over all 2-element subsets of the sample.
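
The U-statistic form can be illustrated with a short sketch that averages the kernel ƒ(x_1, x_2) = (x_1 − x_2)^2/2 over every unordered pair of sample values and compares the result with the usual n − 1 formula. The function names and the sample are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations


def kernel(a, b):
    """Two-sample statistic: half the squared difference."""
    return Fraction((a - b) ** 2, 2)


def u_statistic_variance(sample):
    """Average the kernel over all 2-element subsets of the sample."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def unbiased_sample_variance(sample):
    """Usual formula: squared deviations from the mean, divided by n - 1."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((y - mean) ** 2 for y in sample) / (n - 1)


# Hypothetical integer sample.
sample = [1, 2, 4, 7]

via_pairs = u_statistic_variance(sample)
via_mean = unbiased_sample_variance(sample)
print(via_pairs, via_mean)
```

The pairwise form never computes the sample mean, yet it gives the same value as the n − 1 formula.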
