Variance - Properties

Properties

Variance is non-negative because it is a mean of squared deviations, and squares are positive or zero.

The variance of a constant random variable is zero, and if the variance of a variable in a data set is 0, then all the entries have the same value.

Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged.

If all values are scaled by a constant, the variance is scaled by the square of that constant.
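
In symbols, for a constant a, both properties can be read off from the definition of the variance as the expected squared deviation from the mean. The short derivation below is a sketch that assumes the mean of X exists and uses the linearity of the expectation:

\begin{align}
\operatorname{Var}(X+a) &= \operatorname{E}\left((X+a-\operatorname{E}(X)-a)^2\right) = \operatorname{Var}(X), \\
\operatorname{Var}(aX) &= \operatorname{E}\left((aX-a\operatorname{E}(X))^2\right) = a^2\operatorname{Var}(X).
\end{align}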

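The variance of a sum of two random variables is given by:

\begin{align}
\operatorname{Var}(X+Y) &= \operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y).
\end{align}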

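In general, for the sum of N random variables:

\begin{align}
\operatorname{Var}\left(\sum_{i=1}^{N} X_i\right) &= \sum_{i=1}^{N}\sum_{j=1}^{N}\operatorname{Cov}(X_i,X_j) \\
&= \sum_{i=1}^{N}\operatorname{Var}(X_i)+\sum_{i\not=j}\operatorname{Cov}(X_i,X_j).
\end{align}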

The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances. This stems from the above identity and the fact that for uncorrelated variables the covariance is zero.

These results give the variance of a linear combination:


\begin{align}
\operatorname{Var}\left( \sum_{i=1}^{N} a_iX_i\right) &=\sum_{i=1}^{N}\sum_{j=1}^{N} a_ia_j\operatorname{Cov}(X_i,X_j) \\
&=\sum_{i=1}^{N}a_i^2\operatorname{Var}(X_i)+\sum_{i\not=j}a_ia_j\operatorname{Cov}(X_i,X_j)\\
& =\sum_{i=1}^{N}a_i^2\operatorname{Var}(X_i)+2\sum_{i<j}a_ia_j\operatorname{Cov}(X_i,X_j).
\end{align}
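
The identity above can be checked numerically on a small data set. The following Python sketch is only an illustration: the data, the coefficients, and the helper names `pcov` and `var_linear_combination` are made up for this example, and covariances are computed with N in the denominator. It compares the variance of the combined variable with the double sum over covariances.

```python
from statistics import fmean


def pcov(xs, ys):
    """Population covariance (denominator N) of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def var_linear_combination(coeffs, columns):
    """Double sum over i and j of a_i * a_j * Cov(X_i, X_j)."""
    return sum(
        a_i * a_j * pcov(x_i, x_j)
        for a_i, x_i in zip(coeffs, columns)
        for a_j, x_j in zip(coeffs, columns)
    )


# Hypothetical data: three variables observed on the same five cases.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
coeffs = [2.0, -1.0, 0.5]

# Direct route: build the combined variable case by case, then take its variance.
combined = [sum(a * x for a, x in zip(coeffs, row)) for row in zip(*columns)]
direct = pcov(combined, combined)

# Route via the identity: weighted double sum of covariances.
via_covariances = var_linear_combination(coeffs, columns)

print(direct, via_covariances)
```

The two printed values should agree up to floating-point rounding, since the identity holds exactly for any finite data set when the same denominator is used throughout.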

Suppose that the observations can be partitioned into equal-sized subgroups according to some second variable. Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups. This property is known as variance decomposition or the law of total variance, and it plays an important role in the analysis of variance.

For example, suppose that a group consists of a subgroup of men and an equally large subgroup of women. Suppose that the men have a mean height of 180 and that the variance of their heights is 100. Suppose that the women have a mean height of 160 and that the variance of their heights is 50. Then the mean of the variances is (100 + 50) / 2 = 75, and the variance of the means is the variance of the two values 180 and 160, which is 100. For the total group of men and women combined, the variance of the height is therefore 75 + 100 = 175. Note that this uses N for the denominator instead of N − 1.

In the more general case where the subgroups have unequal sizes, each subgroup must be weighted in proportion to its size when the means and variances are computed. The formula is also valid with more than two groups, and even if the grouping variable is continuous.

This formula implies that the variance of the total group cannot be smaller than the mean of the variances of the subgroups. Note, however, that the total variance is not necessarily larger than the variance of every subgroup. In the above example, when the subgroups are analyzed separately, the variance is influenced only by the man-man differences and the woman-woman differences. If the two groups are combined, however, then the man-woman differences also enter into the variance.
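
The decomposition can be illustrated with a short Python sketch. The height values below are hypothetical, chosen only so that the subgroup means and variances match the figures in the example, and the helper name `decompose` is an assumption of this sketch. Subgroups are weighted by their size, so the same function also covers unequal group sizes, and all variances use N in the denominator.

```python
from statistics import fmean, pvariance


def decompose(groups):
    """Return (mean of subgroup variances, variance of subgroup means),
    each weighted by subgroup size, using population variances."""
    n = sum(len(g) for g in groups)
    weights = [len(g) / n for g in groups]
    means = [fmean(g) for g in groups]
    grand_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * pvariance(g) for w, g in zip(weights, groups))
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return within, between


# Hypothetical heights matching the figures in the text.
men = [170.0, 170.0, 190.0, 190.0]    # mean 180, variance 100
women = [150.0, 160.0, 160.0, 170.0]  # mean 160, variance 50

within, between = decompose([men, women])
total = pvariance(men + women)

print(within, between, total)
```

Here `within + between` should equal `total` up to rounding. Calling `decompose` on groups of different sizes applies the size weighting described above.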

Many computational formulas for the variance rest on one identity, namely that the variance equals the mean of the square minus the square of the mean:

\begin{align} \operatorname{Var}(X) &= \operatorname{E}(X^2) - (\operatorname{E}(X))^2. \end{align}

For example, for the numbers 1, 2, 3, 4 the mean of the squares is (1 × 1 + 2 × 2 + 3 × 3 + 4 × 4) / 4 = 7.5. The mean of the four numbers is 2.5, so the square of the mean is 6.25. Therefore the variance is 7.5 − 6.25 = 1.25, which is indeed the same result obtained earlier with the definition formulas.

Many pocket calculators use an algorithm based on this formula, which lets them compute the variance while the data are entered, without storing all values in memory. The algorithm updates only three variables when a new data value is entered: the number of values entered so far (n), the sum of the values so far (S), and the sum of the squared values so far (SS). For example, if the data are 1, 2, 3, 4, then after the first value the algorithm has n = 1, S = 1 and SS = 1. After the second value (2), it has n = 2, S = 3 and SS = 5. When all data are entered, it has n = 4, S = 10 and SS = 30. Next, the mean is computed as M = S / n, and finally the variance is computed as SS / n − M × M. In this example the outcome is 30 / 4 − 2.5 × 2.5 = 7.5 − 6.25 = 1.25. If the unbiased sample estimate is to be computed, the outcome is multiplied by n / (n − 1), which yields 1.667 in this example.
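
The three-variable update described above can be sketched in Python as follows. The class name `RunningVariance` and its method names are assumptions made for this illustration, not a description of any particular calculator's firmware.

```python
class RunningVariance:
    """Accumulates n, S and SS so the variance can be computed
    without storing the individual data values."""

    def __init__(self):
        self.n = 0      # number of values entered so far
        self.s = 0.0    # sum of the values so far (S)
        self.ss = 0.0   # sum of the squared values so far (SS)

    def add(self, x):
        """Update the three running totals with one new value."""
        self.n += 1
        self.s += x
        self.ss += x * x

    def population_variance(self):
        """SS / n - M * M, i.e. the variance with denominator n."""
        mean = self.s / self.n
        return self.ss / self.n - mean * mean

    def sample_variance(self):
        """Unbiased estimate: population variance times n / (n - 1)."""
        return self.population_variance() * self.n / (self.n - 1)


# Usage with the data from the text.
acc = RunningVariance()
for value in [1, 2, 3, 4]:
    acc.add(value)

print(acc.n, acc.s, acc.ss)
print(acc.population_variance(), acc.sample_variance())
```

After the loop the accumulator holds n = 4, S = 10 and SS = 30, matching the worked example, and the two variances correspond to the values 1.25 and 1.667 given in the text.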
