Algorithms For Calculating Variance - Covariance

Covariance

Very similar algorithms can be used to compute the covariance. The naive algorithm is:

For the algorithm above, one could use the following pseudocode:

def naive_covariance(data1, data2): n = len(data1) sum12 = 0 sum1 = sum(data1) sum2 = sum(data2) for i in range(n): sum12 += data1*data2 covariance = (sum12 - sum1*sum2 / n) / n return covariance

A more numerically stable two-pass algorithm first computes the sample means, and then the covariance:

The two-pass algorithm may be written as:

def two_pass_covariance(data1, data2): n = len(data1) mean1 = sum(data1) / n mean2 = sum(data2) / n covariance = 0 for i in range(n): a = data1 - mean1 b = data2 - mean2 covariance += a*b / n return covariance

A slightly more accurate compensated version performs the full naive algorithm on the residuals. The final sums and should be zero, but the second pass compensates for any small error.

A stable one-pass algorithm exists, similar to the one above, that computes :

The apparent asymmetry in that last equation is due to the fact that, so both update terms are equal to . Even greater accuracy can be achieved by first computing the means, then using the stable one-pass algorithm on the residuals.

Likewise, there is a formula for combining the covariances of two sets that can be used to parallelize the computation:

.

Read more about this topic:  Algorithms For Calculating Variance