Pearson Product-moment Correlation Coefficient - Pearson's Correlation and Least Squares Regression Analysis

The square of the sample correlation coefficient, also known as the coefficient of determination, estimates the fraction of the variance in Y that is explained by X in a simple linear regression. As a starting point, the total variation in the Y_i around their average value \bar{Y} can be decomposed as follows:


\sum_i (Y_i - \bar{Y})^2 = \sum_i (Y_i-\hat{Y}_i)^2 + \sum_i (\hat{Y}_i-\bar{Y})^2,

where the \hat{Y}_i are the fitted values from the regression analysis. Dividing both sides by the total sum of squares gives


1 = \frac{\sum_i (Y_i-\hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2} + \frac{\sum_i (\hat{Y}_i-\bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}.

The two summands above are the fraction of variance in Y that is explained by X (right) and that is unexplained by X (left).
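The decomposition can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set; the names `xs`, `ys`, `fit_line`, and `sums_of_squares` are illustrative and do not come from the source.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(x, y):
    """Return (total, residual, explained) sums of squares."""
    intercept, slope = fit_line(x, y)
    y_bar = mean(y)
    fitted = [intercept + slope * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    explained = sum((fi - y_bar) ** 2 for fi in fitted)
    return total, residual, explained


total, residual, explained = sums_of_squares(xs, ys)

# The total sum of squares should match residual + explained
# up to floating-point rounding.
print(total, residual + explained)

# The unexplained and explained fractions should sum to 1.
print(residual / total, explained / total)
```

The identity holds only for fitted values from a least squares fit with an intercept; for an arbitrary line the cross term does not vanish and the two fractions need not sum to one.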

Next, we apply a property of least squares regression models: the sample covariance between the fitted values \hat{Y}_i and the residuals Y_i-\hat{Y}_i is zero. Thus, the sample correlation coefficient between the observed and fitted response values in the regression can be written


\begin{align}
r(Y,\hat{Y}) &= \frac{\sum_i(Y_i-\bar{Y})(\hat{Y}_i-\bar{Y})}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\
&= \frac{\sum_i(Y_i-\hat{Y}_i+\hat{Y}_i-\bar{Y})(\hat{Y}_i-\bar{Y})}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\
&= \frac{ \sum_i [(Y_i-\hat{Y}_i)(\hat{Y}_i-\bar{Y}) +(\hat{Y}_i-\bar{Y})^2 ]}{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\
&= \frac{ \sum_i (\hat{Y}_i-\bar{Y})^2 }{\sqrt{\sum_i(Y_i-\bar{Y})^2\cdot \sum_i(\hat{Y}_i-\bar{Y})^2}}\\
&= \sqrt{\frac{\sum_i(\hat{Y}_i-\bar{Y})^2}{\sum_i(Y_i-\bar{Y})^2}}.
\end{align}

Thus


r(Y,\hat{Y})^2 = \frac{\sum_i(\hat{Y}_i-\bar{Y})^2}{\sum_i(Y_i-\bar{Y})^2}

is the proportion of variance in Y explained by a linear function of X.
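
The result can be illustrated with a short numerical check. This is a sketch under stated assumptions, not a reference implementation: the data and the helper `pearson_r` are hypothetical, and the fit is an ordinary least squares line with an intercept.

```python
import math

# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Sample Pearson correlation coefficient between two sequences."""
    a_bar, b_bar = mean(a), mean(b)
    num = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    den = math.sqrt(
        sum((ai - a_bar) ** 2 for ai in a) * sum((bi - b_bar) ** 2 for bi in b)
    )
    return num / den


# Ordinary least squares fit of Y on X (with intercept).
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

# Cross term from the derivation: sum of residual * (fitted - mean).
# For a least squares fit it should vanish up to rounding error.
cross_term = sum((y - f) * (f - y_bar) for y, f in zip(ys, fitted))

# Explained sum of squares divided by total sum of squares.
explained_fraction = sum((f - y_bar) ** 2 for f in fitted) / sum(
    (y - y_bar) ** 2 for y in ys
)

r_y_fitted = pearson_r(ys, fitted)
r_x_y = pearson_r(xs, ys)

print(cross_term)
print(r_y_fitted ** 2, explained_fraction)
print(r_x_y ** 2)
```

Because the fitted values in a simple linear regression are a linear function of X, the squared correlation between X and Y agrees with the squared correlation between Y and the fitted values, so all three printed quantities on the last two lines should agree up to rounding.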
