Regression Toward The Mean - Definition For Simple Linear Regression of Data Points

Definition For Simple Linear Regression of Data Points

This is the definition of regression toward the mean that closely follows Sir Francis Galton's original usage.

Suppose there are n data points {yi, xi}, where i = 1, 2, …, n. We want to find the equation of the regression line, i.e. the straight line

which would provide a “best” fit for the data points. (Note that a straight line may not be the appropriate regression curve for the given data points.) Here the “best” will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, numbers α and β solve the following minimization problem:

Find, where Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2}
= \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2\

Using simple calculus it can be shown that the values of α and β that minimize the objective function Q are

\begin{align} & \hat\beta = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 } = \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 } = \frac{ \operatorname{Cov} }{ \operatorname{Var} } = r_{xy} \frac{s_y}{s_x}, \\ & \hat\alpha = \bar{y} - \hat\beta\,\bar{x}, \end{align}

where rxy is the sample correlation coefficient between x and y, sx is the standard deviation of x, and sy is correspondingly the standard deviation of y. Horizontal bar over a variable means the sample average of that variable. For example:

Substituting the above expressions for and into yields fitted values

which yields

This shows the role rxy plays in the regression line of standardized data points.

If −1 < rxy < 1, then we say that the data points exhibit regression toward the mean. In other words, if linear regression is the appropriate model for a set of data points whose sample correlation coefficient is not perfect, then there is regression toward the mean. The predicted (or fitted) standardized value of y is closer to its mean than the standardized value of x is to its mean.

Read more about this topic:  Regression Toward The Mean

Famous quotes containing the words definition, simple, data and/or points:

    According to our social pyramid, all men who feel displaced racially, culturally, and/or because of economic hardships will turn on those whom they feel they can order and humiliate, usually women, children, and animals—just as they have been ordered and humiliated by those privileged few who are in power. However, this definition does not explain why there are privileged men who behave this way toward women.
    Ana Castillo (b. 1953)

    Beauty is an ecstasy; it is as simple as hunger. There is really nothing to be said about it. It is like the perfume of a rose: you can smell it and that is all.
    W. Somerset Maugham (1874–1965)

    This city is neither a jungle nor the moon.... In long shot: a cosmic smudge, a conglomerate of bleeding energies. Close up, it is a fairly legible printed circuit, a transistorized labyrinth of beastly tracks, a data bank for asthmatic voice-prints.
    Susan Sontag (b. 1933)

    In writing biography, fact and fiction shouldn’t be mixed. And if they are, the fictional points should be printed in red ink, the facts printed in black ink.
    Catherine Drinker Bowen (1897–1973)