Regression Toward The Mean - Definition For Simple Linear Regression of Data Points

Definition For Simple Linear Regression of Data Points

This is the definition of regression toward the mean that closely follows Sir Francis Galton's original usage.

Suppose there are n data points {yi, xi}, where i = 1, 2, …, n. We want to find the equation of the regression line, i.e. the straight line

which would provide a “best” fit for the data points. (Note that a straight line may not be the appropriate regression curve for the given data points.) Here the “best” will be understood as in the least-squares approach: such a line that minimizes the sum of squared residuals of the linear regression model. In other words, numbers α and β solve the following minimization problem:

Find, where Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2}
= \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2\

Using simple calculus it can be shown that the values of α and β that minimize the objective function Q are

\begin{align} & \hat\beta = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 } = \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 } = \frac{ \operatorname{Cov} }{ \operatorname{Var} } = r_{xy} \frac{s_y}{s_x}, \\ & \hat\alpha = \bar{y} - \hat\beta\,\bar{x}, \end{align}

where rxy is the sample correlation coefficient between x and y, sx is the standard deviation of x, and sy is correspondingly the standard deviation of y. Horizontal bar over a variable means the sample average of that variable. For example:

Substituting the above expressions for and into yields fitted values

which yields

This shows the role rxy plays in the regression line of standardized data points.

If −1 < rxy < 1, then we say that the data points exhibit regression toward the mean. In other words, if linear regression is the appropriate model for a set of data points whose sample correlation coefficient is not perfect, then there is regression toward the mean. The predicted (or fitted) standardized value of y is closer to its mean than the standardized value of x is to its mean.

Read more about this topic:  Regression Toward The Mean

Famous quotes containing the words definition, simple, data and/or points:

    Mothers often are too easily intimidated by their children’s negative reactions...When the child cries or is unhappy, the mother reads this as meaning that she is a failure. This is why it is so important for a mother to know...that the process of growing up involves by definition things that her child is not going to like. Her job is not to create a bed of roses, but to help him learn how to pick his way through the thorns.
    Elaine Heffner (20th century)

    There was a deserted log camp here, apparently used the previous winter, with its “hovel” or barn for cattle.... It was a simple and strong fort erected against the cold, and suggested what valiant trencher work had been done there.
    Henry David Thoreau (1817–1862)

    Mental health data from the 1950’s on middle-aged women showed them to be a particularly distressed group, vulnerable to depression and feelings of uselessness. This isn’t surprising. If society tells you that your main role is to be attractive to men and you are getting crow’s feet, and to be a mother to children and yours are leaving home, no wonder you are distressed.
    Grace Baruch (20th century)

    A few ideas seem to be agreed upon. Help none but those who help themselves. Educate only at schools which provide in some form for industrial education. These two points should be insisted upon. Let the normal instruction be that men must earn their own living, and that by the labor of their hands as far as may be. This is the gospel of salvation for the colored man. Let the labor not be servile, but in manly occupations like that of the carpenter, the farmer, and the blacksmith.
    Rutherford Birchard Hayes (1822–1893)