Coefficient of Determination - Adjusted $R^2$

Adjusted $R^2$

Adjusted $R^2$ (often written as $\bar{R}^2$ and pronounced "R bar squared") is a modification of $R^2$, due to Theil, that adjusts for the number of explanatory terms in a model. The adjusted $R^2$ can be negative, and its value is always less than or equal to that of $R^2$. Unlike $R^2$, which never decreases when a term is added to a least-squares fit, the adjusted $R^2$ increases only if the new term improves the model more than would be expected by chance. If the best-fit polynomial for a given set of points were calculated repeatedly, with the degree increasing by one each time, the degree at which the adjusted $R^2$ reaches a maximum and decreases afterward would give the regression with the best fit and no unnecessary terms. The adjusted $R^2$ is defined as

\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

where p is the total number of regressors in the linear model (not counting the constant term), and n is the sample size.
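
The definition translates directly into a few lines of code. The following is a minimal sketch; the function name `adjusted_r2` and the example numbers are illustrative assumptions, not taken from the text.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 from the ordinary R^2.

    n is the sample size and p the number of regressors,
    not counting the constant term.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Illustrative values only: a moderate fit with few regressors...
print(adjusted_r2(0.5, n=11, p=2))
# ...and a weak fit with many regressors, where the result is negative.
print(adjusted_r2(0.1, n=11, p=5))
```

The second call illustrates the remark that the adjusted value can fall below zero when a model with many terms explains little of the variance.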

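Adjusted $R^2$ can also be written as

\[ \bar{R}^2 = 1 - \frac{SS_\text{res} / df_e}{SS_\text{tot} / df_t} \]

where $SS_\text{res}$ and $SS_\text{tot}$ are the residual and total sums of squares, $df_t$ is the degrees of freedom $n - 1$ of the estimate of the population variance of the dependent variable, and $df_e$ is the degrees of freedom $n - p - 1$ of the estimate of the underlying population error variance.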
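
The degrees-of-freedom form can be computed directly from observed and fitted values. This is a sketch under stated assumptions: the function name `adjusted_r2_from_fit` and the small data set are made up for illustration, and the residual and total sums of squares are taken about the fitted values and the sample mean respectively.

```python
def adjusted_r2_from_fit(y, y_hat, p):
    """Adjusted R^2 from observations y, fitted values y_hat, and p regressors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    df_e = n - p - 1  # degrees of freedom of the error variance estimate
    df_t = n - 1      # degrees of freedom of the total variance estimate
    return 1.0 - (ss_res / df_e) / (ss_tot / df_t)


# Hypothetical observations and fitted values from a one-regressor model.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_r2_from_fit(y, y_hat, p=1))
```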

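The principle behind the adjusted $R^2$ statistic can be seen by rewriting the ordinary $R^2$ as

\[ R^2 = 1 - \frac{\mathit{VAR}_\text{res}}{\mathit{VAR}_\text{tot}} \]

where $\mathit{VAR}_\text{res} = SS_\text{res}/n$ and $\mathit{VAR}_\text{tot} = SS_\text{tot}/n$, the residual and total sums of squares each divided by $n$, are estimates of the variances of the errors and of the observations, respectively. These estimates are replaced by statistically unbiased versions: $\mathit{VAR}_\text{res} = SS_\text{res}/(n - p - 1)$ and $\mathit{VAR}_\text{tot} = SS_\text{tot}/(n - 1)$.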
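
To make the substitution concrete, the algebra can be written out as below. The sketch assumes that $SS_\text{res}$ and $SS_\text{tot}$ denote the residual and total sums of squares, so that $R^2 = 1 - SS_\text{res}/SS_\text{tot}$; that notation is an assumption here rather than something fixed by the surrounding text.

```latex
\begin{align*}
\bar{R}^2 &= 1 - \frac{SS_\text{res}/(n - p - 1)}{SS_\text{tot}/(n - 1)} \\
          &= 1 - \frac{SS_\text{res}}{SS_\text{tot}} \cdot \frac{n - 1}{n - p - 1} \\
          &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{align*}
```

Because the factor $(n - 1)/(n - p - 1)$ is at least 1 for $p \ge 0$, the adjusted value can never exceed $R^2$, and it becomes negative whenever $1 - R^2 > (n - p - 1)/(n - 1)$.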

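Adjusted $R^2$ does not have the same interpretation as $R^2$: it is not the proportion of variance in the dependent variable that the model explains. As such, care must be taken in interpreting and reporting this statistic. Adjusted $R^2$ is particularly useful in the feature selection stage of model building, where models with different numbers of terms are compared.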
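
The polynomial-degree comparison described earlier is one simple case of such a selection. The sketch below fits polynomials of increasing degree to synthetic data and picks the degree with the highest adjusted value. Everything in it is an illustrative assumption: the helper names `polyfit` and `r2_scores`, the quadratic test data, the noise level, and the choice of solving the normal equations by Gaussian elimination, which is adequate for low degrees but not a recommended general-purpose fitting method.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][k] * coef[k] for k in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef


def r2_scores(xs, ys, degree):
    """Return (R^2, adjusted R^2) for a polynomial fit; p equals the degree."""
    coef = polyfit(xs, ys, degree)
    n, p = len(xs), degree
    mean_y = sum(ys) / n
    fitted = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


# Synthetic data (assumed for illustration): a quadratic plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 - 1 for i in range(21)]
ys = [1 + 2 * x - 3 * x * x + rng.gauss(0, 0.2) for x in xs]

results = {d: r2_scores(xs, ys, d) for d in range(1, 7)}
best_degree = max(results, key=lambda d: results[d][1])

for d, (r2, adj) in results.items():
    print(f"degree {d}: R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
print("degree with the highest adjusted R2:", best_degree)
```

In a run of this kind, the ordinary value keeps creeping upward with every added degree, while the adjusted value rises only when the extra term reduces the residual variance by more than the lost degree of freedom costs.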

The use of an adjusted $R^2$ is an attempt to take account of the phenomenon of statistical shrinkage.
