Mallows' Cp - Practical Use

Practical Use

The Cp statistic is often used as a stopping rule for various forms of stepwise regression. Mallows proposed the statistic as a criterion for selecting among many alternative subset regressions. Under a model not suffering from appreciable lack of fit (bias), Cp has expectation nearly equal to P; otherwise the expectation is roughly P plus a positive bias term. Nevertheless, even though it has expectation greater than or equal to P, there is nothing to prevent Cp < P or even Cp < 0 in extreme cases. It is suggested that one should choose a subset that has Cp approaching P, from above, for a list of subsets ordered by increasing P. In practice, the positive bias can be adjusted for by selecting a model from the ordered list of subsets, such that Cp < 2P.

Since the sample-based Cp statistic is an estimate of the MSPE, using Cp for model selection does not completely guard against overfitting. For instance, it is possible that the selected model will be one in which the sample Cp was a particularly severe underestimate of the MSPE.

Model selection statistics such as Cp are generally not used blindly, but rather information about the field of application, the intended use of the model, and any known biases in the data are taken into account in the process of model selection.

Read more about this topic:  Mallows' Cp

Famous quotes related to practical use:

    Not many appreciate the ultimate power and potential usefulness of basic knowledge accumulated by obscure, unseen investigators who, in a lifetime of intensive study, may never see any practical use for their findings but who go on seeking answers to the unknown without thought of financial or practical gain.
    Eugenie Clark (b. 1922)