Cross-validation (statistics)

Cross-validation, sometimes called rotation estimation, is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds.
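The partition-train-validate-average procedure described above can be sketched in a few lines. The helper below is a hypothetical illustration of k-fold cross-validation, using a toy "model" that simply predicts the training-set mean and is scored by mean squared error on the held-out fold; the function and parameter names are inventions for this sketch, not a standard API.

```python
import random

def k_fold_cv(data, k, fit, score):
    """Average the validation score over k rounds of cross-validation.

    Each round holds out one fold as the validation set and trains
    on the remaining k-1 folds; the results are averaged at the end.
    """
    data = data[:]                 # copy so the caller's list is untouched
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]   # k roughly equal folds
    scores = []
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(training)
        scores.append(score(model, validation))
    return sum(scores) / k

# Toy example: the "model" is the training mean; the score is
# mean squared error of that constant prediction on the held-out fold.
fit = lambda train: sum(train) / len(train)
score = lambda mean, val: sum((x - mean) ** 2 for x in val) / len(val)

random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(100)]
cv_error = k_fold_cv(sample, k=5, fit=fit, score=score)
```

Because every observation serves in a validation set exactly once, the averaged score is a less variable estimate of out-of-sample error than any single train/validation split would give.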

Cross-validation is important in guarding against testing hypotheses suggested by the data (called "Type III errors"), especially where further samples are hazardous, costly or impossible to collect (see uncomfortable science).
