In its most basic form, model selection is one of the fundamental tasks of scientific inquiry. Determining the principle that explains a series of observations is often linked directly to a mathematical model that predicts those observations. For example, when Galileo performed his inclined plane experiments, he demonstrated that the motion of the balls fitted the parabola predicted by his model.
Of the countless possible mechanisms and processes that could have produced the data, how can one even begin to choose the best model? The mathematical approach commonly taken is to decide among a set of candidate models, and this set must be chosen by the researcher. Simple models such as polynomials are often used, at least initially. Burnham and Anderson (2002) emphasize throughout their book the importance of choosing candidate models based on sound scientific principles about the process underlying the data.
Once the set of candidate models has been chosen, the mathematical analysis allows us to select the best of these models. What is meant by "best" is controversial. A good model selection technique will balance goodness of fit against simplicity. More complex models are better able to adapt their shape to fit the data (for example, a fifth-order polynomial can exactly fit six points with distinct x-values), but the additional parameters may not represent anything useful (perhaps those six points are really just randomly distributed about a straight line). Goodness of fit is generally determined using a likelihood ratio approach, or an approximation of it, leading to a chi-squared test. Complexity is generally measured by counting the number of parameters in the model.
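The six-point example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data are made up for illustration (six points scattered about the line y = 2x + 1), and the helper name `rss` is hypothetical. It compares the residual sum of squares of a straight line with that of a fifth-order polynomial fitted to the same points.

```python
import numpy as np

# Hypothetical data: six points scattered about the line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.arange(6, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

def rss(degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

rss_line = rss(1)      # 2 parameters: slope and intercept
rss_quintic = rss(5)   # 6 parameters: interpolates all six points

print(f"straight line : RSS = {rss_line:.4g}")
print(f"fifth order   : RSS = {rss_quintic:.4g}")
```

The fifth-order fit drives the residuals to essentially zero, yet its four extra coefficients only describe the noise. This is why fit quality alone cannot decide between the two models.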
Model selection techniques can be considered as estimators of some physical quantity, such as the probability of the model producing the given data. The bias and variance of this estimator are both important measures of its quality; asymptotic efficiency is also often considered.
A standard example of model selection is curve fitting: given a set of points and other background knowledge (e.g. that the points are i.i.d. samples), we must select a curve that describes the function that generated the points.
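As an illustration of curve fitting as model selection, the sketch below chooses among polynomial candidates of different degree. It makes several assumptions that are not part of the text above: the noise is i.i.d. Gaussian, the trade-off between fit and parameter count is scored with the Akaike information criterion (one common choice among several), and the data are synthetic. The function names `aic_gaussian` and `select_degree` are hypothetical.

```python
import numpy as np

def aic_gaussian(y, y_hat, n_params):
    """AIC for a least-squares fit under an assumed i.i.d. Gaussian noise model.

    Additive constants shared by every candidate are dropped, since they
    do not change the ranking of the models.
    """
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * n_params

def select_degree(x, y, candidate_degrees):
    """Score each polynomial degree and return the one with the lowest AIC."""
    scores = {}
    for degree in candidate_degrees:
        coeffs = np.polyfit(x, y, degree)
        # A polynomial of this degree has degree + 1 coefficients.
        scores[degree] = aic_gaussian(y, np.polyval(coeffs, x), degree + 1)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data: a quadratic curve plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

best_degree, scores = select_degree(x, y, range(0, 6))
for degree, score in scores.items():
    print(f"degree {degree}: AIC = {score:.2f}")
print("selected degree:", best_degree)
```

The candidate set (degrees 0 to 5 here) is still chosen by the researcher; the criterion only ranks the candidates it is given.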