Model Selection

Definitions

Model selection is the task of choosing, from a set of candidate models, the one with the best inductive bias. In practice this means selecting parameters so that the model has optimal complexity for the given (finite) training data.
Sewell (2006)

"Thus learning is not possible without inductive bias, and now the question is how to choose the right bias. This is called model selection."
Alpaydin (2004) p33

"Model selection is the task of choosing a model of optimal complexity for the given (finite) data."
Cherkassky and Mulier (1998) p73

"Model selection: estimating the performance of different models in order to choose the (approximate) best one."
Hastie, Tibshirani and Friedman (2001)

"The term model selection (see e.g. [Forster, 2000]) refers to the problem of selecting good learning parameters from a small set of choices based on training data."
Joachims (2002) p105

"...it is important to find the optimal model complexity h to avoid overfitting. This process is called model selection and can be done using different criteria. In model selection quantities like the kernel width for radial basis functions, the number of neurons in a neural network or regularization parameters are chosen."
Rychetsky

"Model selection is the task of selecting a mathematical model from a set of potential models, given evidence."
Wikipedia (2006)

Bibliography

"Model selection (variable selection in regression is a special case) is a bias versus variance trade-off and this is the statistical principle of parsimony. Inference under models with too few parameters (variables) can be biased, while with models having too many parameters (variables) there may be poor precision or identification of effects that are, in fact, spurious. These considerations call for a balance between under- and over-fitted models -- the so-called 'model selection problem' (see Forster 2000, 2001)."
Burnham and Anderson (2004)

Empirical

Adjusted R-squared (Wherry, 1931)

Bootstrap (Efron, 1979)

Cross-validation (Stone, 1974; Geisser, 1975)

Generalized cross-validation (GCV) (Craven and Wahba, 1979)
k-fold cross-validation
leave-one-out cross-validation
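
As an illustration of the cross-validation variants listed above, here is a minimal sketch, assuming NumPy and a toy problem in which the "models" are polynomials of different degree. The function names (k_fold_mse, select_degree) and the synthetic data are hypothetical and do not come from any of the cited works. Leave-one-out cross-validation is the special case where k equals the number of observations.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Estimate held-out mean squared error of a polynomial fit by k-fold CV."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))       # shuffle once, then split
    folds = np.array_split(indices, k)      # k roughly equal folds
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds only.
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        # Score on the fold that was held out.
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))

def select_degree(x, y, degrees, k=5, seed=0):
    """Return the degree with the lowest cross-validated error, plus all scores."""
    scores = {d: k_fold_mse(x, y, d, k=k, seed=seed) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (an assumption for this sketch): a noisy quadratic.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

best_degree, scores = select_degree(x, y, degrees=range(0, 7))

# Leave-one-out: every observation is its own fold.
loo_mse_quadratic = k_fold_mse(x, y, degree=2, k=len(x))
```

The same folds are reused for every candidate degree (fixed seed), so differences in score reflect the models rather than the split.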

Jackknife

Linear regression

Shibata’s model selector (SMS) (Shibata, 1981)

Signal-to-noise ratio

Test set validation

Theoretical

Akaike information criterion (AIC)

AIC (Akaike, 1974)
AICc (Hurvich and Tsai, 1989)
QAIC
QAICc
AICW (Wilks, 1995)
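
For orientation, the two most widely used members of this family can be written as follows. The symbols are assumptions introduced here: k is the number of estimated parameters, \hat{L} the maximized likelihood, and n the sample size. Among candidate models fitted to the same data, the one with the smallest value is preferred.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
```

The correction term in AICc vanishes as n grows relative to k, so the two criteria agree for large samples.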

CAT (Parzen, 1974, 1977)

Cp (Mallows' Cp) (Mallows, 1973)
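
In its usual textbook form, the statistic for a regression submodel with p parameters is given below. The notation is an assumption of this sketch: SSE_p is the residual sum of squares of the submodel, \hat{\sigma}^2 is an estimate of the error variance (typically taken from the full model), and n is the number of observations.

```latex
C_p = \frac{\mathrm{SSE}_p}{\hat{\sigma}^2} - n + 2p
```

A submodel with little bias has C_p close to p, so candidates are commonly compared by how near C_p falls to p and by how small it is.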

Deviance information criterion (DIC) (Spiegelhalter et al., 2002)

FIC (Wei, 1992)

Final prediction error (FPE) (Akaike, 1969)

FPEα (Bhansali and Downham, 1977)

FPEC (De Luna, 1998)

FPER (Larsen and Hansen, 1994)

GM (Geweke and Meese, 1981)

Generalized prediction error (GPE) (Moody, 1991, 1992)

Hannan and Quinn criterion (HQ) (Hannan and Quinn, 1979)

KIC, KICc (Cavanaugh, 1999, 2004)

Minimum description length (MDL) (Rissanen, 1978)

Minimum message length (MML) (Wallace and Boulton, 1968)

Predicted squared error (PSE) (Barron, 1984)

PRESS (Allen, 1974)
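
PRESS is the sum of squared leave-one-out prediction errors. For ordinary least squares it can be computed from a single fit, because each deleted residual equals the ordinary residual divided by one minus the leverage of that observation. The sketch below, assuming NumPy and a full-column-rank design matrix, computes PRESS both ways; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS for ordinary least squares, using the leverage shortcut."""
    # Hat matrix H = X (X^T X)^{-1} X^T, formed via the pseudo-inverse.
    H = X @ np.linalg.pinv(X)
    residuals = y - H @ y
    leverage = np.diag(H)
    # Deleted residual for observation i is e_i / (1 - h_ii).
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))

def press_brute_force(X, y):
    """PRESS by explicitly refitting with each observation left out."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return float(total)

# Synthetic regression data (an assumption for this sketch).
rng = np.random.default_rng(1)
X_demo = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y_demo = X_demo @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=30)

press_fast = press_statistic(X_demo, y_demo)
press_slow = press_brute_force(X_demo, y_demo)
```

The two values should agree up to floating-point error; the shortcut avoids n separate refits.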

Schwarz criterion (also Schwarz information criterion (SIC), Bayesian information criterion (BIC) or Schwarz-Bayesian information criterion) (Schwarz, 1978)
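
The criterion is commonly written as BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the sample size, and L the maximized likelihood; lower is better. A minimal sketch follows, assuming NumPy, Gaussian errors, and polynomial candidates. The function name gaussian_bic and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_bic(x, y, degree):
    """BIC of a least-squares polynomial fit under a Gaussian error model."""
    n = len(y)
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    # Gaussian log-likelihood evaluated at the maximum-likelihood estimates.
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = (degree + 1) + 1  # polynomial coefficients plus the variance
    return float(k * np.log(n) - 2.0 * log_lik)

# Synthetic data (an assumption for this sketch): a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

bic = {d: gaussian_bic(x, y, d) for d in range(0, 7)}
best_degree = min(bic, key=bic.get)
```

Because the penalty per parameter is ln(n) rather than the constant 2 used by AIC, BIC penalizes extra parameters more heavily once n exceeds about 7.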