Minimum Description Length (MDL)

Minimum description length (MDL) (Rissanen 1978) is a technique from algorithmic information theory which dictates that the best hypothesis for a given set of data is the one that leads to the largest compression of the data. We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model.
Sewell (2006)
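The two-part sum described above can be made concrete with a small sketch. The helper `two_part_code_length` below is a hypothetical illustration (not from any of the cited sources): it scores a 0/1 sequence under a Bernoulli model by adding the bits needed to describe the model (the count of ones) to the bits needed to describe the data given that model.

```python
import math
import random

def two_part_code_length(bits):
    """Two-part MDL code length, in bits, for a 0/1 sequence under a
    Bernoulli model: first encode the model (the count of ones k,
    taking log2(n+1) bits), then the data given the model (n*H(k/n) bits)."""
    n = len(bits)
    k = sum(bits)
    model_bits = math.log2(n + 1)   # cost of describing the model
    p = k / n
    if p in (0.0, 1.0):
        data_bits = 0.0             # data fully determined by the model
    else:
        data_bits = n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return model_bits + data_bits

regular = [0] * 95 + [1] * 5        # highly regular: compresses well
random.seed(0)
noisy = [random.randint(0, 1) for _ in range(100)]  # no structure to exploit

print(two_part_code_length(regular))  # well under the literal 100 bits
print(two_part_code_length(noisy))    # close to the literal 100 bits
```

The regular sequence gets a short total description; the random one costs roughly as much as writing it out literally, which is the trade-off MDL formalizes.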

Seminal paper:
RISSANEN, J., 1978. Modeling by shortest data description. Automatica, Volume 14, Issue 5, September 1978, Pages 465-471
Abstract: "The number of digits it takes to write down an observed sequence x1, …, xN of a time series depends on the model with its parameters that one assumes to have generated the observed data. Accordingly, by finding the model which minimizes the description length one obtains estimates of both the integer-valued structure parameters and the real-valued system parameters."

"Minimum description length (MDL) (Rissanen 1978) uses an information theoretic measure. Kolmogorov complexity of a dataset is defined as the shortest description of the data. If the data is simple, it has a short complexity; for example, if it is a sequence of “0”s, we can just write “0” and the length of the sequence. If the data is completely random, then we cannot have any description of the data shorter than the data itself. If a model is appropriate for the data, then it has a good fit to the data, and instead of the data, we can send/store the model description. Out of all the models that describe the data, we want to have the simplest model so that it lends itself to the shortest description. So we again have a trade-off between how simple the model is and how well it explains the data."
Alpaydin, 2004
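Alpaydin's point that regular data admits a short description while random data does not can be demonstrated with an off-the-shelf compressor standing in for an (uncomputable) ideal description length; this is only a rough proxy for Kolmogorov complexity, not a measure of it.

```python
import os
import zlib

constant = b"0" * 1000    # regular: "write '0' and the length of the sequence"
noise = os.urandom(1000)  # completely random bytes

print(len(zlib.compress(constant)))  # a handful of bytes
print(len(zlib.compress(noise)))     # roughly as long as the input itself
```

The constant sequence compresses to a few bytes; the random bytes have no regularity for the compressor to exploit, so their "description" is about as long as the data itself.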

"As an example the Minimum Description Length (MDL) principle proposes to use the set of hypotheses for which the description of the chosen function together with the list of training errors is shortest."
Cristianini and Shawe-Taylor (2000), page 5

"It is sometimes claimed that the minimum description length principle provides justification for preferring one type of classifier over another—specifically, “simpler” classifiers over “complex” ones. Briefly stated, the approach purports to find some irreducible, smallest representation of all members of a category (much like a “signal”); all variation among the individual patterns is then “noise.” The argument is that by simplifying recognizers appropriately, the signal can be retained while the noise is ignored."
"The minimum description length (MDL) principle states that we should minimize the sum of the model’s algorithmic complexity and the description of the training data with respect to that model, ..."
Duda, Hart and Stork (2001), pages 461-462, 463

"Intuitively, we can think of the MDL principle as recommending the shortest method for re-encoding the training data, where we count both the size of the hypothesis and any additional cost of encoding the data given this hypothesis."
Mitchell (1997), page 173
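The "hypothesis plus exceptions" accounting that Mitchell (and Cristianini and Shawe-Taylor above) describe can be sketched as follows. The function `mdl_score` and its encoding scheme are illustrative assumptions, not a standard code: each training error is charged the bits needed to identify the misclassified example by index.

```python
import math

def mdl_score(hypothesis_bits, n_examples, n_errors):
    # Total description length: the hypothesis itself, plus the list of
    # training errors, each identified by its index (log2 n bits apiece).
    # The encoding scheme here is an assumption made for illustration.
    return hypothesis_bits + n_errors * math.log2(n_examples)

n = 1000
simple_rule = mdl_score(hypothesis_bits=50, n_examples=n, n_errors=20)   # short rule, 20 mistakes
exact_table = mdl_score(hypothesis_bits=5000, n_examples=n, n_errors=0)  # memorizes the data
print(simple_rule < exact_table)  # True: MDL prefers the simpler hypothesis
```

Under this code, a short rule that misclassifies a few examples beats a lookup table that fits the training data perfectly, which is exactly the simplicity/fit trade-off the quotations describe.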

"The minimum description length principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the largest compression of the data. MDL was introduced by Jorma Rissanen in 1978; it is important in information theory and learning theory."
Wikipedia (2006)

"The MDL principle (Wallace and Boulton, 1968) states that one should prefer models that can communicate the data in the smallest number of bits."
MacKay (2003), page 352

"The MDL Principle is a relatively recent method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally."
Grünwald, 1998

Given a sample of data, and an effective enumeration of models, ideal MDL selects the model which minimizes the sum of the length, in bits, of an effective description of the model, and the length, in bits, of an effective description of the data when encoded with the help of the model.

Vitányi and Li, In: Zellner, Keuzenkamp and McAleer (2001), page 153

Top 10 Papers

  1. RISSANEN, J., 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics. [Cited by 635] (25.88/year)
  2. QUINLAN, J.R. and R.L. RIVEST, 1989. Inferring decision trees using the minimum description length principle. Information and Computation. [Cited by 436] (23.52/year)
  3. BARRON, A.R., J. RISSANEN and B. YU, 1998. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory. [Cited by 325] (34.09/year)
  4. DAVIES, R.H., T.F. COOTES and C.J. TAYLOR, 2001. A Minimum Description Length Approach to Statistical Shape Modelling. Information Processing in Medical Imaging: 17th …. [Cited by 200] (30.61/year)
  5. COOK, D.J. and L.B. HOLDER, 1994. Substructure Discovery Using Minimum Description Length and Background Knowledge. Arxiv preprint cs.AI/9402102. [Cited by 186] (13.74/year)
  6. GRÜNWALD, P.D., 1998. The Minimum Description Length Principle and Reasoning Under Uncertainty. cwi.nl. [Cited by 98] (10.28/year)
  7. HANSEN, M.H. and B. YU, 2001. Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association. [Cited by 202] (30.91/year)
  8. HINTON, G.E. and R.S. ZEMEL, 1994. Autoencoders, minimum description length, and Helmholtz free energy. Advances in Neural Information Processing Systems. [Cited by 111] (8.20/year)
  9. RISSANEN, J., 1987. Minimum description length principle. Encyclopedia of Statistical Sciences. [Cited by 132] (6.43/year)
  10. IBA, H., H. DE GARIS and T. SATO, 1994. Genetic Programming Using a Minimum Description Length Principle. Advances in Genetic Programming. [Cited by 96] (7.09/year)