Minimum Description Length (MDL)

Minimum description length (MDL) (Rissanen 1978) is a technique from algorithmic information theory which dictates that the best hypothesis for a given set of data is the one that leads to the largest compression of the data. We seek to minimize the sum of the length, in bits, of an effective description of the model and the length, in bits, of an effective description of the data when encoded with the help of the model.
Sewell (2006)

seminal paper:
RISSANEN, J., 1978. Modeling by shortest data description, Automatica, Volume 14, Issue 5, September 1978, Pages 465-471
Abstract: "The number of digits it takes to write down an observed sequence x₁, …, x_N of a time series depends on the model with its parameters that one assumes to have generated the observed data. Accordingly, by finding the model which minimizes the description length one obtains estimates of both the integer-valued structure parameters and the real-valued system parameters."

"Minimum description length (MDL) (Rissanen 1978) uses an information theoretic measure. Kolmogorov complexity of a dataset is defined as the shortest description of the data. If the data is simple, it has a short complexity; for example, if it is a sequence of “0”s, we can just write “0” and the length of the sequence. If the data is completely random, then we cannot have any description of the data shorter than the data itself. If a model is appropriate for the data, then it has a good fit to the data, and instead of the data, we can send/store the model description. Out of all the models that describe the data, we want to have the simplest model so that it lends itself to the shortest description. So we again have a trade-off between how simple the model is and how well it explains the data."
Alpaydin, 2004

"As an example the Minimum Description Length (MDL) principle proposes to use the set of hypotheses for which the description of the chosen function together with the list of training errors is shortest."
Cristianini and Shawe-Taylor (2000), page 5

"It is sometimes claimed that the minimum description length principle provides justification for preferring one type of classifier over another—specifically, “simpler” classifiers over “complex” ones. Briefly stated, the approach purports to find some irreducible, smallest representation of all members of a category (much like a “signal”); all variation among the individual patterns is then “noise.” The argument is that by simplifting recognizers appropriately, the signal can be retained while the noise is ignored."
[...]
The minimum description length (MDL) principle states that we should minimize the sum of the model’s algorithmic complexity and the description of the training data with respect to that model, ..."
Duda, Hart and Stork (2001), pages 461-462, 463

"Intuitively, we can think of the MDL principle as recommending the shortest method for re-encoding the training data, where we count both the size of the hypothesis and any additional cost of encoding the data given this hypothesis."
Mitchell (1997), page 173

"The minimum description length principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the largest compression of the data. MDL was introduced by Jorma Rissanen in 1978; it is important in information theory and learning theory."
Wikipedia (2006)

"The MDL principle (Wallace and Boulton, 1968) states that one should prefer models that can communicate the data in the smallest number of bits."
MacKay (2003), page 352

"The MDL Principle is a relatively recent method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally."
Gr�nwald, 1998

Given a sample of data, and an effective enumeration of models, ideal MDL selects the model which minimizes the sum of

the length, in bits, of an effective description of the model; and
the length, in bits, of an effective description of the data when encoded with the help of the model.

Vit?a?nyi and Li, In: Zellner, Keuzenkamp and McAleer (2001), page 153

Top 10 Papers

RISSANEN, J., 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics. [Cited by 635] (25.88/year)
QUINLAN, J.R. and R.L. RIVEST, 1989. Inferring decision trees using the minimum description length principle. Information and Computation. [Cited by 436] (23.52/year)
BARRON, A.R. and J.B. YU, 1998. The minimum description length principle in coding and modeling. Information Theory, IEEE Transactions on. [Cited by 325] (34.09/year)
DAVIES, R.H., T.F. COOTES and C.J. TAYLOR, 2001. A Minimum Description Length Approach to Statistical Shape Modelling. Information Processing in Medical Imaging: 17th …. [Cited by 200] (30.61/year)
COOK, D.J. and L.B. HOLDER, 1994. Substructure Discovery Using Minimum Description Length and Background Knowledge. Arxiv preprint cs.AI/9402102. [Cited by 186] (13.74/year)
GR�NWALD, P.D., 1998. The Minimum Description Length Principle and Reasoning Under Uncertainty. cwi.nl. [Cited by 98] (10.28/year)
HANSEN, M.H. and B. YU, 2001. Model Selection and the Principle of Minimum Description Length.. Journal of the American Statistical Association. [Cited by 202] (30.91/year)
HINTON, G.E. and R.S. ZEMEL, 1994. Autoencoders, minimum description length, and Helmholtz free energy. Advances in Neural Information Processing Systems. [Cited by 111] (8.20/year)
RISSANEN, J., 1987. Minimum description length principle. Encyclopedia of Statistical Sciences. [Cited by 132] (6.43/year)
HITOSHI, I.B.A., H. DE and S. TAISUKE, 1994. Genetic Programming Using a Minimum Description Length Principle. Advances in Genetic Programming. [Cited by 96] (7.09/year)

Bibliography

AGRAWAL, R., M. MEHTA and J.J. RISSANEN, 1998. … decision tree classifier for data records based on a minimum description length (MDL) and presorting …. US Patent 5,787,274. [Cited by 47] (4.93/year)
ANDERSON, E.C. and J. NOVEMBRE, 2003. Finding Haplotype Block Boundaries by Using the Minimum-Description-Length Principle. The American Journal of Human Genetics. [Cited by 57] (12.57/year)
ARGAMON, S., et al., 2004. Efficient Unsupervised Recursive Word Segmentation Using Minimum Description Length. Proc. 20th International Conference on Computational …. [Cited by 8] (2.26/year)
AXELSSON, P., 1999. Processing of laser scanner data�algorithms and applications. ISPRS Journal of Photogrammetry and Remote Sensing. [Cited by 114] (13.36/year)
BACARDIT, J. and J.M. GARRELL, 2003. Bloat control and generalization pressure using the minimum description length principle for a …. Proceedings of the 6th International Workshop on Learning …. [Cited by 13] (2.87/year)
BARRON, A., Y. YANG and B. YU, lip;. Asymptotically optimal function estimation by minimum complexitycriteria. Information Theory, 1994. Proceedings., 1994 IEEE &h. [Cited by 13] (?/year)
BARRON, A.R. and J.B. YU, 1998. The minimum description length principle in coding and modeling. Information Theory, IEEE Transactions on. [Cited by 325] (34.09/year)
BARRON, A.R. and T.M. COVER, 1991. Minimum complexity density estimation. Information Theory, IEEE Transactions on. [Cited by 260] (15.72/year)
BAXTER, R.A., 1996. Minimum Message Length Inference: Theory and Applications. Unpublished doctoral dissertation, Department of Computer …. [Cited by 12] (1.04/year)
BOUCKAERT, R.R., 1993. Probabilistic Network Construction Using the Minimum Description Length Principle. Symbolic and Quantitative Approaches to Reasoning and …. [Cited by 24] (1.65/year)
BOUCKAERT, R.R., 1993. Belief networks construction using the minimum description length principle. Lecture Notes in Computer Science. [Cited by 15] (1.03/year)
BRENT, M.R., S.K. MURTHY and A. LUNDBERG, 1995. Discovering morphemic suffixes: A case study in minimum description length induction. Proceedings of the Fifth International Workshop on …. [Cited by 28] (2.23/year)
CAMERON-JONES, R.M., 1992. Minimum description length instance-based learning. Proceedings of the Fifth Australian Joint Conference on …. [Cited by 14] (0.90/year)
CHAKRABARTI, S., S. SARAWAGI and B. DOM, 1998. Mining surprising patterns using temporal description length. VLDB. [Cited by 79] (8.29/year)
COHEN, I., S. RAZ and D. MALAH, 1999. Translation-invariant denoising using the minimum description length criterion. Signal Processing. [Cited by 15] (1.76/year)
COOK, D.J. and L.B. HOLDER, 1994. Substructure Discovery Using Minimum Description Length and Background Knowledge. Arxiv preprint cs.AI/9402102. [Cited by 186] (13.74/year)
COVER, T.M. and J.A. THOMAS, 2006. Elements of Information Theory. doi.wiley.com. [Cited by 11517] (7,506.76/year)
DAVIES, R.H., et al., 2002. 3D statistical shape models using direct optimisation of description length. 7th European Conference on Computer Vision. [Cited by 79] (14.27/year)
DAVIES, R.H., T.F. COOTES and C.J. TAYLOR, 2001. A Minimum Description Length Approach to Statistical Shape Modelling. Information Processing in Medical Imaging: 17th …. [Cited by 200] (30.61/year)
DOUGHERTY, J., R. KOHAVI and M. SAHAMI, 1995. Supervised and unsupervised discretization of continuous features. Proceedings of the Twelfth International Conference on …. [Cited by 730] (58.24/year)
ERICSSON, A. and K. ASTROM, 2003. Minimizing the description length using steepest descent. Proc. British Machine Vision Conference, Norwich, United …. [Cited by 20] (4.41/year)
FEDER, M., 1986. Maximum entropy as a special case of the minimum description length criterion (Corresp.). Information Theory, IEEE Transactions on. [Cited by 11] (0.51/year)
FIGUEIREDO, M.A.T., J.M.N. LEITAO and A.K. JAIN, 2000. Unsupervised contour representation and estimation using B-splinesand a minimum description length …. Image Processing, IEEE Transactions on. [Cited by 54] (7.17/year)
GALLAND, F., N. BERTAUX and P. REFREGIER, 2003. Minimum description length synthetic aperture radar image segmentation. Image Processing, IEEE Transactions on. [Cited by 22] (4.85/year)
GAO, Q. and M. LI, 1988. An Application of Minimum Description Length Principle to Online Recognition of Handprinted …. York University, Faculty of Arts, Faculty of Science, Dept. of …. [Cited by 11] (0.56/year)
GAO, Q. and M. LI, 1989. The minimum description length principle and its application to online learning of handprinted …. Proc. 11th IEEE International Joint Conference on Artificial …. [Cited by 7] (0.38/year)
GRIINWALD, P., 1996. A Minimum Description Length Approach to Grammar Inference. Connectionist, Statistical and Symbolic Approaches to …. [Cited by 17] (1.47/year)
GRUNWALD, P., 1994. A minimum description length approach to grammar inference. Connectionist, Statistical and Symbolic Approaches to …. [Cited by 33] (2.44/year)
GRUNWALD, P., 2004. A tutorial introduction to the minimum description length principle. Arxiv preprint math.ST/0406077. [Cited by 51] (14.43/year)
GRUNWALD, P., Proc. IJCAI'97 Workshop on Abduction and Induction in AI. The Minimum Description Length principle and non-deductive inference. [Cited by 7] (?/year)
GRUNWALD, P.D., 2007. The minimum description length principle. MIT Press. [Cited by 11] (20.59/year)
GRUNWALD, P.D., I.J. MYUNG and M.A. PITT, 2004. Advances in Minimum Description Length: Theory and Applications. [Cited by 30] (8.49/year)
GRUNWALD, P.D., I.J. MYUNG and M.A. PITT, 2005. Advances in minimum description length. MIT Press. [Cited by 6] (2.37/year)
GR�NWALD, P., 2000. Model Selection Based on Minimum Description Length. Journal of Mathematical Psychology. [Cited by 58] (7.70/year)
GR�NWALD, P., 2005. Minimum Description Length Tutorial. … In Minimum Description Length: Theory And Applications. [Cited by 20] (7.89/year)
GR�NWALD, P.D., 1998. The Minimum Description Length Principle and Reasoning Under Uncertainty. cwi.nl. [Cited by 98] (10.28/year)
GR�NWALD, P.D., M.A. PITT and I.J. MYUNG, 2005. Advances In Minimum Description Length: Theory And Applications. books.google.com. [Cited by 19] (7.50/year)
GU, H., Y. SHIRAI and M. ASADA, 1996. MDL-based segmentation and motion modeling in a long image sequence of scene with multiple …. IEEE Transactions on Pattern Analysis and Machine …. [Cited by 40] (3.47/year)
GUSTAFSSON, F., 2000. Adaptive filtering and change detection. doi.wiley.com. [Cited by 200] (26.55/year)
HALL, P. and E.J. HANNAN, 2004. On stochastic complexity and nonparametric density estimation. Biometrika. [Cited by 33] (9.34/year)
HANSEN, M. and B. YU, 1998. Model selection and minimum description length principle. Journal of the American Statistical Association, submitted. [Cited by 15] (1.57/year)
HANSEN, M.H. and B. YU, 2001. Model Selection and the Principle of Minimum Description Length.. Journal of the American Statistical Association. [Cited by 202] (30.91/year)
HINTON, G.E. and D. VAN, 1993. Keeping the neural networks simple by minimizing the description length of the weights. Proceedings of the sixth annual conference on Computational …. [Cited by 145] (9.98/year)
HINTON, G.E. and R.S. ZEMEL, 1994. Autoencoders, minimum description length, and Helmholtz free energy. Advances in Neural Information Processing Systems. [Cited by 111] (8.20/year)
HINTON, G.E., et al., 1995. The" wake-sleep" algorithm for unsupervised neural networks. Science. [Cited by 293] (23.38/year)
HITOSHI, I.B.A., H. DE and S. TAISUKE, 1994. Genetic Programming Using a Minimum Description Length Principle. Advances in Genetic Programming. [Cited by 96] (7.09/year)
HOLDER, L., D. COOK and S. DJOKO, 1994. Substructure discovery in the subdue system. Proc. AAAI. [Cited by 71] (5.25/year)
JOHN, G.H., R. KOHAVI and K. PFLEGER, 1994. Irrelevant features and the subset selection problem. Proceedings of the Eleventh International Conference on …. [Cited by 864] (63.84/year)
KAVCIC, A. and M. SRINIVASAN, 2001. The minimum description length principle for modeling recordingchannels. Selected Areas in Communications, IEEE Journal on. [Cited by 8] (1.22/year)
KELLER, B. and R. LUTZ, 1997. Evolving stochastic context-free grammars from examples using a minimum description length principle. Workshop on Automata Induction Grammatical Inference and …. [Cited by 21] (1.99/year)
KIT, C. and Y. WILKS, 1999. Unsupervised learning of word boundary with description length gain. Proc. CoNLL99 ACL Workshop. [Cited by 22] (2.58/year)
LE, Y., J.S. DENKER and S.A. SOLLA…, 1990. Optimal brain damage. Advances in Neural Information Processing Systems. [Cited by 701] (39.98/year)
LEE, T.C.M., 2000. Regression spline smoothing using the minimum description length principle. Statistics and Probability Letters. [Cited by 8] (1.06/year)
LEE, T.C.M., 2000. A Minimum Description Length-Based Image Segmentation Procedure, and Its Comparison with a Cross- …. Journal of the American Statistical Association. [Cited by 14] (1.86/year)
LEE, T.C.M., 2002. Tree-based wavelet regression for correlated data using the minimum description length principle. Australian & New Zealand Journal of Statistics. [Cited by 6] (1.08/year)
LEE, T.C.M., rn. An Introduction to Coding Theory and the Two {Part Minimum Description Length Principle. [Cited by 15] (?/year)
LEHTOKANGAS, M., 1996. Predictive minimum description length criterion for time series modeling with neural networks. Neural Computation. [Cited by 10] (0.87/year)
LEONARDIS, A. and H. BISCHOF, 2000. Robust recognition using eigenimages. Computer Vision and Image Understanding. [Cited by 158] (20.97/year)
LI, J., et al., 2006. Minimum description length principle: Generators are preferable to closed patterns. Proceedings 21st National Conference on Artificial …. [Cited by 6] (3.91/year)
LI, M., 1993. Minimum description length based 2D shape description. Computer Vision, 1993. Proceedings., Fourth International …. [Cited by 20] (1.38/year)
LINDEBERG, T. and M.X. LI, 1997. … and Classification of Edges Using Minimum Description Length Approximation and Complementary …. Computer Vision and Image Understanding. [Cited by 20] (1.90/year)
LUTZ, R., 2002. Recovering High-Level Structure of Software Systems Using a Minimum Description Length Principle. Artificial Intelligence and Cognitive Science: 13th Irish …. [Cited by 14] (2.53/year)
MANNILA, H., et al., 2003. Minimum Description Length Block Finder, a Method to Identify Haplotype Blocks and to Compare the … - all 6 versions ». The American Journal of Human Genetics. [Cited by 10] (2.21/year)
MANSOURI, A.R. and J. KONRAD, 2000. Minimum description length region tracking with level sets. Proceedings of SPIE- The International Society for Optical …. [Cited by 6] (0.80/year)
MARK, K.E. and M.I. MILLER, 1992. Bayesian model selection and minimum description length estimation of auditory-nerve discharge rates. The Journal of the Acoustical Society of America. [Cited by 9] (0.58/year)
MERHAV, N., 1993. On the minimum description length principle for sources withpiecewise constant parameters. Information Theory, IEEE Transactions on. [Cited by 29] (2.00/year)
OLIVEIRA, A.L. and A. SANGIOVANNI-VINCENTELLI, 1996. Using the Minimum Description Length Principle to Infer Reduced Ordered Decision Graphs. Machine Learning. [Cited by 20] (1.73/year)
PARIDA, L., D. GEIGER and R. HUMMEL, 1997. Kona: A Multi-junction Detector Using Minimum Description Length Principle. Energy Minimization Methods in Computer Vision and Pattern …. [Cited by 6] (0.57/year)
POTHOS, E.M. and N. CHATER, 2001. Categorization by simplicity: A minimum description length approach to unsupervised clustering. Similarity and Categorization. [Cited by 6] (0.92/year)
QUINLAN, J.R. and R.L. RIVEST, 1989. Inferring decision trees using the minimum description length principle. Information and Computation. [Cited by 436] (23.52/year)
QUINLAN, J.R., 1994. The Minimum Description Length Principle and Categorical Theories. The Proceedings of the Eleventh International Machine …. [Cited by 30] (2.22/year)
RAO, R.B. and S.C.Y. LU, 1992. Learning engineering models with the minimum description length principle.. PROC TENTH NATL CONF ARTIF INTELL AAAI 92., AAAI, MENLO PARK …. [Cited by 9] (0.58/year)
RISSANEN, J., 1982. Estimation of structure by minimum description length. Circuits, Systems, and Signal Processing. [Cited by 15] (0.59/year)
RISSANEN, J., 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics. [Cited by 635] (25.88/year)
RISSANEN, J., 1985. The minimum description length principle, Encyclopedia of Statistical Sciences vol. 5. [Cited by 9] (0.40/year)
RISSANEN, J., 1987. Encyclopedia of Statistical Sciences, volume 5, chapter Minimum-Description-Length Principle. [Cited by 7] (0.34/year)
RISSANEN, J., 1987. Minimum description length principle. Encyclopedia of Statistical Sciences. [Cited by 132] (6.43/year)
SAITO, N., 1994. … and Signal Compression Using a Library of Orthonormal Bases and the Minimum Description Length …. Wavelets in Geophysics. [Cited by 100] (7.39/year)
SCHUG, J. and G.C. OVERTON, 1997. Modeling transcription factor binding sites with Gibbs Sampling and Minimum Description Length …. Proc Int Conf Intell Syst Mol Biol. [Cited by 30] (2.85/year)
SHINODA, K. and T. WATANABE, 1996. Speaker adaptation with autonomous model complexity control by MDLprinciple. Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. …. [Cited by 41] (3.55/year)
SMALL, M. and C.K. TSE, 2002. Minimum description length neural networks for time series prediction. Physical Review E. [Cited by 31] (5.60/year)
SUZUKI, J., 1996. Learning Bayesian belief networks based on the minimum description length principle: An efficient …. Proceedings of the Thirteenth International Conference on …. [Cited by 20] (1.73/year)
SUZUKI, J., Proceedings of the 9th Conference on Uncertainty in …. … Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties. [Cited by 17] (?/year)
THODBERG, H.H. and H. OLAFSDOTTIR, 2003. Adding curvature to minimum description length shape models. Proc. British Machine Vision Conference. [Cited by 28] (6.18/year)
THODBERG, H.H., 2003. Minimum Description Length Shape and Appearance Models. Information Processing in Medical Imaging: 18th …. [Cited by 38] (8.38/year)
VITANYI, P.M.B., M. LI and A. CWI, 2000. Minimum description length induction, Bayesianism, and Kolmogorovcomplexity. Information Theory, IEEE Transactions on. [Cited by 100] (13.27/year)
WALLACE, C.S. and D.L. DOWE, 1999. Minimum Message Length and Kolmogorov Complexity. The Computer Journal. [Cited by 124] (14.53/year)
WALLACE, R.S. and T. KANADE, 1990. Finding natural clusters having minimum description length. Pattern Recognition, 1990. Proceedings., 10th International …. [Cited by 10] (0.57/year)
WALTER, M., A. PSARROU and S. GONG, Proc. BMVC. Data driven gesture model acquisition using minimum description length. [Cited by 9] (?/year)
WAX, M., I. ZISKIND and H. RAFAEL, 1989. Detection of the number of coherent signals by the MDL principle. Acoustics, Speech, and Signal Processing [see also IEEE …. [Cited by 139] (7.50/year)
WIENBERGER, M.J., N. MERHAV and M. FEDER, lip;. Optimal sequential probability assignment for individual sequences. Information Theory, 1994. Proceedings., 1994 IEEE &h. [Cited by 51] (?/year)
WONG, M.L., W. LAM and K.S. LEUNG, 1999. Using evolutionary programming and minimum description lengthprinciple for data mining of Bayesian …. Pattern Analysis and Machine Intelligence, IEEE Transactions …. [Cited by 49] (5.74/year)
WONG, M.L., W. LAM and K.S. LEUNG, 1999. Using evolutionary computation and minimum description length principle for data mining of …. IEEE Transactions on Pattern Analysis and Machine …. [Cited by 8] (0.94/year)
XIE, J., D. ZHANG and W. XU, 2004. Spatially adaptive wavelet denoising using the minimum description length principle. Image Processing, IEEE Transactions on. [Cited by 7] (1.98/year)
XIONG, Z., et al., 2004. … efficient sports highlights extraction using the minimum description length criterion in selecting …. Multimedia and Expo, 2004. ICME'04. 2004 IEEE International …. [Cited by 12] (3.40/year)
ZADROZNY, W., 2001. MINIMUM DESCRIPTION LENGTH AND COMPOSITIONALITY. Computing Meaning. [Cited by 8] (1.22/year)
ZEMEL, R.S. and G.E. HINTON, 1994. Developing population codes by minimizing description length. Advances in Neural Information Processing Systems. [Cited by 29] (2.14/year)
ZEMEL, R.S., 1993. A Minimum Description Length Framework for Unsupervised Learning. [Cited by 80] (5.50/year)
ZEMEL, R.S., 1998. Minimum description length analysis. The handbook of brain theory and neural networks table of …. [Cited by 12] (1.26/year)
ZHAO, W., E. SERPEDIN and E.R. DOUGHERTY, 2006. Inferring gene regulatory networks from time series data using the minimum description length …. Bioinformatics. [Cited by 7] (4.56/year)

Minimum Description Length (MDL)

Top 10 Papers

Links

Bibliography