Notebooks

The Minimum Description Length Principle (MDL)

16 Feb 2008 23:10

MDL is an information-theoretic approach to machine learning, or statistical model selection, which basically says you should pick the model which gives you the most compact description of the data, including the description of the model itself. More precisely, given a probabilistic model, Shannon's coding theorems tell you the minimal number of bits needed to encode your data, i.e., the maximum extent to which it can be compressed. Really, however, to complete the description, you need to specify the model as well, from among some set of alternatives, and this will also require a certain number of bits. Hence you really want to minimize the combined length of the description of the model, plus the description of the data under that model. This works out to being a kind of penalized maximum likelihood --- the data-given-model bit is the negative log likelihood, and the model-description term is the penalty.

It's a very appealing idea, and a lot of work has (rightly) been done under this heading, though I have to say I'm not altogether convinced, both because of the issues involved in chosing a coding scheme for models, and because it's not clear that, in practice, it actually does that much better than straightforward likelihood maximization. (See the paper by Domingos.)

I should also say that what I've described above is the old-fashioned "two-part" MDL, and there are now "one-part" schemes, where (so to speak) the model coding scheme is supposed to be fixed by the data as well, in some unambiguous and nearly-optimal manner. However, I honestly don't understand those yet, so I'm not competent to talk about them.

See also: Universal Prediction Algorithms


Notebooks:     Hosted, but not endorsed, by the Center for the Study of Complex Systems