Sufficient Statistics
22 Jun 2008 21:27
In statistical theory, a "statistic" is a well-behaved function of the data, which is what's actually used in calculations or inferences, rather than the full data set; e.g., the sample mean, the sample median, the sample variance, etc. A statistic is sufficient if it is just as informative as the full data. The concept was introduced by R. A. Fisher in the 1920s, and refined by Jerzy Neyman in the 1930s. Parametric sufficiency means that the statistic contains just as much information about the parameter as the full data. The actual data has a certain probability distribution conditional on the statistic, which in general will also involve the parameter. The statistic is sufficient if this conditional distribution is the same for all parameter values. (That's actually clearer in algebra but I don't feel up to writing it in HTML now.) Once we've controlled for the sufficient statistic, nothing else, not even the original data, can tell us anything more about the parameter. Predictive sufficiency is similar: given the predictively sufficient statistic, future observations can be predicted as well as if the whole past were available. This can be expressed concisely in terms of mutual information.
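For reference, here is a sketch of the algebra behind the two definitions above. The notation is an assumption introduced for this sketch, not taken from any particular source: X is the data, \theta the parameter, T = \tau(X) the statistic, X^- and X^+ the past and future of a process, and S = \sigma(X^-) a statistic of the past. The factorization form is equivalent to the first line only under the usual regularity conditions (roughly, a dominated family of distributions).

```latex
% Assumed notation: data X, parameter \theta, statistic T = \tau(X).
\begin{align*}
% Parametric sufficiency: the conditional law of the data given the statistic is parameter-free.
P_\theta(X = x \mid T = t) &= P_{\theta'}(X = x \mid T = t)
  && \text{for all } \theta, \theta', x, t \\
% Factorization form (Fisher-Neyman), for densities or mass functions p_\theta:
p_\theta(x) &= g_\theta(\tau(x)) \, h(x) \\
% Predictive sufficiency: S = \sigma(X^-) retains all the information the past carries about the future.
I[X^+; S] &= I[X^+; X^-]
\end{align*}
```

Since S is a function of the past, the data-processing inequality always gives I[X^+; S] \leq I[X^+; X^-]; predictive sufficiency is the case of equality, which (when these informations are finite) is the same as the future being independent of the past given S.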
A necessary statistic is one which can be computed from any sufficient statistic, without reference to the original data. (It's "necessary" in the sense that any optimal inference implicitly involves knowing the necessary statistic.) Under pretty general conditions, maximum likelihood estimates are necessary statistics, though they are not always sufficient. A minimal sufficient statistic is one which is both necessary and sufficient --- i.e., it's just as informative as the original data, but it can be computed from any other sufficient statistic; no further compression of the data is possible, without losing some information.
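A toy illustration of these definitions, as a minimal sketch rather than a general method: take four IID Bernoulli(p) coin tosses, for which the number of heads is the textbook minimal sufficient statistic. The check for minimality uses the standard likelihood-ratio criterion (two samples belong to the same cell exactly when the ratio of their likelihoods does not depend on the parameter), which is swapped in here as a convenient characterization. The model, the sample size, the parameter grid, and all function names are assumptions made for the example, and checking a finite grid of p values is only suggestive, not a proof.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def likelihood(x, p):
    """Probability of the binary sequence x under IID Bernoulli(p)."""
    k = sum(x)
    return p**k * (1 - p) ** (len(x) - k)


def likelihood_shape(x, grid):
    """Likelihood over the grid, rescaled so that its first entry is 1.

    Two samples share a shape exactly when their likelihood ratio is
    constant across the grid.
    """
    base = likelihood(x, grid[0])
    return tuple(likelihood(x, p) / base for p in grid)


def split_sums(x):
    """A sufficient but non-minimal statistic: heads in each half."""
    half = len(x) // 2
    return (sum(x[:half]), sum(x[half:]))


def partition(groups):
    """The set of cells induced by a statistic on the sample space."""
    return {frozenset(cell) for cell in groups.values()}


n = 4
grid = [Fraction(k, 10) for k in range(1, 10)]  # exact arithmetic, p = 0.1..0.9
samples = list(product([0, 1], repeat=n))

by_shape = defaultdict(set)  # cells of the likelihood-ratio partition
by_sum = defaultdict(set)    # cells of the "number of heads" statistic
for x in samples:
    by_shape[likelihood_shape(x, grid)].add(x)
    by_sum[sum(x)].add(x)

# Sufficiency: given the number of heads, every compatible sequence is
# equally likely, whatever the value of p.
conditional_is_uniform = all(
    likelihood(x, p) / sum(likelihood(y, p) for y in cell) == Fraction(1, len(cell))
    for cell in by_sum.values()
    for x in cell
    for p in grid
)

# Minimality: the number of heads induces the likelihood-ratio partition.
same_partition = partition(by_shape) == partition(by_sum)

# Necessity: the number of heads is computable from the finer statistic.
sum_from_split = all(sum(split_sums(x)) == sum(x) for x in samples)
n_cells_sum = len(by_sum)
n_cells_split = len({split_sums(x) for x in samples})

print(conditional_is_uniform, same_partition, sum_from_split)
print(n_cells_sum, n_cells_split)
```

The per-half counts are also sufficient, but they split the sample space into more cells than the total count does, and the total can be recovered from them without going back to the data. That is the sense in which the total is "necessary", and, being sufficient as well, minimal.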
A lot of my work has involved describing and finding predictively sufficient statistics for time series and spatio-temporal processes. It turns out that the statistical sufficiency property gives rise to a Markov property for the statistics. (So, basically, computational mechanics turns out to be about constructing predictively sufficient statistics.) So I'm very interested in sufficiency in general, and especially how it relates to Markovian representations of non-Markovian processes.
Topics of particular interest: Necessary and sufficient conditions for the existence of non-trivial sufficient statistics; dimensionality of sufficient statistics; geometric and probabilistic characterizations; decision-theoretic properties; necessary statistics; minimal sufficient statistics for transducers; connections to causal inference
- Recommended:
- Sufficiency is a very important topic in statistical inference, and any good book on theoretical statistics will cover it in depth. I like E. L. Lehmann's two-volume set on Theory of Point Estimation and Testing Statistical Hypotheses, but really any one will do.
- David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions [Blackwell was a pioneer in exploring the decision-theoretic properties of sufficiency, and this excellent old book contains many deep theorems in this area]
- E. B. Dynkin, "Sufficient statistics and extreme points", Annals of Probability 6 (1978): 705--730 ["The connection between ergodic decompositions and sufficient statistics is explored in an elegant paper by DYNKIN" --- Kallenberg, Foundations of Modern Probability, p. 577. Link to JSTOR]
- John W. Fisher III, Alexander T. Ihler and Paul A. Viola, "Learning Informative Statistics: A Nonparametric Approach", pp. 900--906 in NIPS 12 (1999) [PDF reprint. I'd call this more of a semi-parametric approach than a fully non-parametric one; they assume a parametric form for the dependence structure, but are agnostic about the distributions of innovations, and so try to maximize non-parametrically estimated mutual informations. In the limit, this will give them sufficient statistics.]
- R. A. Fisher
- "A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error", Monthly Notices of the Royal Astronomical Society 80 (1920): 758--770 [Apparently the first time the sufficiency property was noted, though Fisher does not use that term here. PDF]
- "On the Mathematical Foundations of Theoretical Statistics", Philosophical Transactions of the Royal Society A 222 (1922): 309--368 [Formal introduction of the concept, and the name, of sufficiency, along with much else that has proved fundamental to statistics, such as the likelihood function and the method of maximum likelihood. PDF in two parts, 1, 2]
- "Theory of Statistical Estimation", Proceedings of the Cambridge Philosophical Society 22 (1925): 700--725 [Often, but mistakenly, cited in place of the 1922 paper; admittedly, clearer. PDF]
- Solomon Kullback, Information Theory and Statistics
- Rudolf Kulhavy, Recursive Nonlinear Estimation: A Geometric Approach
- Benoit Mandelbrot, "The Role of Sufficiency and of Estimation in Thermodynamics", Annals of Mathematical Statistics 33 (1962): 1021--1038 [JSTOR; free PDF reprint. Extensive thermodynamic variables as sufficient statistics for the conjugate intensive variables; Gibbs canonical form arising from natural requirements on finite-dimensional sufficient statistics, which can only be achieved for exponential families of probability distributions. Very clever, and IMHO a real contribution to the foundations of statistical mechanics and thermodynamics.]
- Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics 33 (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic": the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a distinct distribution of observables. (I would prefer "parameter" or "functional", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) It might be interesting to try to define emergence in these terms, perhaps as a restriction on the observable sigma-field such that the equivalence classes of the maximal identifiable parameter become infinite-dimensional, or something like that. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
- To read:
- R. R. Bahadur, "Sufficiency and statistical decision functions," Annals of Mathematical Statistics 25 (1954): 423--462
- T. Bohlin, "Information pattern for linear discrete-time models with stochastic coefficients," IEEE Transactions on Automatic Control 15 (1970): 104--106 [On recursively-computable sufficient statistics]
- J. L. Denny, "Sufficient Conditions for a Family of Probabilities to be Exponential", Proceedings of the National Academy of Sciences 57 (1967): 1184-- ["We make the following statement precise under fairly weak conditions: in an experiment, if we summarize n statistically independent observations $(x_1, \ldots, x_n)$ in m < n real numbers $(y_1, \ldots, y_m)$, where $y_j = \sum_{i=1}^{n} f_j(x_i)$ and the $f_j$ are given functions, and if we assume we have lost no information by the summary, then the family of probabilities associated with the experiment must be an exponential family."]
- E. B. Dynkin, "Necessary and sufficient statistics for a family of probability distributions," Uspekhi Matem. Nauk 6 (1951): 68--90 [Apparently translated in Select. Trans. Math. Statist. Prob. 1 (1951): 23--41. Zacks, below, is supposed to follow it closely]
- V. S. Huzurbazar, Sufficient Statistics: Selected Contributions
- Anna Jencova and Denes Petz, "Sufficiency in quantum statistical inference", math-ph/0412093 [Sounds cool!]
- S. L. Lauritzen, Extremal Families and Systems of Sufficient Statistics
- W. J. Runggaldier and F. Spizzichino, "Sufficient conditions for finite dimensionality of filters in discrete time: A Laplace transform-based approach," Bernoulli 7 (2001): 211--221
- S. Zacks, The Theory of Statistical Inference [For material on necessary and sufficient statistics]
