Ensemble Methods in Machine Learning
20 Aug 2007 21:31
Boosting, bagging, binning, stacking, mixtures of experts, ...
Value of diversity.
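One concrete way to see the value of diversity is the ambiguity decomposition for a uniformly weighted average of regressors (the sort of decomposition named in the title of Hansen's thesis below): the squared error of the averaged prediction equals the average squared error of the members minus the average squared spread of the members around the ensemble. The sketch below checks that identity numerically. The function name, the ten simulated predictors, and the Gaussian noise are assumptions made for illustration and do not come from any of the listed references.

```python
import random


def ambiguity_decomposition(predictions, target):
    """Return (ensemble_error, mean_member_error, mean_ambiguity).

    All three are squared-error quantities for a single target value,
    using a uniformly weighted average as the ensemble prediction.
    """
    m = len(predictions)
    ensemble = sum(predictions) / m
    ensemble_error = (ensemble - target) ** 2
    mean_member_error = sum((p - target) ** 2 for p in predictions) / m
    # "Ambiguity" = how much the members disagree with their own average.
    mean_ambiguity = sum((p - ensemble) ** 2 for p in predictions) / m
    return ensemble_error, mean_member_error, mean_ambiguity


# Hypothetical setup: ten noisy predictors of the same target value.
rng = random.Random(0)
target = 1.0
predictions = [target + rng.gauss(0.0, 1.0) for _ in range(10)]

ens_err, avg_err, ambiguity = ambiguity_decomposition(predictions, target)
print("ensemble error:      ", ens_err)
print("mean member error:   ", avg_err)
print("mean ambiguity:      ", ambiguity)
print("member err - ambig.: ", avg_err - ambiguity)
```

Since the ambiguity term is never negative, the averaged predictor is never worse than the average member, and the more the members disagree at a given level of individual error, the larger the gain.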
- Recommended (totally inadequate; just what happened to come to mind while cleaning up my files):
- Pedro Domingos, "The Role of Occam's Razor in Knowledge Discovery," Data Mining and Knowledge Discovery, 3 (1999) [Online. Ensemble methods as an apparent violation of Occam's Razor.]
- G. Langer and U. Parlitz, "Modeling parameter dependence from time series", Physical Review E 70 (2004): 056217 [Interesting use of ensemble methods in state space modeling]
- Lawrence K. Saul and Michael I. Jordan, "Mixed Memory Markov Models: Decomposing Complex Stochastic Processes as Mixtures of Simpler Ones", Machine Learning 37 (1999): 75--87
- Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee, "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods", Annals of Statistics 26 (1998): 1651--1686
- To read:
- Ran Avnimelech and Nathan Intrator, "Boosted Mixture of Experts: An Ensemble Learning Scheme", Neural Computation 11 (1999): 483--497
- Larry M. Bartels, "Specification Uncertainty and Model Averaging", American Journal of Political Science 41 (1997): 641--674
- Zhuo Chen and Yuhong Yan, "Time Series Models for Forecasting: Testing or Combining?", Studies in Nonlinear Dynamics and Econometrics 11:1 (2007): 3
- M. Di Marzio and C. C. Taylor, "Kernel density classification and boosting: an L2 analysis", Statistics and Computing 15 (2005): 113--123
- Yoav Freund, Yishay Mansour and Robert E. Schapire, "Generalization bounds for averaged classifiers", Annals of Statistics 32 (2004): 1698--1722 = math.ST/0410092
- Yoav Freund, Robert E. Schapire, Yoram Singer and Manfred K. Warmuth, "Using and combining predictors that specialize" [PDF preprint]
- G. Fumera and F. Roli, "A Theoretical and Experimental Analysis of Linear Combiners for Multiple Classifier Systems", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005): 942--956
- Nicolas Garcia-Pedrajas, Cesar Garcia-Osorio and Colin Fyfe, "Nonlinear Boosting Projections for Ensemble Construction", Journal of Machine Learning Research 8 (2007): 1--33
- Etienne Grossmann, "A Theory of Probabilistic Boosting, Decision Trees and Matryoshki", cs.LG/0607110
- Jakob Vogdrup Hansen, Combining Predictors: Meta Machine Learning Methods and Bias/Variance & Ambiguity Decompositions [Ph.D. thesis, University of Aarhus, 2000; on-line]
- Geoffrey E. Hinton, "Training Products of Experts by Minimizing Contrastive Divergence", Neural Computation 14 (2002): 1771--1800
- Marcus Hutter and Jan Poland, "Adaptive Online Prediction by Following the Perturbed Leader", cs.AI/0504078 = Journal of Machine Learning Research 6 (2005): 639--660
- Robert A. Jacobs, "Bias/Variance Analyses of Mixtures-of-Experts Architectures", Neural Computation 9 (1997): 369--383 ["This article investigates the bias and variance of mixtures-of-experts (ME) architectures. The variance of an ME architecture can be expressed as the sum of two terms: the first term is related to the variances of the expert networks that comprise the architecture and the second term is related to the expert networks' covariances. One goal of this article is to study and quantify a number of properties of ME architectures via the metrics of bias and variance. A second goal is to clarify the relationships between this class of systems and other systems that have recently been proposed. It is shown that in contrast to systems that produce unbiased experts whose estimation errors are uncorrelated, ME architectures produce biased experts whose estimates are negatively correlated."]
- Wenxin Jiang, "Boosting with Noisy Data: Some Views from Statistical Theory", Neural Computation 16 (2004): 789--810
- Ludmila I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms
- Nicole Kraemer, "Boosting for Functional Data", math.ST/0605751
- Guillaume Lecué, "Lower Bounds and Aggregation in Density Estimation", Journal of Machine Learning Research 7 (2006): 971--981
- David Mease, Abraham J. Wyner and Andreas Buja, "Boosted Classification Trees and Class Probability/Quantile Estimation", Journal of Machine Learning Research 8 (2007): 409--439
- David J. Miller and Siddharth Pal, "Transductive Methods for the Distributed Ensemble Classification Problem", Neural Computation 19 (2007): 856--884
- Seiji Miyoshi, Kazuyuki Hara, and Masato Okada, "Analysis of ensemble learning using simple perceptrons based on online learning theory", Physical Review E 71 (2005): 036116
- L. Nunes and E. Oliveira, "On Learning by Exchanging Advice," cs.LG/0203010
- Fernando C. Pereira and Yoram Singer, "An Efficient Extension to Mixture Techniques for Prediction and Decision Trees", Machine Learning 36 (1999): 183--199
- Evgueni Petrov, "Constraint-based analysis of composite solvers," cs.AI/0302036
- Yoram Singer, "Adaptive Mixtures of Probabilistic Transducers", Neural Computation 9 (1997): 1711--1733 [PS.gz preprint]
- Eiji Takimoto and Akira Maruoka, "Top-down decision tree learning as information based boosting", Theoretical Computer Science 292 (2002): 447--464
- Héla Zouari, Laurent Heutte and Yves Lecourtier, "Controlling the diversity in classifier ensembles through a measure of agreement", Pattern Recognition 38 (2005): 2195--2199
