jmlr jmlr2005 jmlr2005-54 jmlr2005-54-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gavin Brown, Jeremy L. Wyatt, Peter Tiňo
Abstract: Ensembles are a widely used and effective technique in machine learning—their success is commonly attributed to the degree of disagreement, or ‘diversity’, within the ensemble. For ensembles where the individual estimators output crisp class labels, this ‘diversity’ is not well understood and remains an open research issue. For ensembles of regression estimators, the diversity can be exactly formulated in terms of the covariance between individual estimator outputs, and the optimum level is expressed in terms of a bias-variance-covariance trade-off. Despite this, most approaches to learning ensembles use heuristics to encourage the right degree of diversity. In this work we show how to explicitly control diversity through the error function. The first contribution of this paper is to show that by taking the combination mechanism for the ensemble into account we can derive an error function for each individual that balances ensemble diversity with individual accuracy. We show the relationship between this error function and an existing algorithm called negative correlation learning, which uses a heuristic penalty term added to the mean squared error function. It is demonstrated that these methods control the bias-variance-covariance trade-off systematically, and can be utilised with any estimator capable of minimising a quadratic error function, for example MLPs or RBF networks. As a second contribution, we derive a strict upper bound on the coefficient of the penalty term, which holds for any estimator that can be cast in a generalised linear regression framework, with mild assumptions on the basis functions. Finally we present the results of an empirical study, showing significant improvements over simple ensemble learning, and finding that this technique is competitive with a variety of methods, including boosting, bagging, mixtures of experts, and Gaussian processes, on a number of tasks. Keywords: ensemble, diversity, regression estimators, neural networks, hessian
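For readers who want to see the mechanism the abstract describes, below is a minimal numpy sketch of negative correlation learning in its standard penalised form, e_i = 1/2 (f_i - d)^2 + lam * (f_i - fbar) * sum_{j != i} (f_j - fbar). Since the member outputs satisfy sum_{j != i} (f_j - fbar) = -(f_i - fbar) for a simple average, the per-member gradient with respect to its own output is (f_i - d) - lam * (f_i - fbar). The architecture, synthetic data, and hyperparameter values (M, H, lam, lr, epochs) are illustrative assumptions, not taken from the paper.

# Minimal sketch of negative correlation (NC) learning for a regression
# ensemble of M small tanh networks, combined by simple averaging.
# Per-member gradient on its output: (f_i - d) - lam * (f_i - fbar),
# i.e. plain MSE plus the NC penalty. All sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D task: noisy sine wave (illustrative, not from the paper).
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

M, H, lr, lam, epochs = 5, 10, 0.05, 0.5, 2000   # lam in [0, 1]

# One hidden layer per member; leading axis m indexes ensemble members.
W1 = rng.normal(scale=0.5, size=(M, 1, H)); b1 = np.zeros((M, H))
W2 = rng.normal(scale=0.5, size=(M, H));    b2 = np.zeros(M)

def forward(X):
    hidden = np.tanh(X[None, :, :] @ W1 + b1[:, None, :])   # (M, N, H)
    f = np.einsum('mnh,mh->mn', hidden, W2) + b2[:, None]   # (M, N)
    return hidden, f

for _ in range(epochs):
    hidden, f = forward(X)
    fbar = f.mean(axis=0)                        # ensemble output, (N,)
    # NC gradient of e_i w.r.t. f_i (other members held fixed):
    delta = (f - y[None, :]) - lam * (f - fbar[None, :])    # (M, N)
    # Back-propagate delta through every member in parallel.
    gW2 = np.einsum('mn,mnh->mh', delta, hidden) / len(X)
    gb2 = delta.mean(axis=1)
    dhid = delta[:, :, None] * W2[:, None, :] * (1.0 - hidden**2)
    gW1 = np.einsum('nd,mnh->mdh', X, dhid) / len(X)
    gb1 = dhid.mean(axis=1)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

_, f = forward(X)
print('ensemble MSE:', np.mean((f.mean(axis=0) - y) ** 2))

With lam = 0 the update reduces to each member training independently on its own squared error; with lam = 1 every member receives the ensemble error (fbar - y), so the ensemble trains as a single unit. The penalty coefficient therefore interpolates between independent and fully joint training, which is the quantity whose strict upper bound the paper derives.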
G. Brown. Diversity in Neural Network Ensembles. PhD thesis, School of Computer Science, University of Birmingham, 2004.
G. Brown, J. L. Wyatt, R. Harris, and X. Yao. Diversity creation methods: A survey and categorisation. Journal of Information Fusion, 6(1):5–20, 2005a.
G. Brown, J. L. Wyatt, and P. Sun. Between two extremes: Examining decompositions of the ensemble objective function. In Proc. Int. Workshop on Multiple Classifier Systems (LNCS 3541), Monterey, California, 2005b. Springer.
J. H. Friedman. Multivariate adaptive regression splines. Annals of Statistics, 19:1–141, 1991.
G. Fumera and F. Roli. Linear combiners for classifier fusion: Some theoretical and experimental results. In Proc. Int. Workshop on Multiple Classifier Systems (LNCS 2709), pages 74–83, Guildford, Surrey, 2003. Springer.
S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992.
J. V. Hansen. Combining Predictors: Meta Machine Learning Methods and Bias/Variance and Ambiguity Decompositions. PhD thesis, Aarhus Universitet, Datalogisk Institut, 2000.
A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. Advances in Neural Information Processing Systems, 7:231–238, 1995.
L. I. Kuncheva and C. Whitaker. Measures of diversity in classifier ensembles. Machine Learning, 51:181–207, 2003.
Y. Liu. Negative Correlation Learning and Evolutionary Neural Network Ensembles. PhD thesis, University College, The University of New South Wales, Australian Defence Force Academy, Canberra, Australia, 1998.
Y. Liu and X. Yao. Negatively correlated neural networks can produce best ensembles. Australian Journal of Intelligent Information Processing Systems, 4(3/4):176–185, 1997.
Y. Liu, X. Yao, and T. Higuchi. Evolutionary ensembles with negative correlation learning. IEEE Transactions on Evolutionary Computation, 4(4), 2000.
H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
R. McKay and H. Abbass. Analyzing anticorrelation in ensemble learning. In Proceedings of the 2001 Conference on Artificial Neural Networks and Expert Systems, pages 22–27, Otago, New Zealand, 2001.
M. P. Perrone. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. PhD thesis, Brown University, Institute for Brain and Neural Systems, 1993.
B. E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science (Special Issue on Combining Artificial Neural Networks: Ensemble Approaches), 8(3–4):373–384, 1996.
P. Sollich and A. Krogh. Learning with ensembles: How overfitting can be useful. Advances in Neural Information Processing Systems, 8:190–196, 1996.
P. Tiňo, I. Nabney, B. S. Williams, J. Losel, and Y. Sun. Non-linear prediction of quantitative structure-activity relationships. Journal of Chemical Information and Computer Sciences, 44(5):1647–1653, 2004.
K. Tumer and J. Ghosh. Theoretical foundations of linear and order statistics combiners for neural pattern classifiers. Technical Report TR-95-02-98, Computer and Vision Research Center, University of Texas, Austin, 1995.
N. Ueda and R. Nakano. Generalization error of ensemble estimators. In Proceedings of the International Conference on Neural Networks, pages 90–95, 1996.
X. Yao, M. Fischer, and G. Brown. Neural network ensembles and their application to traffic flow prediction in telecommunications networks. In Proceedings of the International Joint Conference on Neural Networks, pages 693–698, Washington DC, 2001. IEEE Press.