jmlr jmlr2007 jmlr2007-59 jmlr2007-59-reference knowledge-graph by maker-knowledge-mining

59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction


Source: pdf

Author: Nicolás García-Pedrajas, César García-Osorio, Colin Fyfe

Abstract: In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, which puts more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances that have posed the most difficulty to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods. Keywords: classifier ensembles, boosting, neural networks, nonlinear projections
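The abstract sketches the whole procedure: fit a small neural network on the instances the current ensemble misclassifies, use its hidden layer as a nonlinear projection, and train the next base classifier in that projected space. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation, and the component choices (scikit-learn's MLPClassifier as the projection network, decision trees as base classifiers, majority voting) as well as all names and parameters are illustrative.

```python
# Minimal sketch of the nonlinear-boosting-projection idea described in the abstract,
# NOT the authors' code: each round fits a small neural network on the instances the
# current ensemble misclassifies, and the hidden-layer activations of that network
# define the feature space for the next base classifier. Component choices
# (MLPClassifier, decision trees, majority voting) are assumptions for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def hidden_projection(net, X):
    """Project X through the first (hidden) layer of a fitted MLPClassifier."""
    z = X @ net.coefs_[0] + net.intercepts_[0]
    return 1.0 / (1.0 + np.exp(-z))  # logistic activation, matching the network below


def predict_ensemble(members, X):
    """Majority vote over (projection, classifier) pairs; assumes integer class labels."""
    votes = []
    for net, clf in members:
        features = X if net is None else hidden_projection(net, X)
        votes.append(clf.predict(features))
    votes = np.asarray(votes, dtype=int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)


def nlbp_ensemble(X, y, n_rounds=10, n_hidden=5):
    """Build a list of (projection_net, classifier) ensemble members."""
    members = [(None, DecisionTreeClassifier().fit(X, y))]  # first member: original space
    for _ in range(n_rounds - 1):
        # Instances the current ensemble still gets wrong drive the next projection.
        wrong = predict_ensemble(members, X) != y
        if wrong.sum() == 0 or len(np.unique(y[wrong])) < 2:
            break  # nothing left to learn from (or only one class remains)
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), activation="logistic",
                            max_iter=500).fit(X[wrong], y[wrong])
        # The next classifier is trained on ALL instances, but in the projected space.
        clf = DecisionTreeClassifier().fit(hidden_projection(net, X), y)
        members.append((net, clf))
    return members
```

A typical use would be `members = nlbp_ensemble(X_train, y_train)` followed by `predict_ensemble(members, X_test)`: the projection is learned from the hard instances rather than drawn at random, which is the contrast with the random subspace method that the abstract emphasizes.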


reference text

D. W. Aha and R. L. Bankert. A comparative evaluation of sequential feature selection algorithms. In D. Fisher and H. Lenz, editors, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, pages 1–7, 1995.
E. Alpaydin. Combined 5 × 2 cv F test for comparing supervised classification learning algorithms. Neural Computation, 11:1885–1892, 1999.
T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York, 2nd edition, 1984.
E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1/2):105–142, July/August 1999.
L. Breiman. Stacked regressions. Machine Learning, 24(1):49–64, 1996a.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996b.
L. Breiman. Bias, variance, and arcing classifiers. Technical Report 460, Department of Statistics, University of California, Berkeley, CA, 1996c.
L. Breiman. Arcing classifiers. Annals of Statistics, 26:801–824, 1998.
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1517, 1999.
N. H. Bshouty and D. Gavinsky. On boosting with polynomially bounded distributions. Journal of Machine Learning Research, 3:483–506, 2002.
Ch-Ch. Chang and Ch-J. Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
K. Chen, L. Wang, and H. Chi. Methods of combining multiple classifiers with different features and their applications to text-independent speaker identification. Journal of Pattern Recognition and Artificial Intelligence, 11(3):417–445, 1997.
K. Cherkauer. Human expert-level performance on a scientific image analysis task by a system using combined artificial neural networks. In Working Notes of the AAAI Workshop on Integrating Multiple Learned Models, pages 15–21, 1996.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
P. Cunningham and J. Carney. Diversity versus quality in classification ensembles based on feature selection. In R. L. de Mántaras and E. Plaza, editors, Proceedings of the Eleventh Conference on Machine Learning ECML 2000, pages 109–116, Barcelona, Spain, 2000. Springer.
D. G. T. Denison, C. C. Holmes, B. K. Mallick, and A. F. M. Smith. Bayesian Methods for Nonlinear Classification and Regression. Wiley Series in Probability and Statistics. John Wiley & Sons, West Sussex, England, 2002.
L. Diao, K. Hu, Y. Lu, and Ch. Shi. A method to boost support vector machines. In M-S. Chen, P. S. Yu, and B. Liu, editors, Proceedings of the Sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 463–468, Taipei, Taiwan, 2002. Springer-Verlag.
T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40:139–157, 2000a.
T. G. Dietterich. Ensemble methods in machine learning. In J. Kittler and F. Roli, editors, Proceedings of the First International Workshop on Multiple Classifier Systems, pages 1–15. Springer-Verlag, 2000b.
T. G. Dietterich. Ensemble methods in machine learning. Lecture Notes in Computer Science, 1857:1–15, 2000c.
T. G. Dietterich. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7):1895–1923, 1998.
C. Domingo and O. Watanabe. MadaBoost: A modification of AdaBoost. In Proceedings of the 13th Annual Conference on Computational Learning Theory, pages 180–189. Morgan Kaufmann, San Francisco, 2000.
S. Dzeroski and B. Zenko. Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54:255–273, 2004.
G. Eibl and K-P. Pfeiffer. Multiclass boosting for weak classifiers. Journal of Machine Learning Research, 6:189–210, 2005.
A. Fern and R. Givan. Online ensemble learning: An empirical study. Machine Learning, 53:71–109, 2003.
Y. Freund and R. Schapire. Experiments with a new boosting algorithm. In Proc. of the Thirteenth International Conference on Machine Learning, pages 148–156, Bari, Italy, 1996.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
N. García-Pedrajas, C. Hervás-Martínez, and D. Ortiz-Boyer. Cooperative coevolution of artificial neural network ensembles for pattern classification. IEEE Transactions on Evolutionary Computation, 9(3):271–302, June 2005.
R. L. Gorsuch. Factor Analysis. Erlbaum, Hillsdale, NJ, USA, 1983.
A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
L. Hall, K. Bowyer, R. Banfield, D. Bhadoria, W. Kegelmeyer, and S. Eschrich. Comparing pure parallel ensemble creation techniques against bagging. In Third IEEE International Conference on Data Mining, pages 533–536, Melbourne, FL, USA, 2003.
S. Haykin. Neural Networks – A Comprehensive Foundation. Prentice Hall, Upper Saddle River, NJ, 2nd edition, 1999.
S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
T. K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.
I. T. Jolliffe. Principal Components Analysis. Springer-Verlag, New York, NY, 1986.
H-Ch. Kim, S. Pang, H-M. Je, D. Kim, and S. Y. Bang. Pattern classification using support vector machine ensembles. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02), volume 2, pages 160–163, 2002.
E. Kleinberg. On the algorithmic implementation of stochastic discrimination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(5):473–490, 2000.
R. Kohavi. Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD thesis, Department of Computer Science, Stanford University, Stanford, USA, 1995.
R. Kohavi and C. Kunz. Option decision trees with majority voting. In Proceedings of the Fourteenth International Conference on Machine Learning, pages 161–169, San Francisco, CA, USA, 1997. Morgan Kaufmann.
T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, third edition, 2001.
J. F. Kolen and J. B. Pollack. Back propagation is sensitive to initial conditions. In Richard P. Lippmann, John E. Moody, and David S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 860–867. Morgan Kaufmann Publishers, Inc., 1991.
L. Kuncheva and C. J. Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51(2):181–207, May 2003.
L. I. Kuncheva. Combining classifiers: Soft computing solutions. In S. K. Pal and A. Pal, editors, Pattern Recognition: From Classical to Modern Approaches, pages 427–451. World Scientific, 2001.
L. I. Kuncheva. Error bounds for aggressive and conservative AdaBoost. In Proceedings of MCS, number 2709 in Lecture Notes in Computer Science, pages 25–34, Guildford, UK, 2003.
Y. LeCun, L. Bottou, G. B. Orr, and K-R. Müller. Efficient backprop. In G. B. Orr and K-R. Müller, editors, Neural Networks: Tricks of the Trade, pages 9–50. Springer-Verlag, 1998.
B. Lerner, H. Guterman, M. Aladjem, and I. Dinstein. A comparative study of neural networks based feature extraction paradigms. Pattern Recognition Letters, 20(1):7–14, 1999.
Y. Liu, X. Yao, and T. Higuchi. Evolutionary ensembles with negative correlation learning. IEEE Transactions on Evolutionary Computation, 4(4):380–387, November 2000.
D. D. Margineantu and T. G. Dietterich. Pruning adaptive boosting. In Douglas H. Fisher, editor, Proceedings of the Fourteenth International Conference on Machine Learning, pages 211–218, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.
L. Mason, P. L. Bartlett, and J. Baxter. Improved generalization through explicit optimization of margins. Machine Learning, 38:243–255, 2000.
C. J. Merz. Using correspondence analysis to combine classifiers. Machine Learning, 36(1):33–58, July 1999.
R. Munro, D. Ler, and J. Patrick. Meta-learning orthographic and contextual models for language independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning, pages 192–195, 2003.
D. W. Opitz. Feature selection for ensembles. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 379–384, Orlando, FL, USA, 1999. American Association for Artificial Intelligence.
D. Ortiz-Boyer, C. Hervás-Martínez, and N. García-Pedrajas. CIXL2: A crossover operator for evolutionary algorithms based on population features. Journal of Artificial Intelligence Research, 24:33–80, July 2005.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, 1993.
J. J. Rodríguez, L. I. Kuncheva, and C. J. Alonso. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619–1630, Oct 2006.
S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research, 5:941–973, 2004.
D. Rumelhart, G. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. Rumelhart and J. McClelland, editors, Parallel Distributed Processing, pages 318–362. MIT Press, Cambridge, MA, 1986.
R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39:135–168, 2000.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
R. E. Schapire, Y. Freund, P. L. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5):1651–1686, 1998.
M. Sebban, R. Nock, and S. Lallich. Stopping criterion for boosting-based data reduction techniques: from binary to multiclass problems. Journal of Machine Learning Research, 3:863–885, 2002.
M. Skurichina and R. P. W. Duin. Bagging and the random subspace method for redundant feature spaces. In J. Kittler and F. Roli, editors, Proceedings of the Second International Workshop on Multiple Classifier Systems MCS 2001, pages 1–10, Cambridge, UK, 2001.
A. Tsymbal, P. Cunningham, M. Pechinizkiy, and P. Puuronen. Search strategies for ensemble feature selection in medical diagnosis. In M. Krol, S. Mitra, and D. J. Lee, editors, Proceedings of the Sixteenth IEEE Symposium on Computer-Based Medical Systems CBMS'2003, pages 124–129, The Mount Sinai School of Medicine, New York, USA, 2003. IEEE CS Press.
K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3–4):385–404, 1996.
A. Utsugi. Ensemble of independent factor analyzers with application to natural image analysis. Neural Processing Letters, 14(1):49–60, August 2001.
G. I. Webb. MultiBoosting: A technique for combining boosting and wagging. Machine Learning, 40(2):159–196, August 2000.
M-H. Yang, N. Ahuja, and D. Kriegman. Face detection using mixtures of linear subspaces. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 70–77. IEEE Computer Society, Washington, DC, USA, 2000.
G. Zenobi and P. Cunningham. Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error. In L. de Raedt and P. Flach, editors, 12th European Conference on Machine Learning (ECML 2001), LNAI 2167, pages 576–587. Springer-Verlag, 2001.
T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 33(4):1538–1579, 2005.
Z-H. Zhou, J. Wu, and W. Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1–2):239–253, May 2002.