nips2012-200-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Joseph Wang, Venkatesh Saligrama
Abstract: We develop a novel approach to supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification into a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem, so any discriminative learning method can be used in conjunction with our approach. Nevertheless, we focus on locally linear schemes, learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control over overfitting and generalization error. We train locally linear classifiers using LDA, logistic regression, and perceptrons, so our scheme scales to large datasets and high dimensions. We present experimental results demonstrating improved performance over state-of-the-art classification techniques on benchmark datasets. We also show improved robustness to label noise.
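The partition-plus-local-classifier idea the abstract describes can be made concrete with a small sketch. The Python snippet below is one illustrative reading of it, not the authors' algorithm: it alternates between fitting one linear classifier per region and reassigning points to the region whose classifier explains them better, then trains a linear "gate" that reproduces the partition as a supervised problem, mirroring the abstract's reformulation. The two-region setup, the scikit-learn LogisticRegression models, and the make_moons toy data are all assumptions for illustration.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Toy data whose optimal decision boundary is nonlinear.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

rng = np.random.default_rng(0)
region = rng.integers(0, 2, size=len(y))  # random initial 2-region partition

for _ in range(10):
    # Fit one linear classifier per region.
    clfs = [LogisticRegression().fit(X[region == r], y[region == r])
            for r in (0, 1)]
    # Reassign each point to the region whose local classifier assigns
    # its true label the higher probability.
    true_prob = np.stack(
        [c.predict_proba(X)[np.arange(len(y)), y] for c in clfs], axis=1)
    new_region = true_prob.argmax(axis=1)
    # Stop if a region would lose a class (LogisticRegression needs both).
    if any(len(np.unique(y[new_region == r])) < 2 for r in (0, 1)):
        break
    region = new_region

# The partition is itself cast as a supervised learning problem:
# predict the region assignment from the features with another linear model.
gate = LogisticRegression().fit(X, region)

def predict(X_new):
    r = gate.predict(X_new)
    y_hat = np.empty(len(X_new), dtype=int)
    for idx, clf in enumerate(clfs):
        if (r == idx).any():
            y_hat[r == idx] = clf.predict(X_new[r == idx])
    return y_hat

print(f"training accuracy: {(predict(X) == y).mean():.3f}")

On two-moons data a single linear classifier cannot capture the curved class boundary; the combination of a linear gate with two linear region classifiers can, which is the abstract's central point: each component stays simple, yet the composite decision boundary is complex.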
[1] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Technical Report NC-TR-1998-021, Department of Computer Science, Royal Holloway, University of London, Egham, UK, August 1998. Submitted to Machine Learning.
[2] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[3] Leo Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.
[4] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.
[5] Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, September 2001.
[6] Koby Crammer and Yoram Singer. On the learnability and design of output codes for multiclass problems. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 35–46, 2000.
[7] Venkatesan Guruswami and Amit Sahai. Multiclass learning, boosting, and error-correcting codes. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT ’99, pages 145–155, New York, NY, USA, 1999. ACM.
[8] Yijun Sun, Sinisa Todorovic, Jian Li, and Dapeng Wu. Unifying the error-correcting and output-code AdaBoost within the margin framework. In Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, pages 872–879, New York, NY, USA, 2005. ACM.
[9] Trevor Hastie and Robert Tibshirani. Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58:155–176, 1996.
[10] Tae-Kyun Kim and Josef Kittler. Locally linear discriminant analysis for multimodally distributed classes for face recognition with a single model image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:318–327, 2005.
[11] Ofer Dekel and Ohad Shamir. There’s a hole in my data space: Piecewise predictors for heterogeneous learning problems. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 15, 2012.
[12] Juan Dai, Shuicheng Yan, Xiaoou Tang, and James T. Kwok. Locally adaptive classification piloted by uncertainty. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 225–232, New York, NY, USA, 2006. ACM.
[13] Marc Toussaint and Sethu Vijayakumar. Learning discontinuities with products-of-sigmoids for switching between local models. In Proceedings of the 22nd International Conference on Machine Learning, pages 904–911. ACM Press, 2005.
[14] Eduardo D. Sontag. VC dimension of neural networks. In Neural Networks and Machine Learning, pages 69–95. Springer, 1998.
[15] Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296, 1999. doi:10.1023/A:1007662407062.
[16] A. Frank and A. Asuncion. UCI Machine Learning Repository, 2010.
[17] J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.
[18] Mohammad J. Saberian and Nuno Vasconcelos. Multiclass boosting: Theory and algorithms. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2124–2132. 2011.
[19] Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class AdaBoost, 2009.