nips nips2012 nips2012-226 nips2012-226-reference knowledge-graph by maker-knowledge-mining

226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

Source: pdf

Author: Amit Daniely, Sivan Sabato, Shai S. Shwartz

Abstract: We theoretically analyze and compare the following ﬁve popular multiclass classiﬁcation methods: One vs. All, All Pairs, Tree-based classiﬁers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the ﬁrst four methods, the classiﬁcation is based on a reduction to binary classiﬁcation. We consider the case where the binary classiﬁer comes from a class of VC dimension d, and in particular from the class of halfspaces over Rd . We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in contrast to most previous uses of VC theory, which only deal with estimation error. 1

reference text

E. L. Allwein, R.E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classiﬁers. Journal of Machine Learning Research, 1:113–141, 2000. S. Ben-David, N. Cesa-Bianchi, D. Haussler, and P. Long. Characterizations of learnability for classes of {0, . . . , n}-valued functions. Journal of Computer and System Sciences, 50:74–86, 1995. S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2011. A. Beygelzimer, J. Langford, and P. Ravikumar. Multiclass classiﬁcation with ﬁlter trees. Preprint, June, 2007. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001. A. Daniely, S. Sabato, S. Ben-David, and S. Shalev-Shwartz. Multiclass learnability and the erm principle. In COLT, 2011. T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artiﬁcial Intelligence Research, 2:263–286, January 1995. Trevor Hastie and Robert Tibshirani. Classiﬁcation by pairwise coupling. The Annals of Statistics, 26(1):451–471, 1998. Ryan Rifkin and Aldebaro Klautau. In defense of one-vs-all classiﬁcation. Journal of Machine Learning Research, 5:101–141, 2004. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. In David E. Rumelhart and James L. McClelland, editors, Parallel Distributed Processing – Explorations in the Microstructure of Cognition, chapter 8, pages 318–362. MIT Press, 1986. G. Takacs. Convex polyhedron learning and its applications. PhD thesis, Budapest University of Technology and Economics, 2009. V. N. Vapnik. Statistical Learning Theory. Wiley, 1998. J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artiﬁcial Neural Networks, April 1999. 9