227 nips-2012-Multiclass Learning with Simplex Coding

Source: pdf

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-Jacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypothesis class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent of the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper.
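The abstract names the simplex coding but does not spell out its construction. The following is a minimal sketch, assuming that each of the T classes is coded as a vertex of a regular simplex centered at the origin in R^(T-1) (unit-norm vectors with pairwise inner product -1/(T-1)), and that decoding picks the class whose code vector has the largest inner product with the predicted vector. The function names `simplex_code` and `decode` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simplex_code(num_classes):
    """Return a (T, T-1) array of unit-norm code vectors, one row per class.

    Assumption: the codes are the vertices of a regular simplex centered at
    the origin, so distinct rows have inner product -1/(T-1).
    """
    T = num_classes
    # Centering the standard basis of R^T gives simplex vertices that lie
    # in the (T-1)-dimensional hyperplane orthogonal to the all-ones vector.
    centered = np.eye(T) - np.full((T, T), 1.0 / T)
    # The first T-1 right singular vectors form an orthonormal basis of
    # that hyperplane (the last one corresponds to singular value 0).
    _, _, vt = np.linalg.svd(centered)
    basis = vt[: T - 1].T  # shape (T, T-1)
    # Express each vertex in the hyperplane basis, then rescale to unit norm.
    codes = centered @ basis
    return codes / np.linalg.norm(codes, axis=1, keepdims=True)


def decode(scores, codes):
    """Map vectors in R^(T-1) to class indices by largest inner product."""
    return np.argmax(scores @ codes.T, axis=-1)


if __name__ == "__main__":
    codes = simplex_code(4)
    print(np.round(codes @ codes.T, 3))  # Gram matrix of the code vectors
    print(decode(codes, codes))          # each code decodes to its own class
```

For T = 2 this construction reduces to the two codes +1 and -1 on the real line, which is the usual binary labeling that the abstract says the framework generalizes.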

reference text

[1] Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.

[2] Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.

[3] A. Caponnetto and E. De Vito. Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 2006.

[4] D. Chen and T. Sun. Consistency of multiclass empirical risk minimization methods based on convex loss. Journal of Machine Learning Research, X, 2006.

[5] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2001.

[6] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.

[7] Yann Guermeur. VC theory of large margin multi-category classifiers. Journal of Machine Learning Research, 8:2551–2594, 2007.

[8] Simon I. Hill and Arnaud Doucet. A framework for kernel-based multi-category classification. J. Artif. Int. Res., 30(1):525–564, December 2007.

[9] G. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Ann. Math. Stat., 41:495–502, 1970.

[10] Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 2004.

[11] Y. Liu. Fisher consistency of multicategory support vector machines. Eleventh International Conference on Artificial Intelligence and Statistics, pages 289–296, 2007.

[12] C.A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.

[13] N. Pinto, Z. Stone, T. Zickler, and D.D. Cox. Scaling-up biologically-inspired computer vision: A case-study on Facebook. 2011.

[14] M.D. Reid and R.C. Williamson. Composite binary losses. JMLR, 11, September 2010.

[15] R. Rifkin and A. Klautau. In defense of one versus all classification. Journal of Machine Learning Research, 2004.

[16] M. Saberian and N. Vasconcelos. Multiclass boosting: Theory and algorithms. In NIPS 2011, 2011.

[17] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated subgradient solver for SVM. In Proceedings of the 24th ICML, ICML ’07, pages 807–814, New York, NY, USA, 2007. ACM.

[18] I. Steinwart and A. Christmann. Support vector machines. Information Science and Statistics. Springer, New York, 2008.

[19] S. van de Geer and B. Tarigan. A moment bound for multicategory support vector machines. JMLR, 9:2171–2185, 2008.

[20] A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. In Proceedings of the 18th Annual Conference on Learning Theory, volume 3559, pages 143–157. Springer, 2005.

[21] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6(2):1453–1484, 2005.

[22] Alexandre B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.

[23] Elodie Vernet, Robert C. Williamson, and Mark D. Reid. Composite multiclass losses. In Proceedings of Neural Information Processing Systems (NIPS 2011), 2011.

[24] G. Wahba. Spline models for observational data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1990.

[25] J. Weston and C. Watkins. Support vector machine for multi-class pattern recognition. Proceedings of the Seventh European Symposium on Artificial Neural Networks, 1999.

[26] Tong Tong Wu and Kenneth Lange. Multicategory vertex discriminant analysis for high-dimensional data. Ann. Appl. Stat., 4(4):1698–1721, 2010.

[27] Y. Yao, L. Rosasco, and A. Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26(2):289–315, 2007.

[28] T. Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004.

[29] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–134, 2004.