NIPS 2002, Paper 149 (references)
Source: pdf
Authors: Ofer Dekel, Yoram Singer
Abstract: We describe a new algorithmic framework for learning multiclass categorization problems. In this framework a multiclass predictor is composed of a pair of embeddings that map both instances and labels into a common space. In this space each instance is assigned the label it is nearest to. We outline and analyze an algorithm, termed Bunching, for learning the pair of embeddings from labeled data. A key construction in the analysis of the algorithm is the notion of probabilistic output codes, a generalization of error correcting output codes (ECOC). Furthermore, the method of multiclass categorization using ECOC is shown to be an instance of Bunching. We demonstrate the advantage of Bunching over ECOC by comparing their performance on numerous categorization problems.
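The prediction rule described in the abstract (embed instances and labels into a common space, then assign each instance the nearest label) can be illustrated in a few lines of Python. This is a minimal sketch under assumed choices, not the authors' Bunching implementation: the linear instance map W, the label code matrix M, the Euclidean distance, and the name predict are all illustrative assumptions. The closing comment reflects the abstract's claim that ECOC decoding arises as a special case of this framework.

# Minimal sketch of nearest-label prediction via paired embeddings.
# Illustrative only; names and the choice of linear maps are assumptions,
# not the authors' Bunching algorithm.
import numpy as np

def predict(x, W, M):
    """Assign x the label whose embedding is nearest to the embedded instance.

    x : (d,)   instance feature vector
    W : (k, d) hypothetical linear embedding of instances into R^k
    M : (r, k) label embedding: row j is the prototype/code of label j
    """
    z = W @ x                                  # embed the instance
    dists = np.linalg.norm(M - z, axis=1)      # distance to each label's embedding
    return int(np.argmin(dists))               # nearest label wins

# ECOC as a special case: fix M to a binary code matrix and let the rows of W
# play the role of the binary classifiers; nearest-code decoding is then
# (up to the choice of distance) standard ECOC prediction.
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(4, 8)).astype(float)  # 4 labels, 8-bit codes
W = rng.standard_normal((8, 5))                    # stand-in for a learned map
x = rng.standard_normal(5)
print(predict(x, W, M))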
[1] E.L. Allwein, R.E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
[2] M. Collins, R.E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 47(2/3):253–285, 2002.
[3] K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. In Proc. of the Thirteenth Annual Conference on Computational Learning Theory, 2000.
[4] I. Csiszár and G. Tusnády. Information geometry and alternating minimization procedures. Statistics and Decisions, Supplement Issue, 1:205–237, 1984.
[5] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Ser. B, 39:1–38, 1977.
[6] T.G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, January 1995.
[7] J. Kivinen and M.K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, January 1997.
[8] J.D. Lafferty. Additive models, boosting and inference for generalized divergences. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999.
[9] S. Della Pietra, V. Della Pietra, and J. Lafferty. Duality and auxiliary functions for Bregman distances. Technical Report CS-01-10, CMU, 2002.
[10] T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9), 1990.
[11] R.E. Schapire, M. Rochery, M. Rahim, and N. Gupta. Incorporating prior knowledge into boosting. In Machine Learning: Proceedings of the Nineteenth International Conference, 2002.