jmlr jmlr2008 jmlr2008-74 jmlr2008-74-reference knowledge-graph by maker-knowledge-mining

74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections


Source: pdf

Author: Yonatan Amit, Shai Shalev-Shwartz, Yoram Singer

Abstract: We describe and analyze an algorithmic framework for online classification where each online trial consists of multiple prediction tasks that are tied together. We tackle the problem of updating the online predictor by defining a projection problem in which each prediction task corresponds to a single linear constraint. These constraints are tied together through a single slack parameter. We then introduce a general method for approximately solving the problem by projecting simultaneously and independently on each constraint which corresponds to a prediction sub-problem, and then averaging the individual solutions. We show that this approach constitutes a feasible, albeit not necessarily optimal, solution of the original projection problem. We derive concrete simultaneous projection schemes and analyze them in the mistake bound model. We demonstrate the power of the proposed algorithm in experiments with synthetic data and with multiclass text categorization tasks. Keywords: online learning, parallel computation, mistake bounds, structured prediction


reference text

E. L. Allwein, R.E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. In Machine Learning: Proceedings of the Seventeenth International Conference, 2000. L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967. Y. Censor and S.A. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York, NY, USA, 1997. K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. Machine Learning, 47, 2002. K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive aggressive algorithms. Journal of Machine Learning Research, 7:551–585, Mar 2006. 1433 A MIT, S HALEV-S HWARTZ AND S INGER T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, January 1995. M. Fink, S. Shalev-Shwartz, Y. Singer, and S. Ullman. Online multiclass learning by interclass hypothesis sharing. In Proceedings of the 23rd International Conference on Machine Learning, 2006. T. Hastie and R. Tibshirani. Classification by pairwise coupling. The Annals of Statistics, 26(1): 451–471, 1998. C. Hildreth. A quadratic programming procedure. Naval Research Logistics Quarterly, 4:79–85, 1957. Erratum, ibidem, p.361. J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Journal of Machine Learning, 45(3):301–329, July 2001. J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–64, January 1997. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988. A.R. De Pierro and A.N. Iusem. A relaxed version of bregman’s method for convex programming. Journal of Optimization Theory and Applications, 51:421–440, 1986. J. C. Platt. Fast training of Support Vector Machines using sequential minimal optimization. In B. Sch¨ lkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector o Learning. MIT Press, 1998. F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).). R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):1–40, 1999. R.E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 32(2/3), 2000. S. Shalev-Shwartz and Y. Singer. Online learning meets optimization in the dual. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, 2006a. S. Shalev-Shwartz and Y. Singer. Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7 (July):1567–1599, 2006b. S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning, 2007. B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. In Advances in Neural Information Processing Systems 17, 2003. 1434 O NLINE L EARNING U SING S IMULTANEOUS P ROJECTIONS I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004. J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999. 1435