nips nips2004 nips2004-67 nips2004-67-reference knowledge-graph by maker-knowledge-mining

67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification

Source: pdf

Author: Peter L. Bartlett, Michael Collins, Ben Taskar, David A. McAllester

Abstract: We consider the problem of structured classiﬁcation, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random ﬁelds and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem deﬁned in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efﬁcient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efﬁciently. The method for structured labels relies on a more general result, speciﬁcally the application of exponentiated gradient updates [7, 8] to quadratic programs. 1

reference text

[1] Long version of this paper. Available at http://www.ai.mit.edu/people/mcollins.

[2] Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden markov support vector machines. In ICML, 2003.

[3] Michael Collins. Parameter estimation for statistical parsing models: Theory and practice of distribution-free methods. In Harry Bunt, John Carroll, and Giorgio Satta, editors, New Developments in Parsing Technology. Kluwer, 2004.

[4] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2(5):265–292, 2001.

[5] N. Cristianini, C. Campbell, and J. Shawe-Taylor. Multiplicative updatings for support-vector learning. Technical report, NeuroCOLT2, 1998.

[6] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.

[7] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.

[8] J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Journal of Machine Learning Research, 45(3):301–329, 2001.

[9] John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random ﬁelds: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-01, 2001.

[10] D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.

[11] F. Sha, L. Saul, and D. Lee. Multiplicative updates for large margin classiﬁers. In COLT, 2003.

[12] B. Taskar, C. Guestrin, and D. Koller. Max margin Markov networks. In NIPS, 2003.

[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. ICML, 2004 (To appear).