nips nips2005 nips2005-184 nips2005-184-reference knowledge-graph by maker-knowledge-mining

184 nips-2005-Structured Prediction via the Extragradient Method

Source: pdf

Author: Ben Taskar, Simon Lacoste-Julian, Michael I. Jordan

Abstract: We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic ﬂow. This makes the approach an efﬁcient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm. 1

reference text

[1] Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. In Proc. ICML, 2003.

[2] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, and A. Ng. Discriminative learning of Markov random ﬁelds for segmentation of 3d scan data. In CVPR, 2005.

[3] P. Baldi, J. Cheng, and A. Vullo. Large-scale prediction of disulphide bond connectivity. In Proc. NIPS, 2004.

[4] P. Bartlett, M. Collins, B. Taskar, and D. McAllester. Exponentiated gradient algorithms for large-margin structured classiﬁcation. In NIPS, 2004.

[5] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell., 24, 2002.

[6] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. EMNLP, 2002.

[7] D. M. Greig, B. T. Porteous, and A. H. Seheult. Exact maximum a posteriori estimation for binary images. J. R. Statist. Soc. B, 51, 1989.

[8] F. Guerriero and P. Tseng. Implementation and test of auction methods for solving generalized network ﬂow problems with separable convex cost. Journal of Optimization Theory and Applications, 115(1):113–144, October 2002.

[9] B.S. He and L. Z. Liao. Improvements of some projection methods for monotone nonlinear variational inequalities. JOTA, 112:111:128, 2002.

[10] M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM J. Comput., 22, 1993.

[11] G. M. Korpelevich. The extragradient method for ﬁnding saddle points and other problems. Ekonomika i Matematicheskie Metody, 12:747:756, 1976.

[12] S. Kumar and M. Hebert. Discriminative ﬁelds for modeling spatial dependencies in natural images. In NIPS, 2003.

[13] J. Lafferty, A. McCallum, and F. Pereira. Conditional random ﬁelds: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[14] E. Matusov, R. Zens, and H. Ney. Symmetric word alignments for statistical machine translation. In Proc. COLING, 2004.

[15] R. Mihalcea and T. Pedersen. An evaluation exercise for word alignment. In Proceedings of the HLT-NAACL 2003 Workshop, Building and Using parallel Texts: Data Driven Machine Translation and Beyond, pages 1–6, Edmonton, Alberta, Canada, 2003.

[16] F. Och and H. Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), 2003.

[17] A. Schrijver. Combinatorial Optimization: Polyhedra and Efﬁciency. Springer, 2003.

[18] B. Taskar. Learning Structured Prediction Models: A Large Margin Approach. PhD thesis, Stanford University, 2004.

[19] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin. Learning structured prediction models: a large margin approach. In ICML, 2005.

[20] B. Taskar, C. Guestrin, and D. Koller. Max margin Markov networks. In NIPS, 2003.

[21] B. Taskar, S. Lacoste-Julien, and M. Jordan. Structured prediction, dual extragradient and Bregman projections. Technical report, UC Berkeley Statistics Department, 2005.

[22] B. Taskar, S. Lacoste-Julien, and D. Klein. A discriminative matching approach to word alignment. In EMNLP, 2005.

[23] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189–201, 1979.