emnlp emnlp2011 emnlp2011-45 emnlp2011-45-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1
M. Auli and A. Lopez. 2011. A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing. In Proc. of ACL. Y. Bar-Hillel, M. Perles, and E. Shamir. 1964. On formal properties of simple phrase structure grammars. Language and Information: Selected Essays on their Theory and Application, pages 116–150. Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proc. of ACL. D. Bertsekas, W. Hager, and O. Mangasarian. 1999. Nonlinear programming. Athena Scientific. D.P. Bertsekas, A. Nedic, and A.E. Ozdaglar. 2003. Convex analysis and optimization. Athena Scientific. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. 2011. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Now Publishers (to appear). J.P. Boyle and R.L. Dykstra. 1986. A method for finding projections onto the intersections of convex sets in Hilbert spaces. In Advances in order restricted statistical inference, pages 28–47. Springer Verlag. S. Buchholz and E. Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In CoNLL. X. Carreras, M. Collins, and T. Koo. 2008. TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-rich Parsing. In CONLL. X. Carreras. 2007. Experiments with a higher-order projective dependency parser. In CoNLL. E. Charniak and M. Johnson. 2005. Coarse-to-fine nbest parsing and MaxEnt discriminative reranking. In Proc. ACL, pages 173–180. Association for Computational Linguistics Morristown, NJ, USA. D. Chiang. 2007. Hierarchical phrase-based translation. computational linguistics, 33(2):201–228. J. Clarke and M. Lapata. 2008. Global Inference for Sentence Compression An Integer Linear Programming Approach. JAIR, 31:399–429. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. 2006. Online Passive-Aggressive Algorithms. JMLR, 7:551–585. J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. 2008. Efficient projections onto the L1-ball for learning in high dimensions. In ICML. J. Eckstein and D. Bertsekas. 1992. On the DouglasRachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293–3 18. J. R. Finkel, A. Kleeman, and C. D. Manning. 2008. Efficient, feature-based, conditional random field parsing. Proceedings of ACL-08: HLT, pages 959–967. 248 D. Gabay and B. Mercier. 1976. A dual algorithm for the solution ofnonlinear variational problems via finite element approximation. Computers and Mathematics with Applications, 2(1): 17–40. J. Gim e´nez and L. Marquez. 2004. Svmtool: A general pos tagger generator based on support vector machines. In Proc. of LREC. R. Glowinski and P. Le Tallec. 1989. Augmented Lagrangian and operator-splitting methods in nonlinear mechanics. Society for Industrial Mathematics. R. Glowinski and A. Marroco. 1975. Sur l’approximation, par ´ el e´ments finis d’ordre un, et la r ´esolution, par penalisation-dualit e´, d’une classe de probl `emes de Dirichlet non lin e´aires. Rev. Franc. Au- tomat. Inform. Rech. Operat., 9:41–76. M. Hestenes. 1969. Multiplier and gradient methods. Jour. Optim. Theory and Applic., 4:302–320. L. Huang and K. Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proc. of ACL, pages 1077–1086. R. Johansson and P. Nugues. 2008. Dependency-based Semantic Role Labeling of PropBank. In EMNLP. V. Jojic, S. Gould, and D. Koller. 2010. Accelerated dual decomposition for MAP inference. In ICML. N. Komodakis, N. Paragios, and G. Tziritas. 2007. MRF optimization via dual decomposition: Messagepassing revisited. In ICCV. T. Koo and M. Collins. 2010. Efficient third-order dependency parsers. In Proc. of ACL, pages 1–1 1. T. Koo, A. M. Rush, M. Collins, T. Jaakkola, and D. Sontag. 2010. Dual decomposition for parsing with nonprojective head automata. In EMNLP. A. Kulesza and F. Pereira. 2007. Structured Learning with Approximate Inference. NIPS. P. Liang, M.I. Jordan, and D. Klein. 2011. Learning dependency-based compositional semantics. In Proc. Association for Computational Linguistics (ACL). A. F. T. Martins and N. A. Smith. 2009. Summarization with a joint model for sentence extraction and compression. In NAACL-HLT Workshop on Integer Linear Programming for NLP. A. F. T. Martins, D. Das, N. A. Smith, and E. P. Xing. 2008. Stacking dependency parsers. In EMNLP. A. F. T. Martins, N. A. Smith, and E. P. Xing. 2009a. Concise integer linear programming formulations for dependency parsing. In ACL-IJCNLP. A. F. T. Martins, N. A. Smith, and E. P. Xing. 2009b. Polyhedral outer approximations with application to natural language parsing. In ICML. A. F. T. Martins, N. A. Smith, E. P. Xing, M. A. T. Figueiredo, and P. M. Q. Aguiar. 2010. Turbo parsers: Dependency parsing by approximate variational inference. In EMNLP. A. F. T. Martins, M. A. T. Figueiredo, P. M. Q. Aguiar, N. A. Smith, and E. P. Xing. 2011. An Augmented Lagrangian Approach to Constrained MAP Inference. In ICML. R. McDonald, K. Lerman, and F. Pereira. 2006. Multilingual dependency analysis with a two-stage discriminative parser. In CoNLL. O. Meshi and A. Globerson. 2011. An Alternating Direction Method for Dual MAP LP Relaxation. In ECML PKDD. J. Nivre and R. McDonald. 2008. Integrating graphbased and transition-based dependency parsers. In ACL-HLT. J. Nivre, J. Hall, J. Nilsson, G. Eryi gˇit, and S. Marinov. 2006. Labeled pseudo-projective dependency parsing with support vector machines. In Procs. of CoNLL. S. Petrov and D. Klein. 2008. Sparse multi-scale grammars for discriminative latent variable parsing. In Proc. of EMNLP. M. Powell. 1969. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press. M. Richardson and P. Domingos. 2006. Markov logic networks. Machine Learning, 62(1): 107–136. S. Riedel and J. Clarke. 2006. Incremental integer linear programming for non-projective dependency parsing. In EMNLP. A. M. Rush and M. Collins. 2011. Exact decoding of syntactic translation models through lagrangian relaxation. In ACL. A. Rush, D. Sontag, M. Collins, and T. Jaakkola. 2010. On dual decomposition and linear programming relaxations for natural language processing. In EMNLP. N. Shor. 1985. Minimization methods for nondifferentiable functions. Springer. D. Smith and J. Eisner. 2008. Dependency parsing by belief propagation. In EMNLP. H. Yamada and Y. Matsumoto. 2003. Statistical dependency analysis with support vector machines. In IWPT. D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T Jaakkola. 2008. Tightening LP relaxations for MAP using message-passing. In UAI. M. Surdeanu, R. Johansson, A. Meyers, L. M `arquez, and J. Nivre. 2008. The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. CoNLL. B. Taskar, C. Guestrin, and D. Koller. 2003. Max-margin Markov networks. In NIPS. R.W. Tromble and J. Eisner. 2006. A fast finite-state relaxation method for enforcing global constraints on sequence decoding. In Proc. of NAACL, pages 423– 430. M. Wainwright and M. Jordan. 2008. Graphical Models, Exponential Families, and Variational Inference. Now Publishers. 249