acl acl2013 acl2013-260 acl2013-260-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ?) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
Tobias Achterberg. 2007. Constraint integer programming. Ph.D. thesis, TU Berlin. Warren P. Adams and Hanif D. Sherali. 1986. A tight linearization and an algorithm for zero-one quadratic programming problems. Management Science, 32(10): 1274–1290, October. ArticleType: research-article / Full publication date: Oct., 1986 / Copyright 1986 INFORMS. Kurt Anstreicher. 2009. Semidefinite versus the reformulation-linearization nonconvex quadratically constrained gramming. Journal of Global 43(2):471–484. programming technique for quadratic proOptimization, Taylor Berg-Kirkpatrick, Alexandre Bouchard-C oˆt´ e, DeNero, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Proc. of NAACL, June. Samuel Burer and Dieter Vandenbussche. 2009. Globally solving box-constrained nonconvex quadratic programs with semidefinite-based finite branch-andbound. Computational Optimization and Applications, 43(2): 181–195. Olivier Chapelle, Vikas Sindhwani, and S. Sathiya Keerthi. 2007. Branch and bound for semisupervised support vector machines. In Proc. of NIPS 19, pages 217–224. MIT Press. E. Charniak. 1993. Statistical language learning. MIT press. Yunmei Chen and Xiaojing Ye. 2011. Projection onto a simplex. arXiv:1101.6081, January. Noam Chomsky and Howard Lasnik. 1993. Principles and parameters theory. In Syntax: An International Handbook of Contemporary Research. Berlin: de Gruyter. Shay Cohen and Noah A. Smith. 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proc. of HLTNAACL, pages 74–82, June. Shay Cohen and Noah A. Smith. 2010. Viterbi training for PCFGs: Hardness results and competitiveness of uniform initialization. In Proc. of ACL, pages 1502– 15 11, July. S. B. Cohen, K. Gimpel, and N. A. Smith. 2009. Logistic normal priors for unsupervised probabilistic grammar induction. In Proceedings of NIPS. Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar. 2012. Spectral learning of latent-variable PCFGs. In Proc. of ACL (Volume 1: Long Papers), pages 223–23 1. Association for Computational Linguistics, July. George B. Dantzig and Philip Wolfe. 1960. Decomposition principle for linear programs. Operations Research, 8(1): 101–1 11, January. Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proc. of ACL, pages 457– 464, June. Jennifer Gillenwater, Kuzman Ganchev, Joo Graa, Fernando Pereira, and Ben Taskar. 2010. Sparsity in dependency grammar induction. In Proceedings of the ACL 2010 Conference Short Papers, pages 194–199. Association for Computational Linguistics, July. K. Gimpel and N. A. Smith. 2012. Concavity and initialization for unsupervised dependency parsing. In Proc. of NAACL. M. Held and R. M. Karp. 1970. The travelingsalesman problem and minimum spanning trees. Operations Research, 18(6): 1138–1 162. D. Hsu, S. M Kakade, and T. Zhang. 2009. A spectral algorithm for learning hidden markov models. In COLT 2009 - The 22nd Conference on Learning Theory. Dan Klein and Christopher Manning. 2004. Corpusbased induction of syntactic structure: Models of dependency and constituency. In Proc. of ACL, pages 478–485, July. D. E. Knuth. 1975. backtrack programs. 29(129): 121–136. Estimating the efficiency of Mathematics of computation, L. Lobjois and M. Lema ıˆtre. 1998. Branch and bound algorithm selection by performance prediction. In Proc. of the National Conference on Artificial Intelligence, pages 353–358. Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras. 2012. Spectral learning for non-deterministic dependency parsing. In Proc. of EACL, pages 409–419, April. Thomas L. Magnanti and Laurence A. Wolsey. 1994. Optimal Trees. Center for Operations Research and Econometrics. Alexander Martin. 2000. Integer programs with block structure. Technical Report SC-99-03, ZIB. Andr e´ Martins, Noah A. Smith, and Eric Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proc. of ACL-IJCNLP, pages 342–350, August. Garth P. McCormick. 1976. Computability of global solutions to factorable nonconvex programs: Part I—Convex underestimating problems. Mathematical Programming, 10(1): 147–175. 453 Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In Proc. of EMNLP, pages 1234–1244, October. P. M. Pardalos. 1991. Global optimization algorithms for linearly constrained indefinite quadratic problems. Computers & Mathematics with Applications, 21(6):87–97. Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proc. of LREC. Xian Qian and Yang Liu. 2013. Branch and bound algorithm for dependency parsing with non-local features. TACL, 1:37—48. Sebastian Riedel and James Clarke. 2006. Incremental integer linear programming for non-projective dependency parsing. In Proc. of EMNLP, pages 129– 137, July. Sebastian Riedel, David Smith, and Andrew McCallum. 2012. Parse, price and cut—Delayed column and row generation for graph based parsers. In Proc. of EMNLP-CoNLL, pages 732–743, July. Hanif D. Sherali and Warren P. Adams. 1990. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3(3):41 1–430, August. H. Sherali and L. Liberti. 2008. Reformulationlinearization technique for global optimization. Encyclopedia of Optimization, 2:3263–3268. Hanif D. Sherali and Cihan H. Tuncbilek. 1995. A reformulation-convexification approach for solving nonconvex quadratic programming problems. Journal of Global Optimization, 7(1): 1–3 1. Noah A. Smith and Jason Eisner. 2006. Annealing structural bias in multilingual weighted grammar induction. In Proc. of COLING-ACL, pages 569–576, July. N.A. Smith. 2006. Novel estimation methods for unsupervised discovery of latent structure in natural language text. Ph.D. thesis, Johns Hopkins University, Baltimore, MD. Valentin ISpitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2010a. From baby steps to leapfrog: How Qin Iris Wang, Dale Schuurmans, and Dekang Lin. 2008. Semi-supervised convex training for dependency parsing. In Proc of ACL-HLT, pages 532–540. Association for Computational Linguistics, June. Less is more in unsupervised dependency parsing. In Proc. of HLT-NAACL, pages 751–759. Association for Computational Linguistics, June. Valentin I Spitkovsky, Hiyan Alshawi, Daniel Jurafsky, and Christopher D Manning. 2010b. Viterbi training improves unsupervised dependency parsing. In Proc. of CoNLL, pages 9–17. Association for Computational Linguistics, July. S. A. Vavasis. 1991. Nonlinear optimization: plexity issues. Oxford University Press, Inc. com- 454