emnlp emnlp2012 emnlp2012-119 emnlp2012-119-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paramveer Dhillon ; Jordan Rodu ; Michael Collins ; Dean Foster ; Lyle Ungar
Abstract: Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-theart parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker. .
Shay Cohen, Karl Stratos, Michael Collins, Dean Foster, and Lyle Ungar. Spectral learning of latent-variable pcfgs. In Association of Computational Linguistics (ACL), volume 50, 2012. Michael Collins. Ranking algorithms for namedentity extraction: boosting and the voted perceptron. In Proceedings of the 40th Annual Meet- ing on Association for Computational Linguistics, ACL ’02, pages 489–496, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics. URL http : / / dx .doi .org/ 10 . 3 115 / 10 7 3 0 8 3 . 10 7 3 16 5. Michael Collins and Terry Koo. Discriminative reranking for natural language parsing. Comput. Linguist., 31(1):25–70, March 2005. ISSN 08912017. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:55 1–585, 2006. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. JRSS, SERIES B, 39(1): 1–38, 1977. Paramveer S. Dhillon, Dean Foster, and Lyle Ungar. Multi-view learning of word embeddings via cca. In Advances in Neural Information Processing Systems (NIPS), volume 24, 2011. Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and headautomaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computa- tional Linguistics (ACL), pages 457–464, University of Maryland, June 1999. URL http : / / cs . j hu .edu / ˜ j as on /papers / #acl 9 9. Dean Foster, Jordan Rodu, and Lyle Ungar. Spectral dimensionality reduction for HMMs. ArXiV http://arxiv.org/abs/1203.6130, 2012. D Hsu, S M. Kakade, and Tong Zhang. A spectral algorithm for learning hidden markov models. arXiv:0811.4413v2, 2008. Terry Koo, Xavier Carreras, and Michael Collins. Simple semi-supervised dependency parsing. In In Proc. ACL/HLT, 2008. F. Luque, A. Quattoni, B. Balle, and X. Carreras. Spectral learning for non-deterministic dependency parsing. In EACL, 2012. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: the penn treebank. Comput. Linguist., 19:313–330, June 1993. ISSN 08912017. Ryan McDonald. Discriminative learning and spanning tree algorithmsfor dependencyparsing. PhD thesis, University of Pennsylvania, Philadelphia, PA, USA, 2006. AAI3225503. Bernard Merialdo. Tagging english text with a probabilistic model. Comput. Linguist., 20: 155–171, June 1994. ISSN 0891-2017. 213 Gabriele Antonio Musillo and Paola Merlo. Unlexicalised hidden variable models of split dependency grammars. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, HLT-Short ’08, pages 213– 216, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics. Ankur P. Parikh, Le Song, and Eric P. Xing. A spectral algorithm for latent tree graphical models. In ICML, pages 1065–1072, 2011. Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Associationfor Computational Linguistics, ACL-44, pages 433–440, Stroudsburg, PA, USA, 2006. Association for Computational Linguistics. Ariadna Quattoni, Michael Collins, and Trevor Darrell. Conditional random fields for object recognition. In In NIPS, pages 1097–1 104. MIT Press, 2004. Lawrence R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257– 286, 1989. Adwait Ratnaparkhi. A Maximum Entropy Model for Part-Of-Speech Tagging. In Eric Brill and Kenneth Church, editors, Proceedings of the Empirical Methods in Natural Language Processing, pages 133–142, 1996. Sebastiaan Terwijn. On the learnability of hidden markov models. In Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications, ICGI ’02, pages 261–268, London, UK, UK, 2002. SpringerVerlag. ISBN 3-540-44239-1.