
A Feature-Rich Constituent Context Model for Grammar Induction (ACL 2012)


Authors: Dave Golland, John DeNero, Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of length up to 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

References

Taylor Berg-Kirkpatrick and Dan Klein. 2010. Phylogenetic grammar induction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1288–1297, Uppsala, Sweden, July. Association for Computational Linguistics.

Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 582–590, Los Angeles, California, June. Association for Computational Linguistics.

Rens Bod. 2006. Unsupervised parsing with U-DOP. In Proceedings of the Conference on Computational Natural Language Learning.

Glenn Carroll and Eugene Charniak. 1992. Two experiments on learning probabilistic dependency grammars from corpora. In Workshop Notes for Statistically-Based NLP Techniques, AAAI, pages 1–13.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 263–270, Ann Arbor, Michigan, June. Association for Computational Linguistics.

Shay B. Cohen and Noah A. Smith. 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 74–82, Boulder, Colorado, June. Association for Computational Linguistics.

Shay B. Cohen, Dipanjan Das, and Noah A. Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

Arthur Dempster, Nan Laird, and Donald Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38.

John DeNero and Jakob Uszkoreit. 2011. Inducing sentence structure from parallel corpora for reordering. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 193–203, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

Chris Dyer, Kevin Gimpel, Jonathan H. Clark, and Noah A. Smith. 2011. The CMU-ARK German–English translation system. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 337–343, Edinburgh, Scotland, July. Association for Computational Linguistics.

William P. Headden III, Mark Johnson, and David McClosky. 2009. Improving unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 101–109, Boulder, Colorado, June. Association for Computational Linguistics.

Dan Klein and Christopher D. Manning. 2002. A generative constituent-context model for improved grammar induction. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 128–135, Philadelphia, Pennsylvania, USA, July. Association for Computational Linguistics.

Dan Klein and Christopher D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Main Volume, pages 478–485, Barcelona, Spain, July.

Dan Klein. 2005. The Unsupervised Learning of Natural Language Structure. Ph.D. thesis.

Karim Lari and Steve J. Young. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56.

Dong C. Liu and Jorge Nocedal. 1989. On the limited memory method for large scale optimization. Mathematical Programming B, 45(3):503–528.

Franco Luque. 2011. Una implementación del modelo DMV+CCM para parsing no supervisado. In 2do Workshop Argentino en Procesamiento de Lenguaje Natural.

Mitchell P. Marcus, Beatrice Santorini, and Mary A. Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Tahira Naseem and Regina Barzilay. 2011. Using semantic cues to learn syntax. In AAAI.

Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244, Cambridge, MA, October. Association for Computational Linguistics.

Fernando Pereira and Yves Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 128–135, Newark, Delaware, USA, June. Association for Computational Linguistics.

Elias Ponvert, Jason Baldridge, and Katrin Erk. 2011. Simple unsupervised grammar induction from raw text with cascaded finite state models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1077–1086, Portland, Oregon, USA, June. Association for Computational Linguistics.

Roi Reichart and Ari Rappoport. 2010. Improved fully unsupervised parsing with zoomed learning. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 684–693, Cambridge, MA, October. Association for Computational Linguistics.

Yoav Seginer. 2007. Fast unsupervised incremental parsing. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 384–391, Prague, Czech Republic, June. Association for Computational Linguistics.