
357 acl-2013-Transfer Learning for Constituency-Based Grammars


Source: pdf

Author: Yuan Zhang ; Regina Barzilay ; Amir Globerson

Abstract: In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. To handle this apparent discrepancy, we design a probabilistic model that jointly generates CFG and target formalism parses. The model includes features of both parses, allowing transfer between the formalisms while preserving parsing efficiency. We evaluate our approach on three constituency-based grammars (CCG, HPSG, and LFG), augmented with the Penn Treebank-1. Our experiments show that across all three formalisms, the target parsers significantly benefit from the coarse annotations.


reference text

John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128. Association for Computational Linguistics.
Joan Bresnan. 1982. The mental representation of grammatical relations, volume 1. The MIT Press.
Aoife Cahill, Mairéad McCarthy, Josef van Genabith, and Andy Way. 2002. Parsing with pcfgs and automatic f-structure annotation. In Proceedings of the Seventh International Conference on LFG, pages 76–95. CSLI Publications.
Aoife Cahill, Michael Burke, Ruth O’Donovan, Josef Van Genabith, and Andy Way. 2004. Long-distance dependency resolution in automatically acquired wide-coverage pcfg-based lfg approximations. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 319. Association for Computational Linguistics.
Aoife Cahill. 2004. Parsing with Automatically Acquired, Wide-Coverage, Robust, Probabilistic LFG Approximation. Ph.D. thesis.
Marie Candito, Benoît Crabbé, Pascal Denis, et al. 2010. Statistical french dependency parsing: treebank conversion and first results. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), pages 1840–1847.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 173–180. Association for Computational Linguistics.
Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 132–139.
John Chen and Vijay K Shanker. 2005. Automated extraction of tags from the penn treebank. New developments in parsing technology, pages 73–89.
Stephen Clark and James R Curran. 2003. Log-linear models for wide-coverage ccg parsing. In Proceedings of the 2003 conference on Empirical methods in natural language processing, pages 97–104. Association for Computational Linguistics.
Stephen Clark and James R Curran. 2007. Wide-coverage efficient statistical parsing with ccg and log-linear models. Computational Linguistics, 33(4):493–552.
Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, pages 16–23. Association for Computational Linguistics.
Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational linguistics, 29(4):589–637.
Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, João V Graça, and Fernando Pereira. 2007. Frustratingly hard domain adaptation for dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL, volume 2007.
William Gropp, Ewing Lusk, and Anthony Skjellum. 1999. Using MPI: portable parallel programming with the message passing interface, volume 1. MIT press.
Julia Hockenmaier and Mark Steedman. 2002. Acquiring compact lexicalized grammars from a cleaner treebank. In Proceedings of the Third LREC Conference, pages 1974–1981.
Julia Hockenmaier. 2003. Data and models for statistical parsing with combinatory categorial grammar.
Rebecca Hwa, Philip Resnik, and Amy Weinberg. 2005. Breaking the resource bottleneck for multilingual parsing. Technical report, DTIC Document.
Wenbin Jiang and Qun Liu. 2009. Automatic adaptation of annotation standards for dependency parsing: using projected treebank as source corpus. In Proceedings of the 11th International Conference on Parsing Technologies, pages 25–28. Association for Computational Linguistics.
Ronald M. Kaplan, Stefan Riezler, Tracy H. King, John T. Maxwell III, Alexander Vasserman, and Richard Crouch. 2004. Speed and accuracy in shallow and deep stochastic parsing. In Proceedings of NAACL.
Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ronald M Kaplan. 2003. The parc 700 dependency bank. In Proceedings of the EACL03: 4th International Workshop on Linguistically Interpreted Corpora (LINC-03), pages 1–8.
Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of english: The penn treebank. Computational linguistics, 19(2):313–330.
David McClosky, Eugene Charniak, and Mark Johnson. 2010. Automatic domain adaptation for parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 28–36. Association for Computational Linguistics.
Ryan McDonald, Kevin Lerman, and Fernando Pereira. 2006. Multilingual dependency analysis with a two-stage discriminative parser. In Proceedings of the Tenth Conference on Computational Natural Language Learning, pages 216–220. Association for Computational Linguistics.
Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 62–72. Association for Computational Linguistics.
Yusuke Miyao and Jun’ichi Tsujii. 2008. Feature forest models for probabilistic hpsg parsing. Computational Linguistics, 34(1):35–80.
Yusuke Miyao, Takashi Ninomiya, and Junichi Tsujii. 2005. Corpus-oriented grammar development for acquiring a head-driven phrase structure grammar from the penn treebank. Natural Language Processing–IJCNLP 2004, pages 684–693.
Yusuke Miyao. 2006. From Linguistic Theory to Syntactic Analysis: Corpus-Oriented Grammar Development and Feature Forest Model. Ph.D. thesis.
Jorge Nocedal and Stephen J Wright. 1999. Numerical optimization. Springer verlag.
Stephan Oepen, Dan Flickinger, and Francis Bond. 2004. Towards holistic grammar engineering and testing–grafting treebank maintenance into the grammar revision cycle. In Proceedings of the IJCNLP workshop beyond shallow analysis. Citeseer.
Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 404–411.
Carl Pollard and Ivan A Sag. 1994. Head-driven phrase structure grammar. University of Chicago Press.
Stefan Riezler, Tracy H King, Ronald M Kaplan, Richard Crouch, John T Maxwell III, and Mark Johnson. 2002. Parsing the wall street journal using a lexical-functional grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 271–278. Association for Computational Linguistics.
Benjamin Snyder, Tahira Naseem, and Regina Barzilay. 2009. Unsupervised multilingual grammar induction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1, pages 73–81. Association for Computational Linguistics.
Mark Steedman. 2001. The syntactic process. MIT press.
Yue Zhang, Stephen Clark, et al. 2011. Shift-reduce ccg parsing. In Proceedings of the 49th Meeting of the Association for Computational Linguistics, pages 683–692.