Relaxed Cross-lingual Projection of Constituent Syntax (EMNLP 2011)


Source: pdf

Authors: Wenbin Jiang, Qun Liu, Yajuan Lv

Abstract: We propose a relaxed correspondence assumption for cross-lingual projection of constituent syntax, which allows a supposed constituent of the target sentence to correspond to an unrestricted treelet in the source parse. Such a relaxed assumption fundamentally tolerates the syntactic non-isomorphism between languages, and enables us to learn target-language-specific syntactic idiosyncrasies rather than a strained grammar projected directly from the source-language syntax. Based on this assumption, we also propose a novel constituency projection method that induces a projected constituent treebank from a source-parsed bilingual corpus. Experiments show that the parser trained on the projected treebank dramatically outperforms previous projected and unsupervised parsers.


Reference text

Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. Bayesian synchronous grammar induction. In Proceedings of the NIPS.
Rens Bod. 2006. An all-subtrees approach to unsupervised parsing. In Proceedings of the COLING-ACL.
David Burkett and Dan Klein. 2008. Two languages are better than one (for syntactic parsing). In Proceedings of the EMNLP.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine-grained n-best parsing and discriminative reranking. In Proceedings of the ACL.
Wenliang Chen, Jun'ichi Kazama, and Kentaro Torisawa. 2010. Bitext dependency parsing with bilingual subtree constraints. In Proceedings of the ACL.
Shay B. Cohen and Noah A. Smith. 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of the NAACL-HLT.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the EMNLP.
Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics.
Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the 47th ACL.
Liang Huang, Wenbin Jiang, and Qun Liu. 2009. Bilingually-constrained (monolingual) shift-reduce parsing. In Proceedings of the EMNLP.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering.
Wenbin Jiang, Yajuan Lü, Yang Liu, and Qun Liu. 2010. Effective constituent projection across languages. In Proceedings of the COLING.
A. K. Joshi, L. S. Levy, and M. Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences.
Dan Klein and Christopher D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the ACL.
Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of the ACL.
Terry Koo, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised dependency parsing. In Proceedings of the ACL.
Jonas Kuhn. 2004. Experiments in parallel-text based grammar induction. In Proceedings of the ACL.
André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. 2010. Turbo parsers: Dependency parsing by approximate variational inference. In Proceedings of the EMNLP.
David McClosky, Eugene Charniak, and Mark Johnson. 2006. Reranking and self-training for parser adaptation. In Proceedings of the ACL.
Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of the EACL, pages 81–88.
Joakim Nivre, Johan Hall, Jens Nilsson, Gulsen Eryigit, and Svetoslav Marinov. 2006. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the CoNLL, pages 221–225.
Franz J. Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of the ACL.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the ACL.
Anoop Sarkar. 2001. Applying co-training methods to statistical parsing. In Proceedings of the NAACL.
Yoav Seginer. 2007. Fast unsupervised incremental parsing. In Proceedings of the ACL.
David Smith and Jason Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the EMNLP.
David A. Smith and Noah A. Smith. 2004. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of the EMNLP.
Benjamin Snyder, Tahira Naseem, and Regina Barzilay. 2009. Unsupervised multilingual grammar induction. In Proceedings of the ACL.
Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, and Jeremiah Crim. 2003. Bootstrapping statistical parsers from small datasets. In Proceedings of the EACL.
Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics.
Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. Natural Language Engineering.
Hai Zhao, Yan Song, Chunyu Kit, and Guodong Zhou. 2009. Cross language dependency parsing using a bilingual lexicon. In Proceedings of the ACL-IJCNLP.