SMT Helps Bitext Dependency Parsing (EMNLP 2011, paper 118)

Authors: Wenliang Chen; Jun'ichi Kazama; Min Zhang; Yoshimasa Tsuruoka; Yujie Zhang; Yiou Wang; Kentaro Torisawa; Haizhou Li

Abstract: We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-the-art baselines. Moreover, our approach still provides improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.

reference text

Ann Bies, Martha Palmer, Justin Mott, and Colin Warner. 2007. English Chinese Translation Treebank V 1.0, LDC2007T02. Linguistic Data Consortium.

David Burkett and Dan Klein. 2008. Two languages are better than one (for syntactic parsing). In Proceedings of EMNLP 2008, pages 877–886, Honolulu, Hawaii, October. Association for Computational Linguistics.

Xavier Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 957–961, Prague, Czech Republic, June. Association for Computational Linguistics.

Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale, and Mark Johnson. 2000. BLLIP 1987-89 WSJ Corpus Release 1, LDC2000T43. Linguistic Data Consortium.

Wenliang Chen, Jun’ichi Kazama, Kiyotaka Uchimoto, and Kentaro Torisawa. 2009. Improving dependency parsing with subtrees from auto-parsed data. In Proceedings of EMNLP 2009, pages 570–579, Singapore, August.

Wenliang Chen, Jun’ichi Kazama, and Kentaro Torisawa. 2010. Bitext dependency parsing with bilingual subtree constraints. In Proceedings of ACL 2010, pages 21–29, Uppsala, Sweden, July. Association for Computational Linguistics.

Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3:951–991.

John DeNero and Dan Klein. 2007. Tailoring word alignments to syntactic machine translation. In Proceedings of ACL 2007, pages 17–24, Prague, Czech Republic, June. Association for Computational Linguistics.

Liang Huang, Wenbin Jiang, and Qun Liu. 2009. Bilingually-constrained (monolingual) shift-reduce parsing. In Proceedings of EMNLP 2009, pages 1222–1231, Singapore, August. Association for Computational Linguistics.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of NAACL 2003, pages 48–54. Association for Computational Linguistics.

Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL 2010, pages 1–11, Uppsala, Sweden, July. Association for Computational Linguistics.

Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of ACL-IJCNLP 2009, pages 513–521, Suntec, Singapore, August. Association for Computational Linguistics.

Charles N. Li and Sandra A. Thompson. 1997. Mandarin Chinese - A Functional Reference Grammar. University of California Press.

Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In Proceedings of NAACL 2006, pages 104–111, New York City, USA, June. Association for Computational Linguistics.

Yang Liu and Liang Huang. 2010. Tree-based and forest-based translation. In Tutorial Abstracts of ACL 2010, page 2, Uppsala, Sweden, July. Association for Computational Linguistics.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.

Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL 2006, pages 81–88.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of IWPT 2003, pages 149–160.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of EMNLP 1996, pages 133–142.

David A. Smith and Noah A. Smith. 2004. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of EMNLP 2004, pages 49–56.

Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of IWPT 2003, pages 195–206.

Hai Zhao, Yan Song, Chunyu Kit, and Guodong Zhou. 2009. Cross language dependency parsing using a bilingual lexicon. In Proceedings of ACL-IJCNLP 2009, pages 55–63, Suntec, Singapore, August. Association for Computational Linguistics.