172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Authors: Tahira Naseem; Regina Barzilay; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely language-universal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non-Indo-European languages, yielding a gain of 14.4%.

reference text

Taylor Berg-Kirkpatrick and Dan Klein. 2010. Phylogenetic grammar induction. In ACL, pages 1288–1297.
Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149–164.
David Burkett and Dan Klein. 2008. Two languages are better than one (for syntactic parsing). In Proceedings of EMNLP, pages 877–886.
Shay B. Cohen, Dipanjan Das, and Noah A. Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In EMNLP, pages 50–61.
Bernard Comrie. 1989. Language Universals and Linguistic Typology: Syntax and Morphology. Oxford: Blackwell.
Jason Eisner and Noah A. Smith. 2010. Favor short dependencies: Parsing with soft and hard constraints on dependency length. In Trends in Parsing Technology: Dependency Parsing, Domain Adaptation, and Deep Parsing, pages 121–150.
João Graça, Kuzman Ganchev, and Ben Taskar. 2007. Expectation maximization and posterior constraints. In Advances in NIPS, pages 569–576.
Joseph H Greenberg. 1963. Some universals of language with special reference to the order of meaningful elements. In Joseph H Greenberg, editor, Universals of Language, pages 73–113. MIT Press.
Z.S. Harris. 1968. Mathematical structures of language. Wiley.
Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie, editors. 2005. The World Atlas of Language Structures. Oxford University Press.
R. Hwa, P. Resnik, A. Weinberg, C. Cabezas, and O. Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Journal of Natural Language Engineering, 11(3):311–325.
Dan Klein and Christopher Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of ACL, pages 478–485.
Jonas Kuhn. 2004. Experiments in parallel-text based grammar induction. In Proceedings of the ACL, pages 470–477.
Ryan T. McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In EMNLP, pages 62–72.
Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In EMNLP, pages 1234–1244.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2011. A universal part-of-speech tagset. In ArXiv, April.
David A. Smith and Noah A. Smith. 2004. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of EMNLP, pages 49–56.
Benjamin Snyder, Tahira Naseem, and Regina Barzilay. 2009. Unsupervised multilingual grammar induction. In Proceedings of ACL/AFNLP, pages 73–81.
Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In ACL (Short Papers), pages 682–686.
Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403.
Chenhai Xi and Rebecca Hwa. 2005. A backoff model for bootstrapping resources for non-English languages. In Proceedings of EMNLP, pages 851–858.
Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, January.