
243 acl-2011-Partial Parsing from Bitext Projections


Source: pdf

Author: Prashanth Mannem ; Aswarth Dara

Abstract: Recent work has shown how a parallel corpus can be leveraged to build a syntactic parser for a target language by projecting automatic source parses onto the target sentences using word alignments. The projected target dependency parses are not always fully connected, as they must be to be useful for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm which doesn’t need a fully connected parse and can learn from partial parses by utilizing available structural and syntactic information in them. Our parser achieved statistically significant improvements over a baseline system that trains on only fully connected parses for Bulgarian, Spanish and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.
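To make the projection step in the abstract concrete, here is a minimal sketch of how source dependency edges might be carried over to a target sentence through word alignments, and why the result can be a partial parse. It is an illustration under stated assumptions, not the authors' implementation: the function names `project_heads` and `has_single_root` are hypothetical, the alignment is assumed to be one-to-one, and dependency labels are ignored.

```python
from typing import Dict, List, Optional


def project_heads(
    source_heads: List[Optional[int]],
    alignment: Dict[int, int],
    target_length: int,
) -> List[Optional[int]]:
    """Project source dependency heads onto target tokens.

    source_heads[i] is the head index of source token i (None for the root).
    alignment maps a source index to a target index (assumed one-to-one).
    Returns target heads; None marks a token that received no projected head.
    """
    target_heads: List[Optional[int]] = [None] * target_length
    for child, head in enumerate(source_heads):
        if head is None:
            continue  # the source root has no incoming edge to project
        # An edge is projected only when both of its endpoints are aligned.
        if child in alignment and head in alignment:
            target_heads[alignment[child]] = alignment[head]
    return target_heads


def has_single_root(heads: List[Optional[int]]) -> bool:
    """True when exactly one token lacks a head.

    Under the one-to-one alignment assumption the projected edges form a
    forest, so a single headless token means one connected tree.
    """
    return sum(h is None for h in heads) == 1


# Source parse of a three-word sentence: token 0 -> 1 -> 2 (root).
source = [1, 2, None]

# Every source word aligned: the projected parse is fully connected.
full = project_heads(source, {0: 0, 1: 1, 2: 2}, target_length=3)

# Target token 2 is unaligned: the projected parse is only partial.
partial = project_heads(source, {0: 0, 1: 1, 2: 3}, target_length=4)
```

In the second case the unaligned target token ends up without a head, so a baseline that trains only on fully connected parses would have to discard the whole sentence, whereas a parser that learns from partial parses can still use the two projected edges.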


reference text

R. Begum, S. Husain, A. Dhwaj, D. Sharma, L. Bai, and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP), Hyderabad, India.
Michael John Collins. 1999. Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, USA. AAI9926110.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, EMNLP ’02, pages 1–8, Morristown, NJ, USA. Association for Computational Linguistics.
Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: an exploration. In Proceedings of the 16th conference on Computational linguistics - Volume 1, pages 340–345, Morristown, NJ, USA. Association for Computational Linguistics.
Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, ACL-IJCNLP ’09, pages 369–377, Morristown, NJ, USA. Association for Computational Linguistics.
Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 742–750, Morristown, NJ, USA. Association for Computational Linguistics.
Samar Husain, Prashanth Mannem, Bharath Ambati, and Phani Gadde. 2010. ICON 2010 tools contest on Indian language dependency parsing. In Proceedings of ICON 2010 NLP Tools Contest.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Nat. Lang. Eng., 11:311–325, September.
Wenbin Jiang and Qun Liu. 2010. Dependency parsing and projection based on word-pair classification. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 12–20, Morristown, NJ, USA. Association for Computational Linguistics.
P. Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT summit, volume 5. Citeseer.
Marco Kuhlmann and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 507–514, Morristown, NJ, USA. Association for Computational Linguistics.
Mitchell P. Marcus, Beatrice Santorini, and Mary A. Marcinkiewicz. 1994. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
R. McDonald, K. Crammer, and F. Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Jens Nilsson and Joakim Nivre. 2008. MaltEval: an evaluation and visualization tool for dependency parsing. In Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May. European Language Resources Association (ELRA). http://www.lrecconf.org/proceedings/lrec2008/.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic. Association for Computational Linguistics.
Joakim Nivre. 2003. An Efficient Algorithm for Projective Dependency Parsing. In Eighth International Workshop on Parsing Technologies, Nancy, France.
Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore, August. Association for Computational Linguistics.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Avinesh PVS. and Karthik Gali. 2007. Part-Of-Speech Tagging and Chunking using Conditional Random Fields and Transformation-Based Learning. In Proceedings of the IJCAI and the Workshop On Shallow Parsing for South Asian Languages (SPSAL), pages 21–24.
Roi Reichart and Ari Rappoport. 2007. Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 616–623, Prague, Czech Republic, June. Association for Computational Linguistics.
Libin Shen and Aravind Joshi. 2008. LTAG dependency parsing with bidirectional incremental construction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 495–504, Honolulu, Hawaii, October. Association for Computational Linguistics.
L. Shen, G. Satta, and A. Joshi. 2007. Guided learning for bidirectional sequence classification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL).
Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, and Jeremiah Crim. 2003. Bootstrapping statistical parsers from small datasets. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1, EACL ’03, pages 331–338, Morristown, NJ, USA. Association for Computational Linguistics.
Jörg Tiedemann. 2002. MatsLex - a multilingual lexical database for machine translation. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC’2002), volume VI, pages 1909–1912, Las Palmas de Gran Canaria, Spain, 29-31 May.
Sriram Venkatapathy. 2008. NLP tools contest - 2008: Summary. In Proceedings of ICON 2008 NLP Tools Contest.
Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proceedings of IWPT, pages 195–206.
David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the first international conference on Human language technology research, HLT ’01, pages 1–8, Morristown, NJ, USA. Association for Computational Linguistics.