
28 acl-2011: A Statistical Tree Annotator and Its Applications

Source: pdf

Authors: Xiaoqiang Luo; Bing Zhao

Abstract: In many natural language applications, there is a need to enrich syntactic parse trees. We present a statistical tree annotator that augments tree nodes with additional information. The annotator is generic and can be applied to a variety of applications. We report three such applications in this paper: predicting function tags, predicting null elements, and predicting whether a tree constituent is projectable in machine translation. Our function tag prediction system significantly outperforms published results.

reference text

Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, March.
Ann Bies, Mark Ferguson, and Karen Katz. 1995. Bracketing guidelines for Treebank II style Penn Treebank project. Technical report, Linguistic Data Consortium.
Daniel M. Bikel. 2004. A distributional analysis of a lexicalized statistical parsing model. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 182–189, Barcelona, Spain, July. Association for Computational Linguistics.
Don Blaheta and Eugene Charniak. 2000. Assigning function tags to parsed text. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics, pages 234–240.
Don Blaheta. 2003. Function Tagging. Ph.D. thesis, Brown University.
Richard Campbell. 2004. Using linguistic principles to recover empty categories. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), Main Volume, pages 645–652, Barcelona, Spain, July.
Xavier Carreras, Michael Collins, and Terry Koo. 2008. TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing. In Proceedings of CoNLL.
E. Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of NAACL, Seattle.
David Chiang. 2010. Learning to translate with source and target syntax. In Proc. ACL, pages 1443–1452.
Tagyoung Chung and Daniel Gildea. 2010. Effects of empty categories on machine translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 636–645, Cambridge, MA, October. Association for Computational Linguistics.
Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proc. Annual Meeting of ACL, pages 16–23.
Peter Dienes and Amit Dubey. 2003. Antecedent recovery: Experiments with a trace tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 33–40.
Ryan Gabbard, Mitchell Marcus, and Seth Kulick. 2006. Fully parsing the Penn Treebank. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
Joshua Goodman. 2002. Sequential conditional generalized iterative scaling. In Proc. of the 40th ACL.
Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: The 90% solution. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 57–60, New York City, USA, June. Association for Computational Linguistics.
Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Erhard Hinrichs and Dan Roth, editors, Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 423–430.
Mihai Lintean and V. Rus. 2007a. Large scale experiments with function tagging. In Proceedings of the International Conference on Knowledge Engineering, pages 1–7.
Mihai Lintean and V. Rus. 2007b. Naive Bayes and decision trees for function tagging. In Proceedings of the International Conference of the FLAIRS-2007.
David M. Magerman. 1994. Natural Language Parsing As Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
Robert Malouf. 2002. A comparison of algorithms for maximum entropy parameter estimation. In the Sixth Conference on Natural Language Learning (CoNLL-2002), pages 49–55.
M. Marcus, B. Santorini, and M. Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.
Adwait Ratnaparkhi. 1997. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Second Conference on Empirical Methods in Natural Language Processing, pages 1–10.
Helmut Schmid. 2006. Trace prediction and recovery with unlexicalized PCFGs and slash features. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 177–184, Sydney, Australia, July. Association for Computational Linguistics.
Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proceedings of ACL.
Libin Shen, Bing Zhang, Spyros Matsoukas, Jinxi Xu, and Ralph Weischedel. 2010. Statistical machine translation with a factorized grammar. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 616–625, Cambridge, MA, October. Association for Computational Linguistics.
Deyi Xiong, Min Zhang, and Haizhou Li. 2010. Learning translation boundaries for phrase-based decoding. In NAACL-HLT 2010.
Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. Natural Language Engineering, 11(2):207–238.
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proc. Annual Meeting of the Association for Computational Linguistics.
Bing Zhao, Young-Suk Lee, Xiaoqiang Luo, and Liu Li. 2011. Learning to transform and select elementary trees for improved syntax-based machine translations. In Proc. of ACL.