88 emnlp-2012-Minimal Dependency Length in Realization Ranking

Source: pdf

Author: Michael White ; Rajakrishnan Rajkumar

Abstract: Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better match the distributional characteristics of sentence orderings in news text. We find that with a state-of-the-art, comprehensive realization ranking model, dependency length minimization yields statistically significant improvements in BLEU scores and significantly reduces the number of heavy/light ordering errors. Through distributional analyses, we also show that with simpler ranking models, dependency length minimization can go overboard, too often sacrificing canonical word order to shorten dependencies, while richer models manage to better counterbalance the dependency length minimization preference against (sometimes) competing canonical word order preferences.
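As a rough illustration of the quantity the abstract refers to, the sketch below computes a total dependency length for two orderings of the same words. It is a minimal sketch under stated assumptions, not the paper's feature definition: the function name `total_dependency_length`, the head-index encoding, and the choice to measure length as the difference in word positions are all assumptions made for this example, and the toy sentences and their head assignments are invented.

```python
from typing import List, Optional


def total_dependency_length(heads: List[Optional[int]]) -> int:
    """Sum of |dependent position - head position| over all dependencies.

    heads[i] is the index of word i's head, or None for the root.
    Assumption: length is counted in word positions; other definitions
    (e.g. counting only intervening words) are possible.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)


# Hypothetical toy example: two orderings of a verb-particle construction.
# Ordering A: "He threw out the old trash"
words_a = ["He", "threw", "out", "the", "old", "trash"]
heads_a = [1, None, 1, 5, 5, 1]

# Ordering B: "He threw the old trash out"
words_b = ["He", "threw", "the", "old", "trash", "out"]
heads_b = [1, None, 4, 4, 1, 1]

length_a = total_dependency_length(heads_a)  # 1 + 1 + 2 + 1 + 4 = 9
length_b = total_dependency_length(heads_b)  # 1 + 2 + 1 + 3 + 4 = 11

# A ranker that used dependency length alone would prefer the shorter total.
candidates = [(words_a, length_a), (words_b, length_b)]
preferred_words, _ = min(candidates, key=lambda pair: pair[1])
print(" ".join(preferred_words), length_a, length_b)
```

Choosing by this number alone is the kind of simple model the abstract warns can go overboard; in a discriminative ranker it would be one feature weighed against others, such as preferences for canonical word order.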

reference text

Arto Anttila, Matthew Adams, and Mike Speriosu. 2010. The role of prosody in the English dative alternation. Language and Cognitive Processes.
Jennifer E. Arnold, Thomas Wasow, Anthony Losongco, and Ryan Ginstrom. 2000. Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language, 76:28–55.
Jason Baldridge and Geert-Jan Kruijff. 2002. Coupling CCG and Hybrid Logic Dependency Semantics. In Proc. ACL-02.
Anja Belz, Mike White, Dominic Espinosa, Eric Kow, Deirdre Hogan, and Amanda Stent. 2011. The first surface realisation shared task: Overview and evaluation results. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 217–226, Nancy, France, September. Association for Computational Linguistics.
Bernd Bohnet, Simon Mille, Benoît Favre, and Leo Wanner. 2011. : From deep representation to surface. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 232–235, Nancy, France, September. Association for Computational Linguistics.
H Branigan, M Pickering, and M Tanaka. 2008. Contributions of animacy to grammatical function assignment and word order during production. Lingua, 118(2):172–189.
Joan Bresnan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen. 2007. Predicting the Dative Alternation. Cognitive Foundations of Interpretation, pages 69–94.
Aoife Cahill and Arndt Riester. 2009. Incorporating information status into generation ranking. In Proceedings of ACL-IJCNLP ’09, pages 817–825, Morristown, NJ, USA. Association for Computational Linguistics.
Aoife Cahill, Martin Forst, and Christian Rohrer. 2007. Designing features for parse disambiguation and realisation ranking. In Miriam Butt and Tracy Holloway King, editors, Proceedings of the 12th International Lexical Functional Grammar Conference, pages 128–147. CSLI Publications, Stanford.
Charles Callaway. 2005. The types and distributions of errors in a wide coverage surface realizer evaluation. In Proceedings of the 10th European Workshop on Natural Language Generation.
Stephen Clark and James R. Curran. 2007. Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models. Computational Linguistics, 33(4):493–552.
Vera Demberg and Frank Keller. 2008. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109(2):193–210.
Katja Filippova and Michael Strube. 2007. Generating constituent order in German clauses. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, June 23-30, 2007, Prague, Czech Republic. The Association for Computer Linguistics.
Katja Filippova and Michael Strube. 2009. Tree linearization in English: Improving language model based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 225–228, Boulder, Colorado, June. Association for Computational Linguistics.
Edward Gibson. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition, 68:1–76.
Edward Gibson. 2000. Dependency locality theory: A distance-based theory of linguistic complexity. In Alec Marantz, Yasushi Miyashita, and Wayne O’Neil, editors, Image, Language, Brain: Papers from the First Mind Articulation Project Symposium. MIT Press, Cambridge, MA.
Daniel Gildea and David Temperley. 2007. Optimizing grammars for minimum dependency length. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 184–191, Prague, Czech Republic, June. Association for Computational Linguistics.
Yuqing Guo, Josef van Genabith, and Haifeng Wang. 2008. Dependency-based n-gram models for general purpose sentence realisation. In Proc. COLING-08.
John A. Hawkins. 1994. A Performance Theory of Order and Constituency. Cambridge University Press, New York.
John A. Hawkins. 2000. The relative order of prepositional phrases in English: Going beyond manner-place-time. Language Variation and Change, 11(03):231–266.
John A. Hawkins. 2001. Why are categories adjacent? Journal of Linguistics, 37:1–34.
Julia Hockenmaier and Mark Steedman. 2007. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank. Computational Linguistics, 33(3):355–396.
Julia Hockenmaier. 2003. Data and models for statistical parsing with Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh.
Deirdre Hogan, Conor Cafferkey, Aoife Cahill, and Josef van Genabith. 2007. Exploiting multi-word units in history-based probabilistic generation. In Proc. EMNLP-CoNLL.
Gerard Kempen and Karin Harbusch. 2004. Generating natural word orders in a semi-free word order language: Treebank-based linearization preferences for German. In Alexander F. Gelbukh, editor, CICLing, volume 2945 of Lecture Notes in Computer Science, pages 350–354. Springer.
Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 388–395, Barcelona, Spain, July. Association for Computational Linguistics.
Irene Langkilde-Geary. 2002. An empirical verification of coverage and correctness for a general-purpose sentence generator. In Proc. INLG-02.
R. L. Lewis and S. Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29:1–45, May.
Richard L. Lewis, Shravan Vasishth, and Julie Van Dyke. 2006. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences, 10(10):447–454.
Hiroko Nakanishi, Yusuke Miyao, and Jun’ichi Tsujii. 2005. Probabilistic methods for disambiguation of an HPSG-based chart generator. In Proc. IWPT-05.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proc. ACL-02.
Rajakrishnan Rajkumar and Michael White. 2010. Designing agreement features for realization ranking. In Coling 2010: Posters, pages 1032–1040, Beijing, China, August. Coling 2010 Organizing Committee.
Rajakrishnan Rajkumar, Michael White, and Dominic Espinosa. 2009. Exploiting named entity classes in CCG surface realization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 161–164, Boulder, Colorado, June. Association for Computational Linguistics.
Eric Ringger, Michael Gamon, Robert C. Moore, David Rojas, Martine Smets, and Simon Corston-Oliver. 2004. Linguistically informed statistical models of constituent structure for ordering in sentence realization. In Proc. COLING-04.
Neal Snider and Annie Zaenen. 2006. Animacy and syntactic structure: Fronted NPs in English. In M. Butt, M. Dalrymple, and T.H. King, editors, Intelligent Linguistic Architectures: Variations on Themes by Ronald M. Kaplan. CSLI Publications, Stanford.
Mark Steedman. 2000. The Syntactic Process. MIT Press.
David Temperley. 2007. Minimization of dependency length in written English. Cognition, 105(2):300–333.
Harry Tily. 2010. The Role of Processing Complexity in Word Order Variation and Change. Ph.D. thesis, Stanford University.
Erik Velldal and Stefan Oepen. 2005. Maximum entropy models for realization ranking. In Proc. MT-Summit X.
Thomas Wasow and Jennifer Arnold. 2003. Post-verbal Constituent Ordering in English. Mouton.
Tom Wasow. 2002. Postverbal Behavior. CSLI Publications, Stanford.
Michael White and Rajakrishnan Rajkumar. 2009. Perceptron reranking for CCG realization. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 410–419, Singapore, August. Association for Computational Linguistics.
Michael White. 2006. Efficient Realization of Coordinate Structures in Combinatory Categorial Grammar. Research on Language & Computation, 4(1):39–75.
Hiroko Yamashita and Franklin Chang. 2001. “Long before short” preference in the production of a head-final language. Cognition, 81.