
105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Source: pdf

Author: Kristian Woodsend; Yansong Feng; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.
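The joint selection idea in the abstract can be illustrated with a toy 0/1 optimization. The sketch below is not the authors' model: the phrases, salience scores, word budget, and the "child requires parent" rule are all hypothetical, and brute-force enumeration stands in for a real integer linear programming solver. It only shows how a score-maximizing selection interacts with a length constraint and a simple grammar-style constraint.

```python
from itertools import product

# Hypothetical toy phrases: (text, salience score, length in words, parent index or None).
# These values are invented for illustration; they do not come from the paper.
PHRASES = [
    ("police arrest suspect", 3.0, 3, None),
    ("in central London", 1.5, 3, 0),
    ("after a two-day search", 1.0, 4, 0),
    ("officials said", 0.2, 2, None),
]
MAX_WORDS = 6  # assumed length budget for the headline


def best_selection(phrases, max_words):
    """Enumerate all 0/1 assignments and return the best feasible one."""
    best_score, best_x = float("-inf"), None
    for x in product((0, 1), repeat=len(phrases)):
        # Length constraint: total selected words must fit the budget.
        if sum(xi * p[2] for xi, p in zip(x, phrases)) > max_words:
            continue
        # Grammar-style constraint: a child phrase requires its parent phrase.
        if any(xi and p[3] is not None and not x[p[3]] for xi, p in zip(x, phrases)):
            continue
        # Objective: sum of salience scores of the selected phrases.
        score = sum(xi * p[1] for xi, p in zip(x, phrases))
        if score > best_score:
            best_score, best_x = score, x
    return best_x, best_score


selection, score = best_selection(PHRASES, MAX_WORDS)
headline = " ".join(p[0] for xi, p in zip(selection, PHRASES) if xi)
print(selection, score, headline)
```

Enumeration is exponential in the number of phrases, so it is only workable for a handful of candidates; a dedicated ILP solver would be the natural replacement at realistic sizes.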

reference text

Achterberg, Tobias. 2007. Constraint Integer Programming. Ph.D. thesis, Technische Universität Berlin.
Banko, Michele, Vibhu O. Mittal, and Michael J. Witbrock. 2000. Headline generation based on statistical translation. In Proceedings of the 38th ACL. Hong Kong, pages 318–325.
Clarke, James and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research 31:399–429.
Cohn, Trevor and Mirella Lapata. 2008. Sentence compression beyond word deletion. In Proceedings of the 22nd COLING. Manchester, UK, pages 137–144.
Das, Dipanjan and Noah A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of the ACL-IJCNLP. Suntec, Singapore, pages 468–476.
Daumé III, Hal. 2006. Practical Structured Learning Techniques for Natural Language Processing. Ph.D. thesis, University of Southern California.
Daumé III, Hal and Daniel Marcu. 2002. A noisy-channel model for document compression. In Proceedings of the 40th ACL. Philadelphia, PA, pages 449–456.
Dorr, Bonnie, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL 2003 Text Summarization Workshop and Document Understanding Conference. Edmonton, Alberta, pages 1–8.
Dras, Mark. 1999. Tree Adjoining Grammar and the Reluctant Paraphrasing of Text. Ph.D. thesis, Macquarie University.
Feng, Yansong and Mirella Lapata. 2010a. How many words is a picture worth? Automatic caption generation for news images. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Uppsala, Sweden, pages 1239–1249.
Feng, Yansong and Mirella Lapata. 2010b. Topic models for image annotation and text illustration. In Proceedings of the NAACL HLT. Association for Computational Linguistics, Los Angeles, California, pages 831–839.
Jing, Hongyan. 2000. Sentence reduction for automatic text summarization. In Proceedings of the 6th ANLP. Seattle, WA, pages 310–315.
Jing, Hongyan. 2002. Using hidden Markov modeling to decompose human-written summaries. Computational Linguistics 28(4):527–544.
Jing, Hongyan and Kathleen McKeown. 2000. Cut and paste summarization. In Proceedings of the 1st NAACL. Seattle, WA, pages 178–185.
Keller, Frank, Subahshini Gunasekharan, Neil Mayo, and Martin Corley. 2009. Timing accuracy of web experiments: A case study using the WebExp software package. Behavior Research Methods 41(1):1–12.
Klein, Dan and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st ACL. Sapporo, Japan, pages 423–430.
Koch, Thorsten. 2004. Rapid Mathematical Prototyping. Ph.D. thesis, Technische Universität Berlin.
Kupiec, Julian, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of SIGIR-95. Seattle, WA, pages 68–73.
Lin, Chin-Yew. 2003. Improving summarization performance by sentence compression - a pilot study. In Proceedings of the 6th International Workshop on Information Retrieval with Asian Languages. Sapporo, Japan, pages 1–8.
Lin, Chin-Yew and Eduard H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL. Edmonton, Canada, pages 71–78.
Martins, André and Noah A. Smith. 2009. Summarization with a joint model for sentence extraction and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing. Boulder, Colorado, pages 1–9.
Smith, David and Jason Eisner. 2006. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings on the Workshop on Statistical Machine Translation. Association for Computational Linguistics, New York City, pages 23–30.
Smith, David A. and Jason Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the EMNLP. Suntec, Singapore, pages 822–831.
Soricut, R. and D. Marcu. 2007. Abstractive headline generation using WIDL-expressions. Information Processing and Management 43(6):1536–1548.
Wang, Mengqiu, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proceedings of the EMNLP-CoNLL. Prague, Czech Republic, pages 22–32.
Woodsend, Kristian and Jacek Gondzio. 2009. Exploiting separability in large-scale linear support vector machine training. Computational Optimization and Applications. Published online.
Woodsend, Kristian and Mirella Lapata. 2010. Automatic generation of story highlights. In Sandra Carberry and Stephen Clark, editors, Proceedings of the 48th ACL. Uppsala, Sweden, pages 565–574.
Zajic, David, Bonnie Dorr, and Richard Schwartz. 2004. BBN/UMD at DUC-2004: Topiary. In Proceedings of the NAACL Workshop on Document Understanding. Boston, MA, pages 112–119.
Zajic, David, Bonnie J. Dorr, Jimmy Lin, and Richard Schwartz. 2007. Multi-candidate reduction: Sentence compression as a tool for document summarization tasks. Information Processing and Management, Special Issue on Summarization 43(6):1549–1570.
Zhao, Shiqi, Xiang Lan, Ting Liu, and Sheng Li. 2009. Application-driven statistical paraphrase generation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Suntec, Singapore, pages 834–842.