Automatic Generation of Story Highlights (ACL 2010)

Source: pdf

Authors: Kristian Woodsend; Mirella Lapata

Abstract: In this paper we present a joint content selection and compression model for single-document summarization. The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. Using an integer linear programming formulation, the model learns to select and combine phrases subject to length, coverage and grammar constraints. We evaluate the approach on the task of generating “story highlights”—a small number of brief, self-contained sentences that allow readers to quickly gather information on news stories. Experimental results show that the model’s output is comparable to human-written highlights in terms of both grammaticality and content.
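
The selection step described in the abstract can be illustrated with a toy 0/1 optimisation. The sketch below is not the authors' model: the phrases, salience scores and parent links are invented for illustration, the coverage constraint is left out, and brute-force enumeration stands in for an integer linear programming solver. It only shows the shape of the problem, which is to choose a subset of phrases that maximises total salience while respecting a length budget and a simple grammar constraint (a dependent phrase is kept only if its parent phrase is kept).

```python
from itertools import product

# Hypothetical toy data: (text, salience score, token length, parent index or None).
PHRASES = [
    ("The council approved the budget", 5.0, 5, None),
    ("after a lengthy debate", 1.5, 4, 0),
    ("Protesters gathered outside", 3.0, 3, None),
    ("despite heavy rain", 1.0, 3, 2),
]


def select_phrases(phrases, max_tokens):
    """Exhaustively solve a tiny 0/1 phrase-selection problem.

    Maximise total salience subject to:
      - length:  total tokens of selected phrases <= max_tokens
      - grammar: a dependent phrase is selected only if its parent is selected
    Returns (choice, score), where choice is a tuple of 0/1 indicators.
    """
    best_score, best_choice = float("-inf"), None
    for choice in product((0, 1), repeat=len(phrases)):
        # Length constraint.
        length = sum(x * p[2] for x, p in zip(choice, phrases))
        if length > max_tokens:
            continue
        # Grammar constraint: no selected child without its parent.
        if any(
            x and p[3] is not None and not choice[p[3]]
            for x, p in zip(choice, phrases)
        ):
            continue
        # Objective: total salience of the selected phrases.
        score = sum(x * p[1] for x, p in zip(choice, phrases))
        if score > best_score:
            best_score, best_choice = score, choice
    return best_choice, best_score


if __name__ == "__main__":
    choice, score = select_phrases(PHRASES, max_tokens=9)
    for keep, (text, *_rest) in zip(choice, PHRASES):
        if keep:
            print(text)
    print("total salience:", score)
```

Enumeration is exponential in the number of phrases, so it is only workable for a handful of candidates; a realistic instance would hand the same objective and constraints to an ILP solver.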

References

Achterberg, Tobias. 2007. Constraint Integer Programming. Ph.D. thesis, Technische Universität Berlin.
Banko, Michele, Vibhu O. Mittal, and Michael J. Witbrock. 2000. Headline generation based on statistical translation. In Proceedings of the 38th ACL. Hong Kong, pages 318–325.
Clarke, James and Mirella Lapata. 2007. Modelling compression with discourse constraints. In Proceedings of EMNLP-CoNLL. Prague, Czech Republic, pages 1–11.
Clarke, James and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research 31:399–429.
Cohn, Trevor and Mirella Lapata. 2009. Sentence compression as tree transduction. Journal of Artificial Intelligence Research 34:637–674.
Conroy, J. M., J. D. Schlesinger, J. Goldstein, and D. P. O’Leary. 2004. Left-brain/right-brain multi-document summarization. In DUC 2004 Conference Proceedings.
Daumé III, Hal. 2006. Practical Structured Learning Techniques for Natural Language Processing. Ph.D. thesis, University of Southern California.
Daumé III, Hal and Daniel Marcu. 2002. A noisy-channel model for document compression. In Proceedings of the 40th ACL. Philadelphia, PA, pages 449–456.
Dorr, Bonnie, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL 2003 Workshop on Text Summarization. pages 1–8.
Jing, Hongyan. 2000. Sentence reduction for automatic text summarization. In Proceedings of the 6th ANLP. Seattle, WA, pages 310–315.
Jing, Hongyan. 2002. Using hidden Markov modeling to decompose human-written summaries. Computational Linguistics 28(4):527–544.
Jing, Hongyan and Kathleen McKeown. 2000. Cut and paste summarization. In Proceedings of the 1st NAACL. Seattle, WA, pages 178–185.
Keller, Frank, Subahshini Gunasekharan, Neil Mayo, and Martin Corley. 2009. Timing accuracy of web experiments: A case study using the WebExp software package. Behavior Research Methods 41(1):1–12.
Klein, Dan and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st ACL. Sapporo, Japan, pages 423–430.
Knight, Kevin and Daniel Marcu. 2002. Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artificial Intelligence 139(1):91–107.
Koch, Thorsten. 2004. Rapid Mathematical Prototyping. Ph.D. thesis, Technische Universität Berlin.
Kupiec, Julian, Jan O. Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of SIGIR95. Seattle, WA, pages 68–73.
Lin, Chin-Yew. 2003. Improving summarization performance by sentence compression: A pilot study. In Proceedings of the 6th International Workshop on Information Retrieval with Asian Languages. Sapporo, Japan, pages 1–8.
Lin, Chin-Yew and Eduard H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL. Edmonton, Canada, pages 71–78.
Mani, Inderjeet. 2001. Automatic Summarization. John Benjamins Pub Co.
Martins, André and Noah A. Smith. 2009. Summarization with a joint model for sentence extraction and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing. Boulder, Colorado, pages 1–9.
McDonald, Ryan. 2006. Discriminative sentence compression with soft syntactic constraints. In Proceedings of the 11th EACL. Trento, Italy.
McDonald, Ryan. 2007. A study of global inference algorithms in multi-document summarization. In Proceedings of the 29th ECIR. Rome, Italy.
Nenkova, Ani. 2005. Automatic text summarization of newswire: Lessons learned from the Document Understanding Conference. In Proceedings of the 20th AAAI. Pittsburgh, PA, pages 1436–1441.
Siddharthan, Advaith, Ani Nenkova, and Kathleen McKeown. 2004. Syntactic simplification for improving content selection in multi-document summarization. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004). pages 896–902.
Sparck Jones, Karen. 1999. Automatic summarizing: Factors and directions. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automatic Text Summarization, MIT Press, Cambridge, pages 1–33.
Svore, Krysta, Lucy Vanderwende, and Christopher Burges. 2007. Enhancing single-document summarization by combining RankNet and third-party sources. In Proceedings of EMNLP-CoNLL. Prague, Czech Republic, pages 448–457.
Wan, Stephen and Cécile Paris. 2008. Experimenting with clause segmentation for text summarization. In Proceedings of the 1st TAC. Gaithersburg, MD.
Witten, Ian H., Gordon Paynter, Eibe Frank, Carl Gutwin, and Craig G. Nevill-Manning. 1999. KEA: Practical automatic keyphrase extraction. In Proceedings of the 4th ACM International Conference on Digital Libraries. Berkeley, CA, pages 254–255.
Woodsend, Kristian and Jacek Gondzio. 2009. Exploiting separability in large-scale linear support vector machine training. Computational Optimization and Applications.
Wunderling, Roland. 1996. Paralleler und objektorientierter Simplex-Algorithmus. Ph.D. thesis, Technische Universität Berlin.
Zajic, David, Bonnie J. Dorr, Jimmy Lin, and Richard Schwartz. 2007. Multi-candidate reduction: Sentence compression as a tool for document summarization tasks. Information Processing & Management Special Issue on Summarization 43(6):1549–1570.