94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming

Author: Kristian Woodsend ; Mirella Lapata

Abstract: Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.
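To make the idea of combining an objective with soft and hard constraints concrete, here is a minimal sketch of a toy 0/1 selection problem. It is not the authors' model: the scores, lengths, redundancy pairs, and penalty weight are all hypothetical, and the problem is solved by exhaustive search rather than with an ILP solver, which is only feasible for a handful of sentences.

```python
from itertools import product


def select_sentences(scores, lengths, redundant_pairs, max_words, penalty):
    """Toy 0/1 programme solved by exhaustive search (illustrative only).

    scores[i]       : hypothetical salience score of sentence i (maximised)
    lengths[i]      : word count of sentence i
    redundant_pairs : pairs (i, j) that should preferably not co-occur (soft)
    max_words       : hard limit on total summary length
    penalty         : amount subtracted per violated soft constraint
    """
    n = len(scores)
    best_value, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n):  # x[i] == 1 iff sentence i is selected
        # Hard constraint: assignments over the length budget are discarded.
        if sum(l * xi for l, xi in zip(lengths, x)) > max_words:
            continue
        # Objective: total salience of the selected sentences ...
        value = sum(s * xi for s, xi in zip(scores, x))
        # ... minus a penalty for every soft constraint that is violated.
        value -= penalty * sum(1 for i, j in redundant_pairs if x[i] and x[j])
        if value > best_value:
            best_value, best_x = value, x
    return [i for i, xi in enumerate(best_x) if xi], best_value


if __name__ == "__main__":
    # Hypothetical data: four sentences, a 100-word budget, and one
    # redundancy pair (sentences 0 and 1 say roughly the same thing).
    chosen, value = select_sentences(
        scores=[5.0, 4.0, 3.0, 2.0],
        lengths=[40, 35, 30, 20],
        redundant_pairs=[(0, 1)],
        max_words=100,
        penalty=3.0,
    )
    print(chosen, value)
```

With these made-up numbers the soft constraint steers the search away from picking sentences 0 and 1 together, while the length budget rules out any selection over 100 words. Setting `penalty=0.0` removes the soft preference and changes which subset wins, which shows how soft constraints trade off against the objective, whereas hard constraints simply exclude candidates.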

reference text

Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 481–490, Portland, Oregon.

James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 31:273–381.

Trevor Cohn and Mirella Lapata. 2008. Sentence compression beyond word deletion. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 137–144, Manchester, UK.

Trevor Cohn and Mirella Lapata. 2009. Sentence compression as tree transduction. Journal of Artificial Intelligence Research, 34:637–674.

Pawan Deshpande, Regina Barzilay, and David Karger. 2007. Randomized decoding for selection-and-ordering problems. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pages 444–451, Rochester, New York.

Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In Proceedings of the ACL Interactive Poster/Demonstration Sessions, pages 205–208, Sapporo, Japan.

Elena Filatova and Vasileios Hatzivassiloglou. 2004. A formal model for information selection in multi-sentence text extraction. In Proceedings of the 20th International Conference on Computational Linguistics, pages 397–403, Geneva, Switzerland.

Dan Gillick and Benoit Favre. 2009. A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pages 10–18, Boulder, Colorado.

Dan Gillick, Benoit Favre, and Dilek Hakkani-Tür. 2008. The ICSI summarization system at TAC 2008. In Proceedings of the Text Analysis Conference.

Dan Gillick, Benoit Favre, Dilek Hakkani-Tür, Berndt Bohnet, Yang Liu, and Shasha Xie. 2009. The ICSI/UTD summarization system at TAC 2009. In Proceedings of the Text Analysis Conference.

Jade Goldstein, Vibhu Mittal, Jaime Carbonell, and Mark Kantrowitz. 2000. Multi-document summarization by sentence extraction. In Proceedings of the 2000 NAACL–ANLP Workshop on Automatic Summarization, pages 40–48, Seattle, Washington.

Dorit S. Hochbaum. 1997. Approximating covering and packing problems: Set cover, vertex cover, independent set, and related problems. In Dorit S. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems, pages 94–143. PWS Publishing Company, Boston, MA.

Hongyan Jing. 2002. Using Hidden Markov modeling to decompose human-written summaries. Computational Linguistics, 28(4):527–544.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Association for Computational Linguistics, pages 423–430, Sapporo, Japan.

Chin-Yew Lin and Eduard H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT–NAACL, pages 71–78, Edmonton, Canada.

Inderjeet Mani. 2001. Automatic Summarization. John Benjamins Pub Co.

André Martins and Noah A. Smith. 2009. Summarization with a joint model for sentence extraction and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pages 1–9, Boulder, Colorado.

Ryan McDonald. 2007. A study of global inference algorithms in multi-document summarization. In Proceedings of the 29th European conference on IR Research, pages 557–564, Rome, Italy.

Rani Nelken and Stuart Shieber. 2006. Towards robust context-sensitive sentence alignment for monolingual corpora. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 161–168, Trento, Italy.

David Smith and Jason Eisner. 2006. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of Workshop on Statistical Machine Translation, pages 23–30, NYC.

Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages 223–231, Cambridge.

Karen Sparck Jones. 1999. Automatic summarizing: Factors and directions. In Inderjeet Mani and Mark T. Maybury, editors, Advances in Automatic Text Summarization, pages 1–33. MIT Press, Cambridge.

Kristian Woodsend and Jacek Gondzio. 2009. Exploiting separability in large-scale linear support vector machine training. Computational Optimization and Applications.

Kristian Woodsend and Mirella Lapata. 2010. Automatic generation of story highlights. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 565–574, Uppsala, Sweden.

Kristian Woodsend, Yansong Feng, and Mirella Lapata. 2010. Title generation with quasi-synchronous grammar. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 513–523, Cambridge, MA.