
Using Supervised Bigram-based ILP for Extractive Summarization (ACL 2013)


Authors: Chen Li, Xian Qian, Yang Liu

Abstract: In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground-truth bigram frequencies in the reference summary. During testing, sentence selection is formulated as an ILP problem that maximizes the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets and performs competitively with the best results in the TAC evaluations. We also conducted various analyses to show the impact of bigram selection, weight estimation, and the ILP setup.
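The abstract does not spell out the exact ILP objective, so the following is only a minimal sketch of the two stages it describes, under stated assumptions. It assumes the concept-coverage objective of Gillick and Favre (2009), where each bigram's weight counts once if any selected sentence contains it, subject to a word budget. A one-feature least-squares fit stands in for the paper's feature-rich regression model, and brute-force subset enumeration stands in for an ILP solver (it finds the same optimum for this objective, but only scales to a handful of sentences). All function names, example sentences, and weights are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bigrams(tokens):
    """Set of adjacent word pairs in a token list."""
    return set(zip(tokens, tokens[1:]))


def reference_bigram_freq(reference_tokens):
    """Regression target: how often each bigram occurs in a reference summary."""
    return Counter(zip(reference_tokens, reference_tokens[1:]))


def fit_one_feature(xs, ys):
    """Least-squares line y ~ w*x + b, minimizing squared distance between
    estimated and true frequencies. Hypothetical one-feature stand-in for the
    paper's regression; needs at least two distinct x values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return w, my - w * mx


def select_sentences(sentences, weights, max_words):
    """Exact maximizer of the summed weights of covered bigrams under a word
    budget. Each bigram is counted once, so redundant sentences gain little.
    Enumerates every subset instead of calling an ILP solver."""
    tokenized = [s.lower().split() for s in sentences]
    best_score, best_subset = 0.0, ()
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            length = sum(len(tokenized[i]) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            covered = set().union(*(bigrams(tokenized[i]) for i in subset))
            score = sum(weights.get(b, 0.0) for b in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical sentences and estimated bigram frequencies.
    sentences = [
        "the storm hit the coast",
        "the storm caused floods",
        "officials closed the coast road",
    ]
    weights = {
        ("the", "storm"): 2.0, ("storm", "hit"): 1.0, ("hit", "the"): 0.5,
        ("the", "coast"): 1.5, ("storm", "caused"): 0.8,
        ("caused", "floods"): 1.2, ("coast", "road"): 0.3,
    }
    subset, score = select_sentences(sentences, weights, max_words=9)
    print(subset)  # (0, 1)
```

With a 9-word budget, sentences 0 and 1 are chosen: their shared bigram ("the", "storm") is credited only once, which is how a coverage objective discourages redundancy. For realistic inputs, the same objective would be handed to an actual ILP solver rather than enumerated.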

References

Ahmet Aker and Robert Gaizauskas. 2009. Summary generation for toponym-referenced images using object type language models. In Proceedings of the International Conference RANLP.
Ahmet Aker, Trevor Cohn, and Robert Gaizauskas. 2010. Multi-document summarization using A* search and discriminative training. In Proceedings of the EMNLP.
Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the ACL.
Ronald Brandow, Karl Mitze, and Lisa F. Rau. 1995. Automatic condensation of electronic publications by sentence selection. Inf. Process. Manage.
Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the SIGIR.
Asli Celikyilmaz and Dilek Hakkani-Tür. 2011. Discovery of topically coherent sentences for extractive summarization. In Proceedings of the ACL.
John M. Conroy, Judith D. Schlesinger, Jeff Kubina, Peter A. Rankel, and Dianne P. O’Leary. 2011. CLASSY 2011 at TAC: Guided and multi-lingual summaries and evaluation metrics. In Proceedings of the TAC.
William M. Darling and Fei Song. 2011. Probabilistic document modeling for syntax removal in text summarization. In Proceedings of the ACL.
Sashka T. Davis, John M. Conroy, and Judith D. Schlesinger. 2012. OCCAMS: an optimal combinatorial covering algorithm for multi-document summarization. In Proceedings of the ICDM.
H. P. Edmundson. 1969. New methods in automatic extracting. J. ACM.
Güneş Erkan and Dragomir R. Radev. 2004. LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res.
Dimitrios Galanis, Gerasimos Lampouras, and Ion Androutsopoulos. 2012. Extractive multi-document summarization with integer linear programming and support vector regression. In Proceedings of the COLING.
Michel Galley. 2006. A skip-chain conditional random field for ranking meeting utterances by importance. In Proceedings of the EMNLP.
Dan Gillick and Benoit Favre. 2009. A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing on NAACL.
Dan Gillick, Benoit Favre, and Dilek Hakkani-Tür. 2008. The ICSI Summarization System at TAC 2008.
Feng Jin, Minlie Huang, and Xiaoyan Zhu. 2010. A comparative study on ranking and selection strategies for multi-document summarization. In Proceedings of the COLING.
Julian Kupiec, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the SIGIR.
Jure Leskovec, Natasa Milic-Frayling, and Marko Grobelnik. 2005. Impact of linguistic analysis on the semantic graph coverage and learning of document extracts. In Proceedings of the AAAI.
Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Proceedings of the NAACL.
Chin-Yew Lin. 2004. ROUGE: a package for automatic evaluation of summaries. In Proceedings of the ACL.
Ryan McDonald. 2007. A study of global inference algorithms in multi-document summarization. In Proceedings of the European Conference on IR Research.
Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In Proceedings of the EMNLP.
Miles Osborne. 2002. Using maximum entropy for sentence extraction. In Proceedings of the ACL-02 Workshop on Automatic Summarization.
Dragomir R. Radev. 2001. Experiments in single and multidocument summarization using MEAD. In First Document Understanding Conference.
Seonggi Ryang and Takeshi Abekawa. 2012. Framework of automatic text summarization using reinforcement learning. In Proceedings of the EMNLP.
Noah A. Smith and Jason Eisner. 2005. Contrastive estimation: training log-linear models on unlabeled data. In Proceedings of the ACL.
Kristian Woodsend and Mirella Lapata. 2012. Multiple aspect summarization using integer linear programming. In Proceedings of the EMNLP.
Shasha Xie and Yang Liu. 2010. Improving supervised learning for meeting summarization using sampling and regression. Comput. Speech Lang.