187 acl-2011-Jointly Learning to Extract and Compress

Source: pdf

Author: Taylor Berg-Kirkpatrick ; Dan Gillick ; Dan Klein

Abstract: We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a margin-based objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
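The scoring idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper casts inference as an ILP, whereas the code below swaps in brute-force enumeration over a handful of candidates so that it stays self-contained. The names `Candidate`, `score`, `best_summary`, and `drop_pp`, the bigram features, and all weights are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Candidate(NamedTuple):
    source: int    # index of the source sentence
    tokens: tuple  # words kept in this (possibly compressed) version
    cuts: tuple    # labels of the compressions applied; () if none


def bigram_types(tokens):
    """Set of distinct bigrams in a token sequence."""
    return set(zip(tokens, tokens[1:]))


def score(summary, bigram_w, cut_w):
    """Linear score: each bigram TYPE counts once, each cut used counts once."""
    types = set()
    for cand in summary:
        types |= bigram_types(cand.tokens)
    ngram_part = sum(bigram_w.get(t, 0.0) for t in types)
    cut_part = sum(cut_w.get(c, 0.0) for cand in summary for c in cand.cuts)
    return ngram_part + cut_part


def best_summary(candidates, bigram_w, cut_w, budget):
    """Brute-force stand-in for ILP inference (assumption: tiny candidate set)."""
    best, best_score = (), 0.0
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            sources = [c.source for c in subset]
            if len(set(sources)) < len(sources):
                continue  # at most one version of each source sentence
            if sum(len(c.tokens) for c in subset) > budget:
                continue  # respect the length budget (in tokens)
            s = score(subset, bigram_w, cut_w)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score


# Toy data: sentence 0 appears both uncompressed and with a hypothetical cut.
candidates = [
    Candidate(0, ("storm", "hit", "coast", "on", "monday"), ()),
    Candidate(0, ("storm", "hit", "coast"), ("drop_pp",)),
    Candidate(1, ("storm", "hit", "coast", "towns"), ()),
    Candidate(2, ("thousands", "lost", "power"), ()),
]
bigram_w = {("storm", "hit"): 2.0, ("hit", "coast"): 1.0,
            ("coast", "towns"): 0.5, ("lost", "power"): 1.5}
cut_w = {"drop_pp": -0.25}

best, best_score = best_summary(candidates, bigram_w, cut_w, budget=6)
# selects the compressed version of sentence 0 plus sentence 2, score 4.25
```

In this toy setting the compressed version of sentence 0 wins despite its cut penalty, because dropping two tokens leaves room in the budget for a second sentence with new bigram types. Scoring types rather than tokens also means that repeating an already-covered bigram adds nothing, which discourages redundant selections.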

reference text

J. Carbonell and J. Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR.
D. Chiang, Y. Marton, and P. Resnik. 2008. Online large-margin training of syntactic and structural translation features. In Proc. of EMNLP.
J. Clarke and M. Lapata. 2008. Global Inference for Sentence Compression: An Integer Linear Programming Approach. Journal of Artificial Intelligence Research, 31:399–429.
K. Crammer and Y. Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991.
H.C. Daumé III. 2006. Practical structured learning techniques for natural language processing. Ph.D. thesis, University of Southern California.
D. Gillick and B. Favre. 2009. A scalable global model for summarization. In Proc. of ACL Workshop on Integer Linear Programming for Natural Language Processing.
D. Gillick and Y. Liu. 2010. Non-Expert Evaluation of Summarization Systems is Risky. In Proc. of NAACL Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
K. Knight and D. Marcu. 2001. Statistics-based summarization - step one: Sentence compression. In Proc. of AAAI.
L. Li, K. Zhou, G.R. Xue, H. Zha, and Y. Yu. 2009. Enhancing diversity, coverage and balance for summarization through structure learning. In Proc. of the 18th International Conference on World Wide Web.
P. Liang, A. Bouchard-Côté, D. Klein, and B. Taskar. 2006. An end-to-end discriminative approach to machine translation. In Proc. of the ACL.
C.Y. Lin. 2003. Improving summarization performance by sentence compression: a pilot study. In Proc. of ACL Workshop on Information Retrieval with Asian Languages.
C.Y. Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proc. of ACL Workshop on Text Summarization Branches Out.
A.F.T. Martins and N.A. Smith. 2009. Summarization with a joint model for sentence extraction and compression. In Proc. of NAACL Workshop on Integer Linear Programming for Natural Language Processing.
R. McDonald. 2006. Discriminative sentence compression with soft syntactic constraints. In Proc. of EACL.
A. Nenkova and R. Passonneau. 2004. Evaluating content selection in summarization: The pyramid method. In Proc. of NAACL.
A. Nenkova and L. Vanderwende. 2005. The impact of frequency on summarization. Technical report, MSR-TR-2005-101. Redmond, Washington: Microsoft Research.
S. Petrov and D. Klein. 2007. Learning and inference for hierarchically split PCFGs. In AAAI.
J.C. Platt. 1999. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods. MIT Press.
F. Schilder and R. Kondadadi. 2008. Fastsum: Fast and accurate query-based multi-document summarization. In Proc. of ACL.
D. Shen, J.T. Sun, H. Li, Q. Yang, and Z. Chen. 2007. Document summarization using conditional random fields. In Proc. of IJCAI.
B. Taskar, C. Guestrin, and D. Koller. 2003. Max-margin Markov networks. In Proc. of NIPS.
B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. 2004. Max-margin parsing. In Proc. of EMNLP.
S. Teufel and M. Moens. 1997. Sentence extraction as a classification task. In Proc. of ACL Workshop on Intelligent and Scalable Text Summarization.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. 2004. Support vector machine learning for interdependent and structured output spaces. In Proc. of ICML.
V.N. Vapnik. 1998. Statistical learning theory. John Wiley and Sons, New York.
K. Woodsend and M. Lapata. 2010. Automatic generation of story highlights. In Proc. of ACL.
W. Yih, J. Goodman, L. Vanderwende, and H. Suzuki. 2007. Multi-document summarization by maximizing informative content-words. In Proc. of IJCAI.
D.M. Zajic, B.J. Dorr, R. Schwartz, and J. Lin. 2006. Sentence compression as a component of a multidocument summarization system. In Proc. of the 2006 Document Understanding Workshop.