acl acl2013 acl2013-332 acl2013-332-reference knowledge-graph by maker-knowledge-mining

332 acl-2013-Subtree Extractive Summarization via Submodular Maximization


Source: pdf

Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura

Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio 21 (1 − e−1). Our experiments with the NTC(1IR − −A eCLIA test collections show that our approach outperforms a state-of-the-art algorithm.


reference text

Taylor Berg-Kirkpatrick, Dan Gillick, and Dan Klein. 2011. Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’ 11, pages 481–490, Stroudsburg, PA, USA. Association for Computational Linguistics. Calinescu Calinescu, Chandra Chekuri, Martin P a´l, and Jan Vondr ´ak. 2011. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6): 1740–1766. Michele Conforti and G ´erard Cornu ´ejols. 1984. Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the rado-edmonds theorem. Discrete Applied Mathematics, 7(3):251 274. – Hoa Trang Dang. 2008. Overview of the tac 2008 opinion question answering and summarization tasks. In Proceedings of Text Analysis Confer- ence. Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. 2010. Constrained non-monotone submodular maximization: offline and secretary algorithms. In Proceedings of the 6th international conference on Internet and network economics, WINE’ 10, pages 246–257, Berlin, Heidelberg. Springer-Verlag. Sun-Yuan Hsieh and Ting-Yu Chou. 2010. The weight-constrained maximum-density subtree problem and related problems in trees. The Journal of Supercomputing, 54(3):366–380, December. Daisuke Kawahara and Sadao Kurohashi. 2006. A fully-lexicalized probabilistic model for japanese syntactic and case structure analysis. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL ’06, pages 176–183, Stroudsburg, PA, USA. Association for Computational Linguistics. Samir Khuller, Anna Moss, and Joseph S. Naor. 1999. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45. Andreas Krause and Carlos Guestrin. 2005. A note on the budgeted maximization on submodular functions. Technical Report CMU-CALD-05-103, Carnegie Mellon University. 1031 Ariel Kulik, Hadas Shachnai, and Tami Tamir. 2009. Maximizing submodular set functions subject to multiple linear constraints. In Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’09, pages 545–554, Philadelphia, PA, USA. Society for Industrial and Applied Mathematics. Sadao Kurohashi and Daisuke Kawahara, 2009a. Japanese Morphological Analysis System JUMAN 6.0 Users Manual. http : / / nlp . i .i st . kyot o-u .ac . jp /EN/ index .php ? JUMAN. Sadao Kurohashi and Daisuke Kawahara, 2009b. KN parser (Kurohashi-Nagao parser) 3.0 Users Manual. http : / /nlp . i . i st .kyot o-u . ac . jp / EN/ index .php ?KNP . Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. 2009. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the 41st annual ACM symposium on Theory of computing, STOC ’09, pages 323–332, New York, NY, USA. ACM. Hui Lin and Jeff Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’ 10, pages 912–920, Stroudsburg, PA, USA. Association for Computational Linguistics. Hui Lin and Jeff Bilmes. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’ 11, pages 510–520, Stroudsburg, PA, USA. Association for Computational Linguistics. Jimmy Lin and Dina Demner-Fushman. 2006. Methods for automatically evaluating answers to complex questions. Information Retrieval, 9(5):565– 587, November. Andr e´ F. T. Martins and Noah A. Smith. 2009. Summarization with ajoint model for sentence extraction and compression. In Proceedings of the Workshop on Integer Linear Programming for Natural Langauge Processing, ILP ’09, pages 1–9, Stroudsburg, PA, USA. Association for Computational Linguistics. Ryan McDonald. 2007. A study of global inference algorithms in multi-document summarization. In Proceedings of the 29th European conference on IR research, ECIR’07, pages 557–564, Berlin, Heidelberg. Springer-Verlag. Teruko Mitamura, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji, and Noriko Kando. 2008. Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access. In Proceedings of the 7th NTCIR Workshop. Teruko Mitamura, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, and Cheng-Wei Lee. 2010. Overview of the ntcir-8 aclia tasks: Advanced cross-lingual information access. In Proceedings of the 8th NTCIR Workshop. Hajime Morita, Tetsuya Sakai, and Manabu Okumura. 2011. Query snowball: a co-occurrence-based approach to multi-document summarization for question answering. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’ 11, pages 223–229, Stroudsburg, PA, USA. Association for Computational Lin- guistics. Wolfgang Schmidt. 1991 . Greedoids and searches in directed graphs. Discrete Mathmatics, 93(1):75–88, November. Jie Tang, Limin Yao, and Dewei Chen. 2009. Multitopic based query-oriented summarization. In Proceedings of 2009 SIAM International Conference Data Mining (SDM’2009), pages 1147–1 158. David M. Zajic, Bonnie J. Dorr, Jimmy Lin, and Richard Schwartz. 2006. Sentence compression as a component of a multi-document summarization system. In Proceedings of the 2006 Document Understanding Conference (DUC 2006) at NLT/NAACL 2006. 1032