
174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem

Source: pdf

Author: Tsutomu Hirao ; Yasuhisa Yoshida ; Masaaki Nishino ; Norihito Yasuda ; Masaaki Nagata

Abstract: Recent studies on extractive text summarization formulate it as a combinatorial optimization problem such as a Knapsack Problem, a Maximum Coverage Problem or a Budgeted Median Problem. These methods successfully improved summarization quality, but they did not consider the rhetorical relations between the textual units of a source document. Thus, summaries generated by these methods may lack logical coherence. This paper proposes a single document summarization method based on the trimming of a discourse tree. This is a two-fold process. First, we propose rules for transforming a rhetorical structure theory-based discourse tree into a dependency-based discourse tree, which allows us to take a tree-trimming approach to summarization. Second, we formulate the problem of trimming a dependency-based discourse tree as a Tree Knapsack Problem, then solve it with integer linear programming (ILP). Evaluation results showed that our method improved ROUGE scores.
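To illustrate the tree-trimming idea in the abstract, here is a minimal sketch of a generic Tree Knapsack Problem: choose a set of nodes that contains the root, includes a node only if its parent is included, stays within a length budget, and maximizes total value. This is not the paper's formulation. The abstract says the authors solve their problem with ILP, while this sketch uses a simple dynamic program instead, and the function name, node values, and node costs are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

NEG = float("-inf")


def tree_knapsack(
    children: Dict[int, List[int]],
    value: Dict[int, int],
    cost: Dict[int, int],
    root: int,
    budget: int,
) -> Tuple[float, FrozenSet[int]]:
    """Best root-connected node set whose total cost is at most `budget`.

    Returns (-inf, empty set) when even the root alone does not fit.
    """

    def solve(v: int) -> List[Tuple[float, FrozenSet[int]]]:
        # table[b] = best (value, nodes) inside v's subtree that includes v
        # and costs at most b; (-inf, {}) when v itself does not fit in b.
        table = [(NEG, frozenset())] * (budget + 1)
        for b in range(cost[v], budget + 1):
            table[b] = (value[v], frozenset([v]))
        for c in children.get(v, []):
            child = solve(c)
            merged = list(table)  # option: skip this child's subtree entirely
            for b in range(cost[v], budget + 1):
                # Give cb units of budget to the child's subtree, rest to v.
                for cb in range(cost[c], b - cost[v] + 1):
                    base_val, base_set = table[b - cb]
                    c_val, c_set = child[cb]
                    if base_val == NEG or c_val == NEG:
                        continue
                    if base_val + c_val > merged[b][0]:
                        merged[b] = (base_val + c_val, base_set | c_set)
            table = merged
        return table

    return solve(root)[budget]


# Hypothetical toy tree: node 0 is the root, 1 and 2 are its children,
# and 3 is a child of 1. Costs stand in for unit lengths.
children = {0: [1, 2], 1: [3]}
value = {0: 3, 1: 4, 2: 1, 3: 5}
cost = {0: 2, 1: 3, 2: 2, 3: 1}
best_value, chosen = tree_knapsack(children, value, cost, root=0, budget=6)
print(best_value, sorted(chosen))
```

In this toy tree the most valuable node (3) can only be kept together with its ancestors 1 and 0, which is the kind of constraint that distinguishes a tree knapsack from a plain knapsack over independent units.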

reference text

Yoav Benjamini and Yosef Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1):289–300.
Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2001. Building a discourse-tagged corpus in the framework of rhetorical structure theory. In Proc. of the SIGDIAL01, pages 1–10.
Geon Cho and Dong X Shaw. 1997. A depth-first dynamic programming algorithm for the tree knapsack problem. INFORMS Journal on Computing, 9(4):431–438.
Hal Daumé III and Daniel Marcu. 2002. A noisy-channel model for document compression. In Proc. of the 40th ACL, pages 449–456.
David duVerle and Helmut Prendinger. 2009. A novel discourse parser based on support vector machine classification. In Proc. of the Joint Conference of the 47th ACL and 4th IJCNLP, pages 665–673.
Elena Filatova and Vasileios Hatzivassiloglou. 2004. A formal model for information selection in multisentence extraction. In Proc. of the 20th COLING, pages 397–403.
Katja Filippova and Michael Strube. 2008. Dependency tree based sentence compression. In Proc. of the 5th International Natural Language Generation Conference (INLG), pages 25–32.
Hugo Hernault, Helmut Prendinger, David A duVerle, and Mitsuru Ishizuka. 2010. HILDA: A discourse parser using support vector machine classification. Dialogue and Discourse, 1(3):1–33.
Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proc. of Workshop on Text Summarization Branches Out, pages 74–81.
J. A. Lukes. 1974. Efficient algorithm for the partitioning of trees. IBM Journal of Research and Development, 18(3):217–224.
Daniel Marcu. 1998. Improving summarization through rhetorical parsing tuning. In Proc. of the 6th Workshop on Very Large Corpora, pages 206–215.
Ryan McDonald. 2007. A study of global inference algorithms in multi-document summarization. In Proc. of the 29th ECIR, pages 557–564.
Ani Nenkova and Kathaleen McKeown. 2011. Automatic summarization. Foundations and Trends in Information Retrieval, 5(2–3):103–233.
Natthawut Samphaiboon and Takeo Yamada. 2000. Heuristic and exact algorithms for the precedence-constrained knapsack problem. Journal of Optimization Theory and Applications, 105(3):659–676.
Hiroya Takamura and Manabu Okumura. 2009a. Text summarization model based on maximum coverage problem and its variant. In Proc. of the 12th EACL, pages 781–789.
Hiroya Takamura and Manabu Okumura. 2009b. Text summarization model based on the budgeted median problem. In Proceedings of the 18th CIKM.
Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83.
William Charles Mann and Sandra Annear Thompson. 1988. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281.