
270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles


Source: pdf

Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose

Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic-based clustering of fragments extracted from each article, driven by queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing, widely used multi-document summarization system (MEAD).
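The pipeline outlined in the abstract (query from the co-citation context, fragment selection, topic-based clustering) can be illustrated with a small sketch. This is not SciSumm's implementation: the stopword list, the term-overlap selection rule, and the "most frequent query term" clustering rule are simplifying assumptions, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "is", "on",
             "with", "by", "as", "that", "are", "from", "we", "this"}

def terms(text):
    """Return the lowercased content words of a text."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_query(context):
    """Assumed query model: the set of content words around the co-citation."""
    return set(terms(context))

def select_fragments(fragments, query, min_overlap=1):
    """Keep fragments that share at least min_overlap terms with the query."""
    return [f for f in fragments if len(query & set(terms(f))) >= min_overlap]

def cluster_by_topic(fragments, query):
    """Group each fragment under the query term it mentions most often."""
    clusters = defaultdict(list)
    for f in fragments:
        counts = Counter(t for t in terms(f) if t in query)
        if counts:
            # Highest count wins; ties are broken alphabetically.
            topic = min(counts, key=lambda t: (-counts[t], t))
            clusters[topic].append(f)
    return dict(clusters)

# Hypothetical co-citation context and article fragments.
context = "Prior work on sentence extraction and clustering for summarization"
fragments = [
    "We rank each sentence and perform sentence extraction.",
    "Clustering groups related passages before clustering-based selection.",
    "The parser was trained on newswire.",
]

query = build_query(context)
selected = select_fragments(fragments, query)
clusters = cluster_by_topic(selected, query)
```

Here the third fragment shares no term with the query and is dropped, while the other two fall into separate topic groups. A real system would likely use a stronger retrieval and clustering model than raw term overlap.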


reference text

Agrawal R. and Srikant R. 1994. Fast Algorithm for Mining Association Rules. In Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
Baxendale P. 1958. Machine-made index for technical literature - an experiment. IBM Journal of Research and Development.
Beil F., Ester M. and Xu X. 2002. Frequent-Term based Text Clustering. In Proceedings of SIGKDD ’02, Edmonton, Alberta, Canada.
Carbonell J. and Goldstein J. 1998. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In Research and Development in Information Retrieval, pages 335–336.
Councill I. G., Giles C. L. and Kan M. 2008. ParsCit: An open-source CRF reference string parsing package. In International Language Resources and Evaluation. European Language Resources Association.
Edmundson H. P. 1969. New methods in automatic extracting. Journal of ACM.
Hearst M. A. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. In proceedings of LREC 2004, Lisbon, Portugal, May 2004.
Joseph M. T. and Radev D. R. 2007. Citation analysis, centrality, and the ACL Anthology.
Kupiec J., Pedersen J. and Chen F. 1995. A training document summarizer. In Proceedings SIGIR ’95, pages 68–73, New York, NY, USA. 28(1): 114–133.
Luhn H. P. 1958. IBM Journal of Research Development.
Mani I. and Bloedorn E. 1997. Multi-Document Summarization by graph search and matching. In AAAI/IAAI, pages 622–628.
Nanba H. and Okumura M. 1999. Towards Multi-paper Summarization Using Reference Information. In Proceedings of IJCAI-99, pages 926–931.
Paice C. D. 1990. Constructing Literature Abstracts by Computer: Techniques and Prospects. Information Processing and Management, Vol. 26, No. 1, pp. 171–186.
Qazvinian V. and Radev D. R. 2008. Scientific Paper summarization using Citation Summary Networks. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 689–696, Manchester, August 2008.
Radev D. R., Jing H. and Budzikowska M. 2000. Centroid-based summarization of multiple documents: sentence extraction, utility based evaluation, and user studies. In NAACL-ANLP 2000 Workshop on Automatic summarization, pages 21–30, Morristown, NJ, USA.
Radev, Dragomir. 2004. MEAD - a platform for multidocument multilingual text summarization. In proceedings of LREC 2004, Lisbon, Portugal, May 2004.
Teufel S. and Moens M. 2002. Summarizing Scientific Articles - Experiments with Relevance and Rhetorical Status. In Journal of Computational Linguistics, MIT Press.
Hal Daume III and Marcu D. 2006. Bayesian query-focussed summarization. In Proceedings of the Conference of the Association for Computational Linguistics, ACL.
Eisenstein J. and Barzilay R. 2008. Bayesian unsupervised topic segmentation. In EMNLP-SIGDAT.
Barzilay R. and Lee L. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. In Proceedings of 3rd Asian Semantic Web Conference (ASWC 2008), pp. 182–188.
Kaplan D. and Tokunaga T. 2008. Sighting citation sights: A collective-intelligence approach for automatic summarization of research papers using C-sites. In HLT-NAACL.