acl acl2010 acl2010-140 acl2010-140-reference knowledge-graph by maker-knowledge-mining

140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

Source: pdf

Author: Vahed Qazvinian ; Dragomir R. Radev

Abstract: Identifying background (context) information in scientific articles can help scholars understand major contributions in their research area more easily. In this paper, we propose a general framework based on probabilistic inference to extract such context information from scientific papers. We model the sentences in an article and their lexical similarities as a Markov Random Field tuned to detect the patterns that context data create, and employ a Belief Propagation mechanism to detect likely context sentences. We also address the problem of generating surveys of scientific papers. Our experiments show greater pyramid scores for surveys generated using such context information rather than citation sentences alone.

reference text

Shannon Bradshaw. 2002. Reference Directed Indexing: Indexing Scientific Literature in the Context of Its Use. Ph.D. thesis, Northwestern University. Shannon Bradshaw. 2003. Reference directed indexing: Redeeming relevance for subject search in citation indexes. In Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries. Aaron Elkiss, Siwei Shen, Anthony Fader, G ¨unes ¸ Erkan, David States, and Dragomir R. Radev. 2008. Blind men and elephants: What do citation summaries tell us about a research article? Journal of the American Society for Information Science and Technology, 59(1):51–62. G ¨unes ¸ Erkan and Dragomir R. Radev. 2004. Lexrank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR). Dain Kaplan, Ryu Iida, and Takenobu Tokunaga. 2009. Automatic extraction of citation contexts for re- search paper summarization: A coreference-chain based approach. In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, pages 88–95, Suntec City, Singapore, August. Association for Computational Linguistics. Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, and Christos Faloutsos. 2009. Snare: a link analytic system for graph labeling and risk detection. In KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1265–1274. Qiaozhu Mei and ChengXiang Zhai. 2008. Generating impact-based summaries for scientific literature. In Proceedings of ACL ’08, pages 816–824. Donald Metzler and W. Bruce Croft. 2005. A markov random field model for term dependencies. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 472–479. Donald Metzler and W. Bruce Croft. 2007. Latent concept expansion using markov random fields. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 3 11–318. Donald A. Metzler. 2007. Automatic feature selection in the markov random field model for information retrieval. In CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 253–262. Saif Mohammad, Bonnie Dorr, Melissa Egan, Ahmed Hassan, Pradeep Muthukrishan, Vahed Qazvinian, Dragomir Radev, and David Zajic. 2009. Using citations to generate surveys of scientific paradigms. 563 In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 584–592, Boulder, Colorado, June. Association for Computational Linguistics. Hidetsugu Nanba and Manabu Okumura. 1999. Towards multi-paper summarization using reference information. In IJCAI1999, pages 926–931. Hidetsugu Nanba, Takeshi Abekawa, Manabu Okumura, and Suguru Saito. 2004a. Bilingual presri: Integration of multiple research paper databases. In Proceedings of RIAO 2004, pages 195–21 1, Avignon, France. Hidetsugu Nanba, Noriko Kando, and Manabu Okumura. 2004b. Classification of research papers using citation links and citation types: Towards automatic review article generation. In Proceedings of the 11th SIG Classification Research Workshop, pages 117–134, Chicago, USA. Mark E. J. Newman. 2001. The structure of scientific collaboration networks. PNAS, 98(2):404–409. Vahed Qazvinian and Dragomir R. Radev. 2008. Scientific paper summarization using citation summary networks. In COLING 2008, Manchester, UK. Dragomir R. Radev, Pradeep Muthukrishnan, and Vahed Qazvinian. 2009. The ACL anthology network corpus. In ACL workshop on Natural Language Processing and Information Retrieval for Digital Libraries. Matteo Romanello, Federico Boschetti, and Gregory Crane. 2009. Citations in the digital library of classics: Extracting canonical references by using conditional random fields. In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, pages 80–87, Suntec City, Singapore, August. Association for Computational Linguistics. Advaith Siddharthan and Simone Teufel. 2007. Whose idea was this, and why does it matter? attributing scientific work to citations. In Proceedings of NAACL/HLT-07. Simone Teufel and Marc Moens. 2002. Summarizing scientific articles: experiments with relevance and rhetorical status. Comput. Linguist., 28(4):409–445. Simone Teufel, Advaith Siddharthan, and Dan Tidhar. 2006. Automatic classification of citation function. In Proceedings of the EMNLP, Sydney, Australia, July. Simone Teufel. 2005. Argumentative Zoning for Improved Citation Indexing. Computing Attitude and Affect in Text: Theory and Applications, pages 159– 170. Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. 2003. Understanding belief propagation and its generalizations. pages 239–269. 564