acl acl2011 acl2011-270 knowledge-graph by maker-knowledge-mining

270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles


Source: pdf

Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose

Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD).

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Kiran GVR, Language Technologies Resource Center, IIIT-Hyderabad, India [sent-3, score-0.09]

2 Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. [sent-9, score-0.525]

3 The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. [sent-10, score-0.291]

4 At the heart of the approach is a topic based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. [sent-11, score-0.342]

5 This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. [sent-12, score-0.259]

6 SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. [sent-13, score-0.436]

7 Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD). [sent-14, score-0.388]

8 1 Introduction. We present an interactive multi-document summarization system called SciSumm that summarizes document collections that are composed of lists of papers cited together within the same source article, otherwise known as a co-citation. [sent-15, score-0.661]

9 The interactive nature of the summarization approach makes this demo session ideal for its presentation. [sent-16, score-0.38]

Carolyn Penstein Rosé, Language Technologies Institute, Carnegie Mellon University, cprose@cs.cmu.edu

10 When users interact with SciSumm, they request summaries in context as they read, and that context [sent-17, score-0.249]

11 determines the focus of the summary generated for a set of related scientific articles. [sent-19, score-0.328]

12 This behaviour differs from that of non-interactive summarization systems, which can act as a black box and may not tailor the result to the specific information needs of users in context. [sent-20, score-0.29]

13 SciSumm captures a user’s contextual needs when a user clicks on a co-citation. [sent-21, score-0.19]

14 Using the context of the co-citation in the source article, we generate a query that allows us to create a summary in a query-oriented fashion. [sent-22, score-0.209]

15 The extracted portions of the co-cited articles are then assembled into clusters that represent the main themes of the articles that relate to the context in which they were cited. [sent-23, score-0.488]

16 Our evaluation demonstrates that SciSumm produces higher-quality summaries than a state-of-the-art multi-document summarization system (Radev, 2004). [sent-24, score-0.559]

17 The end-to-end summarization pipeline is described in Section 3. [sent-27, score-0.29]

18 Section 4 presents an evaluation of summaries generated from the system. [sent-28, score-0.25]

19 We present an overview of relevant literature in Section 5. [sent-29, score-0.098]

20 2 Design Goals. As a researcher reads a scientific article, she or he encounters numerous citations, most of them to the foundational and seminal work that is important in that scientific domain. [sent-31, score-0.387]

Proceedings of the ACL-HLT 2011 System Demonstrations, pages 115–120, Portland, Oregon, USA, 21 June 2011. © 2011 Association for Computational Linguistics

21 The text surrounding these citations is a valuable resource, as it allows the author to make a statement about her [sent-32, score-0.124]

22 viewpoint towards the cited articles. [sent-34, score-0.227]

23 A system that could generate a short summary of the collection of cited articles, constructed specifically to relate to the claims made by the author citing them, would be very useful. [sent-36, score-0.473]

24 It would also help the researcher determine if the cited work is relevant for her own research. [sent-37, score-0.22]

25 As an example of such a co-citation, consider the following citation sentence: Various machine learning approaches have been proposed for chunking (Ramshaw and Marcus, 1995; Tjong Kim Sang, 2000a; Tjong Kim Sang et al. [sent-38, score-0.183]

26 A reader would probably need to go through these cited papers to understand what is similar and different across the variety of chunking approaches. [sent-41, score-0.33]

27 Instead of going through the individual papers, it would be quicker for the user to get a summary of the topics in all of those papers that discuss the use of machine learning methods for chunking. [sent-42, score-0.354]

28 SciSumm aims to automatically discover these points of comparison between the co-cited papers by taking into consideration the contextual needs of a user. [sent-43, score-0.106]

29 When the user clicks on a co-citation in context, the system uses the text surrounding that co-citation as evidence of the information need. [sent-44, score-0.289]

30 The system provides a web-based interface for viewing and summarizing research articles in the 2008 ACL Anthology corpus. [sent-46, score-0.166]

31 The summarization proceeds in three main stages as follows: • A user may retrieve a collection of articles of interest by entering a query. [sent-47, score-0.519]

32 SciSumm responds by returning a list of relevant articles, including the title and a snippet-based summary. [sent-48, score-0.11]

33 • A user can use the title, snippet summary and author information to find an article of interest. [sent-50, score-0.259]

34 The actual article is rendered in HTML after the user clicks on one of the search results. [sent-51, score-0.276]

35 • If a user clicks on one of the co-citations in the rendered article, SciSumm responds by generating a query from the local context of the co-citation. [sent-53, score-0.299]

36 That query is then used to select relevant portions of the co-cited articles, which are then used to generate the summary. [sent-54, score-0.128]
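The section does not say how the local context is turned into a query. A minimal sketch, assuming the query is a bag of words drawn from the citing sentence plus a window of surrounding sentences, with parenthetical citations and a small stopword list removed. `build_context_query`, `STOPWORDS`, `CITATION` and `window` are hypothetical names chosen for illustration, not part of SciSumm.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "or", "to", "in", "on", "we",
             "have", "has", "been", "be", "is", "are", "was", "were", "by", "with", "this"}

# Parenthetical citations containing a four-digit year, e.g. "(Ramshaw and Marcus, 1995; ...)".
CITATION = re.compile(r"\([^()]*\d{4}[^()]*\)")


def build_context_query(sentences: list[str], cite_idx: int, window: int = 1) -> Counter:
    """Bag-of-words query from the citing sentence and `window` sentences on each side."""
    lo = max(0, cite_idx - window)
    context = " ".join(sentences[lo:cite_idx + window + 1])
    context = CITATION.sub(" ", context)  # drop author names and years inside citations
    tokens = re.findall(r"[a-z]+", context.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)
```

With `window=1` and the chunking sentence as the citing sentence, preceded by a sentence that also mentions chunking, the term "chunking" gets a count of 2 while "Ramshaw", "Marcus" and the years are discarded. A wider window trades precision for recall in the query.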

37 An example of a summary for a particular topic is displayed in Figure 2. [sent-55, score-0.164]

38 This figure shows one of the clusters generated for the citation sentence “Various machine learning approaches have been proposed for chunking (Ramshaw and Marcus, 1995; Tjong Kim Sang, 2000a; Tjong Kim Sang et al. [sent-56, score-0.359]

39 The cluster is labelled “Chunk, Tag, Word” and contains fragments from two of the papers discussing this topic. [sent-58, score-0.16]

40 A ranked list of such clusters is generated, which allows for swift navigation between topics of interest for a user (Figure 3). [sent-59, score-0.277]

41 This summary is useful because it informs the user of the different perspectives of co-cited authors towards a shared problem (in this case, “Chunking”). [sent-60, score-0.302]

42 More specifically, it shows the user how similar or different the approaches used for this research problem (“Chunking”) are. [sent-61, score-0.159]

43 First, the Text Tiling module obtains tiles of text relevant to the citation context. [sent-64, score-0.48]

44 Next, the clustering module is used to generate labelled clusters using the text tiles extracted from the co-cited papers. [sent-65, score-0.522]

45 The clusters are ordered according to relevance with respect to the generated query. [sent-66, score-0.222]

46 3.2 TextTiling. The Text Tiling module uses the TextTiling algorithm (Hearst, 1997) to segment the text of each article. [sent-70, score-0.139]

47 We use text tiles as the basic unit for our summary, since individual sentences are too short to stand on their own. [sent-71, score-0.35]

48 This is a side effect of the length of scientific articles. [sent-72, score-0.155]

49 Sentences picked from different parts of several articles assembled together would make an incoherent summary. [sent-73, score-0.135]

50 Once computed, text tiles are used to expand on the content viewed within the context associated with a co-citation. [sent-74, score-0.298]

51 The intuition is that a co-citation embedded in a text tile is connected with the topic distribution of its context. [sent-75, score-0.167]

52 Thus, we can compute the similarity between tiles and the context of the co-citation and use it to rank the clusters generated by Frequent Term Based text clustering. [sent-76, score-0.447]
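To make the segmentation step concrete, here is a much-simplified TextTiling sketch in the spirit of Hearst (1997); it is not SciSumm's actual module. Assumptions: lowercase alphabetic tokens with no stopword removal or stemming, no score smoothing, no snapping of boundaries to paragraph breaks, and a cutoff of mean minus half a standard deviation on the depth scores, combined with a hypothetical `min_gap` rule that discards boundaries crowding a deeper one. The defaults for `w` (pseudo-sentence length) and `k` (block size) are illustrative. NLTK's `TextTilingTokenizer` is a fuller off-the-shelf implementation.

```python
import math
import re
from collections import Counter


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_tiles(text: str, w: int = 20, k: int = 6, min_gap: int = 2) -> list[str]:
    """Simplified TextTiling: split `text` into topically coherent tiles."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # 1. Cut the token stream into pseudo-sentences of w tokens.
    pseudo = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    if len(pseudo) < 2:
        return [" ".join(tokens)]
    # 2. Lexical cohesion at each gap: compare the k pseudo-sentences on either side.
    scores = []
    for g in range(len(pseudo) - 1):
        left = Counter(t for ps in pseudo[max(0, g - k + 1):g + 1] for t in ps)
        right = Counter(t for ps in pseudo[g + 1:g + 1 + k] for t in ps)
        scores.append(_cosine(left, right))
    # 3. Depth score: how far each gap dips below the nearest peak on each side.
    depths = []
    for i, s in enumerate(scores):
        left_peak = right_peak = s
        j = i
        while j > 0 and scores[j - 1] >= left_peak:
            j -= 1
            left_peak = scores[j]
        j = i
        while j < len(scores) - 1 and scores[j + 1] >= right_peak:
            j += 1
            right_peak = scores[j]
        depths.append((left_peak - s) + (right_peak - s))
    # 4. Keep gaps deeper than mean - sd/2, deepest first, at least min_gap apart.
    mean = sum(depths) / len(depths)
    sd = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    candidates = sorted((g for g, d in enumerate(depths) if d > 0 and d > mean - sd / 2),
                        key=lambda g: -depths[g])
    boundaries = []
    for g in candidates:
        if all(abs(g - b) >= min_gap for b in boundaries):
            boundaries.append(g)
    # 5. A boundary at gap g ends a tile after pseudo-sentence g.
    tiles, start = [], 0
    for g in sorted(boundaries):
        tiles.append(" ".join(t for ps in pseudo[start:g + 1] for t in ps))
        start = g + 1
    tiles.append(" ".join(t for ps in pseudo[start:] for t in ps))
    return tiles
```

On a toy text of 40 words about pets followed by 40 words about finance, `text_tiles(text, w=10, k=2)` returns two tiles split at the topic change.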

53 3.3 Frequent Term Based Clustering. The clustering module employs Frequent Term Based Clustering (Beil et al. [sent-78, score-0.169]

54 For each co-citation, we use this clustering technique to cluster all of the text tiles generated by segmenting each of the co-cited papers. [sent-80, score-0.499]

55 We settled on this clustering approach for the following reasons: • Text tile contents coming from different papers constitute a sparse vector space, and thus centroid-based approaches would not work very well for integrating content across papers. [sent-81, score-0.365]

56 • Frequent Term Based clustering is extremely fast in execution time as well as relatively efficient in terms of space requirements. [sent-82, score-0.101]

57 • A frequent term set is generated for each cluster, which gives a comprehensible description that can be used to label the cluster. [sent-83, score-0.212]

58 Frequent Term Based text clustering uses a group of frequently co-occurring terms called a frequent term set. [sent-84, score-0.28]

59 We use a measure of entropy to rank these frequent term sets. [sent-85, score-0.149]

60 Frequent term sets yield a clean clustering in which the degree of overlap is determined by the number of documents that contain more than one frequent term set. [sent-86, score-0.325]

61 The algorithm uses the first k term sets that together cover all the documents in the document collection. [sent-87, score-0.107]
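A compact sketch of greedy frequent term-based clustering (FTC) in the spirit of Beil et al., assuming each tile has already been reduced to a set of content terms. Frequent term sets are found by brute-force enumeration up to `max_size` terms, which only suits toy inputs (an Apriori-style miner would be used in practice). A candidate cluster's entropy overlap is the sum of -(1/f_j) ln(1/f_j) over its documents j, where f_j is the number of remaining candidate term sets that document j supports. The candidate with the lowest overlap is selected, and the loop stops once every document is covered. The tie-breaking order and the names `frequent_term_sets` and `ftc` are assumptions of this sketch.

```python
import math
from itertools import combinations


def frequent_term_sets(docs: list[set[str]], min_support: int, max_size: int = 2) -> dict:
    """Map each term set of up to max_size terms to the documents containing all its terms."""
    vocab = sorted(set().union(*docs))
    found = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(i for i, d in enumerate(docs) if set(terms) <= d)
            if len(cover) >= min_support:
                found[frozenset(terms)] = cover
    return found


def ftc(docs: list[set[str]], min_support: int = 2, max_size: int = 2):
    """Greedy FTC: repeatedly pick the term set whose cover overlaps least with the others."""
    remaining = frequent_term_sets(docs, min_support, max_size)
    uncovered = set(range(len(docs)))
    clusters = []
    while uncovered:
        # Only count documents that no selected cluster has claimed yet.
        live = {ts: cov & uncovered for ts, cov in remaining.items() if cov & uncovered}
        if not live:
            break  # leftover documents match no frequent term set
        # f[j]: number of live term sets supported by document j.
        f = {j: sum(j in cov for cov in live.values()) for j in uncovered}

        def entropy_overlap(ts):
            return sum(-(1 / f[j]) * math.log(1 / f[j]) for j in live[ts])

        # Lowest overlap wins; ties go to larger clusters, then longer (more descriptive) labels.
        best = min(live, key=lambda ts: (entropy_overlap(ts), -len(live[ts]), -len(ts), sorted(ts)))
        clusters.append((best, live[best]))
        uncovered -= live[best]
        del remaining[best]
    return clusters, uncovered
```

On four toy tiles, {chunk, tag, word}, {chunk, tag, svm}, {parse, tree} and {parse, tree, grammar}, with `min_support=2`, the sketch returns two clusters labelled "chunk, tag" (tiles 0 and 1) and "parse, tree" (tiles 2 and 3), and no leftover tiles.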

62 3.4 Cluster Ranking. The ranking module uses the cosine similarity between the query and the centroid of each cluster to rank all the clusters generated by the clustering module. [sent-93, score-0.583]
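The ranking step can be sketched as follows. This assumes that the query and each tile are raw term-count vectors; the section does not say whether SciSumm weights terms (for example with tf-idf), so the weighting here is an assumption, and `centroid`, `cosine` and `rank_clusters` are hypothetical names.

```python
import math
from collections import Counter


def centroid(tiles: list[Counter]) -> dict[str, float]:
    """Mean term-count vector of the tiles in one cluster."""
    total = Counter()
    for tile in tiles:
        total.update(tile)
    return {term: count / len(tiles) for term, count in total.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_clusters(query: Counter, clusters: dict[str, list[Counter]]) -> list[tuple[str, float]]:
    """Return (label, score) pairs sorted by cosine(query, cluster centroid), best first."""
    scored = [(label, cosine(query, centroid(tiles))) for label, tiles in clusters.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Clusters that share no terms with the query score 0.0 and sink to the bottom of the ranked list.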

63 4 Evaluation. We have taken great care in designing the evaluation of the SciSumm summarization system. [sent-96, score-0.348]

64 Figure 2: Example of a summary generated by our system. [sent-97, score-0.173]

65 The clusters cut across different papers, thus giving the user a multi-document summary. [sent-98, score-0.224]

66 In a typical evaluation of a multi-document summarization system, gold standard summaries are created by hand and then compared against fixed-length generated summaries. [sent-99, score-0.53]

67 Because no evaluation corpus exists for this task, we had to prepare our own, consisting of gold standard summaries created for a randomly selected set of co-citations. [sent-100, score-0.292]

68 4.1 Experimental Setup. An important target user population for multi-document summarization of scientific articles is graduate students. [sent-102, score-0.726]

69 Hence, to measure how well the summarization system performs, we asked 2 graduate students who have been working in the computational linguistics community to create fixed-length gold standard summaries (8 sentences, ∼200 words) for 10 randomly selected co-citations. [sent-103, score-0.594]

70 We obtained two different gold standard summaries for each co-citation (i. [sent-104, score-0.24]

71 In the absence of any other multi-document summarization system in the domain of scientific article summarization, we used MEAD (Radev, 2004), a widely used and freely available multi-document summarization system, as our baseline. [sent-109, score-0.885]

72 MEAD uses centroid based summarization to create informative clusters of topics. [sent-110, score-0.482]

73 We use the default configuration of MEAD, in which MEAD uses length, position and centroid features to rank each sentence. [sent-111, score-0.116]

74 We did not use query focussed summarization with MEAD. [sent-112, score-0.358]

75 We evaluate its performance with the same gold standard summaries we use to evaluate SciSumm. [sent-113, score-0.24]

76 To generate a summary from our system, we used sentences from the tiles in the top-ranked cluster. [sent-114, score-0.354]

77 In this way, we prepare a summary of 8 highly relevant sentences. [sent-116, score-0.193]

78 4.2 Results. To measure the performance of the two summarization systems (SciSumm and MEAD), we compute ROUGE scores against the 2 × 10 manually created gold standard summaries. [sent-118, score-0.53]

79 ROUGE has traditionally been used to measure performance in terms of the n-gram overlap (ROUGE-N) between the summaries generated by a system and the target gold standard summaries. [sent-119, score-0.337]
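As a reference point for the metric, here is a small ROUGE-N recall function following the standard definition: n-gram matches, clipped by the candidate's counts, summed over all reference summaries and divided by the total number of reference n-grams. Whitespace tokenization without stemming or stopword removal is a simplification; the official ROUGE toolkit offers more options, so scores from this sketch will not match it exactly. `ngrams` and `rouge_n` are illustrative names.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, references: list[str], n: int = 1) -> float:
    """ROUGE-N recall of `candidate` against one or more reference summaries."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_grams = ngrams(ref.lower().split(), n)
        # Each reference n-gram matches at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_grams.items())
        total += sum(ref_grams.values())
    return matched / total if total else 0.0
```

For the candidate "the cat sat on the mat" against the reference "the cat was on the mat", ROUGE-1 is 5/6 and ROUGE-2 is 3/5. With two gold summaries per co-citation, both are passed in `references`.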

80 Especially important is … Figure 3: Clusters generated in response to a user click on the co-citation. [sent-123, score-0.174]

81 The list of clusters in the left pane gives a bird's-eye view of the topics present in the co-cited papers. Table 1: Average ROUGE results. [sent-124, score-0.246]

82 The p-values from a t-test show that our system performs significantly better than MEAD on three of the metrics on which both systems were evaluated (p < 0. [sent-132, score-0.097]

83 This supports the view that summaries of higher perceived value are produced by a query-focused technique in which the query is generated automatically from the context of the co-citation. [sent-134, score-0.48]
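The section reports t-test p-values but does not say whether the test was paired. A paired test over per-co-citation ROUGE scores is one natural reading; the sketch below, assuming that design, computes the paired t statistic (with n - 1 degrees of freedom) using only the standard library. `paired_t` is a hypothetical helper; if SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import math
import statistics


def paired_t(scores_a: list[float], scores_b: list[float]) -> float:
    """Paired t statistic for per-item differences scores_a[i] - scores_b[i]."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least two items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # If every difference is identical, std_err is 0 and this raises ZeroDivisionError.
    return statistics.mean(diffs) / std_err
```

For per-item scores [5, 6, 7] versus [4, 4, 5], the differences are [1, 2, 2], giving t = 5.0 with 2 degrees of freedom. A positive t means the first system scored higher on average.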

84 5 Previous Work. Surprisingly, few approaches to the summarization of scientific articles have been proposed in the past. [sent-135, score-0.537]

85 (2008) present a summarization approach that can be seen as the converse of what we are working to achieve. [sent-137, score-0.29]

86 Rather than summarizing multiple papers cited in the same source article, they summarize different viewpoints expressed towards the same paper from different papers that cite it. [sent-138, score-0.481]

87 (1999) argue in their work that a co-citation frequently implies a consistent viewpoint towards the cited articles. [sent-140, score-0.227]

88 Another approach uses bibliographic coupling to gather different viewpoints from which to summarize a document (Kaplan et al. [sent-141, score-0.101]

89 In our work we make use of this insight by generating a query to focus our multi-document summary from the text closest to the citation. [sent-143, score-0.208]

90 6 Conclusion and Future Work. In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. [sent-144, score-0.525]

91 Our evaluation shows that the SciSumm approach to content selection outperforms another widely used multi-document summarization system for this summarization task. [sent-145, score-0.671]

92 Our long-term goal is to expand the capabilities of SciSumm to generate literature surveys of larger document collections from less focused queries. [sent-146, score-0.14]

93 This more challenging task would require more control over filtering and ranking in order to avoid generating summaries that lack focus. [sent-147, score-0.224]

94 , 1998), which can be used to optimize the diversity of selected text tiles as well as the relevance based ordering of clusters, i. [sent-149, score-0.286]

95 , so that more diverse sets of extracts from the co-cited articles will be placed at the ready fingertips of users. [sent-151, score-0.092]
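The sentence above is truncated before the technique is named. Maximal Marginal Relevance (MMR; Carbonell and Goldstein, 1998) is one standard way to trade relevance against diversity, so the sketch below shows greedy MMR over tiles as an illustration of that kind of selection, not as the authors' planned design. Bag-of-words cosine similarity, the trade-off weight `lam`, and the name `mmr` are assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: Counter, tiles: list[Counter], k: int, lam: float = 0.7) -> list[int]:
    """Greedily pick up to k tile indices, balancing query relevance against redundancy."""
    selected = []
    candidates = list(range(len(tiles)))
    while candidates and len(selected) < k:
        def marginal(i: int) -> float:
            # Redundancy = similarity to the closest tile already selected.
            redundancy = max((cosine(tiles[i], tiles[j]) for j in selected), default=0.0)
            return lam * cosine(query, tiles[i]) - (1 - lam) * redundancy
        best = max(candidates, key=marginal)  # ties go to the earliest candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical tiles and a third that differs, `lam=1.0` (pure relevance) picks the duplicate pair, while `lam=0.5` skips the duplicate in favor of the novel tile.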

96 Another important direction is to refine the interaction design through task-based user studies. [sent-152, score-0.14]

97 Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 689–696, Manchester, August 2008. Radev D. [sent-212, score-0.29]

98 Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. In NAACL-ANLP 2000 Workshop on Automatic Summarization, pages 21–30, Morristown, NJ, USA. [sent-217, score-0.401]

99 Catching the drift: Probabilistic content models, with applications to generation and summarization. In Proceedings of 3rd Asian Semantic Web Conference (ASWC 2008), pp. [sent-236, score-0.317]

100 Sighting citation sights: A collective-intelligence approach for automatic summarization of research papers using C-sites. In HLT-NAACL. [sent-240, score-0.508]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scisumm', 0.595), ('summarization', 0.29), ('tiles', 0.21), ('summaries', 0.187), ('mead', 0.181), ('scientific', 0.155), ('cited', 0.153), ('sang', 0.124), ('tjong', 0.119), ('clusters', 0.113), ('citation', 0.112), ('user', 0.111), ('summary', 0.11), ('papers', 0.106), ('clustering', 0.101), ('articles', 0.092), ('radev', 0.092), ('article', 0.086), ('clicks', 0.079), ('centroid', 0.079), ('term', 0.075), ('frequent', 0.074), ('texttiling', 0.072), ('rouge', 0.072), ('chunking', 0.071), ('kim', 0.069), ('query', 0.068), ('module', 0.068), ('generated', 0.063), ('anthology', 0.062), ('beil', 0.059), ('cocitation', 0.059), ('gvr', 0.059), ('kiran', 0.059), ('tiling', 0.059), ('citations', 0.059), ('cluster', 0.054), ('gold', 0.053), ('agrawal', 0.052), ('prepare', 0.052), ('tile', 0.052), ('sassano', 0.052), ('utsuro', 0.052), ('informs', 0.048), ('portugal', 0.048), ('multidocument', 0.048), ('relevance', 0.046), ('interactive', 0.046), ('nanba', 0.045), ('themes', 0.045), ('halteren', 0.045), ('reddy', 0.045), ('demo', 0.044), ('viewpoints', 0.043), ('lisbon', 0.043), ('assembled', 0.043), ('relate', 0.043), ('qazvinian', 0.041), ('viewpoint', 0.041), ('responds', 0.041), ('citing', 0.041), ('segmenting', 0.041), ('summarizing', 0.04), ('kaplan', 0.04), ('mmr', 0.038), ('snippet', 0.038), ('ranking', 0.037), ('researcher', 0.036), ('carbonell', 0.035), ('india', 0.035), ('surrounding', 0.035), ('system', 0.034), ('overview', 0.034), ('literature', 0.033), ('towards', 0.033), ('ramshaw', 0.033), ('document', 0.032), ('relevant', 0.031), ('hearst', 0.031), ('context', 0.031), ('student', 0.031), ('widely', 0.03), ('graduate', 0.03), ('modules', 0.03), ('text', 0.03), ('care', 0.029), ('portions', 0.029), ('design', 0.029), ('displayed', 0.028), ('mellon', 0.028), ('content', 0.027), ('goals', 0.027), ('carnegie', 0.027), ('topics', 0.027), ('topic', 0.026), ('coupling', 0.026), ('swift', 0.026), ('entering', 0.026), ('etu', 0.026), ('ester', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles


2 0.25450569 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers

Author: Amjad Abu-Jbara ; Dragomir Radev

Abstract: In citation-based summarization, text written by several researchers is leveraged to identify the important aspects of a target paper. Previous work on this problem focused almost exclusively on its extraction aspect (i.e. selecting a representative set of citation sentences that highlight the contribution of the target paper). Meanwhile, the fluency of the produced summaries has been mostly ignored. For example, diversity, readability, cohesion, and ordering of the sentences included in the summary have not been thoroughly considered. This resulted in noisy and confusing summaries. In this work, we present an approach for producing readable and cohesive citation-based summaries. Our experiments show that the proposed approach outperforms several baselines in terms of both extraction quality and fluency.

3 0.18476075 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

Author: Dong Wang ; Yang Liu

Abstract: This paper presents a pilot study of opinion summarization on conversations. We create a corpus containing extractive and abstractive summaries of speaker’s opinion towards a given topic using 88 telephone conversations. We adopt two methods to perform extractive summarization. The first one is a sentence-ranking method that linearly combines scores measured from different aspects including topic relevance, subjectivity, and sentence importance. The second one is a graph-based method, which incorporates topic and sentiment information, as well as additional information about sentence-to-sentence relations extracted based on dialogue structure. Our evaluation results show that both methods significantly outperform the baseline approach that extracts the longest utterances. In particular, we find that incorporating dialogue structure in the graph-based method contributes to the improved system performance.

4 0.1564568 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice

Author: Vahed Qazvinian ; Dragomir R. Radev

Abstract: We analyze collective discourse, a collective human behavior in content generation, and show that it exhibits diversity, a property of general collective systems. Using extensive analysis, we propose a novel paradigm for designing summary generation systems that reflect the diversity of perspectives seen in real-life collective summarization. We analyze 50 sets of summaries written by human about the same story or artifact and investigate the diversity of perspectives across these summaries. We show how different summaries use various phrasal information units (i.e., nuggets) to express the same atomic semantic units, called factoids. Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network. Our experiments show how our system outperforms a wide range of other document ranking systems that leverage diversity.

5 0.15468337 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

Author: Awais Athar

Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.

6 0.14425167 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization

7 0.14390846 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

8 0.14218895 76 acl-2011-Comparative News Summarization Using Linear Programming

9 0.14119765 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

10 0.12287838 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

11 0.12257961 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

12 0.12173167 4 acl-2011-A Class of Submodular Functions for Document Summarization

13 0.11940276 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

14 0.11766294 187 acl-2011-Jointly Learning to Extract and Compress

15 0.11132432 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports

16 0.098765649 298 acl-2011-The ACL Anthology Searchbench

17 0.098457009 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

18 0.093627699 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation

19 0.088255502 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

20 0.080286503 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.119), (2, -0.07), (3, 0.116), (4, -0.117), (5, -0.076), (6, -0.116), (7, 0.152), (8, 0.041), (9, -0.057), (10, -0.103), (11, -0.047), (12, -0.17), (13, -0.025), (14, -0.233), (15, -0.094), (16, 0.028), (17, 0.043), (18, 0.077), (19, 0.033), (20, -0.116), (21, -0.071), (22, 0.128), (23, -0.091), (24, 0.035), (25, 0.019), (26, 0.02), (27, 0.035), (28, -0.067), (29, -0.07), (30, -0.06), (31, 0.004), (32, -0.023), (33, -0.073), (34, 0.026), (35, -0.033), (36, 0.005), (37, -0.005), (38, -0.048), (39, 0.044), (40, -0.036), (41, 0.054), (42, 0.018), (43, 0.042), (44, 0.013), (45, 0.013), (46, 0.034), (47, -0.038), (48, -0.02), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95683044 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles


2 0.8418566 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice

Author: Vahed Qazvinian ; Dragomir R. Radev

Abstract: We analyze collective discourse, a collective human behavior in content generation, and show that it exhibits diversity, a property of general collective systems. Using extensive analysis, we propose a novel paradigm for designing summary generation systems that reflect the diversity of perspectives seen in real-life collective summarization. We analyze 50 sets of summaries written by human about the same story or artifact and investigate the diversity of perspectives across these summaries. We show how different summaries use various phrasal information units (i.e., nuggets) to express the same atomic semantic units, called factoids. Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network. Our experiments show how our system outperforms a wide range of other document ranking systems that leverage diversity.

3 0.79390985 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers

Author: Amjad Abu-Jbara ; Dragomir Radev

Abstract: In citation-based summarization, text written by several researchers is leveraged to identify the important aspects of a target paper. Previous work on this problem focused almost exclusively on its extraction aspect (i.e. selecting a representative set of citation sentences that highlight the contribution of the target paper). Meanwhile, the fluency of the produced summaries has been mostly ignored. For example, diversity, readability, cohesion, and ordering of the sentences included in the summary have not been thoroughly considered. This resulted in noisy and confusing summaries. In this work, we present an approach for producing readable and cohesive citation-based summaries. Our experiments show that the proposed approach outperforms several baselines in terms of both extraction quality and fluency.

4 0.73639965 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents

Author: Charles Greenbacker

Abstract: We propose a framework for generating an abstractive summary from a semantic model of a multimodal document. We discuss the type of model required, the means by which it can be constructed, how the content of the model is rated and selected, and the method of realizing novel sentences for the summary. To this end, we introduce a metric called information density used for gauging the importance of content obtained from text and graphical sources.

5 0.66564065 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

Author: Hajime Morita ; Tetsuya Sakai ; Manabu Okumura

Abstract: We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

6 0.62747657 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

7 0.62636286 76 acl-2011-Comparative News Summarization Using Linear Programming

8 0.62021923 187 acl-2011-Jointly Learning to Extract and Compress

9 0.61000741 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

10 0.60415661 4 acl-2011-A Class of Submodular Functions for Document Summarization

11 0.59931988 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

12 0.58983719 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

13 0.58980191 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization

14 0.54767019 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports

15 0.5176664 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts

16 0.50921613 80 acl-2011-ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements

17 0.49564415 298 acl-2011-The ACL Anthology Searchbench

18 0.49017256 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation

19 0.48913983 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

20 0.44754782 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.028), (11, 0.021), (13, 0.011), (17, 0.027), (26, 0.031), (37, 0.033), (39, 0.026), (41, 0.029), (55, 0.019), (59, 0.016), (72, 0.027), (91, 0.032), (96, 0.636)]
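A short sketch for reading the topic vector above, under stated assumptions. It treats the pairs as a sparse (topicId, topicWeight) distribution, picks the dominant topic (96, with weight 0.636), and totals the listed weights (about 0.936; the remaining mass presumably falls on topics too small to display, which is an assumption). The page does not say how simValue is computed from these vectors; the `cosine` helper is one common choice shown for illustration only and is not claimed to reproduce the listed scores. All function names are hypothetical.

```python
import math

# Topic distribution for this paper, copied from the "lda for this paper" line
# as (topicId, topicWeight) pairs.
TOPICS = [(5, 0.028), (11, 0.021), (13, 0.011), (17, 0.027), (26, 0.031),
          (37, 0.033), (39, 0.026), (41, 0.029), (55, 0.019), (59, 0.016),
          (72, 0.027), (91, 0.032), (96, 0.636)]


def dominant_topic(topics):
    """Return the (topicId, weight) pair with the largest weight."""
    return max(topics, key=lambda pair: pair[1])


def listed_mass(topics):
    """Total probability mass of the topics that are listed."""
    return sum(weight for _, weight in topics)


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors given as pair lists.

    Hypothetical similarity choice: the page does not name its measure.
    Topics absent from one vector are treated as weight 0.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    print(dominant_topic(TOPICS))        # (96, 0.636)
    print(round(listed_mass(TOPICS), 3)) # 0.936
```

Because a single topic carries most of the mass here, any vector-space similarity will be driven largely by whether another paper also loads heavily on topic 96.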

similar papers list:

simIndex simValue paperId paperTitle

1 0.99731517 25 acl-2011-A Simple Measure to Assess Non-response

Author: Anselmo Penas ; Alvaro Rodrigo

Abstract: There are several tasks where it is preferable not to respond rather than to respond incorrectly. This idea is not new, but despite several previous attempts there isn’t a commonly accepted measure to assess non-response. We study here an extension of the accuracy measure with this feature and a very easy-to-understand interpretation. The proposed measure (c@1) has a good balance of discrimination power, stability and sensitivity properties. We also show how this measure is able to reward systems that maintain the same number of correct answers and at the same time decrease the number of incorrect ones, by leaving some questions unanswered. This measure is well suited for tasks such as Reading Comprehension tests, where multiple choices per question are given, but only one is correct.

2 0.99702263 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?

Author: Maoxi Li ; Chengqing Zong ; Hwee Tou Ng

Abstract: The word is usually adopted as the smallest unit in most tasks of Chinese language processing. However, for automatic evaluation of the quality of Chinese translation output when translating from other languages, either a word-level approach or a character-level approach is possible. So far, there has been no detailed study to compare the correlations of these two approaches with human assessment. In this paper, we compare word-level metrics with character-level metrics on the submitted output of English-to-Chinese translation systems in the IWSLT’08 CT-EC and NIST’08 EC tasks. Our experimental results reveal that character-level metrics correlate with human assessment better than word-level metrics. Our analysis suggests several key reasons behind this finding.

same-paper 3 0.99626607 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose

Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic-based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD).

4 0.99554288 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

Author: Daniel Emilio Beck

Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.

5 0.99357891 168 acl-2011-Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation

Author: Vicent Alabau ; Alberto Sanchis ; Francisco Casacuberta

Abstract: In interactive machine translation (IMT), a human expert is integrated into the core of a machine translation (MT) system. The human expert interacts with the IMT system by partially correcting the errors of the system’s output. Then, the system proposes a new solution. This process is repeated until the output meets the desired quality. In this scenario, the interaction is typically performed using the keyboard and the mouse. In this work, we present an alternative modality to interact with IMT systems by writing on a tactile display or using an electronic pen. An on-line handwritten text recognition (HTR) system has been specifically designed to operate with IMT systems. Our HTR system improves on previous approaches in two main aspects. First, HTR decoding is tightly coupled with the IMT system. Second, the proposed language models are context aware, in the sense that they take into account the partial corrections and the source sentence by using a combination of n-grams and word-based IBM models. The proposed system achieves a significant boost in performance with respect to previous work.

6 0.99247557 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names

7 0.99140888 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System

8 0.98728013 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

9 0.9825328 82 acl-2011-Content Models with Attitude

10 0.97475296 41 acl-2011-An Interactive Machine Translation System with Online Learning

11 0.96952844 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge

12 0.95938969 266 acl-2011-Reordering with Source Language Collocations

13 0.95079041 264 acl-2011-Reordering Metrics for MT

14 0.94857544 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

15 0.94433814 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

16 0.94294339 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

17 0.94287759 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment

18 0.94118518 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

19 0.93728423 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

20 0.936001 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages