acl acl2013 acl2013-377 knowledge-graph by maker-knowledge-mining

377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization


Source: pdf

Author: Chen Li ; Xian Qian ; Yang Liu

Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. [sent-3, score-0.826]

2 For each bigram, a regression model is used to estimate its frequency in the reference summary. [sent-4, score-0.332]

3 The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. [sent-5, score-0.799]

4 During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. [sent-6, score-0.459]

5 We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup. [sent-8, score-0.442]

6 Introduction: Extractive summarization is a sentence selection problem: identifying important summary sentences from one or multiple documents. [sent-9, score-0.431]

7 Many methods have been developed for this problem, including supervised approaches that use classifiers to predict summary sentences, graph based approaches to rank the sentences, and recent global optimization methods such as integer linear programming (ILP) and submodular methods. [sent-10, score-0.324]

8 Gillick and Favre (Gillick and Favre, 2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function. [sent-24, score-0.63]

9-10 In this paper, we propose to find a candidate summary such that the language concepts (e.g., bigrams) in this candidate summary and the reference summary can have the same frequency. [sent-25, score-0.254] [sent-27, score-0.467]

11 Our method can decide not only which language concepts to use in ILP, but also the frequency of these language concepts in the candidate summary. [sent-32, score-0.288]

12 To estimate the bigram frequency in the summary, we propose to use a supervised regression model that is discriminatively trained using a variety of features. [sent-33, score-0.686]

13 Our experiments on several TAC summarization data sets demonstrate that this proposed method outperforms the previous ILP system and often the best performing TAC system. [sent-34, score-0.254]

14 Bigram Gain Maximization by ILP: We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work. [sent-36, score-0.526]

15 In addition, we expect that the bigram oriented ILP is consistent with the ROUGE-2 measure widely used for summarization evaluation. [sent-37, score-0.576]

16 We start the description of our approach for the scenario where a human abstractive summary is provided, and the task is to select sentences to form an extractive summary. [sent-38, score-0.422]

17 Our goal is then to make the bigram frequency in this system summary as close as possible to that in the reference. [sent-39, score-0.709]

18 For each bigram b, we define its gain: Gain(b, sum) = min{n_{b,ref}, n_{b,sum}} (7), where n_{b,ref} is the frequency of b in the reference summary, and n_{b,sum} is the frequency of b in the automatic summary. [sent-40, score-0.72]

19 The gain of a bigram is no more than its frequency in the reference summary, hence adding redundant bigrams will not increase the gain. [sent-41, score-1.054]
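
The gain in Eq. (7) is easy to compute directly. Below is a minimal sketch (not taken from the authors' code) that scores a candidate summary against a reference summary, assuming simple whitespace tokenization:

```python
from collections import Counter

def bigram_counts(text):
    """Bigram counts over a whitespace-tokenized, lowercased text (a simplifying assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def total_bigram_gain(candidate_summary, reference_summary):
    """Sum of Gain(b, sum) = min(n_{b,ref}, n_{b,sum}) over all bigrams b, as in Eq. (7).

    A bigram contributes no more than its reference frequency, so repeating
    it in the candidate summary adds nothing to the total gain."""
    n_sum = bigram_counts(candidate_summary)
    n_ref = bigram_counts(reference_summary)
    return sum(min(n_ref[b], n_sum[b]) for b in n_ref)
```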

20 Regression Model for Bigram Frequency Estimation: In the previous section, we assume that n_{b,ref} is at hand (the reference abstractive summary is given) and propose a bigram-based optimization framework for extractive summarization. [sent-53, score-0.411]

21 However, for the summarization task, the bigram frequency is unknown, and thus our first goal is to estimate such frequency. [sent-54, score-0.684]

22 Since a bigram’s frequency depends on the summary length (L), we use a normalized frequency in our method. [sent-56, score-0.421]

23 Since the normalized frequency N_{b,ref} is a real number, we choose to use a logistic regression model to predict it: N_{b,ref} = exp{w′f(b)} / Σ_j exp{w′f(b_j)} (13), where f(b_j) is the feature vector of bigram b_j and w′ is the corresponding feature weight. [sent-59, score-0.659]

24 The objective function for training is thus to minimize the KL distance: min Σ_b Ñ_{b,ref} log(Ñ_{b,ref} / N_{b,ref}) (15), where Ñ_{b,ref} is the true normalized frequency of bigram b in the reference summaries. [sent-62, score-0.67]
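
To make Eqs. (13) and (15) concrete, the sketch below implements the softmax-style prediction of the normalized bigram frequency and trains it by gradient descent on the KL objective. The feature matrix, learning rate, and plain gradient-descent optimizer are illustrative assumptions, not the authors' actual training setup:

```python
import numpy as np

def predict_normalized_freqs(w, F):
    """Eq. (13): N_{b,ref} = exp(w'f(b)) / sum_j exp(w'f(b_j)).

    F is a (num_bigrams x num_features) matrix whose rows are the feature vectors f(b)."""
    scores = F @ w
    scores = scores - scores.max()   # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

def kl_loss(w, F, true_freqs, eps=1e-12):
    """Eq. (15): sum_b Ntilde_{b,ref} * log(Ntilde_{b,ref} / N_{b,ref})."""
    pred = predict_normalized_freqs(w, F)
    return float(np.sum(true_freqs * np.log((true_freqs + eps) / (pred + eps))))

def train(F, true_freqs, lr=0.1, steps=500):
    """Plain gradient descent; assuming true_freqs sums to 1, the KL gradient
    with respect to w for a softmax model is F'(pred - true)."""
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        pred = predict_normalized_freqs(w, F)
        w -= lr * (F.T @ (pred - true_freqs))
    return w
```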

25 Features: Each bigram is represented using a set of features in the above regression model. [sent-66, score-0.475]

26 Term frequency1: The frequency of this bigram in the given topic. [sent-70, score-0.496]

27 Term frequency2: The frequency of this bigram in the selected sentences. [sent-72, score-0.525]

28 Similarity with description of the topic: Similarity of the bigram with topic description (see next data section about the given topics in the summarization task). [sent-80, score-0.583]

29 The TAC summarization task is to generate summaries of at most 100 words from 10 documents for a given topic query (with a title and a more detailed description). [sent-93, score-0.289]

30 Step 2: Extract bigrams from all the sentences, then select those bigrams with document frequency equal to or more than 3. [sent-102, score-0.913]

31 We call this subset the initial bigram set in the following. [sent-103, score-0.407]

32 Step 3: Select relevant sentences that contain at least one bigram from the initial bigram set. [sent-104, score-0.799]

33 Step 4: Feed the ILP with the sentences and the bigram set to get the ILP result. [sent-105, score-0.414]
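
Steps 2 and 3 above amount to a simple pre-selection pass. A hedged sketch follows (the data layout of documents as lists of sentence strings and the helper names are assumptions), with the document-frequency threshold of 3 taken from Step 2; Step 4, the ILP itself, is sketched further below:

```python
from collections import Counter

def sentence_bigrams(sentence):
    """Set of bigrams in one sentence, using whitespace tokenization (an assumption)."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))

def preselect(documents, min_df=3):
    """Step 2: keep bigrams with document frequency >= min_df (the 'initial bigram set').
    Step 3: keep the sentences that contain at least one bigram from that set."""
    df = Counter()
    for doc in documents:                 # documents: list of lists of sentence strings
        doc_bigrams = set()
        for sent in doc:
            doc_bigrams |= sentence_bigrams(sent)
        df.update(doc_bigrams)
    initial_bigrams = {b for b, c in df.items() if c >= min_df}
    selected = [sent for doc in documents for sent in doc
                if sentence_bigrams(sent) & initial_bigrams]
    return initial_bigrams, selected
```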

34 In our method, we first extract all the bigrams from the selected sentences and then estimate each bigram’s N_{b,ref} using the regression model. [sent-108, score-0.567]

35 Then we use the top-N bigrams with their N_{b,ref} and all the selected sentences in our proposed ILP module for summary sentence selection. [sent-109, score-0.729]
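
For the ILP module itself, one natural encoding of "maximize the bigram gains" under the 100-word TAC limit uses a binary variable per sentence and a continuous gain variable per bigram, bounded above both by the estimated reference frequency and by the bigram count realized in the selected sentences (which linearizes the min in Eq. (7)). The sketch below uses the PuLP modeling library; the solver, variable names, and the rescaling of the estimated N_{b,ref} to count units are assumptions, not details given in this summary:

```python
import pulp
from collections import Counter

def ilp_select(sentences, bigram_weights, max_words=100):
    """Choose sentences maximizing sum_b min(weight_b, n_{b,sum}) under a length cap.

    sentences: list of token lists; bigram_weights: {bigram: estimated reference frequency}."""
    counts = [Counter(zip(s, s[1:])) for s in sentences]   # bigram counts per sentence

    prob = pulp.LpProblem("bigram_gain_maximization", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(sentences))]
    g = {b: pulp.LpVariable(f"g_{k}", lowBound=0) for k, b in enumerate(bigram_weights)}

    prob += pulp.lpSum(g.values())                                          # total bigram gain
    prob += pulp.lpSum(len(s) * x[i] for i, s in enumerate(sentences)) <= max_words

    for b, weight in bigram_weights.items():
        prob += g[b] <= weight                                              # capped by estimated n_{b,ref}
        prob += g[b] <= pulp.lpSum(counts[i][b] * x[i] for i in range(len(sentences)))

    prob.solve()
    return [s for i, s in enumerate(sentences) if x[i].value() > 0.5]
```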

36-37 When training our bigram regression model, we use each of the 4 reference summaries separately, i.e., the bigram frequency is obtained from one reference summary. [sent-110, score-0.657] [sent-112, score-0.609]

38 The same pre-selection of sentences described above is also applied in training, that is, the bigram instances used in training are from these selected sentences and the reference summary. [sent-113, score-0.585]

39 We use the top 100 bigrams from our bigram estimation module, whereas the ICSI system just uses the initial bigram set described in Section 3. [sent-129, score-1.268]

40 We use the estimated value from the regression model; the ICSI system just uses the bigram’s document frequency in the original text as the weight. [sent-132, score-0.324]

41 We use three weighting setups: the estimated bigram frequency value in our method, document frequency, or term frequency from the original text. [sent-138, score-0.758]

42 Tables 2 and 3 show the results using the top 100 bigrams from our system and the initial bigram set from the ICSI system, respectively. [sent-139, score-0.88]

43 First of all, we can see that for both ILP systems, our estimated bigram weights outperform the other frequency-based weights. [sent-141, score-0.479]

44 For the ICSI ILP system, using bigram document frequency achieves better performance than term frequency (which explains why document frequency is used in their system). [sent-142, score-0.865]

45 Table 2: Results using different weighting methods on the top 100 bigrams generated from our proposed system. [sent-144, score-0.45]

46 Table 3: Results using different weighting methods based on the initial bigram sets. [sent-146, score-0.431]

47 The average number of bigrams is around 80 for each topic. [sent-147, score-0.401]

48 When the weight is document frequency, the ICSI result is better than that of our proposed ILP, whereas when using term frequency as the weight, our ILP has better results, again suggesting that term frequency fits our ILP system better. [sent-150, score-0.47]

49 When the weight is the estimated value, the results depend on the bigram set used. [sent-151, score-0.453]

50 This shows that the size and quality of the bigrams have an impact on the ILP modules. [sent-153, score-0.433]

51 There are about 80 bigrams used in the ICSI ILP system. [sent-156, score-0.401]

52 A natural question to ask is the impact of the number of bigrams and their quality on the summarization system. [sent-157, score-0.603]

53 We can see that about one third of bigrams in the reference summary are in the original text (127. [sent-159, score-0.691]

54 We mentioned that we only use the top-N (N is 100 in the previous experiments) bigrams in our summarization system. [sent-162, score-0.571]

55 On the other hand, we see from the table that only 127 of these more than 2K bigrams are in the reference summary and are thus expected to help the summary responsiveness. [sent-164, score-0.868]

56 Including all the bigrams would introduce a large amount of noise. [sent-165, score-0.401]

57 Fig. 1 shows the bigram coverage (the number of bigrams used in the system that are also in the reference summaries) when we vary the number N of selected bigrams. [sent-173, score-0.983]

58 As expected, we can see that as N increases, there are more reference summary bigrams included in the system. [sent-174, score-0.691]

59 There are 25 summary bigrams in the top-50 bigrams and about 38 in top-100 bigrams. [sent-175, score-0.979]

60 Compared with the ICSI system that has around 80 bigrams in the initial bigram set and 29 in the reference summary, our estimation module has better coverage. [sent-176, score-1.052]

61 Figure 1 (x-axis: Number of Selected Bigrams): Coverage of bigrams (number of bigrams in reference summary) when varying the number of bigrams used in the ILP systems. [sent-177, score-1.316]

62 Increasing the number of bigrams used in the system leads to better coverage; however, the number of incorrect bigrams also increases and has a negative impact on system performance. [sent-178, score-0.925]

63 To examine the best tradeoff, we conduct experiments choosing different top-N bigram sets for the two ILP systems, as shown in Fig. 2. [sent-179, score-0.385]

64 We can see that the ICSI ILP system performs better when the input bigrams have less noise (i.e., fewer bigrams that are not in the summary). [sent-181, score-0.879]

65 However, our proposed method is slightly more robust to this kind of noise, possibly because of the weights we use in our system: the noisy bigrams have lower weights and thus less impact on the final system performance. [sent-182, score-0.655]

66 The optimal number of bigrams differs for the two systems, with a larger number of bigrams in our method. [sent-184, score-0.802]

67 1218 when using top 60 bigrams, which is better than using the initial bigram set in their method (0. [sent-186, score-0.449]

68 Figure 2 (x-axis: Number of selected bigrams; systems: Proposed ILP, ICSI): Summarization performance when varying the number of bigrams for two systems. [sent-190, score-0.786]

69 Oracle Experiments: Based on the above analysis, we can see the impact of the bigram set and their weights. [sent-192, score-0.417]

70 The following experiments are designed to demonstrate the best system performance we can achieve if we have access to good quality bigrams and weights. [sent-193, score-0.437]

71 The first is an oracle experiment, where we use all the bigrams from the reference summaries that are also in the original text. [sent-195, score-0.623]

72 In the ICSI ILP system, the weights are the document frequency from the multiple reference summaries. [sent-196, score-0.319]

73 We hypothesize that one reason may be that many bigrams in the reference summary only appear once. [sent-201, score-0.691]

74 Table 6 shows the frequency of the bigrams in the summary. [sent-202, score-0.512]

75 Indeed, 85% of the bigrams only appear once. [sent-203, score-0.385]

76 Table 5: Oracle experiment: using bigrams and their frequencies in the reference summary as weights. [sent-204, score-0.707]

77 Table 6: Average number of bigrams for each term frequency in one topic’s reference summary. [sent-216, score-0.665]

78 We also treat the oracle results as the gold standard for extractive summarization and compare how the two automatic summarization systems differ at the sentence level. [sent-217, score-0.524]

79 In the second experiment, after we obtain the estimated N_{b,ref} for every bigram in the selected sentences from our regression model, we only keep those bigrams that are in the reference summary, and use the estimated weights for both ILP modules. [sent-222, score-1.184]

80 This might be attributed to the fact that there is less noise (all the bigrams are the correct ones), and thus the ICSI ILP system performs well. [sent-226, score-0.459]

81 We can see that these results are worse than the previous oracle experiments, but better than using the automatically generated bigrams, again showing that bigram and weight estimation is critical for summarization. [sent-227, score-0.508]

82 Table 7: Summarization results when using the estimated weights and only keeping the bigrams that are in the reference summary. [sent-229, score-0.608]

83 For set A, the task is standard summarization, and there are 4 reference summaries, each 100 words long; for set B, it is an update summarization task: the summary includes information not mentioned in the summary from set A. [sent-233, score-0.637]

84 Summary of Analysis: The previous experiments have shown the impact of the three factors: the quality of the bigrams themselves, the weights used for these bigrams, and the ILP module. [sent-244, score-0.484]

85 We found that the bigrams and their weights are critical for both ILP setups. [sent-245, score-0.452]

86 An important part of our system is the supervised method for bigram and weight estimation. [sent-247, score-0.513]

87 We have already seen that, for the previous ILP method, better performance can be achieved when using our bigrams together with their weights. [sent-248, score-0.42]

88 To answer this, we trained a simple supervised binary classifier for bigram prediction (positive means that a bigram appears in the summary) using the same set of features as used in our bigram weight estimation module, and then used their document frequency in the ICSI ILP system. [sent-250, score-1.418]

89 We originally expected that using the supervised method might outperform the unsupervised bigram selection, which only uses term frequency information. [sent-254, score-0.627]

90 From this we can see that it is not just the supervised methods or using annotated data that yields the overall improved system performance, but rather that our proposed regression setup for bigrams is the main reason. [sent-256, score-0.596]

91 In particular, several optimization approaches have recently demonstrated competitive performance for the extractive summarization task. [sent-259, score-0.301]

92 In contrast to these unsupervised approaches, there are also various efforts on supervised learning for summarization where a model is trained to predict whether a sentence is in the summary or not. [sent-276, score-0.439]

93 Unlike previous work using sentence-based supervised learning, we use a regression model to estimate the bigrams and their weights, and use these to guide sentence selection. [sent-283, score-0.584]

94 When abstractive summaries are given, one needs to use that information to automatically generate reference labels (a sentence is in the summary or not) for extractive summarization. [sent-285, score-0.606]

95 Most researchers have used the similarity between a sentence in the document and the abstractive summary for labeling. [sent-286, score-0.355]

96 In our method, we do not need to generate this extra label for model training: since ours is based on bigrams, it is straightforward to obtain the reference frequency for bigrams by simply looking at the reference summary. [sent-288, score-1.139]

97 , 2012), which leveraged Latent Semantic Analysis (LSA) to produce term weights and selected summary sentences by computing an approximate solution to the Budgeted Maximal Coverage problem. [sent-295, score-0.326]

98 Different from the previous ILP summarization approach, we propose a supervised learning method (a discriminatively trained regression model) to determine the importance of the bigrams fed to the ILP module. [sent-297, score-0.766]

99 In addition, we revise the ILP to maximize the bigram gain (which is expected to be highly correlated with ROUGE-2 scores) rather than the concept/bigram coverage. [sent-298, score-0.448]

100 We plan to consider the context of bigrams to better predict whether a bigram is in the reference summary. [sent-302, score-0.935]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ilp', 0.52), ('bigrams', 0.401), ('bigram', 0.385), ('icsi', 0.344), ('summary', 0.177), ('summarization', 0.17), ('tac', 0.164), ('gillick', 0.12), ('reference', 0.113), ('extractive', 0.113), ('frequency', 0.111), ('abstractive', 0.103), ('regression', 0.09), ('xz', 0.084), ('xs', 0.084), ('xb', 0.083), ('concepts', 0.077), ('favre', 0.071), ('summaries', 0.069), ('aker', 0.055), ('rouge', 0.055), ('weights', 0.051), ('cb', 0.046), ('mmr', 0.046), ('document', 0.044), ('supervised', 0.044), ('gain', 0.044), ('estimated', 0.043), ('compression', 0.04), ('oracle', 0.04), ('term', 0.04), ('estimation', 0.039), ('discriminatively', 0.038), ('duc', 0.038), ('module', 0.037), ('system', 0.036), ('bj', 0.034), ('impact', 0.032), ('conroy', 0.032), ('fig', 0.032), ('brandow', 0.031), ('darling', 0.031), ('igdfnhrtevqfarleu', 0.031), ('psie', 0.031), ('ryang', 0.031), ('xmin', 0.031), ('sentence', 0.031), ('bi', 0.03), ('integer', 0.03), ('selected', 0.029), ('sentences', 0.029), ('topic', 0.028), ('leskovec', 0.028), ('ahmet', 0.028), ('galanis', 0.028), ('weight', 0.025), ('proposed', 0.025), ('ci', 0.025), ('fb', 0.024), ('budgeted', 0.024), ('selection', 0.024), ('weighting', 0.024), ('method', 0.023), ('erkan', 0.023), ('kupiec', 0.023), ('rio', 0.023), ('normalized', 0.022), ('initial', 0.022), ('performs', 0.022), ('sum', 0.022), ('title', 0.022), ('benoit', 0.021), ('celikyilmaz', 0.021), ('davis', 0.021), ('submodular', 0.021), ('woodsend', 0.021), ('expect', 0.021), ('xie', 0.02), ('objective', 0.02), ('ref', 0.02), ('carbonell', 0.02), ('reinforcement', 0.02), ('maximize', 0.019), ('topically', 0.019), ('wf', 0.019), ('coverage', 0.019), ('better', 0.019), ('minimize', 0.019), ('estimate', 0.018), ('notice', 0.018), ('optimization', 0.018), ('concept', 0.018), ('judith', 0.018), ('years', 0.018), ('dilek', 0.017), ('experiment', 0.017), ('pj', 0.017), ('predict', 0.017), ('programming', 0.017), ('frequencies', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization

Author: Chen Li ; Xian Qian ; Yang Liu

Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup.

2 0.21261436 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

Author: Miguel Almeida ; Andre Martins

Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.

3 0.16671543 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova

Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.

4 0.16632098 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization

Author: Lu Wang ; Claire Cardie

Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.

5 0.14920481 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie

Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,

6 0.13967554 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts

7 0.12271941 333 acl-2013-Summarization Through Submodularity and Dispersion

8 0.12238869 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance

9 0.12081838 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain

10 0.11249101 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

11 0.10711147 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

12 0.094000079 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics

13 0.085984953 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

14 0.078670725 237 acl-2013-Margin-based Decomposed Amortized Inference

15 0.078377672 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures

16 0.076504409 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

17 0.070776746 29 acl-2013-A Visual Analytics System for Cluster Exploration

18 0.070450582 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

19 0.068757311 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction

20 0.067940563 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.138), (1, 0.048), (2, -0.004), (3, -0.048), (4, 0.023), (5, 0.01), (6, 0.116), (7, -0.027), (8, -0.218), (9, -0.093), (10, -0.057), (11, 0.006), (12, -0.184), (13, -0.057), (14, -0.111), (15, 0.137), (16, 0.142), (17, -0.152), (18, -0.007), (19, 0.063), (20, 0.007), (21, -0.081), (22, -0.016), (23, 0.037), (24, 0.036), (25, -0.031), (26, 0.027), (27, -0.011), (28, 0.004), (29, -0.012), (30, 0.007), (31, -0.023), (32, 0.037), (33, 0.037), (34, -0.003), (35, 0.096), (36, -0.0), (37, -0.01), (38, -0.043), (39, -0.042), (40, 0.02), (41, 0.027), (42, -0.048), (43, 0.041), (44, -0.063), (45, 0.037), (46, 0.023), (47, -0.086), (48, 0.065), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94926983 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization

Author: Chen Li ; Xian Qian ; Yang Liu

Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.

2 0.79935479 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

Author: Miguel Almeida ; Andre Martins

Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.

3 0.79000974 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.

4 0.76936346 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova

Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.

5 0.76893443 333 acl-2013-Summarization Through Submodularity and Dispersion

Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi

Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.

6 0.71804333 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

7 0.68207991 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization

8 0.67701697 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

9 0.66047859 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics

10 0.6274485 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts

11 0.57998502 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

12 0.5187996 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics

13 0.46296057 225 acl-2013-Learning to Order Natural Language Texts

14 0.45076269 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

15 0.42936543 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation

16 0.42888466 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

17 0.42835674 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization

18 0.35208422 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures

19 0.34968781 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords

20 0.34810016 172 acl-2013-Graph-based Local Coherence Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.039), (6, 0.107), (11, 0.045), (15, 0.017), (24, 0.113), (26, 0.057), (27, 0.17), (35, 0.105), (42, 0.046), (48, 0.051), (60, 0.012), (70, 0.032), (88, 0.02), (90, 0.036), (95, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85731834 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

Author: Pedro Fialho ; Luisa Coheur ; Sergio Curto ; Pedro Claudio ; Angela Costa ; Alberto Abad ; Hugo Meinedo ; Isabel Trancoso

Abstract: In this paper we describe a platform for embodied conversational agents with tutoring goals, which takes as input written and spoken questions and outputs answers in both forms. The platform is developed within a game environment, and currently allows speech recognition and synthesis in Portuguese, English and Spanish. In this paper we focus on its understanding component that supports in-domain interactions, and also small talk. Most indomain interactions are answered using different similarity metrics, which compare the perceived utterances with questions/sentences in the agent’s knowledge base; small-talk capabilities are mainly due to AIML, a language largely used by the chatbots’ community. In this paper we also introduce EDGAR, the butler of MONSERRATE, which was developed in the aforementioned platform, and that answers tourists’ questions about MONSERRATE.

same-paper 2 0.82658386 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization

Author: Chen Li ; Xian Qian ; Yang Liu

Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.

3 0.72904247 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning

Author: Joohyun Kim ; Raymond Mooney

Abstract: We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. Instead, we show how the weak supervision of response feedback (e.g. successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees.

4 0.72489208 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data

Author: David Kauchak

Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.

5 0.72102225 246 acl-2013-Modeling Thesis Clarity in Student Essays

Author: Isaac Persing ; Vincent Ng

Abstract: Recently, researchers have begun exploring methods of scoring student essays with respect to particular dimensions of quality such as coherence, technical errors, and relevance to prompt, but there is relatively little work on modeling thesis clarity. We present a new annotated corpus and propose a learning-based approach to scoring essays along the thesis clarity dimension. Additionally, in order to provide more valuable feedback on why an essay is scored as it is, we propose a second learning-based approach to identifying what kinds of errors an essay has that may lower its thesis clarity score.

6 0.7194193 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems

7 0.71854293 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

8 0.71370274 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks

9 0.71228081 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

10 0.71027964 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics

11 0.71008044 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning

12 0.70872432 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

13 0.70468158 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition

14 0.70400858 318 acl-2013-Sentiment Relevance

15 0.70394599 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

16 0.70049155 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

17 0.69838476 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays

18 0.6968779 172 acl-2013-Graph-based Local Coherence Modeling

19 0.69560939 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia

20 0.69511884 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain