acl acl2013 acl2013-333 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). [sent-5, score-0.627]
2 In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. [sent-6, score-0.772]
3 We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. [sent-7, score-0.843]
4 We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity. [sent-8, score-0.22]
5 Thanks to the omnipresent information overload facing all of us, the importance of summarization is growing; semiautomatically summarized content is increasingly becoming user-facing: many newspapers equip editors with automated tools to aid them in choosing a subset of user comments to show. [sent-11, score-0.518]
6 Each domain throws up its own set of idiosyncrasies and challenges for the summarization task. [sent-13, score-0.339]
7 While there have been many approaches to automatic summarization (see Section 2), our work is directly inspired by the recent elegant framework of (Lin and Bilmes, 2011). [sent-20, score-0.367]
8 They employed the powerful theory of submodular functions for summarization: submodularity embodies the “diminishing returns” property and hence is a natural vocabulary to express the summarization desiderata. [sent-21, score-0.784]
9 Each such desideratum is captured as a submodular function and the objective is to maximize their sum. [sent-23, score-0.349]
10 For example, a natural constraint on the summary is that the sum or the minimum of pairwise dissimilarities between sentences chosen in the summary should be maximized; this, unfortunately, is not a submodular function. [sent-28, score-0.557]
11 We refer to such functions of pairwise dissimilarities in the summary as dispersion functions. [sent-31, score-0.746]
12 Our focus in this work is on significantly furthering the submodularity-based summarization framework to incorporate such dispersion functions. [sent-32, score-0.604]
13 We propose a very general graph-based summarization framework that combines a submodular function with a non-submodular dispersion function. [sent-33, score-1.233]
14 We consider three natural dispersion functions on the sentences in a summary: sum of all-pair sentence dissimilarities, the weight of the minimum spanning tree on the sentences, and the minimum of all-pair sentence dissimilarities. [sent-34, score-0.844]
15 We then show that a greedy algorithm can obtain an approximately optimal summary in each of the three cases; the proof exploits some nice combinatorial properties satisfied by the three dispersion functions. [sent-36, score-0.807]
16 We then conduct experiments on two corpora: the DUC 2004 corpus and a corpus of user comments on news articles. [sent-37, score-0.189]
17 On DUC 2004, we obtain performance that matches (Lin and Bilmes, 2011), without any serious parameter tuning; note that their framework does not have the dispersion function. [sent-38, score-0.604]
18 On the comment corpus, we outperform their method, demonstrating the value of dispersion functions. [sent-39, score-0.679]
19 2 Related Work Automatic summarization is a well-studied problem in the literature. [sent-41, score-0.339]
20 Several methods have been proposed for single- and multi-document summarization (Carbonell and Goldstein, 1998; Conroy and O’Leary, 2001 ; Takamura and Okumura, 2009; Shen and Li, 2010). [sent-42, score-0.339]
21 Related concepts have also been used in several other scenarios such as query-focused summarization in information retrieval (Daumé and Marcu, 2006), microblog summarization (Sharifi et al. [sent-43, score-0.678]
22 , 2010), event summarization (Filatova, 2004), and others (Riedhammer et al. [sent-44, score-0.339]
23 Graph-based methods have been used for summarization (Ganesan et al. [sent-48, score-0.339]
24 For a detailed survey on existing automatic summarization techniques and other related topics, see (Kim et al. [sent-50, score-0.339]
25 3 Framework In this section we present the summarization framework. [sent-52, score-0.339]
26 We start by describing a generic objective function that can be widely applied to several summarization scenarios. [sent-53, score-0.456]
27 This objective function is the sum of a monotone submodular coverage function and a non-submodular dispersion function. [sent-54, score-1.087]
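For readers who want a concrete picture of such a coverage term, the sketch below shows one standard monotone submodular choice (a saturated similarity coverage in the style of Lin and Bilmes; the paper's exact g is not reproduced here, and the function and argument names are ours):

    def coverage(S, U, sim, alpha=0.5):
        # g(S): each sentence u in the corpus U contributes its similarity to the
        # summary S, saturated at alpha times its similarity to the whole corpus.
        # A sum of minimums of modular functions and constants, hence monotone submodular.
        def sim_to(u, T):
            return sum(sim(u, v) for v in T if v != u)
        return sum(min(sim_to(u, S), alpha * sim_to(u, U)) for u in U)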
28 We then describe a simple greedy algorithm for optimizing this objective function with provable approximation guarantees for three natural dispersion functions. [sent-55, score-0.95]
29 Depending on the summarization application, C can refer to the set of documents (e.g. [sent-58, score-0.369]
30 For user-generated content, it is a collection of comments associated with a news article or a blog post, etc. [sent-62, score-0.232]
31 The sum measure hs(S, T) = Σ_{{u,v}∈P} d(u, v), the spanning tree measure ht(S, T) given by the cost of the minimum spanning tree of the set S ∪ T, and the min measure hm(S, T) = min_{{u,v}∈P} d(u, v). [sent-80, score-0.485]
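As an illustration only (not the authors' code; the function names and the distance callable d are our own), the three dispersion measures over a selected set S can be sketched as:

    import itertools

    def h_sum(S, d):
        # hs: sum of pairwise distances within S
        return sum(d(u, v) for u, v in itertools.combinations(S, 2))

    def h_min(S, d):
        # hm: smallest pairwise distance within S
        # (taken as 0 for fewer than two items in this sketch)
        return min((d(u, v) for u, v in itertools.combinations(S, 2)), default=0.0)

    def h_tree(S, d):
        # ht: weight of a minimum spanning tree on S, built here with Prim's algorithm
        S = list(S)
        if len(S) < 2:
            return 0.0
        in_tree, cost = {S[0]}, 0.0
        while len(in_tree) < len(S):
            u, v = min(((a, b) for a in in_tree for b in S if b not in in_tree),
                       key=lambda edge: d(*edge))
            cost += d(u, v)
            in_tree.add(v)
        return cost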
32 But if δ > 0, since the dispersion function h(·) is not submodular, the combined objective f(·) is not submodular as well. [sent-93, score-0.693]
33 Despite this, we show that a simple greedy algorithm achieves a provable approximation factor for (1). [sent-94, score-0.221]
34 This is possible due to some nice structural properties of the dispersion functions we consider. [sent-95, score-0.648]
35 Algorithm 2 Greedy algorithm, parametrized by the dispersion function h; here, U, k, g, δ are fixed. [sent-96, score-0.66]
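A minimal sketch of such a greedy procedure, based on our reading of the description above (g, h, and delta are supplied by the caller; this is not the released implementation):

    def greedy_summary(U, k, g, h, delta):
        # Greedily grow a summary S of at most k elements from the ground set U,
        # choosing at each step the element with the largest marginal gain of
        # the combined objective f(S) = g(S) + delta * h(S).
        S = []
        for _ in range(k):
            candidates = [u for u in U if u not in S]
            if not candidates:
                break
            gain = lambda u: (g(S + [u]) - g(S)) + delta * (h(S + [u]) - h(S))
            S.append(max(candidates, key=gain))
        return S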
36 3 Analysis In this section we obtain a provable approximation for the greedy algorithm. [sent-101, score-0.19]
37 First, we show that a greedy choice is well-behaved with respect to the dispersion function h(·). [sent-102, score-0.719]
38 We next show that the tree created by the greedy algorithm for h = ht is not far from the optimum. [sent-128, score-0.425]
39 The proof follows by noting that we get a spanning tree by connecting each ui to its closest point in Si−1. [sent-136, score-0.215]
40 The cost of this spanning tree is Σ_{2≤j≤k} d(uj, Sj−1) and this tree is also the result of the greedy algorithm run in an online fashion on the input sequence {u1, . . . , uk}. [sent-137, score-0.28]
41 For hs and ht, we run Algorithm 1 using a new dispersion function h0, which is a slightly modified version of h. [sent-147, score-0.712]
42 For ht, using the above argument of submodularity and monotonicity of g, and the result from Lemma 1(ii), we have Σ_{u∈O\Si} [g(Si ∪ u) − g(Si) + δ d(u, Si)] ≥ g(O) − g(Si) + δ(ht(O)/2 − ht(Si)) ≥ (g(O) + δ ht(O)/2) − (g(Si) + δ ht(Si)) ≥ f(O)/2 − (g(Si) + δ ht(Si)). [sent-162, score-0.433]
43 Also, ht (Si) ≤ 2 smt(Si) since this is a metric space. [sent-163, score-0.265]
44 Using the monotonicity of the Steiner tree cost, smt(Si) ≤ smt(Sk) ≤ ht(Sk). [sent-164, score-0.365]
45 We do not use this algorithm in our experiments, as it is oblivious of the actual dispersion functions used. [sent-182, score-0.679]
46 We then use this representation to generate a graph and instantiate our summarization objective function with specific components that capture the desiderata of a given summarization task. [sent-189, score-0.86]
47 1 Structured representation for sentences In order to instantiate the summarization graph (nodes and edges), we first need to model each sentence (in multi-document summarization) or comment (i.e. [sent-191, score-0.508]
48 Sentences have typically been modeled using standard ngrams (unigrams or bigrams) in previous summarization work. [sent-194, score-0.364]
49 Furthermore, the edge weights s(u, v) represent pairwise similarity between sentences or comments (e.g. [sent-202, score-0.238]
50 The edge weights are then used to define the inter-sentence distance metric d(u, v) for the different dispersion functions. [sent-205, score-0.602]
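For illustration (the paper's exact weighting scheme is not reproduced; the dictionary-based sim lookup below is our own convention), one simple way to obtain such a distance from similarity edge weights in [0, 1] is d(u, v) = 1 − s(u, v):

    def distance_from_similarity(sim):
        # sim: dict mapping unordered sentence pairs (u, v) to a similarity in [0, 1]
        def d(u, v):
            if u == v:
                return 0.0
            return 1.0 - sim.get((u, v), sim.get((v, u), 0.0))
        return d

Whether 1 − s actually satisfies the triangle inequality depends on the similarity used; the analysis in Section 3 assumes d is a metric.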
51 Using the syntactic structure along with semantic similarity helps us identify useful (valid) nuggets of information within comments (or documents), avoid redundancies, and identify similar views in a semantic space. [sent-216, score-0.216]
52 The DUC 2004 corpus contains 50 document clusters (i.e., 50 different summarization tasks) with 10 documents per cluster on average. [sent-253, score-0.369]
53 We extracted a set of news articles and corresponding user comments from Yahoo! [sent-256, score-0.189]
54 2 Evaluation For each summarization task, we compare the system output (i. [sent-261, score-0.339]
55 We use the following evaluation settings in our experiments for each summarization task: (1) For multi-document summarization, we compute the ROUGE-1 scores, which was the main evaluation criterion for DUC 2004 evaluations. [sent-265, score-0.339]
56 (2) For comment summarization, the collection of user comments associated with a given article is typically much larger. [sent-274, score-0.301]
57 Hence for this task, we use a slightly different evaluation criterion that is inspired by the DUC 2005-2007 summarization evaluation tasks. [sent-276, score-0.339]
58 We then run our summarization algorithm on the instantiated graph to produce a summary for each news article. [sent-280, score-0.524]
59 In addition, each news article and corresponding set of comments were presented to three human annotators. [sent-281, score-0.232]
60 They were asked to select a subset of comments (at most 20 comments) that best represented a summary capturing the most popular as well as diverse set of views and opinions expressed by different users that are relevant to the given news article. [sent-282, score-0.297]
61 We then compare the automatically generated comment summaries against the human-generated summaries and compute the ROUGE-1 and ROUGE-2 scores. [sent-283, score-0.209]
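For readers unfamiliar with the metric, ROUGE-1 is essentially unigram overlap between system and reference summaries; a bare-bones sketch against a single reference (ignoring stemming, stopword handling, and the official toolkit's options) is:

    from collections import Counter

    def rouge1(system_tokens, reference_tokens):
        # Returns (recall, precision, F-score) based on unigram overlap.
        sys_counts, ref_counts = Counter(system_tokens), Counter(reference_tokens)
        overlap = sum(min(c, sys_counts[w]) for w, c in ref_counts.items())
        recall = overlap / max(1, sum(ref_counts.values()))
        precision = overlap / max(1, sum(sys_counts.values()))
        f1 = 2 * recall * precision / (recall + precision) if (recall + precision) else 0.0
        return recall, precision, f1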
62 This summarization task is particularly hard even for human annotators since user-generated comments are typically noisy and there are several hundred comments per article. [sent-284, score-0.649]
63 This shows that even though this is a new type of summarization task, humans tend to generate more consistent summaries and hence their annotations are reliable for evaluation purposes as in multi-document summarization. [sent-289, score-0.421]
64 … of our system that approximates the submodular objective function proposed by (Lin and Bilmes, 2011). [sent-296, score-0.349]
65 As shown in the results, our best system, which uses the hs dispersion function, achieves a better ROUGE-1 F-score than all other systems. [sent-297, score-0.712]
66 (2) We observe that the hm and ht dispersion functions produce slightly lower scores than hs, which may be a characteristic of this particular summarization task. [sent-298, score-1.439]
67 We believe that the empirical results achieved by different dispersion functions depend on the nature of the summarization tasks and there are task settings under which hm or ht perform better than hs. [sent-299, score-1.439]
68 For example, we show later how using the ht dispersion function yields the best performance on the comments summarization task. [sent-300, score-1.42]
69 (3) We also analyze the contributions of individual components of the new objective function towards summarization performance by selectively setting certain parameters to 0. [sent-302, score-0.48]
70 We clearly see that each component (popularity, cluster contribution, dispersion) individually yields a reasonable summarization performance but the best result is achieved by the combined system (row 5 in the table). [sent-304, score-0.366]
71 We also contrast the performance of the full system with and without the dispersion component (row 4 versus row 5). [sent-305, score-0.601]
72 The results show that optimizing for dispersion yields an improvement in summarization performance. [sent-306, score-0.942]
73 (4) To understand the effect of utilizing syntactic structure and semantic similarity for constructing the summarization graph, we ran the experiments using just the unigrams and bigrams; without the syntactic structure we obtained a ROUGE-1 F-score of roughly 37. [sent-307, score-0.371]
74 This is because their system was tuned for the particular summarization task using the DUC 2003 corpus. [sent-311, score-0.339]
75 On the other hand, even without any parameter tuning our method yields good performance, as evidenced by results on the two different summarization tasks. [sent-312, score-0.366]
76 For the full system, we weight certain parameters pertaining to cluster contributions and dispersion higher (α = β = δ = 5) compared to the rest of the objective function (λ = 1). [sent-314, score-0.693]
77 If the maximum number of sentences/comments chosen were k, we brought both hs and ht to the same approximate scale as hm by dividing hs by k(k − 1)/2 and ht by k − 1. [sent-316, score-0.873]
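That rescaling can be written down directly (a sketch with our own function name; k is the summary budget):

    def rescale_dispersions(hs_value, ht_value, k):
        # Bring hs and ht onto roughly the same scale as hm:
        # hs sums over k*(k-1)/2 pairs, ht over k-1 spanning-tree edges.
        if k < 2:
            return hs_value, ht_value
        return hs_value / (k * (k - 1) / 2.0), ht_value / (k - 1)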
78 However, while the structured representation is beneficial, we observed that dispersion (and other individual components) contribute similar performance gains even when using ngrams alone. [sent-319, score-0.625]
79 So the improvements obtained from the structured representation and dispersion are complementary. [sent-320, score-0.6]
80 For this baseline, we first pick the longest comment (the one with the most characters), then the next longest comment, and so on, to create an ordered set of comments. [sent-326, score-0.206]
81 The intuition behind this baseline is that longer comments contain more content and possibly cover more topics than short ones. [sent-327, score-0.21]
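This length baseline is trivial to reproduce; a sketch (comments are plain strings here):

    def length_baseline(comments, k):
        # Order comments by character length, longest first, and keep the top k.
        return sorted(comments, key=len, reverse=True)[:k]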
82 From the table, we observe that the new system (using either dispersion function) outperforms the baseline by a huge margin (+44% relative improvement in ROUGE-1 and much bigger improvements in ROUGE-2 scores). [sent-328, score-0.576]
83 Our system models sentences using the syntactic structure and semantics and jointly optimizes for multiple summarization criteria (including dispersion), which helps weed out the noise and identify relevant, useful information within the comments, thereby producing better-quality summaries. [sent-330, score-0.519]
84 (2) Unlike multi-document summarization, here we observe that the ht dispersion function yields the best empirical performance for this task. [sent-334, score-0.926]
85 This observation supports our claim that the choice of the specific dispersion function depends [sent-335, score-0.634]
86 on the summarization task and that the dispersion functions proposed in this paper have a wider variety of use cases. [sent-337, score-0.987]
87 (3) Results showing contributions from individual components of the new summarization objective function are listed in Table 4. [sent-338, score-0.48]
88 The table also shows that incorporating dispersion into the objective function yields an improvement in summarization quality (row 4 versus row 5). [sent-341, score-1.084]
89 6 Conclusions We introduced a new general-purpose graph-based summarization framework that combines a submodular coverage function with a non-submodular dispersion function. [sent-348, score-1.03]
90 We presented three natural dispersion functions that represent three different ways of ensuring non-redundancy (using sentence dissimilarities) for summarization and proved that a simple greedy algorithm can obtain an approximately optimal summary in all these cases. [sent-349, score-1.182]
91 Experiments on two different summarization tasks show that our algorithm outperforms algorithms that rely only on submodularity. [sent-350, score-0.37]
92 Finally, we demonstrated that using a structured representation to model sentences in the graph improves summarization quality. [sent-351, score-0.429]
93 Firstly, it would be interesting to see if dispersion offers similar improvements over a tuned version of the submodular framework of Lin and Bilmes (2011). [sent-353, score-0.836]
94 In a very recent work, Lin and Bilmes (2012) demonstrate a further improvement in performance for document summarization by using mixtures of submodular shells. [sent-354, score-0.6]
95 This is an interesting extension of their previous submodular framework and while the new formulation permits more complex functions, the resulting function is still submodular and hence can be combined with the dispersion measures proposed in this paper. [sent-355, score-1.155]
96 A different body of work uses determinantal point processes (DPPs) to model subset selection problems and adapts them for document summarization (Kulesza and Taskar, 2011). [sent-356, score-0.396]
97 Opinosis: A graph based approach to abstractive summarization of highly redundant opinions. [sent-404, score-0.405]
98 Learning mixtures of submodular shells with application to document summarization. [sent-429, score-0.261]
99 An analysis of approximations for maximizing submodular set functions I. [sent-445, score-0.304]
100 Text summarization model based on maximum coverage problem and its variant. [sent-486, score-0.368]
wordName wordTfidf (topN-words)
[('dispersion', 0.576), ('summarization', 0.339), ('ht', 0.265), ('submodular', 0.232), ('si', 0.216), ('hm', 0.187), ('sk', 0.173), ('comments', 0.155), ('duc', 0.116), ('submodularity', 0.112), ('comment', 0.103), ('dissimilarities', 0.091), ('bilmes', 0.09), ('ui', 0.086), ('tennis', 0.086), ('greedy', 0.085), ('summary', 0.079), ('hs', 0.078), ('steiner', 0.076), ('functions', 0.072), ('adore', 0.069), ('uj', 0.06), ('og', 0.06), ('objective', 0.059), ('function', 0.058), ('monotonicity', 0.056), ('xo', 0.056), ('approximation', 0.053), ('summaries', 0.053), ('borodin', 0.052), ('ganesan', 0.052), ('provable', 0.052), ('monotone', 0.051), ('pu', 0.05), ('bi', 0.05), ('spanning', 0.049), ('oh', 0.046), ('dpps', 0.046), ('tree', 0.044), ('article', 0.043), ('cov', 0.042), ('smt', 0.042), ('graph', 0.041), ('lemma', 0.038), ('lin', 0.037), ('let', 0.036), ('proof', 0.036), ('guarantees', 0.036), ('curel', 0.034), ('diversification', 0.034), ('halld', 0.034), ('imase', 0.034), ('kavita', 0.034), ('yatani', 0.034), ('news', 0.034), ('iii', 0.033), ('rel', 0.033), ('similarity', 0.032), ('wn', 0.031), ('algorithm', 0.031), ('sg', 0.031), ('cover', 0.031), ('nemhauser', 0.03), ('riedhammer', 0.03), ('hyun', 0.03), ('sharifi', 0.03), ('chandra', 0.03), ('rouge', 0.03), ('documents', 0.03), ('dependency', 0.03), ('chengxiang', 0.029), ('hence', 0.029), ('coverage', 0.029), ('ravi', 0.029), ('views', 0.029), ('document', 0.029), ('framework', 0.028), ('determinantal', 0.028), ('cost', 0.027), ('yields', 0.027), ('minimum', 0.027), ('summarizing', 0.027), ('parametrized', 0.026), ('edge', 0.026), ('sentences', 0.025), ('ngrams', 0.025), ('keyphrase', 0.025), ('abstractive', 0.025), ('sd', 0.025), ('sh', 0.025), ('row', 0.025), ('redundancy', 0.025), ('components', 0.024), ('wordnet', 0.024), ('uai', 0.024), ('mountain', 0.024), ('structured', 0.024), ('sum', 0.024), ('sj', 0.024), ('content', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
2 0.2232631 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura
Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio 1/2 (1 − e−1). Our experiments with the NTCIR ACLIA test collections show that our approach outperforms a state-of-the-art algorithm.
3 0.15172468 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
4 0.15125659 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization tasks.
5 0.12271941 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
6 0.11984926 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
7 0.11617365 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
8 0.11506508 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
9 0.10222584 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
10 0.10002912 257 acl-2013-Natural Language Models for Predicting Programming Comments
11 0.096607625 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
12 0.092073701 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
13 0.089544214 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
14 0.074875012 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
15 0.069156758 126 acl-2013-Diverse Keyword Extraction from Conversations
16 0.065702006 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions
17 0.061904285 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression
18 0.060555048 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
19 0.057657216 217 acl-2013-Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information
20 0.056038644 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
topicId topicWeight
[(0, 0.162), (1, 0.029), (2, 0.008), (3, -0.057), (4, -0.0), (5, 0.02), (6, 0.119), (7, -0.007), (8, -0.174), (9, -0.097), (10, -0.024), (11, 0.048), (12, -0.122), (13, -0.015), (14, -0.063), (15, 0.129), (16, 0.148), (17, -0.098), (18, 0.016), (19, 0.075), (20, -0.009), (21, -0.097), (22, 0.015), (23, -0.026), (24, -0.018), (25, -0.034), (26, 0.029), (27, 0.023), (28, 0.028), (29, -0.01), (30, -0.043), (31, 0.055), (32, -0.013), (33, 0.019), (34, -0.002), (35, 0.015), (36, -0.028), (37, -0.039), (38, 0.027), (39, -0.045), (40, -0.052), (41, -0.002), (42, 0.07), (43, 0.022), (44, -0.024), (45, -0.041), (46, -0.059), (47, -0.003), (48, 0.051), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.94698709 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
2 0.85737628 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.
3 0.8431865 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura
Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio 1/2 (1 − e−1). Our experiments with the NTCIR ACLIA test collections show that our approach outperforms a state-of-the-art algorithm.
4 0.83006209 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization tasks.
5 0.80935425 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
6 0.79621625 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
7 0.78244871 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
8 0.69132 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
9 0.68530405 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
10 0.6794157 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
11 0.58515483 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
12 0.52421772 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
13 0.50051785 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
14 0.49557304 225 acl-2013-Learning to Order Natural Language Texts
15 0.48898509 178 acl-2013-HEADY: News headline abstraction through event pattern clustering
16 0.45359632 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
17 0.43124929 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
18 0.41864756 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse
19 0.41392764 126 acl-2013-Diverse Keyword Extraction from Conversations
20 0.38522473 182 acl-2013-High-quality Training Data Selection using Latent Topics for Graph-based Semi-supervised Learning
topicId topicWeight
[(0, 0.054), (4, 0.011), (6, 0.087), (11, 0.075), (15, 0.011), (23, 0.204), (24, 0.038), (26, 0.065), (28, 0.013), (35, 0.055), (42, 0.053), (48, 0.035), (70, 0.051), (88, 0.042), (90, 0.035), (95, 0.077)]
simIndex simValue paperId paperTitle
1 0.8625294 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions
Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li
Abstract: Emotion classification can be generally done from both the writer’s and reader’s perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader’s emotion classification on the news and writer’s emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On the basis, we propose a respective way to jointly model these two tasks. In particular, a cotraining algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.
same-paper 2 0.80921924 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
3 0.79499108 328 acl-2013-Stacking for Statistical Machine Translation
Author: Majid Razmara ; Anoop Sarkar
Abstract: We propose the use of stacking, an ensemble learning technique, to the statistical machine translation (SMT) models. A diverse ensemble of weak learners is created using the same SMT engine (a hierarchical phrase-based system) by manipulating the training data and a strong model is created by combining the weak models on-the-fly. Experimental results on two language pairs and three different sizes of training data show significant improvements of up to 4 BLEU points over a conventionally trained SMT model.
4 0.79049826 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits
Author: Vidhya Govindaraju ; Ce Zhang ; Christopher Re
Abstract: Tabular information in text documents contains a wealth of information, and so tables are a natural candidate for information extraction. There are many cues buried in both a table and its surrounding text that allow us to understand the meaning of the data in a table. We study how natural-language tools, such as part-of-speech tagging, dependency paths, and named-entity recognition, can be used to improve the quality of relation extraction from tables. In three domains we show that (1) a model that performs joint probabilistic inference across tabular and natural language features achieves an F1 score that is twice as high as either a puretable or pure-text system, and (2) using only shallower features or non-joint inference results in lower quality.
5 0.66533101 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
Author: Joohyun Kim ; Raymond Mooney
Abstract: We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. Instead, we show how the weak supervision of response feedback (e.g. successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees.
6 0.66112536 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
7 0.654562 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
8 0.65406752 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
9 0.65178967 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
10 0.65093774 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
11 0.65039188 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
12 0.64896899 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
13 0.64665318 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
14 0.64659119 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
15 0.64464009 246 acl-2013-Modeling Thesis Clarity in Student Essays
16 0.64292324 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
17 0.64289403 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
18 0.64233243 318 acl-2013-Sentiment Relevance
19 0.64175606 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
20 0.64171284 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions