acl acl2011 acl2011-187 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Taylor Berg-Kirkpatrick ; Dan Gillick ; Dan Klein
Abstract: We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a margin-based objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We learn a joint model of sentence extraction and compression for multi-document summarization. [sent-3, score-0.269]
2 Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. [sent-4, score-0.451]
3 We train the model using a margin-based objective whose loss captures end summary quality. [sent-5, score-0.314]
4 Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. [sent-8, score-0.521]
5 One reason learning may have provided limited gains is that typical models do not learn to optimize end summary quality directly, but rather learn intermediate quantities in isolation. [sent-11, score-0.245]
6 , 2007; Schilder and Kondadadi, 2008), and then assemble extractive summaries from the top-ranked sentences in a way not incorporated into the learning process. [sent-13, score-0.658]
7 One main contribution of the current paper is the direct optimization of summary quality in a single model; we find that our learned systems substantially outperform unlearned counterparts on both automatic and manual metrics. [sent-19, score-0.299]
8 While pure extraction is certainly simple and does guarantee some minimal readability, Lin (2003) showed that sentence compression (Knight and Marcu, 2001 ; McDonald, 2006; Clarke and Lapata, 2008) has the potential to improve the resulting summaries. [sent-20, score-0.222]
9 However, attempts to incorporate compression into a summarization system have largely failed to realize large gains. [sent-21, score-0.212]
10 For example, Zajic et al. (2006) use a pipeline approach, pre-processing to yield additional candidates for extraction by applying heuristic sentence compressions, but their system does not outperform state-of-the-art purely extractive systems. [sent-22, score-0.528]
11 Similarly, Gillick and Favre (2009), though not learning weights, do a limited form of compression jointly with extraction. [sent-23, score-0.199]
12 A second contribution of the current work is to show a system for jointly learning to jointly compress and extract that exhibits gains in both ROUGE and content metrics over purely extractive systems. [sent-25, score-0.726]
13 In our approach, we define a linear model that scores candidate summaries according to features that factor over the n-gram types that appear in the summary and the structural compressions used to create the sentences in the summary. [sent-30, score-0.451]
14 We train these parameters jointly using a margin-based objective whose loss captures end summary quality through the ROUGE metric. [sent-31, score-0.439]
15 Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. [sent-35, score-0.521]
16 Let x be the input document set, and let y be a representation of a summary as a vector. [sent-40, score-0.245]
17 For an extractive summary, y is a vector of indicators y = (ys : s ∈ x), one indicator ys for each sentence s in x. [sent-41, score-0.519]
18 A sentence s is present in the summary if and only if its indicator ys = 1 (see Figure 1a). [sent-42, score-0.193]
19 Let Y (x) be the set of valid summaries of document set x with length no greater than Lmax. [sent-43, score-0.269]
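To make the notation concrete, here is a minimal sketch (not from the paper) of the membership test for Y(x) in the extractive case; the data layout is an illustrative assumption.

```python
def in_Y(y, sentence_lengths, L_max):
    """Membership test for Y(x) in the extractive case: the sentences selected
    by the 0/1 indicator vector y must total at most L_max words."""
    total = sum(l for y_s, l in zip(y, sentence_lengths) if y_s == 1)
    return total <= L_max
```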
20 This latter approach is associated with the following objective function: max_{y ∈ Y(x)} Σ_{b ∈ B(y)} vb (1) Here, vb is the value of bigram b, and B(y) is the set of bigrams present in the summary encoded by y. [sent-49, score-0.575]
21 They let the value vb of each bigram be given by the number of input documents the bigram appears in. [sent-51, score-0.438]
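A minimal sketch of this valuation, assuming whitespace-tokenized sentences grouped by document; the data structures are assumptions, only the counting rule (number of input documents containing the bigram) comes from the text.

```python
from collections import defaultdict

def bigram_values(documents):
    """v_b = number of input documents in which bigram b appears.
    `documents` is a list of documents, each a list of token lists."""
    values = defaultdict(int)
    for doc in documents:
        seen = set()
        for tokens in doc:
            seen.update(zip(tokens, tokens[1:]))  # bigram types in this document
        for b in seen:
            values[b] += 1                        # count each document at most once
    return values
```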
22 We extend objective 1 so that it assigns value not just to the bigrams that appear in the summary, but also to the choices made in the creation of the summary. [sent-53, score-0.213]
23 In our complete model, which jointly extracts and compresses sentences, we choose whether or not to cut individual subtrees in the constituency parses. (Footnote 1: See Text Analysis Conference results in 2008 and 2009.) [sent-54, score-0.211]
24 This is in contrast to the extractive case where choices are made on full sentences. [sent-56, score-0.479]
25 Next, we present details of our representation of compressive summaries. [sent-58, score-0.487]
26 We represent a compressive summary as a vector y = (yn : n ∈ ts, s ∈ x) of indicators, one for each non-terminal node n in each parse tree of the sentences in the document set x. [sent-60, score-0.696]
27 A word is present in the output summary if and only if its parent parse tree node n has yn = 1 (see Figure 1b). [sent-61, score-0.302]
28 In addition to the length constraint on the members of Y (x), we require that each node n may have yn = 1 only if its parent π(n) has yπ(n) = 1. [sent-62, score-0.194]
29 For the compressive model we define the set of cut choices C(y) for a summary y to be the set of edges in each parse that are broken in order to delete a subtree (see Figure 1b). [sent-65, score-0.874]
30 We require that each subtree has a non-terminal node for a root, and say that an edge (n, π(n)) between a node and its parent is broken if the parent has yπ(n) = 1 but the child has yn = 0. [sent-66, score-0.38]
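A sketch of these two definitions, assuming nodes are stored with parent pointers; the encoding is an assumption, but the rules (a kept node needs a kept parent, and a cut is a kept-parent/deleted-child edge) come from the text above.

```python
def is_valid_compressive(y, parent):
    """A node n may have y[n] = 1 only if its parent also has y = 1.
    `parent` maps node ids to parent ids, with None for a root."""
    return all(parent[n] is None or y[parent[n]] == 1
               for n, kept in y.items() if kept == 1)

def cut_edges(y, parent):
    """C(y): edges (n, pi(n)) where the parent is kept but the child is deleted,
    i.e. the root edges of deleted subtrees."""
    return {(n, parent[n]) for n, kept in y.items()
            if parent[n] is not None and kept == 0 and y[parent[n]] == 1}
```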
31 This amounts to parameterizing each bigram score vb and each subtree deletion score vc. [sent-70, score-0.447]
32 Learning weights for Objective 1 where Y (x) is the set of extractive summaries gives our LEARNED EXTRACTIVE system. [sent-74, score-0.658]
33 Learning weights for Objective 2 where Y (x) is the set of compressive summaries, and C(y) the set of broken edges that produce subtree deletions, gives our LEARNED COMPRESSIVE system, which is our joint model of extraction and compression. [sent-75, score-0.774]
34 3 Structured Learning Discriminative training attempts to minimize the loss incurred during prediction by optimizing an objective on the training set. [sent-76, score-0.217]
35 We will perform discriminative training using a loss function that directly measures end-to-end summarization quality. [sent-77, score-0.179]
36 In Section 4 we show that finding summaries that optimize Objective 2, Viterbi prediction, is efficient. [sent-78, score-0.21]
37 We instead turn to an approach that optimizes a batch objective which is sensitive to all constraints on all instances, but is efficient by adding these constraints incrementally. [sent-81, score-0.267]
38 Note that the label summaries can be expressed as vectors y∗ because our training summaries are variously extractive or extractive and compressive (see Section 5). [sent-87, score-1.593]
39 , 2004) of extractive and compressive summaries: min_w ½‖w‖² + (C/N) Σ_{i=1..N} ξi (4) s.t. [sent-90, score-0.935]
40 ∀i, ∀y ∈ Y(xi): w·f(xi, yi∗) − w·f(xi, y) ≥ ℓ(y, yi∗) − ξi (5) The constraints in Equation 5 require that the difference in model score between each possible summary y and the gold summary yi∗ be no smaller than the loss ℓ(y, yi∗), padded by a per-instance slack of ξi. [sent-94, score-0.464]
41 We use bigram recall as our loss function (see Section 3. [sent-95, score-0.234]
42 Unfortunately, the size of the output space of extractive summaries is exponential in the number of sentences in the input document set. [sent-100, score-0.717]
43 It alternates between solving Objective 4 with a reduced set of currently active constraints, and adding newly active constraints to the set. [sent-104, score-0.241]
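A schematic of that alternation, kept deliberately abstract: `solve_restricted_qp` and `loss_augmented_argmax` are placeholders for the restricted QP solver and the loss-augmented decoder, not real library calls, and the feature/loss interfaces are assumptions.

```python
def cutting_plane_train(instances, feature_fn, loss_fn,
                        loss_augmented_argmax, solve_restricted_qp, tol=1e-3):
    """Alternate between solving the margin QP over the active constraint set
    and adding, for each instance, the most violated constraint found by
    loss-augmented decoding."""
    active = []                                  # constraints as (i, y_hat)
    w, slack = solve_restricted_qp(active)
    while True:
        added = False
        for i, (x, y_gold) in enumerate(instances):
            y_hat = loss_augmented_argmax(w, x, y_gold)   # argmax_y w.f(y) + loss
            margin = dot(w, feature_fn(x, y_gold)) - dot(w, feature_fn(x, y_hat))
            if margin < loss_fn(y_hat, y_gold) - slack[i] - tol:
                active.append((i, y_hat))                 # newly active constraint
                added = True
        if not added:
            return w
        w, slack = solve_restricted_qp(active)

def dot(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())
```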
44 Luckily, our choice of loss function, bigram recall, factors over bigrams. [sent-126, score-0.234]
45 We simply modify each bigram value vb to include bigram b’s contribution to the total loss. [sent-128, score-0.402]
46 Since there are many reasonable summaries we are less interested in exactly matching any specific training instance, and more interested in the degree to which a predicted summary deviates from a label. [sent-135, score-0.36]
47 The standard method for automatically evaluating a summary against a reference is ROUGE, which we simplify slightly to bigram recall. [sent-136, score-0.309]
48 With an extractive reference denoted by y∗, our loss function is: ℓ(y, y∗) = |B(y) ∩ B(y∗)| / |B(y∗)|. We verified that bigram recall correlates well with ROUGE and with manual metrics. [sent-137, score-0.682]
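The metric itself is straightforward to compute; a minimal sketch (argument names are illustrative, both arguments are bigram-type sets):

```python
def bigram_recall(summary_bigrams, reference_bigrams):
    """Fraction of reference bigram types that also appear in the candidate
    summary, i.e. |B(y) intersect B(y*)| / |B(y*)|."""
    if not reference_bigrams:
        return 0.0
    return len(summary_bigrams & reference_bigrams) / len(reference_bigrams)
```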
49 4 Efficient Prediction We show how to perform prediction with the extractive and compressive models by solving ILPs. [sent-138, score-1.032]
50 For many instances, a generic ILP solver can find exact solutions to the prediction problems in a matter of seconds. [sent-139, score-0.253]
51 1 ILP for extraction Gillick and Favre (2009) express the optimization of Objective 1 for extractive summarization as an ILP. [sent-142, score-0.604]
52 Let the presence of each bigram b in B(y) be indicated by the binary variable zb. [sent-145, score-0.203]
53 Let Qsb be an indicator of the presence of bigram b in sentence s. [sent-146, score-0.231]
54 Σ_s ls ys ≤ Lmax; ∀s, b: ys Qsb ≤ zb (7); ∀b: Σ_s ys Qsb ≥ zb (8). Constraints 7 and 8 ensure consistency between sentences and bigrams. [sent-149, score-0.206]
55 Notice that Constraint 7 requires that selecting a sentence entails selecting all its bigrams, and Constraint 8 requires that selecting a bigram entails selecting at least one sentence that contains it. [sent-150, score-0.277]
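A sketch of this extraction ILP using PuLP; the data layout is an assumption, and PuLP's bundled CBC solver stands in for whatever generic solver is used in practice. Only the objective and constraints (length limit plus 7 and 8) come from the text.

```python
import pulp

def extractive_ilp(sentences, values, L_max):
    """Maximize the total value of bigram types in the summary subject to the
    length limit.  `sentences` is a list of (length, bigram_set) pairs;
    `values` maps bigrams to v_b."""
    prob = pulp.LpProblem("extractive_summary", pulp.LpMaximize)
    y = [pulp.LpVariable("y_%d" % s, cat="Binary") for s in range(len(sentences))]
    all_bigrams = sorted({b for _, bs in sentences for b in bs})
    z = {b: pulp.LpVariable("z_%d" % i, cat="Binary")
         for i, b in enumerate(all_bigrams)}

    prob += pulp.lpSum(values.get(b, 0.0) * z[b] for b in all_bigrams)           # Objective 1
    prob += pulp.lpSum(l * y[s] for s, (l, _) in enumerate(sentences)) <= L_max  # length limit
    for b in all_bigrams:
        containing = [s for s, (_, bs) in enumerate(sentences) if b in bs]
        for s in containing:
            prob += y[s] <= z[b]                                # (7) sentence -> its bigrams
        prob += pulp.lpSum(y[s] for s in containing) >= z[b]    # (8) bigram -> some sentence
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [s for s in range(len(sentences)) if y[s].value() == 1]
```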
56 2 ILP for joint compression and extraction We can extend the ILP formulation of extraction to solve the compressive problem. [sent-155, score-0.809]
57 Let the presence of each cut c in C(y) be indicated by the binary variable zc, which is active if and only if yn = 0 but yπ(n) = 1, where node π(n) is the parent of node n. [sent-158, score-0.339]
58 While it is possible to let B(y) contain all bigrams present in the compressive summary, the re- Figure 2: Diagram of ILP for joint extraction and compression. [sent-160, score-0.749]
59 Variables zb indicate the presence of bigrams in the summary. [sent-161, score-0.24]
60 The figure suppresses bigram variables zstopped,in and zfrance,he to reduce clutter. [sent-163, score-0.205]
61 We omit from B(y) bigrams that are the result of deleted intermediate words. [sent-167, score-0.204]
62 As a result the required number of variables zb is linear in the length of a sentence. [sent-168, score-0.182]
63 By solving the following ILP we can compute the arg max required for prediction in the joint model: max_{y,z} Σ_b vb zb + Σ_c vc zc s.t. [sent-171, score-0.178]
64 These constraints can be encoded explicitly using O(N) linear constraints, where N is the number of words in the document set x. [sent-181, score-0.181]
65 The reduction of B(y) to include only bigrams not resulting from deleted intermediate words avoids O(N2) required constraints. [sent-182, score-0.204]
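One standard way to encode the node and cut consistency described above with O(N) linear constraints is sketched below, again in PuLP; the exact formulation used in the paper is not shown here, so treat this as one plausible encoding rather than the authors' own.

```python
import pulp

def add_tree_constraints(prob, y_node, parent, cut_values):
    """For every parse edge: a node can be kept only if its parent is kept,
    and a binary cut indicator z_c fires exactly when a kept parent loses the
    child subtree rooted at n.  Returns the sum_c v_c z_c objective term.
    `y_node` maps node ids to existing binary variables, `parent` maps node
    ids to parent ids (None for roots), `cut_values` maps child ids to v_c."""
    cut_terms = []
    for n, p in parent.items():
        if p is None:
            continue
        prob += y_node[n] <= y_node[p]          # keep n only if its parent is kept
        z_c = pulp.LpVariable("cut_%s" % n, cat="Binary")
        prob += z_c <= y_node[p]                # no cut below a deleted parent
        prob += z_c <= 1 - y_node[n]            # no cut if the child is kept
        prob += z_c >= y_node[p] - y_node[n]    # cut forced when parent=1, child=0
        cut_terms.append(cut_values.get(n, 0.0) * z_c)
    return pulp.lpSum(cut_terms)
```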
66 In practice, solving this ILP for joint extraction and compression is, on average, an order of magnitude slower than solving the ILP for pure extraction, and for certain instances finding the exact solution is prohibitively slow. [sent-183, score-0.4]
67 3 Fast approximate prediction One common way to quickly approximate an ILP is to solve its LP relaxation (and round the results). [sent-185, score-0.258]
68 We developed an alternative fast approximate joint extractive and compressive solver that gives better results in terms of both objective value and bigram recall of resulting solutions. [sent-188, score-1.536]
69 The approximate joint solver first extracts a subset of the sentences in the document set that total no more than M words. [sent-189, score-0.363]
70 In a second step, we apply the exact joint extractive and compressive summarizer (see Section 4. [sent-190, score-1.053]
71 The objective we maximize in performing the initial extraction is different from the one used in extractive summarization. [sent-192, score-0.589]
72 This objective rewards redundant bigrams, and thus is likely to give the joint solver multiple options for including the same piece of relevant content. [sent-194, score-0.28]
73 When M is the size of the document set x, the approximate solver solves the exact joint problem. [sent-196, score-0.4]
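A sketch of the two-stage scheme described above. The greedy value-per-word packing of the first stage is an illustrative stand-in for the paper's intermediate objective (the text only says it rewards redundant bigrams), and `exact_joint_solver` is a placeholder for the exact joint extractive and compressive solver.

```python
def approximate_joint_summary(sentences, values, exact_joint_solver,
                              M=1000, L_max=100):
    """Stage 1: pick an intermediate set of sentences totalling at most M words
    under a redundancy-tolerant objective (bigram values summed with repeats).
    Stage 2: run the exact joint solver on that reduced set.
    `sentences` is a list of (length, bigram_list) pairs."""
    scored = []
    for idx, (length, bigrams) in enumerate(sentences):
        score = sum(values.get(b, 0.0) for b in bigrams)   # counts repeated bigrams
        scored.append((score / max(length, 1), idx))
    chosen, used = [], 0
    for _, idx in sorted(scored, reverse=True):
        if used + sentences[idx][0] <= M:
            chosen.append(idx)
            used += sentences[idx][0]
    return exact_joint_solver([sentences[i] for i in chosen], L_max)
```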
74 In Figure 3 we plot the trade-off between approximation quality and computation time, comparing to the exact joint solver, an exact solver that is limited to extractive solutions, and the LP relaxation solver. [sent-197, score-0.856]
75 The results show that the approximate joint solver yields substantial improvements over the LP relaxation, and Figure 3: Plot of objective value, bigram recall, and elapsed time for the approximate joint extractive and compressive solver against size of intermediate extraction set. [sent-198, score-1.843]
76 Also shown are values for an LP relaxation approximate solver, a solver that is restricted to extractive solutions, and finally the exact compressive solver. [sent-199, score-1.251]
77 can achieve results comparable to those produced by the exact solver with a 5-fold reduction in computation time. [sent-202, score-0.228]
78 To train the extractive system described in Section 2, we use as our labels y∗ the extractions with the largest bigram recall values relative to the sets of references. [sent-207, score-0.649]
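One way such labels might be built is sketched below; the greedy search is an assumption, since the text only states that the extractions with largest bigram recall relative to the references are used as y∗.

```python
def oracle_extractive_label(sentences, reference_bigrams, L_max=100):
    """Greedily build an extraction with high bigram recall against the
    reference bigram set, within the length limit, and return it as a 0/1
    indicator vector.  `sentences` is a list of (length, bigram_set) pairs."""
    selected, covered, used = set(), set(), 0
    while True:
        best_gain, best = 0, None
        for idx, (length, bigrams) in enumerate(sentences):
            if idx in selected or used + length > L_max:
                continue
            gain = len((bigrams & reference_bigrams) - covered)
            if gain > best_gain:
                best_gain, best = gain, idx
        if best is None:
            break
        selected.add(best)
        covered |= sentences[best][1] & reference_bigrams
        used += sentences[best][0]
    return [1 if i in selected else 0 for i in range(len(sentences))]
```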
79 Table 1: Bigram features: component feature functions in g(b, x) that we use to characterize the bigram b in both the extractive and compressive models. [sent-215, score-1.094]
80 To make the joint annotation task more feasible, we adopted an approximate approach that closely matches our fast approximate prediction procedure. [sent-220, score-0.303]
81 Annotators were shown 150-word maximum bigram recall extractions from the full document set and instructed to form a compressed summary by deleting words until 100 or fewer words remained. [sent-221, score-0.488]
82 We chose the summary we judged to be of highest quality from each pair to add to our corpus. [sent-223, score-0.211]
83 This gave one gold compressive summary y∗ for each of the 44 problems in the TAC 2009 set. [sent-224, score-0.637]
84 We used these labels to train our joint extractive and compressive system described in Section 2. [sent-225, score-1.016]
85 Relative to some NLP tasks, our feature sets are small: roughly two hundred features on bigrams and thirteen features on subtree deletions. [sent-229, score-0.219]
86 Table 2: Subtree deletion features: component feature functions in h(c, x) that we use to characterize the subtree deleted by cutting edge c = (n, π(n)) in the joint extractive and compressive model. [sent-250, score-1.239]
87 1 Bigram features Our bigram features include document counts, the earliest position in a document of a sentence that contains the bigram, and membership of each word in a standard set of stopwords. [sent-252, score-0.305]
88 We use stemmed bigrams and prune bigrams that appear in fewer than three input documents. [sent-255, score-0.186]
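An illustrative shape for the bigram feature function g(b, x) in the spirit of Table 1; the feature names and bucket boundaries are assumptions, only the three signal types (document count, earliest position, stopword membership) come from the text.

```python
def bigram_feature_vector(bigram, doc_count, earliest_position, stopwords):
    """Sparse feature dictionary for one (stemmed) bigram."""
    w1, w2 = bigram
    return {
        "doc_count>=%d" % min(doc_count, 10): 1.0,          # capped document count bucket
        "earliest_pos<=%d" % min(earliest_position, 20): 1.0,
        "first_word_stop=%s" % (w1 in stopwords): 1.0,
        "second_word_stop=%s" % (w2 in stopwords): 1.0,
        "bias": 1.0,
    }
```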
89 2 Subtree deletion features Table 2 gives a description of our subtree tree deletion features. [sent-257, score-0.22]
90 We solved extractive ILPs exactly, and joint extractive and compressive ILPs approximately using an intermediate extraction size of 1000. [sent-276, score-1.609]
91 To evaluate linguistic quality, we sent all the summaries to Mechanical Turk (with two times redun- [sent-285, score-0.21]
92 All the content-based metrics show substantial improvement for learned systems over unlearned ones, and we see an extremely large improvement for the learned joint extractive and compressive system over the previous state-of-the-art EXTRACTIVE BASELINE. [sent-294, score-1.177]
93 But, importantly, the gains achieved by the joint extractive and compressive system in content-based metrics do not come at the cost of linguistic quality when compared to purely extractive systems. [sent-298, score-1.498]
94 The joint extractive and compressive system fits more word types into a summary than the extractive systems, but also produces longer sentences on average. [sent-300, score-1.614]
95 Example summaries produced by thejoint system are given in Figure 4 along with reference summaries produced by humans. [sent-302, score-0.504]
96 Figure 4: Example summaries produced by our learned joint model of extraction and compression. [sent-308, score-0.417]
97 These are each 100-word-limited summaries of a collection of ten documents from the TAC 2008 data set. [sent-309, score-0.21]
98 Reference summaries produced by humans are provided for comparison. [sent-311, score-0.238]
99 8 Conclusion Jointly learning to extract and compress within a unified model outperforms learning pure extraction, which in turn outperforms a state-of-the-art extractive baseline. [sent-312, score-0.55]
100 Sentence compression as a component of a multidocument summarization system. [sent-531, score-0.212]
wordName wordTfidf (topN-words)
[('compressive', 0.487), ('extractive', 0.448), ('summaries', 0.21), ('ilp', 0.189), ('solver', 0.163), ('bigram', 0.159), ('rouge', 0.153), ('summary', 0.15), ('subtree', 0.126), ('compression', 0.108), ('gillick', 0.108), ('summarization', 0.104), ('zb', 0.103), ('pyramid', 0.099), ('bigrams', 0.093), ('jointly', 0.091), ('constraints', 0.089), ('objective', 0.089), ('vb', 0.084), ('joint', 0.081), ('yn', 0.078), ('tac', 0.078), ('compressed', 0.078), ('loss', 0.075), ('vc', 0.069), ('tsochantaridis', 0.069), ('unlearned', 0.069), ('zc', 0.069), ('compress', 0.068), ('intermediate', 0.061), ('approximate', 0.06), ('document', 0.059), ('compressions', 0.058), ('isstop', 0.058), ('yi', 0.056), ('relaxation', 0.056), ('active', 0.054), ('xb', 0.054), ('prediction', 0.053), ('cut', 0.052), ('extraction', 0.052), ('favre', 0.052), ('lp', 0.051), ('nenkova', 0.051), ('deleted', 0.05), ('fast', 0.049), ('xc', 0.049), ('ym', 0.047), ('deletion', 0.047), ('learned', 0.046), ('variables', 0.046), ('presence', 0.044), ('solving', 0.044), ('ys', 0.043), ('extractions', 0.042), ('constraint', 0.042), ('taskar', 0.042), ('martins', 0.041), ('constituency', 0.04), ('doccount', 0.039), ('docposition', 0.039), ('glpk', 0.039), ('ilps', 0.039), ('schilder', 0.039), ('xbvbzb', 0.039), ('xi', 0.037), ('parent', 0.037), ('node', 0.037), ('exact', 0.037), ('daum', 0.036), ('bias', 0.036), ('let', 0.036), ('zajic', 0.034), ('ayrg', 0.034), ('lmax', 0.034), ('woodsend', 0.034), ('structured', 0.034), ('pure', 0.034), ('deletions', 0.034), ('quality', 0.034), ('linear', 0.033), ('equation', 0.032), ('solved', 0.032), ('parameterize', 0.032), ('entails', 0.031), ('choices', 0.031), ('indicates', 0.03), ('smo', 0.03), ('passonneau', 0.03), ('solve', 0.029), ('subtrees', 0.028), ('content', 0.028), ('thejoint', 0.028), ('teufel', 0.028), ('sentence', 0.028), ('produced', 0.028), ('broken', 0.028), ('klein', 0.027), ('judged', 0.027), ('cast', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 187 acl-2011-Jointly Learning to Extract and Compress
Author: Taylor Berg-Kirkpatrick ; Dan Gillick ; Dan Klein
Abstract: We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a marginbased objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
2 0.2928099 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
Author: Dong Wang ; Yang Liu
Abstract: This paper presents a pilot study of opinion summarization on conversations. We create a corpus containing extractive and abstractive summaries of speaker’s opinion towards a given topic using 88 telephone conversations. We adopt two methods to perform extractive summarization. The first one is a sentence-ranking method that linearly combines scores measured from different aspects including topic relevance, subjectivity, and sentence importance. The second one is a graph-based method, which incorporates topic and sentiment information, as well as additional information about sentence-to-sentence relations extracted based on dialogue structure. Our evaluation results show that both methods significantly outperform the baseline approach that extracts the longest utterances. In particular, we find that incorporating dialogue structure in the graph-based method contributes to the improved system performance.
3 0.13956283 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
Author: Xiaojun Wan
Abstract: Cross-language document summarization is defined as the task of producing a summary in a target language (e.g. Chinese) for a set of documents in a source language (e.g. English). Existing methods for addressing this task make use of either the information from the original documents in the source language or the information from the translated documents in the target language. In this study, we propose to use the bilingual information from both the source and translated documents for this task. Two summarization methods (SimFusion and CoRank) are proposed to leverage the bilingual information in the graph-based ranking framework for cross-language summary extraction. Experimental results on the DUC2001 dataset with manually translated reference Chinese summaries show the effectiveness of the proposed methods. 1
4 0.13211684 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
Author: Samuel Brody ; Paul Kantor
Abstract: Common approaches to assessing document quality look at shallow aspects, such as grammar and vocabulary. For many real-world applications, deeper notions of quality are needed. This work represents a first step in a project aimed at developing computational methods for deep assessment of quality in the domain of intelligence reports. We present an automated system for ranking intelligence reports with regard to coverage of relevant material. The system employs methodologies from the field of automatic summarization, and achieves performance on a par with human judges, even in the absence of the underlying information sources.
5 0.13001595 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
Author: Asli Celikyilmaz ; Dilek Hakkani-Tur
Abstract: Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach to model the hidden abstract concepts across documents as well as the correlation between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy compared to benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a 44.1 ROUGE on the DUC-07 test set, roughly in the range of state-of-the-art supervised models.
6 0.1293803 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
7 0.1259902 76 acl-2011-Comparative News Summarization Using Linear Programming
8 0.1255451 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
9 0.11896233 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents
10 0.11766294 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles
11 0.11210857 4 acl-2011-A Class of Submodular Functions for Document Summarization
12 0.11106807 144 acl-2011-Global Learning of Typed Entailment Rules
13 0.11029626 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
14 0.10260098 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
15 0.088794224 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
16 0.085763767 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
17 0.074188478 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
18 0.073511563 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
19 0.072536707 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
20 0.06509015 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application
topicId topicWeight
[(0, 0.188), (1, 0.046), (2, -0.046), (3, 0.058), (4, -0.07), (5, -0.02), (6, -0.098), (7, 0.188), (8, -0.008), (9, -0.027), (10, -0.054), (11, 0.036), (12, -0.143), (13, 0.026), (14, -0.196), (15, -0.016), (16, 0.011), (17, -0.011), (18, 0.031), (19, 0.035), (20, -0.02), (21, -0.094), (22, 0.132), (23, 0.037), (24, -0.067), (25, -0.03), (26, -0.002), (27, -0.127), (28, 0.068), (29, 0.043), (30, 0.023), (31, -0.004), (32, 0.068), (33, -0.073), (34, 0.012), (35, 0.051), (36, 0.025), (37, 0.007), (38, -0.016), (39, -0.077), (40, 0.079), (41, 0.025), (42, 0.012), (43, -0.068), (44, -0.034), (45, 0.002), (46, -0.027), (47, -0.086), (48, 0.064), (49, 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.93882179 187 acl-2011-Jointly Learning to Extract and Compress
Author: Taylor Berg-Kirkpatrick ; Dan Gillick ; Dan Klein
Abstract: We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a marginbased objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
2 0.73709333 76 acl-2011-Comparative News Summarization Using Linear Programming
Author: Xiaojiang Huang ; Xiaojun Wan ; Jianguo Xiao
Abstract: Comparative News Summarization aims to highlight the commonalities and differences between two comparable news topics. In this study, we propose a novel approach to generating comparative news summaries. We formulate the task as an optimization problem of selecting proper sentences to maximize the comparativeness within the summary and the representativeness to both news topics. We consider semantic-related cross-topic concept pairs as comparative evidences, and consider topic-related concepts as representative evidences. The optimization problem is addressed by using a linear programming model. The experimental results demonstrate the effectiveness of our proposed model.
3 0.72326618 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents
Author: Charles Greenbacker
Abstract: We propose a framework for generating an abstractive summary from a semantic model of a multimodal document. We discuss the type of model required, the means by which it can be constructed, how the content of the model is rated and selected, and the method of realizing novel sentences for the summary. To this end, we introduce a metric called information density used for gauging the importance of content obtained from text and graphical sources.
4 0.70523357 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
Author: Samuel Brody ; Paul Kantor
Abstract: Common approaches to assessing document quality look at shallow aspects, such as grammar and vocabulary. For many real-world applications, deeper notions of quality are needed. This work represents a first step in a project aimed at developing computational methods for deep assessment of quality in the domain of intelligence reports. We present an automated system for ranking intelligence reports with regard to coverage of relevant material. The system employs methodologies from the field of automatic summarization, and achieves performance on a par with human judges, even in the absence of the underlying information sources.
5 0.70028758 4 acl-2011-A Class of Submodular Functions for Document Summarization
Author: Hui Lin ; Jeff Bilmes
Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
6 0.69088101 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
7 0.66111296 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles
8 0.65664655 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
9 0.63795966 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
10 0.6228413 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
11 0.61911517 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
12 0.55123824 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
13 0.48141724 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
14 0.45500717 291 acl-2011-SystemT: A Declarative Information Extraction System
15 0.45478168 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
16 0.44062215 150 acl-2011-Hierarchical Text Classification with Latent Concepts
17 0.43392354 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
18 0.42379755 130 acl-2011-Extracting Comparative Entities and Predicates from Texts Using Comparative Type Classification
19 0.40635157 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices
20 0.40085939 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System
topicId topicWeight
[(1, 0.021), (5, 0.028), (17, 0.06), (26, 0.032), (31, 0.011), (37, 0.079), (39, 0.067), (41, 0.056), (50, 0.205), (55, 0.047), (59, 0.049), (72, 0.038), (91, 0.047), (96, 0.157), (98, 0.016)]
simIndex simValue paperId paperTitle
1 0.93114674 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity
Author: Tony Veale
Abstract: Information retrieval (IR) and figurative language processing (FLP) could scarcely be more different in their treatment of language and meaning. IR views language as an open-ended set of mostly stable signs with which texts can be indexed and retrieved, focusing more on a text’s potential relevance than its potential meaning. In contrast, FLP views language as a system of unstable signs that can be used to talk about the world in creative new ways. There is another key difference: IR is practical, scalable and robust, and in daily use by millions of casual users. FLP is neither scalable nor robust, and not yet practical enough to migrate beyond the lab. This paper thus presents a mutually beneficial hybrid of IR and FLP, one that enriches IR with new operators to enable the non-literal retrieval of creative expressions, and which also transplants FLP into a robust, scalable framework in which practical applications of linguistic creativity can be implemented. 1
2 0.8524822 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
Author: Elias Ponvert ; Jason Baldridge ; Katrin Erk
Abstract: We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing—the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer’s (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
same-paper 3 0.82658011 187 acl-2011-Jointly Learning to Extract and Compress
Author: Taylor Berg-Kirkpatrick ; Dan Gillick ; Dan Klein
Abstract: We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a marginbased objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
4 0.75766385 4 acl-2011-A Class of Submodular Functions for Document Summarization
Author: Hui Lin ; Jeff Bilmes
Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.
5 0.7220602 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
6 0.71787047 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
7 0.71490467 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
8 0.71468002 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
9 0.71240681 178 acl-2011-Interactive Topic Modeling
10 0.71207047 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
11 0.71180952 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
12 0.71142733 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
13 0.71136725 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
14 0.71117055 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
15 0.71078509 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
16 0.71076548 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
17 0.7098065 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
18 0.70884418 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
19 0.70821464 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
20 0.70806706 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing