acl acl2013 acl2013-129 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. [sent-3, score-0.737]
2 We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. [sent-4, score-0.198]
3 Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. [sent-6, score-0.144]
4 In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality. [sent-7, score-0.368]
5 Consequently, automatically generated meeting summaries could be of great value to people and businesses alike by providing quick access to the essential content of past meetings. [sent-9, score-0.483]
6 Focused meeting summaries have been proposed as particularly useful; in contrast to summaries of a meeting as a whole, they refer to summaries of a specific aspect of a meeting, such as the DECISIONS reached, PROBLEMS discussed, PROGRESS made or ACTION ITEMS that emerged (Carenini et al. [sent-10, score-1.164]
7 Our goal is to provide an automatic summarization system that can generate abstract-style focused meeting summaries to help users digest the vast amount of meeting content in an easy manner. [sent-12, score-0.765]
8 Existing meeting summarization systems remain largely extractive: their summaries are comprised exclusively of patchworks of utterances selected directly from the meetings to be summarized (Riedhammer et al. [sent-13, score-0.859]
9 Although relatively easy to construct, extractive approaches fall short of producing concise and readable summaries, largely due to [sent-17, score-0.191]
10 In contrast, human-written meeting summaries are typically in the form of abstracts: distillations of the original conversation written in new language. [sent-28, score-0.688]
11 (2010b) showed that people demonstrate a strong preference for abstractive summaries over extracts when the text to be summarized is conversational. [sent-30, score-0.404]
12 Consider, for example, the two types of focused summary along with their associated dialogue snippets in Figure 1. [sent-31, score-0.254]
13 On the contrary, the manually composed summaries (abstracts) are more compact and readable, and are written in a distinctly non-conversational style. [sent-33, score-0.33]
14 To address the limitations of extract-based summaries, we propose a complete and fully automatic domain-independent abstract generation framework for focused meeting summarization. [sent-36, score-0.235]
15 , 2010; Konstas and Lapata, 2012), we first perform content selection: given the dialogue acts relevant to one element of the meeting (e. [sent-38, score-0.296]
16 After redundancy reduction, the full meeting abstract can thus comprise the focused summary for each meeting element. [sent-43, score-0.39]
17 As described in subsequent sections, the generation framework allows us to identify and reformulate the important information for the focused summary. [sent-44, score-0.148]
18 Our contributions are as follows: • To the best of our knowledge, our system is the first fully automatic system to generate natural language abstracts for spoken meetings. [sent-45, score-0.383]
19 We present a novel template extraction algorithm, based on Multiple Sequence Alignment (MSA) (Durbin et al. [sent-46, score-0.141]
20 ), we also take an initial step toward domain adaptation so that our summarization method does not need human-written abstracts for each new meeting domain (e. [sent-51, score-0.48]
21 We instantiate the abstract generation framework on two corpora from disparate domains: the AMI Meeting Corpus (McCowan et al. [sent-54, score-0.143]
22 , 2003) and produce systems to generate focused summaries with regard to four types of meeting elements: DECISIONs, PROBLEMs, ACTION ITEMs, and PROGRESS. [sent-56, score-0.488]
23 , 2002)) against manually generated focused summaries shows that our summarizers uniformly and statistically significantly outperform two baseline systems as well as a state-of-the-art supervised extraction-based system. [sent-58, score-0.441]
24 Human evaluation also indicates that the abstractive summaries produced by our systems are more linguistically appealing than those of the utterance-level extraction-based system, with judges preferring them over summaries from the extraction-based system of comparable semantic correctness (62. [sent-59, score-0.809]
25 Finally, we examine the generality of our model across domains for two types of focused summarization (decisions and problems) by training the summarizer on out-of-domain data (i. [sent-63, score-0.249]
26 The resulting systems yield results comparable to those from the same system trained on in-domain data, and statistically significantly outperform supervised extractive summarization approaches trained on in-domain data. [sent-66, score-0.288]
27 2 Related Work: Most research on spoken dialogue summarization attempts to generate summaries for full dialogues (Carenini et al. [sent-67, score-0.548]
28 Only recently has the task of focused summarization been studied. [sent-69, score-0.157]
29 Supervised methods have been investigated to identify key phrases or utterances for inclusion in the decision summary (Fernández et al. [sent-70, score-0.192]
30 Our research is also in line with generating abstractive summaries for conversations. [sent-75, score-0.404]
31 It takes as input a cluster of meeting-item-specific dialogue acts, from which one focused summary is constructed. [sent-83, score-0.242]
32 Sample relation instances are denoted in bold; the indicators are in [brackets] and the arguments are further italicized. [sent-84, score-0.223]
33 Summary-worthy relation instances are identified by the content selection module (see Section 4) and then filled into the learned templates individually. [sent-85, score-0.396]
34 A statistical ranker subsequently selects one best abstract per relation instance (see Section 5. [sent-86, score-0.169]
35 The post-selection component reduces the redundancy and outputs the final summary (see Section 5. [sent-88, score-0.145]
36 This work is also broadly related to expert system-based language generation (Reiter and Dale, 2000) and concept-to-text generation tasks (Angeli et al. [sent-93, score-0.154]
37 , 2010; Konstas and Lapata, 2012), where the generation process is decomposed into content selection (or text planning) and surface realization. [sent-94, score-0.216]
38 They generate texts based on a series of decisions made to select the records, fields, and proper templates for rendering. [sent-97, score-0.171]
39 3 Framework: Our domain-independent abstract generation framework produces a summarizer that generates a grammatical abstract from a cluster of meeting-element-related dialogue acts (DAs): all utterances associated with a single decision, problem, action item or progress step of interest. [sent-101, score-0.444]
40 Given the DA cluster to be summarized, the Content Selection module identifies a set of summary-worthy relation instances represented as indicator-argument pairs (i. [sent-106, score-0.141]
41 In the first step, each relation instance is filled into templates with disparate structures that are learned automatically from the training set (Template Filling). [sent-110, score-0.302]
42 A statistical ranker then selects one best abstract per relation instance (Statistical Ranking). [sent-111, score-0.169]
43 Finally, selected abstracts are processed for redundancy removal in Post-Selection. [sent-112, score-0.329]
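To make the pipeline in this overview concrete, the following is a minimal, hypothetical Python sketch of its four stages (content selection, template filling, statistical ranking, post-selection). Every function body here is a toy placeholder standing in for the paper's actual models; names such as select_relation_instances and the [IND]/[ARG] slot markers are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the abstract generation pipeline:
# content selection -> template filling -> ranking -> post-selection.
# Data structures and scoring are toy placeholders.

def select_relation_instances(dialogue_acts):
    """Return (indicator, argument) pairs judged summary-worthy (trivial stub)."""
    pairs = []
    for da in dialogue_acts:
        words = da.split()
        if len(words) >= 2:                      # stand-in for a real classifier
            pairs.append((words[0], " ".join(words[1:])))
    return pairs

def fill_templates(pair, templates):
    """Overgenerate: fill one relation instance into every template."""
    indicator, argument = pair
    return [t.replace("[IND]", indicator).replace("[ARG]", argument)
            for t in templates]

def rank(candidates):
    """Statistical-ranker stand-in: prefer the shortest candidate."""
    return min(candidates, key=lambda c: len(c.split()))

def post_select(abstracts, max_overlap=0.5):
    """Drop abstracts whose unigram overlap with an already kept one is high."""
    kept = []
    for a in abstracts:
        a_words = set(a.lower().split())
        if all(len(a_words & set(k.lower().split())) / max(len(a_words), 1) < max_overlap
               for k in kept):
            kept.append(a)
    return kept

templates = ["The group decided to use [ARG].",
             "[IND] will be based on [ARG]."]
das = ["use a rubber case", "use a rubber shell for the remote"]

pairs = select_relation_instances(das)
best_per_pair = [rank(fill_templates(p, templates)) for p in pairs]
print(post_select(best_per_pair))
```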
44 4 Content Selection: Phrase-based content selection approaches have been shown to support better meeting summaries (Fernández et al. [sent-114, score-0.52]
45 Therefore, we chose a content selection representation of a finer granularity than an utterance: we identify relation instances that can both effectively detect the crucial content and incorporate enough syntactic information to facilitate the downstream surface realization. [sent-116, score-0.332]
46 A valid indicator-argument pair should have at least one content word and satisfy one of the following constraints: • When the indicator is a noun, the argument has to be a modifier or complement of the indicator. [sent-125, score-0.154]
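Only one of the constraints survives in this extract, but a small sketch may help show how such a validity check could be applied over a toy dependency representation. The token structure, POS tags, dependency labels, and stopword list below are assumptions made for illustration, not the paper's code.

```python
# Toy check for a valid indicator-argument pair.
# Tokens: (word, POS, head_index, dep_label); indices are positions in the utterance.

STOPWORDS = {"the", "a", "of", "to", "with"}        # illustrative stopword list

def has_content_word(words):
    return any(w.lower() not in STOPWORDS for w in words)

def is_valid_pair(indicator, argument, tokens):
    """indicator/argument are lists of token indices into `tokens`."""
    ind_words = [tokens[i][0] for i in indicator]
    arg_words = [tokens[i][0] for i in argument]
    if not has_content_word(ind_words + arg_words):
        return False
    ind_head = indicator[0]
    # Constraint quoted in the extract: a noun indicator must take the argument
    # as its modifier or complement (approximated via dependency attachment).
    if tokens[ind_head][1].startswith("NN"):
        return any(tokens[a][2] == ind_head and tokens[a][3] in {"mod", "comp"}
                   for a in argument)
    return True   # other constraints from the paper are not shown in this extract

# "case" (noun) with argument "rubber" attached as a modifier
toks = [("a", "DT", 2, "det"), ("rubber", "JJ", 2, "mod"), ("case", "NN", -1, "root")]
print(is_valid_pair([2], [1], toks))   # True
```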
47 5 Surface Realization: In this section, we describe surface realization, which renders the relation instances into natural language abstracts. [sent-133, score-0.163]
48 Once the templates are learned, the relation instances from Section 4 are filled into the templates to generate an abstract (see Section 5. [sent-136, score-0.414]
49 Every basic or content feature is concatenated with the constituent tags of indicator and argument to compose a new one. [sent-142, score-0.224]
50 Template extraction starts with clustering the sentences that constitute the manually generated abstracts in the training data according to their lexical and structural similarity. [sent-148, score-0.271]
51 Intuitively, desirable templates are those that can be applied in different domains to generate the same type of focused summary (e. [sent-150, score-0.279]
52 We also replace words that appear in both the abstract and supporting dialogue acts by a label indicating their phrase type. [sent-157, score-0.143]
53 Figure 3: Example of template extraction by Multiple-Sequence Alignment for problem abstracts from AMI. [sent-168, score-0.412]
54 To transform the resulting MSAs into templates, we need to decide whether a word in the sentence should be retained to comprise the template or abstracted. [sent-188, score-0.141]
55 We then create a FINE template for each sentence by abstracting the non-backbone words, i. [sent-190, score-0.177]
56 We also create a COARSE template that only preserves the nodes shared by all of the cluster’s sentences. [sent-193, score-0.141]
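A minimal sketch of how FINE and COARSE templates could be read off a column-aligned cluster of abstracts. The gap symbol, the majority-based backbone test, and the [SLOT] label are simplifying assumptions; the paper derives backbones from the MSA itself rather than from this toy column view.

```python
# Toy derivation of FINE/COARSE templates from column-aligned abstracts.
# "-" marks an alignment gap; "[SLOT]" abstracts non-backbone material.

aligned = [
    ["the", "battery",   "is", "too", "weak"],
    ["the", "interface", "is", "-",   "confusing"],
    ["the", "design",    "is", "too", "plain"],
]

def column_words(col):
    return [row[col] for row in aligned if row[col] != "-"]

n_cols = len(aligned[0])
backbone = []
for c in range(n_cols):
    words = column_words(c)
    # backbone column: a single word shared by a majority of the cluster's sentences
    backbone.append(len(set(words)) == 1 and len(words) * 2 > len(aligned))

# FINE: keep backbone words, abstract everything else into a slot, per sentence.
fine = [" ".join(w if backbone[c] else "[SLOT]"
                 for c, w in enumerate(row) if w != "-")
        for row in aligned]

# COARSE: keep only columns shared (identically) by all sentences.
coarse = " ".join(aligned[0][c] for c in range(n_cols)
                  if backbone[c] and all(row[c] != "-" for row in aligned))

print(fine)    # ['the [SLOT] is too [SLOT]', 'the [SLOT] is [SLOT]', 'the [SLOT] is too [SLOT]']
print(coarse)  # 'the is'  (only fully shared columns survive in this toy example)
```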
57 Since filling the relation instances into templates of distinct structures may result in abstracts of varying quality, we rank the abstracts based on the features of the template, the transformation conducted, and the generated abstract. [sent-201, score-0.83]
58 For each ⟨ind_i, arg_i⟩, the overgenerate-and-rank approach fills it into each template in T by applying F to generate all possible abstracts. [sent-205, score-0.141]
59 Post-selection is conducted on the abstracts {abs_i}, i = 1, ..., N, to form the final summary. [sent-207, score-0.271]
60 The transformation function F models the constituent-level transformations of relation instances and their mappings to the parse trees of templates. [sent-209, score-0.167]
61 Full-Constituent Mapping denotes that a source constituent is mapped directly to a target constituent of the template parse tree with the same tag. [sent-211, score-0.324]
62 For instance, an argument “with a spinning wheel” (PP) can be mapped to an NP in a template because it has a sub-constituent “a spinning wheel” (NP). [sent-214, score-0.321]
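A hypothetical sketch of the sub-constituent check behind that example, using a minimal bracketed-tree representation; the tree encoding and function names are illustrative assumptions, not the paper's implementation.

```python
# Toy constituent-mapping check: a source constituent can fill a target slot
# if it carries the same tag, or contains a sub-constituent with that tag.

# A constituent is (tag, children); children are constituents or word strings.
pp = ("PP", ["with", ("NP", ["a", "spinning", "wheel"])])

def find_subconstituent(node, tag):
    """Return the first (sub-)constituent labelled `tag`, or None."""
    if isinstance(node, str):
        return None
    node_tag, children = node
    if node_tag == tag:
        return node
    for child in children:
        hit = find_subconstituent(child, tag)
        if hit is not None:
            return hit
    return None

def can_fill(source, slot_tag):
    return find_subconstituent(source, slot_tag) is not None

print(can_fill(pp, "NP"))   # True: "a spinning wheel" is an NP inside the PP
print(can_fill(pp, "VP"))   # False
```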
63 To obtain the realized abstract, we traverse the parse tree of the filled template in pre-order. [sent-220, score-0.186]
64 We first find the corresponding constituent pair of ⟨ind_k^tar, arg_k^tar⟩ in a. [sent-227, score-0.151]
65 Given the generated abstracts A, post-selection is conducted to generate the final summary. [sent-234, score-0.308]
66 We define w_ij as the unigram similarity between abstracts abs_i and abs_j, and C(abs_i) as the number of words in abs_i. [sent-236, score-0.352]
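The extract does not show the objective these quantities feed into, so the following is only an assumed use of them: compute w_ij and C(abs_i), then greedily drop abstracts that overlap too much with an already selected one. The normalisation and threshold are assumptions for illustration.

```python
# Computing w_ij (unigram similarity) and C(abs_i) (word count) for post-selection.
# The greedy redundancy filter below is an assumed use of these quantities;
# the extract does not show the paper's actual selection objective.

def C(abstract):
    return len(abstract.split())

def w(abs_i, abs_j):
    """Unigram similarity: overlap normalised by the smaller abstract (assumption)."""
    a, b = set(abs_i.lower().split()), set(abs_j.lower().split())
    return len(a & b) / max(min(len(a), len(b)), 1)

def post_select(abstracts, threshold=0.4):
    kept = []
    for cand in sorted(abstracts, key=C, reverse=True):   # longer abstracts first
        if all(w(cand, k) < threshold for k in kept):
            kept.append(cand)
    return kept

abstracts = [
    "the group decided to use a rubber case",
    "a rubber case will be used",
    "the remote will have a spinning wheel",
]
print(post_select(abstracts))   # the near-duplicate "a rubber case will be used" is dropped
```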
67 , 2005) contains 139 scenario-driven meetings, where groups of four people participate in a series of four meetings for a fictitious project of designing a remote control. [sent-241, score-0.297]
68 We use 57 meetings in ICSI and 139 meetings in AMI that include a short (usually one-sentence), manually constructed abstract summarizing each important output for every meeting. [sent-247, score-0.594]
69 Decision and problem summaries are annotated for both corpora. [sent-248, score-0.33]
70 The set of dialogue acts that support each abstract is annotated as such. [sent-250, score-0.143]
71 X-axis shows the number of meetings used for training. [sent-256, score-0.297]
72 We compare our system with (1) two unsupervised baselines, (2) two supervised extractive approaches, and (3) an oracle derived from the gold standard abstracts. [sent-260, score-0.202]
73 the DA with the largest TFIDF similarity with the cluster centroid) as the summary according to Wang and Cardie (2011). [sent-266, score-0.171]
74 Although it is possible that important content is spread over multiple DAs, both baselines allow us to determine summary quality when summaries are restricted to a single utterance. [sent-267, score-0.483]
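A minimal sketch of the second baseline described above: pick the dialogue act whose TF-IDF vector is most similar (cosine) to the cluster centroid. The TF-IDF weighting scheme and the absence of any preprocessing are simplifying assumptions, not details taken from the paper.

```python
# Centroid baseline sketch: pick the DA closest (cosine, TF-IDF) to the cluster centroid.
import math
from collections import Counter

def tfidf_vectors(docs):
    df = Counter(w for d in docs for w in set(d.split()))
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(d.split()).items()}
            for d in docs]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid_baseline(dialogue_acts):
    vecs = tfidf_vectors(dialogue_acts)
    centroid = Counter()
    for v in vecs:
        for w, x in v.items():
            centroid[w] += x / len(vecs)
    best = max(range(len(vecs)), key=lambda i: cosine(vecs[i], centroid))
    return dialogue_acts[best]

cluster = ["maybe we should go with rubber",
           "rubber buttons and a rubber case",
           "yeah rubber is good"]
print(centroid_baseline(cluster))
```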
75 We also compare our approach to two supervised extractive summarization methods: Support Vector Machines (Joachims, 1998) trained with the same features as our system (see Table 1) to identify the important DAs (no syntax features) (Xie et al. [sent-269, score-0.288]
76 ROUGE computes the n-gram overlap between the system summaries and the reference summaries, and has been used for both text and speech summarization (Dang, 2005; Xie et al. [sent-277, score-0.454]
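For reference, a minimal ROUGE-1 recall computation (clipped unigram overlap of a system summary against a reference); this is the textbook definition, not the exact toolkit configuration used in the paper's experiments.

```python
# Minimal ROUGE-1 recall: clipped unigram matches over reference length.
from collections import Counter

def rouge_1_recall(system, reference):
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(c, sys_counts[w]) for w, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

ref = "the group decided to use a rubber case for the remote"
sys = "the group will use a rubber case"
print(round(rouge_1_recall(sys, ref), 3))
```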
77 In AMI, four meetings of different functions are carried out in each group5. [sent-280, score-0.297]
78 35 meetings for “conceptual design” are randomly selected for testing. [sent-281, score-0.297]
79 The learning curve of our system is relatively flat, which means not many training meetings are required to reach a usable performance level. [sent-284, score-0.335]
80 Note that the ROUGE scores are relatively low when the reference summaries are human abstracts, even for evaluation among abstracts produced by different annotators (Dang, 2005). [sent-285, score-0.601]
81 , 2002) (the precision of unigrams and bigrams with a brevity penalty) is computed with human abstracts as reference. [sent-290, score-0.271]
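A compact sketch of BLEU as characterised in that line: clipped unigram and bigram precision combined with a brevity penalty. The crude smoothing and single-reference handling below are simplifying assumptions, not the paper's evaluation setup.

```python
# BLEU-2 sketch: geometric mean of clipped 1/2-gram precision times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_2(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in (1, 2):
        c_ngr, r_ngr = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, r_ngr[g]) for g, c in c_ngr.items())
        total = max(sum(c_ngr.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)   # crude smoothing
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))

print(round(bleu_2("the group will use a rubber case",
                   "the group decided to use a rubber case"), 3))
```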
82 5The four types of meetings in AMI are: project kick-off (35 meetings), functional design (35 meetings), conceptual design (35 meetings), and detailed design (34 meetings). [sent-294, score-0.297]
83 SVM-DA denotes the supervised extractive method with SVMs at the utterance level. [sent-296, score-0.164]
84 We are not aware of any existing work generating abstractive summaries for conversations. [sent-297, score-0.404]
85 Therefore, we compare our full system against a supervised utterance-level extractive method based on SVMs along with the baselines. [sent-298, score-0.202]
86 We further examine our system in domain adaptation scenarios for decision and problem summarization, where we train the system on AMI for use on ICSI, and vice versa. [sent-301, score-0.158]
87 Table 3 indicates that, with both true clusterings and system clusterings, our system trained on out-of-domain data achieves performance comparable to the same system trained on in-domain data. [sent-302, score-0.188]
88 The average Length of the abstracts for each system is also listed. [sent-324, score-0.309]
89 Each cluster of DAs, along with three randomly ordered summaries, is presented to the judges. [sent-326, score-0.414]
90 Results in Table 4 demonstrate that our system summaries are significantly more compact and fluent than those of the extract-based method (p < 0. [sent-332, score-0.368]
91 The judges also rank the three summaries in terms of the overall quality in content, conciseness and grammaticality. [sent-334, score-0.33]
92 8 Conclusion: We presented a domain-independent abstract generation framework for focused meeting summarization. [sent-342, score-0.235]
93 Experimental results on two disparate meeting corpora show that our system can uniformly outperform state-of-the-art supervised extraction-based systems. [sent-343, score-0.191]
94 Figure 6: Sample decision and problem summaries generated by various systems for examples in Figure 1. [sent-347, score-0.376]
95 Our system also exhibits an ability to train on out-of-domain data to generate abstracts for a new target domain. [sent-349, score-0.309]
96 Extracting decisions from multi-party dialogue using directed graphical models and semantic similarity. [sent-365, score-0.146]
97 A skip-chain conditional random field for ranking meeting utterances by importance. [sent-393, score-0.146]
98 From extractive to abstractive meeting summaries: can it be done by sentence compression? [sent-458, score-0.211]
99 Generating and validating abstracts of meeting conversations: a user study. [sent-496, score-0.358]
100 Evaluating the effectiveness of features and sampling in extractive meeting summarization. [sent-556, score-0.211]
wordName wordTfidf (topN-words)
[('summaries', 0.33), ('meetings', 0.297), ('abstracts', 0.271), ('slotvp', 0.263), ('slotnp', 0.202), ('ami', 0.177), ('icsi', 0.165), ('murray', 0.148), ('template', 0.141), ('msa', 0.132), ('extractive', 0.124), ('templates', 0.121), ('argsrci', 0.101), ('hindsrc', 0.101), ('angeli', 0.099), ('ranker', 0.099), ('carenini', 0.099), ('dialogue', 0.096), ('xie', 0.092), ('stroudsburg', 0.089), ('meeting', 0.087), ('summary', 0.087), ('summarization', 0.086), ('cluster', 0.084), ('absi', 0.081), ('argktari', 0.081), ('hindtkran', 0.081), ('mccowan', 0.081), ('slotwhadjp', 0.081), ('fern', 0.078), ('generation', 0.077), ('clusterings', 0.074), ('gabriel', 0.074), ('abstractive', 0.074), ('bui', 0.071), ('focused', 0.071), ('pa', 0.071), ('constituent', 0.07), ('relation', 0.07), ('cardie', 0.067), ('content', 0.066), ('disparate', 0.066), ('das', 0.062), ('durbin', 0.061), ('indtkar', 0.061), ('janin', 0.061), ('da', 0.059), ('utterances', 0.059), ('konstas', 0.059), ('rouge', 0.059), ('redundancy', 0.058), ('instances', 0.057), ('subsumed', 0.057), ('giuseppe', 0.057), ('andez', 0.054), ('riedhammer', 0.054), ('decisions', 0.05), ('adjp', 0.049), ('spinning', 0.049), ('indicator', 0.049), ('acts', 0.047), ('decision', 0.046), ('filled', 0.045), ('mapped', 0.043), ('bleu', 0.043), ('np', 0.043), ('summarizer', 0.042), ('barzilay', 0.042), ('argii', 0.04), ('argktar', 0.04), ('argsrc', 0.04), ('argtkrani', 0.04), ('frampton', 0.04), ('hindtkar', 0.04), ('hwash', 0.04), ('noldus', 0.04), ('sandu', 0.04), ('slotsbar', 0.04), ('supervised', 0.04), ('wang', 0.04), ('transformation', 0.04), ('realization', 0.04), ('argument', 0.039), ('action', 0.039), ('system', 0.038), ('svms', 0.038), ('selection', 0.037), ('coarse', 0.037), ('correctness', 0.037), ('raymond', 0.037), ('heilman', 0.036), ('sigdial', 0.036), ('surface', 0.036), ('adaptation', 0.036), ('spoken', 0.036), ('wheel', 0.036), ('needleman', 0.036), ('eligible', 0.036), ('abstracting', 0.036), ('dowding', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999869 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
2 0.16632098 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
3 0.16187538 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.
4 0.14599527 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova
Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.
5 0.14412877 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
6 0.14115976 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
7 0.12779506 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
8 0.11844238 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
9 0.11644334 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
10 0.11571974 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
11 0.10229472 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
12 0.096607625 333 acl-2013-Summarization Through Submodularity and Dispersion
13 0.094264813 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
14 0.092685014 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
15 0.081910782 126 acl-2013-Diverse Keyword Extraction from Conversations
16 0.080893524 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
17 0.074546911 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
18 0.074418135 317 acl-2013-Sentence Level Dialect Identification in Arabic
19 0.073732942 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
20 0.073458321 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
topicId topicWeight
[(0, 0.213), (1, 0.029), (2, 0.003), (3, -0.066), (4, -0.006), (5, 0.043), (6, 0.111), (7, -0.027), (8, -0.148), (9, -0.06), (10, -0.042), (11, 0.034), (12, -0.133), (13, 0.04), (14, -0.124), (15, 0.131), (16, 0.148), (17, -0.131), (18, 0.018), (19, 0.048), (20, -0.053), (21, -0.103), (22, 0.044), (23, 0.021), (24, 0.048), (25, -0.021), (26, 0.08), (27, -0.005), (28, 0.05), (29, -0.011), (30, -0.007), (31, 0.007), (32, 0.103), (33, -0.014), (34, -0.063), (35, 0.05), (36, -0.027), (37, 0.015), (38, -0.038), (39, 0.04), (40, 0.041), (41, 0.08), (42, 0.051), (43, -0.004), (44, 0.052), (45, -0.031), (46, 0.07), (47, -0.049), (48, -0.053), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.91925144 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.
3 0.82787848 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova
Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.
4 0.79143369 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
5 0.76165664 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
6 0.73538297 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
7 0.72966379 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
8 0.69139373 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
9 0.66513449 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
10 0.64597315 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
11 0.63584203 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
12 0.58833462 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
13 0.56550878 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
14 0.56147897 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
15 0.54582822 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
16 0.53902966 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
17 0.52607059 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
18 0.52239877 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
19 0.51318473 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures
20 0.51294053 225 acl-2013-Learning to Order Natural Language Texts
topicId topicWeight
[(0, 0.045), (6, 0.069), (9, 0.268), (11, 0.062), (14, 0.011), (15, 0.02), (24, 0.048), (26, 0.04), (35, 0.072), (42, 0.06), (48, 0.035), (70, 0.045), (88, 0.047), (90, 0.03), (95, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.76954293 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
2 0.64867526 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
Author: Trevor Cohn ; Gholamreza Haffari
Abstract: Modern phrase-based machine translation systems make extensive use of wordbased translation models for inducing alignments from parallel corpora. This is problematic, as the systems are incapable of accurately modelling many translation phenomena that do not decompose into word-for-word translation. This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. Overall this leads to a model which learns translations of entire sentences, while also learning their decomposition into smaller units (phrase-pairs) recursively, terminating at word translations. Our experiments on Arabic, Urdu and Farsi to English demonstrate improvements over competitive baseline systems.
3 0.54785508 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
4 0.54655939 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task.
5 0.5449478 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
6 0.54456854 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
7 0.54094136 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
8 0.53984332 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
9 0.53900176 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
10 0.53821826 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
11 0.53668368 250 acl-2013-Models of Translation Competitions
12 0.53640139 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
13 0.53634501 318 acl-2013-Sentiment Relevance
14 0.53605771 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
15 0.53593916 172 acl-2013-Graph-based Local Coherence Modeling
16 0.53565514 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
17 0.53411818 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning
18 0.53381193 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
19 0.53366327 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
20 0.53356439 267 acl-2013-PARMA: A Predicate Argument Aligner