acl acl2013 acl2013-225 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jiwei Tan ; Xiaojun Wan ; Jianguo Xiao
Abstract: Ordering texts is an important task for many NLP applications. Most previous works on summary sentence ordering rely on the contextual information (e.g. adjacent sentences) of each sentence in the source document. In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. We also propose to use the genetic algorithm to determine the total order of all sentences. Evaluation results on a news corpus show the effectiveness of our proposed method. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Ordering texts is an important task for many NLP applications. [sent-4, score-0.04]
2 Most previous works on summary sentence ordering rely on the contextual information (e. [sent-5, score-0.599]
3 adjacent sentences) of each sentence in the source document. [sent-7, score-0.09]
4 In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. [sent-8, score-0.609]
5 We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. [sent-9, score-0.246]
6 We also propose to use the genetic algorithm to determine the total order of all sentences. [sent-10, score-0.346]
7 Evaluation results on a news corpus show the effectiveness of our proposed method. [sent-11, score-0.051]
8 1 Introduction Ordering texts is an important task in many natural language processing (NLP) applications. [sent-12, score-0.04]
9 It is typically applicable in the text generation field, both for concept-to-text generation and text-to-text generation (Lapata, 2003), such as multiple document summarization (MDS), question answering and so on. [sent-13, score-0.201]
10 However, ordering a set of sentences into a coherent text is still a hard and challenging problem for computers. [sent-14, score-0.53]
11 Previous works on sentence ordering mainly focus on the MDS task (Barzilay et al. [sent-15, score-0.547]
12 In this task, each summary sentence is extracted from a source document. [sent-24, score-0.142]
13 The timestamp of the source documents and the adjacent sentences in the source documents can be used as important clues for ordering summary sentences. [sent-25, score-0.53]
14 In this study, we investigate a more challenging and more general task of ordering a set of unordered sentences (e. [sent-26, score-0.609]
15 sentences in a text paragraph) without any contextual information. [sent-29, score-0.057]
16 This task can be applied to almost all text generation applications without restriction. [sent-30, score-0.054]
17 In order to address this challenging task, we first introduce a few useful features to characterize the order and coherence of natural language texts, and then propose to use the learning to rank algorithm to determine the order of two sentences. [sent-31, score-0.341]
18 Moreover, we propose to use the genetic algorithm to decide the overall text order. [sent-32, score-0.383]
19 Evaluations are conducted on a news corpus, and the results show the prominence of our method. [sent-33, score-0.051]
20 2 Related Work For works that make no use of the source document, Lapata (2003) proposed a probabilistic model which learns constraints on sentence ordering from a corpus of texts. [sent-35, score-0.547]
21 However, the model only works well when using a single feature; unfortunately, it becomes worse when multiple features are combined. [sent-37, score-0.08]
22 Nahnsen (2009) employed features which were based on discourse entities, shallow syntactic analysis, and temporal precedence relations retrieved from VerbOcean. [sent-39, score-0.114]
23 3 Our Proposed Method 3.1 Overview The task of text ordering can be modeled as in (Cohen et al. [sent-42, score-0.421]
24 , 1998), as measuring the coherence of a text by summing the association strengths of all sentence pairs. [sent-43, score-0.206]
25 Then the objective of a text ordering model is to find a permutation which can maximize the summation. [sent-44, score-0.622]
26 Formally, we define an association strength function PREF(u, v) ∈ R to measure how strongly sentence u should be arranged before sentence v (denoted as u ≻ v). [sent-47, score-0.384]
27 We then define the function AGREE(ρ, PREF) as: AGREE(ρ, PREF) = Σ_{u,v: ρ(u) > ρ(v)} PREF(u, v) (1), where ρ denotes a sentence permutation and ρ(u) > ρ(v) means u ≻ v. [sent-49, score-0.339]
28 Then the objective of finding an overall order of the sentences becomes finding a permutation ρ to maximize AGREE(ρ, PREF). [sent-51, score-0.298]
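To make the objective concrete, here is a minimal Python sketch (not from the paper) of computing AGREE for a candidate permutation; `pref` is a hypothetical n × n matrix of pairwise PREF scores.

```python
# Minimal sketch of the AGREE objective in Eq. (1).
# pref[u][v] is assumed to hold PREF(u, v), the learned strength of
# arranging sentence u before sentence v; perm lists sentence indices
# in their proposed order.

def agree(perm, pref):
    total = 0.0
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            u, v = perm[i], perm[j]
            total += pref[u][v]  # u is placed before v in this permutation
    return total
```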
29 The main framework is made up of two parts: defining a pairwise order relation and determining an overall order. [sent-52, score-0.138]
30 Our study focuses on both parts, by learning a better pairwise relation and proposing a better search strategy, as described respectively in the next sections. [sent-53, score-0.103]
31 3.2 Pairwise Relation Learning The goal of pairwise relation learning is to define the strength function PREF for any sentence pair. [sent-55, score-0.287]
32 Method: Traditionally, there are two main methods for defining a strength function: integrating features by a linear combination (He et al. [sent-57, score-0.148]
33 However, the binary classification method is very coarse-grained since it considers any pair of sentences as either “positive” or “negative”. [sent-61, score-0.106]
34 Instead, we propose to use a learning to rank model to integrate multiple features. [sent-62, score-0.043]
35 In this study, we use Ranking SVM implemented in the svmrank toolkit (Joachims, 2002; Joachims, 2006) as the ranking model. [sent-63, score-0.171]
36 The examples to be ranked in our ranking model are sequential sentence pairs like u ≻ v. [sent-64, score-0.235]
37 The feature values for a training example are generated by a few feature functions fi(u, v), and we will introduce the features later. [sent-66, score-0.044]
38 We build the training examples for svmrank as follows: For a training query, which is a paragraph with n sequential sentences s1 ≻ s2 ≻ … ≻ sn, [sent-67, score-0.286]
39 For a pair sa ≻ sa+k (k > 0), the target rank values are set to n − k, which means that the longer the distance between the two sentences is, the smaller the target value is. [sent-75, score-0.1]
40 In order to better capture the order information of each feature, for every sentence pair u ≻ v we use both fi(u, v) and fi(v, u) as feature values. [sent-78, score-0.09]
41 A paragraph of unordered sentences is viewed as a test query, and the predicted target value for u ≻ v is used as PREF(u, v). [sent-88, score-0.285]
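As a hedged illustration of this construction (the feature set and file layout are simplified stand-ins, not the paper's exact Table 1 features), one svmrank training query could be generated like this; note the mirrored fi(v, u) values, following the direction-capturing trick described above.

```python
# Sketch of building one svm_rank training query from an ordered paragraph.
# feature_fns is a hypothetical list of functions f(u, v) -> float in [0, 1].

def training_lines(sentences, feature_fns, qid):
    n = len(sentences)
    lines = []
    for a in range(n):
        for k in range(1, n - a):
            u, v = sentences[a], sentences[a + k]
            target = n - k  # closer pairs receive higher target rank values
            feats = [f(u, v) for f in feature_fns] + [f(v, u) for f in feature_fns]
            feat_str = " ".join(f"{i + 1}:{x:.4f}" for i, x in enumerate(feats))
            lines.append(f"{target} qid:{qid} {feat_str}")
    return lines
```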
42 Features: We select four types of features to characterize text coherence. [sent-90, score-0.107]
43 Each type of feature is quantified with several functions, distinguished by i in the formulation fi(u, v), and normalized to [0, 1]. [sent-91, score-0.095]
44 The features and definitions of fi(u, v) are introduced in Table 1. [sent-92, score-0.044]
45 For the coreference features we use the ARKref tool. [sent-95, score-0.132]
46 It can output the coreference chains containing words which represent the same entity for two sequential sentences u ≻ v. [sent-96, score-0.195]
47 The probability model originates from Lapata (2003), and we implement it with four features: lemmatized noun, verb, adjective or adverb, and verb- and noun-related dependencies. [sent-98, score-0.131]
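As a purely illustrative example of one such normalized feature function (a hypothetical stand-in, not a reproduction of any definition in Table 1):

```python
def f_overlap(u, v):
    """Word-overlap feature for the sentence pair (u, v), normalized to [0, 1].

    u and v are token lists; this is an illustrative stand-in for the
    feature definitions in the paper's Table 1.
    """
    su, sv = set(u), set(v)
    if not su or not sv:
        return 0.0
    return len(su & sv) / min(len(su), len(sv))
```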
48 Cohen et al. (1998) proved that finding a permutation ρ maximizing AGREE(ρ, PREF) is NP-complete. [sent-101, score-0.201]
49 To solve this, they proposed a greedy algorithm for finding an approximately optimal order. [sent-102, score-0.118]
50 Most later works adopted the greedy search strategy to determine the overall order. [sent-103, score-0.292]
51 However, a greedy algorithm does not always lead to satisfactory results, as our experiment shows in Section 4. [sent-104, score-0.118]
52 Therefore, we propose to use the genetic algorithm (Holland, 1992) as the search strategy, which can lead to better results. [sent-106, score-0.355]
53 Genetic Algorithm: The genetic algorithm (GA) is an artificial intelligence algorithm for optimization and search problems. [sent-107, score-0.398]
54 The key point of using a GA is modeling the individual, the fitness function, and the three operators of crossover, mutation and selection. [sent-108, score-0.305]
55 Once a problem is modeled, the algorithm can be constructed conventionally. [sent-109, score-0.043]
56 In our method we set a permutation ρ as an individual encoded by a numerical path, for example a permutation s2 ≻ [sent-110, score-0.322]
57 Then the function AGREE(ρ, PREF) is just the fitness function. [sent-113, score-0.191]
58 We adopt the order-based crossover operator which is described in (Davis, 1985). [sent-114, score-0.199]
59 The mutation operator is a random inversion of two sentences. [sent-115, score-0.208]
60 For the selection operator we use tournament selection, which randomly selects two individuals and keeps the one with the greater fitness value AGREE(ρ, PREF). [sent-116, score-0.34]
61 After several generations of evolution, the individual with the greatest fitness value will be a close solution to the optimal result. [sent-121, score-0.152]
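Putting the pieces together, here is a hedged sketch of the GA loop described above (it reuses the `agree` function sketched earlier; the hyperparameter defaults are placeholders, not the paper's settings):

```python
import random

def order_crossover(p1, p2):
    """Order-based crossover (after Davis, 1985): keep a random subset of
    positions from parent 1 and fill the rest in parent 2's relative order."""
    n = len(p1)
    keep = set(random.sample(range(n), n // 2))
    kept = {p1[i] for i in keep}
    fill = iter(x for x in p2 if x not in kept)
    return [p1[i] if i in keep else next(fill) for i in range(n)]

def mutate(perm):
    """Mutation: randomly invert (swap) two sentences in place."""
    i, j = random.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def tournament(pop, pref):
    """Tournament selection: of two random individuals, keep the fitter one."""
    a, b = random.sample(pop, 2)
    return max(a, b, key=lambda p: agree(p, pref))

def evolve(pref, n, pop_size=50, generations=200, pc=0.8, pm=0.1):
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, pref), tournament(pop, pref)
            child = order_crossover(p1, p2) if random.random() < pc else p1[:]
            if random.random() < pm:
                mutate(child)
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=lambda p: agree(p, pref))
```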
62 Comparisons: Our method is not directly comparable with other methods for summary sentence ordering that rely on a special summarization corpus, so we implemented Lapata's probability model for comparison, which is considered the state of the art for this task. [sent-126, score-0.602]
63 In addition, we implemented a random ordering as a baseline. [sent-127, score-0.421]
64 We also tried to use a classification model in place of the ranking model. [sent-128, score-0.144]
65 In the classification model, sentence pairs like sa ≻ [sent-129, score-0.306]
66 sa+1 were viewed as positive examples and all other pairs were viewed as negative examples. [sent-130, score-0.092]
67 When deciding the overall order for either the ranking or the classification model, we used three search strategies: greedy, genetic and exhaustive (brute-force) algorithms. [sent-131, score-0.559]
68 For comparative analysis of features, we tested with an exhaustive search algorithm to determine the overall order. [sent-134, score-0.235]
69 4.2 Experiment Results The comparison results in Table 2 show that our Ranking SVM based method improves the performance over the baselines and the classification-based method with any of the search algorithms. [sent-136, score-0.098]
70 We can also see the greedy search strategy does not perform well and the genetic algorithm can provide a good approximate solution to the optimal result. (Table 2, comparing the average results of the different methods, is garbled in this extraction.) [sent-137, score-0.482]
71 Classification: It is not surprising that the ranking model is better, because when using a classification model, an example should be labeled either positive or negative. [sent-148, score-0.144]
72 It is not very reasonable to label a sentence pair like sa ≻ [sent-149, score-0.257]
73 sa+k (k > 1) as a negative example, nor as a positive one, because in some cases it is easy to conclude that one sentence should be arranged after another but hard to decide whether they should be adjacent. [sent-150, score-0.232]
74 In a ranking model, this information can be quantified by the different priorities of sentence pairs with different distances. [sent-152, score-0.274]
75 Single Feature Effect: The effects of different types of features are shown in Table 3. [sent-153, score-0.044]
76 It can be seen in Table 3 that all these features contribute to the final result. [sent-155, score-0.044]
77 The two features of noun probability and dependency probability play an important role as demonstrated in (Lapata, 2003). [sent-156, score-0.044]
78 A paragraph that is ordered entirely correctly by our method is shown in Figure 1. [sent-158, score-0.103]
79 (1) Vanunu, 43, is serving an 18-year sentence for treason. [sent-159, score-0.09]
80 (2) He was kidnapped by Israel's Mossad spy agency in Rome in 1986 after giving The Sunday Times of London photographs of the inside of the Dimona reactor. [sent-160, score-0.086]
81 (3) From the photographs, experts determined that Israel had the world's sixth largest stockpile of nuclear weapons. [sent-161, score-0.09]
82 (4) Israel has never confirmed or denied that it has a nuclear capability. [sent-162, score-0.128]
83 Sentences which should be arranged together tend to have a higher similarity and overlap. [sent-164, score-0.105]
84 Like sentences (3) and (4) in Figure 1, they have the highest cosine similarity of 0. [sent-165, score-0.09]
85 However, the similarity or overlap of the two sentences does not help to decide which sentence should be arranged before the other. [sent-167, score-0.355]
86 In this case the overlap and similarity of the former or latter halves of the sentences may help. [sent-168, score-0.123]
87 For example, the latter half of (3) and the former half of (4) share an overlap of “Israel”, while there is no overlap between the latter half of (4) and the former half of (3). [sent-169, score-0.132]
88 Coreference is also an important clue for ordering natural language texts. [sent-170, score-0.421]
89 For example, when conducting coreference resolution for (1) ≻ (2) [sent-172, score-0.088]
90 4.3 Genetic Algorithm There are three main parameters for the GA: the crossover probability (PC), the mutation probability (PM) and the population size (PS). [sent-177, score-0.219]
91 As we can see in Table 4, when adjusting the three parameters the average τ values are all close to the exhaustive result of 0. [sent-187, score-0.063]
92 Table 4 shows that in our case the genetic algorithm is not very sensitive to the parameters. [sent-189, score-0.306]
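The τ reported here is presumably Kendall's tau over sentence orderings, the standard metric for this task; a minimal sketch under that assumption, using τ = 1 − 2·inversions / C(n, 2):

```python
from itertools import combinations

def kendall_tau(predicted, reference):
    """Kendall's tau between two orderings of the same sentence indices."""
    pos = {s: i for i, s in enumerate(reference)}
    n = len(predicted)
    # A pair is an inversion if predicted places a before b but reference does not.
    inversions = sum(1 for a, b in combinations(predicted, 2) if pos[a] > pos[b])
    return 1.0 - 2.0 * inversions / (n * (n - 1) / 2)
```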
93 5 Conclusion and Discussion In this paper we propose a method for ordering sentences that have no contextual information, by making use of Ranking SVM and the genetic algorithm. [sent-195, score-0.741]
94 In future work, we will explore more features such as semantic features to further improve the performance. [sent-197, score-0.088]
95 A machine learning approach to sentence ordering for multi-document summarization and its evaluation. [sent-201, score-0.55]
96 A bottom-up approach to sentence ordering for multi-document summarization. [sent-205, score-0.511]
97 Inferring strategies for sentence ordering in multidocument news summarization. [sent-253, score-0.637]
98 Catching the drift: Probabilistic content models, with applications to generation and summarization. [sent-257, score-0.054]
99 Sentence ordering with event-enriched semantics and two-layered clustering for multi-document news summarization. [sent-261, score-0.472]
100 An adjacency model for sentence ordering in multidocument summarization. [sent-299, score-0.586]
wordName wordTfidf (topN-words)
[('ordering', 0.421), ('genetic', 0.263), ('pref', 0.257), ('bollegala', 0.188), ('sa', 0.167), ('permutation', 0.161), ('fi', 0.158), ('fitness', 0.152), ('israel', 0.128), ('okazaki', 0.116), ('mutation', 0.114), ('arranged', 0.105), ('naoaki', 0.105), ('crossover', 0.105), ('paragraph', 0.103), ('agree', 0.1), ('usa', 0.096), ('ranking', 0.095), ('operator', 0.094), ('mitsuru', 0.09), ('nuclear', 0.09), ('sentence', 0.09), ('coreference', 0.088), ('lapata', 0.087), ('photographs', 0.086), ('vanunu', 0.086), ('unordered', 0.079), ('svmrank', 0.076), ('donghong', 0.076), ('greedy', 0.075), ('multidocument', 0.075), ('barzilay', 0.072), ('ji', 0.072), ('ga', 0.072), ('stroudsburg', 0.071), ('danushka', 0.07), ('precedence', 0.07), ('mds', 0.066), ('overlap', 0.066), ('exhaustive', 0.063), ('characterize', 0.063), ('inproceedings', 0.06), ('madnani', 0.06), ('strength', 0.06), ('sentences', 0.057), ('pa', 0.057), ('coherence', 0.056), ('pairwise', 0.054), ('generation', 0.054), ('nie', 0.054), ('xiaojun', 0.053), ('adverb', 0.053), ('summary', 0.052), ('challenging', 0.052), ('strategy', 0.052), ('quantified', 0.051), ('pc', 0.051), ('lemmatized', 0.051), ('news', 0.051), ('sequential', 0.05), ('denotes', 0.049), ('classification', 0.049), ('judith', 0.049), ('comma', 0.049), ('search', 0.049), ('former', 0.049), ('pm', 0.048), ('cohen', 0.048), ('viewed', 0.046), ('features', 0.044), ('kdd', 0.044), ('defining', 0.044), ('algorithm', 0.043), ('rank', 0.043), ('texts', 0.04), ('determine', 0.04), ('overall', 0.04), ('maximize', 0.04), ('function', 0.039), ('summarization', 0.039), ('latter', 0.039), ('wana', 0.038), ('inversions', 0.038), ('denied', 0.038), ('nova', 0.038), ('wanxiaoj', 0.038), ('busemann', 0.038), ('exits', 0.038), ('kearns', 0.038), ('ayan', 0.038), ('lingpeng', 0.038), ('priorities', 0.038), ('renxian', 0.038), ('ps', 0.037), ('decide', 0.037), ('regina', 0.037), ('joachims', 0.037), ('works', 0.036), ('adjective', 0.036), ('thorsten', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 225 acl-2013-Learning to Order Natural Language Texts
Author: Jiwei Tan ; Xiaojun Wan ; Jianguo Xiao
Abstract: Ordering texts is an important task for many NLP applications. Most previous works on summary sentence ordering rely on the contextual information (e.g. adjacent sentences) of each sentence in the source document. In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. We also propose to use the genetic algorithm to determine the total order of all sentences. Evaluation results on a news corpus show the effectiveness of our proposed method. 1
2 0.14432311 172 acl-2013-Graph-based Local Coherence Modeling
Author: Camille Guinaudeau ; Michael Strube
Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
3 0.11074369 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
4 0.095082544 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features
Author: Nathan Gilbert ; Ellen Riloff
Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.
5 0.080321416 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
Author: Emmanuel Lassalle ; Pascal Denis
Abstract: This paper proposes a new method for significantly improving the performance of pairwise coreference models. Given a set of indicators, our method learns how to best separate types of mention pairs into equivalence classes for which we construct distinct classification models. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs. Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators. Our exper- iments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features. Our best system obtains a competitive 67.2 of average F1 over MUC, and CEAF which, despite its simplicity, places it above the mean score of other systems on these datasets. B3,
6 0.076317966 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
7 0.068530954 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
8 0.06837476 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
9 0.067240424 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
10 0.067187235 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
11 0.067160703 121 acl-2013-Discovering User Interactions in Ideological Discussions
12 0.065603182 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
13 0.064145088 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
14 0.063831948 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
15 0.063674726 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
16 0.063496657 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
17 0.06253738 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
18 0.06172407 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
19 0.061703201 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
20 0.061396528 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
topicId topicWeight
[(0, 0.195), (1, 0.044), (2, 0.014), (3, -0.063), (4, 0.031), (5, 0.043), (6, 0.057), (7, 0.01), (8, -0.079), (9, 0.001), (10, 0.006), (11, 0.001), (12, -0.092), (13, -0.017), (14, -0.068), (15, 0.078), (16, -0.027), (17, 0.008), (18, -0.069), (19, 0.02), (20, -0.021), (21, 0.016), (22, -0.0), (23, -0.06), (24, 0.005), (25, 0.039), (26, 0.003), (27, 0.034), (28, 0.004), (29, 0.047), (30, -0.057), (31, -0.02), (32, 0.029), (33, -0.052), (34, 0.047), (35, -0.084), (36, 0.048), (37, 0.003), (38, -0.023), (39, -0.03), (40, -0.023), (41, 0.008), (42, -0.035), (43, 0.012), (44, -0.029), (45, -0.005), (46, 0.011), (47, -0.013), (48, -0.012), (49, -0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.91252011 225 acl-2013-Learning to Order Natural Language Texts
Author: Jiwei Tan ; Xiaojun Wan ; Jianguo Xiao
Abstract: Ordering texts is an important task for many NLP applications. Most previous works on summary sentence ordering rely on the contextual information (e.g. adjacent sentences) of each sentence in the source document. In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. We also propose to use the genetic algorithm to determine the total order of all sentences. Evaluation results on a news corpus show the effectiveness of our proposed method. 1
2 0.80628252 172 acl-2013-Graph-based Local Coherence Modeling
Author: Camille Guinaudeau ; Michael Strube
Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
3 0.74237525 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
4 0.70948344 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
Author: Emmanuel Lassalle ; Pascal Denis
Abstract: This paper proposes a new method for significantly improving the performance of pairwise coreference models. Given a set of indicators, our method learns how to best separate types of mention pairs into equivalence classes for which we construct distinct classification models. In effect, our approach finds an optimal feature space (derived from a base feature set and indicator set) for discriminating coreferential mention pairs. Although our approach explores a very large space of possible feature spaces, it remains tractable by exploiting the structure of the hierarchies built from the indicators. Our exper- iments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features. Our best system obtains a competitive 67.2 of average F1 over MUC, and CEAF which, despite its simplicity, places it above the mean score of other systems on these datasets. B3,
5 0.66953331 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
Author: Peter A. Rankel ; John M. Conroy ; Hoa Trang Dang ; Ani Nenkova
Abstract: How good are automatic content metrics for news summary evaluation? Here we provide a detailed answer to this question, with a particular focus on assessing the ability of automatic evaluations to identify statistically significant differences present in manual evaluation of content. Using four years of data from the Text Analysis Conference, we analyze the performance of eight ROUGE variants in terms of accuracy, precision and recall in finding significantly different systems. Our experiments show that some of the neglected variants of ROUGE, based on higher order n-grams and syntactic dependencies, are most accurate across the years; the commonly used ROUGE-1 scores find too many significant differences between systems which manual evaluation would deem comparable. We also test combinations ofROUGE variants and find that they considerably improve the accuracy of automatic prediction.
6 0.6625908 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
7 0.66050309 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision
8 0.65178132 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
9 0.63928908 178 acl-2013-HEADY: News headline abstraction through event pattern clustering
10 0.63670981 333 acl-2013-Summarization Through Submodularity and Dispersion
11 0.63082212 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features
13 0.62495565 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
14 0.62349916 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation
15 0.61850893 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information
16 0.61781174 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
17 0.6123538 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
18 0.6106351 322 acl-2013-Simple, readable sub-sentences
19 0.61010194 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
20 0.60872036 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
topicId topicWeight
[(0, 0.062), (6, 0.044), (11, 0.072), (15, 0.013), (24, 0.056), (26, 0.063), (34, 0.01), (35, 0.081), (42, 0.062), (48, 0.058), (61, 0.213), (64, 0.011), (70, 0.052), (88, 0.062), (90, 0.029), (95, 0.04)]
simIndex simValue paperId paperTitle
1 0.81924355 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
Author: Ryan McDonald ; Joakim Nivre ; Yvonne Quirmbach-Brundage ; Yoav Goldberg ; Dipanjan Das ; Kuzman Ganchev ; Keith Hall ; Slav Petrov ; Hao Zhang ; Oscar Tackstrom ; Claudia Bedini ; Nuria Bertomeu Castello ; Jungmee Lee
Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.1
2 0.81396794 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
Author: Rebecca J. Passonneau ; Emily Chen ; Weiwei Guo ; Dolores Perin
Abstract: The pyramid method for content evaluation of automated summarizers produces scores that are shown to correlate well with manual scores used in educational assessment of students’ summaries. This motivates the development of a more accurate automated method to compute pyramid scores. Of three methods tested here, the one that performs best relies on latent semantics.
same-paper 3 0.80670267 225 acl-2013-Learning to Order Natural Language Texts
Author: Jiwei Tan ; Xiaojun Wan ; Jianguo Xiao
Abstract: Ordering texts is an important task for many NLP applications. Most previous works on summary sentence ordering rely on the contextual information (e.g. adjacent sentences) of each sentence in the source document. In this paper, we investigate a more challenging task of ordering a set of unordered sentences without any contextual information. We introduce a set of features to characterize the order and coherence of natural language texts, and use the learning to rank technique to determine the order of any two sentences. We also propose to use the genetic algorithm to determine the total order of all sentences. Evaluation results on a news corpus show the effectiveness of our proposed method. 1
4 0.6534729 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schutze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
5 0.65163213 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.64213878 94 acl-2013-Coordination Structures in Dependency Treebanks
7 0.64140141 275 acl-2013-Parsing with Compositional Vector Grammars
8 0.64090288 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
9 0.63875234 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
10 0.63778269 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
11 0.6365965 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
12 0.63503784 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
13 0.63494396 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
14 0.63409686 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
15 0.63330042 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
16 0.63279641 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
17 0.63192099 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
18 0.63121974 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
19 0.63107342 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
20 0.63022679 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging