acl acl2011 acl2011-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
Reference: text
sentIndex sentText sentNum sentScore
1 s g inz Abstract We present a novel model to represent and assess the discourse coherence of text. [sent-4, score-0.996]
2 Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. [sent-5, score-0.999]
3 We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. [sent-6, score-0.573]
4 The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. [sent-7, score-0.364]
5 1 Introduction The coherence of a text is usually reflected by its discourse structure and relations. [sent-9, score-0.985]
6 This notion of preferential ordering of discourse relations is observed in natural language in general, 997 and generalizes to other discourse frameworks aside from RST. [sent-18, score-1.453]
7 Thus coherent text exhibits measurable preferences for specific intra- and inter-discourse relation ordering. [sent-33, score-0.324]
8 Our key idea is to use the converse of this phenomenon to assess the coherence of a text. [sent-34, score-0.321]
9 In this paper, we detail our model to capture the coherence of a text based on the statistical distribution of the discourse structure and relations. [sent-35, score-1.022]
10 Our method specifically focuses on the discourse relation transitions between adjacent sentences, modeling them in a discourse role matrix. [sent-36, score-1.602]
11 We implement and validate our model on three data sets, which show robust improvements over the current state-of-the-art for coherence assessment. [sent-40, score-0.353]
12 We also provide the first assessment of the upper-bound of human performance on the standard task of distinguishing coherent from incoherent orderings. [sent-41, score-0.272]
13 To the best our knowledge, this is also the first study in which we show output from an automatic discourse parser helps in coherence modeling. [sent-42, score-0.963]
14 2 Related Work The study of coherence in discourse has led to many linguistic theories, of which we only discuss algorithms that have been reduced to practice. [sent-43, score-0.928]
15 Barzilay and Lapata operationalized Centering Theory by creating an entity grid model to capture discourse entity transi- tions at the sentence-to-sentence level, and demonstrated their model’s ability to discern coherent texts from incoherent ones. [sent-47, score-1.09]
16 The global coherence of a text can then be summarized by the overall probability of topic shift from the first sentence to the last. [sent-49, score-0.347]
17 (2007) combined the entity-based and HMM-based models and demonstrated that these two models are complementary to each other in coherence assessment. [sent-51, score-0.333]
18 Our approach differs from these models in that it introduces and operationalizes another indicator of discourse coherence, by modeling a text’s discourse relation transitions. [sent-52, score-1.399]
19 To implement our proposal, we need to identify the text’s discourse relations. [sent-54, score-0.638]
20 This task, discourse parsing, has been a recent focus of study in the natural language processing (NLP) community, largely enabled by the availability of large-scale discourse annotated corpora (Wellner and Pustejovsky, 2007; Elwell and Baldridge, 2008; Lin et al. [sent-55, score-1.276]
21 , signaled by discourse connectives such as because) discourse relations, but also implicit (i. [sent-64, score-1.325]
22 3 Using Discourse Relations To utilize discourse relations of a text, we first apply automatic discourse parsing on the input text. [sent-67, score-1.369]
23 This parser tags each explicit/implicit relation with two levels of relation types. [sent-70, score-0.281]
24 This parser automatically identifies the discourse relations, labels the argument spans, and classifies the relation types, including identifying common entity and no relation (EntRel and NoRel) as types. [sent-72, score-1.009]
25 A simple approach to directly model the connections among discourse relations is to use the sequence of discourse relation transitions. [sent-73, score-1.529]
26 sg/ ˜linzihen/ pars e r / tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text. [sent-79, score-1.08]
27 Our analysis revealed a serious shortcoming: as the discourse relation transitions in short texts are few in number, we have very little data to base the coherence judgment on. [sent-86, score-1.21]
28 However, when faced with even short text excerpts, humans can distinguish coherent texts from incoherent ones, as exemplified in our example texts. [sent-87, score-0.412]
29 4 A Refined Approach The central problem with the basic approach is in its sparse modeling of discourse relations. [sent-91, score-0.638]
30 In developing an improved model, we need to better exploit the discourse parser’s output to provide more circumstantial evidence to support the system’s coherence decision. [sent-92, score-0.928]
31 In this section, we introduce the concept of a discourse role matrix which aims to capture an expanded set of discourse relation transition patterns. [sent-93, score-1.636]
32 We describe how to represent the coherence of a text with its discourse relations and how to transform such information into a matrix representation. [sent-94, score-1.178]
33 1 Discourse Role Matrix Figure 1 shows a text and its gold standard PDTB discourse relations. [sent-97, score-0.695]
34 When a term appears in a discourse relation, the discourse role of this term is defined as the discourse relation type plus the argument span in which the term is located (i. [sent-98, score-2.369]
35 Since the relation type is a 999 wsj 0437, showing five gold standard discourse relations. [sent-102, score-0.911]
36 Rows correspond to sentences, columns to stemmed terms, and cells contain extracted discourse roles. [sent-113, score-0.693]
37 Comparison and “cananea” is found in the Arg1 span, the discourse role of “cananea” is defined as Comp. [sent-114, score-0.728]
38 When terms appear in different relations and/or argument spans, they obtain different discourse roles in the text. [sent-116, score-0.889]
39 For instance, “cananea” plays a different discourse role of Temp. [sent-117, score-0.728]
40 In the fourth relation, since “cananea” appears in both argument spans, it has two additional discourse roles, Exp. [sent-119, score-0.702]
41 The discourse role matrix thus represents the different discourse roles of the terms across the continuous text units. [sent-122, score-1.617]
42 We formulate the discourse role matrix such that it encodes the discourse roles of the terms across adjacent sentences. [sent-124, score-1.56]
43 A cell CTi,Sj then contains the set of the discourse roles of the term Ti that appears in sentence Sj. [sent-127, score-0.819]
44 A cell may be empty (nil, as in Ccananea,S2) or contain multiple discourse roles (as in Ccananea,S3 , as “cananea” in S3 participates in the second, third, and fourth relations). [sent-130, score-0.803]
45 Given these discourse relations, building the matrix is straightforward: we note down the relations that a term Ti from a sentence Sj participates in, and record its discourse roles in the respective cell. [sent-131, score-1.647]
46 We hypothesize that the sequence of discourse role transitions in a coherent text provides clues that × distinguish it from an incoherent text. [sent-132, score-1.207]
47 The discourse role matrix thus provides the foundation for computing such role transitions, on a per term basis. [sent-133, score-0.968]
48 The key differences from the traditional lexical chains are that our chain nodes’ entities are simplified (they share the same stemmed form, instead being connected by WordNet relations), but are further enriched by being typed with discourse relations. [sent-135, score-0.671]
49 We compile the set of sub-sequences of discourse role transitions for every term in the matrix. [sent-136, score-0.891]
50 These transitions tell us how the discourse role of a term varies through the progression of the text. [sent-137, score-0.891]
51 As we have six relation types (Temp(oral), Cont(ingency), Comp(arison), Exp(ansion), EntRel and NoRel) and two argument tags (Arg1 and Arg2) for each type, we have a total of 6 2 = 12 possible discourse roles, plus a n6il × ×va 2lue. [sent-143, score-0.825]
52 We define a dis1000 course role transition as the sub-sequence of dis- course roles for a term in multiple consecutive sentences. [sent-144, score-0.281]
53 For example, the discourse role transition of “cananea” from S1 to S2 is Comp. [sent-145, score-0.775]
54 To illustrate the calculation, suppose the matrix fragment in Table 1 is the entire discourse role matrix. [sent-160, score-0.828]
55 a sA a key property of our approach is that, while discourse transitions are captured locally on a per-term basis, the probabilities of the discourse transitions are aggregated globally, across all terms. [sent-165, score-1.502]
56 We believe that the overall distribution of discourse role transitions for a coherent text is distinguishable from that for an incoherent text. [sent-166, score-1.17]
57 Our model captures the distributional differences of such sub-sequences in coherent and incoherent text in training to determine an unseen text’s coherence. [sent-167, score-0.366]
58 To evaluate the coherence of a text, we extract sub-sequences with various lengths from the discourse role matrix as features2 and compute the sub-sequence probabilities as the feature values. [sent-168, score-1.118]
59 A text can be less coherent when compared to one text, but more coherent when compared to another. [sent-174, score-0.345]
60 As such, since the notion of coherence is relative, we feel that coherence assessment is better represented as 2Sub-sequences consisting of only nil values are not used as features. [sent-175, score-0.603]
61 Coherence scoring equations can also be deduced (Lapata and Barzilay, 2005) from such a model, yielding coherence scores. [sent-182, score-0.29]
62 5 Experiments We evaluate our coherence model on the task of text ordering ranking, a standard coherence evaluation task used in both (Barzilay and Lapata, 2005) and (Elsner et al. [sent-185, score-0.758]
63 We must also be careful in using the automatic discourse parser. [sent-212, score-0.638]
64 Since the discourse parser utilizes paragraph boundaries but a permuted text does not have such boundaries, we ignore paragraph boundaries and treat the source text as if it has only one paragraph. [sent-219, score-0.984]
65 1 Human Evaluation While the text ordering ranking task has been used in previous studies, two key questions about this task have remained unaddressed in the previous work: (1) to what extent is the assumption that the source text is more coherent than its permutation correct? [sent-222, score-0.497]
66 ” We remove these sentences from the source and permuted texts, to avoid the subjects judging based on these clues instead of textual coherence. [sent-236, score-0.239]
67 When both subjects rank a source text higher than its permutation, we interpret it as the subjects agreeing that the source text is more coherent than the permutation. [sent-239, score-0.404]
68 Also, since our subjects’ judgments correlate highly with the gold standard, the assumption that the original text is always more coherent than the permuted text is supported. [sent-247, score-0.424]
69 2 Baseline Barzilay and Lapata (2005) showed that their entitybased model is able to distinguish a source text from its permutation accurately. [sent-251, score-0.288]
70 Thus, it can serve as a good comparison point for our discourse relation- based model. [sent-252, score-0.638]
71 Since they did not automatically determine the coreferential information of a permuted text but obtained that from its corresponding source text, we do not perform automatic coreference resolution in our reimplementation of their system. [sent-254, score-0.277]
72 The model setting of Type+Arg+Sal means that the model makes use of the discourse roles consisting of 1) relation types and 2) argument tags (e. [sent-272, score-0.993]
73 In Row 3, we first delete relation types from the discourse roles, which causes discourse roles to only contain the argument tags. [sent-295, score-1.557]
74 These results suggest that the argument tag information plays an important role in our discourse role transition model. [sent-304, score-0.929]
75 The entity-based model of Barzilay and Lapata (2005) connects the local entity transition with textual coherence, while our model looks at the patterns of discourse relation transitions. [sent-308, score-0.935]
76 The relation/length ratio gives us an idea of how often a sentence participates in discourse relations. [sent-335, score-0.705]
77 A high ratio means that the article is densely interconnected by discourse relations, and may make distinguishing this article from its permutation easier compared to that for a loosely connected article. [sent-336, score-0.865]
78 We expect that when a text contains more discourse relation types (i. [sent-337, score-0.818]
79 This is because compared to EntRel and NoRel, these four discourse relations can combine to produce meaningful transitions, such as the example Text (2). [sent-340, score-0.731]
80 To examine how this affects performance, we calculate the average ratio between the number of the four discourse relations in the permuted text and the length for the permuted text. [sent-341, score-1.153]
81 A very good clue of coherence in Text (2) is the explicit Comp relation between S1 and S2. [sent-361, score-0.443]
82 By modeling longer range discourse relation transitions, we may be able to discern these two cases. [sent-364, score-0.806]
83 While performance on identifying explicit discourse relations in the PDTB is as high as 93% (Pitler et al. [sent-365, score-0.731]
84 7 Conclusion We have proposed a new model for discourse coherence that leverages the observation that coherent texts preferentially follow certain discourse structures. [sent-369, score-1.793]
85 We posit that these structures can be captured in and represented by the patterns of discourse relation transitions. [sent-370, score-0.761]
86 We first demonstrate that simply using the sequence of discourse relation transition leads to sparse features and is insufficient to distinguish coherent from incoherent text. [sent-371, score-1.117]
87 To address this, our method transforms the discourse relation transitions into a discourse role matrix. [sent-372, score-1.602]
88 The matrix schematically represents term occurrences in text units and associates each occurrence with its discourse roles in the text units. [sent-373, score-0.996]
89 In our approach, n-gram sub-sequences of transitions per term in the discourse role matrix then constitute the more finegrained evidence used in our model to distinguish coherence from incoherence. [sent-374, score-1.355]
90 When applied to distinguish a source text from a sentence-reordered permutation, our model significantly outperforms the previous state-of-the-art, 1005 the entity-based local coherence model. [sent-375, score-0.479]
91 While the entity-based model captures repetitive mentions of entities, our discourse relation-based model gleans its evidence from the argumentative and discourse structure of the text. [sent-376, score-1.35]
92 The idea of modeling coherence with discourse relations and formulating it in a discourse role matrix can also be applied to other NLP tasks. [sent-379, score-1.849]
93 We plan to apply our methodology to other tasks, such as summarization, text generation and essay scoring, which also need to produce and assess discourse coherence. [sent-380, score-0.726]
94 A unified local and global model for discourse coherence. [sent-396, score-0.702]
95 Centering: a framework for modeling the local coherence of discourse. [sent-406, score-0.317]
96 Supplementing entity coherence with local rhetorical relations for information ordering. [sent-418, score-0.475]
97 Recognizing implicit discourse relations in the Penn Discourse Treebank. [sent-426, score-0.755]
98 Using syntax to disambiguate explicit discourse connectives in text. [sent-448, score-0.663]
99 Automatic sense prediction for implicit discourse relations in text. [sent-456, score-0.755]
100 Kernel based discourse relation recognition with temporal ordering information. [sent-470, score-0.845]
wordName wordTfidf (topN-words)
[('discourse', 0.638), ('coherence', 0.29), ('accidents', 0.197), ('earthquakes', 0.197), ('cananea', 0.189), ('permuted', 0.166), ('coherent', 0.144), ('pdtb', 0.136), ('incoherent', 0.128), ('relation', 0.123), ('wsj', 0.122), ('barzilay', 0.118), ('transitions', 0.113), ('lapata', 0.108), ('matrix', 0.1), ('sal', 0.098), ('roles', 0.094), ('relations', 0.093), ('role', 0.09), ('comp', 0.088), ('ordering', 0.084), ('permutation', 0.081), ('pitler', 0.075), ('entrel', 0.069), ('norel', 0.069), ('argument', 0.064), ('text', 0.057), ('row', 0.053), ('term', 0.05), ('transition', 0.047), ('curves', 0.046), ('texts', 0.046), ('arg', 0.046), ('discern', 0.045), ('entitybased', 0.045), ('ziheng', 0.045), ('article', 0.044), ('ranking', 0.043), ('combined', 0.043), ('subjects', 0.042), ('rhetorical', 0.039), ('model', 0.037), ('centering', 0.037), ('rst', 0.037), ('distinguish', 0.037), ('cell', 0.037), ('salience', 0.036), ('elsner', 0.036), ('parser', 0.035), ('arison', 0.034), ('elwell', 0.034), ('participates', 0.034), ('answer', 0.034), ('baseline', 0.033), ('stemmed', 0.033), ('articles', 0.033), ('ratio', 0.033), ('source', 0.031), ('assess', 0.031), ('emily', 0.03), ('permutations', 0.03), ('cont', 0.03), ('temp', 0.03), ('webber', 0.03), ('upper', 0.03), ('lin', 0.03), ('clue', 0.03), ('salient', 0.029), ('regina', 0.029), ('type', 0.028), ('ani', 0.028), ('morris', 0.028), ('permuting', 0.028), ('aravind', 0.028), ('preference', 0.028), ('local', 0.027), ('hwee', 0.027), ('wellner', 0.026), ('nucleus', 0.026), ('tou', 0.026), ('entity', 0.026), ('validate', 0.026), ('reductions', 0.026), ('easier', 0.025), ('penn', 0.025), ('mirella', 0.025), ('contingency', 0.025), ('connective', 0.025), ('connectives', 0.025), ('bound', 0.025), ('kan', 0.024), ('soricut', 0.024), ('implicit', 0.024), ('theory', 0.023), ('accuracies', 0.023), ('grosz', 0.023), ('reimplementation', 0.023), ('nil', 0.023), ('columns', 0.022), ('prasad', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999845 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
2 0.26907539 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
3 0.25358462 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
4 0.20067254 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
5 0.19210279 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
6 0.15728953 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
7 0.13593712 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
8 0.10345119 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
9 0.097903356 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
10 0.097753681 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
11 0.082931168 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
12 0.080046088 109 acl-2011-Effective Measures of Domain Similarity for Parsing
13 0.079908356 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
14 0.079564616 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
15 0.076382644 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
16 0.072392866 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
17 0.070175633 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
18 0.068400368 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
19 0.066415094 333 acl-2011-Web-Scale Features for Full-Scale Parsing
20 0.066389941 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
topicId topicWeight
[(0, 0.186), (1, 0.043), (2, -0.144), (3, 0.004), (4, -0.014), (5, 0.077), (6, 0.015), (7, 0.068), (8, -0.124), (9, -0.008), (10, 0.003), (11, -0.041), (12, -0.06), (13, -0.102), (14, -0.092), (15, 0.042), (16, -0.145), (17, 0.187), (18, -0.063), (19, 0.03), (20, 0.025), (21, 0.105), (22, -0.126), (23, 0.002), (24, 0.019), (25, 0.027), (26, -0.064), (27, 0.062), (28, 0.095), (29, 0.041), (30, -0.102), (31, 0.078), (32, 0.05), (33, 0.031), (34, -0.046), (35, -0.132), (36, -0.14), (37, 0.127), (38, -0.098), (39, -0.116), (40, -0.202), (41, -0.043), (42, 0.021), (43, -0.126), (44, -0.038), (45, 0.017), (46, 0.123), (47, -0.018), (48, -0.103), (49, 0.128)]
simIndex simValue paperId paperTitle
same-paper 1 0.95575166 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
2 0.72479647 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
3 0.7166149 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
4 0.59423542 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
5 0.59372061 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
6 0.46346372 294 acl-2011-Temporal Evaluation
7 0.45923352 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
8 0.44959345 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
9 0.44611943 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
10 0.38148648 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
11 0.37575093 239 acl-2011-P11-5002 k2opt.pdf
12 0.35286623 74 acl-2011-Combining Indicators of Allophony
13 0.35047024 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
14 0.33950335 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
15 0.33668381 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
16 0.33667168 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
17 0.32443625 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
18 0.3231653 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
19 0.32200733 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
20 0.32011467 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
topicId topicWeight
[(5, 0.025), (17, 0.035), (21, 0.198), (26, 0.028), (27, 0.027), (30, 0.019), (37, 0.087), (39, 0.054), (41, 0.079), (53, 0.019), (55, 0.04), (59, 0.052), (72, 0.036), (91, 0.041), (96, 0.172), (97, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.83610016 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
2 0.75062239 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
3 0.74776852 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
Author: Ryo Nagata ; Edward Whittaker ; Vera Sheinman
Abstract: The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowparsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POStagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
4 0.74643707 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
5 0.74583387 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
6 0.74429631 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
7 0.74145579 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
8 0.74082577 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
9 0.74080676 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
10 0.74071676 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
11 0.74053365 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
12 0.73996395 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
13 0.7399416 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
14 0.73985994 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
15 0.73900187 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
16 0.73892236 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
17 0.73879945 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
18 0.73812038 178 acl-2011-Interactive Topic Modeling
19 0.73653829 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
20 0.73647535 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts