acl acl2011 acl2011-101 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. [sent-2, score-0.571]
2 Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. [sent-3, score-0.228]
3 Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data. [sent-4, score-0.317]
4 Models of coherence have primarily been used for text-based generation tasks: ordering units oftext for multidocument summarization or inserting new text into an existing article. [sent-7, score-0.375]
5 But the theoretical concept of coherence goes beyond both this domain and this task setting– and so should coherence models. [sent-9, score-0.539]
6 This paper evaluates a variety of local coherence models on the task of chat disentanglement or “threading”: separating a transcript of a multiparty interaction into independent conversations1 . [sent-10, score-1.202]
7 Such simultaneous conversations occur in internet chat 1A public implementation is available via https : / / bitbucket . [sent-11, score-0.357]
8 In these situations, a single, correctly disentangled, conversational thread will be coherent, since the speakers involved understand the normal rules of discourse, but the transcript as a whole will not be. [sent-16, score-0.25]
9 Thus, a good model of coherence should be able to disentangle sentences as well as order them. [sent-17, score-0.327]
10 There are several differences between disentanglement and the newswire sentence-ordering tasks typically used to evaluate coherence models. [sent-18, score-0.785]
11 Internet chat comes from a different domain, one where topics vary widely and no reliable syntactic annotations are available. [sent-19, score-0.207]
12 The disentanglement task measures different capabilities of a model, since it compares documents that are not permuted versions of one another. [sent-20, score-0.556]
13 Finally, full disentanglement requires a large-scale search, which is computationally dif? [sent-21, score-0.556]
14 We move toward disentanglement in stages, carrying out a series of experiments to measure the contribution of each of these factors. [sent-23, score-0.529]
15 SWBD contains recorded telephone conversations with known topics and hand-annotated parse trees; this allows us to control for the performance of our parser and other informational resources. [sent-25, score-0.189]
16 cially entangle pairs of telephone dialogues to create synthetic transcripts which we can disentangle. [sent-27, score-0.212]
17 Finally, we present results on actual internet chat corpora. [sent-28, score-0.226]
18 On synthetic SWBD transcripts, local coherence models improve performance considerably over our baseline model, Elsner and Charniak (2008b). [sent-29, score-0.423]
19 c s 2o0ci1a1ti Aonss foocria Ctioomnp fourta Ctioomnaplu Ltaintigouniaslti Lcisn,g puaigsetsic 1s179–1189, internet chat, we continue to do better on a constrained disentanglement task, though so far, we are unable to apply these improvements to the full task. [sent-32, score-0.581]
20 We suspect that, with better low-level annotation tools for the chat domain and a good way of integrating prior information, our improvements on SWBD could transfer fully to IRC chat. [sent-33, score-0.233]
21 2 Related work There is extensive previous work on coherence models for text ordering; we describe several speci? [sent-34, score-0.298]
22 We avoid using them here, because we do not believe topic sequences are predictable in conversation and because such models tend to be algorithmically cumbersome. [sent-39, score-0.197]
23 In addition to text ordering, local coherence models have also been used to score the ? [sent-40, score-0.334]
24 The task of disentanglement or “threading” for internet chat was introduced by Shen et al. [sent-45, score-0.755]
25 Adams (2008) also created and released a disentanglement corpus. [sent-51, score-0.529]
26 These features fail to improve their performance, which is puzzling in light of the success of topic modeling for other coherence and segmentation problems (Eisenstein and Barzilay, 2008; Foltz et al. [sent-54, score-0.315]
27 Most of them are drawn from previous work; one, the topical entity grid, is a novel extension of the entity grid. [sent-68, score-0.395]
28 For the experiments below, we train the models on SWBD, sometimes augmented with a larger set ofautomatically parsed conversations from the FISHER corpus. [sent-69, score-0.173]
29 1 Entity grid The entity grid (Lapata and Barzilay, 2005; Barzilay and Lapata, 2005) is an attempt to model some principles of Centering Theory (Grosz et al. [sent-74, score-0.677]
30 edu/ ˜melsner utterance, the grid predicts the role in which each entity will appear, given its history of roles in the previous sentences, plus a salience feature counting the total number of times the entity occurs. [sent-79, score-0.583]
31 For instance, for an entity which is the subject of sentence 1, the object of sentence 2, and occurs four times in total, the grid predicts its role in sentence 3 according to the conditional P(? [sent-80, score-0.394]
32 2 Topical entity grid This model is a variant of the generative entity grid, intended to take into account topical information. [sent-89, score-0.678]
33 To create the topical entity grid, we learn a set of topic-to-word distributions for our corpus using LDA (Blei et al. [sent-90, score-0.254]
34 html 1181 tory features of the standard entity grid, which detect, for example, whether entity e is the subject of the previous sentence. [sent-99, score-0.282]
35 In the topical entity grid, we instead compute a real-valued feature which sums up the similarity between entity e and the subject(s) of the previous sentence. [sent-100, score-0.395]
36 rst considered for coherence by Soricut and Marcu (2006), although a less probabilistically elegant version was proposed earlier (Lapata, 2003). [sent-110, score-0.346]
37 4 Pronouns The use of a generative pronoun resolver for coherence modeling originates in Elsner and Charniak (2008a). [sent-114, score-0.296]
38 , 2005), Elsner and Charniak (2008a) use a model which recognizes discourse-new versus old NPs as a coherence model. [sent-127, score-0.286]
39 c features Most disentanglement models use non-linguistic information alongside lexical features; in fact, timestamps and speaker identities are usually better cues than words are. [sent-135, score-0.647]
40 rst feature is the time gap between one utterance and the next within the same thread. [sent-138, score-0.219]
41 The second feature is speaker identity; conversations usually involve a small subset of the total number of speakers, and a few core speakers make most of the utterances. [sent-142, score-0.213]
42 Speakers in IRC chat often use their addressee's names to coordinate the chat (O'Neill and Martin, 2003), and this is a powerful source of information (Elsner and Charniak, 2008b). [sent-146, score-0.348]
43 es each utterance into either the start or continuation of a conversational turn, by checking if the previous utterance had the same speaker. [sent-148, score-0.304]
44 For ordering experiments, the contrast set is a single random permutation of d; we explain the training regime for disentanglement below, in subsection 4. [sent-156, score-0.648]
45 rst compare our models on an ordering task, discrimination (Barzilay and Lapata, 2005; Karamanis et al. [sent-159, score-0.252]
46 The best SWBD result is 91%; the best WSJ result is 82% (both for mixtures without the topical entity grid). [sent-177, score-0.281]
47 The strongest models in both cases are the entity grid and IBM-1 (at about 77% for news, 85% for dialogue). [sent-182, score-0.436]
48 The most critical in news is IBM-1 (decreasing performance by 3% when removed); in conversation, it is the entity grid (decreasing by about 1%). [sent-187, score-0.394]
49 Overall, these results suggest that most previously proposed local coherence models are domaingeneral; they work on conversation as well as news. [sent-191, score-0.43]
50 We would like to hold the domain constant, but we do not have any disentanglement data recorded from naturally occurring speech, so we create synthetic instances by merging pairs of SWBD dialogues. [sent-198, score-0.67]
51 Because we are not using speaker information, we remove all utterances which do not contain a noun before constructing synthetic transcripts– these are mostly backchannels like “Yeah”. [sent-205, score-0.178]
52 Such utterances cannot be correctly assigned by our coherence models, which deal with content; we suspect most of them could be dealt with by associating them with the nearest utterance from the same speaker. [sent-206, score-0.425]
53 rst simulate timestamps by sampling the number of seconds between each utterance and the next from a discretized Gaussian: bN(0; 2:5)c . [sent-209, score-0.246]
54 We create synthetic instances of two types– those where the two entangled conversations had different topical prompts and those where they were the same. [sent-212, score-0.364]
55 ) We entangle dialogues from our ordering development set to use for mixture training and validation; for testing, we use 100 instances of each type, constructed from dialogues in our test set. [sent-215, score-0.249]
56 rst disentanglement task is to correctly assign a single utterance, given the true structure of the rest of the transcript. [sent-225, score-0.644]
57 We also use it to train our mixture models for disentanglement, by construct a training example for each utterance iin our training transcripts. [sent-229, score-0.203]
58 Again, results for individual models are above the line, then 1184 gle utterance on 200 synthetic multiparty conversations from SWBD test. [sent-233, score-0.456]
59 The topical model, which performed poorly for ordering, is actually stronger than the entity grid in this setting. [sent-240, score-0.507]
60 IBM-1 underperforms either grid model (69% to 77%); on ordering, it was nearly as good (85% to 86%). [sent-241, score-0.283]
61 Since the disentanglement task involves moving only a single sentence, if moving this sentence does not sever a resolvable pronoun from its antecedent, the model will be unable to make a good decision. [sent-246, score-0.624]
62 As before, the ablation results show that all the models are quite correlated, since removing any single model from the mixture causes only a small decrease in performance. [sent-247, score-0.176]
63 7%, and in the ablation results, removing it causes a larger drop in performance for than “different”; this suggests it is somewhat more robust to similarity in topic than entity grids. [sent-251, score-0.247]
64 One difference is that the models which use longer histories (the two entity grids) remain strong, while the models considering only one or two previous sentences (IBM and pronouns) do not do as well. [sent-253, score-0.225]
65 2 Disentangling an entire transcript We now turn to the task of disentangling an entire transcript at once. [sent-256, score-0.325]
66 cult than assigning only a single utterance, because decisions are interrelated– an error on one utterance may cause a cascade of poor decisions further down. [sent-259, score-0.209]
67 nds and moves the utterance which would most improve the model score if swapped from one thread to the other. [sent-264, score-0.257]
68 Our results (Table 3) show that, for transcripts with different topics, our disentanglement has 68% over5The other popular metrics, F and loc 3, are correlated. [sent-270, score-0.575]
69 1185 sults and truth on 200 synthetic multiparty conversations from SWBD test. [sent-271, score-0.285]
70 Where the entangled conversations have the same topic, performance is lower, about 60%, but still better than the comparison model with 57%. [sent-273, score-0.192]
71 Since correlations with the previous section are fairly reliable, and the disentanglement procedure is computationally intensive, we omit ablation experiments. [sent-274, score-0.603]
72 cult than single-sentence disentanglement (com- bined scores drop by about 20%), but the singlesentence task is a good predictor of relative performance. [sent-276, score-0.584]
73 Entity grid models do best, the IBM model remains useful, but less so than for discrimination, and pronouns are very weak. [sent-277, score-0.374]
74 6 IRC data In this section, we move from synthetic data to real multiparty discourse recorded from internet chat rooms. [sent-280, score-0.441]
75 In order to use syntactic models like the entity grid, we parse the transcripts using (McClosky et al. [sent-285, score-0.229]
76 Performance is bad, although the parser does identify most of the NPs; poor results are typical for a standard parser on chat (Foster, 2010). [sent-287, score-0.174]
77 1 Disentangling a single sentence As before, we show results on correctly disentangling a single sentence, given the correct structure of the rest of the transcript. [sent-296, score-0.175]
78 We average performance on each transcript over the different annotations, then average the transcripts, weighing them by length to give each utterance equal weight. [sent-297, score-0.229]
79 4 Table 5: Average accuracy for disentanglement of a single utterance for 19581 total lines from Adams (2008). [sent-313, score-0.683]
80 Both IBM and the topical entity grid achieve similar gains. [sent-315, score-0.507]
81 The entity grid does better, increasing performance to 79%. [sent-316, score-0.394]
82 The grid looks at the previous six sentences, which differentiates it from the IBM model and from Elsner and Charniak (2008b), which treats each pair of sentences independently. [sent-319, score-0.283]
83 We suspect that our lexicalized models, IBM and the topical entity grid, are hampered by poor parameter settings, since their parameters were learned on FISHER rather than IRC chat. [sent-321, score-0.254]
84 In particular, we believe this explains why the topical entity grid, which slightly outperformed the entity grid on SWBD, is much worse here. [sent-322, score-0.674]
85 2 Full disentanglement Running our tabu search algorithm on the full disentanglement task yields disappointing results. [sent-324, score-1.151]
86 c models as well as in the entity grid; the time model (which generates gaps between adjacent sentences) and the speaker model (which uses a CRP) both assign probability 1to single-utterance conversations. [sent-329, score-0.322]
87 The entity grid also has a bias toward short conversations, because unseen entities are empirically more likely to occur toward the beginning of a conversation than in the middle. [sent-330, score-0.49]
88 A major weakness in our model is that we aim only to maximize coherence of the individual conversations, with no prior on the likely length or num- ber of conversations that will appear in the transcript. [sent-331, score-0.417]
89 Integrating a prior into our framework is not straightforward because we currently train our mixture to maximize single-utterance disentanglement performance, and the prior is not useful for this task. [sent-333, score-0.561]
90 xing parts of the transcript to the solution obtained by Elsner and Charniak (2008b), then using tabu search to ? [sent-335, score-0.193]
91 It appears that our performance increase on single-sentence disentanglement does not transfer to this task because of cascading errors and the necessity of using external constraints. [sent-339, score-0.561]
92 7 Conclusions We demonstrate that several popular models of local coherence transfer well to the conversational domain, suggesting that they do indeed capture coherence in general rather than speci? [sent-340, score-0.668]
93 Our results study suggest that while sophisticated coherence models can potentially contribute to disentanglement, they would bene? [sent-343, score-0.298]
94 Better parsing, or at least NP chunking, would help for models like the entity grid which rely on syntactic role information. [sent-345, score-0.436]
95 With better parameters for these models and the integration of a prior, we believe that our good performance on SWBD and single-utterance disentanglement for IRC can be extended to full-scale disentanglement of IRC. [sent-349, score-1.1]
96 1187 Acknowledgements We are extremely grateful to Regina Barzilay, Mark Johnson, Rebecca Mason, Ben Swanson and Neal Fox for their comments, to Craig Martell for the NPS chat datasets and to three anonymous reviewers. [sent-350, score-0.174]
97 The measurement of textual coherence with latent semantic analysis. [sent-426, score-0.256]
98 Centering: A framework for modeling the local coherence of discourse. [sent-446, score-0.292]
99 Analyzing dialog coherence using transition patterns in lexical and semantic features. [sent-530, score-0.256]
100 Context-based message expansion for disentanglement of interleaved text conversations. [sent-544, score-0.529]
wordName wordTfidf (topN-words)
[('disentanglement', 0.529), ('swbd', 0.291), ('elsner', 0.259), ('coherence', 0.256), ('grid', 0.253), ('chat', 0.174), ('charniak', 0.154), ('entity', 0.141), ('irc', 0.14), ('conversations', 0.131), ('utterance', 0.129), ('disentangling', 0.125), ('topical', 0.113), ('transcript', 0.1), ('conversation', 0.096), ('ordering', 0.094), ('tabu', 0.093), ('rst', 0.09), ('synthetic', 0.089), ('aoki', 0.078), ('thread', 0.071), ('barzilay', 0.07), ('adams', 0.068), ('multiparty', 0.065), ('lapata', 0.063), ('speci', 0.062), ('topic', 0.059), ('linux', 0.056), ('cocktail', 0.055), ('cult', 0.055), ('party', 0.054), ('fisher', 0.054), ('internet', 0.052), ('dif', 0.052), ('speaker', 0.049), ('pronouns', 0.049), ('ibm', 0.048), ('history', 0.048), ('ablation', 0.047), ('conversational', 0.046), ('transcripts', 0.046), ('dialogues', 0.046), ('micha', 0.045), ('eugene', 0.044), ('models', 0.042), ('arti', 0.041), ('disentangle', 0.041), ('senate', 0.041), ('signi', 0.041), ('pronoun', 0.04), ('utterances', 0.04), ('regina', 0.039), ('local', 0.036), ('crp', 0.036), ('discourse', 0.036), ('mcclosky', 0.033), ('speakers', 0.033), ('topics', 0.033), ('soricut', 0.032), ('mixture', 0.032), ('transfer', 0.032), ('chatspeci', 0.031), ('disentangled', 0.031), ('entangle', 0.031), ('entangled', 0.031), ('glover', 0.031), ('haykin', 0.031), ('szymanski', 0.031), ('threading', 0.031), ('poesio', 0.031), ('wsj', 0.031), ('gaps', 0.03), ('mirella', 0.03), ('model', 0.03), ('dialogue', 0.029), ('antecedent', 0.028), ('eisenstein', 0.028), ('classi', 0.027), ('karamanis', 0.027), ('permuted', 0.027), ('permute', 0.027), ('allison', 0.027), ('bene', 0.027), ('deictics', 0.027), ('nanc', 0.027), ('nds', 0.027), ('timestamps', 0.027), ('uence', 0.027), ('domain', 0.027), ('mixtures', 0.027), ('computationally', 0.027), ('discrimination', 0.026), ('worse', 0.026), ('bucket', 0.025), ('oard', 0.025), ('oftext', 0.025), ('single', 0.025), ('recorded', 0.025), ('nenkova', 0.025), ('obama', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
2 0.41847941 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
3 0.2298502 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
4 0.19210279 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
5 0.10834925 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
Author: Sharon Gower Small ; Jennifer Strommer-Galley ; Tomek Strzalkowski
Abstract: We describe an annotation tool developed to assist in the creation of multimodal actioncommunication corpora from on-line massively multi-player games, or MMGs. MMGs typically involve groups of players (5-30) who control their perform various activities (questing, competing, fighting, etc.) and communicate via chat or speech using assumed screen names. We collected a corpus of 48 group quests in Second Life that jointly involved 206 players who generated over 30,000 messages in quasisynchronous chat during approximately 140 hours of recorded action. Multiple levels of coordinated annotation of this corpus (dialogue, movements, touch, gaze, wear, etc) are required in order to support development of automated predictors of selected real-life social and demographic characteristics of the players. The annotation tool presented in this paper was developed to enable efficient and accurate annotation of all dimensions simultaneously. avatars1, 1
6 0.108007 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
7 0.10707577 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
8 0.10526874 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
9 0.099808834 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
10 0.086673036 117 acl-2011-Entity Set Expansion using Topic information
11 0.084914006 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
12 0.084706917 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
13 0.084147386 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
14 0.084133461 285 acl-2011-Simple supervised document geolocation with geodesic grids
15 0.0722728 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models
16 0.071971551 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
17 0.070096649 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
18 0.07005088 194 acl-2011-Language Use: What can it tell us?
19 0.064568676 52 acl-2011-Automatic Labelling of Topic Models
20 0.06401106 178 acl-2011-Interactive Topic Modeling
topicId topicWeight
[(0, 0.179), (1, 0.072), (2, -0.129), (3, 0.054), (4, -0.044), (5, 0.078), (6, -0.046), (7, 0.063), (8, -0.178), (9, 0.066), (10, -0.002), (11, 0.055), (12, -0.1), (13, -0.142), (14, 0.008), (15, 0.126), (16, -0.098), (17, 0.288), (18, -0.054), (19, -0.034), (20, 0.076), (21, 0.07), (22, -0.116), (23, 0.121), (24, -0.007), (25, 0.075), (26, -0.121), (27, 0.012), (28, -0.037), (29, 0.017), (30, -0.043), (31, 0.012), (32, 0.026), (33, 0.032), (34, 0.064), (35, -0.118), (36, 0.043), (37, 0.095), (38, -0.106), (39, -0.047), (40, -0.091), (41, -0.087), (42, -0.007), (43, -0.062), (44, -0.078), (45, 0.055), (46, 0.15), (47, 0.068), (48, -0.073), (49, 0.134)]
simIndex simValue paperId paperTitle
same-paper 1 0.9257369 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
2 0.8481465 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
3 0.71063316 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
4 0.67555934 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
5 0.47227386 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
Author: Delip Rao ; David Yarowsky
Abstract: This paper presents an original approach to semi-supervised learning of personal name ethnicity from typed graphs of morphophonemic features and first/last-name co-occurrence statistics. We frame this as a general solution to an inference problem over typed graphs where the edges represent labeled relations between features that are parameterized by the edge types. We propose a framework for parameter estimation on different constructions of typed graphs for this problem using a gradient-free optimization method based on grid search. Results on both in-domain and out-of-domain data show significant gains over 30% accuracy improvement using the techniques presented in the paper.
6 0.43384376 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
7 0.41788253 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models
8 0.39540735 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
9 0.36415717 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
10 0.34827551 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
11 0.33440727 285 acl-2011-Simple supervised document geolocation with geodesic grids
12 0.33426681 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
13 0.32672533 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
14 0.31897217 99 acl-2011-Discrete vs. Continuous Rating Scales for Language Evaluation in NLP
15 0.31803873 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
16 0.31743035 118 acl-2011-Entrainment in Speech Preceding Backchannels.
17 0.30847523 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
18 0.30087778 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
19 0.29823616 117 acl-2011-Entity Set Expansion using Topic information
20 0.28930536 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
topicId topicWeight
[(5, 0.039), (17, 0.056), (26, 0.016), (27, 0.197), (30, 0.054), (31, 0.01), (37, 0.086), (39, 0.043), (41, 0.08), (55, 0.036), (57, 0.021), (59, 0.032), (72, 0.04), (88, 0.02), (91, 0.045), (96, 0.132), (97, 0.014)]
simIndex simValue paperId paperTitle
1 0.90790182 337 acl-2011-Wikipedia Revision Toolkit: Efficiently Accessing Wikipedias Edit History
Author: Oliver Ferschke ; Torsten Zesch ; Iryna Gurevych
Abstract: We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history.
2 0.79704469 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
same-paper 3 0.79016995 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
4 0.71821719 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
Author: Hal Daume III ; Jagadeesh Jagarlamudi
Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.
5 0.69609046 129 acl-2011-Extending the Entity Grid with Entity-Specific Features
Author: Micha Elsner ; Eugene Charniak
Abstract: We extend the popular entity grid representation for local coherence modeling. The grid abstracts away information about the entities it models; we add discourse prominence, named entity type and coreference features to distinguish between important and unimportant entities. We improve the best result for WSJ document discrimination by 6%.
6 0.6877315 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
7 0.68605292 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
8 0.68462098 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
9 0.68432784 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
10 0.68304819 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
11 0.68252766 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
12 0.68016732 311 acl-2011-Translationese and Its Dialects
13 0.67965114 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
14 0.67891973 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
15 0.67800587 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
16 0.67705727 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
17 0.67647755 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
18 0.6744796 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
19 0.67445236 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
20 0.6744107 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora