acl acl2011 acl2011-275 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers, typically pursued as a supervised modeling task, is particularly well-suited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: In this paper, we argue that ordering prenominal modifiers, typically pursued as a supervised modeling task, is particularly well-suited to semi-supervised approaches. [sent-10, score-1.086]
2 By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. [sent-11, score-0.164]
3 We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. [sent-13, score-0.131]
4 1 Introduction In any given noun phrase (NP), an arbitrary number of nominal modifiers may be used. [sent-14, score-0.262]
5 The order of these modifiers affects how natural or fluent a phrase sounds. [sent-15, score-0.262]
6 Determining a natural ordering is a key task in the surface realization stage of a natural language generation (NLG) system, where the adjectives and other modifiers chosen to identify a referent must be ordered before a final string is produced. [sent-16, score-0.788]
7 For example, consider the alternation between the phrases “big red ball” and “red big ball”. [sent-17, score-0.134]
8 The phrase “big red ball” provides a basic ordering of the words big and red. [sent-18, score-0.498]
9 The reverse ordering, “red big ball”, sounds strange; such a phrase would occur only in marked situations. [sent-19, score-0.073]
10 There is no consensus on the exact qualities that affect a modifier’s position, but it is clear that some modifier orderings sound more natural than others, even if all are, strictly speaking, grammatical. [sent-20, score-0.404]
11 Determining methods for ordering modifiers prenominally and investigating the factors underlying modifier ordering have been areas of considerable interest in natural language processing (Shaw and Hatzivassiloglou, 1999; Malouf, 2000; Mitchell, 2009; Dunlop et al., 2010). [sent-21, score-1.394]
12 A central issue in work on modifier ordering is how to order modifiers that are unobserved during system development. [sent-23, score-1.03]
13 English has upwards of 200,000 words, with over 50,000 words in the vocabulary of an educated adult (Aitchison, 2003). [sent-24, score-0.029]
14 Up to a quarter of these words may be adjectives, which poses a significant problem for any system that attempts to categorize English adjectives in ways that are useful for an ordering task. [sent-25, score-0.453]
15 Extensive in-context observation of adjectives and other modifiers is required to adequately characterize their behavior. [sent-26, score-0.351]
16 Developers of automatic modifier ordering systems have thus spent considerable effort attempting to make reliable predictions despite sparse data, and have largely limited their systems to ordering modifier pairs rather than full modifier strings. [sent-27, score-1.607]
17 Conventional wisdom has been that direct evidence methods such as simple n-gram modeling are insufficient for capturing such a complex and productive process. [sent-28, score-0.077]
18 Recent work has therefore utilized increasingly sophisticated data-driven approaches. [sent-29, score-0.027]
19 Relying on automatic parses to extract NPs yields orders-of-magnitude larger training sets, so that methods that are sensitive to sparse data and/or are domain-specific can be trained on sufficient data. [sent-37, score-0.114]
20 This paper presents two important results: 1) N-gram modeling performs better than previously believed for this task, and in fact surpasses current class-based systems. [sent-39, score-0.034]
21 2) Automatic parsers can effectively provide essentially unlimited training data for learning modifier ordering preferences. [sent-40, score-0.814]
22 Our results point the way to larger-scale data-driven approaches to this and related tasks. [sent-41, score-0.027]
23 2 Related Work In one of the earliest automatic prenominal modifier ordering systems, Shaw and Hatzivassiloglou (1999) ordered pairs of modifiers, including adjectives; nouns (“baseball field”); gerunds (“running man”); and participles (“heated debate”). [sent-42, score-1.165]
24 They described a direct evidence method, a transitivity method, and a clustering method for ordering these different kinds of modifiers, with the transitivity technique returning the highest accuracy, at just over 90%. [sent-43, score-0.494]
25 However, when testing across domains, their accuracy dropped to 56%, not much higher than random guessing. [sent-45, score-0.083]
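For illustration, the direct-evidence and transitivity ideas can be sketched in a few lines of Python. This is our own toy reconstruction under the simplest reading of the method, not the authors' implementation, and the training sequences in the usage example are hypothetical.

    from collections import defaultdict
    from itertools import combinations

    def pair_counts(modifier_sequences):
        """Count how often modifier a is seen before modifier b in training NPs."""
        counts = defaultdict(int)
        for seq in modifier_sequences:
            for a, b in combinations(seq, 2):  # pairs in their observed order
                counts[(a, b)] += 1
        return counts

    def order_pair(a, b, counts):
        # Direct evidence: prefer whichever order was seen more often.
        if counts.get((a, b), 0) != counts.get((b, a), 0):
            return (a, b) if counts.get((a, b), 0) > counts.get((b, a), 0) else (b, a)
        # Transitivity: if a precedes w and w precedes b somewhere, infer a < b.
        vocab = {x for pair in counts for x in pair}
        for w in vocab:
            if counts.get((a, w), 0) and counts.get((w, b), 0):
                return (a, b)
            if counts.get((b, w), 0) and counts.get((w, a), 0):
                return (b, a)
        return None  # no direct or transitive evidence: leave unordered

    counts = pair_counts([["big", "red"], ["red", "bouncy"]])
    print(order_pair("big", "bouncy", counts))  # ('big', 'bouncy'), via "red"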
26 Malouf (2000) continued this work, ordering prenominal adjective pairs in the BNC. [sent-46, score-0.795]
27 He abandoned a bigram model, finding it achieved only 75.57% prediction accuracy, and instead pursued statistical and machine learning techniques that are more robust to data sparsity. [sent-47, score-0.029] [sent-48, score-0.044]
29 However, it is not clear whether the proposed ordering approaches extend to other kinds of modifiers, such as gerund verbs and nouns, and he did not present analysis of cross-domain generalization. [sent-51, score-0.42]
30 1But note that these approaches may still be useful, e.g., when the goal is to construct general modifier classes. [sent-52, score-0.027]
31 Mitchell (2009, 2010) developed a class-based approach: her system mapped each modifier to a class based on the frequency with which it occurs in different prenominal positions, and ordered unseen sequences based on these classes. [sent-55, score-0.846]
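The positional-class idea can be sketched as follows; the bucketing scheme and helper names here are our own simplifications, not the classes used in Mitchell's system.

    from collections import defaultdict

    def positional_classes(modifier_sequences, n_buckets=4):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for seq in modifier_sequences:
            if len(seq) < 2:
                continue  # position is only informative with 2+ modifiers
            for i, m in enumerate(seq):
                sums[m] += i / (len(seq) - 1)  # 0 = farthest from head, 1 = closest
                counts[m] += 1
        # Bucket each modifier's average relative position into a class label.
        return {m: min(int(sums[m] / counts[m] * n_buckets), n_buckets - 1)
                for m in counts}

    classes = positional_classes([["big", "red", "rubber"], ["shiny", "red", "rubber"]])
    # Unseen sequences can then be ordered by sorting on the class labels.
    print(sorted(["rubber", "big"], key=lambda m: classes.get(m, 0)))  # ['big', 'rubber']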
32 Dunlop et al. (2010) used a Multiple Sequence Alignment (MSA) approach to order modifiers, achieving the highest accuracy to date across different domains. [sent-57, score-0.083]
33 In contrast to earlier work, both systems order full modifier strings. [sent-58, score-0.404]
34 Below, we evaluate these most recent systems, scaling up the training data by several orders of magnitude. [sent-59, score-0.146]
35 Our results indicate that an n-gram model outperforms previous systems, and generalizes quite well across different domains. [sent-60, score-0.029]
36 Our data is drawn from the Wall Street Journal (WSJ), Switchboard (SWBD), and Brown corpus sections of the Penn Treebank (Marcus et al., 1993). [sent-63, score-0.083]
37 Table 1 lists the NP length distributions for each training corpus. [sent-66, score-0.046]
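The NP-extraction step that produces these training sequences can be sketched as below. Assumptions: NLTK is available, NPs use flat Penn Treebank bracketing, and the modifier tag set is our guess at what counts as a modifier, not the paper's exact filter.

    from nltk import Tree

    MODIFIER_TAGS = {"JJ", "JJR", "JJS", "NN", "NNS", "VBG", "VBN"}

    def modifier_sequences(parse_str):
        """Yield prenominal modifier sequences (2+ modifiers) from one parse."""
        tree = Tree.fromstring(parse_str)
        for np in tree.subtrees(lambda t: t.label() == "NP"):
            tagged = np.pos()  # flattened (word, tag) pairs under this NP
            if len(tagged) < 3 or not tagged[-1][1].startswith("NN"):
                continue  # need a final noun head with at least two words before it
            mods = []
            for word, tag in tagged[:-1]:
                # keep only the modifier run that ends right before the head
                mods = mods + [word] if tag in MODIFIER_TAGS else []
            if len(mods) >= 2:
                yield mods

    print(list(modifier_sequences("(NP (JJ big) (JJ red) (NN ball))")))
    # [['big', 'red']]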
38 The WSJ training corpus yields just under 5,100 distinct modifier types (without normalizing for capitalization), while the NYT data yields 105,364. [sent-67, score-0.524]
39 Note that the number of NPs extracted from the manual and automatic parses of the WSJ are quite close. [sent-68, score-0.077]
40 We find that the overlap between the two groups is well over 90%, suggesting that extracting NPs from a large, automatically parsed corpus will provide phrases comparable to manually annotated NPs. [sent-69, score-0.404] [sent-70, score-0.033]
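The paper does not spell out how this overlap is computed; a Jaccard-style overlap over distinct modifier sequences is one reasonable reading, sketched here on hypothetical toy input.

    def type_overlap(manual_nps, auto_nps):
        manual = {tuple(seq) for seq in manual_nps}
        auto = {tuple(seq) for seq in auto_nps}
        return len(manual & auto) / len(manual | auto)

    print(type_overlap([["big", "red"], ["old", "wooden"]],
                       [["big", "red"], ["old", "wooden"], ["shiny", "new"]]))
    # 0.667 on this toy input; values well over 0.9 would indicate
    # near-identical extraction from manual and automatic parses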
42 We evaluate across a variety of domains, including (1) WSJ sections 22-24, and commensurately sized sections of (2) the SWBD corpus and (3) the Brown corpus. [sent-71, score-0.295]
43 4 Methods In this section, we present two novel prenominal modifier ordering approaches: a 5-gram model and an EM-trained HMM. [sent-73, score-1.121]
44 In both systems, modifiers that occur only once in the training data are given the Berkeley parser OOV class labels (Petrov, 2010). [sent-74, score-0.353]
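As a rough illustration of this relabeling, the sketch below replaces singleton modifiers with a coarse surface-feature signature. The real Berkeley parser signature set is considerably richer; the feature checks here are a simplified stand-in.

    from collections import Counter

    def oov_signature(word):
        sig = "UNK"
        if word[0].isupper():
            sig += "-CAP"
        if any(ch.isdigit() for ch in word):
            sig += "-NUM"
        for suffix in ("ing", "ed", "est", "er", "ly", "al"):
            if word.lower().endswith(suffix):
                sig += "-" + suffix
                break
        return sig

    def relabel_singletons(sequences):
        freq = Counter(w for seq in sequences for w in seq)
        return [[w if freq[w] > 1 else oov_signature(w) for w in seq]
                for seq in sequences]

    print(relabel_singletons([["big", "red"], ["big", "gleaming"]]))
    # [['big', 'UNK-ed'], ['big', 'UNK-ing']]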
45 In Section 5, we compare these approaches to the one-class system described in Mitchell (2010) and the discriminative MSA described in Dunlop et al. (2010). [sent-75, score-0.027]
46 4.1 N-Gram Modeling We used the SRILM toolkit (Stolcke, 2002) to build unpruned 5-gram models using interpolated modified Kneser-Ney smoothing (Chen and Goodman, 1998). [sent-79, score-0.029]
47 We explored building n-gram models based on entire observed sequences (sentences) and on extracted multiple modifier NPs. [sent-81, score-0.404]
48 As shown in Table 3, we found a very large (12% absolute) accuracy improvement in a model trained with just NP sequences. [sent-82, score-0.054]
49 This is likely due to several factors, including the role of the begin-of-string symbol <s>, which helps to capture word preferences for occurring first in a modifier sequence; also, the behavior of modifiers when they occur in NPs may differ from how they behave in other contexts. [sent-83, score-0.725]
50 Note that the full-sentence n-gram model performs similarly to Malouf’s bigram model; although the results are not directly comparable, this may explain the common impression that n-gram modeling is not effective for modifier ordering. [sent-84, score-0.438]
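At test time, an n-gram ordering model scores every permutation of the unordered modifier set (with boundary symbols) and returns the best. The toy add-alpha bigram scorer below is our stand-in for the SRILM 5-gram models actually used; the training sequences are hypothetical.

    from collections import defaultdict
    from itertools import permutations
    from math import log

    def make_bigram_scorer(sequences, alpha=0.1):
        bigrams, unigrams, vocab = defaultdict(int), defaultdict(int), set()
        for seq in sequences:
            padded = ["<s>"] + seq + ["</s>"]
            vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                bigrams[(a, b)] += 1
                unigrams[a] += 1
        def logprob(a, b):  # add-alpha smoothed bigram log-probability
            return log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * len(vocab)))
        return logprob

    def best_order(modifiers, logprob):
        def score(order):
            padded = ("<s>",) + order + ("</s>",)
            return sum(logprob(a, b) for a, b in zip(padded, padded[1:]))
        return max(permutations(modifiers), key=score)

    lp = make_bigram_scorer([["big", "red"], ["big", "shiny"]])
    print(best_order(["red", "big"], lp))  # ('big', 'red')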
51 Table 3: Modifier ordering accuracy on WSJ sections 22-24, trained on sections 2-21. [sent-87, score-0.584]
52 4.2 EM-Trained HMM We use the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) to learn the parameterizations of this model, a class-based HMM. [sent-92, score-0.029]
53 The model is defined in terms of state transition probabilities P(c′ | c), i.e., the probability of transitioning from a state labeled c to a state labeled c′, and state observation probabilities P(w | c), i.e., the probability of emitting a word w from a particular class c. [sent-93, score-0.121] [sent-97, score-0.041] [sent-99, score-0.045]
56 Since the classes are predicting an ordering, we include hard constraints on class transitions. [sent-100, score-0.045]
57 Specifically, we forbid a transition from a class closer to the head noun to one farther away. [sent-101, score-0.125]
58 More formally, if the subscript of a class indicates its distance from the head, then for any i, j, P(ci | cj) = 0 if i ≥ j; i.e., a class ci following cj must be strictly closer to the head. [sent-102, score-0.045] [sent-104, score-0.029]
60 We initialize the model with a uniform distribution over allowed transition and emission probabilities, and use add-δ regularization in the M-step of EM at each iteration. [sent-108, score-0.132]
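A minimal sketch of this constrained, uniform initialization, assuming classes c_0 ... c_{K-1} indexed by distance from the head; numpy and the matrix layout are our own choices.

    import numpy as np

    def constrained_uniform_transitions(K):
        # trans[j, i] = P(next class = c_i | previous class = c_j);
        # forbidden cells (i >= j) stay at exactly zero.
        trans = np.zeros((K, K))
        for j in range(K):
            allowed = [i for i in range(K) if i < j]  # strictly closer to head
            for i in allowed:
                trans[j, i] = 1.0 / len(allowed)
        return trans

    print(constrained_uniform_transitions(3))
    # [[0.  0.  0. ]   row 0: the class nearest the head has no class successor
    #  [1.  0.  0. ]   (in a full HMM it would transition to an end state)
    #  [0.5 0.5 0. ]]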
61 Rather than training to full convergence of the corpus likelihood, we stop training when there is no improvement in ordering accuracy on the held-out dataset for five iterations, and output the best-scoring model. [sent-111, score-0.51]
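The stopping rule itself is easy to isolate. In this sketch, step and evaluate are hypothetical callables standing in for one (generalized) EM iteration and the held-out ordering-accuracy check; the toy usage at the bottom exists only to show the schedule terminating.

    import copy

    def train_with_early_stopping(model, step, evaluate, patience=5):
        best_acc, best_model, stale = float("-inf"), copy.deepcopy(model), 0
        while stale < patience:
            step(model)                 # one (generalized) EM iteration
            acc = evaluate(model)
            if acc > best_acc:
                best_acc, best_model, stale = acc, copy.deepcopy(model), 0
            else:
                stale += 1              # stop after `patience` flat iterations
        return best_model

    # Toy usage: accuracy plateaus at 3, so the best snapshot is returned.
    state = {"iters": 0}
    best = train_with_early_stopping(state,
                                     step=lambda m: m.update(iters=m["iters"] + 1),
                                     evaluate=lambda m: min(m["iters"], 3))
    print(best)  # {'iters': 3}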
62 Because of the constraints on transition probabilities, straightforward application of EM leads to the transition probabilities strongly skewing the learning of emission probabilities. [sent-112, score-0.282]
63 We thus followed a generalized EM procedure (Neal and Hinton, 1998), updating only emission probabilities until no more improvement is achieved, and then training both transition and emission probabilities on extracted NPs. [sent-113, score-0.139]
64 Often, we find no improvement with the inclusion of transition probabilities, and they are left uniform. [sent-115, score-0.046] [sent-116, score-0.08]
[Table 4 caption fragment: single-class system (1-cl), HMM and MSA systems, under various training conditions.]
66 In this case, test ordering is determined by the class label alone. [sent-117, score-0.409]
67 5 Empirical results Several measures have been used to evaluate the accuracy of a system’s modifier ordering, including type/token accuracy, pairwise accuracy, and full-string accuracy. [sent-118, score-0.487]
68 We evaluate full-string ordering accuracy over all tokens in the evaluation set. [sent-119, score-0.447]
69 For every NP, if the model’s highest-scoring ordering is identical to the actual observed order, it is correct; otherwise, it is incorrect. [sent-120, score-0.364]
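The metric reduces to an exact-match check over NPs. In the sketch below, best_order is any ordering function (for instance, the permutation scorer sketched in Section 4.1); sorting the input simulates handing the model an unordered modifier bag.

    def full_string_accuracy(test_nps, best_order):
        correct = sum(1 for seq in test_nps
                      if list(best_order(sorted(seq))) == seq)
        return correct / len(test_nps)

    # Toy check with an "orderer" that simply alphabetizes:
    print(full_string_accuracy([["big", "red"], ["red", "big"]], sorted))
    # 0.5: only the alphabetically ordered NP is reproduced exactly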
70 We evaluate under a variety of training conditions, on WSJ sections 22-24, as well as the testing sections from the Switchboard and Brown corpus portions of the Penn Treebank. [sent-122, score-0.195]
71 We perform no domain-specific tuning, so the results on the Switchboard and Brown corpora demonstrate cross-domain applicability of the approaches. [sent-123, score-0.032]
72 5.1 Manual parses versus automatic parses We begin by comparing the NPs extracted from manual parses to those extracted from automatic parses. [sent-125, score-0.231]
73 We parsed Wall Street Journal sections 02 through 21 using cross-validation to ensure that the parses are as errorful as they would be for sentences never observed in training. [sent-126, score-0.193]
74 Table 4 compares models trained on these two training corpora, as evaluated on the manually annotated test set. [sent-127, score-0.046]
75 No system’s accuracy degrades greatly when using automatic parses, indicating that we can likely derive useful training data by automatically parsing a large, unlabeled training corpus. [sent-128, score-0.1]
76 5.2 Semi-supervised models We now evaluate performance of the models on the scaled-up training data. [sent-129, score-0.082]
77 Using the Berkeley parser, we parsed 169 million words of NYT text from the English Gigaword corpus (Graff and Cieri, 2003), extracted the multiple modifier NPs, and trained our various models on this data. [sent-130, score-0.437]
78 Rows 3-6 of Table 4 show the accuracy on WSJ sections 22-24 after training on 10%, 20%, 50% and 100% of this data. [sent-131, score-0.183]
79 Note that this represents approximately 150 times as much training data as the original treebank. [sent-132, score-0.127]
80 Even with just 10% of this data (a 15-fold increase in the training data), we see across-the-board improvements. [sent-133, score-0.119]
81 Using all of the NYT data results in approximately 5% absolute performance increase for the n-gram and MSA models, yielding roughly commensurate performance, over 92% accuracy. [sent-134, score-0.1]
82 Although we do not have space to present the results in this paper, we found further improvements (over 1% absolute, statistically significant) by combining the four models, indicating a continued benefit of the other models, even if none of them best the n-gram individually. [sent-135, score-0.094]
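The combination method is not described in the paper; one simple possibility, shown purely for illustration, is Borda-style rank voting over candidate permutations, with each system contributing a ranking.

    from itertools import permutations

    def combine_by_rank(modifiers, scorers):
        """Each scorer maps a permutation to a score; sum the ranks across systems."""
        candidates = list(permutations(modifiers))
        totals = {c: 0 for c in candidates}
        for score in scorers:
            ranked = sorted(candidates, key=score, reverse=True)
            for rank, c in enumerate(ranked):
                totals[c] += rank  # lower total rank = preferred by more systems
        return min(candidates, key=lambda c: totals[c])

    # Toy scorers: one prefers "big" first, one prefers alphabetical order.
    s1 = lambda c: 1.0 if c[0] == "big" else 0.0
    s2 = lambda c: 1.0 if c == tuple(sorted(c)) else 0.0
    print(combine_by_rank(["red", "big"], [s1, s2]))  # ('big', 'red')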
83 Further, contrary to conventional wisdom, n-gram models are very competitive with recent high-accuracy frameworks. [sent-138, score-0.037]
84 The n-gram model still outperforms the other systems, but improves by well over a percent, while the class-based HMM and MSA approaches are relatively static. [sent-141, score-0.046] [sent-142, score-0.027]
86 5.3 Cross-domain evaluation With respect to cross-domain applicability, we see that, as with the WSJ evaluation, the MSA and n-gram approaches are roughly commensurate on the Brown corpus; but the n-gram model shows a greater advantage on the Switchboard test set when trained on the NYT data. [sent-145, score-0.127]
87 Perhaps this is due to higher reliance on conventionalized collocations in the spoken language of Switchboard. [sent-146, score-0.029]
88 Finally, it is clear that the addition of the WSJ data to the NYT data yields improvements only for the specific newswire domain: none of the results change much for these two new domains when the WSJ data is included (last row of the table). [sent-147, score-0.139]
89 We note that the improvements observed when scaling the training corpus with in-domain data persist when applied to very diverse domains. [sent-148, score-0.132]
90 Interestingly, n-gram models, which may have been considered unlikely to generalize well to other domains, maintain their superior performance in each trial. [sent-149, score-0.03]
91 6 Discussion In this paper, we demonstrated the efficacy of scaling up training data for prenominal modifier ordering using automatic parses. [sent-150, score-1.226]
92 We presented two novel systems for ordering prenominal modifiers, and demonstrated that with sufficient data, a simple n-gram model outperforms position-specific models, such as an EM-trained HMM and the MSA approach of Dunlop et al. (2010). [sent-151, score-0.717]
93 The accuracy achieved by the n-gram model is particularly interesting, since such models have previously been considered ineffective for this task. [sent-153, score-0.054]
94 This does not obviate the need for a class-based model (modifier classes may inform linguistic research, and system combination still yields large improvements), but it points to new data-rich methods for learning such models. [sent-154, score-0.138]
95 Chen and Goodman (1998): An empirical study of smoothing techniques for language modeling. [sent-165, score-0.029]
96 Malouf (2000): The order of prenominal adjectives in natural language generation. [sent-187, score-0.442]
97 Mitchell (2010): A flexible approach to class-based ordering of prenominal modifiers. [sent-207, score-0.759]
wordName wordTfidf (topN-words)
[('modifier', 0.404), ('ordering', 0.364), ('prenominal', 0.353), ('modifiers', 0.262), ('dunlop', 0.229), ('msa', 0.208), ('wsj', 0.171), ('nyt', 0.162), ('nps', 0.112), ('ball', 0.108), ('malouf', 0.102), ('commensurate', 0.1), ('switchboard', 0.092), ('adjectives', 0.089), ('shaw', 0.088), ('mitchell', 0.087), ('margaret', 0.086), ('sections', 0.083), ('transition', 0.08), ('hmm', 0.077), ('parses', 0.077), ('big', 0.073), ('em', 0.068), ('danks', 0.067), ('graff', 0.067), ('red', 0.061), ('scaling', 0.059), ('cieri', 0.059), ('petrov', 0.056), ('np', 0.055), ('aberdeen', 0.054), ('swbd', 0.054), ('accuracy', 0.054), ('emission', 0.052), ('berkeley', 0.05), ('neal', 0.046), ('domains', 0.046), ('brown', 0.046), ('training', 0.046), ('class', 0.045), ('pursued', 0.044), ('board', 0.044), ('ordered', 0.044), ('wisdom', 0.043), ('oregon', 0.042), ('classbased', 0.042), ('probabilities', 0.041), ('orders', 0.041), ('verbal', 0.04), ('adjective', 0.04), ('continued', 0.038), ('transitivity', 0.038), ('aaron', 0.038), ('hatzivassiloglou', 0.037), ('conventional', 0.037), ('yields', 0.037), ('dempster', 0.036), ('scaled', 0.036), ('roark', 0.036), ('health', 0.035), ('treebank', 0.035), ('modeling', 0.034), ('brian', 0.034), ('parsed', 0.033), ('applicability', 0.032), ('sparse', 0.031), ('superior', 0.03), ('slav', 0.03), ('behavior', 0.03), ('across', 0.029), ('aitchison', 0.029), ('stipulated', 0.029), ('parameterizations', 0.029), ('skewing', 0.029), ('dunlopa', 0.029), ('ndw', 0.029), ('ofverbal', 0.029), ('baseball', 0.029), ('gerund', 0.029), ('ays', 0.029), ('conventionalized', 0.029), ('glucksberg', 0.029), ('sam', 0.029), ('oftraining', 0.029), ('obviate', 0.029), ('fpro', 0.029), ('wellsuited', 0.029), ('abandoned', 0.029), ('borne', 0.029), ('educated', 0.029), ('inapplicability', 0.029), ('string', 0.029), ('none', 0.029), ('penn', 0.029), ('smoothing', 0.029), ('srilm', 0.029), ('portland', 0.029), ('approaches', 0.027), ('improvements', 0.027), ('scotland', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers typically pursued as a supervised modeling task is particularly wellsuited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. – –
2 0.61115581 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
Author: Jenny Liu ; Aria Haghighi
Abstract: In this work, we present a novel approach to the generation task of ordering prenominal modifiers. We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59. 1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
3 0.2103354 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method.
4 0.12352093 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
Author: Yue Zhang ; Joakim Nivre
Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score form 91.4% to 92.9%, giving the best results so far for transitionbased parsing and rivaling the best results overall. For the Chinese Treebank, they give a signficant improvement of the state of the art. An open source release of our parser is freely available.
5 0.1089173 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai
Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to wordto-word selectional preferences by using webscale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.
6 0.087535113 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
7 0.086186349 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
8 0.084914006 101 acl-2011-Disentangling Chat with Local Coherence Models
9 0.079908356 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
10 0.073421873 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
11 0.072937235 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
12 0.070094287 333 acl-2011-Web-Scale Features for Full-Scale Parsing
13 0.064805366 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
14 0.063995287 109 acl-2011-Effective Measures of Domain Similarity for Parsing
15 0.062815361 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
16 0.062218003 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
17 0.062139302 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
18 0.062058065 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
19 0.061473727 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
20 0.059901718 44 acl-2011-An exponential translation model for target language morphology
topicId topicWeight
[(0, 0.176), (1, -0.011), (2, -0.055), (3, -0.091), (4, -0.045), (5, -0.018), (6, 0.058), (7, 0.079), (8, -0.013), (9, 0.057), (10, 0.016), (11, 0.045), (12, -0.071), (13, -0.037), (14, -0.079), (15, 0.084), (16, -0.239), (17, 0.4), (18, -0.107), (19, -0.141), (20, 0.102), (21, 0.092), (22, -0.162), (23, -0.052), (24, 0.042), (25, -0.102), (26, 0.058), (27, 0.293), (28, 0.234), (29, 0.085), (30, 0.028), (31, 0.036), (32, -0.042), (33, -0.088), (34, -0.161), (35, 0.155), (36, 0.021), (37, -0.077), (38, 0.084), (39, -0.023), (40, 0.061), (41, 0.061), (42, -0.0), (43, 0.113), (44, 0.069), (45, 0.008), (46, -0.165), (47, -0.087), (48, 0.088), (49, -0.147)]
simIndex simValue paperId paperTitle
1 0.97361821 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
Author: Jenny Liu ; Aria Haghighi
same-paper 2 0.94607061 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
3 0.54493982 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. 
As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. 
Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 
3 Experiments and Preliminary Results

Currently, we have evaluated grouping-based ordering on single-document summarization, for which the text order is usually considered sufficient. But there is no theoretical proof that the text order leads to the optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the "Wall Street Journal Formula" (Rich and Harper, 2007), where conceptually related sentences are placed at the beginning and the end, conceptual relatedness does not necessarily correlate with spatial proximity, and selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of ordering sentences for single-document summarization. The experimental results reported below may draw some attention to this taken-for-granted issue.

3.1 Data and Method

We prepared 3 datasets of 60 documents each: the first (D400) consists of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consists of documents of about 1,000 words manually selected from popular English journals such as The Wall Street Journal and The Washington Post; the third (D2k) consists of documents of about 2,000 words from the DUC 01/02 datasets. We then generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph boundaries (paragraphs of at least 2 sentences), which are used by one of the evaluation metrics described below.

In our implementation of grouping-based ordering, sentences are represented as entity vectors, and the threshold is set to t = c * Avg(Sim(S_m, S_n)), the average sentence similarity multiplied by a coefficient c decided empirically on separate held-out datasets of 20 documents for each length category. The "group-representing sentence" is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups, and with all the algorithms proposed in Section 2.3 for group-level and sentence-level ordering, resulting in 8 combinations as test orderings, each coded in the format "Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)", where T and G represent the text-order approach and the greedy-selection approach, respectively. For example, "CC/T/G" means grouping with CC, group ordering by text order, and sentence ordering with the greedy approach.

We evaluated the test orderings against the 3 reference orderings and computed the average score (Madnani et al., 2007), using 3 different metrics. The first metric is Kendall's τ (Lapata, 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering into the reference ordering:

τ = 1 - 4m / (N(N - 1))

In this formula, m represents the number of inversions described above and N is the total number of sentences. A sketch of this computation follows below.
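The following is a minimal sketch of the τ computation (our illustration, not the paper's evaluation code); it obtains m by counting discordant sentence pairs, which equals the number of adjacent transpositions needed.

```python
def kendall_tau(test_order, reference_order):
    """tau = 1 - 4m / (N(N-1)), where m is the number of adjacent inversions
    needed to transform the test ordering into the reference ordering
    (equivalently, the number of discordant sentence pairs)."""
    pos = {s: i for i, s in enumerate(reference_order)}
    ranks = [pos[s] for s in test_order]  # reference positions in test order
    n = len(ranks)
    m = sum(1 for i in range(n) for j in range(i + 1, n) if ranks[i] > ranks[j])
    return 1 - 4 * m / (n * (n - 1))

# Two inversions out of three pairs: tau = 1 - 8/6 = -1/3
print(kendall_tau([2, 0, 1], [0, 1, 2]))
```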
The second metric is Average Continuity (AC), proposed by Bollegala et al. (2006), which captures the intuition that the quality of a sentence ordering can be estimated by the number of correctly arranged continuous sentences:

AC = exp( (1/(k - 1)) * sum_{n=2}^{k} log(P_n + α) )

In this formula, k is the maximum length of continuous sentence runs considered, and α is a small value that avoids taking the logarithm of zero when P_n = 0. P_n, the proportion of continuous sentence runs of length n in an ordering, is defined as m/(N - n + 1), where m is the number of continuous sentence runs of length n shared by the test and reference orderings and N is the total number of sentences. Following Bollegala et al. (2006), we set k = min(4, N) and α = 0.01.

We also go a step further by considering only the continuous sentences within a paragraph marked by the human annotators, because paragraphs are local meaning units perceived by human readers, and the order of continuous sentences within a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. In-paragraph sentence continuity is therefore a better estimate of the quality of a sentence ordering. This is our third metric, Paragraph-level Average Continuity (P-AC):

P-AC = exp( (1/(k - 1)) * sum_{n=2}^{k} log(PP_n + α) )

Here PP_n = m'/(N - n + 1), where m' is the number of continuous sentence runs of length n shared by the test ordering and a paragraph of the reference ordering. All other parameters are as defined for AC and P_n. A sketch of both computations follows below.
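Below is a small sketch of AC and its P-AC variant under the definitions above (an illustration, not the original evaluation code). The `paragraph_of` mapping is a hypothetical stand-in for the annotators' paragraph marks; passing it switches the count to in-paragraph runs only.

```python
import math

def average_continuity(test, reference, k=None, alpha=0.01, paragraph_of=None):
    """AC = exp( (1/(k-1)) * sum_{n=2..k} log(P_n + alpha) ), with
    P_n = m / (N - n + 1). Assumes N >= 2 sentences. If paragraph_of is
    given (sentence -> paragraph id of the reference), computes P-AC."""
    N = len(reference)
    k = min(4, N) if k is None else k
    ref_pos = {s: i for i, s in enumerate(reference)}
    logs = []
    for n in range(2, k + 1):
        m = 0
        for i in range(N - n + 1):
            window = test[i:i + n]
            # a run counts if it is also consecutive in the reference ordering
            ok = all(ref_pos[window[j + 1]] == ref_pos[window[j]] + 1
                     for j in range(n - 1))
            if ok and paragraph_of is not None:
                # P-AC variant: the run must fall inside one marked paragraph
                ok = len({paragraph_of[s] for s in window}) == 1
            m += ok
        logs.append(math.log(m / (N - n + 1) + alpha))
    return math.exp(sum(logs) / (k - 1))

# Perfect ordering scores ~1 (slightly above, due to the additive alpha).
print(average_continuity([0, 1, 2, 3], [0, 1, 2, 3]))
```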
3.2 Results

The following tables show the results measured by each metric. For comparison, we also include a "Baseline" that uses the text order. For each dataset, a two-tailed t-test is conducted between the top scorer and all the other orderings, and statistical significance (p < 0.05) is marked with *.

In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths, and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of adopting the text order for single-document summary generation, which is a major finding of our study. The three evaluation metrics give consistent results, although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of the strict paragraph constraint.

Interestingly, applying the text order after sentence grouping for group-level and sentence-level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all "__/T/T". This suggests that the textual realization of coherence can be sought in the source document, where available, once the selected sentences have been grouped. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), in which the sentences are closely connected, play a crucial role in creating a coherent flow; shuffling those paragraphs may not affect the final coherence.¹

The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is due simply to length or to linguistic/stylistic subtleties is an interesting topic for in-depth study.

¹ I thank an anonymous reviewer for pointing this out.

4 Conclusion and Future Work

We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the text order widely accepted by the summarization community. Nonetheless, the results do not necessarily carry over to multi-document summarization, for which the same-document cue for ordering does not apply directly. Adapting the proposed scheme to multi-document summary generation is ongoing work. As a next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as those of Soricut and Marcu (2006) and Elsner et al. (2007), to our problem, so as to make more interesting comparisons possible.

Acknowledgements

The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw this work through down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E).

References

Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization, 71–80. Cambridge, Massachusetts: MIT Press.

Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17:35–55.

Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-Based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor.

Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34:1–34.

Barzilay, R. and Lee, L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference, 113–120.

Bollegala, D., Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney.

Elsner, M., Austerweil, J., and Charniak, E. 2007. A Unified Local and Global Model for Discourse Coherence. In Proceedings of NAACL HLT 2007, 436–443. Rochester, NY.

Grosz, B. J., Joshi, A. K., and Weinstein, S. 1995. Centering: A Framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225.

Halliday, M. A. K. and Hasan, R. 1976. Cohesion in English. London: Longman.

Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, 583–598. Oxford: Oxford University Press.

Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing Conference, 310–315. Seattle, WA.

Jing, H. and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185.

Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications.

Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of the ACL, 545–552. Sapporo, Japan.

Lapata, M. 2006. Automatic Evaluation of Information Ordering: Kendall's Tau. Computational Linguistics, 32(4):1–14.

Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O'Leary, D. P., and Schlesinger, J. D. 2007.
Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany.

Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281.

Rich, C. and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Belmont, CA: Thomson Learning.

Soricut, R. and Marcu, D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810.

Steinbach, M., Karypis, G., and Kumar, V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034, Department of Computer Science and Engineering, University of Minnesota.

Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Wilpon, J. G. and Rabiner, L. R. 1985. A Modified K-means Clustering Algorithm for Use in Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-33(3):587–594.

Zhang, R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497. Beijing.
4 0.31263322 239 acl-2011-P11-5002 k2opt.pdf
Author: empty-author
Abstract: unknown-abstract
5 0.29701793 297 acl-2011-That's What She Said: Double Entendre Identification
Author: Chloe Kiddon ; Yuriy Brun
Abstract: Humor identification is a hard natural language understanding problem. We identify a subproblem, the “that’s what she said” problem, with two distinguishing characteristics: (1) use of nouns that are euphemisms for sexually explicit nouns and (2) structure common in the erotic domain. We address this problem in a classification approach that includes features that model those two characteristics. Experiments on web data demonstrate that our approach improves precision by 12% over baseline techniques that use only word-based features.
6 0.28622311 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
7 0.28145579 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
8 0.27795425 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
9 0.27237892 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
10 0.26777589 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
11 0.26259583 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
12 0.26125374 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
13 0.25259733 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
14 0.25258347 101 acl-2011-Disentangling Chat with Local Coherence Models
15 0.25094253 333 acl-2011-Web-Scale Features for Full-Scale Parsing
16 0.24983846 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations
17 0.24667838 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
18 0.24558458 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
19 0.23729861 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
20 0.23707883 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
topicId topicWeight
[(5, 0.027), (17, 0.041), (26, 0.024), (31, 0.014), (37, 0.087), (39, 0.047), (41, 0.051), (55, 0.434), (59, 0.04), (72, 0.02), (91, 0.033), (96, 0.115)]
simIndex simValue paperId paperTitle
1 0.86585134 24 acl-2011-A Scalable Probabilistic Classifier for Language Modeling
Author: Joel Lang
Abstract: We present a novel probabilistic classifier, which scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model whose parameters can be reliably and efficiently estimated. We show that a generative language model based on our classifier consistently matches modified Kneser-Ney smoothing and can outperform it if sufficiently rich features are incorporated.
2 0.84123433 78 acl-2011-Confidence-Weighted Learning of Factored Discriminative Language Models
Author: Viet Ha Thuc ; Nicola Cancedda
Abstract: Language models based on word surface forms only are unable to benefit from available linguistic knowledge, and tend to suffer from poor estimates for rare features. We propose an approach to overcome these two limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend confidence-weighted learning to deal with label noise in training data, a common case with discriminative language modeling.
same-paper 3 0.8369509 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers typically pursued as a supervised modeling task is particularly wellsuited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. – –
4 0.82646883 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System
Author: Reyyan Yeniterzi
Abstract: Turkish is an agglutinative language with complex morphological structures; therefore, using only word forms is not enough for many computational tasks. In this paper we analyze the effect of morphology in a Named Entity Recognition system for Turkish. We start with the standard word-level representation and incrementally explore the effect of capturing syntactic and contextual properties of tokens. Furthermore, we also explore a new representation in which roots and morphological features are represented as separate tokens instead of representing only words as tokens. Using syntactic and contextual properties with the new representation provides a 7.6% relative improvement over the baseline.
5 0.75643277 144 acl-2011-Global Learning of Typed Entailment Rules
Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger
Abstract: Extensive knowledge bases of entailment rules between predicates are crucial for applied semantic inference. In this paper we propose an algorithm that utilizes transitivity constraints to learn a globally-optimal set of entailment rules for typed predicates. We model the task as a graph learning problem and suggest methods that scale the algorithm to larger graphs. We apply the algorithm over a large data set of extracted predicate instances, from which a resource of typed entailment rules has been recently released (Schoenmackers et al., 2010). Our results show that using global transitivity information substantially improves performance over this resource and several baselines, and that our scaling methods allow us to increase the scope of global learning of entailment-rule graphs.
6 0.73754346 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
7 0.67848325 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
8 0.54283243 150 acl-2011-Hierarchical Text Classification with Latent Concepts
9 0.53723174 175 acl-2011-Integrating history-length interpolation and classes in language modeling
10 0.533764 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
11 0.53229511 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
13 0.5232904 85 acl-2011-Coreference Resolution with World Knowledge
14 0.51903003 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
15 0.51310611 197 acl-2011-Latent Class Transliteration based on Source Language Origin
16 0.51301646 135 acl-2011-Faster and Smaller N-Gram Language Models
17 0.5103693 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
18 0.51016188 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
19 0.50909138 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
20 0.50761503 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing