acl acl2011 acl2011-237 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jenny Liu ; Aria Haghighi
Abstract: In this work, we present a novel approach to the generation task of ordering prenominal modifiers. We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59. 1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract In this work, we present a novel approach to the generation task of ordering prenominal modifiers. [sent-3, score-0.716]
2 We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. [sent-4, score-0.359]
3 We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. [sent-5, score-0.106]
4 8% and average error reduction across all test sets of 59. [sent-7, score-0.072]
5 1% compared to the state-of-the-art and a maximum error reduction of 68. [sent-8, score-0.097]
6 4% and average error reduction across all test sets of 41. [sent-9, score-0.072]
7 1 Introduction Speakers rarely have difficulty correctly ordering modifiers such as adjectives, adverbs, or gerunds when describing some noun. [sent-11, score-0.997]
8 The phrase “beautiful blue Macedonian vase” sounds very natural, whereas changing the modifier ordering to “blue Macedonian beautiful vase” is awkward (see Table 1 for more examples). [sent-12, score-0.859]
9 In this work, we consider the task of ordering an unordered set of prenominal modifiers so that they sound fluent to native language speakers. [sent-13, score-1.387]
10 Much linguistic research has investigated the semantic constraints behind prenominal modifier orderings. [sent-15, score-0.663]
11 One common line of research suggests that modifiers can be organized by the underlying semantic property they describe and that there is 1109 Aria Haghighi MIT CSAIL me @ aria4 2 . [sent-16, score-0.64]
12 The most natural sounding ordering is in bold, followed by other possibilities that may only be appropriate in certain situations. [sent-18, score-0.466]
13 an ordering on semantic properties which in turn restricts modifier orderings. [sent-19, score-0.711]
14 For instance, Sproat and Shih (1991) contend that the size property precedes the color property and thus “small black cat” sounds more fluent than “black small cat”. [sent-20, score-0.423]
15 Using > to denote precedence of semantic groups, some commonly proposed orderings are: quality > size > shape > color > provenance (Sproat and Shih, 1991), age > color > participle > provenance > noun > denominal (Quirk et al. [sent-21, score-0.571]
16 , 1974), and value > dimension > physical property > speed > human propensity > age > color (Dixon, 1977). [sent-22, score-0.208]
17 However, correctly classifying modifiers into these groups can be difficult and may be domain dependent or constrained by the context in which the modifier is being used. [sent-23, score-0.935]
18 In addition, these methods do not specify how to order modifiers within the same class or modifiers that do not fit into any of the specified groups. [sent-24, score-1.263]
19 c s 2o0ci1a1ti Aonss foocria Ctioomnp fourta Ctioomnaplu Ltaintigouniaslti Lcisn,g puaigsetsic 1s109–1116, a class-based approach in which modifiers are grouped into classes based on which positions they prefer in the training corpus, with a predefined ordering imposed on these classes. [sent-28, score-1.285]
20 Shaw and Hatzivassiloglou (1999) developed three different approaches to the problem that use counting methods and clustering algorithms, and Malouf (2000) expands upon Shaw and Hatzivassiloglou’s work. [sent-29, score-0.089]
21 This paper describes a computational solution to the problem that uses relevant features to model the modifier ordering process. [sent-30, score-0.722]
22 By mapping a set of features across the training data and using a maximum entropy reranking model, we can learn optimal weights for these features and then order each set of modifiers in the test data according to our features and the learned weights. [sent-31, score-0.914]
23 This approach has not been used before to solve the prenominal modifier ordering problem, and as we demonstrate, vastly outperforms the state-of-the-art, especially for sequences of longer lengths. [sent-32, score-1.09]
24 In Section 3 we present the details of our maximum entropy reranking approach. [sent-34, score-0.183]
25 2 Related Work Mitchell (2009) orders sequences of at most 4 modifiers and defines nine classes that express the broad positional preferences of modifiers, where position 1 is closest to the noun phrase (NP) head and position 4 is farthest from it. [sent-37, score-1.013]
26 Classes 1 through 4 comprise those modifiers that prefer only to be in positions 1 through 4, respectively. [sent-38, score-0.808]
27 Class 5 through 7 modifiers prefer positions 1-2, 2-3, and 3-4, respectively, while class 8 modifiers prefer positions 1-3, and finally, class 9 modifiers prefer positions 2-4. [sent-39, score-2.501]
28 Mitchell counts how often each word type appears in each of these positions in the training corpus. [sent-40, score-0.142]
29 If any modifier’s probability of taking a certain position is greater than a uniform distribution would allow, then it is said to prefer that position. [sent-41, score-0.168]
30 Each word type is then assigned a class, with a global ordering defined over the nine classes. [sent-42, score-0.412]
31 1110 Given a set of modifiers to order, if the entire set has been seen at training time, Mitchell’s system looks up the class of each modifier and then orders the sequence based on the predefined ordering for the classes. [sent-43, score-1.486]
32 When two modifiers have the same class, the system picks between the possibilities randomly. [sent-44, score-0.631]
33 If a modifier was not seen at training time and thus cannot be said to belong to a specific class, the system favors orderings where modifiers whose classes are known are as close to their classes’ preferred positions as possible. [sent-45, score-1.247]
34 Shaw and Hatzivassiloglou (1999) use corpusbased counting methods as well. [sent-46, score-0.088]
35 For a corpus with × w word types, they define a w w matrix where Count[A, B] i,nd thiecayte dse fhinoew o wfte ×n wmo mdiafitreirx A w precedes modifier B. [sent-47, score-0.451]
36 Given two modifiers a and b to order, they compare Count[a, b] and Count[b, a] in their training data. [sent-48, score-0.626]
37 Assuming a null hypothesis that the probability of either ordering is 0. [sent-49, score-0.37]
38 5, they use a binomial distribution to compute the probability of seeing the ordering < a, b > for Count[a, b] number of times. [sent-50, score-0.395]
39 If this probability is above a certain threshold then they say that a precedes b. [sent-51, score-0.076]
40 These methods have proven to work well, but they also suffer from sparsity issues in the training data. [sent-53, score-0.057]
41 59% for NPs of all lengths, but the accuracy of her approach is greatly reduced when two modifiers fall into the same class, since the system cannot make an informed decision in those cases. [sent-55, score-0.629]
42 In addition, if a modifier is not seen in the training data, the system is unable to assign it a class, which also limits accuracy. [sent-56, score-0.383]
43 93%, but since their methods depend heavily on bigram counts in the training corpus, they are also limited in how informed their decisions can be if modifiers in the test data are not present at training time. [sent-59, score-0.718]
44 In this next section, we describe our maximum entropy reranking approach that tries to develop a more comprehensive model of the modifier ordering process to avoid the sparsity issues that previous approaches have faced. [sent-60, score-0.896]
45 3 Model We treat the problem of prenominal modifier ordering as a reranking problem. [sent-61, score-1.169]
46 Given a set B of prenominal modifiers and a noun phrase head H which B modifies, we define π(B) to be the set of all possible permutations, or orderings, of B. [sent-62, score-1.018]
47 We suppose that for a set B there is some x∗ ∈ π(B) which represents a “correct” natural-sounding ordering cohf the modifiers in B. [sent-63, score-0.997]
48 At test time, we choose an ordering x ∈ π(B) using a m tesatx timimuem, w entropy reranking approach (Collins and Koo, 2005). [sent-64, score-0.528]
49 Our distribution over orderings x ∈ π(B) is given by: P(x|H,B,W) =x∈eπx(Bp){eWxpT{φW(BT,φH(B,x,)H},x)} where φ(B, H, x) is a feature vector over a particular ordering of B and W is a learned weight vector over features. [sent-65, score-0.48]
50 1, but note that we are free under this formulation to use arbitrary features on the full ordering x of B as well as the head noun H, which we implicitly condition on throughout. [sent-67, score-0.517]
51 Since the size of the set of prenominal modifiers B is typically less than six, enumerating π(B) is not expensive. [sent-68, score-0.967]
52 At training time, our data consists of sequences of prenominal orderings and their corresponding nominal heads. [sent-69, score-0.518]
53 We treat each sequence as a training example where the labeled ordering x∗ ∈ π(B) is the one we observe. [sent-70, score-0.424]
54 Concretely, at training time, we select W to maximize: L(W) =(B,H,x∗)P(x∗|H,B,W)−2Wσ22 where the first term represents our observed data likelihood and the second the 2 regularization, where σ2 is a fixed hyperparameter; we fix the value of σ2 to 0. [sent-72, score-0.031]
55 The key to the success of our approach is using the flexibility afforded by having arbitrary features φ(B, H, x) to capture all the salient elements 1111 of the prenominal ordering data. [sent-75, score-0.786]
56 These features can be used to create a richer model of the modifier ordering process than previous corpus-based counting approaches. [sent-76, score-0.785]
57 In addition, we can encapsulate previous approaches in terms of features in our model. [sent-77, score-0.07]
58 Mitchell’s class-based approach can be expressed as a binary feature that tells us whether a given permuation satisfies the class ordering constraints in her model. [sent-78, score-0.492]
59 Previous counting approaches can be expressed as a real-valued feature that, given all n- grams generated by a permutation of modifiers, returns the count of all these n-grams in the original training data. [sent-79, score-0.289]
60 1 Feature Selection Our features are ofthe form φ(B, H, x) as expressed in the model above, and we include both indicator features and real-valued numeric features in our model. [sent-81, score-0.182]
61 We attempt to capture aspects ofthe modifier permutations that may be significant in the ordering process. [sent-82, score-0.78]
62 We might also expect permutations that contain n- grams previously seen in the training data to be more natural sounding than other permutations that generate n-grams that have not been seen before. [sent-84, score-0.399]
63 We can express this as a real-valued feature: φ(B,H,x) = cno-gunratm ins t pr aeisnein tg in da xta of al See Table 2 for a summary of our features. [sent-85, score-0.056]
64 Many of the features we use are similar to those in Dunlop et al. [sent-86, score-0.035]
wordName wordTfidf (topN-words)
[('modifiers', 0.595), ('ordering', 0.37), ('prenominal', 0.346), ('modifier', 0.317), ('shaw', 0.173), ('reranking', 0.113), ('hatzivassiloglou', 0.11), ('orderings', 0.11), ('prefer', 0.106), ('color', 0.095), ('permutations', 0.093), ('mitchell', 0.088), ('positions', 0.084), ('macedonian', 0.079), ('precedes', 0.076), ('count', 0.076), ('class', 0.073), ('provenance', 0.069), ('vase', 0.064), ('csail', 0.064), ('counting', 0.063), ('sounding', 0.06), ('farthest', 0.06), ('shih', 0.057), ('grams', 0.052), ('sounds', 0.052), ('beautiful', 0.051), ('positional', 0.049), ('sproat', 0.048), ('property', 0.045), ('entropy', 0.045), ('classes', 0.043), ('cat', 0.043), ('nine', 0.042), ('reduction', 0.042), ('fluent', 0.041), ('permutation', 0.041), ('adverbs', 0.04), ('noun', 0.039), ('head', 0.038), ('blue', 0.037), ('possibilities', 0.036), ('age', 0.036), ('seen', 0.035), ('ifrom', 0.035), ('unordered', 0.035), ('wmo', 0.035), ('contend', 0.035), ('encapsulate', 0.035), ('extr', 0.035), ('features', 0.035), ('arbitrary', 0.035), ('black', 0.034), ('informed', 0.034), ('predefined', 0.033), ('said', 0.032), ('orders', 0.032), ('tg', 0.032), ('propensity', 0.032), ('awkward', 0.032), ('gerunds', 0.032), ('cohf', 0.032), ('sequences', 0.031), ('training', 0.031), ('mit', 0.031), ('position', 0.03), ('dunlop', 0.03), ('admits', 0.03), ('attain', 0.03), ('fea', 0.03), ('participle', 0.03), ('malouf', 0.03), ('positioned', 0.03), ('error', 0.03), ('precedence', 0.028), ('indicator', 0.028), ('counts', 0.027), ('sparsity', 0.026), ('expressed', 0.026), ('enumerating', 0.026), ('expands', 0.026), ('vastly', 0.026), ('maximum', 0.025), ('corpusbased', 0.025), ('hyperparameter', 0.025), ('binomial', 0.025), ('concretely', 0.025), ('express', 0.024), ('bp', 0.024), ('restricts', 0.024), ('groups', 0.023), ('imposed', 0.023), ('dse', 0.023), ('tells', 0.023), ('bt', 0.023), ('numeric', 0.023), ('treat', 0.023), ('modifies', 0.023), ('comprise', 0.023), ('transitivity', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
Author: Jenny Liu ; Aria Haghighi
Abstract: In this work, we present a novel approach to the generation task of ordering prenominal modifiers. We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59. 1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
2 0.61115581 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers typically pursued as a supervised modeling task is particularly wellsuited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. – –
3 0.21882601 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
4 0.13166934 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
Author: Yue Zhang ; Joakim Nivre
Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score form 91.4% to 92.9%, giving the best results so far for transitionbased parsing and rivaling the best results overall. For the Chinese Treebank, they give a signficant improvement of the state of the art. An open source release of our parser is freely available.
5 0.059025079 333 acl-2011-Web-Scale Features for Full-Scale Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
6 0.054490656 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
7 0.051832337 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
8 0.051451504 101 acl-2011-Disentangling Chat with Local Coherence Models
9 0.050564945 264 acl-2011-Reordering Metrics for MT
10 0.048666522 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
11 0.042758696 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
12 0.042233802 313 acl-2011-Two Easy Improvements to Lexical Weighting
13 0.038897302 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
14 0.038199652 44 acl-2011-An exponential translation model for target language morphology
15 0.036659881 287 acl-2011-Structural Topic Model for Latent Topical Structure Analysis
16 0.03453847 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
17 0.034171075 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
18 0.034157209 31 acl-2011-Age Prediction in Blogs: A Study of Style, Content, and Online Behavior in Pre- and Post-Social Media Generations
19 0.032655992 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
20 0.032643545 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
topicId topicWeight
[(0, 0.114), (1, 0.008), (2, -0.046), (3, -0.051), (4, -0.032), (5, -0.01), (6, 0.04), (7, 0.063), (8, 0.001), (9, 0.035), (10, 0.018), (11, 0.017), (12, -0.059), (13, -0.045), (14, -0.102), (15, 0.075), (16, -0.235), (17, 0.387), (18, -0.112), (19, -0.125), (20, 0.094), (21, 0.077), (22, -0.16), (23, -0.066), (24, 0.03), (25, -0.132), (26, 0.069), (27, 0.309), (28, 0.242), (29, 0.11), (30, 0.026), (31, 0.036), (32, -0.042), (33, -0.062), (34, -0.18), (35, 0.182), (36, 0.033), (37, -0.083), (38, 0.045), (39, 0.014), (40, 0.054), (41, 0.05), (42, -0.021), (43, 0.101), (44, 0.071), (45, 0.014), (46, -0.15), (47, -0.086), (48, 0.089), (49, -0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.96886742 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
Author: Jenny Liu ; Aria Haghighi
Abstract: In this work, we present a novel approach to the generation task of ordering prenominal modifiers. We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59. 1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
2 0.90422595 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers typically pursued as a supervised modeling task is particularly wellsuited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. – –
3 0.53103471 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
Author: Renxian Zhang
Abstract: In summarization, sentence ordering is conducted to enhance summary readability by accommodating text coherence. We propose a grouping-based ordering framework that integrates local and global coherence concerns. Summary sentences are grouped before ordering is applied on two levels: group-level and sentence-level. Different algorithms for grouping and ordering are discussed. The preliminary results on single-document news datasets demonstrate the advantage of our method over a widely accepted method. 1 Introduction and Background The canonical pipeline of text summarization consists of topic identification, interpretation, and summary generation (Hovy, 2005). In the simple case of extraction, topic identification and interpretation are conflated to sentence selection and concerned with summary informativeness. In . comparison, summary generation addresses summary readability and a frequently discussed generation technique is sentence ordering. It is implicitly or explicitly stated that sentence ordering for summarization is primarily driven by coherence. For example, Barzilay et al. (2002) use lexical cohesion information to model local coherence. A statistical model by Lapata (2003) considers both lexical and syntactic features in calculating local coherence. More globally biased is Barzilay and Lee’s (2004) HMM-based content model, which models global coherence with word distribution patterns. Whilst the above models treat coherence as lexical or topical relations, Barzilay and Lapata (2005, 2008) explicitly model local coherence with an entity grid model trained for optimal syntactic role transitions of entities. 6 . polyu .edu .hk Although coherence in those works is modeled in the guise of “lexical cohesion”, “topic closeness”, “content relatedness”, etc., few published works simultaneously accommodate coherence on the two levels: local coherence and global coherence, both of which are intriguing topics in text linguistics and psychology. For sentences, local coherence means the wellconnectedness between adjacent sentences through lexical cohesion (Halliday and Hasan, 1976) or entity repetition (Grosz et al., 1995) and global coherence is the discourse-level relation connecting remote sentences (Mann and Thompson, 1995; Kehler, 2002). An abundance of psychological evidences show that coherence on both levels is manifested in text comprehension (Tapiero, 2007). Accordingly, an apt sentence ordering scheme should be driven by such concerns. We also note that as sentence ordering is usually discussed only in the context of multi-document summarization, factors other than coherence are also considered, such as time and source sentence position in Bollegala et al.’s (2006) “agglomerative ordering” approach. But it remains an open question whether sentence ordering is non-trivial for single-document summarization, as it has long been recognized as an actual strategy taken by human summarizers (Jing, 1998; Jing and McKeown, 2000) and acknowledged early in work on sentence ordering for multi-document summarization (Barzilay et al., 2002). In this paper, we outline a grouping-based sentence ordering framework that is driven by the concern of local and global coherence. Summary sentences are grouped according to their conceptual relatedness before being ordered on two levels: group-level ordering and sentence-level ordering, which capture global coherence and local coherence in an integrated model. As a preliminary study, we applied the framework to singlePortland, ORP,r UoSceAed 1i9ng-2s4 of Ju tnhee 2 A0C1L1-H. ?Lc T2 0201111 A Ssstuodcieanttio Snes fsoiro Cn,o pmapguesta 6t–io1n1a,l Linguistics document summary generation and obtained interesting results. The main contributions of this work are: (1) we stress the need to channel sentence ordering research to linguistic and psychological findings about text coherence; (2) we propose a groupingbased ordering framework that integrates both local and global coherence; (3) we find in experiments that coherence-driven sentence ordering improves the readability of singledocument summaries, for which sentence ordering is often considered trivial. In Section 2, we review related ideas and techniques in previous work. Section 3 provides the details of grouping-based sentence ordering. The preliminary experimental results are presented in Section 4. Finally, Section 5 concludes the whole paper and describes future work. 2 Grouping-Based Ordering Our ordering framework is designed to capture both local and global coherence. Globally, we identify related groups among sentences and find their relative order. Locally, we strive to keep sentence similar or related in content close to each other within one group. 2.1 Sentence Representation As summary sentences are isolated from their original context, we retain the important content information by representing sentences as concept vectors. In the simplest case, the “concept” is equivalent to content word. A drawback of this practice is that it considers every content word equally contributive to the sentence content, which is not always true. For example, in the news domain, entities realized as NPs are more important than other concepts. To represent sentences as entity vectors, we identify both common entities (as the head nouns of NPs) and named entities. Two common entities are equivalent if their noun stems are identical or synonymous. Named entities are usually equated by identity. But in order to improve accuracy, we also consider: 1) structural subsumption (one is part of another); 2) hypernymy and holonymy (the named entities are in a superordinate-subordinate or part-whole relation). Now with summary sentence Si and m entities eik (k = 1 m), Si = (wf(ei1), wf(ei2), wf(eim)), … … … … … …, 7 where wf(eik) = wk× ×f(eik), f(eik) is the frequency of eik and wk is the weight of eik. We define wk = 1 if eik is a common entity and wk = 2 if eik is a named entity. We give double weight to named entities because of their significance to news articles. After all, a news story typically contains events, places, organizations, people, etc. that denote the news theme. Other things being equal, two sentences sharing a mention of named entities are thematically closer than two sentences sharing a mention of common entities. Alternatively, we can realize the “concept” as “event” because events are prevalent semantic constructs that bear much of the sentence content in some domains (e.g., narratives and news reports). To represent sentences as event vectors, we can follow Zhang et al.’s (2010) method at the cost of more complexity. 2.2 Sentence Grouping To meet the global need of identifying sentence groups, we develop two grouping algorithms by applying graph-based operation and clustering. Connected Component Finding (CC) This algorithm treats grouping sentences as finding connected components (CC) in a text graph TG = (V, E), where V represents the sentences and E the sentence relations weighted by cosine similarity. Edges with weight < t, a threshold, are removed because they represent poor sentence coherence. The resultant graph may be disconnected, in which we find all of its connected components, using depth-first search. The connected components are the groups we are looking for. Note that this method cannot guarantee that every two sentences in such a group are directly linked, but it does guarantee that there exists a path between every sentence pair. Modified K-means Clustering (MKM) Observing that the CC method finds only coherent groups, not necessarily groups of coherent sentences, we develop a second algorithm using clustering. A good choice might be K-means as it is efficient and outperforms agglomerative clustering methods in NLP applications (Steibach et al., 2000), but the difficulty with the conventional K-means is the decision of K. Our solution is modified K-means (MKM) based on (Wilpon and Rabiner, 1985). Let’s denote cluster iby CLi and cluster similarity by Sim(CLi) =SimM,SiinnCLi(Sim( Sim,Sin)), where Sim( Sim,Sin)is their cvMeluin231st.(roCWSD21imLsdhtoIf(=ClaSehbLsimt)l1;(zehS<-amncs,tdoeSa1vn)c;st:el=uMionvrate(lhcSKiosmg-tC;eblLayn)s,c2riuegant wcoelsunathdi cosine. The following illustrates the algorithm. The above algorithm stops iterating when each cluster contains all above-threshold-similarity sentence pairs or only one sentence. Unlike CC, MKM results in more strongly connected groups, or groups of coherence sentences. 2.3 Ordering Algorithms After the sentences are grouped, ordering is to be conducted on two levels: group and sentence. Composed of closely related sentences, groups simulate high-level textual constructs, such as “central event”, “cause”, “effect”, “background”, etc. for news articles, around which sentences are generated for global coherence. For an intuitive example, all sentences about “cause” should immediately precede all sentences about “effect” to achieve optimal readability. We propose two approaches to group-level ordering. 1) If the group sentences come from the same document, group (Gi) order is decided by the group-representing sentence (gi) order ( means “precede”) in the text. gi gj Gi Gj 2) Group order is decided in a greedy fashion in order to maximize the connectedness between adjacent groups, thus enhancing local coherence. Each time a group is selected to achieve maximum similarity with the ordered groups and the first ordered group (G1) is selected to achieve maximum similarity with all the other groups. G1argmGaxG'GSim( G , G') GiGuanrogrdemred agrxoupsij1 Sim( Gj, G) (i > 1) where Sim(G, G’) is the average sentence cosine similarity between G and G’. 8 Within the ordered groups, sentence-level ordering is aimed to enhance local coherence by placing conceptually close sentences next to each other. Similarly, we propose two approaches. 1) If the sentences come from the same document, they are arranged by the text order. 2) Sentence order is greedily decided. Similar to the decision of group order, with ordered sentence Spi in group Gp: Sp1argSm GpaxS'SSim( S , S') SpiSunorader egd smenteanxces in Gpji1 Sim( Spj,S )(i > 1) Note that the text order is used as a common heuristic, based on the assumption that the sentences are arranged coherently in the source document, locally and globally. 3 Experiments and Preliminary Results Currently, we have evaluated grouping-based ordering on single-document summarization, for which text order is usually considered sufficient. But there is no theoretical proof that it leads to optimal global and local coherence that concerns us. On some occasions, e.g., a news article adopting the “Wall Street Journal Formula” (Rich and Harper, 2007) where conceptually related sentences are placed at the beginning and the end, sentence conceptual relatedness does not necessarily correlate with spatial proximity and thus selected sentences may need to be rearranged for better readability. We are not aware of any published work that has empirically compared alternative ways of sentence ordering for singledocument summarization. The experimental results reported below may draw some attention to this taken-for-granted issue. 3.1 Data and Method We prepared 3 datasets of 60 documents each, the first (D400) consisting of documents of about 400 words from the Document Understanding Conference (DUC) 01/02 datasets; the second (D1k) consisting of documents of about 1000 words manually selected from popular English journals such as The Wall Street Journal, The Washington Post, etc; the third (D2k) consisting of documents of about 2000 words from the DUC 01/02 dataset. Then we generated 100-word summaries for D400 and 200-word summaries for D1k and D2k. Since sentence selection is not our focus, the 180 summaries were all extracts produced by a simple but robust summarizer built on term frequency and sentence position (Aone et al., 1999). Three human annotators were employed to each provide reference orderings for the 180 summaries and mark paragraph (of at least 2 sentences) boundaries, which will be used by one of the evaluation metrics described below. In our implementation of the grouping-based ordering, sentences are represented as entity vectors and the threshold t = Avg( Sim( Sm, Sn)) c , the average sentence similarity in a group multiplied by a coefficient empirically decided on separate held-out datasets of 20 documents for each length category. The “group-representing sentence” is the textually earliest sentence in the group. We experimented with both CC and MKM to generate sentence groups and all the proposed algorithms in 2.3 for group-level and sentence- level orderings, resulting in 8 combinations as test orderings, each coded in the format of “Grouping (CC/MKM) / Group ordering (T/G) / Sentence ordering (T/G)”, where T and G represent the text order approach and the greedy selection approach respectively. For example, “CC/T/G” means grouping with CC, group ordering with text order, and sentence ordering with the greedy approach. We evaluated the test orderings against the 3 reference orderings and compute the average (Madnani et al., 2007) by using 3 different metrics. The first metric is Kendall’s τ (Lapata 2003, 2006), which has been reliably used in ordering evaluations (Bollegala et al., 2006; Madnani et al., 2007). It measures ordering differences in terms of the number of adjacent sentence inversions necessary to convert a test ordering to the reference ordering. 1 4m N( N 1) In this formula, m represents the number of inversions described above and N is the total number of sentences. The second metric is the Average Continuity (AC) proposed by Bollegala et al. (2006), which captures the intuition that the quality of sentence orderings can be estimated by the number of correctly arranged continuous sentences. 9 ACexp(1/(k1) klog(Pn)) n2 In this formula, k is the maximum number of continuous sentences, α is a small value in case Pn = 1. Pn, the proportion of continuous sentences of length n in an ordering, is defined as m/(N – n + 1) where m is the number of continuous sentences of length n in both the test and reference orderings and N is the total number of sentences. Following (Bollegala et al., 2006), we set k = Min(4, N) and α = 0.01. We also go a step further by considering only the continuous sentences in a paragraph marked by human annotators, because paragraphs are local meaning units perceived by human readers and the order of continuous sentences in a paragraph is more strongly grounded than the order of continuous sentences across paragraph boundaries. So in-paragraph sentence continuity is a better estimation for the quality of sentence orderings. This is our third metric: Paragraph-level Average Continuity (P-AC). P-ACexp(1/(k1) klog(PPn )) n2 Here PPn = m ’/(N – n + 1), where m ’ is the number of continuous sentences of length n in both the test ordering and a paragraph of the reference ordering. All the other parameters are as defined in AC and Pn. 3.2 Results The following tables show the results measured by each metric. For comparison, we also include a “Baseline” that uses the text order. For each dataset, two-tailed t-test is conducted between the top scorer and all the other orderings and statistical significance (p < 0.05) is marked with *. In general, our grouping-based ordering scheme outperforms the baseline for news articles of various lengths and statistically significant improvement can be observed on each dataset. This result casts serious doubt on the widely accepted practice of taking the text order for single-document summary generation, which is a major finding from our study. The three evaluation metrics give consistent results although they are based on different observations. The P-AC scores are much lower than their AC counterparts because of its strict paragraph constraint. Interestingly, applying the text order posterior to sentence grouping for group-level and sentence- level ordering leads to consistently optimal performance, as the top scorers on each dataset are almost all “__/T/T”. This suggests that the textual realization of coherence can be sought in the source document if possible, after the selected sentences are rearranged. It is in this sense that the general intuition about the text order is justified. It also suggests that tightly knit paragraphs (groups), where the sentences are closely connected, play a crucial role in creating a coherence flow. Shuffling those paragraphs may not affect the final coherence1. 1 Ithank an anonymous reviewer for pointing this out. 10 The grouping method does make a difference. While CC works best for the short and long datasets (D400 and D2k), MKM is more effective for the medium-sized dataset D1k. Whether the difference is simply due to length or linguistic/stylistic subtleties is an interesting topic for in-depth study. 4 Conclusion and Future Work We have established a grouping-based ordering scheme to accommodate both local and global coherence for summary generation. Experiments on single-document summaries validate our approach and challenge the well accepted text order by the summarization community. Nonetheless, the results do not necessarily propagate to multi-document summarization, for which the same-document clue for ordering cannot apply directly. Adapting the proposed scheme to multi-document summary generation is the ongoing work we are engaged in. In the next step, we will experiment with alternative sentence representations and ordering algorithms to achieve better performance. We are also considering adapting more sophisticated coherence-oriented models, such as (Soricut and Marcu, 2006; Elsner et al., 2007), to our problem so as to make more interesting comparisons possible. Acknowledgements The reported work was inspired by many talks with my supervisor, Dr. Wenjie Li, who saw through this work down to every writing detail. The author is also grateful to many people for assistance. You Ouyang shared part of his summarization work and helped with the DUC data. Dr. Li Shen, Dr. Naishi Liu, and three participants helped with the experiments. I thank them all. The work described in this paper was partially supported by Hong Kong RGC Projects (No. PolyU 5217/07E). References Aone, C., Okurowski, M. E., Gorlinsky, J., and Larsen, B. 1999. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. In I. Mani and M. T. Maybury (eds.), Advances in Automatic Text Summarization. 71–80. Cambridge, Massachusetts: MIT Press. Barzilay, R., Elhadad, N., and McKeown, K. 2002. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. Journal of Artificial Intelligence Research, 17: 35–55. Barzilay, R. and Lapata, M. 2005. Modeling Local Coherence: An Entity-based Approach. In Proceedings of the 43rd Annual Meeting of the ACL, 141–148. Ann Arbor. Barzilay, R. and Lapata, M. 2008. Modeling Local Coherence: An Entity-Based Approach. Computational Linguistics, 34: 1–34. Barzilay, R. and Lee L. 2004. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. In HLT-NAACL 2004: Proceedings of the Main Conference. 113–120. Bollegala, D, Okazaki, N., and Ishizuka, M. 2006. A Bottom-up Approach to Sentence Ordering for Multidocument Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 385–392. Sydney. Elsner, M., Austerweil, j. & Charniak E. 2007. “A Unified Local and Global Model for Discourse Coherence”. In Proceedings of NAACL HLT 2007, 436-443. Rochester, NY. Grosz, B. J., Aravind K. J., and Scott W. 1995. Centering: A framework for Modeling the Local Coherence of Discourse. Computational Linguistics, 21(2):203–225. Halliday, M. A. K., and Hasan, R. 1976. Cohesion in English. London: Longman. Hovy, E. 2005. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford: Oxford University Press. Jing, H. 2000. Sentence Reduction for Automatic Text Summarization. In Proceedings of the 6th Applied Natural Language Processing WA, pp. 310–315. Conference, Seattle, Jing, H., and McKeown, K. 2000. Cut and Paste Based Text Summarization. In Proceedings of the 1st NAACL, 178–185. 11 Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. Stanford, California: CSLI Publications. Lapata, M. 2003. Probabilistic Text Structuring: Experiments with Sentence Ordering. In Proceedings of the Annual Meeting of ACL, 545–552. Sapporo, Japan. Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics, 32(4):1–14. Madnani, N., Passonneau, R., Ayan, N. F., Conroy, J. M., Dorr, B. J., Klavans, J. L., O’leary, D. P., and Schlesinger, J. D. 2007. Measuring Variability in Sentence Ordering for News Summarization. In Proceedings of the Eleventh European Workshop on Natural Language Generation, 81–88. Germany. Mann, W. C. and Thompson, S. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, 8:243–281. Rich C., and Harper, C. 2007. Writing and Reporting News: A Coaching Method, Fifth Edition. Thomason Learning, Inc. Belmont, CA. Soricut, R. and Marcu D. 2006. Discourse Generation Using Utility-Trained Coherence Models. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 803–810. Steibach, M., Karypis, G., and Kumar V. 2000. A Comparison of Document Clustering Techniques. Technical Report 00-034. Department of Computer Science and Engineering, University of Minnesota. Tapiero, I. 2007. Situation Models and Levels of Coherence: Towards a Definition of Comprehension. Mahwah, New Jersey: Lawrence Erlbaum Associates. Wilpon, J. G. and Rabiner, L. R. 1985. A Modified Kmeans Clustering Algorithm for Use in Isolated Word Recognition. In IEEE Trans. Acoustics, Speech, Signal Proc. ASSP-33(3), 587–594. Zhang R., Li, W., and Lu, Q. 2010. Sentence Ordering with Event-Enriched Semantics and Two-Layered Clustering for Multi-Document News Summarization. In COLING 2010: Poster Volume, 1489–1497, Beijing.
4 0.27230054 239 acl-2011-P11-5002 k2opt.pdf
Author: empty-author
Abstract: unkown-abstract
5 0.24461673 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
Author: Mariet Theune ; Ruud Koolen ; Emiel Krahmer ; Sander Wubben
Abstract: In this paper we investigate how much data is required to train an algorithm for attribute selection, a subtask of Referring Expressions Generation (REG). To enable comparison between different-sized training sets, a systematic training method was developed. The results show that depending on the complexity of the domain, training on 10 to 20 items may already lead to a good performance.
6 0.23817758 297 acl-2011-That's What She Said: Double Entendre Identification
7 0.22864401 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
8 0.22573966 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations
9 0.21780688 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
10 0.19925743 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
11 0.19835994 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
12 0.19687285 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
13 0.19629484 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
14 0.18753622 101 acl-2011-Disentangling Chat with Local Coherence Models
15 0.185574 113 acl-2011-Efficient Online Locality Sensitive Hashing via Reservoir Counting
16 0.18474394 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
17 0.18235451 333 acl-2011-Web-Scale Features for Full-Scale Parsing
18 0.18017887 85 acl-2011-Coreference Resolution with World Knowledge
19 0.17588143 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
20 0.17373723 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
topicId topicWeight
[(5, 0.013), (17, 0.065), (26, 0.028), (37, 0.088), (39, 0.031), (41, 0.044), (55, 0.247), (59, 0.041), (64, 0.132), (72, 0.023), (91, 0.055), (96, 0.121), (97, 0.016)]
simIndex simValue paperId paperTitle
1 0.86911047 24 acl-2011-A Scalable Probabilistic Classifier for Language Modeling
Author: Joel Lang
Abstract: We present a novel probabilistic classifier, which scales well to problems that involve a large number ofclasses and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model, whose parameters can be reliably and efficiently estimated. We show that a generative language model based on our classifier consistently matches modified Kneser-Ney smoothing and can outperform it if sufficiently rich features are incorporated.
2 0.85852814 78 acl-2011-Confidence-Weighted Learning of Factored Discriminative Language Models
Author: Viet Ha Thuc ; Nicola Cancedda
Abstract: Language models based on word surface forms only are unable to benefit from available linguistic knowledge, and tend to suffer from poor estimates for rare features. We propose an approach to overcome these two limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend the confidence-weighted learning to deal with label noise in training data, a common case with discriminative lan- guage modeling.
3 0.85438228 275 acl-2011-Semi-Supervised Modeling for Prenominal Modifier Ordering
Author: Margaret Mitchell ; Aaron Dunlop ; Brian Roark
Abstract: In this paper, we argue that ordering prenominal modifiers typically pursued as a supervised modeling task is particularly wellsuited to semi-supervised approaches. By relying on automatic parses to extract noun phrases, we can scale up the training data by orders of magnitude. This minimizes the predominant issue of data sparsity that has informed most previous approaches. We compare several recent approaches, and find improvements from additional training data across the board; however, none outperform a simple n-gram model. – –
same-paper 4 0.84605867 237 acl-2011-Ordering Prenominal Modifiers with a Reranking Approach
Author: Jenny Liu ; Aria Haghighi
Abstract: In this work, we present a novel approach to the generation task of ordering prenominal modifiers. We take a maximum entropy reranking approach to the problem which admits arbitrary features on a permutation of modifiers, exploiting hundreds ofthousands of features in total. We compare our error rates to the state-of-the-art and to a strong Google ngram count baseline. We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59. 1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
5 0.83480096 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System
Author: Reyyan Yeniterzi
Abstract: Turkish is an agglutinative language with complex morphological structures, therefore using only word forms is not enough for many computational tasks. In this paper we analyze the effect of morphology in a Named Entity Recognition system for Turkish. We start with the standard word-level representation and incrementally explore the effect of capturing syntactic and contextual properties of tokens. Furthermore, we also explore a new representation in which roots and morphological features are represented as separate tokens instead of representing only words as tokens. Using syntactic and contextual properties with the new representation provide an 7.6% relative improvement over the baseline.
6 0.81392574 144 acl-2011-Global Learning of Typed Entailment Rules
7 0.7360422 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
8 0.71805567 106 acl-2011-Dual Decomposition for Natural Language Processing
9 0.65770745 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
10 0.65480763 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
11 0.65445644 175 acl-2011-Integrating history-length interpolation and classes in language modeling
13 0.65042841 150 acl-2011-Hierarchical Text Classification with Latent Concepts
14 0.65024006 85 acl-2011-Coreference Resolution with World Knowledge
15 0.64331305 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
16 0.63992298 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
17 0.63861871 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
18 0.63677448 135 acl-2011-Faster and Smaller N-Gram Language Models
19 0.63472003 258 acl-2011-Ranking Class Labels Using Query Sessions
20 0.63371158 197 acl-2011-Latent Class Transliteration based on Source Language Origin