acl acl2012 acl2012-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. [sent-3, score-0.552]
2 Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. [sent-4, score-0.108]
3 When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution. [sent-5, score-0.632]
4 1 Introduction Many of the most difficult ambiguities in coreference resolution are semantic in nature. [sent-6, score-0.47]
5 For instance, consider the following example: When Obama met Jobs, the president discussed the economy, technology, and education. [sent-7, score-0.21]
6 For resolving coreference in this example, a system would benefit from the world knowledge that Obama is the president. [sent-11, score-0.299]
7 Also, to resolve the pronoun his to the correct antecedent Obama, we can use the knowledge that Obama has an election campaign while Jobs does not. [sent-12, score-0.374]
8 There have been multiple previous systems that incorporate some form of world knowledge in coreference resolution tasks. [sent-14, score-0.424]
9 Prior work (..., 2005; Bergsma and Lin, 2006) addresses special cases and subtasks such as bridging anaphora, other anaphora, definite NP reference, and pronoun resolution, computing semantic compatibility via Web-hits and counts from large corpora. [sent-17, score-0.41]
10 There is also work on end-to-end coreference resolution that uses large noun-similarity lists (Daumé III and Marcu, 2005) or structured knowledge bases such as Wikipedia (Yang and Su, 2007; Haghighi and Klein, 2009; Kobdani et al., 2011). [sent-18, score-0.449]
11 In order to harness the information on the Web without presupposing a deep understanding of all Web text, we instead turn to a diverse collection of Web n-gram counts (Brants and Franz, 2006) which, in aggregate, contain diffuse and indirect, but often robust, cues to reference. [sent-22, score-0.151]
12 For example, we can collect the co-occurrence statistics of an anaphor with various candidate antecedents to judge relative surface affinities (i.e., (Obama, president) versus (Jobs, president)). [sent-23, score-0.413]
13 We can also count co-occurrence statistics of competing antecedents when placed in the context of an anaphoric pronoun (i.e., [sent-26, score-0.506]
14 , Obama ’s election campaign versus Jobs ’ election campaign). [sent-28, score-0.192]
15 All of our features begin with a pair of headwords from candidate mention pairs and compute statistics derived from various potentially informative queries’ counts. [sent-29, score-0.62]
16 We explore five major categories of semantically informative Web features, based on (1) general lexical affinities (via generic co-occurrence statistics), (2) lexical relations (via Hearst-style hypernymy patterns), (3) similarity of entity-based context (e.g., [sent-30, score-0.16]
17 the y's for which h is a y is attested), (4) matches of distributional soft cluster ids, and (5) attested substitutions of candidate antecedents in the context of a pronominal anaphor. [sent-34, score-0.459]
18 Our baseline is the Reconcile system (Stoyanov et al., 2010), using a decision tree (DT) as its pairwise classifier. [sent-37, score-0.172]
19 To this baseline system, we add our suite of features in turn, each class of features providing substantial gains. [sent-38, score-0.167]
20 We differ from Stoyanov et al. (2009) in using a decision tree classifier rather than an averaged linear perceptron. [sent-45, score-0.223]
21 The mention-pair model relies on a pairwise function to determine whether or not two mentions are coreferent. [sent-49, score-0.12]
22 Pairwise predictions are then consolidated by transitive closure (or some other clustering method) to form the final set of coreference clusters (chains). [sent-50, score-0.338]
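As an illustration of the consolidation step just described, here is a minimal sketch of transitive closure over pairwise decisions via union-find; the coreferent predicate is a stand-in for the trained pairwise classifier, not Reconcile's actual API:

```python
def transitive_closure(mentions, coreferent):
    """Cluster mentions by taking transitive closure over pairwise
    coreference decisions. `coreferent(a, b)` stands in for the
    trained pairwise classifier."""
    parent = {m: m for m in mentions}

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            if coreferent(a, b):
                parent[find(a)] = find(b)   # merge the two chains

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())          # final coreference clusters (chains)
```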
23 The Reconcile system provides baseline features, learning mechanisms, and resolution procedures that already achieve near state-of-the-art results on multiple popular datasets using multiple standard metrics. [sent-52, score-0.214]
24 In this paper, we develop a suite of simple semantic Web features based on pairs of mention headwords which stack with the default Reconcile features to surpass past state-of-the-art results. [sent-56, score-0.626]
25 2.2 Decision Tree Classifier: Among the various learning algorithms that Reconcile supports, we chose the decision tree classifier, available in Weka (Hall et al., 2009). [sent-58, score-0.139]
26 The C4.5 algorithm builds decision trees by incrementally maximizing information gain. [sent-62, score-0.134]
27 C4.5 splits the data on an attribute that most effectively splits its set of samples into more ordered subsets, and then recurses on these smaller subsets. [sent-65, score-0.116]
28 The decision tree can then be used to classify a new sample by following a path from the root downward based on the attribute values of the sample. [sent-66, score-0.139]
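A minimal sketch of such a pairwise decision tree classifier. The paper uses Weka's C4.5; scikit-learn's CART implementation with the entropy criterion is used here as an approximate stand-in, and the toy feature vectors are purely illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

# One row of features per mention pair; label 1 = coreferent, 0 = not.
X = [[0.9, 1, 3], [0.1, 0, 7], [0.8, 1, 2], [0.2, 0, 9]]  # toy feature vectors
y = [1, 0, 1, 0]

# criterion="entropy" mirrors C4.5's information-gain splitting rule.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)
print(clf.predict([[0.85, 1, 4]]))  # pairwise coreference decision for a new pair
```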
29 We find the decision tree classifier to work better than the default averaged perceptron (used by Stoyanov et al., 2009). [sent-67, score-0.312]
30 Many advantages have been claimed for decision tree classifiers, including interpretability and robustness. [sent-70, score-0.139]
31 However, we suspect that the aspect most relevant to our case is that decision trees can capture non-linear interactions between features. [sent-71, score-0.127]
32 For example, recency is very important for pronoun reference but much less so for nominal reference. [sent-72, score-0.13]
33 3 Semantics via Web Features: Our Web features for coreference resolution are simple and capture a range of diffuse world knowledge. [sent-73, score-0.597]
34 Given a mention pair, we use the head finder in Reconcile to find the lexical heads of both mentions (for example, the head of the Palestinian territories is territories). (Footnote: we use the default configuration settings of Reconcile (Stoyanov et al., 2010).) [sent-74, score-0.244]
35 Next, we take each headword pair (h1, h2) and compute various Web-count functions on it that can signal whether or not this mention pair is coreferent. [sent-77, score-0.44]
36 The features that require word clusters (Section 3.4) ... [sent-80, score-0.133]
37 The first four types are most intuitive for mention pairs where both members are non-pronominal but, aside from the general co-occurrence group, they helped for all mention pair types. [sent-84, score-0.293]
38 The fifth feature group applies only to pairs in which the anaphor is a pronoun but the antecedent is a non-pronoun. [sent-85, score-0.448]
39 3.1 General co-occurrence: These features capture co-occurrence statistics of the two headwords, i.e., how often h1 and h2 occur together on the Web. [sent-88, score-0.143]
40 This count can be a useful coreference signal because, in general, mentions referring to the same entity will co-occur more frequently (in large corpora) than those that do not. [sent-91, score-0.506]
41 Using the n-grams corpus (for n = 1 to 5), we collect co-occurrence Web-counts by allowing a varying number of wildcards between h1 and h2 in the query. [sent-92, score-0.113]
42 (Footnote: these clusters are derived from the V2 Google n-grams corpus.) [sent-97, score-0.073]
43 We normalize the overall co-occurrence count of the headword pair c12 by the unigram counts of the individual headwords c1 and c2, so that high-frequency headwords do not unfairly get a high feature value (this is similar to computing scaled mutual information MI (Church and Hanks, 1989)). [sent-108, score-1.133]
44 This normalized value is quantized by taking its log10 and binning. [sent-109, score-0.104]
45 The actual feature that fires is an indicator of which quantized bin the query produced. [sent-110, score-0.183]
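A minimal sketch of this general co-occurrence feature, assuming a hypothetical count(query) lookup into the Web n-grams corpus ('?' marks a single-token wildcard); the exact bin boundaries are an assumption:

```python
import math

def cooccurrence_feature(h1, h2, count):
    """General co-occurrence feature: sum Web counts of h1 .. h2 with 0-3
    wildcards in between (so queries fit within 5-grams), normalize by the
    unigram counts, then quantize by log10 into a bin indicator."""
    c12 = sum(count(" ".join([h1] + ["?"] * k + [h2])) for k in range(4))
    c1, c2 = count(h1), count(h2)
    if min(c12, c1, c2) == 0:
        return "cooc_bin=none"
    norm = c12 / (c1 * c2)                       # scaled-MI-style normalization
    return f"cooc_bin={int(math.log10(norm))}"   # indicator of the quantized bin
```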
46 As a real example from our development set, the co-occurrence count c12 for the headword pair (leader, president) is 11383, while it is only 95 for the headword pair (voter, president); after normalization and log10, the values are -10. [sent-111, score-0.734]
47 3.2 Hearst co-occurrence: These features capture templated co-occurrence of the two headwords h1 and h2 in the Web corpus. [sent-119, score-0.407]
48 Here, we only collect statistics of the headwords co-occurring with a generalized Hearst pattern (Hearst, 1992) in between. [sent-120, score-0.411]
49 Hearst patterns capture various lexical semantic relations between items. [sent-121, score-0.12]
50 For this feature, we again use a quantized normalized count as in Section 3.1. [sent-130, score-0.258]
51 We did not allow wildcards in between the headwords and the Hearst patterns because this introduced a significant amount of noise. [sent-132, score-0.116]
52 As a real example from our development set, the c12 count for the headword pair (leader, president) is 752, while for (voter, president), it is 0. [sent-134, score-0.461]
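A sketch of the Hearst-pattern count, with a representative subset of generalized Hearst patterns (the paper's full inventory may differ) and the same hypothetical count lookup; per the text above, no wildcards are inserted between the headwords and the patterns:

```python
HEARST_PATTERNS = [
    "{x} such as {y}",
    "{x} including {y}",
    "{x} especially {y}",
    "{x} and other {y}",
    "{x} or other {y}",
    "{x} is a {y}",
]

def hearst_count(h1, h2, count):
    """Sum Web counts of the two headwords joined by Hearst patterns, tried
    in both orders; the total is then normalized, log10-quantized, and
    binned exactly as in Section 3.1."""
    total = 0
    for x, y in [(h1, h2), (h2, h1)]:
        for pattern in HEARST_PATTERNS:
            total += count(pattern.format(x=x, y=y))
    return total
```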
53 Hypernymic semantic compatibility for coreference is intuitive and has been explored in varying forms by previous work. [sent-135, score-0.393]
54 Poesio et al. (2004) and Markert and Nissim (2005) employ a subset of our Hearst patterns and Web-hits for the subtasks of bridging anaphora, other-anaphora, and definite NP resolution. [sent-137, score-0.161]
55 Others (Haghighi and Klein, 2009; Rahman and Ng, 2011; Daumé III and Marcu, 2005) use similar relations to extract compatibility statistics from Wikipedia, YAGO, and noun-similarity lists. [sent-138, score-0.139]
56 Yang and Su (2007) use Wikipedia to automatically extract semantic patterns, which are then used as features in a learning setup. [sent-139, score-0.097]
57 Instead of extracting patterns from the training data, we use all the above patterns, which helps us generalize to new datasets for end-to-end coreference resolution (see Section 4). [sent-140, score-0.48]
58 3.3 Entity-based context: For each headword h, we first collect context seeds y using the pattern h {is | are | was | were} {a | an | the} ? y, [sent-143, score-0.466]
59 taking seeds y in order of decreasing Web count. [sent-144, score-0.089]
60 The top 30 seeds (and their parts of speech) include important cues such as president is elected (verb), president is authorized (verb), president is responsible (adjective), president is the chief (adjective), president is above (preposition), and president is the head (noun). [sent-147, score-1.435]
61 Matches in the seed lists of two headwords can be a strong signal that they are coreferent. [sent-148, score-0.421]
62 For example, in the top 30 seed lists for the headword pair (leader, president), we get matches including elected, responsible, and expected. [sent-149, score-0.436]
63 To capture this effect, we create a feature that indicates whether there is a match in the top k seeds of the two headwords (where k is a hyperparameter to tune). [sent-150, score-0.557]
64 We create another feature that indicates whether the dominant parts of speech in the seed lists match for the headword pair. [sent-151, score-0.438]
65 We first collect the POS tags (using length 2 character prefixes to indicate coarse parts of speech) of the seeds matched in the top k0 seed lists of the two headwords, where k0 is another hyperparameter to tune. [sent-152, score-0.295]
66 If the dominant tags match and are in a small list of important tags ({JJ, NN, RB, VB}), we fire an indicator feature specifying the matched tag; otherwise we fire a no-match indicator. [sent-153, score-0.114]
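A sketch of the two entity-based context features under one reading of the description above; seeds1/seeds2 are each headword's seed list sorted by decreasing Web count, pos is a hypothetical coarse tagger, and k/k_pos stand in for the tuned hyperparameters:

```python
from collections import Counter

IMPORTANT_TAGS = {"JJ", "NN", "RB", "VB"}

def seed_features(seeds1, seeds2, pos, k=30, k_pos=30):
    feats = []
    # Feature 1: do the top-k seed lists of the two headwords intersect?
    feats.append("seed_match" if set(seeds1[:k]) & set(seeds2[:k]) else "seed_nomatch")

    # Feature 2: do the dominant coarse POS tags of the top seeds match?
    def dominant_tag(seeds):
        tags = Counter(pos(s) for s in seeds[:k_pos])
        return tags.most_common(1)[0][0] if tags else None

    d1, d2 = dominant_tag(seeds1), dominant_tag(seeds2)
    if d1 is not None and d1 == d2 and d1 in IMPORTANT_TAGS:
        feats.append(f"seed_pos_match={d1}")   # fire the matched tag
    else:
        feats.append("seed_pos_nomatch")       # fire a no-match indicator
    return feats
```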
67 Here, we design features with the idea that the distributional hypothesis extends to reference: mentions occurring in similar contexts in large document sets such as the Web tend to be compatible for coreference. [sent-157, score-0.189]
68 Instead of collecting the contexts of each mention and creating sparse features from them, we use Web-scale distributional clustering to summarize compatibility. [sent-158, score-0.231]
69 These clusters come from distributional K-Means clustering (with K = 1000) on phrases, using the n-gram context as features. [sent-161, score-0.165]
70 The cluster data contains almost 10 million phrases and their soft cluster memberships. [sent-162, score-0.178]
71 Up to twenty cluster ids with the highest centroid similarities are included for each phrase in this dataset (Lin et al., 2010). [sent-163, score-0.135]
72 Our cluster-based features assume that if the headwords of the two mentions have matches in their cluster id lists, then they are more compatible for coreference. [sent-165, score-0.644]
73 We check the match of not just the top 1 cluster ids, but also farther down in the 20-sized lists because, as discussed in Lin and Wu (2009), the soft cluster assignments often reveal different senses of a word. [sent-166, score-0.18]
74 To this end, we fire a feature indicating the value bin(i + j), where i and j are the earliest match positions in the cluster id lists of h1 and h2. [sent-168, score-0.262]
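A sketch of the cluster-match feature: ids1/ids2 are each headword's soft cluster ids (up to twenty, highest centroid similarity first), and the binning function is an assumption:

```python
def cluster_match_feature(ids1, ids2, bin_fn=lambda v: min(v // 4, 9)):
    """Fire a bin of i + j, where i and j are the earliest positions at
    which the two cluster-id lists share an id; no shared id -> no-match."""
    best = None
    for i, c1 in enumerate(ids1):
        for j, c2 in enumerate(ids2):
            if c1 == c2 and (best is None or i + j < best):
                best = i + j
    return "cluster_nomatch" if best is None else f"cluster_bin={bin_fn(best)}"
```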
75 3.5 Pronoun context: Our last feature category specifically addresses pronoun reference, for cases when the anaphoric mention NP2 (and hence its headword h2) is a pronoun, while the candidate antecedent mention NP1 (and hence its headword h1) is not. [sent-175, score-1.135]
76 For such a headword pair (h1, h2), the idea is to substitute the non-pronoun h1 into h2's position and see whether the result is attested on the Web. [sent-176, score-0.335]
77 If the anaphoric pronominal mention is h2 and its sentential context is l' l h2 r r', then the substituted phrase will be l' l h1 r r'. [sent-177, score-0.312]
78 High Web counts of substituted phrases tend to indicate semantic compatibility. [sent-178, score-0.113]
79 We chose the following three context types, based on performance on a development set. (Footnote: possessive pronouns are replaced with an additional apostrophe, i.e., his becomes h1 's.) [sent-180, score-0.081]
80 We also use features (see R1Gap) that allow wildcards (?) [sent-183, score-0.122]
81 in between the headword and the context when collecting Web-counts, in order to allow for determiners and other filler words. [sent-184, score-0.279]
82 r (R1Gap) As an example of the R1Gap feature, if the anaphor h2 + context is his victory and one candidate antecedent h1 is Bush, then we compute the normalized value count( “Bush 0s ? [sent-186, score-0.433]
83 r”)count(“h1”) The final feature value is again a normalized count converted to log10 and then binned. [sent-192, score-0.197]
84 We have three separate features for the R1, R2, and R1Gap context types. [sent-193, score-0.107]
85 These pronoun resolution features are similar to selectional preference work by Yang et al. [sent-195, score-0.315]
86 (2005) and Bergsma and Lin (2006), who compute semantic compatibility for pronouns in specific syntactic relationships such as possessive-noun, subject-verb, etc. [sent-196, score-0.128]
87 In our case, we directly use the general context of any pronominal anaphor to find its most compatible antecedent. [sent-197, score-0.3]
88 Note that all our above features are designed to be non-sparse by firing indicators of the quantized Web statistics and not the lexical- or class-based identities of the mention pair. [sent-198, score-0.338]
89 This keeps the total number of features small, which is important for the relatively small datasets used for coreference resolution. [sent-199, score-0.367]
90 We go from around 100 features in the Reconcile baseline to around 165 features after adding all our Web features. [sent-200, score-0.167]
91 First, we divide by the count of the antecedent so that when choosing the best antecedent for a fixed anaphor, we are not biased towards more frequently occurring antecedents. [sent-202, score-0.432]
92 Second, we divide by the count of the context so that across anaphora, an anaphor with rarer context does not get smaller values (for all its candidate antecedents) than another anaphor with a more common context. [sent-203, score-0.599]
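A sketch of the R1Gap variant for a possessive anaphor like his victory, applying both normalizations just described; the exact query strings (especially the context query) are assumptions on top of the text:

```python
import math

def r1gap_feature(h1, r, count):
    """Substitute the candidate antecedent h1 into the possessive pronoun's
    position, with a wildcard gap before the right context r, and normalize
    by both the antecedent count and the context count."""
    num = count(f"{h1} 's ? {r}")           # e.g., count("Bush 's ? victory")
    den = count(h1) * count(f"'s ? {r}")    # antecedent count * context count (assumed form)
    if num == 0 or den == 0:
        return "R1Gap_bin=none"
    return f"R1Gap_bin={int(math.log10(num / den))}"
```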
93 4.1 Data: We show results on three popular and comparatively larger coreference resolution data sets – ACE04, ACE05, and ACE05-ALL – the datasets from the ACE Program (NIST, 2004). [sent-206, score-0.432]
94 B3 computes precision and recall for each mention by computing the intersection of its predicted and gold cluster and dividing by the size of the predicted cluster (for precision) or the gold cluster (for recall). (Footnote: the development set is used only for ACE04, because for ACE05 and ACE05-ALL, we directly test using the features tuned on ACE04.) [sent-218, score-0.309]
95 (Footnote: B3 has two versions which handle twinless (spurious) mentions in different ways; see Stoyanov et al., 2009.) [sent-219, score-0.087]
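A minimal sketch of the B3 computation described above, assuming every mention appears in both the predicted and gold clusterings (i.e., no twinless mentions):

```python
def b_cubed(pred_clusters, gold_clusters):
    """Per-mention precision/recall from the intersection of each mention's
    predicted and gold cluster, averaged over all mentions."""
    pred = {m: frozenset(c) for c in pred_clusters for m in c}
    gold = {m: frozenset(c) for c in gold_clusters for m in c}
    ms = list(gold)
    p = sum(len(pred[m] & gold[m]) / len(pred[m]) for m in ms) / len(ms)
    r = sum(len(pred[m] & gold[m]) / len(gold[m]) for m in ms) / len(ms)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: b_cubed([["a", "b"], ["c"]], [["a"], ["b", "c"]])
```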
96 AvgPerc is the averaged perceptron baseline, DecTree is the decision tree baseline, and the +Feature rows show the effect of adding a particular feature incrementally (not in isolation) to the DecTree baseline. [sent-229, score-0.318]
97 It has been noted (..., 2011) that MUC is biased towards large clusters (chains) whereas B3 is biased towards singleton clusters. [sent-233, score-0.165]
98 4.3 Results: We start with the Reconcile baseline but employ the decision tree (DT) classifier, because it has significantly better performance than the default averaged perceptron classifier used in Stoyanov et al. (2009). [sent-236, score-0.359]
99 Table 2 compares the baseline perceptron results to the DT results and then shows the incremental addition of the Web features to the DT baseline (on the ACE04 development set). [sent-238, score-0.246]
100 Each feature type incrementally increases both MUC and B3 F1-measures, showing that they are not taking advantage of any bias of either metric. [sent-241, score-0.085]
wordName wordTfidf (topN-words)
[('reconcile', 0.417), ('headwords', 0.312), ('coreference', 0.265), ('headword', 0.232), ('president', 0.21), ('stoyanov', 0.207), ('anaphor', 0.159), ('count', 0.154), ('pronoun', 0.13), ('mention', 0.126), ('resolution', 0.125), ('muc', 0.116), ('antecedent', 0.116), ('ace', 0.108), ('quantized', 0.104), ('web', 0.102), ('decision', 0.092), ('compatibility', 0.091), ('seeds', 0.089), ('cluster', 0.089), ('mentions', 0.087), ('obama', 0.087), ('diffuse', 0.078), ('kobdani', 0.078), ('victory', 0.078), ('dt', 0.078), ('antecedents', 0.077), ('jobs', 0.077), ('hearst', 0.077), ('clusters', 0.073), ('haghighi', 0.071), ('hypernymy', 0.068), ('leader', 0.068), ('election', 0.064), ('campaign', 0.064), ('anaphora', 0.064), ('klein', 0.063), ('wildcards', 0.062), ('attested', 0.062), ('bush', 0.062), ('features', 0.06), ('lists', 0.059), ('rahman', 0.058), ('perceptron', 0.058), ('splits', 0.058), ('matches', 0.054), ('pronominal', 0.052), ('bergsma', 0.052), ('dectree', 0.052), ('elected', 0.052), ('territories', 0.052), ('collect', 0.051), ('anaphoric', 0.05), ('seed', 0.05), ('patterns', 0.048), ('bansal', 0.048), ('statistics', 0.048), ('classifier', 0.048), ('context', 0.047), ('tree', 0.047), ('baseline', 0.047), ('ids', 0.046), ('daum', 0.046), ('hyperparameter', 0.046), ('biased', 0.046), ('nissim', 0.045), ('affinities', 0.045), ('distributional', 0.045), ('lin', 0.044), ('ambiguities', 0.043), ('feature', 0.043), ('datasets', 0.042), ('incrementally', 0.042), ('compatible', 0.042), ('voter', 0.041), ('markert', 0.041), ('bridging', 0.041), ('poesio', 0.041), ('pair', 0.041), ('chains', 0.04), ('counts', 0.039), ('fire', 0.039), ('subtasks', 0.039), ('semantic', 0.037), ('yago', 0.037), ('substituted', 0.037), ('bin', 0.036), ('averaged', 0.036), ('capture', 0.035), ('wikipedia', 0.034), ('development', 0.034), ('cues', 0.034), ('world', 0.034), ('ng', 0.033), ('soon', 0.033), ('definite', 0.033), ('candidate', 0.033), ('pairwise', 0.033), ('match', 0.032), ('default', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 58 acl-2012-Coreference Semantics from Web Features
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
2 0.29272789 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Author: Michael Wick ; Sameer Singh ; Andrew McCallum
Abstract: Sameer Singh Andrew McCallum University of Massachusetts University of Massachusetts 140 Governor’s Drive 140 Governor’s Drive Amherst, MA Amherst, MA s ameer@ cs .umas s .edu mccal lum@ c s .umas s .edu Hamming” who authored “The unreasonable effectiveness of mathematics.” Features of the mentions Methods that measure compatibility between mention pairs are currently the dominant ap- proach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent. In this paper we propose a novel discriminative hierarchical model that recursively partitions entities into trees of latent sub-entities. These trees succinctly summarize the mentions providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at massive scales. We demonstrate that the hierarchical model is several orders of magnitude faster than pairwise, allowing us to perform coreference on six million author mentions in under four hours on a single CPU.
3 0.18221498 50 acl-2012-Collective Classification for Fine-grained Information Status
Author: Katja Markert ; Yufang Hou ; Michael Strube
Abstract: Previous work on classifying information status (Nissim, 2006; Rahman and Ng, 2011) is restricted to coarse-grained classification and focuses on conversational dialogue. We here introduce the task of classifying finegrained information status and work on written text. We add a fine-grained information status layer to the Wall Street Journal portion of the OntoNotes corpus. We claim that the information status of a mention depends not only on the mention itself but also on other mentions in the vicinity and solve the task by collectively classifying the information status ofall mentions. Our approach strongly outperforms reimplementations of previous work.
4 0.14500125 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
5 0.092548661 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum
Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.
6 0.086320177 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
7 0.084435776 73 acl-2012-Discriminative Learning for Joint Template Filling
8 0.083191946 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
9 0.070142165 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources
10 0.067632124 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
11 0.064721696 187 acl-2012-Subgroup Detection in Ideological Discussions
12 0.059211843 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
13 0.058457132 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
14 0.056049958 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
15 0.055795263 134 acl-2012-Learning to Find Translations and Transliterations on the Web
16 0.054204721 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
17 0.054192815 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
18 0.052921187 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
19 0.051956762 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
20 0.049327135 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
topicId topicWeight
[(0, -0.184), (1, 0.103), (2, -0.055), (3, 0.051), (4, 0.03), (5, 0.085), (6, -0.042), (7, 0.065), (8, 0.083), (9, -0.024), (10, 0.042), (11, -0.179), (12, -0.101), (13, -0.066), (14, 0.008), (15, 0.062), (16, 0.085), (17, 0.111), (18, -0.31), (19, -0.132), (20, -0.072), (21, 0.057), (22, 0.038), (23, 0.014), (24, 0.081), (25, -0.023), (26, 0.017), (27, 0.07), (28, 0.041), (29, -0.035), (30, -0.129), (31, 0.018), (32, -0.06), (33, -0.043), (34, 0.046), (35, -0.032), (36, 0.013), (37, -0.054), (38, 0.039), (39, 0.041), (40, -0.063), (41, -0.067), (42, 0.123), (43, 0.088), (44, 0.045), (45, -0.121), (46, 0.021), (47, -0.14), (48, -0.04), (49, -0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.93134636 58 acl-2012-Coreference Semantics from Web Features
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
2 0.88923985 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Author: Michael Wick ; Sameer Singh ; Andrew McCallum
Abstract: Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent. In this paper we propose a novel discriminative hierarchical model that recursively partitions entities into trees of latent sub-entities. These trees succinctly summarize the mentions providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at massive scales. We demonstrate that the hierarchical model is several orders of magnitude faster than pairwise, allowing us to perform coreference on six million author mentions in under four hours on a single CPU.
3 0.83009475 50 acl-2012-Collective Classification for Fine-grained Information Status
Author: Katja Markert ; Yufang Hou ; Michael Strube
Abstract: Previous work on classifying information status (Nissim, 2006; Rahman and Ng, 2011) is restricted to coarse-grained classification and focuses on conversational dialogue. We here introduce the task of classifying finegrained information status and work on written text. We add a fine-grained information status layer to the Wall Street Journal portion of the OntoNotes corpus. We claim that the information status of a mention depends not only on the mention itself but also on other mentions in the vicinity and solve the task by collectively classifying the information status ofall mentions. Our approach strongly outperforms reimplementations of previous work.
4 0.69372183 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
5 0.37922341 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention de- tection and template filling tasks.
6 0.34610295 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
7 0.32058156 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
8 0.30007526 83 acl-2012-Error Mining on Dependency Trees
9 0.29913563 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
10 0.29640746 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
11 0.29075068 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
12 0.28816965 34 acl-2012-Automatically Learning Measures of Child Language Development
13 0.27861625 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources
14 0.27566707 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
15 0.27538937 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
16 0.26697221 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
17 0.26316467 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets
18 0.25203416 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
19 0.25026998 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus
20 0.24808264 120 acl-2012-Information-theoretic Multi-view Domain Adaptation
topicId topicWeight
[(25, 0.013), (26, 0.049), (28, 0.048), (30, 0.018), (31, 0.27), (37, 0.044), (39, 0.056), (49, 0.013), (74, 0.034), (82, 0.034), (84, 0.027), (85, 0.038), (90, 0.128), (92, 0.051), (94, 0.024), (96, 0.026), (99, 0.054)]
simIndex simValue paperId paperTitle
1 0.77400017 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
Author: Hao Tang ; Joseph Keshet ; Karen Livescu
Abstract: We address the problem of learning the mapping between words and their possible pronunciations in terms of sub-word units. Most previous approaches have involved generative modeling of the distribution of pronunciations, usually trained to maximize likelihood. We propose a discriminative, feature-rich approach using large-margin learning. This approach allows us to optimize an objective closely related to a discriminative task, to incorporate a large number of complex features, and still do inference efficiently. We test the approach on the task of lexical access; that is, the prediction of a word given a phonetic transcription. In experiments on a subset of the Switchboard conversational speech corpus, our models thus far improve classification error rates from a previously published result of 29.1% to about 15%. We find that large-margin approaches outperform conditional random field learning, and that the Passive-Aggressive algorithm for largemargin learning is faster to converge than the Pegasos algorithm.
same-paper 2 0.75931859 58 acl-2012-Coreference Semantics from Web Features
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
3 0.54629117 187 acl-2012-Subgroup Detection in Ideological Discussions
Author: Amjad Abu-Jbara ; Pradeep Dasigi ; Mona Diab ; Dragomir Radev
Abstract: The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon that the participants in such discussions split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and towards the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.
4 0.54583371 191 acl-2012-Temporally Anchored Relation Extraction
Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo
Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graphbased document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve a 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.
5 0.54450172 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer
Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.
6 0.54402047 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
7 0.54344708 50 acl-2012-Collective Classification for Fine-grained Information Status
8 0.54277569 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
9 0.54243749 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
11 0.53930157 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
12 0.53922713 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
13 0.53879625 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
14 0.53850287 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
15 0.5380339 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
17 0.53798616 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
18 0.53698921 167 acl-2012-QuickView: NLP-based Tweet Search
19 0.53657883 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
20 0.53614861 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling