acl acl2012 acl2012-18 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). [sent-4, score-1.152]
2 The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. [sent-5, score-0.222]
3 Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. [sent-6, score-0.383]
4 We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work. [sent-7, score-0.6]
5 1 Introduction Proper handling of mentions of real-world entities in text—identifying and resolving them—is a central part of many NLP applications. [sent-8, score-0.287]
6 We seek an algorithm that infers a set of real-world entities from mentions in a text, mapping each entity mention token to an entity, and discovers general categories of words used in names (e. [sent-9, score-0.908]
7 Here, we use a probabilistic model to infer a structured representation of canonical forms of entity attributes through transductive learning from named entity mentions with a small number of seeds (see Table 1). [sent-12, score-0.809]
8 The input is a collection of mentions found by a named entity recognizer, along with their contexts, and, following Eisenstein et al. [sent-13, score-0.491]
9 (2011), the output is a table in which entities are rows (the number of which is not pre-specified) and attribute words are organized into columns. [sent-14, score-0.35]
10 In a political blogs dataset, the mentions refer to political figures in the United States (e. [sent-19, score-0.489]
11 , Michelle, Obama⟩—while simultaneously performing coreference resolution for named entity mentions. [sent-24, score-0.314]
12 In the sports news dataset, the model is provided with named entity mentions of heterogeneous types, and success here consists of identifying the correct team for every player (e. [sent-25, score-0.891]
13 In this scenario, given a few seed examples, the model begins to identify simple relations among named entities (in addition to discovering attribute structures), since attributes are expressed as named entities across multiple mentions. [sent-28, score-0.616]
14 Table 1: Seeds for politics (above) and sports (below). [sent-33, score-0.472]
15 Top, the generation of the table: C is the number of attributes/columns, the number of rows is infinite, α is a vector of concentration parameters, φ is a multinomial distribution over strings, and x is a word in a table cell. [sent-37, score-0.216]
16 Lower left, for choosing entities to be mentioned: τ determines the stick lengths and η is the distribution over entities to be selected for mention. [sent-38, score-0.328]
17 Lower right, for generating mentions: M is the number of mentions in the data, w is a word token drawn from an entity/row r and attribute/column c. [sent-40, score-0.327]
18 2 Model We begin by assuming as input a set of mention tokens, each consisting of one or more words. [sent-44, score-0.285]
19 In our experiments these are obtained by running a named entity recognizer. [sent-45, score-0.204]
20 The output is a table in which rows are understood to correspond to entities (types, not mention tokens) and columns are fields, each associated with an attribute or a part of it. [sent-46, score-0.784]
21 The generative story, depicted in Figure 1, is: • For each column j ∈ {1, . . . , C}, draw a multinomial distribution over strings φ_j. [sent-50, score-0.201]
22 For each row i (of which there are infinitely many), draw an entry x_{i,j} for cell i,j from φ_j. [sent-59, score-0.331]
23 A few of these entries (the seeds) are observed. ◦ Draw weights β that associate shape and positional features with columns, from a 0-mean, µ-variance Laplace distribution. [sent-60, score-0.205]
24 For each row r: ◦ Draw a multinomial distribution over context vocabulary θ_r from a Dirichlet prior λ. [sent-63, score-0.262]
25 For each mention token m: ◦ Draw an entity/row r ∼ η. [sent-65, score-0.368]
26 ◦ For each word w, given some of its features f (assumed observed): choose a column c with probability proportional to exp(β_c · f); this is a log-linear distribution with partition function Z. [sent-67, score-0.201]
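To make the generative story concrete, here is a minimal Python sketch of the table-generation half, under stated assumptions: a small finite vocabulary stands in for the base distribution over strings, rows are instantiated lazily to emulate the infinite table, and the log-linear column choice is simplified to one word per column. All names (VOCAB, draw_row, cell) are illustrative, not from the paper.

```python
import numpy as np

# A sketch, not the paper's implementation: finite vocabulary, lazy rows,
# and one word per column instead of the log-linear column choice.
VOCAB = ["barack", "obama", "sen.", "mccain", "john"]
C = 3                      # number of columns (attribute parts)
alpha = np.ones(C)         # per-column concentration parameters
tau = 1.0                  # concentration for the row (entity) prior
rng = np.random.default_rng(0)

# phi[j]: per-column distribution over strings (Dirichlet draw here).
phi = [rng.dirichlet(alpha[j] * np.ones(len(VOCAB))) for j in range(C)]

table = {}                 # (row, col) -> string; cells filled lazily

def cell(i, j):
    """Draw (and memoize) table entry x_{i,j} ~ phi_j."""
    if (i, j) not in table:
        table[(i, j)] = VOCAB[rng.choice(len(VOCAB), p=phi[j])]
    return table[(i, j)]

def draw_row(counts):
    """Chinese-restaurant-style row draw; a new row has mass prop. to tau."""
    rows = list(counts)
    ns = np.array([counts[r] for r in rows], dtype=float)
    probs = np.append(ns, tau) / (ns.sum() + tau)
    k = rng.choice(len(probs), p=probs)
    return rows[k] if k < len(rows) else len(rows)   # len(rows) = new row

counts = {}
for _ in range(5):         # generate five mentions
    r = draw_row(counts)
    counts[r] = counts.get(r, 0) + 1
    print(r, [cell(r, j) for j in range(C)])
```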
27 When first-order dependencies among the columns are enabled, these introduce a dynamic character to the graphical model that is not shown in Figure 1. [sent-71, score-0.223]
28 In the E-step, we perform collapsed Gibbs sampling to obtain distributions over row and column indices for every mention, given the current value of the hyperparameters. [sent-98, score-0.605]
29 1 E-step For the mth mention, we sample row index r, then for each word w_{mℓ}, we sample column index c. [sent-101, score-0.727]
30 Following Eisenstein et al. (2011), when we sample the row for a mention, we use Bayes’ rule and marginalize the columns. [sent-105, score-0.313]
31 p(r_m = r | . . .) ∝ p(r_m = r | r_{-m}, η) · (∏_ℓ ∑_c p(w_{mℓ} | x, r_m = r, c_{mℓ} = c)) · (∏_t p(s_{mt} | r_m = r)). We consider each quantity in turn. [sent-110, score-0.238]
32 The probability of drawing a row index follows a stick-breaking distribution. [sent-112, score-0.38]
33 This allows us to have an unbounded number of rows and let the model infer the optimal value from data. [sent-113, score-0.21]
34 A standard marginalization of η gives us: p(r_m = r | r_{-m}, τ) = N_r^{-m} / (N + τ) if N_r^{-m} > 0, and τ / (N + τ) otherwise, where N is the number of mentions, N_r is the number of mentions assigned to row r, and N_r^{-m} is the number of mentions assigned to row r, excluding m. [sent-114, score-1.136]
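A direct transcription of this marginalized row prior, as reconstructed above (the function name is hypothetical):

```python
def row_prior(r, row_counts, N, tau):
    """p(r_m = r | r_{-m}, tau): marginalized stick-breaking row prior.

    row_counts[r] is N_r^{-m} (mentions in row r excluding mention m),
    N is the total number of mentions; an unused row gets mass
    proportional to tau, so new entities can always be created.
    """
    n_r = row_counts.get(r, 0)
    return (n_r if n_r > 0 else tau) / (N + tau)
```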
35 In order to compute the likelihood of observing mentions in the dataset, we have to consider a few cases. [sent-116, score-0.321]
36 With base distribution G_0 and marginalizing φ, we have:
p(w_{mℓ} | x, r_m = r, c_{mℓ} = c, α_c) = (N_{c,w_{mℓ}}^{-mℓ} + α_c G_0(w_{mℓ})) / (N_c^{-mℓ} + α_c) if x_{r,c} ∈ {w_{mℓ}, ∅}, and 0 if x_{r,c} ∉ {w_{mℓ}, ∅} (1) [sent-123, score-0.216]
37 Here N_c^{-mℓ} is the number of cells in column c that are not empty and N_{c,w_{mℓ}}^{-mℓ} is the number of cells in column c that are set to the word w_{mℓ}; both counts exclude the current word under consideration. [sent-125, score-0.55]
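A sketch of the word likelihood in Eq. 1 as reconstructed above; the case structure is inferred from the surrounding definitions (a word is impossible when the cell already holds a different string), so treat it as an assumption rather than the paper's exact form.

```python
def word_likelihood(w, x_rc, n_cw, n_c, alpha_c, g0):
    """p(w_{ml} | x, r_m = r, c_{ml} = c, alpha_c), per Eq. 1.

    x_rc    : current entry of cell (r, c), or None if empty
    n_cw    : N_{c,w}^{-ml}, cells in column c already set to w
    n_c     : N_c^{-ml}, non-empty cells in column c
    alpha_c : concentration parameter for column c
    g0      : base distribution G0 (dict: word -> probability)
    """
    if x_rc is not None and x_rc != w:
        return 0.0           # cell already holds a different word
    return (n_cw + alpha_c * g0.get(w, 0.0)) / (n_c + alpha_c)
```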
38 It is important to be able to use context information to determine which row a mention should go into. [sent-127, score-0.598]
39 As a novel extension, our model also uses surrounding words of a mention as its “context”—similar context words can encourage two mentions that do not share any words to be merged. [sent-128, score-0.658]
40 For every row in the table, we have a multinomial distribution over context vocabulary θr from a Dirichlet prior λ. [sent-130, score-0.432]
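The context factor p(s_{mt} | r_m = r) then takes the usual collapsed Dirichlet-multinomial form; a minimal sketch with hypothetical names:

```python
def context_likelihood(s, ctx_counts, n_ctx, lam, V):
    """p(s_{mt} | r_m = r): predictive probability of context word s
    under row r's collapsed context distribution theta_r.

    ctx_counts : dict mapping context word -> count among the contexts
                 of mentions currently in row r (excluding mention m)
    n_ctx      : total of those counts
    lam        : symmetric Dirichlet prior lambda
    V          : context vocabulary size
    """
    return (ctx_counts.get(s, 0) + lam) / (n_ctx + lam * V)
```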
41 2 Sampling Columns Our column sampling procedure is novel to this work and substantially differs from that of Eisenstein et al. [sent-135, score-0.297]
42 First, we note that when we sample column indices for each word in a mention, the row index for the mention r has already been sampled. [sent-137, score-0.926]
43 Also, our model has interdependencies among column indices of a mention. [sent-138, score-0.282]
44 For faster mixing, we experiment with first-order dependencies between columns when sampling column indices. [sent-140, score-0.485]
45 We sample the column index c_1 for the first word in the mention, marginalizing over the column indices of the other words in the mention. [sent-144, score-0.389]
46 After we sample the column index for the first word, we sample the column index c_2 for the second word, fixing the previous word to be in column c_1 and marginalizing over c_3, . [sent-145, score-0.995]
47 . . . ∑_c p(w_{mℓ} | x, r_m = r, c_{mℓ} = c, α_c). (As shown in Figure 1, column indices in a mention form “v-structures” with the row index r.) [sent-160, score-0.994]
48 Our model can use arbitrary features to choose a column index. [sent-166, score-0.236]
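A minimal sketch of such a featurized (log-linear) distribution over columns; the weight-matrix layout and names are illustrative assumptions:

```python
import numpy as np

def column_feature_dist(f, beta):
    """p(c | f) = exp(beta_c . f) / Z: log-linear choice of a column.

    f    : binary feature vector for a word (shape features, etc.)
    beta : (C, F) weight matrix, one row of weights per column; in the
           model these weights carry a Laplace (L1) prior, handled at
           estimation time rather than here.
    """
    scores = beta @ f                  # unnormalized log-probabilities
    scores = scores - scores.max()     # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()                 # normalize by the partition function Z
```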
49 Our transition probabilities are as follows: p(c_{mℓ} = c | c_{mℓ−1} = c⁻) = N_{c⁻,c}^{-m} / ∑_{c′} N_{c⁻,c′}^{-m}, where N_{c⁻,c}^{-m} is the number of times we observe transitions from column c⁻ to c, excluding mention m. [sent-172, score-0.557]
50 Mention likelihood p(w_{mℓ} | x, r_m = r, c_{mℓ} = c, α_c) is identical to when we sample the row index (Eq. 1). [sent-175, score-0.513]
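Putting the pieces together, here is a sketch of the sequential column sampler: each word's column is drawn conditioned on the previously fixed column via the transition counts and scored by the word likelihood (e.g., the word_likelihood function sketched earlier). For simplicity this version conditions only on already-fixed columns rather than marginalizing over the columns of later words, which is a simplification of the scheme described in the text.

```python
import numpy as np

def sample_columns(words, row, trans_counts, like, C, rng):
    """Sequentially sample column indices for the words of one mention.

    trans_counts : (C, C) array; trans_counts[a, b] = N_{a,b}^{-m},
                   observed a -> b column transitions excluding this mention
    like         : callback like(w, row, c) returning the word likelihood
    """
    cols, prev = [], None
    for w in words:
        scores = np.empty(C)
        for c in range(C):
            if prev is None:
                p_trans = 1.0 / C          # first word has no left neighbor
            else:
                total = trans_counts[prev].sum()
                p_trans = trans_counts[prev, c] / total if total > 0 else 1.0 / C
            scores[c] = p_trans * like(w, row, c)
        if scores.sum() == 0.0:            # all columns blocked: back off
            scores[:] = 1.0
        scores /= scores.sum()
        c = int(rng.choice(C, p=scores))
        cols.append(c)
        prev = c
    return cols
```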
51 For each word w in a mention, we introduced 12 binary features f for our featurized log-linear distribution (§3. [sent-181, score-0.363]
52 We downcased all words in mentions for the purpose of defining the table and the mention words w. [sent-187, score-0.572]
53 Ten context words (5 each to the left and right) define s for each mention token. [sent-188, score-0.336]
54 1 Datasets We collected named entity mentions from two corpora: political blogs and sports news. [sent-197, score-0.906]
55 The political blogs corpus is a collection of blog posts about politics in the United States (Eisenstein and Xing, 2010), and the sports news corpus contains news summaries of major league sports games (National Basketball Association, National Football League, and Major League Baseball). [sent-198, score-0.992]
56 , 2005), which tagged named entity mentions as person, location, and organization. [sent-217, score-0.501]
57 We used named entities of type person from the political blogs corpus, and both person and organization entities from the sports news corpus. [sent-218, score-0.897]
58 Table 2 summarizes statistics for both datasets of named entity mentions. [sent-220, score-0.245]
59 For the politics dataset, we used Eisenstein et al.’s manually built 125-entity (282 vocabulary items) reference table. [sent-223, score-0.3]
60 For the sports data, we obtained a roster of all NBA, NFL, and MLB players in 2009. [sent-225, score-0.366]
61 We built our sports reference table using the players’ names, teams and locations, to get 3,642 players and 15,932 vocabulary items. [sent-226, score-0.512]
62 The gold standard table for the politics dataset is incomplete, whereas it is complete for the sports dataset. [sent-227, score-0.552]
63 2 Evaluation Scores We propose both a row evaluation, to determine how well a model disambiguates entities and merges mentions of the same entity, and a column evaluation, to measure how well the model relates words used in different mentions. [sent-231, score-1.06]
64 The first step in evaluation is to find a maximum score bipartite matching between rows in the response and reference table. [sent-233, score-0.337]
65 Given the response and reference tables, and treating each row as a set of words, we can optimize the matching using the Jonker and Volgenant (1987) algorithm. [sent-234, score-0.347]
66 The column evaluation is identical, except that sets of words that are matched are defined by columns. [sent-235, score-0.248]
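A sketch of this matching step using SciPy's linear-assignment solver in place of a Jonker-Volgenant implementation; the shared-word count used as the per-pair score is an assumption, since the excerpt does not spell out the pairwise score:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_score(response_rows, reference_rows):
    """Maximum-score bipartite matching between response and reference
    rows, each treated as a set of words (use columns instead of rows
    for the column evaluation)."""
    S = np.zeros((len(response_rows), len(reference_rows)))
    for i, resp in enumerate(response_rows):
        for j, ref in enumerate(reference_rows):
            S[i, j] = len(set(resp) & set(ref))   # assumed pair score
    rows, cols = linear_sum_assignment(S, maximize=True)
    return S[rows, cols].sum()
```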
67 3 Baselines Our simple baseline is an agglomerative clustering algorithm that clusters mentions into entities using the single-linkage criterion. [sent-241, score-0.521]
68 Initially, each unique mention forms its own cluster; we incrementally merge clusters together to form rows in the table. [sent-242, score-0.46]
69 We follow Eisenstein et al. (2011) and use the string edit distance between mention strings in each cluster to define the score. [sent-245, score-0.285]
70 For the sports dataset, since mentions contain person and organization named entity types, our score for clustering uses the Jaccard distance between context words of the mentions. [sent-246, score-0.991]
71 Therefore, we first match words in mentions of type person against a regular expression to recognize entity attributes from a fixed set of titles and suffixes, and the first, middle and last names. [sent-248, score-0.555]
72 We treat words in mentions of type organization as a single attribute. [sent-249, score-0.344]
73 Precision prefers a more compact response row (or column), which unfairly penalizes situations like those of our sports dataset, where rows are heterogeneous (e. [sent-252, score-0.808]
74 perfect precision by matching the first row to the reference row for Kobe Bryant and the latter row to any Lakers player. [sent-261, score-0.786]
75 As we merge clusters together, we arrange words such that all words within a column belong to the same attribute, creating columns as necessary to accommodate multiple similar attributes as a result of merging. [sent-267, score-0.42]
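A sketch of this baseline's core loop for the politics setting: single-linkage agglomerative clustering of unique mention strings under edit distance. The stopping threshold is an assumption; the excerpt does not give the exact stopping criterion.

```python
import itertools

def edit_distance(a, b):
    """Levenshtein distance via the standard one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def single_linkage(mentions, threshold):
    """Merge clusters of unique mention strings greedily; the distance
    between two clusters is the minimum cross-pair edit distance
    (single linkage). Stop when the best merge exceeds `threshold`."""
    clusters = [[m] for m in set(mentions)]
    while len(clusters) > 1:
        d, i, j = min(
            (min(edit_distance(a, b)
                 for a in clusters[i] for b in clusters[j]), i, j)
            for i, j in itertools.combinations(range(len(clusters)), 2))
        if d > threshold:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters
```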
76 4 Results For both the politics and sports datasets, we run the procedure in §3. [sent-272, score-0.472]
77 We also analyze how much each of our four main extensions (shape features, context information, noise, and first-order column dependencies) to EEA contributes to overall performance by ablating each in turn (also shown in Fig. [sent-278, score-0.252]
78 Indeed, the best model did not have column dependencies. [sent-282, score-0.236]
79 In row evaluation, the baseline system can achieve a high reference score by creating one entity row per unique string, but as it merges strings, the clusters encompass more word tokens, improving response score at the expense of reference score. [sent-284, score-0.878]
80 With fewer clusters, there are fewer entities in the response table for matching and the response score suffers. [sent-285, score-0.295]
81 In column evaluation, our method without column dependencies was best. [sent-288, score-0.441]
82 The results for the sports dataset are shown in Figure 3. [sent-290, score-0.366]
83 Our best-performing complete model’s response table contains 599 rows, of which 561 were matched to the reference table. [sent-291, score-0.209]
84 Figure 2: Row (left) and column (right) scores for the politics dataset.
85 Figure 3: Row (left) and column (right) scores for the sports dataset.
86 The reference score is low since the reference set includes all entities in the NBA, NFL, and MLB, but most of them were not mentioned in our dataset. [sent-322, score-0.279]
87 In row evaluation, our model lies above the baseline response-reference score curve, demonstrating its ability to correctly identify and combine player mentions with their team names. [sent-323, score-0.401]
88 Similar to the previous dataset, our model is also substantially better in column evaluation, indicating that it mapped mention words into a coherent set of five columns. [sent-324, score-0.521]
89 In the politics dataset, most of the mentions contain information about people. [sent-328, score-0.473]
90 Therefore, besides canonicalizing named entities, the model also resolves within-document and cross-document coreference, since it assigns a row index to every mention. [sent-329, score-0.574]
91 For example, our model learned that most mentions of John McCain, Sen. [sent-330, score-0.322]
92 Table 3: A small segment of the output table for the politics dataset, showing a few noteworthy correct (blue) and incorrect (red) examples. [sent-337, score-0.254]
93 In the sports dataset, persons and organizations are mentioned. [sent-353, score-0.286]
94 Our model can do better, since it makes use of context information and features, and it can put a person and an organization in one row even though they do not share common words. [sent-356, score-0.502]
95 Surprisingly, the model failed to infer that Derek Jeter plays for the New York Yankees, although mentions of the organization New York Yankees can be found in our dataset. [sent-358, score-0.379]
96 The next two rows show cases where our model successfully matched players with their teams and put each word token in its respective column. [sent-361, score-0.458]
97 The sixth row is interesting because Dave Toub is indeed affiliated with the Chicago Bears. [sent-365, score-0.262]
98 However, when the model saw a mention token The Bears, it did not have any other column in which to put the word token The, and decided to put it in the fourth column although it is not a location. [sent-366, score-0.848]
99 Our model is focused on the problem of canonicalizing mention strings into their parts, though its r variables (which map mentions to rows) could be interpreted as (within-document and cross-document) coreference resolution, which has been tackled using a range of probabilistic models (Li et al. [sent-374, score-0.787]
100 6 Conclusions We presented an improved probabilistic model for canonicalizing named entities into a table. [sent-378, score-0.356]
wordName wordTfidf (topN-words)
[('mentions', 0.287), ('sports', 0.286), ('mention', 0.285), ('row', 0.262), ('eisenstein', 0.243), ('cm', 0.222), ('column', 0.201), ('politics', 0.186), ('rows', 0.175), ('columns', 0.149), ('entities', 0.125), ('rm', 0.119), ('entity', 0.115), ('wm', 0.11), ('canonicalizing', 0.107), ('seeds', 0.098), ('sampling', 0.096), ('mccain', 0.093), ('named', 0.089), ('eea', 0.085), ('response', 0.085), ('index', 0.081), ('dataset', 0.08), ('players', 0.08), ('reference', 0.077), ('bayesian', 0.077), ('coreference', 0.073), ('political', 0.073), ('fixing', 0.072), ('attributes', 0.07), ('cell', 0.069), ('noteworthy', 0.068), ('league', 0.064), ('xref', 0.064), ('backward', 0.058), ('sim', 0.058), ('clustering', 0.058), ('organization', 0.057), ('shape', 0.056), ('blogs', 0.056), ('marginalizing', 0.056), ('names', 0.056), ('agglomerative', 0.051), ('bryant', 0.051), ('laplace', 0.051), ('sample', 0.051), ('context', 0.051), ('gibbs', 0.05), ('attribute', 0.05), ('nr', 0.049), ('put', 0.049), ('person', 0.048), ('matched', 0.047), ('indices', 0.046), ('player', 0.045), ('zero', 0.044), ('eraawch', 0.043), ('hkobe', 0.043), ('jonker', 0.043), ('kobe', 0.043), ('lakersi', 0.043), ('mlb', 0.043), ('nfl', 0.043), ('sref', 0.043), ('initialized', 0.043), ('pc', 0.041), ('dr', 0.041), ('prior', 0.041), ('datasets', 0.041), ('distribution', 0.041), ('token', 0.04), ('dependencies', 0.039), ('haghighi', 0.039), ('excluding', 0.038), ('featurized', 0.037), ('jensen', 0.037), ('lakers', 0.037), ('nba', 0.037), ('palin', 0.037), ('sres', 0.037), ('stick', 0.037), ('yankees', 0.037), ('yano', 0.037), ('resolution', 0.037), ('vocabulary', 0.037), ('obama', 0.036), ('model', 0.035), ('titles', 0.035), ('monte', 0.035), ('elsner', 0.034), ('michelle', 0.034), ('mochihashi', 0.034), ('pb', 0.034), ('observing', 0.034), ('team', 0.034), ('transition', 0.033), ('nonparametric', 0.033), ('notion', 0.033), ('seed', 0.033), ('teams', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
2 0.26304424 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Author: Michael Wick ; Sameer Singh ; Andrew McCallum
Abstract: Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent. In this paper we propose a novel discriminative hierarchical model that recursively partitions entities into trees of latent sub-entities. These trees succinctly summarize the mentions providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at massive scales. We demonstrate that the hierarchical model is several orders of magnitude faster than pairwise, allowing us to perform coreference on six million author mentions in under four hours on a single CPU.
3 0.1795374 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
Author: Wei Lu ; Dan Roth
Abstract: This paper presents a novel sequence labeling model based on the latent-variable semiMarkov conditional random fields for jointly extracting argument roles of events from texts. The model takes in coarse mention and type information and predicts argument roles for a given event template. This paper addresses the event extraction problem in a primarily unsupervised setting, where no labeled training instances are available. Our key contribution is a novel learning framework called structured preference modeling (PM), that allows arbitrary preference to be assigned to certain structures during the learning procedure. We establish and discuss connections between this framework and other existing works. We show empirically that the structured preferences are crucial to the success of our task. Our model, trained without annotated data and with a small number of structured preferences, yields performance competitive to some baseline supervised approaches.
4 0.17586917 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum
Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.
5 0.17123471 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention de- tection and template filling tasks.
6 0.15126997 50 acl-2012-Collective Classification for Fine-grained Information Status
7 0.14500125 58 acl-2012-Coreference Semantics from Web Features
8 0.11099153 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets
9 0.10801408 153 acl-2012-Named Entity Disambiguation in Streaming Data
10 0.10736392 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
11 0.09804225 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
12 0.095324039 171 acl-2012-SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations
13 0.091649301 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
14 0.087460823 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
15 0.085370369 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
16 0.078293733 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
17 0.078232966 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
18 0.075244531 145 acl-2012-Modeling Sentences in the Latent Space
19 0.072374165 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
20 0.068133742 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
topicId topicWeight
[(0, -0.223), (1, 0.132), (2, -0.015), (3, 0.125), (4, 0.002), (5, 0.147), (6, -0.031), (7, 0.063), (8, 0.132), (9, -0.057), (10, 0.071), (11, -0.24), (12, -0.168), (13, -0.12), (14, 0.029), (15, 0.098), (16, 0.076), (17, 0.092), (18, -0.293), (19, -0.05), (20, -0.036), (21, 0.029), (22, -0.009), (23, -0.016), (24, 0.052), (25, -0.0), (26, -0.031), (27, 0.059), (28, -0.032), (29, -0.027), (30, -0.109), (31, 0.049), (32, 0.004), (33, -0.055), (34, 0.038), (35, -0.028), (36, -0.074), (37, 0.082), (38, 0.018), (39, 0.064), (40, -0.021), (41, -0.019), (42, -0.001), (43, -0.052), (44, -0.022), (45, -0.013), (46, -0.058), (47, 0.11), (48, -0.026), (49, -0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.96050853 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
2 0.88313401 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Author: Michael Wick ; Sameer Singh ; Andrew McCallum
Abstract: Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks including difficulties scaling to large numbers of mentions and limited representational power. As these drawbacks become increasingly restrictive, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming urgent. In this paper we propose a novel discriminative hierarchical model that recursively partitions entities into trees of latent sub-entities. These trees succinctly summarize the mentions providing a highly compact, information-rich structure for reasoning about entities and coreference uncertainty at massive scales. We demonstrate that the hierarchical model is several orders of magnitude faster than pairwise, allowing us to perform coreference on six million author mentions in under four hours on a single CPU.
3 0.74636245 58 acl-2012-Coreference Semantics from Web Features
Author: Mohit Bansal ; Dan Klein
Abstract: To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploit short-distance cues to hypernymy, semantic compatibility, and semantic context, as well as general lexical co-occurrence. When added to a state-of-the-art coreference baseline, our Web features give significant gains on multiple datasets (ACE 2004 and ACE 2005) and metrics (MUC and B3), resulting in the best results reported to date for the end-to-end task of coreference resolution.
4 0.69888812 50 acl-2012-Collective Classification for Fine-grained Information Status
Author: Katja Markert ; Yufang Hou ; Michael Strube
Abstract: Previous work on classifying information status (Nissim, 2006; Rahman and Ng, 2011) is restricted to coarse-grained classification and focuses on conversational dialogue. We here introduce the task of classifying finegrained information status and work on written text. We add a fine-grained information status layer to the Wall Street Journal portion of the OntoNotes corpus. We claim that the information status of a mention depends not only on the mention itself but also on other mentions in the vicinity and solve the task by collectively classifying the information status ofall mentions. Our approach strongly outperforms reimplementations of previous work.
5 0.661264 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention de- tection and template filling tasks.
6 0.51892817 153 acl-2012-Named Entity Disambiguation in Streaming Data
7 0.50696564 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
8 0.50085181 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
9 0.47699735 124 acl-2012-Joint Inference of Named Entity Recognition and Normalization for Tweets
10 0.40847728 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
11 0.36427116 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
12 0.35415605 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
13 0.35395616 186 acl-2012-Structuring E-Commerce Inventory
14 0.33646283 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
15 0.33125758 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances
16 0.3276659 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
17 0.3176626 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
18 0.30359858 145 acl-2012-Modeling Sentences in the Latent Space
19 0.30226609 79 acl-2012-Efficient Tree-Based Topic Modeling
20 0.29970378 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
topicId topicWeight
[(7, 0.24), (25, 0.014), (26, 0.04), (28, 0.048), (30, 0.02), (37, 0.041), (39, 0.075), (57, 0.013), (59, 0.011), (74, 0.032), (82, 0.027), (84, 0.022), (85, 0.023), (90, 0.168), (92, 0.084), (94, 0.026), (99, 0.05)]
simIndex simValue paperId paperTitle
1 0.87877595 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
same-paper 2 0.81167209 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
Author: Dani Yogatama ; Yanchuan Sim ; Noah A. Smith
Abstract: We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, firstorder dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
3 0.66654277 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
Author: Dave Golland ; John DeNero ; Jakob Uszkoreit
Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.
4 0.66573209 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
Author: Fangtao Li ; Sinno Jialin Pan ; Ou Jin ; Qiang Yang ; Xiaoyan Zhu
Abstract: Extracting sentiment and topic lexicons is important for opinion mining. Previous works have showed that supervised learning methods are superior for this task. However, the performance of supervised methods highly relies on manually labeled training data. In this paper, we propose a domain adaptation framework for sentiment- and topic- lexicon co-extraction in a domain of interest where we do not require any labeled data, but have lots of labeled data in another related domain. The framework is twofold. In the first step, we generate a few high-confidence sentiment and topic seeds in the target domain. In the second step, we propose a novel Relational Adaptive bootstraPping (RAP) algorithm to expand the seeds in the target domain by exploiting the labeled source domain data and the relationships between topic and sentiment words. Experimental results show that our domain adaptation framework can extract precise lexicons in the target domain without any annotation.
5 0.66493291 167 acl-2012-QuickView: NLP-based Tweet Search
Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.
6 0.66173244 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
7 0.66146314 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
8 0.66122001 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
9 0.66038245 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
10 0.65931535 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
11 0.65786451 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
12 0.65780622 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
13 0.65591389 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
15 0.65491456 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
16 0.65475053 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
17 0.65379769 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
18 0.65330386 31 acl-2012-Authorship Attribution with Author-aware Topic Models
19 0.6531834 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
20 0.65252542 191 acl-2012-Temporally Anchored Relation Extraction