emnlp emnlp2012 emnlp2012-91 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sameer Singh ; Michael Wick ; Andrew McCallum
Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. [sent-3, score-0.14]
2 Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. [sent-4, score-0.185]
3 In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. [sent-5, score-0.164]
4 In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference. [sent-7, score-0.314]
5 Previously, most applications of graphical models were limited to structures where exact inference is possible, for example linear-chain CRFs (Lafferty et al., 2001). [sent-9, score-0.202]
6 , 2010), higher-order models for dependency parsing (Carreras, 2007), entity-wise models for coreference (Culotta et al., 2007). [sent-16, score-0.134]
7 The increasing sophistication of these individual NLP components, compounded by the community’s desire to model these tasks jointly and across documents, has resulted in graphical models for which inference is computationally intractable. [sent-20, score-0.202]
8 Even popular approximate inference techniques such as loopy belief propagation and Markov chain Monte Carlo (MCMC) may be prohibitively slow. [sent-21, score-0.252]
9 MCMC algorithms such as Metropolis-Hastings are usually efficient for graphical models because the only factors needed to score a proposal are those touching the changed variables. [sent-22, score-0.525]
10 For example, the seemingly innocuous proposal of changing the entity type of a single entity requires examining all its mentions, i.e., [sent-24, score-0.293]
11 scoring a linear number of factors (in the number of mentions of that entity). [sent-26, score-0.456]
12 Similarly, evaluating coreference of a mention to an entity also requires scoring factors between the mention and all the mentions of the entity. [sent-27, score-0.675]
13 In this paper we propose an approximate MCMC framework that facilitates efficient inference in high-degree graphical models. [sent-31, score-0.256]
14 In particular, we approximate the acceptance ratio in the Metropolis-Hastings algorithm by replacing the exact model score with a stochastic approximation that samples from the set of relevant factors. [sent-32, score-0.308]
15 We explore two sampling strategies, a fixed proportion approach that samples the factors uniformly, and a dynamic alternative that samples factors until the method is confident about its estimate of the model score. [sent-33, score-1.017]
16 On synthetic classification data, our approximate MCMC procedure obtains the true marginals faster than a traditional MCMC sampler. [sent-35, score-0.232]
17 On real-world tasks, our method achieves a 7 times speedup on citation matching, and a 13 times speedup on large-scale author disambiguation. [sent-36, score-0.336]
18 Factor graphs (Kschischang et al., 2001) succinctly represent the joint distribution over random variables by a product of factors that make the dependencies between the random variables explicit. [sent-39, score-0.58]
19 A factor graph is a bipartite graph between the variables and factors, where each (log) factor f ∈ F is a function that maps an assignment of its neighboring variables to a real number. [sent-40, score-0.376]
20 For example, in a linear-chain model of part-of-speech tagging, transition factors score compatibilities between consecutive labels, while emission factors score compatibilities between a label and its observed token. [sent-41, score-0.662]
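To make this concrete, here is a minimal Python sketch (ours, not the authors' code) of scoring a linear-chain assignment as a sum of transition and emission log-factors; the tag set and factor tables are hypothetical toy values.

    # Toy (log) factor tables for a linear-chain POS model; values are illustrative.
    TRANSITION = {("DT", "NN"): 2.0, ("NN", "VB"): 1.5}
    EMISSION = {("DT", "the"): 1.8, ("NN", "dog"): 2.2, ("VB", "runs"): 1.9}

    def log_score(tags, tokens):
        # Sum the (log) factors touching this assignment: one emission factor
        # per position, one transition factor per adjacent pair of labels.
        score = sum(EMISSION.get((t, w), 0.0) for t, w in zip(tags, tokens))
        score += sum(TRANSITION.get(pair, 0.0) for pair in zip(tags, tags[1:]))
        return score

    print(log_score(["DT", "NN", "VB"], ["the", "dog", "runs"]))  # prints 9.4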
21 The two common inference problems for graphical models in NLP are maximum a posteriori (MAP) and marginal inference. [sent-43, score-0.293]
22 2 Markov chain Monte Carlo (MCMC) Often, computing marginal estimates of a model is computationally intractable due to the normalization constant Z, while MAP inference is prohibitive due to the size of the search space of possible configurations. [sent-46, score-0.133]
23 Markov chain Monte Carlo (MCMC) is an important tool for performing sample- and search-based inference in these models. [sent-47, score-0.133]
24 A particularly successful MCMC method for graphical model inference is Metropolis-Hastings (MH). [sent-48, score-0.202]
25 Since sampling from the true model p(y) is intractable, MH instead uses a simpler distribution q(y0|y) that conditions on a current state y and proposes a new state y0 by modifying a few variables. [sent-49, score-0.164]
26 Computing this acceptance probability (Equation 5) is often highly efficient because the partition function cancels, as do all the factors in the model that do not neighbor the modified variables. [sent-52, score-0.369]
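Equation 5 itself did not survive extraction; the following is a hedged LaTeX reconstruction of the standard Metropolis-Hastings acceptance probability it refers to (writing the proposed state y0 as y'):

    \alpha(y \to y') = \min\left(1, \frac{p(y')\, q(y \mid y')}{p(y)\, q(y' \mid y)}\right)
                     = \min\left(1, \frac{q(y \mid y')}{q(y' \mid y)} \exp\Big(\sum_{f \in F(y,y')} \big(f(y'_f) - f(y_f)\big)\Big)\right)

The normalization constant Z cancels in the ratio p(y')/p(y), and every factor that does not neighbor a modified variable contributes equally to both states, leaving only the relevant factors F(y, y').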
27 1 Marginal Inference To compute marginals with MH, the variables are initialized to an arbitrary assignment (i.e., randomly or with some heuristic). [sent-56, score-0.222]
28 Sampling is then run until the samples {yi | i = 0, · · · , n} become independent of the initial assignment. [sent-58, score-0.283]
29 The ergodic theorem provides the MCMC analog to the law of large numbers, justifying the use of the generated samples to compute the desired statistics (such as feature expectations or variable marginals). [sent-59, score-0.149]
30 3 Monte Carlo MCMC In this section we introduce our approach for approximating the acceptance ratio of Metropolis-Hastings by sampling the factors, and describe two sampling strategies. [sent-68, score-0.361]
31 In particular, evaluation of each sample requires computing the score of all the factors that are involved in the change, i.e., [sent-71, score-0.35]
32 all factors that neighbor any variable in the set that has changed. [sent-73, score-0.321]
33 In particular, if the set of factors for a given proposal y → y0 is F(y, y0), we use a sampled subset of the factors S ⊆ F(y, y0) as an approximation of the model score. [sent-76, score-0.414]
34 Formally, ψ(y) = Σf∈F f(yf) = |F| · EF[f(yf)], and ψS(y) = |F| · ES[f(yf)] (7). We use the sample log-score (ψS) in the acceptance probability α to evaluate the samples. [sent-78, score-0.137]
35 Since we are using a stochastic approximation to the model score, in general we need to take more MCMC samples before we converge; however, since evaluating each sample is much faster (O(|S|) as opposed to O(|F|)), we expect overall sampling to be faster. [sent-79, score-0.399]
36 The primary restriction on the set of samples S is that their mean should be an unbiased estimator of EF[f]. [sent-82, score-0.119]
37 Further, the time taken to obtain the set of samples should be negligible compared to scoring all the factors in F. [sent-83, score-0.41]
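A minimal Python sketch of one sampler step under the fixed-proportion strategy (the interfaces and names are ours, not the paper's; propose and relevant_factors are assumed callbacks):

    import math, random

    def stochastic_mh_step(y, propose, relevant_factors, p=0.1):
        # propose(y) -> (y_new, log_q_ratio), with log_q_ratio = log q(y|y') - log q(y'|y)
        # relevant_factors(y, y_new) -> list of callables f, with f(state) -> float
        y_new, log_q_ratio = propose(y)
        factors = relevant_factors(y, y_new)
        if factors:
            subset = random.sample(factors, max(1, int(p * len(factors))))
            # Scale the subsample mean back up to |F| (cf. Equation 7): an
            # unbiased estimate of the change in model log-score.
            delta = len(factors) * sum(f(y_new) - f(y) for f in subset) / len(subset)
        else:
            delta = 0.0
        if math.log(random.random()) < min(0.0, delta + log_q_ratio):
            return y_new
        return y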
38 3 Confidence-Based Sampling Selecting the best value for p is difficult, requiring analysis of the graph structure and statistics on the distribution of the factor scores, which is often impractical in real-world applications. [sent-93, score-0.291]
39 Instead of sampling a fixed proportion of factors, we can sample until we are confident that the current set of samples Sc is an accurate estimate of the true mean of F. [sent-96, score-0.375]
40 In particular, we maintain a running estimate of the sample mean ESc[f] and variance σSc, using them to compute a confidence interval IS around our estimate of the mean. [sent-97, score-0.146]
41 Since the number of sampled factors S could be a substantial fraction of the set of factors F, we also incorporate the finite population correction (fpc) in our sample variance computation. [sent-98, score-0.672]
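As a hedged sketch, with n = |S| and N = |F|, the fpc-adjusted standard error such a computation would use is the standard one (the paper's exact interval construction may differ):

    SE(\bar{f}_S) = \sqrt{\frac{\sigma_S^2}{n} \cdot \frac{N - n}{N - 1}}, \qquad I_S = 2z \cdot SE(\bar{f}_S)

where z is the critical value for the desired confidence level.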
42 This approach starts with an empty set of samples, S = {}, and iteratively samples factors without replacement to add to S, until the confidence interval around the estimated mean falls below a user-specified maximum interval width threshold i. [sent-103, score-0.522]
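A minimal Python sketch of this loop, using Welford-style running moments and the fpc-adjusted interval above (max_width stands in for the threshold i, and z = 2 is an illustrative choice, not the paper's):

    import math, random

    def confident_mean(factor_values, max_width, z=2.0):
        # Sample values without replacement until the confidence interval
        # around the running mean is narrower than max_width.
        pool = list(factor_values)
        random.shuffle(pool)
        N, n, mean, m2 = len(pool), 0, 0.0, 0.0
        for x in pool:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)        # Welford variance update
            if n >= 2:
                var = m2 / (n - 1)
                fpc = (N - n) / (N - 1)     # finite population correction
                width = 2 * z * math.sqrt(var * fpc / n)
                if width < max_width:
                    break
        return mean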
43 As a result, for proposals that contain high-variance factors, this strategy examines a large number of factors, while proposals that involve similar factors will result in fewer samples. [sent-104, score-0.437]
44 To evaluate the type at the entity-level, the scored factors examine features of all the entity mentions of the entity, along with the labels of all relation mentions for which it is an argument. [sent-110, score-0.747]
45 Since a subset of the mentions can be sufficiently informative for the model, we expect our stochastic MCMC approach to work well. [sent-114, score-0.222]
46 [Figure 1: Synthetic Model for Classification (n = 100). Figure 2: Marginal Inference Error for Classification on Synthetic Data, plotted against the Number of Factors Examined.] We use synthetic data for such a model to evaluate the quality of marginals returned by the Gibbs sampling form of MCMC. [sent-122, score-0.342]
47 Since the Gibbs algorithm samples each variable using a fixed assignment of its neighborhood, we represent generating a single sample as classification. [sent-123, score-0.208]
48 We generate a synthetic dataset for this model, creating 100 variables, each with 100 factors. [sent-125, score-0.501]
49 The scores of the factors are generated from gaussians, N(0. [sent-126, score-0.291]
50 Although each structure contains only a single variable, and no cycles, it is a valid benchmark for testing our sampling approach, since the effects of the burn-in period and of sample thinning are not a concern. [sent-130, score-0.283]
51 We perform standard Gibbs sampling, and compare the marginals obtained during sampling with the true marginals, computed exactly. [sent-131, score-0.259]
52 We evaluate the previously described uniform sampling and confidence-based sampling, with several parameter values, and plot the L1 error to the true marginals as more factors are examined. [sent-132, score-0.627]
53 Note that here, and in the rest of the evaluation, we use the number of factors scored as a proxy for running time, since the cost of the remaining steps of sampling is relatively negligible. [sent-133, score-0.496]
54 Initially, as the sampling approach is made more stochastic (lowering p or increasing i), we see a steady improvement in the running time needed to obtain the same error tolerance. [sent-135, score-0.221]
55 2 Entity Resolution in Citation Data To evaluate our approach on a real world dataset, we apply stochastic MCMC for MAP inference on the task of citation matching. [sent-140, score-0.249]
56 The citation matching problem is an instance of entity resolution, in which observed mentions need to be partitioned such that mentions in a set refer to the same underlying entity. [sent-142, score-0.516]
57 In this paper, the graphical model of entity resolution consists of observed mentions (mi), and pairwise binary variables between all pairs of mentions (yij) which represent whether the corresponding observed mentions are coreferent. [sent-144, score-0.873]
58 There is a local factor for each coreference variable yij that has a high score if the underlying mentions mi and mj are similar. [sent-145, score-0.459]
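Under this formulation, a hedged sketch of the (log) model score of a configuration y, in our notation rather than necessarily the paper's, is a sum of the local factors over the coreferent pairs:

    \psi(y) = \sum_{i < j} y_{ij}\, f(m_i, m_j)

so only pairs with yij = 1 contribute, which is why only those variables and factors need to be instantiated.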
59 For the sake of efficiency, we only instantiate and incorporate the variables and factors when the variable is true (i.e., yij = 1). [sent-146, score-0.448]
60 The set of possible worlds consists of all settings of the y variables that are consistent with transitivity, i.e., [sent-150, score-0.127]
61 the binary variables directly represent a valid clustering over the mentions. [sent-152, score-0.127]
62 An example of the model defined over 5 mentions is shown in Figure 3 (Graphical Model for Entity Resolution), with the setting of the variables resulting in 2 entities. [sent-153, score-0.127]
63 For the sake of brevity, we’ve only included variables set to 1; binary variables between mentions that are not coreferent have been omitted. [sent-154, score-0.419]
64 As opposed to belief propagation and other approximate inference techniques, MCMC is especially appropriate for the task as it can directly enforce transitivity. [sent-157, score-0.21]
65 When performing MCMC, each sample is a setting of all the y variables that is consistent with transitivity. [sent-158, score-0.186]
66 To maintain transitivity during sampling, Metropolis-Hastings is used to change the binary variables in a way that is consistent with moving individual mentions. [sent-159, score-0.165]
67 Our proposal function selects a random mention, and moves it to a random entity, changing all the pairwise variables with mentions in its old entity, and the pairwise variables with mentions in its new entity. [sent-160, score-0.707]
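A minimal Python sketch of this proposal (the entity bookkeeping and names are ours; sim stands in for the pairwise factor score, and the paper's option of moving a mention into a brand-new singleton entity is omitted):

    import random

    def move_mention_proposal(entities, sim):
        # entities: list of sets of mentions. Pick a random mention and move it
        # to a random other entity; the factors touched are the pairwise
        # variables with the old and new entities' mentions, so scoring the
        # move is linear in the two entity sizes.
        src = random.choice([e for e in entities if e])
        mention = random.choice(list(src))
        dst = random.choice([e for e in entities if e is not src])
        removed = sum(sim(mention, m) for m in src if m != mention)
        added = sum(sim(mention, m) for m in dst)
        return mention, src, dst, added - removed   # change in model log-score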
68 Thus, evaluation of such a proposal function requires scoring a number of factors linear in the size of the entities, which, for large datasets, can be a significant bottleneck. [sent-161, score-0.414]
69 In practice, however, this set of factors is often highly redundant, as many of the mentions that refer to the same entity contain redundant information and features, and entity membership may be efficiently determined by observing a subset of its mentions. [sent-162, score-0.657]
70 [Figure 4: Citation Resolution Accuracy Plot for uniform and variance-based sampling compared to regular MCMC (p = 1), plotted against the Number of Factors Examined.] The dataset consists of 1295 mentions that refer to 134 true underlying entities. [sent-167, score-0.248]
71 We run MCMC on the entity resolution model using the proposal function described above, running our approach with different parameter values. [sent-171, score-0.263]
72 As inference progresses, we compute BCubed (B3) F1 of the current sample, and plot it against the number of scored factors in Figure 4. [sent-173, score-0.455]
73 We observe consistent speed improvements as stochasticity is increased, with uniform sampling and confidence-based sampling performing competitively. [sent-174, score-0.413]
74 To compute the speedup, we measure the number of factors scored to obtain a desired level of accuracy (90% F1), shown for a diverse set of parameters in Table 1. [sent-175, score-0.332]
75 With a very large confidence interval threshold (i = 20) and a small proportion (p = 0.1), we obtain up to a 7 times speedup over regular MCMC. [sent-176, score-0.12] [sent-177, score-0.133]
77 Since the average entity size in this data set is < 10, using a small proportion (and a wide interval) is equivalent to picking a single mention to compare against. [sent-178, score-0.118]
78 B3 is a coreference evaluation metric, introduced by Bagga and Baldwin (1998). [Table 1: Speedups on Cora to obtain 90% B3 F1.] [sent-179, score-0.134]
79 Author coreference (e.g., McCallum and Wellner (2004)) is difficult because the number of factors in the model grows quadratically with the number of mentions (research papers) and the number of factors evaluated for every MCMC proposal scales linearly in the size of the clusters. [sent-183, score-0.87]
80 For author coreference, the number of author mentions and the number of references to an author entity can often be in the millions, making the evaluation of the MCMC proposals computationally expensive. [sent-184, score-0.464]
81 For evaluation of accuracy, we also include author mentions from the Rexa corpus, which contains 2,833 mentions. [sent-186, score-0.377]
82 [Figure 5: Performance of Different Sampling Strategies and Parameters for coreference over 5 million mentions; (a) Accuracy versus Number of Factors scored, (b) Accuracy versus Number of Samples.] [sent-192, score-0.175]
83 Plots with p refer to uniform sampling with proportion p of the factors picked, while plots with i sample until confidence intervals are narrower than i. [sent-193, score-0.623]
84 As before, we initialize to the singleton configuration and run the experiments for a fixed number of samples, plotting accuracy versus the number of factors evaluated (Figure 5a) as well as accuracy versus the number of samples generated (Figure 5b). [sent-196, score-0.41]
85 As expected, when we compare the performance using the number of generated samples, the approximate MCMC chains appear to converge more slowly; however, the overall convergence for our approach is substantially faster because evaluation of each sample is significantly cheaper. [sent-201, score-0.19]
86 5 Discussion and Related Work MCMC is a popular method for inference among researchers who work with large and dense graphical models (Richardson and Domingos, 2006; Poon and Domingos, 2006; Poon et al. [sent-204, score-0.234]
87 [Table 2: Speedups on DBLP to reach 80% B3 F1.] Some of the probabilistic programming packages popular among NLP practitioners also rely on MCMC for inference and learning (Richardson and Domingos, 2006; McCallum et al. [sent-208, score-0.163]
88 Although most of these methods apply MCMC directly, the rate of convergence of MCMC has become a concern as larger and more densely-factored models are being considered, motivating the need for more efficient sampling that uses parallelism (Singh et al. [sent-210, score-0.204]
89 There has also been recent work in designing scalable approximate inference techniques. [sent-215, score-0.145]
90 Recent work introduces dynamic schedules to prioritize among the factors (Coughlan and Shen, 2007; Sutton and McCallum, 2007), which have been used to visit only a small fraction of the factors (Riedel and Smith, 2010). [sent-218, score-0.674]
91 6 Conclusions and Future Work Motivated by the need for an efficient inference technique that can scale to large, densely-factored models, this paper considers a simple extension to the Markov chain Monte Carlo algorithm. [sent-226, score-0.133]
92 By observing that many graphical models contain substantial redundancy among the factors, we propose stochastic evaluation of proposals that subsamples the factors to be scored. [sent-227, score-0.563]
93 Using two proposed sampling strategies, we demonstrate improved convergence for marginal inference on synthetic data. [sent-228, score-0.469]
94 Further, we evaluate our approach on two real-world entity resolution datasets, obtaining a 13 times speedup on a dataset containing 5 million mentions. [sent-229, score-0.234]
95 Based on the ideas presented in the paper, we will consider additional sampling strategies. [sent-230, score-0.164]
96 In particular, we will explore dynamic sampling, in which we sample fewer factors during the initial burn-in phase, but sample more factors as we get close to convergence. [sent-231, score-0.7]
97 Motivated by our positive results, we will also study the application of this approach to other approximate inference techniques, such as belief propagation and variational inference. [sent-232, score-0.21]
98 Relaxed marginal inference and its application to dependency parsing. [sent-363, score-0.182]
99 Bi-directional joint inference for entity resolution and segmentation using imperatively-defined factor graphs. [sent-368, score-0.292]
100 1112 Large-scale cross-document coreference using distributed inference and hierarchical models. [sent-380, score-0.225]
wordName wordTfidf (topN-words)
[('mcmc', 0.578), ('factors', 0.291), ('wick', 0.176), ('mentions', 0.165), ('sampling', 0.164), ('coreference', 0.134), ('variables', 0.127), ('proposal', 0.123), ('samples', 0.119), ('singh', 0.118), ('poon', 0.112), ('graphical', 0.111), ('gonzalez', 0.104), ('mccallum', 0.102), ('citation', 0.101), ('marginals', 0.095), ('speedup', 0.094), ('inference', 0.091), ('marginal', 0.091), ('speedups', 0.086), ('entity', 0.085), ('synthetic', 0.083), ('domingos', 0.083), ('mh', 0.081), ('acceptance', 0.078), ('proposals', 0.073), ('yf', 0.069), ('yij', 0.069), ('andrew', 0.068), ('carlo', 0.068), ('sutton', 0.067), ('belief', 0.065), ('monte', 0.064), ('sameer', 0.064), ('culotta', 0.064), ('factor', 0.061), ('bertsimas', 0.06), ('christen', 0.06), ('coughlan', 0.06), ('hulten', 0.06), ('kschischang', 0.06), ('pasula', 0.06), ('schedules', 0.06), ('umas', 0.06), ('pedro', 0.06), ('sample', 0.059), ('map', 0.057), ('stochastic', 0.057), ('interval', 0.056), ('resolution', 0.055), ('approximate', 0.054), ('databases', 0.052), ('xf', 0.051), ('richardson', 0.051), ('murray', 0.047), ('leskovec', 0.047), ('amherst', 0.047), ('governor', 0.047), ('aron', 0.047), ('author', 0.047), ('uniform', 0.045), ('massachusetts', 0.045), ('hoffmann', 0.043), ('chain', 0.042), ('scored', 0.041), ('compatibilities', 0.04), ('dblp', 0.04), ('metropolis', 0.04), ('practitioners', 0.04), ('rexa', 0.04), ('stochasticity', 0.04), ('yucheng', 0.04), ('convergence', 0.04), ('regular', 0.039), ('kok', 0.038), ('drive', 0.038), ('transitivity', 0.038), ('converge', 0.037), ('succinctly', 0.035), ('hastings', 0.035), ('cora', 0.035), ('khashayar', 0.035), ('schultz', 0.035), ('samplerank', 0.035), ('hoifung', 0.033), ('bagga', 0.033), ('proportion', 0.033), ('plot', 0.032), ('amongst', 0.032), ('yao', 0.032), ('temperature', 0.031), ('rohanimanesh', 0.031), ('aistats', 0.031), ('redundant', 0.031), ('confidence', 0.031), ('substantial', 0.031), ('variable', 0.03), ('graphs', 0.03), ('riedel', 0.03), ('fields', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
Author: Sameer Singh ; Michael Wick ; Andrew McCallum
Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.
2 0.20225832 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
Author: Simon Carter ; Marc Dymetman ; Guillaume Bouchard
Abstract: We present a method for exact optimization and sampling from high order Hidden Markov Models (HMMs), which are generally handled by approximation techniques. Motivated by adaptive rejection sampling and heuristic search, we propose a strategy based on sequentially refining a lower-order language model that is an upper bound on the true model we wish to decode and sample from. This allows us to build tractable variable-order HMMs. The ARPA format for language models is extended to enable an efficient use of the max-backoff quantities required to compute the upper bound. We evaluate our approach on two problems: a SMS-retrieval task and a POS tagging experiment using 5-gram models. Results show that the same approach can be used for exact optimization and sampling, while explicitly constructing only a fraction of the total implicit state-space.
3 0.16463064 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky
Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.
4 0.13645248 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
Author: Jason Naradowsky ; Sebastian Riedel ; David Smith
Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.
5 0.13642964 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic
Author: Yang Song ; Jing Jiang ; Wayne Xin Zhao ; Sujian Li ; Houfeng Wang
Abstract: Pairwise coreference resolution models must merge pairwise coreference decisions to generate final outputs. Traditional merging methods adopt different strategies such as the best-first method and enforcing the transitivity constraint, but most of these methods are used independently of the pairwise learning methods as an isolated inference procedure at the end. We propose a joint learning model which combines pairwise classification and mention clustering with Markov logic. Experimental results show that our joint learning system outperforms independent learning systems. Our system gives a better performance than all the learning-based systems from the CoNLL-2011 shared task on the same dataset. Compared with the best system from CoNLL-2011, which employs a rule-based method, our system shows competitive performance.
6 0.125523 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge
7 0.10604998 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
8 0.10223636 72 emnlp-2012-Joint Inference for Event Timeline Construction
9 0.086456582 19 emnlp-2012-An Entity-Topic Model for Entity Linking
10 0.081394099 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
11 0.081130743 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
12 0.081090763 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
13 0.078932323 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
14 0.065277658 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
15 0.064987443 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
16 0.064252123 84 emnlp-2012-Linking Named Entities to Any Database
17 0.063577913 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
18 0.060287207 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
19 0.056140952 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
20 0.053607274 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
topicId topicWeight
[(0, 0.213), (1, 0.151), (2, 0.062), (3, -0.177), (4, -0.085), (5, -0.024), (6, -0.103), (7, -0.091), (8, 0.118), (9, 0.074), (10, 0.039), (11, 0.128), (12, 0.078), (13, -0.183), (14, -0.124), (15, 0.167), (16, -0.107), (17, 0.015), (18, -0.083), (19, 0.167), (20, -0.133), (21, 0.161), (22, -0.183), (23, 0.057), (24, 0.106), (25, 0.083), (26, 0.076), (27, -0.163), (28, 0.083), (29, -0.008), (30, -0.102), (31, 0.028), (32, -0.028), (33, 0.04), (34, -0.034), (35, -0.095), (36, -0.022), (37, -0.001), (38, -0.004), (39, -0.006), (40, -0.008), (41, 0.002), (42, 0.024), (43, -0.008), (44, 0.006), (45, 0.022), (46, 0.022), (47, 0.08), (48, -0.045), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.97426885 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
Author: Sameer Singh ; Michael Wick ; Andrew McCallum
Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.
2 0.7468009 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
Author: Simon Carter ; Marc Dymetman ; Guillaume Bouchard
Abstract: We present a method for exact optimization and sampling from high order Hidden Markov Models (HMMs), which are generally handled by approximation techniques. Motivated by adaptive rejection sampling and heuristic search, we propose a strategy based on sequentially refining a lower-order language model that is an upper bound on the true model we wish to decode and sample from. This allows us to build tractable variable-order HMMs. The ARPA format for language models is extended to enable an efficient use of the max-backoff quantities required to compute the upper bound. We evaluate our approach on two problems: a SMS-retrieval task and a POS tagging experiment using 5-gram models. Results show that the same approach can be used for exact optimization and sampling, while explicitly constructing only a fraction of the total implicit state-space.
3 0.50759101 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic
Author: Yang Song ; Jing Jiang ; Wayne Xin Zhao ; Sujian Li ; Houfeng Wang
Abstract: Pairwise coreference resolution models must merge pairwise coreference decisions to generate final outputs. Traditional merging methods adopt different strategies such as the best-first method and enforcing the transitivity constraint, but most of these methods are used independently of the pairwise learning methods as an isolated inference procedure at the end. We propose a joint learning model which combines pairwise classification and mention clustering with Markov logic. Experimental results show that our joint learning system outperforms independent learning systems. Our system gives a better performance than all the learning-based systems from the CoNLL-2011 shared task on the same dataset. Compared with the best system from CoNLL-2011, which employs a rule-based method, our system shows competitive performance.
4 0.49959147 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
Author: Jason Naradowsky ; Sebastian Riedel ; David Smith
Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.
5 0.47674006 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge
Author: Lev Ratinov ; Dan Roth
Abstract: We explore the interplay of knowledge and structure in co-reference resolution. To inject knowledge, we use a state-of-the-art system which cross-links (or “grounds”) expressions in free text to Wikipedia. We explore ways of using the resulting grounding to boost the performance of a state-of-the-art co-reference resolution system. To maximize the utility of the injected knowledge, we deploy a learningbased multi-sieve approach and develop novel entity-based features. Our end system outperforms the state-of-the-art baseline by 2 B3 F1 points on non-transcript portion of the ACE 2004 dataset.
6 0.44759133 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
7 0.42961219 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
8 0.38661656 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
9 0.36223686 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
10 0.31205601 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
11 0.30652496 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
12 0.28862977 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
13 0.28851104 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
14 0.2840704 72 emnlp-2012-Joint Inference for Event Timeline Construction
15 0.25422412 19 emnlp-2012-An Entity-Topic Model for Entity Linking
16 0.25227818 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
17 0.24996153 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
18 0.24839841 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
19 0.2446283 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
20 0.24020196 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
topicId topicWeight
[(2, 0.02), (14, 0.011), (16, 0.027), (25, 0.319), (34, 0.102), (39, 0.011), (45, 0.011), (60, 0.089), (63, 0.036), (65, 0.042), (68, 0.016), (70, 0.014), (73, 0.012), (74, 0.05), (76, 0.052), (80, 0.023), (81, 0.011), (86, 0.037), (95, 0.014)]
simIndex simValue paperId paperTitle
1 0.98266679 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
Author: Jason Naradowsky ; Sebastian Riedel ; David Smith
Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.
same-paper 2 0.89278585 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
Author: Sameer Singh ; Michael Wick ; Andrew McCallum
Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.
3 0.55711007 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
4 0.55426407 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
5 0.54283857 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
Author: Young-Bum Kim ; Benjamin Snyder
Abstract: We consider the problem of inducing grapheme-to-phoneme mappings for unknown languages written in a Latin alphabet. First, we collect a data-set of 107 languages with known grapheme-phoneme relationships, along with a short text in each language. We then cast our task in the framework of supervised learning, where each known language serves as a training example, and predictions are made on unknown languages. We induce an undirected graphical model that learns phonotactic regularities, thus relating textual patterns to plausible phonemic interpretations across the entire range of languages. Our model correctly predicts grapheme-phoneme pairs with over 88% F1-measure.
6 0.53989482 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
7 0.5394581 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
8 0.53083771 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
9 0.530375 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic
10 0.52299136 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
11 0.5193029 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
12 0.50906599 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
13 0.50878388 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
14 0.50706816 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
15 0.50430429 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
16 0.50329089 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
17 0.50258011 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
18 0.50063145 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
19 0.49989933 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
20 0.498849 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?