emnlp emnlp2011 emnlp2011-101 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum
Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
Reference: text
sentIndex sentText sentNum sentScore
1 1 Introduction Statistical topic models such as latent Dirichlet allocation (LDA) (Blei et al. [sent-12, score-0.559]
2 In our experience, however, the primary obstacle to acceptance of statistical topic models by users outside the machine learning community is the presence of poor-quality topics. [sent-14, score-0.609]
3 In general, users prefer models with larger numbers of topics because such models have greater resolution and are able to support finer-grained distinctions. [sent-16, score-0.379]
4 There is a strong relationship between the size of topics and the probability of topics being nonsensical as judged by domain experts: as the number of topics increases, the smallest topics (measured by the number of word tokens assigned to each topic) are almost always of poor quality. [sent-19, score-1.416]
5 The common practice of displaying only a small number of example topics hides the fact that as many as 10% of topics may be so bad that they cannot be shown without reducing users’ confidence. [sent-20, score-0.812]
6 The evaluation of statistical topic models has traditionally been dominated by either extrinsic methods (i. [sent-21, score-0.533]
7 , using the inferred topics to perform some external task such as information retrieval (Wei and Croft, 2006)) or quantitative intrinsic methods, such as computing the probability of held-out documents (Wallach et al. [sent-23, score-0.402]
8 Recent work has focused on the evaluation of topics as semantically coherent concepts. [sent-25, score-0.337]
9 (2010) showed that an automated evaluation metric based on word co-occurrence statistics gathered from Wikipedia could predict human evaluations of topic quality. [sent-29, score-0.639]
10 With little additional computational cost beyond that of LDA, this model exhibits significant gains in average topic coherence score. [sent-37, score-0.857]
11 Although the model does not result in a statistically significant reduction in the number of topics marked “bad”, the model consistently improves the topic coherence score of the ten lowest-scoring topics (i. [sent-38, score-1.557]
12 , results in bad topics that are “less bad” than those found using LDA) while retaining the ability to identify low-quality topics without human interaction. [sent-40, score-0.812]
13 The inference task in topic models is generally cast as inferring the document–topic proportions {θ1 , . [sent-46, score-0.533]
14 The multinomial topic distributions {φ_t}, t = 1, ..., T, are usually drawn from a shared symmetric Dirichlet prior with hyperparameter β, such that, conditioned on {φ_t} and the topic assignments {z_1^(d), z_2^(d), . [sent-53, score-1.124]
15 The resulting distribution over words for a topic t is then a function of the hyperparameter β and the number of words of each type assigned to that topic, N_{w|t}. [sent-58, score-0.622]
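For concreteness, the standard collapsed Dirichlet–multinomial (simple Pólya urn) form being referred to here can be written as follows; this is a textbook identity rather than a formula quoted from this paper, so treat the exact notation as an assumption:

\[ P(w \mid t, \beta) = \frac{N_{w|t} + \beta}{N_t + W\beta}, \qquad N_t = \sum_w N_{w|t}, \]

where W is the vocabulary size. The identity-schema special case discussed later in the paper reduces to exactly this form.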
16 In a later section, we will introduce a topic model [sent-64, score-0.533]
17 that substitutes a generalized Pólya urn model for the DCM/Pólya distribution, allowing a draw of word type w to increase the probability of seeing certain other word types. [sent-65, score-0.402]
18 For real-world data, documents W are observed, while the corresponding topic assignments Z are unobserved and may be inferred using either variational methods (Blei et al. [sent-66, score-0.599]
19 The goal of this study was to develop an annotated set of baseline topics, along with their salient characteristics, as a first step towards automatically identifying and inferring the kinds of topics desired by domain experts. [sent-71, score-0.337]
20 1 Expert-Driven Annotation Protocol In order to ensure that the topics selected for annotation were within the NINDS experts’ area of expertise, they selected 148 topics (out of 500), all associated with areas funded by NINDS. [sent-73, score-0.674]
21 Each topic (footnote 1: “All evaluated models will be released publicly.”) [sent-74, score-0.533]
22 The experts first categorized each topic as one of three types: “research”, “grant mechanisms and publication types” or “general”. [sent-77, score-0.622]
23 2 The quality of each topic (“good”, “intermediate”, or “bad”) was then evaluated using criteria specific to the type of topic. [sent-78, score-0.596]
24 In general, topics were only annotated as “good” if they contained words that could be grouped together as a single coherent concept. [sent-79, score-0.337]
25 Additionally, each “research” topic was only considered to be “good” if, in addition to representing a single coherent concept, the aggregate content of the set of documents with appreciable allocations to that topic clearly contained text referring to the concept inferred from the topic words. [sent-80, score-1.637]
26 For example, a topic whose top three words are “acids”, “fatty” and “nucleic” consists of two distinct concepts (i. [sent-82, score-0.533]
27 • Intruded: either two or more unrelated sets of related words, joined arbitrarily, or an otherwise good topic with a few “intruder” words. [sent-85, score-0.559]
28 • Unbalanced: the top words are all logically connected to each other, but the topic combines very general and specific terms (e. [sent-87, score-0.533]
29 Examples of a good general topic, a good research topic, and a chained research topic are in Table 1. [sent-92, score-0.7]
30 2 Annotation Results The experts annotated the topics independently and then aggregated their results. [sent-94, score-0.426]
31 Interestingly, no topics were ever considered “good” by one expert and “bad” by the other—when there was disagreement between the experts, one expert always believed the topic to be “intermediate. [sent-95, score-1.024]
32 Of the 148 topics selected for annotation, 90 were labeled as “good,” 21 as “intermediate,” and 37 as “bad. [sent-97, score-0.337]
33 ” Of the topics labeled as “bad” or “intermediate,” 23 were “chained,” 21 were “intruded,” 3 were “random,” and 15 were “unbalanced”. [sent-98, score-0.337]
34 We therefore explore the extent to which information already contained in the documents being modeled can be used to assess topic quality. [sent-101, score-0.571]
35 In this section we evaluate several methods for ranking the quality of topics and compare these rankings to human annotations. [sent-102, score-0.371]
36 For an application involving removing low-quality topics, we recommend using a weighted combination of metrics, with a threshold determined by users. [sent-104, score-0.371]
37 1 Topic Size As a simple baseline, we considered the extent to which topic “size” (as measured by the number of tokens assigned to each topic via Gibbs sampling) is a good metric for assessing topic quality. [sent-106, score-1.729]
38 Figure 1 (top) displays the topic size (number of tokens assigned to that topic) and expert annotations (“good”, “intermediate”, “bad”) for the 148 topics manually labeled by annotators as described above. [sent-107, score-1.025]
39 Top shows expert-rated topics ranked by topic size (AP 0. [sent-109, score-0.87]
40 79), bottom shows same topics ranked by coherence (AP 0. [sent-111, score-0.661]
41 Unfortunately, this observation conflicts with the goal of building highly specialized, domain-specific topic models with many high-quality, fine-grained topics—in such models the majority of topics will have relatively few tokens assigned to them. [sent-117, score-0.911]
42 2 Topic Coherence When displaying topics to users, each topic t is generally represented as a list of the M = 5, . [sent-119, score-0.87]
43 Although there has been previous work on automated generation of labels or headings for topics (Mei et al. [sent-123, score-0.38]
44 Labels may obscure or detract from fundamental problems with topic coherence, and better labels don’t make bad topics good. [sent-125, score-1.008]
45 The expert-driven annotation study described in section 3 suggests that three of the four types of poor-quality topics (“chained,” “intruded” and “random”) could be detected using a metric based on the co-occurrence of words within the documents being modeled. [sent-126, score-0.438]
46 This insight can be used to design a new metric for assessing topic quality. [sent-133, score-0.596]
47 , the number of documents containing one or more tokens of type v and at least one token of type v′), we define topic coherence as C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(t)}, v_l^{(t)}) + 1}{D(v_l^{(t)})}, (1) where V^{(t)} = (v_1^{(t)}, . . . , v_M^{(t)}) [sent-138, score-1.028]
48 , is a list of the M most probable words in topic t. [sent-141, score-0.57]
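A minimal sketch of how this coherence score can be computed from the modeled corpus alone, assuming documents are given as lists of tokens; the function and variable names are illustrative and not taken from the authors' released code:

```python
from collections import defaultdict
from math import log

def codocument_counts(documents):
    """Document frequency D(v) and co-document frequency D(v, v') over the corpus."""
    doc_freq = defaultdict(int)       # D(v): number of documents containing v
    co_doc_freq = defaultdict(int)    # D(v, v'): documents containing both v and v'
    for doc in documents:
        types = set(doc)
        for v in types:
            doc_freq[v] += 1
        for v in types:
            for u in types:
                if u != v:
                    co_doc_freq[(v, u)] += 1
    return doc_freq, co_doc_freq

def topic_coherence(top_words, doc_freq, co_doc_freq):
    """Equation (1): sum over ordered word pairs of log((D(v_m, v_l) + 1) / D(v_l))."""
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            # top_words come from the modeled corpus, so D(v_l) is assumed > 0
            score += log((co_doc_freq[(v_m, v_l)] + 1) / doc_freq[v_l])
    return score
```

Scores are negative, with values closer to zero indicating higher coherence, which matches the reading of Table 1 below.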
49 Figure 1 shows the association between the expert annotations and both topic size (top) and our coherence metric (bottom). [sent-143, score-0.997]
50 Treating “good” topics as positive and “intermediate” or “bad” topics as negative, we get average precision values of 0. [sent-145, score-0.674]
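The average-precision comparison can be reproduced along these lines, treating “good” topics as positives and ranking by either topic size or coherence (a sketch with illustrative names):

```python
def average_precision(scores, is_good):
    """Average precision of a topic ranking; 'good' topics are the positives."""
    ranked = sorted(zip(scores, is_good), key=lambda pair: pair[0], reverse=True)
    hits, ap_sum = 0, 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            ap_sum += hits / rank
    return ap_sum / hits if hits else 0.0

# e.g. compare average_precision(topic_sizes, labels)
#      against average_precision(coherence_scores, labels)
```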
51 We performed a logistic regression analysis on the binary variable “is this topic bad”. [sent-152, score-0.533]
52 Using topic size alone as a predictor gives AIC (a measure of model fit) 152. [sent-153, score-0.533]
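A sketch of one way to run this logistic-regression comparison and read off AIC values, assuming the statsmodels package; the paper does not say which toolkit was used, so this is illustrative rather than a reproduction:

```python
import numpy as np
import statsmodels.api as sm

def aic_for_predictor(predictor, is_bad):
    """Fit a logistic regression for the binary variable 'is this topic bad' and return its AIC."""
    X = sm.add_constant(np.asarray(predictor, dtype=float))   # add an intercept column
    result = sm.Logit(np.asarray(is_bad, dtype=float), X).fit(disp=0)
    return result.aic

# Compare, e.g., aic_for_predictor(topic_sizes, bad_labels)
# with aic_for_predictor(coherence_scores, bad_labels); lower AIC indicates better fit.
```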
53 We tried weighting the terms in equation 1 by their corresponding topic–word probabilities and by their position in the sorted list of the M most probable words for that topic, but we found that a uniform weighting better predicted topic quality. [sent-159, score-0.57]
54 Our topic coherence metric also exhibits good qualitative behavior: of the 20 best-scoring topics, 18 are labeled as “good,” one is “intermediate” (“unbalanced”), and one is “bad” (combining “cortex” and “fmri”, words that commonly co-occur, but are conceptually distinct). [sent-160, score-0.946]
55 Our coherence metric relies only upon word co-occurrence statistics gathered from the corpus being modeled, and does not depend on an external reference corpus. [sent-162, score-0.387]
56 We believe that one of the main contributions of our work is demonstrating that standard topic models do not fully utilize available co-occurrence information, and that a held-out reference corpus is therefore not required for purposes of topic evaluation. [sent-164, score-1.066]
57 In order to provide intuition for the behavior of our topic coherence metric, Table 1 shows three example topics and their topic coherence scores. [sent-172, score-2.051]
58 The last topic is one of the lowest-scoring topics. [sent-177, score-0.533]
59 It is very unlikely that a randomly chosen word will be semantically related to any of the original words in the topic, so if a topic is a high-quality representation of a semantically coherent concept, it should be easy for users to select the intruder word. [sent-188, score-0.749]
60 If the topic is not coherent, there may be words in the topic that are also not semantically related to any other word, thus causing users to select “correct” words instead of the real intruder. [sent-189, score-1.108]
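For readers unfamiliar with the word-intrusion test discussed here, the sketch below shows one common way such an item is constructed, following the general recipe of Chang et al. (2009); the specific choices (five topic words, one intruder drawn from another topic's prominent words) are illustrative assumptions:

```python
import random

def make_intrusion_item(topic_top_words, other_topic_top_words, rng=random):
    """Build a word-intrusion item: five top words from one topic plus one 'intruder'.

    The intruder is chosen to be prominent in a different topic but absent from this
    topic's top words, so a coherent topic should make the intruder easy to spot.
    """
    shown = list(topic_top_words[:5])
    candidates = [w for w in other_topic_top_words if w not in topic_top_words]
    intruder = rng.choice(candidates)
    item = shown + [intruder]
    rng.shuffle(item)           # annotators see the words in random order
    return item, intruder
```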
61 In the first two plots, the x-axis is one of our two automated quality metrics. Table 1: Example topics (good/general, good/research, chained/research) with different coherence scores (numbers closer to zero indicate higher coherence). [sent-192, score-0.738]
62 The chained topic combines words related to aging (indicated in plain text) and words describing blood and blood-related diseases (bold). [sent-193, score-0.686]
2 aging, lifespan, globin, age related, longevity, human, age, erythroid, sickle cell, beta globin, hb, senescence, adult, older, lcr. Table 2: Co-document frequency matrix for the top words in a low-quality topic (according to our coherence metric), shaded to highlight zeros. [sent-198, score-0.857]
64 The histograms below these plots show the number of topics with each level of annotator accuracy for good and bad topics. [sent-203, score-0.529]
65 For good topics (green circles), the annotators were generally able to detect the intruder word with high accuracy. [sent-204, score-0.54]
66 These results suggest that topics with low intruder detection accuracy tend to be bad, but some bad topics can have a high accuracy. [sent-206, score-0.615]
67 For example, spotting an intruder word in a chained topic can be easy. [sent-207, score-0.788]
68 The low-quality topic receptors, cannabinoid, cannabinoids, ligands, cannabis, endocannabinoid, cxcr4, [virus], receptor, sdf1, is a typical “chained” topic, with CXCR4 linked to cannabinoids only through receptors, and otherwise unrelated. [sent-208, score-0.563]
69 5 Generalized Pólya Urn Models Although the topic coherence metric defined above provides an accurate way of assessing topic quality, preventing poor-quality topics from occurring in the first place is preferable. [sent-212, score-1.824]
70 In this section, we describe a new topic model that incorporates the corpus-specific word co-occurrence information used in our coherence metric directly into the statistical topic modeling framework. [sent-214, score-1.453]
71 It is important to note that simply disallowing words that never co-occur from being assigned to the same topic is not sufficient. [sent-215, score-0.533]
72 It is rather the degree to which the most prominent words in a topic do not co-occur with the other most prominent words in that topic that is an indicator of topic incoherence. [sent-217, score-1.599]
73 This new topic model retains the document–topic component of standard LDA, but replaces the usual Pólya urn topic–word component with a generalized Pólya urn framework (Mahmoud, 2008). [sent-220, score-1.621]
74 This process represents the marginal distribution of a hierarchical model with a Dirichlet prior and a multinomial likelihood, and is used as the distribution over words for each topic in almost all previous topic models. [sent-227, score-1.15]
75 In a generalized Pólya urn model, having drawn a ball of color w, A_vw additional balls of each color v ∈ {1, . [sent-228, score-0.544]
76 The conditional posterior probability of word w in topic t implied by this generalized model is P(w \mid t, \mathbf{W}, \mathbf{Z}, \beta, A) = \frac{\sum_v N_{v|t} A_{vw} + \beta}{N_t + W\beta}, (2) where A is a W × W real-valued matrix, known as the addition matrix or schema. [sent-235, score-0.643]
77 The simple Pólya urn model (and hence the conditional posterior probability of word w in topic t under LDA) can be recovered by setting the schema A to the identity matrix. [sent-236, score-0.844]
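A sketch of this conditional probability; the numerator follows equation 2 as written above, while the normalization in the denominator is an assumption chosen so that setting A to the identity recovers the standard LDA predictive probability, as the text requires:

```python
import numpy as np

def generalized_urn_word_prob(w, counts_t, A, beta):
    """P(w | t) under a generalized Polya urn with schema A (cf. equation 2; normalization assumed).

    counts_t : length-V array of token counts N_{v|t} for topic t
    A        : V x V schema ("addition") matrix
    beta     : smoothing hyperparameter
    """
    V = counts_t.shape[0]
    numerator = counts_t @ A[:, w] + beta        # sum_v N_{v|t} A_{vw} + beta
    denominator = counts_t.sum() + V * beta      # N_t + W beta (assumed normalization)
    return numerator / denominator

# With A = np.eye(V) this reduces to (N_{w|t} + beta) / (N_t + W beta), the usual LDA form.
counts = np.array([3.0, 0.0, 1.0])
print(generalized_urn_word_prob(0, counts, np.eye(3), 0.1))
```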
78 Unlike the simple Pólya distribution, we do not know of a representation of the generalized Pólya urn distribution that can be expressed using a concise set of conditional independence assumptions. [sent-237, score-0.349]
79 Another property of the generalized Pólya urn model is that it is nonexchangeable—the joint probability of the tokens in any given topic is not invariant to permutation of those tokens. [sent-246, score-0.92]
80 Gibbs sampling involves repeatedly cycling through the tokens in W and, for each one, resampling its topic assignment conditioned on W and the current topic assignments for all tokens other than the token of interest. [sent-248, score-1.128]
81 For LDA, the sampling distribution for each topic assignment is simply the product of two predictive probabilities, obtained by treating the token of interest as if it were the last. [sent-249, score-0.65]
82 For a topic model with a generalized Pólya urn for the topic–word component, the sampling distribution is more complicated. [sent-250, score-0.533]
83 Specifically, the topic–word component of the sampling distribution is no longer a simple predictive distribution—when sampling a new value for z_i, the implication of each possible value for subsequent tokens and their topic assignments must be considered. [sent-251, score-0.738]
84 The first is to use sequential Monte Carlo methods, which have been successfully applied to topic models previously (Canini et al. [sent-254, score-0.533]
85 The second approach is to approximate the true Gibbs sampling distribution by treating each token as if it were the last, ignoring implications for subsequent tokens and their topic assignments. [sent-256, score-0.691]
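A sketch of one sweep under this second, approximate strategy; the bookkeeping here (raw token counts, with the schema applied only when scoring) is an assumed simplification rather than the authors' exact implementation:

```python
def approximate_gibbs_sweep(words, docs, z, N_wt, N_dt, N_t, A, alpha, beta, rng):
    """One approximate collapsed-Gibbs sweep: each token is treated as if it were the last.

    words, docs, z : parallel integer sequences of word ids, document ids, topic assignments
    N_wt (V x T), N_dt (D x T), N_t (T,) : numpy count arrays kept in sync with z
    A : V x V schema matrix; the identity matrix gives the standard LDA sampler
    rng : a numpy Generator, e.g. numpy.random.default_rng(0)
    """
    V, T = N_wt.shape
    for i in range(len(words)):
        w, d, t_old = words[i], docs[i], z[i]
        # remove the token's current assignment from the counts
        N_wt[w, t_old] -= 1; N_dt[d, t_old] -= 1; N_t[t_old] -= 1
        # document-topic predictive times schema-weighted topic-word predictive
        topic_word = (A[:, w] @ N_wt + beta) / (N_t + V * beta)
        p = (N_dt[d] + alpha) * topic_word
        t_new = rng.choice(T, p=p / p.sum())
        z[i] = t_new
        N_wt[w, t_new] += 1; N_dt[d, t_new] += 1; N_t[t_new] += 1
    return z
```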
86 The top plots show topic coherence (averaged over 15 runs) over 1000 iterations of Gibbs sampling. [sent-270, score-0.885]
87 Two metrics—our new topic coherence metric and the log probability of held-out documents—are shown over 1000 iterations at 50 iteration intervals. [sent-301, score-0.947]
88 For each model we calculated an overall coherence score by calculating the topic coherence for each topic individually and then averaging these values. [sent-303, score-1.714]
89 The generalized Pólya model performs very well in average topic coherence, reaching levels within the first 50 iterations that match the final score. [sent-307, score-0.616]
90 In Section 4.2, we demonstrated that our topic coherence metric correlates with expert opinions of topic quality for standard LDA. [sent-314, score-1.564]
91 It is possible, however, that optimizing for coherence directly could break the association between the coherence metric and topic quality. [sent-316, score-1.244]
92 The topics were then presented to the experts from NINDS, with no indication as to the identity of the model from which each topic came. [sent-320, score-0.959]
93 As these evaluations are time-consuming, the experts evaluated only the first 200 topics, which consisted of 103 generalized Pólya urn topics and 97 LDA topics. [sent-321, score-0.745]
94 AUC values predicting bad topics given coherence were 0. [sent-322, score-0.799]
95 Although we were able to improve the average overall quality of topics and the average quality of the ten lowest-scoring topics, we found that the generalized Pólya urn model was less successful at reducing the overall number of bad topics. [sent-326, score-0.888]
96 6 Discussion We have demonstrated the following: • There is a class of low-quality topics that cannot be detected using existing word-intrusion tests, but that can be identified reliably using a metric based on word co-occurrence statistics. [sent-332, score-0.4]
97 • It is possible to improve the coherence score of topics, both overall and for the ten worst, while retaining the ability to flag bad topics, all without requiring semi-supervised data or additional reference corpora. [sent-333, score-0.462]
98 We believe that the most important challenges in future topic modeling research are improving the semantic quality of topics, particularly at the low end, and scaling to ever-larger data sets while ensuring high-quality topics. [sent-336, score-0.567]
99 We found that it should be possible to construct unsupervised topic models that do not produce bad topics. [sent-338, score-0.671]
100 Incorporating domain knowledge into topic modeling via Dirichlet forest priors. [sent-349, score-0.533]
wordName wordTfidf (topN-words)
[('topic', 0.533), ('topics', 0.337), ('lya', 0.332), ('coherence', 0.324), ('urn', 0.236), ('intruder', 0.14), ('bad', 0.138), ('lda', 0.119), ('nzi', 0.119), ('chained', 0.115), ('experts', 0.089), ('nih', 0.089), ('generalized', 0.083), ('zi', 0.083), ('expert', 0.077), ('ball', 0.075), ('ninds', 0.074), ('metric', 0.063), ('color', 0.06), ('acids', 0.059), ('hsrew', 0.059), ('institutes', 0.059), ('intruded', 0.059), ('lpoh', 0.059), ('nwi', 0.059), ('oherncc', 0.059), ('dirichlet', 0.058), ('auc', 0.058), ('gibbs', 0.057), ('aic', 0.054), ('sampling', 0.053), ('di', 0.053), ('intermediate', 0.051), ('globin', 0.051), ('intrusion', 0.051), ('zn', 0.051), ('schema', 0.048), ('unbalanced', 0.046), ('newman', 0.046), ('alsumait', 0.044), ('nucleic', 0.044), ('sweep', 0.044), ('amherst', 0.043), ('nv', 0.043), ('automated', 0.043), ('users', 0.042), ('tokens', 0.041), ('pmi', 0.039), ('aging', 0.038), ('documents', 0.038), ('wallach', 0.038), ('annotators', 0.037), ('probable', 0.037), ('geman', 0.035), ('quality', 0.034), ('token', 0.034), ('collapsed', 0.031), ('idf', 0.03), ('hyperparameter', 0.03), ('document', 0.03), ('distribution', 0.03), ('andrzejewski', 0.03), ('avw', 0.03), ('avwi', 0.03), ('balls', 0.03), ('canini', 0.03), ('cannabinoids', 0.03), ('dna', 0.03), ('erythroid', 0.03), ('lifespan', 0.03), ('mahmoud', 0.03), ('nntw', 0.03), ('nwiw', 0.03), ('olya', 0.03), ('receptors', 0.03), ('requncyf', 0.03), ('threefold', 0.03), ('vacuous', 0.03), ('virus', 0.03), ('type', 0.029), ('assignments', 0.028), ('plots', 0.028), ('seeing', 0.027), ('health', 0.027), ('probability', 0.027), ('latent', 0.026), ('blei', 0.026), ('drawing', 0.026), ('ten', 0.026), ('good', 0.026), ('vm', 0.026), ('neurons', 0.026), ('doyle', 0.026), ('mcmc', 0.026), ('protocol', 0.026), ('disorders', 0.026), ('fatty', 0.026), ('count', 0.025), ('chang', 0.025), ('multinomial', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999791 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum
Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
2 0.36921009 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.
3 0.35580391 21 emnlp-2011-Bayesian Checking for Topic Models
Author: David Mimno ; David Blei
Abstract: Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.
4 0.1872372 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
Author: Matthias Hartung ; Anette Frank
Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.
5 0.13406117 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
Author: Johannes Hoffart ; Mohamed Amir Yosef ; Ilaria Bordino ; Hagen Furstenau ; Manfred Pinkal ; Marc Spaniol ; Bilyana Taneva ; Stefan Thater ; Gerhard Weikum
Abstract: Disambiguating named entities in naturallanguage text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
6 0.11661305 114 emnlp-2011-Relation Extraction with Relation Topics
7 0.096973576 107 emnlp-2011-Probabilistic models of similarity in syntactic context
8 0.082946442 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
9 0.081230864 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
10 0.077035069 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
11 0.075241096 130 emnlp-2011-Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization
12 0.063486822 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
13 0.058513843 128 emnlp-2011-Structured Relation Discovery using Generative Models
14 0.055234786 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
15 0.050135925 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion
16 0.04954005 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts
17 0.046811931 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
18 0.046527576 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
19 0.044824108 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
20 0.044637293 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
topicId topicWeight
[(0, 0.175), (1, -0.194), (2, -0.201), (3, -0.301), (4, -0.097), (5, 0.448), (6, 0.153), (7, -0.019), (8, 0.06), (9, -0.107), (10, -0.009), (11, 0.091), (12, -0.022), (13, 0.014), (14, 0.061), (15, -0.135), (16, 0.012), (17, -0.158), (18, 0.02), (19, -0.112), (20, 0.018), (21, 0.087), (22, -0.03), (23, 0.052), (24, -0.008), (25, 0.032), (26, -0.143), (27, -0.003), (28, 0.024), (29, -0.044), (30, 0.037), (31, -0.048), (32, 0.044), (33, -0.017), (34, -0.041), (35, -0.021), (36, -0.078), (37, 0.055), (38, 0.013), (39, -0.016), (40, -0.009), (41, 0.014), (42, -0.016), (43, 0.081), (44, 0.09), (45, 0.005), (46, -0.025), (47, -0.058), (48, 0.01), (49, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.99048418 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum
Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
2 0.96947169 21 emnlp-2011-Bayesian Checking for Topic Models
Author: David Mimno ; David Blei
Abstract: Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.
3 0.88053453 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.
4 0.55438977 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
Author: Matthias Hartung ; Anette Frank
Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.
5 0.35169813 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
Author: Joseph Reisinger ; Raymond Mooney
Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirich- let Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.
6 0.31934643 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
7 0.31526461 114 emnlp-2011-Relation Extraction with Relation Topics
8 0.27064502 107 emnlp-2011-Probabilistic models of similarity in syntactic context
10 0.23567922 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
11 0.2350366 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
12 0.21901985 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification
13 0.21512131 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association
14 0.21326174 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
15 0.21056445 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
16 0.20558515 106 emnlp-2011-Predicting a Scientific Communitys Response to an Article
17 0.2032464 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
18 0.19630201 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
19 0.17347017 133 emnlp-2011-The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources
20 0.17131662 128 emnlp-2011-Structured Relation Discovery using Generative Models
topicId topicWeight
[(15, 0.019), (23, 0.063), (36, 0.019), (37, 0.028), (45, 0.173), (53, 0.018), (54, 0.023), (57, 0.015), (62, 0.017), (64, 0.015), (66, 0.026), (69, 0.012), (79, 0.029), (82, 0.021), (90, 0.01), (96, 0.412), (97, 0.011), (98, 0.017)]
simIndex simValue paperId paperTitle
1 0.9140116 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
2 0.88661605 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we present a method for unsupervised semantic role induction which we formalize as a graph partitioning problem. Argument instances of a verb are represented as vertices in a graph whose edge weights quantify their role-semantic similarity. Graph partitioning is realized with an algorithm that iteratively assigns vertices to clusters based on the cluster assignments of neighboring vertices. Our method is algorithmically and conceptually simple, especially with respect to how problem-specific knowledge is incorporated into the model. Experimental results on the CoNLL 2008 benchmark dataset demonstrate that our model is competitive with other unsupervised approaches in terms of F1 whilst attaining significantly higher cluster purity.
same-paper 3 0.87804013 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum
Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
4 0.61020333 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
Author: Emily M. Bender ; Dan Flickinger ; Stephan Oepen ; Yi Zhang
Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-theart parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
5 0.597543 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
Author: Johannes Hoffart ; Mohamed Amir Yosef ; Ilaria Bordino ; Hagen Furstenau ; Manfred Pinkal ; Marc Spaniol ; Bilyana Taneva ; Stefan Thater ; Gerhard Weikum
Abstract: Disambiguating named entities in naturallanguage text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
6 0.59387761 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms
7 0.59228367 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
8 0.58148652 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs
9 0.56023997 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
10 0.55503213 128 emnlp-2011-Structured Relation Discovery using Generative Models
11 0.55078733 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
12 0.54938787 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
13 0.54244471 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures
14 0.54044336 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
15 0.53765947 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
16 0.53456128 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
17 0.5184201 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
18 0.51375967 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions
19 0.51269382 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
20 0.51101989 107 emnlp-2011-Probabilistic models of similarity in syntactic context