
119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Source: pdf

Author: Weiwei Guo ; Mona Diab

Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we propose a novel topic model based on incorporating dictionary definitions. [sent-3, score-0.505]

2 Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. [sent-4, score-0.503]

3 However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. [sent-6, score-0.491]

4 Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. [sent-7, score-0.364]

5 We exploit WordNet as a lexical resource for sense definitions. [sent-8, score-0.328]

6 We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task. [sent-9, score-0.431]

7 In LDA, there are two factors which determine the topic of a word: the topic distribution of the document, and the probability of a topic to emit this word. [sent-14, score-1.529]

8 If a word is not observed frequently enough in the corpus, then it is likely to be assigned the dominant topic in this document. [sent-17, score-0.534]

9 For example, the word grease (a thick fatty oil) in a political domain document should be assigned the topic chemicals. [sent-18, score-0.715]

10 LDA model will assign it the dominant document topic politics. [sent-21, score-0.537]

11 Therefore, in this paper, we test our hypothesis by exploring the integration of word semantics explicitly in the topic modeling framework. [sent-24, score-0.632]

12 In order to incorporate word semantics from dictionaries, we recognize the need to model sense-topic distribution rather than word-topic distribution, since dictionaries are constructed at the sense level. [sent-25, score-0.454]

13 The notion of a sense in WordNet goes beyond a typical word sense in a traditional dictionary since a WordNet sense links senses of different words that have similar meanings. [sent-27, score-1.196]

14 Accordingly, the sense for the first verbal entry for buy and for purchase will have the same sense id (and same definition) in WordNet, while they could have different meaning definitions in a traditional dictionary such as the Merriam Webster Dictionary or LDOCE. [sent-28, score-0.892]

15 In our model, a topic will first emit a WordNet sense, then the sense will generate a word. [sent-29, score-0.892]

16 Moreover, we analyze both qualitatively and quantitatively the contribution of modeling definitions (by teasing out the contribution of explicit sense modeling in a word sense disambiguation task). [sent-33, score-1.027]

17 In figure 1a, given a corpus with D documents, LDA will summarize each document as a normalized T-dimension topic mixture θ. [sent-44, score-0.607]

18 φ contains T multinomial distributions, each representing the probability of a topic z generating word w, p(w|z). [sent-46, score-0.536]

19 φ is drawn from a Dirichlet distribution Dir(β) with prior β. In Collapsed Gibbs Sampling, the distribution of a topic for the word wi = w based on values of other data is computed as: [sent-47, score-0.645]

20 Hence, the first fraction is the proportion of the topic in this document p(z|θ). [sent-49, score-0.537]

21 The second fraction is the probability of topic z emitting word w. [sent-50, score-0.462]
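
For reference, the two fractions discussed above match the standard collapsed Gibbs update for LDA (Griffiths and Steyvers, 2004). The extracted text lost the original equation, so the notation below is an assumption: n^{(d)}_{-i,z} counts the words of the current document assigned to topic z, n^{(w)}_{-i,z} counts how often word w is assigned to topic z, both excluding position i, α is a symmetric prior on θ, and W is the vocabulary size.

```latex
P(z_i = z \mid w_i = w, \mathbf{z}_{-i}, \mathbf{w}_{-i}) \;\propto\;
\frac{n^{(d)}_{-i,z} + \alpha}{n^{(d)}_{-i} + T\alpha}
\times
\frac{n^{(w)}_{-i,z} + \beta}{n_{-i,z} + W\beta}
```

The first factor plays the role of p(z|θ) and the second the role of p(w|z).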

22 After the topics become stable, all the topics in a document construct the topic mixture θ. [sent-51, score-0.891]
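
The sampling loop described in these sentences can be sketched as follows. This is a minimal illustration of collapsed Gibbs sampling for plain LDA, not the authors' implementation; the function name `gibbs_lda`, the hyperparameter defaults, and the toy interface (documents as lists of integer word ids) are all assumptions made for the example.

```python
import random


def gibbs_lda(docs, T, V, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    n_dz = [[0] * T for _ in docs]      # words in document d assigned to topic z
    n_zw = [[0] * V for _ in range(T)]  # times word w is assigned to topic z
    n_z = [0] * T                       # all words assigned to topic z

    def update(d, w, t, delta):
        n_dz[d][t] += delta
        n_zw[t][w] += delta
        n_z[t] += delta

    # Random initial topic for every token.
    z = [[rng.randrange(T) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # counts now exclude token i
                # Document-side fraction times topic-emits-word fraction; the
                # document-side denominator does not depend on k and is dropped.
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + V * beta)
                    for k in range(T)
                ]
                z[d][i] = rng.choices(range(T), weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Once sampling has settled, the topics in a document give its mixture theta.
    theta = [
        [(n_dz[d][k] + alpha) / (len(doc) + T * alpha) for k in range(T)]
        for d, doc in enumerate(docs)
    ]
    return z, theta
```

Calling `gibbs_lda([[0, 1, 0, 1, 0], [2, 3, 2, 3, 3]], T=2, V=4)` returns one topic id per token and one normalized mixture per document.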

23 2 Applying Word Sense Disambiguation Techniques We add a sense node between the topic node and the word node based on two linguistic observations: a) Polysemy: many words have more than one meaning. [sent-53, score-0.978]

24 A topic is more directly relevant to a word meaning (sense) than to a word due to polysemy; b) Synonymy: different words may share the same sense. [sent-54, score-0.544]

25 Hence the word choice is dependent on the relatedness of the sense and its fit to the document context. [sent-59, score-0.444]

26 In standard topic models, the topic of a word is sampled from the document level topic mixture θ. [sent-60, score-1.628]

27 Titov and McDonald (2008) find that using global topic mixtures can only extract global topics in online reviews (e. [sent-63, score-0.604]

28 They design the Multi-grain LDA where the local topic of a word is only determined by topics of surrounding sentences. [sent-66, score-0.678]

29 In word sense disambiguation (WSD), an even narrower context is taken into consideration, for instance in graph based WSD models (Mihalcea, 2005), the choice of a sense for a word only depends on a local window whose size equals the length of the sentence. [sent-67, score-0.953]

30 This approach enables us to exploit different context sizes without restricting it to the sentence length, and hence spread topic information across sentence boundaries. [sent-86, score-0.522]

31 3 Integrating Definitions Intuitively, a sense definition reveals some prior knowledge on the topic domain: the definition of sense [crime, offense, offence] indicates a legal topic; the definition of sense [basketball] indicates a sports topic, etc. [sent-88, score-1.707]

32 Therefore, during inference, we want to choose a topic/sense pair for each word, such that the topic is supported by the context θ and the sense definition also matches that topic. [sent-89, score-0.877]

33 Given that words used in the sense definitions are strongly relevant to the sense/concept, we set out to find the topics of those definition words, and accordingly assign the sense sen itself these topics. [sent-90, score-1.314]

34 We treat a sense definition as a document and perform Gibbs sampling on it. [sent-91, score-0.547]

35 Therefore, before the topic model sees the actual documents, each sense s has been sampled γ times. [sent-93, score-0.846]

36 The γ topics are then used as a “training set”, so that given a sense, φ has some prior knowledge of which topic it should be sampled from. [sent-94, score-0.66]

37 Consider the sense [party, political party] with a definition “an organization to gain political power” of length 6 when γ = 12. [sent-95, score-0.545]

38 If topic model assigns politics topic to the words “organization political power”, then sense [party, political party] will be sampled from politics topic for 3 ∗ γ/definitionLength = 6 times. [sent-96, score-1.996]
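
The arithmetic in this example can be checked with a short sketch. The helper name `definition_pseudo_counts` and the placeholder topic label `other` for the three remaining definition words are assumptions for illustration; the text only states which three words receive the politics topic.

```python
from collections import Counter


def definition_pseudo_counts(definition_topics, gamma):
    """Number of times a sense is sampled from each topic, given the topic
    assigned to every word of its definition.

    Each definition word contributes gamma / definition_length samples of the
    sense to the topic that word was assigned.
    """
    length = len(definition_topics)
    per_word = gamma / length
    return {topic: n * per_word for topic, n in Counter(definition_topics).items()}


# "an organization to gain political power": 6 words, gamma = 12.
topics = ["other", "politics", "other", "other", "politics", "politics"]
counts = definition_pseudo_counts(topics, gamma=12)
```

Here `counts["politics"]` is 3 * 12 / 6 = 6, and the counts over all topics add up to γ = 12, so the sense is sampled γ times in total.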

39 For a word wij in window vi, a sense sij is drawn from the topic, and then sij generates the word wij. [sent-101, score-0.863]

40 From WordNet we know the set of words W(s) that have a sense s as an entry. [sent-103, score-0.328]

41 Hence, for each sense s, there is a multinomial distribution ηs over W(s). [sent-105, score-0.402]

42 On the definition side, we use a different prior αs to generate a topic mixture θ. [sent-107, score-0.619]

43 Aside from generating si, zi will deterministically generate the current sense sen for γ/Nsen times (Nsen is the number of words in the definition of sense sen), so that sen is sampled γ times in total. [sent-108, score-1.209]

44 The formal procedure of generative process is the following: For the definition of sense sen: • choose topic mixture θ ∼ Dir(αs). [sent-109, score-0.947]

45 • for each word wi: − choose topic zi ∼ Mult(θ). [sent-110, score-0.677]

46 deterministically choose sense sen ∼ Mult(φzi) for γ/Nsen times. [sent-112, score-0.487]

47 For each window vi in a document: • choose local topic mixture θi ∼ Dir(αd). [sent-114, score-0.735]

48 • for each word wij in vi: − choose topic zij ∼ Mult(θi). [sent-115, score-0.779]
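
The document-side generative story (a local topic mixture per window, then topic, sense, and word for each position) can be sketched as below. This is an illustrative toy under stated assumptions rather than the paper's code: the function names, the sense labels such as `buy.v.1`, and the hand-written φ and η tables are hypothetical, and φ and η are taken as given instead of being drawn from their priors.

```python
import random


def sample_dirichlet(rng, alpha, dim):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_window(rng, n_words, alpha_d, phi, eta):
    """Generate (topic, sense, word) triples for one window.

    phi[z] maps each sense to its probability under topic z;
    eta[s] maps each word in W(s) to its probability under sense s.
    """
    T = len(phi)
    theta_i = sample_dirichlet(rng, alpha_d, T)  # local topic mixture
    triples = []
    for _ in range(n_words):
        z = rng.choices(range(T), weights=theta_i)[0]
        senses = list(phi[z])
        s = rng.choices(senses, weights=[phi[z][k] for k in senses])[0]
        words = list(eta[s])
        w = rng.choices(words, weights=[eta[s][k] for k in words])[0]
        triples.append((z, s, w))
    return triples


# Hypothetical toy parameters: two topics, three senses.
phi = [
    {"buy.v.1": 0.9, "party.n.1": 0.1},
    {"party.n.1": 0.8, "crime.n.1": 0.2},
]
eta = {
    "buy.v.1": {"buy": 0.5, "purchase": 0.5},      # synonyms share one sense
    "party.n.1": {"party": 1.0},
    "crime.n.1": {"crime": 0.6, "offense": 0.4},
}
window = generate_window(random.Random(0), n_words=5, alpha_d=0.5, phi=phi, eta=eta)
```

Because each word is drawn from η for the sampled sense, every generated word is guaranteed to belong to W(s) of its sense.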

49 4 Using WordNet Since definitions and documents are in different genre/domains, they have different distributions on senses and words. [sent-119, score-0.396]

50 We observe that neighboring sense definitions are usually similar and are in the same topic domain. [sent-124, score-0.983]

51 Hence, we represent the definition of a sense as the union of itself with its neighboring sense definitions pertaining to WordNet relations. [sent-125, score-0.936]
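
A minimal sketch of this extended-definition idea follows. The dictionaries `definitions` and `neighbors`, the sense labels, and the definition texts are hypothetical stand-ins for WordNet data, and treating the "union" as a concatenated bag of words is an assumption, since the extracted text does not say whether duplicates are kept.

```python
def extended_definition(sense, definitions, neighbors):
    """Definition words of a sense plus those of its related senses."""
    words = list(definitions[sense])
    for other in neighbors.get(sense, []):
        words.extend(definitions[other])
    return words


# Hypothetical toy data standing in for WordNet glosses and relations.
definitions = {
    "crime.n.1": ["act", "punishable", "by", "law"],
    "felony.n.1": ["serious", "crime"],
}
neighbors = {"crime.n.1": ["felony.n.1"]}
extended = extended_definition("crime.n.1", definitions, neighbors)
```

A sense with no listed neighbors simply keeps its own definition.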

52 We use a type specific subscript to distinguish them: Ps(·) for sense definitions and Pd(·) for documents. [sent-133, score-0.521]

53 $\prod_{sen=1}^{S} \frac{\prod_{z} \Gamma(n_z^{(sen)} + \alpha_s)}{\Gamma(n^{(sen)} + T\alpha_s)}$ (2) where $n_z^{(sen)}$ means the number of times a word in the definition of sen is assigned to topic z, and $n^{(sen)}$ is the length of the definition. [sent-137, score-0.78]

54 Similarly, let $n_z$ be the number of words in documents assigned to topic z, and $n_z^{s}$ be the number of times sense s assigned to topic z. [sent-141, score-1.314]

55 Note that when s appears in the superscript surrounded by brackets, such as $n_z^{(s)}$, it denotes the number of words assigned to topic z in the definition of sense s. [sent-142, score-0.588]

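56 Let $sen_i$ be the sense definition containing word $w_i$, then we have: $P(z_i = z, s_i = s \mid w_i = w, \mathbf{z}_{-i}, \mathbf{s}_{-i}, \mathbf{w}_{-i}) \propto \frac{n^{(sen_i)}_{-i,z} + \alpha_s}{n^{(sen_i)}_{-i} + T\alpha_s} \cdot \frac{n^{s}_{-i,z} + n^{(s)}_{-i,z}\gamma/n^{(s)} + \beta}{n_{-i,z} + \sum_{s'} n^{(s')}_{-i,z}\gamma/n^{(s')} + S\beta} \cdot \frac{n^{w}_{-i,s} + \lambda}{n_{-i,s} + |W(s)|\lambda}$ (5) The subscript −i in expression $n_{-i}$ denotes the number of certain events excluding word $w_i$. [sent-146, score-0.497]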

57 The probability for documents is similar to that for definitions except that there is a topic mixture for each word, which is estimated by the topics in the window. [sent-149, score-0.942]

58 The column word types shows corresponding word#pos types, and the total number of possible sense types is listed in column sense types. [sent-170, score-0.697]

59 (a) LDA: the traditional topic model proposed in (Blei et al. [sent-174, score-0.462]

60 (c) STM0: the topic model with an additional explicit sense node in the model, but we do not model the sense definitions. [sent-178, score-1.167]

61 In topic models, each word is generalized as a topic and each document is summarized as the topic mixture θ, hence it is natural to evaluate the quality of inferred topics in a text categorization task. [sent-194, score-1.886]

62 , 2005): first run topic models on each dataset individually without knowing label information to achieve document level topic mixtures, then we employ Naive Bayes and SVM (both implemented in the WEKA Toolkit (Hall et al. [sent-196, score-0.999]

63 For the three datasets, we use the Brown corpus only as a tuning set to decide on the topic model parameters for all of our experimentation, and use the optimized parameters directly on NYT and R20 without further optimization. [sent-204, score-0.462]

64 = all means that no local window is used, and γ = 0 means definitions are not used. [sent-215, score-0.355]

65 values, we find that explicitly modeling the sense node in the model greatly improves the classification results. [sent-224, score-0.494]

66 We present three baselines in Table 2: (1) WEKA uses WEKA’s classifiers directly on bag-of-words without topic modeling. [sent-233, score-0.462]

67 It is worth noting that R20 (compared to NYT) is a harder condition for topic models. [sent-240, score-0.493]

68 Table 2 illustrates that despite the difference between NYT, Reuters and Brown (data size, genre, domains, category labels), exploiting WSD techniques (namely using a local window size coupled with explicitly modeling a sense node) yields [Table 2: classification results] [sent-246, score-0.575]

69 In this case, it is very important to sample a true topic for each word, so that ML algorithms can distinguish the Culture documents from the Religion ones by the proportion of topics. [sent-293, score-0.537]

70 Accordingly, adding definitions should be very helpful, since it specifically defines the topic of a sense, and shields it from the influence of other “incorrect/irrelevant” topics. [sent-294, score-0.655]

71 2 Quantitative Analysis with Word Sense Disambiguation A side effect of our model is that it sense disambiguates all words. [sent-297, score-0.328]

72 As a means of analyzing and gaining some insight into the exact contribution of explicitly incorporating sense definitions (STMn) versus simply a sense node (STM0) in the model, we investigate the quality of the sense assignments in our models. [sent-298, score-1.269]

73 We believe that the choice of the correct sense is directly correlated with the choice of a correct topic in our framework. [sent-299, score-0.79]

74 Accordingly, a relative improvement of STMn over STM0 (where the only difference is the explicit sense definition modeling) in WSD task is an indicator of the impact of using sense definitions in the text categorization task. [sent-300, score-1.048]

75 , 2007), where a WordNet based topic model for WSD is introduced. [sent-306, score-0.462]

76 Finally, we choose the most frequent answer for each word in the last 10 iterations of a Gibbs Sampling run as the final sense choice. [sent-318, score-0.369]
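
The selection rule in this sentence amounts to a per-word majority vote over the final Gibbs iterations. A minimal sketch, assuming the sampler's output is stored as one list of sense labels per iteration (the name `final_senses` and that storage layout are illustrative choices, not taken from the paper):

```python
from collections import Counter


def final_senses(samples, last=10):
    """Pick, for each word, the sense sampled most often in the last iterations.

    samples[t][i] is the sense assigned to word i at Gibbs iteration t.
    """
    tail = samples[-last:]
    return [Counter(column).most_common(1)[0][0] for column in zip(*tail)]


# Two early iterations that are ignored, followed by the last ten.
samples = [["a", "x"], ["a", "x"]]
samples += [["b", "y"]] * 6 + [["a", "x"]] * 3 + [["b", "x"]]
chosen = final_senses(samples)
```

In this toy run the first word receives `b` (7 of the last 10 iterations) and the second word receives `y` (6 of the last 10), even though the discarded early iterations favored other labels.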

77 This is understandable since topic information content is mostly borne by nouns and adjectives, while adverbs and verbs tend to be less informative about topics (e. [sent-334, score-0.636]

78 Hence topic models are weaker in their ability to identify clear cues for senses for verbs and adverbs. [sent-337, score-0.59]

79 Lastly, we notice that the most frequent sense baseline performs much better than our models. [sent-341, score-0.328]

80 5 Related work Various topic models have been developed for many applications. [sent-344, score-0.462]

81 (2007) are the first to integrate semantics into the topic model framework. [sent-350, score-0.506]

82 They propose a topic model based on WordNet noun hierarchy for WSD. [sent-351, score-0.507]

83 A word is assumed to be generated by first sampling a topic, then choosing a path from the root node of hierarchy to a sense node corresponding to that word. [sent-352, score-0.615]

84 (2008) also incorporate a sense hierarchy into a topic model. [sent-356, score-0.835]

85 In their framework, a word may be directly generated from a topic (as in standard topic models), or it can be generated by choosing a sense path in the hierarchy. [sent-357, score-1.339]

86 Note that no topic information is on the sense path. [sent-358, score-0.79]

87 Recently, several systems have been proposed to apply topic models to WSD. [sent-361, score-0.462]

88 (2007) incorporate topic features into a supervised WSD framework. [sent-363, score-0.462]

89 Brody and Lapata (2009) place the sense induction in a Bayesian framework by assuming each context word is generated from the target word’s senses, and a context is modeled as a multinomial distribution over the target word’s senses rather than topics. [sent-364, score-0.571]

90 (2010) design several systems that use latent topics to find a most likely sense based on the sense paraphrases (extracted from WordNet) and context. [sent-366, score-0.798]

91 Our model borrows the local window idea from the word sense disambiguation community. [sent-368, score-0.584]

92 6 Conclusion and Future Work We presented a novel model STM that combines explicit semantic information and word distribution information in a unified topic model. [sent-371, score-0.544]

93 STM is able to capture topics of words more accurately than traditional LDA topic models. [sent-372, score-0.604]

94 Jordan Boyd-Graber and David turning predominant senses word sense disambiguation. [sent-392, score-0.497]

95 Putop: into a topic model for In Proceedings of the on Semantic Evalua- Jordan Boyd-Graber, David M. [sent-397, score-0.462]

96 Samuel Brody word sense Conference pages 103–1 and Mirella Lapata. [sent-402, score-0.369]

97 Topic models for word sense disambiguation and token-based idiom detection. [sent-451, score-0.422]

98 Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. [sent-459, score-0.422]

99 Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. [sent-469, score-0.463]

100 Modeling online reviews with multi-grain topic models. [sent-478, score-0.462]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('topic', 0.462), ('sense', 0.328), ('wsd', 0.237), ('definitions', 0.193), ('nyt', 0.192), ('lda', 0.183), ('sen', 0.159), ('wordnet', 0.147), ('topics', 0.142), ('sij', 0.141), ('window', 0.129), ('senses', 0.128), ('zij', 0.122), ('categorization', 0.112), ('mult', 0.111), ('emit', 0.102), ('semcor', 0.102), ('stmn', 0.102), ('zi', 0.092), ('reuters', 0.089), ('definition', 0.087), ('stm', 0.082), ('zjj', 0.082), ('svm', 0.081), ('accordingly', 0.077), ('brown', 0.077), ('document', 0.075), ('documents', 0.075), ('guo', 0.074), ('blei', 0.073), ('sinha', 0.07), ('gibbs', 0.07), ('mixture', 0.07), ('griffiths', 0.069), ('weka', 0.066), ('dir', 0.066), ('political', 0.065), ('diab', 0.064), ('cchhoooossee', 0.061), ('hence', 0.06), ('sampling', 0.057), ('sampled', 0.056), ('party', 0.055), ('disambiguation', 0.053), ('mihalcea', 0.053), ('sjj', 0.053), ('node', 0.049), ('politics', 0.048), ('cai', 0.048), ('si', 0.047), ('dirichlet', 0.047), ('choosing', 0.046), ('hierarchy', 0.045), ('semantics', 0.044), ('explicitly', 0.043), ('dictionary', 0.043), ('modeling', 0.042), ('nz', 0.042), ('wij', 0.042), ('vi', 0.041), ('distribution', 0.041), ('word', 0.041), ('deafl', 0.041), ('dietz', 0.041), ('ehaocohs', 0.041), ('fcohro', 0.041), ('grease', 0.041), ('ldawn', 0.041), ('nsw', 0.041), ('odni', 0.041), ('oeasech', 0.041), ('pd', 0.039), ('integrating', 0.038), ('culture', 0.037), ('titov', 0.037), ('naive', 0.036), ('synonymy', 0.036), ('weiwei', 0.035), ('crime', 0.035), ('iarpa', 0.035), ('tz', 0.035), ('religion', 0.035), ('pradhan', 0.035), ('bayes', 0.035), ('pos', 0.033), ('local', 0.033), ('multinomial', 0.033), ('collapsed', 0.032), ('mona', 0.032), ('understandable', 0.032), ('chemudugunta', 0.032), ('classification', 0.032), ('assigned', 0.031), ('noting', 0.031), ('values', 0.03), ('wi', 0.03), ('hypothesize', 0.03), ('imn', 0.03), ('manner', 0.029), ('allocation', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999839 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

2 0.36921009 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum

Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).

3 0.29972231 21 emnlp-2011-Bayesian Checking for Topic Models

Author: David Mimno ; David Blei

Abstract: Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.

4 0.21461436 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

5 0.20489283 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification

Author: Balamurali AR ; Aditya Joshi ; Pushpak Bhattacharyya

Abstract: Traditional approaches to sentiment classification rely on lexical features, syntax-based features or a combination of the two. We propose semantic features using word senses for a supervised document-level sentiment classifier. To highlight the benefit of sense-based features, we compare word-based representation of documents with a sense-based representation where WordNet senses of the words are used as features. In addition, we highlight the benefit of senses by presenting a part-of-speech-wise effect on sentiment classification. Finally, we show that even if a WSD engine disambiguates between a limited set of words in a document, a sentiment classifier still performs better than what it does in absence of sense annotation. Since word senses used as features show promise, we also examine the possibility of using similarity metrics defined on WordNet to address the problem of not finding a sense in the training corpus. We perform experiments using three popular similarity metrics to mitigate the effect of unknown synsets in a test corpus by replacing them with similar synsets from the training corpus. The results show promising improvement with respect to the baseline.

6 0.11888644 107 emnlp-2011-Probabilistic models of similarity in syntactic context

7 0.094858915 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

8 0.093695149 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model

9 0.093097195 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

10 0.089371637 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation

11 0.087734245 114 emnlp-2011-Relation Extraction with Relation Topics

12 0.082131825 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

13 0.077128381 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

14 0.073946074 128 emnlp-2011-Structured Relation Discovery using Generative Models

15 0.07236398 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

16 0.07104826 55 emnlp-2011-Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models

17 0.070418924 130 emnlp-2011-Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization

18 0.068861105 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

19 0.068496861 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures

20 0.062698446 125 emnlp-2011-Statistical Machine Translation with Local Language Models

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.234), (1, -0.211), (2, -0.224), (3, -0.276), (4, -0.021), (5, 0.449), (6, 0.114), (7, 0.003), (8, 0.086), (9, -0.09), (10, 0.016), (11, 0.078), (12, 0.017), (13, 0.048), (14, 0.014), (15, -0.083), (16, -0.038), (17, -0.123), (18, 0.039), (19, -0.057), (20, 0.005), (21, 0.046), (22, 0.026), (23, 0.153), (24, -0.052), (25, 0.05), (26, -0.061), (27, -0.036), (28, -0.003), (29, -0.049), (30, -0.043), (31, -0.019), (32, -0.064), (33, 0.081), (34, -0.038), (35, 0.019), (36, -0.006), (37, -0.009), (38, 0.04), (39, 0.006), (40, 0.069), (41, -0.012), (42, -0.014), (43, 0.068), (44, 0.008), (45, 0.01), (46, -0.026), (47, 0.019), (48, -0.015), (49, -0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98561752 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

2 0.91994929 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum

3 0.90952158 21 emnlp-2011-Bayesian Checking for Topic Models

Author: David Mimno ; David Blei

4 0.62354541 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

5 0.49146035 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

6 0.47730762 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification

7 0.38564464 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

8 0.3825447 107 emnlp-2011-Probabilistic models of similarity in syntactic context

9 0.34541577 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation

10 0.32448652 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model

11 0.27711838 91 emnlp-2011-Literal and Metaphorical Sense Identification through Concrete and Abstract Context

12 0.2692591 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures

13 0.26881993 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association

14 0.26127368 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model

15 0.25549522 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

16 0.25161439 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

17 0.2503064 55 emnlp-2011-Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models

18 0.24801329 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion

19 0.24776956 130 emnlp-2011-Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization

20 0.24481358 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.012), (23, 0.118), (36, 0.018), (37, 0.035), (45, 0.167), (53, 0.027), (54, 0.033), (57, 0.012), (62, 0.013), (64, 0.031), (66, 0.049), (69, 0.014), (79, 0.03), (82, 0.02), (87, 0.014), (90, 0.014), (94, 0.274), (96, 0.051), (98, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82172698 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

2 0.81212896 3 emnlp-2011-A Correction Model for Word Alignments

Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu

Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequence-based aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.

3 0.62745452 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

Author: Emily M. Bender ; Dan Flickinger ; Stephan Oepen ; Yi Zhang

Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-the-art parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.

4 0.61159223 90 emnlp-2011-Linking Entities to a Knowledge Base with Query Expansion

Author: Swapna Gottipati ; Jing Jiang

Abstract: In this paper we present a novel approach to entity linking based on a statistical language model-based information retrieval with query expansion. We use both local contexts and global world knowledge to expand query language models. We place a strong emphasis on named entities in the local contexts and explore a positional language model to weigh them differently based on their distances to the query. Our experiments on the TAC-KBP 2010 data show that incorporating such contextual information indeed aids in disambiguating the named entities and consistently improves the entity linking performance. Compared with the official results from KBP 2010 participants, our system shows competitive performance.

5 0.60358804 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP

Author: Amit Goyal ; Hal Daume III

Abstract: We exploit sketch techniques, especially the Count-Min sketch, a memory- and time-efficient framework which approximates the frequency of a word pair in the corpus without explicitly storing the word pair itself. These methods use hashing to deal with massive amounts of streaming text. We apply Count-Min sketch to approximate word pair counts and exhibit their effectiveness on three important NLP tasks. Our experiments demonstrate that on all of the three tasks, we get performance comparable to Exact word pair counts setting and state-of-the-art system. Our method scales to 49 GB of unzipped web data using bounded space of 2 billion counters (8 GB memory).

6 0.60226953 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

7 0.58634007 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification

8 0.58588821 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

9 0.5830915 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

10 0.58223063 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures

11 0.58004612 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model

12 0.57908291 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association

13 0.57846481 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

14 0.57831991 21 emnlp-2011-Bayesian Checking for Topic Models

15 0.57556468 55 emnlp-2011-Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models

16 0.57052678 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

17 0.57049292 107 emnlp-2011-Probabilistic models of similarity in syntactic context

18 0.56943995 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

19 0.56873381 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

20 0.56685901 128 emnlp-2011-Structured Relation Discovery using Generative Models