emnlp emnlp2011 emnlp2011-37 knowledge-graph by maker-knowledge-mining

37 emnlp-2011-Cross-Cutting Models of Lexical Semantics


Source: pdf

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Context-dependent word similarity can be measured over multiple cross-cutting dimensions. [sent-3, score-0.051]

2 Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. [sent-5, score-0.119]

3 MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. [sent-7, score-0.111]

4 Intuitively, this constraint favors feature partitions that have coherent topical semantics. [sent-8, score-0.053]

5 Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. [sent-9, score-0.188]

6 Human knowledge bases such as Wikipedia also exhibit such multiple clustering structure. [sent-16, score-0.085]

7 In this work, we introduce a novel probabilistic clustering method, Multi-View Mixture (MVM), based on cross-cutting categorization (Shafto et al.). [sent-20, score-0.105]

8 Cross-cutting categorization finds multiple feature subsets (categorization systems) that produce high quality clusterings of the data. [sent-22, score-0.164]

9 Contextdependent variation in word usage can be accounted for by leveraging multiple latent categorization systems. [sent-24, score-0.07]

10 In particular, cross-cutting models can be used to capture both syntagmatic and paradigmatic notions of word relatedness, breaking up word features into multiple categorization systems and then computing similarity separately for each system. [sent-25, score-0.214]
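
A minimal sketch (not from the paper) of what "computing similarity separately for each categorization system" could look like, assuming each feature has already been assigned to a view; the function names, the cosine choice, and the sparse-count representation are illustrative assumptions.

    from collections import Counter
    import math

    def cosine(a, b):
        # Cosine similarity between two sparse feature-count vectors.
        num = sum(c * b.get(f, 0) for f, c in a.items())
        den = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
        return num / den if den else 0.0

    def per_view_similarity(feats_w1, feats_w2, view_of):
        # feats_w1, feats_w2: Counter mapping feature -> count for each word.
        # view_of: dict mapping a feature to the categorization system (view) it belongs to.
        sims = {}
        for m in set(view_of.values()):
            a = Counter({f: c for f, c in feats_w1.items() if view_of.get(f) == m})
            b = Counter({f: c for f, c in feats_w2.items() if view_of.get(f) == m})
            sims[m] = cosine(a, b)
        return sims  # one similarity score per categorization system, not a single global score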

11 Each clustering (view) in MVM consists of a distribution over features and data, and views are further subdivided into clusters based on a DPMM. [sent-27, score-0.176]

12 We evaluate MVM against several other model-based clustering procedures in a series of human evaluation tasks, measuring its ability to find meaningful syntagmatic and paradigmatic structure. [sent-31, score-0.168]

13 We find that MVM finds more semantically and syntactically coherent fine-grained structure, using both common and rare n-gram contexts. [sent-32, score-0.058]

14 The distributional hypothesis addresses the problem of modeling word similarity (Curran, 2004; Miller and Charles, 1991; Schütze, 1998; Turney, 2006), and can be extended to selectional preference (Resnik, 1997) and lexical substitution (McCarthy and Navigli, 2007) as well. [sent-38, score-0.066]

15 Word similarity can violate metric axioms, e.g., the triangle inequality: the sum of distances from bat to club and from club to association is less than the distance from bat to association (Griffiths et al.). [sent-45, score-0.064]
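
A toy illustration of such a metric violation; the numeric distances below are invented purely for exposition.

    # Invented distances: bat~club (baseball sense) and club~association (organization
    # sense) are close, but bat and association share no sense, so their distance is large.
    d = {("bat", "club"): 0.2, ("club", "association"): 0.3, ("bat", "association"): 0.9}
    # The triangle inequality would require d(bat, association) <= d(bat, club) + d(club, association).
    assert d[("bat", "association")] > d[("bat", "club")] + d[("club", "association")]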

16 The cluster assumption is a natural fit for lexical semantics, as partitions can account for metric violations. [sent-56, score-0.095]

17 The end result is a model capable of representing multiple, overlapping similarity metrics that result in disparate valid clusterings, leveraging the Subspace Hypothesis: for any pair of words, the set of “active” features governing their apparent similarity differs. [sent-57, score-0.2]

18 For example, wine and bottle are similar, and wine and vinegar are similar, but it would not be reasonable to expect the features governing these two similarity computations to overlap much, despite the words occurring in similar documents. [sent-58, score-0.078]

19 MVM can extract multiple competing notions of similarity, for example both paradigmatic (thematic) similarity and syntagmatic (syntactic) similarity, in addition to more fine-grained relations. [sent-59, score-0.112]

20 For example, company websites can be clustered by sector or by geographic location, with one particular clustering becoming predominant when a majority of features correlate with it. [sent-63, score-0.102]

21 In fact, informative features in one clustering may be noise in another, e.g. [sent-64, score-0.086]

22 the occurrence of CEO is not necessarily discriminative when clustering companies by industry sector, but may be useful in other clusterings. [sent-66, score-0.068]

23 Multiple clustering is one approach to inferring feature subspaces that lead to high quality data partitions. [sent-67, score-0.068]

24 Multiple clustering also improves the flexibility of generative clustering models, as a single model is no longer required to explain all the variance in the feature dimensions (Mansinghka et al.). [sent-68, score-0.136]

25 Figure 1: Example clusterings from MVM applied to Google n-gram data. [sent-70, score-0.074]

26 Top contexts (features) for each view are shown, along with examples of word clusters. [sent-71, score-0.106]

27 MVM is a multinomial-Dirichlet multiple clustering procedure for distributional lexical semantics that fits multiple, overlapping Dirichlet Process Mixture Models (DPMM) to a set of word data. [sent-73, score-0.092]

28 Features are distributed across the set of clusterings (views) using LDA, and each DPMM is fit using a subset of the features. [sent-74, score-0.074]

29 This reduces clustering noise and allows MVM to capture multiple ways in which the data can be partitioned. [sent-75, score-0.103]
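
The following is a schematic generative sketch of a model in this family, written only to make the structure concrete: LDA-style view proportions per word, a Chinese Restaurant Process seating per view standing in for the DPMM, and per-cluster feature distributions. It is a simplification for illustration, not the authors' implementation, and all parameter values are placeholders.

    import numpy as np

    def generate_mvm(n_words, n_views, n_feats, alpha=0.1, beta=0.1, crp_alpha=1.0, tokens_per_word=50):
        # Schematic generative story:
        # 1. in each view, words are seated at CRP tables (the DPMM clusters);
        # 2. each cluster has its own Dirichlet-drawn feature distribution;
        # 3. each word draws view proportions theta_d, and every feature token first
        #    picks a view, then a feature from that view's cluster for this word.
        rng = np.random.default_rng(0)
        tables = [[] for _ in range(n_views)]            # per-view CRP table sizes
        phi = [[] for _ in range(n_views)]               # per-view, per-cluster feature distributions
        seat = np.zeros((n_views, n_words), dtype=int)   # cluster of word d in view m
        for m in range(n_views):
            for d in range(n_words):
                probs = np.array(tables[m] + [crp_alpha], dtype=float)
                k = rng.choice(len(probs), p=probs / probs.sum())
                if k == len(tables[m]):                  # open a new cluster
                    tables[m].append(0)
                    phi[m].append(rng.dirichlet([beta] * n_feats))
                tables[m][k] += 1
                seat[m, d] = k
        data = []
        for d in range(n_words):
            theta_d = rng.dirichlet([alpha] * n_views)   # word's distribution over views
            toks = []
            for _ in range(tokens_per_word):
                m = rng.choice(n_views, p=theta_d)
                f = rng.choice(n_feats, p=phi[m][seat[m, d]])
                toks.append((m, f))
            data.append(toks)
        return data, seat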

30 Figure 1 shows a simple example, and Figure 2 shows a larger sample of feature-view assignments from a 3-view MVM fit to contexts drawn from the Google n-gram corpus. [sent-76, score-0.077]

31 |M| disparate clusterings (views) are inferred jointly from a set of data D = {w_d | d ∈ 1..|D|}. [sent-78, score-0.09]

32 Empirically, the distribution θ_d over the |M| views is represented as a set of feature-view assignments z_d, sampled via the standard LDA collapsed Gibbs sampler. [sent-83, score-0.05]
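
A minimal sketch of the standard collapsed LDA Gibbs update that such a sampler uses to resample a single feature-view assignment; the count arrays (word-view, view-feature, and per-view totals) are the usual LDA sufficient statistics, and the function signature is a hypothetical choice, not the authors' code.

    import numpy as np

    def resample_view(d, i, z, feats, n_dv, n_vf, n_v, alpha, beta, rng):
        # Collapsed Gibbs update for one feature token i of word d:
        #   p(z_di = m | rest) ∝ (n_dv[d, m] + alpha) * (n_vf[m, f] + beta) / (n_v[m] + V * beta)
        # where n_dv counts word d's features per view, n_vf counts feature f per view,
        # and n_v is the total number of feature tokens assigned to each view.
        f = feats[d][i]
        m_old = z[d][i]
        n_dv[d, m_old] -= 1; n_vf[m_old, f] -= 1; n_v[m_old] -= 1
        V = n_vf.shape[1]
        probs = (n_dv[d] + alpha) * (n_vf[:, f] + beta) / (n_v + V * beta)
        m_new = rng.choice(len(probs), p=probs / probs.sum())
        z[d][i] = m_new
        n_dv[d, m_new] += 1; n_vf[m_new, f] += 1; n_v[m_new] += 1
        return m_new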

33 Conditional on the feature-view assignments {z}, a clustering is inferred for each view using the Chinese Restaurant Process representation of the DP. [sent-86, score-0.115]

34 The clustering probability is given by p(c | z, w) ∝ ∏_{m=1}^{|M|} ∏_{d=1}^{|D|} p(w_d[z=m] | c_m, z) p(c_m | z). [sent-87, score-0.104]

35 where p(c_m | z) is a prior on the clustering for view m, i.e. [sent-88, score-0.115]

36 the DPMM, and p(w_d[z=m] | c_m, z) is the likelihood of the clustering c_m given the data point restricted to the features assigned to view m: w_d[z=m] := {w_id ∈ w_d | z_id = m}. [sent-90, score-0.203]
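
To make the view-restricted likelihood concrete, here is a rough sketch of resampling the cluster of one word in one view: a CRP prior over existing clusters multiplied by the collapsed multinomial-Dirichlet predictive probability of the word's view-restricted features. This is a generic DPMM Gibbs step under stated assumptions, not the authors' released code; the data structures are hypothetical.

    import numpy as np
    from collections import Counter

    def log_predictive(add_counts, cluster_counts, cluster_total, beta, V):
        # Collapsed multinomial-Dirichlet predictive log-probability of adding a bag of
        # feature counts to a cluster whose current feature counts are cluster_counts.
        lp, added = 0.0, 0
        for f, c in add_counts.items():
            for j in range(c):
                lp += np.log((cluster_counts.get(f, 0) + j + beta) / (cluster_total + added + V * beta))
                added += 1
        return lp

    def sample_cluster(view_feats, clusters, beta, crp_alpha, V, rng):
        # view_feats: Counter of this word's features currently assigned to this view.
        # clusters: list of (n_members, Counter of feature counts, total feature tokens),
        #           with the current word already removed.
        scores = [np.log(n) + log_predictive(view_feats, counts, total, beta, V)
                  for n, counts, total in clusters]
        scores.append(np.log(crp_alpha) + log_predictive(view_feats, Counter(), 0, beta, V))
        p = np.exp(np.array(scores) - max(scores))
        return rng.choice(len(p), p=p / p.sum())   # last index means "open a new cluster"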

37 Thus, we treat the |M| clusterings c_m as conditionally independent given the feature-view assignments. [sent-91, score-0.091]

38 The feature-view assignments {z} act as a set of marginal constraints on the multiple clusterings, and the impact that each data point can have on each clustering is limited by the number of features assigned to it. [sent-92, score-0.103]

39 For example, in a two-view model, z_id = 1 might be set for all syntactic features (yielding a syntagmatic clustering) while z_id = 2 is set for document features (a paradigmatic clustering). [sent-93, score-0.12]

40 By allowing the clustering model capacity to vary via the DPMM, MVM can naturally account for the semantic variance of the view. [sent-94, score-0.089]

41 View 1 cluster 2 and View 3 cluster 1 both contain past-tense verbs, but only overlap on a subset of syntactic features. [sent-100, score-0.086]

42 The most similar model to ours is Cross-cutting categorization (CCC), which fits multiple DPMMs to non-overlapping partitions of features (Mansinghka et al.). [sent-103, score-0.091]

43 Unlike MVM, CCC partitions features among multiple DPMMs, hence all occurrences of a particular feature will end up in a single clustering, instead of assigning them softly using LDA. [sent-106, score-0.077]

44 Word Representation: MVM is trained as a lexical semantic model on Web-scale n-gram and semantic context data. [sent-110, score-0.057]

45 N-gram contexts are drawn from a combination of the Google n-gram and Google Books n-gram corpora, with the head word removed, e.g. [sent-111, score-0.092]

46 for the term architect, we collect contexts such as “the _ of the house”, “an _ is a”, and “the _ of the universe”. [sent-113, score-0.059]

47 Semantic contexts are derived from word occurrence in Wikipedia documents: each document a word appears in is added as a potential feature for that word. [sent-114, score-0.082]

48 Syntax-only: Words are represented as bags of n-gram contexts derived from the slot-filling procedure described above. [sent-117, score-0.059]

49 Syntax+Documents: The syntax-only representation is augmented with additional document contexts drawn from Wikipedia. [sent-119, score-0.1]

50 Models trained on the syntax-only set are only capable of capturing syntagmatic similarity relations, that is, words that tend to appear in similar contexts. [sent-120, score-0.077]

51 In contrast, the syntax+documents set broadens the scope of modelable similarity relations, allowing for paradigmatic (thematic) similarity as well. [sent-121, score-0.125]
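
A rough sketch of assembling the two representations described above, assuming slot-filling contexts are formed by replacing the head word with a blank and Wikipedia document IDs are appended as extra features; the input formats and helper names are hypothetical.

    from collections import Counter, defaultdict

    def ngram_context_features(ngrams):
        # ngrams: iterable of (tuple_of_tokens, count). For each token position, emit the
        # context with the head word replaced by a blank slot, e.g.
        # ("the", "architect", "of", "the", "house") -> "the _ of the house" for architect.
        feats = defaultdict(Counter)
        for tokens, count in ngrams:
            for i, head in enumerate(tokens):
                ctx = " ".join(list(tokens[:i]) + ["_"] + list(tokens[i + 1:]))
                feats[head][ctx] += count
        return feats                                  # the syntax-only representation

    def add_document_features(feats, documents):
        # documents: iterable of (doc_id, iterable_of_words). Every document a word appears
        # in becomes an additional feature, giving the syntax+documents representation.
        for doc_id, words in documents:
            for w in set(words):
                feats[w]["DOC:" + str(doc_id)] += 1
        return feats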

52 Given such word representation data, MVM generates a fixed set of M context views corresponding to dominant eigenvectors in local syntactic or semantic space. [sent-124, score-0.071]

53 Within each view, MVM partitions words into clusters based on each word’s local representation in that view; that is, based on the set of con- text features it allocates to the view. [sent-125, score-0.095]

54 Words have a non-uniform affinity for each view, and hence may not be present in every clustering (Figure 2). [sent-126, score-0.091]

55 In contrast, LDA finds locally consistent collections of contexts but does not further subdivide words into clusters given that set of contexts. [sent-128, score-0.132]

56 Two versions of the syntax-only dataset are created from different subsets of the Google n-gram corpora: (1) the common subset contains all syntactic contexts appearing more than 200 times in the combined corpus, and (2) the rare subset, containing only contexts that appear 50 times or fewer. [sent-133, score-0.166]
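
A small sketch of that frequency split; the 200 and 50 thresholds come from the text, and everything else (the input dictionary, the function name) is illustrative.

    def split_contexts_by_frequency(context_counts, common_threshold=200, rare_threshold=50):
        # context_counts: dict mapping a syntactic context to its frequency in the combined corpus.
        common = {c for c, n in context_counts.items() if n > common_threshold}   # appears > 200 times
        rare = {c for c, n in context_counts.items() if n <= rare_threshold}      # appears <= 50 times
        return common, rare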

57 According to the use theory of meaning, lexical semantic knowledge is equivalent to knowing the contexts that words appear in, and hence being able to form reasonable hypotheses about the relatedness of syntactic contexts. [sent-141, score-0.118]

58 Vector space models are commonly evaluated by comparing their similarity predictions to a nominal set of human similarity judgments (Curran, 2004; Padó and Lapata, 2007; Schütze, 1998; Turney, 2006). [sent-142, score-0.068]

59 In this work, since we are evaluating models that potentially yield many different similarity scores, we take a different approach, scoring clusters on their semantic and syntactic coherence using a set intrusion task (Chang et al., 2009). [sent-143, score-0.363]

60 In set intrusion, human raters are shown a set of options from a coherent group and asked to identify a single intruder drawn from a different group. [sent-145, score-0.1]

61 We extend intrusion to three different lexical semantic tasks: (1) context intrusion, where the top contexts from each cluster are used, (2) document intrusion, where the top document contexts from each cluster are used, and (3) word intrusion, where the top words from each cluster are used. [sent-146, score-0.579]

62 The resulting set is then shuffled, and the human raters are asked to identify the intruder. (Choosing four elements from the cluster uniformly at random instead of the top four by probability led to lower performance across all models.) [sent-148, score-0.086]
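
A sketch of how one such intrusion item could be assembled, following the description above (the top elements of one cluster plus a high-probability intruder from another cluster, then shuffled); the data structures and function name are assumptions.

    import random

    def make_intrusion_item(clusters, target, other, n_in_cluster=4, seed=0):
        # clusters: dict mapping cluster id -> list of (element, probability), sorted descending.
        # Take the top elements of the target cluster plus one high-probability element
        # of a different cluster, shuffle, and ask the rater to spot the intruder.
        rng = random.Random(seed)
        options = [e for e, _ in clusters[target][:n_in_cluster]]
        intruder = clusters[other][0][0]
        options.append(intruder)
        rng.shuffle(options)
        return options, intruder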

63 As the semantic coherence and distinctness from other clusters increases, this task becomes easier. [sent-151, score-0.079]

64 Set intrusion is a more robust way to account for human similarity judgments than asking directly for a numeric score. [sent-152, score-0.284]

65 A total of 1256 raters completed 30438 evaluations for 5780 unique intrusion tasks (5 evaluations per task). [sent-157, score-0.293]

66 2736 potentially fraudulent evaluations from 11 raters were rejected. [sent-158, score-0.061]

67 LDA is run on a different range of M settings from MVM (50-1000 vs. 3-100) in order to keep the effective number of clusters, and hence model capacity, roughly comparable. [sent-163, score-0.25]

[Figure: scatterplots of % correct on the word intrusion and context intrusion tasks.]

71 Models are run with several settings of α and β in order to understand how they perform on the intrusion tasks and how robust they are to various parameter choices; each model is run until convergence, defined in terms of log-likelihood on the training data. [sent-364, score-0.25]

72 Average runtimes varied from a few hours to a few days, depending on the number of clusters or topics. [sent-369, score-0.058]

73 Overall, MVM significantly outperforms both LDA and DPMM (measured as % of intruders correctly identified) as the number of clusters increases. [sent-371, score-0.058]

74 Coarse-grained lexical semantic distinctions are easy for humans to make, and hence models with fewer clusters tend to outperform models with more clusters. [sent-372, score-0.152]

75 Since high granularity predictions are more […] [sent-373, score-0.101]

76 Syntax-only Model: For common n-gram context features, MVM performance is significantly less variable than LDA on both the word intrusion and context intrusion tasks, and it furthermore significantly outperforms DPMM (Figure 3(a)). [sent-399, score-0.5]

77 These models vary significantly in the average number of clusters used: 373. [sent-407, score-0.058]

78 Results from MVM have higher κ scores than LDA or DPMM; likewise Syntax+Documents data yields higher agreement, primarily due to the relative ease of the document intrusion task. [sent-425, score-0.273]

79 Average cluster sizes are more uniform across model types for rare contexts: 384. [sent-437, score-0.07]

80 Human performance on the context intrusion task is significantly more variable than on the word intrusion task, reflecting the additional complexity. [sent-440, score-0.264]

81 Qualitatively, models trained on syntax+documents data yield a higher proportion of paradigmatic clusters, which have intuitive thematic structure. [sent-472, score-0.156]

82 Performance on document intrusion is significantly lower and more variable, reflecting the higher degree of world knowledge required. [sent-473, score-0.287]

83 As with the previous data set, performance of MVM models trained on syntax+documents data degrades more slowly as the cluster granularity increases (Figure 5). [sent-474, score-0.077]

84 One interesting question is to what degree MVM views partition syntax and document features versus LDA topics. [sent-475, score-0.1]

85 That is, to what degree do the MVM views capture purely syntagmatic or purely paradigmatic variation? [sent-476, score-0.15]

86 We measured view entropy for all three models, treating syntactic features and document features as different class labels. [sent-477, score-0.07]
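
A small sketch of one way to compute that per-view label entropy, treating each feature in a view as carrying a syntactic-vs-document label; the label encoding and the simple unweighted averaging are assumptions.

    import math
    from collections import Counter

    def view_label_entropy(view_features):
        # view_features: dict mapping view id -> list of feature labels, where each label is
        # "SYN" for a syntactic n-gram context or "DOC" for a Wikipedia document feature.
        entropies = {}
        for m, labels in view_features.items():
            counts = Counter(labels)
            total = sum(counts.values())
            entropies[m] = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # 0 bits = a pure view; 1 bit = an even mix of the two feature types.
        return sum(entropies.values()) / len(entropies), entropies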

87 MVM with M = 50 views obtained an entropy score of 0. [sent-478, score-0.05]

88 Thus MVM views may indeed capture pure syntactic or thematic clusterings. [sent-482, score-0.076]

89 The low entropy scores reflect the higher percentage of syntactic contexts overall. [sent-483, score-0.059]

90 Discussion: As cluster granularity increases, we find that MVM accounts for feature noise better than either LDA or DPMM, yielding more coherent clusters. [sent-485, score-0.097]

91 Chang et al. (2009) note that LDA performance degrades significantly on a related task as the number of topics increases, reflecting the increasing difficulty for humans in grasping the connection between terms in the same topic. [sent-487, score-0.05]

92 In this work, we find that although MVM and LDA perform similarly on average, MVM clusters are significantly more interpretable than LDA clusters as the granularity increases (Figures 4 and 5). [sent-489, score-0.184]

93 Performance was compared both when clusters are drawn uniformly at random from the model and when cluster selection is biased based on model probability (results shown). [sent-493, score-0.119]

94 Biased selection potentially gives an advantage to MVM, which generates many more small clusters than either LDA or DPMM, helping it account for noise. [sent-494, score-0.058]

95 Future Work: Models based on cross-cutting categorization are a novel approach to lexical semantics and hence should be evaluated on standard baseline tasks. [sent-495, score-0.092]

96 For example, clusterings that divide cities by geography, or clusterings that partition adjectives by their polarity. [sent-499, score-0.148]

97 (Hierarchical Cross-Categorization) Human concept organization consists of multiple overlapping local ontologies, similar to the loose ontological structure of Wikipedia. [sent-507, score-0.059]

98 It would be interesting to extend MVM to model hierarchy explicitly, and compare against baselines such as Brown clustering (Brown et al., 1992). [sent-509, score-0.068]

99 Conclusion: This paper introduced MVM, a novel approach to modeling lexical semantic organization using multiple cross-cutting clusterings capable of capturing multiple lexical similarity relations jointly in the same model. [sent-513, score-0.193]

100 In addition to robustly handling homonymy and polysemy, MVM naturally captures both syntagmatic and paradigmatic notions of word similarity. [sent-514, score-0.126]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mvm', 0.817), ('lda', 0.3), ('dpmm', 0.25), ('intrusion', 0.25), ('gg', 0.148), ('clusterings', 0.074), ('clustering', 0.068), ('contexts', 0.059), ('clusters', 0.058), ('paradigmatic', 0.057), ('shafto', 0.054), ('views', 0.05), ('view', 0.047), ('raters', 0.043), ('syntagmatic', 0.043), ('cluster', 0.043), ('categorization', 0.037), ('partitions', 0.037), ('mansinghka', 0.036), ('similarity', 0.034), ('google', 0.033), ('ggg', 0.031), ('austin', 0.031), ('cuisine', 0.027), ('dpmms', 0.027), ('idon', 0.027), ('zid', 0.027), ('rare', 0.027), ('syntax', 0.027), ('notions', 0.026), ('thematic', 0.026), ('overlapping', 0.026), ('sch', 0.026), ('blank', 0.023), ('intruder', 0.023), ('document', 0.023), ('hence', 0.023), ('tze', 0.022), ('mixture', 0.022), ('rater', 0.021), ('reisinger', 0.021), ('restaurant', 0.021), ('wd', 0.021), ('subsets', 0.021), ('semantic', 0.021), ('granularity', 0.02), ('documents', 0.018), ('drawn', 0.018), ('dirichletp', 0.018), ('discretep', 0.018), ('fraudulent', 0.018), ('gorman', 0.018), ('pachinko', 0.018), ('primitives', 0.018), ('purple', 0.018), ('sector', 0.018), ('tversky', 0.018), ('tzu', 0.018), ('vikash', 0.018), ('wq', 0.018), ('zq', 0.018), ('noise', 0.018), ('humans', 0.018), ('topics', 0.018), ('cm', 0.017), ('multiple', 0.017), ('curran', 0.017), ('semantics', 0.017), ('distributional', 0.017), ('distinctions', 0.017), ('latent', 0.016), ('pad', 0.016), ('coherent', 0.016), ('clustered', 0.016), ('ms', 0.016), ('nested', 0.016), ('mccarthy', 0.016), ('scatterplot', 0.016), ('bat', 0.016), ('ccc', 0.016), ('club', 0.016), ('disparate', 0.016), ('governing', 0.016), ('korea', 0.016), ('oni', 0.016), ('ontological', 0.016), ('finds', 0.015), ('books', 0.015), ('psychological', 0.015), ('hierarchical', 0.015), ('joshua', 0.015), ('intuitive', 0.015), ('dirichlet', 0.015), ('lexical', 0.015), ('reflecting', 0.014), ('durme', 0.014), ('niu', 0.014), ('wine', 0.014), ('increases', 0.014), ('dm', 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

2 0.13486262 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

3 0.093097195 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.

4 0.081230864 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum

Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).

5 0.079309173 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

Author: Kirk Roberts ; Sanda Harabagiu

Abstract: Metonymic language is a pervasive phenomenon. Metonymic type shifting, or argument type coercion, results in a selectional restriction violation where the argument’s semantic class differs from the class the predicate expects. In this paper we present an unsupervised method that learns the selectional restriction of arguments and enables the detection of argument coercion. This method also generates an enhanced probabilistic resolution of logical metonymies. The experimental results indicate substantial improvements the detection of coercions and the ranking of metonymic interpretations.

6 0.072938874 107 emnlp-2011-Probabilistic models of similarity in syntactic context

7 0.071397461 21 emnlp-2011-Bayesian Checking for Topic Models

8 0.068572693 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

9 0.057857607 128 emnlp-2011-Structured Relation Discovery using Generative Models

10 0.04267627 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization

11 0.041930806 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

12 0.03821509 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

13 0.037281934 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model

14 0.034356728 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

15 0.028098824 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

16 0.024658881 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models

17 0.024639459 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

18 0.024179943 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

19 0.024009107 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

20 0.023382811 114 emnlp-2011-Relation Extraction with Relation Topics


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.097), (1, -0.084), (2, -0.132), (3, -0.103), (4, -0.041), (5, 0.154), (6, 0.048), (7, 0.024), (8, 0.049), (9, 0.004), (10, -0.018), (11, 0.061), (12, 0.022), (13, -0.065), (14, -0.048), (15, 0.038), (16, 0.03), (17, 0.078), (18, -0.007), (19, 0.039), (20, 0.03), (21, 0.068), (22, -0.009), (23, 0.012), (24, 0.025), (25, -0.037), (26, -0.026), (27, -0.065), (28, 0.007), (29, 0.043), (30, -0.121), (31, 0.078), (32, -0.007), (33, 0.054), (34, 0.0), (35, -0.114), (36, 0.04), (37, -0.058), (38, -0.133), (39, -0.006), (40, -0.15), (41, 0.025), (42, 0.118), (43, -0.147), (44, -0.202), (45, -0.062), (46, 0.108), (47, 0.136), (48, -0.001), (49, -0.139)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.924308 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

2 0.62137085 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

3 0.56362796 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

Author: Kirk Roberts ; Sanda Harabagiu

Abstract: Metonymic language is a pervasive phenomenon. Metonymic type shifting, or argument type coercion, results in a selectional restriction violation where the argument’s semantic class differs from the class the predicate expects. In this paper we present an unsupervised method that learns the selectional restriction of arguments and enables the detection of argument coercion. This method also generates an enhanced probabilistic resolution of logical metonymies. The experimental results indicate substantial improvements the detection of coercions and the ranking of metonymic interpretations.

4 0.30529568 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.

5 0.29089794 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.

6 0.29044268 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization

7 0.28308272 21 emnlp-2011-Bayesian Checking for Topic Models

8 0.28277552 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

9 0.24875556 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

10 0.23640537 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

11 0.23521586 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP

12 0.22445059 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction

13 0.1975372 128 emnlp-2011-Structured Relation Discovery using Generative Models

14 0.17968464 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association

15 0.17965917 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

16 0.17480908 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

17 0.1694617 62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use

18 0.16848671 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

19 0.16678587 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation

20 0.15674382 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.084), (36, 0.018), (37, 0.019), (45, 0.12), (52, 0.277), (53, 0.014), (54, 0.025), (57, 0.021), (62, 0.023), (64, 0.026), (66, 0.068), (69, 0.02), (79, 0.033), (82, 0.023), (87, 0.011), (96, 0.069), (98, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.78150272 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

2 0.76798379 41 emnlp-2011-Discriminating Gender on Twitter

Author: John D. Burger ; John Henderson ; George Kim ; Guido Zarrella

Abstract: Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show the degree to which classifier accuracy varies based on tweet volumes as well as when various kinds of profile metadata are included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly out-perform both baseline models and almost all humans on the same task.

same-paper 3 0.76305187 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

4 0.56121343 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs

Author: Samuel Brody ; Nicholas Diakopoulos

Abstract: We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotionbearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes.

5 0.55682969 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

Author: Alan Ritter ; Sam Clark ; Mausam ; Oren Etzioni

Abstract: People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms cotraining, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp

6 0.5548864 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs

7 0.55436605 139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter

8 0.55307829 71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs

9 0.53581423 89 emnlp-2011-Linguistic Redundancy in Twitter

10 0.52614325 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

11 0.52140993 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

12 0.50851053 107 emnlp-2011-Probabilistic models of similarity in syntactic context

13 0.50486952 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

14 0.49830225 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

15 0.49804431 116 emnlp-2011-Robust Disambiguation of Named Entities in Text

16 0.49676341 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

17 0.49539304 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

18 0.49427614 128 emnlp-2011-Structured Relation Discovery using Generative Models

19 0.49076393 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

20 0.4906902 90 emnlp-2011-Linking Entities to a Knowledge Base with Query Expansion