acl acl2013 acl2013-238 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aurelie Herbelot ; Mohan Ganesalingam
Abstract: Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more ‘precise’ than a group. In this paper, we propose to measure the ‘semantic content’ of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather ‘intensional’ aspect of distributions.
Reference: text
sentIndex sentText sentNum sentScore
1 Measuring semantic content in distributional vectors. Aurélie Herbelot, EB Kognitionswissenschaft, Universität Potsdam, Golm, Germany. [sent-1, score-0.561]
2 Abstract Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more ‘precise’ than a group. [sent-3, score-0.179]
3 In this paper, we propose to measure the ‘semantic content’ of lexical items, as modelled by distributional representations. [sent-4, score-0.363]
4 We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. [sent-5, score-0.501]
5 In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. [sent-6, score-0.134]
6 1 Introduction Distributional semantics is a representation of lexical meaning that relies on a statistical analysis of the way words are used in corpora (Curran, 2003; Turney and Pantel, 2010; Erk, 2012). [sent-8, score-0.142]
7 In this framework, the semantics of a lexical item is accounted for by modelling its co-occurrence with other words (or any larger lexical context). [sent-9, score-0.086]
8 This paper investigates the issue of computing the semantic content of distributional vectors. [sent-12, score-0.614]
9 Mohan Ganesalingam, Trinity College, University of Cambridge, Cambridge, UK. [sent-13, score-0.059]
10 That is, we look at the ways we can distributionally express that make is a more general verb than produce, which is itself more general than, for instance, weave. [sent-14, score-0.15]
11 Although the task is related to the identification of hyponymy relations, it aims to reflect a more encompassing phenomenon: we wish to be able to compare the semantic content of words within parts-of-speech where the standard notion of hyponymy does not apply (e. [sent-15, score-0.69]
12 The hypothesis we will put forward is that semantic content is related to notions of relative entropy found in information theory. [sent-23, score-0.457]
13 More specifically, we hypothesise that the more specific a word is, the more the distribution of the words co-occurring with it will differ from the baseline distribution of those words in the language as a whole. [sent-24, score-0.222]
14 The specific measure of difference that we will use is the Kullback-Leibler divergence of the distribution of words co-occurring with the target word against the distribution of those words in the language as a whole. [sent-26, score-0.401]
15 We evaluate our hypothesis against a subset of the WordNet hierarchy (given by Baroni et al. (2012)), relying on the intuition that in a hyponym-hypernym pair, the hyponym should have higher semantic content than its hypernym. [sent-27, score-0.388]
16 We first define our notion of semantic content and motivate the need for measuring semantic content in distributional setups. [sent-29, score-0.965]
17 We then describe the implementation of the distributional system we use in this paper, emphasising our choice of weighting measure. [sent-30, score-0.275]
18 We finally evaluate the KL measure against a basic notion of frequency and conclude with some error analysis. [sent-34, score-0.251]
19 2 Semantic content As a first approximation, we will define semantic content as informativeness with respect to denotation. [sent-35, score-0.583]
20 Following Searle (1969), we will take a ‘successful reference’ to be a speech act where the choice of words used by the speaker appropriately identifies a referent for the hearer. [sent-36, score-0.052]
21 Glossing over questions of pragmatics, we will assume that a more informative word is more likely to lead to a successful reference than a less informative one. [sent-37, score-0.118]
22 That is, if Kim owns a cat and a dog, the identifying expression my cat is a better referent than my pet and so cat can be said to have more semantic content than pet. [sent-38, score-0.674]
23 While our definition relies on reference, it also posits a correspondence between actual utterances and denotation. [sent-39, score-0.049]
24 Given two possible identifying expressions e1 and e2, e1 may be preferred in a particular context, and so, context will be an indicator of the amount of semantic content in an expression. [sent-40, score-0.331]
25 In Section 5, we will produce an explicit hypothesis for how the amount of semantic content in a lexical item affects the contexts in which it appears. [sent-41, score-0.34]
26 A case where semantic content has a direct correspondence with a lexical relation is hyponymy. [sent-42, score-0.335]
27 Here, the correspondence relies entirely on a basic notion of extension. [sent-43, score-0.167]
28 For instance, it is clear that hammer is more contentful than tool because the extension of hammer is smaller than that of tool, and therefore more discriminating in a given identifying expression (See Give me the hammer versus Give me the tool). [sent-44, score-0.569]
29 But we can also talk about semantic content in cases where the notion of extension does not necessarily apply. [sent-45, score-0.523]
30 For example, it is not usual to talk of the extension of a preposition. [sent-46, score-0.119]
31 However, in context, the use of one preposition rather than another might be more discriminating in terms of reference. [sent-47, score-0.037]
32 Given a set of possible situations involving, say, Kim and Sandy at a party, we could show that b) is more discriminating than a), because it excludes the situations where Sandy came to the party with Kim but is currently talking to Kay at the other end of the room. [sent-49, score-0.157]
33 The fact that next to expresses physical proximity, as opposed to just being in the same situation, confers it more semantic content according to our definition. [sent-50, score-0.286]
34 Further still, there may be a need for comparing the informativeness of words across parts of speech (compare A group of/Fifteen people was/were waiting in front of the town hall). [sent-51, score-0.1]
35 Although we will not discuss this in detail, there is a notion of semantic content above the word level which should naturally derive from composition rules. [sent-52, score-0.461]
36 For instance, we would expect the composition of a given intersective adjective and a given noun to result in a phrase with semantic content greater than that of its components (or at least equal to it). [sent-53, score-0.343]
37 3 Motivation The last few years have seen a growing interest in distributional semantics as a representation of lexical meaning. [sent-54, score-0.361]
38 In a distributional semantic space, for instance, the word ‘cat’ may be close to ‘dog’ or to ‘tiger’, and its vector might have high values along the dimensions ‘meow’, ‘mouse’ and ‘pet’ . [sent-56, score-0.445]
39 Distributional semantics has had great successes in recent years, and for many computational linguists, it is an essential tool for modelling phenomena affected by lexical meaning. [sent-57, score-0.124]
40 If distributional semantics is to be seen as a general-purpose representation, however, we should evaluate it across all properties which we deem relevant to a model of the lexicon. [sent-58, score-0.361]
41 We consider semantic content to be one such property. [sent-59, score-0.286]
42 It underlies the notion of hyponymy and naturally models our intuitions about the ‘precision’ (as opposed to ‘vagueness’) of words. [sent-60, score-0.23]
43 Further, semantic content may be crucial in solving some fundamental problems of distributional semantics. [sent-61, score-0.561]
44 As pointed out by McNally (2013), there is no easy way to define the notion of a function word and this has consequences for theories where function words are not assigned a distributional representation. [sent-62, score-0.432]
45 McNally suggests that the most appropriate way to separate function from content words might, in the end, involve taking into account how much ‘descriptive’ content they have. [sent-63, score-0.394]
46 4 An implementation of a distributional system The distributional system we implemented for this paper is close to the system of Mitchell and Lapata (2010) (subsequently M&L). [sent-64, score-0.55]
47 Each article in the corpus is converted into an 11-word window format, that is, we are assuming that context in our system is defined by the five words preceding and the five words following the target. [sent-69, score-0.045]
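As a rough illustration of the windowing step only (not the authors' full pipeline, which also applies the weighting scheme discussed below), raw co-occurrence counts over a ±5-word window could be collected along these lines; the function and variable names are ours:

```python
from collections import defaultdict

WINDOW = 5  # five words before and five after the target: an 11-word window

def cooccurrence_counts(articles):
    """Collect raw co-occurrence counts within the 11-word window.

    `articles` is assumed to be an iterable of token lists (one per article).
    Returns counts[target][context] = number of times `context` occurs in the
    window around an occurrence of `target`.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in articles:
        for i, target in enumerate(tokens):
            start, end = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(start, end):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts
```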
48 5 Semantic content as entropy: two measures Resnik (1995) uses the notion of information content to improve on the standard edge counting methods proposed to measure similarity in taxonomies such as WordNet. [sent-72, score-0.6]
49 He proposes that the information content of a term t is given by the self-information measure −log p(t). [sent-73, score-0.361]
50 The idea behind this measure is that, as the frequency of the term increases, its informativeness decreases. [sent-74, score-0.27]
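For concreteness, the self-information baseline amounts to a negative log relative frequency. A minimal sketch, assuming a frequency dictionary `freq` and a total token count (our own names, not the paper's code):

```python
import math

def self_information(term, freq, total_tokens):
    """Self-information of a term: -log p(term), with p estimated by relative corpus frequency."""
    return -math.log(freq[term] / total_tokens)
```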
51 Although a good first approximation, the measure cannot be said to truly reflect our concept of semantic content. [sent-75, score-0.232]
52 For instance, in the British National Corpus, time and see are more frequent than thing or may, and man is more frequent than part. [sent-76, score-0.038]
53 However, it seems intuitively right to say that time, see and man are more ‘precise’ concepts than thing, may and part respectively. [sent-77, score-0.038]
54 Put differently, there is no indication that more general concepts occur in speech more often than less general ones. [sent-78, score-0.191]
55 As we expect more specific words to be more informative about which words co-occur with them, it is natural to try to measure the specificity of a word by using notions from information theory to analyse the probability distribution p(ci|t) associated with the word. [sent-80, score-0.314]
56 The standard notion of entropy is not appropriate for this purpose, because it does not take account of the fact that the words serving as semantic space dimensions may have different frequencies in language as a whole, i. [sent-81, score-0.206]
57 Instead we need to measure the degree to which p(ci |t) differs from the context word distribution p(ci). [sent-84, score-0.206]
58 An appropriate measure for this is the Kullback-Leibler (KL) divergence, or relative entropy: DKL(P ‖ Q) = Σ_i P(i) ln(P(i)/Q(i)) (5). By taking P(i) to be p(ci|t) and Q(i) to be p(ci) (as given by Equation 4), we calculate the relative entropy of p(ci|t) and p(ci). [sent-85, score-0.328]
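Assuming the probability estimates p(ci|t) and p(ci) have already been computed (Equation 4 of the paper is not reproduced here), the KL divergence itself is straightforward to compute. This sketch uses our own argument names and skips dimensions with p(ci|t) = 0, whose contribution to the sum is zero:

```python
import math

def kl_divergence(p_c_given_t, p_c):
    """D_KL(p(c|t) || p(c)) = sum_i p(c_i|t) * ln(p(c_i|t) / p(c_i)).

    Both arguments are dicts mapping context words c_i to probabilities;
    contexts never observed with t contribute nothing to the sum.
    """
    total = 0.0
    for c, p in p_c_given_t.items():
        if p > 0.0:
            total += p * math.log(p / p_c[c])
    return total
```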
59 The measure is clearly informative: it reflects the way that t modifies the expectation of seeing ci in the corpus. [sent-86, score-0.405]
60 We hypothesise that when compared to the distribution p(ci), more informative words will have a more ‘distorted’ distribution p(ci|t) and that the KL divergence will reflect this. [sent-87, score-0.448]
61 6 Evaluation In Section 2, we defined semantic content as a notion encompassing various referential properties, including a basic concept of extension in cases where it is applicable. [sent-88, score-0.535]
62 However, we do not know of a dataset providing human judgements over the general informativeness of lexical items. [sent-89, score-0.188]
63 So in order to evaluate our proposed measure, we investigate its ability to retrieve the right ordering of hyponym pairs, which can be considered a subset of the issue at hand. [sent-90, score-0.148]
64 Our assumption is that if X is a hypernym of Y , then the information content in X will be lower than in Y (because it has a more ‘general’ meaning). [sent-91, score-0.277]
65 So, given a pair of words {w1, w2} in a known hyponymy relation, we should be able to tell which of w1 or w2 is the hypernym by computing the respective KL divergences. [sent-92, score-0.192]
66 We use the hypernym data provided by (Baroni et al, 2012) as testbed for our experiment. [sent-93, score-0.08]
67 Before running our system on the data, we make slight modifications to it. [sent-95, score-0.035]
68 First, as our distributions are created over the British National Corpus, some spellings must be converted to British English: for instance, color is replaced by colour. [sent-96, score-0.079]
69 Second, five of the nouns included in the test set are not in the BNC. [sent-97, score-0.038]
70 Those nouns are brethren, intranet, iPod, webcam and IX. [sent-98, score-0.038]
71 We then calculate both the self-information measure and the KL divergence of all terms in- 1 Note that the KL divergence is not symmetric: DKL(p(ci|t) ‖ p(ci)) is not necessarily equal to DKL(p(ci) ‖ p(ci|t)). [sent-104, score-0.461]
72 That is, we count a pair as correctly represented by our system if w1 is a hypernym of w2 and KL(w1) < KL(w2) (or, in the case of the baseline, SI(w1) < SI(w2) where SI is self-information). [sent-114, score-0.08]
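The evaluation criterion just described reduces to counting how often the hypernym receives the lower score. A minimal sketch (our own names), where `score` maps a word to its KL divergence or, for the baseline, its self-information:

```python
def ordering_precision(pairs, score):
    """Precision of hypernym-hyponym ordering.

    `pairs` is a list of (hypernym, hyponym) tuples; a pair counts as correct
    when the hypernym's score is strictly lower than the hyponym's.
    """
    correct = sum(1 for hyper, hypo in pairs if score[hyper] < score[hypo])
    return correct / len(pairs)
```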
73 8% precision on the task, with the KL divergence lagging a little behind with 79. [sent-116, score-0.204]
74 We analyse potential reasons for this disappointing result in the next section. [sent-119, score-0.112]
75 7 Error analysis It is worth reminding ourselves of the assumption we made with regard to semantic content. [sent-120, score-0.089]
76 Our hypothesis was that with a ‘more general’ target word t, the p(ci |t) distribution would be fairly similar to p(ci). [sent-121, score-0.127]
77 Manually checking some of the pairs which were wrongly classified by the KL divergence reveals that our hypothesis might not hold. [sent-122, score-0.258]
78 For example, the pair beer–beverage is classified incorrectly. [sent-123, score-0.169]
79 When looking at the beverage distribution, it is clear that it does not conform to our expectations: it shows high vi(t) weights along the food, wine, coffee and tea dimensions, for instance, i. [sent-124, score-0.169]
80 Although beverage is an umbrella word for many various types of drinks, speakers of English use it in very particular contexts. [sent-127, score-0.169]
81 The general point is that, as pointed out elsewhere in the literature (Erk, 2013), distributions are a good representation of (some aspects of) intension, but they are less apt to model extension. [sent-132, score-0.167]
82 So for instance, in the jewelry distribution, the most highly weighted context is mental (with vi (t) = 395. [sent-136, score-0.183]
83 While named entities could easily be eliminated from the system’s results by preprocessing the corpus with a named entity recogniser, the issue is not so simple when it comes to fixed phrases of a more compositional nature (e.g. [sent-138, score-0.053]
84 army ant): excluding them might be detrimental for the representation (it is, after all, part of the meaning of ant that it can be used metaphorically to refer to people) and identifying such phrases is a non-trivial problem in itself. [sent-140, score-0.139]
85 For instance, the word medium, found in the pair magazine–medium, can be synonymous with middle, clairvoyant, or mode of communication. [sent-142, score-0.076]
86 As distributions do not distinguish between senses, this will have an effect on our results. [sent-144, score-0.079]
87 8 Conclusion In this paper, we attempted to define a measure of distributional semantic content in order to model the fact that some words have a more general meaning than others. [sent-145, score-0.754]
88 We compared the Kullback-Leibler divergence to a simple self-information measure. [sent-146, score-0.167]
89 Our experiments, which involved retrieving the correct ordering of hyponym-hypernym pairs, had disappointing results: the KL divergence was unable to outperform self-information, and both measures misclassified around 20% of our test set. [sent-147, score-0.356]
90 First, distributions are unable to model extensional properties which, in many cases, account for the feeling that a word is more general than another. [sent-149, score-0.166]
91 Finally, dis- 4 Although it is more difficult to talk of the extension of e.g. [sent-151, score-0.119]
92 adverbials (very) or some adjectives (skillful), the general point is that text is biased towards a certain usage of words, while the general meaning a competent speaker ascribes to lexical items does not necessarily follow this bias. [sent-153, score-0.193]
93 To conclude, we would like to stress that we do not think another information-theoretic measure would perform hugely better than the KL divergence. [sent-155, score-0.088]
94 The point is that the nature of distributional vectors makes them sensitive to word usage and that, despite the general assumption behind distributional semantics, word usage might not suffice to model all aspects of lexical semantics. [sent-156, score-0.751]
95 We leave as an open problem the issue of whether a modified form of our ‘basic’ distributional vectors would encode the right information. [sent-157, score-0.328]
96 Acknowledgements This work was funded by a postdoctoral fellowship from the Alexander von Humboldt Foundation to the first author, and a Title A Fellowship from Trinity College, Cambridge, to the second author. [sent-158, score-0.046]
97 From context to meaning: Distributional models of the lexicon in linguistics and cognitive science (Special issue of the Italian Journal of Linguistics 20(1)), pages 55–88. [sent-163, score-0.134]
98 Vector space models of word meaning and phrase meaning: a survey. [sent-180, score-0.056]
99 Using information content to evaluate semantic similarity in a taxonomy. [sent-212, score-0.286]
100 From frequency to meaning: Vector space models of semantics. [sent-221, score-0.045]
wordName wordTfidf (topN-words)
[('ci', 0.317), ('kl', 0.291), ('distributional', 0.275), ('content', 0.197), ('beverage', 0.169), ('divergence', 0.167), ('intension', 0.153), ('baroni', 0.136), ('sandy', 0.125), ('notion', 0.118), ('hammer', 0.115), ('hyponymy', 0.112), ('mcnally', 0.101), ('dkl', 0.101), ('informativeness', 0.1), ('vi', 0.091), ('semantic', 0.089), ('british', 0.088), ('measure', 0.088), ('semantics', 0.086), ('hypernym', 0.08), ('distributions', 0.079), ('cfood', 0.076), ('clairvoyant', 0.076), ('contentful', 0.076), ('freqci', 0.076), ('freqtotal', 0.076), ('garside', 0.076), ('hypothesise', 0.076), ('searle', 0.076), ('selfinformation', 0.076), ('trinity', 0.076), ('cat', 0.075), ('erk', 0.074), ('distribution', 0.073), ('entropy', 0.073), ('lenci', 0.072), ('extension', 0.069), ('bernardi', 0.068), ('raffaella', 0.068), ('freqt', 0.068), ('xci', 0.068), ('encompassing', 0.062), ('disappointing', 0.062), ('burgess', 0.062), ('informative', 0.059), ('mohan', 0.059), ('lund', 0.059), ('composition', 0.057), ('meaning', 0.056), ('leech', 0.056), ('pet', 0.056), ('said', 0.055), ('hypothesis', 0.054), ('fifteen', 0.054), ('evert', 0.054), ('potsdam', 0.054), ('dog', 0.054), ('issue', 0.053), ('referent', 0.052), ('distributionally', 0.052), ('kim', 0.051), ('instance', 0.051), ('talk', 0.05), ('analyse', 0.05), ('general', 0.049), ('correspondence', 0.049), ('hyponym', 0.048), ('bnc', 0.048), ('mental', 0.047), ('ordering', 0.047), ('ant', 0.046), ('fellowship', 0.046), ('context', 0.045), ('frequency', 0.045), ('medium', 0.045), ('dimensions', 0.044), ('party', 0.044), ('notions', 0.044), ('marco', 0.042), ('retrieving', 0.042), ('discriminating', 0.041), ('curran', 0.041), ('judgements', 0.039), ('calculate', 0.039), ('usage', 0.039), ('pointed', 0.039), ('nouns', 0.038), ('thing', 0.038), ('unable', 0.038), ('concepts', 0.038), ('tool', 0.038), ('linguists', 0.037), ('might', 0.037), ('behind', 0.037), ('cognitive', 0.036), ('situations', 0.036), ('si', 0.036), ('equations', 0.036), ('modifications', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 238 acl-2013-Measuring semantic content in distributional vectors
Author: Aurelie Herbelot ; Mohan Ganesalingam
Abstract: Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more ‘precise’ than a group. In this paper, we propose to measure the ‘semantic content’ of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather ‘intensional’ aspect of distributions.
Author: Raffaella Bernardi ; Georgiana Dinu ; Marco Marelli ; Marco Baroni
Abstract: Distributional models of semantics capture word meaning very effectively, and they have been recently extended to account for compositionally-obtained representations of phrases made of content words. We explore whether compositional distributional semantic models can also handle a construction in which grammatical terms play a crucial role, namely determiner phrases (DPs). We introduce a new publicly available dataset to test distributional representations of DPs, and we evaluate state-of-the-art models on this set.
3 0.13933459 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
Author: Angeliki Lazaridou ; Marco Marelli ; Roberto Zamparelli ; Marco Baroni
Abstract: Speakers of a language can construct an unlimited number of new words through morphological derivation. This is a major cause of data sparseness for corpus-based approaches to lexical semantics, such as distributional semantic models of word meaning. We adapt compositional methods originally developed for phrases to the task of deriving the distributional meaning of morphologically complex words from their parts. Semantic representations constructed in this way beat a strong baseline and can be of higher quality than representations directly constructed from corpus data. Our results constitute a novel evaluation of the proposed composition methods, in which the full additive model achieves the best performance, and demonstrate the usefulness of a compositional morphology component in distributional semantics.
4 0.1289272 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Author: Olivier Ferret
Abstract: Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained from unsupervised selected examples for building a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
5 0.12514512 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domaingeneral lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.
6 0.12499893 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
7 0.11985739 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
8 0.10847215 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
9 0.10693494 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
10 0.096009172 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
11 0.095681906 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
12 0.093284406 116 acl-2013-Detecting Metaphor by Contextual Analogy
13 0.093197308 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
14 0.092082947 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
15 0.090187542 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
16 0.087627351 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
17 0.086305805 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
18 0.083977029 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
19 0.083038062 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
20 0.082454145 152 acl-2013-Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines
topicId topicWeight
[(0, 0.2), (1, 0.063), (2, 0.026), (3, -0.153), (4, -0.072), (5, -0.117), (6, -0.049), (7, 0.045), (8, -0.023), (9, 0.024), (10, -0.058), (11, 0.021), (12, 0.106), (13, -0.047), (14, -0.005), (15, 0.048), (16, -0.037), (17, -0.097), (18, -0.011), (19, -0.02), (20, -0.038), (21, 0.012), (22, 0.094), (23, -0.026), (24, -0.016), (25, 0.071), (26, -0.014), (27, 0.035), (28, -0.074), (29, 0.073), (30, -0.004), (31, 0.045), (32, 0.076), (33, -0.058), (34, 0.047), (35, 0.066), (36, 0.089), (37, 0.021), (38, 0.019), (39, 0.011), (40, 0.065), (41, -0.057), (42, -0.025), (43, -0.03), (44, 0.02), (45, 0.035), (46, 0.049), (47, -0.023), (48, 0.074), (49, -0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.95661312 238 acl-2013-Measuring semantic content in distributional vectors
Author: Aurelie Herbelot ; Mohan Ganesalingam
Abstract: Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more ‘precise’ than a group. In this paper, we propose to measure the ‘semantic content’ of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather ‘intensional’ aspect of distributions.
2 0.86838663 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
Author: Raffaella Bernardi ; Georgiana Dinu ; Marco Marelli ; Marco Baroni
Abstract: Distributional models of semantics capture word meaning very effectively, and they have been recently extended to account for compositionally-obtained representations of phrases made of content words. We explore whether compositional distributional semantic models can also handle a construction in which grammatical terms play a crucial role, namely determiner phrases (DPs). We introduce a new publicly available dataset to test distributional representations of DPs, and we evaluate state-of-the-art models on this set.
3 0.82712448 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
Author: Georgiana Dinu ; Nghia The Pham ; Marco Baroni
Abstract: We introduce DISSECT, a toolkit to build and explore computational models of word, phrase and sentence meaning based on the principles of distributional semantics. The toolkit focuses in particular on compositional meaning, and implements a number of composition methods that have been proposed in the literature. Furthermore, DISSECT can be useful to researchers and practitioners who need models of word meaning (without composition) as well, as it supports various methods to construct distributional semantic spaces, assessing similarity and even evaluating against benchmarks, that are independent of the composition infrastructure.
4 0.82198983 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Author: Olivier Ferret
Abstract: Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained from unsupervised selected examples for building a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
5 0.79066175 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
Author: Jan Snajder ; Sebastian Pado ; Zeljko Agic
Abstract: We report on the first structured distributional semantic model for Croatian, DM.HR. It is constructed after the model of the English Distributional Memory (Baroni and Lenci, 2010), from a dependency-parsed Croatian web corpus, and covers about 2M lemmas. We give details on the linguistic processing and the design principles. An evaluation shows state-of-the-art performance on a semantic similarity task with particularly good performance on nouns. The resource is freely available.
6 0.77881777 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
8 0.66502523 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
9 0.61496013 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
10 0.61126071 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
11 0.60921901 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
12 0.58928579 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
14 0.57864416 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
15 0.56711721 61 acl-2013-Automatic Interpretation of the English Possessive
16 0.55584735 116 acl-2013-Detecting Metaphor by Contextual Analogy
17 0.54694909 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
18 0.54388624 242 acl-2013-Mining Equivalent Relations from Linked Data
19 0.54360378 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
20 0.52988881 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
topicId topicWeight
[(0, 0.035), (6, 0.038), (11, 0.069), (15, 0.01), (24, 0.079), (26, 0.049), (35, 0.159), (42, 0.044), (48, 0.062), (64, 0.012), (70, 0.042), (88, 0.039), (89, 0.229), (90, 0.014), (95, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.8400104 238 acl-2013-Measuring semantic content in distributional vectors
Author: Aurelie Herbelot ; Mohan Ganesalingam
Abstract: Some words are more contentful than others: for instance, make is intuitively more general than produce and fifteen is more ‘precise’ than a group. In this paper, we propose to measure the ‘semantic content’ of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather ‘intensional’ aspect of distributions.
Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder
Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. The quantitative analysis that we conducted indicated that the integration of a discourse model increased the prediction accuracy results with respect to the discourse-agnostic approach and the qualitative analysis suggests that the induced representations encode a meaningful discourse structure.
Author: Yukari Ogura ; Ichiro Kobayashi
Abstract: In this paper, we propose a method to raise the accuracy of text classification based on latent topics, reconsidering the techniques necessary for good classification: for example, to decide important sentences in a document, the sentences with important words are usually regarded as important sentences. In this case, tf.idf is often used to decide important words. On the other hand, we apply the PageRank algorithm to rank important words in each document. Furthermore, before clustering documents, we refine the target documents by representing them as a collection of important sentences in each document. We then classify the documents based on latent information in the documents. As a clustering method, we employ the k-means algorithm and investigate how our proposed method works for good clustering. We conduct experiments with the Reuters-21578 corpus under various conditions of important sentence extraction, using latent and surface information for clustering, and have confirmed that our proposed method provides a better result among various conditions for clustering.
4 0.67124915 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
Author: K. Tamsin Maxwell ; Jon Oberlander ; W. Bruce Croft
Abstract: Techniques that compare short text segments using dependency paths (or simply, paths) appear in a wide range of automated language processing applications including question answering (QA). However, few models in ad hoc information retrieval (IR) use paths for document ranking due to the prohibitive cost of parsing a retrieval collection. In this paper, we introduce a flexible notion of paths that describe chains of words on a dependency path. These chains, or catenae, are readily applied in standard IR models. Informative catenae are selected using supervised machine learning with linguistically informed features and compared to both non-linguistic terms and catenae selected heuristically with filters derived from work on paths. Automatically selected catenae of 1-2 words deliver significant performance gains on three TREC collections.
5 0.67112601 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
Author: Karl Moritz Hermann ; Phil Blunsom
Abstract: Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is a fundamental task of Natural Language Processing. In this paper we draw upon recent advances in the learning of vector space representations of sentential semantics and the transparent interface between syntax and semantics provided by Combinatory Categorial Grammar to introduce Combinatory Categorial Autoencoders. This model leverages the CCG combinatory operators to guide a non-linear transformation of meaning within a sentence. We use this model to learn high dimensional embeddings for sentences and evaluate them in a range of tasks, demonstrating that the incorporation of syntax allows a concise model to learn representations that are both effective and general.
6 0.67037576 121 acl-2013-Discovering User Interactions in Ideological Discussions
7 0.66997963 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
8 0.66965747 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
9 0.66774237 172 acl-2013-Graph-based Local Coherence Modeling
10 0.66598797 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
11 0.66133833 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
12 0.66043937 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
13 0.66041058 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
14 0.65886897 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
15 0.65881205 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
16 0.65818971 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
17 0.65726519 351 acl-2013-Topic Modeling Based Classification of Clinical Reports
18 0.65660101 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
19 0.65522552 318 acl-2013-Sentiment Relevance
20 0.65477097 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple