emnlp emnlp2012 emnlp2012-107 knowledge-graph by maker-knowledge-mining

107 emnlp-2012-Polarity Inducing Latent Semantic Analysis


Source: pdf

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus (a word sense along with its synonyms and antonyms) is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11 points absolute in F measure.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. [sent-2, score-0.623]

2 We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. [sent-3, score-1.335]

3 We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). [sent-4, score-0.476]

4 Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. [sent-5, score-0.914]

5 The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. [sent-6, score-0.242]

6 ... represented as a vector in a multi-dimensional continuous space, and a similarity metric such as cosine similarity can be used to measure the relatedness of different items. [sent-12, score-0.433]

7 At the word level, vector representations have been used to measure word similarity (Deerwester et al. [sent-19, score-0.162]

8 For example, latent semantic analysis might assign a high degree of similarity to opposites as well as synonyms (Landauer and Laham, 1998; Landauer, 2002). [sent-23, score-0.448]

9 At the simplest level, we may wish to distinguish between synonyms and antonyms, which can be further differentiated. [sent-25, score-0.203]

10 Moreover, Cruse (1986) notes that numerous kinds of antonymy are possible, for example antipodal pairs like “top-bottom” or ... [sent-28, score-0.205]

11 Despite the existence of a large amount of related work in the literature, distinguishing synonyms and antonyms is still considered as a difficult open problem in general (Poon and Domingos, 2009). [sent-35, score-0.504]

12 In this paper, we fuse these two strands of research in an attempt to develop a vector space representation in which the synonymy and antonymy are naturally differentiated. [sent-36, score-0.398]

13 (2002) in requiring a representation in which two lexical items in an antonymy relation should lie at opposite ends of an axis. [sent-38, score-0.37]

14 However, in contrast to the logical axes used previously, we desire that antonyms should lie at the opposite ends of a sphere lying in a continuous and automatically induced vector space. [sent-39, score-0.528]

15 The result is a vector space representation in which synonyms cluster together, and the opposites of a word tend to cluster together at the opposite end of a sphere. [sent-42, score-0.561]

16 First, by finding the items most and least similar to a word, we are able to discover new synonyms and antonyms. [sent-44, score-0.203]

17 Thirdly, as we discuss in Section 6, it is straightforward to embed new words into the derived subspace by using information from a large unsupervised text corpus such as Wikipedia. [sent-46, score-0.19]

18 2 Related Work The detection of antonymy has been studied in a number of previous papers. [sent-56, score-0.205]

19 (2008) approach the problem by combining information from a published thesaurus with corpus statistics derived from the Google n-gram corpus (Brants and Franz, 2006). [sent-58, score-0.329]

20 Words belonging to contrasting categories are treated as antonyms and the degree of contrast is determined by distributional similarity. [sent-66, score-0.389]

21 , 2011) to include a study of antonymy based on crowd-sourcing experiments. [sent-70, score-0.205]

22 Turney (2008) proposes a unified approach to handling analogies, synonyms, antonyms and associations by transforming the last three cases into cases of analogy. [sent-71, score-0.301]

23 (2003) builds on (Lin, 1998) and identifies antonyms as semantically related words which also happen to be found together in a database in pre-identified phrases indicating opposition. [sent-75, score-0.369]

24 (2003) further note that whereas synonyms will tend to translate to the same word in another language, antonyms will not. [sent-77, score-0.543]

25 This observation is used to select antonyms from amongst distributionally similar words. [sent-78, score-0.341]

26 The automatic detection of synonyms has been more extensively studied. [sent-81, score-0.203]

27 , 1998) to answer word similarity questions derived from the Test of English as a Foreign Language (TOEFL). [sent-93, score-0.242]

28 Turney (2001) proposes the use of point-wise mutual information in conjunction with LSA, and again presents results on synonym questions derived from the TOEFL. [sent-94, score-0.171]

29 These documents may be actual documents such as newspaper articles, or simply notional documents such as sentences, or any other collection in which words are grouped together. [sent-101, score-0.176]

30 The similarity between two documents can be computed using the cosine similarity of their corresponding row vectors: sim(x, y) = (x · y) / (‖x‖ ‖y‖). Similarly, the cosine similarity of two column vectors can be used to judge the similarity of the corresponding words. [sent-106, score-0.757]
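
The formula above is straightforward to compute; the snippet below is a minimal sketch in Python/NumPy (the function name is illustrative, not from the paper).

```python
import numpy as np

def cosine_sim(x, y):
    # sim(x, y) = x . y / (||x|| ||y||); the epsilon guards against zero-length vectors
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))
```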

31 An important property of SVD is that the columns of SVᵀ, which now represent the words, behave similarly to the original columns of W, in the sense that the cosine similarity between two columns in SVᵀ approximates the cosine similarity between the corresponding columns in W. [sent-113, score-0.835]

32 For efficiency, we normalize the columns of SVᵀ to unit length, allowing the cosine similarity between two words to be computed with a single dot-product; this also has the property of mapping each word to a point on a multidimensional sphere. [sent-116, score-0.351]
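
As an illustration of the SVD step and the unit-length normalization just described, here is a minimal sketch assuming a dense document-term matrix and plain NumPy; it is not the authors' implementation.

```python
import numpy as np

def lsa_word_vectors(W, k):
    """Truncated SVD of the document-term matrix W (documents x terms).

    Returns the columns of S V^T restricted to the top-k singular values,
    normalized to unit length so that word-word cosine similarity reduces
    to a single dot product (each word becomes a point on a sphere).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    word_vecs = np.diag(s[:k]) @ Vt[:k, :]               # k x vocabulary_size
    norms = np.linalg.norm(word_vecs, axis=0, keepdims=True)
    return word_vecs / np.maximum(norms, 1e-12)          # unit-length columns
```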

33 1 Limitation of LSA Word similarity as determined by LSA assigns high values to words which tend to co-occur in documents. [sent-120, score-0.172]

34 However, as noted by (Landauer and Laham, 1998; Landauer, 2002), there is no notion of antonymy; words with low or negative cosine scores are simply unrelated. [sent-121, score-0.169]

35 In comparison, words with high cosine similarity scores are typically semantically related, which includes both synonyms and antonyms, as contrasting words often cooccur (Murphy and Andrew, 1993; Mohammed et al. [sent-122, score-0.588]

36 To illustrate this, we have performed SVD with the aid of the Encarta thesaurus developed by Bloomsbury Publishing Plc. [sent-124, score-0.361]

37 This thesaurus contains approximately 47k word senses and a vocabulary of 50k words and phrases. [sent-125, score-0.411]

38 Each “document” is taken to be the thesaurus entry for a word-sense, including synonyms and antonyms. [sent-126, score-0.567]

39 Table 1 shows some words, their original thesaurus documents, and the most and least similar words in the LSA subspace. [sent-131, score-0.364]

40 For example, “meritorious” for “admirable” is arguably better than any of the words given in the thesaurus itself. [sent-133, score-0.364]

41 • Similarity is based on co-occurrence, so the co-occurrence of antonyms in the thesaurus-derived documents induces their presence as LSA-similar words. [sent-134, score-0.348]

42 In the case of “mourning,” opposites ... Table 2: The W matrix for two thesaurus entries in its original form (rows “acrimony” and “affection”; columns acrimony, rancor, goodwill, affection; every entry is 1). [sent-136, score-0.651]

43 Table 3: The W matrix for two thesaurus entries in its polarity-inducing form (synonym entries keep weight 1, antonym entries are negated to -1). [sent-138, score-0.577]

44 In the next section, we will present a method for inducing polarity in LSA subspaces, where opposite words will tend to have negative cosine similarities, analogous to the positive similarities of synonyms. [sent-142, score-0.443]

45 4 Polarity Inducing LSA We modify LSA so that we may exploit a thesaurus to embed meaningful axes in the induced subspace representation. [sent-144, score-0.524]

46 Words with opposite meaning will lie at opposite positions on a sphere. [sent-145, score-0.207]

47 Recall that the cosine similarity between word-vectors in the original matrix W is preserved in the subspace representation of words. [sent-146, score-0.475]

48 Thus, if we construct the original matrix so that the columns representing antonyms will tend to have negative cosine similarities while columns representing synonyms will tend to have positive similarities, we will achieve the desired behavior. [sent-147, score-1.028]

49 This can be achieved by negating the TF-IDF entries for the antonyms of a word when constructing W from the thesaurus, which is illustrated in Tables 2 and 3. [sent-148, score-0.342]
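
The sign assignment of Tables 2 and 3 can be sketched as below on two made-up thesaurus entries, with the TF-IDF weight simplified to 1; the data structure and names are hypothetical rather than the paper's actual format. Feeding such a signed matrix into the SVD sketch above yields word vectors whose synonym columns have cosine similarity near 1 and whose antonym columns sit near -1.

```python
import numpy as np

# Hypothetical thesaurus entries: each word sense lists its synonyms and antonyms.
entries = {
    "acrimony":  {"syn": ["acrimony", "rancor"],    "ant": ["goodwill", "affection"]},
    "affection": {"syn": ["affection", "goodwill"], "ant": ["acrimony", "rancor"]},
}

vocab = sorted({w for e in entries.values() for w in e["syn"] + e["ant"]})
col = {w: j for j, w in enumerate(vocab)}

# One row ("document") per thesaurus entry; synonyms get +weight, antonyms -weight.
W = np.zeros((len(entries), len(vocab)))
for i, entry in enumerate(entries.values()):
    for w in entry["syn"]:
        W[i, col[w]] = 1.0    # TF-IDF weight simplified to 1 for illustration
    for w in entry["ant"]:
        W[i, col[w]] = -1.0   # polarity-inducing sign flip for antonyms
```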

50 The two rows in these tables correspond to thesaurus entries for the sense-categories “acrimony” and “affection.” [sent-149, score-0.37]

51 The thesaurus entries induce two “documents” containing the words and their synonyms and antonyms. [sent-150, score-0.608]

52 Note that the cosine similarity between every pair of words (columns) is 1. [sent-156, score-0.267]

53 Here, the cosine similarity between synonymous words (columns) is 1, and the cosine similarity between antonymous words is -1. [sent-158, score-0.534]

54 Since LSA tends to preserve cosine similarities between words, in the resulting subspace we may expect to find meaningful axes, where opposite senses map to opposite extremes. [sent-159, score-0.455]

55 In this section, we see that when supervised training data is available, the projection matrix of LSA can be enhanced through a discriminative training technique explicitly designed to create a representation suited to a specific task. [sent-169, score-0.197]

56 Given a d-by-1 input vector f, the model of S2Net is a d-by-k matrix A = [aij]d×k, which maps f to a k-by-1 output vector g = Aᵀf. [sent-177, score-0.224]

57 The objective of the training process is to assign higher cosine similarities to these pairs compared to others. [sent-181, score-0.182]

58 Given a projection matrix A, the similarity score of two raw input vectors fp and fq is defined as the cosine similarity of their projected vectors, simA(fp, fq) = cosine(Aᵀfp, Aᵀfq). [sent-183, score-0.253]

59 Let ∆ij = simA(fpi, fqi) − simA(fpi, fqj) be the difference of the similarity scores of (fpi, fqi) and (fpi, fqj). [sent-185, score-0.356]

60 We first sample pairs of antonyms from the thesaurus to create the training data. [sent-194, score-0.63]

61 The raw input vector f of a selected word is its corresponding column vector of the document-term matrix W (Section 3) after inducing polarity (Section 4). [sent-195, score-0.327]

62 When each pair of vectors in the training data represents two antonyms, we can redefine ∆ij by flipping the sign, ∆ij = simA(fpi, fqj) − simA(fpi, fqi), and leave others unchanged. [sent-196, score-0.179]

63 As the loss function encourages ∆ij to be larger, an antonym pair will tend to have a smaller cosine similarity than other pairs. [sent-197, score-0.366]
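
A minimal sketch of the projected similarity simA and the sign-flipped ∆ described in the last few sentences is given below; the function names are illustrative, and the actual S2Net loss and gradient updates are not reproduced here.

```python
import numpy as np

def sim_A(A, f1, f2):
    # Cosine similarity of two raw input vectors after projection g = A^T f
    g1, g2 = A.T @ f1, A.T @ f2
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

def delta_ij(A, f_p, f_q_paired, f_q_other, antonym_pair=False):
    """Delta = sim_A(f_p, f_q_paired) - sim_A(f_p, f_q_other).

    For antonym training pairs the sign is flipped, so a loss that pushes the
    delta up ends up pushing the antonym pair's cosine similarity down.
    """
    d = sim_A(A, f_p, f_q_paired) - sim_A(A, f_p, f_q_other)
    return -d if antonym_pair else d
```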

64 6 Extending PILSA to Out-of-thesaurus Words While PILSA is effective at representing synonym and antonym information, in its pure form, it is limited to the vocabulary of the thesaurus. [sent-200, score-0.201]

65 For example, although the Encarta thesaurus does not have the word “corruptibility,” it does contain other forms like “corruptible” and “corruption. [sent-207, score-0.329]

66 When both morphological analysis and stemming fail to find a match, a simple rule is applied that checks whether removing hyphens from the word leads to a match and whether the target word occurs as part of a compound word in the thesaurus. [sent-212, score-0.437]
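
A rough sketch of such a fallback rule is shown below, assuming morphological analysis and stemming have already failed; the exact heuristics in the paper may differ, and the function is purely illustrative.

```python
def lexical_fallback_match(word, thesaurus_vocab):
    """Try the de-hyphenated form first, then look for the target word inside a
    longer (compound) thesaurus entry."""
    dehyphenated = word.replace("-", "")
    if dehyphenated in thesaurus_vocab:
        return dehyphenated
    for entry in thesaurus_vocab:
        if len(word) > 3 and word != entry and word in entry:
            return entry   # e.g. the target occurs as part of a compound word
    return None
```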

67 2 Leveraging General Text Data If no words in the thesaurus can be linked to the target word through the simple lexical analysis procedure, we try to find matched words by creating a context vector space model from a large document collection, and then mapping from this space to the PILSA space. [sent-216, score-0.658]

68 When a word is not in the thesaurus but appears in the corpus, we predict its PILSA vector representation from the context vector space model by using its k-nearest neighbors which are in the thesaurus and consistent with each other. [sent-219, score-0.921]

69 The semantic similarity/relatedness of two words can then be determined using the cosine similarity of their corresponding LSA word vectors. [sent-226, score-0.304]

70 In the following text, we refer to this LSA context vector space model as the corpus space, in contrast to the PILSA thesaurus space. [sent-227, score-0.448]

71 2 Embedding Out-of-Vocabulary Words Given the context space model, we may use a linear regression or a k-nearest neighbors approach to embed out-of-thesaurus words into the thesaurusspace representation. [sent-230, score-0.178]

72 However, as near words in the context space may be synonyms in addition to other semantically related words (including antonyms), such approaches can potentially be noisy. [sent-231, score-0.361]

73 An affine transform cannot “tear space” and map them to opposite poles in the thesaurus space. [sent-233, score-0.413]
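
The k-nearest-neighbor embedding preferred here can be sketched as follows, assuming two dictionaries of word vectors (a corpus/context space and the PILSA space); the centroid step is a simplification, the consistency check among neighbors mentioned above is omitted, and all names are illustrative.

```python
import numpy as np

def embed_oov_word(context_vec, corpus_space, pilsa_space, k=5):
    """Embed an out-of-thesaurus word: find its k nearest in-thesaurus neighbors
    in the corpus (context) space, then return the unit-normalized centroid of
    their PILSA vectors as its thesaurus-space representation."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    candidates = [w for w in corpus_space if w in pilsa_space]
    neighbors = sorted(candidates,
                       key=lambda w: cos(context_vec, corpus_space[w]),
                       reverse=True)[:k]
    centroid = np.mean([pilsa_space[w] for w in neighbors], axis=0)
    return centroid / (np.linalg.norm(centroid) + 1e-12)
```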

74 1 Data Resources The primary thesaurus we use is the Encarta Thesaurus developed by Bloomsbury Publishing Plc4. [sent-247, score-0.329]

75 Each synset in WordNet maps to a row in the document-term matrix; synonyms in a synset are weighted with positive TF-IDF values, and antonyms with negative TF-IDF values. [sent-250, score-0.504]

76 Each question contains a target word and five choices, and asks which of the choice words has the most opposite meaning to the target word. [sent-260, score-0.197]

77 Out of the 162 questions, using the Bloomsbury thesaurus data we are able to answer 153 of them. [sent-270, score-0.361]

78 ...sive test, for example one which would require the use of sentence context to choose between related yet distributionally different antonyms (e.g. [sent-275, score-0.04]

79 “little, small” as antonyms of “big”) but chose to stick to a previously used benchmark. [sent-277, score-0.301]

80 Some of these questions contain very rarely used target or choice words, which are not included in the thesaurus vocabulary. [sent-279, score-0.48]

81 When the target word is in vocabulary but one or more choices are unknown words, we ignore those unknown words and pick the word with the lowest cosine similarity from the rest as the answer. [sent-282, score-0.353]
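
The decision rule just described amounts to picking the least-similar known choice; the sketch below assumes a dictionary of PILSA word vectors, and the function name is made up.

```python
import numpy as np

def answer_antonym_question(target, choices, vectors):
    """Return the choice with the lowest cosine similarity to the target
    (antonyms sit near -1 in the PILSA space); unknown choices are skipped."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    if target not in vectors:
        return None
    known = [c for c in choices if c in vectors]
    if not known:
        return None
    return min(known, key=lambda c: cos(vectors[target], vectors[c]))
```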

82 The results of our methods are reported in precision (the number of questions answered correctly divided by the number of questions attempted), recall (the number of questions answered correctly divided by the number of all questions) and F1 (the harmonic mean of precision and recall). [sent-283, score-0.402]
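
The three scores defined above reduce to a few lines of code; the example numbers reuse figures quoted elsewhere in this summary (100 correct answers out of 153 attempted, 162 questions in total) purely as an illustration.

```python
def gre_scores(num_correct, num_attempted, num_total):
    """Precision = correct/attempted, recall = correct/total, F1 = harmonic mean."""
    precision = num_correct / num_attempted if num_attempted else 0.0
    recall = num_correct / num_total if num_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# e.g. 100 correct out of 153 attempted (162 questions in total):
# precision ~0.65, recall ~0.62, F1 ~0.63
print(gre_scores(100, 153, 162))
```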

83 , 2008) as when their system “did not find any evidence of antonymy between the target and any of its alternatives, then it refrained from attempting that question.” [sent-286, score-0.244]

84 In contrast, the larger vocabulary in WordNet helps the system answer 160 questions but the quality is not as good. [sent-296, score-0.191]

85 We find dimensions 300 and 400 are equally good, where both answer 100 questions correctly (0. [sent-297, score-0.181]

86 In contrast, the WordNet-based methods (Lines 1–3) attempted 936 ... Note that the number of questions attempted is not a function of the number of dimensions. [sent-308, score-0.188]

87 For example, directly using the antonym sets in the Bloomsbury thesaurus gives 0. [sent-318, score-0.424]

88 59 in F1 (Line 4), while using cosine similarity on the signed vectors prior to LSA only reaches 0. [sent-319, score-0.329]

89 As described in Section 5, we take the PILSA projection matrix as the initial model in S2Net and train the model using 20,517 pairs of antonyms sampled from the Bloomsbury thesaurus. [sent-323, score-0.456]

90 Using the Bloomsbury PILSA-S2Net thesaurus space and the Wikipedia corpus space, our method increases recall by 3 points on the test set. [sent-331, score-0.384]

91 We notice that the out-of-thesaurus words are either offensive words excluded in the thesaurus (e.g. [sent-334, score-0.399]

92 When the lexical analysis procedure fails to match the target word to some in-thesaurus words, the context vector embedding approach solves the former case, but has difficulty in handling the latter. [sent-339, score-0.169]

93 8 Conclusion In this paper we have tackled the problem of finding a vector-space representation of words where, by construction, synonyms and antonyms are easy to distinguish. [sent-343, score-0.581]

94 Specifically, we have defined a way of assigning sign to the entries in the co-occurrence matrix on which LSA operates, such that synonyms will tend to have positive cosine similarity, and antonyms will tend to have negative similarities. [sent-344, score-0.853]

95 To the best of our knowledge, our method of inducing polarity to the document-term matrix before applying LSA is novel and has been shown to effectively preserve and generalize the synonymous/antonymous information in the projected space. [sent-345, score-0.199]

96 A study on similarity and relatedness using distributional and wordnet-based approaches. [sent-353, score-0.175]

97 Using wordnet similarity and antonymy relations to aid document retrieval. [sent-388, score-0.429]

98 The conceptual basis of antonymy and synonymy in adjectives. [sent-479, score-0.237]

99 Mining the web for synonyms: Pmi-ir versus lsa on toefl. [sent-541, score-0.33]

100 Finding synonyms using automatic word alignment and measures of distributional similarity. [sent-559, score-0.241]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pilsa', 0.331), ('lsa', 0.33), ('thesaurus', 0.329), ('antonyms', 0.301), ('mohammed', 0.205), ('antonymy', 0.205), ('synonyms', 0.203), ('bloomsbury', 0.147), ('cosine', 0.134), ('fpi', 0.129), ('gre', 0.129), ('turney', 0.124), ('questions', 0.112), ('subspace', 0.105), ('landauer', 0.098), ('similarity', 0.098), ('matrix', 0.096), ('antonym', 0.095), ('plas', 0.092), ('ij', 0.089), ('columns', 0.084), ('opposite', 0.084), ('deerwester', 0.079), ('sima', 0.079), ('fqj', 0.074), ('inthesaurus', 0.074), ('opposites', 0.074), ('svt', 0.074), ('platt', 0.071), ('embedding', 0.066), ('vector', 0.064), ('projection', 0.059), ('synonym', 0.059), ('polarity', 0.056), ('acrimony', 0.055), ('admirable', 0.055), ('fqi', 0.055), ('space', 0.055), ('salton', 0.053), ('der', 0.051), ('vectors', 0.05), ('embed', 0.05), ('littman', 0.05), ('contrasting', 0.05), ('similarities', 0.048), ('wordnet', 0.048), ('signed', 0.047), ('saif', 0.047), ('encarta', 0.047), ('vocabulary', 0.047), ('documents', 0.047), ('inducing', 0.047), ('document', 0.046), ('yih', 0.043), ('representation', 0.042), ('entries', 0.041), ('distributionally', 0.04), ('axes', 0.04), ('pca', 0.04), ('lie', 0.039), ('relatedness', 0.039), ('target', 0.039), ('tend', 0.039), ('distributional', 0.038), ('wikipedia', 0.038), ('neighbors', 0.038), ('attempted', 0.038), ('semantic', 0.037), ('hot', 0.037), ('centroid', 0.037), ('dimensions', 0.037), ('cfr', 0.037), ('corgo', 0.037), ('iemctio', 0.037), ('ijth', 0.037), ('jarmasz', 0.037), ('laham', 0.037), ('lonneke', 0.037), ('minnen', 0.037), ('opca', 0.037), ('rancor', 0.037), ('schwab', 0.037), ('scorching', 0.037), ('wtw', 0.037), ('peter', 0.037), ('latent', 0.036), ('words', 0.035), ('entry', 0.035), ('morphological', 0.034), ('svd', 0.033), ('answered', 0.033), ('semantically', 0.033), ('answer', 0.032), ('synonymy', 0.032), ('aid', 0.032), ('moens', 0.032), ('goodwill', 0.032), ('cold', 0.032), ('aaai', 0.031), ('lin', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus (a word sense along with its synonyms and antonyms) is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11 points absolute in F measure.

2 0.10610341 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

Author: William Blacoe ; Mirella Lapata

Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.

3 0.08954455 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

Author: Keith Stevens ; Philip Kegelmeyer ; David Andrzejewski ; David Buttler

Abstract: We apply two new automated semantic evaluations to three distinct latent topic models. Both metrics have been shown to align with human evaluations and provide a balance between internal measures of information gain and comparisons to human ratings of coherent topics. We improve upon the measures by introducing new aggregate measures that allows for comparing complete topic models. We further compare the automated measures to other metrics for topic models, comparison to manually crafted semantic tests and document classification. Our experiments reveal that LDA and LSA each have different strengths; LDA best learns descriptive topics while LSA is best at creating a compact semantic representation ofdocuments and words in a corpus.

4 0.088364854 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng

Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.

5 0.076416343 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Takuya Kawada ; Stijn De Saeger ; Jun'ichi Kazama ; Yiou Wang

Abstract: In this paper we explore the utility of sentiment analysis and semantic word classes for improving why-question answering on a large-scale web corpus. Our work is motivated by the observation that a why-question and its answer often follow the pattern that if something undesirable happens, the reason is also often something undesirable, and if something desirable happens, the reason is also often something desirable. To the best of our knowledge, this is the first work that introduces sentiment analysis to non-factoid question answering. We combine this simple idea with semantic word classes for ranking answers to why-questions and show that on a set of 850 why-questions our method gains 15.2% improvement in precision at the top-1 answer over a baseline state-of-the-art QA system that achieved the best performance in a shared task of Japanese non-factoid QA in NTCIR-6.

6 0.075616546 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

7 0.067323692 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

8 0.064361162 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

9 0.063308619 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

10 0.060065277 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

11 0.060004141 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

12 0.058321159 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

13 0.057647679 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

14 0.057241216 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

15 0.053739835 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification

16 0.052344475 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

17 0.050710723 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

18 0.049457945 97 emnlp-2012-Natural Language Questions for the Web of Data

19 0.046049178 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

20 0.045760654 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.18), (1, 0.078), (2, -0.022), (3, 0.133), (4, 0.04), (5, 0.05), (6, 0.053), (7, 0.056), (8, 0.133), (9, -0.009), (10, -0.125), (11, -0.015), (12, 0.031), (13, 0.015), (14, 0.047), (15, -0.044), (16, 0.058), (17, 0.069), (18, 0.08), (19, 0.028), (20, -0.004), (21, 0.037), (22, 0.037), (23, 0.127), (24, 0.03), (25, 0.0), (26, -0.023), (27, -0.02), (28, 0.093), (29, 0.074), (30, -0.028), (31, 0.016), (32, 0.057), (33, 0.065), (34, 0.028), (35, 0.082), (36, 0.185), (37, 0.164), (38, 0.083), (39, 0.007), (40, -0.193), (41, -0.017), (42, 0.048), (43, 0.077), (44, 0.015), (45, 0.051), (46, -0.089), (47, -0.04), (48, 0.064), (49, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93336189 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus (a word sense along with its synonyms and antonyms) is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11 points absolute in F measure.

2 0.57297397 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret

Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.

3 0.49419963 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

Author: William Blacoe ; Mirella Lapata

Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.

4 0.46920374 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng

Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.

5 0.46390039 44 emnlp-2012-Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web

Author: Chikara Hashimoto ; Kentaro Torisawa ; Stijn De Saeger ; Jong-Hoon Oh ; Jun'ichi Kazama

Abstract: We propose a new semantic orientation, Excitation, and its automatic acquisition method. Excitation is a semantic property of predicates that classifies them into excitatory, inhibitory and neutral. We show that Excitation is useful for extracting contradiction pairs (e.g., destroy cancer ⊥ develop cancer) and causality pairs (e.g., increase in crime ⇒ heighten anxiety). Our experiments show that with automatically acquired Excitation knowledge we can extract one million contradiction pairs and 500,000 causality pairs with about 70% precision from a 600 million page Web corpus. Furthermore, by combining these extracted causality and contradiction pairs, we can generate one million plausible causality hypotheses that are not written in any single sentence in our corpus with reasonable precision.

6 0.44200626 61 emnlp-2012-Grounded Models of Semantic Representation

7 0.43734208 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

8 0.42023468 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

9 0.41816172 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

10 0.39618415 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

11 0.39434564 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

12 0.36479643 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

13 0.34595674 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

14 0.34344521 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

15 0.33761376 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

16 0.31759027 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories

17 0.31131846 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

18 0.30674523 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

19 0.2914072 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

20 0.28767249 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.011), (14, 0.011), (16, 0.028), (25, 0.017), (34, 0.075), (41, 0.011), (45, 0.026), (60, 0.083), (63, 0.05), (64, 0.031), (65, 0.032), (70, 0.027), (73, 0.013), (74, 0.041), (76, 0.071), (79, 0.011), (80, 0.018), (86, 0.042), (95, 0.041), (97, 0.272)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74991298 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus (a word sense along with its synonyms and antonyms) is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11 points absolute in F measure.

2 0.64560717 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

Author: Yang Feng ; Yang Liu ; Qun Liu ; Trevor Cohn

Abstract: Decoding algorithms for syntax based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context free grammar. Left-to-right decoding, which generates the target string in order, can improve decoding efficiency by simplifying the language model evaluation. This paper presents a novel left to right decoding algorithm for tree-to-string translation, using a bottom-up parsing strategy and dynamic future cost estimation for each partial translation. Our method outperforms previously published tree-to-string decoders, including a competing left-to-right method.

3 0.49503255 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

4 0.48919123 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

5 0.48117724 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.

6 0.4789792 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

7 0.47886825 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

8 0.4774237 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

9 0.4754658 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

10 0.47519156 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

11 0.47511867 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

12 0.47414911 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

13 0.47366109 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

14 0.47312337 120 emnlp-2012-Streaming Analysis of Discourse Participants

15 0.47288901 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

16 0.47269264 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

17 0.47244683 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

18 0.47217405 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

19 0.47141337 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

20 0.47066361 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM