emnlp emnlp2012 emnlp2012-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
Reference: text
sentIndex sentText sentNum sentScore
1 However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. [sent-7, score-0.338]
2 We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. [sent-8, score-0.858]
3 Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. [sent-9, score-0.758]
4 This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. [sent-10, score-0.555]
5 In these models the meaning of a word is encoded as a vector computed from co-occurrence statistics of a word and its neighboring words. [sent-16, score-0.226]
6 Figure 1: A recursive neural network which learns semantic vector representations of phrases in a tree structure. [sent-21, score-0.732]
7 Recently, there has been much progress in capturing compositionality in vector spaces, e. [sent-29, score-0.277]
8 We present a novel recursive neural network model for semantic compositionality. [sent-35, score-0.402]
9 In our context, compositionality is the ability to learn compositional vector representations for various types of phrases and sentences of arbitrary length. [sent-36, score-0.667]
10 The matrix captures how it modifies the meaning of the other word that it combines with. [sent-42, score-0.225]
11 Since the model uses the MV representation with a neural network as the final merging function, we call our model a matrix-vector recursive neural network (MV-RNN). [sent-44, score-0.472]
12 We show that the ability to capture semantic compositionality in a syntactically plausible way translates into state of the art performance on various tasks. [sent-45, score-0.3]
13 The task is to predict a sentiment distribution over movie reviews of adverb-adjective pairs such as unbelievably sad or really awesome. [sent-47, score-0.592]
14 The MV-RNN is the only model that is able to properly negate sentiment when adjectives are combined with not. [sent-48, score-0.252]
15 The MV-RNN outperforms previous state of the art models on full sentence sentiment prediction of movie reviews. [sent-49, score-0.379]
16 2 MV-RNN: A Recursive Matrix-Vector Model The dominant approach for building representations of multi-word units from single word vector representations has been to form a linear combination of the single word representations, such as a sum or weighted average. [sent-59, score-0.461]
17 These approaches can work well when the meaning of a text is literally “the sum of its parts”, but fail when words function as operators that modify the meaning of another word: the meaning of “extremely strong” cannot be captured as the sum of word representations for “extremely” and “strong.” [sent-61, score-0.673]
18 Socher et al. (2011c) provided a new possibility for moving beyond a linear combination, through use of a matrix W that multiplied the word vectors (a, b), and a nonlinearity function g (such as a sigmoid or tanh). [sent-63, score-0.449]
19 This yields p = g(W[a; b]) (1); they apply this function recursively inside a binarized parse tree so that it can compute vectors for multiword sequences. [sent-68, score-0.324]
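As a concrete illustration (not the authors' released code), the single-matrix composition of Eq. (1) can be sketched as follows; the vector size, initialization, and choice of tanh are illustrative assumptions.

    import numpy as np

    n = 10                                          # assumed word vector size
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((n, 2 * n))      # single shared composition matrix

    def compose(a, b, W=W):
        """Eq. (1): parent vector p = g(W [a; b]) with g = tanh."""
        return np.tanh(W @ np.concatenate([a, b]))

    # Applied recursively, e.g. compose(compose(v1, v2), v3) for a left-branching tree.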
20 Even though the nonlinearity allows the model to express a wider range of functions, it is almost certainly too much to expect a single fixed W matrix to be able to capture the meaning combination effects of all natural language operators. [sent-69, score-0.369]
21 Recent work has started to capture the behavior of natural language operators inside semantic vector spaces by modeling them as matrices, which would allow a matrix for “extremely” to appropriately modify vectors for “smelly” or “strong” (Baroni and Zamparelli, 2010; Zanzotto et al. [sent-71, score-0.769]
22 These approaches are along the right lines but so far have been restricted to capturing linear functions of pairs of words, whereas we would like nonlinear functions to compute compositional meaning representations for multi-word phrases or full sentences. [sent-73, score-0.635]
23 The MV-RNN combines the strengths of both of these ideas by (i) assigning a vector and a matrix to every word and (ii) learning an input-specific, nonlinear, compositional function for computing vector and matrix representations for multi-word sequences of any syntactic type. [sent-74, score-0.954]
24 If a word lacks operator semantics, its matrix can be an identity matrix. [sent-76, score-0.254]
25 However, if a word acts mainly as an operator, such as “extremely”, its vector can become close to zero, while its matrix gains a clear operator meaning, here magnifying the meaning of the modified word in both positive and negative directions. [sent-77, score-0.415]
26 1 Matrix-Vector Neural Word Representation We represent a word as both a continuous vector and a matrix of parameters. [sent-81, score-0.265]
27 Similar to other local co-occurrence based vector space models, the resulting word vectors capture syntactic and semantic information. [sent-84, score-0.404]
28 If the vectors have dimensionality n, then each word’s matrix has dimensionality X ∈ R^{n×n}. [sent-90, score-0.36]
29 While the initialization is random, the vectors and matrices will subsequently be modified to enable a sequence of words to compose a vector that can predict a distribution over semantic labels. [sent-91, score-0.522]
30 In order to compute a parent vector p from two consecutive words and their respective vectors a and b, Mitchell and Lapata (2010) give as their most general function: p = f(a, b, R, K),where R is the a-priori known syntactic relation and K is background knowledge. [sent-98, score-0.36]
31 p = g(W[Ba; Ab]) (2), where A, B are matrices for single words, and the global W ∈ R^{n×2n} is a matrix that maps both transformed words back into the same n-dimensional space. [sent-112, score-0.324]
32 The element-wise function g could be simply the identity function but we use instead a nonlinearity such as the sigmoid or hyperbolic tangent tanh. [sent-113, score-0.244]
33 Rewriting the two transformed vectors as one vector z, we get p = g(Wz) which is a single layer neural network. [sent-116, score-0.356]
34 In this model, the word matrices can capture compositional effects specific to each word, whereas W captures a general composition function. [sent-117, score-0.525]
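A hedged sketch of the core MV-RNN step: the parent vector follows Eq. (2), p = g(W[Ba; Ab]), and a parent matrix is produced by a second global matrix W_M (Section 2.3); the vertical-stacking convention and the use of tanh are assumptions about details not spelled out in this extract.

    import numpy as np

    def mv_compose(a, A, b, B, W, W_M):
        """Combine two (vector, matrix) pairs into a parent (vector, matrix).

        a, b: child vectors of shape (n,); A, B: child matrices of shape (n, n);
        W, W_M: global parameters of shape (n, 2n).
        """
        # Eq. (2): each child vector is first transformed by the other child's matrix.
        p = np.tanh(W @ np.concatenate([B @ a, A @ b]))
        # Parent matrix as in Sec. 2.3 (assumed stacking): result has shape (n, n).
        P = W_M @ np.vstack([A, B])
        return p, P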
35 Baroni and Zamparelli (2010) computed the parent vector of adjective-noun pairs by p = Ab, where A is an adjective matrix and b is a vector for a noun. [sent-125, score-0.456]
36 3 Recursive Compositions of Multiple Words and Phrases This section describes how we extend a word-pair matrix-vector-based compositional model to learn vectors and matrices for longer sequences of words. [sent-137, score-0.571]
37 For this to work, we need to take as input a binary parse tree of a phrase or sentence and also compute matrices at each nonterminal parent node. [sent-139, score-0.383]
38 The function f can be readily used for phrase vectors since it is recursively compatible (p has the same dimensionality as its children). [sent-140, score-0.313]
39 For instance, to compute the vectors and matrices depicted in Fig. [sent-145, score-0.324]
40 The model computes vectors and matrices in a bottom-up fashion, applying the functions f and f_M to its own previous output (i. [sent-148, score-0.38]
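A minimal recursion sketch over a binarized parse tree; the tree encoding (a leaf is a word id, an internal node a (left, right) pair) and the vec/mat lookup tables are assumptions, and mv_compose is the sketch given earlier.

    def encode(node, vec, mat, W, W_M):
        """Bottom-up computation of a (vector, matrix) pair for every subtree."""
        if not isinstance(node, tuple):            # leaf: look up the word's parameters
            return vec[node], mat[node]
        left, right = node
        a, A = encode(left, vec, mat, W, W_M)
        b, B = encode(right, vec, mat, W, W_M)
        return mv_compose(a, A, b, B, W, W_M)      # apply f and f_M to the children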
41 4 Objective Functions for Training One of the advantages of RNN-based models is that each node of a tree has associated with it a distributed vector representation (the parent vector p) which can also be seen as features describing that phrase. [sent-156, score-0.378]
42 We train these representations by adding on top of each parent node a simple softmax classifier to predict a class distribution over, e. [sent-157, score-0.324]
43 The main difference is that the MV-RNN has more flexibility since it has an input-specific recursive function f_{A,B} to compute each parent. [sent-166, score-0.238]
44 In the following applications, we will use the softmax classifier to predict both sentiment distributions and noun-noun relationships. [sent-167, score-0.338]
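The per-node classifier can be sketched as below; W_label is an assumed name for the (num_classes × n) softmax weights trained on top of each node vector p.

    import numpy as np

    def predict_label(p, W_label):
        """Class distribution d(p) = softmax(W_label p) at one tree node."""
        scores = W_label @ p
        scores = scores - scores.max()             # numerical stability
        e = np.exp(scores)
        return e / e.sum()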
45 (4) To compute this gradient, we first compute all tree nodes (pi, Pi) from the bottom-up and then take derivatives of the softmax classifiers at each node in the tree from the top down. [sent-172, score-0.295]
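Eq. (4) itself is not reproduced in this extract; one standard reading of the training objective described here is the sum of cross-entropy losses of the node classifiers plus an L2 penalty, sketched below as an assumption.

    import numpy as np

    def objective(pred_dists, gold_dists, params, lam=1e-4):
        """Illustrative loss: cross-entropy over all labeled nodes plus L2 regularization."""
        ce = -sum(float(t @ np.log(p + 1e-12))
                  for p, t in zip(pred_dists, gold_dists))
        reg = lam * sum(float(np.sum(th ** 2)) for th in params)
        return ce + reg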
46 6 Low-Rank Matrix Approximations If every word is represented by an n-dimensional vector and additionally by an n×n matrix, the dimensionality of the whole model may become too large with commonly used vector sizes of n = 100. [sent-178, score-0.314]
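One way to realize the low-rank idea of Section 2.6 (the exact parameterization used in the paper is not shown in this extract, so the form below is an assumption): represent each word matrix as a rank-r product plus a diagonal, which cuts the per-word matrix parameters from n^2 to roughly 2nr + n.

    import numpy as np

    def low_rank_word_matrix(U, V, d):
        """U: (n, r), V: (r, n), d: (n,)  ->  an n x n word operator U V + diag(d)."""
        return U @ V + np.diag(d)

    # With n = 100 and r = 3 this stores about 700 numbers per word instead of 10,000.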
47 7 Discussion: Evaluation and Generality Evaluation of compositional vector spaces is a complex task. [sent-181, score-0.396]
48 However, even with good correlation the question remains how these models would perform on downstream NLP tasks such as sentiment detection. [sent-184, score-0.215]
49 For sentiment analysis, this is not surprising since antonyms often get similar vectors during unsupervised learning from co-occurrences due to high similarity of local syntactic contexts. [sent-187, score-0.384]
50 If a model cannot correctly capture how an adverb operates on the meaning of adjectives, then there’s little chance it can learn operators for more complex relationships. [sent-194, score-0.384]
51 The second study analyzes whether the MV-RNN can learn simple boolean operators of propositional logic such as conjunctives or negation from truth values. [sent-195, score-0.585]
52 1 Predicting Sentiment Distributions of Adverb-Adjective Pairs The first study considers the prediction of fine-grained sentiment distributions of adverb-adjective pairs and analyzes different possibilities for computing the parent vector p. [sent-198, score-0.443]
53 The results show that the MV-RNN operators are powerful enough to capture the operational meanings of various types of adverbs. [sent-199, score-0.252]
54 We never give the algorithm sentiment distributions for single words, and, while single words overlap between training and testing, the test set consists of never before seen word pairs. [sent-207, score-0.215]
[Figure 3 plots: predicted sentiment distributions for fairly/not/unbelievably combined with annoying, sad, and awesome, comparing MV-RNN and RNN; axis ticks omitted]
57 Figure 3: Left: Average KL-divergence for predicting sentiment distributions of unseen adverb-adjective pairs of the test set. [sent-250, score-0.741]
58 Right: Predicting sentiment distributions (over 1-10 stars on the x-axis) of adverb-adjective pairs. [sent-254, score-0.215]
59 The RNN and linear MVR are not able to modify the sentiment correctly: not awesome is more positive than fairly awesome and not annoying has a similar shape as unbelievably annoying. [sent-257, score-0.723]
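The left panel of Figure 3 reports average KL divergence between predicted and gold distributions over the ten star bins; a sketch of that metric (the KL direction and smoothing constant are assumptions):

    import numpy as np

    def avg_kl(pred_dists, gold_dists, eps=1e-12):
        """Mean KL(gold || predicted) over all test pairs."""
        kls = [float(np.sum(g * (np.log(g + eps) - np.log(p + eps))))
               for p, g in zip(pred_dists, gold_dists)]
        return sum(kls) / len(kls)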
60 An (adverb,adjective) pair is described by its vectors (a, b) and matrices (A, B). [sent-261, score-0.324]
61 We cross-validated all models over regularization parameters for word vectors, the softmax classifier, the RNN parameter W and the word operators (10^-4, 10^-3) and word vector sizes (n = 6, 8, 10, 12, 15, 20). [sent-276, score-0.471]
62 It shows that matrix-vector representations for all words and the use of a nonlinearity are both important. [sent-281, score-0.25]
63 The MV-RNN which combines these two ideas is best able to learn the various compositional effects. [sent-282, score-0.247]
64 However, only the MV-RNN has enough expressive power to allow negation to completely shift the sentiment with respect to an adjective. [sent-287, score-0.301]
65 A negated adjective carrying negative sentiment becomes slightly positive, whereas not awesome is correctly attenuated. [sent-288, score-0.343]
66 Figure 4: Training trees for the MV-RNN to learn propositional operators (the trees encode the truth tables of ∧ and ¬; node labels omitted here). [sent-293, score-0.41]
67 The model learns vectors and operators for ∧ (and) and ¬(negation). [sent-294, score-0.347]
68 In other words, can it learn some of the propositional logic operators such as and, or, not in terms of vectors and matrices from a few examples? [sent-300, score-0.786]
69 There have been many attempts at automatically parsing natural language to a logical form using recursive compositional rules. [sent-305, score-0.41]
70 To this end, we illustrate in a simple example that our MV-RNN model and its learned word matrices (operators) have the ability to learn propositional logic operators such as ∧, ∨, ¬ (and, or, not). [sent-313, score-0.654]
71 This is a precondition for the ability to pick up these phenomena in real datasets and tasks such as sentiment detection, which we focus on in the subsequent sections. [sent-315, score-0.215]
72 Fixing the operators to the 1 × 1 identity matrix 1 is essentially ignoring them. [sent-320, score-0.412]
73 Thus, we can combine these operators to construct any propositional logic function of any number of inputs (including xor). [sent-329, score-0.459]
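The functional-completeness claim rests on the usual boolean identity; it is shown here with ordinary Python booleans only to make explicit the construction the learned operators have to mimic.

    def xor(a, b):
        # xor expressed with exactly the operators the model learns: and, or, not
        return (a or b) and not (a and b)

    assert [xor(a, b) for a in (False, True) for b in (False, True)] == \
           [False, True, True, False]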
74 Table 2: Hard movie review examples of positive (1) and negative (0) sentiment (S. [sent-357, score-0.325]
75 The state of the art recursive autoencoder model of Socher et al. [sent-361, score-0.293]
76 In our last experiment we show that the MV-RNN can also learn how a syntactic context composes an aggregate meaning of the semantic relationships between words. [sent-370, score-0.302]
77 We show that by building a single compositional semantics for the minimal constituent including both terms one can achieve a higher performance. [sent-378, score-0.247]
78 We apply the same type of MV-RNN model as in sentiment to the subtree spanned by the two words. [sent-384, score-0.215]
79 Note that the final classifier is a recursive, compositional function of all the words in the syntactic path between the bracketed words. [sent-408, score-0.32]
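Tying the earlier sketches together for this task (the subtree extraction and the relation label set are assumed): encode the minimal constituent spanning the two marked nouns with the recursive model, then classify its vector.

    def classify_relation(subtree, vec, mat, W, W_M, W_label):
        """subtree: binarized parse covering the syntactic path between the two nouns."""
        p, _ = encode(subtree, vec, mat, W, W_M)   # recursive MV-RNN encoding
        return predict_label(p, W_label)           # distribution over relation classes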
80 The remaining inputs and the training setup are the same as in previous sentiment experiments. [sent-427, score-0.215]
81 In order to see whether our system can improve over this system, we added three features to the MV-RNN vector and trained another softmax classifier. [sent-430, score-0.256]
82 There are several sophisticated ideas for compositionality in vector spaces. [sent-444, score-0.277]
83 Mitchell and Lapata (2010) present an overview of the most important compositional models, from simple vector addition and component-wise multiplication to tensor products, and convolution (Metcalfe, 1990). [sent-445, score-0.468]
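For reference, the two simplest composition functions in that overview, with word vectors as numpy arrays:

    def add_compose(a, b):
        return a + b        # vector addition

    def mult_compose(a, b):
        return a * b        # component-wise multiplication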
84 Other important models are tensor products (Clark and Pulman, 2007), quantum logic (Widdows, 2008), holographic reduced representations (Plate, 1995) and the Compositional Matrix Space model (Rudolph and Giesbrecht, 2010). [sent-449, score-0.42]
85 RNNs are related to autoencoder models such as the recursive auto-associative memory (RAAM) (Pollack, 1990) or recurrent neural networks (Elman, 1991). [sent-450, score-0.42]
86 Bottou (2011) and Hinton (1990) discussed related models such as recursive autoencoders for text understanding. [sent-451, score-0.248]
87 Yessenalina and Cardie (2011) introduce a sentiment analysis model that describes words as matrices and composition as matrix multiplication. [sent-458, score-0.627]
88 Since matrix multiplication is associative, this cannot capture different scopes of negation or syntactic differences. [sent-459, score-0.369]
89 Their model is a special case of our encoding model (when you ignore vectors, fix the tree to be strictly branching in one direction and use P = AB as the matrix composition function). [sent-460, score-0.31]
90 Grefenstette and Sadrzadeh (2011) learn matrices for verbs in a categorical model. [sent-462, score-0.231]
91 The trained matrices improve correlation with human judgments on the task of identifying relatedness of subject-verb-object triplets. [sent-463, score-0.247]
92 7 Conclusion We introduced a new model towards a complete treatment of compositionality in word vector spaces. [sent-464, score-0.277]
93 Our model builds on a syntactically plausible parse tree and can handle compositional phenomena. [sent-465, score-0.304]
94 The main novelty of our model is the combination of matrix-vector representations with a recursive neural network. [sent-466, score-0.436]
95 It can learn both the meaning vectors of a word and how that word modifies its neighbors (via its matrix). [sent-467, score-0.264]
96 It generalizes several models in the literature, can learn propositional logic, accurately predicts sentiment and can be used to classify semantic relationships between nouns in a sentence. [sent-469, score-0.563]
97 Experimental support for a categorical compositional distributional model of meaning. [sent-544, score-0.265]
98 Dependency tree-based sentiment classification using CRFs with hidden variables. [sent-617, score-0.215]
99 Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. [sent-638, score-0.283]
100 Learning continuous phrase representations and syntactic parsing with recursive neural networks. [sent-690, score-0.51]
wordName wordTfidf (topN-words)
[('socher', 0.322), ('sentiment', 0.215), ('operators', 0.215), ('compositional', 0.208), ('recursive', 0.202), ('matrices', 0.192), ('mvr', 0.149), ('rnn', 0.149), ('compositionality', 0.144), ('representations', 0.143), ('vector', 0.133), ('matrix', 0.132), ('vectors', 0.132), ('awesome', 0.128), ('unbelievably', 0.128), ('propositional', 0.123), ('softmax', 0.123), ('baroni', 0.115), ('ba', 0.115), ('zanzotto', 0.11), ('movie', 0.11), ('nonlinearity', 0.107), ('rn', 0.106), ('ab', 0.106), ('sad', 0.099), ('meaning', 0.093), ('neural', 0.091), ('composition', 0.088), ('negation', 0.086), ('logic', 0.085), ('multiplication', 0.077), ('hendrickx', 0.073), ('relationships', 0.068), ('zamparelli', 0.066), ('identity', 0.065), ('lapata', 0.065), ('mitchell', 0.065), ('semantic', 0.065), ('derivatives', 0.064), ('mv', 0.064), ('nrnn', 0.064), ('rmnv', 0.064), ('svmpos', 0.064), ('false', 0.062), ('recursively', 0.06), ('parent', 0.058), ('distributional', 0.057), ('operator', 0.057), ('functions', 0.056), ('holographic', 0.055), ('judgments', 0.055), ('spaces', 0.055), ('nakagawa', 0.054), ('art', 0.054), ('tree', 0.054), ('nouns', 0.053), ('quantum', 0.05), ('tensor', 0.05), ('yessenalina', 0.05), ('pretty', 0.05), ('wm', 0.05), ('wordnet', 0.048), ('dimensionality', 0.048), ('networks', 0.047), ('autoencoders', 0.046), ('network', 0.044), ('predicting', 0.043), ('annoying', 0.043), ('backpropagation', 0.043), ('goller', 0.043), ('mvrnn', 0.043), ('nella', 0.043), ('potts', 0.043), ('ptop', 0.043), ('recurrent', 0.043), ('rink', 0.043), ('rnns', 0.043), ('rudolph', 0.043), ('classifying', 0.042), ('linear', 0.042), ('parse', 0.042), ('reviews', 0.04), ('semantics', 0.039), ('path', 0.039), ('fairly', 0.039), ('learn', 0.039), ('extremely', 0.038), ('products', 0.037), ('analyzes', 0.037), ('fm', 0.037), ('phrase', 0.037), ('autoencoder', 0.037), ('influenza', 0.037), ('apartment', 0.037), ('negate', 0.037), ('stemming', 0.037), ('und', 0.037), ('syntactic', 0.037), ('capture', 0.037), ('function', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
2 0.42117533 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
3 0.13704175 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higherorder modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order) and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting – – noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
4 0.13232686 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
Author: Natalia Ponomareva ; Mike Thelwall
Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-ofthe-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graphbased algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.
5 0.10750264 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
Author: Fei Huang ; Alexander Yates
Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.
6 0.10155846 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
7 0.088364854 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
8 0.085561194 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
9 0.083998658 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
10 0.079240501 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
11 0.078774579 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
12 0.072624125 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
13 0.070550457 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
14 0.070363812 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
15 0.066668853 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
16 0.066638038 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
17 0.064895853 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
18 0.063793972 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
19 0.060845386 97 emnlp-2012-Natural Language Questions for the Web of Data
20 0.05965846 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
topicId topicWeight
[(0, 0.261), (1, 0.07), (2, -0.007), (3, 0.171), (4, 0.137), (5, 0.138), (6, 0.085), (7, 0.118), (8, 0.261), (9, 0.077), (10, -0.417), (11, 0.176), (12, 0.01), (13, -0.137), (14, 0.087), (15, -0.117), (16, 0.19), (17, 0.015), (18, 0.047), (19, -0.057), (20, -0.055), (21, -0.009), (22, -0.117), (23, 0.015), (24, 0.019), (25, 0.053), (26, -0.052), (27, 0.065), (28, -0.07), (29, -0.006), (30, 0.004), (31, 0.034), (32, -0.037), (33, 0.028), (34, 0.015), (35, -0.03), (36, -0.08), (37, -0.103), (38, 0.011), (39, -0.022), (40, -0.002), (41, -0.023), (42, 0.005), (43, 0.061), (44, -0.028), (45, -0.089), (46, 0.055), (47, 0.038), (48, 0.064), (49, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.95575559 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
2 0.89708728 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
3 0.78831369 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higherorder modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order) and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting – – noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
4 0.52057379 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt
Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. – – We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11points absolute in F measure.
5 0.44940788 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret
Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.
6 0.40208367 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
7 0.35665458 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
8 0.35545775 61 emnlp-2012-Grounded Models of Semantic Representation
9 0.34832826 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
10 0.29342765 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
11 0.29080302 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
12 0.27503809 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
13 0.26104087 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
14 0.25614509 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
15 0.24792986 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
16 0.24345247 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
17 0.23511058 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
18 0.23209865 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
19 0.22534589 38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction
20 0.22453168 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
topicId topicWeight
[(2, 0.013), (16, 0.021), (25, 0.016), (34, 0.044), (45, 0.048), (60, 0.066), (63, 0.043), (64, 0.023), (65, 0.028), (70, 0.012), (74, 0.034), (76, 0.537), (80, 0.012), (86, 0.019), (95, 0.016)]
simIndex simValue paperId paperTitle
1 0.94048566 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
Author: David Hall ; Dan Klein
Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguisticallymotivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitudes fewer parameters compared to the na¨ ıve approach. Combining latent, lexicalized, and unlexicalized anno- tations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.
same-paper 2 0.92409533 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
Author: Ahmed Hassan ; Amjad Abu-Jbara ; Dragomir Radev
Abstract: A mixture of positive (friendly) and negative (antagonistic) relations exist among users in most social media applications. However, many such applications do not allow users to explicitly express the polarity of their interactions. As a result most research has either ignored negative links or was limited to the few domains where such relations are explicitly expressed (e.g. Epinions trust/distrust). We study text exchanged between users in online communities. We find that the polarity of the links between users can be predicted with high accuracy given the text they exchange. This allows us to build a signed network representation of discussions; where every edge has a sign: positive to denote a friendly relation, or negative to denote an antagonistic relation. We also connect our analysis to social psychology theories of balance. We show that the automatically predicted networks are consistent with those theories. Inspired by that, we present a technique for identifying subgroups in discussions by partitioning singed networks representing them.
4 0.58850539 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
5 0.47969106 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
6 0.47283483 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
7 0.46942273 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
8 0.46333489 134 emnlp-2012-User Demographics and Language in an Implicit Social Network
9 0.46099621 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
10 0.44946185 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
11 0.4486838 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
12 0.44521892 120 emnlp-2012-Streaming Analysis of Discourse Participants
13 0.44482896 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
14 0.43931857 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
15 0.43207231 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
16 0.43199754 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
17 0.4297871 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
18 0.40971851 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
19 0.40529034 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
20 0.39538038 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts