emnlp emnlp2012 emnlp2012-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. [sent-6, score-0.524]
2 We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. [sent-9, score-0.324]
3 The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method. [sent-10, score-0.578]
4 For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhire and Lemaire, 2004; Griffiths et al. [sent-12, score-0.249]
5 While much research has been directed at the most effective ways of constructing representations for individual words, there has been far less consensus regarding the representation of larger constructions such as phrases and sentences. [sent-20, score-0.309]
6 The problem has received some attention in the connectionist literature, particularly in response to criticisms of the ability of connectionist representations to handle complex structures (Smolensky, 1990; Plate, 1995). [sent-21, score-0.215]
7 More recently, several proposals have been put forward for computing the meaning of word combinations in vector spaces. [sent-22, score-0.186]
8 This renewed interest is partly due to the popularity of distributional methods and their application potential to tasks that require an understanding of larger phrases or complete sentences. [sent-23, score-0.238]
9 For example, Mitchell and Lapata (2010) introduce a general framework for studying vector composition, which they formulate as a function f of two vectors u and v. [sent-24, score-0.319]
10 Different composition models arise, depending on how f is chosen. [sent-25, score-0.331]
11 Assuming that composition is a linear function of the Cartesian product of u and v allows us to specify additive models, which are by far the most common method of vector combination in the literature (Landauer and Dumais, 1997; Foltz et al. [sent-26, score-0.504]
12 Alternatively, assuming that composition is a linear function of the tensor product of u and v gives rise to models based on multiplication. [sent-28, score-0.663]
13 One of the most sophisticated proposals for semantic composition is that of Clark et al. [sent-29, score-0.441]
14 Using techniques from logic, category theory, and quantum information they develop a compositional distributional semantics that brings type-logical and distributional vector space models together. [sent-33, score-0.861]
15 The category of a word is decided by the number and type of adjoints (arguments) it can take and the composition of a sentence results in a vector which exists in sentential space. [sent-35, score-0.436]
16 Socher et al. (2011b) present a framework based on recursive neural networks that learns vector space representations for multi-word phrases and sentences. [sent-41, score-0.592]
17 Parent representations are computed essentially by concatenating the representations of their children. [sent-44, score-0.364]
18 During training, the model tries to minimize the reconstruction errors between the n-dimensional parent vectors and those representing their children. [sent-45, score-0.269]
19 This model can also compute compositional representations when the tree structure is not given, e. [sent-46, score-0.424]
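For concreteness, the following is a minimal Python sketch of a single autoencoder composition step of the kind just described: the two child vectors are concatenated, mapped to a parent vector, and the parent is scored by how well it reconstructs its children. The tanh activation, the squared-error objective and all names (compose_pair, W_e, W_d) are illustrative assumptions, not details of Socher et al.'s implementation.

    import numpy as np

    def compose_pair(c1, c2, W_e, b_e, W_d, b_d):
        # Encode: compute the parent vector from the concatenated child vectors.
        p = np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)
        # Decode: linear reconstruction of both children from the parent.
        r = W_d @ p + b_d
        r1, r2 = r[:len(c1)], r[len(c1):]
        # Reconstruction error that training would minimise, e.g. by gradient descent.
        err = 0.5 * (np.sum((r1 - c1) ** 2) + np.sum((r2 - c2) ** 2))
        return p, err

    n = 50                                   # illustrative embedding size
    rng = np.random.default_rng(0)
    W_e, b_e = rng.normal(scale=0.1, size=(n, 2 * n)), np.zeros(n)
    W_d, b_d = rng.normal(scale=0.1, size=(2 * n, n)), np.zeros(2 * n)
    parent, err = compose_pair(rng.normal(size=n), rng.normal(size=n), W_e, b_e, W_d, b_d)

Applying such a step bottom-up over a binary tree yields one vector per node, with the root (or a combination of the node vectors) representing the whole phrase or sentence.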
20 Although the type of function used for vector composition has attracted much attention, far less emphasis has been placed on the basic distributional representations on which the composition functions operate. [sent-49, score-1.147]
21 In this paper, we examine three types of distributional representation of increasing sophistication and their effect on semantic composition. [sent-50, score-0.32]
22 Using these representations, we construct several compositional models, based on addition, multiplication, and recursive neural networks. [sent-54, score-0.461]
23 The first one involves modeling similarity judgments for short phrases gathered in human experiments (Mitchell and Lapata, 2010). [sent-56, score-0.214]
24 We instantiate these word representations following three distinct semantic space models which we describe in Section 2. [sent-65, score-0.303]
25 , how a phrase or a sentence can be represented as a vector using the vectors of its constituent words. [sent-70, score-0.337]
26 Combining different vector representations and composition methods gives rise to several compositional models whose performance we evaluate in Sections 3 and 4. [sent-71, score-0.892]
27 The dimensionality will depend on the source of the vectors involved. [sent-74, score-0.278]
28 The contextual elements can be words themselves, or larger linguistic units such as sentences or documents, or even more complex linguistic representations such as the argument slots of predicates. [sent-78, score-0.182]
29 A semantic space that is often employed in studying compositionality across a variety of tasks (Mitchell and Lapata, 2010; Grefenstette and Sadrzadeh, 2011a) uses a context window of five words on either side of the target word, and 2,000 vector dimensions. [sent-79, score-0.317]
30 Despite its simplicity, it is a good starting point for studying representations for compositional models as a baseline against which to evaluate more elaborate models. [sent-103, score-0.457]
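As a rough illustration of how such a window-based space can be populated, the Python sketch below counts, for each target word, its co-occurrences with a fixed list of context words inside a symmetric five-word window. The toy corpus, the use of raw counts as weights, and the name cooccurrence_vectors are assumptions made for illustration; the actual space would use the 2,000 chosen context dimensions and a suitable weighting of the counts.

    from collections import defaultdict
    import numpy as np

    def cooccurrence_vectors(sentences, context_words, window=5):
        # Map each target word to a vector of co-occurrence counts with the context words.
        index = {w: i for i, w in enumerate(context_words)}
        vectors = defaultdict(lambda: np.zeros(len(context_words)))
        for tokens in sentences:
            for i, target in enumerate(tokens):
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i and tokens[j] in index:
                        vectors[target][index[tokens[j]]] += 1
        return dict(vectors)

    corpus = [["the", "black", "dog", "chased", "the", "cat"],
              ["a", "dog", "barked", "at", "the", "cat"]]
    vecs = cooccurrence_vectors(corpus, context_words=["dog", "cat", "black", "barked"])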
31 Neural Language Model. Another, perhaps less well-known, approach to meaning representation is to represent words as continuous vectors of parameters. [sent-104, score-0.272]
32 Such word vectors can be obtained with an unsupervised neural language model (NLM, Bengio (2001); Collobert and Weston (2008)) which jointly learns an embedding of words into a vector space and uses these vectors to predict how likely a word is, given its context. [sent-105, score-0.727]
33 We induced word embeddings with Collobert and Weston (2008)’s neural language model. [sent-106, score-0.276]
34 The model concatenates the learned embeddings of the n words and predicts a score for the n-gram sequence using the learned embeddings as features. [sent-119, score-0.308]
35 The model learns via gradient descent over the neural network parameters and the embedding lookup table. [sent-121, score-0.257]
36 Word vectors are stored in a word embedding matrix which captures syntactic and semantic information from co-occurrence statistics. [sent-122, score-0.348]
37 As these representations are learned, albeit in an unsupervised manner, one would hope that they capture word meanings more succinctly, compared to the simpler distributional representations that are merely based on co-occurrence. [sent-123, score-0.562]
38 We experimented with vectors of varying dimensionality (ranging from 50 to 200, with a step size of 50). [sent-128, score-0.278]
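The scoring side of such a model can be sketched in a few lines of Python: each word id indexes a row of the embedding matrix, the n embeddings are concatenated into a feature vector, and a small network outputs a scalar score; training pushes corpus n-grams above corrupted ones in which the centre word has been replaced. The single tanh hidden layer, the hinge-style ranking loss and all sizes below are simplifying assumptions, not the exact configuration of Collobert and Weston's model.

    import numpy as np

    def ngram_score(word_ids, E, W1, b1, w2, b2):
        # Look up and concatenate the embeddings of the n-gram, then score it.
        x = E[word_ids].reshape(-1)
        h = np.tanh(W1 @ x + b1)
        return float(w2 @ h + b2)

    def ranking_loss(good_ids, bad_ids, params):
        # Hinge loss pushing a corpus n-gram above a corrupted one.
        return max(0.0, 1.0 - ngram_score(good_ids, *params) + ngram_score(bad_ids, *params))

    V, d, n, H = 1000, 50, 5, 100            # vocabulary, embedding size, n-gram length, hidden units
    rng = np.random.default_rng(0)
    params = (rng.normal(scale=0.1, size=(V, d)),          # embedding lookup table E
              rng.normal(scale=0.1, size=(H, n * d)), np.zeros(H),
              rng.normal(scale=0.1, size=H), 0.0)
    loss = ranking_loss([3, 17, 42, 8, 99], [3, 17, 500, 8, 99], params)

Gradients of this loss with respect to both the network weights and the rows of E are what gradient descent would update during training.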
39 Distributional Memory Tensor. Baroni and Lenci (2010) present Distributional Memory, a generalized framework for distributional semantics from which several special-purpose models can be derived. [sent-149, score-0.237]
40 In their framework, distributional information is extracted from the corpus once, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. [sent-150, score-0.352]
41 [Figure 1: A two-dimensional projection of the word embeddings we trained on the BNC using Turian et al.; table residue showing the tuple format: word w, link l, co-word v, value c.] [sent-154, score-0.344]
42 In this way, the same distributional information can be shared across tasks such as word similarity or analogical learning. [sent-156, score-0.343]
43 More formally, Baroni and Lenci (2010) construct a 3-dimensional tensor T assigning a value c to instances of word pairs w, v and a connecting link-word l. [sent-157, score-0.3]
44 These were taken from a distributional memory tensor (footnote 1: available at http://clic.). [sent-160, score-0.24]
45 [Table residue: frequency, link l, co-word v; example entries from Baroni and Lenci (2010)’s tensor (v and j represent verbs and adjectives, respectively).] [sent-164, score-0.3]
46 Extracting a 3-dimensional tensor from the BNC alone would create very sparse representations. [sent-167, score-0.3]
47 We therefore extract so-called word-fibres, essentially projections onto a lower-dimensional subspace, from the same tensor Baroni and Lenci (2010) collectively derived from the 3 billion word corpus just described (henceforth 3-BWC). [sent-168, score-0.3]
48 This approach is used when a fixed vector dimensionality is necessary. [sent-184, score-0.202]
49 Their representations can then have a denser format, that is, with no zero-valued components. [sent-189, score-0.225]
50 The dimensionality varies strongly depending on the selection of words, but if n does not exceed 4, the dimensionality |ctxtdyn| will typically be substantial enough. [sent-197, score-0.194]
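The Python sketch below shows what projecting such a tensor onto a single word could look like: the tensor is held as a sparse map from (word, link, co-word) triples to weights, and a word-fibre is the sparse vector over (link, co-word) dimensions obtained by fixing the word. The triples and weights here are invented toy values; the real Distributional Memory tensor stores corpus-derived weights for a very large number of tuples.

    from collections import defaultdict

    # Toy slice of a word-link-word tensor: (word w, link l, co-word v) -> value c.
    tensor = {
        ("dog", "sbj_of", "bark"): 3.2,
        ("dog", "obj_of", "walk"): 1.7,
        ("cat", "sbj_of", "purr"): 2.9,
    }

    def word_fibre(word, tensor):
        # Fix the first mode of the tensor: a sparse vector over (link, co-word) pairs.
        fibre = defaultdict(float)
        for (w, link, co_word), value in tensor.items():
            if w == word:
                fibre[(link, co_word)] += value
        return dict(fibre)

    dog_vec = word_fibre("dog", tensor)   # {('sbj_of', 'bark'): 3.2, ('obj_of', 'walk'): 1.7}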
51 2 Composition Methods. In our experiments we compose word vectors to create representations for phrase vectors and sentence vectors. [sent-204, score-0.595]
52 Conceiving of a phrase phr = (w1,w2) as a binary tuple of words, we obtain its vector from its words’ vectors either by addition: phrVec(w1,w2) = wdVecw1 + wdVecw2 (10) or by point-wise multiplication: phrVec(w1,w2) = wdVecw1 ⊙ wdVecw2 (11). [sent-208, score-0.337]
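In Python, with word vectors stored as numpy arrays, the two composition functions in (10) and (11) amount to a sum and a point-wise product; the toy vectors below are purely illustrative.

    import numpy as np

    def compose_additive(word_vecs):
        # Equation (10): sum the word vectors.
        return np.sum(word_vecs, axis=0)

    def compose_multiplicative(word_vecs):
        # Equation (11): point-wise product of the word vectors.
        return np.prod(word_vecs, axis=0)

    u, v = np.array([0.2, 0.0, 0.7]), np.array([0.5, 0.1, 0.4])
    phrase_add = compose_additive([u, v])        # approximately [0.7, 0.1, 1.1]
    phrase_mul = compose_multiplicative([u, v])  # approximately [0.1, 0.0, 0.28]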
53 The multiplication model in (13) can be seen as an instantiation of the categorical compositional framework put forward by Clark et al. [sent-221, score-0.381]
54 In fact, a variety of multiplication-based models can be derived from this framework; and comparisons against component-wise multiplication on phrase similarity tasks yield comparable results (Grefenstette and Sadrzadeh, 2011a; Grefenstette and Sadrzadeh, 2011b). [sent-223, score-0.298]
55 We thus opt for the model (13) as an example of compositional models based on multiplication due to its good performance across a variety of tasks, including language modeling and prediction of reading difficulty (Mitchell, 2011). [sent-224, score-0.381]
56 Our third method, for creating phrase and sentence vectors alike, is the application of Socher et al. [sent-225, score-0.232]
57 This tree is then used as the basis for a deep recursive autoencoder (RAE). [sent-228, score-0.189]
58 Socher et al. use word embeddings provided by the neural language model (Collobert and Weston, 2008). [sent-231, score-0.276]
59 Socher et al. (2011a) extend the standard recursive autoencoder sketched above in two ways. [sent-234, score-0.189]
60 We obtained three compositional models per representation resulting in nine compositional models overall. [sent-237, score-0.531]
61 Plugging different representations into the additive and multiplicative models is relatively straightforward. [sent-238, score-0.296]
62 NLM vectors were trained with this dimensionality on the BNC for 7. [sent-242, score-0.278]
63 We constructed a simple distributional space with M = 100 dimensions, i. [sent-244, score-0.244]
64 In the case of vectors obtained from Baroni and Lenci (2010)’s DM tensor, we differentiated between phrases and sentences, due to the disparate number of words contained in them (see Section 2. [sent-247, score-0.221]
65 To represent phrases, we used vectors of dynamic dimensionality, since these form a richer and denser representation. [sent-249, score-0.224]
66 The sentences considered in Section 4 are too large for this approach and all word vectors must be members of the same vector space. [sent-250, score-0.286]
67 Hence, these sentence vectors have fixed dimensionality D = 100, consisting of the “most significant” 100 dimensions, i. [sent-251, score-0.278]
68 3 Experiment 1: Phrase Similarity. Our first experiment focused on modeling similarity judgments for short phrases gathered in human experiments. [sent-254, score-0.214]
69 Distributional representations of individual words are commonly evaluated on tasks based on their ability to model semantic similarity relations, e. [sent-255, score-0.365]
70 Thus, it seems appropriate to evaluate phrase representations in a similar fashion. [sent-258, score-0.233]
71 [Table legend residue: dim. gives the dimensionality of the input vectors (see Section 2.1); composition method: + is additive vector composition, ⊙ is point-wise multiplication.] [sent-274, score-0.504]
72 Using the composition models described above, we compute the cosine similarity of phr1 and phr2: phrSim(phr1, phr2) = phrVecphr1 · phrVecphr2 / (|phrVecphr1| × |phrVecphr2|) (17). Model similarities were evaluated against the human similarity ratings using Spearman’s ρ correlation coefficient. [sent-281, score-0.547]
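A minimal Python sketch of this evaluation step, assuming the composed phrase vectors are already available: the cosine in (17) is computed for each pair, and the resulting scores are correlated with the human ratings. The vectors and ratings below are invented toy values, and scipy's spearmanr is used for the rank correlation.

    import numpy as np
    from scipy.stats import spearmanr

    def phrase_similarity(u, v):
        # Equation (17): cosine of the two composed phrase vectors.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Toy composed vectors for two phrase pairs and their (invented) human ratings.
    pairs = [(np.array([0.7, 0.1, 1.1]), np.array([0.6, 0.2, 0.9])),
             (np.array([0.1, 0.0, 0.3]), np.array([0.9, 0.8, 0.1]))]
    human_ratings = [6.2, 1.5]
    model_scores = [phrase_similarity(a, b) for a, b in pairs]
    rho, _ = spearmanr(model_scores, human_ratings)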
73 For each phrase type we report results for each compositional model, namely additive (+), multiplicative (⊙), and the recursive autoencoder (RAE). [sent-284, score-0.407]
74 [Footnote and table residue; the table also reports the dimensionality of the input vectors next to the vector representation.] [sent-292, score-0.383]
75 In general, neither DM nor NLM in any compositional configuration is able to outperform SDS with multiplication. [sent-295, score-0.242]
76 4 Experiment 2: Paraphrase Detection. Although the phrase similarity task gives a fairly direct insight into semantic similarity and compositional representations, it is somewhat limited in scope as it only considers two-word constructions rather than naturally occurring sentences. [sent-301, score-0.624]
77 Ideally, we would like to augment our evaluation with a task which is based on large quantities of natural data and for which vector composition has practical consequences. [sent-302, score-0.436]
78 The vector representations obtained from our various compositional models were used as features for the paraphrase classification task. [sent-306, score-0.694]
79 Therefore, uniOverlap(i1,i2) = Σ_{k=1}^{N_MSRPC} min_{s ∈ {i1,i2}} wdCount_s[k] (21). In order to establish which features work best for each representation and composition method, we exhaustively explored all combinations on a development set (20% of the original MSRPC training set). [sent-342, score-0.415]
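The unigram-overlap feature in (21) simply sums, over word types, the smaller of the two sentences' counts; a minimal Python sketch is given below (any normalisation, e.g. by sentence length, is omitted for brevity).

    from collections import Counter

    def unigram_overlap(sentence1, sentence2):
        # Equation (21): sum over shared word types of the minimum count in the two sentences.
        c1, c2 = Counter(sentence1), Counter(sentence2)
        return sum(min(c1[w], c2[w]) for w in c1.keys() & c2.keys())

    s1 = "the dog chased the cat".split()
    s2 = "the cat was chased by the dog".split()
    overlap = unigram_overlap(s1, s2)     # the:2 + dog:1 + chased:1 + cat:1 = 5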
80 Each row corresponds to a different type of composition and each column to a different word representation model. [sent-344, score-0.378]
81 As can be seen, the distributional memory (DM) is the best performing representation for the additive composition model. [sent-345, score-0.686]
82 The neural language model (NLM) gives best results for the recursive autoencoder (RAE), although the other two representations come close. [sent-346, score-0.493]
83 And finally the simple distributional semantic space (SDS) works best with multiplication. [sent-347, score-0.319]
84 Although our intention was to use the paraphrase detection task as a test-bed for evaluating compositional models rather than achieving state-of-the-art results, Table 6 compares our approach against previous work on the same task and dataset. [sent-349, score-0.407]
85 WordNet in conjunction with distributional similarity in an attempt to detect meaning conveyed by synonymous words (Islam and Inkpen, 2007; Mihalcea et al. [sent-355, score-0.35]
86 (2011a), without using elaborate features or any additional manipulations over and above the output of the composition functions. (Footnote 3: without dynamic pooling, their model yields a score of 74.) [sent-368, score-0.331]
87 5 Discussion. In this paper we systematically compared three types of distributional representation and their effect on semantic composition. [sent-371, score-0.32]
88 Our comparisons involved a simple distributional semantic space (Mitchell and Lapata, 2010), word embeddings computed with a neural language model (Collobert and Weston, 2008) and a representation based on weighted word-link-word tuples arranged into a third-order tensor (Baroni and Lenci, 2010). [sent-372, score-1.05]
89 These representations served as input to three composition methods involving addition, multiplication and a deep recursive autoencoder. [sent-374, score-0.749]
90 In contrast, the recursive autoencoder is syntax-aware as it operates over a parse tree. [sent-376, score-0.189]
91 However, the composed representations must be learned with a neural network. [sent-377, score-0.304]
92 We evaluated nine models on the complementary tasks of phrase similarity and paraphrase detection. [sent-378, score-0.324]
93 The former task simplifies the challenge of finding an adequate method of composition and places more emphasis on the representation, whereas the latter poses, in a sense, the ultimate challenge for composition models. [sent-379, score-0.662]
94 Despite being in theory more expressive, the representations obtained by the neural language model and the third-order tensor cannot match the simple semantic space on the phrase similarity task. [sent-382, score-0.884]
95 In this task syntax-oblivious composition models are superior to the more sophisticated recursive autoencoder. [sent-383, score-0.463]
96 The simple semantic space may not take word order or sentence structure into account, but nevertheless achieves considerable semantic expressivity: it is on par with the third-order tensor without having access to as much data (3 billion words) or a syntactically parsed corpus. [sent-385, score-0.496]
97 What do these findings tell us about the future of compositional models for distributional semantics? [sent-386, score-0.44]
98 The problem of finding the right methods of vector composition cannot be pursued independently of the choice of lexical representation. [sent-387, score-0.436]
99 Having tested many model combinations, we argue that in a good model of distributive semantics representation and composition must go hand in hand, i. [sent-388, score-0.46]
100 Experimental support for a categorical compositional distributional model of meaning. [sent-463, score-0.44]
wordName wordTfidf (topN-words)
[('composition', 0.331), ('tensor', 0.3), ('compositional', 0.242), ('distributional', 0.198), ('socher', 0.185), ('representations', 0.182), ('vectors', 0.181), ('paraphrase', 0.165), ('baroni', 0.159), ('embeddings', 0.154), ('lenci', 0.15), ('bnc', 0.139), ('multiplication', 0.139), ('nlm', 0.129), ('neural', 0.122), ('similarity', 0.108), ('sds', 0.107), ('vector', 0.105), ('grefenstette', 0.101), ('collobert', 0.1), ('dimensionality', 0.097), ('recursive', 0.097), ('embedding', 0.092), ('autoencoder', 0.092), ('msrpc', 0.092), ('weston', 0.092), ('seni', 0.086), ('dm', 0.083), ('rae', 0.083), ('mitchell', 0.082), ('semantic', 0.075), ('lapata', 0.074), ('additive', 0.068), ('sadrzadeh', 0.067), ('judgments', 0.066), ('ctxtdyn', 0.064), ('freqw', 0.064), ('nblnmc', 0.064), ('sbdnsc', 0.064), ('wdcounti', 0.064), ('involved', 0.064), ('compositionality', 0.058), ('mehrnoosh', 0.055), ('reconstruction', 0.055), ('phrase', 0.051), ('nii', 0.05), ('representation', 0.047), ('jeff', 0.046), ('multiplicative', 0.046), ('space', 0.046), ('meaning', 0.044), ('tuples', 0.044), ('lookup', 0.043), ('pooling', 0.043), ('spearman', 0.043), ('dimensions', 0.043), ('blacoe', 0.043), ('cocountw', 0.043), ('corrupted', 0.043), ('denhire', 0.043), ('denser', 0.043), ('distributive', 0.043), ('dyn', 0.043), ('islam', 0.043), ('jdyn', 0.043), ('jtop', 0.043), ('ltw', 0.043), ('nmsrpc', 0.043), ('phrvec', 0.043), ('senvec', 0.043), ('totalcount', 0.043), ('vocbnc', 0.043), ('wdcount', 0.043), ('wdkv', 0.043), ('wdvecwk', 0.043), ('children', 0.043), ('memory', 0.042), ('phrases', 0.04), ('constructions', 0.04), ('semantics', 0.039), ('analogical', 0.037), ('behavioral', 0.037), ('dissimilarity', 0.037), ('kintsch', 0.037), ('subtraction', 0.037), ('combinations', 0.037), ('das', 0.037), ('sophisticated', 0.035), ('landauer', 0.034), ('layer', 0.034), ('cartesian', 0.033), ('connectionist', 0.033), ('expressivity', 0.033), ('quantum', 0.033), ('turian', 0.033), ('yoshua', 0.033), ('edinburgh', 0.033), ('studying', 0.033), ('parent', 0.033), ('rise', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000018 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
2 0.42117533 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
3 0.18884805 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higherorder modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order) and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting – – noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
4 0.13229772 58 emnlp-2012-Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
Author: Aurelien Max ; Houda Bouamor ; Anne Vilnat
Abstract: This paper describes a study on the impact of the original signal (text, speech, visual scene, event) of a text pair on the task of both manual and automatic sub-sentential paraphrase acquisition. A corpus of 2,500 annotated sentences in English and French is described, and performance on this corpus is reported for an efficient system combination exploiting a large set of features for paraphrase recognition. A detailed quantified typology of subsentential paraphrases found in our corpus types is given.
5 0.12954156 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret
Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.
6 0.12812535 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
7 0.12000078 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
8 0.11824635 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
9 0.10655542 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction
10 0.10610341 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
11 0.09650401 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
12 0.078740604 61 emnlp-2012-Grounded Models of Semantic Representation
13 0.076382518 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
14 0.066427223 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
15 0.066407263 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
16 0.061174124 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
17 0.060029786 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
18 0.054954909 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
19 0.052957151 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
20 0.049100891 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
topicId topicWeight
[(0, 0.24), (1, 0.04), (2, -0.112), (3, 0.123), (4, 0.131), (5, 0.289), (6, 0.154), (7, 0.081), (8, 0.238), (9, 0.089), (10, -0.437), (11, 0.133), (12, 0.02), (13, -0.146), (14, 0.086), (15, -0.134), (16, 0.167), (17, -0.057), (18, 0.043), (19, -0.059), (20, -0.011), (21, 0.003), (22, -0.151), (23, -0.015), (24, -0.009), (25, 0.061), (26, -0.085), (27, 0.049), (28, -0.062), (29, -0.004), (30, 0.045), (31, 0.049), (32, 0.018), (33, 0.058), (34, 0.001), (35, -0.055), (36, -0.03), (37, -0.029), (38, 0.008), (39, 0.015), (40, 0.023), (41, -0.022), (42, 0.035), (43, 0.043), (44, -0.039), (45, -0.031), (46, 0.001), (47, 0.028), (48, 0.031), (49, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95552421 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
2 0.89992541 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
3 0.85118353 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higherorder modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order) and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting – – noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
4 0.53478324 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt
Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. – – We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11points absolute in F measure.
5 0.52878326 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret
Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.
6 0.41942775 61 emnlp-2012-Grounded Models of Semantic Representation
7 0.34059578 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
8 0.32037535 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
9 0.30172265 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
10 0.30037576 58 emnlp-2012-Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
11 0.29764339 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
12 0.28392941 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
13 0.26652968 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction
14 0.25986651 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
15 0.25569904 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
16 0.22694385 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
17 0.21875483 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
18 0.19835192 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
19 0.19809933 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
20 0.1960149 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language
topicId topicWeight
[(2, 0.012), (16, 0.023), (25, 0.021), (34, 0.049), (45, 0.391), (60, 0.078), (63, 0.048), (64, 0.019), (65, 0.029), (70, 0.018), (73, 0.012), (74, 0.035), (76, 0.138), (80, 0.014), (86, 0.025), (95, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.80742657 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
2 0.65112925 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction
Author: Michaela Regneri ; Rui Wang
Abstract: Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents’ discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boosts the performance of sentence-level paraphrase acquisition, which consequently gives a tremendous advantage for extracting phraselevel paraphrase fragments from matched sentences. Our system beats an informed baseline by a margin of 50%.
3 0.64098603 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
4 0.57197648 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
Author: Shen Li ; Joao Graca ; Ben Taskar
Abstract: Despite significant recent work, purely unsupervised techniques for part-of-speech (POS) tagging have not achieved useful accuracies required by many language processing tasks. Use of parallel text between resource-rich and resource-poor languages is one source of weak supervision that significantly improves accuracy. However, parallel text is not always available and techniques for using it require multiple complex algorithmic steps. In this paper we show that we can build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary. Across eight languages for which we have labeled data to evaluate results, we achieve accuracy that significantly exceeds best unsupervised and parallel text methods. We achieve highest accuracy reported for several languages and show that our approach yields better out-of-domain taggers than those trained using fully supervised Penn Treebank.
5 0.47500044 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
7 0.41406497 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
8 0.4013654 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
9 0.39723933 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
10 0.3942605 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
11 0.39146808 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
12 0.3854458 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
13 0.38098401 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs
14 0.38055378 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
15 0.37996793 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
16 0.37916422 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
17 0.37394798 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
18 0.36953017 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
19 0.36854199 120 emnlp-2012-Streaming Analysis of Discourse Participants
20 0.3681407 24 emnlp-2012-Biased Representation Learning for Domain Adaptation