emnlp emnlp2013 emnlp2013-137 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek
Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.
Reference: text
sentIndex sentText sentNum sentScore
1 Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. [sent-4, score-0.933]
2 Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. [sent-5, score-0.294]
3 We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a. [sent-7, score-0.27]
4 For instance, when applied to lexical semantics, synonyms and antonyms may both be assigned high similarity scores (Landauer and Laham, 1998; Landauer, 2002). [sent-25, score-0.264]
5 Asymmetric relations like hyponyms and hypernyms also cannot be differentiated. [sent-26, score-0.176]
6 , antonymy or is-a), the word vectors will be mapped to a new space according to the relation, where the degree of having this relation will be measured. [sent-36, score-0.29]
7 The raw data construction in MRLSA is straightforward and similar to the document-term matrix in LSA. [sent-39, score-0.16]
8 Each slice corresponds to the document-term matrix in the original LSA design but for a specific relation. [sent-41, score-0.2]
9 Analogous to LSA, the whole linear transformation mapping is derived through tensor decomposition, which provides a low-rank approximation of the original tensor. [sent-42, score-0.484]
10 As a result, previously unseen relations between two words can be discovered, and the information encoded in other relations can influence the construction of the latent representations, thus potentially improving the overall quality. [sent-43, score-0.307]
11 2 Related Work MRLSA can be viewed as a model that derives general continuous space representations for capturing lexical semantics, with the help of tensor decomposition techniques. [sent-61, score-0.581]
12 , 2011), or by graphical models such as probabilistic latent semantic analysis (PLSA) (Hofmann, 1999) and latent Dirichlet allocation (LDA) (Blei et al., 2003). [sent-70, score-0.183]
13 However, while the words are represented by vectors as well, multiple relations between words are captured separately by matrices. [sent-73, score-0.149]
14 For instance, morphological variations discovered from the Google n-gram corpus have been combined with information from thesauri and vector-based word relatedness models for detecting antonyms (Mohammad et al. [sent-78, score-0.188]
15 The degree to which two words have a particular relation is estimated using the same linear function of the corresponding vectors and matrix. [sent-84, score-0.149]
16 Tensor decomposition generalizes matrix factorization and has been applied to several NLP applications recently. [sent-85, score-0.209]
17 (2013) represented triples of question title, question content and answer as a tensor and applied 3-mode SVD to derive latent semantic representations for question matching. [sent-91, score-0.636]
18 However, our goal of modeling different relations for lexical semantics is very different from the intended usage of tensor decomposition in the existing work. [sent-93, score-0.692]
19 Once the matrix is constructed, the second step is to apply singular value decomposition (SVD) to W in order to derive a low-rank approximation. [sent-106, score-0.177]
20 For instance, to compare the u-th and v-th words in the vocabulary, one can compute the cosine similarity of the u-th and v-th column vectors of X, the reconstruction matrix of W. [sent-111, score-0.198]
21 An alternative view of using LSA is to treat the column vectors of ΣVT as a representation of the words in a new k-dimensional latent space. [sent-114, score-0.145]
22 This comes from the observation that the inner product of every two column vectors in X is the inner product of the corresponding column vectors of ΣVT. (Figure 2: The matrix construction of PILSA.) [sent-115, score-0.284]
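A minimal numpy sketch of the LSA step described above, using a tiny made-up document-term matrix (the values, vocabulary size, and rank k are illustrative assumptions, not data from the paper). It shows the rank-k reconstruction X and checks that cosine similarity over columns of X equals cosine similarity over the corresponding k-dimensional columns of ΣVT.

```python
import numpy as np

# Toy document-term matrix (rows: "documents", columns: terms).
W = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 2.0]])

k = 2                                            # rank of the approximation
U, s, Vt = np.linalg.svd(W, full_matrices=False)
X = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k reconstruction of W

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Word similarity via columns of X ...
print(cosine(X[:, 0], X[:, 1]))
# ... equals the similarity of the corresponding columns of Sigma * V^T,
# i.e. of the k-dimensional latent word representations.
latent = np.diag(s[:k]) @ Vt[:k, :]
print(cosine(latent[:, 0], latent[:, 1]))
```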
23 , sign) of the TF×IDF value indicates whether the term in the vocabulary is a synonym or antonym of the target word. [sent-121, score-0.161]
24 This is due to the fact that the raw matrix representation only records the occurrences of words in documents without knowing the specific relation between the word and document. [sent-124, score-0.195]
25 (2012) recently proposed a polarity-inducing latent semantic analysis model, which we introduce next. [sent-126, score-0.146]
26 1 Polarity Inducing Latent Semantic Analysis In order to distinguish antonyms from synonyms, the polarity inducing LSA (PILSA) model (Yih et al. [sent-129, score-0.197]
27 Synonyms and antonyms of the same target word are grouped together as a “document” and a document-term matrix is constructed accordingly as done in LSA. [sent-131, score-0.253]
28 While the absolute value of each element in the matrix is still the same TF×IDF score, the elements in the matrix that correspond to the antonyms become negative. [sent-133, score-0.252]
29 When comparing two words using the cosine similarity (or simply inner product) of their corresponding column vectors in the matrix, the score of a synonym pair remains positive, but the score of an antonym pair becomes negative. [sent-135, score-0.263]
30 The sign of the cosine score of the column vectors of any two words indicates whether they are close to synonyms or to antonyms and the absolute value reflects the degree of the relation. [sent-138, score-0.38]
31 When all the column vectors are normalized to unit vectors, it can also be viewed as synonyms being clustered together while antonyms lie on opposite sides of a unit sphere. [sent-139, score-0.331]
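A small illustrative sketch of the polarity-inducing construction, with a made-up two-group vocabulary and unit weights standing in for the TF×IDF scores PILSA actually uses; the word choices mirror the Figure 2 example.

```python
import numpy as np

vocab = ["joyfulness", "gladden", "sadden", "sorrow"]
# Each row is a "document": a target word's synonyms (+) and antonyms (-).
W = np.array([[ 1.0,  1.0, -1.0, -1.0],   # "joy" group
              [-1.0, -1.0,  1.0,  1.0]])  # "sadness" group

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Positive cosine -> synonym-like pair, negative cosine -> antonym-like pair.
print(cosine(W[:, vocab.index("joyfulness")], W[:, vocab.index("gladden")]))  #  1.0
print(cosine(W[:, vocab.index("joyfulness")], W[:, vocab.index("sadden")]))   # -1.0
```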
32 Each slice captures a particular relation and is in the format of the document-term matrix in LSA. [sent-142, score-0.257]
33 1 Representing Multi-Relational Data in Tensors A tensor is simply a multi-dimensional array. [sent-150, score-0.449]
34 In this work, we use a 3-way tensor W to encode multiple word relations. [sent-151, score-0.449]
35 An element of W is denoted by Wi,j,k using its indices, and W:,:,k represents the k-th slice of W (a slice of a 3-way tensor is a matrix, obtained by fixing the third index). [sent-152, score-0.449]
36 Following (Kolda and Bader, 2009), a fiber of a tensor W:,j,k is a vector, which is a higher-order analog of a matrix row or column. [sent-153, score-0.472]
37 When constructing the raw tensor W in MRLSA, each slice is analogous to the document-term matrix in LSA, but created based on the data of a particular relation, such as synonyms. [sent-154, score-0.613]
38 For instance, W:,“word”,k represents the fiber corresponding to the “word” in slice k, and W:,:,syn refers to the slice that encodes the synonymy relation. [sent-156, score-0.207]
39 Below we use an example to compare this construction to the raw matrix in PILSA, and discuss how it extends LSA. [sent-157, score-0.16]
40 The raw tensor in MRLSA would then consist of two slices, W:,:,syn and W:,:,ant, to encode synonyms and antonyms of target words from a knowledge source (e. [sent-159, score-0.776]
41 We can extend the construction above to enable MRLSA to utilize other semantic relations (e. [sent-166, score-0.17]
42 , hypernymy) by adding a slice corresponding to each relation of interest. [sent-168, score-0.191]
43 3(c) demonstrates how to add another slice W:,:,hyper to the tensor for encoding hypernyms. [sent-170, score-0.165]
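A rough sketch of the raw tensor construction under the same toy assumptions as above: one slice per relation (syn, ant, hyper) and hand-written 0/1 weights in place of the TF×IDF-style values that would be built from a real thesaurus.

```python
import numpy as np

vocab = ["joyfulness", "gladden", "sadden", "sorrow"]
SYN, ANT, HYPER = 0, 1, 2                 # one slice per relation
n_docs = 2                                # one "document" per target-word group

W = np.zeros((n_docs, len(vocab), 3))
# Slice W[:, :, SYN] plays the role of the LSA document-term matrix for synonymy.
W[0, vocab.index("joyfulness"), SYN] = 1.0
W[0, vocab.index("gladden"),    SYN] = 1.0
W[0, vocab.index("sadden"),     ANT] = 1.0
W[0, vocab.index("sorrow"),     ANT] = 1.0

# A slice fixes the third index (a matrix); a fiber fixes the last two (a vector).
syn_slice = W[:, :, SYN]
joy_fiber = W[:, vocab.index("joyfulness"), SYN]
print(syn_slice.shape, joy_fiber.shape)   # (2, 4) (2,)
```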
44 2 Tensor Decomposition The MRLSA raw tensor encodes relations in one or more data resources, such as thesauri. [sent-172, score-0.628]
45 In this section, we derive a low-rank approximation of the tensor to generalize the knowledge. [sent-174, score-0.484]
46 Various tensor decomposition methods have been proposed in the literature. [sent-176, score-0.56]
47 In Tucker decomposition, a d × n × m tensor W is decomposed into four components G, U, V, T. [sent-180, score-0.449]
48 4(a) illustrates the Tucker tensor decomposition method, which factors a 3-way tensor W into three orthogonal matrices, U, V, T, and a core tensor G. [sent-182, score-1.031]
49 The rank-(R1, R2, R3) approximation X of W is defined by Wi,j,k ≈ Xi,j,k = Σr1=1..R1 Σr2=1..R2 Σr3=1..R3 Gr1,r2,r3 Ui,r1 Vj,r2 Tk,r3, where G is a core tensor with dimensions R1 × R2 × R3 and U, V, T are orthogonal matrices with dimensions d × R1, n × R2, m × R3, respectively. [sent-186, score-0.471]
50 To make the analogy to SVD clear, we rewrite the results of Tucker decomposition by performing an n-mode matrix product over the core tensor G with the matrix T. [sent-192, score-0.648]
51 4(b). Then, a straightforward calculation shows that the k-th slice of tensor W is approximated by W:,:,k ≈ X:,:,k = US:,:,kVT. [sent-197, score-0.583]
52 (1), one can observe that matrices U and V play similar roles here, and each slice of the core tensor S is analogous to Σ. [sent-200, score-0.631]
53 As in SVD, the column vectors of G:,:,kVT (which capture both word and relation information) behave similarly to the column vectors of the original tensor slice W:,:,k. [sent-202, score-0.788]
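A compact numpy sketch of a Tucker decomposition computed by HOSVD (mode-wise SVDs), which is one standard way to obtain the factors; the paper itself uses the memory-efficient algorithm of Kolda and Sun (2008) in the MATLAB tensor toolbox. The helper names, the random toy tensor, and the chosen ranks are assumptions for illustration.

```python
import numpy as np

def unfold(W, mode):
    """Matricize a 3-way tensor along the given mode."""
    return np.moveaxis(W, mode, 0).reshape(W.shape[mode], -1)

def hosvd(W, ranks):
    """Return factor matrices (U, V, T) and core tensor G for a 3-way tensor W."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(W, mode), full_matrices=False)
        factors.append(u[:, :r])
    U, V, T = factors
    G = np.einsum('ijk,ia,jb,kc->abc', W, U, V, T)   # core tensor
    return U, V, T, G

rng = np.random.default_rng(0)
W = rng.random((6, 5, 2))                            # toy d x n x m tensor
U, V, T, G = hosvd(W, ranks=(3, 3, 2))

# Reconstruction X = G x1 U x2 V x3 T; with S = G x3 T, each slice satisfies
# X[:, :, k] = U S[:, :, k] V^T, mirroring the SVD analogy in the text.
X = np.einsum('abc,ia,jb,kc->ijk', G, U, V, T)
S = np.einsum('abc,kc->abk', G, T)
print(np.allclose(X[:, :, 0], U @ S[:, :, 0] @ V.T))  # True
```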
54 3 Measuring the Degrees of Word Relations In principle, the raw information in the input tensor W can be used for computing lexical similarity using the cosine score between the column vectors for two words from the same slice of the tensor. [sent-204, score-0.688]
55 The key role of the pivot slice is to expand the lexical coverage of the relation of interest to additional lexical entries and, for this reason, the pivot slice should be chosen to capture the equivalence of the lexical entries. [sent-206, score-0.389]
56 In this paper, we use the synonymy relation as our pivot slice. [sent-207, score-0.139]
57 First we consider measuring the degree of a relation rel holding between the i-th and j-th words using the raw tensor W, which can be computed as cos(W:,i,syn, W:,j,rel). This measurement
58 Turning to the use of the tensor decomposition, we use a similar derivation to Eq. [sent-214, score-0.449]
59 (3), and measure the degree of relation rel between two words by cos(X:,i,syn, X:,j,rel).
60 (6) For instance, the degree of antonymy between “joy” and “sorrow” is measured by the cosine similarity between the respective fibers cos(X:,“joy”,syn, X:,“sorrow”,ant). [sent-218, score-0.192]
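A hedged sketch of the relation-degree measurement on a raw tensor, in the style of the cos(·,·) expressions above; with a decomposed tensor one would use the reconstruction X in place of W. The tiny vocabulary and indices are made up.

```python
import numpy as np

vocab = {"joy": 0, "joyfulness": 1, "sorrow": 2}
SYN, ANT = 0, 1
W = np.zeros((2, len(vocab), 2))
W[0, vocab["joy"], SYN] = 1.0                    # doc 0: the "joy" group
W[0, vocab["joyfulness"], SYN] = 1.0
W[0, vocab["sorrow"], ANT] = 1.0
W[1, vocab["sorrow"], SYN] = 1.0                 # doc 1: the "sorrow" group
W[1, vocab["joy"], ANT] = 1.0

def degree(W, i, j, rel, pivot=SYN):
    """cos(W[:, i, pivot], W[:, j, rel]): degree of relation rel between words i, j."""
    a, b = W[:, i, pivot], W[:, j, rel]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(degree(W, vocab["joy"], vocab["sorrow"], ANT))       # antonymy score
print(degree(W, vocab["joy"], vocab["joyfulness"], SYN))   # synonymy score
```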
61 When encoding two opposite relations from the same source, MRLSA performs comparably to PILSA. [sent-230, score-0.186]
62 1 Experimental Setup We construct the raw tensors to encode a particular relation in each slice based on two data sources. [sent-234, score-0.294]
63 WordNet We use four types of relations from WordNet: synonymy, antonymy, hypernymy and hyponymy. [sent-239, score-0.144]
64 For instance, the WordNet antonym slice contains only 46,945 nonzero entries, while the Encarta antonym slice has 129,733. [sent-242, score-0.432]
65 We apply a memory-efficient Tucker decomposition algorithm (Kolda and Sun, 2008) implemented in tensor toolbox v2. [sent-245, score-0.591]
66 The largest tensor considered in this paper can be decomposed in about 3 hours using less than 4GB of memory with a commodity PC. [sent-248, score-0.449]
67 We tune two sets of parameters using the development set: (1) the rank parameter τ in the tensor decomposition and (2) the scaling factors of different slices of the tensor. [sent-258, score-0.656]
68 The elements of each slice are multiplied by the scaling factor before factorization. [sent-263, score-0.165]
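A minimal sketch of the per-slice scaling step; the scaling factors shown are arbitrary placeholders, whereas in the paper they are tuned on the development set before factorization.

```python
import numpy as np

W = np.random.default_rng(1).random((6, 5, 3))   # toy tensor with 3 relation slices
scale = np.array([1.0, 0.5, 2.0])                # one tuned factor per slice

W_scaled = W * scale[np.newaxis, np.newaxis, :]  # multiply each slice element-wise
# W_scaled is then passed to the Tucker decomposition routine.
```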
69 We modify the MATLAB code of tensor toolbox to use the built-in svd function instead of svds. [sent-277, score-0.565]
70 RawTensor evaluates the performance of the tensor with 2 slices encoding synonyms and antonyms before decomposition (see Eq. [sent-299, score-0.887]
71 MRLSA:Syn+Ant applies Tucker decomposition to the raw tensor and measures the degree of antonymy using Eq. [sent-301, score-0.766]
72 MRLSA:4-layers adds hypernyms and hyponyms from WordNet; MRLSA:WordNet+Encarta consists of synonyms/antonyms from Encarta and hypernyms/hyponyms from WordNet, where the target words are aligned using the synonymy relations. [sent-306, score-0.143]
73 The performance of the MRLSA raw tensor is close to that of looking up the thesaurus. [sent-309, score-0.521]
74 This indicates that the tensor representation is able to capture the word relations explicitly described in the thesaurus. [sent-310, score-0.556]
75 After conducting tensor decomposition, MRLSA:Syn+Ant achieves similar results to PILSA. [sent-311, score-0.449]
76 However, the true power of MRLSA is its ability to incorporate other semantic relations to boost the performance of the target task. [sent-313, score-0.172]
77 For example, when we add the hypernymy and hyponymy relations to the tensor, these class-inclusion relations provide a weak signal to help resolve antonymy. [sent-314, score-0.251]
78 We suspect that this is due to the fact that antonyms typically share the same properties but only have the opposite meaning on one particular semantic dimension. [sent-315, score-0.23]
79 For instance, the antonyms “sadness” and “happiness” are different forms of emotion. [sent-316, score-0.163]
80 When two words are hyponyms of a target word, the likelihood that they are antonyms should thus be increased. [sent-317, score-0.228]
81 We show that the target relations and these auxiliary semantic relations can be collected from the same data source (e. [sent-318, score-0.279]
82 Moreover, our experiments show that adding the hypernym and hyponym layers from WordNet improves modeling antonym relations based on the Encarta thesaurus. [sent-324, score-0.215]
83 To better understand the model, we examine the top antonyms for three question words from the GRE test. [sent-328, score-0.188]
84 The lists below show antonyms and their MRLSA scores for each of the GRE question words as determined by the MRLSA:WordNet+Encarta model. [sent-329, score-0.188]
85 Table 2: Results of measuring the class-inclusion (is-a) relations in MaxDiff accuracy (see text for detail). [sent-359, score-0.14]
86 RawTensor has synonym and hyponym slices and measures the degree of is-a relation using Eq. [sent-360, score-0.223]
87 MRLSA:Syn+Hypo factors the raw tensor and judges the relation by Eq. [sent-362, score-0.578]
88 only preserves the antonyms in the thesaurus, but also discovers additional ones, such as exacerbate and inflame for “alleviate”. [sent-369, score-0.186]
89 Results show that even the raw tensor representation (RawTensor) performs better than WordNet lookup. [sent-391, score-0.521]
90 We suspect that this is because the tensor representation can capture the fact that the hyponyms of a word are usually synonymous to each other. [sent-392, score-0.49]
91 By performing Tucker decomposition on the raw tensor, MRLSA achieves better performance. [sent-393, score-0.183]
92 MRLSA:4-layers further leverages the information from antonyms and hypernyms and thus improves the model. [sent-394, score-0.191]
93 Therefore, it is interesting to check if combining synonyms and antonyms from Encarta helps. [sent-396, score-0.231]
94 Having one additional slice to capture the general term co-occurrence relation may help improve the model in this respect. [sent-422, score-0.191]
95 MRLSA models multiple word relations by leveraging a 3-way tensor, where each slice captures one particular relation. [sent-424, score-0.241]
96 A low-rank approximation of the tensor is then derived using a tensor decomposition. [sent-425, score-0.933]
97 Consequently, words in the vocabulary are represented by vectors in the latent semantic space, and each relation is captured by a latent square matrix. [sent-426, score-0.336]
98 Given two words, MRLSA not only can measure their degree of having a specific relation, but also can discover unknown relations between them. [sent-427, score-0.157]
99 By encoding relations from both homogeneous and heterogeneous data sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a. [sent-429, score-0.301]
100 For instance, the knowledge encoded by MRLSA can be enriched by adding more relations from a variety of linguistic resources, including the co-occurrence relations from large corpora. [sent-431, score-0.214]
wordName wordTfidf (topN-words)
[('mrlsa', 0.656), ('tensor', 0.449), ('encarta', 0.176), ('antonyms', 0.163), ('lsa', 0.161), ('slice', 0.134), ('pilsa', 0.129), ('tucker', 0.121), ('decomposition', 0.111), ('relations', 0.107), ('yih', 0.096), ('svd', 0.085), ('antonymy', 0.084), ('antonym', 0.082), ('raw', 0.072), ('latent', 0.071), ('kolda', 0.07), ('gre', 0.069), ('synonyms', 0.068), ('matrix', 0.066), ('slices', 0.065), ('vt', 0.061), ('gladden', 0.059), ('heterogeneous', 0.058), ('relation', 0.057), ('wordnet', 0.056), ('thesaurus', 0.055), ('mohammad', 0.052), ('degree', 0.05), ('synonymy', 0.05), ('cos', 0.048), ('bader', 0.047), ('maxdiff', 0.047), ('sorrow', 0.046), ('syn', 0.043), ('joy', 0.043), ('vectors', 0.042), ('semantic', 0.041), ('landauer', 0.041), ('hyponyms', 0.041), ('deerwester', 0.041), ('hypernymy', 0.037), ('anger', 0.037), ('joyfulness', 0.035), ('rawtensor', 0.035), ('zhila', 0.035), ('approximation', 0.035), ('polarity', 0.034), ('questions', 0.034), ('similarity', 0.033), ('measuring', 0.033), ('pivot', 0.032), ('generalizes', 0.032), ('column', 0.032), ('scaling', 0.031), ('encoding', 0.031), ('jurgens', 0.031), ('orthonormal', 0.031), ('tensors', 0.031), ('toolbox', 0.031), ('vocabulary', 0.03), ('turney', 0.029), ('rel', 0.029), ('meek', 0.028), ('hypernyms', 0.028), ('asymmetric', 0.028), ('analogous', 0.026), ('opposite', 0.026), ('hyponym', 0.026), ('question', 0.025), ('relatedness', 0.025), ('synonym', 0.025), ('semantics', 0.025), ('cosine', 0.025), ('tamara', 0.025), ('salton', 0.025), ('zweig', 0.025), ('target', 0.024), ('inner', 0.024), ('square', 0.024), ('idf', 0.024), ('tf', 0.024), ('degrees', 0.024), ('angsaedr', 0.023), ('buttercrunch', 0.023), ('fiber', 0.023), ('inflame', 0.023), ('louviere', 0.023), ('sadden', 0.023), ('subcategory', 0.023), ('theen', 0.023), ('wordi', 0.023), ('wordj', 0.023), ('comparably', 0.022), ('core', 0.022), ('construction', 0.022), ('homogeneous', 0.021), ('continuous', 0.021), ('socher', 0.021), ('platt', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek
Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.
2 0.21276839 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
Author: Dimitri Kartsaklis ; Mehrnoosh Sadrzadeh
Abstract: Recent work has shown that compositional-distributional models using element-wise operations on contextual word vectors benefit from the introduction of a prior disambiguation step. The purpose of this paper is to generalise these ideas to tensor-based models, where relational words such as verbs and adjectives are represented by linear maps (higher order tensors) acting on a number of arguments (vectors). We propose disambiguation algorithms for a number of tensor-based models, which we then test on a variety of tasks. The results show that disambiguation can provide better compositional representation even for the case of tensor-based models. Furthermore, we confirm previous findings regarding the positive effect of disambiguation on vector mixture models, and we compare the effectiveness of the two approaches.
3 0.13750747 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Author: Richard Socher ; Alex Perelygin ; Jean Wu ; Jason Chuang ; Christopher D. Manning ; Andrew Ng ; Christopher Potts
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
4 0.11815025 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
5 0.10280725 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
Author: Joo-Kyung Kim ; Marie-Catherine de Marneffe
Abstract: Continuous space word representations extracted from neural network language models have been used effectively for natural language processing, but until recently it was not clear whether the spatial relationships of such representations were interpretable. Mikolov et al. (2013) show that these representations do capture syntactic and semantic regularities. Here, we push the interpretation of continuous space word representations further by demonstrating that vector offsets can be used to derive adjectival scales (e.g., okay < good < excellent). We evaluate the scales on the indirect answers to yes/no questions corpus (de Marneffe et al., 2010). We obtain 72.8% accuracy, which outperforms previous results (∼60%) on this corpus and highlights the quality of the scales extracted, providing further support that the continuous space word representations are meaningful.
6 0.071156487 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
7 0.060233869 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
8 0.054255515 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
9 0.05175003 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
10 0.051241592 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
11 0.04971274 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri
12 0.048685201 152 emnlp-2013-Predicting the Presence of Discourse Connectives
13 0.048029941 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
14 0.047083825 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
15 0.046330083 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity
16 0.043833785 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning
17 0.040601414 41 emnlp-2013-Building Event Threads out of Multiple News Articles
18 0.039810773 160 emnlp-2013-Relational Inference for Wikification
19 0.037518825 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification
20 0.03648074 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
topicId topicWeight
[(0, -0.141), (1, 0.046), (2, -0.069), (3, -0.014), (4, 0.037), (5, 0.141), (6, -0.042), (7, -0.054), (8, -0.12), (9, -0.043), (10, 0.123), (11, 0.014), (12, -0.132), (13, 0.001), (14, 0.016), (15, -0.067), (16, 0.05), (17, 0.02), (18, -0.048), (19, 0.076), (20, -0.052), (21, 0.019), (22, -0.006), (23, -0.081), (24, -0.013), (25, -0.074), (26, -0.068), (27, 0.016), (28, 0.092), (29, 0.139), (30, 0.015), (31, 0.039), (32, 0.001), (33, -0.081), (34, 0.163), (35, 0.081), (36, -0.237), (37, 0.12), (38, 0.045), (39, -0.111), (40, -0.077), (41, 0.058), (42, -0.073), (43, -0.159), (44, 0.194), (45, 0.043), (46, -0.01), (47, -0.156), (48, -0.04), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.91515958 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek
Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.
2 0.7463789 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
Author: Dimitri Kartsaklis ; Mehrnoosh Sadrzadeh
Abstract: Recent work has shown that compositional-distributional models using element-wise operations on contextual word vectors benefit from the introduction of a prior disambiguation step. The purpose of this paper is to generalise these ideas to tensor-based models, where relational words such as verbs and adjectives are represented by linear maps (higher order tensors) acting on a number of arguments (vectors). We propose disambiguation algorithms for a number of tensor-based models, which we then test on a variety of tasks. The results show that disambiguation can provide better compositional representation even for the case of tensor-based models. Furthermore, we confirm previous findings regarding the positive effect of disambiguation on vector mixture models, and we compare the effectiveness of the two approaches.
Author: Masashi Tsubaki ; Kevin Duh ; Masashi Shimbo ; Yuji Matsumoto
Abstract: We present a novel vector space model for semantic co-compositionality. Inspired by Generative Lexicon Theory (Pustejovsky, 1995), our goal is a compositional model where both predicate and argument are allowed to modify each others’ meaning representations while generating the overall semantics. This readily addresses some major challenges with current vector space models, notably the polysemy issue and the use of one representation per word type. We implement cocompositionality using prototype projections on predicates/arguments and show that this is effective in adapting their word representations. We further cast the model as a neural network and propose an unsupervised algorithm to jointly train word representations with co-compositionality. The model achieves the best result to date (ρ = 0.47) on the semantic similarity task of transitive verbs (Grefenstette and Sadrzadeh, 2011).
4 0.43473399 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
5 0.40366966 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
Author: Joo-Kyung Kim ; Marie-Catherine de Marneffe
Abstract: Continuous space word representations extracted from neural network language models have been used effectively for natural language processing, but until recently it was not clear whether the spatial relationships of such representations were interpretable. Mikolov et al. (2013) show that these representations do capture syntactic and semantic regularities. Here, we push the interpretation of continuous space word representations further by demonstrating that vector offsets can be used to derive adjectival scales (e.g., okay < good < excellent). We evaluate the scales on the indirect answers to yes/no questions corpus (de Marneffe et al., 2010). We obtain 72.8% accuracy, which outperforms previous results (∼60%) on this corpus and highlights the quality of the scales extracted, providing further support that the continuous space word representations are meaningful.
6 0.37859315 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
7 0.35989383 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
8 0.34085622 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
9 0.30361187 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
10 0.28953347 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment
11 0.28892159 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification
12 0.27942646 60 emnlp-2013-Detecting Compositionality of Multi-Word Expressions using Nearest Neighbours in Vector Space Models
13 0.27410746 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion
15 0.256926 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
17 0.24875495 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
18 0.24682036 23 emnlp-2013-Animacy Detection with Voting Models
19 0.24252427 152 emnlp-2013-Predicting the Presence of Discourse Connectives
topicId topicWeight
[(3, 0.029), (6, 0.011), (8, 0.309), (9, 0.017), (18, 0.041), (22, 0.045), (30, 0.045), (43, 0.019), (50, 0.016), (51, 0.182), (66, 0.051), (71, 0.016), (75, 0.046), (77, 0.02), (90, 0.01), (96, 0.035), (97, 0.012)]
simIndex simValue paperId paperTitle
1 0.8123554 90 emnlp-2013-Generating Coherent Event Schemas at Scale
Author: Niranjan Balasubramanian ; Stephen Soderland ; Mausam ; Oren Etzioni
Abstract: Chambers and Jurafsky (2009) demonstrated that event schemas can be automatically induced from text corpora. However, our analysis of their schemas identifies several weaknesses, e.g., some schemas lack a common topic and distinct roles are incorrectly mixed into a single actor. It is due in part to their pair-wise representation that treats subjectverb independently from verb-object. This often leads to subject-verb-object triples that are not meaningful in the real-world. We present a novel approach to inducing open-domain event schemas that overcomes these limitations. Our approach uses cooccurrence statistics of semantically typed relational triples, which we call Rel-grams (relational n-grams). In a human evaluation, our schemas outperform Chambers’s schemas by wide margins on several evaluation criteria. Both Rel-grams and event schemas are freely available to the research community.
same-paper 2 0.77712768 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek
Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a low-rank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state-of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.
3 0.67079675 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
4 0.61346573 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
5 0.56163734 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier
Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.
6 0.55248529 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
7 0.5517 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
8 0.55035299 152 emnlp-2013-Predicting the Presence of Discourse Connectives
9 0.54960799 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
10 0.54926389 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
11 0.54911029 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
12 0.54774529 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
13 0.54733384 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
14 0.54710567 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
15 0.54702103 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
16 0.54668593 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
17 0.54649854 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
18 0.54633158 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
19 0.54616982 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
20 0.54511696 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors