emnlp emnlp2013 emnlp2013-64 knowledge-graph by maker-knowledge-mining

64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity


Source: pdf

Author: Yangfeng Ji ; Jacob Eisenstein

Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 jiyfeng@gatech.edu Abstract Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. [sent-2, score-0.792]

2 The key idea is that similarity in the latent space implies semantic relatedness. [sent-3, score-0.337]

3 We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. [sent-4, score-0.537]

4 First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. [sent-5, score-0.05]

5 Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. [sent-6, score-0.749]

6 Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art. [sent-7, score-0.335]

7 1 Introduction Measuring the semantic similarity of short units of text is fundamental to many natural language processing tasks, from evaluating machine translation (Kauchak and Barzilay, 2006) to grouping redundant event mentions in social media (Petrović et al. [sent-8, score-0.142]

8 The task is challenging because of the infinitely diverse set of possible linguistic realizations for any idea (Bhagat and Hovy, 2013), and because of the short length of individual sentences, which means that standard bag-of-words representations will be hopelessly sparse. [sent-10, score-0.053]

9 Distributional methods address this problem by transforming the high-dimensional bag-of-words representation into a lower-dimensional latent space. [sent-11, score-0.272]

10 Jacob Eisenstein, School of Interactive Computing, Georgia Institute of Technology, jacobe@gatech.edu. [sent-12, score-0.035]

11 This can be accomplished by factoring a matrix or tensor of term-context counts (Turney and Pantel, 2010); proximity in the induced latent space has been shown to correlate with semantic similarity (Mihalcea et al. [sent-13, score-0.682]

12 However, factoring the term-context matrix means throwing away a considerable amount of information, as the original matrix of size M × N (number of instances by number of features) is factored into two smaller matrices of size M × K and N × K, with K ≪ M, N. [sent-15, score-0.479]

13 If this discarded data is about semantic similarity, important information can be lost. [sent-19, score-0.055]

14 In this paper, we show how labeled data can considerably improve distributional methods for measuring semantic similarity. [sent-20, score-0.316]

15 First, we develop a new discriminative term-weighting metric called TF-KLD, which is applied to the term-context matrix before factorization. [sent-21, score-0.22]

16 On a standard paraphrase identification task (Dolan et al. [sent-22, score-0.499]

17 Next, we convert the latent representations of each sentence pair into a feature vector, which is used as input to a linear SVM classifier. [sent-24, score-0.23]

18 This yields further improvements and substantially outperforms the current state-of-the-art on paraphrase classification. [sent-25, score-0.455]

19 We then add “finegrained” features about the lexical similarity of the sentence pair. [sent-26, score-0.123]

20 The combination of latent and finegrained features yields further improvements in accuracy, demonstrating that these feature sets provide complementary information on semantic similarity. [sent-27, score-0.324]

21 , 2012); (2) syntactic operations on the parse structure (Wu, 2005; Das and Smith, 2009); and (3) distributional methods, such as latent semantic analysis (LSA; Landauer et al. [sent-33, score-0.468]

22 One application of distributional techniques is to replace individual words with distributionally similar alternatives (Kauchak and Barzilay, 2006). [sent-35, score-0.288]

23 Alternatively, Blacoe and Lapata (2012) show that latent word representations can be combined with simple elementwise operations to identify the semantic similarity of larger units of text. [sent-36, score-0.39]

24 We take a different approach: rather than representing the meanings of individual words, we directly obtain a distributional representation for the entire sentence. [sent-39, score-0.295]

25 (2006) and Guo and Diab (2012), who treat sentences as pseudo-documents in an LSA framework, and identify paraphrases using similarity in the latent space. [sent-41, score-0.367]

26 We show that the performance of such techniques can be improved dramatically by using supervised information to (1) reweight the individual distributional features and (2) learn the importance of each latent dimension. [sent-42, score-0.59]

27 3 Discriminative feature weighting Distributional representations (Turney and Pantel, 2010) can be induced from a co-occurrence matrix W ∈ R^{M×N}, where M is the number of instances and N is the number of distributional features. [sent-43, score-0.61]

28 For paraphrase identification, each instance is a sentence; features may be unigrams, or may include higher-order n-grams or dependency pairs. [sent-44, score-0.491]

29 By decomposing the matrix W, we hope to obtain a latent representation in which semantically-related sentences are similar. [sent-45, score-0.442]

30 However, recent work has demonstrated the robustness of nonnegative matrix factorization (NMF; Lee and Seung, 2001) for text mining tasks (Xu et al. [sent-47, score-0.41]

31 , 2012); the difference from SVD is the addition of a non-negativity constraint on the latent representation, which rests on a non-orthogonal basis. [sent-49, score-0.272]
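
To make the factorization step concrete, here is a small sketch (not the authors' code; the toy matrix, variable names, and scikit-learn usage are illustrative assumptions) of decomposing a sentence-by-feature count matrix with truncated SVD and with NMF, both minimizing Frobenius reconstruction error:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF, TruncatedSVD

# Toy sentence-by-feature count matrix W (M instances x N distributional features).
W = csr_matrix(np.array([
    [2.0, 0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
]))

K = 2  # latent dimensionality (the paper uses K = 100 for this comparison)

# SVD: unconstrained rank-K factorization, optimal in Frobenius norm.
svd = TruncatedSVD(n_components=K, random_state=0)
V_svd = svd.fit_transform(W)   # M x K latent sentence representations

# NMF: same reconstruction objective, plus a non-negativity constraint,
# so the latent basis is non-orthogonal and all entries are >= 0.
nmf = NMF(n_components=K, init="nndsvd", random_state=0)
V_nmf = nmf.fit_transform(W)   # M x K, entries >= 0

print(V_svd.shape, V_nmf.shape)
```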

32 While W may simply contain counts of distributional features, prior work has demonstrated the utility of reweighting these counts (Turney and Pantel, 2010). [sent-50, score-0.32]

33 Guo and Diab (2012) show that applying a special weight to unseen words can further improve performance on paraphrase identification. [sent-52, score-0.455]

34 We present a new weighting scheme, TF-KLD, based on supervised information. [sent-53, score-0.191]

35 The key idea is to increase the weights of distributional features that are discriminative, and to decrease the weights of features that are not. [sent-54, score-0.29]

36 Conceptually, this is similar to Linear Discriminant Analysis, a supervised feature weighting scheme for continuous data (Murphy, 2012). [sent-55, score-0.191]

37 Assuming the order of the sentences within the pair is irrelevant, then for the k-th distributional feature we define two Bernoulli distributions: [sent-57, score-0.395]

38 • pk = P(w_ik^(1) = 1 | w_ik^(2) = 1, r_i = 1): the probability that sentence w_i^(1) contains feature k, given that k appears in w_i^(2) and the two sentences are labeled as paraphrases. [sent-58, score-0.318]

39 • qk = P(w_ik^(1) = 1 | w_ik^(2) = 1, r_i = 0): the probability that sentence w_i^(1) contains feature k, given that k appears in w_i^(2) and the two sentences are labeled as not paraphrases. [sent-60, score-0.209]

40 The Kullback-Leibler divergence KL(pk || qk) = Σ_x pk(x) log(pk(x) / qk(x)) is a measure of the discriminability of feature k, and is guaranteed to be non-negative. (Figure 1: Conditional probabilities for a few hand-selected unigram features, with lines showing contours of identical KL-divergence.) [sent-61, score-0.064]

41 1 We use this divergence to reweight the features in W before performing the matrix factorization. [sent-65, score-0.343]

42 This has the effect of increasing the weights of features whose likelihood of appearing in a pair of sentences is strongly influenced by the paraphrase relationship between the two sentences. [sent-66, score-0.53]

43 On the other hand, if pk = qk, then the KL-divergence will be zero, and the feature will be ignored in the matrix factorization. [sent-67, score-0.347]

44 We name this weighting scheme TF-KLD, since it includes the term frequency and the KL-divergence. [sent-68, score-0.119]
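
A minimal sketch of the TF-KLD computation just described (the binary indicator inputs, smoothing constant, and function names are my own illustrative choices, not taken from the paper):

```python
import numpy as np

def tf_kld_weights(X1, X2, labels, smooth=0.05):
    """Per-feature KL(pk || qk), following the definitions above.

    X1, X2 : binary arrays of shape (n_pairs, n_features); X1[i, k] = 1 iff
             feature k occurs in the first sentence of pair i.
    labels : 1 = paraphrase pair, 0 = non-paraphrase pair.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    labels = np.asarray(labels)

    def bernoulli(mask):
        # P(feature k in sentence 1 | feature k in sentence 2, label),
        # smoothed so the probabilities stay away from 0 and 1.
        hits = (X1[mask] * X2[mask]).sum(axis=0) + smooth
        trials = X2[mask].sum(axis=0) + 2 * smooth
        return hits / trials

    p = bernoulli(labels == 1)   # pk
    q = bernoulli(labels == 0)   # qk
    # KL divergence between two Bernoulli distributions; zero when pk = qk,
    # so non-discriminative features are effectively ignored.
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Toy example with two labeled pairs; the resulting weights rescale the
# columns of the count matrix before factorization, e.g. W * kld[None, :].
kld = tf_kld_weights([[1, 0], [0, 1]], [[1, 0], [0, 0]], [1, 0])
print(kld)
```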

45 Taking the unigram feature "not" as an example, we have pk = [0. [sent-69, score-0.241]

25: the likelihood of this word being shared between two sentences is strongly dependent on whether the sentences are paraphrases. [sent-74, score-0.039]

and other unigram features with respect to pk and 1−qk. [sent-83, score-0.277]

48 The diagonal line running through the middle of the plot indicates zero KL-divergence, so features on this line will be ignored. [sent-84, score-0.036]

49 1We obtain very similar results with the opposite divergence KL(qk ||pk). [sent-85, score-0.068]

50 4 Supervised classification While previous work has performed paraphrase classification using distance or similarity in the latent space (Guo and Diab, 2012; Socher et al. [sent-89, score-0.897]

51 Specifically, we convert the latent representations of a pair of sentences v1 and v2 into a sample vector, s(v1, v2) = ? [sent-91, score-0.248]
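
Equation 1 is truncated in this extract, so the sketch below stands in one common symmetric combination, the concatenation of the element-wise sum and the absolute difference of the two latent vectors, and feeds it to the linear SVM mentioned above; treat that combination and all names as assumptions, not the paper's exact definition:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pair_vector(v1, v2):
    # A symmetric stand-in for s(v1, v2): element-wise sum and absolute
    # difference, concatenated (assumed; Equation 1 is truncated above).
    return np.concatenate([v1 + v2, np.abs(v1 - v2)])

def train_pair_classifier(latent_pairs, labels):
    """latent_pairs: iterable of (v1, v2) latent vectors, one per sentence pair;
    labels: 1 = paraphrase, 0 = not. Returns a fitted linear SVM."""
    X = np.vstack([pair_vector(v1, v2) for v1, v2 in latent_pairs])
    return LinearSVC(C=1.0).fit(X, labels)

# Toy usage with 3-dimensional latent vectors (placeholders):
clf = train_pair_classifier(
    [(np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.0])),
     (np.array([0.1, 0.9, 0.0]), np.array([0.0, 0.1, 0.9]))],
    labels=[1, 0])
print(clf.predict([pair_vector(np.array([0.7, 0.2, 0.1]),
                               np.array([0.8, 0.1, 0.1]))]))
```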

Given this representation s(·, ·), we can use any supervised classification algorithm. [sent-97, score-0.152]

53 A further advantage of treating paraphrase as a supervised classification problem is that we can apply additional features besides the latent representation. [sent-98, score-0.838]

54 We consider a subset of features identified by Wan et al. [sent-99, score-0.036]

55 These features mainly capture fine-grained similarity between sentences, for example by counting specific unigram and bigram overlap. [sent-101, score-0.187]
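
A rough illustration of such fine-grained overlap features (this conveys the flavor only; it is not the exact Wan et al. feature set used in the paper):

```python
def ngram_set(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_features(tokens1, tokens2):
    """Illustrative unigram/bigram overlap counts and ratios; not the exact
    Wan et al. feature set referenced above."""
    feats = []
    for n in (1, 2):
        a, b = ngram_set(tokens1, n), ngram_set(tokens2, n)
        shared = len(a & b)
        feats += [shared,
                  shared / max(len(a), 1),   # coverage of sentence 1's n-grams
                  shared / max(len(b), 1)]   # coverage of sentence 2's n-grams
    feats.append(abs(len(tokens1) - len(tokens2)))  # length difference
    return feats

print(overlap_features("the cat sat on the mat".split(),
                       "the cat sat on a rug".split()))
```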

56 5 Experiments Our experiments test the utility of the TF-KLD weighting for paraphrase classification, using the Microsoft Research Paraphrase Corpus (Dolan et al. [sent-102, score-0.574]

57 The training set contains 2753 true paraphrase pairs and 1323 false paraphrase pairs; the test set contains 1147 and 578 pairs, respectively. [sent-104, score-0.91]

58 The TF-KLD weights are constructed from only the training set, while matrix factorizations are performed on the entire corpus. [sent-105, score-0.205]

59 Matrix factorization on both training and (unlabeled) test data can be viewed as a form of transductive learning (Gammerman et al. [sent-106, score-0.297]

60 , 1998), where we assume access to unlabeled test set instances. [sent-107, score-0.035]

61 2 We also consider an inductive setting, where we construct the basis of the latent space from only the training set, and then project the test set onto this basis to find the corresponding latent representation. [sent-108, score-0.519]

62 The performance differences between the transductive and inductive settings were generally between 0. [sent-109, score-0.157]
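
In scikit-learn terms, the inductive setting corresponds to fitting the factorization on the training sentences only and then transforming held-out sentences with the learned basis; a sketch under assumed variable names:

```python
from sklearn.decomposition import NMF

def inductive_latent_vectors(W_train, W_test, K=100):
    """Learn the latent basis from the training matrix only (inductive setting),
    then project held-out sentences onto that fixed basis."""
    nmf = NMF(n_components=K, init="nndsvd", random_state=0)
    V_train = nmf.fit_transform(W_train)  # basis estimated from training data
    V_test = nmf.transform(W_test)        # test data projected, basis unchanged
    return V_train, V_test
```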

63 To our knowledge, the current state-of-the-art is a supervised system that combines several machine translation metrics (Madnani et al. [sent-113, score-0.072]

64 , 2012), but we also compare with state-of-the-art unsupervised matrix factorization work (Guo and Diab, 2012). [sent-114, score-0.361]

65 1 Similarity-based classification In the first experiment, we predict whether a pair of sentences is a paraphrase by measuring their cosine similarity in latent space, using a threshold for the classification boundary. [sent-116, score-0.957]

66 As in prior work (Guo and Diab, 2012), the threshold is tuned on held-out training data. [sent-117, score-0.067]
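
A minimal sketch of this similarity-based baseline; picking the threshold by held-out accuracy is an assumption about the tuning criterion:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity of two latent sentence vectors.
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def tune_threshold(similarities, labels):
    """Pick the decision threshold that maximizes accuracy on held-out pairs."""
    similarities = np.asarray(similarities)
    labels = np.asarray(labels)
    best_t, best_acc = 0.0, 0.0
    for t in np.unique(similarities):
        acc = float(np.mean((similarities >= t) == labels))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Toy example: cosine similarities of held-out pairs in latent space and labels.
print(cosine(np.array([1.0, 0.0]), np.array([0.9, 0.1])))
print(tune_threshold([0.91, 0.40, 0.75, 0.33], [1, 0, 1, 0]))
```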

67 We consider two distributional feature sets: FEAT1, which includes unigrams; and FEAT2, which also includes bigrams and unlabeled dependency pairs obtained from MaltParser (Nivre et al. [sent-118, score-0.253]

68 To compare with Guo and Diab (2012), we set the latent dimensionality to K = 100, which was the same in their paper. [sent-120, score-0.195]

69 Both SVD and NMF factorization are evaluated; in both cases, we minimize the Frobenius norm of the reconstruction error. [sent-121, score-0.191]

70 Table 2 compares the accuracy of a number of different configurations. [sent-122, score-0.039]

71 The transductive TF-KLD weighting yields the best overall accuracy, achieving 72. [sent-123, score-0.225]

72 While NMF performs slightly better than SVD in both comparisons, the major difference is the performance of discriminative TF-KLD weighting, which outperforms TF-IDF regardless of the factorization technique. [sent-125, score-0.241]

73 Another example of transductive learning in NLP is when Turian et al. [sent-126, score-0.106]

74 (2010) induced word representations from a corpus that included both training and test data for their downstream named entity recognition task. [sent-127, score-0.103]

75 Figure 2: Accuracy of feature and weighting combinations in the classification framework. [sent-128, score-0.199]

76 When we perform the matrix factorization on only the training data, the accuracy on the test set is 73. [sent-129, score-0.4]

77 2 Supervised classification Next, we apply supervised classification, constructing sample vectors from the latent representation as shown in Equation 1. [sent-133, score-0.424]

78 Figure 2 presents results for a range of latent dimensionalities. [sent-137, score-0.195]

79 Supervised learning identifies the important dimensions in the latent space, yielding significantly better performance than the similarity-based classification from the previous experiment. [sent-138, score-0.347]

80 In Table 3, we compare against prior published work, using the held-out development set to select the best value of K (again, K = 400). [sent-139, score-0.067]

81 The best result is from TF-KLD, with distributional features FEAT2, achieving 79. [sent-140, score-0.254]

82 This is well beyond all known prior results on this task. [sent-143, score-0.067]

83 When we induce the latent basis from only the training data, we get 78. [sent-144, score-0.234]

84 Finally, we augment the distributional representation, concatenating the ten "fine-grained" features (Table 1) to the sample vector in Equation 1. [sent-147, score-0.26]

[Table residue: WTMF / SVD / NMF configurations with unigram features, TF-IDF / TF-KLD / TF-KLD weighting, K = 100, cosine similarity classification.] [sent-154, score-0.036]

86 When the latent representation is induced from only the training data, the corresponding results are 79. [sent-191, score-0.367]

87 F1, again. These results show that the information captured by the distributional representation can still be augmented by more fine-grained traditional features. [sent-194, score-0.34]

88 6 Conclusion We have presented three ways in which labeled data can improve distributional measures of semantic similarity at the sentence level. [sent-195, score-0.348]

89 The main innovation is TF-KLD, which discriminatively reweights the distributional features before factorization, so that discriminability impacts the induction of the latent representation. [sent-196, score-0.449]

90 We then transform the latent representation into a sample vector for supervised learning, obtaining results that strongly outperform the prior state-of-the-art; adding fine-grained lexical features further increases performance. [sent-197, score-0.486]

91 These ideas may have applicability in other semantic similarity tasks, and we are also eager to apply them to new, large-scale automatically-induced paraphrase corpora (Ganitkevitch et al. [sent-198, score-0.597]

92 In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 546–556, Stroudsburg, PA, USA. [sent-213, score-0.037]

93 In Proceedings of the Joint Conference of the Annual Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing, pages 468–476, Stroudsburg, PA, USA. [sent-224, score-0.037]

94 In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 148–155. [sent-237, score-0.037]

95 In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 864–872, Stroudsburg, PA, USA. [sent-247, score-0.037]

96 Recognizing paraphrases and textual entailment using inversion transduction grammars. [sent-311, score-0.125]

97 In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 25–30. [sent-312, score-0.037]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('paraphrase', 0.455), ('distributional', 0.218), ('latent', 0.195), ('diab', 0.194), ('factorization', 0.191), ('pk', 0.177), ('qk', 0.177), ('matrix', 0.17), ('guo', 0.164), ('nmf', 0.158), ('svd', 0.132), ('kauchak', 0.12), ('wtmf', 0.12), ('weighting', 0.119), ('madnani', 0.111), ('dolan', 0.106), ('transductive', 0.106), ('wan', 0.095), ('unigrams', 0.089), ('similarity', 0.087), ('paraphrases', 0.085), ('classification', 0.08), ('arora', 0.08), ('gammerman', 0.08), ('das', 0.078), ('turney', 0.078), ('representation', 0.077), ('bu', 0.076), ('supervised', 0.072), ('socher', 0.071), ('reweight', 0.069), ('bhagat', 0.069), ('factoring', 0.069), ('divergence', 0.068), ('prior', 0.067), ('wi', 0.066), ('unigram', 0.064), ('petrovi', 0.063), ('blacoe', 0.063), ('cosine', 0.06), ('mihalcea', 0.057), ('tensor', 0.056), ('maltparser', 0.056), ('semantic', 0.055), ('fan', 0.055), ('tences', 0.053), ('turian', 0.053), ('ganitkevitch', 0.053), ('representations', 0.053), ('pantel', 0.051), ('weiwei', 0.051), ('inductive', 0.051), ('discriminative', 0.05), ('induced', 0.05), ('nonnegative', 0.049), ('landauer', 0.047), ('ri', 0.045), ('sults', 0.045), ('identification', 0.044), ('lsa', 0.044), ('labeled', 0.043), ('kl', 0.042), ('concatenating', 0.042), ('stroudsburg', 0.042), ('bleu', 0.041), ('entailment', 0.04), ('accuracy', 0.039), ('basis', 0.039), ('strongly', 0.039), ('finegrained', 0.038), ('yielding', 0.037), ('pages', 0.037), ('features', 0.036), ('georgia', 0.036), ('unlabeled', 0.035), ('barzilay', 0.035), ('interactive', 0.035), ('nivre', 0.035), ('acobe', 0.035), ('gat', 0.035), ('dividual', 0.035), ('reweighting', 0.035), ('elem', 0.035), ('adi', 0.035), ('atanas', 0.035), ('chanev', 0.035), ('distributionally', 0.035), ('eofr', 0.035), ('eryigit', 0.035), ('factorizations', 0.035), ('justice', 0.035), ('marinov', 0.035), ('oefs', 0.035), ('owfo', 0.035), ('ppdb', 0.035), ('sentations', 0.035), ('similaritybased', 0.035), ('svetoslav', 0.035), ('ulsen', 0.035), ('yihong', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

Author: Yangfeng Ji ; Jacob Eisenstein

Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.

2 0.17485996 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

Author: Congle Zhang ; Daniel S. Weld

Abstract: The distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings, has inspired several Web mining algorithms for paraphrasing semantically equivalent phrases. Unfortunately, these methods have several drawbacks, such as confusing synonyms with antonyms and causes with effects. This paper introduces three Temporal Correspondence Heuristics, that characterize regularities in parallel news streams, and shows how they may be used to generate high precision paraphrases for event relations. We encode the heuristics in a probabilistic graphical model to create the NEWSSPIKE algorithm for mining news streams. We present experiments demonstrating that NEWSSPIKE significantly outperforms several competitive baselines. In order to spur further research, we provide a large annotated corpus of timestamped news arti- cles as well as the paraphrases produced by NEWSSPIKE.

3 0.13299561 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy

Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.

4 0.13194849 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the loglikelihood of the documents from both language domains under a cross-lingual logbilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the pro- posed cross-lingual adaptation approach.

5 0.11815025 137 emnlp-2013-Multi-Relational Latent Semantic Analysis

Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek

Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a lowrank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state- of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.

6 0.11385678 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

7 0.10288118 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

8 0.0973529 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

9 0.09304662 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

10 0.081977308 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

11 0.077635817 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

12 0.0776015 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization

13 0.076075621 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

14 0.075312681 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

15 0.072885871 87 emnlp-2013-Fish Transporters and Miracle Homes: How Compositional Distributional Semantics can Help NP Parsing

16 0.07277929 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

17 0.071448743 197 emnlp-2013-Using Paraphrases and Lexical Semantics to Improve the Accuracy and the Robustness of Supervised Models in Situated Dialogue Systems

18 0.070011213 123 emnlp-2013-Learning to Rank Lexical Substitutions

19 0.069811083 58 emnlp-2013-Dependency Language Models for Sentence Completion

20 0.069385014 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.234), (1, 0.015), (2, -0.074), (3, -0.011), (4, 0.053), (5, 0.137), (6, 0.01), (7, -0.031), (8, -0.154), (9, -0.092), (10, 0.119), (11, -0.01), (12, -0.059), (13, 0.078), (14, 0.038), (15, -0.047), (16, -0.019), (17, 0.081), (18, -0.043), (19, 0.005), (20, -0.041), (21, -0.061), (22, 0.066), (23, -0.075), (24, -0.026), (25, 0.039), (26, -0.217), (27, 0.058), (28, 0.039), (29, 0.135), (30, 0.102), (31, 0.214), (32, 0.074), (33, 0.043), (34, 0.062), (35, -0.055), (36, 0.041), (37, -0.117), (38, 0.018), (39, -0.033), (40, -0.087), (41, -0.153), (42, -0.007), (43, -0.064), (44, 0.113), (45, -0.028), (46, -0.116), (47, -0.027), (48, 0.104), (49, -0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95039451 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

Author: Yangfeng Ji ; Jacob Eisenstein

Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.

2 0.6124807 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri

Author: Martin Riedl ; Chris Biemann

Abstract: We introduce a new highly scalable approach for computing Distributional Thesauri (DTs). By employing pruning techniques and a distributed framework, we make the computation for very large corpora feasible on comparably small computational resources. We demonstrate this by releasing a DT for the whole vocabulary of Google Books syntactic n-grams. Evaluating against lexical resources using two measures, we show that our approach produces higher quality DTs than previous approaches, and is thus preferable in terms of speed and quality for large corpora.

3 0.60200566 137 emnlp-2013-Multi-Relational Latent Semantic Analysis

Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek

Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a lowrank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state- of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.

4 0.59767872 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

Author: Congle Zhang ; Daniel S. Weld

Abstract: The distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings, has inspired several Web mining algorithms for paraphrasing semantically equivalent phrases. Unfortunately, these methods have several drawbacks, such as confusing synonyms with antonyms and causes with effects. This paper introduces three Temporal Correspondence Heuristics, that characterize regularities in parallel news streams, and shows how they may be used to generate high precision paraphrases for event relations. We encode the heuristics in a probabilistic graphical model to create the NEWSSPIKE algorithm for mining news streams. We present experiments demonstrating that NEWSSPIKE significantly outperforms several competitive baselines. In order to spur further research, we provide a large annotated corpus of timestamped news arti- cles as well as the paraphrases produced by NEWSSPIKE.

5 0.55292547 197 emnlp-2013-Using Paraphrases and Lexical Semantics to Improve the Accuracy and the Robustness of Supervised Models in Situated Dialogue Systems

Author: Claire Gardent ; Lina M. Rojas Barahona

Abstract: This paper explores to what extent lemmatisation, lexical resources, distributional semantics and paraphrases can increase the accuracy of supervised models for dialogue management. The results suggest that each of these factors can help improve performance but that the impact will vary depending on their combination and on the evaluation mode.

6 0.50972372 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations

7 0.49889058 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

8 0.46722013 123 emnlp-2013-Learning to Rank Lexical Substitutions

9 0.4452312 138 emnlp-2013-Naive Bayes Word Sense Induction

10 0.44471273 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

11 0.43843228 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

12 0.42625773 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization

13 0.41773012 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

14 0.40441662 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

15 0.40257841 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs

16 0.39536491 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI

17 0.39259216 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

18 0.38542268 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

19 0.3791098 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

20 0.37886879 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.026), (6, 0.01), (8, 0.071), (18, 0.08), (22, 0.047), (30, 0.072), (39, 0.138), (45, 0.018), (47, 0.013), (50, 0.014), (51, 0.188), (66, 0.056), (71, 0.027), (75, 0.076), (77, 0.015), (96, 0.057), (97, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88116163 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

Author: Yangfeng Ji ; Jacob Eisenstein

Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.

2 0.8788482 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

Author: Fang Kong ; Hwee Tou Ng

Abstract: Coreference resolution plays a critical role in discourse analysis. This paper focuses on exploiting zero pronouns to improve Chinese coreference resolution. In particular, a simplified semantic role labeling framework is proposed to identify clauses and to detect zero pronouns effectively, and two effective methods (refining syntactic parser and refining learning example generation) are employed to exploit zero pronouns for Chinese coreference resolution. Evaluation on the CoNLL-2012 shared task data set shows that zero pronouns can significantly improve Chinese coreference resolution.

3 0.85097808 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

Author: Fan Yang ; Paul Vozila

Abstract: In this paper we report an empirical study on semi-supervised Chinese word segmentation using co-training. We utilize two segmenters: 1) a word-based segmenter leveraging a word-level language model, and 2) a character-based segmenter using characterlevel features within a CRF-based sequence labeler. These two segmenters are initially trained with a small amount of segmented data, and then iteratively improve each other using the large amount of unlabelled data. Our experimental results show that co-training captures 20% and 31% of the performance improvement achieved by supervised training with an order of magnitude more data for the SIGHAN Bakeoff 2005 PKU and CU corpora respectively.

4 0.83450669 137 emnlp-2013-Multi-Relational Latent Semantic Analysis

Author: Kai-Wei Chang ; Wen-tau Yih ; Christopher Meek

Abstract: We present Multi-Relational Latent Semantic Analysis (MRLSA) which generalizes Latent Semantic Analysis (LSA). MRLSA provides an elegant approach to combining multiple relations between words by constructing a 3-way tensor. Similar to LSA, a lowrank approximation of the tensor is derived using a tensor decomposition. Each word in the vocabulary is thus represented by a vector in the latent semantic space and each relation is captured by a latent square matrix. The degree of two words having a specific relation can then be measured through simple linear algebraic operations. We demonstrate that by integrating multiple relations from both homogeneous and heterogeneous information sources, MRLSA achieves state- of-the-art performance on existing benchmark datasets for two relations, antonymy and is-a.

5 0.81114811 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

6 0.8097465 90 emnlp-2013-Generating Coherent Event Schemas at Scale

7 0.80945331 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

8 0.80665731 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

9 0.80023783 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

10 0.79506439 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

11 0.79392642 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

12 0.79151255 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

13 0.78891742 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

14 0.78693765 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

15 0.78680545 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

16 0.78610909 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

17 0.78502852 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

18 0.78443372 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

19 0.78409731 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

20 0.78329355 143 emnlp-2013-Open Domain Targeted Sentiment