emnlp emnlp2013 emnlp2013-17 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. [sent-2, score-0.482]
2 Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. [sent-3, score-0.093]
3 Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). [sent-4, score-0.273]
4 We show an efficient formulation to compute this kernel using simple matrix operations. [sent-5, score-0.564]
5 1 Introduction Capturing semantic similarity between sentences is a fundamental issue in NLP, with applications in a wide range of tasks. [sent-7, score-0.178]
6 Previously, tree kernels based on common substructures have been used to model similarity between parse trees (Collins and Duffy, 2002; Moschitti, 2004; Moschitti, 2006b). [sent-8, score-0.976]
7 These kernels encode a high number of latent syntactic features within a concise representation, and compute the similarity between two parse trees based on the matching of node-labels (words, POS tags, etc.). [sent-9, score-0.913]
8 While this is sufficient to capture syntactic similarity, it does not capture semantic similarity very well, even when using discrete semantic types as node labels. [sent-11, score-0.405]
9 This constrains the utility of many traditional tree kernels in two ways: i) two sentences that are syntactically identical, but have no semantic similarity can receive a high matching score (see Table 1, top) while ii) two sentences with only local [sent-12, score-0.897]
10 syntactic overlap, but high semantic similarity can receive low scores (see Table 1, bottom). [sent-13, score-0.254]
11 In contrast, distributional vector representations of words have been successful in capturing fine-grained semantics, but lack syntactic knowledge. [sent-16, score-0.311]
12 Resources such as Wordnet, dictionaries and ontologies that encode different semantic perspectives can also provide additional knowledge infusion. [sent-17, score-0.052]
13 In this paper, we describe a generic walk-based graph kernel for dependency parse trees that subsumes general notions of word-similarity, while focusing on vector representations of words to capture lexical semantics. [sent-18, score-0.826]
14 Through a convolutional framework, our approach takes into account the distributional semantic similarities between words in a sentence as well as the structure of the parse tree. [sent-19, score-0.27]
15 We present a new graph kernel for NLP that extends to distributed word representations, and diverse word similarity measures. [sent-21, score-0.686]
16 Our generic kernel shows state-of-the-art performance on three eclectic NLP tasks. [sent-25, score-0.376]
17 2 Related Work Tree kernels in NLP Tree kernels have been extensively used to capture syntactic information about parse trees in tasks such as parsing (Collins and Duffy, 2002), NER (Wang et al. [sent-28, score-1.156]
18 These kernels are based on the paradigm that parse trees are similar if they contain many common substructures, consisting of nodes with identical labels (Vishwanathan and Smola, 2003; Collins and Duffy, 2002). [sent-32, score-0.656]
19 Moschitti (2006a) proposed a partial tree kernel that adds flexibility in matching tree substructures. [sent-33, score-0.695]
20 (2011) introduce a lexical semantic tree kernel that incorporates continuous similarity values between node labels, albeit with a different focus than ours, and would not match words with different POS. [sent-35, score-0.786]
21 This would miss the similarity of “feline friend” and “cat” in our examples, as it requires matching the adjective “feline” with “cat”, and the verb “kissed” with “kiss”. [sent-36, score-0.165]
22 Walk-based kernels Kernels for structured data derive from the seminal Convolution Kernel formalism by Haussler (1999) for designing kernels for structured objects through local decompositions. [sent-37, score-1.005]
23 Our proposed kernel for parse trees is most closely associated with the random walk-based kernels defined by Gartner et al. [sent-38, score-1.032]
24 The walk-based graph kernels proposed by Gartner et al. [sent-41, score-0.526]
25 (2003) count the common walks between two input graphs, using the adjacency matrix of the product graph. [sent-42, score-0.676]
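As background, a minimal sketch of that counting construction (not this paper's kernel): the direct product graph's adjacency matrix is the Kronecker product of the two input adjacency matrices, and the entries of its k-th power count common k-length walks.

```python
# Hedged background sketch of the Gartner et al. walk-counting construction
# referenced above; names are illustrative.
import numpy as np

def common_walk_count(A1, A2, k):
    Ax = np.kron(A1, A2)  # adjacency matrix of the direct product graph
    return int(np.linalg.matrix_power(Ax, k).sum())
```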
26 This work extends to graphs with a finite set of edge and node labels by appropriately modifying the adjacency matrix. [sent-43, score-0.353]
27 Our kernel differs from these kernels in two significant ways: (i) Our method extends beyond label matching to continuous similarity metrics (this conforms with the very general formalism for graph kernels in Vishwanathan et al. [sent-44, score-1.63]
28 (ii) Rather than using the adjacency matrix to model edge-strengths, we modify the product graph and the corresponding adjacency matrix to model node similarities. [sent-46, score-0.743]
29 3 Vector Tree Kernels In this section, we describe our kernel and an algorithm to compute it via a simple matrix multiplication formulation. [sent-47, score-0.512]
30 1 Kernel description The similarity kernel K between two dependency trees can be defined as:

K(T_1, T_2) = \sum_{\substack{h_1 \subseteq T_1,\ h_2 \subseteq T_2 \\ \mathrm{len}(h_1) = \mathrm{len}(h_2)}} k(h_1, h_2)

where the summation is over pairs of equal-length walks h_1 and h_2 on the trees T_1 and T_2 respectively. [sent-49, score-1.24]
31 The vector representation allows us several choices for the node kernel function κ. [sent-51, score-0.498]
32 We require the node kernel κ to return nonnegative values in [0, 1] (assuming word vector representations are normalized). [sent-62, score-0.085]
33 Non-negativity is necessary, since we define the walk kernel to be the product of the individual kernels. [sent-63, score-0.588]
34 As walk kernels are products of individual node-kernels, boundedness by 1 ensures that the kernel contribution does not grow arbitrarily for longer walks. [sent-64, score-1.07]
35 The kernel function K puts a high similarity weight between parse trees if they contain common walks with semantically similar words in corresponding positions. [sent-65, score-1.055]
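To make the definition concrete, here is a deliberately naive, exponential-time sketch that enumerates walk pairs directly (the efficient matrix form appears in Section 3.2). The adjacency-list representation and all names are illustrative assumptions, not the authors' code.

```python
# Naive sketch of K(T1, T2): sum over equal-length walk pairs of the
# product of node kernels. Walks are tuples of node indices.
def all_walks(adj, max_nodes):
    """Return all walks with 1..max_nodes nodes on a graph given as an
    adjacency list (a list of neighbor-index lists)."""
    level = [(v,) for v in range(len(adj))]
    walks = list(level)
    for _ in range(max_nodes - 1):
        level = [w + (nxt,) for w in level for nxt in adj[w[-1]]]
        walks.extend(level)
    return walks

def tree_kernel_naive(adj1, vecs1, adj2, vecs2, node_kernel, max_nodes):
    total = 0.0
    for h1 in all_walks(adj1, max_nodes):
        for h2 in all_walks(adj2, max_nodes):
            if len(h1) == len(h2):
                k = 1.0
                for a, b in zip(h1, h2):
                    k *= node_kernel(vecs1[a], vecs2[b])
                total += k
    return total
```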
36 Apart from the Gaussian kernel, the other two kernels are based on the dot-product of the word vector representations. [sent-66, score-0.484]
37 We observe that the positive-linear kernel defined above is not a Mercer kernel, since the max operation makes it non-positive semidefinite (non-PSD). [sent-67, score-0.376]
38 However, this formulation has desirable properties, the most significant being that all walks with one or more node-pair mismatches are strictly penalized and add no score to the tree-kernel. [sent-68, score-0.433]
39 This is a more selective condition than the other two kernels, where mediocre walk combinations could also add small contributions to the score. [sent-69, score-0.141]
40 The sigmoid kernel is also non-PSD, but is known to work well empirically (Boughorbel et al. [sent-70, score-0.412]
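A hedged sketch of the three node-kernel choices discussed above, assuming unit-normalized word vectors; the exact parameterizations (especially the sigmoid rescaling) are guesses, since the paper's formulas are not reproduced in this dump.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    # Values in (0, 1]; equals 1 only when x == y.
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def positive_linear_kernel(x, y):
    # The max clips negative dot products to 0 -- selective, but non-PSD.
    return float(max(0.0, np.dot(x, y)))

def sigmoid_kernel(x, y, a=1.0, c=0.0):
    # tanh-based and also non-PSD; shifted/rescaled here so outputs stay
    # in [0, 1] for normalized vectors (an assumption, not the paper's form).
    return float(0.5 * (np.tanh(a * np.dot(x, y) + c) + 1.0))
```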
41 We also observe that while the summation in the kernel is over equal-length walks, the formalism can allow comparisons over different-length paths by including self-loops at nodes in the tree. [sent-72, score-0.66]
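One concrete (assumed) way to realize the self-loop idea in the adjacency-list representation used in the sketches above:

```python
# Adding a self-loop at every node lets a walk "stay in place" on one tree,
# so equal-length walks on the product graph can compare paths of different
# effective lengths on the two trees.
def with_self_loops(adj):
    return [list(neighbors) + [v] for v, neighbors in enumerate(adj)]
```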
42 With a notion of similarity between words that defines the local node kernels, we need computational machinery to enumerate all pairs of walks between two trees, and compute the summation over products in the kernel K(T1, T2) efficiently. [sent-73, score-1.143]
43 We now show a convenient way to compute this as a matrix geometric series. [sent-74, score-0.136]
44 2 Matrix Formulation for Kernel Computation Walk-based kernels compute the number of common walks using the adjacency matrix of the product graph (Gartner et al. [sent-76, score-1.248]
45 In our case, this computation is complicated by the fact that instead of counting common walks, we need to compute a product of node-similarities for each walk. [sent-78, score-0.117]
46 Since we compute similarity scores over nodes, rather than edges, the product for a walk of length n involves n + 1 factors. [sent-79, score-0.434]
47 However, we can still compute the tree kernel K as a simple sum of matrix products. [sent-80, score-0.652]
48 Given two trees T(V, E) and T′(V′, E′), we define a modified product graph G(Vp, Ep) with an additional ghost node u added to the vertex set. [sent-81, score-0.381]
49 In our formulation, u now serves as a starting location for all random walks on G, and a (k + 1)-length walk on G corresponds to a pair of k-length walks on T and T′. [sent-83, score-0.893]
50 We now define the weighted adjacency matrix W for G, which incorporates the local node kernels. [sent-84, score-0.375]
51 W_{(v_{i_1}, v'_{j_1}),(v_{i_2}, v'_{j_2})} = \begin{cases} \kappa(v_{i_2}, v'_{j_2}) & \text{if } ((v_{i_1}, v'_{j_1}),(v_{i_2}, v'_{j_2})) \in E_p \\ 0 & \text{otherwise} \end{cases}, \qquad W_{u,(v_{i_1}, v'_{j_1})} = \kappa(v_{i_1}, v'_{j_1}), \qquad W_{(v,u)} = 0 \ \forall v \in V_p. There is a straightforward bijective mapping from walks on G starting from u to pairs of walks on T and T′. [sent-85, score-0.702]
52 Restricting ourselves to the case when the first node of a (k + 1)-length walk is u, the next k steps allow us to efficiently compute the products of the node similarities along the k nodes in the corresponding k-length walks in T and T′. [sent-86, score-0.961]
53 Given this adjacency matrix for G, the sum of values of k-length walk kernels is given by the u-th row of the (k + 1)-th power of the weighted adjacency matrix (denoted W^{k+1}). [sent-87, score-1.016]
54 This corresponds to (k + 1)-length walks on G starting from u and ending at any node. [sent-88, score-0.351]
55 Specifically, W^{n+1}_{u,(v_i, v'_j)} corresponds to the sum of similarities of all common walks of length n in T and T′ that end in v_i in T and v′_j in T′. [sent-89, score-0.449]
56 The kernel K for walks up to length N can now be calculated as:

K(T, T') = \sum_{i=1}^{|V_p|} S_{u,i}, \quad \text{where } S = W + W^2 + \dots + W^{N+1}.

[sent-90, score-0.777]
57 We note that in our formulation, longer walks are naturally discounted, since they involve products of more factors (generally all less than unity). [sent-93, score-0.4]
58 The above kernel provides a similarity measure between any pair of dependency parse trees. [sent-94, score-0.539]
59 Depending on whether we consider directional relations in the parse tree, the edge set Ep changes, while the procedure for the kernel computation remains the same. [sent-95, score-0.521]
60 Finally, to avoid larger trees yielding larger values for the kernel, we normalize the kernel by the number of edges in the product graph. [sent-96, score-0.589]
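Putting the pieces together, below is a minimal runnable sketch of the matrix formulation, following the text: product-graph vertices are node pairs plus a ghost node u, the weighted adjacency matrix holds node-kernel values, and K sums the u-th row of W + W^2 + ... + W^{N+1}, normalized by the number of product-graph edges. All function and variable names are assumptions.

```python
import numpy as np

def vector_tree_kernel(adj1, vecs1, adj2, vecs2, node_kernel, max_len):
    """Sketch of K(T, T') via the ghost-node product graph described above."""
    n1, n2 = len(adj1), len(adj2)

    def idx(i, j):
        return 1 + i * n2 + j  # vertex (v_i, v'_j); index 0 is the ghost node u

    W = np.zeros((1 + n1 * n2, 1 + n1 * n2))

    # Node-kernel values kappa(v_i, v'_j) for every node pair.
    kappa = np.array([[node_kernel(vecs1[i], vecs2[j])
                       for j in range(n2)] for i in range(n1)])

    # Product-graph edges: (i1, j1) -> (i2, j2) whenever (i1, i2) is an edge
    # of T and (j1, j2) an edge of T'; the weight is the kernel value at the
    # destination node pair.
    for i1 in range(n1):
        for i2 in adj1[i1]:
            for j1 in range(n2):
                for j2 in adj2[j1]:
                    W[idx(i1, j1), idx(i2, j2)] = kappa[i2, j2]

    # The ghost node u starts every walk and has no incoming edges.
    for i in range(n1):
        for j in range(n2):
            W[0, idx(i, j)] = kappa[i, j]

    # S = W + W^2 + ... + W^(N+1); K is the sum over the u-th row of S.
    S = np.zeros_like(W)
    P = W.copy()
    for _ in range(max_len + 1):
        S += P
        P = P @ W
    k = S[0].sum()

    # Normalize by the number of product-graph edges, as the text suggests
    # (approximated here by counting nonzero weights, which undercounts when
    # a node kernel returns exactly 0).
    n_edges = np.count_nonzero(W[1:, 1:])
    return k / n_edges if n_edges else k
```

With undirected adjacency lists both edge directions are present; switching to directed lists changes Ep as the text notes, while the computation is unchanged.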
61 We create dependency trees using the FANSE parser (Tratz and Hovy, 2011), and use distribution-based SENNA word embeddings by Collobert et al. [sent-98, score-0.203]
62 These embeddings provide low-dimensional vector representations of words, while encoding distributional semantic characteristics. [sent-100, score-0.244]
63 The task is to identify the polarity of a given sentence. [sent-104, score-0.065]
64 “terribly entertaining” vs “terribly written”), so simple lexical approaches are not expected to work well here, while syntactic context could help disambiguation. [sent-107, score-0.046]
65 Next, we try our approach on the MSR paraphrase corpus. [sent-108, score-0.099]
66 Each instance consists of a pair of sentences, so the VTK cannot be directly used by a kernel machine for classification. [sent-110, score-0.376]
67 Instead, we generate 16 kernel values for each pair based on different parameter settings of the kernel, and feed these as features to a linear SVM. [sent-111, score-0.376]
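A hedged sketch of that feature scheme: kernel values under several parameter settings become features for a linear SVM. The specific grid of settings is an illustrative assumption (the text states only that 16 values are used), and the sketch reuses `vector_tree_kernel` from the block above.

```python
from sklearn.svm import LinearSVC

def pair_features(tree1, tree2, node_kernels, walk_lengths):
    # tree* = (adjacency_list, word_vectors); one feature per (kernel, length).
    adj1, vecs1 = tree1
    adj2, vecs2 = tree2
    return [vector_tree_kernel(adj1, vecs1, adj2, vecs2, k, n)
            for k in node_kernels for n in walk_lengths]

# X = [pair_features(t1, t2, kernels, lengths) for t1, t2 in sentence_pairs]
# clf = LinearSVC().fit(X, labels)
```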
68 We focus on target phrases by upweighting walks that pass through target nodes. [sent-116, score-0.351]
69 This is done by simply multiplying the corresponding entries in the adjacency matrix by a constant factor. [sent-117, score-0.254]
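A small sketch of the upweighting step just described; the boost factor and exactly which adjacency entries are scaled are assumptions.

```python
import numpy as np

def upweight_targets(W, target_vertices, boost=2.0):
    # target_vertices: product-graph indices of node pairs involving a target.
    W = W.copy()
    for t in target_vertices:
        W[t, :] *= boost  # walk steps leaving a target vertex
        W[:, t] *= boost  # walk steps entering a target vertex
    return W
```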
70 On the polarity data set, Vector Tree Kernel (VTK) significantly outperforms the state-of-the-art method by Carrillo de Albornoz et al. [sent-123, score-0.065]
71 (2010), who use a hybrid model incorporating databases of affective lexicons, and also explicitly model the effect of negation and quantifiers (see Table 2). [sent-124, score-0.041]
72 Lexical approaches using pairwise semantic similarity of SENNA embeddings (DSM), as well as Wordnet Affective Database-based (WNA) labels, perform poorly (Carrillo de Albornoz et al. [sent-125, score-0.231]
73 On the other hand, a syntactic tree kernel (SSTK) that ignores distributional semantic similarity between words fails as expected. [sent-127, score-0.794]
74 On the MSR paraphrase corpus, VTK performs competitively against state-of-the-art methods. [sent-133, score-0.126]
75 We expected paraphrasing to be challenging for our method, since it can involve little syntactic overlap. [sent-134, score-0.046]
76 However, data analysis reveals that the corpus generally contains sentence pairs with high syntactic similarity. [sent-135, score-0.046]
77 Results for this task are encouraging since ours is a general approach, while other systems use multiple task-specific features like semantic role labels, active-passive voice conversion, and synonymy resolution. [sent-136, score-0.052]
78 (2013), whose approach uses a conjunction of lexical and syntactic tree kernels (Moschitti, 2006b), and distributional vectors. [sent-144, score-0.694]
79 VTK identified several templates of metaphor usage such as “warm heart” and “cold shoulder”. [sent-145, score-0.137]
80 We look towards approaches for automatically mining such metaphor patterns from a corpus. [sent-146, score-0.137]
81 6 Conclusion We present a general formalism for walk-based kernels to evaluate similarity of dependency trees. [sent-147, score-0.685]
82 Our method generalizes tree kernels to take distributed representations of nodes as input, and capture both lexical semantics and syntactic structures of parse trees. [sent-148, score-0.899]
83 Our approach has tunable parameters to look for larger or smaller syntactic constructs. [sent-149, score-0.046]
84 The approach can generalize to any task involving structural and local similarity, and arbitrary node similarity measures. [sent-151, score-0.247]
85 Conditionally positive definite kernels for SVM based image recognition. [sent-154, score-0.454]
86 A hybrid approach to emotional sentence polarity and intensity classification. [sent-158, score-0.092]
87 In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 153–161. [sent-159, score-0.033]
88 New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. [sent-163, score-0.037]
89 In Proceedings of the 40th annual meeting on association for computational linguistics, pages 263–270. [sent-164, score-0.033]
90 Structured lexical similarity via convolution kernels on dependency trees. [sent-172, score-0.72]
91 In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1034–1046. [sent-173, score-0.033]
92 of the International Conference on Machine Learning, pages 107–114. [sent-179, score-0.033]
93 Using machine translation evaluation techniques to determine sentence-level semantic equivalence. [sent-182, score-0.052]
94 In Proceedings of the Annual Conference on Computational Learning Theory, pages 129–143. [sent-187, score-0.033]
95 In Proceedings of the Twentieth International Conference on Machine Learning, pages 321–328. [sent-200, score-0.033]
96 Paraphrase recognition using machine learning to combine similarity measures. [sent-204, score-0.126]
97 In Proceedings of the ACL-IJCNLP 2009 Student Research Workshop, ACLstudent ’09, pages 27–35, Stroudsburg, PA, USA. [sent-205, score-0.033]
98 A study on convolution kernels for shallow semantic parsing. [sent-213, score-0.609]
99 Efficient convolution kernels for dependency and constituent syntactic trees. [sent-218, score-0.64]
100 Exploiting constituent dependencies for tree kernel-based semantic relation extraction. [sent-231, score-0.192]
wordName wordTfidf (topN-words)
[('kernels', 0.454), ('kernel', 0.376), ('walks', 0.351), ('vtk', 0.217), ('adjacency', 0.164), ('walk', 0.141), ('tree', 0.14), ('metaphor', 0.137), ('moschitti', 0.132), ('similarity', 0.126), ('gartner', 0.124), ('vishwanathan', 0.124), ('trees', 0.113), ('convolution', 0.103), ('paraphrase', 0.099), ('albornoz', 0.093), ('carrillo', 0.093), ('node', 0.092), ('matrix', 0.09), ('parse', 0.089), ('alessandro', 0.088), ('duffy', 0.074), ('summation', 0.074), ('graph', 0.072), ('product', 0.071), ('formalism', 0.068), ('polarity', 0.065), ('boughorbel', 0.062), ('croce', 0.062), ('cumby', 0.062), ('feline', 0.062), ('senna', 0.062), ('terribly', 0.062), ('tratz', 0.062), ('msr', 0.062), ('vp', 0.055), ('representations', 0.055), ('dirk', 0.054), ('kashima', 0.054), ('shashank', 0.054), ('substructures', 0.054), ('subsumes', 0.054), ('distributional', 0.054), ('hovy', 0.053), ('embeddings', 0.053), ('formulation', 0.052), ('semantic', 0.052), ('ep', 0.051), ('length', 0.05), ('products', 0.049), ('similarities', 0.048), ('syntactic', 0.046), ('compute', 0.046), ('eduard', 0.044), ('nodes', 0.042), ('extends', 0.041), ('affective', 0.041), ('qian', 0.039), ('matching', 0.039), ('distributed', 0.039), ('discrete', 0.037), ('collins', 0.037), ('dependency', 0.037), ('sigmoid', 0.036), ('generalizes', 0.034), ('vertex', 0.033), ('pages', 0.033), ('diverse', 0.032), ('stroudsburg', 0.032), ('roberto', 0.032), ('nlp', 0.032), ('pang', 0.03), ('vector', 0.03), ('strictly', 0.03), ('collobert', 0.03), ('receive', 0.03), ('edge', 0.029), ('edges', 0.029), ('local', 0.029), ('cat', 0.028), ('gaussian', 0.028), ('graphs', 0.027), ('akihiro', 0.027), ('uth', 0.027), ('arr', 0.027), ('competitively', 0.027), ('constrains', 0.027), ('convolutional', 0.027), ('daniele', 0.027), ('denmark', 0.027), ('directional', 0.027), ('entertaining', 0.027), ('gerv', 0.027), ('hardness', 0.027), ('intensity', 0.027), ('kondor', 0.027), ('pighin', 0.027), ('prodromos', 0.027), ('psd', 0.027), ('risi', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
2 0.18424928 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
Author: Aliaksei Severyn ; Alessandro Moschitti
Abstract: This paper proposes a framework for automatically engineering features for two important tasks of question answering: answer sentence selection and answer extraction. We represent question and answer sentence pairs with linguistic structures enriched by semantic information, where the latter is produced by automatic classifiers, e.g., question classifier and Named Entity Recognizer. Tree kernels applied to such structures enable a simple way to generate highly discriminative structural features that combine syntactic and semantic information encoded in the input trees. We conduct experiments on a public benchmark from TREC to compare with previous systems for answer sentence selection and answer extraction. The results show that our models greatly improve on the state of the art, e.g., up to 22% on F1 (relative improvement) for answer extraction, while using no additional resources and no manual feature engineering.
3 0.15528256 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
4 0.13299561 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
5 0.11723483 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
Author: Xiaoning Zhu ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Conghui Zhu ; Tiejun Zhao
Abstract: This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a
6 0.082060866 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
7 0.077178165 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
8 0.07492166 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
9 0.071795367 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
10 0.071584947 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
11 0.070844561 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
12 0.067252062 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
13 0.065622859 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
14 0.06200783 58 emnlp-2013-Dependency Language Models for Sentence Completion
15 0.061554361 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
16 0.06132295 182 emnlp-2013-The Topology of Semantic Knowledge
17 0.060567584 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
18 0.059120093 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity
19 0.058656141 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
20 0.057863832 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
topicId topicWeight
[(0, -0.195), (1, 0.009), (2, -0.05), (3, -0.007), (4, 0.032), (5, 0.105), (6, -0.004), (7, -0.081), (8, -0.064), (9, 0.022), (10, 0.056), (11, 0.101), (12, -0.121), (13, 0.055), (14, 0.034), (15, 0.022), (16, -0.104), (17, 0.232), (18, 0.014), (19, 0.028), (20, 0.021), (21, -0.188), (22, 0.073), (23, -0.018), (24, -0.232), (25, -0.072), (26, -0.215), (27, 0.07), (28, -0.074), (29, -0.033), (30, 0.167), (31, 0.133), (32, -0.14), (33, -0.095), (34, -0.011), (35, -0.13), (36, 0.202), (37, -0.147), (38, -0.003), (39, -0.043), (40, 0.028), (41, 0.135), (42, -0.063), (43, 0.021), (44, -0.172), (45, 0.034), (46, -0.001), (47, -0.083), (48, 0.006), (49, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.96642107 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
2 0.74555933 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
Author: Daniel Preotiuc-Pietro ; Trevor Cohn
Abstract: Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
3 0.58087748 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
Author: Bowei Zou ; Guodong Zhou ; Qiaoming Zhu
Abstract: Scope detection is a key task in information extraction. This paper proposes a new approach for tree kernel-based scope detection by using the structured syntactic parse information. In addition, we have explored the way of selecting compatible features for different part-of-speech cues. Experiments on the BioScope corpus show that both constituent and dependency structured syntactic parse features have the advantage in capturing the potential relationships between cues and their scopes. Compared with the state of the art scope detection systems, our system achieves substantial improvement.
4 0.48328096 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
Author: Aliaksei Severyn ; Alessandro Moschitti
Abstract: This paper proposes a framework for automatically engineering features for two important tasks of question answering: answer sentence selection and answer extraction. We represent question and answer sentence pairs with linguistic structures enriched by semantic information, where the latter is produced by automatic classifiers, e.g., question classifier and Named Entity Recognizer. Tree kernels applied to such structures enable a simple way to generate highly discriminative structural features that combine syntactic and semantic information encoded in the input trees. We conduct experiments on a public benchmark from TREC to compare with previous systems for answer sentence selection and answer extraction. The results show that our models greatly improve on the state of the art, e.g., up to 22% on F1 (relative improvement) for answer extraction, while using no additional resources and no manual feature engineering.
5 0.46471539 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
6 0.4645305 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
7 0.42387554 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
9 0.35653156 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri
11 0.34720409 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
12 0.34691942 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
13 0.31822884 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity
14 0.31547996 123 emnlp-2013-Learning to Rank Lexical Substitutions
15 0.30392006 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
16 0.30325538 58 emnlp-2013-Dependency Language Models for Sentence Completion
17 0.29489797 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
18 0.29119349 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
19 0.28756687 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
20 0.27445012 182 emnlp-2013-The Topology of Semantic Knowledge
topicId topicWeight
[(3, 0.016), (9, 0.015), (18, 0.022), (22, 0.04), (30, 0.09), (50, 0.013), (51, 0.134), (66, 0.02), (71, 0.031), (75, 0.044), (77, 0.024), (90, 0.014), (96, 0.466)]
simIndex simValue paperId paperTitle
1 0.9334814 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri
Author: Martin Riedl ; Chris Biemann
Abstract: We introduce a new highly scalable approach for computing Distributional Thesauri (DTs). By employing pruning techniques and a distributed framework, we make the computation for very large corpora feasible on comparably small computational resources. We demonstrate this by releasing a DT for the whole vocabulary of Google Books syntactic n-grams. Evaluating against lexical resources using two measures, we show that our approach produces higher quality DTs than previous approaches, and is thus preferable in terms of speed and quality for large corpora.
same-paper 2 0.81560934 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
3 0.78827798 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
Author: Hannaneh Hajishirzi ; Leila Zilles ; Daniel S. Weld ; Luke Zettlemoyer
Abstract: Many errors in coreference resolution come from semantic mismatches due to inadequate world knowledge. Errors in named-entity linking (NEL), on the other hand, are often caused by superficial modeling of entity context. This paper demonstrates that these two tasks are complementary. We introduce NECO, a new model for named entity linking and coreference resolution, which solves both problems jointly, reducing the errors made on each. NECO extends the Stanford deterministic coreference system by automatically linking mentions to Wikipedia and introducing new NEL-informed mention-merging sieves. Linking improves mention-detection and enables new semantic attributes to be incorporated from Freebase, while coreference provides better context modeling by propagating named-entity links within mention clusters. Experiments show consistent improvements across a number of datasets and experimental conditions, including over 11% reduction in MUC coreference error and nearly 21% reduction in F1 NEL error on ACE 2004 newswire data.
4 0.75011337 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
Author: Qiming Diao ; Jing Jiang
Abstract: With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. On the one hand, people tweet about their daily lives, and on the other hand, when major events happen, people also follow and tweet about them. Moreover, people's posting behaviors on events are often closely tied to their personal interests. In this paper, we try to model topics, events and users on Twitter in a unified way. We propose a model which combines an LDA-like topic model and the Recurrent Chinese Restaurant Process to capture topics and events. We further propose a duration-based regularization component to find bursty events. We also propose to use event-topic affinity vectors to model the association between events and topics. Our experiments show that our model can accurately identify meaningful events and the event-topic affinity vectors are effective for event recommendation and grouping events by topics.
Author: Baichuan Li ; Jing Liu ; Chin-Yew Lin ; Irwin King ; Michael R. Lyu
Abstract: Social media like forums and microblogs have accumulated a huge amount of user generated content (UGC) containing human knowledge. Currently, most of UGC is listed as a whole or in pre-defined categories. This “list-based” approach is simple, but hinders users from browsing and learning knowledge of certain topics effectively. To address this problem, we propose a hierarchical entity-based approach for structuralizing UGC in social media. By using a large-scale entity repository, we design a three-step framework to organize UGC in a novel hierarchical structure called “cluster entity tree (CET)”. With Yahoo! Answers as a test case, we conduct experiments and the results show the effectiveness of our framework in constructing CET. We further evaluate the performance of CET on UGC organization in both user and system aspects. From a user aspect, our user study demonstrates that, with CET-based structure, users perform significantly better in knowledge learning than using traditional list-based approach. From a system aspect, CET substantially boosts the performance of two information retrieval models (i.e., vector space model and query likelihood language model).
6 0.53804111 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
7 0.53031182 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
8 0.51324022 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
9 0.5055787 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
10 0.49929166 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution
11 0.49552557 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
12 0.49109966 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
13 0.49058574 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
14 0.48668891 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
15 0.46635795 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
16 0.45937407 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
17 0.45867145 160 emnlp-2013-Relational Inference for Wikification
18 0.45248646 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
19 0.45231593 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model
20 0.45026803 143 emnlp-2013-Open Domain Targeted Sentiment