emnlp emnlp2011 emnlp2011-127 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili
Abstract: Alessandro Moschitti, DISI, University of Trento, 38123 Povo (TN), Italy (moschitti@disi.unitn.it); Roberto Basili, DII, University of Tor Vergata, 00133 Roma, Italy (basili@info.uniroma2.it). A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures, whose surface forms of the lexical nodes are in part or completely different. The experiments with such kernels for question classification show unprecedented results, e.g. a 41% error reduction over the former state of the art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. [sent-11, score-0.708]
2 We define efficient and powerful kernels for measuring the similarity between dependency structures, whose surface forms of the lexical nodes are in part or completely different. [sent-12, score-0.557]
3 The experiments with such kernels for question classification show unprecedented results, e. [sent-13, score-0.317]
4 Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels. [sent-16, score-0.225]
5 A semantic similarity can be defined at the structural level over a graph, e. [sent-28, score-0.214]
6 , 2009), as well as combining structural and lexical similarity over semantic networks, e.g. (Cowie et al., 1992; Wu and Palmer, 1994; Resnik, 1995; Jiang and Conrath, 1997; Schütze, 1998; Leacock and Chodorow, 1998; Pedersen et al. [sent-31, score-0.218]
7 On the other hand, automatic feature engineering of syntactic or shallow semantic structures has been carried out by means of structural kernels, e. [sent-38, score-0.231]
8 The main idea of structural kernels is to generate structures that in turn represent syntactic or shallow semantic features. [sent-48, score-0.514]
9 Most notably, the work in (Bloehdorn and Moschitti, 2007b) encodes lexical similarity in such kernels. [sent-49, score-0.159]
10 This is essentially the syntactic tree kernel (STK) proposed in (Collins and Duffy, 2002) in which syntactic fragments from constituency trees can be matched even if they only differ in the leaf nodes (i. [sent-50, score-0.646]
11 This implies matching scores lower than 1, depending on the semantic similarity of the corresponding leaves in the syntactic fragments. [sent-53, score-0.243]
12 Although this kernel achieves state-of-the-art performance in NLP tasks, such as Question Classification. [sent-54, score-0.175]
13 , trivially STK only matches the syntactic structure apple/orange when comparing the big beautiful apple to a nice large orange; and (ii) STK cannot be effectively applied to dependency structures, e. [sent-59, score-0.109]
14 Additionally, to our knowledge, there is no previous study that clearly describes how dependency structures should be converted into trees to be fully and effectively exploitable by convolution kernels. [sent-62, score-0.34]
15 Indeed, although the work in (Culotta and Sorensen, 2004) defines a dependency tree kernel also using node similarity, it is not a convolution kernel: this results in a much poorer feature space. [sent-63, score-0.313]
16 In this paper, we propose a study of convolution kernels for dependency structures aiming at jointly modeling syntactic and lexical semantic similarity. [sent-64, score-0.673]
17 More precisely, we define several dependency trees exploitable by the Partial Tree Kernel (PTK) (Moschitti, 2006a) and compare them with STK over constituency trees. [sent-65, score-0.212]
18 the Smoothed Partial Tree Kernels (SPTKs), which can measure the similarity of structurally similar trees whose nodes are associated with different but related lexicals. [sent-68, score-0.275]
19 Given the convolution nature of such kernels, any possible node path of lexicals provides a contribution smoothed by the similarity accounted for by its nodes. [sent-69, score-0.704]
20 In the remainder of this paper, Section 2 provides the background for structural and lexical similarity kernels. [sent-71, score-0.218]
21 This can in several cases be efficiently and implicitly computed by kernel functions by exploiting the following dual formulation: [sent-77, score-0.175]
22 Σ_{i=1}^{l} yi αi φ(oi)·φ(o) + b = 0, where oi and o are two objects, φ is a mapping from the objects to feature vectors x⃗i, and φ(oi)·φ(o) = K(oi, o) is a kernel function implicitly defining such a mapping. [sent-79, score-0.206]
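As a minimal sketch of how this dual form is used at prediction time, the function below scores a new object against the support set; the names (`support`, `K`) are illustrative placeholders, not the paper's implementation.

```python
def decision(o, support, b, K):
    """Kernelized decision: sign(sum_i alpha_i * y_i * K(o_i, o) + b).

    `support` is a list of (alpha_i, y_i, o_i) triples learned by an SVM;
    `K` is any valid kernel over pairs of objects (trees, strings, ...)."""
    score = sum(alpha * y * K(o_i, o) for alpha, y, o_i in support) + b
    return 1 if score >= 0 else -1
```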
23 The most general kind of kernel used in NLP is the string kernel, e. [sent-81, score-0.283]
24 , f|F|} be a tree fragment space and χi(n) be an indicator function, equal to 1 if the target fi is rooted at node n and equal to 0 otherwise. [sent-104, score-0.186]
25 The ∆ function determines the richness of the kernel space and thus different tree kernels. [sent-107, score-0.286]
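Concretely, a convolution tree kernel sums ∆ over all node pairs, TK(T1, T2) = Σ_{n1∈NT1} Σ_{n2∈NT2} ∆(n1, n2); a sketch of that skeleton, where the node-iteration API is an assumption:

```python
def tree_kernel(t1_nodes, t2_nodes, delta):
    """Generic convolution tree kernel: sum Delta(n1, n2) over all node pairs.

    `delta(n1, n2)` counts (possibly decayed) shared fragments rooted at the
    two nodes; plugging in different Deltas yields STK, PTK, or SPTK."""
    return sum(delta(n1, n2) for n1 in t1_nodes for n2 in t2_nodes)
```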
26 O(|NT1| + |NT2|), for natural language syntactic trees (Moschitti, 2006a). [sent-114, score-0.102]
27 Σ_{I⃗1, I⃗2: l(I⃗1)=l(I⃗2)} ∏_{j=1}^{l(I⃗1)} ¹To have a similarity score between 0 and 1, a normalization in the kernel space, i. [sent-119, score-0.108]
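The truncated footnote refers to the standard kernel-space normalization, conventionally K'(T1, T2) = K(T1, T2) / √(K(T1, T1) · K(T2, T2)); assuming that convention, a minimal sketch:

```python
import math

def normalized_kernel(K, t1, t2):
    """Map any kernel into [0, 1]: K'(T1,T2) = K(T1,T2) / sqrt(K(T1,T1) * K(T2,T2))."""
    denom = math.sqrt(K(t1, t1) * K(t2, t2))
    return K(t1, t2) / denom if denom > 0 else 0.0
```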
28 This way, we penalize both larger trees and child subsequences with gaps. [sent-123, score-0.175]
29 PTK is more general than STK: if we only consider the contribution of shared subsequences containing all children of nodes, we recover the STK. [sent-124, score-0.155]
30 3 Lexical Semantic Kernel Given two text fragments d1 and d2 ∈ D (the text fragment set), a general lexical kernel (Basili et al. [sent-128, score-0.14]
31 , 2005) defines their similarity as: K(d1, d2) = Σ_{w1∈d1} Σ_{w2∈d2} (ω1 ω2) σ(w1, w2) (1) where ω1 and ω2 are the weights of the words (features) w1 and w2 in the documents d1 and d2, respectively, and σ is a term similarity function, e. [sent-129, score-0.216]
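A minimal sketch of Equation 1; the weighting scheme and σ are passed in as callables, and all names are illustrative rather than the paper's code:

```python
def lexical_kernel(d1, d2, weight, sigma):
    """Equation 1: K(d1, d2) = sum over word pairs of (omega1 * omega2) * sigma(w1, w2).

    `d1`, `d2` are iterables of words; `weight(w, d)` returns the word's weight
    (e.g. tf-idf) in document d; `sigma` is any term similarity function."""
    return sum(weight(w1, d1) * weight(w2, d2) * sigma(w1, w2)
               for w1 in d1 for w2 in d2)
```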
32 We determine the term similarity function through distributional analysis (Pado and Lapata, 2007), according to the idea that the meaning of a word can be described by the set of textual contexts in which it appears (Distributional Hypothesis, (Harris, 1964)). [sent-135, score-0.108]
33 Therefore, given two words w1 and w2, the term similarity function σ is estimated as the cosine similarity between the corresponding projections w⃗1, w⃗2, in. [sent-142, score-0.216]
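A sketch of such a distributional σ, assuming a precomputed dictionary from words to context vectors (e.g. LSA projections); the exact-match back-off for out-of-vocabulary words is an assumption:

```python
import numpy as np

def make_cosine_sigma(word_vectors):
    """Return sigma(w1, w2) = cosine(v1, v2) over precomputed distributional vectors."""
    def sigma(w1, w2):
        v1, v2 = word_vectors.get(w1), word_vectors.get(w2)
        if v1 is None or v2 is None:
            return 1.0 if w1 == w2 else 0.0  # back off to exact string match
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        return float(v1 @ v2) / denom if denom > 0 else 0.0
    return sigma
```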
34 Another method to design a valid kernel is to represent words as word vectors and compute σ as the scalar product between such vectors. [sent-146, score-0.175]
35 We will refer to such similarity as WL (word list). [sent-149, score-0.108]
36 3 Smoothing Partial Tree Kernel (SPTK) Combining lexical and structural kernels provides clear advantages over all-vs-all word similarity, which tends to semantically diverge. [sent-150, score-0.393]
37 Following this idea, Bloehdorn & Moschitti (2007a) modified step (i) of the ∆STK computation as follows: (i) if n1 and n2 are pre-terminal nodes with the same number of children, ∆STK(n1, n2) = λ ∏_{j=1}^{nc(n1)} σ(lex(cj(n1)), lex(cj(n2))), where lex returns the node label. [sent-152, score-0.124]
38 This allows matching fragments having the same structure but different leaves by assigning a score proportional to the product of the lexical similarities of each leaf pair. [sent-153, score-0.189]
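A sketch of that smoothed pre-terminal match, following the reconstruction above; the node API (`.children`, `.label`) and the default decay λ are illustrative assumptions:

```python
def delta_preterminal(n1, n2, sigma, lam=0.4):
    """Smoothed STK step (i): pre-terminals with the same number of children
    match with score lambda * product of leaf-pair lexical similarities."""
    c1, c2 = n1.children, n2.children
    if len(c1) != len(c2):
        return 0.0
    score = lam
    for a, b in zip(c1, c2):
        score *= sigma(a.label, b.label)
    return score
```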
39 Although it is an interesting kernel, the fact that lexicals must belong to the leaf nodes of exactly the same structures limits its applications. [sent-154, score-0.312]
40 Hereafter, we define a much more general smoothed tree kernel that can be applied to any tree and exploit any combination of lexical similarities, respecting the syntax enforced by the tree. [sent-156, score-0.488]
41 , (2) where σ is any similarity between nodes, e. [sent-160, score-0.108]
42 subtrees rooted in subsequences of exactly p children (of n1 and n2) and m = min{l(cn1), l(cn2)}. [sent-196, score-0.155]
43 s2b = cn2 (a and b are the last children): ∆p(s1a, s2b) = ∆(a, b) × Σ_{i=1}^{|s1|} Σ_{r=1}^{|s2|} λ^{|s1|−i+|s2|−r} ∆p−1(s1[1:i], s2[1:r]), where s1[1:i] and s2[1:r] are the child subsequences from 1 to i and from 1 to r of s1 and s2. [sent-198, score-0.122]
44 0; Note that Dp satisfies the recursive relation: Dp(k, l) = ∆p−1(s1[1:k], s2[1:l]) + λ Dp(k, l−1) + λ Dp(k−1, l) − λ² Dp(k−1, l−1). By means of the above relation, we can compute the child subsequence contributions of two sequences s1 and s2 in O(p |s1| |s2|). [sent-203, score-0.154]
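A direct sketch of that dynamic program; `delta_prev[k][l]` stands for ∆p−1(s1[1:k], s2[1:l]), and the 1-based (n+1)×(m+1) table layout is an assumption:

```python
def dp_table(n, m, delta_prev, lam):
    """Fill D_p via D_p(k,l) = Delta_{p-1}(s1[1:k], s2[1:l])
    + lam*D_p(k,l-1) + lam*D_p(k-1,l) - lam^2*D_p(k-1,l-1).

    `n`, `m` are the lengths of the child sequences s1 and s2;
    the whole table is filled in O(n*m) time for each p."""
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            D[k][l] = (delta_prev[k][l]
                       + lam * D[k][l - 1]
                       + lam * D[k - 1][l]
                       - lam * lam * D[k - 1][l - 1])
    return D
```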
45 The latter is very small in natural language parse trees and we also avoid the computation of node pairs with dissimilar labels. [sent-208, score-0.122]
46 We note that PTK generalizes both (i) SK, allowing the similarity between sequences (node children) structured in a tree and (ii) STK, allowing the computation of STK over any possible pair of subtrees extracted from the original tree. [sent-209, score-0.285]
47 4 Innovative Features of SPTK The most similar kernel to SPTK is the Syntactic Semantic Tree Kernel (SSTK) proposed in (Bloehdorn and Moschitti, 2007a; Bloehdorn and Moschitti, 2007b). [sent-212, score-0.175]
48 However, the following aspects show the remarkable innovativeness of SPTK: • SSTK can only work on constituency trees and not on dependency trees (see (Moschitti, 2006a)). [sent-213, score-0.221]
49 • The lexical similarity in SSTK is only applied to the leaf nodes in exactly the same syntactic constituents. [sent-214, score-0.124]
50 "television system" and "video system" (so also exploiting the meaningful similarity between "video" and "television"). [sent-220, score-0.108]
51 • The similarity in the PTK equation is added such that SPTK still corresponds to a scalar product in the semantic/structure space². [sent-221, score-0.108]
52 In the case of PTK and SPTK, different tree representations may lead to more or less effective syntactic/semantic feature spaces. [sent-224, score-0.111]
53 The next two sections provide our representation models for dependency trees and their discussion. [sent-225, score-0.113]
54 1 Proposed Computational Structures Given the following sentence: (s1) What is the width of a football field? [sent-227, score-0.166]
55 The representation tree for a phrase structure paradigm leaves little room for variations as shown by the constituency tree (CT) in Figure 1. [sent-228, score-0.316]
56 We apply lemmatization to the lexicals to improve generalization and, at the same time, we add a generalized PoS-tag, i. [sent-229, score-0.131]
57 This is useful to measure similarity between lexicals belonging to the same grammatical category. [sent-232, score-0.239]
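A sketch of that node labeling, pairing the lemma with a generalized PoS tag (e.g. width::n) so that σ only compares lexicals of the same grammatical category; the exact label format and the tag mapping below are illustrative assumptions:

```python
# Illustrative Penn-Treebank-to-coarse mapping; extend as needed.
COARSE_POS = {"NN": "n", "NNS": "n", "NNP": "n",
              "VB": "v", "VBZ": "v", "VBP": "v", "VBD": "v",
              "JJ": "a", "RB": "r"}

def lexical_label(lemma, pos_tag):
    """Build a lexical node label such as 'width::n' from lemma and PoS tag."""
    return f"{lemma}::{COARSE_POS.get(pos_tag, pos_tag.lower())}"
```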
58 In contrast, the conversion of dependency structures into computationally effective trees (for the above kernels) is not straightforward. [sent-233, score-0.189]
59 (Figure 4: Lexical Centered Tree (LCT).) to associate edges with dependencies but, since our kernels cannot process labels on the arcs, they must be associated with tree nodes. [sent-246, score-0.394]
60 see Figure 3, where the PoS-Tags are children of GR nodes and fathers of their associated lexicals; and the Lexical Centered Tree (LCT), e. [sent-259, score-0.119]
61 see Figure 6, which ignores the syntactic structure of the sentence, being a simple sequence of PoS-Tag nodes, where lexicals are simply added as children; and the Lexical Sequence Tree (LST), where only lexical items are leaves of a single root node. [sent-270, score-0.302]
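To make one such conversion concrete, the sketch below builds an LCT-style tree from dependency tuples: each lexical node dominates a grammatical-relation node, a PoS node, and the subtrees of its dependents. The token layout, `Node` class, and child ordering are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def build_lct(tokens):
    """Lexical Centered Tree sketch from dependency tuples.

    `tokens`: list of dicts with keys id, head (0 = sentence root),
    lemma_pos (e.g. 'width::n'), rel (e.g. 'NMOD'), tag (e.g. 'NN')."""
    nodes = {t["id"]: Node(t["lemma_pos"]) for t in tokens}
    root = None
    for t in tokens:
        n = nodes[t["id"]]
        n.children.append(Node(t["rel"]))  # dependency relation as a node
        n.children.append(Node(t["tag"]))  # PoS tag as a node
        if t["head"] == 0:
            root = n
        else:
            nodes[t["head"]].children.append(n)  # attach to head's lexical node
    return root
```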
62 paths only composed of similar lexical nodes constrained by syntactic dependencies. [sent-288, score-0.155]
63 All the other trees produce fragments in which lexicals play the role of features of GR or PoS-Tag nodes. [sent-289, score-0.27]
64 At the same time, we compare with the constituency trees and different kernels to derive the best syntactic paradigm for convolution kernels. [sent-293, score-0.547]
65 Most importantly, the role of lexical similarity embedded in syntactic structures will be investigated. [sent-294, score-0.321]
66 …includes structural kernels in SVMLight (Joachims, 2000) with the smooth match between tree nodes. [sent-302, score-0.453]
67 For generating constituency trees, we used the Charniak parser (Charniak, 2000), whereas we applied the LTH syntactic parser (described in (Johansson and Nugues, 2008a)) to generate dependency trees. [sent-303, score-0.164]
68 The second approach uses the similarity based on a word list (WL) as provided in (Li and Roth, 2002). [sent-317, score-0.108]
69 Models: SVM-LightTK is applied to the different tree representations discussed in Section 4. [sent-318, score-0.111]
70 LCTWL is the SPTK kernel applied to the LCT structure, using WL similarity. [sent-324, score-0.175]
71 The outcome of the different kernels applied to the various structures for coarse- and fine-grained QC is reported in Table 1. [sent-338, score-0.427]
72 STK applied to a constituency tree and BOW, which is a linear kernel. ⁴http://cogcomp. [sent-343, score-0.166]
73 It is worth noting that when no similarity is applied: (i) BOW produces high accuracy, i. [sent-351, score-0.108]
74 , 2007)); (ii) PTK applied to the same tree as STK produces a slightly lower value (a non-statistically-significant difference); (iii) interestingly, when PTK is instead applied to dependency structures, it improves over STK, i. [sent-355, score-0.171]
75 80% since it is obviously subject to data sparseness (fragments only composed of lexicals are very sparse). [sent-361, score-0.131]
76 The most important results can be noted when lexical similarity is used, i. [sent-362, score-0.159]
77 3 Learning curves It is interesting to study the impact of syntactic/semantic kernels on the learning generalization. [sent-375, score-0.283]
78 (Figure 11 caption: microseconds for each kernel computation vs. number of nodes.) …of the previous models without lexical similarity, whereas Fig. [sent-378, score-0.368]
79 We note that when no similarity is used, the dependency trees generalize better than constituency trees or non-syntactic structures like LPST or BOW. [sent-380, score-0.405]
80 When WL is activated, all models outperform the best kernel of the previous pool, i. [sent-381, score-0.175]
81 Figure 11 shows the elapsed time as a function of the number of nodes for different tree representations. [sent-389, score-0.166]
82 We note that: (i) when the WL is not active, LCT and GRCT are very fast as they impose hierarchical matching of subtrees; (ii) when the similarity is activated, LCTWL and GRCTWL tend to match many more tree fragments, and thus their complexity increases. [sent-390, score-0.268]
83 Only LPSTWL, which has no structure, matches a very large number of sequences of nodes, when the similarity is active. [sent-393, score-0.14]
84 5 FrameNet Role Classification Experiments To verify that our findings are general and that our syntactic/semantic dependency kernels can be effectively exploited for diverse NLP tasks, we experimented with a completely different application, i. [sent-396, score-0.343]
85 6 Final Remarks and Conclusion In this paper, we have proposed a study on the representation of dependency structures for the design of effective structural kernels. [sent-429, score-0.195]
86 Most importantly, we have defined a new class of kernel functions, i. [sent-430, score-0.175]
87 These show that exploiting the similarity between two sets of words according to their dependency structure leads to an unprecedented result for QC, i. [sent-436, score-0.202]
88 This is very valuable as previous work showed that tree kernels (TK) alone perform worse than models based on manually engineered features for SRL tasks, e. [sent-450, score-0.394]
89 Thus, for the first time in an SRL task, a general tree kernel reaches the same accuracy as heavy manual feature design. [sent-455, score-0.286]
90 In word sense disambiguation tasks, SPTK can generalize contexts according to syntactic and semantic constraints (selectional restrictions), making distributional semantic approaches very effective. [sent-461, score-0.143]
91 Semantic kernels for text classification based on topological measures of feature similarity. [sent-487, score-0.283]
92 A hybrid convolution tree kernel for semantic role labeling. [sent-525, score-0.477]
93 The effect of syntactic representation on semantic role labeling. [sent-630, score-0.133]
94 Exploiting syntactic 1045 and shallow semantic kernels for question/answer classification. [sent-663, score-0.379]
95 A study on convolution kernels for shallow semantic parsing. [sent-672, score-0.437]
96 Efficient convolution kernels for dependency and constituent syntactic trees. [sent-676, score-0.499]
97 Using information content to evaluate semantic similarity in a taxonomy. [sent-704, score-0.155]
98 Support vector machines based on a semantic kernel for text categorization. [sent-727, score-0.222]
99 Exploring Syntactic Features for Relation Extraction using a Convolution tree kernel. [sent-768, score-0.111]
100 PRank: a comprehensive structural similarity measure over information networks. [sent-772, score-0.167]
wordName wordTfidf (topN-words)
[('sptk', 0.335), ('stk', 0.329), ('kernels', 0.283), ('moschitti', 0.271), ('ptk', 0.261), ('lct', 0.247), ('nmod', 0.204), ('kernel', 0.175), ('qc', 0.137), ('lexicals', 0.131), ('bloehdorn', 0.125), ('alessandro', 0.114), ('tree', 0.111), ('similarity', 0.108), ('convolution', 0.107), ('grct', 0.102), ('nn', 0.097), ('sk', 0.095), ('football', 0.095), ('subsequences', 0.091), ('centered', 0.082), ('wl', 0.08), ('sbj', 0.079), ('structures', 0.076), ('prd', 0.073), ('johansson', 0.071), ('width', 0.071), ('framenet', 0.069), ('wp', 0.069), ('children', 0.064), ('lsa', 0.062), ('dt', 0.061), ('dependency', 0.06), ('structural', 0.059), ('ux', 0.058), ('constituency', 0.055), ('srl', 0.055), ('nodes', 0.055), ('trees', 0.053), ('pmod', 0.053), ('lexical', 0.051), ('giuglea', 0.05), ('leaf', 0.05), ('basili', 0.049), ('fragments', 0.049), ('gr', 0.049), ('syntactic', 0.049), ('semantic', 0.047), ('roberto', 0.045), ('exploitable', 0.044), ('grctlsa', 0.044), ('lctlsa', 0.044), ('lpst', 0.044), ('sstk', 0.044), ('dp', 0.041), ('pedersen', 0.041), ('smoothed', 0.04), ('fragment', 0.04), ('cristianini', 0.039), ('grained', 0.039), ('productions', 0.039), ('leaves', 0.039), ('mehdad', 0.038), ('corley', 0.038), ('television', 0.038), ('pct', 0.038), ('role', 0.037), ('kudo', 0.036), ('node', 0.035), ('field', 0.034), ('smoothing', 0.034), ('ice', 0.034), ('video', 0.034), ('unprecedented', 0.034), ('computation', 0.034), ('root', 0.032), ('sequences', 0.032), ('copy', 0.032), ('vbp', 0.032), ('innovative', 0.032), ('oi', 0.031), ('child', 0.031), ('zhang', 0.03), ('mihalcea', 0.03), ('tk', 0.03), ('fine', 0.029), ('bunke', 0.029), ('cowie', 0.029), ('croce', 0.029), ('dii', 0.029), ('film', 0.029), ('golub', 0.029), ('lctwl', 0.029), ('lew', 0.029), ('loct', 0.029), ('qlj', 0.029), ('roma', 0.029), ('schtze', 0.029), ('sptks', 0.029), ('vnmodnnnmod', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
2 0.37716702 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
Author: Alessandro Moschitti ; Jennifer Chu-carroll ; Siddharth Patwardhan ; James Fan ; Giuseppe Riccardi
Abstract: The last decade has seen many interesting applications of Question Answering (QA) technology. The Jeopardy! quiz show is certainly one of the most fascinating, from the viewpoints of both its broad domain and the complexity of its language. In this paper, we study kernel methods applied to syntactic/semantic structures for accurate classification of Jeopardy! definition questions. Our extensive empirical analysis shows that our classification models largely improve on classifiers based on word-language models. Such classifiers are also used in the state-of-the-art QA pipeline constituting Watson, the IBM Jeopardy! system. Our experiments measuring their impact on Watson show enhancements in QA accuracy and a consequent increase in the amount of money earned in game-based evaluation.
3 0.13065556 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
Author: Marco Dinarelli ; Sophie Rosset
Abstract: Reranking models have been successfully applied to many tasks of Natural Language Processing. However, there are two aspects of this approach that need a deeper investigation: (i) Assessment of hypotheses generated for reranking at classification phase: baseline models generate a list of hypotheses and these are used for reranking without any assessment; (ii) Detection of cases where reranking models provide a worse result: the best hypothesis provided by the reranking model is assumed to be always the best result. In some cases the reranking model provides an incorrect hypothesis while the baseline best hypothesis is correct, especially when baseline models are accurate. In this paper we propose solutions for these two aspects: (i) a semantic inconsistency metric to select possibly more correct n-best hypotheses, from a large set generated by an SLU baseline model. The selected hypotheses are reranked applying a state-of-the-art model based on Partial Tree Kernels, which encode SLU hypotheses in Support Vector Machines with complex structured features; (ii) finally, we apply a decision strategy, based on confidence values, to select the final hypothesis between the first ranked hypothesis provided by the baseline SLU model and the first ranked hypothesis provided by the re-ranker. We show the effectiveness of these solutions presenting comparative results obtained reranking hypotheses generated by a very accurate Conditional Random Field model. We evaluate our approach on the French MEDIA corpus. The results show significant improvements with respect to the current state-of-the-art and previous re-ranking models.
4 0.09249384 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: unkown-abstract
5 0.085957564 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
6 0.080956504 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
7 0.074780427 89 emnlp-2011-Linguistic Redundancy in Twitter
8 0.070714228 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
9 0.068588339 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
10 0.065958582 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling
11 0.06498991 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
12 0.063902058 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
13 0.063137263 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
14 0.06078371 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
15 0.058317907 114 emnlp-2011-Relation Extraction with Relation Topics
16 0.057242889 107 emnlp-2011-Probabilistic models of similarity in syntactic context
17 0.055478521 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
18 0.052499175 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
19 0.052097891 131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
20 0.050355952 135 emnlp-2011-Timeline Generation through Evolutionary Trans-Temporal Summarization
topicId topicWeight
[(0, 0.205), (1, -0.028), (2, -0.08), (3, 0.081), (4, -0.022), (5, -0.058), (6, -0.063), (7, -0.012), (8, 0.349), (9, -0.006), (10, 0.094), (11, -0.089), (12, -0.005), (13, 0.344), (14, -0.223), (15, -0.063), (16, -0.235), (17, 0.08), (18, -0.245), (19, -0.03), (20, -0.158), (21, -0.11), (22, -0.072), (23, 0.062), (24, -0.041), (25, 0.016), (26, -0.178), (27, -0.164), (28, 0.037), (29, -0.002), (30, -0.038), (31, -0.021), (32, 0.123), (33, -0.065), (34, 0.069), (35, 0.004), (36, -0.003), (37, -0.054), (38, 0.045), (39, 0.084), (40, -0.008), (41, 0.052), (42, -0.001), (43, -0.033), (44, -0.029), (45, 0.026), (46, -0.018), (47, 0.016), (48, 0.01), (49, -0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.93688434 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
2 0.92828709 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
3 0.33276296 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
4 0.29477662 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
5 0.27422205 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
6 0.27263466 84 emnlp-2011-Learning the Information Status of Noun Phrases in Spoken Dialogues
7 0.26012257 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
8 0.256782 131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
9 0.25048873 89 emnlp-2011-Linguistic Redundancy in Twitter
10 0.21412471 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures
11 0.20604947 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
12 0.20412411 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
13 0.20270388 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
14 0.19816174 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
15 0.19354704 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
16 0.18981755 63 emnlp-2011-Harnessing WordNet Senses for Supervised Sentiment Classification
17 0.18911871 107 emnlp-2011-Probabilistic models of similarity in syntactic context
18 0.18892723 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
19 0.18610637 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
20 0.18458386 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
topicId topicWeight
[(14, 0.307), (23, 0.099), (27, 0.019), (36, 0.026), (37, 0.033), (45, 0.058), (53, 0.021), (54, 0.028), (57, 0.017), (62, 0.015), (64, 0.018), (66, 0.056), (69, 0.015), (79, 0.047), (82, 0.02), (87, 0.011), (90, 0.04), (96, 0.066), (98, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.78850484 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
2 0.77642488 92 emnlp-2011-Minimally Supervised Event Causality Identification
Author: Quang Do ; Yee Seng Chan ; Dan Roth
Abstract: This paper develops a minimally supervised approach, based on focused distributional similarity methods and discourse connectives, for identifying causality relations between events in context. While it has been shown that distributional similarity can help identifying causality, we observe that discourse connectives and the particular discourse relation they evoke in context provide additional information towards determining causality between events. We show that combining discourse relation predictions and distributional similarity methods in a global inference procedure provides additional improvements towards determining event causality.
3 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
4 0.46008429 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
5 0.4451229 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
6 0.44250634 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
7 0.43688738 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
8 0.43672889 136 emnlp-2011-Training a Parser for Machine Translation Reordering
9 0.43468633 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
10 0.43359378 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
11 0.43108711 107 emnlp-2011-Probabilistic models of similarity in syntactic context
12 0.42907435 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
13 0.42869267 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
14 0.4286283 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
15 0.42805398 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
16 0.42771566 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
17 0.42755979 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
18 0.42746165 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.42695048 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
20 0.4269062 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction