acl acl2010 acl2010-70 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. [sent-3, score-0.648]
2 We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task. [sent-5, score-0.398]
3 Frequency counts of context words for a given target word provide invariant representations averaging over all different usages of the target word. [sent-17, score-0.279]
4 Consider, for instance, acquire in different contexts, such as acquire knowledge or acquire shares. [sent-20, score-0.546]
5 In contrast to these approaches, we present a method to model the mutual contextualization of words in a phrase in a compositional way, guided by syntactic structure. [sent-22, score-0.212]
6 We go one step further, however, in that we employ syntactically enriched vector models as the basic meaning representations, assuming a vector space spanned by combinations of dependency relations and words (Lin, 1998). [sent-24, score-0.802]
7 Using syntactically enriched vector models raises problems of different kinds: First, the use [...] [sent-27, score-0.313]
8–9 Second, the vectors of two syntactically related words, e.g., a target verb acquire and its direct object knowledge, typically have different syntactic environments, which implies that their vector representations encode complementary information and there is no direct way of combining the information encoded in the respective vectors. [sent-30, score-0.341] [sent-32, score-0.679]
10 Second-order vector representations in a bag-of-words setting were first used by Schütze (1998); in a syntactic setting, they also feature in Dligach and Palmer (2008). [sent-35, score-0.282]
11 For the problem at hand, the use of second-order vectors alleviates the sparseness problem, and enables the definition of vector space transformations that make the distributional information attached to words in different syntactic positions compatible. [sent-36, score-0.553]
12 Thus, it allows vectors for a predicate and its arguments to be combined in a compositional way. [sent-37, score-0.323]
13 Our first experiment is carried out on the SemEval 2007 lexical substitution task dataset (McCarthy and Navigli, 2007). [sent-39, score-0.302]
14 2 Related Work. Several approaches to contextualizing vector representations of word meaning have been proposed. [sent-47, score-0.521]
15 One common approach is to represent the meaning of a word a in context b simply as the sum, or centroid of a and b (Landauer and Dumais, 1997). [sent-48, score-0.235]
16 By using vector representations of a predicate p and an argument a, Kintsch identifies words that are similar to p and a, and takes the centroid of these words’ vectors to be the representation of the complex expression p(a). [sent-50, score-0.64]
17 Mitchell and Lapata (2008), henceforth M&L, propose a general framework in which meaning representations for complex expressions are computed compositionally by combining the vector representations of the individual words of the complex expression. [sent-51, score-0.52]
18 They focus on the assessment of different operations combining the vectors of the subexpressions. [sent-52, score-0.278]
19 Also, they use syntax-free bag-of-words-based vectors as basic representations of word meaning. [sent-55, score-0.372]
20 Erk and Padó (2008), henceforth E&P, represent the meaning of a word w through a collection of vectors instead of a single vector: They assume selectional preferences and inverse selectional preferences to be constitutive parts of the meaning in addition to the meaning proper. [sent-56, score-0.895]
21 The interpretation of a word p in context a is a combination of p’s meaning with the (inverse) selectional preference of a. [sent-57, score-0.302]
22 Thus, a verb meaning does not combine directly with the meaning of its object noun, as on the M&L account, but with the centroid of the vectors of the verbs to which the noun can stand in an object relation. [sent-58, score-0.853]
23 In the present paper, we formulate a general model of syntactically informed contextualization and show how to apply it to a number of representative lexical substitution tasks. [sent-66, score-0.381]
24 3 The model. In this section, we present our method of contextualizing semantic vector representations. [sent-69, score-0.284]
25 We first give an overview of the main ideas, which is followed by a technical description of first-order and second-order vectors (Section 3. [sent-70, score-0.278]
26 3.1 Overview. Our model employs vector representations for words and expressions containing syntax-specific first- and second-order co-occurrence information. [sent-74, score-0.333]
27 The basis for the construction of both kinds of vector representations is a co-occurrence graph. [sent-75, score-0.282]
28 From this graph, we can directly read off the first-order vector for every word w: the vector’s dimensions correspond to pairs (r, w') of a grammatical relation and a neighboring word, and are assigned the frequency count of (w, r, w'). [sent-77, score-0.317]
29 This vector talks about the possible dependency heads of knowledge and thus can be seen as the (inverse) selectional preference of knowledge (see Erk and Padó (2008)). [sent-81, score-0.423]
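To make this construction concrete, here is a minimal Python sketch (not the authors' code) of reading off first-order vectors from a list of dependency triples; the triples below, the raw-frequency weighting, and the "OBJ-1" convention for the inverse (head) direction are illustrative assumptions.

```python
from collections import Counter

def first_order_vector(word, triples):
    """Sparse first-order vector of `word`: dimensions are (relation, neighbor)
    pairs, values are co-occurrence counts of the triple (word, relation, neighbor)."""
    vec = Counter()
    for w, r, w2 in triples:
        if w == word:
            vec[(r, w2)] += 1
    return vec

# Hypothetical dependency triples; "OBJ-1" marks the inverse (head) direction.
triples = [
    ("acquire", "OBJ", "knowledge"),
    ("acquire", "OBJ", "share"),
    ("acquire", "SUBJ", "company"),
    ("knowledge", "OBJ-1", "acquire"),   # knowledge as object of acquire
    ("knowledge", "OBJ-1", "gain"),
    ("knowledge", "CONJ-1", "skill"),
]

print(first_order_vector("knowledge", triples))
# Counter({('OBJ-1', 'acquire'): 1, ('OBJ-1', 'gain'): 1, ('CONJ-1', 'skill'): 1})
```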
30 As soon as we want to compute a meaning representation for a phrase like acquire knowledge from the verb acquire together with its direct object knowledge, we face the problem that verbs have different syntactic neighbors than nouns; hence their first-order vectors are not easily comparable. [sent-82, score-1.012]
31 To solve this problem, we additionally introduce a second kind of vector, capturing information about all words that can be reached in two steps in the co-occurrence graph. [sent-83, score-0.331]
32 To avoid overly sparse vectors, we generalize over the “middle word” w' and build our second-order vectors on the dimensions corresponding to triples (r, r', w'') of two dependency relations and one word at the end of the two-step path. [sent-87, score-0.833]
33 For instance, the second-order vector for acquire is ⟨15 (OBJ, OBJ⁻¹, gain), 6 (OBJ, CONJ⁻¹, skill), 6 (OBJ, OBJ⁻¹, buy-back), 42 (OBJ, OBJ⁻¹, purchase), ...⟩. [sent-88, score-0.37]
34 Note that second-order vectors in particular contain paths of the form (r, r⁻¹, w'), relating a verb w to other verbs w' which are possible substitution candidates. [sent-94, score-0.598]
35 With first- and second-order vectors we can now model the interaction of semantic information within complex expressions. [sent-95, score-0.374]
36 Given a pair of words in a particular grammatical relation like acquire knowledge, we contextualize the second-order vector of acquire with the first-order vector of knowledge. [sent-96, score-0.916]
37 We let the first-order vector with its selectional preference information act as a kind of weighting filter on the second-order vector, and thus refine the meaning representation of the verb. [sent-97, score-0.441]
38 In our example, we obtain a new second-order vector for acquire in the context of knowledge: ⟨75 (OBJ, OBJ⁻¹, gain), 12 (OBJ, CONJ⁻¹, skill), 0 (OBJ, OBJ⁻¹, buy-back), 0 (OBJ, OBJ⁻¹, purchase), ...⟩. [sent-100, score-0.419]
39 Also, contextualisation of acquire with the argument share instead of knowledge would have led to a very different vector, which reflects the fact that the two argument nouns induce different readings of the inherently ambiguous acquire. [sent-104, score-0.224]
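A toy sketch of the weighting-filter effect just described: the second-order vector of acquire is reweighted dimension by dimension with the (lifted) first-order vector of its object, so dimensions the object never licenses drop to zero. The filter weights 5 and 2 below are hypothetical, chosen only so that the output mimics the numbers in the example above.

```python
# Hypothetical second-order vector of "acquire" over (r, r', w'') dimensions.
acquire_2nd = {
    ("OBJ", "OBJ-1", "gain"): 15.0,
    ("OBJ", "CONJ-1", "skill"): 6.0,
    ("OBJ", "OBJ-1", "buy-back"): 6.0,
    ("OBJ", "OBJ-1", "purchase"): 42.0,
}

# Hypothetical OBJ-lifted first-order vector of "knowledge" (its dependency heads).
knowledge_lifted = {
    ("OBJ", "OBJ-1", "gain"): 5.0,
    ("OBJ", "CONJ-1", "skill"): 2.0,
    # no mass on the buy-back / purchase contexts
}

contextualized = {dim: val * knowledge_lifted.get(dim, 0.0)
                  for dim, val in acquire_2nd.items()}
print(contextualized)
# {('OBJ', 'OBJ-1', 'gain'): 75.0, ('OBJ', 'CONJ-1', 'skill'): 12.0,
#  ('OBJ', 'OBJ-1', 'buy-back'): 0.0, ('OBJ', 'OBJ-1', 'purchase'): 0.0}
```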
40–41 3.2 First- and second-order vectors. Assuming a set W of words and a set R of dependency relation labels, we consider a Euclidean vector space V1 spanned by the set of orthonormal basis vectors {~e_{r,w'} | r ∈ R, w' ∈ W}, i.e., a vector space whose dimensions correspond to pairs of a relation and a word. [sent-106, score-0.985] [sent-108, score-0.231]
42 In this vector space we define the first-order vector [w] of a word w as follows: [w] = Σ_{r∈R, w'∈W} ω(w, r, w') · ~e_{r,w'}, where ω is a function that assigns the dependency triple (w, r, w') a corresponding weight. [sent-110, score-0.507]
43 Evidently this is a higher-dimensional space than V1, which can therefore be embedded into V2 by the “lifting maps” Lr : V1 ↪ V2 defined by Lr(~e_{r',w'}) := ~e_{r,r',w'} (and by linear extension therefore on all vectors of V1). [sent-114, score-0.321]
44 Using these lifting maps we define the second-order vector [[w]] of a word w as [[w]] = Σ_{r∈R, w'∈W} ω(w, r, w') · Lr([w']). [sent-115, score-0.241]
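A minimal sketch of the lifting map Lr and of the second-order vector built from it, using sparse dictionaries; the weighting function ω is taken here to be the raw triple frequency, which is one plausible choice (the paper tunes PMI and frequency thresholds instead).

```python
from collections import Counter

def first_order_vector(word, triples):
    """[w]: counts over (relation, neighbor) dimensions."""
    return Counter((r, wp) for w, r, wp in triples if w == word)

def lift(r, vec):
    """L_r: embed a first-order vector over (r', w') into V2 dimensions (r, r', w')."""
    return Counter({(r, rp, wp): v for (rp, wp), v in vec.items()})

def second_order_vector(word, triples):
    """[[w]] = sum over (w, r, w') of omega(w, r, w') * L_r([w']),
    with omega taken here to be the raw triple frequency."""
    omega = Counter(triples)
    result = Counter()
    for (w, r, wp), weight in omega.items():
        if w != word:
            continue
        for dim, v in lift(r, first_order_vector(wp, triples)).items():
            result[dim] += weight * v
    return result

# Hypothetical triples: the object "knowledge" links acquire to gain and skill.
triples = [
    ("acquire", "OBJ", "knowledge"),
    ("knowledge", "OBJ-1", "gain"),
    ("knowledge", "CONJ-1", "skill"),
]
print(second_order_vector("acquire", triples))
# Counter({('OBJ', 'OBJ-1', 'gain'): 1, ('OBJ', 'CONJ-1', 'skill'): 1})
```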
45 3.3 Composition. Both first- and second-order vectors are defined for lexical expressions only. [sent-123, score-0.278]
46 In order to represent the meaning of complex expressions we need to combine the vectors for grammatically related words in a given sentence. [sent-124, score-0.384]
47 Given two words w and w' in relation r, we contextualize the second-order vector of w with the r-lifted first-order vector of w': [[w_{r:w'}]] = [[w]] ⊙ Lr([w']). Here ⊙ may denote any operator on V2. [sent-125, score-0.548]
48 The objective is to incorporate (inverse) selectional preference information from the context (r, w') in such a way as to identify the correct word sense of w. [sent-126, score-0.254]
49 This can be expressed by pointwise vector multiplication (in terms of the given basis of V2). [sent-129, score-0.324]
50–52 To contextualize (the vector of) a word w with multiple words w1, ..., wn in relations r1, ..., rn, we compute the sum of the results of the pairwise contextualizations of the target vector with the vectors of the respective dependents: [[w_{r1:w1, ..., rn:wn}]] = Σ_{k=1}^{n} [[w_{rk:wk}]]. 4 Experiments: Ranking Paraphrases. In this section, we evaluate our model on a paraphrase ranking task. [sent-131, score-0.321] [sent-137, score-0.572] [sent-139, score-0.345]
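Before turning to the experiments, a minimal sketch of the composition step defined in Section 3.3 above, with pointwise multiplication as the operator and a sum over several dependents; vectors are sparse dictionaries over V2 dimensions, and the concrete values are hypothetical.

```python
from collections import Counter

def pointwise_mult(u, v):
    """u (.) v in terms of the basis of V2: dimension-wise product."""
    return Counter({dim: val * v[dim] for dim, val in u.items() if dim in v})

def contextualize(second_order_w, lifted_dependents):
    """[[w_{r1:w1,...,rn:wn}]] = sum_k  [[w]] (.) L_{rk}([wk])."""
    result = Counter()
    for lifted in lifted_dependents:
        for dim, val in pointwise_mult(second_order_w, lifted).items():
            result[dim] += val
    return result

# Hypothetical vectors over (r, r', w'') dimensions:
acquire = Counter({("OBJ", "OBJ-1", "gain"): 15, ("OBJ", "OBJ-1", "purchase"): 42,
                   ("SUBJ", "SUBJ-1", "buy"): 3})
obj_knowledge = Counter({("OBJ", "OBJ-1", "gain"): 5})    # L_OBJ([knowledge])
subj_student = Counter({("SUBJ", "SUBJ-1", "buy"): 1})    # L_SUBJ([student])

print(contextualize(acquire, [obj_knowledge, subj_student]))
# Counter({('OBJ', 'OBJ-1', 'gain'): 75, ('SUBJ', 'SUBJ-1', 'buy'): 3})
```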
53–54 We consider sentences with an occurrence of some target word w and a list of paraphrase candidates w1, ..., wk such that each of the wi is a paraphrase of w for some sense of w. [sent-140, score-0.373] [sent-143, score-0.277]
55 The task is to decide for each of the paraphrase candidates wi how appropriate it is as a paraphrase of w in the given context. [sent-144, score-0.524]
56 For instance, buy, purchase and obtain are all paraphrases of acquire, in the sense that they can be substituted for acquire in some contexts, but purchase and buy are not paraphrases of acquire in the first sentence of Table 1. [sent-145, score-1.227]
57–58 Table 1 (excerpt). Sentence: “Teacher education students will acquire the knowledge and skills required to [...]” (paraphrases: gain 4; amass 1; receive 1; obtain 1). Sentence: “[...] acquire the remaining IXOS shares [...]” (paraphrases: buy 3; purchase 1; gain 1; get 1; procure 2; obtain 1). [sent-146, score-0.235] [sent-153, score-0.319]
59 4.1 Setup: Resources. We use a vector model based on dependency trees obtained from parsing the English Gigaword corpus (LDC2003T05). [sent-157, score-0.327]
60 The complete dataset contains 10 instances for each of 200 target words—nouns, verbs, adjectives and adverbs—in different sentential contexts. [sent-164, score-0.289]
61 Systems that participated in the task had to generate paraphrases for every instance, and were evaluated against a gold standard containing up to 10 possible paraphrases for each of the individual instances. [sent-165, score-0.644]
62 There are two natural subtasks in generating paraphrases: identifying paraphrase candidates and ranking them according to the context. [sent-166, score-0.38]
63 We follow E&P and evaluate our model only on the second subtask: we extract paraphrase candidates from the gold standard by pooling all annotated gold-standard paraphrases for all instances of a verb in all contexts, and use our model to rank these paraphrase candidates in specific contexts. [sent-167, score-1.204]
64 Table 1 shows two instances of the target verb acquire together with its paraphrases in the gold standard as an example. [sent-168, score-0.754]
65 The paraphrases come with weights, which correspond to the number of times they were given by different annotators. [sent-169, score-0.294]
66 We define average precision first: AP = (Σ_{i=1}^{n} x_i p_i) / R with p_i = (Σ_{k=1}^{i} x_k) / i, where x_i is a binary variable indicating whether the i-th item as ranked by the model is in the gold standard or not, R is the size of the gold standard, and n is the number of paraphrase candidates to be ranked. [sent-180, score-0.602]
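A minimal sketch of the average precision formula above; `gold` is the set of gold-standard paraphrases (R is its size) and `ranked` is the model's ranking of all n candidates. The generalized variant (GAP) used in the paper additionally weights candidates by their gold annotation counts; that weighting is not spelled out in this excerpt, so only plain AP is shown.

```python
def average_precision(ranked, gold):
    """AP = (1/R) * sum_i x_i * p_i  with  p_i = (sum_{k<=i} x_k) / i,
    where x_i = 1 if the i-th ranked candidate is in the gold standard."""
    R = len(gold)
    hits = 0
    ap = 0.0
    for i, cand in enumerate(ranked, start=1):
        x_i = 1 if cand in gold else 0
        hits += x_i
        p_i = hits / i
        ap += x_i * p_i
    return ap / R if R else 0.0

# Toy example: two of four candidates are gold paraphrases.
print(average_precision(["gain", "buy", "obtain", "shed"], {"gain", "obtain"}))
# (1*1.0 + 1*(2/3)) / 2 = 0.8333...
```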
67 4.3 Experiment 1: Verb paraphrases. In our first experiment, we consider verb paraphrases using the same controlled subset of the lexical substitution task data that had been used by TDP in an earlier study. [sent-190, score-0.829]
68 The dataset is identical to the one used by TDP and has been constructed in the same way as the dataset used by E&P: it contains those gold-standard instances of verbs that have—according to the analyses produced by the MiniPar parser (Lin, 1993)—an overtly realized subject and object. [sent-193, score-0.369]
69 Gold-standard paraphrases that do not occur in the parsed British National Corpus are removed. [sent-194, score-0.35]
70–71 [...].5 substitution candidates; for individual instances of a target verb, an average of 3.9 of the substitution candidates are annotated as correct paraphrases. [sent-197, score-0.326] [sent-198, score-0.237]
72 To compute the vector space, we consider only a subset of the complete set of dependency triples extracted from the parsed Gigaword corpus. [sent-201, score-0.479]
73 We experimented with various strategies, and found that models which consider all dependency triples exceeding certain pmi- and frequency thresholds perform best. [sent-202, score-0.284]
74 Since the dataset is rather small, we use a fourfold cross-validation method for parameter tuning: We divide the dataset into four subsets, test various parameter settings on one subset and use the parameters that perform best (in terms of GAP) to evaluate the model on the three other subsets. [sent-203, score-0.277]
75 We consider the following parameters: pmi-thresholds for the dependency triples used in the computation of the first- and second-order vectors, and frequency thresholds. [sent-204, score-0.246]
76 The threshold values for context vectors are slightly different: a medium pmi-threshold between 2 and 4 and a low frequency threshold of 3. [sent-207, score-0.376]
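A minimal sketch of filtering dependency triples by PMI and frequency thresholds as described above; the particular PMI definition (between the word and its (r, w') context) and the default thresholds are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def filter_triples(triples, pmi_threshold=2.0, freq_threshold=3):
    """Keep dependency triples (w, r, w') whose frequency and PMI exceed
    the given thresholds. PMI is computed here between the word w and the
    context (r, w'); this is one plausible choice, not the paper's exact setup."""
    freq = Counter(triples)                       # triple counts
    total = sum(freq.values())
    word_freq = Counter(w for w, _, _ in triples)
    ctx_freq = Counter((r, wp) for _, r, wp in triples)

    kept = []
    for (w, r, wp), f in freq.items():
        if f < freq_threshold:
            continue
        p_joint = f / total
        p_word = word_freq[w] / total
        p_ctx = ctx_freq[(r, wp)] / total
        if math.log2(p_joint / (p_word * p_ctx)) >= pmi_threshold:
            kept.append((w, r, wp))
    return kept
```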
77–78 To rank paraphrases in context, we compute contextualized vectors for the verb in the input sentence, i.e., a second-order vector for the verb that is contextually constrained by the first-order vectors of all its arguments, and compare them to the unconstrained (second-order) vectors of each paraphrase candidate, using cosine similarity. (Footnote: Both TDP and E&P use the British National Corpus.) [sent-208, score-0.839] [sent-211, score-1.183]
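A minimal sketch of this ranking step: the verb's contextualized second-order vector is compared by cosine similarity to each candidate's uncontextualized second-order vector. Vectors are sparse dictionaries; how they were built is assumed to follow the sketches above, and the concrete numbers are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(val * v.get(dim, 0.0) for dim, val in u.items())
    nu = math.sqrt(sum(val * val for val in u.values()))
    nv = math.sqrt(sum(val * val for val in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_paraphrases(target_in_context, candidate_vectors):
    """Rank candidates by cosine similarity of their (uncontextualized)
    second-order vectors to the contextualized vector of the target."""
    scored = [(cosine(target_in_context, vec), cand)
              for cand, vec in candidate_vectors.items()]
    return [cand for score, cand in sorted(scored, reverse=True)]

# Hypothetical vectors:
acquire_ctx = {("OBJ", "OBJ-1", "gain"): 75.0, ("OBJ", "CONJ-1", "skill"): 12.0}
candidates = {
    "gain":     {("OBJ", "OBJ-1", "gain"): 9.0, ("OBJ", "CONJ-1", "skill"): 2.0},
    "purchase": {("OBJ", "OBJ-1", "buy-back"): 7.0},
}
print(rank_paraphrases(acquire_ctx, candidates))   # ['gain', 'purchase']
```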
79 We evaluate our model against a random baseline and two variants of our model: One variant (“2nd order uncontextualized”) simply uses contextually unconstrained second-order vectors to rank paraphrase candidates. [sent-214, score-0.731]
80 Comparing the full model to this variant will show how effective our method of contextualizing vectors is. [sent-215, score-0.489]
81 The second variant (“1st order contextualized”) represents verbs in context by their first order vectors that specify how often the verb co-occurs with its arguments in the parsed Gigaword corpus. [sent-216, score-0.606]
82 With our choice of pointwise multiplication for the composition operator we have (~v1 ⊙ ~w) · ~v2 = ~v1 · (~v2 ⊙ ~w). [sent-231, score-0.234]
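This commutation property holds because both sides equal Σ_i v1_i v2_i w_i; a quick numerical check (pointwise product and dot product on plain lists, for illustration only):

```python
import random

random.seed(0)
n = 5
v1 = [random.random() for _ in range(n)]
v2 = [random.random() for _ in range(n)]
w = [random.random() for _ in range(n)]

pointwise = lambda a, b: [x * y for x, y in zip(a, b)]   # a (.) b
dot = lambda a, b: sum(x * y for x, y in zip(a, b))      # a . b

lhs = dot(pointwise(v1, w), v2)   # (v1 (.) w) . v2
rhs = dot(v1, pointwise(v2, w))   # v1 . (v2 (.) w)
print(abs(lhs - rhs) < 1e-12)     # True: both equal sum_i v1_i * v2_i * w_i
```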
83 To find out how our model performs on less controlled datasets, we extracted all instances from the lexical substitution task dataset with a verb target, excluding only instances which could not be parsed by the Stanford parser, or in which the target was mistagged as a non-verb by the parser. [sent-255, score-0.657]
84 As for the LST/SO dataset, we ignore all gold-standard paraphrases that do not occur in the parsed (Gigaword) corpus. [sent-257, score-0.35]
85 4.4 Experiment 2: Non-verb paraphrases. We now apply our model to parts of speech (POS) other than verbs. [sent-269, score-0.345]
86 Table 3: GAP scores for non-verb paraphrases using two different methods. [sent-280, score-0.294]
87 We therefore propose an alternative method to rank non-verb paraphrases: We take the second-order vector of the target’s head and contextually constrain it by the first-order vector of the target. [sent-282, score-0.548]
88 For instance, if we want to rank the paraphrase candidates hint and star for the noun lead in the sentence (1) Meet for coffee early, swap leads and get permission to contact if possible. [sent-283, score-0.344]
89 we compute [[swap_{OBJ:lead}]] and compare it to the lifted first-order vectors of all paraphrase candidates, L_OBJ([hint]) and L_OBJ([star]), using cosine similarity. [sent-284, score-0.575]
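A minimal sketch of this alternative method for non-verb targets: contextualize the head's second-order vector with the target's lifted first-order vector, then compare it to the lifted first-order vectors of the candidates. All vectors and counts below are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(val * v.get(dim, 0.0) for dim, val in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def lift(r, first_order_vec):
    return {(r, rp, wp): v for (rp, wp), v in first_order_vec.items()}

def pointwise(u, v):
    return {dim: val * v.get(dim, 0.0) for dim, val in u.items()}

# [[swap]] and the first-order vectors of lead / hint / star (hypothetical counts).
swap_2nd = {("OBJ", "OBJ-1", "exchange"): 8.0, ("OBJ", "OBJ-1", "give"): 5.0}
lead_1st = {("OBJ-1", "exchange"): 3.0, ("OBJ-1", "give"): 1.0}
hint_1st = {("OBJ-1", "give"): 4.0}
star_1st = {("OBJ-1", "cast"): 6.0}

target_ctx = pointwise(swap_2nd, lift("OBJ", lead_1st))   # [[swap_OBJ:lead]]
for cand, vec in [("hint", hint_1st), ("star", star_1st)]:
    print(cand, round(cosine(target_ctx, lift("OBJ", vec)), 3))
# hint gets a positive score, star gets 0.0 under these toy counts
```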
90 To evaluate the performance of the two methods, we extract all instances from the lexical substitution task dataset with a nominal, adjectival, or adverbial target, excluding instances with incorrect parse or no parse at all. [sent-285, score-0.392]
91 As before, we ignore gold-standard paraphrases that do not occur in the parsed Gigaword corpus. [sent-286, score-0.35]
92 5 Experiment: Ranking Word Senses. In this section, we apply our model to a different word sense ranking task: Given a word w in context, the task is to decide to what extent the different WordNet (Fellbaum, 1998) senses of w apply to this occurrence of w. [sent-293, score-0.261]
93 The dataset contains ordinal judgments of the applicability of WordNet senses on a 5-point scale, ranging from “completely different” to “identical”, for eight different lemmas in 50 different sentential contexts. [sent-296, score-0.228]
94 For each word sense, we compute the centroid of the second-order vectors of its synset members. [sent-301, score-0.396]
95 For each instance in the dataset, we compute the second-order vector of the target verb, contextually constrain it by the first-order vectors of the verb’s arguments, and compare the resulting vector to the vectors that represent the different WordNet senses of the verb. [sent-304, score-1.205]
96 The WordNet senses are then ranked according to the cosine similarity between their sense vector and the contextually constrained target verb vector. [sent-305, score-0.657]
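A minimal sketch of this sense-ranking step: each WordNet sense is represented by the centroid of the second-order vectors of its synset members, and senses are ranked by cosine similarity to the contextually constrained target vector. The synsets and vectors below are placeholders, and WordNet access (e.g. via NLTK) is not shown.

```python
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(val * v.get(dim, 0.0) for dim, val in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    acc = defaultdict(float)
    for vec in vectors:
        for dim, val in vec.items():
            acc[dim] += val / len(vectors)
    return dict(acc)

def rank_senses(target_ctx_vec, senses, word_vectors):
    """senses: {sense_id: [synset member lemmas]}; word_vectors: second-order vectors."""
    scored = []
    for sense_id, members in senses.items():
        sense_vec = centroid([word_vectors[m] for m in members if m in word_vectors])
        scored.append((cosine(target_ctx_vec, sense_vec), sense_id))
    return sorted(scored, reverse=True)

# Toy example with hypothetical senses of "acquire":
word_vectors = {
    "gain":     {("OBJ", "OBJ-1", "gain"): 4.0, ("OBJ", "CONJ-1", "skill"): 1.0},
    "obtain":   {("OBJ", "OBJ-1", "gain"): 2.0},
    "purchase": {("OBJ", "OBJ-1", "buy-back"): 5.0},
}
senses = {"acquire%learn": ["gain", "obtain"], "acquire%buy": ["purchase"]}
acquire_ctx = {("OBJ", "OBJ-1", "gain"): 75.0, ("OBJ", "CONJ-1", "skill"): 12.0}
print(rank_senses(acquire_ctx, senses, word_vectors))
# the "learn"-like sense outranks the "buy"-like sense under these toy vectors
```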
97 6 Conclusion. We have presented a novel method for adapting the vector representations of words according to their context. [sent-373, score-0.282]
98 Evaluating on the SemEval 2007 lexical substitution task dataset, our model performs substantially better than all earlier approaches, exceeding the state of the art by around 9% in terms of generalized average precision and around 7% in terms of precision out of ten. [sent-336, score-0.269]
99 We studied the effect that context has on target words in a series of experiments, which vary the target word and keep the context constant. [sent-339, score-0.234]
100 A structured vector space model for word meaning in context. [sent-373, score-0.388]
wordName wordTfidf (topN-words)
[('paraphrases', 0.294), ('vectors', 0.278), ('paraphrase', 0.219), ('tdp', 0.212), ('vector', 0.188), ('erk', 0.187), ('acquire', 0.182), ('mccarthy', 0.177), ('substitution', 0.151), ('contextualize', 0.133), ('pad', 0.123), ('contextualization', 0.116), ('dataset', 0.113), ('triples', 0.109), ('contextualizing', 0.106), ('meaning', 0.106), ('thater', 0.1), ('contextualized', 0.1), ('selectional', 0.098), ('wr', 0.098), ('representations', 0.094), ('contextually', 0.09), ('verb', 0.09), ('dependency', 0.088), ('candidates', 0.086), ('lr', 0.083), ('centroid', 0.08), ('dimensions', 0.08), ('purchase', 0.08), ('verbs', 0.079), ('senses', 0.077), ('gap', 0.077), ('ranking', 0.075), ('dependents', 0.074), ('multiplication', 0.069), ('target', 0.068), ('pointwise', 0.067), ('inverse', 0.065), ('instances', 0.064), ('spanned', 0.064), ('syntactically', 0.063), ('enriched', 0.062), ('obj', 0.062), ('composition', 0.059), ('sense', 0.058), ('gigaword', 0.058), ('buy', 0.057), ('semeval', 0.057), ('object', 0.057), ('gold', 0.056), ('sch', 0.056), ('parsed', 0.056), ('variant', 0.054), ('amass', 0.053), ('bites', 0.053), ('georgiana', 0.053), ('informations', 0.053), ('lifting', 0.053), ('lobj', 0.053), ('wordsense', 0.053), ('adverbs', 0.052), ('navigli', 0.052), ('model', 0.051), ('preference', 0.049), ('frequency', 0.049), ('context', 0.049), ('katrin', 0.048), ('diana', 0.048), ('generalized', 0.047), ('tze', 0.047), ('oh', 0.047), ('graded', 0.046), ('orthonormal', 0.046), ('ranked', 0.046), ('compositional', 0.045), ('precision', 0.045), ('semantic', 0.045), ('sparseness', 0.044), ('modelled', 0.044), ('wordnet', 0.044), ('adjectives', 0.044), ('space', 0.043), ('pinkal', 0.043), ('dinu', 0.043), ('secondorder', 0.043), ('average', 0.043), ('nouns', 0.042), ('dligach', 0.04), ('cosine', 0.04), ('rank', 0.039), ('operator', 0.039), ('lapata', 0.039), ('compute', 0.038), ('henceforth', 0.038), ('experiment', 0.038), ('exceeding', 0.038), ('sebastian', 0.038), ('judgments', 0.038), ('contexts', 0.037), ('mitchell', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
2 0.25737426 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder
Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.
3 0.25045249 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
Author: Katrin Erk ; Sebastian Pado
Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.
4 0.2164295 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
5 0.1494891 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
6 0.14858226 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
7 0.1477582 66 acl-2010-Compositional Matrix-Space Models of Language
8 0.13581629 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
9 0.13216145 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
10 0.13075341 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
11 0.12699732 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
12 0.10795328 238 acl-2010-Towards Open-Domain Semantic Role Labeling
13 0.1073192 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
14 0.10646771 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
15 0.098869525 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
16 0.096812002 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
17 0.095522612 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
18 0.094277292 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging
19 0.093497798 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
20 0.089980975 165 acl-2010-Learning Script Knowledge with Web Experiments
topicId topicWeight
[(0, -0.286), (1, 0.111), (2, 0.007), (3, 0.006), (4, 0.17), (5, -0.004), (6, 0.108), (7, 0.039), (8, 0.011), (9, -0.028), (10, 0.003), (11, 0.069), (12, 0.167), (13, 0.092), (14, 0.106), (15, 0.005), (16, 0.003), (17, -0.071), (18, -0.081), (19, 0.013), (20, 0.267), (21, 0.2), (22, -0.012), (23, 0.015), (24, 0.196), (25, -0.139), (26, -0.057), (27, 0.033), (28, -0.139), (29, -0.134), (30, -0.113), (31, 0.014), (32, -0.007), (33, 0.03), (34, 0.017), (35, 0.048), (36, -0.097), (37, -0.14), (38, -0.016), (39, 0.0), (40, -0.057), (41, -0.067), (42, -0.099), (43, 0.12), (44, -0.044), (45, 0.017), (46, 0.048), (47, 0.04), (48, 0.061), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.93905497 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
2 0.92942852 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
Author: Katrin Erk ; Sebastian Pado
Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.
3 0.62702137 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
4 0.5837388 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder
Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.
5 0.56161362 66 acl-2010-Compositional Matrix-Space Models of Language
Author: Sebastian Rudolph ; Eugenie Giesbrecht
Abstract: We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that it is able to cover and combine various common compositional NLP approaches ranging from statistical word space models to symbolic grammar formalisms.
6 0.54985034 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
7 0.52771074 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures
8 0.5083999 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
9 0.47421902 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation
10 0.45646453 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
11 0.45594358 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
12 0.43825313 238 acl-2010-Towards Open-Domain Semantic Role Labeling
13 0.42830965 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
14 0.41814014 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
15 0.40816319 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
16 0.40704179 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
17 0.40618157 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
18 0.40330285 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
19 0.40178466 158 acl-2010-Latent Variable Models of Selectional Preference
20 0.39728662 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
topicId topicWeight
[(7, 0.01), (14, 0.02), (25, 0.092), (42, 0.018), (44, 0.02), (59, 0.118), (73, 0.053), (78, 0.306), (80, 0.015), (83, 0.086), (84, 0.038), (98, 0.125)]
simIndex simValue paperId paperTitle
1 0.97271323 228 acl-2010-The Importance of Rule Restrictions in CCG
Author: Marco Kuhlmann ; Alexander Koller ; Giorgio Satta
Abstract: Combinatory Categorial Grammar (CCG) is generally construed as a fully lexicalized formalism, where all grammars use one and the same universal set of rules, and crosslinguistic variation is isolated in the lexicon. In this paper, we show that the weak generative capacity of this ‘pure’ form of CCG is strictly smaller than that of CCG with grammar-specific rules, and of other mildly context-sensitive grammar formalisms, including Tree Adjoining Grammar (TAG). Our result also carries over to a multi-modal extension of CCG.
2 0.97029048 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
Author: Hector-Hugo Franco-Penya
Abstract: “Tree SRL system” is a Semantic Role Labelling supervised system based on a tree-distance algorithm and a simple k-NN implementation. The novelty of the system lies in comparing the sentences as tree structures with multiple relations instead of extracting vectors of features for each relation and classifying them. The system was tested with the English CoNLL-2009 shared task data set where 79% accuracy was obtained.
3 0.91903472 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
Author: Amit Dubey
Abstract: Probabilistic models of sentence comprehension are increasingly relevant to questions concerning human language processing. However, such models are often limited to syntactic factors. This paper introduces a novel sentence processing model that consists of a parser augmented with a probabilistic logic-based model of coreference resolution, which allows us to simulate how context interacts with syntax in a reading task. Our simulations show that a Weakly Interactive cognitive architecture can explain data which had been provided as evidence for the Strongly Interactive hypothesis.
4 0.91275162 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007).
same-paper 5 0.87090737 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
6 0.80592412 158 acl-2010-Latent Variable Models of Selectional Preference
7 0.80239606 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
8 0.78339845 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
9 0.77253973 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
10 0.76251078 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
11 0.74834096 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.74794447 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
13 0.7420941 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
14 0.74181765 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
15 0.73794872 71 acl-2010-Convolution Kernel over Packed Parse Forest
16 0.73540711 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
17 0.73139447 67 acl-2010-Computing Weakest Readings
18 0.72731519 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
19 0.72599459 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
20 0.72192603 248 acl-2010-Unsupervised Ontology Induction from Text