acl acl2010 acl2010-107 knowledge-graph by maker-knowledge-mining

107 acl-2010-Exemplar-Based Models for Word Meaning in Context


Source: pdf

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper describes ongoing work on distributional models for word meaning in context. [sent-4, score-0.153]

2 We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. [sent-5, score-0.567]

3 On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models. [sent-6, score-0.502]

4 They describe a lemma through a high-dimensional vector that records co-occurrence with context features over a large corpus. [sent-8, score-0.174]

5 Distributional models are also attractive as a model of word meaning in context, since they do not have to rely on fixed sets of dictionary senses with their well-known problems (Kilgarriff, 1997; McCarthy and Navigli, 2009). [sent-13, score-0.131]

6 Also, they can be used directly for testing paraphrase applicability (Szpektor et al. [sent-14, score-0.18]

7 , 2008), a task that has recently become prominent in the context of textual entailment (Bar-Haim et al. [sent-15, score-0.084]

8 However, polysemy is a fundamental problem for distributional models. [sent-17, score-0.131]

9 Typically, distributional models compute a single “type” vector for a target word, which contains cooccurrence counts for all the occurrences of the target in a large corpus. [sent-18, score-0.316]

10 If the target is polysemous, this vector mixes contextual features for all the senses of the target. [sent-19, score-0.117]

11 This problem has typically been approached by modifying the type vector for a target to better match a given context (Mitchell and Lapata, 2008; Erk and Padó, 2008; Thater et al. [sent-23, score-0.184]

12 In terms of research on human concept representation, which often employs feature vector representations, the use of type vectors can be understood as a prototype-based approach, which uses a single vector per category. [sent-25, score-0.159]

13 From this angle, computing prototypes throws away much interesting distributional information. [sent-26, score-0.105]

14 A rival class of models is that of exemplar models, which memorize each seen instance of a category and perform categorization by comparing a new stimulus to each remembered exemplar vector. [sent-27, score-1.019]

15 We can address the polysemy issue through an exemplar model by simply removing all exemplars that are “not relevant” for the present context, or conversely activating only the relevant ones. [sent-28, score-0.6]

16 For the coach example, in the context of a text about motorways, presumably an instance like “The coach drove a steady 45 mph” would be activated, while “The team lost all games since the new coach arrived” would not. [sent-29, score-0.291]

17 In this paper, we present an exemplar-based distributional model for modeling word meaning in context, applying the model to the task of deciding paraphrase applicability. [sent-30, score-0.334]

18 With a very simple vector representation and just using activation, we outperform the state-of-the-art prototype models. [sent-31, score-0.13]

19 2 Related Work Among distributional models of word meaning, there are some approaches that address polysemy, either by inducing a fixed clustering of contexts into senses (Schütze, 1998) or by dynamically modi- [sent-33, score-0.151]

20 fying a word’s type vector according to each given sentence context (Landauer and Dumais, 1997; Mitchell and Lapata, 2008; Erk and Padó, 2008; Thater et al. [sent-35, score-0.112]

21 Some use a bag-of-words representation of words in the current sentence (Schütze, 1998; Landauer and Dumais, 1997), some make use of syntactic context (Mitchell and Lapata, 2008; Erk and Padó, 2008; Thater et al. [sent-38, score-0.072]

22 The approach that we present in the current paper computes a representation dynamically for each sentence context, using a simple bag-of-words representation of context. [sent-40, score-0.066]

23 In cognitive science, prototype models predict degree of category membership through similarity to a single prototype, while exemplar theory represents a concept as a collection of all previously seen exemplars (Murphy, 2002). [sent-41, score-0.915]

24 (2007) found that the benefit of exemplars over prototypes grows with the number of available exemplars. [sent-43, score-0.309]

25 The problem of representing meaning in context, which we consider in this paper, is closely related to the problem of concept combination in cognitive science, i.e. [sent-44, score-0.089]

26 , the derivation of representations for complex concepts (such as “metal spoon”) given the representations of base concepts (“metal” and “spoon”). [sent-46, score-0.092]

27 While most approaches to concept combination are based on prototype models, Voorspoels et al. [sent-47, score-0.063]

28 (2009) show superior results for an exemplar model based on exemplar activation. [sent-48, score-0.987]

29 In the current paper, we use an exemplar model for computing distributional representations for word meaning in context, using the context to activate relevant exemplars. [sent-51, score-0.756]

30 Comparing representations of context, bag-of-words (BOW) representations are more informative and noisier, while syntax-based representations deliver sparser and less noisy information. [sent-52, score-0.138]

31 Following the hypothesis that richer, topical information is more suitable for exemplar activation, we use BOW representations of sentential context in the current paper. [sent-53, score-0.61]

32 3 Exemplar Activation Models We now present an exemplar-based model for meaning in context. [sent-54, score-0.06]

33 It assumes that each target lemma is represented by a set of exemplars, where an exemplar is a sentence in which the target occurs, represented as a vector. [sent-55, score-0.565]

34 We use lowercase letters for individual exemplars (vectors), and uppercase letters for sets of exemplars. [sent-56, score-0.425]

35 We model polysemy by activating relevant exemplars of a lemma E in a given sentence context s. [sent-60, score-0.525]

36 (Note that we use E to refer to both a lemma and its exemplar set, and that s can be viewed as just another exemplar vector. [sent-61, score-1.05]

37 ) In general, we define activation of a set E by exemplar s as act(E, s) = {e ∈ E | sim(e, s) > θ(E, s) } where E is an exemplar set, s is the “point of comparison”, sim is some similarity measure such as Cosine or Jaccard, and θ(E, s) is a threshold. [sent-62, score-1.524]

38 Exemplars belong to the activated set if their similarity to s exceeds θ(E, s). [sent-63, score-0.133]

39 In kNN activation, the k most similar exemplars to s are activated by setting θ to the similarity of the k-th most similar exemplar. [sent-65, score-0.414]

40 Note that, while in the kNN activation scheme the number of activated exemplars is the same for every lemma, this is not the case for percentage activation: There, a more frequent lemma (i.e. [sent-67, score-0.962]

41 , a lemma with more exemplars) will have more exemplars activated. [sent-69, score-0.361]
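As a concrete illustration of the two activation schemes just described, here is a minimal sketch (not the authors' code; the function names and the choice of dense numpy vectors with cosine similarity are our own assumptions):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b) / (na * nb) if na > 0 and nb > 0 else 0.0

def activate_knn(exemplars, s, k):
    """act(E, s) with kNN activation: keep the k exemplars most similar to the point of comparison s."""
    ranked = sorted(range(len(exemplars)), key=lambda i: cosine(exemplars[i], s), reverse=True)
    keep = set(ranked[:k])
    return [e for i, e in enumerate(exemplars) if i in keep]

def activate_percentage(exemplars, s, pct):
    """act(E, s) with percentage activation: keep the closest pct% of exemplars,
    so a more frequent lemma (more exemplars) activates more of them."""
    k = max(1, round(len(exemplars) * pct / 100))
    return activate_knn(exemplars, s, k)
```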

42 A paraphrase is typically only applicable to a particular sense of a target word. [sent-71, score-0.276]

43 Table 1 illustrates this on two examples from the Lexical Substitution (LexSub) dataset (McCarthy and Navigli, 2009), both featuring the target return. [sent-72, score-0.115]

44 The right column lists appropriate paraphrases of return in each context (given by human annotators). [sent-73, score-0.216]

45 We apply the exemplar activation model to the task of predicting paraphrase felicity: Given a target lemma T in a particular sentential context s, and given a list of … (Footnote 1: In principle, activation could be treated not just as binary inclusion/exclusion, but also as a graded weighting scheme.) [sent-74, score-1.857]

46 (Footnote 2) Each annotator was allowed to give up to three paraphrases per target in context. [sent-76, score-0.239]

47 As a consequence, the number of gold paraphrases per target sentence varies. [sent-77, score-0.263]

48 potential paraphrases of T, the task is to predict which of the paraphrases are applicable in s. [sent-78, score-0.352]

49 , 2009) have performed this task by modifying the type vector for T to the context s and then comparing the resulting vector T′ to the type vector of a paraphrase candidate P. [sent-80, score-0.4]

50 In our exemplar setting, we select a contextually adequate subset of contexts in which T has been observed, using T′ = act(T, s) as a generalized representation of the meaning of target T in the context of s. [sent-81, score-0.693]

51 Previous approaches used all of P as a representation for a paraphrase candidate P. [sent-82, score-0.203]

52 However, P also includes irrelevant exemplars, while for a paraphrase to be judged as good, it is sufficient that one plausible reading exists. [sent-83, score-0.197]

53 We evaluate our model on predicting paraphrases from the Lexical Substitution (LexSub) dataset (McCarthy and Navigli, 2009). [sent-86, score-0.227]

54 This dataset consists of 2000 instances of 200 target words in sentential contexts, with paraphrases for each target word instance generated by up to 6 participants. [sent-87, score-0.384]

55 Following Erk and Padó (2008), we take the list of paraphrase candidates for a target as given (computed by pooling all paraphrases that LexSub annotators proposed for the target) and use the models to rank them for any given sentence context. [sent-90, score-0.486]

56 These vectors represent instances of a target word by the other words in the same sentence, lemmatized and POS-tagged, minus stop words. [sent-92, score-0.102]

57 E.g., if the lemma gnurge occurs twice in the BNC, once in the sentence “The dog will gnurge the other dog”, and once in “The old windows gnurged”, the exemplar set for gnurge contains the vectors [dog-n: 2, other-a: 1] and [old-a: 1, window-n: 1]. [sent-95, score-0.783]

58 For exemplar similarity, we use the standard Cosine similarity, and for the similarity of two exemplar sets, the Cosine of their centroids. [sent-96, score-0.998]
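A hedged sketch of this representation: bag-of-words count vectors per occurrence, Cosine between exemplars, and Cosine of centroids between exemplar sets. The vocabulary, the toy gnurge data, and all helper names are illustrative only; `cosine` is the helper sketched above.

```python
from collections import Counter
import numpy as np

def bow_vector(context_words, vocab):
    """Count vector over a fixed vocabulary (lemmatized, POS-tagged, stop words removed)."""
    counts = Counter(context_words)
    return np.array([counts[w] for w in vocab], dtype=float)

def centroid(exemplar_set):
    """Centroid of an exemplar set, used for set-to-set similarity."""
    return np.mean(np.stack(exemplar_set), axis=0)

def set_sim(E1, E2):
    """Similarity of two exemplar sets: Cosine of their centroids."""
    return cosine(centroid(E1), centroid(E2))

# Toy exemplar set for the invented lemma "gnurge" (two occurrences, as in the example above).
vocab = ["dog-n", "other-a", "old-a", "window-n"]
E_gnurge = [
    bow_vector(["dog-n", "dog-n", "other-a"], vocab),  # -> [2., 1., 0., 0.]
    bow_vector(["old-a", "window-n"], vocab),          # -> [0., 0., 1., 1.]
]
```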

59 The model’s prediction for an item is a list of paraphrases ranked by their predicted goodness of fit. [sent-98, score-0.167]

60 …, qm⟩ be the list of gold paraphrases with gold weights ⟨y1, …, ym⟩. [sent-104, score-0.215]

61 …, xn⟩ be the gold weights associated with the model predictions (assume xi = 0 if pi ∉ G), where G ⊆ P. [sent-114, score-0.077]

62 Let I(xi) = 1 if xi > 0 and 0 otherwise, and write x̄i = (1/i) Σ_{k=1}^{i} xk for the average gold weight of the first i model predictions, and analogously ȳi. [sent-117, score-0.093]

63 Then GAP(P, G) = (Σ_{i=1}^{n} I(xi) x̄i) / (Σ_{j=1}^{m} I(yj) ȳj). Since the model may rank multiple paraphrases the same, we average over 10 random permutations of equally ranked paraphrases. [sent-118, score-0.184]
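A small sketch of this GAP computation as reconstructed above (our own code and names, not the authors'; the caller supplies, for each ranked prediction, its gold weight or 0):

```python
def gap(pred_gold_weights, gold_weights):
    """Generalized average precision.

    pred_gold_weights: gold weight x_i of the i-th ranked model prediction (0 if not a gold paraphrase).
    gold_weights:      gold weights y_j of all gold paraphrases.
    """
    def weighted_avg_precision(weights):
        total, running = 0.0, 0.0
        for i, w in enumerate(weights, start=1):
            running += w
            if w > 0:                 # I(w_i)
                total += running / i  # average gold weight of the first i items
        return total

    numerator = weighted_avg_precision(pred_gold_weights)
    denominator = weighted_avg_precision(sorted(gold_weights, reverse=True))  # ideal ranking
    return numerator / denominator if denominator > 0 else 0.0
```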

64 We first computed two models that activate either the paraphrase or the target, but not both. [sent-121, score-0.252]

65 Model 1, actT, activates only the target, using the complete P as paraphrase, and ranking paraphrases by sim(P, act(T, s)). [sent-122, score-0.253]

66 Model 2, actP, activates only the paraphrase, using s as the target word, ranking by sim(act(P, s) , s). [sent-123, score-0.158]
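Putting the pieces together, the two model variants can be sketched as rankers over a dictionary of paraphrase-candidate exemplar sets, reusing the `cosine`, `activate_knn`, `activate_percentage`, `centroid`, and `set_sim` helpers sketched earlier (parameter defaults and names are illustrative, not the paper's exact setup):

```python
def rank_actT(T_exemplars, candidates, s, k=20):
    """actT: activate only the target; rank each full candidate set P by sim(P, act(T, s))."""
    T_act = activate_knn(T_exemplars, s, k)
    scores = {p: set_sim(P, T_act) for p, P in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rank_actP(candidates, s, pct=10):
    """actP: activate only the paraphrase; rank by sim(act(P, s), s)."""
    scores = {p: cosine(centroid(activate_percentage(P, s, pct)), s)
              for p, P in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```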

67 The results for these models are shown in Table 2, with both kNN and percentage activation: kNN activation with a parameter of 10 means that the 10 closest neighbors were activated, while percentage with a parameter of 10 means that the closest 10% of the exemplars were used. [sent-124, score-0.854]

68 6) corresponds to a prototype-based model that ranks paraphrase candidates by the distance between their type vectors and the target’s type vector. [sent-129, score-0.279]

69 Note also that both actT and actP show the best results for small values of the activation parameter. [sent-131, score-0.491]

70 This indicates paraphrases can be judged on the basis of a rather small number of exemplars. [sent-132, score-0.184]

71 For actT, a small absolute number of activated exemplars (here, 20) works best, while actP yields the best results for a small percentage of paraphrase exemplars. [sent-134, score-0.628]

72 Section 3): Activation of the paraphrase must allow a guess about whether there is a reasonable interpretation of P in the context s. [sent-136, score-0.229]

73 In contrast, target activation merely has to counteract the sparsity of s, and activation of too many exemplars from T leads to oversmoothing. [sent-138, score-1.318]

74 With the exception of actT/perc, all activation methods significantly outperform the best baseline (actP, no activation). [sent-151, score-0.511]

75 Based on these observations, we computed a third model, actTP, that activates both T (by kNN) and P (by percentage), ranking paraphrases by sim(act(P, s) , act(T, s)). [sent-152, score-0.253]
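The combined model then composes the two activations from the sketches above (again an illustrative sketch; the default parameters are only the small values reported as best for the individual models):

```python
def rank_actTP(T_exemplars, candidates, s, k=20, pct=10):
    """actTP: activate T by kNN and each P by percentage; rank by sim(act(P, s), act(T, s))."""
    T_act = activate_knn(T_exemplars, s, k)
    scores = {p: set_sim(activate_percentage(P, s, pct), T_act)
              for p, P in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```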

76 Table 2), namely by setting the activation parameters to small values. [sent-155, score-0.493]

77 When we fix the actP activation level, we find comparatively large performance differences between the T activation settings k=5 and k=10 (highly significant for 10% actP, and significant for 20% and 30% actP). [sent-160, score-1.007]

78 On the other hand, when we fix the actT activation level, changes in actP activation generally have an insignificant impact. [sent-161, score-0.964]

79 This indicates that at least in the current vector space the sparsity of s is less of a problem than the “dilution” of s that we face when representing the target word by exemplars of T close to s. [sent-163, score-0.419]

80 An analysis of the results by target part-of-speech showed that the globally optimal parameters also yield the best results for individual POS, even though there are substantial differences among POS. [sent-166, score-0.112]

81 For actT, the best results emerge for all POS with kNN activation with k between 10 and 30. [sent-167, score-0.491]

82 For actP, the best parameter for all POS was activation of 10%, with GAPs of 36. [sent-172, score-0.491]

83 9) are better than actP for verbs, but worse for nouns and adjectives, which indicates that the sparsity problem might be more prominent than for the other POS. [sent-179, score-0.061]

84 In all three models, we found a clear effect of target and paraphrase frequency, with deteriorating performance for the highest-frequency targets as well as for the lemmas with the highest average paraphrase frequency. [sent-180, score-0.432]

85 We have re-evaluated our exemplar models on the subsets we used in Erk and Padó (2008, EP08, 367 datapoints). [sent-183, score-0.518]

86 The results in Table 4 compare these models against our best previous exemplar models and show that our models outperform these models across the board. [sent-198, score-0.656]

87 5 Conclusions and Outlook This paper reports on work in progress on an exemplar activation model as an alternative to onevector-per-word approaches to word meaning in context. [sent-205, score-1.017]

88 Exemplar activation is very effective in handling polysemy, even with a very simple (and sparse) bag-of-words vector representation. [sent-206, score-0.517]

89 On both the EP08 and EP09 datasets, our models surpass more complex prototype-based approaches (Tab. [sent-207, score-0.065]

90 It is also noteworthy that the exemplar activation models work best when few exemplars are used, which bodes well for their efficiency. [sent-209, score-1.29]

91 We found that the best target representations re… (Footnote 3: Since our models had the advantage of being tuned on the dataset, we also report the range of results across the parameters we tested.) [sent-210, score-0.191]

92 Paraphrase representations are best activated with a percentage-based threshold. [sent-227, score-0.17]

93 Overall, we found that paraphrase activation had a much larger impact on performance than target activation, and that drawing on target exemplars other than s to represent the target meaning in context improved over using s itself only for verbs (Tab. [sent-228, score-1.266]

94 This suggests the possibility of considering T’s activated paraphrase candidates as the representation of T in the context s, rather than some vector of T itself, in the spirit of Kintsch (2001). [sent-230, score-0.418]

95 While it is encouraging that the best parameter settings involved the activation of only few exemplars, computation with exemplar models still requires the management of large numbers of vectors. [sent-231, score-1.009]

96 The computational overhead can be reduced by using data structures that cut down on the number of vector comparisons, or by decreasing vector dimensionality (Gorman and Curran, 2006). [sent-232, score-0.09]

97 , 2008), and we hope that they can be integrated in a more sophisticated exemplar model. [sent-236, score-0.485]

98 A structured vector space model for word meaning in context. [sent-268, score-0.105]

99 Paraphrase assessment in structured vector space: Exploring parameters and datasets. [sent-274, score-0.066]

100 Testing the distributional hypothesis: The influence of context on judgements of semantic similarity. [sent-333, score-0.126]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('exemplar', 0.485), ('activation', 0.472), ('actp', 0.35), ('exemplars', 0.281), ('actt', 0.221), ('paraphrase', 0.18), ('paraphrases', 0.167), ('lexsub', 0.148), ('erk', 0.119), ('activated', 0.105), ('pad', 0.104), ('thater', 0.097), ('knn', 0.097), ('lemma', 0.08), ('distributional', 0.077), ('acttp', 0.074), ('coach', 0.074), ('target', 0.072), ('mccarthy', 0.067), ('activates', 0.065), ('gap', 0.059), ('szpektor', 0.055), ('gnurge', 0.055), ('sim', 0.054), ('polysemy', 0.054), ('act', 0.054), ('context', 0.049), ('representations', 0.046), ('vector', 0.045), ('activating', 0.044), ('dataset', 0.043), ('meaning', 0.043), ('prototype', 0.042), ('landauer', 0.041), ('substitution', 0.04), ('activate', 0.039), ('datapoints', 0.037), ('spoon', 0.037), ('voorspoels', 0.037), ('navigli', 0.036), ('salton', 0.035), ('models', 0.033), ('surpass', 0.032), ('geometrical', 0.032), ('gorman', 0.032), ('metal', 0.032), ('mitchell', 0.032), ('vectors', 0.03), ('adjectives', 0.03), ('sentential', 0.03), ('bow', 0.03), ('xi', 0.029), ('similarity', 0.028), ('cogsci', 0.028), ('prototypes', 0.028), ('lapata', 0.027), ('dumais', 0.026), ('verbs', 0.025), ('daelemans', 0.025), ('cognitive', 0.025), ('pi', 0.024), ('baroni', 0.024), ('percentage', 0.024), ('gold', 0.024), ('sch', 0.023), ('dog', 0.023), ('tdhe', 0.023), ('florian', 0.023), ('representation', 0.023), ('tze', 0.022), ('concept', 0.021), ('parameters', 0.021), ('ranking', 0.021), ('cosine', 0.021), ('sparsity', 0.021), ('nouns', 0.021), ('contexts', 0.021), ('team', 0.02), ('neighbors', 0.02), ('dagan', 0.02), ('fix', 0.02), ('dynamically', 0.02), ('outperform', 0.02), ('predictions', 0.02), ('attractive', 0.019), ('prominent', 0.019), ('athens', 0.019), ('sense', 0.019), ('best', 0.019), ('applicable', 0.018), ('type', 0.018), ('datasets', 0.018), ('annotators', 0.018), ('model', 0.017), ('cooccurrence', 0.017), ('significance', 0.017), ('judged', 0.017), ('entailment', 0.016), ('categorization', 0.016), ('candidates', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

2 0.25045249 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

3 0.14977874 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

4 0.12410437 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outper- form state-of-the-art systems either quantitatively or statistically significantly.

5 0.075015724 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. 1

6 0.060120948 158 acl-2010-Latent Variable Models of Selectional Preference

7 0.054835286 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

8 0.053450249 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure

9 0.051790912 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

10 0.047781356 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

11 0.044843365 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

12 0.044314027 66 acl-2010-Compositional Matrix-Space Models of Language

13 0.04292395 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

14 0.042355515 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

15 0.041926712 165 acl-2010-Learning Script Knowledge with Web Experiments

16 0.040316645 127 acl-2010-Global Learning of Focused Entailment Graphs

17 0.039342597 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

18 0.039079584 238 acl-2010-Towards Open-Domain Semantic Role Labeling

19 0.036873635 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

20 0.036667246 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.113), (1, 0.047), (2, -0.017), (3, -0.0), (4, 0.093), (5, -0.002), (6, 0.056), (7, 0.017), (8, 0.014), (9, -0.028), (10, 0.016), (11, 0.047), (12, 0.14), (13, 0.048), (14, 0.008), (15, -0.016), (16, -0.024), (17, -0.085), (18, -0.087), (19, -0.004), (20, 0.24), (21, 0.166), (22, -0.019), (23, 0.029), (24, 0.111), (25, -0.151), (26, -0.064), (27, 0.026), (28, -0.103), (29, -0.161), (30, -0.085), (31, 0.026), (32, -0.046), (33, 0.028), (34, 0.059), (35, 0.056), (36, -0.102), (37, -0.088), (38, -0.015), (39, 0.041), (40, -0.063), (41, -0.093), (42, -0.119), (43, 0.121), (44, -0.046), (45, 0.058), (46, 0.141), (47, 0.097), (48, 0.07), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9359414 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vectorper-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

2 0.77847767 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

3 0.63250393 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

4 0.46855238 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outper- form state-of-the-art systems either quantitatively or statistically significantly.

5 0.41925028 66 acl-2010-Compositional Matrix-Space Models of Language

Author: Sebastian Rudolph ; Eugenie Giesbrecht

Abstract: We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that it is able to cover and combine various common compositional NLP approaches ranging from statistical word space models to symbolic grammar formalisms.

6 0.38361838 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures

7 0.36476412 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

8 0.35248876 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

9 0.3405754 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

10 0.32198128 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

11 0.29549539 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning

12 0.28504857 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

13 0.28257912 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure

14 0.28045571 238 acl-2010-Towards Open-Domain Semantic Role Labeling

15 0.26301336 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning

16 0.25423634 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices

17 0.24612837 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

18 0.24461465 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

19 0.23397614 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging

20 0.23152515 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.015), (23, 0.272), (25, 0.065), (35, 0.011), (42, 0.021), (44, 0.018), (59, 0.078), (64, 0.011), (71, 0.01), (72, 0.016), (73, 0.042), (78, 0.114), (83, 0.082), (84, 0.038), (98, 0.1)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.78530079 63 acl-2010-Comparable Entity Mining from Comparative Questions

Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li

Abstract: Comparing one thing with another is a typical part of human decision making process. However, it is not always easy to know what to compare and what are the alternatives. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves F1measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method. 1

same-paper 2 0.76019788 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vector-per-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

3 0.71876121 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

Author: Swati Tata ; Barbara Di Eugenio

Abstract: Music Recommendation Systems often recommend individual songs, as opposed to entire albums. The challenge is to generate reviews for each song, since only full album reviews are available on-line. We developed a summarizer that combines information extraction and generation techniques to produce summaries of reviews of individual songs. We present an intrinsic evaluation of the extraction components, and of the informativeness of the summaries; and a user study of the impact of the song review summaries on users’ decision making processes. Users were able to make quicker and more informed decisions when presented with the summary as compared to the full album review.

4 0.594028 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni

Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional classbased approaches, it produces humaninterpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-ofthe-art methods achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al. ’s system (Pantel et al., 2007).

5 0.59232634 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

6 0.58216596 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

7 0.57945186 158 acl-2010-Latent Variable Models of Selectional Preference

8 0.57271564 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

9 0.56768578 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

10 0.56580603 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

11 0.56329918 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

12 0.55990928 130 acl-2010-Hard Constraints for Grammatical Function Labelling

13 0.55847979 71 acl-2010-Convolution Kernel over Packed Parse Forest

14 0.55798101 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

15 0.55425799 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

16 0.55095929 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

17 0.54774475 248 acl-2010-Unsupervised Ontology Induction from Text

18 0.54754478 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

19 0.54732352 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

20 0.54701716 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition