emnlp emnlp2011 emnlp2011-107 knowledge-graph by maker-knowledge-mining

107 emnlp-2011-Probabilistic models of similarity in syntactic context

Source: pdf

Author: Diarmuid O Seaghdha ; Anna Korhonen

Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Probabilistic models of similarity in syntactic context Diarmuid Ó Séaghdha Computer Laboratory University of Cambridge United Kingdom do242@cl.cam.ac.uk [sent-1, score-0.248]

2 Abstract This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. [sent-4, score-0.273]

3 The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. [sent-5, score-0.118]

4 Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes. [sent-6, score-0.101]

5 1 Introduction Distributional models of lexical semantics, which assume that aspects of a word’s meaning can be related to the contexts in which that word is typically used, have a long history in Natural Language Processing (Spärck Jones, 1964; Harper, 1965). [sent-7, score-0.224]

6 Such models still constitute one of the most popular approaches to lexical semantics, with many proven applications. [sent-8, score-0.099]

7 Much work in distributional semantics treats words as non-contextualised units; the models that are constructed can answer questions such as “how similar are the words body and corpse? [sent-9, score-0.269]

8 ” but do not capture the way the syntactic context in which a word appears can affect its interpretation. [sent-10, score-0.102]

9 , 2011) have aimed to address compositionality of meaning in terms of distributional semantics, leading to new kinds of questions such as “how similar are the usages of the words body and corpse in the [sent-13, score-0.269]

10 ” and “how similar are the phrases the body deliberated the motion and the corpse rotted? [sent-21, score-0.187]

11 In this paper we focus on answering questions of the former type and investigate models that describe the effect of syntactic context on the meaning of a single word. [sent-23, score-0.197]

12 The work described in this paper uses probabilistic latent variable models to describe patterns of syntactic interaction, building on the selectional preference models of Ó Séaghdha (2010) and Ritter et al. [sent-24, score-0.351]

13 (2010) and the lexical substitution models of Dinu and Lapata (2010). [sent-25, score-0.219]

14 We propose novel methods for incorporating information about syntactic context in models of lexical choice, yielding a probabilistic analogue to dependency-based models of contextual similarity. [sent-26, score-0.256]

15 Our models attain state-of-the-art performance on two evaluation datasets: a set of sentence similarity judgements collected by Mitchell and Lapata (2008) and the dataset of the English Lexical Substitution Task (McCarthy and Navigli, 2009). [sent-27, score-0.229]

16 In view of the well-established effectiveness of dependency-based distributional semantics and of probabilistic frameworks for semantic inference, we expect that our approach will prove to be of value in a wide range of application settings. [sent-28, score-0.192]

17 2 Related work The literature on distributional semantics is vast; in this section we focus on outlining the research that is most directly related to capturing effects of context and compositionality. [sent-29, score-0.215]

18 Mitchell and Lapata investigate a number of simple methods for combining distributional word vectors, concluding that pointwise multiplication best corresponds to the effects of syntactic interaction. [sent-35, score-0.131]
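The contrast between additive and pointwise multiplicative composition can be illustrated with a minimal sketch. The vectors below are hypothetical co-occurrence counts invented for illustration, and the function names are not taken from the paper.

```python
def compose_additive(u, v):
    """Combine two distributional vectors by element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u, v):
    """Combine two distributional vectors by pointwise multiplication."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical counts over three context dimensions (illustrative only).
body = [4.0, 0.0, 2.0]
deliberate = [3.0, 1.0, 0.0]

added = compose_additive(body, deliberate)
multiplied = compose_multiplicative(body, deliberate)
print(added, multiplied)
```

Multiplication keeps only the dimensions on which both words have non-zero mass, which is one intuition for why it behaves like a contextual filter.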

19 Erk and Padó (2008) introduce the concept of a structured vector space in which each word is associated with a set of selectional preference vectors corresponding to different syntactic dependencies. [sent-36, score-0.106]

20 (2010) develop this geometric approach further using a space of second-order distributional vectors that represent the words typically co-occurring with the contexts in which a word typically appears. [sent-38, score-0.177]

21 The primary concern of these authors is to model the effect of context on word meaning; the work we present in this paper uses similar intuitions in a probabilistic modelling framework. [sent-39, score-0.224]

22 A parallel strand of research seeks to represent the meaning of larger compositional structures using matrix and tensor algebra (Smolensky, 1990; Rudolph and Giesbrecht, 2010; Baroni and Zamparelli, 2010; Grefenstette et al. [sent-40, score-0.114]

23 This nascent approach holds the promise of providing a much richer notion of context than is currently exploited in semantic applications. [sent-42, score-0.103]

24 Probabilistic latent variable frameworks for generalising about contextual behaviour (in the form of verb-noun selectional preferences) were proposed by Pereira et al. [sent-43, score-0.202]

25 Latent variable models are also conceptually similar to non-probabilistic dimensionality reduction techniques such as Latent Semantic Analysis (Landauer and Dumais, 1997). [sent-46, score-0.101]

26 ’s approach in a Bayesian framework using models related to Latent Dirichlet Allocation (Blei et al. [sent-49, score-0.055]

27 , 2003), demonstrating that this “topic modelling” architecture is a very good fit for capturing selectional preferences. [sent-50, score-0.067]

28 Reisinger and Mooney (2010) investigate nonparametric Bayesian models for teasing apart the context distributions of polysemous words. [sent-51, score-0.181]

29 As described in Section 3 below, Dinu and Lapata (2010) propose an LDA-based model for lexical substitution; the techniques presented in this paper can be viewed as a generalisation of theirs. [sent-52, score-0.109]

30 Topic models have also been applied to other classes of semantic task, for example word sense disambiguation (Li et al. [sent-53, score-0.134]

31 , 2010), word sense induction (Brody and Lapata, 2009) and modelling human judgements of semantic association (Griffiths et al. [sent-54, score-0.277]

32 1 Latent variable context models In this paper we consider generative models of lexical choice that assign a probability to a particular word appearing in a given linguistic context. [sent-57, score-0.263]

33 , 2003) is a powerful method for learning such models from a text corpus in an unsupervised way; LDA was originally applied to document modelling, but it has recently been shown to be very effective at inducing models for a variety of semantic tasks (see Section 2). [sent-61, score-0.15]

34 Given a set of contexts C in which an instance o appears (e. [sent-63, score-0.085]

35 The model described above (henceforth C → T) models the dependence of a target word on its context. [sent-67, score-0.067]

36 An alternative perspective is to model the dependence of a set of contexts on a target word, i. [sent-68, score-0.152]

37 A non-generative alternative is one that estimates the similarity of the latent variable distributions associated with seeing n and o in context C. [sent-72, score-0.318]

38 The principle that similarity between topic distributions corresponds to semantic similarity is well-known in document modelling and was proposed in the context of lexical substitution by Dinu and Lapata (2010). [sent-73, score-0.671]

39 In terms of the equations presented above, we could compare the distributions P(z|o, C) with P(z|n, C) using equations (5) or (16). [sent-74, score-0.063]
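Comparing two distributions over latent variables can be sketched as follows. The paper's own similarity measures (its equations (5) and (16)) are not reproduced in this summary, so the Bhattacharyya coefficient is used here purely as an assumed stand-in; the topic distributions are invented.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions.

    Returns 1.0 for identical distributions and 0.0 for distributions
    with disjoint support. This is a stand-in measure, not necessarily
    the one used in the paper.
    """
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical P(z | o, C) and P(z | n, C) over four latent classes.
p_z_original = [0.7, 0.1, 0.1, 0.1]
p_z_substitute = [0.6, 0.2, 0.1, 0.1]

similarity = bhattacharyya(p_z_original, p_z_substitute)
print(round(similarity, 3))
```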

40 In this paper we train LDA models of P(w|c) and P(c|w). [sent-78, score-0.055]

41 In the former case, the analogy to document modelling is that each context type plays the role of a “document” consisting of all the words observed in that context in a corpus; for P(c|w) the roles are reversed. [sent-79, score-0.274]
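In an LDA-style model where each context type acts as a "document", the probability of a word given a context is a mixture over latent classes, P(w|c) = Σ_z P(w|z) P(z|c). A minimal sketch under that assumption follows; the probability tables are invented for illustration and the function name is hypothetical.

```python
def word_given_context(word, context, p_w_given_z, p_z_given_c):
    """Compute P(w | c) = sum_z P(w | z) * P(z | c) from lookup tables."""
    return sum(
        p_z * p_w_given_z[z].get(word, 0.0)
        for z, p_z in p_z_given_c[context].items()
    )


# Hypothetical tables with two latent classes (illustrative values only).
p_z_given_c = {"decide:v:ncsubj:n": {0: 0.8, 1: 0.2}}
p_w_given_z = {
    0: {"body": 0.3, "committee": 0.7},
    1: {"body": 0.5, "corpse": 0.5},
}

p_body = word_given_context("body", "decide:v:ncsubj:n", p_w_given_z, p_z_given_c)
p_corpse = word_given_context("corpse", "decide:v:ncsubj:n", p_w_given_z, p_z_given_c)
print(p_body, p_corpse)
```

Swapping the roles of words and contexts in the two tables gives the corresponding sketch for P(c|w).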

42 The empirical estimates for distributions over words and latent variables are derived from the assignment of topics over the training corpus in a single sampling state. [sent-82, score-0.152]
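Deriving estimates from a single sampling state can be sketched as below, assuming the usual smoothed count ratios for collapsed Gibbs sampling in LDA. The hyperparameter values `alpha` and `beta`, the triple format and the function name are assumptions for illustration; the paper's exact estimators are not shown in this summary.

```python
from collections import Counter


def estimate_from_state(assignments, num_topics, vocab, alpha=0.1, beta=0.01):
    """Estimate P(w | z) and P(z | c) from one state of topic assignments.

    assignments: list of (context, word, topic) triples.
    alpha, beta: assumed symmetric Dirichlet hyperparameters.
    """
    n_zw = Counter((z, w) for _, w, z in assignments)
    n_z = Counter(z for _, _, z in assignments)
    n_cz = Counter((c, z) for c, _, z in assignments)
    n_c = Counter(c for c, _, _ in assignments)
    vocab_size = len(vocab)

    p_w_given_z = {
        z: {w: (n_zw[(z, w)] + beta) / (n_z[z] + vocab_size * beta) for w in vocab}
        for z in range(num_topics)
    }
    p_z_given_c = {
        c: {
            z: (n_cz[(c, z)] + alpha) / (n_c[c] + num_topics * alpha)
            for z in range(num_topics)
        }
        for c in n_c
    }
    return p_w_given_z, p_z_given_c


# Hypothetical sampling state with two contexts and two latent classes.
state = [("c1", "body", 0), ("c1", "corpse", 1), ("c2", "body", 0)]
p_w_given_z, p_z_given_c = estimate_from_state(state, 2, ["body", "corpse"])
print(p_z_given_c["c1"])
```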

43 2 Context types We have not yet defined what the contexts c look like. [sent-89, score-0.085]

44 In vector space models of semantics it is common to distinguish between window-based and dependency-based models (Padó and Lapata, 2007); one can make the same distinction for probabilistic context models. [sent-90, score-0.233]

45 A broad generalisation is that window-based models capture semantic association (e. [sent-91, score-0.16]

46 referee is associated with football), while dependency models capture a finer-grained notion of similarity (referee is similar to umpire but not to football). [sent-93, score-0.162]

47 Dinu and Lapata (2010) propose a window-based model of lexical substitution; the set of contexts in which a word appears is the set of surrounding words within a prespecified “window size”. [sent-94, score-0.129]
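A window-based context set can be sketched in a few lines. The tokenisation, the example sentence and the function name are illustrative assumptions; a window size of five on each side matches the setting mentioned later in the paper.

```python
def window_contexts(tokens, index, window_size=5):
    """Return the words within window_size positions of tokens[index]."""
    left = tokens[max(0, index - window_size):index]
    right = tokens[index + 1:index + 1 + window_size]
    return left + right


tokens = "the executive body decided to postpone the vote".split()
contexts = window_contexts(tokens, tokens.index("body"), window_size=2)
print(contexts)
```

With a window of five, a target in the middle of a long sentence receives up to ten context words, which is why window-based counts are much denser than dependency-based ones.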

48 In this paper we also investigate dependency-based context sets derived from syntactic structure. [sent-95, score-0.102]

49 the set C of dependency contexts for the noun body is {executive:j:ncmod⁻¹:n, decide:v:ncsubj:n}, where ncmod⁻¹ denotes that body stands in an inverse non-clausal modifier relation to executive (we assume that nouns are the heads of their adjectival modifiers). [sent-105, score-0.268]
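Building such a context set from parsed dependencies might look like the following sketch. The tuple layout and function name are assumptions for illustration, and the inverse relation is written with an ASCII `-1` suffix; the label format follows the example in the text.

```python
def dependency_contexts(target, dependencies):
    """Collect dependency contexts for a target lemma.

    dependencies: (head, head_pos, relation, dependent, dependent_pos) tuples.
    A target in dependent position yields head:head_pos:rel:dep_pos;
    a target in head position yields the inverse relation, marked "-1".
    """
    contexts = set()
    for head, head_pos, rel, dep, dep_pos in dependencies:
        if dep == target:
            contexts.add(f"{head}:{head_pos}:{rel}:{dep_pos}")
        if head == target:
            contexts.add(f"{dep}:{dep_pos}:{rel}-1:{head_pos}")
    return contexts


# Hypothetical parse fragment for a phrase like "the executive body decided".
deps = [
    ("decide", "v", "ncsubj", "body", "n"),
    ("body", "n", "ncmod", "executive", "j"),
]
print(sorted(dependency_contexts("body", deps)))
```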

50 1 Data Mitchell and Lapata (2008) collected human judgements of semantic similarity for pairs of short sentences, where the sentences in a pair share the same subject but different verbs. [sent-107, score-0.173]

51 For example, the sales slumped and the sales declined should be judged as very similar while the shoulders slumped and the shoulders declined should be judged as less similar. [sent-108, score-0.372]

52 Both Mitchell and Lapata and Erk and Padó (2008) split the data into a development portion and a test portion, the development portion consisting of the judgements of six annotators; in order to compare our results with previous research we use the same data split. [sent-112, score-0.076]

53 To evaluate performance, the predictions made by a model are compared to the judgements of each annotator in turn (using ρ) and the resulting per-annotator ρ values are averaged. [sent-113, score-0.076]
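The per-annotator evaluation can be sketched as follows, assuming ρ denotes Spearman's rank correlation computed with average ranks for ties. The scores and ratings are invented, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_annotator_rho(predictions, ratings_by_annotator):
    """Average the per-annotator correlations with the model predictions."""
    rhos = [spearman(predictions, ratings) for ratings in ratings_by_annotator]
    return sum(rhos) / len(rhos)


# Hypothetical model scores and two annotators' ratings for four items.
predictions = [0.1, 0.4, 0.35, 0.8]
ratings = [[1, 3, 2, 7], [7, 2, 3, 1]]
print(mean_annotator_rho(predictions, ratings))
```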

54 The BNC was also used by Mitchell and Lapata (2008) and Erk and Padó (2008); as the ML08 dataset was compiled using words appearing more than 50 times in the BNC, there are no coverage problems caused by data sparsity. [sent-117, score-0.122]

55 We trained LDA models for the grammatical relations v:ncsubj:n and n:ncsubj⁻¹:v and used these to create predictors of type C → T and T → C, respectively. [Table 1: Performance (average ρ) on the ML08 test set] [sent-118, score-0.195]

56 For each predictor, we trained five runs with 100 topics for 1000 iterations and averaged the predictions produced from their final states. [sent-119, score-0.064]

57 We investigate both the generative paraphrasing model (PARA) and the method of comparing topic distributions (SIM). [sent-120, score-0.164]

58 For both PARA and SIM we present results using each predictor type on its own as well as a combination of both types (T ↔ C); for PARA the contributions of the types are multiplied and for SIM they are averaged. [sent-121, score-0.199]
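The two combination rules can be written as a short sketch. The function names and the example scores are hypothetical; only the choice of multiplying for PARA and averaging for SIM comes from the text.

```python
def combine_para(score_c_to_t, score_t_to_c):
    """Combine PARA predictor outputs by multiplication."""
    return score_c_to_t * score_t_to_c


def combine_sim(score_c_to_t, score_t_to_c):
    """Combine SIM predictor outputs by averaging."""
    return (score_c_to_t + score_t_to_c) / 2


# Hypothetical predictor outputs for one item.
para_combined = combine_para(0.5, 0.25)
sim_combined = combine_sim(0.5, 0.25)
print(para_combined, sim_combined)
```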

59 This is done by simply evaluating every possible subset of 1–5 runs on the development data and picking the best-scoring subset. [sent-127, score-0.064]
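Exhaustive selection over run subsets can be sketched with `itertools.combinations`. The scoring callback `dev_score` is a hypothetical placeholder for whatever development-set metric is used, and the toy predictions are invented.

```python
from itertools import combinations


def best_run_subset(run_predictions, dev_score):
    """Try every non-empty subset of runs and keep the best-scoring one.

    run_predictions: one list of per-item predictions for each run.
    dev_score: callable mapping averaged predictions to a score
               (higher is better); a placeholder for the real metric.
    """
    num_runs = len(run_predictions)
    num_items = len(run_predictions[0])
    best_subset, best_score = None, float("-inf")
    for size in range(1, num_runs + 1):
        for subset in combinations(range(num_runs), size):
            averaged = [
                sum(run_predictions[r][i] for r in subset) / size
                for i in range(num_items)
            ]
            score = dev_score(averaged)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Toy example: score is the negative squared error against invented gold values.
gold = [0.5, 0.5]
runs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score_fn = lambda preds: -sum((p - g) ** 2 for p, g in zip(preds, gold))
subset, score = best_run_subset(runs, score_fn)
print(subset, score)
```

With five runs there are only 31 non-empty subsets, so brute-force enumeration is cheap.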

60 3 Results Table 1 presents the results of the PARA and SIM predictors on the ML08 dataset. [sent-129, score-0.14]

61 The best results [footnote 3: This configuration seems the most intuitive; averaging PARA predictors and multiplying SIM also give good results.] [sent-130, score-0.196]

62 27 for their structured vector space (SVS) syntactic disambiguation method. [sent-133, score-0.078]

63 Even without using the development set to select models, performance is well above the previous state of the art for all predictors except PARAC→T. [sent-134, score-0.14]

64 In all cases the T → C predictors outperform C → T: models that associate target words with distributions over context clusters are superior to those that associate contexts with distributions over target words. [sent-138, score-0.318]

65 Figure 1 plots the beneficial effect of averaging over multiple runs; as the number of runs n is increased, the average performance over all combinations of n predictors chosen from the set of five T → C and five C → T runs is observed to increase monotonically. [sent-139, score-0.324]

66 1 Data The English Lexical Substitution task, run as part of the SemEval-1 competition, required participants to propose good substitutes for a set of target words in various sentential contexts (McCarthy and Navigli, 2009). [sent-142, score-0.248]

67 Table 2 shows two example sentences and the substitutes appearing in the gold standard, ranked by the number of human annotators who proposed each substitute. [sent-143, score-0.096]

68 The dataset contains a total of 2,010 annotated sentences with 205 distinct target words across four parts of speech (noun, verb, adjective, adverb). [sent-144, score-0.108]

69 In line with previous work on contextual disambiguation, we focus here on the subtask of ranking attested substitutes rather than proposing them from an unrestricted vocabulary. [sent-145, score-0.139]

70 To this end, a candidate set is constructed for each target word from all the substitutes proposed for that word in all sentences in the dataset. [sent-146, score-0.163]
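Pooling substitutes into per-target candidate sets can be sketched as below. The annotation format and the example data are assumptions for illustration, not the actual dataset layout.

```python
from collections import defaultdict


def build_candidate_sets(annotations):
    """Pool all substitutes proposed for each target across its sentences.

    annotations: iterable of (target, substitutes) pairs, one per sentence.
    """
    candidates = defaultdict(set)
    for target, substitutes in annotations:
        candidates[target].update(substitutes)
    return dict(candidates)


# Hypothetical gold annotations for three sentences.
annotations = [
    ("bright", ["intelligent", "clever"]),
    ("bright", ["shining", "clever"]),
    ("body", ["organisation"]),
]
candidate_sets = build_candidate_sets(annotations)
print(candidate_sets["bright"])
```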

71 The data contains a number of multiword paraphrases such as rush at; as our models (like most No. [sent-147, score-0.178]

72 Erk and Padó (2008) use only a subset of the data where the target is a noun headed by a verb or a verb heading a noun. [sent-157, score-0.067]

73 (2010) and Dinu and Lapata (2010) similarly remove multiword paraphrases (Georgiana Dinu, p. [sent-160, score-0.123]

74 (2010) discard sentences which their parser cannot parse and paraphrases absent from their training corpus and then optimise the parameters of their model through four-fold cross-validation. [sent-163, score-0.069]

75 Here we aim for complete coverage on the dataset and do not perform any parameter tuning. [sent-164, score-0.086]

76 1 to the Lexical Substitution Task dataset using dependency- and window-based context information. [sent-197, score-0.104]

77 We compare two classes of context models: models learned from window-based contexts and models learned from syntactic dependency contexts. [sent-212, score-0.297]

78 For the syntactic models we extracted all dependencies and inverse dependencies between lemmas of the aforementioned POS types; in order to maximise the extraction yield, the dependency graph for each sentence was preprocessed using the transformations shown in Table 3. [sent-213, score-0.094]

79 For the window-based context model we follow Dinu and Lapata (2010) in treating each word within five words of a target as a member of its context set. [sent-214, score-0.193]

80 It proved necessary to subsample the corpora in order to make LDA training tractable, especially for the window-based model where the training set of context-target counts is extremely dense (each instance of a word in the corpus contributes up to 10 context instances). [sent-215, score-0.117]

81 As the dependency data is an order of magnitude smaller we downsampled the Wikipedia counts by 5 and left the BNC counts untouched. [sent-218, score-0.158]

82 We trained three LDA predictors for each corpus: a window-based predictor (W5), a Context → Target predictor (C → T) and a Target → Context predictor (T → C). [sent-221, score-0.466]

83 Each individual prediction of similarity between P(z|C, o) and P(z|n) is made by averaging over the predictions of all runs and over all settings of Z. [sent-227, score-0.177]

84 t Choosing a single setting of Z does not degrade performance significantly; however, averaging over settings is a convenient way to avoid having to pick a specific value. [sent-228, score-0.056]

85 We also investigate combinations of predictor types, once again produced by averaging: we combine C → T with T → C (T ↔ C) and combine each of these three models with W5. [sent-229, score-0.163]

86 3 Results Table 5 presents the results attained by our models on the Lexical Substitution Task data. [sent-231, score-0.113]

87 The dependency-based models have imperfect coverage (86% of the data); they can make no prediction when no syntactic context is provided for a target, perhaps as a result of parsing error. [sent-232, score-0.202]

88 21) on the BNC corpus, but the best results are attained by W5 + T ↔ C trained on the combined corpus (GAP= 49. [sent-239, score-0.058]
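The GAP figures can be read with the following sketch, assuming GAP denotes generalised average precision in the form commonly used for ranking weighted substitutes: each correctly ranked item contributes the running mean of gold weights up to its rank, normalised by the same quantity for an ideal ranking. The candidate names and weights are invented, and the function name is hypothetical.

```python
def generalised_average_precision(ranked_candidates, gold_weights):
    """GAP of a ranking against gold substitute weights (assumed definition).

    ranked_candidates: candidates in system order, best first.
    gold_weights: dict mapping substitutes to positive annotator counts.
    """
    def weighted_sum(weights):
        total, running = 0.0, 0.0
        for rank, weight in enumerate(weights, start=1):
            running += weight
            if weight > 0:
                total += running / rank
        return total

    system = [gold_weights.get(c, 0) for c in ranked_candidates]
    ideal = sorted(gold_weights.values(), reverse=True)
    denominator = weighted_sum(ideal)
    return weighted_sum(system) / denominator if denominator else 0.0


# Hypothetical gold weights: "a" proposed by 3 annotators, "b" by 1.
gold = {"a": 3, "b": 1}
gap_swapped = generalised_average_precision(["b", "a", "c"], gold)
gap_perfect = generalised_average_precision(["a", "b", "c"], gold)
print(gap_swapped, gap_perfect)
```

Scores on this scale lie between 0 and 1; multiplying by 100 gives percentage-style figures like those reported in the text.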

89 The results for the W5 model trained on BNC data are comparable to those for the model trained on the combined corpus; however the syntactic models show a clear benefit from the less sparse dependency data in the combined training corpus. [sent-242, score-0.094]

90 Table 6 gives a breakdown of performance by target part of speech for the BNC+Wikipedia-trained W5 and W5 + T ↔ C models, as well as figures provided by previous researchers. [Table 6: Performance by part of speech] [sent-252, score-0.067]

91 (2010) were attained on a slightly smaller dataset with parameters set through cross-validation. [sent-255, score-0.099]

92 The results for W5 + T ↔ C outperform all of Dinu and Lapata’s per-POS and overall results except for a slightly superior score on adverbs attained by their NMF model (τb = 0. [sent-256, score-0.058]

93 On balance, we suggest that our models do have an advantage over the current state of the art for lexical substitution. [sent-265, score-0.099]

94 6 Conclusion In this paper we have proposed novel methods for modelling the effect of context on lexical meaning, demonstrating that information about syntactic context and textual proximity can fruitfully be integrated to produce state-of-the-art models of lexical choice. [sent-266, score-0.469]

95 We have demonstrated the effectiveness of our techniques on two datasets but they are potentially applicable to a range of applications where semantic disambiguation is required. [sent-267, score-0.079]

96 Concrete sentence spaces for compositional distributional models of meaning. [sent-315, score-0.185]

97 A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. [sent-343, score-0.129]

98 Topic models for word sense disambiguation and token-based idiom detection. [sent-347, score-0.094]

99 A latent Dirichlet allocation method for selectional preferences. [sent-385, score-0.191]

100 Efficient methods for topic model inference on streaming document collections. [sent-420, score-0.066]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dinu', 0.367), ('lapata', 0.271), ('bnc', 0.254), ('xz', 0.194), ('para', 0.194), ('predictor', 0.163), ('modelling', 0.161), ('predictors', 0.14), ('ncsubj', 0.137), ('thater', 0.137), ('pz', 0.129), ('sim', 0.122), ('substitution', 0.12), ('erk', 0.112), ('pad', 0.111), ('mitchell', 0.1), ('substitutes', 0.096), ('lda', 0.095), ('distributional', 0.092), ('po', 0.092), ('latent', 0.089), ('ncmod', 0.086), ('zo', 0.086), ('contexts', 0.085), ('judgements', 0.076), ('xp', 0.076), ('corpse', 0.075), ('paraphrases', 0.069), ('eaghdha', 0.068), ('target', 0.067), ('selectional', 0.067), ('blei', 0.067), ('topic', 0.066), ('generalisation', 0.065), ('rooth', 0.065), ('oo', 0.064), ('runs', 0.064), ('gap', 0.064), ('distributions', 0.063), ('context', 0.063), ('body', 0.062), ('semantics', 0.06), ('executive', 0.059), ('reisinger', 0.059), ('rasp', 0.059), ('attained', 0.058), ('similarity', 0.057), ('averaging', 0.056), ('models', 0.055), ('uppsala', 0.055), ('multiword', 0.054), ('ritter', 0.054), ('counts', 0.054), ('wikipedia', 0.051), ('grefenstette', 0.051), ('deliberated', 0.05), ('downsampled', 0.05), ('noft', 0.05), ('nulty', 0.05), ('referee', 0.05), ('shoulders', 0.05), ('slumped', 0.05), ('yc', 0.05), ('variable', 0.046), ('mirella', 0.046), ('cp', 0.046), ('coverage', 0.045), ('lexical', 0.044), ('mccarthy', 0.044), ('kingdom', 0.043), ('remarked', 0.043), ('georgiana', 0.043), ('declined', 0.043), ('attested', 0.043), ('fz', 0.043), ('sales', 0.043), ('dirichlet', 0.041), ('dataset', 0.041), ('anna', 0.041), ('semantic', 0.04), ('meaning', 0.04), ('disambiguation', 0.039), ('fier', 0.039), ('dei', 0.039), ('rudolph', 0.039), ('cz', 0.039), ('syntactic', 0.039), ('compositional', 0.038), ('pt', 0.038), ('dogs', 0.036), ('tensor', 0.036), ('generalised', 0.036), ('compiled', 0.036), ('cso', 0.036), ('kendall', 0.036), ('aes', 0.036), ('football', 0.036), ('paraphrasing', 0.035), ('allocation', 0.035), ('cam', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000017 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

2 0.33207965 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen

Abstract: This paper presents a novel method for the computation of word meaning in context. We make use of a factorization model in which words, together with their window-based context words and their dependency relations, are linked to latent dimensions. The factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. The evaluation on a lexical substitution task, carried out for both English and French, indicates that our approach is able to reach better results than state-of-the-art methods in lexical substitution, while at the same time providing more accurate meaning representations.

3 0.20147058 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

Author: Edward Grefenstette ; Mehrnoosh Sadrzadeh

Abstract: Modelling compositional meaning for sentences using empirical distributional methods has been a challenge for computational linguists. We implement the abstract categorical model of Coecke et al. (2010) using data from the BNC and evaluate it. The implementation is based on unsupervised learning of matrices for relational words and applying them to the vectors of their arguments. The evaluation is based on the word disambiguation task developed by Mitchell and Lapata (2008) for intransitive sentences, and on a similar new experiment designed for transitive sentences. Our model matches the results of its competitors in the first experiment, and betters them in the second. The general improvement in results with increase in syntactic complexity showcases the compositional power of our model.

4 0.13298477 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

Abstract: This paper introduces an attribute selection task as a way to characterize the inherent meaning of property-denoting adjectives in adjective-noun phrases, such as e.g. hot in hot summer denoting the attribute TEMPERATURE, rather than TASTE. We formulate this task in a vector space model that represents adjectives and nouns as vectors in a semantic space defined over possible attributes. The vectors incorporate latent semantic information obtained from two variants of LDA topic models. Our LDA models outperform previous approaches on a small set of 10 attributes with considerable gains on sparse representations, which highlights the strong smoothing power of LDA models. For the first time, we extend the attribute selection task to a new data set with more than 200 classes. We observe that large-scale attribute selection is a hard problem, but a subset of attributes performs robustly on the large scale as well. Again, the LDA models outperform the VSM baseline.

5 0.11888644 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

Author: Weiwei Guo ; Mona Diab

Abstract: In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting dictionary definitions explicitly in our model yields a better understanding of word semantics leading to better text modeling. We exploit WordNet as a lexical resource for sense definitions. We show that explicitly modeling word definitions helps improve performance significantly over the baseline for a text categorization task.

6 0.099794537 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

7 0.096973576 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models

8 0.089483164 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

9 0.089411214 21 emnlp-2011-Bayesian Checking for Topic Models

10 0.087776281 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

11 0.07305856 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

12 0.072938874 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

13 0.072721891 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing

14 0.071096987 2 emnlp-2011-A Cascaded Classification Approach to Semantic Head Recognition

15 0.067668676 128 emnlp-2011-Structured Relation Discovery using Generative Models

16 0.064272054 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

17 0.057686031 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

18 0.057309851 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

19 0.057242889 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees

20 0.056739155 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.24), (1, -0.096), (2, -0.185), (3, -0.143), (4, 0.027), (5, 0.142), (6, -0.086), (7, 0.188), (8, 0.164), (9, 0.096), (10, -0.057), (11, -0.057), (12, 0.213), (13, -0.163), (14, -0.187), (15, 0.011), (16, 0.126), (17, 0.245), (18, 0.032), (19, 0.056), (20, 0.076), (21, -0.149), (22, 0.012), (23, -0.02), (24, 0.018), (25, -0.033), (26, -0.047), (27, 0.007), (28, -0.138), (29, 0.058), (30, 0.171), (31, -0.079), (32, 0.026), (33, 0.026), (34, 0.003), (35, 0.123), (36, 0.047), (37, -0.026), (38, -0.031), (39, -0.012), (40, -0.044), (41, -0.044), (42, -0.111), (43, 0.003), (44, -0.003), (45, -0.05), (46, -0.04), (47, 0.069), (48, 0.0), (49, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9435972 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

2 0.90612763 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen

3 0.7776925 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

Author: Edward Grefenstette ; Mehrnoosh Sadrzadeh

4 0.48070619 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

Author: Matthias Hartung ; Anette Frank

5 0.45064706 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.

6 0.39233783 2 emnlp-2011-A Cascaded Classification Approach to Semantic Head Recognition

7 0.3785679 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

8 0.33952823 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

9 0.32602805 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices

10 0.30016583 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP

11 0.29483926 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

12 0.29426816 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

13 0.28995529 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

14 0.28879917 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference

15 0.28728011 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis

16 0.28571385 91 emnlp-2011-Literal and Metaphorical Sense Identification through Concrete and Abstract Context

17 0.28419673 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing

18 0.26032823 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

19 0.2587868 55 emnlp-2011-Exploiting Syntactic and Distributional Information for Spelling Correction with Web-Scale N-gram Models

20 0.25532934 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.088), (36, 0.024), (37, 0.023), (45, 0.093), (53, 0.022), (54, 0.033), (57, 0.017), (62, 0.014), (64, 0.022), (66, 0.122), (69, 0.017), (79, 0.044), (82, 0.29), (87, 0.011), (90, 0.02), (92, 0.017), (94, 0.011), (96, 0.046), (98, 0.019)]
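
The listing does not say how topic vectors are compared, so the following is only a minimal sketch under an assumed measure: cosine similarity over sparse (topicId, topicWeight) pairs, with unlisted topics treated as weight 0. The names `paper_topics` and `cosine` are hypothetical, and the sketch is not expected to reproduce the simValue scores shown on this page.

```python
import math

# Sparse (topicId, topicWeight) pairs copied from the listing above.
paper_topics = [
    (23, 0.088), (36, 0.024), (37, 0.023), (45, 0.093), (53, 0.022),
    (54, 0.033), (57, 0.017), (62, 0.014), (64, 0.022), (66, 0.122),
    (69, 0.017), (79, 0.044), (82, 0.29), (87, 0.011), (90, 0.02),
    (92, 0.017), (94, 0.011), (96, 0.046), (98, 0.019),
]


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors.

    Each vector is a list of (topicId, weight) pairs; topics that are
    missing from a vector are treated as having weight 0 (an assumption).
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# The topic carrying the largest weight for this paper.
dominant_topic = max(paper_topics, key=lambda pair: pair[1])
```

Comparing `paper_topics` against the vector of another paper with `cosine` would give one candidate ranking score; any other measure over topic distributions could be substituted in the same place.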

similar papers list:

simIndex simValue paperId paperTitle

1 0.91239095 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

Author: Shay B. Cohen ; Carlos Gomez-Rodriguez ; Giorgio Satta

Abstract: We describe a generative model for non-projective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an inside-outside algorithm that can be used for unsupervised learning of non-projective dependency trees.

2 0.9122535 129 emnlp-2011-Structured Sparsity in Structured Prediction

Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar

Abstract: Linear models have enjoyed great success in structured prediction in NLP. While a lot of progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1-regularization; both ignore the structure of the feature space, preventing practitioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.

same-paper 3 0.83401817 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

4 0.59094453 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder

Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.

5 0.58162957 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

Author: Edward Grefenstette ; Mehrnoosh Sadrzadeh

6 0.57747132 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

7 0.5703389 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

8 0.56607187 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

9 0.54684591 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

10 0.53991336 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

11 0.53588355 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

12 0.53541338 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

13 0.53216588 116 emnlp-2011-Robust Disambiguation of Named Entities in Text

14 0.53194636 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

15 0.53093964 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing

16 0.52743077 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge

17 0.51802433 77 emnlp-2011-Large-Scale Cognate Recovery

18 0.51784992 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

19 0.5175302 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars

20 0.51530552 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models