acl acl2012 acl2012-132 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. [sent-3, score-0.59]
2 We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. [sent-4, score-0.82]
3 Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems. [sent-5, score-0.107]
4 1 Introduction To date, many unsupervised WSD systems rely on a sense similarity module that returns a similarity score given two senses. [sent-6, score-0.518]
5 Many similarity measures use the taxonomy structure of WordNet [WN] (Fellbaum, 1998), which allows only noun-noun and verb-verb pair similarity computation since the other parts of speech (adjectives and adverbs) do not have a taxonomic representation structure. [sent-7, score-0.235]
6 For example, the jcn similarity measure (Jiang and Conrath, 1997) computes the sense pair similarity score based on the information content of three senses: the two senses and their least common subsumer in the noun/verb hierarchy. [sent-8, score-0.924]
7 The most popular sense similarity measure is the Extended Lesk [elesk] measure (Banerjee and Pedersen, 2003). [sent-9, score-0.426]
8 In elesk, the similarity score is computed based on the length of overlapping words/phrases between two extended dictionary definitions. [sent-10, score-0.164]
9 The definitions are extended by definitions of neighbor senses to discover more overlapping words. [sent-11, score-0.633]
10 Consider the definitions of bank#n#1 and stock#n#1, e.g., stock#n#1: the capital raised by a corporation through the issue of shares entitling holders to an ownership interest (equity). Despite the high semantic relatedness of the two senses, the overlapping words in the two definitions are only a, the, leading to a very low similarity score. [sent-15, score-0.425]
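To make the sparsity problem concrete, here is a minimal sketch of definition-overlap scoring in the spirit of elesk. The actual measure (Banerjee and Pedersen, 2003) scores maximal overlapping phrases by squared length over extended glosses; this simplified token-count version is enough to reproduce the failure above. The two glosses are reconstructed from WordNet, as assumed by the example.

```python
# Simplified, illustrative overlap scoring between two sense definitions.
# The real elesk scores overlapping *phrases* (squared length) over
# *extended* glosses; counting shared tokens suffices to show sparsity.
def overlap_score(def_a, def_b):
    tokens_a = set(def_a.lower().split())
    tokens_b = set(def_b.lower().split())
    return len(tokens_a & tokens_b)

bank = ("a financial institution that accepts deposits and channels "
        "the money into lending activities")
stock = ("the capital raised by a corporation through the issue of shares "
         "entitling holders to an ownership interest (equity)")

print(overlap_score(bank, stock))  # 2: only {a, the} are shared
```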
11 Accordingly we are interested in extracting latent semantics from sense definitions to improve elesk. [sent-16, score-0.671]
12 However, the challenge lies in that sense definitions are typically too short/sparse for latent variable models to learn accurate semantics, since these models are designed for long documents. [sent-17, score-0.572]
13 For example, topic models such as LDA (Blei et al. [sent-18, score-0.036]
14 , 2003), can only find the dominant topic based on the observed words in a definition (financial topic in bank#n#1 and stock#n#1) without further discernibility. [sent-19, score-0.184]
15 In this case, many senses will share the same latent semantics profile, as long as they are in the same topic/domain. [sent-20, score-0.411]
16 To solve the sparsity issue we use missing words as negative evidence of latent semantics, as in (Guo and Diab, 2012). [sent-21, score-0.342]
17 We define missing words of a sense definition as the whole vocabulary in a corpus minus the observed words in the sense definition. [sent-22, score-0.777]
18 Since observed words in definitions are too few to reveal the semantics of senses, missing words can be used to tell the model what the definition is not about. [sent-23, score-0.535]
19 Therefore, we want to find a latent semantics profile that is related to observed words in a definition, but also not related to missing words, so that the induced latent semantics is unique for the sense. [sent-24, score-0.706]
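A minimal sketch of the observed/missing split just described; the vocabulary and definition tokens here are hypothetical stand-ins for the real corpus.

```python
# Missing words of a sense definition = corpus vocabulary minus the words
# observed in that definition (all names here are toy examples).
vocab = {"financial", "institution", "deposit", "lend",
         "sport", "goal", "gene", "cell"}
definition_tokens = {"financial", "institution", "deposit", "lend"}

observed = definition_tokens & vocab
missing = vocab - observed            # negative evidence for the latent profile
print(sorted(missing))                # ['cell', 'gene', 'goal', 'sport']
```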
20 Finally, we also show how to use WN neighbor sense definitions to construct a nuanced sense similarity measure, wmfvec, based on the inferred latent semantic vectors of senses. [sent-25, score-1.149]
21 We show that wmfvec outperforms elesk and LDA based approaches in four All-words WSD data sets. [sent-26, score-0.991]
22 To the best of our knowledge, wmfvec is the first sense similarity measure based on latent semantics of sense definitions. [sent-27, score-1.43]
23 [Table 1: Three possible hypotheses of latent vectors (dimensions: financial, sport, institution) for the definition of bank#n#1; from the discussion below, Ro(v1) = 20, Rm(v1) = 600, Ro(v2) = 18, Rm(v2) = 300; the remaining cell values were lost in extraction] 2 Learning Latent Semantics of Definitions
24 2.1 Intuition Given only a few observed words in a definition, there are many hypotheses of latent vectors that are highly related to the observed words. [sent-35, score-0.331]
25 Therefore, missing words can be used to prune the hypotheses that are also highly related to the missing words. [sent-36, score-0.313]
26 Consider the hypotheses of latent vectors in Table 1 for bank#n#1. [sent-37, score-0.24]
27 Assume there are 3 dimensions in our latent model: financial, sport, institution. [sent-38, score-0.174]
28 We use Ro(v) to denote the sum of relatedness between latent vector v and all observed words; similarly, Rm(v) is the sum of relatedness between the vector v and all missing words. [sent-39, score-0.503]
29 Hypothesis v1 is given by topic models, where only the financial dimension is found, and it has the maximum relatedness to observed words in the bank#n#1 definition, Ro(v1) = 20. [sent-40, score-0.274]
30 v2 is the ideal latent vector, since it also detects that bank#n#1 is related to institution. [sent-41, score-0.148]
31 It has a slightly smaller Ro(v2) = 18, but more importantly, its relatedness to missing words, Rm(v2) = 300, is substantially smaller than Rm(v1) = 600. [sent-42, score-0.203]
32 The solution is straightforward: give a smaller weight to missing words, e. [sent-44, score-0.149]
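As a toy illustration of this weighting, the snippet below reuses the Ro/Rm values quoted from Table 1; wm = 0.01 is an assumed value, and the linear score is only an illustrative stand-in for the model's actual weighted least-squares objective in Section 2.2.

```python
# Toy scoring: reward relatedness to observed words, penalize relatedness
# to missing words with a small weight wm (wm = 0.01 is an assumption).
wm = 0.01
hypotheses = {"v1": {"Ro": 20.0, "Rm": 600.0},
              "v2": {"Ro": 18.0, "Rm": 300.0}}
for name, r in hypotheses.items():
    score = r["Ro"] - wm * r["Rm"]
    print(name, score)   # v1 14.0, v2 15.0 -> v2 is preferred
```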
33 2.2 Modeling Missing Words by Weighted Matrix Factorization We represent the corpus of WN definitions as an M × N matrix X, where the rows correspond to the M unique words existing in WN definitions, and the columns represent the N WN sense ids. [sent-51, score-0.457]
34 The cell Xij records the TF-IDF value of word wi appearing in the definition of sense sj. [sent-52, score-0.362]
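A sketch of building X; scikit-learn's TfidfVectorizer is a convenient stand-in for the paper's unstated TF-IDF implementation, and the two glosses are placeholders for the full definitions corpus.

```python
# Build the M x N words-by-senses TF-IDF matrix X from sense glosses.
from sklearn.feature_extraction.text import TfidfVectorizer

definitions = {  # hypothetical mini-corpus; the real one has 341,557 glosses
    "bank#n#1": "a financial institution that accepts deposits",
    "stock#n#1": "the capital raised by a corporation",
}
sense_ids = list(definitions)
vectorizer = TfidfVectorizer()
# fit_transform yields senses x words; transpose to get words x senses
X = vectorizer.fit_transform(definitions[s] for s in sense_ids).T
M, N = X.shape  # X[i, j] = TF-IDF of word wi in the definition of sense sj
```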
35 In WMF, the original matrix X is factorized into two matrices such that X ≈ P^T Q, where P is a K × M matrix and Q is a K × N matrix. [sent-53, score-0.033]
36 A word wi or sense sj is represented as a K-dimensional vector P·,i or Q·,j, respectively. [sent-55, score-0.331]
37 Note that the inner product of P·,i and Q·,j is used to approximate the semantic relatedness of word wi and definition of sense sj : Xij ≈ P·,i · Q·,j. [sent-56, score-0.488]
38 In WMF each cell is associated with a weight, so missing-word cells (Xij = 0) can contribute much less than observed words. [sent-57, score-0.243]
39 The latent vectors of words P and senses Q are estimated by minimizing the objective function: Σi Σj Wi,j (P·,i · Q·,j − Xij)^2 + λ||P||^2 + λ||Q||^2, where Wi,j = 1 if Xij ≠ 0, and Wi,j = wm if Xij = 0. (1) [sent-59, score-0.401]
40 Equation 1 explicitly requires the latent vector of sense Q·,j to be not related to missing words (P·,i · Q·,j should be close to 0 for missing words Xij = 0). [sent-60, score-0.736]
41 Also, the weight wm for missing words is very small, to make sure latent vectors such as v3 in Table 1 will not be chosen. [sent-61, score-0.437]
42 After we run WMF on the definitions corpus, the similarity of two senses sj and sk can be computed by the inner product of Q·,j and Q·,k. [sent-64, score-0.506]
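A compact, dense sketch of WMF trained by alternating least squares on the objective in Equation 1. It is written for clarity rather than scale (the full corpus would need the sparse tricks of the weighted low-rank approximation literature); λ = 20 follows the tuned value reported below, while wm = 0.01 is an assumption since the tuned value is truncated in this text.

```python
import numpy as np

def wmf(X, K=100, wm=0.01, lam=20.0, iters=20, seed=0):
    """Factor dense X (M words x N senses) as P^T Q, with P: KxM, Q: KxN.
    For the small sparse X built above, pass X.toarray()."""
    M, N = X.shape
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(K, M))
    Q = rng.normal(scale=0.1, size=(K, N))
    W = np.where(X != 0, 1.0, wm)      # weight 1 for observed cells, wm for missing
    reg = lam * np.eye(K)
    for _ in range(iters):
        for j in range(N):             # ridge solve for each sense vector Q[:, j]
            Wj = W[:, j]
            A = (P * Wj) @ P.T + reg   # sum_i W_ij * p_i p_i^T + lam * I
            Q[:, j] = np.linalg.solve(A, P @ (Wj * X[:, j]))
        for i in range(M):             # ridge solve for each word vector P[:, i]
            Wi = W[i, :]
            A = (Q * Wi) @ Q.T + reg
            P[:, i] = np.linalg.solve(A, Q @ (Wi * X[i, :]))
    return P, Q

# After training, sense similarity is the inner product of sense columns:
# sim(sj, sk) = Q[:, j] @ Q[:, k]
```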
43 2.3 A Nuanced Sense Similarity: wmfvec We can further use the features in WordNet to construct a better sense similarity measure. [sent-66, score-0.922]
44 The most important feature of WN is that senses are connected by relations such as hypernymy, meronymy, similar attributes, etc. [sent-67, score-0.164]
45 We observe that neighbor senses are usually similar, hence they could be a good indicator for the latent semantics of the target sense. [sent-68, score-0.503]
46 Note that in elesk each definition is extended by including definitions of its neighbor senses. [sent-70, score-0.758]
47 In our case, we also adopt these two ideas: (1) a sense is represented by the sum of its original latent vector and its neighbors’ latent vectors. [sent-72, score-0.574]
48 Let N(j) be the set of neighbor senses of sense j. [sent-73, score-0.493]
49 Then the new latent vector is: Q·,j(new) = Q·,j + Σ(k∈N(j)) Q·,k (2). The inner product (instead of cosine similarity) of the two resulting sense vectors is treated as the sense pair similarity. [sent-74, score-0.745]
50 We refer to our sense similarity measure as wmfvec. [sent-75, score-0.394]
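A sketch of wmfvec on top of the learned Q: each sense vector is augmented with the sum of its WN neighbors' vectors (Equation 2), and similarity is the inner product of the augmented vectors. The `neighbors` map (sense id to related-sense ids via hypernymy, meronymy, etc.) is an assumed interface.

```python
import numpy as np

def wmfvec_sim(Q, j, k, neighbors):
    """Inner product of neighbor-augmented sense vectors (Equation 2)."""
    def augmented(idx):
        v = Q[:, idx].copy()
        for n in neighbors.get(idx, []):   # Q_new = Q_j + sum over N(j)
            v = v + Q[:, n]
        return v
    return float(augmented(j) @ augmented(k))
```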
51 We tune the parameters in wmfvec and other baselines on SE2, and then directly apply the tuned models to the other three data sets. [sent-78, score-0.559]
52 WMF and LDA are built on the corpus of sense definitions of two dictionaries: WN and Wiktionary [Wik]. [sent-81, score-0.424]
53 We do not link the senses across dictionaries; hence Wik is only used as augmented data for WMF to better learn the semantics of words. [sent-82, score-0.281]
54 The definitions are POS tagged (Toutanova et al., 2003) and lemmatized, resulting in 341,557 sense definitions and 3,563,649 words. [sent-84, score-0.424]
55 WSD Algorithm: To perform WSD we need two components: (1) a sense similarity measure that returns a similarity score given two senses; (2) a disambiguation algorithm that determines which senses to choose as final answers based on the sense pair similarity scores. [sent-85, score-1.13]
56 We choose the Indegree algorithm used in (Sinha and Mihalcea, 2007; Guo and Diab, 2010) as our disambiguation algorithm. [sent-86, score-0.079]
57 It is a graph-based algorithm, where nodes are senses and edge weights equal the sense pair similarities. [sent-87, score-0.274]
58 The final answer is chosen as the sense with maximum indegree. [sent-88, score-0.255]
59 Using the Indegree algorithm allows us to easily replace the sense similarity with wmfvec. [sent-89, score-0.362]
60 In Indegree, two senses are connected if their words are within a local window. [sent-90, score-0.189]
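A sketch of the Indegree algorithm as described: candidate senses of words within the local window are nodes, edges carry sense-pair similarities, and each word takes its highest-indegree sense. `candidate_senses` and `sim` (e.g., the wmfvec similarity above) are assumed interfaces.

```python
def indegree_wsd(window_words, candidate_senses, sim):
    """Pick, for each word, the candidate sense with maximum weighted indegree."""
    indegree = {}
    words = list(window_words)
    for pos, wa in enumerate(words):
        for wb in words[pos + 1:]:           # no edges between senses of one word
            for sa in candidate_senses(wa):
                for sb in candidate_senses(wb):
                    w = sim(sa, sb)          # edge weight = sense pair similarity
                    indegree[sa] = indegree.get(sa, 0.0) + w
                    indegree[sb] = indegree.get(sb, 0.0) + w
    return {w: max(candidate_senses(w), key=lambda s: indegree.get(s, 0.0))
            for w in words}
```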
61 Baselines: We compare with (1) elesk, the most widely used sense similarity. [sent-92, score-0.255]
62 We believe WMF is a better approach than LDA for modeling latent semantics; hence the second baseline is (2) LDA using Gibbs sampling (Griffiths and Steyvers, 2004). [sent-95, score-0.265]
63 However, we cannot directly use the estimated topic distribution P(z|d) to represent the definition, since it only has non-zero values on one or two topics. [sent-96, score-0.036]
64 Instead, we calculate the latent vector of a definition by summing up the P(z|w) of all constituent words weighted by Xij, which gives much better WSD results. [sent-97, score-0.148]
65 (Footnote 2: http://en.wiktionary.org/) [sent-99, score-0.075]
66 We produce LDA vectors [ldavec] in the same setting as wmfvec, which means it is trained on the same corpus, uses WN neighbors, and is tuned on SE2. [sent-100, score-0.089]
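A sketch of the ldavec construction just described: a definition's vector is the TF-IDF-weighted sum of its words' topic posteriors P(z|w), rather than the peaked document posterior P(z|d). The `p_z_given_w` map and the TF-IDF weights are assumed to come from the trained LDA model and the matrix X.

```python
import numpy as np

def ldavec(def_words, tfidf, p_z_given_w, n_topics):
    """Definition vector = sum over words of tfidf(w) * P(z|w)."""
    v = np.zeros(n_topics)
    for w in def_words:
        if w in p_z_given_w:               # skip out-of-model words
            v += tfidf.get(w, 0.0) * p_z_given_w[w]
    return v
```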
67 Lastly, we compare wmfvec with a mature WSD system based on sense similarities, (3) (Sinha and Mihalcea, 2007) [jcn+elesk], where they evaluate six sense similarities, select the best of them, and combine them into one system. [sent-101, score-1.073]
68 Specifically, in their implementation they use jcn for noun-noun and verb-verb pairs, and elesk for other pairs. [sent-102, score-0.695]
69 4 Experiment Results The disambiguation results (K = 100) are summarized in Table 2. [sent-104, score-0.055]
70 We also present in Table 3 results using other values of dimensions K for wmfvec and ldavec. [sent-105, score-0.56]
71 Based on SE2, wmfvec’s parameters are tuned as λ = 20, wm = 0. [sent-107, score-0.076]
72 We also set a threshold on elesk similarity values, which yields better performance. [sent-114, score-0.56]
73 As in (Sinha and Mihalcea, 2007), values of elesk larger than 240 are set to 1, and the rest are mapped to [0,1]. [sent-115, score-0.436]
74 elesk vs wmfvec: wmfvec outperforms elesk consistently in all POS cases (noun, adjective, adverb and verb) on four datasets by a large margin (2. [sent-116, score-1.477]
75 ldavec vs wmfvec: ldavec also performs very well, again proving the superiority of latent semantics over surface word matching. [sent-124, score-0.576]
76 However, wmfvec also outperforms ldavec in every POS case except Semcor adverbs (at least +1% in total case). [sent-125, score-0.717]
77 We observe the trend is consistent in Table 3 where different dimensions are used for ldavec and wmfvec. [sent-126, score-0.188]
78 These results show that given the same text data, WMF outperforms LDA on modeling latent semantics of senses by exploiting missing words. [sent-127, score-0.562]
79 jcn+elesk vs jcn+wmfvec: jcn+elesk is a very mature WSD system that takes advantage of the great performance of jcn on noun-noun and verb-verb pairs. [sent-128, score-0.314]
80 Although wmfvec does much better than elesk, wmfvec alone is sometimes outperformed by jcn+elesk on nouns and verbs. [sent-129, score-1.068]
81 Therefore to beat jcn+elesk, we replace the elesk in jcn+elesk with wmfvec (hence jcn+wmfvec). [sent-130, score-0.97]
82 Similar to (Sinha and Mihalcea, 2007), we normalize wmfvec similarity such that values greater than 400 are set to 1, and the remaining values are mapped to [0,1]. [sent-131, score-0.641]
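A sketch of the normalization and combination described here: similarity values are capped (400 for wmfvec, 240 for elesk) and scaled to [0,1], and jcn+wmfvec backs off from jcn (noun-noun, verb-verb pairs) to normalized wmfvec elsewhere. Scaling by simple division is an assumption, since the exact mapping to [0,1] is not spelled out.

```python
def normalize(value, cap):
    # values above `cap` saturate at 1; the rest are scaled into [0, 1]
    # (assumes nonnegative similarity values)
    return 1.0 if value > cap else max(value, 0.0) / cap

def jcn_wmfvec_sim(sa, sb, pos_a, pos_b, jcn, wmfvec_sim):
    if pos_a == pos_b and pos_a in ("n", "v"):   # noun-noun or verb-verb: jcn
        return jcn(sa, sb)
    return normalize(wmfvec_sim(sa, sb), cap=400.0)
```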
83 This shows wmfvec is robust: it not only performs very well individually, but also can be easily incorporated with existing evidence, as represented by jcn. [sent-140, score-0.555]
84 For example, consider the target word mouse in the context: [sent-151, score-0.059]
85 ... in experiments with mice that a gene called p53 could transform normal cells into cancerous ones. [sent-154, score-0.055]
86 elesk returns the wrong sense, computer device, due to the sparsity of overlapping words between the definition of animal mouse and the context words. [sent-157, score-1.062]
87 wmfvec chooses the correct sense, animal mouse, by recognizing the biology element of animal mouse and the related context words gene, cell, cancerous. [sent-158, score-0.961]
88 5 Related Work Sense similarity measures have been the core component in many unsupervised WSD systems and lexical semantics research/applications. [sent-159, score-0.252]
89 To date, elesk is the most popular such measure (McCarthy et al. [sent-160, score-0.468]
90 Sometimes people use jcn to obtain similarity of noun-noun and verb-verb pairs (Sinha and Mihalcea, 2007; Guo and Diab, 2010). [sent-163, score-0.366]
91 Our similarity measure wmfvec exploits the same information (sense definitions) as elesk and ldavec, and outperforms them significantly on four standardized data sets. [sent-164, score-1.269]
92 To the best of our knowledge, we are the first to construct a sense similarity measure from the latent semantics of sense definitions. [sent-165, score-0.89]
93 6 Conclusions We construct a sense similarity measure, wmfvec, from the latent semantics of sense definitions. [sent-166, score-1.424]
94 Experiment results show wmfvec significantly outperforms previous definition-based similarity measures and LDA vectors on four all-words WSD data sets. [sent-167, score-0.747]
95 Extended gloss overlaps as a measure of semantic relatedness. [sent-180, score-0.051]
96 Combining orthogonal monolingual and multilingual sources of evidence for all words wsd. [sent-208, score-0.046]
97 Semantic topic models: Combining word distributional statistics and dictionary definitions. [sent-212, score-0.036]
98 Topic models for word sense disambiguation and token-based idiom detection. [sent-226, score-0.31]
99 Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. [sent-234, score-0.31]
100 Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. [sent-242, score-0.35]
wordName wordTfidf (topN-words)
[('wmfvec', 0.534), ('elesk', 0.436), ('jcn', 0.259), ('sense', 0.255), ('wsd', 0.211), ('definitions', 0.169), ('senses', 0.164), ('latent', 0.148), ('ldavec', 0.139), ('wmf', 0.138), ('missing', 0.13), ('sinha', 0.126), ('similarity', 0.107), ('guo', 0.101), ('semantics', 0.099), ('lda', 0.094), ('wn', 0.09), ('mihalcea', 0.083), ('diab', 0.079), ('neighbor', 0.074), ('xij', 0.074), ('relatedness', 0.073), ('bank', 0.067), ('vectors', 0.064), ('indegree', 0.059), ('mouse', 0.059), ('disambiguation', 0.055), ('definition', 0.054), ('financial', 0.053), ('srebro', 0.052), ('wik', 0.052), ('wm', 0.051), ('mona', 0.047), ('weiwei', 0.047), ('animal', 0.044), ('pedersen', 0.038), ('topic', 0.036), ('untagged', 0.035), ('inner', 0.034), ('matrix', 0.033), ('neighbors', 0.033), ('observed', 0.033), ('sj', 0.032), ('cell', 0.032), ('overlapping', 0.032), ('measure', 0.032), ('nuanced', 0.032), ('gene', 0.032), ('mature', 0.029), ('wordnet', 0.029), ('hypotheses', 0.028), ('brody', 0.028), ('odni', 0.028), ('predominant', 0.026), ('iarpa', 0.026), ('jaakkola', 0.026), ('banerjee', 0.026), ('mccarthy', 0.026), ('vs', 0.026), ('construct', 0.026), ('dimensions', 0.026), ('extended', 0.025), ('cai', 0.025), ('agirre', 0.025), ('words', 0.025), ('unsupervised', 0.025), ('tuned', 0.025), ('returns', 0.024), ('stock', 0.024), ('rm', 0.024), ('adverb', 0.024), ('profile', 0.024), ('choose', 0.024), ('cells', 0.023), ('adverbs', 0.023), ('trend', 0.023), ('vector', 0.023), ('ted', 0.022), ('wi', 0.021), ('evidence', 0.021), ('outperforms', 0.021), ('measures', 0.021), ('weighted', 0.021), ('rada', 0.021), ('weight', 0.019), ('semantic', 0.019), ('toutanova', 0.018), ('griffiths', 0.018), ('hence', 0.018), ('pos', 0.018), ('columbia', 0.018), ('sparsity', 0.018), ('phi', 0.017), ('bye', 0.017), ('ionary', 0.017), ('nx', 0.017), ('hypernymy', 0.017), ('weeds', 0.017), ('wmor', 0.017), ('erroneously', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
2 0.25938118 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
3 0.25737551 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
Author: Zhi Zhong ; Hwee Tou Ng
Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.
4 0.18011014 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet a wide-coverage multilingual lexical knowledge base and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction. – –
5 0.13962589 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum
Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.
6 0.084872633 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
7 0.08126004 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
8 0.064408787 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
9 0.063400172 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
10 0.056984197 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
11 0.054880071 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
13 0.052573897 79 acl-2012-Efficient Tree-Based Topic Modeling
15 0.049128901 56 acl-2012-Computational Approaches to Sentence Completion
16 0.044385239 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
17 0.042281553 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
18 0.039945453 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
19 0.039910607 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
20 0.036956593 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
topicId topicWeight
[(0, -0.127), (1, 0.075), (2, 0.051), (3, 0.028), (4, -0.061), (5, 0.151), (6, -0.059), (7, 0.017), (8, -0.001), (9, -0.015), (10, 0.18), (11, 0.018), (12, 0.243), (13, 0.129), (14, -0.044), (15, -0.205), (16, 0.097), (17, 0.103), (18, -0.044), (19, 0.008), (20, 0.149), (21, -0.191), (22, -0.186), (23, 0.027), (24, 0.034), (25, -0.035), (26, -0.209), (27, -0.019), (28, -0.077), (29, -0.009), (30, 0.01), (31, 0.005), (32, -0.038), (33, -0.122), (34, 0.096), (35, 0.071), (36, -0.041), (37, -0.032), (38, -0.055), (39, 0.125), (40, 0.013), (41, -0.021), (42, 0.03), (43, -0.252), (44, 0.055), (45, 0.017), (46, -0.064), (47, 0.038), (48, 0.065), (49, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.96804577 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
2 0.73345518 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
3 0.6811673 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
Author: Zhi Zhong ; Hwee Tou Ng
Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.
4 0.54815465 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet a wide-coverage multilingual lexical knowledge base and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction. – –
5 0.47556883 56 acl-2012-Computational Approaches to Sentence Completion
Author: Geoffrey Zweig ; John C. Platt ; Christopher Meek ; Christopher J.C. Burges ; Ainur Yessenalina ; Qiang Liu
Abstract: This paper studies the problem of sentencelevel semantic coherence by answering SATstyle sentence completion questions. These questions test the ability of algorithms to distinguish sense from nonsense based on a variety of sentence-level phenomena. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis. We evaluate these methods on a suite of practice SAT questions, and on a recently released sentence completion task based on data taken from five Conan Doyle novels. We find that by fusing local and global information, we can exceed 50% on this task (chance baseline is 20%), and we suggest some avenues for further research.
6 0.42798519 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
7 0.37708908 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
8 0.36216375 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
9 0.34669441 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
10 0.32582363 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
11 0.25370812 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
12 0.23974845 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
13 0.23117621 79 acl-2012-Efficient Tree-Based Topic Modeling
14 0.22050652 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
15 0.21519239 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech
17 0.20769989 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
18 0.20509194 31 acl-2012-Authorship Attribution with Author-aware Topic Models
19 0.19949138 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
20 0.19771217 7 acl-2012-A Computational Approach to the Automation of Creative Naming
topicId topicWeight
[(18, 0.19), (25, 0.025), (26, 0.037), (28, 0.026), (30, 0.023), (37, 0.029), (39, 0.065), (58, 0.066), (59, 0.018), (62, 0.021), (74, 0.027), (82, 0.017), (85, 0.069), (90, 0.093), (91, 0.026), (92, 0.108), (94, 0.022), (99, 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.78715754 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
2 0.65437675 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
Author: Elif Yamangil ; Stuart Shieber
Abstract: We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn treebank for our experiments and find that our proposal Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.
3 0.62491125 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
4 0.61154133 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
Author: Asher Stern ; Ido Dagan
Abstract: This paper introduces BIUTEE1 , an opensource system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, that can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these re- searchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.
5 0.60966361 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
Author: Rui Yan ; Mirella Lapata ; Xiaoming Li
Abstract: Mirella Lapata‡ Xiaoming Li†, \ ‡Institute for Language, \State Key Laboratory of Software Cognition and Computation, Development Environment, University of Edinburgh, Beihang University, Edinburgh EH8 9AB, UK Beijing 100083, China mlap@ inf .ed .ac .uk lxm@pku .edu .cn 2012.1 Twitter enables users to send and read textbased posts ofup to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generation content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model out- performs competitive approaches by a large margin.
6 0.60581714 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
7 0.60129374 31 acl-2012-Authorship Attribution with Author-aware Topic Models
8 0.59933132 154 acl-2012-Native Language Detection with Tree Substitution Grammars
9 0.59818655 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
10 0.59285629 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
11 0.59270084 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
12 0.59027588 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
13 0.58966756 167 acl-2012-QuickView: NLP-based Tweet Search
14 0.58897913 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
15 0.58826357 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
16 0.58640331 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
17 0.58557039 78 acl-2012-Efficient Search for Transformation-based Inference
18 0.58351481 136 acl-2012-Learning to Translate with Multiple Objectives