acl acl2010 acl2010-156 knowledge-graph by maker-knowledge-mining

156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems


Source: pdf

Author: Simone Paolo Ponzetto ; Roberto Navigli

Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. [sent-4, score-0.262]

2 We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets. [sent-5, score-0.241]

3 In recent years, two main approaches have been studied that rely on a fixed sense inventory, i. [sent-7, score-0.161]

4 The relations are harvested from an encyclopedic resource, namely Wikipedia. [sent-22, score-0.213]

5 Wikipedia pages are automatically associated with WordNet senses, and topical, semantic associative relations from Wikipedia are transferred to WordNet, thus producing a much richer lexical resource. [sent-23, score-0.177]

6 The results show that the integration of vast amounts of semantic relations in knowledge-based systems yields performance competitive with state-of-the-art supervised approaches on open-text WSD. [sent-25, score-0.154]

7 Other approaches include the extraction of semantic preferences from sense-annotated (Agirre and Martinez, 2001) and raw corpora (McCarthy and Carroll, 2003), as well as the disambiguation of dictionary glosses based on cyclic graph patterns (Navigli, 2009a). [sent-40, score-0.222]

8 Other works rely on the disambiguation of collocations, either obtained from specialized learner’s dictionaries (Navigli and Velardi, 2005) or extracted by means of statistical techniques (Cuadros and Rigau, 2008), e. [sent-41, score-0.138]

9 But while most of these methods represent state-of-the-art proposals for enriching lexical and taxonomic resources, none concentrates on augmenting WordNet with associative semantic relations for many domains on a very large scale. [sent-44, score-0.133]

10 Mihalcea (2007) manually maps Wikipedia pages to WordNet senses to perform lexical-sample WSD. [sent-55, score-0.168]

11 We extend her proposal in three important ways: (1) we fully automatize the mapping between Wikipedia pages and WordNet senses; (2) we use the mappings to enrich an existing resource, i. [sent-56, score-0.17]

12 WordNet, rather than annotating text with sense labels; (3) we deploy the knowledge encoded by this mapping to perform unrestricted WSD, rather than apply it to a lexical sample setting. [sent-58, score-0.316]

13 Knowledge from Wikipedia is injected into a WSD system by means of a mapping to WordNet. [sent-59, score-0.15]

14 Previous efforts aimed at automatically linking Wikipedia to WordNet include full use of the first WordNet sense heuristic (Suchanek et al. [sent-60, score-0.161]

15 , 2008), a graph-based mapping of Wikipedia categories to WordNet synsets (Ponzetto and Navigli, 2009), a model based on vector spaces (Ruiz-Casado et al. [sent-61, score-0.198]

16 hyperlinked, nor do they propose a high-performing probabilistic formulation of the mapping problem, a task to which we turn in the next section. [sent-66, score-0.126]

17 3 Extending WordNet. Our approach consists of two main phases: first, a mapping is automatically established between Wikipedia pages and WordNet senses; second, the relations connecting Wikipedia pages are transferred to WordNet. [sent-67, score-0.29]

18 For instance, the concept of soda drink is expressed as: $\{\text{pop}^2_n, \text{soda}^2_n, \text{soda pop}^1_n, \text{soda water}^2_n, \text{tonic}^2_n\}$ where each word's subscripts and superscripts indicate their parts of speech (e. [sent-80, score-1.148]

19 n stands for noun) and sense number, respectively. [sent-82, score-0.161]

20 For example, the gloss of the above synset is: “a sweet drink containing carbonated water and flavoring”. [sent-84, score-0.534]

21 Formally, given the entire set of pages SensesWiki and WordNet senses SensesWN, we aim to acquire a mapping µ : SensesWiki → SensesWN such that, for each Wikipage w ∈ SensesWiki:
$$\mu(w) = \begin{cases} s \in \mathit{Senses}_{WN}(w) & \text{if a link can be established} \\ \epsilon & \text{otherwise} \end{cases}$$
[sent-107, score-0.168]

22 where SensesWN(w) is the set of senses of the lemma of w in WordNet. [sent-108, score-0.160]

23 We use word senses to unambiguously denote the corresponding synsets (e. [sent-111, score-0.196]

24 mapping methodology linked SODA (SOFT DRINK) to the corresponding WordNet sense $\text{soda}^2_n$, we would have µ(SODA (SOFT DRINK)) = $\text{soda}^2_n$. [sent-119, score-0.349]

25 In order to establish a mapping between the two resources, we first identify different kinds of disambiguation contexts for Wikipages (Section 3. [sent-120, score-0.289]

26 Next, we intersect these contexts to perform the mapping (see Section 3. [sent-125, score-0.151]

27 1 Disambiguation Context of a Wikipage. Given a target Wikipage w which we aim to map to a WordNet sense of w, we use the following information as a disambiguation context: Sense labels: e. [sent-130, score-0.299]

28 given the page SODA (SOFT DRINK), the words soft and drink are added to the disambiguation context. [sent-132, score-0.558]

29 , SWEDISH WRITERS or SCIENTISTS WHO COMMITTED SUICIDE), we use the lemmas of their syntactic heads as disambiguation context (i. [sent-139, score-0.217]

30 Given a Wikipage w, we define its disambiguation context Ctx(w) as the set of words obtained from some or all of the three sources above. [sent-143, score-0.173]

31 2 Disambiguation Context of a WordNet Sense. Given a WordNet sense s and its synset S, we use the following information as disambiguation context to provide evidence for a potential link in our mapping µ: Synonymy: all synonyms of s in synset S. [sent-146, score-0.638]

32 For instance, given the synset of $\text{soda}^2_n$, all its synonyms are included in the context (that is, tonic, soda pop, pop, etc. [sent-147, score-0.409]

33 For example, given $\text{soda}^2_n$, we include the words from its hypernym $\{\text{soft drink}^1_n\}$. [sent-154, score-0.132]

34 Thus the words bitter and lemon are included in the disambiguation context of s. [sent-158, score-0.209]

35 Gloss: the set of lemmas of the content words occurring within the gloss of s. [sent-159, score-0.125]

36 For instance, given s = $\text{soda}^2_n$, defined as "a sweet drink containing carbonated water and flavoring", we add to the disambiguation context of s the following lemmas: sweet, drink, contain, carbonated, water, flavoring. [sent-160, score-0.55]

37 Given a WordNet sense s, we define its disambiguation context Ctx(s) as the set of words obtained from some or all of the four sources above. [sent-161, score-0.334]

38 The following steps are performed: Initially (lines 1-2), our mapping µ links each Wikipage w to ε. [sent-165, score-0.185]

39 |SensesWiki(w)| = |SensesWN(w)| = 1) we map w to its only WordNet sense $w^1_n$ (lines 3-5). [sent-171, score-0.161]

40 Finally, for each remaining Wikipage w for which no mapping was previously found (i. [sent-172, score-0.126]

41 , line 7), we do the following: lines 8-10: for each Wikipage d which is a redirection to w, for which a mapping was previously found (i. [sent-175, score-0.158]

42 , that is, d is monosemous in both Wikipedia and WordNet) and such that it maps to a sense µ(d) in a synset S that also contains a sense of w, we map w to the corresponding sense in S. [sent-178, score-0.591]

43 13: if no tie occurs then 14: $\mu(w) := \operatorname*{argmax}_{s \in \mathit{Senses}_{WN}(w)} p(s \mid w)$ 15: return µ (no mapping is established if a tie occurs, line 13). [sent-186, score-0.181]

44 As a result of the execution of the algorithm, the mapping is returned (line 15). [sent-187, score-0.126]

45 At the heart of the mapping algorithm lies the calculation of the conditional probability p(s|w) of selecting the WordNet sense s given the Wikipage w. [sent-188, score-0.287]

46 The sense s which maximizes this probability can be obtained as follows:
$$\mu(w) = \operatorname*{argmax}_{s \in \mathit{Senses}_{WN}(w)} p(s \mid w) = \operatorname*{argmax}_{s} \frac{p(s, w)}{p(w)} = \operatorname*{argmax}_{s} p(s, w)$$
The latter formula is obtained by observing that p(w) does not influence our maximization, as it is a constant independent of s. [sent-189, score-0.161]

47 As a result, the most appropriate sense s is determined by maximizing the joint probability p(s, w) of sense s and page w. [sent-190, score-0.356]

48 Thus, in our algorithm we determine the best sense s by computing the intersection of the disambiguation contexts of s and w, and normalizing by the scores summed over all senses of w in Wikipedia and WordNet. [sent-192, score-0.448]

49 4 Example We illustrate the execution of our mapping algorithm by way of an example. [sent-195, score-0.126]

50 The word soda is polysemous both in Wikipedia and WordNet, thus lines 3–5 of the algorithm do not concern this Wikipage. [sent-197, score-0.33]

51 Lines 6–14 aim to find a mapping µ(SODA (SOFT DRINK)) to an appropriate WordNet sense of the word. [sent-198, score-0.287]

52 Next, we construct the disambiguation context for the Wikipage by including words from its label, links and categories (cf. [sent-200, score-0.232]

53 We now construct the disambiguation context for the two WordNet senses of soda (cf. [sent-205, score-0.595]

54 2), namely the sodium carbonate (#1) and the drink (#2) senses. [sent-208, score-0.384]

55 The sense with the largest intersection is #2, so the following mapping is established: µ(SODA (SOFT DRINK)) = $\text{soda}^2_n$. [sent-212, score-0.287]

56 3 Transferring Semantic Relations. The output of the algorithm presented in the previous section is a mapping between Wikipages and WordNet senses (that is, implicitly, synsets). [sent-214, score-0.25]

57 For any such link from w to w′, if the two Wikipages are mapped to WordNet senses (i. [sent-217, score-0.15]

58 Thus, WordNet++ represents an extension of WordNet which includes semantic associative relations between synsets. [sent-227, score-0.133]

59 In turn, WordNet++ represents the English-only subset of a larger multilingual resource, BabelNet (Navigli and Ponzetto, 2010), where lexicalizations of the synsets are harvested for many languages using the so-called Wikipedia inter-language links and applying a machine translation system. [sent-233, score-0.171]

60 4 Experiments. We perform two sets of experiments: we first evaluate the intrinsic quality of our mapping (Section 4. [sent-234, score-0.126]

61 We first conducted an evaluation of the mapping quality. [sent-240, score-0.126]

62 To create a gold standard for evaluation, we started from the set of all lemmas contained both in WordNet and Wikipedia: the intersection between the two resources includes 80,295 lemmas which correspond to 105,797 WordNet senses and 199,735 Wikipedia pages. [sent-241, score-0.212]

63 We selected a random sample of 1,000 Wikipages and asked an annotator with previous experience in lexicographic annotation to provide the correct WordNet sense for each page title (an empty sense label was given if no correct mapping was possible). [sent-247, score-0.482]

64 In order to quantify the quality of the annotations and the difficulty of the task, a second annotator sense-tagged a subset of 200 pages from the original sample. [sent-251, score-0.205]

65 Table 1 summarizes the performance of our disambiguation algorithm against the manually annotated dataset. [sent-254, score-0.138]

66 Evaluation is performed in terms of standard measures of precision (the ratio of correct sense labels to the non-empty labels output by the mapping algorithm), recall (the ratio of correct sense labels to the total of non-empty labels in the gold standard) and F1-measure ($F_1 = \frac{2PR}{P+R}$). [sent-255, score-0.56]

67 empty sense labels (that is, calculated on all 1,000 test instances). [sent-269, score-0.189]

68 As baselines, we use the most frequent WordNet sense (MFS), as well as a random sense assignment. [sent-270, score-0.322]

69 We evaluate the mapping methodology described in Section 3. [sent-271, score-0.161]

70 2 against different disambiguation contexts for the WordNet senses (cf. [sent-272, score-0.287]

71 The results show that our method improves on the baseline by a large margin and that higher performance can be achieved by using more disambiguation information. [sent-284, score-0.138]

72 That is, using a richer disambiguation context helps to better choose the most appropriate WordNet sense for a Wikipedia page. [sent-285, score-0.334]

73 This implies that the different disambiguation contexts only partially overlap and, when used separately, each produces different mappings with a similar level of precision. [sent-291, score-0.194]

74 As for the baselines, the most frequent sense is just 0. [sent-295, score-0.161]

75 ing the most frequent sense rather than any other sense for each target page represents a choice as arbitrary as picking a sense at random. [sent-303, score-0.517]

76 The final mapping contains 81,533 pairs of Wikipages and word senses they map to, covering 55. [sent-304, score-0.25]

77 Using our best performing mapping we are able to extend WordNet with 1,902,859 semantic edges: of these, 97. [sent-306, score-0.18]

78 For instance, mapping TRAVEL to the first or the second sense in WordNet is an arbitrary choice, as the Wikipage refers to both senses. [sent-321, score-0.287]

79 Accordingly, we expect the transfer of semantic relations from Wikipedia to WordNet to sometimes have the side effect of penalizing some fine-grained senses of a word. [sent-323, score-0.229]

80 algorithm (Lesk, 1986), that performs WSD based on the overlap between the context surrounding the target word to be disambiguated and the definitions of its candidate senses (Kilgarriff and Rosenzweig, 2000). [sent-336, score-0.214]

81 Given a target word w, this method assigns to w the sense whose gloss has the highest overlap (i. [sent-337, score-0.273]

82 Due to the limited context provided by the WordNet glosses, we follow Banerjee and Pedersen (2003) and expand the gloss of each sense s to include words from the glosses of those synsets in a semantic relation with s. [sent-340, score-0.433]

83 These include all WordNet synsets which are directly connected to s, either by means of the semantic pointers found in WordNet or through the unlabeled links found in WordNet++. [sent-341, score-0.185]

84 Starting from each sense s of the target word, it performs a depth-first search (DFS) of the WordNet(++) graph and collects all the paths connecting s to senses of other words in context. [sent-343, score-0.285]

85 The sense of the target word with the highest vertex degree is selected. [sent-346, score-0.254]

86 We follow Navigli and Lapata (2010) and run Degree in a weakly supervised setting where the system attempts no sense assignment if the highest degree score is below a certain (empirically estimated) threshold. [sent-347, score-0.303]

87 Accordingly, in order to improve the disambiguation performance, we developed a filter to rule out weak semantic relations from WordNet++. [sent-354, score-0.243]

88 The final graph used by Degree consists of WordNet, together with 152,944 relations from our semantic relation enrichment method (cf. [sent-361, score-0.137]

89 only those relations harvested from the links found within Wikipedia pages; (3) their union, i. [sent-369, score-0.15]

90 As is common practice, we compare with random sense assignment and the most frequent sense (MFS) from SemCor as baselines. [sent-373, score-0.347]

91 Enriching WordNet with encyclopedic relations from Wikipedia yields a consistent improvement against using WordNet (+7. [sent-374, score-0.147]

92 Table 3: Performance on SemEval-2007 coarse-grained all-words WSD with MFS as a back-off strategy when no sense assignment is attempted. [sent-388, score-0.228]

93 Table 3 shows the results for nouns (1,108) and all words (2,269 words): we use the MFS as a back-off strategy when no sense assignment is attempted. [sent-395, score-0.186]

94 In addition, our system achieves better results than Static and Personalized PageRank, indicating that competitive disambiguation performance can still be achieved by a less sophisticated knowledge-based WSD algorithm when provided with a rich amount of high-quality knowledge. [sent-433, score-0.168]

95 5 Conclusions. In this paper, we have presented a large-scale method for the automatic enrichment of a computational lexicon with encyclopedic relational knowledge. [sent-438, score-0.128]

96 , 2009; Navigli and Lapata, 2010) and prove that knowledge-rich disambiguation is a competitive alternative to supervised systems, even when relying on a simple algorithm. [sent-441, score-0.187]

97 Moreover, while the mapping has been used to enrich WordNet with a large amount of semantic edges, the method can be reversed and applied to the encyclopedic resource itself, that is Wikipedia, to perform disambiguation with the corresponding sense inventory (cf. [sent-445, score-0.614]

98 Extended gloss overlap as a measure of semantic relatedness. [sent-473, score-0.166]

99 Building a sense tagged corpus with Open Mind Word Expert. [sent-497, score-0.161]

100 Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. [sent-572, score-0.299]
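Sentences 27–30 and 52 describe the disambiguation context Ctx(w) of a Wikipage as the set of words drawn from some or all of three sources: its sense label, its links and its categories. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, link titles are simply lowercased and split on whitespace, and category heads are passed in already extracted (the summary only says that the lemmas of their syntactic heads are used).

```python
import re


def sense_label_words(title: str) -> set[str]:
    """Lowercased words of the parenthesized sense label, e.g. 'SODA (SOFT DRINK)'."""
    match = re.search(r"\(([^)]*)\)", title)
    return {tok.lower() for tok in match.group(1).split()} if match else set()


def wikipage_context(
    title: str,
    link_titles: list[str],
    category_heads: list[str],
    sources: tuple[str, ...] = ("label", "links", "categories"),
) -> set[str]:
    """Hypothetical Ctx(w): union of words from the selected sources.

    Assumptions: link titles are lowercased and split on whitespace;
    category heads are supplied pre-computed (e.g. by a parser).
    """
    ctx: set[str] = set()
    if "label" in sources:
        ctx |= sense_label_words(title)
    if "links" in sources:
        for link in link_titles:
            ctx |= {tok.lower() for tok in link.split()}
    if "categories" in sources:
        ctx |= {head.lower() for head in category_heads}
    return ctx


# Illustrative input only: the link titles and the category head are made up.
ctx_w = wikipage_context("SODA (SOFT DRINK)", ["Cola", "Lemonade"], ["beverage"])
print(sorted(ctx_w))  # ['beverage', 'cola', 'drink', 'lemonade', 'soft']
```

Passing a subset of `sources` mirrors the "some or all of the three sources" wording, which is what the experiments in sentences 69–73 vary.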
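Sentences 31–37 define Ctx(s) for a WordNet sense s from its synonyms, the words of semantically related synsets such as its hypernym, and the content-word lemmas of its gloss (part of the list of sources is elided in this summary). The sketch below models a synset as a hypothetical `Synset` dataclass and uses a toy stop-word list and lemma table in place of a real lemmatizer; with those assumptions it reproduces the gloss lemmas listed in sentence 36 for the second sense of soda.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for a real stop-word list and lemmatizer.
STOPWORDS = {"a", "an", "and", "of", "or", "the", "to", "with"}
LEMMAS = {"containing": "contain"}


@dataclass
class Synset:
    lemmas: list[str]  # synonyms, e.g. "soda pop"
    gloss: str         # textual definition
    related: list["Synset"] = field(default_factory=list)  # e.g. hypernyms, hyponyms


def phrase_words(phrase: str) -> set[str]:
    return {tok.lower() for tok in phrase.split()}


def gloss_lemmas(gloss: str) -> set[str]:
    """Content-word lemmas of a gloss (toy stop-word filter plus lemma lookup)."""
    out = set()
    for tok in gloss.lower().split():
        tok = tok.strip('.,;:"\'')
        if tok and tok not in STOPWORDS:
            out.add(LEMMAS.get(tok, tok))
    return out


def wordnet_sense_context(synset: Synset, sources=("synonyms", "related", "gloss")) -> set[str]:
    ctx: set[str] = set()
    if "synonyms" in sources:
        for lemma in synset.lemmas:
            ctx |= phrase_words(lemma)
    if "related" in sources:
        for other in synset.related:
            for lemma in other.lemmas:
                ctx |= phrase_words(lemma)
    if "gloss" in sources:
        ctx |= gloss_lemmas(synset.gloss)
    return ctx


soft_drink = Synset(["soft drink"], "")  # gloss left empty in this toy example
soda_2 = Synset(
    ["pop", "soda", "soda pop", "soda water", "tonic"],
    "a sweet drink containing carbonated water and flavoring",
    related=[soft_drink],
)
print(sorted(wordnet_sense_context(soda_2)))
```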
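Sentences 38–45 outline the mapping algorithm: start from an empty mapping, link pages that are monosemous in both resources, reuse a mapped redirection whose synset also contains a sense of w, and otherwise take argmax p(s|w), leaving w unmapped on a tie. The sketch below follows those steps under assumptions: all names and data are hypothetical, `None` stands for ε, the scoring function is passed in as a parameter, and picking the first shared sense when several qualify is our choice.

```python
from typing import Callable, Optional


def map_wikipages(
    pages: list[str],
    lemma_of: dict[str, str],             # Wikipage -> lemma of its title
    wiki_senses: dict[str, list[str]],    # lemma -> Wikipages sharing that lemma
    wn_senses: dict[str, list[str]],      # lemma -> WordNet sense ids
    redirects: dict[str, list[str]],      # Wikipage w -> pages that redirect to w
    synset_members: dict[str, set[str]],  # sense id -> every sense id in its synset
    prob: Callable[[str, str], float],    # p(s | w), or any score proportional to it
) -> dict[str, Optional[str]]:
    # Start with every page mapped to epsilon, represented here as None.
    mu: dict[str, Optional[str]] = {w: None for w in pages}

    # Pages whose lemma is monosemous in both resources get its only WordNet sense.
    for w in pages:
        lemma = lemma_of[w]
        if len(wiki_senses.get(lemma, [])) == 1 and len(wn_senses.get(lemma, [])) == 1:
            mu[w] = wn_senses[lemma][0]

    for w in pages:
        if mu[w] is not None:
            continue
        candidates = wn_senses.get(lemma_of[w], [])
        # Reuse a mapped redirection d whose synset also contains a sense of w.
        for d in redirects.get(w, []):
            sense_d = mu.get(d)
            if sense_d is None:
                continue
            shared = [s for s in candidates if s in synset_members.get(sense_d, {sense_d})]
            if shared:
                mu[w] = shared[0]
                break
        if mu[w] is not None or not candidates:
            continue
        # Otherwise take argmax p(s|w), leaving w unmapped when the maximum is tied.
        scores = {s: prob(s, w) for s in candidates}
        best = max(scores.values())
        winners = [s for s, value in scores.items() if value == best]
        if len(winners) == 1:
            mu[w] = winners[0]
    return mu


# Hypothetical toy data: SODA POP redirects to SODA (SOFT DRINK).
pages = ["SODA POP", "SODA (SOFT DRINK)", "SODA (CHEMISTRY)"]
lemma_of = {"SODA POP": "soda pop", "SODA (SOFT DRINK)": "soda", "SODA (CHEMISTRY)": "soda"}
wiki_senses = {"soda pop": ["SODA POP"], "soda": ["SODA (SOFT DRINK)", "SODA (CHEMISTRY)"]}
wn_senses = {"soda pop": ["soda pop#n#1"], "soda": ["soda#n#1", "soda#n#2"]}
redirects = {"SODA (SOFT DRINK)": ["SODA POP"]}
synset_members = {"soda pop#n#1": {"soda pop#n#1", "soda#n#2", "pop#n#2"}}
toy_scores = {("soda#n#1", "SODA (CHEMISTRY)"): 0.5, ("soda#n#2", "SODA (CHEMISTRY)"): 0.1}
mu = map_wikipages(pages, lemma_of, wiki_senses, wn_senses, redirects, synset_members,
                   lambda s, w: toy_scores.get((s, w), 0.0))
print(mu)
```

In this toy run the redirect step resolves SODA (SOFT DRINK), while SODA (CHEMISTRY) falls through to the argmax step.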
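Sentences 46–48 select the sense by maximizing the joint probability p(s, w), computed from the intersection of the two disambiguation contexts and normalized over all senses of the lemma in Wikipedia and WordNet. A minimal sketch, assuming the score of a pair is the raw intersection size |Ctx(s) ∩ Ctx(w)| (the summary does not say whether any smoothing is applied); the function names, the contexts and the page SODA (CHEMISTRY) are hypothetical.

```python
from typing import Optional


def joint_probabilities(
    wiki_ctx: dict[str, set[str]],  # Ctx(w') for every Wikipage w' sharing the lemma
    wn_ctx: dict[str, set[str]],    # Ctx(s') for every WordNet sense s' of the lemma
) -> dict[tuple[str, str], float]:
    """Estimate p(s, w) as |Ctx(s) & Ctx(w)| divided by the sum over all (s', w') pairs."""
    scores = {
        (s, w): len(cs & cw) for s, cs in wn_ctx.items() for w, cw in wiki_ctx.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: value / total for pair, value in scores.items()}


def best_sense(probs: dict[tuple[str, str], float], page: str) -> Optional[str]:
    """argmax over s of p(s, page); None when there is no candidate or a tie."""
    ranked = sorted(((p, s) for (s, w), p in probs.items() if w == page), reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0][0] == ranked[1][0]):
        return None
    return ranked[0][1]


wiki_ctx = {"SODA (SOFT DRINK)": {"soft", "drink", "cola"}, "SODA (CHEMISTRY)": {"chemistry"}}
wn_ctx = {"soda#n#1": {"sodium", "carbonate", "chemistry"}, "soda#n#2": {"pop", "soft", "drink", "cola"}}
probs = joint_probabilities(wiki_ctx, wn_ctx)
print(best_sense(probs, "SODA (SOFT DRINK)"))  # soda#n#2
```

Because the normalizer is shared by every pair for the lemma, it does not change the argmax for a given page; it only turns the overlap counts into a distribution.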
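The soda example in sentences 49–55 can be checked with plain set intersections. This sketch uses only words the summary names: the sense label of SODA (SOFT DRINK) for the page, and for sense #2 its synonyms, its hypernym, the words bitter and lemon (sentence 34) and its gloss lemmas. For sense #1 the words sodium and carbonate serve as a stand-in, since its actual context is not listed.

```python
# Page context: only the sense label of SODA (SOFT DRINK); links and categories omitted.
ctx_page = {"soft", "drink"}
ctx_senses = {
    "soda#1 (sodium carbonate)": {"sodium", "carbonate"},  # stand-in, real context is larger
    "soda#2 (drink)": {
        "pop", "soda", "water", "tonic",                # synonyms
        "soft", "drink",                                # hypernym {soft drink}
        "bitter", "lemon",                              # words named in sentence 34
        "sweet", "contain", "carbonated", "flavoring",  # gloss lemmas
    },
}
overlaps = {sense: len(ctx & ctx_page) for sense, ctx in ctx_senses.items()}
best = max(overlaps, key=overlaps.get)
print(overlaps)  # {'soda#1 (sodium carbonate)': 0, 'soda#2 (drink)': 2}
print(best)      # soda#2 (drink)
```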
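Sentences 56–58 describe how relations are transferred: for each link from w to w′ whose endpoints are both mapped, WordNet++ gains an associative edge between the corresponding synsets. The sketch below is a hypothetical illustration; the page names and synset ids are made up, and treating the transferred edges as undirected and unlabeled (sentence 83 calls them unlabeled links) and skipping self-loops are assumptions.

```python
from typing import Iterable, Optional


def transfer_relations(
    links: Iterable[tuple[str, str]],   # (source page, target page) hyperlinks
    mu: dict[str, Optional[str]],       # page -> WordNet sense id, or None if unmapped
    synset_of: dict[str, str],          # sense id -> synset id
) -> set[frozenset[str]]:
    """Collect associative synset-synset edges induced by Wikipedia links."""
    edges: set[frozenset[str]] = set()
    for w, w2 in links:
        s, s2 = mu.get(w), mu.get(w2)
        if s is None or s2 is None:
            continue  # both endpoints must be mapped to WordNet
        a, b = synset_of[s], synset_of[s2]
        if a != b:  # assumption: no self-loops
            edges.add(frozenset((a, b)))  # assumption: undirected, unlabeled edge
    return edges


mu = {"SODA (SOFT DRINK)": "soda#n#2", "COLA": "cola#n#2", "FIZZ": None}
synset_of = {"soda#n#2": "S_soda2", "cola#n#2": "S_cola2"}
links = [("SODA (SOFT DRINK)", "COLA"), ("COLA", "SODA (SOFT DRINK)"), ("SODA (SOFT DRINK)", "FIZZ")]
edges = transfer_relations(links, mu, synset_of)
print(edges)
```

The two links between SODA and COLA collapse into one edge, and the link to the unmapped page is dropped.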
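The evaluation measures in sentence 66 (precision over non-empty system outputs, recall over non-empty gold labels, and their harmonic mean F1) can be computed as follows. This is a minimal sketch over dictionaries mapping each page to a sense or `None` for an empty label; the function name and the data are hypothetical.

```python
from typing import Optional


def mapping_prf(predicted: dict[str, Optional[str]], gold: dict[str, Optional[str]]):
    """Precision, recall and F1 of a page -> sense mapping (None = empty label)."""
    n_pred = sum(1 for s in predicted.values() if s is not None)
    n_gold = sum(1 for s in gold.values() if s is not None)
    correct = sum(1 for page, s in predicted.items() if s is not None and gold.get(page) == s)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {"A": "a#1", "B": "b#2", "C": None, "D": "d#1"}
pred = {"A": "a#1", "B": "b#1", "C": None, "D": None}
print(mapping_prf(pred, gold))  # P = 1/2, R = 1/3, F1 = 0.4 (up to float rounding)
```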
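Sentences 80–83 describe the extended Lesk baseline: each sense's gloss is expanded with the glosses of the synsets directly connected to it (through WordNet pointers or WordNet++ links), and the sense with the highest overlap with the context wins. The sketch below counts single-word overlaps only; Banerjee and Pedersen's original measure additionally gives extra weight to multi-word overlaps, which this simplification omits. Ties going to the first candidate, the function name and the toy glosses are assumptions.

```python
def extended_lesk(context, candidates, gloss, neighbors):
    """Pick the candidate sense whose expanded gloss shares most words with the context.

    context: set of context words; candidates: sense ids of the target word;
    gloss: sense id -> set of gloss words; neighbors: sense id -> directly
    connected sense ids (WordNet pointers and/or WordNet++ links).
    """
    best, best_overlap = None, -1
    for sense in candidates:
        bag = set(gloss.get(sense, set()))
        for other in neighbors.get(sense, ()):
            bag |= gloss.get(other, set())
        overlap = len(bag & context)
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best, best_overlap = sense, overlap
    return best, best_overlap


# Toy glosses, not WordNet's actual definitions.
gloss = {
    "soda#n#1": {"sodium", "salt"},
    "soda#n#2": {"sweet", "drink", "carbonated"},
    "cola#n#2": {"cola", "flavored", "drink"},
}
neighbors = {"soda#n#2": ["cola#n#2"]}
print(extended_lesk({"drink", "cola", "sugar"}, ["soda#n#1", "soda#n#2"], gloss, neighbors))
# ('soda#n#2', 2)
```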
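Sentences 84–86 describe Degree: a depth-first search from each sense of the target word collects paths to senses of other context words, the target sense with the highest vertex degree in the resulting subgraph is chosen, and no answer is given if that degree is below a threshold. The sketch below is a simplified, hypothetical version: it searches only from the target word's senses, bounds path length with an assumed `max_len`, stops a path at the first context sense it reaches, and expects a symmetric adjacency dict.

```python
def degree_wsd(graph, target_senses, context_senses, max_len=3, threshold=1):
    """Simplified Degree WSD; returns (chosen sense or None, degree per target sense)."""
    sub_edges = set()

    def dfs(path):
        node = path[-1]
        if len(path) > 1 and node in context_senses:
            # Keep every edge of a path that reaches a sense of another context word.
            sub_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
            return
        if len(path) - 1 == max_len:
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple paths only
                dfs(path + [nxt])

    for sense in target_senses:
        dfs([sense])
    degree = {s: sum(1 for edge in sub_edges if s in edge) for s in target_senses}
    best = max(target_senses, key=lambda s: degree[s])
    # Weakly supervised setting: abstain when the best degree is below the threshold.
    return (best if degree[best] >= threshold else None), degree


# Toy symmetric graph (hypothetical sense ids).
graph = {
    "soda#n#1": ["sodium#n#1"], "sodium#n#1": ["soda#n#1"],
    "soda#n#2": ["soft_drink#n#1"],
    "soft_drink#n#1": ["soda#n#2", "cola#n#2"], "cola#n#2": ["soft_drink#n#1"],
}
choice, degrees = degree_wsd(graph, ["soda#n#1", "soda#n#2"], {"cola#n#2"})
print(choice, degrees)  # soda#n#2 {'soda#n#1': 0, 'soda#n#2': 1}
```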
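Sentences 92–93 describe using the most frequent sense as a back-off whenever the weakly supervised Degree system abstains. A minimal sketch, assuming abstentions are recorded as `None` and that a per-instance MFS lookup is available; the instance ids and sense ids are hypothetical.

```python
from typing import Optional


def with_mfs_backoff(answers: dict[str, Optional[str]], mfs: dict[str, str]) -> dict[str, str]:
    """Fill every abstention (None) with the most frequent sense for that instance."""
    return {inst: ans if ans is not None else mfs[inst] for inst, ans in answers.items()}


answers = {"d001.s001.t001": "soda#n#2", "d001.s001.t002": None}
mfs = {"d001.s001.t001": "soda#n#1", "d001.s001.t002": "drink#n#1"}
print(with_mfs_backoff(answers, mfs))
# {'d001.s001.t001': 'soda#n#2', 'd001.s001.t002': 'drink#n#1'}
```

The back-off never overrides an answer the system did give; it only guarantees full coverage.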


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('wordnet', 0.393), ('wikipedia', 0.313), ('wikipage', 0.299), ('soda', 0.298), ('drink', 0.254), ('wsd', 0.217), ('navigli', 0.203), ('senseswiki', 0.163), ('sense', 0.161), ('wikipages', 0.156), ('disambiguation', 0.138), ('soft', 0.132), ('mapping', 0.126), ('senses', 0.124), ('senseswn', 0.117), ('agirre', 0.113), ('ponzetto', 0.102), ('encyclopedic', 0.096), ('gloss', 0.081), ('synset', 0.076), ('synsets', 0.072), ('roberto', 0.072), ('degree', 0.068), ('cuadros', 0.065), ('rigau', 0.065), ('koeling', 0.06), ('extlesk', 0.059), ('sodium', 0.059), ('links', 0.059), ('semantic', 0.054), ('mfs', 0.053), ('ctx', 0.052), ('relations', 0.051), ('supervised', 0.049), ('pagerank', 0.047), ('eneko', 0.047), ('simone', 0.045), ('milne', 0.045), ('carbonate', 0.045), ('carbonated', 0.045), ('cola', 0.045), ('lemmas', 0.044), ('pages', 0.044), ('paolo', 0.044), ('polysemy', 0.044), ('gabrilovich', 0.042), ('coarsegrained', 0.042), ('harvested', 0.04), ('sweet', 0.039), ('velardi', 0.039), ('water', 0.039), ('ssi', 0.039), ('resource', 0.039), ('strube', 0.036), ('mccarthy', 0.036), ('mihalcea', 0.036), ('lemma', 0.036), ('bitter', 0.036), ('lesk', 0.036), ('methodology', 0.035), ('context', 0.035), ('page', 0.034), ('personalized', 0.033), ('roma', 0.033), ('lines', 0.032), ('monosemous', 0.032), ('nastase', 0.032), ('enrichment', 0.032), ('overlap', 0.031), ('knowledgebased', 0.03), ('glosses', 0.03), ('flavoring', 0.03), ('lemonade', 0.03), ('tehnesne', 0.03), ('treematch', 0.03), ('knowledge', 0.029), ('suchanek', 0.029), ('labels', 0.028), ('associative', 0.028), ('diana', 0.027), ('linked', 0.027), ('namely', 0.026), ('link', 0.026), ('lacalle', 0.026), ('oier', 0.026), ('babelnet', 0.026), ('bml', 0.026), ('montse', 0.026), ('sisterhood', 0.026), ('dfs', 0.026), ('sauper', 0.026), ('established', 0.025), ('contexts', 0.025), ('vertex', 0.025), ('assignment', 0.025), ('disambiguated', 0.024), ('aitor', 0.024), ('injected', 0.024), ('shnarch', 0.024), ('redirects', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

Author: Simone Paolo Ponzetto ; Roberto Navigli


2 0.55704415 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

Author: Roberto Navigli ; Simone Paolo Ponzetto

Abstract: In this paper we present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.

3 0.36143199 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

Author: Celina Santamaria ; Julio Gonzalo ; Javier Artiles

Abstract: Is it possible to use sense inventories to improve Web search results diversity for one-word queries? To answer this question, we focus on two broad-coverage lexical resources of a different nature: WordNet, as a de-facto standard used in Word Sense Disambiguation experiments; and Wikipedia, as a large coverage, updated encyclopaedic resource which may have a better coverage of relevant senses in Web pages. Our results indicate that (i) Wikipedia has a much better coverage of search results, (ii) the distribution of senses in search results can be estimated using the internal graph structure of the Wikipedia and the relative number of visits received by each sense in Wikipedia, and (iii) associating Web pages to Wikipedia senses with simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings. 1 Motivation. The application of Word Sense Disambiguation (WSD) to Information Retrieval (IR) has been the subject of a significant research effort in the recent past. The essential idea is that, by indexing and matching word senses (or even meanings), the retrieval process could better handle polysemy and synonymy problems (Sanderson, 2000). In practice, however, there are two main difficulties: (i) for long queries, IR models implicitly perform disambiguation, and thus there is little room for improvement. This is the case with most standard IR benchmarks, such as TREC (trec.nist.gov) or CLEF (www.clef-campaign.org) ad-hoc collections; (ii) for very short queries, disambiguation may not be possible or even desirable. This is often the case with one-word and even two-word queries in Web search engines.
In Web search, there are at least three ways of coping with ambiguity:
• Promoting diversity in the search results (Clarke et al., 2008): given the query "oasis", the search engine may try to include representatives for different senses of the word (such as the Oasis band, the Organization for the Advancement of Structured Information Standards, the online fashion store, etc.) among the top results. Search engines are supposed to handle diversity as one of the multiple factors that influence the ranking.
• Presenting the results as a set of (labelled) clusters rather than as a ranked list (Carpineto et al., 2009).
• Complementing search results with search suggestions (e.g. "oasis band", "oasis fashion store") that serve to refine the query in the intended way (Anick, 2003).
All of them rely on the ability of the search engine to cluster search results, detecting topic similarities. In all of them, disambiguation is implicit, a side effect of the process but not its explicit target. Clustering may detect that documents about the Oasis band and the Oasis fashion store deal with unrelated topics, but it may as well detect a group of documents discussing why one of the Oasis band members is leaving the band, and another group of documents about Oasis band lyrics; both are different aspects of the broad topic Oasis band. A perfect hierarchical clustering should distinguish between the different Oasis senses at a first level, and then discover different topics within each of the senses. Is it possible to use sense inventories to improve search results for one-word queries? To answer this question, we will focus on two broad-coverage lexical resources of a different nature: WordNet (Miller et al., 1990), as a de-facto standard used in Word Sense Disambiguation experiments and many other Natural Language Processing research fields; and Wikipedia (www.wikipedia.org), as a large coverage and updated encyclopedic resource which may have a better coverage of relevant senses in Web pages. Our hypothesis is that, under appropriate conditions, any of the above mechanisms (clustering, search suggestions, diversity) might benefit from an explicit disambiguation (classification of pages in the top search results) using a wide-coverage sense inventory. Our research is focused on four relevant aspects of the problem:
1. Coverage: Are Wikipedia/Wordnet senses representative of search results? Otherwise, trying to make a disambiguation in terms of a fixed sense inventory would be meaningless.
2. If the answer to (1) is positive, the reverse question is also interesting: can we estimate search results diversity using our sense inventories?
3. Sense frequencies: knowing sense frequencies in (search results) Web pages is crucial to have a usable sense inventory. Is it possible to estimate Web sense frequencies from currently available information?
4. Classification: The association of Web pages to word senses must be done with some unsupervised algorithm, because it is not possible to hand-tag training material for every possible query word. Can this classification be done accurately? Can it be effective to promote diversity in search results?
In order to provide an initial answer to these questions, we have built a corpus consisting of 40 nouns and 100 Google search results per noun, manually annotated with the most appropriate Wordnet and Wikipedia senses.
(Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1357–136…, Uppsala, Sweden, 11–16 July 2010. ©2010 Association for Computational Linguistics)
Section 2 describes how this corpus has been created, and in Section 3 we discuss WordNet and Wikipedia coverage of search results according to our testbed. As these initial results clearly discard Wordnet as a sense inventory for the task, the rest of the paper mainly focuses on Wikipedia. In Section 4 we estimate search results diversity from our testbed, finding that the use of Wikipedia could substantially improve diversity in the top results. In Section 5 we use the Wikipedia internal link structure and the number of visits per page to estimate relative frequencies for Wikipedia senses, obtaining an estimation which is highly correlated with actual data in our testbed. Finally, in Section 6 we discuss a few strategies to classify Web pages into word senses, and apply the best classifier to enhance diversity in search results. The paper concludes with a discussion of related work (Section 7) and an overall discussion of our results in Section 8. 2 Test Set. 2.1 Set of Words. The most crucial step in building our test set is choosing the set of words to be considered. We are looking for words which are likely to form a one-word query for a Web search engine, and therefore we should focus on nouns which are used to denote one or more named entities. At the same time we want to have some degree of comparability with previous research on Word Sense Disambiguation, which points to noun sets used in Senseval/SemEval evaluation campaigns¹. Our budget for corpus annotation was enough for two person-months, which limited us to 40 nouns (usually enough to establish statistically significant differences between WSD algorithms, although obviously too limited to reach solid figures about the general behaviour of words in the Web).
With these arguments in mind, we decided to choose: (i) 15 nouns from the Senseval-3 lexical sample dataset, which have been previously employed by Mihalcea (2007) in a related experiment (see Section 7); (ii) 25 additional words which satisfy two conditions: they are all ambiguous, and they are all names for music bands in one of their senses (not necessarily the most salient). The Senseval set is: {argument, arm, atmosphere, bank, degree, difference, disc, image, paper, party, performance, plan, shelter, sort, source}. The bands set is {amazon, apple, camel, cell, columbia, cream, foreigner, fox, genesis, jaguar, oasis, pioneer, police, puma, rainbow, shell, skin, sun, tesla, thunder, total, traffic, trapeze, triumph, yes}. For each noun, we looked up all its possible senses in WordNet 3.0 and in Wikipedia (using Wikipedia disambiguation pages). Wikipedia has an average of 22 senses per noun (25.2 in the Bands set and 16.1 in the Senseval set), and Wordnet a much smaller figure, 4.5 (3.12 for the Bands set and 6.13 for the Senseval set). For a conventional dictionary, a higher ambiguity might indicate an excess of granularity; for an encyclopaedic resource such as Wikipedia, however, it is just an indication of larger coverage. Wikipedia entries for camel which are not in WordNet, for instance, include the Apache Camel routing and mediation engine, the British rock band, the brand of cigarettes, the river in Cornwall, and the World War I fighter biplane.

Table 1: Coverage of Search Results: Wikipedia vs. WordNet
                 Wikipedia                          WordNet
                 # senses         # documents       # senses         # documents
                 available/used   assigned to       available/used   assigned to
                                  some sense                         some sense
Senseval set     242/100          877 (59%)         92/52            696 (46%)
Bands set        640/174          1358 (54%)        78/39            599 (24%)
Total            882/274          2235 (56%)        170/91           1295 (32%)

2.2 Set of Documents. We retrieved the 150 top-ranked documents for each noun, by submitting the nouns as queries to a Web search engine (Google).

¹ http://senseval.org
Then, for each document, we stored both the snippet (small description of the contents of the retrieved document) and the whole HTML document. This collection of documents contains an implicit new inventory of senses, based on Web search, as documents retrieved by a noun query are associated with some sense of the noun. Given that every document in the top Web search results is supposed to be highly relevant for the query word, we assume a "one sense per document" scenario, although we allow annotators to assign more than one sense per document. In general this assumption turned out to be correct except in a few exceptional cases (such as Wikipedia disambiguation pages): only nine documents received more than one WordNet sense, and 44 (1.1% of all annotated pages) received more than one Wikipedia sense. 2.3 Manual Annotation. We implemented an annotation interface which stored all documents and a short description for every Wordnet and Wikipedia sense. The annotators had to decide, for every document, whether there was one or more appropriate senses in each of the dictionaries. They were instructed to provide annotations for 100 documents per name; if a URL in the list was corrupt or not available, it had to be discarded. We provided 150 documents per name to ensure that the figure of 100 usable documents per name could be reached without problems. Each judge provided annotations for the 4,000 documents in the final data set. In a second round, they met and discussed their independent annotations together, reaching a consensus judgement for every document. 3 Coverage of Web Search Results: Wikipedia vs Wordnet. Table 1 shows how Wikipedia and Wordnet cover the senses in search results. We report each noun subset separately (Senseval and bands subsets) as well as aggregated figures. The most relevant fact is that, unsurprisingly, Wikipedia senses cover many more search results (56%) than Wordnet (32%).
If we focus on the top ten results, in the Bands subset (which should be more representative of plausible Web queries) Wikipedia covers 68% of the top ten documents. This is an indication that it can indeed be useful for promoting diversity or helping to cluster search results: even if 32% of the top ten documents are not covered by Wikipedia, it is still a representative source of senses in the top search results. We have manually examined all documents in the top ten results that are not covered by Wikipedia: a majority of the missing senses consists of names of (generally not well-known) companies (45%) and products or services (26%); the other frequent type (12%) of non-annotated document is disambiguation pages (from Wikipedia and also from other dictionaries). It is also interesting to examine the degree of overlap between Wikipedia and WordNet senses. Being two different types of lexical resource, they might have some degree of complementarity. Table 2 shows, however, that this is not the case: most of the (annotated) documents either fit Wikipedia senses only (26%) or both Wikipedia and WordNet (29%), and just 3% fit WordNet only.
Table 2: Overlap between Wikipedia and WordNet in Search Results (# documents annotated with)
                 Wikipedia & WordNet   Wikipedia only   WordNet only   none
Senseval set      607 (40%)             270 (18%)         89 (6%)        534 (36%)
Bands set         572 (23%)             786 (31%)         27 (1%)       1115 (45%)
Total            1179 (29%)            1056 (26%)        116 (3%)       1649 (41%)
Therefore, Wikipedia seems to extend the coverage of WordNet rather than providing complementary sense information. If we wanted to extend the coverage of Wikipedia, the best strategy seems to be to consider lists of companies, products and services, rather than complementing Wikipedia with additional sense inventories.
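Table 2 cross-tabulates, for every annotated document, whether it fits some Wikipedia sense, some WordNet sense, both, or neither. A minimal Python sketch of that bookkeeping follows. It assumes each document's annotation is stored as a pair of sets of sense labels, one set per inventory. The sense labels in the usage block are hypothetical placeholders, not the testbed's actual identifiers.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# One annotated document: (Wikipedia senses that fit it, WordNet senses that fit it).
# An empty set means that no sense of that inventory fits the document.
Annotation = Tuple[Set[str], Set[str]]


def overlap_category(wiki: Set[str], wordnet: Set[str]) -> str:
    """Map one document to the Table 2 column it is counted in."""
    if wiki and wordnet:
        return "Wikipedia & WordNet"
    if wiki:
        return "Wikipedia only"
    if wordnet:
        return "WordNet only"
    return "none"


def overlap_table(docs: Iterable[Annotation]) -> Counter:
    """Count documents per overlap category."""
    return Counter(overlap_category(wiki, wn) for wiki, wn in docs)


def share_without_wikipedia(counts: Counter) -> float:
    """Fraction of documents that fit no Wikipedia sense (WordNet only + none)."""
    total = sum(counts.values())
    return (counts["WordNet only"] + counts["none"]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical annotations for four search results of the query "camel".
    sample = [
        ({"Camel (cigarette)"}, set()),
        ({"Camel (band)"}, set()),
        ({"Camel"}, {"camel/animal"}),
        (set(), set()),
    ]
    counts = overlap_table(sample)
    print(dict(counts), share_without_wikipedia(counts))
```

Summing "WordNet only" and "none" gives the share of results that a Wikipedia-based classifier cannot label correctly.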
4 Diversity in Google Search Results
Now that we know that Wikipedia senses are a representative subset of actual Web senses (covering more than half of the documents retrieved by the search engine), we can test how well search results respect diversity in terms of this subset of senses. Table 3 displays the number of different senses found at different depths in the search results rank, and the average proportion of total senses that they represent. These results suggest that diversity is not a major priority for ranking results: the top ten results only cover, on average, 3 Wikipedia senses (while the average number of senses listed in Wikipedia is 22). When considering the first 100 documents, this number grows to 6.85 senses per noun. Another relevant figure is the frequency of the most frequent sense for each word: on average, 63% of the pages in search results belong to the most frequent sense of the query word. This is roughly comparable with most frequent sense figures in standard annotated corpora such as SemCor (Miller et al., 1993) and the Senseval/SemEval data sets, which suggests that diversity may not play a major role in the current Google ranking algorithm. Of course this result must be taken with care, because variability between words is high and unpredictable, and we are using only 40 nouns for our experiment. But what we have is a positive indication that Wikipedia could be used to improve diversity or cluster search results: potentially the top ten results could cover 6.15 different senses on average (see Section 6.5), which would be a substantial growth.
5 Sense Frequency Estimators for Wikipedia
Wikipedia disambiguation pages contain no systematic information about the relative importance of senses for a given word. Such information, however, is crucial in a lexicon, because sense distributions tend to be skewed, and knowing them can help disambiguation algorithms.
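The diversity statistics behind Table 3 and the most-frequent-sense share can be computed from the ranked list of sense annotations for each query. The Python sketch below is one plausible reading, and the function names are hypothetical. Two details are assumptions. First, the total sense count used for the proportion is left to the caller. Second, the most-frequent-sense share is computed over sense-annotated results only.

```python
from collections import Counter
from typing import List, Optional

# ranked[i] is the Wikipedia sense annotated for the result at rank i + 1,
# or None when no Wikipedia sense fits that result.
Ranked = List[Optional[str]]


def senses_at_depth(ranked: Ranked, k: int) -> int:
    """Number of distinct Wikipedia senses among the first k results."""
    return len({s for s in ranked[:k] if s is not None})


def proportion_at_depth(ranked: Ranked, k: int, n_total_senses: int) -> float:
    """Distinct senses in the first k results relative to a total sense count."""
    return senses_at_depth(ranked, k) / n_total_senses


def mfs_share(ranked: Ranked) -> float:
    """Share of annotated results that belong to the most frequent sense."""
    annotated = [s for s in ranked if s is not None]
    if not annotated:
        return 0.0
    _, top_count = Counter(annotated).most_common(1)[0]
    return top_count / len(annotated)
```

Averaging `senses_at_depth` over the 40 query nouns at depths such as 10 and 100 gives per-depth figures of the kind reported in Table 3.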
We have attempted to use two estimators of expected sense distribution:
• Internal relevance of a word sense, measured as the number of incoming links to the URL of a given sense in Wikipedia.
• External relevance of a word sense, measured as the number of visits to the URL of a given sense (as reported in http://stats.grok.se).
The number of internal incoming links is expected to be relatively stable for Wikipedia articles. As for the number of visits, we compared the number of visits received by the Bands noun subset in May, June and July 2009, finding a stable-enough scenario with one notable exception: the number of visits to the noun Tesla rose dramatically in July, because July 10 was the anniversary of the birth of Nikola Tesla, and a special Google logo directed users to the Wikipedia page for the scientist. We have measured the correlation between the relative frequencies derived from these two indicators and the actual relative frequencies in our testbed. To do so, for each noun w and for each sense wi, we consider three values: (i) the proportion of documents retrieved for w which are manually assigned to each sense wi; (ii) inlinks(wi): the relative amount of incoming links to each sense wi; and (iii) visits(wi): the relative number of visits to the URL for each sense wi. We have measured the correlation between these three values using a linear regression correlation coefficient, which gives a correlation value of .54 for the number of visits and of .71 for the number of incoming links. Both estimators seem to be positively correlated with real relative frequencies in our testbed, with a strong preference for the number of links.
Table 3: Diversity in Search Results according to Wikipedia
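This correlation analysis compares, for every (noun, sense) pair, the actual share of documents with the share predicted by each indicator. A Python sketch follows. It assumes that raw counts are normalised per noun, and that the "linear regression correlation coefficient" is Pearson's r computed over all pooled (noun, sense) pairs. The counts in the usage block are invented for illustration.

```python
import math
from typing import Dict, List

Counts = Dict[str, Dict[str, float]]  # noun -> sense -> raw count


def relative(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalise the raw counts for the senses of one noun so they sum to 1."""
    total = sum(counts.values())
    return {s: (c / total if total else 0.0) for s, c in counts.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def pooled_correlation(actual: Counts, indicator: Counts) -> float:
    """Correlate per-noun relative frequencies over all (noun, sense) pairs."""
    xs: List[float] = []
    ys: List[float] = []
    for noun, doc_counts in actual.items():
        gold = relative(doc_counts)
        pred = relative(indicator.get(noun, {}))
        for sense, share in gold.items():
            xs.append(share)
            ys.append(pred.get(sense, 0.0))
    return pearson(xs, ys)


if __name__ == "__main__":
    # Hypothetical counts for one noun: documents per sense vs. incoming links.
    docs = {"tesla": {"scientist": 60, "car": 30, "band": 10}}
    inlinks = {"tesla": {"scientist": 1200, "car": 500, "band": 300}}
    print(round(pooled_correlation(docs, inlinks), 3))
```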
We have experimented with weighted combinations of both indicators, using weights of the form (k, 1 − k), k ∈ {0, 0.1, 0.2, ..., 1}, reaching a maximal correlation of .73 for the following weights:
$$\mathrm{freq}(w_i) = 0.9 \cdot \mathrm{inlinks}(w_i) + 0.1 \cdot \mathrm{visits}(w_i) \qquad (1)$$
This weighted estimator provides a slight advantage over the use of incoming links only (.73 vs .71). Overall, we have an estimator which has a strong correlation with the distribution of senses in our testbed. In the next section we will test its utility for disambiguation purposes.
6 Association of Wikipedia Senses to Web Pages
We want to test whether the information provided by Wikipedia can be used to classify search results accurately. Note that we do not want to consider approaches that involve a manual creation of training material, because they can't be used in practice. Given a Web page p returned by the search engine for the query w, and the set of senses w1 ... wn listed in Wikipedia, the task is to assign the best candidate sense to p. We consider two different techniques:
• A basic Information Retrieval approach, where the documents and the Wikipedia pages are represented using a Vector Space Model (VSM) and compared with a standard cosine measure. This is a basic approach which, if successful, can be used efficiently to classify search results.
• An approach based on a state-of-the-art supervised WSD system, extracting training examples automatically from Wikipedia content.
We also compute two baselines:
• A random assignment of senses (precision is computed as the inverse of the number of senses, for every test case).
• A most frequent sense heuristic which uses our estimation of sense frequencies and assigns the same sense (the most frequent) to all documents.
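Equation (1) is a convex combination of the two relative indicators, with the weight chosen by a grid search over k. The Python sketch below reproduces that search under three assumptions: the indicator values are already normalised per noun and aligned with the gold proportions in the same (noun, sense) order; Pearson's r is the correlation measure; and the helper names are hypothetical. It also computes the expected precision of the random baseline as the mean of 1/n over test cases.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def combined(inlinks: Sequence[float], visits: Sequence[float], k: float) -> List[float]:
    """freq(w_i) = k * inlinks(w_i) + (1 - k) * visits(w_i); k = 0.9 gives Equation (1)."""
    return [k * a + (1 - k) * b for a, b in zip(inlinks, visits)]


def best_weight(gold: Sequence[float], inlinks: Sequence[float],
                visits: Sequence[float]) -> Tuple[float, float]:
    """Grid search over k in {0, 0.1, ..., 1}; returns (k, correlation)."""
    grid = [i / 10 for i in range(11)]
    scored = [(k, pearson(gold, combined(inlinks, visits, k))) for k in grid]
    return max(scored, key=lambda kr: kr[1])


def random_baseline_precision(senses_per_case: Sequence[int]) -> float:
    """Expected precision of a random sense assignment: mean of 1/n per test case."""
    return sum(1 / n for n in senses_per_case) / len(senses_per_case)
```

The most frequent sense baseline needs no extra code. It simply assigns, for every document of a noun, the sense with the largest value of `combined(..., 0.9)`.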
Both are naive baselines, but it must be noted that the most frequent sense heuristic is usually hard to beat for unsupervised WSD algorithms in most standard data sets. We now describe each of the two main approaches in detail.
6.1 VSM Approach
For each word sense, we represent its Wikipedia page in a (unigram) vector space model, assigning standard tf*idf weights to the words in the document. idf weights are computed in three different ways:
1. Experiment VSM computes inverse document frequencies in the collection of retrieved documents (for the word being considered).
2. Experiment VSM-GT uses the statistics provided by the Google Terabyte collection (Brants and Franz, 2006), i.e. it replaces the collection of documents with statistics from a representative snapshot of the Web.
3. Experiment VSM-mixed combines statistics from the collection and from the Google Terabyte collection, following (Chen et al., 2009).
The document p is represented in the same vector space as the Wikipedia senses, and it is compared with each of the candidate senses wi via the cosine similarity metric (we have experimented with other similarity metrics such as χ2, but the differences are negligible). The sense with the highest similarity to p is assigned to the document. In case of ties (which are rare), we pick the first sense in the Wikipedia disambiguation page (which in practice is like a random decision, because senses in disambiguation pages do not seem to be ordered according to any clear criterion). We have also tested a variant of this approach which uses the estimation of sense frequencies presented above: once the similarities are computed, we consider those cases where two or more senses have a similar score (in particular, all senses with a score greater than or equal to 80% of the highest score). In those cases, instead of using the small similarity differences to select a sense, we pick the one which has the largest frequency according to our estimator.
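The VSM classifier and its +freq variant can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code:

- Tokenisation is left to the caller.
- idf is estimated as log(N/df) over the collection of retrieved documents, as in the plain VSM run; the Google Terabyte statistics are not used.
- Raw term counts serve as tf.
- Dictionary insertion order stands in for the order of senses in the Wikipedia disambiguation page.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

Vector = Dict[str, float]


def idf_from_collection(docs: List[List[str]]) -> Dict[str, float]:
    """idf(t) = log(N / df(t)) over a collection of tokenised documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Raw term frequency times idf; terms without an idf value get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_sense(page: List[str], senses: Dict[str, List[str]], idf: Dict[str, float],
                 freq: Optional[Dict[str, float]] = None, near_best: float = 0.8) -> str:
    """Pick the Wikipedia sense whose page is most similar to `page`.

    `senses` maps sense -> tokens of its Wikipedia page, in disambiguation-page order.
    With `freq`, every sense scoring >= near_best * best score is a candidate and the
    candidate with the highest estimated frequency wins (the +freq variant). If all
    scores are 0, every sense is a candidate, so this falls back to the most frequent sense.
    """
    p = tfidf(page, idf)
    scores = {s: cosine(p, tfidf(toks, idf)) for s, toks in senses.items()}
    if freq is None:
        # max() keeps the first of equal maxima, i.e. the earliest sense on ties.
        return max(scores, key=scores.__getitem__)
    best = max(scores.values())
    candidates = [s for s, sc in scores.items() if sc >= near_best * best]
    return max(candidates, key=lambda s: freq.get(s, 0.0))
```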
We have applied this strategy to the best performing system, VSM-GT, resulting in experiment VSM-GT+freq.
6.2 WSD Approach
We have used TiMBL (Daelemans et al., 2001), a state-of-the-art supervised WSD system which uses Memory-Based Learning. The key, in this case, is how to extract learning examples from Wikipedia automatically. For each word sense, we basically have three sources of examples: (i) occurrences of the word in the Wikipedia page for the word sense; (ii) occurrences of the word in Wikipedia pages pointing to the page for the word sense; (iii) occurrences of the word in external pages linked in the Wikipedia page for the word sense. After an initial manual inspection, we decided to discard external pages for being too noisy, and we focused on the first two options. We tried three alternatives:
• TiMBL-core uses only the examples found in the page for the sense being trained.
• TiMBL-inlinks uses the examples found in Wikipedia pages pointing to the sense being trained.
• TiMBL-all uses both sources of examples.
In order to classify a page p with respect to the senses for a word w, we first disambiguate all occurrences of w in the page p. Then we choose the sense which appears most frequently in the page according to TiMBL results. In case of ties we pick the first sense listed in the Wikipedia disambiguation page. We have also experimented with a variant of the approach that uses our estimation of sense frequencies, similarly to what we did with the VSM approach. In this case, (i) when there is a tie between two or more senses (which is much more likely than in the VSM approach), we pick the sense with the highest frequency according to our estimator; and (ii) when no sense reaches 30% of the cases in the page to be disambiguated, we also resort to the most frequent sense heuristic (among the candidates for the page).
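Once TiMBL has labelled each occurrence of w in a page, the page-level decision is a vote with tie-breaking rules. The Python sketch below implements those rules for TiMBL-core and TiMBL-core+freq. It takes the occurrence labels as given and makes no TiMBL call. Two details are assumptions. "Candidates for the page" is read as the senses predicted at least once in that page. A page with no disambiguated occurrence falls back to the overall most frequent sense when frequency estimates are available.

```python
from collections import Counter
from typing import Dict, List, Optional


def page_sense(occurrence_senses: List[str], sense_order: List[str],
               freq: Optional[Dict[str, float]] = None,
               min_share: float = 0.3) -> Optional[str]:
    """Choose one sense for a page from per-occurrence predictions.

    occurrence_senses: sense predicted for each occurrence of the word in the page
        (assumed to be drawn from sense_order).
    sense_order: senses in the order of the Wikipedia disambiguation page.
    freq: estimated sense frequencies (None reproduces the variant without +freq).
    """
    if not occurrence_senses:
        # Assumption: nothing to vote on -> most frequent sense, if estimates exist.
        return max(sense_order, key=lambda s: freq.get(s, 0.0)) if freq else None
    votes = Counter(occurrence_senses)
    top = max(votes.values())
    tied = [s for s in sense_order if votes[s] == top]
    if freq is None:
        return tied[0]  # first sense listed in the disambiguation page
    if top / len(occurrence_senses) < min_share:
        # No sense reaches 30% of the occurrences: most frequent sense among candidates.
        candidates = [s for s in sense_order if votes[s] > 0]
        return max(candidates, key=lambda s: freq.get(s, 0.0))
    return max(tied, key=lambda s: freq.get(s, 0.0))
```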
This experiment is called TiMBL-core+freq (we discarded the "inlinks" and "all" versions because they were clearly worse than "core").
6.3 Classification Results
Table 4 shows classification results. The accuracy of systems is reported as precision, i.e. the number of pages correctly classified divided by the total number of predictions. This is approximately the same as recall (correctly classified pages divided by total number of pages) for our systems, because the algorithms provide an answer for every page containing text (actual coverage is 94% because some pages only contain text as part of an image file, such as photographs and logotypes).
Table 4: Classification Results
Experiment                          Precision
random                              .19
most frequent sense (estimation)    .46
TiMBL-core                          .60
TiMBL-inlinks                       .50
TiMBL-all                           .58
TiMBL-core+freq                     .67
VSM                                 .67
VSM-GT                              .68
VSM-mixed                           .67
VSM-GT+freq                         .69
Figure 1: Precision/Coverage curves for VSM-GT+freq classification algorithm
All systems are significantly better than the random and most frequent sense baselines (using p < 0.05 for a standard t-test). Overall, both approaches (using TiMBL WSD machinery and using VSM) lead to similar results (.67 vs. .69), which would make VSM preferable because it is a simpler and more efficient approach. Taking a closer look at the results with TiMBL, there are a couple of interesting facts:
• There is a substantial difference between using only examples taken from the Wikipedia Web page for the sense being trained (TiMBL-core, .60) and using examples from the Wikipedia pages pointing to that page (TiMBL-inlinks, .50). Examples taken from related pages (even if the relationship is close, as in this case) seem to be too noisy for the task. This result is compatible with findings in (Santamaría et al., 2003) using the Open Directory Project to extract examples automatically.
• Our estimation of sense frequencies turns out to be very helpful for cases where our TiMBL-based algorithm cannot provide an answer: precision rises from .60 (TiMBL-core) to .67 (TiMBL-core+freq). The difference is statistically significant (p < 0.05) according to the t-test.
As for the experiments with VSM, the variations tested do not provide substantial improvements to the baseline (which is .67). Using idf frequencies obtained from the Google Terabyte corpus (instead of frequencies obtained from the set of retrieved documents) provides only a small improvement (VSM-GT, .68), and adding the estimation of sense frequencies gives another small improvement (.69). Comparing the baseline VSM with the optimal setting (VSM-GT+freq), the difference is small (.67 vs .69) but relatively robust (p = 0.066 according to the t-test). Remarkably, the use of frequency estimations is very helpful for the WSD approach but not for the VSM one, and they both end up with similar performance figures; this might indicate that using frequency estimations is only helpful up to a certain precision ceiling.
6.4 Precision/Coverage Trade-off
All the above experiments are done at maximal coverage, i.e., all systems assign a sense for every document in the test collection (at least for every document with textual content). But it is possible to enhance search results diversity without annotating every document (in fact, not every document can be assigned to a Wikipedia sense, as we have discussed in Section 3). Thus, it is useful to investigate the precision/coverage trade-off in our dataset. We have experimented with the best performing system (VSM-GT+freq), introducing a similarity threshold: assignment of a document to a sense is only done if the similarity of the document to the Wikipedia page for the sense exceeds the similarity threshold.
We have computed precision and coverage for every threshold in the range [0.00, 0.90] (beyond 0.90 coverage was null) and represented the results in Figure 1 (solid line). The graph shows that we can classify around 20% of the documents with a precision above .90, and around 60% of the documents with a precision of .80. Note that we are reporting disambiguation results using a conventional WSD test set, i.e., one in which every test case (every document) has been manually assigned to some Wikipedia sense. But in our Web Search scenario, 44% of the documents were not assigned to any Wikipedia sense: in practice, our classification algorithm would have to cope with all this noise as well. Figure 1 (dotted line) shows how the precision/coverage curve is affected when the algorithm attempts to disambiguate all documents retrieved by Google, whether they can in fact be assigned to a Wikipedia sense or not. At a coverage of 20%, precision drops approximately from .90 to .70, and at a coverage of 60% it drops from .80 to .50. We now address the question of whether this performance is good enough to improve search results diversity in practice.
6.5 Using Classification to Promote Diversity
We now want to estimate how well the reported classification accuracy may translate, in practice, into more diverse search results. In order to provide an initial answer to this question, we have re-ranked the documents for the 40 nouns in our testbed, using our best classifier (VSM-GT+freq) and making a list of the top-ten documents with the primary criterion of maximising the number of senses represented in the set, and the secondary criterion of maximising the similarity scores of the documents to their assigned senses.
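The threshold sweep behind Figure 1 can be expressed compactly. The Python sketch below assumes each classified document is stored as a pair: its similarity to the assigned sense, and whether the assignment is correct. Coverage is measured against all documents passed in, and precision only over those above the threshold. The 0.05 step between thresholds is an assumption, since the text only gives the range.

```python
from typing import List, Optional, Tuple

Prediction = Tuple[float, bool]  # (similarity to the assigned sense, is the assignment correct?)


def precision_coverage(preds: List[Prediction],
                       threshold: float) -> Tuple[Optional[float], float]:
    """Keep only assignments whose similarity exceeds the threshold."""
    kept = [correct for sim, correct in preds if sim > threshold]
    coverage = len(kept) / len(preds) if preds else 0.0
    precision = sum(kept) / len(kept) if kept else None  # undefined when nothing is kept
    return precision, coverage


def sweep(preds: List[Prediction], step: float = 0.05, upper: float = 0.90):
    """(threshold, precision, coverage) for thresholds 0.00, step, ..., upper."""
    n_steps = int(round(upper / step))
    return [(round(i * step, 2), *precision_coverage(preds, round(i * step, 2)))
            for i in range(n_steps + 1)]
```

To approximate the dotted-line setting, documents that fit no Wikipedia sense would be added to `preds` with `correct=False`, so they lower precision whenever they pass the threshold.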
The algorithm proceeds as follows: we fill each position in the rank (starting at rank 1) with the document which has the highest similarity to one of the senses which are not yet represented in the rank; once all senses are represented, we start choosing a second representative for each sense, following the same criterion. The process goes on until the first ten documents are selected. We have also produced a number of alternative rankings for comparison purposes:
• clustering (centroids): this method applies Hierarchical Agglomerative Clustering (which proved to be the most competitive clustering algorithm in a similar task (Artiles et al., 2009)) to the set of search results, forcing the algorithm to create ten clusters. The centroid of each cluster is then selected as one of the top ten documents in the new rank.
• clustering (top ranked): applies the same clustering algorithm, but this time the top ranked document (in the original Google rank) of each cluster is selected.
• random: randomly selects ten documents from the set of retrieved results.
• upper bound: this is the maximal diversity that can be obtained in our testbed. Note that coverage is not 100%, because some words have more than ten meanings in Wikipedia and we are only considering the top ten documents.
All experiments have been applied on the full set of documents in the testbed, including those which could not be annotated with any Wikipedia sense. Coverage is computed as the ratio of senses that appear in the top ten results compared to the number of senses that appear in all search results. Results are presented in Table 5.
Table 5: Enhancement of Search Results Diversity
rank@10                    # senses   coverage
Original rank              2.80       49%
Wikipedia                  4.75       77%
clustering (centroids)     2.50       42%
clustering (top ranked)    2.80       46%
random                     2.45       43%
upper bound                6.15       97%
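Under our reading, the greedy re-ranking described at the start of Section 6.5 works in rounds. Each round takes the next-best document of every sense that still has one, ordered by similarity: the best document per sense in the first round, the second-best in the next, and so on. The Python sketch below implements that reading together with the coverage measure used in Table 5. The input format (document id, sense assigned by the classifier, similarity score) is an assumption, and no search-engine or classifier call is included.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[str, str, float]  # (document id, sense assigned by the classifier, similarity)


def diversify(docs: Iterable[Doc], k: int = 10) -> List[str]:
    """Greedy re-ranking: one representative per sense per round, best similarity first."""
    by_sense: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for doc_id, sense, sim in docs:
        by_sense[sense].append((sim, doc_id))
    for candidates in by_sense.values():
        candidates.sort(key=lambda c: c[0], reverse=True)

    ranked: List[str] = []
    depth = 0  # 0 = first representative of each sense, 1 = second, ...
    while len(ranked) < k:
        layer = [c[depth] for c in by_sense.values() if len(c) > depth]
        if not layer:
            break  # fewer than k documents overall
        layer.sort(key=lambda c: c[0], reverse=True)
        ranked.extend(doc_id for _, doc_id in layer[: k - len(ranked)])
        depth += 1
    return ranked


def sense_coverage(top_senses: Iterable[str], all_senses: Iterable[str]) -> float:
    """Senses present in the top results divided by senses present in all results."""
    everything = set(all_senses)
    return len(set(top_senses) & everything) / len(everything) if everything else 0.0
```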
Note that diversity in the top ten documents increases from an average of 2.80 Wikipedia senses represented in the original search engine rank to 4.75 in the modified rank (the upper bound being 6.15), with the coverage of senses going from 49% to 77%. With a simple VSM algorithm, the coverage of Wikipedia senses in the top ten results is 70% larger than in the original ranking. Using Wikipedia to enhance diversity seems to work much better than clustering: both strategies to select a representative from each cluster are unable to improve the diversity of the original ranking. Note, however, that our evaluation has a bias towards using Wikipedia, because only Wikipedia senses are considered to estimate diversity. Of course our results do not imply that the Wikipedia modified rank is better than the original Google rank: there are many other factors that influence the final ranking provided by a search engine. What our results indicate is that, with simple and efficient algorithms, Wikipedia can be used as a reference to improve search results diversity for one-word queries.
7 Related Work
Web search results clustering and diversity in search results are topics that receive increasing attention from the research community. Diversity is used both to represent sub-themes in a broad topic and to consider alternative interpretations for ambiguous queries (Agrawal et al., 2009), which is our interest here. Standard IR test collections do not usually consider ambiguous queries, and are thus inappropriate to test systems that promote diversity (Sanderson, 2008); only recently have appropriate test collections been built, such as (Paramita et al., 2009) for image search and (Artiles et al., 2009) for person name search. We see our testbed as complementary to these, and expect that it can contribute to fostering research on search results diversity.
To our knowledge, Wikipedia has not explicitly been used before to promote diversity in search results; but in (Gollapudi and Sharma, 2009), it is used as a gold standard to evaluate diversification algorithms: given a query with a Wikipedia disambiguation page, an algorithm is evaluated as promoting diversity when different documents in the search results are semantically similar to different Wikipedia pages (describing the alternative senses of the query). Although semantic similarity is measured automatically in this work, our results confirm that this evaluation strategy is sound, because Wikipedia senses are indeed representative of search results. (Clough et al., 2009) analyse query diversity in Microsoft Live Search, using click entropy and query reformulation as diversity indicators. It was found that at least 9.5% to 16.2% of queries could benefit from diversification, although no correlation was found between the number of senses of a word in Wikipedia and the indicators used to discover diverse queries. This result does not rule out, however, that queries for which diversification is useful can benefit from Wikipedia as a sense inventory. In the context of clustering, (Carmel et al., 2009) successfully employ Wikipedia to enhance automatic cluster labeling, finding that Wikipedia labels agree with manual labels associated by humans to a cluster much more than significant terms extracted directly from the text do. In a similar line, both (Gabrilovich and Markovitch, 2007) and (Syed et al., 2008) provide evidence suggesting that categories of Wikipedia articles can successfully describe common concepts in documents. In the field of Natural Language Processing, there have been successful attempts to connect Wikipedia entries to WordNet senses: (Ruiz-Casado et al., 2005) report an algorithm that provides an accuracy of 84%. (Mihalcea, 2007) uses internal Wikipedia hyperlinks to derive sense-tagged examples.
But instead of using Wikipedia directly as a sense inventory, Mihalcea then manually maps Wikipedia senses into WordNet senses (claiming that, at the time of writing the paper, Wikipedia did not consistently report ambiguity in disambiguation pages) and shows that a WSD system based on acquired sense-tagged examples reaches an accuracy well beyond an (informed) most frequent sense heuristic.
8 Conclusions
We have investigated whether generic lexical resources can be used to promote diversity in Web search results for one-word, ambiguous queries. We have compared WordNet and Wikipedia and arrived at a number of conclusions: (i) unsurprisingly, Wikipedia has a much better coverage of senses in search results, and is therefore more appropriate for the task; (ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia; and (iii) associating Web pages to Wikipedia senses with simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings. We expect that the testbed created for this research will complement the (currently short) set of benchmarking test sets to explore search results diversity and query ambiguity. Our testbed is publicly available for research purposes at http://nlp.uned.es. Our results endorse further investigation on the use of Wikipedia to organize search results. Some limitations of our research, however, must be noted: (i) the nature of our testbed (with every search result manually annotated in terms of two sense inventories) makes it too small to extract solid conclusions on Web searches; (ii) our work does not involve any study of diversity from the point of view of Web users (i.e.
when a Web query addresses many different user needs in practice); research in (Clough et al., 2009) suggests that word ambiguity in Wikipedia might not be related to diversity of search needs; (iii) we have tested our classifiers with a simple re-ordering of search results to test how much diversity can be improved, but a search results ranking depends on many other factors, some of them more crucial than diversity; it remains to be tested how we can use document/Wikipedia associations to improve search results clustering (for instance, providing seeds for the clustering process) and to provide search suggestions.
Acknowledgments
This work has been partially funded by the Spanish Government (project INES/Text-Mess) and the Xunta de Galicia.
References
R. Agrawal, S. Gollapudi, A. Halverson, and S. Leong. 2009. Diversifying Search Results. In Proc. of WSDM’09. ACM.
P. Anick. 2003. Using Terminological Feedback for Web Search Refinement: a Log-based Study. In Proc. ACM SIGIR 2003, pages 88–95. ACM New York, NY, USA.
J. Artiles, J. Gonzalo, and S. Sekine. 2009. WePS 2 Evaluation Campaign: overview of the Web People Search Clustering Task. In 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference.
T. Brants and A. Franz. 2006. Web 1T 5-gram, version 1. Philadelphia: Linguistic Data Consortium.
D. Carmel, H. Roitman, and N. Zwerdling. 2009. Enhancing Cluster Labeling using Wikipedia. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 139–146. ACM.
C. Carpineto, S. Osinski, G. Romano, and D. Weiss. 2009. A Survey of Web Clustering Engines. ACM Computing Surveys, 41(3).
Y. Chen, S. Yat Mei Lee, and C. Huang. 2009. PolyUHK: A Robust Information Extraction System for Web Personal Names. In Proc. WWW’09 (WePS2 Workshop). ACM.
C. Clarke, M. Kolla, G. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. 2008. Novelty and Diversity in Information Retrieval Evaluation. In Proc. SIGIR ’08, pages 659–666. ACM.
P. Clough, M. Sanderson, M. Abouammoh, S. Navarro, and M. Paramita. 2009. Multiple Approaches to Analysing Query Diversity. In Proc. of SIGIR 2009. ACM.
W. Daelemans, J. Zavrel, K. van der Sloot, and A. van den Bosch. 2001. TiMBL: Tilburg Memory Based Learner, version 4.0, Reference Guide. Technical report, University of Antwerp.
E. Gabrilovich and S. Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In Proceedings of The 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India.
S. Gollapudi and A. Sharma. 2009. An Axiomatic Approach for Result Diversification. In Proc. WWW 2009, pages 381–390. ACM New York, NY, USA.
R. Mihalcea. 2007. Using Wikipedia for Automatic Word Sense Disambiguation. In Proceedings of NAACL HLT, volume 2007.
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1990. Wordnet: An on-line lexical database. International Journal of Lexicography, 3(4).
G. A. Miller, C. Leacock, R. Tengi, and R. T. Bunker. 1993. A Semantic Concordance. In Proceedings of the ARPA Workshop on Human Language Technology. San Francisco, Morgan Kaufmann.
M. Paramita, M. Sanderson, and P. Clough. 2009. Diversity in Photo Retrieval: Overview of the ImageCLEFPhoto task 2009. CLEF working notes, 2009.
M. Ruiz-Casado, E. Alfonseca, and P. Castells. 2005. Automatic Assignment of Wikipedia Encyclopaedic Entries to Wordnet Synsets. Advances in Web Intelligence, 3528:380–386.
M. Sanderson. 2000. Retrieving with Good Sense. Information Retrieval, 2(1):49–69.
M. Sanderson. 2008. Ambiguous Queries: Test Collections Need More Sense. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 499–506. ACM New York, NY, USA.
C. Santamaría, J. Gonzalo, and F. Verdejo. 2003. Automatic Association of Web Directories to Word Senses. Computational Linguistics, 29(3):485–502.
Z. S. Syed, T. Finin, and A. Joshi. 2008. Wikipedia as an Ontology for Describing Documents. In Proc. ICWSM’08.

4 0.24993844 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

5 0.2154263 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Word sense disambiguation (WSD) systems based on supervised learning achieved the best performance in SensEval and SemEval workshops. However, there are few publicly available open source WSD systems. This limits the use of WSD in other applications, especially for researchers whose research interests are not in WSD. In this paper, we present IMS, a supervised English all-words WSD system. The flexible framework of IMS allows users to integrate different preprocessing tools, additional features, and different classifiers. By default, we use linear support vector machines as the classifier with multiple knowledge-based features. In our implementation, IMS achieves state-of-the-art results on several SensEval and SemEval tasks.

6 0.20792159 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

7 0.19308923 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

8 0.18022312 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

9 0.16741806 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

10 0.11578937 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

11 0.11422606 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

12 0.11375637 121 acl-2010-Generating Entailment Rules from FrameNet

13 0.1073192 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

14 0.10463199 185 acl-2010-Open Information Extraction Using Wikipedia

15 0.098842628 141 acl-2010-Identifying Text Polarity Using Random Walks

16 0.08832293 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

17 0.084944226 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia

18 0.08278852 27 acl-2010-An Active Learning Approach to Finding Related Terms

19 0.081955574 159 acl-2010-Learning 5000 Relational Extractors

20 0.081645377 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.204), (1, 0.149), (2, -0.096), (3, -0.005), (4, 0.472), (5, 0.074), (6, 0.252), (7, 0.177), (8, -0.105), (9, 0.085), (10, 0.123), (11, -0.242), (12, -0.052), (13, 0.103), (14, -0.111), (15, -0.075), (16, -0.009), (17, 0.062), (18, -0.043), (19, -0.002), (20, -0.143), (21, 0.021), (22, -0.038), (23, -0.029), (24, 0.046), (25, 0.032), (26, 0.057), (27, 0.003), (28, -0.019), (29, 0.012), (30, 0.006), (31, -0.018), (32, 0.063), (33, 0.023), (34, 0.008), (35, -0.049), (36, -0.002), (37, 0.047), (38, 0.02), (39, -0.008), (40, 0.019), (41, -0.067), (42, -0.06), (43, -0.077), (44, -0.011), (45, 0.031), (46, -0.067), (47, -0.037), (48, -0.047), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97565019 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

Author: Simone Paolo Ponzetto ; Roberto Navigli

Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

2 0.94902188 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

Author: Roberto Navigli ; Simone Paolo Ponzetto

Abstract: In this paper we present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.

3 0.87550569 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

Author: Celina Santamaria ; Julio Gonzalo ; Javier Artiles

Abstract: Is it possible to use sense inventories to improve Web search results diversity for one-word queries? To answer this question, we focus on two broad-coverage lexical resources of a different nature: WordNet, as a de-facto standard used in Word Sense Disambiguation experiments; and Wikipedia, as a large-coverage, updated encyclopaedic resource which may have a better coverage of relevant senses in Web pages. Our results indicate that (i) Wikipedia has a much better coverage of search results, (ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia, and (iii) by associating Web pages to Wikipedia senses with simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.

1 Motivation
The application of Word Sense Disambiguation (WSD) to Information Retrieval (IR) has been the subject of significant research effort in the recent past. The essential idea is that, by indexing and matching word senses (or even meanings), the retrieval process could better handle polysemy and synonymy problems (Sanderson, 2000). In practice, however, there are two main difficulties: (i) for long queries, IR models implicitly perform disambiguation, and thus there is little room for improvement. This is the case with most standard IR benchmarks, such as TREC (trec.nist.gov) or CLEF (www.clef-campaign.org) ad-hoc collections; (ii) for very short queries, disambiguation may not be possible or even desirable. This is often the case with one-word and even two-word queries in Web search engines.
In Web search, there are at least three ways of coping with ambiguity:
• Promoting diversity in the search results (Clarke et al., 2008): given the query "oasis", the search engine may try to include representatives for different senses of the word (such as the Oasis band, the Organization for the Advancement of Structured Information Standards, the online fashion store, etc.) among the top results. Search engines are supposed to handle diversity as one of the multiple factors that influence the ranking.
• Presenting the results as a set of (labelled) clusters rather than as a ranked list (Carpineto et al., 2009).
• Complementing search results with search suggestions (e.g. "oasis band", "oasis fashion store") that serve to refine the query in the intended way (Anick, 2003).
All of them rely on the ability of the search engine to cluster search results, detecting topic similarities. In all of them, disambiguation is implicit, a side effect of the process but not its explicit target. Clustering may detect that documents about the Oasis band and the Oasis fashion store deal with unrelated topics, but it may as well detect a group of documents discussing why one of the Oasis band members is leaving the band, and another group of documents about Oasis band lyrics; both are different aspects of the broad topic Oasis band. A perfect hierarchical clustering should distinguish between the different Oasis senses at a first level, and then discover different topics within each of the senses.

Is it possible to use sense inventories to improve search results for one-word queries? To answer this question, we will focus on two broad-coverage lexical resources of a different nature: WordNet (Miller et al., 1990), as a de-facto standard used in Word Sense Disambiguation experiments and many other Natural Language Processing research fields; and Wikipedia (www.wikipedia.org), as a large-coverage and updated encyclopedic resource which may have a better coverage of relevant senses in Web pages. Our hypothesis is that, under appropriate conditions, any of the above mechanisms (clustering, search suggestions, diversity) might benefit from an explicit disambiguation (classification of pages in the top search results) using a wide-coverage sense inventory. Our research is focused on four relevant aspects of the problem:
1. Coverage: Are Wikipedia/WordNet senses representative of search results? Otherwise, trying to make a disambiguation in terms of a fixed sense inventory would be meaningless.
2. If the answer to (1) is positive, the reverse question is also interesting: can we estimate search results diversity using our sense inventories?
3. Sense frequencies: knowing sense frequencies in (search results) Web pages is crucial to have a usable sense inventory. Is it possible to estimate Web sense frequencies from currently available information?
4. Classification: The association of Web pages to word senses must be done with some unsupervised algorithm, because it is not possible to hand-tag training material for every possible query word. Can this classification be done accurately? Can it be effective to promote diversity in search results?
In order to provide an initial answer to these questions, we have built a corpus consisting of 40 nouns and 100 Google search results per noun, manually annotated with the most appropriate WordNet and Wikipedia senses.

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1357–136, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics
Section 2 describes how this corpus has been created, and in Section 3 we discuss WordNet and Wikipedia coverage of search results according to our testbed. As these initial results clearly discard WordNet as a sense inventory for the task, the rest of the paper mainly focuses on Wikipedia. In Section 4 we estimate search results diversity from our testbed, finding that the use of Wikipedia could substantially improve diversity in the top results. In Section 5 we use the Wikipedia internal link structure and the number of visits per page to estimate relative frequencies for Wikipedia senses, obtaining an estimation which is highly correlated with actual data in our testbed. Finally, in Section 6 we discuss a few strategies to classify Web pages into word senses, and apply the best classifier to enhance diversity in search results. The paper concludes with a discussion of related work (Section 7) and an overall discussion of our results in Section 8.

2 Test Set
2.1 Set of Words
The most crucial step in building our test set is choosing the set of words to be considered. We are looking for words which are likely to form a one-word query for a Web search engine, and therefore we should focus on nouns which are used to denote one or more named entities. At the same time we want to have some degree of comparability with previous research on Word Sense Disambiguation, which points to noun sets used in Senseval/SemEval evaluation campaigns [1]. Our budget for corpus annotation was enough for two person-months, which limited us to handling 40 nouns (usually enough to establish statistically significant differences between WSD algorithms, although obviously too few to reach solid figures about the general behaviour of words in the Web).
With these arguments in mind, we decided to choose: (i) 15 nouns from the Senseval-3 lexical sample dataset, which have been previously employed by (Mihalcea, 2007) in a related experiment (see Section 7); (ii) 25 additional words which satisfy two conditions: they are all ambiguous, and they are all names for music bands in one of their senses (not necessarily the most salient). The Senseval set is: {argument, arm, atmosphere, bank, degree, difference, disc, image, paper, party, performance, plan, shelter, sort, source}. The bands set is {amazon, apple, camel, cell, columbia, cream, foreigner, fox, genesis, jaguar, oasis, pioneer, police, puma, rainbow, shell, skin, sun, tesla, thunder, total, traffic, trapeze, triumph, yes}. For each noun, we looked up all its possible senses in WordNet 3.0 and in Wikipedia (using Wikipedia disambiguation pages). Wikipedia has an average of 22 senses per noun (25.2 in the Bands set and 16.1 in the Senseval set), and WordNet a much smaller figure, 4.5 (3.12 for the Bands set and 6.13 for the Senseval set). For a conventional dictionary, a higher ambiguity might indicate an excess of granularity; for an encyclopaedic resource such as Wikipedia, however, it is just an indication of larger coverage. Wikipedia entries for camel which are not in WordNet, for instance, include the Apache Camel routing and mediation engine, the British rock band, the brand of cigarettes, the river in Cornwall, and the World War I fighter biplane.

[1] http://senseval.org

Table 1: Coverage of Search Results: Wikipedia vs. WordNet
(# senses = available/used; # documents = documents assigned to some sense)
                 Wikipedia                    WordNet
                 # senses    # documents      # senses    # documents
Senseval set     242/100     877 (59%)        92/52       696 (46%)
Bands set        640/174     1358 (54%)       78/39       599 (24%)
Total            882/274     2235 (56%)       170/91      1295 (32%)

2.2 Set of Documents
We retrieved the 150 first ranked documents for each noun, by submitting the nouns as queries to a Web search engine (Google).
Then, for each document, we stored both the snippet (small description of the contents of the retrieved document) and the whole HTML document. This collection of documents contains an implicit new inventory of senses, based on Web search, as documents retrieved by a noun query are associated with some sense of the noun. Given that every document in the top Web search results is supposed to be highly relevant for the query word, we assume a "one sense per document" scenario, although we allow annotators to assign more than one sense per document. In general this assumption turned out to be correct except in a few exceptional cases (such as Wikipedia disambiguation pages): only nine documents received more than one WordNet sense, and 44 (1.1% of all annotated pages) received more than one Wikipedia sense.

2.3 Manual Annotation
We implemented an annotation interface which stored all documents and a short description for every WordNet and Wikipedia sense. The annotators had to decide, for every document, whether there was one or more appropriate senses in each of the dictionaries. They were instructed to provide annotations for 100 documents per name; if a URL in the list was corrupt or not available, it had to be discarded. We provided 150 documents per name to ensure that the figure of 100 usable documents per name could be reached without problems. Each judge provided annotations for the 4,000 documents in the final data set. In a second round, they met and discussed their independent annotations together, reaching a consensus judgement for every document.

3 Coverage of Web Search Results: Wikipedia vs WordNet
Table 1 shows how Wikipedia and WordNet cover the senses in search results. We report each noun subset separately (Senseval and bands subsets) as well as aggregated figures. The most relevant fact is that, unsurprisingly, Wikipedia senses cover many more search results (56%) than WordNet (32%).
If we focus on the top ten results, in the bands subset (which should be more representative of plausible Web queries) Wikipedia covers 68% of the top ten documents. This is an indication that it can indeed be useful for promoting diversity or helping to cluster search results: even if 32% of the top ten documents are not covered by Wikipedia, it is still a representative source of senses in the top search results. We have manually examined all documents in the top ten results that are not covered by Wikipedia: a majority of the missing senses consists of names of (generally not well-known) companies (45%) and products or services (26%); the other frequent type (12%) of non-annotated document is disambiguation pages (from Wikipedia and also from other dictionaries).

It is also interesting to examine the degree of overlap between Wikipedia and WordNet senses. Being two different types of lexical resource, they might have some degree of complementarity. Table 2 shows, however, that this is not the case: most of the (annotated) documents either fit Wikipedia senses only (26%) or both Wikipedia and WordNet (29%), and just 3% fit WordNet only.

Table 2: Overlap between Wikipedia and WordNet in Search Results (# documents annotated with)
                 Wikipedia & WordNet   Wikipedia only   WordNet only   none
Senseval set     607 (40%)             270 (18%)        89 (6%)        534 (36%)
Bands set        572 (23%)             786 (31%)        27 (1%)        1115 (45%)
Total            1179 (29%)            1056 (26%)       116 (3%)       1649 (41%)

Therefore, Wikipedia seems to extend the coverage of WordNet rather than providing complementary sense information. If we wanted to extend the coverage of Wikipedia, the best strategy seems to be to consider lists of companies, products and services, rather than complementing Wikipedia with additional sense inventories.
4 Diversity in Google Search Results
Once we know that Wikipedia senses are a representative subset of actual Web senses (covering more than half of the documents retrieved by the search engine), we can test how well search results respect diversity in terms of this subset of senses. Table 3 displays the number of different senses found at different depths in the search results rank, and the average proportion of total senses that they represent. These results suggest that diversity is not a major priority for ranking results: the top ten results only cover, on average, 3 Wikipedia senses (while the average number of senses listed in Wikipedia is 22). When considering the first 100 documents, this number grows to 6.85 senses per noun. Another relevant figure is the frequency of the most frequent sense for each word: on average, 63% of the pages in search results belong to the most frequent sense of the query word. This is roughly comparable with most frequent sense figures in standard annotated corpora such as SemCor (Miller et al., 1993) and the Senseval/SemEval data sets, which suggests that diversity may not play a major role in the current Google ranking algorithm. Of course this result must be taken with care, because variability between words is high and unpredictable, and we are using only 40 nouns for our experiment. But what we have is a positive indication that Wikipedia could be used to improve diversity or cluster search results: potentially the top ten results could cover 6.15 different senses on average (see Section 6.5), which would be a substantial growth.

5 Sense Frequency Estimators for Wikipedia
Wikipedia disambiguation pages contain no systematic information about the relative importance of senses for a given word. Such information, however, is crucial in a lexicon, because sense distributions tend to be skewed, and knowing them can help disambiguation algorithms.
We have attempted to use two estimators of expected sense distribution:
• Internal relevance of a word sense, measured as the number of incoming links for the URL of a given sense in Wikipedia.
• External relevance of a word sense, measured as the number of visits for the URL of a given sense (as reported in http://stats.grok.se).
The number of internal incoming links is expected to be relatively stable for Wikipedia articles. As for the number of visits, we performed a comparison of the number of visits received by the bands noun subset in May, June and July 2009, finding a stable-enough scenario with one notorious exception: the number of visits to the noun Tesla rose dramatically in July, because July 10 was the anniversary of the birth of Nikola Tesla, and a special Google logo directed users to the Wikipedia page for the scientist. We have measured the correlation between the relative frequencies derived from these two indicators and the actual relative frequencies in our testbed. To do so, for each noun w and for each sense w_i, we consider three values: (i) the proportion of documents retrieved for w which are manually assigned to each sense w_i; (ii) inlinks(w_i): the relative amount of incoming links to each sense w_i; and (iii) visits(w_i): the relative number of visits to the URL for each sense w_i. We have measured the correlation between these three values using a linear regression correlation coefficient, which gives a correlation value of .54 for the number of visits and of .71 for the number of incoming links. Both estimators seem to be positively correlated with real relative frequencies in our testbed, with a strong preference for the number of links.

Table 3: Diversity in Search Results according to Wikipedia (# senses and coverage of Wikipedia senses among the first retrieved documents, for the Bands set, the Senseval set and the Total)
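To make this comparison concrete, here is a minimal Python sketch of how an indicator could be correlated with the manual annotations. It assumes that raw counts are normalized into relative frequencies per noun, that the (noun, sense) pairs of all nouns are pooled into one pair of series, and that the "linear regression correlation coefficient" is Pearson's r; the function names and the toy counts are hypothetical, not the authors' code or data.

```python
from math import sqrt


def relative(counts):
    """Normalize raw counts (sense -> count) into relative frequencies."""
    total = sum(counts.values())
    return {sense: (c / total if total else 0.0) for sense, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def pooled_correlation(gold_counts, indicator_counts):
    """Correlate gold sense proportions with an indicator (inlinks or visits).

    Both arguments map noun -> {sense: raw count}; every (noun, sense) pair
    present in the gold data contributes one point to the two series.
    """
    xs, ys = [], []
    for noun, senses in gold_counts.items():
        gold_rel = relative(senses)
        ind_rel = relative(indicator_counts.get(noun, {}))
        for sense in senses:
            xs.append(gold_rel[sense])
            ys.append(ind_rel.get(sense, 0.0))
    return pearson(xs, ys)


# Toy, made-up counts for a single noun.
gold = {"oasis": {"band": 60, "store": 30, "org": 10}}
inlinks = {"oasis": {"band": 500, "store": 200, "org": 100}}
print(round(pooled_correlation(gold, inlinks), 3))
```

Pooling across nouns is one plausible reading; computing r per noun and averaging would be another, and the text does not say which was used.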
We have experimented with weighted combinations of both indicators, using weights of the form (k, 1−k), k ∈ {0, 0.1, 0.2, ..., 1}, reaching a maximal correlation of .73 for the following weights:

$\mathrm{freq}(w_i) = 0.9 \cdot \mathrm{inlinks}(w_i) + 0.1 \cdot \mathrm{visits}(w_i)$   (1)

This weighted estimator provides a slight advantage over the use of incoming links only (.73 vs .71). Overall, we have an estimator which has a strong correlation with the distribution of senses in our testbed. In the next section we will test its utility for disambiguation purposes.

6 Association of Wikipedia Senses to Web Pages
We want to test whether the information provided by Wikipedia can be used to classify search results accurately. Note that we do not want to consider approaches that involve a manual creation of training material, because they can't be used in practice. Given a Web page p returned by the search engine for the query w, and the set of senses w_1 ... w_n listed in Wikipedia, the task is to assign the best candidate sense to p. We consider two different techniques:
• A basic Information Retrieval approach, where the documents and the Wikipedia pages are represented using a Vector Space Model (VSM) and compared with a standard cosine measure. This is a basic approach which, if successful, can be used efficiently to classify search results.
• An approach based on a state-of-the-art supervised WSD system, extracting training examples automatically from Wikipedia content.
We also compute two baselines:
• A random assignment of senses (precision is computed as the inverse of the number of senses, for every test case).
• A most frequent sense heuristic which uses our estimation of sense frequencies and assigns the same sense (the most frequent) to all documents.
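Equation (1), the search over k and the two baselines fit in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the inputs are assumed to be per-noun relative frequencies, and the grid search simply assumes some scoring function that returns the correlation obtained with a given weight k.

```python
def combined_freq(inlinks_rel, visits_rel, k=0.9):
    """Eq. (1): freq(w_i) = k * inlinks(w_i) + (1 - k) * visits(w_i).

    Both inputs map sense -> relative frequency; k = 0.9 is the reported optimum.
    """
    senses = set(inlinks_rel) | set(visits_rel)
    return {s: k * inlinks_rel.get(s, 0.0) + (1 - k) * visits_rel.get(s, 0.0)
            for s in senses}


def best_weight(score_fn, steps=10):
    """Try k in {0, 0.1, ..., 1} and return the k with the highest score.

    score_fn(k) is assumed to return the correlation obtained with weight k.
    """
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=score_fn)


def random_baseline_precision(num_senses_per_case):
    """Expected precision of a random assignment: mean of 1 / #senses."""
    return sum(1 / n for n in num_senses_per_case) / len(num_senses_per_case)


def mfs_baseline(freq):
    """Most frequent sense heuristic: one sense for every page of the noun."""
    return max(freq, key=freq.get)


# Made-up relative frequencies for one noun.
freq = combined_freq({"band": 0.8, "store": 0.2}, {"band": 0.5, "store": 0.5})
print(mfs_baseline(freq))  # the sense the MFS baseline assigns to every page
```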
Both are naive baselines, but it must be noted that the most frequent sense heuristic is usually hard to beat for unsupervised WSD algorithms in most standard data sets. We now describe each of the two main approaches in detail.

6.1 VSM Approach
For each word sense, we represent its Wikipedia page in a (unigram) vector space model, assigning standard tf*idf weights to the words in the document. idf weights are computed in three different ways:
1. Experiment VSM computes inverse document frequencies in the collection of retrieved documents (for the word being considered).
2. Experiment VSM-GT uses the statistics provided by the Google Terabyte collection (Brants and Franz, 2006), i.e. it replaces the collection of documents with statistics from a representative snapshot of the Web.
3. Experiment VSM-mixed combines statistics from the collection and from the Google Terabyte collection, following (Chen et al., 2009).
The document p is represented in the same vector space as the Wikipedia senses, and it is compared with each of the candidate senses w_i via the cosine similarity metric (we have experimented with other similarity metrics such as χ2, but the differences are negligible). The sense with the highest similarity to p is assigned to the document. In case of ties (which are rare), we pick the first sense in the Wikipedia disambiguation page (which in practice is like a random decision, because senses in disambiguation pages do not seem to be ordered according to any clear criteria). We have also tested a variant of this approach which uses the estimation of sense frequencies presented above: once the similarities are computed, we consider those cases where two or more senses have a similar score (in particular, all senses with a score greater than or equal to 80% of the highest score). In those cases, instead of using the small similarity differences to select a sense, we pick the one which has the largest frequency according to our estimator.
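The VSM classifier and its frequency-based variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a simple lowercase tokenizer, raw tf multiplied by log(N/df) idf computed over a small stand-in collection (VSM-GT would instead take document frequencies from Web-scale statistics, which are not reproduced here), and hypothetical function names and toy texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase unigram tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_from_collection(docs):
    """idf(t) = log(N / df(t)) over a document collection."""
    df = Counter(t for doc in docs for t in set(tokenize(doc)))
    return {t: math.log(len(docs) / n) for t, n in df.items()}


def tfidf(text, idf):
    """Raw term frequency times idf; terms unseen in the collection get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(text)).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(page, sense_pages, idf, freq=None, ratio=0.8):
    """Assign a Wikipedia sense to a Web page.

    sense_pages: list of (sense, wikipedia_text) in disambiguation-page order.
    freq: optional sense -> estimated frequency (the "+freq" variant).
    """
    p = tfidf(page, idf)
    scores = [(s, cosine(p, tfidf(text, idf))) for s, text in sense_pages]
    best = max(score for _, score in scores)
    if freq is None:
        # Highest cosine; ties go to the first sense in page order.
        return next(s for s, score in scores if score == best)
    # Senses scoring at least `ratio` of the best are treated as equivalent,
    # and the one with the highest estimated frequency is chosen.
    close = [s for s, score in scores if score >= ratio * best]
    return max(close, key=lambda s: freq.get(s, 0.0))


# Toy data (made up); the collection stands in for the retrieved documents.
collection = ["the oasis band played a concert", "oasis fashion store sells dresses",
              "oasis band album tour", "store opening hours"]
idf = idf_from_collection(collection)
senses = [("store", "oasis is a fashion store selling dresses and clothing"),
          ("band", "oasis were an english rock band; the band released an album and went on tour")]
print(classify("tickets for the oasis band tour and new album", senses, idf))
```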
We have applied this strategy to the best performing system, VSM-GT, resulting in experiment VSM-GT+freq.

6.2 WSD Approach
We have used TiMBL (Daelemans et al., 2001), a state-of-the-art supervised WSD system which uses Memory-Based Learning. The key, in this case, is how to extract learning examples from Wikipedia automatically. For each word sense, we basically have three sources of examples: (i) occurrences of the word in the Wikipedia page for the word sense; (ii) occurrences of the word in Wikipedia pages pointing to the page for the word sense; (iii) occurrences of the word in external pages linked in the Wikipedia page for the word sense. After an initial manual inspection, we decided to discard external pages for being too noisy, and we focused on the first two options. We tried three alternatives:
• TiMBL-core uses only the examples found in the page for the sense being trained.
• TiMBL-inlinks uses the examples found in Wikipedia pages pointing to the sense being trained.
• TiMBL-all uses both sources of examples.
In order to classify a page p with respect to the senses for a word w, we first disambiguate all occurrences of w in the page p. Then we choose the sense which appears most frequently in the page according to TiMBL results. In case of ties we pick the first sense listed in the Wikipedia disambiguation page. We have also experimented with a variant of the approach that uses our estimation of sense frequencies, similarly to what we did with the VSM approach. In this case, (i) when there is a tie between two or more senses (which is much more likely than in the VSM approach), we pick the sense with the highest frequency according to our estimator; and (ii) when no sense reaches 30% of the cases in the page to be disambiguated, we also resort to the most frequent sense heuristic (among the candidates for the page).
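The page-level decision rule of the WSD approach, including the two frequency-based adjustments, can be written as a small aggregation function. This Python sketch is only an illustration: it takes the per-occurrence predictions as given (it does not call TiMBL), the function and parameter names are hypothetical, and both the handling of pages with no occurrence of w and the reading of "candidates" as the senses predicted on the page are assumptions.

```python
from collections import Counter


def page_sense(occurrence_senses, sense_order, freq=None, min_share=0.3):
    """Turn per-occurrence predictions into one sense for the whole page.

    occurrence_senses: sense predicted for each occurrence of w in the page
        (assumed to come from a memory-based classifier and to be drawn
        from sense_order).
    sense_order: senses in Wikipedia disambiguation-page order.
    freq: optional sense -> estimated frequency; enables the "+freq" variant.
    """
    counts = Counter(occurrence_senses)
    if not counts:
        # No occurrence of w: the base variant gives no answer; as an
        # assumption, the "+freq" variant falls back to the overall MFS.
        if freq is None:
            return None
        return max(sense_order, key=lambda s: freq.get(s, 0.0))
    top = max(counts.values())
    tied = [s for s in sense_order if counts[s] == top]
    if freq is None:
        return tied[0]  # majority sense; ties go to page order
    if top / len(occurrence_senses) < min_share:
        # No sense reaches 30% of the occurrences: most frequent sense among
        # the senses predicted on this page (assumed reading of "candidates").
        candidates = [s for s in sense_order if counts[s] > 0]
        return max(candidates, key=lambda s: freq.get(s, 0.0))
    return max(tied, key=lambda s: freq.get(s, 0.0))  # tie broken by frequency


order = ["org", "band", "store"]
print(page_sense(["store", "band"], order))  # tie resolved by page order
```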
This experiment is called TiMBL-core+freq (we discarded the "inlinks" and "all" versions because they were clearly worse than "core").

6.3 Classification Results
Table 4 shows classification results. The accuracy of systems is reported as precision, i.e. the number of pages correctly classified divided by the total number of predictions. This is approximately the same as recall (correctly classified pages divided by total number of pages) for our systems, because the algorithms provide an answer for every page containing text (actual coverage is 94% because some pages only contain text as part of an image file such as photographs and logotypes).

Table 4: Classification Results
Experiment                          Precision
random                              .19
most frequent sense (estimation)    .46
TiMBL-core                          .60
TiMBL-inlinks                       .50
TiMBL-all                           .58
TiMBL-core+freq                     .67
VSM                                 .67
VSM-GT                              .68
VSM-mixed                           .67
VSM-GT+freq                         .69

All systems are significantly better than the random and most frequent sense baselines (using p < 0.05 for a standard t-test). Overall, both approaches (using TiMBL WSD machinery and using VSM) lead to similar results (.67 vs. .69), which would make VSM preferable because it is a simpler and more efficient approach.

Figure 1: Precision/Coverage curves for VSM-GT+freq classification algorithm

Taking a closer look at the results with TiMBL, there are a couple of interesting facts:
• There is a substantial difference between using only examples taken from the Wikipedia Web page for the sense being trained (TiMBL-core, .60) and using examples from the Wikipedia pages pointing to that page (TiMBL-inlinks, .50). Examples taken from related pages (even if the relationship is close as in this case) seem to be too noisy for the task. This result is compatible with findings in (Santamaría et al., 2003) using the Open Directory Project to extract examples automatically.
• Our estimation of sense frequencies turns out to be very helpful for cases where our TiMBL-based algorithm cannot provide an answer: precision rises from .60 (TiMBL-core) to .67 (TiMBL-core+freq). The difference is statistically significant (p < 0.05) according to the t-test.
As for the experiments with VSM, the variations tested do not provide substantial improvements over the baseline (which is .67). Using idf frequencies obtained from the Google Terabyte corpus (instead of frequencies obtained from the set of retrieved documents) provides only a small improvement (VSM-GT, .68), and adding the estimation of sense frequencies gives another small improvement (.69). Comparing the baseline VSM with the optimal setting (VSM-GT+freq), the difference is small (.67 vs .69) but relatively robust (p = 0.066 according to the t-test). Remarkably, the use of frequency estimations is very helpful for the WSD approach but not for the VSM one, and they both end up with similar performance figures; this might indicate that using frequency estimations is only helpful up to a certain precision ceiling.

6.4 Precision/Coverage Trade-off
All the above experiments are done at maximal coverage, i.e., all systems assign a sense for every document in the test collection (at least for every document with textual content). But it is possible to enhance search results diversity without annotating every document (in fact, not every document can be assigned to a Wikipedia sense, as we have discussed in Section 3). Thus, it is useful to investigate the precision/coverage trade-off in our dataset. We have experimented with the best performing system (VSM-GT+freq), introducing a similarity threshold: a document is only assigned to a sense if the similarity of the document to the Wikipedia page for the sense exceeds the similarity threshold.
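The thresholding procedure can be sketched as follows. The Python below is a hypothetical illustration, not the authors' evaluation script: it assumes that coverage is the fraction of documents that receive a sense, that precision is computed only over those documents, and that the threshold step (0.1 here) is a free choice, since the text only gives the range of thresholds. Documents with no Wikipedia sense can be left out of `gold`, in which case any assignment to them counts as an error.

```python
def precision_coverage(predictions, gold, threshold):
    """Precision and coverage when only confident assignments are kept.

    predictions: list of (doc_id, sense, similarity) from the classifier.
    gold: doc_id -> set of correct senses (missing or empty for documents
        without a Wikipedia sense, which then count as errors).
    An assignment is kept only if its similarity exceeds the threshold.
    """
    kept = [(doc, sense) for doc, sense, sim in predictions if sim > threshold]
    coverage = len(kept) / len(predictions) if predictions else 0.0
    if not kept:
        return None, coverage  # precision is undefined at zero coverage
    correct = sum(1 for doc, sense in kept if sense in gold.get(doc, set()))
    return correct / len(kept), coverage


def curve(predictions, gold, thresholds):
    """(threshold, precision, coverage) triples, e.g. for plotting."""
    return [(t, *precision_coverage(predictions, gold, t)) for t in thresholds]


# Made-up classifier output and annotations.
preds = [("d1", "band", 0.9), ("d2", "store", 0.5), ("d3", "band", 0.2), ("d4", "org", 0.6)]
gold = {"d1": {"band"}, "d2": {"band"}, "d3": {"band"}, "d4": {"org"}}
for t, p, c in curve(preds, gold, [i / 100 for i in range(0, 91, 10)]):
    print(t, p, c)
```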
We have computed precision and coverage for every threshold in the range [0.00, 0.90] (beyond 0.90 coverage was null) and represented the results in Figure 1 (solid line). The graph shows that we can classify around 20% of the documents with a precision above .90, and around 60% of the documents with a precision of .80. Note that we are reporting disambiguation results using a conventional WSD test set, i.e., one in which every test case (every document) has been manually assigned to some Wikipedia sense. But in our Web search scenario, 44% of the documents were not assigned to any Wikipedia sense: in practice, our classification algorithm would have to cope with all this noise as well. Figure 1 (dotted line) shows how the precision/coverage curve is affected when the algorithm attempts to disambiguate all documents retrieved by Google, whether they can in fact be assigned to a Wikipedia sense or not. At a coverage of 20%, precision drops approximately from .90 to .70, and at a coverage of 60% it drops from .80 to .50. We now address the question of whether this performance is good enough to improve search results diversity in practice.

6.5 Using Classification to Promote Diversity
We now want to estimate how the reported classification accuracy may perform in practice to enhance diversity in search results. In order to provide an initial answer to this question, we have re-ranked the documents for the 40 nouns in our testbed, using our best classifier (VSM-GT+freq) and making a list of the top-ten documents with the primary criterion of maximising the number of senses represented in the set, and the secondary criterion of maximising the similarity scores of the documents to their assigned senses.
The algorithm proceeds as follows: we fill each position in the rank (starting at rank 1) with the document which has the highest similarity to one of the senses which are not yet represented in the rank; once all senses are represented, we start choosing a second representative for each sense, following the same criterion. The process goes on until the first ten documents are selected. We have also produced a number of alternative rankings for comparison purposes:
• clustering (centroids): this method applies Hierarchical Agglomerative Clustering, which proved to be the most competitive clustering algorithm in a similar task (Artiles et al., 2009), to the set of search results, forcing the algorithm to create ten clusters. The centroid of each cluster is then selected as one of the top ten documents in the new rank.
• clustering (top ranked): applies the same clustering algorithm, but this time the top ranked document (in the original Google rank) of each cluster is selected.
• random: randomly selects ten documents from the set of retrieved results.
• upper bound: this is the maximal diversity that can be obtained in our testbed. Note that coverage is not 100%, because some words have more than ten meanings in Wikipedia and we are only considering the top ten documents.
All experiments have been applied on the full set of documents in the testbed, including those which could not be annotated with any Wikipedia sense. Coverage is computed as the ratio of senses that appear in the top ten results compared to the number of senses that appear in all search results. Results are presented in Table 5.

Table 5: Enhancement of Search Results Diversity
rank@10                    # senses   coverage
Original rank              2.80       49%
Wikipedia                  4.75       77%
clustering (centroids)     2.50       42%
clustering (top ranked)    2.80       46%
random                     2.45       43%
upper bound                6.15       97%
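The re-ranking procedure and the coverage measure can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: each document is assumed to carry the sense and similarity produced by the classifier, equal similarities are assumed to keep the original search engine order (Python's sort is stable), and all names are hypothetical. The ranking uses the classifier's senses, while coverage is measured against the manual annotations.

```python
def diversify(docs, k=10):
    """Greedy re-ranking for diversity.

    docs: list of (doc_id, sense, similarity), ideally in original search
        engine order so that equal similarities keep that order.
    Each round takes, in decreasing similarity, the best remaining document
    of every sense not yet used in that round; rounds repeat until k docs.
    """
    remaining = sorted(docs, key=lambda d: d[2], reverse=True)  # stable sort
    ranking = []
    while remaining and len(ranking) < k:
        used_this_round = set()
        for doc in list(remaining):
            if len(ranking) == k:
                break
            if doc[1] not in used_this_round:
                ranking.append(doc)
                used_this_round.add(doc[1])
                remaining.remove(doc)
    return [doc_id for doc_id, _, _ in ranking]


def sense_coverage(top_ids, gold):
    """Share of the senses present in all results that appear in top_ids.

    gold: doc_id -> annotated sense, or None for documents without one.
    """
    all_senses = {s for s in gold.values() if s is not None}
    top_senses = {gold[d] for d in top_ids if gold.get(d) is not None}
    return len(top_senses) / len(all_senses) if all_senses else 0.0


# Made-up classifier output and annotations.
docs = [("d1", "band", 0.9), ("d2", "band", 0.8), ("d3", "store", 0.7),
        ("d4", "band", 0.6), ("d5", "org", 0.3)]
gold = {"d1": "band", "d2": "band", "d3": "store", "d4": "band", "d5": None}
top = diversify(docs, k=4)
print(top, sense_coverage(top, gold))
```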
Note that diversity in the top ten documents increases from an average of 2.80 Wikipedia senses represented in the original search engine rank to 4.75 in the modified rank (with an upper bound of 6.15), with the coverage of senses going from 49% to 77%. With a simple VSM algorithm, the coverage of Wikipedia senses in the top ten results is 70% larger than in the original ranking. Using Wikipedia to enhance diversity seems to work much better than clustering: both strategies to select a representative from each cluster are unable to improve the diversity of the original ranking. Note, however, that our evaluation has a bias towards using Wikipedia, because only Wikipedia senses are considered to estimate diversity. Of course our results do not imply that the Wikipedia modified rank is better than the original Google rank: there are many other factors that influence the final ranking provided by a search engine. What our results indicate is that, with simple and efficient algorithms, Wikipedia can be used as a reference to improve search results diversity for one-word queries.

7 Related Work
Web search results clustering and diversity in search results are topics that receive increasing attention from the research community. Diversity is used both to represent sub-themes in a broad topic and to consider alternative interpretations for ambiguous queries (Agrawal et al., 2009), which is our interest here. Standard IR test collections do not usually consider ambiguous queries, and are thus inappropriate to test systems that promote diversity (Sanderson, 2008); it is only recently that appropriate test collections have been built, such as (Paramita et al., 2009) for image search and (Artiles et al., 2009) for person name search. We see our testbed as complementary to these ones, and expect that it can contribute to fostering research on search results diversity.
To our knowledge, Wikipedia has not explicitly been used before to promote diversity in search results; but in (Gollapudi and Sharma, 2009) it is used as a gold standard to evaluate diversification algorithms: given a query with a Wikipedia disambiguation page, an algorithm is evaluated as promoting diversity when different documents in the search results are semantically similar to different Wikipedia pages (describing the alternative senses of the query). Although semantic similarity is measured automatically in that work, our results confirm that this evaluation strategy is sound, because Wikipedia senses are indeed representative of search results. (Clough et al., 2009) analyses query diversity in Microsoft Live Search, using click entropy and query reformulation as diversity indicators. It was found that at least 9.5%–16.2% of queries could benefit from diversification, although no correlation was found between the number of senses of a word in Wikipedia and the indicators used to discover diverse queries. This result does not rule out, however, that queries for which diversification is useful can benefit from Wikipedia as a sense inventory. In the context of clustering, (Carmel et al., 2009) successfully employ Wikipedia to enhance automatic cluster labeling, finding that Wikipedia labels agree with the labels humans manually assign to a cluster much more than significant terms extracted directly from the text do. In a similar line, both (Gabrilovich and Markovitch, 2007) and (Syed et al., 2008) provide evidence suggesting that categories of Wikipedia articles can successfully describe common concepts in documents. In the field of Natural Language Processing, there have been successful attempts to connect Wikipedia entries to WordNet senses: (Ruiz-Casado et al., 2005) reports an algorithm that achieves an accuracy of 84%. (Mihalcea, 2007) uses internal Wikipedia hyperlinks to derive sense-tagged examples.
But instead of using Wikipedia directly as a sense inventory, Mihalcea manually maps Wikipedia senses to WordNet senses (arguing that, at the time of writing, Wikipedia did not consistently report ambiguity in disambiguation pages) and shows that a WSD system based on the acquired sense-tagged examples reaches an accuracy well beyond an (informed) most frequent sense heuristic.

8 Conclusions

We have investigated whether generic lexical resources can be used to promote diversity in Web search results for one-word, ambiguous queries. We have compared WordNet and Wikipedia and arrived at a number of conclusions: (i) unsurprisingly, Wikipedia has a much better coverage of senses in search results, and is therefore more appropriate for the task; (ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia; and (iii) by associating Web pages with Wikipedia senses using simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings. We expect that the testbed created for this research will complement the currently short list of benchmark test sets for exploring search results diversity and query ambiguity. Our testbed is publicly available for research purposes at http://nlp.uned.es. Our results endorse further investigation on the use of Wikipedia to organize search results. Some limitations of our research, however, must be noted: (i) the nature of our testbed (with every search result manually annotated in terms of two sense inventories) makes it too small to extract solid conclusions on Web searches; (ii) our work does not involve any study of diversity from the point of view of Web users (i.e.
when a Web query addresses many different user needs in practice); research in (Clough et al., 2009) suggests that word ambiguity in Wikipedia might not be related to the diversity of search needs; (iii) we have tested our classifiers with a simple re-ordering of search results to measure how much diversity can be improved, but a search results ranking depends on many other factors, some of them more crucial than diversity. It remains to be tested how we can use document/Wikipedia associations to improve search results clustering (for instance, by providing seeds for the clustering process) and to provide search suggestions.

Acknowledgments

This work has been partially funded by the Spanish Government (project INES/Text-Mess) and the Xunta de Galicia.

References

R. Agrawal, S. Gollapudi, A. Halverson, and S. Leong. 2009. Diversifying Search Results. In Proc. of WSDM’09. ACM.
P. Anick. 2003. Using Terminological Feedback for Web Search Refinement: a Log-based Study. In Proc. ACM SIGIR 2003, pages 88–95. ACM New York, NY, USA.
J. Artiles, J. Gonzalo, and S. Sekine. 2009. WePS 2 Evaluation Campaign: overview of the Web People Search Clustering Task. In 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference.
T. Brants and A. Franz. 2006. Web 1T 5-gram, version 1. Philadelphia: Linguistic Data Consortium.
D. Carmel, H. Roitman, and N. Zwerdling. 2009. Enhancing Cluster Labeling using Wikipedia. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 139–146. ACM.
C. Carpineto, S. Osinski, G. Romano, and D. Weiss. 2009. A Survey of Web Clustering Engines. ACM Computing Surveys, 41(3).
Y. Chen, S. Yat Mei Lee, and C. Huang. 2009. PolyUHK: A Robust Information Extraction System for Web Personal Names. In Proc. WWW’09 (WePS2 Workshop). ACM.
C. Clarke, M. Kolla, G. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. 2008. Novelty and Diversity in Information Retrieval Evaluation. In Proc. SIGIR ’08, pages 659–666. ACM.
P. Clough, M. Sanderson, M. Abouammoh, S. Navarro, and M. Paramita. 2009. Multiple Approaches to Analysing Query Diversity. In Proc. of SIGIR 2009. ACM.
W. Daelemans, J. Zavrel, K. van der Sloot, and A. van den Bosch. 2001. TiMBL: Tilburg Memory Based Learner, version 4.0, Reference Guide. Technical report, University of Antwerp.
E. Gabrilovich and S. Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In Proceedings of The 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India.
S. Gollapudi and A. Sharma. 2009. An Axiomatic Approach for Result Diversification. In Proc. WWW 2009, pages 381–390. ACM New York, NY, USA.
R. Mihalcea. 2007. Using Wikipedia for Automatic Word Sense Disambiguation. In Proceedings of NAACL HLT, volume 2007.
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1990. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4).
G. A. Miller, C. Leacock, R. Tengi, and R. T. Bunker. 1993. A Semantic Concordance. In Proceedings of the ARPA Workshop on Human Language Technology. San Francisco, Morgan Kaufmann.
M. Paramita, M. Sanderson, and P. Clough. 2009. Diversity in Photo Retrieval: Overview of the ImageCLEFPhoto task 2009. CLEF working notes, 2009.
M. Ruiz-Casado, E. Alfonseca, and P. Castells. 2005. Automatic Assignment of Wikipedia Encyclopaedic Entries to Wordnet Synsets. Advances in Web Intelligence, 3528:380–386.
M. Sanderson. 2000. Retrieving with Good Sense. Information Retrieval, 2(1):49–69.
M. Sanderson. 2008. Ambiguous Queries: Test Collections Need More Sense. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 499–506. ACM New York, NY, USA.
C. Santamaría, J. Gonzalo, and F. Verdejo. 2003. Automatic Association of Web Directories to Word Senses. Computational Linguistics, 29(3):485–502.
Z. S. Syed, T. Finin, and A. Joshi. 2008. Wikipedia as an Ontology for Describing Documents. In Proc. ICWSM’08.

4 0.63381451 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

Author: Siva Reddy ; Abhilash Inumella

Abstract: This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property of jointly maximizing over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for various knowledge sources. For the purpose of evaluation, we modelled all-words WSD as a simple DCOP problem. The results are competitive with state-of-the-art knowledge-based systems.

5 0.62784588 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

6 0.61973321 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

7 0.59888941 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

8 0.57440042 126 acl-2010-GernEdiT - The GermaNet Editing Tool

9 0.56772 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text

10 0.54831398 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

11 0.54144233 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia

12 0.46726134 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

13 0.41204399 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

14 0.39048171 185 acl-2010-Open Information Extraction Using Wikipedia

15 0.36806354 121 acl-2010-Generating Entailment Rules from FrameNet

16 0.34949726 141 acl-2010-Identifying Text Polarity Using Random Walks

17 0.32911307 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

18 0.30388263 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

19 0.29736307 159 acl-2010-Learning 5000 Relational Extractors

20 0.29127812 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.011), (14, 0.022), (25, 0.076), (42, 0.022), (44, 0.013), (51, 0.057), (59, 0.262), (67, 0.015), (68, 0.089), (73, 0.051), (76, 0.012), (78, 0.038), (80, 0.018), (83, 0.069), (84, 0.032), (98, 0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9315089 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

Author: Simone Paolo Ponzetto ; Roberto Navigli

Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

2 0.91514778 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik

Abstract: We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

3 0.91416007 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

4 0.91010129 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

Author: Roberto Navigli ; Simone Paolo Ponzetto

Abstract: In this paper we present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.

5 0.90786684 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging

Author: Michael Lamar ; Yariv Maron ; Mark Johnson ; Elie Bienenstock

Abstract: We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also produce a range of finer-grained taggings, with potential applications to various tasks.

6 0.90075761 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

7 0.89894372 151 acl-2010-Intelligent Selection of Language Model Training Data

8 0.87400132 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

9 0.84243864 114 acl-2010-Faster Parsing by Supertagger Adaptation

10 0.83750641 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

11 0.8301146 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

12 0.82968289 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

13 0.81605607 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

14 0.81500304 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

15 0.8143791 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

16 0.81212711 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

17 0.80948567 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

18 0.80863988 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

19 0.80854976 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers

20 0.80647779 169 acl-2010-Learning to Translate with Source and Target Syntax