
257 acl-2010-WSD as a Distributed Constraint Optimization Problem


Source: pdf

Author: Siva Reddy ; Abhilash Inumella

Abstract: This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property to jointly maximize over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for various knowledge sources. For the purpose of evaluation, we modelled all-words WSD as a simple DCOP problem. The results are competitive with state-of-the-art knowledge-based systems.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 To model WSD as a DCOP, we view information from various knowledge sources as constraints. [sent-5, score-0.29]

2 DCOP algorithms have the remarkable property to jointly maximize over a wide range of utility functions associated with these constraints. [sent-6, score-0.357]

3 We show how utility functions can be designed for various knowledge sources. [sent-7, score-0.457]

4 The results are competitive with state-of-the-art knowledge-based systems. [sent-9, score-0.098]

5 The correct sense of a word can be identified based on the context in which it occurs. [sent-11, score-0.143]

6 In the sentence "He took all his money from the bank", bank refers to the financial institution sense rather than other possibilities such as the edge-of-a-river sense. [sent-12, score-0.28]

7 Given a word and its possible senses, as defined by a dictionary, the problem of Word Sense Disambiguation (WSD) can be defined as the task of assigning the most appropriate sense to the word within a given context. [sent-13, score-0.143]

8 A range of knowledge sources have been found to be useful for WSD. [sent-15, score-0.255]

9 Methods for WSD exploit information from one or more of these knowledge sources. [sent-21, score-0.098]

10 , 2002; Stevenson and Wilks, 2001) used collective information from various knowledge sources to perform disambiguation. [sent-23, score-0.29]

11 Information from various knowledge sources is encoded in the form of a feature vector and models were built by training on sense-tagged corpora. [sent-24, score-0.29]

12 They crucially rely on hand-tagged sense corpora, which are hard to obtain. [sent-26, score-0.143]

13 Agirre and Martínez (Agirre and Martínez, 2001) evaluated the contribution of each knowledge source separately. [sent-28, score-0.098]

14 However, this does not combine information from more than one knowledge source. [sent-29, score-0.098]

15 In any case, little effort has been made in formalizing the way in which information from various knowledge sources can be collectively used within a single framework: a framework that allows interaction of evidence from various knowledge sources to arrive at a global optimal solution. [sent-30, score-0.629]

16 Here we present a way of modelling information from various knowledge sources in a multi-agent setting called the distributed constraint optimization problem (DCOP). [sent-31, score-0.669]

17 In DCOP, agents have constraints on their values and each constraint has a utility associated with it. [sent-32, score-0.558]

18 The agents communicate with each other and choose values such that a global optimum solution (maximum utility) is attained. [sent-33, score-0.202]

19 In the DCOP framework, information from various knowledge sources can be combined to perform WSD. [sent-36, score-0.29]

20 Utility functions for various knowledge sources are described in section 4. [sent-41, score-0.334]

21 In section 5, we conduct a simple experiment by modelling the all-words WSD problem as a DCOP and perform disambiguation on Senseval-2 (Cotton et al. [sent-42, score-0.153]

22 Only the agent has knowledge of and control over the values assigned to the variables associated with it. [sent-53, score-0.257]

23 The goal for the agents is to choose values for variables such that a given global objective function is maximized. [sent-54, score-0.27]

24 The objective function is described as the summation over a set of utility functions. [sent-55, score-0.374]

25 The utility function $f_k$ is defined over a subset of variables $V$. [sent-70, score-0.485]

26 The domain of $f_k$ represents the constraints $C_{f_k}$, and $f_k(c)$ represents the utility associated with the constraint $c$, where $c \in C_{f_k}$. [sent-71, score-0.678]

27 • $F = \sum_k z_k \cdot f_k$ is the objective function to be maximized, where $z_k$ is the weight of the corresponding utility function $f_k$. An agent is allowed to communicate only with its neighbours. [sent-72, score-0.793]

28 Agents communicate with each other to agree upon a solution which maximizes the objective function. [sent-73, score-0.122]

29 Given words $W = \{\ldots, w_n\}$ with corresponding admissible senses $D_{w_i} = \{s^1_{w_i}, s^2_{w_i}, \ldots\}$. [sent-77, score-0.126]

30 The agent (word) has knowledge and control of its values (senses). [sent-83, score-0.184]

31 Each agent $w_i$ is associated with the variable $s_{w_i}$. [sent-87, score-0.3]

32 The value assigned to this variable indicates the sense assigned by the algorithm. [sent-88, score-0.168]

33 The set of senses $D_{w_i}$ is the domain of the variable $s_{w_i}$. [sent-91, score-0.257]

34 4 Constraints A constraint specifies a particular configuration of the agents involved in its definition and has a utility associated with it. [sent-93, score-0.526]

35 If $c_{ij}$ is a constraint defined on agents $w_i$ and $w_j$, then $c_{ij}$ refers to a particular instantiation of $w_i$ and $w_j$, say $w_i = s^p_{w_i}$ and $w_j = s^q_{w_j}$. [sent-96, score-0.649]

36 A utility function $f_k : C_{f_k} \to \Re$ denotes a set of constraints $C_{f_k} = \{D_{w_i} \times D_{w_j} \times \ldots\}$. [sent-97, score-0.477]

37 We model information from each knowledge source as a utility function. [sent-104, score-0.378]

38 5 Objective function As already stated, various knowledge sources are identified to be useful for WSD. [sent-107, score-0.328]

39 It is desirable to use information from these sources collectively, to perform disambiguation. [sent-108, score-0.157]

40 DCOP provides such a framework, where an objective function is defined over all the knowledge sources ($f_k$) as $F = \sum_k z_k \cdot f_k$, where $F$ denotes the total utility associated with a solution and $z_k$ is the weight given to a knowledge source, i. [sent-109, score-0.912]

41 information from various sources can be weighted. [sent-111, score-0.192]

42 (Note: It is desirable to normalize utility functions of different knowledge sources in order to compare them.) [sent-112, score-0.579]

43 Every agent (word) chooses its value (sense) in such a way that the objective function (global solution) is maximized. [sent-113, score-0.18]

44 This way, each agent is assigned its best value, which in our case is the target sense. [sent-114, score-0.229]
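
The weighted objective described in the sentences above can be illustrated with a small Python sketch. It is not the authors' implementation: it replaces the distributed message passing of a real DCOP algorithm with a centralized brute-force search, and the names `objective`, `best_assignment`, the sense labels and the relatedness scores are all hypothetical, chosen only for illustration.

```python
from itertools import product

def objective(assignment, utilities):
    """F = sum_k z_k * f_k, each f_k evaluated on the senses of its scope."""
    total = 0.0
    for weight, scope, fn in utilities:
        total += weight * fn(*(assignment[word] for word in scope))
    return total

def best_assignment(domains, utilities):
    """Exhaustively search all joint sense assignments (centralized stand-in)."""
    words = list(domains)
    best, best_score = None, float("-inf")
    for senses in product(*(domains[word] for word in words)):
        candidate = dict(zip(words, senses))
        score = objective(candidate, utilities)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical toy scores, not taken from WordNet or from the paper.
TOY_RELATEDNESS = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#river", "money#currency"): 0.1,
}

def toy_relatedness(sense_a, sense_b):
    return TOY_RELATEDNESS.get((sense_a, sense_b), 0.0)

domains = {
    "bank": ["bank#finance", "bank#river"],
    "money": ["money#currency"],
}
# Each utility is (weight z_k, scope of words, function f_k).
utilities = [(1.0, ("bank", "money"), toy_relatedness)]
best, score = best_assignment(domains, utilities)
```

The exhaustive search is exponential in the number of words, so it only serves to make the objective concrete on toy inputs.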

45 4 Modelling information from various knowledge sources In this section, we discuss the modelling of information from various knowledge sources. [sent-115, score-0.49]

46 It has 47 senses, of which only 17 correspond to the noun category. [sent-118, score-0.252]

47 Such information can be captured using a unary utility function defined for every word. [sent-126, score-0.318]

48 If the sense distributions of a word $w_i$ are known, a function $f : D_{w_i} \to \Re$ is defined which returns higher utility for the senses favoured by the domain than for the other senses. [sent-127, score-0.719]

49 4 Sense Relatedness Sense relatedness between senses of two words $w_i$, $w_j$ is captured by a function $f : D_{w_i} \times D_{w_j} \to \Re$ where $f$ returns sense relatedness (utility) between senses based on sense taxonomy and gloss overlaps. [sent-129, score-0.845]
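
A unary utility of this kind can be sketched in Python as a lookup into a sense distribution. This is only an illustration under stated assumptions: the helper name `make_sense_distribution_utility`, the sense labels and the probabilities are hypothetical and do not come from the paper.

```python
def make_sense_distribution_utility(sense_probabilities):
    """Build a unary utility f: D_wi -> R from a known sense distribution."""
    def utility(sense):
        # Senses missing from the distribution receive zero utility.
        return sense_probabilities.get(sense, 0.0)
    return utility

# Hypothetical distribution for "bank" in a finance-domain corpus.
finance_domain = {"bank#finance": 0.8, "bank#river": 0.2}
bank_utility = make_sense_distribution_utility(finance_domain)
```

Senses favoured by the domain get the higher value, so weighting this utility more strongly in the objective pushes the solution towards domain-preferred senses.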

50 5 Discourse Discourse constraints can be modelled using an n-ary function. [sent-131, score-0.068]

51 For instance, to the extent one sense per discourse (Gale et al. [sent-132, score-0.143]

52 , 1992) holds true, higher utility can be returned for solutions which favour the same sense for all the occurrences of a word in a given discourse. [sent-133, score-0.423]

53 This information can be modeled as follows: If $w_i$, $w_j$, . [sent-134, score-0.092]

54 $w_m$ are the occurrences of the same word, a function $f : D_i \times D_j$ . [sent-137, score-0.069]

55 $D_m \to \Re$ is defined which returns higher utility . [sent-140, score-0.308]
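
One way to realise such an n-ary discourse utility is sketched below in Python. The scoring rule (the share of occurrences that carry the most frequent sense) is an assumption made for illustration; the source only states that solutions assigning the same sense to all occurrences should receive higher utility. The function name is hypothetical.

```python
from collections import Counter

def one_sense_per_discourse_utility(*senses):
    """n-ary utility over the senses chosen for all occurrences of one word.

    Returns the fraction of occurrences sharing the most frequent sense,
    so 1.0 means every occurrence was given the same sense.
    """
    if not senses:
        return 0.0
    most_frequent_count = Counter(senses).most_common(1)[0][1]
    return most_frequent_count / len(senses)
```

For example, three occurrences all tagged with the same sense score 1.0, while two occurrences tagged with different senses score 0.5.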

56 6 Collocations Collocations of a word are known to provide strong evidence for identifying correct sense of the word. [sent-147, score-0.143]

57 For example, if bank co-occurs with money in a given context, it is likely that bank refers to the financial institution sense rather than the edge-of-a-river sense. [sent-148, score-0.319]

58 The word cancer has at least two senses, one corresponding to the astrological sign and the other a disease. [sent-149, score-0.065]

59 But its derived form cancerous can only be used in the disease sense. [sent-150, score-0.081]

60 When the words cancer and cancerous co-occur in a discourse, it is likely that the word cancer refers to the disease sense. [sent-151, score-0.24]

61 Most supervised systems work through collocations to identify correct sense of a word. [sent-152, score-0.225]

62 If a word $w_i$ co-occurs with its collocate $v$, collocational information from $v$ can be modeled by using the function $\mathit{coll\_infrm\_v}_{w_i} : D_{w_i} \to \Re$, where $\mathit{coll\_infrm\_v}_{w_i}$ returns higher utility for the collocationally preferred senses of $w_i$ than for other senses. [sent-153, score-0.992]
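
The collocation function can be sketched in Python as another unary utility, built from the set of senses that a collocate prefers. The helper name `make_collocation_utility`, the `high` and `low` values and the sense labels are hypothetical; how preferred senses are actually obtained is not specified in the sentences above.

```python
def make_collocation_utility(preferred_senses, high=1.0, low=0.0):
    """Build a unary utility that rewards senses preferred by a collocate v."""
    preferred = frozenset(preferred_senses)
    def utility(sense):
        return high if sense in preferred else low
    return utility

# Hypothetical example: the collocate "cancerous" prefers the disease sense.
cancer_given_cancerous = make_collocation_utility({"cancer#disease"})
```

With this utility in the objective, the disease sense of cancer is favoured over the astrological sense whenever cancerous appears as a collocate.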

63 Collocations can also be modeled by assigning more than one variable to the agents or by adding a dummy agent which gives collocational information but in view of simplicity we do not go into those details. [sent-154, score-0.327]

64 Topical word associations, semantic word associations and selectional preferences can also be modeled similarly to collocations. [sent-155, score-0.086]

65 Complex information involving more than two entities can be modelled by using n-ary utility functions. [sent-156, score-0.316]

66 A utility function based on semantic relatedness is defined for every pair of words falling in a particular window size. [sent-160, score-0.387]

67 Restricting utility functions to a window size reduces the number of constraints. [sent-161, score-0.324]

68 An objective function is defined as the sum of these restricted utility functions over the entire sentence, thus allowing information to flow across all the words. [sent-162, score-0.418]

69 Hence, a DCOP algorithm which aims to maximize this objective function leads to a globally optimal solution. [sent-163, score-0.094]
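
The window restriction can be made concrete with a short Python sketch that lists the word pairs receiving a pairwise utility function. It assumes that a window of size `window` means two word positions are at most `window` apart; the source does not define the window precisely, and the function name is hypothetical.

```python
def window_pairs(n_words, window):
    """Return index pairs (i, j), i < j, whose distance is at most `window`."""
    return [
        (i, j)
        for i in range(n_words)
        for j in range(i + 1, min(n_words, i + window + 1))
    ]
```

Under this assumption, a four-word sentence with a window of 1 yields three pairwise utility functions, whereas a window of 3 yields all six possible pairs, which shows how a smaller window reduces the number of constraints while chains of overlapping pairs still connect every word in the sentence.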

70 Results show that our system performs consistently better than Sinha and Mihalcea (2007), which uses exactly the same knowledge sources as ours (with the exception of adverbs in Senseval-2). [sent-175, score-0.255]

71 Table 1 also shows the system (Agirre and Soroa, 2009), which obtained best results for knowledge based WSD. [sent-178, score-0.098]

72 A direct comparison between this and our system is not quantitative since they used additional knowledge such as extended WordNet relations (Mihalcea and 1http://liawww. [sent-179, score-0.098]

73 ch/frodo/ Moldovan, 2001) and sense disambiguated gloss present in WordNet3. [sent-181, score-0.182]

74 In Agirre09, it falls in the range of 30 to 180 minutes on a much more powerful system with 16 GB memory having four 2. [sent-263, score-0.07]

75 Since DCOP algorithms are truly distributed in nature, the execution times can be further reduced by running them in parallel on multiple processors. [sent-266, score-0.125]

76 6 Related work Earlier approaches to WSD which encoded information from a variety of knowledge sources can be classified as follows: • Supervised approaches: Most of the supervised systems (Yarowsky and Florian, 2002; Lee and Ng, 2002; Martínez et al. [sent-267, score-0.255]

77 , 2002; Stevenson and Wilks, 2001) rely on the sense tagged data. [sent-268, score-0.143]

78 These are mainly discriminative or aggregative models which essentially pose WSD as a classification problem. [sent-269, score-0.07]

79 Further, they lack the ability to directly represent constraints like one sense per discourse. [sent-273, score-0.175]

80 • Graph based approaches: These approaches crucially rely on a lexical knowledge base. [sent-274, score-0.098]

81 Graph-based WSD approaches (Agirre and Soroa, 2009; Sinha and Mihalcea, 2007) perform disambiguation over a graph composed of senses (nodes) and relations between pairs of senses (edges). [sent-275, score-0.305]

82 The edge weights encode information from a lexical knowledge base but lack an efficient way of modelling information from other knowledge sources like collocational information, selectional preferences, domain information, discourse. [sent-276, score-0.543]

83 Also, the edges represent binary utility functions defined over two entities, which lack the ability to encode ternary and, in general, any N-ary utility functions. [sent-277, score-0.604]

84 Discussion This framework provides a convenient way of integrating information from various knowledge sources by defining their utility functions. [sent-278, score-0.57]

85 Information from different knowledge sources can be weighted based on the setting at hand. [sent-279, score-0.298]

86 For example, in a domain specific WSD setting, sense distributions play a crucial role. [sent-280, score-0.184]

87 The utility function corresponding to the sense distributions can be weighted higher in order to take advantage of domain information. [sent-281, score-0.545]

88 Thus for a given WSD setting, this framework allows us to find 1) the impact of each knowledge source individually 2) the best combination of knowledge sources. [sent-283, score-0.196]

89 As the number of constraints or words increases, the search space grows, thereby increasing the time and memory needed to solve the problem. [sent-287, score-0.072]

90 Also, DCOP algorithms exhibit a trade-off between the memory used and the number of messages communicated between agents. [sent-288, score-0.072]

91 DPOP (Petcu and Faltings, 2005) uses a linear number of messages but requires exponential memory, whereas ADOPT (Modi et al. [sent-289, score-0.072]

92 8 Future Work In our experiment, we only used relatedness based utility functions derived from WordNet. [sent-292, score-0.393]

93 The effect of other knowledge sources remains to be evaluated individually and in combination. [sent-293, score-0.255]

94 The best possible combination of weights of knowledge sources is yet to be engineered. [sent-294, score-0.255]

95 9 Conclusion We initiated a new line of investigation into WSD by modelling it in a distributed constraint optimization framework. [sent-296, score-0.293]

96 We showed that this framework is powerful enough to encode information from various knowledge sources. [sent-297, score-0.133]

97 Our experimental results show that a simple DCOP based model encoding just word similarity constraints performs comparably with the state-of-the-art knowledge based WSD systems. [sent-298, score-0.13]

98 An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation. [sent-349, score-0.398]

99 Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. [sent-390, score-0.196]

100 The interaction of knowledge sources in word sense disambiguation. [sent-395, score-0.398]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dcop', 0.629), ('utility', 0.28), ('wsd', 0.273), ('dwi', 0.174), ('sources', 0.157), ('sense', 0.143), ('agirre', 0.14), ('agents', 0.136), ('fk', 0.127), ('senses', 0.126), ('modi', 0.108), ('sinha', 0.105), ('distributed', 0.1), ('knowledge', 0.098), ('mihalcea', 0.094), ('wi', 0.091), ('cfk', 0.087), ('petcu', 0.087), ('agent', 0.086), ('mart', 0.082), ('collocations', 0.082), ('nez', 0.077), ('constraint', 0.077), ('relatedness', 0.069), ('modelling', 0.067), ('cancer', 0.065), ('faltings', 0.065), ('frodo', 0.065), ('swi', 0.065), ('wj', 0.064), ('stevenson', 0.057), ('objective', 0.056), ('eneko', 0.054), ('disambiguation', 0.053), ('collocational', 0.052), ('soroa', 0.052), ('optimization', 0.049), ('collectively', 0.049), ('edmonds', 0.049), ('multiagent', 0.049), ('associations', 0.047), ('functions', 0.044), ('aggregative', 0.043), ('cancerous', 0.043), ('coll', 0.043), ('dcops', 0.043), ('dpop', 0.043), ('dwj', 0.043), ('eaut', 0.043), ('iiit', 0.043), ('infrm', 0.043), ('institution', 0.043), ('mailler', 0.043), ('pragnesh', 0.043), ('reddy', 0.043), ('vwi', 0.043), ('weighed', 0.043), ('xzk', 0.043), ('communicate', 0.041), ('domain', 0.041), ('memory', 0.04), ('variables', 0.04), ('gloss', 0.039), ('bank', 0.039), ('disease', 0.038), ('straint', 0.038), ('function', 0.038), ('modelled', 0.036), ('gb', 0.035), ('cij', 0.035), ('cotton', 0.035), ('jay', 0.035), ('various', 0.035), ('rada', 0.034), ('yarowsky', 0.034), ('associated', 0.033), ('allwords', 0.033), ('dj', 0.033), ('messages', 0.032), ('constraints', 0.032), ('wm', 0.031), ('wilks', 0.031), ('minutes', 0.03), ('selectional', 0.03), ('wordnet', 0.03), ('refers', 0.029), ('returns', 0.028), ('ghz', 0.028), ('india', 0.028), ('preferences', 0.028), ('modeled', 0.028), ('pose', 0.027), ('florian', 0.027), ('hyderabad', 0.027), ('dm', 0.026), ('river', 0.026), ('dn', 0.026), ('execution', 0.025), ('solution', 0.025), ('variable', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

Author: Siva Reddy ; Abhilash Inumella

Abstract: This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property to jointly maximize over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for various knowledge sources. For the purpose of evaluation, we modelled all-words WSD as a simple DCOP problem. The results are competitive with state-of-the-art knowledge-based systems.

2 0.20366555 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Word sense disambiguation (WSD) systems based on supervised learning achieved the best performance in SensEval and SemEval workshops. However, there are few publicly available open source WSD systems. This limits the use of WSD in other applications, especially for researchers whose research interests are not in WSD. In this paper, we present IMS, a supervised English all-words WSD system. The flexible framework of IMS allows users to integrate different preprocessing tools, additional features, and different classifiers. By default, we use linear support vector machines as the classifier with multiple knowledge-based features. In our implementation, IMS achieves state-of-the-art results on several SensEval and SemEval tasks.

3 0.16741806 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

Author: Simone Paolo Ponzetto ; Roberto Navigli

Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

4 0.1631002 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

Author: Weiwei Guo ; Mona Diab

Abstract: Word Sense Disambiguation remains one of the most complex problems facing computational linguists to date. In this paper we present a system that combines evidence from a monolingual WSD system together with that from a multilingual WSD system to yield state of the art performance on standard All-Words data sets. The monolingual system is based on a modification of the graph based state of the art algorithm In-Degree. The multilingual system is an improvement over an All-Words unsupervised approach, SALAAM. SALAAM exploits multilingual evidence as a means of disambiguation. In this paper, we present modifications to both of the original approaches and then their combination. We finally report the highest results obtained to date on the SENSEVAL 2 standard data set using an unsupervised method, we achieve an overall F measure of 64.58 using a voting scheme.

5 0.15180492 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

6 0.13950738 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

7 0.12901707 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

8 0.098851748 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities

9 0.090946779 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

10 0.080728747 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

11 0.068278298 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

12 0.067644857 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

13 0.064507343 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

14 0.059667502 27 acl-2010-An Active Learning Approach to Finding Related Terms

15 0.059150267 6 acl-2010-A Game-Theoretic Model of Metaphorical Bargaining

16 0.05642318 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

17 0.053490084 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

18 0.048505705 158 acl-2010-Latent Variable Models of Selectional Preference

19 0.047437806 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

20 0.045052521 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.138), (1, 0.077), (2, -0.038), (3, -0.024), (4, 0.227), (5, -0.002), (6, 0.107), (7, 0.077), (8, -0.04), (9, 0.074), (10, 0.029), (11, -0.104), (12, 0.056), (13, 0.11), (14, -0.039), (15, -0.116), (16, -0.029), (17, -0.106), (18, 0.014), (19, 0.044), (20, -0.001), (21, -0.037), (22, -0.024), (23, 0.004), (24, 0.059), (25, 0.056), (26, -0.009), (27, -0.011), (28, 0.092), (29, 0.13), (30, 0.044), (31, 0.055), (32, -0.003), (33, -0.009), (34, -0.031), (35, 0.05), (36, 0.001), (37, 0.062), (38, -0.091), (39, 0.067), (40, 0.022), (41, 0.12), (42, 0.145), (43, 0.039), (44, 0.081), (45, -0.122), (46, -0.045), (47, -0.006), (48, 0.048), (49, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95056182 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

Author: Siva Reddy ; Abhilash Inumella

Abstract: This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property to jointly maximize over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for various knowledge sources. For the purpose of evaluation, we modelled all-words WSD as a simple DCOP problem. The results are competitive with state-of-the-art knowledge-based systems.

2 0.77062577 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Word sense disambiguation (WSD) systems based on supervised learning achieved the best performance in SensEval and SemEval workshops. However, there are few publicly available open source WSD systems. This limits the use of WSD in other applications, especially for researchers whose research interests are not in WSD. In this paper, we present IMS, a supervised English all-words WSD system. The flexible framework of IMS allows users to integrate different preprocessing tools, additional features, and different classifiers. By default, we use linear support vector machines as the classifier with multiple knowledge-based features. In our implementation, IMS achieves state-of-the-art results on several SensEval and SemEval tasks.

3 0.76606524 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

Author: Weiwei Guo ; Mona Diab

Abstract: Word Sense Disambiguation remains one of the most complex problems facing computational linguists to date. In this paper we present a system that combines evidence from a monolingual WSD system together with that from a multilingual WSD system to yield state of the art performance on standard All-Words data sets. The monolingual system is based on a modification of the graph based state of the art algorithm In-Degree. The multilingual system is an improvement over an All-Words unsupervised approach, SALAAM. SALAAM exploits multilingual evidence as a means of disambiguation. In this paper, we present modifications to both of the original approaches and then their combination. We finally report the highest results obtained to date on the SENSEVAL 2 standard data set using an unsupervised method, we achieve an overall F measure of 64.58 using a voting scheme.

4 0.73871374 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

Author: Mitesh Khapra ; Anup Kulkarni ; Saurabh Sohoney ; Pushpak Bhattacharyya

Abstract: In spite of decades of research on word sense disambiguation (WSD), all-words general purpose WSD has remained a distant goal. Many supervised WSD systems have been built, but the effort of creating the training corpus - annotated sense marked corpora - has always been a matter of concern. Therefore, attempts have been made to develop unsupervised and knowledge based techniques for WSD which do not need sense marked corpora. However such approaches have not proved effective, since they typically do not better Wordnet first sense baseline accuracy. Our research reported here proposes to stick to the supervised approach, but with far less demand on annotation. We show that if we have ANY sense marked corpora, be it from mixed domain or a specific domain, a small amount of annotation in ANY other domain can deliver the goods almost as if exhaustive sense marking were available in that domain. We have tested our approach across Tourism and Health domain corpora, using also the well known mixed domain SemCor corpus. Accuracy figures close to self domain training lend credence to the viability of our approach. Our contribution thus lies in finding a convenient middle ground between pure supervised and pure unsupervised WSD. Finally, our approach is not restricted to any specific set of target words, a departure from a commonly observed practice in domain specific WSD.

5 0.59543794 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

6 0.58144903 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

7 0.4924866 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

8 0.48825067 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities

9 0.45367584 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

10 0.44379604 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

11 0.34989432 27 acl-2010-An Active Learning Approach to Finding Related Terms

12 0.34286374 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

13 0.31315577 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two

14 0.30372792 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

15 0.29950157 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

16 0.29146296 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

17 0.29002205 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms

18 0.28887042 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices

19 0.2801789 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

20 0.27848104 175 acl-2010-Models of Metaphor in NLP


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.025), (25, 0.117), (39, 0.017), (42, 0.017), (44, 0.011), (59, 0.097), (73, 0.033), (78, 0.029), (80, 0.019), (82, 0.312), (83, 0.057), (84, 0.028), (98, 0.132)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75511092 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

Author: Siva Reddy ; Abhilash Inumella

Abstract: This work models the Word Sense Disambiguation (WSD) problem as a Distributed Constraint Optimization Problem (DCOP). To model WSD as a DCOP, we view information from various knowledge sources as constraints. DCOP algorithms have the remarkable property to jointly maximize over a wide range of utility functions associated with these constraints. We show how utility functions can be designed for various knowledge sources. For the purpose of evaluation, we modelled all-words WSD as a simple DCOP problem. The results are competitive with state-of-the-art knowledge-based systems.

2 0.75183296 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

Author: Karin Murthy ; Tanveer A Faruquie ; L Venkata Subramaniam ; Hima Prasad K ; Mukesh Mohania

Abstract: We propose a novel method to automatically acquire a term-frequency-based taxonomy from a corpus using an unsupervised method. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.

3 0.54978228 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

4 0.54951912 169 acl-2010-Learning to Translate with Source and Target Syntax

Author: David Chiang

Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.

5 0.54947418 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.

6 0.54799426 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

7 0.54345149 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

8 0.54299831 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

9 0.54270536 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

10 0.54110599 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

11 0.54071152 71 acl-2010-Convolution Kernel over Packed Parse Forest

12 0.53984809 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

13 0.53814155 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

14 0.53232527 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

15 0.5314939 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

16 0.53033745 162 acl-2010-Learning Common Grammar from Multilingual Corpus

17 0.52987254 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

18 0.52898639 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

19 0.52889442 248 acl-2010-Unsupervised Ontology Induction from Text

20 0.52857804 114 acl-2010-Faster Parsing by Supertagger Adaptation