acl acl2013 acl2013-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. [sent-3, score-0.666]
2 This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context. [sent-4, score-1.474]
3 1 Introduction A metaphor is a literary figure of speech that describes a subject by asserting that it is, on some point of comparison, the same as another otherwise unrelated object. [sent-5, score-0.602]
4 A very challenging task in linguistics is metaphor identification and its interpretation. [sent-8, score-0.666]
5 Metaphor identification procedure (MIP) is a method for identifying metaphorically used words in discourse. [sent-9, score-0.153]
6 It can be used to recognize metaphors in spoken and written language. [sent-10, score-0.307]
7 Since many words can be considered metaphorical in different contexts, MIP requires a clear distinction between words that convey metaphorical meaning and those that do not, despite the fact that language generally differs in the degrees of metaphoricity. [sent-12, score-0.587]
8 In this paper we propose a method for identifying metaphorical usage in verbs. [sent-13, score-0.357]
9 Our method looks for semantic analogies in the context of a verb by comparing it against previously known instances of literal and non-literal usage of the same verb in different contexts. [sent-14, score-0.521]
10 After discussing the metaphor identification literature (Section 2), we proceed to present our research proposal (Section 3) and to present and discuss our first experiments based on WordNet similarity measures (Section 4). [sent-15, score-0.73]
11 2 Background According to Lakoff and Johnson (1980), metaphor is a productive phenomenon that operates at the level of mental processes. [sent-17, score-0.633]
12 This view was subsequently acquired and extended by a multitude of approaches (Grady, 1997; Narayanan, 1997; Fauconnier and Turner, 2002; Feldman, 2006; Pinker, 2007) and the term conceptual metaphor was adopted to describe it. [sent-19, score-0.659]
13 In cognitive linguistics, conceptual metaphor, or cognitive metaphor, refers to the understanding of an idea, or conceptual domain, in terms of another, for example, understanding quantity in terms of directionality as in, for example, ‘prices are rising’ . [sent-20, score-0.114]
14 A conceptual metaphor uses an idea and links it to another in order to better understand something. [sent-21, score-0.659]
15 It is generally accepted that the conceptual metaphor of viewing communication as a conduit is a large theory explained with a metaphor. [sent-22, score-0.659]
16 These metaphors are prevalent in communication and everyone actually perceives and acts in accordance with the metaphors. [sent-23, score-0.351]
17 2.1 Metaphor Identification Automatic processing of metaphor can be clearly divided into two subtasks: metaphor identification (distinguishing between literal and metaphorical language in text) and metaphor interpretation (identifying the intended literal meaning of a metaphorical expression). [sent-27, score-1.829]
(Page header: Proceedings of the ACL 2013 Student Research Workshop, pages 23–30, Sofia, Bulgaria, August 4–9 2013. Association for Computational Linguistics.)
19 The most influential account of metaphor identification is that of Wilks (1978). [sent-29, score-0.666]
20 According to Wilks, metaphors represent a violation of selectional restrictions in a given context. [sent-30, score-0.406]
21 First, literalness is distinguished from non-literalness using selectional preference violation as an indicator. [sent-37, score-0.139]
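The violation test described here can be sketched as follows; the preference table and semantic classes below are invented for illustration, not taken from Wilks's or Fass's actual systems:

```python
# Toy sketch of selectional preference violation as a non-literalness
# indicator. A real system would induce these preferences from data.
PREFERENCES = {
    # verb -> semantic classes its direct object literally prefers
    "drink": {"liquid"},
    "kill": {"living_thing"},
}

SEMANTIC_CLASS = {
    "water": "liquid",
    "coffee": "liquid",
    "process": "abstract",
    "hope": "abstract",
}

def violates_preference(verb, obj):
    """Return True if the object's class violates the verb's literal preferences."""
    preferred = PREFERENCES.get(verb)
    if preferred is None:
        return False  # no information available, assume literal
    return SEMANTIC_CLASS.get(obj, "unknown") not in preferred

print(violates_preference("drink", "water"))   # literal use
print(violates_preference("kill", "process"))  # candidate non-literal use
```

A violation only flags a candidate; as the text notes, the system must still separate metaphor from metonymy and anomaly.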
22 If the system fails to recognize metonymy, it proceeds to search the knowledge base for a relevant analogy in order to discriminate metaphorical relations from anomalous ones. [sent-39, score-0.331]
23 Berber Sardinha (2002) describes a collocation-based method for spotting metaphors in corpora. [sent-40, score-0.307]
24 The first step was to pick out a reasonable number of words that had an initial likelihood of being part of metaphorical expressions. [sent-42, score-0.279]
25 , 2003), under the assumption that words involved in metaphorical expressions tend to be denotationally unrelated. [sent-45, score-0.314]
26 The scores had to be adapted in order for them to be useful for metaphor analysis. [sent-47, score-0.602]
27 The results indicated that the procedure did pick up some major metaphors in the corpus, but it also captured metonyms. [sent-49, score-0.334]
28 Another approach to finding metaphor in corpora is CorMet, presented by Mason (2004). [sent-50, score-0.602]
29 When the system spots different verbs with similar selectional preferences (i. [sent-52, score-0.143]
30 CorMet requires specific domain corpora and a list of verbs for each domain. [sent-55, score-0.116]
31 The most typical verbs for each specific corpus are identified through frequency markedness, by comparing the frequencies of word stems in the domain corpus with those of the BNC. [sent-58, score-0.144]
32 Alternative approaches search for metaphors of a specific domain defined a priori in a specific type of discourse. [sent-61, score-0.348]
33 This idea originates from a similarity-based word sense disambiguation method developed by Karov and Edelman (1998). [sent-69, score-0.17]
34 The method employs a set of seed sentences, where the senses are annotated, computes similarity between the sentence containing the word to be disambiguated and all of the seed sentences and selects the sense corresponding to the annotation in the most similar seed sentences. [sent-70, score-0.847]
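A minimal sketch of this seed-sentence scheme, using plain Jaccard word overlap as a stand-in for Karov and Edelman's iterative similarity measure (the seed sentences below are invented):

```python
# Label a new sentence with the sense annotation of its most similar
# seed sentence. Jaccard overlap is only a placeholder similarity.
def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def disambiguate(sentence, seeds):
    """seeds: list of (annotated_sentence, sense) pairs."""
    _, sense = max(seeds, key=lambda s: jaccard(sentence, s[0]))
    return sense

seeds = [
    ("the bank raised its interest rates", "finance"),
    ("we walked along the bank of the river", "river"),
]
print(disambiguate("the river bank was muddy", seeds))  # river
```

Birke and Sarkar's adaptation replaces the sense inventory with just two labels, literal and non-literal.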
35 Birke and Sarkar (2006) adapt this algorithm to perform a two-way classification: literal vs. [sent-71, score-0.307]
36 (2006) focus only on metaphors expressed by a verb. [sent-76, score-0.307]
37 They use the hyponymy relation in WordNet and word bigram counts to predict metaphors at the sentence level. [sent-78, score-0.368]
38 This idea is a modification of the selectional preference view of Wilks (1978). [sent-85, score-0.108]
39 Berber Sardinha (2010) presents a computer program for identifying metaphor candidates, which is intended as a tool that can help researchers find words that are more likely to be metaphor vehicles in a corpus. [sent-86, score-1.255]
40 The program is restricted to finding one component of linguistic metaphors and has been trained on business texts in Portuguese, and so it is restricted to that kind of text. [sent-88, score-0.307]
41 (2012) present an approach to automatic metaphor identification in unrestricted text. [sent-90, score-0.666]
42 Starting from a small seed set of manually annotated metaphorical expressions, the system is capable of harvesting a large number of metaphors of similar syntactic structure from a corpus. [sent-91, score-0.733]
43 Their method captures metaphoricity by means of verb and noun clustering. [sent-92, score-0.109]
44 They tested their system starting with a collection of metaphorical expressions representing verb-subject and verb-object constructions, where the verb is used metaphorically. [sent-97, score-0.375]
45 They evaluated the precision of metaphor identification with the help of human judges. [sent-98, score-0.666]
46 Shutova’s system employing unsupervised methods for metaphor identification operates with precision of 0. [sent-99, score-0.697]
47 For verb and noun clustering, they used the subcategorization frame acquisition system by Preiss et al. [sent-101, score-0.137]
48 (2007) and spectral clustering for both verbs and nouns. [sent-102, score-0.129]
49 They acquired selectional preference distributions for Verb-Subject and Verb-Object relations from the BNC parsed by RASP, adopting Resnik’s selectional preference measure, which has been applied to a number of tasks in NLP including word sense disambiguation (Resnik, 1997). [sent-103, score-0.386]
50 3 Detecting the metaphor use of a word by contextual analogy The first task for metaphor processing is its identification in a text. [sent-104, score-1.348]
51 We have seen above how previous approaches either utilize hand-coded knowledge (Fass, 1991), (Krishnakumaran and Zhu, 2007) or reduce the task to searching for metaphors of a specific domain defined a priori in a specific type of discourse (Gedigian et al. [sent-105, score-0.376]
52 By contrast, our research proposal is a method that relies on distributional similarity; the assumption is that the lexical items showing similar behaviour in a large body of text most likely have related meanings. [sent-107, score-0.144]
53 It is traditionally assumed that noun clusters produced using distributional clustering contain concepts that are similar to each other. [sent-109, score-0.241]
54 3.1 Word Sense Disambiguation and Metaphor One of the major developments in metaphor research in the last several years has been the focus on identifying and explicating metaphoric language in real discourse. [sent-111, score-0.664]
55 Most research in Word Sense Disambiguation has concentrated on using contextual features, typically neighboring words, to help infer the correct sense of a target word. [sent-112, score-0.171]
56 In contrast, we are going to discover the predominant sense of a word from raw text because the first sense heuristic is so powerful and because manually sense-tagged data is not always available. [sent-113, score-0.277]
57 In word sense disambiguation, the first or predominant sense heuristic is used when information from the context is not sufficient to make a more informed choice. [sent-114, score-0.316]
58 We will need to use parsed data to find distributionally similar words (nearest neighbors) to the target word, which will reflect the different senses of the word and have associated distributional similarity scores that can be used for ranking the senses according to prevalence. [sent-115, score-0.651]
59 The predominant sense for a target word is determined from a prevalence ranking of the possible senses for that word. [sent-116, score-0.44]
60 The senses will come from a predefined inventory which might be a dictionary or WordNet-like resource. [sent-117, score-0.215]
61 The ranking will be derived using a distributional thesaurus automatically produced from a large corpus, and a semantic similarity measure will be defined over the sense inventory. [sent-118, score-0.354]
62 The distributional thesaurus will contain a set of words that are ‘nearest neighbors’ (Lin, 1998) of the target word with respect to the similarity of the way in which they are distributed. [sent-119, score-0.3]
63 The thesaurus will assign a distributional similarity score to each neighbor word, indicating its closeness to the target word. [sent-120, score-0.355]
64 We assume that the number and distributional similarity scores of neighbors pertaining to a given sense of a target word will reflect the prevalence of that sense in the corpus from which the thesaurus was derived. [sent-121, score-0.621]
65 This is because the more prevalent senses of the word will appear more frequently and in more contexts than other, less prevalent senses. [sent-122, score-0.331]
66 The neighbors of the target word relate to its senses, but are themselves word forms rather than senses. [sent-123, score-0.227]
67 The senses of the target word have to be predefined in a sense inventory and we will need to use a semantic similarity score which will be defined over the sense inventory to relate the neighbors to the various senses of the target word. [sent-124, score-0.969]
68 The measure for ranking the senses will use the sum total of the distributional similarity scores of the k nearest neighbors. [sent-125, score-0.433]
69 This total will be divided between the senses of the target word by apportioning the distributional similarity of each neighbor to the senses. [sent-126, score-0.535]
70 The contribution of each neighbor will be measured in terms of its distributional similarity score so that ‘nearer’ neighbors count for more. [sent-127, score-0.323]
71 The distributional similarity score of each neighbor will be divided between the various senses rather than attributing the neighbor to only one sense. [sent-128, score-0.466]
72 This is done because neighbors can relate to more than one sense due to relationships such as systematic polysemy. [sent-129, score-0.224]
73 To sum up, we will rank the senses of the target word by apportioning the distributional similarity scores of the top k neighbors between the senses. [sent-130, score-0.575]
74 Each distributional similarity score (dss) will be weighted by a normalized semantic similarity score (sss) between the sense and the neighbor. [sent-131, score-0.378]
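The apportioning scheme just described can be sketched as follows; all sense labels, neighbors and scores below are invented for illustration:

```python
# Prevalence ranking: each neighbor's distributional similarity (dss)
# is apportioned across the target word's senses in proportion to a
# normalized semantic similarity (sss) between sense and neighbor.
def rank_senses(senses, neighbors, dss, sss):
    """senses: sense ids; neighbors: top-k neighbor words;
    dss[n]: distributional similarity of neighbor n to the target;
    sss[(sense, n)]: semantic similarity between a sense and n."""
    scores = {s: 0.0 for s in senses}
    for n in neighbors:
        norm = sum(sss[(s, n)] for s in senses) or 1.0
        for s in senses:
            scores[s] += dss[n] * sss[(s, n)] / norm
    return sorted(scores.items(), key=lambda kv: -kv[1])

senses = ["star.celestial", "star.celebrity"]
neighbors = ["planet", "actor"]
dss = {"planet": 0.8, "actor": 0.5}
sss = {("star.celestial", "planet"): 0.9, ("star.celebrity", "planet"): 0.1,
       ("star.celestial", "actor"): 0.1, ("star.celebrity", "actor"): 0.9}
print(rank_senses(senses, neighbors, dss, sss))
```

Because each neighbor's score is divided between the senses rather than attributed to one, neighbors related to several senses (e.g. through systematic polysemy) contribute to all of them.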
75 We chose to use the distributional similarity score described by Lin (1998) because it is an unparameterized measure which uses pointwise mutual information to weight features and which has been shown by Weeds et al. [sent-132, score-0.212]
76 This measure is based on Lin’s information-theoretic similarity theorem (Lin, 1997): The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are. [sent-134, score-0.221]
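In its usual WordNet instantiation, the information shared by A and B is approximated by the information content (IC) of their lowest common subsumer, giving sim(A, B) = 2·IC(lcs) / (IC(A) + IC(B)); a minimal sketch with invented concept probabilities:

```python
import math

# Lin similarity from information content: IC(c) = -log P(c), and the
# commonality of A and B is the IC of their lowest common subsumer.
# The probabilities below are invented for illustration.
P = {"entity": 1.0, "animal": 0.1, "dog": 0.02, "cat": 0.03}

def ic(concept):
    return -math.log(P[concept])

def lin(a, b, lcs):
    return 2 * ic(lcs) / (ic(a) + ic(b))

print(round(lin("dog", "cat", "animal"), 3))
```

In practice the probabilities are estimated from corpus frequencies and the lowest common subsumer is found in the WordNet hierarchy.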
77 For the purposes of this work, the verb of the seed metaphors should be considered as the context. [sent-137, score-0.247]
78 We are going to take as seed metaphors the examples of Lakoff’s Master Metaphor List (Lakoff et al. [sent-138, score-0.454]
79 Then, as we will already have found the k nearest neighbors for each noun and will have created clusters of nouns which can appear in the same context, we will be able to calculate their semantic similarity. [sent-140, score-0.236]
80 (2003) in order to measure the semantic similarity between each member of the cluster and the noun of the annotated metaphor. [sent-142, score-0.173]
81 The WordNet similarity package supports a range of WordNet similarity scores. [sent-143, score-0.192]
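As a toy illustration of the kind of taxonomy-based score such packages compute, here is Wu-Palmer similarity over a tiny hand-built hierarchy (not real WordNet data):

```python
# Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)),
# computed over a small invented is-a hierarchy.
PARENT = {"dog": "canine", "canine": "animal", "cat": "feline",
          "feline": "animal", "animal": "entity", "entity": None}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path  # node, ..., entity

def depth(node):
    return len(path_to_root(node))  # the root "entity" has depth 1

def wu_palmer(a, b):
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("dog", "cat"))  # 2*2 / (4+4) = 0.5
```

The package's other measures (path length, Resnik, Lin, adapted Lesk, etc.) differ in how they score the same hierarchy, which is why the paper compares seven of them in Table 1.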
82 Each time, we want to estimate whether the similarity between the target noun and the seed metaphor is higher than the similarity between the target noun and another literal word which could appear in the given context. [sent-145, score-1.496]
83 By calculating the target word’s semantic similarity with the seed words (literal or non-literal), we will be able to find out whether the word has a literal or metaphorical meaning in the given context. [sent-146, score-0.902]
84 In this way, starting from an already known metaphor, we will be able to identify other non-literal uses of words which may appear in the same context, by comparing the target word’s similarity to the seed metaphor against its similarity to a literal meaning of a word in the same context. [sent-147, score-1.717]
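The decision rule described above can be sketched for a single verb as follows; the seed nouns and similarity scores are placeholders for values a WordNet measure would supply:

```python
# A noun argument is labelled non-literal if it is semantically closer
# to the verb's non-literal seed noun than to its literal seed noun.
SIM = {  # invented similarity scores for arguments of "break"
    ("promise", "silence"): 0.7,  # "break the silence": non-literal seed
    ("promise", "glass"): 0.2,    # "break the glass": literal seed
    ("cup", "silence"): 0.1,
    ("cup", "glass"): 0.8,
}

def is_nonliteral(noun, literal_seed, nonliteral_seed, sim):
    return sim[(noun, nonliteral_seed)] > sim[(noun, literal_seed)]

print(is_nonliteral("promise", "glass", "silence", SIM))  # True
print(is_nonliteral("cup", "glass", "silence", SIM))      # False
```

The experiments in Section 4 run exactly this kind of comparison for every (literal seed, non-literal seed) pair of each verb.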
85 4 First Experiments and Results In order to evaluate our method, we search for common English verbs which can take either literal or non-literal predicates. [sent-149, score-0.755]
86 As the most common verbs (be, have and do) can function as both main verbs and auxiliary verbs, we did not use them in our experiments. [sent-150, score-0.15]
87 More specifically, in our experiments we concentrated on literal and non-literal predicates of the verbs: break, catch, cut, draw, drop, find, get, hate, hear, hold, keep, kill, leave, listen, lose, love, make, pay, put, save, see, take, want. [sent-152, score-0.706]
88 In the case of the BNC, we were able to extract the direct object from the dependency parses, but had to manually check metaphorical vs. [sent-159, score-0.311]
89 In all, we collected 124 instances of literal usage and 275 instances of non-literal usage involving 311 unique nouns. [sent-161, score-0.413]
90 With this body of literal and non-literal contexts, we tried every possible combination of one literal and one non-literal object for each verb as seed, and tested with the remaining words. [sent-162, score-0.736]
91 Only when differences in the similarities accumulate before the comparison between literal and non-literal context is made (three left-most columns in Table 1) does the choice of similarity measure make a difference. [sent-166, score-0.471]
92 Table 1: Fβ=1 scores for all combinations of seven different similarity measures and five ways of deriving a single judgement on literal usage by testing all senses of a word against all senses of the seed words. [sent-172, score-1.056]
93 [Table 1 header: Measure | Maximum (Mean, Std dev) | Average (Mean, Std dev) | Sum (Mean, Std dev) | Simple Voting (Mean, Std dev). The table body, whose first row is Adapted Lesk, is not recoverable from this extraction.] [sent-173, score-0.164]
94 For both measures, final judgement is made on average similarity of all senses. [sent-224, score-0.123]
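The column headings of Table 1 correspond to different ways of collapsing the sense-by-sense similarity scores into a single judgement; a minimal sketch (the scores below are invented):

```python
# Aggregate the similarity scores between every sense of the target
# word and every sense of a seed word into one value.
def aggregate(sims, how):
    if how == "maximum":
        return max(sims)
    if how == "average":
        return sum(sims) / len(sims)
    if how == "sum":
        return sum(sims)
    raise ValueError(how)

sims = [0.9, 0.4, 0.1]
print(aggregate(sims, "maximum"))  # 0.9
print(aggregate(sims, "average"))
```

Simple voting, the fifth scheme in Table 1, would instead compare literal and non-literal seeds once per sense pair and count which side wins more often.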
95 5 Conclusions In this paper, we presented a mildly supervised method for identifying metaphorical verb usage by taking the local context into account. [sent-227, score-0.457]
96 Furthermore, our method differs in that it compares the meanings of nouns which appear in the same context, without associating them with concepts and then comparing the concepts. [sent-230, score-0.18]
97 We selected this procedure because words of the same abstract concept may not appear in the same context, while words from different concepts could appear in the same context, especially when the given context is metaphorical. [sent-231, score-0.221]
98 met*: a method for discriminating metonymy and metaphor by computer. [sent-249, score-0.639]
99 Using syntactic dependency as local context to resolve word sense ambiguity. [sent-299, score-0.166]
100 Using measures of semantic relatedness for word sense disambiguation. [sent-318, score-0.159]
wordName wordTfidf (topN-words)
[('metaphor', 0.602), ('literal', 0.307), ('metaphors', 0.307), ('metaphorical', 0.279), ('senses', 0.183), ('seed', 0.147), ('sense', 0.099), ('similarity', 0.096), ('neighbors', 0.09), ('birke', 0.09), ('gedigian', 0.09), ('distributional', 0.087), ('berber', 0.079), ('verbs', 0.075), ('std', 0.069), ('selectional', 0.068), ('cormet', 0.067), ('fass', 0.067), ('krishnakumaran', 0.067), ('padwardhan', 0.067), ('non', 0.066), ('lakoff', 0.066), ('identification', 0.064), ('shutova', 0.063), ('verb', 0.061), ('resnik', 0.057), ('conceptual', 0.057), ('bnc', 0.057), ('clustering', 0.054), ('usage', 0.053), ('analogy', 0.052), ('concepts', 0.052), ('predominant', 0.051), ('neighbor', 0.05), ('noun', 0.048), ('target', 0.046), ('apportioning', 0.045), ('fauconnier', 0.045), ('florou', 0.045), ('karov', 0.045), ('literaly', 0.045), ('sardinha', 0.045), ('wilks', 0.044), ('prevalent', 0.044), ('disambiguation', 0.043), ('thesaurus', 0.043), ('wordnet', 0.043), ('dev', 0.041), ('domain', 0.041), ('sarkar', 0.04), ('preference', 0.04), ('context', 0.039), ('nearest', 0.038), ('metonymy', 0.037), ('mip', 0.037), ('metaphoric', 0.037), ('metaphorically', 0.037), ('srini', 0.037), ('met', 0.036), ('palmer', 0.035), ('relate', 0.035), ('expressions', 0.035), ('preiss', 0.034), ('portuguese', 0.034), ('closeness', 0.033), ('narayanan', 0.033), ('kingsbury', 0.033), ('prevalence', 0.033), ('hyponymy', 0.033), ('appear', 0.032), ('inventory', 0.032), ('measures', 0.032), ('object', 0.032), ('operates', 0.031), ('violation', 0.031), ('taxonomic', 0.03), ('weeds', 0.03), ('lesk', 0.03), ('meaning', 0.029), ('measure', 0.029), ('body', 0.029), ('items', 0.028), ('subcategorization', 0.028), ('nouns', 0.028), ('word', 0.028), ('fillmore', 0.028), ('searching', 0.028), ('vu', 0.027), ('apparent', 0.027), ('judgement', 0.027), ('procedure', 0.027), ('wu', 0.027), ('rhetorical', 0.026), ('concentrated', 0.026), ('intended', 0.026), ('tony', 0.026), ('collocations', 0.026), 
('identifying', 0.025), ('master', 0.025), ('unannotated', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 116 acl-2013-Detecting Metaphor by Contextual Analogy
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
2 0.41379282 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
Author: Zornitsa Kozareva
Abstract: Metaphor is an important way of conveying the affect of people, hence understanding how people use metaphors to convey affect is important for the communication between individuals and increases cohesion if the perceived affect of the concrete example is the same for the two individuals. Therefore, building computational models that can automatically identify the affect in metaphor-rich texts like “The team captain is a rock.”, “Time is money.”, “My lawyer is a shark.” is an important challenging problem, which has been of great interest to the research community. To solve this task, we have collected and manually annotated the affect of metaphor-rich texts for four languages. We present novel algorithms that integrate triggers for cognitive, affective, perceptual and social processes with stylistic and lexical information. By running evaluations on datasets in English, Spanish, Russian and Farsi, we show that the developed affect polarity and valence prediction technology of metaphor-rich texts is portable and works equally well for different languages.
3 0.24236143 88 acl-2013-Computational considerations of comparisons and similes
Author: Vlad Niculae ; Victoria Yaneva
Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.
4 0.18005058 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Author: Mohammad Taher Pilehvar ; David Jurgens ; Roberto Navigli
Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: seman- tic textual similarity, word similarity, and word sense coarsening.
5 0.15623511 53 acl-2013-Annotation of regular polysemy and underspecification
Author: Hector Martinez Alonso ; Bolette Sandford Pedersen ; Nuria Bel
Abstract: We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods: majority voting with a theory-compliant backoff strategy, and MACE, an unsupervised system to choose the most likely sense from all the annotations.
6 0.14249435 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
7 0.13983652 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection
8 0.13804458 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
9 0.13711676 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
10 0.12301037 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
11 0.10506759 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context
12 0.10213282 162 acl-2013-FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection
13 0.098941296 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain
14 0.093284406 238 acl-2013-Measuring semantic content in distributional vectors
15 0.090875879 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
16 0.080685548 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
17 0.075018071 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
18 0.074957691 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
19 0.071626782 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
20 0.070391744 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering
topicId topicWeight
[(0, 0.174), (1, 0.089), (2, 0.034), (3, -0.137), (4, -0.087), (5, -0.181), (6, -0.15), (7, 0.112), (8, 0.058), (9, 0.002), (10, -0.015), (11, 0.072), (12, -0.099), (13, -0.054), (14, 0.074), (15, 0.007), (16, 0.012), (17, 0.027), (18, 0.02), (19, -0.012), (20, 0.065), (21, -0.067), (22, 0.113), (23, -0.1), (24, 0.106), (25, -0.027), (26, -0.134), (27, 0.017), (28, -0.052), (29, 0.067), (30, 0.063), (31, -0.28), (32, -0.014), (33, -0.009), (34, 0.093), (35, 0.149), (36, 0.229), (37, -0.134), (38, 0.128), (39, 0.04), (40, -0.057), (41, 0.09), (42, 0.108), (43, 0.196), (44, -0.019), (45, -0.059), (46, 0.039), (47, 0.153), (48, 0.159), (49, 0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.90989828 116 acl-2013-Detecting Metaphor by Contextual Analogy
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
2 0.83026665 88 acl-2013-Computational considerations of comparisons and similes
Author: Vlad Niculae ; Victoria Yaneva
Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.
3 0.7882489 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
Author: Zornitsa Kozareva
Abstract: Metaphor is an important way of conveying the affect of people, hence understanding how people use metaphors to convey affect is important for the communication between individuals and increases cohesion if the perceived affect of the concrete example is the same for the two individuals. Therefore, building computational models that can automatically identify the affect in metaphor-rich texts like “The team captain is a rock.”, “Time is money.”, “My lawyer is a shark.” is an important challenging problem, which has been of great interest to the research community. To solve this task, we have collected and manually annotated the affect of metaphor-rich texts for four languages. We present novel algorithms that integrate triggers for cognitive, affective, perceptual and social processes with stylistic and lexical information. By running evaluations on datasets in English, Spanish, Russian and Farsi, we show that the developed affect polarity and valence prediction technology of metaphor-rich texts is portable and works equally well for different languages.
4 0.57442582 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. 
This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X 660 Proce dingSsof oifa, th Beu 5l1gsarti Aan,An u aglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioin gauli Lsitnicgsu,i psatgices 6 0–670, in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we per- ceive the world by re-shaping the way we make similarity judgments. 
So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. 
Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a WordNet-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-and-vertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of noun-senses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instance-level ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pederson et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them.
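The Wu & Palmer measure described above can be sketched in a few lines. The toy taxonomy below (a hand-built parent map with invented concepts) is an assumption for illustration; a real implementation would walk WordNet's hypernym links instead.

```python
# Sketch of Wu & Palmer (1994) similarity over a toy taxonomy.
# The parent map is a hand-built assumption, not WordNet data.

PARENT = {
    "oatmeal": "breakfast_food",
    "cornflakes": "breakfast_food",
    "breakfast_food": "food",
    "food": "entity",
    "scalpel": "instrument",
    "instrument": "entity",
    "entity": None,  # root
}

def ancestors(concept):
    """Chain from the concept up to the root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def depth(concept):
    # The root has depth 1, as is conventional for these measures.
    return len(ancestors(concept))

def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(shared, key=depth)

def wu_palmer(a, b):
    # Twice the depth of the LCS over the sum of the individual depths.
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

On this toy hierarchy, wu_palmer("oatmeal", "cornflakes") is 0.75, while pairing oatmeal with scalpel falls to about 0.29, since their only shared ancestor is the root.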
To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield more rounded similarity measures. Resnick (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of information shared by those concepts, and thus a good estimate of their similarity.
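The corpus-based notion of information content can be sketched as follows. The taxonomy and the per-concept mention counts are invented for illustration; a real system would count occurrences in a large corpus.

```python
import math

# Sketch of Resnick's (1995) corpus-based information content: a
# concept's probability counts direct mentions plus, by presupposition,
# mentions of all its descendants. All counts below are invented.

CHILDREN = {
    "entity": ["food", "instrument"],
    "food": ["breakfast_food", "fruit"],
    "breakfast_food": ["oatmeal", "cornflakes"],
    "instrument": ["scalpel"],
    "fruit": [], "oatmeal": [], "cornflakes": [], "scalpel": [],
}
MENTIONS = {  # invented direct-mention counts per concept
    "entity": 0, "food": 5, "breakfast_food": 3, "fruit": 8,
    "instrument": 2, "oatmeal": 4, "cornflakes": 6, "scalpel": 2,
}

def occurrences(concept):
    """Direct mentions plus those presupposed by all descendants."""
    return MENTIONS[concept] + sum(occurrences(c) for c in CHILDREN[concept])

TOTAL = occurrences("entity")  # every mention presupposes the root

def information_content(concept):
    # Negative log likelihood: general concepts are predictable, hence
    # carry little information; rare, specific ones carry more.
    return -math.log(occurrences(concept) / TOTAL)
```

As the text predicts, the root's information content is zero on these counts, and it grows as we descend toward specific concepts.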
Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) present a minor innovation, showing how Resnick’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate information content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus; Seco et al. also show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures.
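A minimal sketch of the intrinsic variant, plugged into Lin's information ratio, follows. The toy taxonomy is again an invented assumption standing in for WordNet's noun hierarchy, and the formula used is Seco et al.'s IC(c) = 1 - log(hypo(c) + 1) / log(N).

```python
import math

# Sketch of Seco et al.'s (2006) intrinsic information content combined
# with Lin's (1998) information ratio. The taxonomy is illustrative.

CHILDREN = {
    "entity": ["food", "instrument"],
    "food": ["breakfast_food", "fruit"],
    "breakfast_food": ["oatmeal", "cornflakes"],
    "instrument": ["scalpel"],
    "fruit": [], "oatmeal": [], "cornflakes": [], "scalpel": [],
}
N = len(CHILDREN)  # total number of concepts in the hierarchy
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}

def descendants(concept):
    out = set()
    for child in CHILDREN[concept]:
        out.add(child)
        out |= descendants(child)
    return out

def intrinsic_ic(concept):
    # No corpus needed: the more descendants, the less informative.
    return 1.0 - math.log(len(descendants(concept)) + 1) / math.log(N)

def ancestors(concept):
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def lin(a, b):
    # Twice the IC of the most informative common subsumer, divided by
    # the sum of the individual ICs.
    shared = set(ancestors(a)) & set(ancestors(b))
    mics = max(intrinsic_ic(s) for s in shared)
    return 2.0 * mics / (intrinsic_ic(a) + intrinsic_ic(b))
```

Leaves get the maximal IC of 1.0, the root gets 0.0, and lin() rewards pairs whose most informative subsumer sits deep in the hierarchy.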
One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a high-dimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition.
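Turney's relational approach can be caricatured as follows. The linking phrases and their counts are invented, and for brevity plain cosine over raw counts stands in for the SVD-reduced space his method actually uses.

```python
import math

# Caricature of Turney (2005): each word pair becomes a vector of
# web linking-phrase frequencies, and relational similarity is vector
# similarity. All phrases and counts below are invented.

PAIR_VECTORS = {
    ("mercenary", "soldier"): {"is a": 40, "paid to fight as a": 12, "hired as a": 9},
    ("hack", "writer"): {"is a": 35, "paid to write as a": 10, "hired as a": 7},
    ("chisel", "sculptor"): {"used by a": 50, "tool of the": 20},
}

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def relational_similarity(ab, cd):
    """How alike are the tacit relations linking A to B and C to D?"""
    return cosine(PAIR_VECTORS[ab], PAIR_VECTORS[cd])
```

On these toy vectors, mercenary:soldier comes out far closer to hack:writer than to chisel:sculptor, mirroring the proportional analogy in the text.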
In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspective as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects […] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (e.g., as a familiar substance). Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise.
Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as those of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1st” + letter = 1st-letter (for Alpha, Aleph, etc.)
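The category-minting step can be sketched like this. The glosses, the LCS table and the stopword list are all illustrative assumptions, not WordNet data; a real system would draw on WordNet glosses and its hypernym structure.

```python
# Sketch of Veale's (2003) move: fuse a property shared by two glosses
# with the concepts' LCS to mint a fine-grained parent category.
# Glosses, LCS table and stopwords below are invented for illustration.

GLOSS = {
    "Zeus": "supreme deity of the Greek pantheon",
    "Odin": "supreme deity of the Norse pantheon",
    "Ares": "Greek deity of war",
}
LCS_OF = {
    frozenset(("Zeus", "Odin")): "deity",
    frozenset(("Zeus", "Ares")): "deity",
}
STOPWORDS = {"of", "the", "a", "an"}

def shared_properties(a, b):
    """Content words common to both glosses."""
    return (set(GLOSS[a].split()) & set(GLOSS[b].split())) - STOPWORDS

def mint_categories(a, b):
    """New fine-grained parents, e.g. 'supreme-deity' for Zeus and Odin."""
    lcs = LCS_OF[frozenset((a, b))]
    return sorted(f"{p}-{lcs}" for p in shared_properties(a, b) - {lcs})
```

On these toy glosses, Zeus and Odin earn a shared "supreme-deity" parent that no pairing of Zeus with a lesser Norse god would receive, which is exactly the extra discernment the text argues for.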
Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts.
a lonesome cowboy 432
a mounted cowboy 122
a grizzled cowboy 74
a swaggering cowboy 68
To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
5 0.57031018 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection
Author: Jesse Dunietz ; Lori Levin ; Jaime Carbonell
Abstract: Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as treebanks. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.
6 0.5371992 53 acl-2013-Annotation of regular polysemy and underspecification
7 0.53532642 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
8 0.47655737 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
9 0.44519585 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
10 0.39346072 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
11 0.38946021 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context
12 0.38632947 238 acl-2013-Measuring semantic content in distributional vectors
13 0.36394456 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
14 0.35736859 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
15 0.34238991 61 acl-2013-Automatic Interpretation of the English Possessive
16 0.33962202 234 acl-2013-Linking and Extending an Open Multilingual Wordnet
17 0.33855036 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
19 0.32684377 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
20 0.32586515 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
topicId topicWeight
[(0, 0.054), (6, 0.03), (11, 0.07), (14, 0.011), (15, 0.017), (24, 0.064), (26, 0.046), (28, 0.018), (30, 0.018), (35, 0.085), (42, 0.052), (48, 0.054), (62, 0.234), (64, 0.016), (70, 0.033), (88, 0.041), (90, 0.019), (95, 0.061)]
simIndex simValue paperId paperTitle
1 0.81247091 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
Author: Brijesh Bhatt ; Lahari Poddar ; Pushpak Bhattacharyya
Abstract: We present IndoNet, a multilingual lexical knowledge base for Indian languages. It is a linked structure of wordnets of 18 different Indian languages, Universal Word dictionary and the Suggested Upper Merged Ontology (SUMO). We discuss various benefits of the network and challenges involved in the development. The system is encoded in Lexical Markup Framework (LMF) and we propose modifications in LMF to accommodate Universal Word Dictionary and SUMO. This standardized version of lexical knowledge base of Indian Languages can now easily be linked to similar global resources.
same-paper 2 0.80794722 116 acl-2013-Detecting Metaphor by Contextual Analogy
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
3 0.73038912 118 acl-2013-Development and Analysis of NLP Pipelines in Argo
Author: Rafal Rak ; Andrew Rowley ; Jacob Carter ; Sophia Ananiadou
Abstract: Developing sophisticated NLP pipelines composed of multiple processing tools and components available through different providers may pose a challenge in terms of their interoperability. The Unstructured Information Management Architecture (UIMA) is an industry standard whose aim is to ensure such interoperability by defining common data structures and interfaces. The architecture has been gaining attention from industry and academia alike, resulting in a large volume of UIMA-compliant processing components. In this paper, we demonstrate Argo, a Web-based workbench for the development and processing of NLP pipelines/workflows. The workbench is based upon UIMA, and thus has the potential of using many of the existing UIMA resources. We present features, and show examples, of facilitating the distributed development of components and the analysis of processing results. The latter includes annotation visualisers and editors, as well as serialisation to RDF format, which enables flexible querying in addition to data manipulation thanks to the semantic query language SPARQL. The distributed development feature allows users to seamlessly connect their tools to workflows running in Argo, and thus take advantage of both the available library of components (without the need of installing them locally) and the analytical tools.
4 0.60619199 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Author: Olivier Ferret
Abstract: Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained from unsupervised selected examples for building a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
5 0.60208064 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schutze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
7 0.59904128 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
8 0.59854156 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
10 0.5940755 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
11 0.59400189 62 acl-2013-Automatic Term Ambiguity Detection
12 0.59246194 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
13 0.59064794 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
14 0.59043181 172 acl-2013-Graph-based Local Coherence Modeling
15 0.59027153 225 acl-2013-Learning to Order Natural Language Texts
16 0.59007251 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
17 0.58949935 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
18 0.58927071 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
19 0.58884561 267 acl-2013-PARMA: A Predicate Argument Aligner
20 0.58854938 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation