acl acl2013 acl2013-96 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X 660 Proce dingSsof oifa, th Beu 5l1gsarti Aan,An u aglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioin gauli Lsitnicgsu,i psatgices 6 0–670, in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we per- ceive the world by re-shaping the way we make similarity judgments. So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a Word- Net-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-andvertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of nounsenses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instancelevel ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pederson et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them. To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield 661 more rounded similarity measures. Resnick (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of infor- mation shared by those concepts, and thus a good estimate of their similarity. Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) presents a minor innovation, showing how Resnick’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate infor- mation content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus, Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures. One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a highdimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition. In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims 662 to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspec- tive as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects [… ] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (such as e.g., a familiar substance) Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise. Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1 st” + letter = 1st-letter (for Alpha, Aleph, etc.) Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web ngrams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts. a lonesome cowboy 432 a mounted cowboy 122 a grizzled cowboy 74 a swaggering cowboy 68 To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
Reference: text
sentIndex sentText sentNum sentScore
1 It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. [sent-4, score-0.266]
2 Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. [sent-5, score-0.372]
3 We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. [sent-6, score-0.264]
4 These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. [sent-7, score-0.317]
5 We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. [sent-8, score-0.193]
6 This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. [sent-10, score-0.254]
7 These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. [sent-20, score-0.532]
8 This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. [sent-21, score-0.234]
9 In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. [sent-27, score-0.239]
10 So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. [sent-34, score-0.229]
11 Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. [sent-35, score-0.249]
12 Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. [sent-36, score-0.294]
13 So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. [sent-37, score-0.268]
14 To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. [sent-38, score-0.587]
15 These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. [sent-39, score-0.317]
16 de Bono argues that vertical thinking is selective while lateral thinking is generative. [sent-40, score-0.659]
17 Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. [sent-41, score-0.609]
18 To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. [sent-42, score-0.731]
19 Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. [sent-43, score-0.41]
20 Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. [sent-44, score-0.344]
21 Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. [sent-45, score-0.422]
22 These perspectives are used to enhance a Word- Net-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. [sent-46, score-0.388]
23 A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. [sent-49, score-0.34]
24 Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. [sent-54, score-0.293]
25 Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. [sent-55, score-0.601]
26 Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them. [sent-56, score-0.175]
27 To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. [sent-57, score-0.185]
28 Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. [sent-59, score-0.263]
29 Depth is a convenient proxy for information content, but more nuanced proxies can yield 661 more rounded similarity measures. [sent-62, score-0.214]
30 Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. [sent-64, score-0.27]
31 The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of infor- mation shared by those concepts, and thus a good estimate of their similarity. [sent-65, score-0.297]
32 Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. [sent-66, score-0.44]
33 Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. [sent-67, score-0.174]
34 So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. [sent-68, score-0.237]
35 Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate infor- mation content. [sent-71, score-0.499]
36 thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. [sent-74, score-0.331]
37 show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. [sent-76, score-0.267]
38 A similarity measure can draw on other sources of information besides WordNet’s category structures. [sent-77, score-0.249]
39 Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. [sent-79, score-0.281]
40 Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. [sent-80, score-0.218]
41 Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. [sent-89, score-0.29]
42 In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a highdimensional space. [sent-90, score-0.181]
43 1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. [sent-98, score-0.237]
44 Thus, “this cola is acid” reveals a different perspective on cola (e. [sent-101, score-0.405]
45 , a familiar substance) Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. [sent-105, score-0.253]
46 For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. [sent-106, score-0.182]
47 Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. [sent-111, score-0.275]
48 assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. [sent-115, score-0.321]
49 A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. [sent-116, score-0.3]
50 However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. [sent-117, score-0.402]
51 ) Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. [sent-122, score-0.327]
52 We value quality over size, as these similes will later be used to find diverse viewpoints on the web via bootstrapping. [sent-129, score-0.234]
53 Hao & Veale (2010) exploit this fact to filter ironic comparisons from web similes, by re-expressing each “as P as C” simile as “P * such as C” (using a wildcard * to match any values for S) and looking for attested uses of this new form on the web. [sent-134, score-0.255]
54 Since each hit will also yield a value for S via the wildcard *, and a finegrained category P-S for C, we use this approach here to harvest fine-grained categories from the web from most of our similes. [sent-135, score-0.288]
55 Once C is seen to be an exemplary member of the category P-S, such as cola in fizzy-drink, a targeted web search is used to find other members of P-S, via the anchored query “P S such as * and C”. [sent-136, score-0.374]
56 (2009) – explicitly mention both the category to be populated (P-S) and a recently acquired member of this category (C). [sent-141, score-0.241]
57 A reliable filter must be applied at each stage, to ensure that any member C that is placed in a category P-S is a sensible member of the category S. [sent-144, score-0.314]
58 Such a filter offers no creative latitude, however, since it forces every pairing of C and P-S to precisely obey WordNet’s category hierarchy. [sent-147, score-0.236]
59 Five successive cycles of bootstrapping are performed, using the 12,000+ web similes as a starting point. [sent-152, score-0.183]
60 After 2 cycles we acquire 43 categories; after 3 cycles, 72; after 4 cycles, 93; and after 5 cycles, we acquire 102 fine-grained perspectives on cola, such as stimulating-drink and corrosive-substance. [sent-154, score-0.176]
61 Fine-grained perspectives for cola found by Thesaurus Rex on the web. [sent-156, score-0.284]
62 4 Measuring and Creating Similarity Which perspectives will be most useful and informative to a WordNet-based similarity metric? [sent-160, score-0.318]
63 Simply, a perspective M-Cx for a concept Cy can be coherently added to WordNet iff Cx denotes a hypernym of some sense of Cy in WordNet. [sent-161, score-0.288]
64 For purposes of quantifying the similarity of two terms t1 and t2 – by finding the WordNet senses of these terms that exhibit the highest sim- ilarity – we can augment WordNet with the perspectives on t1 and t2 that are coherent with WordNet’s hierarchy. [sent-162, score-0.261]
65 We estimate the similarity of C1 and C2 as the cosine of the angle between the feature vectors that are constructed for each. [sent-165, score-0.178]
66 We also want any fine-grained perspective MH to influence our similarity metric, provided it can be coherently tied into WordNet as a shared hypernym of the two lexical concepts being compared. [sent-167, score-0.497]
67 The absolute information content of a category M-H that is newly added to WordNet is given by (2): ( 2) ICab s ( M - H ) = - l o g ( Σ m - h ∈ s i W zeN( M s -iHze ) ( m -h ) ) where size(M-H) is the number of lexical concepts in WordNet for which M-H can be added as a new hypernym. [sent-168, score-0.385]
68 The denominator in (2) denotes the sum total of the size of all fine-grained categories that can be coherently added to WordNet for any term. [sent-169, score-0.175]
69 A fine-grained perspective M-H will thus influence a similarity judgment between C1 and C2 only if M-H can be coherently added to WordNet as a hypernym of C1 and C2, and if M-H enriches our view of H. [sent-172, score-0.354]
70 (2006), this vector-space approach does not hinge on the information content of a single LCS, so any shared hypernym H or perspective M-H can shape a similarity judgment according to its informativeness. [sent-174, score-0.307]
71 5 Empirical Evaluation Many fascinating perspectives on familiar ideas are bootstrapped from the web using similes as a starting point. [sent-175, score-0.373]
72 These perspectives drive an exploratory web-aid to lateral thinking we call Thesaurus Rex, while the cosine-distance metric constructed from WordNet and these many finegrained categories is called, simply, Rex. [sent-176, score-0.618]
73 When Rex provides a numeric estimate of similarity for two ideas, Thesaurus Rex provides an enhanced insight into why these ideas are similar, e. [sent-177, score-0.306]
74 by explaining that cola & acid are not just substances, they are corrosive substances. [sent-179, score-0.27]
75 We evaluate Rex by estimating how closely its judgments correlate with those of human judges on the 30-pair word set of Miller & Charles (M&C;), who aggregated the judgments of multiple human raters into mean ratings for these pairs. [sent-180, score-0.19]
76 While corrosive-substance is a popular category for acid, it not so for cola or juice. [sent-183, score-0.272]
77 with mean human ratings on all 30 word pairs of the Miller & Charles similarity data-set. [sent-185, score-0.189]
78 Yet while informative, this thesaurus is 665 not organized as an explanatory system of hierarchical categories as it is in Thesaurus Rex. [sent-193, score-0.257]
79 The mean ratings of M&C; are the stuff of consensus, not individual creativity, and outside the realm of creative metaphor it often makes sense to safely align our judgments with those of others. [sent-200, score-0.279]
80 However, Rex has the greater capacity for insight, since it augments the structured category system of WordNet with structured categories of its own. [sent-205, score-0.187]
81 Because Rex makes highly selective use of the diverse products of lateral thinking, this selectivity also produces concise explanations for its judgments. [sent-207, score-0.321]
82 1 Generative Uses of Similarity A similarity metric offers a numerical measure of how closely one idea can cluster with another. [sent-209, score-0.188]
83 If they are to actively shape a creative decision, our similarity metrics must be made more generative. [sent-212, score-0.237]
84 A similarity metric can learn to be generative, by observing how people typically cluster words and ideas that are made similar by their contexts of use. [sent-213, score-0.271]
85 Thesaurus Rex can be used to find an embracing category for many such pairs of nodes, while Rex estimates the similarity of any two adjacent nodes. [sent-217, score-0.249]
86 4% relative to WordNet, suggesting that categorization tasks which require implicit comparability judgments are well served by a generative approach to similarity. [sent-228, score-0.191]
87 The relative performance of Rex-pop > Rex-lat > Rex-wn on the M&C; dataset shows that selective use of a divergent perspective permits WordNet to better serve its popular role as a judge of similarity. [sent-231, score-0.231]
88 The 39,826 unique fine-grained categories added by Rex-pop (versus the 44,238 categories added by Rex-lat) in the course of its 1,264,827 comparisons thus suggest credible enhancements to WordNet. [sent-238, score-0.266]
89 The number of new categories (Y-axis) with a given membership size (X-axis) added to WordNet when Rex-pop/lat are used on a large, web scale. [sent-241, score-0.183]
90 6 Summary and Conclusions de Bono (1970) argues that the best solutions arise from using lateral and vertical thinking in unison. [sent-245, score-0.463]
91 Lateral thinking is divergent and generative, while vertical thinking is convergent and analytical. [sent-246, score-0.559]
92 Thesaurus Rex uses the web to generate a rich pool of alternate perspectives on familiar ideas, and Rex selects from this pool to perform vertical reasoning with WordNet to yield precise similarity judgments. [sent-248, score-0.468]
93 Rex also uses the most informative perspective to concisely explain each comparison, or – when used in generative mode – to suggest a creative comparison. [sent-249, score-0.266]
94 A web app based on Thesaurus Rex, to support this kind of lateral thinking, is accessible online at this URL: http://boundinanutshell. [sent-251, score-0.299]
95 Rex dips selectively into this wealth of perspectives (and Rexpop is more selective still), though many of Rex’s needs can be anticipated by looking to how ideas are implicitly grouped into ad-hoc categories (Barsalou, 1983) in constructions such as “X+s and Y+s”. [sent-254, score-0.332]
96 Using the Google n-grams as a source of tacit grouping constructions, we have created a comprehensive lookup table that provides Rex similarity scores for the most common (if often implicit) comparisons. [sent-255, score-0.182]
97 Comparability is not the same as similarity, and a non-zero similarity score does not mean that two concepts would ever be considered comparable by a human. [sent-256, score-0.286]
98 A screenshot from the web application Thesaurus Rex, showing the fine-grained categories found by Thesaurus Rex for the lexical concept creativity on the web. [sent-264, score-0.331]
99 A screenshot from the web application Thesaurus Rex, showing the shared overlapping categories found by Thesaurus Rex for the lexical concepts divorce and war. [sent-266, score-0.326]
100 Semantic similarity based on corpus statistics and lexical taxonomy. [sent-333, score-0.178]
wordName wordTfidf (topN-words)
[('rex', 0.541), ('wordnet', 0.239), ('lateral', 0.23), ('veale', 0.2), ('thesaurus', 0.174), ('cola', 0.168), ('thinking', 0.146), ('similarity', 0.145), ('concepts', 0.141), ('cowboy', 0.122), ('seco', 0.122), ('lcs', 0.118), ('perspectives', 0.116), ('tony', 0.113), ('divergent', 0.112), ('bono', 0.107), ('category', 0.104), ('creative', 0.092), ('vertical', 0.087), ('ideas', 0.083), ('categories', 0.083), ('concept', 0.079), ('resnick', 0.077), ('judgments', 0.073), ('acid', 0.071), ('metaphor', 0.07), ('viewpoints', 0.07), ('comparability', 0.07), ('perspective', 0.069), ('web', 0.069), ('convergent', 0.068), ('surgeons', 0.068), ('creativity', 0.067), ('ic', 0.065), ('coherently', 0.061), ('ironic', 0.061), ('cycles', 0.06), ('metaphors', 0.06), ('informative', 0.057), ('similes', 0.054), ('hierarchy', 0.051), ('familiar', 0.051), ('aristotle', 0.05), ('analogies', 0.05), ('selective', 0.05), ('generative', 0.048), ('hypernym', 0.048), ('simile', 0.047), ('analogical', 0.047), ('butchers', 0.046), ('god', 0.046), ('sapper', 0.046), ('swaggering', 0.046), ('zeus', 0.046), ('insight', 0.045), ('content', 0.045), ('ratings', 0.044), ('depth', 0.043), ('metric', 0.043), ('discerning', 0.041), ('yanfen', 0.041), ('pederson', 0.041), ('diverse', 0.041), ('google', 0.04), ('filter', 0.04), ('miller', 0.039), ('leacock', 0.039), ('comparisons', 0.038), ('sat', 0.037), ('glosses', 0.037), ('lookup', 0.037), ('conventional', 0.036), ('convenient', 0.035), ('salient', 0.035), ('proxy', 0.034), ('calls', 0.033), ('lexical', 0.033), ('member', 0.033), ('estimate', 0.033), ('harvest', 0.032), ('agirre', 0.032), ('narrow', 0.031), ('kozareva', 0.031), ('added', 0.031), ('aisle', 0.031), ('aleph', 0.031), ('camac', 0.031), ('carve', 0.031), ('chisels', 0.031), ('corrosive', 0.031), ('cowboys', 0.031), ('deity', 0.031), ('fizzy', 0.031), ('grizzled', 0.031), ('icabs', 0.031), ('mercenaries', 0.031), ('norse', 0.031), ('oatmeal', 0.031), ('rexpop', 0.031), ('scalpels', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X 660 Proce dingSsof oifa, th Beu 5l1gsarti Aan,An u aglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioin gauli Lsitnicgsu,i psatgices 6 0–670, in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we per- ceive the world by re-shaping the way we make similarity judgments. So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a Word- Net-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-andvertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of nounsenses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instancelevel ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pederson et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them. To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield 661 more rounded similarity measures. Resnick (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of infor- mation shared by those concepts, and thus a good estimate of their similarity. Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) presents a minor innovation, showing how Resnick’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate infor- mation content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus, Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures. One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a highdimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition. In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims 662 to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspec- tive as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects [… ] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (such as e.g., a familiar substance) Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise. Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1 st” + letter = 1st-letter (for Alpha, Aleph, etc.) Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web ngrams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts. a lonesome cowboy 432 a mounted cowboy 122 a grizzled cowboy 74 a swaggering cowboy 68 To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
2 0.14302243 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Author: Mohammad Taher Pilehvar ; David Jurgens ; Roberto Navigli
Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: seman- tic textual similarity, word similarity, and word sense coarsening.
3 0.13804458 116 acl-2013-Detecting Metaphor by Contextual Analogy
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
4 0.13670526 88 acl-2013-Computational considerations of comparisons and similes
Author: Vlad Niculae ; Victoria Yaneva
Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.
5 0.13159131 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Author: Olivier Ferret
Abstract: Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained from unsupervised selected examples for building a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
6 0.11063398 234 acl-2013-Linking and Extending an Open Multilingual Wordnet
7 0.10670208 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
8 0.10111614 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
9 0.087408513 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
10 0.08499179 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
11 0.084954813 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
12 0.08129254 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
13 0.074709259 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
14 0.074452214 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
15 0.071532063 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
16 0.068117954 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
17 0.067205541 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
18 0.064943135 238 acl-2013-Measuring semantic content in distributional vectors
19 0.062041219 236 acl-2013-Mapping Source to Target Strings without Alignment by Analogical Learning: A Case Study with Transliteration
20 0.060003225 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD
topicId topicWeight
[(0, 0.153), (1, 0.078), (2, 0.031), (3, -0.136), (4, -0.04), (5, -0.101), (6, -0.076), (7, 0.041), (8, 0.025), (9, -0.017), (10, -0.035), (11, 0.043), (12, -0.066), (13, -0.034), (14, 0.065), (15, 0.021), (16, 0.014), (17, 0.04), (18, -0.033), (19, -0.001), (20, 0.024), (21, -0.014), (22, 0.011), (23, -0.01), (24, 0.03), (25, 0.081), (26, -0.025), (27, 0.023), (28, -0.059), (29, 0.011), (30, -0.031), (31, -0.114), (32, 0.035), (33, -0.037), (34, 0.08), (35, 0.07), (36, 0.119), (37, 0.06), (38, 0.074), (39, -0.014), (40, -0.004), (41, 0.095), (42, 0.012), (43, 0.07), (44, -0.098), (45, -0.095), (46, -0.036), (47, -0.071), (48, 0.023), (49, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.92833275 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X 660 Proce dingSsof oifa, th Beu 5l1gsarti Aan,An u aglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioin gauli Lsitnicgsu,i psatgices 6 0–670, in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we per- ceive the world by re-shaping the way we make similarity judgments. So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a Word- Net-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-andvertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of nounsenses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instancelevel ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pederson et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them. To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield 661 more rounded similarity measures. Resnick (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of infor- mation shared by those concepts, and thus a good estimate of their similarity. Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) presents a minor innovation, showing how Resnick’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate infor- mation content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus, Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures. One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a highdimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition. In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims 662 to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspec- tive as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects [… ] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (such as e.g., a familiar substance) Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise. Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1 st” + letter = 1st-letter (for Alpha, Aleph, etc.) Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web ngrams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts. a lonesome cowboy 432 a mounted cowboy 122 a grizzled cowboy 74 a swaggering cowboy 68 To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
2 0.76663554 88 acl-2013-Computational considerations of comparisons and similes
Author: Vlad Niculae ; Victoria Yaneva
Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.
3 0.74305356 116 acl-2013-Detecting Metaphor by Contextual Analogy
Author: Eirini Florou
Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.
4 0.71742368 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
Author: Sean Szumlanski ; Fernando Gomez ; Valerie K. Sims
Abstract: We have elicited human quantitative judgments of semantic relatedness for 122 pairs of nouns and compiled them into a new set of relatedness norms that we call Rel-122. Judgments from individual subjects in our study exhibit high average correlation to the resulting relatedness means (r = 0.77, σ = 0.09, N = 73), although not as high as Resnik’s (1995) upper bound for expected average human correlation to similarity means (r = 0.90). This suggests that human perceptions of relatedness are less strictly constrained than perceptions of similarity and establishes a clearer expectation for what constitutes human-like performance by a computational measure of semantic relatedness. We compare the results of several WordNet-based similarity and relatedness measures to our Rel-122 norms and demonstrate the limitations of WordNet for discovering general indications of semantic relatedness. We also offer a critique of the field’s reliance upon similarity norms to evaluate relatedness measures.
5 0.68551224 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
Author: Antske Fokkens ; Marieke van Erp ; Marten Postma ; Ted Pedersen ; Piek Vossen ; Nuno Freire
Abstract: Repeating experiments is an important instrument in the scientific toolbox to validate previous work and build upon existing work. We present two concrete use cases involving key techniques in the NLP domain for which we show that reproducing results is still difficult. We show that the deviation that can be found in reproduction efforts leads to questions about how our results should be interpreted. Moreover, investigating these deviations provides new insights and a deeper understanding of the examined techniques. We identify five aspects that can influence the outcomes of experiments that are typically not addressed in research papers. Our use cases show that these aspects may change the answer to research questions leading us to conclude that more care should be taken in interpreting our results and more research involving systematic testing of methods is required in our field.
6 0.68031919 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
7 0.64969158 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
8 0.62268877 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity
9 0.61980677 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
10 0.60529071 198 acl-2013-IndoNet: A Multilingual Lexical Knowledge Network for Indian Languages
11 0.59409946 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
12 0.57652473 234 acl-2013-Linking and Extending an Open Multilingual Wordnet
13 0.56821358 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
14 0.55129939 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
15 0.52584976 238 acl-2013-Measuring semantic content in distributional vectors
16 0.51436442 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
17 0.50933075 242 acl-2013-Mining Equivalent Relations from Linked Data
18 0.50907606 61 acl-2013-Automatic Interpretation of the English Possessive
19 0.46792531 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
20 0.46537361 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
topicId topicWeight
[(0, 0.078), (5, 0.025), (6, 0.028), (11, 0.059), (15, 0.019), (24, 0.058), (26, 0.036), (28, 0.248), (35, 0.072), (42, 0.039), (48, 0.056), (49, 0.017), (64, 0.016), (70, 0.038), (88, 0.038), (90, 0.022), (95, 0.048)]
simIndex simValue paperId paperTitle
1 0.95835102 349 acl-2013-The mathematics of language learning
Author: Andras Kornai ; Gerald Penn ; James Rogers ; Anssi Yli-Jyra
Abstract: unkown-abstract
same-paper 2 0.81823575 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
Author: Tony Veale ; Guofu Li
Abstract: Just as observing is more than just seeing, comparing is far more than mere matching. It takes understanding, and even inventiveness, to discern a useful basis for judging two ideas as similar in a particular context, especially when our perspective is shaped by an act of linguistic creativity such as metaphor, simile or analogy. Structured resources such as WordNet offer a convenient hierarchical means for converging on a common ground for comparison, but offer little support for the divergent thinking that is needed to creatively view one concept as another. We describe such a means here, by showing how the web can be used to harvest many divergent views for many familiar ideas. These lateral views complement the vertical views of WordNet, and support a system for idea exploration called Thesaurus Rex. We show also how Thesaurus Rex supports a novel, generative similarity measure for WordNet. 1 Seeing is Believing (and Creating) Similarity is a cognitive phenomenon that is both complex and subjective, yet for practical reasons it is often modeled as if it were simple and objective. This makes sense for the many situations where we want to align our similarity judgments with those of others, and thus focus on the same conventional properties that others are also likely to focus upon. This reliance on the consensus viewpoint explains why WordNet (Fellbaum, 1998) has proven so useful as a basis for computational measures of lexico-semantic similarity Guofu Li School of Computer Science and Informatics, University College Dublin, Belfield, Dublin D2, Ireland. l .guo fu . l gmai l i @ .com (e.g. see Pederson et al. 2004, Budanitsky & Hirst, 2006; Seco et al. 2006). These measures reduce the similarity of two lexical concepts to a single number, by viewing similarity as an objective estimate of the overlap in their salient qualities. This convenient perspective is poorly suited to creative or insightful comparisons, but it is sufficient for the many mundane comparisons we often perform in daily life, such as when we organize books or look for items in a supermarket. So if we do not know in which aisle to locate a given item (such as oatmeal), we may tacitly know how to locate a similar product (such as cornflakes) and orient ourselves accordingly. Yet there are occasions when the recognition of similarities spurs the creation of similarities, when the act of comparison spurs us to invent new ways of looking at an idea. By placing pop tarts in the breakfast aisle, food manufacturers encourage us to view them as a breakfast food that is not dissimilar to oatmeal or cornflakes. When ex-PM Tony Blair published his memoirs, a mischievous activist encouraged others to move his book from Biography to Fiction in bookshops, in the hope that buyers would see it in a new light. Whenever we use a novel metaphor to convey a non-obvious viewpoint on a topic, such as “cigarettes are time bombs”, the comparison may spur us to insight, to see aspects of the topic that make it more similar to the vehicle (see Ortony, 1979; Veale & Hao, 2007). In formal terms, assume agent A has an insight about concept X, and uses the metaphor X is a Y to also provoke this insight in agent B. To arrive at this insight for itself, B must intuit what X and Y have in common. But this commonality is surely more than a standard categorization of X, or else it would not count as an insight about X. To understand the metaphor, B must place X 660 Proce dingSsof oifa, th Beu 5l1gsarti Aan,An u aglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioin gauli Lsitnicgsu,i psatgices 6 0–670, in a new category, so that X can be seen as more similar to Y. Metaphors shape the way we per- ceive the world by re-shaping the way we make similarity judgments. So if we want to imbue computers with the ability to make and to understand creative metaphors, we must first give them the ability to look beyond the narrow viewpoints of conventional resources. Any measure that models similarity as an objective function of a conventional worldview employs a convergent thought process. Using WordNet, for instance, a similarity measure can vertically converge on a common superordinate category of both inputs, and generate a single numeric result based on their distance to, and the information content of, this common generalization. So to find the most conventional ways of seeing a lexical concept, one simply ascends a narrowing concept hierarchy, using a process de Bono (1970) calls vertical thinking. To find novel, non-obvious and useful ways of looking at a lexical concept, one must use what Guilford (1967) calls divergent thinking and what de Bono calls lateral thinking. These processes cut across familiar category boundaries, to simultaneously place a concept in many different categories so that we can see it in many different ways. de Bono argues that vertical thinking is selective while lateral thinking is generative. Whereas vertical thinking concerns itself with the “right” way or a single “best” way of looking at things, lateral thinking focuses on producing alternatives to the status quo. To be as useful for creative tasks as they are for conventional tasks, we need to re-imagine our computational similarity measures as generative rather than selective, expansive rather than reductive, divergent as well as convergent and lateral as well as vertical. Though WordNet is ideally structured to support vertical, convergent reasoning, its comprehensive nature means it can also be used as a solid foundation for building a more lateral and divergent model of similarity. Here we will use the web as a source of diverse perspectives on familiar ideas, to complement the conventional and often narrow views codified by WordNet. Section 2 provides a brief overview of past work in the area of similarity measurement, before section 3 describes a simple bootstrapping loop for acquiring richly diverse perspectives from the web for a wide variety of familiar ideas. These perspectives are used to enhance a Word- Net-based measure of lexico-semantic similarity in section 4, by broadening the range of informative viewpoints the measure can select from. Similarity is thus modeled as a process that is both generative and selective. This lateral-andvertical approach is evaluated in section 5, on the Miller & Charles (1991) data-set. A web app for the lateral exploration of diverse viewpoints, named Thesaurus Rex, is also presented, before closing remarks are offered in section 6. 2 Related Work and Ideas WordNet’s taxonomic organization of nounsenses and verb-senses – in which very general categories are successively divided into increasingly informative sub-categories or instancelevel ideas – allows us to gauge the overlap in information content, and thus of meaning, of two lexical concepts. We need only identify the deepest point in the taxonomy at which this content starts to diverge. This point of divergence is often called the LCS, or least common subsumer, of two concepts (Pederson et al., 2004). Since sub-categories add new properties to those they inherit from their parents – Aristotle called these properties the differentia that stop a category system from trivially collapsing into itself – the depth of a lexical concept in a taxonomy is an intuitive proxy for its information content. Wu & Palmer (1994) use the depth of a lexical concept in the WordNet hierarchy as such a proxy, and thereby estimate the similarity of two lexical concepts as twice the depth of their LCS divided by the sum of their individual depths. Leacock and Chodorow (1998) instead use the length of the shortest path between two concepts as a proxy for the conceptual distance between them. To connect any two ideas in a hierarchical system, one must vertically ascend the hierarchy from one concept, change direction at a potential LCS, and then descend the hierarchy to reach the second concept. (Aristotle was also first to suggest this approach in his Poetics). Leacock and Chodorow normalize the length of this path by dividing its size (in nodes) by twice the depth of the deepest concept in the hierarchy; the latter is an upper bound on the distance between any two concepts in the hierarchy. Negating the log of this normalized length yields a corresponding similarity score. While the role of an LCS is merely implied in Leacock and Chodorow’s use of a shortest path, the LCS is pivotal nonetheless, and like that of Wu & Palmer, the approach uses an essentially vertical reasoning process to identify a single “best” generalization. Depth is a convenient proxy for information content, but more nuanced proxies can yield 661 more rounded similarity measures. Resnick (1995) draws on information theory to define the information content of a lexical concept as the negative log likelihood of its occurrence in a corpus, either explicitly (via a direct mention) or by presupposition (via a mention of any of its sub-categories or instances). Since the likelihood of a general category occurring in a corpus is higher than that of any of its sub-categories or instances, such categories are more predictable, and less informative, than rarer categories whose occurrences are less predictable and thus more informative. The negative log likelihood of the most informative LCS of two lexical concepts offers a reliable estimate of the amount of infor- mation shared by those concepts, and thus a good estimate of their similarity. Lin (1998) combines the intuitions behind Resnick’s metric and that of Wu and Palmer to estimate the similarity of two lexical concepts as an information ratio: twice the information content of their LCS divided by the sum of their individual information contents. Jiang and Conrath (1997) consider the converse notion of dissimilarity, noting that two lexical concepts are dissimilar to the extent that each contains information that is not shared by the other. So if the information content of their most informative LCS is a good measure of what they do share, then the sum of their individual information contents, minus twice the content of their most informative LCS, is a reliable estimate of their dissimilarity. Seco et al. (2006) presents a minor innovation, showing how Resnick’s notion of information content can be calculated without the use of an external corpus. Rather, when using Resnick’s metric (or that of Lin, or Jiang and Conrath) for measuring the similarity of lexical concepts in WordNet, one can use the category structure of WordNet itself to estimate infor- mation content. Typically, the more general a concept, the more descendants it will possess. Seco et al. thus estimate the information content of a lexical concept as the log of the sum of all its unique descendants (both direct and indirect), divided by the log of the total number of concepts in the entire hierarchy. Not only is this intrinsic view of information content convenient to use, without recourse to an external corpus, Seco et al. show that it offers a better estimate of information content than its extrinsic, corpus-based alternatives, as measured relative to average human similarity ratings for the 30 word-pairs in the Miller & Charles (1991) test set. A similarity measure can draw on other sources of information besides WordNet’s category structures. One might eke out additional information from WordNet’s textual glosses, as in Lesk (1986), or use category structures other than those offered by WordNet. Looking beyond WordNet, entries in the online encyclopedia Wikipedia are not only connected by a dense topology of lateral links, they are also organized by a rich hierarchy of overlapping categories. Strube and Ponzetto (2006) show how Wikipedia can support a measure of similarity (and relatedness) that better approximates human judgments than many WordNet-based measures. Nonetheless, WordNet can be a valuable component of a hybrid measure, and Agirre et al. (2009) use an SVM (support vector machine) to combine information from WordNet with information harvested from the web. Their best similarity measure achieves a remarkable 0.93 correlation with human judgments on the Miller & Charles word-pair set. Similarity is not always applied to pairs of concepts; it is sometimes analogically applied to pairs of pairs of concepts, as in proportional analogies of the form A is to B as C is to D (e.g., hacks are to writers as mercenaries are to soldiers, or chisels are to sculptors as scalpels are to surgeons). In such analogies, one is really assessing the similarity of the unstated relationship between each pair of concepts: thus, mercenaries are soldiers whose allegiance is paid for, much as hacks are writers with income-driven loyalties; sculptors use chisels to carve stone, while surgeons use scalpels to cut or carve flesh. Veale (2004) used WordNet to assess the similarity of A:B to C:D as a function of the combined similarity of A to C and of B to D. In contrast, Turney (2005) used the web to pursue a more divergent course, to represent the tacit relationships of A to B and of C to D as points in a highdimensional space. The dimensions of this space initially correspond to linking phrases on the web, before these dimensions are significantly reduced using singular value decomposition. In the infamous SAT test, an analogy A:B::C:D has four other pairs of concepts that serve as likely distractors (e.g. singer:songwriter for hack:writer) and the goal is to choose the most appropriate C:D pair for a given A:B pairing. Using variants of Wu and Palmer (1994) on the 374 SAT analogies of Turney (2005), Veale (2004) reports a success rate of 38–44% using only WordNet-based similarity. In contrast, Turney (2005) reports up to 55% success on the same analogies, partly because his approach aims 662 to match implicit relations rather than explicit concepts, and in part because it uses a divergent process to gather from the web as rich a perspec- tive as it can on these latent relationships. 2.1 Clever Comparisons Create Similarity Each of these approaches to similarity is a user of information, rather than a creator, and each fails to capture how a creative comparison (such as a metaphor) can spur a listener to view a topic from an atypical perspective. Camac & Glucksberg (1984) provide experimental evidence for the claim that “metaphors do not use preexisting associations to achieve their effects [… ] people use metaphors to create new relations between concepts.” They also offer a salutary reminder of an often overlooked fact: every comparison exploits information, but each is also a source of new information in its own right. Thus, “this cola is acid” reveals a different perspective on cola (e.g. as a corrosive substance or an irritating food) than “this acid is cola” highlights for acid (such as e.g., a familiar substance) Veale & Keane (1994) model the role of similarity in realizing the long-term perlocutionary effect of an informative comparison. For example, to compare surgeons to butchers is to encourage one to see all surgeons as more bloody, … crude or careless. The reverse comparison, of butchers to surgeons, encourages one to see butchers as more skilled and precise. Veale & Keane present a network model of memory, called Sapper, in which activation can spread between related concepts, thus allowing one concept to prime the properties of a neighbor. To interpret an analogy, Sapper lays down new activation-carrying bridges in memory between analogical counterparts, such as between surgeon & butcher, flesh & meat, and scalpel & cleaver. Comparisons can thus have lasting effects on how Sapper sees the world, changing the pattern of activation that arises when it primes a concept. Veale (2003) adopts a similarly dynamic view of similarity in WordNet, showing how an analogical comparison can result in the automatic addition of new categories and relations to WordNet itself. Veale considers the problem of finding an analogical mapping between different parts of WordNet’s noun-sense hierarchy, such as between instances of Greek god and Norse god, or between the letters of different alphabets, such as of Greek and Hebrew. But no structural similarity measure for WordNet exhibits enough discernment to e.g. assign a higher similarity to Zeus & Odin (each is the supreme deity of its pantheon) than to a pairing of Zeus and any other Norse god, just as no structural measure will assign a higher similarity to Alpha & Aleph or to Beta & Beth than to any random letter pairing. A fine-grained category hierarchy permits fine-grained similarity judgments, and though WordNet is useful, its sense hierarchies are not especially fine-grained. However, we can automatically make WordNet subtler and more discerning, by adding new fine-grained categories to unite lexical concepts whose similarity is not reflected by any existing categories. Veale (2003) shows how a property that is found in the glosses of two lexical concepts, of the same depth, can be combined with their LCS to yield a new fine-grained parent category, so e.g. “supreme” + deity = Supreme-deity (for Odin, Zeus, Jupiter, etc.) and “1 st” + letter = 1st-letter (for Alpha, Aleph, etc.) Selected aspects of the textual similarity of two WordNet glosses – the key to similarity in Lesk (1986) – can thus be reified into an explicitly categorical WordNet form. 3 Divergent (Re)Categorization To tap into a richer source of concept properties than WordNet’s glosses, we can use web ngrams. Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). The numbers to the right are Google frequency counts. a lonesome cowboy 432 a mounted cowboy 122 a grizzled cowboy 74 a swaggering cowboy 68 To find the stable properties that can underpin a meaningful fine-grained category for cowboy, we must seek out the properties that are so often presupposed to be salient of all cowboys that one can use them to anchor a simile, such as
3 0.813384 124 acl-2013-Discriminative state tracking for spoken dialog systems
Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams
Abstract: In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states. Current approaches based on generative or discriminative models have different but important shortcomings that limit their accuracy. In this paper we discuss these limitations and introduce a new approach for discriminative state tracking that overcomes them by leveraging the problem structure. An offline evaluation with dialog data collected from real users shows improvements in both state tracking accuracy and the quality of the posterior probabilities. Features that encode speech recognition error patterns are particularly helpful, and training requires rel- atively few dialogs.
4 0.80585563 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
5 0.78281313 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
6 0.75526434 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
7 0.70018333 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
8 0.58811301 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
9 0.57834107 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
10 0.57633978 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
11 0.57380313 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
12 0.56510371 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
13 0.56386596 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
14 0.56336445 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
15 0.56093967 267 acl-2013-PARMA: A Predicate Argument Aligner
16 0.5605039 116 acl-2013-Detecting Metaphor by Contextual Analogy
17 0.5561502 250 acl-2013-Models of Translation Competitions
18 0.55584967 230 acl-2013-Lightly Supervised Learning of Procedural Dialog Systems
19 0.55543697 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
20 0.55507517 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory