acl acl2012 acl2012-206 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
Reference: text
sentIndex sentText sentNum sentScore
1 UWN: A Large Multilingual Lexical Knowledge Base Gerard de Melo ICSI Berkeley demelo@icsi.berkeley.edu [sent-1, score-0.065]
2 Abstract We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. [sent-3, score-0.464]
3 This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. [sent-4, score-0.389]
4 We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. [sent-5, score-0.098]
5 An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names. [sent-6, score-0.243]
6 1 Introduction Semantic knowledge about words and named entities is a fundamental building block both in various forms of language technology and in end-user applications. [sent-7, score-0.197]
7 Examples of the latter include word processor thesauri, online dictionaries, question answering, and mobile services. [sent-8, score-0.044]
8 Further uses of lexical knowledge include data cleaning (Kedad and Métais, 2002), visual object recognition (Marszałek and Schmid, 2007), and biomedical data analysis (Rubin and others, 2006). [sent-13, score-0.114]
9 Many of these applications have used English-language resources like WordNet (Fellbaum, 1998). [sent-14, score-0.044]
10 However, a more multilingual resource equipped with an easy-to-use API would not only enable us to perform all of the aforementioned tasks in additional languages, but also to explore cross-lingual applications like cross-lingual IR (Etzioni et al. [sent-17, score-0.345]
11 This paper describes a new API that makes lexical knowledge about millions of items in over 200 languages available to applications, and a corresponding online user interface for users to explore the data. [sent-20, score-0.305]
12 We first describe link prediction techniques used to create the multilingual core of the knowledge base with word sense information (Section 2). [sent-21, score-0.621]
13 We then outline techniques used to incorporate named entities and specialized concepts (Section 3) and other types of knowledge (Section 4). [sent-22, score-0.29]
14 Finally, we describe how the information is made accessible via a user interface (Section 5) and a software API (Section 6). [sent-23, score-0.104]
15 2 The UWN Core UWN (de Melo and Weikum, 2009) is based on WordNet (Fellbaum, 1998), the most popular lexical knowledge base for the English language. [sent-24, score-0.192]
16 WordNet enumerates the senses of a word, providing a short description text (gloss) and synonyms for each meaning. [sent-25, score-0.146]
17 Additionally, it describes relationships between senses, e.g. [sent-26, score-0.076]
18 via the hyponymy/hypernymy relation that holds when one term like ‘publication’ is a generalization of another term like ‘journal’ . [sent-28, score-0.088]
19 In order to accomplish this at a large scale, we automatically link … (Proceedings of the 50th Annual Meeting of the A…, Jeju, Republic of Korea, 8-14 July 2012) [sent-30, score-0.097]
20 This transforms WordNet into a multilingual lexical knowledge base that covers not only English terms but hundreds of thousands of terms from many different languages. [sent-33, score-0.388]
21 Unfortunately, a straightforward translation runs into major difficulties because of homonyms and synonyms. [sent-34, score-0.05]
22 For example, a word like ‘bat’ has 10 senses in the English WordNet, but a German translation like ‘Fledermaus’ (the animal) only applies to a small subset of those senses (cf. [sent-35, score-0.43]
23 Figure 1: Word sense ambiguity Knowledge Extraction An initial input knowledge base graph G0 is constructed by extracting information from existing wordnets, translation dictionaries including Wiktionary (http://www. [sent-38, score-0.315]
24 Link Prediction A sequence of knowledge graphs Gi are iteratively derived by assessing paths from a new term x to an existing WordNet sense z via some English translation y covered by WordNet. [sent-42, score-0.195]
25 For instance, the German ‘Fledermaus’ has ‘bat’ as a translation and hence initially is tentatively linked to all senses of ‘bat’ with a confidence of 0. [sent-43, score-0.265]
26 In each iteration, the confidence values are then updated to reflect how likely it seems that those links are correct. [sent-44, score-0.092]
27 The confidences are predicted using RBF-kernel SVM models that are learnt from a training set of labelled links between non-English words and senses. [sent-45, score-0.144]
28 The feature space is constructed using a series of graph-based statistical scores that represent properties of the previous graph Gi−1 and additionally make use of measures of semantic relatedness and corpus frequencies. [sent-46, score-0.134]
29 The function sim∗ computes the maximal similarity between any sense of y and the current sense z. [sent-50, score-0.154]
30 The dissim function computes the sum of dissimilarities between senses of y and z, essentially quantifying how many alternatives there are to z. [sent-51, score-0.146]
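The two functions described in sentences 29 and 30 can be illustrated with a minimal sketch. It assumes a hypothetical pairwise sense-similarity function `sim` with values in [0, 1] and treats dissimilarity as `1 - sim`; that definition, the function names, and the toy similarity table are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Iterable

Sim = Callable[[str, str], float]


def sim_star(senses_of_y: Iterable[str], z: str, sim: Sim) -> float:
    """Maximal similarity between any sense of y and the candidate sense z."""
    return max((sim(s, z) for s in senses_of_y), default=0.0)


def dissim(senses_of_y: Iterable[str], z: str, sim: Sim) -> float:
    """Sum of dissimilarities (assumed here to be 1 - sim) between senses of y and z.

    A large value means y has many senses that are real alternatives to z.
    """
    return sum(1.0 - sim(s, z) for s in senses_of_y)


# Toy similarity table (hypothetical values).
_TABLE = {frozenset({"bat#club", "bat#cricket"}): 0.5}


def toy_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return _TABLE.get(frozenset({a, b}), 0.0)


BAT_SENSES = ["bat#animal", "bat#club", "bat#cricket"]
```

With this toy table, the animal sense of 'bat' has two unrelated alternatives, while the club sense has one unrelated and one partly related alternative, so `dissim` is higher for the animal sense.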
31 Additional weighting functions γ are used to bias scores towards senses that have an acceptable part-of-speech and senses that are more frequent in the SemCor corpus. [sent-52, score-0.292]
32 Relying on multiple iterations allows us to draw on multilingual evidence for greater precision and recall. [sent-53, score-0.196]
33 For instance, after linking the German ‘Fledermaus’ to the animal sense of ‘bat’, we may be able to infer the same for the Turkish translation ‘yarasa’ . [sent-54, score-0.172]
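The iterative scheme in sentences 24 to 33 can be sketched roughly as follows. This is not the authors' implementation: the learnt SVM is replaced by a hand-written stand-in scorer, the initial confidence is a free parameter (its value is cut off in the text above), and all function names are hypothetical.

```python
def initial_links(translations, senses, init_conf):
    """Tentatively link each new term to every sense of each English translation.

    translations: term -> list of English words
    senses:       English word -> list of sense identifiers
    """
    links = {}
    for term, english_words in translations.items():
        for y in english_words:
            for z in senses.get(y, []):
                links[(term, z)] = init_conf
    return links


def iterate(links, rescore, n_iter):
    """Derive a sequence of graphs; each is scored from the previous one only."""
    for _ in range(n_iter):
        previous = dict(links)
        links = {edge: rescore(edge, previous) for edge in previous}
    return links


def make_neighbour_scorer(neighbours):
    """Stand-in for the learnt model: mix an edge's own confidence with the
    best confidence that a translation-linked term has for the same sense."""
    def rescore(edge, previous):
        term, sense = edge
        own = previous[edge]
        support = [previous.get((n, sense), 0.0) for n in neighbours.get(term, [])]
        if not support:
            return own
        return 0.5 * own + 0.5 * max(support)
    return rescore


links = initial_links(
    {"Fledermaus": ["bat"], "yarasa": ["bat"]},
    {"bat": ["bat#animal", "bat#club"]},
    init_conf=0.5,
)
# Pretend an earlier iteration already resolved the German word.
links[("Fledermaus", "bat#animal")] = 1.0
links[("Fledermaus", "bat#club")] = 0.0
scorer = make_neighbour_scorer({"yarasa": ["Fledermaus"]})
final = iterate(links, scorer, n_iter=2)
```

In this toy run the Turkish 'yarasa' drifts towards the animal sense because its German counterpart is already linked there, which mirrors the multilingual-evidence argument in sentences 32 and 33.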
34 Results We have successfully applied these techniques to automatically create UWN, a large-scale multilingual wordnet. [sent-55, score-0.196]
35 3 MENTA: Named Entities and Specialized Concepts The UWN Core is extended by incorporating large amounts of named entities and language- and domain-specific concepts from Wikipedia (de Melo and Weikum, 2010a). [sent-71, score-0.222]
36 In the process, we also obtain human-readable glosses in many languages, links to images, and other valuable information. [sent-72, score-0.092]
37 These additions are not simply added as a separate knowledge base, but fully connected and integrated with the core. [sent-73, score-0.261]
38 In particular, we create a mapping between Wikipedia and WordNet in order to merge equivalent entries and we use taxonomy construction methods in order to attach all new named entities to their most likely classes, e. [sent-74, score-0.31]
39 ‘Haight-Ashbury’ is linked to a WordNet sense of the word ‘neighborhood’ . [sent-76, score-0.146]
40 Information Integration Supervised link prediction, similar to the method presented in Section 2, is used in order to attach Wikipedia articles to semantically equivalent WordNet entries, while also exploiting gloss similarity as an additional feature. [sent-77, score-0.332]
41 Additionally, we connect articles from different multilingual Wikipedia editions via their cross-lingual interwiki links, as well as categories with equivalent articles and article redirects with redirect targets. [sent-78, score-0.497]
42 We then consider connected components of directly or transitively linked items. [sent-79, score-0.204]
43 In the ideal case, such a connected component consists of a number of items all describing the same concept or entity, including articles from different versions of Wikipedia and perhaps also categories or WordNet senses. [sent-80, score-0.312]
44 Unfortunately, in many cases one obtains connected components that are unlikely to be correct, because multiple articles from the same Wikipedia edition or multiple incompatible WordNet senses are included in the same component. [sent-81, score-0.375]
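The grouping step and the consistency problem described in sentences 42 to 44 can be sketched as below. Items are modelled as `(source, name)` pairs, and a component is flagged as inconsistent when it contains two items from the same source. This uniform rule is a simplifying assumption: the paper speaks of multiple articles from the same Wikipedia edition and of incompatible WordNet senses, without the exact test being given here.

```python
from collections import Counter, defaultdict


def connected_components(edges):
    """Group directly or transitively linked items (iterative depth-first search)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def is_consistent(component):
    """Assumed rule: at most one item per source (edition or WordNet)."""
    counts = Counter(source for source, _ in component)
    return all(c == 1 for c in counts.values())


edges = [
    (("enwiki", "Fog"), ("dewiki", "Nebel")),
    (("enwiki", "Fog"), ("wordnet", "fog#1")),
    (("enwiki", "Mist"), ("dewiki", "Nebel")),   # noisy interwiki link
    (("enwiki", "Bat"), ("dewiki", "Fledermaus")),
]
components = connected_components(edges)
```

Here the noisy link pulls two English articles into one component, which is the kind of situation the text says must be repaired.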
45 This can be due to incorrect links produced by the supervised link prediction, but often even the original links from Wikipedia are not consistent. [sent-82, score-0.281]
46 In order to obtain more consistent connected components, we use combinatorial optimization methods to delete certain links. [sent-83, score-0.135]
47 In particular, for each connected component to be analysed, an Integer Linear Program formalizes the objective of minimizing the costs for deleted edges and the costs for ignoring soft constraints. [sent-84, score-0.222]
48 The basic aim is that of deleting as few edges as possible while simultaneously ensuring that the graph becomes as consistent as possible. [sent-85, score-0.086]
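The objective in sentences 46 to 48 can be illustrated on a toy component. The sketch below is a brute-force stand-in, not an Integer Linear Program: it enumerates edge subsets, which is only feasible for a handful of edges, and it treats the "one item per source" rule as a hard constraint rather than a soft one with its own cost. Edge costs and all names are hypothetical.

```python
from itertools import combinations


def components(nodes, edges):
    """Connected components via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def consistent(group):
    """Assumed constraint: no two items from the same source in one group."""
    sources = [source for source, _ in group]
    return len(sources) == len(set(sources))


def cheapest_cut(nodes, weighted_edges):
    """Return (cost, removed_edges) with minimal total deletion cost such that
    every remaining component is consistent. Exhaustive search over subsets."""
    edges = list(weighted_edges)
    best = None
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            kept = [e for e in edges if e not in removed]
            if all(consistent(g) for g in components(nodes, kept)):
                cost = sum(weighted_edges[e] for e in removed)
                if best is None or cost < best[0]:
                    best = (cost, set(removed))
    return best


FOG, MIST, NEBEL = ("enwiki", "Fog"), ("enwiki", "Mist"), ("dewiki", "Nebel")
costs = {(FOG, NEBEL): 0.9, (MIST, NEBEL): 0.2}  # low cost = weak evidence
result = cheapest_cut([FOG, MIST, NEBEL], costs)
```

The weakly supported link is the one that gets deleted, matching the intuition in sentence 49 that edges with little evidence are cheap to remove.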
49 In some cases, there is overwhelming evidence indicating that two slightly different articles should be grouped together, while in other cases there might be little evidence for the correctness of an edge and so it can easily be deleted with low cost. [sent-86, score-0.094]
50 The clean connected components resulting from this process can then be merged to form aggregate entities. [sent-88, score-0.135]
51 For instance, given WordNet’s standard sense for ‘fog’, water vapor, we can check which other items are in the connected component and transfer all information to the WordNet entry. [sent-89, score-0.255]
52 By extracting snippets of text from the beginning of Wikipedia articles, we can add new gloss descriptions for fog in Arabic, Asturian, Bengali, and many other languages. [sent-90, score-0.153]
53 We can also attach pictures showing fog to the WordNet word sense. [sent-91, score-0.191]
54 Taxonomy Induction The above process connects articles to their counterparts in WordNet. [sent-92, score-0.094]
55 In the next step, we ensure that articles without any direct counterpart are linked to WordNet as well, by means of taxonomic hypernymy/instance links (de Melo and Weikum, 2010a). [sent-93, score-0.323]
56 We generate individual hypotheses about likely parents of entities. [sent-94, score-0.091]
57 For instance, articles are connected to their Wikipedia categories (if these are not assessed to be mere topic descriptors) and categories are linked to parent categories, etc. [sent-95, score-0.484]
58 In order to link categories to possible parent hypernyms in WordNet, we adapt the approach proposed for YAGO (Suchanek et al. [sent-96, score-0.2]
59 We then construct a Markov chain based on this graph of parents that also incorporates the possibility of random jumps from any parent back to the current entity under consideration. [sent-100, score-0.196]
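A Markov chain with random jumps back to the entity under consideration, as in sentence 59, resembles a random walk with restart. The sketch below shows that general technique under stated assumptions: the restart probability, the edge weights, and the idea of ranking candidate parents by their visit probability are illustrative choices, not details confirmed by the text above.

```python
def random_walk_with_restart(parents, start, restart=0.15, n_iter=50):
    """Approximate visit probabilities by power iteration.

    parents: node -> {parent node: edge weight}
    From every node the walker jumps back to `start` with probability
    `restart`; otherwise it follows a parent edge chosen in proportion to
    its weight. Nodes without parents send all their mass back to `start`.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps} | {start}
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(n_iter):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in prob.items():
            if mass == 0.0:
                continue
            targets = parents.get(node, {})
            total = sum(targets.values())
            if total == 0:
                nxt[start] += mass
                continue
            nxt[start] += restart * mass
            for target, weight in targets.items():
                nxt[target] += (1.0 - restart) * mass * weight / total
        prob = nxt
    return prob


graph = {
    "Haight-Ashbury": {"cat:Neighborhoods": 1.0, "cat:Hippie movement": 0.2},
    "cat:Neighborhoods": {"wn:neighborhood": 1.0},
    "cat:Hippie movement": {"wn:movement": 1.0},
}
scores = random_walk_with_restart(graph, "Haight-Ashbury")
```

In this toy graph the WordNet sense reached through the more heavily weighted category receives the larger score, so it would be preferred as the parent class.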
60 Figure 2: Noisy initial edges (left) and cleaned, integrated output (right), shown in a simplified form Figure 3: UWN with named entities Results Overall, we obtain a knowledge base with 5. [sent-102, score-0.415]
61 Over 2 million named entities come only from non-English Wikipedia editions, but their taxonomic links to WordNet still have an accuracy around 90%. [sent-105, score-0.327]
62 An example excerpt is shown in Figure 3, with named entities connected to higher-level classes in UWN, all with multilingual labels. [sent-106, score-0.498]
63 4 Other Extensions Word Relationships Another plugin provides word relationships and properties mined from Wiktionary. [sent-107, score-0.2]
64 Frame-Semantic Knowledge Frame semantics is a cognitively motivated theory that describes words in terms of the cognitive frames or scenarios that they evoke and the corresponding participants involved in them. [sent-117, score-0.099]
65 For instance, the Commerce_goods-transfer frame normally involves a seller and a buyer, among other things, and different words like ‘buy’ and ‘sell’ can be chosen to describe the same event. [sent-119, score-0.148]
66 Such detailed knowledge about scenarios is largely complementary in nature to the sense relationships that WordNet provides. [sent-120, score-0.221]
67 There have been individual systems that made use of both forms of knowledge (Shi and Mihalcea, 2005; Coppola and others, 2009), but due to their very different nature, there is currently no simple way to accomplish this feat. [sent-122, score-0.068]
68 Our system addresses this by seamlessly integrating frame semantic knowledge into the system. [sent-123, score-0.212]
69 , 1998), the most well-known computational instantiation of frame semantics. [sent-125, score-0.104]
70 These are all integrated into WordNet’s hypernym hierarchy, i. [sent-129, score-0.058]
71 from language families like the Sinitic languages one may move down to macrolanguages like Chinese, and then to more specific forms like Mandarin Chinese, dialect groups like Ji-Lu Mandarin, or even dialects of particular cities. [sent-131, score-0.219]
72 The information is obtained from ISO standards, the Unicode CLDR as well as Wikipedia and then integrated with WordNet using the information integration strategies described above (de Melo and Weikum, 2008). [sent-132, score-0.108]
73 For instance, the Chinese character ‘娴’ is connected to its radical component ‘女’ and to its pronunciation component ‘ 闲’ . [sent-134, score-0.221]
74 5 Integrated Query Interface and Wiki We have developed an online interface that provides access to our data to interested researchers (yagoknowledge. [sent-135, score-0.243]
75 Interactive online interfaces offer new ways of interacting with lexical knowledge that are not possible with traditional print dictionaries. [sent-137, score-0.158]
76 A non-native speaker of English looking up the word ‘tercel’ might find it helpful to see pictures available for the related terms ‘hawk’ or ‘falcon’; a Google Image search for ‘tercel’ merely delivers images of Toyota Tercel cars. [sent-139, score-0.088]
77 While there have been other multilingual interfaces to WordNet-style lexical knowledge in the past (Pianta et al. [sent-140, score-0.31]
78 The most similar resource is BabelNet (Navigli and Ponzetto, 2010), which contains multilingual synsets but does not connect named entities from Wikipedia to them in a multilingual taxonomy. [sent-142, score-0.599]
79 Figure 4: Part of Online Interface 6 Integrated API Our goal is to make the knowledge that we have derived available for use in applications. [sent-143, score-0.068]
80 While there are many existing APIs for WordNet and other lexical resources (e. [sent-145, score-0.046]
81 , 2011; Gurevych and others, 2012)), these don’t provide a comparable degree of integrated multilingual and taxonomic information. [sent-148, score-0.322]
82 Interface The API can be used by initializing an accessor object and possibly specifying the list of plugins to be loaded. [sent-149, score-0.053]
83 Depending on the particular application, one may choose only Princeton WordNet and the UWN Core, or one may want to include named entities from Wikipedia and frame-semantic knowledge derived from FrameNet, for instance. [sent-150, score-0.235]
84 The accessor provides a simple graph-based lookup API as well as some convenience methods for common types of queries. [sent-151, score-0.097]
85 It also provides a simple word sense disambiguation method that, given a tokenized text with part-of-speech and lemma annotations, selects likely word senses by choosing the senses (with matching part-of-speech) that are most similar to words in the context. [sent-153, score-0.413]
86 Note that these modules go beyond existing APIs because they operate on words in many different languages and semantic similarity can even be assessed across languages. [sent-154, score-0.126]
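The disambiguation method in sentence 85 can be sketched in a few lines. This is not the UWN API: the function name, the shape of the candidate lexicon, and the `relatedness` function are all hypothetical, and summing relatedness over context lemmas is just one way to read "most similar to words in the context".

```python
def disambiguate(tokens, candidates, relatedness):
    """Pick, for each token, the candidate sense most related to its context.

    tokens:      list of (lemma, part_of_speech) pairs
    candidates:  (lemma, part_of_speech) -> list of sense identifiers, so
                 only senses with a matching part of speech are considered
    relatedness: function (sense, context_lemma) -> float
    """
    result = []
    for i, (lemma, pos) in enumerate(tokens):
        context = [l for j, (l, _) in enumerate(tokens) if j != i]
        senses = candidates.get((lemma, pos), [])
        if not senses:
            result.append(None)  # word not covered by the lexicon
            continue
        best = max(senses, key=lambda s: sum(relatedness(s, w) for w in context))
        result.append(best)
    return result


# Toy data (hypothetical scores).
RELATED = {
    ("bat#animal", "fly"): 0.8,
    ("bat#animal", "cave"): 0.7,
    ("bat#club", "fly"): 0.1,
}
lexicon = {("bat", "n"): ["bat#animal", "bat#club"]}
sentence = [("bat", "n"), ("fly", "v"), ("cave", "n")]
chosen = disambiguate(sentence, lexicon, lambda s, w: RELATED.get((s, w), 0.0))
```

Because the relatedness function is passed in, the same routine could work across languages if that function can compare a sense with a context word in any language, which is the point made in sentence 86.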
87 Data Structures Under the hood, each plugin relies on a disk-based associative array to store the knowledge base as a labelled multi-graph. [sent-155, score-0.278]
88 The outgoing labelled edges of an entity are saved on disk in a serialized form, including relation names and relation weights. [sent-156, score-0.139]
89 An index structure allows determining the position of such records on disk. [sent-157, score-0.048]
90 Internally, this index structure is implemented as a linearly-probed hash table that is also stored externally. [sent-158, score-0.048]
91 Note that such a structure is very efficient in this scenario, because the index is used as a read-only data store by the API. [sent-159, score-0.048]
92 Once an index has been created, write operations are no longer performed, so B+ trees and similar disk-based balanced tree indices commonly used in relational database management systems are not needed. [sent-160, score-0.048]
93 The advantage is that this enables faster lookups, because retrieval operations normally require only two disk reads per plugin, one to access a block in the index table, and another to access a block of actual data. [sent-161, score-0.287]
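The storage layout described in sentences 87 to 93 can be approximated with a small sketch: a linearly probed hash table whose slots point to serialized edge records in a separate data region. To stay self-contained, both regions live in memory here instead of on disk, and the JSON record format, the CRC32 hash, and the function names are assumptions rather than the actual implementation.

```python
import io
import json
import zlib


def build(records, n_slots):
    """Write records once and build a read-only, linearly probed index.

    records: entity -> list of [relation, target, weight] edges
    Returns (slots, data), where each used slot is (key, offset, length).
    """
    if n_slots <= len(records):
        raise ValueError("the table needs at least one free slot")
    data = io.BytesIO()
    slots = [None] * n_slots
    for key, edges in records.items():
        payload = json.dumps(edges).encode("utf-8")
        offset = data.tell()
        data.write(payload)
        i = zlib.crc32(key.encode("utf-8")) % n_slots
        while slots[i] is not None:      # linear probing on collision
            i = (i + 1) % n_slots
        slots[i] = (key, offset, len(payload))
    return slots, data.getvalue()


def lookup(slots, data, key):
    """Probe the index (first read), then fetch the record (second read)."""
    n = len(slots)
    i = zlib.crc32(key.encode("utf-8")) % n
    for _ in range(n):
        slot = slots[i]
        if slot is None:
            return None                  # empty slot: key is absent
        if slot[0] == key:
            _, offset, length = slot
            return json.loads(data[offset:offset + length].decode("utf-8"))
        i = (i + 1) % n
    return None


records = {
    "fog": [["hypernym", "atmospheric phenomenon", 1.0]],
    "bat": [["hypernym", "placental", 1.0], ["translation", "Fledermaus", 0.9]],
}
slots, data = build(records, n_slots=8)
```

Since nothing is written after `build`, the table never needs deletion markers or rebalancing, which is why a plain probed table can stand in for a B+ tree in this read-only setting.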
94 7 Conclusion UWN is an important new multilingual lexical resource that is now freely available to the community. [sent-162, score-0.282]
95 It has been constructed using sophisticated knowledge extraction, link prediction, information integration, and taxonomy induction methods. [sent-163, score-0.24]
96 Apart from an online querying and browsing interface, we have also implemented an API that facilitates the use of the knowledge base in applications. [sent-164, score-0.19]
97 Resolving pattern ambiguity for English to Hindi machine translation using WordNet. [sent-180, score-0.05]
98 Towards a universal wordnet by learning from combined evidence. [sent-196, score-0.31]
99 Lexical translation with application to image search on the Web. [sent-211, score-0.05]
100 WikiNetTk – A tool kit for embedding world knowledge in NLP applications. [sent-242, score-0.068]
wordName wordTfidf (topN-words)
[('uwn', 0.336), ('wordnet', 0.31), ('melo', 0.266), ('multilingual', 0.196), ('api', 0.181), ('senses', 0.146), ('framenet', 0.143), ('wikipedia', 0.141), ('connected', 0.135), ('gerhard', 0.128), ('weikum', 0.128), ('bat', 0.106), ('interface', 0.104), ('frame', 0.104), ('gerard', 0.102), ('link', 0.097), ('articles', 0.094), ('links', 0.092), ('fledermaus', 0.092), ('tercel', 0.092), ('parents', 0.091), ('entities', 0.085), ('named', 0.082), ('plugin', 0.08), ('fog', 0.08), ('base', 0.078), ('sense', 0.077), ('relationships', 0.076), ('taxonomy', 0.075), ('editions', 0.073), ('gloss', 0.073), ('linked', 0.069), ('knowledge', 0.068), ('unicode', 0.068), ('attach', 0.068), ('taxonomic', 0.068), ('de', 0.065), ('parent', 0.063), ('cldr', 0.061), ('emphasizes', 0.061), ('godbole', 0.061), ('kedad', 0.061), ('madhavan', 0.061), ('marsza', 0.061), ('pianta', 0.061), ('rubin', 0.061), ('simx', 0.061), ('unhappy', 0.061), ('gi', 0.059), ('integrated', 0.058), ('apis', 0.056), ('concepts', 0.055), ('core', 0.054), ('evoke', 0.053), ('judea', 0.053), ('atserias', 0.053), ('babelnet', 0.053), ('chatterjee', 0.053), ('menta', 0.053), ('accessor', 0.053), ('thesauri', 0.053), ('labelled', 0.052), ('fellbaum', 0.052), ('additionally', 0.052), ('access', 0.051), ('prediction', 0.051), ('integration', 0.05), ('translation', 0.05), ('gong', 0.049), ('index', 0.048), ('block', 0.047), ('lexical', 0.046), ('participants', 0.046), ('animal', 0.045), ('images', 0.045), ('navigli', 0.045), ('suchanek', 0.045), ('coppola', 0.045), ('shi', 0.045), ('german', 0.045), ('edges', 0.044), ('online', 0.044), ('like', 0.044), ('provides', 0.044), ('samples', 0.044), ('languages', 0.043), ('component', 0.043), ('yago', 0.043), ('pictures', 0.043), ('assessed', 0.043), ('mandarin', 0.043), ('happy', 0.043), ('disk', 0.043), ('graph', 0.042), ('categories', 0.04), ('resource', 0.04), ('semantic', 0.04), ('ek', 0.039), ('gurevych', 0.039), ('baker', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
2 0.2836915 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet, a wide-coverage multilingual lexical knowledge base, and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.
3 0.12307023 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
Author: Zhi Zhong ; Hwee Tou Ng
Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.
4 0.11736114 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
Author: Nathan Schneider ; Behrang Mohit ; Kemal Oflazer ; Noah A. Smith
Abstract: “Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains. The resulting corpus has high coverage and was completed quickly with reasonable inter-annotator agreement.
5 0.11588311 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum
Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.
6 0.11139461 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
7 0.10814214 134 acl-2012-Learning to Find Translations and Transliterations on the Web
8 0.10723397 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
9 0.096080281 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
10 0.084872633 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
11 0.083530948 7 acl-2012-A Computational Approach to the Automation of Creative Naming
12 0.081105746 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
13 0.08107385 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
14 0.077201575 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
15 0.074987978 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
16 0.072155148 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
17 0.066429824 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
18 0.063828193 64 acl-2012-Crosslingual Induction of Semantic Roles
19 0.063276239 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
20 0.063078336 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
topicId topicWeight
[(0, -0.212), (1, 0.082), (2, 0.009), (3, 0.081), (4, 0.103), (5, 0.149), (6, -0.026), (7, 0.06), (8, -0.006), (9, -0.043), (10, 0.179), (11, 0.027), (12, 0.142), (13, 0.086), (14, -0.053), (15, -0.229), (16, 0.001), (17, 0.075), (18, -0.016), (19, 0.031), (20, -0.052), (21, -0.101), (22, -0.048), (23, -0.023), (24, 0.009), (25, 0.165), (26, -0.099), (27, 0.004), (28, -0.026), (29, -0.06), (30, 0.087), (31, 0.048), (32, 0.013), (33, 0.096), (34, -0.086), (35, -0.016), (36, -0.067), (37, -0.002), (38, 0.095), (39, -0.175), (40, -0.115), (41, 0.097), (42, 0.1), (43, 0.121), (44, 0.01), (45, -0.049), (46, 0.098), (47, -0.112), (48, -0.089), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96435338 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
2 0.90479904 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet, a wide-coverage multilingual lexical knowledge base, and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.
3 0.56205702 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
Author: Nathan Schneider ; Behrang Mohit ; Kemal Oflazer ; Noah A. Smith
Abstract: “Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains. The resulting corpus has high coverage and was completed quickly with reasonable inter-annotator agreement.
4 0.52667075 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
Author: Zhi Zhong ; Hwee Tou Ng
Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.
5 0.46399051 7 acl-2012-A Computational Approach to the Automation of Creative Naming
Author: Gozde Ozbal ; Carlo Strapparava
Abstract: In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. The empirical results show that our approach is generally effective and it constitutes a solid starting point for the automation of the naming process.
6 0.46290317 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries
7 0.45826283 195 acl-2012-The Creation of a Corpus of English Metalanguage
8 0.43052211 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
9 0.41103688 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
10 0.38490289 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
11 0.34065598 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
12 0.33632547 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
13 0.33465788 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
14 0.32909784 186 acl-2012-Structuring E-Commerce Inventory
15 0.32738003 112 acl-2012-Humor as Circuits in Semantic Networks
16 0.31943336 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
17 0.31180128 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
18 0.31130761 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
19 0.30263263 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
20 0.29418704 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis
topicId topicWeight
[(13, 0.011), (25, 0.047), (26, 0.059), (28, 0.047), (30, 0.035), (37, 0.025), (39, 0.08), (57, 0.016), (64, 0.012), (74, 0.024), (82, 0.039), (84, 0.024), (85, 0.079), (86, 0.011), (87, 0.218), (90, 0.083), (92, 0.049), (94, 0.017), (99, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.80198145 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
2 0.78673577 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger
Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.
3 0.61107105 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet, a wide-coverage multilingual lexical knowledge base, and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.
4 0.56118441 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
Author: Karolina Owczarzak ; Peter A. Rankel ; Hoa Trang Dang ; John M. Conroy
Abstract: We investigate the consistency of human assessors involved in summarization evaluation to understand its effect on system ranking and automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure to what extent these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of automatic metrics (ROUGE and CLASSY) with respect to the inconsistent assessments.
5 0.55771929 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: We address a core aspect of the multilingual content synchronization task: the identification of novel, more informative or semantically equivalent pieces of information in two documents about the same topic. This can be seen as an application-oriented variant of textual entailment recognition where: i) T and H are in different languages, and ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.
6 0.55742192 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain
7 0.55345261 187 acl-2012-Subgroup Detection in Ideological Discussions
8 0.55128157 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
9 0.55061913 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
10 0.55029273 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
11 0.54988366 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
12 0.54803294 191 acl-2012-Temporally Anchored Relation Extraction
13 0.54600203 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
14 0.54552221 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
16 0.54433364 139 acl-2012-MIX Is Not a Tree-Adjoining Language
17 0.54420239 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora
18 0.54182333 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
19 0.54131413 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
20 0.53980166 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis