
206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Source: pdf

Author: Gerard de Melo ; Gerhard Weikum

Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 UWN: A Large Multilingual Lexical Knowledge Base Gerard de Melo ICSI Berkeley demelo@icsi.berkeley.edu [sent-1, score-0.065]

2 Abstract We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. [sent-3, score-0.464]

3 This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. [sent-4, score-0.389]

4 We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. [sent-5, score-0.098]

5 An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names. [sent-6, score-0.243]

6 1 Introduction Semantic knowledge about words and named entities is a fundamental building block both in various forms of language technology and in end-user applications. [sent-7, score-0.197]

7 Examples of the latter include word processor thesauri, online dictionaries, question answering, and mobile services. [sent-8, score-0.044]

8 Further uses of lexical knowledge include data cleaning (Kedad and Métais, 2002), visual object recognition (Marszałek and Schmid, 2007), and biomedical data analysis (Rubin and others, 2006). [sent-13, score-0.114]

9 Many of these applications have used English-language resources like WordNet (Fellbaum, 1998). [sent-14, score-0.044]

10 However, a more multilingual resource equipped with an easy-to-use API would not only enable us to perform all of the aforementioned tasks in additional languages, but also to explore cross-lingual applications like cross-lingual IR (Etzioni et al. [sent-17, score-0.345]

11 This paper describes a new API that makes lexical knowledge about millions of items in over 200 languages available to applications, and a corresponding online user interface for users to explore the data. [sent-20, score-0.305]

12 We first describe link prediction techniques used to create the multilingual core of the knowledge base with word sense information (Section 2). [sent-21, score-0.621]

13 We then outline techniques used to incorporate named entities and specialized concepts (Section 3) and other types of knowledge (Section 4). [sent-22, score-0.29]

14 Finally, we describe how the information is made accessible via a user interface (Section 5) and a software API (Section 6). [sent-23, score-0.104]

15 2 The UWN Core UWN (de Melo and Weikum, 2009) is based on WordNet (Fellbaum, 1998), the most popular lexical knowledge base for the English language. [sent-24, score-0.192]

16 WordNet enumerates the senses of a word, providing a short description text (gloss) and synonyms for each meaning. [sent-25, score-0.146]

17 Additionally, it describes relationships between senses, e. [sent-26, score-0.076]

18 via the hyponymy/hypernymy relation that holds when one term like ‘publication’ is a generalization of another term like ‘journal’. [sent-28, score-0.088]

19 In order to accomplish this at a large scale, we automatically link … (page footer: Proceedings of the 50th Annual Meeting of the A., Jeju, Republic of Korea, 8-14 July 2012) [sent-30, score-0.097]

20 This transforms WordNet into a multilingual lexical knowledge base that covers not only English terms but hundreds of thousands of terms from many different languages. [sent-33, score-0.388]

21 Unfortunately, a straightforward translation runs into major difficulties because of homonyms and synonyms. [sent-34, score-0.05]

22 For example, a word like ‘bat’ has 10 senses in the English WordNet, but a German translation like ‘Fledermaus’ (the animal) only applies to a small subset of those senses (cf. [sent-35, score-0.43]

23 Figure 1: Word sense ambiguity Knowledge Extraction An initial input knowledge base graph G0 is constructed by extracting information from existing wordnets, translation dictionaries including Wiktionary (http://www. [sent-38, score-0.315]

24 Link Prediction A sequence of knowledge graphs Gi is iteratively derived by assessing paths from a new term x to an existing WordNet sense z via some English translation y covered by WordNet. [sent-42, score-0.195]

25 For instance, the German ‘Fledermaus’ has ‘bat’ as a translation and hence initially is tentatively linked to all senses of ‘bat’ with a confidence of 0. [sent-43, score-0.265]

26 In each iteration, the confidence values are then updated to reflect how likely it seems that those links are correct. [sent-44, score-0.092]
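
The initialization step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `tentative_links`, the sense identifiers such as `bat#animal`, and the value passed as `initial_conf` are all hypothetical placeholders (the actual initial confidence is cut off in the text above), and the SVM-based re-scoring of later iterations is not modelled.

```python
from collections import defaultdict

def tentative_links(translations, senses_of, initial_conf):
    """Link every new term to all senses of each of its English translations."""
    links = defaultdict(dict)
    for term, english_words in translations.items():
        for english in english_words:
            for sense in senses_of.get(english, []):
                # Every candidate sense starts with the same tentative confidence.
                links[term][sense] = initial_conf
    return dict(links)

# Toy data: sense identifiers are made up for illustration.
translations = {"Fledermaus": ["bat"], "yarasa": ["bat"]}
senses_of = {"bat": ["bat#animal", "bat#club", "bat#hit"]}

# 0.1 is an arbitrary placeholder, not the value used by UWN.
links = tentative_links(translations, senses_of, initial_conf=0.1)
print(links["Fledermaus"])
```

Later iterations would replace these uniform values with learned confidence scores, so that only the plausible senses remain strongly linked.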

27 The confidences are predicted using RBF-kernel SVM models that are learnt from a training set of labelled links between non-English words and senses. [sent-45, score-0.144]

28 The feature space is constructed using a series of graph-based statistical scores that represent properties of the previous graph Gi−1 and additionally make use of measures of semantic relatedness and corpus frequencies. [sent-46, score-0.134]

29 The function sim∗ computes the maximal similarity between any sense of y and the current sense z. [sent-50, score-0.154]

30 The dissim function computes the sum of dissimilarities between senses of y and z, essentially quantifying how many alternatives there are to z. [sent-51, score-0.146]

31 Additional weighting functions γ are used to bias scores towards senses that have an acceptable part-of-speech and senses that are more frequent in the SemCor corpus. [sent-52, score-0.292]
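
The two helper functions described above can be illustrated with a small sketch. It assumes that dissimilarity is simply one minus similarity and that a pairwise similarity function is supplied by the caller; both are assumptions made here for illustration, and the weighting functions γ are left out. The names `sim_star`, `dissim`, and `toy_sim`, as well as the sense identifiers, are hypothetical.

```python
def sim_star(y_senses, z, sim):
    """Maximal similarity between any sense of y and the current sense z."""
    return max((sim(s, z) for s in y_senses), default=0.0)

def dissim(y_senses, z, sim):
    """Sum of dissimilarities between the senses of y and z.

    Assumption: dissimilarity is taken to be 1 - similarity.
    """
    return sum(1.0 - sim(s, z) for s in y_senses)

# Toy similarity table with made-up sense identifiers and scores.
TOY_SIM = {("bat#animal", "bat#club"): 0.25}

def toy_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))

senses = ["bat#animal", "bat#club"]
best = sim_star(senses, "bat#animal", toy_sim)
alternatives = dissim(senses, "bat#animal", toy_sim)
print(best, alternatives)
```

A large `dissim` value indicates that y has many senses unrelated to z, which makes any single link through y less certain.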

32 Relying on multiple iterations allows us to draw on multilingual evidence for greater precision and recall. [sent-53, score-0.196]

33 For instance, after linking the German ‘Fledermaus’ to the animal sense of ‘bat’, we may be able to infer the same for the Turkish translation ‘yarasa’. [sent-54, score-0.172]

34 Results We have successfully applied these techniques to automatically create UWN, a large-scale multilingual wordnet. [sent-55, score-0.196]

35 3 MENTA: Named Entities and Specialized Concepts The UWN Core is extended by incorporating large amounts of named entities and language- and domain-specific concepts from Wikipedia (de Melo and Weikum, 2010a). [sent-71, score-0.222]

36 In the process, we also obtain human-readable glosses in many languages, links to images, and other valuable information. [sent-72, score-0.092]

37 These additions are not simply added as a separate knowledge base, but fully connected and integrated with the core. [sent-73, score-0.261]

38 In particular, we create a mapping between Wikipedia and WordNet in order to merge equivalent entries and we use taxonomy construction methods in order to attach all new named entities to their most likely classes, e. [sent-74, score-0.31]

39 ‘Haight-Ashbury’ is linked to a WordNet sense of the word ‘neighborhood’. [sent-76, score-0.146]

40 Information Integration Supervised link prediction, similar to the method presented in Section 2, is used in order to attach Wikipedia articles to semantically equivalent WordNet entries, while also exploiting gloss similarity as an additional feature. [sent-77, score-0.332]

41 Additionally, we connect articles from different multilingual Wikipedia editions via their cross-lingual interwiki links, as well as categories with equivalent articles and article redirects with redirect targets. [sent-78, score-0.497]

42 We then consider connected components of directly or transitively linked items. [sent-79, score-0.204]

43 In the ideal case, such a connected component consists of a number of items all describing the same concept or entity, including articles from different versions of Wikipedia and perhaps also categories or WordNet senses. [sent-80, score-0.312]

44 Unfortunately, in many cases one obtains connected components that are unlikely to be correct, because multiple articles from the same Wikipedia edition or multiple incompatible WordNet senses are included in the same component. [sent-81, score-0.375]

45 This can be due to incorrect links produced by the supervised link prediction, but often even the original links from Wikipedia are not consistent. [sent-82, score-0.281]

46 In order to obtain more consistent connected components, we use combinatorial optimization methods to delete certain links. [sent-83, score-0.135]

47 In particular, for each connected component to be analysed, an Integer Linear Program formalizes the objective of minimizing the costs for deleted edges and the costs for ignoring soft constraints. [sent-84, score-0.222]

48 The basic aim is that of deleting as few edges as possible while simultaneously ensuring that the graph becomes as consistent as possible. [sent-85, score-0.086]
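
The edge-deletion idea described above can be illustrated on a toy graph. The sketch below deliberately swaps the Integer Linear Program for an exhaustive search over edge subsets, which is only feasible for tiny components, and it models a single hard constraint (at most one item per edition in a component) rather than the soft constraints mentioned in the text. The `edition:title` node naming and the deletion costs are hypothetical.

```python
from itertools import combinations

def components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def consistent(nodes, edges):
    """A component is inconsistent if two of its items share an edition prefix."""
    for comp in components(nodes, edges):
        editions = [n.split(":", 1)[0] for n in comp]
        if len(editions) != len(set(editions)):
            return False
    return True

def cheapest_deletion(nodes, edge_costs):
    """Brute force: cheapest set of edges whose removal makes the graph consistent."""
    edge_list = list(edge_costs)
    best = None
    for k in range(len(edge_list) + 1):
        for removed in combinations(edge_list, k):
            kept = [e for e in edge_list if e not in removed]
            if consistent(nodes, kept):
                cost = sum(edge_costs[e] for e in removed)
                if best is None or cost < best[0]:
                    best = (cost, set(removed))
    return best

nodes = ["en:Fog", "en:Mist", "de:Nebel"]
# Hypothetical costs: a high cost means strong evidence that the edge is correct.
edge_costs = {("en:Fog", "de:Nebel"): 0.9, ("en:Mist", "de:Nebel"): 0.2}
cost, removed = cheapest_deletion(nodes, edge_costs)
print(cost, removed)
```

In this toy case the weakly supported edge is removed, which leaves ‘en:Fog’ and ‘de:Nebel’ in one clean component and ‘en:Mist’ on its own.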

49 In some cases, there is overwhelming evidence indicating that two slightly different articles should be grouped together, while in other cases there might be little evidence for the correctness of an edge and so it can easily be deleted with low cost. [sent-86, score-0.094]

50 The clean connected components resulting from this process can then be merged to form aggregate entities. [sent-88, score-0.135]

51 For instance, given WordNet’s standard sense for ‘fog’, water vapor, we can check which other items are in the connected component and transfer all information to the WordNet entry. [sent-89, score-0.255]

52 By extracting snippets of text from the beginning of Wikipedia articles, we can add new gloss descriptions for fog in Arabic, Asturian, Bengali, and many other languages. [sent-90, score-0.153]

53 We can also attach pictures showing fog to the WordNet word sense. [sent-91, score-0.191]

54 Taxonomy Induction The above process connects articles to their counterparts in WordNet. [sent-92, score-0.094]

55 In the next step, we ensure that articles without any direct counterpart are linked to WordNet as well, by means of taxonomic hypernymy/instance links (de Melo and Weikum, 2010a). [sent-93, score-0.323]

56 We generate individual hypotheses about likely parents of entities. [sent-94, score-0.091]

57 For instance, articles are connected to their Wikipedia categories (if these are not assessed to be mere topic descriptors) and categories are linked to parent categories, etc. [sent-95, score-0.484]

58 In order to link categories to possible parent hypernyms in WordNet, we adapt the approach proposed for YAGO (Suchanek et al. [sent-96, score-0.2]

59 We then construct a Markov chain based on this graph of parents that also incorporates the possibility of random jumps from any parent back to the current entity under consideration. [sent-100, score-0.196]
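
A Markov chain with jumps back to the starting entity is commonly known as a random walk with restart, and the sketch below shows that general technique under stated assumptions. The restart probability of 0.15, the edge weights, and the node names are hypothetical, and dead ends are handled by returning to the start node, which is a choice made here rather than something the text specifies.

```python
def random_walk_with_restart(parents, start, restart=0.15, iterations=50):
    """Approximate the stationary distribution of a walk over parent edges.

    parents maps a node to {parent: weight}. At every step the walker jumps
    back to `start` with probability `restart`; nodes without parents send
    all of their probability mass back to `start`.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps} | {start}
    rank = {n: 0.0 for n in nodes}
    rank[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in rank.items():
            targets = parents.get(node, {})
            total = sum(targets.values())
            if total == 0:
                nxt[start] += mass  # dead end: restart
                continue
            nxt[start] += restart * mass
            for parent, weight in targets.items():
                nxt[parent] += (1 - restart) * mass * weight / total
        rank = nxt
    return rank

# Toy parent graph with made-up weights.
parents = {
    "Haight-Ashbury": {"cat:Neighborhoods": 0.8, "cat:Hippie movement": 0.2},
    "cat:Neighborhoods": {"wn:neighborhood": 1.0},
}
scores = random_walk_with_restart(parents, "Haight-Ashbury")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Candidate parents that are reachable along strongly weighted paths accumulate more probability mass, which gives a ranking over possible classes for the entity.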

60 Figure 2: Noisy initial edges (left) and cleaned, integrated output (right), shown in a simplified form Figure 3: UWN with named entities Results Overall, we obtain a knowledge base with 5. [sent-102, score-0.415]

61 Over 2 million named entities come only from non-English Wikipedia editions, but their taxonomic links to WordNet still have an accuracy around 90%. [sent-105, score-0.327]

62 An example excerpt is shown in Figure 3, with named entities connected to higher-level classes in UWN, all with multilingual labels. [sent-106, score-0.498]

63 4 Other Extensions Word Relationships Another plugin provides word relationships and properties mined from Wiktionary. [sent-107, score-0.2]

64 Frame-Semantic Knowledge Frame semantics is a cognitively motivated theory that describes words in terms of the cognitive frames or scenarios that they evoke and the corresponding participants involved in them. [sent-117, score-0.099]

65 For instance, the Commerce_goods-transfer frame normally involves a seller and a buyer, among other things, and different words like ‘buy’ and ‘sell’ can be chosen to describe the same event. [sent-119, score-0.148]

66 Such detailed knowledge about scenarios is largely complementary in nature to the sense relationships that WordNet provides. [sent-120, score-0.221]

67 There have been individual systems that made use of both forms of knowledge (Shi and Mihalcea, 2005; Coppola and others, 2009), but due to their very different nature, there is currently no simple way to accomplish this feat. [sent-122, score-0.068]

68 Our system addresses this by seamlessly integrating frame semantic knowledge into the system. [sent-123, score-0.212]

69 , 1998), the most well-known computational instantiation of frame semantics. [sent-125, score-0.104]

70 These are all integrated into WordNet’s hypernym hierarchy, i. [sent-129, score-0.058]

71 from language families like the Sinitic languages one may move down to macrolanguages like Chinese, and then to more specific forms like Mandarin Chinese, dialect groups like Ji-Lu Mandarin, or even dialects of particular cities. [sent-131, score-0.219]

72 The information is obtained from ISO standards, the Unicode CLDR as well as Wikipedia and then integrated with WordNet using the information integration strategies described above (de Melo and Weikum, 2008). [sent-132, score-0.108]

73 For instance, the Chinese character ‘娴’ is connected to its radical component ‘女’ and to its pronunciation component ‘闲’. [sent-134, score-0.221]

74 5 Integrated Query Interface and Wiki We have developed an online interface that provides access to our data to interested researchers (yagoknowledge. [sent-135, score-0.243]

75 Interactive online interfaces offer new ways of interacting with lexical knowledge that are not possible with traditional print dictionaries. [sent-137, score-0.158]

76 A non-native speaker of English looking up the word ‘tercel’ might find it helpful to see pictures available for the related terms ‘hawk’ or ‘falcon’, since a Google Image search for ‘tercel’ merely delivers images of Toyota Tercel cars. [sent-139, score-0.088]

77 While there have been other multilingual interfaces to WordNet-style lexical knowledge in the past (Pianta et al. [sent-140, score-0.31]

78 The most similar resource is BabelNet (Navigli and Ponzetto, 2010), which contains multilingual synsets but does not connect named entities from Wikipedia to them in a multilingual taxonomy. [sent-142, score-0.599]

79 Figure 4: Part of Online Interface 6 Integrated API Our goal is to make the knowledge that we have derived available for use in applications. [sent-143, score-0.068]

80 While there are many existing APIs for WordNet and other lexical resources (e. [sent-145, score-0.046]

81 , 2011; Gurevych and others, 2012)), these don’t provide a comparable degree of integrated multilingual and taxonomic information. [sent-148, score-0.322]

82 Interface The API can be used by initializing an accessor object and possibly specifying the list of plugins to be loaded. [sent-149, score-0.053]

83 Depending on the particular application, one may choose only Princeton WordNet and the UWN Core, or one may want to include named entities from Wikipedia and frame-semantic knowledge derived from FrameNet, for instance. [sent-150, score-0.235]

84 The accessor provides a simple graph-based lookup API as well as some convenience methods for common types of queries. [sent-151, score-0.097]

85 It also provides a simple word sense disambiguation method that, given a tokenized text with part-of-speech and lemma annotations, selects likely word senses by choosing the senses (with matching part-of-speech) that are most similar to words in the context. [sent-153, score-0.413]
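
The disambiguation strategy described above can be sketched in a few lines. This is not the UWN API: the function `disambiguate`, the sense identifiers, and the similarity table are hypothetical, and summing similarities over all other lemmas in the text is one plausible reading of "most similar to words in the context".

```python
def disambiguate(tokens, candidate_senses, similarity):
    """Pick, for each (lemma, pos) token, the sense most similar to its context.

    candidate_senses maps a lemma to a list of (sense, pos) pairs;
    similarity(sense, lemma) returns a non-negative score.
    """
    chosen = []
    for i, (lemma, pos) in enumerate(tokens):
        context = [other for j, (other, _) in enumerate(tokens) if j != i]
        # Keep only senses whose part-of-speech matches the annotation.
        candidates = [s for s, p in candidate_senses.get(lemma, []) if p == pos]
        if not candidates:
            chosen.append(None)
            continue
        scored = [(sum(similarity(s, c) for c in context), s) for s in candidates]
        chosen.append(max(scored, key=lambda pair: pair[0])[1])
    return chosen

# Toy data with made-up identifiers and scores.
candidate_senses = {
    "bat": [("bat#animal", "n"), ("bat#club", "n"), ("bat#hit", "v")],
}
SIMILARITY = {("bat#animal", "cave"): 0.8, ("bat#club", "cave"): 0.1}

def toy_similarity(sense, lemma):
    return SIMILARITY.get((sense, lemma), 0.0)

tokens = [("bat", "n"), ("cave", "n")]
senses = disambiguate(tokens, candidate_senses, toy_similarity)
print(senses)
```

Because the similarity function is passed in as a parameter, the same loop would work with a cross-lingual relatedness measure.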

86 Note that these modules go beyond existing APIs because they operate on words in many different languages and semantic similarity can even be assessed across languages. [sent-154, score-0.126]

87 Data Structures Under the hood, each plugin relies on a disk-based associative array to store the knowledge base as a labelled multi-graph. [sent-155, score-0.278]

88 The outgoing labelled edges of an entity are saved on disk in a serialized form, including relation names and relation weights. [sent-156, score-0.139]

89 An index structure allows determining the position of such records on disk. [sent-157, score-0.048]

90 Internally, this index structure is implemented as a linearly-probed hash table that is also stored externally. [sent-158, score-0.048]

91 Note that such a structure is very efficient in this scenario, because the index is used as a read-only data store by the API. [sent-159, score-0.048]

92 Once an index has been created, write operations are no longer performed, so B+ trees and similar disk-based balanced tree indices commonly used in relational database management systems are not needed. [sent-160, score-0.048]

93 The advantage is that this enables faster lookups, because retrieval operations normally require only two disk reads per plugin, one to access a block in the index table, and another to access a block of actual data. [sent-161, score-0.287]
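
The build-once, read-only index described above can be illustrated with an in-memory sketch of a linearly probed hash table. It keeps everything in Python lists and a byte string instead of on disk, uses `zlib.crc32` as the hash function, and fixes a load factor of 0.5; these are all assumptions made for illustration, as are the class name `ReadOnlyIndex` and the record format.

```python
import zlib

class ReadOnlyIndex:
    """Linearly probed hash table mapping keys to (offset, length) in a data blob."""

    def __init__(self, records, load_factor=0.5):
        # The table is sized once at build time and never resized afterwards.
        self.size = int(len(records) / load_factor) + 1
        self.slots = [None] * self.size
        chunks, offset = [], 0
        for key, payload in records.items():
            data = payload.encode("utf-8")
            i = self._home(key)
            while self.slots[i] is not None:  # linear probing on collision
                i = (i + 1) % self.size
            self.slots[i] = (key, offset, len(data))
            chunks.append(data)
            offset += len(data)
        self.data = b"".join(chunks)

    def _home(self, key):
        return zlib.crc32(key.encode("utf-8")) % self.size

    def lookup(self, key):
        i = self._home(key)
        for _ in range(self.size):
            slot = self.slots[i]  # stands in for the read of the index table
            if slot is None:
                return None
            if slot[0] == key:
                _, offset, length = slot
                # Stands in for the read of the actual data block.
                return self.data[offset:offset + length].decode("utf-8")
            i = (i + 1) % self.size
        return None

# Hypothetical serialized edge lists, keyed by entity.
index = ReadOnlyIndex({
    "Fledermaus": "means:bat#animal:0.9",
    "yarasa": "means:bat#animal:0.8",
})
print(index.lookup("Fledermaus"))
print(index.lookup("unknown"))
```

Since no insertions or deletions happen after construction, the table never needs tombstones or rebalancing, which is the property the text relies on.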

94 7 Conclusion UWN is an important new multilingual lexical resource that is now freely available to the community. [sent-162, score-0.282]

95 It has been constructed using sophisticated knowledge extraction, link prediction, information integration, and taxonomy induction methods. [sent-163, score-0.24]

96 Apart from an online querying and browsing interface, we have also implemented an API that facilitates the use of the knowledge base in applications. [sent-164, score-0.19]

97 Resolving pattern ambiguity for English to Hindi machine translation using WordNet. [sent-180, score-0.05]

98 Towards a universal wordnet by learning from combined evidence. [sent-196, score-0.31]

99 Lexical translation with application to image search on the Web. [sent-211, score-0.05]

100 WikiNetTk – A tool kit for embedding world knowledge in NLP applications. [sent-242, score-0.068]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uwn', 0.336), ('wordnet', 0.31), ('melo', 0.266), ('multilingual', 0.196), ('api', 0.181), ('senses', 0.146), ('framenet', 0.143), ('wikipedia', 0.141), ('connected', 0.135), ('gerhard', 0.128), ('weikum', 0.128), ('bat', 0.106), ('interface', 0.104), ('frame', 0.104), ('gerard', 0.102), ('link', 0.097), ('articles', 0.094), ('links', 0.092), ('fledermaus', 0.092), ('tercel', 0.092), ('parents', 0.091), ('entities', 0.085), ('named', 0.082), ('plugin', 0.08), ('fog', 0.08), ('base', 0.078), ('sense', 0.077), ('relationships', 0.076), ('taxonomy', 0.075), ('editions', 0.073), ('gloss', 0.073), ('linked', 0.069), ('knowledge', 0.068), ('unicode', 0.068), ('attach', 0.068), ('taxonomic', 0.068), ('de', 0.065), ('parent', 0.063), ('cldr', 0.061), ('emphasizes', 0.061), ('godbole', 0.061), ('kedad', 0.061), ('madhavan', 0.061), ('marsza', 0.061), ('pianta', 0.061), ('rubin', 0.061), ('simx', 0.061), ('unhappy', 0.061), ('gi', 0.059), ('integrated', 0.058), ('apis', 0.056), ('concepts', 0.055), ('core', 0.054), ('evoke', 0.053), ('judea', 0.053), ('atserias', 0.053), ('babelnet', 0.053), ('chatterjee', 0.053), ('menta', 0.053), ('accessor', 0.053), ('thesauri', 0.053), ('labelled', 0.052), ('fellbaum', 0.052), ('additionally', 0.052), ('access', 0.051), ('prediction', 0.051), ('integration', 0.05), ('translation', 0.05), ('gong', 0.049), ('index', 0.048), ('block', 0.047), ('lexical', 0.046), ('participants', 0.046), ('animal', 0.045), ('images', 0.045), ('navigli', 0.045), ('suchanek', 0.045), ('coppola', 0.045), ('shi', 0.045), ('german', 0.045), ('edges', 0.044), ('online', 0.044), ('like', 0.044), ('provides', 0.044), ('samples', 0.044), ('languages', 0.043), ('component', 0.043), ('yago', 0.043), ('pictures', 0.043), ('assessed', 0.043), ('mandarin', 0.043), ('happy', 0.043), ('disk', 0.043), ('graph', 0.042), ('categories', 0.04), ('resource', 0.04), ('semantic', 0.04), ('ek', 0.039), ('gurevych', 0.039), ('baker', 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

2 0.2836915 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

Author: Roberto Navigli ; Simone Paolo Ponzetto

Abstract: In this paper we present an API for programmatic access to BabelNet, a wide-coverage multilingual lexical knowledge base, and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.

3 0.12307023 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

Author: Zhi Zhong ; Hwee Tou Ng

Abstract: Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations. Our experimental results on standard TREC collections show that using the word senses tagged by a supervised WSD system, we obtain significant improvements over a state-of-the-art IR system.

4 0.11736114 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

Author: Nathan Schneider ; Behrang Mohit ; Kemal Oflazer ; Noah A. Smith

Abstract: “Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains. The resulting corpus has high coverage and was completed quickly with reasonable inter-annotator agreement.

5 0.11588311 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.11139461 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

7 0.10814214 134 acl-2012-Learning to Find Translations and Transliterations on the Web

8 0.10723397 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

9 0.096080281 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

10 0.084872633 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

11 0.083530948 7 acl-2012-A Computational Approach to the Automation of Creative Naming

12 0.081105746 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

13 0.08107385 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

14 0.077201575 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

15 0.074987978 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis

16 0.072155148 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

17 0.066429824 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

18 0.063828193 64 acl-2012-Crosslingual Induction of Semantic Roles

19 0.063276239 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling

20 0.063078336 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.212), (1, 0.082), (2, 0.009), (3, 0.081), (4, 0.103), (5, 0.149), (6, -0.026), (7, 0.06), (8, -0.006), (9, -0.043), (10, 0.179), (11, 0.027), (12, 0.142), (13, 0.086), (14, -0.053), (15, -0.229), (16, 0.001), (17, 0.075), (18, -0.016), (19, 0.031), (20, -0.052), (21, -0.101), (22, -0.048), (23, -0.023), (24, 0.009), (25, 0.165), (26, -0.099), (27, 0.004), (28, -0.026), (29, -0.06), (30, 0.087), (31, 0.048), (32, 0.013), (33, 0.096), (34, -0.086), (35, -0.016), (36, -0.067), (37, -0.002), (38, 0.095), (39, -0.175), (40, -0.115), (41, 0.097), (42, 0.1), (43, 0.121), (44, 0.01), (45, -0.049), (46, 0.098), (47, -0.112), (48, -0.089), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96435338 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

2 0.90479904 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

Author: Roberto Navigli ; Simone Paolo Ponzetto

3 0.56205702 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

Author: Nathan Schneider ; Behrang Mohit ; Kemal Oflazer ; Noah A. Smith

4 0.52667075 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

Author: Zhi Zhong ; Hwee Tou Ng

5 0.46399051 7 acl-2012-A Computational Approach to the Automation of Creative Naming

Author: Gozde Ozbal ; Carlo Strapparava

Abstract: In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. The empirical results show that our approach is generally effective and it constitutes a solid starting point for the automation of the naming process.

6 0.46290317 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

7 0.45826283 195 acl-2012-The Creation of a Corpus of English Metalanguage

8 0.43052211 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

9 0.41103688 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

10 0.38490289 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

11 0.34065598 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

12 0.33632547 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

13 0.33465788 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis

14 0.32909784 186 acl-2012-Structuring E-Commerce Inventory

15 0.32738003 112 acl-2012-Humor as Circuits in Semantic Networks

16 0.31943336 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

17 0.31180128 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

18 0.31130761 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

19 0.30263263 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

20 0.29418704 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.011), (25, 0.047), (26, 0.059), (28, 0.047), (30, 0.035), (37, 0.025), (39, 0.08), (57, 0.016), (64, 0.012), (74, 0.024), (82, 0.039), (84, 0.024), (85, 0.079), (86, 0.011), (87, 0.218), (90, 0.083), (92, 0.049), (94, 0.017), (99, 0.058)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80198145 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

2 0.78673577 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger

Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.

3 0.61107105 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

Author: Roberto Navigli ; Simone Paolo Ponzetto

4 0.56118441 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

Author: Karolina Owczarzak ; Peter A. Rankel ; Hoa Trang Dang ; John M. Conroy

Abstract: We investigate the consistency of human assessors involved in summarization evaluation to understand its effect on system ranking and automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure to what extent these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of automatic metrics (ROUGE and CLASSY) with respect to the inconsistent assessments.

5 0.55771929 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: We address a core aspect of the multilingual content synchronization task: the identification of novel, more informative or semantically equivalent pieces of information in two documents about the same topic. This can be seen as an application-oriented variant of textual entailment recognition where: i) T and H are in different languages, and ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.

6 0.55742192 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain

7 0.55345261 187 acl-2012-Subgroup Detection in Ideological Discussions

8 0.55128157 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

9 0.55061913 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

10 0.55029273 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

11 0.54988366 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

12 0.54803294 191 acl-2012-Temporally Anchored Relation Extraction

13 0.54600203 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

14 0.54552221 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

15 0.54530972 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

16 0.54433364 139 acl-2012-MIX Is Not a Tree-Adjoining Language

17 0.54420239 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora

18 0.54182333 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition

19 0.54131413 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

20 0.53980166 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis