acl acl2010 acl2010-166 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Roberto Navigli ; Paola Velardi
Abstract: Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches, mostly focused on lexico-syntactic patterns, suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Definition extraction is the task of automatically identifying definitional sentences within texts. [sent-3, score-0.506]
2 However, current approaches, mostly focused on lexico-syntactic patterns, suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. [sent-5, score-0.688]
3 In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. [sent-6, score-0.3]
4 Lattices are learned from a dataset of definitions from Wikipedia. [sent-7, score-0.211]
5 Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature. [sent-8, score-0.543]
6 Automatic definition extraction is useful not only in the construction of glossaries, but also in many other NLP tasks. [sent-15, score-0.19]
7 However, these methods suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures, and because the most frequent definitional pattern "X is a Y" is inherently very noisy. [sent-23, score-0.883]
8 A lattice is a directed acyclic graph (DAG), a subclass of non-deterministic finite state automata (NFA). [sent-25, score-0.193]
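To make the later algorithmic steps concrete, a lattice can be represented as below. This is a minimal sketch; the field names are our illustration, not the authors' data structure:

    from dataclasses import dataclass

    @dataclass
    class Lattice:
        """A word-class lattice: a DAG read as an NFA over word classes."""
        start: int   # initial node
        final: set   # accepting nodes
        edges: dict  # node -> list of (word_class, target_node, is_hypernym)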
9 In computational linguistics, lattices have been used to model in a compact way many sequences of symbols, each representing an alternative hypothesis. [sent-27, score-0.3]
10 In speech processing, phoneme or word lattices (Campbell et al. [sent-30, score-0.3]
11 In more complex text processing tasks, such as information retrieval, information extraction and summarization, the use of word lattices has been postulated but is considered unrealistic because of the dimension of the hypothesis space. [sent-39, score-0.375]
12 To reduce this problem, concept lattices have been proposed (Carpineto and Romano, 2005; Klein, 2008; Zhong et al. [sent-40, score-0.3]
13 In definition extraction, the variability of patterns is higher than for "traditional" applications of lattices, such as translation and speech, though not as high as in unconstrained sentences. [sent-44, score-0.33]
14 The methodology that we propose to align patterns is based on the use of star (wildcard *) characters to facilitate sentence clustering. [sent-45, score-0.405]
15 Each cluster of sentences is then generalized to a lattice of word classes (each class being either a frequent word or a part of speech). [sent-46, score-0.306]
16 WCLs are shown to generalize over lexico-syntactic patterns, and outperform well-known approaches to definition and hypernym extraction. [sent-49, score-0.456]
17 A great deal of work is concerned with definition extraction in several languages (Klavans and Muresan, 2001 ; Storrer and Wellinghoff, 2006; Gaudio and Branco, 2007; Iftene et al. [sent-53, score-0.19]
18 The majority of these approaches use symbolic methods that depend on lexico-syntactic patterns or features, which are manually crafted or semi-automatically learned (Zhang and Jiang, 2009; Hovy et al. [sent-57, score-0.175]
19 As we already remarked, most methods suffer from both low recall and precision, because definitional sentences occur in highly variable and potentially noisy syntactic structures. [sent-65, score-0.471]
20 Only a few papers try to cope with the generality of patterns and domains in real-world corpora (like the Web). [sent-70, score-0.203]
21 , 2008), to improve precision while keeping pattern generality, candidates are pruned using more refined stylistic patterns and lexical filters. [sent-72, score-0.292]
22 Cui et al. (2007) propose the use of probabilistic lexico-semantic patterns, called soft patterns, for definitional question answering in the TREC contest. [sent-74, score-0.396]
23 Soft patterns generalize over lexico-syntactic "hard" patterns in that they allow partial matching by calculating a generative degree-of-match probability between the test instance and the set of training instances. [sent-76, score-0.437]
24 The literature on hypernym extraction offers a higher variability of methods, from simple lexical patterns (Hearst, 1992; Oakes, 2005) to statistical and machine learning techniques (Agirre et al. [sent-83, score-0.586]
25 , 1990)); then they parse the sentences, and automatically extract patterns from the parse trees. [sent-90, score-0.175]
26 Finally, they train a hypernym classifier based on these features. [sent-91, score-0.296]
27 Lexico-syntactic patterns are generated for each sentence relating a term to its hypernym, and a dependency parser is used to represent them. [sent-92, score-0.232]
28 The approach of Cui et al. (2007) is based on soft patterns but also on a bag-of-words relevance heuristic. [sent-104, score-0.216]
29 , “a first-class function”); The REST field (RF): it includes additional cTlhaeuse Rs that further specify the differentia of the definiendum with respect to its genus (e. [sent-108, score-0.23]
30 Further examples of definitional sentences annotated with the above fields are shown in Table 1. [sent-111, score-0.46]
31 For each sentence, the definiendum (that is, the word being defined) and its hypernym are marked in bold and italic, respectively. [sent-112, score-0.473]
32 Given the lexico-syntactic nature of the definition extraction models we experiment with, training and test sentences are part-of-speech tagged with the TreeTagger system, a part-of-speech tagger available for many languages (Schmid, 1995). [sent-113, score-0.308]
33 Given a sentence s = w1, . . . , w|s|, where wi is the i-th word of s, we generalize its words wi to word classes ωi as follows: ωi = wi if wi ∈ F, and ωi = POS(wi) otherwise; that is, a word wi is left unchanged if it occurs frequently in the training corpus (i.e., wi ∈ F). [sent-127, score-0.186]
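A minimal sketch of this generalization step, assuming a precomputed frequent-word set F and part-of-speech-tagged input (the function and variable names are illustrative, not from the authors' implementation):

    def generalize(tagged_sentence, frequent_words):
        """Map each (word, POS) pair to its word class: the word itself
        if it occurs frequently in the training corpus, its POS otherwise."""
        return [word if word.lower() in frequent_words else pos
                for word, pos in tagged_sentence]

With frequent_words containing "in", "a", "is" and ",", the tagged example "In arts, a chiaroscuro is a monochrome picture" generalizes to In NN , a NN is a JJ NN (TreeTagger-style tags assumed).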
34 Table 1: Example definitions (defined terms are marked in bold face, their hypernyms in italic). [sent-143, score-0.314]
35 1); • Sentence clustering: the training sentences are then clustered based on the star patterns to which they belong (Section 3. [sent-147, score-0.251]
36 We present two variants of our WCL model, dealing either globally with the entire sentence or separately with its definition fields (Section 3. [sent-152, score-0.201]
37 For instance, given the sentence "In arts, a chiaroscuro is a monochrome picture", the corresponding star pattern is "In *, a ⟨TARGET⟩ is a *", where ⟨TARGET⟩ is the defined term. [sent-171, score-0.34]
38 Note that, here and in what follows, we discard the sentence fragments tagged with the REST field, which is used only to delimit the core part of definitional sentences. [sent-172, score-0.444]
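Star-pattern construction can then be sketched as follows; collapsing runs of consecutive wildcards into a single * is our assumption, inferred from the example pattern "In *, a ⟨TARGET⟩ is a *":

    def star_pattern(tagged_sentence, frequent_words, defined_term):
        """Replace the defined term with <TARGET>, every other non-frequent
        word with the wildcard *, and collapse runs of consecutive *."""
        tokens = []
        for word, _pos in tagged_sentence:
            if word.lower() == defined_term.lower():
                tokens.append("<TARGET>")
            elif word.lower() in frequent_words:
                tokens.append(word)
            else:
                tokens.append("*")
        collapsed = []
        for token in tokens:
            if token == "*" and collapsed and collapsed[-1] == "*":
                continue  # e.g. "monochrome picture" -> a single "*"
            collapsed.append(token)
        return " ".join(collapsed)

On the tagged example above with defined_term="chiaroscuro", this yields "In * , a <TARGET> is a *".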
39 2 Sentence Clustering In the second step, we cluster the sentences in our training set T based on their star patterns. [sent-175, score-0.278]
40 We note that each cluster Ci contains sentences whose degree of variability is generally much lower than for any pair of sentences in T belonging to two different clusters. [sent-188, score-0.221]
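Grouping by star pattern gives the clusters directly (a sketch; the sentence objects with .tagged and .definiendum attributes are assumed for illustration):

    from collections import defaultdict

    def cluster_by_star_pattern(training_sentences, frequent_words):
        """Group training sentences into clusters keyed by their star pattern."""
        clusters = defaultdict(list)
        for s in training_sentences:
            key = star_pattern(s.tagged, frequent_words, s.definiendum)
            clusters[key].append(s)
        return clusters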
41 Given the sentences s1, . . . , s|Ci| in a cluster Ci, we align each sentence sj with each sentence sk ∈ Ci such that k < j based on the following dynamic programming formulation (Cormen et al. [sent-218, score-0.236]
42 The matching score Sa,b is calculated on the generalized sentences s′k of sk and s′j of sj as follows: Sa,b = 1 if ωak = ωbj, and 0 otherwise, where ωak and ωbj are the a-th and b-th word classes of s′k and s′j, respectively. [sent-233, score-0.333]
43 Finally, the alignment score between sk and sj is given by M|sk|,|sj|. Figure 1: The Word-Class Lattice for the sentences in Table 1. [sent-235, score-0.387]
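The recurrence itself did not survive extraction; the standard alignment recurrence from Cormen et al. that is consistent with this matching score is Ma,b = max(Ma−1,b, Ma,b−1, Ma−1,b−1 + Sa,b), with M0,b = Ma,0 = 0. A sketch in code:

    def alignment_score(gen_k, gen_j):
        """Align two generalized sentences (lists of word classes);
        M[a][b] is the best alignment of the first a classes of gen_k
        with the first b classes of gen_j (an LCS-style formulation)."""
        n, m = len(gen_k), len(gen_j)
        M = [[0] * (m + 1) for _ in range(n + 1)]
        for a in range(1, n + 1):
            for b in range(1, m + 1):
                s_ab = 1 if gen_k[a - 1] == gen_j[b - 1] else 0  # Sa,b
                M[a][b] = max(M[a - 1][b], M[a][b - 1], M[a - 1][b - 1] + s_ab)
        return M[n][m]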
44 Furthermore, in the final lattice, nodes associated with the hypernym words in the learning sentences are marked as hypernyms in order to be able to determine the hypernym of a test sentence at classification time. [sent-250, score-0.887]
45 4 Variants of the WCL Model So far, we have assumed that our WCL model learns lattices from the training sentences in their entirety (we call this model WCL-1). [sent-253, score-0.376]
46 Rather than applying the WCL algorithm to the entire sentence, the very same method is applied to the sentence fragments tagged with one of the three definition fields. [sent-257, score-0.204]
47 The reason for introducing the WCL-3 model is that, while definitional patterns are highly variable, DF, VF and GF individually exhibit a lower variability; thus, WCL-3 should improve the generalization power. [sent-258, score-0.53]
48 Given a test sentence s, the classification phase for the WCL-1 model consists of determining whether there exists a lattice that matches s. [sent-262, score-0.209]
49 In fact, choosing the most appropriate combination of lattices impacts the performance of hypernym extraction. [sent-265, score-0.596]
50 Finally, when a sentence is classified as a definition, its hypernym is extracted by selecting the words in the input sentence that are marked as “hypernyms” in the WCL-1 lattice (or in the WCL-3 GF lattice). [sent-267, score-0.591]
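Using the Lattice sketch from above, classification plus hypernym extraction amounts to a non-deterministic traversal of the DAG (again a sketch under the same illustrative representation, not the authors' implementation):

    def extract_hypernym(lattices, gen_sentence, words):
        """Return the hypernym words selected by the first lattice that
        matches the generalized sentence, or None if none matches."""
        for lattice in lattices:
            # Each state: (current node, input positions marked as hypernym).
            frontier = {(lattice.start, ())}
            for pos, cls in enumerate(gen_sentence):
                nxt = set()
                for node, hyps in frontier:
                    for word_class, target, is_hyp in lattice.edges.get(node, ()):
                        if word_class == cls:
                            nxt.add((target, hyps + (pos,) if is_hyp else hyps))
                frontier = nxt
                if not frontier:
                    break  # this lattice cannot match the sentence
            for node, hyps in frontier:
                if node in lattice.final:
                    return [words[p] for p in hyps]
        return None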
51 Following Section 3.2, their star pattern is "In *, a ⟨TARGET⟩ is a *". [sent-271, score-0.23]
52 Note that we draw the hypernym token NN2 with a rectangle shape. [sent-276, score-0.329]
53 The defined terms belong to different Wikipedia domain categories, so as to capture a representative and cross-domain sample of lexical and syntactic patterns for definitions. [sent-292, score-0.175]
54 The subset includes over 300,000 sentences in which any of 239 terms occurs; the terms were selected from the terminology of four different domains (COMPUTER SCIENCE, ...). (Footnote 3: The first sentence of Wikipedia entries is, in the large majority of cases, a definition of the page title.) [sent-298, score-0.248]
55 The reason for using the ukWaC corpus is that, unlike the “clean” Wikipedia dataset, in which relatively simple patterns can achieve good results, ukWaC represents a real-world test, with many complex cases. [sent-302, score-0.175]
56 For example, there are sentences that should be classified as definitional according to Section 3.1, [sent-303, score-0.431]
57 but are rather uninformative, like "dynamic programming was the brainchild of an American mathematician", as well as informative sentences that are not definitional (e. [sent-304, score-0.431]
58 Even more frequently, the dataset includes sentences which are not definitions but have a definitional pattern ("A Pacific Northwest tribe's saga refers to a young woman who [...]"), [sent-307, score-0.699]
59 or sentences with very complex definitional patterns ("white body cells are the body's clean up squad" and "joule is also an expression of electric energy"). [sent-309, score-0.606]
60 Star patterns: a simple classifier based on the star patterns learned as a result of step 1 of our WCL learning algorithm (cf. [sent-319, score-0.175]
61 Section 3.1): a sentence is classified as a definition if it matches any of the star patterns in the model. [sent-322, score-0.52]
62 The classifier selects as definitions all the sentences whose probability is above a specific threshold. [sent-325, score-0.228]
63 For hypernym extraction, we compared WCL-1 and WCL-3 with Hearst's patterns, a system that extracts hypernyms from sentences based on the lexico-syntactic patterns specified in Hearst's seminal work (1992). [sent-352, score-0.68]
64 However, it should be noted that hypernym [sent-354, score-0.296]
65 extraction methods in the literature do not extract hypernyms from definitional sentences, like we do, but rather from specific patterns like "X such as Y". [sent-355, score-0.663]
66 Nonetheless, we decided to implement Hearst’s patterns for the sake of completeness. [sent-357, score-0.175]
67 Snow et al. (2004) reported the following performance figures on a corpus of dimension and complexity comparable with ukWaC: the recall-precision graph indicates precision 85% at recall 10% and precision 25% at recall of 30% for the hypernym classifier. [sent-361, score-0.537]
68 To assess the performance of our systems, we calculated the following measures: precision: the number of definitional sentences correctly retrieved by the system over the number of sentences marked by the system as definitional. [sent-377, score-0.52]
69 recall: the number of definitional sentences correctly retrieved by the system over the number of definitional sentences in the dataset. [sent-378, score-0.826]
70 F1-measure: the harmonic mean of precision (P) and recall (R), given by 2PR/(P + R); accuracy: the number of correctly classified sentences (either as definitional or non-definitional) over the total number of sentences in the dataset. [sent-379, score-0.192]
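Over sets of sentence ids, the four measures reduce to the following (names illustrative):

    def evaluate(predicted, gold, total):
        """predicted: ids the system marked as definitional; gold: ids that
        truly are definitional; total: number of sentences in the dataset."""
        tp = len(predicted & gold)
        p = tp / len(predicted) if predicted else 0.0   # precision
        r = tp / len(gold) if gold else 0.0             # recall
        f1 = 2 * p * r / (p + r) if p + r else 0.0      # F1-measure
        tn = total - len(predicted | gold)              # true negatives
        return p, r, f1, (tp + tn) / total              # last value: accuracy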
71 In Table 2 we report the results of definition extraction systems on the Wikipedia dataset. [sent-383, score-0.19]
72 The results show very high precision for WCL-1, WCL-3 (around 99%) and star patterns (86%). [sent-385, score-0.408]
73 As expected, bigrams and star patterns exhibit a higher recall (82% and 66%, respectively). [sent-386, score-0.388]
74 In terms of F1-measure, star patterns and WCL-3 achieve 75%, and are thus the best systems. [sent-388, score-0.348]
75 From our Wikipedia corpus, we learned over 1,000 lattices (and star patterns). [sent-391, score-0.473]
76 Table 4: Precision in hypernym extraction on the Wikipedia dataset. We next report results on the ukWaC dataset. [sent-396, score-0.43]
77 To calculate precision on this dataset, we manually validated the definitions output by each system. [sent-397, score-0.212]
78 Interestingly, star patterns obtain only 44% precision and around 63% recall. [sent-404, score-0.408]
79 For hypernym extraction, we tested WCL-1, WCL-3 and Hearst’s patterns. [sent-411, score-0.296]
80 The Substring column refers to the case in which the captured hypernym is a substring of what the annotator considered to be the correct hypernym. [sent-413, score-0.35]
81 Notice that this is a complex matter, because often the selection of a hypernym depends on semantic and contextual issues. [sent-414, score-0.296]
82 Table 5: Precision in hypernym extraction on the ukWaC dataset (number of hypernyms in parentheses). [sent-427, score-0.563]
83 For the above reasons it is difficult to achieve high performance in capturing the correct hypernym (e. [sent-429, score-0.296]
84 However, our performance of identifying a substring of the correct hypernym is much higher (around 78%). [sent-433, score-0.186]
85 In Table 4 we do not report the precision of Hearst’s patterns, as only one hypernym was found, due to the inherently low coverage of the method. [sent-435, score-0.356]
86 On the ukWaC dataset, the hypernyms returned by the three systems were manually validated and precision was calculated. [sent-436, score-0.193]
87 Both WCL-1 and WCL-3 obtained a very high precision (86-89% and 96% in identifying the exact hypernym and a substring of it, respectively). [sent-437, score-0.41]
88 Both WCL models are thus equally robust in identifying hypernyms, whereas WCL-1 suffers from a lack of generalization in definition extraction (cf. [sent-438, score-0.19]
89 Hearst’s patterns also obtain high precision, especially when substrings are taken into account. [sent-443, score-0.175]
90 However, the number of hypernyms returned by this method is much lower, due to the specificity of the patterns (62 vs. [sent-444, score-0.308]
91 6 Conclusions In this paper, we have presented a lattice-based approach to definition and hypernym extraction. [sent-446, score-0.411]
92 1. The use of a lattice structure to generalize over lexico-syntactic definitional patterns; 2. [sent-448, score-0.552]
93 The high performance as compared with the best-known methods for both definition and hypernym extraction. [sent-451, score-0.411]
94 Even though definitional patterns are learned from a manually annotated dataset, the dimension and heterogeneity of the training dataset ensure that training need not be repeated for specific domains, as demonstrated by the cross-domain evaluation on the ukWaC corpus. [sent-455, score-0.589]
95 In the near future, we aim to apply the output of our classifiers to the task of automated taxonomy building, and to test the WCL approach on other information extraction tasks, like hypernym extraction from generic sentence fragments, as in Snow et al. [sent-460, score-0.503]
96 Language recognition with word lattices and support vector machines. [sent-477, score-0.3]
97 Using a maximum entropy model to build segmentation lattices for MT. [sent-524, score-0.3]
98 Automatic extraction of definitions in portuguese: A rulebased approach. [sent-540, score-0.227]
99 Extending metadata definitions by automatically extracting and organizing glossary definitions. [sent-549, score-0.182]
100 An annotated dataset for extracting definitions and hypernyms from the Web. [sent-591, score-0.344]
wordName wordTfidf (topN-words)
[('definitional', 0.355), ('lattices', 0.3), ('hypernym', 0.296), ('ukwac', 0.259), ('wcl', 0.184), ('patterns', 0.175), ('star', 0.173), ('htargeti', 0.166), ('definitions', 0.152), ('lattice', 0.152), ('definiendum', 0.148), ('wcls', 0.148), ('hypernyms', 0.133), ('definition', 0.115), ('navigli', 0.108), ('vf', 0.1), ('gf', 0.1), ('cui', 0.097), ('definiens', 0.092), ('definitor', 0.092), ('sj', 0.092), ('wikipedia', 0.087), ('sk', 0.087), ('velardi', 0.081), ('hearst', 0.08), ('roberto', 0.078), ('generalized', 0.078), ('sentences', 0.076), ('extraction', 0.075), ('paola', 0.075), ('westerhout', 0.074), ('df', 0.071), ('ci', 0.065), ('arts', 0.065), ('snow', 0.063), ('precision', 0.06), ('dataset', 0.059), ('pattern', 0.057), ('sentence', 0.057), ('chiaroscuro', 0.055), ('glossaries', 0.055), ('klavans', 0.055), ('monochrome', 0.055), ('rski', 0.055), ('schroeder', 0.055), ('storrer', 0.055), ('substring', 0.054), ('borg', 0.048), ('elearning', 0.048), ('italic', 0.048), ('wi', 0.047), ('generalize', 0.045), ('przepi', 0.044), ('genus', 0.044), ('nn', 0.042), ('roma', 0.042), ('lexicosyntactic', 0.042), ('soft', 0.041), ('graph', 0.041), ('recall', 0.04), ('variability', 0.04), ('deg', 0.039), ('field', 0.038), ('picture', 0.038), ('algorithmfullsubstring', 0.037), ('cormen', 0.037), ('eline', 0.037), ('fahmi', 0.037), ('gaudio', 0.037), ('ieeeinternational', 0.037), ('iftene', 0.037), ('ldf', 0.037), ('lgf', 0.037), ('lvf', 0.037), ('monachesi', 0.037), ('nondefinitional', 0.037), ('pixel', 0.037), ('sanfilippo', 0.037), ('wellinghoff', 0.037), ('xabier', 0.037), ('bollywood', 0.036), ('np', 0.034), ('token', 0.033), ('fragments', 0.032), ('imaging', 0.032), ('smaranda', 0.032), ('pozna', 0.032), ('zhong', 0.032), ('proceedings', 0.03), ('datasets', 0.03), ('orkowski', 0.03), ('muresan', 0.03), ('glossary', 0.03), ('fields', 0.029), ('alignment', 0.029), ('trec', 0.029), ('cluster', 0.029), ('marked', 0.029), ('generality', 0.028), ('judith', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
Author: Roberto Navigli ; Paola Velardi
Abstract: Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches, mostly focused on lexico-syntactic patterns, suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.
2 0.13965391 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
3 0.11638241 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
Author: Peng Li ; Jing Jiang ; Yinglin Wang
Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.
4 0.11578937 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
Author: Simone Paolo Ponzetto ; Roberto Navigli
Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.
5 0.10708724 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
Author: Galina Tremper
Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.
6 0.093310535 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
7 0.090060346 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
8 0.086926162 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
9 0.080052599 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
10 0.074917912 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns
11 0.074824139 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
12 0.074050032 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
13 0.073703252 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
14 0.073172942 185 acl-2010-Open Information Extraction Using Wikipedia
15 0.067991257 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
16 0.066874452 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
17 0.066697158 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
18 0.065828331 69 acl-2010-Constituency to Dependency Translation with Forests
19 0.062404849 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds
20 0.061793808 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies
topicId topicWeight
[(0, -0.199), (1, 0.031), (2, -0.057), (3, -0.009), (4, 0.1), (5, -0.006), (6, 0.086), (7, -0.002), (8, -0.022), (9, -0.01), (10, -0.013), (11, -0.014), (12, -0.064), (13, -0.025), (14, 0.008), (15, 0.052), (16, -0.012), (17, 0.12), (18, -0.022), (19, 0.092), (20, -0.023), (21, 0.068), (22, -0.081), (23, -0.052), (24, -0.0), (25, -0.082), (26, -0.013), (27, 0.012), (28, -0.17), (29, -0.022), (30, -0.058), (31, 0.109), (32, 0.057), (33, 0.017), (34, 0.059), (35, -0.047), (36, -0.003), (37, 0.051), (38, -0.017), (39, 0.037), (40, -0.026), (41, -0.042), (42, 0.086), (43, -0.001), (44, -0.064), (45, 0.052), (46, -0.129), (47, -0.021), (48, 0.066), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.89360112 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
Author: Roberto Navigli ; Paola Velardi
Abstract: Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches, mostly focused on lexico-syntactic patterns, suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.
2 0.65748382 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies
Author: Karin Murthy ; Tanveer A Faruquie ; L Venkata Subramaniam ; Hima Prasad K ; Mukesh Mohania
Abstract: We propose a novel method to automatically acquire a term-frequency-based taxonomy from a corpus using an unsupervised method. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.
3 0.63767248 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text
Author: Jochen Leidner ; Frank Schilder
Abstract: In the business world, analyzing and dealing with risk permeates all decisions and actions. However, to date, risk identification, the first step in the risk management cycle, has always been a manual activity with little to no intelligent software tool support. In addition, although companies are required to list risks to their business in their annual SEC filings in the USA, these descriptions are often very high-level and vague. In this paper, we introduce Risk Mining, which is the task of identifying a set of risks pertaining to a business area or entity. We argue that by combining Web mining and Information Extraction (IE) techniques, risks can be detected automatically before they materialize, thus providing valuable business intelligence. We describe a system that induces a risk taxonomy with concrete risks (e.g., interest rate changes) at its leaves and more abstract risks (e.g., financial risks) closer to its root node. The taxonomy is induced via a bootstrapping algorithm starting with a few seeds. The risk taxonomy is used by the system as input to a risk monitor that matches risk mentions in financial documents to the abstract risk types, thus bridging a lexical gap. Our system is able to automatically generate company-specific "risk maps", which we demonstrate for a corpus of earnings report conference calls.
4 0.60573202 64 acl-2010-Complexity Assumptions in Ontology Verbalisation
Author: Richard Power
Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.
5 0.56617129 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
6 0.55454475 126 acl-2010-GernEdiT - The GermaNet Editing Tool
8 0.53578043 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds
9 0.53174943 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
10 0.50391012 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
11 0.49837175 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
12 0.48791707 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
13 0.4861984 61 acl-2010-Combining Data and Mathematical Models of Language Change
14 0.48462242 185 acl-2010-Open Information Extraction Using Wikipedia
15 0.48420763 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
16 0.47197658 139 acl-2010-Identifying Generic Noun Phrases
17 0.46471941 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
18 0.46200612 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics
19 0.46137235 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia
20 0.45939836 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
topicId topicWeight
[(14, 0.016), (25, 0.053), (33, 0.015), (39, 0.011), (42, 0.033), (59, 0.127), (72, 0.023), (73, 0.047), (76, 0.011), (78, 0.048), (80, 0.012), (83, 0.074), (84, 0.037), (87, 0.293), (98, 0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.74953848 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
Author: Roberto Navigli ; Paola Velardi
Abstract: Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches, mostly focused on lexico-syntactic patterns, suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose Word-Class Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern generalization methods proposed in the literature.
2 0.71102536 13 acl-2010-A Rational Model of Eye Movement Control in Reading
Author: Klinton Bicknell ; Roger Levy
Abstract: A number of results in the study of real-time sentence comprehension have been explained by computational models as resulting from the rational use of probabilistic linguistic information. Many times, these hypotheses have been tested in reading by linking predictions about relative word difficulty to word-aggregated eye tracking measures such as go-past time. In this paper, we extend these results by asking to what extent reading is well-modeled as rational behavior at a finer level of analysis, predicting not aggregate measures, but the duration and location of each fixation. We present a new rational model of eye movement control in reading, the central assumption of which is that eye movement decisions are made to obtain noisy visual information as the reader performs Bayesian inference on the identities of the words in the sentence. As a case study, we present two simulations demonstrating that the model gives a rational explanation for between-word regressions.
3 0.54652572 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: A challenging problem in open information extraction and text mining is the learning of the selectional restrictions of semantic relations. We propose a minimally supervised bootstrapping algorithm that uses a single seed and a recursive lexico-syntactic pattern to learn the arguments and the supertypes of a diverse set of semantic relations from the Web. We evaluate the performance of our algorithm on multiple semantic relations expressed using "verb", "noun", and "verb prep" lexico-syntactic patterns. Human-based evaluation shows that the accuracy of the harvested information is about 90%. We also compare our results with an existing knowledge base to outline the similarities and differences of the granularity and diversity of the harvested knowledge.
4 0.54647225 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
Author: Simone Paolo Ponzetto ; Roberto Navigli
Abstract: One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.
5 0.54524636 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
6 0.54388553 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
7 0.54321253 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
8 0.54152399 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
9 0.54145801 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
10 0.53974032 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
11 0.53951705 169 acl-2010-Learning to Translate with Source and Target Syntax
12 0.53818673 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
13 0.5380047 114 acl-2010-Faster Parsing by Supertagger Adaptation
14 0.53772527 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
15 0.53632879 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
16 0.53605449 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
17 0.53545702 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
18 0.5346427 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
19 0.53456837 162 acl-2010-Learning Common Grammar from Multilingual Corpus
20 0.5339461 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar