acl acl2011 acl2011-262 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin
Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. [sent-5, score-0.65]
2 We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. [sent-7, score-0.419]
3 This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories. [sent-8, score-0.278]
4 1 Introduction Many approaches to extracting semantic lexicons extend the unsupervised bootstrapping framework (Riloff and Shepherd, 1997). [sent-9, score-0.326]
5 These use a small set of seed examples from the target lexicon to identify contextual patterns which are then used to extract new lexicon items (Riloff and Jones, 1999). [sent-10, score-0.466]
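As a rough sketch, this seed-to-pattern-to-term loop can be written as follows. The index structures and the simple count-based scoring here are illustrative assumptions, not the exact scoring of any cited system:

```python
from collections import Counter

def bootstrap(seeds, pattern_index, term_index, iterations=10, top_n=5):
    """Minimal single-category bootstrapping loop.

    pattern_index: term -> set of contextual patterns it occurs in.
    term_index:    pattern -> set of terms it extracts.
    Both indexes are hypothetical pre-built corpus statistics.
    """
    lexicon = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Score candidate patterns by how many known lexicon terms they match.
        pattern_scores = Counter()
        for term in lexicon:
            for p in pattern_index.get(term, ()):
                if p not in patterns:
                    pattern_scores[p] += 1
        patterns.update(p for p, _ in pattern_scores.most_common(top_n))
        # Score candidate terms by how many selected patterns extract them.
        term_scores = Counter()
        for p in patterns:
            for t in term_index.get(p, ()):
                if t not in lexicon:
                    term_scores[t] += 1
        lexicon.update(t for t, _ in term_scores.most_common(top_n))
    return lexicon
```

Real systems replace the raw counts with reliability and relevance weights, but the alternation between pattern selection and term selection is the same.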
6 Bootstrappers are prone to semantic drift, caused by selection of poor candidate terms or patterns (Curran et al. [sent-11, score-0.365]
7 , 2002) and WMEB (McIntosh and Curran, 2008), reduce semantic drift by extracting multiple categories simultaneously in competition. [sent-14, score-0.46]
8 The inclusion of manually-crafted negative categories to multi-category bootstrappers achieves the best results, by clarifying the boundaries between categories (Yangarber et al. [sent-15, score-0.549]
9 For example, female names are often bootstrapped with … [sent-17, score-0.053]
10 Unfortunately, negative categories are difficult to design, introducing a substantial amount of human expertise into an otherwise unsupervised framework. [sent-27, score-0.337]
11 McIntosh (2010) made some progress towards automatically learning useful negative categories during bootstrapping. [sent-28, score-0.281]
12 In this work we identify an unsupervised source of semantic constraints inspired by the Coupled Pattern Learner (CPL, Carlson et al. [sent-29, score-0.201]
13 In CPL, relation bootstrapping is coupled with lexicon bootstrapping in order to control semantic drift in the target relation’s arguments. [sent-31, score-0.968]
14 Semantic constraints on categories and relations are manually crafted in CPL. [sent-32, score-0.43]
15 For example, a candidate of the relation ISCEOOF will only be extracted if its arguments can be extracted into the ceo and company lexicons and a ceo is constrained to not be a celebrity or politician. [sent-33, score-0.556]
16 CPL employs a large number of these manually-crafted constraints to improve precision at the expense of recall (only 18 ISCEOOF instances were extracted). [sent-35, score-0.102]
17 In our approach, we exploit open relation bootstrapping to minimise semantic drift, without any manual seeding of relations or pre-defined category lexicon combinations. [sent-36, score-0.971]
18 They extract relation tuples by exploiting broad syntactic patterns that are likely to indicate relations. [sent-40, score-0.462]
(Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers, pages 266–270, Portland, Oregon, June 19-24, 2011.)
20 This enables the extraction of interesting and unanticipated relations from text. [sent-43, score-0.268]
21 However these patterns are often too broad, resulting in the extraction of tuples that do not represent relations at all. [sent-44, score-0.617]
22 (2010) improve TEXTRUNNER precision by using deep parsing information via semantic role labelling. [sent-47, score-0.117]
23 2 Relation Guided Bootstrapping Rather than relying on manually-crafted category and relation constraints, Relation Guided Bootstrapping (RGB) automatically detects, seeds and bootstraps open relations between the target categories. [sent-48, score-0.685]
24 We demonstrate that this relation guidance effectively reduces semantic drift, with performance approaching manually-crafted constraints. [sent-53, score-0.399]
25 RGB alternates between two phases of WMEB, one for terms and the other for relations, with a one-off relation discovery phase in between. [sent-55, score-0.37]
26 Term Extraction The first stage of RGB follows the term extraction process of WMEB. [sent-56, score-0.133]
27 Each category is initialised by a set of hand-picked seed terms. [sent-57, score-0.145]
28 In each iteration, a category’s terms are used to identify candidate patterns that can match the terms in the text. [sent-58, score-0.399]
29 Semantic drift is reduced by forcing the categories to be mutually exclusive (i. [sent-59, score-0.448]
30 The remaining patterns are ranked according to reliability and relevance, and the top-n patterns are then added to the pattern set. [sent-62, score-0.381]
31 1 The reliability of a pattern for a given category is the number of extracted terms in the category’s lexicon that match the pattern. [sent-63, score-0.356]
32 A pattern’s relevance weight is defined as the sum of the χ2 values between the pattern (p) and each of the lexicon terms. (Footnote 1: In this work, n is set to 5.) [sent-64, score-0.226]
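The reliability and relevance scores just described might be sketched as follows. A standard 2×2 χ² formulation is assumed (the paper's exact variant is not specified here), and the count lookups are hypothetical:

```python
def chi_squared(freq_pt, freq_p, freq_t, total):
    # 2x2 chi-squared association between a pattern p and a term t, from
    # their co-occurrence count, marginal counts, and the corpus total.
    a = freq_pt                             # p and t together
    b = freq_p - freq_pt                    # p without t
    c = freq_t - freq_pt                    # t without p
    d = total - freq_p - freq_t + freq_pt   # neither
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return total * (a * d - b * c) ** 2 / den if den else 0.0

def reliability(pattern, lexicon, matches):
    # Number of extracted lexicon terms that the pattern matches
    # (matches: pattern -> set of terms).
    return len(lexicon & matches.get(pattern, set()))

def relevance(pattern, lexicon, cooc, freq_p, freq_t, total):
    # Sum of chi-squared values between the pattern and each lexicon term.
    return sum(chi_squared(cooc.get((pattern, t), 0),
                           freq_p[pattern], freq_t[t], total)
               for t in lexicon)
```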
33 These metrics are symmetrical for both candidate terms and patterns. [sent-66, score-0.075]
34 In WMEB’s term selection phase, a category’s pattern set is used to identify candidate terms. [sent-67, score-0.233]
35 Like the candidate patterns, terms matching multiple categories are excluded. [sent-68, score-0.281]
36 The remaining terms are ranked and the top-n terms are added to the lexicon. [sent-69, score-0.098]
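The mutual exclusion step described above, where any candidate proposed for more than one category is dropped, can be sketched as:

```python
from collections import Counter

def mutually_exclusive(candidates_by_category):
    """Drop any candidate proposed for more than one category, per the
    mutual exclusion constraint; candidates_by_category is a hypothetical
    mapping from category name to its set of candidate terms."""
    counts = Counter()
    for cands in candidates_by_category.values():
        counts.update(cands)
    return {cat: {t for t in cands if counts[t] == 1}
            for cat, cands in candidates_by_category.items()}
```

The same filter applies symmetrically to candidate patterns.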
37 , 2010), a relation is instantiated with manually-crafted seed tuples and patterns. [sent-71, score-0.511]
38 In RGB, the relations and their seeds are automatically identified in relation discovery. [sent-72, score-0.555]
39 Relation discovery is only performed once after the first 20 iterations of term extraction, which ensures the lexicons have adequate coverage to form potential relations. [sent-73, score-0.177]
40 Each ordered pair of categories (C1, C2) = R1,2 is checked for open (not pre-defined) relations between C1 and C2. [sent-74, score-0.4]
41 This check removes all pairs of terms, tuples (t1, t2) ∈ C1 × C2, with freq(t1, t2) < 5 and a co-occurrence score χ2(t1, t2) ≤ 0. [sent-75, score-0.245]
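A minimal sketch of this filtering step, with hypothetical corpus-statistics lookups for the co-occurrence frequency and χ² score:

```python
def candidate_relation_tuples(c1_lexicon, c2_lexicon, freq, chi2, min_freq=5):
    # Keep only tuples (t1, t2) in C1 x C2 that co-occur at least min_freq
    # times and have a positive chi-squared association, i.e. the complement
    # of the removal criteria quoted above (freq < 5 or chi2 <= 0).
    return [(t1, t2)
            for t1 in c1_lexicon
            for t2 in c2_lexicon
            if freq.get((t1, t2), 0) >= min_freq
            and chi2.get((t1, t2), 0.0) > 0.0]
```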
42 The tuples for R1,2 are then used to find its initial set of relation patterns. [sent-78, score-0.462]
43 Each pattern must match more than one tuple and must be mutually exclusive between the relations. [sent-79, score-0.263]
44 If fewer than n relation patterns are found for R1,2, it is discarded. [sent-80, score-0.405]
45 Table 1: Statistics of the three filtered MEDLINE datasets (5gm, 5gm + 4gm, 5gm + DC). We have identified the open relations that link categories together and their initial extraction patterns. [sent-82, score-0.512]
46 Using the initial relation patterns, the top-n mutually exclusive seed tuples are identified for the relation R1,2. [sent-83, score-0.899]
47 Note that R1,2 can represent multiple relations between C1 and C2, which may not apply to all of the seeds, e. [sent-85, score-0.171]
48 We discover two types of relations: inter-category relations where C1 ≠ C2, and intra-category relations where C1 = C2. [sent-88, score-0.342]
49 Relation Extraction The relation extraction phase involves running WMEB over tuples rather than terms. [sent-89, score-0.565]
50 R1,2 and R2,3, these are bootstrapped simultaneously, competing with each other for tuples and relation patterns. [sent-92, score-0.515]
51 Mutual exclusion constraints between the relations are also enforced. [sent-93, score-0.299]
52 In each iteration, a relation’s set of tuples is used to identify candidate relation patterns, as for term extraction. [sent-94, score-0.632]
53 The top-n non-overlapping patterns are extracted for each relation, and are used to identify the top-n candidate tuples. [sent-95, score-0.267]
54 The tuples are scored similarly to the relation patterns, and any tuple identified by multiple relations is excluded. [sent-96, score-0.739]
55 For tuple extraction, a relation R1,2 is constrained to only consider candidates where either t1 or t2 has previously been extracted into C1 or C2, respectively. [sent-97, score-0.311]
56 To extract a candidate tuple with an unknown term, the term must also be a valid candidate of its associated category. [sent-98, score-0.283]
57 That is, the term must match at least one pattern assigned to the category and not match patterns assigned to another category. [sent-99, score-0.448]
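A sketch of this type check, under the assumption that pattern assignments are stored as simple sets (the data structures are illustrative):

```python
def type_check(term, category, term_patterns, category_patterns):
    """A candidate term anchors to a category only if it matches at least
    one pattern assigned to that category and no pattern assigned to any
    other category.

    term_patterns:     term -> set of patterns the term matches.
    category_patterns: category -> set of patterns assigned to it.
    """
    matched = term_patterns.get(term, set())
    own = category_patterns.get(category, set())
    others = set().union(*(ps for c, ps in category_patterns.items()
                           if c != category))
    return bool(matched & own) and not (matched & others)
```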
58 This type-checking anchors relations to the categories they link together, limiting their drift into other relations. [sent-100, score-0.364]
59 It also provides guided term growth in the categories they link. [sent-101, score-0.37]
60 The growth is “guided” because the relations define, for example, semantically coherent subregions of the category search spaces. [sent-102, score-0.312]
61 This guidance reduces semantic drift. 3 Experimental Setup To compare the effectiveness of RGB we consider the task of extracting biomedical semantic lexicons, building on the work of McIntosh and Curran (2008). [sent-105, score-0.309]
62 Note however the method is equally applicable to any corpus and set of semantic categories. [sent-106, score-0.079]
63 , 2006), and parsed using the biomedical C&C CCG parser (Rimell and Clark, 2009; Clark and Curran, 2007). [sent-110, score-0.077]
64 The term extraction data is formed from the raw 5-grams (t1, t2, t3, t4, t5), where the set of candidate terms correspond to the middle tokens (t3) and the patterns are formed from the surrounding tokens (t1, t2, t4, t5). [sent-111, score-0.635]
65 The relation extraction data is also formed from the 5-grams. [sent-112, score-0.399]
66 The candidate tuples correspond to the tokens (t1, t5) and the patterns are formed from the intervening tokens (t2, t3, t4). [sent-113, score-0.587]
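The two views of the same 5-gram data can be sketched as a pair of generators (the function names are illustrative):

```python
def term_instances(five_grams):
    # Term-extraction view of a 5-gram: the candidate term is the middle
    # token, and the pattern is the four surrounding tokens.
    for t1, t2, t3, t4, t5 in five_grams:
        yield t3, (t1, t2, t4, t5)

def relation_instances(five_grams):
    # Relation-extraction view: the candidate tuple is the outer tokens,
    # and the pattern is the three intervening tokens.
    for t1, t2, t3, t4, t5 in five_grams:
        yield (t1, t5), (t2, t3, t4)
```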
67 The second relation dataset (5gm + 4gm), also includes length 2 patterns formed from 4-grams. [sent-114, score-0.487]
68 The final relation dataset (5gm + DC) includes dependency chains up to length 5 as the patterns between terms (Greenwood et al. [sent-115, score-0.512]
69 These chains are formed using the Stanford dependencies generated by the Rimell and Clark (2009) parser. [sent-117, score-0.14]
70 Table 3: Performance comparison of WMEB and RGB. We follow McIntosh and Curran (2009) in using the 10 biomedical semantic categories and their hand-picked seeds in Table 2, and manually crafted negative categories: amino acid, animal, body part and organism. [sent-144, score-0.575]
71 Our evaluation process involved manually judging each extracted term and we calculate the average precision of the top-1000 terms over the 10 target categories. [sent-145, score-0.149]
72 4 Results and Discussion Table 3 compares the performance of WMEB and RGB, with and without the negative categories. [sent-147, score-0.127]
73 For RGB, we compare intra-, inter- and mixed relation types, and use the 5gm format of tuples and relation patterns. [sent-148, score-0.708]
74 In WMEB, drift dominates in the later iterations, with a ∼19% precision drop between the first and last 500 terms. [sent-149, score-0.231]
75 The manually-crafted negative categories give a substantial boost in precision on both the first and last 500 terms (+11. [sent-150, score-0.396]
76 Over the top 1000 terms, RGB significantly outperforms the corresponding WMEB with and without negative categories (p < 0. [sent-152, score-0.281]
77 3 In particular, inter-RGB significantly improves upon WMEB with no negative categories (501-1000: +13. [sent-154, score-0.281]
78 Inter-RGB without negatives approaches the precision of WMEB with the negatives, trailing only by 2. [sent-158, score-0.079]
79 This demonstrates that RGB effectively reduces the reliance on manually-crafted negative categories for lexicon bootstrapping. [sent-160, score-0.43]
80 The use of intra-category relations was far less effective than inter-category relations, and the combination of intra- and inter- was less effective than just using inter-category relations. (Footnote 3: Significance was tested using intensive randomisation tests.) [sent-161, score-0.197]
81 Table 4: Comparison of different relation pattern types. [sent-180, score-0.309]
82 In intra-RGB the categories are more susceptible to single-category drift. [sent-181, score-0.195]
83 The additional constraints provided by anchoring two categories appear to make inter-RGB less susceptible to drift. [sent-182, score-0.259]
84 Many intra-category relations represent listings commonly identified by conjunctions. [sent-183, score-0.212]
85 However, these patterns are identified by multiple intra-category relations and are excluded. [sent-184, score-0.371]
86 Through manual inspection of inter-RGB’s tuples and patterns, we identified numerous meaningful relations, such as isExpressedIn(prot, cell). [sent-185, score-0.257]
87 Relations like this helped to reduce semantic drift within the CELL lexicon by up to 23%. [sent-186, score-0.386]
88 Table 4 compares the effect of different relation pattern representations on the performance of interRGB. [sent-187, score-0.309]
89 The 5gm+4gm data, which doubles the number of possible candidate relation patterns, performs similarly to the 5gm representation. [sent-188, score-0.324]
90 Adding dependency chains decreased and increased precision depending on whether negative categories were used. [sent-189, score-0.377]
91 In Wu and Weld (2010), the performance of an OPENIE system was significantly improved by using patterns formed from dependency parses. [sent-190, score-0.241]
92 However in our DC experiments, the earlier bootstrapping iterations were less precise than the simple 5gm+4gm and 5gm representations. [sent-191, score-0.147]
93 Since the chains can be as short as two dependencies, some of these patterns may not be specific enough. [sent-192, score-0.217]
94 These results demonstrate that useful open relations can be represented using only n-grams. [sent-193, score-0.246]
95 5 Conclusion In this paper, we have proposed Relation Guided Bootstrapping (RGB), an unsupervised approach to discovering and seeding open relations to constrain semantic lexicon bootstrapping. [sent-194, score-0.51]
96 Previous work used manually-crafted lexical and relation constraints to improve relation extraction (Carlson et al. [sent-195, score-0.627]
97 We turn this idea on its head, by using open relation extraction to provide constraints for lexicon bootstrapping, and automatically discover the open relations and their seeds from the expanding bootstrapped lexicons. [sent-197, score-0.966]
98 RGB effectively reduces semantic drift delivering performance comparable to state-of-the-art systems that rely on manually-crafted negative constraints. [sent-198, score-0.434]
99 Automatically acquiring a linguistically motivated genic interaction extraction system. [sent-231, score-0.071]
100 Weighted mutual exclusion bootstrapping for domain independent lexicon and template acquisition. [sent-240, score-0.358]
wordName wordTfidf (topN-words)
[('rgb', 0.412), ('wmeb', 0.324), ('relation', 0.246), ('mcintosh', 0.233), ('tuples', 0.216), ('drift', 0.193), ('relations', 0.171), ('patterns', 0.159), ('categories', 0.154), ('curran', 0.148), ('cpl', 0.147), ('isceoof', 0.147), ('bootstrapping', 0.147), ('negative', 0.127), ('tara', 0.119), ('lexicon', 0.114), ('guided', 0.109), ('seeds', 0.097), ('category', 0.096), ('bootstrappers', 0.088), ('medline', 0.088), ('openie', 0.088), ('formed', 0.082), ('semantic', 0.079), ('rimell', 0.078), ('candidate', 0.078), ('biomedical', 0.077), ('open', 0.075), ('lexicons', 0.072), ('extraction', 0.071), ('carlson', 0.069), ('tuple', 0.065), ('constraints', 0.064), ('exclusion', 0.064), ('pattern', 0.063), ('term', 0.062), ('australian', 0.061), ('grover', 0.059), ('yencken', 0.059), ('yangarber', 0.059), ('chains', 0.058), ('riloff', 0.055), ('exclusive', 0.055), ('bootstrapped', 0.053), ('prot', 0.052), ('christensen', 0.052), ('ceo', 0.051), ('seed', 0.049), ('terms', 0.049), ('nicta', 0.048), ('cell', 0.047), ('mutually', 0.046), ('clark', 0.045), ('greenwood', 0.045), ('dc', 0.045), ('growth', 0.045), ('discovery', 0.043), ('textrunner', 0.043), ('seeding', 0.043), ('coupled', 0.042), ('identified', 0.041), ('negatives', 0.041), ('susceptible', 0.041), ('crafted', 0.041), ('guidance', 0.039), ('james', 0.039), ('melbourne', 0.038), ('precision', 0.038), ('lars', 0.037), ('reduces', 0.035), ('ccg', 0.035), ('anchor', 0.035), ('simultaneously', 0.034), ('stephen', 0.034), ('match', 0.034), ('mutual', 0.033), ('phase', 0.032), ('banko', 0.032), ('council', 0.032), ('company', 0.032), ('identify', 0.03), ('soderland', 0.03), ('removes', 0.029), ('unsupervised', 0.028), ('substantial', 0.028), ('ellen', 0.027), ('tokens', 0.026), ('oren', 0.026), ('crafting', 0.026), ('interdependence', 0.026), ('acid', 0.026), ('nominated', 0.026), ('mutations', 0.026), ('randomisation', 0.026), ('celebrity', 0.026), ('unanticipated', 0.026), ('clarifying', 0.026), ('henk', 0.026), 
('symmetrical', 0.026), ('molecular', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin
Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.
2 0.19827373 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
Author: Harr Chen ; Edward Benson ; Tahira Naseem ; Regina Barzilay
Abstract: We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small , set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance. 1
3 0.19256859 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
4 0.18119211 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graphbased approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
5 0.14538036 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
Author: Ang Sun ; Ralph Grishman ; Satoshi Sekine
Abstract: We present a simple semi-supervised relation extraction system with large-scale word clustering. We focus on systematically exploring the effectiveness of different cluster-based features. We also propose several statistical methods for selecting clusters at an appropriate level of granularity. When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system. 1
6 0.12703799 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
7 0.11053617 282 acl-2011-Shift-Reduce CCG Parsing
8 0.10886085 117 acl-2011-Entity Set Expansion using Topic information
9 0.10027461 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
10 0.099754088 174 acl-2011-Insights from Network Structure for Text Mining
11 0.094008632 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
12 0.081931718 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
13 0.080549344 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
14 0.078867577 293 acl-2011-Template-Based Information Extraction without the Templates
15 0.077781841 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
16 0.076940984 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
17 0.072638318 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
18 0.069346003 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
19 0.066237569 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
20 0.066098027 334 acl-2011-Which Noun Phrases Denote Which Concepts?
topicId topicWeight
[(0, 0.157), (1, 0.053), (2, -0.117), (3, -0.041), (4, 0.074), (5, 0.012), (6, 0.025), (7, -0.002), (8, -0.119), (9, -0.059), (10, 0.017), (11, -0.022), (12, 0.076), (13, -0.011), (14, -0.037), (15, -0.098), (16, -0.03), (17, -0.227), (18, 0.002), (19, -0.014), (20, -0.048), (21, 0.119), (22, 0.062), (23, -0.025), (24, -0.049), (25, 0.027), (26, 0.149), (27, 0.183), (28, 0.157), (29, 0.05), (30, -0.032), (31, -0.001), (32, 0.099), (33, -0.023), (34, -0.028), (35, 0.0), (36, -0.008), (37, 0.124), (38, 0.046), (39, 0.013), (40, -0.063), (41, -0.04), (42, 0.037), (43, -0.067), (44, -0.014), (45, 0.016), (46, 0.105), (47, 0.012), (48, 0.025), (49, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.97114325 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin
Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.
2 0.78295004 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
3 0.75044441 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
Author: Harr Chen ; Edward Benson ; Tahira Naseem ; Regina Barzilay
Abstract: We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small , set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance. 1
4 0.69109207 114 acl-2011-End-to-End Relation Extraction Using Distant Supervision from External Semantic Repositories
Author: Truc Vien T. Nguyen ; Alessandro Moschitti
Abstract: In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable supervision. Our experiments with state-of-the-art relation extraction models, trained on the above data, show a meaningful F1 of 74.29% on a manually annotated test set: this highly improves the state-of-art in RE using DS. Additionally, our end-to-end experiments demonstrated that our extractors can be applied to any general text document.
5 0.6905719 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
Author: Ang Sun ; Ralph Grishman ; Satoshi Sekine
Abstract: We present a simple semi-supervised relation extraction system with large-scale word clustering. We focus on systematically exploring the effectiveness of different cluster-based features. We also propose several statistical methods for selecting clusters at an appropriate level of granularity. When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system. 1
6 0.68036526 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
7 0.67485553 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
8 0.61732036 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents
9 0.59475416 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
10 0.54724449 174 acl-2011-Insights from Network Structure for Text Mining
11 0.51640916 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
12 0.50867206 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
13 0.47647768 291 acl-2011-SystemT: A Declarative Information Extraction System
14 0.44912314 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
15 0.43348098 293 acl-2011-Template-Based Information Extraction without the Templates
16 0.41519555 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
17 0.40746823 294 acl-2011-Temporal Evaluation
18 0.40687791 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
19 0.40543661 121 acl-2011-Event Discovery in Social Media Feeds
20 0.39738798 74 acl-2011-Combining Indicators of Allophony
topicId topicWeight
[(5, 0.02), (17, 0.045), (25, 0.284), (31, 0.013), (37, 0.082), (39, 0.062), (41, 0.044), (55, 0.023), (59, 0.094), (72, 0.023), (91, 0.096), (96, 0.099), (97, 0.012)]
simIndex simValue paperId paperTitle
1 0.74417651 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
Author: Hui Lin ; Jeff Bilmes
Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.
same-paper 2 0.73318243 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin
Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.
3 0.58504629 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
Author: Anders Sgaard
Abstract: We consider a very simple, yet effective, approach to cross language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and stateof-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross language adaptation algorithms.
4 0.53227568 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
Author: Yuval Marton ; Nizar Habash ; Owen Rambow
Abstract: We explore the contribution of morphological features both lexical and inflectional to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and the undiacritzed lemma are most helpful for parsing on automatically tagged input. We further contrast the contribution of form-based and functional features, and show that functional gender and number (e.g., “broken plurals”) and the related rationality feature improve over form-based features. It is the first time functional morphological features are used for Arabic NLP. – –
5 0.52691233 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
6 0.52645785 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
7 0.52527583 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
8 0.52495635 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
9 0.52171731 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
10 0.51990575 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.51988125 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
12 0.51447833 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
13 0.5137167 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
14 0.51266962 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
15 0.51190704 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
16 0.50950992 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
17 0.50888854 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
18 0.50825357 4 acl-2011-A Class of Submodular Functions for Document Summarization
19 0.50757331 200 acl-2011-Learning Dependency-Based Compositional Semantics
20 0.50710803 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing