emnlp emnlp2011 emnlp2011-40 knowledge-graph by maker-knowledge-mining

40 emnlp-2011-Discovering Relations between Noun Categories


Source: pdf

Author: Thahir Mohamed ; Estevam Hruschka ; Tom Mitchell

Abstract: Traditional approaches to Relation Extraction from text require manually defining the relations to be extracted. We propose here an approach to automatically discovering relevant relations, given a large text corpus plus an initial ontology defining hundreds of noun categories (e.g., Athlete, Musician, Instrument). Our approach discovers frequently stated relations between pairs of these categories, using a two-step process. For each pair of categories (e.g., Musician and Instrument) it first coclusters the text contexts that connect known instances of the two categories, generating a candidate relation for each resulting cluster. It then applies a trained classifier to determine which of these candidate relations is semantically valid. Our experiments apply this to a text corpus containing approximately 200 million web pages and an ontology containing 122 categories from the NELL system [Carlson et al., 2010b], producing a set of 781 proposed candidate relations, approximately half of which are semantically valid. We conclude this is a useful approach to semi-automatic extension of the ontology for large-scale information extraction systems such as NELL.
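The toy example below illustrates the candidate-generation step. It is an assumption-laden stand-in and does not reproduce the authors' co-clustering method. Known instance pairs and the text contexts observed between them are treated as a bipartite graph. Each connected component then becomes one candidate relation, so pair clusters and context clusters are formed together. The function name `candidate_relations` and the example triples are hypothetical.

```python
from collections import defaultdict


def candidate_relations(triples):
    """Group (arg1, arg2, context) triples into candidate relations.

    Hypothetical stand-in for co-clustering: each connected component of
    the bipartite graph linking instance pairs to the contexts observed
    between them yields one row cluster (pairs) and one column cluster
    (contexts).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for arg1, arg2, context in triples:
        union(("pair", (arg1, arg2)), ("ctx", context))

    clusters = defaultdict(lambda: (set(), set()))
    for node in list(parent):
        pairs, contexts = clusters[find(node)]
        kind, value = node
        (pairs if kind == "pair" else contexts).add(value)

    # Each candidate relation: (sorted instance pairs, sorted contexts).
    return sorted((sorted(p), sorted(c)) for p, c in clusters.values())


# Hypothetical Musician-Instrument triples: (musician, instrument, context).
triples = [
    ("hendrix", "guitar", "plays"),
    ("davis", "trumpet", "plays"),
    ("davis", "trumpet", "is a master of the"),
    ("townshend", "guitar", "smashed his"),
    ("moon", "drums", "smashed his"),
]
for pairs, contexts in candidate_relations(triples):
    print(pairs, contexts)
```

Each component is only a candidate. In the approach described above, a trained classifier then decides which candidates are semantically valid. A pure connectivity test also merges two clusters whenever a single noisy context links them, and a real co-clustering algorithm is meant to avoid that.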

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Traditional approaches to Relation Extraction from text require manually defining the relations to be extracted. [sent-8, score-0.375]

2 We propose here an approach to automatically discovering relevant relations, given a large text corpus plus an initial ontology defining hundreds of noun categories (e.g., Athlete, Musician, Instrument). [sent-9, score-1.155]

3 Our approach discovers frequently stated relations between pairs of these categories, using a two-step process. [sent-12, score-0.404]

4 For each pair of categories (e.g., Musician and Instrument) it first coclusters the text contexts that connect known instances of the two categories, generating a candidate relation for each resulting cluster. [sent-15, score-0.313]

5 It then applies a trained classifier to determine which of these candidate relations is semantically valid. [sent-16, score-0.443]

6 Our experiments apply this to a text corpus containing approximately 200 million web pages and an ontology containing 122 categories from the NELL system [Carlson et al. [sent-17, score-0.987]

7 , 2010b], producing a set of 781 proposed candidate relations, approximately half of which are semantically valid. [sent-18, score-0.326]

8 We conclude this is a useful approach to semi-automatic extension of the ontology for large-scale information extraction systems such as NELL. [sent-19, score-0.467]

9 , 2010b)) is a computer system that learns continuously to extract facts from the web. [sent-21, score-0.288]

10 NELL is given as input an initial ontology that specifies the semantic categories (e. [sent-22, score-0.792]

11 In addition, it is provided 10-20 seed positive training examples for each of these categories and relations, along with hundreds of millions of unlabeled web pages. [sent-27, score-0.623]

12 Given this input, NELL applies a large-scale multitask, semisupervised learning method to learn to extract new instances of these categories (e. [sent-28, score-0.556]

13 During the past 17 months NELL has been running nearly continuously, learning to extract over 600 categories and relations, and populating a knowledge base containing over 700,000 instances of these categories and relations with a precision of approximately 0. [sent-33, score-1.47]

14 This paper considers the problem of automatically discovering new relations to extend the ontology of systems such as NELL, enabling them to increase over time their learning and extraction capabilities. [sent-35, score-0.902]
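The page does not say how `sentScore` is computed. One plausible scheme, assumed here for illustration only, scores each sentence by summing the tf-idf weights of its distinct words and keeps the top-ranked sentences. `score_sentences` is a hypothetical helper. Its weights are a small excerpt from the tf-idf word list below.

```python
import re


def score_sentences(sentences, word_weights, top_n=3):
    """Rank sentences by the summed tf-idf weight of their distinct words.

    Returns (sentence_index, score) pairs, highest score first; ties keep
    the original sentence order.
    """
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        score = sum(word_weights.get(token, 0.0) for token in tokens)
        scored.append((score, index))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(index, round(score, 3)) for score, index in scored[:top_n]]


# A few weights copied from this page's tf-idf word list.
weights = {"nell": 0.564, "ontology": 0.36, "categories": 0.286,
           "musician": 0.218, "instrument": 0.158}
sentences = [
    "NELL is given as input an initial ontology.",
    "Musician and Instrument are categories.",
    "The system learns continuously.",
]
print(score_sentences(sentences, weights, top_n=2))
```

Because each distinct word is counted once, a sentence is not rewarded for repeating "NELL". A real summariser might also normalise by sentence length, which this sketch does not do.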


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nell', 0.564), ('ontology', 0.36), ('categories', 0.286), ('relations', 0.238), ('musician', 0.218), ('teamplaysincity', 0.188), ('instrument', 0.158), ('continuously', 0.148), ('estevam', 0.148), ('hundreds', 0.118), ('carlson', 0.113), ('discovering', 0.111), ('pittsburgh', 0.106), ('athlete', 0.094), ('didate', 0.094), ('populating', 0.094), ('sportsteam', 0.094), ('steelers', 0.094), ('approximately', 0.089), ('mohamed', 0.085), ('federal', 0.079), ('city', 0.078), ('instances', 0.075), ('months', 0.074), ('multitask', 0.074), ('defining', 0.07), ('applies', 0.07), ('containing', 0.069), ('hruschka', 0.067), ('semantically', 0.064), ('discovers', 0.064), ('base', 0.063), ('extract', 0.062), ('carlos', 0.061), ('connect', 0.061), ('london', 0.059), ('carnegie', 0.057), ('mellon', 0.057), ('specifying', 0.055), ('millions', 0.055), ('tom', 0.053), ('specifies', 0.053), ('frequently', 0.052), ('stated', 0.05), ('company', 0.05), ('enabling', 0.049), ('semisupervised', 0.049), ('facts', 0.046), ('seed', 0.044), ('input', 0.041), ('extraction', 0.04), ('mitchell', 0.04), ('plus', 0.04), ('half', 0.04), ('noun', 0.039), ('producing', 0.039), ('precisely', 0.039), ('web', 0.039), ('learner', 0.037), ('past', 0.035), ('perhaps', 0.034), ('considers', 0.034), ('conclude', 0.034), ('initial', 0.034), ('extension', 0.033), ('nearly', 0.033), ('relation', 0.033), ('running', 0.033), ('argument', 0.033), ('learns', 0.032), ('text', 0.032), ('contexts', 0.031), ('candidate', 0.03), ('million', 0.029), ('automatically', 0.028), ('unlabeled', 0.026), ('extend', 0.025), ('traditional', 0.025), ('classifier', 0.024), ('correspond', 0.023), ('relevant', 0.023), ('generating', 0.02), ('mentioned', 0.02), ('positive', 0.02), ('manually', 0.019), ('semantic', 0.018), ('known', 0.018), ('precision', 0.018), ('errors', 0.018), ('provided', 0.018), ('increase', 0.017), ('determine', 0.017), ('whose', 0.017), ('along', 0.017), ('require', 0.016), ('must', 0.015), ('knowledge', 0.015), ('learn', 0.014), ('corpus', 0.014), ('resulting', 0.013)]
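The weighting scheme behind these scores is not documented here. This sketch assumes a common variant: raw term frequency is multiplied by log(N/df), and each document vector is L2-normalised. `tfidf_vectors` and `top_terms` are hypothetical helper names, and the toy documents are not real data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised tf-idf dict per tokenised document.

    Weight = raw term count * log(N / df); terms that occur in every
    document get idf 0 and are dropped.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        raw = {term: count * math.log(n_docs / df[term])
               for term, count in Counter(doc).items() if df[term] < n_docs}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else raw)
    return vectors


def top_terms(vector, n):
    """The n highest-weighted (term, weight) pairs, ties broken by term."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


docs = [
    ["nell", "ontology", "nell", "relations"],
    ["relations", "parsing"],
    ["parsing", "translation"],
]
print(top_terms(tfidf_vectors(docs)[0], 2))
```

Terms that are frequent in one paper but rare elsewhere rise to the top. That fits the list above: names like "nell" and "teamplaysincity" outrank generic words such as "text" or "corpus".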

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 40 emnlp-2011-Discovering Relations between Noun Categories

2 0.26068518 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base

Author: Ni Lao ; Tom Mitchell ; William W. Cohen

Abstract: We consider the problem of performing learning and inference in a large-scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL’s earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
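The path-constrained random walks mentioned in this abstract can be sketched on a toy knowledge base. In this illustration, a walk starts at a source entity and follows a fixed sequence of relations, choosing uniformly among outgoing edges at each step. A path's feature value is the probability that the walk ends at the target, and a candidate's score is a weighted sum of these features. The relation names, entities and weights are hypothetical. The weights are also set by hand, whereas the abstract describes tuning them.

```python
from collections import defaultdict


def path_distribution(kb, source, path):
    """End-node distribution of a walk from `source` that follows the
    relations in `path`, picking uniformly among outgoing edges.
    Walks that reach a node with no matching edge lose their mass."""
    dist = {source: 1.0}
    for relation in path:
        step = defaultdict(float)
        for node, prob in dist.items():
            targets = kb.get(relation, {}).get(node, [])
            for target in targets:
                step[target] += prob / len(targets)
        dist = dict(step)
    return dist


def path_score(kb, source, target, weighted_paths):
    """Weighted sum of path-constrained random-walk probabilities."""
    return sum(weight * path_distribution(kb, source, path).get(target, 0.0)
               for path, weight in weighted_paths)


# Toy knowledge base: relation -> {entity: [related entities]}.
kb = {
    "athleteplaysforteam": {"hines_ward": ["steelers"]},
    "teamplaysagainstteam": {"steelers": ["ravens", "browns"]},
    "teamplaysincity": {"steelers": ["pittsburgh"], "ravens": ["baltimore"],
                        "browns": ["cleveland"]},
}
# Hand-set, hypothetical path weights.
paths = [
    (["athleteplaysforteam", "teamplaysincity"], 0.9),
    (["athleteplaysforteam", "teamplaysagainstteam", "teamplaysincity"], 0.1),
]
print(path_score(kb, "hines_ward", "pittsburgh", paths))
```

Scoring by several weighted paths lets a learner rely on reliable paths (team's own city) and discount noisy ones (opponents' cities) for the same target relation.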

3 0.084578723 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week

Author: Marjorie Freedman ; Lance Ramshaw ; Elizabeth Boschee ; Ryan Gabbard ; Gary Kratkiewicz ; Nicolas Ward ; Ralph Weischedel

Abstract: We report on empirical results in extreme extraction. It is extreme in that (1) from receipt of the ontology specifying the target concepts and relations, development is limited to one week and that (2) relatively little training data is assumed. We are able to surpass human recall and achieve an F1 of 0.51 on a question-answering task with less than 50 hours of effort using a hybrid approach that mixes active learning, bootstrapping, and limited (5 hours) manual rule writing. We compare the performance of three systems: extraction with handwritten rules, bootstrapped extraction, and a combination. We show that while the recall of the handwritten rules surpasses that of the learned system, the learned system is able to improve the overall recall and F1.

4 0.078497969 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

Author: Bryan Rink ; Sanda Harabagiu

Abstract: This paper presents a generative model for the automatic discovery of relations between entities in electronic medical records. The model discovers relation instances and their types by determining which context tokens express the relation. Additionally, the valid semantic classes for each type of relation are determined. We show that the model produces clusters of relation trigger words which better correspond with manually annotated relations than several existing clustering techniques. The discovered relations reveal some of the implicit semantic structure present in patient records.

5 0.068958044 128 emnlp-2011-Structured Relation Discovery using Generative Models

Author: Limin Yao ; Aria Haghighi ; Sebastian Riedel ; Andrew McCallum

Abstract: We explore unsupervised approaches to relation extraction between two named entities; for instance, the semantic bornIn relation between a person and location entity. Concretely, we propose a series of generative probabilistic models, broadly similar to topic models, each of which generates a corpus of observed triples of entity mention pairs and the surface syntactic dependency path between them. The output of each model is a clustering of observed relation tuples and their associated textual expressions to underlying semantic relation types. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path between entity mentions. We examine the effectiveness of our approach via multiple evaluations and demonstrate a 12% error reduction in precision over a state-of-the-art weakly supervised baseline.

6 0.066599131 113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns

7 0.065907329 114 emnlp-2011-Relation Extraction with Relation Topics

8 0.038664233 26 emnlp-2011-Class Label Enhancement via Related Instances

9 0.035148639 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus

10 0.035004366 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling

11 0.03402425 92 emnlp-2011-Minimally Supervised Event Causality Identification

12 0.033990849 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

13 0.033944547 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

14 0.032631435 121 emnlp-2011-Semi-supervised CCG Lexicon Extension

15 0.03064161 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

16 0.030514084 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

17 0.029784609 70 emnlp-2011-Identifying Relations for Open Information Extraction

18 0.029104503 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

19 0.029060775 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

20 0.027657915 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
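A similar-papers list like the one above is typically produced by comparing tf-idf vectors with cosine similarity. Whether this page does exactly that is an assumption. Under that reading, the query paper always ranks first with a similarity of about 1. The listed value 0.99999994 is consistent with single-precision (float32) rounding of 1.0. The vectors below are toy values keyed by paper ids and are not the real tf-idf vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_similar(query, corpus, k):
    """Return the k (paper_id, similarity) pairs closest to query."""
    scores = [(pid, cosine(query, vec)) for pid, vec in corpus.items()]
    scores.sort(key=lambda item: -item[1])
    return scores[:k]


# Toy vectors keyed by paper id (not the real tf-idf vectors).
corpus = {
    40: {"nell": 0.5, "ontology": 0.3},
    109: {"nell": 0.4, "walk": 0.6},
    137: {"parser": 1.0},
}
print(rank_similar(corpus[40], corpus, 3))
```

Papers that share no weighted terms with the query score exactly 0. Papers that share a distinctive term such as "nell" score well above the rest, which would explain why the Random Walk paper ranks second in each list.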


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.095), (1, -0.074), (2, -0.107), (3, 0.029), (4, -0.047), (5, -0.179), (6, 0.074), (7, 0.008), (8, -0.014), (9, 0.091), (10, 0.048), (11, 0.076), (12, -0.113), (13, -0.031), (14, 0.082), (15, 0.003), (16, 0.068), (17, -0.123), (18, -0.039), (19, -0.006), (20, 0.23), (21, -0.082), (22, 0.106), (23, 0.227), (24, 0.02), (25, 0.108), (26, -0.043), (27, -0.255), (28, -0.157), (29, 0.109), (30, 0.313), (31, 0.286), (32, 0.035), (33, -0.123), (34, -0.096), (35, 0.066), (36, 0.134), (37, -0.064), (38, -0.091), (39, 0.115), (40, 0.124), (41, -0.01), (42, 0.05), (43, -0.003), (44, -0.025), (45, 0.134), (46, 0.04), (47, -0.04), (48, 0.042), (49, 0.031)]
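LSI topic weights can be negative, as in the list above, because they come from a truncated singular value decomposition rather than a probability model. The classic formulation is sketched below. It assumes a term-document matrix $X$, a document's term vector $d$, and $k$ retained dimensions (apparently 50 here, judging by topic ids 0 to 49). Implementations differ on whether the $\Sigma_k^{-1}$ scaling is applied.

```latex
X \approx U_k \Sigma_k V_k^{\top}, \qquad
\hat{d} = \Sigma_k^{-1} U_k^{\top} d, \qquad
\operatorname{sim}(d_1, d_2) =
  \frac{\hat{d}_1^{\top} \hat{d}_2}{\lVert \hat{d}_1 \rVert \, \lVert \hat{d}_2 \rVert}
```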

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96938813 40 emnlp-2011-Discovering Relations between Noun Categories

2 0.80225527 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base

3 0.24323727 114 emnlp-2011-Relation Extraction with Relation Topics

Author: Chang Wang ; James Fan ; Aditya Kalyanpur ; David Gondek

Abstract: This paper describes a novel approach to the semantic relation detection problem. Instead of relying only on the training instances for a new relation, we leverage the knowledge learned from previously trained relation detectors. Specifically, we detect a new semantic relation by projecting the new relation’s training instances onto a lower-dimensional topic space constructed from existing relation detectors through a three-step process. First, we construct a large relation repository of more than 7,000 relations from Wikipedia. Second, we construct a set of non-redundant relation topics defined at multiple scales from the relation repository to characterize the existing relations. Similar to the topics defined over words, each relation topic is an interpretable multinomial distribution over the existing relations. Third, we integrate the relation topics in a kernel function, and use it together with SVM to construct detectors for new relations. The experimental results on Wikipedia and ACE data have confirmed that background-knowledge-based topics generated from the Wikipedia relation repository can significantly improve the performance over the state-of-the-art relation detection approaches.

4 0.23843728 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week

5 0.23720774 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

6 0.21003872 128 emnlp-2011-Structured Relation Discovery using Generative Models

7 0.20941697 113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns

8 0.15853079 70 emnlp-2011-Identifying Relations for Open Information Extraction

9 0.15033191 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

10 0.1432393 26 emnlp-2011-Class Label Enhancement via Related Instances

11 0.14108878 92 emnlp-2011-Minimally Supervised Event Causality Identification

12 0.13520789 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

13 0.13118283 121 emnlp-2011-Semi-supervised CCG Lexicon Extension

14 0.12663202 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction

15 0.12397597 142 emnlp-2011-Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities

16 0.11136039 94 emnlp-2011-Modelling Discourse Relations for Arabic

17 0.10644202 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model

18 0.10614479 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus

19 0.10492112 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms

20 0.10352506 12 emnlp-2011-A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.083), (36, 0.016), (45, 0.031), (47, 0.533), (54, 0.015), (57, 0.012), (62, 0.078), (66, 0.012), (69, 0.015), (79, 0.039), (82, 0.014), (96, 0.031)]
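LDA weights, unlike LSI weights, are non-negative topic proportions. The list above shows only topics above a cut-off, so its weights sum to less than 1. The page does not state which measure produces the LDA similarity values. Hellinger similarity, used in this sketch, is one common choice for comparing such distributions. The helper names are hypothetical, and the sketch renormalises the truncated lists before comparing them.

```python
import math


def normalise(topics):
    """Rescale a sparse {topic_id: weight} list so its weights sum to 1."""
    total = sum(topics.values())
    return {t: w / total for t, w in topics.items()}


def hellinger_similarity(p, q):
    """1 - Hellinger distance between two sparse topic distributions."""
    p, q = normalise(p), normalise(q)
    topic_ids = set(p) | set(q)
    squared = sum((math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
                  for t in topic_ids)
    return 1.0 - math.sqrt(squared / 2.0)


# The topic weights listed above for this paper.
this_paper = {23: 0.083, 36: 0.016, 45: 0.031, 47: 0.533, 54: 0.015,
              57: 0.012, 62: 0.078, 66: 0.012, 69: 0.015, 79: 0.039,
              82: 0.014, 96: 0.031}
print(hellinger_similarity(this_paper, this_paper))  # identical inputs
print(hellinger_similarity(this_paper, {5: 1.0}))    # no shared topics
```

With one dominant topic (47 here), any paper that also leans heavily on that topic will score high, whatever its other topics are.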

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71060753 40 emnlp-2011-Discovering Relations between Noun Categories

2 0.34526065 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard

Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.

3 0.24191347 136 emnlp-2011-Training a Parser for Machine Translation Reordering

Author: Jason Katz-Brown ; Slav Petrov ; Ryan McDonald ; Franz Och ; David Talbot ; Hiroshi Ichikawa ; Masakazu Seno ; Hideto Kazawa

Abstract: We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.

4 0.23347534 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

Author: Michel Galley ; Chris Quirk

Abstract: Minimum error rate training is a crucial component to many state-of-the-art NLP applications, such as machine translation and speech recognition. However, common evaluation functions such as BLEU or word error rate are generally highly non-convex and thus prone to search errors. In this paper, we present LP-MERT, an exact search algorithm for minimum error rate training that reaches the global optimum using a series of reductions to linear programming. Given a set of N-best lists produced from S input sentences, this algorithm finds a linear model that is globally optimal with respect to this set. We find that this algorithm is polynomial in N and in the size of the model, but exponential in S. We present extensions of this work that let us scale to reasonably large tuning sets (e.g., one thousand sentences), by either searching only promising regions of the parameter space, or by using a variant of LP-MERT that relies on a beam-search approximation. Experimental results show improvements over the standard Och algorithm.

5 0.22892325 32 emnlp-2011-Computing Logical Form on Regulatory Texts

Author: Nikhil Dinesh ; Aravind Joshi ; Insup Lee

Abstract: The computation of logical form has been proposed as an intermediate step in the translation of sentences to logic. Logical form encodes the resolution of scope ambiguities. In this paper, we describe experiments on a modest-sized corpus of regulation annotated with a novel variant of logical form, called abstract syntax trees (ASTs). The main step in computing ASTs is to order scope-taking operators. A learning model for ranking is adapted for this ordering. We design features by studying the problem of comparing the scope of one operator to another. The scope comparisons are used to compute ASTs, with an F-score of 90.6% on the set of ordering decisions.

6 0.2253871 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

7 0.21145985 114 emnlp-2011-Relation Extraction with Relation Topics

8 0.21058916 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base

9 0.20898709 128 emnlp-2011-Structured Relation Discovery using Generative Models

10 0.20422134 70 emnlp-2011-Identifying Relations for Open Information Extraction

11 0.20067684 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week

12 0.19669789 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation

13 0.19434267 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

14 0.19404064 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search

15 0.1926998 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

16 0.1922861 138 emnlp-2011-Tuning as Ranking

17 0.19217549 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction

18 0.19211824 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

19 0.19105932 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

20 0.19083567 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction