emnlp emnlp2012 emnlp2012-40 knowledge-graph by maker-knowledge-mining

40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction


Source: pdf

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

Affiliations: Ralph Grishman1, Chin-Yew Lin2; 2Microsoft Research Asia, Beijing, China; {shumings, cyl}@microsoft.com. *Work done during an internship at Microsoft Research Asia.

Introduction (excerpt): Relation extraction aims at discovering semantic relations between entities. It is an important task that has many applications in answering factoid questions, building knowledge bases and improving search engine relevance. The web has become a massive potential source of such relations. However, its open nature brings an open-ended set of relation types. To extract these relations, a system should not assume a fixed set of relation types, nor rely on a fixed set of relation argument types. The past decade has seen some promising solutions, unsupervised relation extraction (URE) algorithms that extract relations from a corpus without knowing the relations in advance. However, most algorithms (Hasegawa et al., 2004; Shinyama and Sekine, 2006; Chen et al., 2005) rely on tagging predefined types of entities as relation arguments, and thus are not well-suited for the open domain. Recently, Kok and Domingos (2008) proposed Semantic Network Extractor (SNE), which generates argument semantic classes and sets of synonymous relation phrases at the same time, thus avoiding the requirement of tagging relation arguments of predefined types. However, SNE has 2 limitations: 1) Following previous URE algorithms, it only uses features from the set of input relation instances for clustering. Empirically we found that it fails to group many relevant relation instances. These features, such as the surface forms of arguments and lexical sequences in between, are very sparse in practice. In contrast, there exist several well-known corpus-level semantic resources that can be automatically derived from a source corpus and are shown to be useful for generating the key elements of a relation: its 2 argument semantic classes and a set of synonymous phrases. For example, semantic classes can be derived from a source corpus with contextual distributional similarity and web table co-occurrences. The “synonymy”1 problem for clustering relation instances could potentially be better solved by adding these resources. 2) SNE assumes that each entity or relation phrase belongs to exactly one cluster, thus is not able to effectively handle polysemy of relation phrases2. An example of a polysemous phrase is “be the currency of”, as in 2 triples …
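The abstract only describes the idea in prose, so the toy sketch below makes it concrete. It is not the authors' algorithm: it uses a small hand-written semantic-class dictionary as a stand-in for the resources the paper derives automatically (contextual distributional similarity, web table co-occurrences), and the triples are invented. It shows why grouping open-IE triples by argument semantic classes rather than surface strings helps with both synonymy (sparse instances share a key) and polysemy (one phrase splits across keys when its argument classes differ).

```python
# Illustrative sketch only; not the algorithm from the paper.
from collections import defaultdict

# Open-IE style (arg1, relation phrase, arg2) triples (invented examples).
triples = [
    ("euro", "be the currency of", "Germany"),
    ("yen", "be the currency of", "Japan"),
    ("attention", "be the currency of", "the social web"),  # non-monetary sense
    ("Paris", "be the capital of", "France"),
    ("Berlin", "be the capital of", "Germany"),
]

# Hypothetical semantic-class resource; the paper builds such classes from
# corpus statistics rather than a manual list.
semantic_class = {
    "euro": "CURRENCY", "yen": "CURRENCY",
    "Paris": "CITY", "Berlin": "CITY",
    "Germany": "COUNTRY", "Japan": "COUNTRY", "France": "COUNTRY",
}

# Group instances by (arg1 class, phrase, arg2 class): synonymous instances
# collapse into one group, and a polysemous phrase lands in several groups.
relations = defaultdict(list)
for a1, phrase, a2 in triples:
    key = (semantic_class.get(a1, a1), phrase, semantic_class.get(a2, a2))
    relations[key].append((a1, phrase, a2))

for key, instances in sorted(relations.items(), key=lambda kv: str(kv[0])):
    print(key, "->", instances)
```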

Reference: text


Summary: the most important sentences generated by the tfidf model (a minimal sketch of this kind of tf-idf sentence scoring follows the sentence list below)

sentIndex sentText sentNum sentScore

1 Ensemble Semantics for Large-scale Unsupervised Relation Extraction. Bonan Min1*, Shuming Shi2; 1New York University, New York, NY, USA; {min, grishman}@cs. [sent-1, score-0.032]

2 edu , Abstract Discovering significant types of relations from the web is challenging because of its open nature. [sent-3, score-0.384]

3 Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. [sent-4, score-1.03]

4 Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. [sent-5, score-1.111]

5 However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. [sent-6, score-1.577]

6 In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. [sent-7, score-0.534]

7 The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. [sent-8, score-0.125]

8 Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. [sent-9, score-1.082]

9 While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. [sent-10, score-0.097]

10 Experiments on a realworld dataset show that it can handle 14. [sent-12, score-0.167]

11 7 million relation instances and extract a very large set of relations from the web. [sent-13, score-0.779]

12 Ralph Grishman1, Chin-Yew Lin2; 2Microsoft Research Asia, Beijing, China; {shumings, cyl}@microsoft. [sent-14, score-0.219]

13 com , that has many applications in answering factoid questions, building knowledge bases and improving search engine relevance. [sent-15, score-0.266]

14 The web has become a massive potential source of such relations. [sent-16, score-0.186]

15 However, its open nature brings an open-ended set of relation types. [sent-17, score-0.635]

16 To extract these relations, a system should not assume a fixed set of relation types, nor rely on a fixed set of relation argument types. [sent-18, score-1.256]

17 The past decade has seen some promising solutions, unsupervised relation extraction (URE) algorithms that extract relations from a corpus without knowing the relations in advance. [sent-19, score-1.275]

18 However, most algorithms (Hasegawa et al., 2004; Shinyama and Sekine, 2006; Chen et al., 2005) rely on tagging predefined types of entities as relation arguments, and thus are not well-suited for the open domain. [sent-22, score-0.874]

19 Recently, Kok and Domingos (2008) proposed Semantic Network Extractor (SNE), which generates argument semantic classes and sets of synonymous relation phrases at the same time, thus avoiding the requirement of tagging relation arguments of predefined types. [sent-23, score-2.118]

20 However, SNE has 2 limitations: 1) Following previous URE algorithms, it only uses features from the set of input relation instances for clustering. [sent-24, score-0.547]

21 Empirically we found that it fails to group many relevant relation instances. [sent-25, score-0.578]

22 These features, such as the surface forms of arguments and lexical sequences in between, are very sparse in practice. [sent-26, score-0.23]

23 In contrast, there exist several well-known corpus-level semantic resources that can be automatically derived from a source corpus and are shown to be useful for generating the key elements of a relation: its 2 argument semantic classes and a set of synonymous phrases. [sent-27, score-0.868]

24 For example, semantic classes can be derived from a source corpus with contextual distributional similarity and web table co-occurrences. (1 Introduction) Relation extraction aims at discovering semantic [sent-28, score-0.726]

25 relations between entities. The “synonymy”1 problem for clustering relation instances [sent-29, score-0.714]

26 It is an important task. (*Work done during an internship at Microsoft Research Asia) [sent-30, score-0.088]

27 could potentially be better solved by adding these resources. [sent-32, score-0.04]

28 2) SNE assumes that each entity or relation phrase belongs to exactly one cluster, thus is not able to effectively handle polysemy of relation phrases2. [sent-33, score-1.34]
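The scores in the list above come from a tf-idf model; this page does not say exactly how sentence scores are computed, so the sketch below is only one plausible scheme (scikit-learn's TfidfVectorizer, with each sentence scored by the length-normalised sum of its term weights), shown to make the sentScore column concrete.

```python
# Minimal tf-idf sentence-scoring sketch; preprocessing and normalisation
# used for the list above are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Discovering significant types of relations from the web is challenging.",
    "Unsupervised algorithms extract relations without knowing them in advance.",
    "The algorithm incorporates various knowledge sources.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(sentences)            # one row per sentence

# Score = sum of a sentence's term weights, normalised by sentence length.
lengths = [max(len(s.split()), 1) for s in sentences]
scores = tfidf.sum(axis=1).A1 / lengths

for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[idx]), 3), sentences[idx])
```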


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('relation', 0.443), ('synonymous', 0.326), ('polysemy', 0.264), ('sne', 0.244), ('ure', 0.226), ('relations', 0.167), ('asia', 0.163), ('arguments', 0.161), ('argument', 0.157), ('predefined', 0.153), ('polysemous', 0.153), ('synonymy', 0.131), ('knowing', 0.116), ('instances', 0.104), ('discovering', 0.102), ('open', 0.102), ('hasegawa', 0.097), ('bonan', 0.097), ('cyl', 0.097), ('classes', 0.094), ('larity', 0.088), ('realworld', 0.088), ('factoid', 0.088), ('internship', 0.088), ('fails', 0.087), ('massive', 0.081), ('advance', 0.081), ('handle', 0.079), ('unsupervised', 0.079), ('tagging', 0.078), ('disambiguates', 0.076), ('shinyama', 0.076), ('semantic', 0.073), ('kok', 0.072), ('rely', 0.07), ('decade', 0.069), ('cro', 0.069), ('extract', 0.065), ('sekine', 0.063), ('extractor', 0.063), ('brings', 0.06), ('maintaining', 0.06), ('treatment', 0.06), ('domingos', 0.058), ('engine', 0.056), ('sparseness', 0.056), ('triples', 0.056), ('web', 0.054), ('avoiding', 0.054), ('ralph', 0.054), ('phrases', 0.053), ('algorithms', 0.053), ('requirement', 0.053), ('ft', 0.053), ('ensemble', 0.053), ('source', 0.051), ('microsoft', 0.051), ('bases', 0.051), ('belongs', 0.051), ('york', 0.051), ('beijing', 0.05), ('group', 0.048), ('extraction', 0.047), ('limitations', 0.047), ('aims', 0.046), ('incorporates', 0.046), ('ie', 0.044), ('solutions', 0.044), ('solved', 0.04), ('network', 0.039), ('fixed', 0.039), ('recently', 0.039), ('al', 0.038), ('china', 0.038), ('sparse', 0.037), ('answering', 0.037), ('approximately', 0.037), ('ny', 0.036), ('mi', 0.036), ('derived', 0.036), ('promising', 0.035), ('cluster', 0.035), ('past', 0.034), ('com', 0.034), ('distributional', 0.034), ('questions', 0.033), ('challenging', 0.033), ('surface', 0.032), ('min', 0.032), ('groups', 0.031), ('assumes', 0.03), ('nature', 0.03), ('phrase', 0.03), ('generates', 0.03), ('exist', 0.03), ('semantics', 0.03), ('contextual', 0.028), ('elements', 0.028), ('lparnogcue', 0.028), ('types', 0.028)]
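The (wordName, wordTfidf) pairs above, and the simValue rankings in the lists that follow, are standard outputs of a tf-idf vector-space model over the paper collection. The sketch below shows how both could be computed; the three "papers" are invented placeholders, since the real pipeline's corpus and settings are not given on this page.

```python
# Sketch: top-N tf-idf terms for one paper plus cosine similarities to the
# rest of the collection. Paper texts are stand-ins, not the real documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = {
    "emnlp2012-40":  "unsupervised relation extraction polysemy synonymy web",
    "emnlp2012-93":  "distant supervision multi instance multi label relation extraction",
    "emnlp2012-100": "open information extraction relation phrases verbs context",
}
ids = list(papers)
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(papers.values())
vocab = vectorizer.get_feature_names_out()

target = ids.index("emnlp2012-40")

# Top weighted terms for the target paper (the wordName/wordTfidf pairs).
row = matrix[target].toarray().ravel()
top_terms = sorted(zip(vocab, row), key=lambda t: -t[1])[:10]
print([(w, round(float(s), 3)) for w, s in top_terms])

# Cosine similarity against every paper in the collection (the simValue column).
sims = cosine_similarity(matrix[target], matrix).ravel()
for pid, s in sorted(zip(ids, sims), key=lambda t: -t[1]):
    print(pid, round(float(s), 3))
```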

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

2 0.1511967 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE) gathering training data by aligning a database of facts with text – is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains. –

3 0.1436983 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

4 0.14218543 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text

Author: Yohei Takaku ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: Because the real world evolves over time, numerous relations between entities written in presently available texts are already obsolete or will potentially evolve in the future. This study aims at resolving the intricacy in consistently compiling relations extracted from text, and presents a method for identifying constancy and uniqueness of the relations in the context of supervised learning. We exploit massive time-series web texts to induce features on the basis of time-series frequency and linguistic cues. Experimental results confirmed that the time-series frequency distributions contributed much to the recall of constancy identification and the precision of the uniqueness identification.

5 0.12079298 100 emnlp-2012-Open Language Learning for Information Extraction

Author: Mausam ; Michael Schmitz ; Stephen Soderland ; Robert Bart ; Oren Etzioni

Abstract: Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, stateof-the-art Open IE systems such as REVERB and WOE share two important weaknesses (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under precision-yield curve (AUC) compared to REVERB and 1.9 times the AUC of WOEparse. –

6 0.11370666 97 emnlp-2012-Natural Language Questions for the Web of Data

7 0.11163209 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

8 0.089810401 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

9 0.083439343 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

10 0.081578344 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

11 0.059662219 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

12 0.05898137 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution

13 0.053631227 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

14 0.052896842 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

15 0.052052222 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

16 0.051561855 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

17 0.04808763 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

18 0.047325809 72 emnlp-2012-Joint Inference for Event Timeline Construction

19 0.045772273 36 emnlp-2012-Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach

20 0.040848427 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.168), (1, 0.152), (2, -0.007), (3, -0.066), (4, 0.064), (5, -0.031), (6, 0.111), (7, 0.231), (8, -0.165), (9, 0.003), (10, -0.125), (11, 0.003), (12, -0.068), (13, -0.084), (14, -0.052), (15, -0.028), (16, -0.296), (17, 0.004), (18, 0.027), (19, -0.025), (20, -0.034), (21, -0.05), (22, 0.046), (23, -0.015), (24, 0.011), (25, -0.217), (26, 0.018), (27, 0.002), (28, 0.073), (29, 0.12), (30, -0.063), (31, 0.046), (32, -0.047), (33, 0.101), (34, -0.068), (35, -0.042), (36, 0.029), (37, -0.008), (38, 0.003), (39, -0.002), (40, -0.041), (41, -0.142), (42, 0.074), (43, -0.041), (44, -0.0), (45, 0.02), (46, 0.062), (47, -0.006), (48, 0.019), (49, 0.151)]
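The (topicId, topicWeight) pairs above are this paper's coordinates in a latent semantic space. One common way to obtain such a representation is a truncated SVD of the tf-idf matrix; the sketch below shows that on a tiny stand-in corpus (the real model's corpus, dimensionality, and training are not specified on this page).

```python
# Sketch of an LSI-style representation: truncated SVD over tf-idf vectors.
# Corpus, number of topics, and other settings are assumptions.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "unsupervised relation extraction polysemy synonymy clustering",
    "distant supervision relation extraction multi instance learning",
    "semantic parsing knowledge base freebase relations queries",
    "open information extraction relation phrases verbs nouns context",
]
tfidf = TfidfVectorizer().fit_transform(docs)

lsi = TruncatedSVD(n_components=3, random_state=0)   # the listing above uses 50 topics
topic_weights = lsi.fit_transform(tfidf)

# (topicId, topicWeight) pairs for the first document, as in the listing above;
# cosine similarity between these dense vectors yields simValue-style rankings.
print([(i, round(float(w), 3)) for i, w in enumerate(topic_weights[0])])
```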

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98665011 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

2 0.84526801 100 emnlp-2012-Open Language Learning for Information Extraction

Author: Mausam ; Michael Schmitz ; Stephen Soderland ; Robert Bart ; Oren Etzioni

Abstract: Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, stateof-the-art Open IE systems such as REVERB and WOE share two important weaknesses (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under precision-yield curve (AUC) compared to REVERB and 1.9 times the AUC of WOEparse. –

3 0.79726976 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text

Author: Yohei Takaku ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: Because the real world evolves over time, numerous relations between entities written in presently available texts are already obsolete or will potentially evolve in the future. This study aims at resolving the intricacy in consistently compiling relations extracted from text, and presents a method for identifying constancy and uniqueness of the relations in the context of supervised learning. We exploit massive time-series web texts to induce features on the basis of time-series frequency and linguistic cues. Experimental results confirmed that the time-series frequency distributions contributed much to the recall of constancy identification and the precision of the uniqueness identification.

4 0.60876399 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE) gathering training data by aligning a database of facts with text – is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains. –

5 0.57225388 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

6 0.55794096 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

7 0.51103556 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

8 0.49429435 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

9 0.48006576 97 emnlp-2012-Natural Language Questions for the Web of Data

10 0.39002314 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution

11 0.33495036 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

12 0.33016858 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

13 0.3139075 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

14 0.27167928 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

15 0.21057819 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

16 0.20797989 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

17 0.20755631 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

18 0.19124301 44 emnlp-2012-Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web

19 0.18063715 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

20 0.17631336 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(34, 0.023), (60, 0.04), (63, 0.051), (65, 0.726), (74, 0.019), (80, 0.012), (86, 0.011), (95, 0.012)]
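The sparse (topicId, topicWeight) pairs above look like an LDA document-topic distribution with only the non-negligible topics listed. The sketch below reproduces that shape with scikit-learn's LatentDirichletAllocation on a stand-in corpus; the topic count, threshold, and training corpus here are assumptions.

```python
# Sketch of an LDA document-topic distribution, thresholded to a sparse list
# of (topicId, topicWeight) pairs. All settings here are assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "unsupervised relation extraction polysemy synonymy clustering web",
    "coreference resolution entities knowledge wikipedia sieve",
    "semantic parsing knowledge base freebase relations queries",
    "distributional semantics adjectives nouns composition vectors",
]
counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=4, random_state=0)
doc_topics = lda.fit_transform(counts)                # each row sums to 1

# Keep only topics with appreciable weight, mirroring the sparse listing above.
print([(i, round(float(w), 3)) for i, w in enumerate(doc_topics[0]) if w > 0.05])
```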

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98086995 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

2 0.80968118 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally

Abstract: Adjectival modification, particularly by expressions that have been treated as higherorder modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order) and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting – – noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.

3 0.74202961 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

Author: Lev Ratinov ; Dan Roth

Abstract: We explore the interplay of knowledge and structure in co-reference resolution. To inject knowledge, we use a state-of-the-art system which cross-links (or “grounds”) expressions in free text to Wikipedia. We explore ways of using the resulting grounding to boost the performance of a state-of-the-art co-reference resolution system. To maximize the utility of the injected knowledge, we deploy a learningbased multi-sieve approach and develop novel entity-based features. Our end system outperforms the state-of-the-art baseline by 2 B3 F1 points on non-transcript portion of the ACE 2004 dataset.

4 0.37315485 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

5 0.37055272 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

Author: Hui Yang

Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and usergenerated task specification into the general learning framework. A comparison to stateof-the-art systems and a user study jointly demonstrate that our techniques are highly effective. .

6 0.3230781 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

7 0.29880679 97 emnlp-2012-Natural Language Questions for the Web of Data

8 0.29689977 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

9 0.29501459 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic

10 0.29494199 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

11 0.29407623 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

12 0.28698438 100 emnlp-2012-Open Language Learning for Information Extraction

13 0.28657877 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

14 0.28345969 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text

15 0.28234029 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

16 0.26881805 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution

17 0.26103377 72 emnlp-2012-Joint Inference for Event Timeline Construction

18 0.24633478 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

19 0.24591734 6 emnlp-2012-A New Minimally-Supervised Framework for Domain Word Sense Disambiguation

20 0.23823532 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model