emnlp emnlp2012 emnlp2012-40 emnlp2012-40-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin
Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a realworld dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web. Ralph Grishman1 Chin-Yew Lin2 2Microsoft Research Asia Beijing, China { shumings cyl } @mi cro s o ft . com , that has many applications in answering factoid questions, building knowledge bases and improving search engine relevance. The web has become a massive potential source of such relations. However, its open nature brings an open-ended set of relation types. To extract these relations, a system should not assume a fixed set of relation types, nor rely on a fixed set of relation argument types. The past decade has seen some promising solutions, unsupervised relation extraction (URE) algorithms that extract relations from a corpus without knowing the relations in advance. However, most algorithms (Hasegawa et al., 2004, Shinyama and Sekine, 2006, Chen et. al, 2005) rely on tagging predefined types of entities as relation arguments, and thus are not well-suited for the open domain. Recently, Kok and Domingos (2008) proposed Semantic Network Extractor (SNE), which generates argument semantic classes and sets of synonymous relation phrases at the same time, thus avoiding the requirement of tagging relation arguments of predefined types. However, SNE has 2 limitations: 1) Following previous URE algorithms, it only uses features from the set of input relation instances for clustering. Empirically we found that it fails to group many relevant relation instances. These features, such as the surface forms of arguments and lexical sequences in between, are very sparse in practice. In contrast, there exist several well-known corpus-level semantic resources that can be automatically derived from a source corpus and are shown to be useful for generating the key elements of a relation: its 2 argument semantic classes and a set of synonymous phrases. For example, semantic classes can be derived from a source corpus with contextual distributional simi1 Introduction Relation extraction aims at discovering semantic larity and web table co-occurrences. The “synonymy” 1 problem for clustering relation instances relations between entities. It is an important task * Work done during an internship at Microsoft Research Asia 1027 LParnogcue agdein Lgesa ornf tihneg, 2 p0a1g2e Jso 1in02t C7–o1n0f3e7re,n Jce ju on Is Elanmdp,ir Kicoarlea M,e 1t2h–o1d4s J iunly N 2a0tu1r2a.l ? Lc a2n0g1u2ag Aes Psorcoicaetsiosin fgo arn Cdo Cmopmutpauti oantiaoln Lailn Ngautiustriacls could potentially be better solved by adding these resources. 2) SNE assumes that each entity or relation phrase belongs to exactly one cluster, thus is not able to effectively handle polysemy of relation phrases2. An example of a polysemous phrase is be the currency of as in 2 triples