emnlp emnlp2012 emnlp2012-93 knowledge-graph by maker-knowledge-mining

93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction


Source: pdf

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE), which gathers training data by aligning a database of facts with text, is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Distant supervision for relation extraction (RE), which gathers training data by aligning a database of facts with text, is an efficient approach to scale RE to thousands of different relations. [sent-8, score-0.553]

2 However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. [sent-9, score-0.327]

3 For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. [sent-10, score-0.246]

4 We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. [sent-12, score-0.319]

5 One of the most promising approaches to IE that addresses this limitation is distant supervision, which generates training data automatically by aligning a ... [sent-19, score-0.204]

6 Figure 1: Training sentences generated through distant supervision for a database containing two facts. [sent-25, score-0.336]

7 In this paper we focus on distant supervision for relation extraction (RE), a subproblem of IE that addresses the extraction of labeled relations between two named entities. [sent-27, score-0.739]

8 The second challenge is that the same pair of entities may have multiple labels and it is unclear which label is instantiated by any textual mention of the given tuple. [sent-36, score-0.448]

9 For example, in Figure 1, the tuple (Barack Obama, United States) has two valid labels: BornIn and EmployedBy, each (latently) instantiated in different sentences. [sent-37, score-0.198]

10 For relation extraction the object is a tuple of two named entities. [sent-46, score-0.51]

11 Each mention of this tuple in text generates a different instance. [sent-47, score-0.39]

12 5% of the entity tuples in the training partition have more than one label. [sent-49, score-0.253]

13 In this paper we propose a novel graphical model, which we called MIML-RE, that targets MIML learning for relation extraction. [sent-51, score-0.246]

14 Our work makes the following contributions: (a) To our knowledge, MIML-RE is the first RE approach that jointly models both multiple instances (by modeling the latent labels assigned to instances) and multiple labels (by providing a simple method to capture dependencies between labels). [sent-52, score-0.534]

15 For example, our model learns that certain labels tend to be generated jointly while others cannot be jointly assigned to the same tuple. [sent-53, score-0.319]

16 2 Related Work Distant supervision for IE was introduced by Craven and Kumlien (1999), who focused on the extraction of binary relations between proteins and cells/tissues/diseases/drugs using the Yeast Protein Database as a source of distant supervision. [sent-55, score-0.427]

17 For example, most proposals heuristically transform distant supervision to traditional supervised learning (i. [sent-63, score-0.331]

18 (2010) model distant supervision for relation extraction as a multi-instance single-label problem, which allows multiple mentions for the same tuple but disallows more than one label per object. [sent-70, score-1.065]

19 They address the same problem we do (binary relation extraction) with a MIML model, but they make two approximations. [sent-73, score-0.246]

20 First, they use a deterministic model that aggregates latent instance labels into a set of labels for the corresponding tuple by OR-ing the classification results. [sent-74, score-0.569]

21 We use instead an object-level classifier that is trained jointly with the classifier that assigns latent labels to instances and can capture dependencies between labels. [sent-75, score-0.525]

22 In this problem, each image may be assigned multiple labels corresponding to the different scenes captured. [sent-80, score-0.243]

23 3 Distant Supervision for Relation Extraction Here we focus on distant supervision for the extraction of relations between two entities. [sent-93, score-0.427]

24 We define a relation as the construct r(e1, e2), where r is the relation name, e. [sent-94, score-0.492]

25 Note that there are entity tuples (e1, e2) that participate in multiple relations, r1, . [sent-99, score-0.253]

26 In other words, the tuple (e1, e2) is the object illustrated in Figure 2 and the different relation names are the labels. [sent-103, score-0.444]

27 We define an entity mention as a sequence of text tokens that matches the corresponding entity name in some text, and relation mention (for a given relation r(e1, e2)) as a pair of entity mentions of e1 and e2 in the same sentence. [sent-104, score-1.581]

28 Relation mentions thus correspond to the instances in Figure 2. [sent-105, score-0.22]

29 Furthermore, we assume that entity mentions are extracted by a different process, such as a named entity recognizer. [sent-107, score-0.515]

30 4 Model Our model assumes that each relation mention involving an entity pair has exactly one label, but allows the pair to exhibit multiple labels across different mentions. [sent-113, score-0.817]

31 Since we do not know the actual relation label of a mention in the distantly supervised setting, we model it using a latent variable z that can take one of the k pre-specified relation labels as well as an additional NIL label, if no relation is expressed by the corresponding mention. [sent-114, score-1.207]

32 We model the multiple relation labels an entity pair can assume (footnote 1: For this reason, we use relation mention and relation instance interchangeably in this paper.) [sent-115, score-1.281]

33 y plate to emphasize that it is a collection of binary classifiers (one per relation label), whereas the z classifier is multi-class. [sent-116, score-0.412]

34 using a multi-label classifier that takes as input the latent relation types of all the mentions involving that pair. [sent-118, score-0.589]

35 The z classifier assigns latent labels from L to individual relation mentions or NIL if no relation is expressed by the mention. [sent-121, score-0.75]

36 Each yj classifier decides if relation j holds for the given entity tuple, using the mention-level classifications as input. [sent-122, score-0.488]

37 Additionally, we define Pi (Ni) as the set of all known positive (negative) relation labels for the ith entity tuple. [sent-124, score-0.595]

38 (2011a) proposed models where Ni for the ith tuple (e1, e2) is defined as: {rj | rj(e1, ek) ∈ D, ek ≠ e2, rj ∉ Pi}, which is a subset of L \ Pi. [sent-128, score-0.424]

39 That is, entity e2 is considered a negative example for relation rj (in the context of entity e1) only if rj exists in the training data with a different value. [sent-129, score-0.802]

40 For example, it can learn that two relation labels (e. [sent-132, score-0.407]

41 , BornIn and SpouseOf) cannot be generated jointly for the same entity tuple. [sent-134, score-0.214]

42 So, if the z classifier outputs both these labels for different mentions of the same tuple, the y layer can cancel one of them. [sent-135, score-0.512]

43 Furthermore, the y classifiers can learn when two labels tend to appear jointly, e. [sent-136, score-0.224]

44 In the Expectation (E) step we assign latent mention labels using the current model (i. [sent-142, score-0.402]

45 The vector zi contains the latent mention-level classifications for the ith entity pair, while yi represents the corresponding set of gold-standard labels (that is, $y_i^{(r)} = 1$ if r ∈ Pi, and $y_i^{(r)} = 0$ for r ∈ Ni). [sent-150, score-0.704]

46 , we maximize the above joint probability for each entity pair in the database. [sent-155, score-0.219]

47 E-step: In this step we infer the mention-level classifications zi for each entity tuple, given all its mentions, the gold labels yi, and current model, i. [sent-157, score-0.539]

48 Formally, we seek to find: $z_i^* = \arg\max_z p(z \mid y_i, x_i, w_y, w_z)$. However, it is computationally intractable to consider all vectors z as there is an exponential number of possible assignments, so we approximate and consider each mention separately. [sent-160, score-0.192]

49 For example, if a particular mention label receives a high mention-level probability but it is known to be a negative label for that tuple, it will receive a low overall score. [sent-166, score-0.358]

50 M-step: In this step we find wy, wz that maximize the lower bound of the log-likelihood, i. [sent-167, score-0.283]

51 From equation (1) it is clear that this can be maximized separately with respect to wy and wz. [sent-170, score-0.246]

52 We obtained these weights using k + 1 logistic classifiers: one multi-class classifier for wz and k binary classifiers for wy, one for each relation label r ∈ L. [sent-175, score-0.763]

53 The main difference between the classifiers is how features are generated: the mention-level classifier computes its features based on xi, whereas the relation-level classifiers generate features based on the current assignments for zi and the corresponding relation label r. [sent-177, score-0.72]

54 2 Inference Given an entity tuple, we obtain its relation labels as follows. [sent-180, score-0.569]

55 then decide on the final relation labels using the top-level classifiers: $y_i^{(r)*} = \arg\max_{y \in \{0,1\}} p(y \mid z_i^*, w_y^{(r)})$ (6) [sent-184, score-0.407]

56 In our context, the initial values are labels assigned to zi, which are required to compute equation (2) in the first iteration (z0i). [sent-187, score-0.275]

57 We generate these values using a local logistic regression classifier that uses the same features as the mention-level classifier in the joint model but treats each relation mention independently. [sent-188, score-0.674]

58 We train this classifier using “traditional” distant supervision: for each relation in the database D we assume that all the corresponding mentions are positive examples for the corresponding label (Mintz et al. [sent-189, score-0.617]

59 Note that this heuristic repeats relation mentions with different labels for the tuples that participate in multiple relations. [sent-191, score-0.689]

60 For example, all the relation mentions in Figure 1 will yield datums with both the EmployedBy and BornIn labels. [sent-192, score-0.471]

61 For the second part of equation (2), we initialize the relation-level classifier with a model that replicates the at least one heuristic of Hoffmann et al. [sent-194, score-0.195]

62 Each $w_y^{(r)}$ model has a single feature with a high positive weight that is triggered when label r is assigned to any of the mentions in zi∗. [sent-196, score-0.312]

63 Avoiding overfitting: A naïve implementation of our approach leads to an unrealistic training scenario where the z classifier generates predictions (in equation (2)) for the same datums it has seen in training in the previous iteration. [sent-197, score-0.305]

64 Each classifier outputs $p(z \mid x_i^{(m)}, w_z)$ for tuples in a given fold during the E-step (equation (2)) and is trained (equation (3)) using tuples from all other folds. [sent-199, score-0.311]

65 At testing time, we compute $p(z \mid x_i^{(m)}, w_z)$ in equation (5) as the average of the probabilities of the above set of mention classifiers: $p(z \mid x_i^{(m)}, w_z) = \frac{1}{K} \sum_{j=1}^{K} p(z \mid x_i^{(m)}, w_z^j)$, where $w_z^j$ are the weights of the mention classifier responsible for fold j. [sent-200, score-0.547]

66 We found that this simple bagging model performs slightly better in practice (a couple of tenths of a percent) than training a single mention classifier on the latent mention labels generated in the last training iteration. [sent-201, score-0.724]

67 Inference during training: During the inference process in the E-step, the algorithm incrementally “flips” mention labels based on equation (2), for each group of mentions Mi. [sent-202, score-0.604]

68 Thus, zi0 changes as the algorithm progresses, which may impact the label assigned to the remaining mentions in that group. [sent-203, score-0.312]

69 To avoid any potential bias introduced by the arbitrary order of mentions as seen in the data, we randomize each group Mi before we inspect its mentions. [sent-204, score-0.191]

70 , 2005) to find entity mentions in text and constructed relation mentions only between entity mentions in the same sentence. [sent-210, score-1.143]

71 (2011) perform a second evaluation where they compute the accuracy of labels assigned to a set of relation mentions that they manually annotated. [sent-216, score-0.652]

72 During training, for each entity tuple (e1, e2), we retrieved up to 50 sentences that contain both entity mentions. [sent-233, score-0.522]

73 We used Stanford’s CoreNLP package to find entity mentions in text and, similarly to Riedel et al. [sent-234, score-0.353]

74 (2010), we construct relation mention candidates only between entity mentions in the same sentence. [sent-235, score-0.791]

75 We analyzed a set of over 2,000 relation mentions and we found that 39% of the mentions where e1 is an organization name and 36% of mentions where e1 is a person name do not express the corresponding relation. [sent-236, score-0.819]

76 At evaluation time, the KBP shared task requires the extraction of all relations r(e1, e2) given a query that contains only the first entity e1. [sent-237, score-0.292]

77 To accommodate this setup, we adjusted our sentence extraction component to use just e1 as the retrieval query and we kept up to 50 sentences that contain a mention of the input entity for each evaluation query. [sent-238, score-0.42]

78 The table indicates that having multiple mentions for an entity tuple is a very common phenomenon in both corpora, and that having multiple labels per tuple is more common in the Riedel dataset than KBP (7. [sent-253, score-0.964]

79 2 Features Our model requires two sets of features: one for the mention classifier (z) and one for the relation classifier (y). [sent-258, score-0.644]

80 ’s at least one heuristic using a single feature, which is set to true if at least one mention in zi has the label r, which is modeled by the current relation classifier. [sent-268, score-0.715]

81 The second group models the dependencies between relation labels. [sent-269, score-0.274]

82 This is implemented by a set of |L| − 1 features, where feature j is instantiated whenever the label modeled (r) is predicted jointly with another label rj (rj ∈ L, rj ≠ r) in zi. [sent-270, score-0.347]

83 3, this model follows the “traditional” distant supervision heuristic, similarly to (Mintz et al. [sent-281, score-0.297]

84 However, our implementation has several advantages over the original model: (a) we model each relation mention independently, whereas Mintz et al. [sent-283, score-0.467]

85 collapsed all the mentions of the same entity tuple into a single datum; (b) we allow multi-label outputs for a given entity tuple at prediction time by OR-ing the predictions for the individual relation mentions corresponding to the tuple (similarly to (Hoffmann et al. [sent-284, score-1.598]

86 We also allow multiple labels per tuple at training time, in which case we replicate the corresponding datum for each label. [sent-293, score-0.41]

87 This models RE as a MIML problem, but learns using a Perceptron algorithm and uses a deterministic “at least one” decision instead of a relation classifier. [sent-297, score-0.246]

88 MIML-RE has two parameters that require tuning: the number of EM epochs (T) and the number of folds for the mention classifiers (K). [sent-301, score-0.28]

89 Formally, we rank a relation r predicted for group i, i. [sent-321, score-0.246]

90 Note that the above ranking score does not include the probability of the relation classifier (equation (6)) for MIML-RE. [sent-325, score-0.349]

91 We see a smaller improvement in KBP (concentrated around the middle of the curve), likely because the number of entity tuples with multiple labels in training is small (see Table 1). [sent-350, score-0.414]

92 This corpus contains approximately 2% of the groups from the original testing partition, out of which 90 tuples have at least one known label and 1410 groups serve as negative examples. [sent-357, score-0.216]

93 A post-hoc inspection of the results indicates that, indeed, MIML-RE successfully eliminates undesired labels when two (or more) incompatible labels are jointly assigned to the same tuple. [sent-363, score-0.428]

94 Take for example the tuple (Mexico City, Mexico), for which the correct relation is /location/administrative division/country. [sent-364, score-0.444]

95 7 Conclusion In this paper we showed that distant supervision for RE, which generates training data by aligning a database of facts with text, poses a distinct multi-instance multi-label learning scenario. [sent-370, score-0.403]

96 In this setting, each entity pair to be modeled typically has multiple instances in the text and may have multiple labels in the database. [sent-371, score-0.38]

97 , the latent assignment of labels to instances and dependencies between labels assigned to the same entity pair. [sent-376, score-0.644]

98 Our model performs well even when not all aspects of the MIML scenario are common, and as seen in the discussion, shows significant improvement when evaluated on entity pairs with many labels or mentions. [sent-378, score-0.376]

99 Knowledge-based weak supervision for information extraction of overlapping relations. [sent-412, score-0.201]

100 End-to-end relation extraction using distant supervision from external semantic repositories. [sent-431, score-0.609]
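
The inference steps quoted in the sentences above (averaging the mention-level probabilities of the K fold-specific z classifiers, picking a label per mention, then aggregating mention labels into tuple labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and toy probabilities are hypothetical, and the deterministic at least one (OR) aggregation used here is the baseline heuristic the summary attributes to Hoffmann et al., standing in for the learned y classifiers of MIML-RE.

```python
from typing import Dict, List, Set

NIL = "NIL"  # label used when a mention expresses no relation


def average_fold_probs(fold_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average p(z | x, w_z^j) over the K fold-specific mention classifiers."""
    k = len(fold_probs)
    labels = fold_probs[0].keys()
    return {label: sum(p[label] for p in fold_probs) / k for label in labels}


def best_mention_label(probs: Dict[str, float]) -> str:
    """Pick the most probable latent label z for one mention."""
    return max(probs, key=probs.get)


def at_least_one(mention_labels: List[str]) -> Set[str]:
    """OR-aggregation: a tuple gets every non-NIL label seen on its mentions."""
    return {label for label in mention_labels if label != NIL}


# Hypothetical probabilities for three mentions of one entity tuple,
# each scored by K = 2 fold classifiers.
mentions = [
    [{"BornIn": 0.7, "EmployedBy": 0.2, NIL: 0.1},
     {"BornIn": 0.5, "EmployedBy": 0.3, NIL: 0.2}],
    [{"BornIn": 0.1, "EmployedBy": 0.6, NIL: 0.3},
     {"BornIn": 0.2, "EmployedBy": 0.6, NIL: 0.2}],
    [{"BornIn": 0.1, "EmployedBy": 0.1, NIL: 0.8},
     {"BornIn": 0.2, "EmployedBy": 0.2, NIL: 0.6}],
]

z = [best_mention_label(average_fold_probs(m)) for m in mentions]
tuple_labels = at_least_one(z)
print(z, sorted(tuple_labels))
```

In the model described above, the final step would instead be one binary classifier per relation label that takes the vector z as input, which is what lets it cancel incompatible labels or exploit labels that tend to co-occur.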


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kbp', 0.306), ('miml', 0.275), ('wz', 0.254), ('relation', 0.246), ('riedel', 0.228), ('hoffmann', 0.22), ('tuple', 0.198), ('mention', 0.192), ('mentions', 0.191), ('wy', 0.186), ('im', 0.179), ('zi', 0.178), ('entity', 0.162), ('distant', 0.162), ('labels', 0.161), ('supervision', 0.135), ('mintz', 0.131), ('classifier', 0.103), ('rj', 0.1), ('tuples', 0.091), ('yi', 0.09), ('yr', 0.071), ('bornin', 0.071), ('analytics', 0.068), ('label', 0.067), ('xi', 0.067), ('extraction', 0.066), ('relations', 0.064), ('surdeanu', 0.064), ('classifiers', 0.063), ('equation', 0.06), ('nguyen', 0.059), ('pyi', 0.059), ('curve', 0.057), ('ni', 0.056), ('assigned', 0.054), ('dataset', 0.054), ('scenario', 0.053), ('jointly', 0.052), ('datum', 0.051), ('latent', 0.049), ('craven', 0.046), ('re', 0.046), ('yj', 0.042), ('aligning', 0.042), ('ji', 0.041), ('bag', 0.04), ('bunescu', 0.04), ('grishman', 0.04), ('bellare', 0.04), ('argzmaxp', 0.039), ('brodley', 0.039), ('employedby', 0.039), ('mexico', 0.039), ('database', 0.039), ('mihai', 0.039), ('classifications', 0.038), ('approximations', 0.038), ('mi', 0.037), ('traditional', 0.034), ('datums', 0.034), ('infoboxes', 0.034), ('stanford', 0.034), ('pi', 0.033), ('heuristic', 0.032), ('negative', 0.032), ('sun', 0.032), ('layer', 0.031), ('logp', 0.031), ('points', 0.031), ('ie', 0.031), ('nil', 0.031), ('corenlp', 0.031), ('kumlien', 0.031), ('curves', 0.03), ('freebase', 0.03), ('mooney', 0.03), ('logistic', 0.03), ('implementation', 0.029), ('instances', 0.029), ('maximize', 0.029), ('ralph', 0.028), ('image', 0.028), ('organizers', 0.028), ('odf', 0.028), ('moschitti', 0.028), ('hoffman', 0.028), ('scene', 0.028), ('anford', 0.028), ('pair', 0.028), ('dependencies', 0.028), ('bagging', 0.027), ('positives', 0.027), ('hoa', 0.027), ('outputs', 0.026), ('predictions', 0.026), ('ith', 0.026), ('approximately', 0.026), ('facts', 0.025), ('epochs', 0.025)]
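
The page does not state how the similarity values in the list that follows are derived from these term weights. A common choice for comparing sparse tfidf vectors is cosine similarity, sketched here under that assumption. The first vector reuses a few weights from the list above; the second vector and the function name are hypothetical.

```python
import math
from typing import Dict


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# A few weights copied from the list above.
this_paper = {"kbp": 0.306, "miml": 0.275, "relation": 0.246, "distant": 0.162}
# Hypothetical weights for some other paper (made up for the example).
other_paper = {"relation": 0.3, "distant": 0.2, "parser": 0.5}

print(round(cosine(this_paper, other_paper), 3))
```

Only terms shared by both vectors contribute to the dot product, so two papers with no overlapping vocabulary score zero.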

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999881 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE), which gathers training data by aligning a database of facts with text, is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains.

2 0.2279689 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

3 0.18623945 84 emnlp-2012-Linking Named Entities to Any Database

Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates

Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.

4 0.1511967 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

5 0.14301577 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

Author: Lev Ratinov ; Dan Roth

Abstract: We explore the interplay of knowledge and structure in co-reference resolution. To inject knowledge, we use a state-of-the-art system which cross-links (or “grounds”) expressions in free text to Wikipedia. We explore ways of using the resulting grounding to boost the performance of a state-of-the-art co-reference resolution system. To maximize the utility of the injected knowledge, we deploy a learning-based multi-sieve approach and develop novel entity-based features. Our end system outperforms the state-of-the-art baseline by 2 B3 F1 points on non-transcript portion of the ACE 2004 dataset.

6 0.13935721 19 emnlp-2012-An Entity-Topic Model for Entity Linking

7 0.13085893 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

8 0.12070706 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

9 0.12023105 72 emnlp-2012-Joint Inference for Event Timeline Construction

10 0.11139368 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic

11 0.11112683 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

12 0.10716929 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

13 0.1063823 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

14 0.10604998 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

15 0.10407135 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

16 0.10375594 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text

17 0.09225107 100 emnlp-2012-Open Language Learning for Information Extraction

18 0.083758228 41 emnlp-2012-Entity based QA Retrieval

19 0.077899203 97 emnlp-2012-Natural Language Questions for the Web of Data

20 0.070951916 24 emnlp-2012-Biased Representation Learning for Domain Adaptation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.262), (1, 0.271), (2, 0.057), (3, -0.184), (4, -0.049), (5, -0.081), (6, 0.029), (7, 0.145), (8, -0.114), (9, 0.024), (10, -0.034), (11, 0.093), (12, 0.007), (13, -0.165), (14, -0.139), (15, 0.106), (16, -0.07), (17, 0.053), (18, -0.022), (19, -0.062), (20, -0.024), (21, -0.097), (22, 0.035), (23, -0.045), (24, -0.186), (25, -0.117), (26, 0.024), (27, 0.083), (28, -0.031), (29, -0.038), (30, 0.126), (31, 0.053), (32, -0.026), (33, 0.111), (34, -0.11), (35, 0.061), (36, -0.014), (37, 0.03), (38, 0.124), (39, -0.117), (40, 0.022), (41, -0.061), (42, 0.011), (43, 0.068), (44, 0.011), (45, -0.076), (46, 0.159), (47, 0.059), (48, -0.039), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96560532 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE), which gathers training data by aligning a database of facts with text, is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains.

2 0.64804804 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

3 0.64024782 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text

Author: Yohei Takaku ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: Because the real world evolves over time, numerous relations between entities written in presently available texts are already obsolete or will potentially evolve in the future. This study aims at resolving the intricacy in consistently compiling relations extracted from text, and presents a method for identifying constancy and uniqueness of the relations in the context of supervised learning. We exploit massive time-series web texts to induce features on the basis of time-series frequency and linguistic cues. Experimental results confirmed that the time-series frequency distributions contributed much to the recall of constancy identification and the precision of the uniqueness identification.

4 0.63742208 84 emnlp-2012-Linking Named Entities to Any Database

Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates

Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.

5 0.5933643 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin

Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web.

Introduction: Relation extraction aims at discovering semantic relations between entities. It is an important task that has many applications in answering factoid questions, building knowledge bases and improving search engine relevance. The web has become a massive potential source of such relations. However, its open nature brings an open-ended set of relation types. To extract these relations, a system should not assume a fixed set of relation types, nor rely on a fixed set of relation argument types. The past decade has seen some promising solutions, unsupervised relation extraction (URE) algorithms that extract relations from a corpus without knowing the relations in advance. However, most algorithms (Hasegawa et al., 2004; Shinyama and Sekine, 2006; Chen et al., 2005) rely on tagging predefined types of entities as relation arguments, and thus are not well-suited for the open domain. Recently, Kok and Domingos (2008) proposed Semantic Network Extractor (SNE), which generates argument semantic classes and sets of synonymous relation phrases at the same time, thus avoiding the requirement of tagging relation arguments of predefined types. However, SNE has 2 limitations: 1) Following previous URE algorithms, it only uses features from the set of input relation instances for clustering. Empirically we found that it fails to group many relevant relation instances. These features, such as the surface forms of arguments and lexical sequences in between, are very sparse in practice. In contrast, there exist several well-known corpus-level semantic resources that can be automatically derived from a source corpus and are shown to be useful for generating the key elements of a relation: its 2 argument semantic classes and a set of synonymous phrases. For example, semantic classes can be derived from a source corpus with contextual distributional similarity and web table co-occurrences. The “synonymy” problem for clustering relation instances could potentially be better solved by adding these resources. 2) SNE assumes that each entity or relation phrase belongs to exactly one cluster, thus is not able to effectively handle polysemy of relation phrases. An example of a polysemous phrase is be the currency of as in 2 triples (Work done during an internship at Microsoft Research Asia. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1027–1037, Jeju Island, Korea, 12–14 July 2012. © 2012 Association for Computational Linguistics.)

6 0.4830164 100 emnlp-2012-Open Language Learning for Information Extraction

7 0.47651637 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

8 0.46897581 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

9 0.46614271 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

10 0.45348707 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

11 0.39975047 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

12 0.37657019 72 emnlp-2012-Joint Inference for Event Timeline Construction

13 0.37267733 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic

14 0.36944026 19 emnlp-2012-An Entity-Topic Model for Entity Linking

15 0.36194137 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

16 0.3538264 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

17 0.34254283 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

18 0.33551741 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

19 0.32905489 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

20 0.32858941 97 emnlp-2012-Natural Language Questions for the Web of Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.023), (16, 0.03), (18, 0.275), (25, 0.023), (34, 0.07), (45, 0.011), (60, 0.146), (63, 0.078), (64, 0.03), (65, 0.057), (70, 0.015), (73, 0.023), (74, 0.033), (76, 0.051), (80, 0.016), (86, 0.026), (95, 0.021)]
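
The LDA weights above are given as sparse (topicId, topicWeight) pairs rather than one weight per topic. The sketch below shows one way to read such a list: expand it into a dense vector, find the dominant topic, and sum the listed mass. The helper `to_dense` and the constant `NUM_TOPICS = 100` are assumptions made for illustration; the page does not state how many topics the model uses.

```python
# Sparse (topicId, topicWeight) pairs copied from the list above.
lda_topics = [
    (2, 0.023), (16, 0.03), (18, 0.275), (25, 0.023), (34, 0.07),
    (45, 0.011), (60, 0.146), (63, 0.078), (64, 0.03), (65, 0.057),
    (70, 0.015), (73, 0.023), (74, 0.033), (76, 0.051), (80, 0.016),
    (86, 0.026), (95, 0.021),
]

# Assumption: the topic count is not stated on the page; 100 is a guess
# that is merely large enough to hold the highest listed topicId (95).
NUM_TOPICS = 100


def to_dense(sparse, num_topics):
    """Expand (topic_id, weight) pairs into a list with one slot per topic."""
    dense = [0.0] * num_topics
    for topic_id, weight in sparse:
        dense[topic_id] = weight
    return dense


dense = to_dense(lda_topics, NUM_TOPICS)

# The topic carrying the largest share of this paper's probability mass.
dominant_topic, dominant_weight = max(lda_topics, key=lambda pair: pair[1])

# Total mass of the listed topics.
listed_mass = sum(weight for _, weight in lda_topics)

print(dominant_topic, dominant_weight, round(listed_mass, 3))
```

The listed weights sum to less than 1, which would be consistent with low-weight topics being omitted from the display, although the page does not confirm this.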

similar papers list:

simIndex simValue paperId paperTitle

1 0.87468851 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification

Author: Shoushan Li ; Shengfeng Ju ; Guodong Zhou ; Xiaojun Li

Abstract: Active learning is a promising way for sentiment classification to reduce the annotation cost. In this paper, we focus on the imbalanced class distribution scenario for sentiment classification, wherein the number of positive samples is quite different from that of negative samples. This scenario posits new challenges to active learning. To address these challenges, we propose a novel active learning approach, named co-selecting, by taking both the imbalanced class distribution issue and uncertainty into account. Specifically, our co-selecting approach employs two feature subspace classifiers to collectively select most informative minority-class samples for manual annotation by leveraging a certainty measurement and an uncertainty measurement, and in the meanwhile, automatically label most informative majority-class samples, to reduce human-annotation efforts. Extensive experiments across four domains demonstrate great potential and effectiveness of our proposed co-selecting approach to active learning for imbalanced sentiment classification.

same-paper 2 0.78964019 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning

Abstract: Distant supervision for relation extraction (RE) – gathering training data by aligning a database of facts with text – is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains.

3 0.5844171 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.

4 0.5769549 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

5 0.57365942 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

Author: David McClosky ; Christopher D. Manning

Abstract: We present a distantly supervised system for extracting the temporal bounds of fluents (relations which only hold during certain times, such as attends school). Unlike previous pipelined approaches, our model does not assume independence between each fluent or even between named entities with known connections (parent, spouse, employer, etc.). Instead, we model what makes timelines of fluents consistent by learning cross-fluent constraints, potentially spanning entities as well. For example, our model learns that someone is unlikely to start a job at age two or to marry someone who hasn’t been born yet. Our system achieves a 36% error reduction over a pipelined baseline.

6 0.57093388 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

7 0.56876016 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

8 0.56815523 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

9 0.5658145 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

10 0.56580663 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation

11 0.56451923 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

12 0.5610984 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

13 0.56050771 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

14 0.56015933 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

15 0.55986774 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

16 0.55943251 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging

17 0.55769372 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

18 0.55760014 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

19 0.55719459 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

20 0.55667412 72 emnlp-2012-Joint Inference for Event Timeline Construction