emnlp emnlp2013 emnlp2013-68 knowledge-graph by maker-knowledge-mining

68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction


Source: pdf

Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa

Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks. [sent-9, score-0.258]

2 1 Introduction: Open Relation Extraction (ORE) (Banko and Etzioni, 2008) has become prevalent over traditional relation extraction methods, especially on the Web, because of the intrinsic difficulty in training individual extractors for every single relation. [sent-10, score-0.183]

3 Broadly speaking, existing ORE approaches can be grouped according to the level of sophistication of the NLP techniques they rely upon: (1) shallow parsing, (2) dependency parsing and (3) semantic role labelling (SRL). [sent-11, score-0.235]

4 , 2012), identify relations by matching patterns over such tags. [sent-14, score-0.169]

5 Dependency parsing gives unambiguous relations among the words in the sentence, and the ORE approaches in this category such as PATTY (Nakashole et al. [sent-15, score-0.169]

6 , 2013) identify whole subtrees connecting the relation predicate and its arguments. [sent-18, score-0.235]

7 , 2003), add roles to each node in a parse tree, enabling ORE approaches that identify the precise connection between each argument and the predicate in a relation, independently. [sent-20, score-0.214]

8 Shallow methods handle ten times more sentences than dependency parsing methods, which in turn handle ten times more sentences than semantic parsing methods. [sent-26, score-0.218]

9 … identify the precise connection between the argument and the predicate words in a relation) over a dependency parse tree (i.e., …). [sent-30, score-0.221]

10 One side of the argument favors shallow methods, claiming deep NLP costs orders of magnitude more and provides much less dramatic gains in terms of effectiveness (Christensen et al. [sent-40, score-0.27]

11 , 2012) extracts textual patterns from sentences based on paths in the dependency graph. [sent-54, score-0.186]

12 For all pairs of named entities, PATTY finds the shortest … Table 1 (Binary relation datasets): WEB-500, Search Snippets, 500 sentences, 461 relations; NYT-500, New York Times, 500 sentences, 510 relations; PENN-100, Penn Treebank, 100 sentences, 51 relations. [sent-55, score-0.221]

13 OLLIE merges binary relations that differ only in the preposition and second argument to produce n-ary extractions, as in: (A, “met with”, B) and (A, “met in”, C) leading to (A, “met”, [with B, in C]). [sent-62, score-0.368]
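
A minimal sketch of the merging step described above, assuming binary extractions arrive as (arg1, “verb preposition”, arg2) triples; the function name and data shapes are ours, not OLLIE's API:

```python
from collections import defaultdict

def merge_nary(triples):
    # Group binary (arg1, "verb prep", arg2) triples by (arg1, verb) and
    # collect the (preposition, arg2) pairs as extra n-ary arguments.
    merged = defaultdict(list)
    for a, rel, b in triples:
        verb, _, prep = rel.partition(" ")  # "met with" -> ("met", "with")
        merged[(a, verb)].append((prep, b))
    return [(a, verb, args) for (a, verb), args in merged.items()]

print(merge_nary([("A", "met with", "B"), ("A", "met in", "C")]))
# [('A', 'met', [('with', 'B'), ('in', 'C')])]
```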

14 The shortest path between the two entities along with the shortest path between relational words and an entity are used as input to the tree kernel. [sent-65, score-0.212]

15 An expanded set of syntactic patterns based on those from ReVerb is used to generate relation candidates. [sent-66, score-0.221]

16 One of its major limitations is that it is only able to label arguments of verb predicates. [sent-73, score-0.175]

17 Lund, on the other hand, is based on dependency parsing and is trained on both PropBank and NomBank, making it able to extract relations with both verb and noun predicates. [sent-74, score-0.458]

18 We manually annotated the relations for WEB-500 and NYT-500 and use the PENN-100 annotations provided by TreeKernel’s authors (Xu et al. [sent-86, score-0.167]

19 We identify exactly two entities and a trigger (a single token indicating a relation–see Section A. [sent-89, score-0.397]

20 In addition, we specify a window of tokens allowed to be in a relation, including modifiers of the trigger and prepositions connecting triggers to their arguments. [sent-91, score-0.579]

21 For each sentence annotated with two entities, a system must extract a string representing the relation between them. [sent-92, score-0.219]

22 Our evaluation method deems an extraction as correct if it contains the trigger and allowed tokens only. [sent-93, score-0.328]

23 where “Google” and “YouTube” are entities of the type organization, “acquisition” is the trigger and the allowed tokens are “acquisition”, “’s” and “of”. [sent-95, score-0.478]
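
A minimal sketch of this matching rule; whitespace tokenization and the function name are our simplifications, not the paper's implementation:

```python
def is_correct_extraction(extraction, trigger, allowed):
    # Correct iff the extracted string contains the trigger and every token
    # falls inside the allowed window of tokens.
    tokens = extraction.split()
    return trigger in tokens and all(t in allowed for t in tokens)

allowed = {"acquisition", "'s", "of"}
print(is_correct_extraction("'s acquisition of", "acquisition", allowed))      # True
print(is_correct_extraction("acquisition of shares", "acquisition", allowed))  # False
```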

24 Since every method uses a different tool to recognize entities, we try to ensure every method is able to recognize the entities marked by our annotators. [sent-103, score-0.165]

25 For methods that extract relations between noun phrases (ReVerb, OLLIE, SwiRL and Lund), there is the additional task of identifying whether a noun phrase containing additional words surrounding “Europe” and “Asia” is still a reference to the annotated entity. [sent-108, score-0.331]

26 Additional steps to merge relations and remove infrequent relations are not applied. [sent-116, score-0.262]

27 In addition, we assume there is only one relation between a pair of entities in a sentence. [sent-117, score-0.278]

28 The number of entity pairs with more than one relation was insignificant in our datasets (less than 0. [sent-118, score-0.224]

29 This is because WEB-500 sentences were collected by querying a search engine with known relation instances. [sent-126, score-0.209]

30 For efficiency, there is a clear separation of approximately one order of magnitude among methods based on shallow parsing (ReVerb and SONEX), dependency parsing (OLLIE, EXEMPLAR[M], EXEMPLAR[S], and PATTY) and semantic parsing (SwiRL and Lund). [sent-139, score-0.311]

31 This is possible because EXEMPLAR looks at each argument separately, as opposed to the whole subtree connecting two arguments. [sent-144, score-0.21]

32 PATTY relies on redundancy of extractions to normalize the produced relations in order to recover from mistakes made at the sentence level. [sent-147, score-0.202]

33 Figure 2 illustrates the dominance relation differently, using precision versus recall. [sent-148, score-0.183]

34 The importance of relations triggered by nouns is illustrated by the higher recall of SONEX and Lund when compared, respectively, to ReVerb and SwiRL, similar methods that handle verb triggers only. [sent-154, score-0.451]

35 According to our annotation style, there is a relation “rivals” between “P&G” and “Kao Corp. [sent-172, score-0.183]

36 On the other hand, the original annotations for PENN-100 consider only the relation between “Kao Corp. [sent-174, score-0.183]

37 These differences in annotation illustrate the challenges of producing a benchmark for open relation extraction. [sent-176, score-0.183]

38 Corpus-level evaluations consider an extracted relation as correct regardless of whether a method was able to identify one or all sentences that describe this relation. [sent-180, score-0.238]

39 , merge near-duplicate relations and co-referential entities) all relations described in a corpus. [sent-183, score-0.262]

40 Musician–Musician), as opposed to the relation itself, which includes its two arguments. [sent-195, score-0.183]

41 The evaluations of ReVerb and OLLIE consider any noun phrase as a potential argument, while the evaluations of TreeKernel and SONEX consider named entities only. [sent-196, score-0.235]

42 Every sentence is annotated with a single relation trigger and its arguments. [sent-202, score-0.521]

43 … r(a1, …, an), where r is the relation name and ai is its i-th argument. [sent-207, score-0.214]

44 We only use the extracted relation whose name contains the annotated trigger, if it exists. [sent-208, score-0.25]

45 An argument of such a relation is deemed a correct extraction if it is annotated in the sentence; otherwise, it is deemed incorrect. [sent-209, score-0.42]
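
A hedged sketch of this n-ary scoring step, assuming extractions arrive as dicts with “name” and “args” fields (the data shapes are ours):

```python
def score_nary_arguments(extractions, trigger, annotated_args):
    # Use the extracted relation whose name contains the annotated trigger,
    # if it exists; otherwise nothing is scored for this sentence.
    rel = next((r for r in extractions if trigger in r["name"].split()), None)
    if rel is None:
        return []
    # Each extracted argument is correct iff it was annotated in the sentence.
    return [(arg, arg in annotated_args) for arg in rel["args"]]
```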

46 Our automatic annotator identifies pairs of entities and a trigger of the relation between them. [sent-219, score-0.633]

47 Given two entities appearing within 10 tokens of each other in a sentence, our annotator checks whether there is a relation connecting them in Freebase. [sent-221, score-0.356]

48 If such a relation exists, the annotator looks for a trigger in the sentence. [sent-222, score-0.565]

49 A trigger must be a synonym for the Freebase relation (according to WordNet) and its distance to the nearest entity cannot be more than 5 tokens. [sent-223, score-0.526]
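
A hedged sketch of the annotator's logic as described in the last few sentences; relations_between and synonyms_of stand in for Freebase and WordNet lookups and are hypothetical helpers:

```python
def annotate(tokens, e1, e2, relations_between, synonyms_of):
    # Entities must appear within 10 tokens of each other.
    if abs(e1 - e2) > 10:
        return None
    for rel in relations_between(e1, e2):   # Freebase relations for the pair
        synonyms = synonyms_of(rel)         # WordNet synonyms of the relation
        for i, tok in enumerate(tokens):
            # A trigger is a synonym of the Freebase relation at most
            # 5 tokens from the nearest entity.
            if tok in synonyms and min(abs(i - e1), abs(i - e2)) <= 5:
                return rel, i
    return None
```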

50 Our evaluation is able to assess the effectiveness of different methods by specifying a trigger and a window of allowed tokens for each relation. [sent-239, score-0.389]

51 Rules could be learned from both dependency parsing and shallow parsing, or just shallow parsing if computing time is extremely limited. [sent-244, score-0.314]

52 A EXEMPLAR: ORE methods must recognize relations from the text alone. [sent-247, score-0.166]

53 Banko and Etzioni claim that more than 90% of binary relations are expressed through a few syntactic patterns, such as “verb” and “noun+preposition” (Banko and Etzioni, 2008). [sent-249, score-0.164]

54 This section presents a study focusing on how n-ary relations (n > 2) are expressed in English, based on 100 distinct relations manually annotated from a random sample of 514 sentences in the New York Times Corpus (Sandhaus, 2008). [sent-251, score-0.324]

55 For instance, a relation “met with” indicates that the first argument is the subject and the second one is the object of “with”. [sent-255, score-0.344]

56 In order to represent n-ary relations, our patterns do not contain prepositions, possessives or any other word connecting the relation to the argument. [sent-256, score-0.273]

57 For instance, the sentence “Obama met with Putin in Russia” contains the relation “meet” along with three arguments: “Obama” (subject), “Putin” (object of preposition with) and “Russia” (object of preposition in). [sent-257, score-0.379]
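
The same instance written as a plain role-to-argument mapping; the dict representation is ours, with prepositional roles named after their preposition as described later in this summary:

```python
# "Obama met with Putin in Russia" as an n-ary relation instance.
meet = {
    "relation": "meet",
    "subject": "Obama",
    "with object": "Putin",
    "in object": "Russia",
}
```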

58 A single relation can be represented in different ways using the patterns shown in Table 5. [sent-259, score-0.221]

59 For instance, the relation “donate” can be expressed as an active verb (“donates”), passive voice (“was donated by”) and nominalized verb (“donation”). [sent-260, score-0.417]

60 In addition, an apposition+noun relation can be expressed as a copula+noun relation by replacing the apposition with the copula verb “be”. [sent-261, score-0.637]

61 Table 6 presents the relation types found through our analysis. [sent-271, score-0.183]

62 We developed EXEMPLAR to specifically recognize these relation types, including their variations. [sent-272, score-0.218]

63 An argument role defines how an argument participates in a relation. [sent-274, score-0.295]

64 In ORE, the roles for each relation are not provided and must also be recognized from the text. [sent-275, score-0.266]

65 We use the following roles: subject, direct object and prep object. [sent-276, score-0.31]

66 An argument has the role prep object when it is connected to the relation by a preposition. [sent-277, score-0.541]

67 The roles of prepositional objects consist of their preposition and the suffix “object”, indicating that each preposition corresponds to a different role. [sent-278, score-0.229]

68 … is an object of the preposition “of” and has the role of object. [sent-283, score-0.213]

69 Multiple entities can play the same role in a relation instance. [sent-284, score-0.311]

70 Verb relations accept all three roles, while copula+noun and verb+noun relations accept subject and prep object only. [sent-287, score-0.635]
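
The same constraints as a lookup table; the dict form and role spellings are ours:

```python
# Roles accepted by each relation type, as stated above.
ACCEPTED_ROLES = {
    "verb":        {"subject", "direct object", "prep object"},
    "copula+noun": {"subject", "prep object"},
    "verb+noun":   {"subject", "prep object"},
}
```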

71 Figure 3: A relation instance extracted by EXEMPLAR for the sentence “NFL approves Falcons’ new stadium in Atlanta”, consisting of subject: “NFL”, relation: “approve new stadium”, of object: “Falcons”, in object: “Atlanta”. [sent-292, score-0.776]

72 [Figure residue: dependency parse of “NFL approves Falcons’ new stadium in Atlanta”, with arcs labeled nsubj, poss and amod.] [sent-293, score-0.232]

73 Entities are in bold, triggers are underlined and arrows represent dependencies. [sent-295, score-0.174]

74 syntactic roles correspond to the same semantic role across different relations (Chambers and Jurafsky, 2011). [sent-296, score-0.247]

75 A.1 The method: EXEMPLAR takes a stream of textual documents and extracts instances of n-ary relations, as illustrated in Figure 3. [sent-299, score-0.163]

76 In this example, “stadium” depends on “approves” and the arrow connecting them can be read as “the direct object of approves is stadium”. [sent-306, score-0.207]

77 A.3 Detecting triggers: After recognizing entities, EXEMPLAR detects relation triggers. [sent-313, score-0.351]

78 A trigger is a single token that indicates the presence of a relation. [sent-314, score-0.302]

79 EXEMPLAR also uses triggers to determine the relation name, as discussed later. [sent-317, score-0.322]

80 A trigger can be any noun or verb that was not tagged as being part of an entity mention. [sent-318, score-0.542]

81 A verb relation is triggered by a verb that does not include a noun as its direct object. [sent-321, score-0.536]

82 A noun must be a nominalized verb to be a trigger for verb relations. [sent-323, score-0.668]

83 The name of a relation triggered by a nominalized verb is the trigger’s original verb (before nominalization). [sent-326, score-0.535]
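
A hedged sketch combining the trigger tests from the preceding sentences; the token fields are hypothetical, the verb+noun branch follows the verb+noun description below, and copula+noun triggers are omitted:

```python
def trigger_type(tok):
    # Triggers are nouns or verbs not tagged as part of an entity mention.
    if tok["in_entity"] or tok["pos"] not in ("NOUN", "VERB"):
        return None
    if tok["pos"] == "VERB":
        # A verb without a noun direct object triggers a verb relation;
        # with one, the verb+noun case described below applies instead.
        return "verb+noun" if tok["has_noun_dobj"] else "verb"
    # A noun triggers a verb relation only if it is a nominalized verb.
    return "verb" if tok["is_nominalized_verb"] else None

print(trigger_type({"in_entity": False, "pos": "NOUN",
                    "is_nominalized_verb": True}))  # verb
```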

84 The copula used in the relation name can be a verb with copula dependency to the trigger, or the verb “be” for … [sent-330, score-0.653]

85 The relation name is the concatenation of the copula’s lemma and the trigger’s lemma along with its modifiers. [sent-334, score-0.276]

86 For instance, the relation in the sentence “Jaden and Willow, the famous children of Will Smith” is “be famous child”. [sent-335, score-0.183]

87 EXEMPLAR recognizes two triggers for each verb+noun relation: a verb and a noun acting as its direct object. [sent-337, score-0.338]

88 The relation name is defined by concatenating the verb’s lemma with the noun and its modifiers. [sent-338, score-0.327]

89 In our running example, “approves” and “stadium” trigger the relation “approve new stadium”. [sent-339, score-0.485]
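
A minimal sketch of the verb+noun naming rule, assuming the modifiers are already extracted (argument shapes and the function name are ours):

```python
def verb_noun_relation_name(verb_lemma, noun, noun_modifiers):
    # Concatenate the verb's lemma with the noun and its modifiers.
    return " ".join([verb_lemma, *noun_modifiers, noun])

print(verb_noun_relation_name("approve", "stadium", ["new"]))  # approve new stadium
```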

90 A.4 Detecting candidate arguments: After relation triggers are identified, EXEMPLAR proceeds to detect their candidate arguments. [sent-341, score-0.38]

91 For this, we look at the dependency between each entity and a trigger separately. [sent-342, score-0.433]

92 EXEMPLAR relies on two observations: (1) an argument is often adjacent to a trigger in the dependency graph, and (2) the type of the dependency can accurately predict whether an entity is an argument for the relation or not. [sent-343, score-0.994]

93 EXEMPLAR identifies as a candidate argument every entity that is connected to a trigger, as long as their dependency type is listed in Table 7. [sent-345, score-0.288]

94 The entities “NFL” and “Atlanta” depend on the trigger “approves”, and “Falcons” depends on the trigger “stadium”. [sent-347, score-0.725]

95 Since their dependency types are listed in Table 7, these entities are marked as candidate arguments. [sent-348, score-0.185]
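
A hedged sketch of candidate-argument detection; Table 7 itself is not reproduced in this summary, so ALLOWED_DEPS is a stand-in with illustrative members only:

```python
# Stand-in for the dependency types listed in Table 7 (members are illustrative).
ALLOWED_DEPS = {"nsubj", "dobj", "prep", "poss"}

def candidate_arguments(edges):
    # An entity adjacent to a trigger in the dependency graph becomes a
    # candidate argument when the edge's type is one of the allowed ones.
    return [(ent, trig) for ent, trig, dep in edges if dep in ALLOWED_DEPS]

print(candidate_arguments([("NFL", "approves", "nsubj"),
                           ("Falcons", "stadium", "poss")]))
```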

96 A.5 Role Detection: EXEMPLAR determines the role of an argument based on the trigger type (noun or verb), the type of dependency between the trigger and the argument, and the direction of the dependency. [sent-350, score-1.041]

97 To take into account the dependency direction, we prefix each dependency type with “>” when an entity depends on the trigger and “<” when the trigger depends on the entity. [sent-351, score-0.903]

98 Table 8 shows EXEMPLAR’s rules that assign roles to arguments for each relation type. [sent-352, score-0.352]

99 If trigger type = ⟨trigger⟩ and dependency type = ⟨dependency⟩, then assign ⟨role⟩. [sent-355, score-0.836]

100 For example, the first rule in Table 8a specifies that an argument must be assigned the role of a subject if this argument depends on a verb trigger and the dependency type is >nsubj. [sent-356, score-0.856]
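
A hedged sketch of the Table 8 lookup; only the quoted rule is certain, and the dict shape is ours:

```python
# Only the rule quoted above is certain; the remaining rows of Table 8 are
# elided. ">" marks an entity depending on the trigger, "<" the reverse.
ROLE_RULES = {
    ("verb", ">nsubj"): "subject",
    # ... further (trigger type, directed dependency) -> role entries
}

def assign_role(trigger_type, dep_type, entity_depends_on_trigger):
    directed = (">" if entity_depends_on_trigger else "<") + dep_type
    return ROLE_RULES.get((trigger_type, directed))

print(assign_role("verb", "nsubj", True))  # subject
```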


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('exemplar', 0.526), ('trigger', 0.302), ('ollie', 0.245), ('ore', 0.197), ('relation', 0.183), ('patty', 0.182), ('treekernel', 0.182), ('lund', 0.171), ('sonex', 0.149), ('triggers', 0.139), ('stadium', 0.133), ('swirl', 0.133), ('argument', 0.131), ('relations', 0.131), ('reverb', 0.126), ('verb', 0.117), ('copula', 0.115), ('approves', 0.099), ('entities', 0.095), ('dependency', 0.09), ('ect', 0.084), ('roles', 0.083), ('noun', 0.082), ('srl', 0.081), ('ob', 0.077), ('shallow', 0.074), ('preposition', 0.073), ('europe', 0.071), ('extractions', 0.071), ('quick', 0.069), ('falcons', 0.066), ('putin', 0.066), ('arguments', 0.058), ('reilly', 0.058), ('annotator', 0.053), ('nfl', 0.053), ('banko', 0.053), ('connecting', 0.052), ('met', 0.05), ('nominalized', 0.05), ('asia', 0.046), ('freebase', 0.046), ('efficiency', 0.042), ('entity', 0.041), ('apposition', 0.039), ('sub', 0.039), ('patterns', 0.038), ('parsing', 0.038), ('obama', 0.038), ('shortest', 0.038), ('triggered', 0.037), ('concerns', 0.036), ('annotated', 0.036), ('deemed', 0.035), ('recognize', 0.035), ('arrows', 0.035), ('role', 0.033), ('binary', 0.033), ('brokerage', 0.033), ('christensen', 0.033), ('donation', 0.033), ('enclosed', 0.033), ('kao', 0.033), ('merhav', 0.033), ('musician', 0.033), ('textrunner', 0.033), ('prep', 0.033), ('magnitude', 0.033), ('extracts', 0.032), ('effectiveness', 0.032), ('etzioni', 0.032), ('name', 0.031), ('prepositions', 0.031), ('lemma', 0.031), ('object', 0.03), ('fair', 0.03), ('allowed', 0.029), ('evaluations', 0.029), ('detects', 0.029), ('donate', 0.029), ('approve', 0.029), ('leslie', 0.029), ('malt', 0.029), ('russia', 0.029), ('sheds', 0.029), ('accept', 0.028), ('rules', 0.028), ('looks', 0.027), ('recall', 0.027), ('annotate', 0.027), ('tokens', 0.026), ('sandhaus', 0.026), ('discount', 0.026), ('type', 0.026), ('sentences', 0.026), ('depends', 0.026), ('xu', 0.025), ('york', 0.025), ('nugues', 0.025), ('nyt', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa

Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.

2 0.13330823 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

Author: Oier Lopez de Lacalle ; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.

3 0.12718371 152 emnlp-2013-Predicting the Presence of Discourse Connectives

Author: Gary Patterson ; Andrew Kehler

Abstract: We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. The model is trained on a set of high frequency relations extracted from the Penn Discourse Treebank and achieves an accuracy of 86.6%. Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. We also present results of an experiment that provides insight into the nature and difficulty of the task.

4 0.11791176 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

Author: Wiltrud Kessler ; Jonas Kuhn

Abstract: This short paper presents a pilot study investigating the training of a standard Semantic Role Labeling (SRL) system on product reviews for the new task of detecting comparisons. An (opinionated) comparison consists of a comparative “predicate” and up to three “arguments”: the entity evaluated positively, the entity evaluated negatively, and the aspect under which the comparison is made. In user-generated product reviews, the “predicate” and “arguments” are expressed in highly heterogeneous ways; but since the elements are textually annotated in existing datasets, SRL is technically applicable. We address the interesting question how well training an outof-the-box SRL model works for English data. We observe that even without any feature engineering or other major adaptions to our task, the system outperforms a reasonable heuristic baseline in all steps (predicate identification, argument identification and argument classification) and in three different datasets.

5 0.11268673 118 emnlp-2013-Learning Biological Processes with Global Constraints

Author: Aju Thalappillil Scaria ; Jonathan Berant ; Mengqiu Wang ; Peter Clark ; Justin Lewis ; Brittany Harding ; Christopher D. Manning

Abstract: Biological processes are complex phenomena involving a series of events that are related to one another through various relationships. Systems that can understand and reason over biological processes would dramatically improve the performance of semantic applications involving inference such as question answering (QA) – specifically “How? ” and “Why? ” questions. In this paper, we present the task of process extraction, in which events within a process and the relations between the events are automatically extracted from text. We represent processes by graphs whose edges describe a set oftemporal, causal and co-reference event-event relations, and characterize the structural properties of these graphs (e.g., the graphs are connected). Then, we present a method for extracting relations between the events, which exploits these structural properties by performing joint in- ference over the set of extracted relations. On a novel dataset containing 148 descriptions of biological processes (released with this paper), we show significant improvement comparing to baselines that disregard process structure.

6 0.084796391 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

7 0.080676027 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

8 0.078827076 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

9 0.078618675 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

10 0.077917688 160 emnlp-2013-Relational Inference for Wikification

11 0.074709862 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

12 0.071767494 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves

13 0.064196616 41 emnlp-2013-Building Event Threads out of Multiple News Articles

14 0.062400341 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs

15 0.062357768 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

16 0.06092995 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

17 0.058753233 58 emnlp-2013-Dependency Language Models for Sentence Completion

18 0.058091443 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

19 0.053552132 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model

20 0.052026335 187 emnlp-2013-Translation with Source Constituency and Dependency Trees


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.174), (1, 0.105), (2, 0.029), (3, 0.083), (4, -0.043), (5, 0.072), (6, -0.114), (7, 0.006), (8, 0.069), (9, 0.06), (10, 0.13), (11, -0.006), (12, -0.058), (13, 0.075), (14, -0.103), (15, 0.024), (16, -0.064), (17, 0.032), (18, 0.082), (19, 0.111), (20, -0.005), (21, 0.162), (22, -0.032), (23, -0.001), (24, -0.023), (25, 0.057), (26, 0.006), (27, 0.062), (28, -0.011), (29, 0.082), (30, -0.111), (31, -0.028), (32, 0.05), (33, -0.111), (34, 0.006), (35, -0.014), (36, -0.078), (37, 0.066), (38, -0.073), (39, -0.007), (40, -0.095), (41, -0.009), (42, 0.007), (43, -0.014), (44, 0.032), (45, -0.025), (46, 0.064), (47, -0.042), (48, -0.019), (49, -0.151)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95598257 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa

Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.

2 0.6995514 152 emnlp-2013-Predicting the Presence of Discourse Connectives

Author: Gary Patterson ; Andrew Kehler

Abstract: We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. The model is trained on a set of high frequency relations extracted from the Penn Discourse Treebank and achieves an accuracy of 86.6%. Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. We also present results of an experiment that provides insight into the nature and difficulty of the task.

3 0.61175287 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

Author: Oier Lopez de Lacalle ; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.

4 0.5898723 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.

5 0.57099485 160 emnlp-2013-Relational Inference for Wikification

Author: Xiao Cheng ; Dan Roth

Abstract: Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. Previous approaches to D2W focused on the use of local and global statistics over the given text, Wikipedia articles and its link structures, to evaluate context compatibility among a list of probable candidates. However, these methods fail (often, embarrassingly), when some level of text understanding is needed to support Wikification. In this paper we introduce a novel approach to Wikification by incorporating, along with statistical methods, richer relational analysis of the text. We provide an extensible, efficient and modular Integer Linear Programming (ILP) formulation of Wikification that incorporates the entity-relation inference problem, and show that the ability to identify relations in text helps both candi- date generation and ranking Wikipedia titles considerably. Our results show significant improvements in both Wikification and the TAC Entity Linking task.

6 0.51255131 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations

7 0.50263083 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

8 0.49178615 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?

9 0.48568988 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations

10 0.47508284 183 emnlp-2013-The VerbCorner Project: Toward an Empirically-Based Semantic Decomposition of Verbs

11 0.46581355 137 emnlp-2013-Multi-Relational Latent Semantic Analysis

12 0.46052462 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

13 0.4209671 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

14 0.41434756 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity

15 0.41260171 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

16 0.41168177 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

17 0.40934679 118 emnlp-2013-Learning Biological Processes with Global Constraints

18 0.38378221 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

19 0.38052583 41 emnlp-2013-Building Event Threads out of Multiple News Articles

20 0.37362185 58 emnlp-2013-Dependency Language Models for Sentence Completion


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.038), (18, 0.04), (22, 0.061), (30, 0.06), (41, 0.322), (45, 0.017), (51, 0.176), (66, 0.02), (71, 0.022), (75, 0.063), (77, 0.02), (90, 0.014), (96, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.7781598 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

Author: Anders Sgaard ; Hector Martinez ; Jakob Elming ; Anders Johannsen

Abstract: Often the bottleneck in document classification is finding good representations that zoom in on the most important aspects of the documents. Most research uses n-gram representations, but relevant features often occur discontinuously, e.g., not. . . good in sentiment analysis. In this paper we present experiments getting experts to provide regular expressions, as well as crowdsourced annotation tasks from which regular expressions can be derived. Somewhat surprisingly, it turns out that these crowdsourced feature combinations outperform automatic feature combination methods, as well as expert features, by a very large margin and reduce error by 24-41% over n-gram representations.

same-paper 2 0.77154589 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction

Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa

Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.

3 0.54742688 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach. 1

4 0.54339778 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

5 0.54272836 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction

Author: Shize Xu ; Shanshan Wang ; Yan Zhang

Abstract: The rapid development of Web2.0 leads to significant information redundancy. Especially for a complex news event, it is difficult to understand its general idea within a single coherent picture. A complex event often contains branches, intertwining narratives and side news which are all called storylines. In this paper, we propose a novel solution to tackle the challenging problem of storylines extraction and reconstruction. Specifically, we first investigate two requisite properties of an ideal storyline. Then a unified algorithm is devised to extract all effective storylines by optimizing these properties at the same time. Finally, we reconstruct all extracted lines and generate the high-quality story map. Experiments on real-world datasets show that our method is quite efficient and highly competitive, which can bring about quicker, clearer and deeper comprehension to readers.

6 0.54249281 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

7 0.54201448 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

8 0.53991681 152 emnlp-2013-Predicting the Presence of Discourse Connectives

9 0.53964674 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

10 0.53763402 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

11 0.53706068 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

12 0.53523815 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

13 0.5349654 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

14 0.53340566 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

15 0.5334025 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

16 0.53308874 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

17 0.53260261 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

18 0.53209043 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

19 0.53149986 160 emnlp-2013-Relational Inference for Wikification

20 0.53132975 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types