emnlp emnlp2010 emnlp2010-46 knowledge-graph by maker-knowledge-mining

46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks


Source: pdf

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependencies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. [sent-4, score-0.513]

2 The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. [sent-5, score-0.538]

3 As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. [sent-6, score-0.697]

4 After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependencies, we will then focus on trimming operations that pave the way to more effective IE. [sent-7, score-0.624]

5 Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. [sent-8, score-1.012]

6 For relation extraction in the biomedical domain (the focus of our work), a stunning convergence towards dependency-based syntactic representation structures is witnessed by the performance results of the top-performing systems in the BioNLP’09 Shared Task. [sent-17, score-0.342]

7 Given that dependency representations were always viewed as a vehicle to represent fundamental semantic relationships already at the syntactic level, this is not a great surprise. [sent-42, score-0.405]

8 Yet, dependency grammar is not a monolithic, consensually shaped and well-defined linguistic theory. [sent-43, score-0.292]

9 Accordingly, associated parsers tend to vary in terms of dependency pairing or structuring (which pairs of words join in a dependency relation? [sent-44, score-0.76]

10 ) and dependency typing (how are dependency relations for a particular pair labelled? [sent-45, score-0.664]

11 Depending on the type of dependency theory or parser being used, various representations emerge (Miyao et al. [sent-47, score-0.509]

12 The three now top-performing systems, TOKYO, TURKU and JULIELab, all rely on dependency graphs for solving the event extraction tasks. [sent-80, score-0.818]

13 (2008) already assessed the impact of different parsers for the task of biomedical relation extraction (PPI). [sent-85, score-0.396]

14 Here we perform a similar study for the task of event extraction and focus, in particular, on the impact of various dependency representations such as Stanford and CoNLL’X dependencies and additional trimming procedures. [sent-86, score-1.201]

15 Our main goal is to investigate the crucial role of proper representation structures for dependency graphs so that the performance gap in the Shared Task results between the best-performing TOKYO system and the JULIELab system can be narrowed. [sent-88, score-0.435]

16 whether the focus is on the event itself or on the arguments involved: Event trigger identification deals with the large variety of alternative verbalizations of the same event type, e. [sent-95, score-0.73]

17 , whether the event is expressed in a verbal or in a nominalized form (“A is expressed” as well as “the expression of A” both refer to the same event type, viz. [sent-97, score-0.731]

18 Since the same trigger may stand for more than one event type, event trigger ambiguity has to be resolved as well. [sent-99, score-0.776]

19 Event trigger disambiguation selects the correct event name from the set of alternative event triggers. [sent-100, score-0.73]

20 JULIELab System: The JULIELab solution can best be characterized as a single-step learning approach for event detection, as the system does not separate the overall learning task into independent event trigger and event argument learning subtasks. [sent-106, score-1.072]

21 The JULIELab system incorporates manually curated dictionaries and machine learning (ML) methodologies to sort out associated event triggers and arguments on dependency graph structures. [sent-107, score-0.727]

22 , 2010) using modified dependency representations from the MST parser (McDonald et al. [sent-117, score-0.509]

23 In this study, we perform event extraction experiments with various dependency representations that allow us to measure their effects on the event extraction task and to increase the overall JULIELab system performance in terms of F-score. [sent-119, score-1.341]

24 The JULIELab system considers all relevant lexical items as potential event triggers which might represent an event. [sent-120, score-0.382]

25 Only those event triggers that can eventually be connected to arguments, finally, represent a true event. [sent-121, score-0.382]

26 4 Dependency Graph Representations: In this section, we focus on representation formats of dependency graphs. [sent-122, score-0.385]

27 1, we introduce fundamental notions underlying dependency parsing and consider established representation formats for dependency structures as generated by various parsers. [sent-124, score-0.76]

28 2, we account for selected trimming operations for dependency graphs to ease IE. [sent-126, score-0.569]

29 In a nutshell, in dependency graphs of sentences, nodes represent single words and edges account for head-modifier relations between single words. [sent-130, score-0.393]
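
As an illustration of this data structure (a sketch, not the representation used by any of the systems in the study), a dependency graph can be stored as a list of tokens together with labeled head-modifier edges; the sentence and relation labels below are invented.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Token:
    index: int            # 1-based position in the sentence
    form: str             # surface word
    head: Optional[int]   # index of the governing token, None for the root
    deprel: str           # label of the head-modifier relation

# Toy sentence "IL-10 inhibits expression"; the labels are illustrative only.
sentence: List[Token] = [
    Token(1, "IL-10", 2, "SBJ"),
    Token(2, "inhibits", None, "ROOT"),
    Token(3, "expression", 2, "OBJ"),
]

def edges(tokens: List[Token]):
    """Yield (head_form, relation, dependent_form) triples of the graph."""
    by_index = {t.index: t for t in tokens}
    for t in tokens:
        if t.head is not None:
            yield by_index[t.head].form, t.deprel, t.form

for head, rel, dep in edges(sentence):
    print(f"{head} --{rel}--> {dep}")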

30 Despite this common understanding, concrete syntactic representations often differ markedly from one dependency theory/parser to the other. [sent-131, score-0.405]

31 The differences fall into two main categories: dependency pairing or structuring (which pairs of words join in a dependency relation? [sent-132, score-0.621]

32 ) and dependency typing (how are dependency relations for a particular pair labelled? [sent-133, score-0.664]

33 The Link Grammar Parser (Sleator and Temperley, 1991) employs a particularly fine-grained repertoire of dependency relations adding up to 106 types, whereas the well-known MINIPAR parser (Lin, 1998) relies on 59 types. [sent-137, score-0.439]

34 Differences in dependency structure are at least as common as differences in dependency relation typing (see below). [sent-138, score-0.664]

35 From the GENIA corpus, using this script, we could only extract 29 CoNLL dependency relations. [sent-143, score-0.292]

36 Figure 1: Example of CoNLL 2008 dependencies, as used in most of the native dependency parsers. Figure 2: Stanford dependencies, basic conversion from Penn Treebank. [sent-144, score-0.528]

37 In general, dependency graphs can be generated by syntactic parsers in two ways. [sent-146, score-0.489]

38 First, native dependency parsers output CoNLL’X or Stanford dependencies depending on which representation format they have been trained on. Second, in a derivative dependency mode, the output of constituency-based parsers, e. [sent-147, score-1.0]

39 In the following, we provide a short description of these two established dependency graph representations: [sent-150, score-0.345]

40 This dependency tree format was used in the CoNLL’X [sent-152, score-0.292]

41 Shared Tasks on multi-lingual dependency parsing (see Figure 1). [sent-153, score-0.337]

42 It has been adopted by most native dependency parsers and was originally obtained from Penn Treebank (PTB) trees using constituent-to-dependency conversion (Johansson and Nugues, 2007). [sent-154, score-0.635]
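
For orientation, a CoNLL-X style analysis is serialized as one tab-separated row per token with the fields ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL. The reader below is a minimal sketch; the sample rows and their labels are invented and do not come from the parsers or corpora discussed here.

# Minimal reader for CoNLL-X style rows; the example rows are invented.
# Standard CoNLL-X column layout:
# ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
SAMPLE = (
    "1\tIL-10\til-10\tN\tNN\t_\t2\tSBJ\t_\t_\n"
    "2\tinhibits\tinhibit\tV\tVBZ\t_\t0\tROOT\t_\t_\n"
    "3\texpression\texpression\tN\tNN\t_\t2\tOBJ\t_\t_\n"
)

def read_conllx(block):
    """Return (id, form, head, deprel) tuples for one sentence block."""
    rows = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        rows.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return rows

for tok_id, form, head, deprel in read_conllx(SAMPLE):
    print(tok_id, form, "<-%s-" % deprel, head)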

43 We disregard in this study other dependency representations such as MINIPAR and LINK GRAMMAR representations. [sent-161, score-0.405]

44 While in SD the subject of the passive construction is represented by a special nsubjpass dependency label, in CD we find the same subject label as for active constructions, SUB(J). [sent-175, score-0.47]
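
To make this contrast concrete, the hand-written triples below render the passive clause "A is expressed" in the two styles. Label spellings vary across conversions (SUB, SBJ, SUBJ on the CoNLL side; nsubjpass and auxpass on the Stanford side), so the exact names are illustrative rather than parser output.

# Hand-written (head, relation, dependent) triples for "A is expressed",
# contrasting the two encoding styles discussed above; not parser output.

STANFORD_BASIC = [             # main verb governs, passive subject is marked
    ("expressed", "nsubjpass", "A"),
    ("expressed", "auxpass", "is"),
]

CONLL_STYLE = [                # auxiliary governs, subject label as in actives
    ("is", "SBJ", "A"),
    ("is", "VC", "expressed"),
]

for name, triples in [("Stanford basic", STANFORD_BASIC), ("CoNLL style", CONLL_STYLE)]:
    print(name)
    for head, rel, dep in triples:
        print("  %s --%s--> %s" % (head, rel, dep))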

45 Figure 3: Noun phrase representation in CoNLL’X dependency trees. [sent-182, score-0.339]

46 The CD annotation scheme, which has much in common with standard dependency theory, is completely different in that auxiliaries are chosen to occupy the role of the governor (see Figure 1). [sent-183, score-0.388]

47 This idea is directly reflected in the Stanford dependencies which narrow the distance between nodes in the dependency graph by collapsing procedures (the so-called collapsed mode of phrase structure conversion). [sent-187, score-0.701]

48 A further modification concerns coordinations, with sharing of the dependency relations of conjuncts (the so-called ccprocessed mode of phrase structure conversion). [sent-190, score-0.646]

49 For coordinations, they propagate the dependency relation of the first conjunct to all the other conjuncts within the coordination. [sent-195, score-0.335]

50 For auxiliaries/modals, they prune the auxiliaries/modals as governors from the dependency graph and propagate the dependency relations of these nodes to the main verbs. [sent-196, score-0.68]

51 Finally, for prepositions, they collapse a pair of typed dependencies into a single typed dependency (as illustrated above). [sent-197, score-0.401]
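
A compact way to picture these three operations is to apply them to an edge-list view of the parse. The sketch below is a simplification under assumed CoNLL-style labels (VC, COORD, PMOD); it ignores many special cases and is not the JULIELab trimming code.

# Edge-list view of a parse: {dependent: (head, relation)}, with 0 as the
# artificial root.  Relation names (VC, COORD, PMOD, ...) are assumed
# CoNLL-style labels; real label inventories differ between conversions.

def prune_auxiliaries(deps, aux_rel="VC"):
    """Make the main verb the governor: re-attach dependents of an auxiliary
    to the verb it governs via aux_rel, then demote the auxiliary."""
    out = dict(deps)
    aux_to_verb = {h: d for d, (h, r) in deps.items() if r == aux_rel}
    for d, (h, r) in deps.items():
        if h in aux_to_verb and r != aux_rel:
            out[d] = (aux_to_verb[h], r)            # hang off the main verb
    for aux, verb in aux_to_verb.items():
        out[verb] = deps.get(aux, (0, "ROOT"))      # verb inherits the aux's head
        out[aux] = (verb, "AUX")                    # auxiliary becomes a dependent
    return out

def propagate_conjuncts(deps, conj_rel="COORD"):
    """Give every non-first conjunct the head and relation of the first one."""
    out = dict(deps)
    for d, (h, r) in deps.items():
        if r == conj_rel and h in deps:
            out[d] = deps[h]
    return out

def collapse_prepositions(deps, forms, pobj_rel="PMOD"):
    """Collapse governor -> preposition -> object into a single edge whose
    label records the preposition (in the spirit of the collapsed mode)."""
    out = dict(deps)
    for obj, (prep, r) in deps.items():
        if r == pobj_rel and prep in deps:
            gov = deps[prep][0]
            out[obj] = (gov, "prep_" + forms[prep].lower())
    return out

# Toy parse of "A is expressed in cells" with invented indices and labels;
# propagate_conjuncts would be applied the same way on a parse with COORD edges.
forms = {1: "A", 2: "is", 3: "expressed", 4: "in", 5: "cells"}
deps = {1: (2, "SBJ"), 2: (0, "ROOT"), 3: (2, "VC"), 4: (3, "ADV"), 5: (4, "PMOD")}
print(collapse_prepositions(prune_auxiliaries(deps), forms))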

52 For the following experiments, we extended the trimming procedures and propose the re-structuring (see Figure 4: Trimming procedure for noun phrases on CoNLL’X dependency trees) [sent-198, score-0.585]

53 of noun phrases with action adjectives to make the dependency representation even more compact for semantic interpretation. [sent-199, score-0.41]

54 The original dependency representation of the noun phrase selects the rightmost noun as the head of the NP and thus all remaining elements are its dependents (see Figure 3). [sent-200, score-0.415]

55 Therefore, we re-structure the dependency graph by changing the head of “IL-10” from “expression” to “mediated”. [sent-203, score-0.345]
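
In edge-list terms the re-structuring is a single re-attachment; the indices, labels, and the re_head helper below are made up for illustration.

# "IL-10 mediated expression": the original analysis makes the rightmost noun
# "expression" the head, so both "IL-10" and "mediated" depend on it.  The
# re-structuring attaches "IL-10" to the action adjective "mediated" instead.
forms = {1: "IL-10", 2: "mediated", 3: "expression"}
deps = {1: (3, "NMOD"), 2: (3, "NMOD"), 3: (0, "ROOT")}

def re_head(deps, dependent, new_head):
    """Re-attach `dependent` to `new_head`, keeping its relation label."""
    head, rel = deps[dependent]
    out = dict(deps)
    out[dependent] = (new_head, rel)
    return out

print(re_head(deps, dependent=1, new_head=2))
# -> {1: (2, 'NMOD'), 2: (3, 'NMOD'), 3: (0, 'ROOT')}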

56 5 Experiments and Results: In this section, we describe the experiments and results related to event extraction tasks based on alternative dependency graph representations. [sent-206, score-0.813]

57 GDep (Sagae and Tsujii, 2007), a native dependency parser. [sent-210, score-0.372]

58 The native dependency parsers were re-trained on the GENIA Treebank (Tateisi et al. [sent-215, score-0.511]

59 For the Stanford dependency conversion, we used the Stanford parser tool; for CoNLL’07 and CoNLL’08 we used the treebank-to-CoNLL conversion scripts available from the CoNLL’X Shared Task organizers. [sent-220, score-0.52]

60 For our experiments, we converted the prediction results of the phrase-structure-based parsers into five dependency graph representations, viz. [sent-224, score-0.484]

61 The JULIELab event extraction system was retrained on the Shared Task data enriched with different outputs of syntactic parsers as described above. [sent-226, score-0.607]

62 The results for the event extraction task are represented in Table 1. [sent-227, score-0.468]

63 Due to the space limitation of this paper we provide the summarized results of important event extraction sub-tasks only, i. [sent-228, score-0.468]

64 , results for basic events (Gene Expression, Transcription, Localization, Protein Catabolism) are summarized. For the training of dependency parsers, we used only the Stanford basic variant of the available Stanford conversion variants. [sent-230, score-0.539]

65 The collapsed and ccprocessed variants do not provide dependency trees and are not recommended for training native dependency parsers. [sent-231, score-0.843]

66 Obviously, the event extraction system trained on various dependency representations indeed produces truly different results. [sent-245, score-0.873]

67 The top three event extraction results on the development data, based on different syntactic parsers, are achieved with the M+C parser and the CoNLL’07 representation (55. [sent-259, score-0.758]

68 Surprisingly, both the CoNLL’08 and CoNLL’07 formats clearly outperform Stanford representations on all event extraction tasks. [sent-263, score-0.627]

69 The collapsed and ccprocessed modes produce even worse results for the event extraction tasks. [sent-265, score-0.688]

70 Our second experiment focused on trimming operations on CoNLL’X dependency graphs. [sent-266, score-0.511]

71 Here we performed event extraction after the trimming of the dependency trees as described in Section 4. [sent-267, score-0.979]

72 2 in different modes: coords (re-structuring of coordinations); preps (collapsing of prepositions); auxiliaries (propagating dependency relations of auxiliaries and modals to main verbs); noun phrase (re-structuring of noun phrases containing action adjectives). [sent-268, score-0.679]

73 Our second experiment showed that the extraction of selected events can profit in particular from the trimming procedures coords and auxiliaries, but there is no evidence for a general trimming configuration for the overall event extraction task. [sent-269, score-1.207]

74 It is quite evident that the CoNLL’08 and CoNLL’07 dependencies modified for auxiliaries and coordinations are the best configurations for four events (out of nine). [sent-271, score-0.378]

75 only one event profits from trimming of prepositions (Protein Catabolism). [sent-280, score-0.611]

76 Only the Binding event profits significantly from noun phrase modifications (see Table 3). [sent-281, score-0.38]

77 The overall event extraction results of this final configuration are presented in Tables 4 and 5. [sent-285, score-0.468]

78 9 percentage points F-score in the overall event extraction compared to the best-performing single parser configuration (M+C, CoNLL’07) (see Table 4, ALL-TOTAL). [sent-287, score-0.572]

79 9 percentage points in the overall event extraction task (see Table 4, ALL-TOTAL). [sent-290, score-0.468]

80 The results on the official test data reveal that the performance differences between various parsers may play a much smaller role than the proper choice of dependency representations. [sent-292, score-0.431]

81 Therefore, we focus here on the analysis of false positives. The current JULIELab system uses event-specific trimming procedures on CoNLL’07 dependencies determined on the development data set (see Buyko et al. [sent-296, score-0.504]

82 , nsubjpass, prep_on, prep_with, prep_in, prep_for, prep_as. [sent-305, score-0.531]

83 Some dependency labels occur only in set B, such as agent, prep_unlike, and prep_upon. [sent-306, score-0.342]

84 The dependency labels such as abbrev, dep, nsubj, and nsubjpass have a higher occurrence in set B than in set A. [sent-311, score-0.484]
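
One simple way to run such a comparison is to count the dependency labels attached to the false positives of each output and contrast the two distributions; the sketch below assumes the label lists have already been extracted (the placeholder lists are invented) and is not the analysis script used in the paper.

from collections import Counter

# Dependency labels observed on false positives of two outputs (sets A and B
# in the text); the lists here are invented placeholders.
labels_a = ["nsubj", "dobj", "prep_in", "nsubj"]
labels_b = ["nsubj", "nsubjpass", "agent", "abbrev", "dep", "nsubj", "nsubj"]

count_a, count_b = Counter(labels_a), Counter(labels_b)
only_in_b = sorted(set(count_b) - set(count_a))
more_in_b = sorted(l for l in count_b if count_b[l] > count_a.get(l, 0))
print("only in B:", only_in_b)
print("more frequent in B:", more_in_b)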

85 This analysis renders evidence that the distinction of nsubj and nsubjpass does not seem to have been properly learned for event extraction. [sent-312, score-0.534]

86 As in the previous experiments, we compared false positives from two mode outputs, here the CoNLL’07 mode and the CoNLL’07 mode modified for auxiliaries and coordinations. [sent-314, score-0.488]

87 The dependency labels such as VC, SUBJ, COORD, and IOBJ occur more frequently in the additional false positives from the CoNLL’07 mode than in the intersection of false positives from both system outputs. [sent-316, score-0.697]

88 Obviously, the trimming of auxiliary and coordination structures has a direct positive effect on the argument extraction, reducing false positive numbers, especially with corresponding dependency labels in shortest dependency paths. [sent-317, score-1.107]
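
Since the argument features are read off shortest dependency paths, the sketch below shows one generic way to obtain the label sequence of such a path with a breadth-first search over an undirected view of the graph; it reuses the edge-list convention of the earlier sketches and is not the JULIELab feature extractor.

from collections import deque

def shortest_path_labels(deps, start, goal):
    """Labels on a shortest path between two tokens, treating the dependency
    graph as undirected.  `deps` maps dependent -> (head, relation); 0 is the
    artificial root."""
    adj = {}
    for dep, (head, rel) in deps.items():
        adj.setdefault(dep, []).append((head, rel))
        adj.setdefault(head, []).append((dep, rel))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, rel in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# Toy parse of "A is expressed in cells" (invented labels); path from the
# trigger "expressed" (3) to the candidate argument "A" (1).
deps = {1: (2, "SBJ"), 2: (0, "ROOT"), 3: (2, "VC"), 4: (3, "ADV"), 5: (4, "PMOD")}
print(shortest_path_labels(deps, 3, 1))   # ['VC', 'SBJ']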

89 Our analysis of false positives shows that the distinction between active and passive subject labels, abbreviation labels, as well as collapsing prepositions in the Stanford dependencies, could not have been properly learned, which consequently leads to an increased rate of false positives. [sent-318, score-0.398]

90 The trimming of auxiliary structures and the subsequent coordination collapsing on CoNLL’07 dependencies has indeed event-specific positive effects on the event extraction. [sent-319, score-0.872]

91 The main focus of this work has been on the evaluation of effects of different dependency graph representations on the IE task achievement (here the task of event extraction). [sent-320, score-0.8]

92 , M+C parser, the MST, MALT and GDep, are a reasonable basis for achieving state-of-the-art performance in biomedical event extraction. [sent-324, score-0.43]

93 (2008) showed in their experiments that native dependency parsers are faster than constituency-based parsers. [sent-328, score-0.511]

94 When it comes to scaling event extraction to huge biomedical document collections, such as MEDLINE, the selection of a parser is mainly influenced by its run-time performance. [sent-329, score-0.66]

95 , 2010) would thus be an appropriate choice for large-scale event extraction under these constraints. [sent-331, score-0.468]

96 7 Conclusion: In this paper, we investigated the role different dependency representations may have on the accomplishment of the event extraction task as exemplified by biological events. [sent-332, score-0.916]

97 CoNLL) were then experimentally compared employing different parsers (Bikel, Charniak+Johnson, GDep, MST, MALT), both constituency-based (for the derivative dependency mode) as well as dependency-based (for the native dependency mode), considering different training scenarios (newspaper vs. [sent-334, score-1.136]

98 that the dependency graph representation has a crucial impact on the level of achievement of IE task requirements. [sent-337, score-0.392]

99 Syntactic simplification and semantic enrichment - Trimming dependency graphs for event extraction. [sent-377, score-0.692]

100 Event extraction with complex event classification using rich features. [sent-438, score-0.468]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('event', 0.342), ('dependency', 0.292), ('conll', 0.274), ('julielab', 0.256), ('trimming', 0.219), ('stanford', 0.207), ('bionlp', 0.178), ('ccprocessed', 0.144), ('parsers', 0.139), ('buyko', 0.128), ('gdep', 0.128), ('extraction', 0.126), ('conversion', 0.124), ('mst', 0.123), ('representations', 0.113), ('dependencies', 0.109), ('parser', 0.104), ('genia', 0.099), ('june', 0.099), ('fps', 0.096), ('malt', 0.096), ('nsub', 0.096), ('turku', 0.096), ('auxiliaries', 0.096), ('collapsing', 0.091), ('events', 0.091), ('biomedical', 0.088), ('prep', 0.087), ('mode', 0.085), ('coordinations', 0.082), ('miyao', 0.08), ('tokyo', 0.08), ('native', 0.08), ('positives', 0.073), ('ichi', 0.07), ('shared', 0.069), ('false', 0.067), ('sd', 0.06), ('sagae', 0.06), ('graphs', 0.058), ('jun', 0.056), ('udo', 0.055), ('graph', 0.053), ('passive', 0.05), ('prepositions', 0.05), ('protein', 0.049), ('binding', 0.049), ('catabolism', 0.048), ('coords', 0.048), ('ekaterina', 0.048), ('jena', 0.048), ('regulation', 0.048), ('tapio', 0.048), ('bikel', 0.048), ('charniak', 0.048), ('representation', 0.047), ('expression', 0.047), ('fscore', 0.046), ('formats', 0.046), ('trigger', 0.046), ('parsing', 0.045), ('relation', 0.043), ('relations', 0.043), ('biological', 0.043), ('pyysalo', 0.041), ('okyo', 0.041), ('modes', 0.041), ('bj', 0.041), ('lth', 0.041), ('derivative', 0.041), ('jp', 0.04), ('triggers', 0.04), ('intersection', 0.04), ('auxiliary', 0.039), ('noun', 0.038), ('structures', 0.038), ('structuring', 0.037), ('typing', 0.037), ('procedures', 0.036), ('kim', 0.036), ('johnson', 0.035), ('collapsed', 0.035), ('johansson', 0.034), ('cer', 0.034), ('coordination', 0.034), ('marneffe', 0.034), ('action', 0.033), ('kenji', 0.033), ('basic', 0.032), ('constructions', 0.032), ('bioinfer', 0.032), ('clegg', 0.032), ('faessler', 0.032), ('ginter', 0.032), ('heimonen', 0.032), ('jari', 0.032), ('joachim', 0.032), ('juho', 0.032), ('minipar', 0.032), ('miwa', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependencies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

2 0.21371825 20 emnlp-2010-Automatic Detection and Classification of Social Events

Author: Apoorv Agarwal ; Owen Rambow

Abstract: In this paper we introduce the new task of social event extraction from text. We distinguish two broad types of social events depending on whether only one or both parties are aware of the social contact. We annotate part of Automatic Content Extraction (ACE) data, and perform experiments using Support Vector Machines with Kernel methods. We use a combination of structures derived from phrase structure trees and dependency trees. A characteristic of our events (which distinguishes them from ACE events) is that the participating entities can be spread far across the parse trees. We use syntactic and semantic insights to devise a new structure derived from dependency trees and show that this plays a role in achieving the best performing system for both social event detection and classification tasks. We also use three data sampling approaches to solve the problem of data skewness. Sampling methods improve the F1-measure for the task of relation detection by over 20% absolute over the baseline.

3 0.16607434 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

4 0.11649054 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

5 0.084691204 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

Author: Terry Koo ; Alexander M. Rush ; Michael Collins ; Tommi Jaakkola ; David Sontag

Abstract: This paper introduces algorithms for nonprojective parsing based on dual decomposition. We focus on parsing algorithms for nonprojective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. They provably solve an LP relaxation of the non-projective parsing problem. Empirically the LP relaxation is very often tight: for many languages, exact solutions are achieved on over 98% of test sentences. The accuracy of our models is higher than previous work on a broad range of datasets.

6 0.07953462 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

7 0.076150849 114 emnlp-2010-Unsupervised Parse Selection for HPSG

8 0.076108567 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

9 0.073178701 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

10 0.068916641 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

11 0.06766884 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

12 0.057289124 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

13 0.057222076 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

14 0.055556919 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

15 0.054050073 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

16 0.053853262 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

17 0.051806066 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

18 0.048820172 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

19 0.048059843 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

20 0.045969877 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.188), (1, 0.119), (2, 0.151), (3, 0.15), (4, 0.054), (5, 0.002), (6, 0.1), (7, 0.12), (8, 0.114), (9, -0.026), (10, -0.134), (11, -0.016), (12, 0.105), (13, 0.128), (14, 0.007), (15, -0.017), (16, 0.141), (17, 0.261), (18, -0.018), (19, 0.017), (20, 0.216), (21, -0.012), (22, -0.052), (23, -0.069), (24, -0.036), (25, -0.065), (26, 0.137), (27, 0.056), (28, 0.129), (29, -0.22), (30, -0.018), (31, 0.047), (32, -0.064), (33, 0.097), (34, -0.011), (35, -0.208), (36, -0.062), (37, -0.016), (38, -0.094), (39, -0.042), (40, 0.063), (41, 0.053), (42, 0.057), (43, -0.023), (44, 0.027), (45, -0.08), (46, -0.107), (47, 0.095), (48, 0.095), (49, -0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98335499 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependencies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

2 0.74402153 20 emnlp-2010-Automatic Detection and Classification of Social Events

Author: Apoorv Agarwal ; Owen Rambow

Abstract: In this paper we introduce the new task of social event extraction from text. We distinguish two broad types of social events depending on whether only one or both parties are aware of the social contact. We annotate part of Automatic Content Extraction (ACE) data, and perform experiments using Support Vector Machines with Kernel methods. We use a combination of structures derived from phrase structure trees and dependency trees. A characteristic of our events (which distinguishes them from ACE events) is that the participating entities can be spread far across the parse trees. We use syntactic and semantic insights to devise a new structure derived from dependency trees and show that this plays a role in achieving the best performing system for both social event detection and classification tasks. We also use three data sampling approaches to solve the problem of data skewness. Sampling methods improve the F1-measure for the task of relation detection by over 20% absolute over the baseline.

3 0.50124222 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

Author: Slav Petrov ; Pi-Chuan Chang ; Michael Ringgaard ; Hiyan Alshawi

Abstract: It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce dependency parsers, which are of highest interest for practical applications because of their linear running time, drop to 60% labeled accuracy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more accurate, but slower, latent variable constituency parser (converted to dependencies). Uptraining with 100K unlabeled questions achieves results comparable to having 2K labeled questions for training. With 100K unlabeled and 2K labeled questions, uptraining is able to improve parsing accuracy to 84%, closing the gap between in-domain and out-of-domain performance.

4 0.41044524 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

Author: Roi Reichart ; Ari Rappoport

Abstract: We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results on its paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm’s effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%.

5 0.37683722 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

6 0.31332642 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

7 0.30478084 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

8 0.29287663 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

9 0.2858068 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

10 0.28217712 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

11 0.28061929 122 emnlp-2010-WikiWars: A New Corpus for Research on Temporal Expressions

12 0.26460087 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

13 0.26174256 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata

14 0.23102523 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

15 0.22827934 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

16 0.22632641 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

17 0.2194279 114 emnlp-2010-Unsupervised Parse Selection for HPSG

18 0.21637358 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

19 0.20419349 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions

20 0.19680764 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.015), (10, 0.019), (12, 0.045), (21, 0.018), (29, 0.081), (30, 0.021), (32, 0.015), (49, 0.362), (52, 0.022), (56, 0.077), (62, 0.018), (66, 0.062), (72, 0.066), (76, 0.041), (79, 0.015), (82, 0.012), (87, 0.023), (89, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76594269 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

Author: Ekaterina Buyko ; Udo Hahn

Abstract: In state-of-the-art approaches to information extraction (IE), dependency graphs constitute the fundamental data structure for syntactic structuring and subsequent knowledge elicitation from natural language documents. The top-performing systems in the BioNLP 2009 Shared Task on Event Extraction all shared the idea to use dependency structures generated by a variety of parsers either directly or in some converted manner — and optionally modified their output to fit the special needs of IE. As there are systematic differences between various dependency representations being used in this competition, we scrutinize on different encoding styles for dependency information and their possible impact on solving several IE tasks. After assessing more or less established dependency representations such as the Stanford and CoNLL-X dependencies, we will then focus on trimming operations that pave the way to more effective IE. Our evaluation study covers data from a number of constituency- and dependency-based parsers and provides experimental evidence which dependency representations are particularly beneficial for the event extraction task. Based on empirical findings from our study we were able to achieve the performance of 57.2% F-score on the development data set of the BioNLP Shared Task 2009.

2 0.65787882 80 emnlp-2010-Modeling Organization in Student Essays

Author: Isaac Persing ; Alan Davis ; Vincent Ng

Abstract: Automated essay scoring is one of the most important educational applications of natural language processing. Recently, researchers have begun exploring methods of scoring essays with respect to particular dimensions of quality such as coherence, technical errors, and relevance to prompt, but there is relatively little work on modeling organization. We present a new annotated corpus and propose heuristic-based and learning-based approaches to scoring essays along the organization dimension, utilizing techniques that involve sequence alignment, alignment kernels, and string kernels.

3 0.36172804 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

4 0.35978192 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

Author: Yunliang Jiang ; Cindy Xide Lin ; Qiaozhu Mei

Abstract: In this paper, we conducted a systematic comparative analysis of language in different contexts of bursty topics, including web search, news media, blogging, and social bookmarking. We analyze (1) the content similarity and predictability between contexts, (2) the coverage of search content by each context, and (3) the intrinsic coherence of information in each context. Our experiments show that social bookmarking is a better predictor to the bursty search queries, but news media and social blogging media have a much more compelling coverage. This comparison provides insights on how the search behaviors and social information sharing behaviors of users are correlated to the professional news media in the context of bursty events.

5 0.35882816 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas

Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality ofthe best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.

6 0.35756195 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

7 0.35526481 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

8 0.35390666 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

9 0.35266438 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

10 0.35261565 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

11 0.35152465 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

12 0.34808844 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

13 0.34624708 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

14 0.34616566 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

15 0.34604812 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

16 0.34592164 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

17 0.34586167 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

18 0.34576961 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

19 0.34537739 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

20 0.34450796 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation