acl acl2011 acl2011-122 knowledge-graph by maker-knowledge-mining

122 acl-2011-Event Extraction as Dependency Parsing


Source: pdf

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks. [sent-3, score-0.793]

2 For example, a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. [sent-5, score-0.625]

3 However, most current approaches address event extraction with highly local models that extract each event and argument independently. [sent-6, score-1.268]

4 We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. [sent-7, score-0.367]

5 This provides a simple framework that captures global properties of both nested and flat event structures. [sent-8, score-0.723]

6 Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing. [sent-10, score-0.452]

7 1 Introduction Event structures in open domain texts are frequently highly complex and nested: a “crime” event can cause an “investigation” event, which can lead to an “arrest” event (Chambers and Jurafsky, 2009). [sent-13, score-1.372]

8 Similarly, a REGULATION event can cause a TRANSCRIPTION event (see Figure 1a for a detailed example). [sent-18, score-1.186]

9 Despite this observation, many state-of-the-art supervised event extraction models still extract events and event arguments independently, ignoring their underlying structure (Björne et al., 2009). [sent-19, score-1.6]

10 In this paper, we propose a new approach for supervised event extraction where we take the tree of relations and their arguments and use it directly as the representation in a dependency parser (rather than conventional syntactic relations). [sent-22, score-0.969]

11 Our approach is conceptually simple: we first convert the original representation of events and their arguments to dependency trees by creating dependency arcs between event anchors (phrases that anchor events in the supporting text) and their corresponding arguments. [sent-23, score-1.993]

12 Note that after conversion, only event anchors and entities remain. [sent-24, score-0.814]
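To make the conversion concrete, here is a minimal Python sketch of turning a nested event into labeled head-to-argument arcs. The Entity and Event classes and the example event are hypothetical stand-ins for the BioNLP’09 annotation format, not the authors’ implementation.

from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class Entity:
    id: str      # e.g. "T1"
    text: str    # e.g. "IL-2"

@dataclass
class Event:
    id: str      # e.g. "E1"
    anchor: str  # anchor phrase in the supporting text
    args: List[Tuple[str, Union["Event", Entity]]] = field(default_factory=list)

def event_to_arcs(event: Event) -> List[Tuple[str, str, str]]:
    """Create one (head, label, dependent) arc from the event anchor to each
    argument, recursing into nested events so the whole subtree is covered."""
    arcs = []
    for slot, arg in event.args:
        dependent = arg.anchor if isinstance(arg, Event) else arg.text
        arcs.append((event.anchor, slot, dependent))
        if isinstance(arg, Event):
            arcs.extend(event_to_arcs(arg))
    return arcs

# Hypothetical example loosely mirroring Figure 1
prot = Entity("T1", "IL-2")
transcription = Event("E1", "gene transcription", [("THEME", prot)])
pos_reg = Event("E2", "acts as a costimulatory signal", [("THEME", transcription)])
print(event_to_arcs(pos_reg))
# [('acts as a costimulatory signal', 'THEME', 'gene transcription'),
#  ('gene transcription', 'THEME', 'IL-2')]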

13 Figure 1 shows a sentence and its converted form from the biomedical domain with four events: two POSITIVE REGULATION events, anchored by the phrase “acts as a costimulatory signal,” and two TRANSCRIPTION events, both anchored on “gene transcription.” [sent-25, score-0.327]

14 All events take either protein entity mentions (PROT) or other events as arguments. [sent-26, score-0.708]

15 The latter is what allows for nested event structures. [sent-27, score-0.688]

16 We built a global reranking parser model using multiple decoders from MSTParser (McDonald et al., 2005). [sent-29, score-0.333]

17 Figure 1: Nested events in a text fragment: (a) the original sentence with nested events; (b) after conversion to event dependencies. [sent-37, score-1.375]

18 Throughout this paper, bold text indicates instances of event anchors and italicized text denotes entities (PROTEINs in the BioNLP’09 domain). [sent-43, score-0.814]

19 We propose a wide range of features for event extraction. [sent-47, score-0.622]

20 Our analysis indicates that features which model the global event structure yield considerable performance improvements, which proves that modeling event structure jointly is beneficial. [sent-48, score-1.25]

21 We evaluate on the biomolecular event corpus from the BioNLP’09 shared task and show that our approach obtains competitive results. [sent-50, score-0.684]

22 On the other hand, our approach focuses on event structures that are nested and have an arbitrary number of arguments. [sent-61, score-0.772]

23 In the biomedical domain, two recent papers proposed joint models for event extraction based on Markov logic networks (MLN) (Riedel et al., 2009; Poon and Vanderwende, 2010). [sent-65, score-0.702]

24 Both works propose elegant frameworks where event anchors and arguments are jointly predicted for all events in the same sentence. [sent-67, score-1.164]

25 We also propose and analyze a richer feature space that captures more information on the global event structure in a sentence. [sent-70, score-0.628]

26 Our approach converts the original event representation to dependency trees containing both event anchors and entity mentions, and trains a battery of parsers to recognize these structures. [sent-83, score-1.638]

27 The trees are built using event anchors predicted by a separate classifier. [sent-84, score-0.831]

28 The n-best parses are converted back to the original event representation and passed to a reranker component (Collins, 2000; Charniak and Johnson, 2005), tailored to optimize the task-specific evaluation metric. [sent-91, score-0.934]

29 Note that although we use the biomedical event domain from the BioNLP’09 shared task to illustrate our work, the core of our approach is almost domain independent. [sent-92, score-0.819]

30 Our only constraints are that each event mention be activated by a phrase that serves as an event anchor, and that the event-argument structures be mapped to a dependency tree. [sent-93, score-1.371]

31 The conversion between event and dependency structures and the reranker metric are the only domain dependent components in our approach. [sent-94, score-1.132]

32 Converting between Event Structures and Dependencies: As in previous work, we extract event structures at sentence granularity, i.e., we ignore events that span sentence boundaries. [sent-96, score-0.677]

33 For each sentence, we convert the BioNLP’09 event representation to a graph (representing a labeled dependency tree) as follows. [sent-102, score-0.766]

34 The nodes in the graph are protein entity mentions, event anchors, and a virtual ROOT node. [sent-103, score-0.781]

35 For each event anchor, we create one link to each of its arguments labeled with the slot name of the argument (for example, connecting gene transcription to IL-2 with the label THEME in Figure 1b). [sent-106, score-0.842]

36 We link the ROOT node to each entity that does not participate in an event using the ROOT-LABEL dependency label. [sent-107, score-0.743]

37 Finally, we link the ROOT node to each top-level event anchor (those which do not serve as arguments to other events), again using the ROOT-LABEL label. [sent-108, score-0.695]
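A rough sketch of the full graph construction described in the last few sentences follows; it assumes anchors and entities have already been identified, and the ROOT and ROOT-LABEL names simply mirror the labels used above.

def build_dependency_graph(entities, events):
    """entities: list of entity ids; events: dict mapping an anchor id to its list
    of (slot, argument id) pairs. Returns (head, label, dependent) arcs with a
    virtual ROOT node, following the construction described above."""
    ROOT, ROOT_LABEL = "ROOT", "ROOT-LABEL"
    arcs, argument_ids = [], set()
    for anchor, args in events.items():
        for slot, arg in args:
            arcs.append((anchor, slot, arg))   # anchor -> argument, labeled with the slot name
            argument_ids.add(arg)
    for entity in entities:                    # entities that take part in no event
        if entity not in argument_ids:
            arcs.append((ROOT, ROOT_LABEL, entity))
    for anchor in events:                      # top-level anchors (arguments of no other event)
        if anchor not in argument_ids:
            arcs.append((ROOT, ROOT_LABEL, anchor))
    return arcs

print(build_dependency_graph(
    entities=["IL-2"],
    events={"gene transcription": [("THEME", "IL-2")],
            "acts as a costimulatory signal": [("THEME", "gene transcription")]}))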

38 Furthermore, the graph may contain self-referential edges (self-loops) due to related events sharing the same anchor (example below). [sent-111, score-0.633]

39 An example can be seen in the text “the domain interacted preferentially with underphosphorylated TRAF2”: there are two events anchored by the same underphosphorylated phrase (a NEGATIVE REGULATION and a PHOSPHORYLATION event), and the latter serves as a THEME argument for the former. [sent-113, score-0.498]

40 A small percentage of the events in the training data are left without arguments, so we remove them as well. [sent-116, score-0.384]

41 Step 2: We break structures where one argument participates in multiple events by keeping only the dependency to the event that appears first in the text. [sent-117, score-0.857]

42 For example, in the fragment “by enhancing its inactivation through binding to soluble TNF-alpha receptor type II,” the protein TNF-alpha receptor type II is an argument in both a BINDING event (binding) and a NEGATIVE REGULATION event (inactivation). [sent-118, score-1.472]

43 Step 3: We unify events of the same type anchored on the same anchor phrase. [sent-121, score-0.608]

44 For example, for the fragment “Surface expression of intercellular adhesion molecule-1, P-selectin, and E-selectin,” the BioNLP’09 annotation contains three distinct GENE EXPRESSION events anchored on the same phrase (expression), each having one of the proteins as its THEME. [sent-122, score-0.379]

45 A small percentage of the events in training are removed in this step (but no dependencies are lost). [sent-125, score-0.343]

46 Since all non-BINDING events can have at most one THEME argument, we duplicate non-BINDING events with multiple THEME arguments by creating one separate event for each THEME. [sent-141, score-1.286]

47 Similarly, since REGULATION events accept only one CAUSE argument, we duplicate REGULATION events with multiple CAUSE arguments, obtaining one event per CAUSE. [sent-143, score-1.184]
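The duplication step can be sketched as below. The data layout (an event type plus lists of THEME and CAUSE argument ids) is an assumption made for illustration, not the authors’ data structures.

from itertools import product

def split_multi_argument_event(event_type, themes, causes=None):
    """Duplicate an event so that each copy carries a single THEME (and, for
    REGULATION-type events, a single CAUSE), as described above."""
    causes = causes if causes else [None]
    copies = []
    for theme, cause in product(themes, causes):
        args = [("THEME", theme)]
        if cause is not None:
            args.append(("CAUSE", cause))
        copies.append((event_type, args))
    return copies

# A REGULATION event with one THEME and two CAUSEs becomes two events
print(split_multi_argument_event("POSITIVE_REGULATION", ["E1"], ["T1", "T2"]))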

48 In such situations, we first group THEME arguments by the label of the first Stanford dependency (de Marneffe and Manning, 2008) from the head word of the anchor to this argument. [sent-148, score-0.468]

49 Then we create one event for each combination of THEME arguments in different groups. [sent-149, score-0.695]
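A hedged sketch of this BINDING reconstruction, assuming the first Stanford dependency label for each THEME argument has already been computed:

from collections import defaultdict
from itertools import product

def reconstruct_binding_events(themes_with_labels):
    """themes_with_labels: list of (theme id, first Stanford dependency label).
    Groups THEMEs by their dependency label, then emits one BINDING event per
    combination of THEMEs drawn from different groups."""
    groups = defaultdict(list)
    for theme, label in themes_with_labels:
        groups[label].append(theme)
    events = []
    for combo in product(*groups.values()):
        events.append(("BINDING", [("THEME", theme) for theme in combo]))
    return events

print(reconstruct_binding_events([("T1", "prep_of"), ("T2", "prep_to"), ("T3", "prep_to")]))
# two BINDING events: one with THEMEs (T1, T2) and one with (T1, T3)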

50 (e.g., “acts” for the anchor “acts as a costimulatory signal”). [sent-153, score-0.31]

51 We hypothesize that the sequence tagger fails to capture potential dependencies between anchor labels, which are its main advantage over an i.i.d. classifier. [sent-156, score-0.326]

52 We also generate combination features by concatenating: (a) the last token in each path with the sequence of dependency labels along the corresponding path; and (b) the word to be classified, the last token in each path, and the sequence of dependency labels in that path. [sent-169, score-0.358]
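The two combination features might be generated as in the following sketch; the inputs (the word being classified and its paths to nearby tokens) are placeholders rather than the anchor detector’s full feature set.

def combination_features(word, paths):
    """word: the token being classified; paths: list of (last token, [dependency
    labels along the path]) tuples. Returns the two concatenated feature strings."""
    feats = []
    for last_token, dep_labels in paths:
        label_seq = "-".join(dep_labels)
        feats.append(f"tok|path={last_token}|{label_seq}")              # combination (a)
        feats.append(f"word|tok|path={word}|{last_token}|{label_seq}")  # combination (b)
    return feats

print(combination_features("transcription", [("IL-2", ["prep_of", "nn"])]))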

53 Parsing Event Structures: Given the entities and event anchors from the previous stages in the pipeline, the parser generates labeled dependency links between them. [sent-171, score-0.959]

54 Our features for MSTParser use both the event structures themselves and the surrounding English sentences which include them. [sent-188, score-0.706]

55 By mapping event anchors and entities back to the original text, we can incorporate information from the original English sentence as well as its syntactic tree and corresponding Stanford dependencies. [sent-189, score-0.884]

56 MSTParser comes with a large number of features which, in our setup, operate on the event structure level (since this is the “sentence” from the parser’s point of view). [sent-191, score-0.622]

57 Original sentence words: Words from the full English sentence surrounding and between the nodes in event dependencies, and their bucketed distances. [sent-197, score-0.648]

58 This additional context helps compensate for the fact that our anchor detection provides only the head word of each anchor, which does not necessarily carry the full context needed for event disambiguation. [sent-198, score-0.858]
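The bucketed-distance part of these features can be illustrated as follows; the bucket boundaries are an assumption, since the actual thresholds are not given here.

def bucket_distance(i, j, buckets=(1, 2, 3, 5, 10, 20)):
    """Map the distance between two token positions onto a coarse bucket."""
    distance = abs(i - j)
    for bound in buckets:
        if distance <= bound:
            return f"dist<={bound}"
    return f"dist>{buckets[-1]}"

print(bucket_distance(4, 11))  # 'dist<=10'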

59 (e.g., only REGULATION events can have edges labeled with CAUSE). [sent-203, score-0.475]

60 For instance, an edge between a BINDING event anchor and a POSITIVE REGULATION could cause this feature to fire with the values [head:EVENT, child:COMPLEX EVENT] or [head:SIMPLE EVENT, child:EVENT]. [sent-209, score-0.936]

61 The latter feature can capture generalizations such as “simple event anchors cannot take other events as arguments.” [sent-210, score-1.062]

62 We include the syntactic paths between sibling nodes (adjacent arguments of the same event anchor). [sent-213, score-0.839]

63 For example, a POSITIVE REGULATION anchor attached to a PROTEIN and a BINDING event would produce an Ontology feature with the value [parent:COMPLEX EVENT, child1:PROTEIN, child2:SIMPLE EVENT] (among several other possible combinations). [sent-216, score-0.858]
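A minimal sketch of how such ontology-backed edge features might be generated, using a hypothetical two-level type ontology (EVENT above SIMPLE EVENT and COMPLEX EVENT); the type inventory below is illustrative, not the full BioNLP’09 ontology.

# Hypothetical type ontology: each node type maps to its generalizations
ONTOLOGY = {
    "PROTEIN": ["PROTEIN"],
    "BINDING": ["SIMPLE EVENT", "EVENT"],
    "TRANSCRIPTION": ["SIMPLE EVENT", "EVENT"],
    "POSITIVE REGULATION": ["COMPLEX EVENT", "EVENT"],
    "NEGATIVE REGULATION": ["COMPLEX EVENT", "EVENT"],
}

def edge_ontology_features(head_type, child_type):
    """Fire one feature for every pair of generalizations of the head and child types."""
    return [f"head:{h}|child:{c}"
            for h in ONTOLOGY.get(head_type, [head_type])
            for c in ONTOLOGY.get(child_type, [child_type])]

print(edge_ontology_features("BINDING", "POSITIVE REGULATION"))
# includes 'head:SIMPLE EVENT|child:COMPLEX EVENT', 'head:EVENT|child:COMPLEX EVENT', ...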

64 The log frequency component favors more frequent features while the entropy component favors features with low entropy over their edge labels. (We define complex events as those which can accept other events as arguments.) [sent-225, score-0.745]
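The selection criterion could be approximated as in the sketch below; the exact way the log frequency and entropy components are combined is not recoverable from this summary, so the formula here is an assumption.

import math
from collections import Counter

def feature_selection_score(edge_labels):
    """edge_labels: the edge labels a candidate feature co-occurs with. Frequent
    features with low label entropy receive the highest scores; combining the two
    components by subtraction is an assumption, not the authors' exact formula."""
    counts = Counter(edge_labels)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.log(total) - entropy

print(feature_selection_score(["THEME", "THEME", "THEME", "CAUSE"]))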

65 Rerankers provide additional advantages in our case due to the mismatch between the dependency structures that the parser operates on and their corresponding event structures. [sent-240, score-0.822]

66 We convert the output from the parser to event structures (Section 3.1). [sent-241, score-0.721]

67 This allows the reranker to capture features over the actual event structures rather than their original dependency trees which may contain extraneous portions. [sent-243, score-1.107]

68 The parser, on the other hand, attempts to optimize the Labeled Attachment Score (LAS) between its output dependency trees and the converted gold dependency trees. [sent-245, score-0.318]

69 First, LAS is much more local than the BioNLP metric. Second, the converted gold dependency trees lose information that does not transfer to trees (specifically, that event structures are really multi-DAGs and not trees). [sent-247, score-0.945]

70 We adapt the maximum entropy reranker from Charniak and Johnson (2005) by creating a customized feature extractor for event structures; in all other ways, the reranker model is unchanged. [sent-248, score-1.175]
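Schematically, the reranking step looks like the following sketch. The linear scoring is a simplification (the actual system reuses the Charniak and Johnson maximum entropy reranker), and to_event_structure and extract_features are placeholder names for the conversion and feature-extraction components.

def rerank(nbest_parses, to_event_structure, extract_features, weights):
    """Convert each n-best dependency parse back to an event structure, extract
    features over that structure, and return the highest-scoring candidate."""
    best, best_score = None, float("-inf")
    for parse in nbest_parses:
        events = to_event_structure(parse)   # drops argument-less anchors, etc.
        feats = extract_features(events)     # e.g. event paths, event frames
        score = sum(weights.get(f, 0.0) for f in feats)
        if score > best_score:
            best, best_score = events, score
    return best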

71 For instance, event anchors with no arguments could be proposed by the parser. [sent-250, score-0.882]

72 These event anchors are automatically dropped by the conversion process. [sent-251, score-0.842]

73 As an example, getting an edge label between an anchor and its argument correct is unimportant if the anchor is missing other arguments. [sent-252, score-0.628]

74 Table 1: BioNLP recall, precision, and F1 scores of individual decoders and the best decoder combination on development data, with the impact of event anchor detection and reranking. [sent-265, score-1.02]

75 Event path: Path from each node in the event tree up to the root. [sent-268, score-0.624]

76 Unlike the path features in the parser, these paths are over event structures, not the syntactic dependency graphs from the original English sentence. [sent-269, score-0.77]

77 Event frames: Event anchors with all their arguments and argument slot names. [sent-274, score-0.341]
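The two reranker feature families mentioned here can be sketched as follows, assuming each node carries its type and knows its parent and (slot, argument) pairs; the string encodings of the features are illustrative.

def event_path_features(node, parent_of, type_of):
    """Path of node types from an event-tree node up to the (virtual) root."""
    path = []
    while node is not None:
        path.append(type_of[node])
        node = parent_of.get(node)
    return ["event_path=" + "/".join(path)]

def event_frame_features(anchor, args, type_of):
    """Event anchor type together with all of its argument slots and argument types."""
    slots = ",".join(f"{slot}:{type_of[arg]}" for slot, arg in sorted(args))
    return [f"frame={type_of[anchor]}({slots})"]

type_of = {"root": "ROOT", "e2": "POSITIVE REGULATION", "e1": "TRANSCRIPTION", "t1": "PROTEIN"}
parent_of = {"e2": "root", "e1": "e2", "t1": "e1"}
print(event_path_features("t1", parent_of, type_of))
print(event_frame_features("e1", [("THEME", "t1")], type_of))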

78 Throughout our experiments, we report BioNLP F1 scores with approximate span and recursive event matching (as described in the shared task definition). [sent-290, score-0.654]

79 We bias the anchor detector to favor recall, allowing the parser and reranker to determine which event anchors will ultimately be used. [sent-292, score-1.338]

80 Table 1a shows the performance of each of the decoders when using gold event anchors. [sent-296, score-0.755]

81 We also present the results from a reranker trained from multiple decoders, which is our highest scoring model. [sent-298, score-0.411]

82 As before, the reranker trained from multiple decoders outperforms unreranked models and reranked single decoders. [sent-305, score-0.411]

83 A small drop in performance is due to our conversion process, which enforces the tree constraint, drops events spanning sentences, and performs approximate reconstruction of BINDING events. [sent-311, score-0.375]

84 An oracle reranker picks the highest scoring parse from the available parses. [sent-321, score-0.313]
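For reference, an oracle reranker is straightforward to sketch once a scoring function against the gold annotation is available; bionlp_f1 below is a placeholder for the official task metric.

def oracle_rerank(nbest_event_structures, gold_events, bionlp_f1):
    """Pick the candidate whose event structure scores highest against the gold
    annotation under the task metric."""
    return max(nbest_event_structures, key=lambda events: bionlp_f1(events, gold_events))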

85 The oracle score with multiple decoders and gold anchors is only 0. [sent-326, score-0.406]

86 Improving the features in the reranker as well as the original parsers will help us move closer to the limit. [sent-329, score-0.313]

87 To get a complex event correct, one must correctly detect and parse all events in its subtree. (Additionally, improvements such as document-level parsing and DAG parsing would eliminate the need for much of the approximate and lossy portions of the conversion process.) [sent-335, score-1.13]

88 Components shown: AD = event anchor detection; Parse = best individual parsing model; RR = reranking multiple parsers; Conv = conversion between the event and dependency representations. [sent-346, score-1.799]

89 Table 3: Oracle reranker BioNLP F1 scores for our n-best decoders and their combinations before reranking on the development corpus. [sent-352, score-0.476]

90 Small errors anywhere in the event subtree can therefore have large effects. [sent-353, score-0.593]

91 For example, the reranker can be used to combine not only several parsers but also multiple anchor recognizers. [sent-361, score-0.576]

92 This passes the anchor selection decision to the reranker, which uses global information not available to the current anchor recognizer or parser. [sent-362, score-0.565]

93 Furthermore, our approach can be adapted to parse event structures in entire documents (instead of individual sentences). Table 4: Results on the test set, broken down by event class; scores generated with the main official metric of approximate span and recursive event matching. [sent-363, score-1.897]

94 This is done by using a representation with a unique ROOT node for all event structures in a document. [sent-364, score-0.706]

95 This representation has the advantage that it maintains cross-sentence events (which account for 5% of BioNLP’09 events), and it allows for document-level features that model discourse structure. [sent-365, score-0.34]

96 One current limitation of the proposed model is that it constrains event structures to map to trees. [sent-367, score-0.677]

97 Local models do not have this limitation, because their local decisions are blind to (and hence not limited by) the global event structure. [sent-370, score-0.628]

98 Conclusion: In this paper, we proposed a simple approach for the joint extraction of event structures: we converted the representation of events and their arguments to dependency trees with arcs between event anchors and event arguments, and used a reranking parser to parse these structures. [sent-375, score-2.769]

99 Most importantly, we showed that the joint modeling of event structures is beneficial: our reranker outperforms parsing models without reranking in five out of the six configurations investigated. [sent-377, score-1.084]

100 Acknowledgments: The authors would like to thank Mark Johnson for helpful discussions on the reranker component, and the BioNLP shared task organizers, Sampo Pyysalo and Jin-Dong Kim, for answering questions. [sent-378, score-0.335]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('event', 0.593), ('events', 0.282), ('bionlp', 0.269), ('anchor', 0.265), ('reranker', 0.249), ('anchors', 0.187), ('regulation', 0.15), ('decoders', 0.135), ('bj', 0.12), ('arguments', 0.102), ('dependency', 0.101), ('theme', 0.1), ('nested', 0.095), ('reranking', 0.092), ('binding', 0.087), ('structures', 0.084), ('orne', 0.079), ('biomedical', 0.079), ('rne', 0.073), ('parsing', 0.066), ('mstparser', 0.065), ('protein', 0.065), ('conversion', 0.062), ('shared', 0.061), ('anchored', 0.061), ('gene', 0.061), ('dependencies', 0.061), ('path', 0.053), ('miwa', 0.053), ('argument', 0.052), ('trees', 0.051), ('entity', 0.049), ('endpoints', 0.049), ('edge', 0.046), ('ontology', 0.045), ('costimulatory', 0.045), ('pyysalo', 0.045), ('rerankers', 0.045), ('sampo', 0.045), ('parser', 0.044), ('edges', 0.043), ('surdeanu', 0.043), ('domain', 0.043), ('graph', 0.043), ('mihai', 0.042), ('stanford', 0.041), ('mcdonald', 0.04), ('johnson', 0.039), ('syntactic', 0.039), ('converted', 0.038), ('paths', 0.037), ('sibling', 0.037), ('token', 0.037), ('proteins', 0.036), ('ichi', 0.035), ('parsers', 0.035), ('global', 0.035), ('transcription', 0.034), ('dag', 0.034), ('entities', 0.034), ('parse', 0.034), ('mcclosky', 0.032), ('cause', 0.032), ('nodes', 0.031), ('tree', 0.031), ('agnostic', 0.031), ('frames', 0.031), ('oracle', 0.03), ('extraction', 0.03), ('biomolecular', 0.03), ('inactivation', 0.03), ('tapio', 0.03), ('underphosphorylated', 0.03), ('mentions', 0.03), ('charniak', 0.03), ('features', 0.029), ('kim', 0.029), ('representation', 0.029), ('finkel', 0.029), ('jun', 0.028), ('decoder', 0.027), ('complex', 0.027), ('acts', 0.027), ('gold', 0.027), ('multiple', 0.027), ('poon', 0.026), ('dags', 0.026), ('receptor', 0.026), ('consistency', 0.026), ('named', 0.025), ('component', 0.025), ('riedel', 0.025), ('parses', 0.025), ('limit', 0.025), ('marneffe', 0.025), ('mln', 0.024), ('bucketed', 0.024), ('lluis', 0.024), ('marquez', 0.024), ('crime', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

2 0.55286574 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu

Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction by well using cross-entity inference. In contrast to previous inference methods, we regard entitytype consistency as key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction. 1

3 0.47309569 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

Author: Ruihong Huang ; Ellen Riloff

Abstract: The goal of our research is to improve event extraction by learning to identify secondary role filler contexts in the absence of event keywords. We propose a multilayered event extraction architecture that progressively “zooms in” on relevant information. Our extraction model includes a document genre classifier to recognize event narratives, two types of sentence classifiers, and noun phrase classifiers to extract role fillers. These modules are organized as a pipeline to gradually zero in on event-related information. We present results on the MUC-4 event extraction data set and show that this model performs better than previous systems.

4 0.45193619 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Author: Shasha Liao ; Ralph Grishman

Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference. 1

5 0.1913576 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

Author: Risa Kitajima ; Ichiro Kobayashi

Abstract: Recently, several latent topic analysis methods such as LSI, pLSI, and LDA have been widely used for text analysis. However, those methods basically assign topics to words, but do not account for the events in a document. With this background, in this paper, we propose a latent topic extracting method which assigns topics to events. We also show that our proposed method is useful to generate a document summary based on a latent topic.

6 0.18241069 293 acl-2011-Template-Based Information Extraction without the Templates

7 0.15982316 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

8 0.13672803 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

9 0.13128985 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

10 0.12633736 121 acl-2011-Event Discovery in Social Media Feeds

11 0.10525759 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

12 0.10124099 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

13 0.099935949 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

14 0.099866599 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

15 0.09878277 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

16 0.095653422 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

17 0.094851457 175 acl-2011-Integrating history-length interpolation and classes in language modeling

18 0.093581028 301 acl-2011-The impact of language models and loss functions on repair disfluency detection

19 0.087518886 333 acl-2011-Web-Scale Features for Full-Scale Parsing

20 0.087121747 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.214), (1, 0.057), (2, -0.296), (3, -0.113), (4, 0.355), (5, 0.258), (6, -0.134), (7, -0.042), (8, 0.391), (9, 0.022), (10, -0.046), (11, 0.067), (12, 0.017), (13, 0.026), (14, 0.02), (15, 0.051), (16, 0.088), (17, 0.05), (18, -0.01), (19, -0.025), (20, -0.004), (21, -0.045), (22, -0.021), (23, 0.023), (24, 0.035), (25, -0.002), (26, -0.013), (27, -0.008), (28, -0.011), (29, 0.05), (30, 0.018), (31, 0.037), (32, -0.02), (33, 0.018), (34, 0.035), (35, -0.02), (36, -0.03), (37, 0.029), (38, -0.016), (39, -0.055), (40, 0.018), (41, -0.032), (42, -0.001), (43, -0.01), (44, -0.003), (45, 0.028), (46, 0.024), (47, 0.017), (48, 0.045), (49, -0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96830577 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

2 0.93020916 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu

Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction by well using cross-entity inference. In contrast to previous inference methods, we regard entitytype consistency as key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction. 1

3 0.92742032 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

Author: Ruihong Huang ; Ellen Riloff

Abstract: The goal of our research is to improve event extraction by learning to identify secondary role filler contexts in the absence of event keywords. We propose a multilayered event extraction architecture that progressively “zooms in” on relevant information. Our extraction model includes a document genre classifier to recognize event narratives, two types of sentence classifiers, and noun phrase classifiers to extract role fillers. These modules are organized as a pipeline to gradually zero in on event-related information. We present results on the MUC-4 event extraction data set and show that this model performs better than previous systems.

4 0.88010353 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Author: Shasha Liao ; Ralph Grishman

Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference. 1

5 0.62255734 293 acl-2011-Template-Based Information Extraction without the Templates

Author: Nathanael Chambers ; Dan Jurafsky

Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.

6 0.61650336 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

7 0.5622527 121 acl-2011-Event Discovery in Social Media Feeds

8 0.56013775 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life

9 0.43171674 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

10 0.38378224 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

11 0.29750928 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

12 0.28886604 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

13 0.27936035 243 acl-2011-Partial Parsing from Bitext Projections

14 0.27914858 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing

15 0.27712178 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

16 0.27645949 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

17 0.27629748 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

18 0.27347085 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

19 0.26865557 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

20 0.2517983 333 acl-2011-Web-Scale Features for Full-Scale Parsing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.02), (17, 0.038), (26, 0.013), (37, 0.475), (39, 0.054), (41, 0.078), (55, 0.022), (59, 0.036), (72, 0.032), (91, 0.027), (96, 0.1), (97, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9766503 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

Author: Kevin Duh ; Akinori Fujino ; Masaaki Nagata

Abstract: Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior work have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about crosslingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully-designed experiments that led us to these conclusions.

2 0.9478547 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

Author: Roy Schwartz ; Omri Abend ; Roi Reichart ; Ari Rappoport

Abstract: Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon.

3 0.9469406 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.

4 0.94415271 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai

Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to wordto-word selectional preferences by using webscale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.

same-paper 5 0.93384939 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

6 0.93235922 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

7 0.9265343 204 acl-2011-Learning Word Vectors for Sentiment Analysis

8 0.92367542 334 acl-2011-Which Noun Phrases Denote Which Concepts?

9 0.87839699 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

10 0.83031589 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

11 0.8263905 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

12 0.82142055 256 acl-2011-Query Weighting for Ranking Model Adaptation

13 0.81977236 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

14 0.8131367 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

15 0.80647069 85 acl-2011-Coreference Resolution with World Knowledge

16 0.79607838 199 acl-2011-Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning

17 0.79525787 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

18 0.79359955 292 acl-2011-Target-dependent Twitter Sentiment Classification

19 0.79352361 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

20 0.78323936 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation