acl acl2011 acl2011-165 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre
Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. [sent-14, score-1.11]
2 Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. [sent-15, score-0.133]
3 To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. [sent-16, score-0.867]
4 As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes. [sent-19, score-0.348]
5 1 Introduction Since the beginning of the new millennium, there has been a growing need in the medical community for Natural Language Processing (NLP) technology to provide computable information from narrative text and enable improved data quality and decision-making. [sent-20, score-0.402]
6 documents in the electronic health record) are also realizing that the transition to machine learning techniques from traditional rule-based methods can lead to more efficient ways to process increasingly large collections of clinical narratives. [sent-23, score-0.237]
7 In this paper, we focus on the medical assertion classification task. [sent-25, score-0.62]
8 Given a medical problem mentioned in a clinical text, an assertion classifier must look at the context and determine how the medical problem pertains to the patient by assigning one of six labels: present, absent, hypothetical, possible, conditional, or not associated with the patient. [sent-26, score-1.746]
9 The corpus for this task consists of discharge summaries from Partners HealthCare (Boston, MA) and Beth Israel Deaconess Medical Center, as well as discharge summaries and progress notes from the University of Pittsburgh Medical Center (Pittsburgh, PA). [sent-27, score-0.294]
10 However, two of the assertion categories (present and absent) accounted for nearly 90% of the instances in the data set, while the other four classes were relatively infrequent. [sent-30, score-0.62]
11 When we analyzed our results, we saw that our performance on the four minority classes was weak. [sent-31, score-0.434]
12 Even though the minority classes are not common, they are extremely important to identify accurately (e.g., a medical problem not associated with the patient should not be assigned to the patient). [sent-35, score-0.404]
14 In this paper, we present our efforts to reduce the performance gap between the dominant assertion classes and the minority classes. [sent-38, score-0.941]
15 We made three types of changes to address this issue: we changed the multi-class learning strategy, filtered the training data to remove redundancy, and added new features specifically designed to increase recall on the minority classes. [sent-39, score-0.415]
16 We compare the performance of our new classifier with our original i2b2/VA Challenge classifier and show that it performs substantially better on the minority classes, while increasing overall performance as well. [sent-40, score-0.085]
18 2 Related Work During the Fourth i2b2/VA Challenge, the assertion classification task was tackled by many participating teams. [sent-43, score-0.491]
19 The best system's breakdown of F1 scores on the individual classes was: present 95. [sent-47, score-0.132]
20 Previously, some researchers had developed systems to recognize specific assertion categories. [sent-56, score-0.491]
21 Chapman et al. (2001) created the NegEx algorithm, a simple rule-based system that uses regular expressions with trigger terms to determine whether a medical term is absent in a patient. [sent-58, score-0.722]
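NegEx's core idea fits in a few lines; the following toy sketch (our illustration, with made-up trigger terms and window size, not Chapman et al.'s actual rule set) conveys the flavor:

```python
# A highly simplified NegEx-style check (illustrative only, not the actual
# algorithm): a trigger term shortly preceding the concept marks it absent.
import re

NEG_TRIGGERS = r"(?:no|denies|without|negative for)"

def is_negated(sentence: str, concept: str) -> bool:
    # trigger, then at most five intervening words, then the concept
    pattern = rf"\b{NEG_TRIGGERS}\b(?:\W+\w+){{0,5}}?\W+{re.escape(concept)}"
    return re.search(pattern, sentence, re.IGNORECASE) is not None

print(is_negated("The patient denies any chest pain.", "chest pain"))  # True
print(is_negated("The patient reports chest pain.", "chest pain"))     # False
```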
22 Chapman et al. (2007) also introduced the ConText algorithm, which extended the NegEx algorithm to detect four assertion categories: absent, hypothetical, historical, and not associated with the patient. [sent-63, score-0.496]
23 Uzuner et al. (2009) developed the Statistical Assertion Classifier (StAC) and showed that a machine learning approach for assertion classification could achieve results competitive with their own implementation of the Extended NegEx algorithm (ENegEx). [sent-65, score-0.521]
24 They used four assertion classes: present, absent, uncertain in the patient, or not associated with the patient. [sent-66, score-0.496]
25 3 The Assertion Classifier We approach the assertion classification task as a supervised learning problem. [sent-67, score-0.491]
26 The classifier is given a medical term within a sentence as input and must assign one of the six assertion categories to the medical term based on its surrounding context. [sent-68, score-1.478]
27 The architecture includes a section detector (adapted from earlier work by Meystre and Haug (2005)), a tokenizer (based on regular expressions to split text on white space characters), a part-of-speech (POS) tagger (OpenNLP (Baldridge et al., 2001)), and a normalizer based on the LVG (Lexical Variants Generation) (LVG, 2010) annotator from cTAKES to retrieve normalized word forms. [sent-71, score-0.032]
29 The assertion classifier uses features extracted by the subcomponents to represent training and test instances. [sent-75, score-0.611]
30 We used LIBSVM, a library for support vector machines (Chang and Lin, 2001), for multi-class classification with the RBF (Radial Basis Function) kernel. [sent-76, score-0.035]
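The paper used LIBSVM directly; as a rough sketch of the setup, the snippet below uses scikit-learn's SVC (which wraps libsvm) over one-hot encoded string features. The feature names and toy instances are placeholders, not the paper's actual feature set:

```python
# Minimal sketch (not the authors' code): a multi-class RBF-kernel SVM over
# sparse one-hot feature vectors, via scikit-learn's libsvm wrapper.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

LABELS = ["present", "absent", "hypothetical", "possible",
          "conditional", "not_associated_with_patient"]

# Each training instance is a dict of feature-name -> value pairs.
train_feats = [
    {"term": "pneumonia", "prev_word_1": "denies", "prev_pos_1": "VBZ"},
    {"term": "fever", "prev_word_1": "with", "prev_pos_1": "IN"},
]
train_labels = ["absent", "present"]

vectorizer = DictVectorizer()              # one-hot encodes string features
X = vectorizer.fit_transform(train_feats)  # sparse matrix

# SVC uses libsvm internally; the RBF kernel is the default. libsvm's
# built-in multi-class scheme is 1-vs-1, as in the original system.
clf = SVC(kernel="rbf", decision_function_shape="ovo")
clf.fit(X, train_labels)

test_feats = [{"term": "rash", "prev_word_1": "denies", "prev_pos_1": "VBZ"}]
print(clf.predict(vectorizer.transform(test_feats)))
```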
31 3.2 Original i2b2 Feature Set The assertion classifier that we created for the i2b2/VA Challenge used the features listed below, which we developed by manually examining the training data: Lexical Features: The medical term itself, the three words preceding it, and the three words following it. [sent-78, score-1.179]
32 Syntactic Features: Part-of-speech tags of the three words preceding the medical term and the three words following it. [sent-83, score-0.507]
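A minimal sketch of these window features (the helper, its naming, and the toy sentence are our assumptions, not the authors' code):

```python
# Extract the lexical and syntactic window features described above for a
# medical term spanning tokens[start:end] in a POS-tagged sentence.
def window_features(tokens, pos_tags, start, end, window=3):
    feats = {"term": " ".join(tokens[start:end])}
    for i in range(1, window + 1):
        left, right = start - i, end - 1 + i
        if left >= 0:
            feats[f"prev_word_{i}"] = tokens[left]
            feats[f"prev_pos_{i}"] = pos_tags[left]
        if right < len(tokens):
            feats[f"next_word_{i}"] = tokens[right]
            feats[f"next_pos_{i}"] = pos_tags[right]
    return feats

tokens = ["the", "patient", "denies", "any", "chest", "pain", "today"]
pos    = ["DT", "NN", "VBZ", "DT", "NN", "NN", "NN"]
print(window_features(tokens, pos, start=4, end=6))  # term = "chest pain"
```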
33 Lexico-Syntactic Features: We also defined features representing words corresponding to several parts-of-speech in the same sentence as the medical term. [sent-84, score-0.472]
34 The value for each feature is the normalized word string. [sent-85, score-0.075]
35 To mitigate the limited window size of lexical features, we defined one feature each for the nearest preceding and following adjective, adverb, preposition, and verb, and one additional preceding adjective and preposition and one additional following verb and preposition. [sent-86, score-0.275]
36 Context Features: We used the ConText algorithm (Chapman et al., 2001) to detect four contextual properties in the sentence: absent (negation), hypothetical, historical, and not associated with the patient. [sent-88, score-0.295]
37 We also created one feature to represent the Section Header, with a string value normalized using the method of Meystre and Haug (2005). [sent-90, score-0.136]
38 The system using only the contextual features gave reasonable results: an overall F1-measure of 89. [sent-91, score-0.118]
39 Feature Pruning: We created an UNKNOWN feature value to cover rarely seen feature values. [sent-96, score-0.141]
40 Lexical feature values that had frequency < 4 and other feature values that had frequency < 2 were all encoded as UNKNOWNs. [sent-97, score-0.138]
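A minimal sketch of this pruning step, assuming a naming convention that marks which features are lexical (the convention is ours, not the paper's):

```python
# Collapse rarely seen feature values into a shared UNKNOWN value, using the
# frequency cutoffs stated above (< 4 for lexical features, < 2 otherwise).
from collections import Counter

LEXICAL_PREFIXES = ("term", "prev_word", "next_word")  # assumed lexical names

def prune_rare_values(instances):
    counts = Counter((n, v) for feats in instances for n, v in feats.items())
    pruned = []
    for feats in instances:
        new_feats = {}
        for name, value in feats.items():
            min_count = 4 if name.startswith(LEXICAL_PREFIXES) else 2
            new_feats[name] = value if counts[(name, value)] >= min_count else "UNKNOWN"
        pruned.append(new_feats)
    return pruned
```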
41 3.3 New Features for Improvements After the i2b2/VA Challenge submission, we added the following new features, specifically to try to improve performance on the minority classes: Lexical Features: We created a second set of lexical features that were case-insensitive. [sent-99, score-0.444]
42 We also created three additional binary features for each lexical feature. [sent-100, score-0.214]
43 We computed the average tf-idf score for the words comprising the medical term itself, for the three words to its left, and for the three words to its right. [sent-101, score-0.454]
44 Each binary feature has a value of true if the corresponding average tf-idf score is smaller than a threshold. [sent-102, score-0.109]
45 Finally, we created another binary feature that is true if the medical term contains a word with a negative prefix. [sent-106, score-0.624]
46 (Negative prefixes: ab, de, di, il, im, in, ir, re, un, no, mel, mal, mis.) [sent-107, score-0.112]
47 Lexico-Syntactic Features: We defined two binary features that check for the presence of a comma or question mark adjacent to the medical term. [sent-109, score-0.402]
48 We also defined features for the nearest preceding and following modal verb and wh-adverb. [sent-110, score-0.154]
49 Finally, we reduced the scope of these features from the entire sentence to a context window of size eight around the medical term. [sent-113, score-0.502]
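Taken together, a sketch of several of these new binary features (the tf-idf table, the threshold value, and all helper names are illustrative assumptions):

```python
NEG_PREFIXES = ("ab", "de", "di", "il", "im", "in", "ir", "re",
                "un", "no", "mel", "mal", "mis")

def new_binary_features(tokens, start, end, tfidf, threshold=0.1):
    term = tokens[start:end]
    left = tokens[max(0, start - 3):start]
    right = tokens[end:end + 3]

    def avg(words):
        return sum(tfidf.get(w, 0.0) for w in words) / len(words) if words else 0.0

    feats = {
        "term_low_tfidf": avg(term) < threshold,
        "left_low_tfidf": avg(left) < threshold,
        "right_low_tfidf": avg(right) < threshold,
        "neg_prefix": any(w.lower().startswith(NEG_PREFIXES) for w in term),
    }
    # the two lexico-syntactic punctuation-adjacency features
    neighbors = ([tokens[start - 1]] if start > 0 else []) + \
                ([tokens[end]] if end < len(tokens) else [])
    feats["comma_adjacent"] = "," in neighbors
    feats["question_mark_adjacent"] = "?" in neighbors
    return feats
```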
50 Sentence Features: We created two binary features to represent whether a sentence is long (> 50 words) or short (<= 50 words), and whether the sentence contains more than 5 punctuation marks, primarily to identify sentences containing lists. [sent-114, score-0.173]
51 Context Features: We created a second set of ConText algorithm properties for negation, restricted to the six-word context window around the medical term. [sent-115, score-0.565]
52 According to the assertion annotation guidelines, problems associated with allergies were defined as conditional. [sent-116, score-0.676]
53 So we added one binary feature that is true if the section headers contain terms related to allergies. [sent-117, score-0.335]
54 Feature Pruning: We changed the pruning strategy to use document frequency values instead of corpus frequency for the lexical features, and used document frequency > 1 for normalized words and > 2 for case-insensitive words as thresholds. [sent-120, score-0.263]
55 We also removed 57 redundant instances from the training set. [sent-121, score-0.032]
56 Finally, when a medical term co-occurs with other medical terms (problem concepts) in the same sentence, the others are excluded from the lexical and lexico-syntactic features. (Footnote 2: We hoped to help the classifier recognize lists for negation scoping, although no scoping features were added per se.) [sent-122, score-0.897]
57 3.4 Multi-class Learning Strategies Our original i2b2 system used a 1-vs-1 classification strategy. [sent-124, score-0.035]
58 This approach creates one classifier for each possible pair of labels (e.g., one classifier decides whether an instance is present vs. absent). [sent-125, score-0.085]
60 Earlier work (2001) reported that this approach did not work well for data sets that had highly unbalanced class probabilities. [sent-133, score-0.078]
61 Therefore we experimented with an alternative 1-vs-all classification strategy. [sent-134, score-0.035]
62 In this approach, we create one classifier for each type of label, using instances with that label as positive instances and instances with any other label as negative instances. [sent-135, score-0.282]
64 The final class label is assigned by choosing the class with the highest confidence value. [sent-137, score-0.082]
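A minimal sketch of this 1-vs-all strategy (our illustration, not the authors' code; scikit-learn's OneVsRestClassifier packages the same idea):

```python
# One binary RBF SVM per label; the final label is the one whose classifier
# returns the largest decision value. Assumes every label occurs in y.
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all(X, y, labels):
    models = {}
    for label in labels:
        binary_y = [1 if yi == label else 0 for yi in y]
        models[label] = SVC(kernel="rbf").fit(X, binary_y)
    return models

def predict_one_vs_all(models, X):
    # decision_function: signed distance to each binary separating surface
    scores = np.column_stack([m.decision_function(X) for m in models.values()])
    labels = list(models)
    return [labels[i] for i in scores.argmax(axis=1)]
```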
65 4 Evaluation After changing to the 1-vs-all multi-class strategy and adding the new feature set, we evaluated our improved system on the test data and compared its performance with our original system. [sent-140, score-0.08]
66 4.1 Data The training set includes 349 clinical notes, with 11,967 assertions of medical problems. [sent-142, score-0.791]
67 These assertions were distributed as shown in Table 1 (Table 1: Assertions Distribution). [sent-144, score-0.183]
68 4.2 Results For the i2b2/VA Challenge submission, our system showed good performance, with a micro-averaged F1-measure of 93. [sent-145, score-0.03]
69 However, the macro F1-measure was much lower because our recall on the minority classes was weak. [sent-147, score-0.45]
70 For example, most of the conditional test cases were misclassified as present. [sent-148, score-0.064]
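The gap between the two averages follows directly from this kind of skew; a toy illustration with made-up labels:

```python
# Micro-averaged F1 pools all decisions, so the dominant class dominates;
# macro-averaged F1 weights every class equally.
from sklearn.metrics import f1_score

y_true = ["present"] * 90 + ["conditional"] * 10
y_pred = ["present"] * 90 + ["present"] * 8 + ["conditional"] * 2

print(f1_score(y_true, y_pred, average="micro"))  # 0.92
print(f1_score(y_true, y_pred, average="macro"))  # ~0.65, dragged down by conditional
```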
71 With our improvements, the micro-averaged F1-measure reached 94.17%, which now outperforms the best official score reported for the 2010 i2b2 challenge (which was 93. [sent-151, score-0.081]
72 The F1-measure improved in all classes, but we saw especially large improvements with the possible class (+6. [sent-156, score-0.071]
73 Although the improvement on the dominant classes was limited in absolute terms (+0.86% for absent), the relative reduction in error rate was greater than for the minority classes: -29. [sent-159, score-0.181]
75 4.3 Analysis We performed five-fold cross validation on the training data to measure the impact of each of the four subsets of features explained in Section 3.3. [sent-166, score-0.129]
76 Table 3 shows the cross validation results when cumulatively adding each set of features. [sent-167, score-0.119]
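A sketch of the cumulative-ablation loop (the group names and the extract() helper are placeholders standing in for the paper's feature extractors):

```python
# Add one feature group at a time and measure five-fold cross-validation F1.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

GROUPS = ["original", "lexical_new", "lexico_syntactic_new",
          "sentence", "context_new"]

def ablation(instances, y, extract):
    active = []
    for group in GROUPS:
        active.append(group)
        X = [extract(inst, active) for inst in instances]  # dicts of features
        pipe = make_pipeline(DictVectorizer(), SVC(kernel="rbf"))
        scores = cross_val_score(pipe, X, y, cv=5, scoring="f1_macro")
        print(f"+{group}: macro-F1 = {scores.mean():.3f}")
```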
77 Applying the 1-vs-all strategy showed interesting results: recall went up and precision went down for all classes except present. [sent-168, score-0.32]
78 Although the overall F1-measure remained almost the same, it helped to increase the recall on the minority classes, and we were able to gain most of the precision back (without sacrificing this recall) by adding the new features. [sent-169, score-0.365]
79 The new lexical features, including negative prefixes and binary tf-idf features, primarily increased performance on the absent class. [sent-170, score-0.49]
80 Using document frequency to prune lexical features showed small gains in all classes except absent. [sent-171, score-0.302]
81 Sentence features helped recognize hypothetical assertions, which often occur in relatively long sentences. [sent-172, score-0.252]
82 The possible class benefited the most from the new lexico-syntactic features, with a 3. [sent-173, score-0.041]
83 The new contextual features helped detect more conditional cases. [sent-177, score-0.229]
84 Section headers related to allergies (e.g., “Allergies”, “Allergies and Medicine Reactions”, “Allergies/Sensitivities”, “Allergy”, and “Medication Allergies”) were associated with conditional assertions. [sent-180, score-0.104]
85 Together, all the new features increased recall by 26. [sent-181, score-0.145]
86 5 Conclusions We created a more accurate assertion classifier that now achieves state-of-the-art performance on assertion labeling for clinical texts. [sent-186, score-1.264]
87 We showed that it is possible to improve performance on the minority classes by using a 1-vs-all strategy and richer features designed with those classes in mind. [sent-187, score-0.948]
88 However, performance on the minority classes still lags behind the dominant classes, so more work is needed in this area. [sent-188, score-0.453]
89 Wendy W. Chapman, David Chu, and John N. Dowling. 2007. ConText: an algorithm for identifying contextual features from clinical text. In BioNLP 2007: Biological, translational, and clinical language processing, Prague, CZ. [sent-224, score-0.206]
90 Guergana K. Savova, James J. Masanz, Philip V. Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C. Kipper-Schuler, and Christopher G. Chute. 2010. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association, 17(5):507–513. [sent-265, score-0.206]
wordName wordTfidf (topN-words)
[('assertion', 0.456), ('medical', 0.402), ('minority', 0.272), ('absent', 0.207), ('clinical', 0.206), ('assertions', 0.183), ('allergies', 0.18), ('chapman', 0.137), ('classes', 0.132), ('utah', 0.122), ('lvg', 0.12), ('med', 0.109), ('meystre', 0.106), ('patient', 0.1), ('hypothetical', 0.1), ('discharge', 0.097), ('bruijn', 0.09), ('ctakes', 0.09), ('uzuner', 0.09), ('classifier', 0.085), ('challenge', 0.081), ('negex', 0.073), ('uima', 0.073), ('berry', 0.073), ('features', 0.07), ('inform', 0.066), ('salt', 0.065), ('conditional', 0.064), ('biomedical', 0.063), ('created', 0.061), ('cumulatively', 0.06), ('haug', 0.06), ('mccray', 0.06), ('medication', 0.06), ('phane', 0.06), ('youngjun', 0.06), ('lake', 0.06), ('informatics', 0.056), ('savova', 0.053), ('preceding', 0.053), ('term', 0.052), ('dominant', 0.049), ('scoping', 0.049), ('wendy', 0.049), ('contextual', 0.048), ('helped', 0.047), ('recall', 0.046), ('healthcare', 0.046), ('headers', 0.046), ('notes', 0.046), ('ut', 0.045), ('ferrucci', 0.043), ('apache', 0.043), ('negation', 0.043), ('binary', 0.042), ('submission', 0.042), ('lexical', 0.041), ('class', 0.041), ('associated', 0.04), ('feature', 0.04), ('strategy', 0.04), ('libsvm', 0.04), ('fourth', 0.039), ('opennlp', 0.039), ('unbalanced', 0.037), ('translational', 0.037), ('baldridge', 0.036), ('went', 0.036), ('normalized', 0.035), ('historical', 0.035), ('recognize', 0.035), ('classification', 0.035), ('pruning', 0.033), ('city', 0.033), ('va', 0.033), ('decides', 0.033), ('instances', 0.032), ('efforts', 0.032), ('architecture', 0.032), ('prefixes', 0.031), ('health', 0.031), ('cherry', 0.031), ('nearest', 0.031), ('validation', 0.031), ('window', 0.03), ('saw', 0.03), ('showed', 0.03), ('frequency', 0.029), ('increased', 0.029), ('six', 0.029), ('cross', 0.028), ('riloff', 0.028), ('pittsburgh', 0.028), ('changed', 0.027), ('preposition', 0.027), ('summaries', 0.027), ('reached', 0.027), ('true', 0.027), ('xiaodan', 0.026), ('pertains', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes
Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre
Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
2 0.14954014 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
Author: Svetlana Kiritchenko ; Colin Cherry
Abstract: The automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can be viewed as multi-label document classification, the coding problem has the interesting property that most code assignments can be supported by a single phrase found in the input document. We propose a Lexically-Triggered Hidden Markov Model (LT-HMM) that leverages these phrases to improve coding accuracy. The LT-HMM works in two stages: first, a lexical match is performed against a term dictionary to collect a set of candidate codes for a document. Next, a discriminative HMM selects the best subset of codes to assign to the document by tagging candidates as present or absent. By confirming codes proposed by a dictionary, the LT-HMM can share features across codes, enabling strong performance even on rare codes. In fact, we are able to recover codes that do not occur in the training set at all. Our approach achieves the best ever performance on the 2007 Medical NLP Challenge test set, with an F-measure of 89.84.
3 0.11556838 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
Author: Balaji Soundrarajan ; Thomas Ginter ; Scott DuVall
Abstract: This demonstration presents the Annotation Librarian, an application programming interface that supports rapid development of natural language processing (NLP) projects built in Apache Unstructured Information Management Architecture (UIMA). The flexibility of UIMA to support all types of unstructured data – images, audio, and text – increases the complexity of some of the most common NLP development tasks. The Annotation Librarian interface handles these common functions and allows the creation and management of annotations by mirroring Java methods used to manipulate Strings. The familiar syntax and NLP-centric design allows developers to adopt and rapidly develop NLP algorithms in UIMA. The general functionality of the interface is described in relation to the use cases that necessitated its creation.
4 0.097612284 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman
Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora.
5 0.058480777 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents
Author: Emmanuel Prochasson ; Pascale Fung
Abstract: We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain very high F-Measure between 80% and 98% for recognizing and extracting correct translations for rare terms (from 1to 5 occurrences). Moreover, we show that our system can be trained on a pair of languages and test on a different pair of languages, obtaining a F-Measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore even potentially applicable to low resources languages without training data.
6 0.052689523 273 acl-2011-Semantic Representation of Negation Using Focus Detection
7 0.049559258 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
8 0.049032878 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
9 0.048898939 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text
10 0.047268271 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
11 0.045747381 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
12 0.045648441 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts
13 0.045332734 150 acl-2011-Hierarchical Text Classification with Latent Concepts
14 0.042803191 163 acl-2011-Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes
15 0.04193582 24 acl-2011-A Scalable Probabilistic Classifier for Language Modeling
16 0.041628964 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
17 0.040137496 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
18 0.039998338 133 acl-2011-Extracting Social Power Relationships from Natural Language
19 0.039270684 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification
20 0.039204888 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
topicId topicWeight
[(0, 0.127), (1, 0.036), (2, -0.032), (3, -0.004), (4, -0.012), (5, 0.028), (6, 0.035), (7, -0.004), (8, 0.007), (9, 0.007), (10, -0.038), (11, -0.051), (12, -0.011), (13, 0.05), (14, -0.016), (15, 0.0), (16, -0.025), (17, -0.017), (18, 0.048), (19, -0.05), (20, 0.026), (21, -0.099), (22, 0.022), (23, -0.027), (24, -0.002), (25, 0.074), (26, 0.034), (27, 0.054), (28, -0.075), (29, 0.008), (30, 0.031), (31, -0.016), (32, 0.063), (33, 0.034), (34, -0.097), (35, 0.095), (36, 0.011), (37, -0.044), (38, 0.071), (39, 0.032), (40, 0.066), (41, 0.049), (42, 0.009), (43, -0.099), (44, -0.017), (45, -0.056), (46, -0.022), (47, 0.045), (48, -0.086), (49, 0.137)]
simIndex simValue paperId paperTitle
same-paper 1 0.89625913 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes
Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre
Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
2 0.81869888 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
Author: Svetlana Kiritchenko ; Colin Cherry
Abstract: The automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can be viewed as multi-label document classification, the coding problem has the interesting property that most code assignments can be supported by a single phrase found in the input document. We propose a Lexically-Triggered Hidden Markov Model (LT-HMM) that leverages these phrases to improve coding accuracy. The LT-HMM works in two stages: first, a lexical match is performed against a term dictionary to collect a set of candidate codes for a document. Next, a discriminative HMM selects the best subset of codes to assign to the document by tagging candidates as present or absent. By confirming codes proposed by a dictionary, the LT-HMM can share features across codes, enabling strong performance even on rare codes. In fact, we are able to recover codes that do not occur in the training set at all. Our approach achieves the best ever performance on the 2007 Medical NLP Challenge test set, with an F-measure of 89.84.
3 0.59759825 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman
Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora. 1 Motivation Information Extraction (IE) systems often face the problem of distinguishing between affirmed, negated, and speculative information in text. For example, sentiment analysis systems need to detect negation for accurate polarity classification. Similarly, medical IE systems need to differentiate between affirmed, negated, and speculated (possible) medical conditions. The importance of the task of negation and speculation (a.k.a. hedge) detection is attested by a number of research initiatives. The creation of the BioScope corpus (Vincze et al., 2008) assisted in the development and evaluation of several negation/hedge scope detection systems. The corpus consists of medical and biological texts annotated for negation, speculation, and their linguistic scope. The 2010 283 Noriko Tomuro Dina Demner-Fushman DePaul University Chicago, IL USA t omuro @ c s . depaul . edu National Library of Medicine Bethesda, MD USA ddemne r@mai l nih . gov . i2b2 NLP Shared Task1 included a track for detection of the assertion status of medical problems (e.g. affirmed, negated, hypothesized, etc.). The CoNLL2010 Shared Task (Farkas et al., 2010) focused on detecting hedges and their scopes in Wikipedia articles and biomedical texts. In this paper, we present a linguistically motivated rule-based system for the detection of negation and speculation scopes that performs on par with state-of-the-art machine learning systems. The rules used by the ScopeFinder system are automatically extracted from the BioScope corpus and encode lexico-syntactic patterns in a user-friendly format. While the system was developed and tested using a biomedical corpus, the rule extraction mechanism is not domain-specific. In addition, the linguistically motivated rule encoding allows for manual adaptation to new domains and corpora. 2 Task Definition Negation/Speculation detection is typically broken down into two sub-tasks - discovering a negation/speculation cue and establishing its scope. The following example from the BioScope corpus shows the annotated hedging cue (in bold) together with its associated scope (surrounded by curly brackets): Finally, we explored the {possible role of 5hydroxyeicosatetraenoic acid as a regulator of arachidonic acid liberation}. Typically, systems first identify negation/speculation cues and subsequently try to identify their associated cue scope. However, the two tasks are interrelated and both require 1https://www.i2b2.org/NLP/Relations/ Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 283–287, syntactic understanding. Consider the following two sentences from the BioScope corpus: 1) By contrast, {D-mib appears to be uniformly expre1ss)e Bdy yin c oimnatrgaisnta,l { dDis-mcsi }b. 
2) Differentiation assays using water soluble phorbol esters reveal that differentiation becomes irreversible soon after AP-1 appears. Both sentences contain the word form appears, however in the first sentence the word marks a hedg- ing cue, while in the second sentence the word does not suggest speculation. Unlike previous work, we do not attempt to identify negation/speculation cues independently of their scopes. Instead, we concentrate on scope detection, simultaneously detecting corresponding cues. 3 Dataset We used the BioScope corpus (Vincze et al., 2008) to develop our system and evaluate its performance. To our knowledge, the BioScope corpus is the only publicly available dataset annotated with negation/speculation cues and their scopes. It consists of biomedical papers, abstracts, and clinical reports (corpus statistics are shown in Tables 1 and 2). Corpus Type Sentences Documents Mean Document Size Clinical752019543.85 Full Papers Paper Abstracts 3352 14565 9 1273 372.44 11.44 Table 1: Statistics of the BioScope corpus. Document sizes represent number of sentences. Corpus Type Negation Cues Speculation Cues Negation Speculation Clinical87211376.6%13.4% Full Papers Paper Abstracts 378 1757 682 2694 13.76% 13.45% 22.29% 17.69% Table 2: Statistics of the BioScope corpus. The 2nd and 3d columns show the total number of cues within the datasets; the 4th and 5th columns show the percentage of negated and speculative sentences. 70% ofthe corpus documents (randomly selected) were used to develop the ScopeFinder system (i.e. extract lexico-syntactic rules) and the remaining 30% were used to evaluate system performance. While the corpus focuses on the biomedical domain, our rule extraction method is not domain specific and in future work we are planning to apply our method on different types of corpora. 4 Method Intuitively, rules for detecting both speculation and negation scopes could be concisely expressed as a 284 Figure 1: Parse tree of the sentence ‘T cells {lack active NFkappa B } bPuatr express Sp1 as expected’ generated by cthtiev eS NtanF-fkoaprdp parser. Speculation scope ewxporedcste are gsehnoewrant eind ellipsis. tTanhecue word is shown in grey. The nearest common ancestor of all cue and scope leaf nodes is shown in a box. combination of lexical and syntactic patterns. example, BioScope O¨zg u¨r For and Radev (2009) examined sample sentences and developed hedging scope rules such as: The scope of a modal verb cue (e.g. may, might, could) is the verb phrase to which it is attached; The scope of a verb cue (e.g. appears, seems) followed by an infinitival clause extends to the whole sentence. Similar lexico-syntactic rules have been also manually compiled and used in a number of hedge scope detection systems, e.g. (Kilicoglu and Bergler, 2008), (Rei and Briscoe, 2010), (Velldal et al., 2010), (Kilicoglu and Bergler, 2010), (Zhou et al., 2010). However, manually creating a comprehensive set of such lexico-syntactic scope rules is a laborious and time-consuming process. In addition, such an approach relies heavily on the availability of accurately parsed sentences, which could be problematic for domains such as biomedical texts (Clegg and Shepherd, 2007; McClosky and Charniak, 2008). Instead, we attempted to automatically extract lexico-syntactic scope rules from the BioScope corpus, relying only on consistent (but not necessarily accurate) parse tree representations. 
We first parsed each sentence in the training dataset which contained a negation or speculation cue using the Stanford parser (Klein and Manning, 2003; De Marneffe et al., 2006). Figure 1 shows the parse tree of a sample sentence containing a negation cue and its scope. Next, for each cue-scope instance within the sen- tence, we identified the nearest common ancestor Figure 2: Lexico-syntactic pattern extracted from the sentence from Figure 1. The rule is equivalent to the following string representation: (VP (VBP lack) (NP (JJ *scope*) (NN *scope*) (NN *scope*))). which encompassed the cue word(s) and all words in the scope (shown in a box on Figure 1). The subtree rooted by this ancestor is the basis for the resulting lexico-syntactic rule. The leaf nodes of the resulting subtree were converted to a generalized representation: scope words were converted to *scope*; noncue and non-scope words were converted to *; cue words were converted to lower case. Figure 2 shows the resulting rule. This rule generation approach resulted in a large number of very specific rule patterns - 1,681 nega- tion scope rules and 3,043 speculation scope rules were extracted from the training dataset. To identify a more general set of rules (and increase recall) we next performed a simple transformation of the derived rule set. If all children of a rule tree node are of type *scope* or * (i.e. noncue words), the node label is replaced by *scope* or * respectively, and the node’s children are pruned from the rule tree; neighboring identical siblings of type *scope* or * are replaced by a single node of the corresponding type. Figure 3 shows an example of this transformation. (a)ThechildrenofnodesJ /N /N are(b)Thechildren pruned and their labels are replaced by of node NP are *scope*. pruned and its label is replaced by *scope*. Figure 3: Transformation of the tree shown in Figure 2. The final rule is equivalent to the following string representation: (VP (VBP lack) *scope* ) 285 The rule tree pruning described above reduced the negation scope rule patterns to 439 and the speculation rule patterns to 1,000. In addition to generating a set of scope finding rules, we also implemented a module that parses string representations of the lexico-syntactic rules and performs subtree matching. The ScopeFinder module2 identifies negation and speculation scopes in sentence parse trees using string-encoded lexicosyntactic patterns. Candidate sentence parse subtrees are first identified by matching the path of cue leafnodes to the root ofthe rule subtree pattern. Ifan identical path exists in the sentence, the root of the candidate subtree is thus also identified. The candidate subtree is evaluated for a match by recursively comparing all node children (starting from the root of the subtree) to the rule pattern subtree. Nodes of type *scope* and * match any number of nodes, similar to the semantics of Regex Kleene star (*). 5 Results As an informed baseline, we used a previously de- veloped rule-based system for negation and speculation scope discovery (Apostolova and Tomuro, 2010). The system, inspired by the NegEx algorithm (Chapman et al., 2001), uses a list of phrases split into subsets (preceding vs. following their scope) to identify cues using string matching. The cue scopes extend from the cue to the beginning or end of the sentence, depending on the cue type. Table 3 shows the baseline results. PSFCNalpueingpleciarPutcAlai opbtneisor tacsP6597C348o.r12075e4ctly6859RP203475r. 81e26d037icteF569784C52. 
04u913e84s5F2A81905l.2786P14redictCus Table 3: Baseline system performance. P (Precision), R (Recall), and F (F1-score) are computed based on the sentence tokens of correctly predicted cues. The last column shows the F1-score for sentence tokens of all predicted cues (including erroneous ones). We used only the scopes of predicted cues (correctly predicted cues vs. all predicted cues) to mea- 2The rule sets and source code are publicly available at http://scopefinder.sourceforge.net/. sure the baseline system performance. The baseline system heuristics did not contain all phrase cues present in the dataset. The scopes of cues that are missing from the baseline system were not included in the results. As the baseline system was not penalized for missing cue phrases, the results represent the upper bound of the system. Table 4 shows the results from applying the full extracted rule set (1,681 negation scope rules and 3,043 speculation scope rules) on the test data. As expected, this rule set consisting of very specific scope matching rules resulted in very high precision and very low recall. Negation P R F A Clinical99.4734.3051.0117.58 Full Papers Paper Abstracts 95.23 87.33 25.89 05.78 40.72 10.84 28.00 07.85 Speculation Clinical96.5020.1233.3022.90 Full Papers Paper Abstracts 88.72 77.50 15.89 11.89 26.95 20.62 10.13 10.00 Table 4: Results from applying the full extracted rule set on the test data. Precision (P), Recall (R), and F1-score (F) are com- puted based the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). Table 5 shows the results from applying the rule set consisting of pruned pattern trees (439 negation scope rules and 1,000 speculation scope rules) on the test data. As shown, overall results improved significantly, both over the baseline and over the unpruned set of rules. Comparable results are shown in bold in Tables 3, 4, and 5. Negation P R F A Clinical85.5992.1588.7585.56 Full Papers 49.17 94.82 64.76 71.26 Paper Abstracts 61.48 92.64 73.91 80.63 Speculation Clinical67.2586.2475.5771.35 Full Papers 65.96 98.43 78.99 52.63 Paper Abstracts 60.24 95.48 73.87 65.28 Table 5: Results from applying the pruned rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). 6 Related Work Interest in the task of identifying negation and spec- ulation scopes has developed in recent years. Rele286 vant research was facilitated by the appearance of a publicly available annotated corpus. All systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008). O¨zg u¨r and Radev (2009) have developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule based system on identifying speculation scopes, they report 61. 13 and 79.89 accuracy for BioScope full papers and abstracts respectively. Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines if a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge signals were reported as F1-scores of 38. 
6 Related Work

Interest in the task of identifying negation and speculation scopes has developed in recent years. Relevant research was facilitated by the appearance of a publicly available annotated corpus. All systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008).

Özgür and Radev (2009) developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule-based system on identifying speculation scopes, they report accuracies of 61.13 and 79.89 for BioScope full papers and abstracts, respectively. Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines whether a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge signals were reported as F1-scores of 38.16, 59.66, and 78.54 for clinical texts, full papers, and abstracts, respectively.3 Accuracy (computed for correctly identified scopes) was reported as 26.21, 35.92, and 65.55 for clinical texts, papers, and abstracts, respectively. Morante and Daelemans (2009a) also developed a metalearner for identifying the scope of negation. Results of the negation scope finding system with predicted cues are reported as F1-scores (computed on scope tokens) of 84.20, 70.94, and 82.60 for clinical texts, papers, and abstracts, respectively. Accuracy (the percentage of correctly identified exact scopes) is reported as 70.75, 41.00, and 66.07 for clinical texts, papers, and abstracts, respectively.

The top three performers on the CoNLL-2010 shared task on hedge scope detection (Farkas et al., 2010) report F1-scores for correctly identified hedge cues and their scopes ranging from 55.3 to 57.3. The shared task evaluation metrics used stricter matching criteria based on exact match of both cues and their corresponding scopes.4 CoNLL-2010 shared task participants applied a variety of rule-based and machine learning methods to the task: Morante et al. (2010) used a memory-based classifier based on the k-nearest neighbor rule to determine whether a token is the first token in a scope sequence, the last, or neither; Rei and Briscoe (2010) used a combination of manually compiled rules, a CRF classifier, and a sequence of post-processing steps on the same task; Velldal et al. (2010) manually compiled a set of heuristics based on syntactic information taken from dependency structures.

3 F1-scores are computed based on scope tokens. Unlike our evaluation metric, scope token matches are computed for each cue within a sentence, i.e., a token is evaluated multiple times if it belongs to more than one cue scope.
4 Our system does not focus on individual cue-scope pair detection (we instead optimized scope detection), and as a result the performance metrics are not directly comparable.

7 Discussion

We presented a method for the automatic extraction of lexico-syntactic rules for negation and speculation scopes from an annotated corpus. The developed ScopeFinder system, based on the automatically extracted rule sets, was compared to a baseline rule-based system that does not use syntactic information. The ScopeFinder system outperformed the baseline system in all cases and exhibited results comparable to complex feature-based machine learning systems. In future work, we will explore the use of statistically based methods for the creation of an optimal set of lexico-syntactic tree patterns and will evaluate the system performance on texts from different domains.

References

E. Apostolova and N. Tomuro. 2010. Exploring surface-level heuristics for negation and speculation discovery in clinical texts. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pages 81–82. Association for Computational Linguistics.
W.W. Chapman, W. Bridewell, P. Hanbury, G.F. Cooper, and B.G. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 34(5):301–310.
A.B. Clegg and A.J. Shepherd. 2007. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics, 8(1):24.
M.C. De Marneffe, B. MacCartney, and C.D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In LREC 2006.
R. Farkas, V. Vincze, G. Móra, J. Csirik, and G. Szarvas. 2010. The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010): Shared Task, pages 1–12.
H. Kilicoglu and S. Bergler. 2008. Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC Bioinformatics, 9(Suppl 11):S10.
H. Kilicoglu and S. Bergler. 2010. A High-Precision Approach to Detecting Hedges and Their Scopes. CoNLL-2010: Shared Task, page 70.
D. Klein and C.D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems, pages 3–10.
D. McClosky and E. Charniak. 2008. Self-training for biomedical parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 101–104. Association for Computational Linguistics.
R. Morante and W. Daelemans. 2009a. A metalearning approach to processing the scope of negation. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 21–29. Association for Computational Linguistics.
R. Morante and W. Daelemans. 2009b. Learning the scope of hedge cues in biomedical texts. In Proceedings of the Workshop on BioNLP, pages 28–36. Association for Computational Linguistics.
R. Morante, V. Van Asch, and W. Daelemans. 2010. Memory-based resolution of in-sentence scopes of hedge cues. CoNLL-2010: Shared Task, page 40.
A. Özgür and D.R. Radev. 2009. Detecting speculations and their scopes in scientific text. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3, pages 1398–1407. Association for Computational Linguistics.
M. Rei and T. Briscoe. 2010. Combining manual rules and supervised learning for hedge cue and scope detection. In Proceedings of the 14th Conference on Natural Language Learning, pages 56–63.
E. Velldal, L. Øvrelid, and S. Oepen. 2010. Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules. CoNLL-2010: Shared Task, page 48.
V. Vincze, G. Szarvas, R. Farkas, G. Móra, and J. Csirik. 2008. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9.
H. Zhou, X. Li, D. Huang, Z. Li, and Y. Yang. 2010. Exploiting Multi-Features to Detect Hedges and Their Scope in Biomedical Texts. CoNLL-2010: Shared Task, page 106.
4 0.57894474 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
Author: Balaji Soundrarajan ; Thomas Ginter ; Scott DuVall
Abstract: This demonstration presents the Annotation Librarian, an application programming interface that supports rapid development of natural language processing (NLP) projects built in Apache Unstructured Information Management Architecture (UIMA). The flexibility of UIMA to support all types of unstructured data – images, audio, and text – increases the complexity of some of the most common NLP development tasks. The Annotation Librarian interface handles these common functions and allows the creation and management of annotations by mirroring Java methods used to manipulate Strings. The familiar syntax and NLP-centric design allow developers to adopt and rapidly develop NLP algorithms in UIMA. The general functionality of the interface is described in relation to the use cases that necessitated its creation.
5 0.56891578 74 acl-2011-Combining Indicators of Allophony
Author: Luc Boruta
Abstract: Allophonic rules are responsible for the great variety in phoneme realizations. Infants cannot reliably infer abstract word representations without knowledge of their native allophonic grammar. We explore the hypothesis that some properties of infants’ input, referred to as indicators, are correlated with allophony. First, we provide an extensive evaluation of individual indicators that rely on distributional or lexical information. Then, we present a first evaluation of the combination of indicators of different types, considering both logical and numerical combination schemes. Though distributional and lexical indicators are not redundant, straightforward combinations do not outperform individual indicators.
6 0.56779945 273 acl-2011-Semantic Representation of Negation Using Focus Detection
7 0.55324286 8 acl-2011-A Corpus of Scope-disambiguated English Text
9 0.52763844 24 acl-2011-A Scalable Probabilistic Classifier for Language Modeling
10 0.52610624 278 acl-2011-Semi-supervised condensed nearest neighbor for part-of-speech tagging
12 0.48207057 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
13 0.4772014 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
14 0.47196946 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
15 0.46932575 78 acl-2011-Confidence-Weighted Learning of Factored Discriminative Language Models
16 0.46811819 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
17 0.46557975 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
18 0.45870781 80 acl-2011-ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements
19 0.45262858 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
20 0.45033216 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
topicId topicWeight
[(1, 0.013), (5, 0.052), (13, 0.025), (17, 0.033), (26, 0.022), (34, 0.029), (37, 0.074), (39, 0.057), (41, 0.043), (55, 0.034), (59, 0.043), (61, 0.28), (72, 0.036), (91, 0.038), (96, 0.112), (97, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.75168693 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes
Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre
Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
2 0.73446226 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
Author: Bo Pang ; Ravi Kumar
Abstract: Web search is an information-seeking activity. Oftentimes, this amounts to a user seeking answers to a question. However, queries, which encode a user’s information need, are typically not expressed as full-length natural language sentences, in particular as questions. Rather, they consist of one or more text fragments. As humans become more search-engine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent.
3 0.649827 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
Author: Je Hun Jeon ; Wen Wang ; Yang Liu
Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount of data and has a high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce the word error rate by about 3% relative. This gain is consistent across the two different tests, showing promising future directions for incorporating prosodic information to improve speech recognition.
4 0.62688094 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
Author: Stefan Rud ; Massimiliano Ciaramita ; Jens Muller ; Hinrich Schutze
Abstract: We use search engine results to address a particularly difficult cross-domain language processing task, the adaptation of named entity recognition (NER) from news text to web queries. The key novelty of the method is that we submit a token with context to a search engine and use similar contexts in the search results as additional information for correctly classifying the token. We achieve strong gains in NER performance on news, in-domain and out-of-domain, and on web queries.
5 0.60686868 147 acl-2011-Grammatical Error Correction with Alternating Structure Optimization
Author: Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present a novel approach to grammatical error correction based on Alternating Structure Optimization. As part of our work, we introduce the NUS Corpus of Learner English (NUCLE), a fully annotated one-million-word corpus of learner English available for research purposes. We conduct an extensive evaluation for article and preposition errors using various feature sets. Our experiments show that our approach outperforms two baselines trained on non-learner text and learner text, respectively. Our approach also outperforms two commercial grammar checking software packages.
6 0.53101099 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
7 0.50980669 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
8 0.50950068 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
9 0.50886869 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
10 0.50839561 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
11 0.50762624 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
12 0.5074234 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
13 0.50637215 133 acl-2011-Extracting Social Power Relationships from Natural Language
14 0.50636601 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
15 0.50635898 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
16 0.50551927 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
17 0.50389123 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
18 0.50388116 178 acl-2011-Interactive Topic Modeling
19 0.50318825 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
20 0.50242853 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization