acl acl2011 acl2011-273 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eduardo Blanco ; Dan Moldovan
Abstract: Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. A negated statement often carries positive implicit meaning, but to pinpoint the positive part from the negative part is rather difficult. This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. The proposed representation relies on focus of negation detection. For this, new annotation over PropBank and a learning algorithm are proposed.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. [sent-3, score-0.207]
2 A negated statement often carries positive implicit meaning, but to pinpoint the positive part from the negative part is rather difficult. [sent-4, score-0.532]
3 This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. [sent-5, score-0.852]
4 The proposed representation relies on focus of negation detection. [sent-6, score-0.756]
5 Negation is present in all languages and it is always the case that statements are affirmative by default. [sent-12, score-0.177]
6 Negation is fairly well-understood in grammars; the valid ways to express a negation are documented. [sent-18, score-0.644]
7 At first glance, one would think that interpreting negation could be reduced to finding negative keywords, detecting their scope using syntactic analysis and reversing its polarity. [sent-21, score-0.824]
8 Detecting the scope of negation in itself is challenging: All vegetarians do not eat meat means that vegetarians do not eat meat and yet All that glitters is not gold means that it is not the case that all that glitters is gold (so out of all things that glitter, some are gold and some are not). [sent-24, score-1.595]
9 In the former example, the universal quantifier all has scope over the negation; in the latter, the negation has scope over all. [sent-25, score-0.942]
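A first-order sketch of the two readings may make this concrete (the notation and predicate names are ours, added for illustration; they are not taken from the paper):

% Former example: the universal quantifier outscopes the negation
\forall x\, \big(\mathit{vegetarian}(x) \rightarrow \neg\, \mathit{eat}(x, \mathit{meat})\big)
% Latter example: the negation outscopes the universal quantifier
\neg\, \forall x\, \big(\mathit{glitters}(x) \rightarrow \mathit{gold}(x)\big) \;\equiv\; \exists x\, \big(\mathit{glitters}(x) \wedge \neg\, \mathit{gold}(x)\big)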
10 For example, cows do not eat meat implies that cows eat something other than meat. [sent-29, score-0.928]
11 Otherwise, the speaker would have stated cows do not eat. [sent-30, score-0.192]
12 A clearer example is the correct and yet puzzling statement tables do not eat meat. [sent-31, score-0.317]
13 unnatural because of the underlying positive statement (i. [sent-34, score-0.179]
14 Contrasts may use negation to disagree about a statement and not to negate it, e. [sent-39, score-0.745]
15 In logic, negation is usually the simplest unary operator and it reverses the truth value. [sent-43, score-0.685]
16 Linguists have found negation a complex phenomenon; Huddleston and Pullum (2002) dedicate over 60 pages to it. [sent-45, score-0.644]
17 In this paper, we follow the insights on scope and focus of negation by Huddleston and Pullum (2002) rather than Rooth’s (1985). [sent-51, score-0.905]
18 Within natural language processing, negation has drawn attention mainly in sentiment analysis (Wilson et al. [sent-52, score-0.644]
19 Morante and Daelemans (2009) and Özgür and Radev (2009) propose scope detectors using the BioScope corpus. [sent-57, score-0.149]
20 (2010) present a supervised scope detector using their own annotation. [sent-59, score-0.149]
21 Regarding corpora, the BioScope corpus annotates negation marks and linguistic scopes exclusively on biomedical texts. [sent-64, score-0.692]
22 It does not annotate focus and it purposely ignores negations such as (talking about the reaction of certain elements) in NK3. [sent-65, score-0.162]
23 , 2008), which carry the kind of positive meaning this work aims at extracting (in NK3. [sent-67, score-0.224]
24 , 2005) only indicates the verb to which a negation mark attaches; it does not provide any information about the scope or focus. [sent-70, score-0.836]
25 , 1998) does not consider negation and FactBank (Saurí and Pustejovsky, 2009) only annotates degrees of factuality for events. [sent-72, score-0.644]
26 None of the above references aim at detecting or annotating the focus of negation in natural language. [sent-73, score-0.789]
27 Neither do they aim at carefully representing the meaning of negated statements nor at extracting implicit positive meaning from them. [sent-74, score-0.612]
28 3 Negation in Natural Language Simply put, negation is a process that turns a statement into its opposite. [sent-75, score-0.745]
29 Unlike affirmative statements, negation is marked by words (e. [sent-76, score-0.768]
30 For example, negated clauses use different connective adjuncts than positive clauses do: neither, nor instead of either, or. [sent-82, score-0.321]
31 , the ones trained over PropBank) do not completely represent the meaning of negated statements. [sent-90, score-0.276]
32 Given John didn’t build a house to impress Mary, they encode AGENT(John, build), THEME(a house, build), PURPOSE(to impress Mary, build), NEGATION(n’t, build). [sent-91, score-0.216]
33 This representation corresponds to the interpretation it is not the case that John built a house to impress Mary, ignoring that it is implicitly stated that John did build a house. [sent-92, score-0.251]
34 For all statements s, current role labelers would only encode it is not the case that s. [sent-94, score-0.159]
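A tiny sketch of this gap (the dictionary layout and paraphrase strings are ours, used only to contrast the two readings; one plausible focus for this sentence is the PURPOSE):

# Roles a typical labeler assigns to "John didn't build a house to impress Mary".
labeler_output = {
    "AGENT": "John",
    "THEME": "a house",
    "PURPOSE": "to impress Mary",
    "NEGATION": "n't",
}

# Reading encoded by current labelers: the whole statement is negated.
blanket_reading = "it is not the case that John built a house to impress Mary"

# Reading argued for here if the focus is the PURPOSE: the positive part survives
# and only the focused role is denied.
focus_reading = "John built a house, but not to impress Mary"

print(blanket_reading)
print(focus_reading)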
35 However, examples (1–7) Table 1: Examples of negated statements and their interpretations considering underlying positive meaning. [sent-95, score-0.379]
36 A wavy underline indicates the focus of negation (Section 3. [sent-96, score-0.756]
37 (6–8) show that different verb arguments modify the interpretation and even signal the existence of positive meaning. [sent-100, score-0.212]
38 Note that (8, 9) do not carry any positive meaning; even though their interpretations do not contain a verbal negation, the meaning remains negative. [sent-102, score-0.301]
39 This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. [sent-106, score-0.852]
40 The main contributions are: (1) interpretation of negation using focus detection; (2) focus of negation annotation over all PropBank negated sentences1 ; (3) feature set to detect the focus of negation; and (4) model to semantically represent negation and reveal its underlying positive meaning. [sent-107, score-2.709]
41 • Analytic if the sole function of the negated mark is to mark negation (Bill did not go); synthetic if it has some other function as well ([Nobody]AGENT went to the meeting). [sent-110, score-0.831]
42 1Annotation will be available on the author’s website • Clausal if the negation yields a negative clause (She didn’t have a large income); subclausal otherwise (She had a not inconsiderable income). [sent-111, score-0.676]
43 3 Scope and Focus Negation has both scope and focus and they are extremely important to capture its semantics. [sent-120, score-0.261]
44 Focus is that part of the scope that is most prominently or explicitly negated (Huddleston and Pullum, 2002). [sent-122, score-0.336]
45 Scope corresponds to all elements any of whose individual falsity would make the negated statement true. [sent-124, score-0.371]
46 Focus is the element of the scope that is intended to be interpreted as false to make the overall negative true. [sent-125, score-0.212]
47 Consider (1) Cows don’t eat meat and its positive counterpart (2) Cows eat meat. [sent-126, score-0.657]
48 The truth conditions of (2) are: (a) somebody eats something; (b) cows are the ones who eat; and (c) meat is what is eaten. [sent-127, score-0.366]
49 In other words, (1) would be true if nobody eats, cows don’t eat or meat is not eaten. [sent-130, score-0.523]
50 Therefore, all three statements (a–c) are inside the scope of (1). [sent-131, score-0.23]
51 The most probable focus for (1) is meat, which corresponds to the interpretation cows eat something other than meat. [sent-135, score-0.685]
52 Another possible focus is cows, which yields someone eats meat, but not cows. [sent-136, score-0.192]
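An event-style formalization of this example may help (the neo-Davidsonian notation is ours and is only a sketch; the paper states these truth conditions in prose):

% Positive counterpart (2) "Cows eat meat": the conjuncts correspond to (a)-(c)
\exists e\, \big(\mathit{eat}(e) \wedge \mathit{agent}(e, \mathit{cows}) \wedge \mathit{theme}(e, \mathit{meat})\big)
% Focus = meat: cows eat something other than meat
\exists e\, \exists x\, \big(\mathit{eat}(e) \wedge \mathit{agent}(e, \mathit{cows}) \wedge \mathit{theme}(e, x) \wedge x \neq \mathit{meat}\big)
% Focus = cows: someone other than cows eats meat
\exists e\, \exists y\, \big(\mathit{eat}(e) \wedge \mathit{agent}(e, y) \wedge \mathit{theme}(e, \mathit{meat}) \wedge y \neq \mathit{cows}\big)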
53 Both scope and focus are primarily semantic, highly ambiguous and context-dependent. [sent-137, score-0.261]
54 In this Section, we outline how to incorporate negation into semantic relations. [sent-142, score-0.685]
55 Given s: The cow didn’t eat grass with a fork, typical semantic roles encode AGENT(the cow, eat), THEME(grass, eat), INSTRUMENT(with a fork, eat) and NEGATION(n’t, eat). [sent-147, score-0.714]
56 Second, we believe detecting the focus of negation is useful. [sent-152, score-0.789]
57 Even though it is open to discussion, the focus corresponds to INSTRUMENT(with a fork, ate). Thus, the negated statement should be interpreted as the cow ate grass, but it did not do so using a fork. [sent-153, score-0.766]
58 It attaches the negated mark and auxiliary to eat; the negation is part of the relation arguments. [sent-156, score-0.831]
59 This option fails to detect any underlying positive meaning and corresponds to the interpretation the cow did not eat, grass was not eaten and a fork was not used to eat. [sent-157, score-0.841]
60 Options (2–5) embody negation into the representation with the pseudo-relation NOT. [sent-158, score-0.644]
61 Option (2) includes all the scope as the argument of NOT and corresponds to the interpretation it is not the case that the cow ate grass with a fork. [sent-160, score-0.689]
62 Like typical semantic roles, option (2) does not reveal the implicit positive meaning carried by statement s. [sent-161, score-0.451]
63 Options (3–5) encode different interpretations: • (3) negates the AGENT; it corresponds to the cow didn’t eat, but grass was eaten with a fork. [sent-162, score-0.389]
64 • (4) applies NOT to the THEME; it corresponds to the cow ate something with a fork, but not grass. [sent-163, score-0.358]
65 • (5) denies the INSTRUMENT, encoding the meaning the cow ate grass, but it did not use a fork. [sent-164, score-0.345]
66 Option (5) is preferred since it captures the best implicit positive meaning. [sent-165, score-0.166]
67 It corresponds to the semantic representation of the affirmative counterpart after applying the pseudo-relation NOT over the focus of the negation. [sent-166, score-0.33]
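A minimal code sketch of option (5) (the role inventory, function name and output format are ours; only the idea of wrapping the focused role in the pseudo-relation NOT comes from the text):

# Roles of the affirmative counterpart of "The cow didn't eat grass with a fork";
# option (5) takes the INSTRUMENT as focus.
roles = {
    "AGENT": "the cow",
    "THEME": "grass",
    "INSTRUMENT": "with a fork",
}
focus = "INSTRUMENT"

def represent(roles, focus, verb="eat"):
    """Wrap the focused role in the pseudo-relation NOT; keep the rest positive."""
    relations = []
    for role, filler in roles.items():
        if role == focus:
            relations.append(f"{role}(NOT({filler}), {verb})")
        else:
            relations.append(f"{role}({filler}, {verb})")
    return relations

print(represent(roles, focus))
# ['AGENT(the cow, eat)', 'THEME(grass, eat)', 'INSTRUMENT(NOT(with a fork), eat)']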
68 2 Annotating the Focus of Negation Due to the lack of corpora containing annotation for focus of negation, new annotation is needed. [sent-169, score-0.22]
69 However, building on top of publicly available resources is a better approach: they are known by the community, they contain useful information for detecting the focus of negation and tools have already been developed to predict their annotation. [sent-171, score-0.789]
70 is underlined; ‘+’ indicates that the role is present, ‘-’ that it is not and ‘⋆’ that it corresponds to the focus of negation. [sent-172, score-0.237]
71 1 Annotation Guidelines The focus of a negation involving verb v is resolved as: • If it cannot be inferred that an action v occurred, focus is role MNEG. [sent-182, score-0.799]
72 • Otherwise, focus is the role that is most prominently negated. [sent-183, score-0.19]
73 Regarding the first verb (growing), one cannot infer that anything was growing, so focus is MNEG. [sent-189, score-0.187]
74 For the second verb (providing), it is implicitly stated that the company was providing a not satisfactory return on investment; therefore, focus is A1. [sent-190, score-0.277]
75 The guidelines assume that the focus corresponds to a single role or the verb. [sent-191, score-0.237]
76 In cases where more than one role could be selected, the most likely focus is chosen; context and text understanding are key. [sent-192, score-0.19]
77 We define the most likely focus as the one that yields the most meaningful implicit information. [sent-193, score-0.232]
78 Figure 1: Example of focus annotation (marked with adamant about eating only Hunt’s ketchup), it is clear that the best option is A1. [sent-201, score-0.22]
79 The role that yields the most useful positive implicit information given the context is always chosen as focus. [sent-203, score-0.276]
80 Example (1) does not carry any positive meaning; the focus is V. [sent-205, score-0.247]
81 In (2–10) the verb must be interpreted as affirmative, as well as all roles except the one marked with ‘⋆’ (i. [sent-206, score-0.205]
82 2 Interpretation of -NOT The mark -NOT is interpreted as follows: • If MNEG-NOT(x, y), then verb y must be negated; the statement does not carry positive meaning. [sent-214, score-0.342]
83 • If any other role is marked with -NOT, ROLE-NOT(x, y) must be interpreted as it is not the case that x is ROLE of y. [sent-215, score-0.169]
84 Unmarked roles are interpreted positive; they correspond to implicit positive meaning. [sent-216, score-0.3]
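A small sketch of these interpretation rules in code (the tuple-based annotation layout and the generated paraphrases are ours; the MNEG-NOT and ROLE-NOT readings follow the bullets above):

# Roles of "The cow didn't eat grass with a fork", with the focus marked by -NOT.
annotated = {
    "AGENT": ("the cow", False),
    "THEME": ("grass", False),
    "INSTRUMENT": ("with a fork", True),   # the role carrying -NOT
}

def interpret(annotated, verb="eat"):
    """Unmarked roles yield implicit positive meaning; the -NOT role is denied."""
    positive, negative = [], []
    for role, (filler, has_not) in annotated.items():
        if has_not and role == "MNEG":
            # MNEG-NOT: the verb itself is negated; no positive meaning is implied.
            return [], [f"it is not the case that any '{verb}' event occurred"]
        if has_not:
            negative.append(f"it is not the case that {filler} is {role} of {verb}")
        else:
            positive.append(f"{filler} is {role} of {verb}")
    return positive, negative

pos, neg = interpret(annotated)
# pos -> ['the cow is AGENT of eat', 'grass is THEME of eat']
# neg -> ['it is not the case that with a fork is INSTRUMENT of eat']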
85 The new annotation for the example (Figure 1) must be interpreted as: While profitable, it (the company) was not growing and was providing a not satisfactory return on investment. [sent-221, score-0.19]
86 Before annotation began, all semantic information was removed by mapping all role labels to ARG. [sent-226, score-0.173]
87 A post-processing step incorporates focus annotation to the original PropBank by adding -NOT to the corresponding role. [sent-236, score-0.166]
88 The main point of conflict was selecting a focus that yields valid implicit meaning, but not the most valuable (Section 4. [sent-241, score-0.232]
89 Each sentence from PropBank containing a verbal negation becomes an instance. [sent-248, score-0.688]
90 Because PropBank adds semantic role annotation on top of the Penn TreeBank, we have available syntactic annotation and semantic role labels for all instances. [sent-274, score-0.346]
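A hypothetical sketch of how such instances and a few role-based features might be assembled (the data layout and feature names are ours; the paper's actual feature set is larger and more detailed than this):

# Each verbal negation in PropBank becomes one instance. Features below are
# illustrative stand-ins: the negated verb, the label of the last role,
# role-presence flags and the POS tags observed in the sentence.
def extract_features(instance):
    roles = instance["roles"]          # role labels in sentence order
    pos_tags = instance["pos_tags"]    # POS tags of the sentence tokens
    features = {
        "verb": instance["verb"],
        "last_role": roles[-1],
        "num_roles": len(roles),
    }
    for role in roles:
        features[f"has_{role}"] = True       # role-presence flags
    for tag in set(pos_tags):
        features[f"pos_{tag}"] = True        # POS tags present in the sentence
    return features

example = {
    "verb": "eat",
    "roles": ["A0", "AM-NEG", "V", "A1", "AM-MNR"],
    "pos_tags": ["DT", "NN", "VBD", "RB", "VB", "NN", "IN", "DT", "NN"],
}
print(extract_features(example))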
91 Features (15–16) check for POS tags as the presence of certain tags usually signal that the verb is not the focus of negation (e. [sent-309, score-0.799]
92 rithm exclusively the label corresponding to the last role and flags indicating the presence of roles yields 61. [sent-334, score-0.181]
93 and σd= 7 Conclusions In this paper, we present a novel way to semantically represent negation using focus detection. [sent-344, score-0.756]
94 Implicit positive meaning is identified, giving a thorough interpretation of negated statements. [sent-345, score-0.445]
95 Due to the lack of corpora annotating the focus of negation, we have added this information to all the negations marked with MNEG in PropBank. [sent-346, score-0.19]
96 A verbal negation is interpreted by considering all roles positive except the one corresponding to the focus. [sent-349, score-0.9]
97 In some cases, though, it is not easy to obtain the meaning of a negated role. [sent-351, score-0.276]
98 Empirical data (Table 4) shows that over 65% of negations in PropBank carry implicit positive meaning. [sent-356, score-0.273]
99 The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. [sent-453, score-0.692]
100 A survey on the role of negation in sentiment analysis. [sent-457, score-0.722]
wordName wordTfidf (topN-words)
[('negation', 0.644), ('eat', 0.216), ('negated', 0.187), ('propbank', 0.17), ('cow', 0.164), ('cows', 0.164), ('scope', 0.149), ('grass', 0.146), ('huddleston', 0.127), ('meat', 0.113), ('focus', 0.112), ('fork', 0.109), ('statement', 0.101), ('affirmative', 0.096), ('ate', 0.092), ('interpretation', 0.091), ('meaning', 0.089), ('pullum', 0.089), ('implicit', 0.088), ('statements', 0.081), ('role', 0.078), ('positive', 0.078), ('didn', 0.076), ('mneg', 0.073), ('roles', 0.071), ('bioscope', 0.064), ('lunch', 0.064), ('interpreted', 0.063), ('instrument', 0.061), ('company', 0.06), ('morante', 0.059), ('carry', 0.057), ('something', 0.055), ('theme', 0.055), ('impress', 0.055), ('profitable', 0.055), ('sold', 0.055), ('annotation', 0.054), ('option', 0.054), ('speculation', 0.053), ('horn', 0.053), ('negations', 0.05), ('agent', 0.049), ('eats', 0.048), ('rooth', 0.048), ('biomedical', 0.048), ('corresponds', 0.047), ('vincze', 0.044), ('roser', 0.044), ('verbal', 0.044), ('verb', 0.043), ('thoroughly', 0.042), ('semantic', 0.041), ('truth', 0.041), ('farkas', 0.04), ('growing', 0.039), ('anybody', 0.036), ('buyers', 0.036), ('factbank', 0.036), ('falsity', 0.036), ('glitters', 0.036), ('metalinguistic', 0.036), ('mtmp', 0.036), ('oy', 0.036), ('saur', 0.036), ('unhappy', 0.036), ('vegetarians', 0.036), ('ag', 0.036), ('logic', 0.035), ('counterpart', 0.034), ('satisfactory', 0.034), ('interpretations', 0.033), ('detecting', 0.033), ('yields', 0.032), ('anything', 0.032), ('eaten', 0.032), ('clausal', 0.032), ('income', 0.032), ('gyorgy', 0.032), ('anchez', 0.032), ('mats', 0.032), ('framenet', 0.031), ('detect', 0.031), ('house', 0.03), ('polarity', 0.03), ('sporleder', 0.03), ('orgy', 0.03), ('veronika', 0.03), ('zg', 0.03), ('nobody', 0.03), ('wiegand', 0.03), ('marked', 0.028), ('clauses', 0.028), ('stated', 0.028), ('bos', 0.028), ('analytic', 0.028), ('anyone', 0.028), ('coffee', 0.028), ('laurence', 0.028), ('eduardo', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 273 acl-2011-Semantic Representation of Negation Using Focus Detection
Author: Eduardo Blanco ; Dan Moldovan
Abstract: Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. A negated statement often carries positive implicit meaning, but to pinpoint the positive part from the negative part is rather difficult. This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. The proposed representation relies on focus of negation detection. For this, new annotation over PropBank and a learning algorithm are proposed.
2 0.33591881 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman
Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora. 1 Motivation Information Extraction (IE) systems often face the problem of distinguishing between affirmed, negated, and speculative information in text. For example, sentiment analysis systems need to detect negation for accurate polarity classification. Similarly, medical IE systems need to differentiate between affirmed, negated, and speculated (possible) medical conditions. The importance of the task of negation and speculation (a.k.a. hedge) detection is attested by a number of research initiatives. The creation of the BioScope corpus (Vincze et al., 2008) assisted in the development and evaluation of several negation/hedge scope detection systems. The corpus consists of medical and biological texts annotated for negation, speculation, and their linguistic scope. The 2010 283 Noriko Tomuro Dina Demner-Fushman DePaul University Chicago, IL USA t omuro @ c s . depaul . edu National Library of Medicine Bethesda, MD USA ddemne r@mai l nih . gov . i2b2 NLP Shared Task1 included a track for detection of the assertion status of medical problems (e.g. affirmed, negated, hypothesized, etc.). The CoNLL2010 Shared Task (Farkas et al., 2010) focused on detecting hedges and their scopes in Wikipedia articles and biomedical texts. In this paper, we present a linguistically motivated rule-based system for the detection of negation and speculation scopes that performs on par with state-of-the-art machine learning systems. The rules used by the ScopeFinder system are automatically extracted from the BioScope corpus and encode lexico-syntactic patterns in a user-friendly format. While the system was developed and tested using a biomedical corpus, the rule extraction mechanism is not domain-specific. In addition, the linguistically motivated rule encoding allows for manual adaptation to new domains and corpora. 2 Task Definition Negation/Speculation detection is typically broken down into two sub-tasks - discovering a negation/speculation cue and establishing its scope. The following example from the BioScope corpus shows the annotated hedging cue (in bold) together with its associated scope (surrounded by curly brackets): Finally, we explored the {possible role of 5hydroxyeicosatetraenoic acid as a regulator of arachidonic acid liberation}. Typically, systems first identify negation/speculation cues and subsequently try to identify their associated cue scope. However, the two tasks are interrelated and both require 1https://www.i2b2.org/NLP/Relations/ Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 283–287, syntactic understanding. Consider the following two sentences from the BioScope corpus: 1) By contrast, {D-mib appears to be uniformly expre1ss)e Bdy yin c oimnatrgaisnta,l { dDis-mcsi }b. 
2) Differentiation assays using water soluble phorbol esters reveal that differentiation becomes irreversible soon after AP-1 appears. Both sentences contain the word form appears, however in the first sentence the word marks a hedg- ing cue, while in the second sentence the word does not suggest speculation. Unlike previous work, we do not attempt to identify negation/speculation cues independently of their scopes. Instead, we concentrate on scope detection, simultaneously detecting corresponding cues. 3 Dataset We used the BioScope corpus (Vincze et al., 2008) to develop our system and evaluate its performance. To our knowledge, the BioScope corpus is the only publicly available dataset annotated with negation/speculation cues and their scopes. It consists of biomedical papers, abstracts, and clinical reports (corpus statistics are shown in Tables 1 and 2). Corpus Type Sentences Documents Mean Document Size Clinical752019543.85 Full Papers Paper Abstracts 3352 14565 9 1273 372.44 11.44 Table 1: Statistics of the BioScope corpus. Document sizes represent number of sentences. Corpus Type Negation Cues Speculation Cues Negation Speculation Clinical87211376.6%13.4% Full Papers Paper Abstracts 378 1757 682 2694 13.76% 13.45% 22.29% 17.69% Table 2: Statistics of the BioScope corpus. The 2nd and 3d columns show the total number of cues within the datasets; the 4th and 5th columns show the percentage of negated and speculative sentences. 70% ofthe corpus documents (randomly selected) were used to develop the ScopeFinder system (i.e. extract lexico-syntactic rules) and the remaining 30% were used to evaluate system performance. While the corpus focuses on the biomedical domain, our rule extraction method is not domain specific and in future work we are planning to apply our method on different types of corpora. 4 Method Intuitively, rules for detecting both speculation and negation scopes could be concisely expressed as a 284 Figure 1: Parse tree of the sentence ‘T cells {lack active NFkappa B } bPuatr express Sp1 as expected’ generated by cthtiev eS NtanF-fkoaprdp parser. Speculation scope ewxporedcste are gsehnoewrant eind ellipsis. tTanhecue word is shown in grey. The nearest common ancestor of all cue and scope leaf nodes is shown in a box. combination of lexical and syntactic patterns. example, BioScope O¨zg u¨r For and Radev (2009) examined sample sentences and developed hedging scope rules such as: The scope of a modal verb cue (e.g. may, might, could) is the verb phrase to which it is attached; The scope of a verb cue (e.g. appears, seems) followed by an infinitival clause extends to the whole sentence. Similar lexico-syntactic rules have been also manually compiled and used in a number of hedge scope detection systems, e.g. (Kilicoglu and Bergler, 2008), (Rei and Briscoe, 2010), (Velldal et al., 2010), (Kilicoglu and Bergler, 2010), (Zhou et al., 2010). However, manually creating a comprehensive set of such lexico-syntactic scope rules is a laborious and time-consuming process. In addition, such an approach relies heavily on the availability of accurately parsed sentences, which could be problematic for domains such as biomedical texts (Clegg and Shepherd, 2007; McClosky and Charniak, 2008). Instead, we attempted to automatically extract lexico-syntactic scope rules from the BioScope corpus, relying only on consistent (but not necessarily accurate) parse tree representations. 
We first parsed each sentence in the training dataset which contained a negation or speculation cue using the Stanford parser (Klein and Manning, 2003; De Marneffe et al., 2006). Figure 1 shows the parse tree of a sample sentence containing a negation cue and its scope. Next, for each cue-scope instance within the sen- tence, we identified the nearest common ancestor Figure 2: Lexico-syntactic pattern extracted from the sentence from Figure 1. The rule is equivalent to the following string representation: (VP (VBP lack) (NP (JJ *scope*) (NN *scope*) (NN *scope*))). which encompassed the cue word(s) and all words in the scope (shown in a box on Figure 1). The subtree rooted by this ancestor is the basis for the resulting lexico-syntactic rule. The leaf nodes of the resulting subtree were converted to a generalized representation: scope words were converted to *scope*; noncue and non-scope words were converted to *; cue words were converted to lower case. Figure 2 shows the resulting rule. This rule generation approach resulted in a large number of very specific rule patterns - 1,681 nega- tion scope rules and 3,043 speculation scope rules were extracted from the training dataset. To identify a more general set of rules (and increase recall) we next performed a simple transformation of the derived rule set. If all children of a rule tree node are of type *scope* or * (i.e. noncue words), the node label is replaced by *scope* or * respectively, and the node’s children are pruned from the rule tree; neighboring identical siblings of type *scope* or * are replaced by a single node of the corresponding type. Figure 3 shows an example of this transformation. (a)ThechildrenofnodesJ /N /N are(b)Thechildren pruned and their labels are replaced by of node NP are *scope*. pruned and its label is replaced by *scope*. Figure 3: Transformation of the tree shown in Figure 2. The final rule is equivalent to the following string representation: (VP (VBP lack) *scope* ) 285 The rule tree pruning described above reduced the negation scope rule patterns to 439 and the speculation rule patterns to 1,000. In addition to generating a set of scope finding rules, we also implemented a module that parses string representations of the lexico-syntactic rules and performs subtree matching. The ScopeFinder module2 identifies negation and speculation scopes in sentence parse trees using string-encoded lexicosyntactic patterns. Candidate sentence parse subtrees are first identified by matching the path of cue leafnodes to the root ofthe rule subtree pattern. Ifan identical path exists in the sentence, the root of the candidate subtree is thus also identified. The candidate subtree is evaluated for a match by recursively comparing all node children (starting from the root of the subtree) to the rule pattern subtree. Nodes of type *scope* and * match any number of nodes, similar to the semantics of Regex Kleene star (*). 5 Results As an informed baseline, we used a previously de- veloped rule-based system for negation and speculation scope discovery (Apostolova and Tomuro, 2010). The system, inspired by the NegEx algorithm (Chapman et al., 2001), uses a list of phrases split into subsets (preceding vs. following their scope) to identify cues using string matching. The cue scopes extend from the cue to the beginning or end of the sentence, depending on the cue type. Table 3 shows the baseline results. PSFCNalpueingpleciarPutcAlai opbtneisor tacsP6597C348o.r12075e4ctly6859RP203475r. 81e26d037icteF569784C52. 
04u913e84s5F2A81905l.2786P14redictCus Table 3: Baseline system performance. P (Precision), R (Recall), and F (F1-score) are computed based on the sentence tokens of correctly predicted cues. The last column shows the F1-score for sentence tokens of all predicted cues (including erroneous ones). We used only the scopes of predicted cues (correctly predicted cues vs. all predicted cues) to mea- 2The rule sets and source code are publicly available at http://scopefinder.sourceforge.net/. sure the baseline system performance. The baseline system heuristics did not contain all phrase cues present in the dataset. The scopes of cues that are missing from the baseline system were not included in the results. As the baseline system was not penalized for missing cue phrases, the results represent the upper bound of the system. Table 4 shows the results from applying the full extracted rule set (1,681 negation scope rules and 3,043 speculation scope rules) on the test data. As expected, this rule set consisting of very specific scope matching rules resulted in very high precision and very low recall. Negation P R F A Clinical99.4734.3051.0117.58 Full Papers Paper Abstracts 95.23 87.33 25.89 05.78 40.72 10.84 28.00 07.85 Speculation Clinical96.5020.1233.3022.90 Full Papers Paper Abstracts 88.72 77.50 15.89 11.89 26.95 20.62 10.13 10.00 Table 4: Results from applying the full extracted rule set on the test data. Precision (P), Recall (R), and F1-score (F) are com- puted based the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). Table 5 shows the results from applying the rule set consisting of pruned pattern trees (439 negation scope rules and 1,000 speculation scope rules) on the test data. As shown, overall results improved significantly, both over the baseline and over the unpruned set of rules. Comparable results are shown in bold in Tables 3, 4, and 5. Negation P R F A Clinical85.5992.1588.7585.56 Full Papers 49.17 94.82 64.76 71.26 Paper Abstracts 61.48 92.64 73.91 80.63 Speculation Clinical67.2586.2475.5771.35 Full Papers 65.96 98.43 78.99 52.63 Paper Abstracts 60.24 95.48 73.87 65.28 Table 5: Results from applying the pruned rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). 6 Related Work Interest in the task of identifying negation and spec- ulation scopes has developed in recent years. Rele286 vant research was facilitated by the appearance of a publicly available annotated corpus. All systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008). O¨zg u¨r and Radev (2009) have developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule based system on identifying speculation scopes, they report 61. 13 and 79.89 accuracy for BioScope full papers and abstracts respectively. Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines if a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge signals were reported as F1-scores of 38. 
16, 59.66, 78.54 and for clinical texts, full papers, and abstracts respectively3. Accuracy (computed for correctly identified scopes) was reported as 26.21, 35.92, and 65.55 for clinical texts, papers, and abstracts respectively. Morante and Daelemans have also developed a metalearner for identifying the scope of negation (2009a). Results of the negation scope finding system with predicted cues are reported as F1-scores (computed on scope tokens) of 84.20, 70.94, and 82.60 for clinical texts, papers, and abstracts respectively. Accuracy (the percent of correctly identified exact scopes) is reported as 70.75, 41.00, and 66.07 for clinical texts, papers, and abstracts respectively. The top three best performers on the CoNLL2010 shared task on hedge scope detection (Farkas et al., 2010) report an F1-score for correctly identified hedge cues and their scopes ranging from 55.3 to 57.3. The shared task evaluation metrics used stricter matching criteria based on exact match of both cues and their corresponding scopes4. CoNLL-2010 shared task participants applied a variety of rule-based and machine learning methods 3F1-scores are computed based on scope tokens. Unlike our evaluation metric, scope token matches are computed for each cue within a sentence, i.e. a token is evaluated multiple times if it belongs to more than one cue scope. 4Our system does not focus on individual cue-scope pair de- tection (we instead optimized scope detection) and as a result performance metrics are not directly comparable. on the task - Morante et al. (2010) used a memorybased classifier based on the k-nearest neighbor rule to determine if a token is the first token in a scope sequence, the last, or neither; Rei and Briscoe (2010) used a combination of manually compiled rules, a CRF classifier, and a sequence of post-processing steps on the same task; Velldal et al (2010) manually compiled a set of heuristics based on syntactic information taken from dependency structures. 7 Discussion We presented a method for automatic extraction of lexico-syntactic rules for negation/speculation scopes from an annotated corpus. The developed ScopeFinder system, based on the automatically extracted rule sets, was compared to a baseline rule-based system that does not use syntactic information. The ScopeFinder system outperformed the baseline system in all cases and exhibited results comparable to complex feature-based, machine-learning systems. In future work, we will explore the use of statistically based methods for the creation of an optimum set of lexico-syntactic tree patterns and will evaluate the system performance on texts from different domains. References E. Apostolova and N. Tomuro. 2010. Exploring surfacelevel heuristics for negation and speculation discovery in clinical texts. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pages 81–82. Association for Computational Linguistics. W.W. Chapman, W. Bridewell, P. Hanbury, G.F. Cooper, and B.G. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of biomedical informatics, 34(5):301–310. A.B. Clegg and A.J. Shepherd. 2007. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC bioinformatics, 8(1):24. M.C. De Marneffe, B. MacCartney, and C.D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In LREC 2006. Citeseer. R. Farkas, V. Vincze, G. M o´ra, J. Csirik, and G. Szarvas. 2010. 
The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text. In Proceedings of the Fourteenth Conference on 287 Computational Natural Language Learning (CoNLL2010): Shared Task, pages 1–12. H. Kilicoglu and S. Bergler. 2008. Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC bioinformatics, 9(Suppl 11):S10. H. Kilicoglu and S. Bergler. 2010. A High-Precision Approach to Detecting Hedges and Their Scopes. CoNLL-2010: Shared Task, page 70. D. Klein and C.D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. Advances in neural information processing systems, pages 3–10. D. McClosky and E. Charniak. 2008. Self-training for biomedical parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 101–104. Association for Computational Linguistics. R. Morante and W. Daelemans. 2009a. A metalearning approach to processing the scope of negation. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 21–29. Association for Computational Linguistics. R. Morante and W. Daelemans. 2009b. Learning the scope of hedge cues in biomedical texts. In Proceed- ings of the Workshop on BioNLP, pages 28–36. Association for Computational Linguistics. R. Morante, V. Van Asch, and W. Daelemans. 2010. Memory-based resolution of in-sentence scopes of hedge cues. CoNLL-2010: Shared Task, page 40. A. O¨zg u¨r and D.R. Radev. 2009. Detecting speculations and their scopes in scientific text. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3, pages 1398–1407. Association for Computational Linguistics. M. Rei and T. Briscoe. 2010. Combining manual rules and supervised learning for hedge cue and scope detection. In Proceedings of the 14th Conference on Natural Language Learning, pages 56–63. E. Velldal, L. Øvrelid, and S. Oepen. 2010. Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules. CoNLL-2010: Shared Task, page 48. V. Vincze, G. Szarvas, R. Farkas, G. M o´ra, and J. Csirik. 2008. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC bioinformatics, 9(Suppl 11):S9. H. Zhou, X. Li, D. Huang, Z. Li, and Y. Yang. 2010. Exploiting Multi-Features to Detect Hedges and Their Scope in Biomedical Texts. CoNLL-2010: Shared Task, page 106.
3 0.23451532 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
Author: Awais Athar
Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.
4 0.13317458 8 acl-2011-A Corpus of Scope-disambiguated English Text
Author: Mehdi Manshadi ; James Allen ; Mary Swift
Abstract: Previous work on quantifier scope annotation focuses on scoping sentences with only two quantified noun phrases (NPs), where the quantifiers are restricted to a predefined list. It also ignores negation, modal/logical operators, and other sentential adverbials. We present a comprehensive scope annotation scheme. We annotate the scope interaction between all scopal terms in the sentence from quantifiers to scopal adverbials, without putting any restriction on the number of scopal terms in a sentence. In addition, all NPs, explicitly quantified or not, with no restriction on the type of quantification, are investigated for possible scope interactions. 1
5 0.096535012 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
6 0.07717026 159 acl-2011-Identifying Noun Product Features that Imply Opinions
7 0.076613441 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
9 0.067286424 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
10 0.064100645 194 acl-2011-Language Use: What can it tell us?
11 0.056397241 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
12 0.055576798 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
13 0.053224102 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
14 0.052689523 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes
15 0.052585486 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing
16 0.04890741 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
17 0.044204462 293 acl-2011-Template-Based Information Extraction without the Templates
18 0.043688331 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment
19 0.042594053 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications
20 0.041198488 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking
topicId topicWeight
[(0, 0.133), (1, 0.068), (2, -0.002), (3, -0.066), (4, 0.005), (5, 0.049), (6, 0.02), (7, -0.007), (8, -0.004), (9, -0.092), (10, -0.021), (11, -0.068), (12, -0.049), (13, 0.063), (14, -0.067), (15, -0.061), (16, -0.061), (17, -0.095), (18, 0.041), (19, -0.104), (20, -0.041), (21, 0.005), (22, -0.092), (23, -0.073), (24, 0.141), (25, 0.134), (26, -0.117), (27, 0.105), (28, -0.242), (29, 0.002), (30, 0.076), (31, 0.026), (32, 0.069), (33, 0.069), (34, -0.128), (35, 0.058), (36, -0.0), (37, -0.063), (38, 0.195), (39, -0.132), (40, -0.044), (41, -0.021), (42, 0.02), (43, -0.098), (44, 0.025), (45, 0.123), (46, -0.062), (47, 0.109), (48, 0.045), (49, 0.081)]
simIndex simValue paperId paperTitle
same-paper 1 0.96121359 273 acl-2011-Semantic Representation of Negation Using Focus Detection
Author: Eduardo Blanco ; Dan Moldovan
Abstract: Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. A negated statement often carries positive implicit meaning, but to pinpoint the positive part from the negative part is rather difficult. This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. The proposed representation relies on focus of negation detection. For this, new annotation over PropBank and a learning algorithm are proposed.
2 0.88771796 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman
Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora. 1 Motivation Information Extraction (IE) systems often face the problem of distinguishing between affirmed, negated, and speculative information in text. For example, sentiment analysis systems need to detect negation for accurate polarity classification. Similarly, medical IE systems need to differentiate between affirmed, negated, and speculated (possible) medical conditions. The importance of the task of negation and speculation (a.k.a. hedge) detection is attested by a number of research initiatives. The creation of the BioScope corpus (Vincze et al., 2008) assisted in the development and evaluation of several negation/hedge scope detection systems. The corpus consists of medical and biological texts annotated for negation, speculation, and their linguistic scope. The 2010 283 Noriko Tomuro Dina Demner-Fushman DePaul University Chicago, IL USA t omuro @ c s . depaul . edu National Library of Medicine Bethesda, MD USA ddemne r@mai l nih . gov . i2b2 NLP Shared Task1 included a track for detection of the assertion status of medical problems (e.g. affirmed, negated, hypothesized, etc.). The CoNLL2010 Shared Task (Farkas et al., 2010) focused on detecting hedges and their scopes in Wikipedia articles and biomedical texts. In this paper, we present a linguistically motivated rule-based system for the detection of negation and speculation scopes that performs on par with state-of-the-art machine learning systems. The rules used by the ScopeFinder system are automatically extracted from the BioScope corpus and encode lexico-syntactic patterns in a user-friendly format. While the system was developed and tested using a biomedical corpus, the rule extraction mechanism is not domain-specific. In addition, the linguistically motivated rule encoding allows for manual adaptation to new domains and corpora. 2 Task Definition Negation/Speculation detection is typically broken down into two sub-tasks - discovering a negation/speculation cue and establishing its scope. The following example from the BioScope corpus shows the annotated hedging cue (in bold) together with its associated scope (surrounded by curly brackets): Finally, we explored the {possible role of 5hydroxyeicosatetraenoic acid as a regulator of arachidonic acid liberation}. Typically, systems first identify negation/speculation cues and subsequently try to identify their associated cue scope. However, the two tasks are interrelated and both require 1https://www.i2b2.org/NLP/Relations/ Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 283–287, syntactic understanding. Consider the following two sentences from the BioScope corpus: 1) By contrast, {D-mib appears to be uniformly expre1ss)e Bdy yin c oimnatrgaisnta,l { dDis-mcsi }b. 
2) Differentiation assays using water soluble phorbol esters reveal that differentiation becomes irreversible soon after AP-1 appears. Both sentences contain the word form appears, however in the first sentence the word marks a hedg- ing cue, while in the second sentence the word does not suggest speculation. Unlike previous work, we do not attempt to identify negation/speculation cues independently of their scopes. Instead, we concentrate on scope detection, simultaneously detecting corresponding cues. 3 Dataset We used the BioScope corpus (Vincze et al., 2008) to develop our system and evaluate its performance. To our knowledge, the BioScope corpus is the only publicly available dataset annotated with negation/speculation cues and their scopes. It consists of biomedical papers, abstracts, and clinical reports (corpus statistics are shown in Tables 1 and 2). Corpus Type Sentences Documents Mean Document Size Clinical752019543.85 Full Papers Paper Abstracts 3352 14565 9 1273 372.44 11.44 Table 1: Statistics of the BioScope corpus. Document sizes represent number of sentences. Corpus Type Negation Cues Speculation Cues Negation Speculation Clinical87211376.6%13.4% Full Papers Paper Abstracts 378 1757 682 2694 13.76% 13.45% 22.29% 17.69% Table 2: Statistics of the BioScope corpus. The 2nd and 3d columns show the total number of cues within the datasets; the 4th and 5th columns show the percentage of negated and speculative sentences. 70% ofthe corpus documents (randomly selected) were used to develop the ScopeFinder system (i.e. extract lexico-syntactic rules) and the remaining 30% were used to evaluate system performance. While the corpus focuses on the biomedical domain, our rule extraction method is not domain specific and in future work we are planning to apply our method on different types of corpora. 4 Method Intuitively, rules for detecting both speculation and negation scopes could be concisely expressed as a 284 Figure 1: Parse tree of the sentence ‘T cells {lack active NFkappa B } bPuatr express Sp1 as expected’ generated by cthtiev eS NtanF-fkoaprdp parser. Speculation scope ewxporedcste are gsehnoewrant eind ellipsis. tTanhecue word is shown in grey. The nearest common ancestor of all cue and scope leaf nodes is shown in a box. combination of lexical and syntactic patterns. example, BioScope O¨zg u¨r For and Radev (2009) examined sample sentences and developed hedging scope rules such as: The scope of a modal verb cue (e.g. may, might, could) is the verb phrase to which it is attached; The scope of a verb cue (e.g. appears, seems) followed by an infinitival clause extends to the whole sentence. Similar lexico-syntactic rules have been also manually compiled and used in a number of hedge scope detection systems, e.g. (Kilicoglu and Bergler, 2008), (Rei and Briscoe, 2010), (Velldal et al., 2010), (Kilicoglu and Bergler, 2010), (Zhou et al., 2010). However, manually creating a comprehensive set of such lexico-syntactic scope rules is a laborious and time-consuming process. In addition, such an approach relies heavily on the availability of accurately parsed sentences, which could be problematic for domains such as biomedical texts (Clegg and Shepherd, 2007; McClosky and Charniak, 2008). Instead, we attempted to automatically extract lexico-syntactic scope rules from the BioScope corpus, relying only on consistent (but not necessarily accurate) parse tree representations. 
We first parsed each sentence in the training dataset which contained a negation or speculation cue using the Stanford parser (Klein and Manning, 2003; De Marneffe et al., 2006). Figure 1 shows the parse tree of a sample sentence containing a negation cue and its scope. Next, for each cue-scope instance within the sen- tence, we identified the nearest common ancestor Figure 2: Lexico-syntactic pattern extracted from the sentence from Figure 1. The rule is equivalent to the following string representation: (VP (VBP lack) (NP (JJ *scope*) (NN *scope*) (NN *scope*))). which encompassed the cue word(s) and all words in the scope (shown in a box on Figure 1). The subtree rooted by this ancestor is the basis for the resulting lexico-syntactic rule. The leaf nodes of the resulting subtree were converted to a generalized representation: scope words were converted to *scope*; noncue and non-scope words were converted to *; cue words were converted to lower case. Figure 2 shows the resulting rule. This rule generation approach resulted in a large number of very specific rule patterns - 1,681 nega- tion scope rules and 3,043 speculation scope rules were extracted from the training dataset. To identify a more general set of rules (and increase recall) we next performed a simple transformation of the derived rule set. If all children of a rule tree node are of type *scope* or * (i.e. noncue words), the node label is replaced by *scope* or * respectively, and the node’s children are pruned from the rule tree; neighboring identical siblings of type *scope* or * are replaced by a single node of the corresponding type. Figure 3 shows an example of this transformation. (a)ThechildrenofnodesJ /N /N are(b)Thechildren pruned and their labels are replaced by of node NP are *scope*. pruned and its label is replaced by *scope*. Figure 3: Transformation of the tree shown in Figure 2. The final rule is equivalent to the following string representation: (VP (VBP lack) *scope* ) 285 The rule tree pruning described above reduced the negation scope rule patterns to 439 and the speculation rule patterns to 1,000. In addition to generating a set of scope finding rules, we also implemented a module that parses string representations of the lexico-syntactic rules and performs subtree matching. The ScopeFinder module2 identifies negation and speculation scopes in sentence parse trees using string-encoded lexicosyntactic patterns. Candidate sentence parse subtrees are first identified by matching the path of cue leafnodes to the root ofthe rule subtree pattern. Ifan identical path exists in the sentence, the root of the candidate subtree is thus also identified. The candidate subtree is evaluated for a match by recursively comparing all node children (starting from the root of the subtree) to the rule pattern subtree. Nodes of type *scope* and * match any number of nodes, similar to the semantics of Regex Kleene star (*). 5 Results As an informed baseline, we used a previously de- veloped rule-based system for negation and speculation scope discovery (Apostolova and Tomuro, 2010). The system, inspired by the NegEx algorithm (Chapman et al., 2001), uses a list of phrases split into subsets (preceding vs. following their scope) to identify cues using string matching. The cue scopes extend from the cue to the beginning or end of the sentence, depending on the cue type. Table 3 shows the baseline results. PSFCNalpueingpleciarPutcAlai opbtneisor tacsP6597C348o.r12075e4ctly6859RP203475r. 81e26d037icteF569784C52. 
04u913e84s5F2A81905l.2786P14redictCus Table 3: Baseline system performance. P (Precision), R (Recall), and F (F1-score) are computed based on the sentence tokens of correctly predicted cues. The last column shows the F1-score for sentence tokens of all predicted cues (including erroneous ones). We used only the scopes of predicted cues (correctly predicted cues vs. all predicted cues) to mea- 2The rule sets and source code are publicly available at http://scopefinder.sourceforge.net/. sure the baseline system performance. The baseline system heuristics did not contain all phrase cues present in the dataset. The scopes of cues that are missing from the baseline system were not included in the results. As the baseline system was not penalized for missing cue phrases, the results represent the upper bound of the system. Table 4 shows the results from applying the full extracted rule set (1,681 negation scope rules and 3,043 speculation scope rules) on the test data. As expected, this rule set consisting of very specific scope matching rules resulted in very high precision and very low recall. Negation P R F A Clinical99.4734.3051.0117.58 Full Papers Paper Abstracts 95.23 87.33 25.89 05.78 40.72 10.84 28.00 07.85 Speculation Clinical96.5020.1233.3022.90 Full Papers Paper Abstracts 88.72 77.50 15.89 11.89 26.95 20.62 10.13 10.00 Table 4: Results from applying the full extracted rule set on the test data. Precision (P), Recall (R), and F1-score (F) are com- puted based the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). Table 5 shows the results from applying the rule set consisting of pruned pattern trees (439 negation scope rules and 1,000 speculation scope rules) on the test data. As shown, overall results improved significantly, both over the baseline and over the unpruned set of rules. Comparable results are shown in bold in Tables 3, 4, and 5. Negation P R F A Clinical85.5992.1588.7585.56 Full Papers 49.17 94.82 64.76 71.26 Paper Abstracts 61.48 92.64 73.91 80.63 Speculation Clinical67.2586.2475.5771.35 Full Papers 65.96 98.43 78.99 52.63 Paper Abstracts 60.24 95.48 73.87 65.28 Table 5: Results from applying the pruned rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). 6 Related Work Interest in the task of identifying negation and spec- ulation scopes has developed in recent years. Rele286 vant research was facilitated by the appearance of a publicly available annotated corpus. All systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008). O¨zg u¨r and Radev (2009) have developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule based system on identifying speculation scopes, they report 61. 13 and 79.89 accuracy for BioScope full papers and abstracts respectively. Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines if a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge signals were reported as F1-scores of 38. 
5 Results

As an informed baseline, we used a previously developed rule-based system for negation and speculation scope discovery (Apostolova and Tomuro, 2010). The system, inspired by the NegEx algorithm (Chapman et al., 2001), uses a list of cue phrases split into two subsets (cues that precede their scope and cues that follow it) and identifies cues by string matching. The scope of a cue extends from the cue to the beginning or the end of the sentence, depending on the cue type.
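For concreteness, this kind of surface heuristic might look roughly like the following sketch; the cue lists here are purely illustrative (the actual phrase lists are those of Apostolova and Tomuro (2010)) and the function name find_scopes is an assumption.

```python
# Illustrative cue lists only -- not the baseline system's actual phrase lists.
PRE_SCOPE_CUES = {"no", "without", "denies", "may"}      # scope follows the cue
POST_SCOPE_CUES = ["is ruled out", "was ruled out"]      # scope precedes the cue

def find_scopes(tokens):
    """Return (cue, (start, end)) token spans using string matching only."""
    scopes = []
    text = " ".join(tokens).lower()
    for i, token in enumerate(tokens):
        if token.lower() in PRE_SCOPE_CUES:
            scopes.append((token, (i + 1, len(tokens))))  # cue .. end of sentence
    for cue in POST_SCOPE_CUES:
        if cue in text:
            end = text[: text.index(cue)].count(" ")      # tokens before the cue
            scopes.append((cue, (0, end)))                # start of sentence .. cue
    return scopes

print(find_scopes("The patient denies chest pain".split()))
# -> [('denies', (3, 5))]
```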
Table 3 shows the baseline results.

Table 3: Baseline system performance. P (Precision), R (Recall), and F (F1-score) are computed over the sentence tokens of correctly predicted cues. The last column shows the F1-score over the sentence tokens of all predicted cues (including erroneous ones).

To measure the baseline system performance, we used only the scopes of predicted cues (correctly predicted cues vs. all predicted cues). The baseline system heuristics did not contain all cue phrases present in the dataset, and the scopes of cues missing from the baseline system were not included in the results. As the baseline system was not penalized for missing cue phrases, these results represent an upper bound on its performance.

Table 4 shows the results of applying the full extracted rule set (1,681 negation scope rules and 3,043 speculation scope rules) to the test data. As expected, this rule set, consisting of very specific scope-matching rules, produced very high precision and very low recall.

Negation             P      R      F      A
  Clinical         99.47  34.30  51.01  17.58
  Full Papers      95.23  25.89  40.72  28.00
  Paper Abstracts  87.33  05.78  10.84  07.85
Speculation          P      R      F      A
  Clinical         96.50  20.12  33.30  22.90
  Full Papers      88.72  15.89  26.95  10.13
  Paper Abstracts  77.50  11.89  20.62  10.00

Table 4: Results from applying the full extracted rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match).

Table 5 shows the results of applying the rule set consisting of pruned pattern trees (439 negation scope rules and 1,000 speculation scope rules) to the test data. Overall results improved significantly, both over the baseline and over the unpruned rule set. Comparable results are shown in bold in Tables 3, 4, and 5.

Negation             P      R      F      A
  Clinical         85.59  92.15  88.75  85.56
  Full Papers      49.17  94.82  64.76  71.26
  Paper Abstracts  61.48  92.64  73.91  80.63
Speculation          P      R      F      A
  Clinical         67.25  86.24  75.57  71.35
  Full Papers      65.96  98.43  78.99  52.63
  Paper Abstracts  60.24  95.48  73.87  65.28

Table 5: Results from applying the pruned rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match).

6 Related Work

Interest in the task of identifying negation and speculation scopes has developed in recent years. Relevant research was facilitated by the appearance of a publicly available annotated corpus: all systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008).

Özgür and Radev (2009) developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule-based system on identifying speculation scopes, they report 61.13 and 79.89 accuracy for BioScope full papers and abstracts, respectively.

Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines whether a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge cues were reported as F1-scores of 38.16, 59.66, and 78.54 for clinical texts, full papers, and abstracts, respectively. (Their F1-scores are computed on scope tokens; unlike our evaluation metric, scope token matches are computed for each cue within a sentence, i.e., a token is evaluated multiple times if it belongs to more than one cue scope.) Accuracy (computed for correctly identified scopes) was reported as 26.21, 35.92, and 65.55 for clinical texts, papers, and abstracts, respectively.

Morante and Daelemans (2009a) have also developed a metalearner for identifying the scope of negation. Results of the negation scope finding system with predicted cues are reported as F1-scores (computed on scope tokens) of 84.20, 70.94, and 82.60 for clinical texts, papers, and abstracts, respectively. Accuracy (the percentage of correctly identified exact scopes) is reported as 70.75, 41.00, and 66.07 for clinical texts, papers, and abstracts, respectively.

The top three performers in the CoNLL-2010 shared task on hedge scope detection (Farkas et al., 2010) report F1-scores for correctly identified hedge cues and their scopes ranging from 55.3 to 57.3. The shared task evaluation used stricter matching criteria based on an exact match of both cues and their corresponding scopes; since our system does not focus on individual cue-scope pair detection (we instead optimized scope detection), these performance figures are not directly comparable to ours.

CoNLL-2010 shared task participants applied a variety of rule-based and machine learning methods to the task: Morante et al. (2010) used a memory-based classifier based on the k-nearest-neighbor rule to determine whether a token is the first token in a scope sequence, the last, or neither; Rei and Briscoe (2010) used a combination of manually compiled rules, a CRF classifier, and a sequence of post-processing steps on the same task; Velldal et al. (2010) manually compiled a set of heuristics based on syntactic information taken from dependency structures.

7 Discussion

We presented a method for the automatic extraction of lexico-syntactic rules for negation and speculation scopes from an annotated corpus. The resulting ScopeFinder system, based on the automatically extracted rule sets, was compared to a baseline rule-based system that does not use syntactic information. The ScopeFinder system outperformed the baseline in all cases and achieved results comparable to those of complex feature-based machine learning systems. In future work, we will explore statistically based methods for creating an optimal set of lexico-syntactic tree patterns and will evaluate the system's performance on texts from other domains.

References

E. Apostolova and N. Tomuro. 2010. Exploring surface-level heuristics for negation and speculation discovery in clinical texts. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pages 81–82. Association for Computational Linguistics.
W.W. Chapman, W. Bridewell, P. Hanbury, G.F. Cooper, and B.G. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 34(5):301–310.
A.B. Clegg and A.J. Shepherd. 2007. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics, 8(1):24.
M.C. De Marneffe, B. MacCartney, and C.D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In LREC 2006.
R. Farkas, V. Vincze, G. Móra, J. Csirik, and G. Szarvas. 2010. The CoNLL-2010 shared task: Learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010): Shared Task, pages 1–12.
H. Kilicoglu and S. Bergler. 2008. Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC Bioinformatics, 9(Suppl 11):S10.
H. Kilicoglu and S. Bergler. 2010. A high-precision approach to detecting hedges and their scopes. In CoNLL-2010: Shared Task, page 70.
D. Klein and C.D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems, pages 3–10.
D. McClosky and E. Charniak. 2008. Self-training for biomedical parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 101–104. Association for Computational Linguistics.
R. Morante and W. Daelemans. 2009a. A metalearning approach to processing the scope of negation. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 21–29. Association for Computational Linguistics.
R. Morante and W. Daelemans. 2009b. Learning the scope of hedge cues in biomedical texts. In Proceedings of the Workshop on BioNLP, pages 28–36. Association for Computational Linguistics.
R. Morante, V. Van Asch, and W. Daelemans. 2010. Memory-based resolution of in-sentence scopes of hedge cues. In CoNLL-2010: Shared Task, page 40.
A. Özgür and D.R. Radev. 2009. Detecting speculations and their scopes in scientific text. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1398–1407. Association for Computational Linguistics.
M. Rei and T. Briscoe. 2010. Combining manual rules and supervised learning for hedge cue and scope detection. In Proceedings of the 14th Conference on Natural Language Learning, pages 56–63.
E. Velldal, L. Øvrelid, and S. Oepen. 2010. Resolving speculation: MaxEnt cue classification and dependency-based scope rules. In CoNLL-2010: Shared Task, page 48.
V. Vincze, G. Szarvas, R. Farkas, G. Móra, and J. Csirik. 2008. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9.
H. Zhou, X. Li, D. Huang, Z. Li, and Y. Yang. 2010. Exploiting multi-features to detect hedges and their scope in biomedical texts. In CoNLL-2010: Shared Task, page 106.
3 0.8104679 8 acl-2011-A Corpus of Scope-disambiguated English Text
Author: Mehdi Manshadi ; James Allen ; Mary Swift
Abstract: Previous work on quantifier scope annotation focuses on scoping sentences with only two quantified noun phrases (NPs), where the quantifiers are restricted to a predefined list. It also ignores negation, modal/logical operators, and other sentential adverbials. We present a comprehensive scope annotation scheme. We annotate the scope interaction between all scopal terms in the sentence from quantifiers to scopal adverbials, without putting any restriction on the number of scopal terms in a sentence. In addition, all NPs, explicitly quantified or not, with no restriction on the type of quantification, are investigated for possible scope interactions.
4 0.48693702 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
Author: Awais Athar
Abstract: Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. Our results show that 3-grams and dependencies perform best in this task; they outperform the sentence splitting, science lexicon and negation based features.
5 0.4252345 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes
Author: Youngjun Kim ; Ellen Riloff ; Stephane Meystre
Abstract: We present an NLP system that classifies the assertion type of medical problems in clinical notes used for the Fourth i2b2/VA Challenge. Our classifier uses a variety of linguistic features, including lexical, syntactic, lexicosyntactic, and contextual features. To overcome an extremely unbalanced distribution of assertion types in the data set, we focused our efforts on adding features specifically to improve the performance of minority classes. As a result, our system reached 94.17% micro-averaged and 79.76% macro-averaged F1-measures, and showed substantial recall gains on the minority classes.
6 0.39766347 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
7 0.35302657 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
8 0.34966815 68 acl-2011-Classifying arguments by scheme
10 0.32764035 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
11 0.31995714 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
12 0.31741059 294 acl-2011-Temporal Evaluation
13 0.31129348 200 acl-2011-Learning Dependency-Based Compositional Semantics
14 0.31011289 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications
15 0.30965859 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
16 0.30166945 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
17 0.30065104 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
18 0.29664767 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking
19 0.29596135 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
20 0.29121515 159 acl-2011-Identifying Noun Product Features that Imply Opinions
topicId topicWeight
[(5, 0.04), (17, 0.04), (26, 0.019), (31, 0.019), (37, 0.066), (39, 0.043), (41, 0.062), (55, 0.034), (59, 0.066), (71, 0.277), (72, 0.033), (89, 0.037), (91, 0.039), (96, 0.103), (97, 0.022), (98, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.77955538 273 acl-2011-Semantic Representation of Negation Using Focus Detection
Author: Eduardo Blanco ; Dan Moldovan
Abstract: Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. A negated statement often carries positive implicit meaning, but to pinpoint the positive part from the negative part is rather difficult. This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. The proposed representation relies on focus of negation detection. For this, new annotation over PropBank and a learning algorithm are proposed.
2 0.7123096 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics
Author: Christian Rohrdantz ; Annette Hautli ; Thomas Mayer ; Miriam Butt ; Daniel A. Keim ; Frans Plank
Abstract: This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing diachronic development in word contexts. Previous studies have shown that computational models are capable of clustering and disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, allowing for an interactive visual exploration of semantic change.
3 0.62852967 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
Author: Yee Seng Chan ; Dan Roth
Abstract: In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation. In the real RE problem where relation arguments need to be identified, exploiting these structures also allows reducing pipelined propagated errors. We show that this RE framework provides significant improvement in RE performance.
4 0.61622494 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
Author: Susan Howlett ; Mark Dras
Abstract: There are a number of systems that use a syntax-based reordering step prior to phrasebased statistical MT. An early work proposing this idea showed improved translation performance, but subsequent work has had mixed results. Speculations as to cause have suggested the parser, the data, or other factors. We systematically investigate possible factors to give an initial answer to the question: Under what conditions does this use of syntax help PSMT?
5 0.51118296 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
6 0.50964952 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
7 0.50891572 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
8 0.50309682 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
9 0.50127083 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
10 0.50126982 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
11 0.50057638 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
12 0.50009733 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
13 0.4985851 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
14 0.49855337 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
15 0.49778667 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
16 0.49746323 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
17 0.49502289 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
18 0.49501437 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
19 0.49471223 178 acl-2011-Interactive Topic Modeling
20 0.49401483 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment