emnlp emnlp2010 emnlp2010-15 knowledge-graph by maker-knowledge-mining

15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing


Source: pdf

Author: Qiaoming Zhu ; Junhui Li ; Hongling Wang ; Guodong Zhou

Abstract: This paper approaches the scope learning problem via simplified shallow semantic parsing. This is done by regarding the cue as the predicate and mapping its scope into several constituents as the arguments of the cue. Evaluation on the BioScope corpus shows that the structural information plays a critical role in capturing the relationship between a cue and its dominated arguments. It also shows that our parsing approach significantly outperforms the state-of-the-art chunking ones. Although our parsing approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper approaches the scope learning problem via simplified shallow semantic parsing. [sent-3, score-0.607]

2 This is done by regarding the cue as the predicate and mapping its scope into several constituents as the arguments of the cue. [sent-4, score-1.049]

3 Although our parsing approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. [sent-7, score-1.787]

4 For example, as far as negation assertion is concerned, a negation cue (e. [sent-10, score-1.153]

5 Generally, scope learning involves two subtasks: cue recognition and its scope identification. [sent-13, score-1.438]

6 (2008) pointed out that the extracted information within the scope of a negation or speculation cue should either be discarded or presented separately from factual information. [sent-20, score-1.708]

7 In addition to the IE tasks in the biomedical domain, negation scope learning has attracted increasing attention in some natural language processing (NLP) tasks, such as sentiment classification (Turney, 2002). [sent-28, score-0.831]

8 Similarly, seeing the increasing interest in speculation scope learning, the CoNLL’2010 shared task (Farkas et al. [sent-30, score-0.911]

9 (2008) and Morante and Daelemans (2009a & 2009b) pioneered the research on scope learning by formulating it as a chunking problem, which classifies the words of a sentence as being inside or outside the scope of a cue. [sent-40, score-0.995]

10 (2010) defined heuristic rules for speculation scope learning from constituency and dependency parse tree perspectives, respectively. [sent-42, score-0.988]

11 Although the chunking approach has been evaluated on negation and speculation scope learning and can be easily ported to other scope learning tasks, it ignores syntactic information and suffers from low performance. [sent-43, score-1.759]

12 , speculation scope learning), it is not readily adaptable to other scope learning tasks (e. [sent-46, score-1.375]

13 Instead, this paper explores scope learning from a parse tree perspective and formulates it as a simplified shallow semantic parsing problem, which has been extensively studied in the past few years (Carreras and Màrquez, 2005). [sent-49, score-0.704]

14 In particular, the cue is recast as the predicate and the scope is recast as the arguments of the cue. [sent-50, score-1.023]

15 The motivation behind this is that the structured syntactic information plays a critical role in scope learning and deserves much more attention, as indicated by previous studies in shallow semantic parsing (Gildea and Palmer, 2002; Punyakanok et al. [sent-51, score-0.649]

16 Although our approach is evaluated only on negation and speculation scope learning here, it is portable to other kinds of scope learning. [sent-53, score-1.751]

17 Section 4 describes our parsing approach by formulating scope learning as a simplified shallow semantic parsing problem. [sent-57, score-0.697]

18 715 2 Related Work Most of the previous research on scope learning falls into negation scope learning and speculation scope learning. [sent-60, score-2.173]

19 (2008) pioneered the research on negation scope learning, largely due to the availability of a large-scale annotated corpus, the BioScope corpus. [sent-62, score-0.798]

20 Morante and Daelemans (2009a) further improved the performance by combining several classifiers and achieved the accuracy of ~98% for negation cue recognition and the PCS (Percentage of Correct Scope) of ~74% for negation scope identification on the abstracts subcorpus. [sent-64, score-1.897]

21 They concluded that their method for negation scope identification is portable to speculation scope identification. [sent-72, score-1.854]

22 However, as far as speculation scope identification is concerned, it also suffers from low performance, with only 60. [sent-73, score-1.027]

23 Alternatively, Özgür and Radev (2009) employed some heuristic rules from a constituency parse tree perspective for speculation scope identification. [sent-75, score-0.988]

24 The more recent CoNLL’2010 shared task was dedicated to the detection of speculation cues and their linguistic scope in natural language processing (Farkas et al. [sent-79, score-1.014]

25 In this corpus, every sentence is annotated with negation cues and speculation cues (if any), as well as their linguistic scopes. [sent-86, score-0.987]

26 Among them, the full papers subcorpus and the abstracts subcorpus come from the same genre, and thus share some common characteristics in statistics, such as the number of words in the negation/speculation scope to the right (or left) of the negation/speculation cue and the average scope length. [sent-93, score-2.079]

27 In comparison, the clinical reports subcorpus consists of clinical radiology reports with short sentences. [sent-94, score-0.598]

28 4 Scope Learning via Simplified Shallow Semantic Parsing In this section, we first formulate the scope learning task as a simplified shallow semantic parsing problem. [sent-107, score-0.643]

29 As far as scope learning is concerned, the cue can be regarded as the predicate, while its scope can be mapped into several constituents dominated by the cue and thus can be regarded as the arguments of the cue. [sent-111, score-1.953]

30 In particular, given a cue and its scope, which covers wordm, ..., wordn, we adopt the following two heuristic rules to map the scope of the cue into several constituents, which can be deemed its arguments in the given parse tree. [sent-112, score-2.005]

31 2) If constituent X is an argument of the given cue, then X should be the highest constituent dominated by the scope wordm, ..., wordn. [sent-114, score-0.619]
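
The mapping rules above can be made concrete with a short sketch. The following is an illustration under stated assumptions (nltk-style constituency trees, 0-based inclusive token spans), not the authors' code: it collects the highest constituents that lie entirely inside the scope but do not contain the cue token.

```python
# Sketch of the scope-to-arguments mapping: collect the highest constituents
# fully inside the scope [m, n] that do not cover the cue token.
from nltk.tree import Tree

def width(node):
    """Token width of a node; a leaf is a bare string in nltk trees."""
    return len(node.leaves()) if isinstance(node, Tree) else 1

def scope_to_arguments(tree, m, n, cue_index, start=0):
    """Return (label, start, end) triples for the highest constituents
    dominated by the scope wordm, ..., wordn, excluding the cue's node."""
    end = start + width(tree) - 1
    if m <= start and end <= n and not (start <= cue_index <= end):
        return [(tree.label(), start, end)]   # highest qualifying node: stop here
    args = []
    for child in tree:
        if isinstance(child, Tree):
            args.extend(scope_to_arguments(child, m, n, cue_index, start))
        start += width(child)
    return args
```

For the cue "not" (RB7,7) with a scope covering words 4-11 in the Figure 2 example discussed below, this yields {NP4,5, MD6,6, VP8,11}, matching the arguments listed for that cue.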

32 If a speculation cue consists of multiple words (e. [sent-123, score-0.91]

33 , not) is chosen to represent the negation cue if it consists of multiple words (e. [sent-130, score-0.797]

34 These two constraints between a cue and its arguments are consistent with shallow semantic parsing (Carreras and Màrquez, 2005). [sent-134, score-0.642]

35 , not) represents the negation cue "can not" while its arguments include three constituents {NP4,5, MD6,6, and VP8,11}. [sent-138, score-0.873]

36 , indicate) represents the speculation cue “indicate that” while its arguments include one constituent SBAR3,11. [sent-141, score-0.975]

37 It is worth noting that according to the above rules, scope learning via shallow semantic parsing, i. [sent-142, score-0.564]

38 Compared with common shallow semantic parsing, which needs to assign each argument a semantic label, scope identification does not involve semantic label classification and thus can be divided into three consecutive phases: argument pruning, argument identification, and post-processing. [sent-147, score-1.048]

39 Similar to the heuristic algorithm proposed in Xue and Palmer (2004) for argument pruning in common shallow semantic parsing, the argument pruning algorithm adopted here starts by designating the cue node as the current node and collecting its siblings. [sent-150, score-0.825]

40 To sum up, except for the cue node itself and its ancestral constituents, any constituent in the parse tree whose parent covers the given cue is collected as an argument candidate. [sent-153, score-1.15]
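
A minimal sketch of this pruning step, assuming nltk trees and tuple tree positions (an illustration, not the authors' implementation): designate the cue node as the current node, collect its siblings, move to its parent, and repeat up to the root.

```python
# Xue-and-Palmer-style argument pruning: keep exactly those constituents
# whose parent covers the cue, i.e. the siblings of the cue node and of
# each of its ancestors.
from nltk.tree import Tree

def prune_candidates(tree, cue_position):
    """cue_position: the nltk tree position (a tuple of child indices)
    of the cue node. Returns the argument candidate constituents."""
    candidates = []
    pos = cue_position
    while len(pos) > 0:
        parent = tree[pos[:-1]]                 # climb one level toward the root
        for i, sibling in enumerate(parent):
            if i != pos[-1] and isinstance(sibling, Tree):
                candidates.append(sibling)      # sibling of the current node
        pos = pos[:-1]
    return candidates
```

For the cue node RB7,7 in Figure 2, this collects exactly the constituents listed in the next sentence.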

41 Taking the negation cue node "RB7,7" in Figure 2 as an example, constituents {MD6,6, VP8,11, NP4,5, IN3,3, ... (flattened residue of Table 1: basic features B1-B4 with remarks; B1 is the cue itself, i.e., the word of the cue, e. [sent-154, score-0.849]

42 Table 1: Basic features and their instantiations for argument identification in scope learning, with NP4,5 as the focus constituent (i. [sent-161, score-0.705]

43 Similar to argument identification in common shallow semantic parsing, the structured syntactic information plays a critical role in scope learning. [sent-180, score-0.817]
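
One classic piece of such structured information in shallow semantic parsing is the syntactic path between the predicate (here, the cue) and the candidate constituent. The sketch below is an illustrative assumption, not a reproduction of the paper's feature set: it computes the label path through the lowest common ancestor using nltk tree positions.

```python
# Label path from the cue node to a candidate constituent: '^' climbs toward
# the root, '!' descends toward the candidate. Both positions are assumed to
# point at Tree nodes (POS or phrase level), so every prefix is an ancestor.
from nltk.tree import Tree

def tree_path(tree, cue_pos, cand_pos):
    k = 0                                       # depth of the lowest common ancestor
    while (k < min(len(cue_pos), len(cand_pos))
           and cue_pos[k] == cand_pos[k]):
        k += 1
    up = [tree[cue_pos[:i]].label() for i in range(len(cue_pos), k, -1)]
    down = [tree[cand_pos[:i]].label() for i in range(k, len(cand_pos) + 1)]
    return "^".join(up) + "^" + "!".join(down)
```

For the cue RB7,7 and the candidate MD6,6 under the same VP, this yields a path such as "RB^VP!MD".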

44 4 Post-Processing Although a cue in the BioScope corpus always has only one continuous block as its scope (including the cue itself), the scope identifier may produce a discontinuous scope due to independent predictions in the argument identification phase. [sent-191, score-2.522]

45 2, except the words presented by the cue, the projection covers the whole sentence and each constituent (LACi or RACj in Figure 3) receives a probability distribution of being an argument of the given cue in the argument identification phase. [sent-200, score-0.777]

46 Since a cue is deemed inside its scope in the BioScope corpus, our post-processing algorithm first includes the cue in its scope and then starts to identify the left and the right scope boundaries, respectively. [sent-201, score-2.362]
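
A minimal sketch of this post-processing; the greedy, threshold-based extension is an assumption about how the probability distributions are consumed, and the paper's exact decision rule may differ. The cue is first forced inside the scope, and the boundaries are then extended outward over adjacent constituents while their argument probability stays high, guaranteeing one continuous block.

```python
def continuous_scope(cue_start, cue_end, left_cands, right_cands, threshold=0.5):
    """left_cands / right_cands: (start, end, argument_probability) triples
    for the constituents to the left / right of the cue (LACi / RACj in
    Figure 3), ordered outward from the cue. Returns the (left, right)
    token boundaries of a single continuous scope containing the cue."""
    left, right = cue_start, cue_end            # the cue is always in its scope
    for start, _end, prob in left_cands:
        if prob < threshold:
            break                               # first non-argument stops the extension
        left = start
    for _start, end, prob in right_cands:
        if prob < threshold:
            break
        right = end
    return left, right
```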

47 5 Cue Recognition Automatic recognition of cues of special interest is a prerequisite for a scope learning system. [sent-207, score-0.614]

48 The approaches to recognizing cues of special interest usually fall into two categories: 1) substring matching approaches, which require a set of cue words or phrases in advance (e. [sent-208, score-0.566]

49 In particular, we categorize these features into three groups: 1) features about the cue candidate itself (CC for short); 2) features about surrounding words (SW for short); and 3) structural features derived from the syntactic parse tree (SF for short). [sent-219, score-0.585]

50 (VBP + VP + S) Table 3: Features and their instantiations for cue recognition, with VBP2,2 as the cue candidate, regarding Figure 2. [sent-228, score-0.96]
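
A small illustrative sketch of the three feature groups for a cue candidate at token position i. The concrete feature names and the one-word window are assumptions rather than a reproduction of the paper's Table 3, but the SF path mirrors the "VBP + VP + S" instantiation shown above.

```python
def cue_features(tokens, pos_tags, i, path_labels):
    """path_labels: constituent labels from the candidate's POS node upward
    in the parse tree, e.g. ['VBP', 'VP', 'S'] for the candidate VBP2,2."""
    n = len(tokens)
    return {
        "CC_word": tokens[i].lower(),                              # candidate itself (CC)
        "CC_pos": pos_tags[i],
        "SW_prev": tokens[i - 1].lower() if i > 0 else "<S>",      # surrounding words (SW)
        "SW_next": tokens[i + 1].lower() if i + 1 < n else "</S>",
        "SF_path": " + ".join(path_labels),                        # structural feature (SF)
    }
```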

51 5 Experimentation We have evaluated our simplified shallow semantic parsing approach to negation and speculation scope learning on the BioScope corpus. [sent-229, score-1.424]

52 (2008) and Morante and Daelemans (2009a & 2009b), the abstracts subcorpus is randomly divided into 10 folds so as to perform 10-fold cross-validation, while the performance on both the papers and clinical reports subcorpora is evaluated using the system trained on the whole abstracts subcorpus. [sent-232, score-0.802]

53 For scope identification, we report the accuracy in PCS (Percentage of Correct Scopes) when the golden cues are given, and report precision/recall/F1-measure when the cues are automatically recognized. [sent-235, score-0.766]
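
As a tiny worked example of the PCS metric, reading "correct scope" as an exact match between the gold and predicted scope spans (the usual interpretation; the exact matching criterion is an assumption here):

```python
def pcs(gold_scopes, predicted_scopes):
    """Percentage of Correct Scopes over all cue instances."""
    correct = sum(1 for g, p in zip(gold_scopes, predicted_scopes) if g == p)
    return 100.0 * correct / len(gold_scopes)

# One of two predicted scopes matches the gold span exactly -> PCS = 50.0
assert pcs([(4, 11), (3, 11)], [(4, 11), (3, 10)]) == 50.0
```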

54 01 ) for negation scope identification and improve the performance by 11. [sent-245, score-0.914]

55 , CFACCP5, AC2W, CFCP1) related to neighboring words of the cue play a critical role for both negation and speculation scope identification. [sent-252, score-1.725]

56 Since the additional selected features significantly improve the performance for both negation and speculation scope identification, we include them in all the remaining experiments. [sent-255, score-1.245]

57 (Flattened residue of Table 4: Task / Features / Acc (%); negation scope identification, baseline vs. +selected features: 74. [sent-258, score-0.58]

58 36 Table 4: Contribution of additional selected features on the development dataset of the abstracts subcorpus Since all the sentences in the abstracts subcorpus are included in the GTB 1. [sent-262, score-0.776]

59 0 corpus while we do not have golden parse trees for the sentences in the full papers and the clinical reports subcorpora, we only evaluate the performance of scope identification on the abstracts subcorpus with golden parse trees. [sent-263, score-1.497]

60 It shows that given golden parse trees and golden cues, speculation scope identification achieves higher performance (e. [sent-265, score-1.292]

61 This is mainly due to the observation on the BioScope corpus that the scope of a speculation cue can usually be characterized by its POS and the syntactic structures of the sentence where it occurs. [sent-269, score-1.39]

62 For example, the scope of a verb in active voice usually starts at the cue itself and ends at its object (e. [sent-270, score-0.927]

63 , the speculation cue "indicate that" in Figure 2 scopes the fragment "indicate that corticosteroid resistance can not be explained by abnormalities"). [sent-272, score-1.012]

64 Moreover, the statistics on the abstracts subcorpus show that the number of arguments per speculation cue is smaller than that per negation cue (e. [sent-273, score-2.181]

65 (Flattened residue of Table 5: Task / Acc (%); negation scope identification: 83. [sent-279, score-0.58]

66 41 Table 5: Accuracy (%) of scope identification with golden parse trees and golden cues on the abstracts subcorpus using 10-fold cross-validation It is worth noting that we adopted the post-processing algorithm proposed in Section 4. [sent-281, score-1.336]

67 08 in accuracy for negation and speculation scope identification, respectively, which is lower than the performance in Table 5 achieved by our post-processing algorithm. [sent-286, score-1.245]

68 Table 6 shows the performance of scope identification on automatic parse trees and golden cues. [sent-295, score-0.765]

69 In addition, we also report an oracle performance to explore the best possible performance of our system by assuming that our scope finder can always correctly determine whether a candidate is an argument or not. [sent-296, score-0.594]

70 Table 6 shows that: 1) For both negation and speculation scope identification, automatic syntactic parsing lowers the performance on the abstracts subcorpus (e. [sent-299, score-1.284]

71 84% in accuracy for negation scope identification and from 86. [sent-303, score-0.914]

72 However, the performance drop shows that both negation and speculation scope identification are not as sensitive to automatic syntactic parsing as common shallow semantic parsing, whose performance might decrease by ~10 in F1-measure (Toutanova et al. [sent-306, score-1.529]

73 This indicates that scope identification via simplified shallow semantic parsing is robust to some variations in the parse trees. [sent-308, score-0.805]

74 subcorpus while speculation scope identification even performs ~20% lower in accuracy than negation scope identification on the clinical reports subcorpus. [sent-311, score-2.322]

75 29 Table 6: Accuracy (%) of scope identification on the three subcorpora using an automatic parser trained on 6,691 sentences in GTB 1. [sent-326, score-0.653]

76 (Flattened residue of Table 7: negation/speculation scope identification; Özgür and Radev (2009), our baseline, and our final system: 73. [sent-332, score-0.696]

77 Note that all the performance figures on the full papers subcorpus and the clinical reports subcorpus are achieved using the whole GTB 1. [sent-350, score-0.681]

78 This further indicates the appropriateness of our simplified shallow semantic parsing approach and the effectiveness of structured syntactic information for scope identification. [sent-357, score-0.659]

79 However, the improvement on the clinical reports subcorpus for negation scope identification is much less apparent, partly due to the fact that the sentences in this subcorpus are much simpler (with an average length of 6. [sent-359, score-1.387]

80 Table 7 also shows that our parsing approach to speculation scope identification outperforms the rule-based method of Özgür and Radev (2009), where 10-fold cross-validation is performed on both the abstracts and the full papers subcorpora. [sent-361, score-1.253]

81 In the following, we first report the results of cue recognition and then the results of scope identification with automatic cues. [sent-365, score-1.106]

82 49 Table 8: Performance of automatic cue recognition with gold parse trees on the abstracts subcorpus using 10-fold cross-validation Table 8 lists the performance of cue recognition on the abstracts subcorpus, taking all words in the sentences as candidates. [sent-378, score-1.636]

83 It shows that as a complement to features derived from word/POS information (CC+SW features), structural features (SF features) derived from the syntactic parse tree significantly improve the performance of cue recognition by about 1. [sent-379, score-0.587]

84 78 in F1-measure for negation and speculation cue recognition, respectively, and are thus included thereafter. [sent-381, score-1.244]

85 In addition, we have also experimented with taking as cue candidates only those words that occur as a cue or inside a cue in the training data. [sent-382, score-1.404]

86 90 Table 9: Performance of automatic cue recognition with automatic parse trees on the three subcorpora Table 9 presents the performance of cue recognition achieved with automatic parse trees on the three subcorpora. [sent-402, score-1.271]

87 It shows that: 1) The performance gap of cue recognition between golden parse trees and automatic parse trees on the abstracts subcorpus is not salient (e. [sent-403, score-1.156]

88 19 for speculation cues), largely because the features defined for cue recognition are local and insensitive to syntactic variations. [sent-411, score-0.973]

89 2) The performance of negation cue recognition is higher than that of speculation cue recognition on all three subcorpora. [sent-412, score-1.801]

90 This is probably due to the fact that the collection of negation cue words or phrases is limited while speculation cue words or phrases are more open. [sent-413, score-1.707]

91 This is illustrated by our statistics that only about 1% and 1% of negation cues in the full papers and the clinical reports subcorpora are absent from the abstracts subcorpus, compared to about 6% and 20% for speculation cues. [sent-414, score-1.298]

92 3) Unexpectedly, the recall of speculation cue recognition on the clinical reports subcorpus is very low (i. [sent-415, score-1.373]

93 This is probably due to the absence of about 20% of speculation cues from the training data of the abstracts subcorpus. [sent-419, score-0.689]

94 Scope Identification with Automatic Cue Recognition Table 10 lists the performance of both negation and speculation scope identification with automatic cues and automatic parse trees. [sent-421, score-1.542]

95 38 in F1-measure for negation scope identification on the abstracts, the full papers and the clinical reports subcorpora, respectively, while it lowers the performance by 6. [sent-425, score-1.147]

96 23 in F1-measure for speculation scope identification on the three subcorpora, respectively, suggesting that cue recognition remains a major challenge in the two scope learning tasks. [sent-428, score-2.001]

97 In particular, we regard the cue as the predicate and map its scope into several constituents which are deemed as arguments of the cue. [sent-448, score-1.043]

98 Evaluation on the BioScope corpus shows the appropriateness of our parsing approach and that structured syntactic information plays a critical role in capturing the domination relationship between a cue and its dominated arguments. [sent-449, score-0.571]

99 Although our approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. [sent-451, score-1.751]

100 For future work, we will explore tree kernel-based methods to further improve the performance of scope learning by better capturing the structural information, and apply our parsing approach to other kinds of scope learning. [sent-452, score-0.992]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scope', 0.464), ('cue', 0.463), ('speculation', 0.447), ('negation', 0.334), ('subcorpus', 0.249), ('abstracts', 0.139), ('bioscope', 0.132), ('clinical', 0.132), ('identification', 0.116), ('morante', 0.11), ('cues', 0.103), ('golden', 0.096), ('argument', 0.088), ('gtb', 0.088), ('shallow', 0.066), ('subcorpora', 0.057), ('scopes', 0.051), ('papers', 0.051), ('recognition', 0.047), ('parse', 0.046), ('daelemans', 0.045), ('gy', 0.044), ('rgy', 0.044), ('arguments', 0.043), ('simplified', 0.043), ('resistance', 0.038), ('abnormalities', 0.037), ('farkas', 0.037), ('pcs', 0.037), ('szarvas', 0.037), ('vincze', 0.037), ('zg', 0.037), ('parsing', 0.036), ('reports', 0.035), ('chunking', 0.034), ('semantic', 0.034), ('constituents', 0.033), ('biomedical', 0.033), ('portable', 0.029), ('speculative', 0.029), ('candidate', 0.029), ('sw', 0.028), ('predicate', 0.027), ('trees', 0.027), ('dominated', 0.023), ('genia', 0.023), ('constituent', 0.022), ('vp', 0.022), ('assertion', 0.022), ('comfortable', 0.022), ('corticosteroid', 0.022), ('laci', 0.022), ('nos', 0.022), ('remarks', 0.022), ('wordm', 0.022), ('sf', 0.022), ('cc', 0.021), ('radev', 0.021), ('regarding', 0.019), ('veronika', 0.019), ('roser', 0.019), ('node', 0.019), ('parent', 0.019), ('formulating', 0.018), ('critical', 0.017), ('hedge', 0.017), ('syntactic', 0.016), ('pruning', 0.016), ('left', 0.016), ('heuristic', 0.016), ('surrounding', 0.016), ('plays', 0.016), ('automatic', 0.016), ('acc', 0.015), ('ancestral', 0.015), ('auto', 0.015), ('collier', 0.015), ('consecutiveness', 0.015), ('goldin', 0.015), ('medlock', 0.015), ('radiology', 0.015), ('specuaiton', 0.015), ('speculaiton', 0.015), ('speculations', 0.015), ('vrelid', 0.015), ('lowers', 0.015), ('chair', 0.015), ('chapman', 0.015), ('instantiations', 0.015), ('inside', 0.015), ('tree', 0.015), ('cheap', 0.014), ('vbp', 0.014), ('explained', 0.013), ('oracle', 0.013), ('walter', 0.013), ('deemed', 0.013), ('kinds', 0.013), ('conll', 0.013), ('recast', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

Author: Qiaoming Zhu ; Junhui Li ; Hongling Wang ; Guodong Zhou

Abstract: This paper approaches the scope learning problem via simplified shallow semantic parsing. This is done by regarding the cue as the predicate and mapping its scope into several constituents as the arguments of the cue. Evaluation on the BioScope corpus shows that the structural information plays a critical role in capturing the relationship between a cue and its dominated arguments. It also shows that our parsing approach significantly outperforms the state-of-the-art chunking ones. Although our parsing approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. 1

2 0.065225951 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

Author: Florian Boudin ; Jian-Yun Nie ; Martin Dawes

Abstract: The PECO framework is a knowledge representation for formulating clinical questions. Queries are decomposed into four aspects, which are Patient-Problem (P), Exposure (E), Comparison (C) and Outcome (O). However, no test collection is available to evaluate such framework in information retrieval. In this work, we first present the construction of a large test collection extracted from systematic literature reviews. We then describe an analysis of the distribution of PECO elements throughout the relevant documents and propose a language modeling approach that uses these distributions as a weighting strategy. In our experiments carried out on a collection of 1.5 million documents and 423 queries, our method was found to lead to an improvement of 28% in MAP and 50% in P@5, as compared to the state-of-the-art method.

3 0.055298448 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling

Author: Tao Zhuang ; Chengqing Zong

Abstract: We show that jointly performing semantic role labeling (SRL) on bitext can improve SRL results on both sides. In our approach, we use monolingual SRL systems to produce argument candidates for predicates in bitext at first. Then, we simultaneously generate SRL results for two sides of bitext using our joint inference model. Our model prefers the bilingual SRL result that is not only reasonable on each side of bitext, but also has more consistent argument structures between two sides. To evaluate the consistency between two argument structures, we also formulate a log-linear model to compute the probability of aligning two arguments. We have experimented with our model on Chinese-English parallel PropBank data. Using our joint inference model, F1 scores of SRL results on Chinese and English text achieve 79.53% and 77.87% respectively, which are 1.52 and 1.74 points higher than the results of baseline monolingual SRL combination systems respectively.

4 0.042846315 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

5 0.034513969 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for top-down parsers of 89.4%, a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001). The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

6 0.034507334 114 emnlp-2010-Unsupervised Parse Selection for HPSG

7 0.032661103 59 emnlp-2010-Identifying Functional Relations in Web Text

8 0.031869706 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

9 0.031034539 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

10 0.03076851 94 emnlp-2010-SCFG Decoding Without Binarization

11 0.030668795 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

12 0.030479586 95 emnlp-2010-SRL-Based Verb Selection for ESL

13 0.030316111 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

14 0.028694773 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

15 0.022863183 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

16 0.021671837 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

17 0.021344785 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

18 0.020983968 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

19 0.02059228 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

20 0.020059261 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.077), (1, 0.035), (2, 0.051), (3, 0.074), (4, 0.025), (5, 0.01), (6, 0.038), (7, -0.078), (8, 0.023), (9, -0.004), (10, -0.011), (11, 0.002), (12, -0.018), (13, -0.006), (14, -0.014), (15, 0.015), (16, 0.053), (17, 0.013), (18, 0.002), (19, 0.051), (20, -0.003), (21, 0.049), (22, 0.074), (23, -0.018), (24, -0.04), (25, 0.053), (26, -0.033), (27, 0.054), (28, -0.061), (29, 0.099), (30, 0.312), (31, -0.021), (32, 0.126), (33, 0.342), (34, -0.102), (35, -0.06), (36, 0.316), (37, -0.222), (38, -0.005), (39, 0.077), (40, 0.163), (41, 0.141), (42, 0.178), (43, 0.111), (44, 0.131), (45, 0.118), (46, -0.094), (47, -0.062), (48, 0.174), (49, -0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97828543 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

Author: Qiaoming Zhu ; Junhui Li ; Hongling Wang ; Guodong Zhou

Abstract: This paper approaches the scope learning problem via simplified shallow semantic parsing. This is done by regarding the cue as the predicate and mapping its scope into several constituents as the arguments of the cue. Evaluation on the BioScope corpus shows that the structural information plays a critical role in capturing the relationship between a cue and its dominated arguments. It also shows that our parsing approach significantly outperforms the state-of-the-art chunking ones. Although our parsing approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. 1

2 0.62783891 90 emnlp-2010-Positional Language Models for Clinical Information Retrieval

Author: Florian Boudin ; Jian-Yun Nie ; Martin Dawes

Abstract: The PECO framework is a knowledge representation for formulating clinical questions. Queries are decomposed into four aspects, which are Patient-Problem (P), Exposure (E), Comparison (C) and Outcome (O). However, no test collection is available to evaluate such framework in information retrieval. In this work, we first present the construction of a large test collection extracted from systematic literature reviews. We then describe an analysis of the distribution of PECO elements throughout the relevant documents and propose a language modeling approach that uses these distributions as a weighting strategy. In our experiments carried out on a collection of 1.5 million documents and 423 queries, our method was found to lead to an improvement of 28% in MAP and 50% in P@5, as compared to the state-of-the-art method.

3 0.23568146 59 emnlp-2010-Identifying Functional Relations in Web Text

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freely-available knowledge resources such as Freebase. It first computes multiple typed-functionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. It then aggregates these scores to predict the global functionality for the phrase. LEIBNIZ outperforms previous work, increasing area under the precision-recall curve from 0.61 to 0.88. We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations.

4 0.18048204 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

5 0.15724927 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling

Author: Tao Zhuang ; Chengqing Zong

Abstract: We show that jointly performing semantic role labeling (SRL) on bitext can improve SRL results on both sides. In our approach, we use monolingual SRL systems to produce argument candidates for predicates in bitext at first. Then, we simultaneously generate SRL results for two sides of bitext using our joint inference model. Our model prefers the bilingual SRL result that is not only reasonable on each side of bitext, but also has more consistent argument structures between two sides. To evaluate the consistency between two argument structures, we also formulate a log-linear model to compute the probability of aligning two arguments. We have experimented with our model on Chinese-English parallel PropBank data. Using our joint inference model, F1 scores of SRL results on Chinese and English text achieve 79.53% and 77.87% respectively, which are 1.52 and 1.74 points higher than the results of baseline monolingual SRL combination systems respectively.

6 0.15101798 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

7 0.14968958 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

8 0.14430667 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

9 0.13543762 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

10 0.13502859 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

11 0.120643 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

12 0.11881543 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

13 0.11558832 114 emnlp-2010-Unsupervised Parse Selection for HPSG

14 0.10595308 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

15 0.1036853 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

16 0.093392648 95 emnlp-2010-SRL-Based Verb Selection for ESL

17 0.09181004 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

18 0.087547801 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

19 0.087134168 123 emnlp-2010-Word-Based Dialect Identification with Georeferenced Rules

20 0.085971028 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.509), (12, 0.035), (29, 0.053), (30, 0.016), (32, 0.012), (49, 0.016), (52, 0.019), (56, 0.044), (66, 0.059), (72, 0.026), (76, 0.047), (82, 0.014), (87, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84941822 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

Author: Qiaoming Zhu ; Junhui Li ; Hongling Wang ; Guodong Zhou

Abstract: This paper approaches the scope learning problem via simplified shallow semantic parsing. This is done by regarding the cue as the predicate and mapping its scope into several constituents as the arguments of the cue. Evaluation on the BioScope corpus shows that the structural information plays a critical role in capturing the relationship between a cue and its dominated arguments. It also shows that our parsing approach significantly outperforms the state-of-the-art chunking ones. Although our parsing approach is only evaluated on negation and speculation scope learning here, it is portable to other kinds of scope learning. 1

2 0.64386135 62 emnlp-2010-Improving Mention Detection Robustness to Noisy Input

Author: Radu Florian ; John Pitrelli ; Salim Roukos ; Imed Zitouni

Abstract: Information-extraction (IE) research typically focuses on clean-text inputs. However, an IE engine serving real applications yields many false alarms due to less-well-formed input. For example, IE in a multilingual broadcast processing system has to deal with inaccurate automatic transcription and translation. The resulting presence of non-target-language text in this case, and non-language material interspersed in data from other applications, raise the research problem of making IE robust to such noisy input text. We address one such IE task: entity-mention detection. We describe augmenting a statistical mention-detection system in order to reduce false alarms from spurious passages. The diverse nature of input noise leads us to pursue a multi-faceted approach to robustness. For our English-language system, at various miss rates we eliminate 97% of false alarms on inputs from other Latin-alphabet languages. In another experiment, representing scenarios in which genre-specific training is infeasible, we process real financial-transactions text containing mixed languages and data-set codes. On these data, because we do not train on data like it, we achieve a smaller but significant improvement. These gains come with virtually no loss in accuracy on clean English text.

3 0.61449039 61 emnlp-2010-Improving Gender Classification of Blog Authors

Author: Arjun Mukherjee ; Bing Liu

Abstract: The problem of automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, word classes, and POS (part-ofspeech) n-grams, for classification learning. In this paper, we propose two new techniques to improve the current result. The first technique introduces a new class of features which are variable length POS sequence patterns mined from the training data using a sequence pattern mining algorithm. The second technique is a new feature selection method which is based on an ensemble of several feature selection criteria and approaches. Empirical evaluation using a real-life blog data set shows that these two techniques improve the classification accuracy of the current state-ofthe-art methods significantly.

4 0.27595407 44 emnlp-2010-Enhancing Mention Detection Using Projection via Aligned Corpora

Author: Yassine Benajiba ; Imed Zitouni

Abstract: The research question treated in this paper is centered on the idea of exploiting rich resources of one language to enhance the performance of a mention detection system of another one. We successfully achieve this goal by projecting information from one language to another via a parallel corpus. We examine the potential improvement using various degrees of linguistic information in a statistical framework and we show that the proposed technique is effective even when the target language model has access to a significantly rich feature set. Experimental results show up to 2.4F improvement in performance when the system has access to information obtained by projecting mentions from a resource-richlanguage mention detection system via a parallel corpus.

5 0.25849524 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

Author: Danish Contractor ; Govind Kothari ; Tanveer Faruquie ; L V Subramaniam ; Sumit Negi

Abstract: Recent times have seen a tremendous growth in mobile based data services that allow people to use Short Message Service (SMS) to access these data services. In a multilingual society it is essential that data services that were developed for a specific language be made accessible through other local languages also. In this paper, we present a service that allows a user to query a FrequentlyAsked-Questions (FAQ) database built in a local language (Hindi) using Noisy SMS English queries. The inherent noise in the SMS queries, along with the language mismatch makes this a challenging problem. We handle these two problems by formulating the query similarity over FAQ questions as a combinatorial search problem where the search space consists of combinations of dictionary variations of the noisy query and its top-N translations. We demonstrate the effectiveness of our approach on a real-life dataset.

6 0.25469744 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

7 0.25352296 20 emnlp-2010-Automatic Detection and Classification of Social Events

8 0.25142366 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

9 0.25061023 45 emnlp-2010-Evaluating Models of Latent Document Semantics in the Presence of OCR Errors

10 0.24965923 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

11 0.24818112 37 emnlp-2010-Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks

12 0.24746749 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

13 0.24599017 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

14 0.24520114 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

15 0.24477299 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

16 0.24096631 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

17 0.23819943 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

18 0.23017463 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

19 0.22893609 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

20 0.22888811 23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition