acl acl2011 acl2011-173 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
Reference: text
sentIndex sentText sentNum sentScore
1 2-4 Hikaridai Seika-cho Soraku-gun Kyoto 619-0237 Japan { shindo . [sent-2, score-0.043]
2 Abstract We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). [sent-8, score-0.812]
3 Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. [sent-9, score-0.428]
4 The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. [sent-10, score-0.067]
5 For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG. [sent-11, score-0.15]
6 1 Introduction Tree substitution grammar (TSG) is a promising formalism for modeling language data. [sent-12, score-0.235]
7 TSG generalizes context free grammars (CFG) by allowing nonterminal nodes to be replaced with subtrees of arbitrary size. [sent-13, score-0.201]
8 A natural extension of TSG involves adding an insertion operator for combining subtrees as in tree adjoining grammars (TAG) (Joshi, 1985) or tree insertion grammars (TIG) (Schabes and Waters, 1995). [sent-14, score-1.321]
9 An insertion operator is helpful for expressing various syntax patterns with fewer grammar rules; thus we expect that adding an insertion operator will improve parsing accuracy and yield a compact grammar. [sent-15, score-1.191]
10 One of the challenges of adding an insertion operator is that the computational cost of grammar induction is high since tree insertion significantly increases the number of possible subtrees. [sent-16, score-1.026]
11 Instead, we incorporate an insertion operator in a Bayesian TSG (BTSG) model (Cohn et al. [sent-19, score-0.506]
12 Our model uses a restricted variant of subtrees for insertion to model the probability distribution simply and train the model efficiently. [sent-21, score-0.534]
13 We also present an inference technique for handling a tree insertion that makes use of dynamic programming. [sent-22, score-0.449]
14 2 Overview of BTSG Model We briefly review the BTSG model described in (Cohn et al. [sent-23, score-0.028]
15 Subtrees for substitution are referred to as initial trees, and leaf nonterminals in initial trees are referred to as frontier nodes. [sent-27, score-0.564]
16 Their task is the unsupervised induction of TSG derivations from parse trees. [sent-28, score-0.053]
17 A derivation is information about how subtrees are combined to form parse trees. [sent-29, score-0.173]
18 dX and θX are hyperparameters that are used to control the model’s behavior. [sent-31, score-0.038]
19 Integrating out all possible values of $G_X$, the resulting distribution is [sent-32, score-0.04]
20 $p(e_i \mid e_{-i}, X, d_X, \theta_X) = \frac{n^{-i}_{e_i,X} - d_X\, t_{e_i,X}}{n^{-i}_{\cdot,X} + \theta_X} + \beta_X\, P_0(e_i \mid X)$, (1) where $\beta_X = \frac{\theta_X + d_X\, t_{\cdot,X}}{n^{-i}_{\cdot,X} + \theta_X}$, $e_{-i} = e_1, \dots, e_{i-1}$ are the previously generated initial trees, and $n^{-i}_{e_i,X}$ is the number of times $e_i$ has been used in $e_{-i}$. [sent-38, score-0.161]
21 $n^{-i}_{\cdot,X} = \sum_e n^{-i}_{e,X}$ and $t_{\cdot,X} = \sum_e t_{e,X}$ are the total counts of initial trees and tables, respectively. [sent-40, score-0.245]
22 The PYP prior produces “rich get richer” statistics: a few initial trees are often used for derivation while many are rarely used, and this is shown empirically to be well-suited for natural language (Teh, 2006b; Johnson and Goldwater, 2009). [sent-41, score-0.347]
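To make eq. (1) concrete, here is a minimal Python sketch (our illustration, not the authors' code) of the Pitman-Yor predictive probability; n_counts, table_counts, and the callable p0 are hypothetical stand-ins for the model's count tables and base distribution.

    def pyp_predictive(e, X, n_counts, table_counts, d_X, theta_X, p0):
        """Pitman-Yor predictive probability of drawing elementary tree e
        at root symbol X, given counts that exclude the current draw."""
        n_eX = n_counts.get((e, X), 0)        # times e has been generated at X
        t_eX = table_counts.get((e, X), 0)    # tables labelled with e at X
        n_dotX = sum(c for (_, x), c in n_counts.items() if x == X)
        t_dotX = sum(c for (_, x), c in table_counts.items() if x == X)
        beta_X = (theta_X + d_X * t_dotX) / (n_dotX + theta_X)
        return (n_eX - d_X * t_eX) / (n_dotX + theta_X) + beta_X * p0(e, X)

The first term is the cached contribution of previously generated trees; the second backs off to the base distribution P0.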
23 The base probability of an initial tree, $P_0(e \mid X)$, is given as follows. [sent-42, score-0.166]
24 $P_0(e \mid X) = \prod_{r \in \mathrm{CFG}(e)} P_{\mathrm{MLE}}(r) \times \prod_{A \in \mathrm{LEAF}(e)} s_A \times \prod_{B \in \mathrm{INTER}(e)} (1 - s_B)$, (2) where $\mathrm{CFG}(e)$ is the set of decomposed CFG productions of $e$, and $P_{\mathrm{MLE}}(r)$ is a maximum likelihood estimate (MLE) of $r$. [sent-43, score-0.061]
25 LEAF (e) and INTER (e) are sets of leaf and internal symbols of e, respectively. [sent-44, score-0.061]
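A minimal sketch of eq. (2), assuming an elementary tree is handed over as its list of CFG productions together with its leaf and internal nonterminals; p_mle and the stopping probabilities s are hypothetical dictionaries.

    def base_prob_initial(cfg_rules, leaf_nts, internal_nts, p_mle, s):
        """P0(e|X): product of MLE rule probabilities, stopping at leaf
        nonterminals (s_A) and continuing at internal ones (1 - s_B)."""
        p = 1.0
        for r in cfg_rules:          # CFG(e)
            p *= p_mle[r]
        for A in leaf_nts:           # LEAF(e)
            p *= s[A]
        for B in internal_nts:       # INTER(e)
            p *= 1.0 - s[B]
        return p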
26 3.1 Tree Insertion Model We propose a model that incorporates an insertion operator in BTSG. [sent-47, score-0.535]
27 Figure 1b shows an example of an insertion operator. [sent-48, score-0.306]
28 To distinguish them from initial trees, subtrees for insertion are referred to as auxiliary trees. [sent-49, score-0.796]
29 An auxiliary tree includes a special nonterminal leaf node labeled with the same symbol as the root node. [sent-50, score-0.673]
30 This leaf node is referred to as a foot node (marked with the subscript “*”). [sent-51, score-0.345]
31 The definitions of substitution and insertion operators are identical with those of TIG and TAG. [sent-52, score-0.387]
32 Since it is computationally expensive to allow any auxiliary trees, we tackle the problem by introducing simple auxiliary trees, i. [sent-53, score-0.606]
33 , auxiliary trees whose root node must generate a foot node as an immediate child. [sent-55, score-0.777]
34 For example, “(N (JJ pretty) N*)” is a simple auxiliary tree, but “(S (NP ) (VP (V think) S*))” is not. Figure 1: Example of (a) substitution and (b) insertion (dotted line). [sent-56, score-0.69]
35 Note that we place no restriction on the initial trees. [sent-58, score-0.085]
36 Our restricted formalism is a strict subset of TIG. [sent-59, score-0.117]
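To make the restriction concrete, the following sketch (an illustration under our own tree encoding, not the paper's implementation) represents trees as nested tuples, tests whether an auxiliary tree is simple in the above sense, and splices a simple auxiliary tree into a node.

    # A tree is (label, child_1, ..., child_n); a leaf is (label,).
    # A foot node carries the label of the auxiliary tree's root plus "*".

    def is_simple_auxiliary(aux):
        """True iff the root immediately generates a foot node with its own label."""
        root = aux[0]
        return any(len(c) == 1 and c[0] == root + "*" for c in aux[1:])

    def insert(aux, target):
        """Insert a simple auxiliary tree at node `target`: the foot node is
        replaced by the original subtree rooted at `target`."""
        assert aux[0] == target[0] and is_simple_auxiliary(aux)
        new_children = tuple(target if len(c) == 1 and c[0] == aux[0] + "*" else c
                             for c in aux[1:])
        return (aux[0],) + new_children

    # Example: inserting "(N (JJ pretty) N*)" at the N node of "(N girl)".
    aux = ("N", ("JJ", ("pretty",)), ("N*",))
    print(insert(aux, ("N", ("girl",))))   # ('N', ('JJ', ('pretty',)), ('N', ('girl',)))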
37 We briefly describe some differences between TAG, TIG, and our insertion model. [sent-60, score-0.306]
38 TAG generates tree adjoining languages, a strict superset of context-free languages, and the computational complexity of parsing is $O(n^6)$. [sent-61, score-0.306]
39 Therefore, TIG generates context-free languages and the parsing complexity is $O(n^3)$. [sent-67, score-0.078]
40 On the other hand, our model permits neither wrapping adjunction (as in TAG) nor simultaneous adjunction (as in TIG), and allows only simple auxiliary trees. [sent-72, score-0.498]
41 The expressive power and computational complexity of our formalism are identical to those of TIG; however, our model allows us to define the probability distribution over auxiliary trees in the same form as the BTSG model. [sent-73, score-0.632]
42 We define a probability distribution over simple auxiliary trees as having the same form as eq. [sent-75, score-0.548]
43 However, we need to modify the base distribution over simple auxiliary trees, $P_0'(e \mid X)$, as follows, so that the probabilities of all simple auxiliary trees sum to one. [sent-82, score-0.682]
44 $P_0'(e \mid X) = P_{\mathrm{MLE}}'(\mathrm{TOP}(e)) \times \prod_{A \in \mathrm{LEAF}(e)} s_A \times \prod_{r \in \mathrm{INTER\_CFG}(e)} P_{\mathrm{MLE}}(r) \times \prod_{B \in \mathrm{INTER}(e)} (1 - s_B)$, (4) where $\mathrm{TOP}(e)$ is the CFG production that starts with the root node of $e$. [sent-83, score-0.144]
45 INTER_CFG (e) is a set of CFG productions of e excluding TOP (e). [sent-85, score-0.061]
46 $P_{\mathrm{MLE}}'(r')$ is a modified MLE for simple auxiliary trees, which is given by $P_{\mathrm{MLE}}'(r') = \frac{C(r')}{C(X \to X^{*}\,Y) + C(X \to Y\,X^{*})}$ if $r'$ includes a foot node (and $0$ otherwise), where $C(r')$ is the frequency of $r'$ in the parse trees. [sent-86, score-0.499]
47 It is ensured that $P_0'(e \mid X)$ generates a foot node as an immediate child. [sent-87, score-0.245]
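A sketch of eq. (4) and the modified MLE, assuming binarized productions encoded as (parent, left, right) triples, a hypothetical frequency table rule_counts, and foot nodes marked by appending "*" to the label.

    def modified_mle_top(r, rule_counts):
        """P'_MLE for the TOP production of a simple auxiliary tree: its count
        is normalised by the total count of X -> X* Y and X -> Y X* rules."""
        X, left, right = r
        if X + "*" not in (left, right):
            return 0.0                                     # no foot node
        total = sum(c for (x, l, rr), c in rule_counts.items()
                    if x == X and X + "*" in (l, rr))
        return rule_counts.get(r, 0) / total if total > 0 else 0.0

    def base_prob_auxiliary(top_rule, inter_cfg_rules, leaf_nts, internal_nts,
                            rule_counts, p_mle, s):
        """P'_0(e|X) of eq. (4): like P_0 but TOP(e) is scored with modified_mle_top."""
        p = modified_mle_top(top_rule, rule_counts)
        for A in leaf_nts:                                  # LEAF(e)
            p *= s[A]
        for r in inter_cfg_rules:                           # INTER_CFG(e)
            p *= p_mle[r]
        for B in internal_nts:                              # INTER(e)
            p *= 1.0 - s[B]
        return p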
48 We define the probability distribution over both initial trees and simple auxiliary trees with a PYP prior. [sent-88, score-0.793]
49 The base distribution over initial trees is defined as $P_0(e \mid X)$, and the base distribution over simple auxiliary trees is defined as $P_0'(e \mid X)$. [sent-89, score-0.7]
50 An initial tree $e_i$ replaces a frontier node with probability $p(e_i \mid e_{-i}, X, d_X, \theta_X)$. [sent-90, score-0.437]
51 On the other hand, a simple auxiliary tree $e_i'$ is inserted at an internal node with probability $a_X \times p'(e_i' \mid e_{-i}', X, d_X', \theta_X')$, where $a_X$ is the insertion probability for nonterminal $X$. [sent-91, score-0.549]
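Putting the two distributions together, a small sketch of how a single node is expanded during generation; a_X and the two samplers (sample_initial and sample_auxiliary, standing for draws from p and p') are assumed to be supplied by the model.

    import random

    def expand_node(X, is_frontier, a_X, sample_initial, sample_auxiliary):
        """Generative choice at one node: a frontier node is always substituted
        with an initial tree; an internal node is inserted into with
        probability a_X, otherwise left as it is."""
        if is_frontier:
            return ("substitute", sample_initial(X))   # draw from p(e | e_-i, X, d_X, theta_X)
        if random.random() < a_X:
            return ("insert", sample_auxiliary(X))     # draw from p'(e' | e'_-i, X, ...)
        return ("no-insertion", None)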
52 3.2 Grammar Decomposition We develop a grammar decomposition technique, which extends the work of Cohn and Blunsom (2010) on the BTSG model, to deal with an insertion operator. [sent-103, score-0.435]
53 The motivation behind grammar decomposition is that it is hard to consider all possible (caption of Figure 2: "Derivation of Fig. ...") [sent-104, score-0.098]
54 derivations explicitly since the base distribution assigns non-zero probability to an infinite number of initial and auxiliary trees. [sent-108, score-0.567]
55 Alternatively, we transform a derivation into CFG productions and assign a probability to each CFG production so that the assignment is consistent with the probability distributions. [sent-109, score-0.289]
56 We can efficiently calculate an inside probability (described in the next subsection) by employing grammar decomposition. [sent-110, score-0.143]
57 Here we provide an example of the derivation shown in Fig. [sent-111, score-0.102]
58 2, all the derivation information is embedded in each symbol. [sent-117, score-0.102]
59 That is, NP(NP (DT the) (N girl)) is a root symbol of the initial tree “(NP (DT the) (N girl))”, which generates two child nodes: DT(DT the) and N(N girl). [sent-118, score-0.298]
60 On the other hand, Nins(N girl) denotes that N(N girl) is inserted by some auxiliary tree, and Nins(N (JJ pretty) N*) denotes that the inserted simple auxiliary tree is “(N (JJ pretty) (N*))”. [sent-120, score-0.811]
61 The inserted auxiliary tree, “(N (JJ pretty) (N*))”, must generate a foot node: “(N girl)” as an immediate child. [sent-121, score-0.471]
62 Second, we decompose the transformed tree into CFG productions and then assign a probability to each CFG production as shown in Table 1, where a_DT, a_N and a_JJ are the insertion probabilities for nonterminals DT, N and JJ, respectively. [sent-122, score-0.646]
63 Note that the probability of a derivation according to Table 1 is the same as the probability of a derivation obtained from the distribution over the initial and auxiliary trees (i. [sent-123, score-0.882]
64 In Table 1, we assume that the auxiliary tree “(N (JJ pretty) (N*))” is sampled from the first term of eq. [sent-128, score-0.454]
65 When it is sampled from the second term, we alternatively assign the probability β′_{(N (JJ pretty) N*), N}. [sent-130, score-0.097]
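As an illustration only (Table 1 itself is not reproduced in this dump, so the exact grouping of factors is an assumption), the probability of the running derivation factorises over the annotated productions roughly as follows; p_init and p_aux are the PYP probabilities of the two elementary trees, and a maps a nonterminal to its insertion probability.

    def derivation_probability(p_init, p_aux, a):
        """Probability of the example derivation of "the pretty girl": the
        initial tree "(NP (DT the) (N girl))" is substituted, the auxiliary
        tree "(N (JJ pretty) N*)" is inserted at N, and no insertion happens
        at DT or JJ."""
        return (p_init
                * (1.0 - a["DT"])      # DT(DT the): no insertion
                * a["N"] * p_aux       # N(N girl): inserted into by "(N (JJ pretty) N*)"
                * (1.0 - a["JJ"]))     # JJ inside the auxiliary tree: no insertion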
66 3.3 Training We use a blocked Metropolis-Hastings (MH) algorithm (Cohn and Blunsom, 2010) to train our model. [sent-132, score-0.034]
67 The MH algorithm learns BTSG model parameters efficiently, and it can be applied to our insertion model. [sent-133, score-0.334]
68 Calculate the inside probability (Lari and Young, 1991) in a bottom-up manner using the grammar decomposition. [sent-136, score-0.143]
69 Accept or reject the derivation sample by using the MH test. [sent-140, score-0.102]
70 The hyperparameters of our model are updated with the auxiliary variable technique (Teh, 2006a). [sent-142, score-0.391]
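A high-level sketch of one training sweep as described above; every argument is a placeholder for the corresponding step, and this is not the authors' implementation.

    import random

    def blocked_mh_sweep(trees, remove_derivation, add_derivation,
                         inside_probabilities, sample_derivation_topdown,
                         acceptance_ratio, resample_hyperparameters):
        """One sweep of the blocked MH sampler: for each parse tree, resample
        its entire derivation from a proposal built with inside probabilities,
        then accept or reject it with the Metropolis-Hastings test."""
        for tree in trees:
            old = remove_derivation(tree)              # take the old derivation's counts out
            chart = inside_probabilities(tree)         # bottom-up, over the decomposed grammar
            proposal = sample_derivation_topdown(tree, chart)
            if random.random() < min(1.0, acceptance_ratio(proposal, old)):
                add_derivation(tree, proposal)         # accept the new derivation
            else:
                add_derivation(tree, old)              # reject: restore the old one
        resample_hyperparameters()                     # auxiliary-variable updates (Teh, 2006a)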
71 We did not use a development set since our model automatically updates the hyperparameters for every iteration. [sent-144, score-0.066]
72 The treebank data was binarized using the CENTER-HEAD method (Matsuzaki et al. [sent-145, score-0.044]
Table 2: Small dataset experiments (# rules, # aux. ...). [sent-162, score-0.058]
Table 3: Full Penn Treebank dataset experiments. ... words using lexical features. [sent-168, score-0.034]
75 We trained our model using a training set, and then sampled 10k derivations for each sentence in a test set. [sent-169, score-0.088]
We show the bracketing F1 score of predicted parse trees evaluated by EVALB, averaged over three independent runs. [sent-172, score-0.16]
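For reference, bracketing F1 compares labelled constituent spans of predicted and gold trees; the following minimal sketch ignores the normalisations that EVALB additionally applies.

    from collections import Counter

    def bracketing_f1(pred_spans, gold_spans):
        """pred_spans and gold_spans are Counters of (label, start, end)
        constituent spans of one or more trees."""
        matched = sum((pred_spans & gold_spans).values())   # multiset intersection
        p = matched / max(sum(pred_spans.values()), 1)
        r = matched / max(sum(gold_spans.values()), 1)
        return 2 * p * r / (p + r) if p + r > 0 else 0.0

    pred = Counter([("NP", 0, 2), ("VP", 2, 5)])
    gold = Counter([("NP", 0, 2), ("VP", 2, 4)])
    print(bracketing_f1(pred, gold))                        # 0.5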
77 In small dataset experiments, we used BNC (1k sentences, 90% for training and 10% for testing) and WSJ (section 2 for training and section 22 for testing). [sent-173, score-0.034]
78 We trained the model with an MH sampler for 1k iterations. [sent-175, score-0.055]
79 Table 2 shows the parsing results for the test set. [sent-176, score-0.039]
80 We compared our model with standard PCFG and BTSG models implemented by us. [sent-177, score-0.028]
81 This suggests that adding an insertion operator is helpful for modeling syntax trees accurately. [sent-179, score-0.638]
82 The BTSG model described in (Cohn and Blunsom, 2010) is similar to ours. [sent-180, score-0.028]
(N¯P (N¯P ) (: –)) (N¯P (N¯P ) (ADVP (RB respectively))) (P¯P (P¯P ) (, ,)) (V¯P (V¯P ) (RB then)) (Q¯P (Q¯P ) (IN of)) (SB¯AR (SB¯AR ) (RB not)) (S¯ (S¯ ) (: ;)) Table 4: Examples of lexicalized auxiliary trees obtained from our model in the full treebank dataset. [sent-188, score-0.535]
84 Nonterminal symbols created by binarization are shown with an over-bar. [sent-189, score-0.03]
We also applied our model to the full WSJ Penn Treebank setting (sections 2-21 for training and section 23 for testing). [sent-190, score-0.028]
For the full treebank dataset, our model obtained nearly identical results to those obtained with the BTSG model, making the grammar size approximately 19% smaller than that of BTSG. [sent-194, score-0.17]
87 We can see that only a small number of auxiliary trees have a great impact on reducing the grammar size. [sent-195, score-0.561]
88 Surprisingly, there are many fewer auxiliary trees than initial trees. [sent-196, score-0.548]
89 We believe this to be due to the tree binarization and our restricted assumption of simple auxiliary trees. [sent-197, score-0.482]
90 Table 4 shows examples of lexicalized auxiliary trees obtained with our model for the full treebank data. [sent-198, score-0.535]
We can see that punctuation (“–”, “,”, and “;”) and adverbs (RB) tend to be inserted into other trees. [sent-199, score-0.071]
Punctuation and adverbs appear in various positions in English sentences. [sent-200, score-0.029]
93 5 Summary We proposed a model that incorporates an insertion operator in BTSG and developed an efficient inference technique. [sent-202, score-0.535]
Since it is computationally expensive to allow any auxiliary trees, we tackled the problem by introducing a restricted variant of auxiliary trees. [sent-203, score-0.331]
Our model outperformed the BTSG model for a small dataset, and achieved comparable parsing results for a large dataset, making the number of grammar rules much smaller than in the BTSG model. [sent-204, score-0.17]
96 We will extend our model to original TAG and evaluate its impact on statistical parsing performance. [sent-205, score-0.067]
97 Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? [sent-243, score-0.074]
98 Applications of stochastic context-free grammars using the inside-outside algorithm. [sent-250, score-0.075]
The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. [sent-273, score-0.04]
100 Tree insertion grammar: a cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced. [sent-287, score-0.641]
wordName wordTfidf (topN-words)
[('btsg', 0.487), ('girl', 0.317), ('insertion', 0.306), ('auxiliary', 0.303), ('pretty', 0.186), ('operator', 0.172), ('trees', 0.16), ('tig', 0.149), ('cfg', 0.148), ('jj', 0.136), ('cohn', 0.135), ('dt', 0.122), ('tree', 0.121), ('tsg', 0.111), ('mh', 0.106), ('derivation', 0.102), ('blunsom', 0.1), ('grammar', 0.098), ('foot', 0.093), ('initial', 0.085), ('dx', 0.084), ('substitution', 0.081), ('node', 0.08), ('ei', 0.076), ('grammars', 0.075), ('adjoining', 0.074), ('pmle', 0.073), ('subtrees', 0.071), ('nins', 0.064), ('leaf', 0.061), ('productions', 0.061), ('bayesian', 0.06), ('pyp', 0.059), ('adjunction', 0.059), ('formalism', 0.056), ('rb', 0.056), ('gx', 0.056), ('nonterminal', 0.055), ('sb', 0.052), ('bnc', 0.051), ('ajj', 0.049), ('grierlt', 0.049), ('iynter', 0.049), ('njj', 0.049), ('wrapping', 0.049), ('probability', 0.045), ('treebank', 0.044), ('ty', 0.044), ('yleaf', 0.043), ('shindo', 0.043), ('inserted', 0.042), ('np', 0.04), ('distribution', 0.04), ('lari', 0.04), ('parsing', 0.039), ('wsj', 0.039), ('generates', 0.039), ('hyperparameters', 0.038), ('pitman', 0.037), ('adt', 0.037), ('production', 0.036), ('base', 0.036), ('tag', 0.035), ('dataset', 0.034), ('blocked', 0.034), ('matsuzaki', 0.034), ('ntt', 0.034), ('immediate', 0.033), ('strict', 0.033), ('nns', 0.032), ('post', 0.031), ('decomposition', 0.031), ('referred', 0.031), ('derivations', 0.03), ('sampled', 0.03), ('mle', 0.03), ('binarization', 0.03), ('frontier', 0.03), ('incorporates', 0.029), ('adverb', 0.029), ('schabes', 0.029), ('stopping', 0.029), ('restricted', 0.028), ('infinite', 0.028), ('root', 0.028), ('model', 0.028), ('sampler', 0.027), ('ax', 0.027), ('symbol', 0.025), ('rules', 0.024), ('induction', 0.023), ('ae', 0.023), ('transformed', 0.022), ('sa', 0.022), ('technique', 0.022), ('pcfg', 0.022), ('alternatively', 0.022), ('lexicalizes', 0.021), ('rim', 0.021), ('waters', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
2 0.25483039 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
3 0.13950071 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars such as the identify of particular fragments, and a count of large and small fragments are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model. — —
4 0.1315755 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
Author: Antske Fokkens
Abstract: When designing grammars of natural language, typically, more than one formal analysis can account for a given phenomenon. Moreover, because analyses interact, the choices made by the engineer influence the possibilities available in further grammar development. The order in which phenomena are treated may therefore have a major impact on the resulting grammar. This paper proposes to tackle this problem by using metagrammar development as a methodology for grammar engineering. Iargue that metagrammar engineering as an approach facilitates the systematic exploration of grammars through comparison of competing analyses. The idea is illustrated through a comparative study of auxiliary structures in HPSG-based grammars for German and Dutch. Auxiliaries form a central phenomenon of German and Dutch and are likely to influence many components of the grammar. This study shows that a special auxiliary+verb construction significantly improves efficiency compared to the standard argument-composition analysis for both parsing and generation.
5 0.12924331 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
Author: Bing Zhao ; Young-Suk Lee ; Xiaoqiang Luo ; Liu Li
Abstract: We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST08 evaluations by 1.3 absolute BLEU, which is statistically significant.
6 0.10047929 61 acl-2011-Binarized Forest to String Translation
7 0.10028213 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
8 0.085813411 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
9 0.08486867 330 acl-2011-Using Derivation Trees for Treebank Error Detection
10 0.082645528 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
11 0.080469377 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
12 0.079676658 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
13 0.070737302 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
14 0.069572181 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
15 0.069194332 28 acl-2011-A Statistical Tree Annotator and Its Applications
16 0.069152474 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
17 0.069019988 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
18 0.068822443 296 acl-2011-Terminal-Aware Synchronous Binarization
19 0.067314744 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
20 0.065763518 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
topicId topicWeight
[(0, 0.144), (1, -0.111), (2, 0.021), (3, -0.13), (4, -0.012), (5, -0.021), (6, -0.154), (7, -0.024), (8, -0.073), (9, -0.042), (10, -0.061), (11, 0.011), (12, 0.011), (13, 0.126), (14, 0.06), (15, 0.002), (16, 0.014), (17, 0.044), (18, 0.038), (19, -0.014), (20, 0.0), (21, 0.023), (22, 0.013), (23, 0.043), (24, 0.054), (25, 0.032), (26, -0.02), (27, -0.019), (28, 0.047), (29, -0.057), (30, -0.026), (31, -0.006), (32, -0.052), (33, 0.012), (34, 0.01), (35, -0.036), (36, -0.036), (37, -0.134), (38, -0.088), (39, -0.012), (40, -0.065), (41, -0.036), (42, -0.041), (43, 0.015), (44, -0.061), (45, 0.037), (46, 0.032), (47, 0.058), (48, 0.162), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95428836 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
2 0.7965157 330 acl-2011-Using Derivation Trees for Treebank Error Detection
Author: Seth Kulick ; Ann Bies ; Justin Mott
Abstract: This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.
3 0.77515239 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
4 0.68725127 154 acl-2011-How to train your multi bottom-up tree transducer
Author: Andreas Maletti
Abstract: The local multi bottom-up tree transducer is introduced and related to the (non-contiguous) synchronous tree sequence substitution grammar. It is then shown how to obtain a weighted local multi bottom-up tree transducer from a bilingual and biparsed corpus. Finally, the problem of non-preservation of regularity is addressed. Three properties that ensure preservation are introduced, and it is discussed how to adjust the rule extraction process such that they are automatically fulfilled.
5 0.68028319 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
Author: Ashish Vaswani ; Haitao Mi ; Liang Huang ; David Chiang
Abstract: Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use minimal rules (those that cannot be formed out of other rules), and instead rely on a rule Markov model of the derivation history to capture dependencies between minimal rules. Large-scale experiments on a state-of-the-art tree-to-string translation system show that our approach leads to a slimmer model, a faster decoder, yet the same translation quality (measured using B ) as composed rules.
6 0.67892385 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
7 0.63001329 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
8 0.60655999 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
9 0.59183967 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
10 0.55802447 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
11 0.55040634 28 acl-2011-A Statistical Tree Annotator and Its Applications
12 0.54830092 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
13 0.53376931 239 acl-2011-P11-5002 k2opt.pdf
15 0.50182253 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
16 0.49686188 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
17 0.45217472 61 acl-2011-Binarized Forest to String Translation
18 0.44772503 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
19 0.42086148 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
20 0.4191258 217 acl-2011-Machine Translation System Combination by Confusion Forest
topicId topicWeight
[(5, 0.02), (17, 0.076), (26, 0.023), (28, 0.011), (37, 0.1), (39, 0.056), (41, 0.08), (55, 0.022), (59, 0.027), (72, 0.022), (89, 0.289), (91, 0.053), (96, 0.134)]
simIndex simValue paperId paperTitle
1 0.84514719 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Author: Emilia Apostolova ; Noriko Tomuro ; Dina Demner-Fushman
Abstract: Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and their scopes (the BioScope corpus). The system performs on par with state-of-the-art machine learning systems. Additionally, the intuitive and linguistically motivated rules will allow for manual adaptation of the rule set to new domains and corpora. 1 Motivation Information Extraction (IE) systems often face the problem of distinguishing between affirmed, negated, and speculative information in text. For example, sentiment analysis systems need to detect negation for accurate polarity classification. Similarly, medical IE systems need to differentiate between affirmed, negated, and speculated (possible) medical conditions. The importance of the task of negation and speculation (a.k.a. hedge) detection is attested by a number of research initiatives. The creation of the BioScope corpus (Vincze et al., 2008) assisted in the development and evaluation of several negation/hedge scope detection systems. The corpus consists of medical and biological texts annotated for negation, speculation, and their linguistic scope. The 2010 283 Noriko Tomuro Dina Demner-Fushman DePaul University Chicago, IL USA t omuro @ c s . depaul . edu National Library of Medicine Bethesda, MD USA ddemne r@mai l nih . gov . i2b2 NLP Shared Task1 included a track for detection of the assertion status of medical problems (e.g. affirmed, negated, hypothesized, etc.). The CoNLL2010 Shared Task (Farkas et al., 2010) focused on detecting hedges and their scopes in Wikipedia articles and biomedical texts. In this paper, we present a linguistically motivated rule-based system for the detection of negation and speculation scopes that performs on par with state-of-the-art machine learning systems. The rules used by the ScopeFinder system are automatically extracted from the BioScope corpus and encode lexico-syntactic patterns in a user-friendly format. While the system was developed and tested using a biomedical corpus, the rule extraction mechanism is not domain-specific. In addition, the linguistically motivated rule encoding allows for manual adaptation to new domains and corpora. 2 Task Definition Negation/Speculation detection is typically broken down into two sub-tasks - discovering a negation/speculation cue and establishing its scope. The following example from the BioScope corpus shows the annotated hedging cue (in bold) together with its associated scope (surrounded by curly brackets): Finally, we explored the {possible role of 5hydroxyeicosatetraenoic acid as a regulator of arachidonic acid liberation}. Typically, systems first identify negation/speculation cues and subsequently try to identify their associated cue scope. However, the two tasks are interrelated and both require 1https://www.i2b2.org/NLP/Relations/ Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 283–287, syntactic understanding. Consider the following two sentences from the BioScope corpus: 1) By contrast, {D-mib appears to be uniformly expre1ss)e Bdy yin c oimnatrgaisnta,l { dDis-mcsi }b. 
2) Differentiation assays using water soluble phorbol esters reveal that differentiation becomes irreversible soon after AP-1 appears. Both sentences contain the word form appears, however in the first sentence the word marks a hedg- ing cue, while in the second sentence the word does not suggest speculation. Unlike previous work, we do not attempt to identify negation/speculation cues independently of their scopes. Instead, we concentrate on scope detection, simultaneously detecting corresponding cues. 3 Dataset We used the BioScope corpus (Vincze et al., 2008) to develop our system and evaluate its performance. To our knowledge, the BioScope corpus is the only publicly available dataset annotated with negation/speculation cues and their scopes. It consists of biomedical papers, abstracts, and clinical reports (corpus statistics are shown in Tables 1 and 2). Corpus Type Sentences Documents Mean Document Size Clinical752019543.85 Full Papers Paper Abstracts 3352 14565 9 1273 372.44 11.44 Table 1: Statistics of the BioScope corpus. Document sizes represent number of sentences. Corpus Type Negation Cues Speculation Cues Negation Speculation Clinical87211376.6%13.4% Full Papers Paper Abstracts 378 1757 682 2694 13.76% 13.45% 22.29% 17.69% Table 2: Statistics of the BioScope corpus. The 2nd and 3d columns show the total number of cues within the datasets; the 4th and 5th columns show the percentage of negated and speculative sentences. 70% ofthe corpus documents (randomly selected) were used to develop the ScopeFinder system (i.e. extract lexico-syntactic rules) and the remaining 30% were used to evaluate system performance. While the corpus focuses on the biomedical domain, our rule extraction method is not domain specific and in future work we are planning to apply our method on different types of corpora. 4 Method Intuitively, rules for detecting both speculation and negation scopes could be concisely expressed as a 284 Figure 1: Parse tree of the sentence ‘T cells {lack active NFkappa B } bPuatr express Sp1 as expected’ generated by cthtiev eS NtanF-fkoaprdp parser. Speculation scope ewxporedcste are gsehnoewrant eind ellipsis. tTanhecue word is shown in grey. The nearest common ancestor of all cue and scope leaf nodes is shown in a box. combination of lexical and syntactic patterns. example, BioScope O¨zg u¨r For and Radev (2009) examined sample sentences and developed hedging scope rules such as: The scope of a modal verb cue (e.g. may, might, could) is the verb phrase to which it is attached; The scope of a verb cue (e.g. appears, seems) followed by an infinitival clause extends to the whole sentence. Similar lexico-syntactic rules have been also manually compiled and used in a number of hedge scope detection systems, e.g. (Kilicoglu and Bergler, 2008), (Rei and Briscoe, 2010), (Velldal et al., 2010), (Kilicoglu and Bergler, 2010), (Zhou et al., 2010). However, manually creating a comprehensive set of such lexico-syntactic scope rules is a laborious and time-consuming process. In addition, such an approach relies heavily on the availability of accurately parsed sentences, which could be problematic for domains such as biomedical texts (Clegg and Shepherd, 2007; McClosky and Charniak, 2008). Instead, we attempted to automatically extract lexico-syntactic scope rules from the BioScope corpus, relying only on consistent (but not necessarily accurate) parse tree representations. 
We first parsed each sentence in the training dataset which contained a negation or speculation cue using the Stanford parser (Klein and Manning, 2003; De Marneffe et al., 2006). Figure 1 shows the parse tree of a sample sentence containing a negation cue and its scope. Next, for each cue-scope instance within the sen- tence, we identified the nearest common ancestor Figure 2: Lexico-syntactic pattern extracted from the sentence from Figure 1. The rule is equivalent to the following string representation: (VP (VBP lack) (NP (JJ *scope*) (NN *scope*) (NN *scope*))). which encompassed the cue word(s) and all words in the scope (shown in a box on Figure 1). The subtree rooted by this ancestor is the basis for the resulting lexico-syntactic rule. The leaf nodes of the resulting subtree were converted to a generalized representation: scope words were converted to *scope*; noncue and non-scope words were converted to *; cue words were converted to lower case. Figure 2 shows the resulting rule. This rule generation approach resulted in a large number of very specific rule patterns - 1,681 nega- tion scope rules and 3,043 speculation scope rules were extracted from the training dataset. To identify a more general set of rules (and increase recall) we next performed a simple transformation of the derived rule set. If all children of a rule tree node are of type *scope* or * (i.e. noncue words), the node label is replaced by *scope* or * respectively, and the node’s children are pruned from the rule tree; neighboring identical siblings of type *scope* or * are replaced by a single node of the corresponding type. Figure 3 shows an example of this transformation. (a)ThechildrenofnodesJ /N /N are(b)Thechildren pruned and their labels are replaced by of node NP are *scope*. pruned and its label is replaced by *scope*. Figure 3: Transformation of the tree shown in Figure 2. The final rule is equivalent to the following string representation: (VP (VBP lack) *scope* ) 285 The rule tree pruning described above reduced the negation scope rule patterns to 439 and the speculation rule patterns to 1,000. In addition to generating a set of scope finding rules, we also implemented a module that parses string representations of the lexico-syntactic rules and performs subtree matching. The ScopeFinder module2 identifies negation and speculation scopes in sentence parse trees using string-encoded lexicosyntactic patterns. Candidate sentence parse subtrees are first identified by matching the path of cue leafnodes to the root ofthe rule subtree pattern. Ifan identical path exists in the sentence, the root of the candidate subtree is thus also identified. The candidate subtree is evaluated for a match by recursively comparing all node children (starting from the root of the subtree) to the rule pattern subtree. Nodes of type *scope* and * match any number of nodes, similar to the semantics of Regex Kleene star (*). 5 Results As an informed baseline, we used a previously de- veloped rule-based system for negation and speculation scope discovery (Apostolova and Tomuro, 2010). The system, inspired by the NegEx algorithm (Chapman et al., 2001), uses a list of phrases split into subsets (preceding vs. following their scope) to identify cues using string matching. The cue scopes extend from the cue to the beginning or end of the sentence, depending on the cue type. Table 3 shows the baseline results. PSFCNalpueingpleciarPutcAlai opbtneisor tacsP6597C348o.r12075e4ctly6859RP203475r. 81e26d037icteF569784C52. 
04u913e84s5F2A81905l.2786P14redictCus Table 3: Baseline system performance. P (Precision), R (Recall), and F (F1-score) are computed based on the sentence tokens of correctly predicted cues. The last column shows the F1-score for sentence tokens of all predicted cues (including erroneous ones). We used only the scopes of predicted cues (correctly predicted cues vs. all predicted cues) to mea- 2The rule sets and source code are publicly available at http://scopefinder.sourceforge.net/. sure the baseline system performance. The baseline system heuristics did not contain all phrase cues present in the dataset. The scopes of cues that are missing from the baseline system were not included in the results. As the baseline system was not penalized for missing cue phrases, the results represent the upper bound of the system. Table 4 shows the results from applying the full extracted rule set (1,681 negation scope rules and 3,043 speculation scope rules) on the test data. As expected, this rule set consisting of very specific scope matching rules resulted in very high precision and very low recall. Negation P R F A Clinical99.4734.3051.0117.58 Full Papers Paper Abstracts 95.23 87.33 25.89 05.78 40.72 10.84 28.00 07.85 Speculation Clinical96.5020.1233.3022.90 Full Papers Paper Abstracts 88.72 77.50 15.89 11.89 26.95 20.62 10.13 10.00 Table 4: Results from applying the full extracted rule set on the test data. Precision (P), Recall (R), and F1-score (F) are com- puted based the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). Table 5 shows the results from applying the rule set consisting of pruned pattern trees (439 negation scope rules and 1,000 speculation scope rules) on the test data. As shown, overall results improved significantly, both over the baseline and over the unpruned set of rules. Comparable results are shown in bold in Tables 3, 4, and 5. Negation P R F A Clinical85.5992.1588.7585.56 Full Papers 49.17 94.82 64.76 71.26 Paper Abstracts 61.48 92.64 73.91 80.63 Speculation Clinical67.2586.2475.5771.35 Full Papers 65.96 98.43 78.99 52.63 Paper Abstracts 60.24 95.48 73.87 65.28 Table 5: Results from applying the pruned rule set on the test data. Precision (P), Recall (R), and F1-score (F) are computed based on the number of correctly identified scope tokens in each sentence. Accuracy (A) is computed for correctly identified full scopes (exact match). 6 Related Work Interest in the task of identifying negation and spec- ulation scopes has developed in recent years. Rele286 vant research was facilitated by the appearance of a publicly available annotated corpus. All systems described below were developed and evaluated against the BioScope corpus (Vincze et al., 2008). O¨zg u¨r and Radev (2009) have developed a supervised classifier for identifying speculation cues and a manually compiled list of lexico-syntactic rules for identifying their scopes. For the performance of the rule based system on identifying speculation scopes, they report 61. 13 and 79.89 accuracy for BioScope full papers and abstracts respectively. Similarly, Morante and Daelemans (2009b) developed a machine learning system for identifying hedging cues and their scopes. They modeled the scope finding problem as a classification task that determines if a sentence token is the first token in a scope sequence, the last one, or neither. Results of the scope finding system with predicted hedge signals were reported as F1-scores of 38. 
16, 59.66, 78.54 and for clinical texts, full papers, and abstracts respectively3. Accuracy (computed for correctly identified scopes) was reported as 26.21, 35.92, and 65.55 for clinical texts, papers, and abstracts respectively. Morante and Daelemans have also developed a metalearner for identifying the scope of negation (2009a). Results of the negation scope finding system with predicted cues are reported as F1-scores (computed on scope tokens) of 84.20, 70.94, and 82.60 for clinical texts, papers, and abstracts respectively. Accuracy (the percent of correctly identified exact scopes) is reported as 70.75, 41.00, and 66.07 for clinical texts, papers, and abstracts respectively. The top three best performers on the CoNLL2010 shared task on hedge scope detection (Farkas et al., 2010) report an F1-score for correctly identified hedge cues and their scopes ranging from 55.3 to 57.3. The shared task evaluation metrics used stricter matching criteria based on exact match of both cues and their corresponding scopes4. CoNLL-2010 shared task participants applied a variety of rule-based and machine learning methods 3F1-scores are computed based on scope tokens. Unlike our evaluation metric, scope token matches are computed for each cue within a sentence, i.e. a token is evaluated multiple times if it belongs to more than one cue scope. 4Our system does not focus on individual cue-scope pair de- tection (we instead optimized scope detection) and as a result performance metrics are not directly comparable. on the task - Morante et al. (2010) used a memorybased classifier based on the k-nearest neighbor rule to determine if a token is the first token in a scope sequence, the last, or neither; Rei and Briscoe (2010) used a combination of manually compiled rules, a CRF classifier, and a sequence of post-processing steps on the same task; Velldal et al (2010) manually compiled a set of heuristics based on syntactic information taken from dependency structures. 7 Discussion We presented a method for automatic extraction of lexico-syntactic rules for negation/speculation scopes from an annotated corpus. The developed ScopeFinder system, based on the automatically extracted rule sets, was compared to a baseline rule-based system that does not use syntactic information. The ScopeFinder system outperformed the baseline system in all cases and exhibited results comparable to complex feature-based, machine-learning systems. In future work, we will explore the use of statistically based methods for the creation of an optimum set of lexico-syntactic tree patterns and will evaluate the system performance on texts from different domains. References E. Apostolova and N. Tomuro. 2010. Exploring surfacelevel heuristics for negation and speculation discovery in clinical texts. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, pages 81–82. Association for Computational Linguistics. W.W. Chapman, W. Bridewell, P. Hanbury, G.F. Cooper, and B.G. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of biomedical informatics, 34(5):301–310. A.B. Clegg and A.J. Shepherd. 2007. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC bioinformatics, 8(1):24. M.C. De Marneffe, B. MacCartney, and C.D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In LREC 2006. Citeseer. R. Farkas, V. Vincze, G. M o´ra, J. Csirik, and G. Szarvas. 2010. 
The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text. In Proceedings of the Fourteenth Conference on 287 Computational Natural Language Learning (CoNLL2010): Shared Task, pages 1–12. H. Kilicoglu and S. Bergler. 2008. Recognizing speculative language in biomedical research articles: a linguistically motivated perspective. BMC bioinformatics, 9(Suppl 11):S10. H. Kilicoglu and S. Bergler. 2010. A High-Precision Approach to Detecting Hedges and Their Scopes. CoNLL-2010: Shared Task, page 70. D. Klein and C.D. Manning. 2003. Fast exact inference with a factored model for natural language parsing. Advances in neural information processing systems, pages 3–10. D. McClosky and E. Charniak. 2008. Self-training for biomedical parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 101–104. Association for Computational Linguistics. R. Morante and W. Daelemans. 2009a. A metalearning approach to processing the scope of negation. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pages 21–29. Association for Computational Linguistics. R. Morante and W. Daelemans. 2009b. Learning the scope of hedge cues in biomedical texts. In Proceed- ings of the Workshop on BioNLP, pages 28–36. Association for Computational Linguistics. R. Morante, V. Van Asch, and W. Daelemans. 2010. Memory-based resolution of in-sentence scopes of hedge cues. CoNLL-2010: Shared Task, page 40. A. O¨zg u¨r and D.R. Radev. 2009. Detecting speculations and their scopes in scientific text. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3, pages 1398–1407. Association for Computational Linguistics. M. Rei and T. Briscoe. 2010. Combining manual rules and supervised learning for hedge cue and scope detection. In Proceedings of the 14th Conference on Natural Language Learning, pages 56–63. E. Velldal, L. Øvrelid, and S. Oepen. 2010. Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules. CoNLL-2010: Shared Task, page 48. V. Vincze, G. Szarvas, R. Farkas, G. M o´ra, and J. Csirik. 2008. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC bioinformatics, 9(Suppl 11):S9. H. Zhou, X. Li, D. Huang, Z. Li, and Y. Yang. 2010. Exploiting Multi-Features to Detect Hedges and Their Scope in Biomedical Texts. CoNLL-2010: Shared Task, page 106.
same-paper 2 0.76191449 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
3 0.70224231 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
4 0.65846503 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
Author: Ryan Gabbard ; Marjorie Freedman ; Ralph Weischedel
Abstract: As an alternative to requiring substantial supervised relation training data, many have explored bootstrapping relation extraction from a few seed examples. Most techniques assume that the examples are based on easily spotted anchors, e.g., names or dates. Sentences in a corpus which contain the anchors are then used to induce alternative ways of expressing the relation. We explore whether coreference can improve the learning process. That is, if the algorithm considered examples such as his sister, would accuracy be improved? With coreference, we see on average a 2-fold increase in F-Score. Despite using potentially errorful machine coreference, we see significant increase in recall on all relations. Precision increases in four cases and decreases in six.
5 0.61887717 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search
Author: Andrei Popescu-Belis ; Majid Yazdani ; Alexandre Nanchen ; Philip N. Garner
Abstract: The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition. Results are displayed in real time to meeting participants, or to users watching a recorded lecture or conversation.
6 0.58999985 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
7 0.58481139 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
8 0.5806396 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
9 0.57942367 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
10 0.57918149 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
11 0.57861662 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
12 0.57743037 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
13 0.57727182 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
14 0.57610273 273 acl-2011-Semantic Representation of Negation Using Focus Detection
15 0.57582158 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
16 0.57556516 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
17 0.57556403 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
18 0.57541859 28 acl-2011-A Statistical Tree Annotator and Its Applications
19 0.57447922 311 acl-2011-Translationese and Its Dialects
20 0.57287824 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations