acl acl2010 acl2010-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Learning a tree substitution grammar is very challenging due to derivational ambiguity. [sent-5, score-0.44]
2 Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al. [sent-6, score-0.103]
3 In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. [sent-8, score-0.944]
4 The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. [sent-9, score-1.459]
5 A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. [sent-10, score-0.73]
6 This enables efficient blocked inference for training and also improves the parsing algorithm. [sent-11, score-0.455]
7 1 Introduction Tree Substitution Grammar (TSG) is a compelling grammar formalism which allows nonterminal rewrites in the form of trees, thereby enabling the modelling of complex linguistic phenomena such as argument frames, lexical agreement and idiomatic phrases. [sent-13, score-0.223]
8 This is because treebanks are typically not annotated with their TSG derivations (how to decompose a tree into elementary tree fragments); instead the derivation needs to be inferred. [sent-15, score-0.668]
9 This used a Gibbs sampler for training, which repeatedly samples for every node in every training tree a binary value indicating whether the node is or is not a substitution point in the tree’s derivation. [sent-22, score-0.959]
10 Aggregated over the whole corpus, these values and the underlying trees specify the weighted grammar. [sent-23, score-0.061]
11 The sampler can get easily stuck because many locally improbable decisions are required to escape from a locally optimal solution. [sent-28, score-0.586]
12 This problem manifests itself both locally to a sentence and globally over the training sample. [sent-29, score-0.066]
13 The net result is a sampler that is non-convergent, overly dependent on its initialisation and cannot be said to be sampling from the posterior. [sent-30, score-0.612]
14 In this paper we present a blocked Metropolis-Hastings sampler for learning a TSG, similar to Johnson et al. [sent-31, score-0.844]
15 The sampler jointly updates all the substitution variables in a tree, making much larger moves than the local single-variable sampler. [sent-33, score-0.758]
16 A critical issue when developing a Metropolis-Hastings sampler is choosing a suitable proposal distribution, which must have the same support as the true distribution. [sent-34, score-0.588]
17 For our model the natural proposal distribution is a MAP point estimate; however, this cannot be represented directly as it is infinitely large. [sent-35, score-0.167]
18 To solve this problem we develop a grammar transformation which can succinctly represent an infinite TSG in an equivalent finite Context Free Grammar (CFG). [sent-36, score-0.29]
19 The transformed grammar can be used as a proposal distribution, from which samples can be drawn in polynomial time. [sent-37, score-0.339]
20 Empirically, the blocked sampler converges in fewer iterations and in less time than the local Gibbs sampler. [sent-38, score-0.917]
21 In addition, we also show how the transformed grammar can be used for parsing, which yields theoretical and empirical improvements over our previous method which truncated the grammar. [sent-39, score-0.294]
22 (2003)) is a 4-tuple, G = (T, N, S, R), where T is a set of terminal symbols, N is a set of nonterminal symbols, S ∈ N is the distinguished root nonterminal symbol and R is a set of productions (rules). [sent-43, score-0.176]
23 The productions take the form of tree fragments, called elementary trees (ETs), in which each internal node is labelled with a nonterminal and each leaf is labelled with either a terminal or a nonterminal. [sent-44, score-0.689]
24 The frontier nonterminal nodes in each ET form the sites into which other ETs can be substituted. [sent-45, score-0.119]
25 A derivation creates a tree by recursive substitution starting with the root symbol and finishing when there are no remaining frontier nonterminals. [sent-46, score-0.49]
26 Figure 1 (left) shows an example derivation where the arrows denote substitution. [sent-47, score-0.118]
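To make the elementary-tree and derivation machinery above concrete, here is a minimal Python sketch of a tree-fragment representation and of derivation by recursive substitution; the Node class, the choose_et helper and the nonterminal set are illustrative assumptions rather than the authors' implementation.

```python
# A minimal sketch: an elementary tree (ET) is a tree fragment whose frontier
# nonterminal leaves are substitution sites; a derivation recursively
# substitutes ETs until no frontier nonterminals remain.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def is_substitution_site(self, nonterminals):
        # frontier nonterminal: a leaf labelled with a nonterminal symbol
        return not self.children and self.label in nonterminals

def derive(symbol, choose_et, nonterminals):
    """Build a full tree top-down by repeatedly substituting elementary trees.

    `choose_et` is an assumed helper mapping a nonterminal to an ET rooted at
    that nonterminal (e.g. by sampling from a PTSG).
    """
    et = choose_et(symbol)

    def expand(node):
        if node.is_substitution_site(nonterminals):
            return derive(node.label, choose_et, nonterminals)
        return Node(node.label, [expand(child) for child in node.children])

    return Node(et.label, [expand(child) for child in et.children])
```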
27 A Probabilistic Tree Substitution Grammar (PTSG) assigns a probability to each rule in the grammar, where each production is assumed to be conditionally independent given its root nonterminal. [sent-48, score-0.125]
28 The inference problem within this model is to identify the posterior distribution of the elementary trees e given whole trees t. [sent-52, score-0.435]
29 Rather than representing the distribution Gc explicitly, we integrate over all possible values of Gc. [sent-55, score-0.055]
30 The key result required for inference is that the conditional distribution of ei, given e1 . [sent-56, score-0.105]
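The conditional distribution referred to here (equation (1)) is not reproduced in this extract. For orientation, the n_{e,c}, αc and P0 notation used later in the text is consistent with the standard Dirichlet-process predictive form sketched below; this is a hedged reconstruction, and the paper's actual equation may differ (for example by including a Pitman-Yor discount).

```latex
% Hedged reconstruction, consistent with the n_{e,c}, \alpha_c, P_0 notation:
% the probability of rewriting nonterminal c with elementary tree e, given the
% other elementary trees in the corpus, mixes a cached count with the base P_0.
P(e_i = e \mid \mathbf{e}_{<i}, c) \;=\;
  \frac{n^{-i}_{e,c} + \alpha_c\, P_0(e \mid c)}
       {n^{-i}_{\cdot,c} + \alpha_c}
```

Here n^{-i}_{e,c} counts how often e rewrites c in all other derivations and n^{-i}_{·,c} is the corresponding total; the two numerator terms are the cache and base-distribution terms referred to below as the left and right terms of (1).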
31 (2009) we presented base distributions that favour small elementary trees which we expect will generalise well to unseen data. [sent-66, score-0.313]
32 In this work we show that if P0 is chosen such that it decomposes with the CFG rules contained within each elementary tree,1 then we can use a novel dynamic programming algorithm to sample derivations without ever enumerating all the elementary trees in the grammar. [sent-67, score-0.567]
33 The model was trained using a local Gibbs sampler (Geman and Geman, 1984), a Markov chain Monte Carlo (MCMC) method in which random variables are repeatedly sampled conditioned on the values of all other random variables in the model. [sent-68, score-0.643]
34 To formulate the local sampler, we associate a binary variable with each non-root internal node of each tree in the training set, indicating whether that node is a substitution point or not (illustrated in Figure 1). [sent-69, score-0.475]
35 The sampler then visits each node in a random schedule and resamples that node’s substitution variable, where the probabilities of the two different configurations are given by (1). [sent-70, score-0.737]
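A minimal sketch of one such sweep follows; the sites list, the subst dictionary and the config_weight helper (which must return the weight of each configuration under (1), with the node's own elementary trees excluded from the counts) are assumed interfaces, not the authors' code.

```python
import random

def gibbs_sweep(sites, subst, config_weight):
    """One sweep of the local Gibbs sampler (illustrative sketch).

    sites         : list of (tree, node) pairs, one per non-root internal node
    subst         : dict mapping (tree, node) -> bool substitution variable
    config_weight : assumed helper returning the unnormalised weight of setting
                    a site's variable to True/False, conditioned on all other
                    variables (the relevant terms of equation (1))
    """
    random.shuffle(sites)                       # visit nodes in a random schedule
    for site in sites:
        w_on = config_weight(site, True)        # node is a substitution point
        w_off = config_weight(site, False)      # node is not a substitution point
        subst[site] = random.random() < w_on / (w_on + w_off)
```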
36 Parsing was performed using a Metropolis-Hastings sampler to draw derivation samples for a string, from which the best tree was recovered. [sent-71, score-0.815]
37 However the sampler used for parsing was biased (footnote 1: Both choices of base distribution in Cohn et al.) [sent-72, score-0.649]
38 because it used as its proposal distribution a truncated grammar which excluded all but a handful of the unseen elementary trees. [sent-75, score-0.556]
39 Consequently the proposal had smaller support than the true model, voiding the MCMC convergence proofs. [sent-76, score-0.125]
40 3 Grammar Transformation We now present a blocked sampler using the Metropolis-Hastings (MH) algorithm to perform sentence-level inference, based on the work of Johnson et al. [sent-77, score-0.844]
41 (2007), who presented an MH sampler for a Bayesian PCFG. [sent-78, score-0.508]
42 (2007)’s, we have an added complication: the MAP grammar cannot be estimated directly. [sent-81, score-0.157]
43 This is a consequence of the base distribution having infinite support (assigning non-zero probability to infinitely many unseen tree fragments), which means the MAP has an infinite rule set. [sent-82, score-0.441]
44 For example, if our base distribution licences the CFG production NP → NP PP then our TSG grammar will contain the infinite set of elementary trees NP → NP PP, NP → (NP NP PP) PP, NP → (NP (NP NP PP) PP) PP, . [sent-83, score-0.366]
45 However, we can represent the infinite MAP using a grammar transformation inspired by Goodman (2003), which represents the MAP TSG in an equivalent finite PCFG. [sent-87, score-0.29]
46 Under the transformed PCFG, inference is efficient, allowing its use as the proposal distribution in a blocked MH sampler. [sent-88, score-0.574]
47 We represent the MAP using the grammar transformation in Table 1, which separates the ne,c and P0 terms in (1) into two separate CFGs, A and B. [sent-89, score-0.225]
48 Grammar A has productions for every ET with ne,c ≥ 1, which are assigned unsmoothed probabilities: omitting the P0 term from (1). [sent-90, score-0.126]
49 The sign(e) function creates a unique string signature for an ET e (where the signature of a frontier node is itself) and sc is the Bernoulli probability of c being a substitution variable (and stopping the P0 recursion). [sent-99, score-0.365]
50 The remaining rules in grammar B correspond to every CFG production in the underlying PCFG base distribution, coupled with the binary decision whether or not nonterminal children should be substitution sites (frontier nonterminals). [sent-102, score-0.459]
51 In this way there are often two equivalent paths to reach the same chart cell using the same elementary tree: one via grammar A using observed TSG productions, and one via grammar B using the P0 backoff; summing these yields the desired net probability. [sent-104, score-0.799]
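The sketch below illustrates how the grammar B rules just described could be enumerated for one nonterminal: each P0 CFG production is paired with every stop/continue decision over its nonterminal children, with s[x] the Bernoulli probability that child x becomes a frontier substitution site. The sign helper, the argument layout and the omission of the grammar A and entry-rule weights from Table 1 are all simplifying assumptions.

```python
from itertools import product

def grammar_b_rules_for(c, cfg_rules, s, nonterminals, sign):
    """Enumerate backoff (grammar B) rules for nonterminal c -- a sketch.

    cfg_rules : list of (rhs_symbols, p0_prob) for base CFG productions c -> rhs
    s         : dict of Bernoulli probabilities s[x] that child x stops
    sign      : assumed helper mapping a nonterminal x to its signed copy
    """
    rules = []
    for rhs, p0 in cfg_rules:
        nt_positions = [i for i, x in enumerate(rhs) if x in nonterminals]
        # one rule per stop/continue assignment over the nonterminal children
        for bits in product([True, False], repeat=len(nt_positions)):
            new_rhs, prob = list(rhs), p0
            for i, stop in zip(nt_positions, bits):
                x = rhs[i]
                if stop:
                    prob *= s[x]              # frontier nonterminal: substitution site
                else:
                    new_rhs[i] = sign(x)      # keep expanding under P0
                    prob *= 1.0 - s[x]
            rules.append((sign(c), tuple(new_rhs), prob))
    return rules
```

For the production NP → NP PP with both children nonterminal, this yields four rules covering the stop/continue choice for each child.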
52 Figure 2 shows an example of the transformation of an elementary tree with non-zero count, ne,c ≥ 1, into the two types of CFG rules. [sent-105, score-0.416]
53 Both parts are capable of parsing the string NP saw NP into an S, as illustrated in Figure 3; summing the probability of both analyses gives the model probability from (1). [sent-106, score-0.122]
54 Note that although the probabilities exactly match the true model for a single elementary tree, the probability of derivations composed of many elementary trees may not match because the model’s caching behaviour has been suppressed, i.e. [sent-107, score-0.579]
55 For training we define the MH sampler as follows. [sent-110, score-0.535]
56 Taking the product of the rule scores above the line yields the left term in (1), and the product of the scores below the line yields the right term. [sent-112, score-0.056]
57 Figure 3: Example trees under the grammar transform, which both encode the same TSG derivation from Figure 1. [sent-113, score-0.336]
58 The left tree encodes that the S → NP (VP (V hates) NP) elementary tree was drawn from the cache, while for the right tree this same elementary tree was drawn from the base distribution (the left and right terms in (1), respectively). [sent-114, score-0.795]
59 The proposal is estimated from the derivations of the training corpus excluding the current tree, which we represent using the PCFG transformation. [sent-115, score-0.089]
60 The next step is to sample derivations for a given tree, for which we use a constrained variant of the inside algorithm (Lari and Young, 1990). [sent-116, score-0.158]
61 We must ensure that the TSG derivation produces the given tree, and therefore during inside inference we only consider spans that are constituents in the tree and are labelled with the correct nonterminal. [sent-117, score-0.423]
62 Under these constraints the time complexity of inside inference is linear in the length of the sentence. [sent-121, score-0.118]
63 A derivation is then sampled from the inside chart using a top-down traversal (Johnson et al. [sent-122, score-0.257]
64 The derivation is scored with the true model and accepted or rejected using the MH test; accepted samples then replace the current derivation for the tree, and rejected samples leave the previous derivation unchanged. [sent-124, score-0.614]
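Putting these steps together, one blocked update for a single training tree might look like the following sketch; the proposal object's methods (constrained_inside, sample, log_prob) stand in for the transformed MAP PCFG, and true_log_prob for the caching model, so all names here are assumed interfaces rather than the authors' code.

```python
import math
import random

def blocked_mh_update(tree, current_deriv, proposal, true_log_prob):
    """One blocked Metropolis-Hastings update for a training tree (a sketch)."""
    chart = proposal.constrained_inside(tree)      # spans restricted to tree constituents
    new_deriv, log_q_new = proposal.sample(chart)  # top-down sample from the inside chart
    log_q_old = proposal.log_prob(current_deriv)

    # independence-proposal MH ratio: p(new) q(old) / (p(old) q(new))
    log_accept = (true_log_prob(new_deriv) + log_q_old
                  - true_log_prob(current_deriv) - log_q_new)
    if math.log(random.random()) < min(0.0, log_accept):
        return new_deriv                           # accept: replace the derivation
    return current_deriv                           # reject: keep the old derivation
```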
65 These steps are repeated for another tree in the training set, and the process is iterated over the full training set many times. [sent-125, score-0.194]
66 Parsing The grammar transform is not only useful for training, but also for parsing. [sent-126, score-0.232]
67 To parse a sentence we sample a number of TSG derivations from the MAP which are then accepted or rejected into the full model using a MH step. [sent-127, score-0.171]
68 The samples are obtained from the same transformed grammar, but with the algorithm adapted for an unsupervised setting where parse trees are not available. [sent-128, score-0.32]
69 For this we use the standard inside algorithm applied to the sentence, omitting the tree constraints, which has time complexity cubic in the length of the sentence. [sent-129, score-0.26]
70 We then sample a derivation from the inside chart and perform the MH acceptance test. [sent-130, score-0.284]
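For reference, a generic inside algorithm over a binarised weighted CFG is sketched below; the cubic behaviour mentioned above comes from the triple loop over start, end and split points. The binarisation and the rule and lexicon representations are assumptions, not the transformed grammar's actual form.

```python
from collections import defaultdict

def inside_chart(words, lexicon, binary_rules):
    """Standard inside algorithm for a binarised weighted CFG (generic sketch).

    lexicon      : dict mapping (nonterminal, word) -> probability
    binary_rules : list of (parent, left_child, right_child, probability)
    """
    n = len(words)
    inside = defaultdict(float)        # inside[(i, j, A)] = P(A =>* words[i:j])
    for i, w in enumerate(words):
        for (a, word), p in lexicon.items():
            if word == w:
                inside[(i, i + 1, a)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # split point
                for a, b, c, p in binary_rules:
                    inside[(i, j, a)] += p * inside[(i, k, b)] * inside[(k, j, c)]
    return inside
```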
71 This setup is theoretically more appealing than our previous approach in which we truncated the approximation grammar to exclude most of the zero count rules (Cohn et al. [sent-131, score-0.213]
72 We found that both the maximum probability derivation and tree were considerably worse than a tree constructed to maximise the expected number of correct CFG rules (MER), based on Goodman’s (2003) algorithm for maximising labelled recall. [sent-133, score-0.485]
73 For this reason we use the MER parsing algorithm using sampled Monte Carlo estimates for the marginals over CFG rules at each sentence span. [sent-134, score-0.078]
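A simplified sketch of that decoding step is given below, operating over labelled spans rather than full CFG rules; the spans(tree) helper is an assumed function yielding (i, j, label) triples, and reconstruction of the final tree from the back-pointers is omitted.

```python
from collections import defaultdict

def mer_chart(samples, n, spans):
    """Maximum-expected-recall decoding from sampled derivations (a sketch)."""
    # Monte Carlo estimates of labelled-span marginals from the samples
    marg = defaultdict(float)
    for t in samples:
        for (i, j, label) in spans(t):
            marg[(i, j, label)] += 1.0 / len(samples)

    # best[(i, j)] = (expected recall, best label, best split) for span [i, j)
    best = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            labels = [(p, lab) for (a, b, lab), p in marg.items() if (a, b) == (i, j)]
            score, label = max(labels, default=(0.0, None))
            if width == 1:
                best[(i, j)] = (score, label, None)
            else:
                k = max(range(i + 1, j),
                        key=lambda k: best[(i, k)][0] + best[(k, j)][0])
                best[(i, j)] = (score + best[(i, k)][0] + best[(k, j)][0], label, k)
    return best
```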
74 Our models were all sampled for 5k iterations with hyperparameter inference for αc and sc ∀ c ∈ N, but in contrast to our previous approach we did not use annealing, which we did not find to help generalisation accuracy. [sent-138, score-0.142]
75 The MH acceptance rates were in excess of 99% across both training and parsing. [sent-139, score-0.062]
76 For training the blocked MH sampler exhibits faster convergence than the local Gibbs sampler, as shown in Figure 4. [sent-141, score-0.989]
77 Irrespective of the initialisation the blocked sampler finds higher likelihood states in many fewer iterations (the same trend continues until iteration 5k). [sent-142, score-0.914]
78 To be fair, the blocked sampler is slower per iteration (roughly 50% worse) due to the higher overheads of the grammar transform and performing dynamic programming (despite nominal optimisation). [sent-143, score-1.112]
79 (Footnote 4: the speed difference diminishes with corpus size; on sections 2–22 the blocked sampler is only 19% slower per iteration.) Figure 4: Training likelihood vs. [sent-144, score-0.88]
80 Each sampling method was initialised with both minimal and maximal elementary trees. [sent-146, score-0.283]
81 Table caption: results for the parsing algorithm and the novel grammar transform algorithm for four different training configurations. [sent-149, score-0.259]
82 Even after accounting for the time difference, the blocked sampler is more effective than the local Gibbs sampler. [sent-150, score-0.917]
83 Training likelihood is highly correlated with generalisation F1 (Pearson’s correlation coefficient of 0.95), [sent-151, score-0.056]
84 and therefore improving the sampler convergence will have immediate effects on performance. [sent-152, score-0.553]
85 The blocked sampler results in better generalisation F1 scores than the local Gibbs sampler, irrespective of the initialisation condition or parsing method used. [sent-154, score-1.127]
86 The use of the grammar transform in parsing also yields better scores irrespective of the underlying model. [sent-155, score-0.344]
87 Together these results strongly advocate the use of the grammar transform for inference in infinite TSGs. [sent-156, score-0.347]
88 We initialised the model with the final sample from a run on the small training set, and used the blocked sampler for 6500 iterations. [sent-158, score-0.94]
89 We conjecture that the performance gap is due to the model using an overly simplistic treatment of unknown words, and also to further mixing problems with the sampler. [sent-167, score-0.071]
90 The sampler has difficulty escaping such modes and therefore is slower to mix. [sent-169, score-0.544]
91 One way to solve the mixing problem is for the sampler to make more global moves, e.g. [sent-170, score-0.579]
92 Another way is to use a variational approximation instead of MCMC sampling (Wainwright and Jordan, 2008). [sent-173, score-0.061]
93 5 Discussion We have demonstrated how our grammar transformation can implicitly represent an exponential space of tree fragments efficiently, allowing us to build a sampler with considerably better mixing properties than a local Gibbs sampler. [sent-174, score-1.058]
94 (2006) for which it would be trivial to adapt our technique to develop a blocked sampler. [sent-178, score-0.336]
95 We envisage similar representations being applied to these models to improve their mixing properties. [sent-182, score-0.071]
96 A particularly interesting avenue for further research is to employ our blocked sampler for unsupervised grammar induction. [sent-183, score-1.001]
97 While it is difficult to extend the local Gibbs sampler to the case where the tree is not observed, the dynamic program for our blocked sampler can be easily used for unsupervised inference by omitting the tree matching constraints. [sent-184, score-1.807]
98 A Bayesian model of syntax-directed tree to string grammar induction. [sent-196, score-0.297]
99 Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. [sent-228, score-0.07]
100 Bayesian inference for PCFGs via Markov chain Monte Carlo. [sent-232, score-0.076]
wordName wordTfidf (topN-words)
[('sampler', 0.508), ('blocked', 0.336), ('tsg', 0.28), ('elementary', 0.208), ('mh', 0.158), ('grammar', 0.157), ('cohn', 0.148), ('np', 0.145), ('gibbs', 0.143), ('substitution', 0.143), ('tree', 0.14), ('saw', 0.12), ('cfg', 0.119), ('derivation', 0.118), ('hates', 0.093), ('bod', 0.082), ('proposal', 0.08), ('vp', 0.079), ('transform', 0.075), ('productions', 0.074), ('local', 0.073), ('mixing', 0.071), ('bayesian', 0.07), ('initialisation', 0.07), ('mcmc', 0.07), ('transformation', 0.068), ('inside', 0.068), ('nonterminal', 0.066), ('infinite', 0.065), ('johnson', 0.062), ('derivations', 0.062), ('dop', 0.061), ('trees', 0.061), ('pcfg', 0.06), ('ei', 0.059), ('map', 0.058), ('generalisation', 0.056), ('truncated', 0.056), ('distribution', 0.055), ('blunsom', 0.055), ('monte', 0.053), ('frontier', 0.053), ('goldwater', 0.053), ('transformed', 0.053), ('omitting', 0.052), ('lari', 0.052), ('rejected', 0.052), ('phil', 0.052), ('inference', 0.05), ('pp', 0.05), ('geman', 0.05), ('production', 0.049), ('sharon', 0.049), ('samples', 0.049), ('trevor', 0.048), ('gc', 0.047), ('labelled', 0.047), ('broccoli', 0.047), ('wainwright', 0.047), ('node', 0.046), ('convergence', 0.045), ('base', 0.044), ('sima', 0.044), ('parsing', 0.042), ('irrespective', 0.042), ('fragments', 0.041), ('initialised', 0.041), ('treebanked', 0.041), ('mer', 0.041), ('probability', 0.04), ('sv', 0.04), ('locally', 0.039), ('carlo', 0.039), ('ets', 0.037), ('slower', 0.036), ('sampled', 0.036), ('root', 0.036), ('chart', 0.035), ('primed', 0.035), ('acceptance', 0.035), ('sampling', 0.034), ('moves', 0.034), ('khalil', 0.033), ('infinitely', 0.032), ('jain', 0.032), ('accepted', 0.029), ('signature', 0.028), ('denero', 0.028), ('yields', 0.028), ('sample', 0.028), ('expansion', 0.028), ('goodman', 0.027), ('drawing', 0.027), ('vv', 0.027), ('variational', 0.027), ('training', 0.027), ('stopping', 0.027), ('nonterminals', 0.027), ('backoff', 0.027), ('chain', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
2 0.27086404 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
3 0.18702522 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
4 0.17750706 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
5 0.16999936 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
6 0.14269966 162 acl-2010-Learning Common Grammar from Multilingual Corpus
8 0.12543419 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
9 0.11058209 236 acl-2010-Top-Down K-Best A* Parsing
10 0.092870079 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
11 0.090956032 69 acl-2010-Constituency to Dependency Translation with Forests
12 0.086950161 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
13 0.086664535 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
14 0.085644238 133 acl-2010-Hierarchical Search for Word Alignment
15 0.077249415 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
16 0.076285817 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
17 0.076003753 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
18 0.072846226 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
19 0.072813749 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
20 0.071474701 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
topicId topicWeight
[(0, -0.198), (1, -0.099), (2, 0.096), (3, -0.037), (4, -0.149), (5, -0.078), (6, 0.201), (7, 0.019), (8, 0.02), (9, -0.185), (10, -0.006), (11, -0.167), (12, 0.084), (13, -0.113), (14, -0.034), (15, -0.069), (16, 0.026), (17, -0.082), (18, 0.035), (19, 0.071), (20, -0.112), (21, 0.028), (22, 0.099), (23, 0.066), (24, 0.046), (25, -0.111), (26, 0.026), (27, 0.004), (28, -0.05), (29, -0.001), (30, 0.135), (31, 0.025), (32, -0.02), (33, 0.072), (34, -0.004), (35, 0.082), (36, 0.041), (37, -0.013), (38, -0.0), (39, 0.026), (40, -0.073), (41, 0.028), (42, -0.058), (43, -0.015), (44, -0.064), (45, -0.015), (46, -0.049), (47, 0.145), (48, -0.057), (49, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.96175599 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
2 0.84287494 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
3 0.72043717 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
4 0.67676163 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
5 0.65002429 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
6 0.56479216 162 acl-2010-Learning Common Grammar from Multilingual Corpus
7 0.56035686 169 acl-2010-Learning to Translate with Source and Target Syntax
9 0.52687049 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
10 0.46796229 67 acl-2010-Computing Weakest Readings
11 0.46390602 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
12 0.44956568 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
13 0.41332912 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
14 0.39517632 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
15 0.34825069 236 acl-2010-Top-Down K-Best A* Parsing
16 0.34219733 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
17 0.34211734 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
18 0.33029163 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
19 0.32717451 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
20 0.32441506 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
topicId topicWeight
[(4, 0.04), (14, 0.015), (25, 0.115), (33, 0.041), (39, 0.014), (42, 0.015), (44, 0.015), (59, 0.102), (66, 0.214), (69, 0.01), (73, 0.05), (76, 0.011), (78, 0.056), (83, 0.069), (84, 0.019), (98, 0.126)]
simIndex simValue paperId paperTitle
same-paper 1 0.85205895 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
3 0.69585514 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
4 0.69476432 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
Author: Timothy A. D. Fowler ; Gerald Penn
Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.
5 0.6905551 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. 1
6 0.68557441 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
7 0.68321574 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
8 0.68282843 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
10 0.67780781 185 acl-2010-Open Information Extraction Using Wikipedia
11 0.67721736 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
12 0.67620134 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
13 0.67614269 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
14 0.67486763 69 acl-2010-Constituency to Dependency Translation with Forests
15 0.67422134 162 acl-2010-Learning Common Grammar from Multilingual Corpus
16 0.67225403 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
17 0.67188036 248 acl-2010-Unsupervised Ontology Induction from Text
18 0.67119992 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
19 0.67040563 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
20 0.6700452 130 acl-2010-Hard Constraints for Grammatical Function Labelling