emnlp emnlp2012 emnlp2012-1 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. [sent-5, score-0.293]
2 The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. [sent-6, score-0.529]
3 Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. [sent-7, score-0.622]
4 The prevalence of complex phrasal, discontiguous, and non-monotonic translation phenomena in real-world applications of machine translation has driven the development of hierarchical and syntactic models based on synchronous context-free grammars (SCFGs). [sent-11, score-0.64]
5 However, while the models used for translation have evolved, the way in which they are learnt has not: na¨ ıve word-based models are still used to infer translational correspondences from parallel corpora. [sent-14, score-0.224]
6 In this work we bring the learning of the minimal units of translation in step with the representational power of modern translation models. [sent-21, score-0.344]
7 We present a nonparametric Bayesian model of translation based on SCFGs, and we use its posterior distribution to infer synchronous derivations for a parallel corpus using a novel Gibbs sampler. [sent-22, score-0.737]
8 Learning synchronous grammars is hard due to the high polynomial complexity of dynamic programming and the exponential space of possible rules. [sent-24, score-0.296]
9 As such most prior work for learning SCFGs has relied on inference algorithms that were heuristically constrained or biased by word-based alignment models and small experiments (Wu, 1997; Zhang et al. [sent-25, score-0.244]
10 3M sentence pairs) without imposing restrictions on the form of the grammar rules or otherwise constraining the set of learnable rules (e. [sent-30, score-0.486]
11 We validate our sampler by demonstrating its ability to recover grammars used to generate synthetic datasets. [sent-33, score-0.457]
12 Our results attest to our model’s ability to learn synchronous grammars encoding complex translation phenomena. [sent-35, score-0.5]
13 2 Prior Work The goal of directly inducing phrasal translation models from parallel corpora has received a lot of attention in the NLP and SMT literature. [sent-38, score-0.329]
14 Marcu and Wong (2002) presented an ambitious maximum likelihood model and EM inference algorithm for learning phrasal translation representations. [sent-39, score-0.307]
15 This had the dual benefits of biasing the model towards learning minimal translation units, and integrating out the parameters such that a much smaller set of statistics would suffice for inference with a Gibbs sampler. [sent-44, score-0.237]
16 A popular solution to this problem is to heuristically restrict inference to derivations which agree with an independent alignment model (Cherry and Lin, 2007; Zhang et al. [sent-50, score-0.315]
17 (2011) reported a novel Bayesian model for phrasal alignment and extraction that was able to model phrases of multiple granularities via a synchronous Adaptor Grammar. [sent-54, score-0.458]
18 (2009) presented an approach similar to ours that implemented a Gibbs sampler for a nonparametric Bayesian model of ITG. [sent-57, score-0.351]
19 Our model goes further by allowing discontiguous phrasal translation units. [sent-59, score-0.442]
20 Surprisingly, the freedom that this extra power affords allows the Gibbs sampler we propose to mix more quickly, allowing state-of-the-art results from a simple initialiser. [sent-60, score-0.259]
21 3.1 Synchronous Context-Free Grammar A synchronous context-free grammar (SCFG) is a 5-tuple ⟨Σ, ∆, V, S, R⟩ that generalises context-free grammar to generate strings concurrently in two languages (Lewis and Stearns, 1968). [sent-66, score-0.622]
22 Σ is a finite set of source language terminal symbols, ∆ is a finite set of target language terminal symbols, V is a set of nonterminal symbols, with a designated start symbol S, and R is a set of synchronous rewrite rules. [sent-67, score-0.678]
23 The following are examples: VP → ⟨ schlage NP1 NP2 vor | suggest NP2 to NP1 ⟩ and NP → ⟨ die Kommission | the commission ⟩, where the nonterminal alignment a is indicated through subscripts on the nonterminals. [sent-69, score-0.249]
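To make the rule format concrete, the following is a minimal sketch in Python (our own illustration, not code from the paper; the class and field names are hypothetical) of one way such a synchronous rule could be represented. Each side is a sequence of symbols, and the nonterminal alignment a is encoded by storing, in each nonterminal slot, the index of its co-indexed nonterminal on the source side.

    from dataclasses import dataclass
    from typing import Tuple, Union

    # A terminal is a string; a nonterminal slot is an integer giving the index
    # of the linked nonterminal on the source side (the alignment a).
    Symbol = Union[str, int]

    @dataclass(frozen=True)
    class SCFGRule:
        lhs: str                   # left-hand-side nonterminal, e.g. "VP"
        src: Tuple[Symbol, ...]    # source side
        trg: Tuple[Symbol, ...]    # target side

    # VP -> < schlage NP_1 NP_2 vor | suggest NP_2 to NP_1 >
    vp_rule = SCFGRule("VP", ("schlage", 0, 1, "vor"), ("suggest", 1, "to", 0))
    # NP -> < die Kommission | the commission >
    np_rule = SCFGRule("NP", ("die", "Kommission"), ("the", "commission"))

The monolingual source projection used for decoding (described next) can be read off by simply dropping the target side of each rule.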
24 Translation with SCFGs is carried out by parsing the source language with the monolingual source language projection of the grammar (using standard monolingual parsing algorithms), which induces a parallel tree structure and translation in the target language (Chiang, 2007). [sent-71, score-0.49]
25 Alignment or synchronous parsing is the process of concurrently parsing both the source and target sentences, uncovering the derivation or derivations that give rise to a string pair (Wu, 1997; Dyer, 2010). [sent-72, score-0.513]
26 Our goal is to infer the most probable SCFG derivations that explain a corpus of parallel sentences, given a nonparametric prior over probabilistic SCFGs. [sent-73, score-0.311]
27 3.2 Pitman-Yor Process SCFG Before training we have no way of knowing how many rules will be needed in our grammar to adequately represent the data. [sent-76, score-0.332]
28 By using the Pitman-Yor process as a prior on the parameters of a synchronous grammar we can formulate a model which prefers smaller numbers of rules that are reused often, thereby avoiding degenerate grammars consisting of large, overly specific rules. [sent-77, score-0.708]
29 The discount is subtracted from each positive rule count and dampens the rich-get-richer effect where frequent rules are given higher probability compared to infrequent ones. [sent-80, score-0.361]
30 In our model, a draw from a PYP is a distribution over SCFG rules with a particular LHS (in fact, it is a distribution over all well-formed rules). [sent-82, score-0.279]
31 Although the PYP has no known analytical form, we can marginalise out the GX’s and reason about ... (Figure 1: Example generation of a synchronous grammar rule in our G0.) [sent-84, score-0.506]
32 In this process, at time n a rule rn is generated by stochastically deciding whether to make another copy of a previously generated rule or to draw a new one from the base distribution, G0. [sent-86, score-0.334]
33 In particular, we set rn to ϕk with probability (ck − d) / (θ + n), and increment ck, or, with probability (θ + d · |ϕ|) / (θ + n), we draw a new rule from G0, append it to ϕ, and use it for rn. [sent-93, score-0.233]
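The generation rule just stated can be sketched directly. The following Python fragment is our own illustration (not the authors' implementation, and the names are assumptions): each previous draw ϕk from G0 is kept as a "table" with its reuse count ck, and base_draw stands in for sampling a fresh rule from G0 as described in the next section.

    import random

    def pyp_draw(tables, theta, d, base_draw):
        """tables: list of [rule, count] pairs, one per previous draw from G0."""
        n = sum(count for _, count in tables)   # number of rules generated so far
        u = random.random() * (theta + n)
        acc = 0.0
        for entry in tables:
            acc += entry[1] - d                 # reuse phi_k with prob (c_k - d)/(theta + n)
            if u < acc:
                entry[1] += 1
                return entry[0]
        rule = base_draw()                      # new draw with prob (theta + d*|phi|)/(theta + n)
        tables.append([rule, 1])
        return rule

Repeated calls sharing one tables list reproduce the rich-get-richer behaviour discussed above, tempered by the discount d.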
34 3.3 Base Distribution The base distribution G0 for the PYP assigns probability to a rule based on our belief about what constitutes a good rule independent of observing any of the data. [sent-95, score-0.2]
35 The process begins by generating the source length (total number of terminal and nonterminal symbols, written |s|) by drawing from a Poisson distribution with mean 1: |s| ∼ Poisson(1). [sent-98, score-0.309]
36 Then, for every position in s, we decide whether it will contain a terminal or nonterminal symbol by repeated, independent draws from a Bernoulli distribution. [sent-100, score-0.329]
37 Since we believe that shorter rules should be relatively more likely to contain terminal symbols than longer rules, we define the probability of a terminal symbol as a function of the rule length, governed by a hyperparameter 0 < φ < 1. [sent-101, score-0.591]
38 Let #NT(s) denote the number of nonterminal symbols we generated in s, i. [sent-104, score-0.224]
39 However, to ensure that the rule is well-formed, t must contain exactly as many nonterminal symbols as the source does. [sent-108, score-0.365]
40 We therefore draw the number of target terminal symbols from a Poisson whose mean is the number of terminal symbols in the source, plus a small constant λ0 to ensure that it is greater than zero: |t| − #NT(s) ∼ Poisson (|s| − #NT(s) + λ0) . [sent-109, score-0.513]
41 We then determine whether each position in t is a terminal or nonterminal symbol by drawing uniformly from the bag of #NT(s) source nonterminals and |t| − #NT(s) terminal indicators, without replacement. [sent-110, score-0.517]
42 At this point we have created a rule template which indicates how large the rule is, whether each position contains a terminal or nonterminal symbol, and the reordering of the source nonterminals a. [sent-111, score-0.54]
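Putting these steps together, here is a rough sketch (our own, in Python, with assumed parameter names) of drawing a rule template from G0. For brevity the terminal/nonterminal decision uses a constant p_term, whereas the paper ties this probability to the rule length through the hyperparameter φ; lambda0 is the small constant mentioned above.

    import math
    import random

    def sample_poisson(mean):
        # Knuth's method; adequate for the small means used here.
        limit, k, p = math.exp(-mean), 0, 1.0
        while True:
            p *= random.random()
            if p <= limit:
                return k
            k += 1

    def sample_rule_template(p_term=0.8, lambda0=0.5):
        # 1. Source length |s| ~ Poisson(1); re-drawing on zero is our own guard.
        src_len = 0
        while src_len == 0:
            src_len = sample_poisson(1.0)
        # 2. Each source position is a terminal (True) or nonterminal (False).
        src_is_term = [random.random() < p_term for _ in range(src_len)]
        n_nt = src_is_term.count(False)
        # 3. Target terminal count: |t| - #NT(s) ~ Poisson(|s| - #NT(s) + lambda0).
        n_trg_terms = sample_poisson(src_len - n_nt + lambda0)
        # 4. Order the target side by drawing without replacement from the bag of
        #    source nonterminal indices and terminal indicators; the resulting
        #    order of the indices fixes the nonterminal reordering a.
        bag = list(range(n_nt)) + ["T"] * n_trg_terms
        random.shuffle(bag)
        return src_is_term, bag

The returned template still needs its terminal positions filled with actual source and target words; that step is not shown here.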
43 4 Gibbs Sampler In this section we introduce a Gibbs sampler that enables us to perform posterior inference given a corpus of sentence pairs. [sent-115, score-0.319]
44 Our innovation is to represent the synchronous derivation of a sentence pair in a hierarchical 4-dimensional binary alignment grid, with elements z[s,t,u,v] ∈ {0, 1}. [sent-116, score-0.403]
45 Our Gibbs sampler operates over the space of all the random variables z[s,t,u,v] , resampling one at a time. [sent-121, score-0.301]
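As a schematic illustration (again our own sketch rather than the paper's code), the sampler state can be pictured as a dictionary of binary bispan variables, and one sweep visits each bispan and resamples its variable given the rest of the derivation; resample_bispan is a hypothetical stand-in for the toggle move discussed below.

    from itertools import product

    def gibbs_sweep(z, src_len, trg_len, resample_bispan):
        """z maps (s, t, u, v) -> 0/1 for source span [s, t) and target span [u, v)."""
        for s, t in product(range(src_len), range(1, src_len + 1)):
            if t <= s:
                continue
            for u, v in product(range(trg_len), range(1, trg_len + 1)):
                if v <= u:
                    continue
                # Choose between the two derivations that differ only in whether
                # the bispan [s, t) x [u, v) is a constituent, in proportion to
                # their posterior probabilities.
                z[(s, t, u, v)] = resample_bispan(z, s, t, u, v)

The discussion of ergodicity and detailed balance that follows concerns exactly when such a toggle is legal and how the bispan to resample is chosen.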
46 To be valid, a Gibbs sampler must be ergodic and satisfy detailed balance. [sent-126, score-0.259]
47 Ergodicity requires that there is non-zero probability that any state in the sampler be reachable from any other state. [sent-127, score-0.29]
48 Our grid representation is the synchronous generalisation of the well-known correspondence between CFG derivations and Boolean matrices; see Lee (2002) for an overview. [sent-128, score-0.456]
49 Clearly our operator satisfies this, since given any configuration of the alignment grid we can use the toggle operator to flatten the derivation to a single rule and then break it back down to reach any derivation. [sent-132, score-0.343]
50 Detailed balance requires that the probability of transitioning between two possible adjacent sampler states respects their joint probabilities in the stationary distribution. [sent-133, score-0.29]
51 Then, the probability of the sampler targeting any bispan in the grid is equal regardless of the current configuration of the alignment grid. [sent-135, score-0.63]
52 Therefore, the probability of sampling any possible bispan in the sentence pair is still uniform (ensuring detailed balance), while our sampler remains fast. [sent-143, score-0.51]
53 5 Evaluation The preceding sections have introduced a model, and accompanying inference technique, designed to induce a posterior distribution over SCFG derivations containing discontiguous and phrasal translation rules. [sent-144, score-0.662]
54 The evaluation that follows aims to determine our model's ability to meet these design goals, and to do so in a range of translation scenarios. [sent-145, score-0.204]
55 In order to validate both the model and the sampler's ability to learn an SCFG we first conduct a synthetic experiment in which the true grammar is known. [sent-146, score-0.311]
56 5.1 Synthetic Data Experiments Prior work on SCFG induction for SMT has validated modeling claims by reporting BLEU scores on real translation tasks. [sent-149, score-0.208]
57 Here we include a small synthetic data experiment to clearly validate our model's ability to learn an SCFG that includes discontiguous and phrasal translation rules with non-monotonic word order. [sent-151, score-0.729]
58 We then ran the Gibbs sampler for fifty iterations through the data. [sent-153, score-0.259]
59 The bottom half of Table 1 lists the five rules with the highest marginal probability estimated by the sampler. [sent-154, score-0.222]
60 Even for such a small grammar the space of derivations is enormous and the task of recovering it from a data sample is non-trivial. [sent-156, score-0.297]
61 The divergence from the true probabilities is due to the effect of the prior assigning shorter rules higher probability. [sent-157, score-0.202]
62 Table 1: Manually created SCFG used to generate synthetic data, and the five most probable rules inferred by our model. [sent-173, score-0.255]
63 All translation systems employ a Hiero translation model during decoding. [sent-185, score-0.344]
64 , 2010) with the synchronous grammar extracted using the techniques developed by Lopez (2008). [sent-189, score-0.409]
65 All translation systems include a 5-gram language model built from a five hundred million token subset of monolingual data. [sent-190, score-0.251]
66 The MODEL 1 INITIALISATION column is from the initialisation alignments using MODEL 1 and no sampling. [sent-209, score-0.279]
67 Experimental Setup To obtain the PYP-SCFG word alignments we ran the sampler for five hundred iterations for each of the language pairs and experimental conditions described below. [sent-212, score-0.457]
68 The Gibbs sampler requires an initial set of derivations from which to commence sampling. [sent-217, score-0.378]
69 In our experiments we investigated both weak and strong initialisations, the former based on word alignments from IBM Model 1 and the latter on alignments from an HMM model (Vogel et al. [sent-218, score-0.32]
70 For decoding we used the word alignments implied by the derivations in the final sample to extract a Hiero grammar with the same standard set of relative frequency, length, and language model features used for the baseline. [sent-220, score-0.416]
71 Weak Initialisation Our first translation experiments ascertain the degree to which our proposed Gibbs sampling inference algorithm is able to learn good synchronous derivations for the PYP-SCFG model. [sent-221, score-0.628]
72 A number of prior works on alignment with Gibbs samplers have only evaluated models initialised with the more complex GIZA++ alignment models (Blunsom et al. [sent-222, score-0.433]
73 , 2008); as a result it can be difficult to separate the performance of the sampler from that of the initialisation. [sent-224, score-0.259]
74 In order to do this, we initialise the sampler with the IBM Model 1 alignments. [sent-225, score-0.294]
75 We denote this a weak initialisation as no alignment models outside of those included in the PYP-SCFG model influence the resulting word alignments. [sent-241, score-0.364]
76 The BLEU scores for translation systems built from the five-hundredth sample are shown in the WEAK M1 INIT. [sent-242, score-0.209]
77 Additionally we build a translation system from the MODEL 1 alignment used to initialise the sampler, without using our PYP-SCFG model or sampling. [sent-244, score-0.588]
78 Firstly, it is clear that MODEL 1 is indeed a weak initialiser, as the resulting translation systems achieve uniformly low BLEU scores. [sent-246, score-0.254]
79 In contrast, the models built from the output of the Gibbs sampler for the PYP-SCFG model achieve BLEU scores comparable to those of the MODEL 4 BASELINE. [sent-247, score-0.259]
80 Thus the sampler has moved a good distance from its initialisation, and done so in a direction that results in better synchronous derivations. [sent-248, score-0.49]
81 Strong Initialisation Given we have established that the sampler can produce state-of-the-art translation results from a weak initialisation, it is instructive to investigate whether initialising the model with a strong alignment system, the GIZA++ HMM (Vogel et al. [sent-249, score-0.667]
82 of Table 3 shows the results for initialising with the HMM word alignments and sampling for 500 iterations. [sent-252, score-0.227]
83 Starting with a stronger initial sample results in both quicker mixing and better translation quality for the same number of sampling iterations. [sent-253, score-0.248]
84 Table 4 compares the average lengths of the rules produced by the sampler with both the strong and weak initialisers. [sent-254, score-0.495]
85 As the size of the training corpora increases (UR-EN → ZH-EN → DE-EN) we see that the average size of the rules produced by the weakly initialised sampler also increases, while that of the strongly initialised model stays relatively uniform. [sent-255, score-0.625]
86 Initially both samplers start out with a large number of long rules, and as the sampling progresses the rules are broken down into smaller, more generalisable pieces. [sent-256, score-0.419]
87 As such we conclude from these metrics that after five hundred samples the strongly initialised model has converged to sampling from a mode of the distribution while the weakly initialised model converges more slowly and on the longer corpora is still travelling towards a mode. [sent-257, score-0.408]
88 This suggests that longer sampling runs, and Gibbs operators that make simultaneous updates to multiple parts of a derivation, would enable the weakly initialised model to obtain better translation results. [sent-258, score-0.354]
89 Grammar Analysis The BLEU scores are informative as a measure of translation quality, but we also explored some of the differences in the grammars obtained from the PYP-SCFG model compared to the standard approach. [sent-259, score-0.237]
90 From Figure 3 we see that the number of unique rules in the PYP-SCFG grammar decreases steadily as the sampler iterates through the data, so the model is finding an increasingly sparse distribution with fewer but better quality rules as sampling progresses. [sent-261, score-0.862]
91 (Figure 3: Unique grammar rules for each language pair as a function of the number of samples.) Figure 4 shows the distribution of rules with a given arity as a percentage [sent-263, score-0.591]
92 of the full grammar after the final sampling iteration. [sent-268, score-0.254]
93 The model prior biases the results to shorter rules as the vast majority of the model probability mass is on rules with zero, one or two nonterminals. [sent-269, score-0.387]
94 Tables 5 and 6 show the most probable rules in the Hiero translation system obtained using the PYP-SCFG alignments that are not present in the TM from the GIZA++ alignments, and vice versa. [sent-270, score-0.564]
95 (Figure 4: The percentage of rules with a given arity in the final grammar of the PYP-SCFG model.) [sent-273, score-0.46]
96 The top probability rules in the PYP-SCFG grammar that are not in the heuristically extracted grammar are correct and minimal phrasal units of translation, whereas only two of the top probability rules in the GIZA++ grammar are of good translation quality. [sent-274, score-1.04]
97 We show state-of-the-art results and learn complex translation phenomena, including discontiguous and many-to-many phrasal alignments, without applying any heuristic restrictions on the model to make learning tractable. [sent-277, score-0.442]
98 Our evaluation shows that we can use a principled approach to induce SCFGs designed specifically to utilize the full power of grammar-based SMT instead of relying on complex word alignment heuristics with inherent bias. [sent-278, score-0.3]
99 We also expect that expanding our sampler beyond strict binary sampling may allow us to explore the space of hierarchical word alignments more quickly, allowing for faster mixing. [sent-280, score-0.454]
100 We expect that with these extensions our model of grammar induction may further improve translation output. [sent-281, score-0.386]
wordName wordTfidf (topN-words)
[('scfg', 0.392), ('sampler', 0.259), ('synchronous', 0.231), ('xx', 0.223), ('scfgs', 0.186), ('grammar', 0.178), ('translation', 0.172), ('discontiguous', 0.165), ('hiero', 0.165), ('initialisation', 0.16), ('rules', 0.154), ('bispan', 0.144), ('terminal', 0.138), ('nonterminal', 0.127), ('gibbs', 0.125), ('alignment', 0.122), ('alignments', 0.119), ('derivations', 0.119), ('initialised', 0.106), ('blunsom', 0.106), ('phrasal', 0.105), ('rule', 0.097), ('symbols', 0.097), ('gy', 0.096), ('pyp', 0.096), ('nonparametric', 0.092), ('weak', 0.082), ('succeed', 0.08), ('sampling', 0.076), ('smt', 0.075), ('grid', 0.074), ('bleu', 0.065), ('grammars', 0.065), ('arity', 0.064), ('poisson', 0.064), ('synthetic', 0.064), ('amna', 0.062), ('bispans', 0.062), ('kamyab', 0.062), ('pypscfg', 0.062), ('hmm', 0.06), ('giza', 0.058), ('dyer', 0.056), ('bayesian', 0.054), ('parallel', 0.052), ('derivation', 0.05), ('prior', 0.048), ('gx', 0.048), ('neubig', 0.048), ('dd', 0.048), ('discount', 0.044), ('hb', 0.044), ('lhs', 0.044), ('europarl', 0.044), ('heuristically', 0.044), ('source', 0.044), ('draw', 0.043), ('hs', 0.042), ('hundred', 0.042), ('variables', 0.042), ('nt', 0.042), ('distribution', 0.041), ('atol', 0.041), ('nhyn', 0.041), ('trg', 0.041), ('cherry', 0.039), ('hw', 0.039), ('nonterminals', 0.037), ('validate', 0.037), ('five', 0.037), ('nist', 0.037), ('induction', 0.036), ('biasing', 0.035), ('cdec', 0.035), ('concurrently', 0.035), ('initialise', 0.035), ('samplers', 0.035), ('stochastically', 0.035), ('subtracted', 0.035), ('pietra', 0.035), ('string', 0.034), ('symbol', 0.033), ('vogel', 0.033), ('association', 0.033), ('bernoulli', 0.032), ('degenerate', 0.032), ('generalisation', 0.032), ('initialising', 0.032), ('pitman', 0.032), ('pitmanyor', 0.032), ('src', 0.032), ('wood', 0.032), ('denero', 0.032), ('ability', 0.032), ('base', 0.031), ('probability', 0.031), ('draws', 0.031), ('rn', 0.031), ('inference', 0.03), ('posterior', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
2 0.1390187 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in the training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTG-based phrasal system, and propose four FDT-based component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.
3 0.12698767 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
Author: Yang Feng ; Yang Liu ; Qun Liu ; Trevor Cohn
Abstract: Decoding algorithms for syntax based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context free grammar. Left-to-right decoding, which generates the target string in order, can improve decoding efficiency by simplifying the language model evaluation. This paper presents a novel left to right decoding algorithm for tree-to-string translation, using a bottom-up parsing strategy and dynamic future cost estimation for each partial translation. Our method outperforms previously published tree-to-string decoders, including a competing left-to-right method.
4 0.12588772 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks. 1
5 0.125352 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrasebased SMT and previously proposed unsupervised syntax induction methods.
6 0.1092387 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
7 0.10196083 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
8 0.10081916 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
9 0.094858281 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
10 0.093108088 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
11 0.088366568 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
12 0.085872442 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
13 0.084071234 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
14 0.081431046 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability
15 0.08052361 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
16 0.07796777 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
17 0.076764919 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
18 0.076057889 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
19 0.074596159 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
20 0.074533984 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
topicId topicWeight
[(0, 0.242), (1, -0.155), (2, -0.15), (3, 0.007), (4, -0.161), (5, 0.008), (6, -0.115), (7, 0.015), (8, -0.005), (9, 0.007), (10, -0.025), (11, 0.153), (12, -0.041), (13, 0.06), (14, -0.175), (15, 0.099), (16, -0.109), (17, 0.057), (18, -0.049), (19, 0.082), (20, -0.015), (21, 0.036), (22, -0.085), (23, 0.168), (24, -0.059), (25, 0.009), (26, -0.073), (27, 0.116), (28, 0.007), (29, -0.024), (30, -0.023), (31, 0.061), (32, 0.0), (33, -0.185), (34, 0.005), (35, 0.134), (36, -0.047), (37, -0.054), (38, -0.051), (39, 0.106), (40, 0.02), (41, 0.072), (42, -0.034), (43, -0.011), (44, -0.046), (45, -0.031), (46, -0.05), (47, 0.018), (48, 0.007), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.95958513 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
2 0.65687376 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in the training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTG-based phrasal system, and propose four FDT-based component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.
3 0.5981766 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability
Author: Ajay Nagesh ; Ganesh Ramakrishnan ; Laura Chiticariu ; Rajasekar Krishnamurthy ; Ankush Dharkar ; Pushpak Bhattacharyya
Abstract: Generic rule-based systems for Information Extraction (IE) have been shown to work reasonably well out-of-the-box, and achieve state-of-the-art accuracy with further domain customization. However, it is generally recognized that manually building and customizing rules is a complex and labor intensive process. In this paper, we discuss an approach that facilitates the process of building customizable rules for Named-Entity Recognition (NER) tasks via rule induction, in the Annotation Query Language (AQL). Given a set of basic features and an annotated document collection, our goal is to generate an initial set of rules with reasonable accuracy, that are interpretable and thus can be easily refined by a human developer. We present an efficient rule induction process, modeled on a fourstage manual rule development process and present initial promising results with our system. We also propose a simple notion of extractor complexity as a first step to quantify the interpretability of an extractor, and study the effect of induction bias and customization ofbasic features on the accuracy and complexity of induced rules. We demonstrate through experiments that the induced rules have good accuracy and low complexity according to our complexity measure.
4 0.58889306 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks. 1
5 0.5290153 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
Author: Yang Feng ; Yang Liu ; Qun Liu ; Trevor Cohn
Abstract: Decoding algorithms for syntax based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context free grammar. Left-to-right decoding, which generates the target string in order, can improve decoding efficiency by simplifying the language model evaluation. This paper presents a novel left to right decoding algorithm for tree-to-string translation, using a bottom-up parsing strategy and dynamic future cost estimation for each partial translation. Our method outperforms previously published tree-to-string decoders, including a competing left-to-right method.
6 0.51620317 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
7 0.46216634 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
8 0.44901168 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
9 0.43175423 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
10 0.42098236 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
11 0.4099232 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
12 0.39976844 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
13 0.39520627 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
14 0.3887158 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
15 0.38691646 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
16 0.38240781 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
17 0.37260854 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
18 0.35238582 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
19 0.31837699 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
20 0.31593788 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
topicId topicWeight
[(2, 0.012), (14, 0.015), (16, 0.027), (25, 0.018), (34, 0.087), (60, 0.067), (63, 0.038), (65, 0.013), (74, 0.538), (76, 0.031), (80, 0.016), (86, 0.021)]
simIndex simValue paperId paperTitle
1 0.97987831 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
Author: Su-Youn Yoon ; Suma Bhat
Abstract: This study presents a novel method that measures English language learners' syntactic competence towards improving automated speech scoring systems. In contrast to most previous studies which focus on the length of production units such as the mean length of clauses, we focused on capturing the differences in the distribution of morpho-syntactic features or grammatical expressions across proficiency. We estimated the syntactic competence through the use of corpus-based NLP techniques. Assuming that the range and sophistication of grammatical expressions can be captured by the distribution of Part-of-Speech (POS) tags, vector space models of POS tags were constructed. We use a large corpus of English learners' responses that are classified into four proficiency levels by human raters. Our proposed feature measures the similarity of a given response with the most proficient group and then estimates the learner's syntactic competence level. Widely outperforming the state-of-the-art measures of syntactic complexity, our method attained a significant correlation with human-rated scores. The correlation between human-rated scores and features based on manual transcription was 0.43 and the same based on the ASR hypothesis was slightly lower, 0.42. An important advantage of our method is its robustness against speech recognition errors, not to mention the simplicity of feature generation that captures a reasonable set of learner-specific syntactic errors.
same-paper 2 0.92374933 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
3 0.87950218 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis
Author: Shafiq Joty ; Giuseppe Carenini ; Raymond Ng
Abstract: We propose a complete probabilistic discriminative framework for performing sentencelevel discourse analysis. Our framework comprises a discourse segmenter, based on a binary classifier, and a discourse parser, which applies an optimal CKY-like parsing algorithm to probabilities inferred from a Dynamic Conditional Random Field. We show on two corpora that our approach outperforms the state-of-the-art, often by a wide margin.
4 0.69307315 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learn- ing, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
5 0.56369627 31 emnlp-2012-Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition
Author: Ping Xu ; Pascale Fung
Abstract: This paper proposes cross-lingual language modeling for transcribing source resourcepoor languages and translating them into target resource-rich languages if necessary. Our focus is to improve the speech recognition performance of low-resource languages by leveraging the language model statistics from resource-rich languages. The most challenging work of cross-lingual language modeling is to solve the syntactic discrepancies between the source and target languages. We therefore propose syntactic reordering for cross-lingual language modeling, and present a first result that compares inversion transduction grammar (ITG) reordering constraints to IBM and local constraints in an integrated speech transcription and translation system. Evaluations on resource-poor Cantonese speech transcription and Cantonese to resource-rich Mandarin translation tasks show that our proposed approach improves the system performance significantly, up to 3.4% relative WER reduction in Cantonese transcription and 13.3% relative bilingual evaluation understudy (BLEU) score improvement in Mandarin transcription compared with the system without reordering.
6 0.53880525 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts
7 0.53240973 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
8 0.52691996 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
9 0.51859331 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
10 0.51611137 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
11 0.51004493 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability
12 0.49006405 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
13 0.48991176 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
14 0.48377368 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
15 0.48168737 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
16 0.4785946 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
17 0.47431189 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
18 0.46037799 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
19 0.45873868 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
20 0.45428333 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing