emnlp emnlp2012 emnlp2012-67 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrase-based SMT and previously proposed unsupervised syntax induction methods.
Reference: text
sentIndex sentText sentNum sentScore
1 This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. [sent-2, score-0.751]
2 We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. [sent-3, score-0.747]
3 Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrase-based SMT and previously proposed unsupervised syntax induction methods. [sent-4, score-0.192]
4 1 Introduction Finding the appropriate word ordering in the target language is one of the most difficult problems for statistical machine translation (SMT) , particularly for language pairs with widely divergent syntax. [sent-5, score-0.199]
5 As a result, there is a large amount of previous research that handles the problem of reordering through the use of improved reordering models for phrase-based SMT (Koehn et al. [sent-6, score-1.108]
6 , 2005) , hierarchical phrase-based translation (Chiang, 2007) , syntax-based translation (Yamada and Knight, 2001) , or preordering (Xia and McCord, 2004) . [sent-7, score-0.334]
7 In particular, systems that use source-language syntax allow for the handling of long-distance reordering without large increases in ... (Footnote: The first author is now affiliated with the Nara Institute of Science and Technology.) [sent-8, score-0.554]
8 In recent work, DeNero and Uszkoreit (2011) suggest that unsupervised grammar induction can be used to create source-sentence parse structure for use in translation as a part of a pre-ordering based translation system. [sent-11, score-0.399]
9 In this work, we present a method for inducing a parser for SMT by training a discriminative model to maximize reordering accuracy while treating the parse tree as a latent variable. [sent-12, score-0.856]
10 As a learning framework, we use online large-margin methods to train the model to directly maximize two measures of reordering accuracy. [sent-13, score-0.554]
11 Experiments find that the proposed model improves both reordering and translation accuracy, leading to average gains of 1. [sent-15, score-0.702]
12 In addition, we show that our model is able to effectively maximize various measures of reordering accuracy, and that the reordering measure that we choose has a direct effect on translation results. [sent-18, score-1.29]
13 2 Preordering for SMT Machine translation is defined as transformation of source sentence F = f1. [sent-19, score-0.188]
14 Figure 1: An example with a source sentence F reordered into target order F′, and its corresponding target sentence E. [sent-28, score-0.187]
15 the pre-ordering approach to machine translation (Xia and McCord, 2004), which performs translation as a two-step process of reordering and translation (Figure 1). [sent-30, score-0.998]
16 , 2003) , which can produce accurate translations when only local reordering is required. [sent-33, score-0.554]
17 However, as building a parser for each source language is a resource-intensive undertaking, there has also been some interest in developing reordering rules without the use of a parser (Rottmann and Vogel, 2007; Tromble and Eisner, 2009; DeNero and Uszkoreit, 2011; Visweswariah et al. [sent-42, score-0.698]
18 First, DeNero and Uszkoreit (2011) learn a reordering model through a three-step process of bilingual grammar induction, training a monolingual parser to reproduce the induced trees, and training a reordering model that selects a reordering based on this parse structure. [sent-45, score-1.851]
19 In contrast, our method trains the model in a single step, treating the parse structure as a latent variable in a discriminative reordering model. [sent-46, score-0.726]
20 Our work is also unique in that we show that it is possible to directly optimize several measures of reordering accuracy, which proves important for achieving good translations. [sent-50, score-0.554]
21 3 Training a Reordering Model with Latent Derivations In this section, we provide a basic overview of the proposed method for learning a reordering model with latent derivations using online discriminative learning. [sent-51, score-0.714]
22 BTGs represent a binary tree derivation D over the source sentence F as shown in Figure 1. [sent-54, score-0.17]
23 Each non-terminal node can either be a straight (str) or inverted (inv) production, and terminals (term) span a nonempty substring f. [sent-55, score-0.225]
24 Each subtree represents a source substring f and its reordered counterpart f′. [sent-57, score-0.283]
25 For each terminal node, no reordering occurs and f is equal to f′. [sent-58, score-0.593]
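To make the derivation-to-reordering mapping concrete, here is a minimal Python sketch (illustrative only; the node structure and names are not from the paper) that computes the reordered sequence from a BTG derivation by swapping the children of inverted nodes:

    # Minimal sketch of a BTG derivation node and the reordering it induces.
    # Class and field names are illustrative, not taken from the paper.
    class Node:
        def __init__(self, label, children=None, span=None):
            self.label = label            # 'str', 'inv', or 'term'
            self.children = children or []
            self.span = span              # (l, r): source indices covered by a terminal

    def reorder(node, F):
        """Return the reordered word sequence F' produced by a derivation."""
        if node.label == 'term':
            l, r = node.span
            return F[l:r + 1]             # terminals keep their source order
        left, right = (reorder(c, F) for c in node.children)
        if node.label == 'str':           # straight production: keep child order
            return left + right
        return right + left               # inverted production: swap the children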
26 (2011) also optimizes reordering accuracy, but requires manually annotated parses as seed data. [sent-60, score-0.599]
27 Footnote 2: In the original BTG framework used in translation, terminals produce a bilingual substring pair f/e, but as we are only interested in reordering the source F, we simplify the model by removing the target substring e. [sent-61, score-0.732]
28 We define the space of all reorderings that can be produced by the BTG as F′, and attempt to find the best reordering within this space. [sent-63, score-0.711]
29 2 Reorderings with Latent Derivations In order to find the best reordering given only the information in the source-side sentence F, we define a scoring function S(F′|F), and choose the ordering of maximal score: F̂′ = argmax_{F′} S(F′|F). [sent-65, score-0.645]
30 As our model is based on reorderings licensed by BTG derivations, we also assume that there is an underlying derivation D that produced F′. [sent-66, score-0.287]
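As a rough illustration of this latent-derivation view, the sketch below scores each candidate reordering by the best derivation that yields it (an assumption that derivations can be enumerated explicitly; the paper instead works with a packed parse forest, described later). It reuses the reorder() sketch given above:

    # Hypothetical sketch: the derivation is latent, so a reordering F' is
    # scored by the highest-scoring derivation that produces it.
    def best_reordering(F, derivations, score):
        best = {}
        for D in derivations:                  # derivations licensed by the BTG
            F_prime = tuple(reorder(D, F))
            s = score(D, F)
            if F_prime not in best or s > best[F_prime]:
                best[F_prime] = s
        return max(best, key=best.get)         # argmax over F' of S(F'|F)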
31 Footnote 3: BTGs cannot reproduce all possible reorderings, but can handle most reorderings occurring in natural translated text (Haghighi et al. [sent-71, score-0.191]
32 Figure 2: An example of (a) the ranking function r(fj), (b) loss according to Kendall’s τ, (c) loss according to chunk fragmentation. [sent-73, score-0.522]
33 This section explains how to calculate oracle reorderings, and assign each F′ a loss and an accuracy according to how well it reproduces the oracle. [sent-75, score-0.399]
34 1 Calculating Oracle Orderings In order to calculate reordering quality, we first define a ranking function r(fj|F, A), which indicates the relative position of source word fj in the proper target order (Figure 2 (a)). [sent-77, score-0.74]
35 , aJ, where each aj is a set of the indices of the words in E to which fj is aligned. [sent-81, score-0.181]
36 We can now define measures of reordering accuracy for F′ by how well it arranges the words in order of ascending rank. [sent-92, score-0.598]
37 It should be noted that as we allow ties in rank, there are multiple possible F′ where all words are in ascending order, which we will call oracle orderings. [sent-93, score-0.159]
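The sketch below shows one way such ranks could be computed from an alignment, together with one oracle ordering. It is an assumption for illustration: the paper’s exact handling of ties and unaligned words may differ, and here ranks start at 1 so that a boundary rank of 0 (used later for chunk fragmentation) precedes all words.

    # Hypothetical rank computation: A[j] is the set of target indices aligned
    # to source word j. Words aligned to earlier target positions get lower
    # ranks, and words with identical alignment keys share a rank (ties).
    def ranks_from_alignment(A):
        keys = [min(a) if a else float('inf') for a in A]
        rank_of_key = {k: i for i, k in enumerate(sorted(set(keys)), start=1)}
        return [rank_of_key[k] for k in keys]

    # An oracle ordering arranges the source positions in ascending rank order.
    def oracle_ordering(r):
        return sorted(range(len(r)), key=lambda j: r[j])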
38 2 Kendall’s τ The first measure of reordering accuracy that we will consider is Kendall’s τ (Kendall, 1938) , a measure of pairwise rank correlation which has been proposed for evaluating translation reordering accuracy (Isozaki et al. [sent-95, score-1.386]
39 Because j1 < j2 (f′_{j1} comes before f′_{j2} in the reordered sentence), the ranks should be r(f′_{j1}) ≤ r(f′_{j2}) in order to produce the correct ordering. [sent-100, score-0.191]
40 Based on this criterion, we first define a loss Lt(F′) that will be higher for orderings that are further from the oracle. [sent-101, score-0.249]
41 To calculate an accuracy measure for ordering F′, we first calculate the maximum loss for the sentence, which is equal to the total number of non-equal rank comparisons in the sentence (footnote 5): max_{F̃′} Lt(F̃′) = ∑_{j1=1}^{J−1} ∑_{j2=j1+1}^{J} δ(r(f′_{j1}) ≠ r(f′_{j2})). (1) [sent-104, score-0.421]
42 Footnote 5: The traditional formulation of Kendall’s τ assumes no ties in rank, and thus the maximum loss can be calculated as J(J − 1)/2. [sent-105, score-0.243]
43 Finally, we use this maximum loss to normalize the actual loss to get an accuracy At(F′) = 1 − Lt(F′) / max_{F̃′} Lt(F̃′), which will take a value between 0 (when F′ has maximal loss) and 1 (when F′ matches one of the oracle orderings). [sent-106, score-0.554]
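A direct, quadratic-time sketch of these two quantities, assuming r_prime[j] is the rank of the j-th word of the reordered sentence and that a pair is penalized exactly when its ranks are out of order:

    # Sketch of the Kendall's tau loss and its normalization into an accuracy.
    def kendall_loss(r_prime):
        J = len(r_prime)
        pairs = [(j1, j2) for j1 in range(J) for j2 in range(j1 + 1, J)]
        loss = sum(1 for j1, j2 in pairs if r_prime[j1] > r_prime[j2])      # discordant pairs
        max_loss = sum(1 for j1, j2 in pairs if r_prime[j1] != r_prime[j2]) # Equation (1)
        return loss, max_loss

    def kendall_accuracy(r_prime):
        loss, max_loss = kendall_loss(r_prime)
        return 1.0 if max_loss == 0 else 1.0 - loss / max_loss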
44 3 Chunk Fragmentation Another measure that has been used in evaluation of translation accuracy (Banerjee and Lavie, 2005) and pre-ordering accuracy (Talbot et al. [sent-110, score-0.236]
45 To account for this, when calculating the chunk fragmentation score, we additionally add two sentence boundary words f0 and fJ+1 with ranks r(f0) = 0 and r(fJ+1) = 1 + max_{f′ ∈ F′} r(f′), and redefine the summation in Equation (2) to consider these words (e. [sent-114, score-0.302]
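A sketch of the chunk fragmentation loss with the boundary words added. The discont() test used here (two neighbouring words of the reordered sentence must have equal or consecutive ranks) is an assumption consistent with the description above, not necessarily the paper’s exact definition:

    # Sketch of the chunk fragmentation loss L_c over a reordered sentence,
    # with boundary ranks r(f_0) = 0 and r(f_{J+1}) = 1 + max rank.
    def chunk_fragmentation_loss(r_prime):
        ranks = [0] + list(r_prime) + [max(r_prime) + 1]
        def discont(a, b):
            return 0 if b in (a, a + 1) else 1
        return sum(discont(ranks[i], ranks[i + 1]) for i in range(len(ranks) - 1))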
46 Figure 3 (training update): Ḋ ← argmax_{D ∈ D} S(D|F, w) + L(D|F, A) (find the model parse); D̂ ← argmin_{D ∈ D} L(D|F, A) − αS(D|F, w) (find the oracle parse); if L(D̂|F, A) ≠ L(Ḋ|F, A) then w ← β(w + γ(φ(D̂, F) − φ(Ḋ, F))). [sent-120, score-0.32]
47 5 Learning a BTG Parser for Reordering Now that we have a definition of loss over reorderings produced by the model, we have a clear learning objective: we would like to find reorderings F′ with low loss. [sent-126, score-0.512]
48 The learning algorithm we use to achieve this goal is motivated by discriminative training for machine translation systems (Liang et al. [sent-127, score-0.184]
49 minimal loss (the oracle parse), and updating w if these two parses diverge (Figure 3). [sent-133, score-0.357]
50 In order to create both of these parses efficiently, we first create a parse forest encoding a large number of derivations Di according to the model scores. [sent-134, score-0.289]
51 Next, we find the model parse Ḋk, which is the parse in the forest that maximizes the sum of the model score and the loss, S(Dk|Fk, w) + L(Dk|Fk, Ak). [sent-135, score-0.454]
52 We also find an oracle parse which is selected solely to minimize the loss L(Dk |Fk, Ak) . [sent-139, score-0.415]
53 One important difference between the model we describe here and traditional parsing models is that the target derivation is a latent variable. [sent-140, score-0.163]
54 Because many Dk achieve a particular reordering F′, many reorderings F′ are able to minimize the loss L(F′k|Fk, Ak). [sent-141, score-0.909]
55 DeNero and Uszkoreit (2011) resolve this ambiguity with four features with empirically tuned scores before training a monolingual parser and reordering model. [sent-143, score-0.606]
56 In contrast, we follow previous work on discriminative learning with latent variables (Yu and Joachims, 2009) , and break ties within the pool of oracle derivations by selecting the derivation with the largest model score. [sent-144, score-0.449]
57 From an implementation point of view, this can be done by finding the oracle derivation D̂k that minimizes L(Dk|Fk, Ak) − αS(Dk|Fk, w), where α is a constant small enough to ensure that the effect of the loss will always be greater than the effect of the score. [sent-145, score-0.328]
58 Finally, if the model parse has a loss that is greater than that of the oracle parse Dˆk, we update the weights to increase the score of the oracle parse and decrease the score of the model parse. [sent-146, score-0.735]
59 To perform this full process, given a source sentence Fk, alignment Ak, and model weights w, we need to be able to efficiently calculate scores, calculate losses, and create parse forests for derivations Dk, the details of which will be explained in the following sections. [sent-150, score-0.32]
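Putting the pieces together, the training procedure of Figure 3 can be sketched as follows. This is schematic only: parse_forest, score, loss, and phi stand in for the components described in this section, and the forest is treated as an explicit list of derivations for readability rather than as a packed structure.

    # Schematic online large-margin training with latent derivations.
    def train(corpus, w, alpha=1e-4, gamma=0.1, epochs=10):
        for _ in range(epochs):
            for F, A in corpus:                      # source sentence and its alignment
                forest = parse_forest(F, w)          # candidate derivations
                # model parse: maximize the loss-augmented score S(D) + L(D)
                D_model = max(forest, key=lambda D: score(D, F, w) + loss(D, F, A))
                # oracle parse: minimize the loss, breaking ties by model score
                D_oracle = min(forest, key=lambda D: loss(D, F, A) - alpha * score(D, F, w))
                if loss(D_model, F, A) > loss(D_oracle, F, A):
                    for k, v in phi(D_oracle, F).items():   # promote oracle features
                        w[k] = w.get(k, 0.0) + gamma * v
                    for k, v in phi(D_model, F).items():    # demote model-parse features
                        w[k] = w.get(k, 0.0) - gamma * v
        return w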
60 2 Scoring Derivation Trees First, we must consider how to efficiently assign scores S(D|F, w) to a derivation or forest during parsing. [sent-152, score-0.18]
61 The most standard and efficient way to do so is to create local features that can be calculated based only on the information included in a single node d in the derivation tree. [sent-153, score-0.164]
62 To ease explanation, we represent each node in the derivation as d = ⟨s, l, c, c + 1, r⟩, where s is the node’s symbol (str, inv, or term), while l and r are the leftmost and rightmost indices of the span that d covers. [sent-156, score-0.253]
63 fr in a supervised parse tree, and the intersection of the three labels. [sent-178, score-0.159]
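A minimal sketch of this factored scoring, where the score of a derivation is the sum of the local scores of its nodes. The feature templates and the node attributes (label, left, right, nodes()) are illustrative stand-ins, not the exact templates of this section:

    # Score of a derivation = sum over its nodes of w . phi(d), where each
    # feature looks only at the node's own symbol and span.
    def node_features(d, F):
        s, l, r = d.label, d.left, d.right        # node symbol and span [l, r]
        return {
            f'sym={s}': 1.0,
            f'sym={s},first={F[l]}': 1.0,         # leftmost word of the span
            f'sym={s},last={F[r]}': 1.0,          # rightmost word of the span
            f'sym={s},len={r - l + 1}': 1.0,
        }

    def score_derivation(D, F, w):
        return sum(w.get(k, 0.0) * v
                   for d in D.nodes()              # all nodes in the derivation tree
                   for k, v in node_features(d, F).items())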
64 3 Finding Losses for Derivation Trees The above features φ and their corresponding weights w are all that are needed to calculate scores of derivation trees at test time. [sent-183, score-0.208]
65 However, during training, it is also necessary to find model parses according to the loss-augmented scoring function S(D|F, w) + L(D|F, A), or oracle parses according to the loss L(D|F, A). [sent-184, score-0.204]
66 In this section, we demonstrate that the loss L(d|F, A) for the evaluation measures we define in Section 4 can (mostly) be factored over nodes in a fashion similar to features. [sent-187, score-0.233]
67 For non-terminal nodes, we first focus on straight non-terminals with parent node d = ⟨str, l, c, c + 1, r⟩, and left and right child nodes dl = ⟨sl, l, lc, lc + 1, c⟩ and dr = ⟨sr, c + 1, rc, rc + 1, r⟩. [sent-192, score-0.197]
68 Lt(d|F, A) = Lt(dl|F, A) + Lt(dr|F, A) + ∑_{j1=l}^{c} ∑_{j2=c+1}^{r} δ(r(f_{j1}) > r(f_{j2})). In other words, the subtree’s total loss can be factored into the loss of its left subtree, the loss of its right subtree, and the additional loss contributed by comparisons between the words spanning both subtrees. [sent-195, score-0.874]
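For example, the cross-span term for a straight non-terminal could be computed as in the sketch below, assuming (consistent with the text) that a pair contributes loss when its ranks are out of order; the inverted case would swap the direction of the comparison:

    # Factored Kendall's tau loss for a straight non-terminal spanning [l, r_end]
    # and split at c: the children's losses plus the misordered cross-span pairs.
    def kendall_loss_straight(l, c, r_end, rank, loss_left, loss_right):
        cross = sum(1 for j1 in range(l, c + 1)
                      for j2 in range(c + 1, r_end + 1)
                      if rank[j1] > rank[j2])
        return loss_left + loss_right + cross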
69 2 Factoring Chunk Fragmentation Chunk fragmentation loss can be factored in a similar fashion. [sent-199, score-0.376]
70 First, it is clear that the loss for the terminal nodes can be calculated efficiently in a fashion similar to Equation (3) . [sent-200, score-0.306]
71 In order to calculate the loss for non-terminals d, we note that the summation in Equation (2) can be divided into the sum over the internal bi-grams in the left and right subtrees, and the bi-gram spanning the reordered trees: Lc(d|F, A) = Lc(dl|F, A) + Lc(dr|F, A) + discont(f′_c, f′_{c+1}). [sent-201, score-0.459]
72 However, unlike Kendall’s τ, this equation relies not on the ranks of f_c and f_{c+1} in the original sentence, but on the ranks of f′_c and f′_{c+1} in the reordered sentence. [sent-202, score-0.304]
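A corresponding sketch of the chunk fragmentation recursion, where the two boundary ranks are the ranks (in target order) of the last word of the reordered left sub-span and the first word of the reordered right sub-span; having to track these boundary ranks is what enlarges the parse items, as discussed next. The discont() test is the same assumption as in the earlier chunk fragmentation sketch:

    # Factored chunk fragmentation loss for a non-terminal node:
    # Lc(d) = Lc(d_l) + Lc(d_r) + discont(boundary of left, boundary of right).
    def chunk_loss_nonterminal(loss_left, loss_right, rank_last_left, rank_first_right):
        def discont(a, b):
            return 0 if b in (a, a + 1) else 1
        return loss_left + loss_right + discont(rank_last_left, rank_first_right)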
73 4 Parsing Derivation Trees Finally, we must be able to create a parse forest from which we select model and oracle parses. [sent-206, score-0.267]
74 However, when keeping track of target positions for calculation of chunk fragmentation loss, there are a total of O(J^5) nodes, an unreasonable burden in terms of time and memory. [sent-208, score-0.258]
75 6 Experiments Our experiments test the reordering and translation accuracy of translation systems using the proposed method. [sent-210, score-0.894]
76 As reordering metrics, we use Kendall’s τ and chunk fragmentation (Talbot et al. [sent-211, score-0.812]
77 Table 1: The number of sentences and words for training and testing the reordering model (RM), translation model (TM), and language model (LM). [sent-234, score-0.187]
78 Footnote 7: Except when stated otherwise, lader was trained to minimize chunk fragmentation loss with a cube pruning stack pop limit of 50, and the regularization constant of 10^−3 (chosen through cross-validation). [sent-236, score-0.683]
79 We use the designated development and test sets of manually created alignments as training data for the reordering models, removing sentences of more than 60 words. [sent-239, score-0.625]
80 As default features for lader and the monolingual parsing and reordering models in 3-step, we use all the features described in Section 5. [sent-240, score-0.781]
81 1 Effect of Pre-ordering Table 2 shows reordering and translation results for orig, 3-step, and lader. [sent-249, score-0.702]
82 It can be seen that the proposed lader outperforms the baselines in both reordering and translation. [sent-250, score-0.781]
83 There are a number of reasons why lader outperforms 3-step. [sent-251, score-0.227]
84 First, the pipeline of 3-step suffers from error propagation, with errors in monolingual parsing and reordering resulting in low overall accuracy. [sent-252, score-0.554]
85 1 describes, lader breaks ties between oracle parses based on model score, allowing easy-to-reproduce model parses to be chosen during training. [sent-254, score-0.476]
86 In fact, lader generally found trees that followed from syntactic constituency, while 3-step more often used terminal nodes. Footnote 8: In addition, following the example of Sudoh et al. [sent-255, score-0.336]
87 Footnote 10: When using oracle parses, chunk accuracy was up to 81%, showing that parsing errors are highly detrimental. [sent-259, score-0.284]
88 2 shows in detail, the ability of lader to maximize reordering accuracy directly allows for improved reordering and translation results. [sent-271, score-1.561]
89 30 for English-Japanese and Japanese-English respectively, approximately matching lader in accuracy, but with a significant decrease in decoding speed. [sent-276, score-0.227]
90 Further, when pre-ordering with lader and hierarchical phrase-based SMT were combined, BLEU scores rose to 23. [sent-277, score-0.227]
91 2 Effect of Training Loss Table 3 shows results when one of three losses is optimized during training: chunk fragmentation (Lc) , Kendall’s τ (Lt) , or the linear interpolation of the two with weights chosen so that both losses contribute equally (Lt + Lc) . [sent-281, score-0.41]
92 (2011) , who find chunk fragmentation is better correlated with translation accuracy than Kendall’s τ. [sent-284, score-0.45]
93 lader is able to improve over the orig baseline in all cases, but when equal numbers of manual and automatic alignments are used, the reorderer trained on manual alignments is significantly better. [sent-293, score-0.49]
94 7 Conclusion We presented a method for learning a discriminative parser to maximize reordering accuracy for machine translation. [sent-295, score-0.72]
95 Improving Arabic-to-English statistical machine translation by reordering post-verbal subjects for alignment. [sent-309, score-0.702]
96 Automatically learning source-side reordering rules for large scale machine translation. [sent-354, score-0.554]
97 Head finalization: A simple reordering rule for SOV languages. [sent-369, score-0.554]
98 A probabilistic approach to syntax-based reordering for statistical machine translation. [sent-415, score-0.554]
99 Word reordering in statistical machine translation with a POS-based distortion model. [sent-446, score-0.702]
100 Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation. [sent-456, score-0.594]
wordName wordTfidf (topN-words)
[('reordering', 0.554), ('lader', 0.227), ('loss', 0.198), ('kendall', 0.177), ('reorderings', 0.157), ('lt', 0.156), ('translation', 0.148), ('reordered', 0.147), ('dk', 0.132), ('fragmentation', 0.132), ('derivation', 0.13), ('chunk', 0.126), ('oracle', 0.114), ('sudoh', 0.113), ('parse', 0.103), ('fj', 0.103), ('uszkoreit', 0.103), ('fk', 0.095), ('derivations', 0.091), ('btg', 0.088), ('denero', 0.087), ('smt', 0.084), ('hajime', 0.081), ('discont', 0.076), ('katsuhito', 0.076), ('losses', 0.076), ('isozaki', 0.073), ('alignments', 0.071), ('fc', 0.069), ('kyoto', 0.068), ('inv', 0.065), ('tsukada', 0.065), ('dl', 0.064), ('talbot', 0.063), ('dr', 0.06), ('neubig', 0.059), ('str', 0.059), ('tromble', 0.059), ('cfg', 0.057), ('fl', 0.057), ('duh', 0.057), ('mccord', 0.057), ('shinsuke', 0.057), ('koehn', 0.057), ('fr', 0.056), ('ak', 0.055), ('inverted', 0.054), ('parser', 0.052), ('ordering', 0.051), ('substring', 0.051), ('orderings', 0.051), ('forest', 0.05), ('visweswariah', 0.049), ('orig', 0.049), ('flannery', 0.049), ('span', 0.046), ('graham', 0.046), ('factored', 0.046), ('subtree', 0.045), ('ties', 0.045), ('parses', 0.045), ('accuracy', 0.044), ('ranks', 0.044), ('indices', 0.044), ('calculate', 0.043), ('rank', 0.042), ('hideki', 0.041), ('alexandra', 0.04), ('xia', 0.04), ('source', 0.04), ('bleu', 0.039), ('terminal', 0.039), ('rc', 0.039), ('lex', 0.038), ('straight', 0.038), ('factoring', 0.038), ('hsr', 0.038), ('hterm', 0.038), ('khalilov', 0.038), ('pegasos', 0.038), ('preordering', 0.038), ('rottmann', 0.038), ('franz', 0.037), ('birch', 0.036), ('manual', 0.036), ('discriminative', 0.036), ('spanning', 0.036), ('terminals', 0.036), ('nodes', 0.035), ('trees', 0.035), ('index', 0.035), ('maximize', 0.034), ('reproduce', 0.034), ('aj', 0.034), ('chiang', 0.034), ('dyer', 0.034), ('calculated', 0.034), ('latent', 0.033), ('pos', 0.033), ('leftmost', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrasebased SMT and previously proposed unsupervised syntax induction methods.
2 0.32976928 31 emnlp-2012-Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition
Author: Ping Xu ; Pascale Fung
Abstract: This paper proposes cross-lingual language modeling for transcribing source resourcepoor languages and translating them into target resource-rich languages if necessary. Our focus is to improve the speech recognition performance of low-resource languages by leveraging the language model statistics from resource-rich languages. The most challenging work of cross-lingual language modeling is to solve the syntactic discrepancies between the source and target languages. We therefore propose syntactic reordering for cross-lingual language modeling, and present a first result that compares inversion transduction grammar (ITG) reordering constraints to IBM and local constraints in an integrated speech transcription and translation system. Evaluations on resource-poor Cantonese speech transcription and Cantonese to resource-rich Mandarin translation tasks show that our proposed approach improves the system performance significantly, up to 3.4% relative WER reduction in Cantonese transcription and 13.3% relative bilingual evaluation understudy (BLEU) score improvement in Mandarin transcription compared with the system without reordering.
3 0.26790011 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTG-based phrasal system, and propose four FDT-based component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.
4 0.1558222 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu
Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We pro- pose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.
5 0.14120513 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black
Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the transla- . tion quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.
6 0.125352 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
7 0.1208532 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
8 0.11900713 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
9 0.1177851 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
10 0.11068217 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
11 0.10892491 55 emnlp-2012-Forest Reranking through Subtree Ranking
12 0.10401349 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
13 0.096939318 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
14 0.076564498 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
15 0.074325487 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
16 0.071823873 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
17 0.071487248 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
18 0.070050836 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
19 0.067080684 81 emnlp-2012-Learning to Map into a Universal POS Tagset
20 0.062583245 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
topicId topicWeight
[(0, 0.248), (1, -0.274), (2, -0.216), (3, -0.044), (4, -0.104), (5, -0.134), (6, -0.192), (7, 0.094), (8, 0.006), (9, -0.093), (10, -0.131), (11, 0.004), (12, -0.066), (13, 0.119), (14, -0.16), (15, 0.096), (16, 0.061), (17, 0.107), (18, 0.268), (19, 0.002), (20, 0.1), (21, -0.171), (22, -0.104), (23, -0.16), (24, 0.018), (25, 0.066), (26, 0.238), (27, 0.096), (28, 0.114), (29, 0.007), (30, 0.023), (31, 0.018), (32, -0.013), (33, 0.029), (34, 0.096), (35, -0.106), (36, -0.004), (37, -0.002), (38, 0.027), (39, -0.053), (40, -0.027), (41, 0.022), (42, 0.069), (43, 0.032), (44, 0.003), (45, 0.054), (46, -0.014), (47, 0.02), (48, -0.025), (49, -0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.94151598 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrasebased SMT and previously proposed unsupervised syntax induction methods.
2 0.89278263 31 emnlp-2012-Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition
Author: Ping Xu ; Pascale Fung
Abstract: This paper proposes cross-lingual language modeling for transcribing source resourcepoor languages and translating them into target resource-rich languages if necessary. Our focus is to improve the speech recognition performance of low-resource languages by leveraging the language model statistics from resource-rich languages. The most challenging work of cross-lingual language modeling is to solve the syntactic discrepancies between the source and target languages. We therefore propose syntactic reordering for cross-lingual language modeling, and present a first result that compares inversion transduction grammar (ITG) reordering constraints to IBM and local constraints in an integrated speech transcription and translation system. Evaluations on resource-poor Cantonese speech transcription and Cantonese to resource-rich Mandarin translation tasks show that our proposed approach improves the system performance significantly, up to 3.4% relative WER reduction in Cantonese transcription and 13.3% relative bilingual evaluation understudy (BLEU) score improvement in Mandarin transcription compared with the system without reordering.
3 0.71563494 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTG-based phrasal system, and propose four FDT-based component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.
4 0.44744387 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu
Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We pro- pose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.
5 0.40711391 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
Author: Junsheng Zhou ; Weiguang Qu ; Fen Zhang
Abstract: Most existing systems solved the phrase chunking task with the sequence labeling approaches, in which the chunk candidates cannot be treated as a whole during parsing process so that the chunk-level features cannot be exploited in a natural way. In this paper, we formulate phrase chunking as a joint segmentation and labeling task. We propose an efficient dynamic programming algorithm with pruning for decoding, which allows the direct use of the features describing the internal characteristics of chunk and the features capturing the correlations between adjacent chunks. A relaxed, online maximum margin training algorithm is used for learning. Within this framework, we explored a variety of effective feature representations for Chinese phrase chunking. The experimental results show that the use of chunk-level features can lead to significant performance improvement, and that our approach achieves state-of-the-art performance. In particular, our approach is much better at recognizing long and complicated phrases. 1
6 0.37076545 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
7 0.35869923 55 emnlp-2012-Forest Reranking through Subtree Ranking
8 0.34657678 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
9 0.34279194 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
10 0.32079288 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
11 0.31033856 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
12 0.28517708 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
13 0.28187913 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
14 0.27271307 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
15 0.2401244 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
16 0.23453142 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
17 0.22766277 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation
18 0.21948226 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
19 0.21477151 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
20 0.21363793 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
topicId topicWeight
[(2, 0.012), (11, 0.022), (14, 0.028), (16, 0.035), (25, 0.013), (34, 0.124), (60, 0.079), (63, 0.04), (64, 0.01), (65, 0.013), (70, 0.03), (74, 0.079), (75, 0.318), (76, 0.04), (79, 0.013), (80, 0.012), (86, 0.021), (95, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.77292067 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori
Abstract: This paper proposes a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrasebased SMT and previously proposed unsupervised syntax induction methods.
2 0.50339669 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTG-based phrasal system, and propose four FDT-based component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.
3 0.464504 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks. 1
4 0.45864713 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black
Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the transla- . tion quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.
5 0.4565281 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
Author: David Burkett ; Dan Klein
Abstract: We describe a transformation-based learning method for learning a sequence of monolingual tree transformations that improve the agreement between constituent trees and word alignments in bilingual corpora. Using the manually annotated English Chinese Translation Treebank, we show how our method automatically discovers transformations that accommodate differences in English and Chinese syntax. Furthermore, when transformations are learned on automatically generated trees and alignments from the same domain as the training data for a syntactic MT system, the transformed trees achieve a 0.9 BLEU improvement over baseline trees.
6 0.45118129 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
7 0.4504984 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
8 0.45031789 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
9 0.44848031 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
10 0.4478125 69 emnlp-2012-Joining Forces Pays Off: Multilingual Joint Word Sense Disambiguation
11 0.44756755 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
12 0.44712931 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
13 0.44706783 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
14 0.44652137 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
15 0.445943 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
16 0.44565967 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
17 0.4443188 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
18 0.4437587 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
19 0.44241801 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
20 0.44213322 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM