acl acl2012 acl2012-141 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. [sent-2, score-1.022]
2 In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. [sent-3, score-0.485]
3 For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. [sent-4, score-0.942]
4 The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. [sent-6, score-0.383]
5 In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks. [sent-7, score-0.425]
6 Introduction Discriminative training is an active area in statistical machine translation (SMT). [sent-9, score-0.411]
7 Och (2003) proposed using a log-linear model to incorporate multiple features for translation, and proposed a minimum error rate training (MERT) method to train the feature weights to optimize a desirable translation metric. [sent-17, score-0.618]
8 While the log-linear model itself is discriminative, the phrase and lexicon translation features, which are among the most important components of SMT, are derived from either generative models or heuristics (Koehn et al.). [sent-18, score-0.89]
9 The parameters in the phrase and lexicon translation models are estimated by relative frequency or by maximizing joint likelihood, which may not correspond closely to the translation measure. [sent-22, score-1.368]
10 Therefore, it is desirable to train all these parameters to directly maximize an objective that links to translation quality. [sent-26, score-0.507]
11 However, there are a large number of parameters in these models, making discriminative training for them non-trivial. [sent-27, score-0.247]
12 Since many of the reference translations are non-reachable, an empirical local updating strategy had to be devised to fix this problem by picking a pseudo reference. [sent-34, score-0.262]
13 However, the number of parameters in common phrase and lexicon translation models is much larger. [sent-38, score-0.955]
14 In this work, we present a new, highly effective discriminative learning method for phrase and lexicon translation models. [sent-39, score-0.935]
15 The training objective is an expected BLEU score, which is closely linked to translation quality. [sent-40, score-0.628]
16 For effective optimization, we derive updating formulas of growth transformation (GT) for phrase and lexicon translation probabilities. [sent-42, score-1.132]
17 A GT is a transformation of the probabilities that guarantees strict non-decrease of the objective over each GT iteration unless a local maximum is reached. [sent-43, score-0.249]
18 Related Work One of the best-known approaches to discriminative training for SMT was proposed by Och (2003). [sent-58, score-0.228]
19 (2006) proposed a large set of lexical and Part-of-Speech features in addition to the phrase translation model. [sent-62, score-0.629]
20 In that paper, the authors pointed out that forcing the model to update towards the reference translation could be problematic. [sent-64, score-0.434]
21 This is because the hidden structure such as phrase segmentation and alignment could be abused if the system is forced to produce a reference translation. [sent-65, score-0.353]
22 Therefore, instead of pushing the parameter update towards the reference translation (a.k.a. [sent-66, score-0.472]
23 bold updating), the author proposed a local updating strategy where the model parameters are updated towards a pseudo-reference. [sent-69, score-0.286]
24 This avoids the heuristics of picking the updating reference and therefore gives a more principled way of setting the training objective. [sent-75, score-0.351]
25 In our work, we have many more parameters to train, and the training is conducted on the entire training corpus. [sent-79, score-0.239]
26 (2011) proposed using a differentiable expected BLEU score as the objective to train system combination parameters. [sent-82, score-0.305]
27 In these earlier works, however, the phrase and lexicon translation models used remained unchanged. [sent-86, score-0.89]
28 Another line of research that is closely related to our work is phrase table refinement and pruning. [sent-87, score-0.304]
29 (2010) proposed a method to train the phrase translation model using the Expectation-Maximization algorithm with a leave-one-out strategy. [sent-89, score-0.663]
30 The parallel sentences were forced to be aligned at the phrase level using the phrase table and other features as in a decoding process. [sent-90, score-0.639]
31 Then the phrase translation probabilities were estimated based on the phrase alignments. [sent-91, score-0.898]
32 To prevent overfitting, the statistics of phrase pairs from a particular sentence were excluded from the phrase table when aligning that sentence. [sent-92, score-0.555]
33 Forced alignment between a source sentence and its reference translation was tricky, and the proposed alignment was likely to be unreliable. [sent-95, score-0.534]
34 Phrase-based Translation System The translation process of phrase-based SMT can be briefly described in three steps: segment the source sentence into a sequence of phrases, translate each source phrase into a target phrase, and re-order the target phrases into the target sentence (Koehn et al.). [sent-98, score-0.831]
35 Features used in a phrase-based system usually include an LM, a reordering model, word and phrase counts, and phrase and lexicon translation models. [sent-135, score-1.099]
36 Given the focus of this paper, we review only the phrase and lexicon translation models below. [sent-136, score-0.89]
37 Phrase translation model A set of phrase pairs is extracted from the word-aligned parallel corpus according to phrase extraction rules (Koehn et al.). [sent-139, score-0.876]
38 Phrase translation probabilities are then computed as relative frequencies of phrases over the training dataset. [sent-141, score-0.467]
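As a rough illustration only (not the authors' code), relative-frequency estimation of the forward phrase translation probabilities can be sketched in Python as follows, assuming a list of extracted (source phrase, target phrase) pairs:

    from collections import defaultdict

    def relative_frequency_phrase_table(phrase_pairs):
        # phrase_pairs: iterable of (source_phrase, target_phrase) tuples
        # extracted from the word-aligned parallel corpus.
        joint = defaultdict(float)     # count(source, target)
        marginal = defaultdict(float)  # count(source)
        for src, tgt in phrase_pairs:
            joint[(src, tgt)] += 1.0
            marginal[src] += 1.0
        # forward probability p(target | source) as a relative frequency
        return {(src, tgt): c / marginal[src] for (src, tgt), c in joint.items()}

The backward probabilities p(source | target) are obtained in the same way with the roles of the two sides swapped.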
39 In translation, the input sentence is segmented into K phrases, and the source-to-target forward phrase (FP) translation feature is scored as the sum of log forward phrase translation probabilities over the K phrase pairs, $h_{FP} = \sum_{k=1}^{K} \log p(\bar{e}_k \mid \bar{f}_k)$, where $\bar{f}_k$ and $\bar{e}_k$ denote the k-th source and target phrases. [sent-163, score-0.663]
40 The target-to-source (backward) phrase translation model is defined similarly. [sent-177, score-0.583]
41 Lexicon translation model There are several variations of lexicon translation features (Ayan and Dorr, 2006; Koehn et al.). [sent-180, score-0.905]
42 We use the word translation table from IBM Model 1 (Brown et al. [sent-183, score-0.324]
43 , 1993) and compute the sum over all possible word alignments within a phrase pair without normalizing for length (Quirk et al.). [sent-184, score-0.259]
44 The source-to-target forward lexicon (FL) translation feature is $h_{FL} = \sum_{k=1}^{K} \log \prod_{m} \sum_{r} p(e_{k,m} \mid f_{k,r})$, where $e_{k,m}$ is the m-th word of the k-th target phrase and $f_{k,r}$ is the r-th word of the k-th source phrase. [sent-186, score-0.624]
45 The target-to-source (backward) lexicon translation model is defined similarly. [sent-224, score-0.581]
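The following is a minimal sketch (ours, not the paper's implementation) of this lexicon feature for a segmented hypothesis: for each target word in a phrase pair, Model 1 probabilities are summed over the source words of that phrase, with no length normalization, and the log scores are accumulated over phrase pairs. Adding a NULL source word is a common variant and is left out here.

    import math

    def forward_lexicon_score(src_phrase, tgt_phrase, word_trans_prob, floor=1e-10):
        # word_trans_prob[(f, e)] = p(e | f) from an IBM Model 1 word translation table.
        score = 0.0
        for e in tgt_phrase:
            s = sum(word_trans_prob.get((f, e), 0.0) for f in src_phrase)
            score += math.log(max(s, floor))  # floor avoids log(0)
        return score

    def forward_lexicon_feature(phrase_pairs, word_trans_prob):
        # Sum the per-phrase-pair scores over the K phrase pairs of a hypothesis.
        return sum(forward_lexicon_score(f, e, word_trans_prob) for f, e in phrase_pairs)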
46 θ denotes the set of all the parameters to be optimized, including the forward phrase and lexicon translation probabilities and their backward counterparts. [sent-229, score-1.059]
47 Here, $E_n^*$ denotes the reference translation of the n-th source sentence $F_n$. [sent-267, score-0.443]
48 An N-best list denotes the list of translation hypotheses of the source sentence $F_n$. [sent-274, score-0.324]
49 The utility function is proportional (with a factor of N) to the expected sentence-level BLEU score over the entire training set. [sent-308, score-0.313]
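As an illustration (our own sketch, not taken from the paper), the expected sentence-level BLEU for one source sentence can be approximated over an N-best list by weighting each hypothesis's sentence BLEU with its posterior under the log-linear model; the scaling factor and the sentence_bleu function are assumptions here.

    import math

    def expected_sentence_bleu(nbest, reference, sentence_bleu, scale=1.0):
        # nbest: list of (hypothesis, log_linear_score) pairs for one source sentence.
        # Posterior over the N-best list: softmax of the (scaled) log-linear scores.
        m = max(score for _, score in nbest)
        weights = [math.exp(scale * (score - m)) for _, score in nbest]
        z = sum(weights)
        return sum((w / z) * sentence_bleu(hyp, reference)
                   for (hyp, _), w in zip(nbest, weights))

Averaging this quantity over all N training sentences gives the expected-BLEU utility referred to above.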
50 In a phrase-based SMT system, the total number of parameters of phrase and lexicon translation models, which we aim to learn discriminatively, is very large (see Table 1). [sent-329, score-0.905]
51 The baseline model in our approach is the relative-frequency-based phrase translation model and the maximum-likelihood-estimated IBM Model 1 (word translation model). [sent-365, score-0.907]
52 Optimization In this section, we derive GT formulas for iteratively updating the parameters so as to optimize the objective in (9). [sent-370, score-0.395]
53 Extended Baum-Welch Algorithm The Baum-Eagon inequality (Baum and Eagon, 1967) gives the GT formula to iteratively maximize positive-coefficient polynomials of random variables that are subject to sum-to-one constraints. [sent-379, score-0.287]
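For reference, the growth transformation has the familiar extended Baum-Welch form (a standard statement recalled here, not quoted from this paper): for a polynomial or, following Gopalakrishnan et al., a rational function $P$ of parameters $\{p_{ij}\}$ with constraints $\sum_j p_{ij} = 1$,

    $$\hat{p}_{ij} = \frac{p_{ij}\left(\partial P/\partial p_{ij} + D\right)}{\sum_{j'} p_{ij'}\left(\partial P/\partial p_{ij'} + D\right)},$$

where $D$ is a sufficiently large smoothing constant (for positive-coefficient polynomials, $D = 0$ already suffices by the Baum-Eagon inequality).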
54 GT of Translation Models Now we derive the GTs of translation models for our objective. [sent-456, score-0.42]
55 Denote by $p_{i \rightarrow j}$ the probability of translating the source phrase i to the target phrase j. [sent-544, score-0.642]
56 Then, the GT updating formula for this probability is obtained (derivation omitted). [sent-545, score-0.257]
57 Following the same derivation, we get the updating formula for the forward lexicon translation model. [sent-682, score-0.881]
58 $f_{k,r}$ and $e_{k,m}$ are the r-th and m-th words of the k-th source and target phrases, respectively, as above. [sent-806, score-0.329]
59 GTs for updating the backward phrase and lexicon translation models can be derived in a similar way, and are omitted here. [sent-814, score-1.12]
60 The posterior probability in the model updating formula is computed according to (2). [sent-828, score-0.257]
61 For sentence-level BLEU, we further scale the reference length, r, by a factor such that the total length of references on the training set equals that of the baseline translation output. [sent-908, score-0.243]
62 Training procedure The parameter set θ is optimized on the training set, while the feature weights λ are tuned on a small tuning set. Since θ and λ affect the training of each other, we train them in alternation. [sent-920, score-0.368]
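Schematically (a sketch under our own naming, not the paper's code), the alternating training loop can be pictured as follows, with decoder, gt_update, and mert as placeholders for N-best generation, the growth-transformation step, and feature-weight tuning:

    def alternating_training(theta, lam, train_set, tune_set,
                             decoder, gt_update, mert, num_rounds=3):
        # theta: phrase/lexicon translation model parameters
        # lam:   log-linear feature weights
        for _ in range(num_rounds):
            # 1) generate N-best lists on the training set with the current models
            nbest_lists = [decoder(sent, theta, lam) for sent in train_set]
            # 2) update theta by growth transformations on the expected-BLEU objective
            theta = gt_update(theta, nbest_lists)
            # 3) re-tune the feature weights on the tuning set (e.g., with MERT)
            lam = mert(lam, theta, tune_set)
        return theta, lam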
63 Due to the mismatch between training and tuning data, the training process might not always converge. [sent-925, score-0.255]
64 To build the baseline phrase-based SMT system, we first perform word alignment on the training set using a hidden Markov model with lexicalized distortion (He 2007), then extract the phrase table from the word-aligned bilingual texts (Koehn et al.). [sent-944, score-0.405]
65 Other models used in the baseline system include a lexicalized ordering model, word and phrase counts, and a 3-gram LM trained on the English side of the parallel training corpus. [sent-947, score-0.489]
66 Details of the phrase and lexicon translation models are given in Table 1. [sent-950, score-0.89]
67 This baseline system is also used to generate a 100-best list of the training corpus during maximum expected BLEU training. [sent-953, score-0.268]
68 2 Experimental results on the Europarl task During training, we first tune the regularization factor τ based on the performance on the validation set. [sent-959, score-0.239]
69 For simplicity, the tuning of τ uses only the phrase translation models. [sent-960, score-0.664]
70 Fixing the optimal regularization factor τ, we then study the relationship between the expected sentence-level BLEU (Exp. BLEU) and the real BLEU score. [sent-982, score-0.245]
71 Since the expected BLEU is strongly affected by λ, we fix the value of λ in order to make the expected BLEU comparable across different iterations. [sent-987, score-0.251]
72 From Fig. 2 it is clear that the expected BLEU score correlates strongly with the real BLEU score, justifying its use as our training objective. [sent-989, score-0.279]
73 Next, we study the effects of training the phrase translation probabilities and the lexicon translation probabilities according to the GT formulas presented in the preceding section. [sent-992, score-1.434]
74 Compared with the baseline, training phrase or lexicon models alone gives a gain of 0. [sent-994, score-0.731]
75 Both learning schedules give significant improvements over the baseline and also over training phrase or lexicon models alone. [sent-998, score-0.756]
76 The phrase translation probabilities (PT) are trained alone in the first stage, shown in blue color. [sent-1004, score-0.639]
77 After five iterations, the BLEU score on the validation set reaches its peak value, with further iterations giving BLEU score fluctuations. [sent-1005, score-0.263]
78 Hence, we perform lexicon model (LEX) training starting from the sixth iteration with the corresponding BLEU scores shown in red color in Fig. [sent-1006, score-0.419]
79 4 points after additional three iterations of training the lexicon models. [sent-1009, score-0.379]
80 In total, nine iterations are performed to complete the two-stage GT training of all phrase and lexicon models. [sent-1010, score-0.381]
81 The BLEU measures from various settings of maximum expected BLEU training are compared with the baseline, where * denotes that the gain over the baseline is statistically significant with a significance level > 99%, measured by the paired bootstrap resampling method proposed by Koehn (2004). [sent-1015, score-0.352]
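Koehn's (2004) paired bootstrap resampling test can be sketched as follows (our simplified rendering; corpus_bleu is an assumed helper that computes corpus-level BLEU from parallel lists of hypotheses and references):

    import random

    def paired_bootstrap(sys_a, sys_b, refs, corpus_bleu, num_samples=1000):
        # sys_a, sys_b: segment-level outputs of the two systems; refs: references.
        n = len(refs)
        wins = 0
        for _ in range(num_samples):
            idx = [random.randrange(n) for _ in range(n)]  # resample segments with replacement
            a = corpus_bleu([sys_a[i] for i in idx], [refs[i] for i in idx])
            b = corpus_bleu([sys_b[i] for i in idx], [refs[i] for i in idx])
            if a > b:
                wins += 1
        return wins / num_samples  # fraction of resampled test sets where system A wins

A fraction above 0.99 corresponds to the > 99% significance level reported in the table.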
82 BLEU scores on the validation set as a function of the GT training iteration in two-stage training of both the phrase translation models (PT) and the lexicon models (LEX). [sent-1017, score-1.323]
83 The BLEU scores on training phrase models are shown in blue, and on training lexicon models in red. [sent-1018, score-0.79]
84 3 Experiments on the IWSLT2011 benchmark As the second evaluation task, we apply our new method described in this paper to the 2011 IWSLT Chinese-to-English machine translation benchmark (Federico et al.). [sent-1020, score-0.422]
85 The main focus of the IWSLT 2011 Evaluation is the translation of TED talks (www. [sent-1022, score-0.4]
86 In the Chinese-to-English translation task, we are provided with human-translated Chinese text with punctuation inserted. [sent-1026, score-0.324]
87 In our system, a primary phrase table is trained from the 110K TED parallel training data, and a 3-gram LM is trained on the English side of the parallel data. [sent-1035, score-0.414]
88 From them, we train a secondary 5-gram LM on 115M sentences of supplementary English data, and a secondary phrase table from 500K sentences selected from the supplementary UN corpus by the method proposed by Axelrod et al. [sent-1037, score-0.605]
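The data selection referred to here follows the cross-entropy difference idea of Axelrod et al.; a rough monolingual sketch (our illustration; the bilingual variant scores both language sides, and the cross_entropy method on the language models is an assumed interface) is:

    def select_by_cross_entropy_difference(sentences, indomain_lm, general_lm, top_k):
        # Score each candidate by per-word cross-entropy difference H_in-domain - H_general;
        # lower scores indicate sentences that look in-domain rather than generic.
        scored = [(indomain_lm.cross_entropy(s) - general_lm.cross_entropy(s), s)
                  for s in sentences]
        scored.sort(key=lambda x: x[0])
        return [s for _, s in scored[:top_k]]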
89 In carrying out the maximum expected BLEU training, we use a 100-best list and tune the regularization factor τ to its optimal value. [sent-1039, score-0.316]
90 We only train the parameters of the primary phrase table. [sent-1042, score-0.358]
91 The secondary phrase table and LM are excluded from the training process since the out-of-domain phrase table is less relevant to the TED translation task, and the large LM slows down the N-best generation process significantly. [sent-1043, score-1.018]
92 At the end, we perform one final MERT to tune the relative weights with all features including the secondary phrase table and LM. [sent-1044, score-0.389]
93 The baseline is a phrase-based system with all features including the secondary phrase table and LM. [sent-1046, score-0.407]
94 The new system uses the same features except that the primary phrase table is discriminatively trained using maximum expected-BLEU and GT optimization as described earlier in this paper. [sent-1047, score-0.388]
95 The results are obtained using the two-stage training schedule, including six iterations for training phrase translation models and two iterations for training lexicon translation models. [sent-1048, score-1.545]
96 First, we proposed a new training objective (9) for training large-scale translation models, including phrase and lexicon models, with more parameters than all previous methods have attempted. [sent-1059, score-0.992]
97 The objective function consists of 1) the utility function of the expected BLEU score, and 2) the regularization term taking the form of KL divergence in the parameter space. [sent-1060, score-0.533]
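One plausible rendering of such an objective (our sketch; the paper's exact formulation, including the direction and weighting of the KL term, may differ) is

    $$\mathcal{O}(\theta) = U(\theta) - \tau\,\mathrm{KL}\!\left(p_{\theta}\,\|\,p_{0}\right),$$

where $U(\theta)$ is the expected-BLEU utility, $p_0$ is the baseline (relative-frequency and Model 1) model toward which the trained model is regularized, and $\tau$ is the regularization factor tuned on the validation set.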
98 The expected BLEU score is closely linked to translation quality and the regularization is essential when many parameters are trained at scale. [sent-1061, score-0.684]
99 Third, the new objective function and new optimization technique are successfully applied to two important machine translation tasks, with practical implementation issues resolved. [sent-1065, score-0.508]
100 Decomposability of translation metrics for improved evaluation and efficient algorithms. [sent-1102, score-0.324]
wordName wordTfidf (topN-words)
[('bleu', 0.384), ('gt', 0.341), ('translation', 0.324), ('phrase', 0.259), ('lexicon', 0.257), ('updating', 0.175), ('kl', 0.121), ('bp', 0.114), ('regularization', 0.109), ('gopalakrishnan', 0.102), ('ted', 0.099), ('discriminative', 0.095), ('iwslt', 0.091), ('secondary', 0.089), ('inequality', 0.089), ('expected', 0.088), ('training', 0.087), ('europarl', 0.085), ('objective', 0.084), ('smt', 0.083), ('formula', 0.082), ('validation', 0.082), ('tuning', 0.081), ('polynomials', 0.076), ('talks', 0.076), ('xiaodong', 0.076), ('iteration', 0.075), ('formulas', 0.071), ('quirk', 0.067), ('rational', 0.065), ('parameters', 0.065), ('och', 0.061), ('update', 0.061), ('baseline', 0.059), ('koehn', 0.058), ('lm', 0.057), ('posterior', 0.057), ('divergence', 0.057), ('probabilities', 0.056), ('translating', 0.055), ('backward', 0.055), ('chiang', 0.054), ('deng', 0.054), ('utility', 0.053), ('score', 0.053), ('function', 0.052), ('liang', 0.052), ('baum', 0.051), ('justifying', 0.051), ('models', 0.05), ('benchmark', 0.049), ('reference', 0.049), ('lex', 0.049), ('optimization', 0.048), ('factor', 0.048), ('discriminatively', 0.047), ('derive', 0.046), ('proposed', 0.046), ('log', 0.046), ('forced', 0.045), ('closely', 0.045), ('schedules', 0.044), ('ayan', 0.044), ('ebw', 0.044), ('gts', 0.044), ('schedule', 0.044), ('supplementary', 0.044), ('maximizing', 0.044), ('franz', 0.043), ('forward', 0.043), ('decoding', 0.042), ('weights', 0.041), ('povey', 0.041), ('wuebker', 0.041), ('rosti', 0.041), ('microsoft', 0.04), ('gives', 0.04), ('minimum', 0.04), ('mert', 0.039), ('gain', 0.038), ('parameter', 0.038), ('fix', 0.038), ('axelrod', 0.038), ('tromble', 0.038), ('redmond', 0.038), ('commonly', 0.038), ('sentence', 0.037), ('value', 0.037), ('josef', 0.037), ('target', 0.036), ('denominator', 0.036), ('iterations', 0.035), ('leads', 0.035), ('emnlp', 0.035), ('parallel', 0.034), ('macherey', 0.034), ('train', 0.034), ('maximum', 0.034), ('ibm', 0.033), ('source', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
2 0.23583663 140 acl-2012-Machine Translation without Words through Substring Alignment
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori ; Tatsuya Kawahara
Abstract: In this paper, we demonstrate that accurate machine translation is possible without the concept of “words,” treating MT as a problem of transformation between character strings. We achieve this result by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and using this in the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment. In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.
3 0.23327364 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
Author: Jinsong Su ; Hua Wu ; Haifeng Wang ; Yidong Chen ; Xiaodong Shi ; Huailin Dong ; Qun Liu
Abstract: To adapt a translation model trained from the data in one domain to another, previous works paid more attention to the studies of parallel corpus while ignoring the in-domain monolingual corpora which can be obtained more easily. In this paper, we propose a novel approach for translation model adaptation by utilizing in-domain monolingual topic information instead of the in-domain bilingual corpora, which incorporates the topic information into translation probability estimation. Our method establishes the relationship between the out-of-domain bilingual corpus and the in-domain monolingual corpora via topic mapping and phrase-topic distribution probability estimation from in-domain monolingual corpora. Experimental result on the NIST Chinese-English translation task shows that our approach significantly outperforms the baseline system.
4 0.22285631 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
Author: Majid Razmara ; George Foster ; Baskaran Sankaran ; Anoop Sarkar
Abstract: Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
5 0.22007117 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
Author: Malte Nuhn ; Arne Mauser ; Hermann Ney
Abstract: In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.
6 0.20877191 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
9 0.20717347 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
10 0.20528562 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
11 0.20282073 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages
12 0.20229247 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
13 0.19883864 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
14 0.19385935 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
15 0.18036108 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
16 0.17834839 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
17 0.17688757 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
18 0.17513779 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
19 0.1633117 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
20 0.1581645 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
topicId topicWeight
[(0, -0.357), (1, -0.359), (2, 0.203), (3, 0.074), (4, 0.074), (5, -0.072), (6, -0.003), (7, -0.008), (8, -0.016), (9, -0.013), (10, -0.035), (11, -0.018), (12, -0.055), (13, 0.021), (14, -0.033), (15, 0.027), (16, 0.056), (17, 0.08), (18, 0.002), (19, 0.007), (20, 0.021), (21, -0.13), (22, 0.068), (23, 0.015), (24, 0.141), (25, -0.08), (26, 0.075), (27, 0.056), (28, -0.089), (29, -0.009), (30, -0.019), (31, -0.024), (32, 0.04), (33, 0.001), (34, 0.091), (35, -0.085), (36, 0.023), (37, -0.003), (38, 0.017), (39, -0.049), (40, 0.022), (41, 0.014), (42, -0.011), (43, 0.019), (44, 0.038), (45, 0.074), (46, -0.016), (47, 0.038), (48, 0.05), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.97646856 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
2 0.82920843 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
Author: Malte Nuhn ; Arne Mauser ; Hermann Ney
Abstract: In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.
Author: Joern Wuebker ; Hermann Ney ; Richard Zens
Abstract: In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. Two look-ahead methods are shown to further increase translation speed by a factor of 2 without changing the search space and a factor of 4 with the side-effect of some additional search errors. We compare our approach with Moses and observe the same performance, but a substantially better trade-off between translation quality and speed. At a speed of roughly 70 words per second, Moses reaches 17.2% BLEU, whereas our approach yields 20.0% with identical models.
4 0.82848293 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
Author: Majid Razmara ; George Foster ; Baskaran Sankaran ; Anoop Sarkar
Abstract: Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
5 0.81487375 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
Author: Boxing Chen ; Roland Kuhn ; Samuel Larkin
Abstract: Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT-tuned output 45.3% of the time (vs. 32.7% BLEU tuning preferences and 22.0% ties).
6 0.78357404 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages
7 0.78110951 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
8 0.78026497 136 acl-2012-Learning to Translate with Multiple Objectives
9 0.76820177 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
10 0.73753172 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
11 0.72884142 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
12 0.71535683 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
13 0.70767385 140 acl-2012-Machine Translation without Words through Substring Alignment
14 0.70471072 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
15 0.69372112 163 acl-2012-Prediction of Learning Curves in Machine Translation
16 0.69265389 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
17 0.68179095 66 acl-2012-DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation
18 0.6748603 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
19 0.66583824 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
20 0.64200592 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
topicId topicWeight
[(25, 0.013), (26, 0.021), (28, 0.075), (30, 0.018), (37, 0.031), (39, 0.034), (57, 0.01), (74, 0.416), (82, 0.016), (84, 0.017), (85, 0.029), (90, 0.148), (92, 0.031), (94, 0.054), (99, 0.025)]
simIndex simValue paperId paperTitle
1 0.9673872 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context
Author: Toshikazu Tajiri ; Mamoru Komachi ; Yuji Matsumoto
Abstract: As the number of learners of English is constantly growing, automatic error correction of ESL learners’ writing is an increasingly active area of research. However, most research has mainly focused on errors concerning articles and prepositions even though tense/aspect errors are also important. One of the main reasons why tense/aspect error correction is difficult is that the choice of tense/aspect is highly dependent on global context. Previous research on grammatical error correction typically uses pointwise prediction that performs classification on each word independently, and thus fails to capture the information of neighboring labels. In order to take global information into account, we regard the task as sequence labeling: each verb phrase in a document is labeled with tense/aspect depending on surrounding labels. Our experiments show that the global context makes a moderate con- tribution to tense/aspect error correction.
2 0.92256397 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
Author: Yunyao Li ; Laura Chiticariu ; Huahai Yang ; Frederick Reiss ; Arnaldo Carreno-fuentes
Abstract: Information extraction (IE) is becoming a critical building block in many enterprise applications. In order to satisfy the increasing text analytics demands of enterprise applications, it is crucial to enable developers with general computer science background to develop high quality IE extractors. In this demonstration, we present WizIE, an IE development environment intended to reduce the development life cycle and enable developers with little or no linguistic background to write high quality IE rules. WizIE provides an integrated wizard-like environment that guides IE developers step-by-step throughout the entire development process, based on best practices synthesized from the experience of expert developers. In addition, WizIE reduces the manual effort involved in performing key IE development tasks by offering automatic result explanation and rule discovery functionality. Preliminary results indicate that WizIE is a step forward towards enabling extractor development for novice IE developers.
3 0.85121256 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
Author: Ariya Rastrow ; Mark Dredze ; Sanjeev Khudanpur
Abstract: Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.
same-paper 4 0.84714943 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
Author: Joern Wuebker ; Hermann Ney ; Richard Zens
Abstract: In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. Two look-ahead methods are shown to further increase translation speed by a factor of 2 without changing the search space and a factor of 4 with the side-effect of some additional search errors. We compare our approach with Moses and observe the same performance, but a substantially better trade-off between translation quality and speed. At a speed of roughly 70 words per second, Moses reaches 17.2% BLEU, whereas our approach yields 20.0% with identical models.
7 0.5808894 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
8 0.57998562 103 acl-2012-Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
9 0.56202525 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing
10 0.55644953 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System
11 0.55145931 136 acl-2012-Learning to Translate with Multiple Objectives
13 0.54377663 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
14 0.54336131 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
15 0.54320574 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages
16 0.54147601 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
17 0.53465658 121 acl-2012-Iterative Viterbi A* Algorithm for K-Best Sequential Decoding
18 0.53055495 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
19 0.52585435 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
20 0.52024692 163 acl-2012-Prediction of Learning Curves in Machine Translation