acl acl2010 acl2010-240 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
Reference: text
sentIndex sentText sentNum sentScore
1 We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. [sent-3, score-0.84]
2 In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. [sent-4, score-0.799]
3 Using this consistent training of phrase models we are able to achieve improvements of up to 1. [sent-5, score-0.71]
4 1 Introduction A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al. [sent-8, score-0.749]
5 The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. [sent-10, score-1.534]
6 The most common method for obtaining the phrase table is heuristic extraction from automatically word-aligned bilingual training data (Och et al. [sent-11, score-0.759]
7 It does not matter whether the phrases are extracted from a highly probable phrase alignment or from an unlikely one. [sent-17, score-1.028]
8 The joint counts $C(\tilde{f}, \tilde{e})$ of the source phrase $\tilde{f}$ and the target phrase $\tilde{e}$ in the entire training data are normalized by the marginal counts of source and target phrase to obtain a conditional probability $p_H(\tilde{f}\,|\,\tilde{e}) = \frac{C(\tilde{f}, \tilde{e})}{C(\tilde{e})}$ (1). [sent-19, score-2.161]
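As an illustration of Equation 1 only (not code from the paper), here is a minimal sketch of relative-frequency estimation from already-collected phrase-pair counts; all names in the example are made up:

```python
from collections import defaultdict

def relative_frequency(phrase_pair_counts):
    """Estimate p_H(f|e) = C(f, e) / C(e) from joint phrase-pair counts (Equation 1)."""
    # Marginal counts of the target phrases.
    target_counts = defaultdict(float)
    for (f_phrase, e_phrase), count in phrase_pair_counts.items():
        target_counts[e_phrase] += count
    # Normalize each joint count by the marginal count of its target phrase.
    return {(f, e): c / target_counts[e] for (f, e), c in phrase_pair_counts.items()}

# Toy counts as they might come out of heuristic extraction.
counts = {("das haus", "the house"): 3.0, ("haus", "house"): 5.0, ("das", "the house"): 1.0}
p_h = relative_frequency(counts)  # p_H("das haus" | "the house") = 3/4
```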
9 The translation process is implemented as a weighted log-linear combination of several models $h_m(e_1^I, s_1^K, f_1^J)$, including the logarithm of the phrase probability in source-to-target as well as in target-to-source direction. [sent-20, score-0.897]
10 The phrase model is combined with a language model, word lexicon models, word and phrase penalty, and many others. [sent-21, score-1.157]
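The weighted log-linear combination itself can be sketched as below; the feature names, values and scaling factors are illustrative placeholders, not the actual feature set or weights of the system:

```python
def log_linear_score(feature_values, scaling_factors):
    """Score of a hypothesis as the weighted sum over model scores h_m."""
    return sum(scaling_factors[name] * value for name, value in feature_values.items())

# Hypothetical log-domain feature values for one translation hypothesis.
features = {
    "phrase_s2t": -0.9,      # log p(f|e), source-to-target phrase model
    "phrase_t2s": -1.2,      # log p(e|f), target-to-source phrase model
    "lm": -3.0,              # language model log-probability
    "word_penalty": -4.0,    # number of target words
    "phrase_penalty": -2.0,  # number of phrases used
}
lambdas = {"phrase_s2t": 0.2, "phrase_t2s": 0.2, "lm": 0.4,
           "word_penalty": 0.1, "phrase_penalty": 0.1}
score = log_linear_score(features, lambdas)
```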
11 In contrast to heuristic extraction, the proposed method provides a way of consistently training and using phrase models in translation. [sent-23, score-0.771]
12 Our results show that the proposed phrase model training improves translation quality on the test set by 0. [sent-45, score-0.91]
13 We find that by interpolation with the heuristically extracted phrases, translation performance can reach improvements of up to 1.4 BLEU points. [sent-47, score-0.671]
14 2 Related Work It has been pointed out in the literature that training phrase models poses some difficulties. [sent-52, score-0.671]
15 Their results show that it cannot reach a performance competitive with extracting a phrase table from word alignment by heuristics (Och et al. [sent-57, score-0.853]
16 For example, it may be possible to transform one valid segmentation into another by splitting some of its phrases into sub-phrases or by shifting phrase boundaries. [sent-62, score-0.797]
17 That in turn leads to over-fitting, which shows in overly determinized estimates of the phrase translation probabilities. [sent-65, score-0.767]
18 (DeNero et al., 2006) found that the trained phrase table shows a highly peaked distribution, in contrast to the flatter distribution resulting from heuristic extraction, leaving the decoder only a few translation options at decoding time. [sent-67, score-1.072]
19 Forced alignment can also be utilized to train a phrase segmentation model, as is shown in (Shen et al. [sent-80, score-0.931]
20 They train a conditional “inverse” phrase model of the target phrase given the source phrase. [sent-84, score-1.296]
21 In addition to the phrases, they model the segmentation sequence that is used to produce a phrase alignment between the source and the target sentence. [sent-85, score-1.067]
22 They used a phrase length limit of 4 words, as longer phrases did not result in further improvements. [sent-86, score-0.722]
23 To counteract over-fitting, they interpolate the phrase model with IBM Model 1 probabilities that are computed on the phrase level. [sent-87, score-1.19]
24 They report improvements over a phrase-based model that uses an inverse phrase model and a language model. [sent-90, score-0.709]
25 But instead of focusing on the statistical model and relaxing the translation task by using monotone translation only, we use a full and competitive translation system as a starting point, with reordering and all models included. [sent-93, score-0.869]
26 In training, they use a greedy algorithm to produce the Viterbi phrase alignment and then apply a hill-climbing technique that modifies the Viterbi alignment by merge, move, split, and swap operations to find an alignment with a better probability in each iteration. [sent-98, score-1.424]
27 They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality. [sent-103, score-0.991]
28 They show that by applying a prior distribution over the phrase translation probabilities they can prevent over-fitting. [sent-106, score-0.894]
29 The prior is composed of IBM1 lexical probabilities and a geometric distribution over phrase lengths which penalizes long phrases. [sent-107, score-0.719]
30 We then use these models and scaling factors to do a forced alignment, where we compute a phrase alignment for the training data. [sent-112, score-1.235]
31 From this alignment we then estimate new phrase models, while keeping all other models unchanged. [sent-113, score-0.865]
32 In this section we describe our forced alignment procedure that is the basic training procedure for the models proposed here. [sent-114, score-0.671]
33 1 Forced Alignment The idea of forced alignment is to perform a phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding. [sent-116, score-1.753]
34 Given a source sentence $f_1^J$ and target sentence $e_1^I$, we search for the best phrase segmentation and alignment that covers both sentences. [sent-119, score-1.092]
35 The alignment consists of $K$ segments, where for each segment $k = 1, \dots, K$, $i_k$ is the last position of the $k$-th target phrase, and $(b_k, j_k)$ are the start and end positions of the source phrase aligned to the $k$-th target phrase. [sent-123, score-0.738]
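A sketch of how such a segmentation could be represented and checked for consistency; the field names mirror the description above, but the code is only an illustration, not the decoder's actual data structures:

```python
from typing import List, NamedTuple

class Segment(NamedTuple):
    i: int  # last position of the k-th target phrase (1-based)
    b: int  # start position of the aligned source phrase (1-based)
    j: int  # end position of the aligned source phrase (1-based)

def covers_both_sentences(segments: List[Segment], J: int, I: int) -> bool:
    """True if the segmentation covers the target monotonically and the source exactly once."""
    if not segments or segments[-1].i != I:
        return False
    # Target phrase end positions must be strictly increasing.
    if any(segments[k].i <= segments[k - 1].i for k in range(1, len(segments))):
        return False
    # Every source position 1..J must be covered by exactly one source span.
    covered = [False] * (J + 1)
    for seg in segments:
        for pos in range(seg.b, seg.j + 1):
            if pos < 1 or pos > J or covered[pos]:
                return False
            covered[pos] = True
    return all(covered[1:])
```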
36 In addition to the phrase matching on the source sentence, we also discard all phrase translation candidates that do not match any sequence in the given target sentence. [sent-126, score-1.422]
37 Sentences for which the decoder cannot find an alignment are discarded for the phrase model training. [sent-127, score-0.97]
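A hedged sketch of this candidate filtering step: phrase translation options are kept only if their source side matches the source sentence and their target side matches a contiguous span of the given target sentence (the helper names are invented for the example):

```python
def occurs_in(phrase: str, sentence_tokens: list) -> bool:
    """True if the phrase matches a contiguous token span of the sentence."""
    phrase_tokens = phrase.split()
    n = len(phrase_tokens)
    return any(sentence_tokens[k:k + n] == phrase_tokens
               for k in range(len(sentence_tokens) - n + 1))

def filter_candidates(phrase_table, source_tokens, target_tokens):
    """Discard phrase translation candidates that do not match both sentences."""
    return {(f, e): score for (f, e), score in phrase_table.items()
            if occurs_in(f, source_tokens) and occurs_in(e, target_tokens)}
```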
38 In this section, we describe a leaving-one-out method that can improve the phrase alignment in situations where the probability of rare phrases and alignments might be overestimated. [sent-131, score-1.153]
39 The training data, consisting of sentence pairs $(f_n, e_n)$ for $n = 1, \dots, N$, is used for both the initialization of the translation model and the phrase model training. [sent-135, score-0.889]
40 The average length of the used phrases is an indicator of this kind of over-fitting, as the number of matching training sentences decreases with increasing phrase length. [sent-142, score-0.804]
41 Previous work has counteracted this by limiting phrase lengths (DeNero et al., 2006; Marcu and Wong, 2002) and by smoothing the phrase probabilities with lexical models on the phrase level (Ferrer and Juan, 2009). [sent-148, score-1.299]
42 When using leaving-one-out, we modify the phrase translation probabilities for each sentence pair. [sent-153, score-0.937]
43 For a training example $(f_n, e_n)$, we have to remove the counts $C_n(\tilde{f}, \tilde{e})$ of all phrases that were extracted from this sentence pair from the overall phrase counts. (Figure 2: Segmentation example from forced alignment.) [sent-154, score-1.138]
44 The same holds for the marginal counts $C_n(\tilde{e})$ and $C_n(\tilde{f})$. [sent-158, score-0.829]
45 Starting from Equation 1, the leaving-one-out phrase probability for training sentence pair $n$ is $p_{l1o,n}(\tilde{f}\,|\,\tilde{e}) = \frac{C(\tilde{f}, \tilde{e}) - C_n(\tilde{f}, \tilde{e})}{C(\tilde{e}) - C_n(\tilde{e})}$ (4). To be able to perform the re-computation in an efficient way, we store the source and target phrase marginal counts for each phrase in the phrase table. [sent-159, score-1.859]
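A minimal sketch of Equation 4, assuming the global counts, the per-sentence counts $C_n$, and the marginals are already available as dictionaries; the fixed singleton probability is included as an assumption and its value is arbitrary:

```python
def leaving_one_out_prob(pair, joint_counts, target_marginals,
                         sent_joint_counts, sent_target_marginals,
                         singleton_alpha=1e-7):
    """p_l1o,n(f|e) = (C(f,e) - C_n(f,e)) / (C(e) - C_n(e))  (Equation 4).

    Phrase pairs seen only in the current sentence pair would end up with a
    zero or undefined probability, so they receive a small fixed value instead
    (the exact value used in the paper is not assumed here).
    """
    f_phrase, e_phrase = pair
    joint = joint_counts.get(pair, 0.0) - sent_joint_counts.get(pair, 0.0)
    marginal = target_marginals.get(e_phrase, 0.0) - sent_target_marginals.get(e_phrase, 0.0)
    if joint <= 0.0 or marginal <= 0.0:
        return singleton_alpha
    return joint / marginal
```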
46 A phrase extraction is performed for each training sentence pair separately using the same word alignment as for the initialization. [sent-160, score-0.99]
47 It is then straightforward to compute the phrase counts after leaving-one-out using the phrase probabilities and marginal counts stored in the phrase table. [sent-161, score-1.944]
48 We refer to singleton phrases as phrase pairs that occur only in one sentence. [sent-163, score-0.808]
49 For these sentences, the decoder needs the singleton phrase pairs to produce an alignment. [sent-164, score-0.719]
50 Standard leaving-one-out assigns a fixed probability α to singleton phrase pairs. [sent-167, score-0.714]
51 (Figure caption: source phrase lengths in forced alignment without leaving-one-out and with standard and length-based leaving-one-out.) [sent-172, score-1.119]
52 For higher iterations, phrase counts obtained in the previous iterations would have to be stored on disk separately for each sentence and accessed during the forced alignment process. [sent-182, score-1.169]
53 Instead of recomputing the phrase counts for each sentence individually, this is done for a whole batch of sentences at a time. [sent-184, score-0.685]
54 4 Parallelization To cope with the runtime and memory requirements of phrase model training that were pointed out by previous work (Marcu and Wong, 2002; Birch et al. [sent-187, score-0.691]
55 From the initial phrase table, each of these blocks only loads the phrases that are required for alignment. [sent-189, score-0.722]
56 The alignment and the counting of phrases are done separately for each block and then accumulated to build the updated phrase model. [sent-190, score-0.792]
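A sketch of this block-wise organization, using Python's multiprocessing as a stand-in for whatever parallel infrastructure is actually used; align_block is a placeholder for running forced alignment on one block:

```python
from collections import Counter
from multiprocessing import Pool

def align_block(block):
    """Placeholder: force-align a block of sentence pairs and return the
    phrase-pair counts collected from the resulting alignments."""
    counts = Counter()
    for source, target in block:
        # ... load only the phrases needed for this pair, align, count ...
        pass
    return counts

def train_iteration(blocks, workers=8):
    """Align each block independently, then accumulate the counts into one model."""
    with Pool(workers) as pool:
        partial_counts = pool.map(align_block, blocks)
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total
```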
57 4 Phrase Model Training The produced phrase alignment can be given as a single best alignment, as the n-best alignments or as an alignment graph representing all alignments considered by the decoder. [sent-191, score-1.314]
58 We have developed two different models for phrase translation probabilities which make use of the force-aligned training data. [sent-192, score-1.017]
59 The translation probability of a phrase pair $(\tilde{f}, \tilde{e})$ is estimated as $p_{FA}(\tilde{f}\,|\,\tilde{e}) = \frac{C_{FA}(\tilde{f}, \tilde{e})}{\sum_{\tilde{f}'} C_{FA}(\tilde{f}', \tilde{e})}$ (5), where $C_{FA}(\tilde{f}, \tilde{e})$ is the count of the phrase pair $(\tilde{f}, \tilde{e})$ in the phrase-aligned training data. [sent-196, score-1.639]
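A sketch of the count model (Equation 5), assuming each sentence pair contributes its n-best phrase alignments and every occurrence of a phrase pair simply adds one to the count, as described below; the data layout is invented for the example:

```python
from collections import defaultdict

def count_model_from_nbest(nbest_alignments_per_sentence):
    """Estimate p_FA(f|e) = C_FA(f,e) / sum_f' C_FA(f',e) from n-best phrase alignments."""
    counts = defaultdict(float)
    for nbest in nbest_alignments_per_sentence:
        for alignment in nbest:            # one alignment = list of (f, e) phrase pairs
            for f_phrase, e_phrase in alignment:
                counts[(f_phrase, e_phrase)] += 1.0

    target_totals = defaultdict(float)
    for (_f, e_phrase), c in counts.items():
        target_totals[e_phrase] += c
    return {(f, e): c / target_totals[e] for (f, e), c in counts.items()}
```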
60 This can be applied to either the Viterbi phrase alignment or an n-best list. [sent-197, score-0.824]
61 We will refer to this model as the count model as we simply count the number of occurrences of a phrase pair. [sent-199, score-0.996]
62 While this might not cover all phrase translation probabilities, it keeps the search space and translation times feasible and still contains the most probable alignments. [sent-208, score-0.986]
63 As previous work (DeNero et al., 2006) has reported improvements in translation quality by interpolation of phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model. [sent-219, score-1.408]
64 The log-linear interpolation of the phrase translation probabilities is estimated as $p_{int}(\tilde{f}\,|\,\tilde{e}) = p_H(\tilde{f}\,|\,\tilde{e})^{\omega} \cdot p_{gen}(\tilde{f}\,|\,\tilde{e})^{(1-\omega)}$ (6), [sent-220, score-0.894]
65 where $\omega$ is the interpolation weight, $p_H$ the heuristically estimated phrase model and $p_{gen}$ the count model. [sent-224, score-1.1]
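Assuming Equation 6 has the standard log-linear form $p_H^{\omega} \cdot p_{gen}^{(1-\omega)}$, a minimal sketch that also restricts the result to the intersection of the two tables, as described in the following sentence:

```python
def interpolate(p_heuristic, p_generative, omega=0.5):
    """Log-linearly interpolate two phrase tables, keeping only shared phrase pairs."""
    shared = p_heuristic.keys() & p_generative.keys()
    return {pair: (p_heuristic[pair] ** omega) * (p_generative[pair] ** (1.0 - omega))
            for pair in shared}
```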
66 When interpolating phrase tables containing different sets of phrase pairs, we retain the intersection of the two. [sent-226, score-1.216]
67 As a generalization of the fixed interpolation of the two phrase tables we also experimented with adding the two trained phrase probabilities as additional features to the log-linear framework. [sent-227, score-1.499]
68 For the heuristic phrase model, we first use GIZA++ (Och and Ney, 2003) to compute the word alignment on TRAIN. [sent-239, score-0.924]
69 Next we obtain a phrase table by extraction of phrases from the word alignment. [sent-240, score-0.751]
70 The phrase table obtained by heuristic extraction is also used to initialize the training. [sent-242, score-0.677]
71 The forced alignment is run on the training data TRAIN from which we obtain the phrase alignments. [sent-243, score-1.102]
72 Those are used to build a phrase table according to the proposed generative phrase models. [sent-244, score-1.172]
73 Afterward, the scaling factors are trained on DEV for the new phrase table. [sent-245, score-0.688]
74 By feeding back the new phrase table into forced alignment we can reiterate the training procedure. [sent-246, score-1.131]
75 When training is finished, the scaling factors are re-optimized on DEV for the resulting phrase model, which is evaluated afterwards. [sent-247, score-0.783]
76 The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model. [sent-248, score-1.763]
77 To investigate the generative models, we replace the two phrase translation probabilities and keep the other features identical to the baseline. [sent-250, score-0.941]
78 For the feature-wise combination the two generative phrase probabilities are added to the features, resulting in a total of 10 features. [sent-251, score-0.722]
79 4 BLEU over the heuristically extracted phrase model on the test data set. [sent-260, score-0.72]
80 Table 3 shows translation scores of the count model on the development data after the first training iteration for both leaving-one-out strategies we have introduced and for training without leaving-one-out with different restrictions on phrase length. [sent-265, score-1.205]
81 We can see that by restricting the source phrase length to a maximum of 3 words, the trained model is close to the performance of the heuristic phrase model. [sent-266, score-1.36]
82 The count model (Section 4.1) estimates phrase translation probabilities using counts from the n-best phrase alignments. [sent-270, score-1.507]
83 For smaller n the resulting phrase table contains fewer phrases and is more deterministic. [sent-271, score-0.787]
84 For higher values of n more competing alignments are taken into account, resulting in a bigger phrase table and a smoother distribution. [sent-272, score-0.684]
85 An additional benefit of the count model is the smaller phrase table size compared to the heuristic phrase extraction. [sent-277, score-1.485]
86 Even for the full model, we observe a substantial reduction in phrase table size. (Table 4: Phrase table size of the count model for different n-best list sizes, the full model and for heuristic phrase extraction.) [sent-282, score-1.038]
87 Due to pruning in the forced alignment step, not all translation options are considered. [sent-284, score-0.691]
88 As a result, experiments can be done more rapidly and with fewer resources than with the heuristically extracted phrase table. [sent-285, score-0.688]
89 Also, our experiments show that the increased performance of the count model is partly derived from the smaller phrase table size. [sent-286, score-0.87]
90 In Table 5 we can see that the performance of the heuristic phrase model can be increased by 0.6 BLEU on TEST by filtering the phrase table to contain the same phrases as the count model and re-optimizing the log-linear model weights. [sent-287, score-0.709] [sent-288, score-1.036]
92 The performance of the filtered baseline phrase table shows that part of that improvement derives from the smaller phrase table size. [sent-294, score-1.19]
93 Here, we used the phrase table trained with leaving-one-out in the first iteration and applied cross-validation in the second iteration. [sent-298, score-0.675]
94 Table 5: Final results for the heuristic phrase table filtered to contain the same phrases as the count model (baseline filt.). [sent-302, score-1.075]
95 6 Conclusion We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model. [sent-329, score-1.109]
96 This is achieved by training phrase translation probabilities in such a way that they are consistent with their use in translation. [sent-330, score-0.976]
97 We have shown that the technique is superior to limiting phrase lengths and smoothing with lexical probabilities alone. [sent-332, score-0.784]
98 While models trained from Viterbi alignments already lead to good results, we have demonstrated that considering the 100-best alignments allows us to better model the ambiguities in phrase segmentation. [sent-333, score-0.912]
99 The proposed techniques are shown to be superior to previous approaches that only used lexical probabilities to smooth phrase tables or imposed limits on the phrase lengths. [sent-334, score-1.282]
100 The improvements reach up to 1.4 BLEU points when interpolating the newly trained model with the original, heuristically extracted phrase table. [sent-337, score-0.798]
wordName wordTfidf (topN-words)
[('phrase', 0.548), ('alignment', 0.276), ('denero', 0.222), ('translation', 0.219), ('forced', 0.196), ('phrases', 0.174), ('interpolation', 0.167), ('count', 0.163), ('bleu', 0.155), ('dev', 0.133), ('probabilities', 0.127), ('alignments', 0.107), ('ferrer', 0.107), ('viterbi', 0.102), ('heuristic', 0.1), ('singleton', 0.086), ('decoder', 0.085), ('juan', 0.085), ('training', 0.082), ('wong', 0.081), ('heuristically', 0.081), ('och', 0.076), ('segmentation', 0.075), ('europarl', 0.072), ('marcu', 0.066), ('counts', 0.065), ('model', 0.061), ('retain', 0.061), ('scaling', 0.06), ('source', 0.055), ('target', 0.052), ('segmentations', 0.05), ('iteration', 0.05), ('ehling', 0.049), ('mhm', 0.049), ('mxm', 0.049), ('nelder', 0.049), ('pgen', 0.049), ('pint', 0.049), ('hermann', 0.048), ('probability', 0.048), ('trained', 0.048), ('generative', 0.047), ('lengths', 0.044), ('marginal', 0.043), ('sentence', 0.043), ('decoding', 0.043), ('ph', 0.043), ('birch', 0.043), ('tromble', 0.043), ('simplex', 0.043), ('jk', 0.043), ('cv', 0.043), ('models', 0.041), ('separately', 0.041), ('weighted', 0.041), ('reordering', 0.041), ('improvements', 0.039), ('lexica', 0.039), ('shorter', 0.039), ('full', 0.038), ('procedure', 0.038), ('smt', 0.037), ('bk', 0.037), ('smaller', 0.036), ('smoothing', 0.035), ('fn', 0.035), ('phrasebased', 0.035), ('ney', 0.034), ('partly', 0.033), ('lation', 0.033), ('counteract', 0.033), ('posterior', 0.033), ('penalty', 0.033), ('fixed', 0.032), ('train', 0.032), ('franz', 0.032), ('factors', 0.032), ('snover', 0.032), ('kneser', 0.032), ('cn', 0.031), ('phrasal', 0.031), ('discriminative', 0.031), ('statistical', 0.031), ('estimated', 0.031), ('adjusted', 0.031), ('ueffing', 0.031), ('ik', 0.031), ('agency', 0.031), ('superior', 0.03), ('morristown', 0.03), ('extracted', 0.03), ('hawaii', 0.03), ('interpolating', 0.03), ('done', 0.029), ('tables', 0.029), ('association', 0.029), ('alexandre', 0.029), ('blunsom', 0.029), ('table', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
2 0.27424291 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principle advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
3 0.25565025 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
4 0.25422844 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li
Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of both word alignment and translation quality significantly. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system. 1
5 0.25312909 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semisupervised word aligner.
6 0.23218116 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
7 0.20679621 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.18600406 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs
9 0.18582387 170 acl-2010-Letter-Phoneme Alignment: An Exploration
10 0.1834874 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
11 0.17934138 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
12 0.17399105 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
13 0.16639355 169 acl-2010-Learning to Translate with Source and Target Syntax
14 0.16410401 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
15 0.16287293 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
16 0.16140383 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
17 0.16089009 262 acl-2010-Word Alignment with Synonym Regularization
18 0.15260126 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
19 0.14635843 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
20 0.14443518 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
topicId topicWeight
[(0, -0.318), (1, -0.376), (2, -0.099), (3, -0.012), (4, 0.101), (5, 0.098), (6, -0.162), (7, 0.004), (8, 0.075), (9, -0.064), (10, 0.059), (11, 0.04), (12, -0.01), (13, 0.017), (14, -0.02), (15, 0.041), (16, -0.023), (17, 0.072), (18, -0.084), (19, -0.007), (20, 0.017), (21, 0.026), (22, 0.016), (23, -0.012), (24, -0.042), (25, -0.043), (26, 0.03), (27, -0.007), (28, -0.011), (29, -0.11), (30, 0.025), (31, 0.033), (32, -0.054), (33, -0.005), (34, 0.058), (35, -0.007), (36, 0.041), (37, -0.05), (38, -0.089), (39, 0.031), (40, 0.075), (41, 0.011), (42, -0.001), (43, -0.049), (44, -0.058), (45, -0.145), (46, 0.001), (47, 0.008), (48, -0.037), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.98395896 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
2 0.79598588 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principle advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
3 0.77705246 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
Author: Bing Xiang ; Yonggang Deng ; Bowen Zhou
Abstract: We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better translation performance.
4 0.74921632 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou
Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experiment results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++. 1
5 0.72346896 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li
Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of both word alignment and translation quality significantly. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system. 1
6 0.72119552 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
7 0.71492791 133 acl-2010-Hierarchical Search for Word Alignment
8 0.70592731 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
9 0.69952595 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs
10 0.69488448 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
11 0.67316878 54 acl-2010-Boosting-Based System Combination for Machine Translation
12 0.66852015 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
13 0.64717704 262 acl-2010-Word Alignment with Synonym Regularization
14 0.64441615 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
15 0.62193173 170 acl-2010-Letter-Phoneme Alignment: An Exploration
16 0.58484662 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
17 0.56973755 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
18 0.5534007 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
19 0.54681772 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
20 0.53194028 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
topicId topicWeight
[(25, 0.035), (42, 0.016), (59, 0.576), (73, 0.033), (78, 0.018), (83, 0.07), (84, 0.016), (98, 0.145)]
simIndex simValue paperId paperTitle
1 0.99483567 151 acl-2010-Intelligent Selection of Language Model Training Data
Author: Robert C. Moore ; William Lewis
Abstract: We address the problem of selecting nondomain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domainspecific and non-domain-specifc language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.
2 0.99316591 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging
Author: Michael Lamar ; Yariv Maron ; Mark Johnson ; Elie Bienenstock
Abstract: We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also produce a range of finer-grained taggings, with potential applications to various tasks. 1
same-paper 3 0.98562855 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
4 0.98558897 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
Author: Galina Tremper
Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.
5 0.98355669 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik
Abstract: We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
6 0.88866693 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
7 0.88839233 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
8 0.83233994 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
10 0.80859762 114 acl-2010-Faster Parsing by Supertagger Adaptation
11 0.8063544 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
12 0.80525774 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
13 0.80329508 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
14 0.79520869 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision
15 0.7917859 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
16 0.79096699 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
17 0.78852475 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
18 0.78538549 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
19 0.78052694 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
20 0.77783328 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment