acl acl2010 acl2010-54 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. [sent-6, score-0.784]
2 First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. [sent-8, score-0.712]
3 Then, a strong translation system is built from the ensemble of these weak translation systems. [sent-9, score-0.931]
4 To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. [sent-10, score-0.571]
5 We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. [sent-11, score-0.274]
6 The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. [sent-12, score-0.482]
7 With the emergence of various structurally different SMT systems, more and more studies focus on combining multiple SMT systems to achieve higher translation accuracy, rather than relying on a single translation system. [sent-19, score-0.679]
8 The basic idea of system combination is to extract or generate a translation by voting from an ensemble of translation outputs. [sent-20, score-0.946]
9 Depending on how the translation is combined and what voting strategy is adopted, several methods can be used for system combination, e. [sent-21, score-0.45]
10 sentence-level combination (Hildebrand and Vogel, 2008) simply selects one of the original translations, while some more sophisticated methods, such as word-level and phrase-level combination (Matusov et al. [sent-23, score-0.252]
11 One of the key factors in SMT system combination is the diversity in the ensemble of translation outputs (Macherey and Och, 2007). [sent-26, score-0.912]
12 To obtain diversified translation outputs, most of the current system combination methods require multiple translation engines based on different models. [sent-27, score-1.012]
13 To reduce the burden of system development, an attractive alternative is to combine a set of translation systems built from a single translation engine. [sent-29, score-0.793]
14 A key issue here is how to generate an ensemble of diversified translation systems from a single translation engine in a principled way. [sent-30, score-1.069]
15 Addressing this issue, we propose a boostingbased system combination method to learn a combined translation system from a single SMT engine. [sent-31, score-0.801]
16 In this method, a sequence of weak translation systems is generated from a baseline system in an iterative manner. [sent-32, score-0.712]
17 In each iteration, a new weak translation system is learned, focusing more on the sentences that are relatively poorly translated by the previous weak translation system. [sent-33, score-0.944]
18 Finally, a strong translation system is built from the ensemble of the weak translation systems. [sent-34, score-0.931]
19 Our experiments are conducted on Chinese-to-English translation in three state-of-the-art SMT systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. [sent-35, score-0.465]
20 Experimental results show that our method leads to significant improvements in translation accuracy over the baseline systems. [sent-39, score-0.482]
21 e* = argmax_e Pr(e|f) (1) where Pr(e|f) is the probability that e is the translation of the given source string f. [sent-41, score-0.341]
22 , uT(λ*T)} , the task of system combination is to build a new translation system v(u1(λ*1), . [sent-58, score-0.658]
23 , uT(λ*T)) denotes the combination system, which combines translations from the ensemble of the outputs of each ui(λ*i). [sent-68, score-0.41]
24 As discussed in Section 1, the diversity among the outputs of member systems is an important factor in the success of system combination. [sent-73, score-0.834]
25 To obtain diversified member systems, traditional methods concentrate more on using structurally different member systems, that is u1 ≠ u2 ≠ . [sent-74, score-0.922]
26 However, this constraint condition cannot be satisfied when multiple translation engines are not available. [sent-78, score-0.304]
27 In this paper, we argue that the diversified member systems can also be generated from a single engine u(λ*) by adjusting the weight vector λ* in a principled way. [sent-79, score-0.781]
28 However, since most boosting algorithms are designed for classification problems, which are very different from the translation problem in natural language processing, several key components have to be redesigned when boosting is adapted to SMT system combination. [sent-89, score-0.875]
29 As the weighted BLEU is used to measure the translation accuracy on the training set, the error rate εt is defined accordingly. [sent-111, score-0.304]
30 On each round, we increase the weights of the samples that are relatively poorly translated by the current weak system, so that the MERT-based trainer can focus on the hard samples in the next round. [sent-114, score-0.348]
31 αt can be regarded as a measure of the importance that the t-th weak system gains in boosting. [sent-116, score-0.263]
32 , ein} be the n-best translation candidates produced by the system. [sent-125, score-0.366]
33 , 2006) of the translation e with respect to the reference translations ri, and ei* is the oracle translation which is selected from {ei1, . [sent-127, score-0.734]
34 li can be viewed as a measure of the average cost of guessing the top-k translation candidates instead of the oracle translation. [sent-131, score-0.492]
35 The value of li determines the magnitude of the weight update; that is, a larger li means a larger weight update on Dt(i). [sent-132, score-0.261]
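The loss-driven sample reweighting described above can be sketched as follows. This is a minimal sketch, not the paper's exact formulas: `sentence_bleu` is a hypothetical unigram-precision stand-in for the weighted BLEU, and the exponential update is an assumed AdaBoost-style rule in which the per-sentence loss li (oracle score minus the average score of the top-k candidates) drives the magnitude of the weight change.

```python
import math

def sentence_bleu(hyp, ref):
    """Hypothetical sentence-level score stand-in: unigram precision."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    matches = sum(1 for w in hyp_toks if w in ref_toks)
    return matches / len(hyp_toks)

def update_weights(weights, nbests, refs, k=3):
    """AdaBoost-style update: raise the weight D(i) of sentences whose
    top-k candidates fall far below their oracle translation."""
    losses = []
    for nbest, ref in zip(nbests, refs):
        scores = [sentence_bleu(e, ref) for e in nbest]
        oracle = max(scores)                 # e_i*: best candidate score
        topk = scores[:k]
        # l_i: average gap between the oracle and the top-k candidates
        losses.append(oracle - sum(topk) / len(topk))
    # larger l_i -> larger multiplicative weight update on D(i)
    new = [w * math.exp(l) for w, l in zip(weights, losses)]
    z = sum(new)                             # normalize to a distribution
    return [w / z for w in new]
```

Sentences whose n-best lists already sit close to their oracle receive a small li and keep roughly the same weight, so the next MERT round concentrates on the poorly translated ones.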
36 3 System Combination Scheme In the last step of our method, a strong translation system v(u(λ*1), . [sent-138, score-0.418]
37 In this work, a sentence-level combination method is used to select the best translation from the pool of the n-best outputs of all the member systems. [sent-147, score-0.913]
38 Let H(u(λ*t)) (or Ht for short) be the set of the n-best translation candidates produced by the t-th member system u(λ*t), and H(v) be the union set of all Ht (i. [sent-148, score-0.859]
39 The final translation is generated from H(v) based on the following scoring function: e* = argmax_{e ∈ H(v)} Σ_{t=1..T} βt · φt(e) + ψ(e, H(v)) (8) where φt(e) is the log-scaled model score of e in the t-th member system, and βt is the corresponding feature weight. [sent-151, score-0.723]
40 In this case, we can still calculate the model score of e in any other member system, since all the member systems are based on the same model and share the same feature space. [sent-153, score-0.829]
41 ψ(e, H(v)) is a consensus-based scoring function which has been successfully adopted in SMT system combination (Duan et al. [sent-154, score-0.240]
42 ψ(e, H(v)) = Σn θn+ · hn+(e, H(v)) + Σn θn− · hn−(e, H(v)) (9) For each order of n-gram, hn+(e, H(v)) and hn−(e, H(v)) are defined to measure the n-gram agreement and disagreement between e and the other translation candidates in H(v), respectively. [sent-158, score-0.366]
43 If p orders of n-gram are used in computing ψ(e, H(v)), the total number of features in the system combination is T + 2p (T model-score-based features defined in Equation 8 and 2p consensus-based features defined in Equation 9). [sent-163, score-0.240]
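The sentence-level combination of Equations 8 and 9 can be sketched as follows, under simplifying assumptions: all θn+ and θn− weights are fixed to 1.0 instead of being tuned, and `candidates` is a hypothetical mapping from each hypothesis to its per-member log model scores.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def consensus(e, pool, orders=(1, 2, 3, 4)):
    """psi(e, H(v)): n-gram agreement minus disagreement between e and
    the other candidates in the pool (all theta weights fixed to 1.0)."""
    e_toks = e.split()
    others = [h.split() for h in pool if h != e]
    score = 0.0
    for n in orders:
        e_grams = ngrams(e_toks, n)
        for o in others:
            o_grams = ngrams(o, n)
            agree = sum((e_grams & o_grams).values())     # h_n^+
            disagree = sum((e_grams - o_grams).values())  # h_n^-
            score += agree - disagree
    return score

def combine(candidates, betas):
    """Equation 8: pick the candidate maximizing the beta-weighted sum of
    its per-member model scores plus the consensus term.
    `candidates` maps hypothesis -> [log model score in each member system]."""
    pool = list(candidates)
    def total(e):
        model = sum(b * s for b, s in zip(betas, candidates[e]))
        return model + consensus(e, pool)
    return max(pool, key=total)
```

Candidates that agree with many other pool members on their n-grams get a consensus bonus, so an output shared across member systems tends to win even when no single model score is best.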
44 4 Optimization If implemented naively, the final translation system will be very slow. [sent-165, score-0.785]
45 For a given input sentence, each member system has to decode it individually, and the translation speed is inversely proportional to the number of member systems generated by our method. [sent-166, score-1.35]
46 A simple solution is to run member systems in parallel when translating a new sentence. [sent-168, score-0.45]
47 Since all the member systems share the same data resources, such as the language model and translation table, we only need to keep one copy of the required resources in memory. [sent-169, score-0.754]
48 The translation speed then depends only on the computing power of the parallel computation environment, such as the number of CPUs. [sent-170, score-0.409]
49 Furthermore, we can use joint decoding techniques to save the computation of the equivalent translation hypotheses among member systems. [sent-171, score-0.81]
50 In joint decoding of member systems, the search space is structured as a translation hypergraph where the member systems can share their translation hypotheses. [sent-172, score-1.486]
51 If more than one member system shares the same translation hypothesis, we need to compute the corresponding feature values only once, instead of repeating the computation in the individual decoders. [sent-173, score-0.796]
52 In our experiments, we find that over 60% of translation hypotheses can be shared among member systems when the number of member systems is over 4. [sent-174, score-1.24]
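The saving from shared hypotheses can be sketched as a feature-computation cache keyed by the hypothesis string; the feature function and the member n-best lists below are hypothetical stand-ins, and the full hypergraph-based joint decoding is far richer than this.

```python
def make_shared_scorer(feature_fn):
    """Compute expensive feature values once per distinct hypothesis,
    no matter how many member systems propose it."""
    cache = {}
    calls = {"n": 0}          # count real feature computations
    def score(hyp):
        if hyp not in cache:
            calls["n"] += 1
            cache[hyp] = feature_fn(hyp)
        return cache[hyp]
    return score, calls

# Hypothetical member-system n-best outputs with heavy overlap
member_nbests = [
    ["the cat sat", "a cat sat"],
    ["the cat sat", "the cat sits"],
    ["the cat sat", "a cat sat"],
]
score, calls = make_shared_scorer(lambda h: len(h.split()))
for nbest in member_nbests:
    for hyp in nbest:
        score(hyp)
# 6 hypotheses proposed, but only 3 distinct feature computations
```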
53 Another method to speed up the system is to accelerate n-gram language model with n-gram caching techniques. [sent-176, score-0.256]
54 As the translation speed of an SMT system depends heavily on the computation of the n-gram language model, accelerating the language model generally leads to a substantial speed-up of the whole SMT system. [sent-180, score-0.523]
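A minimal n-gram caching sketch: wrapping a (hypothetical) backing probability lookup with a memoizing cache, so repeated n-grams across hypotheses and member systems hit the cache instead of the slow model.

```python
from functools import lru_cache

class CachedLM:
    """Wrap an n-gram LM lookup with a cache: repeated n-grams across
    hypotheses hit the cache instead of the (slow) backing model."""
    def __init__(self, raw_prob):
        self.lookups = 0
        @lru_cache(maxsize=100_000)
        def prob(ngram):
            self.lookups += 1          # count real backing-model queries
            return raw_prob(ngram)
        self.prob = prob

    def score(self, tokens, order=3):
        """Sum of log-probabilities of all order-grams, with <s> padding."""
        padded = ["<s>"] * (order - 1) + tokens
        total = 0.0
        for i in range(order - 1, len(padded)):
            total += self.prob(tuple(padded[i - order + 1:i + 1]))
        return total
```

Scoring the same sentence twice triggers no new backing-model lookups the second time, which is where the speed-up comes from when many candidates share n-grams.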
55 5 Experiments Our experiments are conducted on Chinese-to-English translation in three SMT systems. [sent-182, score-0.304]
56 1 Baseline Systems The first SMT system is a phrase-based system with two reordering models including the maximum entropy-based lexicalized reordering model proposed by Xiong et al. [sent-184, score-0.344]
57 The second SMT system is an in-house reimplementation of the Hiero system which is based on the hierarchical phrase-based model proposed by Chiang (2005). [sent-187, score-0.275]
58 The third SMT system is a syntax-based system based on the string-to-tree model (Galley et al. [sent-188, score-0.228]
59 , 2009) is performed on each translation rule for the CKY-style decoding. [sent-193, score-0.304]
60 In this work, baseline system refers to the system produced by the boosting-based system combination when the number of iterations (i. [sent-194, score-0.624]
61 To obtain satisfactory baseline performance, we train each SMT system five times using MERT with different initial values of feature weights to generate a group of baseline candidates, and then select the best-performing one from this group as the final baseline system (i. [sent-197, score-0.554]
62 A 5-gram language model is trained on the target side. (Footnote 4: Our in-house experimental results show that this system performs slightly better than Moses on Chinese-to-English translation tasks.) [sent-204, score-0.418]
63 The data set used for weight training in boosting-based system combination comes from the NIST MT03 evaluation set. [sent-207, score-0.352]
64 The translation quality is evaluated in terms of case-insensitive NIST version BLEU metric. [sent-210, score-0.304]
65 The n-gram consensus-based features (in Equation 9) used in system combination range from unigram to 4-gram. [sent-215, score-0.24]
66 3 Evaluation of Translations First we investigate the effectiveness of the boosting-based system combination on the three systems. [sent-217, score-0.24]
67 Figures 2-5 show the BLEU curves on the development and test sets, where the X-axis is the iteration number, and the Y-axis is the BLEU score of the system generated by the boosting-based system combination. [sent-218, score-0.473]
68 After 5, 7 and 8 iterations, relatively stable improvements are achieved by the phrase-based system, the Hiero system and the syntax-based system, respectively. [sent-222, score-0.238]
69 Figures 2-5 also show that the boosting-based system combination seems to be more helpful to the phrase-based system than to the Hiero system and the syntax-based system. [sent-224, score-0.468]
70 For comparison, we show the performance of the baseline systems with an n-best list size of 600 (Baseline+600best in Table 1), which equals the maximum number of translation candidates accessed in the final combination system (combining 30 member systems, i. [sent-249, score-1.169]
71 These results indicate that the SMT systems benefit more from the diversified outputs of member systems than from larger n-best lists produced by a single system. [sent-256, score-0.748]
72 4 Diversity among Member Systems We also study the change of diversity among the outputs of member systems during iterations. [sent-258, score-0.72]
73 In this work, the TER score for a given group of member systems is calculated by averaging the TER scores between the outputs of each pair of member systems in this group. [sent-262, score-0.997]
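The group diversity measure above can be sketched as an average of pairwise TER-like scores over all pairs of member outputs. This is a simplification assumed for illustration: a plain word-level edit distance stands in for full TER, which additionally allows block shifts.

```python
from itertools import combinations

def edit_distance(a, b):
    """Word-level Levenshtein distance (a simplified TER without shifts)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[m][n]

def avg_pairwise_ter(outputs):
    """Diversity of a group of member systems: the TER-like distance,
    averaged over all pairs of their 1-best outputs."""
    pairs = list(combinations(outputs, 2))
    total = 0.0
    for a, b in pairs:
        a_toks, b_toks = a.split(), b.split()
        total += edit_distance(a_toks, b_toks) / max(len(b_toks), 1)
    return total / len(pairs)
```

Identical outputs give a diversity of 0; the more the member systems disagree, the higher the average pairwise score.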
74 Figures 6-9 show the curves of diversity on the development and test sets, where the X-axis is the iteration number, and the Y-axis is the diversity. [sent-263, score-0.342]
75 The points at iteration 1 stand for the diversities of baseline systems. [sent-264, score-0.277]
76 In this work, the baseline’s diversity is the TER score of the group of baseline candidates that are generated in advance (Section 5. [sent-265, score-0.415]
77 It indicates that our method is very effective at generating diversified member systems. [sent-268, score-0.584]
78 In addition, the diversities of the baseline systems (iteration 1) are much lower than those of the systems generated by boosting (iterations 2-30). [sent-269, score-0.529]
79 Together with the results shown in Figures 2-5, it confirms our motivation that the diversified translation outputs can lead to performance improvements over the baseline systems. [sent-270, score-0.668]
80 Also as shown in Figures 6-9, the diversity of the Hiero system is much lower than that of the phrase-based and syntax-based systems at each individual setting of iteration number. [sent-271, score-0.527]
81 This interesting finding supports the observation that the performance of the Hiero system is relatively more stable than the other two systems as shown in Figures 2-5. [sent-272, score-0.244]
82 The relative lack of diversity in the Hiero system might be due to the spurious ambiguity in Hiero derivations, which generally results in very few distinct translations in the translation outputs (Chiang, 2007). [sent-273, score-0.76]
83 5 Evaluation of Oracle Translations In this set of experiments, we evaluate the oracle performance on the n-best lists of the baseline systems and the combined systems generated by boosting-based system combination. [sent-275, score-0.454]
84 Table 2 shows the results, where Baseline+600best stands for the top-600 translation candidates generated by the baseline systems, and Boosting-30iterations stands for the ensemble of 30 member systems’ top-20 translation candidates. [sent-277, score-1.259]
85 This result indicates that our method can provide much "better" translation candidates for system combination than naively enlarging the n-best list. [sent-279, score-0.647]
86 However, most of the previous work did not study the issue of how to improve a single SMT engine using boosting algorithms. [sent-287, score-0.295]
87 To our knowledge, the only work addressing this issue is that of Lagarda and Casacuberta (2008), in which the boosting algorithm was adopted in phrase-based SMT. [sent-288, score-0.248]
88 There are also some other studies on building diverse translation systems from a single translation engine for system combination. [sent-291, score-0.84]
89 They empirically showed that diverse translation systems could be generated by changing parameters at early stages of the training procedure. [sent-293, score-0.415]
90 (2009) proposed a feature subspace method to build a group of translation systems from various different sub-models of an existing SMT system. [sent-295, score-0.491]
91 In this work, we use a sentence-level system combination method to generate final translations. [sent-306, score-0.281]
92 Another issue is how to determine an appropriate number of iterations for boosting-based system combination. [sent-308, score-0.241]
93 Our empirical study shows that stable and satisfactory improvements can be achieved after 6-8 iterations, while the largest improvements are achieved after 20 iterations. [sent-310, score-0.231]
94 In our future work, we will study in-depth principled ways to determine the appropriate number of iterations for boosting-based system combination. [sent-311, score-0.236]
95 8 Conclusions We have proposed a boosting-based system combination method to address the issue of building a strong translation system from a group of weak translation systems generated from a single SMT engine. [sent-312, score-1.302]
96 The experimental results show that our method is very effective at improving the translation accuracy of the SMT systems. [sent-314, score-0.345]
97 Scalable inferences and training of context-rich syntax translation models. [sent-370, score-0.304]
98 Combination of machine translation systems via hypothesis selection from combined n-best lists. [sent-381, score-0.407]
99 SPMT: Statistical machine translation with syntactified target language phrases. [sent-426, score-0.304]
100 Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment. [sent-431, score-0.755]
wordName wordTfidf (topN-words)
[('member', 0.379), ('smt', 0.354), ('translation', 0.304), ('bleu', 0.256), ('diversity', 0.207), ('boosting', 0.205), ('diversified', 0.164), ('iteration', 0.135), ('schapire', 0.132), ('combination', 0.126), ('wbleu', 0.117), ('system', 0.114), ('weak', 0.111), ('duan', 0.111), ('hiero', 0.106), ('ensemble', 0.098), ('hildebrand', 0.093), ('iterations', 0.084), ('lagarda', 0.075), ('ut', 0.074), ('li', 0.072), ('baseline', 0.072), ('translations', 0.072), ('freund', 0.071), ('hm', 0.071), ('systems', 0.071), ('adaboost', 0.07), ('boostingbased', 0.07), ('diversities', 0.07), ('macherey', 0.066), ('mert', 0.065), ('improvements', 0.065), ('speed', 0.063), ('outputs', 0.063), ('candidates', 0.062), ('stable', 0.059), ('xiao', 0.059), ('figures', 0.058), ('reordering', 0.058), ('nist', 0.058), ('chiang', 0.056), ('spmt', 0.056), ('galley', 0.055), ('oracle', 0.054), ('ri', 0.052), ('och', 0.051), ('ter', 0.051), ('casacuberta', 0.05), ('decoding', 0.049), ('dt', 0.049), ('cache', 0.048), ('binarization', 0.048), ('hierarchical', 0.047), ('engine', 0.047), ('elb', 0.047), ('jingbo', 0.047), ('matusov', 0.047), ('redesigned', 0.047), ('rosti', 0.047), ('rudin', 0.047), ('emnlp', 0.045), ('mu', 0.044), ('issue', 0.043), ('tong', 0.042), ('satisfactory', 0.042), ('pr', 0.042), ('weight', 0.042), ('computation', 0.042), ('yoram', 0.041), ('accessed', 0.041), ('vogel', 0.041), ('hn', 0.041), ('synchronous', 0.041), ('dongdong', 0.041), ('ghkm', 0.041), ('subspace', 0.041), ('trainer', 0.041), ('method', 0.041), ('samples', 0.041), ('consensus', 0.04), ('ming', 0.04), ('generated', 0.04), ('zhu', 0.038), ('principled', 0.038), ('pages', 0.038), ('gains', 0.038), ('eij', 0.038), ('caching', 0.038), ('mbr', 0.038), ('string', 0.037), ('hypotheses', 0.036), ('robert', 0.035), ('liu', 0.035), ('group', 0.034), ('zhang', 0.033), ('update', 0.033), ('combined', 0.032), ('nan', 0.032), ('mh', 0.032), ('liang', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 54 acl-2010-Boosting-Based System Combination for Machine Translation
2 0.20679621 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering mod- els in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
3 0.20621015 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.20276614 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
Author: Marine Carpuat ; Yuval Marton ; Nizar Habash
Abstract: We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
6 0.19485503 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
7 0.1832197 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
8 0.17982808 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
9 0.17358588 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
10 0.16472766 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
11 0.15638012 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
12 0.15620758 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
13 0.15377918 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
14 0.15364327 243 acl-2010-Tree-Based and Forest-Based Translation
15 0.15320382 69 acl-2010-Constituency to Dependency Translation with Forests
16 0.15313166 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
17 0.15100689 133 acl-2010-Hierarchical Search for Word Alignment
18 0.15080123 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
19 0.14984843 56 acl-2010-Bridging SMT and TM with Translation Recommendation
20 0.13824473 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
topicId topicWeight
[(0, -0.27), (1, -0.306), (2, -0.089), (3, 0.015), (4, 0.067), (5, 0.025), (6, -0.076), (7, -0.018), (8, -0.105), (9, 0.078), (10, 0.204), (11, 0.17), (12, 0.114), (13, -0.082), (14, 0.064), (15, 0.035), (16, -0.059), (17, 0.072), (18, -0.128), (19, 0.023), (20, -0.028), (21, -0.021), (22, -0.017), (23, -0.016), (24, -0.082), (25, 0.004), (26, 0.06), (27, 0.036), (28, 0.019), (29, 0.059), (30, 0.096), (31, -0.007), (32, 0.019), (33, 0.09), (34, 0.002), (35, -0.002), (36, 0.08), (37, 0.001), (38, 0.042), (39, 0.01), (40, 0.0), (41, 0.004), (42, -0.003), (43, -0.004), (44, -0.023), (45, 0.029), (46, -0.018), (47, -0.016), (48, -0.045), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.9758845 54 acl-2010-Boosting-Based System Combination for Machine Translation
2 0.88212907 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta
Abstract: This work deals with the application of confidence measures within an interactivepredictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.
3 0.82058179 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.77874058 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
5 0.77285701 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
6 0.7692517 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
7 0.76310503 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
8 0.71848017 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
9 0.70179564 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
10 0.66394174 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
11 0.65620315 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
12 0.63912725 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
13 0.63572651 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
14 0.63541901 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
15 0.63469106 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
16 0.63372558 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
17 0.60375261 243 acl-2010-Tree-Based and Forest-Based Translation
18 0.59974009 104 acl-2010-Evaluating Machine Translations Using mNCD
19 0.57755333 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
20 0.5579325 69 acl-2010-Constituency to Dependency Translation with Forests
topicId topicWeight
[(14, 0.014), (25, 0.055), (39, 0.011), (42, 0.017), (44, 0.013), (59, 0.181), (73, 0.06), (78, 0.023), (80, 0.021), (83, 0.118), (84, 0.019), (90, 0.174), (95, 0.026), (98, 0.167)]
simIndex simValue paperId paperTitle
1 0.94273508 104 acl-2010-Evaluating Machine Translations Using mNCD
Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen
Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.
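The underlying NCD measure mentioned above has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The sketch below computes plain NCD with zlib as the compressor; it does not include the stemming and synonym matching that distinguish mNCD, and the example sentences are made up.

```python
import zlib

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings, using
    zlib-compressed length as the complexity estimate C(.).
    Lower values mean the strings are more similar."""
    cx = len(zlib.compress(x.encode("utf-8")))
    cy = len(zlib.compress(y.encode("utf-8")))
    cxy = len(zlib.compress((x + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)

# A hypothesis close to the reference should score lower (more similar)
# than an unrelated one.
ref = "the cat sat on the mat"
hyp_good = "the cat sat on a mat"
hyp_bad = "quantum chromodynamics is hard"
```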
2 0.91243517 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
Author: Bharat Ram Ambati
Abstract: Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, then the usefulness of these statistical systems improve a lot. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. We first describe the importance of this constraint considering Machine Translation systems which use dependency parser output, as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech.
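The constraint described above (a verb should not have multiple subjects or objects among its children) is easy to check on a parser's output. This is a minimal sketch, not one of the paper's two methods for enforcing the constraint; the arc representation and the 'subj'/'obj' labels are illustrative, not from a specific treebank.

```python
from collections import Counter

def violates_constraint(arcs):
    """Return True if any head has more than one child with the same
    core-argument label. `arcs` is a list of (head_index,
    dependent_index, label) triples (illustrative representation)."""
    counts = Counter((head, label) for head, _, label in arcs
                     if label in ("subj", "obj"))
    return any(c > 1 for c in counts.values())

# Token 2 is the verb: one subject and one object is fine; two
# subjects violates the constraint.
ok_tree = [(2, 1, "subj"), (2, 3, "obj")]
bad_tree = [(2, 1, "subj"), (2, 4, "subj"), (2, 3, "obj")]
```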
same-paper 3 0.89859056 54 acl-2010-Boosting-Based System Combination for Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.
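The iterative generation of weak systems above rests on a boosting-style reweighting of the training data: sentences the current weak system translates poorly gain weight when the next weak system is built. The sketch below shows a generic AdaBoost-flavored update with a per-sentence loss in [0, 1] (e.g. 1 - sentence-level BLEU); it is not the paper's redesigned update, and the loss values are made up.

```python
import math

def boost_weights(losses, weights):
    """One boosting-style reweighting step. `losses[i]` in [0, 1] is
    how badly the current weak system handled sentence i; sentences
    with higher loss get proportionally more weight for the next
    iteration. Generic AdaBoost-flavored sketch."""
    avg = sum(w * l for w, l in zip(weights, losses))
    beta = 0.5 * math.log((1 - avg) / max(avg, 1e-9))
    new = [w * math.exp(beta * l) for w, l in zip(weights, losses)]
    z = sum(new)                      # renormalize to a distribution
    return [w / z for w in new]

# Sentence 0 was translated worst, so its weight should grow.
w0 = [0.25, 0.25, 0.25, 0.25]
w1 = boost_weights([0.9, 0.1, 0.1, 0.1], w0)
```

The strong system would then be built by combining the outputs of the weak systems trained under these successive weightings.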
4 0.83995926 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudowords as an evaluation framework for selectional preferences. While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudoword creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
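The evaluation procedure described above (pair the real argument with a confounder and ask the model to pick the original) can be sketched as follows. The scorer interface, the toy co-occurrence counts, and the confounder list are all illustrative assumptions; the paper's contribution concerns how confounders are chosen, which this sketch does not model.

```python
import random

def pseudo_word_accuracy(model_score, test_pairs, confounders, seed=0):
    """For each (verb, argument) pair, draw a confounder and count the
    model correct if it scores the real argument higher.
    `model_score(verb, arg)` is any selectional-preference scorer
    (hypothetical interface)."""
    rng = random.Random(seed)
    correct = 0
    for verb, arg in test_pairs:
        fake = rng.choice(confounders)
        if model_score(verb, arg) > model_score(verb, fake):
            correct += 1
    return correct / len(test_pairs)

# Toy scorer from co-occurrence counts (illustrative only).
counts = {("drive", "car"): 10, ("drive", "truck"): 5}
score = lambda v, a: counts.get((v, a), 0)
```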
5 0.83923626 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
6 0.83699965 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
7 0.83682656 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
8 0.83412009 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
9 0.83290339 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
10 0.83244419 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
11 0.83164954 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
12 0.83114946 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
13 0.83077431 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
14 0.8301782 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
15 0.83011806 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
16 0.82782513 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
17 0.82732236 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
18 0.82688624 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
19 0.82489252 114 acl-2010-Faster Parsing by Supertagger Adaptation
20 0.82453936 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging