acl acl2010 acl2010-56 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. [sent-3, score-0.396]
2 The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. [sent-4, score-0.209]
3 Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels. [sent-11, score-0.356]
4 1 Introduction Recent years have witnessed rapid developments in statistical machine translation (SMT), with considerable improvements in translation quality. [sent-12, score-0.202]
5 However, to date most of the research has focused on better confidence measures for MT, e. [sent-22, score-0.192]
6 based on training regression models to perform confidence estimation on scores assigned by post-editors (cf. [sent-24, score-0.317]
7 Given that most postediting work is (still) based on TM output, we propose to recommend MT outputs which are better than TM hits to post-editors. [sent-27, score-0.199]
8 In this framework, post-editors still work with the TM while benefiting from (better) SMT outputs; the assets in TMs are not wasted and TM fuzzy match scores can still be used to estimate (the upper bound of) postediting labor. [sent-28, score-0.478]
9 Firstly, the recommendation should have high precision, otherwise it would be confusing for post-editors and may negatively affect the lower bound of the postediting effort. [sent-30, score-0.289]
10 Finally, post-editors should be able to easily adjust the recommendation threshold to particular requirements without having to retrain the model. [sent-34, score-0.244]
11 The first strand is confidence estimation for MT, initiated by (Ueffing et al. [sent-42, score-0.3]
12 The former experimented with confidence estimation with several different learning algorithms; the latter uses word-level confidence measures to determine whether a particular translation choice should be accepted or rejected in an interactive translation system. [sent-48, score-0.634]
13 The second strand of research focuses on combining TM information with an SMT system, so that the SMT system can produce better target language output when there is an exact or close match in the TM (Simard and Isabelle, 2009). [sent-49, score-0.262]
14 A third strand of research tries to incorporate confidence measures into a post-editing environment. [sent-51, score-0.252]
15 Instead of modeling on translation quality (often measured by automatic evaluation scores), this research uses regression on both the automatic scores and scores assigned by post-editors. [sent-54, score-0.262]
16 , 2009b), which applies Inductive Confidence Machines and a larger set of features to model post-editors’ judgement of the translation quality between ‘good’ and ‘bad’, or among three levels of post-editing effort. [sent-56, score-0.179]
17 However, we use outputs and features from the TM explicitly; therefore instead of having to solve a regression problem, we only have to solve a much easier binary prediction problem which can be integrated into TMs in a straightforward manner. [sent-58, score-0.209]
18 Because of this, the precision and recall scores reported in this paper are not directly comparable to those in (Specia et al. [sent-59, score-0.188]
19 4 Translation Recommendation as Binary Classification We use an SVM binary classifier to predict the relative quality of the SMT output to make a recommendation. [sent-69, score-0.144]
20 The SVM classifier uses features from the SMT system, the TM and additional linguistic features to estimate whether the SMT output is better than the hit from the TM. [sent-70, score-0.273]
21 1 Problem Formulation As we treat translation recommendation as a binary classification problem, we have a pair of outputs from TM and MT for each sentence. [sent-72, score-0.497]
22 Ideally the classifier will recommend the output that needs less post-editing effort. [sent-73, score-0.107]
23 2 Recommendation Confidence Estimation In classical settings involving SVMs, confidence levels are represented as margins of binary predictions. [sent-82, score-0.315]
24 What is more preferable is a probabilistic confidence score (e. [sent-84, score-0.236]
25 , 2007) to obtain the posterior probability of a classification, which is used as the confidence score in our system. [sent-88, score-0.278]
26 2 The TM Feature The TM feature is the fuzzy match (Sikes, 2007) cost of the TM hit. [sent-100, score-0.417]
27 The calculation of fuzzy match score itself is one of the core technologies in TM systems and varies among different vendors. [sent-101, score-0.425]
28 We compute fuzzy match cost as the minimum Edit Distance (Levenshtein, 1966) between the source and TM entry, normalized by the length of the source as in (6), as most of the current implementations are based on edit distance while allowing some additional flexible matching. [sent-102, score-0.56]
29 For fuzzy match scores F, this fuzzy match cost hfm roughly corresponds to 1−F. [sent-104, score-0.895]
30 … classification, and allows direct comparison between a pure TM system and a translation recommendation system in Section 5. [sent-106, score-0.401]
31 3 System-Independent Features We use several features that are independent of the translation system, which are useful when a third-party translation service is used or the MT system is simply treated as a black-box. [sent-111, score-0.276]
32 These features are source and target side LM scores, pseudo source fuzzy match scores and IBM model 1 scores. [sent-112, score-0.551]
33 The inputs that have lower perplexity or higher LM score are more similar to the dataset on which the SMT system is built. [sent-115, score-0.129]
34 Language model perplexity of the MT outputs is calculated, and LM probability is already part of the MT system's scores. [sent-118, score-0.117]
35 LM scores on TM outputs are also computed, though they are not as informative as scores on the MT side, since TM outputs should be grammatically perfect. [sent-119, score-0.266]
36 We compute the fuzzy match score between the original source sentence and this pseudo-source. [sent-122, score-0.448]
37 Therefore, the fuzzy match score here gives an estimation of the confidence level of the output. [sent-124, score-0.665]
38 The fuzzy match score does not measure whether the hit could be a correct translation, i. [sent-127, score-0.525]
39 , 1993) serves as a rough estimation of how good a translation it is on the word level; for the MT output, on the other hand, it is a black-box feature to estimate translation quality when the information from the translation model is not available. [sent-131, score-0.383]
40 1 Experimental Settings Our raw data set is an English–French translation memory with technical translation from Symantec, consisting of 51K sentence pairs. [sent-134, score-0.202]
41 2 The Evaluation Metrics We measure the quality of the classification by precision and recall. [sent-150, score-0.165]
42 Let A be the set of recommended MT outputs, and B be the set of MT outputs that have lower TER than TM hits. [sent-151, score-0.181]
43 We standardly define precision P, recall R and F-value as in (7). (Footnote 1: More specifically, we performed 5 iterations of Model 1, 5 iterations of HMM, 3 iterations of Model 3, and 3 iterations of Model 4.) [sent-152, score-0.216]
44 3 Recommendation Results In Table 1, we report recommendation performance using MT and TM system features (SYS), system features plus system-independent features (ALL:SYS+SI), and system-independent features only (SI). [sent-154, score-0.484]
45 From Table 1, we observe that MT and TM system-internal features are very useful for producing a stable (as indicated by the smaller confidence interval) recommendation system (SYS). [sent-185, score-0.51]
46 This indicates that at the default confidence level, current system-external (resp. [sent-191, score-0.192]
47 2 that combining both system-internal and external features can yield higher, more stable precision when adjusting the confidence levels of the classifier. [sent-196, score-0.359]
48 Additionally, the performance of system SI is promising given the fact that we are using only a limited number of simple features, which demonstrates a good prospect of applying our recommendation system to MT systems where we do not have access to their internal features. [sent-197, score-0.3]
49 4 Further Improving Recommendation Precision Table 1 shows that classification recall is very high, which suggests that precision can still be improved, even though the F-score is not low. [sent-199, score-0.176]
50 Considering that TM is the dominant technology used by post-editors, a recommendation to replace the hit from the TM would require more confidence, i. [sent-200, score-0.344]
51 9 precision at the cost of some recall, if necessary. [sent-204, score-0.129]
52 1 Classifier Margins We experiment with different margins on the training data to tune precision and recall in order to obtain a desired balance. [sent-208, score-0.249]
53 We try to achieve higher precision by enforcing a larger bias towards negative examples in the training set so that some borderline positive instances would actually be labeled as negative, and the classifier would have higher precision in the prediction stage as in (8). [sent-211, score-0.284]
54 05, other configurations all obtain higher precision than TER + 0. [sent-266, score-0.136]
55 85 precision without a big sacrifice in recall with b=0. [sent-268, score-0.175]
56 This is one limitation of using biased margins to obtain high precision. [sent-273, score-0.122]
57 2 Adjusting Confidence Levels An alternative to using a biased margin is to output a confidence score during prediction and to threshold on the confidence score. [sent-279, score-0.533]
58 We use the SVM confidence estimation techniques in Section 4. [sent-281, score-0.24]
59 2 to obtain the confidence level of the recommendation, and change the confidence threshold for recommendation when necessary. [sent-282, score-0.65]
60 In a TM environment, some users simply ignore TM hits below a certain fuzzy match score F (usually from 0. [sent-284, score-0.472]
61 This fuzzy match score reflects the confidence of recommending the TM hits. [sent-287, score-0.715]
62 To obtain the confidence of recommending an SMT output, our baseline (FM) uses fuzzy match costs hFM ≈ 1−F (cf. [sent-288, score-0.674]
63 In other words, the higher the fuzzy match cost of the TM hit is (lower fuzzy match score), the higher the confidence of recommending the SMT output. [sent-295, score-1.211]
64 Figure 1: Precision Changes with Confidence Level. Figure 1 shows that the precision curve of FM is low and flat when the fuzzy match costs are low (from 0 to 0. [sent-297, score-0.474]
65 6), indicating that it is unwise to recommend an SMT output when the TM hit has a low fuzzy match cost (corresponding to higher fuzzy match score, from 0. [sent-298, score-0.996]
66 We also observe that the precision of the recommendation receives a boost when the fuzzy match costs for the TM hits are above 0. [sent-300, score-0.765]
67 3), indicating that SMT output should be recommended when the TM hit has a high fuzzy match cost (low fuzzy match score). [sent-302, score-1.049]
68 With this boost, the precision of the baseline system can reach 0. [sent-303, score-0.121]
69 85, demonstrating that a proper thresholding of fuzzy match scores can be used effectively to discriminate the recommendation of the TM hit from the recommendation of the SMT output. [sent-304, score-1.021]
70 For example, an excellent SMT output should be recommended even if there exists a good TM hit (e. [sent-306, score-0.251]
71 On the other hand, a misleading SMT output should not be recommended if there exists a poor but useful TM match (e. [sent-310, score-0.274]
72 Figure 1 shows that both the SYS and the ALL setting consistently outperform FM, indicating that our classification scheme can better integrate the MT output into the TM system than this naive baseline. [sent-315, score-0.119]
73 The SI feature set does not perform well when the confidence level is set above 0. [sent-316, score-0.192]
74 However, when the requirement on precision is not that high, and the MT-internal features are not available, it would still be desirable to obtain translation recommendations with these black-box features. [sent-320, score-0.296]
75 Note that our system will return the TM entry when there is an exact match, so the overall precision of the system is above the precision score we set here in a mature TM environment, as a significant portion of the material to be translated will have a complete match in the TM system. [sent-345, score-0.409]
76 In Table 3 for MODEL@ K, the recall scores are achieved when the prediction precision is better than K with 0. [sent-346, score-0.214]
77 However, if we want to demand further recommendation precision (more conservative in recommending SMT output), the recall level will begin to drop more quickly. [sent-350, score-0.459]
78 If we use only system-independent features (SI), we cannot achieve as high precision as with other models even if we sacrifice more recall. [sent-351, score-0.178]
79 3 we suggested three sets of system-independent features: features based on the source- and target-side language model (LM), the IBM Model 1 (M1) and the fuzzy match scores on pseudo-source (PS). [sent-399, score-0.479]
80 In sum, all the three sets of system-independent features improve the precision and F-scores of the MT and TM system features. [sent-401, score-0.167]
81 The improvement is not significant, but improvement on every set of system-independent features gives some credit to the capability of SI features, as does the fact that SI features perform close to SYS features in Table 1. [sent-402, score-0.138]
82 6 Analysis of Post-Editing Effort A natural question on the integration models is whether the classification reduces the effort of the translators and post-editors: after reading these recommendations, will they translate/edit less than they would otherwise have to? [sent-403, score-0.107]
83 As we have not yet conducted a manual post-editing experiment, we conduct two sets of analyses, trying to show which type of edits will be required for different recommendation confidence levels. [sent-405, score-0.511]
84 the instances in which MT output is recommended over the TM hit) are given in Table 5. [sent-412, score-0.151]
85 When an MT output is recommended, its TM counterpart will require a larger average number of total edits than the MT output, as we expect. [sent-413, score-0.126]
86 In this case, the recommended MT output actually saves more effort for the editors than what is shown by the TER score. [sent-415, score-0.21]
87 the instances in which MT output is not recommended over the TM hit). [sent-419, score-0.151]
88 In this case, the MT output requires considerably more edits than the TM hits in terms of all four TER edit types, i. [sent-420, score-0.27]
89 2 Edit Statistics on Recommendations of Higher Confidence We present the edit statistics of recommendations with higher confidence in Table 7. [sent-425, score-0.372]
90 Comparing Tables 5 and 7, we see that if recommended with higher confidence, the MT output will need substantially fewer edits than the TM output: e. [sent-426, score-0.247]
91 From the characteristics of the high confidence recommendations, we suspect that these mainly comprise harder to translate (i. [sent-430, score-0.192]
92 By providing them with the TM output, the MT output and the one recommended to edit, we can measure the true accuracy of our recommendation, as well as the post-editing time we save for the post-editors; • Apply the presented method on open domain data and evaluate it using crowdsourcing. [sent-512, score-0.151]
93 In so doing we handle the problem of MT quality estimation as binary prediction instead of regression. [sent-516, score-0.137]
94 We explore features from inside the MT system, from the TM, as well as features that make no assumption on the translation model for the binary classification. [sent-519, score-0.224]
95 89 recall, and even higher precision if we sacrifice more recall. [sent-523, score-0.153]
96 We present results to show that, if measured by number, type and content of edits in TER, the recommended sentences produced by the classification model would bring about less post-editing effort than the TM outputs. [sent-526, score-0.25]
97 A user study can serve two purposes: 1) it can validate the effectiveness of the method by measuring the amount of edit effort it saves; and 2) the byproduct of the user study post-edited sentences can be used to generate HTER scores to train a better recommendation model. [sent-530, score-0.428]
98 Fast, cheap, and creative: Evaluating translation quality using Amazon’s Mechanical Turk. [sent-550, score-0.133]
99 A study of translation edit rate with targeted human annotation. [sent-615, score-0.198]
100 Application of word-level confidence measures in interactive statistical machine translation. [sent-631, score-0.192]
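Illustrative sketch (not from the paper): the summary above describes the fuzzy match cost as the minimum edit distance between the source and a TM entry, normalized by source length, and defines precision and recall over the set A of recommended MT outputs and the set B of MT outputs with lower TER than the TM hit. The Python below is one possible reading of those two definitions. Whitespace tokenization, the function names, and the F-value as the harmonic mean of P and R are assumptions made here for illustration; commercial TM systems compute fuzzy match scores in vendor-specific ways.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            sub = 0 if x == y else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + sub))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_cost(source, tm_sources):
    """Minimum token-level edit distance to any TM entry, divided by
    the source length. Whitespace tokenization is an assumption."""
    src = source.split()
    if not src or not tm_sources:
        raise ValueError("need a non-empty source and at least one TM entry")
    return min(edit_distance(src, e.split()) for e in tm_sources) / len(src)


def precision_recall_f(recommended, better):
    """P, R and F for A = recommended MT outputs and
    B = MT outputs that have lower TER than the TM hit."""
    a, b = set(recommended), set(better)
    hits = len(a & b)
    p = hits / len(a) if a else 0.0
    r = hits / len(b) if b else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


if __name__ == "__main__":
    tm = ["click the cancel button", "open the file"]
    print(fuzzy_match_cost("click the save button", tm))
    print(precision_recall_f({1, 2, 3, 4}, {2, 3, 4, 5, 6}))
```

Under these assumptions a cost of 0 corresponds to an exact TM match, and the FM baseline described above would recommend the SMT output only when this cost exceeds a chosen threshold.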
wordName wordTfidf (topN-words)
[('tm', 0.696), ('fuzzy', 0.258), ('recommendation', 0.244), ('mt', 0.215), ('confidence', 0.192), ('smt', 0.182), ('ter', 0.125), ('match', 0.123), ('sys', 0.12), ('tms', 0.105), ('translation', 0.101), ('hit', 0.1), ('recommended', 0.1), ('edit', 0.097), ('precision', 0.093), ('outputs', 0.081), ('specia', 0.079), ('recommending', 0.079), ('edits', 0.075), ('margins', 0.072), ('lm', 0.064), ('si', 0.063), ('strand', 0.06), ('scores', 0.052), ('output', 0.051), ('svms', 0.051), ('estimation', 0.048), ('hits', 0.047), ('features', 0.046), ('hfm', 0.045), ('postediting', 0.045), ('svm', 0.045), ('score', 0.044), ('recall', 0.043), ('rbf', 0.043), ('summit', 0.042), ('classification', 0.04), ('sacrifice', 0.039), ('snover', 0.039), ('ueffing', 0.038), ('rr', 0.038), ('perplexity', 0.036), ('cost', 0.036), ('kernel', 0.035), ('effort', 0.035), ('recommendations', 0.034), ('platt', 0.034), ('substitution', 0.033), ('nicola', 0.032), ('translators', 0.032), ('deletion', 0.032), ('quality', 0.032), ('binary', 0.031), ('ibm', 0.031), ('xj', 0.03), ('mmtt', 0.03), ('simard', 0.03), ('symantec', 0.03), ('systemexternal', 0.03), ('classifier', 0.03), ('fm', 0.029), ('system', 0.028), ('biased', 0.028), ('adjusting', 0.028), ('statistics', 0.028), ('insertion', 0.028), ('professional', 0.027), ('recommend', 0.026), ('hter', 0.026), ('side', 0.026), ('prediction', 0.026), ('regression', 0.025), ('lucia', 0.024), ('saves', 0.024), ('xi', 0.024), ('source', 0.023), ('josef', 0.023), ('grid', 0.023), ('cortes', 0.023), ('localisation', 0.023), ('ottawa', 0.023), ('turchi', 0.023), ('environment', 0.022), ('obtain', 0.022), ('xii', 0.021), ('dublin', 0.021), ('libsvm', 0.021), ('ontario', 0.021), ('higher', 0.021), ('blatz', 0.02), ('posterior', 0.02), ('machines', 0.02), ('koehn', 0.02), ('settings', 0.02), ('pi', 0.02), ('iterations', 0.02), ('kneser', 0.02), ('tune', 0.019), ('reflects', 0.019), ('och', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
2 0.16829747 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
3 0.14984843 54 acl-2010-Boosting-Based System Combination for Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.
4 0.13373096 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao
Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.
5 0.13247541 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.1197743 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
7 0.11045263 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
8 0.10292864 204 acl-2010-Recommendation in Internet Forums and Blogs
9 0.10083904 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
10 0.099162675 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
11 0.090833813 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
12 0.086680867 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment
13 0.083519131 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
14 0.081634514 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
15 0.07876154 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
16 0.075559676 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
17 0.071293823 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
18 0.069598146 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
19 0.066407129 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
20 0.06480623 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
topicId topicWeight
[(0, -0.176), (1, -0.125), (2, -0.069), (3, -0.002), (4, 0.018), (5, 0.019), (6, -0.06), (7, -0.043), (8, -0.075), (9, 0.042), (10, 0.1), (11, 0.135), (12, 0.044), (13, -0.067), (14, -0.012), (15, 0.034), (16, -0.001), (17, 0.005), (18, -0.054), (19, 0.036), (20, -0.024), (21, -0.017), (22, 0.049), (23, 0.018), (24, 0.002), (25, -0.064), (26, 0.11), (27, 0.046), (28, -0.033), (29, 0.179), (30, 0.127), (31, 0.066), (32, 0.102), (33, 0.105), (34, -0.076), (35, 0.006), (36, -0.024), (37, -0.054), (38, -0.021), (39, -0.12), (40, -0.055), (41, 0.027), (42, 0.061), (43, 0.054), (44, 0.004), (45, 0.044), (46, 0.118), (47, -0.015), (48, 0.007), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.94827628 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
2 0.78136528 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta
Abstract: This work deals with the application of confidence measures within an interactive-predictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.
3 0.77414095 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.74753112 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
5 0.74218315 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
6 0.69363153 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
7 0.68309683 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.63562208 104 acl-2010-Evaluating Machine Translations Using mNCD
9 0.50618428 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
10 0.48664868 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
11 0.4407762 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
12 0.41864339 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
13 0.41800529 151 acl-2010-Intelligent Selection of Language Model Training Data
14 0.41638687 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
15 0.39840215 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
16 0.3900637 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
17 0.38147938 204 acl-2010-Recommendation in Internet Forums and Blogs
18 0.37873438 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
19 0.37273008 68 acl-2010-Conditional Random Fields for Word Hyphenation
20 0.35643268 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
topicId topicWeight
[(2, 0.023), (15, 0.176), (16, 0.029), (25, 0.028), (33, 0.012), (39, 0.022), (42, 0.025), (44, 0.02), (59, 0.112), (71, 0.019), (73, 0.087), (76, 0.013), (78, 0.036), (80, 0.011), (83, 0.119), (84, 0.013), (98, 0.143)]
simIndex simValue paperId paperTitle
same-paper 1 0.87104201 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
2 0.77574277 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
3 0.76884425 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.76864755 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
5 0.76690954 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
6 0.7632668 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
7 0.76304001 158 acl-2010-Latent Variable Models of Selectional Preference
8 0.76129317 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
9 0.75947332 39 acl-2010-Automatic Generation of Story Highlights
10 0.75786221 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
11 0.75773627 54 acl-2010-Boosting-Based System Combination for Machine Translation
12 0.75712764 85 acl-2010-Detecting Experiences from Weblogs
13 0.75556964 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
14 0.75540149 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
15 0.75458884 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
16 0.75450683 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
17 0.7544831 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
18 0.75404716 29 acl-2010-An Exact A* Method for Deciphering Letter-Substitution Ciphers
19 0.75359476 238 acl-2010-Towards Open-Domain Semantic Role Labeling
20 0.75307184 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields