acl acl2010 acl2010-102 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Automatic error detection is desired in the post-processing to improve machine translation quality. [sent-2, score-0.642]
2 The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. [sent-3, score-0.904]
3 We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. [sent-4, score-0.595]
4 We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. [sent-5, score-0.974]
5 1 Introduction Translation hypotheses generated by a statistical machine translation (SMT) system always contain both correct parts (e. [sent-9, score-0.431]
6 A common approach to SMT error detection at the word level is calculating the confidence at which a word is correct. [sent-19, score-0.766]
7 The majority of word confidence estimation methods follow three steps: 1) Calculate features that express the correctness of words either based on the SMT model (e. [sent-20, score-0.511]
8 Experimental results show that they are useful for error detection. [sent-36, score-0.224]
9 © 2010 Association for Computational Linguistics, pages 604–611. Using knowledge sources from outside SMT systems is desired for error detection. [sent-41, score-0.224]
10 It has already been proven effective in error detection for speech recognition (Shi and Zhou, 2005). [sent-43, score-0.454]
11 However, it is not widely used in SMT error detection. [sent-44, score-0.254]
12 The reason is probably that people have yet to find effective linguistic features that outperform nonlinguistic features such as word posterior probability features (Blatz et al. [sent-45, score-1.026]
13 In this paper, we would like to show an effective use of linguistic features in SMT error detection. [sent-48, score-0.469]
14 We integrate two sets of linguistic features into a maximum entropy (MaxEnt) model and develop a MaxEnt-based binary classifier to predict the category (correct or incorrect) for each word in a generated target sentence. [sent-49, score-0.475]
15 Our experimental results show that linguistic features substantially improve error detection and even outperform word posterior probability features. [sent-50, score-1.204]
16 Further, they can produce additional improvements when combined with word posterior probability features. [sent-51, score-0.505]
17 In Section 2, we review the previous work on word-level confidence estimation, which is used for error detection. [sent-53, score-0.531]
18 In Section 3, we introduce our linguistic features as well as the word posterior probability feature. [sent-54, score-0.75]
19 In Section 4, we elaborate our MaxEnt-based error detection model, which combines linguistic features and the word posterior probability feature. [sent-55, score-1.284]
20 2 Related Work In this section, we present an overview of confidence estimation (CE) for machine translation at the word level. [sent-58, score-0.522]
21 As we are only interested in error detection, we focus on work that uses confidence estimation approaches to detect translation errors. [sent-59, score-0.682]
22 Of course, confidence estimation is not limited to error detection; it can also be used in other scenarios, such as translation prediction in an interactive environment (Gandrabur and Foster, 2003). [sent-60, score-0.653]
23 (2003) investigate using neural networks and a naive Bayes classifier to combine various confidence features for confidence estimation at the word level as well as at the sentence level. [sent-62, score-0.726]
24 The features they use for word level CE include word posterior probabilities estimated from N-best lists, features based on SMT models, semantic features extracted from WordNet as well as simple syntactic features, i. [sent-63, score-1.036]
25 Among all these features, the word posterior probability is the most effective feature, which is much better than linguistic features such as semantic features, according to their final results. [sent-66, score-0.75]
26 Ueffing and Ney (2007) exhaustively explore various word-level confidence measures to label each word in a generated translation hypothesis as correct or incorrect. [sent-67, score-0.636]
27 All their measures are based on word posterior probabilities, which are estimated from 1) system output, such as word lattices or N-best lists and 2) word or phrase translation table. [sent-68, score-0.836]
28 Their experimental results show that word posterior probabilities directly estimated from phrase translation table are better than those from system output except for the Chinese-English language pair. [sent-69, score-0.676]
29 (2007) adopt a smoothed naive Bayes model to combine different word posterior probability based confidence features which are estimated from N-best lists, similar to (Ueffing and Ney, 2007). [sent-71, score-0.901]
30 (2009) study several confidence features based on mutual information between words and n-gram and backward n-gram language model for word-level and sentence-level CE. [sent-73, score-0.322]
31 They also explore linguistic features using information from syntactic category, tense, gender and so on. [sent-74, score-0.286]
32 Unfortunately, such linguistic features neither improve performance at the word level nor at the sentence level. [sent-75, score-0.309]
33 • • We exploit various linguistic features and sWhoew e txhpalto they are asb l ein to produce larger iamndprovements than widely used system-related features such as word posterior probabilities. [sent-77, score-0.828]
34 Yet another advantage of using linguistic features is that they are system-independent and can therefore be used across different systems. [sent-79, score-0.245]
35 We treat error detection as a complete binary classification problem. [sent-80, score-0.498]
36 1 Most previous approaches make decisions based on a pre-tuned classification threshold τ as follows: class = { correct, if Φ(correct, θ) > τ; incorrect, otherwise } where Φ is a classifier or a confidence measure and θ is the parameter set of Φ. [sent-82, score-0.288]
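The decision rule above can be sketched in a few lines of Python; the confidence scores below are hypothetical values for illustration, not output of any particular system.

```python
def classify(word, confidence, tau):
    """Threshold-based decision rule: a word is tagged 'correct' when its
    confidence score exceeds the pre-tuned threshold tau, else 'incorrect'."""
    return "correct" if confidence(word) > tau else "incorrect"

# Hypothetical confidence scores for illustration only.
scores = {"the": 0.9, "cat": 0.4}
labels = [classify(w, scores.get, tau=0.5) for w in ["the", "cat"]]
```

In practice τ is tuned on a development set, as the paper does when minimising CER.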
37 3 Features We explore two sets of linguistic features for each word in a machine generated translation hypothesis. [sent-84, score-0.535]
38 The first set of linguistic features are simple lexical features. [sent-85, score-0.245]
39 The second set of linguistic features are syntactic features, which are extracted from a link grammar parse. [sent-86, score-0.552]
40 To compare with the previously widely used features, we also investigate features based on word posterior probabilities. [sent-87, score-0.583]
41 The basic idea of using these features is that words in rare patterns are more likely to be incorrect than words in frequently occurring patterns. [sent-93, score-0.24]
42 2 Syntactic Features High-level linguistic knowledge such as syntactic information about a word is a very natural and promising indicator to decide whether this word is syntactically correct or not. [sent-96, score-0.35]
43 The challenge of using syntactic knowledge for error detection is that machine-generated hypotheses are rarely fully grammatical. [sent-105, score-0.622]
44 They mix grammatical and ungrammatical parts and hence are not friendly to traditional parsers trained on grammatical sentences, because the ungrammatical parts of a machine-generated sentence can lead to a parsing failure. [sent-106, score-0.214]
45 We hence straightforwardly define a syntactic feature for a word w according to its links as follows: link(w) = { yes, if w has links; no, otherwise }. In Figure 1 we show an example of a generated translation hypothesis with its link parse. [sent-117, score-0.627]
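A minimal sketch of this binary feature, assuming the parser output has already been reduced to a set of linked word-index pairs (the LG parser's actual output format differs):

```python
def link_feature(index, links):
    """Syntactic feature from the link parse: 'yes' if the word at this
    index participates in at least one link, 'no' otherwise."""
    return "yes" if any(index in pair for pair in links) else "no"

# Toy parse over a 4-word hypothesis: word 2 is left unlinked.
links = {(0, 1), (1, 3)}
features = [link_feature(i, links) for i in range(4)]
```

An unlinked word ("no") signals that the parser could not connect it grammatically to the rest of the sentence.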
46 3 Word Posterior Probability Features Our word posterior probability is calculated on Nbest list, which is first proposed by (Ueffing et al. [sent-123, score-0.533]
47 The major work of calculating word posterior probabilities is to find the Levenshtein alignment (Levenshtein, 1966) between the best hypothesis e1 and its competing hypothesis 3Available at http://www. [sent-128, score-0.715]
48 The word in the hypothesis en which ei1 is Levenshtein aligned to is denoted as ℓi (e1, en). [sent-135, score-0.24]
49 The word posterior probability of ei1 is then calculated by summing up the probabilities over all hypotheses containing ei1 in a position which is Levenshtein aligned to ei1. [sent-136, score-0.688]
50 pwpp(ei1) = Σ_{en : ℓi(e1, en) = ei1} p(en) / Σ_{n=1}^{N} p(en). To use the word posterior probability in our error detection model, we need to make it discrete. [sent-137, score-0.608]
51 We introduce a feature for a word w based on its word posterior probability as follows: dwpp(w) = ⌊−log(pwpp(w))/df⌋, where df is the discrete factor which can be set to 1, 0. [sent-138, score-0.649]
52 Therefore a feature "dwpp = 2" represents that the logarithm of the word posterior probability is between -3 and -2. 4 Error Detection with a Maximum Entropy Model As mentioned before, we consider error detection as a binary classification task. [sent-144, score-1.123]
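The posterior computation and its discretisation can be sketched as follows; this is a simplified illustration over a toy N-best list with plain hypothesis probabilities, not the authors' exact implementation.

```python
import math

def levenshtein_align(best, other):
    """Levenshtein alignment: for each position i of `best`, return the token
    of `other` it aligns to (None if best[i] aligns to a gap)."""
    m, n = len(best), len(other)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if best[i - 1] == other[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    align, i, j = [None] * m, m, n
    while i > 0 and j > 0:  # trace back one minimum-cost alignment
        cost = 0 if best[i - 1] == other[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + cost:
            align[i - 1] = other[j - 1]; i -= 1; j -= 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return align

def word_posterior(best, nbest):
    """pwpp(e_i): sum p(e_n) over hypotheses whose Levenshtein-aligned token
    equals e_i, normalised by total mass. `nbest` holds (tokens, prob) pairs."""
    total = sum(p for _, p in nbest)
    pwpp = [0.0] * len(best)
    for tokens, p in nbest:
        align = levenshtein_align(best, tokens)
        for i, w in enumerate(best):
            if align[i] == w:
                pwpp[i] += p
    return [s / total for s in pwpp]

def dwpp(p, df=1.0):
    """Discretised posterior feature: floor(-log(p) / df)."""
    return math.floor(-math.log(p) / df)

best = "he go home".split()
nbest = [(best, 0.5), ("he goes home".split(), 0.3), ("she go home".split(), 0.2)]
probs = word_posterior(best, nbest)
feats = [dwpp(p) for p in probs]
```

Here "home" appears (aligned) in all hypotheses, so its posterior is 1.0, while the words that competing hypotheses replace receive lower posteriors.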
53 To formalize this task, we use a feature vector ψ to represent a word w in question, and a binary variable c to indicate whether this word is correct or not. [sent-145, score-0.326]
54 We collect features {wd, pos, link, dwpp} for each word among these words and combine them into the feature vector ψ for w. [sent-147, score-0.202]
55 , 1996) to predict whether a word w is correct or incorrect given its feature vector ψ. [sent-152, score-0.32]
56 p(c|ψ) = exp(Σi θi fi(c, ψ)) / Σc′ exp(Σi θi fi(c′, ψ)) where fi is a binary model feature defined on c and the feature vector ψ. [sent-153, score-0.204]
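The normalised-exponential form above can be sketched directly; the feature names and weights below are hypothetical, not trained values from the paper.

```python
import math

def maxent_prob(c, psi, weights, classes=("correct", "incorrect")):
    """p(c | psi) = exp(sum_i theta_i * f_i(c, psi)) / Z, where a binary
    feature f_i fires iff its (class, feature) pair is active for this word."""
    def score(cls):
        return math.exp(sum(weights.get((cls, f), 0.0) for f in psi))
    return score(c) / sum(score(cls) for cls in classes)

# Hypothetical weights; psi holds the word's active features.
theta = {("correct", "link=yes"): 1.0, ("incorrect", "dwpp=4"): 0.8}
p_correct = maxent_prob("correct", {"link=yes", "dwpp=1"}, theta)
```

In the paper the weights θ are trained on labelled hypotheses and the word is assigned the class with the higher probability.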
57 The challenge of collecting training instances is that the correctness of a word in a generated translation hypothesis is not intuitively clear (Ueffing and Ney, 2007). [sent-157, score-0.417]
58 5 SMT System To obtain machine-generated translation hypotheses for our error detection, we use a state-of-the-art phrase-based machine translation system MOSES (Koehn et al. [sent-163, score-0.66]
59 For minimum error rate tuning (Och, 2003), we use NIST MT-02 as the development set for the translation task. [sent-175, score-0.455]
60 In order to calculate word posterior probabilities, we generate 10,000 best lists for NIST MT-02/03/05 respectively. [sent-176, score-0.476]
61 Starting with MaxEnt models with single linguistic feature or word posterior probability based feature, we incorporated additional features incrementally by combining features together. [sent-180, score-0.968]
62 In doing so, we would like the experimental results not only to display the effectiveness of linguistic features for error detection but also to identify the additional contribution of each feature to the task. [sent-181, score-0.779]
63 1 Data Corpus For the error detection task, we use the best translation hypotheses of NIST MT-02/05/03 generated by MOSES as our training, development, and test corpus respectively. [sent-183, score-0.74]
64 Table 3: Corpus statistics (number of sentences and words) for the error detection task: training (MT-02), development (MT-05), and test (MT-03). [sent-186, score-0.482]
65 To obtain the linkage information, we run the LG parser on all translation hypotheses. [sent-187, score-0.207]
66 To determine the true class of a word in a generated translation hypothesis, we follow (Blatz et al. [sent-193, score-0.293]
67 We tag a word as correct if it is aligned to itself in the Levenshtein alignment between the hypothesis and the nearest reference translation that has minimum edit distance to the hypothesis among four reference translations. [sent-195, score-0.635]
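This labelling scheme can be sketched as follows; the edit-distance alignment is a compact re-implementation for illustration, not the authors' code.

```python
def edit_align(hyp, ref):
    """Minimum-edit-distance alignment; returns (distance, aligned) where
    aligned[i] is the ref token matched to hyp[i] (None for insertions)."""
    m, n = len(hyp), len(ref)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    aligned, i, j = [None] * m, m, n
    while i > 0 and j > 0:  # trace back one minimum-cost alignment
        cost = 0 if hyp[i - 1] == ref[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + cost:
            aligned[i - 1] = ref[j - 1]; i -= 1; j -= 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return dp[m][n], aligned

def tag_words(hyp, references):
    """Tag each hypothesis word 'c' (correct) or 'i' (incorrect): a word is
    correct iff it aligns to an identical word in the nearest reference."""
    _, aligned = min((edit_align(hyp, r) for r in references), key=lambda t: t[0])
    return ["c" if aligned[k] == w else "i" for k, w in enumerate(hyp)]

hyp = "the cat sat".split()
refs = ["the cat sits".split(), "a cat sat down".split()]
tags = tag_words(hyp, refs)
```

The nearest reference here is the first one (edit distance 1), so only the substituted word is tagged incorrect.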
68 Figure 2 shows the Levenshtein alignment between a machine-generated hypothesis and its nearest reference translation. [sent-196, score-0.189]
69 The “Class” row shows the label of each word according to the alignment, where “c” and “i” represent correct and incorrect respectively. [sent-197, score-0.24]
70 In Figure 2, the two words “last year” in the hypothesis will be tagged as correct if we use the PER or Set metric since they do not consider the occurring positions of words. [sent-199, score-0.281]
71 It is also stricter than the normal WER metric, which compares each hypothesis to all references rather than only the nearest reference. [sent-331, score-0.224]
72 2 Evaluation Metrics To evaluate the overall performance of error detection, we use the commonly used classification error rate (CER) metric to evaluate our classifiers. [sent-334, score-0.522]
73 Hence the baseline CER in our experiments is equal to the ratio of correct words as these words are wrongly tagged as incorrect. [sent-337, score-0.208]
74 We also use precision and recall on errors to evaluate the performance of error detection. [sent-338, score-0.263]
75 Let ng be the number of words whose true class is incorrect, nt be the number of words tagged as incorrect by classifiers, and nm be the number of words tagged as incorrect that are indeed translation errors. [sent-339, score-0.519]
76 The precision Pre is the percentage of words correctly tagged as translation errors. [sent-340, score-0.221]
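The evaluation measures can be sketched as follows, using the ng, nt, nm notation from the text (label 'i' marks the incorrect class, 'c' the correct one):

```python
def evaluate(true_labels, pred_labels):
    """CER plus precision/recall/F on the 'incorrect' class:
    CER = fraction of misclassified words, Pre = nm/nt, Rec = nm/ng."""
    ng = sum(t == "i" for t in true_labels)
    nt = sum(p == "i" for p in pred_labels)
    nm = sum(t == p == "i" for t, p in zip(true_labels, pred_labels))
    cer = sum(t != p for t, p in zip(true_labels, pred_labels)) / len(true_labels)
    pre = nm / nt if nt else 0.0
    rec = nm / ng if ng else 0.0
    f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return cer, pre, rec, f

cer, pre, rec, f = evaluate(list("ccii"), list("ciii"))
```

Tagging every word as incorrect recovers the paper's baseline: its CER equals the ratio of correct words.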
77 3 Experimental Results Table 5 shows the performance of our experiments on the error detection task. [sent-344, score-0.454]
78 To compare with previous work using word posterior probabilities for confidence estimation, we carried out experiments using wpp estimated from N-best lists with the classification threshold τ, which was optimized on our development set to minimize CER. [sent-345, score-0.84]
79 27% is achieved over the baseline CER, which reconfirms the effectiveness of word posterior probabilities for error detection. [sent-347, score-0.705]
80 We conducted three groups of experiments using the MaxEnt based error detection model with various feature combinations. [sent-348, score-0.569]
81 55% relative improvement over the CER of word posterior probability thresholding. [sent-352, score-0.539]
82 Using discrete word posterior probabilities as features in the MaxEnt based error detection model is marginally better than word posterior probability thresholding in terms of CER, but obtains a 13. [sent-353, score-1.678]
83 The syntactic feature link also improves the error detection in terms of CER and particularly recall. [sent-355, score-0.703]
84 Table 5: Performance of the error detection task. [sent-366, score-0.454]
85 • The second group of experiments concerns the combination of linguistic features without the word posterior probability feature. [sent-367, score-0.612]
86 The combination of lexical features improves both CER and precision over a single lexical feature (wd, pos). [sent-368, score-0.218]
87 The addition of the syntactic feature link marginally degrades CER but improves recall considerably. [sent-369, score-0.31]
88 • The last group of experiments concerns the additional contribution of linguistic features to error detection with the word posterior probability feature. [sent-370, score-0.976]
89 We added linguistic features incrementally into the feature pool. [sent-371, score-0.325]
90 The first two groups of experiments show that linguistic features, individually (except for link) or in combination, are able to produce much better performance than word posterior probability features in both CER and F measure. [sent-374, score-0.785]
91 The best combination of linguistic features achieves a relative improvement of 8. [sent-375, score-0.279]
92 58% in CER and F measure respectively over word posterior probability thresholding. [sent-377, score-0.505]
93 Table 5 also reveals how linguistic features improve error detection. [sent-378, score-0.469]
94 This suggests that our error detection model can be learned from a rather small training set. [sent-386, score-0.454]
95 Figure 3 shows CERs for the feature combination MaxEnt (dwpp + wd + pos + link) when the number of training sentences is enlarged incrementally. [sent-387, score-0.246]
96 7 Conclusions and Future Work In this paper, we have presented a maximum entropy based approach to automatically detect errors in translation hypotheses generated by SMT systems. [sent-390, score-0.406]
97 We incorporate two sets of linguistic features together with word posterior probability based features into error detection. [sent-391, score-1.112]
98 The extracted linguistic features are quite compact and can be learned from a small training set. [sent-393, score-0.245]
99 Therefore our approach can be used for other machine translation systems, such as rule-based or example-based systems, which generally do not produce N-best lists. [sent-395, score-0.188]
100 Future work in this direction involves detecting particular error types, such as incorrect positions and inappropriate/unnecessary words (Elliott, 2006), and automatically correcting errors. [sent-396, score-0.356]
wordName wordTfidf (topN-words)
[('posterior', 0.351), ('cer', 0.346), ('ueffing', 0.235), ('detection', 0.23), ('error', 0.224), ('confidence', 0.184), ('blatz', 0.176), ('smt', 0.175), ('translation', 0.159), ('dwpp', 0.144), ('features', 0.138), ('link', 0.128), ('levenshtein', 0.126), ('hypothesis', 0.117), ('sanchis', 0.117), ('lg', 0.117), ('raybaud', 0.115), ('ney', 0.112), ('linguistic', 0.107), ('incorrect', 0.102), ('wd', 0.093), ('maxent', 0.093), ('probability', 0.09), ('hypotheses', 0.089), ('cers', 0.086), ('pwpp', 0.086), ('estimation', 0.086), ('feature', 0.08), ('correct', 0.074), ('ungrammatical', 0.072), ('probabilities', 0.066), ('word', 0.064), ('tagged', 0.062), ('nbest', 0.061), ('marginally', 0.061), ('lists', 0.061), ('en', 0.059), ('shi', 0.059), ('akibay', 0.058), ('jayaraman', 0.058), ('ro', 0.054), ('om', 0.052), ('entropy', 0.052), ('nicola', 0.05), ('ye', 0.048), ('parser', 0.048), ('sleator', 0.046), ('prone', 0.046), ('pos', 0.045), ('binary', 0.044), ('simona', 0.043), ('parts', 0.042), ('syntactic', 0.041), ('wrongly', 0.041), ('nist', 0.041), ('bleu', 0.041), ('classification', 0.04), ('nearest', 0.04), ('correctness', 0.039), ('alberto', 0.039), ('gandrabur', 0.039), ('stricter', 0.039), ('thresholding', 0.039), ('rec', 0.039), ('errors', 0.039), ('koehn', 0.039), ('generated', 0.038), ('development', 0.038), ('naive', 0.038), ('lattices', 0.037), ('wer', 0.037), ('wordlevel', 0.037), ('zens', 0.037), ('moses', 0.037), ('estimated', 0.036), ('groups', 0.035), ('pre', 0.035), ('ch', 0.034), ('hermann', 0.034), ('relative', 0.034), ('rate', 0.034), ('bayes', 0.033), ('reference', 0.032), ('classifier', 0.032), ('class', 0.032), ('ratio', 0.031), ('berger', 0.031), ('na', 0.031), ('ld', 0.03), ('srilm', 0.03), ('widely', 0.03), ('correcting', 0.03), ('detect', 0.029), ('ne', 0.029), ('machine', 0.029), ('metric', 0.028), ('foster', 0.028), ('calculated', 0.028), ('sentences', 0.028), ('validate', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
2 0.20276614 54 acl-2010-Boosting-Based System Combination for Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1
3 0.18056455 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta
Abstract: This work deals with the application of confidence measures within an interactivepredictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.
4 0.1800935 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semisupervised word aligner.
5 0.16829747 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
6 0.14443518 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
7 0.13494623 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
8 0.12594441 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
9 0.12557429 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
10 0.11353714 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
11 0.11270294 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
12 0.10869961 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
13 0.10766321 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
14 0.10453154 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
15 0.1035784 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
16 0.10111413 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
17 0.099214785 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
18 0.097907804 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
19 0.097813196 133 acl-2010-Hierarchical Search for Word Alignment
20 0.096410282 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
topicId topicWeight
[(0, -0.261), (1, -0.189), (2, -0.062), (3, -0.009), (4, 0.039), (5, 0.021), (6, -0.075), (7, -0.034), (8, 0.005), (9, 0.064), (10, 0.076), (11, 0.118), (12, 0.058), (13, -0.053), (14, 0.023), (15, 0.023), (16, -0.081), (17, 0.022), (18, -0.024), (19, 0.038), (20, -0.07), (21, 0.026), (22, 0.037), (23, 0.002), (24, 0.002), (25, -0.134), (26, 0.113), (27, 0.037), (28, 0.003), (29, 0.17), (30, 0.115), (31, 0.092), (32, 0.101), (33, 0.061), (34, 0.11), (35, -0.012), (36, -0.015), (37, 0.047), (38, 0.028), (39, -0.137), (40, 0.017), (41, 0.08), (42, -0.021), (43, 0.002), (44, -0.022), (45, 0.135), (46, 0.042), (47, -0.018), (48, -0.022), (49, -0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.95057249 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
2 0.85080862 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
3 0.79966396 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta
Abstract: This work deals with the application of confidence measures within an interactivepredictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.
4 0.73089314 54 acl-2010-Boosting-Based System Combination for Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1
5 0.6843999 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
6 0.64381248 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
7 0.55344945 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
8 0.54524475 151 acl-2010-Intelligent Selection of Language Model Training Data
9 0.54418665 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
10 0.53753173 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
11 0.51001388 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
12 0.50448489 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
13 0.50252867 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
14 0.49896565 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
15 0.49485454 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
16 0.49443051 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
17 0.49052501 104 acl-2010-Evaluating Machine Translations Using mNCD
18 0.48792842 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
19 0.48770803 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
20 0.47569773 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
topicId topicWeight
[(14, 0.02), (16, 0.026), (25, 0.058), (39, 0.013), (42, 0.016), (59, 0.111), (73, 0.118), (75, 0.163), (76, 0.016), (78, 0.029), (80, 0.011), (83, 0.133), (84, 0.023), (98, 0.181)]
simIndex simValue paperId paperTitle
1 0.96560085 175 acl-2010-Models of Metaphor in NLP
Author: Ekaterina Shutova
Abstract: Automatic processing of metaphor can be clearly divided into two subtasks: metaphor recognition (distinguishing between literal and metaphorical language in a text) and metaphor interpretation (identifying the intended literal meaning of a metaphorical expression). Both of them have been repeatedly addressed in NLP. This paper is the first comprehensive and systematic review of the existing computational models of metaphor, the issues of metaphor annotation in corpora and the available resources.
2 0.91092741 60 acl-2010-Collocation Extraction beyond the Independence Assumption
Author: Gerlof Bouma
Abstract: In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find the new association measures vary in their effectiveness.
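The core idea of the abstract, replacing the independence-based expected probability inside PMI with a model-derived one, can be shown with a small function. The probabilities below are illustrative assumptions; in the paper the alternative expectation comes from an Aggregate Markov Model.

```python
import math

# Pointwise mutual information with a pluggable expected probability.
# When `expected` is None, it falls back to the independence assumption
# p(x) * p(y); a trained model could supply a different estimate.
def pmi(p_xy, p_x, p_y, expected=None):
    exp_p = expected if expected is not None else p_x * p_y
    return math.log2(p_xy / exp_p)

# Toy bigram: observed joint probability 0.004 vs. independence
# expectation 0.02 * 0.05 = 0.001, giving log2(4) = 2 bits.
score = pmi(0.004, 0.02, 0.05)
```

If a model predicts the pair exactly as often as it is observed (`expected == p_xy`), the association score drops to zero, which is the behavior the new measures exploit.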
same-paper 3 0.89332592 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
Author: Deyi Xiong ; Min Zhang ; Haizhou Li
Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.
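The two evaluation figures quoted in this abstract, classification error rate (CER) and F measure, are straightforward to compute once each word carries a gold and a predicted correct/incorrect label. The sketch below uses toy labels (1 = translation error), not data from the paper.

```python
# Classification error rate: fraction of words whose predicted label
# disagrees with the gold label.
def cer(gold, pred):
    return sum(g != p for g, p in zip(gold, pred)) / len(gold)

# F measure over the error class: harmonic mean of precision and recall
# for predicting the label `error_label`.
def f_measure(gold, pred, error_label=1):
    tp = sum(g == p == error_label for g, p in zip(gold, pred))
    pred_pos = sum(p == error_label for p in pred)
    gold_pos = sum(g == error_label for g in gold)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [1, 0, 1, 1, 0, 0]  # toy word-level gold labels
pred = [1, 0, 0, 1, 0, 1]  # toy classifier predictions
```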
4 0.82854831 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
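The precision/recall trade-off the abstract ends on can be illustrated by scoring recommendations at different confidence thresholds: raising the threshold makes recommendations more precise but recovers fewer of the cases where SMT really was better. The scores and labels below are assumptions for illustration, not the paper's classifier outputs.

```python
# scored: (confidence, label) pairs where label 1 means SMT output really
# is more suitable for post-editing than the TM hit.
def precision_recall(scored, threshold):
    recommended = [label for score, label in scored if score >= threshold]
    tp = sum(recommended)                      # correct recommendations
    relevant = sum(label for _, label in scored)  # all cases SMT was better
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / relevant if relevant else 0.0
    return precision, recall

data = [(0.95, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.30, 0)]
high = precision_recall(data, 0.75)  # stricter: fewer, cleaner recommendations
low = precision_recall(data, 0.50)   # looser: more coverage, more noise
```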
5 0.82697886 238 acl-2010-Towards Open-Domain Semantic Role Labeling
Author: Danilo Croce ; Cristina Giannone ; Paolo Annesi ; Roberto Basili
Abstract: Current Semantic Role Labeling technologies are based on inductive algorithms trained over large scale repositories of annotated examples. Frame-based systems currently make use of the FrameNet database but fail to show suitable generalization capabilities in out-of-domain scenarios. In this paper, a state-of-the-art system for frame-based SRL is extended through the encapsulation of a distributional model of semantic similarity. The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects. The large scale empirical study here discussed confirms that state-of-the-art accuracy can be obtained for out-of-domain evaluations.
6 0.8259095 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
7 0.82375503 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
8 0.82261294 39 acl-2010-Automatic Generation of Story Highlights
9 0.82202977 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
10 0.82096779 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
11 0.82076877 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
12 0.82049215 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
13 0.81941587 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
14 0.81924367 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
15 0.81867898 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
16 0.81680667 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
17 0.81600547 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
18 0.81567907 133 acl-2010-Hierarchical Search for Word Alignment
19 0.813492 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
20 0.81343561 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation