acl acl2010 acl2010-244 knowledge-graph by maker-knowledge-mining

244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking


Source: pdf

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. [sent-3, score-0.237]

2 In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. [sent-4, score-0.222]

3 This enables the user to set a quality threshold, granting the user control over the quality of the translations. [sent-5, score-0.228]

4 We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs. [sent-6, score-0.289]

5 1 Introduction The accuracy of machine translation (MT) software has steadily increased over the last 20 years to achieve levels at which large-scale commercial applications of the technology have become feasible. [sent-7, score-0.207]

6 However, widespread adoption of MT technology remains hampered by the lack of trust associated with machine-translated output. [sent-8, score-0.185]

7 This lack of trust is a normal reaction to the erratic translation quality delivered by current state-of-the-art MT systems. [sent-9, score-0.294]

8 Unfortunately, the lack of predictable quality discourages the adoption of large-scale automatic translation solutions. [sent-10, score-0.299]

9 Consider the case of a commercial enterprise that hosts reviews written by travellers on its web site. [sent-11, score-0.137]

10 However, travel reviews present specific challenges: the reviews tend to have poor spelling, loose grammar, and broad topics of discussion. [sent-18, score-0.272]

11 We develop TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. [sent-22, score-0.222]

12 This enables the user to set a quality threshold, granting the user control over the quality of the translations that it employs in its product. [sent-23, score-0.36]

13 2 Related Work Work on automatic MT evaluation started with the idea of comparing automatic translations against human-produced references. [sent-26, score-0.132]

14 In contrast, we are interested in performing MT quality assessments on documents for which reference translations are not available. [sent-33, score-0.363]

15 Reference-free approaches to automatic MT quality assessment, based on Machine Learning techniques such as classification (Kulesza and Shieber, 2004), regression (Albrecht and Hwa, 2007), and ranking (Ye et al. [sent-34, score-0.468]

16 In contrast, we focus on evaluating the quality of the translations themselves, while the MT system is kept constant. [sent-42, score-0.228]

17 The goal of this work is to identify small units of translated material (words and phrases) for which one can be confident in the quality of the translation. [sent-45, score-0.126]

18 Document-level granularity is a requirement for large-scale commercial applications that use fully-automated translation solutions. [sent-55, score-0.178]

19 In contrast, quality-prediction or confidence estimation at sentence- or word-level best fits a scenario in which automated translation is only a part of a larger pipeline. [sent-58, score-0.126]

20 Such pipelines usually involve human post-editing, and are useful for translation productivity (Lagarda et al. [sent-59, score-0.126]

21 Our fully-automated solution targets large volume translation needs, on the order of 10,000 documents/day or more. [sent-62, score-0.155]

22 However, they can be substituted with any of the proposed MT metrics that use human-produced references to automatically assess translation quality (Doddington, 2002; Lavie and Agarwal, 2007). [sent-66, score-0.263]

23 Third, the main metric we use to assess the performance of our solution is targeted directly at measuring translation quality gains. [sent-74, score-0.251]

24 We are interested in ranking N documents and assigning them to n quantiles. [sent-90, score-0.257]

25 The formula is: rAcc[n] = Avg_{i=1}^{n} TP_i / (N/n) = (1/N) · Σ_{i=1}^{n} TP_i, where TP_i (True-Positive_i) is the number of correctly-assigned documents in quantile i. [sent-91, score-0.197]

26 Intuitively, this formula is an average of the ratio of documents correctly assigned in each quantile. [sent-92, score-0.118]
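
To make rAcc concrete, here is a minimal Python sketch; the function and variable names are ours, and it assumes N is divisible by n so that each quantile holds exactly N/n documents.

```python
def racc(predicted_order, reference_order, n):
    """rAcc[n] = (1/N) * sum_i TP_i: the fraction of documents that fall
    in the same quantile under the predicted and reference rankings.
    Assumes len(order) is divisible by n (N/n documents per quantile)."""
    N = len(predicted_order)
    size = N // n
    ref_quantile = {doc: pos // size for pos, doc in enumerate(reference_order)}
    correct = sum(1 for pos, doc in enumerate(predicted_order)
                  if pos // size == ref_quantile[doc])
    return correct / N

# Two documents swap places but stay within the same half, so rAcc = 1.0:
print(racc(["d2", "d1", "d3", "d4"], ["d1", "d2", "d3", "d4"], n=2))
```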

27 Therefore, the performance of any decent ranking method, when using 4 quantiles, can be expected to fall somewhere between these bounds. [sent-96, score-0.122]

28 Intuitively, this formula provides a volume-weighted average of the BLEU gain obtained while varying the threshold of acceptance from 1 to n-1. [sent-106, score-0.159]

29 (A threshold of acceptance set to the n-th quantile means accepting all the translations and therefore ignoring the rankings, so we do not include it in the average. [sent-107, score-0.279]

30 With oracle ranking, the expected vBLEU∆[n] is a positive number representative of the upperbound on the quality of the translations that pass an acceptance threshold. [sent-112, score-0.446]
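
A minimal sketch of how vBLEU∆[n] could be computed, assuming the "volume" weight at each threshold is the number of accepted documents; the excerpt does not spell out the exact weighting, so that part is our reading.

```python
def vbleu_delta(ranked_docs, n, corpus_bleu, baseline_bleu):
    """Volume-weighted average BLEU gain over acceptance thresholds
    1..n-1; threshold n accepts every translation and is excluded.
    `corpus_bleu` is any callable that scores a list of documents."""
    size = len(ranked_docs) // n
    gains, volumes = [], []
    for k in range(1, n):                   # accept quantiles 1..k
        accepted = ranked_docs[: k * size]
        gains.append(corpus_bleu(accepted) - baseline_bleu)
        volumes.append(len(accepted))
    return sum(g * v for g, v in zip(gains, volumes)) / sum(volumes)
```

With an oracle ranking, every accepted prefix contains the best available translations, so each gain term is maximal and the result is the upper bound discussed above.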

31 The choice regarding the number of quantiles is closely related to the choice of setting an acceptance quality threshold. [sent-114, score-0.308]

32 Because we want the solution to stay unchanged while the acceptance quality threshold can vary, we cannot treat this as a classification problem. [sent-115, score-0.193]

33 Instead, we need to provide a complete ranking over an input set of documents. [sent-116, score-0.122]

34 As already mentioned, TrustRank uses a regression method that is trained on BLEU scores as training labels. [sent-117, score-0.288]

35 The regression functions are then used to predict a BLEU-like number for each document in the input set. [sent-118, score-0.325]

36 Reference ranking is obtained similarly, using actual BLEU scores. [sent-120, score-0.154]
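
A minimal sketch of this ranking step; `predict_bleu` stands in for the trained regression model, and passing actual document-level BLEU scores instead yields the reference ranking.

```python
def rank_into_quantiles(docs, predict_bleu, n):
    """Order documents by a predicted BLEU-like score (best first) and
    assign each to one of n quantiles (0 = top quantile); any remainder
    documents fall into the last quantile."""
    ordered = sorted(docs, key=predict_bleu, reverse=True)
    size = len(ordered) // n
    return {doc: min(pos // size, n - 1) for pos, doc in enumerate(ordered)}
```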

37 Although we are mainly interested in the ranking problem here, it helps to look at the error produced by the regression models to arrive at a more complete picture. [sent-121, score-0.446]

38 Our system produces translations that are competitive with state-of-the-art systems. [sent-126, score-0.132]

39 From the Regression set, we set aside 1000 parallel documents to be used as a blind test set (called Regression Test) for our experiments. [sent-142, score-0.151]

40 An additional set of 1000 parallel documents is used as a development set, and the remaining 1000 parallel documents are used as the regression-model training set. [sent-143, score-0.246]

41 The Regression Test sets have the same distribution between Europarl data and news as the corresponding training data set for each language pair. [sent-151, score-0.25]

42 4 The ranking algorithm As mentioned before, TrustRank uses a supervised Machine Learning approach. [sent-152, score-0.122]

43 We automatically generate the training labels by computing BLEU scores for every document in the Regression training set. [sent-153, score-0.113]

44 The learning technique that consistently yields the best results is M5P regression trees (Weka). [sent-167, score-0.25]

45 As an additional advantage, the decision trees and the regression models produced in training are easy to read, understand, and interpret. [sent-172, score-0.28]
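
scikit-learn has no M5P implementation (M5P grows trees with linear regression models at the leaves), so the following sketch uses a plain regression tree as a rough stand-in; the feature matrix and BLEU labels are synthetic placeholders, not the paper's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 20))      # placeholder: 20 feature values per document
y = 100.0 * rng.random(1000)    # placeholder: document-level BLEU labels

# Train on 800 documents, then predict BLEU-like scores for the rest;
# the predictions are used only to rank, not as absolute BLEU estimates.
model = DecisionTreeRegressor(max_depth=5).fit(X[:800], y[:800])
predicted_bleu = model.predict(X[800:])
```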

46 They can be applied to the input, where they induce a correlation between the number of words in the input document and the expected BLEU score for that document size. [sent-181, score-0.21]

47 Language-model–based features These features are among the ones that were first proposed as possible differentiators between good and bad translations (Gamon et al. [sent-183, score-0.202]

48 Pseudo-reference–based features Previous work has shown that, in the absence of human-produced references, automatically-produced ones are still helpful in differentiating between good and bad translations (Albrecht and Hwa, 2008). [sent-187, score-0.167]

49 When computed on the target side, this type of feature requires one or more secondary MT systems, used to generate translations starting from the same input. [sent-188, score-0.208]

50 These pseudoreferences are useful in gauging translation convergence, using BLEU scores as feature values. [sent-189, score-0.2]

51 In intuitive terms, their usefulness can be summarized as follows: “if system X produced a translation A and system Y produced a translation B starting from the same input, and A and B are similar, then A is probably a good translation”. [sent-190, score-0.339]

52 This property ensures that a convergence on similar translations is not just an artifact, but a true indication that the translations are correct. [sent-192, score-0.264]

53 A translated document produced by the main MT system is fed to the secondary MT system(s), translated back into the original source language, and used as pseudoreference(s) when computing a BLEU score for the original input. [sent-198, score-0.237]
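
A sketch of the source-side round-trip just described; `main_mt`, `secondary_mt`, and `doc_bleu` are hypothetical callables standing in for the actual systems and scorer.

```python
def source_side_pseudo_reference(src_doc, main_mt, secondary_mt, doc_bleu):
    """Translate the source with the main system, translate the result
    back into the source language with a secondary system, and score the
    original source against that back-translation; the BLEU value is the
    feature."""
    target_doc = main_mt(src_doc)            # source -> target
    round_trip = secondary_mt(target_doc)    # target -> back to source
    return doc_bleu(hypothesis=src_doc, reference=round_trip)
```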

54 Example-based features For example-based features, we use a development set of 1000 parallel documents, for which we produce translations and compute document-level BLEU scores. [sent-200, score-0.199]

55 We set aside the top-100 BLEU scoring documents and bottom-100 BLEU scoring documents. [sent-201, score-0.119]
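
The excerpt does not spell out how these two example sets become feature values; one plausible reading, sketched here, is to score each new translation against both sets with BLEU, so a translation resembling the top-100 set scores high on the first feature.

```python
def example_based_features(translation, top_100, bottom_100, doc_bleu):
    """Two features per document: BLEU similarity to the 100 best- and
    100 worst-scoring development translations. `doc_bleu` is a
    hypothetical scorer taking a hypothesis and a reference pool."""
    return (doc_bleu(translation, references=top_100),
            doc_bleu(translation, references=bottom_100))
```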

56 Training-data–based features If the main MT system is trained on a parallel corpus, the data in this corpus can be exploited towards assessing translation quality (Specia et al. [sent-207, score-0.289]

57 A more powerful type of training-data–based features operates by computing a BLEU score between a document (source or target side) and the training-data documents used as references. [sent-211, score-0.232]

58 As a baseline, we use a regression function that outputs a constant number for each document, equal to the BLEU score of the Regression Training set. [sent-218, score-0.281]

59 As an upperbound, we use an oracle regression function that outputs a number for each document that is equal to the actual BLEU score of that document. [sent-219, score-0.43]
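
Both the baseline and the oracle can be written as trivial predictors that plug into the quantile-ranking sketch above; the names are ours.

```python
def baseline_predictor(training_set_bleu):
    """Baseline: the same constant (the Regression Training set BLEU)
    for every document, so the induced ranking carries no signal."""
    return lambda doc: training_set_bleu

def oracle_predictor(actual_bleu_by_doc):
    """Oracle upper bound: return each document's true BLEU score."""
    return lambda doc: actual_bleu_by_doc[doc]
```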

60 In Table 4, we present the performance of these regression functions across all the domains considered. [sent-220, score-0.317]

61 The vBLEU∆ values are bounded below by 0 and above by some positive BLEU gain value that varies among the domains we considered from +6. [sent-222, score-0.161]

62 Table 4: Lower- and upper-bounds for ranking and regression accuracy (English-Spanish). [sent-233, score-0.401]

63 The ranking accuracy numbers on a per-quantile basis reveal an important property for the approach we advocate. [sent-235, score-0.213]

64 The ranking accuracy on the first quantile Q1 (identifying the best 25% of the translations) is 52% on average across the domains. [sent-236, score-0.23]

65 This is much better than the ranking accuracy for the median-quality translations (35-37% accuracy for the two middle quantiles). [sent-238, score-0.312]

66 This property fits our scenario well: we are interested in associating trust with the quality of the translations in the top quantile. [sent-239, score-0.344]

67 The quality of the top quantile translations is quantifiable in terms of BLEU gain. [sent-240, score-0.307]

68 The 250 document translations in Q1 for Travel have a BLEU score of 38. [sent-241, score-0.238]

69 The assumption we make here is that the translation time dwarfs the time needed for feature computation. Table 3: Detailed performance using all features (English-Spanish). [sent-256, score-0.161]

70 Therefore, the most time-expensive feature is the source-side pseudo-reference–based feature, which effectively doubles the translation time required. [sent-258, score-0.126]

71 Benefits vary by domain Even with oracle rankings (Table 4), the benefits vary from one domain to the next. [sent-268, score-0.194]

72 Performance varies by domain As the results in Table 3 show, the best performance we obtain also varies from one domain to the next. [sent-276, score-0.15]

73 For instance, the ranking accuracy for the WMT09 domain is only 32%, while for the HiTech domain it is 59%. [sent-277, score-0.241]

74 Poor regression performance By looking at the results of the regression metrics, we conclude that the predicted BLEU numbers are not accurate in absolute value. [sent-297, score-0.608]

75 8, but it is too high to allow us to confidently use the document-level BLEU numbers as reliable indicators of translation accuracy. [sent-301, score-0.217]

76 We report the BLEU score obtained on our 1000-document Regression Test, as well as ranking and regression performance using the rAcc, vBLEU∆, MAE, and TE metrics. [sent-306, score-0.653]

77 As the numbers for the ranking and regression metrics show, the same trends we observed for English-Spanish hold for many other language pairs as well. [sent-307, score-0.475]

78 Some domains, such as HiTech, are easier to rank regardless of the language pair, and the quality gains are consistently high (+9. [sent-308, score-0.132]

79 For Travel, EnglishDutch is also an outlier in terms of quality gains (+12. [sent-314, score-0.132]

80 In line with the conclusion for English-Spanish, the regression performance is currently too poor to allow us to confidently use the absolute document-level predicted BLEU numbers as indicators of translation accuracy. [sent-317, score-0.543]

81 6 Examples and Illustrations As the experimental results in Table 6 show, the regression performance varies considerably across domains. [sent-318, score-0.28]

82 The amount of correlation visible in these plots matches the performance numbers provided in Table 6, with Travel Eng-Fra at a lower level of correlation compared to Travel Eng-Dut and HiTech Eng-Rus. [sent-322, score-0.12]

83 In the Travel Eng-Fra case, the predicted BLEU numbers are spread across a narrower band (95% of the values are in the [19-35] interval), compared to the actual BLEU scores (95% of the values are in the [11-47] interval). [sent-338, score-0.178]

84 In the case of Travel Eng-Fra, the actual BLEU scores are clustered in a narrower band (interval [11-47] covers 95% of the values), compared to the actual BLEU scores for Travel Eng-Dut (interval [11-92] covers 95% of the values) and HiTech Eng-Rus (interval [3-80] covers 95% of the values). [sent-340, score-0.14]

85 This means that the documents in the latter cases are easier to distinguish, compared to the documents in Travel Eng-Fra. [sent-342, score-0.182]

86 To provide an intuitive feel for the difference in translation performance between documents ranked close to the bottom and documents ranked close to the top, we present here two example translations. [sent-343, score-0.308]

87 They are documents that we randomly picked from the bottom 10% and top 10% of the Travel Eng-Fra document set, and they correspond to points A and B in the first plot of Figure 1, respectively. [sent-344, score-0.166]

88 la chambre, le personnel, même d’autres clients dans d’autres pays, c’est très agréable de voir que tout le monde vous aurais savoir au cours de ces dernières même si, ou bien ils vous, ne parlent pas chaque d’autres langues. [sent-351, score-0.157]

89 Nous adorons l’île des que, hopefuly, c’est l’endroit où nous avons retiring, nous ne pour chercher un endroit abordable. [sent-352, score-0.173]

90 Je conseille cet hôtel à Document A-Fra is a poor translation, and is ranked in the bottom 10%, while document B-Fra is a nearly-perfect translation ranked in the top 10%, out of a total of 1000 documents. [sent-368, score-0.231]

91 7 Conclusions and Future Work Commercial adoption of MT technology requires trust in the translation quality. [sent-369, score-0.275]

92 We present a mechanism that allows MT users to trade quantity for quality, using automatically-determined translation quality rankings. [sent-371, score-0.222]

93 The results we present in this paper show that document-level translation quality rankings provide quantitatively strong gains in translation quality, as measured by BLEU. [sent-372, score-0.446]

94 9 BLEU, like the one we obtain for the English- Spanish HiTech domain (Table 3), is persuasive evidence for inspiring trust in the quality of selected translations. [sent-374, score-0.213]

95 This approach enables us to develop TrustRank, a complete MT solution that enhances automatic translation with the ability to identify document subsets containing translations that pass an acceptable quality threshold. [sent-375, score-0.458]

96 When measuring the performance of our solution across several domains, it becomes clear that some domains allow for more accurate quality prediction than others. [sent-376, score-0.192]

97 Given the immediate benefit that can be derived from increasing the ranking accuracy for translation quality, we plan to open up publicly available benchmark data that can be used to stimulate and rigorously monitor progress in this direction. [sent-377, score-0.277]

98 The contribution of linguistic features to automatic machine translation evaluation. [sent-388, score-0.161]

99 Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. [sent-405, score-0.222]

100 METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. [sent-447, score-0.233]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bleu', 0.425), ('vbleu', 0.396), ('trustrank', 0.324), ('regression', 0.25), ('mt', 0.204), ('hitech', 0.198), ('racc', 0.162), ('travel', 0.148), ('quantiles', 0.144), ('translations', 0.132), ('translation', 0.126), ('ranking', 0.122), ('upperbound', 0.108), ('quality', 0.096), ('documents', 0.091), ('specia', 0.081), ('quantile', 0.079), ('albrecht', 0.079), ('mae', 0.079), ('adoption', 0.077), ('document', 0.075), ('trust', 0.072), ('acceptance', 0.068), ('domains', 0.067), ('gain', 0.064), ('gamon', 0.064), ('numbers', 0.062), ('rankings', 0.062), ('autres', 0.054), ('maete', 0.054), ('nous', 0.054), ('commercial', 0.052), ('est', 0.051), ('blatz', 0.049), ('interval', 0.047), ('reviews', 0.047), ('predicted', 0.046), ('domain', 0.045), ('interested', 0.044), ('ravi', 0.042), ('oracle', 0.042), ('secondary', 0.041), ('metrics', 0.041), ('enterprise', 0.038), ('scores', 0.038), ('gains', 0.036), ('amig', 0.036), ('bien', 0.036), ('bleuk', 0.036), ('domainraccvbleu', 0.036), ('endroit', 0.036), ('garage', 0.036), ('gauging', 0.036), ('granting', 0.036), ('gunawardana', 0.036), ('hampered', 0.036), ('hopefuly', 0.036), ('intercontinental', 0.036), ('lowerbound', 0.036), ('parking', 0.036), ('predbleu', 0.036), ('predbleuk', 0.036), ('vous', 0.036), ('side', 0.036), ('te', 0.036), ('electronics', 0.035), ('lavie', 0.035), ('features', 0.035), ('hwa', 0.034), ('kulesza', 0.033), ('parallel', 0.032), ('actual', 0.032), ('owczarzak', 0.032), ('haddow', 0.032), ('weaver', 0.032), ('kauchak', 0.032), ('hotel', 0.032), ('staff', 0.032), ('personnel', 0.032), ('retiring', 0.032), ('score', 0.031), ('variety', 0.031), ('varies', 0.03), ('produced', 0.03), ('poor', 0.03), ('translated', 0.03), ('solution', 0.029), ('correlation', 0.029), ('genres', 0.029), ('confidently', 0.029), ('lagarda', 0.029), ('que', 0.029), ('dem', 0.029), ('jn', 0.029), ('pour', 0.029), ('accuracy', 0.029), ('aside', 0.028), ('le', 0.028), ('formula', 0.027), ('probably', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

2 0.20621015 54 acl-2010-Boosting-Based System Combination for Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang

Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1

3 0.18754761 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao

Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach. 1

4 0.16190243 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

Author: Ondrej Bojar ; Kamil Kos ; David Marecek

Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.

5 0.13291918 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Author: Hiroshi Echizen-ya ; Kenji Araki

Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.

6 0.12884267 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

7 0.1197743 56 acl-2010-Bridging SMT and TM with Translation Recommendation

8 0.11584479 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

9 0.11425639 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

10 0.11353714 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

11 0.1000087 133 acl-2010-Hierarchical Search for Word Alignment

12 0.096852735 69 acl-2010-Constituency to Dependency Translation with Forests

13 0.096689232 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

14 0.096029557 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

15 0.095355004 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

16 0.094275273 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

17 0.084012747 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

18 0.076748982 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

19 0.076160237 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

20 0.073536426 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.203), (1, -0.143), (2, -0.116), (3, 0.018), (4, 0.048), (5, 0.014), (6, -0.075), (7, -0.087), (8, -0.073), (9, 0.067), (10, 0.166), (11, 0.143), (12, 0.066), (13, -0.043), (14, -0.002), (15, 0.022), (16, -0.03), (17, 0.014), (18, -0.095), (19, 0.011), (20, -0.036), (21, 0.006), (22, 0.057), (23, -0.023), (24, -0.035), (25, -0.072), (26, 0.088), (27, 0.149), (28, -0.022), (29, 0.166), (30, 0.072), (31, -0.057), (32, 0.129), (33, 0.076), (34, -0.033), (35, 0.044), (36, 0.095), (37, -0.098), (38, -0.043), (39, -0.03), (40, -0.091), (41, 0.008), (42, -0.004), (43, 0.05), (44, 0.018), (45, -0.013), (46, 0.017), (47, 0.027), (48, 0.019), (49, 0.138)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9580574 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

2 0.9033581 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

Author: Ondrej Bojar ; Kamil Kos ; David Marecek

Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.

3 0.88275027 104 acl-2010-Evaluating Machine Translations Using mNCD

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

4 0.84552974 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Author: Hiroshi Echizen-ya ; Kenji Araki

Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.

5 0.81919771 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta

Abstract: This work deals with the application of confidence measures within an interactive-predictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.

6 0.80943388 56 acl-2010-Bridging SMT and TM with Translation Recommendation

7 0.75915533 54 acl-2010-Boosting-Based System Combination for Machine Translation

8 0.6171174 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

9 0.59718281 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

10 0.57099962 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

11 0.55168062 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

12 0.47264138 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

13 0.46273386 151 acl-2010-Intelligent Selection of Language Model Training Data

14 0.46133021 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

15 0.45864058 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration

16 0.45284024 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

17 0.4396466 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

18 0.41729072 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

19 0.41230217 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

20 0.40108439 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.013), (25, 0.049), (39, 0.017), (42, 0.026), (59, 0.121), (69, 0.25), (71, 0.01), (73, 0.075), (76, 0.016), (78, 0.03), (80, 0.011), (83, 0.098), (84, 0.018), (98, 0.149)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84555912 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

Author: Mark Johnson

Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a lowdimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parameteric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.

2 0.81249762 116 acl-2010-Finding Cognate Groups Using Phylogenies

Author: David Hall ; Dan Klein

Abstract: A central problem in historical linguistics is the identification of historically related cognate words. We present a generative phylogenetic model for automatically inducing cognate group structure from unaligned word lists. Our model represents the process of transformation and transmission from ancestor word to daughter word, as well as the alignment between the word lists of the observed languages. We also present a novel method for simplifying complex weighted automata created during inference to counteract the otherwise exponential growth of message sizes. On the task of identifying cognates in a dataset of Romance words, our model significantly outperforms a baseline approach, increasing accuracy by as much as 80%. Finally, we demonstrate that our automatically induced groups can be used to successfully reconstruct ancestral words.

same-paper 3 0.80020142 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption of Machine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

4 0.71033728 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

Author: Carlos Gomez-Rodriguez ; Joakim Nivre

Abstract: Finding a class of structures that is rich enough for adequate linguistic representation yet restricted enough for efficient computational processing is an important problem for dependency parsing. In this paper, we present a transition system for 2-planar dependency trees (trees that can be decomposed into at most two planar graphs) and show that it can be used to implement a classifier-based parser that runs in linear time and outperforms a state-of-the-art transition-based parser on four data sets from the CoNLL-X shared task. In addition, we present an efficient method for determining whether an arbitrary tree is 2-planar and show that 99% or more of the trees in existing treebanks are 2-planar.

5 0.67059088 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

6 0.66923237 56 acl-2010-Bridging SMT and TM with Translation Recommendation

7 0.66602832 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

8 0.66571128 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

9 0.66562998 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

10 0.66544551 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

11 0.66403854 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

12 0.66346669 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

13 0.66333061 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

14 0.66311705 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

15 0.66302896 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

16 0.66236478 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

17 0.66235828 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

18 0.66170752 54 acl-2010-Boosting-Based System Combination for Machine Translation

19 0.66097772 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

20 0.6603418 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries