acl acl2010 acl2010-104 knowledge-graph by maker-knowledge-mining

104 acl-2010-Evaluating Machine Translations Using mNCD


Source: pdf

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper introduces mNCD, a method for automatic evaluation of machine translations. [sent-12, score-0.028]

2 The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. [sent-13, score-0.265]

3 The mNCD measure outperforms NCD in system-level correlation to human judgments in English. [sent-14, score-0.162]

4 1 Introduction Automatic evaluation of machine translation (MT) systems requires automated procedures to ensure consistency and efficient handling of large amounts of data. [sent-15, score-0.053]

5 In statistical MT systems, automatic evaluation of translations is essential for parameter optimization and system development. [sent-16, score-0.09]

6 However, manual evaluation is important in the comparison of different MT systems and for the validation and development of automatic MT evaluation measures, which try to model human assessments of translations as closely as possible. [sent-18, score-0.088]

7 Recently, normalized compression distance (NCD) has been applied to the evaluation of machine translations. [sent-20, score-0.147]

8 NCD is a general information theoretic measure of string similarity, whereas most MT evaluation measures, e. [sent-21, score-0.098]

9 Parker (2008) introduced BADGER, an MT evaluation measure that uses NCD and a language independent word normalization method. [sent-24, score-0.051]

10 BADGER scores were directly compared against the scores of METEOR and word error rate (WER). [sent-25, score-0.03]

11 The correlations between BADGER and METEOR were low, and the correlations between BADGER and WER were high. [sent-26, score-0.146]

12 NCD was not compared to human assessments of translations, but correlations of NCD and METEOR scores were very high for all three language pairs. [sent-30, score-0.137]

13 Väyrynen et al. (2010) have extended the work by including NCD in the ACL WMT08 evaluation framework and showing that NCD is correlated with human judgments. [sent-32, score-0.069]

14 The NCD measure did not match the performance of the state-of-the-art MT evaluation measures in English, but it presented a viable alternative to the de facto standard BLEU (Papineni et al. [sent-33, score-0.085]

15 Some recent advances in automatic MT evaluation have included non-binary matching between compared items (Banerjee and Lavie, 2005; Agarwal and Lavie, 2008; Chan and Ng, 2009), which is implicitly present in the string-based NCD measure. [sent-36, score-0.046]

16 We experiment with relaxed word matching using stemming and a lexical database to allow lexical changes. [sent-38, score-0.073]

17 These additional modules attempt to make the reference sentences more similar to the evaluated translations on the string level. [sent-39, score-0.142]

18 We report an experiment showing that document-level NCD and aggregated NCD scores for individual sentences produce very similar correlations to human judgments. [sent-40, score-0.16]

19 2 Normalized Compression Distance Normalized compression distance (NCD) is a similarity measure based on the idea that a string x is similar to another string y when both share substrings. [sent-44, score-0.181]

20 The description of y can reference shared substrings in the known x without repetition, indicating shared information. [sent-45, score-0.027]

21 Figure 1 shows an example in which the compression of the concatenation of x and y results in a shorter output than individual compressions of x and y. [sent-46, score-0.096]

22 The normalized compression distance, as defined by Cilibrasi and Vitanyi (2005), is given in Equation 1, with C(x) the length of the compression of x and C(x, y) the length of the compression of the concatenation of x and y. [sent-47, score-0.283]
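
Equation 1 itself did not survive extraction; from the description above and the standard definition in Cilibrasi and Vitanyi (2005), it reads:

NCD(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}} \qquad (1)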

23 NCD is an approximation of the uncomputable normalized information distance (NID), a general measure for the similarity of two objects. [sent-49, score-0.084]

24 NID is based on the notion of Kolmogorov complexity K(x), a theoretical measure of the information content of a string x, defined as the length of the shortest program for a universal Turing machine that prints x and halts (Solomonoff, 1964). [sent-50, score-0.056]
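
Since the section fully specifies NCD in terms of compressed lengths, a minimal runnable sketch is easy to give. The one below uses Python's standard bz2 module, one of the two compressors used in the paper's experiments (PPMZ has no standard-library counterpart, so bz2 stands in here):

```python
import bz2

def c(s: bytes) -> int:
    """C(s): length of the bz2-compressed representation of s."""
    return len(bz2.compress(s))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings (Equation 1)."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = c(bx), c(by)
    return (c(bx + by) - min(cx, cy)) / max(cx, cy)

# Similar strings share substrings, so their concatenation compresses well
# and the distance stays low; unrelated strings score closer to 1.
ref = "There is no good way to halt gossip that has already begun to spread."
hyp = "There is no effective means to stop gossip that has already begun to spread."
print(ncd(ref, hyp))
```

On very short inputs the compressor's header overhead can push the value slightly above 1; this is a known artifact of approximating NID with real compressors.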

25 3 mNCD Normalized compression distance was not conceived with MT evaluation in mind, but rather it is a general measure of string similarity. [sent-52, score-0.176]

26 Variation in language leads to several acceptable translations for each source sentence, which is why multiple reference translations are preferred in evaluation. [sent-55, score-0.163]

27 Unfortunately, it is typical to have only one reference translation. [sent-56, score-0.027]

28 Paraphrasing techniques can produce additional translation variants (Russo-Lassner et al. [sent-57, score-0.059]

29 The proposed method, mNCD, works analogously to M-BLEU and M-TER, which use the flexible word matching modules from METEOR to find relaxed word-to-word alignments (Agarwal and Lavie, 2008). [sent-61, score-0.115]

30 The modules are able to align words even if they do not share the same surface form, but instead have a common stem or are synonyms of each other. [sent-62, score-0.064]

31 A similarized translation reference is generated by replacing words in the reference with their aligned counterparts from the translation hypothesis. [sent-63, score-0.21]

32 The NCD score is computed between the translations and the similarized references to get the mNCD score. [sent-64, score-0.146]
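
A minimal sketch of these two steps, assuming a relaxed word alignment (index pairs from METEOR's exact, stem and synonym modules) is already available. The function names and inputs below are illustrative, not the authors' code, and ncd() is the function sketched in Section 2:

```python
def similarize(reference: str, hypothesis: str,
               alignment: list[tuple[int, int]]) -> str:
    """Replace aligned reference words with their hypothesis counterparts.

    alignment holds (ref_index, hyp_index) pairs produced by a relaxed
    word matcher; unaligned reference words are left untouched.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    for ref_i, hyp_i in alignment:
        ref_words[ref_i] = hyp_words[hyp_i]
    return " ".join(ref_words)

def mncd(hypothesis: str, reference: str,
         alignment: list[tuple[int, int]]) -> float:
    """mNCD: NCD between a translation and its similarized reference."""
    return ncd(hypothesis, similarize(reference, hypothesis, alignment))
```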

33 Table 1 shows some hand-picked German–English candidate translations along with a) the reference translations, including the 1-NCD score to easily compare with METEOR, and b) the similarized references, including the mNCD score. [sent-65, score-0.235]

34 For comparison, the corresponding METEOR scores without implicit relaxed matching are shown. [sent-66, score-0.075]

35 4 Experiments The proposed mNCD and the basic NCD measure were evaluated by computing correlation to human judgments of translations. [sent-67, score-0.162]

36 A high correlation between an MT evaluation measure and human judgments indicates that the measure evaluates translations in a way similar to humans. [sent-68, score-0.275]

37 Relaxed alignments with the METEOR modules exact, stem and synonym were created for English for the computation of the mNCD score. [sent-69, score-0.128]

38 The synonym module was not available with other target languages. [sent-70, score-0.098]

39 There is no good way to halt gossip that has already begun to spread. [sent-74, score-0.048]

40 There is no effective means to stop gossip that has already begun to spread. [sent-75, score-0.048]

41 Nevertheless, the crisis should not have influenced the entire economy. [sent-87, score-0.053]

42 Nevertheless, the crisis should not have Influence the entire economy. [sent-88, score-0.053]

43 Perhaps you see the pen you thought you lost lying on your colleague’s desk. [sent-94, score-0.065]

44 Perhaps you meeting the pen you thought you lost lying on your colleague’s desk. [sent-95, score-0.065]

45 Table 1: Example German–English translations showing the effect of relaxed matching in the 1-mNCD score (for rows S) compared with METEOR using the exact module only, since the modules stem and synonym are already used in the similarized reference. [sent-100, score-0.381]

46 The RANK category has human quality rankings of five translations for one sentence from different MT systems. [sent-105, score-0.137]

47 The CONST category contains rankings for short phrases (constituents), and the YES/NO category contains binary answers indicating whether a short phrase is an acceptable translation. [sent-106, score-0.117]

48 For the translation tasks into English, the relaxed alignment using a stem module and the synonym module affected 7. [sent-107, score-0.229]

49 For NCD we kept the data as is, which we called real casing (rc). [sent-111, score-0.036]

50 Since the METEOR align module lowercases all text, we restored the case information in mNCD by copying the correct case from the reference translation to the similarized reference, based on METEOR’s alignment. [sent-112, score-0.18]

51 2 System-level correlation We follow the same evaluation methodology as in Callison-Burch et al. [sent-115, score-0.084]

52 (2008), which allows us to measure how well MT evaluation measures correlate with human judgments on the system level. [sent-116, score-0.132]

53 From the annotators’ input, the n systems were ranked based on the number of times each system’s output was selected as the best translation divided by the number of times each system was part of a judgment. [sent-118, score-0.036]

54 We computed system-level correlations for tasks with English, French, Spanish and German as the target language1. [sent-119, score-0.092]
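
As a hedged illustration of that methodology (not the WMT08 scripts themselves): each system's human score is its share of wins over the judgments it appears in, and an evaluation measure is assessed by the Spearman rank correlation between the system ranking it induces and the human one. The helper names below are hypothetical:

```python
def human_scores(judgments: list[tuple[list[str], str]]) -> dict[str, float]:
    """Score each system: (times judged best) / (times judged).

    judgments is a list of (systems_compared, winner) pairs.
    """
    wins: dict[str, int] = {}
    seen: dict[str, int] = {}
    for systems, winner in judgments:
        for system in systems:
            seen[system] = seen.get(system, 0) + 1
        wins[winner] = wins.get(winner, 0) + 1
    return {system: wins.get(system, 0) / seen[system] for system in seen}

def spearman(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Spearman rank correlation between two tie-free rankings of n systems."""
    n = len(rank_a)
    d_squared = sum((rank_a[s] - rank_b[s]) ** 2 for s in rank_a)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))
```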

55 Väyrynen et al. (2010) computed NCD between a set of candidate translations and references at the same time, regardless of the sentence alignments, analogously to document comparison. [sent-123, score-0.074]

56 We experimented with segmentation of the candidate translations into smaller blocks, which were individually evaluated with NCD and aggregated into a single value with the arithmetic mean. [sent-124, score-0.087]
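
The segmentation experiment can be expressed directly; this sketch reuses ncd() from the Section 2 block and assumes the candidate and reference files have the same number of lines. A block size covering the whole file reproduces document comparison, and a block size of 1 gives the aggregated sentence-level variant:

```python
def blockwise_ncd(candidate_lines: list[str],
                  reference_lines: list[str],
                  block_size: int) -> float:
    """Arithmetic mean of NCD over aligned blocks of block_size lines."""
    scores = []
    for i in range(0, len(candidate_lines), block_size):
        cand = "\n".join(candidate_lines[i:i + block_size])
        ref = "\n".join(reference_lines[i:i + block_size])
        scores.append(ncd(cand, ref))
    return sum(scores) / len(scores)
```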

57 The resulting system-level correlations between NCD and human judgments are shown in Figure 2 as a function of the block size. [sent-125, score-0.179]

58 The correlations are very similar with all block sizes, except for Spanish, where smaller block size produces higher correlation. [sent-126, score-0.157]

59 The reported results with mNCD use maximum block size, similar to Väyrynen et al. [sent-128, score-0.039]

60 1 The English-Spanish news task was left out as most measures had negative correlation with human judgments. [sent-130, score-0.114]

61 Figure 2: The block size (in lines) has very little effect on the correlation between NCD and human judgments. [sent-131, score-0.172]

62 The right side corresponds to document comparison and the left side to aggregated NCD scores for sentences. [sent-132, score-0.04]

63 2 mNCD against NCD Table 2 shows the average system-level correlation of different NCD and mNCD variants for translations into English. [sent-134, score-0.09]

64 PPMZ is slower to compute but performs slightly better compared to bz2, except for the … [sent-136, score-0.127]

65 [Table 2 residue: rows for mNCD and NCD with PPMZ or bz2; columns RANK, CONST, YES/NO, Mean.] [sent-137, score-0.127]

66 Table 2: Mean system-level correlations over all translation tasks into English for variants of mNCD and NCD. [sent-168, score-0.138]

67 Parameters are the compressor PPMZ or bz2 and the preprocessing choice lowercasing (lc) or real casing (rc). [sent-170, score-0.06]

68 [Table 3 residue: columns Target Lang (EN, DE, FR, ES) and Corr for mNCD and NCD with the parameters of Table 2.] [sent-171, score-0.127]

69 Table 3: mNCD versus NCD system correlation RANK results with different parameters (the same as in Table 2) for each target language. [sent-203, score-0.08]

70 Target languages DE, FR and ES use only the stem module. [sent-205, score-0.047]

71 Table 2 shows that real casing improves RANK correlation slightly throughout the NCD and mNCD variants, whereas it reduces correlation in the CONST and YES/NO categories as well as in the mean. [sent-207, score-0.181]

72 Table 3 shows the correlation results for the RANK category by target language. [sent-213, score-0.101]

73 Correlations for other languages show mixed results, and on average mNCD gives lower correlations than NCD. [sent-215, score-0.093]

74 3 mNCD versus other methods Table 4 presents the results for the selected mNCD (PPMZ rc) and NCD (bz2 rc) variants along with the correlations for other MT evaluation methods from the WMT’08 data, based on the results in Callison-Burch et al. [sent-217, score-0.119]

75 Although mNCD correlation with human evaluations improved over NCD, the ranking among other measures was not affected. [sent-220, score-0.128]

76 Language- and task-specific results, not shown here, reveal very low mNCD and NCD correlations in the Spanish-English news task, which significantly degrades the averages. [sent-221, score-0.115]

77 [Table 4 residue: rows for DP, ULCh, DR, meteor-ranking, ULC, posbleu, SR, posF4gram-gm, meteor-baseline, posF4gram-am, mNCD (PPMZ rc), NCD (PPMZ rc), mbleu, bleu, mter, svm-rank; columns RANK, CONST, YES/NO, Mean.] [sent-253, score-0.078]

78 Table 4: Average system-level correlations over translation tasks into English for NCD, mNCD and other MT evaluation measures. [sent-289, score-0.149]

79 Considering the mean of the categories instead, mNCD’s correlation of . [sent-290, score-0.083]

80 The table is shorter since many of the better MT measures use language-specific linguistic resources that are not easily available for languages other than English. [sent-293, score-0.034]

81 6 Discussion We have introduced a new MT evaluation measure, mNCD, which is based on normalized compression distance and METEOR’s relaxed alignment modules. [sent-295, score-0.189]

82 The mNCD measure outperforms NCD in English with all tested parameter combinations, whereas results with other target languages are unclear. [sent-296, score-0.072]

83 The improved correlations with mNCD did not change the position in the RANK category of the MT evaluation measures in the 2008 ACL WMT shared task. [sent-297, score-0.137]

84 The improvement in English was expected on the grounds of the synonym module, and indicated also by the larger number of affected words in the … [sent-298, score-0.118]

85 [Table 5 residue: rows for posbleu, posF4gram-am, posF4gram-gm, bleu, NCD (bz2 rc), svm-rank, mbleu, mNCD (PPMZ rc), meteor-baseline, meteor-ranking, mter; columns Target Lang (DE, FR, ES) and Mean.] [sent-314, score-0.048]

86 Table 5: Average system-level correlations for the RANK category from English for NCD, mNCD and other MT evaluation measures. [sent-346, score-0.048]

87 We believe there is potential for improvement in other languages as well if synonym lexicons are available. [sent-348, score-0.081]

88 We have also extended the basic NCD measure to scale between a document comparison measure and an aggregated sentence-level measure. [sent-349, score-0.093]

89 The rather surprising result is that NCD produces quite similar scores with all block sizes. [sent-350, score-0.054]

90 Even though we follow the evaluation methodology of Callison-Burch et al. (2008), we have doubts whether it exploits all the given human evaluations in the most effective way. [sent-353, score-0.041]

91 The system-level correlation measure only awards the winner of the ranking of five different systems. [sent-354, score-0.101]

92 In addition, the human knowledge that gave the lower rankings is not exploited. [sent-356, score-0.054]

93 In future work with mNCD as an MT evaluation measure, we are planning to evaluate synonym dictionaries for languages other than English. [sent-357, score-0.083]

94 The synonym module for English does not distinguish between different senses of words. [sent-358, score-0.085]

95 Therefore, synonym lexicons found with statistical methods might provide a viable alternative to manually constructed lexicons (Kauchak and Barzilay, 2006). [sent-359, score-0.096]

96 METEOR, M-BLEU and M-TER: evaluation metrics for high-correlation with human rankings of machine translation output. [sent-362, score-0.107]

97 METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. [sent-367, score-0.122]

98 Re-evaluating the role of BLEU in machine translation research. [sent-372, score-0.036]

99 Packing it all up in search for a language independent MT quality measure tool. [sent-393, score-0.034]

100 BLEU: a method for automatic evaluation of machine translation. [sent-406, score-0.028]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ncd', 0.72), ('mncd', 0.528), ('ppmz', 0.216), ('meteor', 0.137), ('rc', 0.127), ('lc', 0.117), ('mt', 0.1), ('similarized', 0.084), ('compression', 0.08), ('correlations', 0.079), ('correlation', 0.067), ('translations', 0.062), ('badger', 0.06), ('crisis', 0.053), ('synonym', 0.052), ('ayrynen', 0.048), ('const', 0.048), ('relaxed', 0.042), ('kimmo', 0.042), ('block', 0.039), ('casing', 0.036), ('posbleu', 0.036), ('translation', 0.036), ('measure', 0.034), ('judgments', 0.034), ('stem', 0.033), ('module', 0.033), ('rank', 0.033), ('kauchak', 0.032), ('nid', 0.032), ('modules', 0.031), ('bleu', 0.03), ('rankings', 0.027), ('normalized', 0.027), ('reference', 0.027), ('human', 0.027), ('aggregated', 0.025), ('compressor', 0.024), ('kolmogorov', 0.024), ('mbleu', 0.024), ('mter', 0.024), ('stn', 0.024), ('tapiovaara', 0.024), ('lavie', 0.023), ('variants', 0.023), ('distance', 0.023), ('string', 0.022), ('category', 0.021), ('jaakko', 0.021), ('kettunen', 0.021), ('cilibrasi', 0.021), ('colleague', 0.021), ('gossip', 0.021), ('tero', 0.021), ('measures', 0.02), ('lost', 0.019), ('aalto', 0.019), ('finland', 0.019), ('lying', 0.018), ('corr', 0.018), ('agarwal', 0.018), ('wmt', 0.018), ('matching', 0.018), ('evaluation', 0.017), ('kn', 0.017), ('pen', 0.017), ('spanish', 0.016), ('assessments', 0.016), ('wer', 0.016), ('concatenation', 0.016), ('mean', 0.016), ('english', 0.015), ('begun', 0.015), ('banerjee', 0.015), ('lexicons', 0.015), ('scores', 0.015), ('fr', 0.015), ('showing', 0.014), ('lang', 0.014), ('alon', 0.014), ('evaluations', 0.014), ('theoretic', 0.014), ('viable', 0.014), ('languages', 0.014), ('chan', 0.013), ('stemming', 0.013), ('target', 0.013), ('already', 0.012), ('alignments', 0.012), ('paraphrasing', 0.012), ('analogously', 0.012), ('box', 0.012), ('acceptable', 0.012), ('barzilay', 0.011), ('es', 0.011), ('thought', 0.011), ('ra', 0.011), ('automatic', 0.011), ('whereas', 0.011), ('correlated', 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 104 acl-2010-Evaluating Machine Translations Using mNCD

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

2 0.070681274 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Author: Hiroshi Echizen-ya ; Kenji Araki

Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced us- ing automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.

3 0.065917164 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

4 0.051308796 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao

Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach. 1

5 0.047193773 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

Author: Ondrej Bojar ; Kamil Kos ; David Marecek

Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.

6 0.043727882 262 acl-2010-Word Alignment with Synonym Regularization

7 0.03930369 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

8 0.03742671 39 acl-2010-Automatic Generation of Story Highlights

9 0.036314435 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

10 0.035176333 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

11 0.034519114 56 acl-2010-Bridging SMT and TM with Translation Recommendation

12 0.034361027 54 acl-2010-Boosting-Based System Combination for Machine Translation

13 0.03081324 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

14 0.030776935 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

15 0.029439233 133 acl-2010-Hierarchical Search for Word Alignment

16 0.028697638 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

17 0.026508885 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

18 0.026467538 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

19 0.026265137 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

20 0.025811234 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.069), (1, -0.041), (2, -0.034), (3, -0.003), (4, 0.01), (5, 0.01), (6, -0.025), (7, -0.028), (8, 0.006), (9, 0.014), (10, 0.043), (11, 0.036), (12, -0.004), (13, -0.011), (14, -0.019), (15, 0.006), (16, 0.027), (17, -0.03), (18, -0.015), (19, 0.009), (20, -0.033), (21, -0.019), (22, 0.041), (23, 0.022), (24, 0.01), (25, -0.033), (26, 0.046), (27, 0.104), (28, -0.019), (29, 0.09), (30, -0.022), (31, -0.027), (32, 0.06), (33, -0.007), (34, -0.066), (35, 0.025), (36, 0.092), (37, -0.071), (38, -0.0), (39, -0.028), (40, -0.081), (41, -0.049), (42, -0.032), (43, 0.024), (44, 0.009), (45, -0.026), (46, 0.022), (47, 0.059), (48, -0.014), (49, 0.119)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89481926 104 acl-2010-Evaluating Machine Translations Using mNCD

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

2 0.68112457 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

Author: Radu Soricut ; Abdessamad Echihabi

Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.

3 0.66501451 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

Author: Ondrej Bojar ; Kamil Kos ; David Marecek

Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.

4 0.6442405 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Author: Hiroshi Echizen-ya ; Kenji Araki

Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced us- ing automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.

5 0.50704718 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

Author: Jesus Gonzalez Rubio ; Daniel Ortiz Martinez ; Francisco Casacuberta

Abstract: This work deals with the application of confidence measures within an interactivepredictive machine translation system in order to reduce human effort. If a small loss in translation quality can be tolerated for the sake of efficiency, user effort can be saved by interactively translating only those initial translations which the confidence measure classifies as incorrect. We apply confidence estimation as a way to achieve a balance between user effort savings and final translation error. Empirical results show that our proposal allows to obtain almost perfect translations while significantly reducing user effort.

6 0.48166311 56 acl-2010-Bridging SMT and TM with Translation Recommendation

7 0.47818244 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

8 0.47341064 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

9 0.39482808 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

10 0.36889118 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

11 0.36176938 54 acl-2010-Boosting-Based System Combination for Machine Translation

12 0.33400819 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

13 0.32361472 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

14 0.31516725 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration

15 0.28730446 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

16 0.26838687 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

17 0.26830199 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation

18 0.2678732 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

19 0.26524067 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

20 0.26366669 151 acl-2010-Intelligent Selection of Language Model Training Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.02), (25, 0.03), (39, 0.013), (42, 0.016), (44, 0.021), (59, 0.071), (73, 0.032), (78, 0.03), (83, 0.068), (84, 0.025), (90, 0.384), (98, 0.122)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69464862 104 acl-2010-Evaluating Machine Translations Using mNCD

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

2 0.64996469 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Author: Bharat Ram Ambati

Abstract: Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, then the usefulness of these statistical systems improve a lot. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. We first describe the importance of this constraint considering Machine Translation systems which use dependency parser output, as an example application. We then show how the current state-ofthe-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech. 1

3 0.49967611 54 acl-2010-Boosting-Based System Combination for Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang

Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1

4 0.39442781 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

5 0.39398235 133 acl-2010-Hierarchical Search for Word Alignment

Author: Jason Riesa ; Daniel Marcu

Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.

6 0.39386946 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

7 0.39347911 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

8 0.39283794 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

9 0.39185619 79 acl-2010-Cross-Lingual Latent Topic Extraction

10 0.39183164 116 acl-2010-Finding Cognate Groups Using Phylogenies

11 0.39049351 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

12 0.39026302 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

13 0.38996729 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

14 0.38960528 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands

15 0.38941151 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling

16 0.38917172 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

17 0.38903832 170 acl-2010-Letter-Phoneme Alignment: An Exploration

18 0.38887405 39 acl-2010-Automatic Generation of Story Highlights

19 0.38879657 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

20 0.38863415 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization