emnlp emnlp2010 emnlp2010-63 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Philip Resnik ; Olivia Buzek ; Chang Hu ; Yakov Kronrod ; Alex Quinn ; Benjamin B. Bederson
Abstract: Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable quality translation that makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e. paraphrases) with only monolingual knowledge of the source language. Evaluations demonstrate that this approach can yield substantial improvements in translation quality.
Reference: text
sentIndex sentText sentNum sentScore
1 The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e. [sent-9, score-0.611]
2 paraphrases) with only monolingual knowledge of the source language. [sent-11, score-0.24]
3 Evaluations demonstrate that this approach can yield substantial improvements in translation quality. [sent-12, score-0.388]
4 1 Introduction For most of the world’s languages, the availability of translation is limited to two possibilities: high quality at high cost, via professional translators, and low quality at low cost, via machine translation (MT). [sent-13, score-0.827]
5 The spectrum between these two extremes is very poorly populated, and at any point on the spectrum the ready availability of translation is limited to only a small fraction of the world’s languages. [sent-14, score-0.537]
6 There is, of course, a long history of technological assistance to translators, improving cost effectiveness using translation memory (Laurian, 1984; Bowker and Barlow, 2004) or other interactive tools to assist translators (Esteban et al. [sent-15, score-0.527]
7 However, all these alternatives face a central availability bottleneck: they require the participation of humans with bilingual expertise. [sent-19, score-0.249]
8 In this paper, we report on a new exploration of the middle ground, taking advantage of a virtually unutilized resource: speakers of the source and target language who are effectively monolingual, i.e. [sent-24, score-0.208]
9 who each only know one of the two languages relevant for the translation task. [sent-26, score-0.388]
10 The solution we are proposing has the potential to provide a more cost-effective approach to translation in scenarios where machine translation would be considered acceptable to use, if only it were generally of high enough quality. [sent-27, score-0.85]
11 This would clearly exclude tasks like translation of medical reports, business contracts, or literary works, where the validation of a qualified bilingual translator is absolutely necessary. [sent-28, score-0.45]
12 However, it does include a great many real-world scenarios, such as following news reports in another country, reading international comments about a product, or generating a decent first draft translation of a Wikipedia page for Wikipedia editors to improve. [sent-29, score-0.388]
13 The use of monolingual participants in a human-machine translation process is not entirely new. [sent-30, score-0.6]
14 There have also been at least two independently developed human-machine translation frameworks that employ an iterative protocol involving monolinguals on both the source and target side. [sent-34, score-0.561]
15 Shahaf and Horvitz (2010) use machine translation as a specific instance of a general game-based framework for combining a range of machine and human capabilities. [sent-40, score-0.445]
16 2 Targeted Paraphrasing The starting point for our approach is an observation: the source sentence provided as input to an MT system is just one of many ways in which the meaning could have been expressed, and for any given MT system, some forms of expression are easier to translate than others. [sent-45, score-0.282]
17 For example, consider the following real example of translation from English to French by an automatic MT system: • Source: Polls indicate Brown, a state senator, and Coakley, Massachusetts’ Attorney General, are locked in a virtual tie to fill the late Sen. [sent-49, score-0.432]
18 [System output fragment:] …fermés dans une cravate virtuel. A French speaker can look at this automatic translation and see immediately that the underlined parts are wrong, even without knowing the intended source meaning. [sent-52, score-0.54]
19 We can identify the spans in the source English sentence that are responsible for these badly translated French spans, and change them to alternative expressions with the same meaning (e.g. [sent-53, score-0.401]
20 changing Massachusetts’ Attorney General to the Attorney General of Massachusetts); if we do so and then use the same MT system again, we obtain a translation that is still imperfect (e.g. [sent-55, score-0.388]
21 cravate means necktie), but is more acceptable: • System: Les sondages indiquent que Brown, un sénateur d’état, et Coakley, le procureur général du Massachusetts, sont enfermés dans une cravate virtuel pourvoir le siège au Sénat de Sen. [sent-57, score-0.341]
22 Operationally, then, translation with targeted paraphrasing includes the following steps. [sent-59, score-0.847]
23 In principle, however, any automatic translation system can be used in this role, potentially at some cost to quality, by performing post hoc target-to-source alignment. [sent-62, score-0.462]
24 This step identifies parts of the source sentence that lead to ungrammatical, nonsensical, or apparently incorrect translations on the target side. [sent-64, score-0.219]
25 (2010), in this proceedings, explore the use of source paraphrases without targeting apparent mistranslations, using lattice translation (Dyer et al. [sent-70, score-0.698]
26 , 2008) to efficiently represent and decode the resulting very large space of paraphrase alternatives. [sent-71, score-0.171]
27 This step generates alternative expressions for the source spans identified in the previous step. [sent-73, score-0.307]
28 To illustrate in English, someone seeing John and Mary took a European vacation this summer might supply the paraphrase went on a European vacation (for took a European vacation), verifying that the resulting John and Mary went on a European vacation this summer preserves the original meaning. [sent-75, score-0.253]
29 For example, if two nonoverlapping source spans are each paraphrased in three ways, we generate 9 sentential source paraphrases, each of which represents an alternative way of expressing the original sentence. [sent-80, score-0.507]
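The cross-product construction just described (two non-overlapping spans with three paraphrases each yielding 9 sentential paraphrases) can be sketched as follows; the sentence, span positions, and paraphrase lists are invented for illustration:

```python
from itertools import product

def sentential_paraphrases(tokens, span_paraphrases):
    """Expand paraphrases of non-overlapping token spans into
    full-sentence alternatives (the cross product of all choices)."""
    spans = sorted(span_paraphrases)  # left-to-right (start, end) spans
    results = []
    for choice in product(*(span_paraphrases[s] for s in spans)):
        out, cursor = [], 0
        for (start, end), replacement in zip(spans, choice):
            out.extend(tokens[cursor:start])   # copy text before the span
            out.extend(replacement)            # substitute the paraphrase
            cursor = end
        out.extend(tokens[cursor:])            # copy the tail
        results.append(" ".join(out))
    return results

tokens = "John and Mary took a European vacation this summer".split()
alternatives = sentential_paraphrases(tokens, {
    (3, 4): [["went", "on"], ["enjoyed"], ["had"]],                      # "took"
    (7, 9): [["this", "past", "summer"], ["in", "summer"], ["last", "summer"]],
})
print(len(alternatives))  # 3 x 3 = 9 sentential paraphrases
```

Each of the 9 strings is then an alternative input to the MT system.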
30 The alternative source sentences, produced via paraphrase, are sent through the same MT system, and a single-best translation hypothesis is selected, e. [sent-82, score-0.554]
31 on the basis of the translation system’s model score. [sent-84, score-0.388]
32 In principle, one could also combine the alternatives into a lattice representation and decode to find the best path using lattice translation (Dyer et al. [sent-85, score-0.53]
33 One could also present translation alternatives to a target speaker for selection, similarly to Callison-Burch et al. [sent-89, score-0.444]
34 Notice that with the exception of the initial translation, each remaining step in this pipeline can involve either human participation or fully automatic processing. [sent-91, score-0.227]
35 The targeted paraphrasing framework therefore defines a rich set of intermediate points on the spectrum between fully automatic and fully human translation, of which we explore only a few in this paper. [sent-92, score-0.651]
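The steps above, with each one filled by either a human task or an automatic component, can be summarized as a control-flow sketch. This is not the authors' implementation; every function argument is a hypothetical stand-in for whichever human or automatic component plays that role:

```python
def targeted_paraphrasing(source, translate, find_bad_target_spans,
                          project_to_source, get_paraphrases, expand):
    """Targeted paraphrasing control flow (pluggable components).

    translate(src) -> (target_text, model_score)              # MT system
    find_bad_target_spans(tgt) -> spans flagged by a target monolingual
    project_to_source(src, spans) -> corresponding source spans
    get_paraphrases(src, span) -> rewordings from source monolinguals
    expand(src, span_to_paraphrases) -> full-sentence alternatives
    """
    baseline, baseline_score = translate(source)
    bad_spans = find_bad_target_spans(baseline)
    if not bad_spans:
        return baseline  # nothing flagged; keep the original translation
    src_spans = project_to_source(source, bad_spans)
    span_map = {span: get_paraphrases(source, span) for span in src_spans}
    # Retranslate every alternative; the original hypothesis stays in the
    # pool, and the MT system's own model score picks the single best.
    candidates = [(baseline, baseline_score)]
    candidates += [translate(alt) for alt in expand(source, span_map)]
    return max(candidates, key=lambda pair: pair[1])[0]
```

Swapping a human task for an automatic one changes only which callable is passed in, which is exactly the space of human-machine combinations the framework defines.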
36 Each span identified was projected back to its corresponding source span, and three Chinese-speaking Turkers were asked to provide paraphrases of each source span. [sent-105, score-0.378]
37 The Chinese source span paraphrases were then used to construct full-sentence paraphrases, which were retranslated, once again by Google Translate, to produce the output of the targeted paraphrasing translation process. [sent-107, score-1.114]
38 Footnote 1: note that this page is not a translation of the corresponding English Wikipedia page or vice versa. [sent-108, score-0.388]
39 The initial translation outputs from Google Translate (GT) and the results of the targeted paraphrasing translation process (TP) were evaluated according to widely used criteria of fluency and adequacy. [sent-111, score-1.417]
40 Fluency ratings were obtained on a 5-point scale from three native English speakers without knowledge of Chinese. [sent-112, score-0.196]
41 Translation adequacy ratings were obtained from three native Chinese speakers who are also fluent in English; they assessed adequacy of English sentences by comparing the communicated meaning to the Chinese source sentences. [sent-113, score-0.531]
42 For each GT output, we averaged across the ratings of the alternative TP outputs to produce average TP fluency and adequacy scores. [sent-135, score-0.405]
43 The average GT output ratings, measuring the pure machine translation baseline, were 2. [sent-136, score-0.388]
44 One could argue that a more sensible evaluation is not to average across alternative TP outputs, but rather to simulate the behavior of a target-language speaker who simply chooses the one translation among the alternatives that seems most fluent. [sent-142, score-0.499]
45 If we select the most fluent TP output for each source sentence according to the English-speakers’ average fluency ratings, we obtain average test set ratings of 3. [sent-143, score-0.482]
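That simulated monolingual selection is just an argmax over the per-sentence average fluency ratings; a minimal sketch (the outputs and 1-5 ratings below are invented for illustration):

```python
from statistics import mean

def most_fluent(tp_outputs, fluency_ratings):
    """Simulate a target-language speaker who keeps the TP alternative
    with the highest average fluency rating (three raters, 5-point scale).
    fluency_ratings[i] holds the raters' scores for tp_outputs[i]."""
    averages = [mean(r) for r in fluency_ratings]
    best = max(range(len(tp_outputs)), key=averages.__getitem__)
    return tp_outputs[best], averages[best]

outputs = ["alternative A", "alternative B", "alternative C"]
ratings = [[3, 4, 3], [5, 4, 4], [2, 3, 2]]   # one list per alternative
choice, avg = most_fluent(outputs, ratings)
print(choice)  # alternative B (mean 13/3)
```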
46 Figure 1 shows a selection of outputs: we present the two cases where the most fluent TP alternative shows the greatest gain in average fluency rating (best gain +2.67); two cases near the median gain in average fluency (median +1); and the worst two cases with respect to effect on average fluency rating (worst -0. [sent-149, score-0.410] [sent-150, score-0.430]
48 4 Chinese-English Evaluation As a follow-up to our pilot study, we conducted an evaluation using Chinese-English test data taken from the NIST MT’08 machine translation evaluation, in order to obtain fully automatic translation evaluation scores. [sent-153, score-0.916]
49 3 spans per sentence, that yielded at least one source paraphrase on the source Chinese side. [sent-157, score-0.583]
50 Since the targeted paraphrasing translation process (TP) produces multiple hypotheses (one automatic translation output per sentential paraphrase), we selected the single best output for each sentence by selecting the highest scoring English translation, according to the translation score delivered with each output by the Google Translate Research API. [sent-163, score-1.656]
51 Footnote 3: invalid paraphrase responses were rejected. [sent-166, score-0.768]
52 [Figure/table residue. Figure caption fragment: "…the targeted paraphrase translation process (TP), selected to show a range from strong to weak improvements." Table 1: Results on a 49-sentence subset of the NIST MT’08 Chinese-English test set; GT (baseline) BLEU 28.41.] [sent-172, score-0.388]
53 (The original translation was, of course, included among the candidates for selection.) [sent-173, score-0.388]
54 One could argue that this result is simply a result of having more hypotheses to choose from, not a result of the targeted paraphrasing process itself. [sent-176, score-0.459]
55 In order to rule out this possibility, we generated (n + 1)-best Google translations, setting n for each sentence to match the number of alternative translations generated via targeted paraphrasing. [sent-177, score-0.372]
56 We then chose the best translation for each sentence, among the (n + 1)-best Google hypotheses, via oracle selection, using the TERp metric (Snover et al. [sent-178, score-0.529]
57 We did a similar oracle-best calculation using TERp for targeted paraphrasing (TP oracle). [sent-181, score-0.459]
58 46 BLEU points over the baseline, if the best scoring alternative from the targeted paraphrasing process were always chosen. [sent-183, score-0.514]
59 In addition to aggregate scoring using BLEU, we also looked at oracle results on a per-sentence basis using TERp (since BLEU is more appropriate at the document level than at the sentence level). [sent-184, score-0.190]
60 assuming an oracle who chooses the original translation if none of the paraphrase-based alternatives are better, the average improvement over the entire set of 49 sentences is 5. [sent-193, score-0.585]
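This oracle, which falls back to the original translation when no paraphrase-based alternative beats it, reduces to a per-sentence argmin over an error metric. TERp itself is an external tool, so the sketch below takes any error_score(hypothesis, reference) callable in its place; the toy word-mismatch scorer is purely illustrative:

```python
def oracle_select(baseline, alternatives, reference, error_score):
    """Oracle hypothesis selection under an error metric such as TERp:
    keep the candidate with the lowest error against the reference.
    Because the baseline stays in the pool, the gain is never negative."""
    candidates = [baseline] + list(alternatives)
    best = min(candidates, key=lambda hyp: error_score(hyp, reference))
    gain = error_score(baseline, reference) - error_score(best, reference)
    return best, gain

def toy_error(hyp, ref):
    """Stand-in scorer: positional word mismatches plus length difference."""
    h, r = hyp.split(), ref.split()
    return sum(a != b for a, b in zip(h, r)) + abs(len(h) - len(r))

best, gain = oracle_select("a virtual necktie",
                           ["a virtual tie", "a real bow"],
                           "a virtual tie", toy_error)
print(best, gain)  # a virtual tie 1
```

Averaging `gain` over all test sentences gives the per-sentence oracle improvement reported above.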
61 Although we have obtained results on only a small subset of the full NIST MT’08 test set, our automatic evaluation confirms the qualitative impressions in Figure 1 and the subjective ratings results obtained in our pilot study in Section 3. [sent-195, score-0.196]
62 The TP oracle results establish that by taking advantage of monolingual human speakers, it is possible to obtain quite substantial gains in translation quality. [sent-196, score-0.715]
63 The TP one-best results demonstrate that the majority of that oracle gain is obtained in automatic hypothesis selection, simply by selecting the paraphrase-based alternative translation with the highest translation score. [sent-197, score-1.038]
64 5 English-Chinese Evaluation As we noted in Section 2, the targeted paraphrasing translation process defines a set of human-machine combinations that do not require bilingual expertise. [sent-200, score-0.909]
65 The previous section described human identification of mistranslated spans on the target side, human generation of paraphrases for problematic sub-sentential spans on the source side, and both automatic hypothesis selection and human selection (via fluency ratings, in Section 3). [sent-201, score-1.009]
66 In this section, we take a step toward more automated processing, replacing human identification of mistranslated spans with a fully automatic method. [sent-202, score-0.348]
67 The idea behind our automatic error identification is straightforward: if the source sentence (Footnote 5: “Gains” refer to a lower score; since TERp is an error measure, lower is better.) [sent-203, score-0.160]
68 [Figure caption fragment: “…paraphrasing translation process (TP), and a human reference translation.”] [sent-206, score-0.746]
69 is translated to the target and then back-translated, a comparison of the result with the original is likely to identify places where the translation process encountered difficulty. [sent-207, score-0.388]
70 Briefly, we automatically translate source F to target E, then back-translate to produce F’ in the source language. [sent-208, score-0.299]
71 When at least two consecutive edits are found, we flag their smallest containing syntactic constituent as a potential source of translation difficulty. [sent-210, score-0.558]
72 In more detail, we posit that if an area of backtranslation F’ has many edits relative to the original sentence F, then that area probably comes from parts of the target translation that did not represent the desired meaning in F very well. [sent-211, score-0.582]
73 We only consider consecutive edits in certain of the TERp edit categories, specifically, deletions (D), insertions (I), and shifts (S); the two remaining categories, matches (M) and paraphrases (P), indicate that the words are identical or that the original meaning was preserved. [sent-212, score-0.26]
74 In order to identify reasonably meaningful paraphrase units based on potential errors, we rely on a source language constituency parser. [sent-214, score-0.282]
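Before the constituent-widening step, the trigger condition (a run of at least two consecutive D/I/S edits between F and its back-translation F’) can be sketched directly over a TERp-style edit-operation sequence. The op sequence below is invented, and mapping a flagged run to its smallest covering constituent would require the parser mentioned above:

```python
def suspect_spans(edit_ops, min_run=2, bad=frozenset("DIS")):
    """Return (start, end) index pairs of runs of >= min_run consecutive
    deletion (D), insertion (I), or shift (S) edits; match (M) and
    paraphrase (P) operations terminate a run."""
    spans, start = [], None
    for i, op in enumerate(list(edit_ops) + ["M"]):  # sentinel flushes tail
        if op in bad:
            if start is None:
                start = i            # a suspicious run begins here
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    return spans

ops = ["M", "M", "S", "D", "M", "I", "M", "D", "I", "S"]
print(suspect_spans(ops))  # [(2, 4), (7, 10)]
```

Note that the isolated insertion at index 5 is not flagged: a single edit does not meet the two-consecutive-edits threshold.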
75 Footnote 8: it is possible that the difficulty so identified involves backtranslation only, not translation in the original direction. [sent-221, score-0.429]
76 If that is the case, then more paraphrasing will be done than necessary, but the quality of the TP process’s output should not suffer. [sent-222, score-0.25]
77 We used English reference 0 as the source sentence, and the original Chinese sentence as the target. [sent-226, score-0.211]
78 (For six sentences, no paraphrases were suggested for any of the problematic spans. [sent-229, score-0.156]
79 ) These yielded full-sentence paraphrase alternatives for the 1,000 sentences, which we again evaluated via an oracle study. [sent-230, score-0.417]
80 Comparing with the GT output, we find that a better-translated TP paraphrase sentence is available in 313 of the 1,000 cases, or 31.3%. [sent-233, score-0.220]
81 For those 313 cases, TER for the oracle-best paraphrase alternative improves on the TER for the original sentence by 12. [sent-234, score-0.275]
82 The cost for human tasks in this study (just paraphrases, since identifying problematic spans was done automatically) was $117. [sent-238, score-0.272]
83 Figure 3: TERp alignment of a source sentence and its back-translation in order to identify a problematic source span. [sent-244, score-0.271]
84 Our experimentation yielded a consistent pattern of results, supporting the conclusion that targeted paraphrasing can lead to significant improvements in translation, via several different measures. [sent-248, score-0.508]
85 First, a very small pilot study for Chinese-English translation in Wikipedia provided preliminary validation that translation fluency and accuracy can be improved quite significantly for a set of fairly chosen test sentences, according to human ratings. [sent-249, score-1.112]
86 Second, a small experiment in Chinese-English translation using standard NIST test sentences suggested the potential for dramatic gains using the BLEU and TERp scores, with oracle improvements of 2. [sent-250, score-0.529]
87 And third, in a large scale evaluation of the approach using English-Chinese translation of 1,000 sentences, this time automating the step of identifying potentially mistranslated parts of source sentences, the oracle results demonstrated that a gain of nearly 4 TER points is available. [sent-255, score-0.813]
88 One important step will be to better characterize the relationship between cost and quality in quantitative terms: how much does it cost to obtain how much quality improvement, and how does that compare with typical professional translation costs of $0. [sent-257, score-0.536]
89 This question is closely connected with the dynamics of crowdsourcing platforms such as Mechanical Turk: the cost per sentence in these experiments works out to be around $0.12. [sent-259, score-0.194]
90 But translation on a large scale will involve a complicated ecosystem of workers and cheaters, tasks and motivations and incentives (Quinn and Bederson, 2009). [sent-260, score-0.388]
91 A related crowdsourcing issue requiring further study is the availability of monolingual human participants for a range of language pairs, in order to validate the argument that drawing on monolingual human participation will significantly reduce the severity of the availability bottleneck. [sent-261, score-0.667]
92 Another set of issues concerns the underlying translation technology. [sent-263, score-0.388]
93 A reviewer correctly notes that the value of the approach taken here is likely to vary depending upon the quality of the underlying translation system, and the approach may break down at the extrema, when the baseline translation is either already very good or completely awful. [sent-264, score-0.776]
94 We plan to implement a fully automatic targeted paraphrasing translation pipeline, using the automated methods discussed when introducing the pipeline in Section 2, including translation of targeted paraphrase lattices (cf. [sent-267, score-1.705]
95 Correcting automatic translations through collaborations between MT and monolingual target-language users. [sent-277, score-0.305]
96 Fast, cheap, and creative: Evaluating translation quality using Amazon’s Mechanical Turk. [sent-306, score-0.388]
97 Improving Arabic-Chinese statistical machine translation using English as pivot language. [sent-339, score-0.429]
98 Integration of speech to computer-assisted translation using finite-state automata. [sent-351, score-0.388]
99 Machine translation: What type of post-editing on what type of documents for what type of users. [sent-367, score-0.388]
100 A study of translation edit rate with targeted human annotation. [sent-407, score-0.654]
wordName wordTfidf (topN-words)
[('translation', 0.388), ('paraphrasing', 0.25), ('terp', 0.24), ('tp', 0.223), ('targeted', 0.209), ('fluency', 0.182), ('paraphrase', 0.171), ('paraphrases', 0.156), ('spans', 0.141), ('oracle', 0.141), ('monolingual', 0.129), ('mt', 0.117), ('gt', 0.112), ('source', 0.111), ('mistranslated', 0.107), ('ratings', 0.099), ('speakers', 0.097), ('mechanical', 0.097), ('pilot', 0.097), ('bederson', 0.083), ('intelligible', 0.083), ('turkers', 0.083), ('participation', 0.08), ('translate', 0.077), ('maryland', 0.076), ('cost', 0.074), ('turk', 0.074), ('crowdsourcing', 0.071), ('dyer', 0.07), ('adequacy', 0.069), ('gain', 0.066), ('translators', 0.065), ('attorney', 0.062), ('horizons', 0.062), ('kennedy', 0.062), ('monolinguals', 0.062), ('bilingual', 0.062), ('google', 0.06), ('edits', 0.059), ('translations', 0.059), ('ter', 0.058), ('human', 0.057), ('snover', 0.056), ('alternatives', 0.056), ('bleu', 0.055), ('alternative', 0.055), ('massachusetts', 0.054), ('du', 0.054), ('buzek', 0.053), ('olivia', 0.053), ('association', 0.052), ('availability', 0.051), ('chinese', 0.051), ('reference', 0.051), ('spectrum', 0.049), ('sentence', 0.049), ('yielded', 0.049), ('quinn', 0.048), ('pipeline', 0.047), ('nist', 0.045), ('sentential', 0.045), ('meaning', 0.045), ('late', 0.044), ('paraphrased', 0.044), ('lattice', 0.043), ('fully', 0.043), ('bound', 0.042), ('nj', 0.042), ('participants', 0.042), ('philip', 0.042), ('albrecht', 0.041), ('aure', 0.041), ('backtranslation', 0.041), ('bowker', 0.041), ('cheaters', 0.041), ('chinesespeaking', 0.041), ('coakley', 0.041), ('cravate', 0.041), ('dans', 0.041), ('ege', 0.041), ('enateur', 0.041), ('esteban', 0.041), ('etat', 0.041), ('humanmachine', 0.041), ('indiquent', 0.041), ('jupiter', 0.041), ('khadivi', 0.041), ('morita', 0.041), ('procureur', 0.041), ('shahaf', 0.041), ('sondages', 0.041), ('sont', 0.041), ('vacation', 0.041), ('pivot', 0.041), ('bannard', 0.041), ('fluent', 0.041), ('wikipedia', 0.039), ('morristown', 
0.039), ('amazon', 0.039), ('thing', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
Author: Philip Resnik ; Olivia Buzek ; Chang Hu ; Yakov Kronrod ; Alex Quinn ; Benjamin B. Bederson
Abstract: Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable quality translation that makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e. paraphrases) with only monolingual knowledge of the source language. Evaluations demonstrate that this approach can yield substantial improvements in translation quality.
2 0.39848486 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
Author: Aurelien Max
Abstract: In this article, an original view on how to improve phrase translation estimates is proposed. This proposal is grounded on two main ideas: first, that appropriate examples of a given phrase should participate more in building its translation distribution; second, that paraphrases can be used to better estimate this distribution. Initial experiments provide evidence of the potential of our approach and its implementation for effectively improving translation performance.
3 0.29110917 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices
Author: Jinhua Du ; Jie Jiang ; Andy Way
Abstract: For resource-limited language pairs, coverage of the test set by the parallel corpus is an important factor that affects translation quality in two respects: 1) out of vocabulary words; 2) the same information in an input sentence can be expressed in different ways, while current phrase-based SMT systems cannot automatically select an alternative way to transfer the same information. Therefore, given limited data, in order to facilitate translation from the input side, this paper proposes a novel method to reduce the translation difficulty using source-side lattice-based paraphrases. We utilise the original phrases from the input sentence and the corresponding paraphrases to build a lattice with estimated weights for each edge to improve translation quality. Compared to the baseline system, our method achieves relative improvements of 7.07%, 6.78% and 3.63% in terms of BLEU score on small, medium and large- scale English-to-Chinese translation tasks respectively. The results show that the proposed method is effective not only for resourcelimited language pairs, but also for resourcesufficient pairs to some extent.
4 0.27795494 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
Author: Lei Shi ; Rada Mihalcea ; Mingjun Tian
Abstract: In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semisupervised learning, and adapt the translated model to better fit the data distribution of the target language.
5 0.24764545 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.
6 0.2380441 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
7 0.235415 52 emnlp-2010-Further Meta-Evaluation of Broad-Coverage Surface Realization
8 0.1957676 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
9 0.17048401 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs
10 0.15222612 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
11 0.13833266 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
12 0.13513678 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar
13 0.13030286 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
14 0.12827592 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
15 0.12371463 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model
16 0.12324162 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
17 0.10865881 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping
18 0.10216313 39 emnlp-2010-EMNLP 044
19 0.10102317 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter
20 0.095681027 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
topicId topicWeight
[(0, 0.374), (1, -0.437), (2, -0.201), (3, 0.045), (4, -0.075), (5, 0.15), (6, -0.001), (7, 0.004), (8, 0.181), (9, -0.112), (10, -0.042), (11, -0.051), (12, 0.054), (13, 0.055), (14, -0.041), (15, 0.029), (16, 0.033), (17, 0.026), (18, 0.062), (19, 0.011), (20, -0.039), (21, -0.063), (22, -0.009), (23, -0.051), (24, -0.073), (25, 0.027), (26, -0.102), (27, -0.018), (28, 0.015), (29, 0.027), (30, -0.019), (31, -0.058), (32, 0.0), (33, 0.029), (34, 0.011), (35, 0.033), (36, -0.018), (37, 0.04), (38, 0.006), (39, 0.052), (40, -0.03), (41, 0.011), (42, -0.02), (43, -0.016), (44, 0.013), (45, -0.04), (46, -0.033), (47, 0.0), (48, 0.068), (49, 0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.96066368 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
2 0.85361403 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
3 0.84310383 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
4 0.81748211 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices
5 0.69173461 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs
Author: Hideki Isozaki ; Tsutomu Hirao ; Kevin Duh ; Katsuhito Sudoh ; Hajime Tsukada
Abstract: Automatic evaluation of Machine Translation (MT) quality is essential to developing high-quality MT systems. Various evaluation metrics have been proposed, and BLEU is now used as the de facto standard metric. However, when we consider translation between distant language pairs such as Japanese and English, most popular metrics (e.g., BLEU, NIST, PER, and TER) do not work well. It is well known that Japanese and English have completely different word orders, and special care must be taken with word order in translation. Otherwise, translations with wrong word order often lead to misunderstanding and incomprehensibility. For instance, SMT-based Japanese-to-English translators tend to translate ‘A because B’ as ‘B because A.’ Thus, word order is the most important problem for distant language translation. However, conventional evaluation metrics do not significantly penalize such word order mistakes. Therefore, locally optimizing these metrics leads to inadequate translations. In this paper, we propose an automatic evaluation metric based on rank correlation coefficients modified with precision. Our meta-evaluation of the NTCIR-7 PATMT JE task data shows that this metric outperforms conventional metrics.
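The core idea can be sketched as a Kendall's-tau-style rank correlation over reference word positions, scaled by unigram precision. This is a simplified illustration in the spirit of the metric, not the authors' exact definition.

```python
from itertools import combinations

def word_order_score(reference, hypothesis):
    """Align each hypothesis word to its reference position (first exact
    match), measure how monotone that alignment is with Kendall's tau,
    and scale by unigram precision so unmatched words are penalized."""
    ref_pos = {}
    for i, w in enumerate(reference):
        ref_pos.setdefault(w, i)          # keep first occurrence only
    ranks = [ref_pos[w] for w in hypothesis if w in ref_pos]
    if len(ranks) < 2:
        return 0.0
    pairs = list(combinations(range(len(ranks)), 2))
    concordant = sum(1 for i, j in pairs if ranks[i] < ranks[j])
    tau = 2.0 * concordant / len(pairs) - 1.0
    precision = len(ranks) / len(hypothesis)
    # Map tau from [-1, 1] to [0, 1] so a fully reversed order scores 0
    # rather than -1 before combining with precision.
    return (tau + 1.0) / 2.0 * precision
```

Under this scheme a hypothesis with every word correct but in reverse order scores 0, which is exactly the penalty BLEU-style n-gram metrics fail to impose.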
6 0.65427649 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
7 0.63678175 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
8 0.60505593 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
9 0.59906608 52 emnlp-2010-Further Meta-Evaluation of Broad-Coverage Surface Realization
10 0.57882619 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
11 0.51291198 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar
12 0.43364674 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
13 0.43100485 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
14 0.42445219 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping
15 0.41589585 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
16 0.39842638 19 emnlp-2010-Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation
17 0.38598949 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
18 0.38242146 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter
19 0.37552404 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation
20 0.3641195 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
topicId topicWeight
[(3, 0.016), (10, 0.017), (12, 0.045), (16, 0.238), (29, 0.138), (30, 0.084), (52, 0.042), (56, 0.076), (66, 0.14), (72, 0.052), (76, 0.022), (77, 0.011), (87, 0.014), (89, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.82336003 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
Author: Philip Resnik ; Olivia Buzek ; Chang Hu ; Yakov Kronrod ; Alex Quinn ; Benjamin B. Bederson
Abstract: Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable quality translation that makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e. paraphrases) with only monolingual knowledge of the source language. Evaluations demonstrate that this approach can yield substantial improvements in translation quality.
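The pivotal step in the process described above is projecting target-side error spans back to the source, so that source-language monolinguals know which phrases to paraphrase. A minimal sketch, assuming spans as half-open index ranges and a word alignment as (source_index, target_index) pairs; this representation is a hypothetical stand-in for the crowdsourced steps, not the authors' implementation.

```python
def project_error_spans(error_spans, alignment):
    """Map target-side error spans (start, end) back to source token
    spans via a word alignment given as (source_index, target_index)
    pairs.  Each projected span covers the smallest contiguous source
    range touching the flagged target words."""
    source_spans = []
    for start, end in error_spans:            # target span [start, end)
        src = sorted(s for s, t in alignment if start <= t < end)
        if src:
            source_spans.append((src[0], src[-1] + 1))
    return source_spans
```

Source-language speakers would then paraphrase the tokens inside each returned span, and the resulting alternatives are fed back through MT for the target-side monolinguals to re-judge.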
2 0.80351889 109 emnlp-2010-Translingual Document Representations from Discriminative Projections
Author: John Platt ; Kristina Toutanova ; Wen-tau Yih
Abstract: Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparable and not parallel. The OPCA method is shown to perform best.
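The OPCA idea of finding directions that preserve overall variance while collapsing paired translations onto each other can be sketched as a generalized symmetric eigenproblem, solved by whitening the noise covariance. The covariance definitions and the regularizer below are illustrative assumptions, not the paper's exact formulation, and CPLSA is not covered.

```python
import numpy as np

def oriented_pca(X, Y, k):
    """Sketch of an OPCA-style projection.  X and Y hold aligned document
    vectors from two languages (one pair per row).  We seek k directions
    maximizing overall (signal) variance relative to the variance of the
    difference between paired documents (noise), i.e. generalized
    eigenvectors of signal v = lambda * noise v."""
    Z = np.vstack([X, Y])
    Z = Z - Z.mean(axis=0)
    signal = Z.T @ Z / len(Z)                  # overall covariance
    D = X - Y                                  # paired docs should coincide
    noise = D.T @ D / len(D) + 1e-6 * np.eye(X.shape[1])
    L = np.linalg.cholesky(noise)              # whiten the noise metric
    Li = np.linalg.inv(L)
    M = Li @ signal @ Li.T                     # symmetric in whitened space
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:k]
    return Li.T @ vecs[:, top]                 # columns = projection dirs
```

In the projected space, a document and its translation land close together, which is what makes retrieval across languages by vector distance work.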
3 0.67553097 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
Author: Aurelien Max
Abstract: In this article, an original view on how to improve phrase translation estimates is proposed. This proposal is grounded on two main ideas: first, that appropriate examples of a given phrase should participate more in building its translation distribution; second, that paraphrases can be used to better estimate this distribution. Initial experiments provide evidence of the potential of our approach and its implementation for effectively improving translation performance.
Author: Amr Ahmed ; Eric Xing
Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem of modeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.
5 0.67193693 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts
Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.
6 0.6690076 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
7 0.66615903 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
8 0.66382426 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
9 0.66046524 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
10 0.6603843 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment
11 0.65928864 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
12 0.65924364 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
13 0.65905601 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
14 0.6586501 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
15 0.65601975 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation
16 0.65581679 31 emnlp-2010-Constraints Based Taxonomic Relation Classification
17 0.65474457 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
18 0.65365529 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space
19 0.64887154 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
20 0.64844376 84 emnlp-2010-NLP on Spoken Documents Without ASR