acl acl2011 acl2011-247 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work on pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition, I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this thesis proposal I present my thesis work on pre- and postprocessing for statistical machine translation, mainly into Germanic languages. [sent-3, score-0.554]
2 I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. [sent-4, score-0.53]
3 In addition, I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed. [sent-6, score-0.39]
4 1 Introduction Statistical machine translation (SMT) is based on training statistical models from large corpora of human translations. [sent-7, score-0.282]
5 A major drawback of SMT systems is that they use little or no grammatical knowledge and rely mainly on a target language model to produce correct target-language text, which often results in ungrammatical output. [sent-9, score-0.122]
6 The main focus for SMT to date has been on translation into English, for which the models work relatively well, especially for source languages that are structurally similar to English. [sent-11, score-0.347]
7 There has been less research on translation out of English, or between other language pairs. [sent-12, score-0.184]
8 Methods that work well for translation into English often run into problems elsewhere, for instance in translation into morphologically rich languages. [sent-13, score-0.451]
9 Word order differences and morphological complexity of a language have been shown to be explanatory variables for the performance of phrase-based SMT systems (Birch et al. [sent-14, score-0.071]
10 Some problems with SMT into German and Swedish are exemplified in Table 1. [sent-17, score-0.055]
11 In the German example, the translation of the verb welcome is missing in the SMT output. [sent-18, score-0.259]
12 Missing and misplaced verbs are common error types, since the German verb should appear last in the sentence in this context, as in the reference, begrüßen. [sent-19, score-0.144]
13 In the Swedish example, there are problems with a definite NP, which has the wrong gender of the definite article, den instead of det, and is missing a definite suffix on the noun synsätt(et) ((the) approach). [sent-21, score-1.046]
14 In this proposal I outline my thesis work which aims to improve statistical machine translation, particularly into Germanic languages, by using pre- and postprocessing on one or both language sides, with an additional focus on error analysis. [sent-22, score-0.585]
15 In section 2 I present a thesis overview, and in section 3 I briefly review MT evaluation techniques and discuss my work on MT error analysis. [sent-23, score-0.185]
16 In section 4 I describe my work on pre- and postprocessing, which is focused on compounding, definite noun phrases, word order, and error correction. [sent-24, score-0.453]
17 2 Thesis Overview My main research focus is how pre- and postprocessing can be used to improve statistical MT, with a focus on translation into Germanic languages. [sent-33, score-0.555]
18 The idea behind preprocessing is to change the training corpus on the source side and/or on the target side in order to make them more similar, which makes the SMT task easier, since the standard SMT models work better for more similar languages. [sent-34, score-0.079]
19 Postprocessing is needed after the translation when the target language has been preprocessed, in order to restore it to the normal target language. [sent-35, score-0.219]
20 Postprocessing can also be used on standard MT output, in order to correct some of the errors from the MT system. [sent-36, score-0.07]
21 My work on pre- and postprocessing focuses on four areas: compounding, definite noun phrases, word order, and error correction. [sent-37, score-0.754]
22 In addition, I am working on error analysis, to identify and classify errors in the MT output, both in order to focus my research effort and to evaluate and compare systems. [sent-38, score-0.271]
23 My work is based on the phrase-based approach to statistical machine translation (PBSMT, Koehn et al. [sent-39, score-0.282]
24 I further use the framework of factored machine translation, where each word is represented as a vector of factors, such as surface word, lemma and part-of-speech, rather than only as surface words (Koehn and Hoang, 2007). [sent-41, score-0.113]
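A minimal sketch of the factored word representation, assuming a Moses-style input format in which the factors of each token are joined by `|` in the order surface|lemma|POS. The class and function names, the factor order and the tags are hypothetical and only illustrate the idea of a word as a vector of factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoredWord:
    """One token represented as a vector of factors rather than a bare surface form."""
    surface: str
    lemma: str
    pos: str


def parse_factored(line: str, sep: str = "|") -> list[FactoredWord]:
    """Parse a whitespace-tokenized line of 'surface|lemma|pos' tokens."""
    words = []
    for token in line.split():
        factors = token.split(sep)
        if len(factors) != 3:
            raise ValueError(f"expected 3 factors in {token!r}, got {len(factors)}")
        words.append(FactoredWord(*factors))
    return words


sentence = parse_factored("träden|träd|NN växer|växa|VB")
print([w.lemma for w in sentence])  # ['träd', 'växa']
```

Keeping lemma and part-of-speech as separate factors is what allows models, such as a sequence model over part-of-speech tags, to operate on representations that are less sparse than surface forms.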
25 I have chosen to focus on PBSMT, which is a very successful MT approach and has received much research attention. [sent-45, score-0.101]
26 (2007a)) can potentially overcome some language differences that are problematic for PBSMT, such as long-distance word order differences. [sent-49, score-0.105]
27 3 Evaluation and Error Analysis Machine translation systems are often only evaluated quantitatively by using automatic metrics, such as Bleu (Papineni et al. [sent-53, score-0.184]
28 While this type of evaluation has its advantages, mainly that it is fast and cheap, its correlation with human judgments is often low, especially for translation out of English (Callison-Burch et al. [sent-55, score-0.238]
29 To partly overcome these problems, I use several metrics in my studies instead of only Bleu. [sent-57, score-0.176]
30 Even so, metrics only give a single score per test set and system, so even several metrics tell us little about the particular problems of a system or about possible improvements. [sent-58, score-0.195]
31 One alternative to automatic metrics is human judgments, either absolute scores, for instance for adequacy or fluency, or by ranking sentences or segments. [sent-59, score-0.052]
32 I mainly take advantage of this type of evaluation as part of participating with my research group in MT shared tasks with large evaluation campaigns such as WMT (e. [sent-61, score-0.054]
33 To overcome the limitation of quantitative evaluations, I focus on error analysis (EA) of MT output in my thesis. [sent-65, score-0.182]
34 EA is the task of annotating and classifying the errors in MT output, which gives a qualitative view. [sent-66, score-0.063]
35 It can be used to evaluate and compare systems, but is also useful in order to focus the research effort on common problems for the language pair in question. [sent-67, score-0.18]
36 (2006) suggested a typology with five main categories: missing, incorrect, unknown, word order, and punctuation, which have also been used by other researchers, mainly for evaluation. [sent-70, score-0.201]
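To make the five-category typology concrete, the sketch below tallies annotated errors over the five top-level categories named above. It is an illustrative data structure only; the enum, the function name and the example annotations are hypothetical and not taken from any annotation tool.

```python
from collections import Counter
from enum import Enum


class ErrorCategory(Enum):
    """The five top-level categories of the typology described above."""
    MISSING = "missing"
    INCORRECT = "incorrect"
    UNKNOWN = "unknown"
    WORD_ORDER = "word order"
    PUNCTUATION = "punctuation"


def category_distribution(annotations):
    """Return the share of each category among a list of ErrorCategory labels."""
    counts = Counter(annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: counts[cat] / total for cat in ErrorCategory}


labels = [ErrorCategory.MISSING, ErrorCategory.MISSING,
          ErrorCategory.WORD_ORDER, ErrorCategory.INCORRECT]
dist = category_distribution(labels)
print(dist[ErrorCategory.MISSING])  # 0.5
```

A flat count like this also shows why such a typology is shallow: it records that a word is missing, but not whether, for example, a missing German verb is caused by reordering problems.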
37 However, this typology is relatively shallow and mixes classification of errors with causes of errors. [sent-71, score-0.152]
38 (2010) suggested a typology based on linguistic categories, such as orthography and semantics, but their descriptions of these categories and their subcategories are not detailed. [sent-73, score-0.147]
39 Thus, as part of my research, I am in the process of designing a fine-grained typology and guidelines for EA. [sent-74, score-0.151]
40 I have also created a tool for performing MT error analysis (Stymne, 2011a). [sent-75, score-0.111]
41 Initial annotations have helped to focus my research efforts, and will be discussed below. [sent-76, score-0.08]
42 4 Main Research Problems In this section I describe the four main problem areas I will focus on in my thesis project. [sent-78, score-0.22]
43 I summarize briefly previous work in each area, and outline my own current and planned contributions. [sent-79, score-0.081]
44 1 Compounding In most Germanic languages, compounds are written without spaces or other word boundaries, which makes them problematic for SMT, mainly due to sparse data problems. [sent-82, score-0.246]
45 The standard method for treating compounds for translation from Germanic languages is to split them in both the training data and translation input (e. [sent-83, score-0.672]
46 Koehn and Knight (2003) also suggested a corpus-based compound splitting method that has been much used for SMT, where compounds are split based on the corpus frequencies of their parts. [sent-87, score-0.422]
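A minimal sketch of frequency-based splitting in the spirit of Koehn and Knight (2003): every segmentation into known parts (including the unsplit word) is scored by the geometric mean of the parts' corpus frequencies, and the best one is kept. Simplifications and assumptions: no linking morphemes (such as German -s-), no part-of-speech restrictions, a hypothetical minimum part length of 3 characters, and toy frequency counts.

```python
from math import prod


def candidate_splits(word, vocab, min_len=3):
    """Yield segmentations of `word` whose non-final parts are in `vocab`.

    The unsplit word is always yielded first; every split-off part is at
    least `min_len` characters long."""
    yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head in vocab:
            for rest in candidate_splits(word[i:], vocab, min_len):
                yield [head] + rest


def best_split(word, freq, min_len=3):
    """Pick the segmentation with the highest geometric mean of part frequencies."""
    best, best_score = [word], 0.0
    for parts in candidate_splits(word, freq, min_len):
        if all(p in freq for p in parts):
            score = prod(freq[p] for p in parts) ** (1 / len(parts))
            if score > best_score:
                best, best_score = parts, score
    return best


freq = {"päron": 50, "träd": 200}  # toy corpus counts
print(best_split("päronträd", freq))  # ['päron', 'träd']
```

If the full compound were itself frequent enough (say a count of 1000, above the geometric mean of 100 for the parts), the unsplit form would win, which is how this scheme avoids splitting lexicalized compounds.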
47 If compounds are split for translation into Germanic languages, the SMT system produces output with split compounds, which need to be postprocessed into full compounds. [sent-88, score-0.497]
48 For this process to be successful, it is important that the SMT system produces the split compound parts in a correct word order. [sent-90, score-0.233]
49 To encourage this I have used a factored translation system that outputs parts-of-speech and uses a sequence model on parts-of-speech. [sent-91, score-0.25]
50 I extended the part-of-speech tagset to use special part-of-speech tags for split compound parts, which depend on the head part-of-speech of the compound. [sent-92, score-0.2]
51 For instance, the Swedish noun päronträd (pear tree) would be tagged as päron|N-part träd|N when split. [sent-93, score-0.055]
52 Using this model, the number of compound parts that [sent-94, score-0.191]
53 were produced in the wrong order was reduced drastically compared to not using a part-of-speech sequence model for translation into German (Stymne, 2009a). [sent-95, score-0.252]
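The tagging scheme can be illustrated with a small helper that tags every non-final part of a split compound as `<head>-part` and the final part with the head's own tag, reproducing the päron|N-part träd|N example. The function name and the `|` token format are assumptions for illustration.

```python
def tag_split_compound(parts, head_pos):
    """Tag the parts of a split compound for a part-of-speech sequence model.

    Every non-final part gets '<head_pos>-part'; the final part is the head
    and keeps the compound's own part-of-speech."""
    if not parts:
        raise ValueError("a compound needs at least one part")
    tags = [f"{head_pos}-part"] * (len(parts) - 1) + [head_pos]
    return " ".join(f"{p}|{t}" for p, t in zip(parts, tags))


print(tag_split_compound(["päron", "träd"], "N"))  # päron|N-part träd|N
```

In training data tagged this way, an `N-part` tag is normally followed by another `N-part` or by `N`, so a sequence model over these tags should assign low probability to hypotheses where a part is stranded or placed after its head.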
54 I also designed an algorithm for the merging task that uses these part-of-speech tags to merge compounds only when the next part-of-speech tag matches. [sent-96, score-0.283]
55 This merging method outperforms reimplementations and variations of previous merging suggestions (Popović et al. [sent-97, score-0.21]
56 , 2006), and methods adapted from morphology merging (Virpioja et al. [sent-98, score-0.091]
57 It also has the advantage over previous merging methods that it can produce novel compounds, while at the same time reducing the risk of merging parts into non-words. [sent-100, score-0.215]
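A minimal sketch of part-of-speech-constrained merging over tagged output, assuming tags of the form described above (`X-part` for non-final parts). A part is joined to the next token only if that token's tag is `X` or `X-part`; otherwise it is left as a separate word. The function is hypothetical and omits details such as hyphenation and linking morphemes that a complete merging method may need.

```python
def merge_compounds(tagged):
    """Merge split compound parts in a list of (word, tag) pairs."""
    merged = []
    pending = ""  # compound parts collected so far
    for i, (word, tag) in enumerate(tagged):
        if tag.endswith("-part"):
            head = tag[: -len("-part")]
            next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
            if next_tag in (head, f"{head}-part"):
                pending += word  # the next token continues this compound
                continue
        merged.append(pending + word)
        pending = ""
    return merged


print(merge_compounds([("päron", "N-part"), ("träd", "N")]))  # ['päronträd']
```

Because merging is driven by the tag sequence rather than by a list of known compounds, this kind of rule can form compounds that never occurred in the training data, while a stray part followed by, say, a conjunction is not glued onto it.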
58 I have also shown that these compound processing methods work equally well for translation into Swedish (Stymne and Holmqvist, 2008). [sent-101, score-0.342]
59 Currently I am working on methods for further improving compound merging, with promising initial results. [sent-102, score-0.194]
60 2 Definite Noun Phrases In Scandinavian languages there are two ways to express definiteness in noun phrases, either by a definite article, or by a suffix on the noun. [sent-104, score-0.457]
61 This leads to problems when translating into these languages, such as superfluous definite articles and wrong forms of nouns. [sent-105, score-0.467]
62 Table 2: A selection of results for the four pre- and postprocessing strategies. [sent-130, score-0.253]
63 BL denotes the baseline systems, +Comp with compound processing, +Def with definite processing, +Reo with iterative reordering and alignment and monotone decoding, +EC with grammar checker error correction. [sent-132, score-0.713]
64 The test set for error correction only contains sentences that are affected by the error correction. [sent-133, score-0.273]
65 report shows no gain for a simple pre-processing strategy for translation from German to Swedish (Samuelsson, 2006). [sent-134, score-0.225]
66 There is similar work on other phenomena, such as Nießen and Ney (2000), who move German separated verb prefixes, to imitate the English phrasal verb structure. [sent-135, score-0.066]
67 I address definiteness by preprocessing the source language, to make definite NPs structurally similar to target language NPs. [sent-136, score-0.421]
68 Definite NPs in Scandinavian languages are mimicked in the source language by removing superfluous definite articles, and/or adding definite suffixes to nouns. [sent-138, score-0.783]
69 In Swedish and Norwegian, the distribution of definite suffixes is more complex than in Danish, and the basic strategy that worked well for Danish was not successful (Stymne, 2011b). [sent-141, score-0.428]
70 A small modification to the basic strategy, so that superfluous English articles were removed, but no suffixes were added, was successful for translation from English into Swedish and Norwegian. [sent-142, score-0.376]
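A much simplified sketch of the two preprocessing variants, assuming POS-tagged English input with Penn Treebank tags and a hypothetical `#DEF` marker for the added suffix. It relies on the Scandinavian pattern that a definite noun without a premodifier carries a suffix instead of a prenominal article (Danish huset, "the house"), whereas an adjective brings the article back (det store hus). Setting `add_suffix=True` approximates the basic Danish strategy and `add_suffix=False` the variant that only removes articles; these are not the rules of the cited studies, and the sketch only looks at the token right after the article, so multi-word NPs such as noun-noun compounds are not handled.

```python
def preprocess_definites(tagged, add_suffix=True, suffix="#DEF"):
    """Make English definite NPs look more like Scandinavian ones.

    `tagged` is a list of (word, Penn-style POS tag) pairs. An article 'the'
    directly followed by a common noun (NN/NNS) is treated as superfluous and
    dropped; with add_suffix=True the noun is also marked as definite."""
    out = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if (word.lower() == "the" and i + 1 < len(tagged)
                and tagged[i + 1][1] in ("NN", "NNS")):
            noun, noun_tag = tagged[i + 1]
            out.append((noun + suffix if add_suffix else noun, noun_tag))
            i += 2  # consume both the article and the noun
            continue
        out.append((word, tag))
        i += 1
    return out


print(preprocess_definites([("the", "DT"), ("house", "NN"), ("is", "VBZ"), ("red", "JJ")]))
# [('house#DEF', 'NN'), ('is', 'VBZ'), ('red', 'JJ')]
```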
71 A planned extension is to integrate the transformations into a lattice that is fed to the decoder, in the spirit of (Dyer et al. [sent-143, score-0.156]
72 3 Word Order There has been a lot of research on how to handle word order differences between languages. [sent-146, score-0.071]
73 I have performed an initial study on a language-independent word order strategy where reordering rule learning and word alignment are performed iteratively, since each process depends on the other (Stymne, 2011c). [sent-155, score-0.202]
74 In this study I only choose the 1-best reordering as input to the SMT system. [sent-157, score-0.09]
75 I plan to extend this by presenting several reorderings to the decoder as a lattice, which has been successful in previous work (see e. [sent-158, score-0.053]
76 My preliminary error analysis has shown that there are two main word order difficulties for translation between English and Swedish: adverb placement and V2 errors, where the verb is not placed in the correct position when it should precede the subject. [sent-162, score-0.363]
77 I plan to design a preprocessing scheme to tackle these particular problems for English-Swedish translation. [sent-163, score-0.099]
78 A few examples are Elming (2006), who uses transformation-based learning for word substitution based on aligned human post-edited sentences, and Guzmán (2007), who used regular expressions to correct regular Spanish errors. [sent-167, score-0.074]
79 I have applied error correction suggestions given by a grammar checker to the MT output, showing that it can correct certain types of errors, such as NP agreement and word order, with high precision but, unfortunately, low recall (Stymne and Ahrenberg, 2010). [sent-168, score-0.257]
80 Since the recall is low, the positive effect on metrics such as Bleu is small on general test sets, but there are improvements on test sets that only contain sentences affected by the postprocessing. [sent-169, score-0.052]
81 An error analysis showed that 68–74% of the corrections made were useful, and only around 10% of the changes made were harmful. [sent-170, score-0.111]
82 The initial error analysis I have performed has helped to identify common errors in SMT output, and shown that many of them are quite regular. [sent-172, score-0.214]
83 A strategy I intend to pursue is to further identify common and regular problems, and to either construct rules or to train a machine learning classifier to identify them, in order to be able to postprocess them. [sent-173, score-0.16]
84 It might also be possible to use the annotations from the error analysis as part of the training data for such a classifier. [sent-174, score-0.111]
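As an illustration of a hand-written rule for one such regular error, the sketch below targets the article-gender error from the Swedish example in Table 1 (den instead of det before a neuter noun). The neuter lexicon is a tiny hypothetical stand-in for a real morphological lexicon; the rule only fires when the noun directly follows the article, so an intervening adjective is not handled, and it does not add a missing definite suffix.

```python
import re

# Hypothetical toy lexicon of Swedish neuter nouns (indefinite and definite forms).
# A real system would consult a full morphological lexicon instead.
NEUTER_NOUNS = {"synsätt", "synsättet", "hus", "huset", "förslag", "förslaget"}

# 'den' or 'Den' as a whole word, followed by a space and the next word.
ARTICLE_NOUN = re.compile(r"\b([Dd]en) (\w+)")


def fix_article_gender(sentence: str) -> str:
    """Rewrite common-gender 'den' to neuter 'det' before a known neuter noun."""
    def replace(match):
        article, noun = match.group(1), match.group(2)
        if noun.lower() in NEUTER_NOUNS:
            fixed = "Det" if article[0].isupper() else "det"
            return f"{fixed} {noun}"
        return match.group(0)  # leave everything else untouched

    return ARTICLE_NOUN.sub(replace, sentence)


print(fix_article_gender("den synsättet är nytt"))  # det synsättet är nytt
```

Rules like this trade recall for precision, in line with the grammar-checker results above; a classifier trained on error annotations could replace the lexicon lookup as the trigger.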
85 5 Discussion The main focus of my thesis will be on designing and evaluating methods for pre- and postprocessing of statistical MT, where I will contribute methods that can improve translation within the four areas discussed in section 4. [sent-175, score-0.713]
86 The effort is focused on translation into Germanic languages, including German, on which there has been much previous research, and Swedish and other Scandinavian languages, where there has been little previous research. [sent-176, score-0.262]
87 I strongly believe that it is important for MT researchers to perform qualitative evaluations, both for identifying problems with MT systems, and for evaluating and comparing systems. [sent-180, score-0.083]
88 In my experience, a change to the system that targets one aspect, such as compounding, often leads to many other changes as well, in the case of compounding for instance through improved alignments, and I think we lack a proper understanding of these effects. [sent-181, score-0.128]
89 My planned thesis contributions are to design a detailed error typology, guidelines, and a tool, targeted at MT researchers, for performing error annotation, and to improve statistical machine translation in four problem areas, using several methods of pre- and postprocessing. [sent-182, score-0.688]
90 Linguistic-based evaluation criteria to identify statistical machine translation errors. [sent-213, score-0.282]
91 Chinese syntactic reordering for adequate generation of Korean verbal phrases in Chinese-to-Korean SMT. [sent-237, score-0.134]
92 Using a grammar checker for evaluation and postprocessing of statistical machine translation. [sent-258, score-0.389]
93 Processing of Swedish compounds for phrase-based statistical machine translation. [sent-262, score-0.29]
94 German compounds in factored statistical machine translation. [sent-266, score-0.305]
95 A comparison of merging strategies for translation of German compounds. [sent-271, score-0.184]
96 Definite noun phrases in statistical machine translation into Danish. [sent-275, score-0.381]
97 Blast: A tool for error analysis of machine translation output. [sent-279, score-0.342]
98 Definite noun phrases in statistical machine translation into Scandinavian languages. [sent-283, score-0.381]
99 Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. [sent-296, score-0.282]
100 Improving a statistical MT system with automatically learned rewrite patterns. [sent-304, score-0.051]
wordName wordTfidf (topN-words)
[('stymne', 0.325), ('definite', 0.287), ('postprocessing', 0.224), ('smt', 0.221), ('swedish', 0.211), ('compounds', 0.192), ('scandinavian', 0.185), ('translation', 0.184), ('germanic', 0.178), ('mt', 0.171), ('sara', 0.168), ('compound', 0.158), ('pbsmt', 0.136), ('ihave', 0.134), ('compounding', 0.128), ('german', 0.125), ('typology', 0.117), ('error', 0.111), ('superfluous', 0.092), ('merging', 0.091), ('koehn', 0.09), ('reordering', 0.09), ('eamt', 0.089), ('iam', 0.089), ('danish', 0.085), ('planned', 0.081), ('popovi', 0.075), ('thesis', 0.074), ('languages', 0.07), ('areas', 0.069), ('checker', 0.067), ('factored', 0.066), ('philipp', 0.065), ('guzm', 0.062), ('idescribe', 0.062), ('virpioja', 0.062), ('noun', 0.055), ('problems', 0.055), ('farr', 0.054), ('iplan', 0.054), ('mainly', 0.054), ('nie', 0.053), ('successful', 0.053), ('metrics', 0.052), ('statistical', 0.051), ('correction', 0.051), ('vilar', 0.05), ('reo', 0.05), ('wmt', 0.049), ('ea', 0.049), ('bleu', 0.048), ('focus', 0.048), ('ping', 0.047), ('machine', 0.047), ('suffixes', 0.047), ('jos', 0.045), ('definiteness', 0.045), ('unpublished', 0.045), ('structurally', 0.045), ('preprocessing', 0.044), ('athens', 0.044), ('phrases', 0.044), ('lattice', 0.043), ('verlag', 0.043), ('effort', 0.042), ('missing', 0.042), ('split', 0.042), ('strategy', 0.041), ('copenhagen', 0.041), ('hermann', 0.04), ('birch', 0.038), ('regular', 0.037), ('output', 0.037), ('alexandra', 0.037), ('xia', 0.036), ('differences', 0.036), ('initial', 0.036), ('little', 0.036), ('errors', 0.035), ('order', 0.035), ('zhang', 0.035), ('designing', 0.034), ('overcome', 0.034), ('parts', 0.033), ('verb', 0.033), ('wrong', 0.033), ('drawback', 0.032), ('helped', 0.032), ('dyer', 0.032), ('transformations', 0.032), ('preprocessed', 0.031), ('knight', 0.03), ('proposal', 0.03), ('suggested', 0.03), ('four', 0.029), ('nps', 0.029), ('suggestions', 0.028), ('qualitative', 0.028), ('morphologically', 0.028), ('session', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work on pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition, I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
2 0.24817841 193 acl-2011-Language-independent compound splitting with morphological operations
Author: Klaus Macherey ; Andrew Dai ; David Talbot ; Ashok Popat ; Franz Och
Abstract: Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
3 0.24053562 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
Author: Sara Stymne
Abstract: We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., to identify and classify MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process, by providing an easy-to-use graphical user interface. It is designed to be flexible, and can be used with any MT system, language pair, and error typology. The annotation task can be aided by highlighting similarities with a reference translation.
4 0.16865161 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
5 0.15396075 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.
6 0.15099625 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
7 0.14853394 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
9 0.14492558 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction
10 0.13445675 266 acl-2011-Reordering with Source Language Collocations
11 0.12948215 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
12 0.12746441 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?
13 0.124141 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
14 0.12110689 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
15 0.11921179 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
16 0.11250126 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
17 0.11096458 263 acl-2011-Reordering Constraint Based on Document-Level Context
18 0.11003347 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
19 0.10751865 264 acl-2011-Reordering Metrics for MT
20 0.10486341 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment
topicId topicWeight
[(0, 0.237), (1, -0.187), (2, 0.12), (3, 0.1), (4, 0.028), (5, 0.044), (6, 0.081), (7, -0.059), (8, 0.078), (9, 0.046), (10, -0.057), (11, -0.112), (12, -0.055), (13, -0.152), (14, 0.012), (15, 0.026), (16, -0.005), (17, -0.053), (18, -0.044), (19, 0.005), (20, -0.018), (21, 0.09), (22, 0.023), (23, -0.118), (24, -0.023), (25, 0.018), (26, 0.01), (27, -0.033), (28, -0.086), (29, 0.032), (30, 0.022), (31, 0.041), (32, 0.089), (33, -0.076), (34, 0.034), (35, 0.024), (36, -0.027), (37, 0.03), (38, -0.152), (39, 0.102), (40, -0.044), (41, -0.146), (42, -0.036), (43, -0.033), (44, 0.02), (45, 0.104), (46, 0.017), (47, -0.111), (48, 0.02), (49, -0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.93204027 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work on pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition, I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
2 0.8163684 193 acl-2011-Language-independent compound splitting with morphological operations
Author: Klaus Macherey ; Andrew Dai ; David Talbot ; Ashok Popat ; Franz Och
Abstract: Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
3 0.76530105 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
Author: Sara Stymne
Abstract: We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., to identify and classify MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process, by providing an easy-to-use graphical user interface. It is designed to be flexible, and can be used with any MT system, language pair, and error typology. The annotation task can be aided by highlighting similarities with a reference translation.
4 0.74292004 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction
Author: Ann Clifton ; Anoop Sarkar
Abstract: This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English to Finnish translation system). Our methods use unsupervised morphology induction. Unlike previous work we focus on morphologically productive phrase pairs – our decoder can combine morphemes across phrase boundaries. Morphemes in the target language may not have a corresponding morpheme or word in the source language. Therefore, we propose a novel combination of post-processing morphology prediction with morpheme-based translation. We show, using both automatic evaluation scores and linguistically motivated analyses of the output, that our methods outperform previously proposed ones and provide the best known results on the English-Finnish Europarl translation task. Our methods are mostly language independent, so they should improve translation into other target languages with complex morphology. 1 Translation and Morphology Languages with rich morphological systems present significant hurdles for statistical machine translation (SMT), most notably data sparsity, source-target asymmetry, and problems with automatic evaluation. In this work, we propose to address the problem of morphological complexity in an English-to-Finnish MT task within a phrase-based translation framework. We focus on unsupervised segmentation methods to derive the morphological information supplied to the MT model in order to provide coverage on very large datasets and for languages with few hand-annotated resources. In fact, in our experiments, unsupervised morphology always outperforms the use of a hand-built morphological analyzer. Rather than focusing on a few linguistically motivated aspects of Finnish morphological behaviour, we develop techniques for handling morphological complexity in general.
We chose Finnish as our target language for this work, because it exemplifies many of the problems morphologically complex languages present for SMT. Among all the languages in the Europarl data-set, Finnish is the most difficult language to translate from and into, as was demonstrated in the MT Summit shared task (Koehn, 2005). Another reason is the current lack of knowledge about how to apply SMT successfully to agglutinative languages like Turkish or Finnish. Our main contributions are: 1) the introduction of the notion of segmented translation where we explicitly allow phrase pairs that can end with a dangling morpheme, which can connect with other morphemes as part of the translation process, and 2) the use of a fully segmented translation model in combination with a post-processing morpheme prediction system, using unsupervised morphology induction. Both of these approaches beat the state of the art on the English-Finnish translation task. Morphology can express both content and function categories, and our experiments show that it is important to use morphology both within the translation model (for morphology with content) and outside it (for morphology contributing to fluency). Automatic evaluation measures for MT, BLEU (Papineni et al., 2002), WER (Word Error Rate) and PER (Position Independent Word Error Rate) use the word as the basic unit rather than morphemes. In a word comprised of multiple morphemes, getting even a single morpheme wrong means the entire word is wrong. In addition to standard MT evaluation measures, we perform a detailed linguistic analysis of the output. Our proposed approaches are significantly better than the state of the art, achieving the highest reported BLEU scores on the English-Finnish Europarl version 3 data-set.
Our linguistic analysis shows that our models have fewer morpho-syntactic errors compared to the word-based baseline.
2 Models
2.1 Baseline Models
We set up three baseline models for comparison in this work. The first is a basic word-based model (called Baseline in the results); we trained this on the original unsegmented version of the text. Our second baseline is a factored translation model (Koehn and Hoang, 2007) (called Factored), which used as factors the word, "stem"¹ and suffix. These are derived from the same unsupervised segmentation model used in other experiments. The results (Table 3) show that a factored model was unable to match the scores of a simple word-based baseline. We hypothesize that this may be an inherently difficult representational form for a language with the degree of morphological complexity found in Finnish. Because the morphology generation must be precomputed, for languages with a high degree of morphological complexity, the combinatorial explosion makes it unmanageable to capture the full range of morphological productivity. In addition, because the morphological variants are generated on a per-word basis within a given phrase, it excludes productive morphological combination across phrase boundaries and makes it impossible for the model to take into account any long-distance dependencies between morphemes. We conclude from this result that it may be more useful for an agglutinative language to use morphology beyond the confines of the phrasal unit, and to condition its generation on more than just the local target stem. In order to compare the performance of unsupervised segmentation for translation, our third baseline is a segmented translation model based on a supervised segmentation model (called Sup), using the hand-built Omorfi morphological analyzer (Pirinen and Listenmaa, 2007), which provided slightly higher BLEU scores than the word-based baseline.
Footnote 1: See Section 2.2.
2.2 Segmented Translation
For segmented translation models, it cannot be taken for granted that greater linguistic accuracy in segmentation yields improved translation (Chang et al., 2008). Rather, the goal in segmentation for translation is to maximize the amount of lexical content-carrying morphology, while generalizing over the information not helpful for improving the translation model. We therefore trained several different segmentation models, considering factors of granularity, coverage, and source-target symmetry. We performed unsupervised segmentation of the target data, using Morfessor (Creutz and Lagus, 2005) and Paramor (Monson, 2008), two top systems from the Morpho Challenge 2008 (their combined output was the Morpho Challenge winner). However, translation models based upon either Paramor alone or the combined systems' output could not match the word-based baseline, so we concentrated on Morfessor. Morfessor uses minimum description length criteria to train an HMM-based segmentation model. When tested against a human-annotated gold standard of linguistic morpheme segmentations for Finnish, this algorithm outperforms competing unsupervised methods, achieving an F-score of 67.0% on a 3 million sentence corpus (Creutz and Lagus, 2006). Varying the perplexity threshold in Morfessor does not segment more word types, but rather over-segments the same word types. In order to get robust, common segmentations, we trained the segmenter on the 5,000 most frequent words²; we then used this to segment the entire data set. In order to improve coverage, we then further segmented any word type that contained a match from the most frequent suffix set, looking for the longest matching suffix character string. We call this method Unsup L-match.
After the segmentation, word-internal morpheme boundary markers were inserted into the segmented text to be used to reconstruct the surface forms in the MT output. We then trained the Moses phrase-based system (Koehn et al., 2007) on the segmented and marked text. After decoding, it was a simple matter to join together all adjacent morphemes with word-internal boundary markers to reconstruct the surface forms. Figure 1(a) gives the full model overview for all the variants of the segmented translation model (supervised/unsupervised; with and without the Unsup L-match procedure). Table 1 shows how morphemes are being used in the MT system. Of the phrases that included segmentations ('Morph' in Table 1), roughly a third were 'productive', i.e. had a hanging morpheme (with a form such as stem+) that could be joined to a suffix ('Hanging Morph' in Table 1). However, among the phrases used while decoding the development and test data, roughly a quarter of the phrases that generated the translated output included segmentations, but of these, only a small fraction (6%) had a hanging morpheme. While there are many possible reasons for this, we were unable to find a single convincing cause.
Table 1: Morphs in the phrase table and in translation (rows: Morph, Hanging Morph).
Footnote 2: For the factored model baseline we also used the same setting (perplexity = 30, 5,000 most frequent words), but with all but the last suffix collapsed and called the "stem".
2.3 Morphology Generation
Morphology generation as a post-processing step allows major vocabulary reduction in the translation model, and allows the use of morphologically targeted features for modeling inflection. A possible disadvantage of this approach is that in this model there is no opportunity to consider the morphology in translation, since it is removed prior to training the translation model.
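The Unsup L-match fallback and the re-stitching of decoder output described in Section 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The `segmenter` dictionary stands in for the Morfessor model trained on the 5,000 most frequent words. The suffix set and the `min_stem` length guard are hypothetical. We also assume that L-match is applied only to word types the segmenter left whole.

```python
def l_match(word, suffixes, min_stem=2):
    """Unsup L-match sketch: split off the longest suffix in `suffixes` that
    ends `word`, marking the word-internal boundary with '+'.
    `min_stem` is a hypothetical guard against splitting very short words."""
    best = ""
    for suffix in suffixes:
        if (word.endswith(suffix) and len(suffix) > len(best)
                and len(word) - len(suffix) >= min_stem):
            best = suffix
    if not best:
        return [word]
    return [word[:-len(best)] + "+", "+" + best]


def segment(words, segmenter, suffixes):
    """Use the learned segmentation when it splits a word; otherwise fall back to L-match."""
    tokens = []
    for word in words:
        morphs = segmenter.get(word, [word])
        tokens.extend(morphs if len(morphs) > 1 else l_match(word, suffixes))
    return tokens


def stitch(tokens):
    """Re-stitch decoder output: merge 'x+' followed by '+y' into one word."""
    words = []
    for tok in tokens:
        if words and words[-1].endswith("+") and tok.startswith("+"):
            words[-1] = words[-1][:-1] + tok[1:]
        else:
            words.append(tok)
    return words


# Hypothetical learned segmentations and frequent-suffix set.
segmenter = {"mietintöä": ["mietintö+", "+ä"]}
suffixes = {"a", "ssa", "lla"}
tokens = segment(["mietintöä", "talossa", "ja"], segmenter, suffixes)
print(tokens)          # ['mietintö+', '+ä', 'talo+', '+ssa', 'ja']
print(stitch(tokens))  # ['mietintöä', 'talossa', 'ja']
```

Stitching is a purely string-level operation. Whenever a token ending in `+` is followed by a token starting with `+`, the two are merged. This is also what lets a hanging morpheme produced by one phrase join a suffix produced by the next.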
Morphology generation models can use a variety of bilingual and contextual information to capture dependencies between morphemes, often more long-distance than what is possible using n-gram language models over morphemes in the segmented model. Similar to previous work (Minkov et al., 2007; Toutanova et al., 2008), we model morphology generation as a sequence learning problem. Unlike previous work, we use unsupervised morphology induction and use automatically generated suffix classes as tags. The first phase of our morphology prediction model is to train an MT system that produces morphologically simplified word forms in the target language. The output word forms are complex stems (a stem and some suffixes) but still miss some important suffix morphemes. In the second phase, the output of the MT decoder is tagged with a sequence of abstract suffix tags. In particular, the MT decoder output, a sequence of complex stems, is denoted by x, and the tagger output, a sequence of suffix class tags, is denoted by y. We map a list of parts of (x, y) to a d-dimensional feature vector Φ(x, y), each dimension of which is a real number. We infer the best sequence of tags using:
$F(x) = \arg\max_{y} p(y \mid x, w)$
where F(x) returns the highest scoring output y∗. A conditional random field (CRF) (Lafferty et al., 2001) defines the conditional probability as a linear score for each candidate y and a global normalization term:
$\log p(y \mid x, w) = \Phi(x, y) \cdot w - \log Z$, where $Z = \sum_{y' \in \mathrm{GEN}(x)} \exp(\Phi(x, y') \cdot w)$
and GEN(x) is the set of candidate tag sequences for x. We use stochastic gradient descent (using crfsgd³) to train the weight vector w. So far, this is all off-the-shelf sequence learning. However, the output y∗ from the CRF decoder is still only a sequence of abstract suffix tags. The third and final phase in our morphology prediction model is to take the abstract suffix tag sequence y∗, map it into fully inflected word forms, and rank those outputs using a morphemic language model. The abstract suffix tags are extracted from the unsupervised morpheme learning process, and are carefully designed to enable CRF training and decoding. We call this model CRF-LM for short. Figure 1(b) shows the full pipeline and Figure 2 shows a worked example of all the steps involved.
We use the morphologically segmented training data (obtained using the segmented corpus described in Section 2.2⁴) and remove selected suffixes to create a morphologically simplified version of the training data. The MT model is trained on the morphologically simplified training data. The output from the MT system is then used as input to the CRF model. The CRF model was trained on ∼210,000 Finnish sentences, consisting of ∼1.5 million tokens; the 2,000-sentence Europarl test set consisted of 41,434 stem tokens.
Figure 1: Training and testing pipelines for the SMT models. (a) Segmented Translation Model: English and Finnish training data (words) → morphological pre-processing (stem+ +morph) → MT system (alignment of English words to Finnish stem+ +morph sequences) → post-process: morph re-stitching → fully inflected surface form → evaluation against original reference. (b) Post-Processing Model Translation & Generation: morphological pre-processing 1 (stem+ +morph1+ +morph2) and 2 (stem+ +morph1+) → MT system → post-process 1: morph re-stitching (complex stem: stem+morph1+) → post-process 2: CRF morphology generation and language model mapping (stem+morph1+ +morph2) → fully inflected surface form → evaluation against original reference.
Footnote 3: http://leon.bottou.org/projects/sgd
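The CRF scoring defined above can be sketched in a few lines. This is a toy illustration, not the crfsgd model, and several pieces are hypothetical: the feature map (current stem plus previous tag), the tag set with a `<none>` tag, and the weights. Training is omitted, and GEN(x) is enumerated by brute force, which is exponential in sentence length. Real CRF toolkits compute Z with the forward algorithm and decode with Viterbi.

```python
import itertools
import math


def features(x, y):
    """Hypothetical feature map Phi(x, y): counts of (stem, tag) and (previous tag, tag) pairs."""
    phi = {}
    for t, tag in enumerate(y):
        prev = y[t - 1] if t > 0 else "<s>"
        for key in (("stem", x[t], tag), ("trans", prev, tag)):
            phi[key] = phi.get(key, 0.0) + 1.0
    return phi


def score(x, y, w):
    """Linear score Phi(x, y) . w with a sparse weight dict."""
    return sum(v * w.get(k, 0.0) for k, v in features(x, y).items())


def log_probs(x, tags, w):
    """log p(y | x, w) = Phi(x, y) . w - log Z for every y in GEN(x) (brute force)."""
    candidates = list(itertools.product(tags, repeat=len(x)))
    scores = [score(x, y, w) for y in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # stable log-sum-exp
    return {y: s - log_z for y, s in zip(candidates, scores)}


def decode(x, tags, w):
    """F(x) = argmax_y p(y | x, w)."""
    lp = log_probs(x, tags, w)
    return max(lp, key=lp.get)


x = ["koskeva+", "mietintö+", "käsitellää+"]  # complex stems from Figure 2
tags = ["+A", "+n", "<none>"]
w = {("stem", "koskeva+", "+A"): 2.0, ("stem", "mietintö+", "+A"): 2.0,
     ("stem", "käsitellää+", "+n"): 2.0}
print(decode(x, tags, w))  # ('+A', '+A', '+n')
```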
The labels in the output sequence y were obtained by selecting the most productive 150 stems and then collapsing certain vowels into equivalence classes corresponding to Finnish vowel harmony patterns. Thus the variants -kö and -ko become the vowel-generic enclitic particle -kO, and the variants -ssä and -ssa become the vowel-generic inessive case marker -ssA, etc. This is the only language-specific component of our translation model. However, we expect this approach to work for other agglutinative languages as well. For fusional languages like Spanish, another mapping from suffixes to abstract tags might be needed. These suffix transformations to their equivalence classes prevent morphophonemic variants of the same morpheme from competing against each other in the prediction model. This resulted in 44 possible label outputs per stem, which was a reasonably sized tag set for CRF training. The CRF was trained on monolingual features of the segmented text for suffix prediction, where t is the current token:
Word stem: $s_{t-n}, \ldots, s_t, \ldots, s_{t+n}$ ($n = 4$)
Morph prediction: $y_{t-2}, y_{t-1}, y_t$
With this simple feature set, we were able to use features over longer distances, resulting in a total of 1,110,075 model features. After CRF-based recovery of the suffix tag sequence, we use a bigram language model trained on a full segmented version of the training data to recover the original vowels. We used bigrams only, because the suffix vowel harmony alternation depends only upon the preceding phonemes in the word from which it was segmented.
Footnote 4: Note that unlike Section 2.2 we do not use Unsup L-match, because when evaluating the CRF model on the suffix prediction task it obtained 95.61% without using Unsup L-match and 82.99% when using Unsup L-match.
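The vowel-harmony collapsing and the bigram-based vowel recovery can be sketched as follows. The mapping table is an assumption: the text only names -kO and -ssA, so the u/y class is our addition. `restore_vowels` is a hypothetical helper that picks the spelling with the highest raw bigram count after the preceding morpheme and breaks ties by unigram count. It is a crude stand-in for the smoothed bigram language model described above. The training sentence is the segmented example from Figure 2.

```python
import itertools
from collections import Counter

# Vowel-harmony classes. The text names -kO (-ko/-kö) and -ssA (-ssa/-ssä);
# the U = {u, y} class is an assumption added for completeness.
HARMONY = {"A": "aä", "O": "oö", "U": "uy"}
TO_ABSTRACT = str.maketrans({v: cls for cls, vs in HARMONY.items() for v in vs})


def abstract(suffix):
    """Map a suffix to its vowel-generic form, e.g. '+ssä' -> '+ssA'."""
    return suffix.translate(TO_ABSTRACT)


def concrete_variants(tag):
    """Expand an abstract tag into every surface spelling, e.g. '+A' -> ['+a', '+ä']."""
    options = [HARMONY.get(ch, ch) for ch in tag]
    return ["".join(chars) for chars in itertools.product(*options)]


def train_bigrams(segmented_sentences):
    """Count morpheme bigrams and unigrams in segmented training text."""
    bigrams, unigrams = Counter(), Counter()
    for sentence in segmented_sentences:
        toks = sentence.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return bigrams, unigrams


def restore_vowels(prev_morph, tag, bigrams, unigrams):
    """Hypothetical helper: choose the spelling seen most often after prev_morph,
    breaking ties by overall frequency."""
    return max(concrete_variants(tag),
               key=lambda v: (bigrams[(prev_morph, v)], unigrams[v]))


bigrams, unigrams = train_bigrams(["koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n"])
print(abstract("+ssä"), abstract("+kö"))                     # +ssA +kO
print(restore_vowels("+va+", "+A", bigrams, unigrams))       # +a
print(restore_vowels("mietintö+", "+A", bigrams, unigrams))  # +ä
```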
(a) Training
original training data: koskevaa mietintöä käsitellään
segmentation: koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n
(train bigram language model with mapping A = { a , ä })
map final suffix to abstract tag-set: koske+ +va+ +A mietintö+ +A käsi+ +te+ +llä+ +ä+ +n
(train CRF model to predict the final suffix)
peeling of final suffix: koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+
(train SMT model on this transformation of training data)
(b) Decoding
decoder output: koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+
decoder output stitched up: koskeva+ mietintö+ käsitellää+
CRF model prediction: x = ‘koskeva+ mietintö+ käsitellää+’, y = ‘+A +A +n’ → koskeva+ +A mietintö+ +A käsitellää+ +n
unstitch morphemes: koske+ +va+ +A mietintö+ +A käsi+ +te+ +llä+ +ä+ +n
language model disambiguation: koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n
final stitching: koskevaa mietintöä käsitellään
(the output is then compared to the reference translation)
Figure 2: Worked example of all steps in the post-processing morphology prediction model.
3 Experimental Results
For all of the models built in this paper, we used the Europarl version 3 corpus (Koehn, 2005) English-Finnish training data set, as well as the standard development and test data sets. Our parallel training data consists of ∼1 million sentences of 40 words or less, while the development and test sets were each 2,000 sentences long. In all the experiments conducted in this paper, we used the Moses⁵ phrase-based translation system (Koehn et al., 2007), 2008 version. We trained all of the Moses systems herein using the standard features: language model, reordering model, translation model, and word penalty; in addition to these, the factored experiments called for additional translation and generation features for the added factors as noted above.
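The training-side transformation in Figure 2(a) maps each final suffix to its abstract tag and peels it off. It can be sketched as below. Two details are assumptions made for this illustration. First, the word-grouping rule starts a new word whenever two adjacent tokens are not joined by `+` markers. Second, words with no final suffix receive a `<none>` label. The u/y harmony class is likewise assumed beyond the -kO and -ssA examples in the text.

```python
# Collapse vowel-harmony variants; the u/y pair is an assumption beyond the
# -kO and -ssA examples given in the text.
TO_ABSTRACT = str.maketrans({"a": "A", "ä": "A", "o": "O", "ö": "O", "u": "U", "y": "U"})
NO_SUFFIX = "<none>"  # hypothetical label for words with no final suffix


def peel(word_morphs):
    """Split one segmented word into (morphemes kept for SMT, CRF label)."""
    last = word_morphs[-1]
    if len(word_morphs) > 1 and last.startswith("+") and not last.endswith("+"):
        return word_morphs[:-1], last.translate(TO_ABSTRACT)
    return word_morphs, NO_SUFFIX


def group_words(tokens):
    """Group marked morphemes into words: 'x+' followed by '+y' stays together."""
    words = []
    for tok in tokens:
        if words and words[-1][-1].endswith("+") and tok.startswith("+"):
            words[-1].append(tok)
        else:
            words.append([tok])
    return words


def make_training_pair(segmented_sentence):
    """Return (SMT-side training text, CRF label sequence) for one sentence."""
    smt_tokens, labels = [], []
    for morphs in group_words(segmented_sentence.split()):
        kept, label = peel(morphs)
        smt_tokens.extend(kept)
        labels.append(label)
    return " ".join(smt_tokens), labels


smt_side, labels = make_training_pair("koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n")
print(smt_side)  # koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+
print(labels)    # ['+A', '+A', '+n']
```

The printed SMT-side text and labels reproduce the "peeling of final suffix" and abstract tag lines of the worked example.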
We used the following settings in all experiments: a hypothesis stack size of 100, distortion limit 6, phrase translations limit 20, and maximum phrase length 20. For the language models, we used SRILM 5-gram language models (Stolcke, 2002) for all factors. For our word-based Baseline system, we trained a word-based model using the same Moses system with identical settings. To evaluate against the segmented translation systems in segmented form, before word reconstruction, we also segmented the baseline system's word-based output. All the BLEU scores reported are for lowercase evaluation. We did an initial evaluation of the segmented output translation for each system using the notion of m-BLEU score (Luong et al., 2010), where the BLEU score is computed by comparing the segmented output with a segmented reference translation. Table 2 shows the m-BLEU scores for various systems. We also show the m-BLEU score without unigrams, since over-segmentation could lead to artificially high m-BLEU scores. For the Unsup L-match system, the m-BLEU score shows a relative improvement of 39.75% over the baseline. Luong et al. (2010) report an m-BLEU score of 55.64% but obtain a relative improvement of only 0.6% over their baseline m-BLEU score. We find that when using a good segmentation model, segmentation of the morphologically complex target language improves model performance over an unsegmented baseline (the confidence scores come from bootstrap resampling).
Table 2: Segmented model scores (m-BLEU and No Uni). Sup refers to the supervised segmentation baseline model. m-BLEU indicates that the segmented output was evaluated against a segmented version of the reference (this measure does not have the same correlation with human judgement as BLEU). No Uni indicates the segmented BLEU score without unigrams.
Footnote 5: http://www.statmt.org/moses/
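The sketch below covers the two measurements used here. The first is corpus BLEU computed over morpheme tokens (m-BLEU), where `min_n=2` gives the No Uni variant. The second is a paired percentile bootstrap over test sentences, one common form of bootstrap resampling. It is a simplified single-reference reimplementation for illustration, not the scorer used in the experiments, and the exact procedure behind the reported confidence margins is not specified here. The toy sentences reuse Figure 2, and the one-suffix error and the second toy sentence are invented.

```python
import math
import random
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, max_n=4, min_n=1):
    """Corpus BLEU, one reference per sentence. Word tokens give BLEU,
    morpheme tokens give m-BLEU; min_n=2 drops unigrams ("No Uni")."""
    orders = range(min_n, max_n + 1)
    matched, total = Counter(), Counter()
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in orders:
            h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
            matched[n] += sum(min(c, r[g]) for g, c in h.items())  # clipped matches
            total[n] += sum(h.values())
    if hyp_len == 0 or any(matched[n] == 0 for n in orders):
        return 0.0
    log_p = sum(math.log(matched[n] / total[n]) for n in orders) / len(orders)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)  # brevity penalty
    return bp * math.exp(log_p)


def paired_bootstrap(hyps_a, hyps_b, refs, samples=1000, seed=0, **bleu_args):
    """Resample sentences with replacement; return a 95% percentile interval
    for BLEU(b) - BLEU(a) and the fraction of samples in which b wins."""
    rng = random.Random(seed)
    n = len(refs)
    diffs = []
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_refs = [refs[i] for i in idx]
        a = corpus_bleu([hyps_a[i] for i in idx], sample_refs, **bleu_args)
        b = corpus_bleu([hyps_b[i] for i in idx], sample_refs, **bleu_args)
        diffs.append(b - a)
    diffs.sort()
    low, high = diffs[int(0.025 * samples)], diffs[int(0.975 * samples) - 1]
    return low, high, sum(d > 0 for d in diffs) / samples


ref_words = "koskevaa mietintöä käsitellään".split()
hyp_words = "koskevan mietintöä käsitellään".split()  # invented: one wrong suffix
ref_morphs = "koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n".split()
hyp_morphs = "koske+ +va+ +n mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n".split()
print(corpus_bleu([hyp_words], [ref_words], max_n=2))    # ≈ 0.577
print(corpus_bleu([hyp_morphs], [ref_morphs], max_n=2))  # ≈ 0.837

refs = [ref_morphs, ["mietintö+", "+ä"]]
system_a = [hyp_morphs, ["mietintö+", "+a"]]
low, high, win_rate = paired_bootstrap(system_a, refs, refs, samples=200, max_n=2)
```

With one wrong suffix, word-level BLEU loses the whole word while m-BLEU loses only one morpheme. This is why m-BLEU scores run higher, and why a variant without unigrams is reported alongside it.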
Table 3 shows the evaluation scores for all the baselines and the methods introduced in this paper using standard word-based lowercase BLEU, WER and PER. We do better than Luong et al. (2010), the previous best score for this task. We also show a better relative improvement over our baseline than Luong et al. (2010) do over theirs: a relative improvement of 4.86% for Unsup L-match compared to our word-based baseline model, versus their 1.65% improvement over their word-based baseline model. Our best performing method used unsupervised morphology with L-match (see Section 2.2), and the improvement is significant: bootstrap resampling provides a confidence margin of ±0.77, and a t-test (Collins et al., 2005) showed significance with p = 0.001.
Table 3: Evaluation scores for the Baseline, Factored, (Luong et al., 2010), Sup, Unsup L-match and CRF-LM models, using lowercase BLEU, WER and TER. The ∗ indicates a statistically significant improvement of BLEU score over the Baseline model. The boldface scores are the best performing scores per evaluation measure.
3.1 Morphological Fluency Analysis
To see how well the models were doing at getting morphology right, we examined several patterns of morphological behavior. While we wish to explore minimally supervised morphological MT models, and use as little language-specific information as possible, we do want to use linguistic analysis on the output of our system to see how well the models capture essential morphological information in the target language. So, we ran the outputs of the word-based baseline system, the segmented model (Unsup L-match) and the prediction model (CRF-LM), along with the reference translation, through the supervised morphological analyzer Omorfi (Pirinen and Listenmaa, 2007). Using this analysis, we looked at a variety of linguistic constructions that might reveal patterns in morphological behavior.
These were: (a) explicitly marked noun forms, (b) noun-adjective case agreement, (c) subject-verb person/number agreement, (d) transitive object case marking, (e) postpositions, and (f) possession. In each of these categories, we looked for construction matches on a per-sentence level between the models' output and the reference translation. Table 4 shows the models' performance on the constructions we examined. In all of the categories, the CRF-LM model achieves the best precision score, as we explain below, while the Unsup L-match model most frequently gets the highest recall score. A general pattern in the most prevalent of these constructions is that the baseline tends to prefer the least marked form for noun cases (corresponding to the nominative) more than the reference or the CRF-LM model. The baseline leaves nouns in the (unmarked) nominative far more often than the reference does, while the CRF-LM model comes much closer, so it seems to fare better at explicitly marking forms, rather than defaulting to the more frequent unmarked form. Finnish adjectives must be marked with the same case as their head noun, while verbs must agree in person and number with their subject. We saw that in both these categories, the CRF-LM model outperforms for precision, while the segmented model gets the best recall. In addition, Finnish generally marks direct objects of verbs with the accusative or the partitive case; we observed more accusative/partitive-marked nouns following verbs in the CRF-LM output than in the baseline, as illustrated by example (1) in Fig. 3. While neither translation picks the same verb as in the reference for the input ‘clarify,’ the CRF-LM output paraphrases it by using a grammatical construction of the transitive verb followed by a noun phrase inflected with the accusative case, correctly capturing the transitive construction. The baseline translation instead follows ‘give’ with a direct object in the nominative case.
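The per-sentence comparison behind Table 4 can be sketched as clipped matching of construction counts. The paper does not spell out the exact matching rule. The per-sentence `min` clipping below is therefore an assumption, and the counts in the usage example are hypothetical. In practice the counts would come from Omorfi analyses of the system output and the reference.

```python
def construction_prf(system_counts, reference_counts):
    """Precision, recall and F-score for one construction type.

    Each list holds, per sentence, how often the construction (for example a
    postposition preceded by a genitive noun) was found in the system output
    and in the reference; matches are clipped sentence by sentence."""
    matched = sum(min(s, r) for s, r in zip(system_counts, reference_counts))
    produced, expected = sum(system_counts), sum(reference_counts)
    p = matched / produced if produced else 0.0
    r = matched / expected if expected else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical counts for three sentences.
p, r, f = construction_prf([1, 0, 2], [1, 1, 0])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Clipping per sentence means that over-producing a construction in one sentence cannot compensate for missing it in another. That matches the per-sentence framing of the analysis.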
To help clarify the constructions in question, we have used Google Translate⁶ to provide back-translations of our MT output into English; to contextualize these back-translations, we have provided Google's back-translation of the reference.
Table 4: … of occurrences per sentence, also averaged over the various translations. P, R and F stand for precision, recall and F-score. The constructions are listed in descending order of their frequency in the texts. The highlighted value in each column is the most accurate with respect to the reference value.
Footnote 6: http://translate.google.com/
The use of postpositions shows another difference between the models. Finnish postpositions require the preceding noun to be in the genitive or sometimes partitive case, which occurs correctly more frequently in the CRF-LM output than in the baseline. In example (2) in Fig. 3, all three translations correspond to the English text ‘with the basque nationalists.’ However, the CRF-LM output is more grammatical than the baseline, because not only do the adjective and noun agree for case, but the noun ‘baskien’, to which the postposition ‘kanssa’ belongs, is marked with the correct genitive case. This well-formedness is not rewarded by BLEU, however, because ‘baskien’ does not match the reference. In addition, while Finnish may express possession using case marking alone, it has another construction for possession that can disambiguate an otherwise ambiguous clause. This alternate construction uses a pronoun in the genitive case followed by a possessive-marked noun; we see that the CRF-LM model correctly marks this construction more frequently than the baseline. As example (3) in Fig. 3 shows, while neither model correctly translates ‘matkan’ (‘trip’), the baseline's output attributes the inessive ‘yhteydessä’ (‘connection’) to ‘tulokset’ (‘results’), and misses marking the possession linking it to ‘Commissioner Fischler’.
Our manual evaluation shows that the CRF-LM model produces output translations that are more morphologically fluent than those of the word-based baseline and the segmented translation Unsup L-match system, even though its word choices lead to a lower overall BLEU score than Unsup L-match.
4 Related Work
The work on morphology in MT can be grouped into three categories: factored models, segmented translation, and morphology generation. Factored models (Koehn and Hoang, 2007) factor the phrase translation probabilities over additional information annotated to each word, allowing for text to be represented on multiple levels of analysis. We discussed the drawbacks of factored models for our task in Section 2.1. While Koehn and Hoang (2007), Yang and Kirchhoff (2006) and Avramidis and Koehn (2008) obtain improvements using factored models for translation into English, German, Spanish, and Czech, these models may be less useful for capturing long-distance dependencies in languages with much more complex morphological systems such as Finnish. In our experiments, factored models did worse than the baseline. Segmented translation performs morphological analysis on the morphologically complex text for use in the translation model (Brown et al., 1993; Goldwater and McClosky, 2005; de Gispert and Mariño, 2008). This method unpacks complex forms into simpler, more frequently occurring components, and may also increase the symmetry of the lexically realized content between source and target. In a somewhat orthogonal approach to ours, Ma et al. (2007) use alignment of a parallel text to pack together adjacent segments in the alignment output, which are then fed back to the word aligner to bootstrap an improved alignment, which is then used in the translation model. We compared our results against Luong et al. (2010) in Table 3, since their results are directly comparable to ours.
(1) Input: ‘the charter we are to approve today both strengthens and gives visible shape to the common fundamental rights and values our community is to be based upon.’
a. Reference: perusoikeuskirja , jonka tänään aiomme hyväksyä , sekä vahvistaa että selventää (selventää/VERB/ACT/INF/SG/LAT-clarify) niitä (ne/PRONOUN/PL/PAR-them) yhteisiä perusoikeuksia ja arvoja , joiden on oltava yhteisömme perusta.
Back-translation: ‘Charter of Fundamental Rights, which today we are going to accept that clarify and strengthen the common fundamental rights and values, which must be community based.’
b. Baseline: perusoikeuskirja me hyväksymme tänään molemmat vahvistaa ja antaa (antaa/VERB/INF/SG/LAT-give) näkyvä (näkyä/VERB/ACT/PCP/SG/NOM-visible) muokata yhteistä perusoikeuksia ja arvoja on perustuttava.
Back-translation: ‘Charter today, we accept both confirm and modify to make a visible and common values, fundamental rights must be based.’
c. CRF-LM: perusoikeuskirja on hyväksytty tänään , sekä vahvistaa ja antaa (antaa/VERB/ACT/INF/SG/LAT-give) konkreettisen (konkreettinen/ADJECTIVE/SG/GEN,ACC-concrete) muodon (muoto/NOUN/SG/GEN,ACC-shape) yhteisiä perusoikeuksia ja perusarvoja , yhteisön on perustuttava.
Back-translation: ‘Charter has been approved today, and to strengthen and give concrete shape to the common basic rights and fundamental values, the Community must be based.’
(2) Input: ‘with the basque nationalists’
a. Reference: baskimaan kansallismielisten kanssa
basque-SG/NOM+land-SG/GEN,ACC nationalists-PL/GEN with-POST
b. Baseline: baskimaan kansallismieliset kanssa
basque-SG/NOM+land-SG/GEN,ACC kansallismielinen-PL/NOM,ACC-nationalists POST-with
c. CRF-LM: kansallismielisten baskien kanssa
nationalists-PL/GEN basques-PL/GEN with-POST
(3) Input: ‘and in this respect we should value the latest measures from commissioner fischler , the results of his trip to morocco on the 26th of last month and the high level meetings that took place, including the one with the king himself’
a. Reference: ja tässä mielessä osaamme myös arvostaa komission jäsen fischlerin viimeisimpiä toimia , jotka ovat hänen (hänen/GEN-his) marokkoon 26 lokakuuta tekemänsä (tekemänsä/POSS-his) matkan (matkan/GEN-tour) ja korkean tason kokousten jopa itsensä kuninkaan kanssa tulosta
Back-translation: ‘and in this sense we can also appreciate the Commissioner Fischler's latest actions, which are his to Morocco 26 October trip to high-level meetings and even the king himself with the result’
b. Baseline: ja tässä yhteydessä olisi arvoa viimeisin toimia komission jäsen fischler , tulokset monitulkintaisia marokon yhteydessä (yhteydessä/INE-connection) , ja viime kuussa pidettiin korkean tason kokouksissa , mukaan luettuna kuninkaan kanssa
Back-translation: ‘and in this context would be the value of the last act, Commissioner Fischler, the results of the Moroccan context, ambiguous, and last month held high level meetings, including with the king’
c. CRF-LM: ja tässä yhteydessä meidän olisi lisäarvoa viimeistä toimenpiteitä kuin komission jäsen fischler , että hänen (hänen/GEN-his) kokemuksensa (kokemuksensa/POSS-experience) marokolle (marokolle-Moroccan) viime kuun 26 ja korkean tason tapaamiset järjestettiin, kuninkaan kanssa
Back-translation: ‘and in this context, we should value the last measures as the Commissioner Fischler, that his experience in Morocco has on the 26th and high-level meetings took place, including with the king.’
Figure 3: Morphological fluency analysis (see Section 3.1).
They use a segmented phrase table and language model along with the word-based versions in the decoder and in tuning a Finnish target. Their approach requires segmented phrases to match word boundaries, eliminating morphologically productive phrases. In their work, a segmented language model can score a translation, but cannot insert morphology that does not show source-side reflexes. In order to perform a similar experiment that still allowed for morphologically productive phrases, we tried training a segmented translation model, the output of which we stitched up in tuning so as to tune to a word-based reference. The goal of this experiment was to control the segmented model's tendency to overfit by rewarding it for using correct whole-word forms. However, we found that this approach was less successful than using the segmented reference in tuning, and could not meet the baseline (13.97% BLEU best tuning score, versus 14.93% BLEU for the baseline best tuning score). Previous work in segmented translation has often used linguistically motivated morphological analysis, selectively applied based on a language-specific heuristic. A typical approach is to select a highly inflecting class of words and segment them for particular morphology (de Gispert and Mariño, 2008; Ramanathan et al., 2009). Popović and Ney (2004) perform segmentation to reduce the morphological complexity of the source when translating into an isolating target, reducing the translation error rate for the English target. For Czech-to-English, Goldwater and McClosky (2005) lemmatized the source text and inserted a set of ‘pseudowords’ expected to have lexical reflexes in English. Minkov et al. (2007) and Toutanova et al. (2008) use a Maximum Entropy Markov Model for morphology generation.
The main drawback to this approach is that it removes morphological information from the translation model (which only uses stems); this can be a problem for languages in which morphology expresses lexical content. de Gispert (2008) uses a language-specific targeted morphological classifier for Spanish verbs to avoid this issue. Talbot and Osborne (2006) use clustering to group morphological variants of words for word alignments and for smoothing phrase translation tables. Habash (2007) provides various methods to incorporate morphological variants of words in the phrase table in order to help recognize out-of-vocabulary words in the source language.
5 Conclusion and Future Work
We found that using a segmented translation model based on unsupervised morphology induction, and a model that combined morpheme segments in the translation model with a post-processing morphology prediction model, gave us better BLEU scores than a word-based baseline. Using our proposed approach, we obtain better scores than the state of the art on the English-Finnish translation task (Luong et al., 2010), going from 14.82% BLEU to 15.09%, while using a simpler model. We show that using morphological segmentation in the translation model can improve output translation scores. We also demonstrate that for Finnish (and possibly other agglutinative languages), phrase-based MT benefits from allowing the translation model access to morphological segmentation, yielding productive morphological phrases. Taking advantage of linguistic analysis of the output, we show that using a post-processing morphology generation model can improve translation fluency on a sub-word level, in a manner that is not captured by the word-based BLEU evaluation measure.
In order to help with replication of the results in this paper, we have run the various morphological analysis steps and created the training, tuning and test data files needed to train, tune and test any phrase-based machine translation system with our data. The files can be downloaded from natlang.cs.sfu.ca. In future work we hope to explore the utility of phrases with productive morpheme boundaries and to explore why they are not used more pervasively in the decoder. Evaluation measures for morphologically complex languages, and tuning to those measures, are also important future work directions. We would also like to explore a non-pipelined approach to morphological pre- and post-processing, so that a globally trained model could be used to remove the target-side morphemes that would improve the translation model and then predict those morphemes in the target language.
Acknowledgements
This research was partially supported by NSERC, Canada (RGPIN: 264905) and a Google Faculty Award. We would like to thank Christian Monson, Franz Och, Fred Popowich, Howard Johnson, Majid Razmara, Baskaran Sankaran and the anonymous reviewers for their valuable comments on this work. We would particularly like to thank the developers of the open-source Moses machine translation toolkit and the Omorfi morphological analyzer for Finnish, which we used for our experiments.
References
Eleftherios Avramidis and Philipp Koehn. 2008. Enriching morphologically poor languages for statistical machine translation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 763–770, Columbus, Ohio, USA. Association for Computational Linguistics.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 224–232, Columbus, Ohio, June. Association for Computational Linguistics.
Michael Collins, Philipp Koehn, and Ivona Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL05). Association for Computational Linguistics.
Mathias Creutz and Krista Lagus. 2005. Inducing the morphological lexicon of a natural language from unannotated text. In Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR '05), pages 106–113, Espoo, Finland.
Mathias Creutz and Krista Lagus. 2006. Morfessor in the morpho challenge. In Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes.
Adrià de Gispert and José Mariño. 2008. On the impact of morphology in English to Spanish statistical MT. Speech Communication, 50(11-12).
Sharon Goldwater and David McClosky. 2005. Improving statistical MT through morphological analysis. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 676–683, Vancouver, B.C., Canada. Association for Computational Linguistics.
Nizar Habash. 2007. Four techniques for online handling of out-of-vocabulary words in Arabic-English statistical machine translation. In Proceedings of the 46th Annual Meeting of the Association of Computational Linguistics, Columbus, Ohio. Association for Computational Linguistics.
Philipp Koehn and Hieu Hoang. 2007. Factored translation models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 868–876, Prague, Czech Republic. Association for Computational Linguistics.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL '07: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177–180, Prague, Czech Republic. Association for Computational Linguistics.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X, pages 79–86, Phuket, Thailand. Association for Computational Linguistics.
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289, San Francisco, California, USA. Association for Computing Machinery.
Minh-Thang Luong, Preslav Nakov, and Min-Yen Kan. 2010. A hybrid morpheme-word representation for machine translation of morphologically rich languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 148–157, Cambridge, Massachusetts. Association for Computational Linguistics.
Yanjun Ma, Nicolas Stroppa, and Andy Way. 2007. Bootstrapping word alignment via word packing. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 304–311, Prague, Czech Republic. Association for Computational Linguistics.
Einat Minkov, Kristina Toutanova, and Hisami Suzuki. 2007. Generating complex morphology for machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL07), pages 128–135, Prague, Czech Republic. Association for Computational Linguistics.
Christian Monson. 2008. Paramor and morpho challenge 2008. In Lecture Notes in Computer Science: Workshop of the Cross-Language Evaluation Forum (CLEF 2008), Revised Selected Papers.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Tommi Pirinen and Inari Listenmaa. 2007. Omorfi morphological analyzer. http://gna.org/projects/omorfi.
Maja Popović and Hermann Ney. 2004. Towards the use of word stems and suffixes for statistical machine translation. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), pages 1585–1588, Lisbon, Portugal. European Language Resources Association (ELRA).
Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. 2009. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages 800–808, Suntec, Singapore. Association for Computational Linguistics.
Andreas Stolcke. 2002. SRILM – an extensible language modeling toolkit. 7th International Conference on Spoken Language Processing, 3:901–904.
David Talbot and Miles Osborne. 2006. Modelling lexical redundancy for machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 969–976, Sydney, Australia, July. Association for Computational Linguistics.
Kristina Toutanova, Hisami Suzuki, and Achim Ruopp. 2008. Applying morphology generation models to machine translation.
In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 514–522, Columbus, Ohio, USA. Association for Computational Linguistics. Mei Yang and Katrin Kirchhoff. 2006. Phrase-based backoff models for machine translation of highly inflected languages. In Proceedings of the European Chapter of the Association for Computational Linguistics, pages 41–48, Trento, Italy. Association for Computational Linguistics. 42
5 0.69628793 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
6 0.69616777 151 acl-2011-Hindi to Punjabi Machine Translation System
7 0.67350906 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
8 0.67107224 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach
9 0.66230732 264 acl-2011-Reordering Metrics for MT
10 0.64791584 266 acl-2011-Reordering with Source Language Collocations
11 0.6422255 263 acl-2011-Reordering Constraint Based on Document-Level Context
12 0.62769049 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
13 0.60080361 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
14 0.59076977 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
15 0.58677655 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
16 0.58548069 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
18 0.5787859 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
19 0.57545662 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
20 0.56728774 313 acl-2011-Two Easy Improvements to Lexical Weighting
topicId topicWeight
[(5, 0.035), (10, 0.254), (17, 0.035), (26, 0.017), (31, 0.023), (37, 0.066), (39, 0.042), (41, 0.075), (55, 0.028), (59, 0.03), (65, 0.017), (72, 0.037), (91, 0.035), (96, 0.226), (97, 0.014)]
simIndex simValue paperId paperTitle
1 0.85523087 197 acl-2011-Latent Class Transliteration based on Source Language Origin
Author: Masato Hagiwara ; Satoshi Sekine
Abstract: Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot deal with different words from different language origins, e.g., “get” in “piaget” and “target.” Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires an explicitly tagged training set with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. The experimental results of the transliteration task of Western names to Japanese show that the proposed model can achieve higher accuracy compared to the conventional models without latent classes.
same-paper 2 0.83721626 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
Author: Sara Stymne
Abstract: In this thesis proposal I present my thesis work on pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition, I focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.
3 0.79538012 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
Author: Phil Blunsom ; Trevor Cohn
Abstract: In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor process prior, providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently outperforms the current state-of-the-art across 10 languages.
4 0.73897541 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
Author: Sara Stymne
Abstract: We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., identifying and classifying MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process by providing an easy-to-use graphical user interface. It is designed to be flexible, and can be used with any MT system, language pair, and error typology. The annotation task can be aided by highlighting similarities with a reference translation.
5 0.72820342 256 acl-2011-Query Weighting for Ranking Model Adaptation
Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong
Abstract: We propose to directly measure the importance of queries in the source domain to the target domain, where no rank labels of documents are available; we refer to this as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it is more reasonable to conduct importance weighting at the query level than at the document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulting from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on the LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.
6 0.70863533 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
7 0.70790052 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents
8 0.70726073 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
9 0.7061981 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
10 0.7052893 11 acl-2011-A Fast and Accurate Method for Approximate String Search
11 0.70411777 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
12 0.70340014 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
13 0.70324576 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
14 0.70291597 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
15 0.70277297 4 acl-2011-A Class of Submodular Functions for Document Summarization
16 0.70249367 220 acl-2011-Minimum Bayes-risk System Combination
17 0.70215833 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search
18 0.70168388 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
19 0.70162278 177 acl-2011-Interactive Group Suggesting for Twitter
20 0.70161492 94 acl-2011-Deciphering Foreign Language
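The two query-weighting schemes summarized in the Cai et al. abstract (entry 5 in the list above) can be illustrated with a small sketch. Every concrete choice here is an assumption that the abstract does not specify: a query is represented as a list of document feature vectors, "compression" is taken to be the mean vector, similarity is cosine, and per-query similarities are combined by a plain average. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Assumption: a query is a list of feature vectors, one per retrieved document.
Vector = Sequence[float]
Query = List[Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compressed_weight(source: Query, targets: List[Query]) -> float:
    """Scheme 1 (sketch): compress each query into its mean document vector,
    then compare the source query vector with the centroid of the target query vectors."""
    s = mean_vector(source)
    t = mean_vector([mean_vector(q) for q in targets])
    return cosine(s, t)


def fine_grained_weight(source: Query, targets: List[Query]) -> float:
    """Scheme 2 (sketch): similarity to each target query is the mean pairwise
    document cosine; the per-query similarities are then averaged."""
    per_query = []
    for q in targets:
        pairs = [cosine(ds, dt) for ds in source for dt in q]
        per_query.append(sum(pairs) / len(pairs))
    return sum(per_query) / len(per_query)
```

With the source query `[[1.0, 0.0], [0.0, 1.0]]` and a single target query `[[1.0, 0.0]]`, `compressed_weight` gives about 0.707 while `fine_grained_weight` gives 0.5. Averaging the source documents first hides the fact that one of them is orthogonal to the target, which is one concrete form of the information loss that the abstract attributes to compression.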