acl acl2013 acl2013-125 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita ; Akihiro Tamura ; Sadao Kurohashi
Abstract: This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper proposes new distortion models for phrase-based SMT. [sent-13, score-0.596]
2 In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). [sent-14, score-1.236]
3 We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. [sent-15, score-0.775]
4 Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. [sent-16, score-0.174]
5 It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. [sent-17, score-0.325]
6 In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models. [sent-20, score-0.141]
7 1 Introduction Estimating appropriate word order in a target language is one of the most difficult problems for statistical machine translation (SMT). [sent-21, score-0.128]
8 This is particularly true when translating between languages with widely different word orders. [sent-22, score-0.072]
9 To address this problem, much research has been done on word reordering: the lexical reordering model (Tillman, 2004), which is one of the distortion models, reordering constraints (Zens et al. [sent-23, score-0.893]
10 In general, source language syntax is useful for handling long distance word reordering. [sent-25, score-0.139]
11 Phrase-based SMT mainly1 estimates word reordering using distortion models2. [sent-29, score-0.798]
12 Therefore, distortion models are one of the most important components for phrase-based SMT. [sent-30, score-0.596]
13 On the other hand, there are methods other than distortion models for improving word reordering for phrase-based SMT, such as pre-ordering or reordering constraints. [sent-31, score-0.875]
14 However, these methods also use distortion models when translating by phrase-based SMT. [sent-32, score-0.615]
15 Therefore, distortion models do not compete against these methods and are commonly used with them. [sent-33, score-0.615]
16 If there is a good distortion model, it will improve the translation quality of phrase-based SMT and benefit the methods that use distortion models. [sent-34, score-1.184]
17 In this paper, we propose two distortion models for phrase-based SMT. [sent-35, score-0.596]
18 In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). [sent-36, score-1.236]
19 The proposed models are the pair model and the sequence model. [sent-37, score-0.076]
20 The pair model utilizes the word at the CP, a word at an NP candidate site, and the words surrounding the CP and the NP candidates (context) simultaneously. [sent-38, score-0.249]
21 In addition, the sequence model, which is the further improved model, considers richer context by identifying the label sequence that specifies the span from the CP to the NP. [sent-39, score-0.182]
22 It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. [sent-40, score-0.325]
23 Our model learns the preference relations among NPCs. 1A language model also supports the estimation. [sent-41, score-0.125]
24 2In this paper, reordering models for phrase-based SMT, which are intended to estimate the source word position to be translated next in decoding, are called distortion models. [sent-42, score-1.047]
25 This estimation is used to produce a hypothesis in the target language word order sequentially from left to right. [sent-43, score-0.131]
26 Figure 1: An example of left-to-right translation for Japanese-English. [sent-71, score-0.028]
27 Boxes represent phrases and arrows indicate the translation order of the phrases. [sent-72, score-0.046]
28 Our model consists of one probabilistic model and does not require a parser. [sent-74, score-0.072]
29 2 Distortion Model for Phrase-Based SMT A Moses-style phrase-based SMT generates target hypotheses sequentially from left to right. [sent-77, score-0.06]
30 Therefore, the role of the distortion model is to estimate the source phrase position to be translated next, whose target-side phrase will be located immediately to the right of the already generated hypotheses. [sent-78, score-0.98]
31 In Figure 1, we assume that only the phrase kare wa (English side: “he”) has been translated. [sent-80, score-0.108]
32 The target word to be generated next will be “bought” and the source word to be selected next will be its corresponding Japanese word katta. [sent-81, score-0.298]
33 Thus, a distortion model should estimate phrases including katta as a source phrase position to be translated next. [sent-82, score-1.035]
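To make the estimation task concrete, the following is a minimal sketch, not the paper's implementation, of how a left-to-right decoder might consult a distortion model: given the CP, every source position not yet translated is an NP candidate, and the model scores each candidate. The function distortion_prob is a stand-in for whatever model computes P(X = j | i, S).

    # Hypothetical sketch: choosing the next source position (NP) given the
    # current position (CP) during left-to-right decoding. distortion_prob
    # stands in for any model of P(X = j | i, S); it is not the paper's code.
    def choose_next_position(cp, source_words, translated, distortion_prob):
        # NP candidates (NPCs): all source positions not yet translated
        npcs = [j for j in range(len(source_words)) if j not in translated]
        # Score every NPC with the distortion model and pick the best one
        scores = {j: distortion_prob(cp, j, source_words) for j in npcs}
        return max(scores, key=scores.get)

For the Figure 1 example, a good model would give the position of katta the highest score once kare wa has been translated.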
34 To explain the distortion model task in more detail, we need to redefine more precisely two terms, the current position (CP) and next position (NP) in the source sentence. [sent-83, score-0.9]
35 CP is the source sentence position corresponding to the rightmost aligned target word in the generated target word sequence. [sent-84, score-0.365]
36 NP is the source sentence position corresponding to the leftmost aligned target word in the target phrase to be generated next. [sent-85, score-0.338]
37 The task of the distortion model is to estimate the NP3 from NP candidates (NPCs) for each CP in the source sentence. [sent-86, score-0.745]
38 3NP is not always one position, because there may be multiple correct hypotheses. [sent-87, score-0.024]
39 In existing methods, CP is the rightmost position of the last translated source phrase and NP is the leftmost position of the source phrase to be translated next. [sent-90, score-0.558]
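The two definitions above can be made concrete with a small sketch; it assumes a target-to-source word alignment stored as a dictionary, which is our own illustration rather than anything specified in the paper.

    # Sketch of the CP/NP definitions above. alignment maps target word
    # positions to source word positions; unaligned target words are absent.
    # The data layout and helper names are our own assumptions.
    def current_position(alignment, generated_target_len):
        # CP: source position of the rightmost aligned word among the target
        # words generated so far (positions 0 .. generated_target_len - 1)
        aligned = [t for t in range(generated_target_len) if t in alignment]
        return alignment[max(aligned)] if aligned else None

    def next_position(alignment, next_phrase_target_span):
        # NP: source position of the leftmost aligned word in the target
        # phrase to be generated next (a range of target positions)
        aligned = [t for t in next_phrase_target_span if t in alignment]
        return alignment[min(aligned)] if aligned else None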
40 The upper sentence is the source sentence and the sentence underneath is a target hypothesis for each example. [sent-298, score-0.164]
41 The NP is in bold, and the CP is in bold italics. [sent-299, score-0.021]
42 The point of an arrow with a mark indicates a wrong NP candidate. [sent-300, score-0.018]
43 The superscript numbers indicate the word position in the source sentence. [sent-303, score-0.203]
44 However, in Figure 2 (b), the word (kare) at the CP is the same as (a), but the NP is different (the NP is 10). [sent-305, score-0.053]
45 From these examples, we see that distance is not the essential factor in deciding an NP. [sent-306, score-0.024]
46 And it also turns out that the word at the CP alone is not enough to estimate the NP. [sent-307, score-0.095]
47 Thus, not only the word at the CP but also the word at an NP candidate (NPC) should be considered simultaneously. [sent-308, score-0.126]
48 In (c) and (d) in Figure 2, the word (kare) at the CP is the same and karita (borrowed) and katta (bought) are at the NPCs. [sent-309, score-0.295]
49 Karita is the word at the NP and katta is not the word at the NP for (c), while katta is the word at the NP and karita is not the word at the NP for (d). [sent-310, score-0.588]
50 From these examples, we see that considering the words at the CP and the NPCs alone is still not enough to decide the NP. [sent-311, score-0.053]
51 One of the reasons for this difference is the relative word order between words. [sent-313, score-0.117]
52 In (d) and (e) in Figure 2, the word (kare) at the CP and the word order between katta and karita are the same. [sent-315, score-0.366]
53 However, the word at the NP for (d) and the word at the NP for (e) are different. [sent-316, score-0.106]
54 From these examples, we can see that selecting a nearby word is not always correct. [sent-317, score-0.053]
55 The difference is caused by the words surrounding the NPCs (context), the CP context, and the words between the CP and the NPC. [sent-318, score-0.037]
56 Thus, these should be considered when estimating the NP. [sent-319, score-0.047]
57 In summary, in order to estimate the NP, the following should be considered simultaneously: the word at the NP, the word at the CP, the relative word order among the NPCs, the words surrounding NP and CP (context), and the words between the CP and the NPC. [sent-320, score-0.344]
58 There are distortion models that do not require a parser for phrase-based SMT. [sent-321, score-0.596]
59 The linear distortion cost model used in Moses (Koehn et al. [sent-322, score-0.635]
60 , 2007), whose costs are linearly proportional to the reordering distance, always gives a high cost to long distance reordering, even if the reordering is correct. [sent-323, score-0.271]
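For reference, the linear distortion cost in a Moses-style decoder depends only on the jump distance between consecutive source phrases; the sketch below is our own formulation of that idea, not actual Moses code.

    # Minimal sketch of a linear distortion cost: the penalty grows linearly
    # with the jump distance, so long but correct jumps are still punished.
    def linear_distortion_cost(prev_source_end, next_source_start, weight=1.0):
        distance = abs(next_source_start - prev_source_end - 1)
        return weight * distance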
61 The MSD lexical reordering model (Tillman, 2004; Koehn et al. [sent-324, score-0.149]
62 , 2005; Galley and Manning, 2008) only calculates probabilities for the three kinds of phrase reorderings (monotone, swap, and discontinuous), and does not consider relative word order or words between the CP and the NPC. [sent-325, score-0.213]
63 Thus, these models are not sufficient for long distance word reordering. [sent-326, score-0.095]
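For concreteness, the MSD orientation is usually determined purely from the relative positions of consecutive source phrases, as in the sketch below (a standard formulation, not code from any particular toolkit); this is why the words between the CP and the NPC never enter such models.

    # Standard monotone/swap/discontinuous classification used by MSD
    # lexical reordering models. Only phrase positions are inspected;
    # the words between the CP and the NPC are never examined.
    def msd_orientation(prev_src_start, prev_src_end, cur_src_start, cur_src_end):
        if cur_src_start == prev_src_end + 1:
            return "monotone"
        if cur_src_end == prev_src_start - 1:
            return "swap"
        return "discontinuous"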
64 Al-Onaizan and Papineni (2006) proposed a distortion model that used the word at the CP and the word at an NPC. [sent-327, score-0.72]
65 However, their model did not use context, relative word order, or words between the CP and the NPC. [sent-328, score-0.135]
66 (2009) proposed a method that adjusts the linear distortion cost using the word at the CP and its context. [sent-330, score-0.672]
67 Their model does not simultaneously consider both the word specified at the CP and the word specified at the NPCs. [sent-331, score-0.323]
68 Their model (the outbound model) estimates how far the NP should be from the CP using the word at the CP and its context. [sent-334, score-0.214]
69 Their model5 does not simultaneously consider both the word specified at the CP and the word specified at an NPC. [sent-335, score-0.425]
70 For example, the outbound model considers the word specified at the CP, but does not consider the word specified at an NPC. [sent-336, score-0.389]
71 Their models also do not consider relative word order. [sent-337, score-0.117]
72 In contrast, our distortion model solves the aforementioned problems. [sent-338, score-0.614]
73 Our distortion models utilize the word specified at the CP, the word specified at an NPC, and also the context of the CP and the NPC simultaneously. [sent-339, score-0.879]
74 Furthermore, our sequence model considers richer context including the relative word order among NPCs and also including all the words between the CP and the NPC. [sent-340, score-0.297]
75 In addition, unlike previous methods, our models learn the preference relations among NPCs. [sent-341, score-0.091]
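This excerpt does not spell out the label set used by the sequence model, so the following is only a hypothetical illustration of what a label sequence specifying the span from the CP to an NPC could look like; the label names are our own invention.

    # Hypothetical illustration of a label sequence over the span from the CP
    # to an NPC. The labels are our own invention, not the paper's; the point
    # is that every word between the CP and the NPC receives a label, so the
    # whole span (and the relative order of NPCs) can contribute features.
    def span_labels(cp, npc, sentence_len):
        lo, hi = min(cp, npc), max(cp, npc)
        labels = []
        for pos in range(sentence_len):
            if pos == cp:
                labels.append("CP")
            elif pos == npc:
                labels.append("NPC")
            elif lo < pos < hi:
                labels.append("IN_SPAN")
            else:
                labels.append("OUT")
        return labels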
76 3 Proposed Method In this section, we first define our distortion model and explain our learning strategy. [sent-342, score-0.638]
77 Then, we describe the two proposed models: the pair model and the sequence model, which is the further improved model. [sent-343, score-0.111]
78 3.1 Distortion Model and Learning Strategy First, we define our distortion model. [sent-345, score-0.578]
79 Let i be a CP, j be an NPC, S be a source sentence, and X be the random variable of the NP. [sent-346, score-0.062]
80 In this paper, the distortion probability is defined as P(X = j | i, S), which is the probability of an NPC j being the NP. [sent-347, score-0.597]
81 Our distortion model is defined as the model calculating the distortion probability. [sent-348, score-1.228]
82 Next, we explain the learning strategy for our distortion model. [sent-349, score-0.634]
83 We train this model as a discriminative model that discriminates the NP from NPCs. [sent-350, score-0.113]
84 Let J be a set of word positions in S other than i. [sent-351, score-0.053]
85 We train the distortion model subject to ∑_{j∈J} P(X = j | i, S) = 1. [sent-352, score-0.614]
86 The model parameters are learned to maximize the distortion probability of the NP among all of the NPCs J in each source sentence. [sent-353, score-0.7]
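Under this definition, one natural parameterization, assuming the log-linear (maximum entropy) form mentioned in Section 3.2, is the sketch below; the feature vector f(i, j, S) is left unspecified here and is not taken from the paper.

    % Sketch of a log-linear parameterization consistent with the
    % normalization above; f(i, j, S) is an unspecified feature vector.
    P(X = j \mid i, S) =
      \frac{\exp\bigl(\mathbf{w} \cdot \mathbf{f}(i, j, S)\bigr)}
           {\sum_{j' \in J} \exp\bigl(\mathbf{w} \cdot \mathbf{f}(i, j', S)\bigr)},
      \qquad j \in J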
87 This learning strategy is a kind of preference relation learning (Evgeniou and Pontil, 2002). [sent-354, score-0.061]
88 In this learning, the distortion probability of the actual NP will be relatively higher than those of all the other NPCs J. [sent-355, score-0.054]
89 5They also proposed another model (the inbound model) that estimates the reverse-direction distance. [sent-356, score-0.187]
90 Each NPC is regarded as an NP, and the inbound model estimates how far the corresponding CP should be from the NP using the word at the NP and its context. [sent-357, score-0.6]
91 This learning strategy is different from that of (Al-Onaizan and Papineni, 2006; Green et al. [sent-358, score-0.032]
92 (2010) trained their outbound model subject to ∑_{c∈C} P(Y = c | i, S) = 1, where C is the set of the nine distortion classes6 and Y is the random variable of the correct distortion class that the correct distortion is classified into. [sent-361, score-1.308]
93 The distortion probabilities that they learned [sent-364, score-0.021]
94 were the probabilities of distortion classes in all of the training data, not the relative preferences among the NPCs in each source sentence. [sent-365, score-0.731]
95 3.2 Pair Model The pair model utilizes the word at the CP, the word at an NPC, and the context of the CP and the NPC simultaneously to estimate the NP. [sent-367, score-0.281]
96 This can be done by our distortion model definition and the learning strategy described in the previous section. [sent-368, score-0.646]
97 The reason for this is that a model based on the maximum entropy method can calculate probabilities. [sent-371, score-0.036]
98 However, if we use scores as an approximation of the distortion probabilities, various discriminative machine learning methods can be applied to build the distortion model. [sent-372, score-1.173]
99 We add a beginning of sentence (BOS) marker to the head of the source sentence and an end of sentence (EOS) marker to the end, so the source sentence S is expressed as (s0 = BOS, s1, . . . , sn, sn+1 = EOS). [sent-377, score-0.248]
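As a rough illustration of the input the pair model sees, the sketch below pads the source sentence with BOS and EOS markers and collects the CP word, the NPC word, and a small context window; the window size and feature names are our own assumptions, since the exact feature templates are not given in this excerpt.

    # Rough illustration of pair-model inputs: the CP word, the NPC word, and
    # surrounding context taken from a BOS/EOS-padded source sentence.
    # Window size and feature names are our own assumptions.
    def pair_features(source_words, cp, npc, window=1):
        s = ["<BOS>"] + list(source_words) + ["<EOS>"]  # s[0]=BOS, s[n+1]=EOS
        i, j = cp + 1, npc + 1                          # shift for the padding
        feats = {"cp_word": s[i], "npc_word": s[j]}
        for k in range(1, window + 1):
            feats["cp_ctx_-%d" % k] = s[max(i - k, 0)]
            feats["cp_ctx_+%d" % k] = s[min(i + k, len(s) - 1)]
            feats["npc_ctx_-%d" % k] = s[max(j - k, 0)]
            feats["npc_ctx_+%d" % k] = s[min(j + k, len(s) - 1)]
        return feats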
wordName wordTfidf (topN-words)
[('distortion', 0.578), ('cp', 0.558), ('np', 0.315), ('npc', 0.215), ('npcs', 0.188), ('katta', 0.134), ('reordering', 0.113), ('kare', 0.108), ('karita', 0.108), ('smt', 0.09), ('position', 0.088), ('outbound', 0.071), ('specified', 0.071), ('translated', 0.069), ('source', 0.062), ('estimates', 0.054), ('word', 0.053), ('bought', 0.048), ('relative', 0.046), ('green', 0.045), ('inbound', 0.044), ('estimate', 0.042), ('eos', 0.041), ('goto', 0.041), ('tillman', 0.039), ('simultaneously', 0.039), ('surrounding', 0.037), ('model', 0.036), ('context', 0.035), ('considers', 0.034), ('leftmost', 0.034), ('rightmost', 0.034), ('strategy', 0.032), ('bos', 0.031), ('sequentially', 0.031), ('calculates', 0.03), ('richer', 0.029), ('target', 0.029), ('preference', 0.029), ('marker', 0.028), ('translation', 0.028), ('estimating', 0.028), ('candidates', 0.027), ('phrase', 0.026), ('distances', 0.026), ('distance', 0.024), ('among', 0.024), ('ro', 0.024), ('explain', 0.024), ('tjh', 0.024), ('thec', 0.024), ('discriminates', 0.024), ('akihi', 0.024), ('isao', 0.024), ('tiple', 0.024), ('wanhedr', 0.024), ('next', 0.024), ('ni', 0.023), ('decoding', 0.023), ('utilizes', 0.023), ('specify', 0.023), ('koehn', 0.023), ('sequence', 0.022), ('underneath', 0.022), ('sider', 0.022), ('cis', 0.022), ('atively', 0.022), ('akihiro', 0.022), ('pontil', 0.022), ('cost', 0.021), ('probabilities', 0.021), ('omf', 0.021), ('mccord', 0.021), ('borrowed', 0.021), ('tamura', 0.021), ('bold', 0.021), ('candidate', 0.02), ('learn', 0.02), ('adjusts', 0.02), ('msd', 0.02), ('enables', 0.019), ('moses', 0.019), ('papineni', 0.019), ('compete', 0.019), ('sidered', 0.019), ('nis', 0.019), ('reorderings', 0.019), ('translating', 0.019), ('effect', 0.018), ('models', 0.018), ('order', 0.018), ('sse', 0.018), ('masao', 0.018), ('patent', 0.018), ('utiyama', 0.018), ('arrow', 0.018), ('discriminative', 0.017), ('sentence', 0.017), ('improved', 0.017), ('discontinuous', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation
Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita ; Akihiro Tamura ; Sadao Kurohashi
Abstract: This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.
2 0.20476797 166 acl-2013-Generalized Reordering Rules for Improved SMT
Author: Fei Huang ; Cezar Pendus
Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighed sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs judged by both humans and automatic metric.
3 0.11446539 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan
Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
4 0.10845855 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
Author: Zede Zhu ; Miao Li ; Lei Chen ; Zhenxin Yang
Abstract: Comparable corpora are important basic resources in cross-language information processing. However, the existing methods of building comparable corpora, which use intertranslate words and relative features, cannot evaluate the topical relation between document pairs. This paper adopts the bilingual LDA model to predict the topical structures of the documents and proposes three algorithms of document similarity in different languages. Experiments show that the novel method can obtain similar documents with consistent topics and has better adaptability and stability performance.
5 0.10735064 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable of modeling the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
6 0.1036802 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
7 0.10276385 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
8 0.075297043 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
9 0.074652441 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
10 0.07026495 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
11 0.069409333 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
12 0.062718704 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
13 0.060713183 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
14 0.057002559 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
15 0.056921788 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
16 0.054717951 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
17 0.054217216 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
18 0.052560903 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German
19 0.051898073 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
20 0.05188597 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation
topicId topicWeight
[(0, 0.114), (1, -0.094), (2, 0.081), (3, 0.059), (4, -0.035), (5, 0.025), (6, 0.018), (7, -0.011), (8, -0.011), (9, 0.043), (10, 0.017), (11, 0.012), (12, 0.028), (13, -0.007), (14, 0.014), (15, 0.031), (16, 0.095), (17, 0.028), (18, -0.048), (19, -0.037), (20, -0.066), (21, -0.017), (22, -0.01), (23, -0.11), (24, 0.075), (25, 0.021), (26, -0.038), (27, -0.069), (28, -0.12), (29, -0.05), (30, -0.08), (31, -0.04), (32, 0.015), (33, 0.051), (34, -0.046), (35, 0.032), (36, -0.028), (37, -0.022), (38, -0.01), (39, -0.039), (40, -0.049), (41, -0.107), (42, -0.003), (43, -0.009), (44, -0.15), (45, -0.039), (46, 0.031), (47, 0.032), (48, -0.015), (49, -0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.92958647 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation
Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita ; Akihiro Tamura ; Sadao Kurohashi
Abstract: This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.
2 0.83865142 166 acl-2013-Generalized Reordering Rules for Improved SMT
Author: Fei Huang ; Cezar Pendus
Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighed sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs judged by both humans and automatic metric.
3 0.74509448 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan
Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
4 0.70167708 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney
Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.
5 0.66909325 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn
Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrasebased search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.
6 0.66560787 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
7 0.65618432 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
8 0.44099879 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German
9 0.43290827 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
10 0.42767137 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
11 0.39896929 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation
12 0.39699891 390 acl-2013-Word surprisal predicts N400 amplitude during reading
13 0.38597372 180 acl-2013-Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation
14 0.38559201 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
15 0.38053253 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
16 0.37837967 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis
17 0.37255427 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
18 0.37128541 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
19 0.35994855 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
20 0.35139027 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
topicId topicWeight
[(0, 0.025), (6, 0.017), (11, 0.038), (15, 0.013), (24, 0.022), (26, 0.014), (35, 0.058), (42, 0.533), (48, 0.021), (70, 0.02), (88, 0.035), (90, 0.029), (95, 0.057)]
simIndex simValue paperId paperTitle
Author: Sina Zarriess ; Jonas Kuhn
Abstract: We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization. We present a data set of German articles annotated with deep syntax and referents, including some types of implicit referents. Our experiments compare several architectures varying the order of a set of trainable modules. The results suggest that a revision-based pipeline, with intermediate linearization, significantly outperforms standard pipelines or a parallel architecture.
same-paper 2 0.97926366 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation
Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita ; Akihiro Tamura ; Sadao Kurohashi
Abstract: This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.
3 0.97501397 372 acl-2013-Using CCG categories to improve Hindi dependency parsing
Author: Bharat Ram Ambati ; Tejaswini Deoskar ; Mark Steedman
Abstract: We show that informative lexical categories from a strongly lexicalised formalism such as Combinatory Categorial Grammar (CCG) can improve dependency parsing of Hindi, a free word order language. We first describe a novel way to obtain a CCG lexicon and treebank from an existing dependency treebank, using a CCG parser. We use the output of a supertagger trained on the CCGbank as a feature for a state-of-the-art Hindi dependency parser (Malt). Our results show that using CCG categories improves the accuracy of Malt on long distance dependencies, for which it is known to have weak rates of recovery.
4 0.94101197 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
Author: Abhijit Mishra ; Pushpak Bhattacharyya ; Michael Carl
Abstract: In this paper we introduce Translation Difficulty Index (TDI), a measure of difficulty in text translation. We first define and quantify translation difficulty in terms of TDI. We realize that any measure of TDI based on direct input by translators is fraught with subjectivity and adhocism. We, rather, rely on cognitive evidences from eye tracking. TDI is measured as the sum of fixation (gaze) and saccade (rapid eye movement) times of the eye. We then establish that TDI is correlated with three properties of the input sentence, viz. length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system to predict TDIs for new sentences using these features as input. The prediction done by our framework is well correlated with the empirical gold standard data, which is a repository of < L, DP, SC > and TDI pairs for a set of sentences. The primary use of our work is a way of “binning” sentences (to be translated) in “easy”, “medium” and “hard” categories as per their predicted TDI. This can decide pricing of any translation task, especially useful in a scenario where parallel corpora for Machine Translation are built through translation crowdsourcing/outsourcing. This can also provide a way of monitoring progress of second language learners.
5 0.93713409 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa
Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also describe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation.
6 0.9142763 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
7 0.91117465 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
8 0.90762252 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
9 0.73228312 166 acl-2013-Generalized Reordering Rules for Improved SMT
10 0.71363968 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
11 0.70026034 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
12 0.69199151 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
13 0.67487174 199 acl-2013-Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
14 0.67405593 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
15 0.65283829 69 acl-2013-Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
16 0.65183973 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
17 0.63516843 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
18 0.63242155 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
19 0.62722087 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
20 0.62142181 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation