acl acl2011 acl2011-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel machine translation model which models translation by a linear sequence of operations. [sent-3, score-0.389]
2 In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. [sent-4, score-0.52]
3 1 Introduction We present a novel generative model that explains the translation process as a linear sequence of operations which generate a source and target sentence in parallel. [sent-7, score-0.513]
4 Possible operations are (i) generation of a sequence of source and target words (ii) insertion of gaps as explicit target positions for reordering operations, and (iii) forward and backward jump operations which do the actual reordering. [sent-8, score-1.219]
5 The model is an n-gram model over operations: the probability of an operation depends on the n − 1 preceding operations. [sent-11, score-0.283]
6 Since the translation (generation) and reordering operations are coupled in a single generative story, the reordering decisions may depend on preceding translation decisions and translation decisions may depend on preceding reordering decisions. [sent-12, score-1.393]
7 This provides a natural reordering mechanism which is able to deal with local and long-distance reorderings in a consistent way. [sent-13, score-0.427]
8 (…, 2006), but our model does reordering as an integral part of a generative model. [sent-15, score-0.352]
9 1.1 Relation of our work to PBSMT: Phrase-based SMT provides a powerful translation mechanism which learns local reorderings, translation of short idioms, and the insertion and deletion of words sensitive to local context. [sent-25, score-0.418]
10 Then a gap is inserted on the German side, followed by the generation of “gegessen – eaten”. [sent-36, score-0.257]
11 Notice how the reordering decision is triggered by the translation decision in our model. [sent-39, score-0.465]
12 The probability of a gap insertion operation after the generation of the auxiliaries “hat – has” will be high because reordering is necessary in order to move the second part of the German verb complex (“gegessen”) to its correct position at the end of the clause. [sent-40, score-0.872]
13 This mechanism better restricts reordering. (Footnote 1: The examples given in this section are not taken from the real data/system, but made up for the sake of argument.) [sent-41, score-0.337]
14 The discontinuous unit “hat ... gelesen – read” is internal to the phrase pair “hat er ein buch gelesen – he read a book”, and is therefore handled conveniently. [sent-48, score-0.46]
15 On the other hand, the phrase table does not have the entry “hat er eine zeitung gelesen – he read a newspaper” (Figure 1(b)). [sent-49, score-0.404]
16 The above sentence would be generated by the following sequence of operations: (i) generate “dann – then” (ii) insert a gap (iii) generate “er – he” (iv) backward jump to the gap (v) generate “hat ... gelesen – read” (only “hat” and “read” are added to the sentences yet) [sent-57, score-1.128]
17 (vi) jump forward to the right-most source word generated so far (vii) insert a gap (viii) continue the source cept (“gelesen” is inserted now) (ix) backward jump to the gap (x) generate “ein – a” (xi) generate “buch – book”. [sent-60, score-1.774]
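As an illustration, here is a minimal sketch (our own, made up for this summary, not the authors' code) that replays the eleven operations above with a toy interpreter; the source buffer with explicit gap placeholders and the queue for pending cept words are assumed data structures chosen for clarity.

```python
GAP = "_"  # placeholder for an open gap in the source buffer

class OperationInterpreter:
    """Toy interpreter for the operation sequence model (a sketch)."""
    def __init__(self):
        self.src = []    # German words and open-gap placeholders
        self.tgt = []    # English words
        self.queue = []  # pending words of a multi-word source cept
        self.pos = 0     # current insertion position in self.src

    def generate(self, x, y):
        # Generate (X, Y): emit the English cept, place the first German
        # word at the current position, queue the remaining German words.
        self.tgt.extend(y)
        self.src.insert(self.pos, x[0])
        self.pos += 1
        self.queue.extend(x[1:])

    def continue_source_cept(self):
        # Pop one queued German word into the source buffer.
        self.src.insert(self.pos, self.queue.pop(0))
        self.pos += 1

    def insert_gap(self):
        self.src.insert(self.pos, GAP)
        self.pos += 1

    def jump_back(self, w):
        # Jump Back (W): move to the W-th closest open gap and close it.
        open_gaps = [i for i, tok in enumerate(self.src) if tok == GAP]
        self.pos = open_gaps[-w]   # [-1] is the gap closest to Z
        del self.src[self.pos]

    def jump_forward(self):
        # Jump Forward: move to Z (simplified here as the buffer end).
        self.pos = len(self.src)

m = OperationInterpreter()
m.generate(["dann"], ["then"])            # (i)
m.insert_gap()                            # (ii)
m.generate(["er"], ["he"])                # (iii)
m.jump_back(1)                            # (iv)
m.generate(["hat", "gelesen"], ["read"])  # (v)  "gelesen" is queued
m.jump_forward()                          # (vi)
m.insert_gap()                            # (vii)
m.continue_source_cept()                  # (viii) "gelesen" is placed
m.jump_back(1)                            # (ix)
m.generate(["ein"], ["a"])                # (x)
m.generate(["buch"], ["book"])            # (xi)
print(" ".join(m.src), "|", " ".join(m.tgt))
# dann hat er ein buch gelesen | then he read a book
```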
18 The main difference between phrase-based and N-gram SMT is the extraction procedure of translation units and the statistical modeling of translation context (Crego et al.). [sent-69, score-0.403]
19 The tuples used in N-gram systems are much smaller translation units than phrases and are extracted in such a way that a unique segmentation of each bilingual sentence pair is produced. [sent-71, score-0.434]
20 Reordering works by linearization of the source side and tuple unfolding (Crego et al.). [sent-73, score-0.306]
21 A drawback of their reordering approach is that search is only performed on a small number of reorderings that are pre-calculated on the source side independently of the target side. [sent-77, score-0.577]
22 Our reordering approach is entirely different: it considers all possible orderings instead of a small set of pre-calculated orderings. [sent-82, score-0.298]
23 The standard N-gram model heavily relies on POS tags for reordering and is unable to use lexical triggers whereas our model exclusively uses lexical triggers and no POS information. [sent-83, score-0.298]
24 The most notable feature of our work is that it has a complete generative story of translation which combines translation and reordering operations into a single operation sequence model. [sent-86, score-1.18]
25 3 Generative Story: Our generative story is motivated by the complex reorderings in the German-to-English translation task. [sent-91, score-0.356]
26 Occasionally the translator jumps back on the German side to insert some material at an earlier position. [sent-94, score-0.29]
27 We use 4 translation and 3 reordering operations. [sent-97, score-0.465]
28 Generate (X, Y): This operation causes the words in Y and the first word in X to be appended to the English and German strings generated so far, respectively. [sent-101, score-0.345]
29 The generation of the second (and subsequent) German word in a multi-word cept can be delayed by gaps, jumps and the Generate Source Only operation defined below. [sent-104, score-0.615]
30 Continue Source Cept: The German words added to the queue by the Generate (X,Y) operation are generated by the Continue Source Cept operation. [sent-105, score-0.294]
31 (Footnote 2: However, Crego and Yvon (2009), in their N-gram system, use split rules to handle target-side gaps and show a slight improvement on a Chinese-English translation task.) [sent-109, score-0.393]
32 Each Continue Source Cept operation removes one German word from the queue and copies it to the German string. [sent-110, score-0.331]
33 A German cept of n words thus requires one Generate (X1 ... Xn, Y) operation and n − 1 Continue Source Cept operations. [sent-114, score-0.283]
34 For example, the cept “hat ... gelesen – read” is generated by the operation Generate (hat gelesen, read), which adds “hat” and “read” to the German and English strings and “gelesen” to a queue. [sent-118, score-0.494]
35 A Continue Source Cept operation later removes “gelesen” from the queue and adds it to the German string. [sent-119, score-0.331]
36 Generate Source Only (X): This operation is used to generate a German word X with no corresponding English word. [sent-121, score-0.368]
37 It is used during decoding, where a German word (X) is either translated to some English word(s) by a Generate (X,Y) operation or deleted with a Generate Source Only (X) operation. [sent-125, score-0.283]
38 The Generate Identical operation is used during decoding for the translation of unknown words. [sent-127, score-0.45]
39 The probability of this operation is estimated from singleton German words that are translated to an identical string. [sent-128, score-0.283]
40 For example, for a tuple “Portland – Portland”, where German “Portland” was observed exactly once during training, we use a Generate Identical operation rather than Generate (Portland, Portland). [sent-129, score-0.365]
41 We now discuss the set of reordering operations used by the generative story. [sent-130, score-0.463]
42 During the generation process, the translator maintains an index j, which specifies the position after the previously covered German word, an index Z, which specifies the position after the right-most German word covered so far, and an index j′ of the next German word to be covered. [sent-132, score-0.47]
43 [Figure: the set of reordering operations used in the generative story; the marked row indicates position j.] [sent-133, score-0.484]
44 Insert Gap: This operation inserts a gap which acts as a place-holder for the skipped words. [sent-135, score-0.499]
45 Jump Back (W): This operation lets the translator jump back to an open gap. [sent-137, score-0.637]
46 It takes a parameter W specifying which gap to jump to. [sent-138, score-0.432]
47 Jump Back (1) jumps to the closest gap to Z, Jump Back (2) jumps to the second closest gap to Z, etc. [sent-139, score-0.612]
48 After the backward jump the target gap is closed. [sent-140, score-0.537]
49 Jump Forward: This operation makes the translator jump to Z. [sent-141, score-0.553]
50 A Jump Back (W) operation is only allowed at position Z. [sent-143, score-0.358]
51 A formal algorithm for converting a word-aligned bilingual corpus into an operation sequence is presented in Algorithm 1. [sent-147, score-0.38]
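Since Algorithm 1 itself is not reproduced in this summary, the following is a rough sketch of the conversion idea under strong simplifying assumptions: single-word source cepts only, every word aligned exactly once (the alignment `a` is a permutation of source positions), and the constraint from sentence 50 that Jump Back is only legal at position Z. This is an illustration of the skeleton, not the paper's algorithm, which additionally handles multi-word cepts, unaligned words and discontinuities.

```python
def to_operations(src, tgt, a):
    """Sketch: a[i] is the source position aligned to target position i."""
    ops, gaps, covered = [], [], set()   # gaps: start positions of open gaps
    j = Z = 0                            # j: current position; Z: frontier
    for i, e in enumerate(tgt):
        jp = a[i]                        # next source position to cover
        if jp != j:
            if j < Z and j not in covered:
                ops.append("Insert Gap") # keep a gap for words left behind
                gaps.append(j)
            if j != Z:                   # Jump Back is only legal at Z
                ops.append("Jump Forward")
                j = Z
            if jp < Z:                   # jp lies inside an open gap
                w = next(k + 1 for k, g in enumerate(reversed(gaps)) if g <= jp)
                ops.append(f"Jump Back ({w})")
                j = gaps[-w]
                del gaps[-w]             # the backward jump closes that gap
            if jp > j:                   # skip uncovered words in a new gap
                ops.append("Insert Gap")
                gaps.append(j)
                j = jp
        ops.append(f"Generate ({src[jp]}, {e})")
        covered.add(jp)
        j = jp + 1
        Z = max(Z, j)
    return ops

# Source order 0,2,1,3 visited in target order yields:
# Generate, Insert Gap, Generate, Jump Back (1), Generate, Jump Forward, Generate
print(to_operations(["f0", "f1", "f2", "f3"], ["e0", "e1", "e2", "e3"], [0, 2, 1, 3]))
```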
52 4 Model: Our translation model p(F, E) is based on an N-gram model of operations which integrates translation and reordering operations. [sent-148, score-0.915]
53 Given a source string F, a sequence of tuples T = (t1, . . . , tn), … [sent-149, score-0.305]
54 The relative position of the target gap is 1 if it is closest to Z, 2 if it is the second closest gap, etc. [sent-154, score-0.565]
55 The operation Generate Identical is chosen if Fi = Ei and the overall frequency of the German cept Fi is 1. [sent-155, score-0.525]
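As a sketch, this rule can be transcribed directly; `cept_freq` is an assumed unigram count of German cepts over the training data, a name introduced here for illustration only.

```python
def choose_generate_op(f, e, cept_freq):
    # Generate Identical is reserved for source words seen exactly once
    # in training whose target side is the identical string.
    if f == e and cept_freq.get(f, 0) == 1:
        return "Generate Identical"
    return f"Generate ({f}, {e})"

# choose_generate_op("Portland", "Portland", {"Portland": 1})
# -> "Generate Identical"
```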
56 Our translation model is implemented as an N-gram model of operations using SRILM-Toolkit (Stolcke, 2002) with Kneser-Ney smoothing. [sent-163, score-0.278]
57 Both auxiliary models have the form $p_x(w_1, \ldots, w_J) = \prod_{j=1}^{J} p(w_j \mid w_{j-m}, \ldots, w_{j-1})$, where m = 4 (5-gram model) for the standard monolingual model (x = LM) and m = 8 (same as the operation model) for the prior probability model (x = pr). [sent-172, score-0.39]
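For illustration only, a toy operation m-gram model with plain relative-frequency estimates; the paper's models come from the SRILM toolkit with Kneser-Ney smoothing, which this sketch deliberately omits (an unseen m-gram here would raise a math domain error).

```python
import math
from collections import Counter

def train_op_model(op_corpus, m=8):
    """op_corpus: one operation sequence per sentence pair (m = 8 -> 9-gram)."""
    ngrams, contexts = Counter(), Counter()
    for seq in op_corpus:
        padded = ["<s>"] * m + list(seq) + ["</s>"]
        for j in range(m, len(padded)):
            ctx = tuple(padded[j - m:j])
            ngrams[ctx + (padded[j],)] += 1
            contexts[ctx] += 1
    return ngrams, contexts

def log_prob(seq, ngrams, contexts, m=8):
    """log p(o_1..o_J) = sum_j log p(o_j | o_{j-m}..o_{j-1}), unsmoothed MLE."""
    padded = ["<s>"] * m + list(seq) + ["</s>"]
    return sum(
        math.log(ngrams[tuple(padded[j - m:j]) + (padded[j],)]
                 / contexts[tuple(padded[j - m:j])])
        for j in range(m, len(padded))
    )
```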
58 We search for a target string E which maximizes a linear combination of feature functions, i.e. $\hat{E} = \arg\max_E \sum_i \lambda_i h_i(F, E)$. (Footnote 5: In decoding, the amount of context used for the prior probability is synchronized with the position of back-off in the operation model.) [sent-174, score-0.454]
59 Other than the 3 features discussed above (log probabilities of the operation model, monolingual language model and prior probability model), we train 8 additional features, discussed below. Length Bonus: The length bonus feature counts the length of the target sentence in words. [sent-179, score-0.508]
60 Deletion Penalty: Deleting a source word (Generate Source Only (X)) is a common operation in the generative story. [sent-181, score-0.431]
61 Gap Bonus and Open Gap Penalty: These features are introduced to guide the reordering decisions. [sent-184, score-0.298]
62 We observe a large amount of reordering in the automatically word aligned training text. [sent-185, score-0.298]
63 The gap bonus feature sums to the total number of gaps inserted to produce a target sentence. [sent-188, score-0.502]
64 The open gap penalty feature is a penalty (paid once for each translation operation performed) whose value is the number of open gaps. [sent-189, score-0.884]
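A small sketch of how these two features can be read off an operation sequence; the string operation labels follow the sketches above and are an assumption of this summary, not the paper's notation.

```python
def gap_features(operations):
    open_gaps = gap_bonus = open_gap_penalty = 0
    for op in operations:
        if op.startswith("Insert Gap"):
            open_gaps += 1
            gap_bonus += 1                 # total number of gaps inserted
        elif op.startswith("Jump Back"):
            open_gaps -= 1                 # the backward jump closes its gap
        elif op.startswith(("Generate", "Continue")):
            open_gap_penalty += open_gaps  # paid once per translation operation
    return gap_bonus, open_gap_penalty
```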
65 Distortion and Gap Distance Penalty: We have two additional features to control the reordering decisions. [sent-191, score-0.298]
66 One of them is similar to the distance-based reordering model used by phrasal MT. [sent-192, score-0.386]
67 The other feature is the gap distance penalty which calculates the distance between the first word of a source cept X and the start of the left-most gap. [sent-193, score-0.625]
68 For a source cept X1, . . . , Xn, we get the feature value gj = X1 − S, where S is the index of the left-most source word where a gap starts. [sent-198, score-0.36]
69 X1, . . . , Xn and Y1, . . . , Ym represent the indexes of the source words covered by the tuples tj and tj−1, respectively. [sent-205, score-0.336]
70 During hypothesis expansion, the decoder picks a tuple from the inventory and generates the sequence of operations required for the translation with this tuple in light of the previous hypothesis. [sent-231, score-0.551]
71 The sequence of operations may include translation (generate, continue source cept, etc.) [sent-232, score-0.613]
72 and reordering (gap insertions, jumps) operations. [sent-233, score-0.298]
73 Recombination is performed on hypotheses having the same coverage vector, monolingual language model context, and operation model context. [sent-235, score-0.352]
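A sketch of that recombination test, assuming a hypothesis object with `coverage`, `target_words` and `operations` fields (our naming, not the paper's); hypotheses sharing the key can be merged, keeping the higher-scoring one.

```python
def recombination_key(hyp, lm_order=5, op_order=9):
    return (
        tuple(hyp.coverage),                        # source coverage vector
        tuple(hyp.target_words[-(lm_order - 1):]),  # monolingual LM context
        tuple(hyp.operations[-(op_order - 1):]),    # operation model context
    )

# During stack decoding, keep only the best-scoring hypothesis per key.
```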
74 Our generative story does not handle target-side discontinuities and unaligned target words. [sent-243, score-0.272]
75 The resulting operation corpus contains one sequence of operations per sentence pair. [sent-252, score-0.449]
76 The operation model is estimated from the operation corpus. [sent-254, score-0.566]
77 The monolingual language model is estimated from the target side of the bilingual corpus and additional monolingual data. [sent-256, score-0.275]
78 The first system (Twno−rl) applies no hard reordering limit and uses the distortion and gap distance penalty features as soft constraints, allowing all possible reorderings. [sent-278, score-0.684]
79 The second system (Twrl−6) uses no distortion and gap distance features, but applies a hard constraint which limits reordering to no more than 6 positions. [sent-279, score-0.55]
80 In this experiment, we disallowed tuples which were discontinuous on the source side. [sent-288, score-0.324]
81 We compare our systems with two Moses systems as baseline, one using no reordering limit (Blno−rl) and one using the default distortion limit of 6 (Blrl−6). [sent-289, score-0.456]
82 Our best system (Twno−rl), which uses no hard reordering limit, gives statistically significant (p < 0.05) improvements in BLEU over Moses. [sent-291, score-0.298]
83 The results for Moses drop by more than a BLEU point without the reordering limit (see Blno−rl in Table 3). [sent-293, score-0.359]
84 In another experiment, we tested our system also with tuples which were discontinuous on the source side. [sent-295, score-0.324]
85 These gappy translation units neither improved the performance of the system with a hard reordering limit (Twrl−6−asg) nor that of the system without a reordering limit (Twno−rl−asg), as Table 4 shows. [sent-296, score-1.085]
86 In an analysis of the output we found two reasons for this result: (i) Using tuples with source gaps exponentially increases the list of extracted n-best translation tuples, which makes the search problem even more difficult. [sent-297, score-0.7]
87 (ii) The future cost is poorly estimated in the case of tuples with gappy source cepts, causing search errors. [sent-299, score-0.381]
88 In an experiment, we deleted gappy tuples with … (Footnote 13: We used Kevin Gimpel's implementation of pairwise bootstrap resampling (Koehn, 2004b), 1000 samples.) [sent-300, score-0.287]
89 We found that results improved (Twno−rl−hsg and Twrl−6−hsg in Table 4) compared to the version using all gaps (Twno−rl−asg, Twrl−6−asg), and are closer to the results without discontinuous tuples (Twno−rl and Twrl−6 in Table 3). [sent-308, score-0.357]
90 Example 1 in Figure 4 shows the powerful reordering mechanism of our model which moves the English verb phrase “do not want to negotiate” to its correct position between the subject “they” and the prepositional phrase “about concrete figures”. [sent-310, score-0.412]
91 Notice that although our model is using smaller translation units “nicht – do not”, “verhandlen – negotiate” and “wollen – want to”, it is able to memorize the phrase translation “nicht verhandlen wollen – do not want to negotiate” as a sequence of translation and reordering operations. [sent-312, score-1.035]
92 It learns the reordering of “verhandlen – negotiate” and “wollen – want to” and also captures dependencies across phrase boundaries. [sent-313, score-0.298]
93 Example 2 shows how our system without a reordering limit moves the English translation “vote” of the German clause-final verb “stimmen” across about 20 English tokens to its correct position behind the auxiliary “would”. [sent-314, score-0.601]
94 Example 3 shows how the system with gappy tuples translates a German sentence with the particle verb “kehrten ... zurück”. [sent-315, score-0.335]
95 The system without gappy units happens to produce the same translation by translating “kehrten” to “returned” and deleting the particle “zurück” (solid lines). [sent-320, score-0.454]
96 This is surprising because the operations for translating “kehrten” to “returned” and for deleting the particle are too far apart to influence each other in an n-gram model. [sent-321, score-0.37]
97 In contrast to N-gram-based MT, our model has a generative story which tightly couples translation and reordering. [sent-326, score-0.266]
98 Our model is able to correctly reorder words across large distances, and it memorizes frequent phrasal translations including their reordering as probable operation sequences. [sent-328, score-0.497]
99 Edinburgh’s submission to all tracks of the WMT 2009 shared task with reordering and speed improvements to Moses. [sent-384, score-0.298]
100 Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. [sent-461, score-0.297]
wordName wordTfidf (topN-words)
[('reordering', 0.298), ('operation', 0.283), ('german', 0.251), ('cept', 0.242), ('gap', 0.216), ('jump', 0.216), ('hat', 0.17), ('translation', 0.167), ('tuples', 0.156), ('gelesen', 0.149), ('crego', 0.148), ('gappy', 0.131), ('gaps', 0.127), ('eine', 0.112), ('twno', 0.112), ('operations', 0.111), ('moses', 0.11), ('rl', 0.108), ('source', 0.094), ('cepts', 0.093), ('twrl', 0.093), ('jumps', 0.09), ('reorderings', 0.09), ('phrasal', 0.088), ('generate', 0.085), ('tuple', 0.082), ('eaten', 0.082), ('position', 0.075), ('gegessen', 0.074), ('hsg', 0.074), ('discontinuous', 0.074), ('penalty', 0.073), ('fj', 0.071), ('monolingual', 0.069), ('units', 0.069), ('asg', 0.066), ('josep', 0.066), ('negotiate', 0.066), ('generated', 0.062), ('limit', 0.061), ('insert', 0.061), ('bonus', 0.06), ('pizza', 0.06), ('koehn', 0.06), ('unaligned', 0.059), ('ei', 0.058), ('target', 0.058), ('beispiel', 0.056), ('blno', 0.056), ('buch', 0.056), ('discontinuities', 0.056), ('durrani', 0.056), ('kehrten', 0.056), ('unfolding', 0.056), ('verhandlen', 0.056), ('wollen', 0.056), ('zum', 0.056), ('continue', 0.055), ('sequence', 0.055), ('decoder', 0.054), ('er', 0.054), ('translator', 0.054), ('generative', 0.054), ('smt', 0.054), ('read', 0.052), ('index', 0.05), ('pbsmt', 0.049), ('particle', 0.048), ('queue', 0.048), ('back', 0.048), ('backward', 0.047), ('covered', 0.047), ('alignments', 0.046), ('story', 0.045), ('yj', 0.045), ('deletion', 0.045), ('mt', 0.044), ('forward', 0.044), ('oj', 0.043), ('bilingual', 0.042), ('inserted', 0.041), ('jos', 0.04), ('portland', 0.04), ('mechanism', 0.039), ('tj', 0.039), ('deleting', 0.039), ('prior', 0.038), ('aik', 0.037), ('argmeaxplm', 0.037), ('butterkekse', 0.037), ('hatgegessen', 0.037), ('linearization', 0.037), ('menge', 0.037), ('nadir', 0.037), ('nicht', 0.037), ('zeitung', 0.037), ('idioms', 0.037), ('side', 0.037), ('open', 0.036), ('distortion', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
2 0.24424443 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
3 0.20575489 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phraseto-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semiMarkov model, word-to-phrase and phraseto-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne ? pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
4 0.20384164 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
5 0.18429323 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of longdistance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the documentlevel context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in JapaneseEnglish translation and 1.41% BLEU points in English-Japanese translation.
6 0.17171964 264 acl-2011-Reordering Metrics for MT
7 0.16865161 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
8 0.15682447 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
9 0.14034894 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
10 0.13657475 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
11 0.12822179 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
12 0.12811875 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
13 0.12492356 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
14 0.12445045 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
15 0.12384088 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
16 0.12233678 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
17 0.12008516 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
18 0.11886867 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
19 0.1171935 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
20 0.11009674 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
topicId topicWeight
[(0, 0.246), (1, -0.213), (2, 0.142), (3, 0.129), (4, 0.061), (5, 0.036), (6, -0.005), (7, -0.018), (8, 0.008), (9, 0.052), (10, 0.048), (11, -0.031), (12, 0.012), (13, -0.099), (14, -0.048), (15, -0.003), (16, -0.014), (17, 0.02), (18, -0.076), (19, -0.037), (20, -0.045), (21, 0.026), (22, 0.01), (23, -0.118), (24, -0.078), (25, 0.064), (26, 0.071), (27, 0.043), (28, -0.059), (29, 0.039), (30, 0.101), (31, 0.004), (32, 0.175), (33, -0.07), (34, 0.036), (35, -0.132), (36, -0.07), (37, 0.049), (38, -0.047), (39, 0.059), (40, 0.048), (41, -0.035), (42, -0.065), (43, -0.012), (44, -0.027), (45, -0.034), (46, -0.116), (47, 0.011), (48, -0.055), (49, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.94113088 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
2 0.9296183 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
3 0.87574136 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of longdistance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the documentlevel context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in JapaneseEnglish translation and 1.41% BLEU points in English-Japanese translation.
4 0.77734447 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
Author: Susan Howlett ; Mark Dras
Abstract: There are a number of systems that use a syntax-based reordering step prior to phrasebased statistical MT. An early work proposing this idea showed improved translation performance, but subsequent work has had mixed results. Speculations as to cause have suggested the parser, the data, or other factors. We systematically investigate possible factors to give an initial answer to the question: Under what conditions does this use of syntax help PSMT?
5 0.7417078 264 acl-2011-Reordering Metrics for MT
Author: Alexandra Birch ; Miles Osborne
Abstract: One of the major challenges facing statistical machine translation is how to model differences in word order between languages. Although a great deal of research has focussed on this problem, progress is hampered by the lack of reliable metrics. Most current metrics are based on matching lexical items in the translation and the reference, and their ability to measure the quality of word order has not been demonstrated. This paper presents a novel metric, the LRscore, which explicitly measures the quality of word order by using permutation distance metrics. We show that the metric is more consistent with human judgements than other metrics, including the BLEU score. We also show that the LRscore can successfully be used as the objective function when training translation model parameters. Training with the LRscore leads to output which is preferred by humans. Moreover, the translations incur no penalty in terms of BLEU scores.
6 0.71822548 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
7 0.70577544 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
8 0.70367211 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
10 0.62379336 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
11 0.60491776 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
12 0.59173363 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
13 0.58931059 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
14 0.5793612 151 acl-2011-Hindi to Punjabi Machine Translation System
15 0.56668723 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
16 0.55658376 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
17 0.55435956 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
18 0.54911208 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
19 0.54731429 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
20 0.54653752 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
topicId topicWeight
[(5, 0.044), (17, 0.099), (26, 0.014), (31, 0.016), (37, 0.086), (39, 0.044), (41, 0.075), (49, 0.204), (51, 0.015), (55, 0.046), (59, 0.034), (72, 0.033), (91, 0.028), (96, 0.171), (97, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.83254027 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
2 0.79871774 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
Author: Samuel Brody ; Paul Kantor
Abstract: Common approaches to assessing document quality look at shallow aspects, such as grammar and vocabulary. For many real-world applications, deeper notions of quality are needed. This work represents a first step in a project aimed at developing computational methods for deep assessment of quality in the domain of intelligence reports. We present an automated system for ranking intelligence reports with regard to coverage of relevant material. The system employs methodologies from the field of automatic summarization, and achieves performance on a par with human judges, even in the absence of the underlying information sources.
3 0.73961323 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum
Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
4 0.73893702 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phraseto-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semiMarkov model, word-to-phrase and phraseto-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne ? pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
5 0.73845369 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.
6 0.73804373 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
7 0.73634553 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
8 0.73528808 311 acl-2011-Translationese and Its Dialects
9 0.73468459 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
10 0.73424375 30 acl-2011-Adjoining Tree-to-String Translation
11 0.73374915 61 acl-2011-Binarized Forest to String Translation
12 0.7336666 11 acl-2011-A Fast and Accurate Method for Approximate String Search
13 0.73274446 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
14 0.73236167 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
15 0.73218024 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
16 0.73112249 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
17 0.7307812 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
19 0.73075628 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
20 0.72995895 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules