emnlp emnlp2013 emnlp2013-175 knowledge-graph by maker-knowledge-mining

175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation


Source: pdf

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a simple and novel classifier-based preordering approach. [sent-3, score-0.611]

2 Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. [sent-4, score-0.777]

3 Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. [sent-5, score-1.124]

4 We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. [sent-6, score-0.611]

5 For languages from different families the improvements often exceed 2 BLEU. [sent-9, score-0.192]

6 Lexical reordering approaches (Tillmann, 2004; Zens and Ney, 2006) add a reordering component to standard phrase-based translation systems (Och and Ney, 2004). [sent-13, score-0.84]

7 Because the reordering model is trained discriminatively, it can use a rich set of lexical features. [sent-14, score-0.408]

8 However, it only has access to the local context, which is often insufficient to make the long-distance reordering decisions that are necessary for language pairs with significantly different word order. [sent-15, score-0.434]

9 Because preordering is performed prior to word alignment, it can improve the alignment process and can then be combined with any subsequent translation model. [sent-20, score-0.76]

10 Most preordering models use a source-side syntactic parser and perform a series of tree transformations. [sent-21, score-0.712]

11 The reordering operation is then to sort the words according to their assigned values. [sent-37, score-0.371]
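
For context, this ranking-style reordering operation can be sketched in a few lines; the sketch below is an illustration only, and the `reorder_by_value` helper and its toy values are assumptions, not the method of this paper or of the cited systems.

```python
# A minimal sketch of value-based preordering, assuming each source word has
# already been assigned a numeric value by some ranking model. The helper and
# the toy values are illustrative, not an API from the paper.

def reorder_by_value(words, values):
    """Sort words by their assigned values; sorted() is stable, so ties
    keep the original source order."""
    return [w for _, w in sorted(zip(values, words), key=lambda p: p[0])]

# Toy values chosen to move the verb to the end, as for an SOV target.
print(reorder_by_value(["cat", "climbed", "tree"], [0, 2, 1]))
# -> ['cat', 'tree', 'climbed']
```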

12 In this work we present a simple classifier-based preordering model. [sent-44, score-0.611]

13 Our model operates over dependency parse trees and is therefore able to make long-distance reordering decisions, as is typical for preordering models. [sent-45, score-1.072]

14 But instead of deterministic rules or ranking functions, we use discriminative classifiers to directly predict the final word order, using rich (bi-)lexical and syntactic features. [sent-46, score-0.213]

15 The first model uses a classifier to directly predict the permutation order in which a family of words (a head word and all its children) will appear on the target side. [sent-48, score-0.455]

16 In the first step, for each child word a binary classifier decides whether it appears before or after its parent in the target language. [sent-54, score-0.228]

17 We present experiments on 22 language pairs from different language families using our preordering approach in a phrase-based system (Och and Ney, 2004), as well as a forest-to-string system (Zhang et al. [sent-57, score-0.654]

18 In a second set of experiments, we use automatically mined parallel data from the web and build translation systems for languages from various language families. [sent-63, score-0.218]

19 Finally, we compare training the preordering classifiers on small amounts of manually aligned data to training on large quantities of automatically aligned data for English to Arabic, Hebrew, and Japanese. [sent-67, score-0.966]

20 When evaluated on a pure reordering task, the models trained on manually aligned data perform slightly better, but similar BLEU scores are obtained in both scenarios on an end-to-end translation task. [sent-68, score-0.616]

21 For example, when translating the English sentence: The black cat climbed to the tree top. [sent-70, score-0.383]

22 to Spanish, we would like to reorder it as: The cat black climbed to the top tree. [sent-71, score-0.274]

23 When translating to Japanese, we would like to get: The black cat the tree top to climbed. [sent-72, score-0.235]

24 For each head word we determine the order of the head and its children (independently of other decisions) and continue the traversal recursively in that order. [sent-76, score-0.443]

25 In the example, we first need to decide on the order of the head “climbed” and the children “cat”, “to”, and “.”. [sent-77, score-0.296]
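
A hedged sketch of this recursive traversal is given below, assuming a minimal `Node` type; `predict_order` is a placeholder where the paper's trained classifiers would plug in.

```python
# Recursive preordering over a dependency tree. Node and predict_order() are
# illustrative stand-ins; only the traversal scheme follows the paper.

from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)

def predict_order(family):
    # Placeholder: a trained classifier would predict the target-side order
    # of the head and its children here; we return the family unchanged.
    return family

def reorder(node):
    """Order each family (head + children) independently, then recurse."""
    family = predict_order([node] + node.children)
    words = []
    for member in family:
        if member is node:
            words.append(node.word)        # emit the head word itself
        else:
            words.extend(reorder(member))  # recurse into the child's subtree
    return words

# The running example: head "climbed" with children "cat", "to", and ".".
tree = Node("climbed", [Node("cat", [Node("The"), Node("black")]),
                        Node("to", [Node("top", [Node("the"), Node("tree")])]),
                        Node(".")])
print(reorder(tree))  # with the identity placeholder: head-first order
```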

26 1 Classification Model & Features The reordering decisions are made by multi-class classifiers where class labels correspond to permutation sequences. [sent-80, score-0.622]

27 Crucially, we do not learn explicit tree transformation rules, but let the classifiers learn to trade off between a rich set of overlapping features. [sent-82, score-0.19]

28 2 Training Data The training data for the classifiers is generated from the word-aligned parallel text. [sent-101, score-0.243]

29 For every family in the source dependency tree we generate a training instance if and only if the intersection defines a full order on the source words: • Every source word must be aligned to at least one target word. [sent-104, score-0.294]

30 [Figure: dependency parse of “The black cat climbed to the tree top.” with arcs det, amod, nsubj, prep, pobj, det, and nn under the ROOT “climbed”] [sent-105, score-0.336]

31 • If a source word is aligned to multiple target words, then no target word in this range can be aligned to a different source word. [sent-110, score-0.292]
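
These two conditions can be checked mechanically from the alignment; the sketch below is an assumption-laden illustration (the `family_order` helper and its pair-list alignment format are not from the paper).

```python
# Decide whether a family's alignment defines a full order on its source
# words, per the two conditions above. Alignments are (source, target)
# index pairs; the representation is an assumption for illustration.

def family_order(source_indices, alignments):
    """Return the family's source indices sorted by target position,
    or None if the alignment does not define a full order."""
    spans = {}
    for s, t in alignments:
        if s in source_indices:
            lo, hi = spans.get(s, (t, t))
            spans[s] = (min(lo, t), max(hi, t))
    # Condition 1: every source word is aligned to at least one target word.
    if set(spans) != set(source_indices):
        return None
    # Condition 2: target ranges of different source words must not overlap.
    ordered = sorted(source_indices, key=lambda s: spans[s])
    for a, b in zip(ordered, ordered[1:]):
        if spans[a][1] >= spans[b][0]:
            return None
    return ordered

print(family_order([0, 1, 2], [(0, 0), (1, 2), (2, 1)]))  # -> [0, 2, 1]
print(family_order([0, 1, 2], [(1, 2), (2, 1)]))          # -> None (0 unaligned)
```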

32 In particular, we do not need to extract training instances for all words in a given sentence since the reordering decisions are made independently for every head word. [sent-112, score-0.581]

33 In some languages (e.g., Russian or Japanese) the determiner “the” may either not be aligned to any word or get aligned to the foreign word for “boy”. [sent-117, score-0.253]

34 First, there are instances where the English word “the” gets aligned to something (perhaps a preposition), and second, since the word “the” is omitted in the target language, its location in the reordered sentence is not very important. [sent-121, score-0.239]

35 The other direction is also true: if we run preordering on the source side, then the alignment task becomes easier and tends to produce better results. [sent-123, score-0.693]

36 Therefore it can be useful to iterate between generating the alignment and learning a preordering model. [sent-124, score-0.662]

37 That is: create the alignment, train a preordering model, use the preordering model to learn a new alignment, and then train the final preordering model. [sent-127, score-1.833]
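
A minimal sketch of this iterate-and-retrain loop is shown below; `align` and `train_preorderer` are no-op stand-ins for components the paper delegates to standard tools (e.g., an off-the-shelf word aligner).

```python
# Alternate between word alignment and preordering-model training. The stub
# components are placeholders so the sketch is self-contained and runnable.

def align(corpus):
    """Stub word aligner: returns an empty alignment per sentence pair."""
    return [[] for _ in corpus]

def train_preorderer(corpus, alignments):
    """Stub trainer: returns an identity reorderer."""
    class Identity:
        def reorder(self, source_words):
            return source_words
    return Identity()

def bootstrap(corpus, rounds=2):
    """Align, train a preorderer, reorder the source side, repeat."""
    model = None
    for _ in range(rounds):
        alignments = align(corpus)                    # (re-)align the data
        model = train_preorderer(corpus, alignments)  # train on the alignments
        corpus = [(model.reorder(src), tgt) for src, tgt in corpus]
    return model                                      # final preordering model
```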

38 3 1-Step Classifier As a first approach we use a single classifier to directly predict the correct permutation of a given family. [sent-129, score-0.206]

39 A possible outcome of the classifier can be the permutation 0-2-1-3, representing the order “cat”, “to”, “climbed”, and “.”. [sent-133, score-0.169]

40 The number of permutations for the head and n children is of course (n + 1)! [sent-135, score-0.337]
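
For illustration, applying such a permutation label is straightforward; the 0-2-1-3 label format follows the example above, while `apply_permutation` itself is a hypothetical helper, not the paper's code.

```python
# Apply a 1-step permutation label to a family given in source order.

import math

def apply_permutation(family, label):
    """Rearrange a family of words according to a permutation label."""
    return [family[int(i)] for i in label.split("-")]

family = ["cat", "climbed", "to", "."]       # source order; head is "climbed"
print(apply_permutation(family, "0-2-1-3"))  # -> ['cat', 'to', 'climbed', '.']
print(math.factorial(len(family)))           # (n + 1)! = 24 classes for n = 3
```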

41 The decision depends on the adjective itself and sometimes the head noun, but does not depend on other children. [sent-142, score-0.197]

42 Ideally, if for some adjective we have enough examples with 1 or 2 children, we would like to make the same decision for a larger number of children, but these classifiers may not have enough relevant examples. [sent-143, score-0.297]

43 4 2-Step Classifier Our 2-step approach addresses the exponential blowup in the number of children by decomposing the prediction into two steps: 1. [sent-145, score-0.183]

44 Determine the order of the children that appear before the head and the order of the children after the head. [sent-148, score-0.478]

45 The two steps make the reordering of the modifiers before and after the head independent of each other, which is reminiscent of the lexicalized parse tree generation approach of Collins (1997). [sent-149, score-0.651]

46 In the running example, for the head “climbed” we might first make the following three binary decisions: the word “cat” should appear before the head and the words “to” and “.” after it. [sent-150, score-0.327]

47 The first step is implemented using a binary classifier, called the pivot classifier (since the head functions like the pivot in quicksort). [sent-154, score-0.328]

48 The second-step classifiers directly predict the correct permutation of the children before/after the head. [sent-155, score-0.374]

49 In the worst case there are still 4! = 24 outcomes (if all the children are on one side of the head); if we are lucky and the children split evenly, then we only need two binary decisions in the second step (for the two pairs before and after the head). [sent-160, score-0.392]
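
A hedged sketch of the 2-step prediction on the running example follows; `pivot_before` and `side_order` are toy stand-ins for the trained pivot classifier and the per-side permutation classifiers.

```python
# Two-step reordering of one family: first split the children around the
# head with a binary pivot decision, then permute each side separately.

def pivot_before(head, child):
    """Step 1 (pivot classifier): does `child` precede `head`?
    Toy rule matching the running example."""
    return child == "cat"

def side_order(children):
    """Step 2: permute the children on one side of the head (identity stub)."""
    return children

def two_step_order(head, children):
    before = [c for c in children if pivot_before(head, c)]
    after = [c for c in children if not pivot_before(head, c)]
    return side_order(before) + [head] + side_order(after)

print(two_step_order("climbed", ["cat", "to", "."]))
# -> ['cat', 'climbed', 'to', '.']
```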

50 In addition to the regular distance distortion model, we incorporate a maximum entropy based lexicalized phrase reordering model (Zens and Ney, 2006). [sent-166, score-0.405]

51 We then add our 1-step and 2-step preordering classifiers as preprocessing steps at training and test time. [sent-171, score-0.709]

52 We train the reordering classifiers on up to 15M training instances. [sent-172, score-0.469]

53 In our implementation, in the 1-step approach we did not do any reordering for nodes with 7 or more children. [sent-174, score-0.371]

54 In the 2-step approach we did not reorder the children on either side of the head if there were 7 or more of them. [sent-175, score-0.327]

55 There were very few cases where children were not reordered because of these thresholds; many of them corresponded to bad parses, and they had very little impact on the final scores. [sent-177, score-0.213]

56 Thus, for the 1-step approach we had 6 classifiers: 1 binary classifier for a head and a single child, and 5 multi-class classifiers for 3–7 words. [sent-178, score-0.437]

57 For a direct comparison to a strong preordering system, we compare to the system of Genzel (2010), which learns a set of unlexicalized reordering rules from automatically aligned data by minimizing the number of crossing alignments. [sent-180, score-1.139]

58 There are no length-based reordering constraints in the forest-to-string system. [sent-197, score-0.371]

59 2 Additional Languages In our second set of experiments, we explore the impact of classifier preordering for a number of languages with different word orders. [sent-203, score-0.775]

60 Some of the languages included in our study are verb-subject-object (VSO) languages (Arabic, Irish, Welsh), subject-object-verb (SOV) languages (Japanese, Korean), and fairly free word order languages (Dutch, Hungarian). [sent-204, score-0.34]

61 Lexical reordering is included where it helps, but typically makes only a small difference. [sent-213, score-0.371]

62 This potentially underestimates the improvements that can be obtained, but also eliminates MERT as a possible source of improvement, allowing us to trace back improvements in translation quality directly to changes in preordering of the input data. [sent-217, score-0.837]

63 Lexical reordering (Zens and Ney, 2006) never hurts and is thus included in all systems. [sent-253, score-0.371]

64 The 2-step classifier preordering approach provides statistically significant improvements over the lexical reordering baseline on three out of the eight language pairs: English-Spanish (en-es: 1. [sent-255, score-1.162]

65 While the forest-to-string system is capable of performing long-distance reordering in the decoder, it appears that an explicitly trained lexicalized preordering model can provide complementary benefits. [sent-261, score-1.016]

66 For the Romance languages (Spanish and French), word ordering depends heavily on lexical choice, which is captured by the lexical features in our classifiers. [sent-263, score-0.196]

67 The base system includes a distance distortion model; the lexical system adds lexical reordering; rule is the rule preordering system of Genzel (2010) plus lexical reordering; 1-step and 2-step are our classifier-based systems plus lexical reordering. [sent-265, score-0.831]

68 Compared to a state-of-the-art preordering system, the automatic rule extraction system of Genzel (2010), we observe significant gains in several cases and no losses at all. [sent-269, score-0.729]

69 Comparing the different languages, Czech (cs) appears the most immune to improvements from preordering (and lexical reordering). [sent-271, score-0.712]

70 It is therefore difficult to learn reordering changes from English to Czech. [sent-273, score-0.371]

71 The SOV languages Korean (ko) and Japanese (ja) benefit the most from preordering and gain more than 7 BLEU relative to the phrase-based baseline and still more than 3 BLEU for the forest-to-string system. [sent-279, score-0.696]

72 The benefits of our 2-step approach over the 1-step approach become apparent on this set of languages, where reordering is most important. [sent-289, score-0.456]

73 Lexical reordering is not included in any of the systems. [sent-296, score-0.371]

74 The gains relative to the rule reordering system of Genzel (2010) and the no-preordering baseline are even larger and therefore clearly also significant. [sent-301, score-0.489]

75 In all cases but English-Hungarian we observe significant improvements over the no-preordering baseline. [sent-303, score-0.675]

76 It should be noted that the gains are not symmetric: sometimes there are larger gains for translating out of English, while for Hungarian the gains are higher for translating into English. [sent-304, score-0.34]

77 For Dutch-English, the forest-to-string system yields the best results, which was also the case for German-English, further supporting the observation that combining different types of syntactic reordering approaches can be beneficial. [sent-306, score-0.371]

78 Lexical reordering is not used for any language pair. [sent-312, score-0.371]

79 The BLEU scores in Table 5 show that training from small amounts of manually aligned data or large amounts of automatically aligned data results in models of similar quality. [sent-316, score-0.257]

80 In absolute terms, the reordering accuracy is around 80% for Arabic and Japanese and close to 90% for Hebrew. [sent-319, score-0.371]

81 We also examined the accuracy of the individual classifiers and found that the pivot classifier has an accuracy around 95%. [sent-321, score-0.228]

82 It is therefore unlikely that a word is reordered to the wrong side of its head in the 2-step reordering approach. [sent-322, score-0.613]

83 5 Analysis In this section, we analyze an example whose translation is significantly improved by our preordering approach, demonstrating the usefulness of our lexicalized features. [sent-324, score-0.743]

84 In our experiments the rule-based approach of Genzel (2010) reordered the source sentence into: It was a whirlwind real. [sent-340, score-0.175]

85 The head “whirlwind” is a noun and the child “real” is an adjective; since adjectives typically appear after nouns in Spanish, their order is reversed. [sent-345, score-0.332]

86 In Table 6 we consider the 3 strongest features in favor of the child “real” appearing after the head “whirlwind” and the three strongest features in favor of the child appearing before the head. [sent-347, score-0.373]

87 Recall that the pivot is a binary classifier: positive features support one decision (in our case: the child should be after the head) and the negative features support the other decision (the child should be before the head). [sent-348, score-0.277]

88 It is interesting to note that for this particular ordering decision the child word is much more informative than the head word and indeed, all the important features contain information about the child and none of them contains any information about the head. [sent-351, score-0.41]

89 6 Conclusions & Future Work We presented a simple and novel preordering approach that produces substantial improvements in translation accuracy on a large number of languages. [sent-352, score-0.773]

90 We use a source-side syntactic parser and train discriminative classifiers to predict the order of a parent and its children in the target language, using features from the dependency tree as well as (bi-)lexical features. [sent-353, score-0.505]

91 5 BLEU over a strong directly comparable preordering system that is based on learning unlexicalized reordering rules. [sent-359, score-0.982]

92 , 2006; Dyer and Resnik, 2010) we note that both approaches use syntactic information for reordering decisions. [sent-365, score-0.371]

93 Because preordering is performed before learning word alignments, it has the potential to improve the word alignments. [sent-371, score-0.611]

94 Finally, preordering can be combined with syntaxbased translation models and our results confirm the complementary benefits that can be obtained. [sent-373, score-0.709]

95 Compared to other preordering models, our approach has the obvious problem of having to make predictions over an exponential set of permutations. [sent-374, score-0.645]

96 Compared to preordering systems that use ranking functions, our model has the advantage that it can encode information about the complete permutation. [sent-383, score-0.611]

97 NoNextSibling and NoNextHeadSibling mean that the child and head do not have a sibling to the right. [sent-396, score-0.295]

98 Promising directions for future work are joint parsing and reordering models, and measuring the influence of parsing accuracy on preordering and final translation quality. [sent-398, score-1.08]

99 Automatically learning source-side reordering rules for large scale machine translation. [sent-511, score-0.418]

100 A ranking-based approach to word reordering for statistical machine translation. [sent-752, score-0.401]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('preordering', 0.611), ('reordering', 0.371), ('children', 0.149), ('climbed', 0.148), ('head', 0.147), ('bleu', 0.139), ('prevchild', 0.129), ('wmt', 0.119), ('child', 0.113), ('whirlwind', 0.111), ('aligned', 0.11), ('translation', 0.098), ('classifiers', 0.098), ('genzel', 0.096), ('cat', 0.093), ('permutation', 0.09), ('languages', 0.085), ('japanese', 0.084), ('gains', 0.082), ('classifier', 0.079), ('hungarian', 0.077), ('korean', 0.07), ('reordered', 0.064), ('malay', 0.064), ('improvements', 0.064), ('decisions', 0.063), ('tree', 0.062), ('arabic', 0.057), ('hebrew', 0.056), ('sov', 0.055), ('welsh', 0.055), ('dependency', 0.053), ('treebank', 0.052), ('alignment', 0.051), ('pivot', 0.051), ('petrov', 0.051), ('adjective', 0.05), ('det', 0.049), ('nivre', 0.048), ('talbot', 0.048), ('irish', 0.048), ('mccord', 0.048), ('ja', 0.048), ('rewrite', 0.048), ('ney', 0.047), ('translating', 0.047), ('rules', 0.047), ('abeill', 0.044), ('xia', 0.043), ('families', 0.043), ('dutch', 0.043), ('indonesian', 0.041), ('permutations', 0.041), ('bolded', 0.041), ('portuguese', 0.041), ('iw', 0.041), ('spanish', 0.04), ('parser', 0.039), ('noun', 0.039), ('uszkoreit', 0.039), ('nn', 0.038), ('manually', 0.037), ('predict', 0.037), ('della', 0.037), ('pietra', 0.037), ('prevsibling', 0.037), ('vso', 0.037), ('lexical', 0.037), ('ordering', 0.037), ('parse', 0.037), ('target', 0.036), ('rule', 0.036), ('shared', 0.036), ('parallel', 0.035), ('zens', 0.035), ('sibling', 0.035), ('english', 0.034), ('exponential', 0.034), ('lexicalized', 0.034), ('czech', 0.034), ('treebanks', 0.034), ('black', 0.033), ('family', 0.033), ('appear', 0.033), ('determiner', 0.033), ('real', 0.032), ('heuristics', 0.032), ('raising', 0.032), ('boy', 0.032), ('uri', 0.032), ('och', 0.032), ('discriminative', 0.031), ('side', 0.031), ('zhang', 0.031), ('statistical', 0.03), ('transformations', 0.03), ('neubig', 0.029), ('tromble', 0.029), ('glish', 0.029), ('jj', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

2 0.30315334 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

3 0.24011338 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

4 0.19940871 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata

Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.

5 0.18251355 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.

6 0.16147508 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

7 0.14231043 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

8 0.12017861 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

9 0.10928002 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

10 0.1085435 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

11 0.10743923 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

12 0.10295743 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

13 0.10083625 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

14 0.10078106 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

15 0.092465758 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

16 0.086931318 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

17 0.086532101 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging

18 0.086317509 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

19 0.076399453 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases

20 0.075022385 201 emnlp-2013-What is Hidden among Translation Rules


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.269), (1, -0.28), (2, 0.106), (3, 0.021), (4, 0.039), (5, -0.06), (6, -0.078), (7, -0.107), (8, 0.027), (9, 0.039), (10, -0.009), (11, 0.092), (12, -0.053), (13, -0.082), (14, -0.157), (15, 0.21), (16, -0.077), (17, -0.109), (18, -0.061), (19, -0.05), (20, 0.01), (21, 0.013), (22, 0.098), (23, 0.144), (24, 0.131), (25, 0.022), (26, -0.125), (27, 0.027), (28, -0.038), (29, 0.057), (30, 0.016), (31, 0.131), (32, -0.004), (33, 0.02), (34, -0.059), (35, 0.063), (36, -0.06), (37, -0.021), (38, 0.058), (39, 0.054), (40, -0.003), (41, 0.047), (42, -0.046), (43, 0.041), (44, 0.086), (45, 0.014), (46, -0.045), (47, 0.047), (48, -0.041), (49, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93082774 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

2 0.82969564 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata

Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.

3 0.77220035 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

4 0.71942925 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

5 0.69115639 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

6 0.62281972 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

7 0.54890347 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

8 0.47415179 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

9 0.42942128 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

10 0.42112041 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

11 0.4107556 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

12 0.40722746 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

13 0.4050734 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

14 0.40475136 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time

15 0.390816 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

16 0.37470984 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers

17 0.36870518 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

18 0.36734685 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

19 0.36412665 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English

20 0.36031693 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.034), (10, 0.011), (18, 0.031), (22, 0.047), (26, 0.011), (30, 0.124), (45, 0.023), (50, 0.04), (51, 0.156), (64, 0.213), (66, 0.038), (71, 0.022), (75, 0.032), (77, 0.094), (90, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81234866 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

2 0.80968714 87 emnlp-2013-Fish Transporters and Miracle Homes: How Compositional Distributional Semantics can Help NP Parsing

Author: Angeliki Lazaridou ; Eva Maria Vecchi ; Marco Baroni

Abstract: In this work, we argue that measures that have been shown to quantify the degree of semantic plausibility of phrases, as obtained from their compositionally-derived distributional semantic representations, can resolve syntactic ambiguities. We exploit this idea to choose the correct parsing of NPs (e.g., (live fish) transporter rather than live (fish transporter)). We show that our plausibility cues outperform a strong baseline and significantly improve performance when used in combination with state-of-the-art features.

3 0.80162537 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

Author: Xiaoning Zhu ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Conghui Zhu ; Tiejun Zhao

Abstract: This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a

4 0.70449412 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

Author: Ann Irvine ; Chris Quirk ; Hal Daume III

Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.

5 0.70075822 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu

Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significantly improvements over the constituency-to-string (+2.45 BLEU on average) and dependencyto-string (+0.91 BLEU on average) models, which only employ single type of trees, and significantly outperforms the state-of-theart hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.

6 0.70013767 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

7 0.69990903 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

8 0.69867545 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

9 0.69786924 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

10 0.69592696 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

11 0.68964434 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

12 0.68922853 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

13 0.68637246 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

14 0.68155968 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

15 0.67324048 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing

16 0.67314714 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

17 0.67311811 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

18 0.67284411 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

19 0.67164338 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

20 0.67015904 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification