emnlp emnlp2013 emnlp2013-57 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. [sent-2, score-1.003]
2 We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. [sent-3, score-1.656]
3 1 Introduction State-of-the-art machine translation (MT) systems apply statistical techniques to learn translation rules from large amounts of parallel data. [sent-7, score-0.659]
4 In general, it is easier to obtain non parallel data. [sent-9, score-0.287]
5 The ability to build a machine translation system using monolingual data could alleviate problems caused by insufficient parallel data. [sent-10, score-0.521]
6 Towards building a machine translation system without a parallel corpus, Klementiev et al. [sent-11, score-0.413]
7 (2012) use non parallel data to estimate parameters for a large scale MT system. [sent-12, score-0.287]
8 Other work tries to learn full MT systems using only non parallel data through decipherment (Ravi and Knight, 2011; Ravi, 2013). [sent-13, score-0.786]
9 Given that we often have some parallel data, it is more practical to improve a translation system trained on parallel corpora with non parallel data. Figure 1: Improving machine translation with decipherment (Grey boxes represent new data and process). [sent-15, score-1.598]
10 Mono: monolingual; LM: language model; LEX: translation lexicon; TM: translation model. [sent-16, score-0.398]
11 Dou and Knight (2012) successfully apply decipherment to learn a domain-specific translation lexicon from monolingual data to improve out-of-domain machine translation. [sent-18, score-0.937]
12 Moreover, the non parallel data used in their experiments is created from a parallel corpus. [sent-20, score-0.457]
13 In this work, we improve previous work by Dou and Knight (2012) using genuinely non parallel data, [sent-22, score-0.341]
14 and propose a framework to improve a machine translation system trained with a small amount of parallel data. [sent-24, score-0.443]
15 As shown in Figure 1, we use a lexicon learned from decipherment to improve translations of both observed and out-of-vocabulary (OOV) words. [sent-25, score-0.777]
16 The main contributions of this work are: • We extract bigrams based on dependency relations for decipherment, which improves the state-of-the-art deciphering accuracy by over 500%. [sent-26, score-0.721]
17 • We demonstrate how to improve translations of words observed in parallel data by using a translation lexicon obtained from large amounts of non parallel data. [sent-27, score-0.954]
18 • We show that decipherment is able to find correct translations for OOV words. [sent-28, score-0.499]
19 • We use a translation lexicon learned by deciphering large amounts of non parallel data to improve a phrase-based MT system trained with limited amounts of parallel data. [sent-29, score-1.268]
20 2 Previous Work Motivated by the idea that a translation lexicon induced from non parallel data can be applied to MT, a variety of prior research has tried to build a translation lexicon from non parallel or comparable data (Rapp, 1995; Fung and Yee, 1998; Koehn and Knight, 2002; Haghighi et al. [sent-33, score-1.221]
21 Although previous work is able to build a translation lexicon without parallel data, little has used the lexicon to improve machine translation. [sent-36, score-0.668]
22 There has been increasing interest in learning translation lexicons from non parallel data with decipherment techniques (Ravi and Knight, 2011; Dou and Knight, 2012; Nuhn et al. [sent-37, score-1.015]
23 Decipherment views one language as a cipher for another and learns a translation lexicon that produces a good decipherment. [sent-39, score-0.399]
24 In an effort to build an MT system without a parallel corpus, Ravi and Knight (2011) view Spanish as a cipher for English and apply Bayesian learning to directly decipher Spanish into English. [sent-40, score-0.409]
25 Dou and Knight (2012) propose two techniques to make Bayesian decipherment scalable. [sent-42, score-0.499]
26 Reducing a ciphertext to a set of bigrams with counts significantly reduces the amount of cipher data. [sent-44, score-0.4]
27 According to Dou and Knight (2012), a ciphertext bigram F is generated through the following generative story: • Generate a sequence of two plaintext tokens e1e2 with probability P(e1e2) given by a language model built from large numbers of plaintext bigrams. [sent-45, score-0.375]
28 The probability of any cipher bigram F is: $P(F) = \sum_{e_1} \sum_{e_2} P(e_1 e_2) \prod_{i=1}^{2} P(f_i \mid e_i)$. Given a corpus of N cipher bigrams F1. [sent-47, score-0.49]
29 However, EM has time complexity O(N · Ve^2) and space complexity O(Vf · Ve), where Vf, Ve are the sizes of the ciphertext and plaintext vocabularies respectively, and N is the number of cipher bigrams. [sent-53, score-0.24]
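The Ve^2 term in this complexity comes from summing over all plaintext word pairs for every cipher bigram in the formula above. A minimal sketch of that brute-force computation, with `lm` and `channel` as hypothetical dictionaries holding P(e1e2) and P(f|e) (an illustration under these assumptions, not the authors' implementation):

```python
from itertools import product

def cipher_bigram_prob(f1, f2, lm, channel, plaintext_vocab):
    """P(F) = sum over (e1, e2) of P(e1 e2) * P(f1|e1) * P(f2|e2).

    lm: dict mapping a plaintext bigram (e1, e2) to P(e1 e2).
    channel: dict mapping (f, e) to the substitution probability P(f|e).
    plaintext_vocab: iterable of plaintext word types (size Ve).
    """
    total = 0.0
    for e1, e2 in product(plaintext_vocab, repeat=2):
        p_lm = lm.get((e1, e2), 0.0)
        if p_lm == 0.0:
            continue  # unseen plaintext bigrams contribute nothing
        total += p_lm * channel.get((f1, e1), 0.0) * channel.get((f2, e2), 0.0)
    return total
```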
30 At the end of sampling, P(fi|ei) is estimated by: $P(f_i \mid e_i) = \frac{\mathrm{count}(f_i, e_i)}{\mathrm{count}(e_i)}$. However, Bayesian decipherment is still very slow with Gibbs sampling (Geman and Geman, 1987), as each sampling step requires considering Ve possibilities. [sent-58, score-0.649]
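The count-based estimate above is straightforward to compute once the sampler has assigned a plaintext token to each cipher token. A minimal sketch, assuming `samples` is a hypothetical list of (cipher_token, plaintext_token) pairs collected at the end of Gibbs sampling:

```python
from collections import Counter

def estimate_channel(samples):
    """Estimate P(f|e) = count(f, e) / count(e) from sampled assignments.

    samples: iterable of (f, e) pairs, one per cipher token occurrence.
    """
    samples = list(samples)
    pair_counts = Counter(samples)
    e_counts = Counter(e for _, e in samples)
    return {(f, e): n / e_counts[e] for (f, e), n in pair_counts.items()}
```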
31 3 From Adjacent Bigrams to Dependency Bigrams A major limitation of work by Dou and Knight (2012) is their monotonic generative story for deciphering adjacent bigrams. [sent-60, score-0.423]
32 While the generation process works well for deciphering similar languages (e. [sent-61, score-0.358]
33 In this section, we first look at why adjacent bigrams are bad for decipherment. [sent-66, score-0.343]
34 The left column in Table 1 contains adjacent bigrams extracted from the Spanish phrase “misión de naciones unidas en oriente medio”. [sent-68, score-0.534]
35 The correct decipherment for the bigram “naciones unidas” should be “united nations”. [sent-69, score-0.559]
36 Since the deciphering model described by Dou and Knight (2012) does not consider word reordering, it needs to decipher the bigram into “nations united” in order to get the right word translations “naciones”→“nations” and “unidas”→“united”. [sent-70, score-0.573]
37 However, the English language model used for decipherment is built from English adjacent bigrams, so it strongly disprefers “nations united” and is not likely to produce a sensible decipherment for “naciones unidas”. [sent-71, score-1.091]
38 Thus, without considering word reordering, the model described by Dou and Knight (2012) is not a good fit for deciphering Spanish into English. [sent-73, score-0.33]
39 However, if we extract bigrams based on dependency relations for both languages, the model fits better. [sent-74, score-0.37]
40 To extract such bigrams, we first use dependency parsers to parse both languages, and extract bigrams by putting head word first, followed by the modifier. [sent-75, score-0.396]
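A minimal sketch of this extraction step, assuming each parsed sentence is available as a list of (token, head_index, relation) triples (a hypothetical format; a real parser's output would need a small adapter):

```python
def extract_dependency_bigrams(parse):
    """Return (head, modifier, relation) bigrams with the head word first.

    parse: list of (token, head_index, relation) triples, where head_index
           is a 0-based index into the same list, or -1 for the root.
    """
    bigrams = []
    for token, head_index, relation in parse:
        if head_index < 0:
            continue  # the root token has no head
        head_word = parse[head_index][0]
        bigrams.append((head_word, token, relation))
    return bigrams

# Example: "naciones unidas" with "unidas" modifying "naciones".
# extract_dependency_bigrams([("naciones", -1, "root"), ("unidas", 0, "mod")])
# -> [("naciones", "unidas", "mod")]
```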
41 The right column in Table 1 lists examples of Spanish dependency bigrams extracted from the same Spanish phrase. [sent-77, score-0.37]
42 With a language model built with English dependency bigrams, the same model used for deciphering adjacent bigrams is able to decipher Spanish dependency bigram “naciones(head) unidas(modifier)” into “nations(head) united(modifier)”. [sent-78, score-1.093]
43 We might instead propose to consider word reordering when deciphering adjacent bigrams (e. [sent-79, score-0.713]
44 However, using dependency bigrams has the following advantages: • First, using dependency bigrams avoids complicating the model, keeping deciphering efficient and scalable. [sent-82, score-1.07]
45 • Furthermore, using dependency bigrams allows us to use dependency types to further narrow down possible decipherments. (Footnote 1: As the use of “del” and “de” in Spanish is much more frequent than the use of “of” in English, we skip those words by using their head words as new heads if any of them serves as a head.) [sent-84, score-0.519]
46 Then all of the following English dependency bigrams are possible decipherments: “accepted(verb) UN(subject)”, “accepted(verb) government(subject)”, “accepted(verb) request(object)”. [sent-87, score-0.37]
47 However, if we know the type of the Spanish dependency bigram and use a language model built with the same type in English, the only possible decipher- ment is “accepted(verb) request(object)”. [sent-88, score-0.204]
48 4 Deciphering Spanish Gigaword In this section, we compare dependency bigrams with adjacent bigrams for deciphering Spanish into English. [sent-90, score-1.043]
49 1 Data We use the Gigaword corpus for our decipherment experiments. [sent-92, score-0.499]
50 We use only the AFP (Agence FrancePresse) section of the corpus in decipherment experiments. [sent-94, score-0.499]
51 The baseline system collects adjacent bigrams and their counts from Spanish and English texts. [sent-103, score-0.367]
52 It then builds an English bigram language model using the English adjacent bigrams and uses it to decipher the Spanish adjacent bigrams. [sent-104, score-0.592]
53 Table 2: Dependency relations divided into three groups (Group 1: Verb/Subject; Group 2: Preposition-Object, Verb/Noun-Object; Group 3: Noun-Modifier). We build the second system, Dependency, using dependency bigrams for decipherment. [sent-105, score-0.399]
54 As the two parsers do not output the same set of dependency relations, we cannot extract all types of dependency bigrams. [sent-106, score-0.266]
55 Instead, we select a subset of dependency bigrams whose dependency relations are shared by the two parser outputs. [sent-107, score-0.49]
56 The third system, DepType, is built using both dependency bigrams and their dependency types. [sent-110, score-0.394]
57 We first extract dependency bigrams for both languages, then group them based on their dependency types. [sent-111, score-0.49]
58 As both parsers treat noun phrases dependent on “del”, “de”, and “of” as prepositional phrases, we choose to divide the dependency bigrams into 3 groups and list them in Table 2. [sent-112, score-0.396]
59 A separate language model is built for each group of English dependency bigrams and used to decipher the group of Spanish dependency bigrams with the same dependency type. [sent-113, score-0.98]
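A minimal sketch of the grouping step; the relation labels in GROUPS are placeholders standing in for whatever labels the two parsers actually share, mapped onto the three groups of Table 2:

```python
from collections import defaultdict

# Hypothetical relation labels mapped onto the three groups of Table 2.
GROUPS = {
    "subj": 1,                # Verb/Subject
    "pobj": 2, "dobj": 2,     # Preposition-Object, Verb/Noun-Object
    "nmod": 3,                # Noun-Modifier
}

def group_bigrams(bigrams):
    """Split (head, modifier, relation) bigrams by group; drop other relations."""
    grouped = defaultdict(list)
    for head, modifier, relation in bigrams:
        group = GROUPS.get(relation)
        if group is not None:
            grouped[group].append((head, modifier))
    return grouped
```

Under this scheme, a separate English bigram language model would be built from each English group and used only against the Spanish bigrams of the same group.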
60 3 Sampling Procedure In experiments, we find that the iterative sampling method described by Dou and Knight (2012) helps improve deciphering accuracy. [sent-118, score-0.435]
61 The details of the new sampling procedure are provided here (a rough code sketch of the whole procedure follows the list below): • Extract dependency bigrams from parsing outputs and collect their counts. [sent-121, score-0.445]
62 • Keep bigrams whose counts are greater than a threshold α. [sent-122, score-0.25]
63 • Then construct a translation table P(e|f) by keeping translation pairs (f, e) seen in more than one decipherment and use the average P(e|f) as the new translation probability. [sent-127, score-0.698]
64 • Lower the threshold α to include more bigrams into the sampling process. [sent-128, score-0.325]
65 • Start 10 different sampling processes again and initialize the first sample using the translation pairs obtained from the previous step (for each Spanish token f, choose an English token e whose P(e|f) is the highest). [sent-129, score-0.274]
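A rough sketch of the whole iterative procedure, assuming a hypothetical run_sampler(bigrams, init_table) that performs one randomly seeded Bayesian decipherment run and returns its P(e|f) table (a simplified outline under these assumptions, not the authors' code):

```python
from collections import defaultdict

def iterative_decipherment(bigram_counts, thresholds, run_sampler, n_runs=10):
    """Decipher repeatedly, lowering the count threshold alpha each round.

    bigram_counts: dict mapping a Spanish dependency bigram to its count.
    thresholds: decreasing sequence of alpha values.
    run_sampler: callable(bigrams, init_table) -> dict mapping (f, e) -> P(e|f).
    """
    init_table, table = None, {}
    for alpha in thresholds:
        kept = [b for b, count in bigram_counts.items() if count > alpha]
        runs = [run_sampler(kept, init_table) for _ in range(n_runs)]
        # Keep pairs seen in more than one decipherment, averaging their P(e|f).
        probs = defaultdict(list)
        for run in runs:
            for pair, p in run.items():
                probs[pair].append(p)
        table = {pair: sum(ps) / len(ps) for pair, ps in probs.items() if len(ps) > 1}
        # Seed the next round: for each Spanish token f, its highest-scoring e.
        best = {}
        for (f, e), p in table.items():
            if f not in best or p > best[f][1]:
                best[f] = (e, p)
        init_table = {f: e for f, (e, _p) in best.items()}
    return table
```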
66 We use type accuracy as our evaluation metric: Given a word type f in Spanish, we find a translation pair (f, e) with the highest average P(e|f) from the translation table learned through decipherment. [sent-135, score-0.419]
67 If the translation pair (f, e) can also be found in a gold translation lexicon Tgold, we treat the word type f as correctly deciphered. [sent-136, score-0.309]
68 We define type accuracy as |C|/|V|, where C is the set of correctly deciphered word types and V the set of word types evaluated. To create Tgold, we use GIZA (Och and Ney, 2003) to align a small amount of Spanish-English parallel text (1 million tokens for each language), and use the lexicon derived from the alignment as our gold translation lexicon. [sent-138, score-0.577]
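A minimal sketch of this evaluation, assuming decipher_table maps (f, e) pairs to their average P(e|f) and gold_pairs is a set of (f, e) pairs read off the GIZA alignment (the names are illustrative):

```python
def type_accuracy(decipher_table, gold_pairs, eval_types):
    """|C| / |V|: the fraction of evaluated Spanish word types deciphered correctly."""
    best = {}
    for (f, e), p in decipher_table.items():
        if f not in best or p > best[f][1]:
            best[f] = (e, p)
    correct = sum(1 for f in eval_types
                  if f in best and (f, best[f][0]) in gold_pairs)
    return correct / len(eval_types)
```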
69 5 Results During decipherment, we gradually increase the size of Spanish texts and compare the learning curves of three deciphering systems in Figure 2. [sent-142, score-0.33]
70 Figure 2: Learning curves for three decipherment systems. [sent-143, score-0.499]
71 Compared with Adjacent (previous state of the art), systems that use dependency bigrams improve deciphering accuracy by over 500%. [sent-144, score-0.751]
72 1 Data We use approximately one million tokens of the Europarl corpus (Koehn, 2005) as our small out-of-domain parallel training data and Gigaword as our large in-domain monolingual training data to build language models and a new translation lexicon to improve a phrase-based MT baseline system. [sent-154, score-0.694]
73 PBMT has 3 models: a translation model, a distortion model, and a language model. [sent-164, score-0.199]
74 In the following sections, we describe how to use a translation lexicon learned from large amounts of non parallel data to improve translation of OOV words, as well as words observed in Tphrase. [sent-170, score-0.947]
75 We perform 20 random restarts with 10k iterations on each and build a word-to-word translation lexicon Tdecipher by collecting translation pairs seen in at least 3 final decipherments with either P(f|e) ≥ 0. [sent-177, score-0.589]
76 3 Improving Translation of Observed Words with Decipherment To improve translation of words observed in our parallel corpus, we simply use Tdecipher as an additional parallel corpus. [sent-182, score-0.596]
77 First, we filter Tdecipher by keeping only translation pairs (f, e), where f is observed in the Spanish part and e is observed in the English part of the parallel corpus. [sent-183, score-0.423]
78 The training and tuning process is the same as the baseline machine translation system PBMT. [sent-185, score-0.306]
79 4 Improving OOV translation with Decipherment As Tdecipher is learned from large amounts of in-domain monolingual data, we expect that Tdecipher contains a number of useful translations for words not seen in the limited amount of parallel data (OOV words). [sent-189, score-0.653]
80 During decoding, if a source word f is in Tphrase, its translation options are collected from Tphrase exclusively. [sent-191, score-0.199]
81 If f is not in either translation table, the decoder just copies it directly to the output. [sent-193, score-0.221]
82 However, when an OOV’s correct translation is the same as its surface form and all its possible translations in Tdecipher are wrong, it is better to just copy OOV words directly to the output. [sent-195, score-0.286]
83 To avoid over-trusting Tdecipher, we add a new translation pair (f, f) for each source word f in Tdecipher if the translation pair (f, f) is not originally in Tdecipher. [sent-197, score-0.398]
84 For each newly added translation pair, both of its log translation probabilities are set to 0. [sent-198, score-0.417]
85 To distinguish the added translation pairs from the others learned through decipherment, we add a binary feature θ to each translation pair in Tdecipher. [sent-199, score-0.422]
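A minimal sketch of this augmentation, assuming Tdecipher is held as a dict from (f, e) to a tuple of its two log translation probabilities; which θ value marks the copied pairs is an assumption here:

```python
def add_identity_pairs(t_decipher):
    """Add (f, f) with log probabilities 0.0 for every source word f in Tdecipher,
    and attach a binary feature theta separating copied pairs from deciphered ones."""
    # theta = 0: pair learned by decipherment (assumed convention)
    augmented = {pair: probs + (0,) for pair, probs in t_decipher.items()}
    for f in {f for (f, _e) in t_decipher}:
        if (f, f) not in augmented:
            augmented[(f, f)] = (0.0, 0.0, 1)  # theta = 1: surface-form copy (assumed convention)
    return augmented
```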
86 5 A Combined Approach In the end, we build a system Decipher-COMB, which uses Tdecipher to improve translation of both observed and OOV words with methods described in sections 5. [sent-204, score-0.309]
87 Table 4 shows that the translation lexicon learned from decipherment helps achieve higher BLEU scores across tuning and testing sets. [sent-211, score-0.918]
88 First, adding Tdecipher to the small amount of parallel data improves word-level translation probabilities, which leads to better lexical weighting; second, Tdecipher contains new alternative translations for words observed in the parallel corpus. [sent-215, score-0.724]
89 We also observe that systems using a Tdecipher learned by deciphering dependency bigrams achieve larger gains in BLEU scores. [sent-217, score-0.724]
90 When decipherment is used to improve translation of both observed and OOV words, we see improvement in BLEU score as high as 1. [sent-218, score-0.755]
91 The consistent improvement on the tuning and different testing data suggests that decipherment is capable of learning good translations for a number of OOV words. [sent-220, score-0.672]
92 To further demonstrate that our decipherment approach finds useful translations for OOV words, we list the top 10 most frequent OOV words from both the tuning set and testing set as well as their translations (up to three most likely translations) in Table 5. [sent-221, score-0.809]
93 From the table, we can see that decipherment finds correct translations (bolded) for 7 out of the 10 most frequent OOV words. [sent-224, score-0.636]
94 Nonetheless, decipherment still finds enough correct translations to improve the baseline. [sent-226, score-0.637]
95 6 Conclusion We introduce syntax for deciphering Spanish into English. [sent-227, score-0.33]
96 Experiment results show that using dependency bigrams improves decipherment accuracy by over 500% compared with the state-of-the-art approach. [sent-228, score-0.89]
97 Moreover, we learn a domain specific translation lexicon by deciphering large amounts of monolingual data and show that the lexicon can improve a baseline machine translation system trained with limited parallel data. [sent-229, score-1.365]
98 Table 4: Systems that use translation lexicons learned from decipherment show consistent improvement over the baseline system across tuning and testing sets. [sent-242, score-0.862]
99 Domain adaptation for machine translation by mining unseen words. [sent-260, score-0.219]
100 Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences. [sent-275, score-0.508]
wordName wordTfidf (topN-words)
[('decipherment', 0.499), ('deciphering', 0.33), ('tdecipher', 0.294), ('spanish', 0.267), ('bigrams', 0.25), ('translation', 0.199), ('oov', 0.181), ('parallel', 0.17), ('dou', 0.169), ('knight', 0.159), ('dependency', 0.12), ('non', 0.117), ('lexicon', 0.11), ('decipher', 0.096), ('adjacent', 0.093), ('cipher', 0.09), ('plaintext', 0.09), ('translations', 0.087), ('naciones', 0.087), ('ravi', 0.08), ('monolingual', 0.079), ('sampling', 0.075), ('amounts', 0.071), ('tphrase', 0.069), ('unidas', 0.069), ('nations', 0.064), ('ei', 0.064), ('tuning', 0.063), ('irvine', 0.06), ('ciphertext', 0.06), ('pbmt', 0.06), ('bigram', 0.06), ('bleu', 0.053), ('decipherments', 0.052), ('deptype', 0.052), ('geman', 0.052), ('tgold', 0.052), ('tokens', 0.051), ('fi', 0.05), ('europarl', 0.044), ('nuhn', 0.041), ('mt', 0.041), ('reordering', 0.04), ('accepted', 0.038), ('english', 0.038), ('moses', 0.037), ('bayesian', 0.037), ('iph', 0.035), ('medio', 0.035), ('oriente', 0.035), ('pbayes', 0.035), ('bohnet', 0.034), ('koehn', 0.033), ('garera', 0.03), ('lexicons', 0.03), ('improve', 0.03), ('build', 0.029), ('frequent', 0.029), ('kevin', 0.028), ('languages', 0.028), ('united', 0.028), ('association', 0.028), ('afp', 0.027), ('sujith', 0.027), ('malte', 0.027), ('oovs', 0.027), ('qing', 0.027), ('observed', 0.027), ('chris', 0.027), ('million', 0.026), ('parsers', 0.026), ('copying', 0.026), ('vf', 0.026), ('mert', 0.025), ('foreign', 0.024), ('built', 0.024), ('klementiev', 0.024), ('genuinely', 0.024), ('system', 0.024), ('bilingual', 0.024), ('och', 0.024), ('gigaword', 0.024), ('learned', 0.024), ('iy', 0.023), ('fung', 0.023), ('del', 0.023), ('testing', 0.023), ('news', 0.023), ('limited', 0.023), ('request', 0.022), ('slice', 0.022), ('copies', 0.022), ('yp', 0.022), ('seed', 0.022), ('accuracy', 0.021), ('finds', 0.021), ('machine', 0.02), ('philipp', 0.02), ('verb', 0.02), ('probabilities', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
2 0.25291127 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
3 0.14513485 54 emnlp-2013-Decipherment with a Million Random Restarts
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.
4 0.13035671 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
5 0.11418961 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
6 0.11235298 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
7 0.10960463 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
8 0.10803863 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
9 0.10083625 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
10 0.095523693 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
11 0.091718905 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
12 0.088244133 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
13 0.088209599 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
14 0.082726866 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
15 0.080686785 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
16 0.077924706 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
17 0.07662005 201 emnlp-2013-What is Hidden among Translation Rules
18 0.076598614 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
19 0.074351855 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
20 0.074293397 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
topicId topicWeight
[(0, -0.204), (1, -0.222), (2, 0.036), (3, -0.001), (4, 0.047), (5, -0.078), (6, -0.055), (7, -0.038), (8, 0.053), (9, -0.122), (10, 0.006), (11, -0.022), (12, 0.136), (13, -0.036), (14, -0.025), (15, -0.004), (16, 0.0), (17, 0.044), (18, 0.085), (19, 0.103), (20, 0.014), (21, 0.07), (22, 0.057), (23, 0.136), (24, 0.097), (25, -0.106), (26, -0.176), (27, 0.154), (28, 0.059), (29, -0.04), (30, 0.045), (31, -0.242), (32, 0.077), (33, 0.007), (34, 0.059), (35, -0.239), (36, 0.007), (37, 0.068), (38, 0.075), (39, -0.221), (40, -0.099), (41, 0.044), (42, -0.131), (43, -0.003), (44, 0.053), (45, 0.037), (46, 0.015), (47, -0.037), (48, -0.046), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.93647063 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
2 0.72032905 54 emnlp-2013-Decipherment with a Million Random Restarts
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.
3 0.68240261 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
4 0.54760462 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
Author: Xiaoning Zhu ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Conghui Zhu ; Tiejun Zhao
Abstract: This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a
5 0.52140683 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
6 0.46702906 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
7 0.4204756 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
8 0.38787752 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
9 0.38542536 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
10 0.3721979 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
12 0.36015356 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
13 0.35990581 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
14 0.35883927 201 emnlp-2013-What is Hidden among Translation Rules
15 0.35843936 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
16 0.35370877 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
17 0.3166616 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
18 0.30729705 156 emnlp-2013-Recurrent Continuous Translation Models
19 0.30450752 150 emnlp-2013-Pair Language Models for Deriving Alternative Pronunciations and Spellings from Pronunciation Dictionaries
20 0.29709825 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
topicId topicWeight
[(3, 0.018), (10, 0.012), (18, 0.025), (22, 0.059), (30, 0.077), (45, 0.023), (48, 0.02), (50, 0.017), (51, 0.172), (53, 0.022), (66, 0.029), (68, 0.075), (75, 0.019), (77, 0.298)]
simIndex simValue paperId paperTitle
1 0.95919979 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
2 0.94780111 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
same-paper 3 0.86962402 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
4 0.86933613 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
5 0.71112037 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significant improvements over the constituency-to-string (+2.45 BLEU on average) and dependency-to-string (+0.91 BLEU on average) models, which only employ a single type of trees, and significantly outperforms the state-of-the-art hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
6 0.68864673 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
7 0.68451369 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
8 0.67619574 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
9 0.67396557 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
10 0.67376125 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
11 0.67319334 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
12 0.66286457 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
13 0.66043091 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
14 0.65639645 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
15 0.65360928 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
16 0.64850169 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
17 0.63425457 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
18 0.62711281 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
19 0.62618786 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
20 0.62521523 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning