emnlp emnlp2013 emnlp2013-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
Reference: text
sentIndex sentText sentNum sentScore
1 We explore the application of neural language models to machine translation. [sent-6, score-0.236]
2 We develop a new model that combines the neural probabilistic language model of Bengio et al., [sent-7, score-0.326]
3 rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. [sent-8, score-0.566]
4 Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU. [sent-9, score-0.351]
5 1 Introduction Machine translation (MT) systems rely upon language models (LMs) during decoding to ensure fluent output in the target language. [sent-11, score-0.299]
6 Typically, these LMs are n-gram models over discrete representations of words. [sent-12, score-0.095]
7 Such models are susceptible to data sparsity: the probability of an n-gram observed only a few times is difficult to estimate reliably, because these models do not use any information about similarities between words. [sent-13, score-0.142]
8 Bengio et al. (2003) propose distributed word representations, in which each word is represented as a real-valued vector in a high-dimensional feature space. [sent-15, score-0.143]
9 Bengio et al. (2003) introduce a feed-forward neural probabilistic LM (NPLM) that operates over these distributed representations. [sent-17, score-0.488]
10 During training, the NPLM learns both a distributed representation for each word in the vocabulary and an n-gram probability distribution over words in terms of these distributed representations. [sent-18, score-0.368]
11 Although neural LMs have begun to rival or even surpass traditional n-gram LMs (Mnih and Hinton, 2009; Mikolov et al., 2011), [sent-19, score-0.391]
12 they have not yet been widely adopted in large-vocabulary applications such as MT, because standard maximum likelihood estimation (MLE) requires repeated summations over all words in the vocabulary. [sent-20, score-0.468]
13 A variety of strategies have been proposed to combat this issue, many of which require severe restrictions on the size of the network or the size of the data. [sent-21, score-0.316]
14 First, we use rectified linear units (Nair and Hinton, 2010), whose activations are cheaper to compute than sigmoid or tanh units. [sent-24, score-0.792]
15 There is also evidence that deep neural networks with rectified linear units can be trained successfully without pre-training (Zeiler et al., 2013). [sent-25, score-0.824]
16 Second, we train using noise-contrastive estimation or NCE (Gutmann and Hyvärinen, 2010; Mnih and Teh, 2012), which does not require repeated summations over the whole vocabulary. [sent-27, score-0.44]
17 This enables us to efficiently build NPLMs on a larger scale than would be possible otherwise. [sent-28, score-0.076]
18 First, we use it to rerank the k-best output of a hierarchical phrase-based decoder (Chiang, 2007). [sent-30, score-0.205]
19 Second, we integrate it directly into the decoder, allowing the neural LM to more strongly influence the model. [sent-31, score-0.38]
20 We achieve gains of up to 0.6 BLEU translating French, German, and Spanish to English, and up to 1.1 BLEU translating Chinese to English. [sent-33, score-0.041]
21 Figure 1: Neural probabilistic language model (Bengio et al., 2003). [sent-37, score-0.049]
22 2 Neural Language Models Let V be the vocabulary, and n be the order of the language model; let u range over contexts, i.e. [sent-39, score-0.077]
23 strings of length (n − 1), and w range over words. [sent-41, score-0.084]
24 For simplicity, we assume that the training data is a single very long string, w1 · · · wN, where wN is a special stop symbol. [sent-42, score-0.346]
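The factorization behind sentences 22–24 can be written out explicitly. The following is standard n-gram LM notation consistent with the sentences above, not a formula quoted from the paper: each word is conditioned on its (n − 1)-word context, and the NPLM's softmax output is normalized by a sum over the whole vocabulary, which is exactly the cost that noise-contrastive estimation avoids.

```latex
% n-gram factorization; u_i denotes the (n-1)-word context of w_i
P(w_1 \cdots w_N) = \prod_{i=1}^{N} P(w_i \mid u_i), \qquad u_i = w_{i-n+1} \cdots w_{i-1}
% NPLM softmax output; the sum over V is what makes MLE training expensive
P(w \mid u) = \frac{\exp s_\theta(w, u)}{\sum_{w' \in V} \exp s_\theta(w', u)}
```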
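Sentences 9–10 and 14 describe a feed-forward NPLM with rectified linear hidden units. Below is a minimal NumPy sketch of such a forward pass; the layer sizes, n-gram order, single hidden layer, and random initialization are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

# Illustrative sizes only: vocabulary, embedding dim, hidden dim, n-gram order.
V, D, H, n = 10000, 150, 512, 5
rng = np.random.default_rng(0)
C  = rng.normal(scale=0.1, size=(V, D))            # word embeddings (learned during training)
W1 = rng.normal(scale=0.1, size=((n - 1) * D, H))  # concatenated context -> hidden
W2 = rng.normal(scale=0.1, size=(H, V))            # hidden -> per-word scores
b2 = np.zeros(V)

def nplm_logprobs(context_ids):
    """Log-probabilities over the vocabulary for one (n-1)-word context."""
    x = C[context_ids].reshape(-1)      # look up and concatenate context embeddings
    h = np.maximum(0.0, x @ W1)         # rectified linear units: max(0, a)
    scores = h @ W2 + b2                # unnormalized log-scores s_theta(w, u)
    m = scores.max()
    log_z = m + np.log(np.exp(scores - m).sum())   # normalization over all |V| words
    return scores - log_z
```

The last two lines are the per-example summation over the vocabulary; avoiding that sum during training is the point of the NCE sketch that follows.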
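Sentences 12 and 16 contrast MLE, which repeatedly sums over the vocabulary, with noise-contrastive estimation, which touches only the observed word and k sampled noise words. Here is a sketch of the per-example NCE objective of Gutmann and Hyvärinen (2010) and Mnih and Teh (2012); the noise distribution, the number of noise samples, and the handling of the normalization constant are assumptions here, not the paper's settings.

```python
import numpy as np

def nce_loss(score_fn, context_ids, target_id, noise_ids, q, k):
    """Per-example NCE loss (to be minimized).

    score_fn(context_ids, word_id) returns an unnormalized log-score s_theta(w, u);
    q is a length-|V| noise distribution; noise_ids are k word ids sampled from q.
    """
    def log_sigmoid(a):                         # log sigma(a), numerically stable
        return -np.logaddexp(0.0, -a)

    # The observed word should be classified as data, not noise.
    delta = score_fn(context_ids, target_id) - np.log(k * q[target_id])
    loss = -log_sigmoid(delta)
    # Each sampled word should be classified as noise; no sum over |V| anywhere.
    for w in noise_ids:
        delta = score_fn(context_ids, w) - np.log(k * q[w])
        loss -= log_sigmoid(-delta)
    return loss
```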
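Sentences 18–19 list the two ways the NPLM is applied: reranking k-best decoder output and direct integration into the decoder. The reranking side amounts to adding the NPLM log-probability of each hypothesis as one more feature in a linear model; the interface below is hypothetical, and the actual feature set and tuning procedure of the system are not reproduced here.

```python
def rerank_kbest(kbest, nplm_logprob, weights, nplm_weight):
    """Pick the best hypothesis after adding a neural-LM feature.

    kbest: list of (target_tokens, feature_values) pairs from the base decoder.
    nplm_logprob: function mapping a token sequence to its NPLM log-probability.
    weights, nplm_weight: feature weights, e.g. tuned on a development set.
    """
    def total_score(hyp):
        tokens, features = hyp
        base = sum(w * f for w, f in zip(weights, features))
        return base + nplm_weight * nplm_logprob(tokens)

    return max(kbest, key=total_score)
```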
wordName wordTfidf (topN-words)
[('nplm', 0.339), ('rectified', 0.339), ('lms', 0.315), ('bengio', 0.246), ('neural', 0.236), ('summations', 0.226), ('isi', 0.179), ('lm', 0.158), ('mnih', 0.15), ('southern', 0.15), ('distributed', 0.143), ('hinton', 0.117), ('units', 0.102), ('mt', 0.099), ('wn', 0.099), ('gle', 0.098), ('fossum', 0.098), ('susceptible', 0.098), ('zeiler', 0.098), ('wi', 0.093), ('repeated', 0.093), ('cheaper', 0.09), ('victoria', 0.09), ('rerank', 0.09), ('mle', 0.09), ('wtao', 0.09), ('nair', 0.09), ('combat', 0.09), ('california', 0.089), ('decoder', 0.086), ('chiang', 0.084), ('estimation', 0.083), ('vaswani', 0.083), ('rival', 0.083), ('nce', 0.083), ('ashish', 0.079), ('translation', 0.077), ('tanh', 0.075), ('activations', 0.075), ('dth', 0.072), ('begun', 0.072), ('ui', 0.066), ('sigmoid', 0.066), ('severe', 0.064), ('mikolov', 0.064), ('decoding', 0.062), ('operates', 0.06), ('integration', 0.055), ('restrictions', 0.055), ('ent', 0.055), ('issue', 0.053), ('vocabulary', 0.052), ('reliably', 0.052), ('reranking', 0.05), ('representations', 0.05), ('probabilistic', 0.049), ('spanish', 0.048), ('special', 0.046), ('french', 0.045), ('discrete', 0.045), ('linear', 0.045), ('similarities', 0.044), ('range', 0.043), ('laboratory', 0.043), ('symbol', 0.043), ('zhao', 0.042), ('software', 0.041), ('strings', 0.041), ('translating', 0.041), ('integrate', 0.041), ('combines', 0.041), ('upon', 0.04), ('stop', 0.04), ('sparsity', 0.039), ('german', 0.039), ('enables', 0.039), ('teh', 0.039), ('simplicity', 0.039), ('require', 0.038), ('improves', 0.038), ('efficiently', 0.037), ('adopted', 0.037), ('strongly', 0.037), ('ensure', 0.036), ('contexts', 0.036), ('sciences', 0.036), ('gains', 0.036), ('strategies', 0.035), ('deep', 0.035), ('influence', 0.035), ('let', 0.034), ('networks', 0.034), ('network', 0.034), ('successfully', 0.033), ('allowing', 0.031), ('institute', 0.031), ('learns', 0.03), ('string', 0.03), ('yet', 0.029), ('output', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
2 0.12555031 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
3 0.10858347 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with both standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
4 0.091929622 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
5 0.091411665 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning
Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.
6 0.085880682 58 emnlp-2013-Dependency Language Models for Sentence Completion
7 0.082851276 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
8 0.081756964 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
9 0.081472121 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
10 0.077877931 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
11 0.071837828 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
12 0.063436925 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
13 0.058744837 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
14 0.054699842 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
15 0.05446513 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
16 0.054376394 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
17 0.053979225 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
18 0.05093893 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
19 0.049708895 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
20 0.049202878 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
topicId topicWeight
[(0, -0.139), (1, -0.139), (2, 0.01), (3, -0.022), (4, 0.056), (5, -0.007), (6, 0.008), (7, 0.008), (8, -0.101), (9, -0.015), (10, 0.016), (11, -0.054), (12, -0.022), (13, -0.168), (14, -0.008), (15, -0.051), (16, 0.107), (17, 0.022), (18, -0.01), (19, 0.114), (20, 0.06), (21, 0.042), (22, -0.008), (23, -0.024), (24, -0.101), (25, 0.101), (26, 0.115), (27, 0.024), (28, 0.021), (29, 0.033), (30, 0.069), (31, -0.082), (32, 0.018), (33, -0.084), (34, -0.092), (35, 0.014), (36, -0.019), (37, -0.104), (38, 0.012), (39, -0.014), (40, -0.056), (41, -0.013), (42, 0.055), (43, -0.011), (44, -0.138), (45, 0.028), (46, -0.003), (47, 0.022), (48, 0.102), (49, -0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.96152687 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
Author: Rui Wang ; Masao Utiyama ; Isao Goto ; Eiichro Sumita ; Hai Zhao ; Bao-Liang Lu
Abstract: Neural network language models, or continuous-space language models (CSLMs), have been shown to improve the performance of statistical machine translation (SMT) when they are used for reranking n-best translations. However, CSLMs have not been used in the first pass decoding of SMT, because using CSLMs in decoding takes a lot of time. In contrast, we propose a method for converting CSLMs into back-off n-gram language models (BNLMs) so that we can use converted CSLMs in decoding. We show that they outperform the original BNLMs and are comparable with the traditional use of CSLMs in reranking.
3 0.64295197 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
Author: Michael Auli ; Michel Galley ; Chris Quirk ; Geoffrey Zweig
Abstract: We present a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. The weaker independence assumptions of this model result in a vastly larger search space compared to related feedforward-based language or translation models. We tackle this issue with a new lattice rescoring algorithm and demonstrate its effectiveness empirically. Our joint model builds on a well known recurrent neural network language model (Mikolov, 2012) augmented by a layer of additional inputs from the source language. We show competitive accuracy compared to the traditional channel model features. Our best results improve the output of a system trained on WMT 2012 French-English data by up to 1.5 BLEU, and by 1.1BLEU on average across several test sets.
4 0.55355269 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
5 0.535088 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
6 0.51576489 156 emnlp-2013-Recurrent Continuous Translation Models
7 0.50011152 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
8 0.46200207 58 emnlp-2013-Dependency Language Models for Sentence Completion
9 0.45792088 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
10 0.45645249 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
11 0.42488325 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
12 0.39574578 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
13 0.38641849 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
14 0.3615607 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
15 0.35785767 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
16 0.3524906 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
17 0.33838665 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
18 0.33082387 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation
19 0.32563996 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
20 0.31533852 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
topicId topicWeight
[(18, 0.044), (30, 0.06), (50, 0.012), (51, 0.076), (66, 0.017), (77, 0.693)]
simIndex simValue paperId paperTitle
same-paper 1 0.9221372 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
2 0.8636626 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
3 0.60899621 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
4 0.58260894 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
5 0.37575352 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of the two types of trees. We take head-dependents relations of dependency trees as the backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, with strings as the target side. Our rules retain the long-distance reordering property and remain compatible with phrases. Large-scale experimental results show that our model achieves significant improvements over the constituency-to-string (+2.45 BLEU on average) and dependency-to-string (+0.91 BLEU on average) models, which employ only a single type of tree, and significantly outperforms the state-of-the-art hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
6 0.34138662 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
7 0.34023327 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
8 0.33074299 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
9 0.33042225 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
10 0.3264685 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
11 0.31927669 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
12 0.31231785 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
13 0.31012544 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
14 0.30764854 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
15 0.30482256 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
16 0.30227941 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
17 0.28142184 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
18 0.28009734 156 emnlp-2013-Recurrent Continuous Translation Models
19 0.27753931 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
20 0.27524608 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training