emnlp emnlp2013 emnlp2013-156 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nal Kalchbrenner ; Phil Blunsom
Abstract: We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. The models have a generation and a conditioning aspect. The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. Secondly, we show that they are remarkably sensitive to the word order, syntax, and meaning of the source sentence despite lacking alignments. Finally we show that they match a state-of-the-art system when rescoring n-best lists of translations.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. [sent-2, score-0.48]
2 The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. [sent-4, score-0.365]
3 Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. [sent-5, score-0.243]
4 1 Introduction In most statistical approaches to machine translation the basic units of translation are phrases that are composed of one or more words. [sent-8, score-0.182]
5 A crucial component of translation systems is the set of models that estimate translation probabilities for pairs of phrases, one phrase being from the source language and the other from the target language. [sent-9, score-0.324]
6 Even when distinct phrase pairs share similarities, linguistic or otherwise, they do not share statistical weight in the models’ estimation of their translation probabilities. [sent-15, score-0.145]
7 Word representations have also shown a marked sensitivity to conditioning information (Mikolov and Zweig, 2012). [sent-23, score-0.187]
8 Phrase-based continuous translation models were first proposed in (Schwenk et al. [sent-33, score-0.212]
9 Although wide-reaching in their scope, these models are limited to fixed-size source and target phrases and simplify the dependencies between the target words, taking into account only restricted target language modelling information. [sent-40, score-0.297]
10 We describe a class of continuous translation models called Recurrent Continuous Translation Models (RCTM) that map without loss of generality a sentence from the source language to a probability distribution over the sentences in the target language. [sent-41, score-0.401]
11 Both models adopt a recurrent language model for the generation of the target translation (Mikolov et al. [sent-43, score-0.313]
12 In contrast to other n-gram approaches, the recurrent language model makes no Markov assumptions about the dependencies of the words in the target sentence. [sent-45, score-0.222]
13 The two RCTMs differ in the way they condition the target language model on the source sentence. [sent-46, score-0.142]
14 The first RCTM uses the convolutional sentence model (Kalchbrenner and Blunsom, 2013) to transform the source word representations into a representation for the source sentence. [sent-47, score-0.415]
15 The source sentence representation in turn constrains the generation of each target word. [sent-48, score-0.234]
16 The second RCTM uses a truncated variant of the convolutional sentence model to first transform the source word representations into representations for the target words; the latter then constrain the generation of the target sentence. [sent-50, score-0.461]
17 In both cases, the convolutional layers are used to generate combined representations for the phrases in a sentence from the representations of the words in the sentence. [sent-51, score-0.288]
18 Connections between source and target words, phrases and sentences are learnt only implicitly as mappings between their continuous representations. [sent-53, score-0.263]
19 Another advantage is that the probability of a translation under the models is efficiently computable, requiring a small number of matrix-vector products that is linear in the length of the source and the target sentence. [sent-56, score-0.258]
20 Since the translation probabilities of the RCTMs are tractable, we can measure the perplexity of the models with respect to the reference translations. [sent-59, score-0.184]
21 The perplexity of the models is significantly lower than that of IBM Model 1 and is > 43% lower than the perplexity of a state-of-the-art variant of the IBM Model 2 (Brown et al. [sent-60, score-0.186]
22 The second experiment shows that under a random permutation of the words in the source sentences, the perplexity of the model with respect to the reference translations becomes significantly worse, suggesting that the model is highly sensitive to word position and order. [sent-64, score-0.235]
23 The generated translations demonstrate remarkable morphological, syntactic and semantic agreement with the source sentence. [sent-66, score-0.142]
24 The performance of the RCTM probabilities combined with a single word penalty feature matches that of the state-of-the-art translation system cdec, which makes use of twelve features including five alignment-based translation models (Dyer et al. [sent-68, score-0.256]
25 We see that an RCTM is sensitive not just to the source sentence e but also to the preceding words f1:i−1 in the target sentence; by doing so it incorporates a model of the target language itself. [sent-92, score-0.248]
26 To model the conditional probability P(f|e), an RCTM comprises both a generative architecture for the target sentence and an architecture for conditioning the latter on the source sentence. [sent-93, score-0.366]
27 1, we model the generative architecture with a recurrent language model (RLM) based on a recurrent neural network (Mikolov et al. [sent-95, score-0.394]
28 The prediction of the i-th word fi in an RLM depends on all the preceding words f1:i−1 in the target sentence, ensuring that conditional independence assumptions are not introduced in Eq. [sent-97, score-0.261]
29 Both the generative and conditioning aspects of the models deploy continuous representations for the constituents and are trained as a single joint architecture. [sent-102, score-0.262]
30 Given the modelling framework underlying RCTMs, we now proceed to describe in detail the recurrent language model underlying the generative aspect. [sent-103, score-0.225]
31 The recurrent transformation is applied to the hidden layer hi−1 and the result is summed with the representation of the current word fi. [sent-114, score-0.276]
32 The prediction proceeds by successively applying the recurrent transformation R to the word representations and predicting the next word at each step. [sent-120, score-0.311]
33 Letting v(fi) ∈ R|V|×1 denote the indicator vector of the word fi, the computation proceeds as h1 = σ(I · v(f1)) (3a), hi+1 = σ(R · hi + I · v(fi+1)) (3b), oi+1 = O · hi (3c), and the conditional distribution is given by P(fi = v | f1:i−1) = exp(oi,v) / Σv=1..|V| exp(oi,v) (4). [sent-123, score-0.161]
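To make the recursion concrete, here is a minimal NumPy sketch of a forward pass of such a recurrent language model; the function and variable names are illustrative rather than from the paper, and v(fi) is treated as a one-hot index so that I · v(fi) reduces to selecting a column of I.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rlm_forward(word_ids, I, R, O):
    """Forward pass of the recurrent language model (Eqs. 3a-3c and 4).
    I: q x |V| input matrix, R: q x q recurrent matrix, O: |V| x q output matrix.
    word_ids: indices f_1 ... f_m of the target sentence.
    Returns one predictive distribution per position i >= 2."""
    h = sigmoid(I[:, word_ids[0]])                 # h_1 = sigma(I . v(f_1))
    dists = []
    for i in range(1, len(word_ids)):
        o = O @ h                                  # o_{i+1} = O . h_i
        e = np.exp(o - o.max())                    # numerically stabilised softmax (Eq. 4)
        dists.append(e / e.sum())
        h = sigmoid(R @ h + I[:, word_ids[i]])     # h_{i+1} = sigma(R h_i + I v(f_{i+1}))
    return dists
```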
34 The error in the predicted distribution calculated at the output layer is backpropagated through the recurrent layers and cumulatively added to the errors of the previous predictions for a given number d of steps. [sent-130, score-0.221]
35 RCTMs may be thought of as RLMs, in which the predicted distributions for each word fi are conditioned on the source sentence e. [sent-133, score-0.26]
36 3 Recurrent Continuous Translation Model I The RCTM I uses a convolutional sentence model (CSM) in the conditioning architecture. [sent-135, score-0.233]
37 The CSM creates a representation for a sentence that is progressively built up from representations of the n-grams in the sentence. [sent-136, score-0.148]
38 Although it does not make use of an explicit parse tree, the operations that generate the representations act locally on small n-grams in the lower layers of the model and act increasingly globally on the whole sentence in the upper layers of the model. [sent-138, score-0.159]
39 Secondly, the translation probability distribution over the target sentences does not depend on the chosen parse tree. [sent-143, score-0.15]
40 The RCTM I conditions the probability of each target word fi on the continuous representation of the source sentence e generated through the CSM. [sent-144, score-0.485]
41 This is accomplished by adding the sentence representation to each hidden layer hi in the target recurrent language model. [sent-145, score-0.412]
42 1 Convolutional Sentence Model The CSM models the continuous representation of a sentence based on the continuous representations of the words in the sentence. [sent-148, score-0.39]
43 Let e = e1 . . . ek be a sentence in a language and let v(ei) ∈ Rq×1 be the continuous representation of the word ei. [sent-152, score-0.213]
44 Let Ee ∈ Rq×k be the sentence matrix for e defined by Ee:,i = v(ei) (5). Figure 2: A CSM for a six word source sentence e and the computed sentence representation e. [sent-153, score-0.321]
45 K2 , K3 are weight matrices and L3 is a top weight matrix. [sent-154, score-0.177]
46 To the right, an instance of a one-dimensional convolution between some weight matrix Ki and a generic matrix M that could for instance correspond to E2e. [sent-155, score-0.189]
47 The main component of the architecture of the CSM is a sequence of weight matrices (Ki)2≤i≤r that correspond to the kernels or filters of the convolution and can be thought of as learnt feature detectors. [sent-157, score-0.198]
48 From the sentence matrix Ee the CSM computes a continuous vector representation e ∈ Rq×1 for the sentence e by applying a sequence of convolutions to Ee whose weights are given by the weight matrices. [sent-158, score-0.319]
49 We denote by (Ki)2≤i≤r a sequence of weight matrices where each Ki ∈ Rq×i is a matrix of i columns and r = ⌈√(2N)⌉, where N is the length of the longest source sentence in the training set. [sent-160, score-0.315]
50 Given for instance a matrix M ∈ Rq×j where the number of columns j ≥ i, each row of Ki can be convolved with the corresponding row of M, resulting in a matrix Ki ∗ M, where ∗ indicates the convolution operation and (Ki ∗ M) ∈ Rq×(j−i+1). [sent-162, score-0.135]
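As a sketch of this row-wise narrow convolution (an illustrative implementation, not the authors' code), the operation Ki ∗ M can be written with NumPy's np.convolve in "valid" mode:

```python
import numpy as np

def row_convolve(K, M):
    """Row-wise narrow convolution used by the CSM: each of the q rows of the
    q x i kernel K is convolved with the corresponding row of the q x j
    matrix M (j >= i), yielding a q x (j - i + 1) matrix."""
    q, i = K.shape
    _, j = M.shape
    assert j >= i, "kernel must not be wider than the input matrix"
    out = np.empty((q, j - i + 1))
    for r in range(q):
        out[r] = np.convolve(M[r], K[r], mode="valid")  # keep fully overlapping positions only
    return out
```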
51 In the latter case, we equally obtain a vector in Rq×1 by simply applying a top weight matrix Lj that has the same number of columns as Eie. [sent-172, score-0.138]
52 We thus obtain a sentence representation e ∈ Rq×1 for the source sentence e. [sent-173, score-0.222]
53 Note also that, given the different levels at which the weight matrices Ki and Li are applied, the top weight matrix Lj comes from an additional sequence of weight matrices (Li)2≤i≤r distinct from (Ki)2≤i≤r. [sent-176, score-0.352]
54 It proceeds recursively as follows: s = S · csm(e), h1 = σ(I · v(f1) + s), hi+1 = σ(R · hi + I · v(fi+1) + s), oi+1 = O · hi, with I ∈ Rq×|VF| and O ∈ R|VF|×q. (For a formal treatment of the construction, see Kalchbrenner and Blunsom, 2013.) [sent-194, score-0.168]
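A minimal sketch of this recursion, assuming the CSM sentence vector csm(e) has already been computed and using illustrative names: it repeats the RLM step above with the conditioning vector s added to every hidden layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rctm1_forward(target_ids, csm_e, S, I, R, O):
    """RCTM I generation sketch: s = S . csm(e) conditions every hidden
    layer of the target recurrent language model."""
    s = S @ csm_e                                    # s = S . csm(e)
    h = sigmoid(I[:, target_ids[0]] + s)             # h_1
    dists = []
    for i in range(1, len(target_ids)):
        o = O @ h                                    # o_{i+1} = O . h_i
        e = np.exp(o - o.max())
        dists.append(e / e.sum())                    # distribution over the next target word
        h = sigmoid(R @ h + I[:, target_ids[i]] + s)
    return dists
```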
55 First, the length of the target sentence is predicted by the target RLM itself, which by its architecture has a bias towards shorter sentences. [sent-201, score-0.234]
56 Secondly, the representation of the source sentence e constrains all the target words uniformly, even though the target words depend more strongly on certain parts of the source sentence and less on others. [sent-202, score-0.423]
57 4 Recurrent Continuous Translation Model II The central idea behind the RCTM II is to first estimate the length m of the target sentence independently of the main architecture. [sent-204, score-0.131]
58 Given m and the source sentence e, the model constructs a representation for the n-grams in e, where n is set to 4. [sent-205, score-0.175]
59 From the 4-gram representation of the source sentence e, the model builds a representation of a sentence that has the predicted length m of the target. [sent-209, score-0.292]
60 This is similarly accomplished by truncating the inverted CSM for a sentence of length m. [sent-210, score-0.135]
61 Figure: the RCTM II computes P(f | m, e); arrows represent full matrix transformations while lines are vector transformations corresponding to columns of weight matrices. [sent-221, score-0.212]
62 We denote by cgm(e, n) that matrix Eie from the CSM that represents the n-grams of the source sentence e. [sent-224, score-0.182]
63 The CGM can also be inverted to obtain a representation for a sentence from the representation of its n-grams. [sent-225, score-0.165]
64 We denote by icgm the inverse CGM, which depends on the size of the n-gram representation cgm (e, n) and on the target sentence length m. [sent-226, score-0.361]
65 The transformation icgm unfolds the n-gram representation onto a representation of a target sentence with m words. [sent-227, score-0.28]
66 Given the transformations cgm and icgm, we now detail the computation of the RCTM II. [sent-230, score-0.176]
67 the elements of the RCTM I together with the following additional elements: a translation transformation T ∈ Rq×q and two sequences of weight matrices (Ji)2≤i≤s and (Hi)2≤i≤s that are part of the icgm. [sent-234, score-0.252]
68 Note how each reconstructed vector F:,i is added successively to the corresponding layer hi that predicts the target word fi. [sent-237, score-0.179]
69 Just like r, the value s is small and depends on the length of the source and target sentences in the training set. [sent-240, score-0.167]
70 For the separate estimation of the length of the translation, we estimate the conditional probability P(m|e) by letting P(m|e) = P(m|k) = Poisson(λk) (11), where k is the length of the source sentence e and Poisson(λ) is a Poisson distribution with mean λ. [sent-245, score-0.205]
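For illustration, the length model of Eq. 11 can be evaluated as below; how λk is derived from the source length k is an assumption here (the paper estimates it from data), so the scaling factor is only a placeholder.

```python
import math

def target_length_log_prob(m, k, mean_ratio=1.1):
    """log P(m | e) = log Poisson(lambda_k) at target length m, where k is the
    source sentence length.  mean_ratio * k stands in for the estimated mean
    lambda_k; the actual value would be estimated from the training data."""
    lam = mean_ratio * k
    # log Poisson pmf: m*log(lam) - lam - log(m!)
    return m * math.log(lam) - lam - math.lgamma(m + 1)
```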
71 The source language is English and the target language is French. [sent-257, score-0.142]
72 This yields a relatively small recurrent matrix and corresponding models. [sent-271, score-0.215]
73 For the RCTM I, the number of weight matrices r for the CSM is 15, whereas in the RCTM II the number r of weight matrices for the CGM is 7 and the number s of weight matrices for the inverse CGM is 9. [sent-275, score-0.369]
74 If a test sentence is longer than all training sentences and a larger weight matrix is required by the model, the larger weight matrix is easily factorized into two smaller weight matrices whose weights have been trained. [sent-276, score-0.382]
75 For instance, if a weight matrix of 10 columns is required, but weight matrices have been trained only up to 9 columns, then one can factorize the 10-column matrix into one of 9 columns and one of 2 columns. [sent-277, score-0.335]
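The arithmetic behind this factorization can be checked with a short NumPy sketch (illustrative only): composing a width-9 and a width-2 narrow convolution covers a window of 9 + 2 − 1 = 10 columns, so a sentence that would otherwise require an untrained width-10 kernel is still reduced to a single column.

```python
import numpy as np

q, j = 4, 10                      # embedding size and sentence length (toy values)
M  = np.random.randn(q, j)        # a sentence matrix that would need a width-10 kernel
K9 = np.random.randn(q, 9)        # trained width-9 kernel
K2 = np.random.randn(q, 2)        # trained width-2 kernel

step1 = np.stack([np.convolve(M[r],     K9[r], mode="valid") for r in range(q)])
step2 = np.stack([np.convolve(step1[r], K2[r], mode="valid") for r in range(q)])
print(step1.shape, step2.shape)   # (4, 2) then (4, 1): the whole sentence is covered
```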
76 Across all test sets the proportion of sentence pairs that require larger weight matrices to be factorized into smaller ones is < 0. [sent-278, score-0.17]
77 The cross-entropy error calculated at the output layer at each step is back-propagated through the recurrent structure for a number d of steps; for all models we let d = 6. [sent-286, score-0.193]
78 2 Perplexity of gold translations Since the computation of the probability of a translation under one of the RCTMs is efficient, we can compute the perplexities of the RCTMs with respect to the reference translations in the test sets. [sent-295, score-0.249]
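As a reminder of the metric (a generic sketch, not tied to the RCTM implementation), corpus perplexity is the exponentiated negative average log probability assigned to the reference words:

```python
import math

def corpus_perplexity(sentence_log_probs, num_target_words):
    """Perplexity with respect to reference translations: sentence_log_probs
    holds the natural-log probabilities log P(f | e) of each reference
    translation, num_target_words is the total number of target words scored."""
    return math.exp(-sum(sentence_log_probs) / num_target_words)
```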
79 We compare the perplexities of the RCTMs with the perplexity of the IBM Model 1 (Brown et al. [sent-297, score-0.133]
80 The RCTM II obtains a perplexity that is > 43% lower than that of the alignment-based models and that is 40% lower than the perplexity of the RCTM I. [sent-303, score-0.186]
81 The low perplexity of the RCTMs suggests that continuous representations and the transformations between them make up well for the lack of explicit alignments. [sent-304, score-0.307]
82 Further, the difference in perplexity between the RCTMs themselves demonstrates the importance of the conditioning architecture and suggests that the localised 4-gram conditioning in the RCTM II is superior to conditioning on the whole source sentence in the RCTM I. [sent-305, score-0.522]
83 3 Sensitivity to source sentence structure The second experiment aims at showing the sensitivity of the RCTM II to the order and position of words in the English source sentence. [sent-307, score-0.259]
84 This difference is very significant, clearly indicating the translation model's sensitivity to word order and position. [sent-315, score-0.137]
85 1 Generating from the RCTM II To show that the RCTM II is sensitive not only to word order, but also to other syntactic and semantic traits of the sentence, we generate and inspect candidate translations for various English source sentences. [sent-318, score-0.169]
86 Given an English source sentence e, we let m be the length of the gold translation and we search the distribution computed by the RCTM II over all sentences of length m. [sent-320, score-0.271]
87 The number of possible target sentences of length m amounts to |V|^m = 34831^m, where V = VF is the French vocabulary; directly considering all possible translations is intractable. [sent-321, score-0.143]
88 We start by predicting a distribution for the first target word, restricting that distribution to the top 5 most probable words and sampling the first word of a candidate translation from the restricted distribution of 5 words. [sent-323, score-0.177]
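A sketch of this restricted sampling procedure, assuming a callable next_word_dist(prefix) that returns the RCTM II's distribution over the French vocabulary given a partial translation (all names here are illustrative):

```python
import numpy as np

def sample_candidate(next_word_dist, m, rng=None):
    """Sample one length-m candidate translation by restricting each predicted
    distribution to its 5 most probable words and sampling from the
    renormalised restricted distribution."""
    rng = rng or np.random.default_rng()
    prefix = []
    for _ in range(m):
        dist = next_word_dist(prefix)
        top5 = np.argsort(dist)[-5:]                  # ids of the 5 most probable words
        probs = dist[top5] / dist[top5].sum()         # renormalise over the top 5
        prefix.append(int(rng.choice(top5, p=probs)))
    return prefix
```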
89 Table 3 gives various English source sentences and some candidate French translations generated by the RCTM II together with their ranks. [sent-326, score-0.169]
90 The samples in Table 3 show the remarkable syntactic agreement of the candidate translations. [Table 3 columns: English source sentence | French gold translation | RCTM II candidate translation | Rank; first source sentence: "the patient is sick."] [sent-328, score-0.479]
91 Table 3: English source sentences, respective translations in French and candidate translations generated from the RCTM II and ranked out of 2000 samples according to their decreasing probability. [sent-363, score-0.228]
92 The cdec system includes WP as well as five translation models and two language modelling features, among others. [sent-379, score-0.202]
93 Finally, the meaning of the English source is well transferred to the French candidate targets; where a correlation is unlikely or the target word is not in the French vocabulary, a semantically related word or synonym is selected by the model. [sent-382, score-0.169]
94 4 Rescoring and BLEU Evaluation The fourth experiment tests the ability of the RCTM I and the RCTM II to choose the best translation among a large number of candidate translations produced by another system. [sent-385, score-0.177]
95 We use the cdec system to generate a list of 1000 best candidate translations for each English sentence in the four WMTNT sets. [sent-386, score-0.207]
96 cdec employs 12 engineered features including, among others, 5 translation models, 2 language model features and a word penalty feature (WP). [sent-388, score-0.165]
97 Combining a monolingual RLM feature with the RCTMs does not improve the scores, while reducing cdec to just one core translation probability and language model features drops its score by two to five tenths. [sent-393, score-0.165]
98 6 Conclusion We have introduced Recurrent Continuous Translation Models that comprise a class of purely continuous sentence-level translation models. [sent-395, score-0.212]
99 The RCTMs offer great modelling flexibility due to the sensitivity of the continuous representations to conditioning information. [sent-398, score-0.345]
100 The models also suggest a wide range of potential advantages and extensions, from being able to include discourse representations beyond the single sentence and multilingual source representations, to being able to model morphologically rich languages through character-level recurrences. [sent-399, score-0.186]
wordName wordTfidf (topN-words)
[('rctm', 0.73), ('csm', 0.243), ('rctms', 0.185), ('recurrent', 0.163), ('patients', 0.154), ('cgm', 0.139), ('rlm', 0.131), ('fi', 0.13), ('continuous', 0.121), ('rq', 0.12), ('sont', 0.111), ('patient', 0.11), ('convolutional', 0.101), ('ki', 0.099), ('perplexity', 0.093), ('translation', 0.091), ('conditioning', 0.085), ('source', 0.083), ('eie', 0.081), ('les', 0.077), ('cdec', 0.074), ('matrices', 0.069), ('hi', 0.068), ('french', 0.061), ('target', 0.059), ('mikolov', 0.059), ('est', 0.059), ('translations', 0.059), ('representations', 0.056), ('ii', 0.056), ('weight', 0.054), ('matrix', 0.052), ('sentence', 0.047), ('icgm', 0.046), ('kalchbrenner', 0.046), ('morts', 0.046), ('sensitivity', 0.046), ('representation', 0.045), ('architecture', 0.044), ('vf', 0.043), ('perplexities', 0.04), ('rescoring', 0.038), ('transformation', 0.038), ('transformations', 0.037), ('modelling', 0.037), ('wp', 0.035), ('oi', 0.035), ('le', 0.035), ('iiis', 0.035), ('malade', 0.035), ('malades', 0.035), ('mort', 0.035), ('truncating', 0.035), ('nal', 0.034), ('blunsom', 0.034), ('pas', 0.034), ('fk', 0.033), ('proceeds', 0.032), ('columns', 0.032), ('convolution', 0.031), ('hunknowni', 0.03), ('sick', 0.03), ('layer', 0.03), ('gram', 0.029), ('ibm', 0.029), ('layers', 0.028), ('inverted', 0.028), ('candidate', 0.027), ('socher', 0.026), ('ym', 0.026), ('poisson', 0.026), ('english', 0.025), ('conditional', 0.025), ('length', 0.025), ('dyer', 0.025), ('proceed', 0.025), ('neural', 0.024), ('dead', 0.024), ('schwenk', 0.024), ('ill', 0.024), ('phil', 0.024), ('cernock', 0.023), ('insuffisante', 0.023), ('rlms', 0.023), ('sauv', 0.023), ('tait', 0.023), ('tfio', 0.023), ('wmtnt', 0.023), ('iy', 0.023), ('vo', 0.023), ('comprises', 0.023), ('ee', 0.022), ('tomas', 0.022), ('successively', 0.022), ('yp', 0.022), ('ne', 0.022), ('vocabulary', 0.021), ('ei', 0.021), ('secondly', 0.02), ('mal', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 156 emnlp-2013-Recurrent Continuous Translation Models
Author: Nal Kalchbrenner ; Phil Blunsom
Abstract: We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. The models have a generation and a conditioning aspect. The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. Secondly, we show that they are remarkably sensitive to the word order, syntax, and meaning of the source sentence despite lacking alignments. Finally we show that they match a state-of-the-art system when rescoring n-best lists of translations.
2 0.14074315 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
Author: Michael Auli ; Michel Galley ; Chris Quirk ; Geoffrey Zweig
Abstract: We present a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. The weaker independence assumptions of this model result in a vastly larger search space compared to related feedforward-based language or translation models. We tackle this issue with a new lattice rescoring algorithm and demonstrate its effectiveness empirically. Our joint model builds on a well known recurrent neural network language model (Mikolov, 2012) augmented by a layer of additional inputs from the source language. We show competitive accuracy compared to the traditional channel model features. Our best results improve the output of a system trained on WMT 2012 French-English data by up to 1.5 BLEU, and by 1.1BLEU on average across several test sets.
3 0.083743952 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes
Author: Ruihong Huang ; Ellen Riloff
Abstract: The goal of our research is to distinguish veterinary message board posts that describe a case involving a specific patient from posts that ask a general question. We create a text classifier that incorporates automatically generated attribute lists for veterinary patients to tackle this problem. Using a small amount of annotated data, we train an information extraction (IE) system to identify veterinary patient attributes. We then apply the IE system to a large collection of unannotated texts to produce a lexicon of veterinary patient attribute terms. Our experimental results show that using the learned attribute lists to encode patient information in the text classifier yields improved performance on this task.
4 0.073188014 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
5 0.072538026 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
Author: Joo-Kyung Kim ; Marie-Catherine de Marneffe
Abstract: Continuous space word representations extracted from neural network language models have been used effectively for natural language processing, but until recently it was not clear whether the spatial relationships of such representations were interpretable. Mikolov et al. (2013) show that these representations do capture syntactic and semantic regularities. Here, we push the interpretation of continuous space word representations further by demonstrating that vector offsets can be used to derive adjectival scales (e.g., okay < good < excellent). We evaluate the scales on the indirect answers to yes/no questions corpus (de Marneffe et al., 2010). We obtain 72.8% accuracy, which outperforms previous results (∼60%) on this corpus and highlights the quality of the scales extracted, providing further support that the continuous space word representations are meaningful.
6 0.067373887 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
7 0.063967787 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
8 0.06378226 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases
9 0.06332472 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
10 0.062656544 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
11 0.059539035 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
12 0.055918217 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
13 0.052945841 2 emnlp-2013-A Convex Alternative to IBM Model 2
14 0.04767362 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
15 0.047083236 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
16 0.045707263 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
17 0.045657657 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification
18 0.045254443 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
19 0.044876445 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
20 0.044832371 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
topicId topicWeight
[(0, -0.144), (1, -0.105), (2, 0.001), (3, -0.008), (4, 0.035), (5, 0.036), (6, 0.002), (7, -0.016), (8, -0.097), (9, -0.041), (10, 0.052), (11, 0.017), (12, 0.008), (13, -0.076), (14, -0.009), (15, -0.012), (16, 0.112), (17, 0.032), (18, -0.056), (19, 0.024), (20, 0.108), (21, 0.06), (22, 0.03), (23, 0.009), (24, -0.157), (25, 0.041), (26, 0.113), (27, -0.035), (28, 0.187), (29, 0.108), (30, -0.009), (31, -0.04), (32, 0.126), (33, -0.062), (34, 0.032), (35, -0.023), (36, 0.215), (37, -0.003), (38, 0.01), (39, -0.041), (40, 0.063), (41, -0.078), (42, 0.028), (43, 0.028), (44, 0.056), (45, 0.004), (46, -0.005), (47, -0.044), (48, -0.097), (49, 0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.88454932 156 emnlp-2013-Recurrent Continuous Translation Models
Author: Nal Kalchbrenner ; Phil Blunsom
Abstract: We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. The models have a generation and a conditioning aspect. The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. Secondly, we show that they are remarkably sensitive to the word order, syntax, and meaning of the source sentence despite lacking alignments. Finally we show that they match a state-of-the-art system when rescoring n-best lists of translations.
2 0.81416029 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
Author: Michael Auli ; Michel Galley ; Chris Quirk ; Geoffrey Zweig
Abstract: We present a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words. The weaker independence assumptions of this model result in a vastly larger search space compared to related feedforward-based language or translation models. We tackle this issue with a new lattice rescoring algorithm and demonstrate its effectiveness empirically. Our joint model builds on a well known recurrent neural network language model (Mikolov, 2012) augmented by a layer of additional inputs from the source language. We show competitive accuracy compared to the traditional channel model features. Our best results improve the output of a system trained on WMT 2012 French-English data by up to 1.5 BLEU, and by 1.1BLEU on average across several test sets.
3 0.58837688 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations
Author: Joo-Kyung Kim ; Marie-Catherine de Marneffe
Abstract: Continuous space word representations extracted from neural network language models have been used effectively for natural language processing, but until recently it was not clear whether the spatial relationships of such representations were interpretable. Mikolov et al. (2013) show that these representations do capture syntactic and semantic regularities. Here, we push the interpretation of continuous space word representations further by demonstrating that vector offsets can be used to derive adjectival scales (e.g., okay < good < excellent). We evaluate the scales on the indirect answers to yes/no questions corpus (de Marneffe et al., 2010). We obtain 72.8% accuracy, which outperforms previous results (∼60%) on this corpus and highlights the quality of the scales extracted, providing further support that the continuous space word representations are meaningful.
Author: Rui Wang ; Masao Utiyama ; Isao Goto ; Eiichro Sumita ; Hai Zhao ; Bao-Liang Lu
Abstract: Neural network language models, or continuous-space language models (CSLMs), have been shown to improve the performance of statistical machine translation (SMT) when they are used for reranking n-best translations. However, CSLMs have not been used in the first pass decoding of SMT, because using CSLMs in decoding takes a lot of time. In contrast, we propose a method for converting CSLMs into back-off n-gram language models (BNLMs) so that we can use converted CSLMs in decoding. We show that they outperform the original BNLMs and are comparable with the traditional use of CSLMs in reranking.
5 0.48294771 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
6 0.44426796 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes
7 0.37631756 117 emnlp-2013-Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction
8 0.36653599 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
9 0.34052181 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation
10 0.33948192 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification
11 0.32693592 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
12 0.32282618 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
13 0.32146689 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries
14 0.30927882 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
15 0.30761388 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases
16 0.30248931 58 emnlp-2013-Dependency Language Models for Sentence Completion
17 0.30074963 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
18 0.2937538 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
19 0.28908595 201 emnlp-2013-What is Hidden among Translation Rules
20 0.28741339 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
topicId topicWeight
[(3, 0.038), (6, 0.02), (18, 0.033), (22, 0.028), (30, 0.079), (43, 0.029), (45, 0.015), (50, 0.014), (51, 0.117), (55, 0.343), (66, 0.051), (71, 0.025), (75, 0.043), (77, 0.035), (96, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.72402412 156 emnlp-2013-Recurrent Continuous Translation Models
Author: Nal Kalchbrenner ; Phil Blunsom
Abstract: We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. The models have a generation and a conditioning aspect. The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. Secondly, we show that they are remarkably sensitive to the word order, syntax, and meaning of the source sentence despite lacking alignments. Finally we show that they match a state-of-the-art system when rescoring n-best lists of translations.
2 0.71824265 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries
Author: Marius Pasca
Abstract: This paper introduces a method for extracting fine-grained class labels ( “countries with double taxation agreements with india ”) from Web search queries. The class labels are more numerous and more diverse than those produced by current extraction methods. Also extracted are representative sets of instances (singapore, united kingdom) for the class labels.
3 0.62692302 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang
Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.
4 0.42951033 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu
Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.
5 0.42264894 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning
Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.
6 0.42248011 143 emnlp-2013-Open Domain Targeted Sentiment
7 0.42145544 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
8 0.4213928 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
9 0.42109504 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
10 0.41859812 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
11 0.41814259 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
12 0.41700637 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
13 0.41654691 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
14 0.41593072 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
15 0.41591051 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
16 0.41550177 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
17 0.41508567 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
18 0.41450119 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
19 0.41433451 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
20 0.41372928 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech