emnlp emnlp2013 emnlp2013-135 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. [sent-4, score-0.305]
2 We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. [sent-5, score-0.522]
3 Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines. [sent-7, score-0.303]
4 parliamentary proceedings) parallel text is used to translate text in a NEW-domain (e. [sent-10, score-0.314]
5 Figure 1: Percent of test set word types by domain that are OOV with respect to five million tokens of OLD-domain French parliamentary proceedings data. For example, the French [sent-16, score-0.223]
6 word enceinte is mostly translated in parliamentary proceedings as place, house, or chamber; in medical text, the translation is mostly pregnant; in scientific text, enclosures. [sent-17, score-0.479]
7 (2010), for example, mine parallel text from comparable corpora. [sent-20, score-0.261]
8 … distribution of translation probabilities over all source and target word pairs in the NEW-domain. [sent-28, score-0.251]
9 We begin with a maximum likelihood estimate of the joint based on a word aligned OLD-domain corpus and update this distribution using NEW-domain comparable data. [sent-29, score-0.299]
10 We define a model based on a single comparable corpus and then extend it to learn from document aligned comparable corpora with any number of comparable document pairs. [sent-30, score-0.764]
11 French cisaillement and perçage, which translate as shear and drilling, in the scientific domain) as well as new translations for previously observed NTS words (e. [sent-33, score-0.26]
12 In our MT experiments, we use the learned NEWdomain joint distribution to update our SMT model with translations of OOV and low frequency words; we leave the integration of new translations for NTS words to future work. [sent-36, score-0.558]
13 Our approach crucially depends on finding comparable document pairs relevant to the NEW-domain. [sent-37, score-0.353]
14 Such pairs could be derived from a number of sources, with document pairings inferred from timestamps (e. [sent-38, score-0.211]
15 Our model also relies on the assumption that each comparable document pair describes generally the same concepts, though the order and structure of presentation may differ significantly. [sent-44, score-0.325]
16 That work concludes that errors resulting from unseen (OOV) and new translation sense words cause the majority of the degradation in translation performance that occurs when an MT model trained on OLD-domain data is used to translate data in a NEW-domain. [sent-50, score-0.324]
17 Here, we target OOV errors, though our marginal matching method is also applicable to learning translations for NTS words. [sent-51, score-0.416]
18 (2012); we learn a translation distribution despite a lack of parallel data. [sent-59, score-0.317]
19 Daumé III and Jagarlamudi (2011) mine translations for high frequency OOV words in NEW-domain text in order to do domain adaptation. [sent-63, score-0.291]
20 Although that work shows significant MT improvements, it is based primarily on distributional similarity, thus making it difficult to learn translations for low frequency source words with sparse word context counts. [sent-64, score-0.266]
21 Additionally, that work reports results using artificially created monolingual corpora taken from separate source and target halves of a NEWdomain parallel corpus, which may have more lexical overlap with the corresponding test set than we could expect from true monolingual corpora. [sent-65, score-0.378]
22 (2013) take a fundamentally different approach and construct a graph using source language monolingual text and identify translations for source language OOV words by pivoting through paraphrases. [sent-69, score-0.366]
23 3 Model Our goal is to recover a probabilistic translation dictionary in a NEW-domain, represented as a joint probability distribution pnew(s, t) over source/target word pairs. [sent-73, score-0.259]
24 At our disposal, we have access to a joint distribution pold(s, t) from the OLD-domain (computed from word alignments), plus comparable document pairs in the NEW-domain. [sent-74, score-0.541]
25 From these comparable documents, we can extract raw word frequencies on both the source and target side, represented as marginal distributions q(s) and q(t). [sent-75, score-0.401]
26 The key idea is to estimate this NEW-domain joint distribution to be as similar to the OLD-domain distribution as possible, subject to the constraint that its marginals match those of q. [sent-76, score-0.407]
27 In the NEW-domain comparable data, we find that accorder occurs 5 times, but grant occurs only once, and tune occurs 4 times. [sent-79, score-0.197]
28 Clearly accorder no longer translates as grant most of the time; perhaps we should shift much of its mass onto the translation tune instead. [sent-80, score-0.193]
29 First, we present an objective function and set of constraints over joint distributions to minimize the divergence from the OLD-domain distribution while matching both the source and target NEW-domain marginal distributions. [sent-82, score-0.489]
30 Optimizing this objective with a single pair of source and target marginals can be performed using an off-the-shelf solver. [sent-84, score-0.319]
31 Therefore, we present a sequential learning method for approximately matching the large set of document pair marginal distributions. [sent-87, score-0.419]
32 Finally, we describe how we identify comparable document pairs relevant to the NEW-domain. [sent-88, score-0.353]
33 Next, we find source and target marginal distributions, q(s) and q(t), by relative frequency estimates over the source and target comparable corpora. [sent-92, score-0.45]
34 Our goal is to recover a joint distribution pnew(s, t) for the new domain that matches the marginals, q(s) and q(t), but is minimally different from the original joint distribution, pold(s,t). [sent-93, score-0.296]
35 pnew = argmin_p divergence(p, pold), subject to: p(s,t) ≥ 0 for all s,t; Σ_s p(s,t) = q(t); Σ_t p(s,t) = q(s). (1) In the objective function, the joint probability matrices p and pold are interpreted as large vectors over all word pairs (s, t). [sent-104, score-0.332]
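A minimal Python sketch of the marginal-matching idea in Eq (1), using iterative proportional fitting (IPF) rather than the off-the-shelf solver the paper uses: IPF finds the KL-minimal rescaling of pold whose marginals match q(s) and q(t). It omits the λr sparsity penalty discussed in the following sentences and, because it only rescales, cannot introduce pairs where pold(s,t) = 0, so treat it as an illustrative approximation, not the paper's method. The toy numbers reuse the accorder/grant/tune example above.

    import numpy as np

    def marginal_match_ipf(p_old, q_s, q_t, iters=200, eps=1e-12):
        # Alternately rescale rows and columns of the OLD-domain joint so that
        # its marginals match the NEW-domain relative frequencies q_s and q_t.
        p = p_old.copy()
        for _ in range(iters):
            p *= (q_s / (p.sum(axis=1) + eps))[:, None]   # match source marginal q(s)
            p *= (q_t / (p.sum(axis=0) + eps))[None, :]   # match target marginal q(t)
        return p

    # Toy joint: row = {accorder}, columns = {grant, tune}; OLD-domain mass on grant.
    p_old = np.array([[0.8, 0.2]])
    q_s = np.array([1.0])          # accorder: 5/5 NEW-domain source tokens
    q_t = np.array([0.2, 0.8])     # grant: 1/5, tune: 4/5 NEW-domain target tokens
    print(marginal_match_ipf(p_old, q_s, q_t))   # mass shifts from grant to tune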
36 Following prior work (Ravi and Knight, 2011), we would like the matrix to remain as sparse as possible; that is, introduce the smallest number of new translation pairs necessary. [sent-106, score-0.2]
37 In this example, a translation is learned for the previously OOV word fille, and pregnant becomes a preferred translation for enceinte. [sent-108, score-0.355]
38 If the OLD-domain joint probability pold(s, t) was nonzero, there is no penalty. [sent-109, score-0.302]
39 To discourage the addition of translation pairs that are unnecessary in the new domain, we use a value of λr greater than one. [sent-111, score-0.2]
40 We define a penalty function f(p) as follows: if the normalized Levenshtein edit distance between s without accents and t is less than 0. [sent-117, score-0.209]
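The penalty's exact threshold is truncated in the extraction above ("less than 0."), so the sketch below takes it as a parameter; accent stripping via Unicode NFD decomposition, and the convention that near-cognates incur no penalty, are assumptions about the paper's setup.

    import unicodedata

    def strip_accents(word):
        # Decompose accented characters and drop the combining marks.
        return ''.join(c for c in unicodedata.normalize('NFD', word)
                       if unicodedata.category(c) != 'Mn')

    def levenshtein(a, b):
        # Standard dynamic-programming edit distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def edit_penalty(s, t, threshold=0.3):
        # Hypothetical threshold: the actual value is truncated in the text.
        s_plain = strip_accents(s)
        dist = levenshtein(s_plain, t) / max(len(s_plain), len(t), 1)
        return 0.0 if dist < threshold else 1.0   # no penalty for near-cognates

    print(edit_penalty('décroissance', 'decroissance'))   # 0.0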
41 We modify our approach to take advantage of the document correspondences within our comparable corpus. [sent-136, score-0.291]
42 In particular, we would like to match the marginals for all document pairs. [sent-137, score-0.339]
43 By maintaining separate marginal distributions, our algorithm is presented with more information. (We experimented with penalties measuring document-pair co-occurrence and monolingual frequency differences but did not see gains on our development sets.) [sent-138, score-0.371]
44 For example, imagine that one document pair uses “dog” and “chien”, where another document pair uses “cat” and “chat”, each with similar frequency. [sent-141, score-0.366]
45 If we sum these marginals to produce a single marginal distribution, it is now difficult to identify that “dog” should correspond to “chien” and not “chat.” [sent-142, score-0.361]
46 An initial formulation of our problem with multiple comparable document pairs might require the marginals to match all of the document marginals. [sent-144, score-0.692]
47 Instead, we take an incremental, online solution, considering a single comparable document pair at a time. [sent-146, score-0.325]
48 For document pair k, we solve the optimization problem in Eq (1) to find the joint distribution minimally different from pnew[1:k−1], while matching the marginals of this pair only. [sent-147, score-0.631]
49 We then update our current guess of the NEW-domain joint, pnew[1:k−1], toward this document-pair-specific distribution, much like a step in stochastic gradient ascent. [sent-149, score-0.229]
50 More formally, suppose that before processing the kth document we have a guess at the NEW-domain joint distribution, pnew[1:k−1] (the index indicates that it includes all document pairs up to and including document k − 1). [sent-150, score-0.421]
51 We first solve Eq (1) for this document pair, finding a joint distribution pnew[k] that matches the marginals of the kth document pair only and is minimally different from pnew[1:k−1]. [sent-152, score-0.509]
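A sketch of this sequential scheme, reusing marginal_match_ipf from the earlier sketch as the single-pair subproblem; restricting each subproblem to the words occurring in that pair, and the decaying step size, are assumptions not spelled out in the extracted text.

    import numpy as np

    def online_marginal_match(p_old, doc_pairs, step0=0.5):
        # doc_pairs yields (src_idx, tgt_idx, q_s, q_t): vocabulary indices of the
        # words in the k-th comparable pair and their relative-frequency marginals.
        p = p_old.copy()
        for k, (src_idx, tgt_idx, q_s, q_t) in enumerate(doc_pairs, 1):
            sub = p[np.ix_(src_idx, tgt_idx)]
            mass = sub.sum()
            if mass == 0:
                continue                                      # nothing to redistribute
            sub_k = marginal_match_ipf(sub / mass, q_s, q_t)  # pair-specific joint
            eta = step0 / k                                   # assumed decaying step
            p[np.ix_(src_idx, tgt_idx)] = (1 - eta) * sub + eta * mass * sub_k
            p /= p.sum()                                      # keep a proper joint
        return p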
52 Eight parallel learners update an initial joint distribution based on 100 document pairs (i. [sent-164, score-0.451]
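The sentence is truncated, so the parallelization below is an assumed iterative-parameter-mixing reading: each of the eight learners runs the online update on its own shard of 100 document pairs, and the resulting joints are averaged.

    def parallel_marginal_match(p_old, doc_pairs, n_learners=8, shard_size=100):
        shards = [doc_pairs[i * shard_size:(i + 1) * shard_size]
                  for i in range(n_learners)]
        # Each learner would run independently (in parallel in practice).
        joints = [online_marginal_match(p_old, shard) for shard in shards]
        p = sum(joints) / len(joints)    # mix the learners' joints by averaging
        return p / p.sum()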
53 3 Comparable Data Selection It remains to select comparable document pairs. [sent-168, score-0.291]
54 We assume that we have enough monolingual NEWdomain data in one language to rank comparable document pairs (here, Wikipedia pages) according to how NEW-domain-like they are. [sent-169, score-0.472]
55 For each Wikipedia document pair, we compute the percent of French phrases up to length four that are observed in the French monolingual NEW-domain corpus and rank document pairs by the geometric mean of the four overlap measures. [sent-175, score-0.479]
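A sketch of this selection step; scoring n-gram types rather than token counts is an assumption about how the percent-observed measure is computed.

    import math

    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def newdomain_score(doc_tokens, domain_ngrams):
        # domain_ngrams[n]: set of n-grams (n = 1..4) from the monolingual
        # NEW-domain French corpus. Geometric mean of the four overlap rates.
        overlaps = []
        for n in range(1, 5):
            doc_ng = ngrams(doc_tokens, n)
            if not doc_ng:
                return 0.0
            hit = len(doc_ng & domain_ngrams[n]) / len(doc_ng)
            overlaps.append(max(hit, 1e-9))   # guard against log(0)
        return math.exp(sum(math.log(o) for o in overlaps) / 4)

    # pairs.sort(key=lambda d: newdomain_score(d.french_tokens, domain_ngrams), reverse=True)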
56 1 Data We use French-English Hansard parliamentary proceedings as our OLD-domain parallel corpus. [sent-180, score-0.266]
57 With over 8 million parallel lines of text, it is one of the largest freely available parallel corpora for any language. (We could have, analogously, used the target language (English) side of the parallel corpus and measured overlap with the English Wikipedia documents, or even used both.) [sent-181, score-0.397]
58 , 2013a), and (3) a corpus of translated movie subtitles (Tiedemann, 2009). [sent-188, score-0.203]
59 We use the NEW-domain parallel training corpora only for language modeling and for identifying NEW-domain-like comparable documents. [sent-190, score-0.301]
60 Using Moses, we extract a phrase table with a phrase limit of five words and estimate the standard set of five feature functions (phrase and lexical translation probabilities in each direction and a constant phrase penalty feature). [sent-194, score-0.353]
61 3 Experiments For each domain, we use the marginal matching method described in Section 3 to learn a new, domain-adapted joint distribution, pknew(s, t), over all French and English words. [sent-201, score-0.297]
62 We supplement phrase tables with translations for OOV and low frequency words (we experiment with training data frequencies less than 101, 11, and 1) and include pknew(t|s) and pknew(s|t) as new translation features for those supplemental translations. [sent-205, score-0.396]
63 For phrase pairs extracted bilingually, we use the bilingually estimated translation probabilities and uniform scores for the new translation features. [sent-207, score-0.436]
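A sketch of this supplementation step in the standard Moses ' ||| '-separated phrase-table format; the exact feature layout (five standard features plus two marginal-matching features, with a constant standing in for the uniform scores) is an assumption about the wiring described here.

    def supplement_phrase_table(path_in, path_out, new_translations):
        # new_translations: dict s -> list of (t, p_t_given_s, p_s_given_t)
        # learned by marginal matching for OOV and low-frequency source words.
        with open(path_in) as fin, open(path_out, 'w') as fout:
            for line in fin:
                src, tgt, feats = line.rstrip('\n').split(' ||| ')[:3]
                # original bilingual entries get uniform scores (here: 1) for
                # the two new translation features
                fout.write(f"{src} ||| {tgt} ||| {feats} 1 1\n")
            for s, cands in new_translations.items():
                for t, p_ts, p_st in cands:
                    # new entries get uniform scores for the bilingual features
                    fout.write(f"{s} ||| {t} ||| 1 1 1 1 1 {p_ts} {p_st}\n")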
64 We experimented with using pknew(t|s) and pknew(s|t) to estimate additional lexical translation probabilities for the bilingually extracted phrase pairs but did not observe any gains (experimental details omitted due to space constraints). [sent-208, score-0.277]
65 We also perform oracle experiments in which we identify translations for French words in wordaligned development and test sets and append these translations to baseline phrase tables. [sent-210, score-0.493]
66 1 Semi-extrinsic evaluation Before doing end-to-end MT experiments, we evaluate our learned joint distribution, pknew(s, t), by comparing it to the joint distribution taken from a word aligned NEW-domain parallel development set, pgold (s, t). [sent-212, score-0.343]
67 Figure 3 shows the mean reciprocal rank for the learned distribution, pknew(s, t), for each domain as a function of the number of comparable document pairs used in learning. [sent-221, score-0.507]
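A minimal sketch of this semi-extrinsic evaluation, assuming the learned joint is held as a dict over (s, t) word pairs and the word-aligned development set supplies the references.

    def mean_reciprocal_rank(p_new, gold, tgt_vocab):
        # gold: dict s -> set of reference translations from the word-aligned
        # NEW-domain development set.
        rrs = []
        for s, refs in gold.items():
            ranked = sorted(tgt_vocab, key=lambda t: p_new.get((s, t), 0.0),
                            reverse=True)
            rank = next((i + 1 for i, t in enumerate(ranked) if t in refs), None)
            rrs.append(1.0 / rank if rank else 0.0)
        return sum(rrs) / len(rrs)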
68 In all domains, the comparable document pairs are sorted according to their similarity to the NEW-domain. (And, indeed, by default our decoder copies OOV strings into its output directly.) [sent-222, score-0.211]
69 For each source word s, the edit distance (ED) baseline ranks all English words t in our monolingual data by their edit distance with s. [sent-227, score-0.401]
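A sketch of the ED baseline, reusing strip_accents and levenshtein from the penalty-function sketch; stripping accents from the French word here mirrors the penalty function and is an assumption for this baseline.

    def edit_distance_baseline(s, english_vocab, k=10):
        s_plain = strip_accents(s)
        return sorted(english_vocab,
                      key=lambda t: levenshtein(s_plain, t)
                                    / max(len(s_plain), len(t), 1))[:k]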
70 The Canonical Correlation Analysis (CCA) baseline uses the approach of Daumé III and Jagarlamudi (2011) and the top 25,000 ranked document pairs as a comparable corpus. [sent-228, score-0.397]
71 For Science, learning is gradual and it appears that additional gains could be made by iterating over even more document pairs. [sent-232, score-0.23]
72 We experimented with making multiple learning passes over the document pairs and observed relatively small gains from doing so. [sent-240, score-0.292]
73 In all experiments, learning from some number of additional new document pairs resulted in higher semi-extrinsic performance gains than passing over document pairs which were already observed. [sent-241, score-0.503]
74 Table 1 shows some examples of what the marginal matching method learns for different types of source words (OOVs, low frequency, and NTS). [sent-244, score-0.287]
75 45, respectively, when we changed the default handling of OOVs to strip accents before … Table 1: Hand-picked examples of Science-domain French words and their top English translations in the OLD-domain, NEW-domain, and marginal matching distributions. [sent-257, score-0.464]
76 In the results presented below, including the baselines, we supplement phrase tables with a new candidate translation but also include accent-stripped identity, or ‘freebie,’ translations in the table for all OOV words. [sent-265, score-0.361]
77 2 BLEU improvement in the Science domain), so instead we simply include both types of translations in the phrase tables. [sent-267, score-0.223]
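A sketch of generating the 'freebie' entries described just above; the single indicator feature value is the assumed layout.

    def freebie_entries(oov_words):
        # Accent-stripped identity translations for every OOV word, each with
        # an indicator feature value.
        return [(w, strip_accents(w), 1.0) for w in oov_words]

    # freebie_entries(['décroissance']) -> [('décroissance', 'decroissance', 1.0)]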
78 That is, for all words w, we compute D(w), the vector indicating the document pairs in which w occurs, over the set of 50,000 document-pairs which are most NEWdomain-like. [sent-272, score-0.211]
79 For all French OOVs, we rank all English translations according to the cosine similarity between the pair of D(w) vectors. [sent-276, score-0.286]
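A sketch of this document co-occurrence baseline with binary D(w) indicator vectors over the top-ranked document pairs.

    import numpy as np

    def occurrence_vector(word, doc_word_sets):
        # D(w): indicator over the 50,000 most NEW-domain-like document pairs.
        return np.array([1.0 if word in ws else 0.0 for ws in doc_word_sets])

    def cooccurrence_rank(s, english_vocab, src_sets, tgt_sets, k=10):
        # src_sets/tgt_sets: aligned lists of French/English word sets, one per
        # comparable document pair.
        d_s = occurrence_vector(s, src_sets)
        def cos(t):
            d_t = occurrence_vector(t, tgt_sets)
            denom = np.linalg.norm(d_s) * np.linalg.norm(d_t)
            return float(d_s @ d_t) / denom if denom else 0.0
        return sorted(english_vocab, key=cos, reverse=True)[:k]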
80 For the CCA baseline comparison, we only learned translations using 25,000 Science-domain document pairs, rather than the full 50,000, and only for the Science domain rather than for all domains. [sent-278, score-0.415]
81 Along with each new translation pair, we include one new phrase table feature with the relevant translation score (edit distance, document similarity, or CCA distributional similarity). [sent-281, score-0.468]
82 For all baselines other than drop-OOVs, we also include accent-stripped translation pairs with an additional indicator feature. [sent-282, score-0.2]
83 Using document pair co-occurrences is the strongest baseline for the Science and EMEA domains. [sent-285, score-0.227]
84 This confirms our intuition that taking advantage of document pair alignments is worthwhile. [sent-286, score-0.215]
85 For Science and EMEA, supplementing a model with OOV translations learned through our marginal matching method drastically outperforms all baselines. [sent-287, score-0.554]
86 Also in the third example, the low frequency word décroissance is translated as the MM-hypothesized incorrect translation linear. [sent-305, score-0.232]
87 We observe additional gains by also supplementing the model with translations for low frequency French words. [sent-313, score-0.392]
88 Table 3 also shows the result of supplementing a baseline phrase table with oracle OOV translations. [sent-318, score-0.229]
89 Using the marginal matching learned OOV translations takes us 30% and 40% of the way from the baseline to the oracle upper bound for Science and EMEA, respectively. [sent-319, score-0.548]
90 6 Discussion BLEU score performance gains are substantial for the Science and EMEA domains, but we don’t observe gains on the subtitles text. [sent-331, score-0.306]
91 Our Science and EMEA corpora are certainly different in domain from the OLD-domain parliamentary proceedings, and our success in boosting MT performance with our methods indicates that the Wikipedia comparable corpora that we mined match those domains well. [sent-335, score-0.522]
92 In contrast, the subtitles data differs from the OLD-domain parliamentary proceedings in both domain and register. [sent-336, score-0.367]
93 Although the Wikipedia data that we mined may be closer in domain to the subtitles data than the parliamentary proceedings, its register is certainly not film dialogues. [sent-337, score-0.403]
94 Although the use of marginal matching is, to the best of our knowledge, novel in MT, there are related threads of research that might inspire future work. [sent-338, score-0.236]
95 The intuition that we should match marginal distributions is similar to work using no example labels but only label proportions to estimate labels, for example in Quadrianto et al. [sent-339, score-0.244]
96 Also, while the marginal matching objective seems effective in practice, it is difficult to optimize. [sent-344, score-0.28]
97 Considering the marginal distributions from each document pair to be a separate subproblem, we could approach the global objective of satisfying all subproblems as an instance of dual decomposition (Sontag et al. [sent-346, score-0.435]
98 7 Conclusions We proposed a model for learning a joint distribution of source-target word pairs based on the idea that its marginals should match those observed in NEW-domain comparable corpora. [sent-355, score-0.515]
99 Supplementing a baseline phrase-based SMT model with learned translations results in BLEU score gains of about two points in the medical and science domains. [sent-356, score-0.401]
100 Extracting parallel sentences from comparable corpora using document level alignment. [sent-536, score-0.45]
wordName wordTfidf (topN-words)
[('oov', 0.46), ('emea', 0.257), ('french', 0.236), ('marginals', 0.19), ('translations', 0.18), ('marginal', 0.171), ('document', 0.149), ('parliamentary', 0.147), ('pknew', 0.147), ('subtitles', 0.144), ('comparable', 0.142), ('translation', 0.138), ('pold', 0.128), ('parallel', 0.119), ('nts', 0.109), ('mt', 0.107), ('supplementing', 0.096), ('daum', 0.092), ('newdomain', 0.092), ('pnew', 0.092), ('irvine', 0.09), ('oovs', 0.087), ('monolingual', 0.084), ('jagarlamudi', 0.081), ('gains', 0.081), ('ekw', 0.08), ('carpuat', 0.08), ('domains', 0.077), ('edit', 0.077), ('domain', 0.076), ('cca', 0.073), ('hal', 0.066), ('iii', 0.066), ('matching', 0.065), ('schafer', 0.064), ('bleu', 0.063), ('wikipedia', 0.063), ('pairs', 0.062), ('joint', 0.061), ('distribution', 0.06), ('translated', 0.059), ('hansard', 0.058), ('accorder', 0.055), ('bilingually', 0.055), ('medical', 0.054), ('source', 0.051), ('gurobi', 0.051), ('penalty', 0.05), ('english', 0.048), ('translate', 0.048), ('accents', 0.048), ('marine', 0.048), ('johns', 0.047), ('oracle', 0.046), ('scientific', 0.044), ('ann', 0.044), ('objective', 0.044), ('razmara', 0.044), ('baseline', 0.044), ('phrase', 0.043), ('mira', 0.043), ('foreign', 0.043), ('learned', 0.042), ('jagadeesh', 0.041), ('corpora', 0.04), ('hopkins', 0.039), ('smt', 0.038), ('minimally', 0.038), ('bilingual', 0.038), ('distributions', 0.037), ('similarity', 0.037), ('old', 0.037), ('argmpin', 0.037), ('damt', 0.037), ('ecroissance', 0.037), ('enceinte', 0.037), ('freebie', 0.037), ('gabay', 0.037), ('glowinski', 0.037), ('katharine', 0.037), ('pnkew', 0.037), ('pregnant', 0.037), ('quadrianto', 0.037), ('register', 0.036), ('fung', 0.036), ('ravi', 0.036), ('estimate', 0.036), ('rank', 0.035), ('frequency', 0.035), ('distance', 0.034), ('pair', 0.034), ('chris', 0.033), ('alignments', 0.032), ('mm', 0.032), ('knight', 0.032), ('chien', 0.032), ('robertson', 0.032), ('cisaillement', 0.032), ('comparability', 0.032), ('sontag', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
2 0.25291127 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
3 0.15570997 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
4 0.15100378 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
Author: Dhouha Bouamor ; Adrian Popescu ; Nasredine Semmar ; Pierre Zweigenbaum
Abstract: Bilingual lexicons are central components of machine translation and cross-lingual information retrieval systems. Their manual construction requires strong expertise in both languages involved and is a costly process. Several automatic methods were proposed as an alternative but they often rely on resources available in a limited number of languages and their performances are still far behind the quality of manual translations. We introduce a novel approach to the creation of specific domain bilingual lexicon that relies on Wikipedia. This massively multilingual encyclopedia makes it possible to create lexicons for a large number of language pairs. Wikipedia is used to extract domains in each language, to link domains between languages and to create generic translation dictionaries. The approach is tested on four specialized domains and is compared to three state of the art approaches using two language pairs: FrenchEnglish and Romanian-English. The newly introduced method compares favorably to existing methods in all configurations tested.
5 0.14921372 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
6 0.14165002 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
7 0.12749229 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
8 0.12288327 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
9 0.11531309 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
10 0.11248416 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
11 0.10117711 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
12 0.098804049 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
13 0.094161689 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
14 0.093223177 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
15 0.092465758 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
16 0.090933882 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora
17 0.086003453 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
18 0.085947908 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
19 0.084721774 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
20 0.084691979 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
topicId topicWeight
[(0, -0.273), (1, -0.201), (2, 0.024), (3, 0.012), (4, 0.117), (5, -0.059), (6, 0.005), (7, 0.024), (8, 0.024), (9, -0.249), (10, 0.022), (11, -0.059), (12, 0.137), (13, -0.036), (14, 0.038), (15, -0.029), (16, -0.074), (17, 0.01), (18, 0.003), (19, 0.099), (20, -0.001), (21, 0.053), (22, 0.074), (23, 0.089), (24, 0.087), (25, -0.094), (26, -0.075), (27, 0.07), (28, 0.052), (29, 0.003), (30, 0.067), (31, -0.112), (32, 0.046), (33, 0.06), (34, 0.025), (35, -0.117), (36, -0.015), (37, 0.015), (38, 0.035), (39, -0.086), (40, 0.035), (41, 0.055), (42, -0.075), (43, -0.031), (44, -0.022), (45, 0.048), (46, -0.059), (47, -0.064), (48, -0.132), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.94047654 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
2 0.85454774 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
3 0.69088215 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
Author: Xiaoning Zhu ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Conghui Zhu ; Tiejun Zhao
Abstract: This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a
4 0.68981022 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
Author: Dhouha Bouamor ; Adrian Popescu ; Nasredine Semmar ; Pierre Zweigenbaum
Abstract: Bilingual lexicons are central components of machine translation and cross-lingual information retrieval systems. Their manual construction requires strong expertise in both languages involved and is a costly process. Several automatic methods were proposed as an alternative but they often rely on resources available in a limited number of languages and their performances are still far behind the quality of manual translations. We introduce a novel approach to the creation of specific domain bilingual lexicon that relies on Wikipedia. This massively multilingual encyclopedia makes it possible to create lexicons for a large number of language pairs. Wikipedia is used to extract domains in each language, to link domains between languages and to create generic translation dictionaries. The approach is tested on four specialized domains and is compared to three state of the art approaches using two language pairs: FrenchEnglish and Romanian-English. The newly introduced method compares favorably to existing methods in all configurations tested.
5 0.68045825 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
6 0.64034605 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
7 0.58522135 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
8 0.56477231 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
9 0.55504757 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
10 0.48107198 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
11 0.48067769 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
12 0.47842649 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
13 0.46576881 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
14 0.45772821 151 emnlp-2013-Paraphrasing 4 Microblog Normalization
15 0.45417058 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
16 0.45240888 156 emnlp-2013-Recurrent Continuous Translation Models
17 0.43835366 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
18 0.43216103 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
19 0.43185845 54 emnlp-2013-Decipherment with a Million Random Restarts
20 0.42557052 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
topicId topicWeight
[(3, 0.037), (18, 0.035), (22, 0.047), (30, 0.083), (43, 0.011), (45, 0.01), (50, 0.02), (51, 0.162), (66, 0.028), (71, 0.019), (75, 0.023), (77, 0.41), (96, 0.015)]
simIndex simValue paperId paperTitle
1 0.97718096 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
2 0.96800315 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
same-paper 3 0.8192637 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
Author: Ann Irvine ; Chris Quirk ; Hal Daume III
Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.
4 0.79671383 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
5 0.62918872 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significant improvements over the constituency-to-string (+2.45 BLEU on average) and dependency-to-string (+0.91 BLEU on average) models, which only employ a single type of trees, and significantly outperforms the state-of-the-art hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
6 0.60305923 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
7 0.59435874 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
8 0.58964193 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
9 0.58504659 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
10 0.58177233 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
11 0.5798499 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
12 0.57981473 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
13 0.57484525 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
14 0.57483256 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
15 0.56067127 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
16 0.54929674 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation
17 0.543163 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
18 0.53785497 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
19 0.5371893 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
20 0.53108823 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training