emnlp emnlp2013 emnlp2013-38 knowledge-graph by maker-knowledge-mining

38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation


Source: pdf

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 org cer Abstract We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. [sent-5, score-1.413]

2 We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. [sent-6, score-1.218]

3 The new embeddings significantly out-perform baselines in word semantic similarity. [sent-7, score-0.848]

4 A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task. [sent-8, score-1.303]

5 1 Introduction: It is difficult to recognize and quantify semantic similarities across languages. [sent-9, score-0.14]

6 The Fr-En phrase-pair {‘un cas de force majeure’, ‘case of absolute necessity’} and the Zh-En phrase-pair {‘依然故我’, ‘persist in a stubborn manner’} are similar in semantics. [sent-10, score-0.034]

7 If co-occurrences of exact word combinations are rare in the training parallel text, it can be difficult for classical statistical MT methods to identify this similarity, or produce a reasonable translation given the source phrase. [sent-11, score-0.131]

8 We introduce an unsupervised neural model to learn bilingual semantic embeddings for words across two languages. [sent-12, score-0.567]

9 As an extension to their monolingual counterpart (Turian et al., 2010), bilingual embeddings capture not only semantic information of monolingual words, but also semantic relationships across different languages. [sent-13, score-0.088] [sent-16, score-1.325]

11 This property allows them to define semantic similarity metrics across phrase-pairs, making them perfect features for machine translation. [sent-17, score-0.16]

12 To learn bilingual embeddings, we use a new objective function which embodies both monolingual semantics and bilingual translation equivalence. [sent-18, score-0.882]

13 The latter utilizes word alignments, a natural sub-task in the machine translation pipeline. [sent-19, score-0.143]

14 , 2009), we obtain bilingual distributed representations which lie in the same feature space. [sent-21, score-0.403]

15 Embeddings of direct translations overlap, and semantic relationships across bilingual embeddings are further improved through unsupervised learning on a large unlabeled corpus. [sent-22, score-1.223]

16 Consequently, we produce for the research community a first set of Mandarin Chinese word embeddings with 100,000 words trained on the Chinese Gigaword corpus. [sent-23, score-0.809]

17 We evaluate these embeddings on Chinese word semantic similarity from SemEval-2012 (Jin and Wu, 2012). [sent-24, score-0.213]

18 The embeddings significantly out-perform prior work and pruned tf-idf baselines. [sent-25, score-0.796]

19 In addition, the learned embeddings give rise to 0. [sent-26, score-0.743]

20 We apply the bilingual embeddings in an end-to-end phrase-based MT system by computing semantic similarities between phrase pairs. [sent-29, score-1.211]

21 On NIST08 Chinese-English translation task, we obtain an improvement of 0. [sent-30, score-0.106]

22 2 Review of prior work: Distributed word representations are useful in NLP applications such as information retrieval (Paşca et al.). [sent-36, score-0.082]

23 A number of methods have been explored to train and apply word embeddings using continuous models for language. [sent-41, score-0.832]

24 Collobert and Weston (2008) learn embeddings in an unsupervised manner through a contrastive estimation technique. [sent-43, score-0.766]

25 Huang et al. (2012) introduced global document context and multiple word prototypes. [sent-48, score-0.037]

26 Recently, morphology has been explored to learn better word representations through recursive neural networks (Luong et al., 2013). [sent-49, score-0.106]

27 Bilingual word representations have been explored with hand-designed vector space models (Peirsman and Padó, 2010; Sumita, 2000), and with unsupervised algorithms such as LDA and LSA (Boyd-Graber and Resnik, 2010; Tam et al.). [sent-51, score-0.129]

28 Only recently have continuous space models been applied to machine translation (Le et al., 2012). [sent-53, score-0.134]

29 Despite growing interest in these models, little work has been done along the same lines to train bilingual distributed word representations to improve machine translation. [sent-55, score-0.363]

30 In this paper, we learn bilingual word embeddings which achieve competitive performance on semantic word similarity, and apply them in a practical phrase-based MT system. [sent-56, score-1.243]

31 3.1 Unsupervised training with global context: Our method starts with embedding learning formulations in Collobert et al. (2008). [sent-58, score-0.048]

32 Given a context window c in a document d, the optimization minimizes the following Context Objective for a word w in the vocabulary: $J_{CO}(c,d) = \sum_{w_r \in V_R} \max(0,\ 1 - f(c_w, d) + f(c_{w_r}, d))$ (1). Here f is a scoring function defined by a neural network. [sent-60, score-0.131]

33 Here $w_r$ is a word chosen from a random subset $V_R$ of the vocabulary, and $c_w$ is the context window containing the word w. This unsupervised objective contrasts the score obtained when the correct word is placed in context against the score when a random word is placed in the same context. [sent-61, score-0.289]
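As an illustration of this contrastive objective, a minimal NumPy sketch follows; the scorer, the sampling scheme, and all names here are hypothetical stand-ins rather than the authors' implementation, whose f is a neural network with global document context.

import numpy as np

rng = np.random.default_rng(0)

def score(f_weights, context_vecs):
    # Stand-in for f: average the window's word vectors and take a
    # weighted sum; the paper's f is a neural network.
    return float(np.tanh(context_vecs.mean(axis=0)) @ f_weights)

def context_objective(f_weights, window, vocab_vecs, n_neg=10):
    # Eq. (1): hinge loss contrasting the true center word against
    # randomly drawn replacement words w_r from the subset V_R.
    loss = 0.0
    for r in rng.choice(len(vocab_vecs), size=n_neg, replace=False):
        corrupted = window.copy()
        corrupted[len(window) // 2] = vocab_vecs[r]  # swap in a random word
        loss += max(0.0, 1.0 - score(f_weights, window) + score(f_weights, corrupted))
    return loss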

34 3.2 Bilingual initialization and training: In the joint semantic space of words across two languages, the Chinese word ‘政府’ is expected to be close to its English translation ‘government’. [sent-66, score-0.305]

35 Even for word pairs rarely aligned in parallel text, e.g. the English word ‘lake’ and the Chinese word ‘潭’ (deep pond), their semantic proximity could be correctly quantified. [sent-69, score-0.105]

36 We describe in the next sub-sections the methods to initialize and train bilingual embeddings. [sent-70, score-0.326]

37 These methods ensure that bilingual embeddings retain their translational equivalence while their distributional semantics are improved during online training with a monolingual corpus. [sent-71, score-1.299]

38 3.2.1 Initialization by MT alignments: First, we use MT alignment counts as weights to initialize Chinese word embeddings. [sent-74, score-0.117]

39 In our experiments, we use MT word alignments extracted with the Berkeley Aligner (Liang et al., 2006). [sent-75, score-0.092]

40 Specifically, we use the following equation to compute starting word embeddings: $W_{t\text{-}init} = \sum_{s=1}^{S} \frac{C_{ts} + 1}{C_t + S}\, W_s$ (2). In this equation, S is the number of possible target-language words that are aligned with the source word. [sent-77, score-0.106]

41 $C_{ts}$ denotes the number of times word t in the target and word s in the source are aligned in the training parallel text; $C_t$ denotes the total count of word t in the target language. [sent-78, score-0.187]
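A minimal sketch of this initialization, assuming the alignment counts for a single word are available as a dictionary; the function name and data layout are hypothetical.

import numpy as np

def init_target_embedding(align_counts, en_vecs):
    # Eq. (2): W_t-init = sum_{s=1..S} (C_ts + 1) / (C_t + S) * W_s.
    # align_counts: {english word index s: C_ts} for the word t being
    # initialized; en_vecs: (V_en, d) matrix of English embeddings W_s.
    S = len(align_counts)                 # number of aligned words
    C_t = sum(align_counts.values())      # total alignment count of word t
    w = np.zeros(en_vecs.shape[1])
    for s, c_ts in align_counts.items():
        w += (c_ts + 1) / (C_t + S) * en_vecs[s]
    return w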

42 (Footnote 1: on NIST08 Zh-En training data and data from GALE MT evaluations in the past 5 years.) Single-prototype English embeddings by Huang et al. (2012) provide the English vectors for this initialization. [sent-80, score-0.743]

43 The initialization readily provides a set (Align-Init) of benchmark embeddings for the experiments (Section 4), and ensures translation equivalence in the embeddings at the start of training. [sent-82, score-1.739]

44 3.2.2 Bilingual training: Using the alignment counts, we form alignment matrices $A_{en\to zh}$ and $A_{zh\to en}$. [sent-85, score-0.154]

45 For $A_{en\to zh}$, each row corresponds to a Chinese word and each column to an English word. [sent-86, score-0.025]

46 An element $a_{ij}$ is first assigned the number of times the i-th Chinese word is aligned with the j-th English word in the parallel text. [sent-87, score-0.15]

47 After assignments, each row is normalized such that it sums to one. [sent-88, score-0.025]

48 Denote the set of Chinese word embeddings as $V_{zh}$, with each row a word embedding, and the set of English word embeddings as $V_{en}$. [sent-90, score-1.622]

49 With the two alignment matrices, we define the Translation Equivalence Objective: $J_{TEO\text{-}en\to zh} = \|V_{zh} - A_{en\to zh} V_{en}\|^2$ (3) and $J_{TEO\text{-}zh\to en} = \|V_{en} - A_{zh\to en} V_{zh}\|^2$ (4). We optimize a combined objective during training. [sent-91, score-0.137]

50 For the Chinese embeddings we optimize $J_{CO\text{-}zh} + \lambda J_{TEO\text{-}en\to zh}$ (5); for the English embeddings we optimize $J_{CO\text{-}en} + \lambda J_{TEO\text{-}zh\to en}$ (6). During bilingual training, we choose the value of $\lambda$ such that convergence is achieved for both $J_{CO}$ and $J_{TEO}$. [sent-92, score-1.86]
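The pieces of Equations 3-6 are straightforward to sketch in NumPy; this toy version (hypothetical names, and assuming no all-zero rows in the count matrix) shows the row normalization, the squared-error translation-equivalence term, and the λ-weighted combination.

import numpy as np

def row_normalize(counts):
    # Section 3.2.2: each row of the alignment-count matrix sums to one.
    return counts / counts.sum(axis=1, keepdims=True)

def teo_loss(V_zh, V_en, A_en2zh):
    # Eq. (3): || V_zh - A_en2zh V_en ||^2 (squared Frobenius norm).
    diff = V_zh - A_en2zh @ V_en
    return float((diff ** 2).sum())

def combined_zh_loss(jco_zh, V_zh, V_en, A_en2zh, lam=0.1):
    # Eq. (5): monolingual context objective plus the TEO term;
    # jco_zh stands in for J_CO-zh computed elsewhere.
    return jco_zh + lam * teo_loss(V_zh, V_en, A_en2zh)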

51 A small validation set of word similarities from (Jin and Wu, 2012) is used to ensure the embeddings have reasonable semantics. [sent-93, score-0.82]

52 In the next sections, ‘bilingual trained’ embeddings refer to those initialized with MT alignments and trained with the objective defined by Equation 5. [sent-94, score-0.888]

53 ‘Monolingual trained’ embeddings refer to those initialized by alignment but trained without $J_{TEO\text{-}en\to zh}$. [sent-95, score-0.849]

54 3.2.3 Curriculum training: We train 100k-vocabulary word embeddings using curriculum training (Turian et al., 2010). [sent-98, score-0.923]

55 For each curriculum, we sort the vocabulary by frequency and segment the vocabulary by a band-size taken from {5k, 10k, 25k, 50k}. [sent-100, score-0.082]

56 Separate bands of the vocabulary are trained in parallel using minibatch L-BFGS on the Chinese Gigaword corpus (Fifth Edition). [sent-101, score-0.123]

57 We train 100,000 iterations for each curriculum, and the entire 100k vocabulary is trained for 500,000 iterations. [sent-102, score-0.07]
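A sketch of the banding step under stated assumptions (word_counts maps words to corpus frequencies; the trainer is hypothetical; the actual optimizer is minibatch L-BFGS):

def frequency_bands(word_counts, band_size=25000):
    # Sort the vocabulary by frequency and cut it into bands; the paper
    # draws band sizes from {5k, 10k, 25k, 50k}.
    vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    return [vocab[i:i + band_size] for i in range(0, len(vocab), band_size)]

# Each band would then be trained in parallel on the monolingual corpus:
#   for band in frequency_bands(counts):
#       train_band(band)   # hypothetical per-band trainer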

58 We show a visualization of the learned embeddings overlaid with English in Figure 1. [sent-104, score-0.825]

59 The two-dimensional vectors for this visualization are obtained with t-SNE (van der Maaten and Hinton, 2008). [sent-105, score-0.056]

60 To make the figure comprehensible, subsets of Chinese words are provided with reference translations in boxes with green borders. [sent-106, score-0.127]

61 Words across the two languages are positioned by the semantic relationships implied by their embeddings. [sent-107, score-0.124]

62 Figure 1: Overlaid bilingual embeddings: English words are plotted in yellow boxes, and Chinese words in green; reference translations to English are provided in boxes with green borders directly below the original word. [sent-108, score-0.475]

63 4.1 Semantic Similarity: We evaluate the Mandarin Chinese embeddings with the semantic similarity test-set provided by the organizers. [sent-110, score-0.871]

64 This test-set contains 297 Chinese word pairs with similarity scores estimated by humans. [sent-128, score-0.097]
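A word-pair test set of this kind is typically scored by correlating the model's cosine similarities with the human ratings; a minimal SciPy sketch follows (function names hypothetical; the paper reports Spearman and Kendall correlations).

import numpy as np
from scipy.stats import spearmanr, kendalltau

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_similarity(pairs, human_scores, emb):
    # pairs: list of (word1, word2) tuples; emb: dict word -> vector.
    model_scores = [cosine(emb[w1], emb[w2]) for w1, w2 in pairs]
    return spearmanr(model_scores, human_scores)[0], kendalltau(model_scores, human_scores)[0]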

65 The results for semantic similarity are shown in Table 1. [sent-129, score-0.128]

66 For both metrics, bilingual embeddings trained with the combined objective defined by Equation 5 perform best. [sent-131, score-1.134]

67 (2012) and count word co-occurrences in a 10-word window. [sent-134, score-0.037]

68 The bilingual and monolingual trained embeddings (footnote 4) out-perform pruned tf-idf by 14. [sent-136, score-0.496]

69 Further, they out-perform embeddings initialized from alignment by 7. [sent-139, score-0.82]

70 Both our tf-idf implementation and the word embeddings have significantly higher Kendall's tau values compared to prior work (Jin and Wu, 2012). [sent-142, score-0.78]

71 We perform Named Entity Recognition on OntoNotes (Hovy et al., 2006) to validate the quality of the Chinese word embeddings. [sent-147, score-0.037]

72 With the embeddings, we build a naive feed-forward neural network (Collobert et al., 2008) with 2000 hidden neurons and a sliding window of five words. [sent-150, score-0.141] [sent-151, score-0.024]
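A minimal sketch of such a window network (hypothetical shapes and names; the paper follows the Collobert et al. (2008) architecture with 2000 hidden units):

import numpy as np

def window_tag_scores(window_vecs, W1, b1, W2, b2):
    # Concatenate the five word vectors of the sliding window, pass
    # through one tanh hidden layer (2000 units in the paper), and emit
    # unnormalized scores over NER tags.
    x = np.concatenate(window_vecs)
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2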

74 This naive setting, without sequence modeling or sophisticated joint optimization, is not competitive with the state-of-the-art (Wang et al.). (Footnote 4: due to variations caused by online minibatch L-BFGS, we take embeddings from five random points out of the last 10^5 minibatch iterations and average their semantic similarity results.) [sent-152, score-1.025] [sent-174, score-0.032]

76 Table 2 shows that the bilingual embeddings obtain 0. [sent-176, score-1.069]

77 4.3 Vector matching alignment: Translation equivalence of the bilingual embeddings is evaluated by a naive word alignment that matches word embeddings by cosine distance. [sent-181, score-2.165]

78 The Alignment Error Rates (AER) reported in Table 3 suggest that bilingual training using Equation 5 produces embeddings with better translation equivalence than those produced by monolingual training. [sent-182, score-1.348]
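The naive matching itself reduces to a nearest-neighbor search under cosine similarity; a sketch follows (hypothetical names, assuming nonzero embedding norms):

import numpy as np

def match_by_cosine(V_src, V_tgt):
    # Normalize rows, then pick the highest-cosine target word for each
    # source word; mismatches against gold alignments yield the AER.
    Vs = V_src / np.linalg.norm(V_src, axis=1, keepdims=True)
    Vt = V_tgt / np.linalg.norm(V_tgt, axis=1, keepdims=True)
    return (Vs @ Vt.T).argmax(axis=1)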

79 4.4 Phrase-based machine translation: Our experiments are performed using the Stanford Phrasal phrase-based machine translation system (Cer et al., 2010). [sent-184, score-0.212]

80 In addition to NIST08 training data, we perform phrase extraction, filtering and phrase table learning with additional data from GALE MT evaluations in the past 5 years. [sent-186, score-0.068]

81 In the phrase-based MT system, we add one feature to bilingual phrase-pairs. [sent-190, score-0.326]

82 For each phrase, the word embeddings are averaged to obtain a feature vector. [sent-191, score-0.78]

83 If a word is not found in the vocabulary, we disregard it and assume it is not in the phrase; if no word in a phrase is found, a zero vector is assigned. (Footnote 5: this is evaluated on 10,000 randomly selected sentence pairs from the MT training set.) [sent-192, score-0.074]

84 Table 4: NIST08 Chinese-English translation BLEU. Columns: Method, BLEU; first row: Our baseline, 30. [sent-193, score-0.106]

85 We then compute the cosine distance between the feature vectors of a phrase pair to form a semantic similarity feature for the decoder. [sent-199, score-0.162]
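Putting items 82, 83 and 85 together, a sketch of the decoder feature (hypothetical names; OOV handling as described above; the score here is cosine similarity used as a decoder feature):

import numpy as np

def phrase_vector(words, emb, dim):
    vecs = [emb[w] for w in words if w in emb]   # disregard OOV words
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def phrase_pair_feature(src_words, tgt_words, emb_src, emb_tgt, dim):
    u = phrase_vector(src_words, emb_src, dim)
    v = phrase_vector(tgt_words, emb_tgt, dim)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0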

86 Results on the NIST08 Chinese-English translation task are reported in Table 4. [sent-200, score-0.106]

87 An improvement of 0.48 BLEU is obtained with the semantic similarity feature from bilingual embeddings. [sent-202, score-0.454]

88 From this suggestive evidence in the MT results, randomly initialized ‘monolingual trained’ embeddings add little gain to the baseline. [sent-207, score-0.923]

89 Bilingual initialization and training seem to offer relatively more consistent gains by introducing translational equivalence. [sent-208, score-0.119]

90 5 Conclusion: In this paper, we introduce bilingual word embeddings through initialization and optimization constraints using MT alignments. The embeddings are learned through curriculum training on the Chinese Gigaword corpus. [sent-209, score-1.366]

91 We show good performance on Chinese semantic similarity with bilingual trained embeddings. [sent-210, score-0.483]

92 When used to compute semantic similarity of phrase pairs, bilingual embeddings improve NIST08 end-to-end machine translation results by just below half a BLEU point. [sent-211, score-1.337]

93 This implies that semantic embeddings are useful features for improving MT systems. [sent-212, score-0.811]

94 Further, our results offer suggestive evidence that bilingual word embeddings act as high-quality semantic features and embody bilingual translation equivalence across languages. [sent-213, score-1.761]

95 Holistic sentiment analysis across languages: multilingual supervised latent dirichlet allocation. [sent-250, score-0.055]

96 A unified architecture for natural language processing: Deep neural networks with multitask learning. [sent-271, score-0.101]

97 Fast and adaptive online training of feature-rich translation models. [sent-296, score-0.106]

98 Better word representations with recursive neural networks for morphology. [sent-371, score-0.224]

99 Names and similarities on the web: fact extraction in the fast lane. [sent-441, score-0.04]

100 Cross-lingual induction of selectional preferences with bilingual vector spaces. [sent-447, score-0.326]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('embeddings', 0.743), ('bilingual', 0.326), ('chinese', 0.145), ('curriculum', 0.143), ('mt', 0.128), ('cer', 0.113), ('translation', 0.106), ('bleu', 0.097), ('monolingual', 0.088), ('equivalence', 0.085), ('azh', 0.077), ('luong', 0.077), ('tau', 0.077), ('alignment', 0.077), ('neural', 0.07), ('turian', 0.068), ('semantic', 0.068), ('bengio', 0.067), ('zh', 0.065), ('initialization', 0.062), ('jin', 0.062), ('collobert', 0.062), ('reisinger', 0.061), ('similarity', 0.06), ('socher', 0.057), ('minibatch', 0.057), ('translational', 0.057), ('alignments', 0.055), ('huang', 0.054), ('pruned', 0.053), ('cwr', 0.051), ('overlaid', 0.051), ('green', 0.049), ('embedding', 0.048), ('ontonotes', 0.047), ('boxes', 0.047), ('representations', 0.045), ('aer', 0.045), ('kendall', 0.045), ('morin', 0.045), ('equation', 0.043), ('vocabulary', 0.041), ('aen', 0.041), ('maaten', 0.041), ('tam', 0.041), ('recursive', 0.041), ('hinton', 0.04), ('similarities', 0.04), ('naive', 0.04), ('mandarin', 0.038), ('gale', 0.038), ('suggestive', 0.038), ('word', 0.037), ('vr', 0.036), ('objective', 0.036), ('gigaword', 0.035), ('phrasal', 0.035), ('phrase', 0.034), ('mnih', 0.034), ('stanford', 0.033), ('hovy', 0.033), ('wu', 0.032), ('distributed', 0.032), ('competitive', 0.032), ('across', 0.032), ('visualization', 0.031), ('peirsman', 0.031), ('networks', 0.031), ('translations', 0.031), ('network', 0.031), ('spearman', 0.03), ('pennington', 0.03), ('manning', 0.03), ('pad', 0.029), ('placed', 0.029), ('mikolov', 0.029), ('trained', 0.029), ('autoencoders', 0.028), ('continuous', 0.028), ('xing', 0.027), ('aligned', 0.026), ('counts', 0.025), ('english', 0.025), ('row', 0.025), ('parallel', 0.025), ('nist', 0.025), ('initialized', 0.025), ('der', 0.025), ('window', 0.024), ('explored', 0.024), ('languages', 0.024), ('deep', 0.024), ('optimize', 0.024), ('minimum', 0.023), ('sentiment', 0.023), ('ca', 0.023), ('unsupervised', 0.023), ('galley', 0.023), ('lifchits', 0.022), ('borders', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999911 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.

2 0.18755651 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

3 0.18073535 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

4 0.17568505 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

Author: Ivan Vulic ; Marie-Francine Moens

Abstract: We present a new language pair agnostic approach to inducing bilingual vector spaces from non-parallel data without any other resource in a bootstrapping fashion. The paper systematically introduces and describes all key elements of the bootstrapping procedure: (1) starting point or seed lexicon, (2) the confidence estimation and selection of new dimensions of the space, and (3) convergence. We test the quality of the induced bilingual vector spaces, and analyze the influence of the different components of the bootstrapping approach in the task of bilingual lexicon extraction (BLE) for two language pairs. Results reveal that, contrary to conclusions from prior work, the seeding of the bootstrapping process has a heavy impact on the quality of the learned lexicons. We also show that our approach outperforms the best performing fully corpus-based BLE methods on these test sets.

5 0.15463327 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the loglikelihood of the documents from both language domains under a cross-lingual logbilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the pro- posed cross-lingual adaptation approach.

6 0.12195998 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

7 0.11710663 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

8 0.11532962 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

9 0.11292446 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

10 0.11067296 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora

11 0.10734521 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge

12 0.10180495 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

13 0.099062197 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

14 0.094161689 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

15 0.091411665 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation

16 0.088990264 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues

17 0.084476903 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

18 0.079104714 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

19 0.07855624 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

20 0.078197323 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.232), (1, -0.163), (2, 0.022), (3, -0.055), (4, 0.12), (5, 0.071), (6, 0.017), (7, 0.099), (8, -0.16), (9, -0.05), (10, 0.172), (11, -0.039), (12, 0.166), (13, -0.133), (14, -0.079), (15, -0.094), (16, -0.0), (17, 0.108), (18, 0.053), (19, -0.027), (20, -0.043), (21, -0.094), (22, 0.014), (23, 0.088), (24, -0.133), (25, 0.037), (26, 0.067), (27, 0.013), (28, -0.081), (29, -0.024), (30, -0.118), (31, 0.051), (32, -0.198), (33, 0.118), (34, -0.233), (35, -0.14), (36, -0.125), (37, -0.02), (38, -0.012), (39, -0.075), (40, -0.046), (41, -0.008), (42, 0.029), (43, -0.044), (44, -0.045), (45, -0.17), (46, 0.036), (47, -0.029), (48, 0.041), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9502002 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.

2 0.63074064 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

Author: Ivan Vulic ; Marie-Francine Moens

Abstract: We present a new language pair agnostic approach to inducing bilingual vector spaces from non-parallel data without any other resource in a bootstrapping fashion. The paper systematically introduces and describes all key elements of the bootstrapping procedure: (1) starting point or seed lexicon, (2) the confidence estimation and selection of new dimensions of the space, and (3) convergence. We test the quality of the induced bilingual vector spaces, and analyze the influence of the different components of the bootstrapping approach in the task of bilingual lexicon extraction (BLE) for two language pairs. Results reveal that, contrary to conclusions from prior work, the seeding of the bootstrapping process has a heavy impact on the quality of the learned lexicons. We also show that our approach outperforms the best performing fully corpus-based BLE methods on these test sets.

3 0.51020515 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier

Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.

4 0.45459515 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation

Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang

Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1. 1B .

5 0.4541395 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues

Author: Matt Gardner ; Partha Pratim Talukdar ; Bryan Kisiel ; Tom Mitchell

Abstract: Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRAbased approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.

6 0.43151686 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge

7 0.42299679 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

8 0.40435803 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

9 0.39256731 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

10 0.39199865 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

11 0.38626739 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

12 0.36811045 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

13 0.36310077 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

14 0.34800005 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

15 0.34578651 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

16 0.33378762 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

17 0.33171916 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora

18 0.3258841 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

19 0.30834711 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

20 0.3011182 187 emnlp-2013-Translation with Source Constituency and Dependency Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.017), (6, 0.019), (18, 0.043), (22, 0.041), (30, 0.108), (43, 0.015), (50, 0.031), (51, 0.18), (66, 0.031), (71, 0.032), (75, 0.056), (77, 0.063), (80, 0.218), (96, 0.034), (97, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86218476 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes

Author: Ruihong Huang ; Ellen Riloff

Abstract: The goal of our research is to distinguish veterinary message board posts that describe a case involving a specific patient from posts that ask a general question. We create a text classifier that incorporates automatically generated attribute lists for veterinary patients to tackle this problem. Using a small amount of annotated data, we train an information extraction (IE) system to identify veterinary patient attributes. We then apply the IE system to a large collection of unannotated texts to produce a lexicon of veterinary patient attribute terms. Our experimental results show that using the learned attribute lists to encode patient information in the text classifier yields improved performance on this task.

same-paper 2 0.83758628 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.

3 0.72065032 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

4 0.71689522 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

5 0.71601528 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich

Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.

6 0.70994794 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

7 0.70674407 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

8 0.70482475 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

9 0.70370781 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

10 0.70256233 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

11 0.7019732 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

12 0.70057297 143 emnlp-2013-Open Domain Targeted Sentiment

13 0.69990629 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

14 0.69925469 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

15 0.69819957 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.69729757 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

17 0.69703901 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution

18 0.6968115 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

19 0.69585764 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

20 0.69571948 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge