emnlp emnlp2013 emnlp2013-157 knowledge-graph by maker-knowledge-mining

157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation


Source: pdf

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i. [sent-4, score-0.586]

2 , straight and inverted) dependent on actual blocks being merged remains a challenge. [sent-6, score-0.438]

3 Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. [sent-7, score-0.792]

4 The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. [sent-8, score-0.744]

5 Therefore, phrase reordering modeling has attracted intensive attention in the past decade (e. [sent-17, score-0.536]

6 Among them, reordering models based on inversion transduction grammar (ITG) (Wu, 1997) are one of the important ongoing research directions. [sent-26, score-0.586]

7 As a result, a number of authors have incorporated ITG into left-to-right decoding to constrain the reordering space and reported significant improvements (e. [sent-28, score-0.536]

8 (2006) propose a maximum entropy (MaxEnt) reordering model based on ITG. [sent-34, score-0.468]

9 They use the CKY algorithm to recursively merge two blocks (i. [sent-35, score-0.285]

10 , a pair of source and target strings) into larger blocks, either in a straight or an inverted order. [sent-37, score-0.37]

11 Unlike lexicalized reordering models (Tillman, 2004; Koehn et al. [sent-38, score-0.468]

12 , 2007; Galley and Manning, 2008) that are defined on individual bilingual phrases, the MaxEnt ITG reordering model is a two-category classifier (i. [sent-39, score-0.571]

13 , straight or inverted) for two arbitrary bilingual phrases of which the source phrases are adjacent. [sent-41, score-0.438]

14 ... problem since there are usually a large number of reordering training examples available (Xiong et al. [sent-44, score-0.468]

15 Despite these successful efforts, the ITG reordering classifiers still face a major challenge: how to extract features from training examples (i. [sent-49, score-0.468]

16 More importantly, it is possible to learn vector space representations for multi-word phrases using recursive autoencoders (Socher et al. [sent-72, score-0.65]

17 , 2011c), which opens the door to leveraging semantic representations of phrases in reordering models from a neural language modeling point of view. [sent-73, score-0.769]

18 In this work, we propose an ITG reordering classifier based on recursive autoencoders. [sent-74, score-0.731]

19 , the first source phrase, the first target phrase, the second source phrase, and the second target phrase) and a softmax layer. [sent-77, score-0.212]

20 The recursive autoencoders, which are trained on reordering examples extracted from a word-aligned bilingual corpus, are capable of producing vector space representations for arbitrary multi-word strings in decoding. [sent-78, score-1.031]

21 Therefore, our model takes the whole phrases rather than only boundary words into consideration when predicting phrase permutations. [sent-79, score-0.212]

22 X1 and X2 are two neighboring blocks of which the two source phrases are adjacent. [sent-86, score-0.383]

23 While rule (1) merges two target phrases in a straight order, rule (2) merges them in an inverted order. [sent-87, score-0.394]

24 Besides these two reordering rules, rule (3) is a lexical rule that translates a source phrase f into a target phrase e. [sent-88, score-0.694]

25 atomic blocks: blocks generated by applying lexical rules, 2. [sent-93, score-0.352]

26 Our neural ITG reordering model first assigns vector space representations to single words and then produces vectors [sent-99, score-0.791]

27 for phrases using recursive autoencoders, which form atomic blocks. [sent-100, score-0.389]

28 The atomic blocks are recursively merged into composed blocks, the vector space representations of which are produced by recursive autoencoders simultaneously. [sent-101, score-1.065]

29 The neural classifier makes decisions at each node using the vectors of all its descendants. [sent-102, score-0.24]

30 More formally, a block X_{i,j,k,l} = <f_i^j, e_k^l> is a pair of a source phrase f_i^j = f_{i+1} ... [sent-108, score-0.274]

31 Obviously, these atomic blocks are generated by lexical rules. [sent-115, score-0.352]

32 Two blocks of which the source phrases are adjacent can be merged into a larger one in two ways: concatenating the target phrases in a straight order using rule (1) or in an inverted order using rule (2). [sent-116, score-0.818]

33 For example, atomic blocks X3,5,5,6 and X5,8,6,8 are merged into a composed block X3,8,5,8 in a straight order, which is further merged with an atomic block X8,10,3,5 into another composed block X3,10,3,8 in an inverted order. [sent-117, score-1.181]
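
To make the merging operations concrete, here is a minimal sketch; it is not the authors' code. The Block class below tracks only the source span and a target string (the full model indexes both spans), and all names and example values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Block:
    src_begin: int   # i: the source phrase covers words f_{i+1} .. f_j
    src_end: int     # j
    target: str      # target-side string

def merge(left: Block, right: Block, order: str) -> Block:
    """Merge two blocks whose source phrases are adjacent.

    "straight" keeps the target order (rule 1); "inverted" swaps it (rule 2).
    """
    assert left.src_end == right.src_begin, "source phrases must be adjacent"
    if order == "straight":
        target = left.target + " " + right.target
    elif order == "inverted":
        target = right.target + " " + left.target
    else:
        raise ValueError(order)
    return Block(left.src_begin, right.src_end, target)

# Atomic blocks come from lexical rules (rule 3): X -> <f, e>.
x1 = Block(3, 5, "toy target phrase A")
x2 = Block(5, 8, "toy target phrase B")
composed = merge(x1, x2, "straight")        # analogous to merging X_{3,5,..} and X_{5,8,..}
larger = merge(composed, Block(8, 10, "toy target phrase C"), "inverted")
```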

34 The major challenge of applying ITG to machine translation is to decide when to merge two blocks in a straight order and when in an inverted order. [sent-119, score-0.562]

35 Therefore, the ITG reordering model can be seen as a two-category classifier P(o|X1 , X2), where o ∈ {straight, inverted}. [sent-120, score-0.521]

36 A naive way is to assign fixed probabilities to two reordering rules, which is referred to as flat model by Xiong et al. [sent-121, score-0.468]

37 P(o | X1, X2) = p if o = straight, 1 − p if o = inverted (4). The drawback of the flat model is ignoring the actual blocks being merged. [sent-123, score-0.251]

38 Intuitively, different blocks should have different preferences between the two orders. [sent-124, score-0.251]

39 (Figure 2: A recursive autoencoder for multi-word strings.) Actually, [sent-132, score-0.306]

40 it is hard to decide which internal words in composed blocks are representative and informative. [sent-136, score-0.317]

41 (2008) find that the MaxEnt classifier with boundary words as features is prone to make wrong predictions for long composed blocks. [sent-140, score-0.218]

42 As a result, they have to impose a hard constraint to always prefer merging long composed blocks in a monotonic way. [sent-141, score-0.403]

43 Therefore, it is important to consider more than boundary words to make more accurate reordering predictions. [sent-142, score-0.534]

44 1 Vector Space Representations for Words In neural networks, a natural language word is represented as a real-valued vector (Bengio et al. [sent-147, score-0.209]

45 ... ]^T to represent “female” and ... (Strictly speaking, the ITG reordering model is not a phrase reordering model since phrase pairs are only the atomic blocks. [sent-152, score-0.468]

46 Instead, it is defined to work on arbitrarily long strings because composed blocks become larger and larger until the entire sentence pair is generated. [sent-153, score-0.441]

47 The binary classifier makes decisions based on the vector space representations of the source and target sides of merging blocks. [sent-155, score-0.324]

48 Such vector space representations enable natural language words to be fed to neural networks as input. [sent-160, score-0.362]

49 Given a sentence, which is an ordered list of m words, each word has an associated vocabulary index k into the word embedding matrix L that we use to retrieve the word’s vector space representation. [sent-163, score-0.218]
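
A hedged sketch of the lookup just described: each word's vocabulary index selects a column of the embedding matrix L. The vocabulary entries, random initialization, and variable names are assumptions for illustration; only the 25-dimensional embedding size is taken from the paper's experimental setup.

```python
import numpy as np

n, vocab_size = 25, 10000                    # 25-dim embeddings; toy vocabulary size
L = 0.01 * np.random.randn(n, vocab_size)    # word embedding matrix (toy random init)
vocab = {"female": 7, "friend": 42}          # word -> vocabulary index (toy entries)

def word_vector(word: str) -> np.ndarray:
    """Retrieve the vector space representation of a word from L by its index k."""
    k = vocab[word]
    return L[:, k]

# The per-word vectors form the leaves that the recursive autoencoder composes.
leaf_vectors = [word_vector(w) for w in ["female", "friend"]]
```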

50 2 Vector Space Representations for Multi-Word Strings To apply neural networks to ITG-based translation, it is important to generate vector space representations for atomic and composed blocks. [sent-168, score-0.529]

51 The same neural network can be recursively applied to two strings until the vector of the entire sentence is generated. [sent-188, score-0.367]

52 As ITG derivation builds a binary parse tree, the neural network can be naturally integrated into CKY parsing. [sent-189, score-0.23]

53 These neural networks are called recursive autoencoders (Socher et al. [sent-192, score-0.572]

54 Figure 2 illustrates an application of a recursive autoencoder to a binary tree. [sent-194, score-0.334]

55 The binary tree is composed of a set of triplets in the form of (p → c1 c2), where p is a parent vector and c1 and c2 are children vectors of p. [sent-197, score-0.266]
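
The composition and reconstruction at each triplet can be sketched as below. This is an assumption-laden illustration, not the paper's code: the tanh activation, the linear reconstruction, and the toy initializations are choices made here, and the parameters W1, b1, W2, b2 loosely stand in for the paper's θ_rec.

```python
import numpy as np

def compose(c1: np.ndarray, c2: np.ndarray, W1: np.ndarray, b1: np.ndarray) -> np.ndarray:
    """Parent vector of a triplet (p -> c1 c2): p = f(W1 [c1; c2] + b1)."""
    return np.tanh(W1 @ np.concatenate([c1, c2]) + b1)

def reconstruct(p: np.ndarray, W2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Reconstruct the children [c1'; c2'] from the parent vector p."""
    return W2 @ p + b2

n = 25
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((n, 2 * n)), np.zeros(n)
W2, b2 = 0.01 * rng.standard_normal((2 * n, n)), np.zeros(2 * n)
c1, c2 = rng.standard_normal(n), rng.standard_normal(n)
p = compose(c1, c2, W1, b1)            # vector for the two-word string
children_rec = reconstruct(p, W2, b2)  # used below to measure reconstruction error
```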

56 In Figure 1, we use recursive autoencoders to generate vector space representations for Chinese and English phrases, which form the atomic blocks for further block merging. [sent-200, score-1.002]

57 3 A Neural ITG Reordering Model Once the vectors for blocks are generated, it is straightforward to introduce a neural ITG reordering model. [sent-203, score-0.906]

58 As shown in Figure 3, the neural network consists of an input layer and a softmax layer. [sent-204, score-0.254]

59 The input layer is composed of the vectors of the first source phrase, the first target phrase, the second source phrase, and the second target phrase. [sent-205, score-0.338]
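
A minimal sketch of this classifier, assuming the four phrase vectors are simply concatenated and fed through a single softmax layer; the parameter shapes, initializations, and function names are assumptions for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_order(src1, tgt1, src2, tgt2, Wo, bo):
    """Neural ITG reordering classifier (sketch).

    The input layer concatenates the four phrase vectors (first source, first
    target, second source, second target); the softmax layer then yields
    P(straight | X1, X2) and P(inverted | X1, X2).
    """
    x = np.concatenate([src1, tgt1, src2, tgt2])
    p_straight, p_inverted = softmax(Wo @ x + bo)
    return {"straight": p_straight, "inverted": p_inverted}

n = 25
rng = np.random.default_rng(1)
Wo, bo = 0.01 * rng.standard_normal((2, 4 * n)), np.zeros(2)   # roughly theta_reo
phrase_vecs = [rng.standard_normal(n) for _ in range(4)]
print(predict_order(*phrase_vecs, Wo, bo))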

60 Note that all phrases in the same language use the same recursive autoencoder. [sent-206, score-0.288]

61 3 Training There are three sets of parameters in our recursive autoencoders: 1. [sent-208, score-0.21]

62 θrec: recursive autoencoder parameter matrices and bias terms for both source and target languages (Section 2. [sent-212, score-0.454]

63 θreo: neural ITG reordering model parameter matrix Wo and bias term bo (Section 2. [sent-215, score-0.68]

64 This works well in a supervised scenario, in which a neural network updates the matrix in order to optimize some task-specific objectives (Collobert et al. [sent-222, score-0.216]

65 In the second setting, the word embedding matrix is pre-trained using an unsupervised neural language model (Bengio et al. [sent-225, score-0.26]

66 In this work, we prefer the first setting because the word embedding matrices can be trained to minimize errors with respect to reordering modeling. [sent-227, score-0.574]

67 reconstruction error: how well the learned vector space representations represent the corresponding strings? [sent-229, score-0.299]

68 reordering error: how well the classifier predicts the merging order? [sent-231, score-0.574]

69 2, the input vectors c1 and c2 of a recursive autoencoder can be reconstructed using Eq. [sent-234, score-0.446]

70 We use Euclidean distance between the input and the reconstructed vectors to measure the reconstruction error: E_rec([c1; c2]; θ) = 1/2 || [c1; c2] − [c1'; c2'] ||^2.
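
The error above is just half the squared Euclidean distance between the original and reconstructed children; a minimal sketch (function name and argument layout are assumptions):

```python
import numpy as np

def reconstruction_error(c1, c2, c1_rec, c2_rec) -> float:
    """E_rec([c1; c2]; theta) = 1/2 * || [c1; c2] - [c1'; c2'] ||^2."""
    diff = np.concatenate([c1, c2]) - np.concatenate([c1_rec, c2_rec])
    return 0.5 * float(diff @ diff)
```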

71 Suppose Erec([x1; x2]; θ) is the smallest; the algorithm then replaces x1 and x2 with their vector representation y1 produced by the recursive autoencoder. [sent-250, score-0.277]
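
This greedy bottom-up construction (in the style of Socher et al.'s recursive autoencoders) can be sketched as follows; `compose_and_score` is a hypothetical helper assumed to return a parent vector together with its reconstruction error.

```python
def greedy_build(vectors, compose_and_score):
    """Greedily build the binary tree over a word/phrase sequence (sketch).

    At each step, score every adjacent pair, replace the pair with the
    smallest reconstruction error by its parent vector, and repeat until a
    single vector covers the whole string.
    """
    vectors = list(vectors)
    while len(vectors) > 1:
        scored = [compose_and_score(vectors[i], vectors[i + 1])
                  for i in range(len(vectors) - 1)]
        best = min(range(len(scored)), key=lambda i: scored[i][1])
        parent, _ = scored[best]
        vectors[best:best + 2] = [parent]   # replace x1, x2 with y1
    return vectors[0]
```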

72 Given a training example set S = {t_i = (o_i, X_{i1}, X_{i2})}, the average reconstruction error on the source side on the training set is defined as E_rec,s(S; θ) = (1/N_s) Σ_i Σ_{p ∈ T_{R_θ}(t_i, s)} E_rec([p. [sent-254, score-0.21]

73 As a result, the reordering error is defined as E_reo(S; θ) = (1/|S|) Σ_i E_c(t_i; θ) (15). [sent-264, score-0.506]

74 Therefore, the joint training objective function is J = αE_rec(S; θ) + (1−α)E_reo(S; θ) + R(θ) (16), where α is a parameter used to balance the preference between reconstruction error and reordering error, and R(θ) is the regularizer, defined as R(θ) = (λ_L/2)||θ_L||^2 + (λ_rec/2)||θ_rec||^2 + (λ_reo/2)||θ_reo||^2. [sent-265, score-0.624]
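
A small sketch of how the objective combines the two error terms with the regularizer; the argument names and the example values below are illustrative assumptions, not values from the paper.

```python
def joint_objective(e_rec, e_reo, sq_norms, lambdas, alpha):
    """J = alpha * E_rec(S; theta) + (1 - alpha) * E_reo(S; theta) + R(theta).

    `sq_norms` holds ||theta_L||^2, ||theta_rec||^2, ||theta_reo||^2 and
    `lambdas` the matching regularization weights lambda_L, lambda_rec, lambda_reo.
    """
    r = sum(0.5 * lam * sq for lam, sq in zip(lambdas, sq_norms))
    return alpha * e_rec + (1 - alpha) * e_reo + r

# alpha trades off reconstruction quality against reordering accuracy.
j = joint_objective(e_rec=1.2, e_reo=0.8, sq_norms=[3.0, 2.5, 0.7],
                    lambdas=[1e-4, 1e-4, 1e-4], alpha=0.1)
```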

75 (2011c) stated, a naive way for lowering the reconstruction error is to make the magnitude of the hidden layer very small, which is ... (The bias terms b(1), b(2) and bo are not regularized.) [sent-267, score-0.232]

76 Because of the high computational cost of training, only the reordering examples extracted from about 1/5 of the entire parallel training corpus were used to train our neural ITG reordering model. [sent-289, score-1.719]

77 For the neural ITG reordering model, we set the dimension of the word embedding vectors to 25 empirically, which is a trade-off between computational cost and expressive power. [sent-290, score-0.732]

78 We randomly select 400,000 reordering examples as training set, 500 as development set, and another 500 as test set. [sent-294, score-0.468]

79 The numbers of straight and inverted reordering examples in the development/test set are set to be equal to avoid biases. [sent-295, score-0.748]

80 “maxent” denotes the baseline maximum entropy system and “neural” denotes our recursive autoencoder system. [sent-304, score-0.306]

81 Our system differs from the baseline only in replacing the MaxEnt reordering model with a neural model. [sent-317, score-0.61]

82 Figure 4: Comparison of reordering classification accuracies between the MaxEnt and neural classifiers over varying phrase lengths. [sent-333, score-0.678]

83 “Length” denotes the sum of the lengths of two source phrases in a reordering example. [sent-334, score-0.6]

84 Table 3: The effect of reordering training data size on BLEU scores. [sent-338, score-0.468]

85 Due to the computational cost, we only used 1/5 of the entire bilingual corpus to train our neural reordering model. [sent-340, score-0.691]

86 “Length” denotes the sum of the lengths of two source phrases in a reordering example. [sent-343, score-0.6]

87 For each length, we randomly select 200 unseen reordering examples to calculate the classification accuracy. [sent-344, score-0.468]

88 (2008) find that the performance of the baseline system can be improved by forbidding inverted reordering if the phrase length exceeds a pre-defined distortion limit. [sent-347, score-0.707]

89 The above results suggest that our system does go beyond using boundary words and makes better use of the merging blocks by using vector space representations. [sent-356, score-0.47]

90 As the training process is very time-consuming, only the reordering examples extracted from 1/5 of the entire parallel training corpus are used in our experiments to train our model. [sent-359, score-0.499]

91 Obviously, with more efficient training algorithms, making full use of all the reordering examples extracted from the entire corpus will result in better results. [sent-360, score-0.499]

92 We find that the words and phrases in the same cluster have similar behaviors from a reordering point of view rather than relatedness. [sent-367, score-0.585]

93 This indicates that the vector representations produced by the recursive autoencoders are helpful for capturing reordering regularities. [sent-368, score-1.007]

94 5 Conclusion We have presented an ITG reordering classifier based on recursive autoencoders. [sent-369, score-0.731]

95 As recursive autoencoders are capable of producing vector space representations for arbitrary multi-word strings in decoding, our neural ITG system achieves an absolute improvement of 1. [sent-370, score-0.836]

96 (2013) to combine linguistically-motivated labels with recursive neural networks. [sent-377, score-0.352]

97 A simple and effective hierarchical phrase reordering model. [sent-435, score-0.536]

98 Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. [sent-496, score-0.391]

99 Parsing natural scenes and natural language with recursive neural networks. [sent-503, score-0.352]

100 Maximum entropy based phrase reordering model for statistical machine translation. [sent-539, score-0.536]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('reordering', 0.468), ('itg', 0.379), ('blocks', 0.251), ('recursive', 0.21), ('erec', 0.201), ('autoencoders', 0.181), ('xiong', 0.172), ('socher', 0.172), ('maxent', 0.166), ('straight', 0.146), ('neural', 0.142), ('inverted', 0.134), ('reconstruction', 0.118), ('atomic', 0.101), ('autoencoder', 0.096), ('bleu', 0.088), ('representations', 0.081), ('phrases', 0.078), ('block', 0.078), ('embedding', 0.077), ('friend', 0.073), ('reconstructed', 0.073), ('bordes', 0.073), ('nist', 0.07), ('collobert', 0.07), ('phrase', 0.068), ('vector', 0.067), ('boundary', 0.066), ('composed', 0.066), ('tsinghua', 0.064), ('transduction', 0.062), ('strings', 0.06), ('female', 0.059), ('reo', 0.058), ('inversion', 0.056), ('source', 0.054), ('classifier', 0.053), ('weston', 0.053), ('merging', 0.053), ('bilingual', 0.05), ('yoshua', 0.049), ('layer', 0.047), ('rec', 0.046), ('vectors', 0.045), ('oi', 0.045), ('glorot', 0.044), ('matrix', 0.041), ('merged', 0.041), ('antoine', 0.041), ('bengio', 0.04), ('cluster', 0.039), ('networks', 0.039), ('deyi', 0.038), ('error', 0.038), ('distortion', 0.037), ('wo', 0.037), ('bergstra', 0.037), ('bisazza', 0.037), ('elki', 0.037), ('ereo', 0.037), ('hfij', 0.037), ('pages', 0.036), ('target', 0.036), ('decoding', 0.035), ('zens', 0.035), ('euclidean', 0.035), ('recursively', 0.034), ('space', 0.033), ('long', 0.033), ('galley', 0.033), ('children', 0.033), ('greedy', 0.033), ('network', 0.033), ('arbitrary', 0.032), ('richard', 0.032), ('softmax', 0.032), ('tillman', 0.032), ('translation', 0.031), ('och', 0.031), ('entire', 0.031), ('capable', 0.03), ('koehn', 0.029), ('shouxun', 0.029), ('goller', 0.029), ('bias', 0.029), ('hermann', 0.029), ('matrices', 0.029), ('qun', 0.028), ('binary', 0.028), ('ti', 0.027), ('aug', 0.027), ('cong', 0.027), ('aiti', 0.027), ('triplets', 0.027), ('riezler', 0.027), ('derivation', 0.027), ('child', 0.026), ('proceedings', 0.026), ('christopher', 0.026), ('activation', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

2 0.30545819 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

3 0.24011338 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

4 0.19173989 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata

Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.

5 0.18936223 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

6 0.16717863 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

7 0.16022897 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

8 0.15854031 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

9 0.12927659 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

10 0.1236731 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

11 0.12195998 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

12 0.10894369 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

13 0.08952029 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

14 0.088990889 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

15 0.083540425 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

16 0.083088011 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

17 0.076352812 87 emnlp-2013-Fish Transporters and Miracle Homes: How Compositional Distributional Semantics can Help NP Parsing

18 0.075312681 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

19 0.071838483 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

20 0.071837828 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.259), (1, -0.254), (2, 0.058), (3, 0.006), (4, 0.135), (5, 0.033), (6, -0.03), (7, -0.117), (8, -0.137), (9, 0.092), (10, 0.069), (11, 0.117), (12, -0.146), (13, -0.154), (14, -0.2), (15, 0.182), (16, 0.059), (17, -0.197), (18, -0.116), (19, -0.011), (20, -0.023), (21, -0.073), (22, 0.103), (23, 0.077), (24, 0.093), (25, 0.069), (26, -0.007), (27, -0.051), (28, -0.023), (29, 0.051), (30, -0.013), (31, 0.115), (32, -0.065), (33, 0.056), (34, -0.083), (35, 0.057), (36, -0.001), (37, -0.029), (38, 0.057), (39, -0.007), (40, -0.028), (41, -0.028), (42, -0.04), (43, -0.003), (44, 0.055), (45, -0.088), (46, -0.02), (47, 0.072), (48, -0.007), (49, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94302076 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

2 0.73698574 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

3 0.72516745 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

Author: Katsuhiko Hayashi ; Katsuhito Sudoh ; Hajime Tsukada ; Jun Suzuki ; Masaaki Nagata

Abstract: This paper presents a novel word reordering model that employs a shift-reduce parser for inversion transduction grammars. Our model uses rich syntax parsing features for word reordering and runs in linear time. We apply it to postordering of phrase-based machine translation (PBMT) for Japanese-to-English patent tasks. Our experimental results show that our method achieves a significant improvement of +3.1 BLEU scores against 30.15 BLEU scores of the baseline PBMT system.

4 0.64526546 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

5 0.63577157 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

6 0.53222448 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

7 0.51149344 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

8 0.51121074 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

9 0.43460727 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

10 0.38722664 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

11 0.38605449 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

12 0.38589713 156 emnlp-2013-Recurrent Continuous Translation Models

13 0.36332157 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation

14 0.35531977 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

15 0.3537789 59 emnlp-2013-Deriving Adjectival Scales from Continuous Space Word Representations

16 0.33871314 87 emnlp-2013-Fish Transporters and Miracle Homes: How Compositional Distributional Semantics can Help NP Parsing

17 0.33595365 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

18 0.33053461 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

19 0.32902503 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition

20 0.31663111 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.026), (6, 0.048), (10, 0.015), (18, 0.04), (22, 0.065), (30, 0.115), (44, 0.174), (45, 0.018), (50, 0.021), (51, 0.132), (66, 0.04), (71, 0.023), (75, 0.034), (77, 0.087), (96, 0.025), (97, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86411345 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time

Author: Mohammad Sadegh Rasooli ; Joel Tetreault

Abstract: We introduce a novel method to jointly parse and detect disfluencies in spoken utterances. Our model can use arbitrary features for parsing sentences and adapt itself with out-ofdomain data. We show that our method, based on transition-based parsing, performs at a high level of accuracy for both the parsing and disfluency detection tasks. Additionally, our method is the fastest for the joint task, running in linear time.

same-paper 2 0.83624822 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

Author: Peng Li ; Yang Liu ; Maosong Sun

Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.

3 0.72981989 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu

Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significantly improvements over the constituency-to-string (+2.45 BLEU on average) and dependencyto-string (+0.91 BLEU on average) models, which only employ single type of trees, and significantly outperforms the state-of-theart hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.

4 0.72360629 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Author: Richard Socher ; Alex Perelygin ; Jean Wu ; Jason Chuang ; Christopher D. Manning ; Andrew Ng ; Christopher Potts

Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.

5 0.7212407 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

6 0.71869385 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

7 0.71783042 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

8 0.71558398 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

9 0.7150656 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

10 0.71478796 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

11 0.71415585 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

12 0.7136898 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

13 0.70801973 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

14 0.70777136 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

15 0.70631808 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

16 0.69802582 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

17 0.69735587 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

18 0.69662601 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

19 0.69529843 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

20 0.69426757 156 emnlp-2013-Recurrent Continuous Translation Models