emnlp emnlp2013 emnlp2013-127 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional two-step pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than the previous max-likelihood estimation method.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. [sent-3, score-1.04]
2 Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. [sent-4, score-1.021]
3 Experiments show that our max-margin method significantly outperforms the traditional two-step pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than the previous max-likelihood estimation method. [sent-8, score-0.701]
4 1 Introduction Synchronous grammar induction, which refers to the process of learning translation rules from a bilingual corpus, still remains an open problem in statistical machine translation (SMT). [sent-10, score-0.998]
5 Although state-of-the-art SMT systems model the translation process based on synchronous grammars (including bilingual phrases), most of them still learn translation rules via a pipeline with word-based heuristics (Koehn et al. [sent-11, score-1.397]
6 Therefore, researchers have proposed alternative approaches to learning synchronous grammars directly from sentence pairs without word alignments, via generative models (Marcu and Wong, 2002; Cherry and Lin, 2007; Zhang et al. [sent-18, score-0.545]
7 Theoretically, these approaches describe how sentence pairs are generated by applying sequences of synchronous rules in an elegant way. [sent-25, score-0.568]
8 However, they learn synchronous grammars by maximizing likelihood, which only has a loose relation to translation quality (He and Deng, 2012). [sent-26, score-0.864]
9 Moreover, generative models are normally hard to extend to incorporate useful features, and the discriminative synchronous grammar induction model proposed by Xiao et al. [sent-27, score-0.879]
10 Consequently, we would like to learn synchronous grammars in a discriminative way that can directly maximize the end-to-end translation quality measured by BLEU (Papineni et al. [sent-30, score-0.978]
11 In particular, we propose a max-margin method to discriminatively induce synchronous grammar directly from sentence pairs without word alignments. [sent-36, score-0.632]
12 We try to maximize the margin between a reference translation and a candidate translation with translation errors that are measured by BLEU. [sent-37, score-1.081]
13 The more serious the translation errors, the larger the margin. [sent-38, score-0.319]
14 In this way, our max-margin method is able to learn synchronous grammars according to their translation performance. [sent-39, score-0.864]
15 We efficiently calculate the non-local feature values of a translation over its exponential derivation space using the inside-outside algorithm. [sent-41, score-0.65]
16 Because our max-margin estimation optimizes feature weights only by the feature values of Viterbi and reference translations, we are able to efficiently perform optimization even with non-local features. [sent-42, score-0.481]
17 We apply the proposed max-margin estimation method to learn synchronous grammars for a hierarchical phrase-based translation system (Chiang, 2007) which typically produces state-of-the-art performance. [sent-43, score-1.021]
18 With non-local features defined on target parse trees, our max-margin method significantly outperforms the baseline that uses synchronous rules learned from the traditional pipeline by 1.3 BLEU points. [sent-44, score-0.904]
19 Section 2 presents the discriminative synchronous grammar induction model with the non-local features. [sent-47, score-0.919]
20 In Section 3, we elaborate our max-margin estimation method, which is able to directly optimize BLEU, and discuss how we induce grammar rules. [sent-48, score-0.435]
21 Given a source sentence s, T(s) denotes the set of translations in the target language that can be generated by a synchronous grammar G. [sent-53, score-0.709]
22 A translation t ∈ T (s) is generated by a sequence of translation steps (r1, . [sent-54, score-0.319]
23 , rn), where at each step we apply a synchronous rule. [Figure 1: A derivation of a sentence pair represented by a synchronous tree, built from rules such as ⟨yu shalong ⇒ with Sharon⟩, ⟨X juxing huitan ⇒ held a talk X⟩, and ⟨bushi X ⇒ Bush X⟩.] [sent-57, score-0.632]
24 A dashed line denotes an alignment from a source span to a target span. [sent-60, score-0.515]
25 The score of a derivation is computed as a sum of the values of each synchronous rule in the derivation d. [sent-78, score-0.721]
26 A feature is a local feature if and only if it can be factored among the translation steps in a derivation. [sent-80, score-0.492]
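To make this concrete, here is a minimal sketch (not the paper's implementation) of a scorer that handles only local features: the local part of a derivation score factors over the rules of the derivation, which is exactly what a non-local feature cannot do. The Rule class, the toy rules, and the feature names are illustrative assumptions.

```python
# Minimal sketch: local features factor over translation steps, so the local
# part of a derivation score is a sum of weighted feature values of its rules.
from dataclasses import dataclass, field

@dataclass
class Rule:
    src: str
    tgt: str
    features: dict = field(default_factory=dict)   # rule-local feature values

def score_local(derivation, weights):
    """Local part of the derivation score: factors rule by rule."""
    return sum(weights.get(name, 0.0) * value
               for rule in derivation
               for name, value in rule.features.items())

# Toy usage with two rules loosely modeled on Figure 1.
r1 = Rule("yu shalong", "with Sharon", {"rule_count": 1.0, "noisy_or": -0.7})
r2 = Rule("X juxing huitan", "held a talk X", {"rule_count": 1.0, "noisy_or": -1.2})
print(score_local([r1, r2], {"rule_count": 0.3, "noisy_or": 1.0}))
```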
27 Our discriminative model allows us to incorporate non-local features that are defined on target translations. [sent-83, score-0.363]
28 A source span boundary feature in Figure 2(b) that is defined on the source parse tree is also a local feature. [sent-85, score-0.682]
29 However, a target span boundary feature in Figure 2(c), which assesses the target parse structure, is a non-local feature. [sent-86, score-0.593]
30 According to Figure 1, the span is parsed in step r2, but it also depends on the translation boundary word “held” generated in the previous step r1. [sent-87, score-0.641]
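The sketch below illustrates why the target span boundary feature is non-local, assuming a toy target string and hypothetical template names: the boundary word it fires on may have been produced by a different translation step than the one that parses the span.

```python
# Hedged sketch of a non-local target span boundary feature; names are assumed.
def target_boundary_features(target, k, l):
    """Fire the {B, E, BE} templates on the boundary words t_{k+1} and t_l of
    the target span (k, l]; `target` is a 0-indexed Python list."""
    b_word = target[k]          # t_{k+1}: first word of the span
    e_word = target[l - 1]      # t_l: last word of the span
    return {f"tgtB:{b_word}": 1.0,
            f"tgtE:{e_word}": 1.0,
            f"tgtBE:{b_word}_{e_word}": 1.0}

# "held" may be generated by an earlier step, yet it decides the B template of
# a span parsed later, which is exactly what makes the feature non-local.
target = "Bush held a talk with Sharon".split()
print(target_boundary_features(target, 1, 6))
```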
31 However, it is computationally expensive to calculate the expected values of non-local features over D(s), as non-local features require recording the states of target boundary words. [sent-90, score-0.333]
32 Fortunately, when integrating out derivations over the derivation space D(s, t) of a source sentence and its translation, we can efficiently calculate the non-local features. [sent-92, score-0.338]
33 In the proposed max-margin estimation described in the next section, we only need to integrate out derivations for a Viterbi translation and a reference translation when updating feature weights. [sent-96, score-1.145]
34 3 Max-Margin Estimation In this section, we describe how we use a parallel training corpus {S, T} = {(s(i), t(i))} for i = 1, ..., N to estimate feature weights θ, which contain parameters of the induced synchronous grammars and the defined non-local features. [sent-98, score-0.602]
35 We choose the parameters that maximize the translation quality measured by BLEU using the max-margin estimation (Taskar et al. [sent-99, score-0.476]
36 Margin refers to the difference in model score between a reference translation and a candidate translation t. [sent-101, score-0.723]
37 We hope that the worse the translation quality of t, the larger the margin between t and t(i). [sent-102, score-0.358]
38 In this way, we penalize larger translation errors more severely than smaller ones. [sent-103, score-0.319]
39 min (1/2)∥θ∥2 s.t. θ · f(s(i), t(i)) − θ · f(s(i), t) ≥ cost(t(i), t), ∀t ∈ T (s(i)) (3) Here, f(s, t) is the feature function of a translation, and the cost function cost(t(i), t) measures the translation errors of a candidate translation t compared with the reference translation t(i). [sent-107, score-1.266]
40 We define the cost function via the widely-used translation evaluation metric BLEU. [sent-108, score-0.447]
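As an illustration, a smoothed sentence-level BLEU can be turned into such a cost; the sketch below is one plausible instantiation, assuming a particular smoothing and the cost definition cost = 1 − BLEU, neither of which is spelled out in this dump.

```python
# Hedged sketch of a BLEU-based cost: worse hypotheses get larger costs, so
# they must be separated from the reference by a larger margin.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU (add-1 smoothing on clipped counts)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())
        total = max(sum(h.values()), 1)
        log_prec += math.log((match + 1.0) / (total + 1.0)) / max_n
    bp = min(1.0, math.exp(1.0 - len(ref) / len(hyp)))   # brevity penalty
    return bp * math.exp(log_prec)

def cost(ref, hyp):
    """Translation-error cost used to scale the required margin."""
    return 1.0 - sentence_bleu(hyp, ref)

ref = "Bush held a talk with Sharon".split()
print(cost(ref, "Bush held a talk with Sharon".split()))  # 0.0 for an exact match
print(cost(ref, "Bush talk Sharon".split()))               # larger cost for a worse hypothesis
```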
41 1 Integrate Out Derivation by Averaging Although we only model the triple ⟨s, t, d⟩ in equation (1), it is necessary to calculate the scoring function f(s, t) of a translation by integrating out the derivation variable, as the derivation is not observed in the training data. [sent-116, score-0.833]
42 We use an averaging computation over all possible derivations of a translation D(s, t). [sent-117, score-0.39]
43 First, as a translation has an exponential number of derivations, finding the max derivation of a reference translation for learning is nontrivial (Chiang et al. [sent-126, score-0.979]
44 Second, the max derivation estimation will result in a low rule coverage, as rules in a max derivation only cover a small fraction of rules in D(s, t). [sent-128, score-0.914]
45 Because rule coverage is important in synchronous grammar induction, we would like to explore the entire derivation space using the average operator. [sent-129, score-0.896]
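The following sketch shows one way such averaging can be carried out with an inside-outside style computation over a derivation hypergraph; the hypergraph encoding and the use of unit edge weights are assumptions made for illustration, not the paper's data structures.

```python
# Hedged sketch: average feature values over all derivations in a hypergraph.
# With unit edge weights, inside(v) counts derivations and each edge's
# posterior is the fraction of derivations using it, i.e. a plain average.
from collections import defaultdict
from math import prod

def average_features(nodes, edges):
    """nodes: bottom-up topological order, goal node last.
    edges: list of (head, tails, features) hyperedges."""
    inside = defaultdict(lambda: 1.0)                 # leaves default to 1
    for v in nodes:
        incoming = [e for e in edges if e[0] == v]
        if incoming:
            inside[v] = sum(prod(inside[u] for u in tails)
                            for _, tails, _ in incoming)
    goal = nodes[-1]
    outside = defaultdict(float)
    outside[goal] = 1.0
    for v in reversed(nodes):                         # top-down pass
        for head, tails, _ in (e for e in edges if e[0] == v):
            for idx, u in enumerate(tails):
                rest = prod(inside[w] for j, w in enumerate(tails) if j != idx)
                outside[u] += outside[head] * rest
    averaged = defaultdict(float)
    for head, tails, feats in edges:
        posterior = outside[head] * prod(inside[u] for u in tails) / inside[goal]
        for name, value in feats.items():
            averaged[name] += posterior * value
    return dict(averaged)

# Tiny example: two alternative derivations of the same translation.
nodes = ["A", "B", "GOAL"]
edges = [("A", [], {"rule:a": 1.0}), ("B", [], {"rule:b": 1.0}),
         ("GOAL", ["A"], {"rule:top_a": 1.0}), ("GOAL", ["B"], {"rule:top_b": 1.0})]
print(average_features(nodes, edges))    # every feature averaged to 0.5
```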
46 Cost-augmented inference finds a translation that has a maximum model score augmented with cost. [sent-133, score-0.319]
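A hedged sketch of this step is given below; it selects from a hypothetical n-best list, whereas the paper performs the search inside the decoder, so the list, the toy scoring function, and the toy cost are assumptions.

```python
# Hedged sketch of cost-augmented inference: pick the hypothesis maximizing
# model score plus cost, i.e. the most violated constraint (high-scoring and
# badly wrong at the same time).
def cost_augmented_viterbi(candidates, model_score, ref, cost):
    return max(candidates, key=lambda t: model_score(t) + cost(ref, t))

# Toy usage with stand-in scoring and cost functions.
cands = [["Bush", "held", "a", "talk"], ["Bush", "talk"]]
pick = cost_augmented_viterbi(
    cands,
    model_score=lambda t: 0.1 * len(t),
    ref=["Bush", "held", "a", "talk", "with", "Sharon"],
    cost=lambda r, t: 1.0 - len(set(t) & set(r)) / len(set(r)))
print(pick)
```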
47 Using the average scoring function in equation (5), the sub-gradient of the hinge loss function for a sentence pair is the difference of average feature values between a Viterbi translation and a reference translation. [Algorithm 1: UPDATE(s, t, θ, G) ◃ one step in the online algorithm.] [sent-138, score-0.658]
48 In the procedure, we first biparse the sentence pair to construct a synchronous hypergraph of a reference translation (line 1). [sent-142, score-1.12]
49 In the biparsing algorithm, synchronous rules for constructing hyperedges are not required to be in G, but can be any rules that follow the form defined in Chiang (2007). [sent-143, score-0.91]
50 Thus, the biparsing algorithm can discover new rules that are not in G. [sent-144, score-0.323]
51 Then we collect the translation rules discovered in the hypergraph of the reference translation (line 2), which are rules indicated by hyperedges in the hypergraph. [sent-145, score-1.185]
52 We then calculate the Viterbi translation according to the scoring function and cost function (see Section 3. [sent-146, score-0.591]
53 3) (line 3), and build the synchronous hypergraph for the Viterbi translation (line 4). [sent-147, score-0.921]
54 The sub-gradient is calculated based on the hypergraphs of the Viterbi translation and the reference translation. [sent-149, score-0.549]
55 Then, we collect grammar rules from the generated reference hypergraphs. [sent-152, score-0.371]
56 To find the Viterbi translation, we run the traditional translation decoding algorithm (Chiang, 2007) to get the best derivation. [sent-158, score-0.449]
57 Then we use the translation yielded by the best derivation as the Viterbi translation. [sent-159, score-0.494]
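Putting these pieces together, the sketch below mirrors one online UPDATE step at a high level; `biparse`, `cost_augmented_decode`, `avg_features`, and `cost` stand in for the components described above and are assumed rather than implemented, and the grammar is assumed to behave like a set of rules.

```python
# Hedged sketch of one online update, loosely following Algorithm 1 (UPDATE).
# Regularization is applied only to the touched features, a simplification of
# a Pegasos-style step; all helper interfaces are assumptions.
def dot(theta, feats):
    return sum(theta.get(name, 0.0) * value for name, value in feats.items())

def update(src, ref, theta, grammar,
           biparse, cost_augmented_decode, avg_features, cost,
           lam=1e-4, eta=0.1):
    ref_graph = biparse(src, ref)                      # line 1: reference hypergraph
    grammar.update(ref_graph.rules())                  # line 2: collect new rules
    viterbi = cost_augmented_decode(src, theta, grammar, ref)   # line 3
    vit_graph = biparse(src, viterbi)                  # line 4: Viterbi hypergraph
    f_ref = avg_features(ref_graph)                    # averaged over D(src, ref)
    f_vit = avg_features(vit_graph)                    # averaged over D(src, viterbi)
    if dot(theta, f_ref) - dot(theta, f_vit) < cost(ref, viterbi):   # hinge active
        for name in set(f_ref) | set(f_vit):           # sub-gradient step
            theta[name] = ((1.0 - eta * lam) * theta.get(name, 0.0)
                           + eta * (f_ref.get(name, 0.0) - f_vit.get(name, 0.0)))
    return theta
```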
58 We build synchronous hypergraphs using the cube-pruning based biparsing algorithm (Xiao et al. [sent-164, score-0.696]
59 Using a chart, the biparsing algorithm constructs k-best alignments for every source word (lines 1-5) and k-best hyperedges for every source span (lines 6-13) from the bottom up. [sent-167, score-0.71]
60 Thus, a synchronous hypergraph is generated during the construction of the chart. [sent-168, score-0.602]
61 Create k-best hyperedges for each source span: 6: H ← ∅ 7: for h ← 1, . [sent-183, score-0.363]
62 Here γi is a partial source parse that covers either a single source word or a span of source words. [sent-191, score-0.512]
63 Then it uses the cube pruning algorithm to keep the top k derivations among all partial derivations that share the same source span [i, j] (line 12). [sent-192, score-0.449]
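A much-simplified sketch of this chart construction is shown below; it keeps the top k partial derivations per source span by exhaustively scoring binary splits, whereas the actual algorithm uses cube pruning and also builds k-best word alignments, hyperedges, and target sides, all of which are omitted here. The scoring functions are placeholders.

```python
# Hedged, simplified sketch of a bottom-up biparsing chart with per-span beams.
import heapq
from itertools import count

def biparse_chart(n, k, score_word, score_combine):
    """chart[(i, j)] holds up to k (score, tiebreak, item) triples for span [i, j]."""
    tie = count()                                    # avoids comparing items on ties
    chart = {}
    for i in range(n):                               # length-1 spans
        chart[(i, i + 1)] = [(score_word(i), next(tie), ("word", i))]
    for length in range(2, n + 1):                   # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            cands = []
            for split in range(i + 1, j):            # all binary split points
                for ls, _, left in chart[(i, split)]:
                    for rs, _, right in chart[(split, j)]:
                        s = ls + rs + score_combine(i, split, j)
                        cands.append((s, next(tie), ("pair", left, right)))
            chart[(i, j)] = heapq.nlargest(k, cands)  # keep only the k best
    return chart

# Toy usage with trivial scores over a 4-word source sentence.
chart = biparse_chart(4, k=3, score_word=lambda i: 0.0,
                      score_combine=lambda i, s, j: -1.0)
print(len(chart[(0, 4)]))    # at most k partial derivations for the full span
```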
64 Notably, this biparsing algorithm does not require specific translation rules as input. [sent-193, score-0.605]
65 Instead, it is able to discover new synchronous grammar rules when constructing a synchronous hypergraph: extracting each hyperedge in the hypergraph as a synchronous rule. [sent-194, score-1.839]
66 Based on the biparsing algorithm, we are able to construct the reference hypergraph H(s(i), t(i)) and the Viterbi hypergraph H(s(i), tˆ). [sent-195, score-0.511]
67 From the reference hypergraph, we collect new synchronous translation rules and record them in the grammar G. [sent-196, score-1.062]
68 The noisy-or feature is estimated by word translation probabilities output by GIZA++. [sent-207, score-0.376]
69 Length feature We integrate the length of the target translation, which is used in traditional SMT systems, as a feature. [sent-211, score-0.577]
70 Source span boundary features We use this kind of feature to assess the source parse tree in a derivation. [sent-212, score-0.575]
71 (2004), for a bispan [i, j, k, l] in a derivation, we define the feature templates that indicate the boundaries of a span by its beginning and end words: {B : si+1 ; E : sj ; BE : si+1, sj}. [sent-217, score-0.363]
72 Source span orientation features Orientation features are only used for those spans that are swapping. [sent-218, score-0.344]
73 In Figure 1, the translation of source span [1, 3] is swapped with that of span [4, 5] by r2, and thus the orientation feature for span [1, 3] is activated. [sent-219, score-1.13]
74 We also define three feature templates for a swapping span, similar to the boundary features: {B : si+1 ; E : sj ; BE : si+1, sj}. [sent-220, score-0.518]
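The sketch below shows how such boundary and orientation templates could be instantiated for a source span; the template name prefixes and the toy source sentence are assumptions, not the paper's exact feature names.

```python
# Hedged sketch of source-side boundary and orientation feature templates.
def source_span_features(src, i, j, swapped):
    """src is a 0-indexed list; the span covers words s_{i+1} .. s_j."""
    b, e = src[i], src[j - 1]            # s_{i+1} and s_j in the paper's notation
    feats = {f"srcB:{b}": 1.0, f"srcE:{e}": 1.0, f"srcBE:{b}_{e}": 1.0}
    if swapped:                          # orientation templates fire only for swapping spans
        feats.update({f"swapB:{b}": 1.0, f"swapE:{e}": 1.0,
                      f"swapBE:{b}_{e}": 1.0})
    return feats

src = "bushi yu shalong juxing le huitan".split()   # toy source, not from the paper
print(source_span_features(src, 1, 3, swapped=True))
```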
75 2 Non-local Features Target span boundary features We also want to assess the target tree structure in a derivation. [sent-224, score-0.443]
76 We define these features in a way similar to source span boundary features. [sent-225, score-0.458]
77 We define the feature templates for a target span boundary as: {B : tk+1; E : tl; BE : tk+1, tl}. [sent-253, score-0.399]
78 Target span orientation features Similar target orientation features are used for a swapping span [i, j, k, l] with feature templates {B : tk+1; E : tl; BE : tk+1, tl}. [sent-254, score-0.825]
79 For an aligned word pair with source position i and target position j, the value of this feature is |i/|s| − j/|t||. As this feature depends on the length of the target sentence, it is a non-local feature. [sent-256, score-0.36]
80 For efficiency, we use a 3-gram language model trained on the target side of our training data during the induction of synchronous grammars. [sent-259, score-0.623]
81 5 Experiment In this section, we present our experiments on the NIST Chinese-to-English translation tasks. [sent-260, score-0.319]
82 We then present a detailed comparison on a smaller dataset, in order to analyze the effectiveness of max-margin estimation compared with the max-likelihood estimation (Xiao et al. [sent-263, score-0.362]
83 , 2002) is used to measure translation performance, and also serves as the cost function in the max-margin estimation. [sent-274, score-0.447]
84 Unless otherwise specified, we used the same features as those in the traditional pipeline: forward and backward translation probabilities, forward and backward lexical weights, count of extracted rules, count of glue rules, length of translation, and language model. [sent-278, score-0.454]
85 For the discriminative grammar induction, rule translation probabilities were calculated using the expectations of rules in the synchronous hypergraphs of sentence pairs. [sent-280, score-1.329]
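A small sketch of this estimation step follows, assuming the expected rule counts have already been accumulated from the hypergraphs; the counts are then relative-frequency normalized per source side and per target side. The input format and numbers are illustrative.

```python
# Hedged sketch: rule translation probabilities from expected rule counts.
from collections import defaultdict

def rule_probabilities(expected_counts):
    """expected_counts: dict mapping (src, tgt) rule pairs to expected counts."""
    by_src, by_tgt = defaultdict(float), defaultdict(float)
    for (src, tgt), c in expected_counts.items():
        by_src[src] += c
        by_tgt[tgt] += c
    fwd = {(s, t): c / by_src[s] for (s, t), c in expected_counts.items()}   # p(t|s)
    bwd = {(s, t): c / by_tgt[t] for (s, t), c in expected_counts.items()}   # p(s|t)
    return fwd, bwd

counts = {("yu shalong", "with Sharon"): 1.6, ("yu shalong", "and Sharon"): 0.4}
fwd, bwd = rule_probabilities(counts)
print(fwd[("yu shalong", "with Sharon")])   # 0.8
```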
86 As our max-margin synchronous grammar induction is trained on the entire bitext, it is necessary to load all the rules into memory during training. [sent-281, score-0.832]
87 2 Result on Large Dataset Table 2 shows the translation results. [sent-306, score-0.319]
88 With fewer translation rules, our method obtains an average improvement of +0. [sent-312, score-0.319]
89 As the difference between the baseline and our max-margin synchronous grammar induction model only lies in the grammar, this result clearly indicates that our learnt grammar does outperform the grammar extracted by the traditional two-step pipeline. [sent-314, score-1.197]
90 6 Related Work As the synchronous grammar is the key component in SMT systems, researchers have proposed various methods to improve the quality of grammars. [sent-339, score-0.632]
91 In addition to the generative and discriminative models introduced in Section 1, researchers have also made efforts on word alignment and grammar weight rescoring. [sent-340, score-0.369]
92 Yet another line is to rescore the weights of translation rules. [sent-348, score-0.471]
93 However, in rescoring, translation rules are still extracted by the heuristic two-step pipeline. [sent-353, score-0.43]
94 7 Conclusion In this paper we have presented a max-margin estimation for discriminative synchronous grammar induction. [sent-361, score-0.903]
95 By associating the margin with the translation quality, we directly learn translation rules that optimize the translation performance measured by BLEU. [sent-362, score-1.142]
96 Because our proposed model is quite general, we are also interested in applying this method to induce linguistically motivated synchronous grammars for syntax-based SMT. [sent-370, score-0.545]
97 Maximum expected bleu training of phrase and lexicon translation models. [sent-440, score-0.451]
98 Sinuhe – statistical machine translation using a globally trained conditional exponential family translation model. [sent-455, score-0.706]
99 Fast generation of translation forest for largescale smt discriminative training. [sent-549, score-0.528]
100 Unsupervised discriminative induction of synchronous grammar for machine translation. [sent-554, score-0.835]
wordName wordTfidf (topN-words)
[('synchronous', 0.457), ('translation', 0.319), ('span', 0.176), ('grammar', 0.175), ('derivation', 0.175), ('estimation', 0.157), ('viterbi', 0.152), ('boundary', 0.146), ('hypergraph', 0.145), ('biparsing', 0.136), ('bleu', 0.132), ('xiao', 0.124), ('discriminative', 0.114), ('biparse', 0.114), ('rules', 0.111), ('smt', 0.095), ('hyperedges', 0.095), ('source', 0.092), ('traditional', 0.091), ('blunsom', 0.089), ('rule', 0.089), ('cost', 0.089), ('induction', 0.089), ('grammars', 0.088), ('reference', 0.085), ('nonlocal', 0.084), ('orientation', 0.08), ('alignment', 0.08), ('target', 0.077), ('chart', 0.073), ('derivations', 0.071), ('cohn', 0.071), ('maxmargin', 0.068), ('pegasos', 0.068), ('xinyan', 0.068), ('chiang', 0.068), ('calculate', 0.066), ('hypergraphs', 0.064), ('pipeline', 0.064), ('hinge', 0.063), ('tl', 0.063), ('parse', 0.06), ('bitext', 0.059), ('local', 0.059), ('phil', 0.059), ('moses', 0.058), ('loss', 0.058), ('weights', 0.057), ('feature', 0.057), ('line', 0.055), ('swapping', 0.054), ('xiong', 0.053), ('tk', 0.052), ('emnlp', 0.05), ('dyer', 0.049), ('sj', 0.048), ('max', 0.048), ('deyi', 0.048), ('denero', 0.047), ('trevor', 0.047), ('bispan', 0.045), ('fhor', 0.045), ('levenberg', 0.045), ('rescoring', 0.045), ('incorporate', 0.044), ('equation', 0.044), ('features', 0.044), ('nist', 0.044), ('triple', 0.042), ('si', 0.042), ('alignments', 0.04), ('taskar', 0.04), ('minimization', 0.04), ('rescore', 0.04), ('kbest', 0.04), ('nsl', 0.04), ('riesa', 0.04), ('algorithm', 0.039), ('function', 0.039), ('bilingual', 0.039), ('margin', 0.039), ('scoring', 0.039), ('watanabe', 0.037), ('discover', 0.037), ('kevin', 0.037), ('templates', 0.037), ('koehn', 0.036), ('gimpel', 0.036), ('tj', 0.036), ('neubig', 0.036), ('burkett', 0.036), ('shouxun', 0.036), ('denotes', 0.035), ('chris', 0.035), ('optimize', 0.035), ('naacl', 0.035), ('statistical', 0.035), ('unconstrained', 0.034), ('exponential', 0.033), ('integrate', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
2 0.22781175 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
3 0.22223173 201 emnlp-2013-What is Hidden among Translation Rules
Author: Libin Shen ; Bowen Zhou
Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.
4 0.19246472 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar
Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) ob- tained phrase-based and CKY Hiero with the same translation model.
5 0.18596146 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
6 0.18487945 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
7 0.17961869 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
8 0.16588899 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
9 0.15910834 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
10 0.15827602 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
11 0.15822506 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
12 0.15305287 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
13 0.14594474 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
14 0.14247407 145 emnlp-2013-Optimal Beam Search for Machine Translation
15 0.13334125 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
16 0.12417103 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
17 0.1236731 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
18 0.12017861 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
19 0.11995734 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment
20 0.11851961 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
topicId topicWeight
[(0, -0.321), (1, -0.371), (2, 0.128), (3, 0.101), (4, 0.115), (5, -0.057), (6, -0.023), (7, -0.005), (8, 0.072), (9, 0.065), (10, -0.065), (11, -0.042), (12, -0.005), (13, 0.063), (14, 0.046), (15, -0.048), (16, 0.077), (17, 0.019), (18, 0.028), (19, -0.023), (20, -0.049), (21, -0.047), (22, -0.011), (23, -0.151), (24, -0.075), (25, -0.023), (26, -0.015), (27, -0.077), (28, 0.055), (29, -0.09), (30, -0.098), (31, -0.036), (32, 0.022), (33, -0.005), (34, 0.055), (35, 0.038), (36, -0.003), (37, 0.009), (38, -0.017), (39, -0.065), (40, -0.02), (41, -0.047), (42, 0.024), (43, -0.031), (44, -0.051), (45, 0.019), (46, 0.083), (47, 0.041), (48, 0.039), (49, -0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.97112739 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
2 0.82326186 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
Author: Martin Cmejrek ; Haitao Mi ; Bowen Zhou
Abstract: Machine translation benefits from system combination. We propose flexible interaction of hypergraphs as a novel technique combining different translation models within one decoder. We introduce features controlling the interactions between the two systems and explore three interaction schemes of hiero and forest-to-string models—specification, generalization, and interchange. The experiments are carried out on large training data with strong baselines utilizing rich sets of dense and sparse features. All three schemes significantly improve results of any single system on four testsets. We find that specification—a more constrained scheme that almost entirely uses forest-to-string rules, but optionally uses hiero rules for shorter spans—comes out as the strongest, yielding improvement up to 0.9 (T -B )/2 points. We also provide a detailed experimental and qualitative analysis of the results.
3 0.81650931 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar
Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) ob- tained phrase-based and CKY Hiero with the same translation model.
4 0.80770576 201 emnlp-2013-What is Hidden among Translation Rules
Author: Libin Shen ; Bowen Zhou
Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.
5 0.71661305 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang
Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.
6 0.69914538 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
7 0.69853669 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
8 0.65651578 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
9 0.64909071 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
10 0.63386899 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
11 0.6303305 145 emnlp-2013-Optimal Beam Search for Machine Translation
12 0.6287685 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
13 0.62485111 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
14 0.56949431 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
15 0.56888741 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
16 0.55537218 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
17 0.52184945 2 emnlp-2013-A Convex Alternative to IBM Model 2
18 0.51906282 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
19 0.51548302 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
20 0.50650531 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
topicId topicWeight
[(18, 0.034), (22, 0.037), (26, 0.027), (30, 0.549), (45, 0.019), (50, 0.015), (51, 0.103), (66, 0.037), (71, 0.011), (75, 0.021), (77, 0.038)]
simIndex simValue paperId paperTitle
1 0.99240053 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures
Author: Makoto Yasuhara ; Toru Tanaka ; Jun-ya Norimatsu ; Mikio Yamamoto
Abstract: Ngram language models tend to increase in size with inflating the corpus size, and consume considerable resources. In this paper, we propose an efficient method for implementing ngram models based on doublearray structures. First, we propose a method for representing backwards suffix trees using double-array structures and demonstrate its efficiency. Next, we propose two optimization methods for improving the efficiency of data representation in the double-array structures. Embedding probabilities into unused spaces in double-array structures reduces the model size. Moreover, tuning the word IDs in the language model makes the model smaller and faster. We also show that our method can be used for building large language models using the division method. Lastly, we show that our method outperforms methods based on recent related works from the viewpoints of model size and query speed when both optimization methods are used.
2 0.96666747 92 emnlp-2013-Growing Multi-Domain Glossaries from a Few Seeds using Probabilistic Topic Models
Author: Stefano Faralli ; Roberto Navigli
Abstract: In this paper we present a minimallysupervised approach to the multi-domain acquisition ofwide-coverage glossaries. We start from a small number of hypernymy relation seeds and bootstrap glossaries from the Web for dozens of domains using Probabilistic Topic Models. Our experiments show that we are able to extract high-precision glossaries comprising thousands of terms and definitions.
same-paper 3 0.96312803 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
4 0.95674551 176 emnlp-2013-Structured Penalties for Log-Linear Language Models
Author: Anil Kumar Nelakanti ; Cedric Archambeau ; Julien Mairal ; Francis Bach ; Guillaume Bouchard
Abstract: Language models can be formalized as loglinear regression models where the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms to learn the parameters by maximum likelihood scale linearly in nd, where n is the length of the training corpus and d is the number of observed features. We present a model that grows logarithmically in d, making it possible to efficiently leverage longer contexts. We account for the sequential structure of natural language using treestructured penalized objectives to avoid overfitting and achieve better generalization.
5 0.89303702 4 emnlp-2013-A Dataset for Research on Short-Text Conversations
Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen
Abstract: Natural language conversation is widely regarded as a highly difficult problem, which is usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation, to exploit the vast amount of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversation based on the real-world instances from Sina Weibo (a popular Chinese microblog service), which will be soon released to public. This dataset provides rich collection of instances for the research on finding natural and relevant short responses to a given short text, and useful for both training and testing of conversation models. This dataset consists of both naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that the simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.
6 0.8846966 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation
7 0.72766346 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks
8 0.69001848 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
9 0.6805324 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
10 0.66852224 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification
11 0.66551083 156 emnlp-2013-Recurrent Continuous Translation Models
12 0.65811598 2 emnlp-2013-A Convex Alternative to IBM Model 2
13 0.65613955 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
14 0.65221232 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
15 0.64954543 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
16 0.64824098 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries
17 0.64446747 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
19 0.63149303 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
20 0.63095498 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding