emnlp emnlp2013 emnlp2013-71 knowledge-graph by maker-knowledge-mining

71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering


Source: pdf

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) obtained by phrase-based and CKY Hiero with the same translation model.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). [sent-5, score-0.358]

2 LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. [sent-7, score-0.279]

3 It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. [sent-8, score-0.315]

4 In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al. [sent-9, score-0.29]

5 Typically, CKY-style decoding is used for Hiero with time complexity O(n3) for source input with n words. [sent-13, score-0.283]

6 The size of a Hiero SCFG grammar is typically larger than phrase-based models extracted from the same data, creating challenges in rule extraction and decoding time, especially for larger datasets (Sankaran et al. [sent-18, score-0.356]

7 It means LR decoding has the potential to replace CKY decoding for Hiero. [sent-21, score-0.44]

8 The original LR decoding algorithm in (Watanabe et al., 2006b) does not perform at the same level as standard CKY Hiero with cube pruning (see Table 3). [sent-23, score-0.366]

9 In this paper we propose modifications to the LR decoding algorithm that address these limitations and provide, for the first time, a true left-to-right alternative to the standard CKY Hiero algorithm. [sent-25, score-0.29]

10 Our analysis of left to right decoding showed that it has more potential for search errors due to early pruning of good hypotheses. [sent-30, score-0.321]

11 This is unlike bottom-up decoding (CKY), which keeps the best hypotheses for each span. [sent-31, score-0.45]

12 The BLEU score obtained by LR decoding with prefix lexicalized target side rules equals the scores obtained by CKY decoding with prefix lexicalized target side rules and a phrase-based translation system. [sent-35, score-0.955]

13 For translating sentences longer than the maximum phrase-pair length, the decoder relies on additional glue rules S → ⟨X, X⟩ and S → ⟨S X, S X⟩ that allow a monotone combination of phrases. [sent-45, score-0.414]

14 The glue rules are used when no rules could match or the span length is larger than the maximum phrase-pair length. [sent-46, score-0.4]

15 The LR decoding algorithm of (Watanabe et al., 2006b) generates the target hypotheses left to right, but for a synchronous context-free grammar (SCFG) as used in Hiero. [sent-49, score-0.306]

16 Rule extraction is similar to Hiero, except any rules violating GNF form on the target side are excluded. [sent-62, score-0.283]

17 Figure 1(a) shows a word-aligned German-English sentence with a phrase pair ⟨ihre arbeit noch nicht gemacht haben, have not yet done their work⟩ that will lead to an SCFG rule. [sent-64, score-0.33]

18 LR-Hiero rule extraction excludes non-GNF rules such as X → ⟨X1 noch nicht gemacht X2, X2 not yet done X1⟩. [sent-67, score-0.487]

19 X → ⟨f̄ X1, ē X1⟩   X → ⟨X1 f̄ X2, ē X2 X1⟩ (3) It might appear that the restriction that target-side rules be GNF is a severe restriction on the coverage of possible hypotheses compared to the full set of rules permitted by the Hiero extraction heuristic. [sent-69, score-0.514]

20 However, there is some evidence in the literature that discontinuous spans on the source side of translation rules are a lot more useful than discontinuous spans on the target side (which are disallowed in GNF). [sent-70, score-0.908]

21 For instance, Galley and Manning (2010) do an extensive study of discontinuous spans on the source and target side and show that source-side discontinuous spans are very useful, but removing discontinuous spans on the target side only lowers the BLEU score by 0. [sent-71, score-1.095]

22 Removing discontinuous spans means that the target-side rules have the form uX, Xu, XuX, XXu, or uXX, of which we disallow Xu, XuX, and XXu. [sent-73, score-0.48]
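As a rough illustration of this restriction, the sketch below checks whether the target side of a rule keeps all of its terminals before the first nonterminal (the uX and uXX patterns), rejecting the Xu, XuX, and XXu patterns; the token representation (strings like "X1", "X2" for nonterminals) is an assumption made for the example, not the paper's actual extraction code.

```python
# Minimal sketch of a GNF filter on the target side of an extracted rule.
# The token representation is a simplifying assumption, not the actual format.

def is_nonterminal(tok):
    """Treat X1, X2, ... as nonterminal symbols."""
    return tok.startswith("X") and tok[1:].isdigit()

def is_gnf_target(target_tokens):
    """True iff the target side is a terminal prefix followed only by nonterminals."""
    seen_nonterminal = False
    for tok in target_tokens:
        if is_nonterminal(tok):
            seen_nonterminal = True
        elif seen_nonterminal:
            # a terminal appearing after a nonterminal breaks the GNF pattern
            return False
    return True

# uX / uXX shapes are kept; Xu, XuX, XXu shapes are rejected.
assert is_gnf_target(["not", "yet", "X2", "X1"])              # GNF, kept
assert not is_gnf_target(["X2", "not", "yet", "done", "X1"])  # non-GNF, excluded
```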

23 Zhang and Zong (2012) also conduct a study on discontinuous spans on the source and target side of Hiero rules and conclude that source discontinuous spans are always more useful than discontinuities on the target side, with experiments on four language pairs (zh-en, fr-en, de-en and es-en). [sent-74, score-0.944]

24 As we shall also see in our experimental results (see Table 4), we can get close to the BLEU scores obtained using the full set of Hiero rules by using only target-lexicalized rules in our LR decoder. [sent-75, score-0.368]

25 LR-Hiero decoding uses a top-down depth-first search, which strictly grows the hypotheses in target surface order. [sent-77, score-0.495]

26 Search on the source side follows an Earley-style search (Earley, 1970); the dot jumps around on the source side of the rules based on the order of the nonterminals on the target side. [sent-78, score-0.576]

27 This search is integrated with beam search or cube pruning to efficiently find the k-best translations. [sent-79, score-0.425]

28 We explain our own modified algorithm for LR decoding with cube pruning in Section 2. [sent-102, score-0.621]

29 Each partial hypothesis h is a 4-tuple (ht, hs, hcov, hc) consisting of a translation prefix ht, a (LIFO-ordered) list hs of uncovered spans, a source-word coverage set hcov, and the hypothesis cost hc. [sent-106, score-0.826]
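A minimal sketch of this 4-tuple (the field names are illustrative, not taken from the actual decoder implementation):

```python
from collections import namedtuple

# Hypothesis fields as described above:
#   ht:   translation prefix (target words generated so far)
#   hs:   LIFO-ordered list of uncovered source spans (first uncovered span last)
#   hcov: set of covered source word positions
#   hc:   hypothesis cost (model score plus future cost estimate)
Hypothesis = namedtuple("Hypothesis", ["ht", "hs", "hcov", "hc"])

def initial_hypothesis(n):
    """Null prefix with the sentence-initial marker and one span over [0, n]."""
    return Hypothesis(ht=["<s>"], hs=[(0, n)], hcov=set(), hc=0.0)
```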

30 The initial hypothesis is a null string with just a sentence-initial marker ⟨s⟩ and the list hs containing the span of the whole sentence, [0, n]. [sent-107, score-0.357]

31 At the beginning of beam search the initial hypothesis is expanded; Figure 2 shows the rules used, the source-side coverage, and the resulting hypotheses. [sent-113, score-0.385]

32 (a) The rules pane shows the rules used in the derivation (glue rules are marked by G); (b) decoder state using Earley dot notation (superscripts show the rule number); (c) hypotheses pane showing the translation prefix and the ordered list of yet-to-be-covered spans. [sent-119, score-0.678]

33 Hypotheses in each decoder stack are expanded iteratively, generating new hypotheses, which are added to later stacks corresponding to the number of source words covered. [sent-121, score-0.385]

34 For example, given a rule r with source side rs: ⟨X1 the X2⟩ and source phrase p: ⟨ok, the more the better⟩. [sent-125, score-0.36]

35 The GrowHypothesis routine creates a new candidate by expanding a given hypothesis h using rule r and computes the complete hypothesis score, including the language model score. [sent-128, score-0.411]

36 Since the target-side rules are in GNF, the translation prefix of the new hypothesis is obtained by simply concatenating the terminal prefixes of h and r in the same order (line 20). [sent-129, score-0.457]

37 The hs list is built by pushing the non-terminal spans of rule r in reverse order (lines 17 and 18). [sent-131, score-0.312]
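The two steps just described (concatenating the terminal prefix and pushing the rule's nonterminal spans in reverse order onto the LIFO list) can be sketched as follows, reusing the (ht, hs, hcov, hc) hypothesis tuple from the earlier sketch; the arguments describing the rule are simplifying assumptions rather than the decoder's real interfaces.

```python
# Sketch of GrowHypothesis: target_prefix is the rule's terminal prefix,
# nonterminal_spans are the source spans of its nonterminals, and
# terminal_positions are the source positions covered by its terminals.

def grow_hypothesis(h, target_prefix, nonterminal_spans, terminal_positions,
                    compute_cost):
    ht, hs, hcov, _ = h
    new_ht = ht + target_prefix                 # GNF: concatenate terminal prefixes
    new_hs = list(hs[:-1])                      # drop the span just expanded
    for span in reversed(nonterminal_spans):    # push uncovered spans in reverse
        new_hs.append(span)                     # order, keeping hs LIFO-ordered
    new_hcov = hcov | set(terminal_positions)
    partial = (new_ht, new_hs, new_hcov, 0.0)
    return (new_ht, new_hs, new_hcov, compute_cost(partial))
```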

38 In the walk-through in Figure 2, the derivation process starts by expanding the initial hypothesis h0 (first item in the right pane of Fig 2) with the rule (rule #1 in the left pane) to generate a new partial candidate having a terminal prefix of ⟨s⟩ students (second item in the right pane). [sent-133, score-0.514]

39 Now the decoder considers the second hypothesis and pops the span [1, 8]. [sent-135, score-0.354]

40 It then matches the rule (#2) and pushes the spans [1, 6] and [7, 8] into the list hs in the reverse order of their appearance in the target-side rule. [sent-136, score-0.312]

41 At each step the new hypothesis is added to the decoder stack Sl depending on the number of covered words in the new hypothesis (line 13 in Algorithm 1). [sent-137, score-0.487]

42 For pruning we use an estimate of the future cost of the spans not yet covered by the current hypothesis, together with the hypothesis cost. [sent-138, score-0.622]

43 The ComputeCost method (line 22 in Algorithm 1) uses the usual log-linear model and scores a hypothesis based on its different feature scores g(h′) and the future cost of the yet-to-be-covered spans (F¬h′cov). [sent-141, score-0.28]
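A toy version of such a cost function might look like the sketch below, assuming that feature values (as costs, e.g. negative log-probabilities) and per-span future-cost estimates are supplied as plain dictionaries; this is only an illustration of the log-linear scoring described above, not the decoder's actual interface.

```python
# Toy sketch of ComputeCost: weighted feature costs plus the precomputed
# future-cost estimate of the spans still to be covered (lower is better).

def compute_cost(weights, features, uncovered_spans, future_cost):
    model_cost = sum(weights[name] * value for name, value in features.items())
    future = sum(future_cost[span] for span in uncovered_spans)
    return model_cost + future

# Example with two features and one uncovered span:
print(compute_cost({"lm": 1.0, "tm": 2.0},
                   {"lm": 3.0, "tm": 1.0},
                   [(3, 8)],
                   {(3, 8): 2.5}))   # -> 7.5
```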

44 The time complexity of left-to-right Hiero decoding with beam search is O(n2b) in practice, where n is the length of the source sentence and b is the size of the beam (Huang and Mi, 2010). [sent-142, score-0.401]

45 Algorithm 1 presented earlier does an exhaustive search as it generates all possible partial translations for a given stack that are reachable from the hypotheses in previous stacks. [sent-144, score-0.35]

46 The cube pruning technique (Chiang, 2007) avoids the wasteful generation of poor hypotheses that are likely to be pruned away by efficiently restricting the generation to only high scoring partial translations. [sent-146, score-0.596]

47 We modify cube pruning for LR decoding to take into account the next uncovered span. [sent-147, score-0.632]

48 The structure of stacks and hypotheses and the computation of the future cost are similar to Algorithm 1 (lines 1-5). [sent-162, score-0.373]

49 All hypotheses in each stack Sp (covering p words on the source side) are first partitioned into a set of groups, {G}, based on their first uncovered span (line 9). [sent-164, score-0.472]

50 Each group is a 2-tuple (gspan, ghyps), where ghyps is a list of hypotheses which share the same first uncovered span gspan. [sent-167, score-0.571]

51 Rules matching the span gspan are obtained from the routine GetSpanRules, and are then grouped based on unique source side rules (i.e. [sent-168, score-0.504]

52 each Rs contains rules that share the same source side s but have different target sides). [sent-170, score-0.346]

53 Each ghyps and possible Rs create a cube, which is added to cubeList. [sent-171, score-0.34]

54 In LR-Hiero, each hypothesis is expanded on only one uncovered span; therefore each cube always has just two dimensions: (1) hypotheses with the same number of covered words and the same first uncovered span, and (2) rules sharing the same source side. [sent-172, score-1.123]
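This grouping and cube construction can be sketched as follows, under the assumptions that rules are (source, target, cost) tuples, that a GetSpanRules-style lookup is available, and that hypotheses carry their uncovered spans in an hs field and their cost in hc as in the earlier sketch; it illustrates the structure, not the actual implementation.

```python
from collections import defaultdict

def build_cubes(stack, get_span_rules):
    groups = defaultdict(list)                   # gspan -> ghyps
    for h in stack:
        gspan = h.hs[-1]                         # first uncovered span (top of LIFO list)
        groups[gspan].append(h)

    cube_list = []
    for gspan, ghyps in groups.items():
        by_source = defaultdict(list)            # unique source side -> Rs
        for rule in get_span_rules(gspan):
            source, _target, _cost = rule
            by_source[source].append(rule)
        for rules_rs in by_source.values():
            # one cube: hypotheses sharing gspan x rules sharing one source side,
            # each dimension sorted by cost so the best corner sits at (0, 0)
            cube_list.append((sorted(ghyps, key=lambda hyp: hyp.hc),
                              sorted(rules_rs, key=lambda r: r[2])))
    return cube_list
```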

55 The Merge routine is the core function of cube pruning which generates the best hypotheses from all cubes (Chiang, 2007). [sent-175, score-0.676]

56 Figure 3 (b) shows a more detailed view of a cube (shaded cube in Figure 3(a)). [sent-177, score-0.53]

57 Rows are hypotheses and columns are rules which are sorted based on their scores. [sent-178, score-0.372]

58 Iteratively the best hypothesis is popped from the queue (line 26) and its neighbours in the cube are added to the priority queue (using GetNeighbours([H, Q] )). [sent-180, score-0.73]

59 Using the cube pruning technique, each stack is filled with the K best hypotheses without generating all possible hypotheses in each cube. [sent-182, score-0.946]
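The merge loop can be sketched with Python's heapq as below; the cube layout ((hypotheses, rules) pairs, both sorted by cost) and the combine helper (standing in for GrowHypothesis, returning a hypothesis whose cost is stored in an hc field) are assumptions made for the illustration, not the actual decoder code.

```python
import heapq

def merge(cubes, combine, K):
    queue, seen, output = [], set(), []
    for c, (hyps, rules) in enumerate(cubes):
        cand = combine(hyps[0], rules[0])
        heapq.heappush(queue, (cand.hc, c, 0, 0, cand))   # best corner of cube c
        seen.add((c, 0, 0))

    while queue and len(output) < K:
        _cost, c, i, j, cand = heapq.heappop(queue)
        output.append(cand)                               # hypotheses in best-first order
        hyps, rules = cubes[c]
        for ni, nj in ((i + 1, j), (i, j + 1)):           # GetNeighbours in the grid
            if ni < len(hyps) and nj < len(rules) and (c, ni, nj) not in seen:
                nxt = combine(hyps[ni], rules[nj])
                heapq.heappush(queue, (nxt.hc, c, ni, nj, nxt))
                seen.add((c, ni, nj))
    return output
```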

60 In standard CKY cube pruning, cubes group the hypotheses in a given stack based on their coverage vector. [sent-183, score-0.35]

61 But this idea does not work in LR-Hiero decoding, in which the expansion of each hypothesis is restricted to its first uncovered span. [sent-184, score-0.499]

62 Note that only rules whose number of terminals on the source side is equal to i − p can be used. [sent-187, score-0.334]

63 Figure 3: Example of generating hypotheses in cube pruning using Figure 2: (a) Hypotheses in previous stacks are grouped based on their first uncovered span to build cubes (grids on top). [sent-188, score-0.889]

64 Cubes are fed to a priority queue (triangle) and new hypotheses are iteratively popped from the queue and added to the current stack, S5. [sent-190, score-0.529]

65 The top side of the grid denotes the target side of rules sharing the same source side (Rs) along with their scores. [sent-192, score-0.538]

66 The left side of the grid shows the hypotheses in the same group, their first uncovered span, and their scores. [sent-193, score-0.592]

67 The best hypothesis of this cube, which is likely created from the best hypothesis and rule (the top-left entry), is popped in the first step. [sent-197, score-0.723]

68 Then, GetNeighbours calls GrowHypothesis to generate the next potential best hypotheses of this cube (neighbours of the popped entry, which are shaded in Figure 3(b)). [sent-198, score-0.612]

69 In the next iteration, the best hypothesis is popped from all candidates in the queue and the algorithm continues. [sent-200, score-0.342]

70 In addition we also use the glue rule count and the two reordering penalty features employed by Watanabe et al. [sent-202, score-0.44]

71 However, the height and width features do not distinguish a rule that reorders the nonterminals in source and target from one that preserves the ordering. [sent-211, score-0.347]

72 The decoder is thus agnostic to this difference and would not be able to exploit this effectively to control reordering and instead would rely on the partial LM score. [sent-213, score-0.296]

73 Figure 4: (a) Distortion feature computation using a rule r: ⟨f1 X1 f2 X2 f3, t X2 X1⟩ with item sequence I = [ℓ, f1, f2, f3, X2, X1, a]; (b) the rule ⟨X1 noch nicht X2, not yet X2 X1⟩ applied to the phrase ihre arbeit noch nicht gemacht (word boundaries indexed 1–6), giving item spans I = [(1,1), (3,5), (5,6), (1,3), (6,6)]. [sent-220, score-0.285]

74 (b) Example of distortion computation for applying r3 to the phrase ⟨ihre arbeit noch nicht gemacht haben⟩. [sent-221, score-0.534]

75 Representing the non-terminal spans and each sequence of terminals in γ as distinct items, our distortion feature counts the total length of jumps between the items during Earley parsing. [sent-226, score-0.314]

76 Figure 4 (a) explains the computation of our distortion feature for an example rule r. [sent-227, score-0.279]

77 Figure 4 (b) shows an example of distortion computation for r3 and the phrase ⟨ihre arbeit noch nicht gemacht haben⟩ from Figure 2. [sent-238, score-0.534]
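One plausible reading of this feature, sketched below, visits the source-side items in the order given by the target side and sums the jump lengths between consecutive items; the exact jump definition (distance from the end of one item to the start of the next) is an assumption based on Figure 4, not a verbatim reimplementation.

```python
def distortion(items):
    """items: source spans (start, end) listed in target visiting order."""
    total = 0
    for (_prev_start, prev_end), (start, _end) in zip(items, items[1:]):
        total += abs(start - prev_end)
    return total

# Item spans from the Figure 4(b) example:
print(distortion([(1, 1), (3, 5), (5, 6), (1, 3), (6, 6)]))  # 2 + 0 + 5 + 3 = 10
```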

78 Since the glue rules are likely to be used in the top levels (possibly with large distortion) of the derivation, we would want the decoder to learn the distortion for regular and glue rules separately. [sent-239, score-0.827]

79 We thus use two distortion features for the two rule types and we call them dp and dg. [sent-240, score-0.279]

80 The reordering feature simply counts the number of reordering rules, i.e., rules in which the non-terminals on the source and target sides are reordered. [sent-245, score-0.307]
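A minimal sketch of this count, assuming nonterminals are labelled X1, X2, ... on both sides of a rule:

```python
def nonterminal_order(tokens):
    return [t for t in tokens if t.startswith("X") and t[1:].isdigit()]

def is_reordering_rule(source_tokens, target_tokens):
    """True if the nonterminal order differs between source and target sides."""
    return nonterminal_order(source_tokens) != nonterminal_order(target_tokens)

def reordering_feature(rules_used):
    """rules_used: (source_tokens, target_tokens) pairs from one derivation."""
    return sum(is_reordering_rule(src, tgt) for src, tgt in rules_used)

# The rule <X1 noch nicht X2, not yet X2 X1> from Figure 4 reorders its nonterminals:
print(is_reordering_rule(["X1", "noch", "nicht", "X2"],
                         ["not", "yet", "X2", "X1"]))   # True
```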

81 We conduct different types of experiments to evaluate LR-Hiero decoding with cube pruning and the integration of the new features into the LR-Hiero system for two language pairs: German-English (de-en) and Czech-English (cs-en). [sent-249, score-0.586]

82 • LR-Hiero+CP: LR-Hiero decoding with cube pruning. [sent-269, score-0.485]

83 We use comparable pop limits in each of the decoders: 1000 for Moses and LR-Hiero and 500 with cube pruning for CKY Hiero and LR-Hiero+CP. [sent-272, score-0.405]

84 To evaluate the performance of LR-Hiero decoding with cube pruning (LR-Hiero+CP), we compare it with three baselines: (i) CKY Hiero, (ii) CKY Hiero-GNF, and (iii) LR-Hiero (without cube pruning) with two different beam sizes, 500 and 1000. [sent-293, score-0.91]

85 By analogy to parsing algorithm comparisons, we compare the different decoding algorithms with respect to the number of calls made to the language model (LM) since that directly corresponds to the number of hypotheses considered by the decoder. [sent-296, score-0.519]

86 A decoder is more time efficient if it can consider fewer translation hypotheses while maintaining the same BLEU score. [sent-297, score-0.399]

87 3 BLEU (not shown due to lack of space) when we split the distortion feature for the two rule types (dp and dg). [sent-321, score-0.279]

88 We also investigate the impact of the proposed reordering features with other decoders and models. [sent-329, score-0.296]

89 The last part of Table 4 shows the performance of the CKY decoder. (The feature rhi is defined for SCFG rules and cannot be adopted in phrase-based translation systems, and Moses already uses a distortion feature; therefore we omit Moses from this experiment.) [sent-331, score-0.53]

90 Nguyen and Vogel (2013) also integrate traditional phrase-based features, distortion and lexicalized reordering, into Hiero. [sent-335, score-0.412]

91 In LR-decoding, we compute distortion for rules even though we are yet to translate some of the sub-spans. [sent-339, score-0.316]

92 We presented an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al. [sent-346, score-0.29]

93 , 2006b) but unlike that algorithm, using experiments over multiple language pairs we showed two new results: (i) Our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero and the original LR decoding algorithm in (Watanabe et al. [sent-347, score-0.775]

94 And, (ii) by introducing new distortion and reordering features for LR decoding we show that it maintains the BLEU scores obtained by phrase-based and CKY Hiero-GNF. [sent-349, score-0.631]

95 CKY Hiero uses standard Hiero-style translation rules, capturing a better reordering model than the prefix-lexicalized target-side translation rules used in LR-Hiero. [sent-350, score-0.724]

96 We also introduce the use of a future cost estimate in the decoding algorithm, which is an essential part of decoding. [sent-355, score-0.293]

97 We have shown in this paper that left-to-right (LR) decoding can be considered as a potential faster alternative to CKY decoding for Hiero-style machine translation systems. [sent-356, score-0.546]

98 We also plan to extend the glue rules in LR-Hiero to provide a better reordering model. [sent-359, score-0.477]

99 We believe such an extension would be very effective in reducing search errors and capturing better reordering models in language pairs involving complex reordering requirements like Chinese-English. [sent-360, score-0.398]

100 Integrating phrase-based reordering features into chart-based decoder for machine translation. [sent-438, score-0.296]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hiero', 0.377), ('cube', 0.265), ('hypotheses', 0.23), ('decoding', 0.22), ('cky', 0.212), ('reordering', 0.199), ('lr', 0.187), ('distortion', 0.174), ('watanabe', 0.165), ('gnf', 0.15), ('uncovered', 0.144), ('rules', 0.142), ('glue', 0.136), ('hypothesis', 0.135), ('span', 0.122), ('stack', 0.12), ('spans', 0.107), ('rule', 0.105), ('stacks', 0.105), ('bleu', 0.103), ('pruning', 0.101), ('decoder', 0.097), ('side', 0.096), ('sankaran', 0.092), ('nicht', 0.09), ('noch', 0.09), ('pane', 0.09), ('discontinuous', 0.09), ('queue', 0.089), ('popped', 0.083), ('cubelist', 0.075), ('getspanrules', 0.075), ('ghyps', 0.075), ('growhypothesis', 0.075), ('translation', 0.072), ('lm', 0.07), ('hs', 0.068), ('earley', 0.065), ('source', 0.063), ('scfg', 0.063), ('gemacht', 0.06), ('hyplist', 0.06), ('kriya', 0.06), ('lrhiero', 0.06), ('mrl', 0.06), ('beam', 0.059), ('prefix', 0.058), ('height', 0.057), ('heafield', 0.056), ('cp', 0.055), ('baskaran', 0.052), ('terminal', 0.05), ('koehn', 0.048), ('push', 0.048), ('width', 0.046), ('target', 0.045), ('arbeit', 0.045), ('demonstrably', 0.045), ('getneighbours', 0.045), ('gspan', 0.045), ('hcov', 0.045), ('heapq', 0.045), ('hihre', 0.045), ('lifo', 0.045), ('rhi', 0.045), ('moses', 0.045), ('chiang', 0.045), ('cubes', 0.044), ('derivation', 0.044), ('dot', 0.04), ('hyp', 0.039), ('monotone', 0.039), ('pop', 0.039), ('superscripts', 0.039), ('lexicalized', 0.039), ('priority', 0.038), ('maintains', 0.038), ('anoop', 0.038), ('cost', 0.038), ('routine', 0.036), ('fig', 0.036), ('razmara', 0.036), ('algorithm', 0.035), ('nguyen', 0.034), ('calls', 0.034), ('faster', 0.034), ('line', 0.033), ('rs', 0.033), ('terminals', 0.033), ('reverse', 0.032), ('initial', 0.032), ('taro', 0.031), ('nonterminals', 0.031), ('neighbours', 0.031), ('grammar', 0.031), ('hierarchical', 0.031), ('backtraced', 0.03), ('computecost', 0.03), ('futurecost', 0.03), ('habeni', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) obtained by phrase-based and CKY Hiero with the same translation model.

2 0.25889593 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

Author: Martin Cmejrek ; Haitao Mi ; Bowen Zhou

Abstract: Machine translation benefits from system combination. We propose flexible interaction of hypergraphs as a novel technique combining different translation models within one decoder. We introduce features controlling the interactions between the two systems and explore three interaction schemes of hiero and forest-to-string models—specification, generalization, and interchange. The experiments are carried out on large training data with strong baselines utilizing rich sets of dense and sparse features. All three schemes significantly improve results of any single system on four testsets. We find that specification—a more constrained scheme that almost entirely uses forest-to-string rules, but optionally uses hiero rules for shorter spans—comes out as the strongest, yielding improvement up to 0.9 (T -B )/2 points. We also provide a detailed experimental and qualitative analysis of the results.

3 0.23081911 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

4 0.19246472 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

5 0.16915321 145 emnlp-2013-Optimal Beam Search for Machine Translation

Author: Alexander Rush ; Yin-Wen Chang ; Michael Collins

Abstract: Beam search is a fast and empirically effective method for translation decoding, but it lacks formal guarantees about search error. We develop a new decoding algorithm that combines the speed of beam search with the optimal certificate property of Lagrangian relaxation, and apply it to phrase- and syntax-based translation decoding. The new method is efficient, utilizes standard MT algorithms, and returns an exact solution on the majority of translation examples in our test data. The algorithm is 3.5 times faster than an optimized incremental constraint-based decoder for phrase-based translation and 4 times faster for syntax-based translation.

6 0.16717863 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

7 0.16563435 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

8 0.14599204 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

9 0.14566773 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

10 0.14231043 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

11 0.1386753 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

12 0.12990567 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

13 0.11987767 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

14 0.11771847 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

15 0.11303703 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.10140709 201 emnlp-2013-What is Hidden among Translation Rules

17 0.10101771 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

18 0.092666514 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

19 0.066940933 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

20 0.065541282 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.223), (1, -0.312), (2, 0.129), (3, 0.1), (4, 0.121), (5, -0.053), (6, -0.03), (7, -0.056), (8, 0.094), (9, 0.179), (10, -0.117), (11, 0.015), (12, -0.136), (13, 0.025), (14, -0.028), (15, 0.024), (16, 0.028), (17, -0.073), (18, -0.08), (19, -0.026), (20, -0.07), (21, 0.002), (22, -0.034), (23, -0.13), (24, 0.039), (25, -0.049), (26, 0.079), (27, 0.001), (28, -0.079), (29, 0.01), (30, -0.003), (31, 0.04), (32, -0.039), (33, -0.05), (34, 0.003), (35, 0.074), (36, 0.039), (37, -0.067), (38, -0.037), (39, -0.032), (40, -0.007), (41, -0.006), (42, 0.021), (43, -0.029), (44, 0.082), (45, 0.077), (46, 0.097), (47, 0.002), (48, -0.126), (49, -0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96097076 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) obtained by phrase-based and CKY Hiero with the same translation model.

2 0.74305159 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

Author: Martin Cmejrek ; Haitao Mi ; Bowen Zhou

Abstract: Machine translation benefits from system combination. We propose flexible interaction of hypergraphs as a novel technique combining different translation models within one decoder. We introduce features controlling the interactions between the two systems and explore three interaction schemes of hiero and forest-to-string models—specification, generalization, and interchange. The experiments are carried out on large training data with strong baselines utilizing rich sets of dense and sparse features. All three schemes significantly improve results of any single system on four testsets. We find that specification—a more constrained scheme that almost entirely uses forest-to-string rules, but optionally uses hiero rules for shorter spans—comes out as the strongest, yielding improvement up to 0.9 (T -B )/2 points. We also provide a detailed experimental and qualitative analysis of the results.

3 0.69958031 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

4 0.6865654 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

5 0.67513567 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

6 0.64111376 145 emnlp-2013-Optimal Beam Search for Machine Translation

7 0.55124706 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

8 0.53679407 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

9 0.53217322 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

10 0.53145629 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation

11 0.51468045 201 emnlp-2013-What is Hidden among Translation Rules

12 0.50292319 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

13 0.49285075 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

14 0.4898611 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

15 0.47650817 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.46601155 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

17 0.44719785 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

18 0.43575788 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

19 0.42778009 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

20 0.37228122 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.02), (18, 0.462), (22, 0.031), (30, 0.11), (45, 0.018), (50, 0.013), (51, 0.084), (66, 0.032), (69, 0.025), (71, 0.015), (75, 0.013), (77, 0.065), (96, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88419884 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n2b) for input of n words and beam size b, compared to O(n3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) obtained by phrase-based and CKY Hiero with the same translation model.

2 0.86598134 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.

3 0.85355562 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the loglikelihood of the documents from both language domains under a cross-lingual logbilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the pro- posed cross-lingual adaptation approach.

4 0.65909541 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo

Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model us- ing a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.

5 0.57572806 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

6 0.48770353 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

7 0.48720968 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

8 0.48007047 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

9 0.47277761 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

10 0.47175506 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

11 0.47018415 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

12 0.46713558 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

13 0.46406606 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

14 0.46250841 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

15 0.46047029 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization

16 0.45356065 156 emnlp-2013-Recurrent Continuous Translation Models

17 0.45307204 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

18 0.45225042 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

19 0.45168653 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

20 0.44881961 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment