acl acl2013 acl2013-77 knowledge-graph by maker-knowledge-mining

77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?


Source: pdf

Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn

Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 uk Alexander Fraser Helmut Schmid Ludwig Maximilian University Munich fraser, schmid@cis. [sent-5, score-0.037]

2 While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. [sent-12, score-0.114]

3 A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. [sent-14, score-0.102]

4 We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. [sent-15, score-0.322]

5 A fundamental drawback is that phrases are translated and reordered independently of each other and contextual information outside of phrasal boundaries is ignored. [sent-19, score-0.091]

6 However, i) often the language model cannot overcome the dispreference of the translation model for nonlocal dependencies, ii) source-side contextual dependencies are still ignored and iii) generation of lexical translations and reordering is separated. [sent-21, score-0.317]

7 The N-gram-based SMT framework addresses these problems by learning Markov chains over sequences of minimal translation units (MTUs) also known as tuples (Mariño et al. [sent-22, score-0.155]

8 , 2006) or over operations coupling lexical generation and reordering (Durrani et al. [sent-23, score-0.207]

9 Because the models condition the MTU probabilities on the previous MTUs, they capture non-local dependencies and both source and target contextual information across phrasal boundaries. [sent-25, score-0.174]

10 In this paper we study the effect of integrating tuple-based N-gram models (TSM) and operation-based N-gram models (OSM) into the phrase-based model in Moses, a state-of-the-art phrase-based system. [sent-26, score-0.158]

11 Rather than using POS-based rewrite rules (Crego and Mariño, 2006) to form a search graph, we use the ability of the phrase-based system to memorize larger translation units to replicate the effect of source linearization as done in the TSM model. [sent-27, score-0.447]

12 We also show that using phrase-based search with MTU N-gram translation models helps to address some of the search problems that are nontrivial to handle when decoding with minimal translation units. [sent-28, score-0.393]

13 An important limitation of the OSM N-gram model is that it does not handle unaligned or discontinuous target MTUs and requires post-processing of the alignment to remove these. [sent-29, score-0.36]

14 Using phrases during search enabled us to make novel changes to the OSM generative story (also applicable to the TSM model) to handle unaligned target words and to use target linearization to deal with discontinuous target MTUs. [sent-30, score-0.578]

15 We performed an extensive evaluation, carrying out translation experiments from French, Spanish, Czech and Russian to English and in the opposite direction. [sent-31, score-0.085]

16 Our integration of the OSM model into Moses and our modification of the OSM model to deal with unaligned and discontinuous target tokens consistently improves BLEU scores over the baseline. [sent-32, score-0.387]

17 Crego and Yvon (2010) modified the phrase-based lexical reordering model of Tillmann (2004) for an N-gram-based system. [sent-40, score-0.231]

18 Niehues et al. (2011) integrated a bilingual language model based on surface word forms and POS tags into a phrase-based system. [sent-42, score-0.142]

19 A drawback of the TSM model is the assumption that source and target information is generated monotonically. [sent-45, score-0.124]

20 The process of reordering is disconnected from lexical generation, which restricts the search to a small set of precomputed reorderings. [sent-46, score-0.201]

21 Durrani et al. (2011) addressed this problem by coupling lexical generation and reordering information into a single generative process and enriching the N-gram models to learn lexical reordering triggers. [sent-48, score-0.349]

22 Durrani et al. (2013) showed that using larger phrasal units during decoding is superior to MTU-based decoding in an N-gram-based system. [sent-50, score-0.187]

23 This paper combines insights from these recent pieces of work and shows that phrase-based search combined with N-gram-based and phrase-based models in decoding is the overall best way to go. [sent-52, score-0.109]

24 We integrate the two N-grambased models, TSM and OSM, into phrase-based Moses and show that the translation quality is improved by taking both translation and reordering context into account. [sent-53, score-0.312]

25 Other approaches explored such models in syntax-based systems, using MTUs for sentence-level reranking (Khalilov and Fonollosa, 2009), in dependency translation models (Quirk and Menezes, 2006) and in target language syntax systems (Vaswani et al. [sent-54, score-0.135]

26 3 Integration of N-gram Models: We now describe our integration of TSM and OSM N-gram models into the phrase-based system. [Figure 1: Example (a) Word Alignments (b) Unfolded MTU Sequence (c) Operation Sequence (d) Step-wise Generation] [sent-56, score-0.034]

27 Given a bilingual sentence pair (F, E) and its alignment (A), we first identify minimal translation units (MTUs) from it. [sent-57, score-0.187]

28 An MTU is defined as a translation rule that cannot be broken down any further. [sent-58, score-0.085]

29 3.1 Tuple Sequence Model (TSM): The TSM translation model assumes that MTUs are generated monotonically. [sent-66, score-0.129]

30 To achieve this effect, we enumerate the MTUs in the target left-to-right order. [sent-67, score-0.05]

31 This process is also called source linearization or tuple unfolding. [sent-68, score-0.124]

32 The resulting sequence of monotonic MTUs is shown in Figure 1(b). [sent-69, score-0.043]
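
The tuple-unfolding step in sentences 30-32 can be made concrete with a short sketch. The code below is ours, not the authors' (the paper publishes no implementation, and names such as extract_mtus are invented): it groups alignment links into MTUs as connected components of the link graph and then orders the MTUs by their first target position.

```python
from collections import defaultdict

def extract_mtus(alignment):
    """Group alignment links into minimal translation units (MTUs):
    connected components of the bipartite source-target link graph."""
    parent = {link: link for link in alignment}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    by_src, by_tgt = defaultdict(list), defaultdict(list)
    for (i, j) in alignment:
        by_src[i].append((i, j))
        by_tgt[j].append((i, j))
    # Links sharing a source or a target position belong to one MTU.
    for links in list(by_src.values()) + list(by_tgt.values()):
        for other in links[1:]:
            union(links[0], other)
    groups = defaultdict(list)
    for link in alignment:
        groups[find(link)].append(link)
    return list(groups.values())

def linearize_source(alignment):
    """Tuple unfolding: order MTUs by their first target position so
    the resulting MTU sequence is monotonic in the target."""
    return sorted(extract_mtus(alignment),
                  key=lambda mtu: min(j for (_, j) in mtu))

# Toy alignment with one crossing link: source word 2 yields target word 0.
links = {(0, 1), (1, 2), (2, 0), (3, 3)}
print([sorted(mtu) for mtu in linearize_source(links)])
# [[(2, 0)], [(0, 1)], [(1, 2)], [(3, 3)]]
```

At decoding time (sentence 38), the same routine would run over the phrase-internal alignments of a hypothesized phrase-pair.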

33 We then define a TSM model over this sequence $(t_1, t_2, \ldots, t_J)$: $p_{tsm}(F,E,A) = \prod_{j=1}^{J} p(t_j \mid t_{j-n+1}, \ldots, t_{j-1})$ [sent-70, score-0.063]

34 A 4-gram Kneser-Ney smoothed language model is trained with SRILM (Stolcke, 2002). [sent-77, score-0.044]
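
Given the linearized tuple sequence, the TSM score is a chain-rule product of n-gram conditionals. The sketch below is ours and assumes the probabilities come from a trained language model (the paper uses a 4-gram Kneser-Ney model from SRILM); the uniform stand-in is only there to make the snippet runnable.

```python
import math

def tsm_log_prob(tuples, ngram_logprob, order=4):
    """log p_tsm(F,E,A) = sum_j log p(t_j | t_{j-n+1} ... t_{j-1}).
    `ngram_logprob(history, token)` is assumed to be backed by a trained
    n-gram LM over tuple tokens (e.g. an SRILM/KenLM query wrapper)."""
    logp = 0.0
    for j, t in enumerate(tuples):
        history = tuple(tuples[max(0, j - order + 1):j])
        logp += ngram_logprob(history, t)
    return logp

# Uniform stand-in model, only to make the sketch executable.
uniform = lambda history, token: math.log(1e-3)
print(tsm_log_prob(["B_x", "A_y", "CD_zw"], uniform))
```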

35 Search: In previous work, the search graph in TSM N-gram SMT was not built dynamically like in the phrase-based system, but instead constructed as a preprocessing step using POS-based rewrite rules (learned when linearizing the source side). [sent-78, score-0.148]

36 We instead use phrase-based search, which builds up the decoding graph dynamically and searches through all possible reorderings within a fixed window. [sent-84, score-0.134]

37 During decoding we use the phrase-internal alignments to perform source linearization. [sent-85, score-0.156]

38 For example, if during decoding we would like to apply the phrase pair “C D H d c”, a combination of t3 and t4 in Figure 1(b), then we extract the MTUs from this phrase-pair and linearize the source to be in the order of the target. [sent-86, score-0.145]

39 The idea is to replicate rewrite rules with phrase-pairs to linearize the source. [sent-88, score-0.125]

40 Previous work on N-gram-based models restricted the length of the rewrite rules to 7 or fewer POS tags. [sent-89, score-0.056]

41 3.2 Operation Sequence Model (OSM): The OSM model represents a bilingual sentence pair and its alignment through a sequence of operations that generate the aligned sentence pair. [sent-92, score-0.148]

42 An operation either generates source and target words or it performs reordering by inserting gaps and jumping forward and backward. [sent-93, score-0.311]

43 The MTUs are generated in the target left-to-right order just as in the TSM model. [sent-94, score-0.074]

44 However, rather than linearizing the source side, reordering operations (gaps and jumps) are used to handle crossing alignments. [sent-95, score-0.251]

45 During training, each bilingual sentence pair is deterministically converted to a unique sequence of operations. [sent-96, score-0.075]

46 2 The example in Figure 1(a) is converted to the sequence of operations shown in Figure 1(c). [sent-97, score-0.096]

47 A step-wise generation of MTUs along with reordering operations is shown in Figure 1(d). [sent-98, score-0.221]
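
A much-simplified version of this deterministic conversion can be sketched as follows. The code is our illustration, restricted to one-to-one alignments; the full algorithm of Durrani et al. (2011) additionally carries gap indexes on Jump Back operations and handles multi-word and discontinuous cepts.

```python
def to_operations(src, tgt, alignment):
    """Deterministically convert a one-to-one aligned sentence pair
    into an operation sequence. Much-simplified sketch: the full
    algorithm of Durrani et al. (2011) also carries gap indexes on
    Jump Back and handles multi-word and discontinuous cepts."""
    a = {j: i for (i, j) in alignment}  # target position -> source position
    ops, pointer, done = [], 0, set()
    for j in range(len(tgt)):
        i = a[j]
        if i < pointer:
            ops.append("JUMP_BACK")          # re-enter a gap on the left
        elif i > pointer:
            skipped = set(range(pointer, i))
            if skipped <= done:
                ops.append("JUMP_FORWARD")   # over already-translated words
            else:
                ops.append("INSERT_GAP")     # leave untranslated words behind
        ops.append(f"GENERATE({src[i]} -> {tgt[j]})")
        done.add(i)
        pointer = i + 1
    return ops

src = ["er", "hat", "das", "buch", "gelesen"]
tgt = ["he", "has", "read", "the", "book"]
links = {(0, 0), (1, 1), (4, 2), (2, 3), (3, 4)}
print("\n".join(to_operations(src, tgt, links)))
```

The resulting operation sequence can then be scored with the same chain-rule product shown above for the TSM, but with a 9-gram model over operations.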

48 We learn a Markov model over a sequence of operations $(o_1, o_2, \ldots, o_J)$ that encapsulate MTUs and reordering information, which is defined as follows: [sent-99, score-0.116]

49 $p_{osm}(F, E, A) = \prod_{j=1}^{J} p(o_j \mid o_{j-n+1}, \ldots, o_{j-1})$ [sent-102, score-0.142]

50 A 9-gram Kneser-Ney smoothed language model is trained with SRILM. [sent-105, score-0.044]

51 By coupling reordering with lexical generation, each (translation or reordering) decision conditions on the n − 1 previous (translation and reordering) decisions, spanning across phrasal boundaries. [sent-106, score-0.251]

52 The reordering decisions therefore influence lexical selection and vice versa. [sent-107, score-0.142]

53 Please refer to Durrani et al. (2011) for a list of operations and the conversion algorithm. [sent-108, score-0.053]

54 A heterogeneous mixture of translation and reordering operations enables the OSM model to memorize reordering patterns and lexicalized triggers, unlike the TSM model, where translation and reordering are modeled separately. [sent-111, score-0.728]

55 Search: We integrated the generative story of the OSM model into the hypothesis extension process of the phrase-based decoder. [sent-112, score-0.061]

56 Each hypothesis maintains the position of the source word covered by the last generated MTU, the right-most source word generated so far, the number of open gaps and their relative indexes, etc. [sent-113, score-0.138]

57 This information is required to generate the operation sequence for the MTUs in the hypothesized phrase-pair. [sent-114, score-0.102]

58 After the operation sequence is generated, we compute its probability given the previous operations. [sent-115, score-0.102]
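
The per-hypothesis bookkeeping described in sentences 56-58 can be pictured as a small state record. The class and field names below are illustrative only, not the actual Moses data structures:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class OSMHypothesisState:
    """Illustrative per-hypothesis state for the OSM feature inside a
    phrase-based decoder (field names are ours, not Moses')."""
    last_mtu_src_pos: int = -1     # source position of the last generated MTU
    rightmost_src_pos: int = -1    # right-most source word generated so far
    open_gaps: List[int] = field(default_factory=list)  # relative gap indexes
    op_history: Tuple[str, ...] = ()  # last n-1 operations: the Markov context

    def extend(self, new_ops, n=9):
        """Append the operations generated for a hypothesized phrase-pair,
        keeping only the n-1 most recent operations as context."""
        history = (self.op_history + tuple(new_ops))[-(n - 1):]
        return OSMHypothesisState(self.last_mtu_src_pos,
                                  self.rightmost_src_pos,
                                  list(self.open_gaps), history)

state = OSMHypothesisState().extend(["GENERATE(er -> he)", "INSERT_GAP"])
print(state.op_history)
```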

59 3.3 Problem: Target Discontinuity and Unaligned Words: Two issues that we have ignored so far are the handling of MTUs which have discontinuous targets, and the handling of unaligned target words. [sent-119, score-0.313]

60 The MTU A → g … a cannot be generated because of the intervening MTUs. [sent-125, score-0.058]

61 In TSM, we handle it by merging all the intervening MTUs to form a bigger unit t′1 in Figure 2(c). [sent-133, score-0.057]

62 Durrani et al. (2011) dealt with this problem by applying a post-processing (PP) heuristic that modifies the alignments to remove such cases. [sent-137, score-0.05]

63 When a source word is aligned to a discontinuous target-cept, first the link to the least frequent target word is identified, and the group of links containing this word is retained while the others are deleted. [sent-138, score-0.225]

64 This allows OSM to extract the intervening MTUs t2 . [sent-140, score-0.034]
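
Sentence 63 describes the PP heuristic precisely enough to sketch it. The code below is our reading of it (the link format and helper names are invented): for a source word with a discontinuous target cept, only the contiguous run of links containing the least frequent target word survives.

```python
from collections import defaultdict

def pp_discontinuous(links, freq):
    """Sketch of the PP heuristic (sentence 63): for each source word
    aligned to a discontinuous target cept, keep only the contiguous
    run of links containing the least frequent target word; delete the
    rest. A link is (src_pos, tgt_pos, tgt_word); `freq` maps target
    words to corpus counts."""
    by_src = defaultdict(list)
    for link in links:
        by_src[link[0]].append(link)
    kept = []
    for group in by_src.values():
        group.sort(key=lambda l: l[1])
        # Split the target positions into contiguous runs.
        runs, run = [], [group[0]]
        for prev, cur in zip(group, group[1:]):
            if cur[1] == prev[1] + 1:
                run.append(cur)
            else:
                runs.append(run)
                run = [cur]
        runs.append(run)
        if len(runs) == 1:               # cept already contiguous: keep all
            kept.extend(group)
            continue
        rarest = min(group, key=lambda l: freq.get(l[2], 0))
        kept.extend(next(r for r in runs if rarest in r))
    return kept

# Source word 0 aligned to target positions 0 and 3 ("g ... a").
links = [(0, 0, "g"), (0, 3, "a"), (1, 1, "x"), (2, 2, "y")]
print(pp_discontinuous(links, {"g": 500, "a": 12, "x": 40, "y": 40}))
# Keeps (0, 3, 'a') and drops (0, 0, 'g'); the contiguous cepts survive.
```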

65 Note that this problem does not exist when dealing with source-side discontinuities: the TSM model linearizes discontinuous source-side MTUs such as C . [sent-144, score-0.165]

66 The second problem is the unaligned target-side MTUs such as ε → f in Figure 2(a). [sent-155, score-0.118]

67 Inserting target-side words “spuriously” during decoding is a non-trivial problem because there is no evidence of when to hypothesize such words. [sent-156, score-0.076]

68 We applied the PP heuristic of Durrani et al. (2011) for both TSM and OSM N-gram models, but found that it lowers the translation quality (see Row 2 in Table 2) in some language pairs. [sent-164, score-0.085]

69 3.4 Solution: Insertion and Linearization: To deal with these problems, we made novel modifications to the generative story of the OSM model. [sent-166, score-0.052]

70 Rather than merging an unaligned target MTU such as ε → f into its right or left MTU, we generate it through a new Generate Target Only (f) operation. [sent-167, score-0.191]

71 Orthogonal to its counterpart Generate Source Only (I) operation (as used for MTU t7 in Figure 2 (c)), this operation is generated as soon as the MTU containing its previous target word is generated. [sent-168, score-0.192]

72 In a sequence of unaligned source and target MTUs, unaligned source MTUs are generated before the unaligned target MTUs. [sent-171, score-0.581]
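
A minimal sketch of this ordering rule (the operation names follow the text; the function itself is ours):

```python
def flush_unaligned(ops, unaligned_src, unaligned_tgt):
    """Emit operations for unaligned tokens as soon as the MTU holding
    the previous target word has been generated: Generate Source Only
    operations come before Generate Target Only ones."""
    ops += [f"GENERATE_SOURCE_ONLY({f})" for f in unaligned_src]
    ops += [f"GENERATE_TARGET_ONLY({e})" for e in unaligned_tgt]
    return ops

print(flush_unaligned(["GENERATE(das -> the)"], ["ja"], ["f"]))
# ['GENERATE(das -> the)', 'GENERATE_SOURCE_ONLY(ja)', 'GENERATE_TARGET_ONLY(f)']
```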

73 We do not modify the decoder to arbitrarily generate unaligned MTUs but hypothesize these only when they appear within an extracted phrase-pair. [sent-172, score-0.118]

74 The constraint provided by the phrase-based search makes the Generate Target Only operation tractable. [sent-173, score-0.092]

75 Using phrase-based search therefore helps to address some of the problems that exist in the decoding framework of N-gram SMT. [sent-174, score-0.203]

76 The remaining problem is the discontinuous target MTUs such as A → g … a. [sent-175, score-0.195]

77 We handle such MTUs with a target linearization. [sent-179, score-0.05]

78 We collapse the target words g and a in the MTU A → g … [sent-184, score-0.05]

79 a to occur consecutively when generating the operation sequence. [sent-187, score-0.059]

80 The conversion algorithm that generates the operations thinks that g and a occurred adjacently. [sent-188, score-0.053]

81 During decoding we use the phrasal alignments to linearize such MTUs within a phrasal unit. [sent-189, score-0.305]

82 This linearization is done only to compute the OSM feature. [sent-190, score-0.064]
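
The collapsing step can be sketched as a reordering of target positions that is visible only to the OSM feature. This helper is our own invention; in the real system all other features still see the original target order:

```python
def collapse_discontinuous_target(tgt_positions_by_mtu, tgt):
    """Reorder target words so that the words of each discontinuous
    target MTU (e.g. g ... a) become adjacent. Used only to build the
    operation sequence for the OSM feature; the output string itself
    keeps its original order."""
    order, seen = [], set()
    for positions in tgt_positions_by_mtu:  # MTUs in target left-to-right order
        for p in sorted(positions):
            if p not in seen:
                order.append(p)
                seen.add(p)
    return [tgt[p] for p in order]

# Toy: MTU 0 covers target positions {0, 3} (A -> g ... a).
print(collapse_discontinuous_target([{0, 3}, {1}, {2}], ["g", "x", "y", "a"]))
# ['g', 'a', 'x', 'y']
```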

83 All other features (e.g., the language model) work with the target string in its original order. [sent-193, score-0.05]

84 Notice again how memorizing larger translation units using phrases helps us reproduce such patterns. [sent-194, score-0.172]

85 This is achieved in the tuple N-gram model by using POS-based split and rewrite rules. [sent-195, score-0.106]

86 4 Evaluation. Corpus: We ran experiments with data made available for the translation task of the Eighth Workshop on Statistical Machine Translation. [sent-196, score-0.085]

87 , 2012), distortion limit of 6, 100-best translation options, minimum Bayes-risk decoding (Kumar and Byrne, 2004), cube-pruning (Huang and Chiang, 2007) and the no-reordering-over-punctuation heuristic. [sent-218, score-0.161]

88 Row 2 (+pp) shows that the post-editing of alignments to remove unaligned and discontinuous target MTUs decreases the performance in the case of ru-en, cs-en and en-fr. [sent-221, score-0.363]

89 Row 3 (+pp+tsm) shows that our integration of the TSM model slightly improves the BLEU scores for en-fr, and es-en. [sent-222, score-0.054]

90 The only result that is lower than the baseline system is that of the ru-en experiment, because OSM is built with PP alignments which particularly hurt the performance for ru-en. [sent-225, score-0.05]

91 Finally Row 5 (+osm*) shows that our modifications to the OSM model (Section 3. [sent-226, score-0.052]

92 The largest gains are obtained in the ru-en translation task (where the PP heuristic inflicted maximum damage). [sent-232, score-0.085]

93 We try to replicate the effect of rewrite and split rules as used in the TSM model through phrasal alignments. [sent-234, score-0.176]

94 We presented a novel extension of the OSM generative story to handle unaligned and discontinuous target MTUs. [sent-235, score-0.36]

95 Phrase-based search helps us to address these problems that are non-trivial to handle in the decoding frameworks of the N-grambased models. [sent-236, score-0.161]

96 Our integration of TSM shows small improvements in a few cases. [sent-238, score-0.034]

97 The OSM model which takes both reordering and lexical context into consideration consistently improves the performance of the baseline system. [sent-239, score-0.162]

98 Although our modifications to the OSM model enable discontinuous MTUs, we did not fully utilize these during decoding, as Moses only uses continuous phrases. [sent-241, score-0.197]

99 The discontinuous MTUs that span beyond a phrasal length of 6 words are therefore never hypothesized. [sent-242, score-0.215]

100 We would like to explore this further by extending the search to use discontinuous phrases (Galley and Manning, 2010). [sent-243, score-0.199]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mtus', 0.582), ('osm', 0.468), ('tsm', 0.343), ('mtu', 0.187), ('discontinuous', 0.145), ('crego', 0.144), ('durrani', 0.144), ('reordering', 0.142), ('unaligned', 0.118), ('mari', 0.097), ('josep', 0.094), ('translation', 0.085), ('jos', 0.079), ('decoding', 0.076), ('phrasal', 0.07), ('phrasebased', 0.069), ('linearization', 0.064), ('operation', 0.059), ('pp', 0.057), ('rewrite', 0.056), ('oj', 0.053), ('operations', 0.053), ('alignments', 0.05), ('target', 0.05), ('nadir', 0.047), ('moses', 0.045), ('sequence', 0.043), ('units', 0.041), ('gispert', 0.041), ('fonollosa', 0.041), ('koehn', 0.04), ('coupling', 0.039), ('yj', 0.039), ('memorize', 0.039), ('linearize', 0.039), ('helmut', 0.038), ('tj', 0.038), ('smt', 0.038), ('fraser', 0.037), ('yvon', 0.037), ('philipp', 0.036), ('ncode', 0.035), ('row', 0.035), ('intervening', 0.034), ('integration', 0.034), ('search', 0.033), ('modifications', 0.032), ('bilingual', 0.032), ('deutsche', 0.031), ('forschungsgemeinschaft', 0.031), ('hasler', 0.031), ('source', 0.03), ('tuple', 0.03), ('gaps', 0.03), ('replicate', 0.03), ('cois', 0.029), ('adri', 0.029), ('discontinuities', 0.029), ('khalilov', 0.029), ('linearizing', 0.029), ('minimal', 0.029), ('schmid', 0.028), ('fran', 0.027), ('niehues', 0.027), ('haddow', 0.027), ('bleu', 0.027), ('handle', 0.027), ('generation', 0.026), ('vaswani', 0.026), ('helps', 0.025), ('srilm', 0.025), ('barry', 0.025), ('reorderings', 0.025), ('smoothed', 0.024), ('statistical', 0.024), ('generated', 0.024), ('kenlm', 0.024), ('interpolated', 0.024), ('georgia', 0.024), ('dependencies', 0.024), ('merging', 0.023), ('denver', 0.022), ('quirk', 0.022), ('schwenk', 0.022), ('hieu', 0.022), ('edinburgh', 0.022), ('alexander', 0.021), ('marta', 0.021), ('integrated', 0.021), ('phrases', 0.021), ('association', 0.021), ('hoang', 0.02), ('model', 0.02), ('atlanta', 0.02), ('usa', 0.02), ('story', 0.02), ('inf', 0.02), ('technologies', 0.019), ('markov', 0.019), ('batch', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn

Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.

2 0.15738101 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney

Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.

3 0.14836884 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan

Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.

4 0.12272105 166 acl-2013-Generalized Reordering Rules for Improved SMT

Author: Fei Huang ; Cezar Pendus

Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighted sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs judged by both humans and automatic metric.

5 0.11892907 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

Author: Yang Feng ; Trevor Cohn

Abstract: Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phraseinternal translation and reordering. However phrase-based approaches are much less able to model sentence level effects between different phrase-pairs. We propose a new model to address this imbalance, based on a word-based Markov model of translation which generates target translations left-to-right. Our model encodes word and phrase level phenomena by conditioning translation decisions on previous decisions and uses a hierarchical Pitman-Yor Process prior to provide dynamic adaptive smoothing. This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are non-consecutive in the source. Our experiments on Chinese to English and Arabic to English translation show consistent improvements over competitive baselines, of up to +3.4 BLEU.

6 0.11156893 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

7 0.096679114 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

8 0.09380696 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

9 0.090146676 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

10 0.088139988 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

11 0.075712331 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

12 0.073922612 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

13 0.066553213 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

14 0.06629426 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

15 0.06509003 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

16 0.062013894 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

17 0.059642579 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

18 0.059294481 314 acl-2013-Semantic Roles for String to Tree Machine Translation

19 0.058894709 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

20 0.057299756 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.136), (1, -0.121), (2, 0.124), (3, 0.086), (4, -0.032), (5, 0.045), (6, 0.003), (7, 0.001), (8, -0.007), (9, 0.04), (10, 0.007), (11, 0.015), (12, -0.003), (13, 0.014), (14, 0.049), (15, 0.046), (16, 0.082), (17, 0.027), (18, 0.035), (19, -0.004), (20, -0.09), (21, -0.024), (22, 0.013), (23, -0.084), (24, 0.041), (25, 0.012), (26, -0.014), (27, -0.072), (28, -0.129), (29, -0.05), (30, -0.014), (31, -0.005), (32, -0.013), (33, 0.02), (34, -0.011), (35, 0.009), (36, 0.008), (37, -0.017), (38, -0.027), (39, 0.026), (40, 0.012), (41, -0.095), (42, -0.006), (43, -0.047), (44, -0.026), (45, 0.007), (46, -0.005), (47, 0.046), (48, -0.018), (49, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91586274 166 acl-2013-Generalized Reordering Rules for Improved SMT

Author: Fei Huang ; Cezar Pendus

Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighted sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs judged by both humans and automatic metric.

2 0.91114497 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan

Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.

same-paper 3 0.88711548 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn

Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.

4 0.88033164 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

Author: ThuyLinh Nguyen ; Stephan Vogel

Abstract: Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. We propose an extension of Hiero called Phrasal-Hiero to address Hiero’s second problem. Phrasal-Hiero still has the same hypothesis space as the original Hiero but incorporates a phrase-based distance cost feature and lexicalized reordering features into the chart decoder. The work consists of two parts: 1) for each Hiero translation derivation, find its corresponding discontinuous phrase-based path. 2) Extend the chart decoder to incorporate features from the phrase-based path. We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation.

5 0.84838855 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney

Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.

6 0.83218986 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

7 0.82302701 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation

8 0.59029186 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

9 0.58086717 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

10 0.57795203 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

11 0.57372671 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

12 0.54065818 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

13 0.51726192 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

14 0.50701284 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

15 0.49940941 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

16 0.49034265 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment

17 0.48910353 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

18 0.4841105 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

19 0.47479588 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

20 0.46374795 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.055), (6, 0.058), (11, 0.036), (14, 0.014), (24, 0.033), (25, 0.232), (26, 0.049), (35, 0.079), (40, 0.01), (42, 0.13), (48, 0.028), (70, 0.035), (90, 0.069), (95, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80763447 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

Author: Nadir Durrani ; Alexander Fraser ; Helmut Schmid ; Hieu Hoang ; Philipp Koehn

Abstract: The phrase-based and N-gram-based SMT frameworks complement each other. While the former is better able to memorize, the latter provides a more principled model that captures dependencies across phrasal boundaries. Some work has been done to combine insights from these two frameworks. A recent successful attempt showed the advantage of using phrase-based search on top of an N-gram-based model. We probe this question in the reverse direction by investigating whether integrating N-gram-based translation and reordering models into a phrase-based decoder helps overcome the problematic phrasal independence assumption. A large scale evaluation over 8 language pairs shows that performance does significantly improve.

2 0.70429486 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

Author: Matt Post ; Shane Bergsma

Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.

3 0.69685876 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

Author: Adrien Barbaresi

Abstract: We present a way to extract links from messages published on microblogging platforms and we classify them according to the language and possible relevance of their target in order to build a text corpus. Three platforms are taken into consideration: FriendFeed, identi.ca and Reddit, as they account for a relative diversity of user profiles and more importantly user languages. In order to explore them, we introduce a traversal algorithm based on user pages. As we target lesser-known languages, we try to focus on non-English posts by filtering out English text. Using mature open-source software from the NLP research field, a spell checker (aspell) and a language identification system (langid.py), our case study and our benchmarks give an insight into the linguistic structure of the considered services.

4 0.67565745 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations

Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Motoki Sano ; Stijn De Saeger ; Kiyonori Ohtake

Abstract: In this paper, we explore the utility of intra- and inter-sentential causal relations between terms or clauses as evidence for answering why-questions. To the best of our knowledge, this is the first work that uses both intra- and inter-sentential causal relations for why-QA. We also propose a method for assessing the appropriateness of causal relations as answers to a given question using the semantic orientation of excitation proposed by Hashimoto et al. (2012). By applying these ideas to Japanese why-QA, we improved precision by 4.4% against all the questions in our test set over the current state-of-the-art system for Japanese why-QA. In addition, unlike the state-of-the-art system, our system could achieve very high precision (83.2%) for 25% of all the questions in the test set by restricting its output to the confident answers only.

5 0.65409446 166 acl-2013-Generalized Reordering Rules for Improved SMT

Author: Fei Huang ; Cezar Pendus

Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighted sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs judged by both humans and automatic metric.

6 0.64075619 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

7 0.63487089 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

8 0.63097155 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

9 0.61995691 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

10 0.61840272 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

11 0.61783355 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

12 0.61651707 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

13 0.61545742 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

14 0.61204034 328 acl-2013-Stacking for Statistical Machine Translation

15 0.61166555 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

16 0.61092401 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

17 0.61031342 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

18 0.60972494 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

19 0.60906249 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

20 0.60738128 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk