acl acl2013 acl2013-201 knowledge-graph by maker-knowledge-mining

201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding


Source: pdf

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge the TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves an overall 3.48 BLEU points improvement and a 2.62 TER points reduction in comparison with the pure SMT system. Besides, the proposed models also outperform previous approaches significantly.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Unlike previous multi-stage pipeline approaches, which directly merge TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. [sent-2, score-0.257]

2 On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0. [sent-3, score-0.48]

3 1 Introduction Statistical machine translation (SMT), especially the phrase-based model (Koehn et al. [sent-10, score-0.116]

4 However, SMT is rarely applied to professional translation because its output quality is still far from satisfactory. [sent-14, score-0.149]

5 In contrast, translation memory (TM), which uses the most similar translation sentence (usually above a certain fuzzy match threshold) in the database as the reference for post-editing, has [sent-17, score-0.666]

6 been widely adopted in the professional translation field for many years (Lagoudaki, 2006). [sent-21, score-0.223]

7 TM is very useful for repetitive material such as updated product manuals, and can give high-quality and consistent translations when the fuzzy match similarity is high. [sent-22, score-0.334]

8 However, high-similarity fuzzy matches are not available unless the material is very repetitive. [sent-24, score-0.257]

9 Since TM and SMT complement each other in those matched and unmatched segments, the output quality is expected to be raised significantly if they can be combined to supplement each other. [sent-28, score-0.232]

10 In recent years, some previous works have incorporated TM matched segments into SMT in a pipelined manner (Koehn and Senellart, 2010; Zhechev and van Genabith, 2010; He et al. [sent-29, score-0.169]

11 Most of them use fuzzy match score as the threshold, but He et al. [sent-34, score-0.368]

12 Afterwards, they merge the relevant translations of matched segments into the source sentence, and then force the SMT system to only translate those unmatched segments at decoding. [sent-37, score-0.386]

13 Firstly, all of them determine whether those matched segments1 ... (footnote 1: we mean "sub-sentential segments" in this work). [sent-39, score-0.146]

14 Secondly, as several TM target phrases might be available for one given TM source phrase due to insertions, the incorrect selection made in the merging stage cannot be remedied in the following translation stage. [sent-45, score-0.461]

15 For example, there are six possible corresponding TM target phrases for the given TM source phrase "关联4 的5 对象6" (as shown in Figure 1), such as "object2 that3 is4 associated5" and "an1 object2 that3 is4 associated5 with6", etc. [sent-46, score-0.386]

16 Thirdly, the pipeline approach does not utilize the SMT probabilistic information in deciding whether a matched TM phrase should be adopted or not, and which target phrase should be selected when we have multiple candidates. [sent-48, score-0.569]

17 However, since only one aligned target phrase will be added for each matched source phrase, they share most drawbacks with the pipeline approaches mentioned above and merely achieve similar performance. [sent-51, score-0.482]

18 To avoid the drawbacks of the pipeline approach (mainly due to making a hard decision before decoding), we propose several integrated models to make full use of TM information during decoding. [sent-52, score-0.179]

19 For each TM source phrase, we keep all its possible corresponding target phrases (instead of keeping only one of them). [sent-53, score-0.257]

20 The integrated models then consider all corresponding TM target phrases and SMT preference during decoding. [sent-54, score-0.257]

21 Therefore, the proposed integrated models combine SMT and TM at a deep level (versus the surface level at which TM result is directly plugged in under previous pipeline approaches). [sent-55, score-0.146]

22 On a Chinese–English TM database of computer technical documents, our experiments have shown that the proposed Model-III improves the translation quality significantly over either the pure phrase-based SMT or the TM systems when the fuzzy match score is above 0.4. [sent-56, score-0.547]

23 Compared with the pure SMT system, the proposed integrated Model-III achieves 3.48 BLEU points improvement and 2.62 TER points reduction overall. [sent-58, score-0.117]

24 denotes the word alignment between and Let and denote the k-th associated source phrase and target phrase, respectively. [sent-65, score-0.314]

25 Also, and denote the associated source phrase sequence and the target phrase sequence, respectively (total phrases without insertion). [sent-66, score-0.498]

26 Then the above formula (1) can be decomposed as in Equation (2). Afterwards, for any given source phrase we can find its corresponding TM source phrase and all possible TM target phrases with the help of the corresponding editing operations and word alignment. [sent-67, score-0.639]

27 As mentioned above, we can have six different possible TM target phrases for the TM source phrase “关联 4 的 5 对象 6”. [sent-68, score-0.345]

28 This is because there are insertions around the directly aligned TM target phrase. [sent-69, score-0.118]

29 In the above Equation (2), we first segment the given source sentence into various phrases, and then translate the sentence based on those source phrases. [sent-70, score-0.238]

30 In the second line of Equation (3), we convert the fuzzy match score into its corresponding interval and incorporate all possible combinations of TM target phrases. [sent-76, score-0.572]

31 Last, in the fourth line, we introduce the source matching status and the target linking status (detailed features are defined later). [sent-78, score-0.557]

32 Since we might have several possible TM target phrases, the one with the maximum score will be adopted during decoding. [sent-79, score-0.241]

33 All features incorporated in this model are specified as follows. TM Fuzzy Match Interval (z): The fuzzy match score (FMS) between the source sentence and the TM source sentence indicates the reliability of the given TM sentence, and is defined following (Sikes, 2007). [sent-86, score-0.67]

34 The word-based Levenshtein distance (Levenshtein, 1966) between the input source sentence and the TM source sentence is used, and we equally divide FMS into ten fuzzy match intervals: [0.0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]. [sent-89, score-0.377]
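
To make the feature concrete, here is a small Python sketch of the usual word-level definition, FMS = 1 - LevenshteinDistance / max(sentence lengths), together with the bucketing into ten equal-width intervals; the exact normalization used in the paper is not spelled out in this summary, so treat this as an assumed, illustrative version.

# Assumed definition (standard in TM work): FMS = 1 - LD(f, f_tm) / max(|f|, |f_tm|),
# computed over words, then bucketed into one of ten equal-width intervals.
def word_levenshtein(a, b):
    # word-based Levenshtein distance between token lists a and b
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match_score(src_tokens, tm_src_tokens):
    # FMS between the input sentence and the TM source sentence
    denom = max(len(src_tokens), len(tm_src_tokens))
    return 1.0 if denom == 0 else 1.0 - word_levenshtein(src_tokens, tm_src_tokens) / denom

def fms_interval(fms):
    # map an FMS value to one of ten equal-width intervals, e.g. 0.667 -> (0.6, 0.7)
    lo = min(int(fms * 10), 9) / 10.0
    return (lo, lo + 0.1)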

35 For example, the fuzzy match score between the input sentence and the TM source sentence in Figure 1 is 0.667. [sent-95, score-0.368]

36 Target Phrase Content Matching Status (TCM): It indicates the content matching status between the two target phrases and reflects the quality of the candidate; because the TM translation is nearly perfect when FMS is high, a high similarity implies that the given candidate is possibly a good one. [sent-96, score-0.23]

37 It is a member of {Same, High, Low, NA (Not-Applicable)}, and is specified as: (1) if it is not null: (a) if ...; (b) else if ...; (c) else ...; (2) if it is null, NA. Here, null means that either there is no corresponding TM source phrase or there is no corresponding TM target phrase. [sent-97, score-0.726]

38 If the candidate is "object2 that3 is4 associated5", the status takes one value; if it is "an1 object2 that3 is4 associated5", it takes another. Source Phrase Content Matching Status (SCM): It indicates the content matching status between the given source phrase and the corresponding TM source phrase. [sent-100, score-0.207]

39 Number of Linking Neighbors (NLN): Usually, the context of a source phrase affects its target translation. [sent-104, score-0.29]

40 Therefore, this NLN feature reflects the number of matched neighbors (words), and it is a vector of two counts. [sent-106, score-0.163]

41 Where “x” denotes the number of matched source neighbors; and “y” denotes how many those neighbors are also linked to target words (not null), which also affects the TM target phrase selection. [sent-107, score-0.508]

42 For the source phrase "关联 5 的 6 对象 7" in Figure 1, the corresponding TM source phrase is "关联 4 的 5 对象 6". [sent-109, score-0.465]

43 Source Phrase Length (SPL): Usually, the longer the source phrase is, the more reliable the TM target phrase is. [sent-112, score-0.419]

44 For example, the corresponding TM target phrase for a source phrase with 5 words would be more reliable than that for a phrase with only one word. [sent-113, score-0.253]

45 Sentence End Punctuation Indicator (SEP): It indicates whether the current phrase is a punctuation mark at the end of the sentence, and is a member of {Yes, No}. [sent-116, score-0.209]
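
An illustrative sketch of these two features, assuming SPL is quantized into the levels 1, 2, 3, 4 and 5-or-more (consistent with the later note that five levels are used and most phrases have fewer than 5 words) and SEP simply tests whether the phrase is the sentence-final punctuation token; both are assumptions, not the paper's exact definitions.

def spl_level(phrase_len):
    # assumed quantization: lengths 1, 2, 3, 4 and 5-or-more map to five levels
    return min(phrase_len, 5)

def sep_indicator(phrase_tokens, sentence_tokens):
    # 'Yes' if the phrase is the sentence-final token and that token is a punctuation mark
    # (simplified: compares the tokens rather than their exact positions)
    end_punct = {".", "!", "?", "。", "！", "？"}
    is_final = (len(phrase_tokens) == 1 and len(sentence_tokens) > 0
                and phrase_tokens[0] == sentence_tokens[-1])
    return "Yes" if is_final and phrase_tokens[0] in end_punct else "No"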

46 For “关联 4 5 对象 6” in Figure 1, the linked TM target phrase is “object2 that3 is4 associated5”, and there are 5 other candidates by extending to both sides. [sent-124, score-0.259]

47 Longest TM Candidate Indicator (LTC): It indicates whether the given candidate is the longest one or not, and is a member of {Original, Left-Longest, Right-Longest, Both-Longest, Medium, NA}. [sent-126, score-0.159]

48 The new feature adopted in Model-III is the Target Phrase Adjacent Candidate Relative Position Matching Status (CPM): it indicates the matching status between the relative position of the current TM target candidate and the relative position of its adjacent candidate. [sent-132, score-0.281]

49 It checks whether they are positioned in the same order and reflects the quality of ordering the given target candidate. [sent-133, score-0.132]

50 Recall that one is always right-adjacent to the other; the various cases are then defined as follows: (1) If both are not null: (a) if the former is on the right of the latter and they are also adjacent to each other: i. [sent-135, score-0.17]

51 Otherwise, ...; (b) if it is on the right of the other but they are not adjacent to each other, ...; (c) if it is not on the right of the other: i. [sent-137, score-0.121]

52 Otherwise, ...; (2) if one is null but the other is not, then find the first one which is not null (starting from 2): (a) if it is on the right, ...; (b) if it is not on the right, .... [sent-139, score-0.23]

53 For “object2 that3 is4 associated5”, because is on the right of and they are adjacent pair, and both boundary words (“an” and “an1”; “object” and “object2”) are matched, ; for “an1 object2 that3 is4 associated5”, because there are cross parts “an1” between and , . [sent-144, score-0.15]

54 1 Experimental Setup: Our TM database consists of Chinese–English translation sentence pairs in the computer domain, about 267k sentence pairs in total. [sent-151, score-0.142]

55 Furthermore, the development set and the test set are divided into various intervals according to their best fuzzy match scores. [sent-157, score-0.377]

56 The maximum phrase length is set to 7 in our experiments. [sent-166, score-0.129]

57 In this work, the translation performance is measured with case-insensitive BLEU-4 score (Papineni et al. [sent-167, score-0.15]

58 2 Cross-Fold Translation: To estimate the probabilities of the proposed models, the corresponding phrase segmentations for bilingual sentences are required. [sent-172, score-0.199]

59 Since we want to reflect what actually happens during decoding in a realistic setting, cross-fold translation is used to obtain the corresponding phrase segmentations. [sent-173, score-0.338]

60 Afterwards, we generate the corresponding phrase segmentations for the remaining 5% of bilingual sentences. (Footnote 3: "grow-diag-final" and "grow-diag-final-and" are also tested; [sent-175, score-0.199]

61 however, "intersection" is the best option in our experiments, especially for those high fuzzy match intervals.) [sent-176, score-0.334]

62 , 2010), which searches the best phrase segmentation for the specified output. [sent-214, score-0.161]

63 Having repeated the above steps 20 times4, we obtain the corresponding phrase segmentations for the SMT training data (which will then be used to train the integrated models). [sent-215, score-0.282]
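
A sketch of the cross-fold procedure described above: 20 folds, each time training on roughly 95% of the bitext and force-decoding the held-out 5% to recover its phrase segmentation; train_smt and forced_decode stand in for the actual toolkit calls and are hypothetical names.

def cross_fold_segmentations(bitext, train_smt, forced_decode, n_folds=20):
    # bitext: list of (source, target) sentence pairs
    # train_smt / forced_decode: hypothetical stand-ins for the actual toolkit calls
    fold_size = len(bitext) // n_folds
    segmentations = []
    for k in range(n_folds):
        held_out = bitext[k * fold_size:(k + 1) * fold_size]
        train = bitext[:k * fold_size] + bitext[(k + 1) * fold_size:]
        model = train_smt(train)                    # train on roughly 95% of the data
        for src, tgt in held_out:                   # forced decoding on the held-out 5%
            segmentations.append(forced_decode(model, src, tgt))
    return segmentations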

64 7% of the training bilingual sentences can generate the corresponding target results. [sent-218, score-0.119]

65 Furthermore, more than 90% of the obtained source phrases are observed to be less than 5 words long, which explains why five different quantization levels are adopted for Source Phrase Length (SPL) in Section 3. [sent-221, score-0.212]

66 Afterwards, we incorporate the TM information for each phrase at decoding. [sent-228, score-0.129]

67 Tables 3 and 4 give the translation results of TM, SMT, and the three integrated models on the test set. [sent-237, score-0.199]

68 In the tables, the best translation results (either in BLEU or TER) at each interval have been marked in bold. [sent-238, score-0.239]

69 It can be seen that TM significantly exceeds SMT at the interval [0. [sent-241, score-0.159]

70 Compared with TM and SMT, Model-I is significantly better than the SMT system in either BLEU or TER when the fuzzy match score is above 0.7. [sent-244, score-0.397]

71 Model-II significantly outperforms both the TM and the SMT systems in either BLEU or TER when the fuzzy match score is above 0.5. [sent-245, score-0.397]

72 Model-III significantly exceeds both the TM and the SMT systems in either BLEU or TER when the fuzzy match score is above 0.4. [sent-246, score-0.442]

73 However, the improvements from the integrated models become smaller as the fuzzy match score decreases. [sent-249, score-0.451]

74 This is because a lower fuzzy match score means that there are more unmatched parts between the input sentence and the TM source sentence; the output of TM is thus less reliable. [sent-257, score-0.535]

75 If intervals are evaluated separately, when the fuzzy match score is above 0. [sent-261, score-0.411]

76–80 Example outputs from the paper's comparison table (Koehn-10, Ma-11, Model-I, Model-II, Model-III) for the sentence "if you disable this policy setting , internet explorer does not check the internet for new versions of the browser , so does not prompt users to install them ." The bracketed annotations record missed target words (7 words: 9~12, 20~21, 28 for some systems; 2 words: 14, 28 for others), two spurious inserted target words, and wrong permutations.

81 (2010) first find out the unmatched parts between the given source sentence and TM source sentence. [sent-283, score-0.34]

82 Afterwards, for each unmatched phrase in the TM source sentence, they replace its corresponding translation in the TM target sentence by the corresponding source phrase in the input sentence, and then mark the substitution part. [sent-284, score-0.843]

83 After replacing the corresponding translations of all unmatched source phrases in the TM target sentence, an XML input sentence (with mixed TM target phrases and marked input source phrases) is thus obtained. [sent-285, score-0.654]

84 The SMT decoder then only translates the unmatched/marked source phrases and gets the desired results. [sent-286, score-0.164]
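
A rough sketch of the input construction described in the last few sentences: start from the TM target sentence, substitute the input source phrase for the translation of each unmatched TM source phrase, and mark the substituted spans so the decoder translates only those; the tag format below is illustrative only, not the exact markup used by Koehn-10 or by any particular decoder.

def build_marked_input(tm_target_tokens, substitutions):
    # substitutions: list of ((i, j), source_phrase_tokens), meaning tm_target_tokens[i:j] is the
    # translation of an unmatched TM source phrase and is replaced by the input source phrase.
    # The <tr>...</tr> marker is illustrative, not the paper's actual markup.
    out, pos = [], 0
    for (i, j), src_phrase in sorted(substitutions):
        out.extend(tm_target_tokens[pos:i])                    # keep matched TM target words as-is
        out.append("<tr> " + " ".join(src_phrase) + " </tr>")  # mark the span still to be translated
        pos = j
    out.extend(tm_target_tokens[pos:])
    return " ".join(out)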

85 Therefore, the inserted parts in the TM target sentence are automatically included. [sent-287, score-0.145]

86 They use fuzzy match score to determine whether the current sentence should be marked or not; and their experiments show that this method is only effective when the fuzzy match score is above 0. [sent-288, score-0.81]

87 (2011) argue that the fuzzy match score is not reliable and use a discriminative learning method to decide whether the current sentence should be marked or not. [sent-291, score-0.442]

88 In constructing the XML input sentence, Ma-11 replaces each matched source phrase in the given source sentence with the corresponding TM target phrase. [sent-293, score-0.546]

89 Therefore, the inserted parts in the TM target sentence are not included. [sent-294, score-0.145]

90 More importantly, the proposed models achieve a much better TER score than the TM system does at interval [0. [sent-312, score-0.119]

91 5 Conclusion and Future Work: Unlike the previous pipeline approaches, which directly merge TM phrases into the final translation result, we integrate the TM information of each source phrase into phrase-based SMT during decoding. [sent-326, score-0.446]

92 In addition, all possible TM target phrases are kept, and the proposed models select the best one during decoding by referring to SMT information. [sent-327, score-0.161]

93 Model-III is significantly better (p < 0.05) in either BLEU or TER when the fuzzy match score is above 0.4. [sent-330, score-0.368]

94 However, the TM is expected to play an even more important role when the SMT training-set differs from the TM database, as additional phrase-pairs that are unseen in the SMT phrase table can be extracted from TM (which can then be dynamically added into the SMT phrase table at decoding time). [sent-339, score-0.286]
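
A simplified sketch of how such extra phrase pairs could be harvested from the matched TM sentence pair: enumerate alignment-consistent source spans (standard phrase extraction, without the unaligned-boundary extensions) and keep the pairs not already present in the SMT phrase table; phrase_table is assumed here to be a set of (source, target) tuples.

def extract_new_phrase_pairs(tm_src, tm_tgt, alignment, phrase_table, max_len=7):
    # alignment: set of (src_idx, tgt_idx) links between the TM source and TM target sentences
    new_pairs = set()
    for i in range(len(tm_src)):
        for j in range(i, min(i + max_len, len(tm_src))):
            tgt_idx = [t for (s, t) in alignment if i <= s <= j]
            if not tgt_idx:
                continue
            lo, hi = min(tgt_idx), max(tgt_idx)
            # consistency: no target word inside [lo, hi] may align outside [i, j]
            if any(lo <= t <= hi and not (i <= s <= j) for (s, t) in alignment):
                continue
            pair = (tuple(tm_src[i:j + 1]), tuple(tm_tgt[lo:hi + 1]))
            if pair not in phrase_table:
                new_pairs.add(pair)
    return new_pairs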

95 In addition, more source phrases can be matched if a set of high-FMS sentences, instead of only the sentence with the highest FMS, can be extracted and referred to at the same time. [sent-341, score-0.27]

96 Dynamic translation memory: using statistical machine translation to improve translation memory fuzzy matches. [sent-363, score-0.643]

97 Translation memories survey 2006: Users' perceptions around TM use. [sent-404, score-0.706]

98 Consistent translation using discriminative learning: a translation memory-inspired approach. [sent-419, score-0.232]

99 Cunei: open-source machine translation with relevance-based models of each translation instance. [sent-441, score-0.232]

100 Seeding statistical machine translation with translation memory output through tree-based structural alignment. [sent-474, score-0.27]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tm', 0.706), ('smt', 0.274), ('fuzzy', 0.257), ('status', 0.16), ('phrase', 0.129), ('translation', 0.116), ('unmatched', 0.107), ('ter', 0.1), ('matched', 0.096), ('interval', 0.085), ('source', 0.083), ('integrated', 0.083), ('disable', 0.082), ('fms', 0.082), ('spl', 0.082), ('null', 0.079), ('target', 0.078), ('match', 0.077), ('bleu', 0.074), ('adopted', 0.074), ('sep', 0.072), ('ma', 0.072), ('longest', 0.071), ('afterwards', 0.067), ('explorer', 0.067), ('install', 0.067), ('internet', 0.066), ('cpm', 0.065), ('nln', 0.065), ('pipeline', 0.063), ('member', 0.057), ('koehn', 0.055), ('browser', 0.055), ('prompt', 0.055), ('phrases', 0.055), ('css', 0.053), ('tcm', 0.053), ('segments', 0.05), ('yifan', 0.05), ('ltc', 0.05), ('policy', 0.05), ('scm', 0.049), ('adjacent', 0.049), ('yanjun', 0.048), ('matching', 0.047), ('exceeds', 0.045), ('neighbors', 0.044), ('intervals', 0.043), ('na', 0.042), ('corresponding', 0.041), ('josef', 0.041), ('insertions', 0.04), ('memory', 0.038), ('marked', 0.038), ('sentence', 0.036), ('right', 0.036), ('xml', 0.035), ('wrong', 0.034), ('pure', 0.034), ('miss', 0.034), ('score', 0.034), ('boundary', 0.034), ('professional', 0.033), ('drawbacks', 0.033), ('ici', 0.033), ('senellart', 0.033), ('therwise', 0.033), ('xmlmarkup', 0.033), ('zhechev', 0.033), ('specified', 0.032), ('parts', 0.031), ('andy', 0.031), ('candidate', 0.031), ('candidates', 0.029), ('points', 0.029), ('significantly', 0.029), ('means', 0.029), ('nagao', 0.029), ('auli', 0.029), ('linking', 0.029), ('segmentations', 0.029), ('else', 0.028), ('decoding', 0.028), ('object', 0.028), ('witten', 0.027), ('ebmt', 0.027), ('database', 0.026), ('gets', 0.026), ('phillips', 0.025), ('och', 0.025), ('check', 0.024), ('wisniewski', 0.024), ('philipp', 0.024), ('associated', 0.024), ('users', 0.024), ('extending', 0.023), ('van', 0.023), ('reflects', 0.023), ('reordering', 0.023), ('punctuation', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge the TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves an overall 3.48 BLEU points improvement and a 2.62 TER points reduction in comparison with the pure SMT system. Besides, the proposed models also outperform previous approaches significantly.

2 0.18153308 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa

Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also describe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation.

3 0.17636432 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

Author: Jiajun Zhang ; Chengqing Zong

Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1

4 0.12964983 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric

Author: Chi-kiu Lo ; Karteek Addanki ; Markus Saers ; Dekai Wu

Abstract: We present the first ever results showing that tuning a machine translation system against a semantic frame based objective function, MEANT, produces more robustly adequate translations than tuning against BLEU or TER as measured across commonly used metrics and human subjective evaluation. Moreover, for informal web forum data, human evaluators preferred MEANT-tuned systems over BLEU- or TER-tuned systems by a significantly wider margin than that for formal newswire, even though automatic semantic parsing might be expected to fare worse on informal language. We argue that by preserving the meaning of the translations as captured by semantic frames right in the training process, an MT system is constrained to make more accurate choices of both lexical and reordering rules. As a result, MT systems tuned against semantic frame based MT evaluation metrics produce output that is more adequate. Tuning a machine translation system against a semantic frame based objective function is independent of the translation model paradigm, so any translation model can benefit from the semantic knowledge incorporated to improve translation adequacy through our approach.

5 0.12124144 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao

Abstract: Typical statistical machine translation systems are batch trained with a given training data and their performances are largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time when new data comes. In face of the problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammars for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least as comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and while computational overhead is significantly reduced by training them in parallel.

6 0.12043594 235 acl-2013-Machine Translation Detection from Monolingual Web-Text

7 0.11334427 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

8 0.11273571 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

9 0.11122952 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text

10 0.11119638 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language

11 0.10904408 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

12 0.10888942 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

13 0.1081409 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

14 0.10657079 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

15 0.10653789 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

16 0.10564272 289 acl-2013-QuEst - A translation quality estimation framework

17 0.099811308 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

18 0.096466117 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

19 0.096431807 328 acl-2013-Stacking for Statistical Machine Translation

20 0.094457023 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.202), (1, -0.141), (2, 0.175), (3, 0.099), (4, 0.028), (5, 0.034), (6, -0.006), (7, 0.008), (8, 0.016), (9, 0.052), (10, -0.034), (11, 0.042), (12, -0.022), (13, 0.076), (14, -0.006), (15, 0.025), (16, -0.018), (17, -0.016), (18, -0.012), (19, 0.034), (20, 0.052), (21, -0.031), (22, 0.001), (23, -0.008), (24, 0.028), (25, 0.005), (26, 0.02), (27, 0.023), (28, -0.025), (29, 0.032), (30, 0.026), (31, 0.015), (32, -0.003), (33, -0.014), (34, 0.038), (35, 0.03), (36, 0.035), (37, -0.094), (38, -0.062), (39, -0.024), (40, 0.034), (41, -0.029), (42, -0.027), (43, 0.003), (44, -0.061), (45, 0.031), (46, -0.07), (47, 0.029), (48, -0.063), (49, -0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96336377 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge the TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves an overall 3.48 BLEU points improvement and a 2.62 TER points reduction in comparison with the pure SMT system. Besides, the proposed models also outperform previous approaches significantly.

2 0.81067514 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

Author: Lei Cui ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou

Abstract: The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter low-quality data, but a fair amount of human labeled examples are needed which are not easy to obtain. To reduce the reliance on labeled examples, we propose an unsupervised method to clean bilingual data. The method leverages the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. End-to-end experiments show that the proposed method substantially improves the performance in large-scale Chinese-to-English translation tasks.

3 0.80697501 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language

Author: Samira Tofighi Zahabi ; Somayeh Bakhshaei ; Shahram Khadivi

Abstract: Mapping phrases between languages as translation of each other by using an intermediate language (pivot language) may generate translation pairs that are wrong. Since a word or a phrase has different meanings in different contexts, we should map source and target phrases in an intelligent way. We propose a pruning method based on the context vectors to remove those phrase pairs that connect to each other by a polysemous pivot phrase or by weak translations. We use context vectors to implicitly disambiguate the phrase senses and to recognize irrelevant phrase translation pairs. Using the proposed method a relative improvement of 2.8 percent in terms of BLEU score is achieved. 1

4 0.80549431 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

Author: Jiajun Zhang ; Chengqing Zong

Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1

5 0.80530918 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao

Abstract: Typical statistical machine translation systems are batch trained with a given training data and their performances are largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time when new data comes. In face of the problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammars for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least as comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and while computational overhead is significantly reduced by training them in parallel.

6 0.7701602 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

7 0.76858425 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain

8 0.7605837 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

9 0.75814837 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

10 0.75531679 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

11 0.75504231 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

12 0.72542197 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

13 0.72522521 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

14 0.72253931 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

15 0.71584904 64 acl-2013-Automatically Predicting Sentence Translation Difficulty

16 0.71540612 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric

17 0.71062362 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

18 0.70548576 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

19 0.7043106 328 acl-2013-Stacking for Statistical Machine Translation

20 0.70353258 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.05), (6, 0.033), (11, 0.036), (15, 0.015), (24, 0.054), (26, 0.045), (35, 0.099), (42, 0.084), (48, 0.03), (51, 0.2), (70, 0.048), (88, 0.03), (90, 0.072), (95, 0.113)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89744908 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada

Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard ngram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.

2 0.87601948 310 acl-2013-Semantic Frames to Predict Stock Price Movement

Author: Boyi Xie ; Rebecca J. Passonneau ; Leon Wu ; German G. Creamer

Abstract: Semantic frames are a rich linguistic resource. There has been much work on semantic frame parsers, but less that applies them to general NLP problems. We address a task to predict change in stock price from financial news. Semantic frames help to generalize from specific sentences to scenarios, and to detect the (positive or negative) roles of specific companies. We introduce a novel tree representation, and use it to train predictive models with tree kernels using support vector machines. Our experiments test multiple text representations on two binary classification tasks, change of price and polarity. Experiments show that features derived from semantic frame parsing have significantly better performance across years on the polarity task.

same-paper 3 0.82210696 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge the TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves an overall 3.48 BLEU points improvement and a 2.62 TER points reduction in comparison with the pure SMT system. Besides, the proposed models also outperform previous approaches significantly.

4 0.78510749 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

Author: Boxing Chen ; Roland Kuhn ; George Foster

Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development ("dev") set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a decoding feature whose value represents the phrase pair's closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.

5 0.71277446 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

6 0.71256065 250 acl-2013-Models of Translation Competitions

7 0.71021497 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

8 0.70763725 172 acl-2013-Graph-based Local Coherence Modeling

9 0.70621848 166 acl-2013-Generalized Reordering Rules for Improved SMT

10 0.70526612 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

11 0.70479739 312 acl-2013-Semantic Parsing as Machine Translation

12 0.70303005 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

13 0.70260644 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

14 0.70237887 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

15 0.70090449 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

16 0.69974095 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

17 0.6995157 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

18 0.69923449 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration

19 0.69901001 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

20 0.6983788 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation