acl acl2013 acl2013-383 knowledge-graph by maker-knowledge-mining

383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation


Source: pdf

Author: Boxing Chen ; Roland Kuhn ; George Foster

Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a de- coding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). [sent-4, score-0.493]

2 This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. [sent-6, score-0.772]

3 Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. [sent-7, score-0.787]

4 This is a simple, computationally cheap form of instance weighting for phrase pairs. [sent-9, score-0.201]

5 4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. [sent-12, score-0.324]

6 An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre. [sent-13, score-0.338]

7 1 Introduction The translation models of a statistical machine translation (SMT) system are trained on parallel data. [sent-14, score-0.183]

8 Unless there is a perfect match between the training data domain and the (test) domain in which the SMT system will be used, one can often get better performance by adapting the system to the test domain. [sent-16, score-0.178]

9 Domain adaptation is an active topic in the natural language processing (NLP) research community. [sent-17, score-0.251]

10 Approaches that have been tried for SMT model adaptation include mixture models, transductive learning, data selection, instance weighting, and phrase sense disambiguation, etc. [sent-19, score-0.676]

11 Research on mixture models has considered both linear and log-linear mixtures. [sent-20, score-0.299]

12 In transductive learning, an MT system trained on general domain data is used to translate in-domain monolingual data. [sent-23, score-0.137]

13 … typically use a rich feature set to decide on weights for the training data, at the sentence or phrase pair level. [sent-36, score-0.209]

14 For example, a sentence from a subcorpus whose domain is far from that of the dev set would typically receive a low weight, but sentences in this subcorpus that appear to be of a general nature might receive higher weights. [sent-37, score-0.586]

15 The 2012 JHU workshop on Domain Adaptation for MT proposed phrase sense disambiguation (PSD) for translation model adaptation. [sent-38, score-0.181]

16 In this approach, the context of a phrase helps the system to find the appropriate translation. [sent-39, score-0.147]

17 In this paper, we propose a new instance weighting approach to domain adaptation based on a vector space model (VSM). [sent-40, score-0.485]

18 Instead of using word-based features and a computationally expensive training procedure, we capture the distributional properties of each phrase pair directly, representing it as a vector in a space which also contains a representation of the dev set. [sent-44, score-0.698]

19 The similarity between a given phrase pair’s vector and the dev set vector becomes a feature for the decoder. [sent-45, score-0.696]

20 It rewards phrase pairs that are in some sense closer to those found in the dev set, and punishes the rest. [sent-46, score-0.525]
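
To make the decoding feature described in sentences 17-20 concrete, here is a minimal Python sketch of how a similarity score against the dev-set vector could be attached to each phrase-table entry as one extra decoder feature. The data layout, the log transform, and every function and variable name below are illustrative assumptions, not the authors' implementation.

```python
import math

def add_vsm_feature(phrase_table, pair_vectors, dev_vec, similarity):
    """Append a VSM similarity score to each (f, e) entry of a toy phrase table.

    phrase_table -- dict mapping (f, e) -> list of existing feature values
    pair_vectors -- dict mapping (f, e) -> per-subcorpus weight vector for the pair
    dev_vec      -- aggregated dev-set vector
    similarity   -- similarity function taking (dev_vec, pair_vec),
                    e.g. the Bhattacharyya coefficient
    """
    scored = {}
    for pair, feats in phrase_table.items():
        sim = similarity(dev_vec, pair_vectors[pair])
        # A log-transformed feature is assumed here: pairs whose distribution
        # across subcorpora resembles the dev set get a less negative value,
        # so the decoder rewards them and penalizes the rest.
        scored[pair] = feats + [math.log(sim) if sim > 0 else -100.0]
    return scored
```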

21 They all enabled VSM adaptation to beat the non-adaptive baseline, but Bhattacharyya similarity worked best, so we adopted it for the remaining experiments. [sent-48, score-0.352]

22 The vector space used by VSM adaptation can be defined in various ways. [sent-49, score-0.357]

23 In the experiments described below, we chose a definition that measures the contribution (to counts of a given phrase pair, or to counts of all phrase pairs in the dev set) of each training subcorpus. [sent-50, score-0.745]

24 Thus, the variant of VSM adaptation tested here bears a superficial resemblance to domain adaptation based on mixture models for TMs, as in (Foster and Kuhn, 2007), in that both approaches rely on information about the subcorpora from which the data originate. [sent-51, score-0.975]

25 However, a key difference is that in this paper we explicitly capture each phrase pair’s distribution across subcorpora, and compare it to the aggregated distribution of phrase pairs in the dev set. [sent-52, score-0.643]

26 … the distribution across subcorpora is captured only implicitly, by probabilities that reflect the prevalence of the pair within each subcorpus. [sent-56, score-0.283]

27 Thus, VSM adaptation occurs at a much finer granularity than mixture model adaptation. [sent-57, score-0.457]

28 The (dev, phrase pair) similarity would then be independent of the subcorpora. [sent-61, score-0.175]

29 Thus, VSM adaptation is not limited to the variant of it that we tested in our experiments. [sent-63, score-0.251]

30 2 Vector space model adaptation Vector space models (VSMs) have been widely applied in many information retrieval and natural language processing applications. [sent-64, score-0.325]

31 For instance, to compute the sense similarity between terms, many researchers extract features for each term from its context in a corpus, define a VSM and then apply similarity functions (Hindle, 1990; Lund and Burgess, 1996; Lin, 1998; Turney, 2001). [sent-65, score-0.159]

32 For instance, the Chinese-English training data are made up of 14 subcorpora (see section 3 below). [sent-67, score-0.252]

33 To avoid a bias towards longer corpora, we normalize the raw joint count c_i(f, e) in the corpus s_i by dividing by the maximum raw count of any phrase pair extracted in the corpus s_i. [sent-78, score-0.141]

34 … (4), where df(f, e) is the number of subcorpora that (f, e) appears in, and λ is an empirically determined smoothing term. [sent-83, score-0.254]
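
As a rough illustration of the weighting scheme in sentences 33-34 (per-subcorpus count normalization plus an idf-style term), here is a minimal Python sketch. The extracted text does not show Equation 4 itself, so the exact idf formula, the default smoothing value, and all names here are assumptions rather than the authors' definitions.

```python
import math

def phrase_pair_weights(joint_counts, max_counts, lam=8.0):
    """Sketch of a per-subcorpus weight vector for one phrase pair (f, e).

    joint_counts[i] -- raw joint count c_i(f, e) of the pair in subcorpus i
    max_counts[i]   -- largest raw joint count of any pair in subcorpus i,
                       used to remove the bias towards longer corpora
    lam             -- smoothing term added to the document frequency (assumed form)
    """
    C = len(joint_counts)
    # Normalized "term frequency" per subcorpus (sentence 33).
    tf = [joint_counts[i] / max_counts[i] if max_counts[i] > 0 else 0.0
          for i in range(C)]
    # df(f, e): number of subcorpora the pair appears in (sentence 34).
    df = sum(1 for c in joint_counts if c > 0)
    # Assumed idf-style smoothing; Equation 4 itself is not reproduced in the summary.
    idf = math.log((C + lam) / (df + lam))
    return [t * idf for t in tf]

# Toy example with 3 subcorpora (all numbers are made up).
print(phrase_pair_weights(joint_counts=[5, 0, 2], max_counts=[120, 80, 40]))
```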

35 For the in-domain dev set, we first run word alignment and phrase extraction in the usual way, then sum the distribution of each phrase pair (f_j, e_k) extracted from the dev data across subcorpora to represent its domain information. [sent-84, score-1.595]

36 The dev-set vector is < w_1(dev), …, w_C(dev) >  (5), where w_i(dev) = Σ_{j=0..J} Σ_{k=0..K} c_dev(f_j, e_k) · w_i(f_j, e_k)  (6), and J, K are the total numbers of source/target phrases extracted from the dev data respectively. [sent-88, score-0.383]
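
Following Equations 5-6 as reconstructed above, the dev-set vector is a count-weighted sum of the per-subcorpus weight vectors of the phrase pairs extracted from the dev data. A minimal sketch, assuming plain Python dicts and a precomputed weight vector per phrase pair; all names are hypothetical.

```python
def dev_vector(dev_pair_counts, pair_weights, num_subcorpora):
    """w_i(dev) = sum over dev phrase pairs of c_dev(f, e) * w_i(f, e).

    dev_pair_counts -- dict mapping (f, e) -> joint count in the dev set
    pair_weights    -- dict mapping (f, e) -> per-subcorpus weight vector w(f, e)
    """
    w_dev = [0.0] * num_subcorpora
    for pair, c_dev in dev_pair_counts.items():
        w = pair_weights.get(pair)
        if w is None:          # pair unseen in the training subcorpora
            continue
        for i in range(num_subcorpora):
            w_dev[i] += c_dev * w[i]
    return w_dev
```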

37 c_dev(f_j, e_k) is the joint count of the phrase pair (f_j, e_k) found in the dev set. [sent-89, score-0.705]

38 The vector can also be built with other features of the phrase pair. [sent-90, score-0.187]

39 For instance, we could replace the raw joint count ci(f, e) in Equation 3 with the raw marginal count of phrase pairs (f, e). [sent-91, score-0.278]

40 Therefore, even within the variant of VSM adaptation we focus on in this paper, where the definition of the vector space is based on the existence of subcorpora, one could utilize other definitions of the vectors or the similarity function than those we utilized in our experiments. [sent-92, score-0.414]

41 2.1 Vector similarity functions: VSM uses the similarity score between the vector representing the in-domain dev set and the vector representing each phrase pair as a decoder feature. [sent-94, score-0.859]

42 The Bhattacharyya coefficient (BC) is defined as follows: BC(dev; f, e) = Σ_{i=0..C} √( p_i(dev) · p_i(f, e) )  (9). The other two similarity functions we also tested are JSD and cosine (Cos). [sent-105, score-0.132]
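
The three similarity functions named in sentence 42 (Bhattacharyya coefficient, Jensen-Shannon divergence, and cosine) have standard definitions; the sketch below applies them to subcorpus-weight vectors after normalizing them into probability distributions. The normalization step and the use of natural logarithms in JSD are assumptions, not details taken from the paper.

```python
import math

def _normalize(v):
    """Turn a non-negative weight vector into a probability distribution."""
    s = sum(v)
    return [x / s for x in v] if s > 0 else [0.0] * len(v)

def bhattacharyya(dev_vec, pair_vec):
    """BC(dev; f, e) = sum_i sqrt(p_i(dev) * p_i(f, e))  (cf. Equation 9)."""
    p, q = _normalize(dev_vec), _normalize(pair_vec)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def cosine(dev_vec, pair_vec):
    """Standard cosine similarity between the two raw vectors."""
    num = sum(a * b for a, b in zip(dev_vec, pair_vec))
    den = (math.sqrt(sum(a * a for a in dev_vec))
           * math.sqrt(sum(b * b for b in pair_vec)))
    return num / den if den > 0 else 0.0

def jsd(dev_vec, pair_vec):
    """Jensen-Shannon divergence between the two normalized distributions."""
    p, q = _normalize(dev_vec), _normalize(pair_vec)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```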

43 Table 1 summarizes information about the training, development and test sets; we show the sizes of the training subcorpora in number of words as a percentage of all training data. [sent-113, score-0.282]

44 Each corpus was word-aligned using IBM2, HMM, and IBM4 models, and the phrase table was the union of phrase pairs extracted from these separate alignments, with a length limit of 7. [sent-129, score-0.26]

45 [We compared VSM adaptation] to two widely used TM domain adaptation approaches. [sent-140, score-0.296]

46 One is the log-linear combination of TMs trained on each subcorpus (Koehn and Schroeder, 2007), with the weights of each model tuned under minimum error rate training using MIRA. [sent-141, score-0.133]

47 The other is a linear combination of TMs trained on each subcorpus, with the weights of each model learned with an EM algorithm to maximize the likelihood of joint empirical phrase pair counts for in-domain dev data. [sent-142, score-0.691]
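
Sentence 47 describes the linear TM-mixture baseline of Foster and Kuhn (2007): one translation model per subcorpus, interpolated with weights learned by EM to maximize the likelihood of the dev set's empirical phrase-pair counts. Below is a minimal EM sketch under those assumptions; the data structures and names are invented for illustration and are not the authors' code.

```python
def em_mixture_weights(component_probs, dev_counts, iterations=20):
    """Learn linear mixture weights over subcorpus TMs by EM.

    component_probs -- list of dicts, one per subcorpus TM, mapping
                       (f, e) -> a conditional or joint phrase probability
    dev_counts      -- dict mapping (f, e) -> empirical count in the dev set
    """
    k = len(component_probs)
    weights = [1.0 / k] * k                      # uniform initialization
    for _ in range(iterations):
        expected = [0.0] * k
        for pair, count in dev_counts.items():
            probs = [w * comp.get(pair, 0.0)
                     for w, comp in zip(weights, component_probs)]
            z = sum(probs)
            if z == 0.0:
                continue
            # E-step: posterior responsibility of each component for this pair.
            for i in range(k):
                expected[i] += count * probs[i] / z
        total = sum(expected)
        if total == 0.0:
            break
        # M-step: re-normalize expected counts into new mixture weights.
        weights = [e / total for e in expected]
    return weights
```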

48 [The values of λ and α] are determined by the performance on the dev set of the Arabic-to-English system. [sent-145, score-0.383]

49 For both the Arabic-to-English and Chinese-to-English experiments, these values obtained on the Arabic dev set were used to obtain the results below: λ was set to 8, and α was set to 0. [sent-146, score-0.383]

50 The Bhattacharyya coefficient is explicitly designed to measure the overlap between the probability distributions of two statistical samples or populations, which is precisely what we are trying to do here: we are trying to reward phrase pairs whose distribution is similar to that of the dev set. [sent-159, score-0.583]

51 In the next set of experiments, we compared VSM adaptation using the BC similarity function with the baseline, which concatenates all training data, and with log-linear and linear TM mixtures. [sent-161, score-0.477]

52 Table 4 shows that the log-linear combination performs worse than the baseline: the tuning algorithm failed to optimize the log-linear combination even on the dev set. [sent-171, score-0.383]

53 For Chinese, the BLEU score of the dev set on the baseline system is 27. [sent-172, score-0.458]

54 0; for Arabic, the BLEU score of the dev set on the baseline system is 46. [sent-174, score-0.458]

55 Linear mixture was significantly better than the baseline at the p < 0. [sent-178, score-0.252]

56 Since our approach, VSM, performed better than the linear mixture for both pairs, it is of course also significantly better than the baseline at the p < 0. [sent-180, score-0.345]

57 This raises the question: is VSM performance significantly better than that of a linear mixture of TMs? [sent-182, score-0.299]

58 The answer (not shown in the table) is that for Arabic to English, VSM performance is better than linear mixture at the p < 0. [sent-183, score-0.299]

59 For Chinese to English, the argument for the superiority of VSM over linear mixture is less convincing: there is significance at the p < 0. [sent-185, score-0.325]

60 At any rate, these results establish that VSM adaptation is clearly superior to linear mixture TM adaptation, for one of the two language pairs. [sent-187, score-0.586]

61 In Table 4, the VSM results are based on the [joint counts]. (Table 5: Results for adaptation based on joint or marginal counts.) [sent-188, score-0.251]

62 In Table 5, we first show the results based on source and target marginal counts, then the results of using feature sets drawn from three decoder VSM features: a joint count feature, a source marginal count feature, and a target marginal count feature. [sent-191, score-0.288]
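
Sentence 62 refers to three VSM decoder features built from joint, source-marginal, and target-marginal counts. A small sketch of how the two marginal count tables could be derived from per-subcorpus joint counts; the dict layout and names are assumptions made for illustration.

```python
from collections import defaultdict

def marginal_counts(joint_counts, num_subcorpora):
    """Derive per-subcorpus source and target marginal counts from joint counts.

    joint_counts -- dict mapping (f, e) -> list of counts, one per subcorpus
    """
    src = defaultdict(lambda: [0.0] * num_subcorpora)   # marginal counts of f
    tgt = defaultdict(lambda: [0.0] * num_subcorpora)   # marginal counts of e
    for (f, e), counts in joint_counts.items():
        for i, c in enumerate(counts):
            src[f][i] += c
            tgt[e][i] += c
    return src, tgt
```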

63 When we compared two sets of results in Table 4, the joint count version of VSM and linear mixture of TMs, we found that for Arabic to English, VSM performance is better than linear mixture at the p < 0. [sent-198, score-0.638]

64 01 level; the Chinese to English significance test was inconclusive (VSM found to be superior to linear mixture at p < 0. [sent-199, score-0.361]

65 For Chinese, 3-feature VSM is now superior to linear mixture at p < 0. [sent-205, score-0.335]

66 01 on NIST06 test set, but 3-feature VSM still doesn’t have a statistically significant edge over linear mixture on NIST08 test set. [sent-206, score-0.324]

67 A fair summary would be that 3-feature VSM adaptation is decisively superior to linear mixture adaptation for Arabic to English, and highly competitive with linear mixture adaptation for Chinese to English. [sent-207, score-0.837]

68 Our last set of experiments examined the question: when added to a system that already has some form of linear mixture model adaptation, does VSM improve performance? [sent-208, score-0.328]

69 In (Foster and Kuhn, 2007), two kinds of linear mixture were described: linear mixture of language models (LMs), and linear mixture of translation models (TMs). [sent-209, score-0.96]

70 Some of the results reported above involved linear TM mixtures, but none of them involved linear LM mixtures. [sent-210, score-0.186]

71 Table 6 shows the results of different combinations of VSM and mixture models. [sent-211, score-0.206]

72 * and ** denote significant gains over the row no vsm at p < 0. [sent-212, score-0.674]

73 For instance, with an initial Chinese system that employs linear mixture LM adaptation (lin-lm) and has a BLEU of 32. [sent-216, score-0.579]

74 1, adding 1-feature VSM adaptation (+vsm, joint) improves performance to 33. [sent-217, score-0.251]

75 For Arabic, including either form of VSM adaptation always improves performance with significance at p < 0. [sent-223, score-0.277]

76 01, even over a system including both linear TM and linear LM adaptation. [sent-224, score-0.215]

77 For Chinese, adding VSM still always yields an improvement, but the improvement is not significant if linear TM adaptation is already in the system. [sent-225, score-0.369]

78 These results show that combining VSM adaptation and either or both kinds of linear mixture adaptation never hurts performance, and often improves it by a significant amount. [sent-226, score-0.826]

79 4 Informal Data Analysis To get an intuition for how VSM adaptation improves BLEU scores, we compared outputs from the baseline and VSM-adapted system (“vsm, joint” in Table 5) on the Chinese test data. [sent-228, score-0.326]

80 Thus, we ignored differences in the two translations that might have been due to the secondary effects of VSM adaptation (such as a different tar- [sent-230, score-0.55]

81 Table 6: Results of combining VSM and linear mixture adaptation. [sent-235, score-0.299]

82 * and ** denote significant gains over the row "no vsm" at the p < 0.05 and p < 0.01 levels, respectively; "lin-lm" is linear language model adaptation. [sent-239, score-0.142]

83 get phrase being preferred by the language model in the VSM-adapted system from the one preferred in the baseline system because of a Bhattacharyya-mediated change in the phrase preceding it). [sent-240, score-0.34]

84 An interesting pattern soon emerged: the VSM-adapted system seems to be better than the baseline at choosing among synonyms in a way that is appropriate to the genre or style of a text. [sent-241, score-0.197]

85 For instance, where the text to be translated is from an informal genre such as weblog, the VSM-adapted system will often pick an informal word where the baseline picks a formal word with the same or similar meaning, and vice versa where the text to be translated is from a more formal genre. [sent-242, score-0.259]

86 In the first example, the first two lines show that VSM finds that the Chinese-English phrase pair (殴打, assaulted) has a Bhattacharyya (BC) similarity of 0. [sent-245, score-0.236]

87 556163 to the dev set, while the phrase pair (殴打, beat) has a BC similarity of 0. [sent-246, score-0.619]

88 Note that the result of VSM adaptation is that the rather formal word “assaulted” is replaced by its informal near-synonym “beat” in the translation of an informal weblog text. [sent-250, score-0.481]

89 However, it looks as though the VSM-adapted system has learned from the dev that among synonyms, those more characteristic of news stories than of legal texts should be chosen: it therefore picks “arrest” over its synonym “apprehend”. [sent-252, score-0.436]

90 4 Conclusions and future work This paper proposed a new approach to domain adaptation in statistical machine translation, based on vector space models (VSMs). [sent-256, score-0.43]

91 This approach measures the similarity between a vector representing a particular phrase pair in the phrase table and a vector representing the dev set, yielding a feature associated with that phrase pair that will be used by the decoder. [sent-257, score-1.054]

92 Furthermore, VSM adaptation can be exploited in a number of different ways, which we have only begun to explore. [sent-275, score-0.251]

93 In our experiments, we based the vector space on subcorpora defined by the nature of the training data. [sent-276, score-0.358]

94 A feature derived from this topic-related vector space might complement some features derived from the subcorpora which we explored in the experiments above, and which seem to exploit information related to genre and style. [sent-279, score-0.384]

95 Domain adaptation for statistical machine translation with monolingual resources. [sent-286, score-0.342]

96 Discriminative instance weighting for domain adaptation in statistical machine translation. [sent-317, score-0.407]

97 Adaptation of the translation model for statistical machine translation based on information retrieval. [sent-327, score-0.154]

98 Training machine translation with a second-order taylor approximation of weighted translation instances. [sent-405, score-0.126]

99 Perplexity minimization for translation model domain adaptation in statistical machine translation. [sent-413, score-0.387]

100 Language model adaptation for statistical machine translation with structured query models. [sent-426, score-0.342]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vsm', 0.625), ('dev', 0.383), ('adaptation', 0.251), ('subcorpora', 0.222), ('mixture', 0.206), ('bc', 0.142), ('phrase', 0.118), ('tms', 0.11), ('bhattacharyya', 0.107), ('foster', 0.098), ('linear', 0.093), ('subcorpus', 0.079), ('vsmadapted', 0.077), ('arabic', 0.075), ('vector', 0.069), ('tm', 0.068), ('bleu', 0.064), ('nist', 0.063), ('weblog', 0.063), ('jsd', 0.063), ('cos', 0.063), ('chinese', 0.063), ('translation', 0.063), ('kuhn', 0.062), ('pair', 0.061), ('assaulted', 0.058), ('similarity', 0.057), ('marginal', 0.056), ('genre', 0.056), ('informal', 0.052), ('boxing', 0.051), ('roland', 0.047), ('pi', 0.046), ('baseline', 0.046), ('domain', 0.045), ('functions', 0.045), ('beat', 0.044), ('weighting', 0.044), ('cha', 0.042), ('idf', 0.042), ('lm', 0.042), ('smt', 0.041), ('count', 0.04), ('broadcast', 0.039), ('transductive', 0.039), ('instance', 0.039), ('apprehend', 0.039), ('cdev', 0.039), ('hildebrand', 0.039), ('logpi', 0.039), ('tfi', 0.039), ('space', 0.037), ('superior', 0.036), ('counts', 0.036), ('style', 0.035), ('fj', 0.035), ('pjj', 0.034), ('newsgroup', 0.034), ('eck', 0.034), ('vsms', 0.034), ('schroeder', 0.034), ('bn', 0.034), ('koehn', 0.033), ('smoothing', 0.032), ('ueffing', 0.032), ('ppi', 0.032), ('populations', 0.032), ('synonyms', 0.031), ('training', 0.03), ('nw', 0.03), ('lund', 0.03), ('phillips', 0.03), ('coefficient', 0.03), ('ek', 0.029), ('system', 0.029), ('statistical', 0.028), ('isi', 0.027), ('discounting', 0.027), ('un', 0.027), ('george', 0.026), ('matsoukas', 0.026), ('axelrod', 0.026), ('arrest', 0.026), ('genres', 0.026), ('significance', 0.026), ('bing', 0.025), ('divergence', 0.025), ('significant', 0.025), ('family', 0.025), ('picks', 0.024), ('indomain', 0.024), ('pairs', 0.024), ('tuned', 0.024), ('gains', 0.024), ('wc', 0.024), ('wl', 0.024), ('matthias', 0.024), ('bertoldi', 0.023), ('tried', 0.023), ('df', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

Author: Boxing Chen ; Roland Kuhn ; George Foster

Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a de- coding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.

2 0.21285853 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa

Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also de- scribe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1BLEU over unadapted systems and single-domain adaptation.

3 0.15211621 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

Abstract: Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.

4 0.15052074 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

Author: Jiajun Zhang ; Chengqing Zong

Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1

5 0.13457392 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao

Abstract: Typical statistical machine translation systems are batch trained with a given training data and their performances are largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time when new data comes. In face of the problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammars for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least as comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and while computational overhead is significantly reduced by training them in parallel.

6 0.13201544 328 acl-2013-Stacking for Statistical Machine Translation

7 0.13170509 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

8 0.11273571 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

9 0.10181653 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

10 0.093337096 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

11 0.092131518 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

12 0.088320658 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

13 0.087836005 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

14 0.083546944 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language

15 0.08353316 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

16 0.080569394 235 acl-2013-Machine Translation Detection from Monolingual Web-Text

17 0.080222279 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

18 0.080101445 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric

19 0.079422385 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

20 0.077294111 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.2), (1, -0.088), (2, 0.129), (3, 0.043), (4, 0.039), (5, 0.014), (6, -0.019), (7, 0.035), (8, -0.005), (9, 0.034), (10, 0.01), (11, 0.039), (12, -0.006), (13, 0.028), (14, 0.005), (15, 0.069), (16, -0.08), (17, -0.024), (18, -0.001), (19, 0.071), (20, 0.121), (21, -0.038), (22, 0.093), (23, 0.023), (24, -0.009), (25, 0.006), (26, 0.084), (27, 0.027), (28, -0.001), (29, -0.059), (30, 0.054), (31, 0.124), (32, -0.042), (33, -0.045), (34, 0.059), (35, 0.076), (36, 0.084), (37, -0.093), (38, -0.01), (39, 0.008), (40, 0.025), (41, 0.073), (42, -0.007), (43, 0.008), (44, 0.001), (45, -0.051), (46, 0.005), (47, -0.03), (48, -0.013), (49, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94263959 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

Author: Boxing Chen ; Roland Kuhn ; George Foster

Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a de- coding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.

2 0.80939788 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa

Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also de- scribe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1BLEU over unadapted systems and single-domain adaptation.

3 0.78193384 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao

Abstract: Typical statistical machine translation systems are batch trained with a given training data and their performances are largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time when new data comes. In face of the problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammars for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least as comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and while computational overhead is significantly reduced by training them in parallel.

4 0.69066602 328 acl-2013-Stacking for Statistical Machine Translation

Author: Majid Razmara ; Anoop Sarkar

Abstract: We propose the use of stacking, an ensemble learning technique, to the statistical machine translation (SMT) models. A diverse ensemble of weak learners is created using the same SMT engine (a hierarchical phrase-based system) by manipulating the training data and a strong model is created by combining the weak models on-the-fly. Experimental results on two language pairs and three different sizes of training data show significant improvements of up to 4 BLEU points over a conventionally trained SMT model.

5 0.67372477 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction in comparison with the pure SMT system. Be- . sides, the proposed models also outperform previous approaches significantly.

6 0.64798725 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

7 0.64620954 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language

8 0.63875782 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

9 0.62573403 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

10 0.61968756 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

11 0.61932164 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

12 0.5964244 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

13 0.59166831 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

14 0.57867932 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models

15 0.56627977 24 acl-2013-A Tale about PRO and Monsters

16 0.56167942 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

17 0.55052388 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

18 0.54768521 235 acl-2013-Machine Translation Detection from Monolingual Web-Text

19 0.54590434 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

20 0.53744322 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.041), (6, 0.04), (11, 0.072), (15, 0.018), (24, 0.026), (26, 0.059), (28, 0.011), (35, 0.055), (42, 0.148), (48, 0.037), (51, 0.175), (70, 0.036), (88, 0.029), (90, 0.053), (95, 0.113)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85833782 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada

Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard ngram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling un- known word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.

same-paper 2 0.85421097 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation

Author: Boxing Chen ; Roland Kuhn ; George Foster

Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a de- coding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.

3 0.8430829 310 acl-2013-Semantic Frames to Predict Stock Price Movement

Author: Boyi Xie ; Rebecca J. Passonneau ; Leon Wu ; German G. Creamer

Abstract: Semantic frames are a rich linguistic resource. There has been much work on semantic frame parsers, but less that applies them to general NLP problems. We address a task to predict change in stock price from financial news. Semantic frames help to generalize from specific sentences to scenarios, and to detect the (positive or negative) roles of specific companies. We introduce a novel tree representation, and use it to train predictive models with tree kernels using support vector machines. Our experiments test multiple text representations on two binary classification tasks, change of price and polarity. Experiments show that features derived from semantic frame parsing have significantly better performance across years on the polarity task.

4 0.83317864 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su

Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction in comparison with the pure SMT system. Be- . sides, the proposed models also outperform previous approaches significantly.

5 0.77637428 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale Englishto-German experiment with a string-totree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

6 0.77173328 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

7 0.7700814 166 acl-2013-Generalized Reordering Rules for Improved SMT

8 0.76666164 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

9 0.76076376 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

10 0.75963944 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

11 0.75924826 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

12 0.75479698 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

13 0.75310725 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

14 0.75198877 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

15 0.74930167 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

16 0.74794865 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

17 0.74787807 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

18 0.74669099 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

19 0.74594384 328 acl-2013-Stacking for Statistical Machine Translation

20 0.74235022 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation