emnlp emnlp2013 emnlp2013-136 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Domain adaptation for SMT usually adapts models to an individual specific domain. [sent-12, score-0.252]
2 However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. [sent-13, score-0.333]
3 In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. [sent-14, score-0.473]
4 The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. [sent-15, score-0.277]
5 Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. [sent-16, score-0.603]
6 Furthermore, it also outperforms the individual adaptation of each specific domain. [sent-17, score-0.252]
7 1 Introduction Domain adaptation is an active topic in statistical machine learning and aims to alleviate the domain mismatch between training and testing data. [sent-18, score-0.576]
8 Like many machine learning tasks, Statistical Machine Translation (SMT) assumes that the data distributions of training and testing domains are similar. [sent-19, score-0.24]
9 The translation quality is often unsatisfactory when translating texts from a specific domain using a general model that is trained over a hotchpotch of bilingual corpora. [sent-21, score-0.193]
10 (Footnote: This work was done while the first and second authors were visiting Microsoft Research Asia.) [sent-22, score-0.340]
11 Therefore, domain adaptation is crucial for SMT systems to achieve better performance. [sent-23, score-0.438]
12 Previous research on domain adaptation for SMT includes data selection and weighting (Eck et al. [sent-24, score-0.494]
13 Most of these methods adapt SMT models to a specific domain according to testing data and have achieved good performance. [sent-31, score-0.251]
14 It is natural that real world SMT systems should adapt the models to multiple domains because the input may be heterogeneous, so that the overall translation quality can be improved. [sent-32, score-0.439]
15 Although we can easily apply these methods to multiple domains individually, it is difficult to use the common knowledge across different domains. [sent-33, score-0.209]
16 To leverage the common knowledge, we need to devise a multi-domain adaptation approach that jointly adapts the SMT models. [sent-34, score-0.296]
17 Multi-domain adaptation has proven quite effective in sentiment analysis (Dredze and Crammer, 2008) and web ranking (Chapelle et al. [sent-35, score-0.323]
18 , 2011), where the commonalities and differences across multiple domains are explicitly addressed by Multitask Learning (MTL). [sent-36, score-0.316]
19 Analogously, we expect that the overall translation quality can be further improved by using an MTL-based approach. [sent-40, score-0.193]
20 Ti is the in-domain training data for the i-th domain selected from T using the bilingual cross-entropy based method (Axelrod et al. [sent-43, score-0.34]
21 Specifically, we develop multiple SMT systems based on mixture models, where each system is tailored for one specific domain with an in-domain Translation Model (TM) and an in-domain Language Model (LM). [sent-50, score-0.376]
22 With the MTL-based joint tuning, general knowledge can be better learned by the general-domain models, while domain knowledge can be better exploited by the in-domain models as well. [sent-53, score-0.293]
23 Experimental results have shown that our method can significantly improve the translation quality on multiple domains over a non-adapted baseline. [sent-57, score-0.445]
24 Moreover, the MTL-based adaptation also outperforms the conventional individual adaptation approach towards each domain. [sent-58, score-0.536]
25 2 The Proposed Approach Figure 1 gives an example with N pre-defined domains to illustrate the main idea. [sent-63, score-0.175]
26 First, in-domain training data is selected according to the pre-defined domains (Section 2. [sent-65, score-0.175]
27 1 In-domain Data Selection In the first step, in-domain bilingual data is selected from all the bilingual data to train in-domain TMs. [sent-72, score-0.308]
28 , 2011) to obtain the in-domain data: $[H_{I\text{-}src}(s) - H_{G\text{-}src}(s)] + [H_{I\text{-}tgt}(t) - H_{G\text{-}tgt}(t)]$ (1), where {s, t} is a bilingual sentence pair in the entire bilingual corpus. [sent-74, score-0.308]
29 $H_{I\text{-}src}(s) - H_{G\text{-}src}(s)$ is the cross-entropy difference of string s between the in-domain and general-domain source-side LMs, and $H_{I\text{-}tgt}(t) - H_{G\text{-}tgt}(t)$ is the cross-entropy difference of string t between the in-domain and general-domain target-side LMs. [sent-77, score-0.243]
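For concreteness, here is a minimal sketch of the selection step behind Formula (1). The four per-sentence cross-entropy functions and the `lms` bundle are assumed interfaces for illustration, not the paper's actual implementation; lower scores indicate sentence pairs that look more in-domain, so the best-scoring fraction is kept.

```python
import heapq

def selection_score(s, t, H_I_src, H_G_src, H_I_tgt, H_G_tgt):
    # Bilingual cross-entropy difference (Formula (1)): lower scores mean
    # the sentence pair looks more like the in-domain data.
    return (H_I_src(s) - H_G_src(s)) + (H_I_tgt(t) - H_G_tgt(t))

def select_in_domain(bitext, lms, fraction=0.10):
    # `bitext` is a list of (source, target) sentence pairs; `lms` is a
    # 4-tuple of hypothetical per-sentence cross-entropy functions in the
    # order expected by selection_score.
    scored = [(selection_score(s, t, *lms), (s, t)) for s, t in bitext]
    k = max(1, int(fraction * len(scored)))
    return [pair for _, pair in heapq.nsmallest(k, scored, key=lambda x: x[0])]
```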
30 There are a large number of monolingual webpages with domain information from web portal sites (see footnote 1), which can be collected to train in-domain LMs. [sent-82, score-0.454]
31 In large-scale real world SMT systems, practical domain adaptation techniques should target more domains rather than just one due to heterogeneous input. [sent-83, score-0.65]
32 Therefore, we use a web crawler to collect monolingual webpages of N domains from web portal sites, for both the source language and the tar- get language. [sent-84, score-0.507]
33 Finally, these in-domain and general-domain LMs are used to select in-domain bilingual data for different domains according to Formula (1). [sent-89, score-0.465]
34 In particular, we use the mixture model based approach proposed by Koehn and Schroeder (2007). (Footnote 1: Many web portal sites contain domain information for webpages, such as ”www.) [sent-92, score-0.445]
35 Specifically, we have developed N SMT systems for N domains respectively, where each system is a typical log-linear model. [sent-100, score-0.175]
36 The in-domain TMs are trained using the selected bilingual training data according to Formula (1), and the general-domain TM is trained using the entire bilingual training data. [sent-104, score-0.444]
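To make the mixture setup concrete, the sketch below scores one translation candidate under a single domain-specific log-linear model that combines general-domain and in-domain TM/LM features. The feature names, values, and weights are invented for illustration and are not taken from the paper.

```python
def loglinear_score(features, weights):
    # Log-linear model: the score of a candidate is the weighted sum of
    # its feature values.
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature layout for one candidate in some domain d.
features = {
    "g_tm_logprob": -12.3,  # general-domain TM, trained on the entire bitext
    "i_tm_logprob": -10.7,  # in-domain TM, trained on data selected by Formula (1)
    "g_lm_logprob": -25.1,  # general-domain LM
    "i_lm_logprob": -22.4,  # in-domain LM
    "word_penalty": 18.0,
}
weights = {"g_tm_logprob": 0.9, "i_tm_logprob": 1.1, "g_lm_logprob": 0.5,
           "i_lm_logprob": 0.7, "word_penalty": -0.3}
print(loglinear_score(features, weights))  # higher is better
```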
37 The basic idea of the objective function is to minimize the sum of loss functions for all the domains, rather than one domain at a time. [sent-130, score-0.216]
38 Therefore, by adjusting the in-domain and general-domain feature weights, the translation quality is expected to be good across different domains. [sent-131, score-0.242]
39 Let a translation candidate be denoted by its feature vector v ∈ R^D; the pairwise preference for training is constructed by ranking two candidates according to the smoothed sentence-level BLEU (Liang et al. [sent-135, score-0.360]
40 These bins are used for pairwise ranking where the translation preference pairs are built between the candidates in High-Middle, Middle-Low, and High-Low, but not the candidates within the same bin, which is shown in Figure 2. [sent-142, score-0.377]
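A minimal sketch of this binned pair construction follows; it assumes each candidate arrives as a (feature_vector, sentence_bleu) tuple and that the three bins are of equal size, both simplifications for illustration.

```python
from itertools import product

def build_preference_pairs(candidates, n_bins=3):
    # Sort n-best candidates by smoothed sentence-level BLEU, split them
    # into High/Middle/Low bins, and build preference pairs only across
    # different bins (High-Middle, Middle-Low, High-Low), never within one.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    size = max(1, len(ranked) // n_bins)
    bins = [ranked[i * size:(i + 1) * size] for i in range(n_bins)]
    pairs = []
    for hi in range(n_bins):
        for lo in range(hi + 1, n_bins):
            for better, worse in product(bins[hi], bins[lo]):
                pairs.append((better[0], worse[0]))  # (preferred, dispreferred)
    return pairs
```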
41 for all j ∈ {0 . . . P − 1} do u_{d,t,i,j+1} ← u_{d,t,i,j} − η∇L(u_{d,t,i,j}) end for; u_{d,t,i+1,0} ← u_{d,t,i,P}; end for; end for; for all domains d ∈ {1 . . . N} [sent-167, score-0.175]
42 Multiple SMT decoders run in parallel and each decoder updates its feature weights individually using its in-domain development data (lines 4-15). [sent-193, score-0.385]
43 , 2012), we only average the general-domain feature weights w_1^G, . . . [sent-200, score-0.213]
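Putting these pieces together, here is a simplified sketch of the joint tuning loop described above. The perceptron-style update is a surrogate for the paper's pairwise-ranking loss, `loglinear_score` is the helper from the earlier sketch, feature vectors are dicts, and the `g_` prefix marking general-domain features is a naming convention invented here; a real implementation would also re-decode the dev sets between epochs.

```python
def sgd_update(weights, pair, eta):
    # If the dispreferred candidate scores at least as high as the
    # preferred one, nudge the weights toward the preferred features.
    better, worse = pair
    if loglinear_score(better, weights) <= loglinear_score(worse, weights):
        for name in weights:
            weights[name] += eta * (better.get(name, 0.0) - worse.get(name, 0.0))
    return weights

def mtl_tune(systems, dev_pairs, epochs=10, eta=0.01):
    # `systems` holds one feature-weight dict per domain; `dev_pairs[d]`
    # are cross-bin preference pairs built from decoding domain d's dev set.
    for _ in range(epochs):
        # Each domain's decoder tunes its own weights independently
        # (run in parallel in practice).
        for d, weights in enumerate(systems):
            for pair in dev_pairs[d]:
                sgd_update(weights, pair, eta)
        # After each epoch, average only the shared general-domain weights
        # across all N domains and redistribute them.
        for name in [n for n in systems[0] if n.startswith("g_")]:
            avg = sum(w[name] for w in systems) / len(systems)
            for w in systems:
                w[name] = avg
    return systems
```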
44 After the joint MTL-based tuning, the feature weights tailored for domain-specific SMT systems are used to translate the testing data. [sent-210, score-0.246]
45 We collect in-domain testing data for each domain to evaluate the domain-specific systems. [sent-211, score-0.387]
46 Although this is not always the case in real applications where the testing domain is known, this study mainly focuses on the effectiveness of the MTL-based tuning approach. [sent-212, score-0.34]
47 1 Data We evaluated our MTL-based domain adaptation approach on a large-scale English-to-Chinese machine translation task. [sent-214, score-0.596]
48 The training data consisted of two parts: monolingual data and bilingual data. [sent-215, score-0.227]
49 1, we built a web crawler to collect a large number of webpages from web portal sites in English and Chinese respectively. [sent-220, score-0.304]
50 The bilingual data we used was mainly mined from the web using the method proposed by Jiang et al. [sent-224, score-0.184]
51 (2009), with a post-processing step using our bilingual data cleaning method (Cui et al. [sent-225, score-0.186]
52 1 to rank the entire bilingual data, and the top 10% sentence pairs from the ranked bilingual data were selected as the in-domain data to train the in-domain TM. [sent-241, score-0.308]
53 The phrase tables were filtered to retain top-20 translation candidates for each source phrase for efficiency. [sent-252, score-0.190]
54 The evaluation metric for the overall translation quality was case-insensitive BLEU4 (Papineni et al. [sent-254, score-0.193]
55 Moreover, we also compared our method with the systems adapted towards each domain individually (Koehn and Schroeder, 2007). [sent-266, score-0.282]
56 We found that the baseline has a similar performance to Google Translation, with certain domains performing even better (Business, Sci&Tech, Sports, Politics). [sent-270, score-0.175]
57 This demonstrates that the translation quality of our baseline is state-of-the-art. [sent-271, score-0.193]
58 Moreover, we can answer three questions according to the experimental results as follows: First, is domain mismatch a significant problem for a real world SMT system? [sent-272, score-0.261]
59 We used the same system only with general-domain TM and LM, but tuned towards each domain individually using in-domain dev data. [sent-273, score-0.35]
60 ”[A]” denotes that the system is adapted towards each domain individually using MERT on in-domain dev data. [sent-298, score-0.346]
61 ”I-TM” and ”G-TM” denote the in-domain and general-domain translation models. [sent-300, score-0.205]
62 the non-adapted baseline across all domains with at least 1. [sent-303, score-0.175]
63 Analogous to previous research, this confirms that the domain mismatch indeed exists and the parameter estimation using in-domain dev data is quite useful. [sent-306, score-0.288]
64 Second, does the mixture-model based adaptation work for a variety of domains? [sent-307, score-0.373]
65 The reason is that the data for the general models already includes the in-domain data and the data coverage is much larger; thus the probability estimation is more reliable and the translation quality is much better. [sent-311, score-0.193]
66 For the LM, the in-domain LM performs better than the general-domain LM because our monolingual data (Table 1) for each domain is already sufficient for training an in-domain LM with good performance. [sent-312, score-0.259]
67 From Table 3, we observed that the setting ”[A] (G+I)-TM + I-LM” outperforms ”[A] (G+I)-TM + G-LM”, with the ”Sports” domain being the most significant. [sent-313, score-0.186]
68 When each system uses two TMs and two LMs, it consistently results in better performance, indicating that mixture models are crucial for domain adaptation in SMT. [sent-317, score-0.559]
69 We used the MTL-based approach to jointly tune multiple domain-specific systems, leveraging the commonalities among different but related tasks. [sent-319, score-0.223]
70 From Table 3, the MTL-based approach significantly improves the translation quality over the non-adapted baseline, and also outperforms conventional mixture-model based methods. [sent-320, score-0.346]
71 In particular, the ”Sports” domain benefits the most from the in-domain knowledge, which confirms that domain discrepancy should be addressed and may bring large improvements on certain domains. [sent-321, score-0.508]
72 An example sentence from the Sports domain with translations from different methods is shown in Table 4. [sent-332, score-0.231]
73 Both our MTL-based approach and the conventional adaptation methods leverage the mixture models. [sent-338, score-0.405]
74 The regularization prevents the general features from biasing towards certain domains to the extreme. [sent-343, score-0.205]
75 Usually, a sentence is composed of some domain-specific words and some general words, so it is often improper to translate every word in the sentence using the in-domain knowledge. [sent-345, score-0.176]
76 For the example in Table 4, the individual adaptation method ”[A] (G+I)-TM + (G+I)-LM” translates ”land” to ”区域” (zone) improperly, because ”区域” appears more often in the Sports text than the general-domain text. [sent-346, score-0.322]
77 This shows that the individual adaptation methods tend to overfit the in-domain development data. [sent-347, score-0.252]
78 1 Domain Adaptation One direction of domain adaptation explored the data selection and weighting approach to improve the performance of SMT on specific domains. [sent-350, score-0.494]
79 (2004) first decoded the testing data with a general TM, and then used the translation results to train an adapted LM, which was in turn used to re-decode the testing data. [sent-352, score-0.331]
80 (2011) further extended their cross-entropy based method to the selection of bilingual corpus in the hope that more relevant corpus to the target domain could yield smaller models with better performance. [sent-357, score-0.396]
81 Sennrich (2012) investigated the TM perplexity minimization as a method to set model weights in mixture modeling. [sent-363, score-0.178]
82 (2012) used the ensemble decoding method to mix multiple translation models, which outperformed a variety of strong baselines. [sent-365, score-0.192]
83 Generally, most previous methods merely conducted domain adaptation for a single domain, rather than multiple domains at the same time. [sent-366, score-0.395]
84 So far, there has been little research into the multi-domain adaptation problem over mixture models for SMT systems, as proposed in this paper. [sent-368, score-0.373]
85 (2006) extended the MTL approach (Ando and Zhang, 2005) to domain adaptation tasks in part-of-speech tagging. [sent-376, score-0.438]
86 Inspired by these methods, we used MTL to tune multiple SMT systems at the same time, where each system was composed of in-domain and general-domain models. [sent-386, score-0.179]
87 5 Conclusion and Future Work In this paper, we propose an MTL-based approach to address multi-domain adaptation for SMT. [sent-389, score-0.252]
88 We first use the cross-entropy based data selection method to obtain in-domain bilingual data. [sent-390, score-0.21]
89 Experimental results have shown that our approach is quite promising for the multi-domain adaptation problem, and it brings significant improvement over both the non-adapted baselines and the conventional domain adaptation methods with mixture models. [sent-394, score-0.843]
90 We assume the domain information for testing data is known beforehand in this study. [sent-395, score-0.251]
91 Therefore, to apply our approach in real applications, the domain information needs to be identified automatically. [sent-397, score-0.223]
92 In the future, we will pre-define more popular domains and develop automatic domain classifiers. [sent-398, score-0.361]
93 For those domains that are identified with high confidence, we use the domain-specific system to translate the texts. [sent-399, score-0.215]
94 Furthermore, since our approach is a general training method, we may also combine this approach with other domain adaptation methods to obtain further performance improvements. [sent-401, score-0.438]
95 Bilingual data cleaning for SMT using graph-based random walk. [sent-434, score-0.418]
96 Language model adaptation for statistical machine translation based on information retrieval. [sent-449, score-0.445]
97 Discriminative instance weighting for domain adaptation in statistical machine translation. [sent-464, score-0.473]
98 Mining bilingual data from the web with adaptively learnt patterns. [sent-469, score-0.184]
99 Improving statistical machine translation performance by training data selection and optimization. [sent-496, score-0.249]
100 Perplexity minimization for translation model domain adaptation in statistical machine translation. [sent-526, score-0.631]
wordName wordTfidf (topN-words)
[('mtl', 0.407), ('smt', 0.386), ('adaptation', 0.252), ('domain', 0.186), ('lms', 0.179), ('domains', 0.175), ('lm', 0.17), ('tm', 0.167), ('translation', 0.158), ('bilingual', 0.154), ('indomain', 0.136), ('tms', 0.131), ('simianer', 0.129), ('mixture', 0.121), ('axelrod', 0.107), ('commonalities', 0.107), ('generaldomain', 0.107), ('foster', 0.106), ('webpages', 0.102), ('tgt', 0.093), ('schroeder', 0.086), ('sports', 0.08), ('src', 0.075), ('hg', 0.075), ('monolingual', 0.073), ('translates', 0.07), ('testing', 0.065), ('eck', 0.064), ('dev', 0.064), ('portal', 0.063), ('weights', 0.057), ('selection', 0.056), ('individually', 0.053), ('tuning', 0.052), ('razmara', 0.051), ('feature', 0.049), ('pairwise', 0.049), ('decoders', 0.049), ('ueffing', 0.048), ('chapelle', 0.048), ('ball', 0.048), ('kuhn', 0.048), ('tuned', 0.047), ('hi', 0.045), ('sites', 0.045), ('xxx', 0.045), ('roland', 0.045), ('translations', 0.045), ('jointly', 0.044), ('bleu', 0.044), ('adapted', 0.043), ('nonadapted', 0.043), ('sennrich', 0.043), ('wng', 0.043), ('multitask', 0.043), ('transductive', 0.043), ('koehn', 0.041), ('ranking', 0.041), ('mert', 0.041), ('cui', 0.041), ('decoder', 0.041), ('translate', 0.04), ('sgd', 0.039), ('mismatch', 0.038), ('player', 0.038), ('tune', 0.038), ('wn', 0.037), ('real', 0.037), ('lmi', 0.037), ('dongdong', 0.037), ('kneser', 0.037), ('association', 0.037), ('dredze', 0.036), ('wd', 0.035), ('tailored', 0.035), ('quality', 0.035), ('statistical', 0.035), ('line', 0.035), ('multiple', 0.034), ('crawler', 0.034), ('bins', 0.034), ('zone', 0.034), ('shujie', 0.034), ('george', 0.034), ('republic', 0.034), ('pages', 0.033), ('australia', 0.033), ('prague', 0.033), ('candidates', 0.032), ('conventional', 0.032), ('cleaning', 0.032), ('ando', 0.032), ('moore', 0.032), ('czech', 0.031), ('preference', 0.031), ('box', 0.031), ('loss', 0.03), ('web', 0.03), ('regularization', 0.03), ('sydney', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
2 0.18588884 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
3 0.18124981 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo
Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model us- ing a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.
4 0.15827602 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
5 0.15698604 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
Author: Min Xiao ; Yuhong Guo
Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the loglikelihood of the documents from both language domains under a cross-lingual logbilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the pro- posed cross-lingual adaptation approach.
6 0.14165002 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
7 0.12982298 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
8 0.12683611 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning
9 0.12555031 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
10 0.12145104 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
11 0.11596783 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
12 0.11292446 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
13 0.11177946 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
15 0.10673691 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
16 0.10250905 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
17 0.1008593 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
18 0.094304614 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
19 0.092666514 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering
20 0.092410401 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
topicId topicWeight
[(0, -0.258), (1, -0.217), (2, 0.024), (3, -0.001), (4, 0.175), (5, -0.027), (6, 0.023), (7, 0.065), (8, -0.009), (9, -0.163), (10, 0.028), (11, -0.163), (12, 0.073), (13, -0.162), (14, 0.072), (15, -0.047), (16, -0.059), (17, 0.045), (18, -0.019), (19, 0.063), (20, 0.074), (21, 0.007), (22, 0.002), (23, -0.071), (24, 0.021), (25, 0.03), (26, 0.067), (27, 0.014), (28, -0.136), (29, -0.076), (30, -0.012), (31, -0.071), (32, 0.062), (33, -0.096), (34, -0.003), (35, 0.096), (36, -0.059), (37, -0.112), (38, -0.002), (39, 0.052), (40, -0.034), (41, 0.029), (42, 0.053), (43, 0.012), (44, -0.089), (45, 0.081), (46, 0.062), (47, -0.041), (48, 0.059), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.9573741 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
2 0.68778408 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney
Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
3 0.67912835 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich
Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.
4 0.66632062 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning
Author: Di Wang ; Chenyan Xiong ; William Yang Wang
Abstract: ... might not be generalizable to other domains (Ben-David et al., 2006; Ben-David et al., 2010). Multi-Domain learning (MDL) assumes that the domain labels in the dataset are known. However, when there are multiple metadata attributes available, it is not always straightforward to select a single best attribute for domain partition, and it is possible that combining more than one metadata attributes (including continuous attributes) can lead to better MDL performance. In this work, we propose an automatic domain partitioning approach that aims at providing better domain identities for MDL. We use a supervised clustering approach that learns the domain distance between data instances, and then cluster the data into better domains for MDL. Our experiment on real multi-domain datasets shows that using our automatically generated domain partition improves over popular MDL methods.
5 0.66261029 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation
Author: Ashish Vaswani ; Yinggong Zhao ; Victoria Fossum ; David Chiang
Abstract: We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
7 0.64673519 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
8 0.61708409 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
9 0.61519396 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
10 0.61040002 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation
11 0.57780349 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
12 0.56441075 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
13 0.56025839 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
14 0.52804208 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
15 0.49802145 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
16 0.48582113 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
17 0.48190409 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
18 0.45919216 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
19 0.43104348 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation
20 0.42483863 201 emnlp-2013-What is Hidden among Translation Rules
topicId topicWeight
[(3, 0.012), (10, 0.013), (18, 0.039), (22, 0.463), (30, 0.091), (45, 0.011), (50, 0.014), (51, 0.134), (66, 0.034), (71, 0.018), (75, 0.016), (77, 0.065), (90, 0.015)]
simIndex simValue paperId paperTitle
1 0.943932 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
Author: Tao Ge ; Baobao Chang ; Sujian Li ; Zhifang Sui
Abstract: Since many applications such as timeline summaries and temporal IR involving temporal analysis rely on document timestamps, the task of automatic dating of documents has been increasingly important. Instead of using feature-based methods as conventional models, our method attempts to date documents in a year level by exploiting relative temporal relations between documents and events, which are very effective for dating documents. Based on this intuition, we proposed an eventbased time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. The experiments show that our event-based propagation model can predict document timestamps in high accuracy and the model combined with a MaxEnt classifier outperforms the state-ofthe-art method for this task especially when the size of the training set is small.
2 0.92601383 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI
Author: Om P. Damani ; Shweta Ghonge
Abstract: Two recent measures incorporate the notion of statistical significance in basic PMI formulation. In some tasks, we find that the new measures perform worse than the PMI. Our analysis shows that while the basic ideas in incorporating statistical significance in PMI are reasonable, they have been applied slightly inappropriately. By fixing this, we get new measures that improve performance over not just PMI but on other popular co-occurrence measures as well. In fact, the revised measures perform reasonably well compared with more resource intensive non co-occurrence based methods also.
same-paper 3 0.91077614 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
4 0.89943039 41 emnlp-2013-Building Event Threads out of Multiple News Articles
Author: Xavier Tannier ; Veronique Moriceau
Abstract: We present an approach for building multi-document event threads from a large corpus of newswire articles. An event thread is basically a succession of events belonging to the same story. It helps the reader to contextualize the information contained in a single article, by navigating backward or forward in the thread from this article. A specific effort is also made on the detection of reactions to a particular event. In order to build these event threads, we use a cascade of classifiers and other modules, taking advantage of the redundancy of information in the newswire corpus. We also share interesting comments concerning our manual annotation procedure for building a training and testing set.
5 0.67760974 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.
6 0.67141229 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning
7 0.65731728 118 emnlp-2013-Learning Biological Processes with Global Constraints
8 0.62122095 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding
9 0.61537492 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
10 0.61304593 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
11 0.61208618 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
12 0.60631013 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
13 0.60243088 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
14 0.60180366 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
15 0.60068935 125 emnlp-2013-Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
16 0.58857656 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
17 0.58590049 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
18 0.57363671 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
19 0.57316285 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
20 0.56907213 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction