emnlp emnlp2013 emnlp2013-104 knowledge-graph by maker-knowledge-mining

104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models


Source: pdf

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with both standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We present a very simple and easy to implement method for using these word classes to improve translation quality. [sent-2, score-0.464]

2 It can be applied across different machine translation paradigms and with arbitrary types of models. [sent-3, score-0.363]

3 We show its efficacy on a small German→English and a larger French→German translation task with both standard phrase-based and hierarchical phrase-based translation systems for a common set of models. [sent-4, score-0.341]

4 Our results show that with word class models, the baseline can be improved by up to 1. [sent-5, score-0.202]

5 1 Introduction Data sparsity is one of the major problems for statistical learning methods in natural language processing (NLP) today. [sent-10, score-0.128]

6 One possibility to reduce the sparsity for model estimation is to reduce the vocabulary size. [sent-13, score-0.134]

7 By clustering the vocabulary into a fixed number of word classes, it is possible to train models that are less prone to sparsity issues. [sent-14, score-0.175]

8 This work investigates the performance of standard models used in statistical machine transla- [sent-15, score-0.053]

9 tion when they are trained on automatically learned word classes rather than the actual word identities. [sent-17, score-0.303]

10 In the popular toolkit GIZA++ (Och and Ney, 2003), word classes are an essential ingredient to model alignment probabilities with the HMM or IBM translation models. [sent-18, score-0.555]

11 It contains the mkcls tool (Och, 1999), which can automatically cluster the vocabulary into classes. [sent-19, score-0.168]
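As a hedged illustration of this clustering step, the sketch below wraps a typical mkcls invocation. The flag set mirrors the one commonly used in GIZA++/Moses training pipelines; the file names and the iteration count are placeholders, not taken from the paper.

```python
# Sketch: clustering a corpus vocabulary with GIZA++'s mkcls.
# Assumed flags (common usage): -cN number of classes, -nN optimization
# runs, -p input corpus, -V output word->class mapping, and a trailing
# "opt" to start the optimization.
import subprocess

def run_mkcls(corpus_path: str, classes_path: str, num_classes: int = 100) -> None:
    """Cluster the vocabulary of corpus_path into num_classes classes."""
    subprocess.run(
        ["mkcls", f"-c{num_classes}", "-n10",
         f"-p{corpus_path}", f"-V{classes_path}", "opt"],
        check=True,
    )

# Example with hypothetical file names:
# run_mkcls("train.de", "train.de.classes", num_classes=100)
```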

12 Using this tool, we propose to re-parameterize the standard models used in statistical machine translation (SMT), which are usually conditioned on word identities rather than word classes. [sent-20, score-0.529]

13 The idea is that this should lead to a smoother distribution, which is more reliable due to less sparsity. [sent-21, score-0.037]

14 Here, we focus on the phrasal and lexical channel models in both directions, simple count models identifying frequency thresholds, lexicalized reordering models and an n-gram language model. [sent-22, score-0.226]

15 Although our results show that it is not a good idea to replace the original models, we argue that adding them to the log-linear feature combination can improve translation quality. [sent-23, score-0.25]

16 They can easily be computed for different translation paradigms and arbitrary models. [sent-24, score-0.363]

17 Training and decoding are possible with little or no change to the code base. [sent-25, score-0.036]

18 By using word class models, we can improve our respective baselines by 1. [sent-27, score-0.157]

19 Training an additional language model for trans- [sent-33, score-0.041]

20 lation based on word classes has been proposed in (Wuebker et al. [sent-35, score-0.214]

21 In addition to the reduced sparsity, an advantage of the smaller vocabulary is that longer n-gram context can be modeled efficiently. [sent-38, score-0.059]

22 Mathematically, our idea is equivalent to a special case of the Factored Translation Models proposed by Koehn and Hoang (2007). [sent-39, score-0.039]

23 Also related to our work, Cherry (2013) proposes to parameterize a hierarchical reordering model with sparse features that are conditioned on word classes trained with mkcls. [sent-41, score-0.7]

24 However, the features are trained with MIRA rather than estimated by relative frequencies. [sent-42, score-0.048]

25 1 Standard Models The translation model of most phrase-based and hierarchical phrase-based SMT systems is parameterized by two phrasal and two lexical channel models (Koehn et al. [sent-44, score-0.417]

26 Their counts are extracted heuristically from a word aligned bilingual training corpus. [sent-46, score-0.115]

27 In addition to the four channel models, our baseline contains binary count features that fire if the extraction count of the corresponding phrase pair is greater than or equal to a given threshold τ. [sent-47, score-0.165]
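These indicator features are straightforward to compute. A minimal sketch, assuming one indicator per threshold; the threshold values are illustrative, not the paper's:

```python
# Binary count features: for each phrase pair, one indicator per
# threshold tau that fires when the extraction count reaches tau.
from typing import Dict, List, Tuple

def count_features(counts: Dict[Tuple[str, str], int],
                   thresholds: List[int] = [2, 3, 5]) -> Dict[Tuple[str, str], List[float]]:
    """Map each phrase pair to binary features [count >= tau] per threshold."""
    return {pair: [1.0 if c >= tau else 0.0 for tau in thresholds]
            for pair, c in counts.items()}

# count_features({("das haus", "the house"): 7, ("haus", "home"): 1})
# -> {("das haus", "the house"): [1.0, 1.0, 1.0],
#     ("haus", "home"):          [0.0, 0.0, 0.0]}
```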

28 Our phrase-based baseline contains the hierarchical reordering model (HRM) described by Galley and Manning (2008). [sent-49, score-0.3]

29 , 2012), we apply it in both translation directions with separate scaling factors for the three orientation classes, leading to a total of six feature weights. [sent-51, score-0.3]

30 An n-gram language model (LM) is another important feature of our translation systems. [sent-52, score-0.25]

31 The baselines apply 4-gram LMs trained by the SRILM toolkit (Stolcke, 2002) with interpolated modified Kneser-Ney smoothing (Chen and Goodman, 1998). [sent-53, score-0.048]

32 The smaller vocabulary size allows us to efficiently model larger context, so in addition to the 4-gram LM, we also train a 7-gram LM based on word classes. [sent-54, score-0.1]
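A sketch of how such a class LM could be trained with the SRILM toolkit the text mentions; the flag combination is a common SRILM recipe for interpolated modified Kneser-Ney, and the file names are assumptions:

```python
# Sketch: training the 7-gram word-class LM with SRILM's ngram-count.
# Assumes the input corpus already has words replaced by class IDs;
# -interpolate -kndiscount requests interpolated modified Kneser-Ney.
import subprocess

def train_class_lm(class_corpus: str, lm_out: str, order: int = 7) -> None:
    subprocess.run(
        ["ngram-count",
         "-order", str(order),
         "-text", class_corpus,   # class-mapped training text
         "-lm", lm_out,           # output ARPA-format LM
         "-interpolate", "-kndiscount"],
        check=True,
    )

# train_class_lm("train.classes.txt", "wclm.7gram.arpa")
```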

33 In contrast to an LM of the same size trained on word identities, the increase in computational resources needed for translation is negligible for the 7-gram word class LM (wcLM). [sent-55, score-0.496]

34 2 Training By replacing the words on both source and target side of the training data with their respective word classes and keeping the word alignment unchanged, all of the above models can easily be trained conditioned on word classes by using the same training procedure as usual. [sent-57, score-0.771]
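A minimal sketch of this replacement step, assuming one sentence per line and an mkcls-style "word class" mapping file; the file names and the UNK fallback are illustrative:

```python
# Rewrite one side of the parallel corpus through its word->class map
# (e.g. the -V output of mkcls). The word alignment file stays untouched.
from typing import Dict

def load_class_map(path: str) -> Dict[str, str]:
    classes = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:          # expected format: "word class"
                word, cls = parts
                classes[word] = cls
    return classes

def map_corpus(corpus_in: str, corpus_out: str, classes: Dict[str, str]) -> None:
    with open(corpus_in, encoding="utf-8") as src, \
         open(corpus_out, "w", encoding="utf-8") as out:
        for line in src:
            # Unknown words fall back to a single catch-all class.
            out.write(" ".join(classes.get(w, "UNK") for w in line.split()) + "\n")

# classes = load_class_map("train.de.classes")
# map_corpus("train.de", "train.de.cls", classes)
```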

35 We end up with two separate model files, usually in the form of large tables, one with word identities and one with classes. [sent-58, score-0.131]

36 By walking through both sorted tables simultaneously, we can then efficiently augment the standard model file with an additional feature (or additional features) based on word classes. [sent-60, score-0.181]
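A sketch of that simultaneous walk, under assumed line formats: the word-level table is pre-annotated with its class-mapped key and sorted by it ("clskey ||| src ||| tgt ||| scores"), and the class-level table has one sorted line "clskey ||| score" per key:

```python
# Linear merge pass over two sorted model files: attach the class-based
# score to each word-level phrase table entry.
from typing import Iterator, List

def fields(path: str) -> Iterator[List[str]]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield [part.strip() for part in line.split("|||")]

def merge_augment(word_table: str, class_table: str, out_path: str) -> None:
    cls = fields(class_table)
    cls_key, cls_score = next(cls)
    with open(out_path, "w", encoding="utf-8") as out:
        for key, src, tgt, scores in fields(word_table):
            while cls_key < key:                 # advance the class table pointer
                cls_key, cls_score = next(cls)
            assert cls_key == key, "every word-level key needs a class entry"
            out.write(f"{src} ||| {tgt} ||| {scores} {cls_score}\n")
```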

37 The word class LM is directly passed on to the decoder. [sent-61, score-0.157]

38 3 Decoding The decoder searches for the best translation given a set of models $h_m(e_1^I, s_1^K, f_1^J)$ by maximizing the log-linear feature score (Och and Ney, 2004): $\hat{e}_1^{\hat{I}} = \operatorname*{argmax}_{I,\,e_1^I} \sum_{m=1}^{M} \lambda_m h_m(e_1^I, s_1^K, f_1^J)$ (1), where $f_1^J = f_1 \ldots f_J$ denotes the source sentence. [sent-63, score-0.25]
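A worked sketch of the decision rule in Eq. (1) over a toy candidate list; the feature values and weights below are invented for illustration only:

```python
# Log-linear model combination: pick the candidate maximizing the
# weighted sum of feature scores.
from typing import Dict, List

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    return sum(weights[name] * value for name, value in features.items())

def best_translation(candidates: List[Dict], weights: Dict[str, float]) -> str:
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))["text"]

# candidates = [
#     {"text": "the house", "features": {"tm": -1.2, "lm": -3.4, "wclm": -2.0}},
#     {"text": "the home",  "features": {"tm": -1.5, "lm": -3.1, "wclm": -2.6}},
# ]
# weights = {"tm": 1.0, "lm": 0.8, "wclm": 0.3}
# best_translation(candidates, weights)  # -> "the house" (-4.52 vs -4.76)
```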

39 All the above mentioned models can easily be integrated into this framework as additional features hm. [sent-73, score-0.041]

40 Both the dev and the test set are composed of a mixture of broadcast news and broadcast conversations crawled from the web and have two references. [sent-78, score-0.12]

41 2 Setup In the French→German task, our baseline is a standard phrase-based system augmented with the hierarchical reordering model (HRM) described in Section 2. [sent-91, score-0.202]

42 The language model is a 4-gram LM trained on all German monolingual sources provided for WMT 2012. [sent-93, score-0.048]

43 For the class-based models, we run mkcls on the source and target side of the bilingual training data to cluster the vocabulary into 100 classes each. [sent-94, score-0.567]

44 This clustering is used to train the models described above for word classes on the same training data as their counterparts based on word identity. [sent-95, score-0.255]

45 This also holds for the wcLM, which is a 4-gram LM trained on the same data as the baseline LM. [sent-96, score-0.093]

46 Further, the smaller vocabulary allows us to build an additional wcLM with a 7-gram context length. [sent-97, score-0.1]

47 On this task we also run additional experiments with 200 and 500 classes. [sent-98, score-0.041]

48 As bilingual training data we use the TED talks, which we cluster into 100 classes on both source and target side. [sent-101, score-0.379]

49 The 4-gram LM is trained on the TED, Europarl and news-commentary corpora. [sent-102, score-0.048]

50 Our experiments are conducted with the open source toolkit Jane (Wuebker et al. [sent-108, score-0.048]

51 Results marked with ‡ are statistically significant with 95% confidence, results marked with † with 90% confidence. [sent-113, score-0.516]

52 -X+wcX denotes the systems where the model X in the baseline is replaced by its word class counterpart. [sent-114, score-0.202]

53 wcModelsX denotes all word class models trained on X classes. [sent-116, score-0.205]

54 For a first set of experiments we replaced one of the standard TM, LM and HRM models by the same model based on word classes. [sent-119, score-0.041]

55 The strongest degradation can be seen when replacing the TM, while replacing the HRM only leads to a small drop in performance. [sent-121, score-0.106]

56 However, when the word class models are added as additional features to the baseline, we observe improvements. [sent-122, score-0.198]

57 Extending the context length of the wcLM to 7-grams gives an additional boost, reaching a total gain over the baseline of 1. [sent-130, score-0.086]

58 Using 200 classes instead of 100 seems to perform slightly better on test, but with 500 classes, translation quality degrades again. [sent-133, score-0.423]

59 Here we are able to improve over the phrase-based baseline by 0. [sent-135, score-0.045]

60 Results marked with ‡ are statistically significant with 95% confidence, results marked with † with 90% confidence. [sent-139, score-0.516]

61 Here, the surface form of the source word is analyzed to produce the factors, which are then translated and finally the surface form of the target word is generated from the target factors. [sent-147, score-0.22]

62 Although the translations of the factors operate on the same phrase segmentation, they are assumed to be independent. [sent-148, score-0.101]

63 In practice this is done by phrase expansion, which generates a joint phrase table as the cross product from the phrase tables of the individual factors. [sent-149, score-0.215]

64 In contrast, in this work each word is mapped to a single class, which means that when we have selected a translation option for the surface form, the target side on the word class level is predetermined. [sent-150, score-0.493]

65 Thus, no phrase expansion or generation steps are necessary to incorporate the word class information. [sent-151, score-0.208]

66 The phrase table can simply be extended with additional scores, keeping the set of phrases constant. [sent-152, score-0.092]

67 Although the implementation is simpler, our approach is mathematically equivalent to a special case of the factored translation framework, which is shown in Figure 1. [sent-153, score-0.475]

68 The generation step from target word e to its target class c(e) assigns all probability (Figure 1: The factored translation model equivalent to our approach.) [sent-154, score-0.493]

69 The generation step assigns all probability mass to a single event: $p_{\text{gen}}(c(e) \mid e) = 1$. [sent-155, score-0.127]

70 mass to a single event: $p_{\text{gen}}(c \mid e) = \begin{cases} 1, & \text{if } c = c(e) \\ 0, & \text{else} \end{cases}$ (2). 5 Conclusion We have presented a simple and very easy to implement method to make use of word clusters for improving machine translation quality. [sent-156, score-0.369]

71 It is applicable across different paradigms and for arbitrary types of models. [sent-157, score-0.113]

72 Depending on the model type, it requires little or no change to the training and decoding software. [sent-158, score-0.036]

73 We have shown the efficacy of this method on two translation tasks and with both the standard phrase-based and the hierarchical phrase-based translation paradigm. [sent-159, score-0.648]

74 It was applied to relative frequency translation probabilities, the n-gram language model and a hierarchical reordering model. [sent-160, score-0.505]

75 In our experiments, the baseline is improved by 1. [sent-161, score-0.045]

76 Intuitively, it should be most effective for morphologically rich languages, which naturally have stronger sparsity problems. [sent-168, score-0.075]

77 In Proceedings of the 7th Workshop on Statistical Machine Translation, WMT '12, pages 200–209, Montréal, Canada. [sent-180, score-0.04]

78 In The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), pages 22– 3 1, Atlanta, Georgia, USA, June. [sent-184, score-0.04]

79 In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 847–855, Honolulu, Hawaii, USA, October. [sent-189, score-0.04]

80 In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 868–876, Prague, Czech Republic, June. [sent-193, score-0.04]

81 In Proceedings of the 2003 Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-03), pages 127–133, Edmonton, Alberta. [sent-201, score-0.04]

82 on Empirical Methods for Natural Language Processing (EMNLP), pages 388–395, Barcelona, Spain, July. [sent-207, score-0.04]

83 An efficient method for determining bilingual word classes. [sent-224, score-0.115]

84 Chapter of the Association for Computational Linguistics, pages 71–76, Bergen, Norway, June. [sent-228, score-0.04]

85 of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pages 160–167, Sapporo, Japan, July. [sent-233, score-0.04]

86 In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July. [sent-237, score-0.04]

87 In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, pages 223–231, Cambridge, Massachusetts, USA, August. [sent-241, score-0.04]

88 on Speech and Language Processing (ICSLP), volume 2, pages 901–904, Denver, CO, September. [sent-248, score-0.04]

89 Jane: Open source hierarchical translation, extended with reordering and lexicon models. [sent-251, score-0.303]

90 In ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and Metrics MATR, pages 262–270, Uppsala, Sweden, July. [sent-252, score-0.04]

91 Jane 2: Open source phrase-based and hierarchical statistical machine translation. [sent-255, score-0.199]

92 In International Conference on Computational Linguistics, pages 483–491, Mumbai, India, December. [sent-256, score-0.04]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('german', 0.284), ('translation', 0.25), ('bleu', 0.236), ('wclm', 0.224), ('lm', 0.221), ('ter', 0.196), ('french', 0.173), ('classes', 0.173), ('hrm', 0.171), ('reordering', 0.157), ('wuebker', 0.15), ('mkcl', 0.129), ('wctm', 0.129), ('factored', 0.118), ('class', 0.116), ('och', 0.103), ('jane', 0.102), ('hierarchical', 0.098), ('koehn', 0.097), ('identities', 0.09), ('clotsn', 0.086), ('cwoitnhfid', 0.086), ('huck', 0.086), ('mediani', 0.086), ('mfidaernkcede', 0.086), ('mrear', 0.086), ('pgen', 0.086), ('skteadtis', 0.086), ('swigitnhi', 0.086), ('vilar', 0.086), ('wticiathll', 0.086), ('hermann', 0.085), ('iwslt', 0.082), ('wmt', 0.076), ('sparsity', 0.075), ('aachen', 0.075), ('joern', 0.075), ('peitz', 0.075), ('bilingual', 0.074), ('channel', 0.069), ('snover', 0.068), ('mathematically', 0.068), ('paradigms', 0.067), ('matthias', 0.064), ('tables', 0.062), ('cherry', 0.062), ('hoang', 0.062), ('broadcast', 0.06), ('vocabulary', 0.059), ('conditioned', 0.054), ('alignment', 0.054), ('statistical', 0.053), ('replacing', 0.053), ('phrase', 0.051), ('efficacy', 0.05), ('factors', 0.05), ('josef', 0.05), ('trained', 0.048), ('source', 0.048), ('ney', 0.047), ('phrasebased', 0.047), ('franz', 0.046), ('programme', 0.046), ('arbitrary', 0.046), ('baseline', 0.045), ('target', 0.045), ('tm', 0.045), ('srilm', 0.045), ('english', 0.042), ('stephan', 0.042), ('ted', 0.042), ('word', 0.041), ('ha', 0.041), ('thresholds', 0.041), ('mass', 0.041), ('additional', 0.041), ('pages', 0.04), ('cluster', 0.039), ('equivalent', 0.039), ('galley', 0.038), ('papineni', 0.038), ('colin', 0.038), ('ingredient', 0.037), ('smoother', 0.037), ('felix', 0.037), ('rwth', 0.037), ('niehues', 0.037), ('gone', 0.037), ('hm', 0.037), ('fionr', 0.037), ('thge', 0.037), ('americas', 0.037), ('eifls', 0.037), ('matr', 0.037), ('micciulla', 0.037), ('mxm', 0.037), ('walking', 0.037), ('usa', 0.036), ('smt', 0.036), ('decoding', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with both standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.

2 0.23646581 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

3 0.18596146 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

4 0.18588884 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou

Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.

5 0.18251355 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

6 0.17707688 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

7 0.16362655 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

8 0.16301337 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

9 0.16022897 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

10 0.15570997 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

11 0.14599204 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

12 0.13761781 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

13 0.12378244 201 emnlp-2013-What is Hidden among Translation Rules

14 0.12280197 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

15 0.11532962 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

16 0.11433355 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

17 0.11418961 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

18 0.10858347 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation

19 0.10686928 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

20 0.10527653 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.275), (1, -0.345), (2, 0.1), (3, 0.046), (4, 0.153), (5, -0.096), (6, -0.07), (7, -0.021), (8, -0.006), (9, -0.101), (10, 0.006), (11, 0.008), (12, 0.05), (13, -0.157), (14, -0.042), (15, 0.043), (16, 0.037), (17, -0.032), (18, -0.018), (19, 0.075), (20, 0.023), (21, 0.083), (22, -0.05), (23, 0.044), (24, -0.014), (25, 0.054), (26, -0.031), (27, -0.031), (28, -0.055), (29, -0.012), (30, 0.092), (31, 0.046), (32, 0.014), (33, 0.01), (34, 0.001), (35, 0.062), (36, -0.006), (37, -0.072), (38, 0.003), (39, 0.041), (40, 0.028), (41, 0.061), (42, 0.075), (43, 0.045), (44, 0.007), (45, 0.065), (46, -0.069), (47, 0.005), (48, -0.062), (49, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95883811 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger F ornenc ah s→mGalelrm Gaenrm mtarann→slEatniognli tsahsk a nwdit ha lbaortghe rst Farnednacrhd→ phrase-based salandti nhie traaskrch wiciathl phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU aonnd t h1e .1 F%re nTcEhR→ on tehrem German→English Btask.

2 0.80938393 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

Author: Kevin Gimpel ; Dhruv Batra ; Chris Dyer ; Gregory Shakhnarovich

Abstract: This paper addresses the problem of producing a diverse set of plausible translations. We present a simple procedure that can be used with any statistical machine translation (MT) system. We explore three ways of using diverse translations: (1) system combination, (2) discriminative reranking with rich features, and (3) a novel post-editing scenario in which multiple translations are presented to users. We find that diversity can improve performance on these tasks, especially for sentences that are difficult for MT.

3 0.75655389 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

4 0.73829406 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

Author: Xiaoning Zhu ; Zhongjun He ; Hua Wu ; Haifeng Wang ; Conghui Zhu ; Tiejun Zhao

Abstract: This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a

5 0.72553682 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

Author: Ming Tan ; Tian Xia ; Shaojun Wang ; Bowen Zhou

Abstract: MIRA based tuning methods have been widely used in statistical machine translation (SMT) system with a large number of features. Since the corpus-level BLEU is not decomposable, these MIRA approaches usually define a variety of heuristic-driven sentencelevel BLEUs in their model losses. Instead, we present a new MIRA method, which employs an exact corpus-level BLEU to compute the model loss. Our method is simpler in implementation. Experiments on Chinese-toEnglish translation show its effectiveness over two state-of-the-art MIRA implementations.

6 0.72157866 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

7 0.69563204 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

8 0.67980373 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

9 0.65924627 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

10 0.65729505 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

11 0.65710557 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

12 0.63937116 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

13 0.62438762 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

14 0.61564881 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

15 0.59270215 55 emnlp-2013-Decoding with Large-Scale Neural Language Models Improves Translation

16 0.56688136 156 emnlp-2013-Recurrent Continuous Translation Models

17 0.55668223 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

18 0.5464251 201 emnlp-2013-What is Hidden among Translation Rules

19 0.52926278 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

20 0.52243054 171 emnlp-2013-Shift-Reduce Word Reordering for Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.018), (10, 0.363), (18, 0.04), (22, 0.051), (30, 0.099), (45, 0.013), (50, 0.02), (51, 0.132), (66, 0.044), (71, 0.019), (75, 0.038), (77, 0.094)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7319032 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger F ornenc ah s→mGalelrm Gaenrm mtarann→slEatniognli tsahsk a nwdit ha lbaortghe rst Farnednacrhd→ phrase-based salandti nhie traaskrch wiciathl phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU aonnd t h1e .1 F%re nTcEhR→ on tehrem German→English Btask.

2 0.6424771 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

Author: Oier Lopez de Lacalle ; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.

3 0.63666505 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts

Author: Morgane Ciot ; Morgan Sonderegger ; Derek Ruths

Abstract: While much work has considered the problem of latent attribute inference for users of social media such as Twitter, little has been done on non-English-based content and users. Here, we conduct the first assessment of latent attribute inference in languages beyond English, focusing on gender inference. We find that the gender inference problem in quite diverse languages can be addressed using existing machinery. Further, accuracy gains can be made by taking language-specific features into account. We identify languages with complex orthography, such as Japanese, as difficult for existing methods, suggesting a valuable direction for future research.

4 0.50448084 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: Different demographics, e.g., gender or age, can demonstrate substantial variation in their language use, particularly in informal contexts such as social media. In this paper we focus on learning gender differences in the use of subjective language in English, Spanish, and Russian Twitter data, and explore cross-cultural differences in emoticon and hashtag use for male and female users. We show that gender differences in subjective language can effectively be used to improve sentiment analysis, and in particular, polarity classification for Spanish and Russian. Our results show statistically significant relative F-measure improvement over the gender-independent baseline 1.5% and 1% for Russian, 2% and 0.5% for Spanish, and 2.5% and 5% for English for polarity and subjectivity classification.

5 0.48762679 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

Author: Jesus Gonzalez-Rubio ; Daniel Ortiz-Martinez ; Jose-Miguel Benedi ; Francisco Casacuberta

Abstract: Current automatic machine translation systems are not able to generate error-free translations and human intervention is often required to correct their output. Alternatively, an interactive framework that integrates the human knowledge into the translation process has been presented in previous works. Here, we describe a new interactive machine translation approach that is able to work with phrase-based and hierarchical translation models, and integrates error-correction all in a unified statistical framework. In our experiments, our approach outperforms previous interactive translation systems, and achieves estimated effort reductions of as much as 48% relative over a traditional post-edition system.

6 0.48546261 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

7 0.47931984 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

8 0.47776383 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

9 0.47395861 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

10 0.46922627 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

11 0.46892118 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

12 0.46561351 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

13 0.46319428 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

14 0.46247128 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

15 0.46097502 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

16 0.45486951 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

17 0.44984901 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

18 0.44822791 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

19 0.44701433 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

20 0.44691154 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding