emnlp emnlp2011 emnlp2011-125 knowledge-graph by maker-knowledge-mining

125 emnlp-2011-Statistical Machine Translation with Local Language Models


Source: pdf

Author: Christof Monz

Abstract: Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.

Reference: text


Summary: the most important sentences generated by a tf-idf model

sentIndex sentText sentNum sentScore
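The per-sentence scores below could be produced by a scheme like the following sketch, which treats every sentence as its own document and averages tf-idf weights over its terms. The actual scorer behind this page is not specified, so the function name and the exact weighting are assumptions:

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the average tf-idf weight of its terms,
    treating every sentence as its own document (an assumed scheme)."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency of each term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        weight = sum(c * math.log(n / df[w]) for w, c in tf.items())
        scores.append(weight / max(len(doc), 1))
    return scores
```

Sentences containing rarer terms receive higher average weights, which matches how the extracted sentences above were ranked.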

1 Abstract Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. [sent-5, score-0.367]

2 We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. [sent-7, score-0.228]

3 The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. [sent-8, score-0.518]

4 We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model. [sent-10, score-0.41]

5 1 Introduction Language models are an important component of current statistical machine translation systems. [sent-11, score-0.309]

6 They affect the selection of phrase translation candidates and reordering choices by estimating the probability that an application of a phrase translation is a fluent continuation of the current translation hypothesis. [sent-12, score-0.756]

7 The size and domain of the language model can have a significant impact on translation quality. [sent-13, score-0.192]

8 (2007) have shown that each doubling of the training data from the news domain (used to build the language model), leads to improvements of approximately 0. [sent-15, score-0.298]

9 On the other hand, each doubling using general web data leads to improvements of approximately 0. [sent-17, score-0.298]

10 While large n-gram language models do lead to improved translation quality, they still lack any generalization beyond the surface forms (Schwenk, 2007). [sent-19, score-0.369]

11 Consider example (1), which is a short sentence fragment from the MT09 Arabic-English test set, with the corresponding machine translation output (1. [sent-20, score-0.274]

12 b), from a phrase-based statistical machine ظ translation system, and reference translation (1. [sent-21, score-0.464]

13 A straightforward approach to address this is to exploit the part-of-speech (POS) tags of the target words during translation (Kirchhoff and Yang, 2005). [sent-74, score-0.264]

14 In this paper, we introduce a novel approach that builds and uses individual, local POS language models for each word in the vocabulary. [sent-84, score-0.302]

15 Our experiments show that it leads to statistically significant improvements over a competitive baseline, using lexicalized reordering and a sizable 5-gram word language model, as well as a standard 7-gram POS language model approach. [sent-85, score-0.457]

16 While word-based models estimate the probability of a string of m words by Equation 2, POS-based models estimate the probability of a string of m POS tags by Equation 3. [sent-89, score-0.318]
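Equations 2 and 3 instantiate the same chain-rule decomposition, once over surface words and once over POS tags. A minimal sketch, with add-one smoothing standing in for the paper's actual estimator (all names here are illustrative):

```python
import math
from collections import Counter

def train_ngrams(sequences, n):
    """Count n-grams and their contexts over token sequences
    (words for Equation 2, POS tags for Equation 3)."""
    counts, context_counts = Counter(), Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] * (n - 1) + list(seq)
        vocab.update(seq)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return counts, context_counts, len(vocab) + 1  # +1 for unseen tokens

def ngram_logprob(seq, n, counts, context_counts, vocab_size):
    """Chain-rule log probability of a sequence; add-one smoothing is
    only a stand-in for the smoothing actually used in the paper."""
    logp = 0.0
    padded = ["<s>"] * (n - 1) + list(seq)
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        p = (counts[context + (padded[i],)] + 1) / (context_counts[context] + vocab_size)
        logp += math.log(p)
    return logp
```

The same two functions serve both equations; only the token alphabet changes (surface forms vs. POS tags).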

17 To deploy POS language models in machine translation, translation candidates need to be annotated with POS tags. [sent-100, score-0.309]

18 For machine translation one can sum over all possible tag sequences, as in Equation 4. [sent-103, score-0.301]

19 2 Effectiveness of POS Language Models Reported results on the effectiveness of POS language models for machine translation are mixed, in particular when translating into languages that are not morphologically rich, such as English. [sent-107, score-0.352]

20 While they rarely seem to hurt translation quality, there does not seem to be a clear consensus that they significantly improve quality either. [sent-108, score-0.192]

21 86 BLEU points for German-to-English translation for small training data. [sent-110, score-0.192]

22 (2007) report improvements when comparing a supertag language model to a baseline using a word language model only. [sent-114, score-0.251]

23 Once the baseline incorporates lexicalized distortion (Tillmann, 2004; Koehn et al. [sent-115, score-0.262]

24 Factored language models have not resulted in significant improvements either. [sent-117, score-0.261]

25 Kirchhoff and Yang (2005) report slight improvements when re-ranking the n-best lists of their decoder, which uses word tri-grams. [sent-118, score-0.187]

26 But these improvements are less than those gained by re-ranking the n-best lists with a 4-gram word language model. [sent-119, score-0.233]

27 The impact of POS language models depends among other things on the size of the parallel corpus, the size and order of the word language model, and whether lexicalized distortion models are used. [sent-120, score-0.346]

28 To gauge the potential effectiveness of POS language models without taking into consideration all these factors, we isolate the contribution of the language model by simulating machine translation output using English data only (Al-Onaizan and Papineni, 2006; Post and Gildea, 2008). [sent-121, score-0.39]

29 1 On the other hand, local language models alone (as introduced in Section 3) correlate with BLEU only slightly worse than surface models. [sent-133, score-0.361]

30 The BLEU scores in Table 1 … (Footnote 1: Interpolating both models does not lead to further correlation improvements.) [sent-135, score-0.176]

31 These system-agnostic correlation results look promising for our local models and the endto-end translation results in Section 5 confirm these initial findings. [sent-142, score-0.552]

32 Instead of using one global POS language model that is built from an entire mono-lingual corpus in the target language, we build individual models, or local models, for each word-POS pair using the POS tags surrounding each occurrence of that pair. [sent-144, score-0.354]

33 The effect is that the resulting ngram probability distributions of each local model are more biased towards the contextual constraints of each individual word-POS pair. [sent-146, score-0.277]

34 1 Definition of Local Language Models Each conditional probability of order n in a local model for the word-POS pair w:t is of the form: p_{w:t}(t_n : p_n | t_1 : p_1, …, t_{n−1} : p_{n−1}), where each t_k is a POS tag and each p_k its position relative to the occurrence of w:t. [sent-149, score-0.277]

35 The conditional local n-gram probabilities (a–d) are generated from the occurrence of the word told with POS tag VBD. [sent-154, score-0.386]

36 For each local model we use a sliding window considering all n-grams of length n starting n words to the left and ending n words to the right of an occurrence of the word-POS pair of the model at hand. [sent-161, score-0.282]
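The sliding-window collection can be sketched roughly as follows. This is a simplification: the paper's local models pair each tag with its relative position, while this sketch collects plain POS n-grams, and the exact window bounds are an assumption:

```python
from collections import defaultdict

def collect_local_ngrams(tagged_sentences, n):
    """For every word-POS pair w:t, collect POS n-grams of length n from
    a window around each occurrence of w:t (simplified: relative
    positions are dropped, window bounds assumed)."""
    local_grams = defaultdict(list)
    for sent in tagged_sentences:            # sent = [(word, tag), ...]
        tags = [t for _, t in sent]
        for i, (w, t) in enumerate(sent):
            lo = max(0, i - n)               # up to n positions left
            hi = min(len(sent), i + n + 1)   # up to n positions right
            window = tags[lo:hi]
            for j in range(len(window) - n + 1):
                local_grams[(w, t)].append(tuple(window[j:j + n]))
    return local_grams
```

Each anchor pair thus accumulates the n-grams that span or neighbour its occurrences, which is what biases the local distributions toward that pair's contexts.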

37 All local model probabilities are smoothed using Witten-Bell smoothing and interpolation. [sent-162, score-0.27]
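Interpolated Witten-Bell smoothing can be sketched as below. This is the generic formulation, not SRILM's implementation; the uniform base distribution at the bottom of the recursion is an assumption of the sketch:

```python
from collections import Counter, defaultdict

def train_wb(sequences, n):
    """Collect what Witten-Bell needs: counts per (context, word) and
    the set of distinct continuations per context, for all orders."""
    counts = Counter()             # (context_tuple, word) -> count
    followers = defaultdict(set)   # context_tuple -> distinct next words
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for i in range(len(seq)):
            for k in range(n):     # contexts of length 0 .. n-1
                if i - k < 0:
                    break
                h = tuple(seq[i - k:i])
                counts[(h, seq[i])] += 1
                followers[h].add(seq[i])
    return counts, followers, len(vocab)

def witten_bell_prob(word, context, counts, followers, vocab_size):
    """Interpolated Witten-Bell p(word | context): recurse to shorter
    contexts, bottoming out in a uniform distribution over the vocab."""
    h = tuple(context)
    c_h = sum(c for (hh, _), c in counts.items() if hh == h)
    t_h = len(followers[h])        # number of distinct continuations
    if not h:
        lower = 1.0 / vocab_size   # uniform base (assumed)
    else:
        lower = witten_bell_prob(word, h[1:], counts, followers, vocab_size)
    if c_h == 0:
        return lower
    lam = c_h / (c_h + t_h)        # Witten-Bell interpolation weight
    return lam * counts[(h, word)] / c_h + (1 - lam) * lower
```

The interpolation weight shrinks as a context exhibits more distinct continuations, which is exactly the behaviour that makes Witten-Bell usable when counts-of-counts are incomplete.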

38 3 A local model of order n contains the conditional probabilities for words occurring at relative positions -1, +1, . [sent-165, score-0.273]

39 Therefore the probability of a word occurrence is estimated by all local models covering this word’s position. [sent-169, score-0.405]

40 , the probability of word wi+2 is based on the probability of the local model for wi+1, wi, wi−1, and wi−2 (the last two are not shown in Figure 2 for space reasons). [sent-173, score-0.326]

41. …, t_{i−2} : −2) · ∏_{j=0}^{n−1} p_{w_{i−n+j} : t_{i−n+j}}(t_i : n−j | H_{i,n}[j, ·]) (5) (Footnote 2: The smaller event space of local models often leads to incomplete counts-of-counts, preventing the use of Kneser-Ney smoothing (Chen and Goodman, 1999).) [sent-178, score-0.411]

42 Figure 2: Schema of overlapping local language model applications. [sent-192, score-0.228]

43 Each row j of H_{i,n} represents the history of the conditional probability belonging to the local model associated with position i − n + j. [sent-194, score-0.318]

44 The example in Figure 3 shows how tri-gram local language models are used word-by-word to compute the probability of a whole sentence. [sent-218, score-0.351]
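A heavily simplified version of this overlapping-model scoring (in the spirit of Equation 5) might look like the following. Representing each local model as a plain dict and backing off to a fixed fallback probability are both assumptions of this sketch:

```python
import math

def sentence_logprob(tagged, local_models, n, fallback=1e-6):
    """At every position i, combine the estimates of the local models
    anchored at the surrounding word-POS pairs. Each local model is a
    dict from (POS-context, tag) to a probability; unseen events get a
    small fallback mass."""
    logp = 0.0
    for i in range(len(tagged)):
        context = tuple(t for _, t in tagged[max(0, i - n + 1):i])
        tag_i = tagged[i][1]
        for j in range(max(0, i - n + 1), min(len(tagged), i + n)):
            if j == i:
                continue
            model = local_models.get(tagged[j], {})  # neighbour's local model
            logp += math.log(model.get((context, tag_i), fallback))
    return logp
```

Every word is thus scored by all local models whose windows cover its position, mirroring the overlap shown in Figure 2.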

45 Our local language model approach also bears some resemblance to statistical approaches to modeling subcategorization frames (Manning, 1993). [sent-219, score-0.347]

46 2 Building Local Language Models To build the local language models, we use the SRILM toolkit (Stolcke, 2002), which is commonly applied in speech recognition and statistical machine translation. [sent-222, score-0.271]

47 Often this is due to the computational (and implementational) complexities of integrating more complex language models with the decoder, although it is expected that a tighter integration with the decoder itself leads to better improvements than n-best list re-ranking. [sent-234, score-0.482]

48 Integrating our local language modeling approach with a decoder is straightforward. [sent-235, score-0.362]

49 Since SRILM supports arbitrarily many language models, local language models can be added using the same functionalities of SRILM’s API. [sent-237, score-0.302]

50 For the experiments discussed in Section 4, we add about 150,000 local language models to the word model. [sent-238, score-0.302]

51 All local language model probabilities are coupled with the same feature weight. [sent-239, score-0.228]

52 Potentially, improvements could be gained from using separate weights for individual local models, but this would require an optimization procedure such as MIRA (Chiang et al. [sent-240, score-0.461]

53 4 Experimental Setup Three approaches are compared in our experiments: the baseline system is a phrase-based statistical machine translation system (Koehn et al. [sent-245, score-0.299]

54 The third approach represents the work described in this paper, extending the baseline by including 4-gram local language models. [sent-249, score-0.292]

55 Only resources allowed under NIST’s constrained data conditions are used to train the language, translation, and lexicalized distortion models. [sent-252, score-0.198]

56 To see whether our local language models result in improvements over a competitive baseline, we designed the baseline to use a large 5-gram word language model and lexicalized distortion modeling, both of which are known to cancel out improvements gained from POS language models (Birch et al. [sent-253, score-0.917]

57 The 7-gram POS and 4-gram local language models were both trained on the POS tagged English side of the bitext and 10M sentences from Gigaword’s Xinhua and AFP sections. [sent-261, score-0.354]

58 The data for building the translation models were primarily drawn from the parallel news resources distributed by the Linguistic Data Consortium (LDC). [sent-262, score-0.266]

59 , 2001), version 13a, where the brevity penalty is based on the reference translation with the closest length, and translation error rate (TER) version 0. [sent-279, score-0.461]
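The closest-reference brevity penalty mentioned above can be sketched as follows. Breaking length ties toward the shorter reference is an assumption here; mteval v13a's exact tie-breaking should be checked against the script:

```python
import math

def brevity_penalty(candidate_len, reference_lens):
    """BLEU brevity penalty using the reference whose length is closest
    to the candidate; ties are broken toward the shorter reference
    (assumed behaviour)."""
    r = min(reference_lens, key=lambda rl: (abs(rl - candidate_len), rl))
    if candidate_len == 0:
        return 0.0
    if candidate_len >= r:
        return 1.0
    return math.exp(1.0 - r / candidate_len)
```

With the closest reference chosen per segment, a candidate is only penalised when it is shorter than that reference, not shorter than the average of all references.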

60 To see whether the differences between the approaches we compared in our experiments are statistically significant, we apply approximate randomization (Noreen, 1989); Riezler and Maxwell (2005) have shown that approximate randomization is less sensitive to Type-I errors, i. [sent-285, score-0.178]
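Approximate randomization, as cited above, can be sketched like this: per-segment outputs are shuffled between the two systems and the shuffled score difference is compared against the observed one. The aggregate metric is passed in (for instance a mean of segment-level scores; the real comparison would use document- or corpus-level BLEU/TER):

```python
import random

def approximate_randomization(scores_a, scores_b, metric, trials=1000, seed=0):
    """Approximate randomization test (Noreen, 1989): swap the two
    systems' outputs per segment with probability 0.5 and count how
    often the shuffled score difference reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(metric(scores_a) - metric(scores_b))
    hits = 0
    for _ in range(trials):
        a, b = [], []
        for x, y in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a.append(x)
                b.append(y)
            else:
                a.append(y)
                b.append(x)
        if abs(metric(a) - metric(b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # smoothed p-value estimate
```

Because only the pairing of outputs to systems is permuted, the test makes no distributional assumptions about the metric, which is why it is less prone to Type-I errors than bootstrap resampling in this setting.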

61 Comparison of our approach (+locLM, rows 4a/b) to the baseline using a word language model (wordLM, rows 1a/b) and a competing approach using a POS-based language model (+posLM, rows 2a/b). [sent-293, score-0.427]

62 Rows 3a/b show the relative improvements over the baseline. [sent-303, score-0.187]

63 The third approach ‘+locLM’ (rows 4a/b) uses local language models in addition to the baseline’s word-based model. [sent-304, score-0.302]

64 Rows 5a/b show the relative improvements of the local modeling approach over the baseline and rows 6a/b the improvements over the approach using a POS language model. [sent-306, score-0.852]

65 The approach using a POS language model results in statistically significant improvements for only one test set (MT05) and the newswire documents of MT09. [sent-308, score-0.283]

66 The average improvements across all sets and genres are negligible (+0. [sent-309, score-0.297]

67 Our local language modeling approach achieves the highest BLEU scores for all test sets and across all genres. [sent-311, score-0.293]

68 With the exception of MT08-WB and MT09-WB all BLEU improvements over the baseline are statistically significant. [sent-314, score-0.297]

69 When evaluating with 1-TER, local language modeling also achieves the best results, with the exception of MT06, where the POS language model approach performs slightly better. [sent-315, score-0.293]

70 Turning to the Chinese-English results in Table 3, we see similar improvements in BLEU. [sent-316, score-0.187]

71 The improvements of using a POS language model are negligible (+0. [sent-317, score-0.253]

72 Here as well, local language modeling leads to the best results, with substantial improvements of +0. [sent-319, score-0.547]

73 The major difference between Arabic-English and Chinese-English is the discrepancy between BLEU score improvements and decreases in 1-TER. [sent-321, score-0.225]

74 While we cannot explain this discrepancy, it is worth noting that similar discrepancies between BLEU and TER for Arabic-to-English and Chinese-to-English translation can be found in the literature. [sent-322, score-0.192]

75 for Arabic-to-English on the MT06 and MT08 sets, but for Chinese-to-English the correlation seems to be much weaker and BLEU improvements of +0. [sent-328, score-0.245]

76 One of the motivations of using POS language models in general, and local language models in our case, is to improve the fluency of translations, which should be reflected in increased precision for higher-order n-grams. [sent-331, score-0.376]

77 Table 4 shows that this is the case when comparing local modeling to both word and POS language models for Arabic-to-English translation. [sent-332, score-0.367]

78 The effectiveness of a POS language model often diminishes with improved translation quality of the base system to which it is added. [sent-345, score-0.284]

79 Naturally, we are interested in the extent to which this diminishing effect also holds for our local language models. [sent-346, score-0.228]

80 Nevertheless, we can gauge this by taking a closer look at the distribution of improvements within our experiments. [sent-348, score-0.225]

81 Figure 4 shows performance improvements in document-level BLEU for both language pairs. [sent-349, score-0.187]

82 The document-level BLEU score for the baseline system is plotted on the x-axis and improvements are plotted on the y-axis. [sent-350, score-0.251]

83 If the effectiveness of either added model (POS or local) diminishes with increasing translation quality, we would expect a declining regression line. [sent-352, score-0.335]

84 Relative im- provements for both added models increase as the translation quality of the baseline increases. [sent-354, score-0.33]

85 The slope of both regression fits is almost identical, but the y-intercept is larger for our local modeling approach. [sent-355, score-0.417]

86 Both models seem to help more for documents with lower baseline translation quality. [sent-358, score-0.33]

87 For the local language model, the regression line intersects with the neutral line at about 40 BLEU, suggesting that until translation quality improves substantially, local language models could still have a positive impact. [sent-362, score-0.829]

88 The work on factored language models (Bilmes and Kirchhoff, 2003) is related to our work to the extent that it also mixes POS tags with lexical information, albeit in a very different manner. [sent-364, score-0.225]

89 Kirchhoff and Yang (2005) applied factored language models to machine translation but the improvements were negligible. [sent-366, score-0.575]

90 (2005) proposed a discriminative language modeling approach that uses mixtures of POS and surface information and showed that it leads to a reduction in speech recognition word error rates. [sent-368, score-0.191]

91 On the other hand, their approach seems more suited for n-best list re-ranking and it is not clear whether those improvements carry over to machine translation. [sent-369, score-0.23]

92 Li and Khudanpur (2008) adapted this discriminative approach to machine translation re-ranking but used surface forms only. [sent-370, score-0.294]

93 7 Conclusion Though POS language models do not lead to significant improvements over a competitive baseline, we have shown that a competitive phrase-based baseline system can benefit from using POS information by building lexically anchored local models. [sent-375, score-0.733]

94 Our local model approach not only leads to more context-specific probability distributions, but also takes advantage of the fact that the language model probability of each word is based on all surrounding local models. [sent-376, score-0.598]

95 The evaluations for Arabic- and Chinese-to-English show that local models lead to statistically significant improvements across different test sets and genres. [sent-377, score-0.579]

96 Correlating the translation quality of the baseline with the improvements that result from adding local models, further suggests that these improvements are sustainable and should carry over to improved baseline systems. [sent-378, score-0.922]

97 Edinburgh system description for the 2005 IWSLT speech translation evaluation. [sent-444, score-0.192]

98 Towards better machine translation quality for the german–english language pairs. [sent-452, score-0.235]

99 A study of translation edit rate with targeted human annotation. [sent-513, score-0.192]

100 Reranking machine translation hypotheses with structured and web-based language models. [sent-530, score-0.235]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pos', 0.331), ('bleu', 0.311), ('local', 0.228), ('translation', 0.192), ('improvements', 0.187), ('loclm', 0.169), ('wordlm', 0.169), ('poslm', 0.141), ('ptold', 0.141), ('distortion', 0.138), ('nnp', 0.127), ('kirchhoff', 0.122), ('rows', 0.121), ('srilm', 0.114), ('jjr', 0.113), ('koehn', 0.106), ('vbd', 0.102), ('vbz', 0.091), ('nist', 0.09), ('philipp', 0.083), ('factored', 0.079), ('nn', 0.075), ('models', 0.074), ('slope', 0.073), ('tags', 0.072), ('decoder', 0.069), ('birch', 0.069), ('leads', 0.067), ('tag', 0.066), ('randomization', 0.066), ('negligible', 0.066), ('commission', 0.066), ('afp', 0.066), ('modeling', 0.065), ('baseline', 0.064), ('nns', 0.061), ('lexicalized', 0.06), ('surface', 0.059), ('alexandra', 0.059), ('correlation', 0.058), ('controversial', 0.057), ('cuba', 0.056), ('frees', 0.056), ('intersects', 0.056), ('pcuba', 0.056), ('pmore', 0.056), ('subcategorization', 0.054), ('genre', 0.054), ('occurrence', 0.054), ('wen', 0.052), ('bitext', 0.052), ('reordering', 0.051), ('regression', 0.051), ('translations', 0.051), ('newswire', 0.05), ('wi', 0.05), ('probability', 0.049), ('accused', 0.049), ('uva', 0.049), ('americas', 0.049), ('diminishes', 0.049), ('bilmes', 0.049), ('hieu', 0.047), ('ter', 0.046), ('statistically', 0.046), ('competitive', 0.046), ('gained', 0.046), ('xinhua', 0.046), ('positions', 0.045), ('gigaword', 0.045), ('genres', 0.044), ('integration', 0.044), ('holger', 0.044), ('wii', 0.044), ('doubling', 0.044), ('anchored', 0.044), ('hasan', 0.044), ('lead', 0.044), ('machine', 0.043), ('effectiveness', 0.043), ('yang', 0.042), ('wang', 0.042), ('smoothing', 0.042), ('pages', 0.041), ('proceedings', 0.041), ('history', 0.041), ('tighter', 0.041), ('brevity', 0.04), ('och', 0.04), ('phrase', 0.04), ('fragment', 0.039), ('positional', 0.038), ('reordered', 0.038), ('discrepancy', 0.038), ('told', 0.038), ('gauge', 0.038), ('shen', 0.038), ('reference', 0.037), ('andreas', 0.037), ('precede', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 125 emnlp-2011-Statistical Machine Translation with Local Language Models

Author: Christof Monz

Abstract: Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.

2 0.25373065 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: Many machine translation evaluation metrics have been proposed after the seminal BLEU metric, and many among them have been found to consistently outperform BLEU, demonstrated by their better correlations with human judgment. It has long been the hope that by tuning machine translation systems against these new generation metrics, advances in automatic machine translation evaluation can lead directly to advances in automatic machine translation. However, to date there has been no unambiguous report that these new metrics can improve a state-of-theart machine translation system over its BLEUtuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better humanjudged translation quality than the BLEUtuned baseline. TESLA-M in particular is simple and performs well in practice on large datasets. We release all our implementation under an open source license. It is our hope that this work will encourage the machine translation community to finally move away from BLEU as the unquestioned default and to consider the new generation metrics when tuning their systems.

3 0.21194032 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection

Author: Amittai Axelrod ; Xiaodong He ; Jianfeng Gao

Abstract: We explore efficient domain adaptation for the task of statistical machine translation based on extracting sentences from a large general-domain parallel corpus that are most relevant to the target domain. These sentences may be selected with simple cross-entropy based methods, of which we present three. As these sentences are not themselves identical to the in-domain data, we call them pseudo in-domain subcorpora. These subcorpora, 1% the size of the original, can then be used to train small domain-adapted Statistical Machine Translation (SMT) systems which outperform systems trained on the entire corpus. Performance is further improved when we use these domain-adapted models in combination with a true in-domain model. The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.

4 0.18672277 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

Author: Zhenghua Li ; Min Zhang ; Wanxiang Che ; Ting Liu ; Wenliang Chen ; Haizhou Li

Abstract: Part-of-speech (POS) is an indispensable feature in dependency parsing. Current research usually models POS tagging and dependency parsing independently. This may suffer from error propagation problem. Our experiments show that parsing accuracy drops by about 6% when using automatic POS tags instead of gold ones. To solve this issue, this paper proposes a solution by jointly optimizing POS tagging and dependency parsing in a unique model. We design several joint models and their corresponding decoding algorithms to incorporate different feature sets. We further present an effective pruning strategy to reduce the search space of candidate POS tags, leading to significant improvement of parsing speed. Experimental results on Chinese Penn Treebank 5 show that our joint models significantly improve the state-of-the-art parsing accuracy by about 1.5%. Detailed analysis shows that the joint method is able to choose such POS tags that are more helpful and discriminative from parsing viewpoint. This is the fundamental reason of parsing accuracy improvement.

5 0.17105059 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrasebased model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reodering.

6 0.16229418 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

7 0.15463465 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

8 0.13540407 136 emnlp-2011-Training a Parser for Machine Translation Reordering

9 0.12793621 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

10 0.11297207 36 emnlp-2011-Corroborating Text Evaluation Results with Heterogeneous Measures

11 0.11236904 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

12 0.11226262 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

13 0.11197305 3 emnlp-2011-A Correction Model for Word Alignments

14 0.10965953 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

15 0.10477787 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

16 0.10473047 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

17 0.1041429 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

18 0.097442254 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

19 0.097128324 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts

20 0.095363989 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.331), (1, 0.196), (2, 0.128), (3, -0.186), (4, -0.01), (5, -0.001), (6, 0.087), (7, -0.037), (8, -0.063), (9, -0.099), (10, 0.078), (11, -0.0), (12, -0.004), (13, 0.053), (14, -0.048), (15, 0.184), (16, 0.026), (17, 0.042), (18, 0.008), (19, 0.102), (20, -0.157), (21, 0.042), (22, 0.099), (23, 0.058), (24, -0.069), (25, 0.059), (26, -0.007), (27, -0.036), (28, -0.225), (29, -0.102), (30, -0.063), (31, -0.043), (32, 0.01), (33, 0.01), (34, -0.162), (35, 0.148), (36, 0.046), (37, -0.03), (38, -0.059), (39, 0.052), (40, -0.039), (41, -0.044), (42, -0.028), (43, -0.041), (44, 0.153), (45, -0.026), (46, -0.031), (47, 0.03), (48, -0.032), (49, -0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9790408 125 emnlp-2011-Statistical Machine Translation with Local Language Models

Author: Christof Monz

Abstract: Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.

2 0.81434631 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: Many machine translation evaluation metrics have been proposed after the seminal BLEU metric, and many among them have been found to consistently outperform BLEU, demonstrated by their better correlations with human judgment. It has long been the hope that by tuning machine translation systems against these new generation metrics, advances in automatic machine translation evaluation can lead directly to advances in automatic machine translation. However, to date there has been no unambiguous report that these new metrics can improve a state-of-theart machine translation system over its BLEUtuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better humanjudged translation quality than the BLEUtuned baseline. TESLA-M in particular is simple and performs well in practice on large datasets. We release all our implementation under an open source license. It is our hope that this work will encourage the machine translation community to finally move away from BLEU as the unquestioned default and to consider the new generation metrics when tuning their systems.

3 0.71615732 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection

Author: Amittai Axelrod ; Xiaodong He ; Jianfeng Gao

Abstract: We explore efficient domain adaptation for the task of statistical machine translation based on extracting sentences from a large general-domain parallel corpus that are most relevant to the target domain. These sentences may be selected with simple cross-entropy based methods, of which we present three. As these sentences are not themselves identical to the in-domain data, we call them pseudo in-domain subcorpora. These subcorpora, 1% the size of the original, can then be used to train small domain-adapted Statistical Machine Translation (SMT) systems which outperform systems trained on the entire corpus. Performance is further improved when we use these domain-adapted models in combination with a true in-domain model. The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.

4 0.65000749 36 emnlp-2011-Corroborating Text Evaluation Results with Heterogeneous Measures

Author: Enrique Amigo ; Julio Gonzalo ; Jesus Gimenez ; Felisa Verdejo

Abstract: Automatically produced texts (e.g. translations or summaries) are usually evaluated with n-gram based measures such as BLEU or ROUGE, while the wide set of more sophisticated measures that have been proposed in the last years remains largely ignored for practical purposes. In this paper we first present an in-depth analysis of the state of the art in order to clarify this issue. After this, we formalize and verify empirically a set of properties that every text evaluation measure based on similarity to human-produced references satisfies. These properties imply that corroborating system improvements with additional measures always increases the overall reliability of the evaluation process. In addition, the greater the heterogeneity of the measures (which is measurable) the higher their combined reliability. These results support the use of heterogeneous measures in order to consolidate text evaluation results.

5 0.61769575 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrasebased model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reordering.

6 0.58696508 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

7 0.58659923 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts

8 0.55127877 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.

9 0.5386191 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion

10 0.48985109 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

11 0.48916417 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

12 0.46684042 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

13 0.44376898 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

14 0.43884644 38 emnlp-2011-Data-Driven Response Generation in Social Media

15 0.43722132 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

16 0.43335581 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

17 0.42311823 3 emnlp-2011-A Correction Model for Word Alignments

18 0.42297614 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

19 0.40229809 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries

20 0.40153408 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.093), (36, 0.015), (37, 0.029), (45, 0.08), (53, 0.44), (54, 0.025), (57, 0.022), (62, 0.017), (64, 0.021), (66, 0.045), (69, 0.013), (79, 0.039), (82, 0.013), (85, 0.033), (87, 0.011), (96, 0.026), (98, 0.018)]
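The topic weights above form a sparse per-document distribution over LDA topics. How this site scores "similar papers computed by lda model" is not specified, but one natural sketch (an assumption, not the site's actual computation) is cosine similarity between two such sparse `{topicId: topicWeight}` vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse topic distributions,
    each given as a {topicId: topicWeight} dict. Missing topics
    are treated as weight 0."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A vector compared with itself scores 1.0; vectors with no topics in common score 0.0.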

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93185049 125 emnlp-2011-Statistical Machine Translation with Local Language Models

Author: Christof Monz

Abstract: Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.

2 0.87177169 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries

Author: Xabier Saralegi ; Iker Manterola ; Inaki San Vicente

Abstract: An A-C bilingual dictionary can be inferred by merging A-B and B-C dictionaries using B as pivot. However, polysemous pivot words often produce wrong translation candidates. This paper analyzes two methods for pruning wrong candidates: one based on exploiting the structure of the source dictionaries, and the other based on distributional similarity computed from comparable corpora. As both methods depend exclusively on easily available resources, they are well suited to less resourced languages. We studied whether these two techniques complement each other given that they are based on different paradigms. We also researched combining them by looking for the best adequacy depending on various application scenarios.
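The pivot-merging step the abstract starts from can be sketched in a few lines (an illustrative composition only; the paper's contribution is the pruning, which is not shown here, and the `merge_via_pivot` helper name is hypothetical):

```python
def merge_via_pivot(ab, bc):
    """Compose A-B and B-C dictionaries through pivot language B.
    ab and bc map each source word to a list of translations.
    Every C word reachable from an A word through any shared B
    translation becomes a candidate; polysemous pivot words thus
    introduce wrong candidates that pruning must later remove."""
    ac = {}
    for a, b_words in ab.items():
        candidates = set()
        for b in b_words:
            candidates.update(bc.get(b, []))
        if candidates:
            ac[a] = sorted(candidates)
    return ac
```

The polysemy problem is visible immediately: if the pivot word "bank" translates to both "banco" (institution) and "orilla" (riverside), a source word meaning only the institution still receives both candidates.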

3 0.52465469 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrasebased model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reordering.

4 0.51358604 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts

Author: Gennadi Lembersky ; Noam Ordan ; Shuly Wintner

Abstract: We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
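The comparison this abstract describes rests on a standard measurement: train a language model on each corpus and see which assigns the reference set lower perplexity. A minimal sketch with an add-one-smoothed unigram model (the paper would use proper n-gram models; this only illustrates the measurement itself):

```python
import math
from collections import Counter

def perplexity(train, test):
    """Per-word perplexity of an add-one-smoothed unigram model,
    trained on `train` and evaluated on `test` (both lists of
    token lists). Lower perplexity means the model predicts the
    test text better. The vocabulary covers both sets so unseen
    test words get nonzero smoothed probability."""
    counts = Counter(w for sent in train for w in sent)
    vocab = set(counts) | {w for sent in test for w in sent}
    total = sum(counts.values())
    logp, n = 0.0, 0
    for sent in test:
        for w in sent:
            logp += math.log((counts[w] + 1) / (total + len(vocab)))
            n += 1
    return math.exp(-logp / n)
```

Comparing two candidate training corpora on the same reference set then reduces to comparing the two perplexity numbers.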

5 0.50547898 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: Many machine translation evaluation metrics have been proposed after the seminal BLEU metric, and many among them have been found to consistently outperform BLEU, demonstrated by their better correlations with human judgment. It has long been the hope that by tuning machine translation systems against these new generation metrics, advances in automatic machine translation evaluation can lead directly to advances in automatic machine translation. However, to date there has been no unambiguous report that these new metrics can improve a state-of-the-art machine translation system over its BLEU-tuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better human-judged translation quality than the BLEU-tuned baseline. TESLA-M in particular is simple and performs well in practice on large datasets. We release all our implementation under an open source license. It is our hope that this work will encourage the machine translation community to finally move away from BLEU as the unquestioned default and to consider the new generation metrics when tuning their systems.

6 0.4958545 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection

7 0.46490791 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation

8 0.45924705 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

9 0.45426136 133 emnlp-2011-The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources

10 0.44921529 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding

11 0.44229069 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

12 0.44012558 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

13 0.43901789 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

14 0.43767416 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

15 0.42678413 3 emnlp-2011-A Correction Model for Word Alignments

16 0.42461324 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models

17 0.42195261 38 emnlp-2011-Data-Driven Response Generation in Social Media

18 0.41922143 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

19 0.41675657 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study

20 0.41600278 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases