11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques


Source: pdf

Author: Richard Zens ; Daisy Stanton ; Peng Xu

Abstract: When trained on very large parallel corpora, the phrase table component of a machine translation system grows to consume vast computational resources. In this paper, we introduce a novel pruning criterion that places phrase table pruning on a sound theoretical foundation. Systematic experiments on four language pairs under various data conditions show that our principled approach is superior to existing ad hoc pruning methods.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 When trained on very large parallel corpora, the phrase table component of a machine translation system grows to consume vast computational resources. [sent-3, score-0.456]

2 In this paper, we introduce a novel pruning criterion that places phrase table pruning on a sound theoretical foundation. [sent-4, score-2.02]

3 Systematic experiments on four language pairs under various data conditions show that our principled approach is superior to existing ad hoc pruning methods. [sent-5, score-0.77]

4 The most resource-intensive components of a statistical machine translation system are the language model and the phrase table. [sent-18, score-0.483]

5 In this paper, we address the other problem of any statistical machine translation system: large phrase tables. [sent-21, score-0.483]

6 (2007) has shown that large portions of the phrase table can be removed without loss in translation quality. [sent-23, score-0.456]

7 This motivated us to perform a systematic comparison of different pruning methods. [sent-24, score-0.827]

8 The pruning criterion introduced in this work is inspired by the very successful and still state-of-the-art language model pruning criterion based on entropy measures (Stolcke, 1998). [sent-26, score-1.905]

9 We motivate its derivation by stating the desiderata for a good phrase table pruning criterion: • Soundness: The criterion should optimize some well-understood information-theoretic measure of translation model quality. [sent-27, score-1.362]

10 • Self-containedness: As a practical consideration, we want to prune phrases from an existing phrase table. [sent-33, score-0.592]

11 This means pruning should use only information contained in the model itself. [sent-34, score-0.77]

12 • Good empirical behavior: We would like to be able to prune large parts of the phrase table without significant loss in translation quality. [sent-35, score-0.56]

13 Analyzing existing pruning techniques based on these objectives, we found that they are commonly deficient in at least one of them. [sent-36, score-0.77]

14 We thus designed a novel pruning criterion that not only meets these objectives, it also performs very well in empirical evaluations. [sent-37, score-0.906]

15 an experimental comparison of several pruning methods for several language pairs. [sent-43, score-0.77]

16 2 Related Work The most basic pruning methods rely on probability and count cutoffs. [sent-44, score-0.795]

17 (2007) is promising as it shows that large parts of the phrase table can be removed without affecting translation quality. [sent-50, score-0.456]

18 However, it is unclear how this significance-based pruning criterion is related to translation model quality. [sent-52, score-1.075]

19 The same idea of significance-based pruning was exploited in (Yang and Zheng, 2009; Tomeh et al. [sent-55, score-0.77]

20 A different approach to phrase table pruning was undertaken by Eck et al. [sent-57, score-1.057]

21 Another approach to phrase table pruning is triangulation (Chen et al. [sent-61, score-1.057]

22 (2011) modify the phrase extraction methods in order to reduce the phrase table size. [sent-69, score-0.574]

23 3 Pruning Using Simple Statistics In this section, we will review existing pruning methods based on simple phrase table statistics. [sent-71, score-1.057]

24 There are two common classes of these methods: absolute phrase table pruning and relative phrase table pruning. [sent-72, score-1.423]

25 1 Absolute pruning Absolute pruning methods rely only on the statistics of a single phrase pair (f˜, e˜). [sent-74, score-1.879]

26 Hence, they are independent of other phrases in the phrase table. [sent-75, score-0.488]
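
Absolute pruning, as described above, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the in-memory table layout, the names `PhraseStats` and `prune_absolute`, the cutoff parameters and all numbers are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class PhraseStats(NamedTuple):
    count: float  # co-occurrence count of the phrase pair (hypothetical values below)
    prob: float   # phrase translation probability p(e|f)


# (source phrase, target phrase) -> statistics of that single pair
PhraseTable = Dict[Tuple[str, str], PhraseStats]


def prune_absolute(table: PhraseTable,
                   min_count: float = 0.0,
                   min_prob: float = 0.0) -> PhraseTable:
    """Keep a pair only if its own count and probability reach the cutoffs.

    The decision for one pair never looks at any other entry of the table.
    """
    return {pair: stats for pair, stats in table.items()
            if stats.count >= min_count and stats.prob >= min_prob}


# Toy table with made-up statistics.
table = {
    ("la maison", "the house"): PhraseStats(count=120, prob=0.80),
    ("la maison", "the home"): PhraseStats(count=25, prob=0.17),
    ("la maison", "house the"): PhraseStats(count=1, prob=0.01),
}

pruned = prune_absolute(table, min_count=2)    # drops only the singleton
emptied = prune_absolute(table, min_prob=0.9)  # drops every translation of the source
```

The second call shows the weakness of purely absolute cutoffs: with a strict enough cutoff, a source phrase can lose all of its translations.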

27 2 Relative pruning A potential problem with the absolute pruning methods is that they can prune all occurrences of a source phrase f˜. Relative pruning methods avoid this by considering the full set of target phrases for a specific source phrase f˜. [sent-85, score-3.385]

28 This method discards those phrases that are far worse than the best target phrase for a given source phrase f˜. Given a pruning threshold τt, a phrase pair (f˜, e˜) is discarded if: [sent-88, score-1.982]

29 For each source phrase f˜, this method preserves the K target phrases with highest probability p(e˜|f˜) or, equivalently, their count N(f˜, e˜). [sent-91, score-0.518]

30 As we will confirm in the empirical evaluation, this will likely cause drops in translation quality, since frequent source phrases are more useful than the infrequent ones. [sent-93, score-0.453]
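
The two relative methods can be sketched as follows. The exact threshold inequality is not reproduced in the extracted text, so the comparison used in `prune_threshold` (probability below `tau_t` times the best probability for the same source phrase) is an assumption. The function names, table layout and numbers are hypothetical as well.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source phrase, target phrase) -> p(target | source); values are made up
Table = Dict[Tuple[str, str], float]


def group_by_source(table: Table) -> Dict[str, List[Tuple[str, float]]]:
    """Collect all target candidates of each source phrase."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (src, tgt), prob in table.items():
        groups[src].append((tgt, prob))
    return groups


def prune_threshold(table: Table, tau_t: float) -> Table:
    """Assumed rule: discard a pair whose probability is below tau_t * best."""
    kept: Table = {}
    for src, cands in group_by_source(table).items():
        best = max(prob for _, prob in cands)
        for tgt, prob in cands:
            if prob >= tau_t * best:
                kept[(src, tgt)] = prob
    return kept


def prune_histogram(table: Table, k: int) -> Table:
    """Keep the k most probable target phrases of every source phrase."""
    kept: Table = {}
    for src, cands in group_by_source(table).items():
        top = sorted(cands, key=lambda c: c[1], reverse=True)[:k]
        for tgt, prob in top:
            kept[(src, tgt)] = prob
    return kept


table = {
    ("maison", "house"): 0.6,
    ("maison", "home"): 0.3,
    ("maison", "building"): 0.1,
    ("chat", "cat"): 1.0,
}

by_threshold = prune_threshold(table, tau_t=0.4)  # drops ("maison", "building")
by_histogram = prune_histogram(table, k=1)        # one translation per source
```

Both functions always keep the best translation of every source phrase, so the number of distinct source phrases is a lower bound on how far they can shrink the table.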

31 4 Significance Pruning In this section, we briefly review significance pruning following Johnson et al. [sent-94, score-0.807]

32 The idea of significance pruning is to test whether a source phrase and a target phrase co-occur more frequently in a bilingual corpus than they should just by chance. [sent-96, score-1.522]

33 The lower the p-value, the less likely this phrase pair occurred with the observed frequency by chance; we thus prune a phrase pair (f˜, e˜) if: $\sum_{k=N(\tilde{f},\tilde{e})}^{\infty} p_h(k) > \tau_F$ (5) for some pruning threshold τF. [sent-105, score-1.539]
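
The p-value test in Equation (5) can be sketched as below, assuming that p_h is the hypergeometric distribution of Fisher's exact test over sentence-pair counts. The extracted text does not define p_h, so this is an assumption. The argument names (`n_fe`, `n_f`, `n_e`, `n`) and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, n_f: int, n_e: int, n: int) -> float:
    """P(exactly k co-occurrences) if source and target phrase were independent.

    n_f: sentence pairs containing the source phrase
    n_e: sentence pairs containing the target phrase
    n:   total number of sentence pairs
    """
    return comb(n_f, k) * comb(n - n_f, n_e - k) / comb(n, n_e)


def p_value(n_fe: int, n_f: int, n_e: int, n: int) -> float:
    """Probability of observing n_fe or more co-occurrences by chance.

    The infinite upper limit of the sum is replaced by min(n_f, n_e),
    because all larger terms are zero.
    """
    upper = min(n_f, n_e)
    return sum(hypergeom_pmf(k, n_f, n_e, n) for k in range(n_fe, upper + 1))


def prune_significance(n_fe: int, n_f: int, n_e: int, n: int, tau_f: float) -> bool:
    """True if the pair should be pruned, i.e. its p-value exceeds tau_f."""
    return p_value(n_fe, n_f, n_e, n) > tau_f


# Toy corpus of 10 sentence pairs: the source phrase occurs in 3, the target
# phrase in 4, and they co-occur in 3.
pv = p_value(3, 3, 4, 10)
```

Exact integer binomials keep the sketch simple. For corpus-scale counts a log-space computation would be the more practical choice.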

34 5 Entropy-based Pruning In this section, we will derive a novel entropy-based pruning criterion. [sent-109, score-0.77]

35 1 Motivational Example In general, pruning the phrase table can be considered as selecting a subset of the original phrase table. [sent-111, score-1.344]

36 In Table 2, we show some example phrases from the learned French-English WMT phrase table, along with their counts and probabilities. [sent-115, score-0.488]

37 For the French phrase le gouvernement français, we have, among others, two translations: the French government and the government of France. [sent-116, score-0.452]

38 Removing the phrase the government of France would increase this cost dramatically. [sent-119, score-0.355]

39 On the other hand, composing the phrase the French government out of shorter phrases has probability 0. [sent-128, score-0.614]

40 This means it is safe to discard the phrase the French government, since the translation cost remains essentially unchanged. [sent-136, score-0.456]

41 By contrast, discarding the phrase the government of France does not have this effect: it leads to a large change in translation cost. [sent-137, score-0.546]

42 Note that here the pruning criterion only considers redundancy of the phrases, not the quality. [sent-138, score-0.906]

43 This assumes that phrase pairs affect the relative entropy roughly independently. [sent-146, score-0.409]

44 We can then choose a pruning threshold τE and prune those phrase pairs with a contribution to the relative entropy below that threshold. [sent-147, score-1.316]

45 Thus, we prune a phrase pair (f˜, e˜) if $p(\tilde{e},\tilde{f})\,\bigl[\log p(\tilde{e}|\tilde{f}) - \log p'(\tilde{e}|\tilde{f})\bigr] < \tau_E$ (8). We now address how to assign the probability $p'(\tilde{e}|\tilde{f})$ under the pruned model. [sent-148, score-0.487]

46 If a segmentation into longer phrases does not exist, the system has to compose a translation out of shorter phrases. [sent-150, score-0.428]
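
The pruning test of Equation (8) can be sketched as follows. It is an illustration under stated assumptions: the pruned-model probability is taken as a given positive number, the function names are hypothetical, and the probabilities in the example are invented rather than taken from the paper's tables.

```python
import math


def entropy_contribution(p_joint: float, p_cond: float, p_cond_pruned: float) -> float:
    """Left-hand side of Equation (8) for one phrase pair.

    p_joint:       joint probability of the phrase pair
    p_cond:        p(e|f) in the original phrase table
    p_cond_pruned: probability of the same translation once the pair is removed
                   (assumed > 0 here)
    """
    return p_joint * (math.log(p_cond) - math.log(p_cond_pruned))


def should_prune(p_joint: float, p_cond: float, p_cond_pruned: float, tau_e: float) -> bool:
    """Prune when the pair contributes less than tau_e to the relative entropy."""
    return entropy_contribution(p_joint, p_cond, p_cond_pruned) < tau_e


# Redundant pair: shorter phrases reproduce it at almost the same probability.
redundant = should_prune(p_joint=1e-4, p_cond=0.50, p_cond_pruned=0.45, tau_e=1e-4)

# Non-redundant pair: without it the translation becomes very unlikely.
needed = should_prune(p_joint=1e-4, p_cond=0.50, p_cond_pruned=1e-6, tau_e=1e-4)
```

Because each contribution depends only on stored probabilities, all scores can be computed once and the threshold varied afterwards.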

47 Thus, if a phrase pair e) is no longer available, the decoder has to use shorter phrases to produce the same translation. [sent-151, score-0.599]

48 Using the normal phrase translation model, we obtain: $p'(\tilde{e}|\tilde{f}) = \sum_{s_1^K,\pi_1^K} p(s_1^K,\pi_1^K|\tilde{f}) \prod_{k=1}^{K} p(\bar{e}_k|\bar{f}_{\pi_k})$ (11) Virtually all phrase-based decoders use the so-called maximum-approximation, i. [sent-159, score-0.456]

49 As we would like the pruning criterion to be similar to the search criterion used during decoding, we do the same and obtain: $p'(\tilde{e}|\tilde{f}) \approx \max_{s_1^K,\pi_1^K} \prod_{k=1}^{K} p(\bar{e}_k|\bar{f}_{\pi_k})$ (12) Note that we also drop the segmentation probability, as this is not used at decoding time. [sent-162, score-1.081]

50 This leaves the pruning criterion a function only of the model as stored in the phrase table. [sent-163, score-1.193]
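
The maximum approximation can be illustrated with a small dynamic program that finds the best way to compose a phrase pair out of shorter phrase pairs. This sketch makes a simplifying assumption that the text does not make: it considers monotone segmentations only and ignores reordering. The function name, the table layout and the probabilities are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Phrase = Tuple[str, ...]
Table = Dict[Tuple[Phrase, Phrase], float]  # (source, target) -> p(target | source)


def best_decomposition_prob(src: Sequence[str], tgt: Sequence[str], table: Table) -> float:
    """Best product of shorter phrase-pair probabilities that yields (src, tgt).

    Only monotone segmentations are searched (a simplification). Returns 0.0
    when the pair cannot be composed from other entries of the table.
    """
    n, m = len(src), len(tgt)
    full = (tuple(src), tuple(tgt))
    # best[i][j]: best score for translating src[:i] as tgt[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for i0 in range(i):
                for j0 in range(j):
                    if best[i0][j0] == 0.0:
                        continue
                    pair = (tuple(src[i0:i]), tuple(tgt[j0:j]))
                    if pair == full:
                        continue  # the pair under consideration is not available
                    score = best[i0][j0] * table.get(pair, 0.0)
                    if score > best[i][j]:
                        best[i][j] = score
    return best[n][m]


# Invented probabilities for illustration only.
table: Table = {
    (("le",), ("the",)): 0.5,
    (("gouvernement", "français"), ("French", "government")): 0.6,
    (("le", "gouvernement", "français"), ("the", "French", "government")): 0.3,
}

composed = best_decomposition_prob(
    ["le", "gouvernement", "français"], ["the", "French", "government"], table)
atomic = best_decomposition_prob(["le"], ["the"], table)
```

In the toy table the three-word pair is fully redundant, since the two shorter pairs reproduce its probability. The single-word pair cannot be decomposed and gets a score of 0.0, a case that needs separate handling before any logarithm is taken.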

51 3 Computation In our experiments, it was more efficient to vary the pruning threshold τE without having to re-compute the entire phrase table. [sent-172, score-1.09]

52 Therefore, we computed the entropy criterion in Equation (8) once for the whole phrase table. [sent-173, score-0.516]

53 This introduces an approximation for the pruned model score. It might happen that we prune short phrases that were used as part of the best segmentation of longer phrases. [sent-174, score-0.441]

54 One way to avoid this approximation would be to perform entropy pruning with increasing phrase length. [sent-177, score-1.15]

55 Thus it can happen that a phrase is pruned for X-to-Y, but not for Y-to-X. [sent-187, score-0.384]

56 The baseline system uses the common phrase translation models, such as p(e˜|f˜) and p(f˜|e˜), lexical models, word and phrase penalty, distortion penalty as well as a lexicalized reordering model (Zens and Ney, 2006). [sent-196, score-0.764]

57 , we did not rerun MERT after pruning to avoid adding unnecessary noise. [sent-205, score-0.77]

58 The baseline system already includes phrase table pruning by removing singletons and keeping up to 30 target language phrases per source phrase. [sent-207, score-1.346]

59 First, we show a comparison of several probability-based pruning methods in Figure 1. [sent-217, score-0.77]

60 There is no difference between absolute and relative pruning methods, except that the two relative methods (Thres and Hist) are limited by 4The Bleu score drops are as follows: English-French 0. [sent-229, score-0.903]

61 Thus, they reach a point where they cannot prune the phrase table any further. [sent-238, score-0.391]

62 The results that follow use only the absolute pruning method as a representative for probability-based pruning. [sent-240, score-0.82]

63 In Figures 2 through 5, we show the translation quality as a function of the phrase table size. [sent-241, score-0.491]

64 We vary the pruning thresholds to obtain different phrase table sizes. [sent-242, score-1.057]

65 We compare four pruning methods: • Count. [sent-243, score-0.77]

66 For instance, entropy pruning requires less than a quarter of the number of phrases needed by count- or significance-based pruning to achieve a Spanish-English Bleu score of 34 (0. [sent-262, score-1.834]

67 These results clearly show how the pruning methods compare: 1. [sent-265, score-0.77]

68 It should be used only to prune small fractions of the phrase table. [sent-267, score-0.391]

69 Entropy pruning consistently outperforms the other methods across translation directions and language pairs. [sent-272, score-0.939]

70 Figures 6 and 7 show compositionality statistics for the pruned Spanish-English phrase table (we observed similar results for the other language pairs). [sent-273, score-0.377]

71 Each figure shows the composition of the phrase table for a type of pruning for different phrase tables sizes. [sent-275, score-1.344]

72 For instance, in case of the smallest phrase table for count-based pruning, the 1-word phrases account for about 30% of all phrases, the 2-word phrases account for about 35% of all phrases, etc. [sent-279, score-0.689]

73 We observe that entropy-based pruning removes many more long phrases than any of the other methods. [sent-282, score-0.971]

74 The plot for probability-based pruning is different in that the percentage of long phrases actually increases with more aggressive pruning (i. [sent-283, score-1.764]

75 A possible explanation is that probability-based pruning does not take the frequency of the source phrase into account. [sent-286, score-1.115]

76 In Figure 8, we show the effect of the constant pc. Figure 2: Translation quality as a function of the phrase table size (number of phrases [M]) for Spanish-English (left) and English-Spanish (right). [sent-295, score-0.346]

77 Figure 6: Phrase length statistics for Spanish-English for probability-based (left, panel Prob) and count-based pruning (right, panel Count); x-axis: number of phrases [millions]. [sent-299, score-0.793]

78 Figure 7: Phrase length statistics for Spanish-English for significance-based (left, panel Fisher) and entropy-based pruning (right, panel Entropy); x-axis: number of phrases [millions]. [sent-300, score-0.793]

79 The results in Figure 2 to Figure 5 show that entropy-based pruning clearly outperforms the alternative pruning methods. [sent-305, score-1.54]

80 In Ta- ble 5, we show how much of the phrase table we have to retain under various pruning criteria without losing more than one Bleu point in translation quality. [sent-307, score-1.226]

81 We see that probability-based pruning allows only for marginal savings. [sent-308, score-0.77]

82 Count-based and significance-based pruning results in larger savings between 70% and 90%, albeit with fairly high vari6The values are in neg-log-space, i. [sent-309, score-0.838]

83 Entropy-based pruning achieves consistently high savings between 85% and 95% of the phrase table. [sent-313, score-1.125]

84 It always outperforms the other pruning methods and yields significant savings on top of count-based or significance-based pruning methods. [sent-314, score-1.608]

85 Often, we can cut the required phrase table size in half compared to count or significance based pruning. [sent-315, score-0.373]

86 As a last experiment, we want to confirm that phrase-table pruning methods are actually better than simply reducing the maximum phrase length. [sent-316, score-1.057]

87 In Figure 9, we show a comparison of different pruning methods and a length-based approach for Spanish-English. [sent-317, score-0.77]

88 until we are left with only single-word phrases; the phrase length is measured as the number of source language words. [sent-319, score-0.375]

89 We observe that entropy-based, count-based and significance-based pruning indeed outperform the length-based approach. [sent-320, score-0.77]
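
The length-based baseline mentioned above can be sketched as a simple filter on the number of source-language words. The function name, the whitespace tokenization and the toy table are assumptions made for illustration.

```python
from typing import Dict, Tuple

Table = Dict[Tuple[str, str], float]  # (source, target) -> p(target | source)


def prune_by_length(table: Table, max_source_len: int) -> Table:
    """Keep only pairs whose source phrase has at most max_source_len words.

    Words are obtained by whitespace splitting (an assumption of this sketch).
    """
    return {pair: prob for pair, prob in table.items()
            if len(pair[0].split()) <= max_source_len}


# Made-up entries.
table = {
    ("le", "the"): 0.5,
    ("le gouvernement", "the government"): 0.7,
    ("le gouvernement français", "the French government"): 0.3,
}

# Lowering the limit step by step ends with single-word source phrases only.
sizes = [len(prune_by_length(table, limit)) for limit in (3, 2, 1)]
```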

90 Figure 8: Translation quality (Bleu) as a function of the phrase table size (number of phrases [M]) for Spanish-English for entropy pruning with different constants pc. [sent-333, score-1.209]

91 7 Conclusions Phrase table pruning is often addressed in an ad-hoc way using the heuristics described in Section 3. [sent-334, score-0.77]

92 Choosing the wrong technique can result in significant drops in translation quality without saving much in terms of phrase table size. [sent-336, score-0.516]

93 We introduced a novel entropy-based criterion and put phrase table pruning on a sound theoretical foundation. [sent-337, score-1.25]

94 We can summarize our conclusions as follows: • Probability-based pruning performs poorly when pruning large parts of the phrase table. [sent-340, score-1.827]

95 • Count-based pruning performs as well as [...] Figure 9: Translation quality (Bleu) as a function of the phrase table size (number of phrases [M]) for Spanish-English. [sent-342, score-1.116]

96 • Entropy-based pruning gives significantly larger savings in phrase table size than any other pruning method. [sent-344, score-1.919]

97 • Compared to previous work, the novel entropy-based pruning often achieves the same Bleu score with only half the number of phrases. [sent-345, score-0.77]

98 Improving phrase extraction via MBR phrase scoring and pruning. [sent-392, score-0.574]

99 Translation model pruning via usage statistics for statistical machine translation. [sent-400, score-0.82]

100 Bilingual segmentation for phrasetable pruning in statistical machine translation. [sent-465, score-0.836]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pruning', 0.77), ('phrase', 0.287), ('phrases', 0.201), ('translation', 0.169), ('criterion', 0.136), ('zens', 0.104), ('prune', 0.104), ('entropy', 0.093), ('millions', 0.073), ('tomeh', 0.07), ('savings', 0.068), ('government', 0.068), ('pruned', 0.067), ('bleu', 0.064), ('hermann', 0.06), ('source', 0.058), ('shorter', 0.058), ('systematic', 0.057), ('french', 0.055), ('bilingual', 0.053), ('absolute', 0.05), ('franz', 0.048), ('och', 0.047), ('wmt', 0.045), ('eck', 0.044), ('vogel', 0.043), ('histogram', 0.041), ('hlogp', 0.041), ('macherey', 0.041), ('ofexisting', 0.041), ('pharaoh', 0.041), ('pc', 0.039), ('fisher', 0.039), ('segmentation', 0.039), ('significance', 0.037), ('ek', 0.035), ('sound', 0.035), ('quality', 0.035), ('contingency', 0.035), ('denmark', 0.035), ('nadi', 0.035), ('stephan', 0.034), ('yk', 0.034), ('koehn', 0.034), ('threshold', 0.033), ('johnson', 0.033), ('richard', 0.032), ('alignment', 0.031), ('compositional', 0.031), ('france', 0.031), ('summit', 0.031), ('million', 0.031), ('target', 0.03), ('republic', 0.03), ('mert', 0.03), ('happen', 0.03), ('left', 0.03), ('figures', 0.03), ('czech', 0.03), ('andreas', 0.029), ('fran', 0.029), ('copenhagen', 0.029), ('matthias', 0.029), ('prunes', 0.029), ('unpruned', 0.029), ('pair', 0.029), ('relative', 0.029), ('ney', 0.028), ('faster', 0.028), ('pauls', 0.027), ('segmentations', 0.027), ('statistical', 0.027), ('prague', 0.026), ('xf', 0.026), ('equation', 0.026), ('philipp', 0.025), ('association', 0.025), ('pages', 0.025), ('count', 0.025), ('duan', 0.025), ('peng', 0.025), ('drops', 0.025), ('josef', 0.025), ('decoder', 0.024), ('size', 0.024), ('statistics', 0.023), ('talbot', 0.023), ('francisco', 0.023), ('aggressive', 0.023), ('nicola', 0.023), ('stolcke', 0.023), ('theoretical', 0.022), ('alexandra', 0.022), ('discarding', 0.022), ('martin', 0.021), ('ep', 0.021), ('reordering', 0.021), ('brants', 0.021), ('decompose', 0.021), ('ef', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

2 0.48879677 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black

Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the translation quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.

3 0.20537308 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

Author: Kenneth Heafield ; Philipp Koehn ; Alon Lavie

Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM.

4 0.15768485 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

Author: Hao Zhang ; Ryan McDonald

Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).

5 0.11411804 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu

Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We propose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.

6 0.11171474 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

7 0.10280627 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

8 0.098666109 31 emnlp-2012-Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition

9 0.097027443 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

10 0.096939318 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering

11 0.0950379 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models

12 0.09311007 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

13 0.083991341 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

14 0.076057889 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

15 0.073370159 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

16 0.07220874 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation

17 0.067056052 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

18 0.066358387 58 emnlp-2012-Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs

19 0.065840252 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

20 0.056537259 97 emnlp-2012-Natural Language Questions for the Web of Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.235), (1, -0.227), (2, -0.295), (3, -0.012), (4, -0.145), (5, -0.136), (6, -0.177), (7, 0.195), (8, 0.037), (9, -0.043), (10, -0.037), (11, -0.14), (12, 0.196), (13, -0.2), (14, 0.116), (15, -0.253), (16, 0.004), (17, -0.086), (18, -0.287), (19, -0.229), (20, -0.039), (21, 0.065), (22, 0.101), (23, -0.044), (24, 0.008), (25, -0.031), (26, 0.018), (27, -0.096), (28, 0.076), (29, -0.006), (30, -0.047), (31, 0.023), (32, -0.103), (33, -0.066), (34, -0.021), (35, -0.06), (36, -0.041), (37, 0.009), (38, -0.065), (39, 0.015), (40, 0.007), (41, 0.029), (42, 0.01), (43, 0.003), (44, 0.038), (45, -0.033), (46, 0.045), (47, -0.025), (48, -0.049), (49, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98855734 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

2 0.95130455 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black

Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the translation quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.

3 0.67029625 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

Author: Kenneth Heafield ; Philipp Koehn ; Alon Lavie

Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM.

4 0.38402784 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models

Author: Shixiang Lu ; Wei Wei ; Xiaoyin Fu ; Bo Xu

Abstract: In this paper, we propose a novel translation model (TM) based cross-lingual data selection model for language model (LM) adaptation in statistical machine translation (SMT), from word models to phrase models. Given a source sentence in the translation task, this model directly estimates the probability that a sentence in the target LM training corpus is similar. Compared with the traditional approaches which utilize the first pass translation hypotheses, cross-lingual data selection model avoids the problem of noisy proliferation. Furthermore, phrase TM based cross-lingual data selection model is more effective than the traditional approaches based on bag-of-words models and word-based TM, because it captures contextual information in modeling the selection of phrase as a whole. Experiments conducted on large-scale data sets demonstrate that our approach significantly outperforms the state-of-the-art approaches on both LM perplexity and SMT performance.

5 0.36542013 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu

Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We propose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.

6 0.3581242 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

7 0.35048765 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

8 0.34277686 31 emnlp-2012-Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition

9 0.33304292 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

10 0.32745361 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

11 0.31328997 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

12 0.28271011 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

13 0.27888906 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering

14 0.27615663 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

15 0.26421511 97 emnlp-2012-Natural Language Questions for the Web of Data

16 0.26385087 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

17 0.26229313 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

18 0.24843939 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation

19 0.24769285 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation

20 0.21285687 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.019), (14, 0.043), (16, 0.028), (25, 0.019), (34, 0.134), (41, 0.264), (60, 0.1), (63, 0.08), (65, 0.021), (70, 0.018), (74, 0.062), (76, 0.042), (79, 0.023), (80, 0.017), (86, 0.021), (95, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83521104 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

Author: Richard Zens ; Daisy Stanton ; Peng Xu

Abstract: When trained on very large parallel corpora, the phrase table component of a machine translation system grows to consume vast computational resources. In this paper, we introduce a novel pruning criterion that places phrase table pruning on a sound theoretical foundation. Systematic experiments on four language pairs under various data conditions show that our principled approach is superior to existing ad hoc pruning methods.

2 0.82600999 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

Author: Zhongguo Li ; Guodong Zhou

Abstract: Most previous approaches to syntactic parsing of Chinese rely on a preprocessing step of word segmentation, thereby assuming there was a clearly defined boundary between morphology and syntax in Chinese. We show how this assumption can fail badly, leading to many out-of-vocabulary words and incompatible annotations. Hence in practice the strict separation of morphology and syntax in the Chinese language proves to be untenable. We present a unified dependency parsing approach for Chinese which takes unsegmented sentences as input and outputs both morphological and syntactic structures with a single model and algorithm. By removing the intermediate word segmentation, the unified parser no longer needs separate notions for words and phrases. Evaluation proves the effectiveness of the unified model and algorithm in parsing structures of words, phrases and sentences simultaneously.

3 0.68102455 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black

Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the translation quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.

4 0.60299248 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

Author: Kenneth Heafield ; Philipp Koehn ; Alon Lavie

Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM.

5 0.58104873 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

Author: Bo Pang ; Sujith Ravi

Abstract: The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand predictability of user generated English.

6 0.57794195 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

7 0.57592446 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

8 0.57577842 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

9 0.57085323 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

10 0.57081807 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers

11 0.57077354 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

12 0.56879908 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs

13 0.56628549 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering

14 0.56169635 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

15 0.55944365 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

16 0.5591954 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

17 0.55865914 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation

18 0.55822194 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

19 0.55703402 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

20 0.55575484 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level