acl acl2013 acl2013-226 knowledge-graph by maker-knowledge-mining

226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT


Source: pdf

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Learning to Prune: Context-Sensitive Pruning for Syntactic MT Wenduan Xu Computer Laboratory University of Cambridge wenduan . [sent-1, score-0.071]

2 Yue Zhang Singapore University of Technology and Design yue zhang@ sutd . [sent-5, score-0.107]

3 Abstract We present a context-sensitive chart pruning method for CKY-style MT decoding. [sent-15, score-0.829]

4 Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. [sent-16, score-1.254]

5 The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. [sent-17, score-1.239]

6 On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU. [sent-18, score-0.031]

7 1 Introduction Syntactic MT models suffer from decoding efficiency bottlenecks introduced by online n-gram language model integration and high grammar complexity. [sent-19, score-0.376]

8 Various efforts have been devoted to improving decoding efficiency, including hypergraph rescoring (Heafield et al. [sent-20, score-0.25]

9 , 2008; Zhang and Gildea, 2008) and grammar transformations (Zhang et al. [sent-22, score-0.07]

10 For more expressive, linguistically-motivated syntactic MT models (Galley et al. [sent-24, score-0.036]

11 , 2006), the grammar complexity has grown considerably over hierarchical phrase-based models (Chiang, 2007), and decoding still suffers from efficiency issues (DeNero et al. [sent-26, score-0.345]

12 In this paper, we study a chart pruning method for CKY-style MT decoding that is orthogonal to cube pruning (Chiang, 2007) and additive to its pruning power. [sent-28, score-2.272]

13 The main intuition of our method is to find those source phrases (i. [sent-29, score-0.103]

14 any sequence of consecutive words) that are unlikely to have any consistently aligned target counterparts according to the source context and grammar constraints. [sent-31, score-0.346]

15 We show that by using highly-efficient sequence labelling models learned from the bitext used for translation model training, such phrases can be effectively identified prior to MT decoding, and corresponding chart cells can be excluded for decoding without affecting translation quality. [sent-32, score-1.23]

16 We call our method context-sensitive pruning (CSP); it can be viewed as a bilingual adaptation of similar methods in monolingual parsing (Roark and Hollingshead, 2008; Zhang et al. [sent-33, score-0.583]

17 , 2010) which improve parsing efficiency by “closing” chart cells using binary classifiers. [sent-34, score-0.68]

18 Our contribution is that we demonstrate such methods can be applied to synchronous-grammar parsing by labelling the source-side alone. [sent-35, score-0.219]

19 This is achieved through a novel training scheme where the labelling models are trained over the word-aligned bitext and gold-standard pruning labels are obtained by projecting target-side constituents to the source words. [sent-36, score-0.833]

20 Results on a full-scale English-to-German experiment show that it gives more than 60% speed-up over a strong cube pruning baseline, with no loss in BLEU. [sent-39, score-0.688]

21 Figure 1: A selection of grammar rules extractable from an example word-aligned sentence pair. [sent-48, score-0.105]

22 2 The Baseline String-to-Tree Model Our baseline translation model uses the rule extraction algorithm of Chiang (2007) adapted to a string-to-tree grammar. [sent-49, score-0.195]

23 (2003), all pairs whose target phrases are not exhaustively dominated by a constituent of the parse tree are removed and each remaining pair, ⟨f, e⟩, together with its constituent label, C, forms a lexical grammar rule: C → ⟨f, e⟩. [sent-51, score-0.262]

24 The rules r1, r2, and r3 in Figure 1 are lexical. [sent-52, score-0.035]

25 Non-lexical rules are generated by eliminating one or more pairs of terminal substrings from an existing rule and substituting non-terminals. [sent-54, score-0.145]

26 This process produces the example rules r4 and r5. [sent-55, score-0.035]

27 Our decoding algorithm is a variant of CKY and is similar to other algorithms tailored for specific syntactic translation grammars (DeNero et al. [sent-56, score-0.332]

28 By taking the source-side of each rule, projecting onto it the non-terminal labels from the target-side, and weighting the grammar according to the model’s local scoring features, decoding is a straightforward extension of monolingual weighted chart parsing. [sent-58, score-0.681]
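As a rough illustration of this projection step, the Python sketch below shows how the target-side non-terminal labels of a synchronous rule could be copied onto its source side to obtain a monolingual rule for chart parsing. The rule representation, function name, and example labels are hypothetical assumptions for illustration, not the decoder's actual data structures.

```python
# Illustrative sketch of the source-side projection: each non-terminal gap on
# the source side of a synchronous rule receives the label of its linked
# target-side non-terminal, yielding a monolingual (weighted) source grammar.
# The rule representation and example labels are hypothetical.

def project_rule(lhs, src_rhs, gap_labels):
    """
    lhs        -- target-side root label of the rule, e.g. "NP"
    src_rhs    -- source side, e.g. ["value", "of", 0], where ints index gaps
    gap_labels -- target labels of the gaps, e.g. {0: "NP-AG"}
    Returns a monolingual rule (LHS, projected RHS) usable for chart parsing.
    """
    projected = [gap_labels[sym] if isinstance(sym, int) else sym for sym in src_rhs]
    return lhs, projected

# Example (hypothetical rule):
# project_rule("NP", ["value", "of", 0], {0: "NP-AG"})
# -> ("NP", ["value", "of", "NP-AG"])
```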

29 Non-local features, such as n-gram language model scores, are incorporated through cube pruning (Chiang, 2007). [sent-59, score-0.657]

30 1 Motivations The abstract rules and large non-terminal sets of many syntactic MT grammars cause translation (Figure 2: Two example alignments; (a) en-de, (b) en-jp.) [sent-61, score-0.421]

31 In (a) “the products” does not have a consistent alignment on the target side, while it does in (b). [sent-62, score-0.157]

32 overgeneration at the span level and render decoding inefficient. [sent-63, score-0.284]

33 Prior work on monolingual syntactic parsing has demonstrated that by excluding chart cells that are likely to violate constituent constraints, decoding efficiency can be improved with no loss in accuracy (Roark and Hollingshead, 2008). [sent-64, score-1.087]

34 We consider a similar mechanism for syntactic MT decoding by prohibiting subtranslation generation for chart cells violating synchronous-grammar constraints. [sent-65, score-0.837]

35 A motivating example is shown in Figure 2a, where a segment of an English-German sentence pair from the training data, along with its word alignment and target-side parse tree is depicted. [sent-66, score-0.061]

36 The English phrases “value of” and “the products” do not have corresponding German translations in this example. [sent-67, score-0.042]

37 Although the grammar may have rules to translate these two phrases, they can be safely pruned for this particular sentence pair. [sent-68, score-0.221]

38 In contrast to chart pruning for monolingual parsing, our pruning decisions are based on the source context, its target translation and the mapping between the two. [sent-69, score-1.609]

39 This distinction is important since the syntactic correspondence between different language pairs is different. [sent-70, score-0.036]

40 Suppose that we were to translate the same English sentence into Japanese (Figure 2a); unlike the English to German example, the English phrase “the products” will be a valid phrase that has a Japanese translation under a target constituent, since it is syntactically aligned to “製品” (Figure 2b). [sent-71, score-0.346]

41 The key question to consider is how to inject target syntax and word alignment information into our labelling models, so that pruning decisions can be based on the source alone; we address this in the following two sections. [sent-72, score-0.885]

42 2 Pruning by Labelling We use binary tags to indicate whether a source word can start or end a multi-word phrase that has a consistently aligned target phrase. (Figure 3: The pruning effects of two types of binary tags; (a) b-tags 1 0 1 1 1, (b) e-tags 1 1 1 0 1.) [sent-74, score-0.684]

43 The shaded cells are pruned and two types of tags are assigned independently. [sent-75, score-0.413]

44 Under this scheme, a b-tag value of 1 indicates that a source word can be the start of a source phrase that has a consistently aligned target phrase; similarly an e-tag of 0 indicates that a word cannot end a source phrase. [sent-78, score-0.451]

45 If either the b-tag or the e-tag of an input phrase is 0, the corresponding chart cells will be pruned. [sent-79, score-0.629]

46 The pruning effects of the two types of tags are illustrated in Figure 3. [sent-80, score-0.56]

47 In general, 0-valued b-tags prune a whole column of chart cells and 0-valued e-tags prune a whole diagonal of cells; and the chart cells on the first row and the top-most cell are always kept so that complete translations can always be found. [sent-81, score-1.353]
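To make the cell-level effect concrete, here is a minimal Python sketch, not the authors' implementation, of turning b- and e-tags into a set of pruned CKY cells while keeping single-word spans and the top-most cell open, as described above.

```python
# Minimal sketch (not the authors' implementation) of deriving the set of
# pruned CKY chart cells from b- and e-tags. A cell is the span [i, j] over
# the source words, i <= j, indices from 0.

def pruned_cells(b_tags, e_tags):
    n = len(b_tags)
    pruned = set()
    for i in range(n):
        for j in range(i + 1, n):        # multi-word spans only: the first row is kept
            if i == 0 and j == n - 1:    # the top-most cell is kept so that a complete
                continue                 # translation can always be found
            # A 0-valued b-tag at i prunes the column of spans starting at word i;
            # a 0-valued e-tag at j prunes the diagonal of spans ending at word j.
            if b_tags[i] == 0 or e_tags[j] == 0:
                pruned.add((i, j))
    return pruned

# With the Figure 3 tags, b = [1, 0, 1, 1, 1] and e = [1, 1, 1, 0, 1], this
# prunes every multi-word span that starts at word 1 or ends at word 3;
# all other cells, including the top-most cell (0, 4), stay open.
```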

48 We build a separate labeller for each tag type using gold-standard b- and e-tags, respectively. [sent-82, score-0.042]

49 We train the labellers with maximum-entropy models (Curran and Clark, 2003; Ratnaparkhi, 1996), using features similar to those used for supertagging for CCG parsing (Clark and Curran, 2004). [sent-83, score-0.256]

50 In each case, features for a pruning tag consist of word and POS uni-grams extracted from the 5-word window with the current word in the middle, POS trigrams ending with the current word, as well as two previous tags as a bigram and two separate uni-grams. [sent-84, score-0.602]
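The feature templates just described could look roughly like the following Python sketch; the feature-string format and function name are illustrative assumptions, not the authors' exact templates.

```python
# Sketch of the feature templates described above; the exact template strings
# are not given in the summary, so these are assumed.

def pruning_tag_features(words, pos, prev_tags, i):
    feats = []
    # Word and POS uni-grams from the 5-word window centred on word i.
    for off in range(-2, 3):
        j = i + off
        if 0 <= j < len(words):
            feats.append(f"w[{off}]={words[j]}")
            feats.append(f"p[{off}]={pos[j]}")
    # A POS trigram ending with the current word.
    if i >= 2:
        feats.append(f"p[-2..0]={pos[i-2]}_{pos[i-1]}_{pos[i]}")
    # The two previous pruning tags, as a bigram and as two separate uni-grams.
    if len(prev_tags) >= 2:
        feats.append(f"t[-2]t[-1]={prev_tags[-2]}_{prev_tags[-1]}")
        feats.append(f"t[-2]={prev_tags[-2]}")
        feats.append(f"t[-1]={prev_tags[-1]}")
    return feats
```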

51 Our pruning labellers are highly efficient, run in linear time and add little overhead to decoding. [sent-85, score-0.713]

52 During testing, in order to prevent overpruning, a probability cutoff value θ is used. [sent-86, score-0.106]

53 A tag value of 0 is assigned to a word only if its marginal probability is greater than θ. [sent-87, score-0.074]
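A minimal sketch of this cutoff, assuming the maximum-entropy labeller exposes per-word marginal probabilities for the 0 tag (an interface assumption, not the authors' API):

```python
# Assign the pruning value 0 to a word only when the labeller's marginal
# probability P(tag = 0) exceeds the cutoff theta; otherwise default to 1
# (i.e. do not prune). A higher theta means more conservative pruning.

def apply_cutoff(p_zero, theta):
    """p_zero[i] is the marginal probability that word i receives tag 0."""
    return [0 if p > theta else 1 for p in p_zero]

# Example: apply_cutoff([0.97, 0.40, 0.99], theta=0.95) -> [0, 1, 0]
```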

54 3 Gold-standard Pruning Tags Gold-standard tags are extracted from the word-aligned bitext used for translation model training, respecting rule extraction constraints, which is crucial for the success of our method. [sent-89, score-0.354]

55 First, we initialize both tags of each source word to 0s. [sent-91, score-0.121]

56 Then, we iterate through all target constituent spans, and for each span, we find its corresponding source phrase, as determined by the word alignment. [sent-92, score-0.211]

57 If a constituent exists for the phrase pair, the b-tag of the first word and the e-tag of the last word in the source phrase are set to 1s, respectively. [sent-93, score-0.283]

58 Taking the target constituent span covering “der Produkte” as an example, the source phrase under a consistent word alignment is “of the products”. [sent-98, score-0.426]

59 After considering all target constituent spans, the complete b- and e-tag sequences for the source-side phrase in Figure 2a are [1, 1, 0, 0] and [0, 0, 1, 1], respectively. [sent-100, score-0.257]

60 Note that, since we never prune single-word spans, we ignore source phrases under consistent one-to-one or one-to-many alignments. [sent-101, score-0.216]
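Putting the steps above together, a possible sketch of the gold-standard tag extraction is shown below; the alignment and span representations are assumptions for illustration, not the authors' code.

```python
# Sketch (not the authors' implementation) of extracting gold-standard
# b-/e-tags from one word-aligned sentence pair with a target-side parse.
# `align` is a set of (src_i, tgt_j) links; `target_spans` lists the
# (t1, t2) target spans dominated by a constituent.

def gold_tags(n_src, align, target_spans):
    b = [0] * n_src                       # initialise both tag sequences to 0s
    e = [0] * n_src
    for (t1, t2) in target_spans:
        src = [i for (i, j) in align if t1 <= j <= t2]
        if not src:
            continue
        s1, s2 = min(src), max(src)
        if s1 == s2:
            continue                      # single-word spans are never pruned, so ignored
        # The source phrase must be consistently aligned: no word in [s1, s2]
        # may link to a target word outside [t1, t2].
        if all(t1 <= j <= t2 for (i, j) in align if s1 <= i <= s2):
            b[s1] = 1                     # first word may start an aligned phrase
            e[s2] = 1                     # last word may end an aligned phrase
    return b, e

# For the fragment "value of the products" in Figure 2a, the paper reports
# the resulting sequences b = [1, 1, 0, 0] and e = [0, 0, 1, 1].
```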

61 69% of the 54M words do not begin a multi-word aligned phrase and 77. [sent-103, score-0.137]

62 71% do not end a multi-word aligned phrase; the 1-best accuracies of the two labellers tested on a held-out set of 20K sentences are 82. [sent-104, score-0.287]

63 [Figure 4 panel (b): hypothesis count (×10^6) vs. BLEU] [sent-108, score-0.071]

64 Figure 4: Translation quality comparison with the cube pruning baseline. [sent-109, score-0.657]

65 For all experiments, a 5-gram language model with Kneser-Ney smoothing (Chen and Goodman, 1996) built with the SRILM Toolkit (Stolcke and others, 2002) is used. [sent-117, score-0.036]

66 The development and test sets are the 2008 WMT newstest (2,051 sentences) and 2009 WMT newstest (2,525 sentences) respectively. [sent-118, score-0.142]

67 For both rule extraction and decoding, up to seven terminal/non-terminal symbols on the source-side are allowed. [sent-121, score-0.07]

68 For decoding, the maximum span length is restricted to 15, and the grammar is prefiltered to match the entire test set for both the baseline system and the chart pruning decoder. [sent-122, score-0.932]

69 We use two labellers to perform b- and e-tag labelling independently prior to decoding. [sent-123, score-0.389]

70 Training of the labelling models is able to complete in under 2. [sent-124, score-0.22]

71 A standard perceptron POS tagger (Collins, 2002) trained on Wall Street Journal sections 2-21 of the Penn Treebank is used to assign POS tags for both our training and test data. [sent-126, score-0.09]

72 2 Results Figures 4a and 4b compare CSP with the cube pruning baseline in terms of BLEU. [sent-128, score-0.69]

73 Decoding speed is measured by the average decoding time and average number of hypotheses generated per sentence. [sent-129, score-0.246]

74 We first run the baseline decoder under various beam settings (b = 100 - 2500) until no further increase in BLEU is observed. [sent-130, score-0.17]

75 We then run the CSP decoder with a range of θ values (θ = 0. [sent-131, score-0.062]

76 99), at the default beam size of 1000 of the baseline. [sent-133, score-0.075]

77 The CSP decoder, which considers far fewer chart cells and generates significantly fewer subtranslations, consistently outperforms the slower baseline. [sent-135, score-0.679]

78 At all levels of comparable translation quality, our decoder is faster than the baseline. [sent-139, score-0.154]

79 58% as measured by average decoding time, and comparing on a point-by-point basis, our decoder always runs over 60% faster. [sent-141, score-0.266]

80 30%, compared with a beam size of 400 for the baseline, where both achieved the highest BLEU. [sent-144, score-0.075]

81 Figures 5a and 5b demonstrate the pruning power of CSP (θ = 0. [sent-145, score-0.5]

82 95) in comparison with the baseline (beam size = 300); across all the cutoff values and beam sizes, the CSP decoder considers 54. [sent-146, score-0.244]

83 92% fewer translation hypotheses on average and the minimal reduction achieved is 46. [sent-147, score-0.168]

84 Figure 6 shows the percentage of spans of different lengths pruned by CSP (θ = 0. [sent-149, score-0.233]

85 [Figure 5 panel labels: (a) sentence length vs. hypo count, (b) sentence length vs. cell count] [sent-152, score-0.071]

86 Figure 5: Search space comparison with the cube pruning baseline. [sent-153, score-0.692]

87 Figure 6: Percentage of spans of different lengths pruned at θ = 0. [sent-154, score-0.233]

88 As expected, longer spans are pruned more often, as they are more likely to be at the intersections of cells pruned by the two types of pruning labels, and can thus be pruned by either type. [sent-156, score-1.17]

89 We also find that CSP does not improve search quality; it leads to slightly lower model scores, which shows that some higher-scoring translation hypotheses are pruned. [sent-157, score-0.134]

90 Our pruning decisions are based on independent labellers using contextual information, with the objective of eliminating unlikely subtranslations and rule applications. [sent-159, score-0.969]

91 It may even offset defects of the translation model (i. [sent-160, score-0.092]

92 Finally, it is worth noting that our string-to-tree model does not force complete target parses to be built during decoding, and our pruning method does not require them either. [sent-164, score-0.598]

93 We do not use any other heuristics (other than keeping singleton cells and the top-most cell) to ensure that complete translations are always possible. [sent-165, score-0.136]

94 The hypothesis here is that good labelling models should not affect the derivation of complete target translations. [sent-166, score-0.274]

95 5 Conclusion We presented a novel sequence-labelling-based, context-sensitive pruning method for a string-to-tree MT model. [sent-167, score-0.676]

96 Our method achieves more than 60% speed-up over a state-of-the-art baseline on a full-scale translation task. [sent-168, score-0.125]

97 In future work, we plan to adapt our method to models with different rule extraction algorithms, such as Hiero and forest-based translation (Mi and Huang, 2008). [sent-169, score-0.162]

98 An empirical study of smoothing techniques for language modeling. [sent-179, score-0.036]

99 Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. [sent-198, score-0.03]

100 Scalable inference and training of context-rich syntactic translation models. [sent-225, score-0.128]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pruning', 0.5), ('chart', 0.329), ('cells', 0.237), ('csp', 0.219), ('labellers', 0.213), ('decoding', 0.204), ('labelling', 0.176), ('cube', 0.157), ('pruned', 0.116), ('products', 0.097), ('constituent', 0.096), ('translation', 0.092), ('koehn', 0.09), ('spans', 0.085), ('mt', 0.083), ('roark', 0.081), ('curran', 0.076), ('beam', 0.075), ('bleu', 0.075), ('cutoff', 0.074), ('aligned', 0.074), ('hollingshead', 0.071), ('hypo', 0.071), ('kristy', 0.071), ('newstest', 0.071), ('produkte', 0.071), ('subtranslations', 0.071), ('wenduan', 0.071), ('efficiency', 0.071), ('prune', 0.071), ('grammar', 0.07), ('rule', 0.07), ('zhang', 0.063), ('phrase', 0.063), ('decoder', 0.062), ('alignment', 0.061), ('source', 0.061), ('tags', 0.06), ('denero', 0.058), ('bitext', 0.058), ('galley', 0.058), ('sutd', 0.058), ('chiang', 0.057), ('nn', 0.057), ('target', 0.054), ('philipp', 0.051), ('yue', 0.049), ('span', 0.049), ('hf', 0.048), ('hopkins', 0.048), ('rescoring', 0.046), ('consistently', 0.045), ('heafield', 0.045), ('clark', 0.044), ('complete', 0.044), ('och', 0.043), ('parsing', 0.043), ('additive', 0.042), ('hypotheses', 0.042), ('unlikely', 0.042), ('consistent', 0.042), ('phrases', 0.042), ('tag', 0.042), ('orthogonal', 0.04), ('monolingual', 0.04), ('eliminating', 0.04), ('williams', 0.04), ('projecting', 0.038), ('ccg', 0.037), ('europarl', 0.037), ('wmt', 0.037), ('huang', 0.036), ('syntactic', 0.036), ('hao', 0.036), ('smoothing', 0.036), ('cell', 0.035), ('rules', 0.035), ('min', 0.034), ('fewer', 0.034), ('brian', 0.034), ('baseline', 0.033), ('decisions', 0.033), ('synchronous', 0.033), ('lengths', 0.032), ('value', 0.032), ('reform', 0.031), ('istd', 0.031), ('srg', 0.031), ('overgeneration', 0.031), ('pected', 0.031), ('isf', 0.031), ('bottlenecks', 0.031), ('kon', 0.031), ('prohibiting', 0.031), ('stolcke', 0.031), ('german', 0.031), ('loss', 0.031), ('perceptron', 0.03), ('gildea', 0.03), ('pages', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

2 0.1992811 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language

Author: Samira Tofighi Zahabi ; Somayeh Bakhshaei ; Shahram Khadivi

Abstract: Mapping phrases between languages as translation of each other by using an intermediate language (pivot language) may generate translation pairs that are wrong. Since a word or a phrase has different meanings in different contexts, we should map source and target phrases in an intelligent way. We propose a pruning method based on the context vectors to remove those phrase pairs that connect to each other by a polysemous pivot phrase or by weak translations. We use context vectors to implicitly disambiguate the phrase senses and to recognize irrelevant phrase translation pairs. Using the proposed method, a relative improvement of 2.8 percent in terms of BLEU score is achieved.

3 0.1886946 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than translation using phrase-based and hierarchical-phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

4 0.17924014 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

Author: Yang Liu

Abstract: We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.

5 0.15322646 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Author: Fabienne Braune ; Nina Seemann ; Daniel Quernheim ; Andreas Maletti

Abstract: We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution we make the developed software and complete tool-chain publicly available for further experimentation.

6 0.15083544 314 acl-2013-Semantic Roles for String to Tree Machine Translation

7 0.14807384 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

8 0.14789334 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

9 0.14713244 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

10 0.14409813 325 acl-2013-Smoothed marginal distribution constraints for language modeling

11 0.14330158 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

12 0.14289187 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

13 0.12462708 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

14 0.12090355 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

15 0.11127645 255 acl-2013-Name-aware Machine Translation

16 0.11022298 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

17 0.10897954 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling

18 0.10732938 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner

19 0.10695006 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

20 0.10631884 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.245), (1, -0.224), (2, 0.079), (3, 0.125), (4, -0.081), (5, 0.048), (6, 0.035), (7, -0.003), (8, -0.003), (9, 0.026), (10, -0.005), (11, 0.041), (12, 0.018), (13, -0.038), (14, 0.063), (15, -0.021), (16, 0.005), (17, 0.002), (18, -0.019), (19, 0.058), (20, -0.041), (21, -0.01), (22, -0.008), (23, 0.027), (24, -0.008), (25, -0.018), (26, -0.019), (27, -0.03), (28, 0.009), (29, 0.04), (30, 0.045), (31, 0.02), (32, -0.039), (33, 0.004), (34, -0.022), (35, -0.032), (36, 0.073), (37, -0.016), (38, -0.053), (39, 0.002), (40, -0.057), (41, 0.092), (42, -0.065), (43, 0.09), (44, 0.093), (45, 0.107), (46, -0.038), (47, 0.028), (48, 0.02), (49, -0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94770509 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

2 0.82756484 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Author: Fabienne Braune ; Nina Seemann ; Daniel Quernheim ; Andreas Maletti

Abstract: We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution we make the developed software and complete tool-chain publicly available for further experimentation.

3 0.81619239 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than translation using phrase-based and hierarchical-phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

4 0.79286581 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

Author: Yang Liu

Abstract: We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.

5 0.77747613 312 acl-2013-Semantic Parsing as Machine Translation

Author: Jacob Andreas ; Andreas Vlachos ; Stephen Clark

Abstract: Semantic parsing is the problem of deriving a structured meaning representation from a natural language utterance. Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser. In experiments on the multilingual GeoQuery corpus we find that our parser is competitive with the state of the art, and in some cases achieves higher accuracy than recently proposed purpose-built systems. These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation.

6 0.74922132 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

7 0.69866413 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

8 0.6893633 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

9 0.68723339 328 acl-2013-Stacking for Statistical Machine Translation

10 0.67458159 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

11 0.67023808 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

12 0.6691069 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

13 0.66389561 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

14 0.66112173 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

15 0.65708399 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding

16 0.64749449 314 acl-2013-Semantic Roles for String to Tree Machine Translation

17 0.64535326 330 acl-2013-Stem Translation with Affix-Based Rule Selection for Agglutinative Languages

18 0.63549012 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

19 0.63021171 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling

20 0.6192503 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.071), (6, 0.055), (11, 0.058), (14, 0.012), (24, 0.022), (26, 0.072), (28, 0.013), (35, 0.059), (40, 0.018), (42, 0.103), (48, 0.031), (70, 0.049), (88, 0.035), (90, 0.107), (95, 0.083), (97, 0.139)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87409252 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

2 0.84031886 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation

Author: Felix Hieber ; Laura Jehl ; Stefan Riezler

Abstract: We present an approach to mine comparable data for parallel sentences using translation-based cross-lingual information retrieval (CLIR). By iteratively alternating between the tasks of retrieval and translation, an initial general-domain model is allowed to adapt to in-domain data. Adaptation is done by training the translation system on a few thousand sentences retrieved in the step before. Our setup is time- and memory-efficient and of similar quality as CLIR-based adaptation on millions of parallel sentences.

3 0.80902719 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang

Abstract: In this paper, we address the problem for predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.

4 0.79429573 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Author: Fabienne Braune ; Nina Seemann ; Daniel Quernheim ; Andreas Maletti

Abstract: We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution we make the developed software and complete tool-chain publicly available for further experimentation.

5 0.78746641 390 acl-2013-Word surprisal predicts N400 amplitude during reading

Author: Stefan L. Frank ; Leun J. Otten ; Giulia Galli ; Gabriella Vigliocco

Abstract: We investigated the effect of word surprisal on the EEG signal during sentence reading. On each word of 205 experimental sentences, surprisal was estimated by three types of language model: Markov models, probabilistic phrase-structure grammars, and recurrent neural networks. Four event-related potential components were extracted from the EEG of 24 readers of the same sentences. Surprisal estimates under each model type formed a significant predictor of the amplitude of the N400 component only, with more surprising words resulting in more negative N400s. This effect was mostly due to content words. These findings provide support for surprisal as a generally applicable measure of processing difficulty during language comprehension.

6 0.78570253 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

7 0.78501308 139 acl-2013-Entity Linking for Tweets

8 0.78170431 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

9 0.78056157 242 acl-2013-Mining Equivalent Relations from Linked Data

10 0.77840889 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

11 0.77797568 166 acl-2013-Generalized Reordering Rules for Improved SMT

12 0.77773547 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

13 0.7741245 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

14 0.77376044 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

15 0.76437891 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

16 0.76277936 250 acl-2013-Models of Translation Competitions

17 0.7624197 328 acl-2013-Stacking for Statistical Machine Translation

18 0.76189488 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

19 0.75752705 312 acl-2013-Semantic Parsing as Machine Translation

20 0.7572 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation