acl acl2011 acl2011-155 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. [sent-2, score-1.801]
2 Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. [sent-4, score-0.16]
3 We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks. [sent-5, score-0.137]
4 , 2008; Chiang 2010), many researchers have concentrated on the approaches that improve translation quality using information between hypotheses from one or more SMT systems as well. [sent-8, score-0.378]
5 System combination is built on top of the N-best outputs generated by multiple component systems (Rosti et al. [sent-9, score-0.396]
6 , 2009b) which aligns multiple hypotheses to build confusion networks as new search spaces, and outputs 1258 Mu Li, and Ming Zhou Natural Language Computing Group Microsoft Research Asia Beijing, China { mul i mingzhou } @mi cros o ft . [sent-12, score-0.426]
7 Because hypotheses generated by a single model are highly correlated, improvements obtained are usually small; recently, dedicated efforts have been made to extend it from single system to multiple systems (Li et al. [sent-18, score-0.414]
8 Such methods select translations by optimizing consensus models over the combined hypotheses using all component systems’ posterior distributions. [sent-22, score-0.734]
9 Although these two types of approaches have shown consistent improvements over the standard Maximum a Posteriori (MAP) decoding scheme, most of them are implemented as post-processing procedures over translations generated by MAP decoders. [sent-23, score-0.677]
10 (2009a) is different in that both partial and full hypotheses are re-ranked during the decoding phase directly using consensus between translations from different SMT systems. [sent-25, score-0.998]
11 However, their method does not change component systems’ search spaces. [sent-26, score-0.247]
12 This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple component systems. [sent-27, score-1.85]
13 Figure 1: A decoding example of a phrase-based SMT system. [sent-37, score-0.5]
14 Each hypothesis is annotated with a feature vector, which includes a logarithmic probability feature and a word count feature. [sent-38, score-0.188]
15 hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of component model independent features are used to seek the final best translation from this newly constructed search space. [sent-39, score-0.76]
16 We evaluate by combining two SMT models with state-of-the-art performances on the NIST Chinese-to-English translation tasks. [sent-40, score-0.148]
17 Experimental results show that our approach outperforms the best component SMT system by up to 2. [sent-41, score-0.189]
18 Consistent improvements can be observed over several related decoding techniques as well, including word-level system combination, collaborative decoding and model combination. [sent-43, score-1.09]
19 Motivated by the success of system combination research, the key contribution of this work is to make more effective use of the extended search spaces from different SMT models in decoding phase directly, rather than just post-processing their final outputs. [sent-46, score-0.797]
20 1 However, hypotheses generated by different SMT systems cannot be combined directly to form new translations because of two major issues: The first one is the heterogeneous structures of . [sent-49, score-0.418]
21 For example, a string-to-tree system cannot use hypotheses generated by a phrase-based system in its decoding procedure, as such hypotheses are based on flat structures, which cannot provide any additional information needed in the syntactic model. [sent-51, score-1.104]
22 The second one is the incompatible feature spaces of different SMT models. [sent-52, score-0.143]
23 To address these two issues discussed above, we propose HM decoding that performs translation reconstruction using hypotheses generated by multiple component systems. [sent-54, score-1.154]
24 2 Our method involves two decoding stages depicted as follows: 1. [sent-55, score-0.52]
25 Independent decoding stage, in which each component system decodes input sentences independently based on its own model and search algorithm, and the explored search spaces (translation forests) are kept for use in the next stage. [sent-56, score-1.021]
26 1 There are also features independent of translation derivations, such as the language model feature. [sent-57, score-0.159]
27 As a result, any phrase-based SMT system can be used as a component in our HM decoding method. [sent-60, score-0.689]
28 Light-shaded hypotheses come from a phrase-based system, and dark-shaded hypotheses come from a syntax-based system. [sent-62, score-0.24]
29 HM decoding can use lexicalized hypotheses of arbitrary SMT models to derive translation, and a set of component model independent features are used to compute translation confidence. [sent-65, score-1.079]
30 We discuss mixture search space construction, details of model and feature designs as well as HM decoding algorithms in Section 2. [sent-66, score-0.816]
31 2 Let Mixture Search Space Construction denote component MT systems, denote the span of a source sentence starting at position and ending at position . [sent-71, score-0.203]
32 of denoting the search space predicted by , and denoting the mixture search space constructed by the HM decoder, which is defined recursively as follows: This rule adds all component systems’ search spaces into the mixture search space for use in HM decoding. [sent-75, score-1.067]
33 Thus hypotheses produced by all component systems are still available to the HM decoder. [sent-76, score-0.427]
34 of in which and is a translation rule provided by the HM decoder that composes a new hypothesis using smaller hypotheses in the search spaces. These rules further extend with hypotheses generated by the HM decoder itself. [sent-77, score-1.17]
35 Figure 2 shows an example of HM decoding, in which hypotheses generated by two SMT systems are used together to compose new translations. [sent-78, score-0.371]
36 Since search space pruning is an indispensable procedure for all SMT systems, we omit its explicit expression in the following descriptions and algorithms for convenience. [sent-79, score-0.137]
37 where is an HM decoding feature with its corresponding feature weight In this paper, the HM decoder does not assume the availability of any internal knowledge of the underlying component systems. [sent-83, score-0.844]
38 The HM decoding features are independent of component models as well, which fall into two categories: The first category contains a set of consensusbased features, which are inspired by the success of consensus decoding approaches. [sent-84, score-1.341]
39 These features are described in details as follows: 1) : n-gram the posterior feature of computed based on the component search space generated by : 2) , , , is the posterior probability of an n-gram in is the number of times that occurs in equals to 1 when occurs in and 0 otherwise. [sent-85, score-0.749]
40 the stemmed n-gram posterior feature of computed based on the stemmed component search space A word stem dictionary that includes 22,660 entries is used to convert and into their stem forms and by replacing each word into its stem form. [sent-86, score-0.745]
41 3) : n-gram posterior the feature of computed based on the mixture search space generated by the HM decoder: , . [sent-89, score-0.526]
42 Consensus features based on component search spaces have already shown effectiveness (Kumar et al. [sent-91, score-0.39]
43 We leverage consensus features based on the mixture search space newly generated in HM decoding as well. [sent-95, score-1.061]
44 Although there are more features that can be incorporated into HM decoding besides the ones we list below, we only utilize the most representative ones for convenience: 1) 2) 3) : word count feature. [sent-98, score-0.575]
45 : the dictionary-based feature that counts how many lexicon pairs can be found in a given translation pair 4) and : reordering features that penalize the uses of straight and inverted BTG rules during the derivation of in HM decoding. [sent-101, score-0.425]
46 These two features are specific to BTG-based HM decoding (Section 2. [sent-102, score-0.548]
47 1): 5) and : reordering features that penalize the uses of hierarchical and glue rules during the derivation of in HM decoding. [sent-104, score-0.271]
48 These two features are specific to SCFG-based HM decoding (Section 2. [sent-105, score-0.548]
49 2): , is the hierarchical rule set provided by the HM decoder itself, equals to 1 when is provided by and 0 otherwise. [sent-107, score-0.198]
50 6) the feature that counts how many n-grams in are newly generated by the HM decoder, which cannot be found in all existing component search spaces: : , equals to 1 when does not exist in and 0 otherwise. [sent-108, score-0.437]
51 The MERT algorithm (Och, 2003) is used to tune weights of HM decoding features. [sent-109, score-0.5]
52 4 Decoding Algorithms Two CKY-style algorithms for HM decoding are presented in this subsection. [sent-111, score-0.5]
53 1 BTG-based HM Decoding The first algorithm, BTG-HMD, is presented in Algorithm 1, where hypotheses of two consecutive source spans are composed using two BTG rules: Straight rule . [sent-115, score-0.313]
54 Inverted rule It combines translations of two consecutive blocks into a single larger block in an inverted order. [sent-118, score-0.163]
55 We use two reordering rule penalty features, and to penalize the uses of these two rules. [sent-120, score-0.198]
56 Algorithm 1: BTG-based HM Decoding 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: for each component model output the search space end for for to do for all s. [sent-121, score-0.329]
57 as well, and add them to From line 17 to 20, we update the current HM decoding scores for all hypotheses in using the n-gram and length posterior features computed on . [sent-127, score-0.974]
58 Two reordering rule penalty features, and are used to adjust the preferences of using hierarchical rules and glue rules. [sent-134, score-0.267]
59 Algorithm 2: SCFG-based HM Decoding 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: for each component model output the search space end for for to do for all s. [sent-135, score-0.329]
60 , 2010) is an approach that selects translations from a conjoint search space using information from multiple SMT component models; Duan et al. [sent-140, score-0.408]
61 (2010) presents a similar method, which utilizes a mixture model to combine distributions of hypotheses from different systems for Bayes-risk computation, and selects final translations from the combined search spaces using MBR decoding. [sent-141, score-0.685]
62 In contrast, by reusing hypotheses generated by all component systems in HM decoding, translations beyond any existing search space can be generated. [sent-143, score-0.715]
63 (2009a) proposes collaborative decoding, an approach that combines translation systems by re-ranking partial and full translations iteratively using n-gram features from the predictions of other member systems. [sent-146, score-0.382]
64 However, in co-decoding, all member systems must work in a synchronous way, and hypotheses between different systems cannot be shared during decoding procedure; Liu et al. [sent-147, score-0.83]
65 (2009) proposes joint-decoding, in which multiple SMT models are combined at either the translation or the derivation level. [sent-148, score-0.159]
66 HM decoding, on the other hand, can use hypotheses from component search spaces directly without any restriction. [sent-150, score-0.582]
67 This method uses the system combination technique in decoding directly to combine partial hypotheses from different SMT models. [sent-154, score-0.875]
68 What’s more, partial hypotheses generated by confusion network decoding cannot be assigned exact feature values for future use in higher level decoding, and they only use feature values of 1-best hypothesis as an approximation. [sent-156, score-1.029]
69 HM decoding, on the other hand, leverages a set of enriched features, which are computable for all the hypotheses generated by either component systems or the HM decoder. [sent-157, score-0.493]
70 , 2002), which compute the brevity penalty using the shortest reference translation for each segment. [sent-167, score-0.173]
71 2 Component Systems For convenience of comparing HM decoding with several related decoding techniques, we include two state-of-the-art SMT systems as component systems only: PB. [sent-171, score-1.24]
72 Phrasal rules are extracted on all bilingual data, hierarchical rules used in DHPB and reordering rules used in SCFG-HMD are extracted from a selected data set3. [sent-178, score-0.218]
73 3 Contrastive Techniques We compare HM decoding with three multiplesystem based decoding techniques: Word-Level System Combination (SC). [sent-182, score-1.0]
74 4 Comparison to Component Systems We compared HM decoding with two component SMT systems first (in Table 2). [sent-195, score-0.687]
75 HMD or SCFG-HMD; 4 count features for newly generated n-grams in HM decoding for All n-gram posteriors are computed using the efficient algorithm proposed by Kumar et al. [sent-197, score-0.715]
76 decoding (*: significantly better than each component system with < 0. [sent-204, score-0.689]
77 01) From Table 2 we can see that both BTG-HMD and SCFG-HMD outperform the decoding results of the best component system (DHPB) with significant improvements: +1. [sent-205, score-0.689]
78 We think the potential reason is that more reordering rules are used in SCFG-HMD to handle phrase movements than in BTG-HMD; however, the current HM decoding model lacks the ability to distinguish the qualities of different rules. [sent-213, score-0.614]
79 8 n-gram posterior features based on 2 component search spaces plus 3 commonly used features (1 LM feature, 1 word count feature and 1 dictionary-based feature). [sent-216, score-0.629]
80 8 stemmed n-gram posterior features based on 2 stemmed component search spaces. [sent-218, score-0.577]
81 4 n-gram posterior features and 1 length posterior feature based on the mixture search space of the HM decoder. [sent-220, score-0.596]
82 4 count features for unseen n-grams generated by HM decoder itself. [sent-224, score-0.229]
83 Except for the dictionary-based feature, all the features contained in Set-1 are used by the latest multiple-system based consensus decoding techniques (DeNero et al. [sent-225, score-0.701]
84 Each time, we add one more feature set and describe the changes of performances by drawing two curves for each HM decoding algorithm on MT08 in Figure 3. [sent-229, score-0.604]
85 5 Comparison to System Combination Word-level system combination is state-of-the-art method to improve translation performance using outputs generated by multiple SMT systems. [sent-234, score-0.349]
86 In this paper, we compare our HM decoding with the combination method proposed by Li et al. [sent-235, score-0.566]
87 We think the potential reason for these improvements is that, system combination can only use a small portion of the component systems’ search spaces; HM decoding, on the other hand, can make full use of the entire translation spaces of all component systems. [sent-245, score-0.734]
88 6 Comparison to Consensus Decoding Consensus decoding is another decoding technique that motivates our approach. [sent-247, score-1.0]
89 We compare our HM decoding with two latest multiple-system based consensus decoding approaches, co-decoding and model combination. [sent-248, score-1.153]
90 We list the comparison results in Table 4, in which CD-PB and CD-DHPB denote the translation results of two member systems in co-decoding respectively, CD-Comb denotes the results of further combination using outputs of CD-PB and CD-DHPB, MC denotes the results of model combination. [sent-249, score-0.291]
91 significantly better than the best result of consensus decoding methods with < 0. [sent-250, score-0.661]
92 7 System Combination over BTG-HMD and SCFG-HMD Outputs As BTG-HMD and SCFG-HMD are based on two different decoding grammars, we could perform system combination over the outputs of these two settings (SCBTG+SCFG) for further improvements as well, just as Li et al. [sent-254, score-0.672]
93 of BTG-HMD and SCFG-HMD (+: significantly better than the best HM decoding algorithm (SCFG-HMD) with < 0. [sent-261, score-0.5]
94 05) After system combination, translation results are significantly better than all decoding approaches investigated in this paper: up to 2. [sent-262, score-0.64]
95 11 BLEU points over the best component system (DHPB), up to 1. [sent-263, score-0.213]
96 8 Evaluation of Oracle Translations In the last part, we evaluate the quality of oracle translations on the n-best lists generated by HM decoding and all decoding approaches discussed in this paper. [sent-268, score-1.184]
97 (2007), and each decoding approach outputs its 1000-best hypotheses, which are used to extract oracle translations. [sent-270, score-0.584]
98 significantly better than the best multiple-system based decoding method (CD-Comb) with < 0. [sent-271, score-0.5]
99 5 Conclusion In this paper, we have presented the hypothesis mixture decoding approach to combine multiple SMT models, in which hypotheses generated by multiple component systems are used to compose new translations. [sent-273, score-1.279]
100 The HM decoding method integrates the advantages of both system combination and consensus decoding techniques into a unified framework. [sent-274, score-1.228]
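The extracted sentences above describe two ingredients whose formulas were lost in extraction: BTG straight and inverted composition of hypotheses from consecutive spans, and n-gram posterior features computed over a search space. The following is a minimal sketch of both, not the authors' implementation. It assumes a search space can be approximated by a list of (hypothesis, posterior) pairs whose posteriors sum to 1; the function names btg_compose, ngram_posteriors and ngram_posterior_feature are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """All contiguous n-grams of a token list (empty if the list is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def btg_compose(left: str, right: str) -> List[str]:
    """Compose translations of two consecutive spans: straight order, then inverted order."""
    return [f"{left} {right}", f"{right} {left}"]


def ngram_posteriors(space: List[Tuple[str, float]], n: int) -> Dict[Ngram, float]:
    """Posterior of each n-gram: sum of the posteriors of hypotheses that contain it.

    ASSUMPTION: `space` is an n-best style approximation of a search space,
    given as (hypothesis, posterior probability) pairs.
    """
    post: Dict[Ngram, float] = {}
    for hyp, prob in space:
        # set(...) implements the 0/1 indicator: an n-gram counts once per hypothesis.
        for gram in set(ngrams(hyp.split(), n)):
            post[gram] = post.get(gram, 0.0) + prob
    return post


def ngram_posterior_feature(hyp: str, post: Dict[Ngram, float], n: int) -> float:
    """Feature value: for each n-gram in `hyp`, occurrence count times its posterior."""
    counts = Counter(ngrams(hyp.split(), n))
    return sum(count * post.get(gram, 0.0) for gram, count in counts.items())


# Toy search space (invented data, for illustration only).
space = [
    ("the economy grows", 0.5),
    ("the economy develops", 0.3),
    ("economy of china grows", 0.2),
]
bigram_post = ngram_posteriors(space, 2)
candidates = btg_compose("the economy", "of china develops")
scores = {c: ngram_posterior_feature(c, bigram_post, 2) for c in candidates}
```

In this toy setting the straight-order candidate shares more bigrams with the existing hypotheses than the inverted one, so it receives the higher consensus score. A real decoder would combine several such features with tuned weights and prune each span's hypothesis list.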
wordName wordTfidf (topN-words)
[('hm', 0.605), ('decoding', 0.5), ('hypotheses', 0.24), ('smt', 0.179), ('component', 0.16), ('consensus', 0.133), ('mixture', 0.131), ('posterior', 0.116), ('translation', 0.111), ('spaces', 0.095), ('decoder', 0.088), ('search', 0.087), ('translations', 0.085), ('stemmed', 0.083), ('reordering', 0.076), ('economic', 0.074), ('duan', 0.07), ('generated', 0.066), ('combination', 0.066), ('btg', 0.065), ('hypothesis', 0.065), ('dhpb', 0.064), ('kumar', 0.064), ('denero', 0.063), ('bleu', 0.061), ('china', 0.056), ('rule', 0.052), ('outputs', 0.051), ('space', 0.05), ('growth', 0.049), ('scbtg', 0.048), ('features', 0.048), ('li', 0.048), ('feature', 0.048), ('newly', 0.046), ('mc', 0.043), ('decodes', 0.043), ('penalty', 0.042), ('nist', 0.041), ('ming', 0.041), ('mu', 0.04), ('partial', 0.04), ('composing', 0.039), ('scfg', 0.039), ('compose', 0.038), ('rules', 0.038), ('performances', 0.037), ('member', 0.036), ('shankar', 0.036), ('collaborative', 0.035), ('dongdong', 0.035), ('oracle', 0.033), ('scfghmd', 0.032), ('economy', 0.032), ('reconstruction', 0.032), ('end', 0.032), ('glue', 0.031), ('equals', 0.03), ('stem', 0.03), ('system', 0.029), ('forests', 0.029), ('nificantly', 0.028), ('tianjin', 0.028), ('computed', 0.028), ('hierarchical', 0.028), ('penalize', 0.028), ('straight', 0.028), ('nan', 0.028), ('chiang', 0.028), ('count', 0.027), ('systems', 0.027), ('convenience', 0.026), ('inverted', 0.026), ('improvements', 0.026), ('tromble', 0.026), ('multiple', 0.026), ('mbr', 0.025), ('cui', 0.025), ('rosti', 0.025), ('points', 0.024), ('xiaodong', 0.023), ('update', 0.023), ('minimum', 0.023), ('franz', 0.023), ('derivation', 0.022), ('decoders', 0.022), ('span', 0.022), ('confusion', 0.022), ('mt', 0.021), ('paradigms', 0.021), ('seek', 0.021), ('source', 0.021), ('stages', 0.02), ('kept', 0.02), ('compute', 0.02), ('covered', 0.02), ('final', 0.02), ('latest', 0.02), ('performs', 0.019), ('add', 0.019)]
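The word list above happens to be a valid Python literal: a list of (word, weight) tuples. As a small sketch of how it could be consumed, the snippet below parses such a string with the standard library and returns the highest-weighted words. The helper name top_words is hypothetical, and the excerpt is copied from the first entries of the list.

```python
import ast
from typing import List, Tuple


def top_words(tfidf_literal: str, k: int = 5) -> List[Tuple[str, float]]:
    """Parse a "[('word', weight), ...]" string and return the k highest-weighted pairs."""
    pairs = ast.literal_eval(tfidf_literal)  # safe parsing of Python literals only
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Excerpt of the list above.
excerpt = "[('hm', 0.605), ('decoding', 0.5), ('hypotheses', 0.24), ('smt', 0.179)]"
best = top_words(excerpt, 2)
```

Using ast.literal_eval rather than eval keeps the parsing restricted to literals, which matters when the text comes from a scraped page.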
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
2 0.25321683 220 acl-2011-Minimum Bayes-risk System Combination
Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta
Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows to efficiently apply MBR in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach bring significant improvements to single-system-based MBR decoding and achieves comparable results to different state-of-the-art system combination methods.
3 0.23003598 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
Author: Jingbo Zhu ; Tong Xiao
Abstract: To address the parse error issue for tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at the decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models.
4 0.18881717 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
5 0.18874837 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
Author: Bing Xiang ; Abraham Ittycheriah
Abstract: In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between the maximum-likelihood training and the discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.
6 0.17606963 217 acl-2011-Machine Translation System Combination by Confusion Forest
7 0.16060266 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
8 0.14009386 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
9 0.13579561 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
10 0.12925281 266 acl-2011-Reordering with Source Language Collocations
11 0.12511486 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
12 0.12161637 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
13 0.11993725 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
14 0.1171935 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
15 0.11292429 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
16 0.11261449 44 acl-2011-An exponential translation model for target language morphology
17 0.11137236 61 acl-2011-Binarized Forest to String Translation
18 0.10992268 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
19 0.10847783 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
20 0.10795546 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals
topicId topicWeight
[(0, 0.211), (1, -0.212), (2, 0.132), (3, 0.053), (4, 0.05), (5, 0.024), (6, -0.114), (7, -0.044), (8, -0.001), (9, 0.024), (10, -0.004), (11, -0.048), (12, 0.001), (13, -0.115), (14, -0.012), (15, -0.017), (16, -0.03), (17, -0.009), (18, -0.093), (19, -0.001), (20, -0.012), (21, -0.071), (22, 0.107), (23, 0.075), (24, -0.029), (25, -0.076), (26, 0.071), (27, 0.027), (28, -0.072), (29, 0.133), (30, -0.062), (31, 0.022), (32, -0.009), (33, -0.021), (34, -0.003), (35, 0.062), (36, -0.048), (37, -0.011), (38, -0.025), (39, -0.107), (40, 0.204), (41, 0.118), (42, -0.059), (43, -0.079), (44, 0.16), (45, -0.019), (46, 0.053), (47, -0.104), (48, -0.109), (49, 0.14)]
simIndex simValue paperId paperTitle
same-paper 1 0.96950996 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
2 0.89294201 220 acl-2011-Minimum Bayes-risk System Combination
Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta
Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows to efficiently apply MBR in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach bring significant improvements to single-system-based MBR decoding and achieves comparable results to different state-of-the-art system combination methods.
3 0.78408164 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
Author: Jingbo Zhu ; Tong Xiao
Abstract: To address the parse error issue for tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at the decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models.
4 0.70710611 217 acl-2011-Machine Translation System Combination by Confusion Forest
Author: Taro Watanabe ; Eiichiro Sumita
Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.
5 0.63033599 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Author: Alexander M. Rush ; Michael Collins
Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
6 0.62087899 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
8 0.55275232 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
9 0.55087918 106 acl-2011-Dual Decomposition for Natural Language Processing
10 0.54237092 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
11 0.54219311 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
12 0.54095864 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
13 0.53191173 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
14 0.53182638 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
15 0.50276464 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
16 0.50136632 61 acl-2011-Binarized Forest to String Translation
17 0.49691576 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
18 0.48960268 263 acl-2011-Reordering Constraint Based on Document-Level Context
19 0.48895979 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
20 0.48696423 313 acl-2011-Two Easy Improvements to Lexical Weighting
topicId topicWeight
[(5, 0.031), (17, 0.068), (26, 0.041), (37, 0.123), (39, 0.053), (41, 0.056), (55, 0.03), (59, 0.064), (62, 0.012), (72, 0.022), (88, 0.013), (91, 0.043), (96, 0.236), (97, 0.019), (99, 0.109)]
simIndex simValue paperId paperTitle
1 0.98172969 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
Author: Ming Tan ; Wenli Zhou ; Lei Zheng ; Shaojun Wang
Abstract: This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm that has linear time complexity and a followup EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
2 0.96501529 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework
Author: S.R.K Branavan ; David Silver ; Regina Barzilay
Abstract: This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal is to enrich a stochastic player with high-level guidance expressed in text. Our model jointly learns to identify text that is relevant to a given game state in addition to learning game strategies guided by the selected text. Our method operates in the Monte-Carlo search framework, and learns both text analysis and game strategies based only on environment feedback. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 27% absolute improvement and winning over 78% of games when playing against the built-in AI of Civilization II.
same-paper 3 0.92577571 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
4 0.92191952 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
Author: Ivan Titov ; Alexandre Klementiev
Abstract: We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain.
5 0.92183661 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
6 0.92021334 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
7 0.91974366 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
8 0.9194833 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
9 0.91948247 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
10 0.91860223 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
11 0.91737819 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
12 0.91658974 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
13 0.91560054 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
14 0.91549462 28 acl-2011-A Statistical Tree Annotator and Its Applications
15 0.91541982 44 acl-2011-An exponential translation model for target language morphology
16 0.91535991 193 acl-2011-Language-independent compound splitting with morphological operations
17 0.91527653 133 acl-2011-Extracting Social Power Relationships from Natural Language
18 0.91413265 187 acl-2011-Jointly Learning to Extract and Compress
19 0.91374075 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
20 0.91343021 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations