acl acl2011 acl2011-220 knowledge-graph by maker-knowledge-mining

220 acl-2011-Minimum Bayes-risk System Combination


Source: pdf

Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta

Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows MBR to be applied efficiently in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach brings significant improvements over single-system-based MBR decoding and achieves results comparable to different state-of-the-art system combination methods.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. [sent-4, score-0.641]

2 MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. [sent-7, score-0.23]

3 Experiments show that our approach brings significant improvements over single-system-based MBR decoding and achieves results comparable to different state-of-the-art system combination methods. [sent-8, score-0.367]

4 1 Introduction Once statistical models are trained, a decoding approach determines what translations are finally selected. [sent-9, score-0.205]

5 Two parallel lines of research have shown consistent improvements over the max-derivation decoding objective, which selects the highest probability derivation. [sent-10, score-0.18]

6 Consensus decoding procedures select translations for a single system with a minimum Bayes risk (MBR) (Kumar and Byrne, 2004). [sent-11, score-0.429]

7 System combination procedures, on the other hand, generate translations from the output of multiple component systems by combining the best fragments of these outputs (Frederking and Nirenburg, 1994). [sent-12, score-0.352]

8 In this paper, we present minimum Bayes risk system combination, a technique that unifies these two approaches by learning a consensus translation over multiple underlying component systems. [sent-17, score-0.442]

9 MBR system combination operates directly on the outputs of the component models. [sent-18, score-0.339]

10 We perform an MBR decoding using a linear combination of the component models’ probability distributions. [sent-19, score-0.417]

11 Instead of re-ranking the translations provided by the component systems, we search for the hypothesis with the minimum expected translation error among all the possible finite-length strings in the target language. [sent-20, score-0.442]

12 , 2002), we avoid the hypothesis alignment problem that is central to standard system combination approaches (Rosti et al. [sent-22, score-0.255]

13 MBR system combination assumes only that each translation model can produce expectations of n-gram counts; the latent derivation structures of the component systems can differ arbitrarily. [sent-24, score-0.404]

14 over the best single system max-derivation, and state-of-the-art performance in the system combination task of the ACL 2010 workshop on SMT. [sent-29, score-0.264]

15 2 Related Work MBR system combination is a multi-system generalization of MBR decoding where the space of hypotheses is not constrained to the space of evidences. [sent-32, score-0.658]

16 We expand the space of hypotheses following some underlying ideas of system combination techniques. [sent-33, score-0.416]

17 1 Minimum Bayes risk In SMT, MBR decoding makes it possible to minimize the loss of the output of a single translation system. [sent-35, score-0.376]

18 (2010) present an MBR decoding that makes use of a mixture of different SMT systems to improve translation accuracy. [sent-45, score-0.281]

19 (2010) present model combination, a multi-system lattice MBR decoding on the conjoined evidences spaces of the component systems. [sent-49, score-0.681]

20 Our technique differs in that we perform the search in an extended search space not restricted to the provided evidences, have fewer parameters to learn, and optimize an expected BLEU score instead of the linear BLEU approximation. [sent-50, score-0.27]

21 2 System Combination System combination techniques in MT take as input the outputs {e1, · · · , eN} of N translation systems, where each en is a structured translation object (or N-best lists thereof), typically viewed as a sequence of words. [sent-55, score-0.336]

22 A new search space is constructed from these backbone-aligned outputs and then a voting procedure or feature-based model predicts a final consensus translation (Rosti et al. [sent-57, score-0.304]

23 MBR system combination entirely avoids this alignment problem by considering hypotheses as n-gram occurrence vectors rather than word sequences. [sent-59, score-0.351]
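
As a concrete illustration of the n-gram occurrence representation (a minimal sketch; the helper name and the maximum n-gram order are ours, not from the paper):

from collections import Counter

def ngram_counts(sentence, max_n=4):
    """Map a tokenized sentence to its n-gram occurrence vector (n = 1..max_n)."""
    tokens = sentence.split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

Two hypotheses are then compared through these count vectors, so no word-level alignment between them is ever computed.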

24 MBR system combination performs the decoding in a larger search space and includes statistics from the components’ posteriors, whereas system combination techniques typically do not. [sent-60, score-0.725]

25 Despite these advantages, system combination may be more appropriate in some settings. [sent-61, score-0.209]

26 In particular, MBR system combination is designed primarily for statistical systems that generate N-best or lattice outputs. [sent-62, score-0.268]

27 MBR system combination can integrate non-statistical systems that generate either a single or an unweighted output. [sent-63, score-0.251]

28 However, we would not expect the same strong performance from MBR system combination in these constrained settings. [sent-64, score-0.228]

29 3 Minimum Bayes risk Decoding MBR decoding aims to find the candidate hypothesis that has the least expected loss under a probability model (Bickel and Doksum, 1977). [sent-65, score-0.398]

30 If the loss function between any two hypotheses can be bounded, L(e, e′) ≤ Lmax, the MBR decoder can be rewritten in terms of a similarity function S(e, e′) = Lmax − L(e, e′). [sent-72, score-0.208]
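
The rewrite goes through in one step, because the Lmax term contributes a constant that does not affect the arg min/arg max (written here over a common space E):

argmin_{e′∈E} Σ_{e∈E} P(e|f) · L(e, e′)
  = argmax_{e′∈E} Σ_{e∈E} P(e|f) · (Lmax − L(e, e′))
  = argmax_{e′∈E} Σ_{e∈E} P(e|f) · S(e, e′),

since Σ_{e∈E} P(e|f) · Lmax = Lmax for any e′.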

31 MBR decoding can use different spaces for hypothesis selection and gain computation (arg max and summation in Eq. [sent-74, score-0.377]

32 Therefore, the MBR decoder can be more generally written as follows: ê = argmax_{e′ ∈ Eh} Σ_{e ∈ Ee} P(e|f) · S(e, e′) , (5) where Eh refers to the hypotheses space from which the translations are chosen and Ee refers to the evidences space that is used to compute the Bayes gain. [sent-76, score-0.596]
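
Read operationally, Eq. (5) scores every candidate in the hypotheses space by its posterior-weighted similarity to the evidences. A minimal Python sketch, assuming the evidences are given as (translation, posterior) pairs and similarity is any bounded gain such as sentence-level BLEU:

def mbr_decode(hyp_space, evidences, similarity):
    """Generalized MBR decoder (Eq. 5): pick the hypothesis e' in Eh that
    maximizes the expected gain sum_{e in Ee} P(e|f) * S(e, e')."""
    def expected_gain(e_prime):
        return sum(p * similarity(e, e_prime) for e, p in evidences)
    return max(hyp_space, key=expected_gain)

In standard MBR the two spaces coincide (hyp_space is simply [e for e, _ in evidences]); MBR system combination instead enlarges hyp_space while keeping the evidences fixed.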

33 We will investigate the expansion of the hypotheses space while keeping the evidences space as provided by the decoder. [sent-77, score-0.528]

34 4 MBR System Combination MBR system combination is a multi-system generalization of MBR decoding. [sent-78, score-0.209]

35 It uses the MBR decision rule on a linear combination of the probability distributions of the component systems. [sent-79, score-0.259]

36 Unlike existing MBR decoding methods that re-rank translation outputs, MBR system combination searches for the minimum-risk hypothesis on the complete set of finite-length hypotheses over the output vocabulary. [sent-80, score-0.941]

37 We assume the component systems to be statistically independent and define the Bayes gain as a linear combination of the Bayes gains of the components. [sent-81, score-0.352]
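
Under this assumption the combined gain is a weighted sum of the per-system expected gains; a sketch in the same style as above (variable names are ours):

def combined_bayes_gain(e_prime, systems, scaling, similarity):
    """G(e'|f) = sum_n lambda_n * sum_{e in Dn(f)} Pn(e|f) * S(e, e').
    `systems` holds one evidence list [(e, Pn(e|f)), ...] per component."""
    return sum(
        lam * sum(p * similarity(e, e_prime) for e, p in evidences)
        for lam, evidences in zip(scaling, systems)
    )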

38 Each system provides its own space of evidences Dn(f) and its posterior distribution over translations Pn(e|f). [sent-82, score-0.423]

39 It is worth mentioning that by using a linear combination instead of a mixture model, we avoid the problem of component systems not sharing the same search space (Duan et al. [sent-84, score-0.408]

40 Training of the MBR system combination parameters and decoding in the extended hypotheses space are described below. [sent-86, score-0.592]

41 We used BLEU, choosing the scaling factors to maximize the BLEU score of the set of translations predicted by MBR system combination. [sent-91, score-0.174]
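
One plausible way to carry out this tuning (a sketch, not necessarily the paper's exact procedure; decode and corpus_bleu are hypothetical callables, and we use SciPy's derivative-free Nelder-Mead simplex, a reasonable fit for the non-smooth BLEU objective):

import numpy as np
from scipy.optimize import minimize

def tune_scaling_factors(dev_inputs, dev_refs, decode, corpus_bleu, n_systems):
    """Choose scaling factors maximizing corpus BLEU of the MBR-SC
    translations on a development set."""
    def neg_bleu(lambdas):
        hyps = [decode(f, lambdas) for f in dev_inputs]
        return -corpus_bleu(hyps, dev_refs)
    init = np.full(n_systems, 1.0 / n_systems)  # start from uniform weights
    return minimize(neg_bleu, init, method="Nelder-Mead").x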

42 2 Model Decoding In most MBR algorithms, the hypotheses space is equal to the evidences space. [sent-94, score-0.463]

43 Following the underlying idea of system combination, we are interested in extending the hypotheses space by including new sentences created using fragments of the hypotheses in the evidences spaces of the component models. [sent-95, score-0.787]

44 The AMS algorithm performs a search on a hypotheses space equal to the free monoid Σ∗ over the vocabulary of the evidences Σ = Voc(Ee). [sent-100, score-0.559]

45 If the Bayes gain of any of the new edited hypotheses is higher than the Bayes gain of the current hypothesis (Line 17), we repeat the loop with this new hypothesis; otherwise, we return the current hypothesis. [sent-104, score-0.474]

46 The AMS algorithm takes as input an initial hypothesis e and the combined vocabulary of the evidences spaces Σ. [sent-105, score-0.382]

47 Its output is a possibly new hypothesis whose Bayes gain is guaranteed to be higher than or equal to the Bayes gain of the initial hypothesis. [sent-106, score-0.19]
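
Algorithm 1 itself is not reproduced on this page; the following greedy hill-climbing sketch matches the behavior described above (single-word insertions, substitutions, and deletions over Σ, keeping an edit only when the Bayes gain improves). The real AMS edit set and search order may differ:

def ams_search(init_hyp, vocab, gain):
    """Edit-based search over the free monoid of `vocab`. The returned
    hypothesis has a Bayes gain >= that of `init_hyp` by construction."""
    current = list(init_hyp)
    best = gain(current)
    improved = True
    while improved:
        improved = False
        # Enumerate single-edit neighbors: insertions, substitutions, deletions.
        neighbors = []
        for i in range(len(current) + 1):
            neighbors += [current[:i] + [w] + current[i:] for w in vocab]
        for i in range(len(current)):
            neighbors += [current[:i] + [w] + current[i + 1:] for w in vocab]
            neighbors.append(current[:i] + current[i + 1:])
        for cand in neighbors:
            g = gain(cand)
            if g > best:  # cf. Line 17: keep the edit only if the gain improves
                current, best, improved = cand, g, True
                break     # restart the loop from the improved hypothesis
    return current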

48 5 Computing BLEU-based Gain We are interested in performing MBR system combination under BLEU. [sent-109, score-0.209]

49 The evidences space Dn(f) may contain a huge number of hypotheses, which often makes it impractical to compute Eq. [sent-118, score-0.342]

50 (2008) propose linear BLEU, an approximation to the BLEU score to efficiently perform MBR decoding when the search space is represented with lattices. [sent-121, score-0.294]

51 However, our hypotheses space is the full set of finite-length strings in the target vocabulary and cannot be represented in a lattice. [sent-122, score-0.235]

52 (9), we have one hypothesis e′ that is to be compared to a set of evidences e ∈ Dn(f) which follow a probability distribution Pn(e|f). [sent-124, score-0.302]

53 Instead of computing the expected BLEU score by calculating the BLEU score with respect to each of the evidences, our approach will be to use the expected n-gram counts and sentence length of the evidences to compute a single-reference BLEU score. [sent-125, score-0.427]

54 (10)) by the expected statistics (r′ and m′n) given the posterior distribution. (Footnote: for example, in a lattice the number of hypotheses is exponential in the size of its state set.) [sent-127, score-0.298]

55 Both the expected length of the evidences r′ and their expected n-gram counts m′k can be pre-computed efficiently from N-best lists and translation lattices (Kumar et al. [sent-137, score-0.513]
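
A sketch of these pre-computed expectations and of the resulting gain (reusing the ngram_counts helper sketched earlier; the smoothing constant that guards the log is ours):

import math
from collections import Counter

def expected_stats(evidences, max_n=4):
    """Expected reference length r' and expected n-gram counts m'_k under
    the posterior: E[len] = sum_e P(e|f) * |e|, and likewise for counts."""
    r = sum(p * len(e.split()) for e, p in evidences)
    m = Counter()
    for e, p in evidences:
        for ng, c in ngram_counts(e, max_n).items():
            m[ng] += p * c
    return r, m

def expected_bleu(hyp, r, m, max_n=4):
    """Single-reference BLEU of `hyp` against the expected statistics:
    n-gram matches are clipped by expected counts, and the brevity
    penalty is taken against the expected length."""
    counts = ngram_counts(hyp, max_n)
    c = len(hyp.split())
    log_prec = 0.0
    for n in range(1, max_n + 1):
        total = sum(v for ng, v in counts.items() if len(ng) == n)
        match = sum(min(v, m[ng]) for ng, v in counts.items() if len(ng) == n)
        log_prec += math.log(max(match, 1e-9) / max(total, 1)) / max_n
    bp = min(1.0, math.exp(1.0 - r / max(c, 1)))
    return bp * math.exp(log_prec)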

56 Max-derivation decoding (Best MAX), the best single system minimum Bayes risk decoding (Best MBR) and minimum Bayes risk system combination (MBR-SC) combining three systems. [sent-152, score-0.937]

57 For each system, we report the performance of max-derivation decoding (MAX) and 1000-best MBR decoding (Kumar and Byrne, 2004). [sent-161, score-0.316]

58 2 Experimental Results Table 2 compares MBR system combination (MBR-SC) to the best MAX and MBR systems. [sent-163, score-0.209]

59 MBR-SC uses expected BLEU as the gain function, using the conjoined evidences spaces of the three systems to compute the expected BLEU statistics. [sent-168, score-0.693]

60 MBR system combination improves the single Best MAX system by +2. [sent-171, score-0.264]

61 This improvement could arise due to multiple reasons: the expected BLEU gain, the larger evidences space, the extended hypotheses space, or the MERT-tuned scaling factor values. [sent-173, score-0.545]

62 Best MBR and MBR-SC-Expected differ only in the gain function: MBR uses sentence-level BLEU while MBR-SC-Expected uses the expected BLEU gain described in Section 5. [sent-176, score-0.219]

63 MBR-SC-Expected performance is comparable to MBR decoding on the 1000-best list from the single best system. [sent-177, score-0.158]

64 We now extend the evidences space to the conjoined 1000-best lists (MBR-SC-E/Conjoin). [sent-179, score-0.45]

65 This implies that either the expected BLEU statistics computed in the conjoined evidences space are stronger or the larger conjoined evidences spaces introduce better hypotheses. [sent-181, score-1.006]

66 When we restrict the BLEU statistics to be computed from only the best system’s evidences space (MBR-SC-E/C/evidences-best), BLEU scores dramatically decrease relative to MBR-SC-E/Conjoin. [sent-182, score-0.364]

67 This implies that the expected BLEU statistics computed over the conjoined 1000-best lists are stronger than the corresponding statistics from the single best system. [sent-183, score-0.345]

68 On the other hand, if we restrict the search space to only the 1000-best list of the best system (MBR-SC-E/C/hypotheses-best), BLEU scores also decrease relative to MBR-SC-E/Conjoin. [sent-184, score-0.161]

69 This implies that the conjoined search space also contains better hypotheses than the single best system’s search space. [sent-185, score-0.417]

70 The linear combination of the probability distributions over the conjoined evidences spaces makes it possible to compute much stronger statistics for the expected BLEU gain, and the conjoined space also contains better hypotheses than the single best system’s search space does. [sent-187, score-1.101]

71 We next expand the conjoined evidences spaces using the decoding algorithm described in Section 4. [sent-188, score-0.568]

72 In this case, the expected BLEU statistics are computed from the conjoined 1000-best lists of the three systems, but the hypotheses space where we perform the decoding is expanded to the set of all possible finite-length hypotheses over the vocabulary of the evidences. [sent-190, score-0.624]

73 We take the output of MBR-SC-E/Conjoin as the initial hypothesis of the decoding (see Algorithm 1). [sent-191, score-0.3]

74 Since these two systems are identical in their expected BLEU statistics, the improvements in BLEU imply that the extended search space has introduced better hypotheses. [sent-193, score-0.22]

75 The degradation in TER performance can be explained by the use of a BLEU-based gain function in the decoding process. [sent-194, score-0.249]

76 Figure 1: Performance of minimum Bayes risk system combination (MBR-SC) for different sizes of the evidences space in comparison to other MBR-SC setups. [sent-201, score-0.699]

77 MBR-SC-E/C/Ex/MERT is the standard setup for MBR system combination and, from now on, we will refer to it as MBR-SC. [sent-202, score-0.209]

78 We next evaluate the performance of MBR system combination on N-best lists of increasing sizes, and compare it to MBR-SC-E/C/Extended and MBR-SC-E/Conjoin on the same N-best lists. [sent-203, score-0.236]

79 MBR-SC-E/Conjoin is consistently better than the Best MAX system, and the differences in BLEU increase with the size of the evidences space. [sent-206, score-0.256]

80 This implies that the linear combination of posterior probabilities allows stronger statistics to be computed for the expected BLEU gain; in addition, the larger the evidences space is, the stronger the computed statistics are. [sent-207, score-0.771]

81 This result shows that the extended search space always contains better hypotheses than the conjoined evidences spaces; it also confirms the soundness of Algorithm 1, which allows those hypotheses to be reached. [sent-210, score-0.643]

82 Figure 2 displays the MBR system combination translation and compares it to the max-derivation translations of the three component systems. [sent-213, score-0.411]

83 3 Comparison to System Combination Figure 3 compares MBR system combination (MBR-SC) with state-of-the-art system combination techniques submitted to the system combination task of the ACL 2010 workshop on MT (WMT2010). [sent-222, score-0.627]

84 All system combination techniques build a “word sausage” from the outputs of the different component systems and choose a path through the sausage with the highest score under different models. [sent-223, score-0.387]

85 In this task, the outputs of the component systems are single hypotheses or unweighted lists thereof. [sent-226, score-0.286]

86 Therefore, we lack the statistics of the components’ posteriors, which are one of the main advantages of MBR system combination over system combination techniques. [sent-227, score-0.461]

87 However, we find that, even in this constrained setting, MBR system combination performance is similar to the best system combination techniques for all translation directions. [sent-228, score-0.517]

88 MBR system combination yields state-of-the-art performance while avoiding the challenge of aligning translation hypotheses. [sent-230, score-0.31]

89 7 Conclusion MBR system combination integrates consensus decoding and system combination into a unified multi-system MBR technique. [sent-231, score-0.639]

90 Component systems can have varied decoding strategies; we only require that each system produce an N-best list (or a lattice) of translations. [sent-234, score-0.234]

91 (2010) generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using a system combination technique. [sent-237, score-0.399]

92 MBR system combination has two significant advantages over current approaches to system combination. [sent-239, score-0.264]

93 Aligning translation hypotheses can be challenging and has a substantial effect on combination performance (He et al. [sent-241, score-0.376]

94 Instead of aligning the sentences, we view the sentences as vectors of n-gram counts and use the expected statistics of the BLEU score to compute the Bayes gain. [sent-243, score-0.181]

95 Choosing a backbone system can also be challenging and likewise affects system combination performance (He and Toutanova, 2009). [sent-245, score-0.302]

96 MBR system combination sidesteps this issue by working directly on the conjoined evidences space produced by the outputs of the component systems, and allows the consensus model to express system preferences via scaling factors. [sent-246, score-0.953]

97 Despite its simplicity, MBR system combination provides strong performance by leveraging different consensus, decoding and training techniques. [sent-247, score-0.367]

98 In addition, it obtains state-of-the-art performance in a constrained setting better suited for dominant system combination techniques. [sent-249, score-0.228]

99 Mixture model-based minimum Bayes risk decoding using multiple machine translation systems. [sent-276, score-0.522]

100 Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. [sent-326, score-0.394]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mbr', 0.76), ('evidences', 0.256), ('ecur', 0.219), ('bleu', 0.201), ('decoding', 0.158), ('combination', 0.154), ('hypotheses', 0.142), ('bayes', 0.115), ('conjoined', 0.102), ('risk', 0.091), ('translation', 0.08), ('minimum', 0.078), ('expected', 0.075), ('component', 0.075), ('gain', 0.072), ('space', 0.065), ('consensus', 0.063), ('morristown', 0.061), ('kumar', 0.061), ('ams', 0.06), ('outputs', 0.055), ('system', 0.055), ('xdn', 0.055), ('nj', 0.055), ('scaling', 0.054), ('spaces', 0.052), ('max', 0.049), ('translations', 0.047), ('hypothesis', 0.046), ('byrne', 0.046), ('tromble', 0.044), ('statistics', 0.043), ('search', 0.041), ('pn', 0.041), ('dn', 0.04), ('smt', 0.039), ('mert', 0.038), ('lattice', 0.038), ('backbone', 0.038), ('ng', 0.037), ('gn', 0.036), ('gonz', 0.036), ('denero', 0.033), ('rosti', 0.031), ('shankar', 0.031), ('linear', 0.03), ('stronger', 0.029), ('loss', 0.028), ('nez', 0.028), ('median', 0.028), ('vocabulary', 0.028), ('goel', 0.027), ('meahxnx', 0.027), ('monoid', 0.027), ('sausage', 0.027), ('lists', 0.027), ('spanish', 0.026), ('implies', 0.026), ('duan', 0.026), ('mt', 0.025), ('nelder', 0.024), ('upv', 0.024), ('lmax', 0.024), ('cg', 0.024), ('leusch', 0.024), ('frederking', 0.024), ('volume', 0.024), ('mart', 0.024), ('jth', 0.023), ('mixture', 0.022), ('ecnica', 0.022), ('ncia', 0.022), ('polit', 0.022), ('bickel', 0.022), ('lines', 0.022), ('compute', 0.021), ('systems', 0.021), ('aligning', 0.021), ('jes', 0.021), ('casacuberta', 0.021), ('unweighted', 0.021), ('valencia', 0.021), ('en', 0.02), ('xiaodong', 0.02), ('papineni', 0.019), ('allows', 0.019), ('franz', 0.019), ('och', 0.019), ('association', 0.019), ('derivation', 0.019), ('val', 0.019), ('simplex', 0.019), ('edit', 0.019), ('function', 0.019), ('constrained', 0.019), ('factors', 0.018), ('ei', 0.018), ('afnlp', 0.018), ('es', 0.018), ('extended', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 220 acl-2011-Minimum Bayes-risk System Combination

Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta

Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows MBR to be applied efficiently in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach brings significant improvements over single-system-based MBR decoding and achieves results comparable to different state-of-the-art system combination methods.

2 0.25321683 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

Author: Nan Duan ; Mu Li ; Ming Zhou

Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.

3 0.13400264 217 acl-2011-Machine Translation System Combination by Confusion Forest

Author: Taro Watanabe ; Eiichiro Sumita

Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.

4 0.1030179 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

5 0.10142467 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

Author: Omar F. Zaidan ; Chris Callison-Burch

Abstract: Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We propose a set of features that model both the translations and the translators, such as country of residence, LM perplexity of the translation, edit rate from the other translations, and (optionally) calibration against professional translators. Using these features to score the collected translations, we are able to discriminate between acceptable and unacceptable translations. We recreate the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively show that our models are able to select translations within the range of quality that we expect from professional translators. The total cost is more than an order of magnitude lower than professional translation.

6 0.095909186 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

7 0.093292639 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

8 0.08459723 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

9 0.082256906 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

10 0.081572555 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment

11 0.079049967 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

12 0.075393133 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

13 0.074562877 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

14 0.072905689 264 acl-2011-Reordering Metrics for MT

15 0.069938101 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

16 0.069313325 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

17 0.069030479 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

18 0.068861008 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

19 0.067796312 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

20 0.067293681 216 acl-2011-MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.153), (1, -0.133), (2, 0.079), (3, 0.072), (4, 0.026), (5, 0.018), (6, -0.031), (7, -0.02), (8, 0.003), (9, 0.019), (10, -0.01), (11, -0.03), (12, 0.01), (13, -0.091), (14, -0.031), (15, 0.012), (16, -0.029), (17, -0.011), (18, -0.057), (19, -0.018), (20, 0.018), (21, -0.018), (22, 0.086), (23, 0.054), (24, -0.02), (25, -0.061), (26, 0.043), (27, 0.006), (28, -0.042), (29, 0.083), (30, -0.097), (31, 0.032), (32, -0.013), (33, -0.022), (34, -0.017), (35, 0.044), (36, -0.014), (37, 0.013), (38, -0.026), (39, -0.127), (40, 0.139), (41, 0.053), (42, -0.053), (43, -0.006), (44, 0.09), (45, 0.052), (46, 0.035), (47, -0.102), (48, -0.074), (49, 0.063)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94866359 220 acl-2011-Minimum Bayes-risk System Combination

Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta

Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows MBR to be applied efficiently in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach brings significant improvements over single-system-based MBR decoding and achieves results comparable to different state-of-the-art system combination methods.

2 0.91098058 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

Author: Nan Duan ; Mu Li ; Ming Zhou

Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.

3 0.71713936 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

Author: Jingbo Zhu ; Tong Xiao

Abstract: To address the parse error issue for tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models.

4 0.69376045 217 acl-2011-Machine Translation System Combination by Confusion Forest

Author: Taro Watanabe ; Eiichiro Sumita

Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.

5 0.64309448 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

Author: Jonathan H. Clark ; Chris Dyer ; Alon Lavie ; Noah A. Smith

Abstract: In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, he runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability—an extraneous variable that is seldom controlled for—on experimental outcomes, and make recommendations for reporting results more accurately.

6 0.59894943 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

7 0.58199668 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

8 0.56188941 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

9 0.55055183 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

10 0.53007454 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

11 0.51484519 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

12 0.50881261 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

13 0.50144655 264 acl-2011-Reordering Metrics for MT

14 0.49806261 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

15 0.49429232 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

16 0.48909527 106 acl-2011-Dual Decomposition for Natural Language Processing

17 0.48381159 313 acl-2011-Two Easy Improvements to Lexical Weighting

18 0.48282447 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

19 0.47700614 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

20 0.46719351 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.029), (17, 0.042), (26, 0.036), (37, 0.09), (39, 0.04), (41, 0.07), (55, 0.028), (59, 0.027), (62, 0.036), (72, 0.037), (86, 0.168), (91, 0.031), (96, 0.25)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90992355 220 acl-2011-Minimum Bayes-risk System Combination

Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta

Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows MBR to be applied efficiently in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach brings significant improvements over single-system-based MBR decoding and achieves results comparable to different state-of-the-art system combination methods.

2 0.85072517 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

Author: Nan Duan ; Mu Li ; Ming Zhou

Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.

3 0.84850639 217 acl-2011-Machine Translation System Combination by Confusion Forest

Author: Taro Watanabe ; Eiichiro Sumita

Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.

4 0.84688199 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

Author: Risa Kitajima ; Ichiro Kobayashi

Abstract: Recently, several latent topic analysis methods such as LSI, pLSI, and LDA have been widely used for text analysis. However, those methods basically assign topics to words, but do not account for the events in a document. With this background, in this paper we propose a latent topic extraction method that assigns topics to events. We also show that our proposed method is useful for generating a document summary based on a latent topic.

5 0.84655088 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

Author: Hal Daume III ; Jagadeesh Jagarlamudi

Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.

6 0.84652448 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

7 0.84630597 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

8 0.84623832 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

9 0.84520161 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

10 0.84517038 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

11 0.84504664 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers

12 0.8449055 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

13 0.84488308 76 acl-2011-Comparative News Summarization Using Linear Programming

14 0.84481865 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

15 0.84437573 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

16 0.84408593 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

17 0.84366381 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

18 0.84357965 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge

19 0.84344763 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output

20 0.84333229 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews