acl acl2011 acl2011-266 knowledge-graph by maker-knowledge-mining

266 acl-2011-Reordering with Source Language Collocations


Source: pdf

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li

Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving absolute improvements of 1.1~1.4 BLEU points over the baseline methods.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. [sent-5, score-1.131]

2 The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. [sent-6, score-0.607]

3 During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. [sent-7, score-1.468]

4 , 1993), usually called the IBM constraint model, where the movement of words during translation is modeled. [sent-12, score-0.212]

5 Although the sentence structure has been taken into consideration, these methods don't explicitly make use of the strong correlations between words, such as collocations, which can effectively indicate reordering in the target language. [sent-15, score-0.581]

6 In this paper, we propose a novel method to improve the reordering for SMT by estimating the reordering score of the source-language collocations (source collocations for short in this paper). [sent-16, score-1.638]

7 Given a bilingual corpus, the collocations in the source sentence are first detected automatically using a monolingual word alignment (MWA) method without employing additional resources (Liu et al. [sent-17, score-0.507]

8 , 2009), and then the reordering model based on the detected collocations is learned from the word-aligned bilingual corpus. [sent-18, score-0.922]

9 This method has two advantages: (1) it can automatically detect and leverage collocated words in a sentence, including long-distance collocated words; (2) such a reordering model can be integrated into SMT systems. [sent-20, score-1.488]

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1036–1044, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics

10 The paper is organized as follows: In section 2, we describe the motivation to use source collocations for reordering, and briefly introduce the collocation extraction method. [sent-22, score-0.566]

11 2 Collocation A collocation is generally composed of a group of words that occur together more often than by chance. [sent-27, score-0.251]

12 Given two words in a collocation, they can be translated in the same order as in the source language, or in the inverted order. [sent-29, score-0.266]

13 We further notice that some words are translated in different orders when they are collocated with different words. [sent-33, score-0.634]

14 For instance, when “潮流 chao-liu 'trend'” is collocated with “时代 shi-dai 'times'”, they are often translated into the “trend of times”; when collocated with “历史 li-shi 'history'”, the translation usually becomes the “historical trend”. [sent-34, score-1.043]

15 Thus, if we can automatically detect the collocations in the sentence to be translated and their orders in the target language, the reordering information of the collocations could be used to constrain the reordering of phrases during decoding. [sent-35, score-1.952]

16 Therefore, in this paper, we propose to improve the reordering model for SMT by estimating the reordering score based on the translation orders of the source collocations. [sent-36, score-1.512]

17 In general, the collocations can be automatically identified based on syntactic information such as dependency trees (Lin, 1998). [sent-37, score-0.26]

18 (2009) proposed to automatically detect the collocated words in a sentence with the MWA method. [sent-41, score-0.5]

19 The advantage of this method lies in that it can identify the collocated words in a sentence without additional resources. [sent-42, score-0.491]

20 (2009) to detect collocations in sentences, which are shown in Eq. [sent-44, score-0.295]

21 The MWA models measure the collocated words under different constraints. [sent-47, score-0.469]

22 MWA Model 1 only models word collocation probabilities t(wj | wcj ) . [sent-48, score-0.349]

23 MWA Model 2 additionally employs position collocation probabilities d(j | cj,l) . [sent-49, score-0.252]

24 Given a sentence, the optimal collocated words can be obtained according to Eq. [sent-51, score-0.442]

25 A* = argmax_A p_MWA(A | S), under MWA Model i   (4). Given a monolingual word-aligned corpus, the collocation probabilities can be estimated as follows. [sent-53, score-0.348]

26 r(wi,wj)p(wi|wj)2p(wj|wi) Where, (5) p(wi|wj)wcocuonutn(tw(wi,w,wj)j) ; (wi,wj) denotes the collocated words in the corpus and count(wi ,wj) denotes the co-occurrence frequency. [sent-54, score-0.534]

27 3 Reordering Model with Source Language Collocations In this section, we first describe how to estimate the orientation probabilities for a given collocation, and then describe the estimation of the reordering score during translation. [sent-55, score-0.656]

28 Finally, we describe the integration of the reordering model into the SMT system. [sent-56, score-0.567]

29 3.1 Reordering probability estimation: Given a source collocation (fi, fj) and its corresponding translations (eai, eaj) in a bilingual sentence pair, the reordering orientation of the collocation can be defined as in Eq. [sent-58, score-1.207]

30 o_{i,j,ai,aj} = straight if (i < j and ai < aj) or (i > j and ai > aj); inverted if (i < j and ai > aj) or (i > j and ai < aj)   (6). In our method, only those collocated words in the source language that are aligned to different target words are taken into consideration; those aligned to the same target word are ignored. [sent-60, score-0.633]
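Eq. (6) reduces to checking whether the source and target relative orders agree. A sketch, assuming `i, j` are the source positions of the two collocated words and `ai, aj` the positions of their aligned target words:

```python
def orientation(i, j, ai, aj):
    """Translation order of a source collocation, after Eq. (6).

    (i, j)  : source positions of the two collocated words
    (ai, aj): positions of their aligned target words
    Pairs aligned to the same target word carry no order information
    and are skipped, as in the paper.
    """
    if ai == aj:
        return None  # ignored
    if (i < j) == (ai < aj):
        return "straight"  # same order as in the source
    return "inverted"      # order flipped in the target
```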

31 Then the reordering score is estimated according to the reordering probability weighted by the collocation probability of the collocated words. [sent-65, score-1.78]

32 Formally, for a generated translation candidate T , the reordering score is calculated as follows. [sent-66, score-0.742]

33 PO(F,T)(i,ci)r(fi,fci)logp(oi,ci,ai,aci |fi,fci) (9) 1038 reordering frequency Here, r(fi ,fci ) denotes the collocation probability of fi and fci as shown in Eq. [sent-67, score-0.845]

34 In addition to the detected collocated words in the sentence, we also consider other possible word pairs whose collocation probabilities are higher than a given threshold. [sent-69, score-0.739]

35 Thus, the reordering score is further improved according to Eq. [sent-70, score-0.561]

36 Given an input sentence F, the final translation E* with the highest score is chosen from candidates, as in Eq. [sent-78, score-0.213]

37 Our reordering model can be integrated into the system as one feature as shown in (10). [sent-85, score-0.587]

38 An example for reordering. 4 Evaluation of Our Method. 4. [sent-87, score-0.531]

39 Based on the GIZA++ package (Och and Ney, 2003), we implemented an MWA tool for collocation detection. [sent-90, score-0.227]

40 Thus, given a sentence to be translated, we first identify the collocations in the sentence, and then estimate the reordering score according to the translation hypothesis. [sent-91, score-1.004]

41 For a translation option to be expanded, the reordering score inside the source phrase is calculated according to the translation orders of the collocations in the corresponding target phrase. [sent-92, score-1.475]

42 The reordering score crossing the current translation option and the covered parts can be calculated according to the relative position of the collocated words. [sent-93, score-1.256]

43 If the source phrase matched by the current translation option is behind the covered parts in the source sentence, then log p(o = straight | . [sent-94, score-0.417]

44 For example, in Figure 2, the current translation option is (f2 f3 → e3 e4). [sent-101, score-0.197]

45 The collocations related to this translation option are (f1,f3) , (f2,f3) , (f3,f5) . [sent-102, score-0.457]

46 For any uncovered word and its collocates in the input sentence, if the collocate is uncovered, then the higher reordering probability is used. [sent-104, score-0.59]

47 If the collocate has been covered, then the reordering orientation can be determined according to the relative positions of the words, and the corresponding reordering probability is employed. [sent-105, score-1.221]
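The rule in sentences 46 and 47 can be sketched as follows; `collocate_log_prob` is a hypothetical helper, not the authors' decoder code:

```python
import math

def collocate_log_prob(covered, pos_word, pos_collocate, p_straight, p_inverted):
    """Score one (word, collocate) pair during hypothesis expansion.

    covered       : whether the collocate has already been translated
    pos_word      : source position of the word being translated now
    pos_collocate : source position of its collocate
    p_straight / p_inverted : orientation probabilities for the pair
    """
    if not covered:
        # Collocate still untranslated: optimistically take the higher
        # reordering probability (a future-cost estimate).
        return math.log(max(p_straight, p_inverted))
    # Collocate already covered: the relative source positions fix the
    # orientation, since the covered part precedes in the target.
    if pos_word > pos_collocate:
        return math.log(p_straight)
    return math.log(p_inverted)
```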

48 3 Translation results We compare the proposed method with various reordering methods in previous work. [sent-114, score-0.557]

49 Distortion based reordering (DBR) model: a distortion based reordering method (Al-Onaizan and Papineni, 2006). [sent-116, score-1.131]

50 But our method considers the translation order of the collocated words. [sent-132, score-0.636]

51 msd-bidirectional-fe reordering (MSDR or Baseline) model: it is one of the reordering models in Moses. [sent-133, score-1.089]

52 It considers three different orientation types (monotone, swap, and discontinuous) on both source phrases and target phrases. [sent-134, score-0.23]

53 The translation orders of both the next phrase and the previous phrase with respect to the current phrase are modeled. [sent-135, score-0.38]

54 Source collocation based reordering (SCBR) model: our proposed method. [sent-136, score-0.758]

55 We investigate three reordering models based on the corresponding MWA models and their combinations. [sent-137, score-0.585]

56 In SCBR Model i (i = 1~3), we use MWA Model i as described in section 2 to obtain the collocated words and estimate the reordering probabilities according to section 3. [sent-138, score-0.998]

57 For example, the reordering cases in the trained pairwise distortion model only covered 32~38% of those in the test sets. [sent-141, score-0.647]

58 Our models further improve the translation quality, achieving better performance than the combination of MSDR model and DBR model. [sent-144, score-0.223]

59 As compared to other reordering models, our models achieve an absolute improvement of 0. [sent-148, score-0.558]

60 The collocation (基本, 立场) is included in the same phrase and translated together as a whole. [sent-158, score-0.299]

61 For the other two long-distance collocations (双方, 立场) and (立场, 松动), their translation orders are not correctly handled by the reordering model in the baseline system. [sent-160, score-1.132]

62 For the collocation (双方, 立场), since the SCBR models indicate p(o=straight|双方, 立场) < p(o=inverted|双方, 立场), the system finally generates the translation T2 by constraining their translation order with the proposed model. [sent-161, score-0.574]

63 Co-occurring Words We compared our method with the method that models the reordering orientations based on cooccurring words in the source sentences, rather than the collocations. [sent-163, score-0.767]

64 5.1 Co-occurrence based reordering model: We use an algorithm similar to that described in section 3 to train the co-occurrence based reordering (CBR) model, except that the probability of the reordering orientation is estimated on the co-occurring words and their relative distance. [sent-165, score-1.766]

65 Given an input sentence and a translation candidate, the reordering score is estimated as shown in Eq. [sent-166, score-0.787]

66 We also construct the weighted co-occurrence based reordering (WCBR) model. [sent-169, score-0.531]

67 In this model, the probability of the reordering orientation is additionally weighted by the pointwise mutual information score of the two words (Manning and Schütze, 1999), which is estimated as shown in Eq. [sent-170, score-0.698]
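The pointwise mutual information weighting can be sketched as below; `weighted_orientation_log_prob` is a hypothetical name for the WCBR-style combination, not an interface from the paper:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information of a word pair:
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ).
    Positive when the words co-occur more often than by chance."""
    return math.log(p_xy / (p_x * p_y))

def weighted_orientation_log_prob(p_orientation, p_xy, p_x, p_y):
    """WCBR-style weighting (a sketch): scale the log orientation
    probability by the pair's PMI instead of a collocation probability."""
    return pmi(p_xy, p_x, p_y) * math.log(p_orientation)
```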

68 When the WCBR model is used, the translation quality is further improved. [sent-177, score-0.196]

69 However, its performance is still inferior to that of the SCBR models, indicating that our method (SCBR models) of modeling the translation orders of source collocations is more effective. [sent-178, score-0.67]

70 5.3 Result analysis: Precision of prediction. First of all, we investigate the performance of the reordering models by calculating precisions of the translation orders predicted by the reordering models. [sent-181, score-1.517]

71 Based on the source sentences and reference translations of the development set, where the source words and target words are automatically aligned by the bilingual word alignment method, we construct the reference translation orders for two words. [sent-182, score-0.64]

72 PIW |{|ij |{|oi1,j&| oii,j |o1i },j,|ai,aj }| (15) Ptotal |{oi,j |{oio,ij, }j,a| i,aj }| (16) Here, oi,j denotes the translation order of ( fi ,fj ) predicted by the reordering models. [sent-184, score-0.778]

73 If p(o = straight | fi, fj) > p(o = inverted | fi, fj), then o_{i,j} = straight; else if p(o = straight | fi, fj) < p(o = inverted | fi, fj), then o_{i,j} = inverted. [sent-185, score-0.419]
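The precision computation behind Eqs. (15) and (16) amounts to comparing predicted orders with alignment-derived reference orders. A sketch over hypothetical dictionaries keyed by source position pairs:

```python
def order_precision(predicted, reference):
    """Precision of predicted translation orders, after Eqs. (15)-(16).

    predicted : {(i, j): "straight" | "inverted"} from a reordering model
    reference : {(i, j): "straight" | "inverted"} derived from the bilingual
                word alignments of the reference translations
    Only pairs that have a reference order are scored.
    """
    scored = [pair for pair in predicted if pair in reference]
    if not scored:
        return 0.0
    correct = sum(1 for pair in scored if predicted[pair] == reference[pair])
    return correct / len(scored)
```

Restricting `predicted` to adjacent pairs (|i - j| = 1) gives P_CW, to non-adjacent pairs gives P_IW, and taking all pairs gives P_total.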

74 oi,j,ai,aj denotes the translation order derived from the word alignments. [sent-186, score-0.206]

75 PCW and PIW denote the precisions calculated on the consecutive words and the interrupted words in the source sentences, respectively. [sent-188, score-0.367]

76 From the results in Table 3, it can be seen that the CBR model has a higher precision on the consecutive words than the SCBR model, but lower precisions on the interrupted words. [sent-192, score-0.279]

77 It is mainly because the CBR model introduces more noise when the relative distance of words is set to a large number, while the MWA method can effectively detect the long-distance collocations in sentences (Liu et al. [sent-193, score-0.381]

78 Effect of the reordering model: Then we evaluate the reordering results of the generated translations in the test sets. [sent-197, score-1.121]

79 Using the above method, we construct the reference translation orders of collocations in the test sets. [sent-198, score-0.565]

80 For a given word pair in a source sentence, if the translation order in the generated translation is the same as that in the reference translations, then it is correct, otherwise wrong. [sent-199, score-0.399]

81 Precisions (total) of the reordering models on the test sets: From the results, it can be seen that our method achieves higher precisions than both the baseline and the method modeling the translation orders of the co-occurring words. [sent-206, score-1.067]

82 It indicates that the proposed method effectively constrains the reordering of source words during decoding and improves the translation quality. [sent-207, score-0.84]

83 This model treats the source word sequence as a coverage set that is processed sequentially and a source token is covered when it is translated into a new target token. [sent-211, score-0.305]

84 In 1997, another model called ITG constraint was presented, in which the reordering order can be hierarchically modeled as straight or inverted for two nodes in a binary branching structure (Wu, 1997). [sent-212, score-0.812]

85 Although the ITG constraint allows more flexible reordering during decoding, Zens and Ney (2003) showed that the IBM constraint results in higher BLEU scores. [sent-213, score-0.608]

86 Our method models the reordering of collocated words in sentences instead of all words in IBM models or two neighboring blocks in ITG models. [sent-214, score-1.077]

87 Hierarchical phrase-based SMT methods employ an SCFG bilingual translation model and allow flexible reordering (Chiang, 2005). [sent-219, score-0.798]

88 In our method, we automatically detect the collocated words in sentences and 1042 their translation orders in the target languages, which are used to constrain the ordering models with the estimated reordering (straight or inverted) score. [sent-221, score-1.481]

89 Moreover, our method allows flexible reordering by considering both consecutive words and interrupted words. [sent-222, score-0.698]

90 In order to further improve translation results, many researchers employed syntax-based reordering methods (Zhang et al. [sent-223, score-0.716]

91 Our method directly obtains collocation information without resorting to any linguistic knowledge or tools, therefore is suitable for any language pairs. [sent-227, score-0.277]

92 In addition, a few models employed the collocation information to improve the performance of the ITG constraints (Xiong et al. [sent-228, score-0.279]

93 used the consecutive co-occurring words as collocation information to constrain the reordering, which did not lead to higher translation quality in their experiments. [sent-231, score-0.516]

94 In our method, we first detect both consecutive and interrupted collocated words in the source sentence, and then estimate the reordering score of these collocated words, which is used to softly constrain the reordering of source phrases. [sent-232, score-2.412]

95 7 Conclusions We presented a novel model to improve SMT by means of modeling the translation orders of source collocations. [sent-233, score-0.42]

96 The model was learned from a word-aligned bilingual corpus where the potentially collocated words in source sentences were automatically detected by the MWA method. [sent-234, score-0.652]

97 During decoding, the model is employed to softly constrain the translation orders of the source language collocations. [sent-235, score-0.573]

98 Since we only model the reordering of collocated words, our methods can partially alleviate the data sparseness encountered by other methods directly modeling the reordering based on source phrases or target phrases. [sent-236, score-1.644]

99 In addition, this kind of reordering information can be integrated into any SMT systems without resorting to any additional resources. [sent-237, score-0.575]

100 The experimental results show that the proposed method significantly improves the translation quality of a phrase based SMT system, achieving an absolute improvement of 1. [sent-238, score-0.211]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('reordering', 0.531), ('collocated', 0.418), ('mwa', 0.264), ('scbr', 0.264), ('collocations', 0.26), ('collocation', 0.227), ('translation', 0.16), ('orders', 0.145), ('cbr', 0.123), ('precisions', 0.123), ('smt', 0.12), ('inverted', 0.116), ('straight', 0.101), ('dbr', 0.088), ('source', 0.079), ('constrain', 0.071), ('msdr', 0.07), ('pmwa', 0.07), ('wcj', 0.07), ('orientation', 0.07), ('logp', 0.069), ('itg', 0.064), ('interrupted', 0.062), ('wj', 0.061), ('softly', 0.057), ('visweswariah', 0.053), ('bilingual', 0.05), ('translated', 0.047), ('denotes', 0.046), ('detected', 0.045), ('bleu', 0.044), ('distortion', 0.043), ('estimated', 0.043), ('wi', 0.041), ('fi', 0.041), ('po', 0.039), ('monotone', 0.039), ('xiong', 0.039), ('tillmann', 0.038), ('koehn', 0.038), ('option', 0.037), ('covered', 0.037), ('zens', 0.037), ('ibm', 0.036), ('model', 0.036), ('collocate', 0.035), ('pcw', 0.035), ('piw', 0.035), ('ptotal', 0.035), ('wcbr', 0.035), ('detect', 0.035), ('consecutive', 0.034), ('considers', 0.032), ('marton', 0.031), ('herman', 0.031), ('score', 0.03), ('aligned', 0.029), ('crisis', 0.029), ('cooccurring', 0.029), ('cou', 0.029), ('constraint', 0.028), ('target', 0.027), ('pietra', 0.027), ('della', 0.027), ('models', 0.027), ('method', 0.026), ('employed', 0.025), ('patent', 0.025), ('fbis', 0.025), ('financial', 0.025), ('orientations', 0.025), ('phrase', 0.025), ('probabilities', 0.025), ('trend', 0.024), ('sides', 0.024), ('nto', 0.024), ('uncovered', 0.024), ('resorting', 0.024), ('liu', 0.024), ('words', 0.024), ('monolingual', 0.024), ('fertility', 0.023), ('sentence', 0.023), ('translations', 0.023), ('movements', 0.023), ('meeting', 0.023), ('annual', 0.022), ('crossing', 0.022), ('phrases', 0.022), ('philipp', 0.021), ('flexible', 0.021), ('alexandra', 0.021), ('calculated', 0.021), ('statistical', 0.02), ('consideration', 0.02), ('oi', 0.02), ('berger', 0.02), ('decoding', 0.02), ('integrated', 0.02), ('sheng', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 266 acl-2011-Reordering with Source Language Collocations

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li

Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving absolute improvements of 1.1~1.4 BLEU points over the baseline methods.

2 0.24424443 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

3 0.2407655 263 acl-2011-Reordering Constraint Based on Document-Level Context

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: One problem with phrase-based statistical machine translation is the problem of longdistance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the documentlevel context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in JapaneseEnglish translation and 1.41% BLEU points in English-Japanese translation.

4 0.23815753 264 acl-2011-Reordering Metrics for MT

Author: Alexandra Birch ; Miles Osborne

Abstract: One of the major challenges facing statistical machine translation is how to model differences in word order between languages. Although a great deal of research has focussed on this problem, progress is hampered by the lack of reliable metrics. Most current metrics are based on matching lexical items in the translation and the reference, and their ability to measure the quality of word order has not been demonstrated. This paper presents a novel metric, the LRscore, which explicitly measures the quality of word order by using permutation distance metrics. We show that the metric is more consistent with human judgements than other metrics, including the BLEU score. We also show that the LRscore can successfully be used as the objective function when training translation model parameters. Training with the LRscore leads to output which is preferred by humans. Moreover, the translations incur no penalty in terms of BLEU scores.

5 0.23305908 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

Author: Markos Mylonakis ; Khalil Sima'an

Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.

6 0.1851844 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

7 0.18105252 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

8 0.13445675 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

9 0.12925281 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

10 0.1292493 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful

11 0.10516403 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

12 0.10063971 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

13 0.10054034 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

14 0.099805564 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

15 0.098863035 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

16 0.097334668 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

17 0.094136052 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

18 0.092856452 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

19 0.089771792 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

20 0.086877465 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, -0.188), (2, 0.141), (3, 0.126), (4, 0.069), (5, 0.034), (6, -0.041), (7, -0.017), (8, 0.019), (9, 0.025), (10, 0.012), (11, -0.094), (12, 0.002), (13, -0.154), (14, -0.034), (15, 0.007), (16, -0.038), (17, 0.034), (18, -0.119), (19, -0.02), (20, -0.079), (21, 0.029), (22, -0.017), (23, -0.133), (24, -0.065), (25, 0.068), (26, 0.133), (27, 0.031), (28, -0.032), (29, 0.06), (30, 0.08), (31, -0.013), (32, 0.201), (33, -0.04), (34, 0.073), (35, -0.214), (36, -0.07), (37, 0.03), (38, -0.005), (39, 0.058), (40, 0.118), (41, -0.071), (42, -0.054), (43, 0.045), (44, 0.031), (45, -0.079), (46, -0.108), (47, 0.086), (48, -0.028), (49, -0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9285627 266 acl-2011-Reordering with Source Language Collocations

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li

Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1

2 0.87502819 263 acl-2011-Reordering Constraint Based on Document-Level Context

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: One problem with phrase-based statistical machine translation is the problem of longdistance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the documentlevel context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in JapaneseEnglish translation and 1.41% BLEU points in English-Japanese translation.

3 0.82149327 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

4 0.76376575 264 acl-2011-Reordering Metrics for MT

Author: Alexandra Birch ; Miles Osborne

Abstract: One of the major challenges facing statistical machine translation is how to model differences in word order between languages. Although a great deal of research has focussed on this problem, progress is hampered by the lack of reliable metrics. Most current metrics are based on matching lexical items in the translation and the reference, and their ability to measure the quality of word order has not been demonstrated. This paper presents a novel metric, the LRscore, which explicitly measures the quality of word order by using permutation distance metrics. We show that the metric is more consistent with human judgements than other metrics, including the BLEU score. We also show that the LRscore can successfully be used as the objective function when training translation model parameters. Training with the LRscore leads to output which is preferred by humans. Moreover, the translations incur no penalty in terms of BLEU scores.

5 0.71981198 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful

Author: Susan Howlett ; Mark Dras

Abstract: There are a number of systems that use a syntax-based reordering step prior to phrasebased statistical MT. An early work proposing this idea showed improved translation performance, but subsequent work has had mixed results. Speculations as to cause have suggested the parser, the data, or other factors. We systematically investigate possible factors to give an initial answer to the question: Under what conditions does this use of syntax help PSMT?

6 0.64398074 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

7 0.62236279 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

8 0.58155209 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

9 0.56008494 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

10 0.53633642 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

11 0.48716733 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

12 0.48297679 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

13 0.48099807 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

14 0.45287195 151 acl-2011-Hindi to Punjabi Machine Translation System

15 0.45047683 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

16 0.44947675 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

17 0.44455791 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

18 0.44179958 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

19 0.42413908 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

20 0.42241195 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.02), (17, 0.082), (26, 0.025), (37, 0.067), (39, 0.032), (41, 0.041), (55, 0.042), (59, 0.033), (65, 0.244), (72, 0.035), (91, 0.021), (96, 0.246)]
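The topic vector above is a sparse list of (topicId, topicWeight) pairs. A similarity list like the one that follows can be produced by comparing such vectors pairwise; one plausible measure (an assumption, since the index does not state which measure it uses) is cosine similarity over the sparse pairs:

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity between two sparse LDA topic vectors given as
    lists of (topicId, weight) pairs, e.g. [(5, 0.02), (17, 0.082), ...]."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical comparison paper sharing topics 17, 65 and 96:
paper = [(5, 0.02), (17, 0.082), (65, 0.244), (96, 0.246)]
other = [(17, 0.05), (65, 0.30), (96, 0.20)]
print(round(sparse_cosine(paper, other), 4))
```

Papers sharing no topics score 0.0, and a paper compared with itself scores 1.0, so ranking all candidate papers by this value yields a simIndex/simValue list of the shape shown below.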

similar papers list:

simIndex simValue paperId paperTitle

1 0.82731527 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output

Author: Sara Stymne

Abstract: We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., to identify and classify MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process, by providing an easy-to-use graphical user interface. It is designed to be flexible, and can be used with any MT system, language pair, and error typology. The annotation task can be aided by highlighting similarities with a reference translation.

same-paper 2 0.81896913 266 acl-2011-Reordering with Source Language Collocations

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li

Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, and thereby the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving absolute improvements of 1.1 to 1.4 BLEU points over the baseline methods.

3 0.81866938 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts

Author: Clinton Burfoot ; Steven Bird ; Timothy Baldwin

Abstract: This paper explores approaches to sentiment classification of U.S. Congressional floor-debate transcripts. Collective classification techniques are used to take advantage of the informal citation structure present in the debates. We use a range of methods based on local and global formulations and introduce novel approaches for incorporating the outputs of machine learners into collective classification algorithms. Our experimental evaluation shows that the mean-field algorithm obtains the best results for the task, significantly outperforming the benchmark technique.

4 0.74947232 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii

Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit them to binding to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.

5 0.74726945 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. Bottom-up and top-down parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

6 0.7436654 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

7 0.74320143 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

8 0.74134862 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model

9 0.73990452 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

10 0.73830932 141 acl-2011-Gappy Phrasal Alignment By Agreement

11 0.73814285 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

12 0.73724008 175 acl-2011-Integrating history-length interpolation and classes in language modeling

13 0.73705596 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

14 0.73698878 61 acl-2011-Binarized Forest to String Translation

15 0.73694766 263 acl-2011-Reordering Constraint Based on Document-Level Context

16 0.73659998 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge

17 0.73650217 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

18 0.73613387 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

19 0.73539174 264 acl-2011-Reordering Metrics for MT

20 0.73528767 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation