acl acl2010 acl2010-201 knowledge-graph by maker-knowledge-mining

201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation


Source: pdf

Author: Xiangyu Duan ; Min Zhang ; Haizhou Li

Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But the word appears to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline has an inborn deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 But the word appears to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. [sent-5, score-0.262]

2 A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. [sent-8, score-0.308]

3 By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. [sent-9, score-0.736]

4 Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain. [sent-10, score-0.266]

5 But there is a deficiency in this manner: the word is too fine-grained in some cases, such as non-compositional phrasal equivalences, where clear word alignments do not exist. [sent-15, score-0.262]

6 No clear word alignments exist in such phrasal equivalences. [sent-17, score-0.232]

7 Moreover, whether the basic translational unit should be the word or a coarse-grained multi-word is an open problem for optimizing SMT models. [sent-18, score-0.297]

8 Some researchers have explored coarse-grained translational units for machine translation. [sent-19, score-0.22]

9 Marcu and Wong (2002) attempted to directly learn phrasal alignments instead of word alignments. [sent-20, score-0.232]

10 (2008) used synchronous ITG (Wu, 1997) and constraints to find non-compositional phrasal equivalences, but they suffered from an intractable estimation problem. [sent-23, score-0.246]

11 (2008; 2009) induced a phrasal synchronous grammar, which aimed at finding hierarchical phrasal equivalences. [sent-25, score-0.352]

12 Another direction of questioning the word as the basic translational unit is to directly question word segmentation in languages where word boundaries are not orthographically marked. [sent-26, score-0.415]

13 (2008) used a Bayesian semi-supervised method that combines a Chinese word segmentation model and a Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation. [sent-30, score-0.236]

14 Since there are many 1-to-n phrasal equivalences in Chinese-to-English translation (Ma and Way, [sent-35, score-0.267]

15 2009), only focusing on the Chinese word as the basic translational unit is not adequate to model 1-to-n translations. [sent-36, score-0.297]

16 Ma and Way (2009) tackled this problem by using a word aligner to bootstrap a bilingual segmentation suitable for machine translation. [sent-37, score-0.301]

17 […] expressions by monotonically segmenting a given Spanish-English sentence pair into bilingual units, where a word aligner is also used. [sent-40, score-0.313]

18 (…, 1993) and Deng and Byrne (2005) are another kind of related work that allows 1-to-n alignments, but they rarely questioned whether such alignments exist at the word-unit level; that is, they rarely questioned the word as the basic translational unit. [sent-42, score-0.465]

19 This paper focuses on determining the basic translational units on both language sides, without using a word aligner, before feeding them into the PB-SMT pipeline. [sent-44, score-0.36]

20 We call such a basic translational unit a pseudo-word to differentiate it from a word. [sent-45, score-0.267]

21 The pseudo-word searching problem is the same as the decomposition of a given sentence into pseudo-words. [sent-47, score-0.421]

22 We use a measurement, which characterizes a pseudo-word as a minimal sequence of consecutive words in the sense of translation, as the potential function in the Gibbs distribution. [sent-49, score-0.261]

23 Note that the number of decompositions of one sentence into pseudo-words grows exponentially with sentence length. [sent-50, score-0.263]
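
For concreteness (this count is implied but not stated in the excerpt): a sentence of $n$ words has $n-1$ gaps between adjacent words, and every decomposition into consecutive multi-word units corresponds to choosing a subset of those gaps as unit boundaries, so there are $2^{n-1}$ possible decompositions.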

24 By fitting the decomposition problem into a parsing framework, we can find the optimal pseudo-word sequence in polynomial time. [sent-51, score-0.346]

25 Then we feed the pseudo-words into the PB-SMT pipeline and find that pseudo-words as basic translational units improve translation performance over words as basic translational units. [sent-52, score-0.558]

26 Further experiments, which remove the power of the higher-order language model and the longer max phrase length that are inherent in pseudo-words, show that pseudo-words still improve translation performance significantly over unary words. [sent-53, score-0.416]

27 This paper is structured as follows: in section 2, we define the task of searching for pseudo-words and its solution. [sent-54, score-0.363]

28 2 Searching for Pseudo-words. The pseudo-word searching problem is equal to the decomposition of a given sentence into pseudo-words. [sent-57, score-0.45]

29 We assume that the distribution of such decompositions takes the form of a Gibbs distribution: $P(Y|X) = \frac{1}{Z_X} \exp\big(\sum_{k} \mathrm{Sig}_{y_k}\big)$ (1), where X denotes the sentence and Y denotes a decomposition of X. [sent-58, score-0.424]

30 Given X, $Z_X$ is fixed, so searching for the optimal decomposition amounts to $\hat{Y} = \arg\max_{Y} P(Y|X) = \arg\max_{Y_1^K} \sum_{k} \mathrm{Sig}_{y_k}$ (2), where $Y_1^K$ denotes the K multi-word units from the decomposition of X. [sent-61, score-0.648]
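
As a minimal illustration of equation (2), the sketch below scores every decomposition of a short sentence by brute force. The `sig` dictionary is a hypothetical stand-in for the Sig function; real use would replace the exhaustive enumeration with the polynomial-time dynamic programs of section 2.

```python
from itertools import product

def decompositions(words):
    """Enumerate all segmentations of `words` into consecutive multi-word units."""
    n = len(words)
    # Each of the n-1 gaps between adjacent words is either a boundary or not.
    for gaps in product([False, True], repeat=n - 1):
        seg, start = [], 0
        for i, is_boundary in enumerate(gaps, start=1):
            if is_boundary:
                seg.append(tuple(words[start:i]))
                start = i
        seg.append(tuple(words[start:]))
        yield seg

def best_decomposition(words, sig):
    """Equation (2): argmax over decompositions of the summed Sig values."""
    return max(decompositions(words),
               key=lambda seg: sum(sig.get(unit, 0.0) for unit in seg))

# Hypothetical Sig values, for illustration only.
sig = {("new", "york"): 2.0, ("in",): 0.1, ("new",): 0.3, ("york",): 0.2}
print(best_decomposition(["in", "new", "york"], sig))
# -> [('in',), ('new', 'york')]
```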

31 A multi-word sequence with the maximal sum of Sig function values is the search target: the pseudo-word sequence. [sent-62, score-0.259]

32 In this paper, the Sig function calculates sequence significance, which is proposed to characterize a pseudo-word as a minimal sequence of consecutive words in the sense of translation. [sent-64, score-0.467]

33 The details of sequence significance are described in the following section. [sent-65, score-0.255]

34 1 Sequence Significance. Two kinds of definitions of sequence significance are proposed. [sent-67, score-0.255]

35 In the monolingual scenario, X is a monolingual sentence and Y is a sequence of monolingual multi-words. [sent-69, score-0.524]

36 In the bilingual scenario, X is a sentence pair and Y is a sequence of multi-word pairs. [sent-71, score-0.232]

37 We also denote the word sequence $w_i, \dots, w_j$ as span[i, j], and the whole sentence as span[1, n]. [sent-75, score-0.212]

38 The monolingual sequence significance of span[i, j] is proportional to span[i, j]'s frequency, and inversely proportional to the frequency of the expanded span, span[i-1, j+1]. [sent-77, score-0.456]

39 Such a definition characterizes the minimal sequence of consecutive words that we are looking for. [sent-78, score-0.261]

40 Our target is to find the pseudo-word sequence that has the maximal sum of the spans' significances: $pw_1^K = \arg\max_{\mathrm{span}_1^K} \sum_{k=1}^{K} \mathrm{Sig}_{\mathrm{span}_k}$ (4), where pw denotes pseudo-word and K is less than or equal to the sentence's length. [sent-79, score-0.555]
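
Reading the definition above literally (and mirroring the bilingual formula in equation (5) below), the monolingual significance is presumably $\mathrm{Sig}_{i,j} = \mathrm{Freq}_{i,j} / \mathrm{Freq}_{i-1,j+1}$; equation (3) itself is not reproduced in this excerpt. Below is a sketch of computing it from raw n-gram counts; the smoothing constant and the clamping at sentence boundaries are our assumptions.

```python
from collections import Counter

def ngram_counts(corpus, max_len):
    """Count every word sequence of up to max_len words in a tokenized corpus."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(i, min(i + max_len, len(sent))):
                counts[tuple(sent[i:j + 1])] += 1
    return counts

def mono_significance(sent, i, j, counts, smooth=1.0):
    """Sig of span[i, j]: frequency of the span over the frequency of the
    expanded span[i-1, j+1] (smoothed; boundary clamping is our assumption)."""
    freq = counts[tuple(sent[i:j + 1])]
    lo, hi = max(i - 1, 0), min(j + 1, len(sent) - 1)
    expanded = counts[tuple(sent[lo:hi + 1])]
    return freq / (expanded + smooth)
```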

41 Details of the searching algorithm are described in section 2. [sent-83, score-0.231]

42 We first search for monolingual pseudo-words on the source and target sides individually. [sent-86, score-0.353]

43 We argue that word alignment techniques will work fine if nonexistent word alignments, such as those in non-compositional phrasal equivalences, have been filtered out by pseudo-words. [sent-88, score-0.388]

44 2 Bilingual Sequence Significance. Bilingual sequence significance is proposed to characterize pseudo-word pairs. [sent-91, score-0.255]

45 Co-occurrence of sequences on both language sides is used to define bilingual sequence significance. [sent-92, score-0.33]

46 Given a bilingual sequence pair span-pair[is, js, it, jt] (source-side span[is, js] and target-side span[it, jt]), bilingual sequence significance is defined as $\mathrm{Sig}_{i_s,j_s,i_t,j_t} = \frac{\mathrm{Freq}_{i_s,j_s,i_t,j_t}}{\mathrm{Freq}_{i_s-1,j_s+1,i_t-1,j_t+1}}$ (5), where Freq denotes the frequency of a span-pair. [sent-93, score-0.905]
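
Equation (5) translates directly into code. The `pair_counts` table below, mapping (source-words, target-words) tuples to co-occurrence frequencies over the parallel corpus, is a hypothetical precomputed structure, and the smoothing and boundary clamping are again our assumptions.

```python
def bilingual_significance(src, tgt, i_s, j_s, i_t, j_t, pair_counts, smooth=1.0):
    """Equation (5): Freq of the span-pair over Freq of the expanded span-pair."""
    freq = pair_counts.get(
        (tuple(src[i_s:j_s + 1]), tuple(tgt[i_t:j_t + 1])), 0)
    lo_s, hi_s = max(i_s - 1, 0), min(j_s + 1, len(src) - 1)
    lo_t, hi_t = max(i_t - 1, 0), min(j_t + 1, len(tgt) - 1)
    expanded = pair_counts.get(
        (tuple(src[lo_s:hi_s + 1]), tuple(tgt[lo_t:hi_t + 1])), 0)
    return freq / (expanded + smooth)
```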

47 Bilingual sequence significance is an extension of monolingual sequence significance. [sent-94, score-0.553]

48 Pseudo-word pairs of one sentence pair are the pairs that maximize the sum of the span-pairs' bilingual sequence significances: $pwp_1^K = \arg\max_{\text{span-pair}_1^K} \sum_{k=1}^{K} \mathrm{Sig}_{\text{span-pair}_k}$ (6), where pwp represents a pseudo-word pair. [sent-96, score-0.423]

49 Searching for pseudo-word pairs $pwp_1^K$ is equal to the bilingual segmentation of a sentence pair into the optimal $\text{span-pair}_1^K$. [sent-98, score-0.348]

50 Details of the searching algorithm are presented in section 2. [sent-99, score-0.231]

51 2 Algorithms of Searching for Pseudo-words. The pseudo-word searching problem is equal to the decomposition of a sentence into pseudo-words. [sent-103, score-0.45]

52 But the number of possible decompositions of the sentence grows exponentially with the sentence length in both the monolingual and bilingual scenarios. [sent-104, score-0.547]

53 By casting such a decomposition problem into a parsing framework, we can find the pseudo-word sequence in polynomial time. [sent-105, score-0.327]

54 According to the two scenarios, searching for pseudo-words can be performed in a monolingual way and a synchronous way. [sent-106, score-0.531]

55 Details of the two kinds of searching algorithms are described in the following two sections. [sent-107, score-0.231]

56 1 Algorithm of Searching for Monolingual Pseudo-words (SMP). Searching for monolingual pseudo-words is based on the computation of monolingual sequence significance. [sent-110, score-0.458]

57 In this algorithm, W[i, j] records the maximal sum of monolingual sequence significances over the sub-spans of span[i, j]. [sent-117, score-0.63]

58 During initialization, W[i, i] is initialized as Sig[i, i] (note that this sequence is the word wi only). [sent-118, score-0.326]

59 For span[i, j], W[i, j] is updated if a higher sum of monolingual sequence significances is found. [sent-121, score-0.516]

60 After the maximal sum of significances is found for small spans, the computation of bigger spans, which uses the small spans' maximal sums, continues. [sent-124, score-0.354]

61 The maximal sum of significances for the whole sentence (W[1, n], where n is the sentence's length) is guaranteed in this way, and the optimal decomposition is obtained correspondingly. [sent-125, score-0.437]

62 After steps 3-6, all possible decompositions of span[i, j] have been explored, and the W[i, j] of the optimal decomposition of span[i, j] is recorded. [sent-127, score-0.265]

63 Then the monolingual sequence significance Sig[i, j] of span[i, j] is computed at step 7, and it is compared to W[i, j] at step 8. [sent-128, score-0.415]
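
Figure 1 itself is not reproduced in this excerpt, but the steps described above map onto a CKY-style dynamic program over spans. Below is a sketch under the assumption that a function `sig(i, j)` returns the monolingual sequence significance of span[i, j] (e.g., `mono_significance` above); the backpointer recovery is our addition for readability.

```python
def smp(n, sig):
    """Searching for Monolingual Pseudo-words (SMP): W[i][j] holds the maximal
    sum of significances over decompositions of span[i, j]; a span whose own
    Sig beats every binary split is kept whole as one pseudo-word."""
    W = [[0.0] * n for _ in range(n)]
    back = [[None] * n for _ in range(n)]    # best split point, None if atomic
    for i in range(n):                       # initialization: W[i][i] = Sig[i, i]
        W[i][i] = sig(i, i)
    for d in range(2, n + 1):                # span length, bottom-up
        for i in range(n - d + 1):
            j = i + d - 1
            for k in range(i, j):            # steps 3-6: best binary split
                v = W[i][k] + W[k + 1][j]
                if v > W[i][j]:
                    W[i][j], back[i][j] = v, k
            u = sig(i, j)                    # steps 7-9: keep the span whole
            if u > W[i][j]:
                W[i][j], back[i][j] = u, None

    def spans(i, j):                         # recover the pseudo-word spans
        k = back[i][j]
        return [(i, j)] if k is None else spans(i, k) + spans(k + 1, j)

    return W[0][n - 1], spans(0, n - 1)
```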

64 2 Algorithm of Synchronous Searching for Pseudo-words (SSP). Synchronous searching for pseudo-words utilizes bilingual sequence significance. [sent-133, score-0.531]

65 What it cares about are the span-pairs that maximize the sum of bilingual sequence significances. [sent-136, score-0.353]

66 Initialization: if is = js or it = jt then W[is, js, it, jt] = Sig[is, js, it, jt]; else W[is, js, it, jt] = 0. 1: for ds = 2 … ns, dt = 2 … nt do 2: for all is, js, it, jt s.t. [sent-137, score-0.915]

67 js − is = ds − 1 and jt − it = dt − 1 do 3: for ks = is … js − 1, kt = it … jt − 1 do 4: v = max{ W[is, ks, it, kt] + W[ks+1, js, kt+1, jt], W[is, ks, kt+1, jt] + W[ks+1, js, it, kt] }; 5: if v > W[is, js, it, jt] then 6: W[is, js, it, jt] = v; 7: u = Sig[is, js, it, jt]; 8: if u > W[is, js, it, jt] then 9: W[is, js, it, jt] = u; Figure 2. [sent-139, score-0.612]

68 In the algorithm, W[is, js, it, jt] records the maximal sum of bilingual sequence significances over the sub-span-pairs of span-pair[is, js, it, jt]. [sent-141, score-0.586]

69 For 1-to-m span-pairs, the Ws are initialized as the bilingual sequence significances of those span-pairs. [sent-142, score-0.502]

70 In the main algorithm, ds/dt denotes the length of a span on the source/target side, ranging from 2 to ns/nt (the source/target sentence's length). [sent-144, score-0.318]

71 For span-pair[is, js, it, jt], W[is, js, it, jt] is updated at step 6 if a higher sum of bilingual sequence significances is found. [sent-148, score-0.518]

72 Fitting the bilingual search for pseudo-words into the ITG framework happens at steps 7-9. [sent-149, score-0.453]

73 Then the bilingual sequence significance of span-pair[is, js, it, jt] is computed at step 7. [sent-151, score-0.417]

74 An update is made at step 9 if the bilingual sequence significance of span-pair[is, js, it, jt] is bigger than W[is, js, it, jt], which indicates that span-pair[is, js, it, jt] is non-decomposable. [sent-153, score-0.417]

75 In addition to the initialization step, all span-pairs' bilingual sequence significances are computed. [sent-155, score-0.508]

76 The maximal sum of bilingual sequence significances for one sentence pair is guaranteed through this bottom-up procedure, and the optimal decomposition of the sentence pair is obtained correspondingly. [sent-156, score-0.668]
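
Figure 2, reconstructed above, is the same bottom-up program lifted to span-pairs, with straight and inverted (ITG-style) combinations tried at step 4. A sketch follows, assuming `sig(i_s, j_s, i_t, j_t)` returns the bilingual sequence significance of equation (5); backpointers are omitted for brevity.

```python
def ssp(ns, nt, sig):
    """Synchronous Searching for Pseudo-words (SSP): W[is][js][it][jt] holds
    the maximal sum of bilingual significances over decompositions of the
    span-pair; polynomial (though high-order) time in ns and nt."""
    W = [[[[0.0] * nt for _ in range(nt)]
          for _ in range(ns)] for _ in range(ns)]
    for i_s in range(ns):                    # initialization: 1-to-m span-pairs
        for j_s in range(i_s, ns):
            for i_t in range(nt):
                for j_t in range(i_t, nt):
                    if i_s == j_s or i_t == j_t:
                        W[i_s][j_s][i_t][j_t] = sig(i_s, j_s, i_t, j_t)
    for ds in range(2, ns + 1):              # source span length
        for dt in range(2, nt + 1):          # target span length
            for i_s in range(ns - ds + 1):
                j_s = i_s + ds - 1
                for i_t in range(nt - dt + 1):
                    j_t = i_t + dt - 1
                    best = W[i_s][j_s][i_t][j_t]
                    for ks in range(i_s, j_s):       # step 4: try all splits,
                        for kt in range(i_t, j_t):   # straight and inverted
                            straight = (W[i_s][ks][i_t][kt]
                                        + W[ks + 1][j_s][kt + 1][j_t])
                            inverted = (W[i_s][ks][kt + 1][j_t]
                                        + W[ks + 1][j_s][i_t][kt])
                            best = max(best, straight, inverted)
                    # steps 7-9: keep the whole span-pair if its own Sig wins
                    best = max(best, sig(i_s, j_s, i_t, j_t))
                    W[i_s][j_s][i_t][j_t] = best
    return W[0][ns - 1][0][nt - 1]
```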

77 Initialization: if is = js or it = jt then W[is, js, it, jt] = Sig[is, js, it, jt]; else W[is, js, it, jt] = 0. 1: for ds = 2 … ns, dt = 2 … nt do 2: for all is, js, it, jt s.t. [sent-166, score-0.915]

78 We can see that in Figure 4, each monolingual span is configured into three parts, for example: span[is, ks1-1], span[ks1, ks2], and span[ks2+1, js] on the source language side. [sent-172, score-0.361]

79 Bilingual sequence significance is computed only on pairs of blank boxes; solid boxes are excluded from this computation to represent NULL alignment cases. [sent-174, score-0.469]

80 Generally, the span length of a NULL alignment is not very long, so we can set a length threshold for NULL alignments, e.g. [sent-179, score-0.358]
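
To make the three-part configuration concrete, the helper below enumerates the splits of one monolingual span into a possibly-empty NULL-aligned prefix, an aligned middle, and a possibly-empty NULL-aligned suffix, with the length cap described above. The default threshold of 3 is an arbitrary placeholder, not a value taken from the excerpt.

```python
def three_part_configs(i, j, max_null=3):
    """Yield (k1, k2) so that span[i, j] splits into a NULL-aligned prefix
    span[i, k1-1], an aligned middle span[k1, k2], and a NULL-aligned suffix
    span[k2+1, j], each NULL part at most max_null words long."""
    for k1 in range(i, j + 1):
        if k1 - i > max_null:        # prefix NULL span exceeds the threshold
            break
        for k2 in range(k1, j + 1):
            if j - k2 > max_null:    # suffix NULL span exceeds the threshold
                continue
            yield k1, k2
```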

81 2 Pseudo-word Unpacking. Because a pseudo-word is a kind of multi-word expression, it has an inborn advantage over unary words: a higher language model order and a longer max phrase length. [sent-221, score-0.325]

82 The advantage of the longer max phrase length is removed during phrase extraction, and the advantage of the higher language model order is also removed during decoding, since we use a language model trained on unary words. [sent-225, score-0.307]

83 Performances of pseudo-word unpacking are reported in section 3. [sent-226, score-0.22]
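
Unpacking itself is mechanical: each pseudo-word token is split back into its component unary words before the language model is trained and before phrase lengths are measured. A minimal sketch, assuming pseudo-words are stored with an underscore joining their components (the delimiter is our assumption, not stated in the excerpt):

```python
def unpack(tokens, delimiter="_"):
    """Split each pseudo-word token back into its component unary words."""
    return [word for token in tokens for word in token.split(delimiter)]

# e.g. unpack(["would_like_to", "book", "a_room"])
# -> ["would", "like", "to", "book", "a", "room"]
```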

84 Ma and Way (2009) applied unpacking after phrase extraction, then re-estimated the phrase translation probabilities and the lexical reordering model. [sent-231, score-0.338]

85 pwchpwen denotes that pseudo-words are on both language sides of the training data; they are the input strings during development and testing, and the translations are also pseudo-words, which are converted to words as the final output. [sent-235, score-0.221]

86 This shows that excluding NULL alignments in the synchronous search for pseudo-words is effective. [sent-238, score-0.599]

87 ESSP is superior to SMP, indicating that bilingually motivated searching for pseudo-words is more effective. [sent-240, score-0.288]

88 This indicates that pseudo-words, through either monolingual searching or synchronous searching, are more effective than words as basic translational units. [sent-250, score-0.75]

89 The improvement derives from the pseudo-word itself as the basic translational unit, and does not rely very much on a higher language model order or a longer max phrase length setting. [sent-271, score-0.414]

90 Corpus scale has an influence on the computation of sequence significance for long sentences, which appear frequently in the news domain. [sent-278, score-0.284]

91 Similar to the performances on the small corpus, wchpwen always performs better than the other two cases, which indicates that a Chinese word prefers to have an English pseudo-word equivalent of one or more words. [sent-281, score-0.332]

92 1 Pseudo-word Unpacking Performances on Large Corpus. Table 6 presents pseudo-word unpacking performances on the large corpus. [sent-305, score-0.247]

93 It shows that the improvement derives from the pseudo-word itself as the basic translational unit, and does not rely very much on a higher language model order or a longer max phrase length setting. [sent-313, score-0.242]

94 In fact, a slight improvement in pwchpwen and pwchwen is seen after pseudo-word unpacking, which indicates that the higher language model order and the longer max phrase length impact the performance in these two configurations. [sent-314, score-0.346]

95 4 Conclusion. We have presented the pseudo-word as a novel translational unit for phrase-based machine translation. [sent-328, score-0.22]

96 It is proposed to replace the too fine-grained word as the basic translational unit. [sent-329, score-0.249]

97 A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. [sent-330, score-0.308]

98 By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in polynomial time. [sent-331, score-0.274]

99 Experimental results on the Chinese-to-English translation task show that, in the phrase-based machine translation model, pseudo-words perform significantly better than words in both the spoken language translation domain and the news domain. [sent-332, score-0.382]

100 Removing the power of the higher-order language model and the longer max phrase length, which are inherent in pseudo-words, shows that pseudo-words still improve translation performance significantly over unary words. [sent-333, score-0.416]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('js', 0.309), ('jt', 0.303), ('essp', 0.264), ('smp', 0.245), ('searching', 0.231), ('span', 0.201), ('ssp', 0.199), ('translational', 0.172), ('unpacking', 0.17), ('wchpwen', 0.17), ('significances', 0.165), ('bilingual', 0.162), ('monolingual', 0.16), ('decomposition', 0.146), ('synchronous', 0.14), ('sequence', 0.138), ('pseudowords', 0.132), ('wi', 0.121), ('significance', 0.117), ('phrasal', 0.106), ('sig', 0.099), ('alignments', 0.096), ('pwchpwen', 0.094), ('translation', 0.09), ('null', 0.085), ('chinese', 0.083), ('performances', 0.077), ('unary', 0.073), ('equivalences', 0.071), ('excluded', 0.071), ('maximal', 0.068), ('denotes', 0.066), ('max', 0.063), ('side', 0.061), ('segmentation', 0.058), ('decompositions', 0.057), ('bilingually', 0.057), ('inborn', 0.057), ('pwchwen', 0.057), ('sigi', 0.057), ('unpacked', 0.057), ('alignment', 0.055), ('bleu', 0.053), ('sum', 0.053), ('pipeline', 0.052), ('length', 0.051), ('aligner', 0.051), ('pseudoword', 0.05), ('characterizes', 0.049), ('boxes', 0.049), ('unit', 0.048), ('basic', 0.047), ('spans', 0.046), ('consecutive', 0.045), ('chunking', 0.045), ('sentence', 0.044), ('initialization', 0.043), ('casting', 0.043), ('longer', 0.042), ('phrase', 0.039), ('solid', 0.039), ('attains', 0.038), ('flexcrfs', 0.038), ('ksigyk', 0.038), ('pbsmt', 0.038), ('spanpairs', 0.038), ('wks', 0.038), ('initialized', 0.037), ('itg', 0.037), ('ma', 0.036), ('gibbs', 0.036), ('xu', 0.034), ('fitting', 0.033), ('blunsom', 0.033), ('steps', 0.033), ('desk', 0.033), ('phan', 0.033), ('lambert', 0.033), ('statistical', 0.032), ('aligned', 0.032), ('koehn', 0.032), ('giza', 0.031), ('units', 0.03), ('void', 0.03), ('questioned', 0.03), ('word', 0.03), ('sides', 0.03), ('och', 0.03), ('optimal', 0.029), ('news', 0.029), ('exponentially', 0.029), ('equal', 0.029), ('minimal', 0.029), ('kk', 0.028), ('zx', 0.028), ('cky', 0.028), ('significantly', 0.027), ('meteor', 0.027), ('pair', 0.026), ('performs', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

Author: Xiangyu Duan ; Min Zhang ; Haizhou Li

Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But the word appears to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline has an inborn deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain.

2 0.1538699 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

Author: John DeNero ; Dan Klein

Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.

3 0.14758372 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li

Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving the phrase table for phrase-based SMT. The experimental results show that our method significantly improves the performance of both word alignment and translation quality. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.

4 0.14119922 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou

Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experimental results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and BLEU score over the baseline alignment system of GIZA++.

5 0.13426889 133 acl-2010-Hierarchical Search for Word Alignment

Author: Jason Riesa ; Daniel Marcu

Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.

6 0.1326839 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

7 0.13146366 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

8 0.13047558 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels

9 0.1165652 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

10 0.11067381 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

11 0.10893469 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

12 0.10294055 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

13 0.100233 262 acl-2010-Word Alignment with Synonym Regularization

14 0.099009857 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs

15 0.095551156 54 acl-2010-Boosting-Based System Combination for Machine Translation

16 0.095006987 169 acl-2010-Learning to Translate with Source and Target Syntax

17 0.088608302 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

18 0.087123469 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

19 0.082689166 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

20 0.077937238 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.2), (1, -0.211), (2, -0.034), (3, 0.045), (4, 0.047), (5, 0.03), (6, -0.085), (7, 0.001), (8, -0.006), (9, 0.002), (10, -0.027), (11, -0.039), (12, 0.0), (13, 0.047), (14, 0.003), (15, -0.058), (16, -0.037), (17, -0.06), (18, -0.023), (19, -0.014), (20, -0.01), (21, -0.1), (22, -0.002), (23, 0.02), (24, -0.007), (25, 0.017), (26, -0.083), (27, -0.022), (28, -0.025), (29, -0.055), (30, -0.084), (31, -0.015), (32, 0.081), (33, -0.016), (34, 0.032), (35, 0.067), (36, 0.126), (37, 0.033), (38, -0.046), (39, 0.08), (40, 0.007), (41, 0.09), (42, 0.059), (43, 0.073), (44, 0.091), (45, -0.084), (46, -0.058), (47, 0.089), (48, -0.027), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93672442 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

Author: Xiangyu Duan ; Min Zhang ; Haizhou Li

Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But the word appears to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline has an inborn deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain.

2 0.62479556 262 acl-2010-Word Alignment with Synonym Regularization

Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata

Abstract: We present a novel framework for word alignment that incorporates synonym knowledge collected from monolingual linguistic resources in a bilingual probabilistic model. Synonym information is helpful for word alignment because we can expect a synonym to correspond to the same word in a different language. We design a generative model for word alignment that uses synonym information as a regularization term. The experimental results show that our proposed method significantly improves word alignment quality.

3 0.60948503 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

Author: John DeNero ; Dan Klein

Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.

4 0.60646951 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li

Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving the phrase table for phrase-based SMT. The experimental results show that our method significantly improves the performance of both word alignment and translation quality. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.

5 0.60273826 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou

Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experimental results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and BLEU score over the baseline alignment system of GIZA++.

6 0.58477753 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

7 0.58153528 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

8 0.55605519 133 acl-2010-Hierarchical Search for Word Alignment

9 0.55003023 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

10 0.53613359 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

11 0.53509545 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

12 0.53080755 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

13 0.52814358 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels

14 0.50338244 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities

15 0.49995548 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

16 0.48734602 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

17 0.47657466 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

18 0.47562432 169 acl-2010-Learning to Translate with Source and Target Syntax

19 0.44795188 54 acl-2010-Boosting-Based System Combination for Machine Translation

20 0.44255394 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.016), (25, 0.026), (42, 0.011), (44, 0.013), (59, 0.091), (73, 0.021), (78, 0.011), (83, 0.047), (84, 0.014), (98, 0.651)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9947114 221 acl-2010-Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

Author: Reyyan Yeniterzi ; Kemal Oflazer

Abstract: We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores. Our maximal set of source and target side transformations, coupled with some additional techniques, provides a 39% relative improvement from a baseline of 17.08 to 23.78 BLEU, all averaged over 10 training and test sets. Now that the syntactic analysis on the English side is available, we also experiment with more long-distance constituent reordering to bring the English constituent order close to Turkish, but find that these transformations do not provide any additional consistent tangible gains when averaged over the 10 sets.

2 0.99453056 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

Author: Yabin Zheng ; Zhiyuan Liu ; Lixing Xie

Abstract: Motivated by Google Sets, we study the problem of growing related words from a single seed word by leveraging user behaviors hiding in user records of a Chinese input method. Our proposed method is motivated by the observation that the more frequently two words co-occur in user records, the more related they are. First, we utilize user behaviors to generate candidate words. Then, we utilize a search engine to enrich candidate words with adequate semantic features. Finally, we reorder candidate words according to their semantic relatedness to the seed word. Experimental results on a Chinese input method dataset show that our method gains better performance.

3 0.99372339 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii

Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n^2), the experimental results demonstrated an average parsing time not much slower than O(n).

4 0.98876476 27 acl-2010-An Active Learning Approach to Finding Related Terms

Author: David Vickrey ; Oscar Kipersztok ; Daphne Koller

Abstract: We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups (“synonyms of crash,” “types of car,” etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.

same-paper 5 0.98798668 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

Author: Xiangyu Duan ; Min Zhang ; Haizhou Li

Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But the word appears to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline has an inborn deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain.

6 0.98105675 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

7 0.97873497 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

8 0.95273137 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing

9 0.94483691 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

10 0.93729097 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

11 0.92501533 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

12 0.88930833 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

13 0.88750625 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

14 0.88143259 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

15 0.88036323 79 acl-2010-Cross-Lingual Latent Topic Extraction

16 0.87018967 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

17 0.86743861 262 acl-2010-Word Alignment with Synonym Regularization

18 0.85789847 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels

19 0.85699999 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

20 0.85373598 133 acl-2010-Hierarchical Search for Word Alignment