acl acl2012 acl2012-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith
Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU. 1
Reference: text
sentIndex sentText sentNum sentScore
1 ... Processing, Institute of Computing Technology, Chinese Academy of Sciences; ‡School of Computer Science and Technology, Soochow University, China. [sent-3, score-0.021]
2 Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU. [sent-7, score-0.214]
3 1 Introduction Chiang’s hierarchical phrase-based (HPB) translation model utilizes synchronous context-free grammar (SCFG) for translation derivation (Chiang, 2005; Chiang, 2007) and has been widely adopted in statistical machine translation (SMT). [sent-9, score-0.506]
4 Typically, such models define two types of translation rules: hierarchical (translation) rules which consist of both terminals and non-terminals, and glue (grammar) rules which combine translated phrases in a monotone fashion. [sent-10, score-0.779]
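For concreteness: a hierarchical rule in Chiang’s formalism has the form X → ⟨γ, α⟩, where γ and α are strings of terminals and co-indexed non-terminals (e.g., X → ⟨X1 de X2, the X2 of X1⟩ for Chinese-English), while the glue rules S → ⟨S1 X2, S1 X2⟩ and S → ⟨X1, X1⟩ simply concatenate translated phrases left to right.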
5 Due to the lack of linguistic knowledge, Chiang’s HPB model contains only one type of non-terminal symbol, X, often making it difficult to select the most appropriate translation rules. [sent-11, score-0.234]
6 What is more, Chiang’s HPB model suffers from limited phrase reordering, combining translated phrases only in a monotonic way with glue rules. [sent-12, score-0.396]
7 In addition, once a glue rule is adopted, it requires all rules above it to be glue rules. [sent-13, score-0.197]
8 (Footnote 1: Another non-terminal symbol S is used in glue rules.) [sent-19, score-0.499]
9 One important research question is therefore how to refine the non-terminal category X using linguistically motivated information, e.g., Zollmann and Venugopal (2006) (SAMT). [sent-20, score-0.054]
10 Inspired by previous work in parsing (Charniak, 2000; Collins, 2003), our Head-Driven HPB (HD-HPB) model is based on the intuition that linguistic heads provide important information about a constituent or distributionally defined fragment, as in HPB. [sent-26, score-0.152]
11 We identify heads using linguistically motivated dependency parsing, and use their POS to refine X. [sent-27, score-0.185]
12 In addition, HD-HPB provides flexible reordering rules, freely mixing translation and reordering (including swap) at any stage in a derivation. [sent-28, score-0.526]
13 Different from the soft constraint modeling adopted in (Chan et al., 2011), our approach encodes syntactic information in translation rules. [sent-29, score-0.06] [sent-34, score-0.14]
15 However, the two approaches are not mutually exclusive, as we could also include a set of syntax-driven features into our translation model. [sent-35, score-0.14]
16 Our approach maintains the advantages of Chiang’s HPB model while at the same time incorporating head information and flexible reordering in a derivation in a natural way. [sent-36, score-0.094] [sent-40, score-0.177]
18 Experiments on Chinese-English translation using four NIST MT test sets show that our HD-HPB model significantly outperforms Chiang’s HPB as well as a SAMT-style refined version of HPB. [sent-41, score-0.26]
19 2 Head-Driven HPB Translation Model Like Chiang (2005) and Chiang (2007), our HD-HPB translation model adopts a synchronous context-free grammar, a rewriting system which generates source and target side string pairs simultaneously using a context-free grammar. [sent-42, score-0.355]
20 Instead of collapsing all non-terminals in the source language into a single symbol X as in Chiang (2007), given a word sequence f_i^j from position i to position j, we first find heads and then concatenate the POS tags of these heads as f_i^j’s non-terminal symbol. [sent-43, score-0.455]
21 Specifically, we adopt an unlabeled dependency structure to derive heads, which are defined as: Definition 1. [sent-44, score-0.05]
22 For word sequence f_i^j, word f_k (i ≤ k ≤ j) is regarded as a head if it is dominated by a word outside of this sequence. [sent-45, score-0.102]
23 Note that this definition (i) allows for a word sequence to have one or more heads (largely due to the fact that a word sequence is not necessarily linguistically constrained) and (ii) ensures that heads are always the highest heads in the sequence from a dependency structure perspective. [sent-46, score-0.506]
24 For example, the word sequence ouzhou baguo lianming in Figure 1 has two heads (i.e., baguo and lianming; ouzhou is not a head of this sequence since its headword baguo falls within this sequence), and the non-terminal corresponding to the sequence is thus labeled as NN-AD. [sent-47, score-0.529] [sent-49, score-0.406]
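To make Definition 1 concrete, here is a minimal Python sketch (ours, not the authors’ code) of head finding and non-terminal labeling over an unlabeled dependency structure; the governor array, the POS tags, and the "-" separator in the label are illustrative assumptions.

def span_heads(governors, i, j):
    """governors[k] = index of word k's governor, or -1 for the root.
    A word in span [i, j] is a head of the span iff its governor lies
    outside the span (Definition 1)."""
    return [k for k in range(i, j + 1) if not (i <= governors[k] <= j)]

def span_label(governors, pos_tags, i, j):
    """Non-terminal label: concatenated POS tags of the span's heads."""
    return "-".join(pos_tags[k] for k in span_heads(governors, i, j))

# Toy rendering of the Figure 1 example (the dependencies are our guess):
# ouzhou -> baguo, while baguo and lianming are governed by zhichi.
governors = [1, 3, 3, -1]               # word k -> governors[k]
pos_tags = ["NR", "NN", "AD", "VV"]     # ouzhou baguo lianming zhichi
print(span_label(governors, pos_tags, 0, 2))  # "NN-AD" (paper: NNAD)
print(span_label(governors, pos_tags, 2, 3))  # "VV" for lianming zhichi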
26 It is worth noting that in this paper we only refine non-terminal X on the source side to head-informed ones, while still using X on the target side. [sent-50, score-0.148]
27 For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). [sent-52, score-0.353]
28 We extract both HD-HRs and NRRs from these initial phrase pairs. [sent-53, score-0.07]
2.1 HD-HRs: Head-Driven Hierarchical Rules As mentioned, an HD-HR has at least one terminal on both source and target sides. [sent-55, score-0.092]
This is the same as the hierarchical rules defined in Chiang’s HPB model (Chiang, 2007), except that we use head POS-informed non-terminal symbols in the source language. [sent-56, score-0.357]
31 We look for initial phrase pairs that contain other phrases and then replace sub-phrases with POS tags corresponding to their heads. [sent-57, score-0.093]
32 Given the word alignment in Figure 1, Table 1 demonstrates the difference between hierarchical rules in Chiang (2007) and HD-HRs defined here. [sent-58, score-0.248]
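A hedged sketch of the sub-phrase replacement step just described, reusing span_label from the sketch above; the rule representation and the target-side X label follow Section 2, but alignment-consistency checks are omitted and all names and indexing conventions are ours (spans are taken relative to position 0 of the phrase pair for simplicity).

def make_hd_hr(src, tgt, sub_src, sub_tgt, governors, pos_tags):
    """Turn an initial phrase pair (src, tgt) into a one-non-terminal
    HD-HR by replacing an aligned sub-phrase pair with a non-terminal:
    head-POS label on the source side, plain X on the target side."""
    i, j = sub_src                       # source sub-span, inclusive
    s, t = sub_tgt                       # target sub-span, inclusive
    label = span_label(governors, pos_tags, i, j)
    rule_src = src[:i] + ["[%s,1]" % label] + src[j + 1:]
    rule_tgt = tgt[:s] + ["[X,1]"] + tgt[t + 1:]
    # an HD-HR must keep at least one terminal on each side
    assert len(rule_src) > 1 and len(rule_tgt) > 1
    return rule_src, rule_tgt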
Similar to Chiang’s HPB model, our HD-HPB model will result in a large number of rules, causing problems in decoding. [sent-59, score-0.134]
34 To alleviate these problems, we filter our HD-HRs according to the same constraints as described in Chiang (2007). [sent-60, score-0.023]
35 Moreover, we discard rules that have non-terminals with more than four heads. [sent-61, score-0.132]
2.2 NRRs: Non-terminal Reordering Rules NRRs are translation rules without terminals. [sent-63, score-0.248]
Monotone ⟨Y → Y1Y2, X → X1X2⟩; Swap ⟨Y → Y1Y2, X → X2X1⟩; Discontinuous swap ⟨Y → Y1Y2, X → X2 ... X1⟩. [sent-67, score-0.069]
The non-terminal symbols (e.g., NN, VV, VV-NR) capture the head POS tags of the corresponding word sequence in the source language. [sent-75, score-0.071]
By merging two neighboring non-terminals into a single non-terminal, NRRs enable the translation model to explore a wider search space. [sent-76, score-0.19]
To speed up decoding, we currently (i) only use monotone and swap NRRs and (ii) limit the number of non-terminals in an NRR to 2. [sent-78, score-0.132]
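A minimal sketch of what applying the two retained NRR types to a pair of neighboring chart items could look like; recomputing the merged span’s label from the dependency structure is our reading, consistent with the lianming zhichi example below, where the merged span is labeled VV.

def apply_nrrs(governors, pos_tags, left, right):
    """left, right: (i, j, label, target) items over adjacent source
    spans, i.e. left's j + 1 == right's i. Yields the merged items."""
    i, j = left[0], right[1]
    y = span_label(governors, pos_tags, i, j)   # label of merged span
    yield (i, j, y, left[3] + " " + right[3])   # monotone order
    yield (i, j, y, right[3] + " " + left[3])   # swapped order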
41 Our decoder is based on CKY-style chart parsing with beam search and searches for the best derivation bottom-up. [sent-81, score-0.124]
For a source span [i, j], it applies both types of rules, HD-HRs and NRRs. [sent-82, score-0.062]
However, HD-HRs are only applied to generate derivations spanning no more than K words (the initial phrase length limit used in training to extract HD-HRs), while NRRs are applied to derivations spanning any length. [sent-83, score-0.156]
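Schematically (our pseudocode made runnable, not the authors’ decoder), the loop just described looks as follows; apply_hd_hrs and apply_nrrs stand for the rule matchers (e.g., closures over the sentence’s dependency analysis), and score/beam for the pruning model.

def decode(n, K, beam, score, apply_hd_hrs, apply_nrrs):
    """CKY-style bottom-up search over source spans [i, j]."""
    chart = [[[] for _ in range(n)] for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if length <= K:              # HD-HRs: initial phrase limit
                chart[i][j].extend(apply_hd_hrs(i, j))
            for k in range(i, j):        # NRRs: any span, any split
                for l in chart[i][k]:
                    for r in chart[k + 1][j]:
                        chart[i][j].extend(apply_nrrs(l, r))
            chart[i][j].sort(key=score, reverse=True)
            del chart[i][j][beam:]       # beam pruning
    return chart[0][n - 1]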
3 Experiments We evaluate the performance of our HD-HPB model and compare it with our implementation of Chiang’s HPB model (Chiang, 2007), a source-side SAMT-style refined version of HPB (SAMT-HPB), and the Moses implementation of HPB. [sent-85, score-0.124]
45 For Moses HPB, we use “grow-diag-final-and” to obtain symmetric word alignments, 10 for the maximum phrase length, and the recommended default values for all other parameters. [sent-89, score-0.058]
We use the 2002 NIST MT evaluation test data (878 sentence pairs) as the development data, and the 2003, 2004, 2005, 2006-news NIST MT evaluation test data (919, 1788, 1082, and 616 sentence pairs, respectively) as the test data. [sent-92, score-0.066]
To find heads, we parse the source sentences with the Berkeley Parser (Petrov and Klein, 2007) trained on Chinese TreeBank 6.0 and use the Penn2Malt toolkit to obtain (unlabeled) dependency structures. [sent-93, score-0.037] [sent-94, score-0.026]
We use the SRI language modeling toolkit to train a 5-gram language model on the Xinhua portion of the Gigaword corpus and standard MERT (Och, 2003) to tune the feature weights on the development data. [sent-103, score-0.026]
50 For evaluation, the NIST BLEU script (version 12) with the default settings is used to calculate the BLEU scores. [sent-104, score-0.022]
To test whether a performance difference is statistically significant, we conduct significance tests following the paired bootstrap approach (Koehn, 2004). [sent-105, score-0.047]
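For reference, a compact sketch of the paired bootstrap test of Koehn (2004); for simplicity it resamples per-sentence scores, whereas true corpus BLEU would be recomputed from the resampled n-gram statistics.

import random

def paired_bootstrap(scores_a, scores_b, trials=1000, seed=0):
    """Fraction of resampled test sets on which system A beats B;
    1 - (this fraction) approximates the p-value for 'A > B'."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[k] for k in sample) > sum(scores_b[k] for k in sample):
            wins += 1
    return wins / trials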
The full rule table size (including HD-HRs and NRRs) of our HD-HPB model is ~1.5 times that of Chiang’s, largely due to refining the non-terminal symbol X in Chiang’s model into head-informed ones in our model. [sent-111, score-0.097] [sent-112, score-0.063]
It is also unsurprising that the test set-filtered rule table size of our model is only ~0.7 times that of Chiang’s: this is due to the fact that some of the refined translation rule patterns required by the test set are unattested in the training data. [sent-113, score-0.119] [sent-114, score-0.305]
Furthermore, the rule table size of NRRs is much smaller than that of HD-HRs since an NRR contains only two non-terminals. [sent-115, score-0.071]
57 Table 3 lists the translation performance with BLEU scores. [sent-116, score-0.14]
58 Note that our re-implementation of Chiang’s original HPB model performs on a par with Moses HPB. [sent-117, score-0.026]
Table 3 shows that our HD-HPB model significantly outperforms Chiang’s HPB model with an average improvement of 1.91 BLEU points. [sent-118, score-0.052]
Table 3 shows that the head-driven scheme outperforms a SAMT-style approach (for each test set p < 0.01), indicating that head information is more effective than (partial) CFG categories. [sent-120, score-0.022] [sent-121, score-0.068]
Taking lianming zhichi in Figure 1 as an example, HD-HPB labels the span VV, as lianming is dominated by zhichi, effectively ignoring lianming in the translation rule, while the SAMT label is ADVP:AD+VV, which is more susceptible to data sparsity. [sent-122, score-0.953]
In addition, SAMT resorts to X if a text span fails to satisfy pre-defined categories. [sent-123, score-0.025]
Examining initial phrases ADVP:AD+VV ... (Footnote 5: the constituency structure for lianming zhichi is (VP (ADVP (AD lianming)) (VP (VV zhichi) ...)).) [sent-124, score-0.428]
65 Note: 1) For HD-HPB, the rule sizes separated by / indicate HD-HRs and NRRs, respectively; 2) Except for “Total”, the figures correspond to rules filtered on the corresponding test set. [sent-148, score-0.201]
Note: 1) SAMT-HPB indicates our HD-HPB model with the non-terminal scheme of Zollmann and Venugopal (2006); 2) HD-HR+Glue indicates our HD-HPB model replacing NRRs with glue rules; 3) Significance tests for Moses HPB, HD-HPB, SAMT-HPB, and HD-HR+Glue are done against HPB. [sent-176, score-0.268]
In order to separate out the individual contributions of the novel HD-HRs and NRRs, we carry out an additional experiment (HD-HR+Glue) using HD-HRs with monotonic glue rules only (adjusted to refined rule labels, but effectively switching off the extra reordering power of full NRRs). [sent-178, score-0.585]
68 Table 3 shows that on average more than half of the improvement over HPB (Chiang and Moses) comes from the refined HD-HRs, the rest from NRRs. [sent-179, score-0.072]
69 Examining translation rules extracted from the training data shows that there are 72,366 types of non-terminals with respect to 33 types of POS tags. [sent-180, score-0.248]
70 6 hierarchical rules/glue rules in Chiang’s model, providing further indication of the importance of NRRs in translation. [sent-185, score-0.226]
71 4 Conclusion We present a head-driven hierarchical phrase-based (HD-HPB) translation model, which adopts head information (derived through unlabeled dependency analysis) in the definition of non-terminals to better differentiate among translation rules. [sent-186, score-0.545]
72 In addition, improved and better integrated reordering rules allow better reordering between consecutive non-terminals through exploration of a larger search space in the derivation. [sent-187, score-0.386]
Experimental results on Chinese-English translation across four test sets demonstrate significant improvements of the HD-HPB model over both Chiang’s HPB and a source-side SAMT-style refined version of HPB. [sent-188, score-0.26]
74 Soft dependency constraints for reordering in hierarchical phrase-based translation. [sent-221, score-0.306]
75 Maximum entropy based phrase reordering for hierarchical phrase-based translation. [sent-225, score-0.293]
Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions. [sent-229, score-0.281]
In Proceedings of NAACL 2006 - Workshop on Statistical Machine Translation, pages 138–141. [sent-270, score-0.033]
A word-class approach to labeling PSCFG rules for machine translation. [sent-273, score-0.108]
wordName wordTfidf (topN-words)
[('hpb', 0.566), ('nrrs', 0.418), ('chiang', 0.245), ('lianming', 0.197), ('zhichi', 0.197), ('glue', 0.16), ('translation', 0.14), ('reordering', 0.139), ('heads', 0.126), ('meiguo', 0.123), ('hierarchical', 0.118), ('rules', 0.108), ('america', 0.104), ('baguo', 0.098), ('lichang', 0.098), ('zollmann', 0.086), ('samt', 0.086), ('terminals', 0.082), ('fji', 0.074), ('hdhpb', 0.074), ('hdhrs', 0.074), ('nrr', 0.074), ('ouzhou', 0.074), ('refined', 0.072), ('rule', 0.071), ('moses', 0.069), ('swap', 0.069), ('head', 0.068), ('hy', 0.064), ('monotone', 0.063), ('mt', 0.053), ('chart', 0.053), ('almaghout', 0.049), ('mylonakis', 0.049), ('plex', 0.049), ('nist', 0.048), ('side', 0.047), ('josef', 0.045), ('och', 0.044), ('stand', 0.044), ('advp', 0.043), ('dublin', 0.043), ('derivations', 0.043), ('vv', 0.04), ('bleu', 0.039), ('localisation', 0.039), ('venugopal', 0.039), ('derivation', 0.038), ('soft', 0.038), ('source', 0.037), ('symbol', 0.037), ('sima', 0.037), ('phrase', 0.036), ('monotonic', 0.035), ('sequence', 0.034), ('initial', 0.034), ('beam', 0.033), ('exp', 0.033), ('refine', 0.033), ('franz', 0.033), ('pages', 0.033), ('discontinuous', 0.031), ('marton', 0.031), ('nonterminal', 0.031), ('pos', 0.031), ('target', 0.031), ('adopts', 0.029), ('tu', 0.028), ('koehn', 0.028), ('centre', 0.027), ('cfg', 0.027), ('naacl', 0.027), ('cell', 0.027), ('dependency', 0.026), ('model', 0.026), ('span', 0.025), ('tests', 0.025), ('chan', 0.025), ('examining', 0.025), ('discard', 0.024), ('neighboring', 0.024), ('philipp', 0.024), ('terminal', 0.024), ('emnlp', 0.024), ('unlabeled', 0.024), ('gao', 0.023), ('penalty', 0.023), ('constraints', 0.023), ('constituents', 0.023), ('pairs', 0.023), ('test', 0.022), ('vp', 0.022), ('synchronous', 0.022), ('default', 0.022), ('alignment', 0.022), ('adopted', 0.022), ('collapsing', 0.021), ('tically', 0.021), ('bowen', 0.021), ('comput', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith
Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU. 1
2 0.1892347 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning. 1
3 0.14467408 108 acl-2012-Hierarchical Chunk-to-String Translation
Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu
Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntax cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese Translation tasks.
4 0.13699295 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
Author: Arianna Bisazza ; Marcello Federico
Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.
5 0.13242909 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu
Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on largescale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase- based SMT system.
6 0.12555908 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
7 0.12444445 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
8 0.12222999 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
9 0.12175601 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
10 0.12039019 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
11 0.11177879 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
12 0.11017796 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
13 0.10114969 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
14 0.10106137 140 acl-2012-Machine Translation without Words through Substring Alignment
15 0.099099882 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
16 0.093570195 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
17 0.092382893 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation
18 0.087798305 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
19 0.087436438 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
20 0.086621813 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
topicId topicWeight
[(0, -0.212), (1, -0.206), (2, 0.044), (3, -0.008), (4, -0.002), (5, -0.109), (6, -0.03), (7, 0.046), (8, 0.042), (9, 0.01), (10, 0.022), (11, -0.037), (12, -0.015), (13, -0.05), (14, 0.001), (15, -0.017), (16, -0.032), (17, 0.025), (18, 0.021), (19, -0.143), (20, 0.032), (21, 0.069), (22, -0.094), (23, 0.061), (24, -0.04), (25, 0.018), (26, -0.044), (27, -0.048), (28, 0.042), (29, -0.043), (30, 0.04), (31, 0.037), (32, -0.017), (33, 0.036), (34, -0.063), (35, -0.046), (36, -0.023), (37, -0.011), (38, 0.024), (39, -0.02), (40, 0.004), (41, 0.017), (42, 0.041), (43, -0.014), (44, 0.009), (45, 0.003), (46, 0.034), (47, -0.013), (48, 0.049), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.92685646 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith
Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU. 1
2 0.87314057 108 acl-2012-Hierarchical Chunk-to-String Translation
Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu
Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntax cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese Translation tasks.
3 0.84329909 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning. 1
4 0.74286467 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
Author: Arianna Bisazza ; Marcello Federico
Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.
5 0.74161166 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu
Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on largescale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase- based SMT system.
6 0.72196478 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
7 0.71075547 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
8 0.69627059 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation
9 0.6868664 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
10 0.67731255 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
11 0.62703264 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
12 0.59955478 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
13 0.55635417 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
14 0.55589843 140 acl-2012-Machine Translation without Words through Substring Alignment
15 0.553285 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
16 0.52288121 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
17 0.52145159 107 acl-2012-Heuristic Cube Pruning in Linear Time
19 0.5120337 66 acl-2012-DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation
20 0.51103508 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
topicId topicWeight
[(25, 0.018), (26, 0.024), (28, 0.052), (30, 0.038), (37, 0.034), (39, 0.041), (52, 0.292), (57, 0.019), (74, 0.039), (82, 0.021), (84, 0.021), (85, 0.037), (90, 0.125), (92, 0.034), (94, 0.074), (99, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.70152509 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith
Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU. 1
2 0.69628155 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions
Author: Nathanael Chambers
Abstract: Temporal reasoners for document understanding typically assume that a document’s creation date is known. Algorithms to ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text’s time expressions (e.g., ‘since 1999’). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that facilitates easier comparison by future work.
3 0.6728586 35 acl-2012-Automatically Mining Question Reformulation Patterns from Search Log Data
Author: Xiaobing Xue ; Yu Tao ; Daxin Jiang ; Hang Li
Abstract: Natural language questions have become popular in web search. However, various questions can be formulated to convey the same information need, which poses a great challenge to search systems. In this paper, we automatically mined 5w1h question reformulation patterns from large scale search log data. The question reformulations generated from these patterns are further incorporated into the retrieval model. Experiments show that using question reformulation patterns can significantly improve the search performance of natural language questions.
Author: Xu Sun ; Houfeng Wang ; Wenjie Li
Abstract: We present a joint model for Chinese word segmentation and new word detection. We present high dimensional new features, including word-based features and enriched edge (label-transition) features, for the joint modeling. As we know, training a word segmentation system on large-scale datasets is already costly. In our case, adding high dimensional new features will further slow down the training speed. To solve this problem, we propose a new training method, adaptive online gradient descent based on feature frequency information, for very fast online training of the parameters, even given large-scale datasets with high dimensional features. Compared with existing training methods, our training method is an order of magnitude faster in terms of training time, and can achieve equal or even higher accuracies. The proposed fast training method is a general purpose optimization method, and it is not limited to the specific task discussed in this paper.
5 0.49900767 118 acl-2012-Improving the IBM Alignment Models Using Variational Bayes
Author: Darcey Riley ; Daniel Gildea
Abstract: Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We apply one such Bayesian technique, variational Bayes, to the IBM models of word alignment for statistical machine translation. We show that using variational Bayes improves the performance of the widely used GIZA++ software, as well as improving the overall performance of the Moses machine translation system in terms of BLEU score.
7 0.49658903 140 acl-2012-Machine Translation without Words through Substring Alignment
8 0.49512413 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy
9 0.4933728 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations
10 0.49171847 136 acl-2012-Learning to Translate with Multiple Objectives
11 0.48964041 176 acl-2012-Sentence Compression with Semantic Role Constraints
12 0.48881745 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
13 0.48869339 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
14 0.48826611 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
15 0.48676008 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
16 0.48592496 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
17 0.48425725 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
18 0.48226914 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
19 0.48092785 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
20 0.48011833 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents