acl acl2012 acl2012-131 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: In this paper, we address the issue of learning better translation consensus in machine translation (MT) research, and explore the search for translation consensus from similar, rather than the same, source sentences or their spans. Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation for the task. We convert such graph-based translation consensus from similar source strings into useful features both for n-best output reranking and for the decoding algorithm. Experimental results show that our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we address the issue of learning better translation consensus in machine translation (MT) research, and explore the search for translation consensus from similar, rather than the same, source sentences or their spans. [sent-4, score-2.422]
2 Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation for the task. [sent-5, score-0.459]
3 We convert such graph-based translation consensus from similar source strings into useful features both for n-best output reranking and for the decoding algorithm. [sent-6, score-1.329]
4 The principle of consensus can be sketched as “a translation candidate is deemed more plausible if it is supported by other translation candidates.” [sent-9, score-1.356]
5 The actual formulation of the principle depends on whether the translation candidate is a complete sentence or just a span of it, whether the candidate is the same as or similar to the supporting candidates, and whether the supporting candidates come from the same or different MT systems. [sent-10, score-0.626]
6 Translation consensus is employed in those minimum Bayes risk (MBR) approaches where the loss function of a translation is defined with respect to all other translation candidates. [sent-13, score-1.353]
7 That is, the translation with the minimal Bayes risk is the one most similar to the other candidates. [sent-14, score-0.298]
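As a worked form of the MBR idea in the two sentences above, the standard decision rule can be written as follows; the notation here is generic, since the paper's own equations were lost in this extraction:

```latex
% Standard MBR decision rule (generic notation, not the paper's own):
% L is a loss between candidates, H(f) the hypothesis set for source f.
\hat{e} \;=\; \operatorname*{arg\,min}_{e \,\in\, \mathcal{H}(f)}
    \sum_{e' \in \mathcal{H}(f)} L(e, e') \, P(e' \mid f)
```

With a similarity-based loss such as one minus sentence-level BLEU, minimizing expected loss amounts to maximizing expected similarity to the other candidates, which is exactly the consensus reading.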
8 Others extend consensus among translations from the same MT system to those from different MT systems. [sent-18, score-0.79]
9 Collaborative decoding (Li et al., 2009) scores the translation of a source span by its n-gram similarity to the translations by other systems. [sent-20, score-0.588]
10 All these approaches are about utilizing consensus among translations for the same (span of) source sentence. [sent-23, score-0.905]
11 It should be noted that consensus among translations of similar source sentences/spans is also helpful for good candidate selection. [sent-24, score-0.943]
12 For the source (Chinese) span “五百 元 以下 的 茶” (tea under five hundred yuan), the MT system produced the correct translation for the second sentence, but it failed to do so for the first one. [sent-26, score-0.478]
13 If the translation of the first sentence could take into consideration the translation of the second sentence, which is similar to but not exactly the same as the first one, the final translation output may be improved. [sent-27, score-0.88]
14 Following this line of reasoning, a discriminative learning method is proposed to constrain the translation of an input sentence using [sent-28, score-0.325]
15 the most similar translation examples from translation memory (TM) systems (Ma et al. [sent-33, score-0.534]
16 A classifier is applied to re-rank the n-best output of a decoder, taking as features information about agreement with those similar translation examples. [sent-35, score-0.332]
17 Note that these two attempts are about translation consensus for similar sentences, and about reranking of n-best output. [sent-37, score-1.017]
18 It is still an open question whether translation consensus for similar sentences/spans can be applied to the decoding process. [sent-38, score-1.125]
19 In this paper, we attempt to leverage translation consensus among similar (spans of) source sentences in bilingual training data, by a novel graph-based model of translation consensus. [sent-43, score-1.447]
20 Unlike Alexandrescu and Kirchhoff (2009), we reformulate the task of seeking translation consensus among source sentences as structured labeling. [sent-44, score-1.247]
21 We propose a novel label propagation algorithm for structured labeling, which is much more efficient than simple label propagation, and derive useful MT decoder features out of it. [sent-45, score-0.628]
22 2 Graph-based Translation Consensus Our MT system with graph-based translation consensus adopts the conventional log-linear model. [sent-47, score-0.993]
23 Under this model, the conditional probability of a translation candidate given a source sentence is a normalized exponential of weighted feature functions, as sketched below. [sent-49, score-0.336]
24 The normalization runs over the set of translation hypotheses in the search space. [sent-81, score-0.309]
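The original symbols are lost in this extraction, but the conventional log-linear model referred to here has the following standard form (generic notation):

```latex
% Conventional log-linear model: h_i are feature functions, lambda_i
% their weights, and H(f) the search space of hypotheses for source f.
P(e \mid f) \;=\;
  \frac{\exp\bigl(\sum_i \lambda_i \, h_i(e, f)\bigr)}
       {\sum_{e' \in \mathcal{H}(f)} \exp\bigl(\sum_i \lambda_i \, h_i(e', f)\bigr)}
```

The graph-based consensus scores described next enter this model as additional feature functions.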
25 We develop a structured label propagation method, which can calculate consensus statistics from translation candidates of similar source sentences/spans. [sent-83, score-1.585]
26 In the following, we explain why the standard, simple label propagation is not suitable for translation consensus, and then introduce how the problem is formulated as an instance of structured labeling, with the proposed structured label propagation algorithm, in section 3. [sent-84, score-0.973]
27 Before elaborating how the graph model of consensus is constructed for both a decoder and n-best output re-ranking in section 5, we will describe how the consensus features and their feature weights can be trained in a semi-supervised way, in section 4. [sent-85, score-1.912]
28 A graph is constructed so that each instance is represented by a node, and the weight of the edge between a pair of nodes represents the similarity between them. [sent-87, score-0.44]
29 In MT, the instances are source sentences or spans of source sentences, and the possible labels are their translation candidates. [sent-89, score-0.687]
30 Therefore, the principle of graph-based translation consensus must be reformulated as, if two instances (source spans) are similar, then their labels (translations) tend to be similar (rather than the same). [sent-97, score-1.179]
31 Note that Alexandrescu and Kirchhoff (2009) do not consider translation as structured labeling. [sent-98, score-0.373]
32 In their graph, a node represents not a source sentence alone but a pair of a source sentence and one of its candidate translations, and there are only two possible labels for each node, namely, 1 (this is a good translation pair) and 0 (this is not a good translation pair). [sent-99, score-1.11]
33 An average MT decoder considers a vast number of translation candidates for each source sentence, and therefore the corresponding graph also contains a vast number of nodes, thus rendering learning over a large dataset infeasible. [sent-102, score-0.811]
34 Note that the graph contains nodes for training instances, whose correct labels are known. [sent-169, score-0.404]
35 In that formulation, a node in the graph represents the pair of a source sentence/span and one of its candidate translations. [sent-193, score-0.391]
36 When the problem is reformulated as structured labeling, each node represents the source sentence/span only, and the translation candidates become labels. [sent-252, score-0.776]
37 The label set of a node is the set of translation candidates for its source sentence/span. [sent-371, score-0.506]
38 The new rule updates the probability of a translation candidate at a node using the candidates of its neighboring nodes, weighted by both the source-side edge similarity and the similarity between the candidates themselves. [sent-495, score-0.335]
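As a minimal sketch of an update of this kind (the paper's exact equation is lost in this extraction): a hypothetical `propagate_step` that re-scores each candidate of a node from the candidates of its neighbors, weighting by edge similarity and by candidate-level similarity. All names below are illustrative.

```python
def propagate_step(graph, label_probs, label_sim):
    """One structured label propagation step (illustrative sketch).

    graph:       dict mapping node -> list of (neighbor, edge_weight) pairs
    label_probs: dict mapping node -> {candidate: probability}
    label_sim:   function(cand_a, cand_b) -> similarity in [0, 1]
    """
    new_probs = {}
    for node, neighbors in graph.items():
        scores = {}
        for cand in label_probs[node]:
            # A candidate gains support from *similar* candidates of
            # similar source nodes, not only from identical ones.
            support = 0.0
            for nbr, weight in neighbors:
                for nbr_cand, prob in label_probs[nbr].items():
                    support += weight * label_sim(cand, nbr_cand) * prob
            scores[cand] = support
        total = sum(scores.values()) or 1.0  # renormalize to a distribution
        new_probs[node] = {c: s / total for c, s in scores.items()}
    return new_probs
```

The key departure from simple label propagation is the inner label_sim term: probability mass moves between different but similar labels, so the label sets of neighboring nodes need not coincide.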
39 4 Features and Training The last section sketched the structured label propagation algorithm. [sent-584, score-0.384]
40 Before elaborating the details of how the actual graph is constructed, we would like to first introduce how the graph-based translation consensus can be used in an MT system. [sent-585, score-1.179]
41 4.1 Graph-based Consensus Features The probability as estimated in equation (7) is taken as a group of new features in either a decoder or an n-best output re-ranker. [sent-587, score-0.289]
42 We call these features collectively graph-based consensus features (GC). [sent-588, score-0.814]
43 We measure the similarity between translation candidates by n-gram agreement for orders 1 through 4, thus leading to four features. [sent-745, score-0.335]
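A hedged sketch of such features follows; the n-gram agreement function and its clipping are illustrative choices, not the paper's exact definition:

```python
from collections import Counter

def ngram_overlap(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of hyp against ref (illustrative)."""
    hyp = Counter(tuple(hyp_tokens[i:i + n])
                  for i in range(len(hyp_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n])
                  for i in range(len(ref_tokens) - n + 1))
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / max(sum(hyp.values()), 1)

def gc_features(candidate, neighbor_candidates):
    """Four consensus features, one per n-gram order n = 1..4 (sketch)."""
    cand_tokens = candidate.split()
    features = []
    for n in range(1, 5):
        agreements = [ngram_overlap(cand_tokens, other.split(), n)
                      for other in neighbor_candidates]
        features.append(sum(agreements) / max(len(agreements), 1))
    return features
```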
44 4.2 Other Features In addition to graph-based consensus features, we also propose local consensus features, defined over the n-best translation candidates. [sent-856, score-1.892]
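A companion sketch: the local variant scores a candidate against the rest of its own n-best list, here reusing the hypothetical `gc_features` above.

```python
def local_consensus_features(candidate, nbest_list):
    """Local consensus (sketch): agreement of a candidate with the other
    candidates in the *same* n-best list, no graph involved."""
    others = [c for c in nbest_list if c != candidate]
    return gc_features(candidate, others)
```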
45 4.3 Training Method When graph-based consensus is applied to an MT system, the graph will have nodes for training data, development (dev) data, and test data (details in Section 5). [sent-883, score-1.097]
46 For each dev/test data node, the possible labels are the n-best translation candidates from the decoder. [sent-885, score-0.459]
47 Note that there is mutual dependence between the consensus graph and the decoder. [sent-886, score-0.878]
48 On the one hand, the MT decoder depends on the graph for the GC features. [sent-887, score-0.305]
49 On the other hand, the graph needs the decoder to provide the translation candidates as possible labels, and their posterior probabilities as initial values of the label confidences. [sent-888, score-0.754]
50 Therefore, we can alternately update the graph-based consensus features and the feature weights in the log-linear model. [sent-892, score-0.805]
51 The entire process starts with a decoder without consensus features. [sent-996, score-0.879]
52 The decoder with new feature weights then provides new n-best candidates and their posteriors for constructing another consensus graph, which in turn gives rise to the next round of propagation and tuning. [sent-1002, score-1.066]
53 This alternation of structured label propagation and MERT stops when the BLEU score on dev data converges, or a pre-set limit (10 rounds) is reached. [sent-1009, score-0.39]
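The alternation described in the last few sentences can be sketched as the following loop; every callable here is a placeholder for a component named in the text, not an actual API:

```python
def train_with_consensus(decode, build_graph, propagate, mert, evaluate,
                         init_weights, max_rounds=10, tol=1e-4):
    """Alternate consensus-graph construction and MERT (sketch).

    decode(weights)    -> n-best candidates with posterior probabilities
    build_graph(nbest) -> consensus graph over training/dev/test nodes
    propagate(graph)   -> graph-based consensus (GC) feature values
    mert(gc_feats)     -> retuned log-linear feature weights
    evaluate(weights)  -> BLEU on the dev data
    """
    weights, prev_bleu = init_weights, float("-inf")
    for _ in range(max_rounds):                  # pre-set limit of 10 rounds
        nbest = decode(weights)
        gc_feats = propagate(build_graph(nbest))
        weights = mert(gc_feats)
        bleu = evaluate(weights)
        if bleu <= prev_bleu + tol:              # dev BLEU has converged
            break
        prev_bleu = bleu
    return weights
```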
54 5 Graph Construction A technical detail is still needed to complete the description of graph-based consensus, namely, how the actual consensus graph is constructed. [sent-1010, score-0.878]
55 5.1 Graph Construction for Re-Ranking When graph-based consensus is used for reranking the n-best outputs of a decoder, each node in the graph corresponds to a complete sentence. [sent-1013, score-1.003]
56 A separate node is created for each source sentence in training data, dev data, and test data. [sent-1014, score-0.386]
57 If there are sentence pairs with the same source sentence but different translations, all the translations will be assigned as labels to that source sentence, and the corresponding probabilities are estimated by MLE. [sent-1019, score-0.506]
58 Each node from dev/test data (henceforth test node) is unlabeled, but it will be given an n-best list of translation candidates as possible labels from an MT decoder. [sent-1021, score-0.596]
59 The decoder also provides translation posteriors as the initial confidences of the labels. [sent-1022, score-0.448]
60 A test node can be connected to training nodes and other test nodes. [sent-1023, score-0.335]
61 If the source sentences of a test node and some other node are sufficiently similar, a similarity edge is created between them. [sent-1024, score-0.507]
62 In our experiment we measure similarity by symmetrical sentence-level BLEU of source sentences, and 0. [sent-1025, score-0.31]
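A sketch of this edge-creation test; `sentence_bleu` stands in for any sentence-level BLEU implementation, and the 0.3 threshold is a placeholder because the paper's exact value is truncated in this extraction:

```python
def symmetrical_bleu(src_a, src_b, sentence_bleu):
    """Symmetrize sentence-level BLEU by averaging both directions."""
    return 0.5 * (sentence_bleu(src_a, src_b) + sentence_bleu(src_b, src_a))

def maybe_add_edge(graph, node_a, node_b, sentence_bleu, threshold=0.3):
    """Connect two nodes if their source sides are similar enough (sketch)."""
    sim = symmetrical_bleu(node_a.source, node_b.source, sentence_bleu)
    if sim >= threshold:
        graph.setdefault(node_a, []).append((node_b, sim))
        graph.setdefault(node_b, []).append((node_a, sim))
```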
63 Each node is depicted as a rectangle, with the upper half showing the source sentence and the lower half showing the correct or possible labels. [sent-1028, score-0.296]
64 Training nodes are in grey while test nodes are in white. [sent-1029, score-0.282]
65 The edges between the nodes are weighted by the similarities between the corresponding source sentences. [sent-1030, score-0.274]
66 5.2 Graph Construction for Decoding Graph-based consensus can also be used in the decoding algorithm, by re-ranking the translation candidates of not only the entire source sentence but also every source span. [sent-1032, score-1.537]
67 Accordingly, the graph contains not only the nodes for source sentences but also the nodes for all source spans. [sent-1033, score-0.661]
68 It is not difficult to handle test nodes, since the purpose of an MT decoder is to enumerate the possible segmentations of a source sentence in dev/test data, search for the translation candidates of each source span, and calculate the probabilities of the candidates. [sent-1035, score-0.917]
69 Therefore, the cells in the search space of a decoder can be directly mapped as test nodes in the graph. [sent-1036, score-0.368]
70 Forced alignment performs phrase segmentation and alignment of each sentence pair of the training data using the full translation system as in decoding (Wuebker et al. [sent-1038, score-0.563]
71 In simpler terms, for each sentence pair in the training data, a decoder is applied to the source side, and all the translation candidates that do not match any substring of the target side are deleted. [sent-1040, score-0.806]
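A minimal sketch of the candidate filter just described, under the assumption that matching is done over token sequences:

```python
def prune_by_forced_alignment(span_candidates, target_tokens):
    """Keep only span translations that occur as contiguous substrings
    of the target side (sketch).

    span_candidates: dict mapping source span -> list of candidate strings
    target_tokens:   token list of the reference target sentence
    """
    n = len(target_tokens)
    substrings = {tuple(target_tokens[i:j])
                  for i in range(n) for j in range(i + 1, n + 1)}
    pruned = {}
    for span, candidates in span_candidates.items():
        kept = [c for c in candidates if tuple(c.split()) in substrings]
        if kept:  # forced alignment can fail: a span may keep no candidate
            pruned[span] = kept
    return pruned
```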
72 The cells in such a reduced search space of the decoder can be directly mapped as training nodes in the graph, just as in the case of test nodes. [sent-1041, score-0.407]
73 Note that, due to pruning in both decoding and translation model training, forced alignment may fail, i.e., a sentence pair may end up with no surviving candidates. [sent-1042, score-0.491]
74 Edges in dashed lines indicate the relation between a span and its sub-span, whereas edges in solid lines indicate source-side similarity. [sent-1048, score-0.311]
75 Note also that the shorter a source span is, the more likely it appears in more than one source sentence. [sent-1049, score-0.304]
76 All the translation candidates of the same source span in different source sentences are merged. [sent-1050, score-0.728]
77 Edge creation is the same as that in graph construction for n-best re-ranking, except that two nodes are always connected if they are about a span and its sub-span. [sent-1051, score-0.373]
78 There is one node for the training sentence "E A M N" and two nodes for the test sentences "E A B C" and "F D B C". [sent-1054, score-0.39]
79 As we see, the translation candidates for "M N" and "E A" are not sub-strings of the target sentence of "E A M N". [sent-1057, score-0.449]
80 Solid lines are edges connecting nodes with sufficient source-side n-gram similarity, such as the one between "E A M N" and "E A B C". [sent-1060, score-0.301]
81 6 Experiments and Results In this section, graph-based translation consensus is tested on Chinese-to-English translation tasks. [sent-1061, score-1.26]
82 Our baseline decoder is an in-house implementation of Bracketing Transduction Grammar (BTG) (Wu, 1997) in CKY-style decoding, with a lexical reordering model trained with maximum entropy (Xiong et al. [sent-1067, score-0.31]
83 The features we use are those commonly used in a standard BTG decoder, such as translation probabilities, lexical weights, a language model, a word penalty, and distortion probabilities. [sent-1069, score-0.355]
84 To perform consensus-based re-ranking, we first use the baseline decoder to get the n-best list for each sentence of the development and test data, and then we create the graph from the n-best lists and the training data as described in section 5. [sent-1090, score-0.484]
85 We use the baseline system to perform the forced alignment procedure on the training data, and create span nodes using the derivation trees of the forced alignment. [sent-1098, score-0.423]
86 In this way, we create the graph for decoding, perform semi-supervised training to calculate the graph-based consensus features, and tune the weights for all the features we use. [sent-1100, score-1.019]
87 Without the graph-based consensus features, our consensus-based re-ranking and decoding simplify into a consensus re-ranking and consensus decoding system, which only re-ranks the candidates according to the consensus information of the other candidates in the same n-best list. [sent-1104, score-3.416]
88 We perform the modified label propagation on the separate graphs to get the graph-based consensus for the n-best list of each sentence, and the graph-based consensus is recorded for MERT to tune the weights. [sent-1114, score-1.722]
89 Local consensus features (G-ReRank-LC and G-Decode-LC) improve the performance slightly. [sent-1116, score-0.77]
90 The combination of graph-based and local consensus features can improve the translation performance significantly on SMT re-ranking. [sent-1117, score-1.101]
91 With graph-based consensus features, G-Decode-GC achieves a significant performance gain, and combined with local consensus features, G-Decode's performance improves further. [sent-1118, score-1.48]
92 7 Conclusion and Future Work In this paper, we extend the consensus method by collecting consensus statistics, not only from translation candidates of the same source sentence/span, but also from those of similar ones. [sent-1119, score-1.958]
93 To calculate consensus statistics, we develop a novel structured label propagation method for structured learning problems, such as machine translation. [sent-1120, score-1.185]
94 Note that structured label propagation can also be applied to other structured learning tasks, such as POS tagging and syntactic parsing. [sent-1121, score-0.459]
95 The consensus statistics are integrated into the conventional log-linear model as features. [sent-1122, score-0.726]
96 In the future, we will explore other consensus features and other similarity measures, which may take document-level information, or syntactic and semantic information, into consideration. [sent-1126, score-0.838]
97 Mixture model-based minimum Bayes risk decoding using multiple machine translation systems. [sent-1147, score-0.494]
98 Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. [sent-1159, score-0.52]
99 Collaborative decoding: partial hypothesis re-ranking using translation consensus between decoders. [sent-1163, score-0.993]
100 Consistent translation using discriminative learning: a translation memory-inspired approach. [sent-1170, score-0.534]
wordName wordTfidf (topN-words)
[('consensus', 0.726), ('translation', 0.267), ('propagation', 0.169), ('decoder', 0.153), ('graph', 0.152), ('alexandrescu', 0.138), ('decoding', 0.132), ('iwslt', 0.129), ('candidates', 0.124), ('nodes', 0.123), ('source', 0.115), ('structured', 0.106), ('node', 0.101), ('nist', 0.1), ('mt', 0.091), ('propagating', 0.086), ('kirchhoff', 0.083), ('label', 0.078), ('span', 0.074), ('forced', 0.07), ('symmetrical', 0.069), ('kumar', 0.068), ('updating', 0.068), ('labels', 0.068), ('similarity', 0.068), ('translations', 0.064), ('reformulated', 0.063), ('spans', 0.061), ('sentence', 0.058), ('edge', 0.053), ('bleu', 0.051), ('shankar', 0.051), ('dice', 0.044), ('toy', 0.044), ('features', 0.044), ('minimum', 0.041), ('mert', 0.04), ('equation', 0.04), ('duan', 0.04), ('training', 0.039), ('dialog', 0.038), ('candidate', 0.038), ('rule', 0.037), ('dev', 0.037), ('graphbased', 0.036), ('test', 0.036), ('edges', 0.036), ('weights', 0.035), ('cells', 0.035), ('mu', 0.035), ('elaborating', 0.034), ('mbr', 0.034), ('sentences', 0.033), ('neighbors', 0.033), ('btg', 0.031), ('dash', 0.031), ('wuebker', 0.031), ('sketched', 0.031), ('risk', 0.031), ('probability', 0.031), ('posterior', 0.03), ('ming', 0.03), ('tromble', 0.029), ('instances', 0.028), ('probabilities', 0.028), ('solid', 0.028), ('posteriors', 0.028), ('dongdong', 0.028), ('local', 0.028), ('principle', 0.027), ('side', 0.027), ('franz', 0.026), ('harbin', 0.026), ('bc', 0.026), ('gc', 0.025), ('baseline', 0.025), ('reranking', 0.024), ('collaborative', 0.024), ('construction', 0.024), ('chinese', 0.024), ('wolfgang', 0.023), ('xiong', 0.023), ('henceforth', 0.023), ('li', 0.023), ('pages', 0.023), ('tune', 0.023), ('bayes', 0.023), ('pair', 0.023), ('correct', 0.022), ('alignment', 0.022), ('denero', 0.022), ('transduction', 0.022), ('defined', 0.021), ('constructed', 0.021), ('nan', 0.021), ('confidence', 0.021), ('hypotheses', 0.021), ('development', 0.021), ('search', 0.021), ('output', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: In this paper, we address the issue of learning better translation consensus in machine translation (MT) research, and explore the search for translation consensus from similar, rather than the same, source sentences or their spans. Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation for the task. We convert such graph-based translation consensus from similar source strings into useful features both for n-best output reranking and for the decoding algorithm. Experimental results show that our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
2 0.20695721 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.
3 0.20528562 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In the IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
4 0.18077265 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
Author: Hui Zhang ; David Chiang
Abstract: Syntax-based translation models that operate on the output of a source-language parser have been shown to perform better if allowed to choose from a set of possible parses. In this paper, we investigate whether this is because it allows the translation stage to overcome parser errors or to override the syntactic structure itself. We find that it is primarily the latter, but that under the right conditions, the translation stage does correct parser errors, improving parsing accuracy on the Chinese Treebank.
5 0.17756619 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
Author: Jingbo Zhu ; Tong Xiao ; Chunliang Zhang
Abstract: This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree translation.
6 0.16942728 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
7 0.15365882 140 acl-2012-Machine Translation without Words through Substring Alignment
8 0.15263245 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
9 0.13073106 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
10 0.12924854 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
11 0.1250037 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
12 0.11716302 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
13 0.11584621 134 acl-2012-Learning to Find Translations and Transliterations on the Web
14 0.11375494 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
15 0.11359232 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
16 0.11092253 66 acl-2012-DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation
17 0.10811664 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
18 0.10726315 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
19 0.10502527 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
20 0.10152374 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
topicId topicWeight
[(0, -0.275), (1, -0.239), (2, 0.095), (3, 0.058), (4, 0.065), (5, -0.074), (6, -0.003), (7, 0.015), (8, 0.009), (9, -0.006), (10, 0.014), (11, 0.0), (12, -0.024), (13, -0.036), (14, 0.032), (15, 0.009), (16, 0.015), (17, 0.09), (18, 0.098), (19, -0.092), (20, 0.025), (21, -0.029), (22, -0.0), (23, 0.042), (24, 0.018), (25, 0.023), (26, -0.029), (27, -0.118), (28, -0.089), (29, -0.018), (30, 0.017), (31, -0.041), (32, 0.061), (33, 0.061), (34, -0.037), (35, -0.013), (36, 0.027), (37, 0.117), (38, 0.003), (39, -0.026), (40, 0.001), (41, 0.002), (42, 0.035), (43, 0.094), (44, 0.133), (45, 0.017), (46, -0.081), (47, 0.005), (48, -0.05), (49, -0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.95126772 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: In this paper, we address the issue of learning better translation consensus in machine translation (MT) research, and explore the search for translation consensus from similar, rather than the same, source sentences or their spans. Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation for the task. We convert such graph-based translation consensus from similar source strings into useful features both for n-best output reranking and for the decoding algorithm. Experimental results show that our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
2 0.80480206 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.
3 0.73424959 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith
Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU.
4 0.7228533 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In the IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
5 0.7092005 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
Author: Seung-Wook Lee ; Dongdong Zhang ; Mu Li ; Ming Zhou ; Hae-Chang Rim
Abstract: In this paper, we propose a novel method of reducing the size of the translation model for hierarchical phrase-based machine translation systems. Previous approaches try to prune infrequent or unreliable entries based on statistics, but cause a problem of reduced translation coverage. In contrast, the proposed method tries to prune only ineffective entries, based on an estimate of the information redundancy encoded in phrase pairs and hierarchical rules, and thus preserves the search space of SMT decoders as much as possible. Experimental results on Chinese-to-English machine translation tasks show that our method is able to reduce the size of the translation model by almost half, with very little degradation of translation performance.
6 0.69530994 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
7 0.69498658 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
8 0.68192667 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
10 0.65050751 66 acl-2012-DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation
11 0.64286977 108 acl-2012-Hierarchical Chunk-to-String Translation
12 0.63821286 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
13 0.63090515 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
14 0.59601438 140 acl-2012-Machine Translation without Words through Substring Alignment
15 0.57040787 136 acl-2012-Learning to Translate with Multiple Objectives
16 0.56279504 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
17 0.55892724 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
18 0.55350411 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
19 0.54840076 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
20 0.54631549 42 acl-2012-Bootstrapping via Graph Propagation
topicId topicWeight
[(26, 0.029), (28, 0.08), (30, 0.047), (37, 0.043), (39, 0.041), (57, 0.029), (70, 0.138), (74, 0.074), (82, 0.035), (84, 0.012), (85, 0.047), (90, 0.204), (92, 0.047), (94, 0.029), (99, 0.049)]
simIndex simValue paperId paperTitle
1 0.92067784 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
Author: Sida Wang ; Christopher Manning
Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/ dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
same-paper 2 0.88515913 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: In this paper, we address the issue of learning better translation consensus in machine translation (MT) research, and explore the search for translation consensus from similar, rather than the same, source sentences or their spans. Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation for the task. We convert such graph-based translation consensus from similar source strings into useful features both for n-best output reranking and for the decoding algorithm. Experimental results show that our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
3 0.84919858 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
Author: Spence Green ; John DeNero
Abstract: When automatically translating from a weakly inflected source language like English to a target language with richer grammatical features such as gender and dual number, the output commonly contains morpho-syntactic agreement errors. To address this issue, we present a target-side, class-based agreement model. Agreement is promoted by scoring a sequence of fine-grained morpho-syntactic classes that are predicted during decoding for each translation hypothesis. For English-to-Arabic translation, our model yields a +1.04 BLEU average improvement over a state-of-the-art baseline. The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders.
4 0.84756088 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
Author: Boxing Chen ; Roland Kuhn ; Samuel Larkin
Abstract: Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT-tuned output 45.3% of the time (vs. 32.7% BLEU tuning preferences and 22.0% ties).
5 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
6 0.84440172 140 acl-2012-Machine Translation without Words through Substring Alignment
7 0.84360296 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
8 0.84218574 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
10 0.84073031 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
11 0.83634526 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
12 0.83464801 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
13 0.83324939 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
14 0.83266908 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
15 0.8301484 103 acl-2012-Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
16 0.82986689 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
17 0.82982808 136 acl-2012-Learning to Translate with Multiple Objectives
18 0.82832849 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
19 0.82771569 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
20 0.82584828 178 acl-2012-Sentence Simplification by Monolingual Machine Translation