acl acl2013 acl2013-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Typical statistical machine translation systems are batch trained on given training data, and their performance is largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time new data arrives. In the face of this problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammar for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and the computational overhead is significantly reduced by training them in parallel.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Typical statistical machine translation systems are batch trained on given training data, and their performance is largely influenced by the amount of data. [sent-10, score-0.331]
2 In the face of this problem, we propose an efficient phrase table combination method. [sent-12, score-0.371]
3 In particular, we train a Bayesian phrasal inversion transduction grammar for each domain separately. [sent-13, score-0.417]
4 The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. [sent-14, score-0.648]
5 The performance measured by BLEU is at least comparable to the traditional batch training method. [sent-15, score-0.178]
6 Furthermore, each phrase table is trained separately in each domain, and the computational overhead is significantly reduced by training them in parallel. [sent-16, score-0.377]
7 1 Introduction Statistical machine translation (SMT) systems usually achieve 'crowd-sourced' improvements with batch training. [sent-17, score-0.29]
8 Phrase pair extraction, the key step in discovering translation knowledge, relies heavily on the scale of the training data. [sent-18, score-0.252]
9 Typically, the more parallel corpora are used, the more phrase pairs and the more accurate parameters will be learned, which is obviously beneficial to translation performance. [sent-19, score-0.554]
10 Traditional domain adaptation methods for SMT are also not adequate in this scenario. [sent-25, score-0.204]
11 Most of them have been proposed to make translation systems perform better for resource-scarce domains when most training data comes from resource-rich domains, and they ignore performance on a more generic domain without domain bias (Wang et al. [sent-26, score-0.595]
12 Since SMT systems tend to employ very large-scale training data for translation knowledge extraction, updates of only several sentence pairs at a time are effectively annihilated by the existing corpus. [sent-29, score-0.2]
13 This paper proposes a new phrase table combination method. [sent-30, score-0.305]
14 First, phrase pairs are extracted from each domain without interfering with other domains. [sent-31, score-0.568]
15 In particular, we employ the nonparametric Bayesian phrasal inversion transduction grammar (ITG) of Neubig et al. [sent-32, score-0.255]
16 Thus, we can easily update the chain of phrase tables by appending the newly extracted phrase table and by treating the chain of the previous ones as its prior. [sent-35, score-0.7]
17 In Section 3, we briefly describe the translation model with phrasal ITGs and the Pitman-Yor process. [sent-41, score-0.275]
18 In Section 4, we explain our hierarchical combination approach, and we report experimental results in Section 5. [sent-42, score-0.251]
19 A number of approaches have been proposed to make use of the full potential of the available parallel sentences from various domains, such as domain adaptation and incremental learning for SMT. [sent-47, score-0.333]
20 In the case of the previous work on translation modeling, mixed methods have been investigated for domain adaptation in SMT by adding domain information as additional labels to the original phrase table (Foster and Kuhn, 2007). [sent-53, score-0.84]
21 Then all the phrase pairs and features are tuned together with different weights during decoding. [sent-55, score-0.351]
22 As a way to choose the right domain for domain adaptation, a classifier-based method and a feature-based method have been proposed. [sent-56, score-0.324]
23 Classification-based methods must at least add an explicit label to indicate which domain the current phrase pair comes from. [sent-57, score-0.565]
24 This is traditionally done with an automatic domain classifier, and each input sentence is classified into its corresponding domain (Xu et al. [sent-58, score-0.324]
25 (2012) employed a feature-based approach, in which phrase pairs are enriched by a feature set to potentially reflect the domain information. [sent-61, score-0.592]
26 Monolingual topic information is taken as a new feature for a domain adaptive translation model and tuned on the development set (Su et al. [sent-64, score-0.361]
27 Regardless of the underlying method, classifier-based or feature-based, the performance of current domain-adaptive phrase extraction methods is sensitive to the selection of the development set. [sent-66, score-0.614]
28 Compared to traditional frequent batch-oriented methods, an online EM algorithm and active learning have been applied to phrase pair extraction and achieve almost comparable translation performance with less computational overhead (Levenberg et al. [sent-69, score-0.869]
29 However, their methods usually require a number of hyperparameters, such as mini-batch size, step size, or human judgment to determine the quality of phrases, and still rely on a heuristic phrase extraction method in each phrase table update. [sent-72, score-0.721]
30 3 Phrase Pair Extraction with Unsupervised Phrasal ITGs Recently, phrase alignment with ITGs (Cherry and Lin, 2007; Zhang et al. [sent-73, score-0.365]
31 It can achieve comparable translation accuracy with a much smaller phrase table than the traditional GIZA++ and heuristic phrase extraction methods. [sent-78, score-0.917]
32 It has also been proven successful in adjusting the phrase length granularity by applying character-based SMT with more sophisticated inference (Neubig et al. [sent-79, score-0.305]
33 ITG is a synchronous grammar formalism which analyzes bilingual text by introducing inverted rules, and each ITG derivation corresponds to the alignment of a sentence pair (Wu, 1997). [sent-81, score-0.208]
34 The model is parameterized by a phrase pair distribution θt and a symbol distribution θx. [sent-87, score-0.403]
35 Here, d is the discount parameter, s is the strength parameter, and Pdac is a prior probability which acts as a fallback probability when a phrase pair is not in the model. [sent-92, score-0.643]
36 Under this model, the probability for a phrase pair found in a bilingual corpus ⟨E, F⟩ can be represented by the following equation using the Chinese restaurant process (Teh, 2006): [sent-93, score-0.65]
37 P(⟨ei, fi⟩) = (ci − d × ti) / (C + s) + ((s + d × T) / (C + s)) × Pdac(⟨ei, fi⟩)   (2), where ci and ti are the customer and table counts of the i-th phrase pair ⟨ei, fi⟩ found in the bilingual corpus ⟨E, F⟩, and C and T are the total customer and table counts. [sent-96, score-0.716]
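As a concrete reading of Equation (2), the sketch below computes the Pitman-Yor / Chinese restaurant process predictive probability of a phrase pair from its customer and table counts. This is only an illustration of the formula, not the authors' implementation; the function name and the toy numbers are ours, and the count bookkeeping performed during sampling is omitted.

```python
def py_phrase_prob(c_i, t_i, C, T, d, s, p_dac):
    """Pitman-Yor (Chinese restaurant process) predictive probability, as in Eq. (2).

    c_i, t_i : customer and table counts of this phrase pair
    C, T     : total customer and table counts in the restaurant
    d, s     : discount and strength hyperparameters (0 <= d < 1, s > -d)
    p_dac    : fallback (base measure) probability of the phrase pair
    """
    cache = max(c_i - d * t_i, 0.0) / (C + s)   # mass coming from the observed counts
    fallback = (s + d * T) / (C + s)            # mass reserved for the base measure
    return cache + fallback * p_dac

# Toy example: a pair seated 3 times at 1 table, 100 customers and 40 tables in total
print(py_phrase_prob(c_i=3, t_i=1, C=100, T=40, d=0.5, s=1.0, p_dac=1e-4))
```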
38 The prior probability Pdac is recursively defined by breaking a longer phrase pair into two through the recursive ITG’s generative story as follows (Neubig et al. [sent-99, score-0.493]
39 If x = Base, generate a new phrase directly from Pbase. [sent-105, score-0.305]
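The remaining branches of this recursive generative story (the regular and inverted concatenation cases) are truncated in the extraction above; the sketch below spells out our reading of it, following Neubig et al. (2011). The symbol names, the stub distributions, and the whitespace-joined concatenation are our own simplifications, not the paper's code.

```python
import random

def generate_phrase_pair(theta_x, sample_base, sample_from_model):
    """One step of the recursive ITG generative story behind Pdac (sketch).

    theta_x           : probabilities of the symbols BASE, REG (straight), INV (inverted)
    sample_base       : draws a minimal phrase pair (e, f) from Pbase
    sample_from_model : draws a phrase pair from the phrase distribution θt,
                        which itself falls back to this story as its prior
    """
    x = random.choices(["BASE", "REG", "INV"],
                       weights=[theta_x["BASE"], theta_x["REG"], theta_x["INV"]])[0]
    if x == "BASE":
        return sample_base()                     # generate a new phrase pair directly
    e1, f1 = sample_from_model()
    e2, f2 = sample_from_model()
    if x == "REG":
        return e1 + " " + e2, f1 + " " + f2      # straight concatenation
    return e1 + " " + e2, f2 + " " + f1          # inverted concatenation on the target side

# Toy usage with stub distributions (illustration only)
print(generate_phrase_pair({"BASE": 0.4, "REG": 0.3, "INV": 0.3},
                           sample_base=lambda: ("hello", "bonjour"),
                           sample_from_model=lambda: ("world", "monde")))
```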
40 Figure 1: A word alignment (a), and its hierarchical derivation (b). [sent-113, score-0.343]
41 Compared to GIZA++ with heuristic phrase extraction, the Bayesian phrasal ITG can achieve competitive accuracy with a much smaller phrase table. [sent-120, score-0.78]
42 Figure 1 (b) illustrates an example of the phrasal ITG derivation for the word alignment in Figure 1 (a), in which a bilingual sentence pair is recursively divided into two through the recursively defined generative story. [sent-122, score-0.427]
43 4 Hierarchical Phrase Table Combination We propose a new phrase table combination method, in which individually learned phrase tables are hierarchically chained through a hierarchical Pitman-Yor process. [sent-123, score-0.997]
44 Figure 2: A hierarchical phrase table combination (a), and a basic unit of a Chinese restaurant process with K tables and N customers. [sent-128, score-1.107]
45 In traditional domain adaptation approaches, phrase pairs are extracted together with their probabilities and/or frequencies so that the extracted phrase pairs are merged uniformly or after scaling. [sent-130, score-0.975]
46 In this work, we extract the table counts for each phrase pair under the Chinese restaurant process given in Section 3. [sent-131, score-0.513]
47 Our proposed hierarchical phrase table combination can be formally expressed as follows: θ1 ∼ PY(d1, s1, P2), · · · , θj ∼ PY(dj, sj, Pj+1), · · · , θJ ∼ PY(dJ, sJ, Pbase^J)   (3) [sent-134, score-0.556]
48 Here, the (j+1)-th layer Pitman-Yor process is employed as a base measure for the j-th layer hierarchical Pitman-Yor process. [sent-138, score-0.466]
49 The hierarchical chain is terminated by the base measure from the J-th domain, Pbase^J. [sent-139, score-0.398]
50 The hierarchical structure is illustrated in Figure 2 (a), in which the solid lines indicate a fallback using the table counts from the subsequent domains, and the dotted lines indicate the final fallback to the base measure Pbase^J. [sent-140, score-0.394]
51 When we query the probability of a phrase pair ⟨e, f⟩, we first query the probability of the first layer, P1(⟨e, f⟩). [sent-141, score-0.536]
52 For example, in Figure 2 (a), the i-th phrase pair ⟨ei, fi⟩ appears only in domain 1 and domain 2, so its translation probability can be calculated by substituting Equation (2) into Equation (3): (ci1 − d1 × ti1) / (C1 + s1) + ((s1 + d1 × T1) / (C1 + s1)) × (ci2 − d2 × ti2) / (C2 + s2) + · · ·   (4) [sent-148, score-0.989]
53 The first term in Equation (4) is the phrase probability from the first domain, and the second one comes from the second domain, but weighted by the fallback weight of the 1st domain. [sent-155, score-0.504]
54 Since ⟨ei, fi⟩ does not appear in the rest of the layers, the last term is taken from all the fallback weights from the second layer to the J-th layer, together with the final Pbase^J. [sent-156, score-0.481]
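To make Equations (3) and (4) concrete, here is a small sketch of how such a chained query could be evaluated: each layer answers from its own counts and passes its remaining fallback mass to the next layer, bottoming out at the final base measure. The class and function names are ours, the toy counts are invented, and the hyperparameters dj and sj are assumed to be already learned per domain.

```python
class DomainLayer:
    """One domain's phrase table with its Pitman-Yor hyperparameters (sketch)."""
    def __init__(self, counts, d, s):
        # counts maps (e, f) -> (customer_count, table_count)
        self.counts = counts
        self.C = sum(c for c, _ in counts.values())   # total customers in this domain
        self.T = sum(t for _, t in counts.values())   # total tables in this domain
        self.d, self.s = d, s

def hier_phrase_prob(pair, layers, p_base):
    """Hierarchical fallback query in the spirit of Eqs. (3)-(4): layer j falls back
    to layer j+1, and the last layer falls back to the base measure p_base."""
    prob, weight = 0.0, 1.0
    for layer in layers:
        c_i, t_i = layer.counts.get(pair, (0, 0))
        denom = layer.C + layer.s
        prob += weight * max(c_i - layer.d * t_i, 0.0) / denom
        weight *= (layer.s + layer.d * layer.T) / denom   # fallback weight of this layer
    return prob + weight * p_base(pair)

# Toy example: the pair occurs in domains 1 and 2 but not in domain 3
layers = [
    DomainLayer({("nihao", "hello"): (3, 1)}, d=0.5, s=1.0),
    DomainLayer({("nihao", "hello"): (1, 1)}, d=0.5, s=1.0),
    DomainLayer({("zaijian", "goodbye"): (2, 1)}, d=0.5, s=1.0),
]
print(hier_phrase_prob(("nihao", "hello"), layers, p_base=lambda pair: 1e-4))
```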
55 All the parameters θj and hyperparameters dj and sj are obtained by learning on the j-th domain. [sent-157, score-0.232]
56 Retuning the hyperparameters when cascading another domain may improve the performance of the combination weights, but we leave this for future work. [sent-158, score-0.267]
57 The hierarchical process can be viewed as an instance of adapted integration of translation knowledge from each sub-domain. [sent-159, score-0.427]
58 First, each phrase pair extraction can concentrate on a small portion of domain-specific data without interfering with other domains. [sent-161, score-0.52]
59 Since no tuning stage is involved in the hierarchical combination, we can easily include a new phrase table from a new domain by simply chaining them together. [sent-162, score-0.652]
60 Second, phrase pair extraction in each domain is completely independent, so it is easy to parallelize in a situation where the training data is too large to fit into a small amount of memory. [sent-163, score-0.932]
61 When we encounter a new domain, if a phrase pair is completely new to the model, it is simply appended to the current model and computed without the fallback probabilities, since otherwise the phrase pair would be boosted by the fallback probabilities. [sent-165, score-1.525]
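One way to read this incremental update is that the newly extracted domain simply becomes the head of the chain, with the existing domains acting as its prior. The sketch below reuses the hypothetical DomainLayer and hier_phrase_prob from the earlier sketch; the special handling of completely new phrase pairs described above is not modeled here.

```python
def add_new_domain(layers, new_counts, d, s):
    """Chain a newly extracted phrase table as the new top layer (sketch).
    The existing layers are untouched; they simply serve as the fallback prior."""
    return [DomainLayer(new_counts, d, s)] + layers

# The new domain's table is extracted independently and then chained on top
layers = add_new_domain(layers, {("xinwen", "news"): (5, 2)}, d=0.5, s=1.0)
print(hier_phrase_prob(("xinwen", "news"), layers, p_base=lambda pair: 1e-4))
```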
62 The Pitman-Yor process is also employed in n-gram language models, which are hierarchically represented through the hierarchical Pitman-Yor process with switch priors to integrate different domains at all levels (Wood and Teh, 2009). [sent-166, score-0.501]
63 Our work incrementally combines the models from different domains by directly employing the hierarchical process through the base measures. [sent-167, score-0.481]
64 In order to evaluate our approach, four phrase pair extraction methods are performed: 1. [sent-176, score-0.465]
65 GIZA-linear: Phrase pairs are extracted in each domain by GIZA++ (Och and Ney, 2003) and the "grow-diag-final-and" method with a maximum length of 7. [sent-177, score-0.208]
66 The phrase tables from the various domains are linearly combined by averaging the feature values. [sent-178, score-0.512]
67 Pialign-linear: Similar to GIZA-linear, but we employed the phrasal ITG method described in Section 3 using the pialign toolkit (Neubig et al. [sent-180, score-0.282]
68 Table 2: BLEU scores and phrase table size by alignment method and probabilities. Pialign was run with five samples. [sent-185, score-0.419]
69 Extracted phrase pairs are linearly combined by averaging the feature values. [sent-190, score-0.351]
70 GIZA-batch: Instead of splitting into each domain, the data set is merged into a single corpus, and then a heuristic GIZA-based phrase extraction is performed, similarly to GIZA-linear. [sent-192, score-0.354]
71 Pialign-adaptive: Alignment and phrase pair extraction are the same as in Pialign-batch, while translation probabilities are estimated by the adaptive method with monolingual topic information (Su et al. [sent-197, score-0.705]
72 The method established the relationship between the out-of-domain bilingual corpus and in-domain monolingual corpora via topic distributions to estimate the translation probability. [sent-199, score-0.243]
73 It extracts phrase pairs in the same way as Pialign-linear. [sent-202, score-0.351]
74 In the phrase table combination process, the translation probability of each phrase pair is estimated by Hier-combin, and the other features are also linearly combined by averaging the feature values. [sent-203, score-0.969]
75 5.1 Performances of various extraction methods We carry out a series of experiments to evaluate translation performance. [sent-218, score-0.216]
76 Except for the translation probabilities, the phrase pairs of the two methods are exactly the same, so the numbers of phrase pairs are equal in the two methods. [sent-221, score-0.856]
77 Table 3: Minutes used for alignment and phrase pair extraction in the FBIS data set. [sent-224, score-0.274]
78 The adaptive method with monolingual topic information is useful in the tasks, but our approach with the hierarchical Pitman-Yor process can estimate more accurate translation probabilities based on all the data from various domains. [sent-225, score-0.478]
79 Compared with the GIZA-batch, our approach achieves competitive performance with a much smaller phrase table. [sent-226, score-0.305]
80 In the framework we proposed, phrase pairs are extracted from each domain completely independently of one another, so those tasks can be executed on different machines, at different times, and of course in parallel, when we assume that the domains are not incrementally added to the training data. [sent-235, score-0.761]
81 Even the performance of Pialign-linear is better than that of the baseline GIZA-linear, which means that phrase pair extraction with hierarchical phrasal ITGs and sampling is more suitable for domain adaptation tasks than the combination of GIZA++ and a heuristic method. [sent-239, score-1.145]
82 Generally, the hierarchical combination method exploits the nature of a hierarchical Pitman-Yor process and gains the advantage of its smoothing effect, and our approach can incrementally generate a succinct phrase table based on all the data from various domains with more accurate probabilities. [sent-240, score-1.028]
83 Traditional SMT phrase pair extraction is batch-based, while our method has no obvious shortcomings in translation accuracy, not to mention efficiency. [sent-241, score-0.619]
84 5.2 Effect of Integration Order Here, we evaluate whether our hierarchical combination is sensitive to the order of the domains when forming a hierarchical structure. [sent-244, score-0.553]
85 Through Equation (3), in our experiments, we chained the domains in the order listed in Table 1, which is in almost chronological order. [sent-245, score-0.185]
86 Table 4 shows the BLEU scores for the three data sets, in which the order of combining the phrase tables from each domain is alternated between ascending and descending order of similarity to the test data. [sent-246, score-0.557]
87 The result may indicate that our hierarchical phrase combination method is sensitive to the integration order when the training data is small and there is a large gap in similarity. [sent-253, score-0.598]
88 However, if most domains are similar (FBIS data set) or if there are enough parallel sentence pairs (NIST data set) in each domain, then the translation performances are almost the same even with the opposite integration orders. [sent-254, score-0.366]
89 Table 4: BLEU scores for the hierarchical model with different integration orders. [sent-259, score-0.185]
90 6 Conclusion and Future Work In this paper, we present a novel hierarchical phrase table combination method for SMT, which can exploit more of the potential of all the data coming from various fields and generate a succinct phrase table with more accurate translation probabilities. [sent-261, score-1.015]
91 The method assumes that a combined model is derived from a hierarchical Pitman-Yor process with each prior learned separately in each domain, and it achieves BLEU scores competitive with traditional batch-based ones. [sent-262, score-0.273]
92 Meanwhile, the framework has natural characteristics for parallel and incremental phrase pair extraction. [sent-263, score-0.517]
93 In future work, we will also introduce incremental learning for phrase pair extraction inside a domain, which means using the current translation probabilities already obtained as the base measure of sampling parameters for the upcoming domain. [sent-265, score-0.578]
94 Furthermore, we will investigate any tradeoffs between the accuracy of the probability estimation and the coverage of phrase pairs. [sent-266, score-0.346]
95 Improving statistical machine translation performance by training data selection and optimization. [sent-351, score-0.195]
96 An unsupervised model for joint phrase alignment and extraction. [sent-356, score-0.365]
97 Translation model adaptation for statistical machine translation with monolingual topic information. [sent-394, score-0.291]
98 A hierarchical Bayesian language model based on Pitman-Yor processes. [sent-398, score-0.239]
99 A hierarchical nonparametric Bayesian approach to statistical language model domain adaptation. [sent-410, score-0.388]
100 Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. [sent-414, score-0.233]
wordName wordTfidf (topN-words)
[('phrase', 0.305), ('fii', 0.221), ('hei', 0.22), ('hierarchical', 0.185), ('levenberg', 0.171), ('pdac', 0.166), ('domain', 0.162), ('itg', 0.162), ('fallback', 0.158), ('translation', 0.154), ('fi', 0.147), ('itgs', 0.146), ('sj', 0.14), ('neubig', 0.14), ('batch', 0.136), ('pialign', 0.122), ('phrasal', 0.121), ('domains', 0.117), ('fbis', 0.116), ('pbjase', 0.11), ('dj', 0.099), ('pair', 0.098), ('btec', 0.098), ('jth', 0.094), ('tables', 0.09), ('bleu', 0.084), ('incrementally', 0.082), ('abby', 0.073), ('py', 0.072), ('overhead', 0.072), ('hit', 0.069), ('hierarchically', 0.068), ('transduction', 0.068), ('tij', 0.068), ('chained', 0.068), ('smt', 0.067), ('inversion', 0.066), ('combination', 0.066), ('incremental', 0.065), ('restaurant', 0.064), ('extraction', 0.062), ('cherry', 0.06), ('alignment', 0.06), ('blunsom', 0.059), ('tj', 0.059), ('och', 0.059), ('cj', 0.058), ('adaptation', 0.057), ('cji', 0.055), ('fji', 0.055), ('hej', 0.055), ('interfering', 0.055), ('wood', 0.055), ('xtf', 0.055), ('miles', 0.055), ('phase', 0.054), ('bayesian', 0.054), ('probabilities', 0.054), ('taro', 0.052), ('ldc', 0.052), ('layer', 0.051), ('base', 0.051), ('bilingual', 0.05), ('parallel', 0.049), ('recursively', 0.049), ('heuristic', 0.049), ('inv', 0.049), ('pitman', 0.049), ('pairs', 0.046), ('process', 0.046), ('association', 0.046), ('equation', 0.046), ('denero', 0.045), ('adaptive', 0.045), ('reg', 0.045), ('gonz', 0.045), ('franz', 0.044), ('succinct', 0.042), ('cij', 0.042), ('adaption', 0.042), ('integration', 0.042), ('customer', 0.042), ('ohio', 0.042), ('traditional', 0.042), ('probability', 0.041), ('statistical', 0.041), ('giza', 0.041), ('mori', 0.04), ('shinsuke', 0.04), ('tatsuya', 0.04), ('divergent', 0.04), ('featurebased', 0.04), ('macherey', 0.04), ('tf', 0.04), ('sampling', 0.04), ('wi', 0.04), ('employed', 0.039), ('monolingual', 0.039), ('koehn', 0.039), ('hyperparameters', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Typical statistical machine translation systems are batch trained on given training data, and their performance is largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time new data arrives. In the face of this problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammar for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and the computational overhead is significantly reduced by training them in parallel.
2 0.34259322 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
Author: Trevor Cohn ; Gholamreza Haffari
Abstract: Modern phrase-based machine translation systems make extensive use of wordbased translation models for inducing alignments from parallel corpora. This is problematic, as the systems are incapable of accurately modelling many translation phenomena that do not decompose into word-for-word translation. This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. Overall this leads to a model which learns translations of entire sentences, while also learning their decomposition into smaller units (phrase-pairs) recursively, terminating at word translations. Our experiments on Arabic, Urdu and Farsi to English demonstrate improvements over competitive baseline systems.
3 0.28807113 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Author: Jiajun Zhang ; Chengqing Zong
Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1
4 0.21844622 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa
Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also de- scribe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1BLEU over unadapted systems and single-domain adaptation.
5 0.19745395 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
Author: Lei Cui ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter lowquality data, but a fair amount of human labeled examples are needed which are not easy to obtain. To reduce the reliance on labeled examples, we propose an unsupervised method to clean bilingual data. The method leverages the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. End-to-end experiments show that the proposed method substantially improves the performance in largescale Chinese-to-English translation tasks.
6 0.16781868 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language
7 0.16329937 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
8 0.15837462 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
9 0.14785028 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
10 0.14160648 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
11 0.13457392 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
12 0.13416544 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
13 0.12756996 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
14 0.12753811 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
15 0.12354423 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
16 0.12138887 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
17 0.12124144 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
18 0.11462605 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
19 0.11427609 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
20 0.11258599 240 acl-2013-Microblogs as Parallel Corpora
topicId topicWeight
[(0, 0.292), (1, -0.179), (2, 0.23), (3, 0.128), (4, 0.017), (5, 0.037), (6, 0.002), (7, 0.017), (8, -0.049), (9, 0.017), (10, 0.02), (11, 0.013), (12, 0.016), (13, 0.032), (14, 0.023), (15, -0.024), (16, -0.035), (17, -0.005), (18, -0.052), (19, 0.001), (20, 0.026), (21, -0.058), (22, 0.06), (23, 0.064), (24, 0.054), (25, -0.013), (26, 0.015), (27, 0.018), (28, 0.111), (29, 0.043), (30, 0.102), (31, 0.104), (32, -0.05), (33, -0.049), (34, 0.057), (35, 0.099), (36, 0.085), (37, -0.107), (38, -0.028), (39, -0.045), (40, 0.092), (41, 0.03), (42, -0.072), (43, 0.026), (44, 0.034), (45, -0.008), (46, 0.002), (47, -0.015), (48, -0.008), (49, 0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.96591359 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Typical statistical machine translation systems are batch trained on given training data, and their performance is largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time new data arrives. In the face of this problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammar for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and the computational overhead is significantly reduced by training them in parallel.
2 0.86326051 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
Author: Rico Sennrich ; Holger Schwenk ; Walid Aransa
Abstract: While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also de- scribe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1BLEU over unadapted systems and single-domain adaptation.
3 0.84211177 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
Author: Boxing Chen ; Roland Kuhn ; George Foster
Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a de- coding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.
4 0.83025223 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Author: Jiajun Zhang ; Chengqing Zong
Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1
5 0.81524658 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
Author: Lei Cui ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter lowquality data, but a fair amount of human labeled examples are needed which are not easy to obtain. To reduce the reliance on labeled examples, we propose an unsupervised method to clean bilingual data. The method leverages the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. End-to-end experiments show that the proposed method substantially improves the performance in largescale Chinese-to-English translation tasks.
6 0.80950177 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
7 0.79081237 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
8 0.77171677 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language
9 0.75772214 328 acl-2013-Stacking for Statistical Machine Translation
10 0.74209398 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
11 0.73995835 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
12 0.72737139 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
13 0.72648907 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
14 0.71949488 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation
15 0.70978707 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
16 0.69250357 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
17 0.68615288 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
18 0.68508607 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
19 0.67428678 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
20 0.64865386 255 acl-2013-Name-aware Machine Translation
topicId topicWeight
[(0, 0.078), (6, 0.035), (11, 0.06), (15, 0.019), (24, 0.032), (26, 0.046), (35, 0.085), (42, 0.096), (48, 0.043), (70, 0.053), (77, 0.21), (88, 0.028), (90, 0.053), (95, 0.09)]
simIndex simValue paperId paperTitle
1 0.8896448 243 acl-2013-Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation
Author: Aobo Wang ; Min-Yen Kan
Abstract: We address the problem of informal word recognition in Chinese microblogs. A key problem is the lack of word delimiters in Chinese. We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly. Our joint inference method significantly outperforms baseline systems that conduct the tasks individually or sequentially.
same-paper 2 0.81914771 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
Author: Conghui Zhu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Typical statistical machine translation systems are batch trained on given training data, and their performance is largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time new data arrives. In the face of this problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammar for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and the computational overhead is significantly reduced by training them in parallel.
3 0.7360543 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
4 0.72946 255 acl-2013-Name-aware Machine Translation
Author: Haibo Li ; Jing Zheng ; Heng Ji ; Qi Li ; Wen Wang
Abstract: We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding name phrase table and name translation driven decoding. Additionally, we also propose a new MT metric to appropriately evaluate the translation quality of informative words, by assigning different weights to different words according to their importance values in a document. Experiments on Chinese-English translation demonstrated the effectiveness of our approach on enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline1 .
5 0.72475493 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
Author: Trevor Cohn ; Gholamreza Haffari
Abstract: Modern phrase-based machine translation systems make extensive use of wordbased translation models for inducing alignments from parallel corpora. This is problematic, as the systems are incapable of accurately modelling many translation phenomena that do not decompose into word-for-word translation. This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. Overall this leads to a model which learns translations of entire sentences, while also learning their decomposition into smaller units (phrase-pairs) recursively, terminating at word translations. Our experiments on Arabic, Urdu and Farsi to English demonstrate improvements over competitive baseline systems.
6 0.69716078 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
7 0.69550216 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
8 0.69372374 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
9 0.69120675 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
10 0.690651 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
11 0.6895479 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
12 0.68601936 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
13 0.68589592 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
14 0.68482262 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
15 0.68238413 250 acl-2013-Models of Translation Competitions
16 0.68154383 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
17 0.68116045 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
18 0.6805023 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
19 0.68004608 312 acl-2013-Semantic Parsing as Machine Translation
20 0.6792987 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction