emnlp emnlp2010 emnlp2010-39 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: George Foster
Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 ca Abstract We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. [sent-4, score-0.583]
2 This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. [sent-5, score-0.172]
3 We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines. [sent-6, score-0.209]
4 1 Introduction Domain adaptation is a common concern when optimizing empirical NLP applications. [sent-7, score-0.258]
5 Even when there is training data available in the domain of interest, there is often additional data from other domains that could in principle be used to improve performance. [sent-8, score-0.138]
6 Realizing gains in practice can be challenging, however, particularly when the target domain is distant from the background data. [sent-9, score-0.147]
7 For developers of Statistical Machine Translation (SMT) systems, an additional complication is the heterogeneous nature of SMT components (word-alignment model, language model, translation model, etc.). [sent-10, score-0.104]
8 In this paper we study the problem of using a parallel corpus from a background domain (OUT) to improve performance on a target domain (IN) for which a smaller amount of parallel training material—though adequate for reasonable performance—is also available. [sent-12, score-0.351]
9 This highly effective approach is not directly applicable to the multinomial models used for core SMT components, which have no natural method for combining split features, so we rely on an instance-weighting approach (Jiang and Zhai, 2007) to downweight domain-specific examples in OUT. [sent-24, score-0.111]
10 Within this framework, we use features intended to capture degree of generality, including the output from an SVM classifier that uses the intersection between IN and OUT as positive examples. [sent-25, score-0.202]
11 Phrase-level granularity distinguishes our work from previous work by Matsoukas et al (2009), who weight sentences according to sub-corpus and genre membership. [sent-32, score-0.174]
12 We train linear mixture models for conditional phrase pair probabilities over IN and OUT so as to maximize the likelihood of an empirical joint phrase-pair distribution extracted from a development set. [sent-34, score-0.349]
13 This is a simple and effective alternative to setting weights discriminatively to maximize a metric such as BLEU. [sent-35, score-0.143]
14 , 2007), we select sentences from OUT using language model perplexities from IN. [sent-38, score-0.093]
15 This is a straightforward technique that is arguably better suited to the adaptation task than the standard method of treating representative IN sentences as queries, then pooling the match results. [sent-39, score-0.258]
16 2 Baseline SMT Adaptation Techniques Standard SMT systems have a hierarchical parameter structure: top-level log-linear weights are used to combine a small set of complex features, interpreted as log probabilities, many of which have their own internal parameters and objectives. [sent-44, score-0.147]
17 The top-level weights are trained to maximize a metric such as BLEU on a small development set of approximately 1000 sentence pairs. [sent-45, score-0.098]
18 Thus, provided at least this amount of IN data is available—as it is in our setting—adapting these weights is straightforward. [sent-46, score-0.098]
19 We do not adapt the alignment procedure for generating the phrase table from which the TM distributions are derived. [sent-48, score-0.121]
20 When OUT is large and distinct, its contribution can be controlled by training separate IN and OUT models, and weighting their combination. [sent-52, score-0.172]
21 An easy way to achieve this is to put the domain-specific LMs and TMs into the top-level log-linear model and learn optimal weights with MERT (Och, 2003). [sent-53, score-0.098]
22 This is appropriate in cases where it is sanctioned by Bayes’ law, such as multiplying LM and TM probabilities, but for adaptation a more suitable framework is often a mixture model in which each event may be generated from some domain. [sent-57, score-0.334]
23 This leads to a linear combination of domain-specific probabilities, with weights in [0, 1], normalized to sum to 1. [sent-58, score-0.228]
24 Linear weights are difficult to incorporate into the standard MERT procedure because they are “hidden” within a top-level probability that represents the linear combination. [sent-59, score-0.163]
25 1 Following previous work (Foster and Kuhn, 2007), we circumvent this problem by choosing weights to optimize corpus log-likelihood, which is roughly speaking the training criterion used by the LM and TM themselves. [sent-60, score-0.098]
26 For the LM, adaptive weights are set as follows: [sent-62, score-0.098]
27 $\hat{\alpha} = \arg\max_{\alpha} \sum_{w,h} \tilde{p}(w,h) \log \sum_i \alpha_i p_i(w|h)$, (1) where α is a weight vector containing an element αi for each domain (just IN and OUT in our case), pi are the corresponding domain-specific models, and p̃(w, h) is an empirical distribution from a target-language training corpus—we used the IN dev set for this. [sent-64, score-0.347]
28 This has led previous workers to adopt ad hoc linear weighting schemes (Finch and Sumita, 2008; Foster and Kuhn, 2007; Lü et al. [sent-66, score-0.237]
29 However, we note that the final conditional estimates p(s|t) from a given phrase table maximize the likelihood of joint empirical phrase pair counts over a word-aligned corpus. [sent-68, score-0.376]
30 The TM weights are set analogously: $\hat{\alpha} = \arg\max_{\alpha} \sum_{s,t} \tilde{p}(s,t) \log \sum_i \alpha_i p_i(s|t)$, (2) where p̃(s, t) is a joint empirical distribution extracted from the IN dev set using the standard procedure. [sent-71, score-0.163]
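The maximizations in (1) and (2) have the familiar form of fitting mixture weights to fixed component models, which can be done with a few EM iterations over the dev-set empirical distribution. The sketch below is an editor's illustration of that idea, not the authors' code; the function name, array layout, and iteration count are assumptions, and the same routine applies to both the LM case (events are (w, h)) and the TM case (events are (s, t)).

```python
import numpy as np

def em_mixture_weights(comp_probs, emp_counts, iters=50):
    """Fit linear mixture weights alpha maximizing
    sum_x p~(x) * log( sum_i alpha_i * p_i(x) )  over the probability simplex.

    comp_probs: (n_events, n_domains) domain-specific probabilities p_i(x)
                for every event x in the dev-set empirical distribution.
    emp_counts: (n_events,) empirical counts defining p~(x).
    """
    n_events, n_domains = comp_probs.shape
    alpha = np.full(n_domains, 1.0 / n_domains)      # start from uniform weights
    for _ in range(iters):
        # E-step: posterior responsibility of each domain for each event
        weighted = comp_probs * alpha
        post = weighted / np.maximum(weighted.sum(axis=1, keepdims=True), 1e-12)
        # M-step: expected counts per domain, renormalized to sum to 1
        alpha = (post * emp_counts[:, None]).sum(axis=0)
        alpha /= alpha.sum()
    return alpha
```

With just IN and OUT as components, this returns two weights in [0, 1] that sum to 1, i.e. exactly the kind of linear combination described above.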
31 2 An alternative form of linear combination is a maximum a posteriori (MAP) combination (Bacchiani et al. [sent-72, score-0.195]
32 For the TM, this is: $p(s|t) = \frac{c_I(s,t) + \beta\, p_o(s|t)}{c_I(t) + \beta}$, (3) where cI(s, t) is the count in the IN phrase table of pair (s, t), po(s|t) is its probability under the OUT TM, and $c_I(t) = \sum_{s'} c_I(s',t)$. [sent-74, score-0.163]
33 This is motivated by taking β po(s|t) to be the parameters of a Dirichlet prior on phrase probabilities, then maximizing posterior estimates p(s|t) given the IN corpus. [sent-77, score-0.213]
34 To set β, we used the same criterion as for α, over a dev corpus: $\hat{\beta} = \arg\max_{\beta} \sum_{s,t} \tilde{p}(s,t) \log p(s|t;\beta)$. [sent-80, score-0.163]
35 The MAP combination was used for TM probabilities only, in part due to a technical difficulty in formulating coherent counts when using standard LM smoothing techniques (Kneser and Ney, 1995). [sent-83, score-0.148]
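To make the MAP combination concrete, here is a small sketch of (3) together with a simple search for β over candidate values by dev-set log-likelihood; it is an editor's illustration under the assumption that IN counts, IN target marginals, and OUT conditional probabilities are stored in dictionaries keyed by phrase pairs (the paper optimizes β by the same likelihood criterion, but does not prescribe this particular search).

```python
import math

def map_tm_prob(s, t, c_in, c_in_t, p_out, beta):
    """MAP-smoothed conditional: (c_I(s,t) + beta * p_o(s|t)) / (c_I(t) + beta)."""
    return (c_in.get((s, t), 0.0) + beta * p_out.get((s, t), 0.0)) / \
           (c_in_t.get(t, 0.0) + beta)

def tune_beta(dev_joint, c_in, c_in_t, p_out, candidates=(1, 5, 10, 50, 100)):
    """Choose beta maximizing sum_{s,t} p~(s,t) * log p(s|t; beta) on the dev set."""
    best_beta, best_ll = None, float("-inf")
    for beta in candidates:
        ll = sum(cnt * math.log(max(map_tm_prob(s, t, c_in, c_in_t, p_out, beta), 1e-12))
                 for (s, t), cnt in dev_joint.items())
        if ll > best_ll:
            best_beta, best_ll = beta, ll
    return best_beta
```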
36 3 Sentence Selection Motivated by information retrieval, a number of approaches choose “relevant” sentence pairs from OUT by matching individual source sentences from IN (Hildebrand et al. [sent-85, score-0.106]
37 To approximate these baselines, we implemented a very simple sentence selection algorithm in which parallel sentence pairs from OUT are ranked by the perplexity of their target half according to the IN language model. [sent-91, score-0.159]
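A minimal sketch of this selection baseline follows; it assumes an IN language model exposed as a function returning a sentence's total log10 probability (a KenLM-style interface), which is our assumption rather than a detail given in the paper.

```python
def rank_out_by_in_perplexity(out_pairs, in_lm_log10prob):
    """Rank OUT sentence pairs by the perplexity of their target half under
    the IN language model; lower perplexity means more relevant to IN.

    out_pairs: list of (source_sentence, target_sentence) strings.
    in_lm_log10prob: maps a target-side sentence to its total log10 probability.
    """
    def perplexity(sentence):
        n_tokens = len(sentence.split()) + 1   # +1 for the end-of-sentence token
        return 10.0 ** (-in_lm_log10prob(sentence) / n_tokens)

    return sorted(out_pairs, key=lambda pair: perplexity(pair[1]))

# The top-N ranked pairs (N tuned on the dev set) are then added
# to the training data.
```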
38 Matsoukas et al (2009) generalize it by learning weights on sentence pairs that are used when estimating relative-frequency phrase-pair probabilities. [sent-94, score-0.243]
39 We extend the Matsoukas et al approach in several ways. [sent-96, score-0.088]
40 First, we learn weights on individual phrase pairs rather than sentences. [sent-97, score-0.276]
41 Intuitively, as suggested by the example in the introduction, this is the right granularity to capture domain effects. [sent-98, score-0.143]
42 Second, rather than relying on a division of the corpus into manually-assigned portions, we use features intended to capture the usefulness of each phrase pair. [sent-99, score-0.219]
43 Finally, we incorporate the instance-weighting model into a general linear combination, and learn weights and mixing parameters simultaneously. [sent-100, score-0.22]
44 3 Bacchiani et al. (2004) solve this problem by reconstituting joint counts from smoothed conditional estimates and unsmoothed marginals, but this seems somewhat unsatisfactory. [sent-101, score-0.18]
45 This combination generalizes (2) and (3): we use either αt = α to obtain a fixed-weight linear combination, or αt = cI(t)/(cI(t) + β) to obtain a MAP combination. [sent-104, score-0.13]
46 The original OUT counts co(s, t) are weighted by a logistic function wλ(s, t): $c_\lambda(s,t) = c_o(s,t)\, w_\lambda(s,t) = \frac{c_o(s,t)}{1 + \exp\left(-\sum_i \lambda_i f_i(s,t)\right)}$, (6) [sent-108, score-0.092]
47 where each fi(s, t) is a feature intended to characterize the usefulness of (s, t), weighted by λi. [sent-110, score-0.137]
48 The mixing parameters and feature weights (collectively φ) are optimized simultaneously using devset maximum likelihood as before: $\hat{\varphi} = \arg\max_{\varphi} \sum_{s,t} \tilde{p}(s,t) \log p(s|t; \varphi)$. (7) [sent-111, score-0.155]
49 Dropping the conditioning on φ for brevity, and letting $\bar{c}_\lambda(s,t) = c_\lambda(s,t) + \gamma\, u(s|t)$, and $\bar{c}_\lambda(t) = \sum_s \bar{c}_\lambda(s,t)$ … 4 Note that the probabilities in (7) need only be evaluated over the support of p̃(s, t), which is quite small when this distribution is derived from a dev set. [sent-116, score-0.246]
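The pieces in (6) and (7) can be put together in a short sketch: OUT counts are scaled by the logistic instance weights, turned into conditional estimates, combined with the IN estimates using the MAP-style mixing weight αt = cI(t)/(cI(t) + β), and the whole parameter set is fit by maximizing dev-set likelihood with a generic optimizer. This is an editor's simplified rendering, not the authors' implementation: it assumes dense feature vectors, a single conditioning direction, no γ-smoothing, and scipy's general-purpose minimizer in place of whatever procedure the paper actually uses.

```python
import numpy as np
from scipy.optimize import minimize

def combined_conditional(lam, beta, feats, c_out, c_in, targets):
    """p(s|t) = alpha_t * p_I(s|t) + (1 - alpha_t) * p_{o,lambda}(s|t), where the
    OUT relative frequencies come from logistically weighted counts as in (6)
    and alpha_t = c_I(t) / (c_I(t) + beta)."""
    w = 1.0 / (1.0 + np.exp(-feats @ lam))          # instance weights in (0, 1)
    c_lam = c_out * w                               # weighted OUT counts
    p = np.zeros_like(c_in, dtype=float)
    for t in np.unique(targets):
        m = targets == t
        c_in_t, c_lam_t = c_in[m].sum(), c_lam[m].sum()
        p_in = c_in[m] / max(c_in_t, 1e-12)
        p_out = c_lam[m] / max(c_lam_t, 1e-12)
        alpha_t = c_in_t / (c_in_t + beta)
        p[m] = alpha_t * p_in + (1.0 - alpha_t) * p_out
    return p

def neg_dev_loglik(params, feats, c_out, c_in, targets, dev_counts):
    """Negative of the dev-set likelihood in (7), optimized over lambda and beta."""
    lam, beta = params[:-1], np.exp(params[-1])     # exp keeps beta positive
    p = combined_conditional(lam, beta, feats, c_out, c_in, targets)
    return -np.sum(dev_counts * np.log(np.maximum(p, 1e-12)))

# phi_hat = minimize(neg_dev_loglik, np.zeros(n_features + 1),
#                    args=(feats, c_out, c_in, targets, dev_counts)).x
```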
50 2 Interpretation and Variants To motivate weighting joint OUT counts as in (6), we begin with the “ideal” objective for setting multinomial phrase probabilities θ = {p(s|t), ∀s,t}, which is the likelihood with respect to the true IN distribution p̂I(s, t). [sent-131, score-0.434]
51 This is not unreasonable given the application to phrase pairs from OUT, but it suggests that an interesting alternative might be to use a plain log-linear weighting function $\exp\left(\sum_i \lambda_i f_i(s,t)\right)$, with outputs in [0, ∞]. [sent-143, score-0.35]
52 A final alternate approach would be to combine weighted joint frequencies rather than conditional estimates, i.e. cI(s, t) + wλ(s, t) co(s, t), suitably normalized. [sent-148, score-0.145]
53 3 Simple Features We used 22 features for the logistic weighting model, divided into two groups: one intended to reflect the degree to which a phrase pair belongs to general language, and one intended to capture similarity to the IN domain. [sent-153, score-0.628]
54 6 One of our experimental settings lacks document boundaries, and we used this approximation in both settings for consistency. [sent-156, score-0.106]
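The excerpt does not enumerate the 22 features, so the sketch below only illustrates the kind of quantities the two groups might contain; every feature here is a hypothetical example chosen by the editor, not the paper's actual feature set.

```python
import math

def phrase_pair_features(s, t, in_counts, out_counts, in_lm_log10prob):
    """Illustrative (hypothetical) features for a phrase pair (s, t):
    a 'general language' group and an 'IN similarity' group."""
    c_in = in_counts.get((s, t), 0)
    c_out = out_counts.get((s, t), 0)
    return [
        # general-language cues: overall frequency and length of the pair
        math.log(1 + c_out),
        math.log(1 + c_in + c_out),
        float(len(s.split())),
        float(len(t.split())),
        # IN-similarity cues: presence in IN, IN-LM score of the target side
        1.0 if c_in > 0 else 0.0,
        in_lm_log10prob(t) / (len(t.split()) + 1),
    ]
```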
55 4 SVM Feature In addition to using the simple features directly, we also trained an SVM classifier with these features to distinguish between IN and OUT phrase pairs. [sent-161, score-0.121]
56 Phrase tables were extracted from the IN and OUT training corpora (not from the dev set, as was used for the instance-weighting models), and phrase pairs in the intersection of the IN and OUT phrase tables were used as positive examples, with two alternate definitions of negative examples: 1. [sent-162, score-0.776]
57 Pairs from OUT that are not in IN, but whose source phrase is. [sent-163, score-0.17]
58 2. Pairs from OUT that are not in IN, but whose source phrase is, and where the intersection of IN and OUT translations for that source phrase is empty. [sent-165, score-0.4]
59 We used it to score all phrase pairs in the OUT table, in order to provide a feature for the instance-weighting model. [sent-168, score-0.178]
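A compact sketch of this classifier using scikit-learn, under the first definition of negative examples; the phrase tables are assumed to be sets of (source, target) tuples and the feature function to be something like the one sketched above. The choice of a linear SVM and its regularization constant are our assumptions, since the excerpt does not specify them.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_in_out_svm(in_table, out_table, featurize):
    """Positives: phrase pairs present in both the IN and OUT phrase tables.
    Negatives (definition 1): OUT pairs absent from IN whose source phrase is in IN."""
    in_sources = {s for (s, _t) in in_table}
    positives = [pair for pair in out_table if pair in in_table]
    negatives = [pair for pair in out_table
                 if pair not in in_table and pair[0] in in_sources]

    X = np.array([featurize(s, t) for (s, t) in positives + negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    return LinearSVC(C=1.0).fit(X, y)

# The trained model's decision_function score on every OUT pair then yields
# the extra feature fed to the instance-weighting model.
```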
60 1 Experiments Corpora and System We carried out translation experiments in two different settings. [sent-170, score-0.104]
61 The first setting uses the EMEA corpus as IN and the Europarl corpus (www.statmt.org/europarl) as OUT, for English/French translation in both directions. [sent-173, score-0.104]
62 The dev and test sets were randomly chosen from the EMEA corpus. [sent-174, score-0.163]
63 The second setting uses the news-related subcorpora for the NIST09 MT Chinese to English evaluation as IN, and the remaining NIST parallel Chinese/English corpora (UN, Hong Kong Laws, and Hong Kong Hansard) as OUT. [sent-176, score-0.098]
64 The dev corpus was taken from the NIST05 evaluation set, augmented with some randomly-selected material reserved from the training set. [sent-177, score-0.163]
65 (Thus the domain of the dev and test corpora matches IN.) [sent-179, score-0.261]
66 Compared to the EMEA/EP setting, the two domains in the NIST setting are less homogeneous and more similar to each other; there is also considerably more IN text available. [sent-180, score-0.125]
67 Feature weights were set using Och’s MERT algorithm (Och, 2003). [sent-192, score-0.098]
68 The corpus was word-aligned using both HMM and IBM2 models, and the phrase table was the union of phrases extracted from these separate alignments, with a length limit of 7. [sent-193, score-0.121]
69 It was filtered to retain the top 30 translations for each source phrase using the TM part of the current log-linear model. [sent-194, score-0.17]
70 The 1st block contains the simple baselines from section 2. [sent-197, score-0.117]
71 Log-linear combination (loglin) improves on this in all cases, and also beats the pure IN system. [sent-200, score-0.129]
72 The 2nd block contains the IR system, which was tuned by selecting text in multiples of the size of the EMEA training corpus, according to dev set performance. [sent-201, score-0.233]
73 [Table caption fragment: results for translation into English (fr→en) and French (en→fr), and for NIST Chinese to English translation with the NIST06 and NIST08 evaluation sets.] [sent-205, score-0.169]
74 However, when the linear LM is combined with a linear TM (lm+lin tm) or MAP TM (lm+map tm), the results are much better than a log-linear combination for the EMEA setting, and on a par for NIST. [sent-207, score-0.13]
75 This is consistent with the nature of these two settings: log-linear combination, which effectively takes the intersection of IN and OUT, does relatively better on NIST, where the domains are broader and closer together. [sent-208, score-0.1]
76 The 4th block contains instance-weighting models trained on all features, used within a MAP TM combination, and with a linear LM mixture. [sent-210, score-0.135]
77 The iw all map variant uses a non-0 γ weight on a uniform prior in po(s|t), and outperforms a version with γ = 0 (iw all) and the “flattened” variant described in section 3. [sent-211, score-0.199]
78 Clearly, retaining the original frequencies is important for good performance, and globally smoothing the final weighted frequencies is crucial. [sent-213, score-0.199]
79 This best instance-weighting model beats the equivalent model without instance weights by between 0. [sent-214, score-0.199]
80 5 Related Work We have already mentioned the closely related work by Matsoukas et al (2009) on discriminative corpus weighting, and Jiang and Zhai (2007) on (nondiscriminative) instance weighting. [sent-220, score-0.125]
81 However, for multinomial models like our LMs and TMs, there is a one-to-one correspondence between instances and features, e.g. the correspondence between a phrase pair (s, t) and its conditional multinomial probability p(s|t). [sent-226, score-0.305]
82 Moving beyond directly related work, major themes in SMT adaptation include the IR (Hildebrand et al. [sent-229, score-0.258]
83 There has also been some work on adapting the word alignment model prior to phrase extraction (Civera and Juan, 2007; Wu et al. [sent-235, score-0.179]
84 , 2005), and on dynamically choosing a dev set (Xu et al. [sent-236, score-0.163]
85 Other work includes transferring latent topic distributions from source to target language for LM adaptation (Tam et al. [sent-238, score-0.098]
86 6 Conclusion In this paper we have proposed an approach for instance-weighting phrase pairs in an out-of-domain corpus in order to improve in-domain performance. [sent-240, score-0.178]
87 Each out-of-domain phrase pair is characterized by a set of simple features intended to reflect how useful it will be. [sent-241, score-0.261]
88 The features are weighted within a logistic model to give an overall weight that is applied to the phrase pair’s frequency prior to making MAP-smoothed relative-frequency estimates (different weights are learned for each conditioning direction). [sent-242, score-0.482]
89 These estimates are in turn combined linearly with relative-frequency estimates from an in-domain phrase table. [sent-243, score-0.305]
90 Mixing, smoothing, and instance-feature weights are learned at the same time using an efficient maximum-likelihood procedure that relies on only a small in-domain development corpus. [sent-244, score-0.098]
91 8 over an equivalent mixture model (with an identical training procedure but without instance weighting). [sent-248, score-0.113]
92 In future work we plan to try this approach with more competitive SMT systems, and to extend instance weighting to other standard SMT components such as the LM, lexical phrase weights, and lexicalized distortion. [sent-249, score-0.33]
93 We will also directly compare with a baseline similar to the Matsoukas et al. approach in order to measure the benefit from weighting phrase pairs (or ngrams) rather than full sentences. [sent-250, score-0.438]
94 Finally, we intend to explore more sophisticated instanceweighting features for capturing the degree of generality of phrase pairs. [sent-251, score-0.225]
95 Language model adaptation with MAP estimation and the perceptron algorithm. [sent-257, score-0.258]
96 Domain adaptation for statistical machine translation with monolingual resources. [sent-261, score-0.4]
97 Domain adaptation in Statistical Machine Translation with mixture modelling. [sent-265, score-0.334]
98 Adaptation of the translation model for statistical machine translation based on information retrieval. [sent-296, score-0.246]
99 Translation model adaptation for an arabic/french news translation system by lightly-supervised training. [sent-343, score-0.362]
100 Language model adaptation for statistical machine translation with structured query models. [sent-380, score-0.4]
wordName wordTfidf (topN-words)
[('tm', 0.305), ('adaptation', 0.258), ('lm', 0.217), ('smt', 0.178), ('weighting', 0.172), ('matsoukas', 0.171), ('dev', 0.163), ('kuhn', 0.154), ('po', 0.14), ('emea', 0.128), ('foster', 0.125), ('phrase', 0.121), ('finch', 0.12), ('tms', 0.12), ('co', 0.113), ('map', 0.112), ('ci', 0.11), ('daum', 0.11), ('arg', 0.105), ('translation', 0.104), ('weights', 0.098), ('domain', 0.098), ('intended', 0.098), ('mert', 0.094), ('perplexities', 0.093), ('estimates', 0.092), ('sumita', 0.09), ('al', 0.088), ('lms', 0.085), ('hildebrand', 0.077), ('roland', 0.077), ('mixture', 0.076), ('wmt', 0.071), ('block', 0.07), ('logp', 0.068), ('max', 0.068), ('linear', 0.065), ('combination', 0.065), ('beats', 0.064), ('frequencies', 0.061), ('intersection', 0.06), ('bacchiani', 0.06), ('civera', 0.06), ('downweight', 0.06), ('ifi', 0.06), ('instanceweighting', 0.06), ('ipi', 0.06), ('liikanen', 0.06), ('ppo', 0.06), ('silapo', 0.06), ('bleu', 0.06), ('europarl', 0.058), ('adapting', 0.058), ('pairs', 0.057), ('jiang', 0.057), ('mixing', 0.057), ('parallel', 0.053), ('settings', 0.053), ('ss', 0.053), ('logistic', 0.053), ('nist', 0.053), ('kt', 0.051), ('precludes', 0.051), ('tam', 0.051), ('multinomial', 0.051), ('log', 0.049), ('zhai', 0.049), ('source', 0.049), ('target', 0.049), ('och', 0.047), ('baselines', 0.047), ('ueffing', 0.046), ('eck', 0.046), ('matthias', 0.046), ('iw', 0.046), ('kneser', 0.046), ('schwenk', 0.046), ('setting', 0.045), ('pi', 0.045), ('granularity', 0.045), ('alternate', 0.045), ('svm', 0.045), ('probabilities', 0.045), ('degree', 0.044), ('pointing', 0.043), ('ppi', 0.043), ('pair', 0.042), ('weight', 0.041), ('domains', 0.04), ('eg', 0.04), ('kong', 0.04), ('homogeneous', 0.04), ('weighted', 0.039), ('statistical', 0.038), ('smoothing', 0.038), ('conditioning', 0.038), ('hong', 0.038), ('st', 0.038), ('instance', 0.037), ('zhao', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 39 emnlp-2010-EMNLP 044
Author: George Foster
Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
2 0.21944202 104 emnlp-2010-The Necessity of Combining Adaptation Methods
Author: Ming-Wei Chang ; Michael Connor ; Dan Roth
Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds shared representation between domains often impacts the choice in adaptation algorithm that makes use of target labeled data.
3 0.17950772 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Author: Minh-Thang Luong ; Preslav Nakov ; Min-Yen Kan
Abstract: We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
4 0.15557334 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
Author: Aurelien Max
Abstract: In this article, an original view on how to improve phrase translation estimates is proposed. This proposal is grounded on two main ideas: first, that appropriate examples of a given phrase should participate more in building its translation distribution; second, that paraphrases can be used to better estimate this distribution. Initial experiments provide evidence of the potential of our approach and its implementation for effectively improving translation performance.
5 0.14268446 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
Author: Lei Shi ; Rada Mihalcea ; Mingjun Tian
Abstract: In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semisupervised learning, and adapt the translated model to better fit the data distribution of the target language.
6 0.13910893 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
7 0.12717997 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
8 0.12685366 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams
9 0.12066209 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
10 0.11628379 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices
11 0.11571863 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
12 0.10544096 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
13 0.10216313 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
14 0.098641448 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
15 0.094918653 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model
16 0.092793755 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs
17 0.088770591 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment
18 0.077315673 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
19 0.073573545 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar
20 0.07265991 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing
topicId topicWeight
[(0, 0.318), (1, -0.124), (2, -0.059), (3, -0.015), (4, -0.052), (5, 0.014), (6, 0.117), (7, 0.116), (8, -0.026), (9, 0.158), (10, 0.092), (11, 0.143), (12, -0.089), (13, 0.06), (14, 0.02), (15, -0.06), (16, -0.057), (17, -0.019), (18, -0.112), (19, -0.021), (20, 0.088), (21, 0.026), (22, 0.11), (23, 0.23), (24, -0.004), (25, 0.023), (26, 0.166), (27, 0.06), (28, 0.095), (29, 0.121), (30, 0.018), (31, 0.134), (32, -0.223), (33, 0.048), (34, -0.287), (35, 0.008), (36, -0.006), (37, -0.01), (38, 0.103), (39, -0.034), (40, 0.044), (41, -0.087), (42, 0.06), (43, -0.076), (44, 0.066), (45, 0.012), (46, 0.055), (47, -0.001), (48, 0.003), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.95482206 39 emnlp-2010-EMNLP 044
Author: George Foster
Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
2 0.74490273 104 emnlp-2010-The Necessity of Combining Adaptation Methods
Author: Ming-Wei Chang ; Michael Connor ; Dan Roth
Abstract: Problems stemming from domain adaptation continue to plague the statistical natural language processing community. There has been continuing work trying to find general purpose algorithms to alleviate this problem. In this paper we argue that existing general purpose approaches usually only focus on one of two issues related to the difficulties faced by adaptation: 1) difference in base feature statistics or 2) task differences that can be detected with labeled data. We argue that it is necessary to combine these two classes of adaptation algorithms, using evidence collected through theoretical analysis and simulated and real-world data experiments. We find that the combined approach often outperforms the individual adaptation approaches. By combining simple approaches from each class of adaptation algorithm, we achieve state-of-the-art results for both Named Entity Recognition adaptation task and the Preposition Sense Disambiguation adaptation task. Second, we also show that applying an adaptation algorithm that finds shared representation between domains often impacts the choice in adaptation algorithm that makes use of target labeled data.
3 0.61757475 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Author: Minh-Thang Luong ; Preslav Nakov ; Min-Yen Kan
Abstract: We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
4 0.52351975 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams
Author: Mark Dredze ; Tim Oates ; Christine Piatko
Abstract: Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention – detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method uses Aodifst uannlaceb,e a dm eextraicm fpolre detecting tshhoifdts u sine sd Aatastreams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.
5 0.45414722 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; David Stallard ; Prem Natarajan
Abstract: Production of parallel training corpora for the development of statistical machine translation (SMT) systems for resource-poor languages usually requires extensive manual effort. Active sample selection aims to reduce the labor, time, and expense incurred in producing such resources, attaining a given performance benchmark with the smallest possible training corpus by choosing informative, nonredundant source sentences from an available candidate pool for manual translation. We present a novel, discriminative sample selection strategy that preferentially selects batches of candidate sentences with constructs that lead to erroneous translations on a held-out development set. The proposed strategy supports a built-in diversity mechanism that reduces redundancy in the selected batches. Simulation experiments on English-to-Pashto and Spanish-to-English translation tasks demon- strate the superiority of the proposed approach to a number of competing techniques, such as random selection, dissimilarity-based selection, as well as a recently proposed semisupervised active learning strategy.
6 0.42697001 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
7 0.39551312 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
8 0.39230892 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
9 0.390423 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
10 0.38099268 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning
11 0.35572618 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing
12 0.35504779 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL
13 0.34302866 22 emnlp-2010-Automatic Evaluation of Translation Quality for Distant Language Pairs
14 0.33016697 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices
15 0.32593313 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
16 0.31963953 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues
17 0.31660095 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
18 0.307098 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
19 0.29013318 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model
20 0.27589554 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation
topicId topicWeight
[(3, 0.014), (10, 0.012), (12, 0.022), (29, 0.071), (30, 0.028), (32, 0.014), (52, 0.546), (56, 0.057), (62, 0.013), (66, 0.096), (72, 0.034), (76, 0.012), (87, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.87072253 39 emnlp-2010-EMNLP 044
Author: George Foster
Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.
2 0.84406883 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model
Author: Hendra Setiawan ; Chris Dyer ; Philip Resnik
Abstract: We address the modeling, parameter estimation and search challenges that arise from the introduction of reordering models that capture non-local reordering in alignment modeling. In particular, we introduce several reordering models that utilize (pairs of) function words as contexts for alignment reordering. To address the parameter estimation challenge, we propose to estimate these reordering models from a relatively small amount of manuallyaligned corpora. To address the search challenge, we devise an iterative local search algorithm that stochastically explores reordering possibilities. By capturing non-local reordering phenomena, our proposed alignment model bears a closer resemblance to stateof-the-art translation model. Empirical results show significant improvements in alignment quality as well as in translation performance over baselines in a large-scale ChineseEnglish translation task.
Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou
Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decod- ing time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.
4 0.50656027 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation
Author: Zhongjun He ; Yao Meng ; Hao Yu
Abstract: Hierarchical phrase-based (HPB) translation provides a powerful mechanism to capture both short and long distance phrase reorderings. However, the phrase reorderings lack of contextual information in conventional HPB systems. This paper proposes a contextdependent phrase reordering approach that uses the maximum entropy (MaxEnt) model to help the HPB decoder select appropriate reordering patterns. We classify translation rules into several reordering patterns, and build a MaxEnt model for each pattern based on various contextual features. We integrate the MaxEnt models into the HPB model. Experimental results show that our approach achieves significant improvements over a standard HPB system on large-scale translation tasks. On Chinese-to-English translation, , the absolute improvements in BLEU (caseinsensitive) range from 1.2 to 2.1.
5 0.49137953 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
Author: Adria de Gispert ; Juan Pino ; William Byrne
Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.
6 0.47043034 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
7 0.45521688 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
8 0.43403965 3 emnlp-2010-A Fast Fertility Hidden Markov Model for Word Alignment Using MCMC
9 0.42777374 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
10 0.41566959 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
11 0.40415183 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment
12 0.39220107 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation
13 0.38745573 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
14 0.38472363 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling
15 0.38110629 104 emnlp-2010-The Necessity of Combining Adaptation Methods
16 0.37886631 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
17 0.37641731 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices
18 0.37042245 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
19 0.36619279 80 emnlp-2010-Modeling Organization in Student Essays
20 0.36371511 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar