acl acl2011 acl2011-57 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.
Reference: text
sentIndex sentText sentNum sentScore
1 Bayesian Word Alignment for Statistical Machine Translation Coşkun Mermer1,2 1BILGEM TUBITAK Gebze 41470 Kocaeli, Turkey coskun@uekae.tubitak.gov.tr [sent-1, score-0.068]
2 Abstract In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). [sent-3, score-0.499]
3 We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. [sent-4, score-0.659]
4 We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. [sent-5, score-0.153]
5 We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. [sent-7, score-0.06]
6 1 Introduction Word alignment is a crucial early step in the training of most statistical machine translation (SMT) systems, in which the estimated alignments are used for constraining the set of candidates in phrase/grammar extraction (Koehn et al. [sent-9, score-0.602]
7 State-of-the-art word alignment models, such as IBM Models (Brown et al. [sent-12, score-0.256]
8 , 1996), and the jointly-trained symmetric HMM (Liang et al. [sent-14, score-0.059]
9 , word translation probabilities) that need to be estimated in addition to the desired hidden alignment variables. [sent-17, score-0.421]
10 The most common method of inference in such models is expectation-maximization (EM) (Dempster et al. [sent-18, score-0.153]
11 Bogazici University Bebek 34342 Istanbul, Turkey murat . [sent-22, score-0.088]
12 In essence, the alignment distribution obtained via EM takes into account only the most likely point in the parameter space, but does not consider contributions from other points. [sent-28, score-0.374]
13 Zhao and Xing (2006) note that the parameter estimation (for which they use variational EM) suffers from data sparsity and use symmetric Dirichlet priors, but they find the MAP solution. [sent-30, score-0.204]
14 , 2009) and learning phrase alignments directly (DeNero et al. [sent-33, score-0.22]
15 The word alignment learning problem was addressed jointly with segmentation learning in Xu et al. [sent-35, score-0.256]
16 The former two works place nonparametric priors (also known as cache models) on the parameters and utilize Gibbs sampling. [sent-38, score-0.118]
17 However, alignment inference in neither of these works is exactly Bayesian since the alignments are updated by running GIZA++ (Xu et al. [sent-39, score-0.629]
18 Chung and Gildea (2009) apply a sparse Dirichlet prior on the multinomial parameters to prevent overfitting. [sent-44, score-0.163]
19 They use variational Bayes for inference, but they do not investigate the effect of Bayesian inference on word alignment in isolation. [sent-45, score-0.454]
20 Even though they report substantial reductions in alignment error rate, the translation BLEU scores do not improve. [sent-47, score-0.382]
21 Our approach in this paper is fully Bayesian in which the alignment probabilities are inferred by integrating over all possible parameter values assuming an intuitive, sparse prior. [sent-48, score-0.409]
22 We evaluate the inferred alignments in terms of the end-toend translation performance, where we show the results with a variety of input data to illustrate the general applicability of the proposed technique. [sent-50, score-0.346]
23 To our knowledge, this is the first work to directly investigate the effects of Bayesian alignment inference on translation performance. [sent-51, score-0.535]
24 2 Bayesian Inference with IBM Model 1 Given a sentence-aligned parallel corpus (E, F), let ei (fj) denote the i-th (j-th) source (target)1 word in e (f), which in turn consists of I (J) words and denotes the s-th sentence in E (F). [sent-52, score-0.115]
25 Each source sentence is also hypothesized to have an additional imaginary “null” word e0. [sent-53, score-0.114]
26 Also let VE (VF) denote the size of the observed source (target) vocabulary. [sent-54, score-0.077]
27 , 1993), each target word 1We use the “source” and “target” labels following the generative process, in which E generates F (cf. [sent-56, score-0.055]
28 2Dependence of the sentence-level variables e, f, I, J (and a and n, which are introduced later) on the sentence index s should be understood even though not explicitly indicated for notational simplicity. [sent-59, score-0.037]
29 fj is associated with a hidden alignment variable aj whose value ranges over the word positions in the corresponding source sentence. [sent-60, score-0.577]
30 The set of alignments for a sentence (corpus) is denoted by a (A). [sent-61, score-0.22]
31 The model parameters consist of a VE × VF table T of word translation probabilities such that te,f = P(f|e). [sent-62, score-0.126]
32 The joint distribution of the Model-1 variables is given by the following generative model3: P(E, F, A; T) = ∏s P(e) P(a|e) P(f|a, e; T) (1) = ∏s P(e)/(I+1)^J ∏j=1..J teaj,fj (2) In the proposed Bayesian setting, we treat T as a random variable with a prior P(T). [sent-64, score-0.098]
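To make Eq. (2) concrete, here is a minimal Python sketch (illustrative only, not the authors' code; the data layout — e as a source-word list with the null word at index 0, f as the target-word list, a as the alignment vector, and t as a dict of translation probabilities — is an assumption):

import math

def model1_log_prob(e, f, a, t):
    """log P(f, a | e; T) for one sentence pair under IBM Model 1 (Eq. 2).
    e: source words with the null word at index 0; f: target words;
    a: a[j] is the source position aligned to f[j]; t: dict mapping
    (source word, target word) pairs to translation probabilities te,f."""
    I, J = len(e) - 1, len(f)
    log_p = -J * math.log(I + 1)              # uniform alignment prior 1/(I+1)^J
    for j in range(J):
        log_p += math.log(t[(e[a[j]], f[j])])  # one factor teaj,fj per target word
    return log_p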
33 Since the distribution over {te,f} in (4) is in the exponential family, specifically being a multinomial distribution, we choose the conjugate prior, in this case the Dirichlet distribution, for computational convenience. [sent-66, score-0.106]
34 A sparse prior favors 3We omit P(J|e) since both J and e are observed and so this term does not affect the inference of hidden variables. [sent-69, score-0.163]
35 distributions that peak at a single target word and penalizes flatter translation distributions, even for rare words. [sent-70, score-0.278]
36 This choice addresses the well-known problem in the IBM Models, and more severely in Model 1, in which rare words act as “garbage collectors” (Och and Ney, 2003) and get assigned an excessively large number of word alignments. [sent-71, score-0.097]
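As a quick numeric illustration of why a sparse symmetric Dirichlet prior (θ < 1) favors such peaked translation rows, compare the unnormalized Dirichlet log-density of a peaked and a flat distribution (the numbers are hypothetical, not taken from the paper):

import math

def dirichlet_log_density_unnorm(t_row, theta):
    """Unnormalized symmetric Dirichlet log-density: sum_f (theta - 1) * log t_f."""
    return sum((theta - 1.0) * math.log(p) for p in t_row)

peaked = [0.97, 0.01, 0.01, 0.01]   # nearly all mass on one target word
flat   = [0.25, 0.25, 0.25, 0.25]   # mass spread over all target words
print(dirichlet_log_density_unnorm(peaked, theta=0.5))  # ~ +6.9 (favored)
print(dirichlet_log_density_unnorm(flat,   theta=0.5))  # ~ +2.8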
37 Then we obtain the joint distribution of all (observed + hidden) variables as: P(E, F, A, T; Θ) = P(T; Θ) P(E, F, A|T) (5) where Θ = Θ1 · · · ΘVE . [sent-72, score-0.106]
38 To infer the posterior distribution of the alignments, we use Gibbs sampling (Geman and Geman, 1984). [sent-73, score-0.179]
39 One possible method is to derive the Gibbs sampler from P(E, F, A, T; Θ) obtained in (5) and sample the unknowns A and T in turn, resulting in an explicit Gibbs sampler. [sent-74, score-0.162]
40 In this work, we marginalize out T by: P(E, F, A; Θ) = ∫T P(E, F, A, T; Θ) dT (6) and obtain a collapsed Gibbs sampler, which samples only the alignment variables. [sent-75, score-0.297]
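For reference, a standard closed form of this marginal — a sketch based on the usual Dirichlet–multinomial integral, assuming an independent Dirichlet prior with hyperparameters θe,f on each row of T and writing Ne,f for the number of links between source word e and target word f under A; it is not copied from the paper — is

P(E,F,A;\Theta) = \prod_{s}\frac{P(e)}{(I+1)^{J}} \prod_{e=1}^{V_E}\frac{\Gamma\big(\sum_{f}\theta_{e,f}\big)}{\Gamma\big(\sum_{f}(N_{e,f}+\theta_{e,f})\big)} \prod_{f=1}^{V_F}\frac{\Gamma(N_{e,f}+\theta_{e,f})}{\Gamma(\theta_{e,f})}

and taking the ratio of this expression with aj = i to the same expression with position j excluded gives the sampling formula (7) below.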
41 Using P(E, F, A; Θ) obtained in (6), the Gibbs sampling formula for the individual alignments is derived as:4 P(aj = i | E, F, A¬j; Θ) = (N¬jei,fj + θei,fj) / ∑f=1..VF (N¬jei,f + θei,f) (7) where the superscript ¬j denotes the exclusion of the current value of aj. [sent-76, score-0.367]
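A minimal Python sketch of this collapsed update (illustrative only, not the authors' released Perl code; it assumes a symmetric prior θei,f = θ, sentence lists E[s] with the null word at source index 0, target lists F[s], alignment vectors A[s], and defaultdict(int) count tables):

import random
from collections import defaultdict

def init_counts(E, F, A):
    """Build the link-count tables implied by an initial alignment A."""
    counts, totals = defaultdict(int), defaultdict(int)
    for s in range(len(F)):
        for j in range(len(F[s])):
            e_word = E[s][A[s][j]]
            counts[(e_word, F[s][j])] += 1   # links between source word e and target word f
            totals[e_word] += 1              # total links whose source side is e
    return counts, totals

def gibbs_pass(E, F, A, counts, totals, theta, V_F):
    """One collapsed Gibbs sweep over all alignment variables, per Eq. (7)."""
    for s in range(len(F)):
        e, f, a = E[s], F[s], A[s]
        for j in range(len(f)):
            old = e[a[j]]                    # remove the current link (the "¬j" counts)
            counts[(old, f[j])] -= 1
            totals[old] -= 1
            # Unnormalized probability of each candidate source position i (Eq. 7);
            # with a symmetric prior, sum_f theta_{e,f} = V_F * theta.
            weights = [(counts[(e_i, f[j])] + theta) / (totals[e_i] + V_F * theta)
                       for e_i in e]
            a[j] = random.choices(range(len(e)), weights=weights)[0]
            new = e[a[j]]                    # add the sampled link back
            counts[(new, f[j])] += 1
            totals[new] += 1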
42 Once the Gibbs sampler is deemed to have converged after B burn-in iterations, we collect M samples of A with L iterations in-between5 to estimate P(A|E, F). [sent-82, score-0.203]
43 , 2003), we select for each aj the most frequent value in the M collected samples. [sent-84, score-0.113]
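Continuing the sketch above (gibbs_pass is the illustrative helper defined there; B, M and L are the paper's symbols, but the default values below are hypothetical), sample collection and the per-position mode selection might look like:

from collections import Counter

def run_sampler(E, F, A, counts, totals, theta, V_F, B=100, M=50, L=10):
    """Burn in for B sweeps, then take M samples spaced L sweeps apart and
    return, for every a_j, its most frequent sampled value."""
    tallies = [[Counter() for _ in sent] for sent in F]
    for _ in range(B):
        gibbs_pass(E, F, A, counts, totals, theta, V_F)
    for _ in range(M):
        for _ in range(L):
            gibbs_pass(E, F, A, counts, totals, theta, V_F)
        for s in range(len(F)):
            for j in range(len(F[s])):
                tallies[s][j][A[s][j]] += 1
    return [[tallies[s][j].most_common(1)[0][0] for j in range(len(F[s]))]
            for s in range(len(F))]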
44 4The derivation is quite standard and similar to other Dirichlet-multinomial Gibbs sampler derivations, e. [sent-85, score-0.162]
45 3 Experimental Setup For Turkish↔English experiments, we used the 20K-sentence travel domain BTEC datasets (Kikui et al. [sent-90, score-0.037]
46 For Czech↔English, we used the 95K-sentence news commentary parallel corpus from the WMT shared task8 for training, news2008 set for development, news2009 set for testing, and the 438M-word English and 81. [sent-92, score-0.135]
47 7M-word Czech monolingual news corpora for additional language model (LM) training. [sent-93, score-0.061]
48 All language models are 4-gram in the travel domain experiments and 5-gram in the news domain experiments. [sent-95, score-0.13]
49 For each language pair, we trained standard phrase-based SMT systems in both directions (including alignment symmetrization and log-linear model tuning) using Moses (Koehn et al. [sent-96, score-0.293]
50 To obtain word alignments, we used the accompanying Perl code for Bayesian inference and 6International Workshop on Spoken Language Translation. [sent-99, score-0.221]
51 For each translation task, we report two EM estimates, obtained after 5 and 80 iterations (EM-5 and EM-80), respectively; and three Gibbs sampling estimates, two of which were initialized with those two EM Viterbi alignments (GS-5 and GS-80) and a third was initialized naively9 (GS-N). [sent-116, score-0.532]
52 4 Results Table 2 compares the BLEU scores of Bayesian inference and EM estimation. [sent-120, score-0.153]
53 99 (in English-to-Turkish) BLEU points in the travel domain and from 0. [sent-124, score-0.069]
54 Compared to the state-of-the-art IBM Model 4, the Bayesian Model 1 is better in all travel domain tasks and is comparable or better in the news domain. [sent-127, score-0.13]
55 Fertility of a source word is defined as the number of target words aligned to it. [sent-128, score-0.132]
56 Table 3 shows the distribution of fertilities in alignments obtained from different methods. [sent-129, score-0.401]
57 Table 3: Distribution of inferred alignment fertilities. [sent-147, score-0.256]
58 The four blocks of rows from top to bottom correspond to (in order) the total number of source tokens, source tokens with fertilities in the range 4–7, source tokens with fertilities higher than 7, and the maximum observed fertility. [sent-148, score-0.455]
59 The first language listed is the source in alignment (Section 2). [sent-149, score-0.333]
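A small sketch of how the Table 3 statistics can be read off an alignment (same illustrative data layout as in the sampler sketch above; the fertility bands simply mirror the table description):

from collections import Counter

def fertility_summary(E, F, A):
    """Counts of source tokens by fertility band, in the spirit of Table 3."""
    fert = Counter()                        # fertility of each source token
    for s in range(len(F)):
        for j in range(len(F[s])):
            fert[(s, A[s][j])] += 1         # one link per aligned target word
    values = list(fert.values())
    return {"fertility_4_to_7": sum(4 <= v <= 7 for v in values),
            "fertility_over_7": sum(v > 7 for v in values),
            "max_fertility": max(values, default=0)}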
60 The number of distinct word-pairs induced by an alignment has been recently proposed as an objective function for word alignment (Bodrumlu et al. [sent-154, score-0.163]
61 Table 4 shows that the proposed inference method substantially reduces the alignment dictionary size, in most cases by more than 50%. [sent-157, score-0.409]
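The dictionary size reported in Table 4 is simply the number of distinct aligned word pairs, which under the same illustrative data layout can be computed as:

def dictionary_size(E, F, A):
    """Number of distinct (source word, target word) pairs induced by alignment A."""
    return len({(E[s][A[s][j]], F[s][j])
                for s in range(len(F))
                for j in range(len(F[s]))})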
62 5 Conclusion We developed a Gibbs sampling-based Bayesian inference method for IBM Model 1 word alignments and showed that it outperforms EM estimation in terms of translation BLEU scores across several language pairs, data sizes and domains. [sent-158, score-0.593]
63 As a result of this increase, Bayesian Model 1 alignments perform close to or better than the state-of-the-art IBM 10The GIZA++ implementation of Model 4 artificially limits fertility parameter values to at most nine. [sent-159, score-0.438]
64 The proposed method learns a compact, sparse translation distribution, overcoming the well-known “garbage collection” problem of rare words in current EM-estimated models. [sent-161, score-0.251]
65 Scalable inference and training of context-rich syntactic translation models. [sent-208, score-0.279]
66 Bayesian inference for PCFGs via Markov chain Monte Carlo. [sent-221, score-0.153]
67 Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. [sent-276, score-0.242]
68 A fast fertility hidden Markov model for word alignment using MCMC. [sent-280, score-0.464]
wordName wordTfidf (topN-words)
[('bayesian', 0.311), ('gibbs', 0.294), ('alignment', 0.256), ('em', 0.232), ('alignments', 0.22), ('ibm', 0.174), ('fertility', 0.169), ('sampler', 0.162), ('inference', 0.153), ('translation', 0.126), ('bleu', 0.119), ('aj', 0.113), ('fertilities', 0.112), ('sampling', 0.11), ('dirichlet', 0.109), ('geman', 0.097), ('fj', 0.092), ('murat', 0.088), ('bodrumlu', 0.084), ('egm', 0.084), ('source', 0.077), ('smt', 0.076), ('priors', 0.076), ('garbage', 0.074), ('zhao', 0.071), ('chung', 0.071), ('travel', 0.069), ('ys', 0.069), ('distribution', 0.069), ('kun', 0.068), ('accompanying', 0.068), ('turkey', 0.068), ('nguyen', 0.066), ('sparse', 0.065), ('afp', 0.064), ('kikui', 0.064), ('vf', 0.064), ('ip', 0.062), ('news', 0.061), ('prior', 0.061), ('rare', 0.06), ('czech', 0.059), ('symmetric', 0.059), ('vogel', 0.059), ('koehn', 0.056), ('iwslt', 0.056), ('target', 0.055), ('ml', 0.054), ('sujith', 0.054), ('turkish', 0.053), ('estimation', 0.051), ('sara', 0.051), ('gildea', 0.051), ('ve', 0.051), ('xu', 0.051), ('giza', 0.051), ('parameter', 0.049), ('hmm', 0.049), ('monte', 0.047), ('och', 0.047), ('variational', 0.045), ('dempster', 0.045), ('chiang', 0.044), ('goldwater', 0.043), ('blunsom', 0.043), ('graehl', 0.043), ('sizes', 0.043), ('sharon', 0.043), ('estimates', 0.042), ('nonparametric', 0.042), ('knight', 0.042), ('denero', 0.041), ('samples', 0.041), ('te', 0.041), ('hermann', 0.041), ('griffiths', 0.04), ('resnik', 0.04), ('fully', 0.039), ('hidden', 0.039), ('initialized', 0.038), ('ei', 0.038), ('ney', 0.038), ('variables', 0.037), ('multinomial', 0.037), ('wli', 0.037), ('shaojun', 0.037), ('symmetrization', 0.037), ('admixture', 0.037), ('clar', 0.037), ('esh', 0.037), ('excessively', 0.037), ('flatter', 0.037), ('hfreo', 0.037), ('imaginary', 0.037), ('inso', 0.037), ('ntchee', 0.037), ('seiichi', 0.037), ('twheh', 0.037), ('uts', 0.037), ('wme', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. .t r
2 0.2635743 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions. 1
3 0.23724531 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
Author: John DeNero ; Klaus Macherey
Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
4 0.22559758 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
Author: Graham Neubig ; Taro Watanabe ; Eiichiro Sumita ; Shinsuke Mori ; Tatsuya Kawahara
Abstract: We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
5 0.2179828 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
Author: Phil Blunsom ; Trevor Cohn
Abstract: In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor processes prior, providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently out-performs the current state-of-the-art across 10 languages.
6 0.21590884 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
7 0.21189313 94 acl-2011-Deciphering Foreign Language
8 0.20683251 141 acl-2011-Gappy Phrasal Alignment By Agreement
9 0.19203082 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
10 0.17468284 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
11 0.16966942 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
12 0.15044482 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
13 0.14948952 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
14 0.14677142 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
15 0.14455903 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
16 0.14312243 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
17 0.14003621 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
18 0.13560978 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
19 0.13409676 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
20 0.13331166 232 acl-2011-Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars
topicId topicWeight
[(0, 0.306), (1, -0.212), (2, 0.135), (3, 0.173), (4, 0.073), (5, -0.002), (6, 0.034), (7, 0.078), (8, -0.035), (9, 0.179), (10, 0.167), (11, 0.129), (12, 0.075), (13, 0.176), (14, -0.102), (15, 0.067), (16, 0.048), (17, 0.03), (18, -0.084), (19, 0.034), (20, -0.015), (21, 0.091), (22, 0.028), (23, 0.034), (24, 0.118), (25, 0.067), (26, 0.003), (27, -0.002), (28, -0.017), (29, -0.014), (30, -0.043), (31, -0.005), (32, -0.062), (33, -0.005), (34, 0.035), (35, 0.003), (36, 0.058), (37, -0.058), (38, 0.041), (39, 0.02), (40, -0.025), (41, 0.009), (42, 0.056), (43, 0.002), (44, -0.117), (45, 0.045), (46, -0.042), (47, 0.05), (48, 0.007), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.96947551 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. .t r
2 0.86538249 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith
Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.
3 0.83953714 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phraseto-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semiMarkov model, word-to-phrase and phraseto-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne ? pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
4 0.82719052 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
Author: John DeNero ; Klaus Macherey
Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
5 0.81938791 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
Author: Kristina Toutanova ; Michel Galley
Abstract: Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.
6 0.80819911 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
7 0.75233024 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
8 0.75180525 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
9 0.70453316 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
10 0.70013541 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
11 0.67824489 94 acl-2011-Deciphering Foreign Language
12 0.67757171 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
13 0.67095488 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
14 0.6597771 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
15 0.61534971 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
16 0.59986877 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
17 0.57511693 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
18 0.56735438 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation
19 0.56689894 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
20 0.55620378 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
topicId topicWeight
[(5, 0.023), (17, 0.062), (26, 0.016), (37, 0.082), (39, 0.045), (41, 0.123), (53, 0.011), (55, 0.037), (59, 0.063), (60, 0.121), (72, 0.044), (91, 0.026), (96, 0.277), (97, 0.013)]
simIndex simValue paperId paperTitle
1 0.97091877 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations
Author: Saif Mohammad
Abstract: Colour is a key component in the successful dissemination of information. Since many real-world concepts are associated with colour, for example danger with red, linguistic information is often complemented with the use of appropriate colours in information visualization and product marketing. Yet, there is no comprehensive resource that captures concept–colour associations. We present a method to create a large word–colour association lexicon by crowdsourcing. A wordchoice question was used to obtain sense-level annotations and to ensure data quality. We focus especially on abstract concepts and emotions to show that even they tend to have strong colour associations. Thus, using the right colours can not only improve semantic coherence, but also inspire the desired emotional response.
2 0.95650011 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
Author: Xiaojun Wan
Abstract: Cross-language document summarization is defined as the task of producing a summary in a target language (e.g. Chinese) for a set of documents in a source language (e.g. English). Existing methods for addressing this task make use of either the information from the original documents in the source language or the information from the translated documents in the target language. In this study, we propose to use the bilingual information from both the source and translated documents for this task. Two summarization methods (SimFusion and CoRank) are proposed to leverage the bilingual information in the graph-based ranking framework for cross-language summary extraction. Experimental results on the DUC2001 dataset with manually translated reference Chinese summaries show the effectiveness of the proposed methods. 1
same-paper 3 0.9506259 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. .t r
4 0.92886245 94 acl-2011-Deciphering Foreign Language
Author: Sujith Ravi ; Kevin Knight
Abstract: In this work, we tackle the task of machine translation (MT) without parallel training data. We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from nonparallel text.
5 0.92176849 11 acl-2011-A Fast and Accurate Method for Approximate String Search
Author: Ziqi Wang ; Gu Xu ; Hang Li ; Ming Zhang
Abstract: This paper proposes a new method for approximate string search, specifically candidate generation in spelling error correction, which is a task as follows. Given a misspelled word, the system finds words in a dictionary, which are most “similar” to the misspelled word. The paper proposes a probabilistic approach to the task, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for finding the top k candidates. The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word. The learning method employs the criterion in candidate generation as loss function. The retrieval algorithm is efficient and is guaranteed to find the optimal k candidates. Experimental results on large scale data show that the proposed approach improves upon existing methods in terms of accuracy in different settings.
6 0.92109078 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
7 0.91993219 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
8 0.91801953 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
9 0.91798192 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
10 0.91687608 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
11 0.9152931 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
12 0.91421604 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal
13 0.91329026 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
14 0.91257513 4 acl-2011-A Class of Submodular Functions for Document Summarization
15 0.91069257 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation
16 0.91025198 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
17 0.91006386 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
18 0.90994108 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
19 0.90987492 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
20 0.90954232 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations