acl acl2011 acl2011-233 knowledge-graph by maker-knowledge-mining

233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation


Source: pdf

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; Prem Natarajan

Abstract: The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. [sent-5, score-0.1]

2 Most SMT systems use a static LM, independent of the source language input. [sent-6, score-0.346]

3 While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. [sent-7, score-0.031]

4 In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. [sent-8, score-0.432]

5 Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto). [sent-10, score-0.738]

6 1 Introduction While much of the focus in developing a statistical machine translation (SMT) system revolves around the translation model (TM), most systems do not emphasize the role of the language model (LM). [sent-11, score-0.2]

7 The latter generally follows an n-gram structure and is estimated from a large, monolingual corpus of target sentences. [sent-12, score-0.309]

8 In most systems, the LM is independent of the test input, i.e. [sent-13, score-0.043]

9 fixed n-gram probabilities determine the likelihood of all translation hypotheses, regardless of the source input. [sent-15, score-0.216]

10 Some previous work exists in LM adaptation for SMT. [sent-19, score-0.035]

11 (2008) used a cross-lingual information retrieval (CLIR) system to select a subset of target documents “comparable” to the source document; bias LMs estimated from these subsets were interpolated with a static background LM. [sent-21, score-0.698]

12 (2004) converted initial SMT hypotheses to queries and retrieved similar sentences from a large monolingual collection. [sent-23, score-0.109]

13 The latter were used to build source-specific LMs that were then interpolated with a background model. [sent-24, score-0.081]

14 While feasible in offline evaluations where the test set is relatively static, the above techniques are computationally expensive and therefore not suitable for low-latency, interactive applications of SMT. [sent-26, score-0.164]

15 Examples include speechto-speech and web-based interactive translation systems, where test inputs are user-generated and preclude off-line LM adaptation. [sent-27, score-0.234]

16 In this paper, we present a novel technique for weighting an LM corpus at the sentence level based on the source language input. [sent-28, score-0.238]

17 The weighting scheme relies on a measure of cross-lingual similarity evaluated by projecting sparse vector representations of the target sentences into the space of source sentences using a transformation matrix computed from the bilingual parallel data. [sent-29, score-0.927]

18 The LM estimated from this weighted corpus boosts the probability of relevant target n-grams, while attenuating unrelated target segments. [sent-30, score-0.47]

19 Our formulation, based on simple ideas in linear algebra, alleviates run-time complexity by pre-computing the majority of intermediate products off-line. [sent-31, score-0.067]

20 2 Cross-Lingual Similarity We propose a novel measure of cross-lingual similarity that evaluates the likeness between an arbitrary pair of source and target language sentences. [sent-34, score-0.512]

21 The proposed approach represents the source and target sentences in sparse vector spaces defined by their corresponding vocabularies, and relies on a bilingual projection matrix to transform vectors in the target language space to the source language space. [sent-35, score-0.952]

22 Let S = {s1, . . . , sM} and T = {t1, . . . , tN} represent the source and target language vocabularies, respectively. [sent-42, score-0.315]

23 Let u represent the candidate source sentence in an M-dimensional vector space, whose mth dimension um represents the count of vocabulary item sm in the sentence. [sent-43, score-0.366]

24 Similarly, v represents the candidate target sentence in an N-dimensional vector space. [sent-44, score-0.355]

25 Traditionally, the cosine similarity measure is used to evaluate the likeness of two term-frequency representations. [sent-46, score-0.197]

26 Thus, it is necessary to find a projection of v in the source vocabulary vector space before similarity can be evaluated. [sent-48, score-0.343]

27 Assuming we are able to compute an M × N-dimensional bilingual word co-occurrence matrix Σ from the SMT parallel corpus, the matrix-vector product Σv is a projection of the target sentence in the source vector space. [sent-49, score-0.596]

28 Those source terms of the M-dimensional projected vector that most frequently co-occur with the target terms in v will be emphasized. [sent-50, score-0.384]

29 In other words, the projected vector Σv can be interpreted as a “bag-of-words” translation of v. [sent-51, score-0.129]

30 The cross-lingual similarity between the candidate source and target sentences then reduces to the cosine similarity between the source term-frequency vector u and the projected target term-frequency vector Σv, as shown in Equation 2.1. [sent-52, score-1.06]

31 In the above equation, we ensure that both u and Σv are normalized to unit L2-norm. [sent-54, score-0.039]

32 This prevents over- or under-estimation of cross-lingual similarity due to sentence length mismatch. [sent-55, score-0.126]
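To make sentences 23–32 concrete, here is a minimal NumPy sketch of the cross-lingual similarity: the cosine between the source term-frequency vector u and the projection Σv of the target vector, assuming Equation 2.1 is plain cosine similarity with unit L2 normalization. The function name and the handling of zero vectors are illustrative choices, not details from the paper.

```python
import numpy as np

def cross_lingual_similarity(u, v, sigma):
    """Cosine similarity between a source term-frequency vector u (size M)
    and the projection sigma @ v of a target term-frequency vector v (size N),
    where sigma is the M x N bilingual co-occurrence matrix.
    Sketch of the assumed form of Equation 2.1; names are illustrative."""
    proj = sigma @ v                        # "bag-of-words" translation of v
    nu, nproj = np.linalg.norm(u), np.linalg.norm(proj)
    if nu == 0.0 or nproj == 0.0:
        return 0.0                          # no shared evidence
    return float(u @ proj / (nu * nproj))   # unit-L2 normalization avoids length bias
```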

33 446 We estimate the bilingual word co-occurrence matrix Σ from an unsupervised, automatic word alignment induced over the parallel training corpus P. [sent-56, score-0.197]

34 We use GIZA++ (Al-Onaizan et al., 1999) to estimate the parameters of the IBM models (Brown et al. [sent-58, score-0.058]

35 , 1993), and combine the forward and backward Viterbi alignments to obtain many-to-many word alignments as described in Koehn et al. [sent-60, score-0.16]

36 The (m, n)th entry Σm,n of this matrix is the number of times source word sm aligns to target word tn in P. [sent-62, score-0.454]
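A minimal sketch of how Σ could be accumulated from word alignments as described above. The per-sentence link-list format and the SciPy sparse representation are assumptions for illustration; the paper only specifies that Σm,n counts how often source word sm aligns to target word tn in P.

```python
import numpy as np
from scipy.sparse import lil_matrix

def build_cooccurrence_matrix(alignments, M, N):
    """Sigma[m, n] = number of times source word m aligns to target word n
    over the parallel corpus P. `alignments` is assumed to be an iterable of
    per-sentence link lists [(m, n), ...] from a GIZA++-style aligner."""
    sigma = lil_matrix((M, N), dtype=np.float64)
    for links in alignments:
        for m, n in links:
            sigma[m, n] += 1.0
    return sigma.tocsr()   # CSR is efficient for the matrix-vector products Sigma @ v
```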

37 3 Language Model Biasing In traditional LM training, n-gram counts are evaluated assuming unit weight for each sentence. [sent-63, score-0.123]

38 Our approach to LM biasing involves re-distributing these weights to favor target sentences that are “similar” to the candidate source sentence according to the measure of cross-lingual similarity developed in Section 2. [sent-64, score-0.834]

39 Thus, n-grams that appear in the translation hypothesis for the candidate input will be assigned high probability by the biased LM, and vice versa. [sent-65, score-0.356]

40 Let u be the term-frequency representation of the candidate source sentence for which the LM must be biased. [sent-66, score-0.203]

41 Let {v1, . . . , vK} similarly represent the K target LM training sentences. [sent-70, score-0.199]

42 We compute the similarity of the source sentence u to each target sentence vj according to Equation 3.1. [sent-71, score-0.587]

43 The biased LM is estimated by weighting n-gram counts collected from the jth target sentence with the corresponding cross-lingual similarity ωj. [sent-73, score-0.796]
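The biasing step in its direct (unoptimized) form, as described in sentences 37–43: each target training sentence contributes its n-gram counts scaled by its similarity weight ωj. This sketch stops at weighted counts; smoothing and LM estimation proper are omitted, and all names are illustrative rather than taken from the paper.

```python
from collections import Counter

def ngrams(tokens, order):
    """All n-grams of the given order in a token list."""
    return [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]

def biased_ngram_counts(target_sentences, weights, order):
    """Weighted n-gram counts: sentence j contributes its counts scaled by
    its cross-lingual similarity weights[j] (omega_j) to the candidate input.
    The resulting counts would then feed a standard n-gram estimator."""
    counts = Counter()
    for tokens, omega in zip(target_sentences, weights):
        for gram in ngrams(tokens, order):
            counts[gram] += omega
    return counts
```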

44 In order to alleviate the run-time complexity of on-line LM biasing, we present an efficient method for obtaining biased counts of an arbitrary target n-gram t. [sent-75, score-0.579]

45 Let ω = [ω1, . . . , ωK] be the vector representing cross-lingual similarity between the candidate source sentence and each of the K target sentences. [sent-84, score-0.639]

46 Then, the biased count of this n-gram, denoted by C∗(t), is given by Equation 3.2. [sent-85, score-0.335]

47 The vector bt can be interpreted as the projection of target n-gram t in the source space. [sent-90, score-0.397]

48 Note that bt is independent of the source input u, and can therefore be pre-computed off-line. [sent-91, score-0.204]

49 At run-time, the biased count of any n-gram can be obtained via a simple dot product. [sent-92, score-0.335]

50 This adds very little on-line time complexity because u is a sparse vector. [sent-93, score-0.097]

51 Since bt is technically a dense vector, the space complexity of this approach may seem very high. [sent-94, score-0.155]

52 In practice, the mass of bt is concentrated around a very small number of source words that frequently co-occur with target n-gram t; thus, it can be “sparsified” with little or no loss of information by simply establishing a cutoff threshold on its elements. [sent-95, score-0.517]

53 Biased counts and probabilities can be computed on demand for specific ngrams without re-estimating the entire LM. [sent-96, score-0.042]
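A hedged sketch of the efficient formulation in sentences 44–53. Since Equations 3.1 and 3.2 are not reproduced here, it assumes bt aggregates, over all training sentences containing n-gram t, the unit-normalized projections Σvj scaled by the count of t in that sentence, so that C∗(t) reduces to a dot product between the normalized source vector u and the precomputed bt. Function names, the thresholding scheme, and the exact normalization are assumptions.

```python
import numpy as np

def precompute_bt(sigma, target_sentences, vectorize, ngram_count, threshold=1e-4):
    """Off-line step: b_t = sum_j C_j(t) * (Sigma v_j / ||Sigma v_j||).
    `vectorize` maps a sentence to its term-frequency vector v_j; `ngram_count`
    maps a sentence to {ngram: count}. Small entries are zeroed ("sparsified")
    with a cutoff threshold, as suggested in sentence 52."""
    bt = {}
    for sent in target_sentences:
        proj = sigma @ vectorize(sent)
        norm = np.linalg.norm(proj)
        if norm == 0.0:
            continue
        proj = proj / norm
        for gram, c in ngram_count(sent).items():
            bt[gram] = bt.get(gram, 0.0) + c * proj
    return {g: np.where(np.abs(v) >= threshold, v, 0.0) for g, v in bt.items()}

def biased_count(u, bt, gram):
    """On-line step: biased count C*(t) as a dot product of the (unit-normalized)
    source vector u with the precomputed b_t; cheap because u is sparse."""
    v = bt.get(gram)
    if v is None:
        return 0.0
    nu = np.linalg.norm(u)
    return float(u @ v / nu) if nu > 0.0 else 0.0
```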

54 We conduct experiments on two resource-poor language pairs commissioned under the DARPA Transtac speech-to-speech translation initiative, viz. [sent-98, score-0.128]

55 English-Dari (E2D) and English-Pashto (E2P), on test sets with single as well as multiple references. [sent-99, score-0.043]

56 Table 1: Data configuration for perplexity/SMT experiments. [sent-100, score-0.03]

57 4.1 Data Configuration Parallel data were made available under the Transtac program for both language pairs evaluated in this paper. [sent-105, score-0.042]

58 We divided these into training, held-out development, and test sets for building, tuning, and evaluating the SMT system, respectively. [sent-106, score-0.043]

59 These development and test sets provide only one reference translation for each source sentence. [sent-107, score-0.259]

60 For E2P, DARPA has made available to all program participants an additional evaluation set with multiple (four) references for each test input. [sent-108, score-0.043]

61 The Dari and Pashto monolingual corpora for LM training are a superset of target sentences from the parallel training corpus, consisting of additional untranslated sentences, as well as data derived from other sources, such as the web. [sent-109, score-0.358]

62 4.2 Perplexity Analysis For both Dari and Pashto, we estimated a static trigram LM with unit sentence-level weights that served as a baseline. [sent-112, score-0.434]

63 We tuned this LM by varying the bigram and trigram frequency cutoff thresholds to minimize perplexity on the held-out target sentences. [sent-113, score-0.744]

64 Finally, we evaluated test target perplexity with the optimized baseline LM. [sent-114, score-0.538]

65 We then applied the proposed technique to estimate trigram LMs biased to source sentences in the held-out and test sets. [sent-115, score-0.627]

66 We evaluated sourceconditional target perplexity by computing the total log-probability of all target sentences in a parallel test corpus against the LM biased by the corresponding source sentences. [sent-116, score-1.273]

67 Again, bigram and trigram cutoff thresholds were tuned to minimize source-conditional target perplexity on the held-out set. [sent-117, score-0.744]

68 The tuned biased LMs were used to compute source-conditional target perplexity on the test set. [sent-118, score-0.84]

69 Table 2: Reduction in perplexity using biased LMs. [sent-122, score-0.553]

70 Table 2 summarizes the reduction in target perplexity using biased LMs; on the E2D and E2P single-reference test sets, we obtained perplexity reductions of 12. [sent-124, score-1.135]

71 This indicates that the biased models are significantly better predictors of the corresponding target sentences than the static baseline LM. [sent-127, score-0.771]

72 We used GIZA++ to induce automatic word alignments from the parallel training corpus. [sent-132, score-0.124]

73 Phrase translation rules (up to a maximum source span of 5 words) were extracted from a combination of forward and backward word alignments (Koehn et al. [sent-133, score-0.33]

74 The SMT decoder uses a log-linear model that combines numerous features, including but not limited to phrase translation probability, LM probability, and distortion penalty, to estimate the posterior probability of target hypotheses. [sent-135, score-0.328]

75 Finally, we evaluated SMT performance on the test set in terms of BLEU and TER (Snover et al. [sent-138, score-0.085]

76 The baseline SMT system used the static trigram LM with cutoff frequencies optimized for minimum perplexity on the development set. [sent-140, score-0.661]

77 Biased LMs (with n-gram cutoffs tuned as above) were estimated for all source sentences in the development and test sets. [sent-141, score-0.319]

78 Table 3: SMT performance with static and biased LMs. [sent-145, score-0.529]

79 Table 3 summarizes the consistent improvement in BLEU/TER across multiple test sets and language pairs. [sent-147, score-0.074]

80 5 Discussion and Future Work Existing methods for target LM biasing for SMT rely on information retrieval to select a comparable subset from the training corpus. [sent-148, score-0.457]

81 A foreground LM estimated from this subset is interpolated with the static background LM. [sent-149, score-0.416]

82 However, given the large size of a typical LM corpus, these methods are unsuitable for on-line, interactive SMT applications. [sent-150, score-0.086]

83 In this paper, we proposed a novel LM biasing technique based on linear transformations of target sentences in a sparse vector space. [sent-151, score-0.661]

84 We adopted a fine-grained approach, weighting individual target sentences based on the proposed measure of cross-lingual similarity, and using the entire, weighted corpus to estimate a biased LM. [sent-152, score-0.735]

85 Finally, we showed that biased LMs yield significant reductions in target perplexity, and consistent improvements in SMT performance. [sent-155, score-0.553]

86 While we used phrase-based SMT as a test-bed for evaluating translation performance, it should be noted that the proposed LM biasing approach is independent of SMT architecture. [sent-156, score-0.358]

87 We plan to test its effectiveness in hierarchical and syntax-based SMT systems. [sent-157, score-0.043]

88 We also plan to investigate the relative usefulness of LM biasing as we move from low-resource languages to those for which significantly larger parallel corpora and LM training data are available. [sent-158, score-0.336]

89 A study of translation edit rate with targeted human annotation. [sent-203, score-0.1]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lm', 0.456), ('smt', 0.308), ('biased', 0.299), ('biasing', 0.258), ('perplexity', 0.254), ('lms', 0.242), ('static', 0.23), ('target', 0.199), ('vj', 0.116), ('source', 0.116), ('cutoff', 0.114), ('dari', 0.112), ('pashto', 0.112), ('translation', 0.1), ('similarity', 0.096), ('bt', 0.088), ('parallel', 0.078), ('estimated', 0.072), ('crosslingual', 0.072), ('snover', 0.071), ('vector', 0.069), ('likeness', 0.066), ('trigram', 0.063), ('projection', 0.062), ('transtac', 0.061), ('sparse', 0.058), ('weighting', 0.058), ('interactive', 0.058), ('sm', 0.058), ('candidate', 0.057), ('reductions', 0.055), ('interpolated', 0.05), ('koehn', 0.05), ('matrix', 0.048), ('alignments', 0.046), ('tuned', 0.045), ('sentences', 0.043), ('test', 0.043), ('bilingual', 0.042), ('counts', 0.042), ('evaluated', 0.042), ('equation', 0.041), ('bbn', 0.041), ('complexity', 0.039), ('unit', 0.039), ('morristown', 0.038), ('thresholds', 0.038), ('monolingual', 0.038), ('backward', 0.038), ('count', 0.036), ('stroudsburg', 0.036), ('measure', 0.035), ('adaptation', 0.035), ('bleu', 0.035), ('nj', 0.034), ('technique', 0.034), ('tn', 0.033), ('preclude', 0.033), ('mma', 0.033), ('dem', 0.033), ('oftarget', 0.033), ('foreground', 0.033), ('ctt', 0.033), ('natarajan', 0.033), ('computationally', 0.032), ('background', 0.031), ('minimize', 0.031), ('summarizes', 0.031), ('feasible', 0.031), ('sentence', 0.03), ('configuration', 0.03), ('bonnie', 0.03), ('rohit', 0.03), ('ananthakrishnan', 0.03), ('raytheon', 0.03), ('etod', 0.03), ('clir', 0.03), ('sankaranarayanan', 0.03), ('sthe', 0.03), ('giza', 0.03), ('dorr', 0.03), ('forward', 0.03), ('alexandra', 0.03), ('moses', 0.03), ('estimate', 0.029), ('interpreted', 0.029), ('eto', 0.028), ('ara', 0.028), ('prem', 0.028), ('unsuitable', 0.028), ('approved', 0.028), ('proceeded', 0.028), ('alleviates', 0.028), ('technically', 0.028), ('tahe', 0.028), ('matthew', 0.028), ('conduct', 0.028), ('hypotheses', 0.028), ('let', 0.027), ('darpa', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; Prem Natarajan

Abstract: The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

2 0.16713713 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

3 0.14003621 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

Author: Coskun Mermer ; Murat Saraclar

Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.

4 0.13106127 313 acl-2011-Two Easy Improvements to Lexical Weighting

Author: David Chiang ; Steve DeNeefe ; Michael Pust

Abstract: We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu (2003) for machine translation: one which smooths the probability of translating word f to word e by simplifying English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to +0.8 BLEU, with an average improvement of +0.6 BLEU across two language pairs, two genres, and two translation systems.

5 0.12948215 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

Author: Sara Stymne

Abstract: In this thesis proposal Ipresent my thesis work, about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition Ialso focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.

6 0.12765089 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

7 0.12576586 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

8 0.11972418 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

9 0.11292429 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

10 0.11053101 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

11 0.11009674 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

12 0.10407244 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

13 0.099805564 266 acl-2011-Reordering with Source Language Collocations

14 0.098351754 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

15 0.097230807 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

16 0.09606953 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

17 0.095310494 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

18 0.092377201 163 acl-2011-Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

19 0.090400845 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

20 0.08840239 44 acl-2011-An exponential translation model for target language morphology


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.221), (1, -0.12), (2, 0.108), (3, 0.13), (4, 0.028), (5, -0.012), (6, 0.042), (7, -0.013), (8, 0.045), (9, 0.061), (10, 0.018), (11, -0.069), (12, 0.058), (13, -0.032), (14, 0.03), (15, 0.035), (16, -0.059), (17, 0.026), (18, 0.001), (19, -0.059), (20, 0.037), (21, -0.099), (22, 0.117), (23, -0.053), (24, -0.008), (25, -0.003), (26, 0.042), (27, 0.042), (28, 0.003), (29, 0.029), (30, -0.027), (31, -0.051), (32, -0.009), (33, 0.001), (34, 0.07), (35, -0.069), (36, 0.013), (37, 0.004), (38, 0.033), (39, -0.05), (40, -0.033), (41, -0.027), (42, -0.02), (43, -0.058), (44, -0.068), (45, 0.12), (46, -0.0), (47, -0.03), (48, -0.08), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.954898 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; Prem Natarajan

Abstract: The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

2 0.72138858 313 acl-2011-Two Easy Improvements to Lexical Weighting

Author: David Chiang ; Steve DeNeefe ; Michael Pust

Abstract: We introduce two simple improvements to the lexical weighting features of Koehn, Och, and Marcu (2003) for machine translation: one which smooths the probability of translating word f to word e by simplifying English morphology, and one which conditions it on the kind of training data that f and e co-occurred in. These new variations lead to improvements of up to +0.8 BLEU, with an average improvement of +0.6 BLEU across two language pairs, two genres, and two translation systems.

3 0.6990844 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

Author: Hal Daume III ; Jagadeesh Jagarlamudi

Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.

4 0.68998379 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

Author: Bing Xiang ; Abraham Ittycheriah

Abstract: In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination ofmultiple mixture components. Each component contains a large set of features trained in a maximumentropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between the maximum-likelihood training and the discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.

5 0.68758476 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

6 0.67157257 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

7 0.66074896 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

8 0.65742731 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

9 0.652915 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

10 0.65220594 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

11 0.64429075 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

12 0.63721681 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful

13 0.62951589 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

14 0.626302 220 acl-2011-Minimum Bayes-risk System Combination

15 0.62363958 266 acl-2011-Reordering with Source Language Collocations

16 0.60943997 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

17 0.59885979 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation

18 0.59674168 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

19 0.59613532 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

20 0.59607643 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.029), (7, 0.197), (17, 0.062), (26, 0.033), (37, 0.108), (39, 0.03), (41, 0.071), (55, 0.014), (59, 0.042), (72, 0.064), (91, 0.025), (96, 0.221)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87661862 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

Author: Sankaranarayanan Ananthakrishnan ; Rohit Prasad ; Prem Natarajan

Abstract: The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

2 0.80455291 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

Author: Hal Daume III ; Jagadeesh Jagarlamudi

Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.

3 0.8033421 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld

Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded ( Jobs Apple ) and CEO-o f ( Jobs Apple ) . , , This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extrac- , tion model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

4 0.80289483 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

Author: Jason Naradowsky ; Kristina Toutanova

Abstract: This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, morphological segmentation) while learning a morpheme segmentation over the target language. Our model outperforms a competitive word alignment system in alignment quality. Used in a monolingual morphological segmentation setting it substantially improves accuracy over previous state-of-the-art models on three Arabic and Hebrew datasets.

5 0.80154574 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

6 0.79970717 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

7 0.79864752 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

8 0.79785728 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation

9 0.79717302 23 acl-2011-A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models

10 0.79660261 141 acl-2011-Gappy Phrasal Alignment By Agreement

11 0.79633904 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech

12 0.79579389 76 acl-2011-Comparative News Summarization Using Linear Programming

13 0.79566157 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents

14 0.79550064 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

15 0.79531002 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

16 0.79516923 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

17 0.79513788 101 acl-2011-Disentangling Chat with Local Coherence Models

18 0.79497415 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

19 0.79495752 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

20 0.79482758 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing