acl acl2010 acl2010-151 knowledge-graph by maker-knowledge-mining

151 acl-2010-Intelligent Selection of Language Model Training Data


Source: pdf

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. [sent-3, score-0.269]

2 Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. [sent-4, score-0.237]

3 We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods. [sent-5, score-0.278]

4 1 Introduction Statistical N-gram language models are widely used in applications that produce natural-language text as output, particularly speech recognition and machine translation. [sent-6, score-0.14]

5 It seems to be a universal truth that output quality can always be improved by using more language model training data, but only if the training data is reasonably well-matched to the desired output. [sent-7, score-0.26]

6 This presents a problem, because in virtually any particular application the amount of in-domain data is limited. [sent-8, score-0.036]

7 Log-linear interpolation is particularly popular in statistical machine translation (e. [sent-10, score-0.09]

8 , 2007), because the interpolation weights can easily be discriminatively trained to optimize an end-to-end translation objective function (such as BLEU) by making the log probability according to each language model a separate feature function in the overall translation model. [sent-13, score-0.509]

9 The normal practice when using multiple language models in machine translation seems to be to train models on as much data as feasible from each source, and to depend on feature weight optimization to down-weight the impact of data that is less well-matched to the translation application. [sent-14, score-0.281]

10 In this paper, however, we show that for a data source that is not entirely in-domain, we can improve the match between the language model from that data source and the desired application output by intelligently selecting a subset of the available data as language model training data. [sent-15, score-0.509]

11 (2002) both used a method similar to ours, in which the metric used to score text segments is their perplexity according to the in-domain language model. [sent-21, score-0.918]

12 The candidate text segments with perplexity less than some threshold are selected. [sent-22, score-0.911]

13 Klakow (2000) estimates a unigram language model from the entire non-domain-specific corpus to be selected. [sent-24, score-0.189]

14 Those segments whose removal would decrease the log likelihood of the in-domain data more than some threshold are selected. [sent-27, score-0.418]

15 Our method is a fairly simple variant of scoring by perplexity according to an in-domain language model. [sent-28, score-0.734]

16 First, note that selecting segments based on a perplexity threshold is equivalent to selecting based on a cross-entropy threshold. [sent-29, score-0.978]

17 Perplexity and cross-entropy are monotonically related, since the perplexity of a string s according to a model M is simply b^HM(s), where HM(s) is the cross-entropy of s according to M and b is the base with respect to which the cross-entropy is measured (e.g., bits or nats). [sent-30, score-0.77]

18 To state this formally, let I be an in-domain data set and N be a non-domain-specific (or otherwise not entirely in-domain) data set. [sent-34, score-0.104]
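
As a minimal illustration of the relationship just stated (a sketch, not from the paper; the function names are illustrative), the two quantities convert into each other as follows, so thresholding one is the same as thresholding the other:

```python
import math

def perplexity_from_cross_entropy(h: float, base: float = 2.0) -> float:
    """Perplexity of a string whose per-word cross-entropy is h (base `base`)."""
    return base ** h

def cross_entropy_from_perplexity(ppl: float, base: float = 2.0) -> float:
    """Inverse mapping: per-word cross-entropy (base `base`) from perplexity."""
    return math.log(ppl, base)

# The mapping is strictly increasing, so a cross-entropy threshold and the
# corresponding perplexity threshold select exactly the same strings.
assert perplexity_from_cross_entropy(7.0) < perplexity_from_cross_entropy(7.1)
```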

19 Let HI(s) be the per-word cross-entropy, according to a language model trained on I, of a text segment s drawn from N. [sent-35, score-0.407]

20 Let HN(s) be the per-word cross-entropy of s according to a language model trained on a random sample of N. [sent-36, score-0.289]

21 We partition N into text segments (e.g., sentences), and score the segments according to HI(s) − HN(s), selecting all text segments whose score is less than a threshold T. [sent-39, score-0.678]
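
A minimal sketch of this selection rule; the scoring functions h_in and h_out stand in for per-word cross-entropy under language models trained on I and on a random sample of N, and everything here is illustrative rather than the authors' implementation:

```python
from typing import Callable, Iterable, List

def select_segments(
    segments: Iterable[str],
    h_in: Callable[[str], float],   # per-word cross-entropy under the in-domain LM (HI)
    h_out: Callable[[str], float],  # per-word cross-entropy under the LM trained on a sample of N (HN)
    threshold: float,
) -> List[str]:
    """Keep every segment s with HI(s) - HN(s) < threshold.

    Lower scores mean s looks more like the in-domain data I than like
    the rest of the non-domain-specific corpus N.
    """
    return [s for s in segments if h_in(s) - h_out(s) < threshold]
```

Sweeping the threshold traces out the size/quality trade-off explored in the experiments below.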

22 This method can be justified by reasoning similar to that used to derive methods for training binary text classifiers without labeled negative examples (Denis et al. [sent-40, score-0.114]

23 Let us imagine that our non-domain-specific corpus N contains an in-domain subcorpus NI, drawn from the same distribution as our in-domain corpus I. [sent-42, score-0.141]

24 Since NI is statistically just like our in-domain data I, it would seem to be a good candidate for the data that we want to extract from N. [sent-43, score-0.105]

25 Hence P(NI|s,N) = P(s|I) P(NI|N) / P(s|N). If we could estimate all the probabilities on the right-hand side of this equation, we could use it to select text segments that have a high probability of being in NI. [sent-45, score-0.308]
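
The step behind this equation is Bayes' rule together with the assumption that NI is drawn from the same distribution as I, so that P(s|NI, N) = P(s|I); spelled out as a short derivation sketch:

```latex
P(N_I \mid s, N)
  = \frac{P(s \mid N_I, N)\, P(N_I \mid N)}{P(s \mid N)}
  = \frac{P(s \mid I)\, P(N_I \mid N)}{P(s \mid N)}
```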

26 We can estimate P(s|I) and P(s|N) by training language models on I and a sample of N, respectively. [sent-46, score-0.169]

27 That leaves only P(NI|N) to estimate, but we really don’t care what P(NI|N) is, because knowing that would still leave us wondering what threshold to set on P(NI|s,N). [sent-47, score-0.121]

28 We don’t care about classification accuracy; we care only about the quality of the resulting language model, so we might as well just attempt to find a threshold on P(s|I)/P(s|N) that optimizes the fit of the resulting language model to held-out in-domain data. [sent-48, score-0.252]

29 Equivalently, we can work in the log domain with the quantity log(P(s|I)) − log(P(s|N)). [sent-49, score-0.146]

30 (The reason that we need to normalize for length is that the value of log(P(s|I)) − log(P(s|N)) tends to correlate very strongly with text segment length. [sent-52, score-0.151]

31 If the candidate text segments vary greatly in length—e.g., if we partition N into sentences—this correlation can be a serious problem. [sent-53, score-0.272] [sent-55, score-0.034]

33 We estimated this effect on a 1000-sentence sample of our experimental data described below, and found the correlation between sentence log probability difference and sentence length to be r = −0. [sent-56, score-0.341]

34 Hence, using sentence probability ratios or log probability differences as our scoring function would result in selecting disproportionately very short sentences. [sent-60, score-0.412]
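
A hedged sketch of the diagnostic described in the last few sentences: given per-sentence log probabilities under the two models (stand-ins here) and sentence lengths, compare the length correlation of the raw difference with that of its per-word normalization:

```python
import numpy as np

def length_correlations(log_p_in, log_p_out, lengths):
    """Pearson r between score and sentence length, for the raw log
    probability difference and for its length-normalized (per-word) form.

    log_p_in, log_p_out: arrays of log P(s|I) and log P(s|N) per sentence;
    lengths: word counts per sentence.
    """
    raw = np.asarray(log_p_in) - np.asarray(log_p_out)
    lengths = np.asarray(lengths, dtype=float)
    per_word = raw / lengths
    r_raw = np.corrcoef(raw, lengths)[0, 1]
    r_norm = np.corrcoef(per_word, lengths)[0, 1]
    return r_raw, r_norm
```

A strongly negative r_raw alongside a near-zero r_norm is the pattern that motivates normalizing by length.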

35 We tested this in an experiment not described here in detail, and found it not to be significantly better as a selection criterion than random selection. [sent-61, score-0.171]

36 For the in-domain corpus, we chose the English side of the English-French parallel text from release v5 of the Europarl corpus (Koehn, 2005). [sent-63, score-0.095]

37 We used the text from 1999 through 2008 as in-domain training data, and we used the first 2000 sentences from January 2009 as test data. [sent-65, score-0.114]

38 We used a simple tokenization scheme on all data, splitting on white space and on boundaries between alphanumeric and nonalphanumeric (e. [sent-68, score-0.069]

39 With this tokenization, the sizes of our data sets in terms of sentences and tokens are shown in Table 1. [sent-71, score-0.109]
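
A plausible rendering of this tokenization; the exact character classes, and whether non-alphanumeric characters are kept as runs or split individually, are assumptions:

```python
import re

# One token per maximal run of alphanumeric characters or of
# non-alphanumeric, non-space characters; whitespace always splits.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]+")

def tokenize(text: str):
    return _TOKEN.findall(text)

# e.g. tokenize("don't re-elect!") == ["don", "'", "t", "re", "-", "elect", "!"]
```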

40 To implement our data selection method we required one language model trained on the Europarl training data and one trained on the Gigaword data. [sent-73, score-0.443]

41 To further increase the comparability of these Europarl and Gigaword language models, we restricted the vocabulary of both models to the tokens appearing at least twice in the Europarl training data, treating all other tokens as instances of <UNK>. [sent-75, score-0.409]
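
A sketch of that vocabulary restriction over tokenized sentences; the <UNK> symbol and helper names are illustrative:

```python
from collections import Counter

UNK = "<UNK>"

def build_vocab(sentences, min_count=2):
    """Tokens appearing at least `min_count` times in the in-domain data."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}

def restrict(sentence, vocab):
    """Map every out-of-vocabulary token to <UNK>."""
    return [tok if tok in vocab else UNK for tok in sentence]
```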

42 With this vocabulary, 4-gram language models were trained on both the Europarl training data and the Gigaword random sample using backoff absolute discounting (Ney et al., 1994). [sent-76, score-0.413]

43 The discounted probability mass at the unigram level was added to the probability of <UNK>. [sent-79, score-0.17]
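
For reference, the standard form of backoff absolute discounting with discount D (0.7 in these experiments), where h is the n-gram history, h' the history shortened by one word, and c(·) the training counts; the paper's exact variant may differ in details:

```latex
p(w \mid h) =
\begin{cases}
\dfrac{\max\!\big(c(h,w) - D,\, 0\big)}{c(h)} + \lambda(h)\, p(w \mid h')
  & \text{if } c(h) > 0,\\[1.5ex]
p(w \mid h') & \text{otherwise,}
\end{cases}
\qquad
\lambda(h) = \frac{D \,\big|\{w : c(h,w) > 0\}\big|}{c(h)}.
```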

44 A count cutoff of 2 occurrences was applied to the trigrams and 4-grams in estimating these models. [sent-80, score-0.093]

45 We computed the cross-entropy of each sentence in the Gigaword corpus according to both models, and scored each sentence by the difference in cross-entropy, HEp(s) − HGw(s). [sent-81, score-0.183]

46 We then selected subsets of the Gigaword data corresponding to 8 cutoff points in the cross-entropy difference scores, and trained 4-gram models (again using absolute discounting with a discount of 0.7) on each of these subsets and on the full Gigaword corpus. [sent-82, score-0.492] [sent-83, score-0.081]

48 We compared our selection method to three other methods. [sent-85, score-0.119]

49 As a baseline, we trained language models on random subsets of the Gigaword corpus of approximately equal size to the data sets produced by the cutoffs we selected for the cross-entropy difference scores. [sent-86, score-0.423]

50 Next, we scored all the Gigaword sentences by the cross-entropy according to the Europarl-trained model alone. [sent-87, score-0.152]

51 As we noted above, this is equivalent to the in-domain perplexity scoring method used by Lin et al. [sent-88, score-0.73]

52 Finally, we implemented Klakow’s (2000) method, scoring each Gigaword sentence by removing it from the Gigaword corpus and computing the difference in the log likelihood of the Europarl corpus according to unigram models trained on the Gigaword corpus with and without that sentence. [sent-91, score-0.668]
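
A hedged sketch of that leave-one-out computation for a maximum-likelihood unigram model (no smoothing; all names are illustrative, not the paper's code). Only the counts of the removed words and the total-count normalizer change, so each sentence can be scored incrementally:

```python
from collections import Counter
import math

def removal_loss(sent, gw_counts, gw_total, ep_counts, ep_total):
    """Decrease in the Europarl log likelihood, under an ML unigram model
    estimated on Gigaword, if the token list `sent` is removed from the
    Gigaword counts (gw_counts: Counter, gw_total: its sum; ep_counts,
    ep_total: the same for Europarl)."""
    removed = Counter(sent)
    # Shrinking the total raises every remaining probability (negative term).
    loss = ep_total * (math.log(gw_total - len(sent)) - math.log(gw_total))
    for w, c in removed.items():
        n = ep_counts.get(w, 0)
        if n == 0:
            continue                  # word unused by Europarl: no effect
        new_c = gw_counts[w] - c
        if new_c <= 0:                # would zero out a word Europarl needs;
            return math.inf           # a real implementation would smooth
        loss += n * (math.log(gw_counts[w]) - math.log(new_c))
    return loss
```

Sentences whose loss exceeds the chosen threshold are the ones selected.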

53 With the latter two methods, we chose cutoff points in the resulting scores to produce data sets approximately equal in size to those obtained using our selection method. [sent-92, score-0.344]

54 4 Results For all four selection methods, plots of test set perplexity vs. the number of training data tokens selected are displayed in Figure 1. [sent-93, score-0.703] [sent-94, score-0.207]

56 (Note that the training data token counts are displayed on a logarithmic scale.) [sent-95, score-0.177]

57 The test set perplexity for the language model trained on the full Gigaword corpus is 135. [sent-96, score-0.776]

58 As we might expect, reducing training data by random sampling always increases perplexity. [sent-97, score-0.176]

59 Selecting Gigaword sentences by their cross-entropy according to the Europarl-trained model is effective in reducing both test set perplexity and training corpus size, with an optimum perplexity of 124, obtained with a model built from 36% of the Gigaword corpus. [sent-98, score-0.584] [sent-99, score-0.085] [sent-105, score-1.661]

[Figure 1: Test set perplexity vs. training set size]

[Table 2: Results adjusted for vocabulary coverage]

62 Klakow’s method is even more effective, with an optimum perplexity of 111, obtained with a model built from 21% of the Gigaword corpus. [sent-106, score-0.698]

63 The cross-entropy difference selection method, however, is yet more effective, with an optimum perplexity of 101, obtained with a model built from less than 7% of the Gigaword corpus. [sent-107, score-0.865]

64 The comparisons implied by Figure 1, however, are only approximate, because each perplexity (even along the same curve) is computed with respect to a different vocabulary, resulting in a different out-of-vocabulary (OOV) rate. [sent-108, score-0.584]

65 OOV tokens in the test data are excluded from the perplexity computation, so the perplexity measurements are not strictly comparable. [sent-109, score-1.277]

66 Out of the 55566 test set tokens, the number of OOV tokens ranges from 418 (0.75%), for the smallest training set based on in-domain cross-entropy scoring, to 20 (0.04%). [sent-110, score-0.073] [sent-111, score-0.133]

68 If we consider only the training sets that appear to produce the lowest perplexity for each selection method, however, the spread of OOV counts is much narrower, ranging from 53 (0.10%), for the best training set based on cross-entropy difference scoring, to 20 (0.04%). [sent-113, score-0.873] [sent-114, score-0.181]

70 To control for the difference in vocabulary, we estimated a modified 4-gram language model for each selection method (other than random selection) using the training set that appeared to produce the lowest perplexity for that selection method in our initial experiments. [sent-116, score-1.132]

71 In the modified language models, the unigram model based on the selected training set is smoothed by absolute discounting, and backed off to an unsmoothed unigram model based on the full Gigaword corpus. [sent-117, score-0.457]

72 This produces language models that are normalized over the same vocabulary as a model trained on the full Gigaword corpus; thus the test set has the same OOVs for each model. [sent-118, score-0.333]
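
A sketch of that vocabulary-controlled unigram, written in interpolated rather than strict backoff form for brevity; the function names and the reuse of the 0.7 discount are assumptions:

```python
def vocab_controlled_unigram(counts_sel, total_sel, p_full, discount=0.7):
    """Absolute-discounted unigram over the selected training set, backed
    off to an unsmoothed ML unigram on the full Gigaword corpus, so every
    selection method's model is normalized over the same vocabulary.

    counts_sel: dict word -> count in the selected set (seen words only)
    total_sel:  sum of those counts
    p_full:     word -> ML unigram probability on the full corpus
    """
    # Probability mass freed by discounting, redistributed via the backoff.
    backoff_mass = discount * len(counts_sel) / total_sel

    def p(w):
        c = counts_sel.get(w, 0)
        return max(c - discount, 0.0) / total_sel + backoff_mass * p_full(w)

    return p
```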

73 Test set perplexity for each of these modified language models is compared to that of the original version of the model in Table 2. [sent-119, score-0.682]

74 5 Conclusions The cross-entropy difference selection method introduced here seems to produce language models that are a better match to texts in a restricted domain, and that require less data for training, than any of the other data selection methods tested. [sent-121, score-0.469]

75 This study is preliminary, however, in that we have not yet shown improved end-to-end task performance applying this approach, such as improved BLEU scores in a machine translation task. [sent-122, score-0.075]

76 Chinese language model adaptation based on document classification and multiple domain-specific language models. [sent-154, score-0.096]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('perplexity', 0.584), ('gigaword', 0.388), ('ni', 0.237), ('europarl', 0.204), ('segments', 0.181), ('log', 0.146), ('vocabulary', 0.134), ('segment', 0.122), ('selection', 0.119), ('klakow', 0.101), ('oov', 0.1), ('unigram', 0.098), ('cutoff', 0.093), ('scoring', 0.084), ('selecting', 0.079), ('crossentropy', 0.077), ('discount', 0.077), ('elkin', 0.077), ('tokens', 0.073), ('trained', 0.071), ('discounting', 0.07), ('hn', 0.068), ('according', 0.066), ('care', 0.066), ('indomain', 0.062), ('optimum', 0.06), ('text', 0.058), ('gao', 0.057), ('training', 0.056), ('threshold', 0.055), ('cutoffs', 0.055), ('model', 0.054), ('random', 0.052), ('denis', 0.052), ('subsets', 0.051), ('hi', 0.049), ('difference', 0.048), ('sample', 0.046), ('translation', 0.046), ('interpolation', 0.044), ('ldc', 0.044), ('models', 0.044), ('counts', 0.043), ('june', 0.043), ('displayed', 0.042), ('domainspecific', 0.042), ('tokenization', 0.04), ('absolute', 0.038), ('produce', 0.038), ('brants', 0.037), ('corpus', 0.037), ('drawn', 0.036), ('data', 0.036), ('probability', 0.036), ('translations', 0.035), ('partition', 0.034), ('bhm', 0.034), ('gilleron', 0.034), ('xtp', 0.034), ('noto', 0.034), ('nats', 0.034), ('hep', 0.034), ('andi', 0.034), ('bfy', 0.034), ('chien', 0.034), ('istanbul', 0.034), ('nmo', 0.034), ('tshheo', 0.034), ('turkey', 0.034), ('estimate', 0.033), ('lowest', 0.033), ('source', 0.033), ('candidate', 0.033), ('scored', 0.032), ('entirely', 0.032), ('reducing', 0.032), ('intelligently', 0.031), ('saves', 0.031), ('tios', 0.031), ('subcorpus', 0.031), ('ppl', 0.031), ('ashok', 0.031), ('popat', 0.031), ('tto', 0.031), ('dietrich', 0.031), ('full', 0.03), ('scores', 0.029), ('seems', 0.029), ('estimated', 0.029), ('alphanumeric', 0.029), ('january', 0.029), ('unsmoothed', 0.029), ('tphe', 0.029), ('hto', 0.029), ('comparability', 0.029), ('narrower', 0.029), ('cois', 0.029), ('desired', 0.029), ('score', 0.029), ('size', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

2 0.14825645 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

3 0.13669191 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

Author: Kornel Laskowski

Abstract: Substantial research effort has been invested in recent decades into the computational study and automatic processing of multi-party conversation. While most aspects of conversational speech have benefited from a wide availability of analytic, computationally tractable techniques, only qualitative assessments are available for characterizing multi-party turn-taking. The current paper attempts to address this deficiency by first proposing a framework for computing turn-taking model perplexity, and then by evaluating several multi-participant modeling approaches. Experiments show that direct multi-participant models do not generalize to held out data, and likely never will, for practical reasons. In contrast, the Extended-Degree-of-Overlap model represents a suitable candidate for future work in this area, and is shown to successfully predict the distribution of speech in time and across participants in previously unseen conversations.

4 0.094949335 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure

Author: Minwoo Jeong ; Ivan Titov

Abstract: Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting and predicting links between the segments, would help to visualize such documents and construct friendlier user interfaces. To address this problem, we propose an unsupervised Bayesian model for joint discourse segmentation and alignment. We apply our method to the “English as a second language” podcast dataset where each episode is composed of two parallel parts: a story and an explanatory lecture. The predicted topical links uncover hidden re- lations between the stories and the lectures. In this domain, our method achieves competitive results, rivaling those of a previously proposed supervised technique.

5 0.093370058 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

Author: Joern Wuebker ; Arne Mauser ; Hermann Ney

Abstract: Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering mod- els in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.

6 0.084505901 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

7 0.077440269 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

8 0.072509535 233 acl-2010-The Same-Head Heuristic for Coreference

9 0.066566095 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure

10 0.06572324 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

11 0.065684579 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

12 0.065128081 54 acl-2010-Boosting-Based System Combination for Machine Translation

13 0.064524993 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

14 0.063301884 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

15 0.060009807 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

16 0.059523601 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries

17 0.058245327 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

18 0.05761439 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion

19 0.057435464 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

20 0.056700543 25 acl-2010-Adapting Self-Training for Semantic Role Labeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.172), (1, -0.041), (2, -0.042), (3, -0.013), (4, 0.016), (5, -0.008), (6, -0.023), (7, -0.071), (8, 0.057), (9, 0.056), (10, 0.013), (11, 0.073), (12, 0.091), (13, -0.06), (14, -0.02), (15, -0.005), (16, -0.007), (17, 0.012), (18, -0.032), (19, -0.012), (20, 0.031), (21, -0.031), (22, -0.005), (23, -0.032), (24, -0.033), (25, 0.012), (26, 0.175), (27, -0.028), (28, 0.056), (29, 0.037), (30, -0.031), (31, 0.063), (32, 0.086), (33, -0.015), (34, 0.03), (35, 0.072), (36, 0.009), (37, -0.006), (38, -0.147), (39, 0.054), (40, -0.017), (41, -0.086), (42, 0.006), (43, -0.028), (44, -0.133), (45, 0.146), (46, -0.01), (47, -0.038), (48, -0.054), (49, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93125468 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting nondomain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domainspecific and non-domain-specifc language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

2 0.78260183 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

3 0.72940171 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics

Author: Frank Rudzicz

Abstract: We introduce a novel mechanism for incorporating articulatory dynamics into speech recognition with the theory of task dynamics. This system reranks sentencelevel hypotheses by the likelihoods of their hypothetical articulatory realizations which are derived from relationships learned with aligned acoustic/articulatory data. Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract. Our system based on task dynamics reduces worderror rates significantly by 10.2% relative to the best baseline models.

4 0.62806076 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

5 0.62587005 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

Author: Kornel Laskowski

Abstract: Substantial research effort has been invested in recent decades into the computational study and automatic processing of multi-party conversation. While most aspects of conversational speech have benefited from a wide availability of analytic, computationally tractable techniques, only qualitative assessments are available for characterizing multi-party turn-taking. The current paper attempts to address this deficiency by first proposing a framework for computing turn-taking model perplexity, and then by evaluating several multi-participant modeling approaches. Experiments show that direct multi-participant models do not generalize to held out data, and likely never will, for practical reasons. In contrast, the Extended-Degree-of-Overlap model represents a suitable candidate for future work in this area, and is shown to successfully predict the distribution of speech in time and across participants in previously unseen conversations.

6 0.58352959 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

7 0.53497154 61 acl-2010-Combining Data and Mathematical Models of Language Change

8 0.49678037 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

9 0.49584875 256 acl-2010-Vocabulary Choice as an Indicator of Perspective

10 0.49321637 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

11 0.48485196 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

12 0.47895131 56 acl-2010-Bridging SMT and TM with Translation Recommendation

13 0.46962497 54 acl-2010-Boosting-Based System Combination for Machine Translation

14 0.46044159 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

15 0.45130324 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

16 0.45065838 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages

17 0.44386277 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

18 0.43928015 195 acl-2010-Phylogenetic Grammar Induction

19 0.43089637 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

20 0.42059609 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.036), (42, 0.013), (59, 0.599), (73, 0.041), (78, 0.014), (83, 0.067), (84, 0.015), (98, 0.115)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98895472 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting nondomain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domainspecific and non-domain-specifc language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

2 0.98838794 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging

Author: Michael Lamar ; Yariv Maron ; Mark Johnson ; Elie Bienenstock

Abstract: We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also produce a range of finer-grained taggings, with potential applications to various tasks. 1

3 0.977305 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

4 0.97562248 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

Author: Joern Wuebker ; Arne Mauser ; Hermann Ney

Abstract: Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering mod- els in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.

5 0.97373259 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik

Abstract: Adam Lopez University of Edinburgh alopez@inf.ed.ac.uk Juri Ganitkevitch Johns Hopkins University juri@cs.jhu.edu Ferhan Ture University of Maryland fture@cs.umd.edu Phil Blunsom Oxford University pblunsom@comlab.ox.ac.uk Vladimir Eidelman University of Maryland vlad@umiacs.umd.edu Philip Resnik University of Maryland resnik@umiacs.umd.edu classes in a unified way.1 Although open source decoders for both phraseWe present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

6 0.87027848 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

7 0.86554021 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

8 0.80614424 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

9 0.8023237 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

10 0.78599405 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

11 0.7829591 114 acl-2010-Faster Parsing by Supertagger Adaptation

12 0.77669948 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

13 0.77602112 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

14 0.76919717 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

15 0.767317 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

16 0.76197064 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

17 0.7593199 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

18 0.75851619 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers

19 0.7517516 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging

20 0.74465758 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment