acl acl2010 acl2010-91 knowledge-graph by maker-knowledge-mining

91 acl-2010-Domain Adaptation of Maximum Entropy Language Models


Source: pdf

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. [sent-3, score-0.797]

2 Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases. [sent-4, score-0.208]

3 1 Introduction In large vocabulary speech recognition, a language model (LM) is typically estimated from large amounts of written text data. [sent-5, score-0.382]

4 However, recognition is typically applied to speech that is stylistically different from written language. [sent-6, score-0.417]

5 For example, in an often-tried setting, speech recognition is applied to broadcast news, that includes introductory segments, conversations and spontaneous interviews. [sent-7, score-0.382]

6 To decrease the mismatch between training and test data, often a small amount of speech data is human-transcribed. [sent-8, score-0.158]

7 An LM is then built by interpolating the models estimated from the large corpus of written language and the small corpus of transcribed data. [sent-9, score-0.256]

8 However, in practice, different models might be of different importance depending on the word context. [sent-10, score-0.093]

9 Global interpolation doesn’t take such variability into account and all predictions are weighted across models identically, regardless of the context. [sent-11, score-0.24]
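
To make the baseline concrete, here is a minimal sketch of global linear interpolation; the function name, the probability values, and the weight lam are invented for illustration and are not taken from the paper.

```python
def interpolate(p_background, p_indomain, lam):
    """Global linear interpolation: the same weight lam is applied to every
    prediction, no matter how informative each model is for this context."""
    return lam * p_indomain + (1.0 - lam) * p_background

# Hypothetical probabilities P(word | history) from the two source-specific LMs.
p_bg, p_in = 0.012, 0.040
print(interpolate(p_bg, p_in, lam=0.3))  # 0.3 * 0.040 + 0.7 * 0.012 = 0.0204
```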

10 In this paper we investigate a recently proposed Bayesian adaptation approach (Daume III, 2007; Finkel and Manning, 2009) for adapting a conditional maximum entropy (ME) LM (Rosenfeld, 1996) to a new domain, given a large corpus of out-of-domain training data and a small corpus of in-domain data. [sent-12, score-0.397]

11 The main contribution of this paper is that we show how the suggested hierarchical adaptation can be used with suitable priors and combined with the class-based speedup technique (Goodman, 2001) to adapt ME LMs in large-vocabulary speech recognition when the amount of target data is small. [sent-15, score-0.711]

12 The results outperform the conventional linear interpolation of background and target models for both N-gram and ME models. [sent-16, score-0.24]

13 It seems that with the adapted ME models, the same recognition accuracy for the target evaluation data can be obtained with 50% less adaptation data than in interpolated ME models. [sent-17, score-0.584]

14 2 Review of Conditional Maximum Entropy Language Models Maximum entropy (ME) modeling is a framework that has been used in a wide area of natural language processing (NLP) tasks. [sent-18, score-0.095]

15 A conditional ME model has the following form: P(x|h) = exp(Σ_i λ_i f_i(x, h)) / Σ_{x'} exp(Σ_j λ_j f_j(x', h))   (1) where x is an outcome (in the case of a LM, a word), h is a context (the word history), and x' ranges over the set of all possible outcomes (words). [sent-19, score-0.048]
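
The following sketch spells out equation (1) for a toy model; the vocabulary, the single bigram-indicator feature, and its weight are made up for illustration rather than taken from the paper.

```python
import math

def me_prob(x, h, vocab, features, weights):
    """Conditional ME probability as in equation (1): exponentiate the weighted
    feature sum for the candidate word and normalize over the whole vocabulary."""
    def score(word):
        return sum(w * f(word, h) for w, f in zip(weights, features))
    z = sum(math.exp(score(v)) for v in vocab)  # normalization over all outcomes
    return math.exp(score(x)) / z

# Toy setup: one bigram indicator feature with weight 1.5.
vocab = ["news", "weather", "sports"]
features = [lambda word, hist: 1.0 if (hist[-1], word) == ("the", "news") else 0.0]
weights = [1.5]
print(me_prob("news", ("in", "the"), vocab, features, weights))
```

The sum over the full vocabulary in the normalizer z is exactly the cost that the class-based speedup discussed below avoids.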

16 During ME training, the optimal weights λi corresponding to features fi(x, h) are learned. [sent-21, score-0.058]

17 More precisely, finding the ME model is equal to finding weights that maximize the log-likelihood L(X; Λ) of the training data X. [sent-22, score-0.093]

18 The weights are learned via the improved iterative scaling algorithm or some of its modern fast counterparts (i. [sent-23, score-0.058]

19 Since LMs typically have a vocabulary of tens of thousands of words, the use of a normalization factor over all possible outcomes makes estimating a ME LM very memory and time consuming. [sent-26, score-0.186]

20 Goodman (2001) proposed a class-based method that drastically reduces the resource requirements for training such models. [sent-27, score-0.082]

21 The idea is to cluster words in the vocabulary into classes (e. [sent-30, score-0.116]

22 It turns out that normalizing the second model is also easier: for a context h, C(w), we only need to normalize over words that belong to class C(w), since other words cannot occur in this context. [sent-36, score-0.054]

23 This decomposition can be further extended by using hierarchical classes. [sent-37, score-0.107]
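
A minimal sketch of a single level of this decomposition is shown below; the class_model and word_model callables, and the toy distributions in the usage example, are hypothetical stand-ins for trained ME models, each normalized over its own, much smaller outcome set. With hierarchical classes the same factorization is simply applied again inside each class.

```python
def class_based_prob(w, h, word_class, class_model, word_model):
    """Goodman-style factorization: P(w | h) = P(C(w) | h) * P(w | h, C(w)).
    The second factor is normalized only over the members of C(w), not over
    the tens of thousands of words in the full vocabulary."""
    c = word_class[w]             # class of the predicted word
    p_class = class_model(c, h)   # P(C(w) | h), normalized over classes
    p_word = word_model(w, h, c)  # P(w | h, C(w)), normalized within the class
    return p_class * p_word

# Toy usage with hard-coded distributions standing in for trained ME models.
word_class = {"news": "NOUN", "run": "VERB"}
class_model = lambda c, h: {"NOUN": 0.6, "VERB": 0.4}[c]
word_model = lambda w, h, c: 0.05   # pretend P(w | h, C(w)) = 0.05
print(class_based_prob("news", ("the",), word_class, class_model, word_model))  # 0.03
```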

24 To avoid overfitting, ME models are usually smoothed (regularized). [sent-38, score-0.093]

25 The most widely used smoothing method for ME LMs is Gaussian priors (Chen and Rosenfeld, 2000): a zero-mean prior with a given variance is added to all feature weights, and the model optimization criterion becomes: L'(X; Λ) = L(X; Λ) − Σ_{i=1}^{F} λ_i² / (2σ_i²)   (3) where F is the number of feature functions. [sent-39, score-0.34]

26 The optimal variance is usually estimated on a development set. [sent-41, score-0.261]

27 Intuitively, this method encourages feature weights to be smaller, by penalizing weights with big absolute values. [sent-42, score-0.161]
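
As a sketch of the penalized objective in equation (3), the snippet below subtracts the Gaussian penalty from a given training log-likelihood; the numbers are placeholders, and in the paper the variances are tuned on a development set.

```python
def penalized_log_likelihood(log_likelihood, weights, variances):
    """Equation (3): subtract the zero-mean Gaussian penalty
    lambda_i^2 / (2 * sigma_i^2) for every feature weight."""
    penalty = sum(l * l / (2.0 * s2) for l, s2 in zip(weights, variances))
    return log_likelihood - penalty

# Placeholder values; a single shared variance is used for every feature here.
print(penalized_log_likelihood(-1234.5, weights=[1.5, -0.7, 2.1], variances=[1.0] * 3))
```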

28 3 Domain Adaptation of Maximum Entropy Models Recently, a hierarchical Bayesian adaptation method was proposed that can be applied to a large family of discriminative learning tasks (such as ME models, SVMs) (Daume III, 2007; Finkel and Manning, 2009). [sent-43, score-0.327]

29 In NLP problems, data often comes from different sources (e. [sent-44, score-0.046]

30 There are three classic approaches for building models from multiple sources. [sent-47, score-0.093]

31 The third and often the best performing approach is to train separate models for each data source, apply them to test data and interpolate the results. [sent-51, score-0.093]

32 The hierarchical Bayesian adaptation method is a generalization of the three approaches described above. [sent-52, score-0.327]

33 The hierarchical model jointly optimizes global and domain-specific parameters, using parameters built from pooled data as priors for domain-specific parameters. [sent-53, score-0.565]

34 In other words, instead of using smoothing to encourage parameters to be closer to zero, it encourages domain-specific model parameters to be closer to the corresponding global parameters, while a zero-mean Gaussian prior is still applied for global parameters. [sent-54, score-0.714]

35 Intuitively, this approach can be described as follows: the domain-specific parameters are largely determined by global data, unless there is good domain-specific evidence that they should be different. [sent-56, score-0.287]

36 The key to this approach is that the global and domain-specific parameters are learned jointly, not hierarchically. [sent-57, score-0.247]

37 This allows domain-specific parameters to influence the global parameters, and vice versa. [sent-58, score-0.247]

38 L_hier(X; Λ) = Σ_d [ L(X_d; Λ_d) − Σ_{i=1}^{F} (λ_{d,i} − λ_{*,i})² / (2σ_d²) ] − Σ_{i=1}^{F} λ_{*,i}² / (2σ_*²)   (4) where X_d is the data for domain d, λ_{*,i} are the global parameters, λ_{d,i} the domain-specific parameters, σ_*² the global variance and σ_d² the domain-specific variances. [sent-60, score-0.511]

39 The global and domain-specific variances are optimized on the heldout data. [sent-61, score-0.234]

40 Usually, larger values are used for global parameters and for domains with more data, while for domains with less data, the variance is typically set to be smaller, encouraging the domain-specific parameters to be closer to global values. [sent-62, score-0.783]
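
To make the joint objective concrete, the sketch below evaluates equation (4) for a set of domains; the dictionary-based interface and the example numbers are assumptions for illustration, and in a real trainer the log-likelihoods and gradients would come from the ME toolkit itself.

```python
def hierarchical_objective(domain_loglik, domain_weights, global_weights,
                           sigma2_domain, sigma2_global):
    """Equation (4): each domain's log-likelihood is penalized by the squared
    distance of its weights from the shared global weights, while the global
    weights themselves get an ordinary zero-mean Gaussian prior."""
    obj = 0.0
    for d, loglik in domain_loglik.items():
        dist = sum((ld - lg) ** 2
                   for ld, lg in zip(domain_weights[d], global_weights))
        obj += loglik - dist / (2.0 * sigma2_domain[d])
    obj -= sum(lg * lg for lg in global_weights) / (2.0 * sigma2_global)
    return obj

# Tiny illustration with two domains and two features (made-up numbers).
print(hierarchical_objective(
    domain_loglik={"written": -900.0, "speech": -120.0},
    domain_weights={"written": [1.2, -0.4], "speech": [0.9, 0.1]},
    global_weights=[1.0, -0.2],
    sigma2_domain={"written": 2.0, "speech": 0.5},
    sigma2_global=1.0))
```

Setting a small variance sigma2_domain[d] for a data-sparse domain pulls that domain's weights toward the global ones, which is the behaviour described above.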

41 This adaptation scheme is very similar to the approaches proposed by (Chelba and Acero, 2006) and (Chen, 2009b): both use a model estimated from background data as a prior when learning a model from in-domain data. [sent-63, score-0.294]

42 The main difference is the fact that in this method, the models are estimated jointly, while in the other works, the background model has to be estimated before learning the in-domain model. [sent-64, score-0.278]

43 4 Experiments In this section, we look at experimental results over two speech recognition tasks. [sent-65, score-0.235]

44 This recognition task consists of the English broadcast news section of the 2003 NIST Rich Transcription Evaluation Data. [sent-68, score-0.306]

45 The data includes six news recordings from six different sources with a total length of 176 minutes. [sent-69, score-0.143]

46 As acoustic models, the CMU Sphinx open source triphone HUB4 models for wideband (16kHz) speech1 were used. [sent-70, score-0.31]

47 The models have been trained using 140 hours of speech. [sent-71, score-0.135]

48 For training the LMs, two sources were used: first 5M sentences from the Gigaword (2nd ed. [sent-72, score-0.081]

49 5M words), and broadcast news transcriptions from the TDT4 corpus (1. [sent-74, score-0.351]

50 The latter was treated as in-domain data in the adaptation experiments. [sent-76, score-0.22]

51 The audio used for testing was segmented into parts of up to 20 seconds in length. [sent-81, score-0.04]

52 A three-pass recognition strategy was applied: the first pass recognition hypotheses were used for calculating MLLR-adapted models for each speaker. [sent-86, score-0.399]

53 In the second pass, the adapted acoustic models were used for generating a 5000-best list of hypotheses for each segment. [sent-87, score-0.318]

54 In the third pass, the ME LM was used to re-rank the hypotheses and select the best one. [sent-88, score-0.042]
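
The third-pass rescoring step can be sketched as follows; the additive combination of decoder score and LM score, and the lm_scale factor, are generic assumptions, since the exact scoring formula is not given here.

```python
def rerank_nbest(nbest, lm_logprob, lm_scale=10.0):
    """Re-rank an N-best list: each hypothesis is a (words, decoder_score) pair,
    and the (interpolated or adapted) ME LM re-scores its word sequence;
    the hypothesis with the best combined score is selected."""
    def combined(hyp):
        words, decoder_score = hyp
        return decoder_score + lm_scale * lm_logprob(words)
    return max(nbest, key=combined)

# Hypothetical 2-best list: (word sequence, decoder score in the log domain).
nbest = [(("the", "news", "tonight"), -250.0), (("the", "new", "tonight"), -249.0)]
print(rerank_nbest(nbest, lm_logprob=lambda words: -0.5 * len(words)))
```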

55 The trigram model was an interpolation of source-specific models which were estimated using Kneser-Ney discounting. [sent-90, score-0.364]

56 The second recognition task consists of four recordings from different live talk programs from [sent-92, score-0.352]

57 three Estonian radio stations. [sent-96, score-0.225]

58 The acoustic models were trained on various wideband Estonian speech corpora: the BABEL speech database (9h), transcriptions of Estonian broadcast news (7. [sent-99, score-0.833]

59 For training the LMs, two sources were used: about 10M sentences from various Estonian newspapers, and manual transcriptions of 10 hours of live talk programs from three Estonian radio stations. [sent-102, score-0.548]

60 As Estonian is a highly inflective language, morphemes are used as basic units in the LM. [sent-104, score-0.046]

61 After such processing, the newspaper corpus includes 185M tokens, and the transcribed data 104K tokens. [sent-106, score-0.042]

62 A vocabulary of 30K tokens was used for this task, with an OOV rate of 1. [sent-107, score-0.112]

63 As with English data, a three-pass recognition strategy involving MLLR adaptation was applied. [sent-110, score-0.332]

64 2 Results For both tasks, we rescored the N-best lists in two different ways: (1) using linear interpolation of source-specific ME models and (2) using a hierarchically domain-adapted ME model (as described in the previous section). [sent-112, score-0.29]

65 The English ME models had a three-level and Estonian models a four-level class hierarchy. [sent-113, score-0.186]

66 The number of classes at each level was determined experimentally so as to optimize the resource requirements for training ME models (specifically, the number of classes was 150, 1000 and 5000 for the English models and 20, 150, 1000 and 6000 for the Estonian models). [sent-115, score-0.346]

67 We used unigram, bigram and trigram features that occurred at least twice in the training data. [sent-116, score-0.085]

68 The feature set was identical for interpolated and adapted models. [sent-118, score-0.252]

69 For the English task, we also explored the efficiency of these two approaches with varying sizes of adaptation data: we repeated the experiments using one eighth, one quarter, half, and all of the TDT4 transcription data for interpolation/adaptation. [sent-121, score-0.283]

70 In all cases, interpolation weights were re-optimized and new Gaussian variance values were heuristically determined. [sent-123, score-0.433]

71 The TADM toolkit2 was used for estimating ME models, utilizing its implementation of the conjugate gradient algorithm. [sent-124, score-0.087]

72 The variance parameters were chosen heuristically based on light tuning on development set perplexity. [sent-126, score-0.33]

73 For the source-specific ME models, the variance was fixed on a per-model basis. [sent-127, score-0.187]

74 For the adapted model, which jointly models global and domain-specific data, the Gaussian priors were fixed for each hierarchy node (i. [sent-128, score-0.489]

75 , the variance was fixed across global, out-of-domain, and in-domain parameters). [sent-130, score-0.187]

76 Table 1 lists values for the variances of Gaussian priors (as in equations 3 and 4) that we used in the experiments. [sent-131, score-0.189]

77 In other publications, the variance values are often normalized to the size of the data. [sent-132, score-0.187]

78 We chose not to normalize the values, since in the hierarchical adaptation scheme, data from other domains also has an impact on the learned model parameters, thus [sent-133, score-0.381]

79 it’s not possible to simply normalize the variances. [sent-135, score-0.054]

80 Perplexity and word error rate (WER) results of the interpolated and adapted models are compared. [sent-137, score-0.38]

81 For the Estonian task, letter error rate (LER) is also reported, since it tends to be a more indicative measure of speech recognition quality for highly inflected languages. [sent-138, score-0.27]

82 In all experiments, using the adapted models resulted in lower perplexity and lower error rate. [sent-139, score-0.318]

83 Improvements in the English experiment were less evident than in the Estonian system, with under 10% improvement in perplexity and 1-3% in WER, against 15% and 4% for the Estonian experiment. [sent-140, score-0.111]

84 In most cases, there was a significant improvement in WER when using the adapted ME model (according to the Wilcoxon test), with the exception of the English experiments on the 292K and 591K data sets. [sent-141, score-0.114]

85 The comparison between N-gram models and ME models is not entirely fair since ME models are actually class-based. [sent-142, score-0.279]

86 Such a transformation introduces additional smoothing into the model and can improve model perplexity, as also noticed by Goodman (2001). [sent-143, score-0.053]

87 5 Discussion In this paper we have tested a hierarchical adaptation method (Daume III, 2007; Finkel and Manning, 2009) on building style-adapted LMs for speech recognition. [sent-144, score-0.45]

88 We showed that the method achieves consistently lower error rates than linear interpolation, which is typically used in such scenarios. [sent-145, score-0.208]

89 The tested method is ideally suited for language modeling in speech recognition: we almost always have access to large amounts of data from written sources but commonly the speech to be recognized is stylistically noticeably different. [sent-146, score-0.413]

90 The hierarchical adaptation method makes it possible to use even a small amount of in-domain data to modify the parameters estimated from out-of-domain data, if there is enough evidence. [sent-147, score-0.503]

91 As Finkel and Manning (2009) point out, the hierarchical nature of the method makes it possible to estimate highly specific models: we could draw style-specific models from general high-level priors, and topic-and-style specific models from style-specific priors. [sent-148, score-0.293]

92 Furthermore, the models don’t have to be hierarchical: it is easy to generalize the method to a general multilevel approach where a model is drawn from multiple priors. [sent-149, score-0.093]

93 6 Table 2: Perplexity, WER and LER results comparing pooled and interpolated N-gram models and interpolated and adapted ME models, with changing amount of available in-domain data. [sent-175, score-0.557]

94 For instance, we could build a model for recognizing computer science lectures, given data from textbooks, including those about computer science, and transcripts of lectures on various topics (which don’t even need to include lectures about computer science). [sent-176, score-0.064]

95 First, training ME LMs in general has much higher resource requirements than training N-gram models which are typically used in speech recognition. [sent-178, score-0.394]

96 Moreover, training hierarchical ME models requires even more memory than training simple ME models, proportional to the number of nodes in the hierarchy. [sent-179, score-0.27]

97 It is also difficult to determine good variance values σ_i² for the global and domain-specific priors. [sent-181, score-0.332]

98 Adaptation of maximum entropy capitalizer: Little data can help a lot. [sent-188, score-0.142]

99 The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news. [sent-219, score-0.333]

100 Building a topic-dependent maximum entropy model for very large corpora. [sent-254, score-0.047]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('estonian', 0.451), ('adaptation', 0.22), ('lms', 0.193), ('variance', 0.187), ('transcriptions', 0.157), ('sphinx', 0.147), ('interpolation', 0.147), ('broadcast', 0.147), ('global', 0.145), ('interpolated', 0.138), ('speech', 0.123), ('gaussian', 0.122), ('lm', 0.122), ('daume', 0.118), ('adapted', 0.114), ('recognition', 0.112), ('perplexity', 0.111), ('hierarchical', 0.107), ('parameters', 0.102), ('priors', 0.1), ('wer', 0.096), ('entropy', 0.095), ('models', 0.093), ('finkel', 0.09), ('finland', 0.089), ('variances', 0.089), ('cmu', 0.086), ('programs', 0.079), ('xd', 0.078), ('radio', 0.078), ('vocabulary', 0.077), ('estimated', 0.074), ('chelba', 0.074), ('glise', 0.074), ('kaalep', 0.074), ('lium', 0.074), ('pooled', 0.074), ('stylistically', 0.074), ('textbooks', 0.074), ('triphone', 0.074), ('wideband', 0.074), ('xif', 0.074), ('acoustic', 0.069), ('adaptive', 0.068), ('goodman', 0.065), ('lectures', 0.064), ('mikko', 0.064), ('transcription', 0.063), ('history', 0.062), ('typically', 0.061), ('live', 0.061), ('aalto', 0.059), ('weights', 0.058), ('helsinki', 0.055), ('normalize', 0.054), ('smoothing', 0.053), ('newspapers', 0.052), ('conjugate', 0.052), ('informatics', 0.052), ('chen', 0.051), ('bayesian', 0.051), ('talk', 0.05), ('trigram', 0.05), ('rosenfeld', 0.05), ('hierarchically', 0.05), ('recordings', 0.05), ('fi', 0.049), ('outcomes', 0.048), ('ler', 0.048), ('oov', 0.048), ('kneser', 0.048), ('requirements', 0.047), ('maximum', 0.047), ('news', 0.047), ('written', 0.047), ('morphemes', 0.046), ('boulder', 0.046), ('sources', 0.046), ('encourages', 0.045), ('od', 0.043), ('hypotheses', 0.042), ('hours', 0.042), ('transcribed', 0.042), ('prediction', 0.042), ('heuristically', 0.041), ('closer', 0.041), ('pass', 0.04), ('audio', 0.04), ('domainspecific', 0.04), ('regularized', 0.04), ('icassp', 0.04), ('classes', 0.039), ('iii', 0.038), ('jointly', 0.037), ('hyperparameters', 0.037), ('gradient', 0.035), ('centre', 0.035), ('training', 0.035), ('rate', 0.035), ('domain', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

2 0.14825645 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

3 0.14121346 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

Author: Jenny Rose Finkel ; Christopher D. Manning

Abstract: One of the main obstacles to producing high quality joint models is the lack of jointly annotated data. Joint modeling of multiple natural language processing tasks outperforms single-task models learned from the same data, but still underperforms compared to single-task models learned on the more abundant quantities of available single-task annotated data. In this paper we present a novel model which makes use of additional single-task annotated data to improve the performance of a joint model. Our model utilizes a hierarchical prior to link the feature weights for shared features in several single-task models and the joint model. Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data.

4 0.13320917 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

5 0.084283292 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

Author: Shih-Hsiang Lin ; Berlin Chen

Abstract: In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a flexible but systematic way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively. Experiments on speech summarization show that the methods deduced from our framework are very competitive with existing summarization approaches. 1

6 0.076109849 195 acl-2010-Phylogenetic Grammar Induction

7 0.074196868 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

8 0.070941128 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

9 0.067831673 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

10 0.064024694 158 acl-2010-Latent Variable Models of Selectional Preference

11 0.061617982 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

12 0.060964532 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

13 0.058835197 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

14 0.057805132 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics

15 0.056243956 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

16 0.055114381 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

17 0.053404722 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging

18 0.05062221 214 acl-2010-Sparsity in Dependency Grammar Induction

19 0.050579183 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

20 0.049544778 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.163), (1, -0.0), (2, -0.038), (3, -0.029), (4, 0.016), (5, -0.038), (6, -0.01), (7, -0.033), (8, 0.077), (9, 0.02), (10, -0.026), (11, 0.068), (12, 0.106), (13, -0.071), (14, -0.054), (15, -0.063), (16, -0.01), (17, 0.029), (18, 0.025), (19, -0.032), (20, 0.004), (21, -0.022), (22, -0.014), (23, -0.021), (24, -0.078), (25, 0.07), (26, 0.137), (27, -0.043), (28, 0.114), (29, -0.071), (30, -0.095), (31, 0.18), (32, 0.067), (33, -0.023), (34, 0.04), (35, 0.131), (36, -0.065), (37, 0.135), (38, -0.101), (39, 0.017), (40, -0.069), (41, 0.002), (42, 0.133), (43, -0.063), (44, -0.274), (45, 0.103), (46, 0.089), (47, -0.002), (48, -0.098), (49, 0.105)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95298946 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

2 0.80177063 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

3 0.76038992 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics

Author: Frank Rudzicz

Abstract: We introduce a novel mechanism for incorporating articulatory dynamics into speech recognition with the theory of task dynamics. This system reranks sentence-level hypotheses by the likelihoods of their hypothetical articulatory realizations which are derived from relationships learned with aligned acoustic/articulatory data. Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract. Our system based on task dynamics reduces word-error rates significantly by 10.2% relative to the best baseline models.

4 0.72713536 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.

5 0.71520531 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

Author: Daniil Umanski ; Federico Sangati

Abstract: The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel approach for constructing speech motor exercises, based on linguistic knowledge extracted from spoken language corpora. In our study with the Dutch Spoken Corpus, syllabic inventories were obtained by means of automatic syllabification of the spoken language data. Our experimental syllabification method exhibited a reliable performance, and allowed for the acquisition of syllabic tokens from the corpus. Consequently, the syllabic tokens were integrated in a tool for clinicians, a result which holds the potential of contributing to the current state of speech motor training methodologies.

6 0.6144259 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

7 0.59285253 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

8 0.48594671 61 acl-2010-Combining Data and Mathematical Models of Language Change

9 0.44664848 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

10 0.41613549 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

11 0.41286984 256 acl-2010-Vocabulary Choice as an Indicator of Perspective

12 0.40570045 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning

13 0.40484247 195 acl-2010-Phylogenetic Grammar Induction

14 0.4037129 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars

15 0.40354913 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning

16 0.40210056 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

17 0.37080547 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation

18 0.36313459 116 acl-2010-Finding Cognate Groups Using Phylogenies

19 0.35827109 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

20 0.35597873 190 acl-2010-P10-5005 k2opt.pdf


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.042), (42, 0.018), (59, 0.156), (71, 0.391), (73, 0.054), (78, 0.016), (83, 0.114), (84, 0.019), (98, 0.103)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82320142 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning

Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer

Abstract: We initiate a study comparing effectiveness of the transformed spaces learned by recently proposed supervised, and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm to be the most effective.

2 0.81107891 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification

Author: Israela Becker ; Vered Aharonson

Abstract: Two psycholinguistic and psychophysical experiments show that in order to efficiently extract polarity of written texts such as customer reviews on the Internet, one should concentrate computational efforts on messages in the final position of the text.

same-paper 3 0.79056448 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

4 0.52231264 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

Author: Peter Prettenhofer ; Benno Stein

Abstract: We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of interlanguage correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.

5 0.51354033 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

Author: Partha Pratim Talukdar ; Fernando Pereira

Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.

6 0.50657696 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

7 0.50651956 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging

8 0.50593972 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

9 0.5036779 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

10 0.49825683 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

11 0.49710387 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

12 0.49601078 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

13 0.49482599 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

14 0.49448881 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

15 0.49132186 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

16 0.48905277 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

17 0.48874342 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

18 0.48537362 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

19 0.48536676 54 acl-2010-Boosting-Based System Combination for Machine Translation

20 0.4838728 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People