acl acl2010 acl2010-193 knowledge-graph by maker-knowledge-mining

193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project


Source: pdf

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

Reference: text


Summary: the most important sentences generated by the tfidf model
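As a hedged aside on where the per-sentence scores below could come from: the exact formula used by the mining pipeline is not documented on this page, so the following is only a minimal sketch assuming each sentence is scored by the length-normalised sum of its tokens' tfidf weights; the weights excerpt mirrors the (word, weight) list given near the end of this page.

    import re

    def sentence_score(sentence, weights):
        # Sum the tfidf weights of the tokens the model knows, normalised
        # by sentence length; unknown tokens contribute nothing.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        total = sum(weights.get(t, 0.0) for t in tokens)
        return total / max(len(tokens), 1)

    weights = {"tts": 0.363, "asr": 0.336, "adaptation": 0.305}  # excerpt
    print(round(sentence_score("Unsupervised TTS adaptation with ASR", weights), 3))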

sentIndex sentText sentNum sentScore

1 Personalising speech-to-speech translation in the EMIME project. Mikko Kurimo, William Byrne, John Dines, Philip N. Garner, et al. [sent-1, score-0.091]

2 (Abstract) In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. [sent-4, score-0.394]

3 We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). [sent-5, score-0.797]

4 An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. [sent-6, score-0.597]

5 In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language. [sent-7, score-0.402]

6 (1 Introduction) A mobile real-time speech-to-speech translation (S2ST) device is one of the grand challenges in natural language processing (NLP). [sent-8, score-0.193]

7 It involves several important NLP research areas: automatic speech recognition (ASR), statistical machine translation (SMT) and speech synthesis, also known as text-to-speech (TTS). [sent-9, score-0.31]

8 In recent years significant advances have also been made in the relevant technological devices: the size of powerful computers has decreased to fit in a mobile phone, and fast WiFi and 3G networks have spread widely to connect them to even more powerful computation servers. [sent-10, score-0.167]

9 Several hand-held S2ST applications and devices have already become available, for example from IBM, Google or Jibbigo, but there are still serious limitations in vocabulary, language selection and performance. [sent-11, score-0.126]

10 When an S2ST device is used in practical human interaction across a language barrier, one feature that is often missed is the personalization of the output voice. [sent-12, score-0.078]

11 Whoever speaks to the device, in whatever manner, the output voice always sounds the same. [sent-13, score-0.3]

12 Producing high-quality synthesis voices is expensive, and even if the system had many output voices, it would be hard to select one that sounds like the input voice. [sent-14, score-0.393]

13 There are many features in the output voice that could raise the interaction experience to a much more natural level, for example, emotions, speaking rate, loudness and the speaker identity. [sent-15, score-0.561]

14 After the recent development in hidden Markov model (HMM) based TTS, it has become possible to adapt the output voice using model transformations that can be estimated from a small number of speech samples. [sent-16, score-0.456]

15 These techniques, for instance the maximum likelihood linear regression (MLLR), are adopted from HMM-based ASR where they are very powerful in fast adaptation of speaker and recording environment characteristics (Gales, 1998). [sent-17, score-0.693]
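To make the transform concrete, below is a minimal numpy sketch of MLLR mean adaptation in the standard diagonal-covariance, single-transform setting: each adapted mean is an affine function W[1, mu] of the original mean, and W is estimated row by row in closed form from state occupancy statistics. This is a generic illustration of the technique, not the project's HTK implementation; all names are ours, and a real system would use regression classes and enough data to keep G well conditioned.

    import numpy as np

    def estimate_mllr(mu, var, gamma, obs_sum):
        # mu, var: (S, d) Gaussian means and diagonal variances
        # gamma: (S,) state occupancies sum_t gamma_s(t)
        # obs_sum: (S, d) occupancy-weighted observation sums sum_t gamma_s(t) o_t
        S, d = mu.shape
        xi = np.hstack([np.ones((S, 1)), mu])      # extended means [1, mu_s]
        W = np.zeros((d, d + 1))
        for i in range(d):                         # each dimension is independent
            G = np.zeros((d + 1, d + 1))
            k = np.zeros(d + 1)
            for s in range(S):
                G += (gamma[s] / var[s, i]) * np.outer(xi[s], xi[s])
                k += (obs_sum[s, i] / var[s, i]) * xi[s]
            W[i] = np.linalg.solve(G, k)           # row i of W = [b A]
        return W

    def adapt_means(mu, W):
        xi = np.hstack([np.ones((len(mu), 1)), mu])
        return xi @ W.T                            # adapted means W [1, mu_s]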

16 Using hierarchical regression trees, the TTS and ASR models can further be coupled in a way that enables unsupervised TTS adaptation (King et al. [sent-18, score-0.41]

17 In unsupervised adaptation, samples are annotated by applying ASR. [sent-20, score-0.381]

18 By eliminating the need for human intervention it becomes possible to perform voice adaptation for TTS in almost real-time. [sent-21, score-0.494]

19 The target in the EMIME project is to study unsupervised cross-lingual speaker adaptation for S2ST systems. [sent-22, score-0.65]

20 The first results of the project have … [sent-23, score-0.049]

21 …, 2009) systems for morphologically rich languages, and to develop robust TTS (Yamagishi et al. [sent-31, score-0.1]

22 The next step has been preliminary experiments in intra-lingual and cross-lingual speaker adaptation (Wu et al. [sent-33, score-0.601]

23 For cross-lingual adaptation several new methods have been proposed for mapping the HMM states, adaptation data and model transformations (Wu et al. [sent-35, score-0.687]

24 Even though the project is still ongoing, we have an initial version of a mobile S2ST system and cross-lingual speaker adaptation to show. [sent-38, score-0.824]

25 (2 Baseline ASR, TTS and SMT systems) The baseline ASR systems in the project are developed using the HTK toolkit (Young et al. [sent-39, score-0.075]

26 The main structure of the baseline systems for each of the four languages is similar, fairly standard, and in line with most other state-of-the-art large-vocabulary ASR systems. [sent-44, score-0.069]

27 Some special flavors have been added, such as the morphological analysis for Finnish (Hirsimäki et al. [sent-45, score-0.025]

28 For speaker adaptation, the MLLR transformation based on hierarchical regression classes is included for all languages. [sent-47, score-0.327]

29 The baseline TTS systems in the project utilize the HTS toolkit (Yamagishi et al. [sent-48, score-0.049]

30 The HMM-based TTS systems have been developed for Finnish, English, Mandarin and Japanese. [sent-50, score-0.026]

31 The systems include an average voice model for each language trained over hundreds of speakers taken from standard ASR corpora, such as Speecon (Iskra et al. [sent-51, score-0.26]

32 Using speaker adaptation transforms, thousands of new voices have been created (Yamagishi et al. [sent-53, score-0.854]

33 , 2010) and new voices can be added using a small number of either supervised or unsupervised speech samples. [sent-54, score-0.409]

34 Cross-lingual adaptation is possible by creating a mapping between the HMM states in the input and the output language (Wu et al. [sent-55, score-0.337]
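One published flavour of this mapping pairs each output-language state with its closest input-language state under a Kullback-Leibler criterion. The sketch below is a hedged simplification of that idea (a single diagonal-covariance Gaussian per state, symmetrised KL), not the exact procedure of Wu et al.

    import numpy as np

    def kl_diag(mu_p, var_p, mu_q, var_q):
        # KL(N_p || N_q) for diagonal-covariance Gaussians.
        return 0.5 * np.sum(var_p / var_q + (mu_q - mu_p) ** 2 / var_q
                            - 1.0 + np.log(var_q / var_p))

    def map_states(out_states, in_states):
        # out_states, in_states: lists of (mean, variance) vector pairs.
        mapping = []
        for mu_o, var_o in out_states:
            d = [kl_diag(mu_o, var_o, mu_i, var_i)
                 + kl_diag(mu_i, var_i, mu_o, var_o)   # symmetrise
                 for mu_i, var_i in in_states]
            mapping.append(int(np.argmin(d)))
        return mapping  # mapping[j] = nearest input-language state for output state j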

35 Because the resources of the EMIME project have been focused on ASR, TTS and speaker adaptation, we aim to rely on existing solutions for SMT as far as possible. [sent-57, score-0.345]

36 New methods have been studied for morphologically rich languages (de Gispert et al. [sent-58, score-0.1]

37 (Monolingual systems) In robust speech synthesis, a computer can learn to speak in the desired way after processing only a relatively small amount of training speech. [sent-61, score-0.223]

38 The training speech can even be a normal-quality recording made outside the studio environment, where the target speaker speaks into a standard microphone and the speech is not annotated. [sent-62, score-0.66]

39 This differs dramatically from conventional TTS, where building a new voice requires an hour or more of careful repetition of specially selected prompts recorded in an anechoic chamber with high-quality equipment. [sent-63, score-0.243]

40 Robust TTS has recently become possible using the statistical HMM framework for both ASR and TTS. [sent-64, score-0.024]

41 This framework allows the efficient speaker adaptation transformations developed for ASR to be used also for the TTS models. [sent-65, score-0.729]

42 Using large corpora collected for ASR, we can train average voice models for both ASR and TTS. [sent-66, score-0.189]

43 The training data may include only a small amount of speech, with poor coverage of phonetic contexts, from each single speaker, but by summing the material over hundreds of speakers we can obtain sufficient models for an average speaker. [sent-67, score-0.167]

44 Only a small amount of adaptation data is then required to create transformations for tuning the average voice closer to the target voice. [sent-68, score-0.571]

45 In addition to the supervised adaptation using annotated speech, it is also possible to employ ASR to create annotations. [sent-69, score-0.305]

46 This unsupervised adaptation enables the system to use a much broader selection of sources, for example, recorded samples from the internet, to learn a new voice. [sent-70, score-0.431]

47 In the "EMIME Voice cloning in Finnish and English" demo, the goal is that users can clone their own voice. [sent-72, score-0.022]

48 The user will dictate for about 10 minutes and then, after half an hour of processing time, the TTS system has transformed the average model towards the user’s voice and can speak with this voice. [sent-73, score-0.04] [sent-79, score-0.282]

51 The cloned voices may become especially valuable, for example, if a person’s voice is later damaged in an accident or by a disease. [sent-80, score-0.439]

52 In the "EMIME Thousand voices map" demo, the goal is to browse the world’s largest collection of synthetic voices using a world map interface (Yamagishi et al. [sent-82, score-0.538]

53 The user can zoom in on the world map and select any voice, the voices being organized according to where the adapted speaker lives, to utter the given sentence. [sent-84, score-0.138]

54 This interactive geographical representation is shown in Figure 1. [sent-85, score-0.09]

55 Blue markers show male speakers and red markers show female speakers. [sent-87, score-0.274]

56 Some markers are in arbitrary locations (in the correct country) because precise location information is not available for all speakers. [sent-88, score-0.073]

57 This geographical representation, which includes an interactive TTS demonstration of many of the voices, is available from the URL provided. [sent-89, score-0.113]

58 Clicking on a marker will play synthetic speech from that speaker. [sent-90, score-0.202]

59 (Footnote: currently the interactive mode supports English and Spanish only; for other languages the map provides pre-synthesised examples, but we plan to add an interactive type-in text-to-speech feature in the near future.) [sent-91, score-0.055]

60 As well as being a convenient interface to compare the many voices, the interactive map is an attractive and easy-to-understand demonstration of the technology being developed in EMIME. [sent-92, score-0.154]

61 The models developed in the HMM framework can also be demonstrated in the adaptation of an ASR system for large-vocabulary continuous speech recognition. [sent-94, score-0.49]

62 By utilizing morpheme-based language models instead of word-based models, the Finnish ASR system is able to cover a practically unlimited vocabulary (Hirsimäki et al. [sent-95, score-0.076]

63 This is necessary for morphologically rich languages where, due to inflection, derivation and composition, there exist so many different word forms that word-based language modeling becomes impractical. [sent-97, score-0.1]
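To see why morphs help, consider a toy sketch: words are decomposed into morphs and the n-gram model is estimated over morph sequences, so unseen word forms still decompose into known units. The greedy segmenter and tiny lexicon below are illustrative stand-ins for a data-driven segmenter (such as Morfessor), not the project's actual tooling.

    from collections import Counter

    def segment(word, morphs):
        # Greedy longest-match split into known morphs, falling back to
        # single characters so every word gets some segmentation.
        out, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in morphs or j == i + 1:
                    out.append(word[i:j])
                    i = j
                    break
        return out

    morphs = {"talo", "ssa", "kirja", "sto"}            # toy Finnish morphs
    tokens = [m for w in "talossa kirjastossa".split()
              for m in segment(w, morphs) + ["<w>"]]    # <w> ends each word
    bigrams = Counter(zip(tokens, tokens[1:]))          # morph-level bigram counts
    print(bigrams)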

64 (Cross-lingual systems) In the EMIME project the goal is to learn cross-lingual speaker adaptation. [sent-99, score-0.414]

65 Here the output-language ASR or TTS system is adapted using speech samples in the input language. [sent-100, score-0.228]

66 The results so far are encouraging, especially for TTS: Even though the cross-lingual adaptation may somewhat degrade the synthesis quality, the adapted speech now sounds more like the target speaker. [sent-101, score-0.607]

67 Several recent evaluations of the cross-lingual speaker adaptation methods can be found in (Gibson et al. [sent-102, score-0.351] [sent-103, score-0.305]

69 In the "EMIME Cross-lingual Finnish/English and Mandarin/English TTS adaptation" demo, the input-language sentences dictated by the user will be used to learn the characteristics of his or her voice. [sent-109, score-0.345]

70 The adapted cross-lingual model will be used to speak output language (English) sentences in the user’s voice. [sent-110, score-0.131]

71 The user does not need to be bilingual and only reads sentences in their native language. [sent-111, score-0.071]

72 In the "EMIME Real-time speech-to-speech mobile translation" demo, two users will interact using a pair of Nokia N97 mobile devices (see Figure 3). [sent-113, score-0.349]

73 The system will recognize the phrase one user speaks in his or her native language, translate it, and speak it in the other user’s native language. [sent-114, score-0.21]

74 After a few sentences the system will have the speaker adaptation transformations ready and can apply them to the synthesized voices to make them sound more like the original speaker instead of a standard voice. [sent-115, score-1.286]
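For orientation, here is a hedged sketch of the control flow of one such translation turn; every callable is a hypothetical stand-in of ours, not the project's actual API.

    def s2st_turn(audio, recognize, translate, synthesize, update_transform):
        # One turn of the demo loop as described above.
        text = recognize(audio)                    # ASR in the speaker's language
        transform = update_transform(audio, text)  # refine the MLLR-style transform
        target_text = translate(text)              # SMT into the other language
        return synthesize(target_text, transform)  # TTS in the adapted voice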

75 The first real-time demo version is available for the Mandarin/English language pair. [sent-116, score-0.039]

76 The morpheme-based translation system for Finnish/English and English/Finnish can be compared to a word-based translation system for arbitrary sentences. [sent-118, score-0.084]

77 The morpheme-based approach is particularly useful for language pairs in which one or both languages are morphologically rich, since the number and complexity of different word forms severely limits the performance of word-based translation. [sent-119, score-0.1]

78 The morpheme-based systems can learn translation models for phrases where morphemes are used instead of words (de Gispert et al. [sent-120, score-0.042]

79 …, 2009) have shown that the performance of unsupervised data-driven morpheme segmentation can rival that of conventional rule-based approaches. [sent-123, score-0.049]

80 This is very useful if hand-crafted morphological analyzers are not available or their coverage is not sufficient for all languages. [sent-124, score-0.025]

81 Minimum Bayes risk combination of translation hypotheses from alternative morphological decompositions. [sent-132, score-0.067]

82 Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction. [sent-155, score-0.835]

83 Unlimited vocabulary speech recognition with morph language models applied to Finnish. [sent-166, score-0.178]

84 Importance of high-order n-gram models in morph-based speech recognition. [sent-172, score-0.134]

85 SPEECON speech databases for consumer devices: Database specification and validation. [sent-184, score-0.134]

86 Free software toolkit for Japanese large vocabulary continuous speech recognition. [sent-201, score-0.203]

87 A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis. [sent-224, score-0.853]

88 State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis. [sent-257, score-0.735]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tts', 0.363), ('asr', 0.336), ('adaptation', 0.305), ('speaker', 0.296), ('emime', 0.281), ('yamagishi', 0.23), ('voices', 0.226), ('hirsim', 0.204), ('voice', 0.189), ('speech', 0.134), ('oura', 0.128), ('mobile', 0.105), ('dines', 0.102), ('synthesis', 0.1), ('kurimo', 0.096), ('tokuda', 0.089), ('gispert', 0.082), ('king', 0.081), ('transformations', 0.077), ('htk', 0.077), ('finnish', 0.075), ('markers', 0.073), ('ki', 0.073), ('crosslingual', 0.069), ('mikko', 0.067), ('aki', 0.067), ('speak', 0.064), ('virpioja', 0.062), ('devices', 0.058), ('hmm', 0.057), ('interactive', 0.055), ('idiap', 0.051), ('iskra', 0.051), ('junichi', 0.051), ('karhila', 0.051), ('keiichiro', 0.051), ('mirjam', 0.051), ('pylkk', 0.051), ('speecon', 0.051), ('synthesized', 0.051), ('zen', 0.051), ('interspeech', 0.05), ('unsupervised', 0.049), ('project', 0.049), ('smt', 0.048), ('morphologically', 0.048), ('device', 0.046), ('keiichi', 0.045), ('mllr', 0.045), ('tkk', 0.045), ('vocabulary', 0.044), ('speaking', 0.044), ('translation', 0.042), ('audio', 0.041), ('icassp', 0.041), ('user', 0.04), ('demo', 0.039), ('mandarin', 0.038), ('hmmbased', 0.038), ('utter', 0.038), ('wu', 0.038), ('speakers', 0.038), ('synthetic', 0.036), ('sound', 0.035), ('adapted', 0.035), ('geographical', 0.035), ('gibson', 0.035), ('kawahara', 0.033), ('female', 0.033), ('sounds', 0.033), ('hundreds', 0.033), ('unlimited', 0.032), ('marker', 0.032), ('output', 0.032), ('regression', 0.031), ('native', 0.031), ('male', 0.031), ('powerful', 0.031), ('recording', 0.03), ('hour', 0.029), ('blue', 0.028), ('rich', 0.027), ('google', 0.027), ('simon', 0.027), ('thousands', 0.027), ('samples', 0.027), ('red', 0.026), ('developed', 0.026), ('map', 0.025), ('morphological', 0.025), ('languages', 0.025), ('enables', 0.025), ('robust', 0.025), ('recorded', 0.025), ('continuous', 0.025), ('become', 0.024), ('demonstration', 0.023), ('studio', 0.022), ('utsuro', 0.022), ('clone', 0.022)]
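The simValue numbers in the lists below are consistent with cosine similarity between sparse tfidf vectors like the (word, weight) list above. A minimal sketch under that assumption (the candidate vector here is hypothetical):

    import math

    def cosine(a, b):
        # a, b: sparse tfidf vectors as {term: weight} dicts.
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    this_paper = {"tts": 0.363, "asr": 0.336, "adaptation": 0.305}  # excerpt from above
    candidate = {"adaptation": 0.305, "speech": 0.134, "entropy": 0.2}
    print(round(cosine(this_paper, candidate), 6))  # rank similar papers by this score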

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

2 0.13320917 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

3 0.076092266 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

Author: Yun-Cheng Ju ; Tim Paek

Abstract: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search.

4 0.067787133 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics

Author: Frank Rudzicz

Abstract: We introduce a novel mechanism for incorporating articulatory dynamics into speech recognition with the theory of task dynamics. This system reranks sentence-level hypotheses by the likelihoods of their hypothetical articulatory realizations which are derived from relationships learned with aligned acoustic/articulatory data. Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract. Our system based on task dynamics reduces word-error rates significantly by 10.2% relative to the best baseline models.

5 0.065097637 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

Author: Mitesh Khapra ; Anup Kulkarni ; Saurabh Sohoney ; Pushpak Bhattacharyya

Abstract: In spite of decades of research on word sense disambiguation (WSD), all-words general purpose WSD has remained a distant goal. Many supervised WSD systems have been built, but the effort of creating the training corpus - annotated sense marked corpora - has always been a matter of concern. Therefore, attempts have been made to develop unsupervised and knowledge based techniques for WSD which do not need sense marked corpora. However such approaches have not proved effective, since they typically do not better Wordnet first sense baseline accuracy. Our research reported here proposes to stick to the supervised approach, but with far less demand on annotation. We show that if we have ANY sense marked corpora, be it from mixed domain or a specific domain, a small amount of annotation in ANY other domain can deliver the goods almost as if exhaustive sense marking were available in that domain. We have tested our approach across Tourism and Health domain corpora, using also the well known mixed domain SemCor corpus. Accuracy figures close to self domain training lend credence to the viability of our approach. Our contribution thus lies in finding a convenient middle ground between pure supervised and pure unsupervised WSD. Finally, our approach is not restricted to any specific set of target words, a departure from a commonly observed practice in domain specific WSD.

6 0.062219664 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

7 0.060670752 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

8 0.054178897 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems

9 0.053586818 221 acl-2010-Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

10 0.052203715 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

11 0.048797742 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

12 0.048583254 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems

13 0.047960173 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems

14 0.045766886 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation

15 0.045731824 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

16 0.045603603 54 acl-2010-Boosting-Based System Combination for Machine Translation

17 0.042341359 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

18 0.042260882 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

19 0.041965496 112 acl-2010-Extracting Social Networks from Literary Fiction

20 0.04151706 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.103), (1, -0.008), (2, -0.041), (3, -0.049), (4, 0.014), (5, -0.06), (6, -0.05), (7, 0.005), (8, 0.006), (9, 0.016), (10, 0.033), (11, 0.055), (12, 0.04), (13, -0.048), (14, -0.03), (15, -0.042), (16, 0.014), (17, 0.011), (18, 0.043), (19, 0.018), (20, 0.016), (21, -0.089), (22, 0.021), (23, -0.04), (24, -0.004), (25, 0.102), (26, 0.048), (27, 0.008), (28, -0.012), (29, -0.02), (30, -0.1), (31, 0.097), (32, 0.066), (33, 0.021), (34, 0.082), (35, 0.013), (36, -0.005), (37, 0.09), (38, 0.011), (39, 0.088), (40, -0.151), (41, -0.042), (42, 0.113), (43, -0.079), (44, -0.227), (45, 0.049), (46, 0.14), (47, 0.025), (48, -0.013), (49, 0.109)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95982969 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

2 0.7102223 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

Author: Tanel Alumae ; Mikko Kurimo

Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.

3 0.70930517 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

Author: Daniil Umanski ; Federico Sangati

Abstract: The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel approach for constructing speech motor exercises, based on linguistic knowledge extracted from spoken language corpora. In our study with the Dutch Spoken Corpus, syllabic inventories were obtained by means of automatic syllabification of the spoken language data. Our experimental syllabification method exhibited a reliable performance, and allowed for the acquisition of syllabic tokens from the corpus. Consequently, the syllabic tokens were integrated in a tool for clinicians, a result which holds the potential of contributing to the current state of speech motor training methodologies.

4 0.66830206 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics

Author: Frank Rudzicz

Abstract: We introduce a novel mechanism for incorporating articulatory dynamics into speech recognition with the theory of task dynamics. This system reranks sentence-level hypotheses by the likelihoods of their hypothetical articulatory realizations which are derived from relationships learned with aligned acoustic/articulatory data. Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract. Our system based on task dynamics reduces word-error rates significantly by 10.2% relative to the best baseline models.

5 0.66443115 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

Author: Yun-Cheng Ju ; Tim Paek

Abstract: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search.

6 0.49873763 151 acl-2010-Intelligent Selection of Language Model Training Data

7 0.42338955 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation

8 0.40712535 61 acl-2010-Combining Data and Mathematical Models of Language Change

9 0.39824802 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

10 0.39583254 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

11 0.38766265 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

12 0.37201312 100 acl-2010-Enhanced Word Decomposition by Calibrating the Decision Threshold of Probabilistic Models and Using a Model Ensemble

13 0.35423487 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

14 0.32189053 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

15 0.31462768 224 acl-2010-Talking NPCs in a Virtual Game World

16 0.30532593 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems

17 0.30307907 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

18 0.29847887 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems

19 0.29748353 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

20 0.29547653 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.012), (25, 0.03), (39, 0.02), (42, 0.023), (45, 0.012), (52, 0.427), (59, 0.086), (71, 0.013), (73, 0.048), (78, 0.021), (83, 0.081), (84, 0.026), (98, 0.11)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75903165 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

2 0.60650122 154 acl-2010-Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

Author: Dong Yang ; Paul Dixon ; Sadaoki Furui

Abstract: This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method. Our method lies in the category of direct orthographical mapping (DOM) between two languages without using any intermediate phonemic mapping. In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language. In this paper, we propose a method to jointly optimize the two-step CRFs and also a fast algorithm to realize it. Our experiments show that the proposed method outperforms the well-known joint source channel model (JSCM) and our proposed fast algorithm decreases the decoding time significantly. Furthermore, combination of the proposed method and the JSCM gives further improvement, which outperforms state-of-the-art results in terms of top-1 accuracy.

3 0.58749175 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie

Abstract: One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.

4 0.51515114 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

5 0.37406716 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

6 0.37142125 56 acl-2010-Bridging SMT and TM with Translation Recommendation

7 0.3713209 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

8 0.37011546 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

9 0.3700369 39 acl-2010-Automatic Generation of Story Highlights

10 0.36998099 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

11 0.36903006 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

12 0.36892495 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

13 0.36840355 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries

14 0.36806387 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

15 0.36799121 195 acl-2010-Phylogenetic Grammar Induction

16 0.36772275 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

17 0.36744043 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser

18 0.36712956 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

19 0.36706072 54 acl-2010-Boosting-Based System Combination for Machine Translation

20 0.36682501 29 acl-2010-An Exact A* Method for Deciphering Letter-Substitution Ciphers