
270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System


Source: pdf

Author: Hyunsin Park, Sungrack Yun, Sanghyuk Park, Jongmin Kim, Chang D. Yoo

Abstract: For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). This nonlinear and nonparametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process priors on the dynamics and emission functions enable the complex dynamic structure and long-range dependencies of speech to be represented better than by an HMM. In addition, a variance constraint is introduced into the VGPDS to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experiments, covering parameter estimation and classification performance, on synthetic and benchmark datasets.
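As a rough sketch of the generative structure the abstract describes (following the general VGPDS formulation of Damianou et al. [9]; the symbols x_q(t), f_d, k_x, k_y below are illustrative and not necessarily the paper's exact notation), the latent dynamics and the emission mapping each carry a Gaussian process prior:

\begin{align}
x_q(t) &\sim \mathcal{GP}\big(0,\, k_x(t, t')\big), \quad q = 1, \dots, Q, \\
y_{nd} &= f_d(x_n) + \epsilon_{nd}, \qquad f_d \sim \mathcal{GP}\big(0,\, k_y(x, x')\big), \quad \epsilon_{nd} \sim \mathcal{N}(0, \sigma^2),
\end{align}

so that both the temporal evolution of the latent trajectories and the mapping from latent states to observed acoustic features are nonlinear and nonparametric, which is what the abstract contrasts with the fixed transition and emission structure of an HMM.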


reference text

[1] F. Jelinek, “Continuous speech recognition by statistical methods,” Proceedings of the IEEE, Vol.64, pp.532-556, 1976.

[2] M. Ostendorf, V. Digalakis, and J. Rohlicek, “From HMMs to segment models: A unified view of stochastic modeling for speech recognition,” IEEE Trans. on Speech and Audio Processing, Vol.4, pp.360-378, 1996.

[3] L. Deng, D. Yu, and A. Acero, “Structured Speech Modeling,” IEEE Trans. on Audio, Speech, and Language Processing, Vol.14, pp.1492-1504, 2006.

[4] C. E. Rasmussen and C. K. I. Williams, “Gaussian Processes for Machine Learning,” MIT Press, Cambridge, MA, 2006.

[5] N. D. Lawrence, “Probabilistic non-linear principal component analysis with Gaussian process latent variable models,” Journal of Machine Learning Research (JMLR), Vol.6, pp.1783-1816, 2005.

[6] N. D. Lawrence, “Learning for larger datasets with the Gaussian process latent variable model,” International Conference on Artificial Intelligence and Statistics (AISTATS), pp.243-250, 2007.

[7] M. K. Titsias and N. D. Lawrence, “Bayesian Gaussian Process Latent Variable Model,” International Conference on Artificial Intelligence and Statistics (AISTATS), pp.844-851, 2010.

[8] J. Quiñonero-Candela and C. E. Rasmussen, “A Unifying View of Sparse Approximate Gaussian Process Regression,” Journal of Machine Learning Research (JMLR), Vol.6, pp.1939-1959, 2005.

[9] A. C. Damianou, M. K. Titsias, and N. D. Lawrence, “Variational Gaussian Process Dynamical Systems,” Advances in Neural Information Processing Systems (NIPS), 2011.

[10] J. M. Wang, D. J. Fleet, and A. Hertzmann, “Gaussian Process Dynamical Models for Human Motion,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.30, pp.283-298, 2008.

[11] K. F. Lee and H. W. Hon, “Speaker-independent phone recognition using hidden Markov models,” IEEE Trans. on Acoustics, Speech and Signal Processing, Vol.37, pp.1641-1648, 1989.

[12] A. Mohamed, G. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Trans. on Audio, Speech, and Language Processing, Vol.20, no.1, pp.14-22, 2012.

[13] F. Sha and L. K. Saul, “Large margin hidden Markov models for automatic speech recognition,” Advances in Neural Information Processing Systems (NIPS), 2007.