
51 nips-2000-Factored Semi-Tied Covariance Matrices

Source: pdf

Author: Mark J. F. Gales

Abstract: A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. It extends semi-tied covariance matrices, an efficient form of covariance modelling used in speech recognition. In the standard form of semi-tied covariance matrices, the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. This paper presents the use of a factored decorrelating transform. The factoring effectively increases the number of possible transforms without increasing the number of free parameters. Maximum likelihood estimation schemes are presented for all the model parameters, including the component/transform assignment, the transforms, and the component parameters. The new model form is evaluated on a large vocabulary speech recognition task, where the factored form of covariance modelling is shown to reduce the word error rate.
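The decomposition described for the standard semi-tied form can be illustrated with a small numerical sketch. It covers only the standard form, not the factored transform this paper introduces. The notation is an assumption rather than the paper's: `A` is a shared decorrelating transform, `H = inv(A)`, and each component's covariance is taken to be `H @ diag(diag_var) @ H.T`. Under that assumption, the component log-likelihood can be computed with a diagonal Gaussian in the transformed space plus a `log|det A|` term. The sketch checks this against a direct full-covariance evaluation. The function names and the random test data are hypothetical.

```python
import numpy as np


def semi_tied_log_likelihood(o, mu, diag_var, A):
    """Log N(o; mu, Sigma), assuming Sigma = H diag(diag_var) H^T with H = inv(A).

    Only the diagonal variances are component-specific; A is shared.
    """
    d = o.shape[0]
    z = A @ (o - mu)  # residual in the decorrelated space
    _, log_abs_det_a = np.linalg.slogdet(A)
    return log_abs_det_a - 0.5 * (
        d * np.log(2.0 * np.pi)
        + np.sum(np.log(diag_var))
        + np.sum(z * z / diag_var)
    )


def full_covariance_log_likelihood(o, mu, sigma):
    """Reference evaluation of log N(o; mu, sigma) with a full covariance."""
    d = o.shape[0]
    diff = o - mu
    _, log_det = np.linalg.slogdet(sigma)
    mahalanobis = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + mahalanobis)


def demo(seed=0, dim=4, n_components=3):
    """Return the largest gap between the two evaluations over random components."""
    rng = np.random.default_rng(seed)
    # Hypothetical shared transform, shifted towards the identity to keep it well conditioned.
    A = rng.normal(size=(dim, dim)) + dim * np.eye(dim)
    H = np.linalg.inv(A)
    o = rng.normal(size=dim)
    max_gap = 0.0
    for _ in range(n_components):
        mu = rng.normal(size=dim)
        diag_var = rng.uniform(0.5, 2.0, size=dim)  # component-specific variances
        sigma = H @ np.diag(diag_var) @ H.T
        gap = abs(
            semi_tied_log_likelihood(o, mu, diag_var, A)
            - full_covariance_log_likelihood(o, mu, sigma)
        )
        max_gap = max(max_gap, gap)
    return max_gap


if __name__ == "__main__":
    print("largest difference:", demo())
```

In this sketch every component shares the same `A`, so each component adds only a mean and a diagonal variance vector, while the effective covariance is still full.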

reference text

[1] A P Dempster, N M Laird, and D B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.

[2] M J F Gales. Maximum likelihood multiple projection schemes for hidden Markov models. Technical Report CUED/F-INFENG/TR365, Cambridge University, 1999. Available via anonymous ftp from: svr-ftp.eng.cam.ac.uk.

[3] M J F Gales. Semi-tied covariance matrices for hidden Markov models. IEEE Transactions on Speech and Audio Processing, 7:272-281, 1999.

[4] N K Goel and R Gopinath. Multiple linear transforms. In Proceedings ICASSP, 2001. To appear.

[5] N Kumar. Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition. PhD thesis, Johns Hopkins University, 1997.

[6] L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, February 1989.

[7] S Roweis and Z Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11:305-345, 1999.

[8] P C Woodland, J J Odell, V Valtchev, and S J Young. The development of the 1994 HTK large vocabulary speech recognition system. In Proceedings ARPA Workshop on Spoken Language Systems Technology, pages 104-109, 1995.

[9] S J Young, J Jansen, J Odell, D Ollason, and P Woodland. The HTK Book (for HTK Version 2.0). Cambridge University, 1996.