172 nips-2001-Speech Recognition using SVMs

Author: N. Smith, Mark Gales

Abstract: An important issue in applying SVMs to speech recognition is the ability to classify variable-length sequences. This paper presents extensions to a standard scheme for handling this variable-length data, the Fisher score. A more useful mapping is introduced based on the likelihood ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space. The mapping, and appropriate normalisation schemes, are evaluated on a speaker-independent isolated letter task where the new mapping outperforms both the Fisher score and HMMs trained to maximise likelihood.
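
As a rough illustration of how a score-space turns a sequence of any length into a fixed-dimension vector that an SVM can consume, the sketch below computes a Fisher score and a likelihood-ratio score. It rests on assumptions that are not in the abstract: the paper's HMMs are replaced by a toy i.i.d. scalar Gaussian so the gradients can be written by hand, the likelihood-ratio vector is taken to be the log-likelihood ratio stacked with the gradients of each class model, and all function names are hypothetical.

```python
import math

# Assumption: each "generative model" is a (mean, variance) pair for an
# i.i.d. scalar Gaussian. The paper uses HMMs; this stand-in only keeps
# the example short.

def log_likelihood(seq, mean, var):
    """Log-likelihood of a sequence of scalars under the toy Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (o - mean) ** 2 / (2 * var)
        for o in seq
    )

def gradient(seq, mean, var):
    """Gradient of the log-likelihood with respect to (mean, variance)."""
    d_mean = sum((o - mean) / var for o in seq)
    d_var = sum(-0.5 / var + (o - mean) ** 2 / (2 * var ** 2) for o in seq)
    return [d_mean, d_var]

def fisher_score(seq, model):
    """Fisher score: gradient of the log-likelihood of one model."""
    return gradient(seq, *model)

def likelihood_ratio_score(seq, model_1, model_2):
    """Assumed form: log-likelihood ratio, then the gradient for class 1,
    then the negated gradient for class 2."""
    ratio = log_likelihood(seq, *model_1) - log_likelihood(seq, *model_2)
    grad_1 = gradient(seq, *model_1)
    grad_2 = gradient(seq, *model_2)
    return [ratio] + grad_1 + [-g for g in grad_2]

# Sequences of different lengths map to vectors of the same dimension.
short_seq = [0.1, 0.2, 0.3]
long_seq = [0.4] * 7
class_1, class_2 = (0.0, 1.0), (1.0, 2.0)
print(len(fisher_score(short_seq, class_1)), len(fisher_score(long_seq, class_1)))
print(len(likelihood_ratio_score(short_seq, class_1, class_2)),
      len(likelihood_ratio_score(long_seq, class_1, class_2)))
```

The resulting vectors could then be passed to any SVM implementation. The normalisation schemes mentioned in the abstract are not shown here.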

References

[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.

[2] N. Smith, M. Gales, and M. Niranjan. Data-dependent kernels in SVM classification of speech patterns. Tech. Report CUED/F-INFENG/TR.387, Cambridge University Eng. Dept., April 2001.

[3] K. Tsuda et al. A New Discriminative Kernel from Probabilistic Models. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.

[4] T. Jaakkola and D. Haussler. Exploiting Generative Models in Discriminative Classifiers. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems 11. MIT Press, 1999.

[5] N. Oliver, B. Schölkopf, and A. Smola. Advances in Large-Margin Classifiers, chapter Natural Regularization from Generative Models. MIT Press, 2000.

[6] S. Fine, J. Navratil, and R. Gopinath. A hybrid GMM/SVM approach to speaker identification. In Proceedings, volume 1, International Conference on Acoustics, Speech, and Signal Processing, May 2001. Utah, USA.

[7] N. Smith and M. Gales. Using SVMs to classify variable length speech patterns. Tech. Report CUED/F-INFENG/TR.412, Cambridge University Eng. Dept., June 2001.

[8] M. Fanty and R. Cole. Spoken Letter Recognition. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Neural Information Processing Systems 3, pages 220-226. Morgan Kaufmann Publishers, 1991.

[9] T. Joachims. Making Large-Scale SVM Learning Practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.

[10] P.C. Loizou and A.S. Spanias. High-Performance Alphabet Recognition. IEEE Transactions on Speech and Audio Processing, 4(6):430-445, Nov. 1996.