
227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions


Source: pdf

Author: Zahi Karam, Douglas Sturim, William M. Campbell

Abstract: Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications—speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison—support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.
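The abstract describes scoring as an inner product between GMM parameter vectors, with nuisance compensation applied as a linear transform. The sketch below illustrates that general idea only; it is not one of the paper's IPDF definitions, which the abstract does not spell out. It assumes both utterances share mixture weights and diagonal variances taken from a common background model, scales each adapted mean by sqrt(weight)/sqrt(variance) in the style of a GMM supervector kernel, and removes orthonormal nuisance directions with the projection (I - U U^T). All function and parameter names are hypothetical.

```python
from math import sqrt


def supervector(weights, means, variances):
    """Stack scaled component means of a diagonal-covariance GMM into one vector.

    Assumption: each mean entry is scaled by sqrt(weight) / sqrt(variance), so a
    plain dot product of two supervectors equals
    sum_i w_i * (m_a_i . m_b_i / var_i).
    """
    vec = []
    for w, mean, var in zip(weights, means, variances):
        for m, v in zip(mean, var):
            vec.append(sqrt(w) * m / sqrt(v))
    return vec


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(vec, nuisance_dirs):
    """Apply (I - U U^T) to vec, where the rows of nuisance_dirs are assumed
    to be orthonormal directions spanning the nuisance subspace."""
    out = list(vec)
    for u in nuisance_dirs:
        coeff = dot(vec, u)
        out = [o - coeff * ui for o, ui in zip(out, u)]
    return out


def compare(means_a, means_b, weights, variances, nuisance_dirs=()):
    """Score two adapted models: compensate both supervectors, then take
    their inner product."""
    a = project_out(supervector(weights, means_a, variances), nuisance_dirs)
    b = project_out(supervector(weights, means_b, variances), nuisance_dirs)
    return dot(a, b)


# Toy example: two mixture components, two feature dimensions.
weights = [0.5, 0.5]
variances = [[1.0, 1.0], [4.0, 4.0]]
means_a = [[1.0, 0.0], [2.0, 2.0]]
means_b = [[1.0, 1.0], [0.0, 2.0]]

score_raw = compare(means_a, means_b, weights, variances)
score_compensated = compare(
    means_a, means_b, weights, variances,
    nuisance_dirs=[[1.0, 0.0, 0.0, 0.0]],  # hypothetical nuisance direction
)
```

Once the supervectors are formed, the comparison costs a single dot product per trial, which is consistent with the abstract's point that such scoring can be computationally light.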


reference text

[1] Douglas A. Reynolds, T. F. Quatieri, and R. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1-3, pp. 19–41, 2000.

[2] W. M. Campbell, D. E. Sturim, D. A. Reynolds, and A. Solomonoff, “SVM based speaker verification using a GMM supervector kernel and NAP variability compensation,” in Proc. ICASSP, 2006, pp. I97–I100.

[3] C. Longworth and M. J. F. Gales, “Derivative and parametric kernels for speaker verification,” in Proc. Interspeech, 2007, pp. 310–313.

[4] W. M. Campbell, “Generalized linear discriminant sequence kernels for speaker recognition,” in Proc. ICASSP, 2002, pp. 161–164.

[5] P. Kenny, P. Ouellet, N. Dehak, V. Gupta, and P. Dumouchel, “A study of inter-speaker variability in speaker verification,” IEEE Transactions on Audio, Speech and Language Processing, 2008.

[6] Ondrej Glembek, Lukas Burget, Najim Dehak, Niko Brummer, and Patrick Kenny, “Comparison of scoring methods used in speaker recognition with joint factor analysis,” in Proc. ICASSP, 2009.

[7] Pedro J. Moreno, Purdy P. Ho, and Nuno Vasconcelos, “A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications,” in Adv. in Neural Inf. Proc. Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. MIT Press, Cambridge, MA, 2004.

[8] Keinosuke Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.

[9] Simon Lucey and Tsuhan Chen, “Improved speaker verification through probabilistic subspace adaptation,” in Proc. Interspeech, 2003, pp. 2021–2024.

[10] Robbie Vogt, Brendan Baker, and Sridha Sridharan, “Modelling session variability in text-independent speaker verification,” in Proc. Interspeech, 2005, pp. 3117–3120.

[11] Mark J. F. Gales, “Cluster adaptive training of hidden Markov models,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 4, pp. 417–428, 2000.

[12] M. A. Przybocki, A. F. Martin, and A. N. Le, “NIST speaker recognition evaluations utilizing the Mixer corpora—2004, 2005, 2006,” IEEE Trans. on Speech, Audio, Lang., vol. 15, no. 7, pp. 1951–1959, 2007.

[13] Roland Auckenthaler, Michael Carey, and Harvey Lloyd-Thomas, “Score normalization for text-independent speaker verification systems,” Digital Signal Processing, vol. 10, pp. 42–54, 2000.

[14] J. Odell, D. Ollason, P. Woodland, S. Young, and J. Jansen, The HTK Book for HTK V2.0, Cambridge University Press, Cambridge, UK, 1995.