
156 nips-2003-Phonetic Speaker Recognition with Support Vector Machines


Author: William M. Campbell, Joseph P. Campbell, Douglas A. Reynolds, Douglas A. Jones, Timothy R. Leek

Abstract: A recent area of significant progress in speaker recognition is the use of high-level features such as idiolect, phonetic relations, prosody, and discourse structure. A speaker not only has a distinctive acoustic sound but also uses language in a characteristic manner. Large corpora of speech data available in recent years allow experimentation with long-term statistics of an individual's phone patterns, word patterns, and so on. We propose the use of support vector machines and term frequency analysis of phone sequences to model a given speaker. To this end, we explore techniques for text categorization applied to the problem. We derive a new kernel based upon a linearization of likelihood ratio scoring. We introduce a new phone-based SVM speaker recognition approach that halves the error rate of conventional phone-based approaches.
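The term frequency analysis of phone sequences mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the function names, the choice of bigrams, and the weighting of each n-gram by the inverse of its background frequency are hypothetical and should not be read as the authors' exact kernel.

```python
from collections import Counter


def ngram_frequencies(phones, n=2):
    """Relative frequencies of phone n-grams in one phone sequence.

    `phones` is a list of phone labels, e.g. the output of a phone
    recognizer for one conversation side (hypothetical input format).
    """
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def weighted_linear_kernel(freq_a, freq_b, background):
    """Inner product of two n-gram frequency vectors.

    Each shared n-gram contributes freq_a * freq_b / background, i.e.
    both vectors are scaled by 1/sqrt(background frequency). This
    weighting is an assumption for this sketch; n-grams missing from
    the background statistics are skipped.
    """
    shared = set(freq_a) & set(freq_b)
    return sum(
        freq_a[g] * freq_b[g] / background[g]
        for g in shared
        if background.get(g, 0) > 0
    )
```

A kernel value computed this way for every pair of training sequences could be passed to any SVM trainer that accepts a precomputed kernel matrix.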

reference text

[1] Douglas A. Reynolds, T. F. Quatieri, and R. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1-3, pp. 19–41, 2000.

[2] W. M. Campbell, “Generalized linear discriminant sequence kernels for speaker recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2002, pp. 161–164.

[3] T. F. Quatieri, D. A. Reynolds, and G. C. O’Leary, “Estimation of handset nonlinearity with application to speaker recognition,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 5, pp. 567–584, 2000.

[4] Astrid Schmidt-Nielsen and Thomas H. Crystal, “Speaker verification by human listeners: Experiments comparing human and machine performance using the NIST 1998 speaker evaluation data,” Digital Signal Processing, vol. 10, pp. 249–266, 2000.

[5] G. Doddington, “Speaker recognition based on idiolectal differences between speakers,” in Proceedings of Eurospeech, 2001, pp. 2521–2524.

[6] Walter D. Andrews, Mary A. Kohler, Joseph P. Campbell, John J. Godfrey, and Jaime Hernández-Cordero, “Gender-dependent phonetic refraction for speaker recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2002, pp. I149–I153.

[7] David Klusáček, Jiří Navrátil, D. A. Reynolds, and J. P. Campbell, “Conditional pronunciation modeling in speaker detection,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2003, pp. IV–804–IV–807.

[8] Andre Adami, Radu Mihaescu, Douglas A. Reynolds, and John J. Godfrey, “Modeling prosodic dynamics for speaker recognition,” in Proceedings of the International Conference on Acoustics Speech and Signal Processing, 2003, pp. IV–788–IV–791.

[9] M. Przybocki and A. Martin, “The NIST year 2003 speaker recognition evaluation plan,” http://www.nist.gov/speech/tests/spk/2003/index.htm, 2003.

[10] Linguistic Data Consortium, “Switchboard-2 corpora,” http://www.ldc.upenn.edu.

[11] M. Zissman, “Comparison of four approaches to automatic language identification of telephone speech,” IEEE Trans. Speech and Audio Processing, vol. 4, no. 1, pp. 31–44, 1996.

[12] Thorsten Joachims, Learning to Classify Text Using Support Vector Machines, Kluwer Academic Publishers, 2002.

[13] G. Salton and C. Buckley, “Term weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.

[14] Ronan Collobert and Samy Bengio, “SVMTorch: Support vector machines for large-scale regression problems,” Journal of Machine Learning Research, vol. 1, pp. 143–160, 2001.

[15] Alvin Martin, G. Doddington, T. Kamm, M. Ordowski, and Marc Przybocki, “The DET curve in assessment of detection task performance,” in Proceedings of Eurospeech, 1997, pp. 1895–1898.