
Blind One-Microphone Speech Separation: A Spectral Learning Approach (NIPS 2004, paper 31)



Authors: Francis R. Bach, Michael I. Jordan

Abstract: We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate the problem of speech separation as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately-recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [1]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures.
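The pipeline the abstract describes can be illustrated with a minimal, self-contained sketch (not the authors' implementation): two separately generated sources are artificially superposed, the mixture's spectrogram is computed, an affinity matrix over time-frequency bins is built from a hypothetical cue (frequency proximity, standing in for the learned psychophysical features), and the bins are split with the second eigenvector of the normalized affinity, in the spirit of normalized cuts [7]. All signal parameters and the affinity bandwidth below are illustrative assumptions.

```python
import numpy as np

fs, n_fft, hop = 8000, 256, 128  # assumed toy analysis parameters

# Two separately "recorded" sources: a low tone and a high tone.
# Superposing them artificially gives the mixture (and ground truth).
t = np.arange(fs) / fs
src1 = np.sin(2 * np.pi * 440 * t)
src2 = np.sin(2 * np.pi * 1760 * t)
mix = src1 + src2

def spectrogram(x):
    # Magnitude STFT: windowed frames, rFFT, shape (freq, time).
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

S = spectrogram(mix)

# Keep only the most energetic time-frequency bins to keep the graph small.
thresh = np.quantile(S, 0.98)
freqs, times = np.nonzero(S > thresh)

# Hypothetical affinity: bins are similar if they are close in frequency.
# (The paper instead combines several learned, speech-specific cues.)
d2 = (freqs[:, None] - freqs[None, :]).astype(float) ** 2
W = np.exp(-d2 / (2.0 * 8.0 ** 2))

# Normalized spectral clustering: eigendecompose D^{-1/2} W D^{-1/2}
# and split on the sign of the second eigenvector.
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
vals, vecs = np.linalg.eigh(L)          # ascending eigenvalues
labels = (vecs[:, -2] > 0).astype(int)  # two disjoint segments

# Each segment should gather the bins of one tone (one low, one high band).
mean_f = sorted(freqs[labels == k].mean() for k in (0, 1))
print(mean_f)
```

The sign split of the second eigenvector is the simplest two-way rounding of a spectral relaxation; the masks implied by `labels` would then be applied to the mixture's STFT to reconstruct each source.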


References

[1] F. R. Bach and M. I. Jordan. Learning spectral clustering. In NIPS 16, 2004.

[2] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2001.

[3] M. Zibulevsky, P. Kisilev, Y. Y. Zeevi, and B. A. Pearlmutter. Blind source separation via multinode sparse representation. In NIPS 14, 2002.

[4] O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Sig. Proc., 52(7):1830–1847, 2004.

[5] S. T. Roweis. One microphone source separation. In NIPS 13, 2001.

[6] G.-J. Jang and T.-W. Lee. A probabilistic approach to single channel source separation. In NIPS 15, 2003.

[7] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8):888–905, 2000.

[8] A. S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, 1990.

[9] G. J. Brown and M. P. Cooke. Computational auditory scene analysis. Computer Speech and Language, 8:297–333, 1994.

[10] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.

[11] M. Cooke and D. P. W. Ellis. The auditory organization of speech and other sources in listeners and computational models. Speech Communication, 35(3-4):141–177, 2001.

[12] B. Gold and N. Morgan. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley Press, 1999.

[13] S. Belongie, C. Fowlkes, F. Chung, and J. Malik. Spectral partitioning with indefinite kernels using the Nyström extension. In ECCV, 2002.

[14] G. Wahba. Spline Models for Observational Data. SIAM, 1990.

[15] F. R. Bach and M. I. Jordan. Discriminative training of hidden Markov models for multiple pitch tracking. In ICASSP, 2005.