
17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Source: pdf

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models.
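The idea of describing a mixture as a combination of the training data itself can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes nonnegative magnitude spectra, uses the training frames directly as a fixed dictionary, fits nonnegative weights with standard KL-divergence multiplicative updates, and splits the mixture with a soft mask. It imposes no explicit sparsity prior, and the function name `separate_frame` and all of its parameters are hypothetical.

```python
import numpy as np

def separate_frame(mix, train_a, train_b, n_iter=200, eps=1e-12):
    """Split one mixture spectrum using training frames as the dictionary.

    mix:     (F,) nonnegative magnitude spectrum of the mixture
    train_a: (F, Na) training spectra for source A (one frame per column)
    train_b: (F, Nb) training spectra for source B
    Returns (estimate_a, estimate_b, weights).
    """
    # The dictionary is the training data itself; scale each column to sum to 1.
    W = np.hstack([train_a, train_b]).astype(float)
    W = W / (W.sum(axis=0, keepdims=True) + eps)

    # Start from uniform positive weights.
    h = np.full(W.shape[1], mix.sum() / W.shape[1])

    # Multiplicative updates that reduce the KL divergence between mix and W @ h
    # while keeping the dictionary W fixed.
    col_sums = W.sum(axis=0) + eps
    for _ in range(n_iter):
        approx = W @ h + eps
        h *= (W.T @ (mix / approx)) / col_sums

    # Reconstruct each source's share and apply a soft mask to the mixture.
    na = train_a.shape[1]
    part_a = W[:, :na] @ h[:na]
    part_b = W[:, na:] @ h[na:]
    total = part_a + part_b + eps
    return mix * part_a / total, mix * part_b / total, h
```

In practice the function would be applied to each column of a mixture spectrogram, and the two masked spectrograms would be inverted with the mixture phase. A sparsity-promoting prior on the weights, as the abstract describes, would have to be added on top of this sketch.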

reference text

[1] S. T. Roweis, One microphone source separation, in Advances in Neural Information Processing Systems, 2001.

[2] Reddy, A.M. and B. Raj. Soft Mask Methods for Single-Channel Speaker Separation, in IEEE Transactions on Audio, Speech, and Language Processing, Volume: 15, Issue: 6, Aug 2007.

[3] T. Kristjansson, J. Hershey, P. Olsen, S. Rennie, and R. Gopinath, Super-human multitalker speech recognition: The IBM 2006 speech separation challenge system, in International Conference on Spoken Language Processing (INTERSPEECH), 2006, pp. 97–100, Kluwer Academic Publishers, ch. 20, pp. 295–304.

[4] Casey, M.A., and A. Westner. Separation of mixed audio sources by independent subspace analysis, in Proceedings of the International Conference of Computer Music, 2000.

[5] Jang, G.-J., T.-W. Lee. A Maximum Likelihood Approach to Single-channel Source Separation, in Journal of Machine Learning Research 4 (2003) pp. 1365–1392.

[6] Pearlmutter, B., M. Zibulevsky, Blind Source Separation by Sparse Decomposition in a Signal Dictionary, in Neural Computation 13, pp. 863–882. 2001.

[7] L. Benaroya, L. McDonagh, F. Bimbot, and R. Gribonval, Non negative sparse representation for Wiener based source separation with a single sensor, in Acoustics, Speech, and Signal Processing, IEEE International Conference on, 2003, pp. 613–616.

[8] M. N. Schmidt and R. K. Olsson, Single-channel speech separation using sparse nonnegative matrix factorization, in International Conference on Spoken Language Processing (INTERSPEECH), 2006.

[9] T. Virtanen, Sound source separation using sparse coding with temporal continuity objective, in International Computer Music Conference, ICMC, 2003.

[10] Smaragdis, P. Raj, B. and Shashanka, M.V. 2007. Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures. In proceedings of ICA 2007. London, UK. September 2007.

[11] Raj, B.; Smaragdis, P. 2005. Latent Variable Decomposition of Spectrograms for single channel speaker separation. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October, 2005.

[12] Shashanka, M.V., B. Raj, P. Smaragdis, 2007. Sparse Overcomplete Latent Variable Decomposition of Counts Data. In Neural Information Processing Systems (NIPS), Vancouver, BC, Canada. December 2007.

[13] Brand, M.E. Pattern Discovery via Entropy Minimization. In Uncertainty 99, AISTATS99, 1999.

[14] Corless, R.M., G.H. Gonnet, D.E.G. Hare, D.J. Jeffrey, and D.E. Knuth. On the Lambert W Function. Advances in Computational Mathematics, 1996.

[15] Bouguila N. and D. Ziou. Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications, Pattern Recognition Letters, Volume 26, Issue 12, September 2005.

[16] Hinneburg, A., Gabriel, H.-H. and Gohr, A. Bayesian Folding-In with Dirichlet Kernels for PLSI, in Seventh IEEE International Conference on Data Mining, Oct. 2007.

[17] Févotte, C., R. Gribonval and E. Vincent. 2005. BSS_EVAL Toolbox User Guide, IRISA Technical Report 1706, Rennes, France, April 2005.