
130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes

Source: pdf

Author: Richard Turner, Maneesh Sahani

Abstract: Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: sentences (∼ 1 s); phonemes (∼ 10⁻¹ s); glottal pulses (∼ 10⁻² s); and formants (10⁻³ s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscience-inspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is, however, a discord: current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds, and the longer structures are ignored. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to simultaneously process data both at high resolution (to extract short temporal information) and for long duration (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing data task.

reference text

[1] Bregman, A.S. (1990) Auditory Scene Analysis. MIT Press.

[2] Smith, E. & Lewicki, M.S. (2006) Efficient Auditory Coding. Nature 439 (7079).

[3] Simoncelli, E.P. (2003) Vision and the statistics of the visual environment. Curr Opin Neurobiol 13(2):144-9.

[4] Patterson, R.D. (2000) Auditory images: How complex sounds are represented in the auditory system. J Acoust Soc Japan (E) 21(4):183-190.

[5] Griffin, D. & Lim, J. (1984) Signal estimation from modified short-time Fourier transform. IEEE Trans. on ASSP 32(2):236-243.

[6] Qi, Y., Minka, T. & Picard, R.W. (2002) Bayesian Spectrum Estimation of Unevenly Sampled Nonstationary Data. MIT Media Lab Technical Report Vismod-TR-556.

[7] Attias, H. & Schreiner, C.E. (1997) Low-Order Temporal Statistics of Natural Sounds. Adv in Neural Info Processing Sys 9. MIT Press.

[8] Anonymous Authors (2007) Probabilistic Amplitude Demodulation. ICA 2007 Conference Proceedings. Springer, in press.

[9] Moore, B.C.J. (2003) An Introduction to the Psychology of Hearing. Academic Press.

[10] Karklin, Y. & Lewicki, M.S. (2005) A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Comput 17(2):397-423.