nips nips2000 nips2000-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lucas C. Parra, Clay Spence, Paul Sajda
Abstract: We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies the scale of an otherwise stationary Gaussian process. [sent-3, score-1.042]
2 The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. [sent-5, score-1.054]
3 The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. [sent-6, score-0.193]
4 1 Introduction Recently, considerable attention has been paid to understanding and modeling the non-Gaussian or "higher-order" properties of natural signals, particularly images. [sent-8, score-0.246]
5 For example, marginal densities of features have been shown to have high kurtosis or "heavy tails", indicating a non-Gaussian, sparse representation. [sent-10, score-0.512]
6 Another example is the "bow-tie" shape of conditional distributions of neighboring features, indicating dependence of variances [11]. [sent-11, score-0.321]
7 These non-Gaussian properties have motivated a number of image and signal processing algorithms that attempt to exploit higher-order statistics of the signals. [sent-12, score-0.302]
8 In this paper we show that these previously observed higher-order phenomena are ubiquitous and can be accounted for by a model which simply varies the scale of an otherwise stationary Gaussian process. [sent-15, score-0.522]
9 This enables us to relate a variety of natural signals to one another and to spherically invariant random processes, which are well-known in the signal processing literature [6, 3]. [sent-16, score-0.842]
10 Finally we present the results of experiments with algorithms for finding a linear basis equivalent to independent components that exploit non-stationarity so as to require only 2nd-order statistics. [sent-20, score-0.243]
11 This simplification is possible whenever linearity and non-stationarity of independent sources are guaranteed, such as for the powers of acoustic signals. [sent-21, score-0.296]
12 2 Scale non-stationarity and high kurtosis Natural signals can be non-stationary in various ways, e.g. [sent-22, score-0.465]
13 varying powers, changing correlation of neighboring samples, or even non-stationary higher moments. [sent-24, score-0.218]
14 We will concentrate on the simplest possible variation and show in the following sections how it can give rise to many higher-order properties observed in natural signals. [sent-25, score-0.335]
15 We assume that at any given instant a signal is specified by a probability density function with zero mean and unknown scale or power. [sent-26, score-0.458]
16 The signal is assumed non-stationary in the sense that its power varies from one time instant to the next. [sent-27, score-0.199]
17 We can think of this as a stochastic process with samples z(t) drawn from a zero-mean distribution p_z(z), with samples possibly correlated in time. [sent-28, score-0.318]
18 We observe a scaled version of this process with time-varying scales s(t) > 0 sampled from p_s(s): x(t) = s(t) z(t). (1) The observable process x(t) is distributed according to p_x(x) = \int_0^\infty ds \, p_s(s) \, s^{-1} p_z(x/s). (2) [sent-29, score-0.204]
19 We refer to p_x(x) as the long-term distribution and p_z(z) as the instantaneous distribution. [sent-30, score-0.04]
20 In essence p_x(x) is a mixture distribution with infinitely many kernels s^{-1} p_z(x/s). [sent-31, score-0.095]
21 We would like to relate the sparseness of p_z(z), as measured by the kurtosis, to the sparseness of the observable distribution p_x(x). [sent-32, score-0.116]
22 For a zero-mean random variable x the kurtosis reduces, up to a constant, to K[x] = \langle x^4 \rangle / \langle x^2 \rangle^2. (3) [sent-35, score-0.165]
23 In this case we find that the kurtosis of the long-term distribution is always larger than the kurtosis of the instantaneous distribution unless the scale is stationary ([9], and [1] for symmetric p_z(z)): K[x] \ge K[z]. (4) [sent-37, score-0.757]
24 To see this, note that the independence of s and z implies \langle x^n \rangle_x = \langle s^n \rangle_s \langle z^n \rangle_z, and therefore K[x] = K[z] \langle s^4 \rangle_s / \langle s^2 \rangle_s^2; by Jensen's inequality \langle s^4 \rangle_s \ge \langle s^2 \rangle_s^2, with equality only for a constant scale. [sent-38, score-0.042]
25 Together this leads to inequality (4), which states that for a fixed scale s(t), i.e. [sent-40, score-0.2]
26 when the magnitude of the signal is stationary, the kurtosis will be minimal. [sent-42, score-0.356]
27 Conversely, non-stationary signals, defined as a variable scaling of an otherwise stationary process, will have increased kurtosis. [sent-43, score-0.147]
28 (1) Throughout this paper we will refer to signals that are sampled in time. [sent-44, score-0.319]
29 Note that all the arguments apply equally well to spatial rather than temporal sampling, that is, to images rather than time series. [sent-45, score-0.044]
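As a quick numerical check of this argument, the following sketch (our illustration, not part of the paper) modulates a stationary Gaussian z(t) by an independent positive scale s(t) and verifies relation (4); the lognormal form of the scale is an assumption made only for this example, and slow variation of s(t) is not needed for the kurtosis result, only the independence of s and z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Stationary Gaussian carrier z(t): K[z] = <z^4>/<z^2>^2 = 3.
z = rng.standard_normal(n)

# Independent positive scale s(t); the lognormal choice is an assumption
# for this sketch. Any non-constant p_s(s) yields K[x] > K[z].
s = np.exp(0.5 * rng.standard_normal(n))

x = s * z  # observed non-stationary signal, Equation (1)

def kurt(v):
    # Kurtosis in the convention of Equation (3): K[v] = <v^4>/<v^2>^2.
    return np.mean(v**4) / np.mean(v**2) ** 2

print("K[z]               =", kurt(z))  # ~3 for a Gaussian
print("K[x]               =", kurt(x))  # noticeably larger: heavy tails
print("K[z] <s^4>/<s^2>^2 =", kurt(z) * np.mean(s**4) / np.mean(s**2) ** 2)
```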
30 Figure 1: Marginal distributions within 3 standard deviations, shown on a logarithmic scale; left to right: natural image features, speech sound intensities, stock market variation, MEG alpha activity. [sent-46, score-0.863]
31 On top the empirical histograms are presented and on bottom the model distributions. [sent-52, score-0.239]
32 The speech data has been fit with a Meijer-G function [3]. [sent-53, score-0.206]
33 For the MEG activity, the stock market data, and the image features, a mixture of zero-mean Gaussians was used. [sent-54, score-0.563]
34 Figure 1 shows empirical plots of the marginal distributions for four natural signals: image, speech, stock market, and MEG data. [sent-55, score-0.554]
35 As the image feature we used a wavelet component of a 162x162 natural texture image of sand (presented in [4]). [sent-56, score-0.321]
36 The speech data is a 3 s recording of a female speaker, sampled at 8 kHz with a noise level below -25 dB. [sent-59, score-0.12]
37 The signal has been band-limited between 300 Hz and 3.4 kHz. [sent-60, score-0.096]
38 The market data are the daily closing values of the New York Stock Exchange composite index from 02/01/1990 to 04/28/2000. [sent-62, score-0.15]
39 We analyzed the variation from the one-day linear prediction value to remove the upward trend of the last decade. [sent-63, score-0.052]
40 The MEG data is band-passed (10-12 Hz) alpha activity of an independent component of 122 MEG signals. [sent-64, score-0.186]
41 This independent component exhibits alpha de-synchronization for a visuo-motor integration task [10]. [sent-65, score-0.089]
42 One can see that in all four cases the kurtosis is high relative to a Gaussian (K = 3). [sent-66, score-0.258]
43 Our claim is that for natural signals, high kurtosis is a natural result of the scale non-stationarity of the signal. [sent-67, score-0.765]
44 Additional evidence comes from the behavior seen in the conditional histograms of the joint distributions, presented in the next section. [sent-68, score-0.393]
45 Figure 2 shows empirical conditional histograms for the four types of natural signals we considered earlier. [sent-70, score-0.719]
46 One can see that the speech and stock-market data exhibit the same variance dependency, or "bow-tie" shape, seen in images. [sent-71, score-0.336]
47 Figure 2: (Top) Empirical conditional histograms and (bottom) model conditional densities derived from the one-dimensional marginals presented in the previous figure, assuming the data is sampled from a SIRP. [sent-72, score-0.548]
48 The good correspondence validates the SIRP assumption, which is equivalent to our non-stationary scale model for slowly varying scales. [sent-73, score-0.279]
49 The model of Equation 1 can easily account for this observation if we assume slowly changing scales s(t). [sent-74, score-0.08]
50 A possible explanation is that neighboring samples or features exhibit a common scale. [sent-75, score-0.362]
51 If two zero-mean stochastic variables are both scaled by the same factor, their magnitudes and variances will increase together. [sent-76, score-0.223]
52 That is, as the magnitude of one variable increases, so will the magnitude and the variance of the other variable. [sent-77, score-0.12]
53 This broadens the histogram of one variable as the value of the conditioning variable increases, resulting in a "bow-tie" shaped conditional density. [sent-78, score-0.165]
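A minimal simulation of this common-scale explanation (again our sketch, with an assumed lognormal scale shared by two features) reproduces the broadening directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# One scale factor shared by two "neighboring" features, as hypothesized above.
s = np.exp(0.5 * rng.standard_normal(n))
x1 = s * rng.standard_normal(n)
x2 = s * rng.standard_normal(n)

# Bin on |x1| and measure the spread of x2 within each bin; the growth of
# std(x2 | x1) with |x1| is the "bow-tie" signature of Figure 2.
edges = np.quantile(np.abs(x1), np.linspace(0, 1, 9))
which = np.digitize(np.abs(x1), edges[1:-1])
for b in range(8):
    sel = which == b
    print(f"|x1| bin {b}: std(x2 | x1) = {x2[sel].std():.2f}")
```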
54 4 Relationship to spherically invariant random processes A class of signals closely related to those of Equation 1 is the so-called spherically invariant random process (SIRP). [sent-79, score-0.443]
55 If the signals are short-time Gaussian and the powers vary slowly, the signals described above are approximately SIRPs. [sent-80, score-0.771]
56 Despite the restriction to Gaussian-distributed z, SIRPs have been shown to be a good model for a range of stochastic processes with very different higher-order properties, depending on the scale distribution p_s(s). [sent-81, score-0.341]
57 They have been used in a variety of signal processing applications [6]. [sent-82, score-0.164]
58 If z is multidimensional, such as a window of samples in a time series or a multi-dimensional feature vector, one speaks of spherically invariant random vectors (SIRVs). [sent-84, score-0.066]
59 Natural images have been modeled by what is in essence closely related to SIRVs: an infinite mixture of zero-mean Gaussian features [11]. [sent-85, score-0.298]
60 The fundamental property of SIRPs is that the joint distribution of a SIRP is entirely defined by a univariate characteristic function C_x(u) and the covariance \Sigma of neighboring samples [6]. [sent-87, score-0.625]
61 They are directly related to our scale non-stationarity model through a theorem by Kingman and Yao, which states that any SIRP is equivalent to a zero-mean Gaussian process z(t) with an independent stochastic scale s. [sent-88, score-0.483]
62 Furthermore, the univariate characteristic function C_x(u) specifies p_s(s) and the 1D marginal p_x(x), and vice versa [6]. [sent-89, score-0.263]
63 From the characteristic function C_x(u) and the covariance \Sigma one can also construct all higher-dimensional joint densities. [sent-90, score-0.355]
64 Comparing the resulting 2D joint density to the observed joint density allows us to verify the assumption that the data is sampled from a SIRP. [sent-92, score-0.563]
65 In so doing we can more firmly assert that the observed two dimensional joint histograms can in fact be explained as a Gaussian process with a non-stationary scale. [sent-93, score-0.484]
66 If we use zero-mean Gaussian mixtures, p_1(x) = \sum_i m_i \exp(-x^2/\sigma_i^2), as the 1D model distribution, the resulting 2D joint distribution is simply p_2(x) = \sum_i m_i \exp(-x^T \Sigma^{-1} x / \sigma_i^2). [sent-94, score-0.29]
67 If the model density is given by a Meijer-G function, as suggested in [3], p_1(x) \propto G(\lambda^2 x^2), then the 2D joint is p_2(x) \propto G(\lambda^2 x^T \Sigma^{-1} x), with the orders and parameters of the G-function as given in [3]. [sent-95, score-0.11] [sent-97, score-0.156]
69 Brehm has used this approach to demonstrate that band-limited speech is well described by a SIRP [3]. [sent-100, score-0.206]
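To make the construction concrete, here is a sketch of the Gaussian-mixture variant with hand-picked weights and variances (in the paper the mixture is fit to the empirical 1D histogram and the covariance is estimated from neighboring samples; the 1/2 factors in the Gaussians are just our normalization convention):

```python
import numpy as np

# Assumed zero-mean Gaussian mixture for the 1D marginal.
m = np.array([0.6, 0.3, 0.1])    # mixture weights
s2 = np.array([0.5, 2.0, 8.0])   # per-kernel variances
rho = 0.4                        # correlation of neighboring samples
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Sinv, detS = np.linalg.inv(Sigma), np.linalg.det(Sigma)

def p2(x):
    # 2D SIRP joint built from the 1D mixture and Sigma alone:
    # p2(x) = sum_i m_i N(x; 0, s2_i * Sigma).
    q = x @ Sinv @ x
    return np.sum(m * np.exp(-q / (2 * s2)) / (2 * np.pi * s2 * np.sqrt(detS)))

# The conditional p(x2 | x1) broadens as |x1| grows, i.e. the bow-tie:
grid = np.linspace(-6, 6, 241)
dx = grid[1] - grid[0]
for x1 in (0.0, 1.0, 2.0):
    cond = np.array([p2(np.array([x1, t])) for t in grid])
    cond /= cond.sum() * dx                     # normalize the slice
    mean = (grid * cond).sum() * dx
    var = ((grid - mean) ** 2 * cond).sum() * dx
    print(f"x1 = {x1}: Var(x2 | x1) = {var:.2f}")
```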
70 In addition, we show here that the same is true for the image features and stock market data presented above. [sent-101, score-0.429]
71 The model conditional densities shown in Figure 2 correspond well with the empirical conditional histograms. [sent-102, score-0.38]
72 We emphasize that these model 2D joint densities have been obtained only from the 1D marginal of Figure 1 and the covariance of neighboring samples. [sent-104, score-0.674]
73 The deviations between the observed and model 2D joint distributions are likely due to variability of the covariance itself; that is, not only does the overall scale or power vary with time, but the components of the covariance matrix also vary independently of each other. [sent-105, score-0.832]
74 For example, in speech the covariance of neighboring samples is well known to change considerably over time. [sent-106, score-0.562]
75 Nevertheless, the surprising result is that a simple scale non-stationarity model can reproduce the higher-order statistical properties of a variety of natural signals. [sent-107, score-0.659]
76 5 Spectro-temporal linear basis for speech As an example of the utility of this non-stationarity assumption, we analyze the statistical properties of the powers of a single source, in particular for speech signals. [sent-108, score-0.772]
77 Motivated by the auditory spectro-temporal receptive fields reported in [5] and by work on receptive fields and independent components, we are interested in finding a linear basis of independent components in a spectro-temporal window of speech signals. [sent-109, score-0.56]
78 In [9, 8] we show that one can use second-order statistics to uniquely recover sources from a mixture, provided that the mix is linear and the sources are non-stationary. [sent-110, score-0.121]
79 One can do so by finding a basis that guarantees uncorrelated signals at multiple time intervals (the multiple decorrelation algorithm, MDA). [sent-111, score-0.31]
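A stripped-down sketch of this idea (not the authors' implementation): for non-stationary sources, covariance matrices of the mixtures estimated over different time intervals are diagonalized by one and the same basis. With only two intervals this reduces to a generalized eigenvalue problem; the full MDA jointly decorrelates many intervals. The source envelopes below are our toy choices.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
T, d = 40_000, 3
t = np.linspace(0, 1, T)

# Independent sources with clearly non-stationary powers (toy envelopes).
powers = np.stack([np.where(t < 0.5, 3.0, 0.5),   # loud, then quiet
                   np.where(t < 0.5, 0.5, 3.0),   # quiet, then loud
                   1.0 + 2.0 * t])                # ramping up
S = powers * rng.standard_normal((d, T))
A = rng.standard_normal((d, d))   # unknown square linear mix (assumed)
X = A @ S

C1 = np.cov(X[:, : T // 2])       # second-order statistics, interval 1
C2 = np.cov(X[:, T // 2 :])       # second-order statistics, interval 2

# W solves C2 w = lambda C1 w, so W^T C1 W and W^T C2 W are both diagonal.
lam, W = eigh(C2, C1)
S_hat = W.T @ X                   # recovered sources, up to scale and order

# Cross-correlation with the true sources is close to a permutation matrix.
print(np.round(np.corrcoef(S_hat, S)[:d, d:], 2))
```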
80 Our present model argues that features of natural signals, such as the powers in different frequency bands, can be assumed non-stationary, while the powers of independent signals are known to add. [sent-112, score-1.291]
81 Figure 3: Spectro-temporal representation of the utterance "We had a barbecue over the weekend at my house." In the vertical direction 21 Bark-scale power bands are displayed. [sent-115, score-0.289]
82 Shown is a 5 s segment of the 200 s recording used to compute the different linear bases. [sent-117, score-0.044]
83 The three lower diagrams show three sets of 15 linear basis components for 21x8 spectro-temporal segments of the speech powers. [sent-118, score-0.343]
84 We should therefore be able to identify with second-order methods the same linear components as with independent component algorithms, where higher-order statistical assumptions are invoked. [sent-122, score-0.161]
85 We compute the powers in 21 frequency bands on a Bark scale for short consecutive time intervals. [sent-123, score-0.489]
86 We choose to find a basis for a segment of 21 bands and 8 neighboring time slices corresponding to 128 ms of signal between 0 and 4 kHz. [sent-124, score-0.519]
87 We used half-overlapping windows of 256 samples, such that for an 8 kHz signal neighboring time slices are 16 ms apart. [sent-125, score-0.429]
88 A set of 7808 such spectro-temporal segments was sampled from 200 s of the same speech data presented previously. [sent-126, score-0.282]
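For reference, a sketch of this preprocessing (the Traunmueller Bark approximation, the Hann window, and the rectangular band summation are our assumptions; the paper's exact filterbank is not specified in this text):

```python
import numpy as np

def bark(f_hz):
    # Traunmueller's approximation of the Bark scale (an assumed choice).
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def band_powers(signal, fs=8000, n_fft=256, n_bands=21):
    hop = n_fft // 2                    # half-overlapping windows, 16 ms hops
    win = np.hanning(n_fft)
    frames = np.array([signal[i : i + n_fft] * win
                       for i in range(0, len(signal) - n_fft, hop)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)     # 0 .. 4 kHz
    edges = np.linspace(bark(1.0), bark(fs / 2), n_bands + 1)
    band = np.digitize(bark(np.maximum(freqs, 1.0)), edges) - 1
    return np.stack([spec[:, band == b].sum(axis=1)
                     for b in range(n_bands)])   # shape (21, n_frames)

# Placeholder signal standing in for the 8 kHz speech recording.
speech = np.random.default_rng(3).standard_normal(8000 * 5)
P = band_powers(speech)

# 21x8 spectro-temporal segments (128 ms each), the input to MDA/ICA/PCA.
segments = np.stack([P[:, i : i + 8].ravel()
                     for i in range(P.shape[1] - 8)])
print(segments.shape)   # (n_segments, 168)
```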
89 One can see that the components obtained with MDA are quite similar to the result of ICA and differ considerably from the principal components. [sent-128, score-0.107]
90 From this we conclude that speech powers can in fact be thought of as a linear combination of non-stationary independent components. [sent-129, score-0.462]
91 In general, the point we wish to make is to demonstrate the strength of second-order methods when the assumptions of non-stationarity, independence, and linear superposition are met. [sent-130, score-0.035]
92 6 Conclusion We have presented evidence that several higher-order statistical properties of natural signals can be explained by a simple scale non-stationary model. [sent-131, score-0.751]
93 For four types of natural signals, we have shown that a scale non-stationary model will reproduce the high-kurtosis behavior of the marginal densities. [sent-132, score-0.647]
94 Furthermore, for the case of scale non-stationarity with Gaussian density (SIRP), we have shown that we can reproduce the variance dependency seen in conditional histograms of the joint density directly from the empirical marginal densities. [sent-133, score-1.078]
95 This leads to the conclusion that a scale non-stationary model (e.g., a SIRP) can account for these observations. [sent-134, score-0.295]
96 We have shown that one can exploit the assumptions of this model to compute a linear basis for natural signals without having to invoke higher-order statistical techniques. [sent-137, score-0.589]
97 Though we do not claim that all higher-order properties or all natural signals can be explained by a scale non-stationary model, it is remarkable that such a simple model can account for a variety of the higher-order phenomena and for a variety of signal types. [sent-138, score-1.1]
98 Characteristic neurons in the primary auditory cortex of the awake primate using reverse correlation. [sent-170, score-0.035]
99 Detection in the presence of spherically symmetric random vectors. [sent-176, score-0.169]
100 Scale mixtures of Gaussians and the statistics of natural images. [sent-200, score-0.153]
wordName wordTfidf (topN-words)
[('signals', 0.243), ('sirp', 0.242), ('kurtosis', 0.222), ('meg', 0.207), ('speech', 0.206), ('scale', 0.2), ('powers', 0.2), ('neighboring', 0.18), ('joint', 0.156), ('px', 0.155), ('natural', 0.153), ('market', 0.15), ('stock', 0.15), ('histograms', 0.148), ('mda', 0.138), ('spherically', 0.135), ('marginal', 0.113), ('pz', 0.111), ('densities', 0.111), ('reproduce', 0.104), ('sirps', 0.104), ('signal', 0.096), ('properties', 0.093), ('characteristic', 0.09), ('parra', 0.089), ('alpha', 0.089), ('conditional', 0.089), ('bands', 0.089), ('cx', 0.081), ('khz', 0.081), ('invariant', 0.077), ('sampled', 0.076), ('stationary', 0.073), ('covariance', 0.073), ('ps', 0.072), ('components', 0.07), ('brehm', 0.069), ('clay', 0.069), ('engle', 0.069), ('lucas', 0.069), ('density', 0.069), ('variety', 0.068), ('basis', 0.067), ('features', 0.066), ('samples', 0.066), ('image', 0.063), ('explained', 0.062), ('univariate', 0.06), ('independent', 0.056), ('zero', 0.055), ('essence', 0.054), ('nonstationary', 0.054), ('bark', 0.054), ('variation', 0.052), ('distributions', 0.052), ('exploit', 0.05), ('empirical', 0.05), ('exhibit', 0.05), ('tails', 0.05), ('slices', 0.05), ('varies', 0.049), ('stochastic', 0.048), ('accounted', 0.047), ('vary', 0.046), ('gaussian', 0.046), ('process', 0.045), ('images', 0.044), ('variance', 0.044), ('spherical', 0.044), ('recording', 0.044), ('sand', 0.042), ('editors', 0.042), ('model', 0.041), ('mixture', 0.041), ('activity', 0.041), ('blind', 0.04), ('sparseness', 0.04), ('instantaneous', 0.04), ('sources', 0.04), ('phenomena', 0.039), ('slowly', 0.039), ('source', 0.039), ('variable', 0.038), ('magnitude', 0.038), ('varying', 0.038), ('mean', 0.038), ('observed', 0.037), ('claim', 0.037), ('considerably', 0.037), ('ms', 0.037), ('otherwise', 0.036), ('dependency', 0.036), ('pn', 0.036), ('relate', 0.036), ('hz', 0.036), ('dimensional', 0.036), ('four', 0.036), ('assumptions', 0.035), ('auditory', 0.035), ('random', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Author: Lucas C. Parra, Clay Spence, Paul Sajda
Abstract: We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1
2 0.17848183 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng
Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.
3 0.16351025 96 nips-2000-One Microphone Source Separation
Author: Sam T. Roweis
Abstract: Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting (
4 0.14097293 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System
Author: Odelia Schwartz, Eero P. Simoncelli
Abstract: We explore the statistical properties of natural sound stimuli preprocessed with a bank of linear filters. The responses of such filters exhibit a striking form of statistical dependency, in which the response variance of each filter grows with the response amplitude of filters tuned for nearby frequencies. These dependencies may be substantially reduced using an operation known as divisive normalization, in which the response of each filter is divided by a weighted sum of the rectified responses of other filters. The weights may be chosen to maximize the independence of the normalized responses for an ensemble of natural sounds. We demonstrate that the resulting model accounts for nonlinearities in the response characteristics of the auditory nerve, by comparing model simulations to electrophysiological recordings. In previous work (NIPS, 1998) we demonstrated that an analogous model derived from the statistics of natural images accounts for non-linear properties of neurons in primary visual cortex. Thus, divisive normalization appears to be a generic mechanism for eliminating a type of statistical dependency that is prevalent in natural signals of different modalities. Signals in the real world are highly structured. For example, natural sounds typically contain both harmonic and rythmic structure. It is reasonable to assume that biological auditory systems are designed to represent these structures in an efficient manner [e.g., 1,2]. Specifically, Barlow hypothesized that a role of early sensory processing is to remove redundancy in the sensory input, resulting in a set of neural responses that are statistically independent. Experimentally, one can test this hypothesis by examining the statistical properties of neural responses under natural stimulation conditions [e.g., 3,4], or the statistical dependency of pairs (or groups) of neural responses. Due to their technical difficulty, such multi-cellular experiments are only recently becoming possible, and the earliest reports in vision appear consistent with the hypothesis [e.g., 5]. An alternative approach, which we follow here, is to develop a neural model from the statistics of natural signals and show that response properties of this model are similar to those of biological sensory neurons. A number of researchers have derived linear filter models using statistical criterion. For visual images, this results in linear filters localized in frequency, orientation and phase [6, 7]. Similar work in audition has yielded filters localized in frequency and phase [8]. Although these linear models provide an important starting point for neural modeling, sensory neurons are highly nonlinear. In addition, the statistical properties of natural signals are too complex to expect a linear transformation to result in an independent set of components. Recent results indicate that nonlinear gain control plays an important role in neural processing. Ruderman and Bialek [9] have shown that division by a local estimate of standard deviation can increase the entropy of responses of center-surround filters to natural images. Such a model is consistent with the properties of neurons in the retina and lateral geniculate nucleus. Heeger and colleagues have shown that the nonlinear behaviors of neurons in primary visual cortex may be described using a form of gain control known as divisive normalization [10], in which the response of a linear kernel is rectified and divided by the sum of other rectified kernel responses and a constant. 
We have recently shown that the responses of oriented linear filters exhibit nonlinear statistical dependencies that may be substantially reduced using a variant of this model, in which the normalization signal is computed from a weighted sum of other rectified kernel responses [11, 12]. The resulting model, with weighting parameters determined from image statistics, accounts qualitatively for physiological nonlinearities observed in primary visual cortex. In this paper, we demonstrate that the responses of bandpass linear filters to natural sounds exhibit striking statistical dependencies, analogous to those found in visual images. A divisive normalization procedure can substantially remove these dependencies. We show that this model, with parameters optimized for a collection of natural sounds, can account for nonlinear behaviors of neurons at the level of the auditory nerve. Specifically, we show that: 1) the shape of frequency tuning curves varies with sound pressure level, even though the underlying linear filters are fixed; and 2) superposition of a non-optimal tone suppresses the response of a linear filter in a divisive fashion, and the amount of suppression depends on the distance between the frequency of the tone and the preferred frequency of the filter. 1 Empirical observations of natural sound statistics The basic statistical properties of natural sounds, as observed through a linear filter, have been previously documented by Attias [13]. In particular, he showed that, as with visual images, the spectral energy falls roughly according to a power law, and that the histograms of filter responses are more kurtotic than a Gaussian (i.e., they have a sharp peak at zero, and very long tails). Here we examine the joint statistical properties of a pair of linear filters tuned for nearby temporal frequencies. We choose a fixed set of filters that have been widely used in modeling the peripheral auditory system [14]. Figure 1 shows joint histograms of the instantaneous responses of a particular pair of linear filters to five different types of natural sound, and white noise. First note that the responses are approximately decorrelated: the expected value of the y-axis value is roughly zero for all values of the x-axis variable. The responses are not, however, statistically independent: the width of the distribution of responses of one filter increases with the response amplitude of the other filter. If the two responses were statistically independent, then the response of the first filter should not provide any information about the distribution of responses of the other filter. We have found that this type of variance dependency (sometimes accompanied by linear correlation) occurs in a wide range of natural sounds, ranging from animal sounds to music. We emphasize that this dependency is a property of natural sounds, and is not due purely to our choice of linear filters. For example, no such dependency is observed when the input consists of white noise (see Fig. 1). The strength of this dependency varies for different pairs of linear filters. In addition, we see this type of dependency between instantaneous responses of a single filter at two Figure 1 (panels: Speech, Drums, Monkey, Cat, White noise, Nocturnal nature): Joint conditional histogram of instantaneous linear responses of two bandpass filters with center frequencies 2000 and 2840 Hz.
Pixel intensity corresponds to frequency of occurrence of a given pair of values, except that each column has been independently rescaled to fill the full intensity range. For the natural sounds, responses are not independent: the standard deviation of the ordinate is roughly proportional to the magnitude of the abscissa. Natural sounds were recorded from CDs and converted to a sampling frequency of 22050 Hz. nearby time instants. Since the dependency involves the variance of the responses, we can substantially reduce it by dividing. In particular, the response of each filter is divided by a weighted sum of responses of other rectified filters and an additive constant. Specifically: R_i = L_i^2 / (\sum_j w_{ji} L_j^2 + \sigma^2) (1) where L_i is the instantaneous linear response of filter i, \sigma is a constant, and w_{ji} controls the strength of suppression of filter i by filter j. We would like to choose the parameters of the model (the weights w_{ji} and the constant \sigma) to optimize the independence of the normalized response to an ensemble of natural sounds. Such an optimization is quite computationally expensive. We instead assume a Gaussian form for the underlying conditional distribution, as described in [15]: P(L_i | L_j, j \in N_i)
5 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier
Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speakerindependent digit recognition experiments and compared to noise suppression by Spectral Subtraction. 1
6 0.12397232 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
7 0.1117093 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition
8 0.10978433 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech
9 0.093669116 32 nips-2000-Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes
10 0.088571459 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications
11 0.08713115 33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition
12 0.083005868 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
13 0.082763523 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
14 0.080699645 59 nips-2000-From Mixtures of Mixtures to Adaptive Transform Coding
15 0.078735799 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images
16 0.078522511 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
17 0.076729149 51 nips-2000-Factored Semi-Tied Covariance Matrices
18 0.072135642 31 nips-2000-Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models
19 0.070902832 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
20 0.070689283 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
topicId topicWeight
[(0, 0.261), (1, -0.13), (2, 0.142), (3, 0.185), (4, -0.049), (5, -0.098), (6, -0.221), (7, -0.054), (8, 0.012), (9, -0.07), (10, -0.06), (11, 0.027), (12, -0.037), (13, -0.001), (14, -0.006), (15, 0.029), (16, -0.029), (17, 0.023), (18, -0.099), (19, -0.01), (20, -0.069), (21, -0.023), (22, -0.061), (23, -0.008), (24, 0.11), (25, -0.031), (26, -0.078), (27, 0.023), (28, 0.03), (29, -0.061), (30, 0.049), (31, -0.019), (32, 0.018), (33, 0.02), (34, 0.025), (35, 0.017), (36, -0.084), (37, 0.083), (38, -0.011), (39, -0.033), (40, 0.14), (41, -0.096), (42, -0.025), (43, -0.076), (44, 0.024), (45, 0.003), (46, 0.013), (47, 0.028), (48, -0.066), (49, -0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.96892619 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Author: Lucas C. Parra, Clay Spence, Paul Sajda
Abstract: We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1
2 0.70991498 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
Author: Hagai Attias, John C. Platt, Alex Acero, Li Deng
Abstract: This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.
3 0.69439238 96 nips-2000-One Microphone Source Separation
Author: Sam T. Roweis
Abstract: Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting (
4 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
Author: Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier
Abstract: A novel noise suppression scheme for speech signals is proposed which is based on a neurophysiologically-motivated estimation of the local signal-to-noise ratio (SNR) in different frequency channels. For SNR-estimation, the input signal is transformed into so-called Amplitude Modulation Spectrograms (AMS), which represent both spectral and temporal characteristics of the respective analysis frame, and which imitate the representation of modulation frequencies in higher stages of the mammalian auditory system. A neural network is used to analyse AMS patterns generated from noisy speech and estimates the local SNR. Noise suppression is achieved by attenuating frequency channels according to their SNR. The noise suppression algorithm is evaluated in speakerindependent digit recognition experiments and compared to noise suppression by Spectral Subtraction. 1
5 0.6409831 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech
Author: Lawrence K. Saul, Jont B. Allen
Abstract: An eigenvalue method is developed for analyzing periodic structure in speech. Signals are analyzed by a matrix diagonalization reminiscent of methods for principal component analysis (PCA) and independent component analysis (ICA). Our method, called periodic component analysis (πCA
6 0.5825842 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System
7 0.53836483 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition
8 0.49847388 60 nips-2000-Gaussianization
9 0.46760809 32 nips-2000-Color Opponency Constitutes a Sparse Representation for the Chromatic Structure of Natural Scenes
10 0.45211515 59 nips-2000-From Mixtures of Mixtures to Adaptive Transform Coding
11 0.44895402 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
12 0.41092917 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
13 0.39920866 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications
14 0.38793012 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
16 0.38400838 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference
17 0.3744334 33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition
18 0.37233463 85 nips-2000-Mixtures of Gaussian Processes
19 0.36909422 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
20 0.3601526 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
topicId topicWeight
[(2, 0.016), (4, 0.014), (10, 0.011), (17, 0.12), (32, 0.016), (33, 0.53), (54, 0.013), (55, 0.014), (62, 0.029), (65, 0.014), (67, 0.046), (76, 0.045), (81, 0.02), (90, 0.017), (91, 0.019), (97, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.95937991 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Author: Lucas C. Parra, Clay Spence, Paul Sajda
Abstract: We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1
2 0.94192594 148 nips-2000-`N-Body' Problems in Statistical Learning
Author: Alexander G. Gray, Andrew W. Moore
Abstract: We present efficient algorithms for all-point-pairs problems , or 'Nbody '-like problems , which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection , and the two-point correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N 2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric t echniques which are applicable in principle to any 'N-body ' computation including large-scale mixtures of Gaussians, RBF neural networks, and HMM 's. Our algorithms exhibit favorable asymptotic scaling and are empirically several orders of magnitude faster than the naive computation, even for small datasets. We are aware of no exact algorithms for these problems which are more efficient either empirically or theoretically. In addition, our framework yields simple and elegant algorithms. It also permits two important generalizations beyond the standard all-point-pairs problems, which are more difficult. These are represented by our final examples, the multiple two-point correlation and the notorious n-point correlation. 1
3 0.94092137 58 nips-2000-From Margin to Sparsity
Author: Thore Graepel, Ralf Herbrich, Robert C. Williamson
Abstract: We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself. 1
4 0.58977234 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
Author: Claudio Gentile
Abstract: A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p >= 2 for a set of linearly separable data. Our algorithm, called ALMAp (Approximate Large Margin algorithm w.r.t. norm p), takes O((p-1)/(\alpha^2 \gamma^2)) corrections to separate the data with p-norm margin larger than (1-\alpha)\gamma, where \gamma is the p-norm margin of the data and X is a bound on the p-norm of the instances. ALMAp avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMAp to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform quite better than both. The accuracy levels achieved by ALMAp are slightly inferior to those obtained by Support Vector Machines (SVMs). On the other hand, ALMAp is quite faster and easier to implement than standard SVM training algorithms.
5 0.57738817 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
Author: Ralf Herbrich, Thore Graepel
Abstract: We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC- Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend to use SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets. 1
6 0.53807729 131 nips-2000-The Early Word Catches the Weights
7 0.53325075 37 nips-2000-Convergence of Large Margin Separable Linear Classification
8 0.51492745 91 nips-2000-Noise Suppression Based on Neurophysiologically-motivated SNR Estimation for Robust Speech Recognition
9 0.50265563 123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
10 0.49998513 75 nips-2000-Large Scale Bayes Point Machines
11 0.49340698 94 nips-2000-On Reversing Jensen's Inequality
12 0.49186265 21 nips-2000-Algorithmic Stability and Generalization Performance
13 0.48929209 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
14 0.48635915 111 nips-2000-Regularized Winnow Methods
15 0.48510709 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System
16 0.47961783 36 nips-2000-Constrained Independent Component Analysis
17 0.4745982 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
18 0.47027877 97 nips-2000-Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
19 0.46913123 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
20 0.46693709 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks