nips nips2009 nips2009-17 knowledge-graph by maker-knowledge-mining

17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds


Source: pdf

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. [sent-8, score-0.166]

2 Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. [sent-9, score-0.164]

3 In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. [sent-10, score-0.454]

4 We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. [sent-11, score-0.394]

5 Introduction: This paper deals with the problem of single-channel signal separation – separating out signals from individual sources in a mixed recording. [sent-13, score-0.451]

6 In recent years, a popular statistical approach has been to obtain compact characterizations of individual sources and employ them to identify and extract their counterpart components from mixture signals. [sent-14, score-0.404]

7 Statistical characterizations may include codebooks [1], Gaussian mixture densities [2], HMMs [3], independent components [4, 5], sparse dictionaries [6], non-negative decompositions [7–9] and latent variable models [10, 11]. [sent-15, score-0.372]

8 Separation is achieved by abstracting components from the mixed signal that conform to the statistical characterizations of the individual sources. [sent-17, score-0.163]

9 The key here is the specific statistical model employed – the more effectively it captures the specific characteristics of the signal sources, the better the separation that may be achieved. [sent-18, score-0.235]

10 This has been the basis of several example-based characterizations of a data source, such as nearest-neighbor, K-nearest neighbor, Parzen-window based models of source distributions etc. [sent-20, score-0.536]

11 Here, we use the same idea to develop a monaural source-separation algorithm that directly uses samples from the training data to represent the sources in a mixture. [sent-21, score-0.233]

12 Identifying the proper samples from the training data that best approximate a sample of the mixture is of course a hard combinatorial problem, which can be computationally demanding. [sent-23, score-0.27]

13 We additionally show that this approach results in source estimates which are guaranteed to lie on the source manifold, as opposed to trained-basis approaches which can produce arbitrary outputs that will not necessarily be plausible source estimates. [sent-25, score-1.541]

14 Experimental evaluations show that this approach results in separated signals that exhibit significantly higher performance metrics as compared to conceptually similar techniques which are based on various types of combinations of generalizable bases representing the sources. [sent-26, score-0.401]

15 The Basic Model: Given a magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins. [sent-29, score-0.344]

16 At a given time frame t, consider a random process characterized by the probability Pt (f ) of drawing frequency f in a given draw. [sent-30, score-0.15]

17 The model assumes that Pt (f ) is comprised of bases indexed by a latent variable z. [sent-32, score-0.266]
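
In symbols, the per-frame model implied by this description is (notation reconstructed here; the bases P(f|z) are shared across frames and weighted by per-frame distributions Pt(z)):

P_t(f) = \sum_z P_t(z) \, P(f \mid z)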

18 We use this model to learn the source-specific bases given by Pt (f |z) as done in [10, 11]. [sent-35, score-0.226]
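
A minimal numpy sketch of the kind of EM procedure that learns such bases from a single-source magnitude spectrogram (function and variable names are ours, and the exact update schedule in the paper may differ):

import numpy as np

def learn_bases(V, n_bases, n_iter=100, seed=0):
    # V: F x T magnitude spectrogram of a single source.
    # Returns P(f|z) as an F x n_bases matrix and Pt(z) as an n_bases x T matrix.
    rng = np.random.default_rng(seed)
    F, T = V.shape
    P_fz = rng.random((F, n_bases)); P_fz /= P_fz.sum(0, keepdims=True)
    P_zt = rng.random((n_bases, T)); P_zt /= P_zt.sum(0, keepdims=True)
    for _ in range(n_iter):
        joint = P_fz[:, :, None] * P_zt[None, :, :]             # F x Z x T
        post = joint / (joint.sum(1, keepdims=True) + 1e-12)    # E-step: Pt(z|f)
        counts = V[:, None, :] * post                           # expected counts
        P_fz = counts.sum(2); P_fz /= P_fz.sum(0, keepdims=True) + 1e-12   # M-step
        P_zt = counts.sum(0); P_zt /= P_zt.sum(0, keepdims=True) + 1e-12
    return P_fz, P_zt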

19 Now let the matrix V (of size F × T) with entries v_ft represent the magnitude spectrogram of the mixture sound, and let v_t represent time frame t (the t-th column vector of matrix V). [sent-37, score-0.665]

20 Each mixture spectral frame is again modeled as a histogram of repeated draws, from the multinomial distributions corresponding to every source. [sent-38, score-0.337]

21 We can assume that for each source in the mixture we have an already trained model in the form of basis vectors Ps (f |z). [sent-40, score-0.657]

22 These bases will represent a dictionary of spectra that best describe each source. [sent-41, score-0.334]
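
Written out, the mixture model implied by this setup is (again with reconstructed notation, where Pt(z, s) are the per-frame weights to be estimated from the mixture):

P_t(f) = \sum_s \sum_{z \in \{z_s\}} P_s(f \mid z) \, P_t(z, s)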

23 Armed with this knowledge we can decompose a new mixture of these known sources in terms of the contributions of the dictionaries for each source. [sent-42, score-0.368]
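
A sketch of how the per-frame weights Pt(z, s) can be estimated with the dictionaries held fixed, using multiplicative EM-style updates (names and the iteration count are ours):

import numpy as np

def estimate_weights(V, dicts, n_iter=100, seed=0):
    # V: F x T mixture magnitude spectrogram.
    # dicts: list of column-normalized dictionaries, one F x Z_s matrix per source.
    rng = np.random.default_rng(seed)
    B = np.hstack(dicts)                          # all source bases side by side
    W = rng.random((B.shape[1], V.shape[1]))
    W /= W.sum(0, keepdims=True)                  # flattened Pt(z, s), one column per frame
    for _ in range(n_iter):
        approx = B @ W + 1e-12                    # current model of each mixture frame
        W *= B.T @ (V / approx)                   # expected counts for every (z, s)
        W /= W.sum(0, keepdims=True) + 1e-12      # renormalize per frame
    return W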

24 The triangles denote the position of basis functions for two source classes. [sent-44, score-0.476]

25 The square is an instance of a mixture of the two sources. [sent-45, score-0.181]

26 The mixture point is not within the convex hull which covers either source, but it is within the convex hull defined by all the bases combined. [sent-46, score-0.783]

27 These reconstructions will approximate the magnitude spectrogram of each source in the mixture. [sent-47, score-0.634]

28 Once we obtain these reconstructions we can use them to modulate the original phase spectrogram of the mixture and obtain the time-series representation of the sources. [sent-48, score-0.381]

29 Each basis vector and the mixture input will lie in an (F − 1)-dimensional simplex (due to the fact that these quantities are normalized to sum to unity). [sent-50, score-0.302]

30 Each source’s basis set will define a convex hull within which any point can be approximated using these bases. [sent-51, score-0.249]

31 Assuming that the training data is accurate, all potential inputs from that source should lie in that area. [sent-52, score-0.602]

32 The union of all the source bases will define a larger space inside which a mixture input will lie. [sent-53, score-0.841]

33 Any mixture point can then be approximated as a weighted sum of multiple bases from both sources. [sent-54, score-0.426]

34 Using Training Data Directly as a Dictionary: In this paper, we would like to explain the mixture frame from the training spectral frames instead of using a smaller set of learned bases. [sent-57, score-0.499]

35 The secondary rationale behind this operation is based on the observation that the points defined by the convex hull of a source’s model do not necessarily all fall on that source’s manifold. [sent-61, score-0.188]

36 In both of these plots the sources exhibit a clear structure. [sent-63, score-0.195]

37 In the left plot both sources appear in a circular pattern, and in the right plot in a spiral form. [sent-64, score-0.247]

38 As shown in [12], learning a set of bases that explains these sources results in defining a convex hull that surrounds the training data. [sent-65, score-0.647]

39 Under this model potential source estimates can now lie anywhere inside these hulls. [sent-66, score-0.541]

40 Using trained-basis models, if we decompose the mixture points in these figures we obtain two source estimates which do not lie in the same manifold as the original sources. [sent-67, score-0.804]

41 Although the input was adequately approximated, there is no guarantee that the extracted sources are indeed appropriate outcomes for their sound class. [sent-68, score-0.212]

42 In order to address this problem, and to also provide a richer dictionary for the source reconstructions, we will make direct use of the training data to explain the mixture, bypassing the basis representation as an abstraction. [sent-69, score-0.673]

43 To do so we will use each frame of the spectrograms of the training sequences as the bases Ps(f|z). [sent-70, score-0.494]

44 More specifically, let W^(s) (of size F × T^(s)) be the training spectrogram from source s, and let w_t^(s) represent time frame t of that spectrogram. [sent-71, score-0.79]

45 In this case, the latent variable z for source s takes T^(s) values, and the z-th basis function will be given by the (normalized) z-th column vector of W^(s). [sent-72, score-0.516]
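
In code, building this example-based dictionary is just a column normalization of the training spectrogram (a sketch; the guard against silent, all-zero frames is our addition):

import numpy as np

def frames_as_dictionary(W_s, eps=1e-12):
    # W_s: F x T_s training magnitude spectrogram of source s.
    # Every non-silent column, normalized to sum to one, becomes one basis Ps(f|z).
    energy = W_s.sum(0)
    keep = energy > eps
    return W_s[:, keep] / energy[keep]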

46 In both plots the training data for each source are denoted by △ and ▽, and the mixture sample by . [sent-74, score-0.734]

47 The learned bases of each source are the vertices of the two dashed convex hulls that enclose each class. [sent-75, score-0.699]

48 The source estimates and the approximation of the mixture are denoted by ×, + and . [sent-76, score-0.664]

49 In the left case the two sources lie on two overlapping circular areas; the source estimates, however, lie outside these areas. [sent-77, score-0.787]

50 The recovered sources lie almost directly on the competing source’s area, thereby providing a highly inappropriate decomposition. [sent-79, score-0.223]

51 Although the mixture was well approximated in both cases, the estimated sources were poor representations of their classes. [sent-80, score-0.344]

52 With the above model we would ideally want to use one dictionary element per source at any point in time. [sent-81, score-0.542]

53 Doing so will ensure that the outputs lie on the source manifold, and will also offset any issues of potential overcompleteness. [sent-82, score-0.513]

54 One way to ensure this is to perform a reconstruction such that we only use one element of each source at any time, much akin to a nearest-neighbor model, albeit in an additive setting. [sent-83, score-0.472]

55 The intuition is that at any given point in time, the mixture frame is explained by very few active elements from the training data. [sent-85, score-0.42]

56 In other words, we need the mixture weight distributions and the speaker priors to be sparse at every time instant. [sent-86, score-0.364]

57 We would like to minimize the entropies of both the speaker dependent mixture weight distributions (given by Pt (z|s)) and the source priors (given by Pt (s)) at every frame. [sent-91, score-0.77]

58 Thus, reducing the entropy of the joint distribution Pt (z, s) is equivalent to reducing the conditional entropy of the source dependent mixture weights and the entropy of the source priors. [sent-94, score-1.186]
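
This is simply the chain rule for entropy, stated here for completeness:

H(P_t(z, s)) = H(P_t(s)) + \sum_s P_t(s) \, H(P_t(z \mid s))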

59 Since the dictionary is already known and is given by the normalized spectral frames from source training spectrograms, the parameter to be estimated is given by Pt (z, s). [sent-95, score-0.733]

60 We impose an entropic prior distribution on Pt(z, s). (Figure 3: Using a sparse reconstruction on the data in figure 2; the panels show Source A, Source B, the Mixture, the Estimate for A, the Estimate for B, and the Approximation of the mixture.) [sent-100, score-0.528]
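
A sketch of the entropic-prior MAP re-estimation of Pt(z, s) for one frame, following Brand's Lambert-W fixed point; the branch choice, the iteration count and the numerical guards are our assumptions, and the paper's exact update may differ:

import numpy as np
from scipy.special import lambertw

def entropic_map(omega, beta, n_iter=20, eps=1e-12):
    # omega: nonnegative expected counts for Pt(z, s) at one frame (from the E-step).
    # beta:  sparsity strength; beta = 0 reduces to plain EM normalization.
    omega = np.asarray(omega, float) + eps
    theta = omega / omega.sum()
    if beta <= 0:
        return theta
    for _ in range(n_iter):
        # Lagrange multiplier enforcing sum(theta) = 1
        lam = -omega.sum() - beta * (1.0 + np.sum(theta * np.log(theta)))
        arg = -(omega / beta) * np.exp(1.0 + lam / beta)
        arg = np.clip(arg, -1.0 / np.e + 1e-12, -1e-300)    # stay inside W_{-1}'s real domain
        theta = -omega / (beta * np.real(lambertw(arg, k=-1)))
        theta = np.maximum(theta, eps)
        theta /= theta.sum()
    return theta

The repeated evaluation of Lambert's W is also what makes the sparse variant slower, as noted in the timing discussion further below.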

61 Note how in contrast to that figure the source estimates are now identified as training data points, and are thus plausible solutions. [sent-101, score-0.601]

62 The approximation of the mixture is the point on the line connecting the two source estimates that is nearest to the actual mixture input. [sent-102, score-0.837]

63 Note that the proper solution is the one that results in a line that is as close as possible to the mixture point, and not one that is defined by two training points close to the mixture. [sent-103, score-0.27]

64 Once Pt(z, s) is estimated, the reconstruction of source s can be computed as \hat{v}^{(s)}_{ft} = v_{ft} \, \frac{\sum_{z \in \{z_s\}} P_s(f \mid z) \, P_t(z, s)}{\sum_{s'} \sum_{z \in \{z_{s'}\}} P_{s'}(f \mid z) \, P_t(z, s')}. Now let us consider how this problem resolves the issues presented in figure 2. [sent-110, score-0.805]

65 In both plots we see that the source reconstructions lie on training points, and are thereby plausible source estimates. [sent-114, score-1.176]

66 The approximation of the mixture is not as exact as before, since now it has to lie on the line connecting the two active source elements. [sent-115, score-0.738]

67 This is, however, not an issue of concern, since in practice the approximation is always good enough, and the guarantee of a plausible source estimate is more valuable than an exact approximation of the mixture. [sent-116, score-0.568]
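
Returning to the reconstruction formula above, a sketch of that step, including the reuse of the mixture phase mentioned earlier (variable names are ours):

import numpy as np

def reconstruct(V_mag, V_phase, dicts, W):
    # V_mag, V_phase: magnitude and phase of the mixture STFT, both F x T.
    # dicts: per-source dictionaries B_s (F x Z_s); W: stacked Pt(z, s), (sum of Z_s) x T.
    B = np.hstack(dicts)
    total = B @ W + 1e-12                         # model of the full mixture magnitude
    sources, start = [], 0
    for B_s in dicts:
        stop = start + B_s.shape[1]
        ratio = (B_s @ W[start:stop]) / total     # source-s share of every time-frequency bin
        sources.append(ratio * V_mag * np.exp(1j * V_phase))   # complex source spectrogram
        start = stop
    return sources                                 # invert each with an ISTFT to get waveforms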

68 In these approaches the priors are imposed on the mixture weights and thus are not as effective for this particular task since they still suffer from the symptoms of learned-basis models. [sent-118, score-0.207]

69 The top plots show the input waveforms, and the bottom plots show the estimated weights multiplied with the source priors. [sent-121, score-0.494]

70 The sources were sampled at 16 kHz, we used 64 ms windows for the spectrogram computation, and an overlap of 32 ms. [sent-125, score-0.284]
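
With those settings the spectrograms can be computed as follows (a sketch: 64 ms at 16 kHz is 1024 samples and 32 ms is 512; the window type is our choice since it is not stated here):

import numpy as np
from scipy.signal import stft

def mixture_spectrogram(x, fs=16000):
    # 64 ms analysis windows with 32 ms overlap, as quoted above.
    f, t, X = stft(x, fs=fs, nperseg=1024, noverlap=512)
    return np.abs(X), np.angle(X)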

71 The training data was around 25 sec of speech for each speaker, and the testing mixture was about 3 sec long. [sent-127, score-0.445]

72 We evaluated the separation performance using the metrics provided in [17]. [sent-128, score-0.225]

73 The first is a measure of how well we suppress the interfering speaker, whereas the other two provide us with a sense of how much the extracted source is corrupted due to the separation process. [sent-130, score-0.597]
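
For reference, these BSS_EVAL-style metrics (SDR, SIR and SAR in the sense of [17]) can be computed with the mir_eval package; this is one convenient implementation, not necessarily the toolbox used in the paper:

import numpy as np
import mir_eval

def bss_metrics(references, estimates):
    # references, estimates: lists of equal-length time-domain signals, one per source.
    sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(
        np.vstack(references), np.vstack(estimates))
    return sdr, sir, sar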

74 The first experiment we perform is on a mixture for which the training data includes its isolated constituent sentences. [sent-137, score-0.27]

75 The primary observation from this experiment is that the more bases we use the better the results get. [sent-146, score-0.226]

76 (Figure 5: Average separation performance metrics for the oracle cases, as a function of the number of elements in the speaker's dictionary (5 to 320 bases, plus "Train" for the full training set) and of the entropic prior parameter β.) [sent-166, score-0.388]

77 The basis row labeled as “Train” is the case where we use all the training data as a basis set. [sent-168, score-0.173]

78 (Figure 6: Average separation performance metrics for the real-world cases, as a function of the number of elements in the speaker's dictionary (5 to 320 bases, plus "Train" for the full training set) and of the entropic prior parameter β.) [sent-187, score-0.325]

79 Results on Realistic Situations: Let us now consider the more realistic case where the mixture data is different from the training set. [sent-191, score-0.292]

80 The input mixture has to be reconstructed using approximate samples. [sent-193, score-0.181]

81 We do not obtain performance figures as high as in the oracle case, but we also see a stronger trend in favor of sparsity and of using all the training data as a dictionary. [sent-195, score-0.228]

82 We can clearly see that, in all metrics, using all the training data significantly outperforms trained-basis models. [sent-197, score-0.151]

83 The use of sparsity ensures that the output is a plausible speech signal devoid of artifacts like distortion and musical noise. [sent-204, score-0.368]

84 (Figure 8: Effect of discarding low-energy training frames; SDR, SIR and SAR in dB are plotted against the percentage of discarded training frames, from 0% to 95%.) [sent-214, score-0.323]

85 The horizontal axis denotes the percentage of training frames that have been discarded. [sent-215, score-0.162]

86 In order to address this concern we show that the size of the training data can be easily pruned down to a size comparable to trained-basis models and still outperform them. [sent-219, score-0.17]

87 Since sound signals, especially speech, tend to have a considerable amount of short-term pauses and regions of silence, we can use an energy threshold in order to select the loudest frames of the training spectrogram as bases. [sent-220, score-0.406]
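
A sketch of this pruning step, keeping only the highest-energy training frames (the 30% default matches the timing experiment mentioned below, but the fraction is a tunable choice):

import numpy as np

def prune_frames(W_s, keep_fraction=0.30):
    # W_s: F x T_s training magnitude spectrogram; keep only the loudest frames.
    energy = W_s.sum(0)
    n_keep = max(1, int(round(keep_fraction * W_s.shape[1])))
    loudest = np.argsort(energy)[::-1][:n_keep]
    return W_s[:, np.sort(loudest)]               # preserve the original temporal order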

88 In figure 8 we show how the separation performance metrics are influenced as we increasingly remove bases which lie under various energy percentiles. [sent-221, score-0.566]

89 It is clear that even after discarding up to 70% of the lowest-energy training frames, the performance is still approximately the same. [sent-222, score-0.234]

90 Overall computations for a single mixture took roughly 4 sec when not using the sparsity prior, 14 sec when using the sparsity prior (primarily due to slow computation of Lambert’s function), and dropped down to 5 sec when using the 30% highest energy frames from the training data. [sent-227, score-0.684]

91 Conclusion: In this paper we present a new approach to solving the monophonic source separation problem. [sent-228, score-0.637]

92 In order to do so we present a sparse learning algorithm which can efficiently solve this problem, and also guarantees that the returned source estimates are plausible given the training data. [sent-230, score-0.649]

93 We provide experiments that show how this approach is influenced by the use of varying sparsity constraints and training data selection. [sent-231, score-0.165]

94 Roweis, One microphone source separation, in Advances in Neural Information Processing Systems, 2001. [sent-235, score-0.434]

95 Gopinath, Super-human multitalker speech recognition: The IBM 2006 speech separation challenge system, in International Conference on Spoken Language Processing (INTERSPEECH), 2006, pp. [sent-246, score-0.309]

96 Separation of mixed audio sources by independent subspace analysis, in Proceedings of the International Conference of Computer Music, 2000. [sent-254, score-0.207]

97 Gribonval, Non negative sparse representation for wiener based source separation with a single sensor, in Acoustics, Speech, and Signal Processing, IEEE International Conference on, 2003, pp. [sent-272, score-0.645]

98 Olsson, Single-channel speech separation using sparse nonnegative matrix factorization, in International Conference on Spoken Language Processing (INTERSPEECH), 2006. [sent-278, score-0.284]

99 Virtanen, Sound source separation using sparse coding with temporal continuity objective, in International Computer Music Conference, ICMC, 2003. [sent-280, score-0.645]

100 Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications, Pattern Recognition Letters, Volume 26, Issue 12, September 2005. [sent-324, score-0.181]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pt', 0.533), ('source', 0.434), ('bases', 0.226), ('mixture', 0.181), ('separation', 0.163), ('ps', 0.152), ('hull', 0.149), ('vf', 0.149), ('zs', 0.145), ('sources', 0.144), ('spectrogram', 0.14), ('frame', 0.127), ('speaker', 0.109), ('dictionary', 0.108), ('sar', 0.105), ('sir', 0.09), ('training', 0.089), ('erent', 0.087), ('entropic', 0.08), ('sdr', 0.08), ('shashanka', 0.08), ('smaragdis', 0.08), ('lie', 0.079), ('sparsity', 0.076), ('sounds', 0.075), ('frames', 0.073), ('speech', 0.073), ('signal', 0.072), ('sound', 0.068), ('oracle', 0.063), ('metrics', 0.062), ('reconstructions', 0.06), ('characterizations', 0.06), ('raj', 0.06), ('trainedbasis', 0.06), ('gure', 0.057), ('di', 0.055), ('lambert', 0.052), ('spectrograms', 0.052), ('sec', 0.051), ('artifacts', 0.05), ('plausible', 0.05), ('sparse', 0.048), ('distortion', 0.047), ('generalizable', 0.045), ('ratio', 0.045), ('dictionaries', 0.043), ('interference', 0.043), ('basis', 0.042), ('bhiksha', 0.04), ('brand', 0.04), ('gribonval', 0.04), ('monophonic', 0.04), ('plot', 0.04), ('latent', 0.04), ('entropy', 0.039), ('convex', 0.039), ('db', 0.039), ('train', 0.038), ('reconstruction', 0.038), ('discarding', 0.036), ('energy', 0.036), ('resolves', 0.035), ('audio', 0.032), ('mixed', 0.031), ('plots', 0.03), ('spoken', 0.03), ('spectral', 0.029), ('interspeech', 0.028), ('estimates', 0.028), ('speakers', 0.027), ('conceptually', 0.026), ('priors', 0.026), ('draws', 0.025), ('frequency', 0.023), ('acoustics', 0.023), ('active', 0.023), ('su', 0.023), ('circular', 0.023), ('opposed', 0.022), ('paris', 0.022), ('boost', 0.022), ('manifold', 0.022), ('realistic', 0.022), ('music', 0.022), ('signals', 0.021), ('approximation', 0.021), ('exhibit', 0.021), ('concern', 0.021), ('estimate', 0.021), ('separating', 0.02), ('actual', 0.02), ('channel', 0.02), ('ect', 0.02), ('uenced', 0.02), ('dependent', 0.02), ('tests', 0.019), ('observing', 0.019), ('approximated', 0.019), ('compact', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

2 0.18266319 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

Author: Marcel V. Gerven, Botond Cseke, Robert Oostenveld, Tom Heskes

Abstract: We introduce a novel multivariate Laplace (MVL) distribution as a sparsity promoting prior for Bayesian source localization that allows the specification of constraints between and within sources. We represent the MVL distribution as a scale mixture that induces a coupling between source variances instead of their means. Approximation of the posterior marginals using expectation propagation is shown to be very efficient due to properties of the scale mixture representation. The computational bottleneck amounts to computing the diagonal elements of a sparse matrix inverse. Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior. 1

3 0.16911611 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.

4 0.13494855 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks. 1

5 0.12996544 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

Author: Yusuke Fujiwara, Yoichi Miyawaki, Yukiyasu Kamitani

Abstract: Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.

6 0.12573583 147 nips-2009-Matrix Completion from Noisy Entries

7 0.097353257 52 nips-2009-Code-specific policy gradient rules for spiking neurons

8 0.095389761 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

9 0.089184508 104 nips-2009-Group Sparse Coding

10 0.08646895 137 nips-2009-Learning transport operators for image manifolds

11 0.076050408 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

12 0.074777782 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

13 0.073812455 129 nips-2009-Learning a Small Mixture of Trees

14 0.071992107 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

15 0.065755561 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

16 0.057068553 24 nips-2009-Adapting to the Shifting Intent of Search Queries

17 0.056610744 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

18 0.056335777 157 nips-2009-Multi-Label Prediction via Compressed Sensing

19 0.055841062 75 nips-2009-Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

20 0.055269491 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.172), (1, -0.037), (2, 0.009), (3, 0.031), (4, 0.008), (5, -0.026), (6, 0.074), (7, -0.05), (8, 0.055), (9, 0.108), (10, -0.034), (11, 0.057), (12, -0.139), (13, 0.113), (14, -0.041), (15, -0.006), (16, 0.047), (17, 0.146), (18, -0.124), (19, -0.108), (20, -0.039), (21, -0.05), (22, -0.131), (23, 0.074), (24, -0.173), (25, -0.165), (26, 0.213), (27, 0.05), (28, 0.076), (29, -0.032), (30, 0.051), (31, -0.123), (32, 0.044), (33, -0.226), (34, -0.01), (35, -0.154), (36, 0.072), (37, 0.108), (38, 0.107), (39, -0.02), (40, -0.049), (41, -0.055), (42, 0.111), (43, 0.061), (44, 0.022), (45, -0.068), (46, 0.002), (47, 0.014), (48, -0.078), (49, -0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96719766 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

2 0.71016073 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.

3 0.61100781 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

Author: Marcel V. Gerven, Botond Cseke, Robert Oostenveld, Tom Heskes

Abstract: We introduce a novel multivariate Laplace (MVL) distribution as a sparsity promoting prior for Bayesian source localization that allows the specification of constraints between and within sources. We represent the MVL distribution as a scale mixture that induces a coupling between source variances instead of their means. Approximation of the posterior marginals using expectation propagation is shown to be very efficient due to properties of the scale mixture representation. The computational bottleneck amounts to computing the diagonal elements of a sparse matrix inverse. Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior. 1

4 0.45505655 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks. 1

5 0.42493784 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

Author: Zahi Karam, Douglas Sturim, William M. Campbell

Abstract: Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications—speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison—support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.

6 0.36764807 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

7 0.34773347 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

8 0.34615946 147 nips-2009-Matrix Completion from Noisy Entries

9 0.33736432 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models

10 0.31620958 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

11 0.31347367 129 nips-2009-Learning a Small Mixture of Trees

12 0.31094036 138 nips-2009-Learning with Compressible Priors

13 0.30807751 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors

14 0.30471906 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

15 0.30328858 137 nips-2009-Learning transport operators for image manifolds

16 0.30082417 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

17 0.29950958 75 nips-2009-Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

18 0.29371169 25 nips-2009-Adaptive Design Optimization in Experiments with People

19 0.2903538 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

20 0.28485742 104 nips-2009-Group Sparse Coding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.063), (7, 0.155), (24, 0.045), (25, 0.062), (31, 0.017), (35, 0.04), (36, 0.097), (39, 0.057), (58, 0.079), (61, 0.018), (71, 0.051), (81, 0.071), (86, 0.148)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9260124 95 nips-2009-Fast subtree kernels on graphs

Author: Nino Shervashidze, Karsten M. Borgwardt

Abstract: In this article, we propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & G¨ rtner scales as O(n2 4d h). Key to this efficiency is the observation that the a Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime. 1

2 0.90688103 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

Author: Zhi Yang, Qi Zhao, Edward Keefer, Wentai Liu

Abstract: Studying signal and noise properties of recorded neural data is critical in developing more efficient algorithms to recover the encoded information. Important issues exist in this research including the variant spectrum spans of neural spikes that make it difficult to choose a globally optimal bandpass filter. Also, multiple sources produce aggregated noise that deviates from the conventional white Gaussian noise. In this work, the spectrum variability of spikes is addressed, based on which the concept of adaptive bandpass filter that fits the spectrum of individual spikes is proposed. Multiple noise sources have been studied through analytical models as well as empirical measurements. The dominant noise source is identified as neuron noise followed by interface noise of the electrode. This suggests that major efforts to reduce noise from electronics are not well spent. The measured noise from in vivo experiments shows a family of 1/f x spectrum that can be reduced using noise shaping techniques. In summary, the methods of adaptive bandpass filtering and noise shaping together result in several dB signal-to-noise ratio (SNR) enhancement.

same-paper 3 0.89090562 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

4 0.88406527 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

Author: Arthur Gretton, Peter Spirtes, Robert E. Tillman

Abstract: The recently proposed additive noise model has advantages over previous directed structure learning approaches since it (i) does not assume linearity or Gaussianity and (ii) can discover a unique DAG rather than its Markov equivalence class. However, for certain distributions, e.g. linear Gaussians, the additive noise model is invertible and thus not useful for structure learning, and it was originally proposed for the two variable case with a multivariate extension which requires enumerating all possible DAGs. We introduce weakly additive noise models, which extends this framework to cases where the additive noise model is invertible and when additive noise is not present. We then provide an algorithm that learns an equivalence class for such models from data, by combining a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the Markov equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. 1

5 0.87726426 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, 1 -norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary p -norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply p -norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art. 1

6 0.86806428 69 nips-2009-Discrete MDL Predicts in Total Variation

7 0.79868138 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

8 0.79730999 119 nips-2009-Kernel Methods for Deep Learning

9 0.78689367 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

10 0.78419966 127 nips-2009-Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

11 0.77127153 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

12 0.77068698 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

13 0.77004528 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

14 0.76646286 104 nips-2009-Group Sparse Coding

15 0.76516187 137 nips-2009-Learning transport operators for image manifolds

16 0.76471961 13 nips-2009-A Neural Implementation of the Kalman Filter

17 0.76259011 237 nips-2009-Subject independent EEG-based BCI decoding

18 0.76006335 151 nips-2009-Measuring Invariances in Deep Networks

19 0.75929129 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

20 0.75791413 3 nips-2009-AUC optimization and the two-sample problem