nips nips2001 nips2001-103 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stefan Harmeling, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. [sent-5, score-0.626]
2 In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. [sent-6, score-0.103]
3 We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. [sent-9, score-0.365]
4 In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. [sent-10, score-1.111]
5 Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS. [sent-11, score-0.264]
6 1 Introduction In a widespread area of applications kernel based learning machines, e.g. support vector machines or kernel PCA, are applied successfully. [sent-12, score-0.179]
7 These machines map the data xt (t = 1, . . . , T ) into some kernel feature space F by some mapping Φ : ℝn → F. [sent-23, score-0.314]
8 Performing a simple linear algorithm in F then corresponds to a nonlinear algorithm in input space. [sent-24, score-0.375]
9 Essential ingredients of kernel based learning are (a) VC theory, which can provide a relation between the complexity of the function class in use and the generalization error, and (b) the famous kernel trick k(x, y) = Φ(x) · Φ(y), (1) which allows scalar products in F to be computed efficiently. [sent-25, score-0.476]
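As a minimal illustration of the kernel trick in eq. (1) (our own toy sketch, not taken from the paper), the scalar product induced by a homogeneous degree-2 polynomial kernel can be checked against its explicit feature map; the function name and the concrete numbers are assumptions made for this example:

```python
import numpy as np

def poly2_feature_map(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel
    k(x, y) = (x . y)**2 for 2-dimensional inputs x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.4])

lhs = (x @ y) ** 2                                  # kernel evaluated in input space, eq. (1)
rhs = poly2_feature_map(x) @ poly2_feature_map(y)   # scalar product computed in feature space
assert np.isclose(lhs, rhs)
```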
10 Note that even though F might be infinite dimensional, the subspace where the data lies is at most T-dimensional. [sent-29, score-0.095]
11 However, the data typically forms an even smaller subspace in F (cf. [sent-30, score-0.057]
12 In this work we therefore propose a new mathematical construction that allows us to adapt to the intrinsic dimension and to provide an orthonormal basis of this submanifold. [sent-32, score-0.392]
13 Furthermore, this makes computations much simpler and provides the basis for a new set of kernelized learning algorithms. [sent-33, score-0.123]
14 To demonstrate the power of our new framework we will focus on the problem of nonlinear BSS [2, 18, 9, 10, 20, 11, 13, 14, 7, 17, 8] and provide an elegant kernel based algorithm for arbitrary invertible nonlinearities. [sent-35, score-0.632]
15 In nonlinear BSS we observe a mixed signal of the following structure xt = f (st ), (2) where xt and st are n × 1 column vectors and f is a possibly nonlinear function from ℝn to ℝn . [sent-36, score-0.869]
16 In the special case where f is an n × n matrix we recover standard linear BSS. [sent-37, score-0.08]
17 Nonlinear BSS has so far only been applied to industrial pulp data [8], but a large class of applications where nonlinearities can occur in the mixing process is conceivable, e.g. [sent-40, score-0.056]
18 in the fields of telecommunications, array processing, and biomedical data analysis (EEG, MEG, EMG, etc.). [sent-42, score-0.077]
19 An important special case is the post-nonlinear mixture xt = f (Ast ), (3) where A is a linear mixing matrix and f is a post-nonlinearity that operates componentwise. [sent-48, score-0.206]
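For concreteness, a hedged sketch of generating data from the post-nonlinear model of eq. (3); the particular sources, mixing matrix, random seed and nonlinearity are illustrative assumptions, not the mixtures used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
t = np.arange(T)

# two sources with temporal structure: a sinusoid and a sawtooth (illustrative choices)
s = np.vstack([np.sin(2.0 * np.pi * 0.01 * t),
               2.0 * ((0.005 * t) % 1.0) - 1.0])   # shape (n, T) with n = 2

A = rng.normal(size=(2, 2))                        # linear mixing matrix
x = np.tanh(A @ s)                                 # componentwise post-nonlinearity f, eq. (3)
```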
20 Existing approaches use, e.g., self-organizing maps [13, 10], extensions of GTM [14], neural networks [2, 11] or ensemble learning [18] to unfold the nonlinearity f . [sent-52, score-0.094]
21 A kernel based method has also been tried on very simple toy signals, albeit with some stability problems [7]. [sent-53, score-0.224]
22 In our contribution to the general invertible nonlinear BSS case we apply a standard BSS technique [21, 1] (that relies on temporal correlations) to the mapped signals in feature space (cf. [sent-55, score-0.647]
23 section 2); the resulting method proves to be a remarkably stable and efficient algorithm with high performance, as we will see in the experiments on nonlinear mixtures of toy and speech data (cf. section 4). [sent-58, score-0.378]
24 2 Theory An orthonormal basis for a subspace in F In order to establish a linear problem in feature space that corresponds to some nonlinear problem in input space we need to specify how to map the inputs x1 , . . . [sent-61, score-0.849]
25 , xT ∈ ℝn into the feature space F and how to handle its possibly high dimensionality. [sent-64, score-0.135]
26 In addition we choose d points v1 , . . . , vd ∈ ℝn from the same space that will later generate a basis in F. [sent-68, score-0.223]
27 Let us denote the mapped points by Φx := [Φ(x1 ) · · · Φ(xT )] and Φv := [Φ(v1 ) · · · Φ(vd )]. [sent-71, score-0.12]
28 We assume that the columns of Φv constitute a basis of the column space¹ of Φx , which we denote by span(Φv ) = span(Φx ) and rank(Φv ) = d. (4) [sent-72, score-0.169]
29 Moreover, Φv being a basis implies that the matrix Φv⊤Φv has full rank, so its inverse exists. [sent-73, score-0.194]
30 So now we can define an orthonormal basis Ξ := Φv (Φv⊤Φv )−1/2 , (5) whose column space is identical to the column space of Φv . [sent-74, score-0.582]
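The only nontrivial operation in eq. (5) is the inverse matrix square root of Φv⊤Φv , which is just a d × d matrix of kernel values. A hedged sketch via the eigendecomposition (function and variable names are ours); the assertion checks that the resulting basis indeed satisfies Ξ⊤Ξ = I:

```python
import numpy as np

def inv_sqrt(K, eps=1e-12):
    """Inverse square root of a symmetric positive (semi-)definite matrix K,
    used for the factor (Phi_v' Phi_v)^(-1/2) in eq. (5)."""
    w, U = np.linalg.eigh(K)
    return U @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ U.T

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # small stand-in for the Gram matrix Phi_v' Phi_v
W = inv_sqrt(K)
# Xi' Xi = K^(-1/2) (Phi_v' Phi_v) K^(-1/2) = I, i.e. the basis is orthonormal
assert np.allclose(W @ K @ W, np.eye(2))
```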
31 Consequently this basis Ξ enables us to parameterize all vectors that lie in the column space of Φx by vectors in ℝd . [sent-75, score-0.363]
32 For instance, for vectors Σ_{i=1}^T αΦi Φ(xi ), which we write more compactly as Φx αΦ , and Φx βΦ in the column space of Φx with αΦ and βΦ in ℝT , there exist αΞ and βΞ in ℝd such that Φx αΦ = ΞαΞ and Φx βΦ = ΞβΞ . [sent-76, score-0.212]
33 The orthonormality implies αΦ⊤Φx⊤Φx βΦ = αΞ⊤Ξ⊤Ξ βΞ = αΞ⊤βΞ , (6) [Figure 1: Input data are mapped to some submanifold of F which is in the span of some d-dimensional orthonormal basis Ξ (panels: input space ℝn , feature space span(Ξ) ⊂ F, parameter space ℝd ). [sent-77, score-0.837]
34 Therefore these mapped points can be parametrized in ℝd . [sent-78, score-0.12]
35 The linear directions in parameter space correspond to nonlinear directions in input space.] [sent-79, score-0.486]
36 Eq. (6) states the remarkable property that the dot product of two linear combinations of the columns of Φx in F coincides with the dot product in ℝd . [sent-80, score-0.112]
37 Via the orthonormal basis Ξ (eq. (5)) the column space of Φx is naturally isomorphic (as a vector space) to ℝd . [sent-83, score-0.172]
38 Moreover, this isomorphism is compatible with the two involved dot products, as was shown in eq. (6). [sent-84, score-0.089]
39 This implies that all properties regarding angles and lengths can be taken back and forth between the column space of Φx and ℝd . [sent-86, score-0.206]
40 The space that is spanned by Ξ is called parameter space. [sent-87, score-0.134]
41 Figure 1 pictures our intuition: usually kernel methods parameterize the column space of Φx in terms of the mapped patterns {Φ(xi )}, which effectively corresponds to vectors in ℝT . [sent-88, score-0.517]
42 Our construction instead parameterizes this space in the span of Ξ, which is extremely valuable since d depends solely on the kernel function and the dimensionality of the input space. [sent-92, score-0.362]
43 Mapping inputs Having established the machinery above, we will now show how to map the input data to the right space. [sent-94, score-0.079]
44 First we evaluate k(vi , vj ) for i, j = 1, . . . , d, which are the entries of a real valued d × d matrix Φv⊤Φv that can be effectively calculated using the kernel trick and, by construction of v1 , . . . , vd , has full rank. [sent-98, score-0.39]
45 Similarly we evaluate k(vi , xt ) for i = 1, . . . , d and t = 1, . . . , T, which are the entries of the real valued d × T matrix Φv⊤Φx . [sent-108, score-0.125]
46 Using both matrices we finally compute the parameter matrix Ψx := Ξ⊤Φx = (Φv⊤Φv )−1/2 Φv⊤Φx . (7) [Footnote 1: The column space of Φx is the space that is spanned by the column vectors of Φx , written span(Φx ).] [sent-109, score-0.487]
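A hedged sketch of eqs. (5) and (7): the parameter matrix Ψx is obtained from kernel evaluations alone, without ever representing Ξ explicitly. Function and variable names are ours; inv_sqrt refers to the helper sketched after eq. (5):

```python
import numpy as np

def kernel_matrix(k, X, Y):
    """Gram matrix with entries k(X[:, i], Y[:, j]); data points are stored as columns."""
    return np.array([[k(a, b) for b in Y.T] for a in X.T])

def parameter_map(k, V, X):
    """Psi_x = (Phi_v' Phi_v)^(-1/2) Phi_v' Phi_x, eq. (7), for basis points V (n x d)
    and inputs X (n x T); uses only kernel evaluations (the kernel trick)."""
    Kvv = kernel_matrix(k, V, V)          # d x d matrix Phi_v' Phi_v
    Kvx = kernel_matrix(k, V, X)          # d x T matrix Phi_v' Phi_x
    return inv_sqrt(Kvv) @ Kvx            # inv_sqrt: see the sketch after eq. (5)
```

If span(Φv ) = span(Φx ), one can verify that Ψx⊤Ψx reproduces the full Gram matrix Φx⊤Φx , which is exactly the scalar-product preservation stated in eq. (6).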
47 Regarding computational costs, we have to evaluate the kernel function O(d²) + O(dT) times in order to set up eq. (7). [sent-111, score-0.179]
48 Furthermore, storage requirements are lower, as we do not have to hold the full T × T kernel matrix but only a d × T matrix. [sent-113, score-0.223]
49 Also, kernel based algorithms often require centering in F, which in our setting is equivalent to centering in ℝd . [sent-114, score-0.271]
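Since the mean of the mapped data lies in span(Ξ), centering in F reduces to subtracting the mean column of Ψx; a minimal sketch (our naming):

```python
import numpy as np

def center(Psi_x):
    """Subtract the mean parameter vector; equivalent to centering the mapped data in F."""
    return Psi_x - Psi_x.mean(axis=1, keepdims=True)
```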
50 Choosing vectors for the basis in F So far we have assumed that points v1 , . . . , vd are given. [sent-116, score-0.145]
51 The points v1 , . . . , vd are roughly analogous to a reduced set in the support vector world [15]. [sent-124, score-0.151]
52 In this case (cf. eq. (8)) we strive for points that provide the best approximation. [sent-129, score-0.06]
53 Obviously d is finite since it is bounded by T , the number of inputs, and by the dimensionality of the feature space. [sent-130, score-0.06]
54 Before formulating the algorithm we define the function rk(n) for numbers n by the following process: randomly pick n points v1 , . . . [sent-131, score-0.068]
55 , vn from the inputs and compute the rank of the corresponding n × n matrix Φv⊤Φv . [sent-134, score-0.161]
56 Using this definition we can formulate a recipe to find d (the dimension of the subspace of F): (1) start with a large d such that rk(d) < d. [sent-139, score-0.103]
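A hedged sketch of the rk(n) probe and the search for d. Only step (1) of the recipe is quoted above, so the repeated-shrinking loop below is our assumption about how the search proceeds, not the authors' exact procedure:

```python
import numpy as np

def rk(k, X, n, rng, tol=1e-10):
    """rk(n): pick n random points from the inputs X (dims x T) and return the rank
    of the corresponding n x n kernel matrix Phi_v' Phi_v."""
    V = X[:, rng.choice(X.shape[1], size=n, replace=False)]
    K = np.array([[k(a, b) for b in V.T] for a in V.T])
    return np.linalg.matrix_rank(K, tol=tol)

def estimate_d(k, X, d_start=100, seed=0):
    """Start with a large d such that rk(d) < d, then shrink d until rk(d) == d."""
    rng = np.random.default_rng(seed)
    d = min(d_start, X.shape[1])
    r = rk(k, X, d, rng)
    while r < d:
        d, r = r, rk(k, X, r, rng)
    return d
```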
57 3 Nonlinear blind source separation To demonstrate the use of the orthonormal basis in F, we formulate a new nonlinear BSS algorithm based on TDSEP [21]. [sent-147, score-1.171]
58 We use points v1 , . . . , vd that are provided by the algorithm from the last section, such that eq. (4) is fulfilled. [sent-151, score-0.186]
59 Thereby we have transformed the time signals x[t] from input space into parameter space signals Ψx [t] (cf. eq. (7)). [sent-155, score-0.554]
60 Now we apply standard TDSEP [21], which relies on simultaneous diagonalisation techniques [5], to perform linear blind source separation on Ψx [t] and to obtain d linear directions that correspond to separated nonlinear components in input space. [sent-158, score-1.14]
61 This new algorithm is denoted as kTDSEP (kernel TDSEP); in short, kTDSEP is TDSEP on the parameter space defined in Fig. 1. [sent-159, score-0.14]
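To make the pipeline concrete, here is a hedged sketch of a second-order separation step that can be applied to the parameter-space signals Ψx[t]. TDSEP proper jointly diagonalises several time-lagged covariance matrices; the single-lag variant below (AMUSE-style whitening plus one eigendecomposition) is our simplification, not the authors' implementation:

```python
import numpy as np

def second_order_bss(Y, tau=1, eps=1e-12):
    """Separate the rows of Y (d x T) using second-order temporal structure:
    whiten, then diagonalise one symmetrised time-lagged covariance matrix."""
    Y = Y - Y.mean(axis=1, keepdims=True)
    C0 = np.cov(Y)                                            # zero-lag covariance
    w, U = np.linalg.eigh(C0)
    W = U @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ U.T  # whitening transform
    Z = W @ Y
    Ct = Z[:, :-tau] @ Z[:, tau:].T / (Z.shape[1] - tau)      # lagged covariance
    Ct = 0.5 * (Ct + Ct.T)                                    # symmetrise
    _, V = np.linalg.eigh(Ct)
    return V.T @ Z                                            # estimated components
```

Applying such a routine to Ψx[t] from eq. (7) is the core of kTDSEP; the best matching components are then compared with the sources in input space.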
62 Key to the success of our algorithm are the time correlations exploited by TDSEP; intuitively they provide the ‘glue’ that yields the coherence for the separated signals. [sent-161, score-0.093]
63 Note that for a linear kernel function the new algorithm performs linear BSS. [sent-162, score-0.286]
64 Note that common kernel based algorithms which do not use the d-dimensional orthonormal basis will run into computational problems. [sent-164, score-0.417]
65 They need to hold and compute with a kernel matrix that is T × T instead of d × T , with T ≫ d in BSS problems. [sent-165, score-0.223]
66 Moreover BSS methods typically become unfeasible for separation problems of dimension T . [sent-167, score-0.256]
67 Note that the nonlinear unmixing agrees very nicely with the scatterplot of the true source signals. [sent-170, score-0.505]
68 4 Experiments In the first experiment the source signals s[t] = [s1 [t] s2 [t]] are a sinusoid and a sawtooth signal with 2000 samples each. [sent-171, score-0.424]
69 A dimension d = 22 of the manifold in feature space was obtained by kTDSEP using a polynomial kernel k(x, y) = (x⊤y + 1)^6 and sampling the basis-generating points from the inputs. [sent-175, score-0.42]
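The kernel used in this experiment, as a hedged sketch (the variable name x_mixed and the reference to the dimension-search helper sketched earlier are our assumptions):

```python
import numpy as np

def poly_kernel(x, y, degree=6):
    """Polynomial kernel k(x, y) = (x'y + 1)^degree from the first experiment."""
    return (np.dot(x, y) + 1.0) ** degree

# with x_mixed the 2 x 2000 mixed toy signals, the dimension search sketched after the
# rk(n) definition should come out near the reported value d = 22:
# d = estimate_d(poly_kernel, x_mixed)
```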
70 The chosen points v1 , . . . , v22 are shown as big dots in the upper left panel of Figure 2. [sent-179, score-0.133]
71 Applying TDSEP to the 22 dimensional mapped signals Ψx [t] we get 22 components in parameter space. [sent-180, score-0.355]
72 A scatter plot of the two components that best match the source signals is shown in the upper right panel of Figure 2. [sent-181, score-0.521]
73 The left lower panel also shows for comparison the two components that we obtained by applying linear TDSEP directly to the mixed signals x[t]. [sent-182, score-0.405]
74 The plots clearly indicate that kTDSEP has unfolded the nonlinearity successfully while the linear demixing algorithm failed. [sent-183, score-0.173]
75 In a second experiment two speech signals (with 20000 samples, sampling rate 8 kHz) are nonlinearly mixed by x1 [t] = s1 [t] + s2 [t]^3 and x2 [t] = s1 [t]^3 + tanh(s2 [t]). [sent-184, score-0.374]
76 This time we used a Gaussian RBF kernel k(x, y) = exp(−‖x − y‖^2 ). [sent-185, score-0.179]
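A hedged sketch of this experiment's setup. The RBF kernel follows the formula above; the mixture equations are our reconstruction of the garbled formulas in the previous sentence (the cubic exponents in particular should be checked against the original paper):

```python
import numpy as np

def rbf_kernel(x, y):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2)."""
    return np.exp(-np.sum((x - y) ** 2))

def mix_speech(s1, s2):
    """Nonlinear two-channel mixture of two speech signals (reconstructed equations)."""
    x1 = s1 + s2 ** 3
    x2 = s1 ** 3 + np.tanh(s2)
    return np.vstack([x1, x2])
```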
77 These points are marked as ’+’ in the left panel of figure 4. [sent-190, score-0.137]
78 An application of TDSEP to the 41 dimensional parameter [sent-191, score-0.068]
79 Table 3: Correlation coefficients of the kTDSEP and TDSEP components u1 , u2 with the signals shown in Fig. 4. [sent-203, score-0.167]
80 space yields nonlinear components whose projections to the input space are depicted in the right lower panel. [sent-205, score-0.452]
81 We can see that linear TDSEP (right middle panel) failed and that the directions of best matching kTDSEP components closely resemble the sources. [sent-206, score-0.107]
82 To confirm this visual impression we calculated the correlation coefficients of the kTDSEP and TDSEP solutions with the source signals (cf. Table 3). [sent-207, score-0.355]
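The quantitative comparison amounts to correlation coefficients between estimated components and sources; a minimal sketch (our naming) of how such a table can be computed:

```python
import numpy as np

def correlation_table(U, S):
    """Absolute correlation coefficients between estimated components U (rows)
    and source signals S (rows), as reported in Table 3."""
    Uc = (U - U.mean(axis=1, keepdims=True)) / U.std(axis=1, keepdims=True)
    Sc = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    return np.abs(Uc @ Sc.T) / U.shape[1]
```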
83 First, we propose a new formulation in the field of kernel based learning methods that allows us to construct an orthonormal basis of the subspace of kernel feature space F where the data lies. [sent-211, score-0.788]
84 This technique establishes a highly useful (scalar product preserving) isomorphism between the image of the data points in F and a d-dimensional space ℝd . [sent-212, score-0.159]
85 This also gives rise to a new and possibly more stable variant of kernel PCA [16]. [sent-215, score-0.179]
86 Moreover, we can acquire knowledge about the intrinsic dimension of the data manifold in F from the learning process. [sent-216, score-0.113]
87 Second, using our new formulation we tackle the problem of nonlinear BSS from the viewpoint of kernel based learning. [sent-217, score-0.408]
88 The proposed kTDSEP algorithm allows us to unmix arbitrary invertible nonlinear mixtures at low computational cost. [sent-218, score-0.382]
89 Note that the important ingredients are the temporal correlations of the source signals exploited by TDSEP. [sent-219, score-0.432]
90 Experiments on toy and speech signals underline that an elegant solution has been found to a challenging problem. [sent-220, score-0.325]
91 Applications where nonlinearly mixed signals can occur are found, e.g. [sent-221, score-0.302]
92 in the fields of telecommunications, array processing, and biomedical data analysis (EEG, MEG, EMG, etc.). [sent-223, score-0.077]
93 In fact, our algorithm would allow a software-based correction of sensors that have nonlinear characteristics. [sent-227, score-0.291]
94 Clearly kTDSEP is only one algorithm that can perform nonlinear BSS; kernelizing other ICA algorithms can be done following our reasoning. [sent-230, score-0.264]
95 Figure 4: A highly nonlinear mixture of two speech signals: scatterplot of x1 vs. x2 and the waveforms of the true source signals (upper panel), in comparison to the best matching linear and nonlinear separation results shown in the middle and lower panels, respectively. [sent-257, score-1.098]
96 A blind source separation technique based on second order statistics. [sent-264, score-0.669]
97 A maximum likelihood approach to nonlinear blind source separation. [sent-358, score-0.688]
98 Nonlinear independent component analysis using ensemble learning: Experiments and discussion. [sent-396, score-0.06]
99 Information-theoretic approach to blind separation of sources in non-linear mixture. [sent-411, score-0.481]
100 TDSEP—an efficient algorithm for blind separation using time structure. [sent-417, score-0.516]
wordName wordTfidf (topN-words)
[('ktdsep', 0.351), ('bss', 0.331), ('tdsep', 0.331), ('blind', 0.271), ('nonlinear', 0.229), ('separation', 0.21), ('source', 0.188), ('kernel', 0.179), ('signals', 0.167), ('orthonormal', 0.166), ('vd', 0.151), ('rk', 0.13), ('span', 0.114), ('panel', 0.104), ('column', 0.097), ('invertible', 0.088), ('scatterplot', 0.088), ('mapped', 0.087), ('rank', 0.078), ('space', 0.075), ('elegant', 0.074), ('basis', 0.072), ('nonlinearly', 0.07), ('xt', 0.07), ('signal', 0.069), ('mixed', 0.065), ('demixing', 0.065), ('tsch', 0.061), ('feature', 0.06), ('pajunen', 0.059), ('mika', 0.058), ('subspace', 0.057), ('mixing', 0.056), ('valued', 0.053), ('ica', 0.051), ('kernelized', 0.051), ('finland', 0.051), ('helsinki', 0.051), ('isomorphism', 0.051), ('emg', 0.046), ('centering', 0.046), ('meg', 0.046), ('ingredients', 0.046), ('dimension', 0.046), ('trick', 0.045), ('toy', 0.045), ('matrix', 0.044), ('vi', 0.043), ('potsdam', 0.043), ('icann', 0.043), ('biomedical', 0.043), ('cardoso', 0.043), ('submanifold', 0.043), ('construction', 0.041), ('sch', 0.041), ('eeg', 0.041), ('ful', 0.041), ('ef', 0.04), ('intrinsic', 0.04), ('vectors', 0.04), ('input', 0.04), ('inputs', 0.039), ('speech', 0.039), ('parameterize', 0.039), ('telecommunications', 0.039), ('dimensional', 0.038), ('dot', 0.038), ('pages', 0.038), ('directions', 0.038), ('exp', 0.037), ('nonlinearity', 0.037), ('linear', 0.036), ('workshop', 0.036), ('vj', 0.035), ('algorithm', 0.035), ('array', 0.034), ('angles', 0.034), ('points', 0.033), ('sampling', 0.033), ('components', 0.033), ('acoustic', 0.032), ('ller', 0.032), ('component', 0.031), ('correlations', 0.031), ('mixtures', 0.03), ('simultaneous', 0.03), ('parameter', 0.03), ('moreover', 0.03), ('upper', 0.029), ('spanned', 0.029), ('elds', 0.029), ('relies', 0.029), ('valuable', 0.029), ('lkopf', 0.029), ('ensemble', 0.029), ('networks', 0.028), ('entries', 0.028), ('ieee', 0.027), ('provide', 0.027), ('manifold', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
Author: Stefan Harmeling, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
2 0.40580028 71 nips-2001-Estimating the Reliability of ICA Projections
Author: Frank C. Meinecke, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly the separation performance, and, most important, to mark the components that have an actual physical meaning. Application to 49-channel-data from a magnetoencephalography (MEG) experiment underlines the usefulness of our approach. 1
3 0.2408205 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
Author: Roland Vollgraf, Klaus Obermayer
Abstract: We present a new method for the blind separation of sources, which do not fulfill the independence assumption. In contrast to standard methods we consider groups of neighboring samples (
4 0.19219825 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
Author: Michael Zibulevsky, Pavel Kisilev, Yehoshua Y. Zeevi, Barak A. Pearlmutter
Abstract: We consider a problem of blind source separation from a set of instantaneous linear mixtures, where the mixing matrix is unknown. It was discovered recently, that exploiting the sparsity of sources in an appropriate representation according to some signal dictionary, dramatically improves the quality of separation. In this work we use the property of multi scale transforms, such as wavelet or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property for selecting the best (most sparse) subsets of features for further separation. The performance of the algorithm is verified on noise-free and noisy data. Experiments with simulated signals, musical sounds and images demonstrate significant improvement of separation quality over previously reported results. 1
5 0.16137846 164 nips-2001-Sampling Techniques for Kernel Methods
Author: Dimitris Achlioptas, Frank Mcsherry, Bernhard Schölkopf
Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function by a “randomized kernel” which behaves like in expectation.
6 0.12483628 74 nips-2001-Face Recognition Using Kernel Methods
7 0.11404213 58 nips-2001-Covariance Kernels from Bayesian Generative Models
8 0.10318471 170 nips-2001-Spectral Kernel Methods for Clustering
9 0.10285081 105 nips-2001-Kernel Machines and Boolean Functions
10 0.10203461 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA
11 0.10144133 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
12 0.094300672 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
13 0.090411916 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
14 0.083052091 134 nips-2001-On Kernel-Target Alignment
15 0.077348836 136 nips-2001-On the Concentration of Spectral Properties
16 0.07697539 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
17 0.074363142 50 nips-2001-Classifying Single Trial EEG: Towards Brain Computer Interfacing
18 0.073364221 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections
19 0.07086733 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
20 0.069970623 171 nips-2001-Spectral Relaxation for K-means Clustering
topicId topicWeight
[(0, -0.24), (1, 0.063), (2, -0.07), (3, -0.198), (4, 0.094), (5, 0.135), (6, 0.027), (7, 0.109), (8, 0.248), (9, -0.339), (10, -0.298), (11, -0.122), (12, -0.022), (13, 0.007), (14, 0.136), (15, 0.038), (16, 0.037), (17, 0.076), (18, -0.046), (19, -0.037), (20, 0.018), (21, 0.024), (22, -0.094), (23, -0.007), (24, -0.059), (25, 0.021), (26, 0.003), (27, -0.013), (28, 0.049), (29, -0.004), (30, -0.071), (31, -0.019), (32, -0.025), (33, 0.021), (34, -0.05), (35, -0.017), (36, -0.043), (37, -0.1), (38, -0.034), (39, -0.117), (40, -0.005), (41, -0.111), (42, -0.016), (43, -0.006), (44, -0.053), (45, -0.055), (46, -0.017), (47, -0.033), (48, 0.002), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.94140184 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
Author: Stefan Harmeling, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
2 0.89558607 71 nips-2001-Estimating the Reliability of ICA Projections
Author: Frank C. Meinecke, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly the separation performance, and, most important, to mark the components that have an actual physical meaning. Application to 49-channel-data from a magnetoencephalography (MEG) experiment underlines the usefulness of our approach. 1
3 0.77577347 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
Author: Roland Vollgraf, Klaus Obermayer
Abstract: We present a new method for the blind separation of sources, which do not fulfill the independence assumption. In contrast to standard methods we consider groups of neighboring samples (
4 0.73458105 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
Author: Michael Zibulevsky, Pavel Kisilev, Yehoshua Y. Zeevi, Barak A. Pearlmutter
Abstract: We consider a problem of blind source separation from a set of instantaneous linear mixtures, where the mixing matrix is unknown. It was discovered recently, that exploiting the sparsity of sources in an appropriate representation according to some signal dictionary, dramatically improves the quality of separation. In this work we use the property of multi scale transforms, such as wavelet or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property for selecting the best (most sparse) subsets of features for further separation. The performance of the algorithm is verified on noise-free and noisy data. Experiments with simulated signals, musical sounds and images demonstrate significant improvement of separation quality over previously reported results. 1
5 0.4687537 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA
Author: Magnus Rattray, Gleb Basalyga
Abstract: We study the dynamics of a Hebbian ICA algorithm extracting a single non-Gaussian component from a high-dimensional Gaussian background. For both on-line and batch learning we find that a surprisingly large number of examples are required to avoid trapping in a sub-optimal state close to the initial conditions. To extract a skewed signal at least examples are required for -dimensional data and examples are required to extract a symmetrical signal with non-zero kurtosis.
6 0.45241332 164 nips-2001-Sampling Techniques for Kernel Methods
7 0.42583486 74 nips-2001-Face Recognition Using Kernel Methods
8 0.35817519 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
9 0.34792793 155 nips-2001-Quantizing Density Estimators
10 0.33687258 58 nips-2001-Covariance Kernels from Bayesian Generative Models
11 0.32868683 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
12 0.32710707 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
13 0.31156552 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
14 0.29793736 170 nips-2001-Spectral Kernel Methods for Clustering
15 0.2967945 105 nips-2001-Kernel Machines and Boolean Functions
16 0.29138845 136 nips-2001-On the Concentration of Spectral Properties
17 0.26969591 134 nips-2001-On Kernel-Target Alignment
18 0.26675653 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
19 0.26671109 48 nips-2001-Characterizing Neural Gain Control using Spike-triggered Covariance
20 0.2663143 60 nips-2001-Discriminative Direction for Kernel Classifiers
topicId topicWeight
[(14, 0.051), (17, 0.029), (19, 0.047), (27, 0.151), (30, 0.09), (36, 0.012), (38, 0.01), (55, 0.262), (59, 0.069), (72, 0.052), (79, 0.037), (83, 0.031), (91, 0.089)]
simIndex simValue paperId paperTitle
same-paper 1 0.82049721 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
Author: Stefan Harmeling, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: In kernel based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.
2 0.79609597 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
Author: John R. Hershey, Michael Casey
Abstract: It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audio-visual information for the task of speech enhancement. We propose a method to exploit audio-visual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factorially combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audio-visual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information. 1
3 0.63804042 71 nips-2001-Estimating the Reliability of ICA Projections
Author: Frank C. Meinecke, Andreas Ziehe, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly the separation performance, and, most important, to mark the components that have an actual physical meaning. Application to 49-channel-data from a magnetoencephalography (MEG) experiment underlines the usefulness of our approach. 1
4 0.62334067 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
Author: Michael Collins, S. Dasgupta, Robert E. Schapire
Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
5 0.62125969 137 nips-2001-On the Convergence of Leveraging
Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth
Abstract: We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
6 0.61686957 13 nips-2001-A Natural Policy Gradient
7 0.61654538 74 nips-2001-Face Recognition Using Kernel Methods
8 0.61570394 60 nips-2001-Discriminative Direction for Kernel Classifiers
9 0.61472213 8 nips-2001-A General Greedy Approximation Algorithm with Applications
10 0.61422384 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
11 0.61304885 190 nips-2001-Thin Junction Trees
12 0.61223489 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
13 0.61009246 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
14 0.60952067 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
15 0.60932553 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
16 0.60757607 164 nips-2001-Sampling Techniques for Kernel Methods
17 0.6063993 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
18 0.60625339 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
19 0.60567558 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
20 0.60494959 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules