nips nips2006 nips2006-46 knowledge-graph by maker-knowledge-mining

46 nips-2006-Blind source separation for over-determined delayed mixtures


Source: pdf

Author: Lars Omlor, Martin Giese

Abstract: Blind source separation, i.e. the extraction of unknown sources from a set of given signals, is relevant for many applications. A special case of this problem is dimension reduction, where the goal is to approximate a given set of signals by superpositions of a minimal number of sources. Since in this case the signals outnumber the sources the problem is over-determined. Most popular approaches for addressing this problem are based on purely linear mixing models. However, many applications like the modeling of acoustic signals, EMG signals, or movement trajectories, require temporal shift-invariance of the extracted components. This case has only rarely been treated in the computational literature, and specifically for the case of dimension reduction almost no algorithms have been proposed. We present a new algorithm for the solution of this problem, which is based on a timefrequency transformation (Wigner-Ville distribution) of the generative model. We show that this algorithm outperforms classical source separation algorithms for linear mixtures, and also a related method for mixtures with delays. In addition, applying the new algorithm to trajectories of human gaits, we demonstrate that it is suitable for the extraction of spatio-temporal components that are easier to interpret than components extracted with other classical algorithms. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Blind source separation for over-determined delayed mixtures Lars Omlor, Martin Giese∗ Laboratory for Action Representation and Learning, Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Germany Abstract Blind source separation, i.e. [sent-1, score-0.812]

2 the extraction of unknown sources from a set of given signals, is relevant for many applications. [sent-3, score-0.283]

3 A special case of this problem is dimension reduction, where the goal is to approximate a given set of signals by superpositions of a minimal number of sources. [sent-4, score-0.362]

4 Since in this case the signals outnumber the sources the problem is over-determined. [sent-5, score-0.598]

5 Most popular approaches for addressing this problem are based on purely linear mixing models. [sent-6, score-0.194]

6 However, many applications like the modeling of acoustic signals, EMG signals, or movement trajectories, require temporal shift-invariance of the extracted components. [sent-7, score-0.187]

7 We show that this algorithm outperforms classical source separation algorithms for linear mixtures, and also a related method for mixtures with delays. [sent-10, score-0.531]

8 In addition, applying the new algorithm to trajectories of human gaits, we demonstrate that it is suitable for the extraction of spatio-temporal components that are easier to interpret than components extracted with other classical algorithms. [sent-11, score-0.379]

9 1 Introduction Blind source separation techniques, such as Independent Components Analysis (ICA), have received great interest in many domains including neuroscience [3; 19; 2], machine learning [12; 11], and speech and signal processing [25]. [sent-12, score-0.455]

10 A variety of algorithms have been proposed for different types of mixing models. [sent-13, score-0.194]

11 Many studies have focused on instantaneous mixing, where target signals are modeled by the linear superposition of a number of source signals separately for each point in time. [sent-14, score-0.889]

12 Another set of studies has treated convolutive mixing, where signals result from the superposition of filtered source signals (see [9] and [6] for review). [sent-15, score-0.909]

13 In the case of anechoic mixing, signals are approximated by linear combinations of time-delayed source signals. [sent-17, score-0.792]

14 Classical cases of anechoic mixing arise in electrical engineering, when signals from multiple antennas are received asynchronously, or in acoustics, when sound signals are recorded with multiple microphones, resulting in different signal travel times. [sent-18, score-1.179]

15 A few algorithms have been proposed for the solution of under-determined anechoic mixing problems, where the number of sources exceeds the number of signals [8; 4; 25; 22]. [sent-19, score-0.928]

16 A method that treats the case of equal numbers of signals and sources, which is based on joint diagonalization of spectral matrices, has been proposed by Yeredor [24]. [sent-20, score-0.386]

17 Almost no work exists on over-determined anechoic mixing problems, where the number of source signals is smaller than the number of original signals – the case that is most important for dimension reduction problems. [sent-21, score-1.169]

18 One approach employed for under-determined anechoic mixtures is based on the assumption of small delays and a linearization of the mixture model [5]. [sent-30, score-0.544]

19 In this paper we present a new algorithm for the solution of the over-determined anechoic mixing problem, which makes no further assumptions about the size of the delays. [sent-32, score-0.377]

20 We tested the novel algorithm with two different test data sets, human movement trajectories and synthetic mixtures of acoustic signals. [sent-34, score-0.531]

21 We demonstrate that the method results in more accurate solutions with fewer sources than classical methods (like PCA and normal ICA) for instantaneous mixing. [sent-35, score-0.368]

22 Also, we demonstrate that our algorithm outperforms the SOBIDS algorithm in [1] for anechoic mixtures. [sent-36, score-0.213]

23 In addition, we demonstrate that the method seems suitable for the extraction of biologically meaningful components from human movement data. [sent-37, score-0.224]

24 1 Source separation for over-determined delayed mixtures Delayed mixture problem In the following we assume that m signals x_i(t), 1 ≤ i ≤ m, have been observed. [sent-39, score-0.87]

25 These signals are approximated by a linear combination of n source signals s_j(t), 1 ≤ j ≤ n, with temporal delays τ_ij. [sent-40, score-1.017]

26 In the case of anechoic mixing, signals and sources obey the relationship: x_i(t) = Σ_{j=1}^{n} α_ij · s_j(t − τ_ij), i = 1, …, m (1). In the over-determined case the signals outnumber the sources, i.e. n < m. [sent-41, score-1.375]
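
As a concrete illustration of the generative model (1), the following minimal NumPy sketch builds an over-determined anechoic mixture from synthetic discrete-time sources. The function name, the use of integer sample delays, and the zero-padding at the signal start are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def anechoic_mix(S, A, T):
    """Form x_i(t) = sum_j A[i, j] * s_j(t - T[i, j]) for discrete-time sources.

    S : (n, L) array of n source signals of length L
    A : (m, n) mixing weights alpha_ij
    T : (m, n) non-negative integer delays tau_ij (in samples)
    """
    m, n = A.shape
    L = S.shape[1]
    X = np.zeros((m, L))
    for i in range(m):
        for j in range(n):
            d = int(T[i, j])
            delayed = np.zeros(L)
            delayed[d:] = S[j, :L - d]      # shift source j by tau_ij samples
            X[i] += A[i, j] * delayed
    return X

# toy over-determined example: m = 4 observed signals, n = 2 sources
rng = np.random.default_rng(0)
S = rng.standard_normal((2, 1000))
A = rng.uniform(0.5, 2.0, size=(4, 2))
T = rng.integers(0, 50, size=(4, 2))
X = anechoic_mix(S, A, T)
```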

27 Equation (1) is a special case of a convolutive mixture problem, where the filter kernels are given by delta functions. [sent-44, score-0.149]

28 Since normal Fourier transformation of equation (1) results in frequency-dependent mixtures of complex phase terms, time-frequency analysis turns out to be a more appropriate framework for the separation of sources in the above mixture models. [sent-46, score-0.682]

29 2 Wigner Ville Spectrum In signal processing and acoustics a variety of time-frequency representations have been proposed, ranging from linear and multilinear to nonlinear transformations. [sent-48, score-0.11]

30 Marginal properties (6)–(7): in particular, the group delay is given by the normalized first temporal moment of the WVS, t_x(ω) = ∫ t·W_x(t, ω) dt / ∫ W_x(t, ω) dt (7). Since the group delay (7) is not uniquely defined in the stochastic case [15], the last property gives a natural definition. [sent-57, score-0.159]

31 Consistent with the standard definition for deterministic signals, the group delay for the deterministic case can be rewritten: t_x(ω) = ∫ t·W_x(t, ω) dt / ∫ W_x(t, ω) dt = −(1/2π) ∂/∂ω arg((Fx)(ω)). [sent-58, score-0.192]
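
As a hedged sketch, the group delay can be estimated from the unwrapped FFT phase by a finite-difference derivative, mirroring the −(1/2π) ∂/∂ω arg((Fx)(ω)) form above; the frequency convention (ordinary frequency in Hz) and the differencing scheme are illustrative assumptions, not the paper's code.

```python
import numpy as np

def group_delay(x, fs=1.0):
    """Estimate t_x(w) = -(1/(2*pi)) * d/dw arg(Fx)(w) from the unwrapped
    FFT phase via finite differences (illustrative sketch)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)      # frequency axis in Hz
    phase = np.unwrap(np.angle(X))
    return -(1.0 / (2.0 * np.pi)) * np.gradient(phase, freqs)
```

For a pure time shift the unwrapped phase is roughly linear in frequency, so this estimate is approximately constant and equal to the shift wherever the signal has sufficient energy.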

32 Sources can then be separated, e.g., by time-frequency masking [25] or joint diagonalization [24; 1]. [sent-64, score-0.071]

33 These moments can be computed analytically making use of equations (5) and (7), resulting in: E{|Fx_i|²} = ∫ W_{x_i}(t, ω) dt = Σ_{j=1}^{n} |α_ij|² ∫ W_{s_j}(t − τ_ij, ω) dt = Σ_{j=1}^{n} |α_ij|² E{|Fs_j|²} (10), and analogously E{|Fx_i(ω)|²} · t_{x_i}(ω) = Σ_{j=1}^{n} |α_ij|² · E{|Fs_j|²} · (t_{s_j}(ω) + τ_ij) (11). [sent-69, score-0.516]

34 Equation (10) defines an instantaneous mixture problem with non-negativity constraints. [sent-70, score-0.129]

35 After solving equation (10), the remaining unknowns in equation (11) are the delays and the group delay (complex phase). [sent-72, score-0.396]

36 4 Two-step algorithm The last result implies the following algorithm for the estimation of the sources, delays, and the mixing matrix in (1): 1. [sent-77, score-0.223]

37 Compute |Fx_i|² and solve |Fx_i|²(ω) = Σ_{j=1}^{n} |α_ij|² |Fs_j|²(ω) (12), e.g. using positive ICA. [sent-78, score-0.172]

38 Initialize τ_ij = 0 and iterate the following steps: (a) Numerically solve |Fx_i(ω)|² · ∂/∂ω arg{Fx_i} = Σ_{j=1}^{n} |α_ij|² · |Fs_j|² · (∂/∂ω arg{Fs_j} + τ_ij) (13) for the term ∂/∂ω arg{Fs_j}. [sent-83, score-0.172]
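
A hedged Python skeleton of this two-step structure is sketched below: step 1 solves the non-negative instantaneous mixture of power spectra (12), and step 2 solves (13) at each frequency as a small least-squares problem in ∂/∂ω arg{Fs_j} given the current delays. The NMF call standing in for positive ICA, the finite-difference phase derivative, and the omission of the subsequent delay re-estimation are all assumptions of this sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import NMF   # stand-in for the positive-ICA step

def two_step_anechoic_sketch(X, n_sources):
    """Step 1: non-negative separation of the power spectra, eq. (12).
    Step 2: per-frequency least squares for d/dw arg{Fs_j}, eq. (13), with
    the delays initialized to zero.  Illustrative sketch only."""
    m, L = X.shape
    F_X = np.fft.rfft(X, axis=1)
    P_X = np.abs(F_X) ** 2                            # |Fx_i|^2(w), shape (m, n_freq)

    # Step 1: P_X ~= W @ P_S with W_ij = |alpha_ij|^2 >= 0 and rows of P_S = |Fs_j|^2
    nmf = NMF(n_components=n_sources, init="nndsvda", max_iter=500)
    W = nmf.fit_transform(P_X)                        # (m, n) squared mixing weights
    P_S = nmf.components_                             # (n, n_freq) source power spectra

    # Step 2 (one pass): solve eq. (13) at each frequency for the source phase
    # derivatives, given the current (zero-initialized) delays.
    omega = 2.0 * np.pi * np.fft.rfftfreq(L)
    dphase_X = np.gradient(np.unwrap(np.angle(F_X), axis=1), omega, axis=1)
    tau = np.zeros((m, n_sources))
    dphase_S = np.zeros_like(P_S)
    for k in range(P_S.shape[1]):
        M = W * P_S[:, k]                             # (m, n): |alpha_ij|^2 |Fs_j(w)|^2
        b = P_X[:, k] * dphase_X[:, k] - (M * tau).sum(axis=1)
        dphase_S[:, k], *_ = np.linalg.lstsq(M, b, rcond=None)

    # The full algorithm alternates this step with a re-estimation of tau (not shown).
    return W, P_S, dphase_S, tau
```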

39 A first set of data was generated artificially by mixing different sound sources, varying the mixing weights and the delays. [sent-88, score-0.481]

40 This data was non-periodic and enabled us to validate the accuracy of the reconstruction of the source signals. [sent-89, score-0.162]

41 The second data set consisted of human movement trajectories of emotional gaits that were recorded using motion capture. [sent-90, score-0.598]

42 The gait trajectories were periodic and served for testing the suitability of the new method for extracting biologically interpretable movement components. [sent-91, score-0.6]

43 Our first data set consisted of synthetically generated delayed mixtures of segments of speech and sound signals taken from an ICA benchmark data set described in [6]. [sent-92, score-0.869]

44 In order to obtain statistically representative results, data sets were recomputed 20 times with random selection of the source signals, and/or of the mixing and delay matrices. [sent-94, score-0.515]

45 Three types of mixtures were generated: (I) Mixtures of 2 source segments with random mixing and delay matrices (2 × 2), each resulting in two simulated signals x1,2 . [sent-95, score-1.08]

46 (II) Mixtures of 2 randomly selected segments from the speech database using the constant mixing matrix A = [1, 2; 3, 1; 10, 5; 1, 2; 1, 1] and the constant delay matrix T = (τ_ij)_ij = [0, 4000; 2500, 5000; 100, 200; 1, 1; 500, 333]. [sent-98, score-0.538]
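
Reusing the anechoic_mix sketch from above, the fixed matrices of data set (II) translate directly into code; treating the delays as integer sample offsets and the chosen segment length are assumptions of this illustration.

```python
import numpy as np

# Fixed matrices of data set (II): five observed signals from two speech segments.
A = np.array([[1, 2], [3, 1], [10, 5], [1, 2], [1, 1]], dtype=float)
T = np.array([[0, 4000], [2500, 5000], [100, 200], [1, 1], [500, 333]])

# With S holding the two speech segments as rows, e.g. S.shape == (2, 80000),
# X = anechoic_mix(S, A, T) yields the five simulated signals x_1, ..., x_5.
```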

47 This data set was used to compare the new method with PCA and ICA, and the SOBIDS algorithm [1], which requires at least twice as many signals as sources. [sent-99, score-0.315]

48 Data set (II) with fixed mixing and delay matrices was included since completely random generation sometimes produced degenerate anechoic mixtures (instantaneous mixtures or ill-conditioned mixing matrices). [sent-100, score-1.077]

49 (III) A third data set was generated by mixing two randomly selected source segments with random mixture matrices and random delay matrices. [sent-101, score-0.678]

50 The movements were recorded using a VICON 612 motion capture system with 7 cameras, obtaining the 3D positions of 41 passive markers on the bodies of the actors. [sent-103, score-0.074]

51 We recorded trajectories from 13 lay actors, repeating each walking style three times per actor. [sent-104, score-0.287]

52 A hierarchical kinematic body model (skeleton) with 17 joints was fitted to the marker positions, and joint angles were computed. [sent-105, score-0.163]

53 Rotations between adjacent body segments were described as Euler angles, defining flexion, abduction and rotation about the connecting joints. [sent-106, score-0.086]

54 The data for the unsupervised learning procedure included only the flexion angles of the hip, knee, elbow, shoulder and the clavicle, since the other angles had relatively high noise levels. [sent-107, score-0.094]

55 From each trajectory only one gait cycle was extracted, which was time normalized. [sent-108, score-0.292]
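
A hedged sketch of the time-normalization step: one extracted gait cycle (joint angles over one stride) is resampled to a fixed number of samples by linear interpolation over normalized time. The target length and the interpolation scheme are assumptions, not the authors' preprocessing code.

```python
import numpy as np

def time_normalize(cycle, n_samples=100):
    """Resample one gait cycle, shape (n_angles, cycle_length), to a fixed
    length by linear interpolation over normalized time in [0, 1]."""
    t_old = np.linspace(0.0, 1.0, cycle.shape[1])
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.stack([np.interp(t_new, t_old, angle) for angle in cycle])
```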

56 1 Results Delayed mixtures of sound sources: Figure 1 shows the results for the extraction of the sound sources from the data sets (I)-(III). [sent-111, score-0.623]

57 Figure 1: Comparison of different blind source separation algorithms for synthetic mixtures of sound signals with delays (data sets I-III, see text). [sent-122, score-1.21]

58 Human gait trajectories: With the second data set we tested whether the proposed novel algorithm is suitable for the extraction of interpretable source signals from human movement data. [sent-125, score-1.041]

59 By performing normal ICA separately on individual joint trajectories and comparing the extracted sources, we had observed before that such sources are often very similar except for time shifts between different joints. [sent-126, score-0.546]

60 This motivates the hypothesis that (1) might provide an appropriate generative model for such gait trajectories. [sent-127, score-0.258]

61 This hypothesis is confirmed by the data presented in Figure 2 that shows the approximation quality (explained variance) for different numbers of extracted sources, comparing five different algorithms: PCA, (fast) ICA [12], Bayesian positive ICA [11], SOBIDS, and the new algorithm. [sent-128, score-0.31]

62 Specifically, the new algorithm is capable of approximating 97% of the trajectory data with only 3 sources, while PCA and ICA require more than 6 sources to achieve the same level of accuracy. [sent-130, score-0.27]

63 Figure 2: Comparison of different blind source separation algorithms. [sent-131, score-0.508]

64 Explained variance is shown for different numbers of extracted sources. [sent-132, score-0.074]

65 In order to test whether the novel algorithm results in source signals that provide useful interpretations of biological data we modeled all trajectories in our gait data sets by linear superpositions of the extracted sources and analyzed the resulting mixture matrices A. [sent-133, score-1.394]

66 To extract weight components that are specific for individual emotional styles we modeled the mixture matrices applying sparse linear regression. [sent-134, score-0.327]

67 The (vectorized) weights of the individual gait trajectories for emotion j, defining the vector aj , were approximated by the sum of a component a0 (containing the weights that characterize neutral walking) and an emotion-specific contribution. [sent-135, score-0.604]

68 Formally, this multi-linear regression model can be written as a_j ≈ a_0 + C · e_j (15), where C is a weight matrix that determines the emotion-specific contributions to the mixing weights. [sent-136, score-0.323]

69 Its columns are given by the differences between the weights for the different emotional styles (happy, sad, fearful and angry) and the weights for neutral walking. [sent-137, score-0.339]

70 In order to obtain easily interpretable results, the matrix C was sparsified by L1 norm minimization. [sent-138, score-0.103]
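
A hedged sketch of this sparse regression using an L1-penalized (Lasso) fit: trials are encoded by emotion indicator vectors e_j, the intercept plays the role of the neutral component a_0, and the coefficient matrix corresponds to C. The indicator coding and the penalty strength are assumptions of this illustration, not the authors' exact minimization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def fit_emotion_weights(A_trials, emotions, alpha=0.05):
    """Sketch of eq. (15), a_j ~ a_0 + C e_j, with an L1 penalty on C.

    A_trials : (n_trials, n_weights) vectorized mixing weights per gait trial
    emotions : (n_trials,) integer labels, 0 = neutral, 1..K = emotional styles
    Returns the neutral baseline a_0 and the sparse matrix C, shape (n_weights, K)."""
    n_styles = int(emotions.max())
    E = np.zeros((len(emotions), n_styles))
    for k in range(1, n_styles + 1):
        E[emotions == k, k - 1] = 1.0       # one-hot columns, neutral rows stay zero
    model = Lasso(alpha=alpha, fit_intercept=True).fit(E, A_trials)
    return model.intercept_, model.coef_    # a_0 (n_weights,), C (n_weights, K)
```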

71 Figure 3 shows a gray level plot of the matrix C, illustrating the weight differences compared to neutral walking for the four different emotional styles and the different joint angles. [sent-142, score-0.449]

72 Positive elements of the matrix indicate cases where the joint amplitudes for the emotional gait are increased compared to normal walking. [sent-143, score-0.622]

73 Negative elements correspond to cases where the joint angle amplitudes for the emotional walk are reduced compared to normal walking. [sent-145, score-0.307]

74 The + and − signs in the figure summarize data from psychophysical experiments that have investigated kinematic features that were important for the perception of emotional gaits [17; 23]. [sent-146, score-0.439]

75 Comparison between these psychophysical results and the elements of the matrix C (Figure 3a) shows a very close match between the weight changes and the features that are important for the perception of emotions from gaits. [sent-148, score-0.216]

76 Figure 3b shows the results of the same analysis for sources that had been extracted with PCA, matching the numbers of non-zero elements of the estimated matrix C. [sent-150, score-0.367]

77 For gait trajectories the source signals extracted by PCA and ICA are virtually identical. [sent-151, score-0.894]

78 In either case, the match is significantly worse than for the sources extracted with the novel algorithm (panel a). [sent-153, score-0.347]

79 In addition, the signs of the matrix elements often do not match the signs of the amplitude changes in the psychophysical experiments. [sent-154, score-0.238]

80 This implies that the new algorithm extracts spatio-temporal components from human gait data that are more easily interpretable than components extracted with PCA. [sent-155, score-0.474]

81 Figure 3: Elements of the weight matrix C, encoding emotion-specific deviations from neutral walking, for different degrees of freedom. [sent-156, score-0.1]

82 Kinematic features that have been shown to be important for the perception of emotions from gait in psychophysical experiments are indicated by the plus and minus signs. [sent-158, score-0.417]

83 2 Conclusion We present a new algorithm for the solution of over-determined blind source separation problems for mixtures of sources with delays. [sent-161, score-0.898]

84 The proposed method has been derived by application of a time-frequency transformation to the mixture model, resulting in a two-step algorithm that combines positive ICA with another iterative optimization step. [sent-162, score-0.108]

85 We demonstrate that the developed algorithm outperforms other source separation algorithms with and without time delays on synthetic data sets defined by delayed mixtures of speech signals, and also on real data sets obtained by motion capture of human full-body movements. [sent-163, score-0.994]

86 For human movements we also demonstrate that, at least for the case of human gait, the new algorithm provides more compact and interpretable representations than the alternative methods we tested. [sent-164, score-0.21]

87 To our knowledge the proposed algorithm is the first one that solves over-determined delayed mixing problems without specific additional assumptions about the structure of the delay matrix, e. [sent-165, score-0.533]

88 In contrast to nonnegative matrix factorization with delays [2], the proposed method is applicable to non-positive signals and sources. [sent-168, score-0.484]

89 Ashtar, et al (2004) A novel approach to blind separation of delayed sources in linear mixtures. [sent-177, score-0.799]

90 Sejnowski (1995) An information-maximization approach to blind separation and blind deconvolution. [sent-187, score-0.538]

91 Bofill (2003) Underdetermined blind separation of delayed sound sources in the frequency domain. [sent-190, score-0.855]

92 Comon (1998) Estimation of time delays between unknown colored signals. [sent-206, score-0.14]

93 Rickard (1982) Survey of sparse and non-sparse methods in source separation. [sent-214, score-0.162]

94 International Journal of Imaging Systems and Technology (IJIST), special issue on blind source separation and deconvolution in imaging and image processing (15). [sent-215, score-0.543]

95 Lacquaniti (2004) Five basic muscle activation patterns account for muscle activity during human locomotion. [sent-234, score-0.158]

96 Hlawatsch (2003) Wigner distributions (nearly) everywhere: Time-frequency analysis of signals, systems, random processes, signal spaces, and frames. [sent-253, score-0.069]

97 de Meijer (1989) The contribution of general features of body movement to the attribution of emotions. [sent-257, score-0.109]

98 Kailath (1989) ESPRIT—Estimation of signal parameters via rotational invariance techniques. [sent-272, score-0.069]

99 Swindelhurst (1998) Time delay and spatial signature estimation using known asynchronous signals. [sent-279, score-0.159]

100 Torkkola (1996) Blind separation of delayed sources based on information maximization. [sent-286, score-0.57]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('signals', 0.315), ('gait', 0.258), ('sources', 0.236), ('mixing', 0.194), ('blind', 0.192), ('anechoic', 0.183), ('delayed', 0.18), ('ica', 0.175), ('ij', 0.172), ('wigner', 0.164), ('source', 0.162), ('delay', 0.159), ('trajectories', 0.159), ('emotional', 0.156), ('mixtures', 0.154), ('separation', 0.154), ('sobids', 0.141), ('wvs', 0.141), ('delays', 0.14), ('fxi', 0.122), ('fsj', 0.117), ('wx', 0.093), ('sound', 0.093), ('walking', 0.09), ('sj', 0.085), ('convolutive', 0.082), ('movement', 0.08), ('pca', 0.079), ('amplitudes', 0.074), ('interpretable', 0.074), ('extracted', 0.074), ('dt', 0.073), ('neutral', 0.071), ('ville', 0.07), ('speech', 0.07), ('signal', 0.069), ('aj', 0.069), ('human', 0.068), ('mixture', 0.067), ('styles', 0.065), ('instantaneous', 0.062), ('gaits', 0.061), ('wxi', 0.061), ('signs', 0.061), ('psychophysical', 0.059), ('segments', 0.057), ('perception', 0.053), ('kinematic', 0.049), ('angles', 0.047), ('extraction', 0.047), ('angry', 0.047), ('emotion', 0.047), ('emotions', 0.047), ('exion', 0.047), ('fearful', 0.047), ('fsi', 0.047), ('hlawatsch', 0.047), ('outnumber', 0.047), ('superpositions', 0.047), ('twx', 0.047), ('yeredor', 0.047), ('mf', 0.047), ('muscle', 0.045), ('acoustics', 0.041), ('ik', 0.041), ('timefrequency', 0.041), ('wsj', 0.041), ('happy', 0.041), ('sad', 0.041), ('normal', 0.039), ('matrices', 0.039), ('joint', 0.038), ('recorded', 0.038), ('perceived', 0.037), ('actors', 0.037), ('novel', 0.037), ('motion', 0.036), ('fx', 0.035), ('triangles', 0.035), ('cohen', 0.035), ('deconvolution', 0.035), ('superposition', 0.035), ('transferred', 0.035), ('trajectory', 0.034), ('spectrum', 0.034), ('acoustic', 0.033), ('tx', 0.033), ('unknowns', 0.033), ('sparsi', 0.033), ('diagonalization', 0.033), ('equation', 0.032), ('interference', 0.031), ('ej', 0.031), ('classical', 0.031), ('outperforms', 0.03), ('matrix', 0.029), ('body', 0.029), ('shift', 0.029), ('biologically', 0.029), ('elements', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 46 nips-2006-Blind source separation for over-determined delayed mixtures

Author: Lars Omlor, Martin Giese

Abstract: Blind source separation, i.e. the extraction of unknown sources from a set of given signals, is relevant for many applications. A special case of this problem is dimension reduction, where the goal is to approximate a given set of signals by superpositions of a minimal number of sources. Since in this case the signals outnumber the sources the problem is over-determined. Most popular approaches for addressing this problem are based on purely linear mixing models. However, many applications like the modeling of acoustic signals, EMG signals, or movement trajectories, require temporal shift-invariance of the extracted components. This case has only rarely been treated in the computational literature, and specifically for the case of dimension reduction almost no algorithms have been proposed. We present a new algorithm for the solution of this problem, which is based on a timefrequency transformation (Wigner-Ville distribution) of the generative model. We show that this algorithm outperforms classical source separation algorithms for linear mixtures, and also a related method for mixtures with delays. In addition, applying the new algorithm to trajectories of human gaits, we demonstrate that it is suitable for the extraction of spatio-temporal components that are easier to interpret than components extracted with other classical algorithms. 1

2 0.28411543 27 nips-2006-An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

Author: Michael I. Mandel, Daniel P. Ellis, Tony Jebara

Abstract: We present a method for localizing and separating sound sources in stereo recordings that is robust to reverberation and does not make any assumptions about the source statistics. The method consists of a probabilistic model of binaural multisource recordings and an expectation maximization algorithm for finding the maximum likelihood parameters of that model. These parameters include distributions over delays and assignments of time-frequency regions to sources. We evaluate this method against two comparable algorithms on simulations of simultaneous speech from two or three sources. Our method outperforms the others in anechoic conditions and performs as well as the better of the two in the presence of reverberation. 1

3 0.15156625 12 nips-2006-A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data

Author: Johanna M. Zumer, Hagai T. Attias, Kensuke Sekihara, Srikantan S. Nagarajan

Abstract: We have developed a novel algorithm for integrating source localization and noise suppression based on a probabilistic graphical model of stimulus-evoked MEG/EEG data. Our algorithm localizes multiple dipoles while suppressing noise sources with the computational complexity equivalent to a single dipole scan, and is therefore more efficient than traditional multidipole fitting procedures. In simulation, the algorithm can accurately localize and estimate the time course of several simultaneously-active dipoles, with rotating or fixed orientation, at noise levels typical for averaged MEG data. Furthermore, the algorithm is superior to beamforming techniques, which we show to be an approximation to our graphical model, in estimation of temporally correlated sources. Success of this algorithm for localizing auditory cortex in a tumor patient and for localizing an epileptic spike source are also demonstrated. 1

4 0.13499278 194 nips-2006-Towards a general independent subspace analysis

Author: Fabian J. Theis

Abstract: The increasingly popular independent component analysis (ICA) may only be applied to data following the generative ICA model in order to guarantee algorithm-independent and theoretically valid results. Subspace ICA models generalize the assumption of component independence to independence between groups of components. They are attractive candidates for dimensionality reduction methods, however are currently limited by the assumption of equal group sizes or less general semi-parametric models. By introducing the concept of irreducible independent subspaces or components, we present a generalization to a parameter-free mixture model. Moreover, we relieve the condition of at-most-one-Gaussian by including previous results on non-Gaussian component analysis. After introducing this general model, we discuss joint block diagonalization with unknown block sizes, on which we base a simple extension of JADE to algorithmically perform the subspace analysis. Simulations confirm the feasibility of the algorithm. 1

5 0.12008224 167 nips-2006-Recursive ICA

Author: Honghao Shan, Lingyun Zhang, Garrison W. Cottrell

Abstract: Independent Component Analysis (ICA) is a popular method for extracting independent features from visual data. However, as a fundamentally linear technique, there is always nonlinear residual redundancy that is not captured by ICA. Hence there have been many attempts to try to create a hierarchical version of ICA, but so far none of the approaches have a natural way to apply them more than once. Here we show that there is a relatively simple technique that transforms the absolute values of the outputs of a previous application of ICA into a normal distribution, to which ICA maybe applied again. This results in a recursive ICA algorithm that may be applied any number of times in order to extract higher order structure from previous layers. 1

6 0.11976909 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis

7 0.11233373 179 nips-2006-Sparse Representation for Signal Classification

8 0.09619946 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

9 0.094559774 111 nips-2006-Learning Motion Style Synthesis from Perceptual Observations

10 0.08864592 131 nips-2006-Mixture Regression for Covariate Shift

11 0.087267719 176 nips-2006-Single Channel Speech Separation Using Factorial Dynamics

12 0.081424177 198 nips-2006-Unified Inference for Variational Bayesian Linear Gaussian State-Space Models

13 0.079851247 169 nips-2006-Relational Learning with Gaussian Processes

14 0.072886631 134 nips-2006-Modeling Human Motion Using Binary Latent Variables

15 0.07076855 33 nips-2006-Analysis of Representations for Domain Adaptation

16 0.067799628 2 nips-2006-A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

17 0.067512356 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity

18 0.067143798 35 nips-2006-Approximate inference using planar graph decomposition

19 0.064511992 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis

20 0.062939942 65 nips-2006-Denoising and Dimension Reduction in Feature Space


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.204), (1, -0.026), (2, 0.096), (3, -0.06), (4, -0.099), (5, -0.094), (6, 0.168), (7, 0.003), (8, -0.099), (9, 0.168), (10, 0.003), (11, -0.099), (12, -0.125), (13, -0.134), (14, 0.188), (15, -0.238), (16, 0.079), (17, -0.09), (18, 0.153), (19, -0.118), (20, -0.035), (21, -0.01), (22, -0.152), (23, 0.083), (24, 0.283), (25, 0.109), (26, -0.043), (27, -0.018), (28, -0.105), (29, 0.042), (30, 0.093), (31, -0.045), (32, 0.005), (33, -0.01), (34, 0.108), (35, 0.009), (36, -0.06), (37, -0.01), (38, -0.055), (39, 0.008), (40, 0.147), (41, -0.061), (42, -0.035), (43, 0.018), (44, 0.049), (45, 0.04), (46, -0.082), (47, 0.026), (48, -0.029), (49, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96578759 46 nips-2006-Blind source separation for over-determined delayed mixtures

Author: Lars Omlor, Martin Giese

Abstract: Blind source separation, i.e. the extraction of unknown sources from a set of given signals, is relevant for many applications. A special case of this problem is dimension reduction, where the goal is to approximate a given set of signals by superpositions of a minimal number of sources. Since in this case the signals outnumber the sources the problem is over-determined. Most popular approaches for addressing this problem are based on purely linear mixing models. However, many applications like the modeling of acoustic signals, EMG signals, or movement trajectories, require temporal shift-invariance of the extracted components. This case has only rarely been treated in the computational literature, and specifically for the case of dimension reduction almost no algorithms have been proposed. We present a new algorithm for the solution of this problem, which is based on a timefrequency transformation (Wigner-Ville distribution) of the generative model. We show that this algorithm outperforms classical source separation algorithms for linear mixtures, and also a related method for mixtures with delays. In addition, applying the new algorithm to trajectories of human gaits, we demonstrate that it is suitable for the extraction of spatio-temporal components that are easier to interpret than components extracted with other classical algorithms. 1
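As a toy illustration of the over-determined delayed (anechoic) mixing model discussed in this abstract, the sketch below simulates observations x_i(t) = Σ_j a_ij s_j(t − τ_ij) with more signals than sources. It is not code from the paper; the sources, weights, and delays are arbitrary stand-ins.

```python
# Simulate an over-determined anechoic mixture: delayed, scaled superpositions of few sources.
import numpy as np

rng = np.random.default_rng(0)
T, n_sources, n_signals = 500, 2, 6           # more observed signals than sources -> over-determined
t = np.arange(T)
S = np.vstack([np.sin(2 * np.pi * 0.01 * t),             # illustrative source 1
               np.sign(np.sin(2 * np.pi * 0.023 * t))])   # illustrative source 2

A = rng.uniform(0.5, 1.5, size=(n_signals, n_sources))    # mixing weights a_ij
tau = rng.integers(0, 20, size=(n_signals, n_sources))    # integer delays tau_ij in samples

X = np.zeros((n_signals, T))
for i in range(n_signals):
    for j in range(n_sources):
        X[i] += A[i, j] * np.roll(S[j], tau[i, j])         # circular shift as a simple stand-in for a delay

# X plays the role of the observed signals; a delay-aware separation method
# would attempt to recover S, A and tau from X alone.
```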

2 0.88703114 27 nips-2006-An EM Algorithm for Localizing Multiple Sound Sources in Reverberant Environments

Author: Michael I. Mandel, Daniel P. Ellis, Tony Jebara

Abstract: We present a method for localizing and separating sound sources in stereo recordings that is robust to reverberation and does not make any assumptions about the source statistics. The method consists of a probabilistic model of binaural multisource recordings and an expectation maximization algorithm for finding the maximum likelihood parameters of that model. These parameters include distributions over delays and assignments of time-frequency regions to sources. We evaluate this method against two comparable algorithms on simulations of simultaneous speech from two or three sources. Our method outperforms the others in anechoic conditions and performs as well as the better of the two in the presence of reverberation. 1
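The sketch below is loosely inspired by the kind of model this abstract describes: soft assignment of time–frequency bins to candidate interaural delays, alternating with an update of the distribution over delays. It is not the authors' model; the phase-fit score, the concentration parameter kappa, and all names are assumptions made for illustration only.

```python
# Rough EM-style step: assign time-frequency bins to candidate interaural delays.
# L and R are complex STFTs of the left/right channels, shape (n_freq, n_frames).
import numpy as np

def em_delay_step(L, R, freqs_hz, cand_delays_s, prior):
    """One E/M iteration: responsibilities per TF bin, then an updated delay prior."""
    ipd = np.angle(L * np.conj(R))                        # observed interaural phase difference
    omega = 2 * np.pi * freqs_hz[:, None, None]           # (n_freq, 1, 1)
    predicted = omega * cand_delays_s[None, None, :]      # phase predicted by each candidate delay
    kappa = 5.0                                           # sharpness of the phase-agreement score
    log_lik = kappa * np.cos(ipd[:, :, None] - predicted) # wrapped-phase fit, von Mises flavored
    log_post = log_lik + np.log(prior)[None, None, :]
    log_post -= log_post.max(axis=2, keepdims=True)       # numerical stability
    resp = np.exp(log_post)
    resp /= resp.sum(axis=2, keepdims=True)               # E-step: responsibilities
    new_prior = resp.mean(axis=(0, 1))                    # M-step: updated distribution over delays
    return resp, new_prior
```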

3 0.67197388 194 nips-2006-Towards a general independent subspace analysis

Author: Fabian J. Theis

Abstract: The increasingly popular independent component analysis (ICA) may only be applied to data following the generative ICA model in order to guarantee algorithmindependent and theoretically valid results. Subspace ICA models generalize the assumption of component independence to independence between groups of components. They are attractive candidates for dimensionality reduction methods, however are currently limited by the assumption of equal group sizes or less general semi-parametric models. By introducing the concept of irreducible independent subspaces or components, we present a generalization to a parameter-free mixture model. Moreover, we relieve the condition of at-most-one-Gaussian by including previous results on non-Gaussian component analysis. After introducing this general model, we discuss joint block diagonalization with unknown block sizes, on which we base a simple extension of JADE to algorithmically perform the subspace analysis. Simulations confirm the feasibility of the algorithm. 1 Independent subspace analysis A random vector Y is called an independent component of the random vector X, if there exists an invertible matrix A and a decomposition X = A(Y, Z) such that Y and Z are stochastically independent. The goal of a general independent subspace analysis (ISA) or multidimensional independent component analysis is the decomposition of an arbitrary random vector X into independent components. If X is to be decomposed into one-dimensional components, this coincides with ordinary independent component analysis (ICA). Similarly, if the independent components are required to be of the same dimension k, then this is denoted by multidimensional ICA of fixed group size k or simply k-ISA. So 1-ISA is equivalent to ICA. 1.1 Why extend ICA? An important structural aspect in the search for decompositions is the knowledge of the number of solutions i.e. the indeterminacies of the problem. Without it, the result of any ICA or ISA algorithm cannot be compared with other solutions, so for instance blind source separation (BSS) would be impossible. Clearly, given an ISA solution, invertible transforms in each component (scaling matrices L) as well as permutations of components of the same dimension (permutation matrices P) give again an ISA of X. And indeed, in the special case of ICA, scaling and permutation are already all indeterminacies given that at most one Gaussian is contained in X [6]. This is one of the key theoretical results in ICA, allowing the usage of ICA for solving BSS problems and hence stimulating many applications. It has been shown that also for k-ISA, scalings and permutations as above are the only indeterminacies [11], given some additional rather weak restrictions to the model. However, a serious drawback of k-ISA (and hence of ICA) lies in the fact that the requirement fixed group-size k does not allow us to apply this analysis to an arbitrary random vector. Indeed, crosstalking error 4 3 2 1 0 FastICA JADE Extended Infomax Figure 1: Applying ICA to a random vector X = AS that does not fulfill the ICA model; here S is chosen to consist of a two-dimensional and a one-dimensional irreducible component. Shown are the statistics over 100 runs of the Amari error of the random original and the reconstructed mixing matrix using the three ICA-algorithms FastICA, JADE and Extended Infomax. Clearly, the original mixing matrix could not be reconstructed in any of the experiments. 
However, interestingly, the latter two algorithms do indeed find an ISA up to permutation, which will be explained in section 3. theoretically speaking, it may only be applied to random vectors following the k-ISA blind source separation model, which means that they have to be mixtures of a random vector that consists of independent groups of size k. If this is the case, uniqueness up to permutation and scaling holds as noted above; however if k-ISA is applied to any random vector, a decomposition into groups that are only ‘as independent as possible’ cannot be unique and depends on the contrast and the algorithm. In the literature, ICA is often applied to find representations fulfilling the independence condition as well as possible, however care has to be taken; the strong uniqueness result is not valid any more, and the results may depend on the algorithm as illustrated in figure 1. This work aims at finding an ISA model that allows applicability to any random vector. After reviewing previous approaches, we will provide such a model together with a corresponding uniqueness result and a preliminary algorithm. 1.2 Previous approaches to ISA for dependent component analysis Generalizations of the ICA model that are to include dependencies of multiple one-dimensional components have been studied for quite some time. ISA in the terminology of multidimensional ICA has first been introduced by Cardoso [4] using geometrical motivations. His model as well as the related but independently proposed factorization of multivariate function classes [9] is quite general, however no identifiability results were presented, and applicability to an arbitrary random vector was unclear; later, in the special case of equal group sizes (k-ISA) uniqueness results have been extended from the ICA theory [11]. Algorithmic enhancements in this setting have been recently studied by [10]. Moreover, if the observation contain additional structures such as spatial or temporal structures, these may be used for the multidimensional separation [13]. Hyv¨ rinen and Hoyer presented a special case of k-ISA by combining it with invariant feature suba space analysis [7]. They model the dependence within a k-tuple explicitly and are therefore able to propose more efficient algorithms without having to resort to the problematic multidimensional density estimation. A related relaxation of the ICA assumption is given by topographic ICA [8], where dependencies between all components are assumed and modelled along a topographic structure (e.g. a 2-dimensional grid). Bach and Jordan [2] formulate ISA as a component clustering problem, which necessitates a model for inter-cluster independence and intra-cluster dependence. For the latter, they propose to use a tree-structure as employed by their tree dependepent component analysis. Together with inter-cluster independence, this implies a search for a transformation of the mixtures into a forest i.e. a set of disjoint trees. However, the above models are all semi-parametric and hence not fully blind. In the following, no additional structures are necessary for the separation. 1.3 General ISA Definition 1.1. A random vector S is said to be irreducible if it contains no lower-dimensional independent component. An invertible matrix W is called a (general) independent subspace analysis of X if WX = (S1 , . . . , Sk ) with pairwise independent, irreducible random vectors Si . Note that in this case, the Si are independent components of X. 
The idea behind this definition is that in contrast to ICA and k-ISA, we do not fix the size of the groups Si in advance. Of course, some restriction is necessary, otherwise no decomposition would be enforced at all. This restriction is realized by allowing only irreducible components. The advantage of this formulation now is that it can clearly be applied to any random vector, although of course a trivial decomposition might be the result in the case of an irreducible random vector. Obvious indeterminacies of an ISA of X are, as mentioned above, scalings i.e. invertible transformations within each Si and permutation of Si of the same dimension1. These are already all indeterminacies as shown by the following theorem, which extends previous results in the case of ICA [6] and k-ISA [11], where also the additional slight assumptions on square-integrability i.e. on existing covariance have been made. Theorem 1.2. Given a random vector X with existing covariance and no Gaussian independent component, then an ISA of X exists and is unique except for scaling and permutation. Existence holds trivially but uniqueness is not obvious. Due to the limited space, we only give a short sketch of the proof in the following. The uniqueness result can easily be formulated as a subspace extraction problem, and theorem 1.2 follows readily from Lemma 1.3. Let S = (S1 , . . . , Sk ) be a square-integrable decomposition of S into irreducible independent components Si . If X is an irreducible component of S, then X ∼ Si for some i. Here the equivalence relation ∼ denotes equality except for an invertible transformation. The following two lemmata each give a simplification of lemma 1.3 by ordering the components Si according to their dimensions. Some care has to be taken when showing that lemma 1.5 implies lemma 1.4. Lemma 1.4. Let S and X be defined as in lemma 1.3. In addition assume that dim Si = dim X for i ≤ l and dim Si < dim X for i > l. Then X ∼ Si for some i ≤ l. Lemma 1.5. Let S and X be defined as in lemma 1.4, and let l = 1 and k = 2. Then X ∼ S1 . In order to prove lemma 1.5 (and hence the theorem), it is sufficient to show the following lemma: Lemma 1.6. Let S = (S1 , S2 ) with S1 irreducible and m := dim S1 > dim S2 =: n. If X = AS is again irreducible for some m × (m + n)-matrix A, then (i) the left m × m-submatrix of A is invertible, and (ii) if X is an independent component of S, the right m × n-submatrix of A vanishes. (i) follows after some linear algebra, and is necessary to show the more difficult part (ii). For this, we follow the ideas presented in [12] using factorization of the joint characteristic function of S. 1.4 Dealing with Gaussians In the previous section, Gaussians had to be excluded (or at most one was allowed) in order to avoid additional indeterminacies. Indeed, any orthogonal transformation of two decorrelated hence independent Gaussians is again independent, so clearly such a strong identification result would not be possible. Recently, a general decomposition model dealing with Gaussians was proposed in the form of the socalled non-Gaussian subspace analysis (NGSA) [3]. It tries to detect a whole non-Gaussian subspace within the data, and no assumption of independence within the subspace is made. More precisely, given a random vector X, a factorization X = AS with an invertible matrix A, S = (SN , SG ) and SN a square-integrable m-dimensional random vector is called an m-decomposition of X if SN and SG are stochastically independent and SG is Gaussian. 
In this case, X is said to be mdecomposable. X is denoted to be minimally n-decomposable if X is not (n − 1)-decomposable. According to our previous notation, SN and SG are independent components of X. It has been shown that the subspaces of such decompositions are unique [12]: 1 Note that scaling here implies a basis change in the component Si , so for example in the case of a twodimensional source component, this might be rotation and sheering. In the example later in figure 3, these indeterminacies can easily be seen by comparing true and estimated sources. Theorem 1.7 (Uniqueness of NGSA). The mixing matrix A of a minimal decomposition is unique except for transformations in each of the two subspaces. Moreover, explicit algorithms can be constructed for identifying the subspaces [3]. This result enables us to generalize theorem 1.2and to get a general decomposition theorem, which characterizes solutions of ISA. Theorem 1.8 (Existence and Uniqueness of ISA). Given a random vector X with existing covariance, an ISA of X exists and is unique except for permutation of components of the same dimension and invertible transformations within each independent component and within the Gaussian part. Proof. Existence is obvious. Uniqueness follows after first applying theorem 1.7 to X and then theorem 1.2 to the non-Gaussian part. 2 Joint block diagonalization with unknown block-sizes Joint diagonalization has become an important tool in ICA-based BSS (used for example in JADE) or in BSS relying on second-order temporal decorrelation. The task of (real) joint diagonalization (JD) of a set of symmetric real n×n matrices M := {M1 , . . . , MK } is to find an orthogonal matrix K ˆ ˆ ˆ E such that E⊤ Mk E is diagonal for all k = 1, . . . , K i.e. to minimizef (E) := k=1 E⊤ Mk E − ˆ ⊤ Mk E) 2 with respect to the orthogonal matrix E, where diagM(M) produces a matrix ˆ ˆ diagM(E F where all off-diagonal elements of M have been set to zero, and M 2 := tr(MM⊤ ) denotes the F squared Frobenius norm. The Frobenius norm is invariant under conjugation by an orthogonal maˆ ˆ⊤ ˆ 2 trix, so minimizing f is equivalent to maximizing g(E) := K k=1 diag(E Mk E) , where now diag(M) := (mii )i denotes the diagonal of M. For the actual minimization of f respectively maximization of g, we will use the common approach of Jacobi-like optimization by iterative applications of Givens rotation in two coordinates [5]. 2.1 Generalization to blocks In the following we will use a generalization of JD in order to solve ISA problems. Instead of fully diagonalizing all n × n matrices Mk ∈ M, in joint block diagonalization (JBD) of M we want to determine E such that E⊤ Mk E is block-diagonal. Depending on the application, we fix the blockstructure in advance or try to determine it from M. We are not interested in the order of the blocks, so the block-structure is uniquely specified by fixing a partition of n i.e. a way of writing n as a sum of positive integers, where the order of the addends is not significant. So let2 n = m1 + . . . + mr with m1 ≤ m2 ≤ . . . ≤ mr and set m := (m1 , . . . , mr ) ∈ Nr . An n × n matrix is said to be m-block diagonal if it is of the form   D1 · · · 0  . .  .. .   . . . . 0 · · · Dr with arbitrary mi × mi matrices Di . 
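To make the block-diagonality requirement just described concrete, here is a small NumPy sketch (my illustration, not code from the paper) of the projection onto m-block-diagonal matrices and the off-block-diagonal energy that a joint block diagonalizer drives to zero; the formal cost f^m is defined in the text that follows.

```python
# m-block-diagonal projection and joint block-diagonalization cost
#   f^m(E) = sum_k || E^T M_k E - diagM^m(E^T M_k E) ||_F^2
import numpy as np

def block_diag_projection(M, partition):
    """Zero out all entries of M outside the diagonal blocks given by `partition`."""
    P = np.zeros_like(M)
    start = 0
    for size in partition:                       # e.g. partition = (1, 2, 2, 3) with sum == n
        P[start:start + size, start:start + size] = M[start:start + size, start:start + size]
        start += size
    return P

def jbd_cost(E, matrices, partition):
    """Off-block-diagonal energy of E^T M_k E, summed over all matrices M_k."""
    cost = 0.0
    for M in matrices:
        D = E.T @ M @ E
        off = D - block_diag_projection(D, partition)
        cost += np.sum(off ** 2)                 # squared Frobenius norm
    return cost
```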
As generalization of JD in the case of known the block structure, we can formulate the joint mK ˆ ˆ⊤ ˆ block diagonalization (m-JBD) problem as the minimization of f m (E) := k=1 E Mk E − m ˆ⊤ ˆ 2 with respect to the orthogonal matrix E, where diagMm (M) produces a ˆ diagM (E Mk E) F m-block diagonal matrix by setting all other elements of M to zero. In practice due to estimation errors, such E will not exist, so we speak of approximate JBD and imply minimizing some errormeasure on non-block-diagonality. Indeterminacies of any m-JBD are m-scaling i.e. multiplication by an m-block diagonal matrix from the right, and m-permutation defined by a permutation matrix that only swaps blocks of the same size. Finally, we speak of general JBD if we search for a JBD but no block structure is given; instead it is to be determined from the matrix set. For this it is necessary to require a block 2 We do not use the convention from Ferrers graphs of specifying partitions in decreasing order, as a visualization of increasing block-sizes seems to be preferable in our setting. structure of maximal length, otherwise trivial solutions or ‘in-between’ solutions could exist (and obviously contain high indeterminacies). Formally, E is said to be a (general) JBD of M if (E, m) = argmaxm | ∃E:f m (E)=0 |m|. In practice due to errors, a true JBD would always result in the trivial decomposition m = (n), so we define an approximate general JBD by requiring f m (E) < ǫ for some fixed constant ǫ > 0 instead of f m (E) = 0. 2.2 JBD by JD A few algorithms to actually perform JBD have been proposed, see [1] and references therein. In the following we will simply perform joint diagonalization and then permute the columns of E to achieve block-diagonality — in experiments this turns out to be an efficient solution to JBD [1]. This idea has been formulated in a conjecture [1] essentially claiming that a minimum of the JD cost function f already is a JBD i.e. a minimum of the function f m up to a permutation matrix. Indeed, in the conjecture it is required to use the Jacobi-update algorithm from [5], but this is not necessary, and we can prove the conjecture partially: We want to show that JD implies JBD up to permutation, i.e. if E is a minimum of f , then there exists a permutation P such that f m (EP) = 0 (given existence of a JBD of M). But of course f (EP) = f (E), so we will show why (certain) JBD solutions are minima of f . However, JD might have additional minima. First note that clearly not any JBD minimizes f , only those such that in each block of size mk , f (E) when restricted to the block is maximal over E ∈ O(mk ). We will call such a JBD block-optimal in the following. Theorem 2.1. Any block-optimal JBD of M (zero of f m ) is a local minimum of f . Proof. Let E ∈ O(n) be block-optimal with f m (E) = 0. We have to show that E is a local minimum of f or equivalently a local maximum of the squared diagonal sum g. After substituting each Mk by E⊤ Mk E, we may already assume that Mk is m-block diagonal, so we have to show that E = I is a local maximum of g. Consider the elementary Givens rotation Gij (ǫ) defined for i < j and ǫ√ (−1, 1) as the orthogonal ∈ matrix, where all diagonal elements are 1 except for the two elements 1 − ǫ2 in rows i and j and with all off-diagonal elements equal to 0 except for the two elements ǫ and −ǫ at (i, j) and (j, i), respectively. It can be used to construct local coordinates of the d := n(n − 1)/2-dimensional manifold O(n) at I, simply by ι(ǫ12 , ǫ13 , . . . , ǫn−1,n ) := i < j. 
If i, j are in the same block of m, then h := g ∘ ι is locally maximal, i.e. negative semi-definite at 0, in the direction ǫij because of block-optimality. Now assume i and j are from different blocks. After a possible permutation, we may assume that j = i + 1, so that each matrix Mk ∈ M has (Mk)ij = (Mk)ji = 0; set ak := (Mk)ii and bk := (Mk)jj. Then Gij(ǫ)⊤ Mk Gij(ǫ) can easily be calculated at the coordinates (i, i) to (j, j), and entries on the diagonal other than at indices (i, i) and (j, j) are not changed, so

‖diag(Gij(ǫ)⊤ Mk Gij(ǫ))‖² − ‖diag(Mk)‖² = −2ak(ak − bk)ǫ² + 2bk(ak − bk)ǫ² + 2(ak − bk)²ǫ⁴ = −2(ak − bk)²ǫ² + 2(ak − bk)²ǫ⁴.

Hence h(0, . . . , 0, ǫij, 0, . . . , 0) − h(0) = −cij ǫ² + dij ǫ⁴ with cij = dij = 2 Σ_{k=1}^K (ak − bk)². Now either cij = 0; then also dij = 0 and h is constant zero in the direction ǫij. Or, more interestingly, cij ≠ 0; then cij > 0 and therefore h is negative definite in the direction ǫij. Altogether we get a negative definite h at 0 except for 'trivial directions', and hence a local maximum at 0.

2.3 Recovering the permutation
In order to perform JBD, we therefore only have to find a JD E of M. What is left, according to the above theorem, is to find a permutation matrix P such that EP block-diagonalizes M. In the case of known block-order m, we can employ techniques similar to those used in [1, 10], which essentially find P by some combinatorial optimization.

Figure 2 (panels: (a) the (unknown) block-diagonal M1; (b) Ê⊤E without the recovered permutation; (c) Ê⊤E): Performance of the proposed general JBD algorithm in the case of the (unknown) block partition 40 = 1 + 2 + 2 + 3 + 3 + 5 + 6 + 6 + 6 + 6 in the presence of noise with an SNR of 5 dB. The product Ê⊤E of the inverse of the estimated block diagonalizer and the original one is an m-block-diagonal matrix except for permutation within groups of the same size, as claimed in section 2.2.

In the case of unknown block-size, we propose the following simple permutation-recovery algorithm: consider the mean diagonalized matrix D := K⁻¹ Σ_{k=1}^K E⊤ Mk E. Due to the assumption that M is m-block-diagonalizable (with unknown m), each E⊤ Mk E and hence also D must be m-block-diagonal except for a permutation P, so it must have the corresponding number of zeros in each column and row. In the approximate JBD case, thresholding with a threshold θ is necessary, and its choice is non-trivial. We propose using algorithm 1 to recover the permutation; we denote the resulting permuted matrix by P(D) when applied to the input D. P(D) is constructed from the (possibly thresholded) D by iteratively permuting columns and rows so that all non-zeros of D are clustered along the diagonal as closely as possible. This recovers the permutation as well as the partition m of n.
Algorithm 1: Block-diagonality permutation finder
Input: (n × n)-matrix D
Output: block-diagonal matrix P(D) := D′ such that D′ = PDP⊤ for a permutation matrix P
    D′ ← D
    for i ← 1 to n do
        repeat
            if (j0 ← min{ j | j ≥ i and d′ij = 0 and d′ji = 0 }) exists then
                if (k0 ← min{ k | k > j0 and (d′ik ≠ 0 or d′ki ≠ 0) }) exists then
                    swap column j0 of D′ with column k0
                    swap row j0 of D′ with row k0
        until no swap has occurred

We illustrate the performance of the proposed JBD algorithm as follows: we generate a set of K = 100 m-block-diagonal matrices Dk of dimension 40 × 40 with m = (1, 2, 2, 3, 3, 5, 6, 6, 6, 6). They have been generated in blocks of size m with coefficients chosen uniformly at random from [−1, 1] and symmetrized by Dk ← (Dk + Dk⊤)/2. After that, they have been mixed by a random orthogonal mixing matrix E ∈ O(40), i.e. Mk := E Dk E⊤ + N, where N is a noise matrix with independent Gaussian entries such that the resulting signal-to-noise ratio is 5 dB. Application of the JBD algorithm from above to {M1, . . . , MK} with threshold θ = 0.1 correctly recovers the block sizes, and the estimated block diagonalizer Ê equals E up to m-scaling and permutation, as illustrated in figure 2.

3 SJADE — a simple algorithm for general ISA
As usual, by preprocessing the observations X by whitening we may assume that Cov(X) = I. The indeterminacies allow scaling transformations in the sources, so without loss of generality let also Cov(S) = I. Then I = Cov(X) = A Cov(S) A⊤ = AA⊤, so A is orthogonal. Due to the ISA assumptions, the fourth-order cross cumulants of the sources have to be trivial between different groups, and within the Gaussians. In order to find transformations of the mixtures fulfilling this property, we follow the idea of the JADE algorithm, but now in the ISA setting. We perform JBD of the (whitened) contracted quadricovariance matrices defined by Cij(X) := E[(X⊤ Eij X) X X⊤] − Eij − Eij⊤ − tr(Eij) I. Here RX := Cov(X), and the Eij, 1 ≤ i, j ≤ n, are a set of eigen-matrices of Cij. One simple choice is to use the n² matrices Eij with zeros everywhere except a 1 at index (i, j). More elaborate choices of eigen-matrices (with only n(n + 1)/2 or even n entries) are possible. The resulting algorithm, subspace-JADE (SJADE), not only performs NGCA by grouping Gaussians as one-dimensional components with trivial Cii's, but also automatically finds the subspace partition m using the general JBD algorithm from section 2.3.

Figure 3 (panels (a)–(d): sources S2–S5; (e): Â⁻¹A; (f)–(j): recovered sources (Ŝ1, Ŝ2), histogram of Ŝ3, Ŝ4, Ŝ5, Ŝ6): Example application of general ISA for unknown sizes m = (1, 2, 2, 2, 3). Shown are the scatter plots, i.e. densities, of the source components and the mixing-separating map Â⁻¹A.

4 Experimental results
In a first example, we consider a general ISA problem in dimension n = 10 with the unknown partition m = (1, 2, 2, 2, 3).
In order to generate 2- and 3-dimensional irreducible random vectors, we decided to follow the nice visual ideas from [10] and to draw samples from a density following a known shape — in our case 2d-letters or 3d-geometrical shapes. The chosen sources densities are shown in figure 3(a-d). Another 1-dimensional source following a uniform distribution was constructed. Altogether 104 samples were used. The sources S were mixed by a mixing matrix A with coefficients uniformly randomly sampled from [−1, 1] to give mixtures X = AS. The recovered ˆ mixing matrix A was then estimated using the above block-JADE algorithm with unknown block size; we observed that the method is quite sensitive to the choice of the threshold (here θ = 0.015). ˆ Figure 3(e) shows the composed mixing-separating system A−1 A; clearly the matrices are equal except for block permutation and scaling, which experimentally confirms theorem 1.8. The algoˆ rithm found a partition m = (1, 1, 1, 2, 2, 3), so one 2d-source was misinterpreted as two 1d-sources, but by using previous knowledge combination of the correct two 1d-sources yields the original 2dˆ ˆ source. The resulting recovered sources S := A−1 X, figures 3(f-j), then equal the original sources except for permutation and scaling within the sources — which in the higher-dimensional cases implies transformations such as rotation of the underlying images or shapes. When applying ICA (1-ISA) to the above mixtures, we cannot expect to recover the original sources as explained in figure 1; however, some algorithms might recover the sources up to permutation. Indeed, SJADE equals JADE with additional permutation recovery because the joint block diagonalization is per- formed using joint diagonalization. This explains why JADE retrieves meaningful components even in this non-ICA setting as observed in [4]. In a second example, we illustrate how the algorithm deals with Gaussian sources i.e. how the subspace JADE also includes NGCA. For this we consider the case n = 5, m = (1, 1, 1, 2) and sources with two Gaussians, one uniform and a 2-dimensional irreducible component as before; 105 samples were drawn. We perform 100 Monte-Carlo simulations with random mixing matrix ˆ A, and apply SJADE with θ = 0.01. The recovered mixing matrix A is compared with A by 3 2 2 2 ˆ −1 A. Indeed, we get nearly taking the ad-hoc measure ι(P) := i=1 j=1 (pij + pji ) for P := A perfect recovery in 99 out of 100 runs, the median of ι(P) is very low with 0.0083. A single run diverges with ι(P ) = 3.48. In order to show that the algorithm really separates the Gaussian part from the other components, we compare the recovered source kurtoses. The median kurtoses are −0.0006 ± 0.02, −0.003 ± 0.3, −1.2 ± 0.3, −1.2 ± 0.2 and −1.6 ± 0.2. The first two components have kurtoses close to zero, so they are the two Gaussians, whereas the third component has kurtosis of around −1.2, which equals the kurtosis of a uniform density. This confirms the applicability of the algorithm in the general, noisy ISA setting. 5 Conclusion Previous approaches for independent subspace analysis were restricted either to fixed group sizes or semi-parametric models. In neither case, general applicability to any kind of mixture data set was guaranteed, so blind source separation might fail. 
In the present contribution we introduce the concept of irreducible independent components and give an identifiability result for this general, parameter-free model, together with a novel arbitrary-subspace-size algorithm based on joint block diagonalization. As in ICA, the main uniqueness theorem is an asymptotic result (but it includes the noisy case via NGCA). In practice, however, in the finite-sample case the general joint block diagonality holds only approximately, due to estimation errors. Our simple solution in this contribution was to choose appropriate thresholds. But this choice is non-trivial, and adaptive methods remain to be developed in future work.

References
[1] K. Abed-Meraim and A. Belouchrani. Algorithms for joint block diagonalization. In Proc. EUSIPCO 2004, pages 209–212, Vienna, Austria, 2004.
[2] F.R. Bach and M.I. Jordan. Finding clusters in independent component analysis. In Proc. ICA 2003, pages 891–896, 2003.
[3] G. Blanchard, M. Kawanabe, M. Sugiyama, V. Spokoiny, and K.-R. Müller. In search of non-Gaussian components of a high-dimensional distribution. JMLR, 7:247–282, 2006.
[4] J.F. Cardoso. Multidimensional independent component analysis. In Proc. of ICASSP '98, Seattle, 1998.
[5] J.F. Cardoso and A. Souloumiac. Jacobi angles for simultaneous diagonalization. SIAM J. Mat. Anal. Appl., 17(1):161–164, January 1995.
[6] P. Comon. Independent component analysis – a new concept? Signal Processing, 36:287–314, 1994.
[7] A. Hyvärinen and P.O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705–1720, 2000.
[8] A. Hyvärinen, P.O. Hoyer, and M. Inki. Topographic independent component analysis. Neural Computation, 13(7):1525–1558, 2001.
[9] J.K. Lin. Factorizing multivariate function classes. In Advances in Neural Information Processing Systems, volume 10, pages 563–569, 1998.
[10] B. Poczos and A. Lörincz. Independent subspace analysis using k-nearest neighborhood distances. In Proc. ICANN 2005, volume 3696 of LNCS, pages 163–168, Warsaw, Poland, 2005. Springer.
[11] F.J. Theis. Uniqueness of complex and multidimensional independent component analysis. Signal Processing, 84(5):951–956, 2004.
[12] F.J. Theis and M. Kawanabe. Uniqueness of non-Gaussian subspace analysis. In Proc. ICA 2006, pages 917–925, Charleston, USA, 2006.
[13] R. Vollgraf and K. Obermayer. Multi-dimensional ICA to separate correlated sources. In Proc. NIPS 2001, pages 993–1000, 2001.
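The permutation and partition recovery step described in section 2.3 of the entry above thresholds the mean diagonalized matrix and permutes rows and columns (Algorithm 1). The sketch below is a simplified alternative with the same intent, not a transcription of that algorithm: it treats the thresholded magnitude pattern of D as a graph and reads the blocks off its connected components. The threshold value and helper names are illustrative.

```python
# Simplified block-structure recovery from the mean diagonalized matrix D.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def recover_blocks(D, theta=0.1):
    """Return a permutation and the block sizes implied by thresholding |D| at theta."""
    adjacency = (np.abs(D) > theta).astype(int)
    np.fill_diagonal(adjacency, 1)                         # every index belongs to its own block
    n_comp, labels = connected_components(csr_matrix(adjacency), directed=False)
    perm = np.argsort(labels, kind="stable")               # group indices of the same component together
    sizes = sorted(np.bincount(labels).tolist())
    return perm, sizes

# Usage, given a joint diagonalizer E and matrices M_k:
#   D = np.mean([E.T @ M @ E for M in Ms], axis=0)
#   perm, sizes = recover_blocks(D, theta=0.1)
#   D_blocked = D[np.ix_(perm, perm)]    # approximately block diagonal if the model holds
```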

4 0.54974437 12 nips-2006-A Probabilistic Algorithm Integrating Source Localization and Noise Suppression of MEG and EEG data

Author: Johanna M. Zumer, Hagai T. Attias, Kensuke Sekihara, Srikantan S. Nagarajan

Abstract: We have developed a novel algorithm for integrating source localization and noise suppression based on a probabilistic graphical model of stimulus-evoked MEG/EEG data. Our algorithm localizes multiple dipoles while suppressing noise sources with the computational complexity equivalent to a single dipole scan, and is therefore more efficient than traditional multidipole fitting procedures. In simulation, the algorithm can accurately localize and estimate the time course of several simultaneously-active dipoles, with rotating or fixed orientation, at noise levels typical for averaged MEG data. Furthermore, the algorithm is superior to beamforming techniques, which we show to be an approximation to our graphical model, in estimation of temporally correlated sources. Success of this algorithm for localizing auditory cortex in a tumor patient and for localizing an epileptic spike source are also demonstrated. 1

5 0.45220384 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf

Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1

6 0.40872803 179 nips-2006-Sparse Representation for Signal Classification

7 0.40816829 176 nips-2006-Single Channel Speech Separation Using Factorial Dynamics

8 0.40386289 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis

9 0.3759695 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis

10 0.34095511 174 nips-2006-Similarity by Composition

11 0.31959921 131 nips-2006-Mixture Regression for Covariate Shift

12 0.29211974 49 nips-2006-Causal inference in sensorimotor integration

13 0.29199639 167 nips-2006-Recursive ICA

14 0.28949875 169 nips-2006-Relational Learning with Gaussian Processes

15 0.28394425 35 nips-2006-Approximate inference using planar graph decomposition

16 0.25577879 43 nips-2006-Bayesian Model Scoring in Markov Random Fields

17 0.25557226 153 nips-2006-Online Clustering of Moving Hyperplanes

18 0.25504208 149 nips-2006-Nonnegative Sparse PCA

19 0.25277972 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing

20 0.25252986 57 nips-2006-Conditional mean field


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.646), (3, 0.017), (7, 0.039), (8, 0.013), (9, 0.031), (22, 0.035), (44, 0.034), (57, 0.042), (65, 0.021), (69, 0.029), (90, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98997939 186 nips-2006-Support Vector Machines on a Budget

Author: Ofer Dekel, Yoram Singer

Abstract: The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM formulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results. 1

same-paper 2 0.98420244 46 nips-2006-Blind source separation for over-determined delayed mixtures

Author: Lars Omlor, Martin Giese

Abstract: Blind source separation, i.e. the extraction of unknown sources from a set of given signals, is relevant for many applications. A special case of this problem is dimension reduction, where the goal is to approximate a given set of signals by superpositions of a minimal number of sources. Since in this case the signals outnumber the sources the problem is over-determined. Most popular approaches for addressing this problem are based on purely linear mixing models. However, many applications like the modeling of acoustic signals, EMG signals, or movement trajectories, require temporal shift-invariance of the extracted components. This case has only rarely been treated in the computational literature, and specifically for the case of dimension reduction almost no algorithms have been proposed. We present a new algorithm for the solution of this problem, which is based on a timefrequency transformation (Wigner-Ville distribution) of the generative model. We show that this algorithm outperforms classical source separation algorithms for linear mixtures, and also a related method for mixtures with delays. In addition, applying the new algorithm to trajectories of human gaits, we demonstrate that it is suitable for the extraction of spatio-temporal components that are easier to interpret than components extracted with other classical algorithms. 1

3 0.97287577 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

Author: Matthias Seeger

Abstract: We propose a highly efficient framework for kernel multi-class models with a large and structured set of classes. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical class structure, achieving state-of-the-art results in an order of magnitude less time than previous work. 1

4 0.96521765 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype

Author: Tobias Sing, Niko Beerenwinkel

Abstract: Starting with the work of Jaakkola and Haussler, a variety of approaches have been proposed for coupling domain-specific generative models with statistical learning methods. The link is established by a kernel function which provides a similarity measure based inherently on the underlying model. In computational biology, the full promise of this framework has rarely ever been exploited, as most kernels are derived from very generic models, such as sequence profiles or hidden Markov models. Here, we introduce the MTreeMix kernel, which is based on a generative model tailored to the underlying biological mechanism. Specifically, the kernel quantifies the similarity of evolutionary escape from antiviral drug pressure between two viral sequence samples. We compare this novel kernel to a standard, evolution-agnostic amino acid encoding in the prediction of HIV drug resistance from genotype, using support vector regression. The results show significant improvements in predictive performance across 17 anti-HIV drugs. Thus, in our study, the generative-discriminative paradigm is key to bridging the gap between population genetic modeling and clinical decision making. 1

5 0.94795692 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection

Author: Shai Avidan, Moshe Butman

Abstract: Bob offers a face-detection web service where clients can submit their images for analysis. Alice would very much like to use the service, but is reluctant to reveal the content of her images to Bob. Bob, for his part, is reluctant to release his face detector, as he spent a lot of time, energy and money constructing it. Secure MultiParty computations use cryptographic tools to solve this problem without leaking any information. Unfortunately, these methods are slow to compute and we introduce a couple of machine learning techniques that allow the parties to solve the problem while leaking a controlled amount of information. The first method is an information-bottleneck variant of AdaBoost that lets Bob find a subset of features that are enough for classifying an image patch, but not enough to actually reconstruct it. The second machine learning technique is active learning that allows Alice to construct an online classifier, based on a small number of calls to Bob’s face detector. She can then use her online classifier as a fast rejector before using a cryptographically secure classifier on the remaining image patches. 1

6 0.81426913 179 nips-2006-Sparse Representation for Signal Classification

7 0.80745727 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections

8 0.80482823 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space

9 0.80398631 108 nips-2006-Large Scale Hidden Semi-Markov SVMs

10 0.77046102 138 nips-2006-Multi-Task Feature Learning

11 0.75869095 82 nips-2006-Gaussian and Wishart Hyperkernels

12 0.75469023 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets

13 0.75449187 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation

14 0.74808621 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis

15 0.74348485 111 nips-2006-Learning Motion Style Synthesis from Perceptual Observations

16 0.7368623 75 nips-2006-Efficient sparse coding algorithms

17 0.7364226 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation

18 0.73099095 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons

19 0.72762179 65 nips-2006-Denoising and Dimension Reduction in Feature Space

20 0.72661483 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors