jmlr jmlr2011 jmlr2011-10 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
Reference: text
sentIndex sentText sentNum sentScore
1 Many popular applications of blind source separation are based on linear instantaneous mixture models. [sent-8, score-0.518]
2 When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. [sent-10, score-0.583]
3 We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. [sent-11, score-0.417]
4 The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. [sent-13, score-1.031]
5 Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum. [sent-17, score-0.776]
6 The most elementary class of such methods is based on linear mixture models that combine source signals or mixture components as a weighted linear sum. [sent-21, score-0.317]
7 Such linear blind source separation methods have a wide spectrum of applications. [sent-22, score-0.449]
8 Popular approaches typically exploit two classes of generative models: instantaneous and convolutive mixtures (Choi et al. [sent-27, score-0.253]
9 The time-dependent signals xi (t) are approximated by the linear superposition of a number of hidden source signals s j (t). [sent-37, score-0.332]
10 Contrasting with this model, convolutive mixtures assume that the source signals are filtered with the filter kernels $\alpha_{ij}(t)$ prior to the superposition, resulting in the mixing model: $x_i(t) = \sum_{j=1}^{n} \int_{-\infty}^{\infty} \alpha_{ij}(\tau)\, s_j(t - \tau)\, d\tau, \quad i = 1, \dots, m. \quad (1)$ [sent-39, score-0.408]
11 In between these two model classes are anechoic mixture models, which linearly combine time-shifted and scaled versions of the sources, without permitting multiple occurrences of the same source in the mixture: $x_i(t) = \sum_{j=1}^{n} \alpha_{ij}\, s_j(t - \tau_{ij}), \quad i = 1, \dots, m. \quad (2)$ [sent-41, score-0.564]
12 Compared to full convolutive models, anechoic models substantially constrain the space of admissible filter functions. [sent-43, score-0.417]
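To make the three model classes concrete, here is a small sketch (our own, not from the paper) that generates an instantaneous, an anechoic, and a convolutive mixture of the same sources; the sizes, seeds and filter lengths are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
T, n, m = 512, 2, 2
S = rng.standard_normal((n, T))            # sources s_j(t)
alpha = rng.random((m, n))                 # weights alpha_ij
tau = rng.integers(0, 20, size=(m, n))     # integer delays tau_ij

x_inst = alpha @ S                         # instantaneous: sum_j alpha_ij s_j(t)
x_anech = np.stack([sum(alpha[i, j] * np.roll(S[j], tau[i, j]) for j in range(n))
                    for i in range(m)])    # anechoic (2): sum_j alpha_ij s_j(t - tau_ij)
h = rng.random((m, n, 5))                  # short filter kernels alpha_ij(t)
x_conv = np.stack([sum(np.convolve(h[i, j], S[j])[:T] for j in range(n))
                   for i in range(m)])     # convolutive (1): sum_j (alpha_ij * s_j)(t)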
13 Apart from the problem of robust parameter estimation with limited amounts of data, all blind source separation methods suffer from intrinsic ambiguities. [sent-46, score-0.384]
14 In contrast, for anechoic mixtures (see Equation (2)) the filter ambiguity is limited to an unknown scaling and an arbitrary additive shift that can be applied to all time delays $\tau_{ij}$, while the form of the source functions is uniquely defined, except for absolute position. [sent-51, score-0.711]
15 This implies that anechoic mixtures can be advantageous for the modeling of data that are consistent with the corresponding generative model. [sent-52, score-0.496]
16 , 2008), where the same control signal might influence several muscles or joints with different delays, or in functional magnetic resonance imaging, where time shifts occur naturally due to hemodynamic delays (Mørup et al. [sent-61, score-0.25]
17 The major part of previous work on anechoic mixtures has considered under-determined (overcomplete) source separation problems (m ≤ n), where the sources outnumber the dimensions of the available data. [sent-63, score-0.846]
18 In this paper we present a new algorithmic framework for the solution of arbitrary anechoic mixture problems, which is independent of the number of sources and the dimensionality of the data. [sent-71, score-0.611]
19 The obtained solution in time-frequency space is then transformed back into signal space, where, in a second step, the full solution of the original problem is determined by solving a phase retrieval problem. [sent-78, score-0.309]
20 For the solution of the non-negative anechoic mixture problem, a modification of non-negative matrix factorization (NMF) (Lee and Seung, 1999) is presented in Section 3. [sent-87, score-0.522]
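For orientation, a minimal sketch of the classic Lee and Seung (1999) multiplicative updates for plain NMF (X ~= W H with W, H >= 0), which the paper modifies for the anechoic case; the function below is our own simplified illustration, not the modified algorithm itself.

import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9):
    """Frobenius-norm NMF via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H, keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W, keeps W >= 0
    return W, H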
21 The work in Torkkola (1996a,b) extended the information maximization approach of Bell and Sejnowski (1995), using the adaptive delay architecture described in Platt and Faggin (1992) in order to unmix anechoic 2 × 2 mixtures. [sent-99, score-0.537]
22 This implies that preprocessing of the signals before source separation, for example by applying bilinear time-frequency transformations, can be essential. [sent-113, score-0.261]
23 Time-frequency representations have been used quite frequently in the context of blind source separation (Karako-Eilon et al. [sent-114, score-0.384]
24 , 1998), and specifically for anechoic demixing (Yilmaz and Rickard, 2004). [sent-117, score-0.517]
25 Contrasting with this work, our approach relies on further properties of the stochastic Wigner-Ville spectrum (WVS) which, to our knowledge, have never previously been exploited for source separation. [sent-119, score-0.291]
26 It was later reintroduced in signal analysis by Ville (1948), with the basic idea of defining a joint distribution of the signal energy simultaneously in time and frequency (in physics corresponding to coordinates and momentum). [sent-137, score-0.278]
27 The only requirement for a zero-mean random signal to be harmonizable is the existence of a Fourier representation $\Phi$ of its autocovariance function $r_x(t,t')$, defined by: $r_x(t,t') := E\{x(t)\, x^*(t')\} = \iint e^{2\pi i(\lambda t - \mu t')}\, \Phi(\lambda,\mu)\, d\lambda\, d\mu.$ [sent-145, score-0.265]
28 – Time-frequency shift covariant: the Wigner-Ville spectrum of a time-frequency shifted signal $\tilde{x}(t) = x(t - t_0)\, e^{2\pi i f_0 (t - t_0)}$ is the shifted WVS of the original signal: $W_{\tilde{x}}(t, f) = W_x(t - t_0, f - f_0).$ [sent-167, score-0.375]
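A minimal numerical check (ours) of this shift covariance, using a discrete pseudo Wigner distribution with circular indexing; the convention and frequency scaling below are illustrative choices, and a practical WVS estimator would additionally involve smoothing.

import numpy as np

def wigner_ville(x):
    """W[t, k] = sum_tau x[t+tau] conj(x[t-tau]) e^{-2 pi i k tau / N} (circular)."""
    N = len(x)
    tau = np.arange(-(N // 2), N // 2)
    W = np.zeros((N, N))
    for t in range(N):
        prod = x[(t + tau) % N] * np.conj(x[(t - tau) % N])
        W[t] = np.fft.fft(np.fft.ifftshift(prod)).real  # Wigner distributions are real
    return W

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
t0 = 5
W = wigner_ville(x)
W_shift = wigner_ville(np.roll(x, t0))               # circularly time-shifted signal
print(np.allclose(W_shift, np.roll(W, t0, axis=0)))  # shift covariance -> True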
29 – Shift Theorem: of particular interest for the anechoic mixing problem is the behavior of the linear canonical transform under shifts of the input signal: $\mathcal{L}(M)[x(\cdot - \tau)](f) = \exp\left(\pi i\, (2f - A\tau)^{T} C \tau\right)\, \mathcal{L}(M)[x](f - A\tau). \quad (14)$ [sent-230, score-0.634]
Since the output signal is translated by the term $A\tau$, a good choice of the matrix A can optimize the separability of the signal x from its shifted counterparts in particular LCT domains (Sharma and Joshi, 2006). [sent-231, score-0.313]
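The ordinary Fourier transform is the LCT special case with A = 0: a time shift then appears only as a phase factor, so the magnitude spectrum is shift-invariant. A short numerical check (ours):

import numpy as np

N, tau = 128, 7
rng = np.random.default_rng(1)
x = rng.standard_normal(N)
X = np.fft.fft(x)
X_shift = np.fft.fft(np.roll(x, tau))   # circular shift by tau samples
f = np.fft.fftfreq(N)
print(np.allclose(X_shift, X * np.exp(-2j * np.pi * f * tau)))  # pure phase ramp -> True
print(np.allclose(np.abs(X_shift), np.abs(X)))                  # |F x| unchanged -> True
# For a generic LCT with A != 0, the shift theorem (14) additionally translates
# the transform magnitude by A*tau, which is what the algorithm exploits.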
31 Algorithmic Framework: Anechoic Demixing Using Wigner Marginals. In the following, we first derive the basic algorithmic framework by applying the mathematical properties of the WVS, as discussed in the last section, to the anechoic mixing model. [sent-235, score-0.459]
32 The resulting algorithm consists of two major steps: the solution of positive anechoic demixing problems and phase retrieval. [sent-236, score-0.647]
33 Assume the signals $x_i(t)$, $i = 1, \dots, m$, are delayed superpositions of uncorrelated source signals $s_j(t)$, that is: $x_i(t) = \sum_{j=1}^{n} \alpha_{ij}\, s_j(t - \tau_{ij}), \quad i = 1, \dots, m, \quad (15)$ with $r_{s_i,s_j}(t,t') = E\{s_i(t)\, s_j^*(t')\} = 0$ for $i \neq j$. [sent-253, score-0.284]
34 The shift covariance (5) of the WVS and the superposition law (8) imply the following relation between the spectra of the sources and the observations: $W_{x_i}(t, f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, W_{s_j}(t - \tau_{ij}, f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, W_{T_{ij} s_j}(t, f). \quad (18)$ [sent-254, score-0.261]
35 This equation could directly serve as a basis for the separation of the source signals $s_j$. [sent-264, score-0.354]
36 The individual marginals have the form of an anechoic mixture problem with an additional non-negativity constraint: $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2 + n(t). \quad (20)$ [sent-269, score-0.558]
37 However, the (approximate) solution of Equation (20) gives an estimate of the power spectra of the sources in the LCT domain, $|\mathcal{L}(M)[s_j]|^2$, of the scaled delays $A\tau_{ij}$, and of the absolute values of the weights $|\alpha_{ij}|$. [sent-274, score-0.268]
38 Since the projections onto LCT domains (19) are also anechoic mixtures, at first glance the positive system of Equations (20) seems not to provide any advantage over the direct solution of Equations (15) or (18). [sent-281, score-0.417]
39 An appropriate choice of A thus results in improved separation of the signals (Sharma and Joshi, 2006). [sent-304, score-0.249]
40 Since the multivariate LCT can be written as the tensor product of several one-dimensional LCTs, A can be adjusted for each dimension individually, making it possible to transform a multivariate anechoic mixture into several coupled one-dimensional mixtures (Omlor and Giese, 2007a). [sent-306, score-0.64]
41 As opposed to standard non-negative matrix factorization (NMF), convolutive NMF assumes a non-negative convolutive model (see Equation 1). [sent-328, score-0.255]
42 For many applications, such as spectral analysis or blind image deblurring, the data can be described more exactly as a (discrete) positive convolution of the form: $x_i(t) = \sum_{j=1}^{n} (a_{ij} * s_j)[t] = \sum_{j=1}^{n} \sum_{u} a_{ij}[u]\, s_j[t - u], \quad (22)$ with $a_{ij} \geq 0$, $s_j \geq 0$ for all $i \in \{1, \dots, m\}$. [sent-332, score-0.355]
43 If both the filters $a_{ij}$ and the sources $s_j$ are unknown, Equation (22) implies the estimation problem: $\min_{a_{ij},\, s_j} \Big\| x_i(t) - \sum_{j=1}^{n} (a_{ij} * s_j)(t) \Big\|$ subject to $s_j \geq 0$, $a_{ij} \geq 0$ $\forall i, j$. [sent-336, score-0.331]
44 In the simple case of one-dimensional (vectorial) data, the convolution $a_{ij} * s_j$ equals the product of a circulant matrix with the source vector: $a_{ij} * s_j = \begin{pmatrix} a_{ij}(0) & a_{ij}(N) & a_{ij}(N-1) & \cdots \\ a_{ij}(1) & a_{ij}(0) & a_{ij}(N) & \cdots \\ \vdots & & \ddots & \end{pmatrix} \begin{pmatrix} s_j(0) \\ s_j(1) \\ \vdots \end{pmatrix}.$ [sent-338, score-0.513]
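A quick check (ours) that circular convolution is indeed multiplication by the circulant matrix built from the filter:

import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(2)
a = rng.random(8)   # non-negative filter a_ij
s = rng.random(8)   # non-negative source s_j
conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(s)).real   # circular convolution
print(np.allclose(circulant(a) @ s, conv))               # -> True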
45 The anechoic problem (28) can thus be solved using the convolutive NMF algorithm for the solution of problem (22), if sparseness of the filters is enforced. [sent-441, score-0.527]
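A minimal convolutive-NMF sketch in this spirit, with multiplicative updates and a crude l1 penalty that pushes the filters toward sparsity. These are generic Smaragdis-style updates written by us for illustration (using circular shifts); the paper's exact updates (29) and (30) may differ in detail.

import numpy as np

def conv_nmf(X, n_sources, n_lags, n_iter=200, sparsity=0.1, eps=1e-9):
    """Fit X[i, t] ~= sum_j sum_u A[u, i, j] S[j, t - u] with A, S >= 0."""
    m, T = X.shape
    rng = np.random.default_rng(0)
    A = rng.random((n_lags, m, n_sources))
    S = rng.random((n_sources, T))
    for _ in range(n_iter):
        Xhat = sum(A[u] @ np.roll(S, u, axis=1) for u in range(n_lags))
        for u in range(n_lags):                      # multiplicative filter updates
            Su = np.roll(S, u, axis=1)
            A[u] *= (X @ Su.T) / (Xhat @ Su.T + sparsity + eps)
        Xhat = sum(A[u] @ np.roll(S, u, axis=1) for u in range(n_lags))
        num = sum(A[u].T @ np.roll(X, -u, axis=1) for u in range(n_lags))
        den = sum(A[u].T @ np.roll(Xhat, -u, axis=1) for u in range(n_lags))
        S *= num / (den + eps)                       # multiplicative source update
    return A, S

# With sparse filters, the delay estimates are the filter peaks:
# tau_ij ~ argmax_u A[u, i, j].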
46 One disadvantage of delay estimation by deconvolution is the difficulty of locating the exact peak of the sparse filter, especially for fractional (non-integer) delays. [sent-449, score-0.29]
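A common remedy for fractional delays (our illustration, not necessarily the paper's choice) is parabolic interpolation around the cross-correlation peak, which yields a sub-sample delay estimate:

import numpy as np

def fractional_delay(x, s):
    """Estimate the delay of s inside x with sub-sample resolution."""
    # circular cross-correlation via the FFT
    r = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(s))).real
    k = int(np.argmax(r))
    y0, y1, y2 = r[k - 1], r[k], r[(k + 1) % len(r)]
    offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)   # vertex of the fitted parabola
    return k + offset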
47 The positive anechoic mixture problem that is given by Equation (20), $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2 + n(t),$ [sent-451, score-0.487]
48 can be solved with this anechoic non-negative matrix factorization approach (ANMF). [sent-452, score-0.452]
49 The fact that the cross-spectra are not present in Equation (31) permits a reformulation as a constraint for the anechoic NMF problem (19) of the form: $SS^{T} = I. \quad (32)$ [sent-456, score-0.417]
50 This orthogonality constraint (32) guarantees the uniqueness of the solution of the anechoic demixing problem, making the new algorithm a true blind source separation method. [sent-462, score-0.901]
51 3 Phase Retrieval: The solution of the positive anechoic mixture problem given by Equation (20), $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2,$ yields the LCT power spectra $|\mathcal{L}(M)[s_j]|^2$. [sent-466, score-0.531]
52 In order to reconstruct the sources s j , or equivalently their LCT transforms L (M)[s j ], the missing phase information needs to be recovered. [sent-467, score-0.286]
53 In this case, for A = 0, the anechoic mixture (2) reduces to a simple deconvolution problem with known filters $\nu_{ij} = \alpha_{ij}\, \delta(t - \tau_{ij})$. [sent-473, score-0.643]
54 2 Gerchberg-Saxton (GS) Algorithm: In typical phase retrieval applications, the parameters known about the signal are the amplitude and some other properties like the support. [sent-478, score-0.439]
55 For the phase retrieval problem in our algorithm, an estimate of the signal is transformed back and forth between different LCT domains, always replacing the estimated amplitude spectrum by the known amplitude spectrum from the solution of the positive mixture problem and re-estimating the phase. [sent-483, score-0.509]
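A compact Gerchberg-Saxton loop for the ordinary Fourier pair, sketching the idea; mag_time and mag_freq are the known magnitudes (e.g., from the solution of the positive mixture problem), and both names are ours.

import numpy as np

def gerchberg_saxton(mag_time, mag_freq, n_iter=500):
    """Find a signal consistent with known |x(t)| and |(F x)(f)|."""
    rng = np.random.default_rng(0)
    x = mag_time * np.exp(2j * np.pi * rng.random(len(mag_time)))  # random initial phase
    for _ in range(n_iter):
        X = np.fft.fft(x)
        X = mag_freq * np.exp(1j * np.angle(X))   # impose the known spectral magnitude
        x = np.fft.ifft(X)
        x = mag_time * np.exp(1j * np.angle(x))   # impose the known temporal magnitude
    return x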
56 (for $k = 1, \dots, K$) Step 1: Solve the positive anechoic demixing problems given by $|\mathcal{L}(M_k)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M_k)[s_j](t - A\tau_{ij})|^2,$ for example using the ANMF algorithm (Algorithm 1) or the Bayesian Probabilistic Latent Component Analysis (PLCA) algorithm (Smaragdis et al. [sent-528, score-0.517]
57 The choice $M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ with A = 0 simplifies the first step in Algorithm 2, reducing the positive anechoic mixture to an instantaneous mixture problem: $|\mathcal{F} x_i|^2(f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{F} s_j|^2(f).$ [sent-536, score-0.557]
58 Hence phase retrieval can be accomplished by a simple deconvolution procedure, as described in Section 3. [sent-553, score-0.284]
59 Thus, this transformation simply converts the original anechoic mixture problem into a non-negative mixture problem that can be solved using the update rules given by (29) and (30). [sent-559, score-0.59]
60 1 Sound Mixtures: Since the formulation of the "cocktail party problem" by Cherry (1953), acoustics has been a major field for the application of blind source separation methods. [sent-573, score-0.423]
61 However, anechoic mixtures are relevant for the case of reverberation-free environments (Bofill, 2003; Yilmaz and Rickard, 2004). [sent-579, score-0.496]
62 To demonstrate that Algorithm 2 is applicable to sound mixtures, we present three different separation problems for synthetically generated delayed mixtures (model (2)), using speech and sound segments from the ICA benchmark described in Cichocki and Amari (2002). [sent-580, score-0.466]
63 Three types of mixtures were generated: (I) Mixtures of 2 source segments with random mixing and delay matrices (2 × 2), each resulting in two simulated signals x1,2 . [sent-583, score-0.418]
64 Data set (II), with fixed mixing and delay matrices, was included since completely random generation sometimes produced degenerate anechoic mixtures (instantaneous mixtures or ill-conditioned mixing matrices). [sent-585, score-0.779]
65 Figure 2: Comparison of different blind source separation algorithms for synthetic mixtures of sound signals with delays (data sets I-III, see text). [sent-605, score-0.697]
66 (III) A third data set was generated by mixing two randomly selected source segments with random mixture matrices and random delay matrices. [sent-606, score-0.309]
67 As expected, the performance Pe of the instantaneous mixture model assumed by principal component analysis (PCA) is worse than that achieved by the three compared anechoic methods. [sent-617, score-0.551]
68 We show that anechoic mixtures have the potential for strong dimensionality reduction. [sent-633, score-0.496]
69 (a) Approximation quality as a function of the number of extracted sources for periodic gait data. (b) Approximation quality as a function of the number of extracted sources. [sent-680, score-0.401]
70 Figure 3: Comparison of different blind source separation algorithms. [sent-682, score-0.384]
71 3 Image Processing: Blind source separation is interesting for a wide range of applications in image processing. [sent-685, score-0.255]
72 With anechoic mixtures, spatial displacements can be modeled explicitly (Omlor and Giese, 2007a; Be’ery and Yeredor, 2008). [sent-691, score-0.496]
73 A quantitative comparison shows that images reconstructed from the extracted components explain 95% of the variance of the original images for the deconvolution method, but only 72% for the phase retrieval method. [sent-703, score-0.391]
74 Right: features extracted from the image set on the left using GS phase retrieval or deconvolution. [sent-713, score-0.255]
75 In order to apply the anechoic mixture model (2) to this problem, a planar contour can be transformed into a log-polar image. [sent-724, score-0.487]
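A sketch (ours) of the log-polar idea: resampling an image onto a (log r, theta) grid turns rotations and scalings about the center into plain translations, which the anechoic model can then absorb as shifts.

import numpy as np

def log_polar(img, n_r=64, n_theta=64):
    """Resample a square image onto a (log r, theta) grid (nearest neighbor)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.exp(np.linspace(0, np.log(min(cy, cx)), n_r))     # log-spaced radii
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho, theta, indexing="ij")
    y = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, h - 1)
    x = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, w - 1)
    return img[y, x]

# A rotation of img by an angle then appears (approximately) as a circular shift
# along the theta axis; a scaling by a factor c as a shift along the log-r axis.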
76 1 Application to Two-Dimensional Shapes: To illustrate this application of anechoic mixtures to model rotation and scale invariance, we tested the scale- and rotation-invariant learning on a small sample of shapes taken from the MPEG-7 test database (Sebastian et al. [sent-737, score-0.523]
77 Both the extracted features (Figure 6b) and the corresponding weights $\alpha_{ij}$ (Figure 6c) show that, in principle, the anechoic mixing model is capable of correctly classifying different objects, independently of their size and rotation. [sent-741, score-0.488]
78 Conclusions We have presented a novel class of algorithms for the solution of over- and under-determined anechoic mixture problems, which extend to an arbitrary number of dimensions of the time argument. [sent-752, score-0.487]
79 Proper application of this bilinear time-frequency distribution to the delayed mixture model transforms the anechoic problem into a simpler delayed mixture problem in the domain of a linear canonical transform and a phase recovery problem. [sent-754, score-0.96]
80 In addition, some aspects of the discussed time-frequency framework may be applicable to full convolutive mixtures or mixture models with other invariance properties. [sent-764, score-0.259]
81 The corruption of the signal with noise, degradation of the signal shape, moving signal sources, and multiple overlaps of reflected copies of the signals (multipath) make time delay estimation (TDE) a very challenging problem. [sent-778, score-0.436]
82 If the observed signal x(t) is the superposition of multiple delayed signals $s_j$: $x(t) = \sum_{j=1}^{n} \alpha_j\, s_j(t - \tau_j) + n(t).$ [sent-781, score-0.321]
83 If the signals are uncorrelated, $r_{s_i,s_j}(t,t') = 0$, then the ML estimation simplifies to n one-dimensional estimation problems (with $r_{x,s_j}(t)$ the cross-correlation function of the signals x and $s_j$): $\arg\min_{(D_j)_j} E\Big\| x(t) - \sum_{j=1}^{n} \alpha_j\, s_j(t - D_j) \Big\|^2 = \Big( \arg\max_{D_j} r_{x,s_j}(D_j) \Big)_j.$ [sent-787, score-0.253]
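The one-dimensional reduction above amounts to picking the lag that maximizes the cross-correlation. A minimal numerical sketch (ours), with an arbitrary noise level:

import numpy as np

rng = np.random.default_rng(3)
s = rng.standard_normal(256)
true_delay = 17
x = np.roll(s, true_delay) + 0.1 * rng.standard_normal(256)      # delayed noisy copy
r = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(s))).real     # circular cross-correlation
print(np.argmax(r))   # -> 17 at this noise level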
84 For the solution of the delay estimation problem in the anechoic NMF algorithm (instead of Equation (29)), the ML cost function (36) was minimized by nonlinear optimization based on the nonlinear Gauss-Seidel algorithm. [sent-791, score-0.537]
85 The delays were estimated from mixtures with three randomly selected sources and constant weight and delay matrices A and $T = (\tau_{ij})_{ij}$. [sent-797, score-0.423]
86 An information-maximization approach to blind separation and blind deconvolution. [sent-839, score-0.465]
87 A robust method to count and locate audio sources in a stereophonic linear anechoic mixture. [sent-845, score-0.541]
88 Underdetermined blind separation of delayed sound sources in the frequency domain. [sent-910, score-0.573]
89 Blind signal separation and independent component analysis: A review. [sent-968, score-0.261]
90 Recursive algorithm for phase retrieval in the fractional Fourier transform domain. [sent-999, score-0.474]
91 Sparse component analysis and blind source separation of underdetermined mixtures. [sent-1060, score-0.384]
92 Classification of closely spaced subsurface objects using electromagnetic induction data and blind source separation algorithms. [sent-1104, score-0.384]
93 Blind source separation based on the fractional Fourier transform. [sent-1134, score-0.429]
94 A general contrast function based blind source separation method for convolutively mixed independent sources. [sent-1151, score-0.384]
95 Underdetermined audio source separation from anechoic mixtures with long time delay. [sent-1259, score-0.722]
96 Networks for the separation of sources that are superimposed and delayed. [sent-1346, score-0.273]
97 Blind separation of convolved mixtures in the frequency domain - signal processing. [sent-1395, score-0.394]
98 Signal separation using linear canonical and fractional Fourier transforms. [sent-1402, score-0.411]
99 Blind separation of convolved sources based on information maximization. [sent-1433, score-0.273]
100 Blind separation of delayed sources based on information maximization. [sent-1437, score-0.327]
wordName wordTfidf (topN-words)
[('anechoic', 0.417), ('wvs', 0.322), ('lct', 0.252), ('wigner', 0.165), ('blind', 0.158), ('nechoic', 0.157), ('wx', 0.156), ('iese', 0.149), ('igner', 0.149), ('mlor', 0.149), ('separation', 0.149), ('eparation', 0.141), ('lind', 0.141), ('fourier', 0.133), ('phase', 0.13), ('sources', 0.124), ('arginals', 0.12), ('ource', 0.12), ('delay', 0.12), ('signal', 0.112), ('convolutive', 0.11), ('delays', 0.1), ('demixing', 0.1), ('nmf', 0.1), ('signals', 0.1), ('omlor', 0.094), ('deconvolution', 0.087), ('sing', 0.082), ('mixtures', 0.079), ('source', 0.077), ('transform', 0.074), ('sica', 0.071), ('marginals', 0.071), ('mixture', 0.07), ('fractional', 0.07), ('ai', 0.069), ('retrieval', 0.067), ('spectrum', 0.065), ('instantaneous', 0.064), ('giese', 0.063), ('symplectic', 0.063), ('rx', 0.061), ('cichocki', 0.06), ('canonical', 0.059), ('superposition', 0.055), ('frequency', 0.054), ('delayed', 0.054), ('uncorrelated', 0.053), ('shifted', 0.053), ('lter', 0.051), ('gs', 0.05), ('motion', 0.05), ('hlawatsch', 0.047), ('optics', 0.047), ('rup', 0.047), ('yilmaz', 0.047), ('spectra', 0.044), ('pca', 0.043), ('shifts', 0.042), ('mixing', 0.042), ('crb', 0.039), ('ogrady', 0.039), ('rickard', 0.039), ('acoustics', 0.039), ('images', 0.039), ('shift', 0.038), ('speech', 0.037), ('separability', 0.036), ('transformations', 0.035), ('martin', 0.035), ('factorization', 0.035), ('sound', 0.034), ('transformation', 0.033), ('cram', 0.033), ('plumbley', 0.033), ('synergies', 0.033), ('cohen', 0.033), ('pe', 0.032), ('transforms', 0.032), ('anmf', 0.031), ('autocovariance', 0.031), ('avella', 0.031), ('belouchrani', 0.031), ('mecklenbruker', 0.031), ('yeredor', 0.031), ('movement', 0.031), ('convolution', 0.03), ('deterministic', 0.03), ('image', 0.029), ('extracted', 0.029), ('trajectories', 0.029), ('equation', 0.028), ('shapes', 0.027), ('snr', 0.027), ('bo', 0.027), ('cocktail', 0.027), ('flandrin', 0.027), ('human', 0.027), ('integral', 0.027), ('lters', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
2 0.049873047 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
Author: Umut Ozertem, Deniz Erdogmus
Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented in notional data sets as well as real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems. Keywords: unsupervised learning, dimensionality reduction, principal curves, principal surfaces, subspace constrained mean-shift
3 0.049872056 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
4 0.047778413 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
5 0.046799734 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis
Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/ jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture
6 0.040362019 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
7 0.037402127 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
8 0.034525804 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
9 0.034269188 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
10 0.032783255 59 jmlr-2011-Learning with Structured Sparsity
11 0.032667968 92 jmlr-2011-The Stationary Subspace Analysis Toolbox
12 0.03255187 55 jmlr-2011-Learning Multi-modal Similarity
13 0.031714749 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
14 0.031215494 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
15 0.031142859 91 jmlr-2011-The Sample Complexity of Dictionary Learning
16 0.030553672 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices
17 0.030201428 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
18 0.029777011 61 jmlr-2011-Logistic Stick-Breaking Process
19 0.029288612 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
20 0.028941382 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach
topicId topicWeight
[(0, 0.15), (1, -0.049), (2, -0.021), (3, 0.024), (4, -0.045), (5, -0.026), (6, -0.003), (7, -0.048), (8, -0.026), (9, -0.033), (10, 0.062), (11, 0.052), (12, 0.08), (13, 0.007), (14, -0.065), (15, -0.048), (16, 0.085), (17, 0.058), (18, -0.001), (19, -0.094), (20, 0.022), (21, 0.038), (22, 0.084), (23, -0.082), (24, 0.153), (25, -0.039), (26, -0.052), (27, 0.189), (28, 0.092), (29, -0.068), (30, -0.165), (31, -0.186), (32, 0.079), (33, 0.134), (34, -0.156), (35, -0.203), (36, -0.002), (37, 0.141), (38, -0.234), (39, -0.066), (40, -0.136), (41, -0.115), (42, -0.046), (43, 0.184), (44, -0.023), (45, -0.091), (46, -0.024), (47, 0.263), (48, 0.112), (49, 0.137)]
simIndex simValue paperId paperTitle
same-paper 1 0.94779307 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
2 0.4128406 92 jmlr-2011-The Stationary Subspace Analysis Toolbox
Author: Jan Saputra Müller, Paul von Bünau, Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller
Abstract: The Stationary Subspace Analysis (SSA) algorithm linearly factorizes a high-dimensional time series into stationary and non-stationary components. The SSA Toolbox is a platform-independent efficient stand-alone implementation of the SSA algorithm with a graphical user interface written in Java, that can also be invoked from the command line and from Matlab. The graphical interface guides the user through the whole process; data can be imported and exported from comma separated values (CSV) and Matlab’s .mat files. Keywords: non-stationarities, blind source separation, dimensionality reduction, unsupervised learning
3 0.3072201 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
4 0.28640035 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
Author: Umut Ozertem, Deniz Erdogmus
Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented in notional data sets as well as real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems. Keywords: unsupervised learning, dimensionality reduction, principal curves, principal surfaces, subspace constrained mean-shift
5 0.25322637 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
6 0.25223106 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
7 0.24695525 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
8 0.22166343 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
9 0.21583569 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
10 0.21327882 59 jmlr-2011-Learning with Structured Sparsity
11 0.20756939 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
12 0.2074638 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
13 0.18730886 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data
14 0.18385997 81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation
15 0.18082649 91 jmlr-2011-The Sample Complexity of Dictionary Learning
16 0.16992153 36 jmlr-2011-Generalized TD Learning
17 0.16399592 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
18 0.15628582 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
19 0.15536575 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
20 0.1549539 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach
topicId topicWeight
[(4, 0.043), (9, 0.037), (10, 0.024), (24, 0.035), (31, 0.08), (32, 0.019), (41, 0.031), (43, 0.477), (60, 0.014), (71, 0.011), (73, 0.054), (78, 0.04), (86, 0.01), (90, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.74893391 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
2 0.24537592 66 jmlr-2011-Multiple Kernel Learning Algorithms
Author: Mehmet Gönen, Ethem Alpaydın
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels. Keywords: support vector machines, kernel machines, multiple kernel learning
3 0.24490081 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi
Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5 ), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints
4 0.2433347 48 jmlr-2011-Kernel Analysis of Deep Networks
Author: Grégoire Montavon, Mikio L. Braun, Klaus-Robert Müller
Abstract: When training deep networks it is common knowledge that an efficient and well generalizing representation of the problem is formed. In this paper we aim to elucidate what makes the emerging representation successful. We analyze the layer-wise evolution of the representation in a deep network by building a sequence of deeper and deeper kernels that subsume the mapping performed by more and more layers of the deep network and measuring how these increasingly complex kernels fit the learning problem. We observe that deep networks create increasingly better representations of the learning problem and that the structure of the deep network controls how fast the representation of the task is formed layer after layer. Keywords: deep networks, kernel principal component analysis, representations
5 0.23864464 91 jmlr-2011-The Sample Complexity of Dictionary Learning
Author: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein
Abstract: A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L2 error in representation when the dictionary is used. For the case of l1 regularized coefficient selection we provide a generalnp ln(mλ)/m , where n is the dimension, p is the number of ization bound of the order of O elements in the dictionary, λ is a bound on the l1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O( np ln(mk)/m) under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as 1/m, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements. Keywords: dictionary learning, generalization bound, sparse representation
6 0.23836036 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
7 0.23767574 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
8 0.23648113 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
9 0.2361225 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
10 0.23469576 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
11 0.23363064 12 jmlr-2011-Bayesian Co-Training
12 0.23309357 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
13 0.23092939 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
14 0.22989705 105 jmlr-2011-lp-Norm Multiple Kernel Learning
15 0.22935961 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
16 0.22822368 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
17 0.22781698 16 jmlr-2011-Clustering Algorithms for Chains
18 0.22717685 59 jmlr-2011-Learning with Structured Sparsity
19 0.22663824 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
20 0.22546603 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models