jmlr jmlr2011 jmlr2011-10 knowledge-graph by maker-knowledge-mining

10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals


Source: pdf

Author: Lars Omlor, Martin A. Giese

Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Many popular applications of blind source separation are based on linear instantaneous mixture models. [sent-8, score-0.518]

2 When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. [sent-10, score-0.583]

3 We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. [sent-11, score-0.417]

4 The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. [sent-13, score-1.031]

5 Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum. [sent-17, score-0.776]

6 The most elementary class of such methods is based on linear mixture models that combine source signals or mixture components as a weighted linear sum. [sent-21, score-0.317]

7 Such linear blind source separation methods have a wide spectrum of applications. [sent-22, score-0.449]

8 Popular approaches typically exploit two classes of generative models: instantaneous and convolutive mixtures (Choi et al. [sent-27, score-0.253]

9 The time-dependent signals $x_i(t)$ are approximated by the linear superposition of a number of hidden source signals $s_j(t)$. [sent-37, score-0.332]

10 Contrasting with this model, convolutive mixtures assume that the source signals are filtered with the filter kernels $\alpha_{ij}(t)$ prior to the superposition, resulting in the mixing model: $x_i(t) = \sum_{j=1}^{n} \int_{-\infty}^{\infty} \alpha_{ij}(\tau)\, s_j(t-\tau)\, d\tau, \quad i = 1, \cdots, m$. [sent-39, score-0.408]

11 In between these two model classes are anechoic mixture models which linearly combine time-shifted and scaled versions of the sources, without permitting multiple occurrences of the same source in the mixture. [sent-41, score-0.564]

12 $x_i(t) = \sum_{j=1}^{n} \alpha_{ij}\, s_j(t - \tau_{ij}), \quad i = 1, \cdots, m$. (2) Compared to full convolutive models, anechoic models substantially constrain the space of admissible filter functions. [sent-43, score-0.417]
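
As a hedged illustration of the anechoic model (2), the following NumPy sketch generates synthetic delayed mixtures; the integer delays, circular shifts, and all variable names are our simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def anechoic_mix(S, alpha, tau):
    """Observations x_i(t) = sum_j alpha[i, j] * s_j(t - tau[i, j]) (model (2)),
    assuming integer delays realized as circular shifts for simplicity."""
    m, n = alpha.shape
    X = np.zeros((m, S.shape[1]))
    for i in range(m):
        for j in range(n):
            X[i] += alpha[i, j] * np.roll(S[j], tau[i, j])
    return X

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 1000))            # two hidden sources
alpha = np.array([[1.0, 0.5], [0.7, 1.2]])    # mixing weights alpha_ij
tau = np.array([[0, 40], [25, 0]])            # integer delays tau_ij
X = anechoic_mix(S, alpha, tau)               # two delayed observations
```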

13 Apart from the problem of robust parameter estimation with limited amounts of data, all blind source separation methods suffer from intrinsic ambiguities. [sent-46, score-0.384]

14 In contrast, for anechoic mixtures (see Equation (2)) the filter ambiguity is limited to an unknown scaling and an arbitrary additive shift that can be applied to all time delays $\tau_{ij}$, while the form of the source functions is uniquely defined, except for absolute position. [sent-51, score-0.711]

15 This implies that anechoic mixtures can be advantageous for the modeling of data that are consistent with the corresponding generative model. [sent-52, score-0.496]

16 , 2008), where the same control signal might influence several muscles or joints with different delays, or in functional magnetic resonance imaging, where time shifts occur naturally due to hemodynamic delays (Mørup et al. [sent-61, score-0.25]

17 The major part of previous work on anechoic mixtures has considered under-determined (overcomplete) source separation problems (m ≤ n), where the sources outnumber the dimensions of the available data. [sent-63, score-0.846]

18 In this paper we present a new algorithmic framework for the solution of arbitrary anechoic mixture problems, which is independent of the number of sources and the dimensionality of the data. [sent-71, score-0.611]

19 The obtained solution in time-frequency space is then transformed back into signal space, where in a second step the full solution of the original problem is determined by solving a phase retrieval problem. [sent-78, score-0.309]

20 For the solution of the non-negative anechoic mixture problem a modification of non-negative matrix factorization (NMF) (Lee and Seung, 1999) is presented in 3. [sent-87, score-0.522]

21 The work in Torkkola (1996a,b) extended the information maximization approach of Bell and Sejnowski (1995), using the adaptive delay architecture described in Platt and Faggin (1992) in order to unmix anechoic 2 × 2 mixtures. [sent-99, score-0.537]

22 This implies that preprocessing of the signals, for example, by application of bilinear time-frequency transformations before source separation can be essential. [sent-113, score-0.261]

23 In this context, time-frequency representations have been used quite frequently for blind source separation (Karako-Eilon et al. [sent-114, score-0.384]

24 , 1998), and specifically for anechoic demixing (Yilmaz and Rickard, 2004). [sent-117, score-0.517]

25 Contrasting with this work, our approach relies on further properties of the stochastic Wigner-Ville spectrum (WVS) which, to our knowledge, have never been exploited for source separation before. [sent-119, score-0.291]

26 It was later reintroduced in signal analysis by Ville (1948), with the basic idea of defining a joint distribution of the signal energy simultaneously in time and frequency (in physics corresponding to coordinates and momentum). [sent-137, score-0.278]
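
The following sketch computes one common discrete pseudo-Wigner-Ville variant and checks its time marginal, the property underlying the "Wigner marginals" used later; the discretization convention is ours and may differ from the paper's.

```python
import numpy as np

def pseudo_wvd(x):
    """One textbook discretization of the Wigner-Ville distribution:
    W[n, k] = sum_m x[n+m] conj(x[n-m]) exp(-4j*pi*m*k/N)."""
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        mmax = min(n, N - 1 - n)                  # lags staying in range
        m = np.arange(-mmax, mmax + 1)
        kern = x[n + m] * np.conj(x[n - m])
        phase = np.exp(-4j * np.pi * np.outer(m, np.arange(N)) / N)
        W[n] = (kern @ phase).real
    return W

t = np.arange(128)
x = np.exp(2j * np.pi * (0.05 + 0.001 * t) * t)   # a linear chirp
W = pseudo_wvd(x)
# time marginal: the mean over frequency recovers the signal energy |x(n)|^2
print(np.allclose(W.mean(axis=1), np.abs(x) ** 2))
```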

27 The only requirement for a zero-mean random signal to be harmonizable is the existence of a Fourier representation $\Phi$ of its autocovariance function $r_x(t,t')$, which is defined by: $r_x(t,t') := E\{x(t)\, x^*(t')\} = \iint e^{2\pi i (\lambda t - \mu t')}\, \Phi(\lambda, \mu)\, d\lambda\, d\mu$. [sent-145, score-0.265]

28 – Time-frequency shift covariant: The Wigner-Ville spectrum of a time-frequency shifted signal $\tilde{x}(t) = x(t - t_0)\, e^{2\pi i f_0 (t - t_0)}$ is the shifted WVS of the original signal: $W_{\tilde{x}}(t, f) = W_x(t - t_0, f - f_0)$. [sent-167, score-0.375]

29 – Shift Theorem: Of particular interest for the anechoic mixing problem is the behavior of the linear canonical transform under shifts of the input signal: $\mathcal{L}(M)[x(t - \tau)](f) = \exp\!\big(\pi i\, (2f - A\tau)^T C \tau\big)\; \mathcal{L}(M)[x](f - A\tau)$. (14) [sent-230, score-0.634]
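
The Fourier transform is the LCT special case $M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ (so $A = 0$ and the output translation $A\tau$ vanishes), for which the shift theorem reduces to the classical Fourier shift property; this is easy to verify numerically, assuming an integer circular shift.

```python
import numpy as np

N = 256
t = np.arange(N)
x = np.exp(-0.5 * ((t - 100) / 8.0) ** 2)     # a Gaussian test pulse
tau = 17                                      # integer circular shift

f = np.fft.fftfreq(N)                         # frequencies in cycles/sample
lhs = np.fft.fft(np.roll(x, tau))             # transform of shifted signal
rhs = np.exp(-2j * np.pi * f * tau) * np.fft.fft(x)
print(np.allclose(lhs, rhs))                  # True: shift becomes a phase
```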

30 Since the output signal is translated by the term $A\tau$, a good choice of the matrix $A$ can optimize the separability of the signal $x$ from its shifted counterparts in particular LCT domains (Sharma and Joshi, 2006). [sent-231, score-0.313]

31 Algorithmic Framework (Anechoic Demixing Using Wigner Marginals): In the following, we will first derive the basic algorithmic framework by applying the mathematical properties of the WVS, as discussed in the last section, to the anechoic mixing model. [sent-235, score-0.459]

32 The resulting algorithm consists of two major steps: the solution of positive anechoic demixing problems and phase retrieval. [sent-236, score-0.647]

33 The observed signals $x_i(t)$, $i = 1, \cdots, m$, are delayed superpositions of uncorrelated source signals $s_j(t)$, that is: $x_i(t) = \sum_{j=1}^{n} \alpha_{ij} \cdot s_j(t - \tau_{ij}), \quad i = 1, \cdots, m$, (15) with $r_{s_i,s_j}(t,t') = E\{s_i(t)\, s_j^*(t')\} = 0$ for $i \neq j$. [sent-253, score-0.284]

34 The shift covariance (5) of the WVS and the superposition law (8) imply the following relation between the spectra of the sources and observations: $W_{x_i}(t, f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, W_{s_j}(t - \tau_{ij}, f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, W_{T_{\tau_{ij}} s_j}(t, f)$. (18) [sent-254, score-0.261]

35 This equation could directly serve as a basis for the separation of the source signals $s_j$. [sent-264, score-0.354]

36 The individual marginals have the form of an anechoic mixture problem with an additional non-negativity constraint: $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2 + n(t)$. (20) [sent-269, score-0.558]

37 However, the (approximate) solution of Equation (20) gives an estimate for the power spectra of the sources in the LCT domain, $|\mathcal{L}(M)[s_j]|^2$, the scaled delays $A\tau_{ij}$, and the absolute values of the weights $|\alpha_{ij}|$. [sent-274, score-0.268]

38 Since the projections onto LCT domains (19) are also anechoic mixtures, at first glance the positive system of Equations (20) seems not to provide any advantage over the direct solution of Equations (15) or (18). [sent-281, score-0.417]

39 An appropriate choice of $A$ thus results in improved separation of the signals (Sharma and Joshi, 2006). [sent-304, score-0.249]

40 As the multivariate LCT can be written as the tensor product of several one-dimensional LCTs, $A$ can be adjusted for each dimension individually, making it possible to transform a multivariate anechoic mixture into several coupled one-dimensional mixtures (Omlor and Giese, 2007a). [sent-306, score-0.64]

41 As opposed to standard non-negative matrix factorization (NMF), convolutive NMF assumes a non-negative convolutive model (see Equation 1). [sent-328, score-0.255]

42 For many applications, such as spectral analysis or blind image deblurring, the data can be described more exactly as a (discrete) positive convolution of the form: $x_i(t) = \sum_{j=1}^{n} (a_{ij} * s_j)[t] = \sum_{j=1}^{n} \sum_{u} a_{ij}[u]\, s_j[t - u]$, (22) with $a_{ij} \geq 0$, $s_j \geq 0$ for all $i \in 1, \cdots, m$. [sent-332, score-0.355]

43 If both the filters $a_{ij}$ and the sources $s_j$ are unknown, Equation (22) implies the estimation problem: $\min_{a_{ij},\, s_j} \big\| x_i(t) - \sum_{j=1}^{n} (a_{ij} * s_j)(t) \big\|$ subject to $s_j \geq 0$, $a_{ij} \geq 0$ for all $i, j$. [sent-336, score-0.331]

44 In the simple case of one-dimensional (vectorial) data, the convolution $a_{ij} * s_j$ is equal to the product of a circulant matrix built from the filter with the source vector:
$$\begin{pmatrix} a_{ij}(0) & a_{ij}(N) & a_{ij}(N-1) & \cdots \\ a_{ij}(1) & a_{ij}(0) & a_{ij}(N) & \cdots \\ \vdots & \vdots & \ddots & \end{pmatrix} \begin{pmatrix} s_j(0) \\ s_j(1) \\ \vdots \end{pmatrix}.$$ [sent-338, score-0.513]
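
A quick numerical check of this circulant-matrix form of the (circular) convolution; scipy.linalg.circulant builds exactly the matrix sketched above from the filter vector.

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
a, s = rng.random(8), rng.random(8)           # non-negative filter and source

C = circulant(a)                              # C[i, j] = a[(i - j) mod N]
via_matrix = C @ s                            # matrix-product form of a * s
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(s)))
print(np.allclose(via_matrix, via_fft))       # True
```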

45 The anechoic problem (28) can thus be solved using the convolutive NMF algorithm for the solution of problem (22), if sparseness of the filters is enforced. [sent-441, score-0.527]
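
The exact update rules (29)-(30) are not reproduced in this summary; as a hedged stand-in, the sketch below shows an NMF-style multiplicative update for the source alone, with the filter held fixed and the convolution written as the circulant matrix product from above.

```python
import numpy as np
from scipy.linalg import circulant

def nn_deconvolve(x, a, n_iter=500, eps=1e-12):
    """Estimate s >= 0 with x ~ a * s (circular convolution, fixed a >= 0)
    by Lee-Seung-type multiplicative updates for ||x - A s||^2."""
    A = circulant(a)
    s = np.full(len(x), x.mean() + eps)       # strictly positive start
    for _ in range(n_iter):
        s *= (A.T @ x) / (A.T @ (A @ s) + eps)
    return s
```

Because every factor in the update is non-negative, the iterates stay non-negative; sparseness of the filter would be handled by an extra penalty term, which is omitted here.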

46 One disadvantage of delay estimation by deconvolution is the difficulty of locating the exact peak of the sparse filter, especially for fractional (non-integer) delays. [sent-449, score-0.29]
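
One common remedy for fractional delays (our illustration, not necessarily the paper's choice) is to refine the discrete peak by fitting a parabola through it and its two neighbours:

```python
import numpy as np

def parabolic_peak(r):
    """Sub-sample position of the maximum of r via parabolic interpolation."""
    k = int(np.argmax(r))
    if k == 0 or k == len(r) - 1:
        return float(k)                       # boundary: no interpolation
    y0, y1, y2 = r[k - 1], r[k], r[k + 1]
    return k + 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
```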

47 The positive anechoic mixture problem that is given by Equation (20), $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2 + n(t)$, [sent-451, score-0.487]

48 can be solved with this anechoic non-negative matrix factorization approach (ANMF). [sent-452, score-0.452]

49 The fact that the cross-spectra are not present in Equation (31) permits a reformulation as a constraint for the anechoic NMF problem (19) of the form: $SS^T = I$. (32) [sent-456, score-0.417]

50 This orthogonality constraint (32) guarantees the uniqueness of the solution of the anechoic demixing problem, making the new algorithm a true blind source separation method. [sent-462, score-0.901]
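
One standard way to impose the constraint $SS^T = I$ after each update (our sketch; the paper's exact projection step may differ) is symmetric (Löwdin) orthogonalization:

```python
import numpy as np

def symmetric_decorrelate(S):
    """Project the source matrix onto S S^T = I via S <- (S S^T)^(-1/2) S;
    assumes S S^T is positive definite (full-rank source estimates)."""
    d, E = np.linalg.eigh(S @ S.T)
    inv_sqrt = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    return inv_sqrt @ S
```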

51 3 Phase Retrieval: The solution of the positive anechoic mixture problem $|\mathcal{L}(M)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M)[s_j](t - A\tau_{ij})|^2$ that is given by Equation (20) yields the LCT power spectra $|\mathcal{L}(M)[s_j]|^2$. [sent-466, score-0.531]

52 In order to reconstruct the sources $s_j$, or equivalently their LCT transforms $\mathcal{L}(M)[s_j]$, the missing phase information needs to be recovered. [sent-467, score-0.286]

53 For $A = 0$ in this case, the anechoic mixture (2) reduces to a simple deconvolution problem with known filters $\nu_{ij} = a_{ij}\, \delta(t - \tau_{ij})$. [sent-473, score-0.643]

54 2 Gerchberg-Saxton (GS) Algorithm: In typical phase retrieval applications the quantities known about the signal are the amplitude and some other properties like the support. [sent-478, score-0.439]

55 For the phase retrieval problem in our algorithm, an estimate of the signal is transformed back and forth between different LCT domains, always replacing the estimated amplitude spectrum by the known amplitude spectrum from the solution of the positive mixture problem and re-estimating the phase. [sent-483, score-0.509]
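
A compact sketch of the Gerchberg-Saxton iteration for the classical Fourier pair of domains (the algorithm above alternates between general LCT domains instead); the random initialization and names are illustrative.

```python
import numpy as np

def gerchberg_saxton(mag_t, mag_f, n_iter=500, seed=0):
    """Alternate between time and Fourier domain, keeping the current phase
    and replacing each amplitude spectrum by the known one."""
    rng = np.random.default_rng(seed)
    x = mag_t * np.exp(2j * np.pi * rng.random(len(mag_t)))
    for _ in range(n_iter):
        X = mag_f * np.exp(1j * np.angle(np.fft.fft(x)))   # enforce |F x|
        x = mag_t * np.exp(1j * np.angle(np.fft.ifft(X)))  # enforce |x|
    return x

s = np.exp(-0.5 * ((np.arange(256) - 90) / 6.0) ** 2)      # test signal
est = gerchberg_saxton(np.abs(s), np.abs(np.fft.fft(s)))
# residual mismatch of the Fourier amplitude after convergence
print(np.linalg.norm(np.abs(np.fft.fft(est)) - np.abs(np.fft.fft(s))))
```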

56 For $k = 1, \cdots, K$, Step 1: Solve the positive anechoic demixing problems given by: $|\mathcal{L}(M_k)[x_i]|^2(t) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{L}(M_k)[s_j](t - A\tau_{ij})|^2$, for example using the ANMF algorithm (1) or the Bayesian Probabilistic Latent Component Analysis (PLCA) algorithm (Smaragdis et al. [sent-528, score-0.517]

57 The choice $M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ corresponds to the Fourier transform; with $A = 0$ this simplifies the first step in Algorithm 2, reducing the positive anechoic mixture to an instantaneous mixture problem: $|\mathcal{F} x_i|^2(f) = \sum_{j=1}^{n} |\alpha_{ij}|^2\, |\mathcal{F} s_j|^2(f)$. [sent-536, score-0.557]
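
A small numerical illustration of this Fourier-domain relation; with independent random sources the cross-spectra only vanish on average, so the check below averages over many realizations (all parameters are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 512, 2000
alpha, tau = np.array([1.0, 0.6]), np.array([0, 37])

lhs = np.zeros(N)
rhs = np.zeros(N)
for _ in range(trials):
    s = rng.standard_normal((2, N))           # independent white sources
    x = sum(alpha[j] * np.roll(s[j], tau[j]) for j in range(2))
    lhs += np.abs(np.fft.fft(x)) ** 2         # |F x|^2, accumulated
    rhs += sum(alpha[j] ** 2 * np.abs(np.fft.fft(s[j])) ** 2
               for j in range(2))

# relative deviation shrinks as the number of trials grows
print(np.max(np.abs(lhs - rhs)) / np.max(lhs))
```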

58 Hence phase retrieval can be accomplished by a simple deconvolution procedure, as described in 3. [sent-553, score-0.284]

59 Thus this transformation just converts the original anechoic mixture problem into a nonnegative mixture problem that can be solved using the update rules given by (29) and (30). [sent-559, score-0.59]

60 1 Sound Mixtures: Since the formulation of the "cocktail party problem" by Cherry (1953), acoustics has been a major field for the application of blind source separation methods. [sent-573, score-0.423]

61 However, anechoic mixtures are relevant for the case of reverberation-free environments (Bofill, 2003; Yilmaz and Rickard, 2004). [sent-579, score-0.496]

62 To demonstrate that Algorithm 2 is applicable to sound mixtures, we present three different separation problems for synthetically generated delayed mixtures (model (2)), using speech and sound segments from the ICA benchmark described in Cichocki and Amari (2002). [sent-580, score-0.466]

63 Three types of mixtures were generated: (I) mixtures of 2 source segments with random mixing and delay matrices (2 × 2), each resulting in two simulated signals $x_{1,2}$. [sent-583, score-0.418]

64 5    1 2 1  1 500 333 Data set (II) with fixed mixing and delay matrices was included since completely random generation sometimes produced degenerated anechoic mixtures (instantaneous mixtures or ill-conditioned mixing matrices). [sent-585, score-0.779]

65 Figure 2: Comparison of different blind source separation algorithms for synthetic mixtures of sound signals with delays (data sets I-III, see text). [sent-605, score-0.697]

66 (III) A third data set was generated by mixing two randomly selected source segments with random mixture matrices and random delay matrices. [sent-606, score-0.309]

67 As expected, the performance Pe of the instantaneous mixture model assumed by principal component analysis (PCA) is worse than the performance achieved by the three compared anechoic methods. [sent-617, score-0.551]

68 We show that anechoic mixtures have the potential for strong dimensionality reduction. [sent-633, score-0.496]

69 Figure 3: (a) approximation quality as a function of the number of extracted sources for periodic gait data; (b) approximation quality as a function of the number of sources. [sent-680, score-0.401]

70 Figure 3: Comparison of different blind source separation algorithms. [sent-682, score-0.384]

71 3 Image Processing: Blind source separation is interesting for a wide range of applications in image processing. [sent-685, score-0.255]

72 With anechoic mixtures spatial displacements can be explicitly modeled (Omlor and Giese, 2007a; Be’ery and Yeredor, 2008). [sent-691, score-0.496]

73 A quantitative comparison shows that the images reconstructed from the extracted components explain 95% of the variance of the original images for the deconvolution method, but only 72% for the phase retrieval method. [sent-703, score-0.391]

74 Right: extracted features from the image set on the left using GS phase retrieval or deconvolution. [sent-713, score-0.255]

75 In order to apply the anechoic mixture model (2) to this problem, a planar contour can be transformed into a log-polar image. [sent-724, score-0.487]
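
A minimal nearest-neighbour log-polar resampling sketch (grid sizes and the interpolation scheme are our choices): a rotation of the input becomes a circular shift along the angle axis and a scaling becomes a shift along the log-radius axis, which is exactly the translation structure the anechoic model handles.

```python
import numpy as np

def to_log_polar(img, n_r=64, n_theta=64):
    """Resample a grayscale image onto a log-polar grid around its center."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.exp(np.linspace(0.0, np.log(np.hypot(cy, cx)), n_r))
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho, theta, indexing="ij")
    ys = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, w - 1)
    return img[ys, xs]                        # rows: log-radius, cols: angle
```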

76 1 Application to Two-Dimensional Shapes: To illustrate this application of anechoic mixtures to model rotation and scale invariance, we tested the scale- and rotation-invariant learning on a small sample of shapes taken from the MPEG-7 test database (Sebastian et al. [sent-737, score-0.523]

77 Both the extracted features (Figure 6b) and the corresponding weights $\alpha_{ij}$ (Figure 6c) show that, in principle, the anechoic mixing model is capable of correctly classifying different objects, independent of their size and rotation. [sent-741, score-0.488]

78 Conclusions: We have presented a novel class of algorithms for the solution of over- and under-determined anechoic mixture problems, which extend to an arbitrary number of dimensions of the time argument. [sent-752, score-0.487]

79 Proper application of this bilinear time-frequency distribution to the delayed mixture model transforms the anechoic problem into a simpler delayed mixture problem in the domain of a linear canonical transform and a phase recovery problem. [sent-754, score-0.96]

80 In addition, some aspects of the discussed time-frequency framework may be applicable to full convolutive mixtures or mixture models with other invariance properties. [sent-764, score-0.259]

81 The corruption of the signal with noise, degradation of the signal shape, moving signal sources, and multiple overlaps of reflected copies of the signals (multipath) make TDE a very challenging problem. [sent-778, score-0.436]

82 If the observed signal $x(t)$ is the superposition of multiple delayed signals $s_j$: $x(t) = \sum_{j=1}^{n} \alpha_j\, s_j(t - \tau_j) + n(t)$. [sent-781, score-0.321]

83 If the signals are uncorrelated, $r_{s_i,s_j}(t,t') = 0$, then the ML estimation simplifies to $n$ one-dimensional estimation problems ($r_{x,s_j}(t)$ denoting the cross-correlation function of the signals $x$ and $s_j$): $\arg\min_{(D_j)_j} E\big\| x(t) - \sum_{j=1}^{n} \alpha_j\, s_j(t - D_j) \big\|^2 = \big(\arg\max_{D_j} r_{x,s_j}(D_j)\big)_j$. [sent-787, score-0.253]
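
For a single source this reduces to picking the lag that maximizes the cross-correlation, which is straightforward to implement (a known source template and additive white noise are assumed here):

```python
import numpy as np

def estimate_delay(x, s):
    """ML delay estimate: the lag D maximizing r_{x,s}(D)."""
    r = np.correlate(x, s, mode="full")       # lags -(len(s)-1)..len(x)-1
    lags = np.arange(-(len(s) - 1), len(x))
    return lags[np.argmax(r)]

rng = np.random.default_rng(2)
s = rng.standard_normal(500)
x = 0.8 * np.concatenate([np.zeros(60), s])[:500] \
    + 0.1 * rng.standard_normal(500)
print(estimate_delay(x, s))                   # close to the true delay 60
```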

84 For the solution of the delay estimation problem in the anechoic NMF algorithm (instead of Equation (29)), the ML-cost function (36) was minimized by a nonlinear optimization based on the nonlinear Gauss-Seidel algorithm. [sent-791, score-0.537]

85 The delays were estimated from mixtures with three randomly selected sources and constant weight and delay matrices $A$ and $T = (\tau_{ij})_{ij}$. [sent-797, score-0.423]

86 An information-maximization approach to blind separation and blind deconvolution. [sent-839, score-0.465]

87 A robust method to count and locate audio sources in a stereophonic linear anechoic mixture. [sent-845, score-0.541]

88 Underdetermined blind separation of delayed sound sources in the frequency domain. [sent-910, score-0.573]

89 Blind signal separation and independent component analysis: A review. [sent-968, score-0.261]

90 Recursive algorithm for phase retrieval in the fractional fourier transform domain. [sent-999, score-0.474]

91 Sparse component analysis and blind source separation of underdetermined mixtures. [sent-1060, score-0.384]

92 Classification of closely spaced subsurface objects using electromagnetic induction data and blind source separation algorithms. [sent-1104, score-0.384]

93 Blind source separation based on the fractional fourier transform. [sent-1134, score-0.429]

94 A general contrast function based blind source separation method for convolutively mixed independent sources. [sent-1151, score-0.384]

95 Underdetermined audio source separation from anechoic mixtures with long time delay. [sent-1259, score-0.722]

96 Networks for the separation of sources that are superimposed and delayed. [sent-1346, score-0.273]

97 Blind separation of convolved mixtures in the frequency domain - signal processing. [sent-1395, score-0.394]

98 Signal separation using linear canonical and fractional fourier transforms. [sent-1402, score-0.411]

99 Blind separation of convolved sources based on information maximization. [sent-1433, score-0.273]

100 Blind separation of delayed sources based on information maximization. [sent-1437, score-0.327]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('anechoic', 0.417), ('wvs', 0.322), ('lct', 0.252), ('wigner', 0.165), ('blind', 0.158), ('nechoic', 0.157), ('wx', 0.156), ('iese', 0.149), ('igner', 0.149), ('mlor', 0.149), ('separation', 0.149), ('eparation', 0.141), ('lind', 0.141), ('fourier', 0.133), ('phase', 0.13), ('sources', 0.124), ('arginals', 0.12), ('ource', 0.12), ('delay', 0.12), ('signal', 0.112), ('convolutive', 0.11), ('delays', 0.1), ('demixing', 0.1), ('nmf', 0.1), ('signals', 0.1), ('omlor', 0.094), ('deconvolution', 0.087), ('sing', 0.082), ('mixtures', 0.079), ('source', 0.077), ('transform', 0.074), ('sica', 0.071), ('marginals', 0.071), ('mixture', 0.07), ('fractional', 0.07), ('ai', 0.069), ('retrieval', 0.067), ('spectrum', 0.065), ('instantaneous', 0.064), ('giese', 0.063), ('symplectic', 0.063), ('rx', 0.061), ('cichocki', 0.06), ('canonical', 0.059), ('superposition', 0.055), ('frequency', 0.054), ('delayed', 0.054), ('uncorrelated', 0.053), ('shifted', 0.053), ('lter', 0.051), ('gs', 0.05), ('motion', 0.05), ('hlawatsch', 0.047), ('optics', 0.047), ('rup', 0.047), ('yilmaz', 0.047), ('spectra', 0.044), ('pca', 0.043), ('shifts', 0.042), ('mixing', 0.042), ('crb', 0.039), ('ogrady', 0.039), ('rickard', 0.039), ('acoustics', 0.039), ('images', 0.039), ('shift', 0.038), ('speech', 0.037), ('separability', 0.036), ('transformations', 0.035), ('martin', 0.035), ('factorization', 0.035), ('sound', 0.034), ('transformation', 0.033), ('cram', 0.033), ('plumbley', 0.033), ('synergies', 0.033), ('cohen', 0.033), ('pe', 0.032), ('transforms', 0.032), ('anmf', 0.031), ('autocovariance', 0.031), ('avella', 0.031), ('belouchrani', 0.031), ('mecklenbruker', 0.031), ('yeredor', 0.031), ('movement', 0.031), ('convolution', 0.03), ('deterministic', 0.03), ('image', 0.029), ('extracted', 0.029), ('trajectories', 0.029), ('equation', 0.028), ('shapes', 0.027), ('snr', 0.027), ('bo', 0.027), ('cocktail', 0.027), ('flandrin', 0.027), ('human', 0.027), ('integral', 0.027), ('lters', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

Author: Lars Omlor, Martin A. Giese

Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum

2 0.049873047 60 jmlr-2011-Locally Defined Principal Curves and Surfaces

Author: Umut Ozertem, Deniz Erdogmus

Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method; it can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented in notional data sets as well as real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems. Keywords: unsupervised learning, dimensionality reduction, principal curves, principal surfaces, subspace constrained mean-shift

3 0.049872056 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models

Author: Botond Cseke, Tom Heskes

Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give nearly indistinguishable results from the non-factorized approximations and their computational complexity scales linearly with the number of variables. We experienced that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation

4 0.047778413 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

Author: Thomas L. Griffiths, Zoubin Ghahramani

Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices

5 0.046799734 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis

Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/ jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture

6 0.040362019 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

7 0.037402127 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

8 0.034525804 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

9 0.034269188 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

10 0.032783255 59 jmlr-2011-Learning with Structured Sparsity

11 0.032667968 92 jmlr-2011-The Stationary Subspace Analysis Toolbox

12 0.03255187 55 jmlr-2011-Learning Multi-modal Similarity

13 0.031714749 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

14 0.031215494 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

15 0.031142859 91 jmlr-2011-The Sample Complexity of Dictionary Learning

16 0.030553672 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices

17 0.030201428 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

18 0.029777011 61 jmlr-2011-Logistic Stick-Breaking Process

19 0.029288612 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models

20 0.028941382 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.15), (1, -0.049), (2, -0.021), (3, 0.024), (4, -0.045), (5, -0.026), (6, -0.003), (7, -0.048), (8, -0.026), (9, -0.033), (10, 0.062), (11, 0.052), (12, 0.08), (13, 0.007), (14, -0.065), (15, -0.048), (16, 0.085), (17, 0.058), (18, -0.001), (19, -0.094), (20, 0.022), (21, 0.038), (22, 0.084), (23, -0.082), (24, 0.153), (25, -0.039), (26, -0.052), (27, 0.189), (28, 0.092), (29, -0.068), (30, -0.165), (31, -0.186), (32, 0.079), (33, 0.134), (34, -0.156), (35, -0.203), (36, -0.002), (37, 0.141), (38, -0.234), (39, -0.066), (40, -0.136), (41, -0.115), (42, -0.046), (43, 0.184), (44, -0.023), (45, -0.091), (46, -0.024), (47, 0.263), (48, 0.112), (49, 0.137)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94779307 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

Author: Lars Omlor, Martin A. Giese

Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum

2 0.4128406 92 jmlr-2011-The Stationary Subspace Analysis Toolbox

Author: Jan Saputra Müller, Paul von Bünau, Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller

Abstract: The Stationary Subspace Analysis (SSA) algorithm linearly factorizes a high-dimensional time series into stationary and non-stationary components. The SSA Toolbox is a platform-independent efficient stand-alone implementation of the SSA algorithm with a graphical user interface written in Java, that can also be invoked from the command line and from Matlab. The graphical interface guides the user through the whole process; data can be imported and exported from comma separated values (CSV) and Matlab’s .mat files. Keywords: non-stationarities, blind source separation, dimensionality reduction, unsupervised learning

3 0.3072201 61 jmlr-2011-Logistic Stick-Breaking Process

Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson

Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation

4 0.28640035 60 jmlr-2011-Locally Defined Principal Curves and Surfaces

Author: Umut Ozertem, Deniz Erdogmus

Abstract: Principal curves are defined as self-consistent smooth curves passing through the middle of the data, and they have been used in many applications of machine learning as a generalization, dimensionality reduction and a feature extraction tool. We redefine principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate. This provides a geometric understanding of the principal curves and surfaces, as well as a unifying view for clustering, principal curve fitting and manifold learning by regarding those as principal manifolds of different intrinsic dimensionalities. The theory does not impose any particular density estimation method; it can be used with any density estimator that gives continuous first and second derivatives. Therefore, we first present our principal curve/surface definition without assuming any particular density estimation method. Afterwards, we develop practical algorithms for the commonly used kernel density estimation (KDE) and Gaussian mixture models (GMM). Results of these algorithms are presented in notional data sets as well as real applications with comparisons to other approaches in the principal curve literature. All in all, we present a novel theoretical understanding of principal curves and surfaces, practical algorithms as general purpose machine learning tools, and applications of these algorithms to several practical problems. Keywords: unsupervised learning, dimensionality reduction, principal curves, principal surfaces, subspace constrained mean-shift

5 0.25322637 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

Author: Thomas L. Griffiths, Zoubin Ghahramani

Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices

6 0.25223106 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

7 0.24695525 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

8 0.22166343 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

9 0.21583569 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

10 0.21327882 59 jmlr-2011-Learning with Structured Sparsity

11 0.20756939 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm

12 0.2074638 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes

13 0.18730886 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data

14 0.18385997 81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation

15 0.18082649 91 jmlr-2011-The Sample Complexity of Dictionary Learning

16 0.16992153 36 jmlr-2011-Generalized TD Learning

17 0.16399592 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

18 0.15628582 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

19 0.15536575 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

20 0.1549539 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.043), (9, 0.037), (10, 0.024), (24, 0.035), (31, 0.08), (32, 0.019), (41, 0.031), (43, 0.477), (60, 0.014), (71, 0.011), (73, 0.054), (78, 0.04), (86, 0.01), (90, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74893391 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

Author: Lars Omlor, Martin A. Giese

Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum

2 0.24537592 66 jmlr-2011-Multiple Kernel Learning Algorithms

Author: Mehmet Gönen, Ethem Alpaydın

Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels. Keywords: support vector machines, kernel machines, multiple kernel learning

3 0.24490081 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi

Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver could be as high as O(N^{6.5}), which prevents NPKL methods from being applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints

4 0.2433347 48 jmlr-2011-Kernel Analysis of Deep Networks

Author: Grégoire Montavon, Mikio L. Braun, Klaus-Robert Müller

Abstract: When training deep networks it is common knowledge that an efficient and well generalizing representation of the problem is formed. In this paper we aim to elucidate what makes the emerging representation successful. We analyze the layer-wise evolution of the representation in a deep network by building a sequence of deeper and deeper kernels that subsume the mapping performed by more and more layers of the deep network and measuring how these increasingly complex kernels fit the learning problem. We observe that deep networks create increasingly better representations of the learning problem and that the structure of the deep network controls how fast the representation of the task is formed layer after layer. Keywords: deep networks, kernel principal component analysis, representations

5 0.23864464 91 jmlr-2011-The Sample Complexity of Dictionary Learning

Author: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein

Abstract: A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L2 error in representation when the dictionary is used. For the case of l1 regularized coefficient selection we provide a generalization bound of the order of $O(\sqrt{np\, \ln(m\lambda)/m})$, where n is the dimension, p is the number of elements in the dictionary, λ is a bound on the l1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order $O(\sqrt{np\, \ln(mk)/m})$ under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as 1/m, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements. Keywords: dictionary learning, generalization bound, sparse representation

6 0.23836036 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

7 0.23767574 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

8 0.23648113 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

9 0.2361225 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

10 0.23469576 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes

11 0.23363064 12 jmlr-2011-Bayesian Co-Training

12 0.23309357 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

13 0.23092939 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

14 0.22989705 105 jmlr-2011-lp-Norm Multiple Kernel Learning

15 0.22935961 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

16 0.22822368 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models

17 0.22781698 16 jmlr-2011-Clustering Algorithms for Chains

18 0.22717685 59 jmlr-2011-Learning with Structured Sparsity

19 0.22663824 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

20 0.22546603 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models