jmlr jmlr2005 jmlr2005-30 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Motoaki Kawanabe, Klaus-Robert Müller
Abstract: A blind separation problem where the sources are not independent, but have variance dependencies is discussed. For this scenario Hyvärinen and Hurri (2004) proposed an algorithm which requires no assumption on distributions of sources and no parametric model of dependencies between components. In this paper, we extend the semiparametric approach of Amari and Cardoso (1997) to variance dependencies and study estimating functions for blind separation of such dependent sources. In particular, we show that many ICA algorithms are applicable to the variance-dependent model as well under mild conditions, although in principle they should not be. Our results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. We also study the asymptotic distribution of the quasi maximum likelihood method and the stability of the natural gradient learning in detail. Simulation results of artificial and realistic examples match well with our theoretical findings. Keywords: blind source separation, variance dependencies, independent component analysis, semiparametric statistical models, estimating functions
Reference: text
sentIndex sentText sentNum sentScore
1 IDA, Kekuléstrasse 7, 12489 Berlin, Germany, and Department of Computer Science, University of Potsdam, August-Bebel-Strasse 89, 14482 Potsdam, Germany. Editor: Aapo Hyvärinen. Abstract: A blind separation problem where the sources are not independent, but have variance dependencies is discussed. [sent-7, score-0.77]
2 For this scenario Hyvärinen and Hurri (2004) proposed an algorithm which requires no assumption on distributions of sources and no parametric model of dependencies between components. [sent-8, score-0.357]
3 In this paper, we extend the semiparametric approach of Amari and Cardoso (1997) to variance dependencies and study estimating functions for blind separation of such dependent sources. [sent-9, score-0.68]
4 Our results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. [sent-11, score-0.434]
5 We also study the asymptotic distribution of the quasi maximum likelihood method and the stability of the natural gradient learning in detail. [sent-12, score-0.285]
6 Keywords: blind source separation, variance dependencies, independent component analysis, semiparametric statistical models, estimating functions 1. [sent-14, score-0.605]
7 The basic model assumes that the observed signals are linear superpositions of underlying hidden source signals. [sent-22, score-0.272]
8 Let us denote the n source signals by the vector s(t). [sent-23, score-0.272]
9 For simplicity, we consider the case where the number of source signals equals that of observed signals (n = m). [sent-33, score-0.444]
10 In most blind source separation (BSS) methods, the source signals are assumed to be statistically independent. [sent-35, score-0.738]
11 By using non-Gaussianity of the sources, the mixing matrix can be estimated and the source signals can be extracted under appropriate conditions. [sent-37, score-0.337]
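As a concrete illustration of this basic model, the following minimal NumPy sketch (our own, not code from the paper) mixes non-Gaussian sources with a random matrix A and checks that the oracle demixing matrix B = A⁻¹ recovers them; any practical algorithm can only recover the sources up to permutation and scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 10000

# Non-Gaussian (Laplacian) i.i.d. source signals s(t).
s = rng.laplace(size=(n, T))

# Observed signals x(t) = A s(t) with a random mixing matrix A.
A = rng.normal(size=(n, n))
x = A @ s

# With the oracle demixing matrix B = A^{-1}, y(t) = B x(t) equals s(t);
# a BSS algorithm can at best find B ~ P D A^{-1} (permutation P, diagonal D).
B = np.linalg.inv(A)
y = B @ x
print(np.allclose(y, s))  # True
```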
12 The second-order methods are applicable to the case where the source signals have (lagged) auto-correlation. [sent-39, score-0.272]
13 Among many extensions of the basic ICA models, several researchers have studied the case where the source signals are not independent (for example, Cardoso, 1998b; Hyvärinen et al. [sent-41, score-0.41]
14 They simply assume that the sources are dependent only through their variances and that the sources have temporal correlation. [sent-47, score-0.371]
15 , 2001a), the dependencies of the sources are also caused only by their variances, but in contrast to the double blind case, they are determined by a prefixed neighborhood relation. [sent-49, score-0.512]
16 In particular, they showed that the quasi maximum likelihood (QML) estimation and the natural gradient learning give a correct solution regardless of the true source densities which satisfy certain mild conditions. [sent-54, score-0.315]
17 Thus our consistency results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. [sent-64, score-0.475]
18 Among several ICA algorithms, the quasi maximum likelihood method and its online version, the natural gradient learning, are discussed in detail. [sent-69, score-0.25]
19 We study the asymptotic distributions of the quasi maximum likelihood method (Section 5. [sent-70, score-0.252]
20 In particular, we carried out two experiments, where we extract two speech signals with high variance dependencies. [sent-77, score-0.271]
21 Variance-Dependent BSS Model Hyvärinen and Hurri (2004) formalized the probabilistic framework of variance-dependent blind separation. [sent-81, score-0.387]
22 Let us assume that each source signal si(t) is a product of a non-negative activity level vi(t) and an underlying i.i.d. normalized signal zi(t). [sent-82, score-0.473]
23 In practice, the activity levels vi(t) are often dependent among different signals and each observed signal is expressed as $x_i(t) = \sum_{j=1}^{n} a_{ij}\, v_j(t)\, z_j(t)$, $i = 1, \ldots, n$, [sent-96, score-0.59]
24 where vi(t) and zi(t) satisfy: (i) vi(t) and zj(t′) are independent for all i, j, t and t′, (ii) each zi(t) is i.i.d. [sent-99, score-0.376]
25 Regarding the general activity levels vi ’s, vi (t) and v j (t) are allowed to be statistically dependent, and furthermore, no particular assumption on these dependencies is made (double blind situation). [sent-107, score-0.801]
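A sketch of how data from this variance-dependent model can be simulated: the normalized signals zi(t) are i.i.d. with unit variance, while the activity levels are made strongly dependent by sharing a common slowly varying envelope. The smoothing construction below is our own illustrative choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 10000

# i.i.d. normalized signals z_i(t) with zero mean and unit variance.
z = rng.standard_normal((n, T))

# A shared, slowly varying envelope makes the activity levels v_1, v_2
# strongly dependent; no parametric model of this dependency is assumed.
envelope = 0.1 + np.abs(
    np.convolve(rng.standard_normal(T), np.ones(200) / 200, mode="same"))
v = np.vstack([envelope, envelope])  # extreme case: identical activity levels

s = v * z                      # sources s_i(t) = v_i(t) z_i(t)
A = rng.normal(size=(n, n))
x = A @ s                      # observed mixtures

# The sources are uncorrelated, but their energies are not:
print(np.corrcoef(s[0] ** 2, s[1] ** 2)[0, 1])  # clearly positive
```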
26 Figure 1: Sources (s1, s2) with variance dependencies in the variance-dependent BSS model; each source is the product of an activity level (v1, v2) and a normalized signal (z1, z2). [sent-111, score-0.41]
27 However, since the sequences z1 and z2 are multiplied by extremely dependent activity levels v1 and v2, respectively, the short-term variances of the source signals s1 and s2 are highly correlated. [sent-113, score-0.563]
28 It is important to remark that the nonstationary algorithm by Pham and Cardoso (2000) was also designed for the same source model (1), except that vi (t)’s are assumed to be deterministic and slowly varying. [sent-121, score-0.324]
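That approach can be sketched as follows: with slowly varying vi(t), the covariance of x(t) on short blocks changes over time, and all blockwise covariance matrices are (approximately) jointly diagonalized by the same demixing matrix. The code below only computes these matrices; the joint diagonalizer and the block length are left as assumptions of ours.

```python
import numpy as np

def blockwise_covariances(x, block_len=500):
    # Covariance of x(t) on consecutive blocks; with slowly varying
    # activity levels, one common demixing matrix B approximately
    # diagonalizes all of them (joint diagonalization).
    n, T = x.shape
    covs = []
    for start in range(0, T - block_len + 1, block_len):
        covs.append(np.cov(x[:, start:start + block_len]))
    return covs
```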
29 $B = (b_1, \ldots, b_n)^\top = A^{-1}$ is the demixing matrix to be estimated and $\rho_s(s) = \prod_{i=1}^{n} \rho_{s_i}(s_i)$ is the density of the sources s. [sent-129, score-0.321]
30 x(t) = (x1(t), …, xn(t))ᵀ : observed signals at t; s(t) = (s1(t), …, sn(t))ᵀ : source signals at t; [sent-147, score-0.272]
31 v(t) = (v1(t), …, vn(t))ᵀ : general activity levels of the sources s(t); V = (v(1), …, v(T)) : entire sequence of activity levels; [sent-150, score-0.405]
32 z(t) = (z1(t), …, zn(t))ᵀ : source signals normalized by the activity levels v(t); A : n × n mixing matrix; [sent-156, score-0.581]
33 B = (bij) = (b1, …, bn)ᵀ : demixing matrix, which is equivalent to A⁻¹; ρz(z) = ∏ᵢ₌₁ⁿ ρzᵢ(zᵢ) : density of the normalized source signals z; ρV(V) : density of the entire sequence V = (v(1), …, v(T)) of the activity levels; [sent-159, score-0.432]
34 y(t) = Bx(t) : sources extracted by the demixing matrix B; F(x, B) or F̄(X, B) : estimating function, an n × n matrix-valued function of the data and the parameter B; vec(F) : vectorization operator, (F11, …, Fnn)ᵀ. [sent-162, score-0.661]
35 Table 1: List of notations used in the variance-dependent BSS model. In the variance-dependent BSS model which we consider, the sources s(t) are decomposed into two components, the normalized signals z(t) = (z1(t), …, zn(t))ᵀ and the activity levels v(t). [sent-171, score-0.333]
36 Since the former has mutual independence like the ICA model, the density of the data X is factorized as $p(X \mid V; B, \rho_z) = |\det B|^T \prod_{t=1}^{T} \prod_{i=1}^{n} \frac{1}{v_i(t)}\, \rho_{z_i}\!\left( \frac{b_i^\top x(t)}{v_i(t)} \right)$ (3), when V = (v(1), …, v(T)) is given. [sent-178, score-0.312]
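A direct transcription of the factorized density (3) into a conditional log-likelihood; the choice of standard normal densities for the zi is purely illustrative, and the function name is ours.

```python
import numpy as np

def loglik_given_V(X, V, B):
    # log p(X | V; B, rho_z) from Eq. (3), with each rho_{z_i} taken to be
    # standard normal for illustration. X and V are n x T arrays, B is n x n.
    n, T = X.shape
    Y = B @ X                  # rows hold b_i^T x(t)
    Z = Y / V                  # candidate normalized signals z_i(t)
    log_rho = -0.5 * Z ** 2 - 0.5 * np.log(2 * np.pi)
    return T * np.log(np.abs(np.linalg.det(B))) + np.sum(log_rho - np.log(V))
```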
37 It should be noted that both in usual ICA models and in the variance-dependent BSS model, scales and orders of the sources cannot be determined, that is, two matrices B and PDB indicate the same distribution, when P and D are a permutation and a diagonal matrix respectively (Comon, 1994). [sent-234, score-0.251]
38 In general the quasi maximum likelihood estimator is no longer consistent because of the misspecified distribution. [sent-254, score-0.266]
39 However, in the ICA model (2), Amari and Cardoso (1997) found that the quasi maximum likelihood method and its online version (the natural gradient learning) give an asymptotically consistent estimator, provided that $F(x, B) = I - \varphi(Bx)(Bx)^\top$ satisfies (9) and (10). [sent-255, score-0.277]
40 It is shown in Section 5.1 that the quasi maximum likelihood method (11) still gives a consistent estimator even under this extended situation. [sent-259, score-0.266]
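A sketch of the sample version of this estimating function, F(x, B) = I − φ(Bx)(Bx)ᵀ; the tanh nonlinearity corresponds to the QML(tanh) variant appearing in the experiments, and the function name is ours.

```python
import numpy as np

def qml_estimating_function(X, B, phi=np.tanh):
    # Empirical average of F(x, B) = I - phi(y) y^T with y = B x;
    # the batch quasi-ML estimator solves (1/T) sum_t F(x(t), B) = 0.
    Y = B @ X
    n, T = Y.shape
    return np.eye(n) - (phi(Y) @ Y.T) / T
```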
41 Here E[· | V] denotes the conditional expectation of the data (i.e., of s(t)) under fixed activity levels V, while E[· | ρV] denotes the expectation over the activity levels V. [sent-286, score-0.4]
42 1 Asymptotic Distribution of the Quasi Maximum Likelihood Estimator In this section, it is shown that the quasi maximum likelihood method (11) (see, for example, Pham and Garat, 1997; Bell and Sejnowski, 1995) still gives a consistent estimator even under the extended model (3) and (4). [sent-313, score-0.266]
43 In that case, the quasi maximum likelihood estimator $\hat{B}_{\mathrm{QML}}$, derived from the equation $\bar{F}^{\mathrm{QML}}(X, \hat{B}_{\mathrm{QML}}) = 0$, is consistent regardless of the true nuisance functions $(\rho_z^*, \rho_V^*)$ under appropriate regularity conditions. [sent-339, score-0.329]
44 The natural gradient learning (Amari, 1998) $B(t+1) = B(t) + \eta(t)\left[ I - \varphi(y(t))\, y^\top(t) \right] B(t)$ (29) is an online algorithm based on the quasi maximum likelihood method, where y(t) = B(t)x(t) is the current estimate of the sources and η(t) is an appropriate learning constant. [sent-344, score-0.462]
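A direct transcription of the update rule (29); the learning rate and the initialization below are placeholder choices of ours and would need tuning in practice.

```python
import numpy as np

def natural_gradient_bss(X, eta=0.01, phi=np.tanh, seed=0):
    # Natural gradient learning (29): B <- B + eta [I - phi(y) y^T] B.
    rng = np.random.default_rng(seed)
    n, T = X.shape
    B = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    for t in range(T):
        y = B @ X[:, t]
        B = B + eta * (np.eye(n) - np.outer(phi(y), y)) @ B
    return B
```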
45 If we apply the online algorithm (29) to data with highly nonstationary variances like speech, the scale factor of the demixing matrix B changes substantially from time to time and never converges. [sent-352, score-0.252]
46 If all eigenvalues have negative real parts, then the equilibrium $B^*$ is asymptotically stable for the fixed activity levels V. [sent-364, score-0.271]
47 Since the matrix can be expressed as $\partial\, \mathrm{vec}\{ E[\bar{F}^{\mathrm{NG}}(x(t), B^*) \mid V]\, B^* \} / \partial\, \mathrm{vec}(B) = \bar{B}^*\, \left( \partial\, \mathrm{vec}\, E[\bar{F}^{\mathrm{NG}} \mid V] / \partial\, \mathrm{vec}(\chi) \right) (\bar{B}^*)^{-1}$ (33), where $\bar{B}^* = (\bar{B}^*_{ij;kl}) = (\delta_{ik} b^*_{jl})$ and the derivative is taken w.r.t. χ. [sent-365, score-0.285]
48 Theorem 4 If the stochastic process V = {v(t), t ≥ 0} of the activity levels satisfies the conditions (34)–(36) with probability 1 for the true parameter $(B^*, \rho_z^*, \rho_V^*)$, then the true demixing matrix $B^*$ becomes an asymptotically stable equilibrium of the flow (30) with probability 1. [sent-371, score-0.465]
49 For the cubic function $\varphi_i(y_i) = y_i^3$, unlike in the ICA model, the condition that all signals are sub-Gaussian, $\gamma_{3i} = E[|z_i|^4] / (E[|z_i|^2])^2 < 3$, is not enough; the variation of the activity levels $v_i$ from (1), which enters through moments of the form $E[v_i^{p+1}]\, E[v_j^{p+1}]$, should also be taken into account. [sent-385, score-0.692]
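The sub-Gaussianity part of this condition is easy to check empirically; the sketch below computes the sample version of γ3i per component, while the additional moment condition on the activity levels is not reproduced here.

```python
import numpy as np

def is_sub_gaussian(z):
    # gamma_3i = E[z_i^4] / (E[z_i^2])^2 < 3 (sub-Gaussianity) per row of z.
    # Necessary, but for the variance-dependent model not sufficient, for
    # stability of the cubic nonlinearity phi_i(y_i) = y_i^3.
    gamma3 = np.mean(z ** 4, axis=1) / np.mean(z ** 2, axis=1) ** 2
    return gamma3 < 3
```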
50 3 Properties of Other BSS Algorithms Although we concentrated on estimating functions of the form (20), we can deal with more general functions and investigate other ICA algorithms within the framework of estimating functions and asymptotic estimating functions (see also Cardoso, 1997). [sent-397, score-0.325]
51 The double blind algorithm (Hyvärinen and Hurri, 2004) cannot be applied to the case where the variance structures of the sources are the same or there is no temporal variance-dependency. [sent-406, score-0.688]
52 The nonstationary algorithm by Pham and Cardoso (2000) is not applicable to the case where time courses of the activity levels are proportional to each other. [sent-407, score-0.301]
53 The eight batch algorithms and the online versions of the quasi maximum likelihood methods listed in Table 3 were applied to those data sets. [sent-413, score-0.25]
54 Table 3 lists the algorithms and their references: FastICA (Hyvärinen, 1999), the double blind method (Hyvärinen and Hurri, 2004), JADE (Cardoso and Souloumiac, 1993), TDSEP/SOBI (Ziehe and Müller, 1998; Belouchrani et al. [sent-419, score-0.612]
55 ), together with whether each is consistent for the variance-dependent model (asymptotically yes / always, since only the case without auto-correlations is considered here) and its failure condition, e.g., when the time courses of the activity levels are proportional to each other. [sent-423, score-0.271]
56 1 Artificial Data Sets In all artificial data sets, five source signals of various types with length T = 10000 were generated, and the data were observed after multiplying by a random 5 × 5 mixing matrix. [sent-429, score-0.337]
57 In the third and the fourth data sets, the activity levels vi(t) are sinusoidal functions with different frequencies. [sent-452, score-0.369]
58 ’Sepagaus’ also showed … [Result table: separation performance of QML(tanh), QML(pow3), Online(tanh), Online(pow3) and ’DoubleBlind’ on the data sets ar subG, ar uni, sin supG, sin subG, com supG, com subG, exp supG, uni subG, sss and v12.] [sent-464, score-0.352]
59 [Result table: separation performance of JADE, FastICA(tanh), FastICA(pow3), TDSEP/SOBI and ’Sepagaus’ on the same data sets.] [sent-557, score-0.352]
60 The double blind algorithm (’DoubleBlind’) by Hyvärinen and Hurri (2004) does not work when (i) all vi’s have the same temporal structure, and (ii) there exist no temporal dependencies in vi’s. [sent-653, score-0.837]
61 ’Sepagaus’ does not have a guarantee to separate sources either, because smoothed sequences of the activity levels are nearly proportional to each other (see Table 2). [sent-654, score-0.405]
62 Speech and audio signals have often been used as sources s(t) even for experiments of the instantaneous ICA model. [sent-702, score-0.333]
63 We used the separated signals of their second demo as the sources, because their separation quality is good enough. [sent-705, score-0.289]
64 Figure 2 shows the sources and the estimators of their activity levels with 8. [sent-706, score-0.431]
65 We inserted one short pause at different positions of both sequences to make the correlation of the activity levels of the modified signals much larger (0. [sent-711, score-0.416]
66 Figure 3 shows the sources and the estimators of the activity levels. [sent-714, score-0.343]
67 Correlation of the activity levels of the arranged signals becomes 0. [sent-716, score-0.416]
68 Figure 2: The sources of the data set ’sss’ and the estimators of their activity levels. [sent-718, score-0.343]
69 Figure 3: The sources of the data set ’v12’ and the estimators of their activity levels. [sent-721, score-0.343]
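The activity-level estimates plotted in Figures 2 and 3 can be obtained, for example, by smoothing the magnitude of each source; the moving-average window below is an arbitrary choice of ours, not necessarily the estimator used in the paper.

```python
import numpy as np

def estimate_activity_levels(s, window=1000):
    # Crude estimator of v_i(t): moving average of |s_i(t)| per component.
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(np.abs(si), kernel, mode="same") for si in s])

# Correlation of the estimated activity levels, as reported for 'sss' / 'v12':
# v_hat = estimate_activity_levels(s)
# print(np.corrcoef(v_hat[0], v_hat[1])[0, 1])
```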
70 A 2 × 2 mixing matrix A was randomly generated 100 times and 100 different mixtures of the source signals were made. [sent-724, score-0.337]
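Separation quality over such random mixings is commonly scored with a permutation- and scale-invariant measure such as the Amari index of P = B̂A; this implementation is our sketch of the standard definition, not necessarily the exact error measure used in the paper.

```python
import numpy as np

def amari_index(B_hat, A):
    # Amari performance index of P = B_hat A; it is zero exactly when P is a
    # scaled permutation matrix, i.e., when separation is perfect.
    P = np.abs(B_hat @ A)
    n = P.shape[0]
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (row.sum() + col.sum()) / (2 * n * (n - 1))
```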
71 Conclusions In this paper, we discussed semiparametric estimation for blind source separation, when sources have variance dependencies. [sent-741, score-0.67]
72 Hyvärinen and Hurri (2004) introduced the double blind setting where, in addition to source distributions, dependencies between components are not restricted by any parametric model. [sent-742, score-0.589]
73 In the presence of these two nuisance parameters (densities of the activity levels and of the underlying signals), they proposed an algorithm based on lagged 4th-order cumulants. [sent-743, score-0.278]
74 Although their algorithm works well in many cases, it fails if (i) all vi ’s have similar temporal structure, or (ii) there exist no temporal dependencies in vi ’s. [sent-744, score-0.406]
75 Extending the semiparametric approach (Amari and Cardoso, 1997) under variance dependencies, we investigated estimating functions for the variance-dependent BSS model. [sent-746, score-0.256]
76 In particular, we proved that the quasi maximum likelihood estimator is derived from an estimating function, and is hence consistent regardless of the true nuisance densities (which satisfy certain mild conditions). [sent-747, score-0.425]
77 We also analyzed other ICA algorithms within the framework of (asymptotic) estimating functions and showed that many of them can separate sources with coherent variances. [sent-748, score-0.257]
78 Comments on Other Selected BSS Algorithms We will discuss in the following the local consistency of ICA/BSS algorithms other than the quasi maximum likelihood method. [sent-774, score-0.256]
79 1 FastICA FastICA is one of the standard algorithms for blind source separation. [sent-776, score-0.349]
80 We use, in the following, the notation W for the demixing matrix after whitening, in order to distinguish it from the total demixing matrix B = WC^{−1/2}, which includes the whitening process. [sent-782, score-0.32]
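A sketch of the whitening step: C is the sample covariance of x, the whitened data is C^{-1/2}x, and the orthogonal matrix W estimated afterwards combines with the whitener into the total demixing matrix B = WC^{-1/2}.

```python
import numpy as np

def whiten(x):
    # Return whitened data and the whitening matrix C^{-1/2}.
    C = np.cov(x)
    eigval, eigvec = np.linalg.eigh(C)
    C_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return C_inv_sqrt @ x, C_inv_sqrt

# After estimating an orthogonal W on the whitened data,
# the total demixing matrix is B = W @ C_inv_sqrt.
```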
81 If (40) holds for i = 1, …, n, then it is easy to show that the expectations of the left-hand sides of (38) and (39) vanish regardless of the nuisance functions ρz and ρV, in the same way as for the quasi maximum likelihood method. [sent-796, score-0.278]
82 If the other regularity conditions hold, it becomes an estimating function and the estimator B derived from it converges to the correct demixing matrix B* = (A*)^{−1} up to a permutation matrix P and a diagonal matrix D. [sent-798, score-0.407]
83 Although the estimating function is similar to that of the quasi maximum likelihood, the FastICA algorithm is based on Newton's method, and therefore it has globally more stable dynamics than the natural gradient learning. [sent-799, score-0.273]
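For concreteness, the standard one-unit FastICA fixed-point iteration on whitened data with g = tanh; this is the textbook form of the algorithm, not code from the paper.

```python
import numpy as np

def fastica_one_unit(x_white, n_iter=100, seed=0):
    # Newton-type fixed point: w <- E[x g(w^T x)] - E[g'(w^T x)] w, normalized.
    rng = np.random.default_rng(seed)
    n, T = x_white.shape
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = w @ x_white
        w_new = (x_white * np.tanh(wx)).mean(axis=1) \
                - (1 - np.tanh(wx) ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(w_new @ w) > 1 - 1e-10:   # converged (up to sign)
            return w_new
        w = w_new
    return w
```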
84 2 The Double Blind Algorithm by Hyvärinen and Hurri (2004) Hyvärinen and Hurri (2004) proposed an algorithm for separating sources under the double blind situation. [sent-801, score-0.73]
85 Provided that the matrix K = (Ki j ) is non-singular, the quantity J is maximized when Q is a signed permutation matrix, that is, by maximizing the criterion J we can estimate the true demixing matrix B∗ = (A∗ )−1 up to signed permutation matrices. [sent-805, score-0.308]
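As a rough sketch only: the criterion is built from lagged fourth-order statistics of the output energies. The toy objective below, a sum of squared lagged covariances of squared outputs, is in the same spirit, but the exact criterion J of Hyvärinen and Hurri (2004) differs in detail.

```python
import numpy as np

def lagged_energy_objective(Y, lag=1):
    # Sum over i, j of cov(y_i(t)^2, y_j(t + lag)^2)^2; an illustrative
    # stand-in for the double blind criterion J (to be maximized).
    E = Y ** 2
    E = E - E.mean(axis=1, keepdims=True)
    T = Y.shape[1]
    C = (E[:, :T - lag] @ E[:, lag:].T) / (T - lag)
    return np.sum(C ** 2)
```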
86 If the other regularity condition holds, F(X, B) turns out to be an asymptotic estimating function which is asymptotically equivalent to an estimating function and the estimator B converges to the correct demixing matrix B∗ = (A∗ )−1 . [sent-817, score-0.493]
87 If W is the true demixing matrix and the yi's are the signals extracted with W, the components $K_{ijkl} := \mathrm{cum}(y_i, y_j, y_k, y_l)$ of the expected cumulant are zero except for i = j = k = l, or i = j ≠ k = l, or i = l ≠ j = k. [sent-822, score-0.434]
88 Thus, one needs only to show that the estimating equation is associated with the minimization of $\sum_{i \neq j} |\mathrm{cum}(y_i, y_j, y_i, y_j)|^2$ under the orthogonality constraints, which is satisfied when the yi equal the true sources (up to a scaling and a permutation). [sent-823, score-0.355]
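The quantities entering this criterion can be estimated as sample fourth-order cross-cumulants; a sketch for zero-mean outputs, with names of our own choosing:

```python
import numpy as np

def cum4(yi, yj):
    # Sample cumulant cum(y_i, y_j, y_i, y_j) for zero-mean signals:
    # E[y_i^2 y_j^2] - E[y_i^2] E[y_j^2] - 2 E[y_i y_j]^2.
    return (np.mean(yi ** 2 * yj ** 2)
            - np.mean(yi ** 2) * np.mean(yj ** 2)
            - 2 * np.mean(yi * yj) ** 2)

def jade_offdiag_criterion(Y):
    # Sum over i != j of |cum(y_i, y_j, y_i, y_j)|^2 (to be minimized).
    n = Y.shape[0]
    return sum(cum4(Y[i], Y[j]) ** 2
               for i in range(n) for j in range(n) if i != j)
```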
89 When the sources si ’s are mutually independent and have temporal covariance structure, the demixing matrix PD(A∗ )−1 can diagonalize all lagged covariance matrices R(∆t), where P is a permutation matrix and D is a diagonal matrix. [sent-829, score-0.538]
90 This property has been used in blind separation methods with second order statistics (Tong et al. [sent-830, score-0.366]
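The second-order statistics in question are the lagged covariance matrices R(Δt); a sketch of their sample estimate, which methods such as TDSEP/SOBI then diagonalize jointly over several lags:

```python
import numpy as np

def lagged_covariance(x, lag):
    # Symmetrized sample estimate of R(lag) = E[x(t) x(t + lag)^T].
    xc = x - x.mean(axis=1, keepdims=True)
    T = xc.shape[1]
    R = (xc[:, :T - lag] @ xc[:, lag:].T) / (T - lag)
    return 0.5 * (R + R.T)

# TDSEP/SOBI jointly diagonalizes {R(lag)} for several lags after whitening.
```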
91 An information-maximization approach to blind separation and blind deconvolution. [sent-898, score-0.615]
92 A blind source separation technique based on second order statistics. [sent-906, score-0.252]
93 Injecting noise for analysing the stability of ICA components. [sent-976, score-0.268]
94 Blind separation of sources that have spatiotemporal variance dependencies. [sent-998, score-0.325]
95 Estimating functions for blind separation when sources have variance-dependencies. [sent-1021, score-0.316]
96 On-line learning in switching and drifting environments with application to blind source separation, pages 93–110. [sent-1084, score-0.249]
97 Blind separation of instantaneous mixture of sources via an independent component analysis. [sent-1121, score-0.278]
98 Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. [sent-1137, score-0.316]
99 Adaptive on-line learning algorithms for blind separation: Maximum entropy and minimum mutual information. [sent-1189, score-0.249]
100 TDSEP – an efficient algorithm for blind separation using time structure. [sent-1195, score-0.366]
wordName wordTfidf (topN-words)
[('bss', 0.32), ('qml', 0.304), ('vec', 0.251), ('blind', 0.249), ('ica', 0.235), ('fastica', 0.188), ('quasi', 0.177), ('signals', 0.172), ('sources', 0.161), ('subg', 0.16), ('activity', 0.156), ('hyv', 0.143), ('rinen', 0.138), ('cardoso', 0.131), ('tanh', 0.129), ('hurri', 0.127), ('awanabe', 0.127), ('uller', 0.127), ('demixing', 0.126), ('vi', 0.125), ('bx', 0.12), ('stimating', 0.118), ('separation', 0.117), ('semiparametric', 0.113), ('source', 0.1), ('ependent', 0.099), ('amari', 0.097), ('estimating', 0.096), ('supg', 0.092), ('jade', 0.088), ('unctions', 0.088), ('levels', 0.088), ('doubleblind', 0.084), ('sepagaus', 0.084), ('ziehe', 0.069), ('unbiasedness', 0.067), ('meinecke', 0.064), ('nuisance', 0.063), ('zi', 0.063), ('cum', 0.059), ('lagged', 0.059), ('dependencies', 0.058), ('kawanabe', 0.057), ('pham', 0.057), ('nonstationary', 0.057), ('fii', 0.056), ('ar', 0.055), ('av', 0.053), ('speech', 0.052), ('estimator', 0.051), ('temporal', 0.049), ('signal', 0.049), ('yi', 0.049), ('variance', 0.047), ('ki', 0.045), ('double', 0.044), ('ller', 0.043), ('si', 0.043), ('sss', 0.042), ('vip', 0.042), ('remark', 0.042), ('consistency', 0.041), ('ng', 0.041), ('bi', 0.039), ('likelihood', 0.038), ('asymptotic', 0.037), ('sl', 0.035), ('online', 0.035), ('sin', 0.035), ('equilibrium', 0.034), ('matrix', 0.034), ('com', 0.034), ('fji', 0.034), ('jjade', 0.034), ('kil', 0.034), ('kli', 0.034), ('stability', 0.033), ('permutation', 0.032), ('cov', 0.031), ('uni', 0.031), ('mixing', 0.031), ('nonlinearity', 0.029), ('yk', 0.028), ('asymptotically', 0.027), ('bell', 0.026), ('estimators', 0.026), ('condition', 0.026), ('amariindex', 0.025), ('godambe', 0.025), ('jkl', 0.025), ('wc', 0.025), ('jl', 0.025), ('murata', 0.025), ('belouchrani', 0.025), ('signed', 0.025), ('harmeling', 0.025), ('worked', 0.025), ('fi', 0.024), ('scales', 0.024), ('det', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 30 jmlr-2005-Estimating Functions for Blind Separation When Sources Have Variance Dependencies
Author: Motoaki Kawanabe, Klaus-Robert Müller
Abstract: A blind separation problem where the sources are not independent, but have variance dependencies is discussed. For this scenario Hyvärinen and Hurri (2004) proposed an algorithm which requires no assumption on distributions of sources and no parametric model of dependencies between components. In this paper, we extend the semiparametric approach of Amari and Cardoso (1997) to variance dependencies and study estimating functions for blind separation of such dependent sources. In particular, we show that many ICA algorithms are applicable to the variance-dependent model as well under mild conditions, although in principle they should not be. Our results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. We also study the asymptotic distribution of the quasi maximum likelihood method and the stability of the natural gradient learning in detail. Simulation results of artificial and realistic examples match well with our theoretical findings. Keywords: blind source separation, variance dependencies, independent component analysis, semiparametric statistical models, estimating functions
2 0.21109863 65 jmlr-2005-Separating a Real-Life Nonlinear Image Mixture
Author: Luís B. Almeida
Abstract: When acquiring an image of a paper document, the image printed on the back page sometimes shows through. The mixture of the front- and back-page images thus obtained is markedly nonlinear, and thus constitutes a good real-life test case for nonlinear blind source separation. This paper addresses a difficult version of this problem, corresponding to the use of “onion skin” paper, which results in a relatively strong nonlinearity of the mixture, which becomes close to singular in the lighter regions of the images. The separation is achieved through the MISEP technique, which is an extension of the well known INFOMAX method. The separation results are assessed with objective quality measures. They show an improvement over the results obtained with linear separation, but have room for further improvement. Keywords: ICA, blind source separation, nonlinear mixtures, nonlinear separation, image mixture, image separation
3 0.17053783 25 jmlr-2005-Denoising Source Separation
Author: Jaakko Särelä, Harri Valpola
Abstract: A new algorithmic framework called denoising source separation (DSS) is introduced. The main benefit of this framework is that it allows for the easy development of new source separation algorithms which can be optimised for specific problems. In this framework, source separation algorithms are constructed around denoising procedures. The resulting algorithms can range from almost blind to highly specialised source separation algorithms. Both simple linear and more complex nonlinear or adaptive denoising schemes are considered. Some existing independent component analysis algorithms are reinterpreted within the DSS framework and new, robust blind source separation algorithms are suggested. The framework is derived as a one-unit equivalent to an EM algorithm for source separation. However, in the DSS framework it is easy to utilise various kinds of denoising procedures which need not be based on generative models. In the experimental section, various DSS schemes are applied extensively to artificial data, to real magnetoencephalograms and to simulated CDMA mobile network signals. Finally, various extensions to the proposed DSS algorithms are considered. These include nonlinear observation mappings, hierarchical models and over-complete, nonorthogonal feature spaces. With these extensions, DSS appears to have relevance to many existing models of neural information processing. Keywords: blind source separation, BSS, prior information, denoising, denoising source separation, DSS, independent component analysis, ICA, magnetoencephalograms, MEG, CDMA
4 0.13119859 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching
Author: Aapo Hyvärinen
Abstract: One often wants to estimate statistical models where the probability density function is known only up to a multiplicative normalization constant. Typically, one then has to resort to Markov Chain Monte Carlo methods, or approximations of the normalization constant. Here, we propose that such models can be estimated by minimizing the expected squared distance between the gradient of the log-density given by the model and the gradient of the log-density of the observed data. While the estimation of the gradient of log-density function is, in principle, a very difficult non-parametric problem, we prove a surprising result that gives a simple formula for this objective function. The density function of the observed data does not appear in this formula, which simplifies to a sample average of a sum of some derivatives of the log-density given by the model. The validity of the method is demonstrated on multivariate Gaussian and independent component analysis models, and by estimating an overcomplete filter set for natural image data. Keywords: statistical estimation, non-normalized densities, pseudo-likelihood, Markov chain Monte Carlo, contrastive divergence
5 0.10871425 41 jmlr-2005-Kernel Methods for Measuring Independence
Author: Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf
Abstract: We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis. Keywords: independence, covariance operator, mutual information, kernel, Parzen window estimate, independent component analysis c 2005 Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet and Bernhard Schölkopf . G RETTON , H ERBRICH , S MOLA , B OUSQUET AND S CHÖLKOPF
6 0.10214231 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial
7 0.04241588 36 jmlr-2005-Gaussian Processes for Ordinal Regression
8 0.04007059 50 jmlr-2005-Learning with Decision Lists of Data-Dependent Features
9 0.038630851 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach
10 0.037411004 32 jmlr-2005-Expectation Consistent Approximate Inference
11 0.036710184 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
12 0.031234957 67 jmlr-2005-Stability of Randomized Learning Algorithms
13 0.03102156 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification
14 0.029425606 13 jmlr-2005-Analysis of Variance of Cross-Validation Estimators of the Generalization Error
15 0.028333535 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables
16 0.026049038 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs
17 0.024316281 39 jmlr-2005-Information Bottleneck for Gaussian Variables
18 0.023918172 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning
19 0.023590151 49 jmlr-2005-Learning the Kernel with Hyperkernels (Kernel Machines Section)
20 0.022904582 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
topicId topicWeight
[(0, 0.205), (1, 0.351), (2, -0.434), (3, 0.032), (4, -0.136), (5, -0.121), (6, 0.145), (7, 0.061), (8, 0.033), (9, -0.065), (10, 0.116), (11, 0.185), (12, -0.061), (13, -0.039), (14, 0.01), (15, -0.013), (16, -0.031), (17, 0.043), (18, -0.009), (19, 0.035), (20, 0.015), (21, 0.074), (22, 0.032), (23, -0.043), (24, 0.067), (25, -0.016), (26, -0.023), (27, 0.032), (28, -0.049), (29, 0.028), (30, 0.002), (31, -0.001), (32, 0.004), (33, -0.045), (34, -0.022), (35, 0.043), (36, -0.029), (37, 0.005), (38, -0.041), (39, -0.001), (40, -0.036), (41, 0.02), (42, 0.009), (43, 0.008), (44, 0.012), (45, -0.001), (46, 0.031), (47, 0.065), (48, -0.024), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.97343957 30 jmlr-2005-Estimating Functions for Blind Separation When Sources Have Variance Dependencies
Author: Motoaki Kawanabe, Klaus-Robert Müller
Abstract: A blind separation problem where the sources are not independent, but have variance dependencies is discussed. For this scenario Hyvärinen and Hurri (2004) proposed an algorithm which requires no assumption on distributions of sources and no parametric model of dependencies between components. In this paper, we extend the semiparametric approach of Amari and Cardoso (1997) to variance dependencies and study estimating functions for blind separation of such dependent sources. In particular, we show that many ICA algorithms are applicable to the variance-dependent model as well under mild conditions, although in principle they should not be. Our results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. We also study the asymptotic distribution of the quasi maximum likelihood method and the stability of the natural gradient learning in detail. Simulation results of artificial and realistic examples match well with our theoretical findings. Keywords: blind source separation, variance dependencies, independent component analysis, semiparametric statistical models, estimating functions
2 0.80432427 25 jmlr-2005-Denoising Source Separation
Author: Jaakko Särelä, Harri Valpola
Abstract: A new algorithmic framework called denoising source separation (DSS) is introduced. The main benefit of this framework is that it allows for the easy development of new source separation algorithms which can be optimised for specific problems. In this framework, source separation algorithms are constructed around denoising procedures. The resulting algorithms can range from almost blind to highly specialised source separation algorithms. Both simple linear and more complex nonlinear or adaptive denoising schemes are considered. Some existing independent component analysis algorithms are reinterpreted within the DSS framework and new, robust blind source separation algorithms are suggested. The framework is derived as a one-unit equivalent to an EM algorithm for source separation. However, in the DSS framework it is easy to utilise various kinds of denoising procedures which need not be based on generative models. In the experimental section, various DSS schemes are applied extensively to artificial data, to real magnetoencephalograms and to simulated CDMA mobile network signals. Finally, various extensions to the proposed DSS algorithms are considered. These include nonlinear observation mappings, hierarchical models and over-complete, nonorthogonal feature spaces. With these extensions, DSS appears to have relevance to many existing models of neural information processing. Keywords: blind source separation, BSS, prior information, denoising, denoising source separation, DSS, independent component analysis, ICA, magnetoencephalograms, MEG, CDMA
3 0.78800315 65 jmlr-2005-Separating a Real-Life Nonlinear Image Mixture
Author: Luís B. Almeida
Abstract: When acquiring an image of a paper document, the image printed on the back page sometimes shows through. The mixture of the front- and back-page images thus obtained is markedly nonlinear, and thus constitutes a good real-life test case for nonlinear blind source separation. This paper addresses a difficult version of this problem, corresponding to the use of “onion skin” paper, which results in a relatively strong nonlinearity of the mixture, which becomes close to singular in the lighter regions of the images. The separation is achieved through the MISEP technique, which is an extension of the well known INFOMAX method. The separation results are assessed with objective quality measures. They show an improvement over the results obtained with linear separation, but have room for further improvement. Keywords: ICA, blind source separation, nonlinear mixtures, nonlinear separation, image mixture, image separation
4 0.3959417 41 jmlr-2005-Kernel Methods for Measuring Independence
Author: Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf
Abstract: We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis. Keywords: independence, covariance operator, mutual information, kernel, Parzen window estimate, independent component analysis
5 0.39273372 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching
Author: Aapo Hyvärinen
Abstract: One often wants to estimate statistical models where the probability density function is known only up to a multiplicative normalization constant. Typically, one then has to resort to Markov Chain Monte Carlo methods, or approximations of the normalization constant. Here, we propose that such models can be estimated by minimizing the expected squared distance between the gradient of the log-density given by the model and the gradient of the log-density of the observed data. While the estimation of the gradient of log-density function is, in principle, a very difficult non-parametric problem, we prove a surprising result that gives a simple formula for this objective function. The density function of the observed data does not appear in this formula, which simplifies to a sample average of a sum of some derivatives of the log-density given by the model. The validity of the method is demonstrated on multivariate Gaussian and independent component analysis models, and by estimating an overcomplete filter set for natural image data. Keywords: statistical estimation, non-normalized densities, pseudo-likelihood, Markov chain Monte Carlo, contrastive divergence
6 0.35716137 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial
7 0.15764308 36 jmlr-2005-Gaussian Processes for Ordinal Regression
8 0.14997822 50 jmlr-2005-Learning with Decision Lists of Data-Dependent Features
9 0.14739604 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
10 0.14591539 67 jmlr-2005-Stability of Randomized Learning Algorithms
11 0.13412131 6 jmlr-2005-A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs
12 0.13180043 13 jmlr-2005-Analysis of Variance of Cross-Validation Estimators of the Generalization Error
13 0.11777526 32 jmlr-2005-Expectation Consistent Approximate Inference
14 0.11701227 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables
15 0.11596604 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach
16 0.111631 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification
17 0.10698949 11 jmlr-2005-Algorithmic Stability and Meta-Learning
18 0.10335971 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning
19 0.10281343 17 jmlr-2005-Change Point Problems in Linear Dynamical Systems
20 0.10187912 72 jmlr-2005-What's Strange About Recent Events (WSARE): An Algorithm for the Early Detection of Disease Outbreaks
topicId topicWeight
[(1, 0.515), (13, 0.017), (17, 0.019), (19, 0.021), (36, 0.027), (37, 0.028), (43, 0.023), (47, 0.02), (49, 0.012), (52, 0.064), (59, 0.014), (70, 0.025), (80, 0.028), (88, 0.066), (90, 0.022), (94, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.7923879 30 jmlr-2005-Estimating Functions for Blind Separation When Sources Have Variance Dependencies
Author: Motoaki Kawanabe, Klaus-Robert Müller
Abstract: A blind separation problem where the sources are not independent, but have variance dependencies is discussed. For this scenario Hyvärinen and Hurri (2004) proposed an algorithm which requires no assumption on distributions of sources and no parametric model of dependencies between components. In this paper, we extend the semiparametric approach of Amari and Cardoso (1997) to variance dependencies and study estimating functions for blind separation of such dependent sources. In particular, we show that many ICA algorithms are applicable to the variance-dependent model as well under mild conditions, although in principle they should not be. Our results indicate that separation can be done based only on normalized sources which are adjusted to have stationary variances and is not affected by the dependent activity levels. We also study the asymptotic distribution of the quasi maximum likelihood method and the stability of the natural gradient learning in detail. Simulation results of artificial and realistic examples match well with our theoretical findings. Keywords: blind source separation, variance dependencies, independent component analysis, semiparametric statistical models, estimating functions
2 0.32050067 25 jmlr-2005-Denoising Source Separation
Author: Jaakko Särelä, Harri Valpola
Abstract: A new algorithmic framework called denoising source separation (DSS) is introduced. The main benefit of this framework is that it allows for the easy development of new source separation algorithms which can be optimised for specific problems. In this framework, source separation algorithms are constructed around denoising procedures. The resulting algorithms can range from almost blind to highly specialised source separation algorithms. Both simple linear and more complex nonlinear or adaptive denoising schemes are considered. Some existing independent component analysis algorithms are reinterpreted within the DSS framework and new, robust blind source separation algorithms are suggested. The framework is derived as a one-unit equivalent to an EM algorithm for source separation. However, in the DSS framework it is easy to utilise various kinds of denoising procedures which need not be based on generative models. In the experimental section, various DSS schemes are applied extensively to artificial data, to real magnetoencephalograms and to simulated CDMA mobile network signals. Finally, various extensions to the proposed DSS algorithms are considered. These include nonlinear observation mappings, hierarchical models and over-complete, nonorthogonal feature spaces. With these extensions, DSS appears to have relevance to many existing models of neural information processing. Keywords: blind source separation, BSS, prior information, denoising, denoising source separation, DSS, independent component analysis, ICA, magnetoencephalograms, MEG, CDMA
3 0.27043083 65 jmlr-2005-Separating a Real-Life Nonlinear Image Mixture
Author: Luís B. Almeida
Abstract: When acquiring an image of a paper document, the image printed on the back page sometimes shows through. The mixture of the front- and back-page images thus obtained is markedly nonlinear, and thus constitutes a good real-life test case for nonlinear blind source separation. This paper addresses a difficult version of this problem, corresponding to the use of “onion skin” paper, which results in a relatively strong nonlinearity of the mixture, which becomes close to singular in the lighter regions of the images. The separation is achieved through the MISEP technique, which is an extension of the well known INFOMAX method. The separation results are assessed with objective quality measures. They show an improvement over the results obtained with linear separation, but have room for further improvement. Keywords: ICA, blind source separation, nonlinear mixtures, nonlinear separation, image mixture, image separation
4 0.25473166 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching
Author: Aapo Hyvärinen
Abstract: One often wants to estimate statistical models where the probability density function is known only up to a multiplicative normalization constant. Typically, one then has to resort to Markov Chain Monte Carlo methods, or approximations of the normalization constant. Here, we propose that such models can be estimated by minimizing the expected squared distance between the gradient of the log-density given by the model and the gradient of the log-density of the observed data. While the estimation of the gradient of log-density function is, in principle, a very difficult non-parametric problem, we prove a surprising result that gives a simple formula for this objective function. The density function of the observed data does not appear in this formula, which simplifies to a sample average of a sum of some derivatives of the log-density given by the model. The validity of the method is demonstrated on multivariate Gaussian and independent component analysis models, and by estimating an overcomplete filter set for natural image data. Keywords: statistical estimation, non-normalized densities, pseudo-likelihood, Markov chain Monte Carlo, contrastive divergence
5 0.24456449 41 jmlr-2005-Kernel Methods for Measuring Independence
Author: Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf
Abstract: We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis. Keywords: independence, covariance operator, mutual information, kernel, Parzen window estimate, independent component analysis
6 0.22121902 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial
7 0.20405903 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning
8 0.20194535 49 jmlr-2005-Learning the Kernel with Hyperkernels (Kernel Machines Section)
9 0.20079044 71 jmlr-2005-Variational Message Passing
10 0.2004187 32 jmlr-2005-Expectation Consistent Approximate Inference
11 0.20035328 36 jmlr-2005-Gaussian Processes for Ordinal Regression
12 0.19843031 64 jmlr-2005-Semigroup Kernels on Measures
13 0.19757433 39 jmlr-2005-Information Bottleneck for Gaussian Variables
14 0.19611937 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints
15 0.19574215 3 jmlr-2005-A Classification Framework for Anomaly Detection
16 0.19453083 44 jmlr-2005-Learning Module Networks
17 0.19428803 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels
18 0.192766 11 jmlr-2005-Algorithmic Stability and Meta-Learning
19 0.19267331 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
20 0.19138713 20 jmlr-2005-Clustering with Bregman Divergences