nips nips2004 nips2004-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm that incrementally brings more highly-correlated (non-independent) signals nearer to each other. The other is the local-ICA phase, where each neighboring (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all pairs, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. [sent-9, score-0.449]
2 One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm that incrementally brings more highly-correlated (non-independent) signals nearer to each other. [sent-11, score-0.531]
3 The other is the local-ICA phase, where each neighboring (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. [sent-12, score-0.39]
4 Because LMICA separates only the highly-correlated pairs instead of all pairs, it can extract independent components quite efficiently from appropriate observed signals. [sent-13, score-0.232]
5 Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing. [sent-15, score-0.145]
6 1 Introduction: Independent component analysis (ICA) is a recently developed method in the fields of signal processing and artificial neural networks, and has been shown to be quite useful for the blind separation problem [1][2][3][4]. [sent-16, score-0.202]
7 Let s and A be the N-dimensional source signal vector and an N × N mixing matrix, respectively. [sent-18, score-0.213]
8 Then, the observed signals x are defined as x = As. (1) [sent-19, score-0.165]
9 The purpose is to find A (or its inverse W) when only the observed (mixed) signals are given. [sent-20, score-0.165]
10 In other words, ICA blindly extracts the source signals from M samples of the observed signals as follows: $\hat{S} = WX$, (2) (footnote ∗: http://www. ... jp/˜matsuda) [sent-21, score-0.352]
11 where X is an N × M matrix of the observed signals and $\hat{S}$ is the estimate of the source signals. [sent-26, score-0.208]
12 This is a typical ill-conditioned problem, but ICA can solve it by assuming that the source signals are generated according to independent and non-gaussian probability distributions. [sent-27, score-0.243]
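To make this setup concrete, here is a minimal NumPy sketch of Eqs. (1)-(2); the dimensions, the Laplacian sources, and the random mixing matrix are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 8, 10000               # number of signals and of samples (illustrative)
S = rng.laplace(size=(N, M))  # sparse, non-Gaussian source signals s
A = rng.normal(size=(N, N))   # unknown mixing matrix
X = A @ S                     # observed signals, Eq. (1): x = A s

# ICA estimates an unmixing matrix W so that S_hat = W @ X, Eq. (2),
# recovers the sources up to permutation and scaling.
```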
13 Some efficient algorithms for this optimization problem have been proposed, for example, the fast ICA algorithm [5][6], the relative gradient algorithm [4], and JADE [7][8]. [sent-32, score-0.059]
14 Now, suppose that quite high-dimensional observed signals (namely, N is quite large) are given such as large-size natural scenes. [sent-33, score-0.347]
15 Recently, we proposed a new algorithm for this problem, which can find global independent components by integrating local ICA modules. [sent-35, score-0.135]
16 Developing this approach further in this paper, we propose a new efficient ICA algorithm named "the linear multilayer ICA algorithm" (LMICA). [sent-36, score-0.137]
17 It will be shown in this paper that LMICA is more efficient than other standard ICA algorithms in the processing of natural scenes. [sent-37, score-0.123]
18 In Section 3, numerical experiments will verify that LMICA is quite efficient in image processing and can extract some interesting edge detectors from large natural scenes. [sent-41, score-0.416]
19 2.1 Basic idea: LMICA can extract all the independent components approximately by repetition of the following two phases. [sent-44, score-0.158]
20 One is the mapping phase, which brings more highly-correlated signals nearer. [sent-45, score-0.286]
21 The other is the local-ICA phase, where each neighboring pair of signals in the mapping is separated by the MaxKurt algorithm [8]. [sent-46, score-0.408]
22 It will be shown in Section 3 that this hierarchical model is quite effective at least in natural scenes. [sent-51, score-0.123]
23 2.2 Mapping phase: In the mapping phase, the given signals X are arranged in a one-dimensional array so that pairs (i, j) with higher $\sum_k x_{ik}^2 x_{jk}^2$ are placed nearer. [sent-53, score-0.601]
24 Letting $Y = (y_i)$, where $y_i$ is the coordinate of the i-th signal, the following objective function µ is defined: $\mu(Y) = \sum_{i,j} \left( \sum_k x_{ik}^2 x_{jk}^2 \right) (y_i - y_j)^2$. (3) [sent-54, score-0.318]
25 The optimal mapping is found by minimizing µ with respect to Y under the constraints that $\sum_i y_i = 0$ and $\sum_i y_i^2 = 1$. [sent-55, score-0.491]
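As a small illustration of Eq. (3) (a NumPy sketch; the variable names are mine), the objective can be evaluated from the matrix of squared signals:

```python
import numpy as np

def mapping_objective(X, y):
    """mu(Y) = sum_{i,j} (sum_k x_ik^2 x_jk^2) (y_i - y_j)^2, as in Eq. (3)."""
    X2 = X ** 2                         # element-wise squares
    C = X2 @ X2.T                       # C[i, j] = sum_k x_ik^2 x_jk^2
    D = (y[:, None] - y[None, :]) ** 2  # (y_i - y_j)^2
    return float(np.sum(C * D))
```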
26 It is well known that such optimization problems can be solved efficiently by a stochastic gradient algorithm [11][12]. [sent-56, score-0.064]
27 In this case, the stochastic gradient algorithm is given as follows (see [10] for the details of the derivation of this algorithm): $y_i(T+1) := y_i(T) - \lambda_T (z_i y_i \zeta + z_i \eta)$. (4) Figure 1: The illustration of LMICA (the ideal case): each number from 1 to 8 denotes a source signal. [sent-57, score-0.397]
28 In the first local-ICA phase, each neighboring pair of the completely-mixed signals (denoted "1-8") is partially separated into "1-4" and "5-8." [sent-58, score-0.248]
29 Next, the mapping phase rearranges the partially-separated signals so that more highly-correlated signals are nearer. [sent-59, score-0.57]
30 In consequence, the four "1-4" signals (similarly, the "5-8" ones) are brought nearer. [sent-60, score-0.169]
31 Then, the local-ICA phase partially separates the pairs of neighboring signals into "1-2," "3-4," "5-6," and "7-8." [sent-61, score-0.399]
32 By repetition of the two phases, LMICA can extract all the sources quite efficiently. [sent-62, score-0.14]
33 Here (in Eq. (4)), $\lambda_T$ is the step size at the T-th time step and $z_i = x_{ik}^2$, where k is randomly selected from {1, ..., M}. [sent-63, score-0.144]
34 Because the Y in the above method is continuous, each continuous $y_i$ is replaced by its rank within Y at the end of the mapping phase. [sent-70, score-0.217]
35 That is, $y_i := 1$ for the largest $y_i$, $y_i := N$ for the smallest one, and so on. [sent-71, score-0.15]
36 The corresponding permutation σ is given as $\sigma(i) = y_i$. [sent-72, score-0.118]
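For example (an illustrative snippet, not from the paper), the ranking step can be done with an argsort:

```python
import numpy as np

y = np.array([0.3, -0.7, 0.5, -0.1])              # continuous coordinates (illustrative)
ranks = np.empty(len(y), dtype=int)
ranks[np.argsort(-y)] = np.arange(1, len(y) + 1)  # 1 for the largest y_i, N for the smallest
# ranks is [2, 4, 1, 3]; the permutation is sigma(i) = ranks[i]
```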
37 The total procedure of the mapping phase for given X is described as follows (mapping phase; a code sketch follows the list): [sent-73, score-0.564]
38 1. $x_{ik} := x_{ik} - \bar{x}_i$ for each i, k, where $\bar{x}_i$ is the mean. [sent-74, score-0.186]
2. ...
39 (c) Normalize Y to satisfy $\sum_i y_i = 0$ and $\sum_i y_i^2 = 1$. [sent-84, score-0.15]
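The sketch below follows this procedure in spirit (Python/NumPy, illustrative). Since the exact definitions of ζ and η in Eq. (4) are given in [10] and are lost in this extraction, the update here is the stochastic gradient of Eq. (3) taken over one randomly chosen sample k per step; the step-size schedule and the number of steps are assumptions.

```python
import numpy as np

def mapping_phase(X, n_steps=5000, lr=0.05, rng=None):
    """Sketch of the mapping phase: stochastic gradient on Eq. (3), then ranking."""
    rng = np.random.default_rng() if rng is None else rng
    N, M = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)      # 1. remove the mean of each signal
    y = rng.normal(size=N)
    y -= y.mean()
    y /= np.linalg.norm(y)                      # enforce sum_i y_i = 0 and sum_i y_i^2 = 1
    for t in range(n_steps):
        k = rng.integers(M)                     # pick one sample k at random
        z = Xc[:, k] ** 2                       # z_i = x_ik^2
        grad = z * (y * z.sum() - (z * y).sum())  # gradient of Eq. (3) on sample k (up to a constant)
        y -= lr / (1.0 + t) * grad              # decreasing step size (lambda_T)
        y -= y.mean()
        y /= np.linalg.norm(y)                  # (c) re-normalize
    order = np.argsort(-y)                      # arrange signals by their coordinates
    sigma = np.empty(N, dtype=int)
    sigma[order] = np.arange(N)                 # sigma[i] = position of signal i (0-based)
    return Xc[order], sigma
```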
40 2.3 Local-ICA phase: In the local-ICA phase, the following contrast function φ(X) (the sum of kurtoses) is used (the MaxKurt algorithm in [8]): $\phi(X) = -\sum_{i,k} x_{ik}^4$, (7) and φ(X) is minimized by "rotating" the neighboring pairs of signals (namely, under an orthogonal transformation). [sent-90, score-0.539]
41 For each neighboring pair (i, i+1), a rotation matrix $R_i(\theta)$ is given as $R_i(\theta) = \begin{pmatrix} I_{i-1} & 0 & 0 & 0 \\ 0 & \cos\theta & \sin\theta & 0 \\ 0 & -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & I_{N-i-2} \end{pmatrix}$, (8) where $I_n$ is the n × n identity matrix. [sent-91, score-0.205]
42 Now, the procedure of the local-ICA phase for given X is described as follows (local-ICA phase; a code sketch follows the list): 1. ... [sent-94, score-0.28]
43 (b) $X := R_i(\theta) X$, $W_{local} := R_i W_{local}$, and $A_{local} := A_{local} R_i^t$. [sent-101, score-0.066]
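A sketch of one local-ICA sweep is shown below (illustrative). For brevity the rotation angle is found by a coarse numerical search over θ instead of the closed-form MaxKurt angle derived in [8], and the accumulation of A_local is omitted, so this illustrates Eqs. (7)-(8) rather than reproducing the exact procedure.

```python
import numpy as np

def local_ica_phase(X, n_angles=181):
    """Sketch: rotate each neighboring pair (i, i+1) to reduce phi(X) = -sum x_ik^4 (Eq. 7)."""
    X = X.copy()
    N = X.shape[0]
    W_local = np.eye(N)
    thetas = np.linspace(-np.pi / 4, np.pi / 4, n_angles)
    for i in range(N - 1):
        pair = X[i:i + 2, :]
        best_theta = 0.0
        best_phi = -np.sum(pair ** 4)                     # phi restricted to this pair
        for th in thetas:                                 # coarse search for the best angle
            c, s = np.cos(th), np.sin(th)
            rotated = np.array([[c, s], [-s, c]]) @ pair
            phi = -np.sum(rotated ** 4)
            if phi < best_phi:
                best_theta, best_phi = th, phi
        c, s = np.cos(best_theta), np.sin(best_theta)
        R2 = np.array([[c, s], [-s, c]])                  # the 2x2 block of R_i(theta), Eq. (8)
        X[i:i + 2, :] = R2 @ pair                         # X := R_i(theta) X
        W_local[i:i + 2, :] = R2 @ W_local[i:i + 2, :]    # W_local := R_i W_local
    return X, W_local
```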
44 2.4 Complete algorithm: The complete algorithm of LMICA for any given observed signals X is given by repeating the mapping phase and the local-ICA phase alternately. [sent-103, score-0.623]
45 Initial Settings: Let X be the given observed signal matrix, and W and A be $I_N$. [sent-106, score-0.047]
46 (a) Mapping Phase: Find the optimal permutation matrix $P_\sigma$ and the optimally-arranged signals X by the mapping phase. [sent-109, score-0.329]
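Putting the phases together, a hedged sketch of the complete loop is given below, using the illustrative mapping_phase and local_ica_phase helpers defined above; the number of layers and the handling of A are simplified, and the mean removal inside mapping_phase is not tracked in W.

```python
import numpy as np

def lmica(X, n_layers=10):
    """Sketch of the complete algorithm: alternate the mapping and local-ICA phases."""
    N = X.shape[0]
    W = np.eye(N)                          # accumulated unmixing matrix
    for _ in range(n_layers):
        Xp, sigma = mapping_phase(X)       # (a) mapping phase: rearrange the signals
        P = np.eye(N)[np.argsort(sigma)]   # permutation matrix P_sigma, so Xp = P @ X_centered
        X, W_loc = local_ica_phase(Xp)     # (b) local-ICA phase: separate neighboring pairs
        W = W_loc @ P @ W                  # overall unmixing so far
    return X, W
```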
47 The crucial difference between our LMICA and MaxKurt is that LMICA optimizes just the neighboring pairs instead of all the $N(N-1)/2$ pairs in MaxKurt. [sent-117, score-0.129]
48 In LMICA, the pairs with higher "costs" (higher $\sum_k x_{ik}^2 x_{jk}^2$) are brought nearer in the mapping phase. [sent-118, score-0.438]
49 So, independent components can be extracted effectively by optimizing just the neighbor pairs. [sent-119, score-0.142]
50 $\phi(X) = \sum_{i \neq j} \sum_k x_{ik}^2 x_{jk}^2$. (12) The minimization of Eq. (12) ... [sent-123, score-0.199]
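A short sketch for evaluating this contrast over all signal pairs (using the reconstruction of Eq. (12) above, which should be treated as an assumption about its exact form):

```python
import numpy as np

def global_contrast(X):
    """phi(X) = sum_{i != j} sum_k x_ik^2 x_jk^2 (reconstructed Eq. (12))."""
    X2 = X ** 2
    C = X2 @ X2.T                         # C[i, j] = sum_k x_ik^2 x_jk^2
    return float(C.sum() - np.trace(C))   # drop the i == j terms
```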
51 Because no pre-whitening method suitable for LMICA has been found yet, raw images of natural scenes are given as X in the numerical experiments in Section 3. [sent-129, score-0.299]
52 In this non-whitening case, the mixing matrix A is limited to be orthogonal and the influence of the second-order statistics is not removed. [sent-130, score-0.048]
53 3 Results: It is well known that various local edge detectors can be extracted from natural scenes by the standard ICA algorithm [13][14]. [sent-132, score-0.528]
54 30000 samples of natural scenes of 12 × 12 pixels were given as the observed signals X. [sent-134, score-0.437]
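For reproduction purposes, a hedged sketch of how such an observed-signal matrix could be built is given below; the paper does not specify the sampling procedure, so the random-patch strategy, the image source, and the function name are assumptions.

```python
import numpy as np

def sample_patches(images, n_samples=30000, size=12, rng=None):
    """Illustrative: draw random size x size patches and stack them as columns of X."""
    rng = np.random.default_rng() if rng is None else rng
    patches = []
    for _ in range(n_samples):
        img = images[rng.integers(len(images))]            # pick a random image
        r = rng.integers(img.shape[0] - size + 1)          # top-left corner of the patch
        c = rng.integers(img.shape[1] - size + 1)
        patches.append(img[r:r + size, c:c + size].ravel())
    return np.array(patches).T    # X has shape (size*size, n_samples), e.g. (144, 30000)
```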
55 The number of layers L was set to 720, where one layer means one pair of the mapping and local-ICA phases. [sent-140, score-0.342]
56 For comparison, the experiments without the mapping phase were carried out, where the mapping Y was randomly generated. [sent-141, score-0.445]
57 Fig. 2-(a) shows the decreasing curves of φ for normal LMICA and for the one without the mapping phase. [sent-146, score-0.178]
58 The cross points show the result at each iteration of MaxKurt. [sent-147, score-0.048]
59 Because one iteration of MaxKurt is equivalent to 72 layers of LMICA with respect to the number of pair-wise optimizations of the signals, a scaling (×72) is applied. [sent-148, score-0.179]
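To spell out the ×72 factor: with N = 144 signals (12 × 12 pixels), one MaxKurt iteration optimizes all $144 \times 143 / 2 = 10296$ pairs, whereas one LMICA layer optimizes only the $N - 1 = 143$ neighboring pairs, and $10296 / 143 = 72$.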
60 The number of parameters within 10 layers is 143 × 10, which is much smaller than the degrees of freedom of A ($144 \times 143 / 2$). [sent-150, score-0.086]
61 It suggests that LMICA gives a quite suitable model for natural scenes. [sent-151, score-0.141]
62 It shows that the time costs of the mapping phase are not much higher than those of the local-ICA phase. [sent-154, score-0.299]
63 The fact that 10 layers of LMICA required much less time (22sec. [sent-155, score-0.086]
64 Fig. 3 shows 5 × 5 representative edge detectors at each layer of LMICA. [sent-161, score-0.336]
65 In Fig. 3-(a), rough and local edge detectors were recognized, though they were a little unclear. [sent-163, score-0.248]
66 As the layer proceeded, edge detectors became clearer and more global (see Figs. [sent-164, score-0.333]
67 It is interesting that ICA-like local edges (where the higher-order statistics are dominant) at the early stage were transformed into PCA-like global edges (where the second-order statistics are dominant) at the later stage (see [13]). [sent-166, score-0.075]
68 100000 samples of natural scenes of 64 × 64 pixels were given as X. [sent-172, score-0.272]
69 Fig. 2-(b) shows the decreasing curve of φ for the large-size natural scenes. [sent-175, score-0.083]
70 It shows that φ decreased rapidly in the first 20 layers and converged around the 500th layer. [sent-178, score-0.107]
71 It verifies that LMICA is quite efficient in the analysis of large-size natural scenes. [sent-179, score-0.123]
72 Fig. 4 shows some edge detectors generated at the 1000th layer. [sent-181, score-0.236]
73 It is interesting that some “compound” detectors such as a “cross” were generated in addition to simple “long-edge” detectors. [sent-182, score-0.22]
74 In a well-known previous work [13], which applied ICA and PCA to small-size natural scenes, symmetric global edge detectors similar to our "compound" ones could be generated by PCA, which uses only the second-order statistics. [sent-183, score-0.359]
75 On the other hand, asymmetric local edge detectors similar to our simple "long-edge" ones could not be generated by PCA, but could be extracted by ICA utilizing the higher-order statistics. [sent-184, score-0.325]
76 In comparison, our LMICA could extract various local and global detectors simultaneously from large-size natural scenes. [sent-185, score-0.319]
77 It can also be seen (in Fig. 3) that various other detectors are generated at each layer. [sent-187, score-0.181]
78 In summary, these results show that LMICA can efficiently extract many useful and varied detectors from large-size natural scenes. [sent-188, score-0.498]
79 It suggests that large-size natural scenes may be generated by two different generative models. [sent-190, score-0.262]
80 4 Conclusion: In this paper, we proposed the linear multilayer ICA algorithm (LMICA). [sent-192, score-0.119]
81 We carried out some numerical experiments on natural scenes, which verified that LMICA can find approximations of the independent components quite efficiently and that it is applicable to large problems. [sent-193, score-0.243]
82 We are now analyzing the results of LMICA on large-size natural scenes of 64 × 64 pixels, and we are planning to apply this algorithm to quite large-scale images such as those of 256 × 256 pixels. [sent-194, score-0.373]
83 Table 1: Values of the contrast function φ (see Eq. (12)) and computation times; they are the averages over 10 runs at the 10th layer (approximation) and the 720th layer (convergence) in LMICA (the normal one and the one without the mapping phase). [sent-196, score-0.347]
84 Table columns: LMICA, LMICA without mapping, MaxKurt (10 iterations); the only entry recoverable here is 22 sec. for LMICA at the 10th layer. [sent-200, score-0.236]
85 57) of quite high-dimensional data space, such as the text mining. [sent-211, score-0.059]
86 Some normalization techniques in the local-ICA phase may be promising. [sent-213, score-0.14]
87 An information-maximization approach to blind separation and blind deconvolution. [sent-228, score-0.15]
88 A fast fixed-point algorithm for independent component analysis. [sent-238, score-0.082]
89 Fast and robust fixed-point algorithms for independent component analysis. [sent-242, score-0.064]
90 Linear multilayer independent component analysis using stochastic gradient algorithm. [sent-254, score-0.211]
91 In Independent Component Analysis and Blind source separation - ICA2004, volume 3195 of LNCS, pages 303–310, Granada, Spain, sep 2004. [sent-255, score-0.069]
92 The ”independent components” of natural scenes are edge filters. [sent-269, score-0.296]
93 Independent component filters of natural images compared with simple cells in primary visual cortex. [sent-275, score-0.093]
94 Figure 2: Decreasing curve of the contrast function φ along the number of layers (in logscale): (a). [sent-281, score-0.109]
95 It is for small-size natural scenes of 12 × 12 pixels. [sent-282, score-0.241]
96 The normal and dotted curves show the decreases of φ by LMICA and the one without the mapping phase (random mapping), respectively. [sent-283, score-0.299]
97 Each iteration in MaxKurt approximately corresponds to 72 layers with respect to the times of the optimizations for the pairs of signals. [sent-285, score-0.196]
98 It is for large-size natural scenes of 64 × 64 pixels. [sent-287, score-0.241]
99 Figure 3: Representative edge detectors from natural scenes of 12 × 12 pixels: (a). [sent-297, score-0.456]
100 Figure 4: Representative edge detectors from natural scenes of 64 × 64 pixels. [sent-305, score-0.456]
wordName wordTfidf (topN-words)
[('lmica', 0.741), ('maxkurt', 0.312), ('ica', 0.182), ('scenes', 0.177), ('detectors', 0.16), ('signals', 0.144), ('mapping', 0.142), ('phase', 0.14), ('matsuda', 0.136), ('kazunori', 0.117), ('jk', 0.101), ('multilayer', 0.101), ('ik', 0.098), ('yoshitatsu', 0.097), ('layer', 0.094), ('xik', 0.093), ('layers', 0.086), ('alocal', 0.078), ('yi', 0.075), ('natural', 0.064), ('blind', 0.062), ('neighbor', 0.061), ('quite', 0.059), ('edge', 0.055), ('zi', 0.046), ('ij', 0.045), ('repetition', 0.043), ('permutation', 0.043), ('dec', 0.043), ('source', 0.043), ('ri', 0.039), ('nearer', 0.039), ('yamaguchi', 0.039), ('extract', 0.038), ('independent', 0.035), ('ones', 0.035), ('pairs', 0.033), ('local', 0.033), ('sin', 0.032), ('find', 0.032), ('pixels', 0.031), ('cardoso', 0.031), ('optimizations', 0.031), ('cos', 0.03), ('iteration', 0.029), ('phases', 0.029), ('component', 0.029), ('compound', 0.027), ('representative', 0.027), ('separation', 0.026), ('mixing', 0.026), ('signal', 0.026), ('veri', 0.026), ('components', 0.025), ('hyv', 0.025), ('brought', 0.025), ('global', 0.024), ('gradient', 0.023), ('pca', 0.023), ('stochastic', 0.023), ('intel', 0.023), ('contrast', 0.023), ('separated', 0.023), ('numerical', 0.022), ('ef', 0.022), ('orthogonal', 0.022), ('dominant', 0.021), ('bell', 0.021), ('carried', 0.021), ('observed', 0.021), ('generated', 0.021), ('addition', 0.021), ('extracted', 0.021), ('separates', 0.021), ('converged', 0.021), ('japan', 0.02), ('pair', 0.02), ('planning', 0.02), ('decreasing', 0.019), ('displays', 0.019), ('illustration', 0.019), ('cross', 0.019), ('algorithm', 0.018), ('raw', 0.018), ('suitable', 0.018), ('interesting', 0.018), ('normal', 0.017), ('applicable', 0.017), ('calculation', 0.017), ('namely', 0.017), ('gaus', 0.017), ('sian', 0.017), ('kurtosis', 0.017), ('granada', 0.017), ('antoine', 0.017), ('hateren', 0.017), ('jutten', 0.017), ('proceeded', 0.017), ('approximately', 0.017), ('costs', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
Author: Tobias Blaschke, Laurenz Wiskott
Abstract: In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data. 1
3 0.12741715 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
Author: Hyun J. Park, Te W. Lee
Abstract: Capturing dependencies in images in an unsupervised manner is important for many image processing applications. We propose a new method for capturing nonlinear dependencies in images of natural scenes. This method is an extension of the linear Independent Component Analysis (ICA) method by building a hierarchical model based on ICA and mixture of Laplacian distribution. The model parameters are learned via an EM algorithm and it can accurately capture variance correlation and other high order structures in a simple manner. We visualize the learned variance structure and demonstrate applications to image segmentation and denoising.
4 0.077689931 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
Author: Rasmus K. Olsson, Lars K. Hansen
Abstract: We discuss an identification framework for noisy speech mixtures. A block-based generative model is formulated that explicitly incorporates the time-varying harmonic plus noise (H+N) model for a number of latent sources observed through noisy convolutive mixtures. All parameters including the pitches of the source signals, the amplitudes and phases of the sources, the mixing filters and the noise statistics are estimated by maximum likelihood, using an EM-algorithm. Exact averaging over the hidden sources is obtained using the Kalman smoother. We show that pitch estimation and source separation can be performed simultaneously. The pitch estimates are compared to laryngograph (EGG) measurements. Artificial and real room mixtures are used to demonstrate the viability of the approach. Intelligible speech signals are re-synthesized from the estimated H+N models.
5 0.071426921 33 nips-2004-Brain Inspired Reinforcement Learning
Author: Françcois Rivest, Yoshua Bengio, John Kalaska
Abstract: Successful application of reinforcement learning algorithms often involves considerable hand-crafting of the necessary non-linear features to reduce the complexity of the value functions and hence to promote convergence of the algorithm. In contrast, the human brain readily and autonomously finds the complex features when provided with sufficient training. Recent work in machine learning and neurophysiology has demonstrated the role of the basal ganglia and the frontal cortex in mammalian reinforcement learning. This paper develops and explores new reinforcement learning algorithms inspired by neurological evidence that provides potential new approaches to the feature construction problem. The algorithms are compared and evaluated on the Acrobot task. 1
6 0.069745041 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
7 0.057086822 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
8 0.055796295 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units
9 0.051299956 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM
10 0.047278155 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
11 0.041716948 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
12 0.041130599 84 nips-2004-Inference, Attention, and Decision in a Bayesian Neural Architecture
13 0.036706571 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
14 0.035960943 178 nips-2004-Support Vector Classification with Input Data Uncertainty
15 0.033452421 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
16 0.032278135 27 nips-2004-Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation
17 0.031854592 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern
18 0.031332344 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
19 0.030605184 127 nips-2004-Neighbourhood Components Analysis
20 0.03041612 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
topicId topicWeight
[(0, -0.111), (1, -0.009), (2, -0.026), (3, -0.076), (4, -0.075), (5, -0.07), (6, 0.128), (7, 0.016), (8, 0.002), (9, 0.026), (10, 0.07), (11, -0.008), (12, -0.04), (13, -0.092), (14, 0.021), (15, 0.076), (16, -0.057), (17, 0.172), (18, 0.036), (19, -0.039), (20, 0.069), (21, -0.102), (22, -0.019), (23, -0.017), (24, -0.083), (25, 0.001), (26, -0.125), (27, -0.016), (28, 0.089), (29, 0.05), (30, -0.017), (31, 0.046), (32, -0.009), (33, -0.087), (34, 0.058), (35, 0.062), (36, 0.111), (37, 0.058), (38, -0.009), (39, -0.008), (40, 0.142), (41, -0.006), (42, -0.007), (43, -0.04), (44, 0.029), (45, -0.081), (46, 0.027), (47, -0.009), (48, 0.055), (49, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.94830579 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
Author: Tobias Blaschke, Laurenz Wiskott
Abstract: In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data. 1
3 0.60577315 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
Author: Hyun J. Park, Te W. Lee
Abstract: Capturing dependencies in images in an unsupervised manner is important for many image processing applications. We propose a new method for capturing nonlinear dependencies in images of natural scenes. This method is an extension of the linear Independent Component Analysis (ICA) method by building a hierarchical model based on ICA and mixture of Laplacian distribution. The model parameters are learned via an EM algorithm and it can accurately capture variance correlation and other high order structures in a simple manner. We visualize the learned variance structure and demonstrate applications to image segmentation and denoising. 1 In trod u ction Unsupervised learning has become an important tool for understanding biological information processing and building intelligent signal processing methods. Real biological systems however are much more robust and flexible than current artificial intelligence mostly due to a much more efficient representations used in biological systems. Therefore, unsupervised learning algorithms that capture more sophisticated representations can provide a better understanding of neural information processing and also provide improved algorithm for signal processing applications. For example, independent component analysis (ICA) can learn representations similar to simple cell receptive fields in visual cortex [1] and is also applied for feature extraction, image segmentation and denoising [2,3]. ICA can approximate statistics of natural image patches by Eq.(1,2), where X is the data and u is a source signal whose distribution is a product of sparse distributions like a generalized Laplacian distribution. X = Au (1) P (u ) = ∏ P (u i ) (2) But the representation learned by the ICA algorithm is relatively low-level. In biological systems there are more high-level representations such as contours, textures and objects, which are not well represented by the linear ICA model. ICA learns only linear dependency between pixels by finding strongly correlated linear axis. Therefore, the modeling capability of ICA is quite limited. Previous approaches showed that one can learn more sophisticated high-level representations by capturing nonlinear dependencies in a post-processing step after the ICA step [4,5,6,7,8]. The focus of these efforts has centered on variance correlation in natural images. After ICA, a source signal is not linearly predictable from others. However, given variance dependencies, a source signal is still ‘predictable’ in a nonlinear manner. It is not possible to de-correlate this variance dependency using a linear transformation. Several researchers have proposed extensions to capture the nonlinear dependencies. Portilla et al. used Gaussian Scale Mixture (GSM) to model variance dependency in wavelet domain. This model can learn variance correlation in source prior and showed improvement in image denoising [4]. But in this model, dependency is defined only between a subset of wavelet coefficients. Hyvarinen and Hoyer suggested using a special variance related distribution to model the variance correlated source prior. This model can learn grouping of dependent sources (Subspace ICA) or topographic arrangements of correlated sources (Topographic ICA) [5,6]. Similarly, Welling et al. suggested a product of expert model where each expert represents a variance correlated group [7]. The product form of the model enables applications to image denoising. But these models don’t reveal higher-order structures explicitly. 
Our model is motivated by Lewicki and Karklin who proposed a 2-stage model where the 1st stage is an ICA model (Eq. (3)) and the 2 nd-stage is a linear generative model where another source v generates logarithmic variance for the 1st stage (Eq. (4)) [8]. This model captures variance dependency structure explicitly, but treating variance as an additional random variable introduces another level of complexity and requires several approximations. Thus, it is difficult to obtain a simple analytic PDF of source signal u and to apply the model for image processing problems. ( P (u | λ ) = c exp − u / λ q ) (3) log[λ ] = Bv (4) We propose a hierarchical model based on ICA and a mixture of Laplacian distribution. Our model can be considered as a simplification of model in [8] by constraining v to be 0/1 random vector where only one element can be 1. Our model is computationally simpler but still can capture variance dependency. Experiments show that our model can reveal higher order structures similar to [8]. In addition, our model provides a simple parametric PDF of variance correlated priors, which is an important advantage for adaptive signal processing. Utilizing this, we demonstrate simple applications on image segmentation and image denoising. Our model provides an improved statistic model for natural images and can be used for other applications including feature extraction, image coding, or learning even higher order structures. 2 Modeling nonlinear dependencies We propose a hierarchical or 2-stage model where the 1 st stage is an ICA source signal model and the 2nd stage is modeled by a mixture model with different variances (figure 1). In natural images, the correlation of variance reflects different types of regularities in the real world. Such specialized regularities can be summarized as “context” information. To model the context dependent variance correlation, we use mixture models where Laplacian distributions with different variance represent different contexts. For each image patch, a context variable Z “selects” which Laplacian distribution will represent ICA source signal u. Laplacian distributions have 0-mean but different variances. The advantage of Laplacian distribution for modeling context is that we can model a sparse distribution using only one Laplacian distribution. But we need more than two Gaussian distributions to do the same thing. Also conventional ICA is a special case of our model with one Laplacian. We define the mixture model and its learning algorithm in the next sections. Figure 1: Proposed hierarchical model (1st stage is ICA generative model. 2nd stage is mixture of “context dependent” Laplacian distributions which model U. Z is a random variable that selects a Laplacian distribution that generates the given image patch) 2.1 Mixture of Laplacian Distribution We define a PDF for mixture of M-dimensional Laplacian Distribution as Eq.(5), where N is the number of data samples, and K is the number of mixtures. N N K M N K r r r P(U | Λ, Π) = ∏ P(u n | Λ, Π) = ∏∑ π k P(u n | λk ) = ∏∑ π k ∏ n n k n k m 1 (2λ ) k ,m u n,m exp − λk , m (5) r r r r r un = (un,1 , un , 2 , , , un,M ) : n-th data sample, U = (u1 , u 2 , , , ui , , , u N ) r r r r r λk = (λk ,1 , λk , 2 ,..., λk ,M ) : Variance of k-th Laplacian distribution, Λ = (λ1 , λ2 , , , λk , , , λK ) πk : probability of Laplacian distribution k, Π = (π 1 , , , π K ) and ∑ k πk =1 It is not easy to maximize Eq.(5) directly, and we use EM (expectation maximization) algorithm for parameter estimation. 
Here we introduce a new hidden context variable Z that represents which Laplacian k, is responsible for a given data point. Assuming we know the hidden variable Z, we can write the likelihood of data and Z as Eq.(6), n zk K N r (π )zkn 1 ⋅ exp − z n u n ,m P(U , Z | Λ, Π ) = ∏ P(u n , Z | Λ, Π ) = ∏ ∏ k ∏ k k λk , m n n m 2λk ,m N (6) n z k : Hidden binary random variable, 1 if n-th data sample is generated from k-th n Laplacian, 0 other wise. ( Z = (z kn ) and ∑ z k = 1 for all n = 1…N) k 2.2 EM algorithm for learning the mixture model The EM algorithm maximizes the log likelihood of data averaged over hidden variable Z. The log likelihood and its expectation can be computed as in Eq.(7,8). u 1 n n log P(U , Z | Λ, Π ) = ∑ z k log(π k ) + ∑ z k log( ) − n ,m 2λk ,m λk , m n ,k m (7) u 1 n E {log P (U , Z | Λ, Π )} = ∑ E z k log(π k ) + ∑ log( ) − n ,m 2λ k , m λk , m n ,k m { } (8) The expectation in Eq.(8) can be evaluated, if we are given the data U and estimated parameters Λ and Π. For Λ and Π, EM algorithm uses current estimation Λ’ and Π’. { } { } ∑ z P( z n n E z k ≡ E zk | U , Λ' , Π ' = 1 n z k =0 n k n k n | u n , Λ' , Π ' ) = P( z k = 1 | u n , Λ' , Π ' ) (9) = n n P (u n | z k = 1, Λ' , Π ' ) P( z k = 1 | Λ ' , Π ' ) P(u n | Λ' , Π ' ) = M u n ,m 1 1 1 ∏ 2λ ' exp(− λ ' ) ⋅ π k ' = c P (u n | Λ ' , Π ' ) m k ,m k ,m n M πk ' ∏ 2λ m k ,m ' exp(− u n ,m λk , m ' ) Where the normalization constant can be computed as K K M k k =1 m =1 n cn = P (u n | Λ ' , Π ' ) = ∑ P (u n | z k , Λ ' , Π ' ) P ( z kn | Λ ' , Π ' ) = ∑ π k ∏ 1 (2λ ) exp( − k ,m u n ,m λk ,m ) (10) The EM algorithm works by maximizing Eq.(8), given the expectation computed from Eq.(9,10). Eq.(9,10) can be computed using Λ’ and Π’ estimated in the previous iteration of EM algorithm. This is E-step of EM algorithm. Then in M-step of EM algorithm, we need to maximize Eq.(8) over parameter Λ and Π. First, we can maximize Eq.(8) with respect to Λ, by setting the derivative as 0. 1 u n,m ∂E{log P (U , Z | Λ, Π )} n = 0 = ∑ E z k − + λ k , m (λ k , m ) 2 ∂λ k ,m n { } ⇒ λ k ,m ∑ E{z }⋅ u = ∑ E{z } n k n ,m n (11) n k n Second, for maximization of Eq.(8) with respect to Π, we can rewrite Eq.(8) as below. n (12) E {log P (U , Z | Λ , Π )} = C + ∑ E {z k ' }log(π k ' ) n ,k ' As we see, the derivative of Eq.(12) with respect to Π cannot be 0. Instead, we need to use Lagrange multiplier method for maximization. A Lagrange function can be defined as Eq.(14) where ρ is a Lagrange multiplier. { } (13) n L (Π , ρ ) = − ∑ E z k ' log(π k ' ) + ρ (∑ π k ' − 1) n,k ' k' By setting the derivative of Eq.(13) to be 0 with respect to ρ and Π, we can simply get the maximization solution with respect to Π. We just show the solution in Eq.(14). ∂L(Π, ρ ) ∂L(Π, ρ ) =0 = 0, ∂Π ∂ρ n n ⇒ π k = ∑ E z k / ∑∑ E z k k n n { } { } (14) Then the EM algorithm can be summarized as figure 2. For the convergence criteria, we can use the expectation of log likelihood, which can be calculated from Eq. (8). πk = { } , λk , m = E um + e (e is small random noise) 2. Calculate the Expectation by 1. Initialize 1 K u n ,m 1 M πk ' ∏ 2λ ' exp( − λ ' ) cn m k ,m k ,m 3. Maximize the log likelihood given the Expectation { } { } n n E z k ≡ E zk | U , Λ' , Π ' = λk ,m ← ∑ E {z kn }⋅ u n,m / ∑ E {z kn } , π k ← ∑ E {z kn } / ∑∑ E {z kn } n n k n 4. If (converged) stop, otherwise repeat from step 2. 
n Figure 2: Outline of EM algorithm for Learning the Mixture Model 3 Experimental Results Here we provide examples of image data and show how the learning procedure is performed for the mixture model. We also provide visualization of learned variances that reveal the structure of variance correlation and an application to image denoising. 3.1 Learning Nonlinear Dependencies in Natural images As shown in figure 1, the 1 st stage of the proposed model is simply the linear ICA. The ICA matrix A and W(=A-1) are learned by the FastICA algorithm [9]. We sampled 105(=N) data from 16x16 patches (256 dim.) of natural images and use them for both first and second stage learning. ICA input dimension is 256, and source dimension is set to be 160(=M). The learned ICA basis is partially shown in figure 1. The 2nd stage mixture model is learned given the ICA source signals. In the 2 nd stage the number of mixtures is set to 16, 64, or 256(=K). Training by the EM algorithm is fast and several hundred iterations are sufficient for convergence (0.5 hour on a 1.7GHz Pentium PC). For the visualization of learned variance, we adapted the visualization method from [8]. Each dimension of ICA source signal corresponds to an ICA basis (columns of A) and each ICA basis is localized in both image and frequency space. Then for each Laplacian distribution, we can display its variance vector as a set of points in image and frequency space. Each point can be color coded by variance value as figure 3. (a1) (a2) (b1) (b2) Figure 3: Visualization of learned variances (a1 and a2 visualize variance of Laplacian #4 and b1 and 2 show that of Laplacian #5. High variance value is mapped to red color and low variance is mapped to blue. In Laplacian #4, variances for diagonally oriented edges are high. But in Laplacian #5, variances for edges at spatially right position are high. Variance structures are related to “contexts” in the image. For example, Laplacian #4 explains image patches that have oriented textures or edges. Laplacian #5 captures patches where left side of the patch is clean but right side is filled with randomly oriented edges.) A key idea of our model is that we can mix up independent distributions to get nonlinearly dependent distribution. This modeling power can be shown by figure 4. Figure 4: Joint distribution of nonlinearly dependent sources. ((a) is a joint histogram of 2 ICA sources, (b) is computed from learned mixture model, and (c) is from learned Laplacian model. In (a), variance of u2 is smaller than u1 at center area (arrow A), but almost equal to u1 at outside (arrow B). So the variance of u2 is dependent on u1. This nonlinear dependency is closely approximated by mixture model in (b), but not in (c).) 3.2 Unsupervised Image Segmentation The idea behind our model is that the image can be modeled as mixture of different variance correlated “contexts”. We show how the learned model can be used to classify different context by an unsupervised image segmentation task. Given learned model and data, we can compute the expectation of a hidden variable Z from Eq. (9). Then for an image patch, we can select a Laplacian distribution with highest probability, which is the most explaining Laplacian or “context”. For segmentation, we use the model with 16 Laplacians. This enables abstract partitioning of images and we can visualize organization of images more clearly (figure 5). 
3.3 Application to Image Restoration

The proposed mixture model provides a better parametric model of the ICA source distribution and hence an improved model of the image structure. One advantage is in the MAP (maximum a posteriori) estimation of a noisy image. If we assume Gaussian noise n, the image generation model can be written as Eq.(15); we can then compute the MAP estimate of the ICA source signal u by Eq.(16) and reconstruct the original image.

X = Au + n   (15)

\hat{u} = \arg\max_u \log P(u | X, A) = \arg\max_u \left( \log P(X | u, A) + \log P(u) \right)   (16)

Since the noise is assumed Gaussian, P(X | u, A) in Eq.(16) is Gaussian. P(u) in Eq.(16) can be modeled either by a single Laplacian or by the mixture of Laplacian distributions; the mixture distribution can be approximated by its most explaining Laplacian component. We evaluated three image restoration methods: ICA MAP estimation with a simple Laplacian prior, ICA MAP estimation with the Laplacian mixture prior, and the Wiener filter. Figure 6 shows an example, and figure 7 summarizes the results obtained at different noise levels. As shown, MAP estimation with the mixture prior performs better than the others in terms of both SNR and SSIM (Structural Similarity Measure) [10].

Figure 6: Image restoration results (signal variance 1.0, noise variance 0.81)

Figure 7: SNR and SSIM as a function of noise variance for ICA MAP with the mixture prior, ICA MAP with the Laplacian prior, the Wiener filter, and (for SSIM) the unprocessed noisy image (signal variance = 1.0)
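The paper does not state how the MAP problem in Eq.(16) is optimized. As one possible realization, the sketch below solves it by proximal gradient descent (iterative soft thresholding), assuming the prior has already been reduced to a single Laplacian with scales lam_m, e.g., the most explaining component for the patch. The solver choice and all names are our assumptions, not the authors' method.

```python
import numpy as np

def map_denoise(x, A, lam_m, noise_var, n_iter=200):
    """MAP estimate of the ICA sources u for a noisy patch x = A u + n (Eqs. 15-16).

    Minimizes ||x - A u||^2 / (2 * noise_var) + sum_m |u_m| / lam_m
    with ISTA-style proximal gradient steps.
    """
    u = np.zeros(A.shape[1])
    step = noise_var / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the smooth term
    thresh = step / lam_m                          # per-coefficient soft-threshold level
    for _ in range(n_iter):
        v = u - step * (A.T @ (A @ u - x)) / noise_var        # gradient step on the Gaussian term
        u = np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)  # prox of the Laplacian prior
    return u

# x_hat = A @ map_denoise(x_noisy, A, lam[k_best], noise_var)  # reconstruct the cleaned patch
```

Under the mixture prior, the index k_best of the approximating Laplacian would be chosen from the responsibilities of Eq.(9) evaluated on an initial source estimate.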
4 Discussion

We proposed a mixture model to learn nonlinear dependencies of ICA source signals for natural images. The proposed mixture of Laplacian distributions is a generalization of the conventional independent source priors and can model the variance dependencies of natural image signals. Experiments show that the proposed model groups variance-correlated signals into different mixture components and learns high-level structures that are highly correlated with the underlying physical properties captured in the image. Our model provides an analytic prior for nearly independent but variance-correlated signals, which was not available in previous models [4,5,6,7,8]. The learned variances of the mixture model show structured localization in image and frequency space, similar to the results in [8]. Since the model is given no information about the spatial location or frequency content of the source signals, we can conclude that the dependencies captured by the mixture model reveal regularities in natural images. As shown in the image labeling experiments, such regularities correspond to specific surface types (textures) or to boundaries between surfaces. The learned mixture model can thus be used to discover the hidden contexts that generate such regularities or correlated signal groups. The experiments also show that the labeling of image patches is highly correlated with the object surface types shown in the image, and the segmentation results show regularity across image space and a strong correlation with high-level concepts.

Finally, we showed an application of the model to image restoration. We compared its performance with conventional ICA MAP estimation and the Wiener filter; the results suggest that the proposed model outperforms these traditional methods, owing to the estimation of the correlated variance structure, which provides an improved prior not considered by the other methods. In future work, we plan to exploit the regularity of the image segmentation results to learn more high-level structures by building additional hierarchies on top of the current model. Furthermore, the application to image coding seems promising.

References
[1] A. J. Bell and T. J. Sejnowski, The 'Independent Components' of Natural Scenes are Edge Filters, Vision Research, 37(23):3327-3338, 1997.
[2] A. Hyvarinen, Sparse Code Shrinkage: Denoising of Nongaussian Data by Maximum Likelihood Estimation, Neural Computation, 11(7):1739-1768, 1999.
[3] T. Lee, M. Lewicki, and T. Sejnowski, ICA Mixture Models for Unsupervised Classification of Non-Gaussian Classes and Automatic Context Switching in Blind Separation, PAMI, 22(10), October 2000.
[4] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain, IEEE Trans. on Image Processing, 12(11):1338-1351, 2003.
[5] A. Hyvarinen and P. O. Hoyer, Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces, Neurocomputing, 1999.
[6] A. Hyvarinen and P. O. Hoyer, Topographic Independent Component Analysis as a Model of V1 Receptive Fields, Neurocomputing, 38-40, June 2001.
[7] M. Welling, G. E. Hinton, and S. Osindero, Learning Sparse Topographic Representations with Products of Student-t Distributions, NIPS, 2002.
[8] M. S. Lewicki and Y. Karklin, Learning Higher-Order Structures in Natural Images, Network: Comput. Neural Syst., 14:483-499, August 2003.
[9] A. Hyvarinen and P. O. Hoyer, FastICA MATLAB code, http://www.cis.hut.fi/projects/compneuro/extensions.html/
[10] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, The SSIM Index for Image Quality Assessment, IEEE Transactions on Image Processing, 13(4), April 2004.
4 0.55841178 33 nips-2004-Brain Inspired Reinforcement Learning
Author: François Rivest, Yoshua Bengio, John Kalaska
Abstract: Successful application of reinforcement learning algorithms often involves considerable hand-crafting of the necessary non-linear features to reduce the complexity of the value functions and hence to promote convergence of the algorithm. In contrast, the human brain readily and autonomously finds the complex features when provided with sufficient training. Recent work in machine learning and neurophysiology has demonstrated the role of the basal ganglia and the frontal cortex in mammalian reinforcement learning. This paper develops and explores new reinforcement learning algorithms inspired by neurological evidence that provides potential new approaches to the feature construction problem. The algorithms are compared and evaluated on the Acrobot task. 1
5 0.43011171 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
Author: Antti Honkela, Harri Valpola
Abstract: In this paper we present a framework for using multi-layer perceptron (MLP) networks in nonlinear generative models trained by variational Bayesian learning. The nonlinearity is handled by linearizing it using a Gauss–Hermite quadrature at the hidden neurons. This yields an accurate approximation for cases of large posterior variance. The method can be used to derive nonlinear counterparts for linear algorithms such as factor analysis, independent component/factor analysis and state-space models. This is demonstrated with a nonlinear factor analysis experiment in which even 20 sources can be estimated from a real world speech data set. 1
6 0.41911381 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units
7 0.36172509 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
8 0.29180771 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
9 0.29119468 52 nips-2004-Discrete profile alignment via constrained information bottleneck
10 0.28164533 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
11 0.27528501 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
12 0.26498911 27 nips-2004-Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation
13 0.26289582 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
14 0.26057085 21 nips-2004-An Information Maximization Model of Eye Movements
15 0.2595951 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM
16 0.25522614 84 nips-2004-Inference, Attention, and Decision in a Bayesian Neural Architecture
17 0.25108981 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
18 0.24751188 155 nips-2004-Responding to Modalities with Different Latencies
19 0.24665624 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
20 0.2185387 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
topicId topicWeight
[(13, 0.103), (15, 0.075), (26, 0.047), (31, 0.015), (33, 0.131), (35, 0.026), (39, 0.022), (50, 0.011), (51, 0.029), (53, 0.3), (59, 0.01), (76, 0.095), (77, 0.028)]
simIndex simValue paperId paperTitle
Author: Baranidharan Raman, Ricardo Gutierrez-osuna
Abstract: This paper presents a neuromorphic model of two olfactory signalprocessing primitives: chemotopic convergence of olfactory receptor neurons, and center on-off surround lateral inhibition in the olfactory bulb. A self-organizing model of receptor convergence onto glomeruli is used to generate a spatially organized map, an olfactory image. This map serves as input to a lattice of spiking neurons with lateral connections. The dynamics of this recurrent network transforms the initial olfactory image into a spatio-temporal pattern that evolves and stabilizes into odor- and intensity-coding attractors. The model is validated using experimental data from an array of temperature-modulated gas sensors. Our results are consistent with recent neurobiological findings on the antennal lobe of the honeybee and the locust. 1 In trod u ction An artificial olfactory system comprises of an array of cross-selective chemical sensors followed by a pattern recognition engine. An elegant alternative for the processing of sensor-array signals, normally performed with statistical pattern recognition techniques [1], involves adopting solutions from the biological olfactory system. The use of neuromorphic approaches provides an opportunity for formulating new computational problems in machine olfaction, including mixture segmentation, background suppression, olfactory habituation, and odor-memory associations. A biologically inspired approach to machine olfaction involves (1) identifying key signal processing primitives in the olfactory pathway, (2) adapting these primitives to account for the unique properties of chemical sensor signals, and (3) applying the models to solving specific computational problems. The biological olfactory pathway can be divided into three general stages: (i) olfactory epithelium, where primary reception takes place, (ii) olfactory bulb (OB), where the bulk of signal processing is performed and, (iii) olfactory cortex, where odor associations are stored. A review of literature on olfactory signal processing reveals six key primitives in the olfactory pathway that can be adapted for use in machine olfaction. These primitives are: (a) chemical transduction into a combinatorial code by a large population of olfactory receptor neurons (ORN), (b) chemotopic convergence of ORN axons onto glomeruli (GL), (c) logarithmic compression through lateral inhibition at the GL level by periglomerular interneurons, (d) contrast enhancement through lateral inhibition of mitral (M) projection neurons by granule interneurons, (e) storage and association of odor memories in the piriform cortex, and (f) bulbar modulation through cortical feedback [2, 3]. This article presents a model that captures the first three abovementioned primitives: population coding, chemotopic convergence and contrast enhancement. The model operates as follows. First, a large population of cross-selective pseudosensors is generated from an array of metal-oxide (MOS) gas sensors by means of temperature modulation. Next, a self-organizing model of convergence is used to cluster these pseudo-sensors according to their relative selectivity. This clustering generates an initial spatial odor map at the GL layer. Finally, a lattice of spiking neurons with center on-off surround lateral connections is used to transform the GL map into identity- and intensity-specific attractors. The model is validated using a database of temperature-modulated sensor patterns from three analytes at three concentration levels. 
The model is shown to address the first problem in biologically-inspired machine olfaction: intensity and identity coding of a chemical stimulus in a manner consistent with neurobiology [4, 5]. 2 M o d e l i n g c h e m o t opi c c o n v e r g e n c e The projection of sensory signals onto the olfactory bulb is organized such that ORNs expressing the same receptor gene converge onto one or a few GLs [3]. This convergence transforms the initial combinatorial code into an organized spatial pattern (i.e., an olfactory image). In addition, massive convergence improves the signal to noise ratio by integrating signals from multiple receptor neurons [6]. When incorporating this principle into machine olfaction, a fundamental difference between the artificial and biological counterparts must be overcome: the input dimensionality at the receptor/sensor level. The biological olfactory system employs a large population of ORNs (over 100 million in humans, replicated from 1,000 primary receptor types), whereas its artificial analogue uses a few chemical sensors (commonly one replica of up to 32 different sensor types). To bridge this gap, we employ a sensor excitation technique known as temperature modulation [7]. MOS sensors are conventionally driven in an isothermal fashion by maintaining a constant temperature. However, the selectivity of these devices is a function of the operating temperature. Thus, capturing the sensor response at multiple temperatures generates a wealth of additional information as compared to the isothermal mode of operation. If the temperature is modulated slow enough (e.g., mHz), the behavior of the sensor at each point in the temperature cycle can then be treated as a pseudo-sensor, and thus used to simulate a large population of cross-selective ORNs (refer to Figure 1(a)). To model chemotopic convergence, these temperature-modulated pseudo-sensors (referred to as ORNs in what follows) must be clustered according to their selectivity [8]. As a first approximation, each ORN can be modeled by an affinity vector [9] consisting of the responses across a set of C analytes: r K i = K i1 , K i2 ,..., K iC (1) [ ] where K ia is the response of the ith ORN to analyte a. The selectivity of this ORN r is then defined by the orientation of the affinity vector Κ i . A close look at the OB also shows that neighboring GLs respond to similar odors [10]. Therefore, we model the ORN-GL projection with a Kohonen self-organizing map (SOM) [11]. In our model, the SOM is trained to model the distribution of r ORNs in chemical sensitivity space, defined by the affinity vector Κ i . Once the training of the SOM is completed, each ORN is assigned to the closest SOM node (a simulated GL) in affinity space, thereby forming a convergence map. The response of each GL can then be computed as Ga = σ j (∑ N i =1 Wij ⋅ ORN ia ) (2) where ORN ia is the response of pseudo-sensor i to analyte a, Wij=1 if pseudo-sensor i converges to GL j and zero otherwise, and σ (⋅) is a squashing sigmoidal function that models saturation. This convergence model works well under the assumption that the different sensory inputs are reasonably uncorrelated. Unfortunately, most gas sensors are extremely collinear. As a result, this convergence model degenerates into a few dominant GLs that capture most of the sensory activity, and a large number of dormant GLs that do not receive any projections. 
To address this issue, we employ a form of competition known as conscience learning [12], which incorporates a habituation mechanism to prevent certain SOM nodes from dominating the competition. In this scheme, the fraction of times that a particular SOM node wins the competition is used as a bias to favor non-winning nodes. This results in a spreading of the ORN projections to neighboring units and, therefore, significantly reduces the number of dormant units. We measure the performance of the convergence mapping with the entropy across the lattice, H = −∑ Pi log Pi , where Pi is the fraction of ORNs that project to SOM node i [13]. To compare Kohonen and conscience learning, we built convergence mappings with 3,000 pseudo-sensors and 400 GL units (refer to section 4 for details). The theoretical maximum of the entropy for this network, which corresponds to a uniform distribution, is 8.6439. When trained with Kohonen’s algorithm, the entropy of the SOM is 7.3555. With conscience learning, the entropy increases to 8.2280. Thus, conscience is an effective mechanism to improve the spreading of ORN projections across the GL lattice. 3 M o d e l i n g t h e o l f a c t o r y b u l b n e t wo r k Mitral cells, which synapse ORNs at the GL level, transform the initial olfactory image into a spatio-temporal code by means of lateral inhibition. Two roles have been suggested for this lateral inhibition: (a) sharpening of the molecular tuning range of individual M cells with respect to that of their corresponding ORNs [10], and (b) global redistribution of activity, such that the bulb-wide representation of an odorant, rather than the individual tuning ranges, becomes specific and concise over time [3]. More recently, center on-off surround inhibitory connections have been found in the OB [14]. These circuits have been suggested to perform pattern normalization, noise reduction and contrast enhancement of the spatial patterns. We model each M cell using a leaky integrate-and-fire spiking neuron [15]. The input current I(t) and change in membrane potential u(t) of a neuron are given by: I (t ) = du u (t ) +C dt R (3) du τ = −u (t ) + R ⋅ I (t ) [τ = RC ] dt Each M cell receives current Iinput from ORNs and current Ilateral from lateral connections with other M cells: I input ( j ) = ∑Wij ⋅ ORNi i (4) I lateral ( j , t ) = ∑ Lkj ⋅ α (k , t − 1) k where Wij indicates the presence/absence of a synapse between ORNi and Mj, as determined by the chemotopic mapping, Lkj is the efficacy of the lateral connection between Mk and Mj, and α(k,t-1) is the post-synaptic current generated by a spike at Mk: α (k , t − 1) = − g (k , t − 1) ⋅ [u ( j, t − 1) + − Esyn ] (5) g(k,t-1) is the conductance of the synapse between Mk and Mj at time t-1, u(j,t-1) is the membrane potential of Mj at time t-1 and the + subscript indicates this value becomes zero if negative, and Esyn is the reverse synaptic potential. The change in conductance of post-synaptic membrane is: & g (k , t ) = dg (k , t ) − g (k , t ) = + z (k , t ) dt τ syn & z (k , t ) = dz (k , t ) − z ( k , t ) = + g norm ⋅ spk ( k , t ) dt τ syn (6) where z(.) and g(.) 
are low pass filters of the form exp(-t/τsyn) and t ⋅ exp(−t / τ syn ) , respectively, τsyn is the synaptic time constant, gnorm is a normalization constant, and spk(j,t) marks the occurrence of a spike in neuron i at time t: 1 u ( j , t ) = Vspike spk ( j , t ) = 0 u ( j , t ) ≠ Vspike (7) Combining equations (3) and (4), the membrane potential can be expressed as: du ( j , t ) − u ( j, t ) I lateral ( j, t ) I input ( j ) = + + dt RC C C & u ( j , t − 1) + u ( j , t − 1) ⋅ dt u ( j, t ) < Vthreshold u ( j, t ) = Vspike u ( j, t ) ≥ Vthreshold & u ( j, t ) = (8) When the membrane potential reaches Vthreshold, a spike is generated, and the membrane potential is reset to Vrest. Any further inputs to the neuron are ignored during the subsequent refractory period. Following [14], lateral interactions are modeled with a center on-off surround matrix Lij. Each M cell makes excitatory synapses to nearby M cells (d
same-paper 2 0.75246716 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
3 0.69537359 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
Author: Sunita Sarawagi, William W. Cohen
Abstract: We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semiCRF on an input sequence x outputs a “segmentation” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements xi of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time—often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs. 1
Author: Tobias Blaschke, Laurenz Wiskott
Abstract: In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data. 1
5 0.54765421 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
6 0.53948289 30 nips-2004-Binet-Cauchy Kernels
7 0.53744149 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
8 0.52620298 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
9 0.51739103 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons
10 0.50729215 124 nips-2004-Multiple Alignment of Continuous Time Series
11 0.50569093 131 nips-2004-Non-Local Manifold Tangent Learning
12 0.50568134 102 nips-2004-Learning first-order Markov models for control
13 0.50464779 163 nips-2004-Semi-parametric Exponential Family PCA
14 0.50464439 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
15 0.50302339 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
16 0.50293052 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
17 0.50220895 70 nips-2004-Following Curved Regularized Optimization Solution Paths
18 0.50196683 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
19 0.50068271 116 nips-2004-Message Errors in Belief Propagation
20 0.49919647 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms