nips nips2011 nips2011-273 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energy-correlations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new two-stage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with a synthetic dataset, natural images and brain signals. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Structural equations and divisive normalization for energy-dependent component analysis Jun-ichiro Hirayama Dept. [sent-1, score-0.357]
2 of Systems Science, Graduate School of Informatics, Kyoto University, 611-0011 Uji, Kyoto, Japan; Aapo Hyvärinen, Dept. [sent-2, score-0.105]
3 of Computer Science and HIIT University of Helsinki 00560 Helsinki, Finland Abstract Components estimated by independent component analysis and related methods are typically not independent in real data. [sent-4, score-0.103]
4 A very common form of nonlinear dependency between the components is correlations in their variances or energies. [sent-5, score-0.118]
5 Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. [sent-7, score-0.207]
6 The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. [sent-8, score-0.081]
7 The SEM is closely related to divisive normalization which effectively reduces energy correlation. [sent-9, score-0.326]
8 Our new two-stage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. [sent-10, score-0.104]
9 We demonstrate the applicability of our method with a synthetic dataset, natural images and brain signals. [sent-11, score-0.144]
10 1 Introduction Statistical models of natural signals have provided a rich framework to describe how sensory neurons process and adapt to ecologically-valid stimuli [28, 12]. [sent-12, score-0.13]
11 In early studies, independent component analysis (ICA) [2, 31, 13] and sparse coding [22] have successfully shown that V1 simple cell-like edge filters, or receptive fields, emerge as optimal inference on latent quantities under linear generative models trained on natural image patches. [sent-13, score-0.267]
12 Interestingly, such energy correlations are also prominent in other kinds of data, including brain signals [33] and presumably even financial time series which have strong heteroscedasticity. [sent-20, score-0.179]
13 Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA, and a model of the energy-correlations based on the structural equation model (SEM) [3], in particular the Linear Non-Gaussian (LiNG) SEM [27, 18] developed recently. [sent-23, score-0.288]
1 We provide a new generative interpretation of DN based on the SEM, which is an important contribution of this work. [sent-25, score-0.094]
2 Also, from a machine learning perspective, causal analysis using SEMs has recently become very popular; our model could extend the applicability of LiNG-SEM to blindly mixed signals. [sent-26, score-0.106]
2 Structural equation model and divisive normalization A structural equation model (SEM) [3] of a random vector $y = (y_1, y_2, \ldots, y_d)^\top$ is formulated as simultaneous equations of random variables, such that $y_i = \kappa_i(y_i, y_{-i}, r_i)$, $i = 1, 2, \ldots, d$, (1) or $y = \kappa(y, r)$, where the function $\kappa_i$ describes how each single variable $y_i$ is related to other variables $y_{-i}$, possibly including itself, and a corresponding stochastic disturbance or external input $r_i$ which is independent of $y$. [sent-33, score-0.355] [sent-36, score-0.210] [sent-39, score-0.257]
19 These equations, called structural equations, specify the distribution of y, as y is an implicit function (assuming the system is invertible) of the random vector $r = (r_1, r_2, \ldots, r_d)^\top$. [sent-40, score-0.081]
20 Otherwise, the SEM is called non-recursive or cyclic, where the structural equations cannot be simply decomposed into regressive models. [sent-45, score-0.125]
21 In a standard interpretation, a cyclic SEM rather describes the distribution of equilibrium points of a dynamical system, y(t) = κ(y(t − 1), r) (t = 0, 1, . [sent-46, score-0.158]
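To make the equilibrium-point reading concrete, the following minimal sketch (ours, not the authors' code; the function name and tolerance are our own choices) iterates y(t) = κ(y(t−1), r) to its fixed point for a toy linear cyclic SEM; convergence is guaranteed when the spectral radius of H is below one.

```python
import numpy as np

def sem_equilibrium(kappa, r, d, n_iter=500, tol=1e-10):
    """Iterate y(t) = kappa(y(t-1), r) from y(0) = 0 until approximate convergence."""
    y = np.zeros(d)
    for _ in range(n_iter):
        y_new = kappa(y, r)
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    return y

# Toy linear cyclic SEM: kappa(y, r) = H y + h0 + r, with a contractive H
H = np.array([[0.0, 0.3],
              [0.3, 0.0]])
h0 = np.array([0.1, -0.2])
r = np.random.laplace(size=2)            # non-Gaussian disturbance
y_star = sem_equilibrium(lambda y, r_: H @ y + h0 + r_, r, d=2)

# For the linear case the equilibrium has the closed form (I - H)^{-1}(h0 + r)
assert np.allclose(y_star, np.linalg.solve(np.eye(2) - H, h0 + r))
```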
22 2.1 Divisive normalization as non-linear SEM Now, we briefly point out the connection of SEM to DN, which strongly motivated us to explore the application of SEM to natural signal statistics. [sent-51, score-0.276]
23 The outputs of linear filters often have the property that their energies ϕ(|si |) (i = 1, 2, . [sent-64, score-0.104]
24 Although several variants have been proposed, a basic form can be formulated as follows: Given the d outputs, their energies are normalized (divided) by a linear combination of the energies of the other signals, such that $z_i = \dfrac{\phi(|s_i|)}{\sum_j h_{ij}\,\phi(|s_j|) + h_{i0}}$, $i = 1, 2, \ldots, d$, (2) where $h_{ij}$ and $h_{i0}$ are real-valued parameters of this transform. [sent-72, score-0.456] [sent-75, score-0.235]
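For illustration, the additive DN transform of Eq. (2) takes only a few lines of NumPy; this is a sketch under the assumption ϕ(t) = t², and the function and argument names are ours.

```python
import numpy as np

def divisive_normalization(s, H, h0, phi=np.square):
    """Additive DN, Eq. (2): z_i = phi(|s_i|) / (sum_j h_ij phi(|s_j|) + h_i0)."""
    e = phi(np.abs(s))          # energies phi(|s_i|)
    return e / (H @ e + h0)
```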
26 Now, it is straightforward to see that the following structural equations in the log-energy domain, $y_i := \ln\phi(|s_i|) = \ln\bigl(\textstyle\sum_j h_{ij}\exp(y_j) + h_{i0}\bigr) + r_i$, $i = 1, 2, \ldots, d$, (3) correspond to the DN transform in Eq. (2), where $z_i = \exp(r_i)$ is another representation of the disturbance. [sent-76, score-0.636] [sent-80, score-0.071]
28 The SEM will typically be cyclic, since the coefficients hij in Eq. [sent-81, score-0.235]
29 Eq. (3) thus implies a nonlinear dynamical system, and this can be interpreted as the data-generating process underlying DN. [sent-83, score-0.096]
30 Eq. (3) also implies a linear system with multiplicative input, $e^{y_i} = \bigl(\textstyle\sum_j h_{ij}\, e^{y_j} + h_{i0}\bigr)\, z_i$, in the energy domain, i.e. [sent-85, score-0.409]
31 Eq. (2) gives the optimal mapping under the SEM to infer the disturbance from the given $s_i$'s; if the true disturbances are independent, it optimally reduces the energy-dependencies. [sent-89, score-0.253]
32 3 Energy-dependent ICA using structural equation model Now, we define a new generative model which models energy-dependencies of linear latent components using an SEM. [sent-94, score-0.249]
33 3.1 Scale-mixture model Let s now be a random vector of d source signals underlying an observation $x = (x_1, x_2, \ldots, x_d)^\top$. [sent-96, score-0.108]
34 They follow a standard linear generative model: x = As, (4) where A is a square mixing matrix. [sent-100, score-0.125]
35 Then, assuming A is invertible, each transposed row $w_i$ of the demixing (filtering) matrix $W = A^{-1}$ gives the optimal filter to recover $s_i$ from $x$, which is constrained to have unit norm, $\|w_i\|_2 = 1$, to fix the scaling ambiguity. [sent-102, score-0.159]
36 Here, u and σ are mutually independent, and the $u_i$'s are also independent of each other. [sent-105, score-0.101]
37 3.2 Linear Non-Gaussian SEM Here, we simplify the above scale-mixture model by restricting $u_i$ to be binary, i.e. taking the values $\pm 1$. [sent-112, score-0.069]
38 Also, this implies that $u_i = \mathrm{sign}(s_i)$ and $\sigma_i = |s_i|$, and hence the log-energy above now has a simple deterministic relation to $\sigma_i$, i.e. $y_i = \ln\phi(\sigma_i)$, which can be inverted to $\sigma_i = \phi^{-1}(\exp(y_i))$. [sent-118, score-0.069] [sent-120, score-0.183]
40 We particularly assume the log-energies $y_i$ follow the Linear Non-Gaussian (LiNG) [27, 18] SEM: $y_i = \sum_j h_{ij} y_j + h_{i0} + r_i$, $i = 1, 2, \ldots, d$, (6) where the disturbances are zero-mean and in particular assumed to be non-Gaussian and independent of each other, which has been shown to greatly improve the identifiability of linear SEMs [27]; the interaction structure in Eq. (6) can be represented by a directed graph whose connection weights are given by the matrix H of coefficients $h_{ij}$. (Footnote 1: To be precise, [20] showed the invertibility of the entire mapping s → z in the case of a "signed" DN transform that keeps the signs of $z_i$ and $s_i$ the same.) [sent-121, score-0.523] [sent-124, score-0.106] [sent-125, score-0.372]
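As a hedged illustration (ours, not the authors' code), log-energies can be drawn from the LiNG SEM of Eq. (6) in closed form when H is acyclic (here strictly lower-triangular), since then y = (I − H)^{-1}(h0 + r); Laplace disturbances are one convenient non-Gaussian choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Strictly lower-triangular H encodes an acyclic (LiNGAM-style) interaction graph
H = np.tril(0.5 * rng.standard_normal((d, d)), k=-1)
h0 = rng.standard_normal(d)

def sample_log_energies(H, h0, n, rng):
    """Draw n samples of y from the linear SEM y = H y + h0 + r of Eq. (6),
    with i.i.d. zero-mean Laplace (non-Gaussian) disturbances r."""
    d = len(h0)
    r = rng.laplace(size=(n, d))
    A = np.linalg.inv(np.eye(d) - H)
    return (h0 + r) @ A.T

Y = sample_log_energies(H, h0, n=1000, rng=rng)
print(Y.shape)   # (1000, 4)
```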
43 Eq. (6) is equivalent to $e^{y_i} = e^{h_{i0}} \bigl(\prod_j e^{h_{ij} y_j}\bigr) z_i$ $(i = 1, 2, \ldots, d)$, and interestingly, these SEMs further imply a novel form of DN transform, given by $z_i = \dfrac{\phi(|s_i|)}{e^{h_{i0}} \prod_j \phi(|s_j|)^{h_{ij}}}$, $i = 1, 2, \ldots, d$, (7) where the denominator is now not additive but multiplicative. [sent-128, score-0.379] [sent-131, score-0.120] [sent-134, score-0.235]
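A minimal sketch of the multiplicative DN transform in Eq. (7), again assuming ϕ(t) = t² and using our own names; working in the log domain keeps the product numerically stable.

```python
import numpy as np

def multiplicative_dn(s, H, h0, phi=np.square):
    """Multiplicative DN, Eq. (7): z_i = phi(|s_i|) / (exp(h_i0) * prod_j phi(|s_j|)^{h_ij})."""
    log_e = np.log(phi(np.abs(s)) + 1e-12)
    log_z = log_e - (H @ log_e + h0)   # log z_i = log e_i - h_i0 - sum_j h_ij log e_j
    return np.exp(log_z)
```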
46 Eq. (5) with $\sigma_i = \phi^{-1}(\exp(y_i))$ and random signs $u_i$; and 3) the observation x is obtained by linearly mixing the sources as in Eq. (4). [sent-138, score-0.195]
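Putting the three steps together, the following generative sketch is our own illustration under the assumption ϕ(t) = t² (so that ϕ^{-1}(e) = √e) with an acyclic H; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 5000

H = np.tril(0.4 * rng.standard_normal((d, d)), k=-1)   # acyclic interactions
h0 = np.zeros(d)
A = rng.standard_normal((d, d))                        # square mixing matrix

# 1) log-energies from the LiNG SEM (closed form because H is acyclic)
r = rng.laplace(size=(n, d))
Y = (h0 + r) @ np.linalg.inv(np.eye(d) - H).T

# 2) sources: sigma_i = phi^{-1}(exp(y_i)) with phi(t) = t^2, times random signs
sigma = np.sqrt(np.exp(Y))
u = rng.choice([-1.0, 1.0], size=(n, d))
S = u * sigma

# 3) observations by linear mixing, Eq. (4)
X = S @ A.T
print(X.shape)   # (5000, 4)
```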
47 In our model, the optimal mapping to infer $z_i = \exp(r_i)$ from x is the linear filtering W followed by the new DN transform, Eq. (7). [sent-140, score-0.071]
48 Then, the optimal inference would be given by the divisive normalization in Eq. [sent-144, score-0.274]
49 The permutation ambiguity is more serious than in the case of ICA, because a row-permutation of H completely changes the structure of the corresponding directed graph; it is typically addressed by constraining the graph structure, as will be discussed next. [sent-157, score-0.097]
50 The other is generally referred to as LiNG [18], which allows general cyclic graphs; the "LiNG discovery" algorithm in [18] dealt with the non-identifiability of cyclic SEMs by finding multiple solutions that give the same distribution. [sent-160, score-0.254]
51 Here we define two variants of our model: One is the acyclic model, using LiNGAM. [sent-161, score-0.174]
52 The acyclic constraint thus can be simplified into a lower-triangular constraint on H. [sent-164, score-0.174]
53 Another one is the symmetric model, which uses a special case of cyclic SEM, i.e. one in which H is constrained to be symmetric. [sent-165, score-0.169]
54 This implies that non-Gaussianity is not essential for identifiability, in contrast to the acyclic model, which is not identifiable without non-Gaussianity [27]. [sent-172, score-0.174]
55 4 Maximum likelihood Let $\psi(s) := \ln\phi(|s|)$ for notational simplicity, and denote $\psi'(s) := \mathrm{sign}(s)(\ln\phi)'(|s|)$ as a convention, e.g. [sent-175, score-0.145]
56 Also, following the basic theory of ICA, we assume the disturbances have a joint probability density function (pdf) $p_r(r) = \prod_i \rho(r_i)$ with a common fixed marginal pdf ρ. [sent-178, score-0.164]
57 Then, we have the following pdf of s without any approximation (see Appendix for derivation): $p_s(s) = \frac{1}{2^d}\,|\det V| \prod_{i=1}^{d} \rho\bigl(v_i^\top \psi(s) - h_{i0}\bigr)\,|\psi'(s_i)|$. (8) [sent-179, score-0.334]
58 Each panel corresponds to a particular value of α, which determined the relative connection strength between sources. [sent-189, score-0.131]
59 The pdf of x is given by $p_x(x) = |\det W|\, p_s(Wx)$, and the corresponding loss function, $l = -\ln p_x(x) + \text{const.}$, is given by $l(x, W, V, h_0) = \bar{f}(V\psi(Wx) - h_0) + \bar{g}(Wx) - \ln|\det W| - \ln|\det V|$, (9) where $\bar{f}(r) = \sum_i f(r_i)$, $f(r_i) = -\ln\rho(r_i)$, $\bar{g}(s) = \sum_i g(s_i)$, and $g(s_i) = -\ln|\psi'(s_i)|$. [sent-192, score-0.279] [sent-193, score-0.598]
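As an illustration, the loss of Eq. (9) can be evaluated directly once ρ and ϕ are fixed; the sketch below is ours, assumes a Laplace disturbance pdf ρ(r) = ½ exp(−|r|) and ϕ(t) = t², drops additive constants, and uses hypothetical names.

```python
import numpy as np

def psi(s):
    """psi(s) = ln phi(|s|) with phi(t) = t**2, i.e. 2*ln|s| (small epsilon for stability)."""
    return 2.0 * np.log(np.abs(s) + 1e-12)

def loss(x, W, V, h0):
    """Per-sample loss of Eq. (9) up to additive constants.

    With rho(r) = 0.5*exp(-|r|), f(r) = |r|; with phi(t) = t**2,
    g(s) = -ln|psi'(s)| = ln|s| - ln 2 (the constant is dropped).
    """
    s = W @ x                        # filter outputs
    r = V @ psi(s) - h0              # recovered disturbances
    f_bar = np.sum(np.abs(r))
    g_bar = np.sum(np.log(np.abs(s) + 1e-12))
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_V = np.linalg.slogdet(V)
    return f_bar + g_bar - logdet_W - logdet_V
```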
61 It is also interesting to see that the loss function above includes an additional second term that has not appeared in previous models, which arises from the formal derivation of the pdf via the transformation of random variables. [sent-196, score-0.090]
62 In both acyclic and symmetric cases, only the lower-triangular elements in V are free parameters. [sent-201, score-0.216]
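One rough way to see how the free parameters could be packed and fitted in the acyclic case (lower-triangular V) is to hand the averaged loss of Eq. (9) to a generic optimizer; the sketch below is our own illustration for small toy problems, loss_fn can be the loss sketched above, the unit-norm constraint on the rows of W is ignored, and the paper's actual procedure is gradient-based rather than derivative-free.

```python
import numpy as np
from scipy.optimize import minimize

def fit_acyclic(X, d, loss_fn, seed=0):
    """Crude ML fitting sketch: W is unconstrained, V is lower-triangular, h0 is free."""
    tril = np.tril_indices(d)
    n_tril = len(tril[0])

    def unpack(theta):
        W = theta[:d * d].reshape(d, d)
        V = np.zeros((d, d))
        V[tril] = theta[d * d:d * d + n_tril]
        h0 = theta[d * d + n_tril:]
        return W, V, h0

    def objective(theta):
        W, V, h0 = unpack(theta)
        return np.mean([loss_fn(x, W, V, h0) for x in X])

    theta0 = np.concatenate([np.eye(d).ravel(),   # W initialized at identity
                             np.eye(d)[tril],     # V initialized at identity
                             np.zeros(d)])        # h0 initialized at zero
    res = minimize(objective, theta0, method='L-BFGS-B')   # finite-difference gradients
    return unpack(res.x)
```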
63 Figure 2: Connection weights versus pairwise differences of four properties of linear basis functions, estimated by fitting 2D Gabor functions (horizontal axis: pairwise difference, mod ±π). [sent-226, score-0.101]
64 The three methods were: 1) FastICA with the tanh nonlinearity, 2) our method (symmetric model) without energy-dependence (NoDep) initialized by FastICA, and 3) our full method (symmetric model) initialized by NoDep. [sent-230, score-0.130]
65 As preprocessing, the sample mean was subtracted and the dimensionality was reduced to 160 by principal component analysis (PCA), where 99% of the variance was retained. [sent-237, score-0.070]
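A preprocessing and initialization pipeline along these lines can be sketched with scikit-learn; X_patches is a hypothetical (n_samples × n_pixels) patch matrix, the whiten=True flag is our own addition (the paper only states mean subtraction and PCA to 160 dimensions), and fun='logcosh' is scikit-learn's counterpart of the tanh nonlinearity.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# X_patches: hypothetical (n_samples, n_pixels) matrix of natural image patches
X_centered = X_patches - X_patches.mean(axis=0)      # subtract the sample mean

# Reduce to 160 dimensions with PCA (about 99% of the variance in the paper's data)
pca = PCA(n_components=160, whiten=True)
X_reduced = pca.fit_transform(X_centered)

# Stage 1: FastICA with the tanh ('logcosh') nonlinearity to obtain an initial W
ica = FastICA(n_components=160, fun='logcosh', max_iter=500)
ica.fit(X_reduced)
W_init = ica.components_

# Stages 2 and 3 (NoDep, then the full energy-dependent model) would start from
# W_init and refine W, V, h0 by minimizing the loss of Eq. (9).
```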
66 Figure 2 shows the values of the connection weights $h_{ij}$ (after a row-wise re-scaling of V to set each $h_{ii} = 1 - v_{ii}$ to zero, as a standard convention in SEM [18]) for all d(d − 1) pairs, compared with the pairwise differences of four properties of the learned features (i.e. [sent-240, score-0.428]
67 Notice that in the DN transform (7), the positive weights learned in the SEM act as inhibitory and will suppress the energies of filters having similar properties. [sent-247, score-0.187]
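To reproduce this kind of analysis one only needs the learned weight matrix H and one fitted Gabor property per basis function; the sketch below uses our own names and wraps pairwise differences into [−π, π), matching the mod ±π convention of Figure 2.

```python
import numpy as np

def pairwise_weight_vs_property(H, prop):
    """Collect (property difference, connection weight h_ij) pairs for all i != j.

    prop is a length-d vector with one fitted Gabor property (e.g. orientation
    or phase) per basis function; differences are wrapped into [-pi, pi).
    """
    d = len(prop)
    diffs, weights = [], []
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            delta = (prop[i] - prop[j] + np.pi) % (2 * np.pi) - np.pi
            diffs.append(delta)
            weights.append(H[i, j])
    return np.array(diffs), np.array(weights)
```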
68 The original signals were measured in 204 channels (sensors) for several minutes with a sampling rate of 75 Hz; the total number of measurements, i.e. [sent-250, score-0.079]
69 Figure 3: Depiction of connection properties between learned basis functions, in a similar manner to that used in, e.g. [sent-264, score-0.148]
70 The intensities of the red and blue colors were adjusted separately in each panel; the ratio of the maximum positive and negative connection strengths is depicted at the bottom of each small panel by the relative lengths of the horizontal color bars. [sent-270, score-0.162]
71 One cluster of components, highlighted in the figure by the manually inserted yellow contour, seems to consist of components related to auditory processing. [sent-276, score-0.134]
72 The direction of influence, which we can estimate in the acyclic model, seems to be from the anterior areas to posterior ones. [sent-278, score-0.215]
73 This may be related to top-down influence, since the primary auditory cortex seems to be included in the posterior areas on the left hemisphere; at the end of the chain, the signal goes to the right hemisphere. [sent-279, score-0.165]
74 Such temporal components are typically quite difficult to find because the modulation of their energies is quite weak. [sent-280, score-0.128]
75 Our method may help in grouping such components together by analyzing the energy correlations. [sent-281, score-0.105]
76 Another cluster of components consists of low-level visual areas, highlighted by the green contour. [sent-282, score-0.118]
77 It is more difficult to interpret these interactions because the areas corresponding to the components are very close to each other. [sent-283, score-0.094]
78 It seems, however, that here the influences are mainly from the primary visual areas to the higher-order visual areas. [sent-284, score-0.151]
79 5 Conclusion We proposed a new statistical model that uses an SEM to model the energy-dependencies of latent variables in a standard linear generative model. [sent-285, score-0.115]
80 In the acyclic case, non-Gaussianity is essential for identifiability, while in the cyclic case we introduce the constraint of symmetry, which also guarantees identifiability. [sent-288, score-0.301]
81 We also provided a new generative interpretation of the DN transform based on a nonlinear SEM. [sent-289, score-0.210]
82 Our method exhibited high applicability in three experiments, with a synthetic dataset, natural images, and brain signals. [sent-290, score-0.172]
83 From the uniformity of the signs, we have $p_s(s) = p_s(Ds)$ for any $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$. [sent-292, score-0.330]
84 Then, the relation $\int_{S_1} d\sigma\, p_\sigma(\sigma) = \sum_{k=1}^{K} \int_{S_k} ds\, p_s(s) = \sum_{k=1}^{K} \int_{S_1} d\sigma\, p_s(D_k \sigma) = 2^d \int_{S_1} d\sigma\, p_s(\sigma)$ implies $p_s(s) = (1/2^d)\, p_\sigma(s)$ for any $s \in S_1$; thus $p_s(s) = (1/2^d)\, p_\sigma(|s|)$ for any $s \in \mathbb{R}^d$. [sent-296, score-0.825]
85 Now, $y = \ln\phi(\sigma)$ (for every component) and thus $p_\sigma(\sigma) = p_y(y) \prod_i |(\ln\phi)'(\sigma_i)|$, where we assume $\phi$ is differentiable. [sent-297, score-0.155]
86 Let $\psi(s) := \ln\phi(|s|)$ and $\psi'(s) := \mathrm{sign}(s)(\ln\phi)'(|s|)$. [sent-298, score-0.110]
87 Then it follows that $p_s(s) = (1/2^d)\, p_y(\psi(s)) \prod_i |\psi'(s_i)|$, where $\psi(\cdot)$ is applied component-wise. [sent-299, score-0.165]
88 Since y maps linearly to r with the absolute Jacobian $|\det V|$, we have $p_y(y) = |\det V| \prod_i \rho(r_i)$; combining it with $p_s$ above, we obtain Eq. (8). [sent-300, score-0.368]
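As a quick numerical sanity check of the 1/2^d factor (a toy example of ours, not from the paper), take sign-symmetric independent standard normals, for which the magnitudes are folded normals.

```python
import numpy as np
from scipy.stats import norm

d = 3
s = np.array([0.5, -1.2, 2.0])
p_s = np.prod(norm.pdf(s))                     # density of s (sign-symmetric)
p_sigma = np.prod(2.0 * norm.pdf(np.abs(s)))   # density of sigma = |s|
assert np.allclose(p_s, p_sigma / 2**d)        # p_s(s) = (1/2^d) p_sigma(|s|)
```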
89 Linearity and normalization in simple cells of the macaque primary visual cortex. [sent-333, score-0.173]
90 Learning horizontal connections in a sparse coding model of natural images. [sent-345, score-0.081]
91 Fast and robust fixed-point algorithms for independent component analysis. [sent-363, score-0.071]
92 Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. [sent-369, score-0.083]
93 A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. [sent-396, score-0.116]
94 Estimating functions for blind separation when sources have variance dependencies. [sent-409, score-0.130]
95 Psychophysically tuned divisive normalization approximately factorizes the PDF of natural images. [sent-439, score-0.325]
96 Characterization of neuromagnetic brain rhythms over time scales of minutes using spatial independent component analysis. [sent-466, score-0.119]
97 A linear non-Gaussian acyclic model for causal discovery. [sent-481, score-0.235]
98 Optimal coding through divisive normalization models of V1 neurons. [sent-496, score-0.304]
99 Independent component filters of natural images compared with simple cells in primary visual cortex. [sent-508, score-0.164]
100 Source separation and higher-order causal analysis of MEG and EEG. [sent-524, score-0.089]
wordName wordTfidf (topN-words)
[('sem', 0.571), ('hij', 0.235), ('ica', 0.195), ('divisive', 0.175), ('acyclic', 0.174), ('ps', 0.165), ('hyv', 0.16), ('dn', 0.139), ('cyclic', 0.127), ('si', 0.12), ('fastica', 0.118), ('ln', 0.11), ('rinen', 0.105), ('normalization', 0.099), ('ri', 0.093), ('wx', 0.092), ('connection', 0.092), ('pdf', 0.09), ('ling', 0.087), ('hoyer', 0.087), ('sems', 0.085), ('structural', 0.081), ('signals', 0.079), ('det', 0.079), ('dag', 0.079), ('meg', 0.079), ('lters', 0.078), ('energies', 0.075), ('disturbances', 0.074), ('lingam', 0.073), ('malo', 0.073), ('nodep', 0.073), ('yi', 0.073), ('zi', 0.071), ('amari', 0.07), ('ui', 0.069), ('mixing', 0.069), ('signs', 0.067), ('nonlinear', 0.065), ('tanh', 0.062), ('causal', 0.061), ('disturbance', 0.059), ('latent', 0.059), ('sources', 0.057), ('generative', 0.056), ('components', 0.053), ('auditory', 0.052), ('energy', 0.052), ('identi', 0.052), ('natural', 0.051), ('transform', 0.051), ('yj', 0.049), ('ramkumar', 0.049), ('brain', 0.048), ('gabor', 0.048), ('blind', 0.045), ('py', 0.045), ('applicability', 0.045), ('equations', 0.044), ('diag', 0.044), ('patrik', 0.043), ('symmetric', 0.042), ('areas', 0.041), ('pairwise', 0.04), ('panel', 0.039), ('component', 0.039), ('helsinki', 0.039), ('transposed', 0.039), ('karklin', 0.039), ('kyoto', 0.039), ('emergence', 0.038), ('primary', 0.038), ('interpretation', 0.038), ('visual', 0.036), ('ability', 0.036), ('likelihood', 0.035), ('graph', 0.035), ('vij', 0.035), ('initialized', 0.034), ('signal', 0.034), ('permutation', 0.034), ('vy', 0.033), ('weights', 0.033), ('independent', 0.032), ('topographic', 0.032), ('dynamical', 0.031), ('adjusted', 0.031), ('subtracted', 0.031), ('lter', 0.031), ('mod', 0.03), ('coding', 0.03), ('outputs', 0.029), ('orientations', 0.029), ('highlighted', 0.029), ('source', 0.029), ('directed', 0.028), ('separation', 0.028), ('exhibited', 0.028), ('basis', 0.028), ('learned', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
2 0.16624776 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
3 0.15754573 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
4 0.13306648 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
Author: Yan Karklin, Eero P. Simoncelli
Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.
5 0.11601163 194 nips-2011-On Causal Discovery with Cyclic Additive Noise Models
Author: Joris M. Mooij, Dominik Janzing, Tom Heskes, Bernhard Schölkopf
Abstract: We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data. 1
6 0.10689396 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
7 0.091600202 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
8 0.086763754 276 nips-2011-Structured sparse coding via lateral inhibition
9 0.079331271 244 nips-2011-Selecting Receptive Fields in Deep Networks
10 0.075861119 68 nips-2011-Demixed Principal Component Analysis
11 0.07228931 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
12 0.07151752 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis
13 0.069442652 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
14 0.067075014 217 nips-2011-Practical Variational Inference for Neural Networks
15 0.06420213 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
16 0.063841991 15 nips-2011-A rational model of causal inference with continuous causes
17 0.06309031 302 nips-2011-Variational Learning for Recurrent Spiking Networks
18 0.060921088 180 nips-2011-Multiple Instance Filtering
19 0.060543485 288 nips-2011-Thinning Measurement Models and Questionnaire Design
20 0.060415383 259 nips-2011-Sparse Estimation with Structured Dictionaries
topicId topicWeight
[(0, 0.208), (1, 0.07), (2, 0.068), (3, -0.001), (4, 0.003), (5, 0.046), (6, 0.021), (7, 0.106), (8, 0.001), (9, -0.048), (10, -0.078), (11, -0.112), (12, 0.058), (13, -0.081), (14, 0.05), (15, -0.001), (16, 0.064), (17, -0.071), (18, 0.059), (19, 0.018), (20, -0.081), (21, -0.086), (22, 0.065), (23, 0.113), (24, 0.016), (25, 0.061), (26, 0.013), (27, 0.048), (28, -0.152), (29, -0.009), (30, -0.067), (31, -0.001), (32, 0.159), (33, -0.039), (34, -0.142), (35, 0.01), (36, 0.084), (37, 0.03), (38, 0.049), (39, 0.025), (40, 0.024), (41, -0.061), (42, -0.026), (43, -0.064), (44, 0.03), (45, 0.147), (46, -0.092), (47, 0.014), (48, -0.015), (49, -0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.92438388 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
2 0.65815395 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
3 0.59381449 194 nips-2011-On Causal Discovery with Cyclic Additive Noise Models
Author: Joris M. Mooij, Dominik Janzing, Tom Heskes, Bernhard Schölkopf
Abstract: We study a particular class of cyclic causal models, where each variable is a (possibly nonlinear) function of its parents and additive noise. We prove that the causal graph of such models is generically identifiable in the bivariate, Gaussian-noise case. We also propose a method to learn such models from observational data. In the acyclic case, the method reduces to ordinary regression, but in the more challenging cyclic case, an additional term arises in the loss function, which makes it a special case of nonlinear independent component analysis. We illustrate the proposed method on synthetic data. 1
4 0.54205924 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
Author: Yan Karklin, Eero P. Simoncelli
Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.
5 0.52859133 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
6 0.51795471 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
7 0.49370384 15 nips-2011-A rational model of causal inference with continuous causes
8 0.47101742 68 nips-2011-Demixed Principal Component Analysis
9 0.42882609 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
10 0.42694119 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
11 0.42141944 276 nips-2011-Structured sparse coding via lateral inhibition
12 0.41828328 130 nips-2011-Inductive reasoning about chimeric creatures
13 0.40631905 244 nips-2011-Selecting Receptive Fields in Deep Networks
14 0.383057 123 nips-2011-How biased are maximum entropy models?
15 0.36869061 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
16 0.36690605 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis
17 0.36606258 109 nips-2011-Greedy Model Averaging
19 0.36453944 180 nips-2011-Multiple Instance Filtering
20 0.35956633 225 nips-2011-Probabilistic amplitude and frequency demodulation
topicId topicWeight
[(0, 0.022), (4, 0.034), (13, 0.07), (20, 0.038), (26, 0.04), (31, 0.103), (33, 0.013), (38, 0.015), (43, 0.096), (45, 0.063), (57, 0.044), (65, 0.064), (71, 0.118), (74, 0.067), (83, 0.061), (84, 0.026), (99, 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.8656798 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
2 0.82058132 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search
Author: Joel Veness, Marc Lanctot, Michael Bowling
Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion. 1
3 0.76204336 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
4 0.75995684 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
Author: Alyson K. Fletcher, Sundeep Rangan, Lav R. Varshney, Aniruddha Bhargava
Abstract: Many functional descriptions of spiking neurons assume a cascade structure where inputs are passed through an initial linear filtering stage that produces a lowdimensional signal that drives subsequent nonlinear stages. This paper presents a novel and systematic parameter estimation procedure for such models and applies the method to two neural estimation problems: (i) compressed-sensing based neural mapping from multi-neuron excitation, and (ii) estimation of neural receptive fields in sensory neurons. The proposed estimation algorithm models the neurons via a graphical model and then estimates the parameters in the model using a recently-developed generalized approximate message passing (GAMP) method. The GAMP method is based on Gaussian approximations of loopy belief propagation. In the neural connectivity problem, the GAMP-based method is shown to be computational efficient, provides a more exact modeling of the sparsity, can incorporate nonlinearities in the output and significantly outperforms previous compressed-sensing methods. For the receptive field estimation, the GAMP method can also exploit inherent structured sparsity in the linear weights. The method is validated on estimation of linear nonlinear Poisson (LNP) cascade models for receptive fields of salamander retinal ganglion cells. 1
5 0.75695246 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
Author: Yan Karklin, Eero P. Simoncelli
Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.
6 0.75211936 75 nips-2011-Dynamical segmentation of single trials from population neural data
7 0.7502284 258 nips-2011-Sparse Bayesian Multi-Task Learning
8 0.74647969 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
9 0.74637657 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
10 0.74185234 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
11 0.74144256 276 nips-2011-Structured sparse coding via lateral inhibition
12 0.74107707 219 nips-2011-Predicting response time and error rates in visual search
13 0.74048311 86 nips-2011-Empirical models of spiking in neural populations
14 0.73847234 102 nips-2011-Generalised Coupled Tensor Factorisation
15 0.73837709 301 nips-2011-Variational Gaussian Process Dynamical Systems
16 0.73565912 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
17 0.73328656 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
18 0.73268992 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines
19 0.73205739 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
20 0.73193038 158 nips-2011-Learning unbelievable probabilities