nips nips2007 nips2007-138 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that is not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
Reference: text
sentIndex sentText sentNum sentScore
1 Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. [sent-3, score-0.665]
2 Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. [sent-6, score-0.409]
3 Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. [sent-7, score-0.487]
4 We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. [sent-8, score-0.482]
5 1 Introduction A core issue in sensory coding is to seek out and model statistical regularities in high-dimensional data. [sent-10, score-0.155]
6 In particular, motivated by developments in information theory, it has been hypothesized that modeling these regularities by means of redundancy reduction constitutes an important goal of early visual processing [2]. [sent-11, score-0.123]
7 Recent studies conjectured that the binary spike responses of retinal ganglion cells may be characterized completely in terms of second-order correlations when using a maximum entropy approach [13, 12]. [sent-12, score-0.702]
8 In light of what we know about the statistics of the visual input, however, this would be very surprising: Natural images are known to exhibit complex higher-order correlations which are extremely difficult to model yet perceptually relevant. [sent-13, score-0.312]
9 Thus, if we assume that retinal ganglion cells do not discard the information underlying these higher-order correlations altogether, it would be a very difficult signal-processing task to remove all of them already within the retinal network. [sent-14, score-0.348]
10 For such simple neuron models, the possibility of removing higher-order correlations present in the input is very limited [3]. [sent-16, score-0.157]
11 Here, we study the role of second-order correlations in the multivariate binary output statistics of such linear-nonlinear model neurons with a threshold nonlinearity responding to natural images. [sent-17, score-0.464]
12 A: Up to 10 dimensions we can compute HDG directly by evaluating Eq. [sent-22, score-0.112]
13 ∆H as a function of the sample size used to estimate HDG, at seven (black) and ten (grey) dimensions (note log scale on both axes). [sent-34, score-0.112]
14 (B) The same model can also be used more generally to fit multivariate binary data with given pairwise correlations, if x is drawn from a Gaussian distribution. [sent-37, score-0.254]
15 In particular, we will show that the resulting distribution closely resembles the binary maximum entropy models known as Ising models or Boltzmann machines which have recently become popular for the analysis of spike train recordings from retinal ganglion cell responses [13, 12]. [sent-38, score-0.602]
16 If we suppose that pairwise interactions are enough, what can we say about the amount of redundancy in high-dimensional data? [sent-40, score-0.139]
17 In comparison with neural spike data, natural images provide two advantages for studying these questions: 1) It is much easier to obtain large amounts of data with millions of samples which are less prone to nonstationarities. [sent-41, score-0.197]
18 2) Often differences in the higher-order statistics such as between pink noise and natural images can be recognized by eye. [sent-42, score-0.182]
19 2 Second-order models for binary variables In order to study whether pairwise interactions are enough to determine the statistical regularities in high-dimensional data, it is necessary to be able to compute the maximum entropy distribution for a large number of dimensions N. [sent-43, score-0.707]
20 Given a set of measured statistics, maximum entropy models yield a full probability distribution that is consistent with these constraints but does not impose any additional structure. [sent-44, score-0.335]
21 For binary data with given mean activations $\mu_i = \langle s_i \rangle$ and correlations between neurons $\Sigma_{ij} = \langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle$, one obtains a quadratic exponential probability mass function known as the Ising model in physics or as the Boltzmann machine in machine learning. [sent-54, score-0.78]
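To make the quadratic exponential form concrete, here is a minimal sketch (not from the paper) that evaluates the Ising/Boltzmann probability mass function $P(s) \propto \exp(h^\top s + \tfrac{1}{2} s^\top J s)$ by brute-force enumeration; this is only feasible for small m, which is exactly why an alternative is needed in high dimensions. All names are illustrative, and the standard deviation 0.5 of the random parameters is a placeholder (the value in the text is truncated).

```python
# Sketch (illustrative, not from the paper): brute-force evaluation of the
# quadratic exponential (Ising / Boltzmann) pmf over states s in {-1, +1}^m.
import itertools
import numpy as np

def ising_pmf(h, J):
    """Enumerate all 2^m states and return them with their normalized probabilities."""
    m = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=m)))
    # unnormalized log-probability h^T s + 0.5 * s^T J s for every state
    logp = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
    logp -= logp.max()                      # avoid overflow before exponentiating
    p = np.exp(logp)
    return states, p / p.sum()

# small random network, similar in spirit to the random-connectivity experiments below
rng = np.random.default_rng(0)
m = 5
h = rng.normal(0.0, 0.5, size=m)            # 0.5 is a placeholder value
J = rng.normal(0.0, 0.5, size=(m, m))
J = (J + J.T) / 2.0                          # symmetric couplings
np.fill_diagonal(J, 0.0)
states, p = ising_pmf(h, J)
print("entropy H_I (bits):", -(p * np.log2(p)).sum())
```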
22 Currently all methods used to determine the parameters of such binary maximum entropy models suffer from the same drawback: since the parameters do not correspond directly to any of the measured statistics, they have to be inferred (or ‘learned’) from data. [sent-55, score-0.366]
23 In high dimensions though, this poses a difficult computational problem. [sent-56, score-0.112]
24 To make the maximum entropy approach feasible in high dimensions, we propose a new strategy: Sampling from a ‘near-maximum’ entropy model that does not require any complicated learning of parameters. [sent-58, score-0.637]
25 In order to justify this approach, we verify empirically that the entropies of the full probability distributions obtained with the near-maximum entropy model are indistinguishable from those obtained with classical methods such as Gibbs sampling for up to 20 dimensions. [sent-59, score-0.695]
26 Therefore, one has to resort to an optimization approach to learn the model parameters hi and Jij from data. [sent-63, score-0.127]
27 Figure 3: Random samples of dichotomized 4x4 patches from the van Hateren image database (left) and from the corresponding dichotomized Gaussian distribution with equal covariance matrix (middle). [sent-67, score-0.93]
28 Since Eq. (1) is not available in closed form, Monte-Carlo methods such as Gibbs sampling are employed [9] in order to approximate the required model average. [sent-70, score-0.105]
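The following sketch (assumed details, illustrative names) shows the kind of single-site Gibbs sampler referred to here, which draws approximate samples from the Ising model when the model averages cannot be computed exactly.

```python
# Sketch of single-site Gibbs sampling from the Ising model defined by (h, J);
# burn-in and thinning values are arbitrary illustrative choices.
import numpy as np

def gibbs_sample_ising(h, J, n_samples, burn_in=1000, thin=10, seed=0):
    rng = np.random.default_rng(seed)
    m = len(h)
    s = rng.choice([-1, 1], size=m)
    out = []
    for t in range(burn_in + n_samples * thin):
        for i in range(m):
            # conditional distribution of s_i given the rest:
            # log P(+1)/P(-1) = 2 * (h_i + sum_j J_ij s_j), diagonal excluded
            field = h[i] + J[i] @ s - J[i, i] * s[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1 if rng.random() < p_plus else -1
        if t >= burn_in and (t - burn_in) % thin == 0:
            out.append(s.copy())
    return np.array(out)
```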
29 2 Modeling with the dichotomized Gaussian Here we explore an intriguing alternative to the Monte-Carlo approach: We replace the Ising model by a 'near-maximum' entropy model, for which both parameter computation and sampling are easy. [sent-76, score-0.704]
30 A very convenient, but in this context rarely recognized, candidate model is the dichotomized Gaussian distribution (DG) [11, 5, 4]. [sent-77, score-0.387]
31 It is obtained by supposing that the observed binary vector $s$ is generated from a hidden Gaussian variable $z \sim \mathcal{N}(\gamma, \Lambda)$, with $s_i = \mathrm{sgn}(z_i)$. [sent-78, score-0.149]
32 $P(s) = \int_{a_1}^{b_1} \cdots \int_{a_m}^{b_m} \frac{1}{\sqrt{(2\pi)^m |\Lambda|}} \exp\left(-\tfrac{1}{2}(z - \gamma)^\top \Lambda^{-1} (z - \gamma)\right) dz$ (6), where the integration limits are chosen as $a_i = 0$ and $b_i = \infty$ if $s_i = 1$, and $a_i = -\infty$ and $b_i = 0$ otherwise. [sent-89, score-0.126]
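A minimal sketch of this construction, under the assumptions that s_i is in {-1, +1} and that the latent Gaussian has unit variances: the latent means follow from the univariate Gaussian CDF, each latent correlation is found by one-dimensional root finding against a bivariate orthant probability, and sampling reduces to thresholding draws from the latent Gaussian. Function names are illustrative, not from the paper.

```python
# Sketch of the DG parameter computation and sampling described above.
# Assumptions: s_i in {-1, +1}, latent z has unit variances; names are illustrative.
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def dg_mean(mu):
    """Latent means gamma_i such that E[sgn(z_i)] = mu_i."""
    return norm.ppf((np.asarray(mu, dtype=float) + 1.0) / 2.0)

def dg_pair_corr(gamma_i, gamma_j, target_second_moment):
    """Latent correlation lambda such that E[s_i s_j] matches the target (must be achievable)."""
    def moment(lam):
        cov = [[1.0, lam], [lam, 1.0]]
        # P(z_i < 0, z_j < 0) and P(z_i > 0, z_j > 0) from the bivariate normal CDF
        p_mm = multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([-gamma_i, -gamma_j])
        p_pp = 1.0 - norm.cdf(-gamma_i) - norm.cdf(-gamma_j) + p_mm
        return 2.0 * (p_pp + p_mm) - 1.0 - target_second_moment
    return brentq(moment, -0.999, 0.999)

def dg_sample(gamma, Lambda, n, seed=0):
    """Draw n binary vectors by thresholding the latent Gaussian at zero."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(gamma, Lambda, size=n)
    return np.where(z > 0, 1, -1)
```

For a full covariance matrix, dg_pair_corr would be applied to every pair of units; in practice the resulting latent correlation matrix may need a small correction towards the nearest positive-definite matrix before sampling.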
33 3 Near-maximum entropy behavior of the dichotomized Gaussian distribution In the previous section we introduced the dichotomized Gaussian distribution. [sent-91, score-0.941]
34 For a wide range of interaction terms and mean activations we verify that the DG model closely resembles the Ising model. [sent-94, score-0.14]
35 In particular we show that the entropy of the DG distribution is not substantially smaller than the entropy of the Ising model, even at rather high dimensions. [sent-95, score-0.637]
36 1 Random Connectivity We created randomly connected networks of varying size m, where mean activations hi and interaction terms Jij were drawn from N (0, 0. [sent-97, score-0.173]
37 First, we compared the entropy $H_I = -\sum_s P_I(s) \log_2 P_I(s)$ of the thus specified Ising model obtained by evaluating Eq. [sent-99, score-0.328]
38 1 with the entropy of the DG distribution HDG computed by numerical integration from Eq. [sent-100, score-0.341]
39 The entropy difference ∆H = HI − HDG was smaller than 0. [sent-102, score-0.283]
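The kind of numerical integration meant here can be sketched as follows (assumed conventions as above, names illustrative): every binary state's DG probability is an orthant integral of the latent Gaussian, which is tractable for small m, and the entropy then follows directly.

```python
# Sketch: exact DG state probabilities for small m via Gaussian orthant
# integrals, and the resulting entropy H_DG in bits.
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def dg_pmf(gamma, Lambda):
    gamma = np.asarray(gamma, dtype=float)
    m = len(gamma)
    states = np.array(list(itertools.product([-1, 1], repeat=m)))
    probs = []
    for s in states:
        D = np.diag(s.astype(float))
        # P(sign(z) = s) rewritten as a lower-orthant probability of a sign-flipped Gaussian
        flipped = multivariate_normal(mean=-D @ gamma, cov=D @ Lambda @ D)
        probs.append(flipped.cdf(np.zeros(m)))
    p = np.array(probs)
    return states, p / p.sum()               # renormalize away quadrature error

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```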
40 We find that DJS [PI PDG ] is extremely small up to 10 dimensions (Fig. [sent-106, score-0.112]
41 Since evaluating Eq. 6 becomes too time-consuming for m → 20 due to the large number of states, we used a histogram-based estimate of PDG (using 3 · 10^6 samples for m < 15 and 15 · 10^6 samples for m ≥ 15). [sent-111, score-0.098]
42 The estimate of ∆H is still very small at high dimensions (Fig. [sent-112, score-0.112]
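A sketch of the histogram-based estimate and of the Jensen-Shannon divergence used for such comparisons (illustrative, assuming samples with entries in {-1, +1}):

```python
# Sketch: histogram estimate of a binary pmf from samples and the
# Jensen-Shannon divergence (in bits) between two such estimates.
import numpy as np

def histogram_pmf(samples):
    """samples: (n, m) array with entries in {-1, +1}."""
    m = samples.shape[1]
    # encode each binary pattern as an integer index into the 2^m histogram bins
    idx = ((samples > 0).astype(np.int64) @ (1 << np.arange(m)))
    counts = np.bincount(idx, minlength=2 ** m)
    return counts / counts.sum()

def js_divergence_bits(p, q):
    mix = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float((a[mask] * np.log2(a[mask] / b[mask])).sum())
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
```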
43 2 Specified covariance structure To explore the relationship between the two techniques more systematically, we generated covariance matrices with varying eigenvalue spectra. [sent-123, score-0.155]
44 We varied the decay parameter α, which led to a widely varying covariance structure (For eigenvalue spectra, see Fig. [sent-126, score-0.101]
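One way to generate such covariance matrices is sketched below; the exact construction and normalization used in the paper are not given here, so this is only an assumed variant in which the eigenvalue spectrum decays as a power law controlled by α.

```python
# Sketch: random covariance with a power-law eigenvalue spectrum k^(-alpha);
# larger alpha gives a faster decay and hence a more structured covariance.
import numpy as np

def random_covariance(m, alpha, seed=0):
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(m, m)))     # random orthonormal eigenbasis
    eigvals = np.arange(1, m + 1, dtype=float) ** (-alpha)
    eigvals *= m / eigvals.sum()                     # normalize so the mean variance is 1
    return (Q * eigvals) @ Q.T                       # Q diag(eigvals) Q^T
```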
45 The covariance matrix of the samples drawn from the Ising model resembles the original very closely (Fig. [sent-129, score-0.179]
46 We also computed the entropy of the DG model using the desired covariance structure. [sent-131, score-0.382]
47 We estimated ∆H and DJS [PG PDG ] averaged over 10 trials with 10^5 samples obtained by Gibbs sampling from the Ising model. [sent-132, score-0.109]
48 Our experiments demonstrate clearly that the dichotomized Gaussian distribution constitutes a good approximation to the quadratic exponential distribution for a large parameter range. [sent-140, score-0.393]
49 In the following section, we will exploit the similarity between the two models to study how the role of second-order correlations may change between low-dimensional and high-dimensional statistics in case of natural images. [sent-141, score-0.275]
50 Similar to the analysis in [12], we observe power-law behavior of the entropy of the independent model (black solid line) and of the multi-information. [sent-150, score-0.41]
51 Linear extrapolation (in the log-log plot) to higher dimensions is indicated by dashed lines. [sent-151, score-0.228]
52 C: A different presentation of the same data as in B: the joint entropy H = Hindep − I (blue dots) is plotted instead of I, and the axes are in linear scale. [sent-152, score-0.331]
53 The dashed red line represents the same extrapolation as in B. [sent-153, score-0.15]
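The extrapolation procedure referred to in the caption is simple; a sketch (illustrative names) of fitting a line in log-log coordinates and evaluating it at larger dimensions:

```python
# Sketch: power-law extrapolation by a linear fit in log-log coordinates,
# as used for the dashed lines described above.
import numpy as np

def loglog_extrapolate(dims, values, target_dims):
    slope, intercept = np.polyfit(np.log(dims), np.log(values), 1)
    return np.exp(intercept) * np.asarray(target_dims, dtype=float) ** slope
```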
54 4 Natural images: Second order and beyond We now investigate to what extent the statistics of natural images with dichotomized pixel intensities can be characterized by pairwise correlations only. [sent-154, score-0.75]
55 In particular, we would like to know how the role of pairwise correlations, as opposed to higher-order correlations, changes depending on the dimensionality. [sent-155, score-0.448]
56 Thanks to the DG model introduced above, we are in a position to study the effect of pairwise correlations for high-dimensional binary random variables (N ≈ 1000 or even larger). [sent-156, score-0.389]
57 We use the van Hateren image database in log-intensity scale, from which we sample small image patches at random positions. [sent-157, score-0.198]
58 That is, each binary variable encodes whether the corresponding pixel intensity is above or below the median over the ensemble. [sent-159, score-0.12]
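A sketch of this preprocessing step follows; the image loading itself is left out, since the van Hateren data are not part of any standard library, so `images` below is a hypothetical list of 2-D log-intensity arrays.

```python
# Sketch: sample patches at random positions and dichotomize every pixel at the
# median taken over the whole ensemble, yielding binary vectors in {-1, +1}.
import numpy as np

def dichotomized_patches(images, patch_size, n_patches, seed=0):
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch_size + 1)
        c = rng.integers(img.shape[1] - patch_size + 1)
        patches.append(img[r:r + patch_size, c:c + patch_size].ravel())
    X = np.array(patches)
    median = np.median(X)                  # single median over the whole ensemble
    return np.where(X > median, 1, -1)
```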
59 Up to patch sizes of 4 × 4 pixels, the true joint statistics can be assessed using nonparametric histogram methods. [sent-160, score-0.14]
60 3, left), from the DG model with the same mean and covariance (Fig. [sent-162, score-0.099]
61 By visual inspection, it seems that the DG model fits the true distribution well. [sent-165, score-0.143]
62 In order to quantify how well the DG model matches the true distribution, we draw two independent sets of samples from each (N = 2 · 10^6 for each set) and generate a scatter plot as shown in Fig. [sent-166, score-0.141]
63 The relative frequencies of these patterns according to the DG model (red dots) and according to the independent model (blue dots) are plotted against the relative frequencies obtained from the natural image patches. [sent-169, score-0.2]
64 The solid diagonal line corresponds to a perfect match between model and ground truth. [sent-170, score-0.117]
65 Since most of the red dots fall within this region, the DG model fits the data distribution very well. [sent-172, score-0.182]
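The independent-model baseline mentioned above (blue dots) is just the product of the marginals; a sketch (illustrative names) of how its pattern probabilities can be computed for comparison with the empirical relative frequencies:

```python
# Sketch: product-of-marginals (independent) model over all 2^m patterns,
# estimated from samples with entries in {-1, +1}.
import itertools
import numpy as np

def independent_model_pmf(samples):
    m = samples.shape[1]
    p_plus = (samples > 0).mean(axis=0)                    # marginal P(s_i = +1)
    states = np.array(list(itertools.product([-1, 1], repeat=m)))
    on = states > 0
    return states, np.prod(np.where(on, p_plus, 1.0 - p_plus), axis=1)
```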
66 Then we incrementally added more pixels at random locations until the random vector contained all 16 pixels of the 4 × 4 image patches. [sent-175, score-0.104]
67 For two independent sets of samples, both drawn from natural image data, the JS-divergence ranges between 0. [sent-181, score-0.138]
68 007 bits for 4 × 4 patches, setting the gold standard for the minimal possible JS-divergence one could achieve with any model due to finite sample size. [sent-183, score-0.222]
69 Figure 5: Random samples of dichotomized 32x32 patches from the van Hateren image database (left) and from the corresponding dichotomized Gaussian distribution with equal covariance matrix (right). [sent-185, score-0.909]
70 This striking difference is not obvious, however, at the level of 4x4 patches, for which we found an excellent match of the dichotomized Gaussian to the ensemble of natural images. [sent-187, score-0.385]
71 Furthermore, the multi-information of the DG model (red solid line) and of the true distribution (blue dots) increases linearly on a log-log scale with the number of dimensions (Fig. [sent-188, score-0.224]
72 Both findings can be verified only up to a rather limited number of dimensions (less than 20). [sent-190, score-0.112]
73 Using natural images instead of retinal ganglion cell data, we would like to verify to what extent the low-dimensional observations can be used to support these claims about the high-dimensional statistics [10]. [sent-192, score-0.296]
74 To this end we study the same kind of extrapolation (Fig. [sent-193, score-0.106]
75 4 B) to higher dimensions (dashed lines) as in [12]. [sent-194, score-0.112]
76 The difference between the entropy of the independent model and the multi-information yields the joint entropy of the respective distribution. [sent-195, score-0.638]
77 If the extrapolation is taken seriously, this difference seems to vanish at the order of 50 dimensions, suggesting that the joint entropy of the neural responses approaches zero at this size, say for 7 × 7 image patches (Fig. [sent-196, score-0.689]
78 First of all, the joint entropy of a distribution can never be smaller than the joint entropy of any of its marginals. [sent-200, score-0.646]
79 Therefore, the joint entropy cannot decrease with an increasing number of dimensions as the extrapolation would suggest (Fig. [sent-201, score-0.508]
80 Instead, it would be necessary to ask more precisely how the growth rate of the joint entropy can be characterized and whether there is a critical number of dimensions at which the growth rate suddenly drops. [sent-203, score-0.491]
81 In our study with natural images, visual inspection does not indicate that anything special happens at the ‘critical patch size’ of 7 × 7 pixels. [sent-204, score-0.145]
82 Rather, for all patch sizes, the DG model yields dichotomized pink noise. [sent-205, score-0.428]
83 In Fig. 5 (right) we show a sample from the DG model for 32×32 image patches (i. [sent-207, score-0.172]
84 The exact law according to which the multi-information grows with the number of dimensions for large m, however, is not easily assessed and remains to be explored. [sent-210, score-0.177]
85 Finally, we point out that the sufficiency of pairwise correlations at the level of m = 16 dimensions no longer holds in the case of large m: the samples from the true distribution at the left-hand side of Fig. [sent-211, score-0.476]
86 5, right), indicating that pairwise correlations do not suffice to determine the full statistics of large image patches. [sent-213, score-0.352]
87 Even if the match between the DG model and the Ising model turns out to be less accurate in high dimensions, this would not affect our conclusion. [sent-214, score-0.12]
88 Any mismatch would only introduce more order in the DG model than is justified by pairwise correlations alone. [sent-215, score-0.312]
89 We verified numerically that the empirical entropy of the DG model is comparable to that obtained with Gibbs sampling at least up to 20 dimensions. [sent-217, score-0.388]
90 For practical purposes, the DG distribution can even be superior to the Gibbs sampler in terms of entropy maximization due to the lack of independence between consecutive samples in the Gibbs sampler. [sent-218, score-0.358]
91 Although the Ising model and the DG model are in principle different, the match between the two turns out to be surprisingly good for a large region of the parameter space. [sent-219, score-0.12]
92 In addition, we explore the possibility of using the dichotomized Gaussian distribution as a proposal density for Monte-Carlo methods such as importance sampling. [sent-221, score-0.342]
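A sketch of what such an importance sampler could look like (illustrative and self-normalized; it assumes DG samples together with their log-probabilities under the proposal, e.g. from the orthant-integral sketch earlier, and an Ising target given by h and J):

```python
# Sketch: self-normalized importance sampling of an Ising expectation E_P[f]
# using the DG as proposal; log_q holds the DG log-probability of each sample.
import numpy as np

def ising_expectation_is(f, samples, log_q, h, J):
    s = np.where(samples > 0, 1, -1)
    log_target = s @ h + 0.5 * np.einsum('ki,ij,kj->k', s, J, s)   # unnormalized Ising log-prob
    logw = log_target - log_q
    w = np.exp(logw - logw.max())                                  # stabilize before normalizing
    w /= w.sum()
    return float(w @ np.array([f(x) for x in s]))
```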
93 In summary, by linking the DG model to the Ising model, we believe that maximum entropy modeling of multivariate binary random variables will become much more practical in the future. [sent-223, score-0.453]
94 We used the DG model to investigate the role of second-order correlations in the context of sensory coding of natural images. [sent-224, score-0.331]
95 While for small image patches the DG model provided an excellent fit to the true distribution, we were able to show that this agreement breaks down in the case of larger image patches. [sent-225, score-0.244]
96 Thus caution is required when extrapolating from low-dimensional measurements to higher-dimensional distributions because higher-order correlations may be invisible in low-dimensional marginal distributions. [sent-226, score-0.18]
97 Nevertheless, the maximum entropy approach seems to be a promising tool for the analysis of correlated neural activities, and the DG model can facilitate its use significantly in practice. [sent-227, score-0.379]
98 Factorial coding of natural images: How effective are linear models in removing higher-order dependencies? [sent-250, score-0.115]
99 On some models for multivariate binary variables parallel in complexity with the multivariate Gaussian distribution. [sent-260, score-0.175]
100 Weak pairwise correlations imply strongly correlated network states in a neural population. [sent-299, score-0.267]
wordName wordTfidf (topN-words)
[('dg', 0.652), ('dichotomized', 0.316), ('ising', 0.29), ('entropy', 0.283), ('pdg', 0.158), ('correlations', 0.157), ('dimensions', 0.112), ('pairwise', 0.11), ('djs', 0.099), ('jij', 0.099), ('si', 0.092), ('extrapolation', 0.086), ('hi', 0.082), ('sj', 0.08), ('hdg', 0.079), ('dots', 0.077), ('patches', 0.077), ('pi', 0.076), ('ij', 0.068), ('retinal', 0.066), ('boltzmann', 0.064), ('sampling', 0.06), ('ganglion', 0.059), ('binary', 0.057), ('covariance', 0.054), ('image', 0.05), ('images', 0.05), ('samples', 0.049), ('gibbs', 0.048), ('hateren', 0.047), ('model', 0.045), ('neurons', 0.045), ('regularities', 0.044), ('pixel', 0.043), ('law', 0.042), ('multivariate', 0.042), ('bits', 0.04), ('oftentimes', 0.04), ('undersampling', 0.04), ('activations', 0.04), ('natural', 0.039), ('statistics', 0.035), ('sensory', 0.035), ('integration', 0.034), ('gaussian', 0.034), ('pink', 0.034), ('schneidman', 0.034), ('red', 0.034), ('studying', 0.034), ('blue', 0.033), ('patch', 0.033), ('numerical', 0.032), ('bivariate', 0.031), ('coding', 0.031), ('resembles', 0.031), ('dashed', 0.03), ('match', 0.03), ('redundancy', 0.029), ('berry', 0.029), ('interactions', 0.029), ('responses', 0.029), ('inspection', 0.028), ('sgn', 0.028), ('spectra', 0.028), ('twenty', 0.028), ('joint', 0.027), ('pixels', 0.027), ('distribution', 0.026), ('dkl', 0.026), ('maximum', 0.026), ('constitutes', 0.025), ('scatter', 0.025), ('visual', 0.025), ('seems', 0.025), ('eigenvalue', 0.025), ('spike', 0.025), ('growth', 0.024), ('recognized', 0.024), ('zk', 0.024), ('role', 0.024), ('verify', 0.024), ('claims', 0.023), ('assessed', 0.023), ('ground', 0.023), ('measurements', 0.023), ('true', 0.022), ('varying', 0.022), ('systematically', 0.022), ('plotted', 0.021), ('van', 0.021), ('closed', 0.021), ('power', 0.021), ('critical', 0.021), ('character', 0.021), ('activities', 0.021), ('biometrika', 0.021), ('qualitatively', 0.02), ('study', 0.02), ('median', 0.02), ('solid', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that is not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
2 0.18960644 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
3 0.087733008 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
Author: Omer Bobrowski, Ron Meir, Shy Shoham, Yonina Eldar
Abstract: It is becoming increasingly evident that organisms acting in uncertain dynamical environments often employ exact or approximate Bayesian statistical calculations in order to continuously estimate the environmental state, integrate information from multiple sensory modalities, form predictions and choose actions. What is less clear is how these putative computations are implemented by cortical neural networks. An additional level of complexity is introduced because these networks observe the world through spike trains received from primary sensory afferents, rather than directly. A recent line of research has described mechanisms by which such computations can be implemented using a network of neurons whose activity directly represents a probability distribution across the possible “world states”. Much of this work, however, uses various approximations, which severely restrict the domain of applicability of these implementations. Here we make use of rigorous mathematical results from the theory of continuous time point process filtering, and show how optimal real-time state estimation and prediction may be implemented in a general setting using linear neural networks. We demonstrate the applicability of the approach with several examples, and relate the required network properties to the statistical nature of the environment, thereby quantifying the compatibility of a given network with its environment. 1
4 0.083479419 141 nips-2007-New Outer Bounds on the Marginal Polytope
Author: David Sontag, Tommi S. Jaakkola
Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1
5 0.074223146 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
Author: Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
6 0.072825633 164 nips-2007-Receptive Fields without Spike-Triggering
7 0.071037441 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
8 0.070600837 36 nips-2007-Better than least squares: comparison of objective functions for estimating linear-nonlinear models
9 0.066019565 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
10 0.064001247 182 nips-2007-Sparse deep belief net model for visual area V2
11 0.063074328 145 nips-2007-On Sparsity and Overcompleteness in Image Models
12 0.061390683 115 nips-2007-Learning the 2-D Topology of Images
13 0.058220055 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
14 0.054348484 183 nips-2007-Spatial Latent Dirichlet Allocation
15 0.05377081 26 nips-2007-An online Hebbian learning rule that performs Independent Component Analysis
16 0.050086271 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
17 0.047180064 113 nips-2007-Learning Visual Attributes
18 0.046353836 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
19 0.044735968 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes
20 0.042234454 135 nips-2007-Multi-task Gaussian Process Prediction
topicId topicWeight
[(0, -0.167), (1, 0.075), (2, 0.066), (3, -0.042), (4, -0.01), (5, -0.027), (6, -0.053), (7, 0.089), (8, 0.053), (9, -0.013), (10, 0.052), (11, 0.039), (12, 0.03), (13, 0.008), (14, 0.029), (15, 0.032), (16, 0.083), (17, 0.105), (18, -0.017), (19, 0.01), (20, -0.066), (21, 0.09), (22, 0.087), (23, -0.029), (24, -0.088), (25, -0.013), (26, 0.081), (27, -0.012), (28, -0.04), (29, -0.024), (30, 0.037), (31, 0.101), (32, -0.019), (33, -0.111), (34, -0.088), (35, 0.068), (36, 0.073), (37, -0.051), (38, -0.009), (39, 0.027), (40, 0.017), (41, 0.055), (42, -0.146), (43, -0.134), (44, -0.028), (45, -0.047), (46, -0.052), (47, 0.037), (48, 0.065), (49, -0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.90910214 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that is not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
2 0.76172769 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
3 0.60282427 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
Author: Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
4 0.51751405 81 nips-2007-Estimating disparity with confidence from energy neurons
Author: Eric K. Tsang, Bertram E. Shi
Abstract: The peak location in a population of phase-tuned neurons has been shown to be a more reliable estimator for disparity than the peak location in a population of position-tuned neurons. Unfortunately, the disparity range covered by a phasetuned population is limited by phase wraparound. Thus, a single population cannot cover the large range of disparities encountered in natural scenes unless the scale of the receptive fields is chosen to be very large, which results in very low resolution depth estimates. Here we describe a biologically plausible measure of the confidence that the stimulus disparity is inside the range covered by a population of phase-tuned neurons. Based upon this confidence measure, we propose an algorithm for disparity estimation that uses many populations of high-resolution phase-tuned neurons that are biased to different disparity ranges via position shifts between the left and right eye receptive fields. The population with the highest confidence is used to estimate the stimulus disparity. We show that this algorithm outperforms a previously proposed coarse-to-fine algorithm for disparity estimation, which uses disparity estimates from coarse scales to select the populations used at finer scales and can effectively detect occlusions.
5 0.50388503 164 nips-2007-Receptive Fields without Spike-Triggering
Author: Guenther Zeck, Matthias Bethge, Jakob H. Macke
Abstract: Stimulus selectivity of sensory neurons is often characterized by estimating their receptive field properties such as orientation selectivity. Receptive fields are usually derived from the mean (or covariance) of the spike-triggered stimulus ensemble. This approach treats each spike as an independent message but does not take into account that information might be conveyed through patterns of neural activity that are distributed across space or time. Can we find a concise description for the processing of a whole population of neurons analogous to the receptive field for single neurons? Here, we present a generalization of the linear receptive field which is not bound to be triggered on individual spikes but can be meaningfully linked to distributed response patterns. More precisely, we seek to identify those stimulus features and the corresponding patterns of neural activity that are most reliably coupled. We use an extension of reverse-correlation methods based on canonical correlation analysis. The resulting population receptive fields span the subspace of stimuli that is most informative about the population response. We evaluate our approach using both neuronal models and multi-electrode recordings from rabbit retinal ganglion cells. We show how the model can be extended to capture nonlinear stimulus-response relationships using kernel canonical correlation analysis, which makes it possible to test different coding mechanisms. Our technique can also be used to calculate receptive fields from multi-dimensional neural measurements such as those obtained from dynamic imaging methods.
6 0.49141651 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
7 0.490437 145 nips-2007-On Sparsity and Overcompleteness in Image Models
8 0.47412816 141 nips-2007-New Outer Bounds on the Marginal Polytope
9 0.46630403 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
10 0.43201485 182 nips-2007-Sparse deep belief net model for visual area V2
11 0.40756848 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
12 0.40118816 196 nips-2007-The Infinite Gamma-Poisson Feature Model
13 0.39953524 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
14 0.39578372 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
15 0.38335875 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
16 0.37491968 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
17 0.37350953 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
18 0.35925269 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes
19 0.35131529 115 nips-2007-Learning the 2-D Topology of Images
20 0.3460241 26 nips-2007-An online Hebbian learning rule that performs Independent Component Analysis
topicId topicWeight
[(5, 0.121), (13, 0.025), (16, 0.056), (18, 0.011), (19, 0.019), (21, 0.073), (31, 0.022), (34, 0.042), (35, 0.03), (36, 0.017), (47, 0.091), (49, 0.016), (68, 0.17), (83, 0.102), (85, 0.039), (87, 0.025), (90, 0.066)]
simIndex simValue paperId paperTitle
1 0.86218685 37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning
Author: Yuanqing Lin, Jingdong Chen, Youngmoo Kim, Daniel D. Lee
Abstract: Speech dereverberation remains an open problem after more than three decades of research. The most challenging step in speech dereverberation is blind channel identification (BCI). Although many BCI approaches have been developed, their performance is still far from satisfactory for practical applications. The main difficulty in BCI lies in finding an appropriate acoustic model, which not only can effectively resolve solution degeneracies due to the lack of knowledge of the source, but also robustly models real acoustic environments. This paper proposes a sparse acoustic room impulse response (RIR) model for BCI, that is, an acoustic RIR can be modeled by a sparse FIR filter. Under this model, we show how to formulate the BCI of a single-input multiple-output (SIMO) system into a l1 norm regularized least squares (LS) problem, which is convex and can be solved efficiently with guaranteed global convergence. The sparseness of solutions is controlled by l1 -norm regularization parameters. We propose a sparse learning scheme that infers the optimal l1 -norm regularization parameters directly from microphone observations under a Bayesian framework. Our results show that the proposed approach is effective and robust, and it yields source estimates in real acoustic environments with high fidelity to anechoic chamber measurements.
same-paper 2 0.81563687 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high-dimensional measurements of more than a thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that is not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
3 0.78039318 165 nips-2007-Regret Minimization in Games with Incomplete Information
Author: Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione
Abstract: Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, which exploits the degree of incomplete information in an extensive game. We show how minimizing counterfactual regret minimizes overall regret, and therefore in self-play can be used to compute a Nash equilibrium. We demonstrate this technique in the domain of poker, showing we can solve abstractions of limit Texas Hold’em with as many as 10^12 states, two orders of magnitude larger than previous methods.
4 0.73629862 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
5 0.72406673 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
6 0.71181357 164 nips-2007-Receptive Fields without Spike-Triggering
7 0.70467883 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
8 0.70379895 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
9 0.702663 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
10 0.69405603 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
11 0.68693745 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
12 0.68562114 115 nips-2007-Learning the 2-D Topology of Images
13 0.68336755 195 nips-2007-The Generalized FITC Approximation
14 0.6823945 158 nips-2007-Probabilistic Matrix Factorization
15 0.6816752 96 nips-2007-Heterogeneous Component Analysis
16 0.68165845 174 nips-2007-Selecting Observations against Adversarial Objectives
17 0.68147337 7 nips-2007-A Kernel Statistical Test of Independence
18 0.68019569 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
19 0.67997932 63 nips-2007-Convex Relaxations of Latent Variable Training
20 0.67945856 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes