nips nips2012 nips2012-56 knowledge-graph by maker-knowledge-mining

56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization


Source: pdf

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Bayesian active learning with localized priors for fast receptive field characterization [sent-1, score-0.561]

2 Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). [sent-6, score-0.378]

3 Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. [sent-7, score-0.996]

4 Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. [sent-9, score-0.308]

5 Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. [sent-10, score-0.334]

6 Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. [sent-11, score-0.238]

7 We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. [sent-13, score-0.372]

8 This motivates the use of active learning, known in statistics as “optimal experimental design”, to improve experiments using adaptive stimulus selection in closed-loop experiments. [sent-16, score-0.499]

9 In Bayesian active learning, the basic idea is to define a statistical model of the neural response, then carry out experiments to efficiently characterize the model parameters [1–6]. [sent-18, score-0.285]

10 The data collected so far (i.e., stimulus-response pairs) provide likelihood terms that we combine with the prior to obtain a posterior distribution. [sent-24, score-0.358]

11 This posterior reflects our beliefs about the parameters given the data collected so far in the experiment. [sent-25, score-0.256]

12 We then select a stimulus for the next trial that maximizes some measure of utility (e.g., expected information gain). [sent-26, score-0.36]

13 In this paper, we focus on the problem of receptive field (RF) characterization from extracellularly recorded spike train data. [sent-30, score-0.252]

14 Typically, RFs are high-dimensional (with 10s to 100s of parameters, depending on the choice of input domain), making them an attractive target for active learning methods. [sent-34, score-0.247]

15 Our paper builds on prior work from Lewi et al [6], a seminal paper that describes active learning for RFs under a conditionally Poisson point process model. [sent-35, score-0.468]

16 Here we show that a sophisticated choice of prior distribution can lead to substantial improvements in active learning. [sent-36, score-0.339]

17 These priors flexibly encode a preference for smooth, sparse, and/or localized structure, which are common features of real neural RFs. [sent-38, score-0.191]

18 In fixed datasets (“passive learning”), the associated estimators give substantial improvements over both maximum likelihood and standard lasso/ridge-regression shrinkage estimators, but they have not yet been incorporated into frameworks for active learning. [sent-39, score-0.281]

19 Active learning with a non-Gaussian prior poses several major challenges, however, since the posterior is non-Gaussian, and requisite posterior expectations are much harder to compute. [sent-40, score-0.556]

20 We address these challenges by exploiting a conditionally Gaussian representation of the prior (and posterior) using sampling at the level of the hyperparameters. [sent-41, score-0.179]

21 We demonstrate our method using the Automatic Locality Determination (ALD) prior introduced in [9], where hyperparameters control the locality of the RF in space-time and frequency. [sent-42, score-0.283]

22 The resulting algorithm outperforms previous active learning methods on real and simulated neural data, even under various forms of model mismatch. [sent-43, score-0.39]

23 In Sec. 2, we formally define the Bayesian active learning problem and review the algorithm of [6], to which we will compare our results. [sent-46, score-0.247]

24 Sec. 4 describes the localized RF prior that we will employ for active learning. [sent-49, score-0.466]

25 In Sec. 5, we describe a new active learning method for conditionally Gaussian priors. [sent-51, score-0.334]

26 In Sec. 6, we show results of simulated experiments with simulated and real neural data. [sent-53, score-0.248]

27 Bayesian active learning (or “experimental design”) provides a model-based framework for selecting optimal stimuli or experiments. [sent-54, score-0.68]

28 The optimal stimulus x is the one that maximizes the expected utility Ey|x [U (x, y)], meaning the utility averaged over the distribution of (as yet) unobserved y|x. [sent-56, score-0.356]

29 It is equivalent to picking the stimulus on each trial that minimizes the expected posterior entropy. [sent-59, score-0.527]

30 The mutual information provided by (y, x) about k, denoted by I(y, k|x, Dt ), is simply the difference between the prior and posterior entropy. [sent-62, score-0.37]

31 Lewi et al. [6] developed a Bayesian active learning framework for RF characterization in closed-loop neurophysiology experiments, which we henceforth refer to as “Lewi-09”. [sent-64, score-0.392]

32 For each presented stimulus x and recorded response y (upper right), we update the posterior over receptive field k (bottom), then select the stimulus that maximizes expected information gain (upper left). [sent-66, score-1.014]

33 (C) Graphical model for the hierarchical RF model used here, with a hyper-prior pθ (θ) over hyper-parameters and conditionally Gaussian prior p(k|θ) over the RF. [sent-69, score-0.219]

34 The Lewi-09 method assumes a Gaussian prior over k, which leads to a (non-Gaussian) posterior given by the product of Poisson likelihood and Gaussian prior. [sent-72, score-0.358]

35 Neither the predictive distribution p(y|x, Dt ) nor the posterior entropy H(k|x, y, Dt ) can be computed in closed form. [sent-75, score-0.305]

36 However, the log-concavity of the posterior (guaranteed for suitable choice of g [11]) motivates a tractable and accurate Gaussian approximation to the posterior, which provides a concise analytic formula for posterior entropy [12, 13]. [sent-76, score-0.537]
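
The “concise analytic formula” referred to here is, for a Gaussian approximation with covariance Λ, the differential entropy H = ½ log det(2πeΛ). The snippet below is a minimal illustrative sketch (Python/NumPy), not code from the paper:

```python
import numpy as np

def gaussian_entropy(Lam):
    """Differential entropy (nats) of a Gaussian with covariance Lam: 0.5*logdet(2*pi*e*Lam)."""
    d = Lam.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Lam)[1])

print(gaussian_entropy(np.eye(3)))         # entropy of a standard 3-d Gaussian
print(gaussian_entropy(0.1 * np.eye(3)))   # a tighter posterior has lower entropy
```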

37 The key contributions of Lewi-09 include fast methods for updating the Gaussian approximation to the posterior and for selecting the stimulus (subject to a maximum-power constraint) that maximizes expected information gain. [sent-77, score-0.577]

38 Here we seek to extend the work of Lewi et al. to incorporate non-Gaussian priors in a hierarchical receptive field model. [sent-82, score-0.256]

39 Intuitively, a good prior can improve active learning by reducing the prior entropy. [sent-85, score-0.431]

40 The drawback of more sophisticated priors is that they may complicate the problem of computing and optimizing the posterior expectations needed for active learning. [sent-88, score-0.528]

41 To focus more straightforwardly on the role of the prior distribution, we employ a simple linear-Gaussian model of the neural response: $y_t = \mathbf{k}^\top \mathbf{x}_t + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2)$ (3), where $\epsilon_t$ is iid zero-mean Gaussian noise with variance $\sigma^2$. [sent-89, score-0.332]

42 We then place a hierarchical, conditionally Gaussian prior on k: $\mathbf{k}\,|\,\theta \sim \mathcal{N}(0, C_\theta)$ (4) and $\theta \sim p_\theta$ (5), where $C_\theta$ is a prior covariance matrix that depends on hyperparameters $\theta$. [sent-90, score-0.428]
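
For any fixed hyperparameter value θ, combining the prior of eqs. (4)–(5) with the linear-Gaussian likelihood of eq. (3) gives a Gaussian conditional posterior by the standard conjugate update: Λ_θ = (XᵀX/σ² + C_θ⁻¹)⁻¹ and µ_θ = Λ_θ XᵀY/σ². A minimal sketch of this update (Python/NumPy; a textbook derivation, not the authors' code, with all variable names hypothetical):

```python
import numpy as np

def conditional_posterior(X, Y, C_theta, sig2):
    """Gaussian posterior over the RF k given stimuli X (T x d), responses Y (T,),
    prior covariance C_theta, and noise variance sig2."""
    Lam = np.linalg.inv(X.T @ X / sig2 + np.linalg.inv(C_theta))
    mu = Lam @ X.T @ Y / sig2
    return mu, Lam

# Toy usage: a hypothetical 20-d RF, white-noise stimuli, and a ridge-like prior.
rng = np.random.default_rng(1)
d, T, sig2 = 20, 200, 0.5
k_true = rng.normal(size=d)
X = rng.normal(size=(T, d))
Y = X @ k_true + rng.normal(scale=np.sqrt(sig2), size=T)
mu, Lam = conditional_posterior(X, Y, C_theta=np.eye(d), sig2=sig2)
print(np.corrcoef(mu, k_true)[0, 1])       # posterior mean closely tracks the true RF
```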

43 Several authors have pointed out that active learning confers no benefit over fixed-design experiments in linear-Gaussian models with Gaussian priors, due to the fact that the posterior covariance is response-independent [1, 6]. [sent-102, score-0.534]
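
The response-independence is easy to verify in this setting: with a Gaussian posterior of covariance Λ and noise variance σ², presenting stimulus x shrinks the posterior entropy by ½ log(1 + xᵀΛx/σ²) (matrix determinant lemma), an expression in which the observed response never appears. A small illustrative sketch under those assumptions:

```python
import numpy as np

def expected_info_gain(x, Lam, sig2):
    """Entropy reduction (nats) from presenting stimulus x; the response y never enters."""
    return 0.5 * np.log1p(x @ Lam @ x / sig2)

Lam = np.diag([2.0, 0.1, 0.1])              # hypothetical current posterior covariance
x_top = np.array([1.0, 0.0, 0.0])           # unit-power stimulus along the top eigenvector
x_off = np.array([0.0, 1.0, 0.0])
print(expected_info_gain(x_top, Lam, sig2=1.0))   # larger gain
print(expected_info_gain(x_off, Lam, sig2=1.0))   # smaller gain
```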

44 That is, an optimal design (one that minimizes the final posterior entropy) can be planned out entirely in advance of the experiment. [sent-103, score-0.232]

45 The posterior distribution in such models is data-dependent via the marginal posterior’s dependence on Y. [sent-105, score-0.28]

46 Thus, active learning is warranted even for linear-Gaussian responses, as we will demonstrate empirically below. [sent-107, score-0.247]

47 In this paper, we employ a flexible RF model underlying the so-called automatic locality determination (ALD) estimator [9]. [sent-108, score-0.263]

48 The key justification for the ALD prior is the observation that most neural RFs tend to be localized in both space-time and spatio-temporal frequency. [sent-109, score-0.234]

49 Sensory (e.g., visual) neurons integrate input over a limited domain in time and space; locality in frequency refers to the band-pass (or smooth / low-pass) character of most neural RFs. [sent-112, score-0.205]

50 The ALD prior encodes these tendencies in the parametric form of the covariance matrix Cθ , where hyperparameters θ control the support of both the RF and its Fourier transform. [sent-113, score-0.249]
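
As a rough illustration of how hyperparameters can control the RF's support, the sketch below builds a diagonal prior covariance whose variances follow a Gaussian envelope in space, so coefficients far from a hypothesized center are shrunk toward zero. This is a simplified, space-only stand-in; the exact parametric form in [9], which also encodes locality in frequency, is not reproduced here.

```python
import numpy as np

def localized_cov(d, center, width, rho=1.0):
    """Diagonal prior covariance with a Gaussian spatial envelope (illustrative, not exact ALD)."""
    coords = np.arange(d)
    envelope = np.exp(-0.5 * ((coords - center) / width) ** 2)
    return rho * np.diag(envelope)

C_theta = localized_cov(d=64, center=32.0, width=6.0)
print(np.round(np.diag(C_theta)[28:37], 3))   # variances are large only near the center
```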

51 We treat the noise variance σ2 (eq. 3) as a hyperparameter, since it plays a similar role to C in determining the posterior and evidence. [sent-116, score-0.232]

52 Note that although the conditional ALD prior over k|θ assigns high prior probability to smooth and sparse RFs for some settings of θ, this is not the case for other settings. [sent-118, score-0.211]

53 For settings where Ms and Mf describe elliptical regions large enough to cover the entire RF, the conditional prior corresponds to a simple ridge prior and imposes no such structure. [sent-120, score-0.214]

54 We place a flat prior over θ so that no strong prior beliefs about spatial locality or bandpass frequency characteristics are imposed a priori. [sent-121, score-0.324]

55 However, as data from a neuron with a truly localized RF accumulates, the support of the marginal posterior p(θ|Dt ) shrinks down on regions that favor a localized RF, shrinking the posterior entropy over k far more quickly than is achievable with methods based on Gaussian priors. [sent-122, score-0.845]

56 To represent the ALD posterior over k given the data, we will rely on the conditionally Gaussian representation of the posterior. [sent-133, score-0.613]

57 The posterior will then be approximated as $p(\mathbf{k}|D_t) \approx \frac{1}{N}\sum_i p(\mathbf{k}|D_t, \theta_i)$ (11), where each distribution $p(\mathbf{k}|D_t, \theta_i)$ is Gaussian with $\theta_i$-dependent mean and covariance. [sent-139, score-0.287]

58 The main idea of our algorithm is adopted from the resample-move particle filter, which involves generating initial particles; resampling particles according to incoming data; then performing MCMC moves to avoid degeneracy in particles [14]. [sent-146, score-0.711]

59 MCMC Move: Propagate particles via Metropolis-Hastings (MH), with multivariate Gaussian proposals centered on the current particle θi of the Markov chain: θ∗ ∼ N (θi , Γ), where Γ is a diagonal matrix with diagonal entries given by the variance of the particles at the end of time step t−1. [sent-150, score-0.666]
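
The sketch below puts the resampling and MH-move steps together for the linear-Gaussian model of eq. (3), with a simplified diagonal-envelope covariance standing in for the full ALD prior and a flat hyper-prior, so the MH acceptance ratio reduces to a ratio of marginal likelihoods. It is an illustrative reconstruction, not the authors' implementation, and the hyperparameterization θ = (center, log-width) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_cov(theta, d):
    """Simplified localized prior covariance; a stand-in for the ALD form of [9]."""
    center, log_width = theta
    envelope = np.exp(-0.5 * ((np.arange(d) - center) / np.exp(log_width)) ** 2)
    return np.diag(envelope + 1e-6)

def log_evidence(X, Y, theta, sig2):
    """log p(Y | X, theta) for the linear-Gaussian model: Y ~ N(0, X C X^T + sig2 I)."""
    S = X @ prior_cov(theta, X.shape[1]) @ X.T + sig2 * np.eye(X.shape[0])
    return -0.5 * (len(Y) * np.log(2 * np.pi) + np.linalg.slogdet(S)[1] + Y @ np.linalg.solve(S, Y))

def conditional_posterior(X, Y, theta, sig2):
    C = prior_cov(theta, X.shape[1])
    Lam = np.linalg.inv(X.T @ X / sig2 + np.linalg.inv(C))
    return Lam @ X.T @ Y / sig2, Lam

def resample_move(particles, X, Y, x_new, y_new, sig2):
    """One update of the hyperparameter particles after observing (x_new, y_new)."""
    # Resample: weight each particle by the predictive likelihood of the new response.
    w = np.empty(len(particles))
    for i, th in enumerate(particles):
        mu, Lam = conditional_posterior(X, Y, th, sig2)
        var = x_new @ Lam @ x_new + sig2
        w[i] = np.exp(-0.5 * (y_new - mu @ x_new) ** 2 / var) / np.sqrt(var)
    idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
    particles = [particles[i] for i in idx]

    # Move: one Metropolis-Hastings step per particle (symmetric Gaussian proposal,
    # flat hyper-prior, so only the marginal likelihoods matter).
    X2, Y2 = np.vstack([X, x_new]), np.append(Y, y_new)
    Gamma = np.var(np.array(particles), axis=0) + 1e-3
    moved = []
    for th in particles:
        prop = th + rng.normal(scale=np.sqrt(Gamma))
        log_a = log_evidence(X2, Y2, prop, sig2) - log_evidence(X2, Y2, th, sig2)
        moved.append(prop if np.log(rng.uniform()) < log_a else th)
    return moved

# Toy usage: 10 particles over (center, log-width) for a hypothetical 16-d RF.
d, sig2 = 16, 0.5
particles = [np.array([rng.uniform(0, d), np.log(3.0)]) for _ in range(10)]
X, Y = rng.normal(size=(5, d)), rng.normal(size=5)       # stand-in stimulus/response history
particles = resample_move(particles, X, Y, rng.normal(size=d), 0.3, sig2)
print(np.round(particles[0], 2))
```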

60 The main bottleneck of this scheme is the updating of conditional posterior mean µi and covariance Λi for each particle θi , since this requires inversion of a d × d matrix. [sent-153, score-0.421]

61 However, particle updates can be performed efficiently in parallel on GPUs or machines with multi-core processors, since the particles do not interact except for stimulus selection, which we describe below. [sent-156, score-0.636]

62 Given the posterior over k at time t, represented by a mixture of Gaussians attached to particles {θi} sampled from the marginal posterior, our task is to determine the maximally informative stimulus to present at time t + 1. [sent-158, score-0.874]

63 Error as a function of the number of stimuli, for the Lewi-09 method (blue), the ALD-based active learning method using 10 (pink) or 100 (red) particles, and the ALD-based passive learning method (black). [sent-170, score-0.285]

64 We compute the exact posterior covariance via the formula $\tilde{\Lambda}_t = \frac{1}{N}\sum_{i=1}^{N}\left(\Lambda_i + \mu_i \mu_i^\top\right) - \tilde{\mu}\tilde{\mu}^\top$ (14), where $\tilde{\mu}_t = \frac{1}{N}\sum_i \mu_i$ is the full posterior mean. [sent-175, score-0.519]
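
A small sketch of eq. (14), i.e. the exact covariance of an equally weighted Gaussian mixture, computed from the per-particle means and covariances (illustrative, not the authors' code):

```python
import numpy as np

def mixture_mean_cov(mus, Lams):
    """Exact mean and covariance of an equally weighted mixture of Gaussians N(mu_i, Lam_i)."""
    mus = np.asarray(mus)                                   # (N, d)
    mu_bar = mus.mean(axis=0)
    second = np.mean([L + np.outer(m, m) for m, L in zip(mus, Lams)], axis=0)
    return mu_bar, second - np.outer(mu_bar, mu_bar)

# Toy usage with two hypothetical components: the result captures both the
# within-component covariances and the spread between component means.
mus = [np.array([0.0, 1.0]), np.array([2.0, -1.0])]
Lams = [np.eye(2), 0.5 * np.eye(2)]
mu_bar, Lam_bar = mixture_mean_cov(mus, Lams)
print(Lam_bar)
```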

65 This leads to an upper bound on posterior entropy, since a Gaussian is the maximum-entropy distribution for fixed covariance. [sent-176, score-0.232]

66 We then take the next stimulus to be the maximum-variance eigenvector of the posterior covariance, which is the most informative stimulus under a Gaussian posterior and Gaussian noise model, subject to a power constraint on stimuli [6]. [sent-177, score-1.148]
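
A one-function sketch of this selection rule, under the assumptions just stated (a Gaussian summary of the posterior and a power constraint on the stimulus norm); illustrative only:

```python
import numpy as np

def select_stimulus(Lam, power=1.0):
    """Next stimulus: top eigenvector of the posterior covariance, scaled to the power budget."""
    evals, evecs = np.linalg.eigh(Lam)                 # eigh: Lam is symmetric
    return np.sqrt(power) * evecs[:, np.argmax(evals)]

print(select_stimulus(np.diag([0.1, 3.0, 0.5]), power=4.0))   # points along the 2nd coordinate
```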

67 Although this selection criterion is heuristic, since it is not guaranteed to maximize mutual information under the true posterior, it is intuitively reasonable since it selects the stimulus direction along which the current posterior is maximally uncertain. [sent-178, score-0.54]

68 In either scenario, selecting a stimulus proportional to the dominant eigenvector is heuristically justified by the fact that it will reduce collective uncertainty in particle covariances or cause particle means to converge by narrowing of the marginal posterior. [sent-180, score-0.643]

69 Algorithm 1 (Sequential active learning under conditionally Gaussian models): given particles $\{\theta_i\}$ from $p(\theta|D_t)$, which define the posterior as $p(\mathbf{k}|D_t) = \frac{1}{N}\sum_i \mathcal{N}(\mu_i, \Lambda_i)$, the algorithm iterates the following steps. [sent-183, score-0.848]

70 Compute the posterior covariance $\tilde{\Lambda}_t$ from $\{(\mu_i, \Lambda_i)\}$ (eq. 14). [sent-184, score-0.287]

71 Select the optimal stimulus $x_{t+1}$ as the maximal eigenvector of $\tilde{\Lambda}_t$. [sent-187, score-0.281]

72 Resample the particles $\{\theta_i\}$ with the weights $\{\mathcal{N}(y_{t+1} \mid \mu_i^\top x_{t+1},\ x_{t+1}^\top \Lambda_i x_{t+1} + \sigma_i^2)\}$. [sent-190, score-0.282]

73 Repeat. Figure 3 (columns: true filter, Lewi-09, ALD10): Additional simulated examples comparing Lewi-09 and ALD-based active learning. [sent-193, score-0.377]

74 Middle and right columns show RF estimates after 400 trials of active learning under each method, with average angular error (over 20 independent repeats) shown beneath in red. [sent-195, score-0.444]

75 For the Lewi-09 method, we used a diagonal prior covariance with amplitude set by maximizing marginal likelihood for a small dataset. [sent-205, score-0.229]

76 We compared two versions of the ALD-based algorithm (with 10 and 100 hyperparameter particles, respectively) to examine the relationship between performance and fidelity of the posterior representation. [sent-206, score-0.301]

77 The ALD estimate exhibits more rapid convergence, and performs noticeably better with 100 than with 10 particles (ALD100 vs. ALD10). [sent-209, score-0.282]

78 We also show the performance of ALD inference under passive learning (iid random stimulus selection), which indicates that the improvement in our method is not simply due to the use of an improved RF estimator. [sent-211, score-0.29]

79 The filters included: (A) a Gabor filter similar to that used in [6]; (B) a retina-like center-surround receptive field; (C) a grid-cell receptive field with multiple modes. [sent-216, score-0.286]

80 For the grid-cell example, the filter is not strongly localized in space, yet the ALD-based estimate substantially outperforms Lewi-09 due to its sensitivity to localized components in frequency. [sent-218, score-0.237]

81 Thus, the ALD-based method converges more quickly despite the mismatch between the model used to simulate data and the model assumed for active learning. [sent-219, score-0.247]

82 The stimulus consisted of 1D spatiotemporal white noise (“flickering bars”), with 16 spatial bars on each frame, aligned with the cell’s preferred orientation. [sent-221, score-0.33]

83 We performed simulated active learning by extracting the raw stimuli from 46 minutes of experimental data. [sent-223, score-0.503]

84 On each trial, we then computed the expected information gain from presenting each of these stimuli (blind to the neuron’s actual response to each stimulus). [sent-224, score-0.234]
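
One simple way to score a large finite candidate set, consistent with the Gaussian upper-bound criterion described earlier (and not necessarily the authors' exact computation), is to rank candidates by the predictive variance xᵀΛ̃x, since the approximate gain 0.5·log(1 + xᵀΛ̃x/σ²) is monotone in it and never involves the responses. A vectorized sketch with made-up candidates:

```python
import numpy as np

def score_candidates(X_cand, Lam, sig2):
    """Approximate info gain 0.5*log(1 + x^T Lam x / sig2) for every row of X_cand."""
    quad = np.einsum('nd,de,ne->n', X_cand, Lam, X_cand)   # x^T Lam x per candidate
    return 0.5 * np.log1p(quad / sig2)

rng = np.random.default_rng(3)
X_cand = rng.normal(size=(1000, 16))           # hypothetical pool of candidate stimuli
Lam = np.diag(rng.uniform(0.1, 1.0, size=16))  # hypothetical posterior covariance
print(np.argmax(score_candidates(X_cand, Lam, sig2=0.5)))  # index of the stimulus to present
```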

85 We used ALD-based active learning with 10 hyperparameter particles, and examined performance of both algorithms for 960 trials (selecting from ≈ 276,000 possible stimuli on each trial). [sent-225, score-0.527]

86 [Figure 4 panel residue: axis ticks and labels comparing Lewi-09 and ALD as a function of the number of stimuli (panels A and C); angle: 55.5.]

87 Figure 4: Comparison of active learning methods in a simulated experiment with real neural data from a primate V1 simple cell. [sent-231, score-0.569]

88 (A): Average angular difference between the MLE (inset, computed from an entire 46-minute dataset) and the estimates obtained by active learning, as a function of the amount of data. [sent-233, score-0.407]

89 We simulated active learning via an offline analysis of the fixed dataset, where methods had access to possible stimuli but not responses. [sent-234, score-0.503]

90 (C): Average entropy of hyperparameter particles as a function of t, showing rapid narrowing of marginal posterior. [sent-237, score-0.515]

91 Fig 4A shows the average angular difference between the maximum likelihood estimate (computed with the entire dataset) and the estimate obtained by each active learning method, as a function of the number of stimuli. [sent-238, score-0.406]
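
The error measure plotted in Fig. 4A is an angular difference between filter estimates; assuming the standard definition (the angle between the two vectors, which ignores overall gain), it can be computed as in this illustrative sketch:

```python
import numpy as np

def angular_error_deg(k_hat, k_ref):
    """Angle in degrees between an RF estimate and a reference filter."""
    cos = k_hat @ k_ref / (np.linalg.norm(k_hat) * np.linalg.norm(k_ref))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print(angular_error_deg(np.array([1.0, 0.1]), np.array([1.0, 0.0])))  # about 5.7 degrees
```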

92 We also examined the average entropy of the hyperparameter particles as a function of the amount of data used. [sent-242, score-0.424]

93 Fig. 4C shows that the entropy of the marginal posterior over hyperparameters falls rapidly during the first 150 trials of active learning. [sent-244, score-0.762]

94 The main bottleneck of the algorithm is the eigendecomposition of the posterior covariance $\tilde{\Lambda}$, which took 30 ms for a 256 × 256 matrix on a 2 × 2. [sent-245, score-0.327]
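
A quick way to check this bottleneck on one's own hardware is to time the eigendecomposition directly. The sketch below is illustrative only: absolute timings depend on the machine and BLAS build, and the 30 ms figure above is the authors' measurement, not something this snippet reproduces.

```python
import time
import numpy as np

A = np.random.default_rng(2).normal(size=(256, 256))
Lam = A @ A.T                                    # a symmetric positive-definite test matrix
t0 = time.perf_counter()
np.linalg.eigh(Lam)                              # the per-trial eigendecomposition step
print(f"eigh on a 256 x 256 matrix: {(time.perf_counter() - t0) * 1e3:.1f} ms")
```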

95 Updating importance weights and resampling 10 particles took 4ms, and a single step of MH resampling for each particle took 5ms. [sent-247, score-0.554]

96 In total, it took <60 ms to compute the optimal stimulus in each trial using a non-optimized implementation of our algorithm, indicating that our methods should be fast enough for use in real-time neurophysiology experiments. [sent-248, score-0.447]

97 We have developed a Bayesian active learning method for neural RFs under hierarchical response models with conditionally Gaussian priors. [sent-249, score-0.495]

98 To take account of uncertainty at the level of hyperparameters, we developed an approximate information-theoretic criterion for selecting optimal stimuli under a mixture-of-Gaussians posterior. [sent-250, score-0.218]

99 We applied this framework using a prior designed to capture smooth and localized RF structure. [sent-251, score-0.223]

100 A natural future direction will therefore be to combine the Poisson-GLM likelihood with the ALD prior, gaining the benefits of a more accurate neural response model and a flexible (low-entropy) prior for neural receptive fields, while incurring only a small increase in computational cost. [sent-254, score-0.41]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rf', 0.483), ('ald', 0.321), ('particles', 0.282), ('stimulus', 0.252), ('active', 0.247), ('posterior', 0.232), ('dt', 0.217), ('stimuli', 0.151), ('lewi', 0.15), ('receptive', 0.125), ('rfs', 0.122), ('simulated', 0.105), ('localized', 0.104), ('particle', 0.102), ('hyperparameters', 0.102), ('angular', 0.102), ('prior', 0.092), ('locality', 0.089), ('conditionally', 0.087), ('response', 0.083), ('yt', 0.077), ('entropy', 0.073), ('xt', 0.072), ('hyperparameter', 0.069), ('lter', 0.068), ('neurophysiology', 0.067), ('gaussian', 0.06), ('trials', 0.06), ('mf', 0.057), ('infomax', 0.057), ('covariance', 0.055), ('fig', 0.054), ('neuron', 0.052), ('bayesian', 0.051), ('eld', 0.049), ('priors', 0.049), ('marginal', 0.048), ('spike', 0.047), ('resampling', 0.045), ('ms', 0.045), ('recorded', 0.044), ('trial', 0.043), ('mcmc', 0.043), ('aldbased', 0.043), ('butera', 0.043), ('narrowing', 0.043), ('al', 0.042), ('hierarchical', 0.04), ('pillow', 0.04), ('ey', 0.04), ('poisson', 0.04), ('took', 0.04), ('utility', 0.039), ('angle', 0.038), ('passive', 0.038), ('mh', 0.038), ('neural', 0.038), ('ickering', 0.038), ('gabor', 0.036), ('characterization', 0.036), ('estimates', 0.035), ('selecting', 0.035), ('movshon', 0.035), ('likelihood', 0.034), ('determination', 0.033), ('maximally', 0.033), ('rust', 0.033), ('responses', 0.032), ('updating', 0.032), ('uncertainty', 0.032), ('iid', 0.03), ('elliptical', 0.03), ('glm', 0.03), ('sequential', 0.03), ('eigenvector', 0.029), ('substantially', 0.029), ('bars', 0.028), ('primate', 0.028), ('inset', 0.028), ('frequency', 0.027), ('exible', 0.027), ('attached', 0.027), ('resample', 0.027), ('smooth', 0.027), ('automatic', 0.026), ('spatiotemporal', 0.026), ('maximizes', 0.026), ('ingredients', 0.025), ('comput', 0.025), ('carlo', 0.025), ('filter', 0.025), ('evolves', 0.025), ('white', 0.024), ('refers', 0.024), ('monte', 0.024), ('beliefs', 0.024), ('austin', 0.024), ('employ', 0.023), ('mutual', 0.023), ('difference', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

2 0.22179282 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Author: Pedro Ortega, Jordi Grau-moya, Tim Genewein, David Balduzzi, Daniel Braun

Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. Given t observations of the function, the posterior can be evaluated efficiently in time O(t2 ) up to a multiplicative constant. Finally, we show how to apply our model to optimize a noisy, non-convex, high-dimensional objective function.

3 0.21439333 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1

4 0.1797028 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

Author: James Scott, Jonathan W. Pillow

Abstract: Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latentvariable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains. 1

5 0.15045576 32 nips-2012-Active Comparison of Prediction Models

Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer

Abstract: We address the problem of comparing the risks of two given predictive models—for instance, a baseline model and a challenger—as confidently as possible on a fixed labeling budget. This problem occurs whenever models cannot be compared on held-out training data, possibly because the training data are unavailable or do not reflect the desired test distribution. In this case, new test instances have to be drawn and labeled at a cost. We devise an active comparison method that selects instances according to an instrumental sampling distribution. We derive the sampling distribution that maximizes the power of a statistical test applied to the observed empirical risks, and thereby minimizes the likelihood of choosing the inferior model. Empirically, we investigate model selection problems on several classification and regression tasks and study the accuracy of the resulting p-values. 1

6 0.14240927 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

7 0.13345329 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

8 0.13257781 195 nips-2012-Learning visual motion in recurrent neural networks

9 0.12784673 94 nips-2012-Delay Compensation with Dynamical Synapses

10 0.12586258 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

11 0.11831547 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

12 0.11819196 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

13 0.11139596 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

14 0.10583349 41 nips-2012-Ancestor Sampling for Particle Gibbs

15 0.098422132 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

16 0.097360089 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

17 0.095502786 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

18 0.093806796 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

19 0.088646203 23 nips-2012-A lattice filter model of the visual pathway

20 0.088215709 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.211), (1, 0.027), (2, -0.006), (3, 0.26), (4, -0.103), (5, 0.074), (6, -0.004), (7, 0.028), (8, 0.025), (9, -0.091), (10, -0.146), (11, -0.08), (12, 0.005), (13, -0.011), (14, -0.013), (15, -0.003), (16, 0.027), (17, -0.039), (18, -0.059), (19, 0.036), (20, 0.011), (21, 0.149), (22, 0.034), (23, 0.079), (24, 0.003), (25, -0.073), (26, -0.064), (27, 0.051), (28, 0.176), (29, -0.108), (30, -0.123), (31, -0.048), (32, 0.016), (33, -0.007), (34, 0.025), (35, -0.025), (36, -0.071), (37, 0.069), (38, 0.136), (39, 0.068), (40, 0.108), (41, 0.098), (42, 0.078), (43, 0.074), (44, -0.024), (45, -0.051), (46, 0.014), (47, -0.018), (48, 0.049), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9499594 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

2 0.6763981 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

Author: Xue-xin Wei, Alan Stocker

Abstract: A common challenge for Bayesian models of perception is the fact that the two fundamental Bayesian components, the prior distribution and the likelihood function, are formally unconstrained. Here we argue that a neural system that emulates Bayesian inference is naturally constrained by the way it represents sensory information in populations of neurons. More specifically, we show that an efficient coding principle creates a direct link between prior and likelihood based on the underlying stimulus distribution. The resulting Bayesian estimates can show biases away from the peaks of the prior distribution, a behavior seemingly at odds with the traditional view of Bayesian estimation, yet one that has been reported in human perception. We demonstrate that our framework correctly accounts for the repulsive biases previously reported for the perception of visual orientation, and show that the predicted tuning characteristics of the model neurons match the reported orientation tuning properties of neurons in primary visual cortex. Our results suggest that efficient coding is a promising hypothesis in constraining Bayesian models of perceptual inference. 1 Motivation Human perception is not perfect. Biases have been observed in a large number of perceptual tasks and modalities, of which the most salient ones constitute many well-known perceptual illusions. It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer’s attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1]. We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions based on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g motion tuned neurons in visual cor∗ http://www.sas.upenn.edu/ astocker/lab 1 world neural representation efficient encoding percept Bayesian decoding Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1 , r2 , ..., rN } in a population of N neurons. The perceptual task is to generate a ˆ good estimate θ(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function. tex [18]). 
However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have been often selected on the basis of mathematical convenience. Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked. 2 Encoding-decoding framework We consider perception as an inference process that takes place along the simplified neural encodingdecoding cascade illustrated in Fig. 11 . 2.1 Efficient encoding Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of “optimally” are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher Information must be proportional to the prior distribution [12, 21]. In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F (θ) that transforms the ˜ physical space with units θ to a homogeneous space with units θ = F (θ) in which the stimulus distribution becomes uniform. This defines the mapping as θ F (θ) = p(χ)dχ , (1) −∞ which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. ˜ The tuning curves in the physical space are obtained by applying the inverse mapping F −1 (θ). Fig. 2 1 In the context of this paper, we consider ‘inferring’, ‘decoding’, and ‘estimating’ as synonymous. 2 stimulus distribution d samples # a Fisher information discriminability and average firing rates and b firing rate [ Hz] efficient encoding F likelihood function F -1 likelihood c symmetric asymmetric homogeneous space physical space Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics. b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F −1 that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). 
c) As a result, the likelihood shape is constrained by the prior distribution showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space. illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density. 2.2 Bayesian decoding Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1 , r2 , ..., rN ] where ri is the spike-count of the ith neuron over a given time-window τ . Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as N p(R|θ) = (τ fi (θ))ri −τ fi (θ) e , ri ! i=1 (2) ˆ with fi (θ) describing the tuning curve of neuron i. We then define a Bayesian decoder θLSE as the estimator that minimizes the expected squared-error between the estimate and the true stimulus value, thus θp(R|θ)p(θ)dθ ˆ θLSE (R) = , (3) p(R|θ)p(θ)dθ where we use Bayes’ rule to appropriately combine the sensory evidence with the stimulus prior p(θ). 3 Bayesian estimates can be biased away from prior peaks Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the 3 a b prior attraction prior prior attraction likelihood repulsion! likelihood c prior prior repulsive bias likelihood likelihood mean posterior mean posterior mean Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0 ). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p(θ0 ) < 0 and the likelihood is relatively narrow, then (1/p(θ)2 ) > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)). common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e. leading to biases toward the peak of the prior (Fig. 3a). In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). 
The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes’ least-squares estimate away from the peak of the prior while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counter-acting forces - and both are determined by the prior! 3.1 General derivation of the estimation bias In the following, we will formally derive the mean estimation bias b(θ) of the proposed encodingdecoding framework. Specifically, we will study the conditions for which the bias is repulsive i.e. away from the peak of the prior density. ˆ We first re-write the estimator θLSE (3) by replacing θ with the inverse of its mapping to the homo−1 ˜ geneous space, i.e., θ = F (θ). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2). Given a value θ0 and the elicited population response R, we can write the estimator as ˜ ˜ ˜ ˜ θp(R|θ)p(θ)dθ F −1 (θ)p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) ˆ θLSE (R) = = . ˜ ˜ ˜ p(R|θ)p(θ)dθ p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get 1 1 1 ˜ ˜ ˜ ˜ ˜ ˜ dθ = dθ. dF −1 (θ) = (F −1 (θ)) dθ = dθ = −1 (θ)) ˜ F (θ) p(θ) p(F ˆ Hence, we can simplify θLSE (R) as ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)p(R|F −1 (θ))dθ . ˜ ˜ p(R|F −1 (θ))dθ With ˜ K(R, θ) = ˜ p(R|F −1 (θ)) ˜ ˜ p(R|F −1 (θ))dθ 4 we can further simplify the notation and get ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)K(R, θ)dθ . (4) ˆ ˜ In order to get the expected value of the estimate, θLSE (θ), we marginalize (4) over the population response space S, ˆ ˜ ˜ ˜ ˜ θLSE (θ) = p(R)F −1 (θ)K(R, θ)dθdR S = F −1 ˜ (θ)( ˜ ˜ p(R)K(R, θ)dR)dθ = ˜ ˜ ˜ F −1 (θ)L(θ)dθ, S where we define ˜ L(θ) = ˜ p(R)K(R, θ)dR. S ˜ ˜ ˜ It follows that L(θ)dθ = 1. Due to the symmetry in this space, it can be shown that L(θ) is ˜0 . Intuitively, L(θ) can be thought as the normalized ˜ symmetric around the true stimulus value θ average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as b(θ0 ) = ˜ ˜ ˜ ˜ F −1 (θ)L(θ)dθ − F −1 (θ0 ) (5) ˜ This is expression is general where F −1 (θ) is defined as the inverse of the cumulative of an arbitrary ˜ prior density p(θ) (see Eq. (1)) and the dispersion of L(θ) is determined by the internal noise level. ˜ ˜ Assuming the prior density to be smooth, we expand F −1 in a neighborhood (θ0 − h, θ0 + h) that is larger than the support of the likelihood function. Using Taylor’s theorem with mean-value forms of the remainder, we get 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ F −1 (θ) = F −1 (θ0 ) + F −1 (θ0 ) (θ − θ0 ) + F −1 (θx ) (θ − θ0 )2 , 2 ˜ ˜ ˜ with θx lying between θ0 and θ. By applying this expression to (5), we find ˜ θ0 +h b(θ0 ) = = 1 2 ˜ θ0 −h 1 −1 ˜ ˜ ˜ ˜ ˜ 1 F (θx )θ (θ − θ0 )2 L(θ)dθ = ˜ 2 2 ˜ θ0 +h −( ˜ θ0 −h p(θx )θ ˜ ˜ 2 ˜ ˜ 1 )(θ − θ0 ) L(θ)dθ = p(θx )3 4 ˜ θ0 +h 1 ˜ − θ0 )2 L(θ)dθ ˜ ˜ ˜ ( ) ˜(θ ˜ p(F −1 (θx )) θ ( 1 ˜ ˜ ˜ ˜ ) (θ − θ0 )2 L(θ)dθ. p(θx )2 θ ˜ θ0 −h ˜ θ0 +h ˜ θ0 −h In general, there is no simple rule to judge the sign of b(θ0 ). However, if the prior is monotonic ˜ ˜ on the interval F −1 ((θ0 − h, θ0 + h)), then the sign of ( p(θ1 )2 ) is always the same as the sign of x 1 1 ( p(θ0 )2 ) . 
Also, if the likelihood is sufficiently narrow we can approximate ( p(θ1 )2 ) by ( p(θ0 )2 ) , x and therefore approximate the bias as b(θ0 ) ≈ C( 1 ) , p(θ0 )2 (6) where C is a positive constant. The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior! 3.2 Internal (neural) versus external (stimulus) noise The above derivation of estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrastcontrolled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly 5 larger biases up to the point where the likelihood becomes wide enough such that monotonicity of the prior over the support of the likelihood is potentially violated. Stimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion stimuli). Such external noise, however, has a different effect on the shape of the likelihood function as compared to internal noise. It modifies the likelihood function (2) by convolving it with the noise kernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It is straightforward to prove that such symmetric external noise does not lead to a change in the mean of the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by increasing the overall width of the likelihood, the attractive influence of the prior increases, resulting in an estimate that is closer to the prior peak than without external noise2 . 4 Perception of visual orientation We tested our framework by modelling the perception of visual orientation. Our choice was based on the fact that i) we have pretty good estimates of the prior distribution of local orientations in natural images, ii) tuning characteristics of orientation selective neurons in visual cortex are wellstudied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature. 4.1 Efficient neural model population for visual orientation Previous studies measured the statistics of the local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system’s prior belief over orientation p(θ) follows this distribution and approximate it formally as p(θ) ∝ 2 − | sin(θ)| (black line in Fig. 4b) . (7) Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von-Mises distribution in the homogeneous space on top of a constant spontaneous firing rate (5 Hz). 
We then ˜ applied the inverse transformation F −1 (θ) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b - red curves), where F (θ) is the cumulative of the prior (7). The concentration parameter for the von-Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼ 32 deg) of neurons in area V1 of the macaque [9]. 4.2 Predicted tuning characteristics of neurons in primary visual cortex The orientation tuning characteristics of our model population well match neurophysiological data of neurons in primary visual cortex (V1). Efficient encoding predicts that the distribution of neurons’ preferred orientation follows the prior, with more neurons tuned to cardinal than oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25-42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9]. An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. It also predicts that neurons tuned to the cardinal and oblique orientations should show less symmetry than those tuned to orientations in between. Finally, 2 Note, that these predictions are likely to change if the external noise is not symmetric. 6 a b 25 firing rate(Hz) 0 orientation(deg) asymmetry vs. tuning width 1.0 2.0 90 2.0 e asymmetry 1.0 0 asymmetry index 50 30 width (deg) 10 90 preferred tuning(deg) -90 0 d 0 0 90 asymmetry index 0 orientation(deg) tuning width -90 0 0 probability 0 -90 c efficient representation 0.01 0.01 image statistics -90 0 90 preferred tuning(deg) 25 30 35 40 tuning width (deg) Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16]. b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons neurons show less asymmetry than neurons with tuning widths in between. neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet. 4.3 Predicted perceptual biases Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished: First, perceived orientations show an absolute bias away from the cardinal orientations, thus away from the peaks of the orientation prior [2, 3]. 
We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matched the orientation of the test stimulus. Interestingly, these repulsive absolute biases are the larger the smaller the external stimulus noise is (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations as shown in Fig. 5c, and thus toward the peak of the prior distribution [3, 16]. The predicted perceptual biases of our model are shown Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels). The predictions well match both, the reported absolute bias away as well as the relative biases toward the cardinal orientations. Note, that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also discussion in section 3.2). 5 Discussion We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with 7 absolute bias (data) b c relative bias (data) -4 0 bias(deg) 4 a low-noise stimulus -90 e 90 absolute bias (model) low external noise high external noise 3 high-noise stimulus -90 f 0 90 relative bias (model) 0 bias(deg) d 0 attraction -3 repulsion -90 0 orientation (deg) 90 -90 0 orientation (deg) 90 Figure 5: Biases in perceived orientation: Human data vs. Model prediction. a,d) Low- and highnoise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise level (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias. heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support of our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11]. 
We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear "anti-Bayesian" at first sight (see also models of sensory adaptation [23]). Note that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes' least-squares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize total squared error, and that other choices of loss function (e.g., a MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5].

Acknowledgments

We thank M. Jogan and A. Tank for helpful comments on the manuscript. This work was partially supported by grant ONR N000141110744.

References

[1] M. Jones and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34:169–231, 2011.
[2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218–1220, 1965.
[3] A. Tomassini, M. J. Morgan, and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50:541–547, 2010.
[4] W. S. Geisler and D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508–510, 2002.
[5] M. O. Ernst. Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23–R25, 2009.
[6] G. H. Henry, B. Dreher, and P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394–1409, 1974.
[7] D. Rose and C. Blakemore. An analysis of orientation selectivity in the cat's visual cortex. Exp Brain Res, 20(1):1–17, 1974.
[8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern, 78(1):45–56, 1998.
[9] R. L. De Valois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res, 22:531–544, 1982.
[10] B. Li, M. R. Peterson, and R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J Neurophysiol, 90:204–217, 2003.
[11] D. Ganguli and E. P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Advances in Neural Information Processing Systems 23, pages 658–666, 2011.
[12] M. D. McDonnell and N. G. Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys Rev Lett, 101(5):058103, 2008.
[13] H. Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867.
[14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[15] D. C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996.
[16] A. R. Girshick, M. S. Landy, and E. P. Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, July 2011.
[17] M. Jazayeri and M. N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010.
[18] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, pages 578–585, April 2006.
[19] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.
[20] D. M. Coppola, H. R. Purves, A. N. McCoy, and D. Purves. The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A, 95(7):4002–4006, 1998.
[21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10(7):1731–1757, 1998.
[22] X.-X. Wei and A. A. Stocker. Bayesian inference with efficient neural population codes. In Artificial Neural Networks and Machine Learning – ICANN 2012, Lausanne, Switzerland, Lecture Notes in Computer Science, volume 7552, pages 523–530, 2012.
[23] A. A. Stocker and E. P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation.
[24] D. C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007.
[25] D. Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012.
[26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010.

3 0.58897841 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Author: Pedro Ortega, Jordi Grau-moya, Tim Genewein, David Balduzzi, Daniel Braun

Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. Given t observations of the function, the posterior can be evaluated efficiently in time O(t^2) up to a multiplicative constant. Finally, we show how to apply our model to optimize a noisy, non-convex, high-dimensional objective function.

4 0.58633411 94 nips-2012-Delay Compensation with Dynamical Synapses

Author: Chi Fung, K. Wong, Si Wu

Abstract: Time delay is pervasive in neural information processing. To achieve real-time tracking, it is critical to compensate the transmission and processing delays in a neural system. In the present study we show that dynamical synapses with short-term depression can enhance the mobility of a continuous attractor network to the extent that the system tracks time-varying stimuli in a timely manner. The state of the network can either track the instantaneous position of a moving stimulus perfectly (with zero lag) or lead it by an effectively constant time, in agreement with experiments on the head-direction systems in rodents. The parameter regions for delayed, perfect and anticipative tracking correspond to network states that are static, ready-to-move and spontaneously moving, respectively, demonstrating the strong correlation between tracking performance and the intrinsic dynamics of the network. We also find that when the speed of the stimulus coincides with the natural speed of the network state, the delay becomes effectively independent of the stimulus amplitude.

5 0.5679577 118 nips-2012-Entangled Monte Carlo

Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-Côté

Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1

6 0.56691557 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

7 0.56312215 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

8 0.55696601 195 nips-2012-Learning visual motion in recurrent neural networks

9 0.54770583 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

10 0.53061056 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

11 0.52592719 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

12 0.51280546 11 nips-2012-A Marginalized Particle Gaussian Process Regression

13 0.50590682 41 nips-2012-Ancestor Sampling for Particle Gibbs

14 0.50295484 32 nips-2012-Active Comparison of Prediction Models

15 0.4774729 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

16 0.46671164 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

17 0.46528089 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

18 0.45366007 23 nips-2012-A lattice filter model of the visual pathway

19 0.4275389 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

20 0.42671517 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.04), (8, 0.07), (17, 0.021), (21, 0.064), (38, 0.189), (39, 0.016), (42, 0.015), (54, 0.031), (55, 0.02), (60, 0.024), (74, 0.041), (76, 0.144), (80, 0.178), (92, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95915425 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

2 0.94635481 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1

3 0.9422484 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

Author: Claudio Gentile, Francesco Orabona

Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade-off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T^{1/2} log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance. 1

4 0.94091833 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

Author: Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva

Abstract: We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form y = Ax + η where A is an unknown n × n matrix and x is a random variable whose components are independent and have a fourth moment strictly less than that of a standard Gaussian random variable and η is an n-dimensional Gaussian random variable with unknown covariance Σ: We give an algorithm that provably recovers A and Σ up to an additive error ε and whose running time and sample complexity are polynomial in n and 1/ε. To accomplish this, we introduce a novel “quasi-whitening” step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one) and this is a crucial step in our algorithm, one that has been overlooked in previous attempts, and allows us to control the accumulation of error when we find the columns of A one by one via local search. 1

5 0.94057655 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1

6 0.93805999 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

7 0.93793589 65 nips-2012-Cardinality Restricted Boltzmann Machines

8 0.9355045 200 nips-2012-Local Supervised Learning through Space Partitioning

9 0.93544656 227 nips-2012-Multiclass Learning with Simplex Coding

10 0.93531692 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

11 0.93337911 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

12 0.93337619 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

13 0.93304759 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

14 0.93250579 293 nips-2012-Relax and Randomize : From Value to Algorithms

15 0.92984909 168 nips-2012-Kernel Latent SVM for Visual Recognition

16 0.92813581 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

17 0.92742133 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

18 0.92615068 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

19 0.92562562 292 nips-2012-Regularized Off-Policy TD-Learning

20 0.92520797 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection