nips nips2012 nips2012-262 knowledge-graph by maker-knowledge-mining

262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss


Source: pdf

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. [sent-9, score-0.327]

2 Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. [sent-10, score-1.028]

3 We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. [sent-11, score-0.287]

4 We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. [sent-12, score-0.437]

5 In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. [sent-13, score-1.038]

6 1 Introduction A neuron represents sensory information via its spike train. [sent-14, score-0.242]

7 Rate coding maps an input stimulus to a spiking rate via the neuron’s tuning. [sent-15, score-0.356]

8 Previous work in computational neuroscience has tried to explain this mapping via optimality criteria. [sent-16, score-0.048]

9 An important factor determining the optimal shape of the tuning curve is the input statistics of the stimulus. [sent-17, score-0.562]

10 It has previously been observed that environmental statistics can influence the neural tuning curves of sensory neurons [1, 2, 3, 4, 5]. [sent-18, score-0.561]

11 However, most theoretical analysis has usually assumed the input stimulus distribution to be uniform. [sent-19, score-0.23]

12 Only recently, theoretical work has been demonstrating how non-uniform prior distributions will affect the optimal shape of the neural tuning curves [6, 7, 8, 9, 10]. [sent-20, score-0.524]

13 An important factor in determining the optimal tuning curve is the optimality criterion [11]. [sent-21, score-0.586]

14 Most previous work used local Fisher Information [12, 13, 14], the estimation square loss or discriminability (discrimax) [15, 16] or the mutual information (infomax) [9, 17] to evaluate neural codes. [sent-22, score-0.356]

15 It has also been proved that both lower bounds can be attained on the con1 dition that the encoding time is long enough and the estimator behaves well in the asymptotic limit. [sent-24, score-0.411]

16 In this paper, we ask the question what tuning curve is optimally encoding a stimulus with an arbitrary prior distribution such that the Lp estimation lost is minimized. [sent-26, score-0.917]

17 With the asymptotic analysis of the maximum likelihood estimator (MLE), we can show how the Lp loss converges to a functional of Fisher Information in the limit of long encoding time. [sent-28, score-0.582]

18 The optimization of such functional can be conducted for arbitrary stimulus prior and for all p ≥ 0 in general. [sent-29, score-0.332]

19 The special case of p = 2 and the limit p → 0 corresponds to discrimax and infomax, respectively. [sent-30, score-0.197]

20 The general result offers a framework to help us understand the infomax problem in a new point of view: maximizing mutual information is equivalent to minimizing the expected L0 loss. [sent-31, score-0.4]

21 The encoding process involves a probabilistic mapping from stimulus to a random number of spikes. [sent-35, score-0.326]

22 For each s, the neuron will fire at a predetermined firing rate h(s), representing the neuron’s tuning curve. [sent-36, score-0.502]

23 The encoded information will contain some noise due to neural variability. [sent-37, score-0.072]

24 According to the conventional Poisson noise model, if the available coding time is T , then the observed spike count N has a Poisson distribution with parameter λ = h(s)T P[N = k] = 1 k (h(s)T ) e−h(s)T k! [sent-38, score-0.122]

25 (1) The tuning curve h(s) is assumed to be sigmoidal, i. [sent-39, score-0.517]

26 monotonically increasing, but limited to a certain range hmin ≤ h(s) ≤ hmax due to biological constraints. [sent-41, score-0.555]

27 The estimator s = s(N ) should be a function of observed count N . [sent-43, score-0.069]

28 First the MLE estimator λ for mean firing rate ˆ = N/T . [sent-45, score-0.097]

29 There for the MLE estimator for stimulus s is simply s = h−1 (λ). [sent-46, score-0.299]

30 (2) For tuning function h(s) with Poisson spiking model, the Fisher Information is (see [12, 7]) Ih (s) = T h (s)2 h(s) (3) Further with the sigmoidal assumption, by solving the above ordinary differential equation, we can derive the inverse formula in Eq. [sent-50, score-0.464]

31 (5) h(s) = s −∞ 1 hmin + √ 2 T √ Ih (t) dt ≤ 2 T 2 s −∞ Ih (t) dt hmax − hmin (4) (5) This constraint is closely related to the Jeffrey’s prior, which claims that π ∗ (s) ∝ I(s) is the least informative prior. [sent-52, score-1.159]

32 The above inequality means that the normalization factor of the Jeffrey’s prior is finite, as long as the range of firing rate is limited hmin ≤ h(s) ≤ hmax . [sent-53, score-0.675]

33 (7) is only a lower bound, it is attained by the MLE of s asymptotically. [sent-56, score-0.055]

34 (5), variation method can be applied and the optimal condition and the optimal solution can be written as Ih (s) ∝ π(s) 3. [sent-59, score-0.09]

35 Although this is an lower bound on the mutual information which we want to maximize, the equality holds asymptotically by the MLE of s as stated in [7]. [sent-61, score-0.296]

36 To maximize the mutual information, we can maximize the right side of Eq. [sent-62, score-0.194]

37 (5) by variation method again and obtain the optimal condition and optimal solution as Ih (s) ∝ π 2 (s), h0 (s) = hmin + hmax − hmin s −∞ ∞ −∞ π(t) dt π(t) dt 2 (10) To see the connection between solutions in Eq. [sent-64, score-1.249]

38 4 Asymptotic Behavior of Estimators In general, minimizing the lower bound does not imply that the measures of interest, e. [sent-67, score-0.103]

39 In order to make the lower bounds useful, we need to know the conditions for which there exist ”good” estimators that can reach these theoretical lower bounds. [sent-72, score-0.096]

40 First we will introduce some definitions of estimator properties. [sent-73, score-0.069]

41 Let T be the encoding time for a √ neuron with Poisson noise, and sT be the MLE at time T . [sent-74, score-0.242]

42 (7) hold, we need the asymptotic consistent and asymptotic efficient estimators. [sent-77, score-0.332]

43 To have the equality in (9) hold, we need the asymptotic normal estimators (see [7]). [sent-78, score-0.219]

44 finding the tuning curve which minimizes the Lp estimation loss, then we need the moment convergent estimator for all p-th moments. [sent-81, score-0.683]

45 Here we will give two theorems to prove that the MLE s of the true stimulus s would satisfy all the ˆ above four properties in Eq. [sent-82, score-0.256]

46 Let h(s) be the tuning curve of a neuron with Poisson spiking ˆ noise. [sent-84, score-0.709]

47 (c) The p-th moment of Yn converges, and limn→∞ Eλ [|Yn |p ] = E[|Z|p ] for all p > 0. [sent-97, score-0.067]

48 One direct application of this theorem is that, if the tuning curve h(s) = s for (s > 0) and the encoding time is T , then the estimator s = N/T is asymptotically efficient since as T → ∞, ˆ √ 2 Var[ˆ] → E[|Zλ / T | ] = s/T = 1/I(s). [sent-98, score-0.755]

49 (b) The p-th moment of Yn converges, and limn→∞ Eλ [|Yn |p ] = E[|Z |p ] for all p > 0. [sent-103, score-0.067]

50 Theorem 1 indicates that we can always estimate the firing rate λ = h(s) efficiently by the estimator ˆ λ = N/T . [sent-104, score-0.097]

51 Theorem 2 indicates, however, that we can also estimate a smooth transformation of the firing rate efficiently in the asymptotic limit T → ∞. [sent-105, score-0.271]

52 Now, if we go back to the conventional ˆ setting of the tuning curve λ = h(s), we can estimate the stimulus by the estimator s = h−1 (λ). [sent-106, score-0.842]

53 ˆ To meet the need of boundedness of g: |g (λ)| ≤ M , we have 1/g (λ) = h (s) ≥ 1/M hence this theory only works for stimulus from a compact set s ∈ [−M, M ], although the M can be chosen as large as possible. [sent-107, score-0.23]

54 The larger the M is, the longer encoding time T will be necessary to observe the asymptotic normality and the convergence of moments. [sent-108, score-0.347]

55 ˆ The estimator s = h−1 (λ) is biased for finite T , but it is asymptotically unbiased and efficient. [sent-109, score-0.104]

56 With asymptotic normality, we can do more than just optimizing Imutual (N, s) and Lp (ˆ, s). [sent-111, score-0.166]

57 In general we can find the optimal tuning curve s which minimizes the expected Lp loss Lp (ˆ, s) since as T → ∞ s p √ p E T (ˆT − s) s → E |Z | (17) where Z = χ/ I(s)/T , χ ∼ N (0, 1). [sent-112, score-0.676]

58 When p = 2 the s loss is the squared loss and when p → 0, the Lp loss converges to 0-1 loss pointwise. [sent-120, score-0.373]

59 5 Optimal Tuning Curves: Infomax, Discrimax and More With the asymptotic normality and moment convergence, we know the asymptotic expected Lp loss will approach E[|Z |p ] for each s. [sent-122, score-0.547]

60 I(s)p/2 ds = K(p) (21) To obtain the optimal tuning curve for the Lp loss, we need to solve a simple variation problem minimize π(s)f (Ih (s)) ds (22) subject to Ih (s) ds ≤ const (23) h with fp (x) = −x−p/2−1 . [sent-124, score-0.997]

61 For some examples of Lp optimal tuning curves, see Fig. [sent-126, score-0.373]

62 This shows that the infomax tuning curve simultaneously optimizes the mutual information as well as the expected L0 norm of the error s − s. [sent-129, score-0.853]

63 0 0 2 4 stimulus s Figure 2: For stimulus with standard Gaussian prior distribution (inset figure) and various values of p, (A) shows the optimal allocation of Fisher Information Ip (s) and (B) shows the fp -optimal tuning curve hp (s). [sent-138, score-1.235]

64 When p = 2 the f2 -optimal (discrimax) tuning curve minimizes the squared loss and when p = 0 the f0 -optimal (infomax) tuning curve maximizes the mutual information. [sent-139, score-1.294]

65 In each iteration, a random stimulus s was chosen from the standard Gaussian distribution or Exponential distribution with mean one. [sent-141, score-0.23]

66 A Poisson neuron was simulated to generate spikes in response to that stimulus. [sent-142, score-0.146]

67 the theoretical value of Lp loss for fq -optimal tuning curve p E [|ˆ − s| ] = K(p) · (4T )−p/2 s hmax − hmin −p p π(t)1/(q+1) dt p π(s)1− q+1 ds (31) The above theoretical prediction works well for distributions with compact support s ∈ [A, B]. [sent-145, score-1.384]

68 The numerical and theoretical prediction of Lp loss are plotted, for both Gaussian N (0, 1) prior (Fig. [sent-150, score-0.152]

69 7 Discussion In this paper we have derived a closed form solution for the optimal tuning curve of a single neuron given an arbitrary stimulus prior π(s) and for a variety of optimality. [sent-177, score-1.012]

70 Our work offers a principled explanation for the observed non-linearity in neuron tuning: Each neuron should adapt their tuning curves to reallocate the limited amount of Fisher information they can carry and minimize the Lp error. [sent-178, score-0.718]

71 Two special and well known cases are maximum mutual information (p = 0) and maximum discriminant (p = 2). [sent-180, score-0.167]

72 To obtain this result, we established two theorems regarding the asymptotic behavior of the MLE ˆ s = h−1 (λ). [sent-181, score-0.192]

73 Asymptotically, the MLE converges to a standard Gaussian not only with regard to ˆ its distribution, but also in terms of its p-th moment for arbitrary p > 0. [sent-182, score-0.132]

74 By calculating the p-th moments for the Gaussian random variable, we can predict the Lp loss of the encoding-decoding process and optimize the tuning curve by minimizing the attainable limit. [sent-183, score-0.654]

75 The Cramer-Rao lower bound and the mutual information lower bound proposed by [7] are special cases with p = 2 or p = 0 respectively. [sent-184, score-0.309]

76 So far, we have put our focus on a single neuron with sigmoidal tuning curve. [sent-185, score-0.564]

77 The optimal condition for Fisher information can be calculated, regardless of the tuning curve(s) format. [sent-187, score-0.394]

78 According to the assumption on the number of neurons and the shape of the tuning curves, the optimized Fisher information can be inverted to derive the optimal tuning curve via the same type of analysis as we presented in this paper. [sent-188, score-0.954]

79 One theoretic limitation of our method is that we only addressed the problem for long encoding times, which is usually not the typical scenario in real sensory systems. [sent-189, score-0.175]

80 Though the long encoding time limit can be replaced by short encoding time with many identical tuned neurons. [sent-190, score-0.294]

81 It is still an interesting problem to find out the optimal tuning curve for arbitrary prior, in the sense of Lp loss function. [sent-191, score-0.674]

82 Some work [16, 20] has been done to maximize mutual information or L2 for uniformly distributed stimuli. [sent-192, score-0.191]

83 Another problem is that the asymptotic behavior is not uniformly true if the space of stimulus is not compact. [sent-193, score-0.396]

84 The asymptotic behavior will take longer to be observed if the slop of the tuning function is too close to zero. [sent-194, score-0.494]

85 (c) In general, convergence in distribution does not imply convergence in p-th moment. [sent-202, score-0.042]

86 For our purpose, notice that S2 (p, a) = 0 for a > p/2 and S2 (p, a) ≤ pa . [sent-210, score-0.043]

87 Therefore the supreme of E[|Yn |p ] is bounded since √ E[|Yn |p ] = E[| n(Sn /n − λ)|p ] ≤ n−p/2 p/2 (nλ)a pa ≤ a=0 n(λp)p/2+1 ≤ C(λp)p/2 nλp − 1 (36) For arbitrary q, choose any even number p such that p > q, and by Jensen’s inequality, E[|Yn |q ] ≤ E[|Yn |p ]q/p . [sent-211, score-0.092]

88 Apply mean value theorem for g(x) near λ : ˆ ˆ g(λn ) − g(λ) = g (λ∗ )(λn − λ) ˆ for some λ∗ between λn and λ. [sent-215, score-0.038]

89 Therefore √ √ D ˆ ˆ n g(λn ) − g(λ) = g (λ∗ ) n(λ − λ) → g (λ)Z (37) (38) ˆ Note that λn → λ in probability, λ∗ → λ in probability and g (λ∗ ) → g (λ) in probability, together √ ˆ D with the fact that n(λn − λ) → Z, apply Slutsky’s theorem and the conclusion follows. [sent-216, score-0.038]

90 (b) Use Taylor’s expansion and Slutsky’s theorem again, p p √ √ p p ˆ ˆ n g(λn ) − g(λ) = g (λ∗ ) n(λ − λ) = |g (λ∗ )| |Yn |p → |g (λ)| |Yn |p (39) To see |Yn |p is uniformly integrable, notice that |Yn |p ≥ K ⇒ |Yn |p ≥ K · M −p . [sent-217, score-0.038]

91 Adaptation of the motion-sensitive neuron h1 is generated locally and governed by contrast frequency. [sent-220, score-0.146]

92 Could information theory provide an ecological theory of sensory processing? [sent-228, score-0.075]

93 Neural population coding of sound level adapts to stimulus statistics. [sent-234, score-0.33]

94 Noise characteristics and prior expectations in human visual speed perception. [sent-237, score-0.046]

95 Non linear neurons in the low noise limit: A factorial code maximizes information transfer, 1994. [sent-240, score-0.087]

96 Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. [sent-248, score-0.538]

97 Implicit encoding of prior probabilities in optimal neural populations. [sent-254, score-0.215]

98 Narrow versus wide tuning curves: Whats best for a population code? [sent-276, score-0.376]

99 Optimal neural rate coding leads to bimodal firing rate distributions. [sent-282, score-0.136]

100 Neural population coding is optimized by discrete tuning curves. [sent-298, score-0.428]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hmin', 0.36), ('tuning', 0.328), ('ih', 0.293), ('lp', 0.268), ('stimulus', 0.23), ('yn', 0.212), ('mle', 0.202), ('fisher', 0.196), ('hmax', 0.195), ('curve', 0.189), ('infomax', 0.169), ('asymptotic', 0.166), ('neuron', 0.146), ('mutual', 0.146), ('poisson', 0.126), ('dt', 0.122), ('discrimax', 0.12), ('fp', 0.117), ('ds', 0.106), ('encoding', 0.096), ('sigmoidal', 0.09), ('loss', 0.084), ('limit', 0.077), ('curves', 0.077), ('sn', 0.076), ('ring', 0.07), ('estimator', 0.069), ('moment', 0.067), ('normality', 0.064), ('discriminability', 0.055), ('sensory', 0.054), ('coding', 0.052), ('hp', 0.05), ('imutual', 0.048), ('nadal', 0.048), ('rotermund', 0.048), ('slutsky', 0.048), ('population', 0.048), ('spiking', 0.046), ('prior', 0.046), ('optimal', 0.045), ('pa', 0.043), ('yt', 0.043), ('neurons', 0.043), ('stocker', 0.042), ('pennsylvania', 0.042), ('brunel', 0.039), ('theorem', 0.038), ('bound', 0.038), ('philadelphia', 0.037), ('var', 0.037), ('converges', 0.037), ('asymptotically', 0.035), ('tm', 0.035), ('integrable', 0.033), ('sher', 0.033), ('jeffrey', 0.033), ('bethge', 0.033), ('lower', 0.033), ('neuronal', 0.033), ('maximizing', 0.032), ('minimizing', 0.032), ('sb', 0.031), ('environmental', 0.031), ('estimators', 0.03), ('minimizes', 0.03), ('limn', 0.029), ('combinatorics', 0.029), ('decoding', 0.029), ('arbitrary', 0.028), ('ip', 0.028), ('neural', 0.028), ('rate', 0.028), ('functional', 0.028), ('conventional', 0.026), ('theorems', 0.026), ('long', 0.025), ('md', 0.025), ('maximize', 0.024), ('gaussian', 0.024), ('optimality', 0.024), ('neuroscience', 0.024), ('noise', 0.023), ('equality', 0.023), ('square', 0.022), ('attained', 0.022), ('numerical', 0.022), ('inequality', 0.021), ('spike', 0.021), ('tvd', 0.021), ('zhuo', 0.021), ('dia', 0.021), ('ddlee', 0.021), ('ganguli', 0.021), ('supreme', 0.021), ('sharpen', 0.021), ('vars', 0.021), ('convergence', 0.021), ('information', 0.021), ('moments', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. 1

2 0.24946441 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

Author: Xue-xin Wei, Alan Stocker

Abstract: A common challenge for Bayesian models of perception is the fact that the two fundamental Bayesian components, the prior distribution and the likelihood function, are formally unconstrained. Here we argue that a neural system that emulates Bayesian inference is naturally constrained by the way it represents sensory information in populations of neurons. More specifically, we show that an efficient coding principle creates a direct link between prior and likelihood based on the underlying stimulus distribution. The resulting Bayesian estimates can show biases away from the peaks of the prior distribution, a behavior seemingly at odds with the traditional view of Bayesian estimation, yet one that has been reported in human perception. We demonstrate that our framework correctly accounts for the repulsive biases previously reported for the perception of visual orientation, and show that the predicted tuning characteristics of the model neurons match the reported orientation tuning properties of neurons in primary visual cortex. Our results suggest that efficient coding is a promising hypothesis in constraining Bayesian models of perceptual inference. 1 Motivation Human perception is not perfect. Biases have been observed in a large number of perceptual tasks and modalities, of which the most salient ones constitute many well-known perceptual illusions. It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer’s attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1]. We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions based on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g motion tuned neurons in visual cor∗ http://www.sas.upenn.edu/ astocker/lab 1 world neural representation efficient encoding percept Bayesian decoding Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1 , r2 , ..., rN } in a population of N neurons. The perceptual task is to generate a ˆ good estimate θ(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function. tex [18]). However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have been often selected on the basis of mathematical convenience. Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked. 2 Encoding-decoding framework We consider perception as an inference process that takes place along the simplified neural encodingdecoding cascade illustrated in Fig. 11 . 2.1 Efficient encoding Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of “optimally” are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher Information must be proportional to the prior distribution [12, 21]. In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F (θ) that transforms the ˜ physical space with units θ to a homogeneous space with units θ = F (θ) in which the stimulus distribution becomes uniform. This defines the mapping as θ F (θ) = p(χ)dχ , (1) −∞ which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. ˜ The tuning curves in the physical space are obtained by applying the inverse mapping F −1 (θ). Fig. 2 1 In the context of this paper, we consider ‘inferring’, ‘decoding’, and ‘estimating’ as synonymous. 2 stimulus distribution d samples # a Fisher information discriminability and average firing rates and b firing rate [ Hz] efficient encoding F likelihood function F -1 likelihood c symmetric asymmetric homogeneous space physical space Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics. b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F −1 that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). c) As a result, the likelihood shape is constrained by the prior distribution showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space. illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density. 2.2 Bayesian decoding Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1 , r2 , ..., rN ] where ri is the spike-count of the ith neuron over a given time-window τ . Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as N p(R|θ) = (τ fi (θ))ri −τ fi (θ) e , ri ! i=1 (2) ˆ with fi (θ) describing the tuning curve of neuron i. We then define a Bayesian decoder θLSE as the estimator that minimizes the expected squared-error between the estimate and the true stimulus value, thus θp(R|θ)p(θ)dθ ˆ θLSE (R) = , (3) p(R|θ)p(θ)dθ where we use Bayes’ rule to appropriately combine the sensory evidence with the stimulus prior p(θ). 3 Bayesian estimates can be biased away from prior peaks Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the 3 a b prior attraction prior prior attraction likelihood repulsion! likelihood c prior prior repulsive bias likelihood likelihood mean posterior mean posterior mean Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0 ). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p(θ0 ) < 0 and the likelihood is relatively narrow, then (1/p(θ)2 ) > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)). common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e. leading to biases toward the peak of the prior (Fig. 3a). In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes’ least-squares estimate away from the peak of the prior while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counter-acting forces - and both are determined by the prior! 3.1 General derivation of the estimation bias In the following, we will formally derive the mean estimation bias b(θ) of the proposed encodingdecoding framework. Specifically, we will study the conditions for which the bias is repulsive i.e. away from the peak of the prior density. ˆ We first re-write the estimator θLSE (3) by replacing θ with the inverse of its mapping to the homo−1 ˜ geneous space, i.e., θ = F (θ). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2). Given a value θ0 and the elicited population response R, we can write the estimator as ˜ ˜ ˜ ˜ θp(R|θ)p(θ)dθ F −1 (θ)p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) ˆ θLSE (R) = = . ˜ ˜ ˜ p(R|θ)p(θ)dθ p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get 1 1 1 ˜ ˜ ˜ ˜ ˜ ˜ dθ = dθ. dF −1 (θ) = (F −1 (θ)) dθ = dθ = −1 (θ)) ˜ F (θ) p(θ) p(F ˆ Hence, we can simplify θLSE (R) as ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)p(R|F −1 (θ))dθ . ˜ ˜ p(R|F −1 (θ))dθ With ˜ K(R, θ) = ˜ p(R|F −1 (θ)) ˜ ˜ p(R|F −1 (θ))dθ 4 we can further simplify the notation and get ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)K(R, θ)dθ . (4) ˆ ˜ In order to get the expected value of the estimate, θLSE (θ), we marginalize (4) over the population response space S, ˆ ˜ ˜ ˜ ˜ θLSE (θ) = p(R)F −1 (θ)K(R, θ)dθdR S = F −1 ˜ (θ)( ˜ ˜ p(R)K(R, θ)dR)dθ = ˜ ˜ ˜ F −1 (θ)L(θ)dθ, S where we define ˜ L(θ) = ˜ p(R)K(R, θ)dR. S ˜ ˜ ˜ It follows that L(θ)dθ = 1. Due to the symmetry in this space, it can be shown that L(θ) is ˜0 . Intuitively, L(θ) can be thought as the normalized ˜ symmetric around the true stimulus value θ average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as b(θ0 ) = ˜ ˜ ˜ ˜ F −1 (θ)L(θ)dθ − F −1 (θ0 ) (5) ˜ This is expression is general where F −1 (θ) is defined as the inverse of the cumulative of an arbitrary ˜ prior density p(θ) (see Eq. (1)) and the dispersion of L(θ) is determined by the internal noise level. ˜ ˜ Assuming the prior density to be smooth, we expand F −1 in a neighborhood (θ0 − h, θ0 + h) that is larger than the support of the likelihood function. Using Taylor’s theorem with mean-value forms of the remainder, we get 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ F −1 (θ) = F −1 (θ0 ) + F −1 (θ0 ) (θ − θ0 ) + F −1 (θx ) (θ − θ0 )2 , 2 ˜ ˜ ˜ with θx lying between θ0 and θ. By applying this expression to (5), we find ˜ θ0 +h b(θ0 ) = = 1 2 ˜ θ0 −h 1 −1 ˜ ˜ ˜ ˜ ˜ 1 F (θx )θ (θ − θ0 )2 L(θ)dθ = ˜ 2 2 ˜ θ0 +h −( ˜ θ0 −h p(θx )θ ˜ ˜ 2 ˜ ˜ 1 )(θ − θ0 ) L(θ)dθ = p(θx )3 4 ˜ θ0 +h 1 ˜ − θ0 )2 L(θ)dθ ˜ ˜ ˜ ( ) ˜(θ ˜ p(F −1 (θx )) θ ( 1 ˜ ˜ ˜ ˜ ) (θ − θ0 )2 L(θ)dθ. p(θx )2 θ ˜ θ0 −h ˜ θ0 +h ˜ θ0 −h In general, there is no simple rule to judge the sign of b(θ0 ). However, if the prior is monotonic ˜ ˜ on the interval F −1 ((θ0 − h, θ0 + h)), then the sign of ( p(θ1 )2 ) is always the same as the sign of x 1 1 ( p(θ0 )2 ) . Also, if the likelihood is sufficiently narrow we can approximate ( p(θ1 )2 ) by ( p(θ0 )2 ) , x and therefore approximate the bias as b(θ0 ) ≈ C( 1 ) , p(θ0 )2 (6) where C is a positive constant. The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior! 3.2 Internal (neural) versus external (stimulus) noise The above derivation of estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrastcontrolled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly 5 larger biases up to the point where the likelihood becomes wide enough such that monotonicity of the prior over the support of the likelihood is potentially violated. Stimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion stimuli). Such external noise, however, has a different effect on the shape of the likelihood function as compared to internal noise. It modifies the likelihood function (2) by convolving it with the noise kernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It is straightforward to prove that such symmetric external noise does not lead to a change in the mean of the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by increasing the overall width of the likelihood, the attractive influence of the prior increases, resulting in an estimate that is closer to the prior peak than without external noise2 . 4 Perception of visual orientation We tested our framework by modelling the perception of visual orientation. Our choice was based on the fact that i) we have pretty good estimates of the prior distribution of local orientations in natural images, ii) tuning characteristics of orientation selective neurons in visual cortex are wellstudied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature. 4.1 Efficient neural model population for visual orientation Previous studies measured the statistics of the local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system’s prior belief over orientation p(θ) follows this distribution and approximate it formally as p(θ) ∝ 2 − | sin(θ)| (black line in Fig. 4b) . (7) Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von-Mises distribution in the homogeneous space on top of a constant spontaneous firing rate (5 Hz). We then ˜ applied the inverse transformation F −1 (θ) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b - red curves), where F (θ) is the cumulative of the prior (7). The concentration parameter for the von-Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼ 32 deg) of neurons in area V1 of the macaque [9]. 4.2 Predicted tuning characteristics of neurons in primary visual cortex The orientation tuning characteristics of our model population well match neurophysiological data of neurons in primary visual cortex (V1). Efficient encoding predicts that the distribution of neurons’ preferred orientation follows the prior, with more neurons tuned to cardinal than oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25-42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9]. An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. It also predicts that neurons tuned to the cardinal and oblique orientations should show less symmetry than those tuned to orientations in between. Finally, 2 Note, that these predictions are likely to change if the external noise is not symmetric. 6 a b 25 firing rate(Hz) 0 orientation(deg) asymmetry vs. tuning width 1.0 2.0 90 2.0 e asymmetry 1.0 0 asymmetry index 50 30 width (deg) 10 90 preferred tuning(deg) -90 0 d 0 0 90 asymmetry index 0 orientation(deg) tuning width -90 0 0 probability 0 -90 c efficient representation 0.01 0.01 image statistics -90 0 90 preferred tuning(deg) 25 30 35 40 tuning width (deg) Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16]. b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons neurons show less asymmetry than neurons with tuning widths in between. neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet. 4.3 Predicted perceptual biases Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished: First, perceived orientations show an absolute bias away from the cardinal orientations, thus away from the peaks of the orientation prior [2, 3]. We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matched the orientation of the test stimulus. Interestingly, these repulsive absolute biases are the larger the smaller the external stimulus noise is (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations as shown in Fig. 5c, and thus toward the peak of the prior distribution [3, 16]. The predicted perceptual biases of our model are shown Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels). The predictions well match both, the reported absolute bias away as well as the relative biases toward the cardinal orientations. Note, that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also discussion in section 3.2). 5 Discussion We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with 7 absolute bias (data) b c relative bias (data) -4 0 bias(deg) 4 a low-noise stimulus -90 e 90 absolute bias (model) low external noise high external noise 3 high-noise stimulus -90 f 0 90 relative bias (model) 0 bias(deg) d 0 attraction -3 repulsion -90 0 orientation (deg) 90 -90 0 orientation (deg) 90 Figure 5: Biases in perceived orientation: Human data vs. Model prediction. a,d) Low- and highnoise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise level (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias. heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support of our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11]. We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear “anti-Bayesian” at first sight (see also models of sensory adaptation [23]). Note, that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes’ leastsquares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize total squared-error, and that other choices of the loss function (e.g. MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5]. Acknowledgments We thank M. Jogan and A. Tank for helpful comments on the manuscript. This work was partially supported by grant ONR N000141110744. 8 References [1] M. Jones, and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169–231,2011. [2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218- 1220, 1965. [3] A. Tomassini, M. J.Morgam. and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50, 541–547, 2010. [4] W. S. Geisler, D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508- 510, 2002. [5] M. O. Ernst Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23- R25, 2009. [6] G. H. Henry, B. Dreher, P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394-409,1974. [7] D. Rose, C. Blakemore An analysis of orientation selectivity in the cat’s visual cortex. Exp Brain Res., Apr 30;20(1):1-17, 1974. [8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern., 78(1):45-56, 1998. [9] R. L. De Valois, E. W. Yund, N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res.,22, 531544,1982. [10] B. Li, M. R. Peterson, R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J. Neurophysiol., 90, 204217, 2003. [11] D. Ganguli and E.P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Adv. Neural Information Processing Systems NIPS 23, vol. 23:658–666, 2011. [12] M. D. McDonnell, N. G. Stocks. Maximally Informative Stimuli and Tuning Curves for Sigmoidal RateCoding Neurons and Populations. Phys Rev Lett., 101(5):058103, 2008. [13] H Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867. [14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percept. Nature Neuroscience, 5(6):598–604, June 2002. [15] D.C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996. [16] A R Girshick, M S Landy, and E P Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, Jul 2011. [17] M. Jazayeri and M.N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010. [18] A.A. Stocker and E.P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, pages 578–585, April 2006. [19] H.B. Barlow. Possible principles underlying the transformation of sensory messages. In W.A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. [20] D.M. Coppola, H.R. Purves, A.N. McCoy, and D. Purves The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A., 95(7): 4002–4006, 1998. [21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10, 7, 1731–1757, 1998. [22] X-X. Wei and A.A. Stocker. Bayesian inference with efficient neural population codes. In Lecture Notes in Computer Science, Artificial Neural Networks and Machine Learning - ICANN 2012, Lausanne, Switzerland, volume 7552, pages 523–530, 2012. [23] A.A. Stocker and E.P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Sch¨ lkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages o 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation. [24] D.C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007. [25] Deep Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012. [26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010. 9

3 0.14240927 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

4 0.13279365 190 nips-2012-Learning optimal spike-based representations

Author: Ralph Bourdoukan, David Barrett, Sophie Deneve, Christian K. Machens

Abstract: How can neural networks learn to represent information optimally? We answer this question by deriving spiking dynamics and learning dynamics directly from a measure of network performance. We find that a network of integrate-and-fire neurons undergoing Hebbian plasticity can learn an optimal spike-based representation for a linear decoder. The learning rule acts to minimise the membrane potential magnitude, which can be interpreted as a representation error after learning. In this way, learning reduces the representation error and drives the network into a robust, balanced regime. The network becomes balanced because small representation errors correspond to small membrane potentials, which in turn results from a balance of excitation and inhibition. The representation is robust because neurons become self-correcting, only spiking if the representation error exceeds a threshold. Altogether, these results suggest that several observed features of cortical dynamics, such as excitatory-inhibitory balance, integrate-and-fire dynamics and Hebbian plasticity, are signatures of a robust, optimal spike-based code. A central question in neuroscience is to understand how populations of neurons represent information and how they learn to do so. Usually, learning and information representation are treated as two different functions. From the outset, this separation seems like a good idea, as it reduces the problem into two smaller, more manageable chunks. Our approach, however, is to study these together. This allows us to treat learning and information representation as two sides of a single mechanism, operating at two different timescales. Experimental work has given us several clues about the regime in which real networks operate in the brain. Some of the most prominent observations are: (a) high trial-to-trial variability—a neuron responds differently to repeated, identical inputs [1, 2]; (b) asynchronous firing at the network level—spike trains of different neurons are at most very weakly correlated [3, 4, 5]; (c) tight balance of excitation and inhibition—every excitatory input is met by an inhibitory input of equal or greater size [6, 7, 8] and (4) spike-timing-dependent plasticity (STDP)—the strength of synapses change as a function of presynaptic and postsynaptic spike times [9]. Previously, it has been shown that observations (a)–(c) can be understood as signatures of an optimal, spike-based code [10, 11]. The essential idea is to derive spiking dynamics from the assumption that neurons only fire if their spike improves information representation. Information in a network may ∗ Authors contributed equally 1 originate from several possible sources: external sensory input, external neural network input, or alternatively, it may originate within the network itself as a memory, or as a computation. Whatever the source, this initial assumption leads directly to the conclusion that a network of integrate-and-fire neurons can optimally represent a signal while exhibiting properties (a)–(c). A major problem with this framework is that network connectivity must be completely specified a priori, and requires the tuning of N 2 parameters, where N is the number of neurons in the network. Although this is feasible mathematically, it is unclear how a real network could tune itself into this optimal regime. In this work, we solve this problem using a simple synaptic learning rule. The key insight is that the plasticity rule can be derived from the same basic principle as the spiking rule in the earlier work—namely, that any change should improve information representation. Surprisingly, this can be achieved with a local, Hebbian learning rule, where synaptic plasticity is proportional to the product of presynaptic firing rates with post-synaptic membrane potentials. Spiking and synaptic plasticity then work hand in hand towards the same goal: the spiking of a neuron decreases the representation error on a fast time scale, thereby giving rise to the actual population representation; synaptic plasticity decreases the representation error on a slower time scale, thereby improving or maintaining the population representation. For a large set of initial connectivities and spiking dynamics, neural networks are driven into a balanced regime, where excitation and inhibition cancel each other and where spike trains are asynchronous and irregular. Furthermore, the learning rule that we derive reproduces the main features of STDP (property (d) above). In this way, a network can learn to represent information optimally, with synaptic, neural and network dynamics consistent with those observed experimentally. 1 Derivation of the learning rule for a single neuron We begin by deriving a learning rule for a single neuron with an autapse (a self-connection) (Fig. 1A). Our approach is to derive synaptic dynamics for the autapse and spiking dynamics for the neuron such that the neuron learns to optimally represent a time-varying input signal. We will derive a learning rule for networks of neurons later, after we have developed the fundamental concepts for the single neuron case. Our first step is to derive optimal spiking dynamics for the neuron, so that we have a target for our learning rule. We do this by making two simple assumptions [11]. First, we assume that the neuron can provide an estimate or read-out x(t) of a time-dependent signal x(t) by filtering its spike train ˆ o(t) as follows: ˙ x(t) = −ˆ(t) + Γo(t), ˆ x (1) where Γ is a fixed read-out weight, which we will refer to as the neuron’s “output kernel” and the spike train can be written as o(t) = i δ(t − ti ), where {ti } are the spike times. Next, we assume that the neuron only produces a spike if that spike improves the read-out, where we measure the read-out performance through a simple squared-error loss function: 2 L(t) = x(t) − x(t) . ˆ (2) With these two assumptions, we can now derive optimal spiking dynamics. First, we observe that if the neuron produces an additional spike at time t, the read-out increases by Γ, and the loss function becomes L(t|spike) = (x(t) − (x(t) + Γ))2 . This allows us to restate our spiking rule as follows: ˆ the neuron should only produce a spike if L(t|no spike) > L(t|spike), or (x(t) − x(t))2 > (x(t) − ˆ (x(t) + Γ))2 . Now, squaring both sides of this inequality, defining V (t) ≡ Γ(x(t) − x(t)) and ˆ ˆ defining T ≡ Γ2 /2 we find that the neuron should only spike if: V (t) > T. (3) We interpret V (t) to be the membrane potential of the neuron, and we interpret T as the spike threshold. This interpretation allows us to understand the membrane potential functionally: the voltage is proportional to a prediction error—the difference between the read-out x(t) and the actual ˆ signal x(t). A spike is an error reduction mechanism—the neuron only spikes if the error exceeds the spike threshold. This is a greedy minimisation, in that the neuron fires a spike whenever that action decreases L(t) without considering the future impact of that spike. Importantly, the neuron does not require direct access to the loss function L(t). 2 To determine the membrane potential dynamics, we take the derivative of the voltage, which gives ˙ ˙ us V = Γ(x − x). (Here, and in the following, we will drop the time index for notational brevity.) ˙ ˆ ˙ Now, using Eqn. (1) we obtain V = Γx − Γ(−x + Γo) = −Γ(x − x) + Γ(x + x) − Γ2 o, so that: ˙ ˆ ˆ ˙ ˙ V = −V + Γc − Γ2 o, (4) where c = x + x is the neural input. This corresponds exactly to the dynamics of a leaky integrate˙ and-fire neuron with an inhibitory autapse1 of strength Γ2 , and a feedforward connection strength Γ. The dynamics and connectivity guarantee that a neuron spikes at just the right times to optimise the loss function (Fig. 1B). In addition, it is especially robust to noise of different forms, because of its error-correcting nature. If x is constant in time, the voltage will rise up to the threshold T at which point a spike is fired, adding a delta function to the spike train o at time t, thereby producing a read-out x that is closer to x and causing an instantaneous drop in the voltage through the autapse, ˆ by an amount Γ2 = 2T , effectively resetting the voltage to V = −T . We now have a target for learning—we know the connection strength that a neuron must have at the end of learning if it is to represent information optimally, for a linear read-out. We can use this target to derive synaptic dynamics that can learn an optimal representation from experience. Specifically, we consider an integrate-and-fire neuron with some arbitrary autapse strength ω. The dynamics of this neuron are given by ˙ V = −V + Γc − ωo. (5) This neuron will not produce the correct spike train for representing x through a linear read-out (Eqn. (1)) unless ω = Γ2 . Our goal is to derive a dynamical equation for the synapse ω so that the spike train becomes optimal. We do this by quantifying the loss that we are incurring by using the suboptimal strength, and then deriving a learning rule that minimises this loss with respect to ω. The loss function underlying the spiking dynamics determined by Eqn. (5) can be found by reversing the previous membrane potential analysis. First, we integrate the differential equation for V , assuming that ω changes on time scales much slower than the membrane potential. We obtain the following (formal) solution: V = Γx − ω¯, o (6) ˙ where o is determined by o = −¯ + o. The solution to this latter equation is o = h ∗ o, a convolution ¯ ¯ o ¯ of the spike train with the exponential kernel h(τ ) = θ(τ ) exp(−τ ). As such, it is analogous to the instantaneous firing rate of the neuron. Now, using Eqn. (6), and rewriting the read-out as x = Γ¯, we obtain the loss incurred by the ˆ o sub-optimal neuron, L = (x − x)2 = ˆ 1 V 2 + 2(ω − Γ2 )¯ + (ω − Γ2 )2 o2 . o ¯ Γ2 (7) We observe that the last two terms of Eqn. (7) will vanish whenever ω = Γ2 , i.e., when the optimal reset has been found. We can therefore simplify the problem by defining an alternative loss function, 1 2 V , (8) 2 which has the same minimum as the original loss (V = 0 or x = x, compare Eqn. (2)), but yields a ˆ simpler learning algorithm. We can now calculate how changes to ω affect LV : LV = ∂LV ∂V ∂o ¯ =V = −V o − V ω ¯ . (9) ∂ω ∂ω ∂ω We can ignore the last term in this equation (as we will show below). Finally, using simple gradient descent, we obtain a simple Hebbian-like synaptic plasticity rule: τω = − ˙ ∂LV = V o, ¯ ∂ω (10) where τ is the learning time constant. 1 This contribution of the autapse can also be interpreted as the reset of an integrate-and-fire neuron. Later, when we generalise to networks of neurons, we shall employ this interpretation. 3 This synaptic learning rule is capable of learning the synaptic weight ω that minimises the difference between x and x (Fig. 1B). During learning, the synaptic weight changes in proportion to the postˆ synaptic voltage V and the pre-synaptic firing rate o (Fig. 1C). As such, this is a Hebbian learning ¯ rule. Of course, in this single neuron case, the pre-synaptic neuron and post-synaptic neuron are the same neuron. The synaptic weight gradually approaches its optimal value Γ2 . However, it never completely stabilises, because learning never stops as long as neurons are spiking. Instead, the synapse oscillates closely about the optimal value (Fig. 1D). This is also a “greedy” learning rule, similar to the spiking rule, in that it seeks to minimise the error at each instant in time, without regard for the future impact of those changes. To demonstrate that the second term in Eqn. (5) can be neglected we note that the equations for V , o, and ω define a system ¯ of coupled differential equations that can be solved analytically by integrating between spikes. This results in a simple recurrence relation for changes in ω from the ith to the (i + 1)th spike, ωi+1 = ωi + ωi (ωi − 2T ) . τ (T − Γc − ωi ) (11) This iterative equation has a single stable fixed point at ω = 2T = Γ2 , proving that the neuron’s autaptic weight or reset will approach the optimal solution. 2 Learning in a homogeneous network We now generalise our learning rule derivation to a network of N identical, homogeneously connected neurons. This generalisation is reasonably straightforward because many characteristics of the single neuron case are shared by a network of identical neurons. We will return to the more general case of heterogeneously connected neurons in the next section. We begin by deriving optimal spiking dynamics, as in the single neuron case. This provides a target for learning, which we can then use to derive synaptic dynamics. As before, we want our network to produce spikes that optimally represent a variable x for a linear read-out. We assume that the read-out x is provided by summing and filtering the spike trains of all the neurons in the network: ˆ ˙ x = −ˆ + Γo, ˆ x (12) 2 where the row vector Γ = (Γ, . . . , Γ) contains the read-out weights of the neurons and the column vector o = (o1 , . . . , oN ) their spike trains. Here, we have used identical read-out weights for each neuron, because this indirectly leads to homogeneous connectivity, as we will demonstrate. Next, we assume that a neuron only spikes if that spike reduces a loss-function. This spiking rule is similar to the single neuron spiking rule except that this time there is some ambiguity about which neuron should spike to represent a signal. Indeed, there are many different spike patterns that provide exactly the same estimate x. For example, one neuron could fire regularly at a high rate (exactly like ˆ our previous single neuron example) while all others are silent. To avoid this firing rate ambiguity, we use a modified loss function, that selects amongst all equivalent solutions, those with the smallest neural firing rates. We do this by adding a ‘metabolic cost’ term to our loss function, so that high firing rates are penalised: ¯ L = (x − x)2 + µ o 2 , ˆ (13) where µ is a small positive constant that controls the cost-accuracy trade-off, akin to a regularisation parameter. Each neuron in the optimal network will seek to reduce this loss function by firing a spike. Specifically, the ith neuron will spike whenever L(no spike in i) > L(spike in i). This leads to the following spiking rule for the ith neuron: Vi > Ti (14) where Vi ≡ Γ(x − x) − µoi and Ti ≡ Γ2 /2 + µ/2. We can naturally interpret Vi as the membrane ˆ potential of the ith neuron and Ti as the spiking threshold of that neuron. As before, we can now derive membrane potential dynamics: ˙ V = −V + ΓT c − (ΓT Γ + µI)o, 2 (15) The read-out weights must scale as Γ ∼ 1/N so that firing rates are not unrealistically small in large networks. We can see this by calculating the average firing rate N oi /N ≈ x/(ΓN ) ∼ O(N/N ) ∼ O(1). i=1 ¯ 4 where I is the identity matrix and ΓT Γ + µI is the network connectivity. We can interpret the selfconnection terms {Γ2 +µ} as voltage resets that decrease the voltage of any neuron that spikes. This optimal network is equivalent to a network of identical integrate-and-fire neurons with homogeneous inhibitory connectivity. The network has some interesting dynamical properties. The voltages of all the neurons are largely synchronous, all increasing to the spiking threshold at about the same time3 (Fig. 1F). Nonetheless, neural spiking is asynchronous. The first neuron to spike will reset itself by Γ2 + µ, and it will inhibit all the other neurons in the network by Γ2 . This mechanism prevents neurons from spik- x 3 The first neuron to spike will be random if there is some membrane potential noise. V (A) (B) x x ˆ x 10 1 0.1 0 50 100 150 200 250 300 350 400 0 50 100 150 200 250 300 350 400 1 D 0.5 V V 0 ˆ x V ˆ x (C) 1 0 1 2 0 0.625 25 25.625 (D) start of learning 1 V 50 200.625 400 400.625 1 2.4 O 1.78 ω 1.77 25 neuron$ 0 1 2 !me$ 3 4 25 1 5 V 400.625 !me$ (F) 25 1 2.35 1.05 1.049 400 25.625 !me$ (E) neuron$ 100.625 200 end of learning 1.4 1.35 ω 100 !me$ 1 V 1 O 50.625 0 1 2 !me$ 3 4 5 V !me$ !me$ Figure 1: Learning in a single neuron and a homogeneous network. (A) A single neuron represents an input signal x by producing an output x. (B) During learning, the single neuron output x (solid red ˆ ˆ line, top panel) converges towards the input x (blue). Similarly, for a homogeneous network the output x (dashed red line, top panel) converges towards x. Connectivity also converges towards optimal ˆ connectivity in both the single neuron case (solid black line, middle panel) and the homogeneous net2 2 work case (dashed black line, middle panel), as quantified by D = maxi,j ( Ωij − Ωopt / Ωopt ) ij ij at each point in time. Consequently, the membrane potential reset (bottom panel) converges towards the optimal reset (green line, bottom panel). Spikes are indicated by blue vertical marks, and are produced when the membrane potential reaches threshold (bottom panel). Here, we have rescaled time, as indicated, for clarity. (C) Our learning rule dictates that the autapse ω in our single neuron (bottom panel) changes in proportion to the membrane potential (top panel) and the firing rate (middle panel). (D) At the end of learning, the reset ω fluctuates weakly about the optimal value. (E) For a homogeneous network, neurons spike regularly at the start of learning, as shown in this raster plot. Membrane potentials of different neurons are weakly correlated. (F) At the end of learning, spiking is very irregular and membrane potentials become more synchronous. 5 ing synchronously. The population as a whole acts similarly to the single neuron in our previous example. Each neuron fires regularly, even if a different neuron fires in every integration cycle. The design of this optimal network requires the tuning of N (N − 1) synaptic parameters. How can an arbitrary network of integrate-and-fire neurons learn this optimum? As before, we address this question by using the optimal network as a target for learning. We start with an arbitrarily connected network of integrate-and-fire neurons: ˙ V = −V + ΓT c − Ωo, (16) where Ω is a matrix of connectivity weights, which includes the resets of the individual neurons. Assuming that learning occurs on a slow time scale, we can rewrite this equation as V = ΓT x − Ω¯ . o (17) Now, repeating the arguments from the single neuron derivation, we modify the loss function to obtain an online learning rule. Specifically, we set LV = V 2 /2, and calculate the gradient: ∂LV = ∂Ωij Vk k ∂Vk =− ∂Ωij Vk δki oj − ¯ k Vk Ωkl kl ∂ ol ¯ . ∂Ωij (18) We can simplify this equation considerably by observing that the contribution of the second summation is largely averaged out under a wide variety of realistic conditions4 . Therefore, it can be neglected, and we obtain the following local learning rule: ∂LV ˙ = V i oj . ¯ τ Ωij = − ∂Ωij (19) This is a Hebbian plasticity rule, whereby connectivity changes in proportion to the presynaptic firing rate oj and post-synaptic membrane potential Vi . We assume that the neural thresholds are set ¯ to a constant T and that the neural resets are set to their optimal values −T . In the previous section we demonstrated that these resets can be obtained by a Hebbian plasticity rule (Eqn. (10)). This learning rule minimises the difference between the read-out and the signal, by approaching the optimal recurrent connection strengths for the network (Fig. 1B). As in the single neuron case, learning does not stop, so the connection strengths fluctuate close to their optimal value. During learning, network activity becomes progressively more asynchronous as it progresses towards optimal connectivity (Fig. 1E, F). 3 Learning in the general case Now that we have developed the fundamental concepts underlying our learning rule, we can derive a learning rule for the more general case of a network of N arbitrarily connected leaky integrateand-fire neurons. Our goal is to understand how such networks can learn to optimally represent a ˙ J-dimensional signal x = (x1 , . . . , xJ ), using the read-out equation x = −x + Γo. We consider a network with the following membrane potential dynamics: ˙ V = −V + ΓT c − Ωo, (20) where c is a J-dimensional input. We assume that this input is related to the signal according to ˙ c = x + x. This assumption can be relaxed by treating the input as the control for an arbitrary linear dynamical system, in which case the signal represented by the network is the output of such a computation [11]. However, this further generalisation is beyond the scope of this work. As before, we need to identify the optimal recurrent connectivity so that we have a target for learning. Most generally, the optimal recurrent connectivity is Ωopt ≡ ΓT Γ + µI. The output kernels of the individual neurons, Γi , are given by the rows of Γ, and their spiking thresholds by Ti ≡ Γi 2 /2 + 4 From the definition of the membrane potential we can see that Vk ∼ O(1/N ) because Γ ∼ 1/N . Therefore, the size of the first term in Eqn. (18) is k Vk δki oj = Vi oj ∼ O(1/N ). Therefore, the second term can ¯ ¯ be ignored if kl Vk Ωkl ∂ ol /∂Ωij ¯ O(1/N ). This happens if Ωkl O(1/N 2 ) as at the start of learning. It also happens towards the end of learning if the terms {Ωkl ∂ ol /∂Ωij } are weakly correlated with zero mean, ¯ or if the membrane potentials {Vi } are weakly correlated with zero mean. 6 µ/2. With these connections and thresholds, we find that a network of integrate-and-fire neurons ˆ ¯ will produce spike trains in such a way that the loss function L = x − x 2 + µ o 2 is minimised, ˆ where the read-out is given by x = Γ¯ . We can show this by prescribing a greedy5 spike rule: o a spike is fired by neuron i whenever L(no spike in i) > L(spike in i) [11]. The resulting spike generation rule is Vi > Ti , (21) ˆ where Vi ≡ ΓT (x − x) − µ¯i is interpreted as the membrane potential. o i 5 Despite being greedy, this spiking rule can generate firing rates that are practically identical to the optimal solutions: we checked this numerically in a large ensemble of networks with randomly chosen kernels. (A) x1 … x … 1 1 (B) xJJ x 10 L 10 T T 10 4 6 8 1 Viii V D ˆˆ ˆˆ x11 xJJ x x F 0.5 0 0.4 … … 0.2 0 0 2000 4000 !me   (C) x V V 1 x 10 x 3 ˆ x 8 0 x 10 1 2 3 !me   4 5 4 0 1 4 0 1 8 V (F) Ρ(Δt)   E-­‐I  input   0.4 ˆ x 0 3 0 1 x 10 1.3 0.95 x 10 ˆ x 4 V (E) 1 x 0 end of learning 50 neuron neuron 50 !me   2 0 ˆ x 0 0.5 ISI  Δt     1 2 !me   4 5 4 1.5 1.32 3 2 0.1 Ρ(Δt)   x E-­‐I  input   (D) start of learning 0 2 !me   0 0 0.5 ISI  Δt   1 Figure 2: Learning in a heterogeneous network. (A) A network of neurons represents an input ˆ signal x by producing an output x. (B) During learning, the loss L decreases (top panel). The difference between the connection strengths and the optimal strengths also decreases (middle panel), as 2 2 quantified by the mean difference (solid line), given by D = Ω − Ωopt / Ωopt and the maxi2 2 mum difference (dashed line), given by maxi,j ( Ωij − Ωopt / Ωopt ). The mean population firing ij ij rate (solid line, bottom panel) also converges towards the optimal firing rate (dashed line, bottom panel). (C, E) Before learning, a raster plot of population spiking shows that neurons produce bursts ˆ of spikes (upper panel). The network output x (red line, middle panel) fails to represent x (blue line, middle panel). The excitatory input (red, bottom left panel) and inhibitory input (green, bottom left panel) to a randomly selected neuron is not tightly balanced. Furthermore, a histogram of interspike intervals shows that spiking activity is not Poisson, as indicated by the red line that represents a best-fit exponential distribution. (D, F) At the end of learning, spiking activity is irregular and ˆ Poisson-like, excitatory and inhibitory input is tightly balanced and x matches x. 7 How can we learn this optimal connection matrix? As before, we can derive a learning rule by minimising the cost function LV = V 2 /2. This leads to a Hebbian learning rule with the same form as before: ˙ τ Ωij = Vi oj . ¯ (22) Again, we assume that the neural resets are given by −Ti . Furthermore, in order for this learning rule to work, we must assume that the network input explores all possible directions in the J-dimensional input space (since the kernels Γi can point in any of these directions). The learning performance does not critically depend on how the input variable space is sampled as long as the exploration is extensive. In our simulations, we randomly sample the input c from a Gaussian white noise distribution at every time step for the entire duration of the learning. We find that this learning rule decreases the loss function L, thereby approaching optimal network connectivity and producing optimal firing rates for our linear decoder (Fig. 2B). In this example, we have chosen connectivity that is initially much too weak at the start of learning. Consequently, the initial network behaviour is similar to a collection of unconnected single neurons that ignore each other. Spike trains are not Poisson-like, firing rates are excessively large, excitatory and inhibitory ˆ input is unbalanced and the decoded variable x is highly unreliable (Fig. 2C, E). As a result of learning, the network becomes tightly balanced and the spike trains become asynchronous, irregular and Poisson-like with much lower rates (Fig. 2D, F). However, despite this apparent variability, the population representation is extremely precise, only limited by the the metabolic cost and the discrete nature of a spike. This learnt representation is far more precise than a rate code with independent Poisson spike trains [11]. In particular, shuffling the spike trains in response to identical inputs drastically degrades this precision. 4 Conclusions and Discussion In population coding, large trial-to-trial spike train variability is usually interpreted as noise [2]. We show here that a deterministic network of leaky integrate-and-fire neurons with a simple Hebbian plasticity rule can self-organise into a regime where information is represented far more precisely than in noisy rate codes, while appearing to have noisy Poisson-like spiking dynamics. Our learning rule (Eqn. (22)) has the basic properties of STDP. Specifically, a presynaptic spike occurring immediately before a post-synaptic spike will potentiate a synapse, because membrane potentials are positive immediately before a postsynaptic spike. Furthermore, a presynaptic spike occurring immediately after a post-synaptic spike will depress a synapse, because membrane potentials are always negative immediately after a postsynaptic spike. This is similar in spirit to the STDP rule proposed in [12], but different to classical STDP, which depends on post-synaptic spike times [9]. This learning rule can also be understood as a mechanism for generating a tight balance between excitatory and inhibitory input. We can see this by observing that membrane potentials after learning can be interpreted as representation errors (projected onto the read-out kernels). Therefore, learning acts to minimise the magnitude of membrane potentials. Excitatory and inhibitory input must be balanced if membrane potentials are small, so we can equate balance with optimal information representation. Previous work has shown that the balanced regime produces (quasi-)chaotic network dynamics, thereby accounting for much observed cortical spike train variability [13, 14, 4]. Moreover, the STDP rule has been known to produce a balanced regime [16, 17]. Additionally, recent theoretical studies have suggested that the balanced regime plays an integral role in network computation [15, 13]. In this work, we have connected these mechanisms and functions, to conclude that learning this balance is equivalent to the development of an optimal spike-based population code, and that this learning can be achieved using a simple Hebbian learning rule. Acknowledgements We are grateful for generous funding from the Emmy-Noether grant of the Deutsche Forschungsgemeinschaft (CKM) and the Chaire d’excellence of the Agence National de la Recherche (CKM, DB), as well as a James Mcdonnell Foundation Award (SD) and EU grants BACS FP6-IST-027140, BIND MECT-CT-20095-024831, and ERC FP7-PREDSPIKE (SD). 8 References [1] Tolhurst D, Movshon J, and Dean A (1982) The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Res 23: 775–785. [2] Shadlen MN, Newsome WT (1998) The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J Neurosci 18(10): 3870–3896. [3] Zohary E, Newsome WT (1994) Correlated neuronal discharge rate and its implication for psychophysical performance. Nature 370: 140–143. [4] Renart A, de la Rocha J, Bartho P, Hollender L, Parga N, Reyes A, & Harris, KD (2010) The asynchronous state in cortical circuits. Science 327, 587–590. [5] Ecker AS, Berens P, Keliris GA, Bethge M, Logothetis NK, Tolias AS (2010) Decorrelated neuronal firing in cortical microcircuits. Science 327: 584–587. [6] Okun M, Lampl I (2008) Instantaneous correlation of excitation and inhibition during ongoing and sensory-evoked activities. Nat Neurosci 11, 535–537. [7] Shu Y, Hasenstaub A, McCormick DA (2003) Turning on and off recurrent balanced cortical activity. Nature 423, 288–293. [8] Gentet LJ, Avermann M, Matyas F, Staiger JF, Petersen CCH (2010) Membrane potential dynamics of GABAergic neurons in the barrel cortex of behaving mice. Neuron 65: 422–435. [9] Caporale N, Dan Y (2008) Spike-timing-dependent plasticity: a Hebbian learning rule. Annu Rev Neurosci 31: 25–46. [10] Boerlin M, Deneve S (2011) Spike-based population coding and working memory. PLoS Comput Biol 7, e1001080. [11] Boerlin M, Machens CK, Deneve S (2012) Predictive coding of dynamic variables in balanced spiking networks. under review. [12] Clopath C, B¨ sing L, Vasilaki E, Gerstner W (2010) Connectivity reflects coding: a model of u voltage-based STDP with homeostasis. Nat Neurosci 13(3): 344–352. [13] van Vreeswijk C, Sompolinsky H (1998) Chaotic balanced state in a model of cortical circuits. Neural Comput 10(6): 1321–1371. [14] Brunel N (2000) Dynamics of sparsely connected networks of excitatory and inhibitory neurons. J Comput Neurosci 8, 183–208. [15] Vogels TP, Rajan K, Abbott LF (2005) Neural network dynamics. Annu Rev Neurosci 28: 357–376. [16] Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W. (2011) Inhibitory plasticity balances excitation and inhibition in sensory pathways and memory networks. Science 334(6062):1569– 73. [17] Song S, Miller KD, Abbott LF (2000) Competitive Hebbian learning through spike-timingdependent synaptic plasticity. Nat Neurosci 3(9): 919–926. 9

5 0.12418552 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

Author: Shaul Druckmann, Tao Hu, Dmitri B. Chklovskii

Abstract: Early stages of sensory systems face the challenge of compressing information from numerous receptors onto a much smaller number of projection neurons, a so called communication bottleneck. To make more efficient use of limited bandwidth, compression may be achieved using predictive coding, whereby predictable, or redundant, components of the stimulus are removed. In the case of the retina, Srinivasan et al. (1982) suggested that feedforward inhibitory connections subtracting a linear prediction generated from nearby receptors implement such compression, resulting in biphasic center-surround receptive fields. However, feedback inhibitory circuits are common in early sensory circuits and furthermore their dynamics may be nonlinear. Can such circuits implement predictive coding as well? Here, solving the transient dynamics of nonlinear reciprocal feedback circuits through analogy to a signal-processing algorithm called linearized Bregman iteration we show that nonlinear predictive coding can be implemented in an inhibitory feedback circuit. In response to a step stimulus, interneuron activity in time constructs progressively less sparse but more accurate representations of the stimulus, a temporally evolving prediction. This analysis provides a powerful theoretical framework to interpret and understand the dynamics of early sensory processing in a variety of physiological experiments and yields novel predictions regarding the relation between activity and stimulus statistics.

6 0.11518808 94 nips-2012-Delay Compensation with Dynamical Synapses

7 0.10802241 195 nips-2012-Learning visual motion in recurrent neural networks

8 0.10439776 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

9 0.099927254 187 nips-2012-Learning curves for multi-task Gaussian process regression

10 0.09841691 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

11 0.094657429 204 nips-2012-MAP Inference in Chains using Column Generation

12 0.091174357 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

13 0.080836378 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

14 0.077201173 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

15 0.072730012 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

16 0.070476837 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

17 0.069961354 227 nips-2012-Multiclass Learning with Simplex Coding

18 0.068396844 322 nips-2012-Spiking and saturating dendrites differentially expand single neuron computation capacity

19 0.066192836 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

20 0.065568581 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.17), (1, 0.025), (2, 0.006), (3, 0.134), (4, -0.019), (5, 0.212), (6, 0.041), (7, 0.081), (8, -0.038), (9, -0.011), (10, -0.014), (11, -0.063), (12, 0.063), (13, -0.04), (14, 0.028), (15, 0.031), (16, 0.014), (17, -0.047), (18, -0.066), (19, 0.06), (20, 0.028), (21, 0.025), (22, 0.078), (23, 0.129), (24, -0.084), (25, 0.053), (26, -0.003), (27, 0.067), (28, 0.107), (29, 0.036), (30, 0.049), (31, 0.094), (32, -0.06), (33, 0.024), (34, 0.135), (35, -0.001), (36, 0.052), (37, -0.012), (38, 0.117), (39, 0.035), (40, 0.038), (41, 0.157), (42, 0.06), (43, 0.029), (44, -0.048), (45, -0.054), (46, -0.065), (47, -0.043), (48, 0.028), (49, -0.066)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95689499 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. 1

2 0.80657613 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

Author: Xue-xin Wei, Alan Stocker

Abstract: A common challenge for Bayesian models of perception is the fact that the two fundamental Bayesian components, the prior distribution and the likelihood function, are formally unconstrained. Here we argue that a neural system that emulates Bayesian inference is naturally constrained by the way it represents sensory information in populations of neurons. More specifically, we show that an efficient coding principle creates a direct link between prior and likelihood based on the underlying stimulus distribution. The resulting Bayesian estimates can show biases away from the peaks of the prior distribution, a behavior seemingly at odds with the traditional view of Bayesian estimation, yet one that has been reported in human perception. We demonstrate that our framework correctly accounts for the repulsive biases previously reported for the perception of visual orientation, and show that the predicted tuning characteristics of the model neurons match the reported orientation tuning properties of neurons in primary visual cortex. Our results suggest that efficient coding is a promising hypothesis in constraining Bayesian models of perceptual inference. 1 Motivation Human perception is not perfect. Biases have been observed in a large number of perceptual tasks and modalities, of which the most salient ones constitute many well-known perceptual illusions. It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer’s attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1]. We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions based on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g motion tuned neurons in visual cor∗ http://www.sas.upenn.edu/ astocker/lab 1 world neural representation efficient encoding percept Bayesian decoding Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1 , r2 , ..., rN } in a population of N neurons. The perceptual task is to generate a ˆ good estimate θ(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function. tex [18]). However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have been often selected on the basis of mathematical convenience. Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked. 2 Encoding-decoding framework We consider perception as an inference process that takes place along the simplified neural encodingdecoding cascade illustrated in Fig. 11 . 2.1 Efficient encoding Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of “optimally” are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher Information must be proportional to the prior distribution [12, 21]. In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F (θ) that transforms the ˜ physical space with units θ to a homogeneous space with units θ = F (θ) in which the stimulus distribution becomes uniform. This defines the mapping as θ F (θ) = p(χ)dχ , (1) −∞ which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. ˜ The tuning curves in the physical space are obtained by applying the inverse mapping F −1 (θ). Fig. 2 1 In the context of this paper, we consider ‘inferring’, ‘decoding’, and ‘estimating’ as synonymous. 2 stimulus distribution d samples # a Fisher information discriminability and average firing rates and b firing rate [ Hz] efficient encoding F likelihood function F -1 likelihood c symmetric asymmetric homogeneous space physical space Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics. b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F −1 that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). c) As a result, the likelihood shape is constrained by the prior distribution showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space. illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density. 2.2 Bayesian decoding Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1 , r2 , ..., rN ] where ri is the spike-count of the ith neuron over a given time-window τ . Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as N p(R|θ) = (τ fi (θ))ri −τ fi (θ) e , ri ! i=1 (2) ˆ with fi (θ) describing the tuning curve of neuron i. We then define a Bayesian decoder θLSE as the estimator that minimizes the expected squared-error between the estimate and the true stimulus value, thus θp(R|θ)p(θ)dθ ˆ θLSE (R) = , (3) p(R|θ)p(θ)dθ where we use Bayes’ rule to appropriately combine the sensory evidence with the stimulus prior p(θ). 3 Bayesian estimates can be biased away from prior peaks Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the 3 a b prior attraction prior prior attraction likelihood repulsion! likelihood c prior prior repulsive bias likelihood likelihood mean posterior mean posterior mean Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0 ). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p(θ0 ) < 0 and the likelihood is relatively narrow, then (1/p(θ)2 ) > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)). common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e. leading to biases toward the peak of the prior (Fig. 3a). In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes’ least-squares estimate away from the peak of the prior while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counter-acting forces - and both are determined by the prior! 3.1 General derivation of the estimation bias In the following, we will formally derive the mean estimation bias b(θ) of the proposed encodingdecoding framework. Specifically, we will study the conditions for which the bias is repulsive i.e. away from the peak of the prior density. ˆ We first re-write the estimator θLSE (3) by replacing θ with the inverse of its mapping to the homo−1 ˜ geneous space, i.e., θ = F (θ). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2). Given a value θ0 and the elicited population response R, we can write the estimator as ˜ ˜ ˜ ˜ θp(R|θ)p(θ)dθ F −1 (θ)p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) ˆ θLSE (R) = = . ˜ ˜ ˜ p(R|θ)p(θ)dθ p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get 1 1 1 ˜ ˜ ˜ ˜ ˜ ˜ dθ = dθ. dF −1 (θ) = (F −1 (θ)) dθ = dθ = −1 (θ)) ˜ F (θ) p(θ) p(F ˆ Hence, we can simplify θLSE (R) as ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)p(R|F −1 (θ))dθ . ˜ ˜ p(R|F −1 (θ))dθ With ˜ K(R, θ) = ˜ p(R|F −1 (θ)) ˜ ˜ p(R|F −1 (θ))dθ 4 we can further simplify the notation and get ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)K(R, θ)dθ . (4) ˆ ˜ In order to get the expected value of the estimate, θLSE (θ), we marginalize (4) over the population response space S, ˆ ˜ ˜ ˜ ˜ θLSE (θ) = p(R)F −1 (θ)K(R, θ)dθdR S = F −1 ˜ (θ)( ˜ ˜ p(R)K(R, θ)dR)dθ = ˜ ˜ ˜ F −1 (θ)L(θ)dθ, S where we define ˜ L(θ) = ˜ p(R)K(R, θ)dR. S ˜ ˜ ˜ It follows that L(θ)dθ = 1. Due to the symmetry in this space, it can be shown that L(θ) is ˜0 . Intuitively, L(θ) can be thought as the normalized ˜ symmetric around the true stimulus value θ average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as b(θ0 ) = ˜ ˜ ˜ ˜ F −1 (θ)L(θ)dθ − F −1 (θ0 ) (5) ˜ This is expression is general where F −1 (θ) is defined as the inverse of the cumulative of an arbitrary ˜ prior density p(θ) (see Eq. (1)) and the dispersion of L(θ) is determined by the internal noise level. ˜ ˜ Assuming the prior density to be smooth, we expand F −1 in a neighborhood (θ0 − h, θ0 + h) that is larger than the support of the likelihood function. Using Taylor’s theorem with mean-value forms of the remainder, we get 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ F −1 (θ) = F −1 (θ0 ) + F −1 (θ0 ) (θ − θ0 ) + F −1 (θx ) (θ − θ0 )2 , 2 ˜ ˜ ˜ with θx lying between θ0 and θ. By applying this expression to (5), we find ˜ θ0 +h b(θ0 ) = = 1 2 ˜ θ0 −h 1 −1 ˜ ˜ ˜ ˜ ˜ 1 F (θx )θ (θ − θ0 )2 L(θ)dθ = ˜ 2 2 ˜ θ0 +h −( ˜ θ0 −h p(θx )θ ˜ ˜ 2 ˜ ˜ 1 )(θ − θ0 ) L(θ)dθ = p(θx )3 4 ˜ θ0 +h 1 ˜ − θ0 )2 L(θ)dθ ˜ ˜ ˜ ( ) ˜(θ ˜ p(F −1 (θx )) θ ( 1 ˜ ˜ ˜ ˜ ) (θ − θ0 )2 L(θ)dθ. p(θx )2 θ ˜ θ0 −h ˜ θ0 +h ˜ θ0 −h In general, there is no simple rule to judge the sign of b(θ0 ). However, if the prior is monotonic ˜ ˜ on the interval F −1 ((θ0 − h, θ0 + h)), then the sign of ( p(θ1 )2 ) is always the same as the sign of x 1 1 ( p(θ0 )2 ) . Also, if the likelihood is sufficiently narrow we can approximate ( p(θ1 )2 ) by ( p(θ0 )2 ) , x and therefore approximate the bias as b(θ0 ) ≈ C( 1 ) , p(θ0 )2 (6) where C is a positive constant. The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior! 3.2 Internal (neural) versus external (stimulus) noise The above derivation of estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrastcontrolled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly 5 larger biases up to the point where the likelihood becomes wide enough such that monotonicity of the prior over the support of the likelihood is potentially violated. Stimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion stimuli). Such external noise, however, has a different effect on the shape of the likelihood function as compared to internal noise. It modifies the likelihood function (2) by convolving it with the noise kernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It is straightforward to prove that such symmetric external noise does not lead to a change in the mean of the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by increasing the overall width of the likelihood, the attractive influence of the prior increases, resulting in an estimate that is closer to the prior peak than without external noise2 . 4 Perception of visual orientation We tested our framework by modelling the perception of visual orientation. Our choice was based on the fact that i) we have pretty good estimates of the prior distribution of local orientations in natural images, ii) tuning characteristics of orientation selective neurons in visual cortex are wellstudied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature. 4.1 Efficient neural model population for visual orientation Previous studies measured the statistics of the local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system’s prior belief over orientation p(θ) follows this distribution and approximate it formally as p(θ) ∝ 2 − | sin(θ)| (black line in Fig. 4b) . (7) Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von-Mises distribution in the homogeneous space on top of a constant spontaneous firing rate (5 Hz). We then ˜ applied the inverse transformation F −1 (θ) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b - red curves), where F (θ) is the cumulative of the prior (7). The concentration parameter for the von-Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼ 32 deg) of neurons in area V1 of the macaque [9]. 4.2 Predicted tuning characteristics of neurons in primary visual cortex The orientation tuning characteristics of our model population well match neurophysiological data of neurons in primary visual cortex (V1). Efficient encoding predicts that the distribution of neurons’ preferred orientation follows the prior, with more neurons tuned to cardinal than oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25-42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9]. An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. It also predicts that neurons tuned to the cardinal and oblique orientations should show less symmetry than those tuned to orientations in between. Finally, 2 Note, that these predictions are likely to change if the external noise is not symmetric. 6 a b 25 firing rate(Hz) 0 orientation(deg) asymmetry vs. tuning width 1.0 2.0 90 2.0 e asymmetry 1.0 0 asymmetry index 50 30 width (deg) 10 90 preferred tuning(deg) -90 0 d 0 0 90 asymmetry index 0 orientation(deg) tuning width -90 0 0 probability 0 -90 c efficient representation 0.01 0.01 image statistics -90 0 90 preferred tuning(deg) 25 30 35 40 tuning width (deg) Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16]. b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons neurons show less asymmetry than neurons with tuning widths in between. neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet. 4.3 Predicted perceptual biases Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished: First, perceived orientations show an absolute bias away from the cardinal orientations, thus away from the peaks of the orientation prior [2, 3]. We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matched the orientation of the test stimulus. Interestingly, these repulsive absolute biases are the larger the smaller the external stimulus noise is (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations as shown in Fig. 5c, and thus toward the peak of the prior distribution [3, 16]. The predicted perceptual biases of our model are shown Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels). The predictions well match both, the reported absolute bias away as well as the relative biases toward the cardinal orientations. Note, that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also discussion in section 3.2). 5 Discussion We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with 7 absolute bias (data) b c relative bias (data) -4 0 bias(deg) 4 a low-noise stimulus -90 e 90 absolute bias (model) low external noise high external noise 3 high-noise stimulus -90 f 0 90 relative bias (model) 0 bias(deg) d 0 attraction -3 repulsion -90 0 orientation (deg) 90 -90 0 orientation (deg) 90 Figure 5: Biases in perceived orientation: Human data vs. Model prediction. a,d) Low- and highnoise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise level (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias. heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support of our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11]. We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear “anti-Bayesian” at first sight (see also models of sensory adaptation [23]). Note, that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes’ leastsquares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize total squared-error, and that other choices of the loss function (e.g. MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5]. Acknowledgments We thank M. Jogan and A. Tank for helpful comments on the manuscript. This work was partially supported by grant ONR N000141110744. 8 References [1] M. Jones, and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34, 169–231,2011. [2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218- 1220, 1965. [3] A. Tomassini, M. J.Morgam. and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50, 541–547, 2010. [4] W. S. Geisler, D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508- 510, 2002. [5] M. O. Ernst Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23- R25, 2009. [6] G. H. Henry, B. Dreher, P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394-409,1974. [7] D. Rose, C. Blakemore An analysis of orientation selectivity in the cat’s visual cortex. Exp Brain Res., Apr 30;20(1):1-17, 1974. [8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern., 78(1):45-56, 1998. [9] R. L. De Valois, E. W. Yund, N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res.,22, 531544,1982. [10] B. Li, M. R. Peterson, R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J. Neurophysiol., 90, 204217, 2003. [11] D. Ganguli and E.P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Adv. Neural Information Processing Systems NIPS 23, vol. 23:658–666, 2011. [12] M. D. McDonnell, N. G. Stocks. Maximally Informative Stimuli and Tuning Curves for Sigmoidal RateCoding Neurons and Populations. Phys Rev Lett., 101(5):058103, 2008. [13] H Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867. [14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percept. Nature Neuroscience, 5(6):598–604, June 2002. [15] D.C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996. [16] A R Girshick, M S Landy, and E P Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, Jul 2011. [17] M. Jazayeri and M.N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010. [18] A.A. Stocker and E.P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, pages 578–585, April 2006. [19] H.B. Barlow. Possible principles underlying the transformation of sensory messages. In W.A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. [20] D.M. Coppola, H.R. Purves, A.N. McCoy, and D. Purves The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A., 95(7): 4002–4006, 1998. [21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10, 7, 1731–1757, 1998. [22] X-X. Wei and A.A. Stocker. Bayesian inference with efficient neural population codes. In Lecture Notes in Computer Science, Artificial Neural Networks and Machine Learning - ICANN 2012, Lausanne, Switzerland, volume 7552, pages 523–530, 2012. [23] A.A. Stocker and E.P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Sch¨ lkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages o 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation. [24] D.C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007. [25] Deep Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012. [26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010. 9

3 0.72058237 94 nips-2012-Delay Compensation with Dynamical Synapses

Author: Chi Fung, K. Wong, Si Wu

Abstract: Time delay is pervasive in neural information processing. To achieve real-time tracking, it is critical to compensate the transmission and processing delays in a neural system. In the present study we show that dynamical synapses with shortterm depression can enhance the mobility of a continuous attractor network to the extent that the system tracks time-varying stimuli in a timely manner. The state of the network can either track the instantaneous position of a moving stimulus perfectly (with zero-lag) or lead it with an effectively constant time, in agreement with experiments on the head-direction systems in rodents. The parameter regions for delayed, perfect and anticipative tracking correspond to network states that are static, ready-to-move and spontaneously moving, respectively, demonstrating the strong correlation between tracking performance and the intrinsic dynamics of the network. We also find that when the speed of the stimulus coincides with the natural speed of the network state, the delay becomes effectively independent of the stimulus amplitude.

4 0.67032069 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

Author: Shaul Druckmann, Tao Hu, Dmitri B. Chklovskii

Abstract: Early stages of sensory systems face the challenge of compressing information from numerous receptors onto a much smaller number of projection neurons, a so called communication bottleneck. To make more efficient use of limited bandwidth, compression may be achieved using predictive coding, whereby predictable, or redundant, components of the stimulus are removed. In the case of the retina, Srinivasan et al. (1982) suggested that feedforward inhibitory connections subtracting a linear prediction generated from nearby receptors implement such compression, resulting in biphasic center-surround receptive fields. However, feedback inhibitory circuits are common in early sensory circuits and furthermore their dynamics may be nonlinear. Can such circuits implement predictive coding as well? Here, solving the transient dynamics of nonlinear reciprocal feedback circuits through analogy to a signal-processing algorithm called linearized Bregman iteration we show that nonlinear predictive coding can be implemented in an inhibitory feedback circuit. In response to a step stimulus, interneuron activity in time constructs progressively less sparse but more accurate representations of the stimulus, a temporally evolving prediction. This analysis provides a powerful theoretical framework to interpret and understand the dynamics of early sensory processing in a variety of physiological experiments and yields novel predictions regarding the relation between activity and stimulus statistics.

5 0.54382652 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

Author: Mijung Park, Jonathan W. Pillow

Abstract: Active learning methods can dramatically improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron’s receptive field (RF). Bayesian active learning methods specify a posterior distribution over the RF given the data collected so far in the experiment, and select a stimulus on each time step that maximally reduces posterior uncertainty. However, existing methods tend to employ simple Gaussian priors over the RF and do not exploit uncertainty at the level of hyperparameters. Incorporating this uncertainty can substantially speed up active learning, particularly when RFs are smooth, sparse, or local in space and time. Here we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors. Our algorithm uses sequential Markov Chain Monte Carlo sampling (“particle filtering” with MCMC) to construct a mixture-of-Gaussians representation of the RF posterior, and selects optimal stimuli using an approximate infomax criterion. The core elements of this algorithm are parallelizable, making it computationally efficient for real-time experiments. We apply our algorithm to simulated and real neural data, and show that it can provide highly accurate receptive field estimates from very limited data, even with a small number of hyperparameter samples. 1

6 0.51369995 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

7 0.50710028 195 nips-2012-Learning visual motion in recurrent neural networks

8 0.47850108 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

9 0.45194638 256 nips-2012-On the connections between saliency and tracking

10 0.44821903 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

11 0.44760162 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

12 0.43400624 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

13 0.41278467 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

14 0.40979207 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

15 0.40712932 2 nips-2012-3D Social Saliency from Head-mounted Cameras

16 0.40152654 190 nips-2012-Learning optimal spike-based representations

17 0.39896354 161 nips-2012-Interpreting prediction markets: a stochastic approach

18 0.39412585 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

19 0.38816151 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

20 0.38771665 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.033), (38, 0.638), (42, 0.021), (55, 0.014), (74, 0.014), (76, 0.075), (80, 0.068), (92, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98927569 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

Author: Zhuo Wang, Alan Stocker, Daniel Lee

Abstract: In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. 1

2 0.96911967 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

Author: Xinhua Zhang, Dale Schuurmans, Yao-liang Yu

Abstract: Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees accuracy within O(1/ ) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization—exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak-oracle. 1

3 0.96853429 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

Author: Ofer Meshi, Amir Globerson, Tommi S. Jaakkola

Abstract: Finding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP, and perform coordinate descent on the smoothed dual. However, little is known about the convergence rate of this procedure. Here we perform a thorough rate analysis of such schemes and derive primal and dual convergence rates. We also provide a simple dual to primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. Empirical evaluation supports our theoretical claims and shows that the method is highly competitive with state of the art approaches that yield global optima. 1

4 0.96039331 165 nips-2012-Iterative ranking from pair-wise comparisons

Author: Sahand Negahban, Sewoong Oh, Devavrat Shah

Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1

5 0.93094319 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification. 1

6 0.930637 181 nips-2012-Learning Multiple Tasks using Shared Hypotheses

7 0.92755836 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA

8 0.91366529 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

9 0.91040713 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

10 0.9031502 208 nips-2012-Matrix reconstruction with the local max norm

11 0.85698068 285 nips-2012-Query Complexity of Derivative-Free Optimization

12 0.85233313 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

13 0.8461777 86 nips-2012-Convex Multi-view Subspace Learning

14 0.84172332 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

15 0.8337211 30 nips-2012-Accuracy at the Top

16 0.8311587 324 nips-2012-Stochastic Gradient Descent with Only One Projection

17 0.82884908 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

18 0.82530749 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

19 0.82260245 187 nips-2012-Learning curves for multi-task Gaussian process regression

20 0.82254994 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference