NIPS 2004, Paper 148
Author: Richard S. Zemel, Rama Natarajan, Peter Dayan, Quentin J. Huys
Abstract: As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model, and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation.

Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues. These include a wide range of psychophysical studies of perception [13], motor action [7], and decision-making [3, 5]. Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences.

Most previous work on representing probabilistic inference in neural populations has focused on the representation of static information [1, 12, 15]. These models encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases in which stimuli change over the same timescale as spiking itself [11]. Notable exceptions are the recent, seminal, but, as we argue, representationally restricted models proposed by Gold and Shadlen [5], Rao [10], and Deneve [4].

In this paper, we first show how probabilistic information varying over time can be represented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.

1 TRAJECTORY ENCODING AND DECODING

We assume that population spikes $R(t)$ arise stochastically in relation to the trajectory $X(t)$ of an underlying (but hidden) variable. We use $X_T$ and $R_T$ for the whole trajectory and spike trains respectively, from time 0 to $T$. The spikes $R_T$ constitute the observations and are assumed to be probabilistically related to the signal by a tuning function $f(X, \theta_i)$:

$$P(R(i,T) \mid X(T)) \propto f(X, \theta_i) \qquad (1)$$

for the spike train of the $i$th neuron, with parameters $\theta_i$. Therefore, via standard Bayesian inference, $R_T$ determines a distribution over the hidden variable at time $T$, $P(X(T) \mid R_T)$.

We first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let $X(t)$ follow a stationary Gaussian process such that the joint distribution $P(X(t_1), X(t_2), \ldots, X(t_m))$ is Gaussian for any finite collection of times, with a covariance matrix which depends only on time differences: $C_{tt'} = c(|t - t'|)$. The function $c(|\Delta t|)$ controls the smoothness of the resulting random walks.
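To make the encoding concrete, here is a minimal Python sketch under the stated assumptions (an exponential covariance for $c$, Gaussian tuning, conditionally independent Poisson spiking). All variable names and numerical values, such as the peak rate g and the time constant tau, are illustrative choices rather than values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Time grid fine enough that each neuron emits at most about one spike
# per bin, so Poisson spiking is well approximated by Bernoulli draws.
dt, T = 1e-3, 0.1
t = np.arange(0.0, T, dt)

# Sample a trajectory X(t) from the stationary Gaussian process prior:
# any finite collection of times is jointly Gaussian with covariance
# C_{tt'} = c(|t - t'|); here c is exponential with time constant tau.
tau = 0.02
C = np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
L = np.linalg.cholesky(C + 1e-9 * np.eye(len(t)))  # jitter for stability
X = L @ rng.standard_normal(len(t))

# Gaussian tuning functions f(X, theta_i) (Equation 1), with preferred
# values theta_i tiling the space densely; g is an assumed peak rate.
theta = np.linspace(-3.0, 3.0, 60)
sigma, g = 0.5, 50.0
rates = g * np.exp(-(X[:, None] - theta[None, :]) ** 2 / (2 * sigma ** 2))

# Conditionally independent inhomogeneous Poisson spike trains R_T:
# the spike probability per bin is approximately f(X(t), theta_i) * dt.
spikes = rng.random(rates.shape) < rates * dt
spike_t, spike_i = np.nonzero(spikes)  # time-bin and neuron index per spike
```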
The posterior at time $T$ is then

$$P(X(T) \mid R_T) \propto p(X(T)) \int_{\mathcal{X}(T)} d\mathcal{X}(T)\; P(R_T \mid \mathcal{X}(T))\, P(\mathcal{X}(T) \mid X(T)) \qquad (2)$$

where $P(\mathcal{X}(T) \mid X(T))$ is the distribution over the whole trajectory $\mathcal{X}(T)$ conditional on the value of $X(T)$ at its end point. If $R_T$ are a set of conditionally independent inhomogeneous Poisson processes, we have

$$P(R_T \mid \mathcal{X}(T)) \propto \prod_{i\tau} f(X(t_{i\tau}), \theta_i)\; \exp\!\left(-\sum_i \int d\tau\, f(X(\tau), \theta_i)\right), \qquad (3)$$

where $t_{i\tau}$, indexed by $\tau$, are the spike times of neuron $i$ in $R_T$. Let $\chi = [X(t_{i\tau})]$ be the vector of stimulus positions at the times at which we observed a spike and $\Theta = [\theta(t_{i\tau})]$ be the vector of spike positions. If the tuning functions are Gaussian, $f(X, \theta_i) \propto \exp(-(X - \theta_i)^2 / 2\sigma^2)$, and sufficiently dense that $\sum_i \int d\tau\, f(X, \theta_i)$ is independent of $X$ (a standard assumption in population coding), then $P(R_T \mid \mathcal{X}(T)) \propto \exp(-\|\chi - \Theta\|^2 / 2\sigma^2)$, and in Equation 2 we can marginalize out $\mathcal{X}(T)$ except at the spike times $t_{i\tau}$:

$$P(X(T) \mid R_T) \propto p(X(T)) \int d\chi\; \exp\!\left(-\tfrac{1}{2}[\chi, X(T)]^{\top} C^{-1} [\chi, X(T)] - \frac{\|\chi - \Theta\|^2}{2\sigma^2}\right) \qquad (4)$$

where $C$ is the block covariance matrix between $X(t_{i\tau})$ and $X(T)$ at the spike times $[t_{i\tau}]$ and the final time $T$. Evaluating this Gaussian integral gives $P(X(T) \mid R_T) \sim \mathcal{N}(\mu(T), \nu(T))$, with

$$\mu(T) = C_{Tt}(C_{tt} + I\sigma^2)^{-1}\Theta = k\Theta \qquad \nu(T) = C_{TT} - kC_{tT} \qquad (5)$$

Here $C_{TT}$ is the $(T,T)$th element of the covariance matrix and $C_{Tt}$ is similarly a row vector. The dependence of $\mu$ on past spike times is specified chiefly by the inverse covariance matrix, and acts as an effective kernel ($k$). This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train $R_T$. For example, consider the case where $X(t)$ evolves according to a diffusion process with drift:

$$dX = -\alpha X\, dt + \sigma_\epsilon\, dN(t) \qquad (6)$$

where $\alpha$ prevents it from wandering too far, and $N(t)$ is white Gaussian noise with mean zero and variance $\sigma_\epsilon^2$. Figure 1A shows sample kernels for this process.

Inspection of Figure 1A reveals some important traits. First, the kernel magnitude decreases monotonically as the time span between the spike and the current time $T$ grows, matching the intuition that recent spikes play a more significant role in determining the posterior over $X(T)$. Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel $k$ can be locally well approximated by a metronomic kernel $\tilde{k}$ (shown in red in Figure 1A) that assumes regular spiking. This takes advantage of the general fact, indicated by the grouping of kernels, that the kernel depends weakly on the actual spike pattern, but strongly on the average rate. The merits of the metronomic kernel are that it is stationary and depends only on a single mean rate rather than on the full spike train $R_T$.
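Whether the exact or the metronomic kernel is used, Equation 5 is ordinary Gaussian-process regression, so decoding reduces to a few lines of linear algebra. Continuing the sketch above (the exponential covariance and all parameter values remain illustrative assumptions), the exact kernel $k$ and the posterior mean and variance at the final time $T$ can be computed as:

```python
# Exact decoding at the final time T via Equation 5, reusing the
# variables defined above (t, theta, sigma, tau, spike_t, spike_i).
def c(dt_):
    """Assumed exponential covariance function c(|dt|)."""
    return np.exp(-np.abs(dt_) / tau)

spike_times = t[spike_t]   # spike times t_{i tau}
Theta = theta[spike_i]     # preferred values of the neurons that spiked
T_now = t[-1]              # decode the posterior at time T

# Block covariances among the spike times (C_tt), between T and the
# spike times (C_Tt), and the prior variance at T (C_TT).
C_tt = c(spike_times[:, None] - spike_times[None, :])
C_Tt = c(T_now - spike_times)
C_TT = c(0.0)

# Effective kernel k = C_Tt (C_tt + I sigma^2)^{-1}; the posterior
# P(X(T) | R_T) is Gaussian with mean k Theta and variance C_TT - k C_tT.
k = np.linalg.solve(C_tt + sigma ** 2 * np.eye(len(Theta)), C_Tt)
mu, nu = k @ Theta, C_TT - k @ C_Tt
```

Because $c$ decays with $|\Delta t|$, the entries of $k$ corresponding to recent spikes are largest, consistent with the monotonically decreasing kernels of Figure 1A.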
[Figure 1 here. Panel A: "Kernels $k$ and $\tilde k$", kernel size (weight) vs. $t - t_{\mathrm{spike}}$; B: variance ratio $\nu^2/\sigma^2$; C: "True stimulus and means"; D, E: space vs. time.]

Figure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and $P(X(T) \mid R_T)$ in gray. Blue: exact inference with the nonstationary kernel; red: approximate inference assuming regular spiking. A: Kernel samples for a diffusion process as defined by Equations 5 and 6. B, C: Mean and variance of the inference. D: Exact inference with the full kernel $k$; E: approximation based on the metronomic kernel $\tilde k$.
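The metronomic approximation compared in panels D and E admits the same sketch: replacing the observed spike times with a regular grid at the matching mean rate yields a stationary kernel $\tilde k$ that depends only on that rate. The construction below is an illustrative reading of the scheme, not the authors' exact procedure:

```python
# Metronomic approximation (cf. Figure 1D vs. 1E): replace the actual
# spike times by a regular grid with the same mean rate, so the kernel
# depends only on that rate, not on the particular spike train R_T.
n = len(spike_times)
mean_isi = (spike_times[-1] - spike_times[0]) / max(n - 1, 1)
reg_times = T_now - mean_isi * np.arange(n)[::-1]  # regular grid ending at T

C_tt_reg = c(reg_times[:, None] - reg_times[None, :])
C_Tt_reg = c(T_now - reg_times)
k_reg = np.linalg.solve(C_tt_reg + sigma ** 2 * np.eye(n), C_Tt_reg)

# k_reg plays the role of the stationary kernel (red in Figure 1A);
# applying it to the observed Theta approximates the exact mean mu.
mu_reg = k_reg @ Theta
```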