nips nips2004 nips2004-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anthony J. Bell, Lucas C. Parra
Abstract: We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A) in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output ‘firing rates’ to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.

1 Maximising Sensitivity

In this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. We cannot claim, as before, that this gives us an information maximisation algorithm, for reasons that we will delay addressing until Section 3. But for now, to first develop our approach, we will explore an interim objective function called sensitivity, which we define as the log Jacobian of how input spike timings affect output spike timings.

1.1 How to maximise the effect of one spike timing on another
Consider a spike in neuron $j$ at time $t_l$ that has an effect on the timing of another spike in neuron $i$ at time $t_k$. The neurons are connected by a weight $w_{ij}$. We use $i$ and $j$ to index neurons, and $k$ and $l$ to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. For example, $w_{kl}$, the weight between an input spike $l$ and an output spike $k$, is naturally understood to be just the corresponding $w_{ij}$.

[Figure 1 diagram: membrane potential $u(t)$ rising from the resting potential to the threshold potential, with response $R(t)$, input spikes $t_l$, output spikes $t_k$, and the shifts $dt_l$, $du$ and $dt_k$.]
Figure 1: Firing time $t_k$ is determined by the time of threshold crossing. A change of an input spike time $dt_l$ affects, via a change of the membrane potential $du$, the time of the output spike by $dt_k$.

In the simplest version of the Spike Response Model [7], spike $l$ has an effect on spike $k$ that depends on the time-course of the evoked EPSP or IPSP, which we write as $R_{kl}(t_k - t_l)$. In general, this $R_{kl}$ models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. For learning, we need only consider the value of this function when an output spike, $k$, occurs.

In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, $u_i(t)$, reaches threshold at time $t_k$. This threshold we will often, again for convenience, write as $u_k \equiv u_i(t_k, \{t_l\})$, and it is given by a sum over spikes $l$:

$$u_k = \sum_l w_{kl} R_{kl}(t_k - t_l) . \tag{1}$$

To maximise timing sensitivity, we need to determine the effect of a small change in the input firing time $t_l$ on the output firing time $t_k$. (A related problem is tackled in [2].) When $t_l$ is changed by a small amount $dt_l$, the membrane potential will change as a result. This change in the membrane potential leads to a change in the time of threshold crossing, $dt_k$. The contribution to the membrane potential, $du$, due to $dt_l$ is $(\partial u_k / \partial t_l)\, dt_l$, and the change in $du$ corresponding to a change $dt_k$ is $(\partial u_k / \partial t_k)\, dt_k$.
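The effect of a small input shift $dt_l$ on the output time $t_k$ can be observed numerically. The following is a minimal sketch, not the simulation used in the paper: it assumes an alpha-function EPSP for $R$, a constant threshold of 1, and three hypothetical input spikes with made-up weights. It builds $u(t)$ as in Equation (1), locates the first threshold crossing, and estimates $dt_k/dt_l$ by shifting one input spike at a time.

```python
import numpy as np

def epsp(s, tau=5.0):
    """Alpha-function EPSP (an assumed shape for R): zero for s <= 0, peak 1 at s = tau."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, (s / tau) * np.exp(1.0 - s / tau), 0.0)

def potential(t, weights, in_times):
    """Membrane potential u(t) = sum_l w_l R(t - t_l), as in Equation (1)."""
    return float(np.sum(weights * epsp(t - in_times)))

def first_crossing(weights, in_times, theta=1.0, t_max=50.0, dt=0.01):
    """Time of the first upward threshold crossing, refined by bisection."""
    grid = np.arange(0.0, t_max, dt)
    u = np.array([potential(t, weights, in_times) for t in grid])
    above = np.nonzero(u >= theta)[0]
    if above.size == 0 or above[0] == 0:
        raise ValueError("no upward crossing found")
    lo, hi = grid[above[0] - 1], grid[above[0]]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if potential(mid, weights, in_times) >= theta:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def timing_sensitivity(weights, in_times, l, eps=1e-4):
    """Central finite-difference estimate of dt_k/dt_l for input spike l."""
    plus, minus = in_times.copy(), in_times.copy()
    plus[l] += eps
    minus[l] -= eps
    return (first_crossing(weights, plus) - first_crossing(weights, minus)) / (2 * eps)

# Hypothetical example values (not taken from the paper).
weights = np.array([0.6, 0.5, 0.4])
in_times = np.array([1.0, 2.0, 3.0])

t_k = first_crossing(weights, in_times)
sens = [timing_sensitivity(weights, in_times, l) for l in range(len(in_times))]
print("output spike time:", t_k)
print("dt_k/dt_l for each input spike:", sens)
```

In this simplified setting (constant threshold, no refractory term) the three sensitivities should add up to one, because shifting every input spike by the same amount shifts the output spike by that amount.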
We can relate these two effects by noting that the total change of the membrane potential $du$ has to vanish because $u_k$ is defined as the potential at threshold, ie:

$$du = \frac{\partial u_k}{\partial t_k} dt_k + \frac{\partial u_k}{\partial t_l} dt_l = 0 . \tag{2}$$

This is the total differential of the function $u_k = u(t_k, \{t_l\})$, and is a special case of the implicit function theorem. Rearranging this:

$$\frac{dt_k}{dt_l} = -\frac{\partial u_k}{\partial t_l} \Big/ \frac{\partial u_k}{\partial t_k} = -w_{kl} \dot{R}_{kl} / \dot{u}_k . \tag{3}$$

Now, to connect with the standard ICA derivation [4], recall the ‘rate’ (or sigmoidal) neuron, for which $y_i = g_i(u_i)$ and $u_i = \sum_j w_{ij} x_j$. For this neuron, the output dependence on input is $\partial y_i / \partial x_j = w_{ij} g_i'$, while the learning gradient is:

$$\frac{\partial}{\partial w_{ij}} \log \left| \frac{\partial y_i}{\partial x_j} \right| = \frac{1}{w_{ij}} - f_i(u_i)\, x_j \tag{4}$$

where the ‘score functions’, $f_i$, are defined in terms of a density estimate on the summed inputs: $f_i(u_i) = -\frac{\partial}{\partial u_i} \log g_i' = -\frac{\partial}{\partial u_i} \log \hat{p}(u_i)$.

The analogous learning gradient for the spiking case, from (3), is:

$$\frac{\partial}{\partial w_{ij}} \log \left| \frac{dt_k}{dt_l} \right| = \frac{1}{w_{ij}} - \frac{\sum_a j(a) \dot{R}_{ka}}{\dot{u}_k} \tag{5}$$

where $j(a) = 1$ if spike $a$ came from neuron $j$, and 0 otherwise.

Comparing the two cases in (4) and (5), we see that the input variable $x_j$ has become the temporal derivative of the sum of the EPSPs coming from synapse $j$, and the output variable (or score function) $f_i(u_i)$ has become $\dot{u}_k^{-1}$, the inverse of the temporal derivative of the membrane potential at threshold. It is intriguing (A) to see this quantity appear as analogous to the score function in the ICA likelihood model, and (B) to speculate that experiments could show that this ‘voltage slope at threshold’ is a hidden factor in STDP data, explaining some of the scatter in Figure 3A. In other words, an STDP datapoint should lie on a 2-surface in a 3D space of $\{\Delta w, \Delta t, \dot{u}_k\}$. Incidentally, $\dot{u}_k$ shows up in any learning rule optimising an objective function involving output spike timings.

1.2 How to maximise the effect of N spike timings on N other ones

Now we deal with the case of a ‘square’ single-layer feedforward mapping between spike timings.
There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. This is captured in a Jacobian matrix of all timing dependencies we call $T$. The entries of this matrix are $T_{kl} \equiv \partial t_k / \partial t_l$. A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, ie: $\log |T|$. The full derivation for the gradient $\nabla_W \log |T|$ is in the Appendix. Here, we again draw out the analogy between Square ICA [4] and this gradient, as follows. Square ICA with a network $y = g(Wx)$ is:

$$\Delta W \propto \nabla_W \log |J| = W^{-1} - f(u)\, x^T \tag{6}$$

where the Jacobian $J$ has entries $\partial y_i / \partial x_j$ and the score functions are now $f_i(u) = -\frac{\partial}{\partial u_i} \log \hat{p}(u)$ for the general likelihood case, with $\hat{p}(u) = \prod_i g_i'$ being the special case of ICA. We will now split the gradient in (6) according to the chain rule:

$$\nabla_W \log |J| = [\nabla_J \log |J|] \otimes [\nabla_W J] \tag{7}$$

$$= J^{-T} \otimes \left[ J_{kl}\, i(k) \left( \frac{j(l)}{w_{kl}} - f_k(u)\, x_j \right) \right] . \tag{8}$$

In this equation, $i(k) = \delta_{ik}$ and $j(l) = \delta_{jl}$. The righthand term is a 4-tensor with entries $\partial J_{kl} / \partial w_{ij}$, and $\otimes$ is defined as $[A \otimes B]_{ij} = \sum_{kl} A_{kl} B_{klij}$. We write the gradient this way to preserve, in the second term, the independent structure of the $1 \to 1$ gradient term in (4), and to separate a difficult derivation into two easy parts. The structure of (8) holds up when we move to the spiking case, giving:

$$\nabla_W \log |T| = [\nabla_T \log |T|] \otimes [\nabla_W T] \tag{9}$$

$$= T^{-T} \otimes \left[ T_{kl}\, i(k) \left( \frac{j(l)}{w_{kl}} - \frac{\sum_a j(a) \dot{R}_{ka}}{\dot{u}_k} \right) \right] \tag{10}$$

where $i(k)$ is now defined as being 1 if spike $k$ occurred in neuron $i$, and 0 otherwise. $j(l)$ and $j(a)$ are analogously defined. Because the $T$ matrix is much bigger than the $J$ matrix, and because its entries are more complex, here the similarity ends.
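The chain-rule split in Equations (7) and (8) can be checked numerically for a small rate network. The sketch below is an illustration under stated assumptions rather than code from the paper: it assumes a logistic nonlinearity $g$ (for which the score is $f = 2y - 1$), uses arbitrary random values for $W$ and $x$, and compares the contraction $J^{-T} \otimes [\nabla_W J]$ against a finite-difference gradient of $\log |J|$. All function names are hypothetical.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_abs_det_jacobian(W, x):
    """log|det J| for y = g(Wx), where J_kl = w_kl * g'(u_k)."""
    y = logistic(W @ x)
    J = W * (y * (1.0 - y))[:, None]
    return np.linalg.slogdet(J)[1]

def gradient_by_contraction(W, x):
    """Equations (7)-(8): contract J^{-T} with the 4-tensor dJ_kl/dw_ij."""
    n = W.shape[0]
    y = logistic(W @ x)
    g_prime = y * (1.0 - y)
    f = 2.0 * y - 1.0                 # score f = -d/du log g' for the logistic g
    J = W * g_prime[:, None]
    eye = np.eye(n)                   # eye[k, i] plays the role of i(k); eye[l, j] of j(l)
    # B[k, l, i, j] = J_kl * i(k) * ( j(l)/w_kl - f_k x_j ), using J_kl / w_kl = g'_k
    B = (np.einsum('k,ki,lj->klij', g_prime, eye, eye)
         - np.einsum('kl,ki,k,j->klij', J, eye, f, x))
    A = np.linalg.inv(J).T            # d log|J| / dJ_kl = [J^{-T}]_kl
    return np.einsum('kl,klij->ij', A, B)   # [A (x) B]_ij = sum_kl A_kl B_klij

def numerical_gradient(W, x, eps=1e-6):
    """Central finite differences of log|det J| with respect to each w_ij."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            grad[i, j] = (log_abs_det_jacobian(Wp, x)
                          - log_abs_det_jacobian(Wm, x)) / (2 * eps)
    return grad

# Arbitrary test values (assumptions for illustration only).
rng = np.random.default_rng(0)
W = 0.3 * rng.normal(size=(3, 3)) + 2.0 * np.eye(3)
x = rng.normal(size=3)

analytic = gradient_by_contraction(W, x)
numeric = numerical_gradient(W, x)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

Writing the contraction with `einsum` keeps the index pattern of the $\otimes$ definition visible, at the cost of materialising the full 4-tensor, which is only practical for small examples.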
When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields:

$$\Delta w_{kl} \propto \frac{\partial \log |T|}{\partial w_{kl}} = \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right) . \tag{11}$$

This is a non-local update involving a matrix inverse at each step. In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike timing case, this has turned out not to be possible, because of the additional asymmetry introduced into the $T$ matrix (as opposed to the $J$ matrix) by the $\dot{R}_{kl}$ term in (3).

2 Results

Nonetheless, this learning rule can be simulated. It requires running the network for a while to generate spikes (and a corresponding $T$ matrix), and then for each input/output spike coupling, the corresponding synapse is updated according to (11). When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. It is happy to do this, if a large $\log |T|$ can thereby be achieved, because we have not included this ‘which neuron’ information in the objective. We will address these and other problems in Section 3, but now we report on our simulation results on demultiplexing.

2.1 Demultiplexing spike trains

An interesting possibility in the brain is that ‘patterns’ are embedded in spatially distributed spike timings that are input to neurons. Several patterns could be embedded in single input trains. This is called multiplexing. To extract and propagate these patterns, the neurons must demultiplex these inputs using their threshold nonlinearity. Demultiplexing is the ‘point process’ analog of the unmixing of independent inputs in ICA. We have been able to robustly achieve demultiplexing, as we now report. We simulated a feed-forward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons.
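One reading of the update procedure built on Equation (11) can be sketched as follows. This is an illustrative assumption, not the authors' simulation code: `r_dot` holds arbitrary positive numbers standing in for the EPSP slopes $\dot{R}_{kl}$ at the output spike times, the timing matrix follows the sign convention of Equation (16) in the Appendix with a constant threshold, spike times are held fixed while weights are varied, and the per-coupling updates are simply summed onto the synapse $(i(k), j(l))$ that each coupling uses. The neuron labels and weights are hypothetical.

```python
import numpy as np

def coupling_weights(W, out_neuron, in_neuron):
    """w_kl = w_{i(k) j(l)}: the synapse used by output spike k and input spike l."""
    return W[out_neuron[:, None], in_neuron[None, :]]

def timing_matrix(w, r_dot):
    """T_kl = w_kl * Rdot_kl / udot_k, with udot_k = sum_c w_kc * Rdot_kc."""
    u_dot = np.sum(w * r_dot, axis=1, keepdims=True)
    return w * r_dot / u_dot

def coupling_updates(w, r_dot):
    """Equation (11) for every coupling: (T_kl / w_kl) * ([T^{-1}]_lk - 1)."""
    T = timing_matrix(w, r_dot)
    return (T / w) * (np.linalg.inv(T).T - 1.0)

def synapse_updates(W, out_neuron, in_neuron, r_dot):
    """Sum each coupling's update onto the synapse it belongs to."""
    w = coupling_weights(W, out_neuron, in_neuron)
    dw = coupling_updates(w, r_dot)
    dW = np.zeros_like(W)
    np.add.at(dW, (out_neuron[:, None], in_neuron[None, :]), dw)
    return dW

def log_abs_det_T(W, out_neuron, in_neuron, r_dot):
    w = coupling_weights(W, out_neuron, in_neuron)
    return np.linalg.slogdet(timing_matrix(w, r_dot))[1]

def numerical_gradient(W, out_neuron, in_neuron, r_dot, eps=1e-6):
    """Central finite differences of log|det T| with respect to each synapse."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            grad[i, j] = (log_abs_det_T(Wp, out_neuron, in_neuron, r_dot)
                          - log_abs_det_T(Wm, out_neuron, in_neuron, r_dot)) / (2 * eps)
    return grad

# Hypothetical setup: 2 output and 2 input neurons, 4 output and 4 input spikes.
rng = np.random.default_rng(1)
W = np.array([[1.0, 0.6], [0.4, 1.2]])
out_neuron = np.array([0, 1, 0, 1])      # i(k) for each output spike k
in_neuron = np.array([0, 0, 1, 1])       # j(l) for each input spike l
r_dot = rng.uniform(0.5, 1.5, size=(4, 4)) + 15.0 * np.eye(4)  # keeps T well conditioned

dW = synapse_updates(W, out_neuron, in_neuron, r_dot)
dW_numeric = numerical_gradient(W, out_neuron, in_neuron, r_dot)
print("max abs difference:", np.max(np.abs(dW - dW_numeric)))
```

Under these assumptions the summed coupling updates agree with a finite-difference gradient of $\log |T|$ taken with respect to the shared synaptic weights, which is the comparison the last lines perform.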
Learning followed (11), where we replace the inverse by the pseudo-inverse computed on the spikes generated during 0.5 s. The pseudo-inverse is necessary because, even though on average the learning matches the number of output spikes to the number of input spikes, the matrix $T$ is still not usually square and so its actual inverse cannot be taken. In addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. This partially counters the neglect of the ‘which neuron’ information, which we explained above. Assuming a Poisson spike count $n_i$ for the $i$th output neuron with equal firing rate $\bar{n}_i$, it is easy to derive an approximate term that will control the spike count, $\sum_i (\bar{n}_i - n_i)$. The target firing rates $\bar{n}_i$ were set to match the “source” spike train in this example. The network learns to demultiplex mixed spike trains, as shown in Figure 2. This demultiplexing is a robust property of learning using (11) with this new spike-controlling term.

[Figure 2 panels: mixed input trains (top left), output (center left) and original spike train (bottom left), each showing trains 1 to 3 against time in ms (0 to 500); mixing matrix (top right) and synaptic weights (bottom right).]
Figure 2: Unmixed spike trains. The input (top left) consists of 3 spike trains which are a mixture of three independent Poisson processes (bottom left). The network unmixes the spike trains to approximately recover the originals (center left). In this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. The two panels at the right show the mixing (top) and synaptic weight matrix after training (bottom).

Finally, what about the spike-timing dependence of the observed learning? Does it match experimental results? The comparison is made in Figure 3, and the answer is no. There is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. In addition, it does not transition at zero (ie: when $t_k - t_l = 0$), but at a time offset by the rise time of the EPSPs. In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian, to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. However, the approximate inverse Hessian was singular, and we had to de-emphasise this result. It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP.

3 Discussion

Although these derivations started out smoothly, the reader possibly shares the authors’ frustration at the approximations involved here. Why isn’t this simple, like ICA? Why don’t we just have a nice maximum spikelihood model, ie: a density estimation algorithm for multivariate point processes, as ICA was a model in continuous space? We are going to be explicit about the problems now, and will propose a direction where the solution may lie. The over-riding problem is: we are unable to claim that in maximising $\log |T|$, we are maximising the mutual information between inputs and outputs because:

1. The Invertibility Problem. Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. The Spike Response Model is deterministic, but it is not invertible in general. When not invertible, the key formula (considering here vectors of input and output timings, $t_{in}$ and $t_{out}$) is transformed from simple to complex, ie:

$$p(t_{out}) = \frac{p(t_{in})}{|T|} \quad \text{becomes} \quad p(t_{out}) = \int_{\text{solns } t_{in}} \frac{p(t_{in})}{|T|}\, dt_{in} \tag{12}$$

Thus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here ‘solns’), something we simply don’t know.

2. The ‘Which Neuron’ Problem.
Instead of maximising the mutual information $I(t_{out}, t_{in})$, we should be maximising $I(ti_{out}, ti_{in})$, where the vector $ti$ is the timing vector, $t$, with the vector, $i$, of corresponding neuron indices, concatenated. Thus, ‘who spiked?’ should be included in the analysis as it is part of the information.

3. The Predictive Information Problem. In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. The former has been entirely missing from our analysis so far.

[Figure 3 panels: (A) STDP, $\Delta w / w$ (%) against $\Delta t$ (ms); (B) Gradient, $\Delta w$ (a.u.) against $\Delta t$ (ms).]
Figure 3: Dependence of synaptic modification on pre/post inter-spike interval. Left (A): From Froemke & Dan, Nature (2002). Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice. Naturalistic spike trains. Each point represents one experiment. Right (B): According to Equation (11). Each point corresponds to a spike pair between approximately 100 input and 100 output spikes.

These temporal and spatial information dependencies, missing in our analysis so far, are thrown into a different light by a single empirical observation, which is that Spike Timing-Dependent Plasticity is not just a feedforward computation like the Spike Response Model. Specifically, there must be at least a statistical, if not a causal, relation between a real synapse’s plasticity and its neuron’s output spike timings, for Figure 3B to look like it does.
It seems we have to confront the need for both a ‘memory’ (or reconstruction) model, such as the $T$ we have thus far dealt with, in which output spikes talk about past inputs, and a ‘prediction’ model, in which they talk about future inputs. This is most easily understood from the point of view of Barber & Agakov’s variational Infomax algorithm [3]. They argue for optimising a lower bound on mutual information, which, for our neurons, would be expressed using an inverse model $\hat{p}$, as follows:

$$\hat{I}(ti_{in}, ti_{out}) = H(ti_{in}) - \left\langle \log \hat{p}(ti_{in} | ti_{out}) \right\rangle_{p(ti_{in}, ti_{out})} \le I(ti_{in}, ti_{out}) \tag{13}$$

In a feedforward model, $H(ti_{in})$ may be disregarded in taking gradients, leading us to the optimisation of a ‘memory-prediction’ model $\hat{p}(ti_{in} | ti_{out})$ related to something supposedly happening in dendrites, somas and at synapses. In trying to guess what this might be, it would be nice if the math worked out. We need a square Jacobian matrix, $T$, so that $|T| = \hat{p}(ti_{in} | ti_{out})$ can be our memory/prediction model. Now let’s rename our feedforward timing Jacobian $T$ (‘up the dendritic trees’) as $\overrightarrow{T}$, and let’s fantasise that there is some, as yet unspecified, feedback Jacobian $\overleftarrow{T}$ (‘down the dendritic trees’), which covers electrotonic influences as they spread from soma to synapse, and which $\overrightarrow{T}$ can be combined with by some operation ‘$\otimes$’ to make things square. Imagine further that doing this yields a memory/prediction model on the inputs. Then the $T$ we are looking for is $\overrightarrow{T} \otimes \overleftarrow{T}$, and the memory-prediction model is: $\hat{p}(ti_{in} | ti_{out}) = \left| \overrightarrow{T} \otimes \overleftarrow{T} \right|$.

Ideally, the entries of $\overrightarrow{T}$ should be as before, ie: $\overrightarrow{T}_{kl} = \partial t_k / \partial t_l$. What should the entries of $\overleftarrow{T}$ be? Becoming just one step more concrete, suppose $\overleftarrow{T}$ had entries $\overleftarrow{T}_{lk} = \partial c_l / \partial t_k$, where $c_l$ is some, as yet unspecified, value, or process, occurring at an input synapse when spike $l$ comes in.
What seems clear is that $\otimes$ should combine the correctly tensorised forms of $\overrightarrow{T}$ and $\overleftarrow{T}$ (giving them each 4 indices $ijkl$), so that $T = \overrightarrow{T} \otimes \overleftarrow{T}$ sums over the spikes $k$ and $l$ to give an $I \times J$ matrix, where $I$ is the number of output neurons, and $J$ the number of input neurons. Then our quantity, $T$, would represent all dependencies of input neuronal activity on output activity, summed over spikes.

Further, we imagine that $\overleftarrow{T}$ contains reverse (feedback) electrotonic transforms from soma to synapse, $\overleftarrow{R}_{lk}$, that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename $\overrightarrow{R}_{kl}$. Thinking for a moment in terms of somatic $k$ and synaptic $l$, voltages $V$, currents $I$ and linear cable theory, the synapse to soma transform, $\overrightarrow{R}_{kl}$, would be related to an impedance in $V_k = I_l \overrightarrow{Z}_{kl}$, while the soma to synapse transform, $\overleftarrow{R}_{lk}$, would be related to an admittance in $I_l = V_k \overleftarrow{Y}_{lk}$ [8]. The symmetry in these equations is that $\overrightarrow{Z}_{kl}$ is just the inverse conjugate of $\overleftarrow{Y}_{lk}$.

Finally, then, what is $c_l$? And what is its relation to the calcium concentration, $[Ca^{2+}]_l$, at a synapse, when spike $l$ comes in? These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5].

4 Appendix: Gradient of log |T| for the full Spike Response Model

Here we give full details of the gradient for Gerstner’s Spike Response Model [7]. This is a general model for which Integrate-and-Fire is a special case. In this model the effect of a presynaptic spike at time $t_l$ on the membrane potential at time $t$ is described by a post-synaptic potential or spike response, which may also depend on the time that has passed since the last output spike $t_{k-1}$; hence the spike response is written as $R(t - t_{k-1}, t - t_l)$. This response is weighted by the synaptic strength $w_l$.
Excitatory or inhibitory synapses are determined by the sign of $w_l$. Refractoriness is incorporated by adding a hyper-polarizing contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, $\eta(t - t_{k-1})$. The membrane potential as a function of time is therefore given by

$$u(t) = \eta(t - t_{k-1}) + \sum_l w_l R(t - t_{k-1}, t - t_l) . \tag{14}$$

We have ignored here potential contributions from external currents, which can easily be included without modifying the following derivations. The output firing times $t_k$ are defined as the times for which $u(t)$ reaches firing threshold from below. We consider a dynamic threshold, $\vartheta(t - t_{k-1})$, which may depend on the time since the last spike $t_{k-1}$. Together, then, output spike times are defined implicitly by:

$$t = t_k : \quad u(t) = \vartheta(t - t_{k-1}) \quad \text{and} \quad \frac{du(t)}{dt} > 0 . \tag{15}$$

For this more general model $T_{kl}$ is given by

$$T_{kl} = \frac{dt_k}{dt_l} = -\left( \frac{\partial u}{\partial t_k} - \frac{\partial \vartheta}{\partial t_k} \right)^{-1} \frac{\partial u}{\partial t_l} = \frac{w_{kl} \dot{R}(t_k - t_{k-1}, t_k - t_l)}{\dot{u}(t_k) - \dot{\vartheta}(t_k - t_{k-1})} , \tag{16}$$

where $\dot{R}(s, t)$, $\dot{u}(t)$, and $\dot{\vartheta}(t)$ are derivatives with respect to $t$. The dependence of $T_{kl}$ on $t_{k-1}$ should be implicitly assumed. It has been omitted to simplify the notation.

Now we compute the derivative of $\log |T|$ with respect to $w_{kl}$. For any matrix $T$ we have $\partial \log |T| / \partial T_{ab} = [T^{-1}]_{ba}$. Therefore:

$$\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} \frac{\partial \log |T|}{\partial T_{ab}} \frac{\partial T_{ab}}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \frac{\partial T_{ab}}{\partial w_{kl}} . \tag{17}$$

Utilising the Kronecker delta $\delta_{ab}$ = (1 if $a = b$, else 0), the derivative of (16) with respect to $w_{kl}$ gives:

$$\frac{\partial T_{ab}}{\partial w_{kl}} = \frac{\partial}{\partial w_{kl}} \left[ \frac{w_{ab} \dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{\eta}(t_a - t_{a-1}) + \sum_c w_{ac} \dot{R}(t_a - t_{a-1}, t_a - t_c) - \dot{\vartheta}(t_a - t_{a-1})} \right]$$

$$= \delta_{ak} \delta_{bl} \frac{\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1})} - \frac{w_{ab} \dot{R}(t_a - t_{a-1}, t_a - t_b)\, \delta_{ak} \dot{R}(t_a - t_{a-1}, t_a - t_l)}{\left( \dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1}) \right)^2}$$

$$= \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right) . \tag{18}$$

Therefore:

$$\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba}\, \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right)$$

$$= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - \sum_b [T^{-1}]_{bk} T_{kb} \right) \tag{19}$$

$$= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right) . \tag{20}$$

Acknowledgments

We are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Deneve, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. AJB thanks all RNI colleagues for many such discussions.

References

[1] Amari S-I. 1997. Natural gradient works efficiently in learning. Neural Computation, 10, 251-276
[2] Banerjee A. 2001. On the Phase-Space Dynamics of Systems of Spiking Neurons. Neural Computation, 13, 161-225
[3] Barber D. & Agakov F. 2003. The IM Algorithm: A Variational Approach to Information Maximization. Advances in Neural Information Processing Systems 16, MIT Press
[4] Bell A.J. & Sejnowski T.J. 1995. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129-1159
[5] Dan Y. & Poo M-m. 2004. Spike timing-dependent plasticity of neural circuits. Neuron, 44, 23-30
[6] Froemke R.C. & Dan Y. 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 28, 416: 433-8
[7] Gerstner W. & Kistler W.M. 2002. Spiking neuron models. Camb. Univ. Press
[8] Zador A.M., Agmon-Snir H. & Segev I. 1995. The morphoelectrotonic transform: a graphical approach to dendritic function. J. Neurosci., 15(3): 1669-1682
Reference: text
sentIndex sentText sentNum sentScore
1 Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. [sent-8, score-1.447]
2 Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output ‘firing rates’ to inputs. [sent-9, score-0.587]
3 For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. [sent-11, score-0.545]
4 That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network. [sent-12, score-0.64]
5 In this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. [sent-14, score-0.109]
6 But for now, to first develop our approach, we will explore an interim objective function called sensitivity which we define as the log Jacobian of how input spike timings affect output spike timings. [sent-16, score-1.304]
7 1 How to maximise the effect of one spike timing on another. [sent-18, score-0.613]
8 Consider a spike in neuron j at time tl that has an effect on the timing of another spike in neuron i at time tk . [sent-19, score-1.824]
9 We use i and j to index neurons, and k and l to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. [sent-21, score-0.589]
10 For example, wkl , the weight between an input spike l and an output spike k, is naturally understood to be just the corresponding wij . [sent-22, score-1.466]
11 dtk dtl threshold potential du u(t) R(t) resting potential tk output spikes tl input spikes Figure 1: Firing time tk is determined by the time of threshold crossing. [sent-23, score-1.792]
12 A change of an input spike time dtl affects, via a change of the membrane potential du the time of the output spike by dtk . [sent-24, score-1.64]
13 In the simplest version of the Spike Response Model [7], spike l has an effect on spike k that depends on the time-course of the evoked EPSP or IPSP, which we write as R kl (tk − tl ). [sent-25, score-1.241]
14 In general, this Rkl models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. [sent-26, score-0.251]
15 For learning, we need only consider the value of this function when an output spike, k, occurs. [sent-27, score-0.096]
16 In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, ui (t), reaches threshold at time tk . [sent-28, score-0.765]
17 This threshold we will often, again for convenience, write as uk ≡ ui (tk , {tl }), and it is given by a sum over spikes l: uk = wkl Rkl (tk − tl ) . [sent-29, score-0.982]
18 (1) l To maximise timing sensitivity, we need to determine the effect of a small change in the input firing time tl on the output firing time tk . [sent-30, score-0.878]
19 ) When tl is changed by a small amount dtl the membrane potential will change as a result. [sent-32, score-0.57]
20 This change in the membrane potential leads to a change in the time of threshold crossing dt k . [sent-33, score-0.238]
21 The contribution to the membrane potential, du, due to dtl is (∂uk /∂tl )dtl , and the change in du corresponding to a change dtk is (∂uk /∂tk )dtk . [sent-34, score-0.507]
22 We can relate these two effects by noting that the total change of the membrane potential du has to vanish because u k is defined as the potential at threshold. [sent-35, score-0.307]
23 ∂tk ∂tl (2) This is the total differential of the function uk = u(tk , {tl }), and is a special case of the implicit function theorem. [sent-37, score-0.099]
24 Rearranging this: dtk ∂uk =− dtl ∂tl ∂uk ˙ = −wkl Rkl /uk . [sent-38, score-0.284]
25 ˙ ∂tk (3) Now, to connect with the standard ICA derivation [4], recall the ‘rate’ (or sigmoidal) neuron, for which yi = gi (ui ) and ui = j wij xj . [sent-39, score-0.237]
26 For this neuron, the output dependence on input is ∂yi /∂xj = wij gi while the learning gradient is: ∂yi ∂ 1 log − fi (ui )xj = ∂wij ∂xj wij (4) where the ‘score functions’, fi , are defined in terms of a density estimate on the summed ∂ ∂ inputs: fi (ui ) = ∂ui log gi = ∂ui log p(ui ). [sent-40, score-0.663]
27 ˆ The analogous learning gradient for the spiking case, from (3), is: ˙ j(a)Rka ∂ dtk 1 log − a . [sent-41, score-0.324]
28 = ∂wij dtl wij uk ˙ (5) where j(a) = 1 if spike a came from neuron j, and 0 otherwise. [sent-42, score-0.91]
29 In other words, an STDP datapoint should lie on a 2-surface in a 3D space of {∆w, ∆t, uk }. [sent-45, score-0.099]
30 Incidentally, uk shows up in any ˙ ˙ learning rule optimising an objective function involving output spike timings. [sent-46, score-0.692]
31 2 How to maximise the effect of N spike timings on N other ones. [sent-48, score-0.656]
32 Now we deal with the case of a ‘square’ single-layer feedforward mapping between spike timings. [sent-49, score-0.519]
33 There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. [sent-50, score-0.462]
34 This is captured in a Jacobian matrix of all timing dependencies we call T. [sent-51, score-0.076]
35 A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, ie: log |T|. [sent-53, score-0.234]
36 The full derivation for the gradient W log |T| is in the Appendix. [sent-54, score-0.139]
37 Square ICA with a network y = g(Wx) is: ∆W ∝ W log |J| = W−1 − f (u)xT (6) where the Jacobian J has entries ∂yi /∂xj and the score functions are now, fi (u) = ∂ − ∂ui log p(u) for the general likelihood case, with p(u) = i gi being the special case of ˆ ˆ ICA. [sent-56, score-0.258]
38 We will now split the gradient in (6) according to the chain rule: W log |J| = [ J log |J|] ⊗ [ W J] j(l) − fk (u)xj wkl J−T ⊗ Jkl i(k) = (7) . [sent-57, score-0.477]
39 The righthand term is a 4-tensor with entries ∂Jkl /∂wij , and ⊗ is defined as A ⊗ Bij = kl Akl Bklij . [sent-59, score-0.094]
40 We write the gradient this way to preserve, in the second term, the independent structure of the 1 → 1 gradient term in (4), and to separate a difficult derivation into two easy parts. [sent-60, score-0.149]
41 The structure of (8) holds up when we move to the spiking case, giving: W log |T| = = [ T log |T|] ⊗ [ W T] T−T ⊗ Tkl i(k) j(l) − wkl (9) a ˙ j(a)Rka uk ˙ (10) where i(k) is now defined as being 1 if spike k occured in neuron i, and 0 otherwise. [sent-61, score-1.185]
42 When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields: ∆wkl ∝ ∂ log |T| Tkl = ∂wkl wkl T−1 lk −1 , (11) This is a non-local update involving a matrix inverse at each step. [sent-64, score-0.968]
43 In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike timing case, this has turned out not to be possible, because of the additional asymmetry ˙ introduced into the T matrix (as opposed to the J matrix) by the Rkl term in (3). [sent-65, score-0.621]
44 It requires running the network for a while to generate spikes (and a corresponding T matrix), and then for each input/output spike coupling, the corresponding synapse is updated according to (11). [sent-68, score-0.691]
45 When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. [sent-69, score-0.097]
46 Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. [sent-70, score-0.325]
47 It is happy to do this, if a large log |T| can thereby be achieved, because we have not included this ‘which neuron’ information in the objective. [sent-71, score-0.05]
48 An interesting possibility in the brain is that ‘patterns’ are embedded in spatially distributed spike timings that are input to neurons. [sent-75, score-0.63]
49 To extract and propagate these patterns, the neurons must demultiplex these inputs using its threshold nonlinearity. [sent-78, score-0.197]
50 We simulated a feed-forward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons. [sent-81, score-0.136]
51 Learning followed (11) where we replace the inverse by the pseudoinverse computed on the spikes generated during 0. [sent-82, score-0.153]
52 The pseudo-inverse is necessary because even though on average, the learning matches number of output spikes to number of input spikes, the matrix T is still not usually square and so its actual inverse cannot be taken. [sent-84, score-0.29]
53 In addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. [sent-85, score-0.165]
54 Assuming Poisson spike count ni for the ith output neuron with equal firing rate ni it is easy to derive in an approximate ¯ term that will control the spike count, i (¯ i − ni ). [sent-87, score-1.248]
55 The target firing rates ni were set to n ¯ match the “source” spike train in this example. [sent-88, score-0.501]
56 The network learns to demultiplex mixed spike trains, as shown in Figure 2. [sent-89, score-0.546]
57 This demultiplexing is a robust property of learning using (11) with this new spike-controlling term. [sent-90, score-0.067]
58 There is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. [sent-94, score-0.159]
59 [Figure 2 plot. Left panels: mixed input trains, output, original spike train (time in ms). Right panels: mixing, synaptic weights.] [sent-98, score-0.641]
60 The inputs (top left) are 3 spike trains which are a mixture of three independent Poisson processes (bottom left). [sent-101, score-0.552]
61 The network unmixes the spike train to approximately recover the original (center left). [sent-102, score-0.496]
62 In this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. [sent-103, score-0.107]
63 The two panels at the right show the mixing (top) and synaptic weight matrix after training (bottom). [sent-104, score-0.075]
64 In addition, it does not transition at zero (ie: when tk − tl = 0), but at a time offset by the rise time of the EPSPs. [sent-106, score-0.595]
65 In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian, to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. [sent-107, score-0.156]
66 It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP. [sent-109, score-0.079]
67 The over-riding problem is: we are unable to claim that in maximising log |T|, we are maximising the mutual information between inputs and outputs because: 1. [sent-114, score-0.324]
68 Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. [sent-116, score-0.143]
69 When not invertible, the key formula (considering here vectors of input and output timings, $t_{in}$ and $t_{out}$) is transformed from simple to complex. [sent-118, score-0.304]
70 ie: $p(t_{out}) = \frac{p(t_{in})}{|T|}$ becomes $p(t_{out}) = \int_{\text{solns } t_{in}} \frac{p(t_{in})}{|T|} \, dt_{in}$ (12). Thus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here ‘solns’), something we simply don’t know. [sent-119, score-0.398]
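The effect of non-invertibility in (12) can be seen in a one-dimensional toy case that has nothing to do with spikes: y = x² with x drawn from a standard normal. Here the ‘solns’ are the two roots ±√y, the integral becomes a two-term sum, and the result should match the known chi-square density with one degree of freedom. The function names below are hypothetical; this is a sketch for intuition only.

```python
import math

def standard_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_of_square(y):
    # Sum p(x) / |dy/dx| over every input x that maps to this y.
    # For y = x**2 the two solutions are +sqrt(y) and -sqrt(y), and dy/dx = 2x.
    roots = (math.sqrt(y), -math.sqrt(y))
    return sum(standard_normal_pdf(x) / abs(2.0 * x) for x in roots)

def chi_square_1_pdf(y):
    # Closed-form density of a chi-square variable with one degree of freedom.
    return math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)

for y in (0.1, 1.0, 2.5):
    print(y, density_of_square(y), chi_square_1_pdf(y))
```

Dropping either root would halve the density, which illustrates why every input that could have caused an output has to be accounted for.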
71 Instead of maximising the mutual information $I(t_{out}, t_{in})$, we should be maximising $I(ti_{out}, ti_{in})$, where the vector $ti$ is the timing vector, $t$, with the vector, $i$, of corresponding neuron indices, concatenated. [sent-122, score-0.576]
72 Figure 3: Dependence of synaptic modification on pre/post inter-spike interval. Panels: (A) STDP and (B) Gradient; axes: ∆w/w (%) and ∆w (a.u.) against ∆t (ms). [sent-124, score-0.075]
73 Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice. [sent-126, score-0.075]
74 Each point corresponds to a spike pair between approximately 100 input and 100 output spikes. [sent-130, score-0.607]
76 In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. [sent-136, score-0.088]
77 But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. [sent-137, score-0.459]
78 Specifically, there must be at least a statistical, if not a causal, relation between a real synapse’s plasticity and its neuron’s output spike timings, for Figure 3B to look like it does. [sent-140, score-0.603]
79 It seems we have to confront the need for both a ‘memory’ (or reconstruction) model, such as the T we have thus far dealt with, in which output spikes talk about past inputs, and a ‘prediction’ model, in which they talk about future inputs. [sent-141, score-0.253]
80 Then the $T$ we are looking for is $\overrightarrow{T} \otimes \overleftarrow{T}$, and the memory-prediction model is: $\hat{p}(ti_{in}|ti_{out}) = \overrightarrow{T} \otimes \overleftarrow{T}$. Ideally, the entries of $\overrightarrow{T}$ should be as before, ie: $\overrightarrow{T}_{kl} = \partial t_k / \partial t_l$. [sent-148, score-0.094]
81 Becoming just one step more concrete, suppose $\overleftarrow{T}$ had entries $\overleftarrow{T}_{lk} = \partial c_l / \partial t_k$, where $c_l$ is some, as yet unspecified, value, or process, occurring at an input synapse when spike $l$ comes in. [sent-150, score-0.759]
82 What seems clear is that ⊗ should combine the correctly tensorised forms of $\overrightarrow{T}$ and $\overleftarrow{T}$ (giving them each 4 indices $ijkl$), so that $T = \overrightarrow{T} \otimes \overleftarrow{T}$ sums over the spikes $k$ and $l$ to give an $I \times J$ matrix, where $I$ is the number of output neurons, and $J$ the number of input neurons. [sent-151, score-0.244]
83 Then our quantity, T, would represent all dependencies of input neuronal activity on output activity, summed over spikes. [sent-152, score-0.137]
84 Further, we imagine that $\overleftarrow{T}$ contains reverse (feedback) electrotonic transforms from soma to synapse $\overleftarrow{R}_{lk}$ that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename $\overrightarrow{R}_{kl}$. [sent-153, score-0.492]
85 The symmetry in these equations is that $\overrightarrow{Z}_{kl}$ is just the inverse conjugate of $\overleftarrow{Y}_{lk}$. [sent-155, score-0.185]
86 And what is its relation to the calcium concentration, $[\mathrm{Ca}^{2+}]_l$, at a synapse, when spike $l$ comes in? [sent-157, score-0.503]
87 These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5]. [sent-158, score-0.085]
88 Here we give full details of the gradient for Gerstner’s Spike Response Model [7]. [sent-160, score-0.06]
89 This response is weighted by the synaptic strength $w_l$. [sent-163, score-0.17]
90 Refractoriness is incorporated by adding a hyper-polarizing contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, $\eta(t - t_{k-1})$. [sent-165, score-0.66]
91 The membrane potential as a function of time is therefore given by $u(t) = \eta(t - t_{k-1}) + \sum_l w_l R(t - t_{k-1}, t - t_l)$ (14). [sent-166, score-0.442]
92 We have ignored here potential contributions from external currents which can easily be included without modifying the following derivations. [sent-167, score-0.056]
93 The output firing times $t_k$ are defined as the times for which $u(t)$ reaches firing threshold from below. [sent-168, score-0.133]
94 We consider a dynamic threshold, $\vartheta(t - t_{k-1})$, which may depend on the time since the last spike $t_{k-1}$; together then, output spike times are defined implicitly by: $t = t_k : u(t) = \vartheta(t - t_{k-1})$ and $\frac{du(t)}{dt} > 0$ (15). [sent-169, score-1.359]
95 For this more general model $T_{kl}$ is given by $T_{kl} = \frac{dt_k}{dt_l} = -\left( \frac{\partial u}{\partial t_k} - \frac{\partial \vartheta}{\partial t_k} \right)^{-1} \frac{\partial u}{\partial t_l} = \frac{w_{kl} \dot{R}(t_k - t_{k-1}, t_k - t_l)}{\dot{u}(t_k) - \dot{\vartheta}(t_k - t_{k-1})}$ (16), where $\dot{R}(s, t)$, $\dot{u}(t)$, and $\dot{\vartheta}(t)$ are derivatives with respect to $t$. [sent-170, score-1.171]
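The threshold-crossing definition in (15) and the sensitivity in (16) can be illustrated with a small numerical experiment. This is a hedged sketch under simplifying assumptions that are not in the text: a single output spike, no refractory term, a constant threshold, and an alpha-function spike response with a hypothetical time constant `TAU`. All function names and parameter values are illustrative.

```python
import numpy as np

TAU = 5.0  # ms; hypothetical time constant of the alpha-shaped response

def kernel(s):
    # Alpha function R(s) = (s / TAU) * exp(1 - s / TAU) for s > 0, else 0.
    s = np.maximum(s, 0.0)
    return (s / TAU) * np.exp(1.0 - s / TAU)

def kernel_dot(s):
    # Derivative of R with respect to its argument.
    s = np.asarray(s, dtype=float)
    return np.where(s > 0.0, (1.0 - s / TAU) / TAU * np.exp(1.0 - s / TAU), 0.0)

def potential(t, t_in, w):
    # u(t) = sum_l w_l R(t - t_l); t may be a scalar or a 1-D grid.
    t = np.asarray(t, dtype=float)
    return (w * kernel(t[..., None] - t_in)).sum(axis=-1)

def first_crossing(t_in, w, threshold, t_max=50.0, dt=0.01):
    # Coarse scan for the first crossing from below, then bisection.
    # Raises IndexError if the potential never reaches threshold.
    grid = np.arange(0.0, t_max, dt)
    u = potential(grid, t_in, w)
    idx = np.nonzero((u[:-1] < threshold) & (u[1:] >= threshold))[0][0]
    lo, hi = grid[idx], grid[idx + 1]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if potential(mid, t_in, w) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def timing_sensitivities(t_k, t_in, w):
    # T_kl = w_l * Rdot(t_k - t_l) / udot(t_k), with udot = sum_l w_l * Rdot.
    r_dot = kernel_dot(t_k - t_in)
    return w * r_dot / np.sum(w * r_dot)

def finite_difference_sensitivity(l, t_in, w, threshold, h=1e-4):
    # Shift input spike l by +/- h and measure the shift of the output spike.
    t_plus, t_minus = t_in.copy(), t_in.copy()
    t_plus[l] += h
    t_minus[l] -= h
    return (first_crossing(t_plus, w, threshold)
            - first_crossing(t_minus, w, threshold)) / (2.0 * h)

t_in = np.array([1.0, 2.0, 3.5])   # input spike times (ms)
w = np.array([0.5, 0.4, 0.6])      # synaptic weights
threshold = 1.0
t_k = first_crossing(t_in, w, threshold)
T_row = timing_sensitivities(t_k, t_in, w)
```

In this simplified setting the entries of `T_row` sum to one, because shifting every input spike by the same amount shifts the output spike by that amount.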
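96 Now we compute the derivative of $\log |T|$ with respect to $w_{kl}$. [sent-173, score-0.398]
97 Therefore: $\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} \frac{\partial \log |T|}{\partial T_{ab}} \frac{\partial T_{ab}}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \frac{\partial T_{ab}}{\partial w_{kl}}$ (17). [sent-175, score-0.1]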
98 $\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \, \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right) = \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - \sum_b [T^{-1}]_{bk} T_{kb} \right)$ (19) $= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right)$ (20). [sent-177, score-0.434]
99 Acknowledgments: We are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Denève, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. [sent-178, score-0.317]
100 Natural gradient works efficiently in learning, Neural Computation, 10, 251-276 [2] Banerjee A. [sent-182, score-0.06]
wordName wordTfidf (topN-words)
[('spike', 0.47), ('tk', 0.323), ('wkl', 0.317), ('ta', 0.265), ('tl', 0.247), ('tiin', 0.167), ('dtl', 0.15), ('tkl', 0.15), ('dtk', 0.134), ('neuron', 0.119), ('timings', 0.119), ('tab', 0.117), ('tiout', 0.117), ('spikes', 0.107), ('tin', 0.1), ('uk', 0.099), ('ica', 0.099), ('output', 0.096), ('maximising', 0.093), ('membrane', 0.089), ('synapse', 0.088), ('jacobian', 0.087), ('lk', 0.085), ('spiking', 0.08), ('du', 0.078), ('timing', 0.076), ('ui', 0.076), ('synaptic', 0.075), ('wij', 0.072), ('neurons', 0.069), ('demultiplexing', 0.067), ('maximise', 0.067), ('rkl', 0.067), ('tout', 0.067), ('ie', 0.064), ('soma', 0.062), ('gradient', 0.06), ('sensitivity', 0.058), ('potential', 0.056), ('kl', 0.054), ('wab', 0.053), ('ring', 0.051), ('log', 0.05), ('demultiplex', 0.05), ('invertable', 0.05), ('wl', 0.05), ('feedforward', 0.049), ('mutual', 0.047), ('dendritic', 0.047), ('inverse', 0.046), ('response', 0.045), ('tb', 0.044), ('bl', 0.044), ('input', 0.041), ('inputs', 0.041), ('trains', 0.041), ('entries', 0.04), ('threshold', 0.037), ('dan', 0.037), ('ba', 0.037), ('stdp', 0.037), ('plasticity', 0.037), ('fi', 0.036), ('cl', 0.035), ('calcium', 0.033), ('electrotonic', 0.033), ('jacobians', 0.033), ('jkl', 0.033), ('rename', 0.033), ('rka', 0.033), ('solns', 0.033), ('timingdependent', 0.033), ('wal', 0.033), ('zador', 0.033), ('barber', 0.033), ('ak', 0.032), ('gi', 0.032), ('ni', 0.031), ('ab', 0.031), ('derivative', 0.031), ('transform', 0.029), ('agakov', 0.029), ('froemke', 0.029), ('tal', 0.029), ('derivation', 0.029), ('xj', 0.028), ('change', 0.028), ('something', 0.028), ('parra', 0.027), ('optimisation', 0.027), ('optimising', 0.027), ('infomax', 0.027), ('potentiation', 0.027), ('network', 0.026), ('sharp', 0.025), ('transition', 0.025), ('talk', 0.025), ('depression', 0.025), ('score', 0.024), ('ms', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 112 nips-2004-Maximising Sensitivity in a Spiking Network
Author: Anthony J. Bell, Lucas C. Parra
Abstract: We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output ‘firing rates’ to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.

1 Maximising Sensitivity.

In this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. We cannot claim, as before, that this gives us an information maximisation algorithm, for reasons that we will delay addressing until Section 3. But for now, to first develop our approach, we will explore an interim objective function called sensitivity which we define as the log Jacobian of how input spike timings affect output spike timings.

1.1 How to maximise the effect of one spike timing on another.
Consider a spike in neuron $j$ at time $t_l$ that has an effect on the timing of another spike in neuron $i$ at time $t_k$. The neurons are connected by a weight $w_{ij}$. We use $i$ and $j$ to index neurons, and $k$ and $l$ to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. For example, $w_{kl}$, the weight between an input spike $l$ and an output spike $k$, is naturally understood to be just the corresponding $w_{ij}$.

Figure 1: Firing time $t_k$ is determined by the time of threshold crossing. A change of an input spike time $dt_l$ affects, via a change of the membrane potential $du$, the time of the output spike by $dt_k$.

In the simplest version of the Spike Response Model [7], spike $l$ has an effect on spike $k$ that depends on the time-course of the evoked EPSP or IPSP, which we write as $R_{kl}(t_k - t_l)$. In general, this $R_{kl}$ models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. For learning, we need only consider the value of this function when an output spike, $k$, occurs. In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, $u_i(t)$, reaches threshold at time $t_k$. This threshold we will often, again for convenience, write as $u_k \equiv u_i(t_k, \{t_l\})$, and it is given by a sum over spikes $l$:

$$u_k = \sum_l w_{kl} R_{kl}(t_k - t_l) . \qquad (1)$$

To maximise timing sensitivity, we need to determine the effect of a small change in the input firing time $t_l$ on the output firing time $t_k$. (A related problem is tackled in [2].) When $t_l$ is changed by a small amount $dt_l$ the membrane potential will change as a result. This change in the membrane potential leads to a change in the time of threshold crossing $dt_k$. The contribution to the membrane potential, $du$, due to $dt_l$ is $(\partial u_k / \partial t_l) \, dt_l$, and the change in $du$ corresponding to a change $dt_k$ is $(\partial u_k / \partial t_k) \, dt_k$.
We can relate these two effects by noting that the total change of the membrane potential $du$ has to vanish because $u_k$ is defined as the potential at threshold, ie:

$$du = \frac{\partial u_k}{\partial t_k} dt_k + \frac{\partial u_k}{\partial t_l} dt_l = 0 . \qquad (2)$$

This is the total differential of the function $u_k = u(t_k, \{t_l\})$, and is a special case of the implicit function theorem. Rearranging this:

$$\frac{dt_k}{dt_l} = -\frac{\partial u_k}{\partial t_l} \Big/ \frac{\partial u_k}{\partial t_k} = -w_{kl} \dot{R}_{kl} / \dot{u}_k . \qquad (3)$$

Now, to connect with the standard ICA derivation [4], recall the ‘rate’ (or sigmoidal) neuron, for which $y_i = g_i(u_i)$ and $u_i = \sum_j w_{ij} x_j$. For this neuron, the output dependence on input is $\partial y_i / \partial x_j = w_{ij} g_i'$ while the learning gradient is:

$$\frac{\partial}{\partial w_{ij}} \log \frac{\partial y_i}{\partial x_j} = \frac{1}{w_{ij}} - f_i(u_i) x_j \qquad (4)$$

where the ‘score functions’, $f_i$, are defined in terms of a density estimate on the summed inputs: $f_i(u_i) = -\frac{\partial}{\partial u_i} \log g_i' = -\frac{\partial}{\partial u_i} \log \hat{p}(u_i)$.

The analogous learning gradient for the spiking case, from (3), is:

$$\frac{\partial}{\partial w_{ij}} \log \frac{dt_k}{dt_l} = \frac{1}{w_{ij}} - \frac{\sum_a j(a) \dot{R}_{ka}}{\dot{u}_k} . \qquad (5)$$

where $j(a) = 1$ if spike $a$ came from neuron $j$, and 0 otherwise. Comparing the two cases in (4) and (5), we see that the input variable $x_j$ has become the temporal derivative of the sum of the EPSPs coming from synapse $j$, and the output variable (or score function) $f_i(u_i)$ has become $\dot{u}_k^{-1}$, the inverse of the temporal derivative of the membrane potential at threshold. It is intriguing (A) to see this quantity appear as analogous to the score function in the ICA likelihood model, and (B) to speculate that experiments could show that this ‘voltage slope at threshold’ is a hidden factor in STDP data, explaining some of the scatter in Figure 3A. In other words, an STDP datapoint should lie on a 2-surface in a 3D space of $\{\Delta w, \Delta t, \dot{u}_k\}$. Incidentally, $\dot{u}_k$ shows up in any learning rule optimising an objective function involving output spike timings.

1.2 How to maximise the effect of N spike timings on N other ones.

Now we deal with the case of a ‘square’ single-layer feedforward mapping between spike timings.
There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. This is captured in a Jacobian matrix of all timing dependencies we call $T$. The entries of this matrix are $T_{kl} \equiv \partial t_k / \partial t_l$. A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, ie: $\log |T|$. The full derivation for the gradient $\nabla_W \log |T|$ is in the Appendix. Here, we again draw out the analogy between Square ICA [4] and this gradient, as follows. Square ICA with a network $y = g(Wx)$ is:

$$\Delta W \propto \nabla_W \log |J| = W^{-1} - f(u) x^T \qquad (6)$$

where the Jacobian $J$ has entries $\partial y_i / \partial x_j$ and the score functions are now $f_i(u) = -\frac{\partial}{\partial u_i} \log \hat{p}(u)$ for the general likelihood case, with $\hat{p}(u) = \prod_i g_i'$ being the special case of ICA. We will now split the gradient in (6) according to the chain rule:

$$\nabla_W \log |J| = [\nabla_J \log |J|] \otimes [\nabla_W J] \qquad (7)$$

$$= J^{-T} \otimes \left[ J_{kl} \, i(k) \left( \frac{j(l)}{w_{kl}} - f_k(u) x_j \right) \right] . \qquad (8)$$

In this equation, $i(k) = \delta_{ik}$ and $j(l) = \delta_{jl}$. The righthand term is a 4-tensor with entries $\partial J_{kl} / \partial w_{ij}$, and ⊗ is defined as $(A \otimes B)_{ij} = \sum_{kl} A_{kl} B_{klij}$. We write the gradient this way to preserve, in the second term, the independent structure of the 1 → 1 gradient term in (4), and to separate a difficult derivation into two easy parts. The structure of (8) holds up when we move to the spiking case, giving:

$$\nabla_W \log |T| = [\nabla_T \log |T|] \otimes [\nabla_W T] \qquad (9)$$

$$= T^{-T} \otimes \left[ T_{kl} \, i(k) \left( \frac{j(l)}{w_{kl}} - \frac{\sum_a j(a) \dot{R}_{ka}}{\dot{u}_k} \right) \right] \qquad (10)$$

where $i(k)$ is now defined as being 1 if spike $k$ occurred in neuron $i$, and 0 otherwise. $j(l)$ and $j(a)$ are analogously defined. Because the $T$ matrix is much bigger than the $J$ matrix, and because its entries are more complex, here the similarity ends.
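The contraction written as ⊗ in this passage, a matrix combined with a 4-tensor by summing over the two spike indices, maps directly onto an `einsum` call. The sketch below only illustrates that index pattern; the function names, array shapes and random contents are arbitrary assumptions and do not come from a simulated network.

```python
import numpy as np

def contract(A, B):
    # (A ⊗ B)[i, j] = sum over k, l of A[k, l] * B[k, l, i, j]
    return np.einsum("kl,klij->ij", A, B)

def contract_loops(A, B):
    # Same contraction written out explicitly, for comparison.
    K, L, I, J = B.shape
    out = np.zeros((I, J))
    for k in range(K):
        for l in range(L):
            out += A[k, l] * B[k, l]
    return out

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))          # indexed by spikes k, l
B = rng.normal(size=(2, 3, 4, 5))    # indexed by spikes k, l and neurons i, j
result = contract(A, B)              # indexed by neurons i, j
```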
When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields:

$$\Delta w_{kl} \propto \frac{\partial \log |T|}{\partial w_{kl}} = \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right) . \qquad (11)$$

This is a non-local update involving a matrix inverse at each step. In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike timing case, this has turned out not to be possible, because of the additional asymmetry introduced into the $T$ matrix (as opposed to the $J$ matrix) by the $\dot{R}_{kl}$ term in (3).

2 Results.

Nonetheless, this learning rule can be simulated. It requires running the network for a while to generate spikes (and a corresponding $T$ matrix), and then for each input/output spike coupling, the corresponding synapse is updated according to (11). When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. It is happy to do this, if a large $\log |T|$ can thereby be achieved, because we have not included this ‘which neuron’ information in the objective. We will address these and other problems in Section 3, but now we report on our simulation results on demultiplexing.

2.1 Demultiplexing spike trains.

An interesting possibility in the brain is that ‘patterns’ are embedded in spatially distributed spike timings that are input to neurons. Several patterns could be embedded in single input trains. This is called multiplexing. To extract and propagate these patterns, the neurons must demultiplex these inputs using their threshold nonlinearity. Demultiplexing is the ‘point process’ analog of the unmixing of independent inputs in ICA. We have been able to robustly achieve demultiplexing, as we now report. We simulated a feed-forward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons.
Learning followed (11) where we replace the inverse by the pseudoinverse computed on the spikes generated during 0.5 s. The pseudo-inverse is necessary because even though, on average, the learning matches the number of output spikes to the number of input spikes, the matrix $T$ is still not usually square and so its actual inverse cannot be taken. In addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. This partially counters our ignoring of the ‘which neuron’ information, which we explained above. Assuming a Poisson spike count $n_i$ for the $i$th output neuron with equal firing rate $\bar{n}_i$, it is easy to derive an approximate term that will control the spike count, $\sum_i (\bar{n}_i - n_i)$. The target firing rates $\bar{n}_i$ were set to match the “source” spike train in this example. The network learns to demultiplex mixed spike trains, as shown in Figure 2. This demultiplexing is a robust property of learning using (11) with this new spike-controlling term.

Figure 2: Unmixed spike trains. The inputs (top left) are 3 spike trains which are a mixture of three independent Poisson processes (bottom left). The network unmixes the spike train to approximately recover the original (center left). In this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. The two panels at the right show the mixing (top) and synaptic weight matrix after training (bottom).

Finally, what about the spike-timing dependence of the observed learning? Does it match experimental results? The comparison is made in Figure 3, and the answer is no. There is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. In addition, it does not transition at zero (ie: when $t_k - t_l = 0$), but at a time offset by the rise time of the EPSPs. In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian, to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. However, the approximate inverse Hessian was singular, and we had to de-emphasise this result. It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP.

3 Discussion

Although these derivations started out smoothly, the reader possibly shares the authors’ frustration at the approximations involved here. Why isn’t this simple, like ICA? Why don’t we just have a nice maximum spikelihood model, ie: a density estimation algorithm for multivariate point processes, as ICA was a model in continuous space? We are going to be explicit about the problems now, and will propose a direction where the solution may lie. The over-riding problem is: we are unable to claim that in maximising $\log |T|$, we are maximising the mutual information between inputs and outputs because:

1. The Invertibility Problem. Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. The Spike Response Model is deterministic, but it is not invertible in general. When not invertible, the key formula (considering here vectors of input and output timings, $t_{in}$ and $t_{out}$) is transformed from simple to complex, ie:

$$p(t_{out}) = \frac{p(t_{in})}{|T|} \quad \text{becomes} \quad p(t_{out}) = \int_{\text{solns } t_{in}} \frac{p(t_{in})}{|T|} \, dt_{in} \qquad (12)$$

Thus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here ‘solns’), something we simply don’t know.

2. The ‘Which Neuron’ Problem.
Instead of maximising the mutual information $I(t_{out}, t_{in})$, we should be maximising $I(ti_{out}, ti_{in})$, where the vector $ti$ is the timing vector, $t$, with the vector, $i$, of corresponding neuron indices, concatenated. Thus, ‘who spiked?’ should be included in the analysis as it is part of the information.

Figure 3: Dependence of synaptic modification on pre/post inter-spike interval. Left (A): From Froemke & Dan, Nature (2002). Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice. Naturalistic spike trains. Each point represents one experiment. Right (B): According to Equation (11). Each point corresponds to a spike pair between approximately 100 input and 100 output spikes.

3. The Predictive Information Problem. In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. The former has been entirely missing from our analysis so far.

These temporal and spatial information dependencies, missing in our analysis so far, are thrown into a different light by a single empirical observation, which is that Spike Timing-Dependent Plasticity is not just a feedforward computation like the Spike Response Model. Specifically, there must be at least a statistical, if not a causal, relation between a real synapse’s plasticity and its neuron’s output spike timings, for Figure 3B to look like it does.
It seems we have to confront the need for both a ‘memory’ (or reconstruction) model, such as the $T$ we have thus far dealt with, in which output spikes talk about past inputs, and a ‘prediction’ model, in which they talk about future inputs. This is most easily understood from the point of view of Barber & Agakov’s variational Infomax algorithm [3]. They argue for optimising a lower bound on mutual information, which, for our neurons, would be expressed using an inverse model $\hat{p}$, as follows:

$$\tilde{I}(ti_{in}, ti_{out}) = H(ti_{in}) - \left\langle -\log \hat{p}(ti_{in}|ti_{out}) \right\rangle_{p(ti_{in}, ti_{out})} \le I(ti_{in}, ti_{out}) \qquad (13)$$

In a feedforward model, $H(ti_{in})$ may be disregarded in taking gradients, leading us to the optimisation of a ‘memory-prediction’ model $\hat{p}(ti_{in}|ti_{out})$ related to something supposedly happening in dendrites, somas and at synapses. In trying to guess what this might be, it would be nice if the math worked out. We need a square Jacobian matrix, $T$, so that $|T| = \hat{p}(ti_{in}|ti_{out})$ can be our memory/prediction model. Now let’s rename our feedforward timing Jacobian $T$ (‘up the dendritic trees’), as $\overrightarrow{T}$, and let’s fantasise that there is some, as yet unspecified, feedback Jacobian $\overleftarrow{T}$ (‘down the dendritic trees’), which covers electrotonic influences as they spread from soma to synapse, and which $\overrightarrow{T}$ can be combined with by some operation ‘⊗’ to make things square. Imagine further, that doing this yields a memory/prediction model on the inputs. Then the $T$ we are looking for is $\overrightarrow{T} \otimes \overleftarrow{T}$, and the memory-prediction model is: $\hat{p}(ti_{in}|ti_{out}) = \overrightarrow{T} \otimes \overleftarrow{T}$.

Ideally, the entries of $\overrightarrow{T}$ should be as before, ie: $\overrightarrow{T}_{kl} = \partial t_k / \partial t_l$. What should the entries of $\overleftarrow{T}$ be? Becoming just one step more concrete, suppose $\overleftarrow{T}$ had entries $\overleftarrow{T}_{lk} = \partial c_l / \partial t_k$, where $c_l$ is some, as yet unspecified, value, or process, occurring at an input synapse when spike $l$ comes in.
What seems clear is that ⊗ should combine the correctly tensorised forms of $\overrightarrow{T}$ and $\overleftarrow{T}$ (giving them each 4 indices $ijkl$), so that $T = \overrightarrow{T} \otimes \overleftarrow{T}$ sums over the spikes $k$ and $l$ to give an $I \times J$ matrix, where $I$ is the number of output neurons, and $J$ the number of input neurons. Then our quantity, $T$, would represent all dependencies of input neuronal activity on output activity, summed over spikes.

Further, we imagine that $\overleftarrow{T}$ contains reverse (feedback) electrotonic transforms from soma to synapse $\overleftarrow{R}_{lk}$ that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename $\overrightarrow{R}_{kl}$. Thinking for a moment in terms of somatic $k$ and synaptic $l$, voltages $V$, currents $I$ and linear cable theory, the synapse to soma transform, $\overrightarrow{R}_{kl}$, would be related to an impedance in $V_k = I_l \overrightarrow{Z}_{kl}$, while the soma to synapse transform, $\overleftarrow{R}_{lk}$, would be related to an admittance in $I_l = V_k \overleftarrow{Y}_{lk}$ [8]. The symmetry in these equations is that $\overrightarrow{Z}_{kl}$ is just the inverse conjugate of $\overleftarrow{Y}_{lk}$.

Finally, then, what is $c_l$? And what is its relation to the calcium concentration, $[\mathrm{Ca}^{2+}]_l$, at a synapse, when spike $l$ comes in? These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5].

4 Appendix: Gradient of log |T| for the full Spike Response Model.

Here we give full details of the gradient for Gerstner’s Spike Response Model [7]. This is a general model for which Integrate-and-Fire is a special case. In this model the effect of a presynaptic spike at time $t_l$ on the membrane potential at time $t$ is described by a post synaptic potential or spike response, which may also depend on the time that has passed since the last output spike $t_{k-1}$, hence the spike response is written as $R(t - t_{k-1}, t - t_l)$. This response is weighted by the synaptic strength $w_l$.
Excitatory or inhibitory synapses are determined by the sign of $w_l$. Refractoriness is incorporated by adding a hyper-polarizing contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, $\eta(t - t_{k-1})$. The membrane potential as a function of time is therefore given by

$$u(t) = \eta(t - t_{k-1}) + \sum_l w_l R(t - t_{k-1}, t - t_l) . \qquad (14)$$

We have ignored here potential contributions from external currents which can easily be included without modifying the following derivations. The output firing times $t_k$ are defined as the times for which $u(t)$ reaches firing threshold from below. We consider a dynamic threshold, $\vartheta(t - t_{k-1})$, which may depend on the time since the last spike $t_{k-1}$; together then, output spike times are defined implicitly by:

$$t = t_k : \quad u(t) = \vartheta(t - t_{k-1}) \quad \text{and} \quad \frac{du(t)}{dt} > 0 . \qquad (15)$$

For this more general model $T_{kl}$ is given by

$$T_{kl} = \frac{dt_k}{dt_l} = -\left( \frac{\partial u}{\partial t_k} - \frac{\partial \vartheta}{\partial t_k} \right)^{-1} \frac{\partial u}{\partial t_l} = \frac{w_{kl} \dot{R}(t_k - t_{k-1}, t_k - t_l)}{\dot{u}(t_k) - \dot{\vartheta}(t_k - t_{k-1})} , \qquad (16)$$

where $\dot{R}(s, t)$, $\dot{u}(t)$, and $\dot{\vartheta}(t)$ are derivatives with respect to $t$. The dependence of $T_{kl}$ on $t_{k-1}$ should be implicitly assumed. It has been omitted to simplify the notation.

Now we compute the derivative of $\log |T|$ with respect to $w_{kl}$. For any matrix $T$ we have $\partial \log |T| / \partial T_{ab} = [T^{-1}]_{ba}$. Therefore:

$$\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} \frac{\partial \log |T|}{\partial T_{ab}} \frac{\partial T_{ab}}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \frac{\partial T_{ab}}{\partial w_{kl}} . \qquad (17)$$

Utilising the Kronecker delta $\delta_{ab}$ = (1 if $a = b$, else 0), the derivative of (16) with respect to $w_{kl}$ gives:

$$\frac{\partial T_{ab}}{\partial w_{kl}} = \frac{\partial}{\partial w_{kl}} \left[ \frac{w_{ab} \dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{\eta}(t_a - t_{a-1}) + \sum_c w_{ac} \dot{R}(t_a - t_{a-1}, t_a - t_c) - \dot{\vartheta}(t_a - t_{a-1})} \right]$$

$$= \delta_{ak} \delta_{bl} \frac{\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1})} - \frac{w_{ab} \dot{R}(t_a - t_{a-1}, t_a - t_b) \, \delta_{ak} \dot{R}(t_a - t_{a-1}, t_a - t_l)}{\left( \dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1}) \right)^2}$$

$$= \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right) . \qquad (18)$$

Therefore:

$$\frac{\partial \log |T|}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \, \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right) = \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - \sum_b [T^{-1}]_{bk} T_{kb} \right) \qquad (19)$$

$$= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right) . \qquad (20)$$

Acknowledgments

We are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Denève, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. AJB thanks all RNI colleagues for many such discussions.

References

[1] Amari S-I. 1997. Natural gradient works efficiently in learning, Neural Computation, 10, 251-276
[2] Banerjee A. 2001. On the Phase-Space Dynamics of Systems of Spiking Neurons. Neural Computation, 13, 161-225
[3] Barber D. & Agakov F. 2003. The IM Algorithm: A Variational Approach to Information Maximization. Advances in Neural Information Processing Systems 16, MIT Press.
[4] Bell A.J. & Sejnowski T.J. 1995. An information maximization approach to blind separation and blind deconvolution, Neural Computation, 7, 1129-1159
[5] Dan Y. & Poo M-m. 2004. Spike timing-dependent plasticity of neural circuits, Neuron, 44, 23-30
[6] Froemke R.C. & Dan Y. 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 28, 416: 433-8
[7] Gerstner W. & Kistler W.M. 2002. Spiking neuron models, Camb. Univ. Press
[8] Zador A.M., Agmon-Snir H. & Segev I. 1995. The morphoelectrotonic transform: a graphical approach to dendritic function, J. Neurosci., 15(3): 1669-1682
2 0.32176504 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern
Author: Kosuke Hamaguchi, Masato Okada, Kazuyuki Aihara
Abstract: Repeated spike patterns have often been taken as evidence for the synfire chain, a phenomenon in which stable spike synchrony propagates through a feedforward network. Inter-spike intervals which represent a repeated spike pattern are influenced by the propagation speed of a spike packet. However, the relation between the propagation speed and network structure is not well understood. While it is apparent that the propagation speed depends on the excitatory synapse strength, it might also be related to spike patterns. We analyze a feedforward network with Mexican-Hat-type connectivity (FMH) using the Fokker-Planck equation. We show that both a uniform and a localized spike packet are stable in the FMH in a certain parameter region. We also demonstrate that the propagation speed depends on the distinct firing patterns in the same network.
3 0.31433725 153 nips-2004-Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity
Author: Sander M. Bohte, Michael C. Mozer
Abstract: Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity or STDP. We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron's variability to a given presynaptic input, causing the neuron's output to become more reliable in the face of noise. Using an entropy-minimization objective function and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles, and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of cortical adaptation.
4 0.26140541 148 nips-2004-Probabilistic Computation in Spiking Populations
Author: Richard S. Zemel, Rama Natarajan, Peter Dayan, Quentin J. Huys
Abstract: As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation.

Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues. These include a wide range of psychophysical studies of perception [13], motor action [7], and decision-making [3, 5]. Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences.
Most previous work on representing probabilistic inference in neural populations has focused on the representation of static information [1, 12, 15]. These encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases with stimuli changing over the same timescale as spiking itself [11]. Notable exceptions are the recent, seminal, but, as we argue, representationally restricted, models proposed by Gold and Shadlen [5], Rao [10], and Deneve [4]. In this paper, we first show how probabilistic information varying over time can be represented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.

1 TRAJECTORY ENCODING AND DECODING

We assume that population spikes $R(t)$ arise stochastically in relation to the trajectory $X(t)$ of an underlying (but hidden) variable. We use $X_T$ and $R_T$ for the whole trajectory and spike trains respectively, from time 0 to $T$. The spikes $R_T$ constitute the observations and are assumed to be probabilistically related to the signal by a tuning function $f(X, \theta_i)$:

$$P(R(i, T) \mid X(T)) \propto f(X, \theta_i) \tag{1}$$

for the spike train of the $i$th neuron, with parameters $\theta_i$. Therefore, via standard Bayesian inference, $R_T$ determines a distribution over the hidden variable at time $T$, $P(X(T) \mid R_T)$. We first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let $X(t)$ follow a stationary Gaussian process such that the joint distribution $P(X(t_1), X(t_2), \ldots, X(t_m))$ is Gaussian for any finite collection of times, with a covariance matrix which depends on time differences: $C_{tt'} = c(|t - t'|)$. The function $c(|\Delta t|)$ controls the smoothness of the resulting random walks.
Then,

$$P(X(T) \mid R_T) \propto p(X(T)) \int_{X_T} dX_T \, P(R_T \mid X_T) \, P(X_T \mid X(T)) \tag{2}$$

where $P(X_T \mid X(T))$ is the distribution over the whole trajectory $X_T$ conditional on the value of $X(T)$ at its end point. If $R_T$ are a set of conditionally independent inhomogeneous Poisson processes, we have

$$P(R_T \mid X_T) \propto \prod_{i\tau} f(X(t_{i\tau}), \theta_i) \, \exp\left( -\sum_i \int_\tau d\tau \, f(X(\tau), \theta_i) \right), \tag{3}$$

where $t_{i\tau} \; \forall \tau$ are the spike times $\tau$ of neuron $i$ in $R_T$. Let $\chi = [X(t_{i\tau})]$ be the vector of stimulus positions at the times at which we observed a spike and $\Theta = [\theta(t_{i\tau})]$ be the vector of spike positions. If the tuning functions are Gaussian, $f(X, \theta_i) \propto \exp(-(X - \theta_i)^2 / 2\sigma^2)$, and sufficiently dense that $\sum_i \int_\tau d\tau \, f(X, \theta_i)$ is independent of $X$ (a standard assumption in population coding), then $P(R_T \mid X_T) \propto \exp(-\|\chi - \Theta\|^2 / 2\sigma^2)$ and in Equation 2, we can marginalize out $X_T$ except at the spike times $t_{i\tau}$:

$$P(X(T) \mid R_T) \propto p(X(T)) \int_\chi d\chi \, \exp\left( -[\chi, X(T)]^T \frac{C^{-1}}{2} [\chi, X(T)] - \frac{\|\chi - \Theta\|^2}{2\sigma^2} \right) \tag{4}$$

and $C$ is the block covariance matrix between $X(t_{i\tau})$, $X(T)$ at the spike times $[t_{i\tau}]$ and the final time $T$. This Gaussian integral has $P(X(T) \mid R_T) \sim N(\mu(T), \nu(T))$, with

$$\mu(T) = C_{Tt}(C_{tt} + I\sigma^2)^{-1}\Theta = k\Theta, \qquad \nu(T) = C_{TT} - kC_{tT}. \tag{5}$$

$C_{TT}$ is the $T, T$th element of the covariance matrix and $C_{Tt}$ is similarly a row vector. The dependence in $\mu$ on past spike times is specified chiefly by the inverse covariance matrix, and acts as an effective kernel ($k$). This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train $R_T$. For example, consider where $X(t)$ evolves according to a diffusion process with drift:

$$dX = -\alpha X \, dt + \sigma \, dN(t) \tag{6}$$

where $\alpha$ prevents it from wandering too far, and $N(t)$ is white Gaussian noise with mean zero and $\sigma^2$ variance. Figure 1A shows sample kernels for this process. Inspection of Figure 1A reveals some important traits.
First, the monotonically decreasing kernel magnitude as the time span between the spike and the current time $T$ grows matches the intuition that recent spikes play a more significant role in determining the posterior over $X(T)$. Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel $k$ can be locally well approximated by a metronomic kernel $k_s$ (shown in red in Figure 1A) that assumes regular spiking. This takes advantage of the general fact, indicated by the grouping of kernels, that the kernel depends weakly on the actual spike pattern, but strongly on the average rate. The merits of the metronomic kernel are that it is stationary and only depends on a single mean rate rather than the full spike train $R_T$. It also justifies

[Figure 1, panels A to E. Panel titles: Kernels $k$ and $k_s$; Variance ratio; True stimulus and means; Full kernel; Regular, stationary kernel. Axes: kernel size (weight), $\nu^2/\sigma^2$, space, time.]

Figure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and $P(X(T) \mid R_T)$ in gray. Blue: exact inference with nonstationary and red: approximate inference with regular spiking. A: Kernel samples for a diffusion process as defined by equations 5, 6. B, C: Mean and variance of the inference. D: Exact inference with full kernel $k$ and E: approximation based on metronomic kernel $k_s$.
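The posterior mean and variance in (5) can be sketched for a handful of spikes. This is a minimal sketch under stated assumptions, not the authors' implementation: the covariance function `ou_cov` assumes the stationary covariance $c(\Delta t) = \frac{\sigma_n^2}{2\alpha} e^{-\alpha|\Delta t|}$ of a drift-diffusion process like (6), and the names `ou_cov`, `solve`, `decode`, the parameter values, and the example spike times are all hypothetical.

```python
import math

def ou_cov(dt, alpha=10.0, noise_var=1.0):
    """Assumed stationary covariance c(|dt|) for the process in (6)."""
    return noise_var / (2.0 * alpha) * math.exp(-alpha * abs(dt))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def decode(spike_times, spike_positions, T, sigma2=0.01):
    """Kernel k, posterior mean mu(T) and variance nu(T), following (5)."""
    n = len(spike_times)
    # C_tt + I sigma^2: prior covariance at the spike times plus tuning width.
    A = [[ou_cov(spike_times[a] - spike_times[b]) + (sigma2 if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    c_Tt = [ou_cov(T - t) for t in spike_times]
    # A is symmetric, so k = C_Tt A^{-1} is obtained by solving A k = C_tT.
    k = solve(A, c_Tt)
    mu = sum(ki * th for ki, th in zip(k, spike_positions))
    nu = ou_cov(0.0) - sum(ki * ci for ki, ci in zip(k, c_Tt))
    return k, mu, nu

# Three spikes with preferred positions Theta, decoded at T = 0.09.
k, mu, nu = decode([0.03, 0.05, 0.08], [0.1, 0.2, 0.15], 0.09)
print("kernel weights:", k)
print("posterior mean:", mu, "posterior variance:", nu)
```

Printing `k` for different spike spacings is one way to see how the kernel depends on the local spike density, which is the non-stationarity discussed above.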
5 0.2572276 28 nips-2004-Bayesian inference in spiking neurons
Author: Sophie Deneve
Abstract: We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing a variant of belief propagation.

Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3]. A few important but hidden properties, such as direction of motion, or appropriate motor commands, are inferred from many noisy, local and ambiguous sensory cues. This evidence is combined with priors about the sensory world and body. Importantly, because most of these inferences should lead to quick and irreversible decisions in a perpetually changing world, noisy cues have to be integrated on-line, but in a way that takes into account unpredictable events, such as a sudden change in motion direction or the appearance of a new stimulus. This raises the question of how this temporal integration can be performed at the neural level. It has been proposed that single neurons in sensory cortices represent and compute the log probability that a sensory variable takes on a certain value (e.g., is visual motion in the neuron's preferred direction?) [9, 7]. Alternatively, to avoid normalization issues and provide an appropriate signal for decision making, neurons could represent the log probability ratio of a particular hypothesis (e.g., is motion more likely to be towards the right than towards the left?) [7, 6].
Log probabilities are convenient here, since under some assumptions, independent noisy cues simply combine linearly. Moreover, there is physiological evidence for the neural representation of log probabilities and log probability ratios [9, 6, 7]. However, these models assume that neurons represent probabilities in their firing rates. We argue that it is important to study how probabilistic information is encoded in spikes. Indeed, it seems spurious to marry the idea of an exquisite on-line integration of noisy cues with an underlying rate code that requires averaging on large populations of noisy neurons and long periods of time. In particular, most natural tasks require this integration to take place on the time scale of inter-spike intervals. Spikes are more efficient at signaling events than analog quantities. In addition, a neural theory of inference with spikes will bring us closer to the physiological level and generate more easily testable predictions. Thus, we propose a new theory of neural processing in which spike trains provide a deterministic, online representation of a log-probability ratio. Spikes signal events, e.g. that the log-probability ratio has exceeded what could be predicted from previous spikes. This form of coding was loosely inspired by the idea of "energy landscape" coding proposed by Hinton and Brown [2]. However, contrary to [2] and other theories using rate-based representation of probabilities, this model is self-consistent and does not require different models for encoding and decoding: as output spikes provide new, unpredictable, temporally independent evidence, they can be used directly as an input to other Bayesian neurons. Finally, we show that these neurons can be used as building blocks in a theory of approximate Bayesian inference in recurrent spiking networks. Connections between neurons implement an underlying Bayesian network, consisting of coupled hidden Markov models.

∗ Institute of Cognitive Science, 69645 Bron, France
Propagation of spikes is a form of belief propagation in this underlying graphical model. Our theory provides computational explanations of some general physiological properties of cortical neurons, such as spike frequency adaptation, Poisson statistics of spike trains, the existence of strong local inhibition in cortical columns, and the maintenance of a tight balance between excitation and inhibition. Finally, we discuss the implications of this model for the debate about temporal versus rate-based neural coding.

1 Spikes and log posterior odds

1.1 Synaptic integration seen as inference in a hidden Markov chain

We propose that each neuron codes for an underlying "hidden" binary variable, $x_t$, whose state evolves over time. We assume that $x_t$ depends only on the state at the previous time step, $x_{t-dt}$, and is conditionally independent of other past states. The state $x_t$ can switch from 0 to 1 with a constant rate $r_{on} = \lim_{dt \to 0} \frac{1}{dt} P(x_t = 1 \mid x_{t-dt} = 0)$, and from 1 to 0 with a constant rate $r_{off}$. For example, these transition rates could represent how often motion in a preferred direction appears in the receptive field and how long it is likely to stay there. The neuron infers the state of its hidden variable from $N$ noisy synaptic inputs, considered to be observations of the hidden state. In this initial version of the model, we assume that these inputs are conditionally independent homogeneous Poisson processes, synapse $i$ emitting a spike between time $t$ and $t + dt$ ($s^i_t = 1$) with constant probability $q^i_{on}\,dt$ if $x_t = 1$, and another constant probability $q^i_{off}\,dt$ if $x_t = 0$. The synaptic spikes are assumed to be otherwise independent of previous synaptic spikes, previous states and spikes at other synapses. The resulting generative model is a hidden Markov chain (figure 1-A).
However, rather than estimating the state of its hidden variable and communicating this estimate to other neurons (for example by emitting a spike when sensory evidence for $x_t = 1$ goes above a threshold), the neuron reports and communicates its certainty that the current state is 1. This certainty takes the form of the log of the ratio of the probability that the hidden state is 1, and the probability that the state is 0, given all the synaptic inputs received so far: $L_t = \log \frac{P(x_t = 1 \mid s_{0 \to t})}{P(x_t = 0 \mid s_{0 \to t})}$. We use $s_{0 \to t}$ as a shorthand notation for the $N$ synaptic inputs received at present and in the past. We will refer to $L_t$ as the log odds ratio. Thanks to the conditional independencies assumed in the generative model, we can compute this log odds ratio iteratively. Taking the limit as $dt$ goes to zero, we get the following differential equation:

$$\dot L = r_{on}\left(1 + e^{-L}\right) - r_{off}\left(1 + e^{L}\right) + \sum_i w_i \,\delta(s^i_t - 1) - \theta$$

[Figure 1, panels A to E. Axes recoverable from the extraction: log odds, time, count, ISI.]

Figure 1: A. Generative model for the synaptic input. B. Schematic representation of log odds ratio encoding and decoding. The dashed circle represents both eventual downstream elements and the self-prediction taking place inside the model neuron. A spike is fired only when $L_t$ exceeds $G_t$. C. One example trial, where the state switches from 0 to 1 (shaded area) and back to 0. Plain: $L_t$, dotted: $G_t$. Black stripes at the top: corresponding spike train. D. Mean log odds ratio (dark line) and mean output firing rate (clear line). E. Output spike raster plot (1 line per trial) and ISI distribution for the neuron shown in C and D. Clear line: ISI distribution for a Poisson neuron with the same rate.

$w_i$, the synaptic weight, describes how informative synapse $i$ is about the state of the hidden variable, e.g. $w_i = \log \frac{q^i_{on}}{q^i_{off}}$. Each synaptic spike ($s^i_t = 1$) gives an impulse to the log odds ratio, which is positive if this synapse is more active when the hidden state is 1 (i.e. it increases the neuron's confidence that the state is 1), and negative if this synapse is more active when $x_t = 0$ (i.e. it decreases the neuron's confidence that the state is 1). The bias, $\theta$, is determined by how informative it is not to receive any spike, e.g. $\theta = \sum_i \left(q^i_{on} - q^i_{off}\right)$. By convention, we will consider that the "bias" is positive or zero (if not, we need simply to invert the status of the state $x$).

1.2 Generation of output spikes

The spike train should convey a sparse representation of $L_t$, so that each spike reports new information about the state $x_t$ that is not redundant with that reported by other, preceding, spikes. This proposition is based on three arguments: First, spikes, being metabolically expensive, should be kept to a minimum. Second, spikes conveying redundant information would require a decoding of the entire spike train, whereas independent spikes can be taken into account individually. And finally, we seek a self-consistent model, with the spiking output having a similar semantics to its spiking input. To maximize the independence of the spikes (conditioned on $x_t$), we propose that the neuron fires only when the difference between its log odds ratio $L_t$ and a prediction $G_t$ of this log odds ratio based on the output spikes emitted so far reaches a certain threshold. Indeed, supposing that downstream elements predict $L_t$ as best as they can, the neuron only needs to fire when it expects that prediction to be too inaccurate (figure 1-B). In practice, this will happen when the neuron receives new evidence for $x_t = 1$. $G_t$ should thereby follow the same dynamics as $L_t$ when spikes are not received. The equations for $G_t$ and the output $O_t$ ($O_t = 1$ when an output spike is fired) are given by:

$$\dot G = r_{on}\left(1 + e^{-G}\right) - r_{off}\left(1 + e^{G}\right) + g_o \,\delta(O_t - 1) \tag{1}$$

$$O_t = 1 \text{ when } L_t > G_t + \frac{g_o}{2}, \quad 0 \text{ otherwise.} \tag{2}$$

Here $g_o$, a positive constant, is the only free parameter, the other parameters being constrained by the statistics of the synaptic input.

1.3 Results

Figure 1-C plots a typical trial, showing the behavior of $L$, $G$ and $O$ before, during and after presentation of the stimulus. As random synaptic inputs are integrated, $L$ fluctuates and eventually exceeds $G + 0.5$, leading to an output spike. Immediately after a spike, $G$ jumps to $G + g_o$, which prevents (except in very rare cases) a second spike from immediately following the first. Thus, this "jump" implements a relative refractory period. However, $G$ decays as it tends to converge back to its stable level $g_{stable} = \log \frac{r_{on}}{r_{off}}$. Thus $L$ eventually exceeds $G$ again, leading to a new spike. This threshold crossing happens more often during stimulation ($x_t = 1$) as the net synaptic input alters to create a higher overall level of certainty, $L_t$.

Mean log odds ratio and output firing rate

The mean firing rate $\bar O_t$ of the Bayesian neuron during presentation of its preferred stimulus (i.e. when $x_t$ switches from 0 to 1 and back to 0) is plotted in figure 1-D, together with the mean log posterior ratio $\bar L_t$, both averaged over trials. Not surprisingly, the log-posterior ratio reflects the leaky integration of synaptic evidence, with an effective time constant that depends on the transition probabilities $r_{on}$, $r_{off}$. If the state is very stable ($r_{on} = r_{off} \sim 0$), synaptic evidence is integrated over almost infinite time periods, the mean log posterior ratio tending to either increase or decrease linearly with time. In the example in figure 1-D, the state is less stable, so "old" synaptic evidence is discounted and $L_t$ saturates. In contrast, the mean output firing rate $\bar O_t$ tracks the state of $x_t$ almost perfectly. This is because, as a form of predictive coding, the output spikes reflect the new synaptic evidence, $I_t = \sum_i w_i \,\delta(s^i_t - 1) - \theta$, rather than the log posterior ratio itself.
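The dynamics of $L_t$, $G_t$ and $O_t$ described above can be sketched with a simple Euler integration. This is a minimal sketch, not the author's simulation code: the step size, the parameter values, and the names `drift` and `run_neuron` are illustrative assumptions, and each input spike is treated as an impulse that adds $w_i$ directly to $L$.

```python
import math

def drift(v, r_on, r_off):
    """Shared drift term r_on (1 + e^-v) - r_off (1 + e^v) of L and G."""
    return r_on * (1.0 + math.exp(-v)) - r_off * (1.0 + math.exp(v))

def run_neuron(inputs, weights, theta, r_on, r_off, g_o=1.0, dt=1e-3):
    """Euler-integrate L and G; inputs[t][i] is 1 if synapse i spiked in step t."""
    L = G = math.log(r_on / r_off)   # start both at the stable level g_stable
    L_trace, G_trace, O_trace = [], [], []
    for s_t in inputs:
        # Input spikes are delta impulses: each one adds its weight w_i to L.
        impulse = sum(w * s for w, s in zip(weights, s_t))
        L += dt * (drift(L, r_on, r_off) - theta) + impulse
        G += dt * drift(G, r_on, r_off)
        # Fire when L exceeds the prediction G by more than g_o / 2.
        fired = 1 if L > G + g_o / 2.0 else 0
        if fired:
            G += g_o                 # the prediction jumps by g_o after a spike
        L_trace.append(L)
        G_trace.append(G)
        O_trace.append(fired)
    return L_trace, G_trace, O_trace

# One synapse with weight 0.8 spiking once at step 100, then silence.
burst = [[0]] * 100 + [[1]] + [[0]] * 5000
L_trace, G_trace, O_trace = run_neuron(burst, [0.8], 0.0, 1.0, 2.0)
print("output spikes at steps:", [t for t, o in enumerate(O_trace) if o])
print("final G:", G_trace[-1], "stable level:", math.log(1.0 / 2.0))
```

With these illustrative values the single input impulse pushes $L$ above $G + g_o/2$, one output spike follows, and $G$ then relaxes back towards $\log(r_{on}/r_{off})$, which is the relative refractory behaviour described in the text.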
In particular, the mean output firing rate is a rectified linear function of the mean input, e.g. $\bar O = \frac{1}{g_o} \bar I = \frac{1}{g_o} \left[ \sum_i w_i \, q^i_{on(off)} - \theta \right]^+$.

Analogy with a leaky integrate and fire neuron

We can get an interesting insight into the computation performed by this neuron by linearizing $L$ and $G$ around their mean levels over trials. Here we reduce the analysis to prolonged, statistically stable periods when the state is constant (either ON or OFF). In this case, the mean level of certainty $\bar L$ and its output prediction $\bar G$ are also constant over time. We make the rough approximation that the post spike jump, $g_o$, and the input fluctuations are small compared to the mean level of certainty $\bar L$. Rewriting $V_t = L_t - G_t + \frac{g_o}{2}$ as the "membrane potential" of the Bayesian neuron:

$$\dot V = -k_L V + I_t - \Delta g_o - g_o O_t$$

where $k_L = r_{on} e^{-\bar L} + r_{off} e^{\bar L}$, the "leak" of the membrane potential, depends on the overall level of certainty. $\Delta g_o$ is positive and a monotonic increasing function of $g_o$.

[Figure 2, panels A to D. Labels recoverable from the extraction: log odds, time, no inh, feedback, tiger, stripes.]

Figure 2: A. Bayesian causal network for $y_t$ (tiger), $x^1_t$ (stripes) and $x^2_t$ (paws). B. A feedforward network computing the log posterior for $x^1_t$. C. A recurrent network computing the log posterior odds for all variables. D. Log odds ratio in a simulated trial with the network in C (see text). Thick line: $L^{x^2}_t$, thin line: $L^{x^1}_t$, dash-dotted: $L^{x^1}_t$ without inhibition. Inset: $L^{x^2}_t$ averaged over trials, showing the effect of feedback.

The linearized Bayesian neuron thus acts in its stable regime as a leaky integrate and fire (LIF) neuron. The membrane potential $V_t$ integrates its input, $J_t = I_t - \Delta g_o$, with a leak $k_L$. The neuron fires when its membrane potential reaches a constant threshold $g_o$.
After each spike, $V_t$ is reset to 0. Interestingly, for an appropriately chosen compression factor $g_o$, the mean input to the linearized neuron is $\bar J = \bar I - \Delta g_o \approx 0$.¹ This means that the membrane potential is purely driven to its threshold by input fluctuations, or a random walk in membrane potential. As a consequence, the neuron's firing will be memoryless, and close to a Poisson process. In particular, we found a Fano factor close to 1 and a quasi-exponential ISI distribution (figure 1-E) on the entire range of parameters tested. Indeed, LIF neurons with balanced inputs have been proposed as a model to reproduce the statistics of real cortical neurons [8]. This balance is implemented in our model by the neuron's effective self-inhibition, even when the synaptic input itself is not balanced.

Decoding

As we previously said, downstream elements could predict the log odds ratio $L_t$ by computing $G_t$ from the output spikes (Eq 1, fig 1-B). Of course, this requires an estimate of the transition probabilities $r_{on}$, $r_{off}$, that could be learned from the observed spike trains. However, we show next that explicit decoding is not necessary to perform Bayesian inference in spiking networks. Intuitively, this is because the quantity that our model neurons receive and transmit, e.g. new information, is exactly what probabilistic inference algorithms propagate between connected statistical elements.

¹ Even if $g_o$ is not chosen optimally, the influence of the drift $\bar J$ is usually negligible compared to the large fluctuations in membrane potential.

2 Bayesian inference in cortical networks

The model neurons, having the same input and output semantics, can be used as building blocks to implement more complex generative models consisting of coupled Markov chains. Consider, for example, the network in figure 2-A.
Here, a "parent" variable $x^1_t$ (the presence of a tiger) can cause the state of $n$ other "children" variables ($[x^k_t]_{k=2 \ldots n}$), of whom two are represented (the presence of stripes, $x^2_t$, and motion, $x^3_t$). The "children" variables are Bayesian neurons identical to those described previously. The resulting Bayesian network consists of $n + 1$ coupled hidden Markov chains. Inference in this architecture corresponds to computing the log posterior odds ratio for the tiger, $x^1_t$, and the log posterior of observing stripes or motion, ($[x^k_t]_{k=2 \ldots n}$), given the synaptic inputs received by the entire network so far, i.e. $s^2_{0 \to t}, \ldots, s^k_{0 \to t}$. Unfortunately, inference and learning in this network (and in general in coupled Markov chains) requires very expensive computations, and cannot be performed by simply propagating messages over time and among the variable nodes. In particular, the state of a child variable $x^k_t$ depends on $x^k_{t-dt}$, $s^k_t$, $x^1_t$ and the state of all other children at the previous time step, [xj ]2
7 0.17991394 118 nips-2004-Methods for Estimating the Computational Power and Generalization Capability of Neural Microcircuits
8 0.17511015 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
9 0.16418907 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
10 0.16342628 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
11 0.14725797 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
12 0.14240375 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity
13 0.13633128 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
14 0.12333293 157 nips-2004-Saliency-Driven Image Acuity Modulation on a Reconfigurable Array of Spiking Silicon Neurons
15 0.11548892 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons
16 0.082352147 73 nips-2004-Generative Affine Localisation and Tracking
17 0.070062555 200 nips-2004-Using Random Forests in the Structured Language Model
18 0.067130812 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
19 0.065948449 35 nips-2004-Chemosensory Processing in a Spiking Model of the Olfactory Bulb: Chemotopic Convergence and Center Surround Inhibition
20 0.061985124 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
topicId topicWeight
[(0, -0.187), (1, -0.466), (2, -0.158), (3, 0.102), (4, -0.019), (5, -0.005), (6, -0.072), (7, 0.006), (8, -0.009), (9, 0.087), (10, -0.011), (11, 0.045), (12, 0.049), (13, 0.061), (14, -0.092), (15, 0.031), (16, 0.123), (17, 0.103), (18, -0.003), (19, 0.056), (20, -0.025), (21, -0.079), (22, -0.099), (23, 0.111), (24, 0.008), (25, 0.064), (26, -0.125), (27, 0.08), (28, 0.025), (29, -0.089), (30, 0.01), (31, 0.028), (32, -0.026), (33, 0.01), (34, 0.012), (35, 0.01), (36, 0.023), (37, 0.043), (38, 0.049), (39, -0.02), (40, 0.039), (41, -0.05), (42, 0.005), (43, -0.03), (44, -0.005), (45, -0.057), (46, -0.019), (47, -0.035), (48, 0.038), (49, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.97533721 112 nips-2004-Maximising Sensitivity in a Spiking Network
Author: Anthony J. Bell, Lucas C. Parra
Abstract: We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output 'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.

1 Maximising Sensitivity.

In this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. We cannot claim, as before, that this gives us an information maximisation algorithm, for reasons that we will delay addressing until Section 3. But for now, to first develop our approach, we will explore an interim objective function called sensitivity which we define as the log Jacobian of how input spike timings affect output spike timings.

1.1 How to maximise the effect of one spike timing on another.
Consider a spike in neuron $j$ at time $t_l$ that has an effect on the timing of another spike in neuron $i$ at time $t_k$. The neurons are connected by a weight $w_{ij}$. We use $i$ and $j$ to index neurons, and $k$ and $l$ to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. For example, $w_{kl}$, the weight between an input spike $l$ and an output spike $k$, is naturally understood to be just the corresponding $w_{ij}$.

[Figure 1. Labels: threshold potential, resting potential, $u(t)$, $R(t)$, $du$, $dt_k$, $dt_l$, output spikes $t_k$, input spikes $t_l$.]

Figure 1: Firing time $t_k$ is determined by the time of threshold crossing. A change of an input spike time $dt_l$ affects, via a change of the membrane potential $du$, the time of the output spike by $dt_k$.

In the simplest version of the Spike Response Model [7], spike $l$ has an effect on spike $k$ that depends on the time-course of the evoked EPSP or IPSP, which we write as $R_{kl}(t_k - t_l)$. In general, this $R_{kl}$ models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. For learning, we need only consider the value of this function when an output spike, $k$, occurs. In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, $u_i(t)$, reaches threshold at time $t_k$. This threshold we will often, again for convenience, write as $u_k \equiv u_i(t_k, \{t_l\})$, and it is given by a sum over spikes $l$:

$$u_k = \sum_l w_{kl} R_{kl}(t_k - t_l). \tag{1}$$

To maximise timing sensitivity, we need to determine the effect of a small change in the input firing time $t_l$ on the output firing time $t_k$. (A related problem is tackled in [2].) When $t_l$ is changed by a small amount $dt_l$ the membrane potential will change as a result. This change in the membrane potential leads to a change in the time of threshold crossing $dt_k$. The contribution to the membrane potential, $du$, due to $dt_l$ is $(\partial u_k / \partial t_l)\,dt_l$, and the change in $du$ corresponding to a change $dt_k$ is $(\partial u_k / \partial t_k)\,dt_k$.
We can relate these two effects by noting that the total change of the membrane potential $du$ has to vanish because $u_k$ is defined as the potential at threshold, ie:

$$du = \frac{\partial u_k}{\partial t_k}\,dt_k + \frac{\partial u_k}{\partial t_l}\,dt_l = 0. \tag{2}$$

This is the total differential of the function $u_k = u(t_k, \{t_l\})$, and is a special case of the implicit function theorem. Rearranging this:

$$\frac{dt_k}{dt_l} = -\frac{\partial u_k}{\partial t_l} \Big/ \frac{\partial u_k}{\partial t_k} = w_{kl}\dot{R}_{kl}/\dot{u}_k. \tag{3}$$

Here $\dot{R}_{kl}$ is the derivative of $R_{kl}$ with respect to its argument $t_k - t_l$ and $\dot{u}_k = \partial u_k/\partial t_k$, so that $\partial u_k/\partial t_l = -w_{kl}\dot{R}_{kl}$ and the two minus signs cancel, in agreement with the general form (16) in the Appendix. Only the absolute value of this quantity enters the sensitivity measure.

Now, to connect with the standard ICA derivation [4], recall the ‘rate’ (or sigmoidal) neuron, for which $y_i = g_i(u_i)$ and $u_i = \sum_j w_{ij} x_j$. For this neuron, the output dependence on input is $\partial y_i/\partial x_j = w_{ij}\, g_i'$ while the learning gradient is:

$$\frac{\partial}{\partial w_{ij}} \log\left|\frac{\partial y_i}{\partial x_j}\right| = \frac{1}{w_{ij}} - f_i(u_i)\,x_j \tag{4}$$

where the ‘score functions’, $f_i$, are defined in terms of a density estimate on the summed inputs: $f_i(u_i) = -\frac{\partial}{\partial u_i}\log g_i' = -\frac{\partial}{\partial u_i}\log \hat{p}(u_i)$.

The analogous learning gradient for the spiking case, from (3), is:

$$\frac{\partial}{\partial w_{ij}} \log\left|\frac{dt_k}{dt_l}\right| = \frac{1}{w_{ij}} - \frac{\sum_a j(a)\,\dot{R}_{ka}}{\dot{u}_k} \tag{5}$$

where $j(a) = 1$ if spike $a$ came from neuron $j$, and 0 otherwise.

Comparing the two cases in (4) and (5), we see that the input variable $x_j$ has become the temporal derivative of the sum of the EPSPs coming from synapse $j$, and the output variable (or score function) $f_i(u_i)$ has become $\dot{u}_k^{-1}$, the inverse of the temporal derivative of the membrane potential at threshold. It is intriguing (A) to see this quantity appear as analogous to the score function in the ICA likelihood model, and (B) to speculate that experiments could show that this ‘voltage slope at threshold’ is a hidden factor in STDP data, explaining some of the scatter in Figure 3A. In other words, an STDP datapoint should lie on a 2-surface in a 3D space of $\{\Delta w, \Delta t, \dot{u}_k\}$. Incidentally, $\dot{u}_k$ shows up in any learning rule optimising an objective function involving output spike timings.

1.2 How to maximise the effect of N spike timings on N other ones.

Now we deal with the case of a ‘square’ single-layer feedforward mapping between spike timings.
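Before moving on, the single-spike sensitivity in (3) can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it assumes an alpha-function EPSP kernel, a fixed threshold, and three hand-picked input spikes (the names `TAU`, `THETA`, `R`, `R_dot` and `crossing_time` are illustrative). It finds the threshold crossing of (1) by bisection and compares a finite-difference estimate of $dt_k/dt_l$ with $w_{kl}\dot{R}_{kl}/\dot{u}_k$, taking $\dot{R}$ as the derivative of the kernel with respect to its argument; the overall sign is immaterial for $\log|dt_k/dt_l|$.

```python
import numpy as np

TAU = 5.0    # EPSP time constant in ms (illustrative assumption)
THETA = 1.0  # firing threshold (illustrative assumption)

def R(s):
    """Alpha-function EPSP kernel (an assumed shape), zero for s <= 0."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, (s / TAU) * np.exp(1.0 - s / TAU), 0.0)

def R_dot(s):
    """Derivative of R with respect to its argument s."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, (1.0 - s / TAU) / TAU * np.exp(1.0 - s / TAU), 0.0)

def membrane_potential(t, t_in, w):
    """Equation (1): weighted sum of EPSPs evaluated at time t."""
    return float(np.sum(w * R(t - t_in)))

def crossing_time(t_in, w, t_lo=0.0, t_hi=5.0, iters=200):
    """Bisection for u(t) = THETA on a bracket with u(t_lo) < THETA < u(t_hi)."""
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if membrane_potential(t_mid, t_in, w) < THETA:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)

t_in = np.array([0.0, 1.0, 2.0])  # input spike times t_l in ms
w = np.array([0.5, 0.4, 0.6])     # weights w_kl onto one output spike k

t_k = crossing_time(t_in, w)
slopes = w * R_dot(t_k - t_in)    # w_kl * Rdot_kl for each input spike
analytic = slopes / slopes.sum()  # divided by udot_k = sum_l w_kl * Rdot_kl

# Central finite differences: move one input spike, re-solve for t_k.
eps = 1e-5
numeric = np.empty_like(t_in)
for l in range(len(t_in)):
    shift = np.zeros_like(t_in)
    shift[l] = eps
    numeric[l] = (crossing_time(t_in + shift, w)
                  - crossing_time(t_in - shift, w)) / (2 * eps)

print("t_k =", t_k)
print("analytic dt_k/dt_l:", analytic)
print("numeric  dt_k/dt_l:", numeric)
```

In this simplest model the sensitivities of one output spike sum to one, since shifting every input spike by the same amount shifts the output spike by that amount.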
There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. This is captured in a Jacobian matrix of all timing dependencies we call $T$. The entries of this matrix are $T_{kl} \equiv \partial t_k/\partial t_l$. A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, ie: $\log|T|$. The full derivation for the gradient $\nabla_W \log|T|$ is in the Appendix. Here, we again draw out the analogy between Square ICA [4] and this gradient, as follows. Square ICA with a network $\mathbf{y} = g(W\mathbf{x})$ is:

$$\Delta W \propto \nabla_W \log|J| = W^{-T} - f(\mathbf{u})\,\mathbf{x}^T \tag{6}$$

where the Jacobian $J$ has entries $\partial y_i/\partial x_j$ and the score functions are now $f_i(\mathbf{u}) = -\frac{\partial}{\partial u_i}\log\hat{p}(\mathbf{u})$ for the general likelihood case, with $\hat{p}(\mathbf{u}) = \prod_i g_i'$ being the special case of ICA. We will now split the gradient in (6) according to the chain rule:

$$\nabla_W \log|J| = [\nabla_J \log|J|] \otimes [\nabla_W J] \tag{7}$$

$$= J^{-T} \otimes \left[ J_{kl}\, i(k) \left( \frac{j(l)}{w_{kl}} - f_k(\mathbf{u})\,x_j \right) \right]. \tag{8}$$

In this equation, $i(k) = \delta_{ik}$ and $j(l) = \delta_{jl}$. The righthand term is a 4-tensor with entries $\partial J_{kl}/\partial w_{ij}$, and $\otimes$ is defined as $[A \otimes B]_{ij} = \sum_{kl} A_{kl} B_{klij}$. We write the gradient this way to preserve, in the second term, the independent structure of the $1 \to 1$ gradient term in (4), and to separate a difficult derivation into two easy parts. The structure of (8) holds up when we move to the spiking case, giving:

$$\nabla_W \log|T| = [\nabla_T \log|T|] \otimes [\nabla_W T] \tag{9}$$

$$= T^{-T} \otimes \left[ T_{kl}\, i(k) \left( \frac{j(l)}{w_{kl}} - \frac{\sum_a j(a)\,\dot{R}_{ka}}{\dot{u}_k} \right) \right] \tag{10}$$

where $i(k)$ is now defined as being 1 if spike $k$ occurred in neuron $i$, and 0 otherwise; $j(l)$ and $j(a)$ are analogously defined. Because the $T$ matrix is much bigger than the $J$ matrix, and because its entries are more complex, here the similarity ends.
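The Square ICA gradient in (6) is easy to check numerically, which may help in reading the analogy. The sketch below is only an illustration under stated assumptions: a logistic nonlinearity $g$, for which $f(u) = -\partial \log g'/\partial u = 2y - 1$, a random $3 \times 3$ weight matrix, and the first term of (6) taken as the inverse transpose so that its $(i,j)$ entry pairs with $w_{ij}$. All function names are hypothetical.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_abs_det_jacobian(W, x):
    """log|det J| with J_ij = dy_i/dx_j = g'(u_i) * w_ij for y = g(W x)."""
    y = logistic(W @ x)
    J = (y * (1.0 - y))[:, None] * W
    return np.linalg.slogdet(J)[1]

def ica_gradient(W, x):
    """Right-hand side of (6) for logistic g, where f(u) = 2y - 1."""
    y = logistic(W @ x)
    return np.linalg.inv(W).T - np.outer(2.0 * y - 1.0, x)

def finite_difference_gradient(W, x, eps=1e-6):
    """Central differences of log|det J| with respect to each w_ij."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            dW = np.zeros_like(W)
            dW[i, j] = eps
            G[i, j] = (log_abs_det_jacobian(W + dW, x)
                       - log_abs_det_jacobian(W - dW, x)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
W = np.eye(3) + 0.2 * rng.standard_normal((3, 3))  # arbitrary test weights
x = rng.standard_normal(3)                          # arbitrary test input

max_err = np.max(np.abs(ica_gradient(W, x) - finite_difference_gradient(W, x)))
print("max |analytic - numeric| =", max_err)
```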
When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields:

$$\Delta w_{kl} \propto \frac{\partial \log|T|}{\partial w_{kl}} = \frac{T_{kl}}{w_{kl}}\left( [T^{-1}]_{lk} - 1 \right). \tag{11}$$

This is a non-local update involving a matrix inverse at each step. In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike timing case, this has turned out not to be possible, because of the additional asymmetry introduced into the $T$ matrix (as opposed to the $J$ matrix) by the $\dot{R}_{kl}$ term in (3).

2 Results.

Nonetheless, this learning rule can be simulated. It requires running the network for a while to generate spikes (and a corresponding $T$ matrix), and then for each input/output spike coupling, the corresponding synapse is updated according to (11). When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. It is happy to do this, if a large $\log|T|$ can thereby be achieved, because we have not included this ‘which neuron’ information in the objective. We will address these and other problems in Section 3, but now we report on our simulation results on demultiplexing.

2.1 Demultiplexing spike trains.

An interesting possibility in the brain is that ‘patterns’ are embedded in spatially distributed spike timings that are input to neurons. Several patterns could be embedded in single input trains. This is called multiplexing. To extract and propagate these patterns, the neurons must demultiplex these inputs using their threshold nonlinearity. Demultiplexing is the ‘point process’ analog of the unmixing of independent inputs in ICA. We have been able to robustly achieve demultiplexing, as we now report. We simulated a feed-forward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons.
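The update in (11) can be sketched in a few lines. This is not the simulation used for the results, only a check under simplifying assumptions: the simplest Spike Response Model, so that $T_{kl} = w_{kl}\dot{R}_{kl}/\sum_c w_{kc}\dot{R}_{kc}$; a square $T$; one independent weight per spike coupling, as in (11); and random stand-in values for $\dot{R}_{kl}$ with a strong diagonal so that $T$ is invertible. The function names are hypothetical.

```python
import numpy as np

def timing_jacobian(W, R_dot):
    """T_kl = w_kl * Rdot_kl / udot_k, with udot_k = sum_c w_kc * Rdot_kc."""
    A = W * R_dot
    return A / A.sum(axis=1, keepdims=True)

def sensitivity(W, R_dot):
    """The objective log|det T| for the current weights."""
    return np.linalg.slogdet(timing_jacobian(W, R_dot))[1]

def rule_11(W, R_dot):
    """Equation (11): (T_kl / w_kl) * ([T^-1]_lk - 1) for every coupling."""
    T = timing_jacobian(W, R_dot)
    return (T / W) * (np.linalg.inv(T).T - 1.0)

rng = np.random.default_rng(1)
N = 4  # number of input spikes = number of output spikes (square case)
W = rng.uniform(0.5, 1.5, size=(N, N))                      # weights w_kl
R_dot = rng.uniform(0.05, 0.3, size=(N, N)) + 3.0 * np.eye(N)  # stand-in slopes

grad = rule_11(W, R_dot)

# Central finite differences of log|det T| as an independent check.
eps = 1e-6
numeric = np.zeros_like(W)
for k in range(N):
    for l in range(N):
        dW = np.zeros_like(W)
        dW[k, l] = eps
        numeric[k, l] = (sensitivity(W + dW, R_dot)
                         - sensitivity(W - dW, R_dot)) / (2 * eps)

# One small gradient-ascent step, Delta w_kl proportional to (11).
W_new = W + 1e-3 * grad
print("max |analytic - numeric| =", np.max(np.abs(grad - numeric)))
print("log|T| before, after:", sensitivity(W, R_dot), sensitivity(W_new, R_dot))
```

In a network the couplings that share a synapse share the weight $w_{ij}$, so the per-coupling gradients would be accumulated onto the corresponding synapse; the sketch leaves that bookkeeping out.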
Learning followed (11), where we replace the inverse by the pseudo-inverse computed on the spikes generated during 0.5 s. The pseudo-inverse is necessary because even though, on average, the learning matches the number of output spikes to the number of input spikes, the matrix $T$ is still not usually square and so its actual inverse cannot be taken. In addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. This partially compensates for ignoring the ‘which neuron’ information, as explained above. Assuming a Poisson spike count $n_i$ for the $i$th output neuron with equal firing rate $\bar{n}_i$, it is easy to derive an approximate term that will control the spike count, $\sum_i (\bar{n}_i - n_i)$. The target firing rates $\bar{n}_i$ were set to match the “source” spike train in this example. The network learns to demultiplex mixed spike trains, as shown in Figure 2. This demultiplexing is a robust property of learning using (11) with this new spike-controlling term.

Figure 2: Unmixed spike trains. The input (top left) are 3 spike trains which are a mixture of three independent Poisson processes (bottom left). The network unmixes the spike trains to approximately recover the original (center left). In this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. The two panels at the right show the mixing (top) and synaptic weight matrix after training (bottom). (Panel titles: mixed input trains, output, original spike train, mixing, synaptic weights; horizontal axis: time in ms, 0 to 500.)

Finally, what about the spike-timing dependence of the observed learning? Does it match experimental results? The comparison is made in Figure 3, and the answer is no. There is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. In addition, it does not transition at zero (ie: when $t_k - t_l = 0$), but at a time offset by the rise time of the EPSPs. In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian, to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. However, the approximate inverse Hessian was singular, and we had to de-emphasise this result. It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP.

3 Discussion

Although these derivations started out smoothly, the reader possibly shares the authors’ frustration at the approximations involved here. Why isn’t this simple, like ICA? Why don’t we just have a nice maximum spikelihood model, ie: a density estimation algorithm for multivariate point processes, as ICA was a model in continuous space? We are going to be explicit about the problems now, and will propose a direction where the solution may lie. The over-riding problem is: we are unable to claim that in maximising $\log|T|$, we are maximising the mutual information between inputs and outputs because:

1. The Invertibility Problem. Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. The Spike Response Model is deterministic, but it is not invertible in general. When not invertible, the key formula (considering here vectors of input and output timings, $\mathbf{t}_{in}$ and $\mathbf{t}_{out}$) is transformed from simple to complex, ie:

$$p(\mathbf{t}_{out}) = \frac{p(\mathbf{t}_{in})}{|T|} \quad \text{becomes} \quad p(\mathbf{t}_{out}) = \int_{\text{solns } \mathbf{t}_{in}} d\mathbf{t}_{in}\, \frac{p(\mathbf{t}_{in})}{|T|} \tag{12}$$

Thus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here ‘solns’), something we simply don’t know.

2. The ‘Which Neuron’ Problem.
Instead of maximising the mutual information $I(\mathbf{t}_{out}, \mathbf{t}_{in})$, we should be maximising $I(\mathbf{ti}_{out}, \mathbf{ti}_{in})$, where the vector $\mathbf{ti}$ is the timing vector, $\mathbf{t}$, with the vector, $\mathbf{i}$, of corresponding neuron indices, concatenated. Thus, ‘who spiked?’ should be included in the analysis as it is part of the information.

3. The Predictive Information Problem. In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. The former has been entirely missing from our analysis so far.

Figure 3: Dependence of synaptic modification on pre/post inter-spike interval. Left (A): [From Froemke & Dan, Nature (2002)]. Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice. Naturalistic spike trains. Each point represents one experiment. Right (B): According to Equation (11). Each point corresponds to a spike pair between approximately 100 input and 100 output spikes. (Panel titles: (A) STDP, vertical axis $\Delta w / w$ (%); (B) Gradient, vertical axis $\Delta w$ (a.u.); horizontal axes: $\Delta t$ (ms).)

These temporal and spatial information dependencies, missing in our analysis so far, are thrown into a different light by a single empirical observation, which is that Spike Timing-Dependent Plasticity is not just a feedforward computation like the Spike Response Model. Specifically, there must be at least a statistical, if not a causal, relation between a real synapse’s plasticity and its neuron’s output spike timings, for Figure 3B to look like it does.
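Returning to the Invertibility Problem (item 1), the passage from the simple to the complex form of (12) can be illustrated with a one-dimensional toy map. This is not the spiking model: it assumes a standard normal input $x$ and the non-invertible map $y = x^2$, for which every $y > 0$ has two solutions $x = \pm\sqrt{y}$. Summing $p(x)/|dy/dx|$ over both solutions reproduces the known chi-square density with one degree of freedom, whereas keeping a single solution gives only half of it. All names are illustrative.

```python
import numpy as np

def p_in(x):
    """Standard normal density (the assumed input density)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def p_out(y):
    """Density of y = x**2: sum of p(x) / |dy/dx| over both solutions."""
    roots = np.array([np.sqrt(y), -np.sqrt(y)])
    jacobians = np.abs(2.0 * roots)  # |dy/dx| evaluated at each solution
    return float(np.sum(p_in(roots) / jacobians))

def p_out_single_solution(y):
    """What the invertible-case formula gives if only x = +sqrt(y) is used."""
    root = np.sqrt(y)
    return float(p_in(root) / abs(2.0 * root))

def chi2_density(y):
    """Closed-form chi-square density with one degree of freedom."""
    return float(np.exp(-0.5 * y) / np.sqrt(2.0 * np.pi * y))

for y in (0.1, 0.5, 2.0):
    print(y, p_out(y), p_out_single_solution(y), chi2_density(y))
```

In the spiking network the difficulty is that the set of input timings consistent with a given output, and their Jacobians, are not known, so the sum cannot be written down this easily.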
It seems we have to confront the need for both a ‘memory’ (or reconstruction) model, such as the $T$ we have thus far dealt with, in which output spikes talk about past inputs, and a ‘prediction’ model, in which they talk about future inputs. This is most easily understood from the point of view of Barber & Agakov’s variational Infomax algorithm [3]. They argue for optimising a lower bound on mutual information, which, for our neurons, would be expressed using an inverse model $\hat{p}$, as follows:

$$\hat{I}(\mathbf{ti}_{in}, \mathbf{ti}_{out}) = H(\mathbf{ti}_{in}) - \left\langle -\log \hat{p}(\mathbf{ti}_{in}|\mathbf{ti}_{out}) \right\rangle_{p(\mathbf{ti}_{in},\mathbf{ti}_{out})} \le I(\mathbf{ti}_{in}, \mathbf{ti}_{out}) \tag{13}$$

In a feedforward model, $H(\mathbf{ti}_{in})$ may be disregarded in taking gradients, leading us to the optimisation of a ‘memory-prediction’ model $\hat{p}(\mathbf{ti}_{in}|\mathbf{ti}_{out})$ related to something supposedly happening in dendrites, somas and at synapses. In trying to guess what this might be, it would be nice if the math worked out. We need a square Jacobian matrix, $T$, so that $|T| = \hat{p}(\mathbf{ti}_{in}|\mathbf{ti}_{out})$ can be our memory/prediction model. Now let’s rename our feedforward timing Jacobian $T$ (‘up the dendritic trees’) as $\overrightarrow{T}$, and let’s fantasise that there is some, as yet unspecified, feedback Jacobian $\overleftarrow{T}$ (‘down the dendritic trees’), which covers electrotonic influences as they spread from soma to synapse, and which $\overrightarrow{T}$ can be combined with by some operation ‘$\otimes$’ to make things square. Imagine further, that doing this yields a memory/prediction model on the inputs. Then the $T$ we are looking for is $\overrightarrow{T} \otimes \overleftarrow{T}$, and the memory-prediction model is: $\hat{p}(\mathbf{ti}_{in}|\mathbf{ti}_{out}) = |\overrightarrow{T} \otimes \overleftarrow{T}|$.

Ideally, the entries of $\overrightarrow{T}$ should be as before, ie: $\overrightarrow{T}_{kl} = \partial t_k/\partial t_l$. What should the entries of $\overleftarrow{T}$ be? Becoming just one step more concrete, suppose $\overleftarrow{T}$ had entries $\overleftarrow{T}_{lk} = \partial c_l/\partial t_k$, where $c_l$ is some, as yet unspecified, value, or process, occurring at an input synapse when spike $l$ comes in.
What seems clear is that $\otimes$ should combine the correctly tensorised forms of $\overrightarrow{T}$ and $\overleftarrow{T}$ (giving them each 4 indices $ijkl$), so that $T = \overrightarrow{T} \otimes \overleftarrow{T}$ sums over the spikes $k$ and $l$ to give an $I \times J$ matrix, where $I$ is the number of output neurons, and $J$ the number of input neurons. Then our quantity, $T$, would represent all dependencies of input neuronal activity on output activity, summed over spikes.

Further, we imagine that $\overleftarrow{T}$ contains reverse (feedback) electrotonic transforms from soma to synapse $\overleftarrow{R}_{lk}$ that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename $\overrightarrow{R}_{kl}$. Thinking for a moment in terms of somatic $k$ and synaptic $l$, voltages $V$, currents $I$ and linear cable theory, the synapse to soma transform, $\overrightarrow{R}_{kl}$, would be related to an impedance in $V_k = I_l \overrightarrow{Z}_{kl}$, while the soma to synapse transform, $\overleftarrow{R}_{lk}$, would be related to an admittance in $I_l = V_k \overleftarrow{Y}_{lk}$ [8]. The symmetry in these equations is that $\overrightarrow{Z}_{kl}$ is just the inverse conjugate of $\overleftarrow{Y}_{lk}$.

Finally, then, what is $c_l$? And what is its relation to the calcium concentration, $[\mathrm{Ca}^{2+}]_l$, at a synapse, when spike $l$ comes in? These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5].

4 Appendix: Gradient of log |T| for the full Spike Response Model.

Here we give full details of the gradient for Gerstner’s Spike Response Model [7]. This is a general model for which Integrate-and-Fire is a special case. In this model the effect of a presynaptic spike at time $t_l$ on the membrane potential at time $t$ is described by a post-synaptic potential or spike response, which may also depend on the time that has passed since the last output spike $t_{k-1}$, hence the spike response is written as $R(t - t_{k-1}, t - t_l)$. This response is weighted by the synaptic strength $w_l$.
Excitatory or inhibitory synapses are determined by the sign of $w_l$. Refractoriness is incorporated by adding a hyper-polarizing contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, $\eta(t - t_{k-1})$. The membrane potential as a function of time is therefore given by

$$u(t) = \eta(t - t_{k-1}) + \sum_l w_l R(t - t_{k-1}, t - t_l). \tag{14}$$

We have ignored here potential contributions from external currents, which can easily be included without modifying the following derivations. The output firing times $t_k$ are defined as the times for which $u(t)$ reaches firing threshold from below. We consider a dynamic threshold, $\vartheta(t - t_{k-1})$, which may depend on the time since the last spike $t_{k-1}$. Together, then, output spike times are defined implicitly by:

$$t = t_k : \quad u(t) = \vartheta(t - t_{k-1}) \quad \text{and} \quad \frac{du(t)}{dt} > 0. \tag{15}$$

For this more general model $T_{kl}$ is given by

$$T_{kl} = \frac{dt_k}{dt_l} = -\left( \frac{\partial u}{\partial t_k} - \frac{\partial \vartheta}{\partial t_k} \right)^{-1} \frac{\partial u}{\partial t_l} = \frac{w_{kl}\,\dot{R}(t_k - t_{k-1}, t_k - t_l)}{\dot{u}(t_k) - \dot{\vartheta}(t_k - t_{k-1})}, \tag{16}$$

where $\dot{R}(s, t)$, $\dot{u}(t)$, and $\dot{\vartheta}(t)$ are derivatives with respect to $t$. The dependence of $T_{kl}$ on $t_{k-1}$ should be implicitly assumed. It has been omitted to simplify the notation.

Now we compute the derivative of $\log|T|$ with respect to $w_{kl}$. For any nonsingular square matrix $T$ we have $\partial \log|T|/\partial T_{ab} = [T^{-1}]_{ba}$. Therefore:

$$\frac{\partial \log|T|}{\partial w_{kl}} = \sum_{ab} \frac{\partial \log|T|}{\partial T_{ab}} \frac{\partial T_{ab}}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \frac{\partial T_{ab}}{\partial w_{kl}}. \tag{17}$$

Utilising the Kronecker delta $\delta_{ab}$ = (1 if $a = b$, else 0), the derivative of (16) with respect to $w_{kl}$ gives:

$$\begin{aligned}
\frac{\partial T_{ab}}{\partial w_{kl}} &= \frac{\partial}{\partial w_{kl}} \left[ \frac{w_{ab}\,\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{\eta}(t_a - t_{a-1}) + \sum_c w_{ac}\,\dot{R}(t_a - t_{a-1}, t_a - t_c) - \dot{\vartheta}(t_a - t_{a-1})} \right] \\
&= \delta_{ak}\delta_{bl}\, \frac{\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1})} - \frac{w_{ab}\,\dot{R}(t_a - t_{a-1}, t_a - t_b)\,\delta_{ak}\,\dot{R}(t_a - t_{a-1}, t_a - t_l)}{\left( \dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1}) \right)^2} \\
&= \delta_{ak}\, T_{ab} \left[ \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right].
\end{aligned} \tag{18}$$

Therefore:

$$\begin{aligned}
\frac{\partial \log|T|}{\partial w_{kl}} &= \sum_{ab} [T^{-1}]_{ba}\, \delta_{ak}\, T_{ab} \left[ \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right] \\
&= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - \sum_b [T^{-1}]_{bk}\, T_{kb} \right) \qquad (19) \\
&= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right). \qquad (20)
\end{aligned}$$

Acknowledgments

We are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Denève, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. AJB thanks all RNI colleagues for many such discussions.

References

[1] Amari S-I. 1997. Natural gradient works efficiently in learning, Neural Computation, 10, 251-276
[2] Banerjee A. 2001. On the Phase-Space Dynamics of Systems of Spiking Neurons. Neural Computation, 13, 161-225
[3] Barber D. & Agakov F. 2003. The IM Algorithm: A Variational Approach to Information Maximization. Advances in Neural Information Processing Systems 16, MIT Press.
[4] Bell A.J. & Sejnowski T.J. 1995. An information maximization approach to blind separation and blind deconvolution, Neural Computation, 7, 1129-1159
[5] Dan Y. & Poo M-m. 2004. Spike timing-dependent plasticity of neural circuits, Neuron, 44, 23-30
[6] Froemke R.C. & Dan Y. 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 416: 433-438
[7] Gerstner W. & Kistler W.M. 2002. Spiking neuron models, Camb. Univ. Press
[8] Zador A.M., Agmon-Snir H. & Segev I. 1995. The morphoelectrotonic transform: a graphical approach to dendritic function, J. Neurosci., 15(3): 1669-1682
2 0.86814392 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern
Author: Kosuke Hamaguchi, Masato Okada, Kazuyuki Aihara
Abstract: Repeated spike patterns have often been taken as evidence for the synfire chain, a phenomenon in which stable spike synchrony propagates through a feedforward network. Inter-spike intervals which represent a repeated spike pattern are influenced by the propagation speed of a spike packet. However, the relation between the propagation speed and network structure is not well understood. While it is apparent that the propagation speed depends on the excitatory synapse strength, it might also be related to spike patterns. We analyze a feedforward network with Mexican-Hat-type connectivity (FMH) using the Fokker-Planck equation. We show that both a uniform and a localized spike packet are stable in the FMH in a certain parameter region. We also demonstrate that the propagation speed depends on the distinct firing patterns in the same network.
3 0.78604138 173 nips-2004-Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model
Author: Taro Toyoizumi, Jean-Pascal Pfister, Kazuyuki Aihara, Wulfram Gerstner
Abstract: We derive an optimal learning rule in the sense of mutual information maximization for a spiking neuron model. Under the assumption of small fluctuations of the input, we find a spike-timing dependent plasticity (STDP) function which depends on the time course of excitatory postsynaptic potentials (EPSPs) and the autocorrelation function of the postsynaptic neuron. We show that the STDP function has both positive and negative phases. The positive phase is related to the shape of the EPSP while the negative phase is controlled by neuronal refractoriness.
4 0.7712853 153 nips-2004-Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity
Author: Sander M. Bohte, Michael C. Mozer
Abstract: Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity or STDP. We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron’s variability to a given presynaptic input, causing the neuron’s output to become more reliable in the face of noise. Using an entropy-minimization objective function and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles, and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of cortical adaptation.
Author: Wolfgang Maass, Robert A. Legenstein, Nils Bertschinger
Abstract: What makes a neural microcircuit computationally powerful? Or more precisely, which measurable quantities could explain why one microcircuit C is better suited for a particular family of computational tasks than another microcircuit C′? We propose in this article quantitative measures for evaluating the computational power and generalization capability of a neural microcircuit, and apply them to generic neural microcircuit models drawn from different distributions. We validate the proposed measures by comparing their prediction with direct evaluations of the computational performance of these microcircuit models. This procedure is applied first to microcircuit models that differ with regard to the spatial range of synaptic connections and with regard to the scale of synaptic efficacies in the circuit, and then to microcircuit models that differ with regard to the level of background input currents and the level of noise on the membrane potential of neurons. In this case the proposed method allows us to quantify differences in the computational power and generalization capability of circuits in different dynamic regimes (UP- and DOWN-states) that have been demonstrated through intracellular recordings in vivo.
6 0.63924557 28 nips-2004-Bayesian inference in spiking neurons
7 0.6185416 148 nips-2004-Probabilistic Computation in Spiking Populations
8 0.52447921 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
9 0.50743949 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities
10 0.47149223 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
11 0.43732864 157 nips-2004-Saliency-Driven Image Acuity Modulation on a Reconfigurable Array of Spiking Silicon Neurons
12 0.42016861 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
14 0.37171933 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity
15 0.33503747 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
16 0.32049429 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons
17 0.25435174 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
18 0.22390486 132 nips-2004-Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis
19 0.21551223 184 nips-2004-The Cerebellum Chip: an Analog VLSI Implementation of a Cerebellar Model of Classical Conditioning
20 0.21495722 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
topicId topicWeight
[(1, 0.339), (13, 0.066), (15, 0.111), (26, 0.08), (31, 0.015), (33, 0.097), (35, 0.043), (39, 0.025), (44, 0.03), (50, 0.036), (82, 0.034), (87, 0.022)]
simIndex simValue paperId paperTitle
1 0.80217773 135 nips-2004-On-Chip Compensation of Device-Mismatch Effects in Analog VLSI Neural Networks
Author: Miguel Figueroa, Seth Bridges, Chris Diorio
Abstract: Device mismatch in VLSI degrades the accuracy of analog arithmetic circuits and lowers the learning performance of large-scale neural networks implemented in this technology. We show compact, low-power on-chip calibration techniques that compensate for device mismatch. Our techniques enable large-scale analog VLSI neural networks with learning performance on the order of 10 bits. We demonstrate our techniques on a 64-synapse linear perceptron learning with the Least-Mean-Squares (LMS) algorithm, and fabricated in a 0.35µm CMOS process.
same-paper 2 0.78912222 112 nips-2004-Maximising Sensitivity in a Spiking Network
Author: Anthony J. Bell, Lucas C. Parra
Abstract: We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike TimingDependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output ‘firing rates’ to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network. 1 Maximising Sensitivity. In this section, we will follow the structure of the ICA derivation [4] in developing the spiking theory. We cannot claim, as before, that this gives us an information maximisation algorithm, for reasons that we will delay addressing until Section 3. But for now, to first develop our approach, we will explore an interim objective function called sensitivity which we define as the log Jacobian of how input spike timings affect output spike timings. 1.1 How to maximise the effect of one spike timing on another. 
Consider a spike in neuron j at time tl that has an effect on the timing of another spike in neuron i at time tk . The neurons are connected by a weight wij . We use i and j to index neurons, and k and l to index spikes, but sometimes for convenience we will use spike indices in place of neuron indices. For example, wkl , the weight between an input spike l and an output spike k, is naturally understood to be just the corresponding wij . dtk dtl threshold potential du u(t) R(t) resting potential tk output spikes tl input spikes Figure 1: Firing time tk is determined by the time of threshold crossing. A change of an input spike time dtl affects, via a change of the membrane potential du the time of the output spike by dtk . In the simplest version of the Spike Response Model [7], spike l has an effect on spike k that depends on the time-course of the evoked EPSP or IPSP, which we write as R kl (tk − tl ). In general, this Rkl models both synaptic and dendritic linear responses to an input spike, and thus models synapse type and location. For learning, we need only consider the value of this function when an output spike, k, occurs. In this model, depicted in Figure 1, a neuron adds up its spiking inputs until its membrane potential, ui (t), reaches threshold at time tk . This threshold we will often, again for convenience, write as uk ≡ ui (tk , {tl }), and it is given by a sum over spikes l: uk = wkl Rkl (tk − tl ) . (1) l To maximise timing sensitivity, we need to determine the effect of a small change in the input firing time tl on the output firing time tk . (A related problem is tackled in [2].) When tl is changed by a small amount dtl the membrane potential will change as a result. This change in the membrane potential leads to a change in the time of threshold crossing dt k . The contribution to the membrane potential, du, due to dtl is (∂uk /∂tl )dtl , and the change in du corresponding to a change dtk is (∂uk /∂tk )dtk . 
We can relate these two effects by noting that the total change of the membrane potential du has to vanish because u k is defined as the potential at threshold. ie: du = ∂uk ∂uk dtk + dtl = 0 . ∂tk ∂tl (2) This is the total differential of the function uk = u(tk , {tl }), and is a special case of the implicit function theorem. Rearranging this: dtk ∂uk =− dtl ∂tl ∂uk ˙ = −wkl Rkl /uk . ˙ ∂tk (3) Now, to connect with the standard ICA derivation [4], recall the ‘rate’ (or sigmoidal) neuron, for which yi = gi (ui ) and ui = j wij xj . For this neuron, the output dependence on input is ∂yi /∂xj = wij gi while the learning gradient is: ∂yi ∂ 1 log − fi (ui )xj = ∂wij ∂xj wij (4) where the ‘score functions’, fi , are defined in terms of a density estimate on the summed ∂ ∂ inputs: fi (ui ) = ∂ui log gi = ∂ui log p(ui ). ˆ The analogous learning gradient for the spiking case, from (3), is: ˙ j(a)Rka ∂ dtk 1 log − a . = ∂wij dtl wij uk ˙ (5) where j(a) = 1 if spike a came from neuron j, and 0 otherwise. Comparing the two cases in (4) and (5), we see that the input variable xj has become the temporal derivative of the sum of the EPSPs coming from synapse j, and the output variable (or score function) fi (ui ) has become u−1 , the inverse of the temporal derivative ˙k of the membrane potential at threshold. It is intriguing (A) to see this quantity appear as analogous to the score function in the ICA likelihood model, and, (B) to speculate that experiments could show that this‘ voltage slope at threshold’ is a hidden factor in STDP data, explaining some of the scatter in Figure 3A. In other words, an STDP datapoint should lie on a 2-surface in a 3D space of {∆w, ∆t, uk }. Incidentally, uk shows up in any ˙ ˙ learning rule optimising an objective function involving output spike timings. 1.2 How to maximise the effect of N spike timings on N other ones. Now we deal with the case of a ‘square’ single-layer feedforward mapping between spike timings. 
There can be several input and output neurons, but here we ignore which neurons are spiking, and just look at how the input timings affect the output timings. This is captured in a Jacobian matrix of all timing dependencies we call T. The entries of this matrix are Tkl ≡ ∂tk /∂tl . A multivariate version of the sensitivity measure introduced in the previous section is the log of the absolute determinant of the timing matrix, ie: log |T|. The full derivation for the gradient W log |T| is in the Appendix. Here, we again draw out the analogy between Square ICA [4] and this gradient, as follows. Square ICA with a network y = g(Wx) is: ∆W ∝ W log |J| = W−1 − f (u)xT (6) where the Jacobian J has entries ∂yi /∂xj and the score functions are now, fi (u) = ∂ − ∂ui log p(u) for the general likelihood case, with p(u) = i gi being the special case of ˆ ˆ ICA. We will now split the gradient in (6) according to the chain rule: W log |J| = [ J log |J|] ⊗ [ W J] j(l) − fk (u)xj wkl J−T ⊗ Jkl i(k) = (7) . (8) In this equation, i(k) = δik and j(l) = δjl . The righthand term is a 4-tensor with entries ∂Jkl /∂wij , and ⊗ is defined as A ⊗ Bij = kl Akl Bklij . We write the gradient this way to preserve, in the second term, the independent structure of the 1 → 1 gradient term in (4), and to separate a difficult derivation into two easy parts. The structure of (8) holds up when we move to the spiking case, giving: W log |T| = = [ T log |T|] ⊗ [ W T] T−T ⊗ Tkl i(k) j(l) − wkl (9) a ˙ j(a)Rka uk ˙ (10) where i(k) is now defined as being 1 if spike k occured in neuron i, and 0 otherwise. j(l) and j(a) are analogously defined. Because the T matrix is much bigger than the J matrix, and because it’s entries are more complex, here the similarity ends. 
When (10) is evaluated for a single weight influencing a single spike coupling (see the Appendix for the full derivation), it yields:

$$\Delta w_{kl} \propto \frac{\partial \log|T|}{\partial w_{kl}} = \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right). \tag{11}$$

This is a non-local update involving a matrix inverse at each step. In the ICA case of (6), such an inverse was removed by the Natural Gradient transform (see [1]), but in the spike timing case, this has turned out not to be possible, because of the additional asymmetry introduced into the $T$ matrix (as opposed to the $J$ matrix) by the $\dot{R}_{kl}$ term in (3).

2 Results.

Nonetheless, this learning rule can be simulated. It requires running the network for a while to generate spikes (and a corresponding $T$ matrix), and then for each input/output spike coupling, the corresponding synapse is updated according to (11). When this is done, and the weights learn, it is clear that something has been sacrificed by ignoring the issue of which neurons are producing the spikes. Specifically, the network will often put all the output spikes on one output neuron, with the rates of the others falling to zero. It is happy to do this, if a large $\log|T|$ can thereby be achieved, because we have not included this 'which neuron' information in the objective. We will address these and other problems in Section 3, but now we report on our simulation results on demultiplexing.

2.1 Demultiplexing spike trains.

An interesting possibility in the brain is that 'patterns' are embedded in spatially distributed spike timings that are input to neurons. Several patterns could be embedded in single input trains. This is called multiplexing. To extract and propagate these patterns, the neurons must demultiplex these inputs using their threshold nonlinearity. Demultiplexing is the 'point process' analog of the unmixing of independent inputs in ICA. We have been able to robustly achieve demultiplexing, as we now report. We simulated a feed-forward network with 3 integrate-and-fire neurons and inputs from 3 presynaptic neurons.
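The gradient in (11) can be verified against finite differences on a small square example. The following is a minimal sketch under stated assumptions, not the simulation used for the results: each spike coupling $(k, l)$ is given its own independent weight, $T$ is built in the form of the Appendix equation (16), $T_{ab} = w_{ab}\dot{R}_{ab} / (c_a + \sum_c w_{ac}\dot{R}_{ac})$, where the vector `c` is a hypothetical stand-in for the $\dot{\eta} - \dot{\vartheta}$ terms, and all numerical values are arbitrary.

```python
import numpy as np

def timing_matrix(W, R_dot, c):
    """T_ab = w_ab * Rdot_ab / (c_a + sum_c w_ac * Rdot_ac), in the form of (16)."""
    drive = W * R_dot
    slope = c + drive.sum(axis=1)   # u_dot minus theta_dot for each output spike
    return drive / slope[:, None]

def log_abs_det_T(W, R_dot, c):
    """Objective log|det T|, computed stably with slogdet."""
    _, logabsdet = np.linalg.slogdet(timing_matrix(W, R_dot, c))
    return logabsdet

def gradient_eq11(W, R_dot, c):
    """Closed form (11): (T_kl / w_kl) * ([T^-1]_lk - 1) for every coupling (k, l)."""
    T = timing_matrix(W, R_dot, c)
    return (T / W) * (np.linalg.inv(T).T - 1.0)

def gradient_numeric(W, R_dot, c, eps=1e-6):
    """Central finite differences of log|det T| with respect to each weight."""
    G = np.zeros_like(W)
    for k in range(W.shape[0]):
        for l in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[k, l] += eps
            Wm[k, l] -= eps
            G[k, l] = (log_abs_det_T(Wp, R_dot, c)
                       - log_abs_det_T(Wm, R_dot, c)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 4                                                    # 4 output and 4 input spikes (arbitrary)
W = rng.uniform(0.5, 1.5, size=(n, n))                   # arbitrary positive weights
R_dot = rng.uniform(0.2, 1.0, size=(n, n)) + 2.0 * np.eye(n)  # arbitrary EPSP slopes
c = rng.uniform(0.5, 1.5, size=n)                        # stand-in for eta_dot - theta_dot

G_analytic = gradient_eq11(W, R_dot, c)
G_numeric = gradient_numeric(W, R_dot, c)
print(np.max(np.abs(G_analytic - G_numeric)))
```

When $T$ is not square, `np.linalg.pinv` could take the place of `np.linalg.inv` in `gradient_eq11`; that substitution is the only change the sketch would need to mimic a pseudo-inverse update, although the finite-difference check no longer applies in that case.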
Learning followed (11), where we replace the inverse by the pseudo-inverse computed on the spikes generated during 0.5 s. The pseudo-inverse is necessary because even though, on average, the learning matches the number of output spikes to the number of input spikes, the matrix $T$ is still not usually square and so its actual inverse cannot be taken. In addition, in these simulations, an additional term is introduced in the learning to make sure all the output neurons fire with equal probability. This partially counters the neglect of the 'which neuron' information, which we explained above. Assuming a Poisson spike count $n_i$ for the $i$th output neuron with equal firing rate $\bar{n}_i$, it is easy to derive an approximate term that will control the spike count, $\sum_i (\bar{n}_i - n_i)$. The target firing rates $\bar{n}_i$ were set to match the "source" spike train in this example. The network learns to demultiplex mixed spike trains, as shown in Figure 2. This demultiplexing is a robust property of learning using (11) with this new spike-controlling term.

Finally, what about the spike-timing dependence of the observed learning? Does it match experimental results? The comparison is made in Figure 3, and the answer is no. There is a timing-dependent transition between depression and potentiation in our result in Figure 3B, but it is not a sharp transition like the experimental result in Figure 3A. In addition, it does not transition at zero (i.e. when $t_k - t_l = 0$), but at a time offset by the rise time of the EPSPs. In earlier experiments, in which we transformed the gradient in (11) by an approximate inverse Hessian, to get an approximate Natural Gradient method, a sharp transition did emerge in simulations. However, the approximate inverse Hessian was singular, and we had to de-emphasise this result. It does suggest, however, that if the Natural Gradient transform can be usefully done on some variant of this learning rule, it may well be what accounts for the sharp transition effect of STDP.

[Figure 2 panels: mixed input trains, output, and original spike train (spike trains 1 to 3 against time in ms, 0 to 500); mixing matrix and synaptic weights.]

Figure 2: Unmixed spike trains. The input (top left) are 3 spike trains which are a mixture of three independent Poisson processes (bottom left). The network unmixes the spike trains to approximately recover the original (center left). In this example 19 spikes correspond to the original, with 4 deletions and 2 insertions. The two panels at the right show the mixing (top) and synaptic weight matrix after training (bottom).

3 Discussion

Although these derivations started out smoothly, the reader possibly shares the authors' frustration at the approximations involved here. Why isn't this simple, like ICA? Why don't we just have a nice maximum spikelihood model, i.e. a density estimation algorithm for multivariate point processes, as ICA was a model in continuous space? We are going to be explicit about the problems now, and will propose a direction where the solution may lie. The over-riding problem is: we are unable to claim that in maximising $\log|T|$, we are maximising the mutual information between inputs and outputs because:

1. The Invertibility Problem. Algorithms such as ICA which maximise log Jacobians can only be called Infomax algorithms if the network transformation is both deterministic and invertible. The Spike Response Model is deterministic, but it is not invertible in general. When not invertible, the key formula (considering here vectors of input and output timings, $t_{in}$ and $t_{out}$) is transformed from simple to complex, i.e.:

$$p(t_{out}) = \frac{p(t_{in})}{|T|} \quad \text{becomes} \quad p(t_{out}) = \int_{\text{solns } t_{in}} \frac{p(t_{in})}{|T|}\, dt_{in} \tag{12}$$

Thus when not invertible, we need to know the Jacobians of all the inputs that could have caused an output (called here 'solns'), something we simply don't know.

2. The 'Which Neuron' Problem.
Instead of maximising the mutual information $I(t_{out}, t_{in})$, we should be maximising $I(ti_{out}, ti_{in})$, where the vector $ti$ is the timing vector, $t$, with the vector, $i$, of corresponding neuron indices, concatenated. Thus, 'who spiked?' should be included in the analysis as it is part of the information.

3. The Predictive Information Problem. In ICA, since there was no time involved, we did not have to worry about mutual informations over time between inputs and outputs. But in the spiking model, output spikes may well have (predictive) mutual information with future input spikes, as well as the usual (causal) mutual information with past input spikes. The former has been entirely missing from our analysis so far.

[Figure 3 panels: (A) STDP and (B) Gradient; vertical axes ∆w/w (%) and ∆w (a.u.), horizontal axes ∆t (ms).]

Figure 3: Dependence of synaptic modification on pre/post inter-spike interval. Left (A): From Froemke & Dan, Nature (2002). Dependence of synaptic modification on pre/post inter-spike interval in cat L2/3 visual cortical pyramidal cells in slice. Naturalistic spike trains. Each point represents one experiment. Right (B): According to Equation (11). Each point corresponds to a spike pair between approximately 100 input and 100 output spikes.

These temporal and spatial information dependencies, missing in our analysis so far, are thrown into a different light by a single empirical observation, which is that Spike Timing-Dependent Plasticity is not just a feedforward computation like the Spike Response Model. Specifically, there must be at least a statistical, if not a causal, relation between a real synapse's plasticity and its neuron's output spike timings, for Figure 3B to look like it does.
It seems we have to confront the need for both a 'memory' (or reconstruction) model, such as the $T$ we have thus far dealt with, in which output spikes talk about past inputs, and a 'prediction' model, in which they talk about future inputs. This is most easily understood from the point of view of Barber & Agakov's variational Infomax algorithm [3]. They argue for optimising a lower bound on mutual information, which, for our neurons, would be expressed using an inverse model $\hat{p}$, as follows:

$$\tilde{I}(ti_{in}, ti_{out}) = H(ti_{in}) - \left\langle -\log \hat{p}(ti_{in}|ti_{out}) \right\rangle_{p(ti_{in},\, ti_{out})} \le I(ti_{in}, ti_{out}) \tag{13}$$

In a feedforward model, $H(ti_{in})$ may be disregarded in taking gradients, leading us to the optimisation of a 'memory-prediction' model $\hat{p}(ti_{in}|ti_{out})$ related to something supposedly happening in dendrites, somas and at synapses. In trying to guess what this might be, it would be nice if the math worked out. We need a square Jacobian matrix, $T$, so that $|T| = \hat{p}(ti_{in}|ti_{out})$ can be our memory/prediction model. Now let's rename our feedforward timing Jacobian $T$ ('up the dendritic trees') as $\overrightarrow{T}$, and let's fantasise that there is some, as yet unspecified, feedback Jacobian $\overleftarrow{T}$ ('down the dendritic trees'), which covers electrotonic influences as they spread from soma to synapse, and which $\overrightarrow{T}$ can be combined with by some operation '$\otimes$' to make things square. Imagine further that doing this yields a memory/prediction model on the inputs. Then the $T$ we are looking for is $\overrightarrow{T} \otimes \overleftarrow{T}$, and the memory-prediction model is: $\hat{p}(ti_{in}|ti_{out}) = |\overrightarrow{T} \otimes \overleftarrow{T}|$.

Ideally, the entries of $\overrightarrow{T}$ should be as before, i.e. $\overrightarrow{T}_{kl} = \partial t_k/\partial t_l$. What should the entries of $\overleftarrow{T}$ be? Becoming just one step more concrete, suppose $\overleftarrow{T}$ had entries $\overleftarrow{T}_{lk} = \partial c_l/\partial t_k$, where $c_l$ is some, as yet unspecified, value, or process, occurring at an input synapse when spike $l$ comes in.
What seems clear is that $\otimes$ should combine the correctly tensorised forms of $\overrightarrow{T}$ and $\overleftarrow{T}$ (giving them each 4 indices $ijkl$), so that $T = \overrightarrow{T} \otimes \overleftarrow{T}$ sums over the spikes $k$ and $l$ to give an $I \times J$ matrix, where $I$ is the number of output neurons, and $J$ the number of input neurons. Then our quantity, $T$, would represent all dependencies of input neuronal activity on output activity, summed over spikes.

Further, we imagine that $\overleftarrow{T}$ contains reverse (feedback) electrotonic transforms from soma to synapse, $\overleftarrow{R}_{lk}$, that are somehow symmetrically related to the feedforward Spike Responses from synapse to soma, which we now rename $\overrightarrow{R}_{kl}$. Thinking for a moment in terms of somatic $k$ and synaptic $l$, voltages $V$, currents $I$ and linear cable theory, the synapse to soma transform, $\overrightarrow{R}_{kl}$, would be related to an impedance in $V_k = I_l \overrightarrow{Z}_{kl}$, while the soma to synapse transform, $\overleftarrow{R}_{lk}$, would be related to an admittance in $I_l = V_k \overleftarrow{Y}_{lk}$ [8]. The symmetry in these equations is that $\overrightarrow{Z}_{kl}$ is just the inverse conjugate of $\overleftarrow{Y}_{lk}$.

Finally, then, what is $c_l$? And what is its relation to the calcium concentration, $[\mathrm{Ca}^{2+}]_l$, at a synapse, when spike $l$ comes in? These questions naturally follow from considering the experimental data, since it is known that the calcium level at synapses is the critical integrating factor in determining whether potentiation or depression occurs [5].

4 Appendix: Gradient of log |T| for the full Spike Response Model.

Here we give full details of the gradient for Gerstner's Spike Response Model [7]. This is a general model for which Integrate-and-Fire is a special case. In this model the effect of a presynaptic spike at time $t_l$ on the membrane potential at time $t$ is described by a post synaptic potential or spike response, which may also depend on the time that has passed since the last output spike $t_{k-1}$, hence the spike response is written as $R(t - t_{k-1}, t - t_l)$. This response is weighted by the synaptic strength $w_l$.
Excitatory or inhibitory synapses are determined by the sign of $w_l$. Refractoriness is incorporated by adding a hyper-polarizing contribution (spike-afterpotential) to the membrane potential in response to the last preceding spike, $\eta(t - t_{k-1})$. The membrane potential as a function of time is therefore given by

$$u(t) = \eta(t - t_{k-1}) + \sum_l w_l R(t - t_{k-1}, t - t_l). \tag{14}$$

We have ignored here potential contributions from external currents, which can easily be included without modifying the following derivations. The output firing times $t_k$ are defined as the times for which $u(t)$ reaches firing threshold from below. We consider a dynamic threshold, $\vartheta(t - t_{k-1})$, which may depend on the time since the last spike $t_{k-1}$. Together, then, output spike times are defined implicitly by:

$$t = t_k : \quad u(t) = \vartheta(t - t_{k-1}) \quad \text{and} \quad \frac{du(t)}{dt} > 0. \tag{15}$$

For this more general model $T_{kl}$ is given by

$$T_{kl} = \frac{dt_k}{dt_l} = -\left( \frac{\partial u}{\partial t_k} - \frac{\partial \vartheta}{\partial t_k} \right)^{-1} \frac{\partial u}{\partial t_l} = \frac{w_{kl}\,\dot{R}(t_k - t_{k-1}, t_k - t_l)}{\dot{u}(t_k) - \dot{\vartheta}(t_k - t_{k-1})}, \tag{16}$$

where $\dot{R}(s, t)$, $\dot{u}(t)$, and $\dot{\vartheta}(t)$ are derivatives with respect to $t$. The dependence of $T_{kl}$ on $t_{k-1}$ should be implicitly assumed. It has been omitted to simplify the notation. Now we compute the derivative of $\log|T|$ with respect to $w_{kl}$. For any matrix $T$ we have $\partial \log|T| / \partial T_{ab} = [T^{-1}]_{ba}$. Therefore:

$$\frac{\partial \log|T|}{\partial w_{kl}} = \sum_{ab} \frac{\partial \log|T|}{\partial T_{ab}} \frac{\partial T_{ab}}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba} \frac{\partial T_{ab}}{\partial w_{kl}}. \tag{17}$$

Utilising the Kronecker delta $\delta_{ab}$ = (1 if $a = b$, else 0), the derivative of (16) with respect to $w_{kl}$ gives:

$$\frac{\partial T_{ab}}{\partial w_{kl}} = \frac{\partial}{\partial w_{kl}} \left[ \frac{w_{ab}\,\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{\eta}(t_a - t_{a-1}) + \sum_c w_{ac}\,\dot{R}(t_a - t_{a-1}, t_a - t_c) - \dot{\vartheta}(t_a - t_{a-1})} \right]$$

$$= \delta_{ak}\delta_{bl} \frac{\dot{R}(t_a - t_{a-1}, t_a - t_b)}{\dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1})} - \frac{w_{ab}\,\dot{R}(t_a - t_{a-1}, t_a - t_b)\,\delta_{ak}\,\dot{R}(t_a - t_{a-1}, t_a - t_l)}{\left( \dot{u}(t_a) - \dot{\vartheta}(t_a - t_{a-1}) \right)^2}$$

$$= \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right). \tag{18}$$

Therefore:

$$\frac{\partial \log|T|}{\partial w_{kl}} = \sum_{ab} [T^{-1}]_{ba}\, \delta_{ak} T_{ab} \left( \frac{\delta_{bl}}{w_{ab}} - \frac{T_{al}}{w_{al}} \right)$$

$$= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - \sum_b [T^{-1}]_{bk} T_{kb} \right) \tag{19}$$

$$= \frac{T_{kl}}{w_{kl}} \left( [T^{-1}]_{lk} - 1 \right). \tag{20}$$

Acknowledgments

We are grateful for inspirational discussions with Nihat Ay, Michael Eisele, Hong Hui Yu, Jim Crutchfield, Jeff Beck, Surya Ganguli, Sophie Deneve, David Barber, Fabian Theis, Tony Zador and Arunava Banerjee. AJB thanks all RNI colleagues for many such discussions.

References

[1] Amari S-I. 1997. Natural gradient works efficiently in learning. Neural Computation, 10, 251-276.
[2] Banerjee A. 2001. On the Phase-Space Dynamics of Systems of Spiking Neurons. Neural Computation, 13, 161-225.
[3] Barber D. & Agakov F. 2003. The IM Algorithm: A Variational Approach to Information Maximization. Advances in Neural Information Processing Systems 16, MIT Press.
[4] Bell A.J. & Sejnowski T.J. 1995. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129-1159.
[5] Dan Y. & Poo M-m. 2004. Spike timing-dependent plasticity of neural circuits. Neuron, 44, 23-30.
[6] Froemke R.C. & Dan Y. 2002. Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 28, 416: 433-8.
[7] Gerstner W. & Kistler W.M. 2002. Spiking neuron models. Camb. Univ. Press.
[8] Zador A.M., Agmon-Snir H. & Segev I. 1995. The morphoelectrotonic transform: a graphical approach to dendritic function. J. Neurosci., 15(3): 1669-1682.
3 0.75207537 57 nips-2004-Economic Properties of Social Networks
Author: Sham M. Kakade, Michael Kearns, Luis E. Ortiz, Robin Pemantle, Siddharth Suri
Abstract: We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations.
4 0.4696106 153 nips-2004-Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity
Author: Sander M. Bohte, Michael C. Mozer
Abstract: Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron, and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity or STDP. We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron's variability to a given presynaptic input, causing the neuron's output to become more reliable in the face of noise. Using an entropy-minimization objective function and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles, and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of cortical adaptation.
5 0.46888456 28 nips-2004-Bayesian inference in spiking neurons
Author: Sophie Deneve
Abstract: We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing a variant of belief propagation.

Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3]. A few important but hidden properties, such as direction of motion, or appropriate motor commands, are inferred from many noisy, local and ambiguous sensory cues. This evidence is combined with priors about the sensory world and body. Importantly, because most of these inferences should lead to quick and irreversible decisions in a perpetually changing world, noisy cues have to be integrated on-line, but in a way that takes into account unpredictable events, such as a sudden change in motion direction or the appearance of a new stimulus. This raises the question of how this temporal integration can be performed at the neural level. It has been proposed that single neurons in sensory cortices represent and compute the log probability that a sensory variable takes on a certain value (e.g. is visual motion in the neuron's preferred direction?) [9, 7]. Alternatively, to avoid normalization issues and provide an appropriate signal for decision making, neurons could represent the log probability ratio of a particular hypothesis (e.g. is motion more likely to be towards the right than towards the left?) [7, 6].
Log probabilities are convenient here, since under some assumptions, independent noisy cues simply combine linearly. Moreover, there is physiological evidence for the neural representation of log probabilities and log probability ratios [9, 6, 7]. However, these models assume that neurons represent probabilities in their firing rates. We argue that it is important to study how probabilistic information is encoded in spikes. Indeed, it seems spurious to marry the idea of an exquisite on-line integration of noisy cues with an underlying rate code that requires averaging on large populations of noisy neurons and long periods of time. In particular, most natural tasks require this integration to take place on the time scale of inter-spike intervals. Spikes are more efficient at signaling events than analog quantities. In addition, a neural theory of inference with spikes will bring us closer to the physiological level and generate more easily testable predictions.

* Institute of Cognitive Science, 69645 Bron, France

Thus, we propose a new theory of neural processing in which spike trains provide a deterministic, online representation of a log-probability ratio. Spikes signal events, e.g. that the log-probability ratio has exceeded what could be predicted from previous spikes. This form of coding was loosely inspired by the idea of "energy landscape" coding proposed by Hinton and Brown [2]. However, contrary to [2] and other theories using rate-based representation of probabilities, this model is self-consistent and does not require different models for encoding and decoding: as output spikes provide new, unpredictable, temporally independent evidence, they can be used directly as an input to other Bayesian neurons. Finally, we show that these neurons can be used as building blocks in a theory of approximate Bayesian inference in recurrent spiking networks. Connections between neurons implement an underlying Bayesian network, consisting of coupled hidden Markov models.
Propagation of spikes is a form of belief propagation in this underlying graphical model. Our theory provides computational explanations of some general physiological properties of cortical neurons, such as spike frequency adaptation, Poisson statistics of spike trains, the existence of strong local inhibition in cortical columns, and the maintenance of a tight balance between excitation and inhibition. Finally, we discuss the implications of this model for the debate about temporal versus rate-based neural coding.

1 Spikes and log posterior odds

1.1 Synaptic integration seen as inference in a hidden Markov chain

We propose that each neuron codes for an underlying "hidden" binary variable, $x_t$, whose state evolves over time. We assume that $x_t$ depends only on the state at the previous time step, $x_{t-dt}$, and is conditionally independent of other past states. The state $x_t$ can switch from 0 to 1 with a constant rate $r_{on} = \lim_{dt \to 0} \frac{1}{dt} P(x_t = 1 | x_{t-dt} = 0)$, and from 1 to 0 with a constant rate $r_{off}$. For example, these transition rates could represent how often motion in a preferred direction appears in the receptive field and how long it is likely to stay there. The neuron infers the state of its hidden variable from $N$ noisy synaptic inputs, considered to be observations of the hidden state. In this initial version of the model, we assume that these inputs are conditionally independent homogeneous Poisson processes, synapse $i$ emitting a spike between time $t$ and $t + dt$ ($s_t^i = 1$) with constant probability $q_{on}^i\,dt$ if $x_t = 1$, and another constant probability $q_{off}^i\,dt$ if $x_t = 0$. The synaptic spikes are assumed to be otherwise independent of previous synaptic spikes, previous states and spikes at other synapses. The resulting generative model is a hidden Markov chain (figure 1-A).
However, rather than estimating the state of its hidden variable and communicating this estimate to other neurons (for example by emitting a spike when sensory evidence for $x_t = 1$ goes above a threshold), the neuron reports and communicates its certainty that the current state is 1. This certainty takes the form of the log of the ratio of the probability that the hidden state is 1, and the probability that the state is 0, given all the synaptic inputs received so far: $L_t = \log \frac{P(x_t = 1 | s_{0 \to t})}{P(x_t = 0 | s_{0 \to t})}$. We use $s_{0 \to t}$ as a shorthand notation for the $N$ synaptic inputs received at present and in the past. We will refer to $L_t$ as the log odds ratio. Thanks to the conditional independencies assumed in the generative model, we can compute this log odds ratio iteratively. Taking the limit as $dt$ goes to zero, we get the following differential equation:

$$\dot{L} = r_{on}\left(1 + e^{-L}\right) - r_{off}\left(1 + e^{L}\right) + \sum_i w_i\,\delta(s_t^i - 1) - \theta$$

[Figure 1 panels A to E: generative model diagram, encoding/decoding schematic, log odds against time, spike count, and ISI histogram.]

Figure 1: A. Generative model for the synaptic input. B. Schematic representation of log odds ratio encoding and decoding. The dashed circle represents both eventual downstream elements and the self-prediction taking place inside the model neuron. A spike is fired only when $L_t$ exceeds $G_t$. C. One example trial, where the state switches from 0 to 1 (shaded area) and back to 0. Plain: $L_t$, dotted: $G_t$. Black stripes at the top: corresponding spike train. D. Mean log odds ratio (dark line) and mean output firing rate (clear line). E. Output spike raster plot (1 line per trial) and ISI distribution for the neuron shown in C and D. Clear line: ISI distribution for a Poisson neuron with the same rate.

$w_i$, the synaptic weight, describes how informative synapse $i$ is about the state of the hidden variable, e.g. $w_i = \log \frac{q_{on}^i}{q_{off}^i}$. Each synaptic spike ($s_t^i = 1$) gives an impulse to the log odds ratio, which is positive if this synapse is more active when the hidden state is 1 (i.e. it increases the neuron's confidence that the state is 1), and negative if this synapse is more active when $x_t = 0$ (i.e. it decreases the neuron's confidence that the state is 1). The bias, $\theta$, is determined by how informative it is not to receive any spike, e.g. $\theta = \sum_i \left( q_{on}^i - q_{off}^i \right)$. By convention, we will consider that the "bias" is positive or zero (if not, we need simply to invert the status of the state $x$).

1.2 Generation of output spikes

The spike train should convey a sparse representation of $L_t$, so that each spike reports new information about the state $x_t$ that is not redundant with that reported by other, preceding, spikes. This proposition is based on three arguments: First, spikes, being metabolically expensive, should be kept to a minimum. Second, spikes conveying redundant information would require a decoding of the entire spike train, whereas independent spikes can be taken into account individually. And finally, we seek a self-consistent model, with the spiking output having similar semantics to its spiking input. To maximize the independence of the spikes (conditioned on $x_t$), we propose that the neuron fires only when the difference between its log odds ratio $L_t$ and a prediction $G_t$ of this log odds ratio based on the output spikes emitted so far reaches a certain threshold. Indeed, supposing that downstream elements predict $L_t$ as best as they can, the neuron only needs to fire when it expects that prediction to be too inaccurate (figure 1-B). In practice, this will happen when the neuron receives new evidence for $x_t = 1$. $G_t$ should thereby follow the same dynamics as $L_t$ when spikes are not received. The equations for $G_t$ and the output $O_t$ ($O_t = 1$ when an output spike is fired) are given by:

$$\dot{G} = r_{on}\left(1 + e^{-L}\right) - r_{off}\left(1 + e^{L}\right) + g_o\,\delta(O_t - 1) \tag{1}$$

$$O_t = 1 \ \text{when} \ L_t > G_t + \frac{g_o}{2}, \quad 0 \ \text{otherwise}. \tag{2}$$

Here $g_o$, a positive constant, is the only free parameter, the other parameters being constrained by the statistics of the synaptic input.

1.3 Results

Figure 1-C plots a typical trial, showing the behavior of $L$, $G$ and $O$ before, during and after presentation of the stimulus. As random synaptic inputs are integrated, $L$ fluctuates and eventually exceeds $G + 0.5$, leading to an output spike. Immediately after a spike, $G$ jumps to $G + g_o$, which prevents (except in very rare cases) a second spike from immediately following the first. Thus, this "jump" implements a relative refractory period. However, $G$ decays as it tends to converge back to its stable level $g_{stable} = \log \frac{r_{on}}{r_{off}}$. Thus $L$ eventually exceeds $G$ again, leading to a new spike. This threshold crossing happens more often during stimulation ($x_t = 1$) as the net synaptic input alters to create a higher overall level of certainty, $L_t$.

Mean log odds ratio and output firing rate

The mean firing rate $\bar{O}_t$ of the Bayesian neuron during presentation of its preferred stimulus (i.e. when $x_t$ switches from 0 to 1 and back to 0) is plotted in figure 1-D, together with the mean log posterior ratio $\bar{L}_t$, both averaged over trials. Not surprisingly, the log-posterior ratio reflects the leaky integration of synaptic evidence, with an effective time constant that depends on the transition probabilities $r_{on}$, $r_{off}$. If the state is very stable ($r_{on} = r_{off} \sim 0$), synaptic evidence is integrated over almost infinite time periods, the mean log posterior ratio tending to either increase or decrease linearly with time. In the example in figure 1-D, the state is less stable, so "old" synaptic evidence is discounted and $L_t$ saturates. In contrast, the mean output firing rate $\bar{O}_t$ tracks the state of $x_t$ almost perfectly. This is because, as a form of predictive coding, the output spikes reflect the new synaptic evidence, $I_t = \sum_i w_i\,\delta(s_t^i - 1) - \theta$, rather than the log posterior ratio itself.
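The dynamics of $L$, $G$ and $O$ described in this section can be explored with a short simulation. What follows is a rough sketch under stated assumptions rather than the model's reference implementation: it uses a forward-Euler discretisation with Bernoulli sampling of synaptic spikes per time step, identical synapses, and arbitrary values for every parameter. It also evaluates the drift of $G$ at $G$ itself (an assumption; equation (1) as printed evaluates it at $L$), which is the choice under which $G$ relaxes towards $\log(r_{on}/r_{off})$ between output spikes. The function name `simulate_bayesian_neuron` and its arguments are hypothetical.

```python
import numpy as np

def simulate_bayesian_neuron(T=1.5, dt=1e-4, n_syn=10, q_on=20.0, q_off=10.0,
                             r_on=2.0, r_off=2.0, g_o=1.0, seed=0):
    """Euler simulation of the log odds L, its prediction G and output spikes O.

    All rates are in Hz and all parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    w = np.log(q_on / q_off)          # weight of every synapse: log(q_on / q_off)
    theta = n_syn * (q_on - q_off)    # bias: sum over synapses of (q_on - q_off)

    x = np.zeros(n_steps, dtype=int)
    x[n_steps // 3: 2 * n_steps // 3] = 1   # hidden state is ON in the middle third

    L = np.zeros(n_steps)
    G = np.zeros(n_steps)
    O = np.zeros(n_steps, dtype=int)
    L_now = G_now = np.log(r_on / r_off)    # start both at the stable level of G

    for t in range(n_steps):
        rate = q_on if x[t] == 1 else q_off
        n_in = int(np.sum(rng.random(n_syn) < rate * dt))   # synaptic spikes this step
        # Log odds: drift from the transition rates and the bias, impulses from inputs.
        L_now += dt * (r_on * (1 + np.exp(-L_now))
                       - r_off * (1 + np.exp(L_now)) - theta) + w * n_in
        # Prediction: assumed here to relax under its own drift (evaluated at G).
        G_now += dt * (r_on * (1 + np.exp(-G_now)) - r_off * (1 + np.exp(G_now)))
        # Fire when the log odds exceed the prediction by g_o / 2, then bump G by g_o.
        if L_now > G_now + g_o / 2:
            O[t] = 1
            G_now += g_o
        L[t], G[t] = L_now, G_now
    return x, L, G, O

x, L, G, O = simulate_bayesian_neuron()
print("output spikes:", int(O.sum()))
```

Plotting `L` and `G` against time, with the spike times in `O` marked, gives a picture in the spirit of figure 1-C, although the quantitative details depend entirely on the assumed parameters.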
In particular, the mean output firing rate is a rectified linear function of the mean input, e.g. $\bar{O} = \frac{1}{g_o} \left[ \bar{I} \right]^+$ with $\bar{I} = \sum_i w_i\,q_{on(off)}^i - \theta$.

Analogy with a leaky integrate and fire neuron

We can get an interesting insight into the computation performed by this neuron by linearizing $L$ and $G$ around their mean levels over trials. Here we reduce the analysis to prolonged, statistically stable periods when the state is constant (either ON or OFF). In this case, the mean level of certainty $\bar{L}$ and its output prediction $\bar{G}$ are also constant over time. We make the rough approximation that the post-spike jump, $g_o$, and the input fluctuations are small compared to the mean level of certainty $\bar{L}$. Rewriting $V_t = L_t - G_t + \frac{g_o}{2}$ as the "membrane potential" of the Bayesian neuron:

$$\dot{V} = -k_L V + I_t - \Delta g_o - g_o O_t$$

where $k_L = r_{on} e^{-\bar{L}} + r_{off} e^{\bar{L}}$, the "leak" of the membrane potential, depends on the overall level of certainty. $\Delta g_o$ is positive and a monotonic increasing function of $g_o$.

[Figure 2 panels A to D: causal network diagram, feedforward network, recurrent network, and log odds against time (traces labelled Tiger, Stripes, No inh, Feedback).]

Figure 2: A. Bayesian causal network for $y_t$ (tiger), $x_t^1$ (stripes) and $x_t^2$ (paws). B. A feedforward network computing the log posterior for $x_t^1$. C. A recurrent network computing the log posterior odds for all variables. D. Log odds ratio in a simulated trial with the network in C (see text). Thick line: $L_t^{x^2}$, thin line: $L_t^{x^1}$, dash-dotted: $L_t^{x^1}$ without inhibition. Insert: $L_t^{x^2}$ averaged over trials, showing the effect of feedback.

The linearized Bayesian neuron thus acts in its stable regime as a leaky integrate and fire (LIF) neuron. The membrane potential $V_t$ integrates its input, $J_t = I_t - \Delta g_o$, with a leak $k_L$. The neuron fires when its membrane potential reaches a constant threshold $g_o$.
After each spike, $V_t$ is reset to 0. Interestingly, for an appropriately chosen compression factor $g_o$, the mean input to the linearized neuron is $\bar{J} = \bar{I} - \Delta g_o \approx 0$ (see footnote 1). This means that the membrane potential is purely driven to its threshold by input fluctuations, or a random walk in membrane potential. As a consequence, the neuron's firing will be memoryless, and close to a Poisson process. In particular, we found a Fano factor close to 1 and a quasi-exponential ISI distribution (figure 1-E) on the entire range of parameters tested. Indeed, LIF neurons with balanced inputs have been proposed as a model to reproduce the statistics of real cortical neurons [8]. This balance is implemented in our model by the neuron's effective self-inhibition, even when the synaptic input itself is not balanced.

Footnote 1: Even if $g_o$ is not chosen optimally, the influence of the drift $\bar{J}$ is usually negligible compared to the large fluctuations in membrane potential.

Decoding

As we previously said, downstream elements could predict the log odds ratio $L_t$ by computing $G_t$ from the output spikes (Eq 1, fig 1-B). Of course, this requires an estimate of the transition probabilities $r_{on}$, $r_{off}$, that could be learned from the observed spike trains. However, we show next that explicit decoding is not necessary to perform Bayesian inference in spiking networks. Intuitively, this is because the quantity that our model neurons receive and transmit, i.e. new information, is exactly what probabilistic inference algorithms propagate between connected statistical elements.

2 Bayesian inference in cortical networks

The model neurons, having the same input and output semantics, can be used as building blocks to implement more complex generative models consisting of coupled Markov chains. Consider, for example, the example in figure 2-A.
Here, a "parent" variable $x^1_t$ (the presence of a tiger) can cause the state of n other "children" variables ($[x^k_t]_{k=2 \ldots n}$), of which two are represented (the presence of stripes, $x^2_t$, and motion, $x^3_t$). The "children" variables are Bayesian neurons identical to those described previously. The resulting Bayesian network consists of n + 1 coupled hidden Markov chains. Inference in this architecture corresponds to computing the log posterior odds ratio for the tiger, $x^1_t$, and the log posterior of observing stripes or motion, ($[x^k_t]_{k=2 \ldots n}$), given the synaptic inputs received by the entire network so far, i.e. $s^2_{0 \to t}, \ldots, s^k_{0 \to t}$.

Unfortunately, inference and learning in this network (and in general in coupled Markov chains) require very expensive computations, and cannot be performed by simply propagating messages over time and among the variable nodes. In particular, the state of a child variable $x^k_t$ depends on $x^k_{t-dt}$, $s^k_t$, $x^1_t$ and the state of all other children at the previous time step, [xj ]2
6 0.46341765 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
7 0.46233478 69 nips-2004-Fast Rates to Bayes for Kernel Machines
8 0.46222812 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
10 0.46018085 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units
11 0.45803294 148 nips-2004-Probabilistic Computation in Spiking Populations
12 0.45539445 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
13 0.45445397 68 nips-2004-Face Detection --- Efficient and Rank Deficient
14 0.45437706 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
15 0.45370921 131 nips-2004-Non-Local Manifold Tangent Learning
16 0.45282462 173 nips-2004-Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model
17 0.45199621 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
18 0.45154235 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
19 0.45065236 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
20 0.44951534 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection