nips nips2011 nips2011-133 knowledge-graph by maker-knowledge-mining

133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data

Source: pdf

Author: Konrad Koerding, Ian Stevenson

Abstract: Synaptic plasticity underlies learning and is thus central for development, memory, and recovery from injury. However, it is often difﬁcult to detect changes in synaptic strength in vivo, since intracellular recordings are experimentally challenging. Here we present two methods aimed at inferring changes in the coupling between pairs of neurons from extracellularly recorded spike trains. First, using a generalized bilinear model with Poisson output we estimate time-varying coupling assuming that all changes are spike-timing-dependent. This approach allows model-based estimation of STDP modiﬁcation functions from pairs of spike trains. Then, using recursive point-process adaptive ﬁltering methods we estimate more general variation in coupling strength over time. Using simulations of neurons undergoing spike-timing dependent modiﬁcation, we show that the true modiﬁcation function can be recovered. Using multi-electrode data from motor cortex we then illustrate the use of this technique on in vivo data. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 Inferring spike-timing-dependent plasticity from spike train data Ian H. [sent-1, score-0.3]

2 However, it is often difﬁcult to detect changes in synaptic strength in vivo, since intracellular recordings are experimentally challenging. [sent-5, score-0.587]

3 Here we present two methods aimed at inferring changes in the coupling between pairs of neurons from extracellularly recorded spike trains. [sent-6, score-0.962]

4 First, using a generalized bilinear model with Poisson output we estimate time-varying coupling assuming that all changes are spike-timing-dependent. [sent-7, score-0.526]

5 This approach allows model-based estimation of STDP modiﬁcation functions from pairs of spike trains. [sent-8, score-0.253]

6 Then, using recursive point-process adaptive ﬁltering methods we estimate more general variation in coupling strength over time. [sent-9, score-0.612]

7 Using simulations of neurons undergoing spike-timing dependent modiﬁcation, we show that the true modiﬁcation function can be recovered. [sent-10, score-0.296]

8 Using multi-electrode data from motor cortex we then illustrate the use of this technique on in vivo data. [sent-11, score-0.244]

9 A number of experimental results, using intracellular recordings in vitro, have shown that synaptic plasticity depends on the precise pairing of pre- and post-synaptic spiking [3]. [sent-13, score-0.532]

10 While such spike-timing-dependent plasticity (STDP) is thought to serve as a powerful regulatory mechanism [4], measuring STDP in vivo using intracellular recordings is experimentally difﬁcult [5]. [sent-14, score-0.325]

11 Here we instead attempt to estimate STDP in vivo by using simultaneously recorded extracellular spike trains and develop methods to estimate the time-varying strength of synapses. [sent-15, score-0.477]

12 In the past few years model-based methods have been developed that allow the estimation of coupling between pairs of neurons from spike train data [6, 7, 8, 9, 10, 11]. [sent-16, score-0.855]

13 While anatomical connections between pairs of extracellularly recorded neurons are generally not guaranteed, these phenomenological methods regularly improve encoding accuracy and provide a statistical description of the functional coupling between neurons. [sent-18, score-0.653]

14 Here we present two techniques that extend these statistical methods to time-varying coupling between neurons and allow the estimation of spike-timing-dependent plasticity from spike trains. [sent-19, score-0.843]

15 First we introduce a generative model for time-varying coupling between neurons where the changes in coupling strength depend on the relative timing of pre- and post-synaptic spikes: a bilinearnonlinear-Poisson model. [sent-20, score-1.253]

16 We then present two approaches for inferring STDP modiﬁcation functions from spike data. [sent-21, score-0.251]

17 1 B + Nonlinearity Predicted Spiking Post-Synaptic Spikes Post-Spike History Coupling to Pre-Synaptic Neuron x Synaptic Strength Modification Function Pre-Synaptic Spikes tpre - tpost Pre Post log(λ) Synaptic Strength A 200 ms 1. [sent-23, score-0.569]

18 A) A generative model of spikes where the coupling between neurons undergoes spike-timing dependent modiﬁcation. [sent-26, score-0.747]

19 Post-synaptic spiking is modeled as a doubly stochastic Poisson process with a conditional intensity that depends on the neuron’s own history and coupling to a pre-synaptic neuron. [sent-27, score-0.601]

20 We consider the case where the strength of the coupling changes over time, depending on the relative timing of pre- and post-synaptic spikes through a modiﬁcation function. [sent-28, score-0.75]

21 B) As the synaptic strength changes over time, the inﬂuence of the pre-synaptic neuron on the post-synaptic neuron changes. [sent-29, score-0.742]

22 Insets illustrate two points in time where synaptic strength is low (left) and high (right), respectively. [sent-30, score-0.383]

23 2 Methods Many studies have examined nonstationarity in neural systems, including for decoding [13], unitary event detection [14], and assessing statistical dependencies between neurons [15]. [sent-32, score-0.25]

24 Here we focus speciﬁcally on non-stationarity in coupling between neurons due to spike-timing dependent modiﬁcation of synapses. [sent-33, score-0.62]

25 Our aim is to provide a framework for inferring spike-timing dependent modiﬁcation functions from spike train data alone. [sent-34, score-0.328]

26 We ﬁrst present a generative model for spike trains where neurons are undergoing STDP. [sent-35, score-0.441]

27 We then present two methods for estimating spike-timing dependent modiﬁcation functions from spike train data: a direct method based on a time-varying generalized linear model (GLM) and an indirect method based on point-process adaptive ﬁltering. [sent-36, score-0.4]

28 1 A generative model for coupling with spike-timing dependent modiﬁcation While STDP has traditionally been modeled using integrate-and-ﬁre neurons [4, 16], here we model neurons undergoing STDP using a simple rate model of coupling between neurons, a linearnonlinear-Poisson (LNP) model. [sent-38, score-1.255]

29 In our LNP model, the conditional intensity (instantaneous ﬁring rate) of a neuron is given by a linear combination of covariates passed through a nonlinearity. [sent-39, score-0.271]

30 The covariates driving variations in the neuron’s ﬁring rate can depend on the past spiking history of the neuron, the past spiking history of other neurons (coupling), as well as any external covariates such as visual stimuli [10] or hand movement [12]. [sent-41, score-0.803]

31 α0 deﬁnes a baseline ﬁring rate, which is modulated by both the neuron’s own spike history from t−τ to t, npost (t−τ : t), and the history of the pre-synaptic neuron npre (t − τ : t) (together abbreviated as Ht ). [sent-43, score-0.679]

32 Here we have assumed that the post-spike history and coupling effects are mapped into a smooth basis by a set of functions fi and then weighted by a set of post-spike coefﬁcients α and a set of coupling 2 coefﬁcients β. [sent-44, score-0.898]

33 This model has been used extensively over the past few years to model coupling between neurons [10, 12]. [sent-46, score-0.596]

34 Here we consider the case where the coupling strength can vary over time, and particularly as a function of precise timing between pre- and post-synaptic spikes. [sent-50, score-0.542]

35 Here, for simplicity, we have re-written the stable post-spike history and coupling terms in matrix form. [sent-52, score-0.564]

36 The vector X s (t) summarizes the post-spike history covariates at time t while X c (t) summarizes the covariates related to the history of the pre-synaptic neuron. [sent-53, score-0.402]

37 In this model, the synaptic weight w(t) simply acts to scale the stable coupling deﬁned by β, and we update w(t) such that every pre-post spike pair alters the synaptic weight independently following the second spike. [sent-54, score-1.294]

38 A synaptic weight determining the strength of coupling between the two neurons changes over time depending on the relative spike-timing (Fig 1A). [sent-56, score-1.114]

39 This creates a sharp boundary where the synapse is strengthened whenever pre-synaptic spikes appear to cause postsynaptic spikes and weakened when post-synaptic spikes do not immediately proceed pre-synaptic spikes. [sent-60, score-0.378]

40 The sharp causal boundary in the classical doubleexponential tends to drive synaptic weights either towards a maximum or to zero. [sent-65, score-0.267]

41 By adding noise 3 to tpre − tpost , this causal boundary can be smoothed and weight distributions become stable [17]. [sent-66, score-0.685]

42 Here we add Gaussian noise to (3) such that (tpre − tpost ) = (tpre − tpost ) + , ∼ N (0, σ 2 ). [sent-67, score-0.524]

43 It is important to note that, unlike more common integrate-and-ﬁre models of STDP, these modiﬁcation function do not describe a change in the magnitude of post synaptic potentials (PSPs). [sent-68, score-0.267]

44 This generative model includes two distinct components: a GLM that deﬁnes the stationary ﬁring properties of the post-synaptic neuron and a modiﬁcation function that deﬁnes how the coupling between the pre- and post-synaptic neuron changes over time as a function of relative spike timing. [sent-72, score-1.005]

45 In simulating isolated pairs of neurons, each of the modiﬁcation functions described above induces large variations in the synaptic weight. [sent-73, score-0.325]

46 For the sake of stable simulation we add an additional longtimescale forgetting factor that pushes the synaptic weights back to 1. [sent-74, score-0.374]

47 Namely, w(t + ∆t) = w(t) − w(t) − ∆t τf (w(t) ∆t τf (w(t) − 1) + ∆w(tpre − tpost ) if npre or npost = 1 − 1) otherwise (5) where, here, we use τf = 60s. [sent-75, score-0.408]

48 The next sections describe two methods for estimating time-varying synaptic strength as well as STDP modiﬁcation functions from spike train data. [sent-76, score-0.626]

49 2 Point-process adaptive ﬁltering of coupling strength Several recent studies have examined the possibility that the tuning properties of neurons may drift over time. [sent-78, score-0.815]

50 Here we use this approach to track variations in coupling strength between two neurons over time. [sent-82, score-0.686]

51 Given a new spike count observation nt , we then integrate this prior information with the likelihood to obtain the posterior. [sent-87, score-0.245]

52 Here, the state-space variable is coupling strength, 4 and stable components of the model, such as post-spike history effects, are summarized with ct . [sent-90, score-0.564]

53 In the analysis that follows we will reduce the problem to a single dimension, where the shape of coupling is ﬁxed during training, and we apply the point-process adaptive ﬁlter to a single coefﬁcient for the covariate X (t) = Xc (t)β. [sent-93, score-0.496]

54 Given an estimate of the time-varying synaptic weight w(t), we can then estimate the modiﬁcation function ∆w(tpre − ˆ ˆ tpost ) by correlating the estimated changes in w(t) with the relative spike timings that we observe. [sent-95, score-0.912]

55 3 Inferring STDP with a nonparametric, generalized bilinear model Point-process adaptive ﬁltering allows us to track noisy changes in coupling strength over time. [sent-97, score-0.749]

56 Speciﬁcally, we model the modiﬁcation function non-parametrically by generating covariates W that depend on the relative spike timing. [sent-100, score-0.329]

57 In addition to estimates of the post-spike history and coupling ﬁlters, the GBLM thus provides a non-parametric approximation to the modiﬁcation function and explicitly accounts for spike-timing dependent modiﬁcation of the coupling strength. [sent-111, score-0.927]

58 We simulated a presynaptic neuron as a homogeneous Poisson process with a ﬁring rate of 5Hz, and the post-synaptic neuron as a conditionally Poisson process with a baseline ﬁring rate of 5Hz. [sent-113, score-0.377]

59 Through the GBLM, the post-synaptic neuron’s ﬁring rate is affected by its own post-spike history as well as the activity of the pre-synaptic neuron (modeled using 5 raised cosine basis functions [10]). [sent-114, score-0.298]

60 However, as STDP occurs the strength of coupling between the neurons changes according to one of three modiﬁcation functions: a double-exponential, a mexican-hat, or a smoothed double-exponential (Fig 2). [sent-115, score-0.8]

61 We ﬁnd that both point-process adaptive ﬁltering and the generalized bilinear model are able to accurately reconstruct the time-varying synaptic weight for each type of modiﬁcation function (Fig 2, left). [sent-116, score-0.505]

62 Spikes were simulated from two neurons whose coupling varied over time, depending on the relative timing of pre- and post-synaptic spikes. [sent-119, score-0.696]

63 Using two distinct methods (point-process adaptive ﬁltering and the GBLM) we estimated the time-varying coupling strength and modiﬁcation function from simulated spike train data. [sent-120, score-0.918]

64 The post-spike history and coupling terms are shown at left for the GBLM as multiplicative gains exp(β). [sent-123, score-0.488]

65 Error bars denote standard errors for the post-spike and coupling ﬁlters and 95% conﬁdence intervals for the modiﬁcation function estimates. [sent-124, score-0.415]

66 synaptic weight following the observations nt , this is not entirely unsurprising. [sent-125, score-0.339]

67 Changes in coupling strength are only detected by the ﬁlter after they have occurred and become evident in the spiking of the post-synaptic neuron. [sent-126, score-0.589]

68 In contrast to the GBLM, there is a substantial delay between changes in the true synaptic weight and those estimated by the adaptive ﬁlter. [sent-127, score-0.53]

69 In this case, we ﬁnd that the accuracy of the adaptive ﬁlter follows changes in the synaptic weight approximately exponentially with τ ∼ 25ms (Fig 3B). [sent-128, score-0.503]

70 Here we simulated the standard double-exponential STDP model with several different effect-sizes, modifying A+ and A− and examining the estimation error in both w(t) and ∆w(tpre − tpost ) (Fig 3). [sent-131, score-0.319]

71 The three different effect-sizes simulated here used ˆ ˆ coupling kernels similar to Fig 2A and began with w(t) = 1. [sent-132, score-0.446]

72 In these situations Adaptive Filtering reconstructs both the synaptic weight (Fig 3E) and modiﬁcation function (Fig 3F) more accurately than the GBLM (Fig 3C,E). [sent-141, score-0.34]

73 However, once enough data is available maximum likelihood estimation of the GBLM out-performs both the stable coupling model and adaptive ﬁltering. [sent-142, score-0.6]

74 Here, the stable coupling model has an average cross-validated log likelihood ratio relative to a homogeneous Poisson process of 0. [sent-144, score-0.565]

75 Even in this controlled simulation the contribution of timevarying coupling is relatively small. [sent-147, score-0.389]

76 Filled circles denote updates of the stable coupling terms. [sent-156, score-0.465]

77 B) Cross-correlation between changes in the true synaptic weight and estimated weight for the GBLM and Adaptive Filter. [sent-160, score-0.473]

78 C,D) Correlation between the simulated and estimated synaptic weight (C) and modiﬁcation function (D) for the GBLM as a function of the recording length. [sent-163, score-0.459]

79 E,F) Correlation between the simulated and estimated synaptic weight and modiﬁcation function for Adaptive Filtering. [sent-164, score-0.401]

80 The GBLM (blue) over-ﬁts for small amounts of data, but eventually out-performs both the stable coupling model (gray) and Adaptive Filtering (red). [sent-167, score-0.465]

81 A) Log likelihood relative to a homogeneous Poisson process for each of four models: a stable GLM with only post-spike history (PSH), a stable GLM with PSH and coupling, the GBLM, and the Adaptive Filter. [sent-187, score-0.351]

82 D) The degree to which adding nonstationary coupling improves model accuracy does not appear to be related to coupling strength as measured by how much the PSH+Coupling model improves model accuracy over the PSH model. [sent-194, score-0.923]

83 Approximately 180 minutes of data from 83 neurons were collected (after spike sorting) during REM and NREM sleep. [sent-197, score-0.376]

84 For the GBLM τf determines the timescale of the spike-timing dependent covariates X w , while for adaptive ﬁltering τf deﬁnes the transition matrix F . [sent-199, score-0.259]

85 Analyzing the most strongly correlated 75 pairs of neurons during the 180 minute recording (2-fold cross-validation) we ﬁnd that the GBLM and Adaptive Filtering both increase model accuracy (Fig 4A). [sent-202, score-0.276]

86 Additionally, we ﬁnd that the increase in model accuracy provided by adding non-stationary coupling to the traditional, stable coupling GLM does not appear to be correlated with the strength of coupling itself. [sent-205, score-1.359]

87 4 Discussion Here we have presented two methods for estimating spike-timing dependent modiﬁcation functions from multiple spike train data: an indirect method based on point-process adaptive ﬁltering and a direct method using a generalized bilinear model. [sent-208, score-0.458]

88 We have shown that each of these methods is able to accurately reconstruct both ongoing ﬂuctuations in synaptic weight and modiﬁcation functions in simulation. [sent-209, score-0.361]

89 Rather, each neuron receives input from thousands of other neurons, inputs which may confound estimation of the coupling between a given pair. [sent-212, score-0.529]

90 It would be relatively straightforward to include multiple pre-synaptic neurons in the model using either stable coupling [6, 10] or time-varying, spike-timing dependent coupling. [sent-213, score-0.696]

91 These extra covariates should further improve spike prediction accuracy, and could, potentially, result in better estimation of STDP modiﬁcation functions. [sent-215, score-0.297]

92 Despite these caveats the statistical description of time-varying coupling presented here shows promise. [sent-216, score-0.389]

93 Although the neurons in vivo are not guaranteed to be anatomically connected and estimated coupling must be always be interpreted cautiously [11], including synaptic modiﬁcation terms does improve model accuracy on in vivo data. [sent-217, score-1.152]

94 For instance, recordings from hippocampal slice or dissociated neuronal cultures may reveal substantially more plasticity than in vivo cortical recordings and are less likely to be confounded by unobserved common-input. [sent-221, score-0.428]

95 By changing the functional form of the covariates included in the GBLM we may be able to distinguish between standard models of STDP where spike pairs are treated independently and other models such as those with self-normalization [16] or where spike triplets are considered [26]. [sent-224, score-0.529]

96 Ultimately, the framework presented here extends recent GLM-based approaches to modeling coupling between neurons to allow for time-varying coupling between neurons and, particularly, changes in coupling related to spike-timing dependent plasticity. [sent-225, score-1.658]

97 Regulation of synaptic efﬁcacy by coincidence of postsynaptic aps and epsps. [sent-234, score-0.305]

98 Spike timing-dependent synaptic depression in the in vivo barrel cortex of the rat. [sent-241, score-0.448]

99 A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. [sent-271, score-0.258]

100 Distributed synaptic modiﬁcation in neural networks induced by patterned stimulation. [sent-313, score-0.293]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gblm', 0.393), ('coupling', 0.389), ('stdp', 0.294), ('synaptic', 0.267), ('tpost', 0.262), ('tpre', 0.262), ('spike', 0.195), ('neurons', 0.181), ('modi', 0.148), ('vivo', 0.144), ('neuron', 0.14), ('strength', 0.116), ('fig', 0.115), ('adaptive', 0.107), ('covariates', 0.102), ('npost', 0.102), ('history', 0.099), ('spikes', 0.097), ('spiking', 0.084), ('changes', 0.079), ('plasticity', 0.078), ('stable', 0.076), ('psh', 0.073), ('glm', 0.067), ('motor', 0.063), ('recordings', 0.063), ('filtering', 0.063), ('ltering', 0.061), ('poisson', 0.061), ('recording', 0.058), ('bilinear', 0.058), ('simulated', 0.057), ('lnp', 0.055), ('ring', 0.054), ('modification', 0.051), ('weight', 0.05), ('dependent', 0.05), ('cation', 0.047), ('ms', 0.045), ('cultures', 0.044), ('lpsh', 0.044), ('npre', 0.044), ('homogeneous', 0.04), ('intracellular', 0.04), ('postsynaptic', 0.038), ('oisson', 0.038), ('activity', 0.038), ('pairs', 0.037), ('timing', 0.037), ('cortex', 0.037), ('hippocampal', 0.036), ('inferring', 0.035), ('smoothed', 0.035), ('deviance', 0.035), ('undergoing', 0.035), ('stevenson', 0.033), ('sem', 0.033), ('hebbian', 0.033), ('neuroscience', 0.033), ('relative', 0.032), ('filter', 0.032), ('rehabilitation', 0.031), ('forgetting', 0.031), ('effect', 0.031), ('generative', 0.03), ('electrode', 0.03), ('lf', 0.03), ('simulations', 0.03), ('intensity', 0.029), ('coup', 0.029), ('synapses', 0.029), ('nonstationary', 0.029), ('likelihood', 0.028), ('synapse', 0.028), ('neurophysiology', 0.028), ('tk', 0.027), ('estimated', 0.027), ('train', 0.027), ('bars', 0.026), ('past', 0.026), ('timestep', 0.026), ('sleeping', 0.026), ('kording', 0.026), ('neural', 0.026), ('lm', 0.025), ('en', 0.024), ('extracellularly', 0.024), ('rebesco', 0.024), ('solo', 0.024), ('lter', 0.023), ('accurately', 0.023), ('barbieri', 0.022), ('examined', 0.022), ('nt', 0.022), ('detect', 0.022), ('recorded', 0.022), ('functions', 0.021), ('eden', 0.021), ('unitary', 0.021), ('strengthened', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000014 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data

Author: Konrad Koerding, Ian Stevenson

2 0.322541 302 nips-2011-Variational Learning for Recurrent Spiking Networks

Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner

Abstract: We derive a plausible learning rule for feedforward, feedback and lateral connections in a recurrent network of spiking neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimental Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation conﬁrms the method’s applicability to learning both stationary and temporal spike patterns. 1

3 0.20947585 249 nips-2011-Sequence learning with hidden units in spiking neural networks

Author: Johanni Brea, Walter Senn, Jean-pascal Pfister

Abstract: We consider a statistical framework in which recurrent networks of spiking neurons learn to generate spatio-temporal spike patterns. Given biologically realistic stochastic neuronal dynamics we derive a tractable learning rule for the synaptic weights towards hidden and visible neurons that leads to optimal recall of the training sequences. We show that learning synaptic weights towards hidden neurons signiﬁcantly improves the storing capacity of the network. Furthermore, we derive an approximate online learning rule and show that our learning rule is consistent with Spike-Timing Dependent Plasticity in that if a presynaptic spike shortly precedes a postynaptic spike, potentiation is induced and otherwise depression is elicited.

4 0.20429656 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

Author: Bo Chen, David E. Carlson, Lawrence Carin

Abstract: Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractoryperiod violations, it allows the “appearance” and “disappearance” of neurons over time, and it models smooth variation in the spike statistics. 1

5 0.18892108 23 nips-2011-Active dendrites: adaptation to spike-based communication

Author: Balazs B. Ujfalussy, Máté Lengyel

Abstract: Computational analyses of dendritic computations often assume stationary inputs to neurons, ignoring the pulsatile nature of spike-based communication between neurons and the moment-to-moment ﬂuctuations caused by such spiking inputs. Conversely, circuit computations with spiking neurons are usually formalized without regard to the rich nonlinear nature of dendritic processing. Here we address the computational challenge faced by neurons that compute and represent analogue quantities but communicate with digital spikes, and show that reliable computation of even purely linear functions of inputs can require the interplay of strongly nonlinear subunits within the postsynaptic dendritic tree. Our theory predicts a matching of dendritic nonlinearities and synaptic weight distributions to the joint statistics of presynaptic inputs. This approach suggests normative roles for some puzzling forms of nonlinear dendritic dynamics and plasticity. 1

6 0.18631527 86 nips-2011-Empirical models of spiking in neural populations

7 0.15876812 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations

8 0.14249083 13 nips-2011-A blind sparse deconvolution method for neural spike identification

9 0.13000272 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

10 0.1192857 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

11 0.11751328 24 nips-2011-Active learning of neural response functions with Gaussian processes

12 0.11408048 89 nips-2011-Estimating time-varying input signals and ion channel states from a single voltage trace of a neuron

13 0.11148243 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm

14 0.10733129 219 nips-2011-Predicting response time and error rates in visual search

15 0.10084368 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

16 0.097333565 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

17 0.09727183 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models

18 0.090446733 75 nips-2011-Dynamical segmentation of single trials from population neural data

19 0.090415895 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

20 0.089049451 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.16), (1, 0.126), (2, 0.371), (3, -0.041), (4, 0.122), (5, 0.055), (6, -0.088), (7, -0.038), (8, -0.061), (9, 0.031), (10, 0.092), (11, 0.112), (12, -0.092), (13, 0.057), (14, -0.054), (15, -0.015), (16, -0.028), (17, -0.03), (18, -0.01), (19, 0.011), (20, 0.037), (21, 0.023), (22, 0.016), (23, 0.024), (24, -0.05), (25, -0.041), (26, -0.003), (27, -0.022), (28, 0.015), (29, 0.063), (30, 0.037), (31, -0.047), (32, 0.009), (33, 0.056), (34, -0.015), (35, -0.015), (36, 0.001), (37, -0.011), (38, 0.095), (39, -0.046), (40, 0.004), (41, 0.018), (42, -0.029), (43, -0.043), (44, -0.069), (45, 0.04), (46, -0.059), (47, 0.025), (48, -0.008), (49, -0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96726429 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data

Author: Konrad Koerding, Ian Stevenson

2 0.85769963 23 nips-2011-Active dendrites: adaptation to spike-based communication

Author: Balazs B. Ujfalussy, Máté Lengyel

3 0.83534551 302 nips-2011-Variational Learning for Recurrent Spiking Networks

Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner

4 0.75997877 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models

Author: Skander Mensi, Richard Naud, Wulfram Gerstner

Abstract: Variability in single neuron models is typically implemented either by a stochastic Leaky-Integrate-and-Fire model or by a model of the Generalized Linear Model (GLM) family. We use analytical and numerical methods to relate state-of-theart models from both schools of thought. First we ﬁnd the analytical expressions relating the subthreshold voltage from the Adaptive Exponential Integrate-andFire model (AdEx) to the Spike-Response Model with escape noise (SRM as an example of a GLM). Then we calculate numerically the link-function that provides the ﬁring probability given a deterministic membrane potential. We ﬁnd a mathematical expression for this link-function and test the ability of the GLM to predict the ﬁring probability of a neuron receiving complex stimulation. Comparing the prediction performance of various link-functions, we ﬁnd that a GLM with an exponential link-function provides an excellent approximation to the Adaptive Exponential Integrate-and-Fire with colored-noise input. These results help to understand the relationship between the different approaches to stochastic neuron models. 1 Motivation When it comes to modeling the intrinsic variability in simple neuron models, we can distinguish two traditional approaches. One approach is inspired by the stochastic Leaky Integrate-and-Fire (LIF) hypothesis of Stein (1967) [1], where a noise term is added to the system of differential equations implementing the leaky integration to a threshold. There are multiple versions of such a stochastic LIF [2]. How the noise affects the ﬁring probability is also a function of the parameters of the neuron model. Therefore, it is important to take into account the reﬁnements of simple neuron models in terms of subthreshold resonance [3, 4], spike-triggered adaptation [5, 6] and non-linear spike 1 initiation [7, 5]. All these improvements are encompassed by the Adaptive Exponential Integrateand-Fire model (AdEx [8, 9]). The other approach is to start with some deterministic dynamics for the the state of the neuron (for instance the instantaneous distance from the membrane potential to the threshold) and link the probability intensity of emitting a spike with a non-linear function of the state variable. Under some conditions, this type of model is part of a greater class of statistical models called Generalized Linear Models (GLM [10]). As a single neuron model, the Spike Response Model (SRM) with escape noise is a GLM in which the state variable is explicitly the distance between a deterministic voltage and the threshold. The original SRM could account for subthreshold resonance, refractory effects and spike-frequency adaptation [11]. Mathematically similar models were developed independently in the study of the visual system [12] where spike-frequency adaptation has also been modeled [13]. Recently, this approach has retained increased attention since the probabilistic framework can be linked with the Bayesian theory of neural systems [14] and because Bayesian inference can be applied to the population of neurons [15]. In this paper, we investigate the similarity and differences between the state-of-the-art GLM and the stochastic AdEx. The motivation behind this work is to relate the traditional threshold neuron models to Bayesian theory. Our results extend the work of Plesser and Gerstner (2000) [16] since we include the non-linearity for spike initiation and spike-frequency adaptation. We also provide relationships between the parameters of the AdEx and the equivalent GLM. These precise relationships can be used to relate analog implementations of threshold models [17] to the probabilistic models used in the Bayesian approach. The paper is organized as follows: We ﬁrst describe the expressions relating the SRM state-variable to the parameters of the AdEx (Sect. 3.1) in the subthreshold regime. Then, we use numerical methods to ﬁnd the non-linear link-function that models the ﬁring probability (Sect. 3.2). We ﬁnd a functional form for the SRM link-function that best describes the ﬁring probability of a stochastic AdEx. We then compare the performance of this link-function with the often used exponential or linear-rectiﬁer link-functions (also called half-wave linear rectiﬁer) in terms of predicting the ﬁring probability of an AdEx under complex stimulus (Sect. 3.3). We ﬁnd that the exponential linkfunction yields almost perfect prediction. Finally, we explore the relations between the statistic of the noise and the sharpness of the non-linearity for spike initiation with the parameters of the SRM. 2 Presentation of the Models In this section we present the general formula for the stochastic AdEx model (Sect. 2.1) and the SRM (Sect 2.2). 2.1 The Stochastic Adaptive Exponential Integrate-and-Fire Model The voltage dynamics of the stochastic AdEx is given by: V −Θ ˙ τm V = El − V + ∆T exp − Rw + RI + R (1) ∆T τw w = a(V − El ) − w ˙ (2) where τm is the membrane time constant, El the reverse potential, R the membrane resistance, Θ is the threshold, ∆T is the shape factor and I(t) the input current which is chosen to be an Ornstein−Θ Uhlenbeck process with correlation time-constant of 5 ms. The exponential term ∆T exp( V∆T ) is a non-linear function responsible for the emission of spikes and is a diffusive white noise with standard deviation σ (i.e. ∼ N (0, σ)). Note that the diffusive white-noise does not imply white noise ﬂuctuations of the voltage V (t), the probability distribution of V (t) will depend on ∆T and Θ. The second variable, w, describes the subthreshold as well as the spike-triggered adaptation both ˆ parametrized by the coupling strength a and the time constant τw . Each time tj the voltage goes to inﬁnity, we assumed that a spike is emitted. Then the voltage is reset to a ﬁxed value Vr and w is increased by a constant value b. 2.2 The Generalized Linear Model In the SRM, The voltage V (t) is given by the convolution of the injected current I(t) with the membrane ﬁlter κ(t) plus the additional kernel η(t) that acts after each spikes (here we split the 2 spike-triggered kernel in two η(t) = ηv (t) + ηw (t) for reasons that will become clear later): V (t) = ˆ ˆ ηv (t − tj ) + ηw (t − tj ) El + [κ ∗ I](t) + (3) ˆ {tj } ˆ Then at each time tj a spike is emitted which results in a change of voltage described by η(t) = ηv (t) + ηw (t). Given the deterministic voltage, (Eq. 3) a spike is emitted according to the ﬁring intensity λ(V ): λ(t) = f (V (t)) (4) where f (·) is an arbitrary function called the link-function. Then the ﬁring behavior of the SRM depends on the choice of the link-function and its parameters. The most common link-function used to model single neuron activities are the linear-rectiﬁer and the exponential function. 3 Mapping In order to map the stochastic AdEx to the SRM we follow a two-step procedure. First we derive the ﬁlter κ(t) and the kernels ηv (t) and ηw (t) analytically as a function of AdEx parameters. Second, we derive the link-function of the SRM from the stochastic spike emission of the AdEx. Figure 1: Mapping of the subthreshold dynamics of an AdEx to an equivalent SRM. A. Membrane ﬁlter κ(t) for three different sets of parameters of the AdEx leading to over-damped, critically damped and under-damped cases (upper, middle and lower panel, respectively). B. Spike-Triggered η(t) (black), ηv (t) (light gray) and ηw (gray) for the three cases. C. Example of voltage trace produced when an AdEx is stimulated with a step of colored noise (black). The corresponding voltage from a SRM stimulated with the same current and where we forced the spikes to match those of the AdEx (red). D. Error in the subthreshold voltage (VAdEx − VGLM ) as a function of the mean voltage of the AdEx, for the three different cases: over-, critically and under-damped (light gray, gray and black, respectively) with ∆T = 1 mV. Red line represents the voltage threshold Θ. E. Root Mean Square Error (RMSE) ratio for the three cases with ∆T = 1 mV. The RMSE ratio is the RMSE between the deterministic VSRM and the stochastic VAdEx divided by the RMSE between repetitions of the stochastic AdEx voltage. The error bar shows a single standard deviation as the RMSE ratio is averaged accross multiple value of σ. 3.1 Subthreshold voltage dynamics We start by assuming that the non-linearity for spike initiation does not affect the mean subthreshold voltage of the stochastic AdEx (see Figure 1 D). This assumption is motivated by the small ∆T 3 observed in in-vitro recordings (from 0.5 to 2 mV [8, 9]) which suggest that the subthreshold dynamics are mainly linear except very close to Θ. Also, we expect that the non-linear link-function will capture some of the dynamics due to the non-linearity for spike initiation. Thus it is possible to rewrite the deterministic subthreshold part of the AdEx (Eq. 1-2 without and without ∆T exp((V − Θ)/∆T )) using matrices: ˙ x = Ax (5) with x = V w and A = − τ1 m a τw − gl1m τ − τ1 w (6) In this form, the dynamics of the deterministic AdEx voltage is a damped oscillator with a driving force. Depending on the eigenvalues of A the system could be over-damped, critically damped or under-damped. The ﬁlter κ(t) of the GLM is given by the impulse response of the system of coupled differential equations of the AdEx, described by Eq. 5 and 6. In other words, one has to derive the response of the system when stimulating with a Dirac-delta function. The type of damping gives three different qualitative shapes of the kernel κ(t), which are summarized in Table 3.1 and Figure 1 A. Since the three different ﬁlters also affect the nature of the stochastic voltage ﬂuctuations, we will keep the distinction between over-damped, critically damped and under-damped scenarios throughout the paper. This means that our approach is valid for at least 3 types of diffusive voltage-noise (i.e. the white noise in Eq. 1 ﬁltered by 3 different membrane ﬁlters κ(t)). To complete the description of the deterministic voltage, we need an expression for the spiketriggered kernels. The voltage reset at each spike brings a spike-triggered jump in voltage of magˆ nitude ∆ = Vr − V (t). This perturbation is superposed to the current ﬂuctuations due to I(t) and can be mediated by a Delta-diract pulse of current. Thus we can write the voltage reset kernel by: ηv (t) = ∆ ∆ [δ ∗ κ] (t) = κ(t) κ(0) κ(0) (7) where δ(t) is the Dirac-delta function. The shape of this kernel depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). Finally, the AdEx mediates spike-frequency adaptation by the jump of the second variables w. From Eq. 2 we can see that this produces a current wspike (t) = b exp (−t/τw ) that can cumulate over subsequent spikes. The effect of this current on voltage is then given by the convolution of wspike (t) with the membrane ﬁlter κ(t). Thus in the SRM framework the spike-frequency adaptation is taken into account by: ηw (t) = [wspike ∗ κ](t) (8) Again the precise form of ηw (t) depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). At this point, we would like to verify our assumption that the non-linearity for spike emission can be neglected. Fig. 1 C and D shows that the error between the voltage from Eq. 3 and the voltage from the stochastic AdEx is generally small. Moreover, we see that the main contribution to the voltage prediction error is due to the mismatch close to the spikes. However the non-linearity for spike initiation may change the probability distribution of the voltage ﬂuctuations, which in turn inﬂuences the probability of spiking. This will inﬂuence the choice of the link-function, as we will see in the next section. 3.2 Spike Generation Using κ(t), ηv (t) and ηw (t), we must relate the spiking probability of the stochastic AdEx as a function of its deterministic voltage. According to [2] the probability of spiking in time bin dt given the deterministic voltage V (t) is given by: p(V ) = prob{spike in [t, t + dt]} = 1 − exp (−f (V (t))dt) (9) where f (·) gives the ﬁring intensity as a function of the deterministic V (t) (Eq. 3). Thus to extract the link-function f we have to compute the probability of spiking given V (t) for our SRM. To do so we apply the method proposed by Jolivet et al. (2004) [18], where the probability of spiking is simply given by the distribution of the deterministic voltage estimated at the spike times divided by the distribution of the SRM voltage when there is no spike (see ﬁgure 2 A). One can numerically compute these two quantities for our models using N repetitions of the same stimulus. 4 Table 1: Analytical expressions for the membrane ﬁlter κ(t) in terms of the parameters of the AdEx for over-, critically-, and under-damped cases. Membrane Filter: κ(t) over-damped if: (τm + τw )2 > 4τm τw (gl +a) gl κ(t) = k1 eλ1 t + k2 eλ2 t λ1 = 1 2τm τw (−(τm + τw ) + critically-damped if: (τm + τw )2 = 4τm τw (gl +a) gl κ(t) = (αt + β)eλt λ= under-damped if: (τm + τw )2 < 4τm τw (gl +a) gl κ(t) = (k1 cos (ωt) + k2 sin (ωt)) eλt −(τm +τw ) 2τm τw λ= −(τm +τw ) 2τm τw (τm + τw )2 − 4 τm τw (gl + a) gl λ2 = 1 2τm τw (−(τm + τw ) − α= τm −τw 2Cτm τw ω= τw −τm 2τm τw 2 − a g l τm τw (τm + τw )2 − 4 τm τw (gl + a) gl k1 = −(1+(τm λ2 )) Cτm (λ1 −λ2 ) k2 = 1+(τm λ1 ) Cτm (λ1 −λ2 ) β= 1 C k1 = k2 = 1 C −(1+τm λ) Cωτm The standard deviation σ of the noise and the parameter ∆T of the AdEx non-linearity may affect the shape of the link-function. We thus extract p(V ) for different σ and ∆T (Fig. 2 B). Then using visual heuristics and previous knowledge about the potential analytical expression of the link-funtion, we try to ﬁnd a simple analytical function that captures p(V ) for a large range of combinations of σ and ∆T . We observed that the log(− log(p)) is close to linear in most studied conditions Fig. 2 B suggesting the following two distributions of p(V ): V − VT (10) p(V ) = 1 − exp − exp ∆V V − VT p(V ) = exp − exp − (11) ∆V Once we have p(V ), we can use Eq. 4 to obtain the equivalent SRM link-function, which leads to: −1 f (V ) = log (1 − p(V )) (12) dt Then the two potential link-functions of the SRM can be derived from Eq. 10 and Eq. 11 (respectively): V − VT f (V ) = λ0 exp (13) ∆V V − VT (14) f (V ) = −λ0 log 1 − exp − exp − ∆V 1 with λ0 = dt , VT the threshold of the SRM and ∆V the sharpness of the link-function (i.e. the parameters that governs the degree of the stochasticity). Note that the exact value of λ0 has no importance since it is redundant with VT . Eq. 13 is the standard exponential link-function, but we call Eq. 14 the log-exp-exp link-function. 3.3 Prediction The next point is to evaluate the ﬁt quality of each link-function. To do this, we ﬁrst estimate the parameters VT and ∆V of the GLM link-function that maximize the likelihood of observing a spike 5 Figure 2: SRM link-function. A. Histogram of the SRM voltage at the AdEx ﬁring times (red) and at non-ﬁring times (gray). The ratio of the two distributions gives p(V ) (Eq. 9, dashed lines). Inset, zoom to see the voltage histogram evaluated at the ﬁring time (red). B. log(− log(p)) as a function of the SRM voltage for three different noise levels σ = 0.07, 0.14, 0.18 nA (pale gray, gray, black dots, respectively) and ∆T = 1 mV. The line is a linear ﬁt corresponding to the log-exp-exp linkfunction and the dashed line corresponds to a ﬁt with the exponential link-function. C. Same data and labeling scheme as B, but plotting f (V ) according to Eq. 12. The lines are produced with Eq. 14 with parameters ﬁtted as described in B. and the dashed lines are produced with Eq. 13. Inset, same plot but on a semi-log(y) axis. train generated with an AdEx. Second we look at the predictive power of the resulting SRM in terms of Peri-Stimulus Time Histogram (PSTH). In other words we ask how close the spike trains generated with a GLM are from the spike train generated with a stochastic AdEx when both models are stimulated with the same input current. For any GLM with link-function f (V ) ≡ f (t|I, θ) and parameters θ regulating the shape of κ(t), ˆ ηv (t) and ηw (t), the Negative Log-Likelihood (NLL) of observing a spike-train {t} is given by:   NLL = − log(f (t|I, θ)) − f (t|I, θ) (15) t ˆ t It has been shown that the negative log-likelihood is convex in the parameters if f is convex and logconcave [19]. It is easy to show that a linear-rectiﬁer link-function, the exponential link-function and the log-exp-exp link-function all satisfy these conditions. This allows efﬁcient estimation of ˆ ˆ the optimal parameters VT and ∆V using a simple gradient descent. One can thus estimate from a single AdEx spike train the optimal parameters of a given link-function, which is more efﬁcient than the method used in Sect. 3.2. The minimal NLL resulting from the gradient descent gives an estimation of the ﬁt quality. A better estimate of the ﬁt quality is given by the distance between the PSTHs in response to stimuli not used for parameter ﬁtting . Let ν1 (t) be the PSTH of the AdEx, and ν2 (t) be the PSTH of the ﬁtted SRM, 6 Figure 3: PSTH prediction. A. Injected current. B. Voltage traces produced by an AdEx (black) and the equivalent SRM (red), when stimulated with the current in A. C. Raster plot for 20 realizations of AdEx (black tick marks) and equivalent SRM (red tick marks). D. PSTH of the AdEx (black) and the SRM (red) obtained by averaging 10,000 repetitions. E. Optimal log-likelihood for the three cases of the AdEx, using three different link-functions, a linear-rectiﬁer (light gray), an exponential link-function (gray) and the link-function deﬁned by Eq. 14 (dark gray), these values are obtained by averaging over 40 different combinations σ and ∆T (see Fig. 4). Error bars are one standard deviation, the stars denote a signiﬁcant difference, two-sample t-test with α = 0.01. F. same as E. but for Md (Eq. 16). then we use Md ∈ [0, 1] as a measure of match: Md = 2 2 (ν1 (t) − ν2 (t)) dt ν1 (t)2 dt + ν2 (t)2 dt (16) Md = 1 means that it is impossible to differentiate the SRM from the AdEx in terms of their PSTHs, whereas a Md of 0 means that the two PSTHs are completely different. Thus Md is a normalized similarity measure between two PSTHs. In practice, Md is estimated from the smoothed (boxcar average of 1 ms half-width) averaged spike train of 1 000 repetitions for each models. We use both the NLL and Md to quantify the ﬁt quality for each of the three damping cases and each of the three link-functions. Figure 3 shows the match between the stochastic AdEx used as a reference and the derived GLM when both are stimulated with the same input current (Fig. 3 A). The resulting voltage traces are almost identical (Fig. 3 B) and both models predict almost the same spike trains and so the same PSTHs (Fig. 3 C and D). More quantitalively, we see on Fig. 3 E and F, that the linear-rectiﬁer ﬁts signiﬁcantly worse than both the exponential and log-exp-exp link-functions, both in terms of NLL and of Md . The exponential link-function performs as well as the log-exp-exp link-function, with a spike train similarity measure Md being almost 1 for both. Finally the likelihood-based method described above gives us the opportunity to look at the relationship between the AdEx parameters σ and ∆T that governs its spike emission and the parameters VT and ∆V of the link-function (Fig. 4). We observe that an increase of the noise level produces a ﬂatter link-function (greater ∆V ) while an increase in ∆T also produces an increase in ∆V and VT (note that Fig. 4 shows ∆V and VT for the exponential link-function only, but equivalent results are obtained with the log-exp-exp link-function). 4 Discussion In Sect. 3.3 we have shown that it is possible to predict with almost perfect accuracy the PSTH of a stochastic AdEx model using an appropriate set of parameters in the SRM. Moreover, since 7 Figure 4: Inﬂuence of the AdEx parameters on the parameters of the exponential link-function. A. VT as a function of ∆T and σ. B. ∆V as a function of ∆T and σ. the subthreshold voltage of the AdEx also gives a good match with the deterministic voltage of the SRM, we expect that the AdEx and the SRM will not differ in higher moments of the spike train probability distributions beyond the PSTH. We therefore conclude that diffusive noise models of the type of Eq. 1-2 are equivalent to GLM of the type of Eq. 3-4. Once combined with similar results on other types of stochastic LIF (e.g. correlated noise), we could bridge the gap between the literature on GLM and the literature on diffusive noise models. Another noteworthy observation pertains to the nature of the link-function. The link-function has been hypothesized to be a linear-rectiﬁer, an exponential, a sigmoidal or a Gaussian [16]. We have observed that for the AdEx the link-function follows Eq. 14 that we called the log-exp-exp linkfunction. Although the link-function is log-exp-exp for most of the AdEx parameters, the exponential link-function gives an equivalently good prediction of the PSTH. This can be explained by the fact that the difference between log-exp-exp and exponential link-functions happens mainly at low voltage (i.e. far from the threshold), where the probability of emitting a spike is so low (Figure 2 C, until -50 mv). Therefore, even if the exponential link-function overestimates the ﬁring probability at these low voltages it rarely produces extra spikes. At voltages closer to the threshold, where most of the spikes are emitted, the two link-functions behave almost identically and hence produce the same PSTH. The Gaussian link-function can be seen as lying in-between the exponential link-function and the log-exp-exp link-function in Fig. 2. This means that the work of Plesser and Gerstner (2000) [16] is in agreement with the results presented here. The importance of the time-derivative of the ˙ voltage stressed by Plesser and Gerstner (leading to a two-dimensional link-function f (V, V )) was not studied here to remain consistent with the typical usage of GLM in neural systems [14]. Finally we restricted our study to exponential non-linearity for spike initiation and do not consider other cases such as the Quadratic Integrate-and-ﬁre (QIF, [5]) or other polynomial functional shapes. We overlooked these cases for two reasons. First, there are many evidences that the non-linearity in neurons (estimated from in-vitro recordings of Pyramidal neurons) is well approximated by a single exponential [9]. Second, the exponential non-linearity of the AdEx only affects the subthreshold voltage at high voltage (close to threshold) and thus can be neglected to derive the ﬁlters κ(t) and η(t). Polynomial non-linearities on the other hand affect a larger range of the subthreshold voltage so that it would be difﬁcult to justify the linearization of subthreshold dynamics essential to the method presented here. References [1] R. B. Stein, “Some models of neuronal variability,” Biophys J, vol. 7, no. 1, pp. 37–68, 1967. [2] W. Gerstner and W. Kistler, Spiking neuron models. Cambridge University Press New York, 2002. [3] E. Izhikevich, “Resonate-and-ﬁre neurons,” Neural Networks, vol. 14, no. 883-894, 2001. [4] M. J. E. Richardson, N. Brunel, and V. Hakim, “From subthreshold to ﬁring-rate resonance,” Journal of Neurophysiology, vol. 89, pp. 2538–2554, 2003. 8 [5] E. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, pp. 1569–1572, 2003. [6] S. Mensi, R. Naud, M. Avermann, C. C. H. Petersen, and W. Gerstner, “Parameter extraction and classiﬁcation of three neuron types reveals two different adaptation mechanisms,” Under review. [7] N. Fourcaud-Trocme, D. Hansel, C. V. Vreeswijk, and N. Brunel, “How spike generation mechanisms determine the neuronal response to ﬂuctuating inputs,” Journal of Neuroscience, vol. 23, no. 37, pp. 11 628–11 640, 2003. [8] R. Brette and W. Gerstner, “Adaptive exponential integrate-and-ﬁre model as an effective description of neuronal activity,” Journal of Neurophysiology, vol. 94, pp. 3637–3642, 2005. [9] L. Badel, W. Gerstner, and M. Richardson, “Dependence of the spike-triggered average voltage on membrane response properties,” Neurocomputing, vol. 69, pp. 1062–1065, 2007. [10] P. McCullagh and J. A. Nelder, Generalized linear models, 2nd ed. Chapman & Hall/CRC, 1998, vol. 37. [11] W. Gerstner, J. van Hemmen, and J. Cowan, “What matters in neuronal locking?” Neural computation, vol. 8, pp. 1653–1676, 1996. [12] D. Hubel and T. Wiesel, “Receptive ﬁelds and functional architecture of monkey striate cortex,” Journal of Physiology, vol. 195, pp. 215–243, 1968. [13] J. Pillow, L. Paninski, V. Uzzell, E. Simoncelli, and E. Chichilnisky, “Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model,” Journal of Neuroscience, vol. 25, no. 47, pp. 11 003–11 013, 2005. [14] K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, Bayesian brain: Probabilistic approaches to neural coding. The MIT Press, 2007. [15] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge, “Bayesian inference for spiking neuron models with a sparsity prior,” in Advances in Neural Information Processing Systems, 2007. [16] H. Plesser and W. Gerstner, “Noise in integrate-and-ﬁre neurons: From stochastic input to escape rates,” Neural Computation, vol. 12, pp. 367–384, 2000. [17] J. Schemmel, J. Fieres, and K. Meier, “Wafer-scale integration of analog neural networks,” in Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, june 2008, pp. 431 –438. [18] R. Jolivet, T. Lewis, and W. Gerstner, “Generalized integrate-and-ﬁre models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy,” Journal of Neurophysiology, vol. 92, pp. 959–976, 2004. [19] L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Network: Computation in Neural Systems, vol. 15, pp. 243–262, 2004. 9

5 0.74704432 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm

Author: Julie Dethier, Paul Nuyujukian, Chris Eliasmith, Terrence C. Stewart, Shauki A. Elasaad, Krishna V. Shenoy, Kwabena A. Boahen

Abstract: Motor prostheses aim to restore function to disabled patients. Despite compelling proof of concept systems, barriers to clinical translation remain. One challenge is to develop a low-power, fully-implantable system that dissipates only minimal power so as not to damage tissue. To this end, we implemented a Kalman-ﬁlter based decoder via a spiking neural network (SNN) and tested it in brain-machine interface (BMI) experiments with a rhesus monkey. The Kalman ﬁlter was trained to predict the arm’s velocity and mapped on to the SNN using the Neural Engineering Framework (NEF). A 2,000-neuron embedded Matlab SNN implementation runs in real-time and its closed-loop performance is quite comparable to that of the standard Kalman ﬁlter. The success of this closed-loop decoder holds promise for hardware SNN implementations of statistical signal processing algorithms on neuromorphic chips, which may offer power savings necessary to overcome a major obstacle to the successful clinical translation of neural motor prostheses. ∗ Present: Research Fellow F.R.S.-FNRS, Systmod Unit, University of Liege, Belgium. 1 1 Cortically-controlled motor prostheses: the challenge Motor prostheses aim to restore function for severely disabled patients by translating neural signals from the brain into useful control signals for prosthetic limbs or computer cursors. Several proof of concept demonstrations have shown encouraging results, but barriers to clinical translation still remain. One example is the development of a fully-implantable system that meets power dissipation constraints, but is still powerful enough to perform complex operations. A recently reported closedloop cortically-controlled motor prosthesis is capable of producing quick, accurate, and robust computer cursor movements by decoding neural signals (threshold-crossings) from a 96-electrode array in rhesus macaque premotor/motor cortex [1]-[4]. This, and previous designs (e.g., [5]), employ versions of the Kalman ﬁlter, ubiquitous in statistical signal processing. Such a ﬁlter and its variants are the state-of-the-art decoder for brain-machine interfaces (BMIs) in humans [5] and monkeys [2]. While these recent advances are encouraging, clinical translation of such BMIs requires fullyimplanted systems, which in turn impose severe power dissipation constraints. Even though it is an open, actively-debated question as to how much of the neural prosthetic system must be implanted, we note that there are no reports to date demonstrating a fully implantable 100-channel wireless transmission system, motivating performing decoding within the implanted chip. This computation is constrained by a stringent power budget: A 6 × 6mm2 implant must dissipate less than 10mW to avoid heating the brain by more than 1◦ C [6], which is believed to be important for long term cell health. With this power budget, current approaches can not scale to higher electrode densities or to substantially more computer-intensive decode/control algorithms. The feasibility of mapping a Kalman-ﬁlter based decoder algorithm [1]-[4] on to a spiking neural network (SNN) has been explored off-line (open-loop). In these off-line tests, the SNN’s performance virtually matched that of the standard implementation [7]. These simulations provide conﬁdence that this algorithm—and others similar to it—could be implemented using an ultra-low-power approach potentially capable of meeting the severe power constraints set by clinical translation. This neuromorphic approach uses very-large-scale integrated systems containing microelectronic analog circuits to morph neural systems into silicon chips [8, 9]. These neuromorphic circuits may yield tremendous power savings—50nW per silicon neuron [10]—over digital circuits because they use physical operations to perform mathematical computations (analog approach). When implemented on a chip designed using the neuromorphic approach, a 2,000-neuron SNN network can consume as little as 100µW. Demonstrating this approach’s feasibility in a closed-loop system running in real-time is a key, non-incremental step in the development of a fully implantable decoding chip, and is necessary before proceeding with fabricating and implanting the chip. As noise, delay, and over-ﬁtting play a more important role in the closed-loop setting, it is not obvious that the SNN’s stellar open-loop performance will hold up. In addition, performance criteria are different in the closed-loop and openloop settings (e.g., time per target vs. root mean squared error). Therefore, a SNN of a different size may be required to meet the desired speciﬁcations. Here we present results and assess the performance and viability of the SNN Kalman-ﬁlter based decoder in real-time, closed-loop tests, with the monkey performing a center-out-and-back target acquisition task. To achieve closed-loop operation, we developed an embedded Matlab implementation that ran a 2,000-neuron version of the SNN in real-time on a PC. We achieved almost a 50-fold speed-up by performing part of the computation in a lower-dimensional space deﬁned by the formal method we used to map the Kalman ﬁlter on to the SNN. This shortcut allowed us to run a larger SNN in real-time than would otherwise be possible. 2 Spiking neural network mapping of control theory algorithms As reported in [11], a formal methodology, called the Neural Engineering Framework (NEF), has been developed to map control-theory algorithms onto a computational fabric consisting of a highly heterogeneous population of spiking neurons simply by programming the strengths of their connections. These artiﬁcial neurons are characterized by a nonlinear multi-dimensional-vector-to-spikerate function—ai (x(t)) for the ith neuron—with parameters (preferred direction, maximum ﬁring rate, and spiking-threshold) drawn randomly from a wide distribution (standard deviation ≈ mean). 2 Spike rate (spikes/s) Representation ˆ x → ai (x) → x = ∑i ai (x)φix ˜ ai (x) = G(αi φix · x + Jibias ) 400 Transformation y = Ax → b j (Aˆ ) x Aˆ = ∑i ai (x)Aφix x x(t) B' y(t) A' 200 0 −1 Dynamics ˙ x = Ax → x = h ∗ A x A = τA + I 0 Stimulus x 1 bk(t) y(t) B' h(t) x(t) A' aj(t) Figure 1: NEF’s three principles. Representation. 1D tuning curves of a population of 50 leaky integrate-and-ﬁre neurons. The neurons’ tuning curves map control variables (x) to spike rates (ai (x)); this nonlinear transformation is inverted by linear weighted decoding. G() is the neurons’ nonlinear current-to-spike-rate function. Transformation. SNN with populations bk (t) and a j (t) representing y(t) and x(t). Feedforward and recurrent weights are determined by B and A , as described next. Dynamics. The system’s dynamics is captured in a neurally plausible fashion by replacing integration with the synapses’ spike response, h(t), and replacing the matrices with A = τA + I and B = τB to compensate. The neural engineering approach to conﬁguring SNNs to perform arbitrary computations is underlined by three principles (Figure 1) [11]-[14]: Representation is deﬁned by nonlinear encoding of x(t) as a spike rate, ai (x(t))—represented by the neuron tuning curve—combined with optimal weighted linear decoding of ai (x(t)) to recover ˆ an estimate of x(t), x(t) = ∑i ai (x(t))φix , where φix are the decoding weights. Transformation is performed by using alternate decoding weights in the decoding operation to map transformations of x(t) directly into transformations of ai (x(t)). For example, y(t) = Ax(t) is represented by the spike rates b j (Aˆ (t)), where unit j’s input is computed directly from unit i’s x output using Aˆ (t) = ∑i ai (x(t))Aφix , an alternative linear weighting. x Dynamics brings the ﬁrst two principles together and adds the time dimension to the circuit. This principle aims at reuniting the control-theory and neural levels by modifying the matrices to render the system neurally plausible, thereby permitting the synapses’ spike response, h(t), (i.e., impulse ˙ response) to capture the system’s dynamics. For example, for h(t) = τ −1 e−t/τ , x = Ax(t) is realized by replacing A with A = τA + I. This so-called neurally plausible matrix yields an equivalent dynamical system: x(t) = h(t) ∗ A x(t), where convolution replaces integration. The nonlinear encoding process—from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji (x(t)), to a ﬁring rate, ai (x(t))—is speciﬁed as: ai (x(t)) = G(Ji (x(t))). (1) Here G is the neurons’ nonlinear current-to-spike-rate function, which is given by G(Ji (x)) = τ ref − τ RC ln (1 − Jth /Ji (x)) −1 , (2) for the leaky integrate-and-ﬁre model (LIF). The LIF neuron has two behavioral regimes: subthreshold and super-threshold. The sub-threshold regime is described by an RC circuit with time constant τ RC . When the sub-threshold soma voltage reaches the threshold, Vth , the neuron emits a spike δ (t −tn ). After this spike, the neuron is reset and rests for τ ref seconds (absolute refractory period) before it resumes integrating. Jth = Vth /R is the minimum input current that produces spiking. Ignoring the soma’s RC time-constant when specifying the SNN’s dynamics are reasonable because the neurons cross threshold at a rate that is proportional to their input current, which thus sets the spike rate instantaneously, without any ﬁltering [11]. The conversion from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji , is ˜ performed by assigning to the neuron a preferred direction, φix , in the stimulus space and taking the dot-product: ˜ Ji (x(t)) = αi φix · x(t) + Jibias , (3) 3 where αi is a gain or conversion factor, and Jibias is a bias current that accounts for background ˜ activity. For a 1D space, φix is either +1 or −1 (drawn randomly), for ON and OFF neurons, respectively. The resulting tuning curves are illustrated in Figure 1, left. The linear decoding process is characterized by the synapses’ spike response, h(t) (i.e., post-synaptic currents), and the decoding weights, φix , which are obtained by minimizing the mean square error. A single noise term, η, takes into account all sources of noise, which have the effect of introducing uncertainty into the decoding process. Hence, the transmitted ﬁring rate can be written as ai (x(t)) + ηi , where ai (x(t)) represents the noiseless set of tuning curves and ηi is a random variable picked from a zero-mean Gaussian distribution with variance σ 2 . Consequently, the mean square error can be written as [11]: E = 1 ˆ [x(t) − x(t)]2 2 x,η,t = 2 1 2 x(t) − ∑ (ai (x(t)) + ηi ) φix i (4) x,η,t where · x,η denotes integration over the range of x and η, the expected noise. We assume that the noise is independent and has the same variance for each neuron [11], which yields: E= where σ2 1 2 2 x(t) − ∑ ai (x(t))φix i x,t 1 + σ 2 ∑(φix )2 , 2 i (5) is the noise variance ηi η j . This expression is minimized by: N φix = ∑ Γ−1 ϒ j , ij (6) j with Γi j = ai (x)a j (x) x + σ 2 δi j , where δ is the Kronecker delta function matrix, and ϒ j = xa j (x) x [11]. One consequence of modeling noise in the neural representation is that the matrix Γ is invertible despite the use of a highly overcomplete representation. In a noiseless representation, Γ is generally singular because, due to the large number of neurons, there is a high probability of having two neurons with similar tuning curves leading to two similar rows in Γ. 3 Kalman-ﬁlter based cortical decoder In the 1960’s, Kalman described a method that uses linear ﬁltering to track the state of a dynamical system throughout time using a model of the dynamics of the system as well as noisy measurements [15]. The model dynamics gives an estimate of the state of the system at the next time step. This estimate is then corrected using the observations (i.e., measurements) at this time step. The relative weights for these two pieces of information are given by the Kalman gain, K [15, 16]. Whereas the Kalman gain is updated at each iteration, the state and observation matrices (deﬁned below)—and corresponding noise matrices—are supposed constant. In the case of prosthetic applications, the system’s state vector is the cursor’s kinematics, xt = y [veltx , velt , 1], where the constant 1 allows for a ﬁxed offset compensation. The measurement vector, yt , is the neural spike rate (spike counts in each time step) of 192 channels of neural threshold crossings. The system’s dynamics is modeled by: xt yt = Axt−1 + wt , = Cxt + qt , (7) (8) where A is the state matrix, C is the observation matrix, and wt and qt are additive, Gaussian noise sources with wt ∼ N (0, W) and qt ∼ N (0, Q). The model parameters (A, C, W and Q) are ﬁt with training data by correlating the observed hand kinematics with the simultaneously measured neural signals (Figure 2). For an efﬁcient decoding, we derived the steady-state update equation by replacing the adaptive Kalman gain by its steady-state formulation: K = (I + WCQ−1 C)−1 W CT Q−1 . This yields the following estimate of the system’s state: xt = (I − KC)Axt−1 + Kyt = MDT xt−1 + MDT yt , x y 4 (9) a Velocity (cm/s) Neuron 10 c 150 5 100 b 50 20 0 −20 0 0 x−velocity y−velocity 2000 4000 6000 8000 Time (ms) 10000 12000 1cm 14000 Trials: 0034-0049 Figure 2: Neural and kinematic measurements (monkey J, 2011-04-16, 16 continuous trials) used to ﬁt the standard Kalman ﬁlter model. a. The 192 cortical recordings fed as input to ﬁt the Kalman ﬁlter’s matrices (color code refers to the number of threshold crossings observed in each 50ms bin). b. Hand x- and y-velocity measurements correlated with the neural data to obtain the Kalman ﬁlter’s matrices. c. Cursor kinematics of 16 continuous trials under direct hand control. where MDT = (I − KC)A and MDT = K are the discrete time (DT) Kalman matrices. The steadyx y state formulation improves efﬁciency with little loss in accuracy because the optimal Kalman gain rapidly converges (typically less than 100 iterations). Indeed, in neural applications under both open-loop and closed-loop conditions, the difference between the full Kalman ﬁlter and its steadystate implementation falls to within 1% in a few seconds [17]. This simplifying assumption reduces the execution time for decoding a typical neuronal ﬁring rate signal approximately seven-fold [17], a critical speed-up for real-time applications. 4 Kalman ﬁlter with a spiking neural network To implement the Kalman ﬁlter with a SNN by applying the NEF, we ﬁrst convert Equation 9 from DT to continuous time (CT), and then replace the CT matrices with neurally plausible ones, which yields: x(t) = h(t) ∗ A x(t) + B y(t) , (10) where A = τMCT + I, B = τMCT , with MCT = MDT − I /∆t and MCT = MDT /∆t, the CT x y x x y y Kalman matrices, and ∆t = 50ms, the discrete time step; τ is the synaptic time-constant. The jth neuron’s input current (see Equation 3) is computed from the system’s current state, x(t), which is computed from estimates of the system’s previous state (ˆ (t) = ∑i ai (t)φix ) and current x y input (ˆ (t) = ∑k bk (t)φk ) using Equation 10. This yields: y ˜x J j (x(t)) = α j φ j · x(t) + J bias j ˜x ˆ ˆ = α j φ j · h(t) ∗ A x(t) + B y(t) ˜x = α j φ j · h(t) ∗ A + J bias j ∑ ai (t)φix + B ∑ bk (t)φky i + J bias j (11) k This last equation can be written in a neural network form: J j (x(t)) = h(t) ∗ ∑ ω ji ai (t) + ∑ ω jk bk (t) i + J bias j (12) k y ˜x ˜x where ω ji = α j φ j A φix and ω jk = α j φ j B φk are the recurrent and feedforward weights, respectively. 5 Efﬁcient implementation of the SNN In this section, we describe the two distinct steps carried out when implementing the SNN: creating and running the network. The ﬁrst step has no computational constraints whereas the second must be very efﬁcient in order to be successfully deployed in the closed-loop experimental setting. 5 x ( 1000 x ( = 1000 1000 = 1000 x 1000 b 1000 x 1000 1000 a Figure 3: Computing a 1000-neuron pool’s recurrent connections. a. Using connection weights requires multiplying a 1000×1000 matrix by a 1000 ×1 vector. b. Operating in the lower-dimensional state space requires multiplying a 1 × 1000 vector by a 1000 × 1 vector to get the decoded state, multiplying this state by a component of the A matrix to update it, and multiplying the updated state by a 1000 × 1 vector to re-encode it as ﬁring rates, which are then used to update the soma current for every neuron. Network creation: This step generates, for a speciﬁed number of neurons composing the network, x ˜x the gain α j , bias current J bias , preferred direction φ j , and decoding weight φ j for each neuron. The j ˜x preferred directions φ j are drawn randomly from a uniform distribution over the unit sphere. The maximum ﬁring rate, max G(J j (x)), and the normalized x-axis intercept, G(J j (x)) = 0, are drawn randomly from a uniform distribution on [200, 400] Hz and [-1, 1], respectively. From these two speciﬁcations, α j and J bias are computed using Equation 2 and Equation 3. The decoding weights j x φ j are computed by minimizing the mean square error (Equation 6). For efﬁcient implementation, we used two 1D integrators (i.e., two recurrent neuron pools, with each pool representing a scalar) rather than a single 3D integrator (i.e., one recurrent neuron pool, with the pool representing a 3D vector by itself) [13]. The constant 1 is fed to the 1D integrators as an input, rather than continuously integrated as part of the state vector. We also replaced the bk (t) units’ spike rates (Figure 1, middle) with the 192 neural measurements (spike counts in 50ms bins), y which is equivalent to choosing φk from a standard basis (i.e., a unit vector with 1 at the kth position and 0 everywhere else) [7]. Network simulation: This step runs the simulation to update the soma current for every neuron, based on input spikes. The soma voltage is then updated following RC circuit dynamics. Gaussian noise is normally added at this step, the rest of the simulation being noiseless. Neurons with soma voltage above threshold generate a spike and enter their refractory period. The neuron ﬁring rates are decoded using the linear decoding weights to get the updated states values, x and y-velocity. These values are smoothed with a ﬁlter identical to h(t), but with τ set to 5ms instead of 20ms to avoid introducing signiﬁcant delay. Then the simulation step starts over again. In order to ensure rapid execution of the simulation step, neuron interactions are not updated dix rectly using the connection matrix (Equation 12), but rather indirectly with the decoding matrix φ j , ˜x dynamics matrix A , and preferred direction matrix φ j (Equation 11). To see why this is more efﬁcient, suppose we have 1000 neurons in the a population for each of the state vector’s two scalars. Computing the recurrent connections using connection weights requires multiplying a 1000 × 1000 matrix by a 1000-dimensional vector (Figure 3a). This requires 106 multiplications and about 106 sums. Decoding each scalar (i.e., ∑i ai (t)φix ), however, requires only 1000 multiplications and 1000 sums. The decoded state vector is then updated by multiplying it by the (diagonal) A matrix, another 2 products and 1 sum. The updated state vector is then encoded by multiplying it with the neurons’ preferred direction vectors, another 1000 multiplications per scalar (Figure 3b). The resulting total of about 3000 operations is nearly three orders of magnitude fewer than using the connection weights to compute the identical transformation. To measure the speedup, we simulated a 2,000-neuron network on a computer running Matlab 2011a (Intel Core i7, 2.7-GHz, Mac OS X Lion). Although the exact run-times depend on the computing hardware and software, the run-time reduction factor should remain approximately constant across platforms. For each reported result, we ran the simulation 10 times to obtain a reliable estimate of the execution time. The run-time for neuron interactions using the recurrent connection weights was 9.9ms and dropped to 2.7µs in the lower-dimensional space, approximately a 3,500-fold speedup. Only the recurrent interactions beneﬁt from the speedup, the execution time for the rest of the operations remaining constant. The run-time for a 50ms network simulation using the recurrent connec6 Table 1: Model parameters Symbol max G(J j (x)) G(J j (x)) = 0 J bias j αj ˜x φj Range 200-400 Hz −1 to 1 Satisﬁes ﬁrst two Satisﬁes ﬁrst two ˜x φj = 1 Description Maximum ﬁring rate Normalized x-axis intercept Bias current Gain factor Preferred-direction vector σ2 τ RC j τ ref j τ PSC j 0.1 20 ms 1 ms 20 ms Gaussian noise variance RC time constant Refractory period PSC time constant tion weights was 0.94s and dropped to 0.0198s in the lower-dimensional space, a 47-fold speedup. These results demonstrate the efﬁciency the lower-dimensional space offers, which made the closedloop application of SNNs possible. 6 Closed-loop implementation An adult male rhesus macaque (monkey J) was trained to perform a center-out-and-back reaching task for juice rewards to one of eight targets, with a 500ms hold time (Figure 4a) [1]. All animal protocols and procedures were approved by the Stanford Institutional Animal Care and Use Committee. Hand position was measured using a Polaris optical tracking system at 60Hz (Northern Digital Inc.). Neural data were recorded from two 96-electrode silicon arrays (Blackrock Microsystems) implanted in the dorsal pre-motor and motor cortex. These recordings (-4.5 RMS threshold crossing applied to each electrode’s signal) yielded tuned activity for the direction and speed of arm movements. As detailed in [1], a standard Kalman ﬁlter model was ﬁt by correlating the observed hand kinematics with the simultaneously measured neural signals, while the monkey moved his arm to acquire virtual targets (Figure 2). The resulting model was used in a closed-loop system to control an on-screen cursor in real-time (Figure 4a, Decoder block). A steady-state version of this model serves as the standard against which the SNN implementation’s performance is compared. We built a SNN using the NEF methodology based on derived Kalman ﬁlter parameters mentioned above. This SNN was then simulated on an xPC Target (Mathworks) x86 system (Dell T3400, Intel Core 2 Duo E8600, 3.33GHz). It ran in closed-loop, replacing the standard Kalman ﬁlter as the decoder block in Figure 4a. The parameter values listed in Table 1 were used for the SNN implementation. We ensured that the time constants τiRC ,τiref , and τiPSC were smaller than the implementation’s time step (50ms). Noise was not explicitly added. It arose naturally from the ﬂuctuations produced by representing a scalar with ﬁltered spike trains, which has been shown to have effects similar to Gaussian noise [11]. For the purpose of computing the linear decoding weights (i.e., Γ), we modeled the resulting noise as Gaussian with a variance of 0.1. A 2,000-neuron version of the SNN-based decoder was tested in a closed-loop system, the largest network our embedded MatLab implementation could run in real-time. There were 1206 trials total among which 301 (center-outs only) were performed with the SNN and 302 with the standard (steady-state) Kalman ﬁlter. The block structure was randomized and interleaved, so that there is no behavioral bias present in the ﬁndings. 100 trials under hand control are used as a baseline comparison. Success corresponds to a target acquisition under 1500ms, with 500ms hold time. Success rates were higher than 99% on all blocks for the SNN implementation and 100% for the standard Kalman ﬁlter. The average time to acquire the target was slightly slower for the SNN (Figure 5b)—711ms vs. 661ms, respectively—we believe this could be improved by using more neurons in the SNN.1 The average distance to target (Figure 5a) and the average velocity of the cursor (Figure 5c) are very similar. 1 Off-line, the SNN performed better as we increased the number of neurons [7]. 7 a Neural Spikes b c BMI: Kalman decoder BMI: SNN decoder Decoder Cursor Velocity 1cm 1cm Trials: 2056-2071 Trials: 1748-1763 5 0 0 400 Time after Target Onset (ms) 800 Target acquisition time histogram 40 Mean cursor velocity 50 Standard Kalman filter 40 20 Hand 30 30 Spiking Neural Network 20 10 0 c Cursor Velocity (cm/s) b Mean distance to target 10 Percent of Trials (%) a Distance to Target (cm) Figure 4: Experimental setup and results. a. Data are recorded from two 96-channel silicon electrode arrays implanted in dorsal pre-motor and motor cortex of an adult male monkey performing a centerout-and-back reach task for juice rewards to one of eight targets with a 500ms hold time. b. BMI position kinematics of 16 continuous trials for the standard Kalman ﬁlter implementation. c. BMI position kinematics of 16 continuous trials for the SNN implementation. 10 0 500 1000 Target Acquire Time (ms) 1500 0 0 200 400 600 800 Time after Target Onset (ms) 1000 Figure 5: SNN (red) performance compared to standard Kalman ﬁlter (blue) (hand trials are shown for reference (yellow)). The SNN achieves similar results—success rates are higher than 99% on all blocks—as the standard Kalman ﬁlter implementation. a. Plot of distance to target vs. time both after target onset for different control modalities. The thicker traces represent the average time when the cursor ﬁrst enters the acceptance window until successfully entering for the 500ms hold time. b. Histogram of target acquisition time. c. Plot of mean cursor velocity vs. time. 7 Conclusions and future work The SNN’s performance was quite comparable to that produced by a standard Kalman ﬁlter implementation. The 2,000-neuron network had success rates higher than 99% on all blocks, with mean distance to target, target acquisition time, and mean cursor velocity curves very similar to the ones obtained with the standard implementation. Future work will explore whether these results extend to additional animals. As the Kalman ﬁlter and its variants are the state-of-the-art in cortically-controlled motor prostheses [1]-[5], these simulations provide conﬁdence that similar levels of performance can be attained with a neuromorphic system, which can potentially overcome the power constraints set by clinical applications. Our ultimate goal is to develop an ultra-low-power neuromorphic chip for prosthetic applications on to which control theory algorithms can be mapped using the NEF. As our next step in this direction, we will begin exploring this mapping with Neurogrid, a hardware platform with sixteen programmable neuromorphic chips that can simulate up to a million spiking neurons in real-time [9]. However, bandwidth limitations prevent Neurogrid from realizing random connectivity patterns. It can only connect each neuron to thousands of others if neighboring neurons share common inputs — just as they do in the cortex. Such columnar organization may be possible with NEF-generated networks if preferred directions vectors are assigned topographically rather than randomly. Implementing this constraint effectively is a subject of ongoing research. Acknowledgment This work was supported in part by the Belgian American Education Foundation(J. Dethier), Stanford NIH Medical Scientist Training Program (MSTP) and Soros Fellowship (P. Nuyujukian), DARPA Revolutionizing Prosthetics program (N66001-06-C-8005, K. V. Shenoy), and two NIH Director’s Pioneer Awards (DP1-OD006409, K. V. Shenoy; DPI-OD000965, K. Boahen). 8 References [1] V. Gilja, Towards clinically viable neural prosthetic systems, Ph.D. Thesis, Department of Computer Science, Stanford University, 2010, pp 19–22 and pp 57–73. [2] V. Gilja, P. Nuyujukian, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, A high-performance continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [3] P. Nuyujukian, V. Gilja, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, Generalization and robustness of a continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [4] V. Gilja, C.A. Chestek, I. Diester, J.M. Henderson, K. Deisseroth, and K.V. Shenoy, Challenges and opportunities for next-generation intra-cortically based neural prostheses, IEEE Transactions on Biomedical Engineering, 2011, in press. [5] S.P. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia, Journal of Neural Engineering, vol. 5, 2008, pp 455–476. [6] S. Kim, P. Tathireddy, R.A. Normann, and F. Solzbacher, Thermal impact of an active 3-D microelectrode array implanted in the brain, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 15, 2007, pp 493–501. [7] J. Dethier, V. Gilja, P. Nuyujukian, S.A. Elassaad, K.V. Shenoy, and K. Boahen, Spiking neural network decoder for brain-machine interfaces, IEEE Engineering in Medicine & Biology Society Conference on Neural Engineering, Cancun, Mexico, 2011, pp 396–399. [8] K. Boahen, Neuromorphic microchips, Scientiﬁc American, vol. 292(5), 2005, pp 56–63. [9] R. Silver, K. Boahen, S. Grillner, N. Kopell, and K.L. Olsen, Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools, Journal of Neuroscience, vol. 27(44), 2007, pp 11807– 11819. [10] J.V. Arthur and K. Boahen, Silicon neuron design: the dynamical systems approach, IEEE Transactions on Circuits and Systems, vol. 58(5), 2011, pp 1034-1043. [11] C. Eliasmith and C.H. Anderson, Neural engineering: computation, representation, and dynamics in neurobiological systems, MIT Press, Cambridge, MA; 2003. [12] C. Eliasmith, A uniﬁed approach to building and controlling spiking attractor networks, Neural Computation, vol. 17, 2005, pp 1276–1314. [13] R. Singh and C. Eliasmith, Higher-dimensional neurons explain the tuning and dynamics of working memory cells, The Journal of Neuroscience, vol. 26(14), 2006, pp 3667–3678. [14] C. Eliasmith, How to build a brain: from function to implementation, Synthese, vol. 159(3), 2007, pp 373–388. [15] R.E. Kalman, A new approach to linear ﬁltering and prediction problems, Transactions of the ASME– Journal of Basic Engineering, vol. 82(Series D), 1960, pp 35–45. [16] G. Welsh and G. Bishop, An introduction to the Kalman Filter, University of North Carolina at Chapel Hill Chapel Hill NC, vol. 95(TR 95-041), 1995, pp 1–16. [17] W.Q. Malik, W. Truccolo, E.N. Brown, and L.R. Hochberg, Efﬁcient decoding with steady-state Kalman ﬁlter in neural interface systems, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19(1), 2011, pp 25–34. 9

6 0.69183636 86 nips-2011-Empirical models of spiking in neural populations

7 0.67800993 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

8 0.67013603 249 nips-2011-Sequence learning with hidden units in spiking neural networks

9 0.65405631 85 nips-2011-Emergence of Multiplication in a Biophysical Model of a Wide-Field Visual Neuron for Computing Object Approaches: Dynamics, Peaks, & Fits

10 0.64959663 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations

11 0.58880484 13 nips-2011-A blind sparse deconvolution method for neural spike identification

12 0.58140981 75 nips-2011-Dynamical segmentation of single trials from population neural data

13 0.57863432 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

14 0.5167008 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

15 0.51613277 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

16 0.4768247 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

17 0.46824405 24 nips-2011-Active learning of neural response functions with Gaussian processes

18 0.46003574 89 nips-2011-Estimating time-varying input signals and ion channel states from a single voltage trace of a neuron

19 0.45747334 219 nips-2011-Predicting response time and error rates in visual search

20 0.42021543 87 nips-2011-Energetically Optimal Action Potentials

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.013), (4, 0.025), (18, 0.207), (20, 0.021), (26, 0.037), (31, 0.135), (33, 0.023), (43, 0.063), (45, 0.065), (57, 0.027), (65, 0.02), (74, 0.046), (83, 0.165), (84, 0.027), (99, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82909966 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data

Author: Konrad Koerding, Ian Stevenson

2 0.73920804 302 nips-2011-Variational Learning for Recurrent Spiking Networks

Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner

3 0.73808724 13 nips-2011-A blind sparse deconvolution method for neural spike identification

Author: Chaitanya Ekanadham, Daniel Tranchina, Eero P. Simoncelli

Abstract: We consider the problem of estimating neural spikes from extracellular voltage recordings. Most current methods are based on clustering, which requires substantial human supervision and systematically mishandles temporally overlapping spikes. We formulate the problem as one of statistical inference, in which the recorded voltage is a noisy sum of the spike trains of each neuron convolved with its associated spike waveform. Joint maximum-a-posteriori (MAP) estimation of the waveforms and spikes is then a blind deconvolution problem in which the coefﬁcients are sparse. We develop a block-coordinate descent procedure to approximate the MAP solution, based on our recently developed continuous basis pursuit method. We validate our method on simulated data as well as real data for which ground truth is available via simultaneous intracellular recordings. In both cases, our method substantially reduces the number of missed spikes and false positives when compared to a standard clustering algorithm, primarily by recovering overlapping spikes. The method offers a fully automated alternative to clustering methods that is less susceptible to systematic errors. 1

4 0.72213668 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

Author: Omar Z. Khan, Pascal Poupart, John-mark M. Agosta

Abstract: In this paper, we derive a method to reﬁne a Bayes network diagnostic model by exploiting constraints implied by expert decisions on test ordering. At each step, the expert executes an evidence gathering test, which suggests the test’s relative diagnostic value. We demonstrate that consistency with an expert’s test selection leads to non-convex constraints on the model parameters. We incorporate these constraints by augmenting the network with nodes that represent the constraint likelihoods. Gibbs sampling, stochastic hill climbing and greedy search algorithms are proposed to ﬁnd a MAP estimate that takes into account test ordering constraints and any data available. We demonstrate our approach on diagnostic sessions from a manufacturing scenario. 1 INTRODUCTION The problem of learning-by-example has the promise to create strong models from a restricted number of cases; certainly humans show the ability to generalize from limited experience. Machine Learning has seen numerous approaches to learning task performance by imitation, going back to some of the approaches to inductive learning from examples [14]. Of particular interest are problemsolving tasks that use a model to infer the source, or cause of a problem from a sequence of investigatory steps or tests. The speciﬁc example we adopt is a diagnostic task such as appears in medicine, electro-mechanical fault isolation, customer support and network diagnostics, among others. We deﬁne a diagnostic sequence as consisting of the assignment of values to a subset of tests. The diagnostic process embodies the choice of the best next test to execute at each step in the sequence, by measuring the diagnostic value among the set of available tests at each step, that is, the ability of a test to distinguish among the possible causes. One possible implementation with which to carry out this process, the one we apply, is a Bayes network [9]. As with all model-based approaches, provisioning an adequate model can be daunting, resulting in a “knowledge elicitation bottleneck.” A recent approach for easing the bottleneck grew out of the realization that the best time to gain an expert’s insight into the model structure is during the diagnostic process. Recent work in “QueryBased Diagnostics” [1] demonstrated a way to improve model quality by merging model use and model building into a single process. More precisely the expert can take steps to modify the network structure to add or remove nodes or links, interspersed within the diagnostic sequence. In this paper we show how to extend this variety of learning-by-example to include also reﬁnement of model parameters based on the expert’s choice of test, from which we determine constraints. The nature of these constraints, as shown herein, is derived from the value of the tests to distinguish causes, a value referred to informally as value of information [10]. It is the effect of these novel constraints on network parameter learning that is elucidated in this paper. ∗ J. M. Agosta is no longer afﬁliated with Intel Corporation 1 Conventional statistical learning approaches are not suited to this problem, since the number of cases available from diagnostic sessions is small, and the data from any case is sparse. (Only a fraction of the tests are taken.) But more relevant is that one diagnostic sequence from an expert user represents the true behavior expected of the model, rather than a noisy realization of a case generated by the true model. We adopt a Bayesian approach, which offers a principled way to incorporate knowledge (constraints and data, when available) and also consider weakening the constraints, by applying a likelihood to them, so that possibly conﬂicting constraints can be incorporated consistently. Sec. 2 reviews related work and Sec. 3 provides some background on diagnostic networks and model consistency. Then, Sec. 4 describes an augmented Bayesian network that incorporates constraints implied by an expert’s choice of tests. Some sampling techniques are proposed to ﬁnd the Maximum a posterior setting of the parameters given the constraints (and any data available). The approach is evaluated in Sec. 5 on synthetic data and a real world manufacturing diagnostic scenario. Finally, Sec. 6 discusses some future work. 2 RELATED WORK Parameter learning for Bayesian networks can be viewed as searching in a high-dimensional space. Adopting constraints on the parameters based on some domain knowledge is a way of pruning this search space and learning the parameters more efﬁciently, both in terms of data needed and time required. Qualitative probabilistic networks [17] allow qualitative constraints on the parameter space to be speciﬁed by experts. For instance, the inﬂuence of one variable on another, or the combined inﬂuence of multiple variables on another variable [5] leads to linear inequalities on the parameters. Wittig and Jameson [18] explain how to transform the likelihood of violating qualitative constraints into a penalty term to adjust maximum likelihood, which allows gradient ascent and Expectation Maximization (EM) to take into account linear qualitative constraints. Other examples of qualitative constraints include some parameters being larger than others, bounded in a range, within ϵ of each other, etc. Various proposals have been made that exploit such constraints. Altendorf et al. [2] provide an approximate technique based on constrained convex optimization for parameter learning. Niculescu et al. [15] also provide a technique based on constrained optimization with closed form solutions for different classes of constraints. Feelders [6] provides an alternate method based on isotonic regression while Liao and Ji [12] combine gradient descent with EM. de Campos and Ji [4] also use constrained convex optimization, however, they use Dirichlet priors on the parameters to incorporate any additional knowledge. Mao and Lebanon [13] also use Dirichlet priors, but they use probabilistic constraints to allow inaccuracies in the speciﬁcation of the constraints. A major difference between our technique and previous work is on the type of constraints. Our constraints do not need to be explicitly speciﬁed by an expert. Instead, we passively observe the expert and learn from what choices are made and not made [16]. Furthermore, as we shall show later, our constraints are non-convex, preventing the direct application of existing techniques that assume linear or convex functions. We use Beta priors on the parameters, which can easily be extended to Dirichlet priors like previous work. We incorporate constraints in an augmented Bayesian network, similar to Liang et al. [11], though their constraints are on model predictions as opposed to ours which are on the parameters of the network. Finally, we also use the notion of probabilistic constraints to handle potential mistakes made by experts. 3 3.1 BACKGROUND DIAGNOSTIC BAYES NETWORKS We consider the class of bipartite Bayes networks that are widely used as diagnostic models, though our approach can be used for networks with any structure. The network forms a sparse, directed, causal graph, where arcs go from causes to observable node variables. We use upper case to denote random variables; C for causes, and T for observables (tests). Lower case letters denote values in the domain of a variable, e.g. c ∈ dom(C) = {c, c}, and bold letters denote sets of variables. A ¯ set of marginally independent binary-valued node variables C with distributions Pr(C) represent unobserved causes, and condition the remaining conditionally independent binary-valued test vari2 able nodes T. Each cause conditions one or more tests; likewise each test is conditioned by one or more causes, resulting in a graph with one or more possibly multiply-connected components. The test variable distributions Pr(T |C) incorporate the further modeling assumption of Independence of Causal Inﬂuence, the most familiar example being the Noisy-Or model [8]. To keep the exposition simple, we assume that all variables are binary and that conditional distributions are parametrized by the Noisy-Or; however, the algorithms described in the rest of the paper generalize to any discrete non-binary variable models. Conventionally, unobserved tests are ranked in a diagnostic Bayes network by their Value Of Information (VOI) conditioned on tests already observed. To be precise, VOI is the expected gain in utility if the test were to be observed. The complete computation requires a model equivalent to a partially observable Markov decision process. Instead, VOI is commonly approximated by a greedy computation of the Mutual Information between a test and the set of causes [3]. In this case, it is easy to show that Mutual Information is in turn well approximated to second order by the Gini impurity [7] as shown in Equation 1. ] [∑ ∑ GI(C|T ) = Pr(T = t) Pr(C = c|T = t)(1 − Pr(C = c|T = t)) (1) t c We will use the Gini measure as a surrogate for VOI, as a way to rank the best next test in the diagnostic sequence. 3.2 MODEL CONSISTENCY A model that is consistent with an expert would generate Gini impurity rankings consistent with the expert’s diagnostic sequence. We interpret the expert’s test choices as implying constraints on Gini impurity rankings between tests. To that effect, [1] deﬁnes the notion of Cause Consistency and Test Consistency, which indicate whether the cause and test orderings induced by the posterior distribution over causes and the VOI of each test agree with an expert’s observed choice. Assuming that the expert greedily chooses the most informative test T ∗ (i.e., test that yields the lowest Gini impurity) at each step, then the model is consistent with the expert’s choices when the following constraints are satisﬁed: GI(C|T ∗ ) ≤ GI(C|Ti ) ∀i (2) We demonstrate next how to exploit these constraints to reﬁne the Bayes network. 4 MODEL REFINEMENT Consider a simple diagnosis example with two possible causes C1 and C2 and two tests T1 and T2 as shown in Figure 1. To keep the exposition simple, suppose that the priors for each cause are known (generally separate data is available to estimate these), but the conditional distribution of each test is unknown. Using the Noisy-OR parameterizations for the conditional distributions, the number of parameters are linear in the number of parents instead of exponential. ∏ i i Pr(Ti = true|C) = 1 − (1 − θ0 ) (1 − θj ) (3) j|Cj =true i Here, θ0 = Pr(Ti = true|Cj = f alse ∀j) is the leak probability that Ti will be true when none of i the causes are true and θj = Pr(Ti = true|Cj = true, Ck = f alse ∀k ̸= j) is the link reliability, which indicates the independent contribution of cause Cj to the probability that test Ti will be true. In the rest of this section, we describe how to learn the θ parameters while respecting the constraints implied by test consistency. 4.1 TEST CONSISTENCY CONSTRAINTS Suppose that an expert chooses test T1 instead of test T2 during the diagnostic process. This ordering by the expert implies that the current model (parametrized by the θ’s) must be consistent with the constraint GI(C|T2 ) − GI(C|T1 ) ≥ 0. Using the deﬁnition of Gini impurity in Eq. 1, we can rewrite 3 Figure 1: Network with 2 causes and 2 tests Figure 2: Augmented network with parameters and constraints Figure 3: Augmented network extended to handle inaccurate feedback the constraint for the network shown in Fig. 1 as follows: ∑ t1 ( ∑ (Pr(t1 |c1 , c2 ) Pr(c1 ) Pr(c2 ))2 Pr(t1 ) − Pr(t1 ) c ,c 1 2 ) ( ) ∑ ∑ (Pr(t2 |c1 , c2 ) Pr(c1 ) Pr(c2 ))2 − Pr(t2 ) − ≥0 Pr(t2 ) t c ,c 2 1 2 (4) Furthermore, using the Noisy-Or encoding from Eq. 3, we can rewrite the constraint as a polynomial in the θ’s. This polynomial is non-linear, and in general, not concave. The feasible space may consist of disconnected regions. Fig. 4 shows the surface corresponding to the polynomial for the 2 1 i i case where θ0 = 0 and θ1 = 0.5 for each test i, which leaves θ2 and θ2 as the only free variables. The parameters’ feasible space, satisfying the constraint consists of the two disconnected regions where the surface is positive. 4.2 AUGMENTED BAYES NETWORK Our objective is to learn the θ parameters of diagnostic Bayes networks given test constraints of the form described in Eq. 4. To deal with non-convex constraints and disconnected feasible regions, we pursue a Bayesian approach whereby we explicitly model the parameters and constraints as random variables in an augmented Bayes network (see Fig. 2). This allows us to frame the problem of learning the parameters as an inference problem in a hybrid Bayes network of discrete (T, C, V ) and continuous (Θ) variables. As we will see shortly, this augmented Bayes network provides a unifying framework to simultaneously learn from constraints and data, to deal with possibly inconsistent constraints, and to express preferences over the degree of satisfaction of the constraints. We encode the constraint derived from the expert feedback as a binary random variable V in the Bayes network. If V is true the constraint is satisﬁed, otherwise it is violated. Thus, if V is true then Θ lies in the positive region of Fig. 4, and if V is f alse then Θ lies in the negative region. We model the CPT for V as Pr(V |Θ) = max(0, π), where π = GI(C|T1 ) − GI(C|T2 ). Note that the value of GI(C|T ) lies in the interval [0,1], so the probability π will always be normalized. The intuition behind this deﬁnition of the CPT for V is that a constraint is more likely to be satisﬁed if the parameters lie in the interior of the constraint region. We place a Beta prior over each Θ parameter. Since the test variables are conditioned on the Θ parameters that are now part of the network, their conditional distributions become known. For instance, the conditional distribution for Ti (given in Eq. 3) is fully deﬁned given the noisy-or parami eters θj . Hence the problem of learning the parameters becomes an inference problem to compute posteriors over the parameters given that the constraint is satisﬁed (and any data). In practice, it is more convenient to obtain a single value for the parameters instead of a posterior distribution since it is easier to make diagnostic predictions based on one Bayes network. We estimate the parameters by computing a maximum a posteriori (MAP) hypothesis given that the constraint is satisﬁed (and any data): Θ∗ = arg maxΘ Pr(Θ|V = true). 4 Algorithm 1 Pseudo Code for Gibbs Sampling, Stochastic Hill Climbing and Greedy Search 1 Fix observed variables, let V = true and randomly sample feasible starting state S 2 for i = 1 to #samples 3 for j = 1 to #hiddenV ariables 4 acceptSample = f alse; k = 0 5 repeat 6 Sample s′ from conditional of j th hidden variable Sj 7 S′ = S; Sj = s′ 8 if Sj is cause or test, then acceptSample = true 9 elseif S′ obeys constraints V∗ 10 if algo == Gibbs 11 Sample u from uniform distribution, U(0,1) p(S′ 12 if u < M q(S)′ ) where p and q are the true and proposal distributions and M > 1 13 acceptSample = true 14 elseif algo = = StochasticHillClimbing 15 if likelihood(S′ ) > likelihood(S), then acceptSample = true 16 elseif algo = = Greedy, then acceptSample = true 17 elseif algo = = Greedy 18 k = k+1 19 if k = = maxIterations, then s′ = Sj ; acceptSample = true 20 until acceptSample = = true 21 Sj = s′ 4.3 MAP ESTIMATION Previous approaches for parameter learning with domain knowledge include modiﬁed versions of EM or some other optimization techniques that account for linear/convex constraints on the parameters. Since our constraints are non-convex, we propose a new approach based on Gibbs sampling to approximate the posterior distribution, from which we compute the MAP estimate. Although the technique converges to the MAP in the limit, it may require excessive time. Hence, we modify Gibbs sampling to obtain more efﬁcient stochastic hill climbing and greedy search algorithms with anytime properties. The pseudo code for our Gibbs sampler is provided in Algorithm 1. The two key steps are sampling the conditional distributions of each variable (line 6) and rejection sampling to ensure that the constraints are satisﬁed (lines 9 and 12). We sample each variable given the rest according to the following distributions: ti ∼ Pr(Ti |c, θi ) ∀i cj ∼ Pr(Cj |c − cj , t, θ) ∝ ∏ Pr(Cj ) j ∏ (5) Pr(ti |c, θi ) ∀j i i θj ∼ Pr(Θi |Θ − Θi , t, c, v) ∝ Pr(v|t, Θ) j j ∏ Pr(ti |cj , θi ) ∀i, j (6) (7) i The tests and causes are easily sampled from the multinomials as described in the equations above. However, sampling the θ’s is more difﬁcult due to the factor Pr(v|Θ, t) = max(0, π), which is a truncated mixture of Betas. So, instead of sampling θ from its true conditional, we sample it from a proposal distribution that replaces max(0, π) by an un-truncated mixture of Betas equal to π + a where a is a constant that ensures that π + a is always positive. This is equivalent to ignoring the constraints. Then we ensure that the constraints are satisﬁed by rejecting the samples that violate the constraints. Once Gibbs sampling has been performed, we obtain a sample that approximates the posterior distribution over the parameters given the constraints (and any data). We return a single setting of the parameters by selecting the sampled instance with the highest posterior probability (i.e., MAP estimate). Since we will only return the MAP estimate, it is possible to speed up the search by modifying Gibbs sampling. In particular, we obtain a stochastic hill climbing algorithm by accepting a new sample only if its posterior probability improves upon that of the previous sample 5 Posterior Probability 0.1 0.08 Difference in Gini Impurity 0.1 0.05 0 −0.05 0.06 0.04 0.02 −0.1 1 0 1 1 0.8 0.5 0.6 0.8 0.4 Link Reliability of Test 2 and Cause 1 0 0.6 0.2 0 0.4 Link Reliability of Test 2 and Cause 2 Figure 4: Difference in Gini impurity for the network in 1 2 Fig. 1 when θ2 and θ2 are the only parameters allowed to vary. 0.2 Link Reliability of Test 2 and Cause 1 0 0 0.2 0.4 0.6 0.8 1 Link Reliability of Test 2 and Cause 1 Figure 5: Posterior over parameters computed through calculation after discretization. Figure 6: Posterior over parameters calculated through Sampling. (line 15). Thus, each iteration of the stochastic hill climber requires more time, but always improves the solution. As the number of constraints grows and the feasibility region shrinks, the Gibbs sampler and stochastic hill climber will reject most samples. We can mitigate this by using a Greedy sampler that caps the number of rejected samples, after which it abandons the sampling for the current variable to move on to the next variable (line 19). Even though the feasibility region is small overall, it may still be large in some dimensions, so it makes sense to try sampling another variable (that may have a larger range of feasible values) when it is taking too long to ﬁnd a new feasible value for the current variable. 4.4 MODEL REFINEMENT WITH INCONSISTENT CONSTRAINTS So far, we have assumed that the expert’s actions generate a feasible region as a consequence of consistent constraints. We handle inconsistencies by further extending our augmented diagnostic Bayes network. We treat the observed constraint variable, V , as a probabilistic indicator of the true constraint V ∗ as shown in Figure 3. We can easily extend our techniques for computing the MAP to cater for this new constraint node by sampling an extra variable. 5 EVALUATION AND EXPERIMENTS 5.1 EVALUATION CRITERIA Formally, for M ∗ , the true model that we aim to learn, the diagnostic process determines the choice of best next test as the one with the smallest Gini impurity. If the correct choice for the next test is known (such as demonstrated by an expert), we can use this information to include a constraint on the model. We denote by V+ the set of observed constraints and by V∗ the set of all possible constraints that hold for M ∗ . Having only observed V+ , our technique will consider any M + ∈ M+ as a possible true model, where M+ is the set of all models that obey V + . We denote by M∗ the set of all models that are diagnostically equivalent to M ∗ (i.e., obey V ∗ and would recommend the MAP same steps as M ∗ ) and by MV+ the particular model obtained by MAP estimation based on the MAP constraints V+ . Similarly, when a dataset D is available, we denote by MD the model obtained MAP by MAP estimation based on D and by MDV+ , the model based on D and V+ . Ideally we would like to ﬁnd the true underlying model M ∗ , hence we will report the KL divergence between the models found and M ∗ . However, other diagnostically equivalent M ∗ may recommend the same tests as M ∗ and thus have similar constraints, so we also report test consistency with M ∗ (i.e., # of recommended tests that are the same). 5.2 CORRECTNESS OF MODEL REFINEMENT Given V∗ , our technique for model adjustment is guaranteed to choose a model M MAP ∈ M∗ by construction. If any constraint V ∗ ∈ V∗ is violated, the rejection sampling step of our technique 6 100 Comparing convergence of Different Techniques 80 70 60 50 40 30 Data Only Constraints Only Data+Constraints 20 10 0 1 2 3 4 5 Number of constraints used 6 −10 −12 −14 −16 −18 7 −20 Figure 7: Mean KLdivergence and one standard deviation for a 3 cause 3 test network on learning with data, constraints and data+constraints. Gibbs Sampling Stochastic Hill Climbing Greedy Sampling −8 Negative Log Likelihood of MAP Estimate Percentage of tests correctly predicted 90 0 1 2 3 10 10 10 10 Elapsed Time (plotted on log scale from 0 to 1500 seconds) Figure 8: Test Consistency for a 3 cause 3 test network on learning with data, constraints and data+constraints. Figure 9: Convergence rate comparison. would reject that set of parameters. To illustrate this, consider the network in Fig. 2. There are six parameters (four link reliabilities and two leak parameters). Let us ﬁx the leak parameters and the link reliability from the ﬁrst cause to each test. Now we can compute the posterior surface over the two variable parameters after discretizing each parameter in small steps and then calculating the posterior probability at each step as shown in Fig. 5. We can compare this surface with that obtained after Gibbs sampling using our technique as shown in Fig. 6. We can see that our technique recovers the posterior surface from which we can compute the MAP. We obtain the same MAP estimate with the stochastic hill climbing and greedy search algorithms. 5.3 EXPERIMENTAL RESULTS ON SYNTHETIC PROBLEMS We start by presenting our results on a 3-cause by 3-test fully-connected bipartite Bayes network. We assume that there exists some M ∗ ∈ M∗ that we want to learn given V+ . We use our technique to ﬁnd M MAP . To evaluate M MAP , we ﬁrst compute the constraints, V∗ for M ∗ to get the feasible region associated with the true model. Next, we sample 100 other models from this feasible region that are diagnostically equivalent. We compare these models with M MAP (after collecting 200 samples with non-informative priors for the parameters). We compute the KL-divergence of M MAP with respect to each sampled model. We expect KLdivergence to decrease as the number of constraints in V+ increases since the feasible region beMAP comes smaller. Figure 7 conﬁrms this trend and shows that MDV+ has lower mean KL-divergence MAP MAP than MV+ , which has lower mean KL-divergence than MD . The data points in D are limited to the results of the diagnostic sessions needed to obtain V+ . As constraints increase, more data is available and so the results for the data-only approach also improve with increasing constraints. We also compare the test consistency when learning from data only, constraints only or both. Given a ﬁxed number of constraints, we enumerate the unobserved trajectories, and then compute the highest ranked test using the learnt model and the sampled true models, for each trajectory. The test consistency is reported as a percentage, with 100% consistency indicating that the learned and true models had the same highest ranked tests on every trajectory. Figure 8 presents these percentatges for the greedy sampling technique (the results are similar for the other techniques). It again appears that learning parameters with both constraints and data is better than learning with only constraints, which is most of the times better than learning with only data. Figure 9 compares the convergence rate of each technique to ﬁnd the MAP estimate. As expected, Stochastic Hill Climbing and Greedy Sampling take less time than Gibbs sampling to ﬁnd parameter settings with high posterior probability. 5.4 EXPERIMENTAL RESULTS ON REAL-WORLD PROBLEMS We evaluate our technique on a real-world diagnostic network collected and reported by Agosta et al. [1], where the authors collected detailed session logs over a period of seven weeks in which the 7 KL−divergence of when computing joint over all tests 8 Figure 10: Diagnostic Bayesian network collected from user trials and pruned to retain sub-networks with at least one constraint Data Only Constraints Only Data+Constraints 7 6 5 4 3 2 1 6 8 10 12 14 16 Number of constraints used 18 20 22 Figure 11: KL divergence comparison as the number of constraints increases for the real world problem. entire diagnostic sequence was recorded. The sequences intermingle model building and querying phases. The model network structure was inferred from an expert’s sequence of positing causes and tests. Test-ranking constraints were deduced from the expert’s test query sequences once the network structure is established. The 157 sessions captured over the seven weeks resulted in a Bayes network with 115 tests, 82 root causes and 188 arcs. The network consists of several disconnected sub-networks, each identiﬁed with a symptom represented by the ﬁrst test in the sequence, and all subsequent tests applied within the same subnet. There were 20 sessions from which we were able to observe trajectories with at least two tests, resulting in a total of 32 test constraints. We pruned our diagnostic network to remove the sub-networks with no constraints to get a Bayes network with 54 tests, 30 root causes, and 67 parameters divided in 7 sub-networks, as shown in Figure 10, on which we apply our model reﬁnement technique to learn the parameters for each sub-network separately. Since we don’t have the true underlying network and the full set of constraints (more constraints could be observed in future diagnostic sessions), we treated the 32 constraints as if they were V∗ and the corresponding feasible region M∗ as if it contained models diagnostically equivalent to the unknown true model. Figure 11 reports the KL divergence between the models found by our algorithms and sampled models from M∗ as we increase the number of constraints. With such limited constraints and consequently large feasible regions, it is not surprising that the variation in KL divergence is large. Again, the MAP estimate based on both the constraints and the data has lower KL divergence than constraints only and data only. 6 CONCLUSION AND FUTURE WORK In summary, we presented an approach that can learn the parameters of a Bayes network based on constraints implied by test consistency and any data available. While several approaches exist to incorporate qualitative constraints in learning procedures, our work makes two important contributions: First, this is the ﬁrst approach that exploits implicit constraints based on value of information assessments. Secondly it is the ﬁrst approach that can handle non-convex constraints. We demonstrated the approach on synthetic data and on a real-world manufacturing diagnostic problem. Since data is generally sparse in diagnostics, this work makes an important advance to mitigate the model acquisition bottleneck, which has prevented the widespread application of diagnostic networks so far. In the future, it would be interesting to generalize this work to reinforcement learning in applications where data is sparse, but constraints may be inferred from expert interactions. Acknowledgments This work was supported by a grant from Intel Corporation. 8 References [1] John Mark Agosta, Omar Zia Khan, and Pascal Poupart. Evaluation results for a query-based diagnostics application. In The Fifth European Workshop on Probabilistic Graphical Models (PGM 10), Helsinki, Finland, September 13–15 2010. [2] Eric E. Altendorf, Angelo C. Restiﬁcar, and Thomas G. Dietterich. Learning from sparse data by exploiting monotonicity constraints. In Proceedings of Twenty First Conference on Uncertainty in Artiﬁcial Intelligence (UAI), Edinburgh, Scotland, July 2005. [3] Brigham S. Anderson and Andrew W. Moore. Fast information value for graphical models. In Proceedings of Nineteenth Annual Conference on Neural Information Processing Systems (NIPS), pages 51–58, Vancouver, BC, Canada, December 2005. [4] Cassio P. de Campos and Qiang Ji. Improving Bayesian network parameter learning using constraints. In International Conference in Pattern Recognition (ICPR), Tampa, FL, USA, 2008. [5] Marek J. Druzdzel and Linda C. van der Gaag. Elicitation of probabilities for belief networks: combining qualitative and quantitative information. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 141–148, Montreal, QC, Canada, 1995. [6] Ad J. Feelders. A new parameter learning method for Bayesian networks with qualitative inﬂuences. In Proceedings of Twenty Third International Conference on Uncertainty in Artiﬁcial Intelligence (UAI), Vancouver, BC, July 2007. [7] Mara Angeles Gil and Pedro Gil. A procedure to test the suitability of a factor for stratiﬁcation in estimating diversity. Applied Mathematics and Computation, 43(3):221 – 229, 1991. [8] David Heckerman and John S. Breese. Causal independence for probability assessment and inference using bayesian networks. IEEE Systems, Man, and Cybernetics, 26(6):826–831, November 1996. [9] David Heckerman, John S. Breese, and Koos Rommelse. Decision-theoretic troubleshooting. Communications of the ACM, 38(3):49–56, 1995. [10] Ronald A. Howard. Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2(1):22–26, August 1966. [11] Percy Liang, Michael I. Jordan, and Dan Klein. Learning from measurements in exponential families. In Proceedings of Twenty Sixth Annual International Conference on Machine Learning (ICML), Montreal, QC, Canada, June 2009. [12] Wenhui Liao and Qiang Ji. Learning Bayesian network parameters under incomplete data with domain knowledge. Pattern Recognition, 42:3046–3056, 2009. [13] Yi Mao and Guy Lebanon. Domain knowledge uncertainty and probabilistic parameter constraints. In Proceedings of Twenty Fifth Conference on Uncertainty in Artiﬁcial Intelligence (UAI), Montreal, QC, Canada, 2009. [14] Ryszard S. Michalski. A theory and methodology of inductive learning. Artiﬁcial Intelligence, 20:111–116, 1984. [15] Radu Stefan Niculescu, Tom M. Mitchell, and R. Bharat Rao. Bayesian network learning with parameter constraints. Journal of Machine Learning Research, 7:1357–1383, 2006. [16] Mark A. Peot and Ross D. Shachter. Learning from what you dont observe. In Proceedings of the Fourteenth Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 439–446, Madison, WI, July 1998. [17] Michael P. Wellman. Fundamental concepts of qualitative probabilistic networks. Artiﬁcial Intelligence, 44(3):257–303, August 1990. [18] Frank Wittig and Anthony Jameson. Exploiting qualitative knowledge in the learning of conditional probabilities of Bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artiﬁcial Intelligence (UAI), San Francisco, CA, July 2000. 9

5 0.7086767 249 nips-2011-Sequence learning with hidden units in spiking neural networks

Author: Johanni Brea, Walter Senn, Jean-pascal Pfister

6 0.70750016 145 nips-2011-Learning Eigenvectors for Free

7 0.68033558 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations

8 0.67614627 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

9 0.66028363 75 nips-2011-Dynamical segmentation of single trials from population neural data

10 0.65930831 86 nips-2011-Empirical models of spiking in neural populations

11 0.64653295 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

12 0.641761 219 nips-2011-Predicting response time and error rates in visual search

13 0.63868719 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm

14 0.63706255 102 nips-2011-Generalised Coupled Tensor Factorisation

15 0.63557816 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

16 0.63485992 158 nips-2011-Learning unbelievable probabilities

17 0.63210797 23 nips-2011-Active dendrites: adaptation to spike-based communication

18 0.63057262 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

19 0.62987298 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

20 0.62983572 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning