nips nips2011 nips2011-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Dynamical segmentation of single trials from population neural data Biljana Petreska Gatsby Computational Neuroscience Unit University College London biljana@gatsby. [sent-1, score-0.195]
2 uk Abstract Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. [sent-15, score-0.482]
3 The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. [sent-18, score-0.218]
4 Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. [sent-20, score-0.522]
5 By studying the activity of these neurons in concert we may hope to gain insight not only into the computations performed by specific neurons, but also into the computations performed by the population as a whole. [sent-22, score-0.33]
6 The dynamics of such collective computations can be seen in the coordinated activity of all of the neurons within the local network; although each individual such neuron may reflect this coordinated component only noisily. [sent-23, score-0.596]
7 Thus, we hope to identify the computationallyrelevant network dynamics by purely statistical, unsupervised means—capturing the shared evolu1 tion through latent-variable state-space models [1, 2, 3, 4, 5, 6, 7, 8]. [sent-24, score-0.262]
8 In a similar way, noisy data from many neurons participating in a local network computation needs to be combined with the learned structure of that computation—embodied by a suitable statistical model—to reveal the progression of the computation. [sent-28, score-0.202]
9 Neural spiking activity is usually analysed by averaging across multiple experimental trials, to obtain a smooth estimate of the underlying firing rates [2, 3, 4, 5]. [sent-29, score-0.223]
10 In the case of a surprisingly long movement preparation time or a wrong decision, it becomes possible to identify the sources of error at the neural level. [sent-35, score-0.212]
11 Our approach is based on a Hidden Switching Linear Dynamical System (HSLDS) model, in which the coordinated network influence on the population is captured by a low-dimensional latent variable which evolves at each time step according to one of a set of available linear dynamical laws. [sent-40, score-0.638]
12 Similar models have a long history in tracking, speech and, indeed, neural decoding applications [9, 10, 11] where they are variously known as Switching Linear Dynamical System models, Jump Markov models or processes, switching Kalman Filters or Switching Linear Gaussian State Space models [12]. [sent-41, score-0.183]
13 We add the prefix “Hidden” to stress that in our application neither the switching process nor the latent dynamical variable are ever directly observed, and so learning of the parameters of the model is entirely unsupervised—and again, learning in such models has a long history [13]. [sent-42, score-0.55]
14 In our models, the transitions between linear dynamical laws may serve two purposes. [sent-44, score-0.334]
15 First, they may provide a piecewise-linear approximation to a more accurate non-linear dynamical model [14]. [sent-45, score-0.255]
16 Second, they may reflect genuine changes in the dynamics of the local network, perhaps due to changes in the goals of the underlying computation under the control of signals external to the local area. [sent-46, score-0.225]
17 Instead of explicitly modeling the network computation as a dynamical process, GPFA assumes that the computation evolves smoothly in time. [sent-49, score-0.343]
18 However GPFA assumes that the latent dimensions evolve independently, making GPFA more restrictive than HSLDS in which the latent dimensions can be coupled. [sent-51, score-0.405]
19 Coupling the latent dynamics introduces complex interactions between the latent dimensions, which allows a richer set of behaviors. [sent-52, score-0.442]
20 Having validated the HSLDS approach, we go on to study the dynamical segmentation identified by the model in the rest of Section 4, leading to the conclusions of Section 5. [sent-55, score-0.327]
21 2 2 Hidden Switching Linear Dynamical Systems Our goal is to extract the structure of computational dynamics in a cortical network from the recorded firing rates of a subset of neurons in that network. [sent-56, score-0.46]
22 We use a Hidden Switching Linear Dynamical Systems (HSLDS) model to capture the component of those firing rates which is shared by multiple cells, thus exploiting the intuition that network computations should be reflected in coordinated activity across a local population. [sent-57, score-0.333]
23 This will yield a latent low-dimensional subspace of dynamical states embedded within the space of noisy measured firing rates, along with a model of the dynamics within that latent space. [sent-58, score-0.697]
24 The dynamics of the HSLDS model combines a number of linear dynamical systems (LDS), each of which capture linear Markovian dynamics using a first-order linear autoregressive (AR) rule [9, 15]. [sent-59, score-0.523]
25 By combining multiple such rules, the HSLDS model can provide a piecewise linear approximation to nonlinear dynamics, and also capture changes in the dynamics of the local network driven by external influences that presumably reflect task demands. [sent-60, score-0.309]
26 This latent R computational state reflects the network-level computation performed at timepoint t that gives rise to the observed spiking activity y:,t ∈ I q×1 . [sent-63, score-0.433]
27 Note that the dimensionality of the computational state R p is lower than the dimensionality of the recorded neural data q which corresponds to the number of recorded neurons. [sent-64, score-0.252]
28 The linear dynamical matrices Ast ∈ I p×p and innovations covariance matrices Kst ∈ I p×p are parameters of the R R model and need to be learned. [sent-66, score-0.279]
29 These matrices are indexed by a switch variable st ∈ {1, . [sent-67, score-0.175]
30 , S} such that different Ast and Kst need to be learned for each of the S possible linear dynamical systems. [sent-70, score-0.255]
31 The switch variable st specifies which linear dynamical law guides the evolution of the latent state x:,t at timepoint t and as such provides a piecewise approximation to the nonlinear dynamics with which x:,t may evolve. [sent-73, score-0.906]
32 This means that the firing rates of different neurons are independent conditioned on the latent dynamics, compelling the shared variance to live only in the latent space. [sent-81, score-0.552]
33 Finally, the observation dynamics do not depend on which linear dynamical system is used (i. [sent-84, score-0.389]
34 Inference (or the E-step) requires finding appropriate expected sufficient statistics under the distributions of the computational latent state and switch variable at each point in time given the observed neural data p(x1:T , s1:T |y1:T ). [sent-89, score-0.328]
35 At the next timepoint, each of the S possible latent states can again evolve according to S different linear dynamical laws, such that at timepoint t we need to keep track of S t possible solutions. [sent-92, score-0.544]
36 The first layer corresponds to the discrete switch variable that dictates which of the S available linear dynamical systems (LDSs) will guide the latent dynamics shown in the second layer. [sent-94, score-0.653]
37 The latent dynamics evolves as a linear dynamical system at timepoint t and presumably captures relevant aspects of the computation performed at the level of the recorded neural network. [sent-95, score-0.759]
38 The relationship between the latent dynamics and neural data (third layer) is again linear-Gaussian, such that each computational state is associated to a specific denoised firing pattern. [sent-96, score-0.352]
39 The dimensionality of the latent dynamics x is lower than that of the observations y (equivalent to the number of recorded neurons), meaning that x extracts relevant features reflected in the shared variance of y. [sent-97, score-0.454]
40 In practice, the estimated individual variance of particularly low-firing neurons was very low and likely to be incorrectly estimated. [sent-107, score-0.169]
41 Finally, the most likely state of the switch variable s∗ = arg maxs1:T p(s1:T |y1:T ) was estimated using the standard Viterbi algorithm [20], 1:T which ensures that the most likely switch variable path is in fact possible in terms of the transitions allowed by M . [sent-111, score-0.285]
42 In this model, the latent dynamics evolve as a Gaussian Process (GP), with a smooth correlation structure between the latent states at different points in time. [sent-115, score-0.495]
43 , p} and defines a separate GP: xi,: ∼ N (0, Ki ) where xi,: ∈ I 1×T is the trajectory in time of the ith latent dimension and Ki ∈ I T ×T is the R R ith GP smoothing covariance matrix. [sent-120, score-0.178]
44 Whereas HSLDS explicitly models the dynamics of the network computation, GPFA only assumes that the evolution of the computational state is smooth. [sent-122, score-0.253]
45 Thus GPFA is a less restrictive model than HSLDS, but being model-free makes it also less informative of the dynamical rules that underlie the computation. [sent-123, score-0.255]
46 All of these models attempt to capture the shared variance in the data, and so performance may be measured by how well the activity of one neuron can be predicted using the activity of the rest of the neurons. [sent-127, score-0.408]
47 , y:,T ] ∈ I q×T where R each row yj,: represents the activity of neuron j in time. [sent-132, score-0.214]
48 We first ˆ R estimate the trajectories in the latent space using all but the jth neuron P (x1:p,: |Y−j,: ) in a set of testing trials. [sent-134, score-0.274]
49 Note that the performance of difference models can be evaluated as a function of the dimensionality of the latent state. [sent-137, score-0.179]
50 A trial began with the animal touching and looking at an illuminated point at the center of a vertically oriented screen. [sent-141, score-0.175]
51 The target remained visible while the animal prepared but withheld a movement to touch it. [sent-143, score-0.213]
52 Neural activity was recorded from 105 single and multi-units, using a 96-electrode array (Blackrock, Salt Lake City, UT). [sent-145, score-0.191]
53 We then study the dynamical segmentation found by the HSLDS model, first by looking at a typical example (section 4. [sent-152, score-0.315]
54 2) and then by correlating dynamical switches to behavioural events (section 4. [sent-153, score-0.327]
55 Analyses are based on one movement type with the target in the 45◦ direction. [sent-156, score-0.176]
56 2 6 HSLDS, S=7 LDS GPFA 4 5 6 7 8 9 10 11 12 13 14 15 Latent state dimensionality p trajectories and dynamical transitions estimated by the model predict reaction time, a behavioral covariate that varies from trial-to-trial. [sent-165, score-0.5]
57 Since all of these models attempt to capture the shared variance of the data across neurons and multiple trials, we used cross-prediction to measure their performance. [sent-170, score-0.237]
58 Crossprediction looks at how well the spiking activity of one neuron is predicted just by looking at the spiking activity of all of the other neurons (described in detail in Section 3. [sent-171, score-0.6]
59 We found that both the single LDS and HSLDS models that allow for coupled latent dynamics do better than GPFA, shown in Figure 2, which could be attributed to the fact that GPFA constrains the different dimensions of the latent computational state to evolve independently. [sent-173, score-0.544]
60 The HSLDS model also outperforms a single LDS yielding the lowest prediction error for all of the latent dimensions we have looked at, arguing that a nonlinear model of the latent dynamics is better than a linear model. [sent-174, score-0.488]
61 It is tempting to suggest that for this particular task the effective dimensionality of the spiking activity is much lower than that of the 105 recorded neurons, thereby justifying the use of a low-dimensional manifold to describe the underlying computation. [sent-176, score-0.264]
62 This could be interpreted as evidence that neurons may carry redundant information and that the (nonlinear) computational function of the network is better reflected at the level of the population of neurons, rather than in single neurons. [sent-177, score-0.268]
63 2 Data segmentation By definition, the HSLDS model partitions the latent dynamics underlying the observed data into time-labeled segments that may evolve linearly. [sent-179, score-0.4]
64 The segments found by HSLDS correspond to periods of time in which the latent dynamics seem to evolve according to different linear dynamical laws, suggesting that the observed firing pattern of the network has changed as a whole. [sent-180, score-0.707]
65 Thus, by construction, the HSLDS model can subdivide the network activity into different firing regimes for each trial specifically. [sent-181, score-0.269]
66 For the purpose of visualization, we have applied an additional orthonormalization post-processing step (as in [8]) that helps us order the latent dimensions according to the amount of covariance explained. [sent-182, score-0.227]
67 We will refer to x:,t = DC VC x:,t as the orthonormalised latent state at time t. [sent-184, score-0.243]
68 The first ˜ dimension of the orthonormalised latent state in time x1,: corresponds then to the latent trajectory ˜ which explains the most covariance. [sent-185, score-0.397]
69 Since the columns of UC are orthonormal, the relationship between the orthonormalised latent trajectories and observed data can be interpreted in an intuitive way, similarly to Principal Components Analysis (PCA). [sent-186, score-0.244]
70 The results presented here were obtained by setting the number of switching LDSs S, latent space dimensionality p and Wishart prior ψ to values that yielded a reasonably low cross-prediction error. [sent-187, score-0.298]
71 Figure 3 shows a typical example of the HSLDS model applied to data in one movement direction, where the different trials are fanned out vertically for illustration purposes. [sent-188, score-0.233]
72 The first orthonormalized 6 Figure 3: HSLDS applied to neural data from the 45◦ direction movement (S = 7, p = 7, ψ = 0. [sent-189, score-0.218]
73 The first dimension of the orthonormalised latent trajectory is shown. [sent-191, score-0.216]
74 The colors denote the different linear dynamical systems used by the model. [sent-192, score-0.255]
75 Each line is a different trial, aligned to the target onset (left) and go cue (right), and sorted by reaction time. [sent-193, score-0.287]
76 Switches reliably follow the target onset and precede the movement onset, with a time lag that is correlated with reaction time. [sent-194, score-0.374]
77 latent dimension indicates a transient in the recorded population activity shortly after target onset (which is marked by the red dots) and a sustained change of activity after the go cue (marked by the green dots). [sent-195, score-0.737]
78 It is evident that the learned solution segments each trial into a broadly reproducible sequence of dynamical epochs. [sent-197, score-0.337]
79 Some transitions appear to reliably follow or precede external events (even though these events were not used to train the segmentation) and may reflect actual changes in dynamics due to external influences. [sent-198, score-0.377]
80 Others seem to follow each other in quick succession, and may instead reflect linear approximations to non-linear dynamical processes—evident particularly during transiently rapid changes in the latent state. [sent-199, score-0.432]
81 Note that the delays (time from target onset to go cue) used in the experiment varied from 200 to 700ms, such that the model systematically detected a change in the neural firing rates shortly after the go cue appeared on each individual trial. [sent-201, score-0.309]
82 3 Behavioral correlates during single trials It is not surprising that the firing rates of the recorded neurons change during different behavioral periods. [sent-204, score-0.37]
83 For example, neural activity is often observed to be higher during movement execution than during movement preparation. [sent-205, score-0.461]
84 However, the HSLDS method reliably detects the behaviourallycorrelated changes in the pattern of neural activity across many neurons on single trials. [sent-206, score-0.371]
85 In order to ensure that HSLDS captures trial-specific information we have looked at whether the time post-go-cue at which the model estimates a first switch in the neural dynamics could predict the subsequent onset of movement and thus the trial reaction time (RT). [sent-207, score-0.619]
86 We found that the filtered model (which does not incorporate spiking data from future times into its estimate of the switching variable) could explain 52% of the reaction time variance on average, across the 7 reach directions (Figure 4). [sent-208, score-0.3]
87 The PV sums the preferred directions of a population of neurons, weighted by the respective spike counts in order to decode the represented direction of movement [22]. [sent-211, score-0.304]
88 The rise-to-threshold hypothesis asserts that neural firing rates rise during a preparatory period and movement is initiated when the population rate crosses a threshold [23]. [sent-212, score-0.284]
89 52) between the reaction time and first filtered HSLDS switch following the go cue, on a trial-bytrial basis and averaged across directions. [sent-215, score-0.232]
90 Note that in two catch trials the model did not switch following the go cue, so we considered the last switch before the cue. [sent-217, score-0.272]
91 7 d unit vector in the direction of pq = d=1 ri v d where d indexes the instructed movement direction d d v and rq is the mean firing rate of neuron q during all movements in direction d. [sent-218, score-0.363]
92 The preferred direction of a given neuron often differed between plan and movement activity, so we used data from d movement onset until the movement end to estimate rq as this gave us better results when trying to estimate a threshold in the rising movement-related activity. [sent-219, score-0.644]
93 We then estimated the instanteneous Q amplitude of the network PV at time t as sd = || q=1 yq,t pq ||, where yq,t is the smoothed spike t count of neuron q at time t, Q is the number of neurons and ||w|| denotes the norm of the vector w. [sent-220, score-0.385]
94 First, the movement epoch of each trial was identified to define the PV. [sent-223, score-0.208]
95 HSLDS explicitly models the nonlinear dynamics of the computation as a piecewise linear process that captures the shared variance in the neural data across neurons and multiple trials. [sent-229, score-0.455]
96 Despite these simplications, HSLDS can be used to dynamically segment the neural activity at the level of the whole population of neurons into periods of different firing regimes. [sent-233, score-0.393]
97 The role of sensory network dynamics in generating a motor program. [sent-269, score-0.236]
98 Switching linear dynamical systems for noise robust speech recognition. [sent-313, score-0.255]
99 Modeling and decoding motor cortical activity using a switching kalman filter. [sent-324, score-0.359]
100 Expectation correction for smoothed inference in switching linear dynamical systems. [sent-376, score-0.404]
wordName wordTfidf (topN-words)
[('hslds', 0.715), ('gpfa', 0.26), ('dynamical', 0.255), ('latent', 0.154), ('movement', 0.151), ('neurons', 0.142), ('dynamics', 0.134), ('ring', 0.128), ('lds', 0.126), ('activity', 0.122), ('switching', 0.119), ('neuron', 0.092), ('switch', 0.088), ('reaction', 0.083), ('timepoint', 0.082), ('cue', 0.072), ('onset', 0.069), ('recorded', 0.069), ('population', 0.066), ('st', 0.065), ('orthonormalised', 0.062), ('network', 0.06), ('pv', 0.059), ('trials', 0.058), ('trial', 0.057), ('kst', 0.055), ('coordinated', 0.053), ('evolve', 0.053), ('ast', 0.05), ('spiking', 0.048), ('crossprediction', 0.047), ('shared', 0.045), ('external', 0.045), ('behavioral', 0.044), ('motor', 0.042), ('laws', 0.041), ('ldss', 0.041), ('go', 0.038), ('transitions', 0.038), ('animal', 0.037), ('neural', 0.037), ('ected', 0.036), ('santhanam', 0.035), ('wishart', 0.034), ('segmentation', 0.034), ('ryu', 0.034), ('evolution', 0.032), ('biljana', 0.031), ('gaussianised', 0.031), ('illuminated', 0.031), ('jayaraman', 0.031), ('odor', 0.031), ('timepoints', 0.031), ('shenoy', 0.031), ('gp', 0.031), ('spike', 0.031), ('rates', 0.03), ('regimes', 0.03), ('smoothed', 0.03), ('ect', 0.03), ('pq', 0.03), ('direction', 0.03), ('ki', 0.028), ('evolves', 0.028), ('trajectories', 0.028), ('re', 0.028), ('orthonormalization', 0.027), ('decoding', 0.027), ('state', 0.027), ('correlates', 0.027), ('variance', 0.027), ('gatsby', 0.026), ('periods', 0.026), ('looking', 0.026), ('counts', 0.026), ('dimensionality', 0.025), ('cortical', 0.025), ('segments', 0.025), ('dept', 0.025), ('behavioural', 0.025), ('target', 0.025), ('yj', 0.025), ('rt', 0.025), ('nonlinear', 0.024), ('covariance', 0.024), ('kalman', 0.024), ('reliably', 0.024), ('preparation', 0.024), ('vertically', 0.024), ('switches', 0.024), ('changes', 0.023), ('events', 0.023), ('uc', 0.023), ('unsupervised', 0.023), ('hidden', 0.023), ('across', 0.023), ('piecewise', 0.023), ('precede', 0.022), ('variable', 0.022), ('dimensions', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
2 0.23726892 86 nips-2011-Empirical models of spiking in neural populations
Author: Jakob H. Macke, Lars Buesing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Neurons in the neocortex code and compute as part of a locally interconnected population. Large-scale multi-electrode recording makes it possible to access these population processes empirically by fitting statistical models to unaveraged data. What statistical structure best describes the concurrent spiking of cells within a local network? We argue that in the cortex, where firing exhibits extensive correlations in both time and space and where a typical sample of neurons still reflects only a very small fraction of the local population, the most appropriate model captures shared variability by a low-dimensional latent process evolving with smooth dynamics, rather than by putative direct coupling. We test this claim by comparing a latent dynamical model with realistic spiking observations to coupled generalised linear spike-response models (GLMs) using cortical recordings. We find that the latent dynamical approach outperforms the GLM in terms of goodness-offit, and reproduces the temporal correlations in the data more accurately. We also compare models whose observations models are either derived from a Gaussian or point-process models, finding that the non-Gaussian model provides slightly better goodness-of-fit and more realistic population spike counts. 1
3 0.16472113 302 nips-2011-Variational Learning for Recurrent Spiking Networks
Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner
Abstract: We derive a plausible learning rule for feedforward, feedback and lateral connections in a recurrent network of spiking neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimental Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation confirms the method’s applicability to learning both stationary and temporal spike patterns. 1
4 0.14222632 249 nips-2011-Sequence learning with hidden units in spiking neural networks
Author: Johanni Brea, Walter Senn, Jean-pascal Pfister
Abstract: We consider a statistical framework in which recurrent networks of spiking neurons learn to generate spatio-temporal spike patterns. Given biologically realistic stochastic neuronal dynamics we derive a tractable learning rule for the synaptic weights towards hidden and visible neurons that leads to optimal recall of the training sequences. We show that learning synaptic weights towards hidden neurons significantly improves the storing capacity of the network. Furthermore, we derive an approximate online learning rule and show that our learning rule is consistent with Spike-Timing Dependent Plasticity in that if a presynaptic spike shortly precedes a postynaptic spike, potentiation is induced and otherwise depression is elicited.
5 0.12077452 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
6 0.11716148 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm
7 0.11352111 301 nips-2011-Variational Gaussian Process Dynamical Systems
8 0.10895298 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations
9 0.098492831 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
10 0.095725872 219 nips-2011-Predicting response time and error rates in visual search
11 0.090954795 23 nips-2011-Active dendrites: adaptation to spike-based communication
12 0.090446733 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
13 0.082095355 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
14 0.076551668 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
15 0.076096073 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
16 0.074196547 68 nips-2011-Demixed Principal Component Analysis
17 0.072022706 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
18 0.071590133 131 nips-2011-Inference in continuous-time change-point models
19 0.066475824 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
topicId topicWeight
[(0, 0.168), (1, 0.08), (2, 0.25), (3, 0.002), (4, 0.058), (5, -0.019), (6, -0.017), (7, -0.081), (8, 0.022), (9, 0.009), (10, 0.044), (11, -0.03), (12, 0.015), (13, 0.016), (14, -0.016), (15, -0.022), (16, -0.085), (17, 0.056), (18, -0.019), (19, -0.05), (20, -0.053), (21, -0.058), (22, -0.039), (23, 0.053), (24, -0.042), (25, -0.111), (26, -0.024), (27, -0.058), (28, 0.055), (29, 0.008), (30, -0.02), (31, -0.058), (32, 0.014), (33, -0.06), (34, -0.004), (35, 0.028), (36, 0.065), (37, 0.031), (38, -0.02), (39, -0.104), (40, 0.211), (41, 0.068), (42, 0.001), (43, 0.012), (44, 0.006), (45, -0.028), (46, 0.043), (47, 0.082), (48, -0.081), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.93755198 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
2 0.8304134 86 nips-2011-Empirical models of spiking in neural populations
Author: Jakob H. Macke, Lars Buesing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Neurons in the neocortex code and compute as part of a locally interconnected population. Large-scale multi-electrode recording makes it possible to access these population processes empirically by fitting statistical models to unaveraged data. What statistical structure best describes the concurrent spiking of cells within a local network? We argue that in the cortex, where firing exhibits extensive correlations in both time and space and where a typical sample of neurons still reflects only a very small fraction of the local population, the most appropriate model captures shared variability by a low-dimensional latent process evolving with smooth dynamics, rather than by putative direct coupling. We test this claim by comparing a latent dynamical model with realistic spiking observations to coupled generalised linear spike-response models (GLMs) using cortical recordings. We find that the latent dynamical approach outperforms the GLM in terms of goodness-offit, and reproduces the temporal correlations in the data more accurately. We also compare models whose observations models are either derived from a Gaussian or point-process models, finding that the non-Gaussian model provides slightly better goodness-of-fit and more realistic population spike counts. 1
3 0.72938102 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm
Author: Julie Dethier, Paul Nuyujukian, Chris Eliasmith, Terrence C. Stewart, Shauki A. Elasaad, Krishna V. Shenoy, Kwabena A. Boahen
Abstract: Motor prostheses aim to restore function to disabled patients. Despite compelling proof of concept systems, barriers to clinical translation remain. One challenge is to develop a low-power, fully-implantable system that dissipates only minimal power so as not to damage tissue. To this end, we implemented a Kalman-filter based decoder via a spiking neural network (SNN) and tested it in brain-machine interface (BMI) experiments with a rhesus monkey. The Kalman filter was trained to predict the arm’s velocity and mapped on to the SNN using the Neural Engineering Framework (NEF). A 2,000-neuron embedded Matlab SNN implementation runs in real-time and its closed-loop performance is quite comparable to that of the standard Kalman filter. The success of this closed-loop decoder holds promise for hardware SNN implementations of statistical signal processing algorithms on neuromorphic chips, which may offer power savings necessary to overcome a major obstacle to the successful clinical translation of neural motor prostheses. ∗ Present: Research Fellow F.R.S.-FNRS, Systmod Unit, University of Liege, Belgium. 1 1 Cortically-controlled motor prostheses: the challenge Motor prostheses aim to restore function for severely disabled patients by translating neural signals from the brain into useful control signals for prosthetic limbs or computer cursors. Several proof of concept demonstrations have shown encouraging results, but barriers to clinical translation still remain. One example is the development of a fully-implantable system that meets power dissipation constraints, but is still powerful enough to perform complex operations. A recently reported closedloop cortically-controlled motor prosthesis is capable of producing quick, accurate, and robust computer cursor movements by decoding neural signals (threshold-crossings) from a 96-electrode array in rhesus macaque premotor/motor cortex [1]-[4]. This, and previous designs (e.g., [5]), employ versions of the Kalman filter, ubiquitous in statistical signal processing. Such a filter and its variants are the state-of-the-art decoder for brain-machine interfaces (BMIs) in humans [5] and monkeys [2]. While these recent advances are encouraging, clinical translation of such BMIs requires fullyimplanted systems, which in turn impose severe power dissipation constraints. Even though it is an open, actively-debated question as to how much of the neural prosthetic system must be implanted, we note that there are no reports to date demonstrating a fully implantable 100-channel wireless transmission system, motivating performing decoding within the implanted chip. This computation is constrained by a stringent power budget: A 6 × 6mm2 implant must dissipate less than 10mW to avoid heating the brain by more than 1◦ C [6], which is believed to be important for long term cell health. With this power budget, current approaches can not scale to higher electrode densities or to substantially more computer-intensive decode/control algorithms. The feasibility of mapping a Kalman-filter based decoder algorithm [1]-[4] on to a spiking neural network (SNN) has been explored off-line (open-loop). In these off-line tests, the SNN’s performance virtually matched that of the standard implementation [7]. These simulations provide confidence that this algorithm—and others similar to it—could be implemented using an ultra-low-power approach potentially capable of meeting the severe power constraints set by clinical translation. This neuromorphic approach uses very-large-scale integrated systems containing microelectronic analog circuits to morph neural systems into silicon chips [8, 9]. These neuromorphic circuits may yield tremendous power savings—50nW per silicon neuron [10]—over digital circuits because they use physical operations to perform mathematical computations (analog approach). When implemented on a chip designed using the neuromorphic approach, a 2,000-neuron SNN network can consume as little as 100µW. Demonstrating this approach’s feasibility in a closed-loop system running in real-time is a key, non-incremental step in the development of a fully implantable decoding chip, and is necessary before proceeding with fabricating and implanting the chip. As noise, delay, and over-fitting play a more important role in the closed-loop setting, it is not obvious that the SNN’s stellar open-loop performance will hold up. In addition, performance criteria are different in the closed-loop and openloop settings (e.g., time per target vs. root mean squared error). Therefore, a SNN of a different size may be required to meet the desired specifications. Here we present results and assess the performance and viability of the SNN Kalman-filter based decoder in real-time, closed-loop tests, with the monkey performing a center-out-and-back target acquisition task. To achieve closed-loop operation, we developed an embedded Matlab implementation that ran a 2,000-neuron version of the SNN in real-time on a PC. We achieved almost a 50-fold speed-up by performing part of the computation in a lower-dimensional space defined by the formal method we used to map the Kalman filter on to the SNN. This shortcut allowed us to run a larger SNN in real-time than would otherwise be possible. 2 Spiking neural network mapping of control theory algorithms As reported in [11], a formal methodology, called the Neural Engineering Framework (NEF), has been developed to map control-theory algorithms onto a computational fabric consisting of a highly heterogeneous population of spiking neurons simply by programming the strengths of their connections. These artificial neurons are characterized by a nonlinear multi-dimensional-vector-to-spikerate function—ai (x(t)) for the ith neuron—with parameters (preferred direction, maximum firing rate, and spiking-threshold) drawn randomly from a wide distribution (standard deviation ≈ mean). 2 Spike rate (spikes/s) Representation ˆ x → ai (x) → x = ∑i ai (x)φix ˜ ai (x) = G(αi φix · x + Jibias ) 400 Transformation y = Ax → b j (Aˆ ) x Aˆ = ∑i ai (x)Aφix x x(t) B' y(t) A' 200 0 −1 Dynamics ˙ x = Ax → x = h ∗ A x A = τA + I 0 Stimulus x 1 bk(t) y(t) B' h(t) x(t) A' aj(t) Figure 1: NEF’s three principles. Representation. 1D tuning curves of a population of 50 leaky integrate-and-fire neurons. The neurons’ tuning curves map control variables (x) to spike rates (ai (x)); this nonlinear transformation is inverted by linear weighted decoding. G() is the neurons’ nonlinear current-to-spike-rate function. Transformation. SNN with populations bk (t) and a j (t) representing y(t) and x(t). Feedforward and recurrent weights are determined by B and A , as described next. Dynamics. The system’s dynamics is captured in a neurally plausible fashion by replacing integration with the synapses’ spike response, h(t), and replacing the matrices with A = τA + I and B = τB to compensate. The neural engineering approach to configuring SNNs to perform arbitrary computations is underlined by three principles (Figure 1) [11]-[14]: Representation is defined by nonlinear encoding of x(t) as a spike rate, ai (x(t))—represented by the neuron tuning curve—combined with optimal weighted linear decoding of ai (x(t)) to recover ˆ an estimate of x(t), x(t) = ∑i ai (x(t))φix , where φix are the decoding weights. Transformation is performed by using alternate decoding weights in the decoding operation to map transformations of x(t) directly into transformations of ai (x(t)). For example, y(t) = Ax(t) is represented by the spike rates b j (Aˆ (t)), where unit j’s input is computed directly from unit i’s x output using Aˆ (t) = ∑i ai (x(t))Aφix , an alternative linear weighting. x Dynamics brings the first two principles together and adds the time dimension to the circuit. This principle aims at reuniting the control-theory and neural levels by modifying the matrices to render the system neurally plausible, thereby permitting the synapses’ spike response, h(t), (i.e., impulse ˙ response) to capture the system’s dynamics. For example, for h(t) = τ −1 e−t/τ , x = Ax(t) is realized by replacing A with A = τA + I. This so-called neurally plausible matrix yields an equivalent dynamical system: x(t) = h(t) ∗ A x(t), where convolution replaces integration. The nonlinear encoding process—from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji (x(t)), to a firing rate, ai (x(t))—is specified as: ai (x(t)) = G(Ji (x(t))). (1) Here G is the neurons’ nonlinear current-to-spike-rate function, which is given by G(Ji (x)) = τ ref − τ RC ln (1 − Jth /Ji (x)) −1 , (2) for the leaky integrate-and-fire model (LIF). The LIF neuron has two behavioral regimes: subthreshold and super-threshold. The sub-threshold regime is described by an RC circuit with time constant τ RC . When the sub-threshold soma voltage reaches the threshold, Vth , the neuron emits a spike δ (t −tn ). After this spike, the neuron is reset and rests for τ ref seconds (absolute refractory period) before it resumes integrating. Jth = Vth /R is the minimum input current that produces spiking. Ignoring the soma’s RC time-constant when specifying the SNN’s dynamics are reasonable because the neurons cross threshold at a rate that is proportional to their input current, which thus sets the spike rate instantaneously, without any filtering [11]. The conversion from a multi-dimensional stimulus, x(t), to a one-dimensional soma current, Ji , is ˜ performed by assigning to the neuron a preferred direction, φix , in the stimulus space and taking the dot-product: ˜ Ji (x(t)) = αi φix · x(t) + Jibias , (3) 3 where αi is a gain or conversion factor, and Jibias is a bias current that accounts for background ˜ activity. For a 1D space, φix is either +1 or −1 (drawn randomly), for ON and OFF neurons, respectively. The resulting tuning curves are illustrated in Figure 1, left. The linear decoding process is characterized by the synapses’ spike response, h(t) (i.e., post-synaptic currents), and the decoding weights, φix , which are obtained by minimizing the mean square error. A single noise term, η, takes into account all sources of noise, which have the effect of introducing uncertainty into the decoding process. Hence, the transmitted firing rate can be written as ai (x(t)) + ηi , where ai (x(t)) represents the noiseless set of tuning curves and ηi is a random variable picked from a zero-mean Gaussian distribution with variance σ 2 . Consequently, the mean square error can be written as [11]: E = 1 ˆ [x(t) − x(t)]2 2 x,η,t = 2 1 2 x(t) − ∑ (ai (x(t)) + ηi ) φix i (4) x,η,t where · x,η denotes integration over the range of x and η, the expected noise. We assume that the noise is independent and has the same variance for each neuron [11], which yields: E= where σ2 1 2 2 x(t) − ∑ ai (x(t))φix i x,t 1 + σ 2 ∑(φix )2 , 2 i (5) is the noise variance ηi η j . This expression is minimized by: N φix = ∑ Γ−1 ϒ j , ij (6) j with Γi j = ai (x)a j (x) x + σ 2 δi j , where δ is the Kronecker delta function matrix, and ϒ j = xa j (x) x [11]. One consequence of modeling noise in the neural representation is that the matrix Γ is invertible despite the use of a highly overcomplete representation. In a noiseless representation, Γ is generally singular because, due to the large number of neurons, there is a high probability of having two neurons with similar tuning curves leading to two similar rows in Γ. 3 Kalman-filter based cortical decoder In the 1960’s, Kalman described a method that uses linear filtering to track the state of a dynamical system throughout time using a model of the dynamics of the system as well as noisy measurements [15]. The model dynamics gives an estimate of the state of the system at the next time step. This estimate is then corrected using the observations (i.e., measurements) at this time step. The relative weights for these two pieces of information are given by the Kalman gain, K [15, 16]. Whereas the Kalman gain is updated at each iteration, the state and observation matrices (defined below)—and corresponding noise matrices—are supposed constant. In the case of prosthetic applications, the system’s state vector is the cursor’s kinematics, xt = y [veltx , velt , 1], where the constant 1 allows for a fixed offset compensation. The measurement vector, yt , is the neural spike rate (spike counts in each time step) of 192 channels of neural threshold crossings. The system’s dynamics is modeled by: xt yt = Axt−1 + wt , = Cxt + qt , (7) (8) where A is the state matrix, C is the observation matrix, and wt and qt are additive, Gaussian noise sources with wt ∼ N (0, W) and qt ∼ N (0, Q). The model parameters (A, C, W and Q) are fit with training data by correlating the observed hand kinematics with the simultaneously measured neural signals (Figure 2). For an efficient decoding, we derived the steady-state update equation by replacing the adaptive Kalman gain by its steady-state formulation: K = (I + WCQ−1 C)−1 W CT Q−1 . This yields the following estimate of the system’s state: xt = (I − KC)Axt−1 + Kyt = MDT xt−1 + MDT yt , x y 4 (9) a Velocity (cm/s) Neuron 10 c 150 5 100 b 50 20 0 −20 0 0 x−velocity y−velocity 2000 4000 6000 8000 Time (ms) 10000 12000 1cm 14000 Trials: 0034-0049 Figure 2: Neural and kinematic measurements (monkey J, 2011-04-16, 16 continuous trials) used to fit the standard Kalman filter model. a. The 192 cortical recordings fed as input to fit the Kalman filter’s matrices (color code refers to the number of threshold crossings observed in each 50ms bin). b. Hand x- and y-velocity measurements correlated with the neural data to obtain the Kalman filter’s matrices. c. Cursor kinematics of 16 continuous trials under direct hand control. where MDT = (I − KC)A and MDT = K are the discrete time (DT) Kalman matrices. The steadyx y state formulation improves efficiency with little loss in accuracy because the optimal Kalman gain rapidly converges (typically less than 100 iterations). Indeed, in neural applications under both open-loop and closed-loop conditions, the difference between the full Kalman filter and its steadystate implementation falls to within 1% in a few seconds [17]. This simplifying assumption reduces the execution time for decoding a typical neuronal firing rate signal approximately seven-fold [17], a critical speed-up for real-time applications. 4 Kalman filter with a spiking neural network To implement the Kalman filter with a SNN by applying the NEF, we first convert Equation 9 from DT to continuous time (CT), and then replace the CT matrices with neurally plausible ones, which yields: x(t) = h(t) ∗ A x(t) + B y(t) , (10) where A = τMCT + I, B = τMCT , with MCT = MDT − I /∆t and MCT = MDT /∆t, the CT x y x x y y Kalman matrices, and ∆t = 50ms, the discrete time step; τ is the synaptic time-constant. The jth neuron’s input current (see Equation 3) is computed from the system’s current state, x(t), which is computed from estimates of the system’s previous state (ˆ (t) = ∑i ai (t)φix ) and current x y input (ˆ (t) = ∑k bk (t)φk ) using Equation 10. This yields: y ˜x J j (x(t)) = α j φ j · x(t) + J bias j ˜x ˆ ˆ = α j φ j · h(t) ∗ A x(t) + B y(t) ˜x = α j φ j · h(t) ∗ A + J bias j ∑ ai (t)φix + B ∑ bk (t)φky i + J bias j (11) k This last equation can be written in a neural network form: J j (x(t)) = h(t) ∗ ∑ ω ji ai (t) + ∑ ω jk bk (t) i + J bias j (12) k y ˜x ˜x where ω ji = α j φ j A φix and ω jk = α j φ j B φk are the recurrent and feedforward weights, respectively. 5 Efficient implementation of the SNN In this section, we describe the two distinct steps carried out when implementing the SNN: creating and running the network. The first step has no computational constraints whereas the second must be very efficient in order to be successfully deployed in the closed-loop experimental setting. 5 x ( 1000 x ( = 1000 1000 = 1000 x 1000 b 1000 x 1000 1000 a Figure 3: Computing a 1000-neuron pool’s recurrent connections. a. Using connection weights requires multiplying a 1000×1000 matrix by a 1000 ×1 vector. b. Operating in the lower-dimensional state space requires multiplying a 1 × 1000 vector by a 1000 × 1 vector to get the decoded state, multiplying this state by a component of the A matrix to update it, and multiplying the updated state by a 1000 × 1 vector to re-encode it as firing rates, which are then used to update the soma current for every neuron. Network creation: This step generates, for a specified number of neurons composing the network, x ˜x the gain α j , bias current J bias , preferred direction φ j , and decoding weight φ j for each neuron. The j ˜x preferred directions φ j are drawn randomly from a uniform distribution over the unit sphere. The maximum firing rate, max G(J j (x)), and the normalized x-axis intercept, G(J j (x)) = 0, are drawn randomly from a uniform distribution on [200, 400] Hz and [-1, 1], respectively. From these two specifications, α j and J bias are computed using Equation 2 and Equation 3. The decoding weights j x φ j are computed by minimizing the mean square error (Equation 6). For efficient implementation, we used two 1D integrators (i.e., two recurrent neuron pools, with each pool representing a scalar) rather than a single 3D integrator (i.e., one recurrent neuron pool, with the pool representing a 3D vector by itself) [13]. The constant 1 is fed to the 1D integrators as an input, rather than continuously integrated as part of the state vector. We also replaced the bk (t) units’ spike rates (Figure 1, middle) with the 192 neural measurements (spike counts in 50ms bins), y which is equivalent to choosing φk from a standard basis (i.e., a unit vector with 1 at the kth position and 0 everywhere else) [7]. Network simulation: This step runs the simulation to update the soma current for every neuron, based on input spikes. The soma voltage is then updated following RC circuit dynamics. Gaussian noise is normally added at this step, the rest of the simulation being noiseless. Neurons with soma voltage above threshold generate a spike and enter their refractory period. The neuron firing rates are decoded using the linear decoding weights to get the updated states values, x and y-velocity. These values are smoothed with a filter identical to h(t), but with τ set to 5ms instead of 20ms to avoid introducing significant delay. Then the simulation step starts over again. In order to ensure rapid execution of the simulation step, neuron interactions are not updated dix rectly using the connection matrix (Equation 12), but rather indirectly with the decoding matrix φ j , ˜x dynamics matrix A , and preferred direction matrix φ j (Equation 11). To see why this is more efficient, suppose we have 1000 neurons in the a population for each of the state vector’s two scalars. Computing the recurrent connections using connection weights requires multiplying a 1000 × 1000 matrix by a 1000-dimensional vector (Figure 3a). This requires 106 multiplications and about 106 sums. Decoding each scalar (i.e., ∑i ai (t)φix ), however, requires only 1000 multiplications and 1000 sums. The decoded state vector is then updated by multiplying it by the (diagonal) A matrix, another 2 products and 1 sum. The updated state vector is then encoded by multiplying it with the neurons’ preferred direction vectors, another 1000 multiplications per scalar (Figure 3b). The resulting total of about 3000 operations is nearly three orders of magnitude fewer than using the connection weights to compute the identical transformation. To measure the speedup, we simulated a 2,000-neuron network on a computer running Matlab 2011a (Intel Core i7, 2.7-GHz, Mac OS X Lion). Although the exact run-times depend on the computing hardware and software, the run-time reduction factor should remain approximately constant across platforms. For each reported result, we ran the simulation 10 times to obtain a reliable estimate of the execution time. The run-time for neuron interactions using the recurrent connection weights was 9.9ms and dropped to 2.7µs in the lower-dimensional space, approximately a 3,500-fold speedup. Only the recurrent interactions benefit from the speedup, the execution time for the rest of the operations remaining constant. The run-time for a 50ms network simulation using the recurrent connec6 Table 1: Model parameters Symbol max G(J j (x)) G(J j (x)) = 0 J bias j αj ˜x φj Range 200-400 Hz −1 to 1 Satisfies first two Satisfies first two ˜x φj = 1 Description Maximum firing rate Normalized x-axis intercept Bias current Gain factor Preferred-direction vector σ2 τ RC j τ ref j τ PSC j 0.1 20 ms 1 ms 20 ms Gaussian noise variance RC time constant Refractory period PSC time constant tion weights was 0.94s and dropped to 0.0198s in the lower-dimensional space, a 47-fold speedup. These results demonstrate the efficiency the lower-dimensional space offers, which made the closedloop application of SNNs possible. 6 Closed-loop implementation An adult male rhesus macaque (monkey J) was trained to perform a center-out-and-back reaching task for juice rewards to one of eight targets, with a 500ms hold time (Figure 4a) [1]. All animal protocols and procedures were approved by the Stanford Institutional Animal Care and Use Committee. Hand position was measured using a Polaris optical tracking system at 60Hz (Northern Digital Inc.). Neural data were recorded from two 96-electrode silicon arrays (Blackrock Microsystems) implanted in the dorsal pre-motor and motor cortex. These recordings (-4.5 RMS threshold crossing applied to each electrode’s signal) yielded tuned activity for the direction and speed of arm movements. As detailed in [1], a standard Kalman filter model was fit by correlating the observed hand kinematics with the simultaneously measured neural signals, while the monkey moved his arm to acquire virtual targets (Figure 2). The resulting model was used in a closed-loop system to control an on-screen cursor in real-time (Figure 4a, Decoder block). A steady-state version of this model serves as the standard against which the SNN implementation’s performance is compared. We built a SNN using the NEF methodology based on derived Kalman filter parameters mentioned above. This SNN was then simulated on an xPC Target (Mathworks) x86 system (Dell T3400, Intel Core 2 Duo E8600, 3.33GHz). It ran in closed-loop, replacing the standard Kalman filter as the decoder block in Figure 4a. The parameter values listed in Table 1 were used for the SNN implementation. We ensured that the time constants τiRC ,τiref , and τiPSC were smaller than the implementation’s time step (50ms). Noise was not explicitly added. It arose naturally from the fluctuations produced by representing a scalar with filtered spike trains, which has been shown to have effects similar to Gaussian noise [11]. For the purpose of computing the linear decoding weights (i.e., Γ), we modeled the resulting noise as Gaussian with a variance of 0.1. A 2,000-neuron version of the SNN-based decoder was tested in a closed-loop system, the largest network our embedded MatLab implementation could run in real-time. There were 1206 trials total among which 301 (center-outs only) were performed with the SNN and 302 with the standard (steady-state) Kalman filter. The block structure was randomized and interleaved, so that there is no behavioral bias present in the findings. 100 trials under hand control are used as a baseline comparison. Success corresponds to a target acquisition under 1500ms, with 500ms hold time. Success rates were higher than 99% on all blocks for the SNN implementation and 100% for the standard Kalman filter. The average time to acquire the target was slightly slower for the SNN (Figure 5b)—711ms vs. 661ms, respectively—we believe this could be improved by using more neurons in the SNN.1 The average distance to target (Figure 5a) and the average velocity of the cursor (Figure 5c) are very similar. 1 Off-line, the SNN performed better as we increased the number of neurons [7]. 7 a Neural Spikes b c BMI: Kalman decoder BMI: SNN decoder Decoder Cursor Velocity 1cm 1cm Trials: 2056-2071 Trials: 1748-1763 5 0 0 400 Time after Target Onset (ms) 800 Target acquisition time histogram 40 Mean cursor velocity 50 Standard Kalman filter 40 20 Hand 30 30 Spiking Neural Network 20 10 0 c Cursor Velocity (cm/s) b Mean distance to target 10 Percent of Trials (%) a Distance to Target (cm) Figure 4: Experimental setup and results. a. Data are recorded from two 96-channel silicon electrode arrays implanted in dorsal pre-motor and motor cortex of an adult male monkey performing a centerout-and-back reach task for juice rewards to one of eight targets with a 500ms hold time. b. BMI position kinematics of 16 continuous trials for the standard Kalman filter implementation. c. BMI position kinematics of 16 continuous trials for the SNN implementation. 10 0 500 1000 Target Acquire Time (ms) 1500 0 0 200 400 600 800 Time after Target Onset (ms) 1000 Figure 5: SNN (red) performance compared to standard Kalman filter (blue) (hand trials are shown for reference (yellow)). The SNN achieves similar results—success rates are higher than 99% on all blocks—as the standard Kalman filter implementation. a. Plot of distance to target vs. time both after target onset for different control modalities. The thicker traces represent the average time when the cursor first enters the acceptance window until successfully entering for the 500ms hold time. b. Histogram of target acquisition time. c. Plot of mean cursor velocity vs. time. 7 Conclusions and future work The SNN’s performance was quite comparable to that produced by a standard Kalman filter implementation. The 2,000-neuron network had success rates higher than 99% on all blocks, with mean distance to target, target acquisition time, and mean cursor velocity curves very similar to the ones obtained with the standard implementation. Future work will explore whether these results extend to additional animals. As the Kalman filter and its variants are the state-of-the-art in cortically-controlled motor prostheses [1]-[5], these simulations provide confidence that similar levels of performance can be attained with a neuromorphic system, which can potentially overcome the power constraints set by clinical applications. Our ultimate goal is to develop an ultra-low-power neuromorphic chip for prosthetic applications on to which control theory algorithms can be mapped using the NEF. As our next step in this direction, we will begin exploring this mapping with Neurogrid, a hardware platform with sixteen programmable neuromorphic chips that can simulate up to a million spiking neurons in real-time [9]. However, bandwidth limitations prevent Neurogrid from realizing random connectivity patterns. It can only connect each neuron to thousands of others if neighboring neurons share common inputs — just as they do in the cortex. Such columnar organization may be possible with NEF-generated networks if preferred directions vectors are assigned topographically rather than randomly. Implementing this constraint effectively is a subject of ongoing research. Acknowledgment This work was supported in part by the Belgian American Education Foundation(J. Dethier), Stanford NIH Medical Scientist Training Program (MSTP) and Soros Fellowship (P. Nuyujukian), DARPA Revolutionizing Prosthetics program (N66001-06-C-8005, K. V. Shenoy), and two NIH Director’s Pioneer Awards (DP1-OD006409, K. V. Shenoy; DPI-OD000965, K. Boahen). 8 References [1] V. Gilja, Towards clinically viable neural prosthetic systems, Ph.D. Thesis, Department of Computer Science, Stanford University, 2010, pp 19–22 and pp 57–73. [2] V. Gilja, P. Nuyujukian, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, A high-performance continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [3] P. Nuyujukian, V. Gilja, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, Generalization and robustness of a continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010. [4] V. Gilja, C.A. Chestek, I. Diester, J.M. Henderson, K. Deisseroth, and K.V. Shenoy, Challenges and opportunities for next-generation intra-cortically based neural prostheses, IEEE Transactions on Biomedical Engineering, 2011, in press. [5] S.P. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia, Journal of Neural Engineering, vol. 5, 2008, pp 455–476. [6] S. Kim, P. Tathireddy, R.A. Normann, and F. Solzbacher, Thermal impact of an active 3-D microelectrode array implanted in the brain, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 15, 2007, pp 493–501. [7] J. Dethier, V. Gilja, P. Nuyujukian, S.A. Elassaad, K.V. Shenoy, and K. Boahen, Spiking neural network decoder for brain-machine interfaces, IEEE Engineering in Medicine & Biology Society Conference on Neural Engineering, Cancun, Mexico, 2011, pp 396–399. [8] K. Boahen, Neuromorphic microchips, Scientific American, vol. 292(5), 2005, pp 56–63. [9] R. Silver, K. Boahen, S. Grillner, N. Kopell, and K.L. Olsen, Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools, Journal of Neuroscience, vol. 27(44), 2007, pp 11807– 11819. [10] J.V. Arthur and K. Boahen, Silicon neuron design: the dynamical systems approach, IEEE Transactions on Circuits and Systems, vol. 58(5), 2011, pp 1034-1043. [11] C. Eliasmith and C.H. Anderson, Neural engineering: computation, representation, and dynamics in neurobiological systems, MIT Press, Cambridge, MA; 2003. [12] C. Eliasmith, A unified approach to building and controlling spiking attractor networks, Neural Computation, vol. 17, 2005, pp 1276–1314. [13] R. Singh and C. Eliasmith, Higher-dimensional neurons explain the tuning and dynamics of working memory cells, The Journal of Neuroscience, vol. 26(14), 2006, pp 3667–3678. [14] C. Eliasmith, How to build a brain: from function to implementation, Synthese, vol. 159(3), 2007, pp 373–388. [15] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME– Journal of Basic Engineering, vol. 82(Series D), 1960, pp 35–45. [16] G. Welsh and G. Bishop, An introduction to the Kalman Filter, University of North Carolina at Chapel Hill Chapel Hill NC, vol. 95(TR 95-041), 1995, pp 1–16. [17] W.Q. Malik, W. Truccolo, E.N. Brown, and L.R. Hochberg, Efficient decoding with steady-state Kalman filter in neural interface systems, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19(1), 2011, pp 25–34. 9
4 0.69501752 302 nips-2011-Variational Learning for Recurrent Spiking Networks
Author: Danilo J. Rezende, Daan Wierstra, Wulfram Gerstner
Abstract: We derive a plausible learning rule for feedforward, feedback and lateral connections in a recurrent network of spiking neurons. Operating in the context of a generative model for distributions of spike sequences, the learning mechanism is derived from variational inference principles. The synaptic plasticity rules found are interesting in that they are strongly reminiscent of experimental Spike Time Dependent Plasticity, and in that they differ for excitatory and inhibitory neurons. A simulation confirms the method’s applicability to learning both stationary and temporal spike patterns. 1
5 0.66796619 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations
Author: Emin Orhan, Robert A. Jacobs
Abstract: Extensive evidence suggests that items are not encoded independently in visual short-term memory (VSTM). However, previous research has not quantitatively considered how the encoding of an item influences the encoding of other items. Here, we model the dependencies among VSTM representations using a multivariate Gaussian distribution with a stimulus-dependent mean and covariance matrix. We report the results of an experiment designed to determine the specific form of the stimulus-dependence of the mean and the covariance matrix. We find that the magnitude of the covariance between the representations of two items is a monotonically decreasing function of the difference between the items’ feature values, similar to a Gaussian process with a distance-dependent, stationary kernel function. We further show that this type of covariance function can be explained as a natural consequence of encoding multiple stimuli in a population of neurons with correlated responses. 1
6 0.61055607 249 nips-2011-Sequence learning with hidden units in spiking neural networks
7 0.59985918 23 nips-2011-Active dendrites: adaptation to spike-based communication
8 0.5950768 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
9 0.58909029 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
10 0.57213324 68 nips-2011-Demixed Principal Component Analysis
11 0.52580363 301 nips-2011-Variational Gaussian Process Dynamical Systems
13 0.48876429 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
14 0.47168976 219 nips-2011-Predicting response time and error rates in visual search
15 0.45282969 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
16 0.44801906 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
17 0.43376201 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
18 0.42892566 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
19 0.41767192 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
20 0.41581237 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
topicId topicWeight
[(0, 0.015), (4, 0.197), (7, 0.014), (20, 0.027), (26, 0.017), (31, 0.17), (33, 0.017), (34, 0.052), (43, 0.064), (45, 0.069), (57, 0.044), (65, 0.025), (74, 0.04), (83, 0.078), (84, 0.022), (99, 0.046)]
simIndex simValue paperId paperTitle
1 0.9151867 139 nips-2011-Kernel Bayes' Rule
Author: Kenji Fukumizu, Le Song, Arthur Gretton
Abstract: A nonparametric kernel-based method for realizing Bayes’ rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces. The prior and conditional probabilities are expressed as empirical kernel mean and covariance operators, respectively, and the kernel mean of the posterior distribution is computed in the form of a weighted sample. The kernel Bayes’ rule can be applied to a wide variety of Bayesian inference problems: we demonstrate Bayesian computation without likelihood, and filtering with a nonparametric statespace model. A consistency rate for the posterior estimate is established. 1
same-paper 2 0.9002645 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
3 0.8932783 130 nips-2011-Inductive reasoning about chimeric creatures
Author: Charles Kemp
Abstract: Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features. Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. This general problem has been previously studied and is addressed by computational models of property induction, categorization, and generalization [1–7]. A challenge faced by all of these models is to capture the background knowledge that guides inductive inferences. Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. Relationships between features may take several forms: for example, one feature may cause, enable, prevent, be inconsistent with, or be a special case of another feature. For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. Participants might be told, for example, that gene X14 leads to the production of enzyme Y132, then asked to use this information when reasoning about novel animals. Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? ? 1 ? o Figure 1: Inductive reasoning about animals and features. (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). (b) A graph product produced by combining the two graph structures in (a). about familiar features. Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. Suppose, for example, that you learn that a novel animal can fly (Figure 1a). To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. On the other hand, you might reach the same conclusion by thinking about flying creatures that you have previously encountered (e.g. sparrows and robins) and noticing that these creatures have wings. Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. The challenge of working with familiar features directly motivates our focus on chimeras. Inferences about chimeras draw on rich background knowledge but require the reasoner to go beyond past experience in a fundamental way. For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. The next section introduces these models, and we then describe two experiments designed to distinguish between the models. 1 Reasoning about objects and features Our models make use of a binary matrix D where the rows {o1 , . . . , o129 } correspond to objects, and the columns {f 1 , . . . , f 56 } correspond to features. A subset of the objects is shown in Figure 2a, and the full set of features is shown in Figure 2b and its caption. Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far.” Matrix D can be used to formulate problems where a reasoner observes one or two features of a new object (i.e. animal o130 ) and must make inferences about the remaining features of the animal. The next two sections describe graphical models that can be used to address this problem. The first graphical model O captures relationships between objects, and the second model F captures relationships between features. We then discuss how these models can be combined, and introduce a family of exemplar-style models that will be compared with our graphical models. A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. Here we describe a tree-structured graphical model O that captures these relationships. The tree was constructed from matrix D using average linkage clustering and the Jaccard similarity measure, and part of the resulting structure is shown in Figure 2a. The subtree in Figure 2a includes clusters 2 alligator caiman crocodile monitor lizard dinosaur blindworm boa cobra python snake viper chameleon iguana gecko lizard salamander frog toad tortoise turtle anchovy herring sardine cod sole salmon trout carp pike stickleback eel flatfish ray plaice piranha sperm whale squid swordfish goldfish dolphin orca whale shark bat fox wolf beaver hedgehog hamster squirrel mouse rabbit bison elephant hippopotamus rhinoceros lion tiger polar bear deer dromedary llama giraffe zebra kangaroo monkey cat dog cow horse donkey pig sheep (a) (b) can swim lives in water eats fish eats nuts eats grain eats grass has gills can jump far has two legs has no legs has six legs has four legs can fly can be ridden has sharp teeth nocturnal has wings strong predator can see in dark eats berries lives in the sea lives in the desert crawls lives in the woods has mane lives in trees can climb well lives underground has feathers has scales slow has fur heavy Figure 2: Graph structures used to define graphical models O and F. (a) A tree that captures similarity relationships between animals. The full tree includes 129 animals, and only part of the tree is shown here. The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. (b) A network capturing pairwise dependency relationships between features. The edges capture both positive and negative dependencies. All edges in the network are shown, and the network also includes 20 isolated nodes for the following features: is black, is blue, is green, is grey, is pink, is red, is white, is yellow, is a pet, has a beak, stings, stinks, has a long neck, has feelers, sucks blood, lays eggs, makes a web, has a hump, has a trunk, and is cold-blooded. corresponding to amphibians and reptiles, aquatic creatures, and land mammals, and the subtree omitted for space includes clusters for insects and birds. We assume that the features in matrix D (i.e. the columns) are generated independently over O: P (f i |O, π i , λi ). P (D|O, π, λ) = i i i i The distribution P (f |O, π , λ ) is based on the intuition that nearby nodes in O tend to have the same value of f i . Previous researchers [8, 18] have used a directed graphical model where the distribution at the root node is based on the baserate π i , and any other node v with parent u has the following conditional probability distribution: i P (v = 1|u) = π i + (1 − π i )e−λ l , if u = 1 i π i − π i e−λ l , if u = 0 (1) where l is the length of the branch joining node u to node v. The variability parameter λi captures the extent to which feature f i is expected to vary over the tree. Note, for example, that any node v must take the same value as its parent u when λ = 0. To avoid free parameters, the feature baserates π i and variability parameters λi are set to their maximum likelihood values given the observed values of the features {f i } in the data matrix D. The conditional distributions in Equation 1 induce a joint distribution over all of the nodes in graph O, and the distribution P (f i |O, π i , λi ) is computed by marginalizing out the values of the internal nodes. Although we described O as a directed graphical model, the model can be converted into an equivalent undirected model with a potential for each edge in the tree and a potential for the root node. Here we use the undirected version of the model, which is a natural counterpart to the undirected model F described in the next section. The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. Let D′ be an expanded version of D that includes a row for o130 , and let O′ be an expanded version of O that includes a node for o130 . The edges in Figure 2a are marked with evenly spaced gray points, and we use a 3 uniform prior P (O′ ) over all trees that can be created by attaching o130 to one of these points. Some of these trees have identical topologies, since some edges in Figure 2a have multiple gray points. Predictions about o130 can be computed using: P (D′ |D) = P (D′ |O′ , D)P (O′ |D) ∝ O′ P (D′ |O′ , D)P (D|O′ )P (O′ ). (2) O′ Equation 2 captures the basic intuition that the distribution of features for o130 is expected to be consistent with the distribution observed for previous animals. For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. A graphical model over features Model F is an undirected graphical model defined over features. The graph shown in Figure 2b was created by identifying pairs where one feature depends directly on another. The author and a research assistant both independently identified candidate sets of pairwise dependencies, and Figure 2b was created by merging these sets and reaching agreement about how to handle any discrepancies. As previous researchers have suggested [13, 15], feature dependencies can capture several kinds of relationships. For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. For example, there is clearly a dependency relationship between being nocturnal and seeing in the dark, but no obvious sense in which one of these features causes the other. We assume that the rows of the object-feature matrix D are generated independently from an undirected graphical model F defined over the feature structure in Figure 2b: P (oi |F). P (D|F) = i Model F includes potential functions for each node and for each edge in the graph. These potentials were learned from matrix D using the UGM toolbox for undirected graphical models [19]. The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. Some pairs of feature values never occur together in matrix D (there are no creatures that fly but do not have wings). We therefore chose to compute maximum a posteriori values of the potential functions rather than maximum likelihood values, and used a diffuse Gaussian prior with a variance of 100 on the entries in each potential. After learning the potentials for model F, we can make predictions about a new object o130 using the distribution P (o130 |F). For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. The output combination model computes the predictions of both models in isolation, then combines these predictions using a weighted sum. The resulting model is similar to a mixture-of-experts model, and to avoid free parameters we use a mixing weight of 0.5. The structure combination model combines the graph structures used by the two models and relies on a set of potentials defined over the resulting graph product. An example of a graph product is shown in Figure 1b, and the potential functions for this graph are inherited from the component models in the natural way. Kemp et al. [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. To preview our results, our data suggest that the combination models perform better overall than either O or F in isolation, and that both combination models perform about equally well. Exemplar models We will compare the family of graphical models already described with a family of exemplar models. The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. An exemplar model addresses the problem by retrieving all previouslyobserved objects with feature i and computing the proportion that have feature k: P (ok = 1|oi = 1) = |f k & f i | |f i | (3) where |f k | is the number of objects in matrix D that have feature k, and |f k & f i | is the number that have both feature k and feature i. Note that we have streamlined our notation by using ok instead of o130 to refer to the kth feature value for object o130 . k Suppose now that the reasoner observes that object o130 has features i and j. The natural generalization of Equation 3 is: P (ok = 1|oi = 1, oj = 1) = |f k & f i & f j | |f i & f j | (4) Because we focus on chimeras, |f i & f j | = 0 and Equation 4 is not well defined. We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . i| |f |f j | (5) where the weights wi and wj must sum to one. We consider four ways in which the weights could be set. The first strategy sets wi = wj = 0.5. The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. The third strategy sets wi ∝ |f1i | , and captures the idea that features should be weighted by their distinctiveness [20]. The final strategy sets weights according to the coherence of each feature [21]. A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. Note that the final three strategies are all consistent with previous proposals from the psychological literature, and each one might be expected to perform well. Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . We will not evaluate this model because it corresponds to a coarser version of model O, which organizes the animals into a hierarchy of categories. The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. Each argument can be represented as f i , f j → f k 5 exemplar r = 0.42 7 feature F exemplar (wi = |f i |) (wi = 0.5) r = 0.44 7 object O r = 0.69 7 output combination r = 0.31 7 structure combination r = 0.59 7 r = 0.60 7 5 5 5 5 5 3 3 3 3 3 3 all 5 1 1 0 1 r = 0.06 7 conflict 0.5 1 1 0 0.5 1 r = 0.71 7 1 0 0.5 1 r = −0.02 7 1 0 0.5 1 r = 0.49 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.57 7 5 3 1 0 0.5 1 r = 0.51 7 edge 0.5 r = 0.17 7 1 1 0 0.5 1 r = 0.64 7 1 0 0.5 1 r = 0.83 7 1 0 0.5 1 r = 0.45 7 1 0 0.5 1 r = 0.76 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.79 7 5 3 1 1 0 0.5 1 r = 0.26 7 other 1 0 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.19 7 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.24 7 0 7 5 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.33 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 3: Argument ratings for Experiment 1 plotted against the predictions of six models. The y-axis in each panel shows human ratings on a seven point scale, and the x-axis shows probabilities according to one of the models. Correlation coefficients are shown for each plot. where f i and f k are the premises (e.g. “has no legs” and “can fly”) and f k is the conclusion (e.g. “has wings”). We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. Our models that incorporate feature structure F can resolve this conflict since F includes a dependency between “wings” and “can fly” but not between “wings” and “has no legs.” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. Materials. The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. The arguments can be assigned to three categories. Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. For our purposes, two singlepremise arguments with the same conclusion are deemed incompatible if one leads to a probability greater than 0.9 according to Equation 3, and the other leads to a probability less than 0.1. Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . Note that some arguments are both conflict cases and edge cases. All arguments that do not fall into either one of these categories will be referred to as other cases. The 400 arguments for the experiment include 154 conflict cases, 153 edge cases, and 120 other cases. 34 arguments are both conflict cases and edge cases. We chose these arguments based on three criteria. First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. Second, we avoided premise pairs that specified two different numbers of legs—for example, {“has four legs,” “has six legs”}. Finally, we aimed to include roughly equal numbers of conflict cases, edge cases, and other cases. Method. 16 undergraduates participated for course credit. The experiment was carried out using a custom-built computer interface, and one argument was presented on screen at a time. Participants 6 rated the probability of the conclusion on seven point scale where the endpoints were labeled “very unlikely” and “very likely.” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. Results. Figure 3 shows average human judgments plotted against the predictions of six models. The plots in the first row include all 400 arguments in the experiment, and the remaining rows show results for conflict cases, edge cases, and other cases. The previous section described four exemplar models, and the two shown in Figure 3 are the best performers overall. Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. The first row of Figure 3 suggests that the three models which include feature structure F perform better than the alternatives. The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0.001, using the Fisher transformation to convert correlation coefficients to z scores). Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. Since these conflict cases are arguments f i , f j → f k where f i → f k has strength greater than 0.9 and f j → f k has strength less than 0.1, the first exemplar model averages these strengths and assigns an overall strength of around 0.5 to each argument. The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. Finally, all models achieve roughly the same level of performance on the other arguments. Although the feature model F performs best overall, the predictions of this model still leave room for improvement. The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. Inferences about novel objects can also draw on relationships between objects rather than relationships between features. For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. Our second experiment explores inferences of this kind. Materials and Method. 32 undergraduates participated for course credit. The task was identical to Experiment 1 with the following exceptions. Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0.78 7 object O r = 0.54 7 output combination r = 0.75 7 structure combination r = 0.75 7 all 5 5 5 5 5 3 3 3 3 3 1 1 0 edge 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.84 7 1 0 0.5 1 r = 0.86 7 0 5 5 5 3 3 3 1 5 3 0.5 r = 0.85 7 5 3 1 1 0 0.5 1 r = 0.79 7 other r = 0.77 7 1 0 0.5 1 r = 0.21 7 1 0 0.5 1 r = 0.74 7 1 0 0.5 1 r = 0.66 7 0 5 5 5 5 3 3 3 3 1 r = 0.73 7 5 0.5 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 4: Argument ratings and model predictions for Experiment 2. one-premise arguments were randomly assigned to two sets. 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. Results. Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. Unlike Experiment 1, the feature model F performs worse than the other alternatives (p < 0.001 in all cases). Not surprisingly, this model performs relatively well for edge cases f j → f k where f j and f k are linked in Figure 2b, but the final row shows that the model performs poorly across the remaining set of arguments. Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. We considered two methods for combining these structures and both performed equally well. Combining the knowledge captured by these structures appears to be important, and future studies can explore in detail how humans achieve this combination. 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. Although a simple undirected model accounted relatively well for our data, this model is only a starting point. The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. Acknowledgments I thank Madeleine Clute for assisting with this research. This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317– 1323, 1987. [2] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991. [3] E. Heit. A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford and N. Chater, editors, Rational models of cognition, pages 248–274. Oxford University Press, Oxford, 1998. [4] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001. [5] C. Kemp and J. B. Tenenbaum. Structured statistical models of inductive reasoning. Psychological Review, 116(1):20–58, 2009. [6] D. N. Osherson, E. E. Smith, O. Wilkie, A. Lopez, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990. [7] D. J. Navarro. Learning the context of a category. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1795–1803. 2010. [8] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, pages 257–264. MIT Press, Cambridge, MA, 2004. [9] B. Rehder. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:1141–1159, 2003. [10] B. Rehder and R. Burnett. Feature inference and the causal structure of categories. Cognitive Psychology, 50:264–314, 2005. [11] C. Kemp, P. Shafto, and J. B. Tenenbaum. An integrated account of generalization across objects and features. Cognitive Psychology, in press. [12] S. E. Barrett, H. Abdi, G. L. Murphy, and J. McCarthy Gallagher. Theory-based correlations and their role in children’s concepts. Child Development, 64:1595–1616, 1993. [13] S. A. Sloman, B. C. Love, and W. Ahn. Feature centrality and conceptual coherence. Cognitive Science, 22(2):189–228, 1998. [14] D. Yarlett and M. Ramscar. A quantitative model of counterfactual reasoning. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 123–130. MIT Press, Cambridge, MA, 2002. [15] W. Ahn, J. K. Marsh, C. C. Luhmann, and K. Lee. Effect of theory-based feature correlations on typicality judgments. Memory and Cognition, 30(1):107–118, 2002. [16] D. C. Meehan C. McNorgan, R. A. Kotack and K. McRae. Feature-feature causal relations and statistical co-occurrences in object concepts. Memory and Cognition, 35(3):418–431, 2007. [17] S. De Deyne, S. Verheyen, E. Ameel, W. Vanpaemel, M. J. Dry, W. Voorspoels, and G. Storms. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40(4):1030–1048, 2008. [18] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001. [19] M. Schmidt. UGM: A Matlab toolbox for probabilistic undirected graphical models. 2007. Available at http://people.cs.ubc.ca/∼schmidtm/Software/UGM.html. [20] L. J. Nelson and D. T. Miller. The distinctiveness effect in social categorization: you are what makes you unusual. Psychological Science, 6:246–249, 1995. [21] A. L. Patalano, S. Chin-Parker, and B. H. Ross. The importance of being coherent: category coherence, cross-classification and reasoning. Journal of memory and language, 54:407–424, 2006. [22] S. K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:393–407, 1972. 9
4 0.87774616 213 nips-2011-Phase transition in the family of p-resistances
Author: Morteza Alamgir, Ulrike V. Luxburg
Abstract: We study the family of p-resistances on graphs for p 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p = 1 the p-resistance coincides with the shortest path distance, for p = 2 it coincides with the standard resistance distance, and for p ! 1 it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p⇤ and p⇤⇤ such that if p < p⇤ , then the p-resistance depends on meaningful global properties of the graph, whereas if p > p⇤⇤ , it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p⇤ = 1 + 1/(d 1) and p⇤⇤ = 1 + 1/(d 2) where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p⇤ and p⇤⇤ is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest to use q-Laplacians as regularizers, where q satisfies 1/p⇤ + 1/q = 1. 1
5 0.87591583 193 nips-2011-Object Detection with Grammar Models
Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester
Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1
6 0.84956765 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time
7 0.82697284 127 nips-2011-Image Parsing with Stochastic Scene Grammar
8 0.81499749 158 nips-2011-Learning unbelievable probabilities
9 0.81242102 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
10 0.80148888 64 nips-2011-Convergent Bounds on the Euclidean Distance
11 0.80015385 38 nips-2011-Anatomically Constrained Decoding of Finger Flexion from Electrocorticographic Signals
12 0.79492974 229 nips-2011-Query-Aware MCMC
13 0.78887105 249 nips-2011-Sequence learning with hidden units in spiking neural networks
14 0.78288656 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation
15 0.78163815 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
16 0.78160775 219 nips-2011-Predicting response time and error rates in visual search
17 0.77680254 180 nips-2011-Multiple Instance Filtering
18 0.77322072 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
19 0.77226478 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
20 0.77125734 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding