
221 nips-2011-Priors over Recurrent Continuous Time Processes


Source: pdf

Author: Ardavan Saeedi, Alexandre Bouchard-Côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability, closed-form predictive distributions for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried out efficiently using Particle MCMC methods [1]. This yields an MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing the slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Priors over Recurrent Continuous Time Processes Ardavan Saeedi, Alexandre Bouchard-Côté, Department of Statistics, University of British Columbia. Abstract We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. [sent-1, score-0.12]

2 A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. [sent-2, score-0.128]

3 Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. [sent-3, score-0.397]

4 This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. [sent-5, score-0.276]

5 We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. [sent-6, score-0.364]

6 In both domains, we found that our model outperformed the standard rate matrix estimation approach. [sent-7, score-0.118]

7 1 Introduction The application of non-parametric Bayesian techniques to time series has been an active field in recent years, and has led to many successful continuous time models. [sent-8, score-0.155]

8 Examples include Dependent Dirichlet Processes (DDP) [5], Ornstein-Uhlenbeck Dirichlet Processes [6], and stick-breaking autoregressive processes [7]. [sent-9, score-0.135]

9 More formally, DDPs and their cousins can be viewed as priors over transient processes (see Section A of the Supplementary Material). [sent-11, score-0.147]

10 As a concrete example of the type of time series we are interested in, consider the problem of modeling the progression of recurrent diseases such as multiple sclerosis. [sent-14, score-0.296]

11 In multiple sclerosis research, measuring the effect of drugs in the presence of these complex cycles is challenging, and is one of the applications that motivated this work. [sent-16, score-0.102]

12 The data available to infer the disease progression typically takes the form of summary measurements taken at different points in time for each patient. [sent-17, score-0.299]

13 We model these measurements as being conditionally independent given a continuous time non-parametric latent process. [sent-18, score-0.119]

14 GEPs are based on priors over recurrent, infinite rate matrices specifying a jump process in a latent space. [sent-21, score-0.215]

15 It is informative to start with a preview of what the predictive distributions look like in GEP models. [sent-22, score-0.125]

16 Let t1, . . ., tn denote the previous, distinct waiting times at θ. [sent-28, score-0.256]

17 The predictive distribution is then specified by the following density over the positive reals: f(t) = (α0 + n)(β0 + T)^(α0+n) / (β0 + T + t)^(α0+n+1), where T is the sum over the ti's, and α0, β0 are parameters. [sent-29, score-0.125]
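The density above is a Lomax (Pareto type II) density with shape α0 + n and scale β0 + T, so it can be sampled by inverting its CDF. A minimal Python sketch of the density and an inverse-CDF sampler (our code, not the authors'; function names are ours):

```python
import numpy as np

def predictive_density(t, n, T, alpha0, beta0):
    """f(t) = (alpha0+n) (beta0+T)^(alpha0+n) / (beta0+T+t)^(alpha0+n+1)."""
    a, b = alpha0 + n, beta0 + T
    return a * b**a / (b + t) ** (a + 1)

def sample_predictive(n, T, alpha0, beta0, rng):
    """Invert the CDF F(t) = 1 - ((beta0+T) / (beta0+T+t))^(alpha0+n)."""
    a, b = alpha0 + n, beta0 + T
    u = rng.uniform()
    return b * (u ** (-1.0 / a) - 1.0)
```

As more waiting times accumulate at a state (larger n and T), the shape parameter grows and the predictive concentrates, as one would expect from conjugacy.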

18 It can be checked that this yields an exchangeable distribution over the sequences of waiting times at θ (it forms a telescoping product — see the proof of Proposition 5 in the Supplementary Material). [sent-30, score-0.393]
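The telescoping-product argument can be checked numerically: building the joint density of a few waiting times one predictive factor at a time gives the same value under every ordering. A small sketch under the same notation (our code):

```python
import math
from itertools import permutations

def joint_density(waits, alpha0=2.0, beta0=1.0):
    """Product of predictive factors f(t | n, T), with n and T accumulating."""
    dens, n, T = 1.0, 0, 0.0
    for t in waits:
        a, b = alpha0 + n, beta0 + T
        dens *= a * b**a / (b + t) ** (a + 1)
        n, T = n + 1, T + t
    return dens

# every ordering of the waiting times gives the same joint density
vals = [joint_density(list(p)) for p in permutations([0.7, 2.1, 0.4])]
assert all(math.isclose(v, vals[0]) for v in vals)
```

The product telescopes down to β0^α0 Π_k (α0 + k) / (β0 + Σ ti)^(α0+n), which depends on the ti's only through their sum and is therefore symmetric.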

19 We identify this prior in Section 3, and use it to build a powerful hierarchical model in Section 4. [sent-32, score-0.092]

20 As we will see, this hierarchical model displays many attractive properties: conjugacy, exchangeability and closed-form predictive distributions for the waiting times, and exact Gibbs updates for the time scale parameters. [sent-33, score-0.489]

21 While continuous-time analogues of these discrete time processes can be constructed by subordination, we discuss in Section C of the Supplementary Material the differences and advantages of GEPs compared to these subordinations. [sent-36, score-0.136]

22 1 Note also that the gamma-exponential process introduced here is unrelated to the exponential-gamma process [18]. [sent-39, score-0.126]

23 2 Background and notation While our process can be defined on continuous state spaces, the essential ideas can be described over countable state spaces. [sent-40, score-0.382]

24 Samples from these processes take the form of a list of pairs of states and waiting times, X = (θn, Jn), n = 1, . . ., N (see Figure 1(a)). [sent-45, score-0.416]

25 It is the same process as the Gamma process used in e. [sent-58, score-0.126]

26 (b) Graphical model for the hierarchical model of Section 4. [sent-73, score-0.092]

27 To get a conjugate family, we will base our priors on Moran Gamma Processes (MGPs) [13], a family of measure-valued probability distributions. [sent-75, score-0.18]

28 Recall that by the Kolmogorov consistency theorem, in order to guarantee the existence of a stochastic process on a probability space (Ω , FΩ ), it is enough to provide a consistent definition of what the marginals of this stochastic process are. [sent-78, score-0.126]

29 As the name suggests, in the case of a Moran Gamma process, the marginals are gamma distributions: Definition 1 (Moran Gamma Process). [sent-79, score-0.264]

30 In GEPs, the rows of a rate matrix Q are obtained by a transformation of iid samples from an MGP, and the states are then generated from Q with the Doob-Gillespie algorithm described in the previous section. [sent-90, score-0.181]
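The Doob-Gillespie step itself is standard: from state i, wait an Exp(qi) time with qi = −Q[i, i], then jump to state j with probability Q[i, j]/qi. A generic sketch for a given finite rate matrix (our code; the paper's contribution is the MGP prior over the rows of Q, which is not reproduced here):

```python
import numpy as np

def gillespie(Q, start, t_max, rng):
    """Simulate a continuous-time Markov jump process with rate matrix Q
    up to time t_max; returns a list of (state, sojourn_time) pairs."""
    Q = np.asarray(Q, dtype=float)
    path, t, state = [], 0.0, start
    while True:
        rate = -Q[state, state]          # total jump rate out of `state`
        wait = rng.exponential(1.0 / rate)
        if t + wait > t_max:
            break
        probs = Q[state].copy()          # off-diagonal rates -> jump probabilities
        probs[state] = 0.0
        probs /= rate
        nxt = rng.choice(len(probs), p=probs)
        path.append((state, wait))
        t, state = t + wait, nxt
    path.append((state, t_max - t))      # right-censored final sojourn
    return path
```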

31 In this section we show that this model is conjugate and has a closed form expression for the predictive distribution. [sent-91, score-0.164]

32 Let H0 be a base measure on a countable support Ω with ‖H0‖ < ∞. [sent-92, score-0.205]

33 We will relax the countable base measure support assumption in the next section. [sent-93, score-0.205]

34 , and setting q_{i,j} = µθ(i)({θ(j)}) if i ≠ j. (Footnote 2: we use the rate parameterization for the gamma density throughout.) [sent-97, score-0.303]

35 Note that the GEP as defined above can generate self-transitions, but conditioning on the parameters, the jump waiting times are still exponential. [sent-98, score-0.357]

36 However for computing predictive distributions, it will be simpler to allow positive self-transitions rates. [sent-99, score-0.125]

37 In other words, we always condition on (θ0 = θbeg) and (θn ≠ θbeg, n > 0), and drop these conditioning events from the notation. [sent-102, score-0.102]

38 Similarly, we are going to consider distributions over infinite sequences in the notation that follows, but if the goal is to model finite sequences, an additional special state θend ≠ θbeg can be introduced. [sent-103, score-0.14]

39 The sufficient statistics for the parameters of µθ | X are the empirical transition measures and waiting times: Fθ = Σ_{n=1}^N 1[θn−1 = θ] δθn, Tθ = Σ_{n=1}^N 1[θn−1 = θ] Jn. [sent-110, score-0.204]
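Accumulating these statistics from a sampled sequence is straightforward; a small sketch (our naming), with X given as a list of pairs (θn, Jn):

```python
from collections import Counter, defaultdict

def sufficient_stats(X):
    """F[theta] counts the successor states of theta; T[theta] sums the J_n
    for which the previous state theta_{n-1} equals theta (n = 1..N)."""
    F = defaultdict(Counter)
    T = defaultdict(float)
    for (prev, _), (cur, J) in zip(X, X[1:]):
        F[prev][cur] += 1
        T[prev] += J
    return F, T
```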

40 This connection with the Dirichlet process is used in the proof below, and also implies that samples from GEPs have countable support even when Ω is uncountable (i.e. [sent-114, score-0.258]

41 the chain will always visit a random countable subset of Ω). [sent-116, score-0.156]

42 Fix an arbitrary state θ and drop the index for simplicity (this is without loss of generality since the rows are iid): let µ = µθ, µ̄ = µ̄θ, and β = βθ. [sent-121, score-0.094]

43 We now turn to the task of finding an expression for the predictive distribution, (θN +1 , JN +1 )|X. [sent-137, score-0.125]

44 The predictive distribution of the GEP is given by: (θN+1, JN+1) | X ∼ µ̄θN × TP(‖µθN‖, βθN). [sent-142, score-0.125]

45 As a sanity check, and to connect this result with the discussion in the introduction, it is instructive to directly check that these predictive distributions are indeed exchangeable (see Section B for the proof): Proposition 6. [sent-147, score-0.184]

46 Let Jj(θ,1), . . ., Jj(θ,K) be the subsequence of waiting times following state θ. [sent-151, score-0.318]

47 Moreover, the joint density of a sequence of waiting times (Jj(θ,1) = j1 , Jj(θ,2) = j2 , . [sent-156, score-0.309]

48 4 Hierarchical GEP In this section, we present a hierarchical version of the GEP, where the rows of the random rate matrix are exchangeable rather than iid. [sent-166, score-0.222]

49 For such spaces, since each GEP sample has a random countable support, any two independent GEP samples will have disjoint supports with probability one. [sent-169, score-0.156]

50 Therefore, GEP alone cannot be used to construct recurrent processes when Ω is uncountable. [sent-170, score-0.194]

51 Fortunately, the hierarchical model introduced in this section addresses this issue: it yields a recurrent prior over continuous time jump processes over both countable and uncountable spaces Ω (see Section A). [sent-171, score-0.622]

52 The hierarchical process is constructed by making the base measure parameter of the rows shared and random. [sent-172, score-0.236]

53 In order to get a tractable predictive distribution, we introduce a set of auxiliary variables. [sent-174, score-0.233]

54 These auxiliary variables can be compared to the variables used in the Chinese Restaurant Franchise (CRF) metaphor [20] to indicate when new tables are created in a given restaurant. [sent-175, score-0.142]

55 These auxiliary variables will be denoted by An , where the event An = 1 means informally that the n-th transition creates a new table. [sent-177, score-0.139]

56 See Section D in the Supplementary Material for a review of the CRF construction and a formal definition of the auxiliary variables An . [sent-179, score-0.108]

57 We augment the sufficient statistics with empirical counts for the number of tables across all restaurants that share a given dish, G = Σ_{n=1}^N An δθn, and introduce one additional auxiliary variable, the normalization ‖µ0‖ of the top-level random measure µ0. [sent-180, score-0.176]

58 This latter auxiliary variable has no equivalent in CRFs. [sent-181, score-0.108]

59 Finally, we let: µ^(H) = G + H0 and µθ^(H) = Fθ + ‖µ0‖ µ̄^(H), where µ̄^(H) can be recognized as the mean parameter of the predictive distribution of the HDP. [sent-183, score-0.125]

60 The predictive distribution of the Hierarchical GEP (HGEP) is given by: (θN+1, JN+1) | (X, {An}_{n=1}^N, ‖µ0‖) ∼ µ̄θN^(H) × TP(‖µθN^(H)‖, βθN). [sent-186, score-0.125]

61 To resample the auxiliary variable ‖µ0‖, a gamma-distributed Gibbs kernel can be used (see Section E of the Supplementary Material). [sent-187, score-0.147]

62 5 Inference on partially observed sequences In this section, we describe how to approximate expectations under the posterior distribution of GEPs, E[h(X)|Y], for a test function h on the hidden events X given observations Y. [sent-188, score-0.252]

63 An example of function h on these events is to interpolate the progression of the disease in a patient with Multiple Sclerosis (MS) between two medical visits. [sent-189, score-0.286]

64 Note that in most applications, the sequence of states is neither directly nor fully observed. [sent-191, score-0.113]

65 First, instead of observing the random variables θ, inference is often carried out from X-valued random variables Yn distributed according to a parametric family P indexed by the states θ of the chain, P = {Lθ : FX → [0, 1], θ ∈ Ω}. [sent-192, score-0.139]

66 Second, the measurements are generally available only for a finite set of times T . [sent-193, score-0.096]

67 Nonconjugate models can be handled by incorporating the auxiliary variables of Algorithm 8 in [21]. [sent-200, score-0.108]

68 Extension to hierarchical models is direct (by keeping track of an additional sufficient statistic G, as well as the auxiliary variables An , µ0 ). [sent-202, score-0.2]

69 In general, there may be several exchangeable sequences from which we want to learn a model. [sent-203, score-0.137]

70 For example, we learned a model for MS disease progression by using time series from several patients. [sent-204, score-0.299]

71 At a high level, our inference algorithm works by resampling the hidden events X^(k) for one sequence k given the sufficient statistics of the other sequences, (Fθ^(\k), Tθ^(\k)). [sent-212, score-0.176]

72 This is done using a Sequential Monte Carlo (SMC) algorithm to construct a proposal over sequences of hidden events. [sent-213, score-0.2]

73 Each particle in our SMC algorithm is a sequence of states and waiting times for the current sequence k. [sent-214, score-0.516]

74 By using a Particle MCMC (PMCMC) method [1], we then compute an acceptance ratio. (Footnote 4: even in cases where there is a single long sequence, we recommend, for efficiency reasons, partitioning the sequence into subsequences.) [sent-215, score-0.109]

75 As we will see shortly, the acceptance is simply given by a ratio of marginal likelihood estimators, which can be computed directly from the unnormalized particle weights. [sent-229, score-0.229]

76 Each particle m ∈ {1, . . ., M} consists of a list of hidden events indexed by n, containing both Nm,g (hidden) states and waiting times: Xm,g = (θm,n, Jm,n), n = 1, . . ., Nm,g. [sent-234, score-0.406]

77 The next step is to compute an acceptance probability for a proposed sequence of states X∗^(k). [sent-236, score-0.169]

78 At each MCMC iteration, we assume that we store the value of the data likelihood estimates for the accepted state sequences. [sent-237, score-0.13]

79 Let L^(k) be the estimate for the previously accepted sequence of states for observed sequence k, and let L∗^(k) be the estimate for the current MCMC iteration. [sent-239, score-0.201]

80 The acceptance probability for the new sequence is given by min{1, L∗^(k) / L^(k)}. [sent-240, score-0.109]
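As a sketch of this accept/reject step (our code; the estimator is the usual PMCMC marginal-likelihood estimate, i.e. the product over SMC generations of the mean unnormalized particle weight):

```python
import random

def marginal_likelihood_estimate(unnormalized_weights):
    """unnormalized_weights[g][p]: weight of particle p at SMC generation g."""
    est = 1.0
    for gen in unnormalized_weights:
        est *= sum(gen) / len(gen)
    return est

def pmcmc_accept(L_star, L_current, rng=random):
    """Metropolis-Hastings step on marginal-likelihood estimates."""
    return rng.random() < min(1.0, L_star / L_current)
```

Because these estimates are unbiased, accepting with this ratio leaves the exact posterior invariant, which is the key PMCMC property [1].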

81 First, we demonstrate the behavior of state trajectories and sojourn times sampled from the prior to give a qualitative idea of the range of time series that can be captured by our model. [sent-243, score-0.329]

82 Figure 2(a) shows a sequence with short sojourn times and high volatility of states, whereas Figure 2(b) depicts longer sojourn times with much less volatility. [sent-248, score-0.361]

83 Likewise, in Figure 2(d) the high tendency to create new states is present, but we have longer sojourn times. [sent-251, score-0.162]

84 See Section H of the supplementary material for a more detailed account of the interpretation and quantitative effect of the parameters. [sent-252, score-0.116]

85 2 Quantitative evaluation In this section, we use a simple likelihood model for discrete observations (described in Section G of the supplementary material) to evaluate our method on three held-out tasks. [sent-254, score-0.094]

[Figure 3 plot residue removed: panels compare HGEP and EM, including a Synthetic dataset; x-axes run from 0 to 800.] Figure 3: Mean reconstruction error on the held-out data as a function of the number of Gibbs scans. [sent-270, score-0.153]

89 The standard maximum likelihood estimate learned with EM outperformed our model on the simple synthetic dataset, but the trend was reversed on the more complex real-world datasets. [sent-272, score-0.153]

90 See Section G of the supplementary material for detailed instructions for replicating the following three results. [sent-276, score-0.116]

91 Both HGEP and the EM-learned maximum likelihood outperformed the baseline. [sent-279, score-0.112]

92 MS disease progression: This dataset, obtained from a phase III clinical trial, tracks the progression of MS in 72 patients over 3 years. [sent-285, score-0.261]

93 The observed state of a patient at a given time is binned into three categories, as is customary in the MS literature [3]. [sent-286, score-0.098]

94 Both HGEP and EM outperformed the baseline by a large margin, and our HGEP model outperformed EM with a relative error reduction of 22%. [sent-287, score-0.158]

95 Again, both HGEP and EM outperformed the baseline, and our model outperformed EM with a relative error reduction of 29%. [sent-292, score-0.158]

96 The model has attractive properties and we show that the posterior computations can be done efficiently using a sampler based on particle MCMC methods. [sent-294, score-0.126]

97 The Ornstein-Uhlenbeck Dirichlet process and other time-varying processes for Bayesian nonparametric inference. [sent-351, score-0.198]

98 Estimation in the Koziol-Green model using a gamma process prior. [sent-406, score-0.327]

99 Statistical inference in evolutionary models of DNA sequences via the EM algorithm. [sent-461, score-0.155]

100 Inferring complex DNA substitution processes on phylogenies using uniformization and data augmentation. [sent-466, score-0.1]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gep', 0.356), ('hgep', 0.356), ('gamma', 0.264), ('waiting', 0.204), ('mgp', 0.179), ('geps', 0.178), ('jn', 0.157), ('countable', 0.156), ('jj', 0.146), ('moran', 0.134), ('tg', 0.134), ('predictive', 0.125), ('rna', 0.122), ('progression', 0.122), ('auxiliary', 0.108), ('em', 0.105), ('dirichlet', 0.103), ('beg', 0.102), ('conjugacy', 0.102), ('qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq', 0.102), ('sclerosis', 0.102), ('sojourn', 0.102), ('processes', 0.1), ('disease', 0.097), ('recurrent', 0.094), ('particle', 0.094), ('hierarchical', 0.092), ('tp', 0.081), ('ms', 0.079), ('outperformed', 0.079), ('sequences', 0.078), ('mcmc', 0.077), ('ihmm', 0.076), ('hidden', 0.075), ('ak', 0.071), ('smc', 0.067), ('events', 0.067), ('jump', 0.066), ('process', 0.063), ('proposition', 0.063), ('state', 0.062), ('supplementary', 0.061), ('states', 0.06), ('exchangeable', 0.059), ('acceptance', 0.056), ('material', 0.055), ('sequence', 0.053), ('times', 0.052), ('beam', 0.051), ('ctmps', 0.051), ('ddps', 0.051), ('mgps', 0.051), ('palgrave', 0.051), ('qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq', 0.051), ('ribosomal', 0.051), ('jk', 0.05), ('iid', 0.05), ('base', 0.049), ('priors', 0.047), ('proposal', 0.047), ('unnormalized', 0.046), ('restaurant', 0.046), ('family', 0.045), ('series', 0.044), ('measurements', 0.044), ('evolutionary', 0.043), ('patients', 0.042), ('synthetic', 0.041), ('gael', 0.041), ('continuous', 0.039), ('rate', 0.039), ('conjugate', 0.039), ('resample', 0.039), ('uncountable', 0.039), ('finance', 0.039), ('bayesian', 0.037), ('doucet', 0.037), ('crp', 0.037), ('econometrics', 0.037), ('dna', 0.037), ('beta', 0.036), ('teh', 0.036), ('time', 0.036), ('nonparametric', 0.035), ('conditioning', 0.035), ('accepted', 0.035), ('autoregressive', 0.035), ('tables', 0.034), ('normalization', 0.034), ('gibbs', 0.034), ('nite', 0.034), ('inference', 0.034), ('factorial', 0.034), ('qualitative', 0.033), ('likelihood', 0.033), ('exchangeability', 0.032), ('rows', 0.032), ('posterior', 0.032), ('markov', 0.032), ('informally', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 221 nips-2011-Priors over Recurrent Continuous Time Processes

Author: Ardavan Saeedi, Alexandre Bouchard-Côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability, closed-form predictive distributions for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried out efficiently using Particle MCMC methods [1]. This yields an MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing the slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.

2 0.11506099 104 nips-2011-Generalized Beta Mixtures of Gaussians

Author: Artin Armagan, Merlise Clyde, David B. Dunson

Abstract: In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely. 1

3 0.10444804 101 nips-2011-Gaussian process modulated renewal processes

Author: Yee W. Teh, Vinayak Rao

Abstract: Renewal processes are generalizations of the Poisson process on the real line whose intervals are drawn i.i.d. from some distribution. Modulated renewal processes allow these interevent distributions to vary with time, allowing the introduction of nonstationarity. In this work, we take a nonparametric Bayesian approach, modelling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, which allows us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference. In our experiments, we test these on a number of synthetic and real datasets. 1

4 0.09727446 131 nips-2011-Inference in continuous-time change-point models

Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor

Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights. 1

5 0.089124314 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

Author: Yee W. Teh, Charles Blundell, Lloyd Elliott

Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1

6 0.081848003 258 nips-2011-Sparse Bayesian Multi-Task Learning

7 0.075419441 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

8 0.075229339 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models

9 0.074046269 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

10 0.073532999 285 nips-2011-The Kernel Beta Process

11 0.07179451 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

12 0.071154773 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

13 0.070791848 249 nips-2011-Sequence learning with hidden units in spiking neural networks

14 0.068600342 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

15 0.067093849 281 nips-2011-The Doubly Correlated Nonparametric Topic Model

16 0.06705007 156 nips-2011-Learning to Learn with Compound HD Models

17 0.063974075 8 nips-2011-A Model for Temporal Dependencies in Event Streams

18 0.061997179 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

19 0.060904302 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation

20 0.060863595 229 nips-2011-Query-Aware MCMC


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, 0.024), (2, 0.051), (3, -0.008), (4, -0.059), (5, -0.167), (6, 0.028), (7, -0.067), (8, 0.03), (9, 0.035), (10, -0.022), (11, -0.094), (12, 0.039), (13, -0.046), (14, -0.056), (15, -0.004), (16, 0.01), (17, -0.074), (18, -0.024), (19, 0.016), (20, 0.115), (21, 0.03), (22, -0.04), (23, -0.013), (24, -0.126), (25, 0.054), (26, -0.028), (27, 0.001), (28, 0.049), (29, 0.022), (30, 0.008), (31, 0.078), (32, -0.085), (33, 0.027), (34, -0.041), (35, -0.071), (36, 0.041), (37, -0.058), (38, 0.026), (39, 0.038), (40, 0.031), (41, -0.018), (42, -0.081), (43, 0.035), (44, -0.019), (45, 0.074), (46, 0.024), (47, 0.048), (48, 0.064), (49, -0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94329023 221 nips-2011-Priors over Recurrent Continuous Time Processes

Author: Ardavan Saeedi, Alexandre Bouchard-Côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability, closed-form predictive distributions for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried out efficiently using Particle MCMC methods [1]. This yields an MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing the slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.

2 0.7981708 131 nips-2011-Inference in continuous-time change-point models

Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor

Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights. 1

3 0.78277731 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

Author: Yee W. Teh, Charles Blundell, Lloyd Elliott

Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1

4 0.72471666 101 nips-2011-Gaussian process modulated renewal processes

Author: Yee W. Teh, Vinayak Rao

Abstract: Renewal processes are generalizations of the Poisson process on the real line whose intervals are drawn i.i.d. from some distribution. Modulated renewal processes allow these interevent distributions to vary with time, allowing the introduction of nonstationarity. In this work, we take a nonparametric Bayesian approach, modelling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, which allows us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference. In our experiments, we test these on a number of synthetic and real datasets. 1

5 0.72461599 8 nips-2011-A Model for Temporal Dependencies in Event Streams

Author: Asela Gunawardana, Christopher Meek, Puyang Xu

Abstract: We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams. We describe a closed-form Bayesian approach to learning these models, and describe an importance sampling algorithm for forecasting future events using these models, using a proposal distribution based on Poisson superposition. We then use synthetic data, supercomputer event logs, and web search query logs to illustrate that our learning algorithm can efficiently learn nonlinear temporal dependencies, and that our importance sampling algorithm can effectively forecast future events. 1

6 0.72334921 55 nips-2011-Collective Graphical Models

7 0.65327942 104 nips-2011-Generalized Beta Mixtures of Gaussians

8 0.63516659 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

9 0.58227628 285 nips-2011-The Kernel Beta Process

10 0.56586117 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

11 0.55906957 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

12 0.55188316 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

13 0.54764509 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

14 0.54610473 14 nips-2011-A concave regularization technique for sparse mixture models

15 0.54058105 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

16 0.52652466 197 nips-2011-On Tracking The Partition Function

17 0.52598482 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

18 0.52557021 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

19 0.52532548 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

20 0.52269804 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (4, 0.033), (20, 0.017), (26, 0.033), (31, 0.122), (33, 0.044), (43, 0.058), (45, 0.086), (57, 0.03), (63, 0.279), (74, 0.08), (83, 0.036), (84, 0.029), (99, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8317048 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm

Author: Julie Dethier, Paul Nuyujukian, Chris Eliasmith, Terrence C. Stewart, Shauki A. Elasaad, Krishna V. Shenoy, Kwabena A. Boahen

Abstract: Motor prostheses aim to restore function to disabled patients. Despite compelling proof of concept systems, barriers to clinical translation remain. One challenge is to develop a low-power, fully-implantable system that dissipates only minimal power so as not to damage tissue. To this end, we implemented a Kalman-filter based decoder via a spiking neural network (SNN) and tested it in brain-machine interface (BMI) experiments with a rhesus monkey. The Kalman filter was trained to predict the arm's velocity and mapped on to the SNN using the Neural Engineering Framework (NEF). A 2,000-neuron embedded Matlab SNN implementation runs in real-time and its closed-loop performance is quite comparable to that of the standard Kalman filter. The success of this closed-loop decoder holds promise for hardware SNN implementations of statistical signal processing algorithms on neuromorphic chips, which may offer power savings necessary to overcome a major obstacle to the successful clinical translation of neural motor prostheses.

∗ Present: Research Fellow F.R.S.-FNRS, Systmod Unit, University of Liege, Belgium.

1 Cortically-controlled motor prostheses: the challenge

Motor prostheses aim to restore function for severely disabled patients by translating neural signals from the brain into useful control signals for prosthetic limbs or computer cursors. Several proof of concept demonstrations have shown encouraging results, but barriers to clinical translation still remain. One example is the development of a fully-implantable system that meets power dissipation constraints, but is still powerful enough to perform complex operations. A recently reported closed-loop cortically-controlled motor prosthesis is capable of producing quick, accurate, and robust computer cursor movements by decoding neural signals (threshold-crossings) from a 96-electrode array in rhesus macaque premotor/motor cortex [1]-[4].
This, and previous designs (e.g., [5]), employ versions of the Kalman filter, ubiquitous in statistical signal processing. Such a filter and its variants are the state-of-the-art decoder for brain-machine interfaces (BMIs) in humans [5] and monkeys [2]. While these recent advances are encouraging, clinical translation of such BMIs requires fully-implanted systems, which in turn impose severe power dissipation constraints. Even though it is an open, actively-debated question as to how much of the neural prosthetic system must be implanted, we note that there are no reports to date demonstrating a fully implantable 100-channel wireless transmission system, motivating performing decoding within the implanted chip. This computation is constrained by a stringent power budget: a 6 × 6 mm² implant must dissipate less than 10mW to avoid heating the brain by more than 1°C [6], which is believed to be important for long term cell health. With this power budget, current approaches cannot scale to higher electrode densities or to substantially more computer-intensive decode/control algorithms.

The feasibility of mapping a Kalman-filter based decoder algorithm [1]-[4] on to a spiking neural network (SNN) has been explored off-line (open-loop). In these off-line tests, the SNN's performance virtually matched that of the standard implementation [7]. These simulations provide confidence that this algorithm—and others similar to it—could be implemented using an ultra-low-power approach potentially capable of meeting the severe power constraints set by clinical translation. This neuromorphic approach uses very-large-scale integrated systems containing microelectronic analog circuits to morph neural systems into silicon chips [8, 9]. These neuromorphic circuits may yield tremendous power savings—50nW per silicon neuron [10]—over digital circuits because they use physical operations to perform mathematical computations (analog approach).
When implemented on a chip designed using the neuromorphic approach, a 2,000-neuron SNN can consume as little as 100µW. Demonstrating this approach's feasibility in a closed-loop system running in real-time is a key, non-incremental step in the development of a fully implantable decoding chip, and is necessary before proceeding with fabricating and implanting the chip. As noise, delay, and over-fitting play a more important role in the closed-loop setting, it is not obvious that the SNN's stellar open-loop performance will hold up. In addition, performance criteria are different in the closed-loop and open-loop settings (e.g., time per target vs. root mean squared error). Therefore, a SNN of a different size may be required to meet the desired specifications.

Here we present results and assess the performance and viability of the SNN Kalman-filter based decoder in real-time, closed-loop tests, with the monkey performing a center-out-and-back target acquisition task. To achieve closed-loop operation, we developed an embedded Matlab implementation that ran a 2,000-neuron version of the SNN in real-time on a PC. We achieved almost a 50-fold speed-up by performing part of the computation in a lower-dimensional space defined by the formal method we used to map the Kalman filter on to the SNN. This shortcut allowed us to run a larger SNN in real-time than would otherwise be possible.

2 Spiking neural network mapping of control theory algorithms

As reported in [11], a formal methodology, called the Neural Engineering Framework (NEF), has been developed to map control-theory algorithms onto a computational fabric consisting of a highly heterogeneous population of spiking neurons simply by programming the strengths of their connections.
These artificial neurons are characterized by a nonlinear multi-dimensional-vector-to-spike-rate function—$a_i(x(t))$ for the $i$th neuron—with parameters (preferred direction, maximum firing rate, and spiking threshold) drawn randomly from a wide distribution (standard deviation ≈ mean).

Figure 1: NEF's three principles. Representation. 1D tuning curves of a population of 50 leaky integrate-and-fire neurons. The neurons' tuning curves map control variables ($x$) to spike rates ($a_i(x)$); this nonlinear transformation is inverted by linear weighted decoding. $G(\cdot)$ is the neurons' nonlinear current-to-spike-rate function. Transformation. SNN with populations $b_k(t)$ and $a_j(t)$ representing $y(t)$ and $x(t)$. Feedforward and recurrent weights are determined by $B'$ and $A'$, as described next. Dynamics. The system's dynamics is captured in a neurally plausible fashion by replacing integration with the synapses' spike response, $h(t)$, and replacing the matrices with $A' = \tau A + I$ and $B' = \tau B$ to compensate.

The neural engineering approach to configuring SNNs to perform arbitrary computations is underlined by three principles (Figure 1) [11]-[14]:

Representation is defined by nonlinear encoding of $x(t)$ as a spike rate, $a_i(x(t))$—represented by the neuron tuning curve—combined with optimal weighted linear decoding of $a_i(x(t))$ to recover an estimate of $x(t)$, $\hat{x}(t) = \sum_i a_i(x(t))\phi_i^x$, where $\phi_i^x$ are the decoding weights.

Transformation is performed by using alternate decoding weights in the decoding operation to map transformations of $x(t)$ directly into transformations of $a_i(x(t))$.
For example, $y(t) = Ax(t)$ is represented by the spike rates $b_j(A\hat{x}(t))$, where unit $j$'s input is computed directly from unit $i$'s output using $A\hat{x}(t) = \sum_i a_i(x(t))A\phi_i^x$, an alternative linear weighting.

Dynamics brings the first two principles together and adds the time dimension to the circuit. This principle aims at reuniting the control-theory and neural levels by modifying the matrices to render the system neurally plausible, thereby permitting the synapses' spike response, $h(t)$ (i.e., impulse response), to capture the system's dynamics. For example, for $h(t) = \tau^{-1}e^{-t/\tau}$, $\dot{x} = Ax(t)$ is realized by replacing $A$ with $A' = \tau A + I$. This so-called neurally plausible matrix yields an equivalent dynamical system: $x(t) = h(t) * A'x(t)$, where convolution replaces integration.

The nonlinear encoding process—from a multi-dimensional stimulus, $x(t)$, to a one-dimensional soma current, $J_i(x(t))$, to a firing rate, $a_i(x(t))$—is specified as:

$$a_i(x(t)) = G(J_i(x(t))). \quad (1)$$

Here $G$ is the neurons' nonlinear current-to-spike-rate function, which is given by

$$G(J_i(x)) = \left[\tau^{ref} - \tau^{RC}\ln\left(1 - J_{th}/J_i(x)\right)\right]^{-1} \quad (2)$$

for the leaky integrate-and-fire model (LIF). The LIF neuron has two behavioral regimes: sub-threshold and super-threshold. The sub-threshold regime is described by an RC circuit with time constant $\tau^{RC}$. When the sub-threshold soma voltage reaches the threshold, $V_{th}$, the neuron emits a spike $\delta(t - t_n)$. After this spike, the neuron is reset and rests for $\tau^{ref}$ seconds (absolute refractory period) before it resumes integrating. $J_{th} = V_{th}/R$ is the minimum input current that produces spiking. Ignoring the soma's RC time constant when specifying the SNN's dynamics is reasonable because the neurons cross threshold at a rate that is proportional to their input current, which thus sets the spike rate instantaneously, without any filtering [11].
The conversion from a multi-dimensional stimulus, $x(t)$, to a one-dimensional soma current, $J_i$, is performed by assigning to the neuron a preferred direction, $\tilde{\phi}_i^x$, in the stimulus space and taking the dot product:

$$J_i(x(t)) = \alpha_i \tilde{\phi}_i^x \cdot x(t) + J_i^{bias}, \quad (3)$$

where $\alpha_i$ is a gain or conversion factor, and $J_i^{bias}$ is a bias current that accounts for background activity. For a 1D space, $\tilde{\phi}_i^x$ is either $+1$ or $-1$ (drawn randomly), for ON and OFF neurons, respectively. The resulting tuning curves are illustrated in Figure 1, left.

The linear decoding process is characterized by the synapses' spike response, $h(t)$ (i.e., post-synaptic currents), and the decoding weights, $\phi_i^x$, which are obtained by minimizing the mean square error. A single noise term, $\eta$, takes into account all sources of noise, which have the effect of introducing uncertainty into the decoding process. Hence, the transmitted firing rate can be written as $a_i(x(t)) + \eta_i$, where $a_i(x(t))$ represents the noiseless set of tuning curves and $\eta_i$ is a random variable picked from a zero-mean Gaussian distribution with variance $\sigma^2$. Consequently, the mean square error can be written as [11]:

$$E = \frac{1}{2}\left\langle \left[x(t) - \hat{x}(t)\right]^2 \right\rangle_{x,\eta,t} = \frac{1}{2}\left\langle \left[ x(t) - \sum_i \left(a_i(x(t)) + \eta_i\right)\phi_i^x \right]^2 \right\rangle_{x,\eta,t} \quad (4)$$

where $\langle\cdot\rangle_{x,\eta}$ denotes integration over the range of $x$ and $\eta$, the expected noise. We assume that the noise is independent and has the same variance for each neuron [11], which yields:

$$E = \frac{1}{2}\left\langle \left[ x(t) - \sum_i a_i(x(t))\phi_i^x \right]^2 \right\rangle_{x,t} + \frac{1}{2}\sigma^2 \sum_i (\phi_i^x)^2, \quad (5)$$

where $\sigma^2$ is the noise variance $\langle\eta_i\eta_j\rangle$. This expression is minimized by:

$$\phi_i^x = \sum_j^N \Gamma_{ij}^{-1} \Upsilon_j, \quad (6)$$

with $\Gamma_{ij} = \langle a_i(x)a_j(x)\rangle_x + \sigma^2\delta_{ij}$, where $\delta$ is the Kronecker delta function matrix, and $\Upsilon_j = \langle x\, a_j(x)\rangle_x$ [11]. One consequence of modeling noise in the neural representation is that the matrix $\Gamma$ is invertible despite the use of a highly overcomplete representation.
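The least-squares solution of Equations 4–6 is straightforward to reproduce numerically. The sketch below builds LIF tuning curves on a 1D stimulus grid, forms $\Gamma$ and $\Upsilon$, and solves for the decoding weights. Parameter ranges (max rates, intercepts, time constants) follow the paper's Table 1, but the grid size, seed, and the exact noise scaling are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                 # neurons in the population
xs = np.linspace(-1.0, 1.0, 201)       # 1D stimulus grid (our discretization)

# LIF rate function G(J) of Equation 2 (with R = 1, so J_th = V_th)
tau_ref, tau_rc, J_th = 1e-3, 20e-3, 1.0
def lif_rate(J):
    J = np.asarray(J, dtype=float)
    out = np.zeros_like(J)
    m = J > J_th                       # zero rate below threshold current
    out[m] = 1.0 / (tau_ref - tau_rc * np.log1p(-J_th / J[m]))
    return out

# ON/OFF encoders, max rates in [200, 400] Hz, intercepts inside (-1, 1)
enc = rng.choice([-1.0, 1.0], size=N)
max_rate = rng.uniform(200.0, 400.0, size=N)
intercept = rng.uniform(-0.95, 0.95, size=N)

# invert Equations 2-3 for the gain alpha and bias current J_bias
J_max = J_th / (1.0 - np.exp((tau_ref - 1.0 / max_rate) / tau_rc))
alpha = (J_max - J_th) / (1.0 - intercept)
J_bias = J_th - alpha * intercept

A = lif_rate(xs[:, None] * enc[None, :] * alpha[None, :] + J_bias[None, :])

# decoding weights phi = Gamma^{-1} Upsilon (Equations 4-6)
sigma = 0.1 * A.max()                  # noise std: 10% of max rate (our choice)
Gamma = A.T @ A / len(xs) + sigma**2 * np.eye(N)
Upsilon = A.T @ xs / len(xs)
phi = np.linalg.solve(Gamma, Upsilon)

x_hat = A @ phi                        # decoded estimate of the stimulus
rmse = np.sqrt(np.mean((x_hat - xs) ** 2))
```

The noise term $\sigma^2 I$ is what keeps $\Gamma$ well-conditioned despite the overcomplete representation; dropping it makes the solve numerically fragile.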
In a noiseless representation, $\Gamma$ is generally singular because, due to the large number of neurons, there is a high probability of having two neurons with similar tuning curves, leading to two similar rows in $\Gamma$.

3 Kalman-filter based cortical decoder

In the 1960's, Kalman described a method that uses linear filtering to track the state of a dynamical system throughout time using a model of the dynamics of the system as well as noisy measurements [15]. The model dynamics gives an estimate of the state of the system at the next time step. This estimate is then corrected using the observations (i.e., measurements) at this time step. The relative weights for these two pieces of information are given by the Kalman gain, $K$ [15, 16]. Whereas the Kalman gain is updated at each iteration, the state and observation matrices (defined below), and the corresponding noise matrices, are assumed constant.

In the case of prosthetic applications, the system's state vector is the cursor's kinematics, $x_t = [vel_t^x, vel_t^y, 1]$, where the constant 1 allows for a fixed offset compensation. The measurement vector, $y_t$, is the neural spike rate (spike counts in each time step) of 192 channels of neural threshold crossings. The system's dynamics is modeled by:

$$x_t = A x_{t-1} + w_t, \quad (7)$$
$$y_t = C x_t + q_t, \quad (8)$$

where $A$ is the state matrix, $C$ is the observation matrix, and $w_t$ and $q_t$ are additive Gaussian noise sources with $w_t \sim \mathcal{N}(0, W)$ and $q_t \sim \mathcal{N}(0, Q)$. The model parameters ($A$, $C$, $W$ and $Q$) are fit with training data by correlating the observed hand kinematics with the simultaneously measured neural signals (Figure 2). For an efficient decoding, we derived the steady-state update equation by replacing the adaptive Kalman gain by its steady-state formulation: $K = (I + WC^TQ^{-1}C)^{-1}WC^TQ^{-1}$.
This yields the following estimate of the system's state:

$$x_t = (I - KC)A x_{t-1} + K y_t = M_x^{DT} x_{t-1} + M_y^{DT} y_t, \quad (9)$$

where $M_x^{DT} = (I - KC)A$ and $M_y^{DT} = K$ are the discrete-time (DT) Kalman matrices.

Figure 2: Neural and kinematic measurements (monkey J, 2011-04-16, 16 continuous trials) used to fit the standard Kalman filter model. a. The 192 cortical recordings fed as input to fit the Kalman filter's matrices (color code refers to the number of threshold crossings observed in each 50ms bin). b. Hand x- and y-velocity measurements correlated with the neural data to obtain the Kalman filter's matrices. c. Cursor kinematics of 16 continuous trials under direct hand control.

The steady-state formulation improves efficiency with little loss in accuracy because the optimal Kalman gain rapidly converges (typically less than 100 iterations). Indeed, in neural applications under both open-loop and closed-loop conditions, the difference between the full Kalman filter and its steady-state implementation falls to within 1% in a few seconds [17]. This simplifying assumption reduces the execution time for decoding a typical neuronal firing rate signal approximately seven-fold [17], a critical speed-up for real-time applications.

4 Kalman filter with a spiking neural network

To implement the Kalman filter with a SNN by applying the NEF, we first convert Equation 9 from DT to continuous time (CT), and then replace the CT matrices with neurally plausible ones, which yields:

$$x(t) = h(t) * \left[ A'x(t) + B'y(t) \right], \quad (10)$$

where $A' = \tau M_x^{CT} + I$ and $B' = \tau M_y^{CT}$, with $M_x^{CT} = (M_x^{DT} - I)/\Delta t$ and $M_y^{CT} = M_y^{DT}/\Delta t$ the CT Kalman matrices, and $\Delta t = 50$ms the discrete time step; $\tau$ is the synaptic time constant.
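The steady-state gain and the DT-to-CT conversion of Equations 9–10 can be checked numerically. The sketch below iterates the standard Kalman (Riccati) recursion until the gain converges, then forms the discrete-time matrices and their neurally plausible counterparts. The toy system matrices are illustrative stand-ins, not the decoder fitted in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 12                          # state [vel_x, vel_y, 1]; 12 toy channels
A = np.diag([0.9, 0.9, 1.0])          # illustrative state matrix
C = rng.standard_normal((m, n))       # illustrative observation matrix
W = 0.01 * np.eye(n)                  # process noise covariance
Q = np.eye(m)                         # observation noise covariance

# iterate the Kalman (Riccati) recursion; the gain K converges rapidly,
# which is what justifies the steady-state filter for real-time decoding
P = W.copy()
K = np.zeros((n, m))
for it in range(1000):
    P_pred = A @ P @ A.T + W
    K_new = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + Q)
    P = (np.eye(n) - K_new @ C) @ P_pred
    if np.linalg.norm(K_new - K) < 1e-12:
        K = K_new
        break
    K = K_new

# discrete-time steady-state update matrices (Eq. 9) ...
M_x_dt = (np.eye(n) - K @ C) @ A      # M_x^DT
M_y_dt = K                            # M_y^DT

# ... and their CT / neurally plausible counterparts (Eq. 10)
dt, tau = 0.05, 0.02                  # 50 ms step, 20 ms synaptic time constant
A_p = tau * (M_x_dt - np.eye(n)) / dt + np.eye(n)   # A' = tau*M_x^CT + I
B_p = tau * M_y_dt / dt                             # B' = tau*M_y^CT
```

The closed-loop matrix $M_x^{DT}$ has spectral radius below one once the gain has converged, so the steady-state filter is stable even though the state matrix carries the constant-offset eigenvalue of 1.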
The $j$th neuron's input current (see Equation 3) is computed from the system's current state, $x(t)$, which is computed from estimates of the system's previous state ($\hat{x}(t) = \sum_i a_i(t)\phi_i^x$) and current input ($\hat{y}(t) = \sum_k b_k(t)\phi_k^y$) using Equation 10. This yields:

$$J_j(x(t)) = \alpha_j \tilde{\phi}_j^x \cdot x(t) + J_j^{bias} = \alpha_j \tilde{\phi}_j^x \cdot \left( h(t) * \left[ A'\hat{x}(t) + B'\hat{y}(t) \right] \right) + J_j^{bias} = \alpha_j \tilde{\phi}_j^x \cdot \left( h(t) * \left[ A' \sum_i a_i(t)\phi_i^x + B' \sum_k b_k(t)\phi_k^y \right] \right) + J_j^{bias} \quad (11)$$

This last equation can be written in a neural network form:

$$J_j(x(t)) = h(t) * \left[ \sum_i \omega_{ji} a_i(t) + \sum_k \omega_{jk} b_k(t) \right] + J_j^{bias} \quad (12)$$

where $\omega_{ji} = \alpha_j \tilde{\phi}_j^x A' \phi_i^x$ and $\omega_{jk} = \alpha_j \tilde{\phi}_j^x B' \phi_k^y$ are the recurrent and feedforward weights, respectively.

5 Efficient implementation of the SNN

In this section, we describe the two distinct steps carried out when implementing the SNN: creating and running the network. The first step has no computational constraints whereas the second must be very efficient in order to be successfully deployed in the closed-loop experimental setting.

Figure 3: Computing a 1000-neuron pool's recurrent connections. a. Using connection weights requires multiplying a 1000 × 1000 matrix by a 1000 × 1 vector. b. Operating in the lower-dimensional state space requires multiplying a 1 × 1000 vector by a 1000 × 1 vector to get the decoded state, multiplying this state by a component of the A matrix to update it, and multiplying the updated state by a 1000 × 1 vector to re-encode it as firing rates, which are then used to update the soma current for every neuron.

Network creation: This step generates, for a specified number of neurons composing the network, the gain $\alpha_j$, bias current $J_j^{bias}$, preferred direction $\tilde{\phi}_j^x$, and decoding weight $\phi_j^x$ for each neuron. The preferred directions $\tilde{\phi}_j^x$ are drawn randomly from a uniform distribution over the unit sphere.
The maximum firing rate, $\max G(J_j(x))$, and the normalized x-axis intercept, where $G(J_j(x)) = 0$, are drawn randomly from a uniform distribution on [200, 400] Hz and [−1, 1], respectively. From these two specifications, $\alpha_j$ and $J_j^{bias}$ are computed using Equation 2 and Equation 3. The decoding weights $\phi_j^x$ are computed by minimizing the mean square error (Equation 6).

For efficient implementation, we used two 1D integrators (i.e., two recurrent neuron pools, with each pool representing a scalar) rather than a single 3D integrator (i.e., one recurrent neuron pool, with the pool representing a 3D vector by itself) [13]. The constant 1 is fed to the 1D integrators as an input, rather than continuously integrated as part of the state vector. We also replaced the $b_k(t)$ units' spike rates (Figure 1, middle) with the 192 neural measurements (spike counts in 50ms bins), which is equivalent to choosing $\phi_k^y$ from a standard basis (i.e., a unit vector with 1 at the $k$th position and 0 everywhere else) [7].

Network simulation: This step runs the simulation to update the soma current for every neuron, based on input spikes. The soma voltage is then updated following RC circuit dynamics. Gaussian noise is normally added at this step, the rest of the simulation being noiseless. Neurons with soma voltage above threshold generate a spike and enter their refractory period. The neuron firing rates are decoded using the linear decoding weights to get the updated state values, x- and y-velocity. These values are smoothed with a filter identical to $h(t)$, but with $\tau$ set to 5ms instead of 20ms to avoid introducing significant delay. Then the simulation step starts over again.

In order to ensure rapid execution of the simulation step, neuron interactions are not updated directly using the connection matrix (Equation 12), but rather indirectly with the decoding matrix $\phi_j^x$, dynamics matrix $A'$, and preferred direction matrix $\tilde{\phi}_j^x$ (Equation 11).
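The per-step soma update described above (RC dynamics, threshold, spike, reset, refractory period) can be sketched for a single neuron driven by a constant current; its empirical rate should then match the analytic $G(J)$ of Equation 2. The Euler step size and simulation length here are our choices, not the paper's.

```python
import numpy as np

def lif_simulate(J, T=2.0, dt=1e-4, tau_rc=0.02, tau_ref=0.001, v_th=1.0):
    """Euler-integrate one LIF neuron with constant input current J.

    The soma voltage follows RC circuit dynamics; crossing threshold emits
    a spike, resets the voltage, and starts an absolute refractory period,
    mirroring the per-neuron update done at every simulation step.
    """
    v, refr, n_spikes = 0.0, 0.0, 0
    for _ in range(int(T / dt)):
        if refr > 0:                   # still refractory: skip integration
            refr -= dt
            continue
        v += dt / tau_rc * (J - v)     # RC circuit dynamics (R = 1)
        if v >= v_th:
            n_spikes += 1
            v = 0.0                    # reset
            refr = tau_ref             # absolute refractory period
    return n_spikes / T                # empirical firing rate (Hz)

# compare against the analytic rate G(J) of Equation 2
J = 2.0
analytic = 1.0 / (0.001 - 0.02 * np.log(1.0 - 1.0 / J))
simulated = lif_simulate(J)
```

With these time constants the analytic rate is roughly 67 Hz, and the discretized simulation lands within a few percent of it; a current below threshold ($J < V_{th}$) never spikes.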
To see why this is more efficient, suppose we have 1000 neurons in the $a$ population for each of the state vector's two scalars. Computing the recurrent connections using connection weights requires multiplying a 1000 × 1000 matrix by a 1000-dimensional vector (Figure 3a). This requires $10^6$ multiplications and about $10^6$ sums. Decoding each scalar (i.e., $\sum_i a_i(t)\phi_i^x$), however, requires only 1000 multiplications and 1000 sums. The decoded state vector is then updated by multiplying it by the (diagonal) A matrix, another 2 products and 1 sum. The updated state vector is then encoded by multiplying it with the neurons' preferred direction vectors, another 1000 multiplications per scalar (Figure 3b). The resulting total of about 3000 operations is nearly three orders of magnitude fewer than using the connection weights to compute the identical transformation.

To measure the speedup, we simulated a 2,000-neuron network on a computer running Matlab 2011a (Intel Core i7, 2.7 GHz, Mac OS X Lion). Although the exact run-times depend on the computing hardware and software, the run-time reduction factor should remain approximately constant across platforms. For each reported result, we ran the simulation 10 times to obtain a reliable estimate of the execution time. The run-time for neuron interactions using the recurrent connection weights was 9.9ms and dropped to 2.7µs in the lower-dimensional space, approximately a 3,500-fold speedup. Only the recurrent interactions benefit from the speedup, the execution time for the rest of the operations remaining constant.
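The equivalence underlying the factored computation is a one-line linear-algebra fact: because $\omega_{ji} = \alpha_j\tilde{\phi}_j^x A'\phi_i^x$ is a rank-1 (per scalar) weight matrix, applying it to a rate vector equals re-encoding the updated decoded state. A minimal numerical check, with random weights and a scalar state (all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
phi = rng.standard_normal(N) / N          # decoding weights for one scalar
enc = rng.standard_normal(N)              # gain * preferred direction per neuron
A_p = 0.92                                # 1x1 dynamics term for this scalar
a = rng.random(N) * 100.0                 # current firing rates (Hz)

# route 1: explicit connection weights -- an N x N matrix-vector product
omega = np.outer(enc * A_p, phi)          # omega[j, i] = enc_j * A' * phi_i
J_direct = omega @ a                      # ~2 * N^2 operations

# route 2: decode, update the 1-D state, re-encode -- ~4 * N operations
x_hat = phi @ a                           # decode: sum_i a_i * phi_i
J_factored = enc * (A_p * x_hat)          # update state, then re-encode
```

Both routes produce the same soma currents, but the factored route replaces the $10^6$-operation matrix-vector product with a few thousand operations, which is the source of the measured speedup.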
The run-time for a 50ms network simulation using the recurrent connection weights was 0.94s and dropped to 0.0198s in the lower-dimensional space, a 47-fold speedup. These results demonstrate the efficiency the lower-dimensional space offers, which made the closed-loop application of SNNs possible.

Table 1: Model parameters

  Symbol            Range                 Description
  max G(J_j(x))     200–400 Hz            Maximum firing rate
  G(J_j(x)) = 0     −1 to 1               Normalized x-axis intercept
  J_j^bias          Satisfies first two   Bias current
  α_j               Satisfies first two   Gain factor
  φ̃_j^x             ||φ̃_j^x|| = 1         Preferred-direction vector
  σ²                0.1                   Gaussian noise variance
  τ_j^RC            20 ms                 RC time constant
  τ_j^ref           1 ms                  Refractory period
  τ_j^PSC           20 ms                 PSC time constant

6 Closed-loop implementation

An adult male rhesus macaque (monkey J) was trained to perform a center-out-and-back reaching task for juice rewards to one of eight targets, with a 500ms hold time (Figure 4a) [1]. All animal protocols and procedures were approved by the Stanford Institutional Animal Care and Use Committee. Hand position was measured using a Polaris optical tracking system at 60Hz (Northern Digital Inc.). Neural data were recorded from two 96-electrode silicon arrays (Blackrock Microsystems) implanted in the dorsal pre-motor and motor cortex. These recordings (−4.5× RMS threshold crossings applied to each electrode's signal) yielded tuned activity for the direction and speed of arm movements. As detailed in [1], a standard Kalman filter model was fit by correlating the observed hand kinematics with the simultaneously measured neural signals, while the monkey moved his arm to acquire virtual targets (Figure 2). The resulting model was used in a closed-loop system to control an on-screen cursor in real-time (Figure 4a, Decoder block). A steady-state version of this model serves as the standard against which the SNN implementation's performance is compared. We built a SNN using the NEF methodology based on derived Kalman filter parameters mentioned above.
This SNN was then simulated on an xPC Target (Mathworks) x86 system (Dell T3400, Intel Core 2 Duo E8600, 3.33GHz). It ran in closed-loop, replacing the standard Kalman filter as the decoder block in Figure 4a. The parameter values listed in Table 1 were used for the SNN implementation. We ensured that the time constants $\tau_i^{RC}$, $\tau_i^{ref}$, and $\tau_i^{PSC}$ were smaller than the implementation's time step (50ms). Noise was not explicitly added. It arose naturally from the fluctuations produced by representing a scalar with filtered spike trains, which has been shown to have effects similar to Gaussian noise [11]. For the purpose of computing the linear decoding weights (i.e., $\Gamma$), we modeled the resulting noise as Gaussian with a variance of 0.1.

A 2,000-neuron version of the SNN-based decoder was tested in a closed-loop system, the largest network our embedded Matlab implementation could run in real-time. There were 1206 trials total, among which 301 (center-outs only) were performed with the SNN and 302 with the standard (steady-state) Kalman filter. The block structure was randomized and interleaved, so that there is no behavioral bias present in the findings. 100 trials under hand control are used as a baseline comparison. Success corresponds to a target acquisition under 1500ms, with a 500ms hold time. Success rates were higher than 99% on all blocks for the SNN implementation and 100% for the standard Kalman filter. The average time to acquire the target was slightly slower for the SNN (Figure 5b), 711ms vs. 661ms respectively; we believe this could be improved by using more neurons in the SNN.¹ The average distance to target (Figure 5a) and the average velocity of the cursor (Figure 5c) are very similar.

¹ Off-line, the SNN performed better as we increased the number of neurons [7].
Figure 4: Experimental setup and results. a. Data are recorded from two 96-channel silicon electrode arrays implanted in dorsal pre-motor and motor cortex of an adult male monkey performing a center-out-and-back reach task for juice rewards to one of eight targets with a 500ms hold time. b. BMI position kinematics of 16 continuous trials for the standard Kalman filter implementation. c. BMI position kinematics of 16 continuous trials for the SNN implementation.

Figure 5: SNN (red) performance compared to standard Kalman filter (blue) (hand trials are shown for reference (yellow)). The SNN achieves similar results—success rates are higher than 99% on all blocks—as the standard Kalman filter implementation. a. Plot of distance to target vs. time after target onset for different control modalities. The thicker traces represent the average time when the cursor first enters the acceptance window until successfully entering for the 500ms hold time. b. Histogram of target acquisition time. c. Plot of mean cursor velocity vs. time.

7 Conclusions and future work

The SNN's performance was quite comparable to that produced by a standard Kalman filter implementation. The 2,000-neuron network had success rates higher than 99% on all blocks, with mean distance to target, target acquisition time, and mean cursor velocity curves very similar to the ones obtained with the standard implementation. Future work will explore whether these results extend to additional animals.
As the Kalman filter and its variants are the state-of-the-art in cortically-controlled motor prostheses [1]-[5], these simulations provide confidence that similar levels of performance can be attained with a neuromorphic system, which can potentially overcome the power constraints set by clinical applications. Our ultimate goal is to develop an ultra-low-power neuromorphic chip for prosthetic applications on to which control theory algorithms can be mapped using the NEF. As our next step in this direction, we will begin exploring this mapping with Neurogrid, a hardware platform with sixteen programmable neuromorphic chips that can simulate up to a million spiking neurons in real-time [9]. However, bandwidth limitations prevent Neurogrid from realizing random connectivity patterns. It can only connect each neuron to thousands of others if neighboring neurons share common inputs — just as they do in the cortex. Such columnar organization may be possible with NEF-generated networks if preferred-direction vectors are assigned topographically rather than randomly. Implementing this constraint effectively is a subject of ongoing research.

Acknowledgment

This work was supported in part by the Belgian American Education Foundation (J. Dethier), Stanford NIH Medical Scientist Training Program (MSTP) and Soros Fellowship (P. Nuyujukian), DARPA Revolutionizing Prosthetics program (N66001-06-C-8005, K. V. Shenoy), and two NIH Director's Pioneer Awards (DP1-OD006409, K. V. Shenoy; DPI-OD000965, K. Boahen).

References

[1] V. Gilja, Towards clinically viable neural prosthetic systems, Ph.D. Thesis, Department of Computer Science, Stanford University, 2010, pp 19–22 and pp 57–73.
[2] V. Gilja, P. Nuyujukian, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, A high-performance continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010.
[3] P. Nuyujukian, V. Gilja, C.A. Chestek, J.P. Cunningham, J.M. Fan, B.M. Yu, S.I. Ryu, and K.V. Shenoy, Generalization and robustness of a continuous cortically-controlled prosthesis enabled by feedback control design, 2010 Neuroscience Meeting Planner, San Diego, CA: Society for Neuroscience, 2010.
[4] V. Gilja, C.A. Chestek, I. Diester, J.M. Henderson, K. Deisseroth, and K.V. Shenoy, Challenges and opportunities for next-generation intra-cortically based neural prostheses, IEEE Transactions on Biomedical Engineering, 2011, in press.
[5] S.P. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia, Journal of Neural Engineering, vol. 5, 2008, pp 455–476.
[6] S. Kim, P. Tathireddy, R.A. Normann, and F. Solzbacher, Thermal impact of an active 3-D microelectrode array implanted in the brain, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 15, 2007, pp 493–501.
[7] J. Dethier, V. Gilja, P. Nuyujukian, S.A. Elassaad, K.V. Shenoy, and K. Boahen, Spiking neural network decoder for brain-machine interfaces, IEEE Engineering in Medicine & Biology Society Conference on Neural Engineering, Cancun, Mexico, 2011, pp 396–399.
[8] K. Boahen, Neuromorphic microchips, Scientific American, vol. 292(5), 2005, pp 56–63.
[9] R. Silver, K. Boahen, S. Grillner, N. Kopell, and K.L. Olsen, Neurotech for neuroscience: unifying concepts, organizing principles, and emerging tools, Journal of Neuroscience, vol. 27(44), 2007, pp 11807–11819.
[10] J.V. Arthur and K. Boahen, Silicon neuron design: the dynamical systems approach, IEEE Transactions on Circuits and Systems, vol. 58(5), 2011, pp 1034–1043.
[11] C. Eliasmith and C.H. Anderson, Neural engineering: computation, representation, and dynamics in neurobiological systems, MIT Press, Cambridge, MA, 2003.
[12] C. Eliasmith, A unified approach to building and controlling spiking attractor networks, Neural Computation, vol. 17, 2005, pp 1276–1314.
[13] R. Singh and C. Eliasmith, Higher-dimensional neurons explain the tuning and dynamics of working memory cells, The Journal of Neuroscience, vol. 26(14), 2006, pp 3667–3678.
[14] C. Eliasmith, How to build a brain: from function to implementation, Synthese, vol. 159(3), 2007, pp 373–388.
[15] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME–Journal of Basic Engineering, vol. 82(Series D), 1960, pp 35–45.
[16] G. Welch and G. Bishop, An introduction to the Kalman filter, University of North Carolina at Chapel Hill, Chapel Hill, NC, vol. 95(TR 95-041), 1995, pp 1–16.
[17] W.Q. Malik, W. Truccolo, E.N. Brown, and L.R. Hochberg, Efficient decoding with steady-state Kalman filter in neural interface systems, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 19(1), 2011, pp 25–34.

2 0.78210187 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

Author: Yichuan Zhang, Charles A. Sutton

Abstract: The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasi-Newton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited-memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real-world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives. 1
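The limited-memory idea in the abstract — adapt the proposal using only a fixed window of recent history — can be illustrated with a much simpler sketch than the paper's actual quasi-Newton sampler. The toy below adapts a random-walk Metropolis proposal covariance from the last `window` samples; the function name, regularization `eps`, and the window-based covariance estimate are all illustrative assumptions, and strictly valid adaptive MCMC needs additional care that this sketch omits.

```python
import numpy as np

def window_adapted_mh(logpdf, x0, n_samples, window=50, eps=0.1, seed=0):
    """Random-walk Metropolis whose proposal covariance is estimated
    from a fixed window of recent samples -- a toy stand-in for the
    limited-memory idea: adaptation depends only on the last `window`
    states, never on the full chain history."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = len(x)
    lp = logpdf(x)
    samples = [x.copy()]
    for _ in range(n_samples):
        recent = np.array(samples[-window:])
        # Regularized empirical covariance of the recent window;
        # fall back to the identity until enough samples exist.
        cov = np.cov(recent.T) if len(recent) > d else np.eye(d)
        cov = np.atleast_2d(cov) + eps * np.eye(d)
        prop = rng.multivariate_normal(x, cov)
        lp_prop = logpdf(prop)
        # Standard Metropolis accept/reject step.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x.copy())
    return np.array(samples)
```

For example, targeting a standard 2-D Gaussian via `logpdf = lambda x: -0.5 * np.sum(x**2)` yields a chain that drifts from the initial point toward the mode while the proposal scale adapts to the target's spread.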

same-paper 3 0.7578541 221 nips-2011-Priors over Recurrent Continuous Time Processes

Author: Ardavan Saeedi, Alexandre Bouchard-côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability, a closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried out efficiently using Particle MCMC methods [1]. This yields an MCMC algorithm that can resample entire sequences atomically while avoiding the complications of the slice and stick auxiliary variables introduced by the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1
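The conjugacy and closed-form waiting-time properties the abstract advertises rest on a standard building block: an Exponential waiting time whose rate carries a Gamma prior has a Gamma posterior in closed form. The sketch below shows only that textbook Gamma-Exponential update, not the full GEP/HGEP construction; the function names are illustrative.

```python
import random

def gamma_posterior(shape, rate, waiting_times):
    """Conjugate update for an Exponential rate with a Gamma prior:
    a Gamma(a, b) prior combined with n i.i.d. Exponential waits
    t_1..t_n gives a Gamma(a + n, b + sum t_i) posterior -- the
    closed-form update underlying the conjugacy property."""
    return shape + len(waiting_times), rate + sum(waiting_times)

def sample_waiting_times(shape, rate, n, rng):
    """Generative direction: draw a rate from the Gamma prior, then
    draw n Exponential waiting times at that rate."""
    lam = rng.gammavariate(shape, 1.0 / rate)  # scale = 1 / rate
    return lam, [rng.expovariate(lam) for _ in range(n)]
```

For instance, starting from a Gamma(2, 1) prior and observing waits of 0.5 and 1.5 time units gives a Gamma(4, 3) posterior over the rate.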

4 0.57022876 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1
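One of the two nonstandard interpretations the abstract names, automatic differentiation, can be sketched in a few lines with dual numbers and operator overloading. This is a generic forward-mode AD toy, not the paper's implementation; only `+` and `*` are overloaded here for brevity.

```python
class Dual:
    """Minimal forward-mode AD via operator overloading: each value
    carries (primal, tangent), so the derivative is produced as a
    side computation while the program itself runs."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried alongside the primal computation.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Run f under the nonstandard (dual-number) interpretation."""
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + 3*x) at x = 2 is 2*2 + 3 = 7
```

The point mirrors the abstract: the program `f` is written as ordinary code, and the gradient information appears because the same syntax is executed under a different interpretation of the arithmetic operators.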

5 0.56051749 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Author: Armen Allahverdyan, Aram Galstyan

Abstract: We present an asymptotic analysis of Viterbi Training (VT) and contrast it with the more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While the ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of an HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. The VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. In more general scenarios VT can be worse than ML, but is still capable of correctly recovering most of the parameters. 1
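The contrast between the two objectives in the abstract is concrete: ML scores the total likelihood summed over all hidden paths (the forward algorithm), while VT scores only the single best path (Viterbi). A minimal sketch of both recursions, assuming a small discrete HMM given as NumPy arrays (`pi` initial distribution, `A` transitions, `B` emission probabilities):

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """ML objective: log P(obs) summed over all hidden state paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # sum over predecessor states
    return np.log(alpha.sum())

def viterbi_logprob(pi, A, B, obs):
    """VT objective: log-probability of the single best hidden path."""
    delta = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        # max over predecessor states instead of sum
        delta = (delta[:, None] + np.log(A)).max(axis=0) + np.log(B[:, o])
    return delta.max()
```

The only structural difference is `sum` versus `max` in the recursion, and the best-path probability can never exceed the summed likelihood, which is why the two training objectives can pull parameter estimates in different directions.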

6 0.5537138 66 nips-2011-Crowdclustering

7 0.55349827 229 nips-2011-Query-Aware MCMC

8 0.55080044 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

9 0.54964876 301 nips-2011-Variational Gaussian Process Dynamical Systems

10 0.5490278 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors

11 0.54897082 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

12 0.548913 68 nips-2011-Demixed Principal Component Analysis

13 0.548352 158 nips-2011-Learning unbelievable probabilities

14 0.54788518 258 nips-2011-Sparse Bayesian Multi-Task Learning

15 0.54778308 285 nips-2011-The Kernel Beta Process

16 0.5475989 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

17 0.5470801 241 nips-2011-Scalable Training of Mixture Models via Coresets

18 0.5470677 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

19 0.54655492 197 nips-2011-On Tracking The Partition Function

20 0.54578143 156 nips-2011-Learning to Learn with Compound HD Models