Title: Stable adaptive control with online learning (nips2004-175)
Source: pdf
Author: H. J. Kim, Andrew Y. Ng
Abstract: Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to the changing circumstances. However, for safety-critical applications such as airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety, such as “stability,” guarantees. Rather than trying to show difficult, a priori, stability guarantees for specific learning methods, in this paper we propose a method for “monitoring” the controllers suggested by the learning algorithm online, and rejecting controllers leading to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable. 1
1 Introduction

Online learning algorithms provide a powerful set of tools for automatically fine-tuning a controller to optimize performance while in operation, or for automatically adapting to the changing dynamics of a control problem.
But despite these methods being powerfully applied to online learning for control, for them to be widely adopted for applications such as airplane flight, it is critical that they come with safety guarantees, specifically stability guarantees.
In our interactions with industry, we also found stability to be a frequently raised concern for online learning. We believe that the lack of safety guarantees represents a significant barrier to the wider adoption of many powerful learning algorithms for online adaptation and control.
It is also typically infeasible to replace formal stability guarantees with only empirical testing: for example, to convincingly demonstrate that we can safely fly a fleet of 100 aircraft for 10000 hours each would require 10^6 hours of flight-tests.
The control literature contains many examples of ingenious stability proofs for various online learning schemes. However, most of this work addresses only very specific online learning methods, and usually quite simple ones (such as ones that switch between only a finite number of parameter values using a specific, simple decision rule).
In this paper, rather than trying to show difficult a priori stability guarantees for specific algorithms, we propose a method for “monitoring” an arbitrary learning algorithm being used to control a linear dynamical system. By rejecting control values online that appear to be leading to instability, our algorithm ensures that the resulting controlled system is stable.
2 Preliminaries

Following most work in control [6], we will consider control of a linear dynamical system.
Let xt ∈ R^nx be the nx-dimensional state at time t. At each time t, we select a control action ut ∈ R^nu, as a result of which the state transitions to

    xt+1 = A xt + B ut + wt.    (1)

Here, A ∈ R^(nx×nx) and B ∈ R^(nx×nu) govern the dynamics of the system, and wt is a disturbance term.
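As a concrete illustration of this setup, here is a minimal simulation sketch of Equation (1) under a stationary linear feedback; the particular A, B, K, and noise scale are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of Equation (1): x_{t+1} = A x_t + B u_t + w_t, with u_t = K x_t.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])             # e.g., a discretized double integrator
B = np.array([[0.0],
              [0.1]])
K = np.array([[-1.0, -1.5]])           # a stationary gain (assumed, for illustration)

x = np.zeros(2)
for t in range(100):
    u = K @ x                          # control action
    w = 0.01 * rng.standard_normal(2)  # bounded-ish disturbance w_t
    x = A @ x + B @ u + w              # state transition, Equation (1)
print("final state:", x)
```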
We will not make any distributional assumptions about the source of the disturbances wt for now (indeed, we will consider a setting where an adversary chooses them from some bounded set).
For many applications, the controls are chosen as a linear function of the state: ut = Kt xt.
If the goal is to minimize the expected value of a quadratic cost function over the states and actions, J = (1/T) Σ_{t=1}^T (xtᵀ Q xt + utᵀ R ut), and the wt are Gaussian, then we are in the LQR (linear quadratic regulation) control setting [1]. We consider a setting in which an online learning algorithm (also called an adaptive control algorithm) is used to design a controller.
Thus, on each time step t, an online algorithm may (based on the observed states and action sequence so far) propose some new gain matrix Kt.
More formally, an online learning algorithm is a function f : ∪_{t=1}^∞ (R^nx × R^nu)^t → R^(nu×nx), mapping finite sequences of states and actions (x0, u0, . . . , xt−1, ut−1) to a new gain matrix.
2.1 Stability

In classical control theory [6], probably the most important desideratum of a controlled system is that it must be stable.
Given a fixed adaptive control algorithm f and a fixed sequence of disturbance terms w0, w1, . . . , the sequence of states xt visited is exactly determined by the equations

    Kt = f(x0, u0, . . . , xt),    xt+1 = A xt + B Kt xt + wt.    (3)

Thus, for fixed f, we can think of the (controlled) dynamical system as a mapping from the sequence of disturbance terms wt to the sequence of states xt.
A system controlled by f is bounded-input bounded-output (BIBO) stable if, given any constant c1 > 0, there exists some constant c2 > 0 so that for all sequences of disturbance terms satisfying ||wt||2 ≤ c1 (for all t = 1, 2, . . .), we have ||xt||2 ≤ c2 for all t. Thus, a system is BIBO stable if, under bounded disturbances to it (possibly chosen by an adversary), the state remains bounded and does not diverge.
Writing Dt = A + B Kt, note that the state transition dynamics of the system (the right half of Equation 3) may now be written xt+1 = Dt xt + wt.
Further, the dependence of xt on the wt's can be expressed as follows:

    xt = wt−1 + Dt−1 xt−1 = wt−1 + Dt−1 (wt−2 + Dt−2 xt−2) = · · ·    (4)
       = wt−1 + Dt−1 wt−2 + Dt−1 Dt−2 wt−3 + · · · + Dt−1 · · · D1 w0.    (5)

Since the number of terms in the sum above grows linearly with t, to ensure BIBO stability of a system—i.e., that xt remains bounded for all t—it is usually necessary for the terms in the sum to decay rapidly, so that the sum remains bounded.
More generally, the disturbance wt contributes a term Dt+k−1 · · · Dt+1 wt to the state xt+k, and we would like Dt+k−1 · · · Dt+1 wt to become small rapidly as k becomes large (or, in the control parlance, for the effects of the disturbance wt on xt+k to be attenuated quickly).
If Kt = K for all t, then we say that we are using a (nonadaptive) stationary controller K. In this case, it is a standard result that the system is BIBO stable if and only if |λmax(D)| < 1, where D = A + BK and λmax(D) denotes the eigenvalue of D of largest magnitude [6]. To informally see why, note that the effect of wt on xt+k can be written D^(k−1) wt (as in Equation 5). Moreover, |λmax(D)| < 1 implies D^(k−1) wt → 0 as k → ∞. Thus, the disturbance wt has a negligible influence on xt+k for large k [6]. It was easy to check for stability when Kt was stationary, because the mapping from the wt's to the xt's was linear.
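For a stationary controller, the stability test just described is a one-line spectral-radius computation. A sketch (the helper name is mine):

```python
import numpy as np

def is_stable_stationary(A, B, K):
    """Return True iff |lambda_max(A + B K)| < 1, the stationary-case test."""
    D = A + B @ K
    return bool(np.max(np.abs(np.linalg.eigvals(D))) < 1.0)

# With the illustrative A, B, K from the earlier sketch this returns True.
```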
But if the gains Kt are chosen by a learning algorithm, then each Kt depends on the previously observed states, and hence on the past disturbances (w0, . . . , wt−2); thus xt+1 = A xt + B Kt xt + wt will be a nonlinear function of the sequence of disturbances. This makes it significantly more difficult to check for BIBO stability of the system.
Indeed, consider the switching controllers of Equation (6), which alternate between two sets of gains whose closed-loop matrices Dt each satisfy λmax(Dt) < 1. Even though the wt's are bounded, one finds ||x2t+1||2 ≥ (100.81)^t ||w0||2 (Equations (8)–(9)), so the state diverges.
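Equation (6) is not reproduced in this extraction, so the matrices below are an assumption, chosen because they match the quoted (100.81)^t growth: each has |λmax| = 0.9 < 1, yet their product is expansive, and alternating between them drives the state to diverge.

```python
import numpy as np

D1 = np.array([[0.9, 10.0],
               [0.0,  0.9]])   # |lambda_max| = 0.9 < 1
D2 = np.array([[ 0.9, 0.0],
               [10.0, 0.9]])   # |lambda_max| = 0.9 < 1
print((D1 @ D2)[0, 0])         # 100.81: the two-step product is expansive

x = np.array([1.0, 0.0])       # take w_0 = e_1 and all later disturbances zero
for t in range(10):
    x = (D1 if t % 2 == 0 else D2) @ x   # switch the closed-loop matrix each step
print(np.linalg.norm(x))       # grows roughly by 100.81 every two steps
```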
3 Checking for stability

If f is a complex learning algorithm, it is typically very difficult to guarantee that the resulting system is BIBO stable.
Indeed, even if f switches between only two specific sets of gains K, and if w0 is the only non-zero disturbance term, it can still be undecidable to determine whether the state sequence remains bounded [3]. Rather than try to give a priori guarantees on f, we instead propose a method for ensuring BIBO stability of a system by “monitoring” the control gains proposed by f, and rejecting gains that appear to be leading to instability. We start computing controls according to a proposed set of gains K̂t only if it is accepted by the algorithm.
As the example above shows, the criterion for accepting or rejecting a set of gains K̂t cannot simply be to check whether λmax(A + B K̂t) = λmax(D̂t) < 1.
Thus, if we could ensure that σmax(Dt) ≤ 1 − ε for all t, we would find that the influence of w0 on xt has norm bounded by ||Dt−1 Dt−2 · · · D1 w0||2 ≤ (1 − ε)^(t−1) ||w0||2.
(Note that the mapping from the disturbances to the states is still nonlinear, because of the multiplicative term Kt xt in the dynamics (Equation 3).) Thus, the influence of wt on xt+k goes to 0 as k → ∞.
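This contraction argument is easy to sanity-check numerically: scale arbitrary matrices so that σmax = 1 − ε and verify that the norm of their product decays geometrically (a sketch; the random matrices are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
eps, T = 0.1, 50
P = np.eye(2)
for _ in range(T):
    D = rng.standard_normal((2, 2))
    D *= (1 - eps) / np.linalg.norm(D, 2)   # force sigma_max(D) = 1 - eps
    P = D @ P                               # product D_{T-1} ... D_0
# Submultiplicativity: sigma_max(P) <= (1 - eps)^T.
print(np.linalg.norm(P, 2), "<=", (1 - eps) ** T)
```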
However, requiring σmax(Dt) ≤ 1 − ε on every step is too strong a condition: specifically, there are many stable, stationary controllers that do not satisfy this.
Thus, it should be acceptable for us to use a controller with either of these Dt (so long as we do not switch between them on every step).
Instead, we will demand only that every product of N consecutive closed-loop matrices be a contraction:

    σmax(Dt+N−1 Dt+N−2 · · · Dt) ≤ 1 − ε  for all t.    (10)

This is motivated by the following observation, which shows that any stable, stationary controller meets this condition (for sufficiently large N). Our algorithm is then as follows.
Initialization: Assume we have some initial stable controller K0, so that |λmax(D0)| < 1, where D0 = A + BK0.
(a) Run the online learning algorithm f to compute the next set of proposed gains K̂t = f(x0, u0, . . . , xt).
(d) Let Dt = A + B Kt, and pick our action at time t to be ut = Kt xt.
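Steps (b) and (c) of the algorithm are missing from this extraction, so the accept/reject rule in the sketch below is a simplified stand-in rather than the paper's exact test: accept a proposed gain only if holding it for N steps would be a contraction (cf. condition (10)) and at least N steps have passed since the last accepted switch.

```python
import numpy as np

def make_monitor(A, B, K0, N, eps):
    """Simplified gain monitor (a stand-in for the paper's accept/reject test).
    Accept K_hat only if sigma_max((A + B K_hat)^N) <= 1 - eps and at least
    N steps have elapsed since the last accepted switch; otherwise keep the
    previous gains, as in step (d)."""
    state = {"K": K0, "last_switch": -N}

    def step(t, K_hat):
        D_hat = A + B @ K_hat
        contractive = np.linalg.norm(np.linalg.matrix_power(D_hat, N), 2) <= 1 - eps
        if contractive and t - state["last_switch"] >= N:
            if not np.array_equal(K_hat, state["K"]):
                state["last_switch"] = t      # record an actual switch
            state["K"] = K_hat                # accept the proposed gains
        return state["K"]                     # else: reject, keep old gains

    return step
```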
We begin by showing that, if we use this algorithm to “filter” the gains output by the online learning algorithm, Equation (10) holds.
Let K0, K1, . . . be the sequence of gains selected using the algorithm above. Thus, τ is the index of the time step at which we most recently accepted a set of gains from f (or 0 if no such gains exist). Then Kτ = Kτ+1 = · · · = Kt, since the gains stay the same on every time step on which we do not accept a new one. In case (i), τ = 0, and we did not accept any gains after time 0.
Theorem 3.4: Let an arbitrary learning algorithm f be given, and suppose we use f to control a system, using our algorithm to accept or reject the gains selected by f. Then the resulting system is BIBO stable.
Theorem 3.4 guarantees that, using our algorithm, we can safely apply any adaptive control algorithm f to our system. As discussed previously, it is difficult to exactly characterize the class of BIBO-stable controllers, and thus the set of controllers that we can safely accept. However, we can also show that certain large, “reasonable” classes of adaptive control methods will always have their proposed controllers accepted by our method. For example, it is a folk theorem in control that if we use only stable sets of gains (K : λmax(A + BK) < 1), and if we switch “sufficiently slowly” between them, then the system will be stable.
Theorem 3.5: Let any 0 < ε < 1 be fixed, and let K ⊆ R^(nu×nx) be a finite set of controller gains, so that for all K ∈ K, we have λmax(A + BK) < 1. Then there exist constants N0 and k so that for all N ≥ N0, if (i) our algorithm is run with parameters N, ε, and (ii) the adaptive control algorithm f picks only gains in K, and moreover switches gains no more than once every k steps (i.e., the proposed gains K̂t change at most once every k steps), then every set of gains proposed by f is accepted.
[Figure 1 plots: state (x1) and controller number versus t.]
Figure 1: (a) Typical state sequence (first component xt,1 of the state vector) using switching controllers from Equation (6). (b) Typical state sequence using our algorithm and the same controller f. (c) Index of the controller used over time, when using our algorithm.
A similar result also holds if K is infinite (but ∃c > 0, ∀K ∈ K, λmax(A + BK) ≤ 1 − c), and if the proposed gains change on every step but the differences ||K̂t − K̂t+1||F between successive values are small.
4 Experiments

In the first experiment, we apply the switching controller given in (6).
Figure 1a shows a typical state sequence resulting from using this controller without using our algorithm to monitor it (with wt's drawn IID from a standard Normal distribution). Even though λmax(Dt) < 1 for all t, the controlled system is unstable, and the state rapidly diverges.
(If we did not use our algorithm, so that the controller switched on every time step, this figure would alternate between 0 and 1 at every step.) We see that our algorithm is rejecting most of the proposed switches to the controller; specifically, it is permitting f to switch between the two controllers only every 140 steps or so.
By slowing down the rate at which we switch controllers, our algorithm causes the system to become stable (compare Theorem 3.5).
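A toy version of this experiment can be wired up from the sketches above (all numbers are illustrative); taking A = 0 and B = I makes each gain K its own closed-loop matrix, so we can reuse D1 and D2 from the divergence example as the two gains.

```python
import numpy as np

A, B = np.zeros((2, 2)), np.eye(2)
K1 = np.array([[0.9, 10.0], [0.0, 0.9]])   # A + B K1 = D1
K2 = np.array([[0.9, 0.0], [10.0, 0.9]])   # A + B K2 = D2

rng = np.random.default_rng(2)
monitor = make_monitor(A, B, K0=K1, N=70, eps=0.05)   # N = 70 found by trial
x = np.zeros(2)
for t in range(2000):
    K_hat = K1 if t % 2 == 0 else K2       # f proposes a switch on every step
    K = monitor(t, K_hat)                  # most proposals get rejected
    x = (A + B @ K) @ x + 0.1 * rng.standard_normal(2)
print("||x_T|| =", np.linalg.norm(x))      # bounded, unlike the unmonitored run
```

In this toy run the monitor permits a switch only every ~70 steps, qualitatively matching the slow-switching behavior described above.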
In the second experiment, we consider an aircraft control problem. We have a four-dimensional state vector xt consisting of the sideslip angle β, bank angle φ, yaw rate, and roll rate of the aircraft in cruise flight.
The state transition dynamics are given as in Equation (1), with IID Gaussian disturbance terms wt.
But instead of observing the states directly, on each time step t we observe only

    yt = C xt + vt,    (20)

where yt ∈ R^ny and the disturbances vt ∈ R^ny are distributed Normal(0, Σv).
If the model were completely known (e.g., if A, B, C, Σv, Σw were fixed), then this is a standard LQG problem, and optimal estimates x̂t of the hidden states xt are obtained using a Kalman filter:

    x̂t+1 = A x̂t + B ut + Lt (yt+1 − C(A x̂t + B ut)),    (21)

where Lt ∈ R^(nx×ny) is the Kalman filter gain matrix. Further, it is known that, in LQG, the optimal steady-state controller is obtained by picking actions according to ut = Kt x̂t, where Kt are appropriate control gains [1].
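Equation (21) transcribes directly into code; the sketch below takes the filter gain L as given rather than computing it from the Riccati recursion.

```python
import numpy as np

def kalman_update(A, B, C, L, x_hat, u, y_next):
    """Equation (21):
    x_hat_{t+1} = A x_hat_t + B u_t + L (y_{t+1} - C (A x_hat_t + B u_t))."""
    pred = A @ x_hat + B @ u                 # one-step state prediction
    return pred + L @ (y_next - C @ pred)    # correct by the innovation
```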
In our aircraft control problem,

    C = [ 0 1 0 0 ]
        [ 0 0 0 1 ],

so that only two of the four state variables are observed directly.
Since the reliability of the sensors changes over time, one might want to apply an online learning algorithm (such as online stochastic gradient ascent) to dynamically estimate the values of Σv,11 and Σv,22.
This gives a simple method for adapting our controller and Kalman filter parameters to the varying noise parameters.
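The paper's estimator is online stochastic gradient ascent; the sketch below is a simpler stand-in of my own that nudges the diagonal of Σv toward recent residual power, with a clip to keep the variances positive and the matrix nonsingular (anticipating the bug discussed below).

```python
import numpy as np

def update_sensor_noise(sigma_diag, y, y_pred, lr=0.01):
    """Illustrative stand-in for the paper's online noise estimator.
    Moves each observed variance toward the squared residual; the clip keeps
    Sigma_v positive definite (a naive unclipped update can go negative)."""
    resid = y - y_pred                            # observation residual
    sigma_diag = sigma_diag + lr * (resid ** 2 - sigma_diag)
    return np.maximum(sigma_diag, 1e-8)           # stay positive and nonsingular
```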
The adaptive control algorithm that we have described is sufficiently complex that it is extremely difficult to prove that it gives a stable controller.
However, our algorithm can still be applied to guarantee stability. To do so, note that the “state” of the controlled system at each time step is fully characterized by the true world state xt and the internal state estimate x̂t of the Kalman filter. So, we can define an augmented state vector x̃t = [xt; x̂t] ∈ R^8. Because xt+1 is linear in ut (which is in turn linear in x̂t), and similarly x̂t+1 is linear in x̂t and ut (substitute (20) into (21)), for a fixed set of gains Kt and Lt we can express x̃t+1 as a linear function of x̃t plus a disturbance: x̃t+1 = D̃t x̃t + w̃t.
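Carrying out that substitution gives an explicit block form for D̃t. The algebra below is my own derivation from Equations (1), (20), and (21) with ut = Kt x̂t, and is worth re-checking against the paper:

```python
import numpy as np

def augmented_closed_loop(A, B, C, K, L):
    """D_tilde for the augmented state x_tilde = [x; x_hat]:
        x_{t+1}     = A x_t     + B K x_hat_t               + w_t
        x_hat_{t+1} = L C A x_t + (A + B K - L C A) x_hat_t + L C w_t + L v_{t+1}
    so the disturbance is w_tilde_t = [w_t; L C w_t + L v_{t+1}]."""
    top = np.hstack([A, B @ K])
    bottom = np.hstack([L @ C @ A, A + B @ K - L @ C @ A])
    return np.vstack([top, bottom])

# The monitor can then be run on sigma_max of products of these D_tilde's.
```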
It turns out that there is a very subtle bug in the online learning algorithm: the online updates can occasionally drive the estimate of Σv,11 negative. When this occurs, the Matlab LQG solver for the steady-state gains outputs L = 0 on this and all successive time steps.
When the learning algorithm encounters the bug, our algorithm successfully rejects the changes to the gains that lead to instability, thereby keeping the system stable. (Footnote 7: Even if we had anticipated this specific bug and clipped Σv,11 to be non-negative, the LQG solver from the Matlab controls toolbox still outputs invalid gains, since it expects a nonsingular Σv.)

Figure 3: (a) Typical plot of state (xt,1) using the (buggy) online learning algorithm, in a sequence in which L was set to zero part-way through the sequence.
5 Discussion

Space constraints preclude a full discussion, but these ideas can also be applied to verifying the stability of certain nonlinear dynamical systems (see footnote 8 below). The same idea also applies to settings where A may be changing (perhaps adversarially) within some bounded set, or where the dynamics are unknown, so that we need to verify stability with respect to a set of possible dynamics.
In simulation experiments with the Stanford autonomous helicopter, using a linearization of the non-linear dynamics, our algorithm was also empirically successful at stabilizing an adaptive control algorithm that normally drives the helicopter into unstable oscillations.
Footnote 8: Checking all k^N such combinations takes time exponential in N, but it is often possible to use very small values of N, sometimes including N = 1, if the states xt are linearly reparameterized (x̄t = M xt) to minimize σmax(D̄0).
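A sketch of the footnote's reparameterization, assuming D0 has a well-conditioned eigenbasis (defective or heavily complex spectra need more care):

```python
import numpy as np

def reparameterize(D0):
    """Pick M (with x_bar = M x) so that sigma_max(M D0 M^{-1}) approaches
    the spectral radius of D0; diagonalizing by the eigenbasis is the
    simplest choice when it is well-conditioned."""
    eigvals, V = np.linalg.eig(D0)
    M = np.linalg.inv(V)
    D_bar = M @ D0 @ V                  # = M D0 M^{-1}, (nearly) diagonal
    return M, np.linalg.norm(D_bar, 2)  # new sigma_max ~ max |eigenvalue|
```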
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 175 nips-2004-Stable adaptive control with online learning
Author: H. J. Kim, Andrew Y. Ng
Abstract: Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to the changing circumstances. However, for safety-critical applications such as airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety, such as “stability,” guarantees. Rather than trying to show difficult, a priori, stability guarantees for specific learning methods, in this paper we propose a method for “monitoring” the controllers suggested by the learning algorithm online, and rejecting controllers leading to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable. 1
2 0.2928912 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
3 0.21853104 28 nips-2004-Bayesian inference in spiking neurons
Author: Sophie Deneve
Abstract: We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing a variant of belief propagation. Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3]. A few important but hidden properties, such as direction of motion, or appropriate motor commands, are inferred from many noisy, local and ambiguous sensory cues. These evidences are combined with priors about the sensory world and body. Importantly, because most of these inferences should lead to quick and irreversible decisions in a perpetually changing world, noisy cues have to be integrated on-line, but in a way that takes into account unpredictable events, such as a sudden change in motion direction or the appearance of a new stimulus. This raises the question of how this temporal integration can be performed at the neural level. It has been proposed that single neurons in sensory cortices represent and compute the log probability that a sensory variable takes on a certain value (eg Is visual motion in the neuron’s preferred direction?) [9, 7]. Alternatively, to avoid normalization issues and provide an appropriate signal for decision making, neurons could represent the log probability ratio of a particular hypothesis (eg is motion more likely to be towards the right than towards the left) [7, 6]. Log probabilities are convenient here, since under some assumptions, independent noisy cues simply combine linearly. Moreover, there are physiological evidence for the neural representation of log probabilities and log probability ratios [9, 6, 7]. However, these models assume that neurons represent probabilities in their firing rates. We argue that it is important to study how probabilistic information are encoded in spikes. Indeed, it seems spurious to marry the idea of an exquisite on-line integration of noisy cues with an underlying rate code that requires averaging on large populations of noisy neurons and long periods of time. In particular, most natural tasks require this integration to take place on the time scale of inter-spike intervals. Spikes are more efficiently signaling events ∗ Institute of Cognitive Science, 69645 Bron, France than analog quantities. In addition, a neural theory of inference with spikes will bring us closer to the physiological level and generate more easily testable predictions. Thus, we propose a new theory of neural processing in which spike trains provide a deterministic, online representation of a log-probability ratio. Spikes signals events, eg that the log-probability ratio has exceeded what could be predicted from previous spikes. This form of coding was loosely inspired by the idea of ”energy landscape” coding proposed by Hinton and Brown [2]. 
However, contrary to [2] and other theories using rate-based representation of probabilities, this model is self-consistent and does not require different models for encoding and decoding: As output spikes provide new, unpredictable, temporally independent evidence, they can be used directly as an input to other Bayesian neurons. Finally, we show that these neurons can be used as building blocks in a theory of approximate Bayesian inference in recurrent spiking networks. Connections between neurons implement an underlying Bayesian network, consisting of coupled hidden Markov models. Propagation of spikes is a form of belief propagation in this underlying graphical model. Our theory provides computational explanations of some general physiological properties of cortical neurons, such as spike frequency adaptation, Poisson statistics of spike trains, the existence of strong local inhibition in cortical columns, and the maintenance of a tight balance between excitation and inhibition. Finally, we discuss the implications of this model for the debate about temporal versus rate-based neural coding. 1 Spikes and log posterior odds 1.1 Synaptic integration seen as inference in a hidden Markov chain We propose that each neuron codes for an underlying ”hidden” binary variable, xt , whose state evolves over time. We assume that xt depends only on the state at the previous time step, xt−dt , and is conditionally independent of other past states. The state xt can switch 1 from 0 to 1 with a constant rate ron = dt limdt→0 P (xt = 1|xt−dt = 0), and from 1 to 0 with a constant rate roff . For example, these transition rates could represent how often motion in a preferred direction appears the receptive field and how long it is likely to stay there. The neuron infers the state of its hidden variable from N noisy synaptic inputs, considered to be observations of the hidden state. In this initial version of the model, we assume that these inputs are conditionally independent homogeneous Poisson processes, synapse i i emitting a spike between time t and t + dt (si = 1) with constant probability qon dt if t i xt = 1, and another constant probability qoff dt if xt = 0. The synaptic spikes are assumed to be otherwise independent of previous synaptic spikes, previous states and spikes at other synapses. The resulting generative model is a hidden Markov chain (figure 1-A). However, rather than estimating the state of its hidden variable and communicating this estimate to other neurons (for example by emitting a spike when sensory evidence for xt = 1 goes above a threshold) the neuron reports and communicates its certainty that the current state is 1. This certainty takes the form of the log of the ratio of the probability that the hidden state is 1, and the probability that the state is 0, given all the synaptic inputs P (xt =1|s0→t ) received so far: Lt = log P (xt =0|s0→t ) . We use s0→t as a short hand notation for the N synaptic inputs received at present and in the past. We will refer to it as the log odds ratio. Thanks to the conditional independencies assumed in the generative model, we can compute this Log odds ratio iteratively. Taking the limit as dt goes to zero, we get the following differential equation: ˙ L = ron 1 + e−L − roff 1 + eL + i wi δ(si − 1) − θ t B. A. xt ron .roff dt qon , qoff st xt ron .roff i t st dt s qon , qoff qon , qoff st dt xt j st Ot It Gt Ot Lt t t dt C. E. 2 0 -2 -4 D. 
500 1000 1500 2000 2500 2 3000 Count Log odds 4 20 Lt 0 -2 0 500 1000 1500 2000 2500 Time Ot 3000 0 200 400 600 ISI Figure 1: A. Generative model for the synaptic input. B. Schematic representation of log odds ratio encoding and decoding. The dashed circle represents both eventual downstream elements and the self-prediction taking place inside the model neuron. A spike is fired only when Lt exceeds Gt . C. One example trial, where the state switches from 0 to 1 (shaded area) and back to 0. plain: Lt , dotted: Gt . Black stripes at the top: corresponding spikes train. D. Mean Log odds ratio (dark line) and mean output firing rate (clear line). E. Output spike raster plot (1 line per trial) and ISI distribution for the neuron shown is C. and D. Clear line: ISI distribution for a poisson neuron with the same rate. wi , the synaptic weight, describe how informative synapse i is about the state of the hidden i qon variable, e.g. wi = log qi . Each synaptic spike (si = 1) gives an impulse to the log t off odds ratio, which is positive if this synapse is more active when the hidden state if 1 (i.e it increases the neuron’s confidence that the state is 1), and negative if this synapse is more active when xt = 0 (i.e it decreases the neuron’s confidence that the state is 1). The bias, θ, is determined by how informative it is not to receive any spike, e.g. θ = i i i qon − qoff . By convention, we will consider that the ”bias” is positive or zero (if not, we need simply to invert the status of the state x). 1.2 Generation of output spikes The spike train should convey a sparse representation of Lt , so that each spike reports new information about the state xt that is not redundant with that reported by other, preceding, spikes. This proposition is based on three arguments: First, spikes, being metabolically expensive, should be kept to a minimum. Second, spikes conveying redundant information would require a decoding of the entire spike train, whereas independent spike can be taken into account individually. And finally, we seek a self consistent model, with the spiking output having a similar semantics to its spiking input. To maximize the independence of the spikes (conditioned on xt ), we propose that the neuron fires only when the difference between its log odds ratio Lt and a prediction Gt of this log odds ratio based on the output spikes emitted so far reaches a certain threshold. Indeed, supposing that downstream elements predicts Lt as best as they can, the neuron only needs to fire when it expects that prediction to be too inaccurate (figure 1-B). In practice, this will happen when the neuron receives new evidence for xt = 1. Gt should thereby follow the same dynamics as Lt when spikes are not received. The equation for Gt and the output Ot (Ot = 1 when an output spike is fired) are given by: ˙ G = Ot = ron 1 + e−L − roff 1 + eL + go δ(Ot − 1) go 1. when Lt > Gt + , 0 otherwise, 2 (1) (2) Here go , a positive constant, is the only free parameter, the other parameters being constrained by the statistics of the synaptic input. 1.3 Results Figure 1-C plots a typical trial, showing the behavior of L, G and O before, during and after presentation of the stimulus. As random synaptic inputs are integrated, L fluctuates and eventually exceeds G + 0.5, leading to an output spike. Immediately after a spike, G jumps to G + go , which prevents (except in very rare cases) a second spike from immediately following the first. Thus, this ”jump” implements a relative refractory period. 
However, ron G decays as it tends to converge back to its stable level gstable = log roff . Thus L eventually exceeds G again, leading to a new spike. This threshold crossing happens more often during stimulation (xt = 1) as the net synaptic input alters to create a higher overall level of certainty, Lt . Mean Log odds ratio and output firing rate ¯ The mean firing rate Ot of the Bayesian neuron during presentation of its preferred stimulus (i.e. when xt switches from 0 to 1 and back to 0) is plotted in figure 1-D, together with the ¯ mean log posterior ratio Lt , both averaged over trials. Not surprisingly, the log-posterior ratio reflects the leaky integration of synaptic evidence, with an effective time constant that depends on the transition probabilities ron , roff . If the state is very stable (ron = roff ∼ 0), synaptic evidence is integrated over almost infinite time periods, the mean log posterior ratio tending to either increase or decrease linearly with time. In the example in figure 1D, the state is less stable, so ”old” synaptic evidence are discounted and Lt saturates. ¯ In contrast, the mean output firing rate Ot tracks the state of xt almost perfectly. This is because, as a form of predictive coding, the output spikes reflect the new synaptic i evidence, It = i δ(st − 1) − θ, rather than the log posterior ratio itself. In particular, the mean output firing rate is a rectified linear function of the mean input, e. g. + ¯ ¯ wi q i −θ . O= 1I= go i on(off) Analogy with a leaky integrate and fire neuron We can get an interesting insight into the computation performed by this neuron by linearizing L and G around their mean levels over trials. Here we reduce the analysis to prolonged, statistically stable periods when the state is constant (either ON or OFF). In this case, the ¯ ¯ mean level of certainty L and its output prediction G are also constant over time. We make the rough approximation that the post spike jump, go , and the input fluctuations are small ¯ compared to the mean level of certainty L. Rewriting Vt = Lt − Gt + go 2 as the ”membrane potential” of the Bayesian neuron: ˙ V = −kL V + It − ∆go − go Ot ¯ ¯ ¯ where kL = ron e−L + roff eL , the ”leak” of the membrane potential, depends on the overall ¯ level of certainty. ∆go is positive and a monotonic increasing function of go . A. s t1 dt s t1 s t1 dt B. C. x t1 x t3 dt x t3 x t3 dt x t1 x t1 x t1 x t2 x t3 x t1 … x tn x t3 x t2 … x tn … dt dt Lx2 D. x t2 dt s t2 dt x t2 s t2 x t2 dt s t2 dt Log odds 10 No inh -0.5 -1 -1 -1.5 -2 5 Feedback 500 1000 1500 2000 Tiger Stripes 0 -5 -10 500 1000 1500 2000 2500 Time Figure 2: A. Bayesian causal network for yt (tiger), x1 (stripes) and x2 (paws). B. A nett t work feedforward computing the log posterior for x1 . C. A recurrent network computing t the log posterior odds for all variables. D. Log odds ratio in a simulated trial with the net2 1 1 work in C (see text). Thick line: Lx , thin line: Lx , dash-dotted: Lx without inhibition. t t t 2 Insert: Lx averaged over trials, showing the effect of feedback. t The linearized Bayesian neuron thus acts in its stable regime as a leaky integrate and fire (LIF) neuron. The membrane potential Vt integrates its input, Jt = It − ∆go , with a leak kL . The neuron fires when its membrane potential reaches a constant threshold go . After ¯ each spikes, Vt is reset to 0. Interestingly, for appropriately chosen compression factor go , the mean input to the lin¯ ¯ earized neuron J = I − ∆go ≈ 0 1 . 
This means that the membrane potential is purely driven to its threshold by input fluctuations, or a random walk in membrane potential. As a consequence, the neuron’s firing will be memoryless, and close to a Poisson process. In particular, we found Fano factor close to 1 and quasi-exponential ISI distribution (figure 1E) on the entire range of parameters tested. Indeed, LIF neurons with balanced inputs have been proposed as a model to reproduce the statistics of real cortical neurons [8]. This balance is implemented in our model by the neuron’s effective self-inhibition, even when the synaptic input itself is not balanced. Decoding As we previously said, downstream elements could predict the log odds ratio Lt by computing Gt from the output spikes (Eq 1, fig 1-B). Of course, this requires an estimate of the transition probabilities ron , roff , that could be learned from the observed spike trains. However, we show next that explicit decoding is not necessary to perform bayesian inference in spiking networks. Intuitively, this is because the quantity that our model neurons receive and transmit, eg new information, is exactly what probabilistic inference algorithm propagate between connected statistical elements. 1 ¯ Even if go is not chosen optimally, the influence of the drift J is usually negligible compared to the large fluctuations in membrane potential. 2 Bayesian inference in cortical networks The model neurons, having the same input and output semantics, can be used as building blocks to implement more complex generative models consisting of coupled Markov chains. Consider, for example, the example in figure 2-A. Here, a ”parent” variable x1 t (the presence of a tiger) can cause the state of n other ”children” variables ([xk ]k=2...n ), t of whom two are represented (the presence of stripes,x2 , and motion, x3 ). The ”chilt t dren” variables are Bayesian neurons identical to those described previously. The resulting bayesian network consist of n + 1 coupled hidden Markov chains. Inference in this architecture corresponds to computing the log posterior odds ratio for the tiger, x1 , and the log t posterior of observing stripes or motion, ([xk ]k=2...n ), given the synaptic inputs received t by the entire network so far, i.e. s2 , . . . , sk . 0→t 0→t Unfortunately, inference and learning in this network (and in general in coupled Markov chains) requires very expensive computations, and cannot be performed by simply propagating messages over time and among the variable nodes. In particular, the state of a child k variable xt depends on xk , sk , x1 and the state of all other children at the previous t t t−dt time step, [xj ]2
4 0.18846902 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
5 0.15531527 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
Author: Nils Bertschinger, Thomas Natschläger, Robert A. Legenstein
Abstract: In this paper we analyze the relationship between the computational capabilities of randomly connected networks of threshold gates in the timeseries domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally a simple synaptic scaling rule for self-organized criticality is presented and analyzed. 1
6 0.13517433 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
7 0.12715937 39 nips-2004-Coarticulation in Markov Decision Processes
8 0.12198999 24 nips-2004-Approximately Efficient Online Mechanism Design
9 0.10518848 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
10 0.094443046 48 nips-2004-Convergence and No-Regret in Multiagent Learning
11 0.090357378 83 nips-2004-Incremental Learning for Visual Tracking
12 0.08636488 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
13 0.08096575 116 nips-2004-Message Errors in Belief Propagation
14 0.080848806 200 nips-2004-Using Random Forests in the Structured Language Model
15 0.078581505 136 nips-2004-On Semi-Supervised Classification
16 0.077852838 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
17 0.067418896 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
18 0.065935045 178 nips-2004-Support Vector Classification with Input Data Uncertainty
19 0.061455429 119 nips-2004-Mistake Bounds for Maximum Entropy Discrimination
20 0.061399288 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
topicId topicWeight
[(0, -0.175), (1, -0.11), (2, 0.294), (3, 0.048), (4, 0.139), (5, -0.211), (6, -0.03), (7, 0.032), (8, 0.068), (9, 0.018), (10, -0.002), (11, -0.047), (12, -0.142), (13, -0.01), (14, -0.057), (15, -0.012), (16, -0.058), (17, -0.053), (18, -0.008), (19, -0.014), (20, -0.013), (21, -0.068), (22, -0.117), (23, 0.219), (24, -0.04), (25, -0.015), (26, 0.255), (27, -0.144), (28, 0.03), (29, 0.012), (30, 0.117), (31, -0.045), (32, 0.008), (33, 0.04), (34, 0.084), (35, 0.084), (36, 0.015), (37, 0.027), (38, 0.009), (39, 0.061), (40, 0.057), (41, -0.153), (42, 0.02), (43, -0.005), (44, 0.056), (45, -0.049), (46, 0.024), (47, -0.019), (48, -0.005), (49, 0.134)]
simIndex simValue paperId paperTitle
same-paper 1 0.98250288 175 nips-2004-Stable adaptive control with online learning
Author: H. J. Kim, Andrew Y. Ng
Abstract: Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to the changing circumstances. However, for safety-critical applications such as airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety, such as “stability,” guarantees. Rather than trying to show difficult, a priori, stability guarantees for specific learning methods, in this paper we propose a method for “monitoring” the controllers suggested by the learning algorithm online, and rejecting controllers leading to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable. 1
2 0.80825186 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer
Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
3 0.53709763 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth
Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
4 0.50429195 28 nips-2004-Bayesian inference in spiking neurons
Author: Sophie Deneve
Abstract: We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing a variant of belief propagation. Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3]. A few important but hidden properties, such as direction of motion, or appropriate motor commands, are inferred from many noisy, local and ambiguous sensory cues. These evidences are combined with priors about the sensory world and body. Importantly, because most of these inferences should lead to quick and irreversible decisions in a perpetually changing world, noisy cues have to be integrated on-line, but in a way that takes into account unpredictable events, such as a sudden change in motion direction or the appearance of a new stimulus. This raises the question of how this temporal integration can be performed at the neural level. It has been proposed that single neurons in sensory cortices represent and compute the log probability that a sensory variable takes on a certain value (eg Is visual motion in the neuron’s preferred direction?) [9, 7]. Alternatively, to avoid normalization issues and provide an appropriate signal for decision making, neurons could represent the log probability ratio of a particular hypothesis (eg is motion more likely to be towards the right than towards the left) [7, 6]. Log probabilities are convenient here, since under some assumptions, independent noisy cues simply combine linearly. Moreover, there are physiological evidence for the neural representation of log probabilities and log probability ratios [9, 6, 7]. However, these models assume that neurons represent probabilities in their firing rates. We argue that it is important to study how probabilistic information are encoded in spikes. Indeed, it seems spurious to marry the idea of an exquisite on-line integration of noisy cues with an underlying rate code that requires averaging on large populations of noisy neurons and long periods of time. In particular, most natural tasks require this integration to take place on the time scale of inter-spike intervals. Spikes are more efficiently signaling events ∗ Institute of Cognitive Science, 69645 Bron, France than analog quantities. In addition, a neural theory of inference with spikes will bring us closer to the physiological level and generate more easily testable predictions. Thus, we propose a new theory of neural processing in which spike trains provide a deterministic, online representation of a log-probability ratio. Spikes signals events, eg that the log-probability ratio has exceeded what could be predicted from previous spikes. This form of coding was loosely inspired by the idea of ”energy landscape” coding proposed by Hinton and Brown [2]. 
However, contrary to [2] and other theories using rate-based representation of probabilities, this model is self-consistent and does not require different models for encoding and decoding: As output spikes provide new, unpredictable, temporally independent evidence, they can be used directly as an input to other Bayesian neurons. Finally, we show that these neurons can be used as building blocks in a theory of approximate Bayesian inference in recurrent spiking networks. Connections between neurons implement an underlying Bayesian network, consisting of coupled hidden Markov models. Propagation of spikes is a form of belief propagation in this underlying graphical model. Our theory provides computational explanations of some general physiological properties of cortical neurons, such as spike frequency adaptation, Poisson statistics of spike trains, the existence of strong local inhibition in cortical columns, and the maintenance of a tight balance between excitation and inhibition. Finally, we discuss the implications of this model for the debate about temporal versus rate-based neural coding. 1 Spikes and log posterior odds 1.1 Synaptic integration seen as inference in a hidden Markov chain We propose that each neuron codes for an underlying ”hidden” binary variable, xt , whose state evolves over time. We assume that xt depends only on the state at the previous time step, xt−dt , and is conditionally independent of other past states. The state xt can switch 1 from 0 to 1 with a constant rate ron = dt limdt→0 P (xt = 1|xt−dt = 0), and from 1 to 0 with a constant rate roff . For example, these transition rates could represent how often motion in a preferred direction appears the receptive field and how long it is likely to stay there. The neuron infers the state of its hidden variable from N noisy synaptic inputs, considered to be observations of the hidden state. In this initial version of the model, we assume that these inputs are conditionally independent homogeneous Poisson processes, synapse i i emitting a spike between time t and t + dt (si = 1) with constant probability qon dt if t i xt = 1, and another constant probability qoff dt if xt = 0. The synaptic spikes are assumed to be otherwise independent of previous synaptic spikes, previous states and spikes at other synapses. The resulting generative model is a hidden Markov chain (figure 1-A). However, rather than estimating the state of its hidden variable and communicating this estimate to other neurons (for example by emitting a spike when sensory evidence for xt = 1 goes above a threshold) the neuron reports and communicates its certainty that the current state is 1. This certainty takes the form of the log of the ratio of the probability that the hidden state is 1, and the probability that the state is 0, given all the synaptic inputs P (xt =1|s0→t ) received so far: Lt = log P (xt =0|s0→t ) . We use s0→t as a short hand notation for the N synaptic inputs received at present and in the past. We will refer to it as the log odds ratio. Thanks to the conditional independencies assumed in the generative model, we can compute this Log odds ratio iteratively. Taking the limit as dt goes to zero, we get the following differential equation: ˙ L = ron 1 + e−L − roff 1 + eL + i wi δ(si − 1) − θ t B. A. xt ron .roff dt qon , qoff st xt ron .roff i t st dt s qon , qoff qon , qoff st dt xt j st Ot It Gt Ot Lt t t dt C. E. 2 0 -2 -4 D. 
500 1000 1500 2000 2500 2 3000 Count Log odds 4 20 Lt 0 -2 0 500 1000 1500 2000 2500 Time Ot 3000 0 200 400 600 ISI Figure 1: A. Generative model for the synaptic input. B. Schematic representation of log odds ratio encoding and decoding. The dashed circle represents both eventual downstream elements and the self-prediction taking place inside the model neuron. A spike is fired only when Lt exceeds Gt . C. One example trial, where the state switches from 0 to 1 (shaded area) and back to 0. plain: Lt , dotted: Gt . Black stripes at the top: corresponding spikes train. D. Mean Log odds ratio (dark line) and mean output firing rate (clear line). E. Output spike raster plot (1 line per trial) and ISI distribution for the neuron shown is C. and D. Clear line: ISI distribution for a poisson neuron with the same rate. wi , the synaptic weight, describe how informative synapse i is about the state of the hidden i qon variable, e.g. wi = log qi . Each synaptic spike (si = 1) gives an impulse to the log t off odds ratio, which is positive if this synapse is more active when the hidden state if 1 (i.e it increases the neuron’s confidence that the state is 1), and negative if this synapse is more active when xt = 0 (i.e it decreases the neuron’s confidence that the state is 1). The bias, θ, is determined by how informative it is not to receive any spike, e.g. θ = i i i qon − qoff . By convention, we will consider that the ”bias” is positive or zero (if not, we need simply to invert the status of the state x). 1.2 Generation of output spikes The spike train should convey a sparse representation of Lt , so that each spike reports new information about the state xt that is not redundant with that reported by other, preceding, spikes. This proposition is based on three arguments: First, spikes, being metabolically expensive, should be kept to a minimum. Second, spikes conveying redundant information would require a decoding of the entire spike train, whereas independent spike can be taken into account individually. And finally, we seek a self consistent model, with the spiking output having a similar semantics to its spiking input. To maximize the independence of the spikes (conditioned on xt ), we propose that the neuron fires only when the difference between its log odds ratio Lt and a prediction Gt of this log odds ratio based on the output spikes emitted so far reaches a certain threshold. Indeed, supposing that downstream elements predicts Lt as best as they can, the neuron only needs to fire when it expects that prediction to be too inaccurate (figure 1-B). In practice, this will happen when the neuron receives new evidence for xt = 1. Gt should thereby follow the same dynamics as Lt when spikes are not received. The equation for Gt and the output Ot (Ot = 1 when an output spike is fired) are given by: ˙ G = Ot = ron 1 + e−L − roff 1 + eL + go δ(Ot − 1) go 1. when Lt > Gt + , 0 otherwise, 2 (1) (2) Here go , a positive constant, is the only free parameter, the other parameters being constrained by the statistics of the synaptic input. 1.3 Results Figure 1-C plots a typical trial, showing the behavior of L, G and O before, during and after presentation of the stimulus. As random synaptic inputs are integrated, L fluctuates and eventually exceeds G + 0.5, leading to an output spike. Immediately after a spike, G jumps to G + go , which prevents (except in very rare cases) a second spike from immediately following the first. Thus, this ”jump” implements a relative refractory period. 
However, ron G decays as it tends to converge back to its stable level gstable = log roff . Thus L eventually exceeds G again, leading to a new spike. This threshold crossing happens more often during stimulation (xt = 1) as the net synaptic input alters to create a higher overall level of certainty, Lt . Mean Log odds ratio and output firing rate ¯ The mean firing rate Ot of the Bayesian neuron during presentation of its preferred stimulus (i.e. when xt switches from 0 to 1 and back to 0) is plotted in figure 1-D, together with the ¯ mean log posterior ratio Lt , both averaged over trials. Not surprisingly, the log-posterior ratio reflects the leaky integration of synaptic evidence, with an effective time constant that depends on the transition probabilities ron , roff . If the state is very stable (ron = roff ∼ 0), synaptic evidence is integrated over almost infinite time periods, the mean log posterior ratio tending to either increase or decrease linearly with time. In the example in figure 1D, the state is less stable, so ”old” synaptic evidence are discounted and Lt saturates. ¯ In contrast, the mean output firing rate Ot tracks the state of xt almost perfectly. This is because, as a form of predictive coding, the output spikes reflect the new synaptic i evidence, It = i δ(st − 1) − θ, rather than the log posterior ratio itself. In particular, the mean output firing rate is a rectified linear function of the mean input, e. g. + ¯ ¯ wi q i −θ . O= 1I= go i on(off) Analogy with a leaky integrate and fire neuron We can get an interesting insight into the computation performed by this neuron by linearizing L and G around their mean levels over trials. Here we reduce the analysis to prolonged, statistically stable periods when the state is constant (either ON or OFF). In this case, the ¯ ¯ mean level of certainty L and its output prediction G are also constant over time. We make the rough approximation that the post spike jump, go , and the input fluctuations are small ¯ compared to the mean level of certainty L. Rewriting Vt = Lt − Gt + go 2 as the ”membrane potential” of the Bayesian neuron: ˙ V = −kL V + It − ∆go − go Ot ¯ ¯ ¯ where kL = ron e−L + roff eL , the ”leak” of the membrane potential, depends on the overall ¯ level of certainty. ∆go is positive and a monotonic increasing function of go . A. s t1 dt s t1 s t1 dt B. C. x t1 x t3 dt x t3 x t3 dt x t1 x t1 x t1 x t2 x t3 x t1 … x tn x t3 x t2 … x tn … dt dt Lx2 D. x t2 dt s t2 dt x t2 s t2 x t2 dt s t2 dt Log odds 10 No inh -0.5 -1 -1 -1.5 -2 5 Feedback 500 1000 1500 2000 Tiger Stripes 0 -5 -10 500 1000 1500 2000 2500 Time Figure 2: A. Bayesian causal network for yt (tiger), x1 (stripes) and x2 (paws). B. A nett t work feedforward computing the log posterior for x1 . C. A recurrent network computing t the log posterior odds for all variables. D. Log odds ratio in a simulated trial with the net2 1 1 work in C (see text). Thick line: Lx , thin line: Lx , dash-dotted: Lx without inhibition. t t t 2 Insert: Lx averaged over trials, showing the effect of feedback. t The linearized Bayesian neuron thus acts in its stable regime as a leaky integrate and fire (LIF) neuron. The membrane potential Vt integrates its input, Jt = It − ∆go , with a leak kL . The neuron fires when its membrane potential reaches a constant threshold go . After ¯ each spikes, Vt is reset to 0. Interestingly, for appropriately chosen compression factor go , the mean input to the lin¯ ¯ earized neuron J = I − ∆go ≈ 0 1 . 
Decoding

As we said previously, downstream elements could predict the log odds ratio $L_t$ by computing $G_t$ from the output spikes (Eq. 1, figure 1-B); a minimal decoding sketch is given below. Of course, this requires an estimate of the transition probabilities $r_{on}$, $r_{off}$, which could be learned from the observed spike trains. However, we show next that explicit decoding is not necessary to perform Bayesian inference in spiking networks. Intuitively, this is because the quantity that our model neurons receive and transmit, i.e. new information, is exactly what probabilistic inference algorithms propagate between connected statistical elements.

2 Bayesian inference in cortical networks

The model neurons, having the same input and output semantics, can be used as building blocks to implement more complex generative models consisting of coupled Markov chains. Consider, for example, the model in figure 2-A. Here, a "parent" variable $x_t^1$ (the presence of a tiger) can cause the state of $n$ other "children" variables ($[x_t^k]_{k=2\ldots n}$), of which two are represented (the presence of stripes, $x_t^2$, and motion, $x_t^3$). The "children" variables are Bayesian neurons identical to those described previously. The resulting Bayesian network consists of $n + 1$ coupled hidden Markov chains. Inference in this architecture corresponds to computing the log posterior odds ratio for the tiger, $x_t^1$, and the log posterior of observing stripes or motion, $[x_t^k]_{k=2\ldots n}$, given the synaptic inputs received by the entire network so far, i.e. $s_{0\to t}^2, \ldots, s_{0\to t}^n$.

Unfortunately, inference and learning in this network (and in general in coupled Markov chains) require very expensive computations, and cannot be performed by simply propagating messages over time and among the variable nodes. In particular, the state of a child variable $x_t^k$ depends on $x_{t-dt}^k$, $s_t^k$, $x_t^1$, and the state of all other children at the previous time step, $[x_{t-dt}^j]_{2\ldots}$
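To accompany the Decoding paragraph above, here is a minimal sketch of the downstream computation: an observer reconstructs an estimate $\hat{L}_t = G_t$ from the output spike train alone, by integrating Eq. (1) between spikes and adding $g_o$ at each spike. The example spike train and all parameter values are illustrative assumptions; as noted above, $r_{on}$ and $r_{off}$ could themselves be estimated from the observed spikes.

```python
import numpy as np

# Sketch of downstream decoding (Eq. 1): reconstruct G_t, and hence an
# estimate of L_t, from output spikes only. The spike train and the
# parameters are illustrative assumptions.
dt = 1e-3
r_on, r_off = 1.0, 1.0
g_o = 1.0

spike_times = [1.02, 1.05, 1.11, 1.20, 1.35]    # assumed output spikes (s)
spike_steps = {int(round(ts / dt)) for ts in spike_times}

steps = int(2.0 / dt)
G = np.log(r_on / r_off)                        # start at the stable level
L_hat = np.empty(steps)
for t in range(steps):
    # Between spikes, G_t follows the same leaky dynamics as L_t.
    G += (r_on * (1.0 + np.exp(-G)) - r_off * (1.0 + np.exp(G))) * dt
    if t in spike_steps:
        G += g_o     # a spike announces that L_t exceeded G_t + g_o / 2
    L_hat[t] = G

print(f"decoded log odds peaks at {L_hat.max():.2f} "
      f"around t = {L_hat.argmax() * dt:.2f} s")
```

Each incoming spike bumps the estimate by $g_o$, and the leak then pulls it back toward $g_{stable}$, so the decoded trace rises while spikes arrive densely and relaxes once they stop.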
5 0.44995743 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Luca Zaniboni
Abstract: We provide a worst-case analysis of selective sampling algorithms for learning linear threshold functions. The algorithms considered in this paper are Perceptron-like algorithms, i.e., algorithms which can be efficiently run in any reproducing kernel Hilbert space. Our algorithms exploit a simple margin-based randomized rule to decide whether to query the current label. We obtain selective sampling algorithms achieving on average the same bounds as those proven for their deterministic counterparts, but using far fewer labels. We complement our theoretical findings with an empirical comparison on two text categorization tasks. The outcome of these experiments is largely predicted by our theoretical results: Our selective sampling algorithms tend to perform as well as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels. 1
6 0.4374792 24 nips-2004-Approximately Efficient Online Mechanism Design
7 0.38671395 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
8 0.34677342 39 nips-2004-Coarticulation in Markov Decision Processes
9 0.32656145 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
10 0.30523953 123 nips-2004-Multi-agent Cooperation in Diverse Population Games
11 0.28816622 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees
12 0.27829191 200 nips-2004-Using Random Forests in the Structured Language Model
13 0.26920554 136 nips-2004-On Semi-Supervised Classification
14 0.26444101 116 nips-2004-Message Errors in Belief Propagation
15 0.25837973 183 nips-2004-Temporal-Difference Networks
16 0.25774509 119 nips-2004-Mistake Bounds for Maximum Entropy Discrimination
18 0.22894271 30 nips-2004-Binet-Cauchy Kernels
19 0.22637181 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
20 0.22625373 185 nips-2004-The Convergence of Contrastive Divergences
topicId topicWeight
[(13, 0.069), (15, 0.121), (26, 0.057), (31, 0.018), (33, 0.179), (35, 0.014), (38, 0.305), (39, 0.036), (50, 0.07), (76, 0.01)]
simIndex simValue paperId paperTitle
1 0.79749382 158 nips-2004-Sampling Methods for Unsupervised Learning
Author: Rob Fergus, Andrew Zisserman, Pietro Perona
Abstract: We present an algorithm to overcome the local maxima problem in estimating the parameters of mixture models. It combines existing approaches from EM and from a robust fitting algorithm, RANSAC, to give a data-driven stochastic learning scheme. Minimal subsets of data points, sufficient to constrain the parameters of the model, are drawn from proposal densities to discover new regions of high likelihood. The proposal densities are learnt using EM and bias the sampling toward promising solutions. The algorithm is computationally efficient, as well as effective at escaping from local maxima. We compare it with alternative methods, including EM and RANSAC, on both challenging synthetic data and the computer vision problem of alpha-matting. 1
same-paper 2 0.78325021 175 nips-2004-Stable adaptive control with online learning
Author: H. J. Kim, Andrew Y. Ng
Abstract: Learning algorithms have enjoyed numerous successes in robotic control tasks. In problems with time-varying dynamics, online learning methods have also proved to be a powerful tool for automatically tracking and/or adapting to the changing circumstances. However, for safety-critical applications such as airplane flight, the adoption of these algorithms has been significantly hampered by their lack of safety, such as “stability,” guarantees. Rather than trying to show difficult, a priori, stability guarantees for specific learning methods, in this paper we propose a method for “monitoring” the controllers suggested by the learning algorithm online, and rejecting controllers leading to instability. We prove that even if an arbitrary online learning method is used with our algorithm to control a linear dynamical system, the resulting system is stable. 1
3 0.61766934 69 nips-2004-Fast Rates to Bayes for Kernel Machines
Author: Ingo Steinwart, Clint Scovel
Abstract: We establish learning rates to the Bayes risk for support vector machines (SVMs) with hinge loss. In particular, for SVMs with Gaussian RBF kernels we propose a geometric condition for distributions which can be used to determine approximation properties of these kernels. Finally, we compare our methods with those of a recent paper of G. Blanchard et al. 1
4 0.61679947 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a priori assumptions about the shape or maximal depth of the target PST nor requires any parameters. To our knowledge, this is the first provably correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
5 0.6105547 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
Author: Aharon Bar-Hillel, Adam Spiro, Eran Stark
Abstract: Spike sorting involves clustering spike trains recorded by a microelectrode according to the source neuron. It is a complicated problem, which requires a lot of human labor, partly due to the non-stationary nature of the data. We propose an automated technique for the clustering of non-stationary Gaussian sources in a Bayesian framework. At a first search stage, data is divided into short time frames and candidate descriptions of the data as a mixture of Gaussians are computed for each frame. At a second stage, transition probabilities between candidate mixtures are computed, and a globally optimal clustering is found as the MAP solution of the resulting probabilistic model. Transition probabilities are computed using local stationarity assumptions and are based on a Gaussian version of the Jensen-Shannon divergence. The method was applied to several recordings. The performance appeared almost indistinguishable from that of humans in a wide range of scenarios, including movement, merges, and splits of clusters. 1
6 0.60980022 29 nips-2004-Beat Tracking the Graphical Model Way
7 0.60952914 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms
8 0.6081816 103 nips-2004-Limits of Spectral Clustering
9 0.60799062 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
10 0.60784513 161 nips-2004-Self-Tuning Spectral Clustering
11 0.60722089 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
12 0.60683697 207 nips-2004-ℓ₀-norm Minimization for Basis Selection
13 0.60636044 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning
14 0.60623008 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
15 0.6060223 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
16 0.60518557 49 nips-2004-Density Level Detection is Classification
17 0.60458505 131 nips-2004-Non-Local Manifold Tangent Learning
18 0.60367674 178 nips-2004-Support Vector Classification with Input Data Uncertainty
19 0.60350275 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
20 0.60277927 187 nips-2004-The Entire Regularization Path for the Support Vector Machine