nips nips2006 nips2006-154 knowledge-graph by maker-knowledge-mining

154 nips-2006-Optimal Change-Detection and Spiking Neurons


Source: pdf

Author: Angela J. Yu

Abstract: Survival in a non-stationary, potentially adversarial environment requires animals to detect sensory changes rapidly yet accurately, two oft competing desiderata. Neurons subserving such detections are faced with the corresponding challenge to discern “real” changes in inputs as quickly as possible, while ignoring noisy fluctuations. Mathematically, this is an example of a change-detection problem that is actively researched in the controlled stochastic processes community. In this paper, we utilize sophisticated tools developed in that community to formalize an instantiation of the problem faced by the nervous system, and characterize the Bayes-optimal decision policy under certain assumptions. We will derive from this optimal strategy an information accumulation and decision process that remarkably resembles the dynamics of a leaky integrate-and-fire neuron. This correspondence suggests that neurons are optimized for tracking input changes, and sheds new light on the computational import of intracellular properties such as resting membrane potential, voltage-dependent conductance, and post-spike reset voltage. We also explore the influence that factors such as timing, uncertainty, neuromodulation, and reward should and do have on neuronal dynamics and sensitivity, as the optimal decision strategy depends critically on these factors. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Neurons subserving such detections are faced with the corresponding challenge to discern “real” changes in inputs as quickly as possible, while ignoring noisy fluctuations. [sent-4, score-0.327]

2 In this paper, we utilize sophisticated tools developed in that community to formalize an instantiation of the problem faced by the nervous system, and characterize the Bayes-optimal decision policy under certain assumptions. [sent-6, score-0.596]

3 We will derive from this optimal strategy an information accumulation and decision process that remarkably resembles the dynamics of a leaky integrate-and-fire neuron. [sent-7, score-0.509]

4 This correspondence suggests that neurons are optimized for tracking input changes, and sheds new light on the computational import of intracellular properties such as resting membrane potential, voltage-dependent conductance, and post-spike reset voltage. [sent-8, score-0.482]

5 We also explore the influence that factors such as timing, uncertainty, neuromodulation, and reward should and do have on neuronal dynamics and sensitivity, as the optimal decision strategy depends critically on these factors. [sent-9, score-0.385]

6 1 Introduction Animals interacting with a changeable, potentially adversarial environment need to excel in the detection of changes in their sensory inputs. [sent-10, score-0.338]

7 Due to the noisy and incomplete nature of sensory inputs, the animal can generally achieve more accurate detection by waiting for more sensory inputs. [sent-12, score-0.354]

8 Neurons subserving the detection process face a similar speed-accuracy trade-off. [sent-14, score-0.279]

9 In this work, we aim to understand the computations performed by a neuron at the time-scale of single spikes. [sent-15, score-0.247]

10 How sensitive a neuron is to each input spike should depend on the relative probabilities of the input representing noise and useful information, and the relative costs of mis-interpretation. [sent-16, score-0.454]

11 We formulate the problem as an example of change-detection, and characterize the optimal decision policy in this context. [sent-17, score-0.556]

12 Finding optimal decision policies for such processes is an actively researched problem in financial mathematics and operations research. [sent-20, score-0.299]

13 In Sec. 3, we demonstrate that the optimal information accumulation and decision process has dynamics remarkably resembling that of a spiking neuron. [sent-27, score-0.494]

14 We examine the computational import of certain intracellular properties, characterize the input-output firing rate relationship, and extend the framework to multi-source detection. [sent-28, score-0.234]

15 The change-detection problem is concerned with finding the optimal decision policy for reporting the change from f0 to f1 as early as possible while minimizing false-alarms [1]. [sent-41, score-0.608]

16 A decision policy π is a mapping, possibly stochastic, from all observations made so far to the control (or action) set: π(xt), where xt ≜ {x1 , . . . , xt }. [sent-42, score-0.427]

17 Every unique decision policy is identified by a corresponding r.v., the stopping time τ . [sent-47, score-0.427]

18 The Loss Function Following convention [2], we assume a loss function linear in false alarms and detection delay: lπ (θ, τ ) = 1{τ <θ;π} + 1{τ ≥θ;π} c(τ − θ) (1) where 1 is the indicator function, and c > 0 is a constant that specifies the relative importance of speed and accuracy. [sent-54, score-0.303]

19 The total loss is the expectation of this loss function over θ and τ : Lπ = ⟨lπ (θ, τ )⟩ = Σθ Στ<θ P (θ, τ ) + Σθ Στ≥θ c(τ − θ)P (θ, τ ) = P (τ < θ) + c⟨(τ − θ)+ ⟩ (2). An optimal policy π ∗ minimizes Lπ . [sent-55, score-0.519]

20 Due to the linear loss in detection delay, the expected loss blows up for all policies that do not stop almost surely (a.s.). [sent-56, score-0.338]

21 This implies that the optimal policy consists of a stopping region S ⊂ [0, 1] and a continuation region C = [0, 1] \ S, such that π(Pt : Pt ∈ S) = a1 and π(Pt : Pt ∈ C) = a2 . [sent-79, score-0.573]

22 We will state and prove a useful theorem below, which will imply that C and S neatly fall into two contiguous blocks, such that the optimal policy requires the termination action as soon as Pt exceeds some fixed threshold B ∗ – i.e. the optimal policy is a first-passage process in Pt ! [sent-80, score-1.127]
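The first-passage policy described here can be sketched in a few lines. The sketch below is illustrative only (function names are mine, not the paper's): it assumes Bernoulli inputs with spike probabilities f0 and f1, runs the standard change-point posterior recursion for Pt, and reports a change the first time Pt crosses a fixed threshold B.

```python
def posterior_update(P, x, q, f0, f1):
    """One step of the change-point posterior P_t = P(theta <= t | x_1..x_t).

    Predict: the change may occur this step with hazard q; then correct by the
    Bernoulli likelihood of the observed input x (spike=1, no spike=0)."""
    pred = P + (1.0 - P) * q
    lik1 = f1 if x == 1 else 1.0 - f1
    lik0 = f0 if x == 1 else 1.0 - f0
    return pred * lik1 / (pred * lik1 + (1.0 - pred) * lik0)

def first_passage(xs, B, q, q0, f0, f1):
    """Report a change at the first t where P_t crosses the threshold B."""
    P = q0
    for t, x in enumerate(xs, start=1):
        P = posterior_update(P, x, q, f0, f1)
        if P >= B:
            return t, P
    return None, P
```

With a strongly informative input stream the posterior crosses the threshold within a couple of observations; with uninformative (all-silent) input it stays near zero.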

23 If we can impose a finite horizon T on τ , then the finitely recursive relation γ_t^T = min {Ct , ⟨γ_{t+1}^T |xt ⟩} has a corresponding finite-horizon optimal policy π_T^∗ , where γ_T^T = CT . [sent-83, score-0.423]

24 If the finite-horizon policies stop a.s. (i.e. in finite time), γ_t^∞ = limT→∞ γ_t^T , and π_T^∗ converges to the infinite-horizon optimal policy π ∗ . [sent-87, score-0.423]

25 Assume that the theorem holds for t+1, and note: Ct − ⟨γ_{t+1}^T |xt ⟩ = −(c + q)Pt + q + Σi gi wi , where gi ≜ max(0, li ), li ≜ Ct+1 − ⟨γ_{t+2}^T |xt , xt+1 = i⟩, and wi ≜ P (xt+1 = i|x). [sent-96, score-0.228]

26 Suppose i, j are such that f1 (i)−f0 (i) > f1 (j)−f0 (j), then Φt+1 (i) > Φt+1 (j), and Pt+1 (i) > Pt+1 (j), for any given xt . [sent-99, score-0.248]

27 This theorem states that the cost of stopping at time t relative to continuing gets smaller when it is more certain that θ ≤ t. [sent-103, score-0.259]

28 If Ct − γt+1 |xt is negative for some value of Pt , then the optimal policy is to select action a1 ; this is also true for any larger values of Pt . [sent-105, score-0.457]

29 Ideally, we would like to have an exact solution for the optimal policy as a function of the generative and cost parameters of the change-detection problem as defined in Sec. [sent-107, score-0.533]

30 While the explicit form of B ∗ is not known, the theorem allows us to find the optimal policy numerically by evaluating and minimizing the empirical loss as a function of the decision threshold B ∈ [0, 1]. [sent-109, score-0.779]

31 This case resembles the problem faced by neurons, which receive sequential binary inputs (spike=1, no spike=0) with approximately Poisson statistics. [sent-111, score-0.227]

32 The Bernoulli process is a discrete-time analog of the Poisson process, and obviates the problematic assumption (made by the Poisson model) that spikes could be fired infinitely close to one another. [sent-112, score-0.269]

33 This corresponds to the one-step look-ahead policy, and is optimal when the cost of detection is large or when the probability of the change taking place is very high. [sent-116, score-0.453]

34 This turns out not to be a very interesting case as the detection process is driven to cross the threshold even in the absence of any input spikes. [sent-117, score-0.429]

35 Although we do not have an explicit solution for the optimal detection threshold B ∗ in general, we can numerically compare different values of B for any specific problem. [sent-118, score-0.502]

36 Fig. 1(a) shows the empirical cost averaged over 1000 trials for different threshold values. [sent-120, score-0.229]
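A comparison of empirical cost across thresholds can be reproduced with a small Monte-Carlo sketch (a rough illustration, not the paper's code; parameter values are hypothetical, Bernoulli inputs and the linear loss of Eq. 1 are assumed):

```python
import random

def trial_loss(B, c, q, f0, f1, rng, t_max=5000):
    """Simulate one trial and return its loss: 1 for a false alarm,
    c * (tau - theta) for the detection delay otherwise."""
    P, changed, theta = 0.0, False, None
    for t in range(1, t_max + 1):
        if not changed and rng.random() < q:
            changed, theta = True, t
        x = rng.random() < (f1 if changed else f0)
        # change-point posterior recursion: predict with hazard q, then correct
        pred = P + (1 - P) * q
        l1, l0 = (f1, f0) if x else (1 - f1, 1 - f0)
        P = pred * l1 / (pred * l1 + (1 - pred) * l0)
        if P >= B:  # threshold crossing: report a change
            return 1.0 if (theta is None or t < theta) else c * (t - theta)
    return 0.0

def sweep_thresholds(Bs, c=0.01, q=0.01, f0=0.1, f1=0.4, n=500, seed=0):
    """Empirical expected loss, averaged over n trials, for each threshold B."""
    rng = random.Random(seed)
    return {B: sum(trial_loss(B, c, q, f0, f1, rng) for _ in range(n)) / n
            for B in Bs}
```

Sweeping a grid of B values and picking the minimizer mimics the numerical optimization described in the text.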

37 The empirical optimum lies near B = 0.65, although the cost function is quite shallow for a large range of values of B around the optimum, implying that performance is not particularly sensitive to relatively large perturbations around the optimal value. [sent-122, score-0.23]

38 However, the same policy formulation can apply to the case of repeated detection of changes, one after another, in a temporally contiguous fashion. [sent-125, score-0.569]

39 As long as each detection event is generated from the same model parameters (q, q0 , f1 , f0 ), and the cost parameter (c) remains constant, the threshold-crossing policy is still optimal in minimizing the empirical expected loss over these repeated events. [sent-126, score-0.782]

40 The only generative parameter affected by the repetition is q0 , which represents the probability of the inputs already being generated from f1 before the current observation process began. [sent-127, score-0.244]

41 In this repeated detection scenario, q0 should in general be high if the detection threshold B ∗ is high, and low if B ∗ is low. [sent-128, score-0.56]

42 Fortunately, while q0 is influenced by the detection policy, the optimization of the policy is not influenced by q0 , since it consists of comparing Ct and γt+1 |xt at every time step. [sent-130, score-0.53]

43 In this repeated firing scenario, where the number of spikes is high relative to the frequency of changes, the loss function of Eq. [sent-132, score-0.269]

44 2 can be rewritten as Lπ = p0 r0 + c/r1 , where ri is the mean firing rate when the inputs are generated from fi , and p0 is the fraction of time when f0 is applicable (as opposed to f1 ). [sent-133, score-0.243]

45 In other words, if the rate of change is slow compared to neuronal firing rates, then optimal processing amounts to minimizing the “spontaneous” firing rate during f0 and maximizing the “stimulus-evoked” firing rate during f1 . [sent-134, score-0.41]

46 When Φt exceeds the threshold (B = 0.65, the optimal threshold as determined in the last section), a change is reported and the whole process resets to Φ0 . [sent-141, score-0.387]

47 The dynamics of Φt is remarkably similar to a leaky integrate-and-fire neuron. [sent-142, score-0.236]

48 The bottom panel shows a raster plot of input and output spikes over 25 trials, and again the resemblance to spiking neurons is remarkable. [sent-143, score-0.576]

49 Eq. 5 indeed approximates the dynamics of a leaky integrate-and-fire neuron [3]. [sent-145, score-0.378]

50 We can rewrite Eq. 5 as Φt = a(Φt−1 + q) (6). When xt = 1, a = λ1 /((1 − q)λ0 ) > 1, Φt increases, and the rate of increase is larger when Φt itself is larger. [sent-147, score-0.294]

51 This is reminiscent of the near-threshold dynamics of the Hodgkin-Huxley model, in which the voltage-dependent activation of sodium conductance drives the neuron to fire [4]. [sent-148, score-0.335]

52 When xt = 0, Φt converges to Φ0^∞ = f1 q/(f0 (1 − q) − f1 ) (by Eq. 5). [sent-149, score-0.248]

53 Since Φ0^∞ increases with q, it implies that the resting potential should be higher and closer to the firing threshold, making the neuron more sensitive to synaptic inputs, when there is a stronger expectation that a change is imminent. [sent-152, score-0.443]
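The leaky integrate-and-fire analogy just described can be sketched concretely, assuming Φt is the posterior odds Pt /(1 − Pt ) and the inputs are Bernoulli with spike probabilities f0 and f1 (names and constants below are illustrative, not the paper's): Φ leaks toward a small resting value between input spikes, is driven up multiplicatively by each spike, and resets after "firing" when it crosses the odds threshold B/(1 − B).

```python
def phi_step(phi, x, q, f0, f1):
    """Posterior-odds recursion in the Eq. 6 form Phi_t = a (Phi_{t-1} + q):
    a is the input likelihood ratio inflated by the hazard factor 1/(1-q)."""
    lik_ratio = (f1 / f0) if x == 1 else ((1 - f1) / (1 - f0))
    return (lik_ratio / (1 - q)) * (phi + q)

def run_neuron(xs, B, q, f0, f1, phi0=0.0):
    """Integrate-and-fire analog: Phi decays toward a resting value in the
    absence of spikes, jumps on input spikes, and resets to phi0 (the
    post-spike reset potential) after crossing the odds threshold."""
    thresh = B / (1 - B)        # odds threshold corresponding to P_t = B
    phi, out = phi0, []
    for x in xs:
        phi = phi_step(phi, x, q, f0, f1)
        if phi >= thresh:
            out.append(1)       # output spike: report a change
            phi = phi0
        else:
            out.append(0)
    return out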

[Figure 1: (a) distribution of output spikes; dynamics of Φ, input and output spikes, and cost; (c) distribution of input spikes. Axes: time (samples), frequency.]

56 In this example, a chance flurry of input spikes near the start causes the optimal change-detector to fire; after the change, the increased input firing rate induces the change-detector to fire much more frequently. [sent-190, score-0.445]

57 (c) Output spikes (bottom) increase frequency quickly after the increase in input spikes (top). [sent-193, score-0.427]

58 Given the decision threshold B, ⟨Φt0 ⟩|f0 = ⟨Φt1 ⟩|f1 = B, where ti is the average number of time steps it takes to reach the threshold for xt ∼ fi , and can be assumed to satisfy ti ≫ 1 (it takes many time steps of input integration to reach the threshold). [sent-194, score-0.688]

59 To see exactly how the output firing rate ratio changes as a function of the input rates, we define the function g(λ0 , λ1 ) ≜ (λ1^2 + λ0 − 2λ0 λ1 )/(λ0 − λ0^2 ), and take its partial derivatives with respect to λ0 and λ1 . [sent-199, score-0.272]

60 Fig. 1(c) shows the average detection/firing rate over time: the rise in output firing rate closely follows that in the input, despite the small change in the input firing rates. [sent-203, score-0.293]
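Assuming the reconstructed form of g above (the original layout was garbled, so this form is a best-effort reading), its sign behavior can be checked numerically with central differences; all names below are mine:

```python
def g(lam0, lam1):
    """Output firing-rate ratio proxy, reconstructed form:
    g = (lam1^2 + lam0 - 2 lam0 lam1) / (lam0 - lam0^2)."""
    return (lam1 ** 2 + lam0 - 2 * lam0 * lam1) / (lam0 - lam0 ** 2)

def partials(lam0, lam1, h=1e-6):
    """Central-difference partial derivatives of g w.r.t. lam0 and lam1."""
    dg0 = (g(lam0 + h, lam1) - g(lam0 - h, lam1)) / (2 * h)
    dg1 = (g(lam0, lam1 + h) - g(lam0, lam1 - h)) / (2 * h)
    return dg0, dg1
```

For λ1 > λ0 (input rate rises after the change), g grows with λ1 and shrinks with λ0, i.e. the output rate ratio amplifies the input rate contrast.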

61 For instance, a visual neuron detecting the onset of a stimulus might get inputs from up-stream neurons sensitive to stimuli with different properties (different colors, orientations, depth of view, etc.). [sent-206, score-0.739]

62 (a) Distribution of first spikes for the optimal stopping policy; spikes aligned to time 0 when the actual change takes place. [sent-220, score-0.695]

63 (c) The distribution of spikes is also slightly tightened and brought closer to the actual change time, when there is stronger prior probability of a stimulus appearing (q0 = . [sent-223, score-0.462]

64 The effect is smaller because the higher prior leads to false alarms as well as reducing detection delay. [sent-225, score-0.255]

65 As in Sec. 2, we can show that if the generative and cost parameters are such that Φt is lower-bounded by Φ0^∞ for t ≫ 1, then the optimal stopping/detection policy is to terminate at the smallest t such that Φt = Φt^1 + Φt^2 + Φt^1 Φt^2 ≥ (q^1 + q^2 − q^1 q^2 )/c. [sent-228, score-0.533]

66 Despite the generative independence of the two Bernoulli processes, we note that the optimal policy is different from the naïve strategy of running two single-source change-detectors and reporting a change as soon as one of them reports a change. [sent-229, score-0.589]
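The two-source rule can be contrasted with the naïve OR strategy in a few lines (a sketch with hypothetical parameter values; Φ^1 and Φ^2 denote the per-source odds as in the text):

```python
def combined_stat(phi1, phi2):
    """Combined statistic for two independent change processes:
    Phi = Phi1 + Phi2 + Phi1*Phi2, i.e. (1 + Phi1)(1 + Phi2) - 1."""
    return phi1 + phi2 + phi1 * phi2

def multi_source_stop(phi1, phi2, q1, q2, c):
    """Rule from the text: stop once the combined statistic exceeds
    (q1 + q2 - q1*q2) / c."""
    return combined_stat(phi1, phi2) >= (q1 + q2 - q1 * q2) / c

def naive_stop(phi1, phi2, B_odds):
    """Naive strategy: two independent detectors, report when either fires."""
    return phi1 >= B_odds or phi2 >= B_odds
```

With moderate evidence in both channels the combined rule can fire even though neither single-source detector crosses its own threshold, which is one way the two strategies diverge.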

67 4 Optimal Change-Detection and Neuromodulation A sizeable body of behavioral studies suggests that stimulus processing is influenced by cognitive factors such as knowledge about the timing of stimulus onset, or whether or not a stimulus would appear in a particular location. [sent-232, score-0.435]

68 There is evidence that the neuromodulators norepinephrine [6], and acetylcholine [7] are respectively involved in those two aspects of stimulus processing. [sent-233, score-0.291]

69 Experimentally, it has been observed that norepinephrine makes sensory neurons fire more vigorously to bottom-up sensory inputs [8]. [sent-241, score-0.573]

70 It is also known from behavioral studies that a temporal cue improves detection performance, and that noradrenergic depletion diminishes this advantage [6]. [sent-242, score-0.282]

71 If there is some prior knowledge about the stimulus being in a particular location, we can model this with a higher prior probability q0 of the stimulus being present. [sent-243, score-0.234]

72 This also has the effect of increasing the responsiveness of the change-detection process to input spikes (Fig. 2C). [sent-244, score-0.326]

73 It also makes the detection (spiking) process more sensitive. [sent-245, score-0.241]

74 It has been shown experimentally that a (correct) spatial cue improves stimulus detection, and that acetylcholine is implicated in this process [7], and that acetylcholine potentiates neurons and increases their responsiveness to sensory inputs [8]. [sent-246, score-0.801]

75 5 Discussion Responding accurately and rapidly to changes in the environment is a problem confronted by the brain at every level, from single neurons to behavior. [sent-247, score-0.283]

76 Applying these ideas to the case of neurons that must rapidly and accurately detect changes in input spike statistics, we saw that the optimal algorithm yields dynamics remarkably similar to the intracellular dynamics of spiking neurons. [sent-249, score-0.961]

77 This suggests that neurons are optimized for tracking discrete, abrupt changes in the inputs. [sent-250, score-0.248]

78 The model yields insight into the computational import of cellular properties such as resting membrane potential, post-spike reset potential, voltage-dependent conductances, and the input-output spiking relationship. [sent-251, score-0.303]

79 The basic framework was extended to examine the case of multi-source change-detection, a problem faced by a neuron tasked with detecting a stimulus when it could be one of two possible sub-categories. [sent-252, score-0.404]

80 We also explored the computational consequences of spatial and temporal cueing on stimulus detection, and saw that the behavioral and biophysical effects of neuromodulation (eg by acetylcholine and norepinephrine) are consistent within the framework. [sent-253, score-0.501]

81 Every neuron in this scheme simply detects changes in its synaptic inputs, on a spike-to-spike time scale, and propagates its knowledge according to its own speed-accuracy trade-off. [sent-257, score-0.295]

82 In particular, the down-stream neuron does not need to know about this neuron’s inputs, its internal dynamics, its decision policy, its objective function, its model of the world, etc. [sent-259, score-0.284]

83 In this scheme, more sophisticated computations can be achieved by pooling together the outputs of different neurons in various configurations – we explored this briefly with the example of multi-source change-detection. [sent-260, score-0.244]

84 In this scheme, neurons make inferences about their inputs and make decisions at every level of processing. [sent-262, score-0.351]

85 While the incorporation of formal tools from controlled stochastic processes into the modeling of single-cell computations is a novel approach, this work is related to several other theoretical works. [sent-264, score-0.23]

86 The idea of neurons processing and representing probabilistic information has received much attention in recent years, with most work focusing on the level of neuronal populations [9–12]. [sent-265, score-0.277]

87 It has been suggested [13] that certain decision-making neurons may accumulate probabilistic information and spike when the evidence exceeds a certain threshold. [sent-267, score-0.291]

88 However, it was typically assumed that the neurons already receive continuously-valued inputs that represent probabilistic information. [sent-268, score-0.351]

89 Consistent with this characterization, our optimal policy is a generalization of the SPRT algorithm which is known to be optimal for stationary 2AFC discrimination [14]. [sent-271, score-0.514]
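For comparison, the stationary SPRT mentioned here is easy to state for Bernoulli inputs (a textbook sketch, not the paper's code; thresholds A and B are illustrative):

```python
import math

def sprt(xs, f0, f1, A, B):
    """Minimal SPRT for two Bernoulli hypotheses: accumulate the
    log-likelihood ratio and stop at the first boundary crossing.
    Returns ('H1' | 'H0' | None, steps_used)."""
    llr = 0.0
    for t, x in enumerate(xs, start=1):
        llr += math.log(f1 / f0) if x == 1 else math.log((1 - f1) / (1 - f0))
        if llr >= A:
            return "H1", t
        if llr <= B:
            return "H0", t
    return None, len(xs)
```

Unlike the change-detection policy above, the SPRT assumes the data-generating distribution is fixed for the whole trial; the hazard term q is what generalizes it to non-stationary inputs.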

90 Also, there was no explicit analysis of the optimality of the output spike generation process: how much of a discrepancy merits a spike, and how this depends on the relevant statistical and cost parameters. [sent-274, score-0.283]

91 We showed in this work that there are good reasons for neurons not to integrate inputs linearly. [sent-277, score-0.351]

92 One important assumption we made in our model is that the cost of detection delay is linear in time, parameterized by the constant c. [sent-280, score-0.309]

93 Without this assumption, the controlled dynamic process framework would not apply, as the decision policy would not only depend on a state variable, but on time in an explicit way. [sent-281, score-0.578]

94 However, in general, there might not be a fixed c that relates the trade-off between false alarms and detection delay. [sent-282, score-0.255]

95 In particular, if a new “trial” begins as soon as the current “trial” terminates, regardless of detection accuracy, then c should be set to P (θ ≤ τ )/⟨τ ⟩, which also places the two cost terms in the same dimension. [sent-284, score-0.312]

96 If we had analytical expressions for P (θ ≤ τ ) and ⟨τ ⟩ as a function of the decision threshold B, then we could solve the optimization problem through the self-consistency constraint placed on the optimal threshold B ∗ through its dependence on c. [sent-285, score-0.436]
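Lacking such analytical expressions, the self-consistent c can be estimated by simulation (a rough sketch under assumed Bernoulli inputs; all names and parameter values are illustrative):

```python
import random

def simulate_stats(B, q, f0, f1, n=300, seed=0, t_max=5000):
    """Monte-Carlo estimates of P(theta <= tau) and <tau> for threshold B."""
    rng = random.Random(seed)
    hits, taus = 0, []
    for _ in range(n):
        P, changed, theta = 0.0, False, None
        for t in range(1, t_max + 1):
            if not changed and rng.random() < q:
                changed, theta = True, t
            x = rng.random() < (f1 if changed else f0)
            pred = P + (1 - P) * q
            l1, l0 = (f1, f0) if x else (1 - f1, 1 - f0)
            P = pred * l1 / (pred * l1 + (1 - pred) * l0)
            if P >= B:
                break
        taus.append(t)
        if changed and theta <= t:
            hits += 1
    return hits / n, sum(taus) / n

def implied_cost(B, **kw):
    """Self-consistent c for threshold B: c = P(theta <= tau) / <tau>."""
    p_hit, mean_tau = simulate_stats(B, **kw)
    return p_hit / mean_tau
```

Scanning B and comparing implied_cost(B) against the c used to rank thresholds gives a numerical handle on the self-consistency constraint described above.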

97 Alternatively, one might still numerically obtain a value for a fixed detection threshold that incurs the lowest cost among all thresholds. [sent-287, score-0.429]

98 There is no guarantee, however, that the optimal policy lives in this parameterized family of policies. [sent-288, score-0.423]

99 It may be that the best fixed threshold policy is still far from optimal detection. [sent-289, score-0.548]

100 Finally, we note that the formal framework we presented, that of optimal detection of changes in input statistics, is not only applicable to the level of single neuron, but also to systems and cognitive level problems. [sent-300, score-0.465]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pt', 0.442), ('policy', 0.332), ('xt', 0.248), ('detection', 0.198), ('neuron', 0.189), ('neurons', 0.186), ('spikes', 0.182), ('inputs', 0.165), ('ring', 0.158), ('stopping', 0.15), ('ct', 0.13), ('threshold', 0.125), ('stimulus', 0.117), ('dynamics', 0.108), ('spike', 0.105), ('decision', 0.095), ('neuronal', 0.091), ('optimal', 0.091), ('change', 0.09), ('behavioral', 0.084), ('leaky', 0.081), ('sensory', 0.078), ('neuromodulation', 0.077), ('cost', 0.074), ('gi', 0.074), ('acetylcholine', 0.07), ('spiking', 0.066), ('cueing', 0.066), ('norepinephrine', 0.066), ('input', 0.063), ('dayan', 0.062), ('faced', 0.062), ('changes', 0.062), ('resting', 0.061), ('cellular', 0.061), ('membrane', 0.058), ('computations', 0.058), ('intracellular', 0.057), ('import', 0.057), ('pti', 0.057), ('alarms', 0.057), ('explicit', 0.056), ('bernoulli', 0.055), ('consequences', 0.054), ('posterior', 0.053), ('ratio', 0.053), ('controlled', 0.052), ('formal', 0.051), ('decreases', 0.05), ('onset', 0.048), ('loss', 0.048), ('output', 0.048), ('remarkably', 0.047), ('rate', 0.046), ('policies', 0.044), ('deneve', 0.044), ('obviates', 0.044), ('psychopharmacology', 0.044), ('accumulation', 0.044), ('detects', 0.044), ('process', 0.043), ('soon', 0.04), ('wi', 0.04), ('ai', 0.04), ('repeated', 0.039), ('termination', 0.039), ('uenced', 0.039), ('researched', 0.038), ('ess', 0.038), ('subserving', 0.038), ('neuromodulators', 0.038), ('resets', 0.038), ('conductance', 0.038), ('responsiveness', 0.038), ('tightened', 0.038), ('tools', 0.038), ('characterize', 0.038), ('delay', 0.037), ('generative', 0.036), ('examine', 0.036), ('rapidly', 0.035), ('states', 0.035), ('animals', 0.035), ('closer', 0.035), ('increases', 0.034), ('action', 0.034), ('sensitive', 0.034), ('poisson', 0.033), ('saw', 0.033), ('terminates', 0.033), ('firing', 0.033), ('fi', 0.032), ('numerically', 0.032), ('processes', 0.031), ('raster', 0.031), ('nervous', 0.031), ('shallow', 0.031), ('rule', 0.03), ('trials', 
0.03), ('rates', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 154 nips-2006-Optimal Change-Detection and Spiking Neurons


2 0.29210559 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons

Author: Thomas Voegtlin

Abstract: In biological neurons, the timing of a spike depends on the timing of synaptic currents, in a way that is classically described by the Phase Response Curve. This has implications for temporal coding: an action potential that arrives on a synapse has an implicit meaning, that depends on the position of the postsynaptic neuron on the firing cycle. Here we show that this implicit code can be used to perform computations. Using theta neurons, we derive a spike-timing dependent learning rule from an error criterion. We demonstrate how to train an auto-encoder neural network using this rule. 1

3 0.27553096 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

Author: Jeremy Lewi, Robert Butera, Liam Paninski

Abstract: Adaptively optimizing experiments can significantly reduce the number of trials needed to characterize neural responses using parametric statistical models. However, the potential for these methods has been limited to date by severe computational challenges: choosing the stimulus which will provide the most information about the (typically high-dimensional) model parameters requires evaluating a high-dimensional integration and optimization in near-real time. Here we present a fast algorithm for choosing the optimal (most informative) stimulus based on a Fisher approximation of the Shannon information and specialized numerical linear algebra techniques. This algorithm requires only low-rank matrix manipulations and a one-dimensional linesearch to choose the stimulus and is therefore efficient even for high-dimensional stimulus and parameter spaces; for example, we require just 15 milliseconds on a desktop computer to optimize a 100-dimensional stimulus. Our algorithm therefore makes real-time adaptive experimental design feasible. Simulation results show that model parameters can be estimated much more efficiently using these adaptive techniques than by using random (nonadaptive) stimuli. Finally, we generalize the algorithm to efficiently handle both fast adaptation due to spike-history effects and slow, non-systematic drifts in the model parameters. Maximizing the efficiency of data collection is important in any experimental setting. In neurophysiology experiments, minimizing the number of trials needed to characterize a neural system is essential for maintaining the viability of a preparation and ensuring robust results. As a result, various approaches have been developed to optimize neurophysiology experiments online in order to choose the “best” stimuli given prior knowledge of the system and the observed history of the cell’s responses. The “best” stimulus can be defined a number of different ways depending on the experimental objectives. 
One reasonable choice, if we are interested in finding a neuron’s “preferred stimulus,” is the stimulus which maximizes the firing rate of the neuron [1, 2, 3, 4]. Alternatively, when investigating the coding properties of sensory cells it makes sense to define the optimal stimulus in terms of the mutual information between the stimulus and response [5]. Here we take a system identification approach: we define the optimal stimulus as the one which tells us the most about how a neural system responds to its inputs [6, 7]. We consider neural systems († http://www.prism.gatech.edu/∼gtg120z ‡ http://www.stat.columbia.edu/∼liam) in which the probability p(rt |{xt , xt−1 , ..., xt−tk }, {rt−1 , . . . , rt−ta }) of the neural response rt given the current and past stimuli {xt , xt−1 , ..., xt−tk }, and the observed recent history of the neuron’s activity, {rt−1 , . . . , rt−ta }, can be described by a model p(rt |{xt }, {rt−1 }, θ), specified by a finite vector of parameters θ. Since we estimate these parameters from experimental trials, we want to choose our stimuli so as to minimize the number of trials needed to robustly estimate θ. Two inconvenient facts make it difficult to realize this goal in a computationally efficient manner: 1) model complexity — we typically need a large number of parameters to accurately model a system’s response p(rt |{xt }, {rt−1 }, θ); and 2) stimulus complexity — we are typically interested in neural responses to stimuli xt which are themselves very high-dimensional (e.g., spatiotemporal movies if we are dealing with visual neurons). In particular, it is computationally challenging to 1) update our a posteriori beliefs about the model parameters p(θ|{rt }, {xt }) given new stimulus-response data, and 2) find the optimal stimulus quickly enough to be useful in an online experimental context.
In this work we present methods for solving these problems using generalized linear models (GLM) for the input-output relationship p(rt |{xt }, {rt−1 }, θ) and certain Gaussian approximations of the posterior distribution of the model parameters. Our emphasis is on finding solutions which scale well in high dimensions. We solve problem (1) by using efficient rank-one update methods to update the Gaussian approximation to the posterior, and problem (2) by a reduction to a highly tractable onedimensional optimization problem. Simulation results show that the resulting algorithm produces a set of stimulus-response pairs which is much more informative than the set produced by random sampling. Moreover, the algorithm is efficient enough that it could feasibly run in real-time. Neural systems are highly adaptive and more generally nonstatic. A robust approach to optimal experimental design must be able to cope with changes in θ. We emphasize that the model framework analyzed here can account for three key types of changes: stimulus adaptation, spike rate adaptation, and random non-systematic changes. Adaptation which is completely stimulus dependent can be accounted for by including enough stimulus history terms in the model p(rt |{xt , ..., xt−tk }, {rt−1 , ..., rt−ta }). Spike-rate adaptation effects, and more generally spike history-dependent effects, are accounted for explicitly in the model (1) below. Finally, we consider slow, non-systematic changes which could potentially be due to changes in the health, arousal, or attentive state of the preparation. Methods We model a neuron as a point process whose conditional intensity function (instantaneous firing rate) is given as the output of a generalized linear model (GLM) [8, 9]. 
This model class has been discussed extensively elsewhere; briefly, this class is fairly natural from a physiological point of view [10], with close connections to biophysical models such as the integrate-and-fire cell [9], and it has been applied in a wide variety of experimental settings [11, 12, 13, 14]. The model is summarized as:

$$\lambda_t = E(r_t) = f\Big(\sum_i \sum_{l=1}^{t_k} k_{i,t-l}\, x_{i,t-l} + \sum_{j=1}^{t_a} a_j\, r_{t-j}\Big) \qquad (1)$$

In the above summation, the filter coefficients k_{i,t-l} capture the dependence of the neuron's instantaneous firing rate λ_t on the i-th component of the vector stimulus at time t - l, x_{t-l}; the model therefore allows for spatiotemporal receptive fields. For convenience, we arrange all the stimulus coefficients in a vector, k, which allows for a uniform treatment of the spatial and temporal components of the receptive field. The coefficients a_j model the dependence on the observed recent activity r at time t - j (these terms may reflect, e.g., refractory effects, burstiness, firing-rate adaptation, etc., depending on the value of the vector a [9]). For convenience we denote the unknown parameter vector as θ = {k; a}. The experimental objective is the estimation of the unknown filter coefficients, θ, given knowledge of the stimuli, x_t, and the resulting responses r_t. We chose the nonlinear stage of the GLM, the link function f(), to be the exponential function for simplicity. This choice ensures that the log likelihood of the observed data is a concave function of θ [9].

Representing and updating the posterior. As emphasized above, our first key task is to efficiently update the posterior distribution of θ after t trials, p(θ_t|x_t, r_t), as new stimulus-response pairs are observed. (We use x_t and r_t to abbreviate the sequences {x_t, . . . , x_0} and {r_t, . . . , r_0}.)

Figure 1: A) Plots of the estimated receptive field for a simulated visual neuron. The neuron's receptive field θ has the Gabor structure shown in the last panel (spike history effects were set to zero for simplicity here, a = 0). The estimate of θ is taken as the mean of the posterior, µ_t. The images compare the accuracy of the estimates using information-maximizing stimuli and random stimuli. B) Plots of the posterior entropies for θ in these two cases; note that the information-maximizing stimuli constrain the posterior of θ much more effectively than do random stimuli. C) A plot of the timing of the three steps performed on each iteration as a function of the dimensionality of θ. The timing for each step was well fit by a polynomial of degree 2 for the diagonalization, posterior update, and total time, and degree 1 for the line search. The times are an average over many iterations. The error bars for the total time indicate ±1 std.

To solve this problem, we approximate this posterior as a Gaussian; this approximation may be justified by the fact that the posterior is the product of two smooth, log-concave terms, the GLM likelihood function and the prior (which we assume to be Gaussian, for simplicity). Furthermore, the main theorem of [7] indicates that a Gaussian approximation of the posterior will be asymptotically accurate. We use a Laplace approximation to construct the Gaussian approximation of the posterior, p(θ_t|x_t, r_t): we set µ_t to the peak of the posterior (i.e., the maximum a posteriori (MAP) estimate of θ), and the covariance matrix C_t to the negative inverse of the Hessian of the log posterior at µ_t.
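As a concrete illustration of the GLM in Eqn. (1) with the exponential link, a toy simulation might look like the following sketch. This is our own code, not the authors'; the filter values, lag counts (t_k = 5, t_a = 3), and unit bin width are illustrative assumptions.

```python
import numpy as np

def glm_intensity(k, a, x_hist, r_hist):
    """Conditional intensity of Eqn. (1) with exponential link:
    lambda_t = exp( k . x_hist + a . r_hist ),
    where x_hist holds the last t_k stimulus values (most recent first)
    and r_hist the last t_a observed spike counts."""
    return np.exp(k @ x_hist + a @ r_hist)

rng = np.random.default_rng(0)

# toy 1-d stimulus filter with t_k = 5 lags and t_a = 3 history terms
k = np.array([0.5, 0.3, 0.1, 0.0, -0.1])
a = np.array([-1.0, -0.5, -0.2])   # inhibitory history terms (refractoriness)

T = 200
x = rng.standard_normal(T)
r = np.zeros(T)
for t in range(5, T):
    lam = glm_intensity(k, a, x[t-5:t][::-1], r[t-3:t][::-1])
    r[t] = rng.poisson(lam)        # spike count in bin t
```

The inhibitory history terms make the log likelihood depend on past spiking, which is exactly the coupling the spike-history coefficients a are meant to capture.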
In general, computing these terms directly requires O(td² + d³) time (where d = dim(θ); the time complexity increases with t because to compute the posterior we must form a product of t likelihood terms, and the d³ term is due to the inverse of the Hessian matrix), which is unfortunately too slow when t or d becomes large. Therefore we further approximate p(θ_{t-1}|x_{t-1}, r_{t-1}) as Gaussian; to see how this simplifies matters, we use Bayes' rule to write out the posterior:

$$\log p(\theta|r_t, x_t) = -\frac{1}{2}(\theta - \mu_{t-1})^T C_{t-1}^{-1} (\theta - \mu_{t-1}) - \exp\big(\{x_t; r_{t-1}\}^T \theta\big) + r_t \{x_t; r_{t-1}\}^T \theta + \mathrm{const}$$

$$\frac{d \log p(\theta|r_t, x_t)}{d\theta} = -(\theta - \mu_{t-1})^T C_{t-1}^{-1} + \big({-\exp(\{x_t; r_{t-1}\}^T \theta)} + r_t\big)\{x_t; r_{t-1}\}^T \qquad (2)$$

$$\frac{d^2 \log p(\theta|r_t, x_t)}{d\theta_i\, d\theta_j} = -C_{t-1}^{-1} - \exp(\{x_t; r_{t-1}\}^T \theta)\,\{x_t; r_{t-1}\}\{x_t; r_{t-1}\}^T \qquad (3)$$

Now, to update µ_t we only need to find the peak of a one-dimensional function (as opposed to a d-dimensional function); this follows by noting that the likelihood only varies along a single direction, {x_t; r_{t-1}}, as a function of θ. At the peak of the posterior, µ_t, the first term in the gradient must be parallel to {x_t; r_{t-1}} because the gradient is zero. Since C_{t-1} is non-singular, µ_t - µ_{t-1} must be parallel to C_{t-1}{x_t; r_{t-1}}. Therefore we just need to solve a one-dimensional problem to determine how much the mean changes in the direction C_{t-1}{x_t; r_{t-1}}; this requires only O(d²) time. Moreover, from the second-derivative term above it is clear that computing C_t requires just a rank-one matrix update of C_{t-1}, which can be evaluated in O(d²) time via the Woodbury matrix lemma. Thus this Gaussian approximation of p(θ_{t-1}|x_{t-1}, r_{t-1}) provides a large gain in efficiency; our simulations (data not shown) showed that, despite this improved efficiency, the loss in accuracy due to this approximation was minimal.

Deriving the (approximately) optimal stimulus.
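The one-dimensional reduction can be sketched numerically. In the sketch below (our own code; the scalar Newton iteration for the step size and the Sherman-Morrison form of the Woodbury update are our implementation choices), v stands for the combined regressor {x_t; r_{t-1}} and r for the observed count:

```python
import numpy as np

def update_posterior(mu, C, v, r, n_newton=25):
    """One-trial Laplace update of the Gaussian posterior N(mu, C) for an
    exponential-link GLM.  The MAP estimate moves only along C @ v (Eqn. 2),
    so the d-dimensional Newton step reduces to a scalar search, and the
    covariance update (Eqn. 3) is rank-one; the whole trial costs O(d^2)."""
    direction = C @ v
    s = float(v @ direction)           # v^T C v
    m = float(v @ mu)                  # v^T mu
    # maximize f(a) = -a^2 s / 2 + r (m + a s) - exp(m + a s) over scalar a
    a = 0.0
    for _ in range(n_newton):
        e = np.exp(m + a * s)
        grad = -a * s + r * s - s * e  # f'(a)
        hess = -s - s * s * e          # f''(a) < 0, so Newton is stable
        a -= grad / hess
    mu_new = mu + a * direction
    # rank-one covariance update via the Sherman-Morrison (Woodbury) identity
    gamma = np.exp(float(v @ mu_new))  # curvature of the new likelihood term
    C_new = C - np.outer(direction, direction) * (gamma / (1.0 + gamma * s))
    return mu_new, C_new
```

Only v^T C v, v^T µ, and C v are ever needed, so each trial costs O(d²), consistent with the complexity argument above.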
To simplify the derivation of our maximization strategy, we start by considering models in which the firing rate does not depend on past spiking, so θ = {k}. To choose the optimal stimulus for trial t + 1, we want to maximize the conditional mutual information

$$I(\theta; r_{t+1}|x_{t+1}, x_t, r_t) = H(\theta|x_t, r_t) - H(\theta|x_{t+1}, r_{t+1}) \qquad (4)$$

with respect to the stimulus x_{t+1}. The first term does not depend on x_{t+1}, so maximizing the information requires minimizing the conditional entropy

$$H(\theta|x_{t+1}, r_{t+1}) = \sum_{r_{t+1}} p(r_{t+1}|x_{t+1}) \int -\,p(\theta|r_{t+1}, x_{t+1}) \log p(\theta|r_{t+1}, x_{t+1})\, d\theta = E_{r_{t+1}|x_{t+1}} \log \det[C_{t+1}] + \mathrm{const.} \qquad (5)$$

We do not average the entropy of p(θ|r_{t+1}, x_{t+1}) over x_{t+1} because we are only interested in the conditional entropy for the particular x_{t+1} which will be presented next. The equality above is due to our Gaussian approximation of p(θ|x_{t+1}, r_{t+1}). Therefore, we need to minimize E_{r_{t+1}|x_{t+1}} log det[C_{t+1}] with respect to x_{t+1}. Since we set C_{t+1} to be the negative inverse Hessian of the log posterior, we have:

$$C_{t+1} = \big(C_t^{-1} + J_{\mathrm{obs}}(r_{t+1}, x_{t+1})\big)^{-1}, \qquad (6)$$

where J_obs is the observed Fisher information,

$$J_{\mathrm{obs}}(r_{t+1}, x_{t+1}) = -\frac{\partial^2 \log p(r_{t+1}|\varepsilon = x_{t+1}^T \theta)}{\partial \varepsilon^2}\; x_{t+1} x_{t+1}^T. \qquad (7)$$

Here we use the fact that for the GLM, the likelihood depends only on the dot product, ε = x_{t+1}^T θ. We can use the Woodbury lemma to evaluate the inverse:

$$C_{t+1} = C_t\Big(I + D(r_{t+1}, \varepsilon)\big(1 - D(r_{t+1}, \varepsilon)\, x_{t+1}^T C_t x_{t+1}\big)^{-1} x_{t+1} x_{t+1}^T C_t\Big) \qquad (8)$$

where D(r_{t+1}, ε) = ∂² log p(r_{t+1}|ε)/∂ε². Using some basic matrix identities,

$$\log \det[C_{t+1}] = \log \det[C_t] - \log\big(1 - D(r_{t+1}, \varepsilon)\, x_{t+1}^T C_t x_{t+1}\big) \qquad (9)$$

$$= \log \det[C_t] + D(r_{t+1}, \varepsilon)\, x_{t+1}^T C_t x_{t+1} + o\big(D(r_{t+1}, \varepsilon)\, x_{t+1}^T C_t x_{t+1}\big) \qquad (10)$$

Ignoring the higher-order terms, we need to minimize E_{r_{t+1}|x_{t+1}} D(r_{t+1}, ε) x_{t+1}^T C_t x_{t+1}. In our case, with f(θ^T x_{t+1}) = exp(θ^T x_{t+1}), we can use the moment-generating function of the multivariate Gaussian p(θ|x_t, r_t) to evaluate this expectation.

Figure 2: A comparison of parameter estimates using information-maximizing versus random stimuli for a model neuron whose conditional intensity depends on both the stimulus and the spike history. The images in the top row of A and B show the MAP estimate of θ after each trial as a row in the image. Intensity indicates the value of the coefficients. The true value of θ is shown in the second row of images. A) The estimated stimulus coefficients, k. B) The estimated spike history coefficients, a. C) The final estimates of the parameters after 800 trials: the dashed black line shows the true values, dark gray is the estimate using information-maximizing stimuli, and light gray is the estimate using random stimuli. Using our algorithm improved the estimates of both k and a.

After some algebra, we find that to maximize I(θ; r_{t+1}|x_{t+1}, x_t, r_t), we need to maximize

$$F(x_{t+1}) = \exp(x_{t+1}^T \mu_t)\, \exp\big(\tfrac{1}{2}\, x_{t+1}^T C_t x_{t+1}\big)\, x_{t+1}^T C_t x_{t+1}. \qquad (11)$$

Computing the optimal stimulus. For the GLM the most informative stimulus is undefined, since increasing the stimulus power ||x_{t+1}||² increases the informativeness of any putatively "optimal" stimulus. To obtain a well-posed problem, we optimize the stimulus under the usual power constraint ||x_{t+1}||² ≤ e < ∞. We maximize Eqn. 11 under this constraint using Lagrange multipliers and an eigendecomposition to reduce our original d-dimensional optimization problem to a one-dimensional problem. Expressing Eqn. 11 in terms of the eigenvectors of C_t yields:

$$F(x_{t+1}) = \exp\Big(\sum_i u_i y_i + \frac{1}{2}\sum_i c_i y_i^2\Big) \sum_i c_i y_i^2 \qquad (12)$$

$$= g\Big(\sum_i u_i y_i\Big)\, h\Big(\sum_i c_i y_i^2\Big) \qquad (13)$$

where u_i and y_i represent the projections of µ_t and x_{t+1} onto the i-th eigenvector, and c_i is the corresponding eigenvalue. To simplify notation we also introduce the functions g() and h(), which are monotonically strictly increasing functions implicitly defined by Eqn. 12.
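The objective in Eqn. (11) is cheap to evaluate for any candidate stimulus. A minimal sketch (our own code; the "best of N random candidates" helper is an illustrative baseline of our own, not the paper's exact method):

```python
import numpy as np

def info_objective(x, mu, C):
    """F(x) = exp(x^T mu) * exp(x^T C x / 2) * (x^T C x)   (Eqn. 11).
    Larger F means the stimulus x is expected to shrink the posterior
    entropy more, under the Gaussian posterior N(mu, C)."""
    q = float(x @ C @ x)
    return float(np.exp(x @ mu) * np.exp(0.5 * q) * q)

def best_of_random(mu, C, e=1.0, n_candidates=1000, seed=0):
    """Cheap baseline: score n random stimuli of power e and keep the best.
    (The exact method described in the text uses an eigendecomposition and
    a one-dimensional line search instead.)"""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_candidates, mu.size))
    X *= np.sqrt(e) / np.linalg.norm(X, axis=1, keepdims=True)
    scores = [info_objective(x, mu, C) for x in X]
    return X[int(np.argmax(scores))]
```

Note how F grows with both the projection onto the posterior mean (through g) and the projected posterior variance (through h), which is the trade-off the line search below resolves exactly.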
We maximize F(x_{t+1}) by breaking the problem into an inner and an outer problem: we fix the value of Σ_i u_i y_i and maximize h() subject to that constraint. A single line search over all possible values of Σ_i u_i y_i will then find the global maximum of F(). This approach is summarized by the equation:

$$\max_{y:\, \|y\|^2 = e} F(y) = \max_b\; g(b) \cdot \max_{y:\, \|y\|^2 = e,\; y^T u = b} h\Big(\sum_i c_i y_i^2\Big)$$

Since h() is increasing, to solve the inner problem we only need to solve:

$$\max_{y:\, \|y\|^2 = e,\; y^T u = b}\; \sum_i c_i y_i^2 \qquad (14)$$

This last expression is a quadratic function with quadratic and linear constraints, and we can solve it using the Lagrange method for constrained optimization. The result is an explicit system of equations for the optimal y_i as a function of the Lagrange multiplier λ₁:

$$y_i(\lambda_1) = \frac{\sqrt{e}}{\|y\|_2}\, \frac{u_i}{2(c_i - \lambda_1)} \qquad (15)$$

Thus to find the global optimum we simply vary λ₁ (this is equivalent to performing a search over b) and compute the corresponding y(λ₁). For each value of λ₁ we compute F(y(λ₁)) and choose the stimulus y(λ₁) which maximizes F(). It is possible to show (details omitted) that the maximum of F() must occur on the interval λ₁ ≥ c₀, where c₀ is the largest eigenvalue. This restriction on the optimal λ₁ makes the implementation of the line search significantly faster and more stable.

Figure 3: Estimating the receptive field when θ is not constant. A) The posterior means µ_t and true θ_t plotted after each trial. θ was 100-dimensional, with its components following a Gabor function. To simulate non-systematic changes in the response function, the center of the Gabor function was moved according to a random walk in between trials. We modeled the changes in θ as a random walk with a white covariance matrix, Q, with variance .01. In addition to the results for random and information-maximizing stimuli, we also show the µ_t given stimuli chosen to maximize the information under the (mistaken) assumption that θ was constant. Each row of the images plots θ using intensity to indicate the value of the different components. B) Details of the posterior means µ_t on selected trials. C) Plots of the posterior entropies as a function of trial number; once again, we see that information-maximizing stimuli constrain the posterior of θ_t more effectively.

To summarize, updating the posterior and finding the optimal stimulus requires three steps: 1) a rank-one matrix update and one-dimensional search to compute µ_t and C_t; 2) an eigendecomposition of C_t; 3) a one-dimensional search over λ₁ ≥ c₀ to compute the optimal stimulus. The most expensive step here is the eigendecomposition of C_t; in principle this step is O(d³), while the other steps, as discussed above, are O(d²). Here our Gaussian approximation of p(θ_{t-1}|x_{t-1}, r_{t-1}) is once again quite useful: recall that in this setting C_t is just a rank-one modification of C_{t-1}, and there exist efficient algorithms for rank-one eigendecomposition updates [15]. While the worst-case running time of this rank-one modification of the eigendecomposition is still O(d³), we found the average running time in our case to be O(d²) (Fig. 1(c)), due to deflation, which reduces the cost of the matrix multiplications associated with finding the eigenvectors of repeated eigenvalues. Therefore the total time complexity of our algorithm is empirically O(d²) on average.

Spike history terms. The preceding derivation ignored the spike-history components of the GLM model; that is, we fixed a = 0 in equation (1). Incorporating spike history terms only affects the optimization step of our algorithm; updating the posterior of θ = {k; a} proceeds exactly as before.
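The line search over λ₁ can be sketched as follows. This is our own grid-based implementation under stated assumptions: the exact search used in the paper is not spelled out, the grid over λ₁ is ours, we assume µ_t ≠ 0, and we take the sign branch of Eqn. (15) with u^T y ≥ 0 (flipping the sign of y leaves h() unchanged and can only decrease g()):

```python
import numpy as np

def optimal_stimulus(mu, C, e=1.0, n_grid=400):
    """Approximately maximize Eqn. (11) under the power constraint
    ||x||^2 <= e by a 1-d grid search over the Lagrange multiplier
    lam > c0 (cf. Eqn. 15), in the eigenbasis of C."""
    c, V = np.linalg.eigh(C)              # eigenvalues ascending, c0 = c[-1]
    u = V.T @ mu                          # projection of mu onto eigenvectors
    c0 = float(c[-1])
    best_F, best_y = -np.inf, None
    for lam in c0 + np.geomspace(1e-6, 1e3, n_grid):
        # Eqn. (15) up to normalization; sign chosen so that u^T y >= 0
        y = u / (2.0 * (lam - c))
        y *= np.sqrt(e) / np.linalg.norm(y)   # enforce ||y||^2 = e
        q = float(np.sum(c * y ** 2))
        F = np.exp(float(u @ y) + 0.5 * q) * q    # Eqn. (11) in the eigenbasis
        if F > best_F:
            best_F, best_y = F, y
    return V @ best_y                     # stimulus in the original basis
```

Each candidate y(λ₁) costs only O(d), so the search is dominated by the eigendecomposition, consistent with the complexity breakdown above.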
The derivation of the optimization strategy proceeds in a similar fashion and leads to an analogous optimization strategy, albeit with a few slight differences in detail, which we omit due to space constraints. The main difference is that instead of maximizing the quadratic expression in Eqn. 14 to find the maximum of h(), we need to maximize a quadratic expression which includes a linear term, due to the correlation between the stimulus coefficients, k, and the spike history coefficients, a. The results of our simulations with spike history terms are shown in Fig. 2.

Dynamic θ. In addition to fast changes due to adaptation and spike-history effects, animal preparations often change slowly and non-systematically over the course of an experiment [16]. We model these effects by letting θ experience diffusion:

$$\theta_{t+1} = \theta_t + w_t \qquad (16)$$

Here w_t is a normally distributed random variable with mean zero and known covariance matrix Q. This means that p(θ_{t+1}|x_t, r_t) is Gaussian with mean µ_t and covariance C_t + Q. To update the posterior and choose the optimal stimulus, we use the same procedure as described above (see footnote 1).

Results. Our first simulation considered the use of our algorithm for learning the receptive field of a visually sensitive neuron. We took the neuron's receptive field to be a Gabor function, as a proxy model of a V1 simple cell. We generated synthetic responses by sampling Eqn. 1 with θ set to a 25x33 Gabor function. We used this synthetic data to compare how well θ could be estimated using information-maximizing stimuli compared to using random stimuli. The stimuli were 2-d images which were rasterized in order to express x as a vector. The plots of the posterior means µ_t in Fig. 1 (recall these are equivalent to the MAP estimate of θ) show that the information-maximizing strategy converges an order of magnitude more rapidly to the true θ.
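The diffusion model of Eqn. (16) changes only the between-trial prediction step: the mean is carried over and the covariance inflates by Q. A minimal sketch (our own code), together with the white-diffusion special case Q = cI under which the eigendecomposition machinery carries over unchanged:

```python
import numpy as np

def predict_step(mu, C, Q):
    """Prediction under theta_{t+1} = theta_t + w_t, w_t ~ N(0, Q) (Eqn. 16):
    p(theta_{t+1} | x_t, r_t) is Gaussian with the same mean mu and
    inflated covariance C + Q."""
    return mu.copy(), C + Q

# With white diffusion Q = c*I, the eigenvectors of C + Q equal those of C
# and each eigenvalue simply shifts by c, so the rank-one eigendecomposition
# updates described earlier still apply without modification.
```

For general Q the posterior covariance is no longer a rank-one modification of its predecessor, which is why the white-Q case is the computationally convenient one.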
These results are supported by the conclusion of [7] that the information-maximization strategy is asymptotically never worse than using random stimuli and is in general more efficient. The running time for each step of the algorithm as a function of the dimensionality of θ is plotted in Fig. 1(c). These results were obtained on a machine with a dual-core Intel 2.80GHz XEON processor running Matlab. The solid lines indicate fitted polynomials of degree 1 for the 1-d line search and degree 2 for the remaining curves; the total running time for each trial scaled as O(d²), as predicted. When θ was less than 200-dimensional, the total running time was roughly 50 ms (and for dim(θ) ≈ 100, the runtime was close to 15 ms), well within the range of tolerable latencies for many experiments.

In Fig. 2 we apply our algorithm to characterize the receptive field of a neuron whose response depends on its past spiking. Here, the stimulus coefficients k were chosen to follow a sine wave; the spike history coefficients a were inhibitory and followed an exponential function. When choosing stimuli we updated the posterior for the full θ = {k; a} simultaneously and maximized the information about both the stimulus coefficients and the spike history coefficients. The information-maximizing strategy outperformed random sampling for estimating both the spike history and stimulus coefficients.

Footnote 1: The one difference is that the covariance matrix of p(θ_{t+1}|x_{t+1}, r_{t+1}) is in general no longer just a rank-one modification of the covariance matrix of p(θ_t|x_t, r_t); thus, we cannot use the rank-one update to compute the eigendecomposition. However, it is often reasonable to take Q to be white, Q = cI; in this case the eigenvectors of C_t + Q are those of C_t and the eigenvalues are c_i + c, where c_i is the i-th eigenvalue of C_t; thus in this case our methods may be applied without modification.

Our final set of results, Fig. 3, considers a neuron whose receptive field drifts non-systematically with time. We take the receptive field to be a Gabor function whose center moves according to a random walk (we have in mind a slow random drift of eye position during a visual experiment). The results demonstrate the feasibility of the information-maximization strategy in the presence of nonstationary response properties θ, and they emphasize the superiority of adaptive methods in this context.

Conclusion. We have developed an efficient implementation of an algorithm for online optimization of neurophysiology experiments based on an information-theoretic criterion. Reasonable approximations based on a GLM framework allow the algorithm to run in near-real time, even for high-dimensional parameter and stimulus spaces, and in the presence of spike-rate adaptation and time-varying neural response properties. Despite these approximations, the algorithm consistently provides significant improvements over random sampling; indeed, the differences in efficiency are large enough that the information-optimization strategy may permit robust system identification in cases where it is simply not otherwise feasible to estimate the neuron's parameters using random stimuli. Thus, in a sense, the proposed stimulus-optimization technique significantly extends the reach and power of classical neurophysiology methods.

Acknowledgments. JL is supported by the Computational Science Graduate Fellowship Program administered by the DOE under contract DE-FG02-97ER25308 and by the NSF IGERT Program in Hybrid Neural Microsystems at Georgia Tech via grant number DGE-0333411. LP is supported by grant EY018003 from the NEI and by a Gatsby Foundation Pilot Grant. We thank P. Latham for helpful conversations.

References
[1] I. Nelken, et al., Hearing Research 72, 237 (1994).
[2] P. Foldiak, Neurocomputing 38–40, 1217 (2001).
[3] K. Zhang, et al., Proceedings (Computational and Systems Neuroscience Meeting, 2004).
[4] R. C. deCharms, et al., Science 280, 1439 (1998).
[5] C. Machens, et al., Neuron 47, 447 (2005).
[6] A. Watson, et al., Perception and Psychophysics 33, 113 (1983).
[7] L. Paninski, Neural Computation 17, 1480 (2005).
[8] P. McCullagh, et al., Generalized Linear Models (Chapman and Hall, London, 1989).
[9] L. Paninski, Network: Computation in Neural Systems 15, 243 (2004).
[10] E. Simoncelli, et al., The Cognitive Neurosciences, M. Gazzaniga, ed. (MIT Press, 2004), third edn.
[11] P. Dayan, et al., Theoretical Neuroscience (MIT Press, 2001).
[12] E. Chichilnisky, Network: Computation in Neural Systems 12, 199 (2001).
[13] F. Theunissen, et al., Network: Computation in Neural Systems 12, 289 (2001).
[14] L. Paninski, et al., Journal of Neuroscience 24, 8551 (2004).
[15] M. Gu, et al., SIAM Journal on Matrix Analysis and Applications 15, 1266 (1994).
[16] N. A. Lesica, et al., IEEE Trans. on Neural Systems and Rehabilitation Engineering 13, 194 (2005).

4 0.26905856 197 nips-2006-Uncertainty, phase and oscillatory hippocampal recall

Author: Máté Lengyel, Peter Dayan

Abstract: Many neural areas, notably, the hippocampus, show structured, dynamical, population behavior such as coordinated oscillations. It has long been observed that such oscillations provide a substrate for representing analog information in the firing phases of neurons relative to the underlying population rhythm. However, it has become increasingly clear that it is essential for neural populations to represent uncertainty about the information they capture, and the substantial recent work on neural codes for uncertainty has omitted any analysis of oscillatory systems. Here, we observe that, since neurons in an oscillatory network need not only fire once in each cycle (or even at all), uncertainty about the analog quantities each neuron represents by its firing phase might naturally be reported through the degree of concentration of the spikes that it fires. We apply this theory to memory in a model of oscillatory associative recall in hippocampal area CA3. Although it is not well treated in the literature, representing and manipulating uncertainty is fundamental to competent memory; our theory enables us to view CA3 as an effective uncertainty-aware, retrieval system.

5 0.22981919 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

Author: Peter Auer, Ronald Ortner

Abstract: We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds for the algorithm's online performance after some finite number of steps. In the spirit of similar methods already successfully applied for the exploration-exploitation tradeoff in multi-armed bandit problems, we use upper confidence bounds to show that our UCRL algorithm achieves logarithmic online regret in the number of steps taken with respect to an optimal policy.

6 0.2160465 36 nips-2006-Attentional Processing on a Spike-Based VLSI Neural Network

7 0.21175699 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics

8 0.20302029 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons

9 0.20003022 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron

10 0.1982514 59 nips-2006-Context dependent amplification of both rate and event-correlation in a VLSI network of spiking neurons

11 0.18952286 44 nips-2006-Bayesian Policy Gradient Algorithms

12 0.13990939 18 nips-2006-A selective attention multi--chip system with dynamic synapses and spiking neurons

13 0.13152044 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

14 0.12628786 38 nips-2006-Automated Hierarchy Discovery for Planning in Partially Observable Environments

15 0.12472477 145 nips-2006-Neurophysiological Evidence of Cooperative Mechanisms for Stereo Computation

16 0.11309222 203 nips-2006-implicit Online Learning with Kernels

17 0.11271274 17 nips-2006-A recipe for optimizing a time-histogram

18 0.10605746 50 nips-2006-Chained Boosting

19 0.1052302 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections

20 0.10353222 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.286), (1, -0.435), (2, -0.284), (3, -0.008), (4, -0.006), (5, -0.003), (6, 0.024), (7, -0.14), (8, 0.154), (9, -0.025), (10, 0.024), (11, 0.037), (12, 0.06), (13, 0.004), (14, -0.014), (15, 0.059), (16, -0.023), (17, -0.053), (18, 0.041), (19, -0.064), (20, 0.009), (21, 0.047), (22, 0.051), (23, -0.001), (24, 0.009), (25, 0.04), (26, -0.044), (27, -0.039), (28, -0.085), (29, -0.043), (30, -0.039), (31, 0.025), (32, -0.013), (33, -0.008), (34, 0.042), (35, 0.036), (36, 0.005), (37, -0.038), (38, -0.049), (39, 0.08), (40, -0.051), (41, 0.019), (42, 0.028), (43, 0.066), (44, 0.056), (45, 0.024), (46, -0.079), (47, -0.036), (48, 0.124), (49, -0.07)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97205484 154 nips-2006-Optimal Change-Detection and Spiking Neurons

Author: Angela J. Yu

Abstract: Survival in a non-stationary, potentially adversarial environment requires animals to detect sensory changes rapidly yet accurately, two oft competing desiderata. Neurons subserving such detections are faced with the corresponding challenge to discern "real" changes in inputs as quickly as possible, while ignoring noisy fluctuations. Mathematically, this is an example of a change-detection problem that is actively researched in the controlled stochastic processes community. In this paper, we utilize sophisticated tools developed in that community to formalize an instantiation of the problem faced by the nervous system, and characterize the Bayes-optimal decision policy under certain assumptions. We will derive from this optimal strategy an information accumulation and decision process that remarkably resembles the dynamics of a leaky integrate-and-fire neuron. This correspondence suggests that neurons are optimized for tracking input changes, and sheds new light on the computational import of intracellular properties such as resting membrane potential, voltage-dependent conductance, and post-spike reset voltage. We also explore the influence that factors such as timing, uncertainty, neuromodulation, and reward should and do have on neuronal dynamics and sensitivity, as the optimal decision strategy depends critically on these factors.

2 0.67752647 197 nips-2006-Uncertainty, phase and oscillatory hippocampal recall

Author: Máté Lengyel, Peter Dayan

Abstract: Many neural areas, notably, the hippocampus, show structured, dynamical, population behavior such as coordinated oscillations. It has long been observed that such oscillations provide a substrate for representing analog information in the firing phases of neurons relative to the underlying population rhythm. However, it has become increasingly clear that it is essential for neural populations to represent uncertainty about the information they capture, and the substantial recent work on neural codes for uncertainty has omitted any analysis of oscillatory systems. Here, we observe that, since neurons in an oscillatory network need not only fire once in each cycle (or even at all), uncertainty about the analog quantities each neuron represents by its firing phase might naturally be reported through the degree of concentration of the spikes that it fires. We apply this theory to memory in a model of oscillatory associative recall in hippocampal area CA3. Although it is not well treated in the literature, representing and manipulating uncertainty is fundamental to competent memory; our theory enables us to view CA3 as an effective uncertainty-aware, retrieval system.

3 0.66422206 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons

Author: Thomas Voegtlin

Abstract: In biological neurons, the timing of a spike depends on the timing of synaptic currents, in a way that is classically described by the Phase Response Curve. This has implications for temporal coding: an action potential that arrives on a synapse has an implicit meaning, that depends on the position of the postsynaptic neuron on the firing cycle. Here we show that this implicit code can be used to perform computations. Using theta neurons, we derive a spike-timing dependent learning rule from an error criterion. We demonstrate how to train an auto-encoder neural network using this rule.

4 0.65848565 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

Author: Jeremy Lewi, Robert Butera, Liam Paninski

Abstract: Adaptively optimizing experiments can significantly reduce the number of trials needed to characterize neural responses using parametric statistical models. However, the potential for these methods has been limited to date by severe computational challenges: choosing the stimulus which will provide the most information about the (typically high-dimensional) model parameters requires evaluating a high-dimensional integration and optimization in near-real time. Here we present a fast algorithm for choosing the optimal (most informative) stimulus based on a Fisher approximation of the Shannon information and specialized numerical linear algebra techniques. This algorithm requires only low-rank matrix manipulations and a one-dimensional linesearch to choose the stimulus and is therefore efficient even for high-dimensional stimulus and parameter spaces; for example, we require just 15 milliseconds on a desktop computer to optimize a 100-dimensional stimulus. Our algorithm therefore makes real-time adaptive experimental design feasible. Simulation results show that model parameters can be estimated much more efficiently using these adaptive techniques than by using random (nonadaptive) stimuli. Finally, we generalize the algorithm to efficiently handle both fast adaptation due to spike-history effects and slow, non-systematic drifts in the model parameters.

Maximizing the efficiency of data collection is important in any experimental setting. In neurophysiology experiments, minimizing the number of trials needed to characterize a neural system is essential for maintaining the viability of a preparation and ensuring robust results. As a result, various approaches have been developed to optimize neurophysiology experiments online in order to choose the "best" stimuli given prior knowledge of the system and the observed history of the cell's responses. The "best" stimulus can be defined a number of different ways depending on the experimental objectives.
One reasonable choice, if we are interested in finding a neuron's "preferred stimulus," is the stimulus which maximizes the firing rate of the neuron [1, 2, 3, 4]. Alternatively, when investigating the coding properties of sensory cells it makes sense to define the optimal stimulus in terms of the mutual information between the stimulus and response [5]. Here we take a system identification approach: we define the optimal stimulus as the one which tells us the most about how a neural system responds to its inputs [6, 7]. We consider neural systems in which the probability p(rt |{xt , xt−1 , ..., xt−tk }, {rt−1 , . . . , rt−ta }) of the neural response rt given the current and past stimuli {xt , xt−1 , ..., xt−tk }, and the observed recent history of the neuron's activity, {rt−1 , . . . , rt−ta }, can be described by a model p(rt |{xt }, {rt−1 }, θ), specified by a finite vector of parameters θ. Since we estimate these parameters from experimental trials, we want to choose our stimuli so as to minimize the number of trials needed to robustly estimate θ. Two inconvenient facts make it difficult to realize this goal in a computationally efficient manner: 1) model complexity — we typically need a large number of parameters to accurately model a system's response p(rt |{xt }, {rt−1 }, θ); and 2) stimulus complexity — we are typically interested in neural responses to stimuli xt which are themselves very high-dimensional (e.g., spatiotemporal movies if we are dealing with visual neurons). In particular, it is computationally challenging to 1) update our a posteriori beliefs about the model parameters p(θ|{rt }, {xt }) given new stimulus-response data, and 2) find the optimal stimulus quickly enough to be useful in an online experimental context.
In this work we present methods for solving these problems using generalized linear models (GLM) for the input-output relationship p(rt |{xt }, {rt−1 }, θ) and certain Gaussian approximations of the posterior distribution of the model parameters. Our emphasis is on finding solutions which scale well in high dimensions. We solve problem (1) by using efficient rank-one update methods to update the Gaussian approximation to the posterior, and problem (2) by a reduction to a highly tractable onedimensional optimization problem. Simulation results show that the resulting algorithm produces a set of stimulus-response pairs which is much more informative than the set produced by random sampling. Moreover, the algorithm is efficient enough that it could feasibly run in real-time. Neural systems are highly adaptive and more generally nonstatic. A robust approach to optimal experimental design must be able to cope with changes in θ. We emphasize that the model framework analyzed here can account for three key types of changes: stimulus adaptation, spike rate adaptation, and random non-systematic changes. Adaptation which is completely stimulus dependent can be accounted for by including enough stimulus history terms in the model p(rt |{xt , ..., xt−tk }, {rt−1 , ..., rt−ta }). Spike-rate adaptation effects, and more generally spike history-dependent effects, are accounted for explicitly in the model (1) below. Finally, we consider slow, non-systematic changes which could potentially be due to changes in the health, arousal, or attentive state of the preparation. Methods We model a neuron as a point process whose conditional intensity function (instantaneous firing rate) is given as the output of a generalized linear model (GLM) [8, 9]. 
This model class has been discussed extensively elsewhere; briefly, it is fairly natural from a physiological point of view [10], has close connections to biophysical models such as the integrate-and-fire cell [9], and has been applied in a wide variety of experimental settings [11, 12, 13, 14]. The model is summarized as:

λt = E(rt) = f( Σi Σ{l=1..tk} k{i,t−l} x{i,t−l} + Σ{j=1..ta} aj r{t−j} )    (1)

In the above summation the filter coefficients k{i,t−l} capture the dependence of the neuron's instantaneous firing rate λt on the ith component of the vector stimulus at time t − l, x{t−l}; the model therefore allows for spatiotemporal receptive fields. For convenience, we arrange all the stimulus coefficients in a vector, k, which allows for a uniform treatment of the spatial and temporal components of the receptive field. The coefficients aj model the dependence on the observed recent activity r at time t − j (these terms may reflect, e.g., refractory effects, burstiness, firing-rate adaptation, etc., depending on the value of the vector a [9]). For convenience we denote the unknown parameter vector as θ = {k; a}. The experimental objective is the estimation of the unknown filter coefficients θ, given knowledge of the stimuli xt and the resulting responses rt. We chose the nonlinear stage of the GLM, the link function f(·), to be the exponential function for simplicity; this choice ensures that the log-likelihood of the observed data is a concave function of θ [9].

Representing and updating the posterior. As emphasized above, our first key task is to efficiently update the posterior distribution of θ after t trials, p(θt | xt, rt), as new stimulus-response pairs are observed. (We use xt and rt to abbreviate the sequences {xt, ..., x0} and {rt, ..., r0}.)

[Figure 1: A) Estimated receptive field of a simulated visual neuron. The true receptive field θ has the Gabor structure shown in the last panel (spike-history effects were set to zero here, a = 0); the estimate of θ is the posterior mean µt. The panels compare the accuracy of estimates obtained with information-maximizing versus random stimuli. B) Posterior entropies of θ in the two cases; information-maximizing stimuli constrain the posterior of θ much more effectively than random stimuli. C) Timing of the three per-iteration steps as a function of the dimensionality of θ. The diagonalization, posterior update, and total time were well fit by polynomials of degree 2, and the line search by degree 1; times are averages over many iterations, with error bars on the total time indicating ±1 std.]

To solve this problem, we approximate the posterior as a Gaussian; this approximation may be justified by the fact that the posterior is the product of two smooth, log-concave terms: the GLM likelihood function and the prior (which we assume to be Gaussian, for simplicity). Furthermore, the main theorem of [7] indicates that a Gaussian approximation of the posterior will be asymptotically accurate. We use a Laplace approximation to construct this Gaussian approximation of p(θt | xt, rt): we set µt to the peak of the posterior (i.e., the maximum a posteriori (MAP) estimate of θ), and the covariance matrix Ct to the negative inverse of the Hessian of the log-posterior at µt.
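As an illustration of the model class, here is a minimal numerical sketch of the conditional-intensity computation in Eqn. 1 with the exponential link. All filter values, dimensions, and the 1 ms bin width are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def glm_intensity(k, a, x_hist, r_hist):
    """Conditional intensity lambda_t = exp(k . x_hist + a . r_hist), Eqn. 1
    with f = exp. k, x_hist: stimulus filter and recent (flattened) stimuli;
    a, r_hist: spike-history filter and recent spike counts."""
    return np.exp(k @ x_hist + a @ r_hist)

# Toy example: 5-tap stimulus filter, 3-tap inhibitory spike-history filter.
k = np.array([0.5, 0.3, 0.1, -0.1, -0.2])
a = np.array([-1.0, -0.5, -0.2])      # negative: refractory-like suppression
x_hist = rng.standard_normal(5)
r_hist = np.array([1.0, 0.0, 0.0])    # one spike in the most recent bin

lam = glm_intensity(k, a, x_hist, r_hist)
dt = 0.001                            # 1 ms bin
spike = rng.random() < lam * dt       # Bernoulli approximation of the point process
```

Note how a negative spike-history filter a lowers the rate immediately after a spike, which is how the model captures refractoriness and adaptation.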
In general, computing these terms directly requires O(td² + d³) time (where d = dim(θ); the time complexity grows with t because computing the posterior requires forming a product of t likelihood terms, and the d³ term comes from inverting the Hessian), which is unfortunately too slow when t or d becomes large. Therefore we further approximate p(θt−1 | xt−1, rt−1) as Gaussian; to see how this simplifies matters, we use Bayes' rule to write out the log-posterior:

log p(θ | rt, xt) = −½ (θ − µt−1)ᵀ Ct−1^(−1) (θ − µt−1) − exp({xt; rt−1}ᵀθ) + rt {xt; rt−1}ᵀθ + const

d log p(θ | rt, xt) / dθ = −(θ − µt−1)ᵀ Ct−1^(−1) + (−exp({xt; rt−1}ᵀθ) + rt) {xt; rt−1}ᵀ    (2)

d² log p(θ | rt, xt) / dθi dθj = −Ct−1^(−1) − exp({xt; rt−1}ᵀθ) {xt; rt−1}{xt; rt−1}ᵀ    (3)

Now, to update µt we only need to find the peak of a one-dimensional function (as opposed to a d-dimensional function); this follows by noting that the likelihood varies only along a single direction, {xt; rt−1}, as a function of θ. At the peak µt the gradient is zero, so the first term in the gradient must be parallel to {xt; rt−1}; since Ct−1 is non-singular, µt − µt−1 must therefore be parallel to Ct−1{xt; rt−1}. We thus only need to solve a one-dimensional problem to determine how far the mean moves along the direction Ct−1{xt; rt−1}, which requires only O(d²) time. Moreover, the second-derivative term above shows that computing Ct requires just a rank-one matrix update of Ct−1, which can be evaluated in O(d²) time via the Woodbury matrix lemma. Thus this Gaussian approximation of p(θt−1 | xt−1, rt−1) provides a large gain in efficiency; our simulations (data not shown) showed that, despite this improved efficiency, the loss in accuracy due to this approximation was minimal.

Deriving the (approximately) optimal stimulus.
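The O(d²) posterior update, a one-dimensional search along Ct−1{xt; rt−1} for the new mean followed by a Woodbury rank-one covariance update, can be sketched as below. This is a hedged reconstruction rather than the authors' code; in particular, the Newton solve for the 1-D mean shift and the iteration count are implementation choices:

```python
import numpy as np

def posterior_update(mu, C, u, r, newton_iters=20):
    """One Laplace-posterior update for the exponential-link GLM.

    mu, C : prior Gaussian approximation N(mu, C) of theta
    u     : concatenated regressor {x_t; r_{t-1}}
    r     : observed spike count
    Returns the updated (mu, C) in O(d^2) time."""
    v = C @ u                    # search direction C_{t-1} u
    q = float(u @ v)             # u^T C u
    c0 = float(u @ mu)
    # The gradient at mu + s*v is (r - s - exp(c0 + s*q)) * u, so the MAP shift s
    # solves the monotone scalar equation  r - s - exp(c0 + s*q) = 0 (Newton):
    s = 0.0
    for _ in range(newton_iters):
        f = r - s - np.exp(c0 + s * q)
        fp = -1.0 - q * np.exp(c0 + s * q)
        s -= f / fp
    mu_new = mu + s * v
    # Woodbury rank-one covariance update: (C^-1 + D u u^T)^-1, D = exp(u^T mu_new)
    D = np.exp(float(u @ mu_new))
    C_new = C - (D / (1.0 + D * q)) * np.outer(v, v)
    return mu_new, C_new
```

The rank-one structure is what keeps the whole update quadratic rather than cubic in d.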
To simplify the derivation of our maximization strategy, we start by considering models in which the firing rate does not depend on past spiking, so θ = {k}. To choose the optimal stimulus for trial t + 1, we want to maximize the conditional mutual information

I(θ; rt+1 | xt+1, xt, rt) = H(θ | xt, rt) − H(θ | xt+1, rt+1)    (4)

with respect to the stimulus xt+1. The first term does not depend on xt+1, so maximizing the information requires minimizing the conditional entropy

H(θ | xt+1, rt+1) = Σ{rt+1} p(rt+1 | xt+1) ∫ −p(θ | rt+1, xt+1) log p(θ | rt+1, xt+1) dθ = E{rt+1|xt+1} log det[Ct+1] + const.    (5)

We do not average the entropy of p(θ | rt+1, xt+1) over xt+1 because we are only interested in the conditional entropy for the particular xt+1 that will be presented next. The equality above follows from our Gaussian approximation of p(θ | xt+1, rt+1). Therefore, we need to minimize E{rt+1|xt+1} log det[Ct+1] with respect to xt+1. Since we set Ct+1 to be the negative inverse Hessian of the log-posterior, we have

Ct+1 = ( Ct^(−1) + Jobs(rt+1, xt+1) )^(−1),    (6)

where Jobs is the observed Fisher information,

Jobs(rt+1, xt+1) = −∂² log p(rt+1 | ε = xt+1ᵀθ)/∂ε² · xt+1 xt+1ᵀ.    (7)

Here we use the fact that for the GLM, the likelihood depends only on the dot product ε = xt+1ᵀθ. We can use the Woodbury lemma to evaluate the inverse:

Ct+1 = Ct [ I + D(rt+1, ε)(1 − D(rt+1, ε) xt+1ᵀ Ct xt+1)^(−1) xt+1 xt+1ᵀ Ct ]    (8)

where D(rt+1, ε) = ∂² log p(rt+1 | ε)/∂ε². Using some basic matrix identities,

log det[Ct+1] = log det[Ct] − log(1 − D(rt+1, ε) xt+1ᵀ Ct xt+1)    (9)
             = log det[Ct] + D(rt+1, ε) xt+1ᵀ Ct xt+1 + o(D(rt+1, ε) xt+1ᵀ Ct xt+1)    (10)

Ignoring the higher-order terms, we need to minimize E{rt+1|xt+1} D(rt+1, ε) xt+1ᵀ Ct xt+1. In our case, with f(θᵀxt+1) = exp(θᵀxt+1), we can use the moment-generating function of the multivariate Gaussian p(θ | xt, rt) to evaluate this expectation. After some algebra, we find that to maximize I(θ; rt+1 | xt+1, xt, rt), we need to maximize

F(xt+1) = exp(xt+1ᵀ µt) exp(½ xt+1ᵀ Ct xt+1) xt+1ᵀ Ct xt+1.    (11)

[Figure 2: A comparison of parameter estimates using information-maximizing versus random (i.i.d.) stimuli for a model neuron whose conditional intensity depends on both the stimulus and the spike history. The top rows of A and B show the MAP estimate of θ after each trial, one row per trial, with intensity indicating coefficient values; the true θ is shown in the second row. A) The estimated stimulus coefficients k. B) The estimated spike-history coefficients a. C) The final estimates after 800 trials: the dashed black line shows the true values, dark gray the estimate using information-maximizing stimuli, and light gray the estimate using random stimuli. Our algorithm improved the estimates of both k and a.]

Computing the optimal stimulus. For the GLM the most informative stimulus is undefined, since increasing the stimulus power ||xt+1||² increases the informativeness of any putatively "optimal" stimulus. To obtain a well-posed problem, we optimize the stimulus under the usual power constraint ||xt+1||² ≤ e < ∞. We maximize Eqn. 11 under this constraint using Lagrange multipliers and an eigendecomposition, which reduces the original d-dimensional optimization problem to a one-dimensional one. Expressing Eqn. 11 in terms of the eigenvectors of Ct yields

F(xt+1) = exp( Σi ui yi + ½ Σi ci yi² ) Σi ci yi²    (12)
        = g( Σi ui yi ) h( Σi ci yi² )    (13)

where ui and yi denote the projections of µt and xt+1 onto the ith eigenvector and ci is the corresponding eigenvalue. To simplify notation we also introduce the functions g(·) and h(·), two monotonically strictly increasing functions implicitly defined by Eqn. 12.
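The objective of Eqn. 11 is cheap to evaluate, so even before introducing the eigendecomposition shortcut one can score and compare candidate stimuli directly. A toy sketch (the values of µt, Ct, and the candidate count are arbitrary assumptions for illustration):

```python
import numpy as np

def info_objective(x, mu, C):
    """Informativeness surrogate F(x) = exp(x.mu) exp(x^T C x / 2) (x^T C x), Eqn. 11."""
    xCx = float(x @ C @ x)
    return float(np.exp(x @ mu)) * np.exp(0.5 * xCx) * xCx

# Greedy selection among random candidates on the power sphere ||x||^2 = e.
rng = np.random.default_rng(1)
mu = np.array([0.3, -0.2, 0.1])   # current posterior mean (toy values)
C = 0.2 * np.eye(3)               # current posterior covariance (toy values)
e = 1.0                           # stimulus power budget
cands = rng.standard_normal((50, 3))
cands *= np.sqrt(e) / np.linalg.norm(cands, axis=1, keepdims=True)
best = max(cands, key=lambda x: info_objective(x, mu, C))
```

This brute-force scoring scales as O(d²) per candidate; the eigendecomposition reduction below replaces the candidate sweep with a single one-dimensional search.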
We maximize F(xt+1) by breaking the problem into an inner and an outer problem: we fix the value of Σi ui yi and maximize h(·) subject to that constraint. A single line search over all possible values of Σi ui yi then finds the global maximum of F(·). This approach is summarized by the equation:

max{y: ||y||²=e} F(y) = max{b} g(b) · max{y: ||y||²=e, yᵀu=b} h( Σi ci yi² )

Since h(·) is increasing, to solve the inner problem we only need to solve

max{y: ||y||²=e, yᵀu=b} Σi ci yi²    (14)

This last expression is a quadratic function with quadratic and linear constraints, and we can solve it using the Lagrange method for constrained optimization. The result is an explicit system of equations for the optimal yi as a function of the Lagrange multiplier λ1:

yi(λ1) = (√e / ||y||₂) · ui / (2(ci − λ1))    (15)

Thus to find the global optimum we simply vary λ1 (this is equivalent to performing the search over b) and compute the corresponding y(λ1). For each value of λ1 we compute F(y(λ1)) and choose the stimulus y(λ1) that maximizes F(·). It is possible to show (details omitted) that the maximum of F(·) must occur on the interval λ1 ≥ c0, where c0 is the largest eigenvalue. This restriction on the optimal λ1 makes the implementation of the line search significantly faster and more stable.

[Figure 3: Estimating the receptive field when θ is not constant. A) The posterior means µt and true θt plotted after each trial. θ was 100-dimensional, with components following a Gabor function; to simulate non-systematic changes in the response function, the center of the Gabor function moved according to a random walk between trials, modeled as a random walk with a white covariance matrix Q of variance 0.01. In addition to the results for random and information-maximizing stimuli, we also show µt for stimuli chosen to maximize the information under the (mistaken) assumption that θ was constant. Each row of the images plots θ, with intensity indicating the values of its components. B) Details of the posterior means µt on selected trials. C) The posterior entropies as a function of trial number; once again, information-maximizing stimuli constrain the posterior of θt more effectively.]

To summarize, updating the posterior and finding the optimal stimulus requires three steps: 1) a rank-one matrix update and a one-dimensional search to compute µt and Ct; 2) an eigendecomposition of Ct; 3) a one-dimensional search over λ1 ≥ c0 to compute the optimal stimulus. The most expensive step is the eigendecomposition of Ct; in principle this step is O(d³), while the other steps, as discussed above, are O(d²). Here our Gaussian approximation of p(θt−1 | xt−1, rt−1) is once again quite useful: recall that in this setting Ct is just a rank-one modification of Ct−1, and there exist efficient algorithms for rank-one eigendecomposition updates [15]. While the worst-case running time of this rank-one update is still O(d³), we found the average running time in our case to be O(d²) (Fig. 1C), due to deflation, which reduces the cost of the matrix multiplications associated with finding the eigenvectors of repeated eigenvalues. The total time complexity of our algorithm is therefore empirically O(d²) on average.

Spike history terms. The preceding derivation ignored the spike-history components of the GLM model; that is, we fixed a = 0 in equation (1). Incorporating spike-history terms only affects the optimization step of our algorithm; updating the posterior of θ = {k; a} proceeds exactly as before.
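A numerical sketch of the one-dimensional search over λ1 in the eigenbasis of Ct (Eqn. 15). The grid of multipliers, the sign handling for the stationary direction, and the explicit normalization onto the power sphere are implementation assumptions, not specified by the paper:

```python
import numpy as np

def optimal_stimulus(mu, C, e, n_grid=200):
    """1-D line search over the Lagrange multiplier lambda_1 > c_max (Eqn. 15 sketch).
    Returns a power-constrained stimulus (||x||^2 = e) maximizing F of Eqn. 11."""
    c, V = np.linalg.eigh(C)           # C = V diag(c) V^T, c ascending
    u = V.T @ mu                       # mu projected onto the eigenbasis
    c_max = c[-1]
    best_F, best_y = -np.inf, None
    # Multipliers approaching c_max from above (the interval where the max lies).
    for lam in c_max + np.logspace(-3, 2, n_grid):
        y = u / (2.0 * (c - lam))      # stationary direction, up to scale
        n = np.linalg.norm(y)
        if n == 0.0:
            continue
        y *= np.sqrt(e) / n            # project onto the power sphere
        for cand in (y, -y):           # sign is free under the quadratic constraint
            xCx = float(cand @ (c * cand))
            F = np.exp(float(cand @ u)) * np.exp(0.5 * xCx) * xCx
            if F > best_F:
                best_F, best_y = F, cand
    return V @ best_y, best_F
```

Because the candidate family is a one-dimensional curve, this search costs O(d) per grid point once the eigendecomposition is in hand.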
The derivation of the optimization strategy proceeds in a similar fashion and leads to an analogous strategy, albeit with a few slight differences in detail, which we omit due to space constraints. The main difference is that instead of maximizing the quadratic expression in Eqn. 14 to find the maximum of h(·), we need to maximize a quadratic expression that includes a linear term arising from the correlation between the stimulus coefficients k and the spike-history coefficients a. The results of our simulations with spike-history terms are shown in Fig. 2.

Dynamic θ. In addition to fast changes due to adaptation and spike-history effects, animal preparations often change slowly and non-systematically over the course of an experiment [16]. We model these effects by letting θ diffuse:

θt+1 = θt + wt    (16)

Here wt is a normally distributed random variable with mean zero and known covariance matrix Q. This means that p(θt+1 | xt, rt) is Gaussian with mean µt and covariance Ct + Q. To update the posterior and choose the optimal stimulus, we use the same procedure as described above.¹

Results. Our first simulation considered the use of our algorithm for learning the receptive field of a visually sensitive neuron. We took the neuron's receptive field to be a Gabor function, as a proxy model of a V1 simple cell, and generated synthetic responses by sampling Eqn. 1 with θ set to a 25×33 Gabor function. We used this synthetic data to compare how well θ could be estimated using information-maximizing stimuli versus random stimuli. The stimuli were 2-D images, rasterized so that x could be expressed as a vector. The plots of the posterior means µt in Fig. 1 (recall these are equivalent to the MAP estimates of θ) show that the information-maximizing strategy converges an order of magnitude more rapidly to the true θ.
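Returning to the dynamic-θ model (Eqn. 16): the diffusion step only inflates the prior covariance before each trial, and when Q is white, Q = cI, the eigenvectors of Ct are preserved while each eigenvalue shifts by c, so cached eigendecompositions remain usable. A minimal sketch of that prior-inflation step:

```python
import numpy as np

def diffuse_prior(mu, C, c_diff):
    """Predictive prior for trial t+1 under theta_{t+1} = theta_t + w_t, Q = c_diff*I:
    Gaussian with unchanged mean and covariance C + Q. Because Q is a multiple of
    the identity, C + Q shares C's eigenvectors and every eigenvalue grows by c_diff."""
    return mu, C + c_diff * np.eye(C.shape[0])
```

For general (non-white) Q the covariance update is no longer a rank-one perturbation and the eigendecomposition must be recomputed, as the discussion of the dynamic case notes.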
These results are supported by the conclusion of [7] that the information-maximization strategy is asymptotically never worse than random stimuli and is in general more efficient. The running time of each step of the algorithm as a function of the dimensionality of θ is plotted in Fig. 1(c). These results were obtained on a machine with a dual-core Intel 2.80 GHz Xeon processor running Matlab. The solid lines indicate fitted polynomials of degree 1 for the 1-D line search and degree 2 for the remaining curves; the total running time per trial scaled as O(d²), as predicted. When θ had fewer than 200 dimensions, the total running time was roughly 50 ms (and for dim(θ) ≈ 100, close to 15 ms), well within the range of tolerable latencies for many experiments.

In Fig. 2 we apply our algorithm to characterize the receptive field of a neuron whose response depends on its past spiking. Here the stimulus coefficients k were chosen to follow a sine wave; the spike-history coefficients a were inhibitory and followed an exponential function. When choosing stimuli we updated the posterior over the full θ = {k; a} simultaneously and maximized the information about both the stimulus and the spike-history coefficients. The information-maximizing strategy outperformed random sampling for estimating both sets of coefficients.

¹ The one difference is that the covariance matrix of p(θt+1 | xt+1, rt+1) is in general no longer just a rank-one modification of the covariance matrix of p(θt | xt, rt); thus we cannot use the rank-one update to compute the eigendecomposition. However, it is often reasonable to take Q to be white, Q = cI; in this case the eigenvectors of Ct + Q are those of Ct and the eigenvalues are ci + c, where ci is the ith eigenvalue of Ct, so our methods may be applied without modification.

Our final set of results, Fig.
3, considers a neuron whose receptive field drifts non-systematically with time. We take the receptive field to be a Gabor function whose center moves according to a random walk (we have in mind a slow random drift of eye position during a visual experiment). The results demonstrate the feasibility of the information-maximization strategy in the presence of nonstationary response properties θ, and emphasize the superiority of adaptive methods in this context.

Conclusion. We have developed an efficient implementation of an algorithm for online optimization of neurophysiology experiments based on an information-theoretic criterion. Reasonable approximations based on a GLM framework allow the algorithm to run in near-real time even for high-dimensional parameter and stimulus spaces, and in the presence of spike-rate adaptation and time-varying neural response properties. Despite these approximations the algorithm consistently provides significant improvements over random sampling; indeed, the differences in efficiency are large enough that the information-optimization strategy may permit robust system identification in cases where it is simply not otherwise feasible to estimate the neuron's parameters using random stimuli. Thus, in a sense, the proposed stimulus-optimization technique significantly extends the reach and power of classical neurophysiology methods.

Acknowledgments. JL is supported by the Computational Science Graduate Fellowship Program administered by the DOE under contract DE-FG02-97ER25308 and by the NSF IGERT Program in Hybrid Neural Microsystems at Georgia Tech via grant number DGE-0333411. LP is supported by grant EY018003 from the NEI and by a Gatsby Foundation Pilot Grant. We thank P. Latham for helpful conversations.

References
[1] I. Nelken, et al., Hearing Research 72, 237 (1994).
[2] P. Foldiak, Neurocomputing 38–40, 1217 (2001).
[3] K.
Zhang, et al., Proceedings (Computational and Systems Neuroscience Meeting, 2004).
[4] R. C. deCharms, et al., Science 280, 1439 (1998).
[5] C. Machens, et al., Neuron 47, 447 (2005).
[6] A. Watson, et al., Perception and Psychophysics 33, 113 (1983).
[7] L. Paninski, Neural Computation 17, 1480 (2005).
[8] P. McCullagh, et al., Generalized Linear Models (Chapman and Hall, London, 1989).
[9] L. Paninski, Network: Computation in Neural Systems 15, 243 (2004).
[10] E. Simoncelli, et al., in The Cognitive Neurosciences, third edn., M. Gazzaniga, ed. (MIT Press, 2004).
[11] P. Dayan, et al., Theoretical Neuroscience (MIT Press, 2001).
[12] E. Chichilnisky, Network: Computation in Neural Systems 12, 199 (2001).
[13] F. Theunissen, et al., Network: Computation in Neural Systems 12, 289 (2001).
[14] L. Paninski, et al., Journal of Neuroscience 24, 8551 (2004).
[15] M. Gu, et al., SIAM Journal on Matrix Analysis and Applications 15, 1266 (1994).
[16] N. A. Lesica, et al., IEEE Trans. on Neural Systems and Rehabilitation Engineering 13, 194 (2005).

5 0.63475472 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons

Author: Stefan Klampfl, Wolfgang Maass, Robert A. Legenstein

Abstract: The extraction of statistically independent components from high-dimensional multi-sensory input streams is assumed to be an essential component of sensory processing in the brain. Such independent component analysis (or blind source separation) could provide a less redundant representation of information about the external world. Another powerful processing strategy is to extract preferentially those components from high-dimensional input streams that are related to other information sources, such as internal predictions or proprioceptive feedback. This strategy allows the optimization of internal representation according to the information bottleneck method. However, concrete learning rules that implement these general unsupervised learning principles for spiking neurons are still missing. We show how both information bottleneck optimization and the extraction of independent components can in principle be implemented with stochastically spiking neurons with refractoriness. The new learning rule that achieves this is derived from abstract information optimization principles. 1

6 0.62769687 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

7 0.57974523 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron

8 0.5754109 44 nips-2006-Bayesian Policy Gradient Algorithms

9 0.56884825 171 nips-2006-Sample Complexity of Policy Search with Known Dynamics

10 0.54263926 202 nips-2006-iLSTD: Eligibility Traces and Convergence Analysis

11 0.53203338 36 nips-2006-Attentional Processing on a Spike-Based VLSI Neural Network

12 0.51017481 18 nips-2006-A selective attention multi--chip system with dynamic synapses and spiking neurons

13 0.50663882 59 nips-2006-Context dependent amplification of both rate and event-correlation in a VLSI network of spiking neurons

14 0.46603665 189 nips-2006-Temporal dynamics of information content carried by neurons in the primary visual cortex

15 0.4556742 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes

16 0.38536039 192 nips-2006-Theory and Dynamics of Perceptual Bistability

17 0.38416028 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight

18 0.38219798 143 nips-2006-Natural Actor-Critic for Road Traffic Optimisation

19 0.36287445 89 nips-2006-Handling Advertisements of Unknown Quality in Search Advertising

20 0.35718453 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.087), (3, 0.016), (7, 0.075), (9, 0.073), (20, 0.024), (22, 0.063), (25, 0.216), (44, 0.073), (57, 0.115), (65, 0.043), (69, 0.027), (71, 0.093), (82, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86212003 44 nips-2006-Bayesian Policy Gradient Algorithms

Author: Mohammad Ghavamzadeh, Yaakov Engel

Abstract: Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost. 1

same-paper 2 0.85424608 154 nips-2006-Optimal Change-Detection and Spiking Neurons

Author: Angela J. Yu

Abstract: Survival in a non-stationary, potentially adversarial environment requires animals to detect sensory changes rapidly yet accurately, two oft competing desiderata. Neurons subserving such detections are faced with the corresponding challenge to discern “real” changes in inputs as quickly as possible, while ignoring noisy fluctuations. Mathematically, this is an example of a change-detection problem that is actively researched in the controlled stochastic processes community. In this paper, we utilize sophisticated tools developed in that community to formalize an instantiation of the problem faced by the nervous system, and characterize the Bayes-optimal decision policy under certain assumptions. We will derive from this optimal strategy an information accumulation and decision process that remarkably resembles the dynamics of a leaky integrate-and-fire neuron. This correspondence suggests that neurons are optimized for tracking input changes, and sheds new light on the computational import of intracellular properties such as resting membrane potential, voltage-dependent conductance, and post-spike reset voltage. We also explore the influence that factors such as timing, uncertainty, neuromodulation, and reward should and do have on neuronal dynamics and sensitivity, as the optimal decision strategy depends critically on these factors. 1

3 0.81473011 67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians

Author: Jason V. Davis, Inderjit S. Dhillon

Abstract: Gaussian data is pervasive and many learning algorithms (e.g., k-means) model their inputs as a single sample drawn from a multivariate Gaussian. However, in many real-life settings, each input object is best described by multiple samples drawn from a multivariate Gaussian. Such data can arise, for example, in a movie review database where each movie is rated by several users, or in time-series domains such as sensor networks. Here, each input can be naturally described by both a mean vector and covariance matrix which parameterize the Gaussian distribution. In this paper, we consider the problem of clustering such input objects, each represented as a multivariate Gaussian. We formulate the problem using an information theoretic approach and draw several interesting theoretical connections to Bregman divergences and also Bregman matrix divergences. We evaluate our method across several domains, including synthetic data, sensor network data, and a statistical debugging application. 1

4 0.70469445 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons

Author: Thomas Voegtlin

Abstract: In biological neurons, the timing of a spike depends on the timing of synaptic currents, in a way that is classically described by the Phase Response Curve. This has implications for temporal coding: an action potential that arrives on a synapse has an implicit meaning, that depends on the position of the postsynaptic neuron on the firing cycle. Here we show that this implicit code can be used to perform computations. Using theta neurons, we derive a spike-timing dependent learning rule from an error criterion. We demonstrate how to train an auto-encoder neural network using this rule. 1

5 0.67964649 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes

Author: Huan Xu, Shie Mannor

Abstract: Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worstcase analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and the worst case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto efficient policies in the performancerobustness plane when only the reward parameters are subject to uncertainty. In the more general case when the transition probabilities are also subject to error, we show that the strategy with the “optimal” tradeoff might be non-Markovian and hence is in general not tractable. 1

6 0.67631221 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes

7 0.65474838 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency

8 0.65400416 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis

9 0.65273142 162 nips-2006-Predicting spike times from subthreshold dynamics of a neuron

10 0.65054041 167 nips-2006-Recursive ICA

11 0.64873689 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

12 0.6443975 99 nips-2006-Information Bottleneck Optimization and Independent Component Extraction with Spiking Neurons

13 0.64201003 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments

14 0.63878787 158 nips-2006-PG-means: learning the number of clusters in data

15 0.63839388 175 nips-2006-Simplifying Mixture Models through Function Approximation

16 0.63455278 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks

17 0.63441139 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

18 0.63433737 34 nips-2006-Approximate Correspondences in High Dimensions

19 0.63404024 65 nips-2006-Denoising and Dimension Reduction in Feature Space

20 0.6313802 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model