NIPS 2002, paper nips2002-199: knowledge graph by maker-knowledge-mining

199 nips-2002-Timing and Partial Observability in the Dopamine System


Source: pdf

Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. [sent-6, score-0.848]

2 Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. [sent-8, score-0.58]

3 Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. [sent-9, score-0.78]

4 The DA system can then learn to map these inferred states to reward predictions using TD. [sent-10, score-0.619]

5 The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. [sent-11, score-0.303]

6 In trace conditioning, for instance, nothing observable spans the delay between a transient stimulus and the reward it predicts. [sent-17, score-0.89]

7 For DA models, this raises problems of coping with hidden state and of tracking temporal intervals. [sent-18, score-0.274]

8 Most previous models address these issues using a tapped delay line representation of the world’s state. [sent-19, score-0.482]

9 This augments the representation of current sensory observations with remembered past observations, dividing temporal intervals into a series of states to mark the passage of time. [sent-20, score-0.309]

10 But linear combinations of tapped delay lines do not properly model variability in the intervals between events. [sent-21, score-0.503]

11 We propose a model that better reflects experimental situations by using a formalism that explicitly incorporates hidden state and temporal variability: a partially observable semiMarkov process. [sent-23, score-0.483]

12 The proposal envisions the interaction between a cortical perceptual system that infers the world’s hidden state using an internal world model, and a dopaminergic TD system that learns reward predictions for these inferred states. [sent-24, score-0.945]

13 (a,b) State spaces for the Markov tapped delay line (a) and our semi-Markov (b) TD models of a trace conditioning experiment. [sent-46, score-0.583]

14 (c,d) Modeled DA activity (TD error) when an expected reward is delivered early (top), on time (middle) or late (bottom). [sent-47, score-0.651]

15 The tapped delay line model (c) produces spurious negative error after an early reward, while, in accord with experiments, our semi-Markov model (d) does not. [sent-48, score-0.57]

16 Shaded stripes under (d) and (f) track the model’s belief distribution over the world’s hidden state (given a one-timestep backward pass), with the ISI in white, the ITI in black, and gray for uncertainty between the two. [sent-49, score-0.229]

17 (e,f) Modeled DA activity when reward timing varies uniformly over a range. [sent-50, score-0.542]

18 The tapped delay line model (e) incorrectly predicts identical excitation to rewards delivered at all times, while, in accord with experiment, our model (f) predicts a response that declines with delay. [sent-51, score-0.706]

19 Several models [1, 2, 3, 4, 5] identify the firing of DA neurons with the reward prediction error signal δt of a TD algorithm [6]. [sent-52, score-0.774]

20 In the models, DA neurons are excited by positive error in reward prediction (caused by unexpected rewards or reward-predicting stimuli) and inhibited by negative prediction error (caused by the omission of expected reward). [sent-53, score-0.915]

21 If a reward arrives as expected, the models predict no change in firing rate. [sent-54, score-0.578]
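The TD account in the three sentences above maps onto a few lines of code. The sketch below is a generic tabular TD(0) update, not the paper's model: the discount factor, learning rate, and ten-state toy value table are illustrative assumptions, and the comments only echo the qualitative DA interpretation described above.

```python
import numpy as np

def td0_update(V, s, r_next, s_next, gamma=0.98, alpha=0.1):
    """One tabular TD(0) step. Positive delta ~ phasic DA excitation,
    negative delta ~ a timed pause in firing, delta near zero ~ no change."""
    delta = r_next + gamma * V[s_next] - V[s]   # reward prediction error
    V[s] += alpha * delta                       # nudge the prediction toward the target
    return delta

V = np.zeros(10)                                 # toy value table over 10 states
print(td0_update(V, s=3, r_next=1.0, s_next=4))  # unexpected reward -> positive error
```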

22 In idealized form (neglecting some instrumental contingencies), these experiments and the others that we consider here are all variations on trace conditioning, in which a phasic stimulus such as a flash of light signals that reward will be delivered after a delay. [sent-56, score-0.726]

23 TD systems map a representation of the state of the world to a prediction of future reward, but previous DA modeling exploited few experimental constraints on the form of this representation. [sent-57, score-0.251]

24 [1] computed values using only immediately observable stimuli and allowed learning about rewards to accrue to previously observed stimuli using eligibility traces. [sent-59, score-0.305]

25 But in trace conditioning, DA neurons show a timed pause in their background firing when an expected reward fails to arrive [7]. [sent-60, score-0.665]

26 [3] addressed these data using a tapped delay line representation of stimulus history [8]: at time t, each stimulus is represented by a vector whose nth element codes whether the stimulus was observed at time t − n. [sent-65, score-0.775]
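A minimal sketch of the tapped-delay-line representation that sentence 26 describes: each stimulus is expanded into a vector whose nth element codes whether that stimulus was observed n timesteps ago. The history length and the single-stimulus setup are assumptions made only for illustration.

```python
import numpy as np

def tapped_delay_line(stimulus_flags, n_taps=10):
    """stimulus_flags: sequence of 0/1 values, one per past timestep (most recent last).
    Returns x with x[n] = 1 if the stimulus was observed n timesteps ago."""
    x = np.zeros(n_taps)
    for n in range(n_taps):
        if n < len(stimulus_flags) and stimulus_flags[-1 - n]:
            x[n] = 1.0
    return x

# A stimulus seen 3 steps ago: the active tap "marches" forward one slot per timestep,
# which is how these models mark the passage of time between stimulus and reward.
print(tapped_delay_line([1, 0, 0, 0]))
```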

27 This representation allows the models to learn the temporal relationship between stimulus and reward, and to correctly predict phasic inhibition timelocked to omitted rewards. [sent-66, score-0.385]

28 These models, however, mispredict the behavior of DA neurons when the interval between stimulus and reward varies. [sent-67, score-0.759]

29 In part, this occurs because the models do not represent the reward as an observation, so its arrival can have no effect on later predictions. [sent-70, score-0.539]

30 More fundamentally, this is a problem with how the models partition events into a state space. [sent-71, score-0.248]

31 Figure 1a illustrates how the tapped delay lines mark time in the interval between stimulus and reward using a series of states, each of which learns its own reward prediction. [sent-72, score-1.522]

32 After the stimulus occurs, the model’s representation marches through each state in succession. [sent-73, score-0.235]

33 If the second event has occurred, the interval is complete and the system should not expect reward again, but the tapped delay line continues to advance. [sent-75, score-0.955]

34 This may be correctable, though awkwardly, by representing the reward with its own delay line, which can then learn to suppress further reward expectation after a reward occurs [10]. [sent-76, score-1.669]

35 Also, whether this works depends on how information from multiple cues is combined into an aggregate reward prediction (i.e. …). [sent-78, score-0.55]

36 In this case, all substates within the interval see reward with the same (low) probability, so each produces identical positive error when reward occurs there. [sent-82, score-1.082]

37 In animal experiments, however, stronger dopaminergic activity is seen for earlier rewards [11]. [sent-83, score-0.249]

38 We address them with a TD model grounded in a formalism that incorporates temporal variability, a partially observable [12] semi-Markov [13] process. [sent-85, score-0.363]

39 If the process is in state s ∈ S, then the next state is s' with probability Q_{ss'}. [sent-88, score-0.246]

40 The dwell time τ spent in s before making a transition is distributed with probability Dsτ ; we define the indicator φt as one if the state transitioned between t and t + 1 and zero otherwise. [sent-90, score-0.284]

41 Some observations are distinguished as rewarding; we separately write the reward magnitude of an observation as r. [sent-92, score-0.537]
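To make the formalism of the last three sentences concrete, here is a hedged sketch that samples from a small partially observable semi-Markov process with transition matrix Q, dwell-time distributions D, and observations emitted on state entry. The two-state ISI/ITI structure mirrors the trace-conditioning example in the sentences that follow, but every numerical value and name is an illustrative assumption rather than a parameter taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["ISI", "ITI"]
Q = np.array([[0.0, 1.0],                  # Q[s, s']: the two states simply alternate
              [1.0, 0.0]])
D = np.array([[0.1, 0.2, 0.4, 0.2, 0.1],   # D[s, tau-1]: dwell-time distribution in s
              [0.0, 0.1, 0.2, 0.3, 0.4]])
entry_obs = {"ISI": ("stimulus", 0.0),     # observation and reward r on *entering* a state
             "ITI": ("reward", 1.0)}

def sample_transitions(n_transitions=4, s=1):
    """Generate (state, dwell, entry observation, reward) tuples; within a dwell the
    observation is null and no reward is delivered (the phi_t = 0 timesteps)."""
    trace = []
    for _ in range(n_transitions):
        tau = rng.choice(D.shape[1], p=D[s]) + 1        # dwell time spent in s
        s_next = rng.choice(len(states), p=Q[s])        # successor state
        obs, r = entry_obs[states[s_next]]              # what is seen on entering it
        trace.append((states[s], tau, obs, r))
        s = s_next
    return trace

print(sample_transitions())
```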

42 In this formalism, a trace conditioning experiment can be treated as alternation between two states (Figure 1b). [sent-94, score-0.224]

43 The states correspond to the intervals between stimulus and reward (interstimulus interval: ISI) and between reward and stimulus (intertrial interval: ITI). [sent-95, score-1.307]

44 A stimulus is the likely observation when entering the ISI and a reward when entering the ITI. [sent-96, score-0.684]

45 If φt = 0 (if the state did not transition between t and t + 1) then st+1 = st, ot+1 is null and rt+1 = 0 (i.e. …). [sent-100, score-0.281]

46 An unsignaled transition into the ITI state occurs in our model when reward is omitted, a common experimental manipulation [7]. [sent-104, score-0.738]

47 This example demonstrates the relationship between temporal variability and partial observability: if reward timing can vary, nothing in the observable state reveals whether a late reward is still coming or has been omitted completely. [sent-105, score-1.476]

48 TD algorithms [6] approximate a function mapping each state to its value, defined as the expectation (with respect to variability in reward magnitude, state succession, and dwell times) of summed, discounted future reward, starting from that state. [sent-106, score-0.924]

49 In the semi-Markov case [13], a state’s value is defined as the reward expectation at the moment it is entered; we do not count rewards received on the transition in. [sent-107, score-0.664]

50 The value of the nth state entered is: V_{s_n} = E[ γ^{τ_n} r_{n+1} + γ^{τ_n + τ_{n+1}} r_{n+2} + … ]. [sent-108, score-0.216]
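Written out, the definition in sentence 50 and its one-step recursive form look as follows. The second equation is a standard rearrangement obtained by conditioning on the first dwell time and successor state, under the simplifying assumptions that the dwell time and successor are drawn independently given the current state and that the expected reward depends only on the state being entered; it is offered as a hedged reconstruction, not a quotation of the paper's equations.

```latex
V_{s_n} \;=\; \mathbb{E}\!\left[\,\gamma^{\tau_n} r_{n+1}
        \;+\; \gamma^{\tau_n+\tau_{n+1}} r_{n+2} \;+\; \cdots \right]
% Conditioning on the first dwell time \tau and successor s' gives the recursion
V_{s} \;=\; \sum_{\tau} D_{s\tau}\,\gamma^{\tau}
        \sum_{s'} Q_{ss'}\left(\bar{r}_{s'} + V_{s'}\right),
% where \bar{r}_{s'} denotes the expected reward delivered on entering s'.
```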

51 We address partial observability by using model-based inference to determine a distribution over the hidden states, which then serves as a basis over which a modified TD algorithm can learn values. [sent-112, score-0.269]

52 For state inference, we assume that the brain’s sensory processing systems use an internal model of the semi-Markov process — that is, the functions O, Q, and D. [sent-115, score-0.25]

53 A key assumption about this internal model is that its distributions over intervals, rewards and observations contain asymptotic uncertainty, that is, they are not arbitrarily sharp. [sent-117, score-0.268]

54 This uncertainty is present only in the internal model: most anomalous events never occur in our simulations. [sent-123, score-0.221]

55 Given the observation sequence o_1 … o_t, we can determine the likelihood that each hidden state is active using a standard forward-backward algorithm for hidden semi-Markov models [17]. [sent-127, score-0.532]

56-57 The first term can be computed by integrating over s_{t+1} in the model: P(o_{t+1} | s_t = s, φ_t = 1) = Σ_{s'∈S} Q_{ss'} · O_{s',o_{t+1}}; the second requires integrating over possible state sequences and dwell times: P(s_t = s, φ_t = 1 | o_1 … o_t) = Σ_{τ=1..d_lastO} D_{sτ} · O_{s,o_{t−τ+1}} · P(s_{t−τ+1} = s, φ_{t−τ} = 1 | o_1 …). [sent-136, score-0.242] [sent-139, score-0.242]
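The inference described in the last few sentences can be illustrated with a much-simplified forward (filtering) step only, written in terms of a per-state hazard derived from D. The sketch below tracks a joint belief over (state, elapsed dwell), assumes observations are informative only at state entry (null otherwise), and is meant as an illustration of the idea rather than the paper's exact forward-backward algorithm; all function and variable names are hypothetical.

```python
import numpy as np

def hazard(D_row, d):
    """P(leave after exactly d steps | still present after d-1 steps)."""
    tail = D_row[d - 1:].sum()
    return D_row[d - 1] / tail if tail > 0 else 1.0

def belief_step(b, obs, Q, D, O_enter, O_dwell):
    """b[s, d-1] = P(state s, d steps elapsed | observations so far).
    Propagate one timestep through the internal model and fold in observation obs."""
    S, Dmax = b.shape
    b_pred = np.zeros_like(b)
    for s in range(S):
        for d in range(1, Dmax + 1):
            p, h = b[s, d - 1], hazard(D[s], d)
            if d < Dmax:
                b_pred[s, d] += p * (1.0 - h)          # dwell continues (phi_t = 0)
            for s2 in range(S):
                b_pred[s2, 0] += p * h * Q[s, s2]      # transition (phi_t = 1), dwell resets
    like = np.zeros_like(b_pred)
    like[:, 0] = O_enter[:, obs]                       # entry steps emit stimulus or reward
    like[:, 1:] = O_dwell[:, obs][:, None]             # dwell steps emit (mostly) null
    b_new = b_pred * like
    return b_new / b_new.sum()
```

The marginal belief that each hidden state is currently active, which figures in the TD updates described next, can then be read off as b_new.sum(axis=1).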

58 Due to partial observability, we may not be certain when transitions have occurred or from which states, so we perform TD updates to every state at every timestep, weighted by β. [sent-150, score-0.227]

59 Both expectations are conditioned on the process having left state s at time t, and computed using the internal world model. [sent-156, score-0.256]

60 However, because of uncertainty as to the state of the world, the TD error signal is vector-valued rather than scalar. [sent-158, score-0.249]

61 DA neurons could code this vector in a distributed manner, which might explain experimentally observed response variability between neurons [7]. [sent-159, score-0.372]
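Sentences 58-61 say that TD updates are applied to every hidden state, weighted by the belief that the process just left that state, which makes the error signal a vector with one component per state. The snippet below is one hedged way to realize that idea in code; the semi-Markov target gamma**tau * (r + v_next), the scalar v_next, and the learning rate are simplifying assumptions, and the paper's full update (which also takes model-based expectations over dwell times) is not reproduced here.

```python
import numpy as np

def belief_weighted_td(V, beta, r, v_next, tau=1.0, gamma=0.98, alpha=0.1):
    """V      : value estimate per hidden state
       beta   : beta[s] = belief that the process just left state s
       r      : reward observed on the (putative) transition
       v_next : value assigned to the newly entered state(s)
       Returns the vector-valued TD error; each component could in principle be
       carried by a different subset of DA neurons."""
    delta = beta * (gamma ** tau * (r + v_next) - V)   # per-state prediction error
    V += alpha * delta                                 # all states updated, weighted by belief
    return delta

V = np.zeros(2)              # say, values for the ISI and ITI states
beta = np.array([0.9, 0.1])  # mostly sure we just left the ISI
print(belief_weighted_td(V, beta, r=1.0, v_next=0.5))
```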

62 Marginalizing out the observations reduces this to Bellman's equation for V̂_s, which is also, of course, the fixed-point equation for value iteration. [sent-187, score-0.287]

63 When expected reward is delivered early, the semi-Markov model assumes that this signals an early transition into the ITI state, and it thus does not expect further reward or produce spurious negative error (Figure 1d, top). [sent-188, score-1.236]

64 Because of variability in the model’s ISI estimate, an early transition, while improbable, better explains the data than some other path through the state space. [sent-189, score-0.224]

65 The early reward is worth more than expected, due to reduced discounting, and is thus accompanied by positive error. [sent-190, score-0.526]

66 The model can also infer a state transition from the passage of time, absent any observations. [sent-191, score-0.235]

67 In Figure 1d (bottom), when the reward is delivered late, the system infers that the world has entered the ITI state without reward, producing negative error. [sent-192, score-0.819]

68 (The dwell time distribution D in the inference model was changed to reflect this distribution, as an animal should learn a different model here.) [sent-194, score-0.302]

69 Earlier-than-average rewards are worth more than expected (due to discounting) and cause positive prediction error, while later-than-average rewards cause negative error because they are more heavily discounted. [sent-195, score-0.358]

70 Our state inference approach is based on a hidden Markov model (HMM) account we previously advanced to explain animal learning about the temporal relationships of events [15]. [sent-200, score-0.513]

71 One is the notion of uncertainty in some of its internal parameters, which Kakade and Dayan [16] use to explain interval timing and attentional effects in learning. [sent-203, score-0.256]

72 With tapped delay lines, timescale dilation increases the number of marker states in Figure 1a and slows learning. [sent-207, score-0.483]

73 But our semi-Markov model is timescale invariant: learning is induced by state transitions which in turn are triggered by events or by the passage of time on a scale controlled by the internal model. [sent-208, score-0.458]

74 (The form of temporal discounting we use is not timescale invariant, but this can be corrected as in [5].) [sent-209, score-0.212]

75 We have presented a model of the DA system that improves on previous models’ accounts of data involving temporal variability and partial observability, because, unlike prior models, it is grounded in a formalism that explicitly incorporates these considerations. [sent-210, score-0.385]

76 Like previous models, ours identifies the DA response with reward prediction error, but it differs in the representational systems driving the predictions. [sent-211, score-0.586]

77 Previous models assumed that tapped delay lines transcribed raw sensory events; ours envisions that these events inform a more active process of inference about the underlying state of the world. [sent-212, score-0.725]

78 For instance, Suri and Schultz [4] propose that reward delivery overrides stimulus representations, canceling pending predictions and eliminating the spurious negative error in Figure 1c (top). [sent-215, score-0.719]

79 DA models often assume an actor-critic framework [1] in which reward predictions are used to evaluate action selection policies. [sent-221, score-0.539]

80 Partial observability complicates such an extension here, since policies must be defined over belief states (distributions over the hidden states S) to accommodate uncertainty; our use of S as a linear basis for value predictions is thus an oversimplification. [sent-222, score-0.285]

81 In this case, TD learning in the inferred state space could maintain a reasonably current and observationally grounded value function. [sent-226, score-0.221]

82 Suri [19] and Dayan [20] have also proposed TD theories of DA that incorporate world models to explain behavioral effects, though they do not address the theoretical issues or dopaminergic data considered here. [sent-228, score-0.283]

83 While those accounts use the world model for directly anticipating future events, we have proposed another role for it in state inference. [sent-229, score-0.223]

84 The formal models in question have roughly equivalent explanatory power: a semi-Markov model can be simulated (to arbitrarily fine temporal discretization) by a Markov model that subdivides its states by dwell time. [sent-232, score-0.378]

85 Thus it would be possible to devise a state representation for a Markov model that copes properly with temporal variability. [sent-234, score-0.244]

86 But doing so by elaborating the tapped delay line architecture would amount to building a clockwork engine for the inference process we describe, without the benefit of useful abstractions such as distributions over intervals; a clearer approach would subdivide the states in our model. [sent-235, score-0.507]

87 Such inhibition is somewhat parameter-dependent, since if inference parameters assign high probability to unsignaled transitions the decrease in reward value with delay can be mitigated by increasing uncertainty about the hidden state. [sent-238, score-0.987]

88 One choice would be the subdivision of our semi-Markov states by dwell time discussed above, which in the experiment of Figure 1f would decrease TD error toward but not past zero for longer delays. [sent-240, score-0.252]

89 In this case, later rewards are less surprising because the conditional probability of reward increases as time passes without reward. [sent-241, score-0.622]

90 A related prediction suggested by our model is that DA responses not just to rewards but also to stimuli that signal reward might be modulated by their timing relative to expectation. [sent-242, score-0.895]

91 In tapped delay line models, this is possible only for a constant ITI (since if expectancy is divided between a number of states, stimulus delivery in any one of them cannot be completely predicted away). [sent-244, score-0.612]

92 But the response to a stimulus in the semi-Markov model can show behavior exactly analogous to the reward response in Figure 1f — positive or negative error depending on the time of delivery relative to expectation. [sent-245, score-0.79]

93 We have suggested that the TD error may be a vector signal, with different neurons signaling errors for different elements of a state distribution. [sent-249, score-0.26]

94-95 This could be investigated experimentally by recording DA neurons as a situation of ambiguous reward expectancy (e.g. one reward or three) resolved into a situation of intermediate, determinate reward expectancy (e.g. …). [sent-250, score-0.676] [sent-252, score-1.035]

96 Long-term reward prediction in TD models of the dopamine system. [sent-273, score-0.718]

97 Dopamine neurons report an error in the temporal prediction of reward during learning. [sent-286, score-0.778]

98 The reward responses of dopamine neurons persist when prediction of reward is probabilistic with respect to time or occurrence. [sent-293, score-1.306]

99 Explicit state occupancy modeling by hidden semi-Markov models: Application of Derin’s scheme. [sent-317, score-0.212]

100 Anticipatory responses of dopamine neurons and cortical neurons reproduced by internal model. [sent-323, score-0.424]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('reward', 0.492), ('td', 0.368), ('da', 0.244), ('ot', 0.242), ('tapped', 0.206), ('iti', 0.186), ('delay', 0.162), ('rewards', 0.13), ('state', 0.123), ('dopamine', 0.121), ('dwell', 0.119), ('st', 0.116), ('stimulus', 0.112), ('observability', 0.103), ('isi', 0.1), ('neurons', 0.097), ('temporal', 0.091), ('vs', 0.086), ('conditioning', 0.084), ('events', 0.078), ('observable', 0.077), ('delivered', 0.075), ('formalism', 0.071), ('transitions', 0.07), ('world', 0.07), ('dopaminergic', 0.068), ('variability', 0.067), ('discounting', 0.067), ('inhibition', 0.065), ('internal', 0.063), ('grounded', 0.063), ('states', 0.061), ('hidden', 0.06), ('entered', 0.059), ('houk', 0.059), ('interval', 0.058), ('prediction', 0.058), ('timescale', 0.054), ('animal', 0.051), ('expectancy', 0.051), ('suri', 0.051), ('unsignaled', 0.051), ('markov', 0.051), ('late', 0.05), ('timing', 0.05), ('stimuli', 0.049), ('animals', 0.048), ('models', 0.047), ('ds', 0.047), ('trace', 0.047), ('uncertainty', 0.046), ('responses', 0.046), ('dayan', 0.045), ('observations', 0.045), ('delays', 0.045), ('delivery', 0.044), ('transition', 0.042), ('inference', 0.041), ('error', 0.04), ('signal', 0.04), ('entering', 0.04), ('kakade', 0.04), ('passage', 0.04), ('predict', 0.039), ('explain', 0.039), ('intervals', 0.038), ('line', 0.037), ('experimentally', 0.036), ('response', 0.036), ('schultz', 0.035), ('inferred', 0.035), ('partial', 0.034), ('anomalous', 0.034), ('dlasto', 0.034), ('envisions', 0.034), ('gallistel', 0.034), ('jc', 0.034), ('neglected', 0.034), ('overtrained', 0.034), ('qss', 0.034), ('vsn', 0.034), ('early', 0.034), ('nth', 0.034), ('sensory', 0.034), ('rt', 0.032), ('experiment', 0.032), ('partially', 0.031), ('delayed', 0.031), ('spurious', 0.031), ('learn', 0.031), ('model', 0.03), ('issues', 0.03), ('daw', 0.029), ('courville', 0.029), ('timed', 0.029), ('semi', 0.029), ('occupancy', 0.029), ('involving', 0.029), ('theories', 0.029), ('reinforcement', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999797 199 nips-2002-Timing and Partial Observability in the Dopamine System

Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.

2 0.24988046 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

Author: Ralf Schoknecht

Abstract: There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value function under different projection operations. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution. 1

3 0.24377383 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Author: Ralf Schoknecht, Artur Merke

Abstract: Convergence for iterative reinforcement learning algorithms like TD(O) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows to analyse if a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data. 1

4 0.15778871 71 nips-2002-Dopamine Induced Bistability Enhances Signal Processing in Spiny Neurons

Author: Aaron J. Gruber, Sara A. Solla, James C. Houk

Abstract: Single unit activity in the striatum of awake monkeys shows a marked dependence on the expected reward that a behavior will elicit. We present a computational model of spiny neurons, the principal neurons of the striatum, to assess the hypothesis that direct neuromodulatory effects of dopamine through the activation of D 1 receptors mediate the reward dependency of spiny neuron activity. Dopamine release results in the amplification of key ion currents, leading to the emergence of bistability, which not only modulates the peak firing rate but also introduces a temporal and state dependence of the model's response, thus improving the detectability of temporally correlated inputs. 1

5 0.14024629 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

Author: Patrik O. Hoyer, Aapo Hyvärinen

Abstract: The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as ‘noise’, purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.

6 0.12585424 171 nips-2002-Reconstructing Stimulus-Driven Neural Networks from Spike Times

7 0.11400066 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

8 0.1084324 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

9 0.10629837 205 nips-2002-Value-Directed Compression of POMDPs

10 0.098591551 128 nips-2002-Learning a Forward Model of a Reflex

11 0.094019353 103 nips-2002-How Linear are Auditory Cortical Responses?

12 0.092820533 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

13 0.09045928 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

14 0.089847937 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

15 0.07991945 43 nips-2002-Binary Coding in Auditory Cortex

16 0.079483755 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach

17 0.076123133 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables

18 0.07403741 26 nips-2002-An Estimation-Theoretic Framework for the Presentation of Multiple Stimuli

19 0.072826453 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex

20 0.066743433 12 nips-2002-A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.2), (1, 0.147), (2, -0.177), (3, -0.092), (4, -0.02), (5, -0.103), (6, -0.047), (7, -0.035), (8, 0.014), (9, 0.008), (10, -0.03), (11, -0.18), (12, -0.086), (13, 0.227), (14, 0.02), (15, -0.069), (16, -0.053), (17, 0.252), (18, -0.127), (19, -0.149), (20, 0.102), (21, -0.006), (22, 0.124), (23, 0.116), (24, 0.087), (25, 0.053), (26, -0.057), (27, -0.088), (28, 0.031), (29, 0.084), (30, 0.043), (31, 0.015), (32, 0.039), (33, 0.003), (34, 0.051), (35, -0.087), (36, 0.009), (37, 0.066), (38, -0.019), (39, -0.082), (40, -0.034), (41, 0.067), (42, -0.008), (43, 0.146), (44, -0.048), (45, -0.07), (46, 0.0), (47, 0.032), (48, -0.021), (49, -0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.957241 199 nips-2002-Timing and Partial Observability in the Dopamine System

Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.

2 0.70280266 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Author: Ralf Schoknecht, Artur Merke

Abstract: Convergence for iterative reinforcement learning algorithms like TD(O) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows to analyse if a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data. 1

3 0.63837534 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

Author: Ralf Schoknecht

Abstract: There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value function under different projection operations. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution. 1

4 0.62419391 71 nips-2002-Dopamine Induced Bistability Enhances Signal Processing in Spiny Neurons

Author: Aaron J. Gruber, Sara A. Solla, James C. Houk

Abstract: Single unit activity in the striatum of awake monkeys shows a marked dependence on the expected reward that a behavior will elicit. We present a computational model of spiny neurons, the principal neurons of the striatum, to assess the hypothesis that direct neuromodulatory effects of dopamine through the activation of D 1 receptors mediate the reward dependency of spiny neuron activity. Dopamine release results in the amplification of key ion currents, leading to the emergence of bistability, which not only modulates the peak firing rate but also introduces a temporal and state dependence of the model's response, thus improving the detectability of temporally correlated inputs. 1

5 0.42989963 205 nips-2002-Value-Directed Compression of POMDPs

Author: Pascal Poupart, Craig Boutilier

Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.

6 0.41342488 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

7 0.41278052 128 nips-2002-Learning a Forward Model of a Reflex

8 0.38757035 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

9 0.3610968 18 nips-2002-Adaptation and Unsupervised Learning

10 0.34171674 12 nips-2002-A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise

11 0.34150293 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics

12 0.33588168 129 nips-2002-Learning in Spiking Neural Assemblies

13 0.3280147 84 nips-2002-Fast Exact Inference with a Factored Model for Natural Language Parsing

14 0.3190226 171 nips-2002-Reconstructing Stimulus-Driven Neural Networks from Spike Times

15 0.31610173 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks

16 0.30508667 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution

17 0.30210114 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition

18 0.29987738 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

19 0.29681268 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model

20 0.29199857 3 nips-2002-A Convergent Form of Approximate Policy Iteration


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.014), (23, 0.049), (42, 0.055), (54, 0.073), (55, 0.051), (57, 0.013), (58, 0.014), (62, 0.288), (64, 0.023), (67, 0.048), (68, 0.072), (74, 0.065), (92, 0.022), (98, 0.123)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79763639 199 nips-2002-Timing and Partial Observability in the Dopamine System

Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.

2 0.6961692 205 nips-2002-Value-Directed Compression of POMDPs

Author: Pascal Poupart, Craig Boutilier

Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.

3 0.53472304 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design

Author: Alex Holub, Gilles Laurent, Pietro Perona

Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1

4 0.53314018 43 nips-2002-Binary Coding in Auditory Cortex

Author: Michael R. Deweese, Anthony M. Zador

Abstract: Cortical neurons have been reported to use both rate and temporal codes. Here we describe a novel mode in which each neuron generates exactly 0 or 1 action potentials, but not more, in response to a stimulus. We used cell-attached recording, which ensured single-unit isolation, to record responses in rat auditory cortex to brief tone pips. Surprisingly, the majority of neurons exhibited binary behavior with few multi-spike responses; several dramatic examples consisted of exactly one spike on 100% of trials, with no trial-to-trial variability in spike count. Many neurons were tuned to stimulus frequency. Since individual trials yielded at most one spike for most neurons, the information about stimulus frequency was encoded in the population, and would not have been accessible to later stages of processing that only had access to the activity of a single unit. These binary units allow a more efficient population code than is possible with conventional rate coding units, and are consistent with a model of cortical processing in which synchronous packets of spikes propagate stably from one neuronal population to the next.

5 0.5286842 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits

Author: Wolfgang Maass, Thomas Natschläger, Henry Markram

Abstract: A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.

6 0.52837336 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex

7 0.52803838 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

8 0.5232057 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

9 0.52209902 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule

10 0.52108455 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

11 0.51864421 50 nips-2002-Circuit Model of Short-Term Synaptic Dynamics

12 0.51813656 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

13 0.51791179 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution

14 0.51786667 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

15 0.51745093 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons

16 0.51731455 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

17 0.51702416 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

18 0.51395738 180 nips-2002-Selectivity and Metaplasticity in a Unified Calcium-Dependent Model

19 0.51233041 10 nips-2002-A Model for Learning Variance Components of Natural Images

20 0.51201731 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables