nips nips2009 nips2009-52 knowledge-graph by maker-knowledge-mining

52 nips-2009-Code-specific policy gradient rules for spiking neurons


Source: pdf

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. [sent-2, score-1.441]

2 For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. [sent-6, score-2.203]

3 In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. [sent-7, score-0.494]

4 The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. [sent-8, score-0.843]

5 If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. [sent-9, score-1.121]

6 1 Introduction Neural implementations of reinforcement learning have to solve two basic credit assignment problems: (a) the temporal credit assignment problem, i.e., the question which of the actions taken in the past were crucial to receiving a reward later, and (b) the spatial credit assignment problem, i.e., the question which neurons in a population were important for getting the reward and which ones were not. [sent-10, score-0.350]

9 Presume that we know that the spike pattern of one specific neuron within one specific time interval was crucial for getting the reward (that is, we have already solved the first two credit assignment problems). [sent-16, score-1.251]

10 Then, there is still one question that remains: Which feature of the spike pattern was important for the reward? [sent-17, score-0.608]

11 Would any spike train with the same number of spikes yield the same reward or do we need precisely timed spikes to get it? [sent-18, score-1.196]

12 This credit assignment problem is in essence the question of which neural code the output neuron is (or should be) using. [sent-19, score-0.354]

13 It becomes particularly important if we want to change neuronal parameters like synaptic weights in order to maximize the likelihood of getting the reward again in the future. [sent-20, score-0.675]

14 If only the spike count is relevant, it might not be very effective to spend a lot of time and energy on the difficult task of learning precisely timed spikes. [sent-21, score-0.729]

15 The most modest and probably most versatile way of solving this problem is not to make any assumption about the neural code but to assume that all features of the spike train are important. [sent-22, score-0.809]

16 In this case, neuronal parameters are changed such that the likelihood of repeating exactly the same spike train for the same synaptic input is maximized. [sent-25, score-0.951]

17 Using a policy-gradient framework, we derive learning rules for neural parameters like synaptic weights or threshold parameters that maximize the expected reward. [sent-28, score-0.493]

18 Finally, we argue that the learning rules contain two types of prediction problems, one related to reward prediction, the other to response prediction. [sent-30, score-0.686]

19 1 General framework Coding features and the policy-gradient approach The basic setup is the following: let there be a set of different input spike trains $X^\mu$ to a single postsynaptic neuron, which in response generates stochastic output spike trains $Y^\mu$. [sent-32, score-1.575]

20 In the language of partially observable Markov decision processes, the input spike trains are observations that provide information about the state of the animal and the output spike trains are controls that influence the action choice. [sent-33, score-1.467]

21 Depending on both of these spike trains, the system receives a reward. [sent-34, score-0.58]

22 Our central assumption is that the reward R does not depend on the full output spike train, but only on a set of coding features $F_j(Y)$ of the output spike train: $R = R(F, X)$. [sent-36, score-1.890]

23 Which coding features F the reward depends on is in fact a choice of a neural code, because all other features of the spike train are not behaviorally relevant. [sent-37, score-1.455]

24 Note that there is a conceptual difference to the notion of a neural code in sensory processing, where the coding features convey information about input signals, not about the output signal or rewards. [sent-38, score-0.486]

25 The expectation value of the reward is given by $\langle R \rangle = \sum_{F,X} R(F, X)\, P(F|X, \theta)\, P(X)$, where $P(X)$ denotes the probability of the presynaptic spike trains and $P(F|X, \theta)$ the conditional probability of generating the coding feature F given the input spike train X and the neuronal parameters θ. [sent-39, score-2.201]

26 The reward is conditionally independent of the neural parameters θi given the coding feature F. [sent-41, score-0.682]

27 Therefore, if we want to optimize the expected reward by employing a gradient ascent method, we get a learning rule of the form $\partial_t \theta_i = \eta \sum_{F,X} R(F, X)\, P(X)\, \partial_{\theta_i} P(F|X, \theta)$ (1) $= \eta \sum_{F,X} P(X)\, P(F|X, \theta)\, R(F, X)\, \partial_{\theta_i} \ln P(F|X, \theta)$. (2) [sent-42, score-0.592]

28 If we choose a small learning rate η, the average over presynaptic patterns X and coding features F can be replaced by a time average. [sent-43, score-0.459]
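
To make the sampling interpretation of eq. (2) concrete, here is a minimal Python/NumPy sketch that replaces the average over X and F by single-trial samples. The Gaussian feature model, the linear dependence of the mean on θ, and the toy reward are illustrative assumptions, not part of the paper.

import numpy as np

rng = np.random.default_rng(0)

def policy_gradient_step(theta, x, eta=0.01):
    # One stochastic sample of eq. (2): draw a coding feature
    # F ~ P(F|x, theta), observe the reward, and move theta along
    # R * d/dtheta ln P(F|x, theta).
    mu = theta @ x                 # assumed: mean coding feature, linear in theta
    F = rng.normal(mu, 1.0)        # assumed: Gaussian P(F|x, theta), unit variance
    R = -(F - 2.0) ** 2            # assumed toy reward, maximal at F = 2
    score = (F - mu) * x           # d/dtheta ln N(F; mu, 1)
    return theta + eta * R * score

theta = np.zeros(3)
for _ in range(5000):
    x = rng.normal(size=3)         # presynaptic pattern drawn from P(X)
    theta = policy_gradient_step(theta, x)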

29 2 Learning rules for exponentially distributed coding features The joint distribution of the coding features $F_j$ can always be factorized into a set of conditional distributions $P(F|X) = \prod_i P(F_i | X; F_1, \dots, F_{i-1}, \theta)$, each of which is assumed to belong to the natural exponential family, $P(F_i | X; F_1, \dots, F_{i-1}, \theta) = h(F_i) \exp(C_i F_i - A(C_i))$, where the $C_i$ are parameters that depend on the input spike train X, the coding features $F_1, \dots, F_{i-1}$, and the neuronal parameters θ. The moments of $F_i$ are functions of $C_i$ and therefore also depend on the input X, the coding features $F_1, \dots, F_{i-1}$, and θ. [sent-47, score-0.762]

32 The summation over different coding features arises from the factorization of the distribution, while the specific shape of the summands relies on the assumption of natural exponential distributions [for a proof, cf. [sent-68, score-0.312]
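
A sketch of the omitted argument, using only standard exponential-family identities: for a single NEF factor $P(F_i \mid C_i) = h(F_i)\, e^{C_i F_i - A(C_i)}$,
\[
\partial_\theta \ln P = \bigl(F_i - A'(C_i)\bigr)\, \partial_\theta C_i ,
\]
and since the NEF satisfies $\mu_i = \langle F_i \rangle = A'(C_i)$ and $\sigma_i^2 = A''(C_i)$, the chain rule $\partial_\theta \mu_i = A''(C_i)\, \partial_\theta C_i$ turns this into
\[
\partial_\theta \ln P = \frac{F_i - \mu_i}{\sigma_i^2}\, \partial_\theta \mu_i ,
\]
the fluctuation-weighted form that reappears in rules (7) and (21).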

33 If these fluctuations are positively correlated with the trial fluctuations of the reward R, i.e., if trials in which the coding feature exceeds its trial mean tend to be rewarded more, the parameter update increases the mean of that feature. [sent-72, score-0.485]

34 The neuron type we are using is a simple Poisson-type neuron model where the postsynaptic firing rate is given by a nonlinear function ρ(u) of the membrane potential u. [sent-77, score-0.497]

35 The membrane potential u, in turn, is given by the sum of the EPSPs that are evoked by the presynaptic spikes, weighted with the respective synaptic weights: $u(t) = \sum_{i,f} w_i\, \epsilon(t - t_i^f) =: \sum_i w_i\, \mathrm{PSP}_i(t)$, (5) where $t_i^f$ denotes the time of the f-th spike in the i-th presynaptic neuron and $\epsilon(t - t_i^f)$ denotes the shape of the postsynaptic potential evoked by a single presynaptic spike at time $t_i^f$. [sent-78, score-1.350]

37 For future use, we have introduced $\mathrm{PSP}_i$ as the postsynaptic potential that would be evoked by the i-th presynaptic spike train alone if the synaptic weight were unity. [sent-80, score-1.142]

38 The parameters that one could optimize in this neuron model are (a) the synaptic weights and (b) parameters in the dependence of the firing rate ρ on the membrane potential. [sent-81, score-0.442]

39 The first case is the standard case of synaptic plasticity; the second corresponds to a reward-driven version of intrinsic plasticity [cf. [sent-82, score-0.368]

40 1 Spike Count Codes: Synaptic plasticity Let us first assume that the coding feature is the number N of spikes within a given time window [0, T ] and that the reward is delivered at the end of this period. [sent-85, score-0.848]

41 The probability distribution for the spike count is a Poisson distribution, $P(N) = \mu^N \exp(-\mu)/N!$, with mean spike count $\mu = \int_0^T \rho(t)\, dt$. (6) [sent-86, score-0.671]

42 The dependence of the distribution P(N) on the presynaptic spike trains X and the synaptic weights $w_i$ is hidden in the mean spike count µ, which naturally depends on those factors through the postsynaptic firing rate ρ. [sent-88, score-1.915]

43 Because the Poisson distribution belongs to the NEF, we can derive a synaptic learning rule by using equation (4) and calculating the particular form of the term $\partial_{w_i} \mu$: $\partial_t w_i = \eta R\, \frac{N - \mu}{\mu} \int_0^T [\partial_u \rho](t')\, \mathrm{PSP}_i(t')\, dt'$. (7) [sent-89, score-0.521]

44 This learning rule has structural similarities with the Bienenstock-Cooper-Munro (BCM) rule [2]: the integral term has the structure of an eligibility trace that is driven by a simple Hebbian learning rule. [sent-90, score-0.344]
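
A minimal simulation sketch of rule (7) for a single Poisson neuron. The exponential transfer function $\rho(u) = \rho_0 e^u$ (for which $\partial_u \rho = \rho$), the exponential PSP kernel, and the target-count reward are assumptions made only for this illustration.

import numpy as np

rng = np.random.default_rng(1)
dt, T, tau = 1e-3, 0.5, 0.02       # time step, trial length, PSP time constant (s)
n_in, rate_in = 50, 10.0           # number of inputs and their Poisson rate (Hz)
eta, rho0 = 0.01, 20.0
w = rng.normal(0.0, 0.1, n_in)

def trial(w, N_target=10):
    psp = np.zeros(n_in)
    elig = np.zeros(n_in)          # integral of [d_u rho](t) * PSP_i(t) dt
    mu = 0.0                       # expected spike count, mu = int_0^T rho dt
    for _ in range(int(T / dt)):
        psp = psp * np.exp(-dt / tau) + (rng.random(n_in) < rate_in * dt)
        rho = rho0 * np.exp(w @ psp)   # assumed transfer function, d(rho)/du = rho
        mu += rho * dt
        elig += rho * psp * dt
    N = rng.poisson(mu)            # spike count of the Poisson neuron
    R = -abs(N - N_target)         # assumed toy reward: match a target count
    return w + eta * R * (N - mu) / mu * elig, N

for _ in range(2000):
    w, N = trial(w)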

45 In addition, learning is modulated by a factor that compares the current spike count (“rate”) with the expected spike count (“sliding threshold” in BCM theory). [sent-91, score-1.39]

46 Interestingly, the functional role of this factor is very different from the one in the original BCM rule: It is not meant to introduce selectivity [2], but rather to exploit trial fluctuations around the mean spike count to explore the structure of the reward landscape. [sent-92, score-1.157]

47 In each trial, the input spike trains are generated anew from Poisson processes with these neuron- and state-specific rates. [sent-98, score-0.716]

48 The agent chooses its action stochastically with probabilities that are proportional to the spike counts of two output neurons: $p(a_k|s) = N_k/(N_1 + N_2)$. [sent-99, score-0.729]

49 Because the spike counts depend on the state via the presynaptic firing rates, the agent can choose different actions for different states. [sent-100, score-0.801]

50 Figure 1B and C show that the learning rule learns the task by suppressing activity in the neuron that encodes the punished action. [sent-101, score-0.371]
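
A sketch of the two-state, two-action task in a rate-based reading: spike counts are Poisson with state-dependent means, actions are drawn with $p(a_k|s) = N_k/(N_1 + N_2)$, and the weights follow the rate-level essence of rule (7) with $\partial_{w_i}\mu = x_i T$ under a linear rate model. The state-specific input rates and the reward scheme are assumptions, not the paper's exact parameters.

import numpy as np

rng = np.random.default_rng(2)
x = {0: np.array([20.0, 5.0]),     # assumed state-specific presynaptic rates (Hz)
     1: np.array([5.0, 20.0])}
W = rng.uniform(0.02, 0.05, (2, 2))    # W[k]: weights of output neuron k
eta, T = 1e-4, 0.5

for step in range(20000):
    s = int(rng.integers(2))
    mu = np.maximum(W @ x[s] * T, 1e-3)    # expected spike counts (linear rate model)
    N = rng.poisson(mu)
    if N.sum() == 0:
        continue                           # no spikes, no action this trial
    a = int(rng.random() < N[1] / N.sum()) # p(a_k|s) = N_k / (N_1 + N_2)
    R = 1.0 if a == s else -1.0            # assumed: action matching the state is rewarded
    for k in range(2):                     # rate-level analogue of rule (7)
        W[k] += eta * R * (N[k] - mu[k]) / mu[k] * x[s] * T
    W = np.maximum(W, 1e-4)                # keep rates non-negative

In this toy version the punished neuron's weights shrink because its count fluctuations correlate negatively with reward, which is the suppression effect reported in Figure 1B and C.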

51 The sharpness parameter γ is set to either γ = 1 (for the 2-armed bandit task) or γ = 3 (for the spike latency task). [sent-103, score-0.711]

52 Moreover, the postsynaptic neurons have a membrane potential reset after each spike (i.e., relative refractoriness), so that the assumption of a Poisson distribution for the spike counts is not necessarily fulfilled. [sent-104, score-0.883]

54 2 Spike Count Codes: Intrinsic plasticity Let us now assume that the rate of the neuron is given by a function $\rho(u) = g(\gamma(u - u_0))$ which depends on the threshold parameters $u_0$ and γ. [sent-109, score-0.338]

55 By intrinsic plasticity we mean that the parameters u0 and γ are learned instead of or in addition to the synaptic weights. [sent-111, score-0.391]
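
The summary does not display the intrinsic rules; under the same Poisson spike count model they follow from the pattern of rule (7) by replacing $\partial_{w_i}\mu$ with the derivatives of $\mu = \int_0^T g(\gamma(u(t)-u_0))\, dt$ with respect to $u_0$ and γ. This is a reconstruction, not quoted from the paper:
\[
\partial_t u_0 = -\eta R\, \frac{N-\mu}{\mu} \int_0^T \gamma\, g'\!\bigl(\gamma(u(t)-u_0)\bigr)\, dt ,
\qquad
\partial_t \gamma = \eta R\, \frac{N-\mu}{\mu} \int_0^T \bigl(u(t)-u_0\bigr)\, g'\!\bigl(\gamma(u(t)-u_0)\bigr)\, dt .
\]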

56 3 First Spike-Latency Code: Synaptic plasticity As a second coding scheme, let us assume that the reward depends only on the latency $\hat{t}$ of the first spike after stimulus onset. [sent-115, score-1.489]

57 More precisely, we assume that each trial starts with the onset of the presynaptic spike trains X and that a reward is delivered at the time of the first spike. [sent-116, score-1.287]

58 The reward depends on the latency of that spike, so that certain latencies are favored. [sent-117, score-0.573]

59 The probability of choosing the actions is proportional to the spike counts of two output neurons: $p(a_k|s) = N_k/(N_1 + N_2)$. [sent-122, score-0.667]

60 Blue: Spike count learning rule (7), Red: Full spike train rule (16). [sent-124, score-1.081]

61 C Evolution of the spike count in response to the two input states during learning. [sent-125, score-0.705]

62 Both rewards (panel B) and spike counts (panel C) are low-pass filtered with a time constant of 4000 trials. [sent-126, score-0.613]

63 D Learning of first spike latencies with the latency rule (11). [sent-127, score-0.878]

64 Two different output neurons are to learn to fire their first spike at given target latencies $L_{1,2}$. [sent-128, score-0.783]

65 We present one of two fixed input spike train patterns (“stimuli”) to the neurons in randomly interleaved trials. [sent-129, score-0.813]

66 The input spike train for each input neuron is drawn separately for each stimulus by sampling once from a Poisson process with a rate of 10 Hz. [sent-130, score-0.967]

67 The colored curves show that the first spike latencies of neuron 1 (green, red) and neuron 2 (purple, blue) converge to the target latencies. [sent-132, score-0.891]

68 The black curve (scale on the right axis) shows the evolution of the reward during learning. [sent-133, score-0.394]

69 The probability distribution of the spike latency is given by the product of the firing probability at time t and the probability that the neuron did not fire earlier: $P(t) = \rho(t)\, \exp\!\left(-\int_0^t \rho(t')\, dt'\right)$. (10) [sent-134, score-0.858]
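
A small numerical sketch of eq. (10): given a time-varying rate ρ(t), the survival function $\exp(-\int_0^t \rho)$ turns the instantaneous firing probability into a first-spike density, which can also be sampled by inverting the cumulative distribution. The rate profile below is an arbitrary assumption.

import numpy as np

rng = np.random.default_rng(3)
dt = 1e-4
t = np.arange(0.0, 0.2, dt)
rho = 20.0 + 380.0 * np.exp(-t / 0.02)      # assumed time-varying firing rate (Hz)

survival = np.exp(-np.cumsum(rho) * dt)     # P(no spike before t)
P = rho * survival                          # eq. (10): first-spike latency density
print("probability mass below 200 ms:", (P * dt).sum())

# Draw latencies by inverting the CDF 1 - survival
u = rng.random(10000)
idx = np.minimum(np.searchsorted(1.0 - survival, u), len(t) - 1)
latencies = t[idx]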

70 4 The Full Spike Train Code: Synaptic plasticity Finally, let us consider the most general coding feature, namely, the full spike train. [sent-137, score-0.927]

71 Let us start with a time-discretized version of the spike train with a discretization that is sufficiently narrow to allow at most one spike per time bin. [sent-138, score-1.275]

72 In each time bin $[t, t + \Delta t]$, the number of spikes $Y_t$ follows a Bernoulli distribution with spiking probability $p_t$, which depends on the input and on the recent history of the neuron. [sent-139, score-0.334]

73 In the limit $\Delta t \to 0$, $p_t$ can be approximated by $p_t \to \rho\, \Delta t$, which leads to the continuous-time version of the rule: $\partial_t w_i = \eta R \lim_{\Delta t \to 0} \sum_t \frac{Y_t - \rho_t \Delta t}{\rho_t}\, \partial_u \rho_t\, \mathrm{PSP}_i(t)$ (15) $= \eta R \int \frac{Y(t) - \rho(t)}{\rho(t)}\, [\partial_u \rho](t)\, \mathrm{PSP}_i(t)\, dt$. (16) [sent-141, score-0.431]
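
A discretized sketch of eqs. (15) and (16): the per-bin score $(Y_t - \rho_t \Delta t)/\rho_t \cdot \partial_u \rho_t \cdot \mathrm{PSP}_i(t)$ is accumulated into an eligibility trace and multiplied by the reward at the end of the trial. The exponential transfer function and the toy reward are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(4)
dt, T, tau = 1e-3, 0.3, 0.02
n_in, rate_in = 30, 10.0
eta, rho0 = 0.005, 20.0
w = rng.normal(0.0, 0.1, n_in)

def full_train_trial(w):
    psp = np.zeros(n_in)
    elig = np.zeros(n_in)
    n_spikes = 0
    for _ in range(int(T / dt)):
        psp = psp * np.exp(-dt / tau) + (rng.random(n_in) < rate_in * dt)
        rho = rho0 * np.exp(w @ psp)              # assumed transfer, d(rho)/du = rho
        y = float(rng.random() < min(rho * dt, 1.0))  # Bernoulli spike, p_t ~ rho*dt
        n_spikes += y
        elig += (y - rho * dt) / rho * rho * psp  # summand of eq. (15)
    R = 1.0 if n_spikes >= 5 else 0.0             # assumed toy reward on the whole train
    return w + eta * R * elig

for _ in range(1000):
    w = full_train_trial(w)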

74 4 Why use code-specific rules when more general rules are available? [sent-148, score-0.38]

75 Obviously, the learning rule (16) is the most general in the sense that it considers the whole spike train as a coding feature. [sent-149, score-1.078]

76 Say we have a learning rule for two coding features $F_1$ and $F_2$, of which only $F_1$ is correlated with reward. [sent-152, score-0.480]

77 This can be achieved by subtracting a suitable reward baseline from the current reward. [sent-159, score-0.394]

78 Ideally, this should be done in a stimulus-specific way (because µ1 depends on the stimulus), which leads to the notion of a reward prediction error instead of a pure reward signal. [sent-160, score-0.893]

79 This approach is in line with both standard reinforcement learning theory [4] and the proposal that neuromodulatory signals like dopamine represent reward prediction error instead of reward alone. [sent-161, score-1.063]

80 This corresponds to using the learning rule for those features that are in fact correlated with reward while suppressing those that are not correlated with reward. [sent-164, score-0.694]

81 In extreme cases, where a very general rule is used for a very specific task, a very large number of coding dimensions may merely give rise to noise in the learning dynamics, while only one is relevant and causes systematic changes. [sent-166, score-0.458]

82 These considerations suggest that the spike count rule (7) should outperform the full spike train rule (16) in tasks where the reward is based purely on spike count. [sent-167, score-2.61]

83 5 Inherent Prediction Problems As shown in section 4, the policy-gradient rule with a reduced amount of noise in the gradient estimate is one that takes only the relevant coding features into account and subtracts the trial mean of the reward: $\partial_t \theta = \eta\, \bigl(R - R(\mu_1, \mu_2, \dots)\bigr) \sum_j \frac{F_j - \mu_j}{\sigma_j^2}\, \partial_\theta \mu_j$. (21) [sent-172, score-0.575]

84 This learning rule has a conceptually interesting structure: learning takes place only when two conditions are fulfilled: the animal did something unexpected ($F_j - \mu_j$) and receives an unexpected reward ($R - R(\mu_1, \mu_2, \dots)$). [sent-175, score-0.644]

85 Moreover, it raises two interesting prediction problems: (a) the prediction of the trial average µj of the coding feature conditioned on the stimulus and (b) the reward that is expected if the coding feature takes its mean value. [sent-179, score-1.278]
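
A compact sketch of eq. (21) with both predictors learned as running averages. The scalar Gaussian feature, the toy reward, and the exponential averaging are assumptions chosen to expose the two prediction problems, not the paper's mechanism.

import numpy as np

rng = np.random.default_rng(5)
theta = 0.0          # parameter controlling the mean of the coding feature
mu_hat = 0.0         # prediction problem (a): trial average of the feature
R_hat = 0.0          # prediction problem (b): reward expected at the mean
eta, alpha, sigma2 = 0.02, 0.05, 1.0

for _ in range(5000):
    F = rng.normal(theta, np.sqrt(sigma2))   # coding feature, d(mu)/d(theta) = 1
    R = -(F - 1.5) ** 2                      # assumed toy reward, maximal at F = 1.5
    # eq. (21): update only from unexpected feature times unexpected reward
    theta += eta * (R - R_hat) * (F - mu_hat) / sigma2
    mu_hat += alpha * (F - mu_hat)           # bottom-up estimate of the feature mean
    R_hat += alpha * (R - R_hat)             # critic-like reward prediction

print(theta)   # settles near 1.5, the reward-maximizing feature mean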

86 1 Prediction of the coding feature In the cases where we could derive the learning rule analytically, the trial average of the coding feature could be calculated from intrinsic properties of the neuron like its membrane potential. [sent-181, score-1.073]

87 This should be particularly problematic when trying to extend the framework to coding features of populations, where the population would need to know, e.g., the mean response of its constituent neurons. [sent-183, score-0.326]

88 Learning would in this case be modulated by the mismatch of a top-down prediction of the coding feature - represented by µj (X) - and the real value of Fj , which is calculated by a “bottom-up” approach. [sent-189, score-0.408]

89 Another prediction system for the expected response could be a population coding scheme, in which a population of neurons is receiving the same input and should produce the same output. [sent-195, score-0.557]

90 Any neuron of the population could receive the average population activity as a prediction of its own mean response. [sent-196, score-0.313]

91 It would be interesting to study the relation of such an approach with the one recently proposed for reinforcement learning in populations of spiking neurons [11]. [sent-197, score-0.361]

92 2 Reward prediction The other quantity that should be predicted in the learning rule is the reward one would get if the coding feature took the value of its mean. [sent-199, score-0.906]

93 In classical reinforcement learning, this term is often calculated in an actor-critic architecture, where some external module - the critic - learns the expected future reward either for states alone or for state-action pairs. [sent-201, score-0.546]

94 These values are then used to calculate the expected reward for the current state or state-action pair. [sent-202, score-0.394]

95 The difference between the reward that was really received and the predicted reward is then used as a reward prediction error that drives learning. [sent-203, score-1.259]

96 There is evidence that dopamine signals in the brain encode prediction error rather than reward alone [7]. [sent-204, score-0.559]

97 6 Discussion We have presented a general framework for deriving policy-gradient rules for spiking neurons and shown that different learning rules emerge depending on which features of the spike trains are assumed to influence the reward signals. [sent-205, score-1.758]

98 For exponentially distributed coding features, the learning rule has a characteristic structure, which allows a simple intuitive interpretation. [sent-208, score-0.435]

99 The fact that there is a whole class of code-specific policy-gradient learning rules opens the interesting possibility that neuronal learning rules could be controlled by metalearning processes that shape the learning rule according to what neural code is in effect. [sent-210, score-0.782]

100 From the biological perspective, it would be interesting to compare spike-based synaptic plasticity in different brain regions that are thought to use different neural codes and see if there are systematic differences. [sent-211, score-0.445]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('spike', 0.58), ('reward', 0.394), ('coding', 0.235), ('synaptic', 0.201), ('rules', 0.19), ('pt', 0.153), ('pspi', 0.149), ('rule', 0.147), ('neuron', 0.133), ('presynaptic', 0.112), ('plasticity', 0.112), ('neurons', 0.108), ('reinforcement', 0.107), ('trains', 0.102), ('postsynaptic', 0.101), ('fi', 0.096), ('ring', 0.094), ('train', 0.091), ('count', 0.091), ('fj', 0.088), ('latency', 0.081), ('prediction', 0.077), ('credit', 0.076), ('nef', 0.074), ('membrane', 0.072), ('spiking', 0.07), ('latencies', 0.07), ('trial', 0.069), ('dt', 0.064), ('code', 0.062), ('wi', 0.061), ('stimulus', 0.059), ('codes', 0.058), ('intrinsic', 0.055), ('sensory', 0.054), ('features', 0.051), ('bandit', 0.05), ('spikes', 0.049), ('agent', 0.047), ('tf', 0.047), ('poisson', 0.045), ('bcm', 0.045), ('neuronal', 0.045), ('action', 0.044), ('ci', 0.041), ('population', 0.04), ('yt', 0.038), ('uctuations', 0.036), ('rate', 0.036), ('getting', 0.035), ('evoked', 0.035), ('input', 0.034), ('signals', 0.033), ('assignment', 0.033), ('counts', 0.033), ('suppressing', 0.033), ('dopamine', 0.033), ('punished', 0.033), ('rewarded', 0.033), ('timed', 0.033), ('raises', 0.031), ('xie', 0.03), ('delivered', 0.03), ('florian', 0.03), ('gerstner', 0.03), ('lausanne', 0.03), ('actions', 0.029), ('populations', 0.029), ('threshold', 0.029), ('depends', 0.028), ('baxter', 0.028), ('unexpected', 0.028), ('feature', 0.028), ('characteristic', 0.028), ('systematic', 0.027), ('depending', 0.026), ('gradient', 0.026), ('shape', 0.026), ('neural', 0.025), ('seung', 0.025), ('output', 0.025), ('learning', 0.025), ('policy', 0.025), ('narrow', 0.024), ('ster', 0.024), ('noise', 0.024), ('derive', 0.023), ('modulated', 0.023), ('receiving', 0.023), ('mean', 0.023), ('simulations', 0.023), ('neuroscience', 0.023), ('calculated', 0.023), ('ful', 0.022), ('emerge', 0.022), ('interesting', 0.022), ('alone', 0.022), ('potential', 0.022), ('correlated', 0.022), ('mismatch', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

2 0.29749015 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

Author: Jean-pascal Pfister, Peter Dayan, Máté Lengyel

Abstract: Synapses exhibit an extraordinary degree of short-term malleability, with release probabilities and effective synaptic strengths changing markedly over multiple timescales. From the perspective of a fixed computational operation in a network, this seems like a most unacceptable degree of added variability. We suggest an alternative theory according to which short-term synaptic plasticity plays a normatively-justifiable role. This theory starts from the commonplace observation that the spiking of a neuron is an incomplete, digital, report of the analog quantity that contains all the critical information, namely its membrane potential. We suggest that a synapse solves the inverse problem of estimating the pre-synaptic membrane potential from the spikes it receives, acting as a recursive filter. We show that the dynamics of short-term synaptic depression closely resemble those required for optimal filtering, and that they indeed support high quality estimation. Under this account, the local postsynaptic potential and the level of synaptic resources track the (scaled) mean and variance of the estimated presynaptic membrane potential. We make experimentally testable predictions for how the statistics of subthreshold membrane potential fluctuations and the form of spiking non-linearity should be related to the properties of short-term plasticity in any particular cell type. 1

3 0.28653556 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system. 1

4 0.27131864 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint secondorder statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing. 1

5 0.24405544 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

Author: Tao Hu, Anthony Leonardo, Dmitri B. Chklovskii

Abstract: One of the central problems in neuroscience is reconstructing synaptic connectivity in neural circuits. Synapses onto a neuron can be probed by sequentially stimulating potentially pre-synaptic neurons while monitoring the membrane voltage of the post-synaptic neuron. Reconstructing a large neural circuit using such a “brute force” approach is rather time-consuming and inefficient because the connectivity in neural circuits is sparse. Instead, we propose to measure a post-synaptic neuron’s voltage while stimulating sequentially random subsets of multiple potentially pre-synaptic neurons. To reconstruct these synaptic connections from the recorded voltage we apply a decoding algorithm recently developed for compressive sensing. Compared to the brute force approach, our method promises significant time savings that grow with the size of the circuit. We use computer simulations to find optimal stimulation parameters and explore the feasibility of our reconstruction method under realistic experimental conditions including noise and non-linear synaptic integration. Multineuronal stimulation allows reconstructing synaptic connectivity just from the spiking activity of post-synaptic neurons, even when sub-threshold voltage is unavailable. By using calcium indicators, voltage-sensitive dyes, or multi-electrode arrays one could monitor activity of multiple postsynaptic neurons simultaneously, thus mapping their synaptic inputs in parallel, potentially reconstructing a complete neural circuit. 1

6 0.23176068 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

7 0.20742975 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

8 0.20480427 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

9 0.20339349 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

10 0.16798502 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

11 0.16574836 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

12 0.16134655 242 nips-2009-The Infinite Partially Observable Markov Decision Process

13 0.15652418 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

14 0.15150405 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

15 0.14354061 163 nips-2009-Neurometric function analysis of population codes

16 0.12168708 164 nips-2009-No evidence for active sparsification in the visual cortex

17 0.11468047 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

18 0.10118252 53 nips-2009-Complexity of Decentralized Control: Special Cases

19 0.097353257 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

20 0.095748879 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.2), (1, -0.237), (2, 0.496), (3, 0.122), (4, -0.039), (5, -0.081), (6, -0.194), (7, 0.055), (8, 0.071), (9, 0.139), (10, 0.041), (11, 0.067), (12, 0.061), (13, -0.037), (14, -0.094), (15, 0.045), (16, 0.014), (17, -0.025), (18, 0.114), (19, -0.014), (20, 0.043), (21, 0.023), (22, -0.069), (23, 0.126), (24, -0.065), (25, -0.003), (26, 0.103), (27, -0.031), (28, 0.064), (29, 0.007), (30, 0.055), (31, 0.059), (32, 0.028), (33, -0.064), (34, 0.038), (35, -0.02), (36, -0.016), (37, 0.066), (38, -0.036), (39, 0.077), (40, -0.054), (41, -0.007), (42, 0.024), (43, 0.015), (44, 0.039), (45, 0.027), (46, -0.046), (47, 0.025), (48, 0.01), (49, 0.015)]
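
For readers curious how these component weights arise: the vector above is this paper's projection onto the latent components of an LSI model fit over the corpus. The sketch below shows the standard recipe (TF-IDF followed by truncated SVD, with cosine similarity giving simValue-style scores) on a toy stand-in corpus; the abstracts, component count, and preprocessing here are illustrative assumptions, not the actual pipeline behind this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the NIPS 2009 abstracts (illustrative only).
abstracts = [
    "policy gradient rules for spiking neurons and neural codes",
    "correlation coefficients are insufficient for spike count dependencies",
    "optimal information storage in noisy synapses under resource constraints",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(abstracts)           # documents x terms

svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)                     # documents x latent components

# Each row of Z plays the role of the component-weight vector shown above.
sims = cosine_similarity(Z[0:1], Z).ravel()  # simValue-style scores vs. paper 0
ranking = sims.argsort()[::-1]
print(list(zip(ranking, sims[ranking])))
```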

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97774369 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner


2 0.79748321 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

Author: Romain Brasselet, Roland Johansson, Angelo Arleo

Abstract: We study an encoding/decoding mechanism accounting for the relative spike timing of the signals propagating from peripheral nerve fibers to second-order somatosensory neurons in the cuneate nucleus (CN). The CN is modeled as a population of spiking neurons receiving as inputs the spatiotemporal responses of real mechanoreceptors obtained via microneurography recordings in humans. The efficiency of the haptic discrimination process is quantified by a novel definition of entropy that takes into full account the metrical properties of the spike train space. This measure proves to be a suitable decoding scheme for generalizing the classical Shannon entropy to spike-based neural codes. It permits an assessment of neurotransmission in the presence of a large output space (i.e. hundreds of spike trains) with 1 ms temporal precision. It is shown that the CN population code performs a complete discrimination of 81 distinct stimuli already within 35 ms of the first afferent spike, whereas a partial discrimination (80% of the maximum information transmission) is possible as rapidly as 15 ms. This study suggests that the CN may not constitute a mere synaptic relay along the somatosensory pathway but, rather, it may convey optimal contextual accounts (in terms of fast and reliable information transfer) of peripheral tactile inputs to downstream structures of the central nervous system.
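
The entropy measure in this abstract is defined over a metric space of spike trains. One widely used spike train metric that makes this concrete is the Victor-Purpura distance, sketched below as background; the cost parameter q, and the suggestion that this is the paper's exact metric, are illustrative assumptions rather than details taken from the abstract.

```python
def victor_purpura(s1, s2, q=1.0):
    """Victor-Purpura spike train distance: minimal cost of transforming
    spike train s1 into s2, where inserting or deleting a spike costs 1
    and shifting a spike by dt costs q * |dt|."""
    n, m = len(s1), len(s2)
    # d[i][j]: distance between the first i spikes of s1 and first j of s2
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i               # delete all remaining spikes of s1
    for j in range(1, m + 1):
        d[0][j] = j               # insert all remaining spikes of s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(s1[i - 1] - s2[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,           # delete a spike
                          d[i][j - 1] + 1,           # insert a spike
                          d[i - 1][j - 1] + shift)   # shift a spike
    return d[n][m]

# Spike times in seconds; q = 200 /s makes a 5 ms shift cost 1.0
print(victor_purpura([0.010, 0.035], [0.012, 0.040, 0.090], q=200.0))
```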

3 0.74460256 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two-neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows us to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25% and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poisson-like marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models.
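
As background for the copula construction described above, the sketch below couples two Poisson marginals through a Gaussian copula and then measures the linear correlation of the resulting counts. The Gaussian family, the rates, and the latent correlation are illustrative choices, not the specific copula the paper introduces.

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(0)
rho = 0.8                                    # latent Gaussian dependence
cov = np.array([[1.0, rho], [rho, 1.0]])

# Sample from a Gaussian copula and map to Poisson marginals (rates 5 and 8)
z = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
u = norm.cdf(z)                              # dependent uniform marginals
counts = np.column_stack([poisson.ppf(u[:, 0], mu=5),
                          poisson.ppf(u[:, 1], mu=8)])

# Pearson correlation of the counts: typically a bit below the latent rho,
# and in general it does not pin down the full dependence structure
print(np.corrcoef(counts.T)[0, 1])
```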

4 0.72640795 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

Author: Jean-pascal Pfister, Peter Dayan, Máté Lengyel

Abstract: Synapses exhibit an extraordinary degree of short-term malleability, with release probabilities and effective synaptic strengths changing markedly over multiple timescales. From the perspective of a fixed computational operation in a network, this seems like a most unacceptable degree of added variability. We suggest an alternative theory according to which short-term synaptic plasticity plays a normatively-justifiable role. This theory starts from the commonplace observation that the spiking of a neuron is an incomplete, digital, report of the analog quantity that contains all the critical information, namely its membrane potential. We suggest that a synapse solves the inverse problem of estimating the pre-synaptic membrane potential from the spikes it receives, acting as a recursive filter. We show that the dynamics of short-term synaptic depression closely resemble those required for optimal filtering, and that they indeed support high quality estimation. Under this account, the local postsynaptic potential and the level of synaptic resources track the (scaled) mean and variance of the estimated presynaptic membrane potential. We make experimentally testable predictions for how the statistics of subthreshold membrane potential fluctuations and the form of spiking non-linearity should be related to the properties of short-term plasticity in any particular cell type.
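
The "synapse as recursive filter" idea can be illustrated with an ordinary scalar Kalman filter tracking a latent, membrane-potential-like signal from noisy observations. This is a generic sketch under simplifying assumptions (Gaussian measurements in place of spikes, no depression dynamics), not the paper's synaptic estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
a, q, r = 0.95, 0.05, 0.5   # AR(1) decay, process noise, observation noise

# Simulate a latent potential u_t and noisy observations y_t
T = 200
u = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    u[t] = a * u[t - 1] + rng.normal(0, np.sqrt(q))
    y[t] = u[t] + rng.normal(0, np.sqrt(r))

# Recursive filter: track the posterior mean and variance of u_t
mu, var = 0.0, 1.0
est = np.zeros(T)
for t in range(T):
    mu, var = a * mu, a**2 * var + q               # predict
    k = var / (var + r)                            # Kalman gain
    mu, var = mu + k * (y[t] - mu), (1 - k) * var  # update
    est[t] = mu

print(float(np.corrcoef(u, est)[0, 1]))  # estimate tracks the latent signal
```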

5 0.68062854 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge

Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint second-order statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing.

6 0.67799419 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

7 0.65514016 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

8 0.63299 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

9 0.60546201 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

10 0.55556679 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

11 0.54111302 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

12 0.52643996 163 nips-2009-Neurometric function analysis of population codes

13 0.47573027 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

14 0.38786641 164 nips-2009-No evidence for active sparsification in the visual cortex

15 0.38288704 13 nips-2009-A Neural Implementation of the Kalman Filter

16 0.33743334 242 nips-2009-The Infinite Partially Observable Markov Decision Process

17 0.32818708 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

18 0.30695406 53 nips-2009-Complexity of Decentralized Control: Special Cases

19 0.30227837 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

20 0.2826204 134 nips-2009-Learning to Explore and Exploit in POMDPs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.041), (25, 0.042), (35, 0.03), (36, 0.078), (39, 0.036), (44, 0.021), (58, 0.047), (61, 0.078), (62, 0.02), (71, 0.048), (81, 0.077), (86, 0.069), (91, 0.318)]
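
As with the LSI vector earlier, the LDA weights above are a document-topic distribution. A minimal sketch of producing such sparse (topicId, topicWeight) pairs with a standard topic model, again on a stand-in corpus with arbitrary settings rather than the page's actual configuration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for the NIPS 2009 abstracts (illustrative only).
abstracts = [
    "policy gradient learning rules for spiking neurons",
    "bayesian belief polarization and rational inference",
    "faces and poses annotated jointly from news captions",
]

counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)   # documents x topics, rows sum to 1

# Sparse (topicId, topicWeight) pairs, as in the listing above
doc0 = [(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.01]
print(doc0)
```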

similar papers list:

simIndex simValue paperId paperTitle

1 0.93615782 39 nips-2009-Bayesian Belief Polarization

Author: Alan Jern, Kai-min Chang, Charles Kemp

Abstract: Empirical studies have documented cases of belief polarization, where two people with opposing prior beliefs both strengthen their beliefs after observing the same evidence. Belief polarization is frequently offered as evidence of human irrationality, but we demonstrate that this phenomenon is consistent with a fully Bayesian approach to belief revision. Simulation results indicate that belief polarization is not only possible but relatively common within the set of Bayesian models that we consider.

2 0.89246833 259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Author: Jie Luo, Barbara Caputo, Vittorio Ferrari

Abstract: Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e. associate names and action verbs in the captions to the face and body pose of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.

same-paper 3 0.87106001 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner


4 0.71409333 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

Author: Kaushik Sinha, Mikhail Belkin

Abstract: We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the data has clusters, that is, when the high density regions are sufficiently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combinations of such eigenvectors (or, more precisely, of their Nyström extensions) provide good candidates for classification functions when the cluster assumption holds. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier that has a very sparse representation in this basis. Importantly, the sparsity corresponds naturally to the cluster assumption. Experimental results on a number of real-world datasets show that our method is competitive with state-of-the-art semi-supervised learning algorithms and outperforms the natural baseline algorithm (Lasso in the Kernel PCA basis).
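
A hedged sketch of the recipe this abstract describes — compute a kernel matrix over all points, take its leading eigenvectors as a basis, and fit Lasso on the labeled subset. The synthetic data, kernel width, eigenvector count, and regularization strength are all illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Two Gaussian clusters; only a handful of points carry labels
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])
labeled = rng.choice(200, size=10, replace=False)

K = rbf_kernel(X, gamma=0.5)
eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
basis = eigvecs[:, -20:]               # top 20 eigenvectors of K

# Lasso selects a sparse combination of eigenvectors from the labeled points
clf = Lasso(alpha=0.01).fit(basis[labeled], y[labeled])
pred = np.sign(basis @ clf.coef_ + clf.intercept_)
print((pred == y).mean())              # accuracy on all points
```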

5 0.65083325 166 nips-2009-Noisy Generalized Binary Search

Author: Robert Nowak

Abstract: This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions.
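
The greedy splitting step at the heart of GBS is easy to write down. The noiseless sketch below selects, at each step, the query that most evenly splits the remaining hypotheses; the noise-tolerant machinery that is the paper's actual contribution is omitted here.

```python
def gbs(hypotheses, queries, oracle):
    """Greedy Generalized Binary Search (noiseless sketch).
    hypotheses: candidate functions h(q) -> +1/-1; oracle: true labeling."""
    active = list(hypotheses)
    while len(active) > 1:
        # Pick the query that most evenly splits the active hypotheses
        q = min(queries, key=lambda q: abs(sum(h(q) for h in active)))
        label = oracle(q)
        active = [h for h in active if h(q) == label]
    return active[0]

# Example: identify a 1-D threshold function from a few queries
thresholds = [1, 2, 3, 4, 5, 6, 7]
hyps = [lambda x, t=t: 1 if x >= t else -1 for t in thresholds]
truth = hyps[4]                         # hidden true hypothesis
found = gbs(hyps, queries=range(8), oracle=truth)
print(found(0), found(7))               # agrees with the true hypothesis
```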

6 0.5666185 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

7 0.55455816 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

8 0.53969121 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

9 0.52905381 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

10 0.52608132 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

11 0.51657182 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

12 0.50510049 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

13 0.50297236 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

14 0.50156569 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

15 0.4929001 163 nips-2009-Neurometric function analysis of population codes

16 0.49223143 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

17 0.48160139 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

18 0.47972244 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

19 0.47891933 66 nips-2009-Differential Use of Implicit Negative Evidence in Generative and Discriminative Language Learning

20 0.47670907 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism