nips nips2003 nips2003-27 knowledge-graph by maker-knowledge-mining

27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics


Source: pdf

Author: Bernd Porr, Ausra Saudargiene, Florentin Wörgötter

Abstract: Spike timing plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. Coming from the presynaptic side induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel. Thus, for this calculation we utilise biophysical variables of the physiological cell. The final result shows a weight change curve which conforms with measurements from biology. The positive part of the weight change curve is determined by the NMDA activation. The negative part of the weight change curve is determined by the membrane potential change. Therefore, the weight change curve should change its shape depending on the distance from the soma of the postsynaptic cell. We find temporally asymmetric weight change close to the soma and temporally symmetric weight change in the distal dendrite. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Analytical solution of spike-timing dependent plasticity based on synaptic biophysics Bernd Porr, Ausra Saudargiene and Florentin Wörgötter, Computational Neuroscience, Psychology, University of Stirling, FK9 4LR Stirling, UK {Bernd. [sent-1, score-0.354]

2 uk Abstract Spike timing plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. [sent-5, score-0.917]

3 On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. [sent-6, score-0.985]

4 We argue that postsynaptically the temporal change of the membrane potential determines the weight change. [sent-7, score-0.507]

5 Coming from the presynaptic side induction of STDP is closely related to the activation of NMDA channels. [sent-8, score-0.103]

6 Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel. [sent-9, score-0.756]

7 Thus, for this calculation we utilise biophysical variables of the physiological cell. [sent-10, score-0.153]

8 The final result shows a weight change curve which conforms with measurements from biology. [sent-11, score-0.185]

9 The positive part of the weight change curve is determined by the NMDA activation. [sent-12, score-0.185]

10 The negative part of the weight change curve is determined by the membrane potential change. [sent-13, score-0.502]

11 Therefore, the weight change curve should change its shape depending on the distance from the soma of the postsynaptic cell. [sent-14, score-0.554]

12 We find temporally asymmetric weight change close to the soma and temporally symmetric weight change in the distal dendrite. [sent-15, score-0.578]

13 1 Introduction Donald Hebb [1] postulated half a century ago that the change of synaptic strength depends on the correlation of pre- and postsynaptic activity: cells which fire together wire together. [sent-16, score-0.43]

14 Here we want to concentrate on a special form of correlation based learning, namely, spike timing dependent plasticity (STDP, [2, 3]). [sent-17, score-0.346]

15 STDP is asymmetrical in time: Weights grow if the pre-synaptic event precedes the postsynaptic event. [sent-18, score-0.179]

16 Correlations between pre- and postsynaptic activity can take place at different locations of the cell. [sent-22, score-0.21]

17 Here we will focus on the dendrite of the cell (see Fig. [sent-23, score-0.137]

18 The dendrite has attracted interest recently because of its ability to propagate spikes back from the soma of the cell into its distal regions. [sent-25, score-0.382]

19 The transmission is active which guarantees that the spikes can reach even the distal regions of the dendrite [4]. [sent-27, score-0.385]

20 Backpropagating spikes have been suggested to be the driving force for STDP in the dendrite [5]. [sent-28, score-0.228]

21 On the presynaptic side the main contribution to STDP comes from Ca2+ flow through the NMDA channels [6]. [sent-29, score-0.109]

22 The goal of this study is to derive an analytical solution for STDP on the basis of the biophysical properties of the NMDA channel and the cell membrane. [sent-30, score-0.138]

23 We will show that mainly the timing of the backpropagating spike determines the shape of the learning curve. [sent-31, score-0.542]

24 With fast decaying backpropagating spikes we obtain STDP while with slow decaying backpropagating spikes we approximate temporally symmetric Hebbian learning. [sent-32, score-0.898]

25 Figure 1: Schematic diagram of the model setup. [sent-35, score-0.103]

26 2 The Model The goal is to define a weight change rule which correlates the dynamics of an NMDA channel with a variable which is linked to the dynamics of a backpropagating spike. [sent-38, score-0.543]

27 The precise biophysical mechanisms of STDP are still to a large degree unresolved. [sent-39, score-0.114]

28 It is, however, known that high levels of Ca2+ concentration resulting from Ca2+ influx mainly through NMDA-channels will lead to LTP, while lower levels will lead to LTD. [sent-40, score-0.135]

29 Recent physiological results (reviewed in detail in [10]), however, suggest that not only the Ca2+ concentration but, maybe more importantly, the change of the Ca2+ concentration determines whether LTP or LTD is observed. [sent-42, score-0.295]

30 In our model we assume that the Ca2+ concentration and the membrane potential are highly correlated. [sent-45, score-0.398]

31 Consequently, our learning rule utilises the derivative of the membrane potential for the postsynaptic activity. [sent-46, score-0.583]

32 After having identified the postsynaptic part of the weight change rule we have to define the presynaptic part. [sent-47, score-0.434]

33 This shall be the conductance function of the NMDA channel [6]. [sent-48, score-0.161]

34 The conventional membrane equation reads: C dv(t)/dt = ρ g(t)[E − v(t)] + i_BP(t) + (V_rest − v(t))/R, (1) where v is the membrane potential, ρ the synaptic weight of the NMDA-channel and g, E are its conductance and equilibrium potential, respectively. [sent-49, score-0.841]

35 The current, which a BP-spike elicits, is given by i_BP and the last term represents the passive repolarisation property of the membrane towards its resting potential V_rest = −70 mV. [sent-50, score-0.434]

36 We set the membrane capacitance C = 50 pF and the membrane resistance to R = 100 MΩ. [sent-51, score-0.46]
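Using the constants just given (C = 50 pF, R = 100 MΩ, V_rest = −70 mV), the passive part of Eq. (1) can be integrated with a simple forward-Euler step. This is a minimal sketch in which the NMDA term is dropped and the 100 pA rectangular test current is an arbitrary illustrative stand-in for i_BP, not a fitted BP-spike waveform:

```python
import numpy as np

# Forward-Euler sketch of the passive part of membrane equation (1):
#   C dv/dt = i(t) + (V_rest - v)/R
# C, R and V_rest are the values quoted in the text; the input current
# is a hypothetical 100 pA rectangular pulse, for illustration only.
C = 50e-12        # membrane capacitance, 50 pF
R = 100e6         # membrane resistance, 100 MOhm
V_rest = -70e-3   # resting potential, -70 mV

dt = 1e-5                                   # 0.01 ms step
t = np.arange(0.0, 0.05, dt)                # 50 ms of simulated time
i_in = np.where((t > 0.01) & (t < 0.02), 100e-12, 0.0)  # 100 pA pulse

v = np.empty_like(t)
v[0] = V_rest
for k in range(1, len(t)):
    dv = (i_in[k - 1] + (V_rest - v[k - 1]) / R) / C
    v[k] = v[k - 1] + dv * dt
```

With τ = RC = 5 ms the 10 ms pulse drives v a little under 9 mV above rest (toward V_rest + IR = −60 mV), and v relaxes back to V_rest afterwards.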

37 The NMDA channel has the following equation: g(t) = ḡ (e^(−b1·t) − e^(−a1·t)) / ([a1 − b1][1 + κ e^(−γV(t))]). (2) For simpler notation, in general we use inverse time-constants a1 = τa^−1, b1 = τb^−1, etc. [sent-53, score-0.065]

38 Thus, we adjust for this by defining ḡ = 12 mS/ms which represents the peak conductance (4 nS) multiplied by b1 − a1. [sent-55, score-0.144]

39 Since we will not vary the Mg²⁺ concentration we have already abbreviated: κ = η[Mg²⁺], η = 0. [sent-60, score-0.081]
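A conductance of this double-exponential form with a voltage-dependent Mg²⁺ block in the denominator can be sketched as follows. The time constants and the γ, κ values below are illustrative assumptions (the text truncates η), not the paper's fitted parameters:

```python
import numpy as np

def g_nmda(t, v, g_peak=4e-9, tau_a=2.0, tau_b=40.0, gamma=0.06, kappa=0.33):
    """Sketch of an Eq. (2)-style NMDA conductance: a double exponential
    scaled by a voltage-dependent Mg2+ block.  t in ms (array), v in mV.
    tau_a, tau_b, gamma and kappa are assumed illustrative values."""
    a1, b1 = 1.0 / tau_a, 1.0 / tau_b            # inverse time constants
    shape = np.exp(-b1 * t) - np.exp(-a1 * t)    # rise/decay time course
    shape = shape / shape.max()                  # normalise peak to g_peak
    block = 1.0 / (1.0 + kappa * np.exp(-gamma * v))  # Mg2+ unblocking
    return g_peak * shape * block
```

The block factor releases the channel as the membrane depolarises: with these assumed values the peak conductance at v = −70 mV is suppressed by more than an order of magnitude relative to v = 0 mV.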

40 The shift T > 0 means that the backpropagating spike follows after the trigger of the NMDA channel. [sent-63, score-0.408]

41 The shift T < 0 means that the temporal sequence of the pre- and postsynaptic events is reversed. [sent-64, score-0.214]
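The weight-change curve described here — the NMDA conductance correlated with the derivative of a BP-spike-driven membrane potential, shifted by T — can be sketched numerically. Every shape and time constant below is an illustrative assumption, not a fitted value from the paper:

```python
import numpy as np

dt = 0.05
t = np.arange(-100.0, 300.0, dt)             # time axis (ms)

def alpha_shape(t, tau_rise, tau_decay):
    """Normalised double exponential, zero for t < 0 (illustrative shapes)."""
    tt = np.maximum(t, 0.0)
    y = (np.exp(-tt / tau_decay) - np.exp(-tt / tau_rise)) * (t >= 0)
    return y / y.max()

g_pre = alpha_shape(t, 2.0, 40.0)            # presynaptic NMDA conductance

def weight_change(T, bp_decay=5.0):
    """Delta-rho(T): NMDA conductance correlated with dv/dt of a BP spike
    occurring at t = T (T > 0 means pre before post)."""
    v_bp = alpha_shape(t - T, 0.5, bp_decay)  # BP-spike depolarisation
    dv = np.gradient(v_bp, dt)                # postsynaptic term: dv/dt
    return float(np.sum(g_pre * dv) * dt)
```

With these assumed shapes the curve is temporally asymmetric in the STDP sense: weight_change(T) is positive for pre-before-post (T > 0) and negative for the reversed ordering (T < 0).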

42 4 we have to simplify it, however, without losing biophysical realism. [sent-66, score-0.073]

43 In this paper we are interested in different shapes of backpropagating spikes. [sent-67, score-0.296]

44 The underlying mechanisms which establish backpropagating spikes will not be addressed here. [sent-68, score-0.428]

45 The backpropagating spike shall be simply modelled as a potential change in the dendrite and its shape is determined by its amplitude, its rise time and its decay time. [sent-69, score-0.817]

46 First we observe that the influence of a single (or even a few) NMDA-channels on the membrane potential can be neglected in comparison to a BP-spike1, which, due to active processes, leads to a depolarisation of often more than 50 mV even at distal dendrites [15]. [sent-70, score-0.68]

47 Thus, we can assume that the dynamics of the membrane potential is established by the backpropagating spike and the resting potential V_rest: C dv(t)/dt = i_BP(t) + (V_rest − v(t))/R. (5) This equation can be further simplified. [sent-71, score-0.943]

48 Next we assume that the second, passive repolarisation term can also be absorbed into i_BP, thus resulting in i_total(t) = i_BP(t) + (V_rest − v(t))/R. [sent-72, score-0.236]

49 To this end we model i_total as a derivative of a band-pass filter function: i_total(t) = ī_total (a2 e^(−a2·t) − b2 e^(−b2·t)) / (a2 − b2). (6) ¹Note that in spines, however, synaptic input can lead to large changes in the postsynaptic potential. [sent-73, score-0.797]

50 This filter function causes first an influx of charges into the dendrite and then again an outflux of charges. [sent-76, score-0.137]

51 The time constants a2 and b2 determine the timing of the current flow and therefore the rise and decay time. [sent-77, score-0.141]

52 The total charge flux is zero so that the resting potential is reestablished after a backpropagating spike. [sent-78, score-0.542]

53 In this way the active de- and repolarising properties of a BP-spike can be combined with the passive properties of the membrane, in practice by a curve-fitting procedure which yields a2, b2. [sent-79, score-0.116]

54 As a result we find that the membrane equation in our case reduces to: C dv(t)/dt = i_total(t). (7) We obtain the resulting membrane potential simply by integrating Eq. [sent-80, score-0.797]
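Carrying out this integration of Eq. (7), with i_total from Eq. (6) and assuming v(0) = V_rest, gives a closed form as a quick consistency check:

```latex
v(t) = V_{\mathrm{rest}} + \frac{1}{C}\int_0^t i_{\mathrm{total}}(s)\,ds
     = V_{\mathrm{rest}} + \frac{\bar{\imath}_{\mathrm{total}}}{C}\,
       \frac{e^{-b_2 t} - e^{-a_2 t}}{a_2 - b_2}
```

so v(t) → V_rest as t → ∞, consistent with the zero total charge flux of the BP-spike noted above.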

55 The NMDA conductance g is more complex, because the membrane potential enters the denominator in Eq. [sent-84, score-0.413]

56 We expand around 0 mV and not around the resting potential. [sent-87, score-0.078]

57 Second, the NMDA channel has a strong non-linearity around the resting potential. [sent-91, score-0.143]

58 Towards 0 mV , however, the NMDA channel has a linear voltage/current curve. [sent-92, score-0.065]

59 The NMDA conductance can now be written as: g(t) = ḡ · (e^(−b1·t) − e^(−a1·t))/(a1 − b1) · (1/(κ + 1) + γκv(t)/(κ + 1)² + ...) [sent-94, score-0.096]

60 and finally the potential v(t) (Eq. [sent-97, score-0.087]

61 Mixed influences arise from the second and third terms which scale with the peak current amplitude ī_total of the BP-spike. [sent-108, score-0.073]

62 2 and remain fairly constant, BP-spikes change their shapes along the dendrite. [sent-110, score-0.087]

63 Panels A-C were obtained with different peak currents ī_total = 0. [sent-114, score-0.091]

64 These currents cause peak voltages of 40 mV, 50 mV and 40 mV respectively. [sent-117, score-0.125]

65 This current is unrealistic; however, it is chosen for illustrative purposes to show the different contributions to the learning curve (the dashed lines for G(0), the dotted lines for G(1a,b) and the solid lines for the sum of the two contributions). [sent-120, score-0.066]

66 The contributions of the different terms to the STDP curves are also shown (first term, dashed, as well as second and third term scaled with their fore-factor, dotted). [sent-131, score-0.064]

67 As expected we find that the first term dominates for small (realistic) currents (top panels), while the second and third terms dominate for higher currents (middle panels). [sent-133, score-0.086]

68 4 Discussion We believe that two of our findings could be of longer lasting relevance for the understanding of synaptic learning, provided they withstand physiological scrutiny: 1) The shape of the weight change curves heavily relies on the shape of the backpropagating spike. [sent-135, score-0.807]

69 2) STDP can turn into plain Hebbian learning if the postsynaptic depolarisation (i. [sent-136, score-0.243]

70 Physiological studies suggest that weight change curves can indeed have a widely varying shape (reviewed in [17]). [sent-139, score-0.242]

71 In this study we argue that in particular the shape of the backpropagating spike influences the shape of the weight change curve. [sent-140, score-0.385]

72 In fact the dendrites can be seen as active filters which change the shape of backpropagating spikes during their journey to the distal parts of the dendrite [18]. [sent-141, score-0.947]

73 In particular, the decay time of the BP spike is increased in the distal parts of the dendrite [15]. [sent-142, score-0.398]

74 The different decay times determine if we get pure symmetric Hebbian learning or STDP (see Fig. [sent-143, score-0.065]

75 Thus, the theoretical result would suggest temporally symmetric Hebbian learning in the distal dendrites and STDP in the proximal dendrites. [sent-145, score-0.322]

76 From a computational perspective this would mean that the distal dendrites perform principal component analysis [19] and the proximal dendrites temporal sequence learning [20]. [sent-146, score-0.416]

77 Such models can either adopt a rather descriptive approach [21], where appropriate functions are being fit to the measured weight change curves. [sent-149, score-0.155]

78 Those models establish a more realistic relation between calcium concentration and membrane potential. [sent-151, score-0.456]

79 The calcium concentration seems to be a low-pass filtered version of the membrane potential [24]. [sent-152, score-0.543]

80 Such a low-pass filter h_low could be added to the learning rule Eq. [sent-153, score-0.066]
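One way to sketch such a stage (purely as an assumption — the text does not specify the form of h_low) is a first-order exponential low-pass applied to the postsynaptic signal before it enters the correlation:

```python
import numpy as np

def lowpass(x, dt, tau):
    """First-order exponential low-pass: a hypothetical stand-in for the
    h_low filter mentioned in the text; tau is an assumed time constant."""
    y = np.zeros_like(x, dtype=float)
    for k in range(1, len(x)):
        # discrete update of  tau * dy/dt = x - y
        y[k] = y[k - 1] + (dt / tau) * (x[k - 1] - y[k - 1])
    return y
```

Applied to dv/dt, this turns the sharp derivative signal into a slower, calcium-like trace.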

81 Both models investigate the effects of different calcium concentration levels by assuming certain (e. [sent-156, score-0.253]

82 This allows them to address the question of how different calcium levels will lead to LTD or LTP [25]. [sent-159, score-0.172]

83 The differential Hebbian rule employed by us leads to the observed results as the consequence of the fact that the derivative of any generic unimodal signal will lead to a bimodal curve. [sent-161, score-0.172]

84 We utilise the derivative of the unimodal membrane potential to obtain a bimodal weight change curve. [sent-162, score-0.618]
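A tiny numerical illustration of this point, using an arbitrary Gaussian bump as the generic unimodal signal:

```python
import numpy as np

# The derivative of a unimodal signal is bimodal: one positive lobe
# (while the signal rises) and one negative lobe (while it falls).
# The Gaussian shape and its width are arbitrary illustrative choices.
t = np.linspace(-50.0, 50.0, 2001)          # ms
v = np.exp(-t**2 / (2.0 * 10.0**2))         # unimodal "membrane potential"
dv = np.gradient(v, t)                      # its derivative
```

Here dv is positive for t < 0 and negative for t > 0 — exactly the bimodal weight-change shape referred to above.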

85 The derivative of the membrane potential is proportional to the charge transfer dq(t)/dt = i(t) across the (post-synaptic) membrane (see Eq. [sent-163, score-0.687]

86 There is wide ranging support that synaptic plasticity is strongly dominated by calcium transfer through NMDA channels [26, 27, 6]. [sent-165, score-0.469]

87 Thus it seems reasonable to assume that a part of dQ represents calcium flow through the NMDA channel. [sent-166, score-0.145]

88 Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. [sent-173, score-0.387]

89 A synaptically controlled, associative signal for Hebbian plasticity in hippocampal neurons. [sent-179, score-0.178]

90 An algorithm for modifying neurotransmitter release probability based on pre- and postsynaptic spike timing. [sent-207, score-0.291]

91 A unified model of NMDA receptor-dependent bidirectional synaptic plasticity. [sent-228, score-0.203]

92 Spatiotemporal specificity of synaptic plasticity: cellular rules and mechanisms. [sent-237, score-0.164]

93 Action potential initiation and backpropagation in neurons of the mammalian CNS. [sent-262, score-0.147]

94 Dichotomy of action potential backpropagation in CA1 pyramidal neuron dendrites. [sent-281, score-0.139]

95 A biophysical model of bidirectional synaptic plasticity: Dependence on AMPA and NMDA receptors. [sent-312, score-0.276]

96 A model of spike-timing dependent plasticity: One or two coincidence detectors? [sent-323, score-0.079]

97 Action potential initiation and propagation in rat neocortical pyramidal neurons. [sent-331, score-0.147]

98 Calcium stores regulate the polarity and input specificity of synaptic modification. [sent-339, score-0.164]

99 Amplification of calcium influx into dendritic spines during associative pre- and postsynaptic activation: The role of direct calcium influx through the NMDA receptor. [sent-346, score-0.564]

100 Mechanisms of calcium influx into hippocampal spines: heterogeneity among spines, coincidence detection by NMDA receptors, and optical quantal analysis. [sent-356, score-0.216]


similar papers computed by tfidf model


similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

Author: Bernd Porr, Ausra Saudargiene, Florentin Wörgötter


2 0.25046101 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses

Author: Adria Bofill-i-petit, Alan F. Murray

Abstract: We present test results from spike-timing correlation learning experiments carried out with silicon neurons with STDP (Spike Timing Dependent Plasticity) synapses. The weight change scheme of the STDP synapses can be set to either weight-independent or weight-dependent mode. We present results that characterise the learning window implemented for both modes of operation. When presented with spike trains with different types of synchronisation the neurons develop bimodal weight distributions. We also show that a 2-layered network of silicon spiking neurons with STDP synapses can perform hierarchical synchrony detection. 1

3 0.14569406 157 nips-2003-Plasticity Kernels and Temporal Statistics

Author: Peter Dayan, Michael Häusser, Michael London

Abstract: Computational mysteries surround the kernels relating the magnitude and sign of changes in efficacy as a function of the time difference between pre- and post-synaptic activity at a synapse. One important idea34 is that kernels result from filtering, ie an attempt by synapses to eliminate noise corrupting learning. This idea has hitherto been applied to trace learning rules; we apply it to experimentally-defined kernels, using it to reverse-engineer assumed signal statistics. We also extend it to consider the additional goal for filtering of weighting learning according to statistical surprise, as in the Z-score transform. This provides a fresh view of observed kernels and can lead to different, and more natural, signal statistics.

4 0.11911687 18 nips-2003-A Summating, Exponentially-Decaying CMOS Synapse for Spiking Neural Systems

Author: Rock Z. Shi, Timothy K. Horiuchi

Abstract: Synapses are a critical element of biologically-realistic, spike-based neural computation, serving the role of communication, computation, and modification. Many different circuit implementations of synapse function exist with different computational goals in mind. In this paper we describe a new CMOS synapse design that separately controls quiescent leak current, synaptic gain, and time-constant of decay. This circuit implements part of a commonly-used kinetic model of synaptic conductance. We show a theoretical analysis and experimental data for prototypes fabricated in a commercially-available 1.5µm CMOS process. 1

5 0.090980761 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

Author: Liam Paninski, Eero P. Simoncelli, Jonathan W. Pillow

Abstract: Recent work has examined the estimation of models of stimulus-driven neural activity in which some linear filtering process is followed by a nonlinear, probabilistic spiking stage. We analyze the estimation of one such model for which this nonlinear step is implemented by a noisy, leaky, integrate-and-fire mechanism with a spike-dependent aftercurrent. This model is a biophysically plausible alternative to models with Poisson (memory-less) spiking, and has been shown to effectively reproduce various spiking statistics of neurons in vivo. However, the problem of estimating the model from extracellular spike train data has not been examined in depth. We formulate the problem in terms of maximum likelihood estimation, and show that the computational problem of maximizing the likelihood is tractable. Our main contribution is an algorithm and a proof that this algorithm is guaranteed to find the global optimum with reasonable speed. We demonstrate the effectiveness of our estimator with numerical simulations. A central issue in computational neuroscience is the characterization of the functional relationship between sensory stimuli and neural spike trains. A common model for this relationship consists of linear filtering of the stimulus, followed by a nonlinear, probabilistic spike generation process. The linear filter is typically interpreted as the neuron’s “receptive field,” while the spiking mechanism accounts for simple nonlinearities like rectification and response saturation. Given a set of stimuli and (extracellularly) recorded spike times, the characterization problem consists of estimating both the linear filter and the parameters governing the spiking mechanism. One widely used model of this type is the Linear-Nonlinear-Poisson (LNP) cascade model, in which spikes are generated according to an inhomogeneous Poisson process, with rate determined by an instantaneous (“memoryless”) nonlinear function of the filtered input. 
This model has a number of desirable features, including conceptual simplicity and computational tractability. Additionally, reverse correlation analysis provides a simple unbiased estimator for the linear filter [5], and the properties of estimators (for both the linear filter and static nonlinearity) have been thoroughly analyzed, even for the case of highly non-symmetric or “naturalistic” stimuli [12]. One important drawback of the LNP model, * JWP and LP contributed equally to this work. We thank E.J. Chichilnisky for helpful discussions. L−NLIF model LNP model )ekips(P Figure 1: Simulated responses of LNLIF and LNP models to 20 repetitions of a fixed 100-ms stimulus segment of temporal white noise. Top: Raster of responses of L-NLIF model, where σnoise /σsignal = 0.5 and g gives a membrane time constant of 15 ms. The top row shows the fixed (deterministic) response of the model with σnoise set to zero. Middle: Raster of responses of LNP model, with parameters fit with standard methods from a long run of the L-NLIF model responses to nonrepeating stimuli. Bottom: (Black line) Post-stimulus time histogram (PSTH) of the simulated L-NLIF response. (Gray line) PSTH of the LNP model. Note that the LNP model fails to preserve the fine temporal structure of the spike trains, relative to the L-NLIF model. 001 05 0 )sm( emit however, is that Poisson processes do not accurately capture the statistics of neural spike trains [2, 9, 16, 1]. In particular, the probability of observing a spike is not a functional of the stimulus only; it is also strongly affected by the recent history of spiking. The leaky integrate-and-fire (LIF) model provides a biophysically more realistic spike mechanism with a simple form of spike-history dependence. This model is simple, wellunderstood, and has dynamics that are entirely linear except for a nonlinear “reset” of the membrane potential following a spike. 
Although this model’s overriding linearity is often emphasized (due to the approximately linear relationship between input current and firing rate, and lack of active conductances), the nonlinear reset has significant functional importance for the model’s response properties. In previous work, we have shown that standard reverse correlation analysis fails when applied to a neuron with deterministic (noise-free) LIF spike generation; we developed a new estimator for this model, and demonstrated that a change in leakiness of such a mechanism might underlie nonlinear effects of contrast adaptation in macaque retinal ganglion cells [15]. We and others have explored other “adaptive” properties of the LIF model [17, 13, 19]. In this paper, we consider a model consisting of a linear filter followed by noisy LIF spike generation with a spike-dependent after-current; this is essentially the standard LIF model driven by a noisy, filtered version of the stimulus, with an additional current waveform injected following each spike. We will refer to this as the the “L-NLIF” model. The probabilistic nature of this model provides several important advantages over the deterministic version we have considered previously. First, an explicit noise model allows us to couch the problem in the terms of classical estimation theory. This, in turn, provides a natural “cost function” (likelihood) for model assessment and leads to more efficient estimation of the model parameters. Second, noise allows us to explicitly model neural firing statistics, and could provide a rigorous basis for a metric distance between spike trains, useful in other contexts [18]. Finally, noise influences the behavior of the model itself, giving rise to phenomena not observed in the purely deterministic model [11]. Our main contribution here is to show that the maximum likelihood estimator (MLE) for the L-NLIF model is computationally tractable. 
Specifically, we describe an algorithm for computing the likelihood function, and prove that this likelihood function contains no non-global maxima, implying that the MLE can be computed efficiently using standard ascent techniques. The desirable statistical properties of this estimator (e.g. consistency, efficiency) are all inherited “for free” from classical estimation theory. Thus, we have a compact and powerful model for the neural code, and a well-motivated, efficient way to estimate the parameters of this model from extracellular data. The Model We consider a model for which the (dimensionless) subthreshold voltage variable V evolves according to i−1 dV = − gV (t) + k · x(t) + j=0 h(t − tj ) dt + σNt , (1) and resets to Vr whenever V = 1. Here, g denotes the leak conductance, k · x(t) the projection of the input signal x(t) onto the linear kernel k, h is an “afterpotential,” a current waveform of fixed amplitude and shape whose value depends only on the time since the last spike ti−1 , and Nt is an unobserved (hidden) noise process with scale parameter σ. Without loss of generality, the “leak” and “threshold” potential are set at 0 and 1, respectively, so the cell spikes whenever V = 1, and V decays back to 0 with time constant 1/g in the absence of input. Note that the nonlinear behavior of the model is completely determined by only a few parameters, namely {g, σ, Vr }, and h (where the function h is allowed to take values in some low-dimensional vector space). The dynamical properties of this type of “spike response model” have been extensively studied [7]; for example, it is known that this class of models can effectively capture much of the behavior of apparently more biophysically realistic models (e.g. Hodgkin-Huxley). Figures 1 and 2 show several simple comparisons of the L-NLIF and LNP models. 
In 1, note the fine structure of spike timing in the responses of the L-NLIF model, which is qualitatively similar to in vivo experimental observations [2, 16, 9]). The LNP model fails to capture this fine temporal reproducibility. At the same time, the L-NLIF model is much more flexible and representationally powerful, as demonstrated in Fig. 2: by varying V r or h, for example, we can match a wide variety of dynamical behaviors (e.g. adaptation, bursting, bistability) known to exist in biological neurons. The Estimation Problem Our problem now is to estimate the model parameters {k, σ, g, Vr , h} from a sufficiently rich, dynamic input sequence x(t) together with spike times {ti }. A natural choice is the maximum likelihood estimator (MLE), which is easily proven to be consistent and statistically efficient here. To compute the MLE, we need to compute the likelihood and develop an algorithm for maximizing it. The tractability of the likelihood function for this model arises directly from the linearity of the subthreshold dynamics of voltage V (t) during an interspike interval. In the noiseless case [15], the voltage trace during an interspike interval t ∈ [ti−1 , ti ] is given by the solution to equation (1) with σ = 0:   V0 (t) = Vr e−gt + t ti−1 i−1 k · x(s) + j=0 h(s − tj ) e−g(t−s) ds, (2) A stimulus h current responses 0 0 0 1 )ces( t 0 2. 0 t stimulus x 0 B c responses c=1 h current 0 c=2 2. 0 c=5 1 )ces( t t 0 0 stimulus C 0 h current responses Figure 2: Illustration of diverse behaviors of L-NLIF model. A: Firing rate adaptation. A positive DC current (top) was injected into three model cells differing only in their h currents (shown on left: top, h = 0; middle, h depolarizing; bottom, h hyperpolarizing). Voltage traces of each cell’s response (right, with spikes superimposed) exhibit rate facilitation for depolarizing h (middle), and rate adaptation for hyperpolarizing h (bottom). B: Bursting. 
The response of a model cell with a biphasic h current (left) is shown as a function of the three different levels of DC current. For small current levels (top), the cell responds rhythmically. For larger currents (middle and bottom), the cell responds with regular bursts of spikes. C: Bistability. The stimulus (top) is a positive followed by a negative current pulse. Although a cell with no h current (middle) responds transiently to the positive pulse, a cell with biphasic h (bottom) exhibits a bistable response: the positive pulse puts it into a stable firing regime which persists until the arrival of a negative pulse.

The expression in (2) is simply a linear convolution of the input current with a negative exponential. It is easy to see that adding Gaussian noise to the voltage during each time step induces a Gaussian density over V(t), since linear dynamics preserve Gaussianity [8]. This density is uniquely characterized by its first two moments; the mean is given by (2), and its covariance is $\sigma^2 E_g E_g^T$, where $E_g$ is the convolution operator corresponding to $e^{-gt}$. Note that this density is highly correlated for nearby points in time, since noise is integrated by the linear dynamics. Intuitively, smaller leak conductance g leads to stronger correlation in V(t) at nearby time points. We denote this Gaussian density $G(x_i, k, \sigma, g, V_r, h)$, where the index i indicates the ith spike and the corresponding stimulus chunk x_i (i.e. the stimuli that influence V(t) during the ith interspike interval). Now, on any interspike interval t ∈ [t_{i-1}, t_i], the only information we have is that V(t) is less than threshold for all times before t_i, and exceeds threshold during the time bin containing t_i. This translates to a set of linear constraints on V(t), expressed in terms of the set $C_i = \{V(t) < 1,\ t_{i-1} \le t < t_i\} \cap \{V(t_i) \ge 1\}$.
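On a discrete time grid, the mean (2) and the covariance $\sigma^2 E_g E_g^T$ can be assembled directly. A hedged sketch of ours (`subthreshold_gaussian` is a hypothetical helper; unit time bins are an assumed discretization convention):

```python
import numpy as np

def subthreshold_gaussian(i_drive, g=0.05, sigma=0.5, v_reset=0.0):
    """Mean and covariance of the subthreshold voltage on n unit time bins
    following a spike at t = 0.  i_drive[t] collects k.x(t) + sum_j h(t - t_j).
    Eg is the lower-triangular convolution operator for exp(-g t), so the
    mean is the exponential filtering of eq. (2) and the covariance is
    sigma^2 * Eg @ Eg.T."""
    n = len(i_drive)
    t = np.arange(n)
    Eg = np.tril(np.exp(-g * (t[:, None] - t[None, :])))
    mean = v_reset * np.exp(-g * t) + Eg @ i_drive
    cov = sigma**2 * Eg @ Eg.T
    return mean, cov
```

The diagonal of `cov` grows as noise is integrated, and off-diagonal correlations are stronger for smaller g, matching the intuition in the text.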
Therefore, the likelihood that the neuron first spikes at time t_i, given a spike at time t_{i-1}, is the probability of the event V(t) ∈ C_i, which is given by

$L_{x_i, t_i}(k, \sigma, g, V_r, h) = \int_{C_i} G(x_i, k, \sigma, g, V_r, h),$

the integral of the Gaussian density $G(x_i, k, \sigma, g, V_r, h)$ over the set $C_i$.

Figure 3: Behavior of the L-NLIF model during a single interspike interval, for a single (repeated) input current (top). Top middle: Ten simulated voltage traces V(t), evaluated up to the first threshold crossing, conditional on a spike at time zero (V_r = 0). Note the strong correlation between neighboring time points, and the sparsening of the plot as traces are eliminated by spiking. Bottom middle: Time evolution of P(V). Each column represents the conditional distribution of V at the corresponding time (i.e. for all traces that have not yet crossed threshold). Bottom: Probability density of the interspike interval (isi) corresponding to this particular input. Note that probability mass is concentrated at the points where the input drives V_0(t) close to threshold.

Spiking resets V to V_r, meaning that the noise contribution to V in different interspike intervals is independent. This "renewal" property, in turn, implies that the density over V(t) for an entire experiment factorizes into a product of conditionally independent terms, where each of these terms is one of the Gaussian integrals derived above for a single interspike interval. The likelihood for the entire spike train is therefore the product of these terms over all observed spikes. Putting all the pieces together, then, the full likelihood is

$L_{\{x_i, t_i\}}(k, \sigma, g, V_r, h) = \prod_i \int_{C_i} G(x_i, k, \sigma, g, V_r, h),$

where the product, again, is over all observed spike times {t_i} and corresponding stimulus chunks {x_i}. Now that we have an expression for the likelihood, we need to be able to maximize it.
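Each single-interval factor can be sanity-checked by brute force before bringing in exact integration methods. A purely illustrative Monte Carlo sketch of ours (not the paper's method):

```python
import numpy as np

def isi_likelihood_mc(mean, cov, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(V in C_i): the sampled voltage path stays
    below the threshold at 1 in every bin before the last, and reaches 1 in
    the last bin.  Exact Gaussian-integral methods replace this in practice."""
    rng = np.random.default_rng(seed)
    V = rng.multivariate_normal(mean, cov, size=n_samples)
    below = np.all(V[:, :-1] < 1.0, axis=1)
    crossed = V[:, -1] >= 1.0
    return float(np.mean(below & crossed))
```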
Our main result now states, basically, that we can use simple ascent algorithms to compute the MLE without getting stuck in local maxima.

Theorem 1. The likelihood $L_{\{x_i, t_i\}}(k, \sigma, g, V_r, h)$ has no non-global extrema in the parameters (k, σ, g, V_r, h), for any data {x_i, t_i}.

The proof [14] is based on the log-concavity of $L_{\{x_i, t_i\}}(k, \sigma, g, V_r, h)$ under a certain parametrization of (k, σ, g, V_r, h). The classical approach for establishing the nonexistence of non-global maxima of a given function uses concavity, which corresponds roughly to the function having everywhere non-positive second derivatives. However, the basic idea can be extended with the use of any invertible function: if f has no non-global extrema, neither will g(f), for any strictly increasing real function g. The logarithm is a natural choice for g in any probabilistic context in which independence plays a role, since sums are easier to work with than products. Moreover, concavity of a function f is strictly stronger than log-concavity, so log-concavity can be a powerful tool even in situations for which concavity is useless (the Gaussian density is log-concave but not concave, for example). Our proof relies on a particular theorem [3] establishing the log-concavity of integrals of log-concave functions, and proceeds by making a correspondence between this type of integral and the integrals that appear in the definition of the L-NLIF likelihood above. We should also note that the proof extends without difficulty to some other noise processes which generate log-concave densities (where white noise has the standard Gaussian density); for example, the proof is nearly identical if N_t is allowed to be colored or non-Gaussian noise, with possibly nonzero drift.

Computational methods and numerical results

Theorem 1 tells us that we can ascend the likelihood surface without fear of getting stuck in local maxima. Now how do we actually compute the likelihood?
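The two facts used here, that a strictly increasing transform never moves maxima and that the Gaussian density is log-concave without being concave, are easy to check numerically (our illustration, on a grid):

```python
import numpy as np

# Log-concavity illustration: the standard Gaussian density is log-concave
# but not concave, and the (strictly increasing) log preserves the argmax.
x = np.linspace(-4.0, 4.0, 8001)
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # standard Gaussian density
log_f = np.log(f)

d2_f = np.diff(f, 2)          # discrete second derivative of f
d2_log_f = np.diff(log_f, 2)  # ... and of log f (identically negative here)
```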
This is a nontrivial problem: we need to be able to quickly compute (or at least approximate, in a rational way) integrals of multivariate Gaussian densities G over simple but high-dimensional orthants C_i. We discuss two ways to compute these integrals; each has its own advantages. The first technique can be termed "density evolution" [10, 13]. The method is based on the following well-known fact from the theory of stochastic differential equations [8]: given the data (x_i, t_{i-1}), the probability density of the voltage process V(t) up to the next spike t_i satisfies the following partial differential (Fokker-Planck) equation:

$\frac{\partial P(V, t)}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 P}{\partial V^2} + g\frac{\partial[(V - V_{eq}(t))P]}{\partial V},$ (3)

under the boundary conditions P(V, t_{i-1}) = δ(V − V_r) and P(V_th, t) = 0, where V_eq(t) is the instantaneous equilibrium potential:

$V_{eq}(t) = \frac{1}{g}\Bigl(k \cdot x(t) + \sum_{j=0}^{i-1} h(t - t_j)\Bigr).$

Moreover, the conditional firing rate f(t) satisfies

$\int_{t_{i-1}}^{t} f(s)\,ds = 1 - \int P(V, t)\,dV.$

Thus standard techniques for solving the drift-diffusion evolution equation (3) lead to a fast method for computing f(t) (as illustrated in Fig. 3). Finally, the likelihood $L_{x_i, t_i}(k, \sigma, g, V_r, h)$ is simply f(t_i). While elegant and efficient, this density evolution technique turns out to be slightly more powerful than what we need for the MLE: recall that we do not need to compute the conditional rate function f at all times t, but rather just at the set of spike times {t_i}, and thus we can turn to more specialized techniques for faster performance. We employ a rapid technique for computing the likelihood using an algorithm due to Genz [6], designed to compute exactly the kinds of multidimensional Gaussian probability integrals considered here. This algorithm works well when the orthants C_i are defined by fewer than ≈ 10 linear constraints on V(t).
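A minimal explicit finite-difference version of this density-evolution computation can be sketched as follows (our code; the grid resolution, parameter values, and boundary handling are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def density_evolution(veq, g=1.0, sigma=0.5, v_reset=0.0,
                      v_min=-3.0, v_th=1.0, nv=100, dt=1e-3):
    """Explicit finite-difference sketch of the Fokker-Planck equation (3):
    dP/dt = (sigma^2/2) d^2P/dV^2 + g d[(V - Veq(t)) P]/dV,
    with an absorbing boundary at threshold.  The conditional firing rate
    f(t) is read off as the probability mass absorbed per unit time."""
    dv = (v_th - v_min) / nv
    v = v_min + dv * (np.arange(nv) + 0.5)          # cell-centred voltage grid
    P = np.zeros(nv)
    P[np.argmin(np.abs(v - v_reset))] = 1.0 / dv    # P(V, t_{i-1}) = delta(V - Vr)
    f = np.empty(len(veq))
    prev_mass = P.sum() * dv
    for t, ve in enumerate(veq):
        diff = 0.5 * sigma**2 * (np.roll(P, -1) - 2.0 * P + np.roll(P, 1)) / dv**2
        flux = g * (v - ve) * P                     # drift term g(V - Veq)P
        drift = (np.roll(flux, -1) - np.roll(flux, 1)) / (2.0 * dv)
        P = P + dt * (diff + drift)
        P[0] = P[-1] = 0.0                          # absorb at threshold (and far wall)
        mass = P.sum() * dv
        f[t] = max(prev_mass - mass, 0.0) / dt
        prev_mass = mass
    return f
```

Driving V_eq above threshold concentrates the absorbed mass, i.e. f(t), around the deterministic crossing time, in the spirit of Fig. 3.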
The number of actual constraints on V(t) during an interspike interval $(t_{i+1} - t_i)$ grows linearly in the length of the interval: thus, to use this algorithm in typical data situations, we adopt a strategy proposed in our work on the deterministic form of the model [15], in which we discard all but a small subset of the constraints. The key point is that, due to strong correlations in the noise and the fact that the constraints only figure significantly when V(t) is driven close to threshold, a small number of constraints often suffices to approximate the true likelihood to a high degree of precision.

Figure 4: Demonstration of the estimator's performance on simulated data. Dashed lines show the true kernel k and aftercurrent h; k is a 12-sample function chosen to resemble the biphasic temporal impulse response of a macaque retinal ganglion cell, while h is a function specified in a five-dimensional vector space, whose shape induces a slight degree of burstiness in the model's spike responses. The L-NLIF model was stimulated with parameters g = 0.05 (corresponding to a membrane time constant of 20 time-samples), $\sigma_{noise}$ = 0.5, and V_r = 0. The stimulus was 30,000 time samples of white Gaussian noise with a standard deviation of 0.5. With only 600 spikes of output, the estimator is able to retrieve an estimate of k (gray curve) which closely matches the true kernel. Note that the spike-triggered average (black curve), which is an unbiased estimator for the kernel of an LNP neuron [5], differs significantly from this true kernel (see also [15]).

The accuracy of this approach improves with the number of constraints considered, but performance is fastest with fewer constraints.
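SciPy's `multivariate_normal.cdf` evaluates exactly these Gaussian box probabilities with a Genz-type algorithm, so the constraint-thinning idea can be sketched as follows. This is our code; the selection rule (keep the bins where the mean path is closest to threshold) is a plausible reading of the strategy, not a transcription of [15]:

```python
import numpy as np
from scipy.stats import multivariate_normal

def isi_likelihood_genz(mean, cov, k=8):
    """Approximate P(V below threshold 1 in all bins before the last, and
    V >= 1 in the last bin), keeping only the k sub-threshold constraints
    where the mean trajectory comes closest to threshold (the rest are
    almost surely satisfied).  Uses the identity
    P(A, V_last >= 1) = P(A) - P(A, V_last < 1), each term a Gaussian cdf."""
    n = len(mean)
    keep = np.sort(np.argsort(mean[:-1])[-k:]) if n - 1 > k else np.arange(n - 1)
    idx_a = keep                        # constraints V_j < 1
    idx_b = np.append(keep, n - 1)      # plus V_last < 1
    p_a = multivariate_normal.cdf(np.ones(len(idx_a)), mean[idx_a],
                                  cov[np.ix_(idx_a, idx_a)])
    p_b = multivariate_normal.cdf(np.ones(len(idx_b)), mean[idx_b],
                                  cov[np.ix_(idx_b, idx_b)])
    return p_a - p_b
```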
Therefore, because ascending the likelihood function requires evaluating the likelihood at many different points, we can make this ascent process much quicker by applying a version of the coarse-to-fine idea. Let $L_k$ denote the approximation to the likelihood given by allowing only k constraints in the above algorithm. Then we know, by a proof identical to that of Theorem 1, that $L_k$ has no local maxima; in addition, by the above logic, $L_k \to L$ as k grows. It takes little additional effort to prove that $\arg\max L_k \to \arg\max L$; thus, we can efficiently ascend the true likelihood surface by ascending the "coarse" approximants $L_k$, then gradually "refining" our approximation by letting k increase. An application of this algorithm to simulated data is shown in Fig. 4. Further applications to both simulated and real data will be presented elsewhere.

Discussion

We have shown here that the L-NLIF model, which couples a linear filtering stage to a biophysically plausible and flexible model of neuronal spiking, can be efficiently estimated from extracellular physiological data using maximum likelihood. Moreover, this model lends itself directly to analysis via tools from the modern theory of point processes. For example, once we have obtained our estimate of the parameters (k, σ, g, V_r, h), how do we verify that the resulting model provides an adequate description of the data? This important "model validation" question has been the focus of some recent elegant research, under the rubric of "time rescaling" techniques [4]. While we lack the room here to review these methods in detail, we can note that they depend essentially on knowledge of the conditional firing rate function f(t). Recall that we showed how to efficiently compute this function in the last section and examined some of its qualitative properties in the L-NLIF context in Figs. 2 and 3.
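The coarse-to-fine ascent itself is generic. Under the assumption that `neg_log_lik(theta, k)` returns $-\log L_k(\theta)$ (a hypothetical interface of ours), it can be sketched as:

```python
import numpy as np
from scipy.optimize import minimize

def coarse_to_fine_mle(neg_log_lik, theta0, ks=(2, 4, 8)):
    """Coarse-to-fine ascent: maximize the k-constraint approximation L_k for
    increasing k, warm-starting each stage at the previous optimum.  Each L_k
    is assumed log-concave, so each stage is a well-behaved ascent."""
    theta = np.asarray(theta0, float)
    for k in ks:
        res = minimize(neg_log_lik, theta, args=(k,), method="BFGS")
        theta = res.x
    return theta
```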
We are currently in the process of applying the model to physiological data recorded both in vivo and in vitro, in order to assess whether it accurately accounts for the stimulus preferences and spiking statistics of real neurons. One long-term goal of this research is to elucidate the different roles of stimulus-driven and stimulus-independent activity on the spiking patterns of both single cells and multineuronal ensembles.

References

[1] B. Aguera y Arcas and A. Fairhall. What causes a neuron to spike? Neural Computation, 15:1789–1807, 2003. [2] M. Berry and M. Meister. Refractoriness and neural precision. Journal of Neuroscience, 18:2200–2211, 1998. [3] V. Bogachev. Gaussian Measures. AMS, New York, 1998. [4] E. Brown, R. Barbieri, V. Ventura, R. Kass, and L. Frank. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14:325–346, 2002. [5] E. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199–213, 2001. [6] A. Genz. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1:141–149, 1992. [7] W. Gerstner and W. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002. [8] S. Karlin and H. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981. [9] J. Keat, P. Reinagel, R. Reid, and M. Meister. Predicting every spike: a model for the responses of visual neurons. Neuron, 30:803–817, 2001. [10] B. Knight, A. Omurtag, and L. Sirovich. The approach of a neuron population firing rate to a new equilibrium: an exact theoretical result. Neural Computation, 12:1045–1055, 2000. [11] J. Levin and J. Miller. Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature, 380:165–168, 1996. [12] L. Paninski. Convergence properties of some spike-triggered analysis techniques.
Network: Computation in Neural Systems, 14:437–464, 2003. [13] L. Paninski, B. Lau, and A. Reyes. Noise-driven adaptation: in vitro and mathematical analysis. Neurocomputing, 52:877–883, 2003. [14] L. Paninski, J. Pillow, and E. Simoncelli. Maximum likelihood estimation of a stochastic integrate-and-fire neural encoding model. Submitted manuscript (cns.nyu.edu/∼liam), 2004. [15] J. Pillow and E. Simoncelli. Biases in white noise analysis due to non-Poisson spike generation. Neurocomputing, 52:109–115, 2003. [16] D. Reich, J. Victor, and B. Knight. The power ratio and the interval map: Spiking models and extracellular recordings. The Journal of Neuroscience, 18:10090–10104, 1998. [17] M. Rudd and L. Brown. Noise adaptation in integrate-and-fire neurons. Neural Computation, 9:1047–1069, 1997. [18] J. Victor. How the brain uses time to represent and process visual information. Brain Research, 886:33–46, 2000. [19] Y. Yu and T. Lee. Dynamical mechanisms underlying contrast gain control in single neurons. Physical Review E, 68:011901, 2003.

6 0.082105696 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control

7 0.081920505 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells

8 0.076047145 127 nips-2003-Mechanism of Neural Interference by Transcranial Magnetic Stimulation: Network or Single Neuron?

9 0.065776892 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

10 0.061908331 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons

11 0.05989426 81 nips-2003-Geometric Analysis of Constrained Curves

12 0.057386339 184 nips-2003-The Diffusion-Limited Biochemical Signal-Relay Channel

13 0.047305804 45 nips-2003-Circuit Optimization Predicts Dynamic Networks for Chemosensory Orientation in Nematode C. elegans

14 0.04433699 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity

15 0.042316753 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System

16 0.041222617 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

17 0.038488943 79 nips-2003-Gene Expression Clustering with Functional Mixture Models

18 0.036870893 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

19 0.036147244 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

20 0.035993457 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.113), (1, 0.055), (2, 0.236), (3, 0.054), (4, 0.09), (5, -0.007), (6, -0.102), (7, 0.008), (8, 0.041), (9, 0.003), (10, 0.005), (11, 0.015), (12, 0.011), (13, -0.041), (14, -0.082), (15, -0.063), (16, 0.063), (17, -0.004), (18, 0.077), (19, -0.088), (20, -0.002), (21, 0.146), (22, -0.064), (23, 0.092), (24, -0.028), (25, 0.048), (26, 0.068), (27, 0.031), (28, -0.049), (29, 0.041), (30, 0.017), (31, -0.232), (32, 0.062), (33, -0.24), (34, -0.102), (35, 0.01), (36, 0.084), (37, -0.026), (38, -0.088), (39, 0.047), (40, 0.029), (41, -0.136), (42, 0.033), (43, -0.181), (44, 0.06), (45, -0.019), (46, -0.097), (47, 0.059), (48, -0.06), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97733605 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

Author: Bernd Porr, Ausra Saudargiene, Florentin Wörgötter

Abstract: Spike timing plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. Coming from the presynaptic side induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel. Thus, for this calculation we utilise biophysical variables of the physiological cell. The final result shows a weight change curve which conforms with measurements from biology. The positive part of the weight change curve is determined by the NMDA activation. The negative part of the weight change curve is determined by the membrane potential change. Therefore, the weight change curve should change its shape depending on the distance from the soma of the postsynaptic cell. We find temporally asymmetric weight change close to the soma and temporally symmetric weight change in the distal dendrite. 1

2 0.77645087 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses

Author: Adria Bofill-i-petit, Alan F. Murray

Abstract: We present test results from spike-timing correlation learning experiments carried out with silicon neurons with STDP (Spike Timing Dependent Plasticity) synapses. The weight change scheme of the STDP synapses can be set to either weight-independent or weight-dependent mode. We present results that characterise the learning window implemented for both modes of operation. When presented with spike trains with different types of synchronisation the neurons develop bimodal weight distributions. We also show that a 2-layered network of silicon spiking neurons with STDP synapses can perform hierarchical synchrony detection. 1

3 0.73447299 157 nips-2003-Plasticity Kernels and Temporal Statistics

Author: Peter Dayan, Michael Häusser, Michael London

Abstract: Computational mysteries surround the kernels relating the magnitude and sign of changes in efficacy as a function of the time difference between pre- and post-synaptic activity at a synapse. One important idea34 is that kernels result from filtering, ie an attempt by synapses to eliminate noise corrupting learning. This idea has hitherto been applied to trace learning rules; we apply it to experimentally-defined kernels, using it to reverse-engineer assumed signal statistics. We also extend it to consider the additional goal for filtering of weighting learning according to statistical surprise, as in the Z-score transform. This provides a fresh view of observed kernels and can lead to different, and more natural, signal statistics.

4 0.51093829 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control

Author: Francesco Tenore, Ralph Etienne-Cummings, M. A. Lewis

Abstract: We have constructed a second generation CPG chip capable of generating the necessary timing to control the leg of a walking machine. We demonstrate improvements over a previous chip by moving toward a significantly more versatile device. This includes a larger number of silicon neurons, more sophisticated neurons including voltage dependent charging and relative and absolute refractory periods, and enhanced programmability of neural networks. This chip builds on the basic results achieved on a previous chip and expands its versatility to get closer to a self-contained locomotion controller for walking robots. 1

5 0.50819981 18 nips-2003-A Summating, Exponentially-Decaying CMOS Synapse for Spiking Neural Systems

Author: Rock Z. Shi, Timothy K. Horiuchi

Abstract: Synapses are a critical element of biologically-realistic, spike-based neural computation, serving the role of communication, computation, and modification. Many different circuit implementations of synapse function exist with different computational goals in mind. In this paper we describe a new CMOS synapse design that separately controls quiescent leak current, synaptic gain, and time-constant of decay. This circuit implements part of a commonly-used kinetic model of synaptic conductance. We show a theoretical analysis and experimental data for prototypes fabricated in a commercially-available 1.5µm CMOS process. 1

6 0.41314358 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

7 0.33593392 127 nips-2003-Mechanism of Neural Interference by Transcranial Magnetic Stimulation: Network or Single Neuron?

8 0.32958132 184 nips-2003-The Diffusion-Limited Biochemical Signal-Relay Channel

9 0.28507295 81 nips-2003-Geometric Analysis of Constrained Curves

10 0.27118474 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System

11 0.26643482 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems

12 0.26171947 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

13 0.24235836 187 nips-2003-Training a Quantum Neural Network

14 0.23000969 89 nips-2003-Impact of an Energy Normalization Transform on the Performance of the LF-ASD Brain Computer Interface

15 0.22353318 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

16 0.2221586 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

17 0.20279096 44 nips-2003-Can We Learn to Beat the Best Stock

18 0.1967773 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells

19 0.1865211 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms

20 0.17941622 85 nips-2003-Human and Ideal Observers for Detecting Image Curves


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (29, 0.017), (30, 0.044), (35, 0.036), (53, 0.093), (59, 0.035), (63, 0.09), (65, 0.343), (71, 0.055), (76, 0.033), (85, 0.044), (91, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83087331 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

Author: Bernd Porr, Ausra Saudargiene, Florentin Wörgötter

Abstract: Spike timing plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. Coming from the presynaptic side induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel. Thus, for this calculation we utilise biophysical variables of the physiological cell. The final result shows a weight change curve which conforms with measurements from biology. The positive part of the weight change curve is determined by the NMDA activation. The negative part of the weight change curve is determined by the membrane potential change. Therefore, the weight change curve should change its shape depending on the distance from the soma of the postsynaptic cell. We find temporally asymmetric weight change close to the soma and temporally symmetric weight change in the distal dendrite. 1

2 0.62285978 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering

Author: Philip Derbeko, Ran El-Yaniv, Ron Meir

Abstract: This paper is concerned with transductive learning. Although transduction appears to be an easier task than induction, there have not been many provably useful algorithms and bounds for transduction. We present explicit error bounds for transduction and derive a general technique for devising bounds within this setting. The technique is applied to derive error bounds for compression schemes such as (transductive) SVMs and for transduction algorithms based on clustering. 1 Introduction and Related Work In contrast to inductive learning, in the transductive setting the learner is given both the training and test sets prior to learning. The goal of the learner is to infer (or “transduce”) the labels of the test points. The transduction setting was introduced by Vapnik [1, 2] who proposed basic bounds and an algorithm for this setting. Clearly, inferring the labels of points in the test set can be done using an inductive scheme. However, as pointed out in [2], it makes little sense to solve an easier problem by ‘reducing’ it to a much more difficult one. In particular, the prior knowledge carried by the (unlabeled) test points can be incorporated into an algorithm, potentially leading to superior performance. Indeed, a number of papers have demonstrated empirically that transduction can offer substantial advantage over induction whenever the training set is small or moderate (see e.g. [3, 4, 5, 6]). However, unlike the current state of affairs in induction, the question of what are provably effective learning principles for transduction is quite far from being resolved. In this paper we provide new error bounds and a general technique for transductive learning. Our technique is based on bounds that can be viewed as an extension of McAllester’s PAC-Bayesian framework [7, 8] to transductive learning. 
The main advantage of using this framework in transduction is that here priors can be selected after observing the unlabeled data (but before observing the labeled sample). This flexibility allows for the choice of "compact priors" (with small support) and therefore, for tight bounds. Another simple observation is that the PAC-Bayesian framework can be operated with polynomially many (in m, the training sample size) different priors simultaneously. Altogether, this added flexibility of using multiple data-dependent priors allows for the easy derivation of tight error bounds for "compression schemes" such as (transductive) SVMs and for clustering algorithms. We briefly review some previous results. The idea of transduction, and a specific algorithm for SVM transductive learning, was introduced and studied by Vapnik (e.g. [2]), where an error bound is also proposed. However, this bound is implicit and rather unwieldy and, to the best of our knowledge, has not been applied in practical situations. A PAC-Bayes bound [7] for transduction with Perceptron Decision Trees is given in [9]. The bound is data-dependent, depending on the number of decision nodes, the margins at each node, and the sample size. However, the authors state that the transduction bound is not much tighter than the induction bound. Empirical tests show that this transduction algorithm performs slightly better than induction in terms of the test error; however, the advantage is usually statistically insignificant. Refining the algorithm of [2], a transductive algorithm based on SVMs is proposed in [3]. The paper also provides empirical tests indicating that transduction is advantageous in the text categorization domain. An error bound for transduction, based on the effective VC Dimension, is given in [10]. More recently, Lanckriet et al. [11] derived a transductive bound for kernel methods based on spectral properties of the kernel matrix.
Blum and Langford [12] recently also established an implicit bound for transduction, in the spirit of the results in [2].

2 The Transduction Setup

We consider the following setting proposed by Vapnik ([2], Chp. 8), which for simplicity is described in the context of binary classification (the general case will be discussed in the full paper). Let H be a set of binary hypotheses consisting of functions from input space X to {±1} and let X_{m+u} = {x_1, ..., x_{m+u}} be a set of points from X, each of which is chosen i.i.d. according to some unknown distribution µ(x). We call X_{m+u} the full sample. Let X_m = {x_1, ..., x_m} and Y_m = {y_1, ..., y_m}, where X_m is drawn uniformly from X_{m+u} and y_i ∈ {±1}. The set S_m = {(x_1, y_1), ..., (x_m, y_m)} is referred to as a training sample. In this paper we assume that y_i = φ(x_i) for some unknown function φ. The remaining subset X_u = X_{m+u} \ X_m is referred to as the unlabeled sample. Based on S_m and X_u our goal is to choose h ∈ H which predicts the labels of points in X_u as accurately as possible. For each h ∈ H and a set Z = {x_1, ..., x_{|Z|}} of samples define

$R_h(Z) = \frac{1}{|Z|}\sum_{i=1}^{|Z|} \ell(h(x_i), y_i),$ (1)

where in our case ℓ(·, ·) is the zero-one loss function. Our goal in transduction is to learn an h such that R_h(X_u) is as small as possible. This problem setup is summarized by the following transduction "protocol" introduced in [2] and referred to as Setting 1: (i) A full sample X_{m+u} = {x_1, ..., x_{m+u}} consisting of arbitrary m + u points is given.¹ (ii) We then choose uniformly at random the training sample X_m ⊆ X_{m+u} and receive its labeling Y_m; the resulting training set is S_m = (X_m, Y_m) and the remaining set X_u is the unlabeled sample, X_u = X_{m+u} \ X_m; (iii) Using both S_m and X_u we select a classifier h ∈ H whose quality is measured by R_h(X_u).
Vapnik [2] also considers another formulation of transduction, referred to as Setting 2: (i) We are given a training set Sm = (Xm , Ym ) selected i.i.d according to µ(x, y). (ii) An independent test set Su = (Xu , Yu ) of u samples is then selected in the same manner. 1 The original Setting 1, as proposed by Vapnik, discusses a full sample whose points are chosen independently at random according to some source distribution µ(x). (iii) We are required to choose our best h ∈ H based on Sm and Xu so as to minimize m+u Rm,u (h) = 1 (h(xi ), yi ) dµ(x1 , y1 ) · · · dµ(xm+u , ym+u ). u i=m+1 (2) Even though Setting 2 may appear more applicable in practical situations than Setting 1, the derivation of theoretical results can be easier within Setting 1. Nevertheless, as far as the expected losses are concerned, Vapnik [2] shows that an error bound in Setting 1 implies an equivalent bound in Setting 2. In view of this result we restrict ourselves in the sequel to Setting 1. We make use of the following quantities, which are all instances of (1). The quantity Rh (Xm+u ) is called the full sample risk of the hypothesis h, Rh (Xu ) is referred to as the transduction risk (of h), and Rh (Xm ) is the training error (of h). Thus, Rh (Xm ) is ˆ the standard training error denoted by Rh (Sm ). While our objective in transduction is to achieve small error over the unlabeled set (i.e. to minimize Rh (Xu )), it turns out that it is much easier to derive error bounds for the full sample risk. The following simple lemma translates an error bound on Rh (Xm+u ), the full sample risk, to an error bound on the transduction risk Rh (Xu ). Lemma 2.1 For any h ∈ H and any C ˆ Rh (Xm+u ) ≤ Rh (Sm ) + C ⇔ ˆ Rh (Xu ) ≤ Rh (Sm ) + m+u · C. u (3) Proof: For any h Rh (Xm+u ) = mRh (Xm ) + uRh (Xu ) . m+u (4) ˆ Substituting Rh (Sm ) for Rh (Xm ) in (4) and then substituting the result for the left-hand side of (3) we get Rh (Xm+u ) = ˆ mRh (Sm ) + uRh (Xu ) ˆ ≤ Rh (Sm ) + C. 
The equivalence (3) is now obtained by isolating R_h(X_u) on the left-hand side. □ 3 General Error Bounds for Transduction Consider a hypothesis class H and assume for simplicity that H is countable; in fact, in the case of transduction it suffices to consider a finite hypothesis class. To see this, note that all m + u points are known in advance. Thus, in the case of binary classification (for example) it suffices to consider at most 2^{m+u} possible dichotomies. Recall that in the setting considered we select a sub-sample of m points from the set X_{m+u} of cardinality m + u. This corresponds to a selection of m points without replacement from a set of m + u points, leading to the m points being dependent. A naive utilization of large deviation bounds would therefore not be directly applicable in this setting. However, Hoeffding (see Theorem 4 in [13]) pointed out a simple procedure to transform the problem into one involving independent data. While this procedure leads to non-trivial bounds, it does not fully take advantage of the transductive setting and will not be used here. Consider for simplicity the case of binary classification. In this case we make use of the following concentration inequality, based on [14]. Theorem 3.1 Let C = {c_1, ..., c_N}, c_i ∈ {0, 1}, be a finite set of binary numbers, and set c̄ = (1/N) Σ_{i=1}^N c_i. Let Z_1, ..., Z_m be random variables obtaining their values by sampling C uniformly at random without replacement. Set Z̄ = (1/m) Σ_{i=1}^m Z_i and β = m/N. Then, if² ε ≤ min{1 − c̄, c̄(1 − β)/β}, Pr{Z̄ − EZ̄ > ε} ≤ exp( −m D(c̄ + ε ‖ c̄) − (N − m) D(c̄ − βε/(1 − β) ‖ c̄) + 7 log(N + 1) ), where D(p ‖ q) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)), p, q ∈ [0, 1], is the binary Kullback-Leibler divergence. Using this result we obtain the following error bound for transductive classification. 
Theorem 3.2 Let X_{m+u} = X_m ∪ X_u be the full sample and let p = p(X_{m+u}) be a (prior) distribution over the class of binary hypotheses H that may depend on the full sample. Let δ ∈ (0, 1) be given. Then, with probability at least 1 − δ over choices of S_m (from the full sample) the following bound holds for any h ∈ H, R_h(X_u) ≤ R̂_h(S_m) + sqrt[ (2 R̂_h(S_m)(m + u)/u) · (log(1/p(h)) + ln(m/δ) + 7 log(m + u + 1)) / (m − 1) ] + 2 · (log(1/p(h)) + ln(m/δ) + 7 log(m + u + 1)) / (m − 1). (5) Proof: (sketch) In our transduction setting the set X_m (and therefore S_m) is obtained by sampling the full sample X_{m+u} uniformly at random without replacement. We first claim that E_{Σ_m} R̂_h(S_m) = R_h(X_{m+u}), (6) where E_{Σ_m}(·) is the expectation with respect to a random choice of S_m from X_{m+u} without replacement. This is shown as follows: E_{Σ_m} R̂_h(S_m) = (1/C(m+u, m)) Σ_{S_m} R̂_h(S_m) = (1/C(m+u, m)) Σ_{X_m ⊆ X_{m+u}} (1/m) Σ_{x ∈ S_m} ℓ(h(x), φ(x)), where C(n, k) denotes the binomial coefficient. By symmetry, all points x ∈ X_{m+u} are counted on the right-hand side an equal number of times; this number is precisely C(m+u, m) − C(m+u−1, m) = C(m+u−1, m−1). The equality (6) is obtained by considering the definition of R_h(X_{m+u}) and noting that C(m+u−1, m−1)/C(m+u, m) = m/(m+u). The remainder of the proof combines Theorem 3.1 and the techniques presented in [15]. The details will be provided in the full paper. □ Notice that when R̂_h(S_m) → 0 the square root in (5) vanishes and faster rates are obtained. An important feature of Theorem 3.2 is that it allows one to use the sample X_{m+u} in order to choose the prior distribution p(h). This advantage has already been alluded to in [2], but does not seem to have been widely used in practice. Additionally, observe that (5) holds with probability at least 1 − δ with respect to the random selection of sub-samples of size m from the fixed set X_{m+u}. This should be contrasted with the standard inductive setting results, where the probabilities are with respect to a random choice of m training points chosen i.i.d. from µ(x, y). 
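To make the bound concrete, a small helper can evaluate the right-hand side of (5). The grouping of the complexity terms follows the reconstruction above and should be read as an assumption about the exact constants, not as the paper's definitive statement:

```python
import math

def complexity_term(log_inv_prior, m, u, delta):
    """Shared complexity quantity (log(1/p(h)) + ln(m/delta) + 7 log(m+u+1)) / (m-1)."""
    return (log_inv_prior + math.log(m / delta) + 7.0 * math.log(m + u + 1)) / (m - 1)

def transduction_bound(emp_risk, log_inv_prior, m, u, delta):
    """Right-hand side of (5): empirical risk + square-root term + linear term."""
    q = complexity_term(log_inv_prior, m, u, delta)
    return emp_risk + math.sqrt(2.0 * emp_risk * (m + u) / u * q) + 2.0 * q

# When the empirical risk is zero the square-root term vanishes (faster rate).
b0 = transduction_bound(0.0, math.log(100), m=200, u=100, delta=0.05)
b1 = transduction_bound(0.1, math.log(100), m=200, u=100, delta=0.05)
assert b0 < b1
```

Note how `b0` reduces to twice the complexity term, illustrating the faster rate in the zero-training-error regime.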
The next bound we present is analogous to McAllester's Theorem 1 in [8]. This theorem concerns Gibbs composite classifiers, which are distributions over the base classifiers in H. For any distribution q over H denote by G_q the Gibbs classifier, which classifies an instance (in X_u) by randomly choosing, according to q, one hypothesis h ∈ H. ² The second condition, ε ≤ c̄(1 − β)/β, simply guarantees that the number of 'ones' in the sub-sample does not exceed their number in the original sample. For Gibbs classifiers we now extend definition (1) as follows. Let Z = {x_1, ..., x_{|Z|}} be any set of samples and let G_q be a Gibbs classifier over H. The risk of G_q over Z is R_{G_q}(Z) = E_{h∼q}[ (1/|Z|) Σ_{i=1}^{|Z|} ℓ(h(x_i), φ(x_i)) ]. As before, when Z = X_m (the training set) we use the standard notation R̂_{G_q}(S_m) = R_{G_q}(X_m). Due to space limitations, the proof of the following theorem will appear in the full paper. Theorem 3.3 Let X_{m+u} be the full sample. Let p be a distribution over H that may depend on X_{m+u} and let q be a (posterior) distribution over H that may depend on both S_m and X_u. Let δ ∈ (0, 1) be given. With probability at least 1 − δ over the choices of S_m, for any distribution q, R_{G_q}(X_u) ≤ R̂_{G_q}(S_m) + sqrt[ (2 R̂_{G_q}(S_m)(m + u)/u) · (D(q ‖ p) + ln(m/δ) + 7 log(m + u + 1)) / (m − 1) ] + (7/2) · (D(q ‖ p) + ln(m/δ) + 7 log(m + u + 1)) / (m − 1). In the context of inductive learning, a major obstacle in generating meaningful and effective bounds using the PAC-Bayesian framework [8] is the construction of “compact priors”. Here we discuss two extensions to the PAC-Bayesian scheme, which together allow for easy choices of compact priors that can yield tight error bounds. The first extension we offer is the use of multiple priors. Instead of a single prior p in the original PAC-Bayesian framework, we observe that one can use all PAC-Bayesian bounds with a number of priors p_1, ...
, p_k and then replace the complexity term ln(1/p(h)) (in Theorem 3.2) by min_i ln(1/p_i(h)), at a cost of an additional ln k term (see below). Similarly, in Theorem 3.3 we can replace the KL-divergence term in the bound with min_i D(q ‖ p_i). The penalty for using k priors is logarithmic in k (specifically, the ln(1/δ) term in the original bound becomes ln(k/δ)). As long as k is sub-exponential in m we still obtain effective generalization bounds. The second “extension” is simply the feature of our transduction bounds (Theorems 3.2 and 3.3) which allows the priors to depend on the full sample X_{m+u}. The combination of these two simple ideas yields a powerful technique for deriving error bounds in realistic transductive settings. After stating the extended result we later use it for deriving tight bounds for known learning algorithms and for deriving new algorithms. Suppose that instead of a single prior p over H we want to utilize k priors, p_1, ..., p_k, and in retrospect choose the best among the k corresponding PAC-Bayesian bounds. The following theorem shows that one can use polynomially many priors with a minor penalty. The proof, which is omitted due to space limitations, utilizes the union bound in a straightforward manner. Theorem 3.4 Let the conditions of Theorem 3.2 hold, except that we now have k prior distributions p_1, ..., p_k defined over H, each of which may depend on X_{m+u}. Let δ ∈ (0, 1) be given. Then, with probability at least 1 − δ over random choices of sub-samples of size m from the full sample, for all h ∈ H, (5) holds with p(h) replaced by min_{1≤i≤k} p_i(h) and log(1/δ) replaced by log(k/δ). Remark: A similar result holds for the Gibbs algorithm of Theorem 3.3. Also, as noted by one of the reviewers, when the supports of the k priors intersect (i.e. there is at least one pair of priors p_i and p_j with overlapping support), one can do better by utilizing the “super prior” p = (1/k) Σ_i p_i within the original Theorem 3.2. 
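The multiple-prior selection can be sketched numerically. Taking the best of k prior-specific complexity terms while paying the union-bound price (δ becomes δ/k, i.e. ln(1/δ) becomes ln(k/δ)) looks roughly as follows; the grouping of constants is the same assumed reconstruction as before:

```python
import math

def multi_prior_complexity(log_inv_priors, m, u, delta):
    """Best complexity term over k priors; the union bound is paid by using delta/k,
    so ln(1/delta) becomes ln(k/delta)."""
    k = len(log_inv_priors)
    return min((c + math.log(m * k / delta) + 7.0 * math.log(m + u + 1)) / (m - 1)
               for c in log_inv_priors)

# A hypothesis that is "cheap" under any one of the priors benefits, at only a ln k cost.
single = multi_prior_complexity([math.log(1e6)], m=200, u=100, delta=0.05)
multi = multi_prior_complexity([math.log(1e6), math.log(10)], m=200, u=100, delta=0.05)
assert multi < single
```

Here adding a second prior under which the hypothesis is far more probable improves the complexity term despite the extra ln 2 penalty.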
However, note that when the supports are disjoint, these two views (of multiple priors and a super prior) are equivalent. In the applications below we utilize non-intersecting priors. 4 Bounds for Compression Algorithms Here we propose a technique for bounding the error of “compression” algorithms, based on an appropriate construction of prior probabilities. Let A be a learning algorithm. Intuitively, A is a “compression scheme” if it can generate the same hypothesis using a subset of the data. More formally, a learning algorithm A (viewed as a function from samples to some hypothesis class) is a compression scheme with respect to a sample Z if there is a subsample Z', Z' ⊂ Z, such that A(Z') = A(Z). Observe that the SVM approach is a compression scheme, with Z' being determined by the set of support vectors. Let A be a deterministic compression scheme and consider the full sample X_{m+u}. For each integer τ = 1, ..., m, consider all subsets of X_{m+u} of size τ, and for each subset construct all possible dichotomies of that subset (note that we are not proposing this approach as an algorithm, but rather as a means to derive bounds; in practice one need not construct all these dichotomies). A deterministic algorithm A uniquely determines at most one hypothesis h ∈ H for each dichotomy.³ For each τ, let the set of hypotheses generated by this procedure be denoted by H_τ. For the rest of this discussion we assume the worst case where |H_τ| = 2^τ · C(m+u, τ) (i.e. if H_τ does not contain one hypothesis for each dichotomy, the bounds improve). The prior p_τ is then defined to be a uniform distribution over H_τ. In this way we have m priors, p_1, ..., p_m, which are constructed using only X_{m+u} (and are independent of S_m). Any hypothesis selected by the learning algorithm A based on the labeled sample S_m and on the test set X_u belongs to ∪_{τ=1}^m H_τ. The motivation for this construction is as follows. 
Each τ can be viewed as our “guess” for the maximal number of compression points that will be utilized by a resulting classifier. For each such τ the prior p_τ is constructed over all possible classifiers that use τ compression points. By systematically considering all possible dichotomies of τ points we can characterize a relatively small subset of H without observing labels of the training points. Thus, each prior p_τ represents one such guess. Using Theorem 3.4 we are later allowed to choose in retrospect the bound corresponding to the best “guess”. The following corollary identifies an upper bound on the divergence in terms of the observed size of the compression set of the final classifier. Corollary 4.1 Let the conditions of Theorem 3.4 hold. Let A be a deterministic learning algorithm leading to a hypothesis h ∈ H based on a compression set of size s. Then with probability at least 1 − δ, for all h ∈ H, (5) holds with log(1/p(h)) replaced by s log(2e(m + u)/s) and ln(m/δ) replaced by ln(m²/δ). Proof: Recall that H_s ⊆ H is the support set of p_s and that p_s(h) = 1/|H_s| for all h ∈ H_s, implying that ln(1/p_s(h)) = ln |H_s|. Using the inequality C(m+u, s) ≤ (e(m+u)/s)^s we have that |H_s| = 2^s · C(m+u, s) ≤ (2e(m+u)/s)^s. Substituting this result in Theorem 3.4, while restricting the minimum over i to be over i ≥ s, leads to the desired result. □ The bound of Corollary 4.1 can be easily computed once the classifier is trained. If the size of the compression set happens to be small, we obtain a tight bound. SVM classification is one of the best studied compression schemes. The compression set for a sample S_m is given by the subset of support vectors. Thus the bound in Corollary 4.1 immediately applies with s being the number of observed support vectors (after training). We note that this bound is similar to a recently derived compression bound for inductive learning (Theorem 5.18 in [16]). 
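The counting step in the proof of Corollary 4.1 is easy to check numerically: the closed-form complexity s log(2e(m+u)/s) dominates the exact log-size of H_s for every compression-set size s (function names here are illustrative):

```python
import math

def compression_complexity(s, m, u):
    """Upper bound s * log(2e(m+u)/s) on log(1/p_s(h)) from Corollary 4.1."""
    return s * math.log(2.0 * math.e * (m + u) / s)

def exact_log_class_size(s, m, u):
    """log |H_s| = log(2^s * C(m+u, s)): all dichotomies of all size-s subsets."""
    return s * math.log(2.0) + math.log(math.comb(m + u, s))

# The closed-form bound dominates the exact count for every compression size.
for s in range(1, 21):
    assert exact_log_class_size(s, m=200, u=100) <= compression_complexity(s, m=200, u=100)
```

For a trained SVM one would plug in s = number of support vectors to get the complexity term of the resulting bound.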
Also, observe that the algorithm itself (inductive SVM) did not use in this case the unlabeled sample (although the bound does use this sample). Nevertheless, using exactly the same technique we obtain error bounds for the transductive SVM algorithms in [2, 3].⁴ ³ It might be that for some dichotomies the algorithm will fail. For example, an SVM in feature space without soft margin will fail to classify non-linearly-separable dichotomies of X_{m+u}. ⁴ Note, however, that our bounds are optimized with a “minimum number of support vectors” approach rather than a “maximum margin” one. 5 Bounds for Clustering Algorithms Some learning problems do not allow for high compression rates using compression schemes such as SVMs (i.e. the number of support vectors can sometimes be very large). A considerably stronger type of compression can often be achieved by clustering algorithms. While there is a lack of formal links between entirely unsupervised clustering and classification, within a transduction setting we can provide a principled approach to using clustering algorithms for classification. Let A be any (deterministic) clustering algorithm which, given the full sample X_{m+u}, can cluster this sample into any desired number of clusters. We use A to cluster X_{m+u} into 2, 3, ..., c clusters, where c ≤ m. Thus, the algorithm generates a collection of partitions of X_{m+u} into τ = 2, 3, ..., c clusters, where each partition is denoted by C_τ. For each value of τ, let H_τ consist of those hypotheses which assign an identical label to all points in the same cluster of partition C_τ, and define the prior p_τ(h) = 1/2^τ for each h ∈ H_τ and zero otherwise (note that there are 2^τ possible dichotomies). The learning algorithm selects a hypothesis as follows. Upon observing the labeled sample S_m = (X_m, Y_m), for each of the clusterings C_2, ...
, C_c constructed above, it assigns a label to each cluster based on the majority vote of the labels Y_m of points falling within the cluster (in case of ties, or if no points from X_m belong to the cluster, choose a label arbitrarily). Doing this leads to c − 1 classifiers h_τ, τ = 2, ..., c. For each h_τ there is a valid error bound as given by Theorem 3.4, and all these bounds are valid simultaneously. Thus we choose the best classifier (equivalently, number of clusters) for which the best bound holds. We thus have the following corollary of Theorem 3.4 and Lemma 2.1. Corollary 5.1 Let A be any clustering algorithm and let h_τ, τ = 2, ..., c, be classifications of the test set X_u as determined by clustering of the full sample X_{m+u} (into τ clusters) and the training set S_m, as described above. Let δ ∈ (0, 1) be given. Then with probability at least 1 − δ, for all τ, (5) holds with log(1/p(h)) replaced by τ and ln(m/δ) replaced by ln(mc/δ). Error bounds obtained using Corollary 5.1 can be rather tight when the clustering algorithm is successful (i.e. when it captures the class structure in the data using a small number of clusters). Corollary 5.1 can be extended in a number of ways. One simple extension is the use of an ensemble of clustering algorithms. Specifically, we can concurrently apply k clustering algorithms (using each algorithm to cluster the data into τ = 2, ..., c clusters). We thus obtain kc hypotheses (partitions of X_{m+u}). By a simple application of the union bound we can replace ln(cm/δ) by ln(kcm/δ) in Corollary 5.1 and guarantee that the kc bounds hold simultaneously for all kc hypotheses (with probability at least 1 − δ). We thus choose the hypothesis which minimizes the resulting bound. This extension is particularly attractive since typically, without prior knowledge, we do not know which clustering algorithm will be effective for the dataset at hand. 6 Concluding Remarks We presented new bounds for transductive learning algorithms. 
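Before closing, the majority-vote labeling step of the clustering scheme in Section 5 can be sketched as follows (the cluster assignment, indices, and default label are illustrative; any deterministic clustering algorithm could supply the partition):

```python
from collections import Counter, defaultdict

def majority_vote_labels(cluster_ids, labeled_idx, labels, default=1):
    """Assign each cluster the majority label of the training points it contains.
    Clusters with no training points get an arbitrary default label."""
    votes = defaultdict(Counter)
    for i, y in zip(labeled_idx, labels):
        votes[cluster_ids[i]][y] += 1
    return {c: (votes[c].most_common(1)[0][0] if votes[c] else default)
            for c in set(cluster_ids)}

def transduce(cluster_ids, cluster_labels):
    """Label every point of the full sample by its cluster's label."""
    return [cluster_labels[c] for c in cluster_ids]

# Toy partition of 5 points into 3 clusters; points 0, 2, 3 carry training labels.
cluster_ids = [0, 0, 1, 1, 2]
cluster_labels = majority_vote_labels(cluster_ids, [0, 2, 3], [1, -1, -1])
assert transduce(cluster_ids, cluster_labels) == [1, 1, -1, -1, 1]
```

With τ clusters the resulting hypothesis class has at most 2^τ dichotomies, which is what makes the complexity term of Corollary 5.1 as small as τ.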
We also developed a new technique for deriving tight error bounds for compression schemes and for clustering algorithms in the transductive setting. We expect that these bounds and new techniques will be useful for deriving new error bounds for other known algorithms and for deriving new types of transductive learning algorithms. It would be interesting to see if tighter transduction bounds can be obtained by reducing the “slacks” in the inequalities we use in our analysis. Another promising direction is the construction of better (multiple) priors. For example, in our compression bound (Corollary 4.1), for each number of compression points we assigned the same prior to each possible point subset and each possible dichotomy. However, in practice a vast majority of all these subsets and dichotomies are unlikely to occur. Acknowledgments The work of R.E. and R.M. was partially supported by the Technion V.P.R. fund for the promotion of sponsored research. Support from the Ollendorff Center of the Department of Electrical Engineering at the Technion is also acknowledged. We also thank the anonymous referees for their useful comments. References [1] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer Verlag, New York, 1982. [2] V. N. Vapnik. Statistical Learning Theory. Wiley Interscience, New York, 1998. [3] T. Joachims. Transductive inference for text classification using support vector machines. In European Conference on Machine Learning, 1999. [4] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 19–26, 2001. [5] R. El-Yaniv and O. Souroujon. Iterative double clustering for unsupervised and semi-supervised learning. In Advances in Neural Information Processing Systems (NIPS 2001), pages 1025–1032, 2001. [6] T. Joachims. Transductive learning via spectral graph partitioning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 2003. [7] D. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999. [8] D. McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51(1):5–21, 2003. [9] D. Wu, K. Bennett, N. Cristianini, and J. Shawe-Taylor. Large margin trees for induction and transduction. In International Conference on Machine Learning, 1999. [10] L. Bottou, C. Cortes, and V. Vapnik. On the effective VC dimension. Technical report, AT&T, 1994. [11] G. R. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. Technical report, University of California, Berkeley, Computer Science Division, 2002. [12] A. Blum and J. Langford. PAC-MDL bounds. In COLT, pages 344–357, 2003. [13] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963. [14] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Springer, New York, second edition, 1998. [15] D. McAllester. Simplified PAC-Bayesian margin bounds. In COLT, pages 203–215, 2003. [16] R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Boston, 2002.

3 0.46058917 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses

Author: Adria Bofill-i-petit, Alan F. Murray

Abstract: We present test results from spike-timing correlation learning experiments carried out with silicon neurons with STDP (Spike Timing Dependent Plasticity) synapses. The weight change scheme of the STDP synapses can be set to either weight-independent or weight-dependent mode. We present results that characterise the learning window implemented for both modes of operation. When presented with spike trains with different types of synchronisation the neurons develop bimodal weight distributions. We also show that a 2-layered network of silicon spiking neurons with STDP synapses can perform hierarchical synchrony detection.

4 0.4552328 13 nips-2003-A Neuromorphic Multi-chip Model of a Disparity Selective Complex Cell

Author: Bertram E. Shi, Eric K. Tsang

Abstract: The relative depth of objects causes small shifts in the left and right retinal positions of these objects, called binocular disparity. Here, we describe a neuromorphic implementation of a disparity selective complex cell using the binocular energy model, which has been proposed to model the response of disparity selective cells in the visual cortex. Our system consists of two silicon chips containing spiking neurons with monocular Gabor-type spatial receptive fields (RF) and circuits that combine the spike outputs to compute a disparity selective complex cell response. The disparity selectivity of the cell can be adjusted by both position and phase shifts between the monocular RF profiles, which are both used in biology. Our neuromorphic system performs better with phase encoding, because the relative responses of neurons tuned to different disparities by phase shifts are better matched than the responses of neurons tuned by position shifts.

5 0.44643852 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles

Author: Mark Girolami, Ata Kabán

Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual specific and global models. This paper proposes a linear-time distributed model for finite state symbolic sequences representing traces of individual user activity by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicates that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as providing lowcomplexity and intuitively interpretable representations.

6 0.40762466 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

7 0.39801213 127 nips-2003-Mechanism of Neural Interference by Transcranial Magnetic Stimulation: Network or Single Neuron?

8 0.39027467 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells

9 0.39002323 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

10 0.38893285 107 nips-2003-Learning Spectral Clustering

11 0.38720098 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons

12 0.38719213 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

13 0.38496184 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

14 0.38291332 81 nips-2003-Geometric Analysis of Constrained Curves

15 0.3827222 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms

16 0.38234961 126 nips-2003-Measure Based Regularization

17 0.38081694 30 nips-2003-Approximability of Probability Distributions

18 0.38046169 115 nips-2003-Linear Dependent Dimensionality Reduction

19 0.38025317 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation

20 0.37978533 113 nips-2003-Learning with Local and Global Consistency