
Learning Spike-Based Correlations and Conditional Probabilities in Silicon



Author: Aaron P. Shon, David Hsu, Chris Diorio

Abstract: We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating operation in learning and calibration modes, and show how to use our synapse to implement various learning rules in silicon.

1 Introduction

Computation with conditional probabilities and correlations underlies many models of neurally inspired information processing. For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. Boltzmann machine synapses learn the difference between the correlations of pairs of neurons in the sleep and wake phases [2]. In most neural models, computation and adaptation occur at the synaptic level. Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and postsynaptic signals can be a key part of many silicon neural-learning architectures.

We have designed and implemented a silicon synapse, in a 0.35µm CMOS process, that learns a synaptic weight corresponding to the conditional probability or correlation between binary input and feedback signals. The circuit uses floating-gate transistors to provide both nonvolatile storage and weight adaptation [3]. In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation.

Our circuit improves upon previous implementations of floating-gate learning synapses [3,4,5] in several ways. First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. We demonstrate that our synapse can learn either the conditional probability or the correlation between input and feedback signals; consequently, we can implement a wide range of synaptic learning networks with our circuit. Second, unlike the general correlational learning synapse proposed by Hasler et al. [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity occurring at different times. Learning algorithms that employ time-separated correlations include both temporal difference learning [6] and the recently postulated temporally asymmetric Hebbian learning [7]. Hasler's correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals occurring at different times. Because the signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. Mismatch between the same adaptation mechanisms on different floating-gate transistors limits the accuracy of learning rules based on these devices.
This problem has been noted in previous circuits that use floating-gate adaptation [4,8]. In our circuit, different synapses can learn widely divergent weights from the same inputs because of component mismatch. We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. To our knowledge, this circuit is the first floating-gate learning circuit that includes this feature.

This paper is organized as follows. First, we provide a brief introduction to floating-gate transistors. Next, we describe and analyze our synapse, demonstrating that it can learn the conditional probability or correlation between a pair of binary signals. We then describe the calibration circuitry and show its effectiveness in compensating for adaptation mismatch. Finally, we discuss how this synapse can be used in silicon implementations of various learning networks.

2 Floating-gate transistors

Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. A floating-gate transistor (e.g. transistor M3 of Fig. 1(a)) comprises a MOSFET whose gate is isolated on all sides by SiO2. A control gate capacitively couples signals to the floating gate. Charge stored on the floating gate implements a nonvolatile analog weight; the transistor's output current varies with both the floating-gate voltage and the control-gate voltage. We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig. 1(a). We inject by imposing more than about 3V across the drain and source of transistor M3. The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interferes with circuit operation.

Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]:

    I_tun = I_tun0 exp( (V_tun − V_fg) / V_χ )    (1)

where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. Over a wide range of transistor drain and source voltages, we can approximate the magnitude of the injection current Iinj as [4]:

    I_inj = I_inj0 I_s^(1 − U_t/V_γ) exp( (V_s − V_d) / V_γ )    (2)

where Vs and Vd are the source and drain voltages, Is is the transistor's channel current, Iinj0 is a pre-exponential current, Vγ is a constant that depends on the VLSI process, and Ut is the thermal voltage kT/q.
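To make these adaptation mechanisms concrete, here is a minimal numerical sketch of the current models in Eqs. 1 and 2. The parameter values (Itun0, Vχ, Iinj0, Vγ) are illustrative placeholders, not constants fitted to the chip.

```python
import numpy as np

# Illustrative constants -- placeholders, not values fitted to the chip.
I_tun0 = 1e-15    # pre-exponential tunneling current (A)
V_chi = 1.0       # tunneling fit constant (V)
I_inj0 = 1e-12    # pre-exponential injection current (A)
V_gamma = 0.25    # process-dependent injection constant (V)
U_t = 0.025       # thermal voltage kT/q at room temperature (V)

def tunneling_current(V_tun, V_fg):
    """Eq. 1: Fowler-Nordheim tunneling current magnitude."""
    return I_tun0 * np.exp((V_tun - V_fg) / V_chi)

def injection_current(I_s, V_s, V_d):
    """Eq. 2: impact-ionized hot-electron injection current magnitude."""
    return I_inj0 * I_s ** (1.0 - U_t / V_gamma) * np.exp((V_s - V_d) / V_gamma)

# Example: tunneling at Vtun = 9 V with Vfg = 1 V; injection with ~3.3 V across M3.
print(tunneling_current(9.0, 1.0))
print(injection_current(I_s=1e-7, V_s=3.3, V_d=0.0))
```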
3 The silicon synapse

We show our silicon synapse in Fig. 1. The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(Xcor, Y). Xin is analogous to a presynaptic input, while Y is analogous to a postsynaptic signal or error feedback. Xcor is a presynaptic adaptation signal, and typically has some relationship with Xin. We can implement different learning rules by altering the relationship between Xcor and Xin; for some examples, see Section 4.

We now describe the circuit in more detail. The drain current of floating-gate transistor M4 represents the weight value W. Because the control gate of M4 is fixed, W depends solely on the charge on floating-gate capacitor C1. We can switch the drain current on or off using transistor M7; this switching action corresponds to multiplying the weight value W by a binary input signal, Xin. We choose values for the drain voltage of M4 to prevent injection. A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. Simultaneously high input signals Xcor and Y cause injection, increasing the weight. A high Vtun causes tunneling, decreasing the weight. We either correlate a high Vtun with signal Y or provide a fixed high Vtun throughout the adaptation process; the choice determines whether the circuit learns a conditional probability or a correlation, respectively.

Because the drain current sourced by M4 is the weight W, we can express W in terms of M4's floating-gate voltage, Vfg. Vfg includes the effects of both the fixed control-gate voltage and the variable floating-gate charge. The expression differs depending on whether the readout transistor operates in the subthreshold or above-threshold regime. We provide both expressions below:

    W = I_0 exp( −κ² V_fg / ((1+κ) U_t) )        below threshold
    W = β ( V_0 − κ V_fg / (1+κ) )²              above threshold    (3)

Here V0 is a constant that depends on the threshold voltage and on Vdd, Ut is the thermal voltage kT/q, κ is the floating-gate-to-channel coupling coefficient, and I0 is a fixed bias current. Eq. 3 shows that W depends solely on Vfg (all other factors are constants). These equations differ slightly from the standard equations for the source current through a transistor because of source degeneration caused by M4. This degeneration smoothes the nonlinear relationship between Vfg and Is; its addition to the circuit is optional.
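The sketch below evaluates both branches of Eq. 3 and applies the binary input Xin as a multiplicative switch, mirroring the role of M7. The device parameters (I0, β, V0, κ) are illustrative assumptions, not extracted values.

```python
import numpy as np

I_0 = 1e-8      # fixed bias current (A), illustrative
beta = 1e-4     # above-threshold transconductance parameter (A/V^2), illustrative
V_0 = 1.5       # threshold/Vdd-dependent constant (V), illustrative
kappa = 0.7     # floating-gate-to-channel coupling coefficient, illustrative
U_t = 0.025     # thermal voltage (V)

def weight_subthreshold(V_fg):
    """Eq. 3, below threshold: W = I0 * exp(-kappa^2 * Vfg / ((1 + kappa) * Ut))."""
    return I_0 * np.exp(-kappa**2 * V_fg / ((1 + kappa) * U_t))

def weight_above_threshold(V_fg):
    """Eq. 3, above threshold: W = beta * (V0 - kappa * Vfg / (1 + kappa))^2."""
    return beta * (V_0 - kappa * V_fg / (1 + kappa))**2

def synapse_output(V_fg, x_in, above_threshold=False):
    """Weight W gated by the binary input Xin (the role of switch transistor M7)."""
    W = weight_above_threshold(V_fg) if above_threshold else weight_subthreshold(V_fg)
    return W * x_in   # x_in is 0 or 1

print(synapse_output(V_fg=0.2, x_in=1))
```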
3.1 Weight adaptation

Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. In this section, we show that these mechanisms enable our circuit to learn the correlation or conditional probability between inputs Xcor (which we will refer to as X) and Y. Our analysis assumes that these statistics are fixed over some period during which adaptation occurs. The changes in floating-gate voltage, and hence in the weight, discussed below should therefore be interpreted as expected changes driven by the statistics of the inputs. We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead.

[Fig. 1. (a) Synapse schematic. (b) Equilibrium weight (nA) in the subthreshold regime versus the conditional probability P(X|Y), showing experimental chip data and a fit from Eq. 7 (fit ∝ P(X|Y)^0.78). (c) Equilibrium weight (µA) versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq. 7.]

We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, and Vd of M3. We assume that transistor M1 is in saturation, constraining Is at M3 to be constant. Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). By Eq. 2, the amount of injection also depends on M3's source voltage Vs. Because M3 is constrained to a fixed channel current, a drop in the floating-gate voltage, ∆Vfg, causes a drop in Vs of magnitude κ∆Vfg. Substituting these expressions into Eq. 2 results in a floating-gate voltage update of:

    (dV_fg/dt)_inj = −I_inj0 Pr(X,Y) exp( κ V_fg / V_γ )    (4)

where Iinj0 also includes the constant source current. Eq. 4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y).

Next we analyze the effect of tunneling on the floating-gate voltage. The origin of the tunneling signal determines whether the synapse learns a conditional probability or a correlation. If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y and the floating-gate voltage:

    (dV_fg/dt)_tun = I_tun0 Pr(Y) exp( −V_fg / V_χ )    (5)

Eq. 5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y.

3.2 Weight equilibrium

To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). The equilibrium weight is the weight value at which the expected weight change over time equals zero; it corresponds to the floating-gate voltage at which the injection and tunneling currents are equal. To find this voltage, we equate Eqs. 4 and 5 and solve:

    V_fg^eq = −[ log Pr(X|Y) + log(I_inj0 / I_tun0) ] / ( κ/V_γ + 1/V_χ )    (6)

To derive the equilibrium weight, we substitute Eq. 6 into Eq. 3 and solve:

    W_eq = I_0 ( (I_inj0 / I_tun0) Pr(X|Y) )^α                              below threshold
    W_eq = β ( V_0 + η [ log(I_inj0 / I_tun0) + log Pr(X|Y) ] )²            above threshold    (7)

where α = κ² / ( (1+κ) U_t (κ/V_γ + 1/V_χ) ) and η = κ² / ( (1+κ) (κ/V_γ + 1/V_χ) ).

Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. Note that the equilibrium is stable because of negative feedback in the tunneling and injection processes; the weight therefore always converges to the equilibrium value in Eq. 7. Figs. 1(b) and (c) show the equilibrium weight versus the conditional probability P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq. 7. Note that both the sub- and above-threshold relationships between P(X|Y) and the equilibrium weight enable us to compute the probability of a vector of synaptic inputs X given a postsynaptic response Y. In both cases, we can apply the output currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, producing a voltage that is a linear function of log P(X|Y).
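As a sanity check on this analysis, the following sketch integrates the expected floating-gate drift given by Eqs. 4 and 5 and compares the result against the closed-form equilibrium of Eq. 6. The update rates and fit constants are illustrative, not chip values; the point is that the equilibrium depends only on P(X|Y) once the Iinj0/Itun0 ratio is fixed.

```python
import numpy as np

# Illustrative constants -- effective update rates and fit parameters, not chip values.
I_inj0, I_tun0 = 1.0, 1.0                # effective injection/tunneling rates (V/s)
kappa, V_gamma, V_chi = 0.7, 0.25, 1.0

def dVfg_dt(V_fg, p_xy, p_y):
    """Expected floating-gate drift: injection (Eq. 4) plus tunneling (Eq. 5)."""
    inj = -I_inj0 * p_xy * np.exp(kappa * V_fg / V_gamma)
    tun = I_tun0 * p_y * np.exp(-V_fg / V_chi)
    return inj + tun

def equilibrium_Vfg(p_x_given_y):
    """Eq. 6: closed-form equilibrium floating-gate voltage."""
    k = kappa / V_gamma + 1.0 / V_chi
    return -(np.log(p_x_given_y) + np.log(I_inj0 / I_tun0)) / k

p_y, p_x_given_y = 0.5, 0.3
p_xy = p_x_given_y * p_y
V_fg, dt = 0.0, 1e-3
for _ in range(20000):                   # integrate the expected drift over time
    V_fg += dVfg_dt(V_fg, p_xy, p_y) * dt

print(V_fg, equilibrium_Vfg(p_x_given_y))   # the two values agree closely
```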
3.3 Calibration circuitry

Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. This mismatch causes the weight equilibria of different synapses to differ by a multiplicative gain. Fig. 2(b) shows the equilibrium weights of an array of six synapses exposed to identical input signals. The variation of the synaptic weights is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms.

We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. Because the dependence of the equilibrium weight on these constants is determined by the ratio Iinj0/Itun0, our calibration process changes Iinj0 to equalize the ratio of injection to tunneling across all synapses. We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. Our calibration procedure is a self-convergent memory write [11] that causes the equilibrium weight of every synapse to equal a target current Ical.

[Fig. 2. (a) Schematic of the calibrated synapse, with the signals used during the calibration procedure. (b) Equilibrium weights (nA) for the array of synapses shown in Fig. 1(a). (c) Equilibrium weights (nA) for the array of calibrated synapses after calibration.]

Calibration requires many operating cycles; during each cycle, we first increase the equilibrium weight of the synapse, and second, we let the synapse adapt to the new equilibrium weight. We create the calibrated synapse by modifying our original synapse according to Fig. 2(a). We convert M1 into a floating-gate transistor whose floating-gate charge sets M3's channel current, providing control of the Iinj0 term of Eq. 7. Transistor M8 modifies M1's gate charge by means of injection when M9's gate is low and Vcal is low. M9's gate is low only when the equilibrium weight W is less than Ical. During calibration, injection and tunneling on M3 are continuously active. We apply a pulse train to Vcal; during each pulse period, Vcal is predominantly high. When Vcal is high, the synapse adapts towards its equilibrium weight. When Vcal pulses low, M8 injects, increasing the synapse's equilibrium weight W. We repeat this process until the equilibrium weight W matches Ical, causing M9's gate voltage to rise and disabling Vcal, and with it injection. To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. Fig. 2(c) shows the equilibrium weights of six synapses after calibration. The data show that calibration can reduce the effect of mismatched adaptation on the synapse's learned weight to a small fraction of the weight itself.

Because M1 is now a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence between M1's drain voltage and source current. Consequently, M3's floating-gate voltage now affects its source current (through M1's drain voltage), and we can model M3 as a source-degenerated pFET [3]. The new expression for the injection current in M3 is:

    (dV_fg/dt)_inj = −I_inj0 Pr(X,Y) exp( V_fg ( κ/V_γ − κ k1/U_t ) )    (8)

where k1 is close to zero. The new expression for injection slightly changes the α and η terms of the weight equilibrium in Eq. 7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same.
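The behavioral sketch below mimics this self-convergent calibration loop. The equilibrium-weight model (the subthreshold branch of Eq. 7 with a mismatch gain) and the per-pulse increment are stand-ins for the actual M1/M8/M9 dynamics, not a transistor-level simulation; it only illustrates how repeated Vcal pulses drive mismatched synapses toward the same target Ical.

```python
ALPHA = 0.78   # subthreshold exponent of Eq. 7, value suggested by the Fig. 1(b) fit

def calibrate(synapses, I_cal, increment=0.02, max_cycles=1000):
    """Behavioral model of the self-convergent calibration write.

    Each synapse carries an injection-to-tunneling ratio Iinj0/Itun0 and a
    mismatch gain; its equilibrium weight follows the subthreshold branch of
    Eq. 7, W_eq ~ gain * (ratio * P)^alpha, evaluated at a fixed P(X|Y).
    Each low Vcal pulse injects on M1's floating gate, nudging the ratio
    upward, until W_eq reaches I_cal and M9 disables further pulses.
    """
    P = 0.5                                   # conditional probability held fixed during calibration

    def w_eq(s):
        return s['gain'] * (s['ratio'] * P) ** ALPHA

    for s in synapses:
        for _ in range(max_cycles):
            if w_eq(s) >= I_cal:              # M9's gate rises: calibration done for this synapse
                break
            s['ratio'] *= (1.0 + increment)   # one Vcal pulse raises the effective Iinj0
    return [w_eq(s) for s in synapses]

# Mismatched synapses (different gains and ratios) start below I_cal after a tunneling erase.
array = [{'ratio': 0.3, 'gain': 1.0},
         {'ratio': 0.2, 'gain': 1.6},
         {'ratio': 0.4, 'gain': 0.7}]
print(calibrate(array, I_cal=1.0))            # equilibrium weights end up closely matched
```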
4 Implementing silicon synaptic learning rules

In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit.

We can use our circuit to implement a Hebbian learning rule. Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic Y is analogous to homosynaptic LTD. In our synapse, we tie Xin and Xcor together and correlate Vtun with Y.

Our synapse is also capable of emulating a Boltzmann weight-update rule [2]. This weight-update rule derives from the difference between correlations among neurons when the network receives external input and when the network runs freely (the clamped and unclamped phases, respectively). With weight decay, a Boltzmann synapse learns the difference between correlations in the clamped and unclamped phases. We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. We do this by sending Xin to Xcor of one synapse in the clamped phase, and sending Xin to Xcor of the other synapse in the unclamped phase. Vtun remains constant throughout adaptation.

Finally, we consider implementing a temporally asymmetric Hebbian learning rule [7] using our synapse. In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference between the weights of the two circuits. We show the arrangement in Fig. 3.

[Fig. 3. A method for achieving spike-time-dependent plasticity in silicon: a presynaptic and a postsynaptic neuron each drive a paired synapse (W+ and W−) with a spike signal and an activation window.]

Each neuron sends two signals: a neuronal output, and an adaptation time window that is active for some time afterwards. Therefore, the combined synapse receives two presynaptic signals and two postsynaptic signals. The relative timing of the postsynaptic response Y and the presynaptic input X determines whether the synapse undergoes LTP or LTD. If Y occurs before X, Y's time window correlates with X, causing injection on the negative synapse and decreasing the weight. If Y occurs after X, Y correlates with X's time window, causing injection on the positive synapse and increasing the weight. Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning.
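A behavioral sketch of this paired-synapse arrangement follows: each spike opens an activation window, and the effective weight W+ − W− moves up or down depending on whether the presynaptic spike precedes or follows the postsynaptic one. The window length and step size are illustrative choices, not measured circuit quantities.

```python
def stdp_update(w_plus, w_minus, t_pre, t_post, window=20.0, step=0.05):
    """Paired-synapse temporally asymmetric update (behavioral model of Fig. 3).

    If the postsynaptic spike Y falls inside the window opened by an earlier
    presynaptic spike X, the positive synapse injects (LTP); if X falls inside
    the window opened by an earlier Y, the negative synapse injects (LTD).
    """
    if 0.0 < t_post - t_pre <= window:      # pre before post: LTP
        w_plus += step
    elif 0.0 < t_pre - t_post <= window:    # post before pre: LTD
        w_minus += step
    return w_plus, w_minus

w_p, w_m = 1.0, 1.0
w_p, w_m = stdp_update(w_p, w_m, t_pre=10.0, t_post=15.0)   # causal pairing -> LTP
w_p, w_m = stdp_update(w_p, w_m, t_pre=30.0, t_post=22.0)   # anti-causal pairing -> LTD
print(w_p - w_m)   # effective synaptic weight
```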
5 Conclusion

We have described a silicon synapse that implements a wide range of spike-based learning rules and that does not suffer from device mismatch. We have also described how we can implement various silicon learning networks using this synapse. In addition, although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals. One possible avenue for future work is to analyze the implications of different pulse-coding schemes on the circuit's adaptive behavior.

Acknowledgements

This work was supported by the National Science Foundation and by the Office of Naval Research. Aaron Shon was also supported by an NDSEG fellowship. We thank Anhai Doan and the anonymous reviewers for helpful comments.

References

[1] W. B. Levy, "A computational approach to hippocampal function," in R. D. Hawkins and G. H. Bower (eds.), Computational Models of Learning in Simple Neural Systems, The Psychology of Learning and Motivation vol. 23, pp. 243-305, San Diego, CA: Academic Press, 1989.
[2] D. H. Ackley, G. Hinton, and T. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science vol. 9, pp. 147-169, 1985.
[3] P. Hasler, B. A. Minch, J. Dugger, and C. Diorio, "Adaptive circuits and synapses using pFET floating-gate devices," in G. Cauwenberghs and M. Bayoumi (eds.), Learning in Silicon, pp. 33-65, Kluwer Academic, 1999.
[4] P. Hafliger, A Spike-Based Learning Rule and Its Implementation in Analog Hardware, Ph.D. thesis, ETH Zurich, 1999.
[5] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, "A floating-gate MOS learning array with locally computed weight updates," IEEE Transactions on Electron Devices vol. 44(12), pp. 2281-2289, 1997.
[6] R. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning vol. 3, pp. 9-44, 1988.
[7] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science vol. 275, pp. 213-215, 1997.
[8] A. Pesavento, T. Horiuchi, C. Diorio, and C. Koch, "Adaptation of current signals with floating-gate circuits," in Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems (MicroNeuro99), pp. 128-134, 1999.
[9] M. Lenzlinger and E. H. Snow, "Fowler-Nordheim tunneling into thermally grown SiO2," Journal of Applied Physics vol. 40(1), pp. 278-283, 1969.
[10] E. Takeda, C. Yang, and A. Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995.
[11] C. Diorio, "A p-channel MOS synapse transistor with self-convergent memory writes," IEEE Journal of Solid-State Circuits vol. 36(5), pp. 816-822, 2001.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. [sent-4, score-0.734]

2 The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. [sent-5, score-0.725]

3 We can calibrate arrays of synapses to ensure uniform adaptation characteristics. [sent-6, score-0.337]

4 Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. [sent-7, score-0.758]

5 Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. [sent-8, score-0.839]

6 We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon. [sent-9, score-0.927]

7 For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. [sent-11, score-0.456]

8 Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and post-synaptic signals can be a key part of many silicon neural-learning architectures. [sent-14, score-0.955]

9 35µm CMOS process, that learns a synaptic weight that corresponds to the conditional probability or correlation between binary input and feedback signals. [sent-16, score-0.35]

10 This circuit utilizes floating-gate transistors to provide both nonvolatile storage and weight adaptation mechanisms [3]. [sent-17, score-0.515]

11 In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation. [sent-18, score-0.316]

12 Our circuit improves upon previous implementations of floating-gate based learning synapses [3,4,5] in several ways. [sent-19, score-0.347]

13 First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. [sent-20, score-1.186]

14 We demon- strate that our synapse can learn either the conditional probability or the correlation between input and feedback signals. [sent-21, score-0.734]

15 Second, unlike the general correlational learning synapse proposed by Hasler et. [sent-23, score-0.603]

16 [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity that occur at different times. [sent-25, score-0.881]

17 Hasler’s correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals that occur at different times. [sent-27, score-0.808]

18 Because signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. [sent-28, score-1.024]

19 Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. [sent-29, score-0.376]

20 We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. [sent-33, score-0.421]

21 To our knowledge, this circuit is the first instance of a floating-gate learning circuit that includes this feature. [sent-34, score-0.347]

22 Finally, we discuss how this synapse can be used for silicon implementations of various learning networks. [sent-39, score-0.679]

23 2 Floating-gate transistors Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. [sent-40, score-0.313]

24 Charge stored on the floating gate implements a nonvolatile analog weight; the transistor’s output current varies with both the floating-gate voltage and the control-gate voltage. [sent-46, score-0.442]

25 We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. [sent-47, score-0.546]

26 We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig. [sent-48, score-0.528]

27 We inject by imposing more than about 3V across the drain and source of transistor M3. [sent-50, score-0.374]

28 The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interfere with circuit operation. [sent-51, score-0.772]

29 Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]: I tun = I tun 0 exp (Vtun − V fg ) / Vχ (1) where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. [sent-52, score-0.997]

30 3 T h e s i l i co n s y n a p s e We show our silicon synapse in Fig. [sent-54, score-0.654]

31 The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(XcorY). [sent-56, score-0.802]

32 The drain current of floating-gate transistor M4 represents the weight value W. [sent-62, score-0.461]

33 We can switch the drain current on or off using transistor M7; this switching action corresponds to a multiplication of the weight value W by a binary input signal, Xin. [sent-64, score-0.461]

34 We choose values for the drain voltage of the M4 to prevent injection. [sent-65, score-0.396]

35 A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. [sent-66, score-0.628]

36 These equations differ slightly from standard equations for the source current through a transistor due to source degeneration caused by M 4. [sent-77, score-0.32]

37 1 Weight adaptation Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. [sent-80, score-0.591]

38 We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead. [sent-84, score-0.453]

39 We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, Vb Vtun M1 W eq (nA) 80 M2 60 40 C1 Xcor M4 M3 W M5 Xin Y o chip data − fit: P(X|Y)0. [sent-85, score-0.735]

40 (b) Plot of equilibrium weight in the subthreshold regime versus the conditional probability P(X|Y), showing both experimental chip data and a fit from Eq. [sent-94, score-0.504]

41 Plot of equilibrium weight versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq. [sent-96, score-0.482]

42 Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. [sent-106, score-0.466]

43 Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). [sent-107, score-0.326]

44 2 , the amount of the injection is also dependent on M3’s source voltage Vs. [sent-109, score-0.531]

45 2 results in a floating-gate voltage update of: (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp(κ Vfg / Vγ ) (4) where Iinj0 also includes the constant source current. [sent-112, score-0.482]

46 4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y). [sent-114, score-0.537]

47 The origin of the tunneling signal determines whether the synapse is learning a conditional probability or a correlation. [sent-116, score-0.984]

48 If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. [sent-117, score-0.598]

49 Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y, and the floating-gate voltage. [sent-118, score-0.55]

50 5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y. [sent-120, score-0.581]

51 2 Weight equilibrium To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). [sent-122, score-1.257]

52 The equilibrium weight of the synapse is the weight value where the expected weight change over time equals zero. [sent-123, score-1.057]

53 This weight value corresponds to the floating-gate voltage where injection and tunneling currents are equal. [sent-124, score-0.904]

54 4 and 5 and solve: eq V fg = I inj 0 −1 log Pr( X | Y ) + log I tun 0 (κ / Vy + 1/ Vx ) (6) To derive the equilibrium weight, we substitute Eq. [sent-126, score-0.565]

55 3 and solve: I0 Weq = I inj 0 I tun 0 β V0 + η log where α = α Pr( X | Y ) I inj 0 I tun 0 below threshold 2 + log ( Pr( X | Y ) ) above threshold (7) κ2 κ2 and η = . [sent-128, score-0.52]

56 (1 + κ )U t (κ / Vγ + 1/ Vχ ) (1 + κ )(κ / Vγ + 1/ Vχ ) Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. [sent-129, score-0.489]

57 Note that the equilibrium weight is stable because of negative feedback in the tunneling and injection processes. [sent-130, score-0.875]

58 1(b) and (c) show the equilibrium weight versus the conditional P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq. [sent-134, score-0.359]

59 Note that both the sub- and above-threshold relationship between P(X|Y) and the equilibrium weight enables us to compute the probability of a vector of synaptic inputs X given a post-synaptic response Y. [sent-136, score-0.423]

60 In both cases, we can apply the outputs currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, resulting in a voltage that is a linear function of log P(X|Y). [sent-137, score-0.734]

61 3 Calibration circuitry Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. [sent-139, score-0.813]

62 35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1. [sent-141, score-0.546]

63 The effect of this mismatch on our synapses causes the weight equilibrium of different synapses to differ by a multiplicative gain. [sent-143, score-0.686]

64 2 (b) shows the equilibrium weights of an array of six synapses exposed to identical input signals. [sent-145, score-0.423]

65 The variation of the synaptic weights is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms. [sent-146, score-0.334]

66 We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. [sent-147, score-0.733]

67 Because the dependence of the equilibrium weight on these constants is determined by the ratio of Iinj0/Itun0, our calibration process changes Iinj to equalize the ratio of injection to tunneling across all synapses. [sent-148, score-0.997]

68 We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. [sent-149, score-0.513]

69 Our calibration procedure is a self-convergent memory write [11], that causes the equilibrium weight of every synapse to equal the current Ical. [sent-150, score-1.041]

70 (a) Schematic of calibrated synapse with signals used during the calibration procedure. [sent-160, score-0.793]

71 2 (c) ing cycles, where, during each cycle, we first increase the equilibrium weight of the synapse, and second, we let the synapse adapt to the new equilibrium weight. [sent-165, score-1.048]

72 We create the calibrated synapse by modifying our original synapse according to Fig. [sent-166, score-1.151]

73 Transistor M8 modifies M1’s gate charge by means of injection when M9’s gate is low and Vcal is low. [sent-170, score-0.464]

74 M9’s gate is only low when the equilibrium weight W is less than Ical. [sent-171, score-0.378]

75 During calibration, injection and tunneling on M3 are continuously active. [sent-172, score-0.546]

76 When Vcal is high, the synapse adapts towards its equilibrium weight. [sent-174, score-0.751]

77 We repeat this process until the equilibrium weight W matches Ical, causing M9’s gate voltage to rise, disabling Vcal and with it injection. [sent-176, score-0.65]

78 To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. [sent-177, score-1.224]

79 2(c) shows the equilibrium weights of six synapses after calibration. [sent-179, score-0.38]

80 The data show that calibration can reduce the effect of mismatched adaptation on the synapse’s learned weight to a small fraction of the weight itself. [sent-180, score-0.464]

81 Because M1 is a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence between M1’s drain voltage and source current. [sent-181, score-0.471]

82 Consequently, M3’s floatinggate voltage now affects its source current (through M1’s drain voltage), and we can model M3 as a source-degenerated pFET [3]. [sent-182, score-0.475]

83 The new expression for the injection current in M3 is: Presynaptic neuron W+ Synapse W− X Y Injection Postsynaptic neuron Injection Activation window Fig. [sent-183, score-0.35]

84 The new expression for injection slightly changes the α and η terms of the weight equilibrium in Eq. [sent-187, score-0.548]

85 7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same. [sent-188, score-0.41]

86 4 Implementing silicon synaptic learning rules In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit. [sent-189, score-0.955]

87 Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic Y is analogous to homosynaptic LTD. [sent-191, score-0.529]

88 Our synapse is also capable of emulating a Boltzmann weight-update rule [2]. [sent-193, score-0.556]

89 With weight decay, a Boltzmann synapse learns the difference between correlations in the clamped and unclamped phase. [sent-195, score-0.804]

90 We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. [sent-196, score-0.682]

91 To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. [sent-197, score-1.659]

92 We do this by sending Xin to Xcor of one synapse in the clamped phase, and sending Xin to Xcor of the other synapse in the negative phase. [sent-198, score-1.197]

93 In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. [sent-201, score-0.807]

94 We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference in the weights of the two circuit. [sent-202, score-0.891]

95 Therefore, the combined synapse receives two presynaptic signals and two postsynaptic signals. [sent-206, score-0.815]

96 The relative timing of a postsynaptic response, Y, with the presynaptic input, X, determines whether the synapse undergoes LTP or LTD. [sent-207, score-0.745]

97 Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning. [sent-210, score-0.43]

98 5 Conclusion We have described a silicon synapse that implements a wide range of spike-based learning rules, and that does not suffer from device mismatch. [sent-211, score-0.703]

99 In addition, although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals. [sent-213, score-0.609]

100 Diorio, “A p-channel MOS synapse transistor with self-convergent memory writes,” IEEE Journal of Solid-State Circuits vol. [sent-286, score-0.72]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('synapse', 0.556), ('tunneling', 0.295), ('injection', 0.251), ('voltage', 0.233), ('equilibrium', 0.195), ('transistor', 0.164), ('drain', 0.163), ('circuit', 0.161), ('synapses', 0.161), ('vtun', 0.148), ('inj', 0.133), ('xcor', 0.133), ('adaptation', 0.132), ('calibration', 0.128), ('vfg', 0.118), ('postsynaptic', 0.117), ('tun', 0.103), ('weight', 0.102), ('silicon', 0.098), ('diorio', 0.089), ('vcal', 0.089), ('gate', 0.081), ('implement', 0.08), ('transistors', 0.076), ('synaptic', 0.075), ('circuits', 0.072), ('presynaptic', 0.072), ('xin', 0.07), ('signals', 0.07), ('fg', 0.069), ('eq', 0.065), ('chip', 0.065), ('conditional', 0.062), ('hasler', 0.059), ('pr', 0.056), ('correlate', 0.055), ('vd', 0.055), ('charge', 0.051), ('rules', 0.048), ('source', 0.047), ('calibrate', 0.044), ('nonvolatile', 0.044), ('boltzmann', 0.043), ('array', 0.043), ('correlations', 0.041), ('voltages', 0.041), ('clamped', 0.041), ('causing', 0.039), ('mismatch', 0.039), ('ical', 0.039), ('aaron', 0.039), ('calibrated', 0.039), ('ltp', 0.039), ('unclamped', 0.039), ('fit', 0.036), ('na', 0.035), ('hebbian', 0.033), ('temporally', 0.033), ('mos', 0.033), ('feedback', 0.032), ('current', 0.032), ('correlation', 0.032), ('event', 0.031), ('learn', 0.03), ('degeneration', 0.03), ('ihei', 0.03), ('minch', 0.03), ('pfet', 0.03), ('shon', 0.03), ('relationship', 0.029), ('consequently', 0.029), ('dv', 0.029), ('asymmetric', 0.029), ('causes', 0.028), ('analog', 0.028), ('devices', 0.028), ('vs', 0.027), ('equalize', 0.026), ('iinj', 0.026), ('learning', 0.025), ('implementing', 0.025), ('fixed', 0.025), ('learns', 0.025), ('signal', 0.024), ('implements', 0.024), ('threshold', 0.024), ('weights', 0.024), ('solely', 0.023), ('ut', 0.023), ('currents', 0.023), ('thermal', 0.023), ('altering', 0.023), ('window', 0.023), ('simultaneous', 0.023), ('exp', 0.023), ('probability', 0.022), ('neuron', 0.022), ('subthreshold', 0.022), ('correlational', 0.022), ('sending', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 112 nips-2001-Learning Spike-Based Correlations and Conditional Probabilities in Silicon

Author: Aaron P. Shon, David Hsu, Chris Diorio

Abstract: We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon. 1 I n tro d u cti o n Computation with conditional probabilities and correlations underlies many models of neurally inspired information processing. For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. Boltzmann machine synapses learn the difference between the correlations of pairs of neurons in the sleep and wake phase [2]. In most neural models, computation and adaptation occurs at the synaptic level. Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and post-synaptic signals can be a key part of many silicon neural-learning architectures. We have designed and implemented a silicon synapse, in a 0.35µm CMOS process, that learns a synaptic weight that corresponds to the conditional probability or correlation between binary input and feedback signals. This circuit utilizes floating-gate transistors to provide both nonvolatile storage and weight adaptation mechanisms [3]. In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation. Our circuit improves upon previous implementations of floating-gate based learning synapses [3,4,5] in several ways. First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. We demon- strate that our synapse can learn either the conditional probability or the correlation between input and feedback signals. Consequently, we can implement a wide range of synaptic learning networks with our circuit. Second, unlike the general correlational learning synapse proposed by Hasler et. al. [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity that occur at different times. Learning algorithms that employ time-separated correlations include both temporal difference learning [6] and recently postulated temporally asymmetric Hebbian learning [7]. Hasler’s correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals that occur at different times. Because signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. Mismatch between the same adaptation mechanisms on different floating-gate transistors limits the accuracy of learning rules based on these devices. 
This problem has been noted in previous circuits that use floating-gate adaptation [4,8]. In our circuit, different synapses can learn widely divergent weights from the same inputs because of component mismatch. We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. To our knowledge, this circuit is the first instance of a floating-gate learning circuit that includes this feature. This paper is organized as follows. First, we provide a brief introduction to floating-gate transistors. Next, we provide a description and analysis of our synapse, demonstrating that it can learn the conditional probability or correlation between a pair of binary signals. We then describe the calibration circuitry and show its effectiveness in compensating for adaptation mismatches. Finally, we discuss how this synapse can be used for silicon implementations of various learning networks. 2 Floating-gate transistors Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. A floating-gate transistor (e.g. transistor M3 of Fig.1(a)) comprises a MOSFET whose gate is isolated on all sides by SiO2. A control gate capacitively couples signals to the floating gate. Charge stored on the floating gate implements a nonvolatile analog weight; the transistor’s output current varies with both the floating-gate voltage and the control-gate voltage. We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig.1(a). We inject by imposing more than about 3V across the drain and source of transistor M3. The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interfere with circuit operation. Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]: I tun = I tun 0 exp (Vtun − V fg ) / Vχ (1) where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. Over a wide range of transistor drain and source voltages, we can approximate the magnitude of the injection current Iinj as [4]: 1−U t / Vγ I inj = I inj 0 I s exp ( (Vs − Vd ) / Vγ ) (2) where Vs and Vd are the drain and source voltages, Iinj0 is a pre-exponential current, Vγ is a constant that depends on the VLSI process, and Ut is the thermal voltage kT/q. 3 T h e s i l i co n s y n a p s e We show our silicon synapse in Fig.1. The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(XcorY). Xin is analogous to a presynaptic input, while Y is analogous to a postsynaptic signal or error feedback. Xcor is a presynaptic adaptation signal, and typically has some relationship with Xin. We can implement different learning rules by altering the relationship between Xcor and Xin. For some examples, see section 4. We now describe the circuit in more detail. The drain current of floating-gate transistor M4 represents the weight value W. Because the control gate of M4 is fixed, W depends solely on the charge on floating-gate capacitor C1. We can switch the drain current on or off using transistor M7; this switching action corresponds to a multiplication of the weight value W by a binary input signal, Xin. 
We choose values for the drain voltage of the M4 to prevent injection. A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. Simultaneously high input signals Xcor and Y cause injection, increasing the weight. A high Vtun causes tunneling, decreasing the weight. We either choose to correlate a high Vtun with signal Y or provide a fixed high Vtun throughout the adaptation process. The choice determines whether the circuit learns a conditional probability or a correlation, respectively. Because the drain current sourced by M4 provides is the weight W, we can express W in terms of M4’s floating-gate voltage, Vfg. Vfg includes the effects of both the fixed controlgate voltage and the variable floating-gate charge. The expression differs depending on whether the readout transistor is operating in the subthreshold or above-threshold regime. We provide both expressions below: I 0 exp( − κ 2V fg /(1 + κ )U t ) W= κ V fg (1 + κ ) 2 β V0 − below threshold 2 (3) above threshold Here V0 is a constant that depends on the threshold voltage and on Vdd, Ut is the thermal voltage kT/q, κ is the floating-gate-to-channel coupling coefficient, and I 0 is a fixed bias current. Eq. 3 shows that W depends solely on Vfg, (all the other factors are constants). These equations differ slightly from standard equations for the source current through a transistor due to source degeneration caused by M 4. This degeneration smoothes the nonlinear relationship between Vfg and Is; its addition to the circuit is optional. 3.1 Weight adaptation Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. In this section, we show that these mechanisms enable our circuit to learn the correlation or conditional probability between inputs Xcor (which we will refer to as X) and Y. Our analysis assumes that these statistics are fixed over some period during which adaptation occurs. The change in floating-gate voltage, and hence the weight, discussed below should therefore be interpreted in terms of the expected weight change due to the statistics of the inputs. We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead. We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, Vb Vtun M1 W eq (nA) 80 M2 60 40 C1 Xcor M4 M3 W M5 Xin Y o chip data − fit: P(X|Y)0.78 20 M6 0 M7 synaptic output 0.2 0.4 0.6 Pr(X|Y) 1 0.8 (b) 3.5 Fig. 1. (a) Synapse schematic. (b) Plot of equilibrium weight in the subthreshold regime versus the conditional probability P(X|Y), showing both experimental chip data and a fit from Eq.7 (c). Plot of equilibrium weight versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq.7. W eq (µA) (a). 3 2.5 2 0 o chip data − fit 0.2 0.4 0.6 Pr(X|Y) 0.8 1 (c) and Vd of M3. We assume that transistor M1 is in saturation, constraining Is at M3 to be constant. Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). By Eq.2 , the amount of the injection is also dependent on M3’s source voltage Vs. 
Because M3 is constrained to a fixed channel current, a drop in the floating-gate voltage, ∆Vfg, causes a drop in Vs of magnitude κ∆Vfg. Substituting these expressions into Eq.2 results in a floating-gate voltage update of: (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp(κ Vfg / Vγ ) (4) where Iinj0 also includes the constant source current. Eq.4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y). Next we analyze the effects of tunneling on the floating-gate voltage. The origin of the tunneling signal determines whether the synapse is learning a conditional probability or a correlation. If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y, and the floating-gate voltage. (dV fg / dt )tun = I tun 0 Pr(Y ) exp(−V fg / Vχ ) (5) Eq.5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y. 3.2 Weight equilibrium To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). The equilibrium weight of the synapse is the weight value where the expected weight change over time equals zero. This weight value corresponds to the floating-gate voltage where injection and tunneling currents are equal. To find this voltage, we equate Eq’s. 4 and 5 and solve: eq V fg = I inj 0 −1 log Pr( X | Y ) + log I tun 0 (κ / Vy + 1/ Vx ) (6) To derive the equilibrium weight, we substitute Eq.6 into Eq.3 and solve: I0 Weq = I inj 0 I tun 0 β V0 + η log where α = α Pr( X | Y ) I inj 0 I tun 0 below threshold 2 + log ( Pr( X | Y ) ) above threshold (7) κ2 κ2 and η = . (1 + κ )U t (κ / Vγ + 1/ Vχ ) (1 + κ )(κ / Vγ + 1/ Vχ ) Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. Note that the equilibrium weight is stable because of negative feedback in the tunneling and injection processes. Therefore, the weight will always converge to the equilibrium value shown in Eq.7. Figs. 1(b) and (c) show the equilibrium weight versus the conditional P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq.7. Note that both the sub- and above-threshold relationship between P(X|Y) and the equilibrium weight enables us to compute the probability of a vector of synaptic inputs X given a post-synaptic response Y. In both cases, we can apply the outputs currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, resulting in a voltage that is a linear function of log P(X|Y). 3.3 Calibration circuitry Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. The effect of this mismatch on our synapses causes the weight equilibrium of different synapses to differ by a multiplicative gain. Fig.2 (b) shows the equilibrium weights of an array of six synapses exposed to identical input signals. 
The variation of the synaptic weights is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms. We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. Because the dependence of the equilibrium weight on these constants is determined by the ratio of Iinj0/Itun0, our calibration process changes Iinj to equalize the ratio of injection to tunneling across all synapses. We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. Our calibration procedure is a self-convergent memory write [11], that causes the equilibrium weight of every synapse to equal the current Ical. Calibration requires many operat- 80 Verase M1 M8 60 W eq (nA) Vb M2 Vtun 40 M3 M4 M9 V cal 20 M5 0 M7 M6 synaptic output 0.2 Ical 0.6 P(X|Y) 0.8 1 0.4 0.6 P(X|Y) 0.8 1 0.4 (b) 80 (a) Fig. 2. (a) Schematic of calibrated synapse with signals used during the calibration procedure. (b) Equilibrium weights for array of synapses shown in Fig.1a. (c) Equilibrium weights for array of calibrated synapses after calibration. W eq (nA) 60 40 20 0 0.2 (c) ing cycles, where, during each cycle, we first increase the equilibrium weight of the synapse, and second, we let the synapse adapt to the new equilibrium weight. We create the calibrated synapse by modifying our original synapse according to Fig. 2(a). We convert M1 into a floating-gate transistor, whose floating-gate charge thereby sets M3’s channel current, providing control of Iinj0 of Eq.7. Transistor M8 modifies M1’s gate charge by means of injection when M9’s gate is low and Vcal is low. M9’s gate is only low when the equilibrium weight W is less than Ical. During calibration, injection and tunneling on M3 are continuously active. We apply a pulse train to Vcal; during each pulse period, Vcal is predominately high. When Vcal is high, the synapse adapts towards its equilibrium weight. When Vcal pulses low, M8 injects, increasing the synapse’s equilibrium weight W. We repeat this process until the equilibrium weight W matches Ical, causing M9’s gate voltage to rise, disabling Vcal and with it injection. To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. Fig.2(c) shows the equilibrium weights of six synapses after calibration. The data show that calibration can reduce the effect of mismatched adaptation on the synapse’s learned weight to a small fraction of the weight itself. Because M1 is a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence between M1’s drain voltage and source current. Consequently, M3’s floatinggate voltage now affects its source current (through M1’s drain voltage), and we can model M3 as a source-degenerated pFET [3]. The new expression for the injection current in M3 is: Presynaptic neuron W+ Synapse W− X Y Injection Postsynaptic neuron Injection Activation window Fig. 3. A method for achieving spike-time dependent plasticity in silicon. (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp Vfg κ Vγ − κ k1 Ut (8) where k1 is close to zero. The new expression for injection slightly changes the α and η terms of the weight equilibrium in Eq.7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same. 
4 Implementing silicon synaptic learning rules

In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit.

We can use our circuit to implement a Hebbian learning rule. Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic Y is analogous to homosynaptic LTD. In our synapse, we tie Xin and Xcor together and correlate Vtun with Y.

Our synapse is also capable of emulating a Boltzmann weight-update rule [2]. This weight-update rule derives from the difference between the correlations among neurons when the network receives external input and when the network runs freely (the clamped and unclamped phases, respectively). With weight decay, a Boltzmann synapse learns the difference between the correlations in the clamped and unclamped phases. We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. We do this by sending Xin to Xcor of one synapse in the clamped phase, and sending Xin to Xcor of the other synapse in the unclamped phase. Vtun remains constant throughout adaptation.

Finally, we consider implementing a temporally asymmetric Hebbian learning rule [7] using our synapse. In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference between the weights of the two circuits. We show the circuit in Fig. 3. Each neuron sends two signals: a neuronal output, and an adaptation time window that is active for some time afterwards. Therefore, the combined synapse receives two presynaptic signals and two postsynaptic signals. The relative timing of the postsynaptic response, Y, with respect to the presynaptic input, X, determines whether the synapse undergoes LTP or LTD. If Y occurs before X, Y's time window correlates with X, causing injection on the negative synapse and decreasing the weight. If Y occurs after X, Y correlates with X's time window, causing injection on the positive synapse and increasing the weight. Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning.

5 Conclusion

We have described a silicon synapse that implements a wide range of spike-based learning rules and that, after calibration, does not suffer from device mismatch. We have also described how we can implement various silicon learning networks using this synapse. In addition, although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals. One possible avenue for future work is to analyze the implications of different pulse-coded schemes for the circuit's adaptive behavior.

Acknowledgements

This work was supported by the National Science Foundation and by the Office of Naval Research. Aaron Shon was also supported by an NDSEG fellowship. We thank Anhai Doan and the anonymous reviewers for helpful comments.

References

[1] W. B. Levy, "A computational approach to hippocampal function," in R. D. Hawkins and G. H.
Bower (eds.), Computational Models of Learning in Simple Neural Systems, The Psychology of Learning and Motivation, vol. 23, pp. 243-305, San Diego, CA: Academic Press, 1989.
[2] D. H. Ackley, G. Hinton, and T. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science, vol. 9, pp. 147-169, 1985.
[3] P. Hasler, B. A. Minch, J. Dugger, and C. Diorio, "Adaptive circuits and synapses using pFET floating-gate devices," in G. Cauwenberghs and M. Bayoumi (eds.), Learning in Silicon, pp. 33-65, Kluwer Academic, 1999.
[4] P. Hafliger, A spike-based learning rule and its implementation in analog hardware, Ph.D. thesis, ETH Zurich, 1999.
[5] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, "A floating-gate MOS learning array with locally computed weight updates," IEEE Transactions on Electron Devices, vol. 44(12), pp. 2281-2289, 1997.
[6] R. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[7] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science, vol. 275, pp. 213-215, 1997.
[8] A. Pesavento, T. Horiuchi, C. Diorio, and C. Koch, "Adaptation of current signals with floating-gate circuits," in Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems (Microneuro99), pp. 128-134, 1999.
[9] M. Lenzlinger and E. H. Snow, "Fowler-Nordheim tunneling into thermally grown SiO2," Journal of Applied Physics, vol. 40(1), pp. 278-283, 1969.
[10] E. Takeda, C. Yang, and A. Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995.
[11] C. Diorio, "A p-channel MOS synapse transistor with self-convergent memory writes," IEEE Journal of Solid-State Circuits, vol. 36(5), pp. 816-822, 2001.

2 0.28205851 49 nips-2001-Citcuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning

Author: A. Bofill, D. P. Thompson, Alan F. Murray

Abstract: Experimental data has shown that synaptic strength modification in some types of biological neurons depends upon precise spike timing differences between presynaptic and postsynaptic spikes. Several temporally-asymmetric Hebbian learning rules motivated by this data have been proposed. We argue that such learning rules are suitable for analog VLSI implementation. We describe an easily tunable circuit to modify the weight of a silicon spiking neuron according to those learning rules. Test results from the fabrication of the circuit using a 0.6µm CMOS process are given.

3 0.16713753 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons

Author: Shih-Chii Liu, Jörg Kramer, Giacomo Indiveri, Tobi Delbrück, Rodney J. Douglas

Abstract: We describe a programmable multi-chip VLSI neuronal system that can be used for exploring spike-based information processing models. The system consists of a silicon retina, a PIC microcontroller, and a transceiver chip whose integrate-and-fire neurons are connected in a soft winner-take-all architecture. The circuit on this multi-neuron chip approximates a cortical microcircuit. The neurons can be configured for different computational properties by the virtual connections of a selected set of pixels on the silicon retina. The virtual wiring between the different chips is effected by an event-driven communication protocol that uses asynchronous digital pulses, similar to spikes in a neuronal system. We used the multi-chip spike-based system to synthesize orientation-tuned neurons using both a feedforward model and a feedback model. The performance of our analog hardware spiking model matched the experimental observations and digital simulations of continuous-valued neurons. The multi-chip VLSI system has advantages over computer neuronal models in that it is real-time, and the computational time does not scale with the size of the neuronal network.

4 0.13397725 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules

Author: Jesper Tegnér, Ádám Kepecs

Abstract: Hebbian learning rules are generally formulated as static rules. Under changing conditions (e.g. neuromodulation, input statistics) most rules are sensitive to parameters. In particular, recent work has focused on two different formulations of spike-timing-dependent plasticity rules. Additive STDP [1] is remarkably versatile but also very fragile, whereas multiplicative STDP [2, 3] is more robust but lacks attractive features such as synaptic competition and rate stabilization. Here we address the problem of robustness in the additive STDP rule. We derive an adaptive control scheme, where the learning function is under fast dynamic control by postsynaptic activity to stabilize learning under a variety of conditions. Such a control scheme can be implemented using known biophysical mechanisms of synapses. We show that this adaptive rule makes the additive STDP more robust. Finally, we give an example of how metaplasticity of the adaptive rule can be used to guide STDP into different types of learning regimes.

5 0.12803657 33 nips-2001-An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures

Author: Takashi Morie, Tomohiro Matsuura, Makoto Nagata, Atsushi Iwata

Abstract: This paper describes a clustering algorithm for vector quantizers using a “stochastic association model”. It offers a new, simple, and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve adaptation as efficient as the “neural gas” algorithm, which is reported as one of the most efficient clustering methods. The key is to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure whose operation is described by a single-electron circuit. It makes positive use of fluctuation in quantum mechanical tunneling processes.

6 0.11556991 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines

7 0.11090186 34 nips-2001-Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology

8 0.067673348 166 nips-2001-Self-regulation Mechanism of Temporally Asymmetric Hebbian Plasticity

9 0.066656269 2 nips-2001-3 state neurons for contextual processing

10 0.058895282 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

11 0.055456661 37 nips-2001-Associative memory in realistic neuronal networks

12 0.045653649 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation

13 0.041677527 23 nips-2001-A theory of neural integration in the head-direction system

14 0.039973684 146 nips-2001-Playing is believing: The role of beliefs in multi-agent learning

15 0.036765736 119 nips-2001-Means, Correlations and Bounds

16 0.0339633 72 nips-2001-Exact differential equation population dynamics for integrate-and-fire neurons

17 0.031591378 57 nips-2001-Correlation Codes in Neuronal Populations

18 0.029127261 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

19 0.028616773 82 nips-2001-Generating velocity tuning by asymmetric recurrent connections

20 0.028237384 44 nips-2001-Blind Source Separation via Multinode Sparse Representation


same-paper 1 0.84963733 112 nips-2001-Learning Spike-Based Correlations and Conditional Probabilities in Silicon

Author: Aaron P. Shon, David Hsu, Chris Diorio

Abstract: We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon. 1 I n tro d u cti o n Computation with conditional probabilities and correlations underlies many models of neurally inspired information processing. For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. Boltzmann machine synapses learn the difference between the correlations of pairs of neurons in the sleep and wake phase [2]. In most neural models, computation and adaptation occurs at the synaptic level. Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and post-synaptic signals can be a key part of many silicon neural-learning architectures. We have designed and implemented a silicon synapse, in a 0.35µm CMOS process, that learns a synaptic weight that corresponds to the conditional probability or correlation between binary input and feedback signals. This circuit utilizes floating-gate transistors to provide both nonvolatile storage and weight adaptation mechanisms [3]. In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation. Our circuit improves upon previous implementations of floating-gate based learning synapses [3,4,5] in several ways. First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. We demon- strate that our synapse can learn either the conditional probability or the correlation between input and feedback signals. Consequently, we can implement a wide range of synaptic learning networks with our circuit. Second, unlike the general correlational learning synapse proposed by Hasler et. al. [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity that occur at different times. Learning algorithms that employ time-separated correlations include both temporal difference learning [6] and recently postulated temporally asymmetric Hebbian learning [7]. Hasler’s correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals that occur at different times. Because signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. Mismatch between the same adaptation mechanisms on different floating-gate transistors limits the accuracy of learning rules based on these devices. 
This problem has been noted in previous circuits that use floating-gate adaptation [4,8]. In our circuit, different synapses can learn widely divergent weights from the same inputs because of component mismatch. We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. To our knowledge, this circuit is the first instance of a floating-gate learning circuit that includes this feature. This paper is organized as follows. First, we provide a brief introduction to floating-gate transistors. Next, we provide a description and analysis of our synapse, demonstrating that it can learn the conditional probability or correlation between a pair of binary signals. We then describe the calibration circuitry and show its effectiveness in compensating for adaptation mismatches. Finally, we discuss how this synapse can be used for silicon implementations of various learning networks. 2 Floating-gate transistors Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. A floating-gate transistor (e.g. transistor M3 of Fig.1(a)) comprises a MOSFET whose gate is isolated on all sides by SiO2. A control gate capacitively couples signals to the floating gate. Charge stored on the floating gate implements a nonvolatile analog weight; the transistor’s output current varies with both the floating-gate voltage and the control-gate voltage. We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig.1(a). We inject by imposing more than about 3V across the drain and source of transistor M3. The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interfere with circuit operation. Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]: I tun = I tun 0 exp (Vtun − V fg ) / Vχ (1) where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. Over a wide range of transistor drain and source voltages, we can approximate the magnitude of the injection current Iinj as [4]: 1−U t / Vγ I inj = I inj 0 I s exp ( (Vs − Vd ) / Vγ ) (2) where Vs and Vd are the drain and source voltages, Iinj0 is a pre-exponential current, Vγ is a constant that depends on the VLSI process, and Ut is the thermal voltage kT/q. 3 T h e s i l i co n s y n a p s e We show our silicon synapse in Fig.1. The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(XcorY). Xin is analogous to a presynaptic input, while Y is analogous to a postsynaptic signal or error feedback. Xcor is a presynaptic adaptation signal, and typically has some relationship with Xin. We can implement different learning rules by altering the relationship between Xcor and Xin. For some examples, see section 4. We now describe the circuit in more detail. The drain current of floating-gate transistor M4 represents the weight value W. Because the control gate of M4 is fixed, W depends solely on the charge on floating-gate capacitor C1. We can switch the drain current on or off using transistor M7; this switching action corresponds to a multiplication of the weight value W by a binary input signal, Xin. 
We choose values for the drain voltage of the M4 to prevent injection. A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. Simultaneously high input signals Xcor and Y cause injection, increasing the weight. A high Vtun causes tunneling, decreasing the weight. We either choose to correlate a high Vtun with signal Y or provide a fixed high Vtun throughout the adaptation process. The choice determines whether the circuit learns a conditional probability or a correlation, respectively. Because the drain current sourced by M4 provides is the weight W, we can express W in terms of M4’s floating-gate voltage, Vfg. Vfg includes the effects of both the fixed controlgate voltage and the variable floating-gate charge. The expression differs depending on whether the readout transistor is operating in the subthreshold or above-threshold regime. We provide both expressions below: I 0 exp( − κ 2V fg /(1 + κ )U t ) W= κ V fg (1 + κ ) 2 β V0 − below threshold 2 (3) above threshold Here V0 is a constant that depends on the threshold voltage and on Vdd, Ut is the thermal voltage kT/q, κ is the floating-gate-to-channel coupling coefficient, and I 0 is a fixed bias current. Eq. 3 shows that W depends solely on Vfg, (all the other factors are constants). These equations differ slightly from standard equations for the source current through a transistor due to source degeneration caused by M 4. This degeneration smoothes the nonlinear relationship between Vfg and Is; its addition to the circuit is optional. 3.1 Weight adaptation Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. In this section, we show that these mechanisms enable our circuit to learn the correlation or conditional probability between inputs Xcor (which we will refer to as X) and Y. Our analysis assumes that these statistics are fixed over some period during which adaptation occurs. The change in floating-gate voltage, and hence the weight, discussed below should therefore be interpreted in terms of the expected weight change due to the statistics of the inputs. We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead. We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, Vb Vtun M1 W eq (nA) 80 M2 60 40 C1 Xcor M4 M3 W M5 Xin Y o chip data − fit: P(X|Y)0.78 20 M6 0 M7 synaptic output 0.2 0.4 0.6 Pr(X|Y) 1 0.8 (b) 3.5 Fig. 1. (a) Synapse schematic. (b) Plot of equilibrium weight in the subthreshold regime versus the conditional probability P(X|Y), showing both experimental chip data and a fit from Eq.7 (c). Plot of equilibrium weight versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq.7. W eq (µA) (a). 3 2.5 2 0 o chip data − fit 0.2 0.4 0.6 Pr(X|Y) 0.8 1 (c) and Vd of M3. We assume that transistor M1 is in saturation, constraining Is at M3 to be constant. Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). By Eq.2 , the amount of the injection is also dependent on M3’s source voltage Vs. 
Because M3 is constrained to a fixed channel current, a drop in the floating-gate voltage, ∆Vfg, causes a drop in Vs of magnitude κ∆Vfg. Substituting these expressions into Eq.2 results in a floating-gate voltage update of: (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp(κ Vfg / Vγ ) (4) where Iinj0 also includes the constant source current. Eq.4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y). Next we analyze the effects of tunneling on the floating-gate voltage. The origin of the tunneling signal determines whether the synapse is learning a conditional probability or a correlation. If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y, and the floating-gate voltage. (dV fg / dt )tun = I tun 0 Pr(Y ) exp(−V fg / Vχ ) (5) Eq.5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y. 3.2 Weight equilibrium To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). The equilibrium weight of the synapse is the weight value where the expected weight change over time equals zero. This weight value corresponds to the floating-gate voltage where injection and tunneling currents are equal. To find this voltage, we equate Eq’s. 4 and 5 and solve: eq V fg = I inj 0 −1 log Pr( X | Y ) + log I tun 0 (κ / Vy + 1/ Vx ) (6) To derive the equilibrium weight, we substitute Eq.6 into Eq.3 and solve: I0 Weq = I inj 0 I tun 0 β V0 + η log where α = α Pr( X | Y ) I inj 0 I tun 0 below threshold 2 + log ( Pr( X | Y ) ) above threshold (7) κ2 κ2 and η = . (1 + κ )U t (κ / Vγ + 1/ Vχ ) (1 + κ )(κ / Vγ + 1/ Vχ ) Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. Note that the equilibrium weight is stable because of negative feedback in the tunneling and injection processes. Therefore, the weight will always converge to the equilibrium value shown in Eq.7. Figs. 1(b) and (c) show the equilibrium weight versus the conditional P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq.7. Note that both the sub- and above-threshold relationship between P(X|Y) and the equilibrium weight enables us to compute the probability of a vector of synaptic inputs X given a post-synaptic response Y. In both cases, we can apply the outputs currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, resulting in a voltage that is a linear function of log P(X|Y). 3.3 Calibration circuitry Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. The effect of this mismatch on our synapses causes the weight equilibrium of different synapses to differ by a multiplicative gain. Fig.2 (b) shows the equilibrium weights of an array of six synapses exposed to identical input signals. 
3.3 Calibration circuitry

Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. This mismatch causes the weight equilibria of different synapses to differ by a multiplicative gain. Fig. 2(b) shows the equilibrium weights of an array of six synapses exposed to identical input signals: the variation across the synaptic weights is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms.

We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. Because the equilibrium weight depends on these constants only through the ratio Iinj0/Itun0, our calibration process changes Iinj0 to equalize the ratio of injection to tunneling across all synapses. We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. Our calibration procedure is a self-convergent memory write [11] that forces the equilibrium weight of every synapse to equal the current Ical. Calibration requires many operating cycles; during each cycle, we first increase the equilibrium weight of the synapse and then let the synapse adapt to the new equilibrium weight.

Fig. 2. (a) Schematic of the calibrated synapse with the signals used during the calibration procedure. (b) Equilibrium weights (nA) versus P(X|Y) for the array of synapses of Fig. 1(a). (c) Equilibrium weights (nA) versus P(X|Y) for the same array after calibration.

We create the calibrated synapse by modifying our original synapse according to Fig. 2(a). We convert M1 into a floating-gate transistor whose floating-gate charge sets M3's channel current, providing control over Iinj0 of Eq. 7. Transistor M8 modifies M1's gate charge by injection when M9's gate is low and Vcal is low; M9's gate is low only when the equilibrium weight W is less than Ical. During calibration, injection and tunneling on M3 are continuously active. We apply a pulse train to Vcal; during each pulse period, Vcal is predominantly high. When Vcal is high, the synapse adapts toward its equilibrium weight. When Vcal pulses low, M8 injects, increasing the synapse's equilibrium weight W. We repeat this process until the equilibrium weight matches Ical, causing M9's gate voltage to rise, disabling Vcal and with it injection. To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. Fig. 2(c) shows the equilibrium weights of six synapses after calibration. The data show that calibration reduces the effect of mismatched adaptation on the learned weight to a small fraction of the weight itself.

Because M1 is now a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence of its source current on its drain voltage. Consequently, M3's floating-gate voltage affects its source current (through M1's drain voltage), and we can model M3 as a source-degenerated pFET [3]. The new expression for the injection update of M3 is:

$$ \left(\frac{dV_{fg}}{dt}\right)_{inj} = -I_{inj0}\,\Pr(X,Y)\,\exp\!\left(\left[\frac{\kappa}{V_\gamma} - \frac{\kappa k_1}{U_t}\right]V_{fg}\right) \qquad (8) $$

where k1 is close to zero. This new injection expression slightly changes the α and η terms of the weight equilibrium in Eq. 7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same.
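The calibration loop can also be described algorithmically. The sketch below is an illustration, not the authors' procedure or code: it models each Vcal pulse as a small multiplicative increase in Iinj0 (the effect of injecting charge onto M1's floating gate) and stops once the equilibrium weight reaches the target current Ical, the point at which M9's gate rises and disables further injection. All numerical values are assumptions.

```python
import numpy as np

# Assumed constants for a toy model of one synapse (same notation as Eq. 7)
I0, kappa, Ut = 1e-9, 0.7, 0.0258
V_gamma, V_chi = 0.1, 1.0
alpha = kappa**2 / ((1 + kappa) * Ut * (kappa / V_gamma + 1 / V_chi))

def equilibrium_weight(i_inj0, i_tun0, p_x_given_y):
    """Subthreshold equilibrium weight from Eq. 7."""
    return I0 * (i_inj0 / i_tun0 * p_x_given_y) ** alpha

def calibrate(i_inj0, i_tun0, i_cal, p_ref=1.0, gain_per_pulse=1.02, max_pulses=2000):
    """Raise Iinj0 a little on each Vcal pulse until the equilibrium weight reaches Ical."""
    for _ in range(max_pulses):
        if equilibrium_weight(i_inj0, i_tun0, p_ref) >= i_cal:
            break                    # M9's gate rises, disabling Vcal and injection
        i_inj0 *= gain_per_pulse     # injection on M1's floating gate raises Iinj0
    return i_inj0

# Two mismatched synapses converge to (nearly) the same equilibrium weight
rng = np.random.default_rng(0)
for _ in range(2):
    i_inj0 = 1e-3 * rng.uniform(0.3, 0.6)   # erased well below target before calibration
    i_tun0 = 1e-3 * rng.uniform(0.9, 1.1)   # tunneling mismatch (up to ~1.2:1)
    i_inj0 = calibrate(i_inj0, i_tun0, i_cal=0.8e-9)
    print(equilibrium_weight(i_inj0, i_tun0, 1.0))
```

In this toy model the residual spread after calibration is set by the per-pulse step size, mirroring the small residual mismatch visible in Fig. 2(c).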
4 Implementing silicon synaptic learning rules

In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit.

We can use our circuit to implement a Hebbian learning rule. Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic signal Y is analogous to homosynaptic LTD. In our synapse, we tie Xin and Xcor together and correlate Vtun with Y.

Our synapse can also emulate a Boltzmann weight-update rule [2]. This weight-update rule derives from the difference between the correlations among neurons when the network receives external input and when the network runs freely (the clamped and unclamped phases, respectively). With weight decay, a Boltzmann synapse learns the difference between the correlations in the clamped and unclamped phases. We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. We do this by sending Xin to Xcor of one synapse in the clamped phase and to Xcor of the other synapse in the unclamped phase. Vtun remains constant throughout adaptation.

Finally, we consider implementing a temporally asymmetric Hebbian learning rule [7] with our synapse. In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference between the weights of the two circuits; we show the arrangement in Fig. 3. Each neuron sends two signals: a neuronal output, and an adaptation time window that is active for some time afterwards. The combined synapse therefore receives two presynaptic signals and two postsynaptic signals. The relative timing of the postsynaptic response Y and the presynaptic input X determines whether the synapse undergoes LTP or LTD. If Y occurs before X, Y's time window correlates with X, causing injection on the negative synapse and decreasing the weight. If Y occurs after X, Y correlates with X's time window, causing injection on the positive synapse and increasing the weight. Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning.

Fig. 3. A method for achieving spike-timing-dependent plasticity in silicon.

5 Conclusion

We have described a silicon synapse that implements a wide range of spike-based learning rules and that does not suffer from device mismatch. We have also described how to implement various silicon learning networks using this synapse. Although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals; one avenue for future work is to analyze the implications of different pulse-coding schemes for the circuit's adaptive behavior.

Acknowledgements

This work was supported by the National Science Foundation and by the Office of Naval Research. Aaron Shon was also supported by an NDSEG fellowship. We thank Anhai Doan and the anonymous reviewers for helpful comments.
References

[1] W. B. Levy, "A computational approach to hippocampal function," in R. D. Hawkins and G. H. Bower (eds.), Computational Models of Learning in Simple Neural Systems, The Psychology of Learning and Motivation vol. 23, pp. 243-305, San Diego, CA: Academic Press, 1989.

[2] D. H. Ackley, G. Hinton, and T. Sejnowski, "A learning algorithm for Boltzmann machines," Cognitive Science vol. 9, pp. 147-169, 1985.

[3] P. Hasler, B. A. Minch, J. Dugger, and C. Diorio, "Adaptive circuits and synapses using pFET floating-gate devices," in G. Cauwenberghs and M. Bayoumi (eds.), Learning in Silicon, pp. 33-65, Kluwer Academic, 1999.

[4] P. Hafliger, A spike-based learning rule and its implementation in analog hardware, Ph.D. thesis, ETH Zurich, 1999.

[5] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, "A floating-gate MOS learning array with locally computed weight updates," IEEE Transactions on Electron Devices vol. 44(12), pp. 2281-2289, 1997.

[6] R. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning vol. 3, pp. 9-44, 1988.

[7] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science vol. 275, pp. 213-215, 1997.

[8] A. Pesavento, T. Horiuchi, C. Diorio, and C. Koch, "Adaptation of current signals with floating-gate circuits," in Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems (Microneuro99), pp. 128-134, 1999.

[9] M. Lenzlinger and E. H. Snow, "Fowler-Nordheim tunneling into thermally grown SiO2," Journal of Applied Physics vol. 40(1), pp. 278-283, 1969.

[10] E. Takeda, C. Yang, and A. Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995.

[11] C. Diorio, "A p-channel MOS synapse transistor with self-convergent memory writes," IEEE Journal of Solid-State Circuits vol. 36(5), pp. 816-822, 2001.

2 0.4716714 137 nips-2001-On the Convergence of Leveraging

Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth

Abstract: We give a unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the Least-SquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotical convergence results for all these methods. Our analysis includes -norm regularized cost functions leading to a clean and general way to regularize ensemble learning.

3 0.46648943 139 nips-2001-Online Learning with Kernels

Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson

Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning, where the ν-trick only applies to the ε-insensitive loss function, we are able to derive general trimmed-mean types of estimators such as for Huber's robust loss.

4 0.46583936 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines

Author: Manfred Opper, Robert Urbanczik

Abstract: Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which only depend on the target rule but not on the SVM kernel. 1

5 0.46331561 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

Author: Olivier Chapelle, Bernhard Schölkopf

Abstract: The choice of an SVM kernel corresponds to the choice of a representation of the data in a feature space and, to improve performance, it should therefore incorporate prior knowledge such as known transformation invariances. We propose a technique which extends earlier work and aims at incorporating invariances in nonlinear kernels. We show on a digit recognition task that the proposed approach is superior to the Virtual Support Vector method, which previously had been the method of choice. 1

6 0.46056333 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine

7 0.45962349 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules

8 0.457899 134 nips-2001-On Kernel-Target Alignment

9 0.4567802 60 nips-2001-Discriminative Direction for Kernel Classifiers

10 0.45619395 8 nips-2001-A General Greedy Approximation Algorithm with Applications

11 0.45537969 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

12 0.45440167 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

13 0.45379025 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms

14 0.45202255 170 nips-2001-Spectral Kernel Methods for Clustering

15 0.45050967 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines

16 0.45038694 37 nips-2001-Associative memory in realistic neuronal networks

17 0.44688141 74 nips-2001-Face Recognition Using Kernel Methods

18 0.44647163 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Source Separation

19 0.44544321 49 nips-2001-Circuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning

20 0.44511637 13 nips-2001-A Natural Policy Gradient