nips nips2005 nips2005-17 knowledge-graph by maker-knowledge-mining

17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Source: pdf

Author: Bo Wen, Kwabena A. Boahen

Abstract: We present a novel cochlear model implemented in analog very large scale integration (VLSI) technology that emulates nonlinear active cochlear behavior. This silicon cochlea includes outer hair cell (OHC) electromotility through active bidirectional coupling (ABC), a mechanism we proposed in which OHC motile forces, through the microanatomical organization of the organ of Corti, realize the cochlear amplifier. Our chip measurements demonstrate that frequency responses become larger and more sharply tuned when ABC is turned on; the degree of the enhancement decreases with input intensity as ABC includes saturation of OHC forces.

1 Silicon Cochleae

Cochlear models, mathematical and physical, with the shared goal of emulating nonlinear active cochlear behavior, shed light on how the cochlea works if based on cochlear micromechanics. Among the modeling efforts, silicon cochleae have promise in meeting the need for real-time performance and low power consumption. Lyon and Mead developed the first analog electronic cochlea [1], which employed a cascade of second-order filters with exponentially decreasing resonant frequencies. However, the cascade structure suffers from delay and noise accumulation and lacks fault-tolerance. Modeling the cochlea more faithfully, Watts built a two-dimensional (2D) passive cochlea that addressed these shortcomings by incorporating the cochlear fluid using a resistive network [2]. This parallel structure, however, has its own problem: response gain is diminished by interference among the second-order sections’ outputs due to the large phase change at resonance [3]. Listening more to biology, our silicon cochlea aims to overcome the shortcomings of existing architectures by mimicking the cochlear micromechanics while including outer hair cell (OHC) electromotility. Although how exactly OHC motile forces boost the basilar membrane’s (BM) vibration remains a mystery, cochlear microanatomy provides clues.
Based on these clues, we previously proposed a novel mechanism, active bidirectional coupling (ABC), for the cochlear amplifier [4]. Here, we report an analog VLSI chip that implements this mechanism. In essence, our implementation is the first silicon cochlea that employs stimulus enhancement (i.e., active behavior) instead of undamping (i.e., high filter Q [5]).

The paper is organized as follows. In Section 2, we present the hypothesized mechanism (ABC), first described in [4]. In Section 3, we provide a mathematical formulation of the model as the basis of cochlear circuit design. Then we proceed in Section 4 to synthesize the circuit for the cochlear chip. Last, we present chip measurements in Section 5 that demonstrate nonlinear active cochlear behavior.

Figure 1: The inner ear. A Cutaway showing cochlear ducts (adapted from [6]). B Longitudinal view of cochlear partition (CP) (modified from [7]-[8]). Each outer hair cell (OHC) tilts toward the base while the Deiter’s cell (DC) on which it sits extends a phalangeal process (PhP) toward the apex. The OHCs’ stereocilia and the PhPs’ apical ends form the reticular lamina (RL). d is the tilt distance, and the segment size. IHC: inner hair cell.

2 Active Bidirectional Coupling

The cochlea actively amplifies acoustic signals as it performs spectral analysis. The movement of the stapes sets the cochlear fluid into motion, which passes the stimulus energy onto a certain region of the BM, the main vibrating organ in the cochlea (Figure 1A). From the base to the apex, BM fibers increase in width and decrease in thickness, resulting in an exponential decrease in stiffness which, in turn, gives rise to the passive frequency tuning of the cochlea. The OHCs’ electromotility is widely thought to account for the cochlea’s exquisite sensitivity and discriminability.
The exact way that OHC motile forces enhance the BM’s motion, however, remains unresolved. We propose that the triangular mechanical unit formed by an OHC, a phalangeal process (PhP) extended from the Deiter’s cell (DC) on which the OHC sits, and a portion of the reticular lamina (RL), between the OHC’s stereocilia end and the PhP’s apical tip, plays an active role in enhancing the BM’s responses (Figure 1B). The cochlear partition (CP) is divided into a number of segments longitudinally. Each segment includes one DC, one PhP’s apical tip and one OHC’s stereocilia end, both attached to the RL. Approximating the anatomy, we assume that when an OHC’s stereocilia end lies in segment i − 1, its basolateral end lies in the immediately apical segment i. Furthermore, the DC in segment i extends a PhP that angles toward the apex of the cochlea, with its apical end inserted just behind the stereocilia end of the OHC in segment i + 1.

Our hypothesis (ABC) includes both feedforward and feedbackward interactions. On one hand, the feedforward mechanism, proposed in [9], hypothesized that the force resulting from OHC contraction or elongation is exerted onto an adjacent downstream BM segment due to the OHC’s basal tilt. On the other hand, the novel insight of the feedbackward mechanism is that the OHC force is delivered onto an adjacent upstream BM segment due to the apical tilt of the PhP extending from the DC’s main trunk. In a nutshell, the OHC motile forces, through the microanatomy of the CP, feed forward and backward, in harmony with each other, resulting in bidirectional coupling between BM segments in the longitudinal direction. Specifically, due to the opposite action of OHC forces on the BM and the RL, the motion of BM segment i − 1 reinforces that of segment i while the motion of segment i + 1 opposes that of segment i, as described in detail in [4].

Figure 2: Wave propagation (WP) and basilar membrane (BM) impedance in the active cochlear model with a 2kHz pure tone (α = 0.15, γ = 0.3), plotted against distance from the stapes (mm). A WP in fluid and BM. B BM impedance $Z_m$ (i.e., pressure divided by velocity), normalized by S(x)M(x). Only the resistive component is shown; dot marks peak location.

3 The 2D Nonlinear Active Model

To provide a blueprint for the cochlear circuit design, we formulate a 2D model of the cochlea that includes ABC. Both the cochlea’s length (BM) and height (cochlear ducts) are discretized into a number of segments, with the original aspect ratio of the cochlea maintained. In the following expressions, x represents the distance from the stapes along the CP, with x = 0 at the base (or the stapes) and x = L (uncoiled cochlear duct length) at the apex; y represents the vertical distance from the BM, with y = 0 at the BM and y = ±h (cochlear duct radius) at the bottom/top wall.

Provided that the assumption of fluid incompressibility holds, the velocity potential φ of the fluids is required to satisfy $\nabla^2 \phi(x, y, t) = 0$, where $\nabla^2$ denotes the Laplacian operator. By definition, this potential is related to fluid velocities in the x and y directions: $V_x = -\partial\phi/\partial x$ and $V_y = -\partial\phi/\partial y$.

The BM is driven by the fluid pressure difference across it. Hence, the BM’s vertical motion (with downward displacement being positive) can be described as follows:

$$P_d(x) + F_{OHC}(x) = S(x)\,\delta(x) + \beta(x)\,\dot{\delta}(x) + M(x)\,\ddot{\delta}(x), \tag{1}$$

where S(x) is the stiffness, β(x) is the damping, and M(x) is the mass, per unit area, of the BM; δ is the BM’s downward displacement. $P_d = \rho\,\partial(\phi_{SV}(x, y, t) - \phi_{ST}(x, y, t))/\partial t$ is the pressure difference between the two fluid ducts (the scala vestibuli (SV) and the scala tympani (ST)), evaluated at the BM (y = 0); ρ is the fluid density.
The $F_{OHC}(x)$ term combines feedforward and feedbackward OHC forces, described by

$$F_{OHC}(x) = s_0 \left[ \tanh\!\left( \frac{\alpha\gamma S(x)\,\delta(x - d)}{s_0} \right) - \tanh\!\left( \frac{\alpha S(x)\,\delta(x + d)}{s_0} \right) \right], \tag{2}$$

where α denotes the OHC motility, expressed as a fraction of the BM stiffness, and γ is the ratio of feedforward to feedbackward coupling, representing relative strengths of the OHC forces exerted on the BM segment through the DC, directly and via the tilted PhP. d denotes the tilt distance, which is the horizontal displacement between the source and the recipient of the OHC force, assumed to be equal for the forward and backward cases. We use the hyperbolic tangent function to model saturation of the OHC forces, the nonlinearity that is evident in physiological measurements [8]; $s_0$ determines the saturation level.

We observed wave propagation in the model and computed the BM’s impedance (i.e., the ratio of driving pressure to velocity). Following the semi-analytical approach in [2], we simulated a linear version of the model (without saturation). The traveling wave transitions from long-wave to short-wave before the BM vibration peaks; the wavelength around the characteristic place is comparable to the tilt distance (Figure 2A). The BM impedance’s real part (i.e., the resistive component) becomes negative before the peak (Figure 2B). On the whole, inclusion of OHC motility through ABC boosts the traveling wave by pumping energy onto the BM when the wavelength matches the tilt of the OHC and PhP.

4 Analog VLSI Design and Implementation

Based on our mathematical model, which produces realistic responses, we implemented a 2D nonlinear active cochlear circuit in analog VLSI, taking advantage of the 2D nature of silicon chips. We first synthesize a circuit analog of the mathematical model, and then we implement the circuit in the log-domain. We start by synthesizing a passive model, and then extend it to a nonlinear active one by including ABC with saturation.
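As a brief aside on Equation 2 of Section 3, the saturating OHC force on a single BM segment can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' simulation code: the function name `ohc_force` and its argument names are hypothetical, the default α = 0.15 and γ = 0.3 are the values quoted in the Figure 2 caption, and the default `s0 = 1.0` is an arbitrary assumption.

```python
import math


def ohc_force(stiffness, disp_upstream, disp_downstream,
              alpha=0.15, gamma=0.3, s0=1.0):
    """Saturating OHC force on one BM segment, following Equation 2.

    stiffness       -- S(x), BM stiffness at this segment
    disp_upstream   -- delta(x - d), displacement one tilt distance toward the base
    disp_downstream -- delta(x + d), displacement one tilt distance toward the apex
    alpha, gamma    -- OHC motility and feedforward/feedbackward ratio
                       (defaults taken from the Figure 2 caption)
    s0              -- saturation level (the default is an assumption)
    """
    feedforward = math.tanh(alpha * gamma * stiffness * disp_upstream / s0)
    feedbackward = math.tanh(alpha * stiffness * disp_downstream / s0)
    return s0 * (feedforward - feedbackward)
```

For small displacements the tanh terms are nearly linear, so the force reduces to roughly αγS(x)δ(x − d) − αS(x)δ(x + d); for large displacements each term is clipped at ±s0, which is the saturation the text describes.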
4.1 Synthesizing the BM Circuit

The model consists of two fundamental parts: the cochlear fluid and the BM. First, we design the fluid element and thus the fluid network. In discrete form, the fluids can be viewed as a grid of elements with a specific resistance that corresponds to the fluid density or mass. Since charge is conserved for a small sheet of resistance and so are particles for a small volume of fluid, we use current to simulate fluid velocity. At the transistor level, the current flowing through the channel of a MOS transistor, operating subthreshold as a diffusive element, can be used for this purpose. Therefore, following the approach in [10], we implement the cochlear fluid network using a diffusor network formed by a 2D grid of nMOS transistors.

Second, we design the BM element and thus the BM. As current represents velocity, we rewrite the BM boundary condition (Equation 1, without the $F_{OHC}$ term):

$$\dot{I}_{in} = S(x)\int I_{mem}\,dt + \beta(x)\,I_{mem} + M(x)\,\dot{I}_{mem}, \tag{3}$$

where $I_{in}$, obtained by applying the voltage from the diffusor network to the gate of a pMOS transistor, represents the velocity potential scaled by the fluid density. In turn, $I_{mem}$ drives the diffusor network to match the fluid velocity with the BM velocity, $\dot{\delta}$. The $F_{OHC}$ term is dealt with in Section 4.2.

Implementing this second-order system requires two state-space variables, which we name $I_s$ and $I_o$. With s = jω, our synthesized BM design (passive) is

$$\tau_1 I_s s + I_s = -I_{in} + I_o, \tag{4}$$
$$\tau_2 I_o s + I_o = I_{in} - b I_s, \tag{5}$$
$$I_{mem} = I_{in} + I_s - I_o, \tag{6}$$

where the two first-order systems are both low-pass filters (LPFs), with time constants $\tau_1$ and $\tau_2$, respectively; b is a gain factor. Thus, $I_{in}$ can be expressed in terms of $I_{mem}$ as:

$$I_{in}\,s^2 = \left( \frac{b + 1}{\tau_1 \tau_2} + \frac{\tau_1 + \tau_2}{\tau_1 \tau_2}\,s + s^2 \right) I_{mem}.$$

Comparing this expression with the design target (Equation 3) yields the circuit analogs: $S(x) = (b + 1)/\tau_1\tau_2$, $\beta(x) = (\tau_1 + \tau_2)/\tau_1\tau_2$, and $M(x) = 1$.
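The circuit analogs above can be checked numerically. The sketch below solves Equations 4 and 5 for the two state variables at a single complex frequency, forms $I_{mem}$ from Equation 6, and compares $I_{in}/I_{mem}$ with the second-order target built from S = (b + 1)/τ1τ2, β = (τ1 + τ2)/τ1τ2 and M = 1. The function names and the example parameter values are hypothetical and chosen only for illustration; they are not taken from the chip.

```python
import math


def passive_bm_ratio(s, tau1, tau2, b):
    """Return I_in / I_mem for the passive BM design (Equations 4-6)."""
    i_in = 1.0
    # Eq. 4 rearranged: (tau1*s + 1) * Is - Io = -Iin
    # Eq. 5 rearranged: b * Is + (tau2*s + 1) * Io = Iin
    a11, a12, r1 = tau1 * s + 1.0, -1.0, -i_in
    a21, a22, r2 = b, tau2 * s + 1.0, i_in
    det = a11 * a22 - a12 * a21
    i_s = (r1 * a22 - a12 * r2) / det   # Cramer's rule for Is
    i_o = (a11 * r2 - a21 * r1) / det   # Cramer's rule for Io
    i_mem = i_in + i_s - i_o            # Eq. 6
    return i_in / i_mem


def target_ratio(s, tau1, tau2, b):
    """Return (S + beta*s + M*s^2) / s^2 using the stated circuit analogs."""
    stiffness = (b + 1.0) / (tau1 * tau2)
    damping = (tau1 + tau2) / (tau1 * tau2)
    mass = 1.0
    return (stiffness + damping * s + mass * s**2) / s**2


# Example values (assumptions): a 1 kHz tone, time constants of 0.1 ms and 0.2 ms.
s_example = 2j * math.pi * 1000.0
circuit = passive_bm_ratio(s_example, 1e-4, 2e-4, 3.0)
target = target_ratio(s_example, 1e-4, 2e-4, 3.0)
```

The two values agree to within floating-point rounding, which is what the algebraic elimination of $I_s$ and $I_o$ predicts.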
Note that the mass M(x) is a constant (i.e., 1), which was also the case in our mathematical model simulation. These analogies require that $\tau_1$ and $\tau_2$ increase exponentially to simulate the exponentially decreasing BM stiffness (and damping); b allows us to achieve a reasonable stiffness for a practical choice of $\tau_1$ and $\tau_2$ (capacitor size is limited by silicon area).

Figure 3: Low-pass filter (LPF) and second-order section circuit design. A Half-LPF circuit. B Complete LPF circuit formed by two half-LPF circuits. C Basilar membrane (BM) circuit. It consists of two LPFs and connects to its neighbors through $I_s$ and $I_T$.

4.2 Adding Active Bidirectional Coupling

To include ABC in the BM boundary condition, we replace δ in Equation 2 with $\int I_{mem}\,dt$ to obtain

$$F_{OHC} = r_{ff}\,S(x)\,T\!\left( \int I_{mem}(x - d)\,dt \right) - r_{fb}\,S(x)\,T\!\left( \int I_{mem}(x + d)\,dt \right),$$

where $r_{ff} = \alpha\gamma$ and $r_{fb} = \alpha$ denote the feedforward and feedbackward OHC motility factors, and T denotes saturation. The saturation is applied to the displacement, instead of the force, as this simplifies the implementation. We obtain the integrals by observing that, in the passive design, the state variable $I_s = -I_{mem}/s\tau_1$. Thus, $\int I_{mem}(x - d)\,dt = -\tau_{1f} I_{sf}$ and $\int I_{mem}(x + d)\,dt = -\tau_{1b} I_{sb}$. Here, $I_{sf}$ and $I_{sb}$ represent the outputs of the first LPF in the upstream and downstream BM segments, respectively; $\tau_{1f}$ and $\tau_{1b}$ represent their respective time constants. To reduce complexity in implementation, we use $\tau_1$ to approximate both $\tau_{1f}$ and $\tau_{1b}$ as the longitudinal span is small. We obtain the active BM design by replacing Equation 5 with the synthesis result:

$$\tau_2 I_o s + I_o = I_{in} - b I_s + r_{fb}(b + 1)\,T(-I_{sb}) - r_{ff}(b + 1)\,T(-I_{sf}).$$

Note that, to implement ABC, we only need to add two currents to the second LPF in the passive system.
These currents, $I_{sf}$ and $I_{sb}$, come from the upstream and downstream neighbors of each segment.

Figure 4: Cochlear chip. A Architecture: Two diffusive grids with embedded BM circuits model the cochlea. B Detail. BM circuits exchange currents with their neighbors.

4.3 Class AB Log-domain Implementation

We employ the log-domain filtering technique [11] to realize current-mode operation. In addition, following the approach proposed in [12], we implement the circuit in Class AB to increase dynamic range, reduce the effect of mismatch and lower power consumption. This differential signaling is inspired by the way the biological cochlea works: the vibration of the BM is driven by the pressure difference across it.

Taking a bottom-up strategy, we start by designing a Class AB LPF, a building block for the BM circuit. It is described by

$$\tau (I_{out}^{+} - I_{out}^{-})\,s + (I_{out}^{+} - I_{out}^{-}) = I_{in}^{+} - I_{in}^{-} \quad \text{and} \quad \tau I_{out}^{+} I_{out}^{-}\,s + I_{out}^{+} I_{out}^{-} = I_q^2,$$

where $I_q$ sets the geometric mean of the positive and negative components of the output current, and τ sets the time constant. Combining the common-mode constraint with the differential design equation yields the nodal equation for the positive path (the negative path has superscripts + and − swapped):

$$C\,\dot{V}_{out}^{+} = I_\tau\,\frac{(I_{in}^{+} - I_{in}^{-}) + (I_q^2/I_{out}^{+} - I_{out}^{+})}{I_{out}^{+} + I_{out}^{-}}.$$

This nodal equation suggests the half-LPF circuit shown in Figure 3A. $V_{out}^{+}$, the voltage on the positive capacitor ($C^{+}$), gates a pMOS transistor to produce the corresponding current signal, $I_{out}^{+}$ ($V_{out}^{-}$ and $I_{out}^{-}$ are similarly related). The bias $V_q$ sets the quiescent current $I_q$ while $V_\tau$ determines the current $I_\tau$, which is related to the time constant by $\tau = C u_T/\kappa I_\tau$ (κ is the subthreshold slope coefficient and $u_T$ is the thermal voltage). Two of these subcircuits, connected in push-pull, form a complete LPF (Figure 3B).
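Two relations in the Class AB LPF description lend themselves to a quick numerical illustration: the time constant τ = C·u_T/(κ·I_τ), and the steady-state (s = 0) split of a differential input into positive and negative output components under the geometric-mean constraint. The sketch below is a hedged illustration only. The function names are hypothetical, and the default κ = 0.7 and u_T = 25.8 mV are typical room-temperature textbook values assumed here, not values reported for this chip.

```python
import math


def lpf_time_constant(capacitance, i_tau, kappa=0.7, u_t=0.0258):
    """tau = C * u_T / (kappa * I_tau) for the log-domain LPF.

    kappa and u_t defaults are assumed typical values, not chip measurements.
    """
    return capacitance * u_t / (kappa * i_tau)


def class_ab_dc_outputs(i_diff, i_q):
    """Steady-state Class AB output components (I_out+, I_out-).

    Solves I_out+ - I_out- = i_diff together with I_out+ * I_out- = i_q**2,
    i.e. the two LPF equations evaluated at s = 0.
    """
    root = math.sqrt(i_diff**2 + 4.0 * i_q**2)
    i_plus = 0.5 * (root + i_diff)
    i_minus = 0.5 * (root - i_diff)
    return i_plus, i_minus


# Example (assumed values): 1 pF capacitor, 1 nA bias, 10 nA differential input.
tau = lpf_time_constant(1e-12, 1e-9)
i_plus, i_minus = class_ab_dc_outputs(10e-9, 1e-9)
```

Both output components stay positive for any sign of the differential input, which is why a small quiescent current $I_q$ can support signals much larger than itself.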
The BM circuit is implemented using two LPFs interacting in accordance with the synthesized design equations (Figure 3C). $I_{mem}$ is the combination of three currents, $I_{in}$, $I_s$, and $I_o$. Each BM sends out $I_s$ and receives $I_T$, a saturated version of its neighbor’s $I_s$. The saturation is accomplished by a current-limiting transistor (see Figure 4B), which yields $I_T = T(I_s) = I_s I_{sat}/(I_s + I_{sat})$, where $I_{sat}$ is set by a bias voltage $V_{sat}$.

4.4 Chip Architecture

We fabricated a version of our cochlear chip architecture (Figure 4) with 360 BM circuits and two 4680-element fluid grids (360 × 13). This chip occupies 10.9 mm² of silicon area in 0.25 µm CMOS technology. Differential input signals are applied at the base while the two fluid grids are connected at the apex through a fluid element that represents the helicotrema.

5 Chip Measurements

We carried out two measurements that demonstrate the desired amplification by ABC, and the compressive growth of BM responses due to saturation. To obtain sinusoidal current as the input to the BM subcircuits, we set the voltages applied at the base to be the logarithm of a half-wave rectified sinusoid.

We first investigated BM-velocity frequency responses at six linearly spaced cochlear positions (Figure 5). The frequency that maximally excites the first position (Stage 30), defined as its characteristic frequency (CF), is 12.1 kHz. The remaining five CFs, from early to later stages, are 8.2 kHz, 1.7 kHz, 905 Hz, 366 Hz, and 218 Hz, respectively. Phase accumulation at the CFs ranges from 0.56 to 2.67π radians, comparable to 1.67π radians in the mammalian cochlea [13]. Q10 factor (the ratio of the CF to the bandwidth 10dB below the peak) ranges from 1.25 to 2.73, comparable to 2.55 at mid-sound intensity in biology (computed from [13]). The cutoff slope ranges from -20 to -54dB/octave, as compared to -85dB/octave in biology (computed from [13]).
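The current-limiting saturation used in the BM circuit (Section 4.3), I_T = I_s·I_sat/(I_s + I_sat), can be sketched as a one-line function. This is a minimal illustration for non-negative currents, as carried by each Class AB signal component; the function name `saturate` and the example current values are hypothetical.

```python
def saturate(i_s, i_sat):
    """Current-limiter characteristic I_T = I_s * I_sat / (I_s + I_sat), for I_s >= 0."""
    return i_s * i_sat / (i_s + i_sat)


# Example (assumed values): a 1 nA saturation level.
small = saturate(1e-12, 1e-9)   # far below I_sat: passes almost unchanged
equal = saturate(1e-9, 1e-9)    # at I_sat: output is half of I_sat
large = saturate(1e-6, 1e-9)    # far above I_sat: output approaches I_sat
```

The characteristic is nearly linear for small currents and approaches $I_{sat}$ for large ones, so the coupling between neighboring segments weakens as the input level rises, consistent with the compressive growth reported in the measurements.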
Figure 5: Measured BM-velocity frequency responses at six locations (Stages 30, 70, 110, 150, 190, and 230), plotted against frequency (kHz). A Amplitude (dB). B Phase (π radians). Dashed lines: Biological data (adapted from [13]). Dots mark peaks.

We then explored the longitudinal pattern of BM-velocity responses and the effect of ABC. Stimulating the chip using four different pure tones, we obtained responses in which a 4kHz input elicits a peak around Stage 85 while 500Hz sound travels all the way to Stage 178 and peaks there (Figure 6A). We varied the input voltage level and obtained frequency responses at Stage 100 (Figure 6B). Input voltage level increases linearly such that the current increases exponentially; the input current level (in dB) was estimated based on the measured κ for this chip. As expected, we observed linearly increasing responses at low frequencies in the logarithmic plot. In contrast, the responses around the CF increase less and become broader with increasing input level as saturation takes effect in that region (resembling a passive cochlea). We observed 24dB compression as compared to 27 to 47dB in biology [13]. At the highest intensities, compression also occurs at low frequencies.

These chip measurements demonstrate that inclusion of ABC, simply through coupling neighboring BM elements, transforms a passive cochlea into an active one. This active cochlear model’s nonlinear responses are qualitatively comparable to physiological data.

6 Conclusions

We presented an analog VLSI implementation of a 2D nonlinear cochlear model that utilizes a novel active mechanism, ABC, which we proposed to account for the cochlear amplifier. ABC was shown to pump energy into the traveling wave.
Rather than detecting the wave’s amplitude and implementing an automatic-gain-control loop, our biomorphic model accomplishes this simply by nonlinear interactions between adjacent neighbors. Implemented in the log-domain, with Class AB operation, our silicon cochlea shows enhanced frequency responses, with compressive behavior around the CF, when ABC is turned on. These features are desirable in prosthetic applications and automatic speech recognition systems as they capture the properties of the biological cochlea.

Figure 6: Measured BM-velocity responses (cont’d). A Longitudinal responses (20-stage moving average) to 4 kHz, 2 kHz, 1 kHz, and 500 Hz tones, plotted against stage number. Peak shifts to earlier (basal) stages as input frequency increases from 500 to 4kHz. B Effects of increasing input intensity (0, 16, 32, and 48 dB) at Stage 100, plotted against frequency (kHz). Responses become broader and show compressive growth.

References

[1] Lyon, R.F. & Mead, C.A. (1988) An analog electronic cochlea. IEEE Trans. Acoust. Speech and Signal Proc., 36: 1119-1134.
[2] Watts, L. (1993) Cochlear Mechanics: Analysis and Analog VLSI. Ph.D. thesis, Pasadena, CA: California Institute of Technology.
[3] Fragnière, E. (2005) A 100-Channel analog CMOS auditory filter bank for speech recognition. IEEE International Solid-State Circuits Conference (ISSCC 2005), pp. 140-141.
[4] Wen, B. & Boahen, K. (2003) A linear cochlear model with active bi-directional coupling. The 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2003), pp. 2013-2016.
[5] Sarpeshkar, R., Lyon, R.F., & Mead, C.A. (1996) An analog VLSI cochlear model with new transconductance amplifier and nonlinear gain control. Proceedings of the IEEE Symposium on Circuits and Systems (ISCAS 1996), 3: 292-295.
[6] Mead, C.A. (1989) Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley.
[7] Russell, I.J. & Nilsen, K.E. (1997) The location of the cochlear amplifier: Spatial representation of a single tone on the guinea pig basilar membrane. Proc. Natl. Acad. Sci. USA, 94: 2660-2664.
[8] Geisler, C.D. (1998) From sound to synapse: physiology of the mammalian ear. Oxford University Press.
[9] Geisler, C.D. & Sang, C. (1995) A cochlear model using feed-forward outer-hair-cell forces. Hearing Research, 86: 132-146.
[10] Boahen, K.A. & Andreou, A.G. (1992) A contrast sensitive silicon retina with reciprocal synapses. In Moody, J.E. and Lippmann, R.P. (eds.), Advances in Neural Information Processing Systems 4 (NIPS 1992), pp. 764-772, Morgan Kaufmann, San Mateo, CA.
[11] Frey, D.R. (1993) Log-domain filtering: an approach to current-mode filtering. IEE Proc. G, Circuits Devices Syst., 140 (6): 406-416.
[12] Zaghloul, K. & Boahen, K.A. (2005) An On-Off log-domain circuit that recreates adaptive filtering in the retina. IEEE Transactions on Circuits and Systems I: Regular Papers, 52 (1): 99-107.
[13] Ruggero, M.A., Rich, N.C., Narayan, S.S., & Robles, L. (1997) Basilar membrane responses to tones at the base of the chinchilla cochlea. J. Acoust. Soc. Am., 101 (4): 2151-2163.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a novel cochlear model implemented in analog very large scale integration (VLSI) technology that emulates nonlinear active cochlear behavior. [sent-3, score-0.977]

2 This silicon cochlea includes outer hair cell (OHC) electromotility through active bidirectional coupling (ABC), a mechanism we proposed in which OHC motile forces, through the microanatomical organization of the organ of Corti, realize the cochlear ampliﬁer. [sent-4, score-1.019]

3 Our chip measurements demonstrate that frequency responses become larger and more sharply tuned when ABC is turned on; the degree of the enhancement decreases with input intensity as ABC includes saturation of OHC forces. [sent-5, score-0.278]

4 1 Silicon Cochleae Cochlear models, mathematical and physical, with the shared goal of emulating nonlinear active cochlear behavior, shed light on how the cochlea works if based on cochlear micromechanics. [sent-6, score-1.142]

5 Among the modeling efforts, silicon cochleae have promise in meeting the need for real-time performance and low power consumption. [sent-7, score-0.094]

6 Lyon and Mead developed the ﬁrst analog electronic cochlea [1], which employed a cascade of second-order ﬁlters with exponentially decreasing resonant frequencies. [sent-8, score-0.313]

7 Modeling the cochlea more faithfully, Watts built a two-dimensional (2D) passive cochlea that addressed these shortcomings by incorporating the cochlear ﬂuid using a resistive network [2]. [sent-10, score-0.964]

8 Listening more to biology, our silicon cochlea aims to overcome the shortcomings of existing architectures by mimicking the cochlear micromechanics while including outer hair cell (OHC) electromotility. [sent-12, score-0.75]

9 Although how exactly OHC motile forces boost the basilar membrane’s (BM) vibration remains a mystery, cochlear microanatomy provides clues. [sent-13, score-0.626]

10 Based on these clues, we previously proposed a novel mechanism, active bidirectional coupling (ABC), for the cochlear ampliﬁer [4]. [sent-14, score-0.575]

11 Here, we report an analog VLSI chip that implements this mechanism. [sent-15, score-0.145]

12 In essence, our implementation is the ﬁrst silicon cochlea that employs stimulus enhancement (i. [sent-16, score-0.308]

13 In Section 3, we provide a mathematical formulation of the model as the basis of cochlear circuit design. Figure 1: The inner ear. [sent-23, score-0.03]

14 B Longitudinal view of cochlear partition (CP) (modiﬁed from [7]-[8]). [sent-25, score-0.405]

15 Each outer hair cell (OHC) tilts toward the base while the Deiter’s cell (DC) on which it sits extends a phalangeal process (PhP) toward the apex. [sent-26, score-0.096]

16 The OHCs’ stereocilia and the PhPs’ apical ends form the reticular lamina (RL). [sent-27, score-0.188]

17 Then we proceed in Section 4 to synthesize the circuit for the cochlear chip. [sent-31, score-0.487]

18 Last, we present chip measurements in Section 5 that demonstrate nonlinear active cochlear behavior. [sent-32, score-0.599]

19 2 Active Bidirectional Coupling The cochlea actively ampliﬁes acoustic signals as it performs spectral analysis. [sent-33, score-0.239]

20 The movement of the stapes sets the cochlear ﬂuid into motion, which passes the stimulus energy onto a certain region of the BM, the main vibrating organ in the cochlea (Figure 1A). [sent-34, score-0.724]

21 From the base to the apex, BM ﬁbers increase in width and decrease in thickness, resulting in an exponential decrease in stiffness which, in turn, gives rise to the passive frequency tuning of the cochlea. [sent-35, score-0.175]

22 The exact way that OHC motile forces enhance the BM’s motion, however, remains unresolved. [sent-37, score-0.103]

23 The cochlear partition (CP) is divided into a number of segments longitudinally. [sent-39, score-0.405]

24 Each segment includes one DC, one PhP’s apical tip and one OHC’s stereocilia end, both attached to the RL. [sent-40, score-0.245]

25 Approximating the anatomy, we assume that when an OHC’s stereocilia end lies in segment i − 1, its basolateral end lies in the immediately apical segment i. [sent-41, score-0.327]

26 Furthermore, the DC in segment i extends a PhP that angles toward the apex of the cochlea, with its apical end inserted just behind the stereocilia end of the OHC in segment i + 1. [sent-42, score-0.39]

27 Our hypothesis (ABC) includes both feedforward and feedbackward interactions. [sent-43, score-0.1]

28 On one hand, the feedforward mechanism, proposed in [9], hypothesized that the force resulting from OHC contraction or elongation is exerted onto an adjacent downstream BM segment due to the OHC’s basal tilt. [sent-44, score-0.236]

29 On the other hand, the novel insight of the feedbackward mechanism is that the OHC force is delivered onto an adjacent upstream BM segment due to the apical tilt of the PhP extending from the DC’s main trunk. [sent-45, score-0.339]

30 In a nutshell, the OHC motile forces, through the microanatomy of the CP, feed forward and backward, in harmony with each other, resulting in bidirectional coupling between BM segments in the longitudinal direction. [sent-46, score-0.216]

31 Figure 2: Wave propagation (WP) and basilar membrane (BM) impedance in the active cochlear model with a 2kHz pure tone (α = 0. [sent-49, score-0.635]

32 Only the resistive component is shown; dot marks peak location. [sent-56, score-0.066]

33 forces on the BM and the RL, the motion of BM segment i − 1 reinforces that of segment i while the motion of segment i + 1 opposes that of segment i, as described in detail in [4]. [sent-57, score-0.387]

34 3 The 2D Nonlinear Active Model To provide a blueprint for the cochlear circuit design, we formulate a 2D model of the cochlea that includes ABC. [sent-58, score-0.726]

35 Both the cochlea’s length (BM) and height (cochlear ducts) are discretized into a number of segments, with the original aspect ratio of the cochlea maintained. [sent-59, score-0.239]

36 Providing that the assumption of fluid incompressibility holds, the velocity potential φ of the fluids is required to satisfy $\nabla^2 \phi(x, y, t) = 0$, where $\nabla^2$ denotes the Laplacian operator. [sent-61, score-0.072]

37 The BM is driven by the ﬂuid pressure difference across it. [sent-63, score-0.04]

38 Pd = ρ ∂(φSV (x, y, t) − φST (x, y, t))/∂t is the pressure difference between the two ﬂuid ducts (the scala vestibuli (SV) and the scala tympani (ST)), evaluated at the BM (y = 0); ρ is the ﬂuid density. [sent-66, score-0.078]

39 d denotes the tilt distance, which is the horizontal displacement between the source and the recipient of the OHC force, assumed to be equal for the forward and backward cases. [sent-68, score-0.047]

40 We use the hyperbolic tangent function to model saturation of the OHC forces, the nonlinearity that is evident in physiological measurements [8]; s0 determines the saturation level. [sent-69, score-0.152]

41 We observed wave propagation in the model and computed the BM’s impedance (i. [sent-70, score-0.09]

42 The traveling wave transitions from long-wave to short-wave before the BM vibration peaks; the wavelength around the characteristic place is comparable to the tilt distance (Figure 2A). [sent-74, score-0.147]

43 , the resistive component) becomes negative before the peak (Figure 2B). [sent-77, score-0.066]

44 On the whole, inclusion of OHC motility through ABC boosts the traveling wave by pumping energy onto the BM when the wavelength matches the tilt of the OHC and PhP. [sent-78, score-0.155]

45 4 Analog VLSI Design and Implementation Based on our mathematical model, which produces realistic responses, we implemented a 2D nonlinear active cochlear circuit in analog VLSI, taking advantage of the 2D nature of silicon chips. [sent-79, score-0.723]

46 We ﬁrst synthesize a circuit analog of the mathematical model, and then we implement the circuit in the log-domain. [sent-80, score-0.238]

47 We start by synthesizing a passive model, and then extend it to a nonlinear active one by including ABC with saturation. [sent-81, score-0.141]

48 1 Synthesizing the BM Circuit The model consists of two fundamental parts: the cochlear ﬂuid and the BM. [sent-83, score-0.405]

49 First, we design the ﬂuid element and thus the ﬂuid network. [sent-84, score-0.032]

50 At the transistor level, the current ﬂowing through the channel of a MOS transistor, operating subthreshold as a diffusive element, can be used for this purpose. [sent-87, score-0.044]

51 Therefore, following the approach in [10], we implement the cochlear ﬂuid network using a diffusor network formed by a 2D grid of nMOS transistors. [sent-88, score-0.443]

52 Second, we design the BM element and thus the BM. [sent-89, score-0.032]

53 In turn, Imem ˙ drives the diffusor network to match the ﬂuid velocity with the BM velocity, δ. [sent-91, score-0.11]

54 And with s = jω, our synthesized BM design (passive) is $\tau_1 I_s s + I_s = -I_{in} + I_o$ (4), $\tau_2 I_o s + I_o = I_{in} - b I_s$ (5), $I_{mem} = I_{in} + I_s - I_o$ (6), where the two first-order systems are both low-pass filters (LPFs), with time constants $\tau_1$ and $\tau_2$, respectively; b is a gain factor. [sent-95, score-0.032]

55 Comparing this expression with the design target (Equation 3) yields the circuit analogs: S(x) = (b + 1)/τ1τ2 , β(x) = (τ1 + τ2 )/τ1 τ2 , and M (x) = 1. [sent-97, score-0.114]

56 Figure 3B: Complete LPF circuit formed by two half-LPF circuits. [sent-103, score-0.082]

57 simulate the exponentially decreasing BM stiffness (and damping); b allows us to achieve a reasonable stiffness for a practical choice of τ1 and τ2 (capacitor size is limited by silicon area). [sent-106, score-0.169]

58 The saturation is applied to the displacement, instead of the force, as this simpliﬁes the implementation. [sent-109, score-0.061]

59 We obtain the integrals by observing that, in the passive design, the state variable Is = −Imem/(sτ1). [sent-110, score-0.048]

60 Here, Isf and Isb represent the outputs of the ﬁrst LPF in the upstream and downstream BM segments, respectively; τ1f and τ1b represent their respective time constants. [sent-112, score-0.071]

61 To reduce complexity in implementation, we use τ1 to approximate both τ1f and τ1b as the longitudinal span is small. [sent-113, score-0.044]

62 We obtain the active BM design by replacing Equation 5 with the synthesis result: $\tau_2 I_o s + I_o = I_{in} - b I_s + r_{fb}(b + 1)\,T(-I_{sb}) - r_{ff}(b + 1)\,T(-I_{sf})$. [sent-114, score-0.137]

63 Note that, to implement ABC, we only need to add two currents to the second LPF in the passive system. [sent-115, score-0.075]

64 These currents, Isf and Isb , come from the upstream and downstream neighbors of each segment. [sent-116, score-0.071]
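
The two added currents can be illustrated with a short sketch of the right-hand side of the active second-LPF equation, evaluated for every BM segment at once. This is an assumption-laden illustration rather than the chip's implementation: index 0 is taken to be the base, missing neighbors at the two ends are treated as zero, and the saturation T is extended to negative arguments by odd symmetry. The names `saturate` and `second_lpf_drive` are hypothetical.

```python
import numpy as np

def saturate(i, i_sat):
    """Odd-symmetric saturation (assumed): linear for |i| << i_sat, bounded by i_sat."""
    mag = np.abs(i)
    return np.sign(i) * mag * i_sat / (mag + i_sat)

def second_lpf_drive(i_in, i_s, b, r_ff, r_fb, i_sat):
    """Iin - b*Is + r_fb*(b+1)*T(-Isb) - r_ff*(b+1)*T(-Isf) for each segment.

    Segment k-1 is treated as upstream of segment k and segment k+1 as
    downstream (an indexing assumption of this sketch).
    """
    i_in = np.asarray(i_in, dtype=float)
    i_s = np.asarray(i_s, dtype=float)
    i_sf = np.zeros_like(i_s)
    i_sb = np.zeros_like(i_s)
    i_sf[1:] = i_s[:-1]    # first-LPF output of the upstream neighbor
    i_sb[:-1] = i_s[1:]    # first-LPF output of the downstream neighbor
    passive = i_in - b * i_s
    abc = (b + 1.0) * (r_fb * saturate(-i_sb, i_sat) - r_ff * saturate(-i_sf, i_sat))
    return passive + abc
```

Setting `r_ff` and `r_fb` to zero recovers the passive Equation 5, which mirrors the statement that ABC only adds two currents to the second LPF.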

65 Figure 4A: Architecture. Two diffusive grids with embedded BM circuits model the cochlea. [sent-118, score-0.071]

66 In addition, following the approach proposed in [12], we implement the circuit in Class AB to increase dynamic range, reduce the effect of mismatch and lower power consumption. [sent-123, score-0.082]

67 This differential signaling is inspired by the way the biological cochlea works—the vibration of BM is driven by the pressure difference across it. [sent-124, score-0.309]

68 Combining the common-mode constraint with the differential design equation yields the nodal equation for the positive path (the negative path has superscripts + and − swapped): $C \dot{V}_{out}^{+} = I_\tau \left[ (I_{in}^{+} - I_{in}^{-}) + (I_q^2 / I_{out}^{+} - I_{out}^{+}) \right] / (I_{out}^{+} + I_{out}^{-})$. [sent-127, score-0.057]

69 This nodal equation suggests the half-LPF circuit shown in Figure 3A. [sent-128, score-0.107]

70 $V_{out}^{+}$, the voltage on the positive capacitor ($C^{+}$), gates a pMOS transistor to produce the corresponding current signal, $I_{out}^{+}$ ($V_{out}^{-}$ and $I_{out}^{-}$ are similarly related). [sent-129, score-0.082]
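
The Class AB representation can be made concrete with a small sketch. It assumes that the common-mode constraint is a geometric-mean condition, I⁺·I⁻ = Iq², which is how the Iq²/I⁺out term in the nodal equation reads; the constraint itself is not spelled out in the sentences retained here, so treat that as an assumption. Given a signed differential signal, the function returns the two strictly positive path currents. The name `class_ab_split` is hypothetical.

```python
import math

def class_ab_split(i_diff, i_q):
    """Return (i_plus, i_minus) with i_plus - i_minus = i_diff and i_plus * i_minus = i_q**2."""
    root = math.sqrt(i_diff * i_diff + 4.0 * i_q * i_q)
    if i_diff >= 0.0:
        i_plus = 0.5 * (i_diff + root)
        i_minus = i_q * i_q / i_plus   # avoids subtracting two nearly equal numbers
    else:
        i_minus = 0.5 * (root - i_diff)
        i_plus = i_q * i_q / i_minus
    return i_plus, i_minus

# Illustrative values: a 1 uA differential signal on a 1 nA quiescent level.
print(class_ab_split(1e-6, 1e-9))
```

Under this assumption both currents stay positive for any signed input, and at zero signal each path idles at Iq, which is consistent with the dynamic-range and power motivations given for Class AB operation.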

71 The BM circuit is implemented using two LPFs interacting in accordance with the synthesized design equations (Figure 3C). [sent-132, score-0.114]

72 The saturation is accomplished by a current-limiting transistor (see Figure 4B), which yields IT = T(Is) = Is Isat/(Is + Isat), where Isat is set by a bias voltage Vsat. [sent-135, score-0.143]
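
A brief sketch shows why this form of T makes the enhancement shrink as the signal grows. Differentiating T(Is) = Is·Isat/(Is + Isat) gives an incremental gain of (Isat/(Is + Isat))², which is 1 for small currents and falls toward 0 for large ones. The function names and the 1 nA value of Isat are illustrative assumptions, not chip settings.

```python
def saturate_current(i_s, i_sat):
    """T(Is) = Is * Isat / (Is + Isat), for non-negative currents."""
    if i_s < 0 or i_sat <= 0:
        raise ValueError("expects i_s >= 0 and i_sat > 0")
    return i_s * i_sat / (i_s + i_sat)

def incremental_gain(i_s, i_sat):
    """dT/dIs = (Isat / (Is + Isat))**2: unity for small Is, vanishing for large Is."""
    if i_s < 0 or i_sat <= 0:
        raise ValueError("expects i_s >= 0 and i_sat > 0")
    return (i_sat / (i_s + i_sat)) ** 2

i_sat = 1e-9  # hypothetical 1 nA saturation level
for i_s in (1e-12, 1e-9, 1e-6):
    print(f"Is = {i_s:.0e} A: T = {saturate_current(i_s, i_sat):.3e} A, "
          f"gain = {incremental_gain(i_s, i_sat):.3e}")
```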

73 4 Chip Architecture We fabricated a version of our cochlear chip architecture (Figure 4) with 360 BM circuits and two 4680-element ﬂuid grids (360 ×13). [sent-137, score-0.547]

74 Differential input signals are applied at the base while the two ﬂuid grids are connected at the apex through a ﬂuid element that represents the helicotrema. [sent-141, score-0.127]

75 5 Chip Measurements We carried out two measurements that demonstrate the desired ampliﬁcation by ABC, and the compressive growth of BM responses due to saturation. [sent-142, score-0.136]

76 To obtain sinusoidal current as the input to the BM subcircuits, we set the voltages applied at the base to be the logarithm of a half-wave rectiﬁed sinusoid. [sent-143, score-0.034]
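
One plausible reading of this input scheme is sketched below. Because a subthreshold transistor turns a gate voltage into a current exponentially, applying the logarithm of the desired current yields that current; with two differential paths each carrying one half-wave, the difference is a full sinusoid. The exponential law I = I0·exp(κV/UT), the values of κ and I0, the small floor current that keeps the logarithm finite, and the voltage polarity are all assumptions of this sketch.

```python
import numpy as np

U_T = 0.0259   # thermal voltage in volts near room temperature
KAPPA = 0.7    # hypothetical subthreshold slope factor
I_0 = 1e-15    # hypothetical transistor scale current in amperes

def current_to_voltage(i):
    """Invert the assumed subthreshold law I = I_0 * exp(KAPPA * V / U_T)."""
    return (U_T / KAPPA) * np.log(i / I_0)

def voltage_to_current(v):
    """Assumed subthreshold law: current grows exponentially with voltage."""
    return I_0 * np.exp(KAPPA * v / U_T)

def differential_input_voltages(t, freq_hz, amplitude, floor):
    """Voltages for the two paths; each encodes one half-wave of the sinusoid."""
    wave = amplitude * np.sin(2.0 * np.pi * freq_hz * t)
    i_plus = np.maximum(wave, 0.0) + floor    # positive half-wave plus a small floor
    i_minus = np.maximum(-wave, 0.0) + floor  # negative half-wave plus the same floor
    return current_to_voltage(i_plus), current_to_voltage(i_minus)

# Illustrative 1 kHz tone with 1 nA amplitude and a 1 pA floor.
t = np.linspace(0.0, 1e-3, 201)
v_plus, v_minus = differential_input_voltages(t, 1e3, 1e-9, 1e-12)
recovered = voltage_to_current(v_plus) - voltage_to_current(v_minus)
print(np.allclose(recovered, 1e-9 * np.sin(2.0 * np.pi * 1e3 * t), atol=1e-18))
```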

77 We ﬁrst investigated BM-velocity frequency responses at six linearly spaced cochlear positions (Figure 5). [sent-144, score-0.521]

78 The frequency that maximally excites the ﬁrst position (Stage 30), deﬁned as its characteristic frequency (CF), is 12. [sent-145, score-0.086]

80 Figure 5: Measured BM-velocity frequency responses at six locations (Stages 30, 70, 110, 150, 190, and 230). Axes: BM velocity amplitude (dB) and BM velocity phase (π radians) versus frequency (kHz). [sent-165, score-0.116]

81 We then explored the longitudinal pattern of BM-velocity responses and the effect of ABC. [sent-170, score-0.117]

82 Stimulating the chip using four different pure tones, we obtained responses in which a 4kHz input elicits a peak around Stage 85 while 500Hz sound travels all the way to Stage 178 and peaks there (Figure 6A). [sent-171, score-0.177]

83 We varied the input voltage level and obtained frequency responses at Stage 100 (Figure 6B). [sent-172, score-0.154]

84 Input voltage level increases linearly such that the current increases exponentially; the input current level (in dB) was estimated based on the measured κ for this chip. [sent-173, score-0.038]
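
The relation between voltage steps and input level in dB can be written out explicitly. The sketch assumes the subthreshold exponential law I ∝ exp(κV/UT), so a voltage step ΔV multiplies the current by exp(κΔV/UT) and changes the level by 20·log10 of that ratio, which is linear in ΔV. The value of κ used below is a placeholder, not the value measured for this chip.

```python
import math

def level_change_db(delta_v, kappa, u_t=0.0259):
    """Current-level change in dB for a voltage step delta_v (in volts)."""
    current_ratio = math.exp(kappa * delta_v / u_t)
    return 20.0 * math.log10(current_ratio)

# Placeholder kappa of 0.7; equal voltage steps give equal dB steps.
for millivolts in (10, 20, 30):
    print(millivolts, "mV ->", round(level_change_db(millivolts * 1e-3, 0.7), 2), "dB")
```

Equal voltage increments therefore map to equal dB increments, which is why a linearly stepped input voltage gives evenly spaced levels on a logarithmic plot.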

85 As expected, we observed linearly increasing responses at low frequencies in the logarithmic plot. [sent-174, score-0.073]

86 In contrast, the responses around the CF increase less and become broader with increasing input level as saturation takes effect in that region (resembling a passive cochlea). [sent-175, score-0.182]

87 These chip measurements demonstrate that inclusion of ABC, simply through coupling neighboring BM elements, transforms a passive cochlea into an active one. [sent-178, score-0.502]

88 This active cochlear model’s nonlinear responses are qualitatively comparable to physiological data. [sent-179, score-0.571]

89 6 Conclusions We presented an analog VLSI implementation of a 2D nonlinear cochlear model that utilizes a novel active mechanism, ABC, which we proposed to account for the cochlear ampliﬁer. [sent-180, score-0.977]

90 ABC was shown to pump energy into the traveling wave. [sent-181, score-0.03]

91 Figure 6: Measured BM-velocity responses (cont’d). Panel B axis: frequency (kHz). [sent-185, score-0.073]

92 Peak shifts to earlier (basal) stages as input frequency increases from 500 Hz to 4 kHz. [sent-187, score-0.043]

93 Implemented in the log-domain, with Class AB operation, our silicon cochlea shows enhanced frequency responses, with compressive behavior around the CF, when ABC is turned on. [sent-190, score-0.384]

94 (2005) A 100-Channel analog CMOS auditory ﬁlter bank for speech recognition. [sent-207, score-0.074]

95 (2003) A linear cochlear model with active bi-directional coupling. [sent-212, score-0.472]

96 (1996) An analog VLSI cochlear model with new transconductance ampliﬁer and nonlinear gain control. [sent-220, score-0.505]

97 (1997) The location of the cochlear ampliﬁer: Spatial representation of a single tone on the guinea pig basilar membrane. [sent-230, score-0.468]

98 (1992) A contrast sensitive silicon retina with reciprocal synapses. [sent-249, score-0.069]

99 (2005) An On-Off log-domain circuit that recreates adaptive ﬁltering in the retina. [sent-266, score-0.082]

100 (1997) Basilar membrane responses to tones at the base of the chinchilla cochlea. [sent-275, score-0.107]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bm', 0.474), ('cochlear', 0.405), ('ohc', 0.34), ('cochlea', 0.239), ('iin', 0.214), ('iout', 0.201), ('uid', 0.2), ('imem', 0.189), ('abc', 0.15), ('apical', 0.088), ('php', 0.088), ('io', 0.088), ('lpf', 0.088), ('segment', 0.082), ('circuit', 0.082), ('fohc', 0.075), ('stereocilia', 0.075), ('analog', 0.074), ('responses', 0.073), ('velocity', 0.072), ('chip', 0.071), ('silicon', 0.069), ('active', 0.067), ('apex', 0.063), ('basilar', 0.063), ('feedbackward', 0.063), ('saturation', 0.061), ('vlsi', 0.061), ('forces', 0.059), ('bidirectional', 0.056), ('db', 0.052), ('dc', 0.052), ('impedance', 0.05), ('isb', 0.05), ('isf', 0.05), ('stapes', 0.05), ('stiffness', 0.05), ('ampli', 0.048), ('passive', 0.048), ('coupling', 0.047), ('tilt', 0.047), ('longitudinal', 0.044), ('transistor', 0.044), ('iq', 0.044), ('mead', 0.044), ('motile', 0.044), ('frequency', 0.043), ('circuits', 0.041), ('cp', 0.04), ('wave', 0.04), ('pressure', 0.04), ('voltage', 0.038), ('diffusor', 0.038), ('downstream', 0.038), ('ducts', 0.038), ('isat', 0.038), ('lpfs', 0.038), ('lyon', 0.038), ('motility', 0.038), ('rfb', 0.038), ('cf', 0.037), ('hair', 0.037), ('feedforward', 0.037), ('boahen', 0.035), ('base', 0.034), ('compressive', 0.033), ('radians', 0.033), ('resistive', 0.033), ('upstream', 0.033), ('vout', 0.033), ('peak', 0.033), ('design', 0.032), ('dt', 0.031), ('measurements', 0.03), ('grids', 0.03), ('organ', 0.03), ('traveling', 0.03), ('vibration', 0.03), ('stage', 0.03), ('biology', 0.029), ('ab', 0.029), ('basal', 0.028), ('khz', 0.028), ('rl', 0.027), ('currents', 0.027), ('nonlinear', 0.026), ('force', 0.026), ('cfs', 0.025), ('cochleae', 0.025), ('corti', 0.025), ('deiter', 0.025), ('duct', 0.025), ('electromotility', 0.025), ('exerted', 0.025), ('ihc', 0.025), ('lamina', 0.025), ('microanatomy', 0.025), ('nodal', 0.025), ('ohcs', 0.025), ('phalangeal', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999911 17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

2 0.16849484 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

Abstract: An increasing number of projects in neuroscience requires the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, regularizing against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, LASSO regression and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines.

3 0.15368457 192 nips-2005-The Information-Form Data Association Filter

Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun

Abstract: This paper presents a new ﬁlter for online data association problems in high-dimensional spaces. The key innovation is a representation of the data association posterior in information form, in which the “proximity” of objects and tracks are expressed by numerical links. Updating these links requires linear time, compared to exponential time required for computing the exact posterior probabilities. The paper derives the algorithm formally and provides comparative results using data obtained by a real-world camera array and by a large-scale sensor network simulation.

4 0.075547703 22 nips-2005-An Analog Visual Pre-Processing Processor Employing Cyclic Line Access in Only-Nearest-Neighbor-Interconnects Architecture

Author: Yusuke Nakashita, Yoshio Mita, Tadashi Shibata

Abstract: An analog focal-plane processor with a 128×128 photodiode array has been developed for directional edge filtering. It can perform 4×4-pixel kernel convolution over all pixels with only 256 steps of simple analog processing. A newly developed cyclic line access and row-parallel processing scheme, in conjunction with the “only-nearest-neighbor interconnects” architecture, has enabled a very simple implementation. A proof-of-concept chip was fabricated in a 0.35-µm 2-poly 3-metal CMOS technology, and edge filtering at a rate of 200 frames/s has been experimentally demonstrated.

5 0.064782627 118 nips-2005-Learning in Silicon: Timing is Everything

Author: John V. Arthur, Kwabena Boahen

Abstract: We describe a neuromorphic chip that uses binary synapses with spike timing-dependent plasticity (STDP) to learn stimulated patterns of activity and to compensate for variability in excitability. Speciﬁcally, STDP preferentially potentiates (turns on) synapses that project from excitable neurons, which spike early, to lethargic neurons, which spike late. The additional excitatory synaptic current makes lethargic neurons spike earlier, thereby causing neurons that belong to the same pattern to spike in synchrony. Once learned, an entire pattern can be recalled by stimulating a subset. 1 Variability in Neural Systems Evidence suggests precise spike timing is important in neural coding, speciﬁcally, in the hippocampus. The hippocampus uses timing in the spike activity of place cells (in addition to rate) to encode location in space [1]. Place cells employ a phase code: the timing at which a neuron spikes relative to the phase of the inhibitory theta rhythm (5-12Hz) conveys information. As an animal approaches a place cell’s preferred location, the place cell not only increases its spike rate, but also spikes at earlier phases in the theta cycle. To implement a phase code, the theta rhythm is thought to prevent spiking until the input synaptic current exceeds the sum of the neuron threshold and the decreasing inhibition on the downward phase of the cycle [2]. However, even with identical inputs and common theta inhibition, neurons do not spike in synchrony. Variability in excitability spreads the activity in phase. Lethargic neurons (such as those with high thresholds) spike late in the theta cycle, since their input exceeds the sum of the neuron threshold and theta inhibition only after the theta inhibition has had time to decrease. Conversely, excitable neurons (such as those with low thresholds) spike early in the theta cycle. Consequently, variability in excitability translates into variability in timing. 
We hypothesize that the hippocampus achieves its precise spike timing (about 10 ms) through plasticity enhanced phase-coding (PEP). The source of hippocampal timing precision in the presence of variability (and noise) remains unexplained. Synaptic plasticity can compensate for variability in excitability if it increases excitatory synaptic input to neurons in inverse proportion to their excitabilities. Recasting this in a phase-coding framework, we desire a learning rule that increases excitatory synaptic input to neurons directly related to their phases. Neurons that lag require additional synaptic input, whereas neurons that lead require none. The spike timing-dependent plasticity (STDP) observed in the hippocampus satisfies this requirement [3]. It requires repeated pre-before-post spike pairings (within a time window) to potentiate and repeated post-before-pre pairings to depress a synapse.
Figure 1: STDP Chip. A The chip has a 16-by-16 array of microcircuits; one microcircuit includes four principal neurons, each with 21 STDP circuits. B The STDP Chip is embedded in a circuit board including DACs, a CPLD, a RAM chip, and a USB chip, which communicates with a PC.
Here we validate our hypothesis with a model implemented in silicon, where variability is as ubiquitous as it is in biology [4]. Section 2 presents our silicon system, including the STDP Chip. Section 3 describes and characterizes the STDP circuit. Section 4 demonstrates that PEP compensates for variability and provides evidence that STDP is the compensation mechanism. Section 5 explores a desirable consequence of PEP: unconventional associative pattern recall. Section 6 discusses the implications of the PEP model, including its benefits and applications in the engineering of neuromorphic systems and in the study of neurobiology. 2 Silicon System We have designed, submitted, and tested a silicon implementation of PEP.
The STDP Chip was fabricated through MOSIS in a 1P5M 0.25µm CMOS process, with just under 750,000 transistors in just over 10mm2 of area. It has a 32 by 32 array of excitatory principal neurons commingled with a 16 by 16 array of inhibitory interneurons that are not used here (Figure 1A). Each principal neuron has 21 STDP synapses. The address-event representation (AER) [5] is used to transmit spikes off chip and to receive afferent and recurrent spike input. To conﬁgure the STDP Chip as a recurrent network, we embedded it in a circuit board (Figure 1B). The board has ﬁve primary components: a CPLD (complex programmable logic device), the STDP Chip, a RAM chip, a USB interface chip, and DACs (digital-to-analog converters). The central component in the system is the CPLD. The CPLD handles AER trafﬁc, mediates communication between devices, and implements recurrent connections by accessing a lookup table, stored in the RAM chip. The USB interface chip provides a bidirectional link with a PC. The DACs control the analog biases in the system, including the leak current, which the PC varies in real-time to create the global inhibitory theta rhythm. The principal neuron consists of a refractory period and calcium-dependent potassium circuit (RCK), a synapse circuit, and a soma circuit (Figure 2A). RCK and the synapse are ISOMA Soma Synapse STDP Presyn. Spike PE LPF A Presyn. Spike Raster AH 0 0.1 Spike probability RCK Postsyn. Spike B 0.05 0.1 0.05 0.1 0.08 0.06 0.04 0.02 0 0 Time(s) Figure 2: Principal neuron. A A simpliﬁed schematic is shown, including: the synapse, refractory and calcium-dependent potassium channel (RCK), soma, and axon-hillock (AH) circuits, plus their constituent elements, the pulse extender (PE) and the low-pass ﬁlter (LPF). B Spikes (dots) from 81 principal neurons are temporally dispersed, when excited by poisson-like inputs (58Hz) and inhibited by the common 8.3Hz theta rhythm (solid line). The histogram includes spikes from ﬁve theta cycles. 
composed of two reusable blocks: the low-pass ﬁlter (LPF) and the pulse extender (PE). The soma is a modiﬁed version of the LPF, which receives additional input from an axonhillock circuit (AH). RCK is inhibitory to the neuron. It consists of a PE, which models calcium inﬂux during a spike, and a LPF, which models calcium buffering. When AH ﬁres a spike, a packet of charge is dumped onto a capacitor in the PE. The PE’s output activates until the charge decays away, which takes a few milliseconds. Also, while the PE is active, charge accumulates on the LPF’s capacitor, lowering the LPF’s output voltage. Once the PE deactivates, this charge leaks away as well, but this takes tens of milliseconds because the leak is smaller. The PE’s and the LPF’s inhibitory effects on the soma are both described below in terms of the sum (ISHUNT ) of the currents their output voltages produce in pMOS transistors whose sources are at Vdd (see Figure 2A). Note that, in the absence of spikes, these currents decay exponentially, with a time-constant determined by their respective leaks. The synapse circuit is excitatory to the neuron. It is composed of a PE, which represents the neurotransmitter released into the synaptic cleft, and a LPF, which represents the bound neurotransmitter. The synapse circuit is similar to RCK in structure but differs in function: It is activated not by the principal neuron itself but by the STDP circuits (or directly by afferent spikes that bypass these circuits, i.e., ﬁxed synapses). The synapse’s effect on the soma is also described below in terms of the current (ISYN ) its output voltage produces in a pMOS transistor whose source is at Vdd. The soma circuit is a leaky integrator. It receives excitation from the synapse circuit and shunting inhibition from RCK and has a leak current as well. 
Its temporal behavior is described by: $\tau \, \frac{dI_{SOMA}}{dt} + I_{SOMA} = \frac{I_{SYN} I_0}{I_{SHUNT}}$, where ISOMA is the current the capacitor’s voltage produces in a pMOS transistor whose source is at Vdd (see Figure 2A). ISHUNT is the sum of the leak, refractory, and calcium-dependent potassium currents. These currents also determine the time constant: $\tau = \frac{C U_t}{\kappa I_{SHUNT}}$, where I0 and κ are transistor parameters and Ut is the thermal voltage.
Figure 3: STDP circuit design and characterization. A The circuit is composed of three subcircuits: decay, integrator, and SRAM. B The circuit potentiates when the presynaptic spike precedes the postsynaptic spike and depresses when the postsynaptic spike precedes the presynaptic spike. (Panel B plots the inverse number of pairings against spike timing, t pre − t post, in ms.)
The soma circuit is connected to an AH, the locus of spike generation. The AH consists of model voltage-dependent sodium and potassium channel populations (modified from [6] by Kai Hynna). It initiates the AER signaling process required to send a spike off chip. To characterize principal neuron variability, we excited 81 neurons with poisson-like 58 Hz spike trains (Figure 2B). We made these spike trains poisson-like by starting with a regular 200 Hz spike train and dropping spikes randomly, with probability of 0.71. Thus spikes were delivered to neurons that won the coin toss in synchrony every 5 ms. However, neurons did not lock onto the input synchrony due to filtering by the synaptic time constant (see Figure 2B). They also received a common inhibitory input at the theta frequency (8.3 Hz), via their leak current. Each neuron was prevented from firing more than one spike in a theta cycle by its model calcium-dependent potassium channel population. The principal neurons’ spike times were variable.
To quantify the spike variability, we used timing precision, which we deﬁne as twice the standard deviation of spike times accumulated from ﬁve theta cycles. With an input rate of 58Hz the timing precision was 34ms. 3 STDP Circuit The STDP circuit (related to [7]-[8]), for which the STDP Chip is named, is the most abundant, with 21,504 copies on the chip. This circuit is built from three subcircuits: decay, integrator, and SRAM (Figure 3A). The decay and integrator are used to implement potentiation, and depression, in a symmetric fashion. The SRAM holds the current binary state of the synapse, either potentiated or depressed. For potentiation, the decay remembers the last presynaptic spike. Its capacitor is charged when that spike occurs and discharges linearly thereafter. A postsynaptic spike samples the charge remaining on the capacitor, passes it through an exponential function, and dumps the resultant charge into the integrator. This charge decays linearly thereafter. At the time of the postsynaptic spike, the SRAM, a cross-coupled inverter pair, reads the voltage on the integrator’s capacitor. If it exceeds a threshold, the SRAM switches state from depressed to potentiated (∼LTD goes high and ∼LTP goes low). The depression side of the STDP circuit is exactly symmetric, except that it responds to postsynaptic activation followed by presynaptic activation and switches the SRAM’s state from potentiated to depressed (∼LTP goes high and ∼LTD goes low). When the SRAM is in the potentiated state, the presynaptic 50 After STDP 83 92 100 Timing precision(ms) Before STDP 75 B Before STDP After STDP 40 30 20 10 0 50 60 70 80 90 Input rate(Hz) 100 50 58 67 text A 0.2 0.4 Time(s) 0.6 0.2 0.4 Time(s) 0.6 C Figure 4: Plasticity enhanced phase-coding. A Spike rasters of 81 neurons (9 by 9 cluster) display synchrony over a two-fold range of input rates after STDP. B The degree of enhancement is quantiﬁed by timing precision. 
C Each neuron (center box) sends synapses to (dark gray) and receives synapses from (light gray) twenty-one randomly chosen neighbors up to ﬁve nodes away (black indicates both connections). spike activates the principal neuron’s synapse; otherwise the spike has no effect. We characterized the STDP circuit by activating a plastic synapse and a ﬁxed synapse– which elicits a spike at different relative times. We repeated this pairing at 16Hz. We counted the number of pairings required to potentiate (or depress) the synapse. Based on this count, we calculated the efﬁcacy of each pairing as the inverse number of pairings required (Figure 3B). For example, if twenty pairings were required to potentiate the synapse, the efﬁcacy of that pre-before-post time-interval was one twentieth. The efﬁcacy of both potentiation and depression are ﬁt by exponentials with time constants of 11.4ms and 94.9ms, respectively. This behavior is similar to that observed in the hippocampus: potentiation has a shorter time constant and higher maximum efﬁcacy than depression [3]. 4 Recurrent Network We carried out an experiment designed to test the STDP circuit’s ability to compensate for variability in spike timing through PEP. Each neuron received recurrent connections from 21 randomly selected neurons within an 11 by 11 neighborhood centered on itself (see Figure 4C). Conversely, it made recurrent connections to randomly chosen neurons within the same neighborhood. These connections were mediated by STDP circuits, initialized to the depressed state. We chose a 9 by 9 cluster of neurons and delivered spikes at a mean rate of 50 to 100Hz to each one (dropping spikes with a probability of 0.75 to 0.5 from a regular 200Hz train) and provided common theta inhibition as before. We compared the variability in spike timing after ﬁve seconds of learning with the initial distribution. Phase coding was enhanced after STDP (Figure 4A). 
Before STDP, spike timing among neurons was highly variable (except for the very highest input rate). After STDP, variability was virtually eliminated (except for the very lowest input rate). Initially, the variability, characterized by timing precision, was inversely related to the input rate, decreasing from 34 to 13ms. After ﬁve seconds of STDP, variability decreased and was largely independent of input rate, remaining below 11ms. Potentiated synapses 25 A Synaptic state after STDP 20 15 10 5 0 B 50 100 150 200 Spiking order 250 Figure 5: Compensating for variability. A Some synapses (dots) become potentiated (light) while others remain depressed (dark) after STDP. B The number of potentiated synapses neurons make (pluses) and receive (circles) is negatively (r = -0.71) and positively (r = 0.76) correlated to their rank in the spiking order, respectively. Comparing the number of potentiated synapses each neuron made or received with its excitability conﬁrmed the PEP hypothesis (i.e., leading neurons provide additional synaptic current to lagging neurons via potentiated recurrent synapses). In this experiment, to eliminate variability due to noise (as opposed to excitability), we provided a 17 by 17 cluster of neurons with a regular 200Hz excitatory input. Theta inhibition was present as before and all synapses were initialized to the depressed state. After 10 seconds of STDP, a large fraction of the synapses were potentiated (Figure 5A). When the number of potentiated synapses each neuron made or received was plotted versus its rank in spiking order (Figure 5B), a clear correlation emerged (r = -0.71 or 0.76, respectively). As expected, neurons that spiked early made more and received fewer potentiated synapses. In contrast, neurons that spiked late made fewer and received more potentiated synapses. 
5 Pattern Completion After STDP, we found that the network could recall an entire pattern given a subset, thus the same mechanisms that compensated for variability and noise could also compensate for lack of information. We chose a 9 by 9 cluster of neurons as our pattern and delivered a poisson-like spike train with mean rate of 67Hz to each one as in the ﬁrst experiment. Theta inhibition was present as before and all synapses were initialized to the depressed state. Before STDP, we stimulated a subset of the pattern and only neurons in that subset spiked (Figure 6A). After ﬁve seconds of STDP, we stimulated the same subset again. This time they recruited spikes from other neurons in the pattern, completing it (Figure 6B). Upon varying the fraction of the pattern presented, we found that the fraction recalled increased faster than the fraction presented. We selected subsets of the original pattern randomly, varying the fraction of neurons chosen from 0.1 to 1.0 (ten trials for each). We classiﬁed neurons as active if they spiked in the two second period over which we recorded. Thus, we characterized PEP’s pattern-recall performance as a function of the probability that the pattern in question’s neurons are activated (Figure 6C). At a fraction of 0.50 presented, nearly all of the neurons in the pattern are consistently activated (0.91±0.06), showing robust pattern completion. We ﬁtted the recall performance with a sigmoid that reached 0.50 recall fraction with an input fraction of 0.30. No spurious neurons were activated during any trials. Rate(Hz) Rate(Hz) 8 7 7 6 6 5 5 0.6 0.4 2 0.2 0 0 3 3 2 1 1 A 0.8 4 4 Network activity before STDP 1 Fraction of pattern actived 8 0 B Network activity after STDP C 0 0.2 0.4 0.6 0.8 Fraction of pattern stimulated 1 Figure 6: Associative recall. A Before STDP, half of the neurons in a pattern are stimulated; only they are activated. B After STDP, half of the neurons in a pattern are stimulated, and all are activated. 
C The fraction of the pattern activated grows faster than the fraction stimulated. 6 Discussion Our results demonstrate that PEP successfully compensates for graded variations in our silicon recurrent network using binary (on–off) synapses (in contrast with [8], where weights are graded). While our chip results are encouraging, variability was not eliminated in every case. In the case of the lowest input (50Hz), we see virtually no change (Figure 4A). We suspect the timing remains imprecise because, with such low input, neurons do not spike every theta cycle and, consequently, provide fewer opportunities for the STDP synapses to potentiate. This shortfall illustrates the system’s limits; it can only compensate for variability within certain bounds, and only for activity appropriate to the PEP model. As expected, STDP is the mechanism responsible for PEP. STDP potentiated recurrent synapses from leading neurons to lagging neurons, reducing the disparity among the diverse population of neurons. Even though the STDP circuits are themselves variable, with different efﬁcacies and time constants, when using timing the sign of the weight-change is always correct (data not shown). For this reason, we chose STDP over other more physiological implementations of plasticity, such as membrane-voltage-dependent plasticity (MVDP), which has the capability to learn with graded voltage signals [9], such as those found in active dendrites, providing more computational power [10]. Previously, we investigated a MVDP circuit, which modeled a voltage-dependent NMDAreceptor-gated synapse [11]. It potentiated when the calcium current analog exceeded a threshold, which was designed to occur only during a dendritic action potential. This circuit produced behavior similar to STDP, implying it could be used in PEP. 
However, it was sensitive to variability in the NMDA and potentiation thresholds, causing a fraction of the population to potentiate anytime the synapse received an input and another fraction to never potentiate, rendering both subpopulations useless. Therefore, the simpler, less biophysical STDP circuit won out over the MVDP circuit: In our system timing is everything. Associative storage and recall naturally emerge in the PEP network when synapses between neurons coactivated by a pattern are potentiated. These synapses allow neurons to recruit their peers when a subset of the pattern is presented, thereby completing the pattern. However, this form of pattern storage and completion differs from Hopﬁeld’s attractor model [12] . Rather than forming symmetric, recurrent neuronal circuits, our recurrent network forms asymmetric circuits in which neurons make connections exclusively to less excitable neurons in the pattern. In both the poisson-like and regular cases (Figures 4 & 5), only about six percent of potentiated connections were reciprocated, as expected by chance. We plan to investigate the storage capacity of this asymmetric form of associative memory. Our system lends itself to modeling brain regions that use precise spike timing, such as the hippocampus. We plan to extend the work presented to store and recall sequences of patterns, as the hippocampus is hypothesized to do. Place cells that represent different locations spike at different phases of the theta cycle, in relation to the distance to their preferred locations. This sequential spiking will allow us to link patterns representing different locations in the order those locations are visited, thereby realizing episodic memory. We propose PEP as a candidate neural mechanism for information coding and storage in the hippocampal system. 
Observations from the CA1 region of the hippocampus suggest that basal dendrites (which primarily receive excitation from recurrent connections) support submillisecond timing precision, consistent with PEP [13]. We have shown, in a silicon model, PEP’s ability to exploit such fast recurrent connections to sharpen timing precision as well as to associatively store and recall patterns. Acknowledgments We thank Joe Lin for assistance with chip generation. The Office of Naval Research funded this work (Award No. N000140210468). References [1] O’Keefe J. & Recce M.L. (1993) Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus 3(3):317-330. [2] Mehta M.R., Lee A.K. & Wilson M.A. (2002) Role of experience and oscillations in transforming a rate code into a temporal code. Nature 417(6890):741-746. [3] Bi G.Q. & Wang H.X. (2002) Temporal asymmetry in spike timing-dependent synaptic plasticity. Physiology & Behavior 77:551-555. [4] Rodriguez-Vazquez A., Linan G., Espejo S. & Dominguez-Castro R. (2003) Mismatch-induced trade-offs and scalability of analog preprocessing visual microprocessor chips. Analog Integrated Circuits and Signal Processing 37:73-83. [5] Boahen K.A. (2000) Point-to-point connectivity between neuromorphic chips using address events. IEEE Transactions on Circuits and Systems II 47:416-434. [6] Culurciello E.R., Etienne-Cummings R. & Boahen K.A. (2003) A biomorphic digital image sensor. IEEE Journal of Solid State Circuits 38:281-294. [7] Bofill A., Murray A.F. & Thompson D.P. (2005) Circuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning. In: Advances in Neural Information Processing Systems 14, MIT Press, 2002. [8] Cameron K., Boonsobhak V., Murray A. & Renshaw D. (2005) Spike timing dependent plasticity (STDP) can ameliorate process variations in neuromorphic VLSI. IEEE Transactions on Neural Networks 16(6):1626-1627. 
[9] Chicca E., Badoni D., Dante V., D’Andreagiovanni M., Salina G., Carota L., Fusi S. & Del Giudice P. (2003) A VLSI recurrent network of integrate-and-fire neurons connected by plastic synapses with long-term memory. IEEE Transactions on Neural Networks 14(5):1297-1307. [10] Poirazi P. & Mel B.W. (2001) Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron 29(3):779-796. [11] Arthur J.V. & Boahen K. (2004) Recurrently connected silicon neurons with active dendrites for one-shot learning. In: IEEE International Joint Conference on Neural Networks 3, pp.1699-1704. [12] Hopfield J.J. (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences 81(10):3088-3092. [13] Ariav G., Polsky A. & Schiller J. (2003) Submillisecond precision of the input-output transformation function mediated by fast sodium dendritic spikes in basal dendrites of CA1 pyramidal neurons. Journal of Neuroscience 23(21):7750-7758.

6 0.06435845 88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI

7 0.061481729 25 nips-2005-An aVLSI Cricket Ear Model

8 0.058467954 1 nips-2005-AER Building Blocks for Multi-Layer Multi-Chip Neuromorphic Vision Systems

9 0.053262696 74 nips-2005-Faster Rates in Regression via Active Learning

10 0.053101234 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

11 0.052401178 109 nips-2005-Learning Cue-Invariant Visual Responses

12 0.049150441 176 nips-2005-Silicon growth cones map silicon retina

13 0.048924439 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

14 0.042120658 40 nips-2005-CMOL CrossNets: Possible Neuromorphic Nanoelectronic Circuits

15 0.037916411 3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence

16 0.036452923 181 nips-2005-Spiking Inputs to a Winner-take-all Network

17 0.035198916 167 nips-2005-Robust design of biological experiments

18 0.034931589 7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects

19 0.032501079 19 nips-2005-Active Learning for Misspecified Models

20 0.031159502 41 nips-2005-Coarse sample complexity bounds for active learning

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.091), (1, -0.076), (2, -0.025), (3, 0.045), (4, 0.005), (5, 0.056), (6, -0.089), (7, -0.041), (8, 0.082), (9, 0.053), (10, -0.193), (11, 0.02), (12, 0.017), (13, 0.108), (14, -0.098), (15, -0.047), (16, -0.087), (17, 0.027), (18, -0.095), (19, 0.068), (20, 0.236), (21, 0.136), (22, 0.002), (23, -0.198), (24, -0.003), (25, 0.058), (26, 0.147), (27, 0.064), (28, -0.071), (29, -0.061), (30, -0.054), (31, 0.043), (32, -0.028), (33, -0.1), (34, 0.073), (35, -0.061), (36, -0.081), (37, 0.103), (38, 0.079), (39, -0.082), (40, 0.151), (41, -0.03), (42, -0.009), (43, -0.149), (44, -0.078), (45, -0.125), (46, 0.052), (47, -0.047), (48, 0.018), (49, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97608316 17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

2 0.59877115 155 nips-2005-Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Author: Jo-anne Ting, Aaron D'souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato

3 0.53899276 192 nips-2005-The Information-Form Data Association Filter

Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun

4 0.36615032 25 nips-2005-An aVLSI Cricket Ear Model

Author: Andre V. Schaik, Richard Reeve, Craig Jin, Tara Hamilton

Abstract: Female crickets can locate males by phonotaxis to the mating song they produce. The behaviour and underlying physiology have been studied in some depth, showing that the cricket auditory system solves this complex problem in a unique manner. We present an analogue very large scale integrated (aVLSI) circuit model of this process and show that results from testing the circuit agree with simulation and what is known from the behaviour and physiology of the cricket auditory system. The aVLSI circuitry is now being extended to use on a robot along with previously modelled neural circuitry to better understand the complete sensorimotor pathway. 1 Introduction Understanding how insects carry out complex sensorimotor tasks can help in the design of simple sensory and robotic systems. Often insect sensors have evolved into intricate filters matched to extract highly specific data from the environment which solves a particular problem directly with little or no need for further processing [1]. Examples include head stabilisation in the fly, which uses vision amongst other senses to estimate self-rotation and thus to stabilise its head in flight, and phonotaxis in the cricket. Because of the narrowness of the cricket body (only a few millimetres), the Interaural Time Difference (ITD) for sounds arriving at the two sides of the head is very small (10–20µs). Even with the tympanal membranes (eardrums) located, as they are, on the forelegs of the cricket, the ITD only reaches about 40µs, which is too low to detect directly from timings of neural spikes. Because the wavelength of the cricket calling song is significantly greater than the width of the cricket body, the Interaural Intensity Difference (IID) is also very low. In the absence of ITD or IID information, the cricket uses phase to determine direction. This is possible because the male cricket produces an almost pure tone for its calling song. 
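As a rough plausibility check on the ITD figures quoted above, the largest possible ITD for two receivers a distance d apart is d/c, reached when the sound travels along the line joining them. The sketch below assumes a speed of sound of 343 m/s (a textbook value for air at about 20 °C, not a number from the paper) and ignores diffraction and the internal tracheal paths; the function names are illustrative only.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees C


def max_itd_us(separation_mm: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Largest interaural time difference (microseconds) for two receivers
    `separation_mm` apart, i.e. sound arriving along the receiver axis."""
    return separation_mm * 1e-3 / c * 1e6


def separation_for_itd_mm(itd_us: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Receiver separation (millimetres) that would produce a given maximum ITD."""
    return itd_us * 1e-6 * c * 1e3


if __name__ == "__main__":
    # ITD values quoted in the text: 10-20 us across the body, about 40 us at the forelegs.
    for itd in (10.0, 20.0, 40.0):
        print(f"ITD {itd:4.0f} us  <->  separation {separation_for_itd_mm(itd):5.2f} mm")
```

Under these assumptions, 10–20 µs corresponds to separations of a few millimetres and 40 µs to a little over a centimetre, which is in line with the body and foreleg dimensions the text describes.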
* School of Electrical and Information Engineering, Institute of Perception, Action and Behaviour. + Figure 1: The cricket auditory system. Four acoustic inputs channel sounds directly or through tracheal tubes onto two tympanal membranes. Sound from contralateral inputs has to pass a (double) central membrane (the medial septum), inducing a phase delay and reduction in gain. The sound transmission from the contralateral tympanum is very weak, making each eardrum effectively a 3 input system. The physics of the cricket auditory system is well understood [2]; the system (see Figure 1) uses a pair of sound receivers with four acoustic inputs, two on the forelegs, which are the external surfaces of the tympana, and two on the body, the prothoracic or acoustic spiracles [3]. The connecting tracheal tubes are such that interference occurs as sounds travel inside the cricket, producing a directional response at the tympana to frequencies near to that of the calling song. The amplitude of vibration of the tympana, and hence the firing rate of the auditory afferent neurons attached to them, vary as a sound source is moved around the cricket and the sounds from the different inputs move in and out of phase. The outputs of the two tympana match when the sound is straight ahead, and the inputs are bilaterally symmetric with respect to the sound source. However, when sound at the calling song frequency is off-centre the phase of signals on the closer side comes better into alignment, and the signal increases on that side, and conversely decreases on the other. It is that crossover of tympanal vibration amplitudes which allows the cricket to track a sound source (see Figure 6 for example). A simplified version of the auditory system using only two acoustic inputs was implemented in hardware [4], and a simple 8-neuron network was all that was required to then direct a robot to carry out phonotaxis towards a species-specific calling song [5]. 
A simple simulator was also created to model the behaviour of the auditory system of Figure 1 at different frequencies [6]. Data from Michelsen et al. [2] (Figures 5 and 6) were digitised, and used together with average and “typical” values from the paper to choose gains and delays for the simulation. Figure 2 shows the model of the internal auditory system of the cricket from sound arriving at the acoustic inputs through to transmission down auditory receptor fibres. The simulator implements this model up to the summing of the delayed inputs, as well as modelling the external sound transmission. Results from the simulator were used to check the directionality of the system at different frequencies, and to gain a better understanding of its response. It was impractical to check the effect of leg movements or of complex sounds in the simulator due to the necessity of simulating the sound production and transmission. An aVLSI chip was designed to implement the same model, both allowing more complex experiments, such as leg movements to be run, and experiments to be run in the real world. Figure 2: A model of the auditory system of the cricket, used to build the simulator and the aVLSI implementation (shown in boxes). These experiments with the simulator and the circuits are being published in [6] and the reader is referred to those papers for more details. In the present paper we present the details of the circuits used for the aVLSI implementation. 2 Circuits The chip, implementing the aVLSI box in Figure 2, comprises two all-pass delay filters, three gain circuits, a second-order narrow-band band-pass filter, a first-order wide-band band-pass filter, a first-order high-pass filter, as well as supporting circuitry (including reference voltages, currents, etc.). A single aVLSI chip (MOSIS tiny-chip) thus includes half the necessary circuitry to model the complete auditory system of a cricket. 
The complete model of the auditory system can be obtained by using two appropriately connected chips. Only two all-pass delay filters need to be implemented instead of three as suggested by Figure 2, because it is only the relative delay between the three pathways arriving at the one summing node that counts. The delay circuits were implemented with fully-differential gm-C filters. In order to extend the frequency range of the delay, a first-order all-pass delay circuit was cascaded with a second-order all-pass delay circuit. The resulting addition of the first-order delay and the second-order delay allowed for an approximately flat delay response for a wider bandwidth as the decreased delay around the corner frequency of the first-order filter cancelled with the increased delay of the second-order filter around its resonant frequency. Figure 3 shows the first- and second-order sections of the all-pass delay circuit. Two of these circuits were used and, based on data presented in [2], were designed with delays of 28µs and 62µs, by way of bias current manipulation. The operational transconductance amplifier (OTA) in Figure 3 is a standard OTA which includes the common-mode feedback necessary for fully differential designs. The buffers (Figure 3) are simple, cascoded differential pairs. Figure 3: The first-order all-pass delay circuit (left) and the second-order all-pass delay (right). The differential output of the delay circuits is converted into a current which is multiplied by a variable gain implemented as shown in Figure 4. The gain cell includes a differential pair with source degeneration via transistors N4 and N5. The source degeneration improves the linearity of the current. The three gain cells implemented on the aVLSI have default gains of 2, 3 and 0.91 which are set by holding the default input high and appropriately ratioing the bias currents through the value of vbiasp. 
To correct any on-chip mismatches and/or explore other gain configurations, a current splitter cell [7] (p-splitter, Figure 4) allows the gain to be programmed by digital means post fabrication. The current splitter takes an input current (Ibias, Figure 4) and divides it into branches which recursively halve the current, i.e., the first branch gives ½ Ibias, the second branch ¼ Ibias, the third branch 1/8 Ibias and so on. These currents can be used together with digitally controlled switches as a Digital-to-Analogue converter. By holding default low and setting C5:C0 appropriately, any gain from 4 to 0.125 can be set. To save on output pins the program bits (C5:C0) for each of the three gain cells are set via a single 18-bit shift register in bit-serial fashion. Summing the output of the three gain circuits in the current domain simply involves connecting three wires together. Therefore, a natural option for the filters that follow is to use current domain filters. In our case we have chosen to implement log-domain filters using MOS transistors operating in weak inversion. Figure 5 shows the basic building blocks for the filters (the Tau Cell [8] and the multiplier cell) and block diagrams showing how these blocks were connected to create the necessary filtering blocks. The Tau Cell is a log-domain filter which has the first-order response $I_{out}/I_{in} = 1/(s\tau + 1)$, where $\tau = n C_a V_T / I_a$, $n$ is the slope factor, $V_T$ the thermal voltage, $C_a$ the capacitance, and $I_a$ the bias current. In Figure 5, the input currents to the Tau Cell, $I_{mult}$ and $A I_a$, are only used when building a second-order filter. The multiplier cell is simply a translinear loop where $I_{out1} I_{mult} = I_{out2} A I_a$, or $I_{mult} = A I_a I_{out2}/I_{out1}$. The configurations of the Tau Cell to get particular responses are covered in [8] along with the corresponding equations. The high frequency filter of Figure 2 is implemented by the high-pass filter in Figure 5 with a corner frequency of 17kHz. 
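The Tau Cell relation above ties a filter's corner frequency directly to its bias current, since a first-order section with time constant τ has its corner at f_c = 1/(2πτ). The sketch below works through that arithmetic. The slope factor (n = 1.5), capacitance (1 pF) and temperature (300 K) are illustrative assumptions, not values reported for the chip, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage V_T = kT/q in volts."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def tau_cell_time_constant(i_a: float, c_a: float, n: float = 1.5,
                           temp_k: float = 300.0) -> float:
    """tau = n * C_a * V_T / I_a, as stated for the Tau Cell."""
    return n * c_a * thermal_voltage(temp_k) / i_a


def bias_current_for_corner(f_c: float, c_a: float, n: float = 1.5,
                            temp_k: float = 300.0) -> float:
    """Bias current giving corner frequency f_c, from f_c = 1 / (2*pi*tau)."""
    return 2.0 * math.pi * f_c * n * c_a * thermal_voltage(temp_k)


def first_order_lowpass_gain(f: float, f_c: float) -> float:
    """Magnitude of 1/(s*tau + 1) at frequency f for corner frequency f_c."""
    return 1.0 / math.sqrt(1.0 + (f / f_c) ** 2)


if __name__ == "__main__":
    c_a = 1e-12  # assumed 1 pF capacitor (hypothetical value)
    for f_c in (3e3, 12e3, 17e3):  # corner frequencies mentioned in the text
        i_a = bias_current_for_corner(f_c, c_a)
        print(f"f_c = {f_c / 1e3:5.1f} kHz  ->  I_a = {i_a * 1e9:.2f} nA")
```

With these assumed component values the required bias currents come out in the nanoampere range, which is consistent with transistors operating in weak inversion; it also shows why the responses can be retuned after fabrication simply by changing bias currents.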
The low frequency filter, however, is divided into two parts since the biological filter’s response (see for example Figure 3A in [9]) separates well into a narrow second-order band-pass filter with a 10kHz resonant frequency and a wide band-pass filter made from a first-order high-pass filter with a 3kHz corner frequency followed by a first-order low-pass filter with a 12kHz corner frequency. These filters are then added together to reproduce the biological filter. The filters’ responses can be adjusted post fabrication via their bias currents. This allows for compensation due to processing and matching errors. Figure 4: The Gain Cell above is used to convert the differential voltage input from the delay cells into a single-ended current output. The gain of each cell is controllable via a programmable current cell (p_splitter). An on-chip bias generator [7] was used to create all the necessary current biases on the chip. All the main blocks (delays, gain cells and filters), however, can have their on-chip bias currents overridden through external pins on the chip. The chip was fabricated using the MOSIS AMI 1.6µm technology and designed using the Cadence Custom IC Design Tools (5.0.33). 3 Methods The chip was tested using sound generated on a computer and played through a soundcard to the chip. Responses from the chip were recorded by an oscilloscope, and uploaded back to the computer on completion. Given that the output from the chip and the gain circuits is a current, an external current-sense circuit built with discrete components was used to enable the output to be probed by the oscilloscope. Figure 5: The circuit diagrams for the log-domain filter building blocks – The Tau Cell and The Multiplier – along with the block diagrams for the three filters used in the aVLSI model. Initial experiments were performed to tune the delays and gains. After that, recordings were taken of the directional frequency responses. 
Sounds were generated by computer for each chip input to simulate moving the forelegs by delaying the sound by the appropriate amount of time; this was a much simpler solution than using microphones and moving them using motors. 4 Results The aVLSI chip was tested to measure its gains and delays, which were successfully tuned to the appropriate values. The chip was then compared with the simulation to check that it was faithfully modelling the system. A result of this test at 4kHz (approximately the cricket calling-song frequency) is shown in Figure 6. Apart from a drop in amplitude of the signal, the response of the circuit was very similar to that of the simulator. The differences were expected because the aVLSI circuit has to deal with real-world noise, whereas the simulated version has perfect signals. Examples of the gain versus frequency response of the two log-domain band-pass filters are shown in Figure 7. Note that the narrow-band filter peaks at 6kHz, which is significantly above the mating song frequency of the cricket, which is around 4.5kHz. This is not a mistake, but is observed in real crickets as well. As stated in the introduction, a range of further testing results with both the circuit and the simulator are being published in [6]. 5 Discussion The aVLSI auditory sensor in this research models the hearing of the field cricket Gryllus bimaculatus. It is a more faithful model of the cricket auditory system than was previously built in [4], reproducing all the acoustic inputs, as well as the responses to frequencies of both the conspecific calling song and bat echolocation chirps. It also generates outputs corresponding to the two sets of behaviourally relevant auditory receptor fibres. Results showed that it matched the biological data well, though there were some inconsistencies due to an error in the specification that will be addressed in a future iteration of the design. 
A more complete implementation across all frequencies was impractical because of complexity and size issues as well as serving no clear behavioural purpose. Figure 6: Vibration amplitude of the left (dotted) and right (solid) virtual tympana measured in decibels in response to a 4kHz tone in simulation (left) and on the aVLSI chip (right). The plot shows the amplitude of the tympanal responses as the sound source is rotated around the cricket. Figure 7: Frequency-Gain curves for the narrow-band and wide-band bandpass filters. The long-term aim of this work is to better understand simple sensorimotor control loops in crickets and other insects. The next step is to mount this circuitry on a robot to carry out behavioural experiments, which we will compare with existing and new behavioural data (such as that in [10]). This will allow us to refine our models of the neural circuitry involved. Modelling the sensory afferent neurons in hardware is necessary in order to reduce processor load on our robot, so the next revision will include these either onboard, or on a companion chip as we have done before [11]. We will also move both sides of the auditory system onto a single chip to conserve space on the robot. It is our belief and experience that, as a result of this intelligent pre-processing carried out at the sensor level, the neural circuits necessary to accurately model the behaviour will remain simple. Acknowledgments The authors thank the Institute of Neuromorphic Engineering and the UK Biotechnology and Biological Sciences Research Council for funding the research in this paper. References [1] R. Wehner. Matched ﬁlters – neural models of the external world. J Comp Physiol A, 161: 511–531, 1987. [2] A. Michelsen, A. V. Popov, and B. Lewis. Physics of directional hearing in the cricket Gryllus bimaculatus. Journal of Comparative Physiology A, 175:153–164, 1994. [3] A. Michelsen. The tuned cricket. News Physiol. Sci., 13:32–38, 1998. [4] H. H. Lund, B. Webb, and J. 
Hallam. A robot attracted to the cricket species Gryllus bimaculatus. In P. Husbands and I. Harvey, editors, Proceedings of 4th European Conference on Artiﬁcial Life, pages 246–255. MIT Press/Bradford Books, MA., 1997. [5] R Reeve and B. Webb. New neural circuits for robot phonotaxis. Phil. Trans. R. Soc. Lond. A, 361:2245–2266, August 2003. [6] R. Reeve, A. van Schaik, C. Jin, T. Hamilton, B. Torben-Nielsen and B. Webb Directional hearing in a silicon cricket. Biosystems, (in revision), 2005b [7] T. Delbrück and A. van Schaik, Bias Current Generators with Wide Dynamic Range, Analog Integrated Circuits and Signal Processing 42(2), 2005 [8] A. van Schaik and C. Jin, The Tau Cell: A New Method for the Implementation of Arbitrary Differential Equations, IEEE International Symposium on Circuits and Systems (ISCAS) 2003 [9] Kazuo Imaizumi and Gerald S. Pollack. Neural coding of sound frequency by cricket auditory receptors. The Journal of Neuroscience, 19(4):1508– 1516, 1999. [10] Berthold Hedwig and James F.A. Poulet. Complex auditory behaviour emerges from simple reactive steering. Nature, 430:781–785, 2004. [11] R. Reeve, B. Webb, A. Horchler, G. Indiveri, and R. Quinn. New technologies for testing a model of cricket phonotaxis on an outdoor robot platform. Robotics and Autonomous Systems, 51(1):41-54, 2005.

5 0.29541034 40 nips-2005-CMOL CrossNets: Possible Neuromorphic Nanoelectronic Circuits

Author: Jung Hoon Lee, Xiaolong Ma, Konstantin K. Likharev

Abstract: Hybrid “CMOL” integrated circuits, combining a CMOS subsystem with nanowire crossbars and simple two-terminal nanodevices, promise to extend the exponential Moore-Law development of microelectronics into the sub-10-nm range. We are developing neuromorphic network (“CrossNet”) architectures for this future technology, in which neural cell bodies are implemented in CMOS, nanowires are used as axons and dendrites, while nanodevices (bistable latching switches) are used as elementary synapses. We have shown how CrossNets may be trained to perform pattern recovery and classification despite the limitations imposed by the CMOL hardware. Preliminary estimates have shown that CMOL CrossNets may be extremely dense (~$10^7$ cells per cm$^2$) and operate approximately a million times faster than biological neural networks, at manageable power consumption. In Conclusion, we discuss in brief possible short-term and long-term applications of the emerging technology. 1 Introduction: CMOL Circuits Recent results [1, 2] indicate that the current VLSI paradigm based on CMOS technology can hardly be extended beyond the 10-nm frontier: in this range the sensitivity of parameters (most importantly, the gate voltage threshold) of silicon field-effect transistors to inevitable fabrication spreads grows exponentially. This sensitivity will probably send the fabrication facilities costs skyrocketing, and may lead to the end of Moore’s Law some time during the next decade. There is a growing consensus that the impending Moore’s Law crisis may be preempted by a radical paradigm shift from the purely CMOS technology to hybrid CMOS/nanodevice circuits, e.g., those of the “CMOL” variety (Fig. 1). Such circuits (see, e.g., Ref. 3 for their recent review) would combine a level of advanced CMOS devices fabricated by lithographic patterning, and a two-layer nanowire crossbar formed, e.g., by nanoimprint, with nanowires connected by simple, similar, two-terminal nanodevices at each crosspoint. 
For such devices, molecular single-electron latching switches [4] are presently the leading candidates, in particular because they may be fabricated using the self-assembled monolayer (SAM) technique which already gave reproducible results for simpler molecular devices [5]. Fig. 1. CMOL circuit: (a) schematic side view, and (b) top-view zoom-in on several adjacent interface pins. (For clarity, only two adjacent nanodevices are shown.) In order to overcome the CMOS/nanodevice interface problems pertinent to earlier proposals of hybrid circuits [6], in CMOL the interface is provided by pins that are distributed all over the circuit area, on the top of the CMOS stack. This allows the use of advanced nanowire patterning techniques (like nanoimprint) which do not have nanoscale accuracy of layer alignment [3]. The vital feature of this interface is the tilt, by angle $\alpha = \arcsin(F_{nano}/\beta F_{CMOS})$, of the nanowire crossbar relative to the square arrays of interface pins (Fig. 1b). Here $F_{nano}$ is the nanowiring half-pitch, $F_{CMOS}$ is the half-pitch of the CMOS subsystem, and $\beta$ is a dimensionless factor larger than 1 that depends on the CMOS cell complexity. Figure 1b shows that this tilt allows the CMOS subsystem to address each nanodevice even if $F_{nano} \ll \beta F_{CMOS}$. By now, it has been shown that CMOL circuits can combine high performance with high defect tolerance (which is necessary for any circuit using nanodevices) for several digital applications. In particular, CMOL circuits with defect rates below a few percent would enable terabit-scale memories [7], while the performance of FPGA-like CMOL circuits may be several hundred times above that of purely CMOS FPGAs (implemented with the same $F_{CMOS}$), at acceptable power dissipation and defect tolerance above 20% [8]. 
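To get a feel for the crossbar tilt relation α = arcsin(F_nano/(β F_CMOS)) given above, the following sketch evaluates it numerically. The half-pitches and β used in the example are purely hypothetical illustration values, not figures from the paper, and the function name is made up for this example.

```python
import math


def cmol_tilt_angle_rad(f_nano: float, f_cmos: float, beta: float) -> float:
    """Tilt angle alpha = arcsin(F_nano / (beta * F_CMOS)) in radians.

    f_nano and f_cmos must be in the same length unit; beta is the
    dimensionless factor, which the text states is larger than 1.
    """
    if beta <= 1.0:
        raise ValueError("beta is expected to be larger than 1")
    ratio = f_nano / (beta * f_cmos)
    if not 0.0 < ratio <= 1.0:
        raise ValueError("F_nano / (beta * F_CMOS) must lie in (0, 1]")
    return math.asin(ratio)


if __name__ == "__main__":
    # Hypothetical values: F_nano = 3 nm, F_CMOS = 45 nm, beta = 2.
    alpha = cmol_tilt_angle_rad(3.0, 45.0, 2.0)
    print(f"alpha = {alpha:.4f} rad = {math.degrees(alpha):.2f} degrees")
```

When F_nano is much smaller than β F_CMOS the angle is small, so arcsin(x) ≈ x and the tilt is approximately the ratio of the two pitches.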
In addition, the very structure of CMOL circuits makes them uniquely suitable for the implementation of more complex, mixed-signal information processing systems, including ultradense and ultrafast neuromorphic networks. The objective of this paper is to describe in brief the current status of our work on the development of so-called Distributed Crossbar Networks (“CrossNets”) that could provide high performance despite the limitations imposed by CMOL hardware. A more detailed description of our earlier results may be found in Ref. 9. 2 Synapses The central device of CrossNet is a two-terminal latching switch [3, 4] (Fig. 2a) which is a combination of two single-electron devices, a transistor and a trap [3]. The device may be naturally implemented as a single organic molecule (Fig. 2b). Qualitatively, the device operates as follows: if voltage V = Vj – Vk applied between the external electrodes (in CMOL, nanowires) is low, the trap island has no net electric charge, and the single-electron transistor is closed. If voltage V approaches certain threshold value V+ > 0, an additional electron is inserted into the trap island, and its field lifts the Coulomb blockade of the single-electron transistor, thus connecting the nanowires. The switch state may be reset (e.g., wires disconnected) by applying a lower voltage V < V- < V+. Due to the random character of single-electron tunneling [2], the quantitative description of the switch is by necessity probabilistic: actually, V determines only the rates Γ↑↓ of device switching between its ON and OFF states. The rates, in turn, determine the dynamics of probability p to have the transistor opened (i.e. wires connected): dp/dt = Γ↑(1 - p) - Γ↓p. 
(1) The theory of single-electron tunneling [2] shows that, in a good approximation, the rates may be presented as $\Gamma_{\uparrow\downarrow} = \Gamma_0 \exp\{\pm e(V - S)/k_B T\}$, (2) where $\Gamma_0$ and $S$ are constants depending on physical parameters of the latching switches. Fig. 2. (a) Schematics and (b) possible molecular implementation of the two-terminal single-electron latching switch. Note that despite the random character of switching, the strong nonlinearity of Eq. (2) makes it possible to limit the degree of the device “fuzziness”. 3 CrossNets Figure 3a shows the generic structure of a CrossNet. CMOS-implemented somatic cells (within the Fire Rate model, just nonlinear differential amplifiers, see Fig. 3b,c) apply their output voltages to “axonic” nanowires. If the latching switch, working as an elementary synapse, on the crosspoint of an axonic wire with the perpendicular “dendritic” wire is open, some current flows into the latter wire, charging it. Since such currents are injected into each dendritic wire through several (many) open synapses, their addition provides a natural passive analog summation of signals from the corresponding somas, typical for all neural networks. Examining Fig. 3a, please note the open-circuit terminations of axonic and dendritic lines at the borders of the somatic cells; due to these terminations the somas do not communicate directly (but only via synapses). The network shown in Fig. 3 is evidently feedforward; recurrent networks are achieved in the evident way by doubling the number of synapses and nanowires per somatic cell (Fig. 3c). Moreover, using dual-rail (bipolar) representation of the signal, and hence doubling the number of nanowires and elementary synapses once again, one gets a CrossNet with somas coupled by compact 4-switch groups [9]. Using Eqs.
(1) and (2), it is straightforward to show that the average synaptic weight $w_{jk}$ of the group obeys the “quasi-Hebbian” rule: $dw_{jk}/dt = -4\Gamma_0 \sinh(\gamma S)\,\sinh(\gamma V_j)\,\sinh(\gamma V_k)$. (3) Fig. 3. (a) Generic structure of the simplest (feedforward, non-Hebbian) CrossNet. Red lines show “axonic”, and blue lines “dendritic” nanowires. Gray squares are interfaces between nanowires and CMOS-based somas (b, c). Signs show the dendrite input polarities. Green circles denote molecular latching switches forming elementary synapses. Bold red and blue points are open-circuit terminations of the nanowires, which do not allow somas to interact in bypass of synapses. In the simplest cases (e.g., quasi-Hopfield networks with finite connectivity), the tri-level synaptic weights of the generic CrossNets are quite satisfactory, leading to just a very modest (~30%) network capacity loss. However, some applications (in particular, pattern classification) may require a larger number of weight quantization levels L (e.g., L ≈ 30 for a 1% fidelity [9]). This may be achieved by using compact square arrays (e.g., 4×4) of latching switches (Fig. 4). Various species of CrossNets [9] differ also by the way the somatic cells are distributed around the synaptic field. Figure 5 shows feedforward versions of two CrossNet types most explored so far: the so-called FlossBar and InBar. The former network is more natural for the implementation of multilayered perceptrons (MLP), while the latter system is preferable for recurrent network implementations and also allows a simpler CMOS design of somatic cells. The most important advantage of CrossNets over the hardware neural networks suggested earlier is that these networks make it possible to achieve enormous density combined with large cell connectivity M >> 1 in quasi-2D electronic circuits. 
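The switching statistics of a single latching switch, Eqs. (1) and (2) of Section 2, describe a two-state rate process with a simple closed-form solution: p(t) = p_ss + (p(0) − p_ss) exp(−(Γ↑ + Γ↓)t), with steady state p_ss = Γ↑/(Γ↑ + Γ↓). The sketch below evaluates it. The values of Γ0, S and T in the example are arbitrary illustration values, not device parameters from the paper, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def switching_rates(v: float, s: float, gamma0: float, temp_k: float):
    """Eq. (2): (Gamma_up, Gamma_down) = Gamma_0 * exp(+/- e(V - S) / k_B T)."""
    x = ELECTRON_CHARGE_C * (v - s) / (BOLTZMANN_J_PER_K * temp_k)
    return gamma0 * math.exp(x), gamma0 * math.exp(-x)


def steady_state_on_probability(v: float, s: float, temp_k: float) -> float:
    """Stationary solution of Eq. (1): Gamma_up / (Gamma_up + Gamma_down).

    Gamma_0 cancels, so it is not needed here.
    """
    up, down = switching_rates(v, s, 1.0, temp_k)
    return up / (up + down)


def on_probability(t: float, v: float, s: float, gamma0: float,
                   temp_k: float, p0: float = 0.0) -> float:
    """Closed-form solution of dp/dt = Gamma_up (1 - p) - Gamma_down p at constant V."""
    up, down = switching_rates(v, s, gamma0, temp_k)
    p_ss = up / (up + down)
    return p_ss + (p0 - p_ss) * math.exp(-(up + down) * t)


if __name__ == "__main__":
    s, temp_k = 0.5, 300.0  # hypothetical S (in volts) and temperature
    for v in (0.40, 0.45, 0.50, 0.55, 0.60):
        print(f"V = {v:.2f} V  ->  p_ss = {steady_state_on_probability(v, s, temp_k):.4f}")
```

The steady-state probability is a steep sigmoid in V centred on S, which illustrates the remark that the strong nonlinearity of Eq. (2) limits the "fuzziness" of the device: a modest voltage swing around S moves the switch from almost certainly OFF to almost certainly ON.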
4 CrossNet training CrossNet training faces several hardware-imposed challenges: (i) The synaptic weight contribution provided by the elementary latching switch is binary, so that for most applications the multi-switch synapses (Fig. 4) are necessary. (ii) The only way to adjust any particular synaptic weight is to turn ON or OFF the corresponding latching switch(es). This can only be done by applying a certain voltage V = Vj – Vk between the two corresponding nanowires. During this procedure, other nanodevices attached to the same wires should not be disturbed. (iii) As stated above, synapse state switching is a statistical process, so that the degree of its “fuzziness” should be carefully controlled. Fig. 4. Composite synapse for providing $L = 2n^2 + 1$ discrete levels of the weight in (a) operation and (b) weight adjustment modes. The dark-gray rectangles are resistive metallic strips at soma/nanowire interfaces. Fig. 5. Two main CrossNet species: (a) FlossBar and (b) InBar, in the generic (feedforward, non-Hebbian, ternary-weight) case for the connectivity parameter M = 9. Only the nanowires and nanodevices coupling one cell (indicated with red dashed lines) to M post-synaptic cells (blue dashed lines) are shown; actually all the cells are similarly coupled. We have shown that these challenges may be met using (at least) the following training methods [9]: (i) Synaptic weight import. This procedure starts with training of a homomorphic “precursor” artificial neural network with continuous synaptic weights wjk, implemented in software, using one of the established methods (e.g., error backpropagation). Then the synaptic weights wjk are transferred to the CrossNet, with some “clipping” (rounding) due to the binary nature of elementary synaptic weights. To accomplish the transfer, pairs of somatic cells are sequentially selected via CMOS-level wiring.
Using the flexibility of CMOS circuitry, these cells are reconfigured to apply external voltages ±VW to the axonic and dendritic nanowires leading to a particular synapse, while all other nanowires are grounded. The voltage level VW is selected so that it does not switch the synapses attached to only one of the selected nanowires, while voltage 2VW applied to the synapse at the crosspoint of the selected wires is sufficient for its reliable switching. (In the composite synapses with quasi-continuous weights (Fig. 4), only a part of the corresponding switches is turned ON or OFF.) (ii) Error backpropagation. The synaptic weight import procedure is straightforward when wjk may be simply calculated, e.g., for the Hopfield-type networks. However, for very large CrossNets used, e.g., as pattern classifiers, the precursor network training may take an impracticably long time. In this case the direct training of a CrossNet may become necessary. We have developed two methods of such training, both based on “Hebbian” synapses consisting of 4 elementary synapses (latching switches) whose average weight dynamics obeys Eq. (3). This quasi-Hebbian rule may be used to implement the backpropagation algorithm either using a periodic time-multiplexing [9] or in a continuous fashion, using the simultaneous propagation of signals and errors along the same dual-rail channels. As a result, presently we may state that CrossNets may be taught to perform virtually all major functions demonstrated earlier with the usual neural networks, including the corrupted pattern restoration in the recurrent quasi-Hopfield mode and pattern classification in the feedforward MLP mode [11]. 5 CrossNet performance estimates The significance of this result may only be appreciated in the context of unparalleled physical parameters of CMOL CrossNets. The only fundamental limitation on the half-pitch Fnano (Fig. 1) comes from quantum-mechanical tunneling between nanowires.
If the wires are separated by vacuum, the corresponding specific leakage conductance becomes uncomfortably large ($\sim 10^{-12}\ \Omega^{-1}\mathrm{m}^{-1}$) only at $F_{nano}$ = 1.5 nm; however, since realistic insulation materials (SiO2, etc.) provide somewhat lower tunnel barriers, let us use a more conservative value $F_{nano}$ = 3 nm. Note that this value corresponds to $10^{12}$ elementary synapses per cm$^2$, so that for $4M = 10^4$ and $n = 4$ the areal density of neural cells is close to $2\times10^{7}$ cm$^{-2}$. Both numbers are higher than those for the human cerebral cortex, despite the fact that the quasi-2D CMOL circuits have to compete with quasi-3D cerebral cortex. With the typical specific capacitance of $3\times10^{-10}$ F/m = 0.3 aF/nm, this gives nanowire capacitance $C_0 \approx 1$ aF per working elementary synapse, because the corresponding segment has length $4F_{nano}$. The CrossNet operation speed is determined mostly by the time constant $\tau_0$ of dendrite nanowire capacitance recharging through resistances of open nanodevices. Since both the relevant conductance and capacitance increase similarly with M and n, $\tau_0 \approx R_0 C_0$. The possibilities of reduction of $R_0$, and hence $\tau_0$, are limited mostly by acceptable power dissipation per unit area, which is close to $V_s^2/[(2F_{nano})^2 R_0]$. For room-temperature operation, the voltage scale $V_0 \approx V_t$ should be of the order of at least $30\,k_B T/e \approx 1$ V to avoid thermally-induced errors [9]. With our number for $F_{nano}$, and a relatively high but acceptable power consumption of 100 W/cm$^2$, we get $R_0 \approx 10^{10}\ \Omega$ (which is a very realistic value for single-molecule single-electron devices like one shown in Fig. 3). With this number, $\tau_0$ is as small as ~10 ns. This means that the CrossNet speed may be approximately six orders of magnitude (!) higher than that of the biological neural networks. Even scaling $R_0$ up by a factor of 100 to bring power consumption to a more comfortable level of 1 W/cm$^2$ would still leave us at least a four-orders-of-magnitude speed advantage.
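The order-of-magnitude estimates of this section can be rechecked with a few lines of arithmetic. The sketch below uses only the quantities quoted in the text; the function names are hypothetical, and the cell-density formula (synapse density divided by 4M composite synapses of n×n switches each) is an assumption of this sketch that happens to reproduce the quoted figure. With a 1 V scale the power formula gives a few hundred W/cm², that is, the same order as the 100 W/cm² quoted above.

```python
def time_constant_s(r0_ohm, c0_farad):
    """tau_0 ~ R_0 * C_0, the dendrite recharging time constant."""
    return r0_ohm * c0_farad


def synapse_density_per_cm2(f_nano_m):
    """One elementary synapse per (2*F_nano)^2 crosspoint cell."""
    cell_area_cm2 = (2.0 * f_nano_m * 100.0) ** 2  # metres -> centimetres
    return 1.0 / cell_area_cm2


def power_density_w_per_cm2(v_volt, f_nano_m, r0_ohm):
    """Power per unit area, V^2 / ((2*F_nano)^2 * R_0)."""
    cell_area_cm2 = (2.0 * f_nano_m * 100.0) ** 2
    return v_volt ** 2 / (cell_area_cm2 * r0_ohm)


def cell_density_per_cm2(f_nano_m, four_m, n):
    """Assumed here: each cell owns 4M composite synapses of n*n switches."""
    return synapse_density_per_cm2(f_nano_m) / (four_m * n * n)


if __name__ == "__main__":
    F_NANO = 3e-9   # m, the conservative half-pitch from the text
    R0 = 1e10       # ohm
    C0 = 1e-18      # F (1 aF per working elementary synapse)
    print("tau_0 [s]:", time_constant_s(R0, C0))
    print("synapses per cm^2:", synapse_density_per_cm2(F_NANO))
    print("cells per cm^2:", cell_density_per_cm2(F_NANO, 1e4, 4))
    print("power [W/cm^2]:", power_density_w_per_cm2(1.0, F_NANO, R0))
    # Scaling R_0 up by 100 slows the circuit and cuts power by the same factor.
    print("tau_0 at 100*R0 [s]:", time_constant_s(100 * R0, C0))
    print("power at 100*R0 [W/cm^2]:", power_density_w_per_cm2(1.0, F_NANO, 100 * R0))
```

Because both $\tau_0$ and the dissipated power depend on $R_0$ alone once $F_{nano}$ and the voltage scale are fixed, the speed and power figures trade off one for one, which is the trade-off the last sentence of the section describes.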
6 Discussion: Possible applications These estimates make us believe that CMOL CrossNet chips may revolutionize the neuromorphic network applications. Let us start with the example of relatively small (1-cm$^2$-scale) chips used for recognition of a face in a crowd [11]. The most difficult feature of such recognition is the search for face location, i.e. optimal placement of a face on the image relative to the panel providing input for the processing network. The enormous density and speed of CMOL hardware make it possible to time-and-space multiplex this task (Fig. 6). In this approach, the full image (say, formed by CMOS photodetectors on the same chip) is divided into P rectangular panels of h×w pixels, corresponding to the expected size and approximate shape of a single face. A CMOS-implemented communication channel passes input data from each panel to the corresponding CMOL neural network, providing its shift in time, say using the TV scanning pattern (red line in Fig. 6). The standard methods of image classification require the network to have just a few hidden layers, so that the time interval Δt necessary for each mapping position may be so short that the total pattern recognition time T = hwΔt may be acceptable even for online face recognition. Fig. 6. Scan mapping of the input image on CMOL CrossNet inputs. Red lines show the possible time sequence of image pixels sent to a certain input of the network processing image from the upper-left panel of the pattern. Indeed, let us consider a 4-Megapixel image partitioned into 4K 32×32-pixel panels (h = w = 32). This panel will require an MLP net with several (say, four) layers with 1K cells each in order to compare the panel image with ~$10^3$ stored faces.
With the feasible 4-nm nanowire half-pitch, and 65-level synapses (sufficient for better than 99% fidelity [9]), each interlayer crossbar would require chip area about $(1\mathrm{K}\times 64\ \mathrm{nm})^2 = 64\times64\ \mu\mathrm{m}^2$, fitting 4×4K of them on a ~0.6 cm$^2$ chip. (The CMOS somatic-layer and communication-system overheads are negligible.) With the acceptable power consumption of the order of 10 W/cm$^2$, the input-to-output signal propagation in such a network will take only about 50 ns, so that Δt may be of the order of 100 ns and the total time T = hwΔt of processing one frame of the order of 100 microseconds, much shorter than the typical TV frame time of ~10 milliseconds. The remaining two-orders-of-magnitude time gap may be used, for example, for double-checking the results via stopping the scan mapping (Fig. 6) at the most promising position. (For this, a simple feedback from the recognition output to the mapping communication system is necessary.) It is instructive to compare the estimated CMOL chip speed with that of the implementation of a similar parallel network ensemble on a CMOS signal processor (say, also combined on the same chip with an array of CMOS photodetectors). Even assuming an extremely high performance of 30 billion additions/multiplications per second, we would need ~$4\times4\mathrm{K}\times1\mathrm{K}\times(4\mathrm{K})^2/(30\times10^{9}) \approx 10^{4}$ seconds ~ 3 hours per frame, evidently incompatible with the online image stream processing. Let us finish with a brief (and much more speculative) discussion of possible long-term prospects of CMOL CrossNets. Eventually, large-scale (~30×30 cm$^2$) CMOL circuits may become available. According to the estimates given in the previous section, the integration scale of such a system (in terms of both neural cells and synapses) will be comparable with that of the human cerebral cortex.
Equipped with a set of broadband sensor/actuator interfaces, such a (necessarily hierarchical) system may be capable, after a period of initial supervised training, of further self-training in the process of interaction with the environment, at a speed several orders of magnitude higher than that of its biological prototypes. Needless to say, the successful development of such self-developing systems would have a major impact not only on all information technologies, but also on society as a whole. Acknowledgments This work has been supported in part by the AFOSR, MARCO (via FENA Center), and NSF. Valuable contributions made by Simon Fölling, Özgür Türel and Ibrahim Muckra, as well as useful discussions with P. Adams, J. Barhen, D. Hammerstrom, V. Protopopescu, T. Sejnowski, and D. Strukov are gratefully acknowledged. References
[1] Frank, D. J. et al. (2001) Device scaling limits of Si MOSFETs and their application dependencies. Proc. IEEE 89(3): 259-288.
[2] Likharev, K. K. (2003) Electronics below 10 nm, in J. Greer et al. (eds.), Nano and Giga Challenges in Microelectronics, pp. 27-68. Amsterdam: Elsevier.
[3] Likharev, K. K. and Strukov, D. B. (2005) CMOL: Devices, circuits, and architectures, in G. Cuniberti et al. (eds.), Introducing Molecular Electronics, Ch. 16. Springer, Berlin.
[4] Fölling, S., Türel, Ö. & Likharev, K. K. (2001) Single-electron latching switches as nanoscale synapses, in Proc. of the 2001 Int. Joint Conf. on Neural Networks, pp. 216-221. Mount Royal, NJ: Int. Neural Network Society.
[5] Wang, W. et al. (2003) Mechanism of electron conduction in self-assembled alkanethiol monolayer devices. Phys. Rev. B 68(3): 035416 1-8.
[6] Stan, M. et al. (2003) Molecular electronics: From devices and interconnect to circuits and architecture. Proc. IEEE 91(11): 1940-1957.
[7] Strukov, D. B. & Likharev, K. K. (2005) Prospects for terabit-scale nanoelectronic memories. Nanotechnology 16(1): 137-148.
[8] Strukov, D. B. & Likharev, K. K. (2005) CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices. Nanotechnology 16(6): 888-900.
[9] Türel, Ö. et al. (2004) Neuromorphic architectures for nanoelectronic circuits. Int. J. of Circuit Theory and Appl. 32(5): 277-302.
[10] See, e.g., Hertz, J. et al. (1991) Introduction to the Theory of Neural Computation. Cambridge, MA: Perseus.
[11] Lee, J. H. & Likharev, K. K. (2005) CrossNets as pattern classifiers. Lecture Notes in Computer Sciences 3575: 434-441.

6 0.28662235 19 nips-2005-Active Learning for Misspecified Models

7 0.28380454 176 nips-2005-Silicon growth cones map silicon retina

8 0.28346673 22 nips-2005-An Analog Visual Pre-Processing Processor Employing Cyclic Line Access in Only-Nearest-Neighbor-Interconnects Architecture

9 0.26019201 1 nips-2005-AER Building Blocks for Multi-Layer Multi-Chip Neuromorphic Vision Systems

10 0.24299814 119 nips-2005-Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

11 0.23455907 109 nips-2005-Learning Cue-Invariant Visual Responses

12 0.23023391 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

13 0.22013873 74 nips-2005-Faster Rates in Regression via Active Learning

14 0.21244666 3 nips-2005-A Bayesian Framework for Tilt Perception and Confidence

15 0.20490935 2 nips-2005-A Bayes Rule for Density Matrices

16 0.19964419 73 nips-2005-Fast biped walking with a reflexive controller and real-time policy searching

17 0.17865106 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

18 0.17629771 88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI

19 0.174739 118 nips-2005-Learning in Silicon: Timing is Everything

20 0.16830152 167 nips-2005-Robust design of biological experiments

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.015), (10, 0.028), (12, 0.014), (26, 0.023), (27, 0.036), (31, 0.032), (34, 0.053), (39, 0.012), (55, 0.02), (57, 0.021), (59, 0.02), (61, 0.418), (69, 0.058), (73, 0.016), (88, 0.066), (91, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87041026 17 nips-2005-Active Bidirectional Coupling in a Cochlear Chip

Author: Bo Wen, Kwabena A. Boahen

2 0.28166467 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

Author: Afsheen Afshar, Gopal Santhanam, Stephen I. Ryu, Maneesh Sahani, Byron M. Yu, Krishna V. Shenoy

Abstract: Spiking activity from neurophysiological experiments often exhibits dynamics beyond that driven by external stimulation, presumably reflecting the extensive recurrence of neural circuitry. Characterizing these dynamics may reveal important features of neural computation, particularly during internally-driven cognitive operations. For example, the activity of premotor cortex (PMd) neurons during an instructed delay period separating movement-target specification and a movement-initiation cue is believed to be involved in motor planning. We show that the dynamics underlying this activity can be captured by a low-dimensional non-linear dynamical systems model, with underlying recurrent structure and stochastic point-process output. We present and validate latent variable methods that simultaneously estimate the system parameters and the trial-by-trial dynamical trajectories. These methods are applied to characterize the dynamics in PMd data recorded from a chronically-implanted 96-electrode array while monkeys perform delayed-reach tasks.

3 0.28112075 30 nips-2005-Assessing Approximations for Gaussian Process Classification

Author: Malte Kuss, Carl E. Rasmussen

Abstract: Gaussian processes are attractive models for probabilistic classification but unfortunately exact inference is analytically intractable. We compare Laplace’s method and Expectation Propagation (EP) focusing on marginal likelihood estimates and predictive performance. We explain theoretically and corroborate empirically that EP is superior to Laplace. We also compare to a sophisticated MCMC scheme and show that EP is surprisingly accurate. In recent years models based on Gaussian process (GP) priors have attracted much attention in the machine learning community. Whereas inference in the GP regression model with Gaussian noise can be done analytically, probabilistic classification using GPs is analytically intractable. Several approaches to approximate Bayesian inference have been suggested, including Laplace’s approximation, Expectation Propagation (EP), variational approximations and Markov chain Monte Carlo (MCMC) sampling, some of these in conjunction with generalisation bounds, online learning schemes and sparse approximations. Despite the abundance of recent work on probabilistic GP classifiers, most experimental studies provide only anecdotal evidence, and no clear picture has yet emerged as to when and why which algorithm should be preferred. Thus, from a practitioner’s point of view probabilistic GP classification remains a jungle. In this paper, we set out to understand and compare two of the most widespread approximations: Laplace’s method and Expectation Propagation (EP). We also compare to a sophisticated, but computationally demanding MCMC scheme to examine how close the approximations are to ground truth. We examine two aspects of the approximation schemes: Firstly the accuracy of approximations to the marginal likelihood, which is of central importance for model selection and model comparison. In any practical application of GPs in classification (usually multiple) parameters of the covariance function (hyperparameters) have to be handled.
Bayesian model selection provides a consistent framework for setting such parameters. Therefore, it is essential to evaluate the accuracy of the marginal likelihood approximations as a function of the hyperparameters, in order to assess the practical usefulness of the approach. Secondly, we need to assess the quality of the approximate probabilistic predictions. In the past, the probabilistic nature of the GP predictions has not received much attention, the focus being mostly on classification error rates. This unfortunate state of affairs is caused primarily by typical benchmarking problems being considered outside of a realistic context. The ability of a classifier to produce class probabilities or confidences has obvious relevance in most areas of application, e.g. medical diagnosis. We evaluate the predictive distributions of the approximate methods, and compare to the MCMC gold standard. 1 The Gaussian Process Model for Binary Classification Let $y \in \{-1, 1\}$ denote the class label of an input $x$. Gaussian process classification (GPC) is discriminative in modelling $p(y|x)$ for given $x$ by a Bernoulli distribution. The probability of success $p(y = 1|x)$ is related to an unconstrained latent function $f(x)$ which is mapped to the unit interval by a sigmoid transformation, e.g. the logit or the probit. For reasons of analytic convenience we exclusively use the probit model $p(y = 1|x) = \Phi(f(x))$, where $\Phi$ denotes the cumulative distribution function of the standard Normal distribution. In the GPC model Bayesian inference is performed about the latent function $f$ in the light of observed data $D = \{(y_i, x_i) \mid i = 1, \ldots, m\}$. Let $f_i = f(x_i)$ and $f = [f_1, \ldots, f_m]^\top$ be shorthand for the values of the latent function and $y = [y_1, \ldots, y_m]^\top$ and $X = [x_1, \ldots, x_m]^\top$ collect the class labels and inputs respectively.
Given the latent function the class labels are independent Bernoulli variables, so the joint likelihood factorises: $p(y|f) = \prod_{i=1}^{m} p(y_i|f_i) = \prod_{i=1}^{m} \Phi(y_i f_i)$, and depends on $f$ only through its value at the observed inputs. We use a zero-mean Gaussian process prior over the latent function $f$ with a covariance function $k(x, x'|\theta)$, which may depend on hyperparameters $\theta$ [1]. The functional form and parameters of the covariance function encode assumptions about the latent function, and adaptation of these is part of the inference. The posterior distribution over latent function values $f$ at the observed $X$ for given hyperparameters $\theta$ becomes: $p(f|D, \theta) = \frac{N(f|0, K)}{p(D|\theta)} \prod_{i=1}^{m} \Phi(y_i f_i)$, where $p(D|\theta) = \int p(y|f)\, p(f|X, \theta)\, df$ denotes the marginal likelihood. Unfortunately neither the marginal likelihood, nor the posterior itself, nor predictions can be computed analytically, so approximations are needed. 2 Approximate Bayesian Inference For the GPC model approximations are either based on a Gaussian approximation to the posterior $p(f|D, \theta) \approx q(f|D, \theta) = N(f|m, A)$ or involve Markov chain Monte Carlo (MCMC) sampling [2]. We compare Laplace’s method and Expectation Propagation (EP), which are two alternative approaches to finding parameters $m$ and $A$ of the Gaussian $q(f|D, \theta)$. Both methods also allow approximate evaluation of the marginal likelihood, which is useful for ML-II hyperparameter optimisation. Laplace’s approximation (LA) is found by making a second order Taylor approximation of the (un-normalised) log posterior [3]. The mean $m$ is placed at the mode (MAP) and the covariance $A$ equals the negative inverse Hessian of the log posterior density at $m$. The EP approximation [4] also gives a Gaussian approximation to the posterior. The parameters $m$ and $A$ are found in an iterative scheme by matching the approximate marginal moments of $p(f_i|D, \theta)$ by the marginals of the approximation $N(f_i|m_i, A_{ii})$.
Although we cannot prove the convergence of EP, we conjecture that it always converges for GPC with probit likelihood, and have never encountered an exception. A key insight is that a Gaussian approximation to the GPC posterior is equivalent to a GP approximation to the posterior distribution over latent functions. [Figure 1 legend: Likelihood p(y|f), Prior p(f), Posterior p(f|y), Laplace q(f|y), EP q(f|y); panel (b) axes: f_i, f_j.] Figure 1: Panel (a) provides a one-dimensional illustration of the approximations. The prior $N(f|0, 5^2)$ combined with the probit likelihood ($y = 1$) results in a skewed posterior. The likelihood uses the right axis, all other curves use the left axis. Laplace’s approximation peaks at the posterior mode, but places far too much mass over negative values of $f$ and too little at large positive values. The EP approximation matches the first two posterior moments, which results in a larger mean and a more accurate placement of probability mass compared to Laplace’s approximation. In Panel (b) we caricature a high dimensional zero-mean Gaussian prior as an ellipse. The gray shadow indicates that for a high dimensional Gaussian most of the mass lies in a thin shell. For large latent signals (large entries in $K$), the likelihood essentially cuts off regions which are incompatible with the training labels (hatched area), leaving the upper right orthant as the posterior. The dot represents the mode of the posterior, which remains close to the origin. For a test input $x_*$ the approximate predictive latent and class probabilities are: $q(f_*|D, \theta, x_*) = N(\mu_*, \sigma_*^2)$, and $q(y_* = 1|D, x_*) = \Phi\big(\mu_*/\sqrt{1 + \sigma_*^2}\big)$, where $\mu_* = k_*^\top K^{-1} m$ and $\sigma_*^2 = k(x_*, x_*) - k_*^\top (K^{-1} - K^{-1} A K^{-1}) k_*$, where the vector $k_* = [k(x_1, x_*), \ldots, k(x_m, x_*)]^\top$ collects covariances between $x_*$ and training inputs $X$.
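The one-dimensional illustration of Figure 1(a) can be reproduced numerically. The sketch below is not the authors' code: it takes the prior $N(f|0, 5^2)$ and a single probit likelihood term with $y = 1$, finds Laplace's approximation by Newton iteration on the log posterior, and computes the exact posterior mean and variance by grid quadrature. With a single likelihood term the moment-matched Gaussian that the caption attributes to EP has exactly these moments. Function names and the quadrature grid are choices made for this sketch.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_moments(prior_std, lo=-40.0, hi=40.0, n=80001):
    """Mean and variance of p(f|y=1), proportional to N(f|0, prior_std^2) * Phi(f)."""
    h = (hi - lo) / (n - 1)
    z = m1 = m2 = 0.0
    for i in range(n):
        f = lo + i * h
        w = norm_pdf(f / prior_std) * norm_cdf(f)  # unnormalised posterior
        z += w
        m1 += w * f
        m2 += w * f * f
    mean = m1 / z
    return mean, m2 / z - mean * mean


def laplace_approx(prior_std, iters=50):
    """Posterior mode and negative inverse Hessian of the log posterior there."""
    s2 = prior_std ** 2
    f = 0.0
    for _ in range(iters):
        r = norm_pdf(f) / norm_cdf(f)      # d/df log Phi(f)
        grad = r - f / s2                  # gradient of the log posterior
        hess = -r * (f + r) - 1.0 / s2     # second derivative (always negative)
        f -= grad / hess                   # Newton step
    r = norm_pdf(f) / norm_cdf(f)
    return f, 1.0 / (r * (f + r) + 1.0 / s2)


if __name__ == "__main__":
    mean, var = exact_moments(5.0)
    mode, laplace_var = laplace_approx(5.0)
    print("moment-matched Gaussian: mean %.3f, variance %.3f" % (mean, var))
    print("Laplace approximation:   mean %.3f, variance %.3f" % (mode, laplace_var))
```

The Laplace mean sits at the mode, well below the posterior mean, so the Laplace Gaussian puts more mass on negative $f$ than the posterior does, as the caption describes.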
MCMC sampling has the advantage that it becomes exact in the limit of long runs and so provides a gold standard by which to measure the two analytic methods described above. Although MCMC methods can in principle be used to do inference over $f$ and $\theta$ jointly [5], we compare to methods using ML-II optimisation over $\theta$, thus we use MCMC to integrate over $f$ only. Good marginal likelihood estimates are notoriously difficult to obtain; in our experiments we use Annealed Importance Sampling (AIS) [6], combining several Thermodynamic Integration runs into a single (unbiased) estimate of the marginal likelihood. Both analytic approximations have a computational complexity which is cubic, $O(m^3)$, as is common among non-sparse GP models due to inversions of $m \times m$ matrices. In our implementations LA and EP need similar running times, on the order of a few minutes for several hundred data-points. Making AIS work efficiently requires some fine-tuning and a single estimate of $p(D|\theta)$ can take several hours for data sets of a few hundred examples, but this could conceivably be improved upon. 3 Structural Properties of the Posterior and its Approximations Structural properties of the posterior can best be understood by examining its construction. The prior is a correlated $m$-dimensional Gaussian $N(f|0, K)$ centred at the origin. Each likelihood term $p(y_i|f_i)$ softly truncates the half-space from the prior that is incompatible with the observed label, see Figure 1. The resulting posterior is unimodal and skewed, similar to a multivariate Gaussian truncated to the orthant containing $y$. The mode of the posterior remains close to the origin, while the mass is placed in accordance with the observed class labels. Additionally, high dimensional Gaussian distributions exhibit the property that most probability mass is contained in a thin ellipsoidal shell (depending on the covariance structure) away from the mean [7, ch. 29.2].
Intuitively this occurs since in high dimensions the volume grows extremely rapidly with the radius. As an effect the mode becomes less representative (typical) for the prior distribution as the dimension increases. For the GPC posterior this property persists: the mode of the posterior distribution stays relatively close to the origin, still being unrepresentative for the posterior distribution, while the mean moves to the mass of the posterior making mean and mode differ significantly. We cannot generally assume the posterior to be close to Gaussian, as in the often studied limit of low-dimensional parametric models with large amounts of data. Therefore in GPC we must be aware of making a Gaussian approximation to a non-Gaussian posterior. From the properties of the posterior it can be expected that Laplace’s method places $m$ in the right orthant but too close to the origin, such that the approximation will overlap with regions having practically zero posterior mass. As an effect the amplitude of the approximate latent posterior GP will be underestimated systematically, leading to overly cautious predictive distributions. The EP approximation does not rely on a local expansion, but assumes that the marginal distributions can be well approximated by Gaussians. This assumption will be examined empirically below. 4 Experiments In this section we compare and inspect approximations for GPC using various benchmark data sets. The primary focus is not to optimise the absolute performance of GPC models but to compare the relative accuracy of approximations and to validate the arguments given in the previous section. In all experiments we use a covariance function of the form: $k(x, x'|\theta) = \sigma^2 \exp\big(-\tfrac{1}{2}\,\|x - x'\|^2/\ell^2\big)$, (1) such that $\theta = [\sigma, \ell]$. We refer to $\sigma^2$ as the signal variance and to $\ell$ as the characteristic length-scale.
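The covariance function of Eq. (1) is simple to write down explicitly. The following is a minimal sketch in plain Python; the function names `se_kernel` and `gram_matrix` are hypothetical and the inputs are assumed to be sequences of floats of equal length.

```python
import math


def se_kernel(x, x_prime, sigma, length_scale):
    """Eq. (1): sigma^2 * exp(-0.5 * ||x - x'||^2 / length_scale^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return sigma ** 2 * math.exp(-0.5 * sq_dist / length_scale ** 2)


def gram_matrix(inputs, sigma, length_scale):
    """Covariance matrix K with entries k(x_i, x_j) over the training inputs."""
    return [[se_kernel(a, b, sigma, length_scale) for b in inputs] for a in inputs]


if __name__ == "__main__":
    # Toy inputs, purely illustrative.
    X = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    for row in gram_matrix(X, sigma=2.0, length_scale=1.5):
        print(["%.4f" % v for v in row])
```

The diagonal of the resulting matrix equals the signal variance $\sigma^2$, and off-diagonal entries decay with distance at a rate set by the length-scale $\ell$.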
Note that for many classification tasks it may be reasonable to use an individual length scale parameter for every input dimension (ARD) or a different kind of covariance function. Nevertheless, for the sake of presentability we use the above covariance function and we believe the conclusions about the accuracy of approximations to be independent of this choice, since it relies on arguments which are independent of the form of the covariance function. As measure of the accuracy of predictive probabilities we use the average information in bits of the predictions about the test targets in excess of that of random guessing. Let $p_* = p(y_* = 1|D, \theta, x_*)$ be the model’s prediction, then we average: $I(p_i^*, y_i) = \frac{y_i + 1}{2} \log_2(p_i^*) + \frac{1 - y_i}{2} \log_2(1 - p_i^*) + H$ (2) over all test cases, where $H$ is the entropy of the training labels. The error rate $E$ is equal to the percentage of erroneous class assignments if prediction is understood as a decision problem with symmetric costs. For the first set of experiments presented here the well-known USPS digits and the Ionosphere data set were used. A binary sub-problem from the USPS digits is defined by only considering 3’s vs. 5’s (which is probably the hardest of the binary sub-problems) and dividing the data into 767 cases for training and 773 for testing. The Ionosphere data is split into 200 training and 151 test cases. We do an exhaustive investigation on a fine regular grid of values for the log hyperparameters. For each $\theta$ on the grid we compute the approximated log marginal likelihood by LA, EP and AIS. Additionally we compute the respective predictive performance (2) on the test set. Results are shown in Figure 2.
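The information measure of Eq. (2) can be sketched as follows. This is an illustrative implementation, not the authors' code: labels are assumed to be coded as +1/-1 as in Section 1, predictions are assumed to lie strictly between 0 and 1, and the function names are hypothetical.

```python
import math


def label_entropy_bits(train_labels):
    """Entropy H (in bits) of the empirical distribution of the training labels."""
    p = sum(1 for y in train_labels if y == 1) / len(train_labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mean_information_bits(pred_probs, test_labels, train_labels):
    """Average of Eq. (2) over the test cases.

    pred_probs  -- predicted p(y = +1) for each test case, strictly in (0, 1)
    test_labels -- true labels, coded as +1 or -1
    """
    h = label_entropy_bits(train_labels)
    total = 0.0
    for p, y in zip(pred_probs, test_labels):
        # (y+1)/2 selects log2(p) for y = +1, (1-y)/2 selects log2(1-p) for y = -1.
        total += math.log2(p) if y == 1 else math.log2(1.0 - p)
    return total / len(test_labels) + h


if __name__ == "__main__":
    train = [1, -1, 1, -1]
    print(mean_information_bits([0.5, 0.5], [1, -1], train))    # random guessing
    print(mean_information_bits([0.9, 0.2], [1, -1], train))    # informative predictions
```

With balanced training labels, constant predictions of 0.5 score zero bits, and confident correct predictions approach the label entropy $H$ from below, while confident wrong predictions are penalised heavily.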
[Figure 2 panels (1a)-(4c): contour plots over log lengthscale, log(l), and log magnitude, log(σf), of the log marginal likelihood and of the information about test targets in bits.] Figure 2: Comparison of marginal likelihood approximations and predictive performances of different
approximation techniques for USPS 3s vs. 5s (upper half) and the Ionosphere data (lower half). The columns correspond to LA (a), EP (b), and MCMC (c). The rows show estimates of the log marginal likelihood (rows 1 & 3) and the corresponding predictive performance (2) on the test set (rows 2 & 4) respectively. [Figure 3 legend: MCMC samples, Laplace p(f|D), EP p(f|D); horizontal axes f in panels (a) and (b), x_i in panel (c).] Figure 3: Panel (a) and (b) show two marginal distributions $p(f_i|D, \theta)$ from a GPC posterior and its approximations. The true posterior is approximated by a normalised histogram of 9000 samples of $f_i$ obtained by MCMC sampling. Panel (c) shows a histogram of samples of a marginal distribution of a truncated high-dimensional Gaussian. The line describes a Gaussian with mean and variance estimated from the samples. For all three approximation techniques we see an agreement between marginal likelihood estimates and test performance, which justifies the use of ML-II parameter estimation. But the shape of the contours and the values differ between the methods. The contours for Laplace’s method appear to be slanted compared to EP. The marginal likelihood estimates of EP and AIS agree surprisingly well (see footnote 1), given that the marginal likelihood comes as a 767 respectively 200 dimensional integral. The EP predictions contain as much information about the test cases as the MCMC predictions and significantly more than for LA. Note that for small signal variances (roughly $\ln(\sigma^2) < 1$) LA and EP give very similar results. A possible explanation is that for small signal variances the likelihood does not truncate the prior but only down-weights the tail that disagrees with the observation. As an effect the posterior will be less skewed and both approximations will lead to similar results. For the USPS 3’s vs.
5’s we now inspect the marginal distributions p(fi |D, θ) of single latent function values under the posterior approximations for a given value of θ. We have chosen the values ln(σ) = 3.35 and ln( ) = 2.85 which are between the ML-II estimates of EP and LA. Hybrid MCMC was used to generate 9000 samples from the posterior p(f |D, θ). For LA and EP the approximate marginals are q(fi |D, θ) = N (fi |mi , Aii ) where m and A are found by the respective approximation techniques. In general we observe that the marginal distributions of MCMC samples agree very well with the respective marginal distributions of the EP approximation. For Laplace’s approximation we ﬁnd the mean to be underestimated and the marginal distributions to overlap with zero far more than the EP approximations. Figure (3a) displays the marginal distribution and its approximations for which the MCMC samples show maximal skewness. Figure (3b) shows a typical example where the EP approximation agrees very well with the MCMC samples. We show this particular example because under the EP approximation p(yi = 1|D, θ) < 0.1% but LA gives a wrong p(yi = 1|D, θ) ≈ 18%. In the experiment we saw that the marginal distributions of the posterior often agree very 1 Note that the agreement between the two seems to be limited by the accuracy of the MCMC runs, as judged by the regularity of the contour lines; the tolerance is less than one unit on a (natural) log scale. well with a Gaussian approximation. This seems to contradict the description given in the previous section were we argued that the posterior is skewed by construction. In order to inspect the marginals of a truncated high-dimensional multivariate Gaussian distribution we made an additional synthetic experiment. We constructed a 767 dimensional Gaussian N (x|0, C) with a covariance matrix having one eigenvalue of 100 with eigenvector 1, and all other eigenvalues are 1. We then truncate this distribution such that all xi ≥ 0. 
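The synthetic construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that "eigenvector 1" denotes the all-ones direction, and it draws samples with a Gibbs sampler, a choice made here for illustration since the text does not say how the samples were obtained. The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_truncated_normal(mu, sigma, rng):
    """Draw from N(mu, sigma^2) restricted to x >= 0 by inverting the CDF."""
    lo = STD_NORMAL.cdf(-mu / sigma)        # probability mass below zero
    u = lo + (1.0 - lo) * rng.random()      # uniform on [lo, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)     # keep inv_cdf strictly inside (0, 1)
    return max(0.0, mu + sigma * STD_NORMAL.inv_cdf(u))

def gibbs_truncated_gaussian(d, big_eigenvalue, sweeps, seed=0):
    """Gibbs-sample N(x | 0, C) restricted to x_i >= 0 for all i.

    Assumption: C = I + (big_eigenvalue - 1) * u u^T with u = ones / sqrt(d).
    Its inverse is I - c * ones ones^T with c = (1 - 1/big_eigenvalue) / d.
    Returns the value of x_0 after each sweep.
    """
    rng = random.Random(seed)
    c = (1.0 - 1.0 / big_eigenvalue) / d
    cond_var = 1.0 / (1.0 - c)              # conditional variance, 1 / P_ii
    cond_sd = cond_var ** 0.5
    x = [1.0] * d                           # any point inside the positive orthant
    total = sum(x)
    first_coord = []
    for _ in range(sweeps):
        for i in range(d):
            rest = total - x[i]             # sum of all other coordinates
            mu = c * rest * cond_var        # conditional mean given the others
            new = sample_truncated_normal(mu, cond_sd, rng)
            total += new - x[i]
            x[i] = new
        first_coord.append(x[0])
    return first_coord
```

Calling `gibbs_truncated_gaussian(767, 100.0, sweeps)` uses the dimension and eigenvalue given in the text; a histogram of the returned values is the kind of marginal that Figure 3(c) depicts.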
Note that the mode of the truncated Gaussian is still at zero, whereas the mean moves towards the remaining mass. Figure 3(c) shows a normalised histogram of samples from a marginal distribution of one x_i. The samples agree very well with a Gaussian approximation. In the previous section we described the somewhat surprising property that, for a truncated high-dimensional Gaussian resembling the posterior, the mode (used by LA) may not be particularly representative of the distribution. Although the marginal is also truncated, it is still exceptionally well modelled by a Gaussian; the Laplace approximation centred on the origin, however, would be completely inappropriate.

In a second set of experiments we compare the predictive performance of LA and EP for GPC on several well-known benchmark problems. Each data set is randomly split into 10 folds, of which one at a time is left out as a test set to measure the predictive performance of a model trained (or selected) on the remaining nine folds. All performance measures are averages over the 10 folds. For GPC we implement model selection by ML-II hyperparameter estimation, reporting results for the θ that maximised the respective approximate marginal likelihood p(D|θ). In order to get a better picture of the absolute performance we also compare to results obtained by C-SVM classification. The kernel we used is equivalent to the covariance function (1) without the signal variance parameter. For each fold the parameters C and ℓ are found in an inner loop of 5-fold cross-validation, in which the parameter grids are refined until the performance stabilises. Predictive probabilities for test cases are obtained by mapping the unthresholded output of the SVM to [0, 1] using a sigmoid function [8].

Results are summarised in Table 1. Comparing Laplace's method to EP, the latter proves to be more accurate both in terms of error rate and information.
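The 10-fold protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: `fit_and_score` is a hypothetical callback standing in for training (including model selection) on nine folds and scoring on the held-out fold, and the way folds are drawn here is one simple option, not necessarily the one used for the reported results.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Randomly partition range(n) into k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def cross_validate(n, fit_and_score, k=10, seed=0):
    """Average a performance measure over k held-out folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback: it trains
    on train_idx and returns a score measured on test_idx.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

The inner 5-fold loop used for the SVM parameters could reuse `k_fold_indices` with `k=5` on each `train_idx`.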
While the error rates are relatively similar, the predictive distribution obtained by EP is more informative about the test targets. Note that for GPC the error rate depends only on the sign of the mean µ∗ of the approximated posterior over latent functions and not on the entire posterior predictive distribution. As is to be expected, the length ‖m‖ of the mean vector is much larger for the EP approximations. Comparing EP and SVMs, the results are mixed. For the Crabs data set all methods show the same error rate, but the information content of the predictive distributions differs dramatically. For some test cases the SVM predicts the wrong class with large certainty.

5 Summary & Conclusions

Our experiments reveal serious differences between Laplace's method and EP when used in GPC models. From the structural properties of the posterior we described why LA systematically underestimates the mean m. The resulting posterior GP over latent functions will have too small an amplitude, although the sign of the mean function will be mostly correct. As a result, LA gives over-conservative predictive probabilities and diminished information about the test labels. This effect has been shown empirically on several real-world examples. Large resulting discrepancies in the actual posterior probabilities were found, even at the training locations, which renders the predictive class probabilities produced under this approximation grossly inaccurate. Note that the difference becomes less dramatic if we only consider the classification error rates obtained by thresholding p∗ at 1/2. For this particular task, we have seen that the sign of the latent function tends to be correct (at least at the training locations).
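A small numerical sketch may help to see why an underestimated mean yields over-conservative probabilities while leaving the thresholded decision unchanged. It assumes a probit (cumulative Gaussian) likelihood, which is not stated in this excerpt; under that assumption a Gaussian latent with mean µ∗ and variance σ∗² gives p∗ = Φ(µ∗ / sqrt(1 + σ∗²)). The numbers are hypothetical and are not taken from the experiments.

```python
from statistics import NormalDist

def probit_predictive(mu_star, var_star):
    """p(y* = +1) for a Gaussian latent N(mu_star, var_star) under a probit likelihood."""
    return NormalDist().cdf(mu_star / (1.0 + var_star) ** 0.5)

# Hypothetical latent predictions with the same sign and variance
# but different amplitude of the mean.
p_small_mean = probit_predictive(1.0, 4.0)   # shrunken mean
p_large_mean = probit_predictive(4.0, 4.0)   # larger mean

# Both probabilities exceed 1/2, so thresholding gives the same class,
# but the shrunken mean produces a probability closer to 1/2.
same_decision = (p_small_mean > 0.5) == (p_large_mean > 0.5)
```

Here `same_decision` is true, while `p_small_mean` is markedly closer to 1/2 than `p_large_mean`, which mirrors the contrast between error rate and information discussed above.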
                            Laplace                   EP                         SVM
Data Set       m     n      E%     I      ‖m‖        E%     I      ‖m‖          E%     I
Ionosphere     351   34     8.84   0.591  49.96      7.99   0.661  124.94       5.69   0.681
Wisconsin      683   9      3.21   0.804  62.62      3.21   0.805  84.95        3.21   0.795
Pima Indians   768   8      22.77  0.252  29.05      22.63  0.253  47.49        23.01  0.232
Crabs          200   7      2.0    0.682  112.34     2.0    0.908  2552.97      2.0    0.047
Sonar          208   60     15.36  0.439  26.86      13.85  0.537  15678.55     11.14  0.567
USPS 3 vs 5    1540  256    2.27   0.849  163.05     2.21   0.902  22011.70     2.01   0.918

Table 1: Results for benchmark data sets. The first three columns give the name of the data set, the number of observations m and the dimension of the inputs n. For Laplace's method and EP the table reports the average error rate E%, the average information I (2) and the average length ‖m‖ of the mean vector of the Gaussian approximation. For SVMs the error rate and the average information about the test targets are reported. Note that for the Crabs data set we use the sex (not the colour) of the crabs as class label.

The EP approximation has been shown to give results very close to MCMC, both in terms of predictive distributions and marginal likelihood estimates. We have shown and explained why the marginal distributions of the posterior can be well approximated by Gaussians. Further, the marginal likelihood values obtained by LA and EP differ systematically, which will lead to different results of ML-II hyperparameter estimation. The discrepancies are similar for different tasks. Using AIS we were able to show the accuracy of marginal likelihood estimates, which to the best of our knowledge has never been done before.

In summary, we found that EP is the method of choice for approximate inference in binary GPC models when the computational cost of MCMC is prohibitive. In contrast, the Laplace approximation is so inaccurate that we advise against its use, especially when predictive probabilities are to be taken seriously. Further experiments and a detailed description of the approximation schemes can be found in [2].
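The information column I of Table 1 can be illustrated with a short sketch. Equation (2) is not reproduced in this excerpt, so the code assumes a common form of the measure: the mean log2 predictive probability assigned to the true label, offset by a baseline entropy term so that uninformative predictions score zero. Both function names and the example numbers are hypothetical.

```python
import math

def label_entropy_bits(labels):
    """Entropy in bits of the empirical distribution of +1/-1 labels."""
    p = sum(1 for y in labels if y == 1) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mean_information_bits(p_pos, y_test, baseline_bits):
    """Average information about the test targets (assumed form, see lead-in).

    p_pos[i] is the predicted probability that y_test[i] == +1 and must lie
    strictly between 0 and 1; baseline_bits is the entropy offset.
    """
    total = 0.0
    for p, y in zip(p_pos, y_test):
        p_true = p if y == 1 else 1.0 - p   # probability given to the true label
        total += math.log2(p_true) + baseline_bits
    return total / len(y_test)

# Hypothetical balanced test set with three kinds of predictions.
y = [1, -1, 1, -1]
h = label_entropy_bits(y)
uninformative = mean_information_bits([0.5, 0.5, 0.5, 0.5], y, h)
confident_right = mean_information_bits([0.99, 0.01, 0.99, 0.01], y, h)
one_confident_error = mean_information_bits([0.99, 0.01, 0.99, 0.99], y, h)
```

Under this assumed definition a single confidently wrong prediction can pull the average below zero, which is consistent with the observation that confident errors sharply reduce I even when the error rate is unchanged.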
Acknowledgements

Both authors acknowledge support by the German Research Foundation (DFG) through grant RA 1030/1. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors' views.

References

[1] C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, NIPS 8, pages 514–520. MIT Press, 1996.
[2] M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. Journal of Machine Learning Research, 6:1679–1704, 2005.
[3] C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
[4] T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Department of Electrical Engineering and Computer Science, MIT, 2001.
[5] R. M. Neal. Regression and classification using Gaussian process priors. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 475–501. Oxford University Press, 1998.
[6] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.
[7] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. CUP, 2003.
[8] J. C. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61–73. The MIT Press, 2000.

4 0.28098238 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation

Author: Alan L. Yuille

Abstract: We show that linear generalizations of Rescorla-Wagner can perform Maximum Likelihood estimation of the parameters of all generative models for causal reasoning. Our approach involves augmenting variables to deal with conjunctions of causes, similar to the augmented model of Rescorla. Our results involve genericity assumptions on the distributions of causes. If these assumptions are violated, for example for the Cheng causal power theory, then we show that a linear Rescorla-Wagner can estimate the parameters of the model up to a nonlinear transformation. Moreover, a nonlinear Rescorla-Wagner is able to estimate the parameters directly to within arbitrary accuracy. Previous results can be used to determine convergence and to estimate convergence rates.

5 0.27903438 181 nips-2005-Spiking Inputs to a Winner-take-all Network

Author: Matthias Oster, Shih-Chii Liu

Abstract: Recurrent networks that perform a winner-take-all computation have been studied extensively. Although some of these studies include spiking networks, they consider only analog input rates. We present results of this winner-take-all computation on a network of integrate-and-fire neurons which receives spike trains as inputs. We show how we can configure the connectivity in the network so that the winner is selected after a pre-determined number of input spikes. We discuss spiking inputs with both regular frequencies and Poisson-distributed rates. The robustness of the computation was tested by implementing the winner-take-all network on an analog VLSI array of 64 integrate-and-fire neurons which have an innate variance in their operating parameters.

6 0.27871832 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

7 0.27871773 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

8 0.27582595 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

9 0.27489194 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

10 0.27460319 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

11 0.27441257 144 nips-2005-Off-policy Learning with Options and Recognizers

12 0.27401039 136 nips-2005-Noise and the two-thirds power Law

13 0.27235693 28 nips-2005-Analyzing Auditory Neurons by Learning Distance Functions

14 0.27189028 48 nips-2005-Context as Filtering

15 0.27150428 179 nips-2005-Sparse Gaussian Processes using Pseudo-inputs

16 0.27043229 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification

17 0.27033219 45 nips-2005-Conditional Visual Tracking in Kernel Space

18 0.27008933 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts

19 0.26964435 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

20 0.26958248 169 nips-2005-Saliency Based on Information Maximization