nips nips2001 nips2001-176 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Roman Genov, Gert Cauwenberghs
Abstract: A mixed-signal paradigm is presented for high-resolution parallel inner-product computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 µm CMOS.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: A mixed-signal paradigm is presented for high-resolution parallel inner-product computation in very high dimensions, suitable for efficient implementation of kernels in image processing. [sent-2, score-0.216]
2 At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. [sent-3, score-1.198]
3 Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. [sent-4, score-0.867]
4 The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 µm CMOS. [sent-6, score-0.832]
5 1 Introduction: Analog computational arrays [1, 2, 3, 4] for neural information processing offer very large integration density and throughput as needed for real-time tasks in computer vision and pattern recognition [5]. [sent-8, score-0.126]
6 Despite the success of adaptive algorithms and architectures in reducing the effect of analog component mismatch and noise on system performance [6, 7], the precision and repeatability of analog VLSI computation under process and environmental variations are inadequate for some applications. [sent-9, score-0.926]
7 Digital implementation [10] offers absolute precision limited only by wordlength, but at the cost of significantly larger silicon area and power dissipation compared with dedicated, fine-grain parallel analog implementation, e.g. [sent-10, score-0.653]
8 The largest gains in system precision are obtained for high input dimensions. [sent-14, score-0.135]
9 The framework allows operation at full digital resolution with relatively imprecise analog hardware, and with minimal cost in implementation complexity to randomize the input data. [sent-15, score-0.963]
10 The computational core of inner-product based kernel operations in image processing and pattern recognition is that of vector-matrix multiplication (VMM) in high dimensions: [sent-16, score-0.209]
11 $Y_m = \sum_{n=0}^{N-1} W_{mn} X_n$ (1), with $N$-dimensional input vector $X_n$, $M$-dimensional output vector $Y_m$, and matrix elements $W_{mn}$. [sent-17, score-0.105]
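To make the externally digital decomposition concrete, the sketch below rebuilds (1) from binary-binary partial products: each bit plane of the matrix is multiplied with each bit plane of the input (this is what the analog array computes), and the partial sums are recombined digitally with binary weights. This is a minimal numerical model in Python/NumPy assuming unsigned fractional bit weights; the function name and the exact bit-weighting convention are illustrative, not the chip's.

```python
import numpy as np

def vmm_bit_serial(W_bits, X_bits):
    """Hypothetical helper: rebuild the VMM from binary-binary partial products.

    W_bits: (I, M, N) binary matrix bit planes, MSB first.
    X_bits: (J, N)    binary input bit planes,  MSB first.
    """
    I, M, N = W_bits.shape
    J, _ = X_bits.shape
    Y = np.zeros(M)
    for i in range(I):
        for j in range(J):
            # Binary-binary partial product: the sum over n is what the
            # analog CID/DRAM array computes in parallel for all rows m.
            Y_partial = W_bits[i] @ X_bits[j]
            # Digital recombination with the appropriate binary weight.
            Y += Y_partial * 2.0 ** (-(i + 1) - (j + 1))
    return Y

# Consistency check against a direct multiply on random unsigned data.
rng = np.random.default_rng(0)
I = J = 4; M, N = 8, 128
W_bits = rng.integers(0, 2, size=(I, M, N))
X_bits = rng.integers(0, 2, size=(J, N))
W = sum(W_bits[i] * 2.0 ** (-(i + 1)) for i in range(I))
X = sum(X_bits[j] * 2.0 ** (-(j + 1)) for j in range(J))
assert np.allclose(vmm_bit_serial(W_bits, X_bits), W @ X)
```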
12 The elements also represent templates in a vector quantizer [8], or support vectors in a support vector machine [9]. [sent-19, score-0.083]
13 2.1 Internally Analog, Externally Digital Computation: The approach combines the computational efficiency of analog array processing with the precision of digital processing and the convenience of a programmable and reconfigurable digital interface. [sent-22, score-1.373]
14 Digital-to-analog conversion at the input interface is inherent in the bit-serial implementation, and row-parallel analog-to-digital converters (ADCs) are used at the output interface to quantize the analog array outputs. [sent-24, score-0.204]
15 A 512 × 128 array prototype using CID/DRAM cells is shown in Figure 1 (a). [sent-25, score-0.444]
16 2.2 CID/DRAM Cell and Array: The unit cell in the analog array combines a CID computational element [12, 13] with a DRAM storage element. [sent-27, score-0.869]
17 The cell stores one bit $w_{mn}^{(i)}$ of a matrix element, performs a one-quadrant binary-binary multiplication of $w_{mn}^{(i)}$ and $x_n^{(j)}$ in (5), and accumulates the result. (Footnote: radial basis kernels with an $L_1$-norm can also be formulated in inner-product format.) [sent-28, score-0.659]
18 Figure 1: (a) Micrograph of the Kerneltron prototype, containing an array of CID/DRAM cells, and a row-parallel bank of flash ADCs. [sent-29, score-0.329]
19 (b) Circuit diagram, and charge transfer diagram for active write and compute operations. [sent-33, score-0.407]
20 The result is accumulated across cells with common $m$ and $i$ indices. [sent-34, score-0.066]
21 The circuit diagram and operation of the cell are given in Figure 1 (b). [sent-35, score-0.186]
22 An array of cells thus performs (unsigned) binary-binary multiplication (5) of matrix and vector bit planes, yielding partial outputs for all rows $m$ in parallel across the array, and for the bit indices $i$ and $j$ in sequence over time. [sent-36, score-0.642]
23 The cell contains three MOS transistors connected in series as depicted in Figure 1 (b). [sent-37, score-0.198]
24 Transistors M1 and M2 comprise a dynamic random-access memory (DRAM) cell, with switch M1 controlled by the Row Select signal $RS_m^{(i)}$. [sent-38, score-0.04]
25 When activated, the binary quantity $w_{mn}^{(i)}$ is written in the form of charge (either a full charge packet or 0) stored under the gate of M2. [sent-39, score-0.437]
26 Transistors M2 and M3 in turn comprise a charge injection device (CID), which by virtue of charge conservation moves electric charge between two potential wells in a non-destructive manner [12, 13, 14]. [sent-40, score-0.712]
27 The charge left under the gate of M2 can only be redistributed between the two CID transistors, M2 and M3. [sent-41, score-0.273]
28 An active charge transfer from M2 to M3 can only occur if there is non-zero charge stored, and if the potential on the gate of M2 drops below that of M3 [12]. [sent-42, score-0.613]
29 The multiply-and-accumulate operation is then completed by capacitively sensing the amount of charge transferred onto the electrode of M3, the output summing node. [sent-46, score-0.323]
30 To this end, the voltage on the output line, left floating after being pre-charged to $V_{dd}/2$, is observed. [sent-47, score-0.137]
31 When the charge transfer is active, the cell contributes a change in voltage inversely proportional to the total capacitance on the output line across cells. [sent-48, score-0.532]
32 After deactivating the input $x_n^{(j)}$, the transferred charge returns to the storage node M2. [sent-50, score-0.348]
33 The bottom diagram in Figure 1 (b) depicts the charge transfer timing for the write and compute operations in the case when both $w_{mn}^{(i)}$ and $x_n^{(j)}$ are of logic level 1. [sent-53, score-0.41]
34 2.3 System-Level Performance: Measurements on the 512 × 128-element analog array and other fabricated prototypes show a dynamic range of 43 dB, and a computational cycle of 10 µs with power consumption of 50 nW per cell. [sent-55, score-0.816]
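As a back-of-envelope check on these figures (not the paper's own throughput number, which is truncated in this extraction): each cell performs one binary multiply-accumulate per computational cycle, so the quoted 50 nW per cell and 10 µs cycle imply roughly

```latex
E_{\mathrm{MAC}} \approx 50\,\mathrm{nW} \times 10\,\mu\mathrm{s} = 0.5\,\mathrm{pJ}
\quad\Rightarrow\quad
\frac{1}{E_{\mathrm{MAC}}} \approx 2\times 10^{12}\ \text{binary MACS per Watt},
\qquad
\frac{512\times 128\ \text{cells}}{10\,\mu\mathrm{s}} \approx 6.6\times 10^{9}\ \text{binary MACS for the prototype array}.
```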
35 The overall system resolution is limited by the precision in the quantization of the outputs from the analog array. [sent-57, score-0.719]
36 Through digital postprocessing, two bits are gained over the resolution of the ADCs used [15], for a total system resolution of 8 bits. [sent-58, score-0.774]
37 Larger resolutions can be obtained by accounting for the statistics of binary terms in the addition, the subject of the next section. [sent-59, score-0.071]
38 3 Resolution Enhancement Through Stochastic Encoding: Since the analog inner product (5) is discrete, zero error can be achieved (as if computed digitally) by matching the quantization levels of the ADC with each of the discrete levels in the inner product. [sent-60, score-1.043]
39 Perfect reconstruction of the inner product from the quantized output, at full resolution, assumes the combined effect of noise and nonlinearity in the analog array and the ADC is within one LSB (least significant bit). [sent-61, score-0.914]
40 For large arrays, this places stringent requirements on analog precision and ADC resolution. [sent-62, score-0.472]
41 The implicit assumption is that all quantization levels are (equally) needed. [sent-64, score-0.145]
42 A straightforward study of the statistics of the inner product, below, reveals that this is a poor use of the available resources. [sent-65, score-0.182]
43 3.1 Bernoulli Statistics: In what follows we assume signed, rather than unsigned, binary values for the inputs and weights. [sent-67, score-0.114]
44 This translates to exclusive-OR (XOR), rather than AND, multiplication on the analog array, an operation that can be easily accomplished with the CID/DRAM architecture by differentially coding input and stored bits using twice the number of columns and unit cells. [sent-68, score-0.929]
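The equivalence between XOR on bits and multiplication on signed values is a simple identity; the short sketch below (arbitrary test vectors, illustration only) verifies that the signed inner product equals $N$ minus twice the XOR count, which is what the differentially coded columns realize in charge.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
bw = rng.integers(0, 2, size=N)      # stored bits    (mapping: -1 -> 0, +1 -> 1)
bx = rng.integers(0, 2, size=N)      # presented bits (same mapping)
w = 2 * bw - 1                       # signed values in {-1, +1}
x = 2 * bx - 1

xor_count = int(np.sum(np.bitwise_xor(bw, bx)))   # number of disagreeing bit pairs
signed_inner_product = int(np.sum(w * x))         # the quantity we actually want
assert signed_inner_product == N - 2 * xor_count
```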
45 For input bits that are Bernoulli distributed (i.e., fair coin flips), the (XOR) product terms in (5) are Bernoulli distributed, regardless of the values of the stored bits. [sent-71, score-0.336]
46 Their sum thus follows a binomial distribution, $\Pr(Y = 2k - N) = \binom{N}{k}\,2^{-N}$, $k = 0, \ldots, N$ (6), which in the Central Limit approaches a normal distribution with zero mean and variance $N$. [sent-72, score-0.071]
47 In other words, for random inputs in high dimensions the active range (or standard deviation) of the inner product is $\sqrt{N}$, a factor of $\sqrt{N}$ smaller than the full range $N$. [sent-73, score-0.279]
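A quick Monte Carlo sketch of (6) and of the $\sqrt{N}$ active range, for illustration only (the dimension, template, and trial count are arbitrary choices, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 128, 20000
w = 2 * rng.integers(0, 2, size=N) - 1            # an arbitrary fixed template
x = 2 * rng.integers(0, 2, size=(trials, N)) - 1  # Bernoulli (fair-coin) inputs
Y = x @ w                                         # signed (XOR) inner products

print(Y.mean(), Y.std())                    # close to 0 and sqrt(N) ~= 11.3
print(np.mean(np.abs(Y) > 4 * np.sqrt(N)))  # fraction beyond 4 sigma: ~1e-4 or less
```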
48 In principle, this allows the effective resolution of the ADC to be relaxed. [sent-74, score-0.198]
49 However, any reduction in conversion range will result in a small but non-zero probability of overflow. [sent-75, score-0.166]
50 In practice, the risk of overflow can be reduced to negligible levels with a few additional bits in the ADC conversion range. [sent-76, score-0.358]
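A Gaussian-tail estimate makes the "few additional bits" statement quantitative. The sketch below is an approximation (the exact binomial tail is slightly smaller): it gives the overflow probability when the conversion range is clipped at $\pm 2^g$ standard deviations, i.e. with $g$ guard bits beyond the $\sqrt{N}$ active range.

```python
import math

def overflow_probability(guard_bits: int) -> float:
    """P(|Y| > 2**guard_bits * sqrt(N)) under the normal approximation to (6)."""
    c = 2.0 ** guard_bits
    return math.erfc(c / math.sqrt(2.0))

for g in range(1, 5):
    print(g, overflow_probability(g))
# Two guard bits already give roughly 6e-5; three give roughly 1e-15.
```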
51 An alternative strategy is to use a variable-resolution ADC which expands the conversion range on rare occurrences of overflow. [sent-77, score-0.328]
52 Or, with stochastic input encoding, overflow detection could initiate a different random draw. [sent-78, score-0.096]
53 Figure 2: Experimental results from the CID/DRAM analog array. [sent-87, score-0.383]
54 (a) Output voltage on the sense line computing exclusive-or inner product of 64-dimensional stored and presented binary vectors. [sent-88, score-0.515]
55 A variable number of active bits is summed at different locations in the array by shifting the presented bits. [sent-89, score-0.626]
56 (b) Top: Measured output and actual inner product for 1,024 samples of Bernoulli distributed pairs of stored and presented vectors. [sent-90, score-0.425]
57 3.2 Experimental Results: While the reduced range of the analog inner product supports lower ADC resolution in terms of the number of quantization levels, it requires low levels of mismatch and noise so that the discrete levels can be individually resolved near the center of the distribution. [sent-93, score-1.161]
58 Figure 2 shows the measured outputs on one row of 128 CID/DRAM cells, configured differentially to compute signed binary (exclusive-OR) inner products of stored and presented binary vectors in 64 dimensions. [sent-95, score-0.566]
59 The scope trace in Figure 2 (a) is obtained by storing all bits at one value, and shifting a sequence of input bits that differ from the stored bits in a variable number of positions. [sent-96, score-0.604]
60 The left and right segments of the scope trace correspond to different selections of active bit locations along the array that are maximally disjoint, to indicate a worst-case mismatch scenario. [sent-97, score-0.579]
61 The measured and actual inner products in Figure 2 (b) are obtained by storing and presenting 1,024 pairs of random binary vectors. [sent-98, score-0.364]
62 The histogram shows a clearly resolved, discrete binomial distribution for the observed analog voltage. [sent-99, score-0.454]
63 For very large arrays, mismatch and noise may pose a problem in the present implementation with a floating sense line. [sent-100, score-0.138]
64 A sense amplifier with virtual ground on the sense line and a feedback capacitor optimized to the reduced output range would provide a simple solution. [sent-101, score-0.067]
65 3.3 Real Image Data: Although most randomly selected patterns do not correlate with any chosen template, patterns from the real world tend to correlate, and certainly those that are of interest to kernel computation (footnote 3). [sent-103, score-0.04]
66 The key is stochastic encoding of the inputs, so as to randomize the bits presented to the analog array. [sent-104, score-0.758]
67 (Footnote 3) This observation, and the binomial distribution for sums of random bits (6), form the basis for associative recall in a Kanerva memory. [sent-105, score-0.27]
68 (Figure 3 caption) Left: with unmodulated 8-bit image data for both vectors. [sent-107, score-0.071]
69 Right: with 12-bit modulated stochastic encoding of one of the two vectors. [sent-108, score-0.155]
70 Randomizing an informative input while retaining the information is a futile goal, and we are content with a solution that approaches the ideal performance within observable bounds, and with reasonable cost in implementation. [sent-111, score-0.121]
71 Given that "ideal" randomized inputs relax the ADC resolution by a number of bits, they necessarily reduce the wordlength of the output by the same amount. [sent-112, score-0.334]
72 To account for the lost bits in the range of the output, it is necessary to increase the range of the “ideal” randomized input by the same number of bits. [sent-113, score-0.413]
73 One possible stochastic encoding scheme that restores the range is oversampling of the input through (digital) delta-sigma modulation. [sent-114, score-0.235]
74 For each input component (8-bit in the experiments below), pick a random integer in an expanded range, and subtract it to produce a modulated input with additional bits. [sent-116, score-0.192]
75 It can be shown that for worst-case deterministic inputs the mean of the inner product of the modulated data is off from the origin by at most a bounded amount. [sent-117, score-0.316]
76 The desired inner products for the original inputs are retrieved by digitally adding back the inner products of the random offsets with the templates. [sent-118, score-0.314]
77 The random offset can be chosen once, so its inner products with the templates can be pre-computed upon initializing or programming the array. [sent-119, score-0.569]
78 The implementation cost is thus limited to component-wise subtraction of the random offset from the input, achieved using one full-adder cell, one bit register, and ROM storage of the offset bits for every column of the array. [sent-120, score-0.419]
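An idealized integer model of this encoding and recovery path is sketched below; array quantization and bit-serial presentation are omitted, and the specific ranges simply mirror the 8-bit / 15x example in the text (all other values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 16, 256
W = rng.integers(0, 256, size=(M, N))      # 8-bit templates (hypothetical values)
x = rng.integers(0, 256, size=N)           # 8-bit, possibly highly correlated, input

# Stochastic encoding: subtract a wider-range random offset from each component.
r = rng.integers(0, 15 * 256, size=N)      # offset range ~15x the input range (as in Fig. 3)
x_mod = x - r                              # ~12-bit signed modulated input fed to the array

Y_mod = W @ x_mod                          # what the (idealized) array would compute
Y_offset = W @ r                           # pre-computed once when the templates are loaded
assert np.array_equal(Y_mod + Y_offset, W @ x)   # exact digital recovery of the products
```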
79 Figure 3 provides a proof of principle, using image data selected at random from the Lena image. [sent-122, score-0.071]
80 12-bit stochastic encoding of the 8-bit image, by subtracting a random variable in a range 15 times larger than the image, produces the desired binomial distribution for the partial bit inner products, even for the most significant bit (MSB), which is most highly correlated. [sent-123, score-0.709]
81 4 Conclusions: We presented an externally digital, internally analog VLSI array architecture suitable for real-time kernel-based neural computation and machine learning in very large dimensions, such as image recognition. [sent-124, score-1.011]
82 Fine-grain massive parallelism and distributed memory, in an array of 3-transistor CID/DRAM cells, provide a high throughput of binary MACS (multiply-and-accumulates per second) per Watt of power in 0.5 µm CMOS technology. [sent-125, score-0.524]
83 A simple stochastic encoding scheme relaxes precision requirements in the analog implementation by one bit for each four-fold increase in vector dimension, while retaining full digital overall system resolution. [sent-127, score-1.063]
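The four-fold figure follows directly from the $\sqrt{N}$ active range: the full range of the inner product needs about $\log_2 N$ bits, whereas the randomized range needs only about half that (plus a fixed number of guard bits), so the saving grows by one bit whenever $N$ quadruples:

```latex
\Delta b(N) \;\approx\; \underbrace{\log_2 N}_{\text{full range}}
  \;-\; \underbrace{\tfrac{1}{2}\log_2 N}_{\text{active range}}
  \;=\; \tfrac{1}{2}\log_2 N,
\qquad
\Delta b(4N) - \Delta b(N) \;=\; \tfrac{1}{2}\log_2 4 \;=\; 1\ \text{bit}.
```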
84 Long, "Programmable analog vector-matrix multipliers," IEEE Journal of Solid-State Circuits, vol. [sent-144, score-0.383]
85 Chiang, “A programmable CCD signal processor,” IEEE Journal of Solid-State Circuits, vol. [sent-179, score-0.07]
86 Yariv, “Pattern matching and parallel processing with CCD technology,” Proc. [sent-192, score-0.078]
wordName wordTfidf (topN-words)
[('analog', 0.383), ('array', 0.329), ('digital', 0.251), ('charge', 0.224), ('bits', 0.199), ('adc', 0.188), ('inner', 0.182), ('resolution', 0.162), ('cid', 0.161), ('dram', 0.134), ('cell', 0.119), ('bit', 0.115), ('vmm', 0.107), ('vlsi', 0.102), ('conversion', 0.099), ('multiplication', 0.098), ('stored', 0.093), ('product', 0.091), ('precision', 0.089), ('bernoulli', 0.089), ('externally', 0.085), ('quantization', 0.085), ('unsigned', 0.081), ('transistors', 0.079), ('ih', 0.079), ('voltage', 0.078), ('products', 0.078), ('parallel', 0.078), ('circuits', 0.075), ('cauwenberghs', 0.075), ('architecture', 0.073), ('encoding', 0.072), ('binary', 0.071), ('binomial', 0.071), ('mismatch', 0.071), ('image', 0.071), ('internally', 0.07), ('programmable', 0.07), ('throughput', 0.07), ('diagram', 0.067), ('implementation', 0.067), ('range', 0.067), ('cells', 0.066), ('active', 0.064), ('count', 0.064), ('levels', 0.06), ('pp', 0.059), ('vdd', 0.059), ('output', 0.059), ('arrays', 0.056), ('accumulates', 0.054), ('adcs', 0.054), ('ccd', 0.054), ('digitally', 0.054), ('genov', 0.054), ('kerneltron', 0.054), ('msb', 0.054), ('neugebauer', 0.054), ('pedroni', 0.054), ('randomize', 0.054), ('xor', 0.054), ('transfer', 0.052), ('stochastic', 0.05), ('prototype', 0.049), ('gate', 0.049), ('yariv', 0.047), ('ijcnn', 0.047), ('quantizer', 0.047), ('input', 0.046), ('inputs', 0.043), ('oating', 0.043), ('vout', 0.043), ('correlate', 0.04), ('core', 0.04), ('cmos', 0.04), ('comprise', 0.04), ('processor', 0.04), ('transferred', 0.04), ('norwell', 0.04), ('quantized', 0.04), ('ideal', 0.039), ('storage', 0.038), ('dimensions', 0.038), ('differentially', 0.037), ('dedicated', 0.037), ('resolved', 0.037), ('ow', 0.037), ('fabricated', 0.037), ('partial', 0.037), ('retaining', 0.036), ('relax', 0.036), ('silicon', 0.036), ('templates', 0.036), ('shifting', 0.034), ('enhancement', 0.034), ('signed', 0.034), ('randomized', 0.034), ('storing', 0.033), ('rs', 0.033), ('modulated', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Author: Roman Genov, Gert Cauwenberghs
Abstract: A mixed-signal paradigm is presented for high-resolution parallel innerproduct computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 m CMOS. ¢
2 0.14781618 34 nips-2001-Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology
Author: Toshihiko Yamasaki, Tadashi Shibata
Abstract: A flexible pattern-matching analog classifier is presented in conjunction with a robust image representation algorithm called Principal Axes Projection (PAP). In the circuit, the functional form of matching is configurable in terms of the peak position, the peak height and the sharpness of the similarity evaluation. The test chip was fabricated in a 0.6-µm CMOS technology and successfully applied to hand-written pattern recognition and medical radiograph analysis using PAP as a feature extraction pre-processing step for robust image coding. The separation and classification of overlapping patterns is also experimentally demonstrated. 1 I ntr o du c ti o n Pattern classification using template matching techniques is a powerful tool in implementing human-like intelligent systems. However, the processing is computationally very expensive, consuming a lot of CPU time when implemented as software running on general-purpose computers. Therefore, software approaches are not practical for real-time applications. For systems working in mobile environment, in particular, they are not realistic because the memory and computational resources are severely limited. The development of analog VLSI chips having a fully parallel template matching architecture [1,2] would be a promising solution in such applications because they offer an opportunity of low-power operation as well as very compact implementation. In order to build a real human-like intelligent system, however, not only the pattern representation algorithm but also the matching hardware itself needs to be made flexible and robust in carrying out the pattern matching task. First of all, two-dimensional patterns need to be represented by feature vectors having substantially reduced dimensions, while at the same time preserving the human perception of similarity among patterns in the vector space mapping. For this purpose, an image representation algorithm called Principal Axes Projection (PAP) has been de- veloped [3] and its robust nature in pattern recognition has been demonstrated in the applications to medical radiograph analysis [3] and hand-written digits recognition [4]. However, the demonstration so far was only carried out by computer simulation. Regarding the matching hardware, high-flexibility analog template matching circuits have been developed for PAP vector representation. The circuits are flexible in a sense that the matching criteria (the weight to elements, the strictness in matching) are configurable. In Ref. [5], the fundamental characteristics of the building block circuits were presented, and their application to simple hand-written digits was presented in Ref. [6]. The purpose of this paper is to demonstrate the robust nature of the hardware matching system by experiments. The classification of simple hand-written patterns and the cephalometric landmark identification in gray-scale medical radiographs have been carried out and successful results are presented. In addition, multiple overlapping patterns can be separated without utilizing a priori knowledge, which is one of the most difficult problems at present in artificial intelligence. 2 I ma g e re pr es e n tati on by P AP PAP is a feature extraction technique using the edge information. The input image (64x64 pixels) is first subjected to pixel-by-pixel spatial filtering operations to detect edges in four directions: horizontal (HR); vertical (VR); +45 degrees (+45); and –45 degrees (-45). 
Each detected edge is represented by a binary flag and four edge maps are generated. The two-dimensional bit array in an edge map is reduced to a one-dimensional array of numerals by projection. The horizontal edge flags are accumulated in the horizontal direction and projected onto vertical axis. The vertical, +45-degree and –45-degree edge flags are similarly projected onto horizontal, -45-degree and +45-degree axes, respectively. Therefore the method is called “Principal Axes Projection (PAP)” [3,4]. Then each projection data set is series connected in the order of HR, +45, VR, -45 to form a feature vector. Neighboring four elements are averaged and merged to one element and a 64-dimensional vector is finally obtained. This vector representation very well preserves the human perception of similarity in the vector space. In the experiments below, we have further reduced the feature vector to 16 dimensions by merging each set of four neighboring elements into one, without any significant degradation in performance. C i r cui t c o nf i g ura ti ons A B C VGG A B C VGG IOUT IOUT 1 1 2 2 4 4 1 VIN 13 VIN RST RST £ ¡ ¤¢ £ ¥ §¦ 3 Figure 1: Schematic of vector element matching circuit: (a) pyramid (gain reduction) type; (b) plateau (feedback) type. The capacitor area ratio is indicated in the figure. The basic functional form of the similarity evaluation is generated by the shortcut current flowing in a CMOS inverter as in Refs. [7,8,9]. However, their circuits were utilized to form radial basis functions and only the peak position was programmable. In our circuits, not only the peak position but also the peak height and the sharpness of the peak response shape are made configurable to realize flexible matching operations [5]. Two types of the element matching circuit are shown in Fig. 1. They evaluate the similarity between two vector elements. The result of the evaluation is given as an output current (IOUT ) from the pMOS current mirror. The peak position is temporarily memorized by auto-zeroing of the CMOS inverter. The common-gate transistor with VGG stabilizes the voltage supply to the inverter. By controlling the gate bias VGG, the peak height can be changed. This corresponds to multiplying a weight factor to the element. The sharpness of the functional form is taken as the strictness of the similarity evaluation. In the pyramid type circuit (Fig. 1(a)), the sharpness is controlled by the gain reduction in the input. In the plateau type (Fig. 1(b)), the output voltage of the inverter is fed back to input nodes and the sharpness changes in accordance with the amount of the feedback. ¥£¡ ¦¤¢ £¨ 9&% ¦©§ (!! #$ 5 !' #$ &% 9 9 4 92 !¦ A1@9 ¨¥ 5 4 52 (! 5 8765 9) 0 1 ¥ 1 ¨
Author: Takashi Morie, Tomohiro Matsuura, Makoto Nagata, Atsushi Iwata
Abstract: This paper describes a clustering algorithm for vector quantizers using a “stochastic association model”. It offers a new simple and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the “neural gas” algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
4 0.11556991 112 nips-2001-Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Author: Aaron P. Shon, David Hsu, Chris Diorio
Abstract: We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon. 1 I n tro d u cti o n Computation with conditional probabilities and correlations underlies many models of neurally inspired information processing. For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. Boltzmann machine synapses learn the difference between the correlations of pairs of neurons in the sleep and wake phase [2]. In most neural models, computation and adaptation occurs at the synaptic level. Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and post-synaptic signals can be a key part of many silicon neural-learning architectures. We have designed and implemented a silicon synapse, in a 0.35µm CMOS process, that learns a synaptic weight that corresponds to the conditional probability or correlation between binary input and feedback signals. This circuit utilizes floating-gate transistors to provide both nonvolatile storage and weight adaptation mechanisms [3]. In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation. Our circuit improves upon previous implementations of floating-gate based learning synapses [3,4,5] in several ways. First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. We demon- strate that our synapse can learn either the conditional probability or the correlation between input and feedback signals. Consequently, we can implement a wide range of synaptic learning networks with our circuit. Second, unlike the general correlational learning synapse proposed by Hasler et. al. [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity that occur at different times. Learning algorithms that employ time-separated correlations include both temporal difference learning [6] and recently postulated temporally asymmetric Hebbian learning [7]. Hasler’s correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals that occur at different times. Because signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. Mismatch between the same adaptation mechanisms on different floating-gate transistors limits the accuracy of learning rules based on these devices. 
This problem has been noted in previous circuits that use floating-gate adaptation [4,8]. In our circuit, different synapses can learn widely divergent weights from the same inputs because of component mismatch. We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. To our knowledge, this circuit is the first instance of a floating-gate learning circuit that includes this feature. This paper is organized as follows. First, we provide a brief introduction to floating-gate transistors. Next, we provide a description and analysis of our synapse, demonstrating that it can learn the conditional probability or correlation between a pair of binary signals. We then describe the calibration circuitry and show its effectiveness in compensating for adaptation mismatches. Finally, we discuss how this synapse can be used for silicon implementations of various learning networks. 2 Floating-gate transistors Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. A floating-gate transistor (e.g. transistor M3 of Fig.1(a)) comprises a MOSFET whose gate is isolated on all sides by SiO2. A control gate capacitively couples signals to the floating gate. Charge stored on the floating gate implements a nonvolatile analog weight; the transistor’s output current varies with both the floating-gate voltage and the control-gate voltage. We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig.1(a). We inject by imposing more than about 3V across the drain and source of transistor M3. The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interfere with circuit operation. Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]: I tun = I tun 0 exp (Vtun − V fg ) / Vχ (1) where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. Over a wide range of transistor drain and source voltages, we can approximate the magnitude of the injection current Iinj as [4]: 1−U t / Vγ I inj = I inj 0 I s exp ( (Vs − Vd ) / Vγ ) (2) where Vs and Vd are the drain and source voltages, Iinj0 is a pre-exponential current, Vγ is a constant that depends on the VLSI process, and Ut is the thermal voltage kT/q. 3 T h e s i l i co n s y n a p s e We show our silicon synapse in Fig.1. The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(XcorY). Xin is analogous to a presynaptic input, while Y is analogous to a postsynaptic signal or error feedback. Xcor is a presynaptic adaptation signal, and typically has some relationship with Xin. We can implement different learning rules by altering the relationship between Xcor and Xin. For some examples, see section 4. We now describe the circuit in more detail. The drain current of floating-gate transistor M4 represents the weight value W. Because the control gate of M4 is fixed, W depends solely on the charge on floating-gate capacitor C1. We can switch the drain current on or off using transistor M7; this switching action corresponds to a multiplication of the weight value W by a binary input signal, Xin. 
We choose values for the drain voltage of the M4 to prevent injection. A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. Simultaneously high input signals Xcor and Y cause injection, increasing the weight. A high Vtun causes tunneling, decreasing the weight. We either choose to correlate a high Vtun with signal Y or provide a fixed high Vtun throughout the adaptation process. The choice determines whether the circuit learns a conditional probability or a correlation, respectively. Because the drain current sourced by M4 provides is the weight W, we can express W in terms of M4’s floating-gate voltage, Vfg. Vfg includes the effects of both the fixed controlgate voltage and the variable floating-gate charge. The expression differs depending on whether the readout transistor is operating in the subthreshold or above-threshold regime. We provide both expressions below: I 0 exp( − κ 2V fg /(1 + κ )U t ) W= κ V fg (1 + κ ) 2 β V0 − below threshold 2 (3) above threshold Here V0 is a constant that depends on the threshold voltage and on Vdd, Ut is the thermal voltage kT/q, κ is the floating-gate-to-channel coupling coefficient, and I 0 is a fixed bias current. Eq. 3 shows that W depends solely on Vfg, (all the other factors are constants). These equations differ slightly from standard equations for the source current through a transistor due to source degeneration caused by M 4. This degeneration smoothes the nonlinear relationship between Vfg and Is; its addition to the circuit is optional. 3.1 Weight adaptation Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. In this section, we show that these mechanisms enable our circuit to learn the correlation or conditional probability between inputs Xcor (which we will refer to as X) and Y. Our analysis assumes that these statistics are fixed over some period during which adaptation occurs. The change in floating-gate voltage, and hence the weight, discussed below should therefore be interpreted in terms of the expected weight change due to the statistics of the inputs. We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead. We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, Vb Vtun M1 W eq (nA) 80 M2 60 40 C1 Xcor M4 M3 W M5 Xin Y o chip data − fit: P(X|Y)0.78 20 M6 0 M7 synaptic output 0.2 0.4 0.6 Pr(X|Y) 1 0.8 (b) 3.5 Fig. 1. (a) Synapse schematic. (b) Plot of equilibrium weight in the subthreshold regime versus the conditional probability P(X|Y), showing both experimental chip data and a fit from Eq.7 (c). Plot of equilibrium weight versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq.7. W eq (µA) (a). 3 2.5 2 0 o chip data − fit 0.2 0.4 0.6 Pr(X|Y) 0.8 1 (c) and Vd of M3. We assume that transistor M1 is in saturation, constraining Is at M3 to be constant. Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). By Eq.2 , the amount of the injection is also dependent on M3’s source voltage Vs. 
Because M3 is constrained to a fixed channel current, a drop in the floating-gate voltage, ∆Vfg, causes a drop in Vs of magnitude κ∆Vfg. Substituting these expressions into Eq.2 results in a floating-gate voltage update of: (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp(κ Vfg / Vγ ) (4) where Iinj0 also includes the constant source current. Eq.4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y). Next we analyze the effects of tunneling on the floating-gate voltage. The origin of the tunneling signal determines whether the synapse is learning a conditional probability or a correlation. If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y, and the floating-gate voltage. (dV fg / dt )tun = I tun 0 Pr(Y ) exp(−V fg / Vχ ) (5) Eq.5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y. 3.2 Weight equilibrium To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). The equilibrium weight of the synapse is the weight value where the expected weight change over time equals zero. This weight value corresponds to the floating-gate voltage where injection and tunneling currents are equal. To find this voltage, we equate Eq’s. 4 and 5 and solve: eq V fg = I inj 0 −1 log Pr( X | Y ) + log I tun 0 (κ / Vy + 1/ Vx ) (6) To derive the equilibrium weight, we substitute Eq.6 into Eq.3 and solve: I0 Weq = I inj 0 I tun 0 β V0 + η log where α = α Pr( X | Y ) I inj 0 I tun 0 below threshold 2 + log ( Pr( X | Y ) ) above threshold (7) κ2 κ2 and η = . (1 + κ )U t (κ / Vγ + 1/ Vχ ) (1 + κ )(κ / Vγ + 1/ Vχ ) Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. Note that the equilibrium weight is stable because of negative feedback in the tunneling and injection processes. Therefore, the weight will always converge to the equilibrium value shown in Eq.7. Figs. 1(b) and (c) show the equilibrium weight versus the conditional P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq.7. Note that both the sub- and above-threshold relationship between P(X|Y) and the equilibrium weight enables us to compute the probability of a vector of synaptic inputs X given a post-synaptic response Y. In both cases, we can apply the outputs currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, resulting in a voltage that is a linear function of log P(X|Y). 3.3 Calibration circuitry Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. The effect of this mismatch on our synapses causes the weight equilibrium of different synapses to differ by a multiplicative gain. Fig.2 (b) shows the equilibrium weights of an array of six synapses exposed to identical input signals. 
The variation of the synaptic weights is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms. We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. Because the dependence of the equilibrium weight on these constants is determined by the ratio of Iinj0/Itun0, our calibration process changes Iinj to equalize the ratio of injection to tunneling across all synapses. We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. Our calibration procedure is a self-convergent memory write [11], that causes the equilibrium weight of every synapse to equal the current Ical. Calibration requires many operat- 80 Verase M1 M8 60 W eq (nA) Vb M2 Vtun 40 M3 M4 M9 V cal 20 M5 0 M7 M6 synaptic output 0.2 Ical 0.6 P(X|Y) 0.8 1 0.4 0.6 P(X|Y) 0.8 1 0.4 (b) 80 (a) Fig. 2. (a) Schematic of calibrated synapse with signals used during the calibration procedure. (b) Equilibrium weights for array of synapses shown in Fig.1a. (c) Equilibrium weights for array of calibrated synapses after calibration. W eq (nA) 60 40 20 0 0.2 (c) ing cycles, where, during each cycle, we first increase the equilibrium weight of the synapse, and second, we let the synapse adapt to the new equilibrium weight. We create the calibrated synapse by modifying our original synapse according to Fig. 2(a). We convert M1 into a floating-gate transistor, whose floating-gate charge thereby sets M3’s channel current, providing control of Iinj0 of Eq.7. Transistor M8 modifies M1’s gate charge by means of injection when M9’s gate is low and Vcal is low. M9’s gate is only low when the equilibrium weight W is less than Ical. During calibration, injection and tunneling on M3 are continuously active. We apply a pulse train to Vcal; during each pulse period, Vcal is predominately high. When Vcal is high, the synapse adapts towards its equilibrium weight. When Vcal pulses low, M8 injects, increasing the synapse’s equilibrium weight W. We repeat this process until the equilibrium weight W matches Ical, causing M9’s gate voltage to rise, disabling Vcal and with it injection. To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. Fig.2(c) shows the equilibrium weights of six synapses after calibration. The data show that calibration can reduce the effect of mismatched adaptation on the synapse’s learned weight to a small fraction of the weight itself. Because M1 is a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence between M1’s drain voltage and source current. Consequently, M3’s floatinggate voltage now affects its source current (through M1’s drain voltage), and we can model M3 as a source-degenerated pFET [3]. The new expression for the injection current in M3 is: Presynaptic neuron W+ Synapse W− X Y Injection Postsynaptic neuron Injection Activation window Fig. 3. A method for achieving spike-time dependent plasticity in silicon. (dV fg / dt )inj = − I inj 0 Pr( X , Y ) exp Vfg κ Vγ − κ k1 Ut (8) where k1 is close to zero. The new expression for injection slightly changes the α and η terms of the weight equilibrium in Eq.7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same. 
4 Implementing silicon synaptic learning rules In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit. We can use our circuit to implement a Hebbian learning rule. Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic Y is analogous to homosynaptic LTD. In our synapse, we tie Xin and Xcor together and correlate Vtun with Y. Our synapse is also capable of emulating a Boltzmann weight-update rule [2]. This weight-update rule derives from the difference between correlations among neurons when the network receives external input, and when the network operates in a free running phase (denoted as clamped and unclamped phases respectively). With weight decay, a Boltzmann synapse learns the difference between correlations in the clamped and unclamped phase. We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. We do this by sending Xin to Xcor of one synapse in the clamped phase, and sending Xin to Xcor of the other synapse in the negative phase. Vtun remains constant throughout adaptation. Finally, we consider implementing a temporally asymmetric Hebbian learning rule [7] using our synapse. In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference in the weights of the two circuit. We show the circuit in Fig. 3. Each neuron sends two signals: a neuronal output, and an adaptation time window that is active for some time afterwards. Therefore, the combined synapse receives two presynaptic signals and two postsynaptic signals. The relative timing of a postsynaptic response, Y, with the presynaptic input, X, determines whether the synapse undergoes LTP or LTD. If Y occurs before X, Y’s time window correlates with X, causing injection on the negative synapse, decreasing the weight. If Y occurs after X, Y correlates with X’s time window, causing injection on the positive synapse, increasing the weight. Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning. 5 Conclusion We have described a silicon synapse that implements a wide range of spike-based learning rules, and that does not suffer from device mismatch. We have also described how we can implement various silicon-learning networks using this synapse. In addition, although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals. One possible avenue for future work is to analyze the implications of different pulse-coded schemes on the circuit’s adaptive behavior. A c k n o w l e d g e me n t s This work was supported by the National Science Foundation and by the Office of Naval Research. Aaron Shon was also supported by a NDSEG fellowship. We thank Anhai Doan and the anonymous reviewers for helpful comments. References [1] W.B.Levy, “A computational approach to hippocampal function,” in R.D. Hawkins and G.H. 
Bower (eds.), Computational Models of Learning in Simple Neural Systems, The Psychology of Learning and Motivation vol. 23, pp. 243-305, San Diego, CA: Academic Press, 1989. [2] D. H. Ackley, G. Hinton, and T. Sejnowski, “A learning algorithm for Boltzmann machines,” Cognitive Science vol. 9, pp. 147-169, 1985. [3 ] P. Hasler, B. A. Minch, J. Dugger, and C. Diorio, “Adaptive circuits and synapses using pFET floating-gate devices, ” in G. Cauwenberghs and M. Bayoumi (eds.) Learning in Silicon, pp. 33-65, Kluwer Academic, 1999. [4] P. Hafliger, A spike-based learning rule and its implementation in analog hardware, Ph.D. thesis, ETH Zurich, 1999. [5] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, “A floating-gate MOS learning array with locally computer weight updates,” IEEE Transactions on Electron Devices vol. 44(12), pp. 2281-2289, 1997. [6] R. Sutton, “Learning to predict by the methods of temporal difference,” Machine Learning, vol. 3, p p . 9-44, 1988. [7] H.Markram, J. Lübke, M. Frotscher, and B. Sakmann, “Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs,” Science vol. 275, pp.213-215, 1997. [8] A. Pesavento, T. Horiuchi, C. Diorio, and C. Koch, “Adaptation of current signals with floating-gate circuits,” in Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems (Microneuro99), pp. 128-134, 1999. [9] M. Lenzlinger and E. H. Snow. “Fowler-Nordheim tunneling into thermally grown SiO2,” Journal of Applied Physics vol. 40(1), p p . 278--283, 1969. [10] E. Takeda, C. Yang, and A. Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995. [11] C. Diorio, “A p-channel MOS synapse transistor with self-convergent memory writes,” IEEE Journal of Solid-State Circuits vol. 36(5), pp. 816-822, 2001.
5 0.10784999 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for realvalued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to. 1
6 0.097221956 49 nips-2001-Citcuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning
7 0.091139071 164 nips-2001-Sampling Techniques for Kernel Methods
8 0.085347563 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons
9 0.077211864 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
10 0.073997773 155 nips-2001-Quantizing Density Estimators
11 0.070720948 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
12 0.064384997 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
13 0.06394206 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
14 0.063469268 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
15 0.058158517 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
16 0.054827642 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
17 0.050604288 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
18 0.049548108 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds
19 0.047215931 73 nips-2001-Eye movements and the maturation of cortical orientation selectivity
20 0.045368057 43 nips-2001-Bayesian time series classification
topicId topicWeight
[(0, -0.164), (1, -0.072), (2, -0.076), (3, -0.027), (4, 0.006), (5, 0.019), (6, -0.026), (7, 0.03), (8, -0.036), (9, -0.031), (10, -0.008), (11, -0.141), (12, 0.15), (13, 0.042), (14, -0.069), (15, 0.233), (16, -0.026), (17, 0.037), (18, 0.14), (19, 0.243), (20, -0.006), (21, 0.022), (22, 0.069), (23, -0.076), (24, -0.011), (25, 0.064), (26, -0.059), (27, -0.003), (28, 0.043), (29, -0.039), (30, -0.029), (31, -0.003), (32, -0.153), (33, -0.004), (34, -0.049), (35, 0.163), (36, -0.083), (37, 0.106), (38, 0.039), (39, -0.028), (40, -0.01), (41, -0.089), (42, 0.031), (43, -0.1), (44, 0.075), (45, -0.009), (46, -0.086), (47, -0.098), (48, 0.018), (49, -0.163)]
simIndex simValue paperId paperTitle
same-paper 1 0.97091264 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Author: Roman Genov, Gert Cauwenberghs
Abstract: A mixed-signal paradigm is presented for high-resolution parallel innerproduct computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 m CMOS. ¢
2 0.77527606 34 nips-2001-Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology
Author: Toshihiko Yamasaki, Tadashi Shibata
Abstract: A flexible pattern-matching analog classifier is presented in conjunction with a robust image representation algorithm called Principal Axes Projection (PAP). In the circuit, the functional form of matching is configurable in terms of the peak position, the peak height and the sharpness of the similarity evaluation. The test chip was fabricated in a 0.6-µm CMOS technology and successfully applied to hand-written pattern recognition and medical radiograph analysis using PAP as a feature extraction pre-processing step for robust image coding. The separation and classification of overlapping patterns is also experimentally demonstrated. 1 I ntr o du c ti o n Pattern classification using template matching techniques is a powerful tool in implementing human-like intelligent systems. However, the processing is computationally very expensive, consuming a lot of CPU time when implemented as software running on general-purpose computers. Therefore, software approaches are not practical for real-time applications. For systems working in mobile environment, in particular, they are not realistic because the memory and computational resources are severely limited. The development of analog VLSI chips having a fully parallel template matching architecture [1,2] would be a promising solution in such applications because they offer an opportunity of low-power operation as well as very compact implementation. In order to build a real human-like intelligent system, however, not only the pattern representation algorithm but also the matching hardware itself needs to be made flexible and robust in carrying out the pattern matching task. First of all, two-dimensional patterns need to be represented by feature vectors having substantially reduced dimensions, while at the same time preserving the human perception of similarity among patterns in the vector space mapping. For this purpose, an image representation algorithm called Principal Axes Projection (PAP) has been de- veloped [3] and its robust nature in pattern recognition has been demonstrated in the applications to medical radiograph analysis [3] and hand-written digits recognition [4]. However, the demonstration so far was only carried out by computer simulation. Regarding the matching hardware, high-flexibility analog template matching circuits have been developed for PAP vector representation. The circuits are flexible in a sense that the matching criteria (the weight to elements, the strictness in matching) are configurable. In Ref. [5], the fundamental characteristics of the building block circuits were presented, and their application to simple hand-written digits was presented in Ref. [6]. The purpose of this paper is to demonstrate the robust nature of the hardware matching system by experiments. The classification of simple hand-written patterns and the cephalometric landmark identification in gray-scale medical radiographs have been carried out and successful results are presented. In addition, multiple overlapping patterns can be separated without utilizing a priori knowledge, which is one of the most difficult problems at present in artificial intelligence. 2 I ma g e re pr es e n tati on by P AP PAP is a feature extraction technique using the edge information. The input image (64x64 pixels) is first subjected to pixel-by-pixel spatial filtering operations to detect edges in four directions: horizontal (HR); vertical (VR); +45 degrees (+45); and –45 degrees (-45). 
Each detected edge is represented by a binary flag and four edge maps are generated. The two-dimensional bit array in an edge map is reduced to a one-dimensional array of numerals by projection. The horizontal edge flags are accumulated in the horizontal direction and projected onto vertical axis. The vertical, +45-degree and –45-degree edge flags are similarly projected onto horizontal, -45-degree and +45-degree axes, respectively. Therefore the method is called “Principal Axes Projection (PAP)” [3,4]. Then each projection data set is series connected in the order of HR, +45, VR, -45 to form a feature vector. Neighboring four elements are averaged and merged to one element and a 64-dimensional vector is finally obtained. This vector representation very well preserves the human perception of similarity in the vector space. In the experiments below, we have further reduced the feature vector to 16 dimensions by merging each set of four neighboring elements into one, without any significant degradation in performance. C i r cui t c o nf i g ura ti ons A B C VGG A B C VGG IOUT IOUT 1 1 2 2 4 4 1 VIN 13 VIN RST RST £ ¡ ¤¢ £ ¥ §¦ 3 Figure 1: Schematic of vector element matching circuit: (a) pyramid (gain reduction) type; (b) plateau (feedback) type. The capacitor area ratio is indicated in the figure. The basic functional form of the similarity evaluation is generated by the shortcut current flowing in a CMOS inverter as in Refs. [7,8,9]. However, their circuits were utilized to form radial basis functions and only the peak position was programmable. In our circuits, not only the peak position but also the peak height and the sharpness of the peak response shape are made configurable to realize flexible matching operations [5]. Two types of the element matching circuit are shown in Fig. 1. They evaluate the similarity between two vector elements. The result of the evaluation is given as an output current (IOUT ) from the pMOS current mirror. The peak position is temporarily memorized by auto-zeroing of the CMOS inverter. The common-gate transistor with VGG stabilizes the voltage supply to the inverter. By controlling the gate bias VGG, the peak height can be changed. This corresponds to multiplying a weight factor to the element. The sharpness of the functional form is taken as the strictness of the similarity evaluation. In the pyramid type circuit (Fig. 1(a)), the sharpness is controlled by the gain reduction in the input. In the plateau type (Fig. 1(b)), the output voltage of the inverter is fed back to input nodes and the sharpness changes in accordance with the amount of the feedback. ¥£¡ ¦¤¢ £¨ 9&% ¦©§ (!! #$ 5 !' #$ &% 9 9 4 92 !¦ A1@9 ¨¥ 5 4 52 (! 5 8765 9) 0 1 ¥ 1 ¨
Author: Takashi Morie, Tomohiro Matsuura, Makoto Nagata, Atsushi Iwata
Abstract: This paper describes a clustering algorithm for vector quantizers using a “stochastic association model”. It offers a new simple and powerful softmax adaptation rule. The adaptation process is the same as the on-line K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the “neural gas” algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluation process for each reference vector. For hardware implementation of this process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
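A minimal sketch of the adaptation rule this abstract describes, assuming a squared-Euclidean distortion and a linearly decaying fluctuation amplitude (both assumptions; the paper's noise source and annealing schedule may differ):

```python
import numpy as np

def stochastic_association_kmeans(data, n_ref=8, epochs=20, lr=0.05,
                                  noise0=1.0, seed=0):
    """On-line K-means in which the winner is picked on a distortion that
    carries uncorrelated random fluctuation per reference vector."""
    rng = np.random.default_rng(seed)
    refs = data[rng.choice(len(data), n_ref, replace=False)].astype(float)
    for epoch in range(epochs):
        noise_amp = noise0 * (1.0 - epoch / epochs)   # assumed annealing schedule
        for x in rng.permutation(data):
            dist = np.sum((refs - x) ** 2, axis=1)
            dist += noise_amp * rng.standard_normal(n_ref)  # fluctuation per reference
            winner = np.argmin(dist)
            refs[winner] += lr * (x - refs[winner])          # usual K-means update
    return refs

# Toy usage: three well-separated 2-D clusters.
pts = np.concatenate([np.random.randn(200, 2) + c for c in ([0, 0], [5, 0], [0, 5])])
codebook = stochastic_association_kmeans(pts)
```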
4 0.5687049 112 nips-2001-Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Author: Aaron P. Shon, David Hsu, Chris Diorio
Abstract: We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon. 1 Introduction Computation with conditional probabilities and correlations underlies many models of neurally inspired information processing. For example, in the sequence-learning neural network models proposed by Levy [1], synapses store the log conditional probability that a presynaptic spike occurred given that the postsynaptic neuron spiked sometime later. Boltzmann machine synapses learn the difference between the correlations of pairs of neurons in the sleep and wake phases [2]. In most neural models, computation and adaptation occur at the synaptic level. Hence, a silicon synapse that can learn conditional probabilities or correlations between pre- and post-synaptic signals can be a key part of many silicon neural-learning architectures. We have designed and implemented a silicon synapse, in a 0.35µm CMOS process, that learns a synaptic weight that corresponds to the conditional probability or correlation between binary input and feedback signals. This circuit utilizes floating-gate transistors to provide both nonvolatile storage and weight adaptation mechanisms [3]. In addition, the circuit is compact, low power, and provides simultaneous adaptation and computation. Our circuit improves upon previous implementations of floating-gate based learning synapses [3,4,5] in several ways. First, our synapse appears to be the first spike-based floating-gate synapse that implements a general learning principle, rather than a particular learning rule [4,5]. We demonstrate that our synapse can learn either the conditional probability or the correlation between input and feedback signals. Consequently, we can implement a wide range of synaptic learning networks with our circuit. Second, unlike the general correlational learning synapse proposed by Hasler et al. [3], our synapse can implement learning rules that correlate pre- and postsynaptic activity that occur at different times. Learning algorithms that employ time-separated correlations include both temporal difference learning [6] and recently postulated temporally asymmetric Hebbian learning [7]. Hasler’s correlational floating-gate synapse can only perform updates based on the present input and feedback signals, and is therefore unsuitable for learning rules that correlate signals that occur at different times. Because the signals that control adaptation and computation in our synapse are separate, our circuit can implement these time-dependent learning rules. Finally, we can calibrate our synapses to remove mismatch between the adaptation mechanisms of individual synapses. Mismatch between the same adaptation mechanisms on different floating-gate transistors limits the accuracy of learning rules based on these devices.
This problem has been noted in previous circuits that use floating-gate adaptation [4,8]. In our circuit, different synapses can learn widely divergent weights from the same inputs because of component mismatch. We provide a calibration mechanism that enables identical adaptation across multiple synapses despite device mismatch. To our knowledge, this circuit is the first instance of a floating-gate learning circuit that includes this feature. This paper is organized as follows. First, we provide a brief introduction to floating-gate transistors. Next, we provide a description and analysis of our synapse, demonstrating that it can learn the conditional probability or correlation between a pair of binary signals. We then describe the calibration circuitry and show its effectiveness in compensating for adaptation mismatches. Finally, we discuss how this synapse can be used for silicon implementations of various learning networks. 2 Floating-gate transistors Because our circuit relies on floating-gate transistors to achieve adaptation, we begin by briefly discussing these devices. A floating-gate transistor (e.g. transistor M3 of Fig. 1(a)) comprises a MOSFET whose gate is isolated on all sides by SiO2. A control gate capacitively couples signals to the floating gate. Charge stored on the floating gate implements a nonvolatile analog weight; the transistor’s output current varies with both the floating-gate voltage and the control-gate voltage. We use Fowler-Nordheim tunneling [9] to increase the floating-gate charge, and impact-ionized hot-electron injection (IHEI) [10] to decrease the floating-gate charge. We tunnel by placing a high voltage on a tunneling implant, denoted by the arrow in Fig. 1(a). We inject by imposing more than about 3V across the drain and source of transistor M3. The circuit allows simultaneous adaptation and computation, because neither tunneling nor IHEI interferes with circuit operation. Over a wide range of tunneling voltages Vtun, we can approximate the magnitude of the tunneling current Itun as [4]: $I_{tun} = I_{tun0} \exp\left((V_{tun} - V_{fg}) / V_\chi\right)$ (1), where Vtun is the tunneling-implant voltage, Vfg is the floating-gate voltage, and Itun0 and Vχ are fit constants. Over a wide range of transistor drain and source voltages, we can approximate the magnitude of the injection current Iinj as [4]: $I_{inj} = I_{inj0}\, I_s^{\,1 - U_t/V_\gamma} \exp\left((V_s - V_d) / V_\gamma\right)$ (2), where Vs and Vd are the source and drain voltages, Iinj0 is a pre-exponential current, Vγ is a constant that depends on the VLSI process, and Ut is the thermal voltage kT/q. 3 The silicon synapse We show our silicon synapse in Fig. 1. The synapse stores an analog weight W, multiplies W by a binary input Xin, and adapts W to either a conditional probability P(Xcor|Y) or a correlation P(XcorY). Xin is analogous to a presynaptic input, while Y is analogous to a postsynaptic signal or error feedback. Xcor is a presynaptic adaptation signal, and typically has some relationship with Xin. We can implement different learning rules by altering the relationship between Xcor and Xin. For some examples, see section 4. We now describe the circuit in more detail. The drain current of floating-gate transistor M4 represents the weight value W. Because the control gate of M4 is fixed, W depends solely on the charge on floating-gate capacitor C1. We can switch the drain current on or off using transistor M7; this switching action corresponds to a multiplication of the weight value W by a binary input signal, Xin.
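To make Eqs. (1) and (2) above concrete, the short numeric sketch below evaluates both currents; the fit constants, bias levels, and voltages are placeholder values, not device parameters reported in the paper.

```python
import numpy as np

# Placeholder constants (assumptions, not fitted device parameters).
I_tun0, V_chi = 1e-15, 1.0      # tunneling pre-exponential (A) and fit constant (V)
I_inj0, V_gamma = 1e-12, 0.3    # injection pre-exponential (A) and fit constant (V)
U_t = 0.025                     # thermal voltage kT/q at room temperature (V)

def i_tun(V_tun, V_fg):
    """Fowler-Nordheim tunneling current magnitude, Eq. (1)."""
    return I_tun0 * np.exp((V_tun - V_fg) / V_chi)

def i_inj(I_s, V_s, V_d):
    """Impact-ionized hot-electron injection current magnitude, Eq. (2)."""
    return I_inj0 * I_s ** (1.0 - U_t / V_gamma) * np.exp((V_s - V_d) / V_gamma)

print(i_tun(V_tun=9.0, V_fg=2.0))         # ~9 V on the tunneling implant
print(i_inj(I_s=1e-6, V_s=3.5, V_d=0.0))  # >3 V imposed across drain and source
```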
We choose values for the drain voltage of M4 to prevent injection. A second floating-gate transistor M3, whose gate is also connected to C1, controls adaptation by injection and tunneling. Simultaneously high input signals Xcor and Y cause injection, increasing the weight. A high Vtun causes tunneling, decreasing the weight. We either choose to correlate a high Vtun with signal Y or provide a fixed high Vtun throughout the adaptation process. The choice determines whether the circuit learns a conditional probability or a correlation, respectively. Because the drain current sourced by M4 is the weight W, we can express W in terms of M4’s floating-gate voltage, Vfg. Vfg includes the effects of both the fixed control-gate voltage and the variable floating-gate charge. The expression differs depending on whether the readout transistor is operating in the subthreshold or above-threshold regime. We provide both expressions below: $W = I_0 \exp\left(-\kappa^2 V_{fg} / ((1+\kappa) U_t)\right)$ below threshold, and $W = \beta\left(V_0 - \kappa V_{fg}/(1+\kappa)\right)^2$ above threshold (3). Here V0 is a constant that depends on the threshold voltage and on Vdd, Ut is the thermal voltage kT/q, κ is the floating-gate-to-channel coupling coefficient, and I0 is a fixed bias current. Eq. 3 shows that W depends solely on Vfg (all the other factors are constants). These equations differ slightly from the standard equations for the source current through a transistor due to source degeneration caused by M4. This degeneration smooths the nonlinear relationship between Vfg and Is; its addition to the circuit is optional. 3.1 Weight adaptation Because W depends on Vfg, we can control W by tunneling or injecting transistor M3. In this section, we show that these mechanisms enable our circuit to learn the correlation or conditional probability between inputs Xcor (which we will refer to as X) and Y. Our analysis assumes that these statistics are fixed over some period during which adaptation occurs. The change in floating-gate voltage, and hence the weight, discussed below should therefore be interpreted in terms of the expected weight change due to the statistics of the inputs. We discuss learning of conditional probabilities; a slight change in the tunneling signal, described previously, allows us to learn correlations instead. We first derive the injection equation for the floating-gate voltage in terms of the joint probability P(X,Y) by considering the relationship between the input signals and Is, Vs, and Vd of M3. [Fig. 1. (a) Synapse schematic. (b) Plot of equilibrium weight in the subthreshold regime versus the conditional probability P(X|Y), showing both experimental chip data and a fit from Eq. 7. (c) Plot of equilibrium weight versus conditional probability in the above-threshold regime, again showing chip data and a fit from Eq. 7.] We assume that transistor M1 is in saturation, constraining Is at M3 to be constant. Presentation of a joint binary event (X,Y) closes nFET switches M5 and M6, pulling the drain voltage Vd of M3 to 0V and causing injection. Therefore the probability that Vd is low enough to cause injection is the probability of the joint event Pr(X,Y). By Eq. 2, the amount of injection is also dependent on M3’s source voltage Vs.
Because M3 is constrained to a fixed channel current, a drop in the floating-gate voltage, ∆Vfg, causes a drop in Vs of magnitude κ∆Vfg. Substituting these expressions into Eq. 2 results in a floating-gate voltage update of: $(dV_{fg}/dt)_{inj} = -I_{inj0}\,\Pr(X,Y)\,\exp(\kappa V_{fg}/V_\gamma)$ (4), where Iinj0 also includes the constant source current. Eq. 4 shows that the floating-gate voltage update due to injection is a function of the probability of the joint event (X,Y). Next we analyze the effects of tunneling on the floating-gate voltage. The origin of the tunneling signal determines whether the synapse is learning a conditional probability or a correlation. If the circuit is learning a conditional probability, occurrence of the conditioning event Y gates a corresponding high-voltage (~9V) signal onto the tunneling implant. Consequently, we can express the change in floating-gate voltage due to tunneling in terms of the probability of Y, and the floating-gate voltage: $(dV_{fg}/dt)_{tun} = I_{tun0}\,\Pr(Y)\,\exp(-V_{fg}/V_\chi)$ (5). Eq. 5 shows that the floating-gate voltage update due to tunneling is a function of the probability of the event Y. 3.2 Weight equilibrium To demonstrate that our circuit learns P(X|Y), we show that the equilibrium weight of the synapse is solely a function of P(X|Y). The equilibrium weight of the synapse is the weight value where the expected weight change over time equals zero. This weight value corresponds to the floating-gate voltage where injection and tunneling currents are equal. To find this voltage, we equate Eqs. 4 and 5 and solve: $V_{fg}^{eq} = \frac{-1}{\kappa/V_\gamma + 1/V_\chi}\left[\log \Pr(X|Y) + \log\frac{I_{inj0}}{I_{tun0}}\right]$ (6). To derive the equilibrium weight, we substitute Eq. 6 into Eq. 3 and solve: $W_{eq} = I_0\left(\frac{I_{inj0}}{I_{tun0}}\Pr(X|Y)\right)^{\alpha}$ below threshold, and $W_{eq} = \beta\left(V_0 + \eta\left[\log\frac{I_{inj0}}{I_{tun0}} + \log\Pr(X|Y)\right]\right)^2$ above threshold (7), where $\alpha = \frac{\kappa^2}{(1+\kappa)U_t(\kappa/V_\gamma + 1/V_\chi)}$ and $\eta = \frac{\kappa^2}{(1+\kappa)(\kappa/V_\gamma + 1/V_\chi)}$. Consequently, the equilibrium weight is a function of the conditional probability below threshold and a function of the log-squared conditional probability above threshold. Note that the equilibrium weight is stable because of negative feedback in the tunneling and injection processes. Therefore, the weight will always converge to the equilibrium value shown in Eq. 7. Figs. 1(b) and (c) show the equilibrium weight versus the conditional probability P(X|Y) for both sub- and above-threshold circuits, along with fits to Eq. 7. Note that both the sub- and above-threshold relationships between P(X|Y) and the equilibrium weight enable us to compute the probability of a vector of synaptic inputs X given a post-synaptic response Y. In both cases, we can apply the output currents of an array of synapses through diodes, and then add the resulting voltages via a capacitive voltage divider, resulting in a voltage that is a linear function of log P(X|Y). 3.3 Calibration circuitry Mismatch between injection and tunneling in different floating-gate transistors can greatly reduce the ability of our synapses to learn meaningful values. Experimental data from floating-gate transistors fabricated in a 0.35µm process show that injection varies by as much as 2:1 across a chip, and tunneling by up to 1.2:1. The effect of this mismatch on our synapses causes the weight equilibria of different synapses to differ by a multiplicative gain. Fig. 2(b) shows the equilibrium weights of an array of six synapses exposed to identical input signals.
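As a software sanity check on the equilibrium analysis of Eqs. (4)-(7), the sketch below integrates the expected floating-gate voltage updates and reads out the subthreshold weight of Eq. (3); all device constants are placeholder values chosen only so that the dynamics settle quickly, and the monotone increase of the equilibrium weight with P(X|Y) is the point of interest.

```python
import numpy as np

# Placeholder device constants (assumptions, not values from the paper).
I_inj0, I_tun0 = 1e-2, 1e-2
V_gamma, V_chi, U_t, kappa, I_0 = 0.3, 1.0, 0.025, 0.7, 1e-6

def equilibrium_weight(p_x_given_y, p_y=0.5, steps=20000, dt=1.0):
    """Integrate Eqs. (4)-(5) to equilibrium, then read out Eq. (3)."""
    v_fg = 0.0
    for _ in range(steps):
        dv_inj = -I_inj0 * (p_x_given_y * p_y) * np.exp(kappa * v_fg / V_gamma)  # Eq. 4
        dv_tun = I_tun0 * p_y * np.exp(-v_fg / V_chi)                            # Eq. 5
        v_fg += dt * (dv_inj + dv_tun)
    # Subthreshold readout branch of Eq. 3.
    return I_0 * np.exp(-kappa**2 * v_fg / ((1 + kappa) * U_t))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, equilibrium_weight(p))   # weight grows monotonically with P(X|Y)
```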
The variation of the synaptic weights in Fig. 2(b) is of the same order of magnitude as the weights themselves, making large arrays of synapses all but useless for implementing many learning algorithms. We alleviate this problem by calibrating our synapses to equalize the pre-exponential tunneling and injection constants. Because the dependence of the equilibrium weight on these constants is determined by the ratio Iinj0/Itun0, our calibration process changes Iinj0 to equalize the ratio of injection to tunneling across all synapses. We choose to calibrate injection because we can easily change Iinj0 by altering the drain current through M1. Our calibration procedure is a self-convergent memory write [11] that causes the equilibrium weight of every synapse to equal the current Ical. [Fig. 2. (a) Schematic of the calibrated synapse with the signals used during the calibration procedure. (b) Equilibrium weights for the array of synapses shown in Fig. 1(a). (c) Equilibrium weights for the array of calibrated synapses after calibration.] Calibration requires many operating cycles, where, during each cycle, we first increase the equilibrium weight of the synapse, and second, we let the synapse adapt to the new equilibrium weight. We create the calibrated synapse by modifying our original synapse according to Fig. 2(a). We convert M1 into a floating-gate transistor, whose floating-gate charge thereby sets M3’s channel current, providing control of the Iinj0 of Eq. 7. Transistor M8 modifies M1’s gate charge by means of injection when M9’s gate is low and Vcal is low. M9’s gate is only low when the equilibrium weight W is less than Ical. During calibration, injection and tunneling on M3 are continuously active. We apply a pulse train to Vcal; during each pulse period, Vcal is predominantly high. When Vcal is high, the synapse adapts towards its equilibrium weight. When Vcal pulses low, M8 injects, increasing the synapse’s equilibrium weight W. We repeat this process until the equilibrium weight W matches Ical, causing M9’s gate voltage to rise, disabling Vcal and with it injection. To ensure that a precalibrated synapse has an equilibrium weight below Ical, we use tunneling to erase all bias transistors prior to calibration. Fig. 2(c) shows the equilibrium weights of six synapses after calibration. The data show that calibration can reduce the effect of mismatched adaptation on the synapse’s learned weight to a small fraction of the weight itself. Because M1 is a floating-gate transistor, its parasitic gate-drain capacitance causes a mild dependence between M1’s drain voltage and source current. Consequently, M3’s floating-gate voltage now affects its source current (through M1’s drain voltage), and we can model M3 as a source-degenerated pFET [3]. The new expression for the injection current in M3 is: $(dV_{fg}/dt)_{inj} = -I_{inj0}\,\Pr(X,Y)\,\exp\left(V_{fg}\left(\kappa/V_\gamma - \kappa k_1/U_t\right)\right)$ (8), where k1 is close to zero. [Fig. 3. A method for achieving spike-time dependent plasticity in silicon.] The new expression for injection slightly changes the α and η terms of the weight equilibrium in Eq. 7, although the qualitative relationship between the weight equilibrium and the conditional probability remains the same.
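A behavioral sketch of the self-convergent calibration loop just described, using the subthreshold equilibrium form of Eq. (7) with P(X|Y) held at 1 during calibration; the exponent, the multiplicative bump per Vcal pulse, and the initial mismatch range are assumptions.

```python
import numpy as np

def calibrate(i_inj0_start, i_cal, alpha=0.78, i_tun0=1.0,
              bump=1.02, max_cycles=10000):
    """Model one synapse's calibration: settle to the equilibrium weight,
    and if it is still below the target Ical, inject the bias transistor
    to nudge Iinj0 up; stop once the equilibrium weight reaches Ical."""
    i_inj0 = i_inj0_start
    for _ in range(max_cycles):
        w_eq = (i_inj0 / i_tun0) ** alpha   # Vcal high: adapt to equilibrium
        if w_eq >= i_cal:                   # M9's gate rises, injection is disabled
            break
        i_inj0 *= bump                      # Vcal low: M8 injects on the bias transistor
    return i_inj0, w_eq

# Six mismatched synapses (injection spread of roughly 2:1) calibrated to one target.
rng = np.random.default_rng(1)
for start in rng.uniform(0.5, 1.0, size=6):
    print(calibrate(start, i_cal=1.5))
```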
4 Implementing silicon synaptic learning rules In this section we discuss how to implement a variety of learning rules from the computational-neurobiology and neural-network literature with our synapse circuit. We can use our circuit to implement a Hebbian learning rule. Simultaneously activating both M5 and M6 is analogous to heterosynaptic LTP based on synchronized pre- and postsynaptic signals, and activating tunneling with the postsynaptic Y is analogous to homosynaptic LTD. In our synapse, we tie Xin and Xcor together and correlate Vtun with Y. Our synapse is also capable of emulating a Boltzmann weight-update rule [2]. This weight-update rule derives from the difference between correlations among neurons when the network receives external input, and when the network operates in a free-running phase (denoted as the clamped and unclamped phases, respectively). With weight decay, a Boltzmann synapse learns the difference between correlations in the clamped and unclamped phases. We can create a Boltzmann synapse from a pair of our circuits, in which the effective weight is the difference between the weights of the two synapses. To implement a weight update, we update one silicon synapse based on pre- and postsynaptic signals in the clamped phase, and update the other synapse in the unclamped phase. We do this by sending Xin to Xcor of one synapse in the clamped phase, and sending Xin to Xcor of the other synapse in the negative phase. Vtun remains constant throughout adaptation (a minimal software sketch of this paired-synapse scheme is given after the reference list below). Finally, we consider implementing a temporally asymmetric Hebbian learning rule [7] using our synapse. In temporally asymmetric Hebbian learning, a synapse exhibits LTP or LTD if the presynaptic input occurs before or after the postsynaptic response, respectively. We implement an asymmetric learning synapse using two of our circuits, where the synaptic weight is the difference in the weights of the two circuits. We show the circuit in Fig. 3. Each neuron sends two signals: a neuronal output, and an adaptation time window that is active for some time afterwards. Therefore, the combined synapse receives two presynaptic signals and two postsynaptic signals. The relative timing of a postsynaptic response, Y, with the presynaptic input, X, determines whether the synapse undergoes LTP or LTD. If Y occurs before X, Y’s time window correlates with X, causing injection on the negative synapse, decreasing the weight. If Y occurs after X, Y correlates with X’s time window, causing injection on the positive synapse, increasing the weight. Hence, our circuit can use the relative timing between presynaptic and postsynaptic activity to implement learning. 5 Conclusion We have described a silicon synapse that implements a wide range of spike-based learning rules, and that does not suffer from device mismatch. We have also described how we can implement various silicon-learning networks using this synapse. In addition, although we have only analyzed the learning properties of the synapse for binary signals, we can instead use pulse-coded analog signals. One possible avenue for future work is to analyze the implications of different pulse-coded schemes on the circuit’s adaptive behavior. Acknowledgements This work was supported by the National Science Foundation and by the Office of Naval Research. Aaron Shon was also supported by an NDSEG fellowship. We thank Anhai Doan and the anonymous reviewers for helpful comments. References [1] W. B. Levy, “A computational approach to hippocampal function,” in R.D. Hawkins and G.H.
Bower (eds.), Computational Models of Learning in Simple Neural Systems, The Psychology of Learning and Motivation, vol. 23, pp. 243-305, San Diego, CA: Academic Press, 1989. [2] D. H. Ackley, G. Hinton, and T. Sejnowski, “A learning algorithm for Boltzmann machines,” Cognitive Science, vol. 9, pp. 147-169, 1985. [3] P. Hasler, B. A. Minch, J. Dugger, and C. Diorio, “Adaptive circuits and synapses using pFET floating-gate devices,” in G. Cauwenberghs and M. Bayoumi (eds.), Learning in Silicon, pp. 33-65, Kluwer Academic, 1999. [4] P. Hafliger, A spike-based learning rule and its implementation in analog hardware, Ph.D. thesis, ETH Zurich, 1999. [5] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, “A floating-gate MOS learning array with locally computed weight updates,” IEEE Transactions on Electron Devices, vol. 44(12), pp. 2281-2289, 1997. [6] R. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, vol. 3, pp. 9-44, 1988. [7] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, “Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs,” Science, vol. 275, pp. 213-215, 1997. [8] A. Pesavento, T. Horiuchi, C. Diorio, and C. Koch, “Adaptation of current signals with floating-gate circuits,” in Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems (Microneuro99), pp. 128-134, 1999. [9] M. Lenzlinger and E. H. Snow, “Fowler-Nordheim tunneling into thermally grown SiO2,” Journal of Applied Physics, vol. 40(1), pp. 278-283, 1969. [10] E. Takeda, C. Yang, and A. Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995. [11] C. Diorio, “A p-channel MOS synapse transistor with self-convergent memory writes,” IEEE Journal of Solid-State Circuits, vol. 36(5), pp. 816-822, 2001.
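As noted in Section 4 above, here is a minimal behavioral sketch of the paired-synapse (Boltzmann-style) scheme; it assumes each synapse relaxes first-order toward a subthreshold equilibrium weight of the form P(X|Y)^alpha, with the exponent and relaxation rate chosen arbitrarily.

```python
ALPHA = 0.78   # assumed subthreshold exponent relating the equilibrium weight to P(X|Y)

def equilibrium(p_x_given_y):
    """Assumed subthreshold equilibrium weight as a function of P(X|Y)."""
    return p_x_given_y ** ALPHA

def boltzmann_pair(p_clamped, p_unclamped, steps=200, rate=0.05):
    """Effective weight of two synapses: one sees clamped-phase statistics,
    the other unclamped-phase statistics; each relaxes toward its own
    equilibrium and the effective weight is their difference."""
    w_plus = w_minus = 0.0
    for _ in range(steps):
        w_plus  += rate * (equilibrium(p_clamped) - w_plus)
        w_minus += rate * (equilibrium(p_unclamped) - w_minus)
    return w_plus - w_minus

# Positive effective weight when clamped correlations exceed unclamped ones.
print(boltzmann_pair(p_clamped=0.8, p_unclamped=0.3))
print(boltzmann_pair(p_clamped=0.2, p_unclamped=0.5))
```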
5 0.39113471 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
Author: Lance R. Williams, John W. Zweck
Abstract: We describe a neural network which enhances and completes salient closed contours. Our work is different from all previous work in three important ways. First, like the input provided to V1 by LGN, the input to our computation is isotropic. That is, the input is composed of spots not edges. Second, our network computes a well defined function of the input based on a distribution of closed contours characterized by a random process. Third, even though our computation is implemented in a discrete network, its output is invariant to continuous rotations and translations of the input pattern.
6 0.35104603 49 nips-2001-Citcuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning
7 0.34780559 42 nips-2001-Bayesian morphometry of hippocampal cells suggests same-cell somatodendritic repulsion
8 0.34349543 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
9 0.33847103 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
10 0.32414606 155 nips-2001-Quantizing Density Estimators
11 0.2991398 68 nips-2001-Entropy and Inference, Revisited
12 0.28351173 177 nips-2001-Switch Packet Arbitration via Queue-Learning
13 0.27450889 164 nips-2001-Sampling Techniques for Kernel Methods
14 0.27254671 141 nips-2001-Orientation-Selective aVLSI Spiking Neurons
15 0.26793259 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
16 0.25927466 142 nips-2001-Orientational and Geometric Determinants of Place and Head-direction
17 0.25893411 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
18 0.24937728 2 nips-2001-3 state neurons for contextual processing
19 0.23796329 125 nips-2001-Modularity in the motor system: decomposition of muscle patterns as combinations of time-varying synergies
20 0.23555793 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
topicId topicWeight
[(14, 0.079), (17, 0.024), (19, 0.028), (27, 0.085), (30, 0.108), (34, 0.288), (38, 0.035), (59, 0.039), (70, 0.016), (72, 0.027), (79, 0.046), (83, 0.033), (91, 0.118)]
simIndex simValue paperId paperTitle
same-paper 1 0.83896846 176 nips-2001-Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Author: Roman Genov, Gert Cauwenberghs
Abstract: A mixed-signal paradigm is presented for high-resolution parallel innerproduct computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 m CMOS. ¢
2 0.68213469 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
Author: Carl E. Rasmussen, Zoubin Ghahramani
Abstract: We present an extension to the Mixture of Experts (ME) model, where the individual experts are Gaussian Process (GP) regression models. Using an input-dependent adaptation of the Dirichlet Process, we implement a gating network for an infinite number of Experts. Inference in this model may be done efficiently using a Markov Chain relying on Gibbs sampling. The model allows the effective covariance function to vary with the inputs, and may handle large datasets – thus potentially overcoming two of the biggest hurdles with GP models. Simulations show the viability of this approach.
3 0.55952609 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
Author: M. Schmitt
Abstract: Recurrent neural networks of analog units are computers for realvalued functions. We study the time complexity of real computation in general recurrent neural networks. These have sigmoidal, linear, and product units of unlimited order as nodes and no restrictions on the weights. For networks operating in discrete time, we exhibit a family of functions with arbitrarily high complexity, and we derive almost tight bounds on the time required to compute these functions. Thus, evidence is given of the computational limitations that time-bounded analog recurrent neural networks are subject to. 1
4 0.5493865 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
Author: Gregor Wenning, Klaus Obermayer
Abstract: Cortical neurons might be considered as threshold elements integrating in parallel many excitatory and inhibitory inputs. Due to the apparent variability of cortical spike trains this yields a strongly fluctuating membrane potential, such that threshold crossings are highly irregular. Here we study how a neuron could maximize its sensitivity w.r.t. a relatively small subset of excitatory input. Weak signals embedded in fluctuations is the natural realm of stochastic resonance. The neuron's response is described in a hazard-function approximation applied to an Ornstein-Uhlenbeck process. We analytically derive an optimality criterium and give a learning rule for the adjustment of the membrane fluctuations, such that the sensitivity is maximal exploiting stochastic resonance. We show that adaptation depends only on quantities that could easily be estimated locally (in space and time) by the neuron. The main results are compared with simulations of a biophysically more realistic neuron model. 1
5 0.54881018 56 nips-2001-Convolution Kernels for Natural Language
Author: Michael Collins, Nigel Duffy
Abstract: We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
6 0.54634786 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
7 0.54566199 63 nips-2001-Dynamic Time-Alignment Kernel in Support Vector Machine
8 0.54463017 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes
9 0.54133713 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
10 0.54098767 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
11 0.53988367 46 nips-2001-Categorization by Learning and Combining Object Parts
12 0.53881001 149 nips-2001-Probabilistic Abstraction Hierarchies
13 0.53850913 13 nips-2001-A Natural Policy Gradient
14 0.53831983 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
15 0.5357402 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
16 0.53489947 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
17 0.53477734 182 nips-2001-The Fidelity of Local Ordinal Encoding
18 0.53431189 22 nips-2001-A kernel method for multi-labelled classification
19 0.53372306 74 nips-2001-Face Recognition Using Kernel Methods
20 0.53336227 60 nips-2001-Discriminative Direction for Kernel Classifiers