nips nips2002 nips2002-22 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Herbert Jaeger
Abstract: Echo state networks (ESN) are a novel approach to recurrent neural network training. An ESN consists of a large, fixed, recurrent
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Echo state networks (ESN) are a novel approach to recurrent neural network training. [sent-3, score-0.289]
2 An ESN consists of a large, fixed, recurrent "reservoir" network, from which the desired output is obtained by training suitable output connection weights. [sent-4, score-0.443]
3 Determination of optimal output weights becomes a linear, uniquely solvable task of MSE minimization. [sent-5, score-0.194]
4 This article reviews the basic ideas and describes an online adaptation scheme based on the RLS algorithm known from adaptive linear systems. [sent-6, score-0.244]
5 The known benefits of the RLS algorithm carry over from linear systems to nonlinear ones; specifically, the convergence rate and misadjustment can be determined at design time. [sent-8, score-0.213]
6 1 Introduction It is fair to say that difficulties with existing algorithms have so far precluded supervised training techniques for recurrent neural networks (RNNs) from widespread use. [sent-9, score-0.163]
7 Echo state networks (ESNs) provide a novel, easier-to-manage approach to supervised training of RNNs. [sent-10, score-0.149]
8 The connection weights of this reservoir network are not changed by training. [sent-12, score-0.397]
9 In order to compute a desired output dynamics, only the weights of connections from the reservoir to the output units are calculated. [sent-13, score-0.558]
10 In this article I describe how ESNs can be conjoined with the "recursive least squares" (RLS) algorithm, a method for fast online adaptation known from linear systems. [sent-18, score-0.179]
11 The resulting RLS-ESN is capable of tracking a 10th-order nonlinear system with high quality in convergence speed and residual error. [sent-19, score-0.169]
12 Furthermore, the approach yields a priori estimates of tracking performance parameters and thus allows one to design nonlinear trackers according to specifications. [sent-20, score-0.083]
13 Section 3 demonstrates ESN offline learning on the 10th-order system identification task. [sent-23, score-0.093]
14 2 Basic ideas of echo state networks For the sake of a simple notation, in this article I address only single-input, single-output systems (general treatment in [5]). [sent-26, score-0.294]
15 We consider a discrete-time "reservoir" RNN with N internal network units, a single extra input unit, and a single extra output unit. [sent-27, score-0.385]
16 The input at time n ≥ 1 is u(n), and the activations of the internal units are x(n) = (x_1(n), ..., x_N(n)). [sent-28, score-0.122]
17 The activation of the output unit is y(n). [sent-31, score-0.156]
18 Internal connection weights are collected in an N × N matrix W = (w_ij), the weights of connections going from the input unit into the network in an N-element (column) weight vector w^in = (w_i^in), and the N+1 (input-and-network)-to-output connection weights in an (N+1)-element (row) vector w^out = (w_i^out). [sent-32, score-0.924]
19 The output weights w^out will be learned; the internal weights W and the input weights w^in are fixed before learning, typically in a sparse random connectivity pattern. [sent-34, score-0.718]
20 When the network is updated according to (1), under certain conditions the network state becomes asymptotically independent of the initial conditions. [sent-40, score-0.369]
21 More precisely, if the network is started from two arbitrary states x(0), x̃(0) and is run with the same input sequence in both cases, the resulting state sequences x(n), x̃(n) converge to each other. [sent-41, score-0.337]
22 If this condition holds, the reservoir network state will asymptotically depend only on the input history, and the network is said to be an echo state network (ESN). [sent-42, score-0.637]
23 (A footnote interleaved here refers to a tutorial Mathematica notebook which can be fetched from http://www.) [sent-47, score-0.354]
24 A sufficient condition for the echo state property is contractivity of W. [sent-48, score-0.215]
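The state-forgetting property and the spectral-radius scaling mentioned above are easy to check numerically. The following sketch is not from the paper: it builds a small sparse reservoir, rescales it, and drives two different initial states with the same input sequence to watch them converge; the reservoir size, connectivity, spectral radius and input scaling are assumptions chosen to match typical ESN practice.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                    # reservoir size (assumed)

# Sparse random reservoir weights, rescaled to a spectral radius of 0.8 (assumed value)
W = rng.uniform(-1, 1, (N, N)) * (rng.random((N, N)) < 0.05)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-0.1, 0.1, N)           # input weights (assumed range)

def step(x, u):
    """One reservoir update: x(n+1) = tanh(W x(n) + w_in u(n+1))."""
    return np.tanh(W @ x + w_in * u)

# Drive two different initial states with the same input sequence
x_a, x_b = rng.normal(size=N), rng.normal(size=N)
for n in range(200):
    u = rng.uniform(-0.5, 0.5)
    x_a, x_b = step(x_a, u), step(x_b, u)
    if n % 50 == 0:
        print(n, np.linalg.norm(x_a - x_b))   # distance shrinks toward zero
```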
25 Consider the task of computing the output weights such that the teacher output is approximated by the network. [sent-51, score-0.465]
26 In the ESN approach, this task is spelled out concretely as follows: compute w^out such that the training error (3) is minimized in the mean square sense. [sent-52, score-0.291]
27 Note that the effect of the output nonlinearity is undone by (f^out)^{-1} in this error definition. [sent-53, score-0.152]
28 We dub (f^out)^{-1} y_teach(n) the teacher pre-signal and (f^out)^{-1}(w^out (u_teach(n), x(n)) + v(n)) the network's pre-output. [sent-54, score-0.223]
29 Here is a sketch of an offline algorithm for the entire learning procedure: 1. [sent-56, score-0.131]
30 Fix an RNN with a single input and a single output unit, scaling the weight matrix W such that |λ_max| < 1 obtains. [sent-57, score-0.268]
31 Dismiss data from the initial transient and collect the remaining input+network states (u_teach(n), x_teach(n)) row-wise into a matrix M. [sent-60, score-0.094]
32 Simultaneously, collect the remaining training pre-signals (f^out)^{-1} y_teach(n) into a column vector r. [sent-61, score-0.157]
33 Compute the pseudo-inverse M^{-1}, and put w^out = (M^{-1} r)^T (where T denotes transpose). [sent-63, score-0.226]
34 Write w^out into the output connections; the ESN is now trained. [sent-65, score-0.35]
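A minimal sketch of the offline learning steps just listed, assuming the reservoir weights W and w_in from the previous snippet, the identity as output nonlinearity f^out (so the teacher pre-signal equals y_teach), and a small noise term in the state update as described later in the text; all function and variable names are mine, not the paper's.

```python
import numpy as np

def train_esn_offline(W, w_in, u_teach, y_teach, washout=200, noise=1e-4, seed=0):
    """Harvest reservoir states driven by the teacher input, dismiss the initial
    transient, and compute the output weights by a pseudo-inverse."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    x = np.zeros(N)
    rows, targets = [], []
    for n in range(len(u_teach)):
        x = np.tanh(W @ x + w_in * u_teach[n]) + noise * rng.uniform(-1, 1, N)
        if n >= washout:
            rows.append(np.concatenate(([u_teach[n]], x)))   # (input, state) row of M
            targets.append(y_teach[n])                       # pre-signal entry of r
    M, r = np.array(rows), np.array(targets)
    w_out = np.linalg.pinv(M) @ r        # w_out = (M^+ r)^T; here a length-(N+1) vector
    return w_out
```

After this, the trained output would be read out as y(n) = f^out(w^out · (u(n), x(n))), which corresponds to writing w^out into the output connections.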
35 The modeling power of an ESN grows with network size. [sent-66, score-0.139]
36 A cheaper way to increase the power is to use additional nonlinear transformations of the network state x(n) for computing the network output in (2). [sent-67, score-0.491]
37 We use here a squared version of the network state. [sent-68, score-0.139]
38 Let w^out_squares denote a length-(2N+2) output weight vector and x_squares(n) the length-(2N+2) (column) vector (u(n), x_1(n), ..., x_N(n), u²(n), x_1²(n), ..., x_N²(n)). [sent-69, score-0.236]
39 Keep the network update (1) unchanged, but compute outputs with the following variant of (2): y(n+1) = f^out(w^out_squares x_squares(n+1)) (4). The "reservoir" and the input are now tapped by linear and quadratic connections. [sent-76, score-0.206]
40 Dismiss the initial transient and collect the remaining augmented states x_squares(n) row-wise into M. [sent-80, score-0.24]
41 Simultaneously, collect the training pre-signals (f^out)^{-1} y_teach(n) into a column vector r. [sent-81, score-0.133]
42 The ESN is now ready for exploitation, using output formula (4). [sent-85, score-0.124]
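For the augmented variant, the only change is that each collected row also contains the squared components. A small helper of this kind illustrates output formula (4); this is a sketch, the names are mine, and f^out = tanh is assumed as in the online experiment described later.

```python
import numpy as np

def augment(u, x):
    """Length-(2N+2) augmented vector: linear and squared taps of input and state."""
    lin = np.concatenate(([u], x))
    return np.concatenate((lin, lin ** 2))

def esn_output(w_out_squares, u, x):
    """Output formula (4) with f_out = tanh and the augmented state vector."""
    return np.tanh(w_out_squares @ augment(u, x))
```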
43 3 Identifying a 10th-order system: offline case In this section the workings of the augmented algorithm will be demonstrated with a nonlinear system identification task. [sent-86, score-0.407]
44 An N = 100 ESN was prepared by fixing a random, sparse connection weight matrix W (connectivity 5%, non-zero weights sampled from a uniform distribution in [-1,1]); the resulting raw matrix was re-scaled to a spectral radius of 0. [sent-94, score-0.207]
45 An input unit was attached with a random weight vector w^in sampled from a uniform distribution over [-0. [sent-96, score-0.215]
46 An I/O training sequence was prepared by driving the system (5) with an i. [sent-100, score-0.158]
47 The network was run according to (1) with the training input for 1200 time steps with uniform noise v(n) of size 0. [sent-105, score-0.344]
48 The remaining 1000 network states were entered into the augmented training algorithm, and a 202-length augmented output weight vector w^out_squares was calculated. [sent-108, score-0.784]
49 The learnt output vector was installed and the network was run from a zero starting state with newly created testing input for 2200 steps, of which the first 200 were discarded. [sent-110, score-0.448]
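System (5) itself is not reproduced in this extraction; the sketch below uses the widely cited 10th-order NARMA benchmark as a stand-in, together with the input range [0, 0.5] commonly used with it, so these specifics are assumptions rather than the paper's exact setup.

```python
import numpy as np

def narma10(u):
    """A common 10th-order NARMA system, used here as a stand-in for system (5)."""
    y = np.zeros(len(u))
    for n in range(9, len(u) - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * np.sum(y[n - 9:n + 1])
                    + 1.5 * u[n - 9] * u[n]
                    + 0.1)
    return y

rng = np.random.default_rng(1)
u_train = rng.uniform(0.0, 0.5, 1200)      # i.i.d. training input (range assumed)
d_train = narma10(u_train)
# Harvest 1200 reservoir states driven by u_train, discard the first 200, build
# augmented rows with `augment`, and solve for the 202-length w_out_squares by a
# pseudo-inverse, exactly as in the offline sketch above.
```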
50 (1) The noise term v(n) functions as a regularizer, slightly compromising the training error but improving the test error. [sent-115, score-0.095]
51 Set up exactly like the described 100-unit example, an augmented 20-unit ESN trained on 500 data points gave NMSE_test ≈ 0. [sent-117, score-0.169]
52 The error obtained in [1] on a length-200 training sequence was NMSE_train ≈ 0.2412. [sent-123, score-0.089]
53 However, the level of precision reported in [1] and in many other published papers on RNN training appears to be based on suboptimal training schemes. [sent-124, score-0.169]
54 4 Online adaptation of the ESN network Because the determination of optimal (augmented) output weights is a linear task, standard recursive algorithms for MSE minimization known from adaptive linear signal processing can be applied to online ESN estimation. [sent-127, score-0.599]
55 Input signals x_1(n), ..., x_N(n) are transformed into an output signal y(n) by an inner product with a tap-weight vector (w_1, ..., w_N). [sent-133, score-0.183]
56 In the ESN context, the input signals are the 2N+2 components of the augmented input+network state vector, the tap-weight vector is the augmented output weight vector, and the output signal is the network pre-output (f^out)^{-1} y(n). [sent-140, score-0.902]
57 4.1 A refresher on adaptive linear system identification For a recursive online estimation of tap-weight vectors, "recursive least squares" (RLS) algorithms are widely used in linear signal processing when fast convergence is of prime importance. [sent-147, score-0.334]
58 An online algorithm in the augmented ESN setting should do the following: given an open-ended, typically non-stationary training I/O sequence (u_teach(n), y_teach(n)), at each time n ≥ 1 determine an augmented output weight vector w^out_squares(n) which yields a good model of the current teacher system. [sent-149, score-0.826]
59 Two parameters characterise the tracking performance of an RLS algorithm: the misadjustment M and the convergence time constant τ. [sent-157, score-0.222]
60 The misadjustment gives the ratio between the excess MSE (or excess NMSE) incurred by the fluctuations of the adaptation process, and the optimal steady-state MSE that would be obtained in the limit of offline-training on infinite stationary training data. [sent-158, score-0.334]
61 A misadjustment of 0.3, for example, means that the tracking error of the adaptive algorithm in a steady-state situation exceeds the theoretically achievable optimum (with the same tap-weight vector length) by 30%. [sent-160, score-0.177]
62 Misadjustment and convergence exponent are related to the forgetting factor λ and the tap-vector length; in particular, the convergence time constant is approximately τ ≈ 1/(1−λ). [sent-162, score-0.112]
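For concreteness, here is a sketch of the exponentially weighted RLS recursion that such an online scheme builds on; the initialization constant delta and all names are assumptions, while λ = 0.995 matches the forgetting factor used in the experiment below.

```python
import numpy as np

class RLS:
    """Exponentially weighted recursive least squares for a length-L tap vector."""
    def __init__(self, L, lam=0.995, delta=1.0):
        self.lam = lam
        self.w = np.zeros(L)            # tap-weight (here: augmented output weight) vector
        self.P = np.eye(L) / delta      # inverse input-correlation matrix estimate

    def update(self, x, d):
        """One step: x is the augmented input+state vector, d the teacher pre-signal."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)    # gain vector
        e = d - self.w @ x              # a priori error
        self.w = self.w + k * e
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return self.w
```

With f^out = tanh, and hence (f^out)^{-1} = arctanh, each time step would call update(augment(u(n), x(n)), arctanh(y_teach(n))) and read out y(n) = tanh(w · x_augmented), mirroring the pre-signal/pre-output convention above.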
63 I re-use the same augmented 100-unit ESN, but now determine its 2N+2 output weight vector online with RLS. [sent-167, score-0.444]
64 Experiments with the system (5) revealed that the system sometimes explodes when driven with i. [sent-179, score-0.105]
65 (8) This system was run for 10000 steps with an i. [sent-193, score-0.107]
66 Figure 2A shows the resulting teacher output sequence, which clearly shows transitions between different "episodes" every 2000 steps. [sent-201, score-0.271]
67 The ESN was started from a zero state and with a zero augmented output weight vector. [sent-203, score-0.413]
68 It was driven by the teacher input, and a noise of size 0.0001 was inserted into the state update, as in the offline training. [sent-204, score-0.177] [sent-205, score-0.22]
70 The RLS algorithm (forgetting factor λ = 0.995) was initialized according to the prescriptions given in [2] and then run together with the network updates, to compute from the augmented input+network states x(n) = (u(n), x_1(n), ..., x_N²(n)) a sequence of augmented output weight vectors w^out_squares(n). [sent-207, score-0.34] [sent-213, score-0.362]
72 These output weight vectors were used to calculate a network output y(n) = tanh(w^out_squares(n) · x(n)). [sent-214, score-0.455]
73 From the resulting length-10000 sequences of desired outputs d(n) and network productions y(n), NMSEs were numerically estimated by averaging within subsequent length-100 blocks. [sent-216, score-0.185]
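A blockwise NMSE estimate of the kind described above can be sketched as follows; normalizing each block's MSE by the variance of the desired signal within that block is my assumption of the NMSE convention used.

```python
import numpy as np

def blockwise_nmse(d, y, block=100):
    """NMSE estimated within consecutive length-`block` windows."""
    out = []
    for i in range(0, len(d) - block + 1, block):
        dd, yy = d[i:i + block], y[i:i + block]
        out.append(np.mean((dd - yy) ** 2) / np.var(dd))
    return np.array(out)
```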
74 The convergence speed also matches the predicted time constant, as revealed by the τ = 200 slope line inserted in Fig. [sent-220, score-0.105]
75 Surprisingly, after convergence, the online-NMSE is lower than the offline NMSE. [sent-224, score-0.131]
76 This is due to the nature of system (8), which incurs long-term correlations in the signal d(n), or in other words, a nonstationarity of the signal on the timescale of the correlation lengths, even with fixed parameters α, β, γ, δ. [sent-226, score-0.134]
77 This medium-term nonstationarity compromises the performance of the offline algorithm, but the online adaptation can follow it to a certain degree. [sent-227, score-0.296]
78 Figure 2C is a logarithmic plot of the development of the mean absolute output weight size. [sent-229, score-0.192]
79 It is apparent that after starting from zero, there is an initial exponential growth of absolute values of the output weights, until a stabilization at a size of about 1000, whereafter the NMSE develops a regular pattern (Fig. [sent-230, score-0.124]
80 Standard offline training of ESNs yields output weights whose absolute size depends on the noise inserted into the network during training: the larger the noise, the smaller the mean output weights (extensive discussion in [5]). [sent-235, score-0.79]
81 In online training, a similar inverse correlation between output weight size (after settling on a plateau) and noise size can be observed. [sent-236, score-0.306]
82 When the online learning experiment was done otherwise identically but without noise insertion, weights grew so large that the RLS algorithm entered a region of numerical instability. [sent-237, score-0.21]
83 Thus, the noise term is crucial here for numerical stability, a condition familiar from EKF-based RNN training schemes [3], which are computationally closely related to RLS. [sent-238, score-0.117]
84 5 Discussion Several of the well-known error-gradient-based RNN training algorithms can be used for online weight adaptation. [sent-263, score-0.217]
85 The update costs per time step in the most efficient of those algorithms (overview in [1]) are O(N²), where N is the network size. [sent-264, score-0.139]
86 Standard approaches typically train small networks (order of N = 20), whereas ESNs rely on large networks for precision (order of N = 100). [sent-265, score-0.103]
87 Thus, the RLS-based ESN online learning algorithm is typically more expensive than standard techniques. [sent-266, score-0.084]
88 Exploitable ESN memory spans grow with network size (analysis in [6]). [sent-270, score-0.139]
89 It was learnt by a 400-unit augmented adaptive ESN with a test NMSE of 0. [sent-276, score-0.223]
90 A 50th-order system y(n+1) = u(n−10) u(n−50) was learnt offline by a 400-unit augmented ESN with an NMSE of 0. [sent-279, score-0.354]
91 gradient-based techniques with a similar number of trainable weights (D. [sent-283, score-0.098]
92 Because gradient-based techniques train every connection weight in the RNN, whereas ESNs train only the output weights, the numbers of units of similarly performing standard RNNs vs. (Footnote 3: see the Mathematica notebook for details.) [sent-285, score-0.147] [sent-286, score-0.164]
94 However, when working with ESNs, for each new trained output signal one can re-use the same "reservoir", adding only N new connections and weights. [sent-289, score-0.191]
95 This has for instance been exploited for robots in the AIS institute by simultaneously training multiple feature detectors from a single "reservoir" [4]. [sent-290, score-0.088]
96 The size disadvantage of ESNs is further balanced by much faster offline training, greater simplicity, and the general possibility to exploit linear-systems expertise for nonlinear adaptive modeling. [sent-293, score-0.209]
97 New results on recurrent network training: Unifying the algorithms and accelerating convergence. [sent-303, score-0.205]
98 Enhanced multistream Kalman filter training for recurrent neural networks. [sent-318, score-0.131]
99 The "echo state" approach to analysing and training recurrent neural networks. [sent-340, score-0.131]
100 Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network approach. [sent-358, score-0.485]
wordName wordTfidf (topN-words)
[('esn', 0.544), ('nmse', 0.431), ('rls', 0.225), ('esns', 0.206), ('echo', 0.163), ('rnn', 0.15), ('wo', 0.15), ('reservoir', 0.147), ('teacher', 0.147), ('augmented', 0.146), ('network', 0.139), ('misadjustment', 0.131), ('offline', 0.131), ('output', 0.124), ('ares', 0.114), ('online', 0.084), ('mse', 0.079), ('ut', 0.076), ('gmd', 0.075), ('prokhorov', 0.075), ('uteach', 0.075), ('tanh', 0.074), ('weights', 0.07), ('weight', 0.068), ('recurrent', 0.066), ('rnns', 0.065), ('training', 0.065), ('fout', 0.056), ('identification', 0.052), ('state', 0.052), ('ais', 0.049), ('win', 0.049), ('adaptation', 0.048), ('article', 0.047), ('tracking', 0.046), ('collect', 0.046), ('convergence', 0.045), ('forgetting', 0.045), ('excess', 0.045), ('input', 0.044), ('connection', 0.041), ('system', 0.041), ('adaptive', 0.041), ('units', 0.04), ('tutorial', 0.039), ('asymptotically', 0.039), ('precision', 0.039), ('internal', 0.038), ('danil', 0.038), ('follt', 0.038), ('jaeger', 0.038), ('loglo', 0.038), ('miscalculated', 0.038), ('narma', 0.038), ('nmsetrain', 0.038), ('notebook', 0.038), ('rut', 0.038), ('xsquares', 0.038), ('yteach', 0.038), ('inserted', 0.037), ('nonlinear', 0.037), ('signal', 0.037), ('learnt', 0.036), ('steps', 0.035), ('xl', 0.035), ('recursive', 0.034), ('maass', 0.034), ('dismiss', 0.033), ('mathematica', 0.033), ('nonstationarity', 0.033), ('networks', 0.032), ('unit', 0.032), ('run', 0.031), ('bremen', 0.03), ('connections', 0.03), ('noise', 0.03), ('nonlinearity', 0.028), ('prepared', 0.028), ('trainable', 0.028), ('fixed', 0.027), ('entered', 0.026), ('episodes', 0.026), ('fraunhofer', 0.025), ('german', 0.024), ('states', 0.024), ('basic', 0.024), ('sequence', 0.024), ('remaining', 0.024), ('outputs', 0.023), ('desired', 0.023), ('simultaneously', 0.023), ('started', 0.023), ('revealed', 0.023), ('filters', 0.023), ('gave', 0.023), ('vector', 0.022), ('exponent', 0.022), ('familiar', 0.022), ('determination', 0.022), ('unchanged', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks
Author: Herbert Jaeger
Abstract: Echo state networks (ESN) are a novel approach to recurrent neural network training. An ESN consists of a large, fixed, recurrent
2 0.05819267 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
Author: Alex Holub, Gilles Laurent, Pietro Perona
Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1
3 0.056744821 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
Author: Jakob Heinzle, Alan Stocker
Abstract: We report a system that classifies and can learn to classify patterns of visual motion on-line. The complete system is described by the dynamics of its physical network architectures. The combination of the following properties makes the system novel: Firstly, the front-end of the system consists of an aVLSI optical flow chip that collectively computes 2-D global visual motion in real-time [1]. Secondly, the complexity of the classification task is significantly reduced by mapping the continuous motion trajectories to sequences of ’motion events’. And thirdly, all the network structures are simple and with the exception of the optical flow chip based on a Winner-Take-All (WTA) architecture. We demonstrate the application of the proposed generic system for a contactless man-machine interface that allows to write letters by visual motion. Regarding the low complexity of the system, its robustness and the already existing front-end, a complete aVLSI system-on-chip implementation is realistic, allowing various applications in mobile electronic devices.
Author: Alistair Bray, Dominique Martinez
Abstract: In Slow Feature Analysis (SFA [1]), it has been demonstrated that high-order invariant properties can be extracted by projecting inputs into a nonlinear space and computing the slowest changing features in this space; this has been proposed as a simple general model for learning nonlinear invariances in the visual system. However, this method is highly constrained by the curse of dimensionality which limits it to simple theoretical simulations. This paper demonstrates that by using a different but closely-related objective function for extracting slowly varying features ([2, 3]), and then exploiting the kernel trick, this curse can be avoided. Using this new method we show that both the complex cell properties of translation invariance and disparity coding can be learnt simultaneously from natural images when complex cells are driven by simple cells also learnt from the image. The notion of maximising an objective function based upon the temporal predictability of output has been progressively applied in modelling the development of invariances in the visual system. F6ldiak used it indirectly via a Hebbian trace rule for modelling the development of translation invariance in complex cells [4] (closely related to many other models [5,6,7]); this rule has been used to maximise invariance as one component of a hierarchical system for object and face recognition [8]. On the other hand, similar functions have been maximised directly in networks for extracting linear [2] and nonlinear [9, 1] visual invariances. Direct maximisation of such functions have recently been used to model complex cells [10] and as an alternative to maximising sparseness/independence in modelling simple cells [11]. Slow Feature Analysis [1] combines many of the best properties of these methods to provide a good general nonlinear model. That is, it uses an objective function that minimises the first-order temporal derivative of the outputs; it provides a closedform solution which maximises this function by projecting inputs into a nonlinear http://www.loria.fr/equipes/cortex/ space; it exploits sphering (or PCA-whitening) of the data to ensure that all outputs have unit variance and are uncorrelated. However, the method suffers from the curse of dimensionality in that the nonlinear feature space soon becomes very large as the input dimension grows, and yet this feature space must be represented explicitly in order for the essential sphering to occur. The alternative that we propose here is to use the objective function of Stone [2, 9], that maximises output variance over a long period whilst minimising variance over a shorter period; in the linear case, this can be implemented by a biologically plausible mixture of Hebbian and anti-Hebbian learning on the same synapses [2]. In recent work, Stone has proposed a closed-form solution for maximising this function in the linear domain of blind source separation that does not involve data-sphering. This paper describes how this method can be kernelised. The use of the
5 0.055339016 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables
Author: David Barber
Abstract: The application of latent/hidden variable Dynamic Bayesian Networks is constrained by the complexity of marginalising over latent variables. For this reason either small latent dimensions or Gaussian latent conditional tables linearly dependent on past states are typically considered in order that inference is tractable. We suggest an alternative approach in which the latent variables are modelled using deterministic conditional probability tables. This specialisation has the advantage of tractable inference even for highly complex non-linear/non-Gaussian visible conditional probability tables. This approach enables the consideration of highly complex latent dynamics whilst retaining the benefits of a tractable probabilistic model. 1
6 0.054576632 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits
7 0.051429957 18 nips-2002-Adaptation and Unsupervised Learning
8 0.044897482 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex
9 0.044833004 171 nips-2002-Reconstructing Stimulus-Driven Neural Networks from Spike Times
10 0.042647343 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
11 0.042386625 128 nips-2002-Learning a Forward Model of a Reflex
12 0.041571703 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
13 0.041305479 206 nips-2002-Visual Development Aids the Acquisition of Motion Velocity Sensitivities
14 0.04103547 119 nips-2002-Kernel Dependency Estimation
15 0.037053943 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions
16 0.035305735 103 nips-2002-How Linear are Auditory Cortical Responses?
17 0.035147317 181 nips-2002-Self Supervised Boosting
18 0.034940679 164 nips-2002-Prediction of Protein Topologies Using Generalized IOHMMs and RNNs
19 0.034699947 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter
20 0.033327676 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule
topicId topicWeight
[(0, -0.122), (1, 0.032), (2, -0.03), (3, -0.005), (4, 0.009), (5, 0.062), (6, -0.012), (7, 0.026), (8, 0.059), (9, 0.018), (10, 0.007), (11, 0.004), (12, 0.02), (13, 0.02), (14, -0.038), (15, 0.015), (16, 0.022), (17, 0.05), (18, 0.012), (19, -0.014), (20, 0.064), (21, 0.002), (22, 0.058), (23, 0.013), (24, 0.028), (25, -0.052), (26, 0.013), (27, 0.058), (28, -0.02), (29, 0.049), (30, 0.063), (31, 0.064), (32, 0.029), (33, 0.002), (34, 0.001), (35, 0.026), (36, -0.078), (37, 0.122), (38, -0.113), (39, -0.022), (40, 0.116), (41, -0.063), (42, 0.035), (43, 0.058), (44, -0.049), (45, 0.171), (46, -0.046), (47, 0.176), (48, -0.018), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.92355883 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks
Author: Herbert Jaeger
Abstract: Echo state networks (ESN) are a novel approach to recurrent neural network training. An ESN consists of a large, fixed, recurrent
2 0.59835857 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
Author: Alex Holub, Gilles Laurent, Pietro Perona
Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1
3 0.55346984 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model
Author: Alexandre R. Romariz, Kelvin Wagner
Abstract: An optoelectronic implementation of a spiking neuron model based on the FitzHugh-Nagumo equations is presented. A tunable semiconductor laser source and a spectral filter provide a nonlinear mapping from driver voltage to detected signal. Linear electronic feedback completes the implementation, which allows either electronic or optical input signals. Experimental results for a single system and numeric results of model interaction confirm that important features of spiking neural models can be implemented through this approach.
4 0.47289866 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
Author: Jakob Heinzle, Alan Stocker
Abstract: We report a system that classifies and can learn to classify patterns of visual motion on-line. The complete system is described by the dynamics of its physical network architectures. The combination of the following properties makes the system novel: Firstly, the front-end of the system consists of an aVLSI optical flow chip that collectively computes 2-D global visual motion in real-time [1]. Secondly, the complexity of the classification task is significantly reduced by mapping the continuous motion trajectories to sequences of ’motion events’. And thirdly, all the network structures are simple and with the exception of the optical flow chip based on a Winner-Take-All (WTA) architecture. We demonstrate the application of the proposed generic system for a contactless man-machine interface that allows to write letters by visual motion. Regarding the low complexity of the system, its robustness and the already existing front-end, a complete aVLSI system-on-chip implementation is realistic, allowing various applications in mobile electronic devices.
5 0.46826163 95 nips-2002-Gaussian Process Priors with Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting
Author: Agathe Girard, Carl Edward Rasmussen, Joaquin Quiñonero Candela, Roderick Murray-Smith
Abstract: We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model of the form y(t) = f(y(t−1), ..., y(t−L)), the prediction of y at time t + k is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction.
6 0.44427684 206 nips-2002-Visual Development Aids the Acquisition of Motion Velocity Sensitivities
7 0.42892903 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables
8 0.40113935 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
9 0.3981261 201 nips-2002-Transductive and Inductive Methods for Approximate Gaussian Process Regression
10 0.39333495 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits
12 0.37946296 110 nips-2002-Incremental Gaussian Processes
13 0.37614524 128 nips-2002-Learning a Forward Model of a Reflex
14 0.36245531 18 nips-2002-Adaptation and Unsupervised Learning
15 0.34774131 107 nips-2002-Identity Uncertainty and Citation Matching
16 0.33583447 150 nips-2002-Multiple Cause Vector Quantization
17 0.33064488 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems
18 0.32512832 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
19 0.32017425 119 nips-2002-Kernel Dependency Estimation
20 0.31891811 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
topicId topicWeight
[(11, 0.011), (23, 0.025), (42, 0.547), (54, 0.077), (55, 0.025), (67, 0.014), (68, 0.044), (74, 0.042), (87, 0.01), (92, 0.028), (98, 0.075)]
simIndex simValue paperId paperTitle
1 0.94535041 115 nips-2002-Informed Projections
Author: David Tax
Abstract: Low rank approximation techniques are widespread in pattern recognition research — they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally “unsupervised,” which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets. 1
same-paper 2 0.94082969 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks
Author: Herbert Jaeger
Abstract: Echo state networks (ESN) are a novel approach to recurrent neural network training. An ESN consists of a large, fixed, recurrent
3 0.91409647 181 nips-2002-Self Supervised Boosting
Author: Max Welling, Richard S. Zemel, Geoffrey E. Hinton
Abstract: Boosting algorithms and successful applications thereof abound for classification and regression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random field model by training them to improve classification performance between the data and an equal-sized sample of “negative examples” generated from the model’s current estimate of the data density. Training in each boosting round proceeds in three stages: first we sample negative examples from the model’s current Boltzmann distribution. Next, a feature is trained to improve classification performance between data and negative examples. Finally, a coefficient is learned which determines the importance of this feature relative to ones already in the pool. Negative examples only need to be generated once to learn each new feature. The validity of the approach is demonstrated on binary digits and continuous synthetic data.
4 0.90013772 197 nips-2002-The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum
Author: Christopher Williams, John S. Shawe-taylor
Abstract: In this paper we analyze the relationships between the eigenvalues of the m x m Gram matrix K for a kernel k(·, .) corresponding to a sample Xl, ... ,X m drawn from a density p(x) and the eigenvalues of the corresponding continuous eigenproblem. We bound the differences between the two spectra and provide a performance bound on kernel peA. 1
5 0.79686391 138 nips-2002-Manifold Parzen Windows
Author: Pascal Vincent, Yoshua Bengio
Abstract: The similarity between objects is a fundamental element of many learning algorithms. Most non-parametric methods take this similarity to be fixed, but much recent work has shown the advantages of learning it, in particular to exploit the local invariances in the data or to capture the possibly non-linear manifold on which most of the data lies. We propose a new non-parametric kernel density estimation method which captures the local structure of an underlying manifold through the leading eigenvectors of regularized local covariance matrices. Experiments in density estimation show significant improvements with respect to Parzen density estimators. The density estimators can also be used within Bayes classifiers, yielding classification rates similar to SVMs and much superior to the Parzen classifier.
6 0.63165605 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
7 0.61772126 46 nips-2002-Boosting Density Estimation
8 0.60952616 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
9 0.58944547 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems
10 0.58175951 169 nips-2002-Real-Time Particle Filters
11 0.56032395 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
12 0.55959618 3 nips-2002-A Convergent Form of Approximate Policy Iteration
13 0.55862421 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
14 0.55457032 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
15 0.5436064 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
16 0.54253477 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation
17 0.53978056 96 nips-2002-Generalized² Linear² Models
18 0.53831249 41 nips-2002-Bayesian Monte Carlo
19 0.53714383 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
20 0.53652835 137 nips-2002-Location Estimation with a Differential Update Network