nips nips2000 nips2000-102 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhaoping Li, Peter Dayan
Abstract: Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. In particular, this applies to many perceptual learning tasks. We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. We show the advantage that taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area V1, by demonstrating the superior discrimination performance of a recurrent network. We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 uk Abstract Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. [sent-8, score-0.074]
2 In particular, this applies to many perceptual learning tasks. [sent-9, score-0.16]
3 We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. [sent-10, score-0.683]
4 We show the advantage that taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area V1, by demonstrating the superior discrimination performance of a recurrent network. [sent-11, score-0.863]
5 We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks. [sent-12, score-0.985]
6 1 Introduction The field of perceptual learning in simple, but high precision, visual tasks (such as vernier acuity tasks) has produced many surprising results whose import for models has yet to be fully felt. [sent-13, score-0.533]
7 For instance, improvement through learning on an orientation discrimination task does not lead to improvement on a vernier acuity task (Fahle 1997), even though both tasks presumably use the same orientation selective striate cortical cells to process inputs. [sent-16, score-1.013]
8 Of course, learning in human psychophysics is likely to involve plasticity in a large number of different parts of the brain over various timescales. [sent-17, score-0.071]
9 Previous studies (Poggio, Fahle, & Edelman 1992, Weiss, Edelman, & Fahle 1993) proposed phenomenological models of learning in a feedforward network architecture. [sent-18, score-0.347]
10 In these models, the first stage units in the network receive the sensory inputs through the medium of basis functions relevant for the perceptual task. [sent-19, score-0.228]
11 Over learning, a set of feedforward weights is acquired such that the weighted sum of the activities from the input units can be used to make an appropriate binary decision, eg using a threshold. [sent-20, score-0.681]
12 These models can account for some, but not all, observations on perceptual learning (Fahle et al 1995). [sent-21, score-0.16]
13 Since the activity of V1 units seems not to relate directly to behavioral decisions on these visual tasks, the feedforward connections. Figure 1: Mid-point discrimination. A) The three bars, at horizontal positions -1+y, y+ε and 1+y along the x axis. [sent-22, score-0.422]
14 The task is to report which of the outer bars is closer to the central bar. [sent-24, score-0.325]
15 y represents the variable placement of the stimulus array. [sent-25, score-0.184]
16 B) Population activities in cortical cells evoked by the stimulus bars - the activity a_i is plotted against the preferred location x_i of the cells. [sent-26, score-0.873]
17 This comes from Gaussian tuning curves (k = 20; σ = 0. [sent-27, score-0.071]
18 There are 81 units whose preferred values are placed at regular intervals of Δx = 0. [sent-29, score-0.081]
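A minimal numpy sketch of this population code follows (not from the paper; the tuning width, unit spacing and the assumption that the mean response sums the responses to the three bars are placeholders, since the extracted values are truncated):

import numpy as np

def mean_activity(eps, y=0.0, k=20.0, sigma=0.3, n_units=81, x_lo=-2.0, x_hi=2.0):
    # Preferred locations x_i at regular intervals; 81 units as in the paper, but the
    # exact spacing and range are truncated in this text, so [-2, 2] is an assumption.
    x = np.linspace(x_lo, x_hi, n_units)
    # The three bisection bars: outer bars at -1+y and 1+y, centre bar at y+eps.
    bars = [-1.0 + y, y + eps, 1.0 + y]
    # Assumed mean response: sum of Gaussian tuning-curve responses to the three bars,
    # with gain k = 20 and a placeholder tuning width sigma = 0.3.
    return sum(k * np.exp(-(x - b) ** 2 / (2.0 * sigma ** 2)) for b in bars)

if __name__ == "__main__":
    a_bar = mean_activity(eps=0.05, y=0.0)   # mean activities a_bar_i
    a = np.random.poisson(a_bar)             # net activities a_i = a_bar_i + n_i
    print(a_bar.round(2)[38:43], a[38:43])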
19 The lack of generalisation between tasks that involve the same visual feature samplers suggests that the basis functions, eg the orientation selective primary cortical cells that sample the inputs, do not change their sensitivity and shapes, eg their orientation selectivity or tuning widths. [sent-32, score-0.716]
20 However, evidence such as the specificity of learning to the eye of origin and spatial location strongly suggests that lower visual areas such as V1 are directly involved in learning. [sent-33, score-0.195]
21 Indeed, V1 is a visual processor of quite some computational power (performing tasks such as segmentation, contour-integration, pop-out, noise removal) rather than being just a feedforward, linear, processing stage (eg Li, 1999; Pouget et al 1998). [sent-34, score-0.186]
22 Here, we study a paradigmatic perceptual task from a statistical perspective. [sent-35, score-0.192]
23 Rather than suggest particular learning rules, we seek to understand what it is about the structure of the task that might lead to two phases of learning (fast and slow), and thus what computational job might be ascribed to V1 processing, in particular, the role of lateral recurrent connections. [sent-36, score-0.434]
24 We agree with the general consensus that fast learning involves the feedforward connections. [sent-37, score-0.323]
25 However, by considering positional invariance for discrimination, we show that there is an inherently non-linear component to the overall task, which defeats feedforward algorithms. [sent-38, score-0.383]
26 2 The bisection task Figure 1A shows the bisection task. [sent-39, score-0.415]
27 Three bars are presented at horizontal positions x_0 = y + ε, x_- = -1 + y and x_+ = 1 + y, where -1 ≪ ε ≪ 1. [sent-40, score-0.186]
28 Here y is a nuisance random number with zero mean, reflecting the variability in the position of the stimulus array due to eye movements or other uncontrolled factors. [sent-41, score-0.454]
29 The task for the subject is to report which of the outer bars is closer to the central bar, ie to report whether ε is greater than or less than 0. [sent-42, score-0.429]
30 The bars create a population-coded representation in V1 cells preferring vertical orientation. [sent-43, score-0.217]
31 In figure 1B, we show the activity of cells a_i as a function of the preferred topographic location x_i of the cell; and, for simplicity, we ignore activities from other V1 cells which prefer orientations other than vertical. [sent-44, score-0.504]
32 The net activity is a_i = ā_i + n_i, where ā_i is the mean activity and n_i is a noise term. [sent-46, score-0.405]
33 We assume that n_i comes from a Poisson distribution and is independent across the units, and that ε and y have mean zero and are uniformly distributed in their respective ranges. [sent-47, score-0.137]
34 The subject must report whether ε is greater or less than 0 on the basis of the activities a. [sent-48, score-0.209]
35 Without prior information about ε and y, and with Poisson noise n_i = a_i - ā_i, the posterior over ε and y follows from the likelihood P[a|ε, y]. 3 Fixed position stimulus array When the stimulus array is in a fixed position y = 0, analysis is easy, and is very similar to that carried out by Seung & Sompolinsky (1993). [sent-51, score-1.061]
36 Dropping y, we calculate log P[ε|a] and approximate it by a Taylor expansion of log P[a|ε] about ε = 0, to second order in ε. [sent-52, score-0.064]
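Written out, the second-order expansion takes the standard form

\log P[a\,|\,\epsilon] \;\approx\; \log P[a\,|\,0] \;+\; \epsilon\,\frac{\partial \log P[a\,|\,\epsilon]}{\partial \epsilon}\bigg|_{\epsilon=0} \;+\; \frac{\epsilon^{2}}{2}\,\frac{\partial^{2} \log P[a\,|\,\epsilon]}{\partial \epsilon^{2}}\bigg|_{\epsilon=0}.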
37 Provided that the last term is negative (which it indeed is, almost surely), we derive an approximately Gaussian distribution (4) for ε, with variance σ_ε² = [-(∂²/∂ε²) log P[a|ε]|_{ε=0}]^{-1} and mean ε̄ = σ_ε² (∂/∂ε) log P[a|ε]|_{ε=0}. [sent-54, score-0.045]
38 Thus the subject should report that ε > 0 or ε < 0 according to whether the test t(a) = (∂/∂ε) log P[a|ε]|_{ε=0} is greater or less than zero respectively. [sent-55, score-0.068]
39 For the Poisson noise case we consider, log P[a|ε] = constant + Σ_i a_i log ā_i(ε), since Σ_i ā_i(ε) is a constant, independent of ε. [sent-56, score-0.419]
40 Thus (5), maximum likelihood discrimination can be implemented by a linear feedforward network mapping the inputs a_i through feedforward weights w_i = (∂/∂ε) log ā_i(ε)|_{ε=0} to calculate the output t(a) = Σ_i w_i a_i. [sent-57, score-1.237]
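A minimal numerical sketch of this linear readout, reusing the assumed population model from the earlier sketch (the derivative of log ā_i is taken by finite differences):

import numpy as np

def mean_activity(eps, y=0.0, k=20.0, sigma=0.3, n_units=81):
    # Assumed population model (placeholder parameters), as in the earlier sketch.
    x = np.linspace(-2.0, 2.0, n_units)
    bars = [-1.0 + y, y + eps, 1.0 + y]
    return sum(k * np.exp(-(x - b) ** 2 / (2.0 * sigma ** 2)) for b in bars)

def ml_weights(d_eps=1e-3):
    # w_i = d log a_bar_i(eps) / d eps at eps = 0 (y = 0), by central differences.
    return (np.log(mean_activity(+d_eps)) - np.log(mean_activity(-d_eps))) / (2.0 * d_eps)

def linear_test(a, w):
    # t(a) = sum_i w_i a_i; report eps > 0 iff t(a) > 0.
    return float(np.dot(w, a))

if __name__ == "__main__":
    w = ml_weights()
    a = np.random.poisson(mean_activity(eps=0.05))
    print("report eps > 0" if linear_test(a, w) > 0 else "report eps < 0")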
41 A threshold of 0 on t(a) provides the discrimination: ε > 0 if t(a) > 0 and ε < 0 if t(a) < 0. [sent-58, score-0.279]
42 Note that if the noise corrupting the activities is Gaussian, the weights should instead be w_i = (∂/∂ε) ā_i(ε)|_{ε=0}. [sent-60, score-0.337]
43 Figure 2A shows the optimal discrimination weights for the case of independent Poisson noise. [sent-61, score-0.437]
44 The lower solid line in figure 2C shows optimal performance as a function of ε. The error rate drops precipitately from 50% for very small (and thus difficult) ε to almost 0, long before ε approaches the tuning width σ. [sent-62, score-0.163]
45 4 Moveable stimulus array If the stimulus array can move around, ie if y is not necessarily 0, then the discrimination task gets considerably harder. [sent-65, score-1.028]
46 The upper dotted line in figure 2C shows the (rather unfair) test of using the learned weights in figure 2B when y ∈ [-0.2, 0.2]. [sent-66, score-0.323]
47 Looking at the weight structure in figure 2A;B suggests an obvious reason for this - the weights associated with the outer bars are zero, since they provide no information about ε when y = 0, and the [sent-70, score-0.764]
48 Figure 2: A) The ML optimal discrimination weights w_i = (∂/∂ε) log ā_i|_{ε=0} (plotted as w_i vs. x_i). [sent-88, score-0.501]
49 B) The learned discrimination weights w for the same decision. [sent-90, score-0.477]
50 During on-line learning, random examples were selected with ε ∈ [-r, r] uniformly, r = 0.1. [sent-91, score-0.059]
51 The weights were adjusted online to maximise the log probability of generating the correct discrimination under a model in which the probability of declaring that ε > 0 is σ(Σ_i w_i a_i) = 1/(1 + exp(-Σ_i w_i a_i)). [sent-92, score-0.501]
52 C) Performance of the networks with ML (lower solid line) and learned (lower dashed line) weights as a function of ε. [sent-93, score-0.198]
53 Performance is measured by drawing random ε and y, and assessing the percentage of trials on which the answer is incorrect. [sent-94, score-0.073]
54 The upper dotted line shows the effect of drawing y ∈ [-0.2, 0.2] uniformly, [sent-95, score-0.093]
55 yet using the ML weights in (B) that assume y = 0. [sent-97, score-0.068]
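A sketch of the on-line learning and evaluation procedure described in this caption, again under the assumed population model (the learning rate, trial counts and the stimulus model itself are placeholders):

import numpy as np

def mean_activity(eps, y=0.0, k=20.0, sigma=0.3, n_units=81):
    # Assumed population model (placeholder parameters), as in the earlier sketches.
    x = np.linspace(-2.0, 2.0, n_units)
    bars = [-1.0 + y, y + eps, 1.0 + y]
    return sum(k * np.exp(-(x - b) ** 2 / (2.0 * sigma ** 2)) for b in bars)

def train_linear(n_trials=20000, lr=1e-4, r=0.1, y_max=0.0, n_units=81, seed=0):
    # Online logistic learning: maximise the log probability of the correct report under
    # P(report eps > 0 | a) = sigma(sum_i w_i a_i).
    rng = np.random.default_rng(seed)
    w = np.zeros(n_units)
    for _ in range(n_trials):
        eps = rng.uniform(-r, r)
        y = rng.uniform(-y_max, y_max) if y_max > 0 else 0.0
        a = rng.poisson(mean_activity(eps, y)).astype(float)
        p = 1.0 / (1.0 + np.exp(-np.dot(w, a)))
        target = 1.0 if eps > 0 else 0.0
        w += lr * (target - p) * a          # gradient of the Bernoulli log likelihood
    return w

def error_rate(w, eps_abs, y_max=0.0, n_trials=5000, seed=1):
    # Percentage of trials on which the sign of w . a disagrees with the sign of eps.
    rng = np.random.default_rng(seed)
    wrong = 0
    for _ in range(n_trials):
        eps = eps_abs * rng.choice([-1.0, 1.0])
        y = rng.uniform(-y_max, y_max) if y_max > 0 else 0.0
        a = rng.poisson(mean_activity(eps, y))
        wrong += (np.dot(w, a) > 0) != (eps > 0)
    return 100.0 * wrong / n_trials

if __name__ == "__main__":
    w = train_linear()
    print(error_rate(w, eps_abs=0.05), error_rate(w, eps_abs=0.05, y_max=0.2))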
56 weights are finely balanced about 0, the mid-point of the outer bars, giving an unbiased or balanced discrimination on ε. [sent-98, score-0.582]
57 If the whole array can move, this balance will be destroyed, and all the above conclusions change. [sent-99, score-0.137]
58 Here, ε and y are anti-correlated given the activities a, because the information from the center stimulus bar only constrains their sum ε + y. [sent-103, score-0.364]
59 Of interest is the probability P[ε|a] = ∫ dy P[ε, y|a], which is approximately Gaussian with mean β and variance ρ², where, under Poisson noise n_i = a_i - ā_i, β and ρ² are given by derivatives of log ā with respect to ε and y, evaluated at ε = y = 0. [sent-104, score-0.354]
60 Interestingly, t(a) is a very simple quadratic form, t(a) = a · Q′ · a. [sent-116, score-0.078]
61 Here Q′ is built from products of the derivative vectors ∂ log ā/∂ε, ∂ log ā/∂y, ∂² log ā/∂y∂ε and ∂² log ā/∂y², all evaluated at ε = y = 0 (7); therefore the discrimination problem in the face of positional variance has a precisely quantifiable non-linear character. [sent-122, score-0.738]
62 The quadratic test t(a) cannot be implemented by a linear feedforward architecture only, since the optimal boundary t(a) = 0 that separates the state space of a for a decision is now curved. [sent-123, score-0.324]
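As a hedged illustration of why the decision boundary is curved, here is a brute-force version of the same inference under the assumed population model: numerically marginalising the nuisance position y, rather than using the paper's closed-form quadratic approximation:

import numpy as np

def mean_activity(eps, y, k=20.0, sigma=0.3, n_units=81):
    # Assumed population model (placeholder parameters), as in the earlier sketches.
    x = np.linspace(-2.0, 2.0, n_units)
    bars = [-1.0 + y, y + eps, 1.0 + y]
    return sum(k * np.exp(-(x - b) ** 2 / (2.0 * sigma ** 2)) for b in bars)

def report_sign(a, r=0.1, y_max=0.2, n_grid=41):
    # Poisson log likelihood up to (eps, y)-independent terms:
    #   log P[a | eps, y] = sum_i a_i log a_bar_i(eps, y) - sum_i a_bar_i(eps, y) + const.
    eps_grid = np.linspace(-r, r, n_grid)
    y_grid = np.linspace(-y_max, y_max, n_grid)
    loglik = np.empty((n_grid, n_grid))
    for i, eps in enumerate(eps_grid):
        for j, y in enumerate(y_grid):
            ab = mean_activity(eps, y)
            loglik[i, j] = np.sum(a * np.log(ab)) - np.sum(ab)
    post = np.exp(loglik - loglik.max())       # unnormalised posterior on the grid
    p_eps = post.sum(axis=1)                   # marginalise out the nuisance variable y
    p_positive = p_eps[eps_grid > 0].sum() / p_eps.sum()
    return "eps > 0" if p_positive > 0.5 else "eps < 0"

if __name__ == "__main__":
    a = np.random.poisson(mean_activity(eps=0.05, y=0.1))
    print(report_sign(a))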
63 C) The four eigenvectors of Q with non-zero eigenvalues (shown above). [sent-145, score-0.092]
64 The eigenvalues come in ± pairs; the associated eigenvectors come in antisymmetric pairs. [sent-146, score-0.092]
65 The absolute scale of Q and its eigenvalues is arbitrary. [sent-147, score-0.1]
66 A) Performance of the approximate inference based on the quadratic form of figure 3B, in terms of percentage error as a function of |y| and |ε| (σ = 0. [sent-149, score-0.111]
67 B) Feedforward weights (plotted against x_i), learned using the same procedure as in figure 2B, but with y allowed to vary. [sent-152, score-0.073]
68 C) Ratio of error rates for the linear (weights from B) to the quadratic discrimination. [sent-155, score-0.078]
69 Symmetrising the quadratic form as Q_ij = (Q′_ij + Q′_ji)/2, we find that Q has only four non-zero eigenvalues, [sent-157, score-0.064]
70 corresponding to the 4-dimensional sub-space spanned by the 4 vectors ∂² log ā/∂y∂ε|_{ε,y=0}, ∂ log ā/∂y|_{ε,y=0}, ∂ log ā/∂ε|_{ε,y=0} and ∂² log ā/∂y²|_{ε,y=0}. [sent-158, score-0.128]
71 Q and its eigenvectors and eigenvalues are shown in Figure 3B;C. [sent-159, score-0.092]
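A hedged sketch of the rank statement: any symmetric matrix built from outer products of these four derivative vectors has at most four non-zero eigenvalues. The derivative vectors are estimated by finite differences under the assumed population model, and the particular combination below is a placeholder, not the exact expression of equation 7:

import numpy as np

def mean_activity(eps, y, k=20.0, sigma=0.3, n_units=81):
    # Assumed population model (placeholder parameters), as in the earlier sketches.
    x = np.linspace(-2.0, 2.0, n_units)
    bars = [-1.0 + y, y + eps, 1.0 + y]
    return sum(k * np.exp(-(x - b) ** 2 / (2.0 * sigma ** 2)) for b in bars)

def log_abar(eps, y):
    return np.log(mean_activity(eps, y))

def derivative_vectors(d=1e-3):
    # Finite-difference estimates of the four vectors at eps = y = 0:
    # d(log abar)/d eps, d(log abar)/dy, d^2(log abar)/(dy d eps), d^2(log abar)/dy^2.
    de = (log_abar(+d, 0.0) - log_abar(-d, 0.0)) / (2 * d)
    dy = (log_abar(0.0, +d) - log_abar(0.0, -d)) / (2 * d)
    dyde = (log_abar(+d, +d) - log_abar(+d, -d) - log_abar(-d, +d) + log_abar(-d, -d)) / (4 * d * d)
    dyy = (log_abar(0.0, +d) - 2 * log_abar(0.0, 0.0) + log_abar(0.0, -d)) / (d * d)
    return de, dy, dyde, dyy

if __name__ == "__main__":
    de, dy, dyde, dyy = derivative_vectors()
    Q_raw = np.outer(dyde, dy) - np.outer(de, dyy)   # placeholder quadratic form
    Q = 0.5 * (Q_raw + Q_raw.T)                      # symmetrise: Q_ij = (Q'_ij + Q'_ji)/2
    eigvals = np.linalg.eigvalsh(Q)
    # Only (at most) four eigenvalues are far from zero, since Q lives in the span of 4 vectors.
    print(np.sort(np.abs(eigvals))[-6:])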
72 Note that if Gaussian rather than Poisson noise is used for n_i = a_i - ā_i, the test t(a) is still quadratic. [sent-160, score-0.356]
73 Using t(a) to infer ε is sound for y up to two standard deviations (σ) of the tuning curve f(x) away from 0, as shown in Figure 4A. [sent-161, score-0.071]
74 By comparison, a feedforward network, with the weights shown in figure 4B, learned using the same error-correcting learning procedure as above, gives substantially worse performance, even though it is better than the feedforward net of Figure 2A;B. [sent-162, score-0.762]
75 Figure 4C shows the ratio of the error rates for the linear to the quadratic decisions. [sent-163, score-0.078]
76 The linear network is often dramatically worse, because it fails to take proper account of y . [sent-164, score-0.062]
77 We originally suggested that recurrent interactions in the form of horizontal intra-cortical connections within V1 might be the site of the longer term improvement in behavior. [sent-165, score-0.333]
78 Input activity (as in figure 1B) is used to initialise the state u at time t = 0 of a recurrent network. [sent-167, score-0.337]
79 The recurrent weights are shown in Figure 5B. Figure 5: Threshold linear recurrent network, its weights, and performance (panels: Input a, recurrent weights, recurrent error, lin/rec error, Decision). [sent-168, score-1.081]
80 The network activities evolve according to du_i/dt = -u_i + Σ_j J_ij g(u_j) + a_i (8), where J_ij is the recurrent weight from unit j to i, g(u) = u if u > 0 and g(u) = 0 for u ≤ 0. [sent-171, score-0.569]
81 The network activities finally settle to an equilibrium u(t → ∞) (note that u_i(t → ∞) = a_i when J = 0). [sent-172, score-0.314]
82 The activity values u(t → ∞) at this equilibrium are fed through feedforward weights w, trained for this recurrent network just as for the pure feedforward case, to reach a decision Σ_i w_i u_i(t → ∞). [sent-173, score-0.812]
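A minimal sketch of these threshold-linear recurrent dynamics with a feedforward readout of the settled state (Euler integration; the recurrent profile J below is a generic centre-surround placeholder, not the task-specific weights learned in the paper):

import numpy as np

def centre_surround_J(n_units=81, x_lo=-2.0, x_hi=2.0, w_e=0.04, w_i=0.02, s_e=0.2, s_i=0.6):
    # Placeholder distance-dependent excitation/inhibition; an illustration of the network
    # class only, not the interaction profile described in the paper.
    x = np.linspace(x_lo, x_hi, n_units)
    d = x[:, None] - x[None, :]
    J = w_e * np.exp(-d ** 2 / (2 * s_e ** 2)) - w_i * np.exp(-d ** 2 / (2 * s_i ** 2))
    np.fill_diagonal(J, 0.0)
    return J

def settle(a, J, dt=0.05, n_steps=2000):
    # Euler integration of du_i/dt = -u_i + sum_j J_ij g(u_j) + a_i, with g(u) = max(u, 0);
    # with J = 0 the equilibrium is u_i = a_i, as noted above.
    u = np.asarray(a, dtype=float).copy()
    for _ in range(n_steps):
        u += dt * (-u + J @ np.maximum(u, 0.0) + a)
    return u

def recurrent_decision(a, J, w):
    # Feedforward readout of the settled activities: report eps > 0 iff sum_i w_i u_i(inf) > 0.
    return float(np.dot(w, settle(a, J))) > 0.0

if __name__ == "__main__":
    J = centre_surround_J()
    a = np.random.poisson(5.0, size=81)            # stand-in input activities
    w = np.random.default_rng(0).normal(size=81)   # stand-in readout weights
    print(recurrent_decision(a, J, w))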
83 Figure 5C shows that using this network gives results that are almost invariant to y (as for the quadratic discriminator) ; and figure 5D shows that it generally outperforms the optimal linear discriminator by a large margin, albeit performing slightly worse than the quadratic form. [sent-174, score-0.363]
84 The first two interaction components are translation invariant in the spatial range of x_i, x_j ∈ [-2, 2] where the stimulus array appears, in order to accommodate the positional variance in y. [sent-176, score-0.639]
85 The last component is not translation invariant and counters variations in y. [sent-177, score-0.071]
86 5 Discussion The problem of position invariant discrimination is common to many perceptual learning tasks, including hyper-acuity tasks such as the standard line vernier, three-dot vernier, curvature vernier, and orientation vernier tasks (Fahle et al 1995, Fahle 1997). [sent-178, score-1.046]
87 In particular, our mathematical formulation, derivations, and thus conclusions, are general and do not depend on any particular aspect of the bisection task. [sent-180, score-0.172]
88 The positional variable y may not have to correspond to the absolute position of the stimulus array, but merely to the error in the estimation of the absolute position of the stimulus by other neural areas. [sent-182, score-0.737]
89 We also showed that a non-linear recurrent network, which is a close relative of a line attractor network, can perform much better than a pure feedforward network on the bisection task in the face of position variance. [sent-184, score-1.038]
90 There is experimental evidence that lateral connections within V1 change after learning the bisection task (Gilbert 2000), although we have yet to construct an appropriate learning rule. [sent-185, score-0.396]
91 We suggest that learning the recurrent weights for the nonlinear transform corresponds to the slow component in perceptual learning, while learning the feedforward weights corresponds to the fast component. [sent-186, score-1.138]
92 The desired recurrent weights are expected to be much more difficult to learn, in the face of nonlinear transforms and (the easily unstable) recurrent dynamics. [sent-187, score-0.712]
93 Further, the feedforward weights need to be re-adjusted as the recurrent weights change the activities on which they work. [sent-188, score-0.989]
94 The precise recurrent interactions in our network are very specific to the task and its parameters. [sent-189, score-0.422]
95 In particular, the range of the interactions is completely determined by the scale of spacing between stimulus bars; and the distance-dependent excitation and inhibition in the recurrent weights is determined by the nature of the bisection task. [sent-190, score-0.839]
96 This may be why there is little transfer of learning between tasks, when the nature and the spatial scale of the task change, even if the same input units are involved. [sent-191, score-0.243]
97 However, our recurrent interaction model does predict that transfer is likely when the spacing between the two outer bars (here at Δx = 2) changes by a small fraction. [sent-192, score-0.607]
98 Further, since the signs of the recurrent synapses change drastically with the distance between the interacting cells, negative transfer is likely between two bisection tasks of slightly different spatial scales. [sent-193, score-0.656]
99 Achieving selectivity at the same time as translation invariance is a very basic requirement for position-invariant object recognition (see Riesenhuber & Poggio 1999 for a recent discussion), and arises in a pure form in this bisection task. [sent-195, score-0.312]
100 In our case, we have shown, at least for fairly small y, that the optimal non-linearity for the task is a simple quadratic. [sent-198, score-0.071]
wordName wordTfidf (topN-words)
[('discrimination', 0.279), ('recurrent', 0.255), ('fable', 0.249), ('feedforward', 0.246), ('edelman', 0.236), ('fahle', 0.224), ('poggio', 0.195), ('stimulus', 0.184), ('aie', 0.175), ('bisection', 0.172), ('weights', 0.158), ('bars', 0.15), ('vernier', 0.15), ('activities', 0.141), ('array', 0.137), ('perceptual', 0.121), ('ai', 0.111), ('tasks', 0.11), ('positional', 0.107), ('ela', 0.1), ('ni', 0.096), ('poisson', 0.094), ('eg', 0.091), ('ml', 0.088), ('position', 0.087), ('quadratic', 0.078), ('acuity', 0.075), ('discriminator', 0.075), ('wiai', 0.075), ('task', 0.071), ('iii', 0.071), ('tuning', 0.071), ('vi', 0.069), ('outer', 0.067), ('cells', 0.067), ('riesenhuber', 0.064), ('yla', 0.064), ('log', 0.064), ('network', 0.062), ('line', 0.059), ('zhaoping', 0.058), ('pre', 0.058), ('eigenvalues', 0.056), ('orientation', 0.054), ('slow', 0.054), ('interaction', 0.053), ('fukushima', 0.05), ('iogb', 0.05), ('karni', 0.05), ('sagi', 0.05), ('activity', 0.049), ('eye', 0.046), ('transfer', 0.046), ('units', 0.045), ('variance', 0.045), ('face', 0.044), ('absolute', 0.044), ('connections', 0.044), ('weiss', 0.044), ('cortical', 0.043), ('gilbert', 0.043), ('qij', 0.043), ('spatial', 0.042), ('pure', 0.042), ('uniformly', 0.041), ('wi', 0.041), ('xo', 0.04), ('learned', 0.04), ('learning', 0.039), ('trials', 0.039), ('bar', 0.039), ('balanced', 0.039), ('visual', 0.038), ('fast', 0.038), ('xi', 0.038), ('noise', 0.038), ('report', 0.037), ('invariant', 0.037), ('eigenvectors', 0.036), ('positions', 0.036), ('preferred', 0.036), ('ie', 0.036), ('pouget', 0.036), ('ixi', 0.036), ('sompolinsky', 0.036), ('spacing', 0.036), ('gatsby', 0.035), ('translation', 0.034), ('interactions', 0.034), ('selectivity', 0.034), ('jij', 0.034), ('drawing', 0.034), ('figure', 0.033), ('essentially', 0.032), ('involve', 0.032), ('change', 0.031), ('subject', 0.031), ('invariance', 0.03), ('li', 0.03), ('suggest', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 102 nips-2000-Position Variance, Recurrence and Perceptual Learning
Author: Zhaoping Li, Peter Dayan
Abstract: Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. In particular, this applies to many perceptual learning tasks. We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. We show the advantage taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area VI, by demonstrating the superior discrimination performance of a recurrent network. We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks.
2 0.22082956 24 nips-2000-An Information Maximization Approach to Overcomplete and Recurrent Representations
Author: Oren Shriki, Haim Sompolinsky, Daniel D. Lee
Abstract: The principle of maximizing mutual information is applied to learning overcomplete and recurrent representations. The underlying model consists of a network of input units driving a larger number of output units with recurrent interactions. In the limit of zero noise, the network is deterministic and the mutual information can be related to the entropy of the output units. Maximizing this entropy with respect to both the feedforward connections as well as the recurrent interactions results in simple learning rules for both sets of parameters. The conventional independent components (ICA) learning algorithm can be recovered as a special case where there is an equal number of output units and no recurrent connections. The application of these new learning rules is illustrated on a simple two-dimensional input example.
3 0.12600543 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas
Author: Sophie Denève, Jean-René Duhamel, Alexandre Pouget
Abstract: Most models of spatial representations in the cortex assume cells with limited receptive fields that are defined in a particular egocentric frame of reference. However, cells outside of primary sensory cortex are either gain modulated by postural input or partially shifting. We show that solving classical spatial tasks, like sensory prediction, multi-sensory integration, sensory-motor transformation and motor control requires more complicated intermediate representations that are not invariant in one frame of reference. We present an iterative basis function map that performs these spatial tasks optimally with gain modulated and partially shifting units, and tests it against neurophysiological and neuropsychological data. In order to perform an action directed toward an object, it is necessary to have a representation of its spatial location. The brain must be able to use spatial cues coming from different modalities (e.g. vision, audition, touch, proprioception), combine them to infer the position of the object, and compute the appropriate movement. These cues are in different frames of reference corresponding to different sensory or motor modalities. Visual inputs are primarily encoded in retinotopic maps, auditory inputs are encoded in head centered maps and tactile cues are encoded in skin-centered maps. Going from one frame of reference to the other might seem easy. For example, the head-centered position of an object can be approximated by the sum of its retinotopic position and the eye position. However, positions are represented by population codes in the brain, and computing a head-centered map from a retinotopic map is a more complex computation than the underlying sum. Moreover, as we get closer to sensory-motor areas it seems reasonable to assume Spksls 150 100 50 o Figure 1: Response of a VIP cell to visual stimuli appearing in different part of the screen, for three different eye positions. The level of grey represent the frequency of discharge (In spikes per seconds). The white cross is the fixation point (the head is fixed). The cell's receptive field is moving with the eyes, but only partially. Here the receptive field shift is 60% of the total gaze shift. Moreover this cell is gain modulated by eye position (adapted from Duhamel et al). that the representations should be useful for sensory-motor transformations, rather than encode an
4 0.11231387 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
Author: Adrienne L. Fairhall, Geoffrey D. Lewen, William Bialek, Robert R. de Ruyter van Steveninck
Abstract: Many neural systems extend their dynamic range by adaptation. We examine the timescales of adaptation in the context of dynamically modulated rapidly-varying stimuli, and demonstrate in the fly visual system that adaptation to the statistical ensemble of the stimulus dynamically maximizes information transmission about the time-dependent stimulus. Further, while the rate response has long transients, the adaptation takes place on timescales consistent with optimal variance estimation.
5 0.10431254 49 nips-2000-Explaining Away in Weight Space
Author: Peter Dayan, Sham Kakade
Abstract: Explaining away has mostly been considered in terms of inference of states in belief networks. We show how it can also arise in a Bayesian context in inference about the weights governing relationships such as those between stimuli and reinforcers in conditioning experiments such as backward blocking. We show how explaining away in weight space can be accounted for using an extension of a Kalman filter model; provide a new approximate way of looking at the Kalman gain matrix as a whitener for the correlation matrix of the observation process; suggest a network implementation of this whitener using an architecture due to Goodall; and show that the resulting model exhibits backward blocking.
7 0.097014114 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure
8 0.094180502 42 nips-2000-Divisive and Subtractive Mask Effects: Linking Psychophysics and Biophysics
9 0.082683496 34 nips-2000-Competition and Arbors in Ocular Dominance
10 0.08139047 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization
11 0.079077736 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
12 0.077608593 43 nips-2000-Dopamine Bonuses
13 0.076605558 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
14 0.0721955 121 nips-2000-Sparse Kernel Principal Component Analysis
15 0.070303574 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
16 0.06990502 120 nips-2000-Sparse Greedy Gaussian Process Regression
17 0.066525981 101 nips-2000-Place Cells and Spatial Navigation Based on 2D Visual Feature Extraction, Path Integration, and Reinforcement Learning
18 0.06606324 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks
19 0.06373547 129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account
20 0.062577792 74 nips-2000-Kernel Expansions with Unlabeled Examples
topicId topicWeight
[(0, 0.235), (1, -0.129), (2, -0.087), (3, -0.008), (4, 0.028), (5, 0.077), (6, 0.044), (7, -0.072), (8, 0.172), (9, -0.089), (10, -0.05), (11, -0.036), (12, 0.018), (13, 0.074), (14, 0.166), (15, -0.088), (16, -0.051), (17, -0.054), (18, 0.173), (19, 0.15), (20, 0.175), (21, 0.118), (22, 0.169), (23, -0.006), (24, 0.07), (25, 0.181), (26, -0.014), (27, 0.069), (28, -0.056), (29, -0.153), (30, 0.052), (31, 0.041), (32, 0.169), (33, -0.131), (34, 0.039), (35, -0.21), (36, -0.066), (37, -0.036), (38, 0.009), (39, 0.11), (40, 0.09), (41, -0.115), (42, -0.014), (43, 0.037), (44, 0.027), (45, -0.165), (46, 0.054), (47, 0.075), (48, 0.042), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.96554488 102 nips-2000-Position Variance, Recurrence and Perceptual Learning
Author: Zhaoping Li, Peter Dayan
Abstract: Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. In particular, this applies to many perceptual learning tasks. We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. We show the advantage taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area VI, by demonstrating the superior discrimination performance of a recurrent network. We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks.
2 0.74072421 24 nips-2000-An Information Maximization Approach to Overcomplete and Recurrent Representations
Author: Oren Shriki, Haim Sompolinsky, Daniel D. Lee
Abstract: The principle of maximizing mutual information is applied to learning overcomplete and recurrent representations. The underlying model consists of a network of input units driving a larger number of output units with recurrent interactions. In the limit of zero noise, the network is deterministic and the mutual information can be related to the entropy of the output units. Maximizing this entropy with respect to both the feedforward connections as well as the recurrent interactions results in simple learning rules for both sets of parameters. The conventional independent components (ICA) learning algorithm can be recovered as a special case where there is an equal number of output units and no recurrent connections. The application of these new learning rules is illustrated on a simple two-dimensional input example.
3 0.47579613 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization
Author: Ranit Aharonov-Barki, Isaac Meilijson, Eytan Ruppin
Abstract: We introduce a novel algorithm, termed PPA (Performance Prediction Algorithm), that quantitatively measures the contributions of elements of a neural system to the tasks it performs. The algorithm identifies the neurons or areas which participate in a cognitive or behavioral task, given data about performance decrease in a small set of lesions. It also allows the accurate prediction of performances due to multi-element lesions. The effectiveness of the new algorithm is demonstrated in two models of recurrent neural networks with complex interactions among the elements. The algorithm is scalable and applicable to the analysis of large neural networks. Given the recent advances in reversible inactivation techniques, it has the potential to significantly contribute to the understanding of the organization of biological nervous systems, and to shed light on the long-lasting debate about local versus distributed computation in the brain.
4 0.4407891 34 nips-2000-Competition and Arbors in Ocular Dominance
Author: Peter Dayan
Abstract: Hebbian and competitive Hebbian algorithms are almost ubiquitous in modeling pattern formation in cortical development. We analyse in theoretical detail a particular model (adapted from Piepenbrock & Obermayer, 1999) for the development of Id stripe-like patterns, which places competitive and interactive cortical influences, and free and restricted initial arborisation onto a common footing.
5 0.42216343 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas
Author: Sophie Denève, Jean-René Duhamel, Alexandre Pouget
Abstract: Most models of spatial representations in the cortex assume cells with limited receptive fields that are defined in a particular egocentric frame of reference. However, cells outside of primary sensory cortex are either gain modulated by postural input or partially shifting. We show that solving classical spatial tasks, like sensory prediction, multi-sensory integration, sensory-motor transformation and motor control requires more complicated intermediate representations that are not invariant in one frame of reference. We present an iterative basis function map that performs these spatial tasks optimally with gain modulated and partially shifting units, and tests it against neurophysiological and neuropsychological data. In order to perform an action directed toward an object, it is necessary to have a representation of its spatial location. The brain must be able to use spatial cues coming from different modalities (e.g. vision, audition, touch, proprioception), combine them to infer the position of the object, and compute the appropriate movement. These cues are in different frames of reference corresponding to different sensory or motor modalities. Visual inputs are primarily encoded in retinotopic maps, auditory inputs are encoded in head centered maps and tactile cues are encoded in skin-centered maps. Going from one frame of reference to the other might seem easy. For example, the head-centered position of an object can be approximated by the sum of its retinotopic position and the eye position. However, positions are represented by population codes in the brain, and computing a head-centered map from a retinotopic map is a more complex computation than the underlying sum. Moreover, as we get closer to sensory-motor areas it seems reasonable to assume Spksls 150 100 50 o Figure 1: Response of a VIP cell to visual stimuli appearing in different part of the screen, for three different eye positions. The level of grey represent the frequency of discharge (In spikes per seconds). The white cross is the fixation point (the head is fixed). The cell's receptive field is moving with the eyes, but only partially. Here the receptive field shift is 60% of the total gaze shift. Moreover this cell is gain modulated by eye position (adapted from Duhamel et al). that the representations should be useful for sensory-motor transformations, rather than encode an
6 0.40056282 49 nips-2000-Explaining Away in Weight Space
8 0.34547347 43 nips-2000-Dopamine Bonuses
9 0.33085427 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
10 0.31328252 42 nips-2000-Divisive and Subtractive Mask Effects: Linking Psychophysics and Biophysics
11 0.27753845 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure
12 0.27240658 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
13 0.25866491 129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account
15 0.25652081 56 nips-2000-Foundations for a Circuit Complexity Theory of Sensory Processing
16 0.23618869 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
17 0.22416641 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
18 0.22084317 30 nips-2000-Bayesian Video Shot Segmentation
19 0.22072823 20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
20 0.21813278 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
topicId topicWeight
[(10, 0.273), (17, 0.125), (32, 0.016), (33, 0.036), (38, 0.016), (42, 0.035), (55, 0.038), (62, 0.02), (65, 0.02), (67, 0.089), (76, 0.07), (79, 0.02), (81, 0.038), (90, 0.033), (91, 0.013), (97, 0.032)]
simIndex simValue paperId paperTitle
1 0.95762867 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
Author: Dirk Ormoneit, Peter W. Glynn
Abstract: Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. By contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application to the optimal portfolio choice problem. 1
2 0.94395542 144 nips-2000-Vicinal Risk Minimization
Author: Olivier Chapelle, Jason Weston, Léon Bottou, Vladimir Vapnik
Abstract: The Vicinal Risk Minimization principle establishes a bridge between generative models and methods derived from the Structural Risk Minimization Principle such as Support Vector Machines or Statistical Regularization. We explain how VRM provides a framework which integrates a number of existing algorithms, such as Parzen windows, Support Vector Machines, Ridge Regression, Constrained Logistic Classifiers and Tangent-Prop. We then show how the approach implies new algorithms for solving problems usually associated with generative models. New algorithms are described for dealing with pattern recognition problems with very different pattern distributions and dealing with unlabeled data. Preliminary empirical results are presented.
same-paper 3 0.93417859 102 nips-2000-Position Variance, Recurrence and Perceptual Learning
Author: Zhaoping Li, Peter Dayan
Abstract: Stimulus arrays are inevitably presented at different positions on the retina in visual tasks, even those that nominally require fixation. In particular, this applies to many perceptual learning tasks. We show that perceptual inference or discrimination in the face of positional variance has a structurally different quality from inference about fixed position stimuli, involving a particular, quadratic, non-linearity rather than a purely linear discrimination. We show the advantage taking this non-linearity into account has for discrimination, and suggest it as a role for recurrent connections in area VI, by demonstrating the superior discrimination performance of a recurrent network. We propose that learning the feedforward and recurrent neural connections for these tasks corresponds to the fast and slow components of learning observed in perceptual learning tasks.
4 0.829036 9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
Author: Ralf Herbrich, Thore Graepel
Abstract: We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC- Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend to use SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets. 1
5 0.66077828 49 nips-2000-Explaining Away in Weight Space
Author: Peter Dayan, Sham Kakade
Abstract: Explaining away has mostly been considered in terms of inference of states in belief networks. We show how it can also arise in a Bayesian context in inference about the weights governing relationships such as those between stimuli and reinforcers in conditioning experiments such as backward blocking. We show how explaining away in weight space can be accounted for using an extension of a Kalman filter model; provide a new approximate way of looking at the Kalman gain matrix as a whitener for the correlation matrix of the observation process; suggest a network implementation of this whitener using an architecture due to Goodall; and show that the resulting model exhibits backward blocking.
6 0.65146798 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
7 0.6509493 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
8 0.63488942 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
9 0.63148785 37 nips-2000-Convergence of Large Margin Separable Linear Classification
10 0.62715811 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization
11 0.6270535 133 nips-2000-The Kernel Gibbs Sampler
12 0.62614274 21 nips-2000-Algorithmic Stability and Generalization Performance
13 0.62514824 122 nips-2000-Sparse Representation for Gaussian Process Models
14 0.62260449 74 nips-2000-Kernel Expansions with Unlabeled Examples
15 0.61942434 130 nips-2000-Text Classification using String Kernels
16 0.61930275 4 nips-2000-A Linear Programming Approach to Novelty Detection
17 0.61858058 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
18 0.61732864 111 nips-2000-Regularized Winnow Methods
19 0.61707002 75 nips-2000-Large Scale Bayes Point Machines
20 0.61642826 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications