nips nips2012 nips2012-113 knowledge-graph by maker-knowledge-mining

113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Source: pdf

Author: Brett Vintch, Andrew D. Zaharia, J. Movshon, Eero P. Simoncelli

Abstract: Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. These filters cannot be estimated using spike-triggered averaging (STA). Subspace methods such as spike-triggered covariance (STC) can recover multiple filters, but require substantial amounts of data, and recover an orthogonal basis for the subspace in which the filters reside rather than the filters themselves. Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the first linear stage is a set of shifted ('convolutional') copies of a common filter, and the first nonlinear stage consists of rectifying scalar nonlinearities that are identical for all filter outputs. We refer to these initial LN elements as the 'subunits' of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data, and apply it to both simulated and real neuronal data from primate V1. The subunit model significantly outperforms STA and STC in terms of cross-validated accuracy and efficiency.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Efﬁcient and direct estimation of a neural subunit model for sensory coding Brett Vintch Andrew D. [sent-3, score-0.861]

2 Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. [sent-8, score-0.549]

3 Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the ﬁrst linear stage is a set of shifted (‘convolutional’) copies of a common ﬁlter, and the ﬁrst nonlinear stage consists of rectifying scalar nonlinearities that are identical for all ﬁlter outputs. [sent-11, score-0.341]

4 The second linear stage then computes a weighted sum of the responses of the rectiﬁed subunits. [sent-13, score-0.202]

5 The subunit model signiﬁcantly outperforms STA and STC in terms of cross-validated accuracy and efﬁciency. [sent-15, score-0.829]

6 The most common models in the visual and auditory literature are based on linear-nonlinear (LN) cascades, in which a linear stage serves to project the high-dimensional stimulus down to a one-dimensional signal, where it is then nonlinearly transformed to drive spiking. [sent-18, score-0.286]

7 For many visual and auditory neurons, responses are not well described by projection onto a single linear ﬁlter, but instead reﬂect a combination of several ﬁlters. [sent-21, score-0.249]

8 In the cat retina, the responses of Y cells have been described by linear pooling of shifted rectiﬁed linear ﬁlters, dubbed “subunits” [1, 2]. [sent-22, score-0.454]

9 In the auditory nerve, responses are described as computing the envelope of the temporally ﬁltered sound waveform, which can be computed via summation of squared quadrature ﬁlter responses [5]. [sent-24, score-0.397]

10 In primary visual cortex (V1), simple cells are well described using LN models [6, 7], but complex cell responses are more like a superposition of multiple spatially shifted simple cells [8], each with the same orientation and spatial frequency preference [9]. [sent-25, score-0.726]

11 Although the description of complex cells is often reduced to a sum of two squared filters in quadrature [10], more recent experiments indicate that these cells (and indeed most 'simple' cells) require multiple shifted filters to fully capture their responses [11, 12, 13]. [sent-26, score-0.527]

12 Intermediate nonlinearities are also required to describe the response properties of some neurons in V2 to stimuli (e.g. [sent-27, score-0.221]

13 The second linear stage then pools the responses of these “subunits” using a weighted sum, and the ﬁnal nonlinearity converts this to a ﬁring rate. [sent-31, score-0.309]

14 In a subunit model, the initial linear stage projects the stimulus into a multidimensional subspace, which can be estimated using spike-triggered covariance (STC) [20, 21]. [sent-36, score-0.999]

15 But this method relies on a Gaussian stimulus ensemble, requires a substantial amount of data, and recovers only a set of orthogonal axes for the response subspace—not the underlying biological ﬁlters. [sent-38, score-0.25]

16 More general methods based on information maximization alleviate some of the stimulus restrictions [25] but strongly limit the dimensionality of the recoverable subspace and still produce only a basis for the subspace. [sent-39, score-0.195]

17 Here, we develop a speciﬁc subunit model and a maximum likelihood procedure to estimate its parameters from spiking data. [sent-40, score-0.863]

18 Section 2, Subunit model: We assume that neural responses arise from a weighted sum of the responses of a set of nonlinear subunits. [sent-42, score-0.361]

19 Each subunit applies a linear ﬁlter to its input (which can be either the raw stimulus, or the responses arising from a previous stage in a hierarchical cascade), and transforms the ﬁltered response using a memoryless rectifying nonlinearity. [sent-43, score-1.111]

20 A critical simpliﬁcation is that the subunit ﬁlters are related by a ﬁxed transformation; here, we assume they are spatially translated copies of a common ﬁlter, and thus the population of subunits can be viewed as computing a convolution. [sent-44, score-0.99]

21 For example, the subunits of a V1 complex cell could be simple cells in V1 that share the same orientation and spatial frequency preference, but differ in spatial location, as originally proposed by Hubel & Wiesel [8, 9]. [sent-45, score-0.457]

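22 $r(t) = \sum_{m,n} w_{m,n}\, f_\Theta\big(\sum_{i,j,\tau} k(i,j,\tau)\, x(m-i, n-j, t-\tau)\big) + \ldots + b$ (1), where k is the subunit filter, $f_\Theta$ is a point-wise function parameterized by the vector $\Theta$, $w_{m,n}$ are the spatial weights, and b is an additive baseline. [sent-50, score-0.855]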

23 The ellipsis indicates that we allow for multiple subunit channels, each with its own ﬁlter, nonlinearity, and pooling weights. [sent-51, score-0.876]
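To make Eq. (1) concrete, here is a minimal NumPy sketch of a single subunit channel. It assumes a 'valid' convolution (only subunit positions where the whole kernel fits inside the stimulus), leaves out the additional channels represented by the ellipsis, and uses hypothetical function names; it illustrates the equation and is not the authors' code.

```python
import numpy as np

def subunit_drive(x, k):
    """Linear subunit responses g[m, n, t] = sum_{i,j,tau} k[i, j, tau] * x[m-i, n-j, t-tau].

    x has shape (H, W, T) (space x space x time); k has shape (I, J, Tau).
    Only 'valid' indices are kept (m >= I-1, n >= J-1, t >= Tau-1), so the
    result has shape (H-I+1, W-J+1, T-Tau+1).
    """
    H, W, T = x.shape
    I, J, Tau = k.shape
    g = np.zeros((H - I + 1, W - J + 1, T - Tau + 1))
    for i in range(I):
        for j in range(J):
            for tau in range(Tau):
                # Shifted view of x holding x[m-i, n-j, t-tau] for every valid (m, n, t).
                g += k[i, j, tau] * x[I - 1 - i:H - i, J - 1 - j:W - j, Tau - 1 - tau:T - tau]
    return g

def subunit_response(x, k, w, f, b):
    """One-channel version of Eq. (1): r(t) = sum_{m,n} w[m, n] * f(g[m, n, t]) + b."""
    g = subunit_drive(x, k)
    # Contract the two subunit-position axes of f(g) against the pooling weights w.
    return np.tensordot(w, f(g), axes=([0, 1], [0, 1])) + b
```

For example, passing `f = lambda u: np.maximum(u, 0.0)` gives a half-wave-rectified channel. The explicit loop over kernel taps keeps the indexing identical to Eq. (1); a faster version would use an FFT-based or library convolution.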

24 Figure 1 demonstrates what happens when STC is applied to a simulated complex cell with 15 spatially shifted subunits. [sent-60, score-0.277]

25 The response of this model cell is … Figure 1: Spike-triggered covariance analysis of a simulated V1 complex cell. [Figure residue: panels a) to c); labels include position envelope, eigenvalues, eigenvectors, STC plane, shifted filter manifold, and 10^4 vs. 10^6 data points.] [sent-61, score-0.364]

26 (a) The model output is formed by summing the rectiﬁed responses of multiple linear ﬁlter kernels which are shifted and scaled copies of a canonical form. [sent-62, score-0.308]

27 (b) The shifted ﬁlters lie along a manifold in stimulus space (four shown), and are not mutually orthogonal in general. [sent-63, score-0.244]

28 STC recovers an orthogonal basis for a low-dimensional subspace that contains this manifold by ﬁnding the directions in stimulus space along which spikes are elicited or suppressed. [sent-64, score-0.233]

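29 $\hat r(t) = \sum_i w_i \lfloor \vec k_i \cdot \vec x(t) \rfloor^2$, where the $\vec k_i$ are shifted filters, $w$ weights the filters by position, $\vec x$ is the stimulus vector, and $\lfloor \cdot \rfloor$ denotes half-wave rectification. [sent-70, score-0.225]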

30 The recovered STC axes span the same subspace as the shifted model ﬁlters, but there are fewer of them, and the enforced orthogonality of eigenvectors means that they are generally not a direct match to any of the model ﬁlters. [sent-71, score-0.218]
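The point can be checked numerically on a toy problem much smaller than the 15-subunit cell of Figure 1. The sketch below assumes three overlapping, shifted copies of a 2-tap filter, half-squared and summed with equal weights, driven by Gaussian white noise; the dimensions and sample size are illustrative choices, and the expected rate stands in for sampled spikes.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 8, 200_000

# Three shifted copies of a common 2-tap filter; neighbors overlap (k_1 . k_2 = 0.5).
K = np.zeros((3, D))
for s in range(3):
    K[s, s:s + 2] = 1.0 / np.sqrt(2.0)

x = rng.standard_normal((N, D))                       # Gaussian white-noise stimuli
rate = (np.maximum(x @ K.T, 0.0) ** 2).sum(axis=1)    # r(t) = sum_i floor(k_i . x(t))^2

# Spike-triggered covariance, using the expected rate as the weight of each stimulus.
mu = rate @ x / rate.sum()
xc = x - mu
C_spike = (xc.T * rate) @ xc / rate.sum()
vals, vecs = np.linalg.eigh(C_spike - np.eye(D))      # change relative to the prior (identity)

order = np.argsort(-np.abs(vals))                     # most significant axes first
stc_axes = vecs[:, order]                             # orthonormal columns
```

The resulting axes are orthonormal by construction and the most significant one lies within the filters' span (the first four dimensions), but because neighboring filters overlap, no orthonormal set of axes can coincide one-to-one with them.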

31 Although one may follow the STC analysis by indirectly identifying a localized ﬁlter whose shifted copies span the recovered subspace [11, 13], the reliance on STC still imposes the stimulus limitations and data requirements mentioned above. [sent-73, score-0.342]

32 Section 3, Direct subunit model estimation: A generic subspace method like STC does not exploit the specific structure of the subunit model. [sent-74, score-1.68]

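33 Writing the subunit nonlinearity in Eq. (1) as a weighted sum of tent functions, $f_\Theta(\cdot) = \sum_l \alpha_l T_l(\cdot)$, allows us to fold the second linear pooling stage and the subunit nonlinearity into a single sum: $\hat r(t) = \sum_{m,n,l} w_{m,n}\, \alpha_l\, T_l\big(\sum_{i,j,\tau} k(i,j,\tau)\, x(m-i, n-j, t-\tau)\big) + \ldots + b$. [sent-78, score-1.011]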

34 In the ﬁrst, the stimulus is convolved with k, and in the second, the nonlinear responses are summed with a set of weights that are separable in the indices l and n, m. [sent-83, score-0.329]
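The tent-function expansion behind this folding can be sketched as follows. Evenly spaced knots are an assumption here (the knot range and spacing used in the paper are not stated), and the function names are hypothetical. Because the output is linear in the coefficients α_l, changing the nonlinearity only changes a weight vector, which is what lets the weights separate into l and (m, n) factors.

```python
import numpy as np

def tent_basis(u, centers):
    """Evaluate tent functions T_l(u) centered on evenly spaced `centers`.

    Returns an array of shape (len(u), len(centers)); row t holds T_l(u[t]) for all l.
    """
    delta = centers[1] - centers[0]                    # common knot spacing
    return np.maximum(0.0, 1.0 - np.abs(u[:, None] - centers[None, :]) / delta)

def tent_nonlinearity(u, alpha, centers):
    """f(u) = sum_l alpha_l T_l(u): a piecewise-linear function with knots at `centers`."""
    return tent_basis(u, centers) @ alpha
```

Setting each α_l to a target nonlinearity evaluated at its knot gives the piecewise-linear interpolant of that nonlinearity; with a knot at zero, half-wave rectification is reproduced exactly.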

35 For models that include two subunit channels we optimize over both channels simultaneously (see section 3. [sent-86, score-0.885]

36 Section 3.1, Estimating the convolutional subunit kernel: The first coordinate-descent leg optimizes the convolutional subunit kernel, k, using gradient descent while fixing the subunit nonlinearity and the final linear pooling. [sent-89, score-2.658]

37 This property also ensures that the descent is locally convex: assuming that updating k does not cause any of the linear subunit responses to jump between the localized tent functions representing f, the optimization is linear and the objective function is quadratic. [sent-91, score-1.061]

38 In practice, the full gradient descent path causes the linear subunit responses to move slowly across bins of the piecewise nonlinearity. [sent-92, score-0.979]
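A sketch of this first leg follows. It assumes a squared-error objective for simplicity (the paper fits by maximum likelihood, whose exact form is not reproduced here), represents each subunit's local stimulus patch as a flattened vector, and uses hypothetical names. Within a tent bin the model output is linear in k, so the loss is exactly quadratic there and the analytic gradient can be checked against a central finite difference.

```python
import numpy as np

def tent_value_and_slope(g, alpha, centers):
    """f(g) = sum_l alpha_l T_l(g) (piecewise-linear interpolation) and its slope f'(g).

    np.interp holds f constant outside [centers[0], centers[-1]], so f' = 0 there.
    """
    f = np.interp(g, centers, alpha)
    seg = np.clip(np.searchsorted(centers, g) - 1, 0, len(centers) - 2)
    slope = (alpha[seg + 1] - alpha[seg]) / (centers[seg + 1] - centers[seg])
    inside = (g >= centers[0]) & (g <= centers[-1])
    return f, np.where(inside, slope, 0.0)

def kernel_loss_and_grad(k, X, y, w, alpha, centers, b):
    """Squared-error loss 0.5 * sum_t (r(t) - y(t))^2 and its gradient w.r.t. k.

    X: local stimulus patches, shape (P, T, D) for P subunit positions;
    k: flattened kernel, shape (D,); w: pooling weights, shape (P,).
    The nonlinearity (alpha) and pooling (w) are held fixed, as in the first leg.
    """
    g = X @ k                                  # subunit drives, shape (P, T)
    f, fprime = tent_value_and_slope(g, alpha, centers)
    r = w @ f + b                              # model output, shape (T,)
    resid = r - y
    # d r(t) / d k = sum_p w_p f'(g_p(t)) X_p(t); then apply the chain rule.
    grad = np.einsum('t,p,pt,ptd->d', resid, w, fprime, X)
    return 0.5 * np.sum(resid ** 2), grad
```

A plain descent step is then `k = k - step * grad` for a small, hypothetical `step`.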

39 Section 3.2, Estimating the subunit nonlinearities and linear subunit pooling: The second leg of coordinate descent optimizes the subunit nonlinearity (more specifically, the weights on the tent functions, $\alpha_l$) and the subunit pooling, $w_{m,n}$. [sent-95, score-3.503]
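With k fixed, the model output is bilinear in the tent weights α_l and the pooling weights w_{m,n}, so one simple way to carry out this leg is alternating least squares. This is a hedged sketch under a squared-error objective, with subunit positions flattened to a single index p, no baseline, and hypothetical names; the text does not say which solver the authors used for this leg.

```python
import numpy as np

def fit_alpha_and_w(B, y, w0, alpha0, n_iter=20):
    """Alternating least squares for r(t) = sum_{p,l} w_p * alpha_l * B[t, p, l].

    B[t, p, l] = T_l(g_p(t)) is the tent-basis expansion of the subunit drives,
    computed with the kernel k held fixed. Each half-step solves an ordinary
    least-squares problem exactly, so the squared error never increases.
    """
    w, alpha = w0.copy(), alpha0.copy()
    losses = [np.sum((np.einsum('tpl,p,l->t', B, w, alpha) - y) ** 2)]
    for _ in range(n_iter):
        Xa = np.einsum('tpl,p->tl', B, w)          # design matrix for alpha (w fixed)
        alpha = np.linalg.lstsq(Xa, y, rcond=None)[0]
        Xw = np.einsum('tpl,l->tp', B, alpha)      # design matrix for w (alpha fixed)
        w = np.linalg.lstsq(Xw, y, rcond=None)[0]
        losses.append(np.sum((Xw @ w - y) ** 2))
    return w, alpha, losses
```

The product w_p·α_l is only identified up to a common scale factor, which does not affect the fitted response.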

40 We found that initializing our two-channel subunit model to have a positive pooling function for one channel and a negative pooling function for the second channel allowed the optimization of the second channel to proceed much more quickly. [sent-105, score-1.175]

41 This is probably due in part to a suppressive channel that is much weaker than the excitatory channel in general. [sent-106, score-0.318]

42 We initialized the nonlinearity to half-wave rectification for the excitatory channel and full-wave rectification for the suppressive channel. [sent-107, score-0.34]
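The initialization described above can be written in a few lines, assuming the tent parameterization of the nonlinearity, so that each coefficient is the target nonlinearity evaluated at a knot. Uniform pooling magnitudes of ±1 are a hypothetical choice; only their signs come from the text.

```python
import numpy as np

def initial_channel_params(centers, positions_shape):
    """Hypothetical initializer for a two-channel subunit model.

    Excitatory channel: half-wave rectifying nonlinearity, positive pooling.
    Suppressive channel: full-wave rectifying nonlinearity, negative pooling.
    """
    alpha_exc = np.maximum(centers, 0.0)     # half-wave rectification sampled at knots
    alpha_sup = np.abs(centers)              # full-wave rectification sampled at knots
    w_exc = np.ones(positions_shape)         # positive pooling function
    w_sup = -np.ones(positions_shape)        # negative pooling function
    return (alpha_exc, w_exc), (alpha_sup, w_sup)
```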

43 The subunit model describes a receptive ﬁeld as the linear combination of nonlinear kernel responses that spatially tile the stimulus. [sent-109, score-1.132]

44 Thus, the contribution of each localized patch of stimulus (of a size equal to the subunit kernel) is the same, up to a scale factor set by the weighting used in the subsequent pooling stage. [sent-110, score-1.065]

45 For each subunit location {m, n}, we extract the local stimulus values in a window $g_{m,n}(i, j)$ the size of the convolutional kernel, and append them vertically to form a 'local' stimulus matrix. [sent-112, score-1.162]

46 We also generate a vector containing the vertical concatenation of copies of the measured spike train (one copy for each subunit location). [sent-114, score-0.904]

47 After performing STC analysis on the localized stimulus matrix, we use the ﬁrst (largest variance) eigenvector to initialize the subunit kernel of the excitatory channel, and the last (lowest variance) eigenvector to initialize the kernel of the suppressive channel. [sent-124, score-1.21]
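A sketch of this initialization step, assuming a stimulus array of shape (H, W, T), 'valid' windows, and patches taken in unflipped (correlation) orientation. The STC step uses eigenvectors of the spike-triggered covariance itself, which matches the "largest/lowest variance" wording for white-noise stimuli; for correlated stimuli one would typically subtract or whiten by the raw stimulus covariance. Function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_stimulus_matrix(x, kernel_shape, spikes):
    """Stack every kernel-sized window of x as a row and replicate the spike train to match.

    x: stimulus, shape (H, W, T); kernel_shape: (I, J, Tau); spikes: counts, shape (T,).
    Rows are ordered location-major, so each subunit location contributes one
    block of T-Tau+1 consecutive time steps (the 'vertical' concatenation).
    """
    I, J, Tau = kernel_shape
    win = sliding_window_view(x, kernel_shape)        # (M, N, T-Tau+1, I, J, Tau)
    M, N, Tp = win.shape[:3]
    X = win.reshape(M * N * Tp, I * J * Tau)
    # A window starting at frame t ends at frame t+Tau-1, so spikes are aligned from Tau-1 on.
    s = np.tile(spikes[Tau - 1:], M * N)
    return X, s

def stc_init(X, s, kernel_shape):
    """Initial kernels from the largest- and lowest-variance STC eigenvectors."""
    mu = s @ X / s.sum()
    Xc = X - mu
    C_spike = (Xc.T * s) @ Xc / s.sum()               # spike-triggered covariance
    vals, vecs = np.linalg.eigh(C_spike)              # eigenvalues in ascending order
    k_exc = vecs[:, -1].reshape(kernel_shape)         # largest variance: excitatory channel
    k_sup = vecs[:, 0].reshape(kernel_shape)          # lowest variance: suppressive channel
    return k_exc, k_sup
```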

48 [Figure 2 residue: panel a) Simulated simple cell, panel b) Simulated complex cell; y-axis: model performance (r), train and test; x-axis: minutes of simulated data, from 5/60 to 40.] [sent-126, score-0.198]

49 Figure 2: Model fitting performance for simulated V1 neurons (subunit model vs. Rust-STC model). [sent-132, score-1.027]

50 Shown are correlation coefﬁcients for the subunit model (black circles) and the Rust-STC model (blue squares) [11], computed on both the training data (open), and on a holdout test set (closed). [sent-133, score-0.853]

51 Insets show estimated filters for the subunit (top) and Rust-STC (bottom) models with ten seconds (400 frames; left) and 20 minutes (48,000 frames; right) of data. [sent-140, score-0.805]

52 Section 4, Experiments: We fit the subunit model to physiological data sets in 3 different primate cortical areas: V1, V2, and MT. [sent-141, score-0.87]

53 Initially, we use simulated V1 cells to compare the performance of the subunit model to that of the Rust-STC model [11], which is based upon STC analysis. [sent-143, score-1.022]

54 Section 4.1, Simulated V1 data: We simulated the responses of canonical V1 simple cells and complex cells in response to white noise stimuli. [sent-145, score-0.561]

55 The simulated cells use spatiotemporally oriented Gabor ﬁlters: The simple cell has one even-phase ﬁlter and a half-squaring output nonlinearity while the complex cell has two ﬁlters (one even and one odd) whose squared responses are combined to give a ﬁring rate. [sent-147, score-0.633]
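A minimal sketch of the two canonical simulated cells, reduced to one spatial dimension with no temporal dimension (an assumption made for brevity; the simulations in the text are spatiotemporal). The Gabor width, wavelength, and grating phases are illustrative. Driving both cells with gratings of different phases shows the phase sensitivity of the half-squared simple cell and the phase invariance of the quadrature-pair complex cell.

```python
import numpy as np

pos = np.arange(-32, 33)                          # 1-D spatial positions (pixels)
sigma, wavelength = 4.0, 8.0                      # illustrative envelope width and period
omega = 2 * np.pi / wavelength
envelope = np.exp(-pos ** 2 / (2 * sigma ** 2))
even = envelope * np.cos(omega * pos)             # even-phase Gabor
odd = envelope * np.sin(omega * pos)              # odd-phase (quadrature) Gabor

def simple_cell(x):
    """One even Gabor followed by a half-squaring output nonlinearity."""
    return np.maximum(x @ even, 0.0) ** 2

def complex_cell(x):
    """Squared even and odd Gabor responses, summed (energy model)."""
    return (x @ even) ** 2 + (x @ odd) ** 2

# Gratings at the preferred frequency with 16 different spatial phases, one per row.
phases = np.linspace(0, 2 * np.pi, 16, endpoint=False)
gratings = np.cos(omega * pos[None, :] + phases[:, None])
```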

56 For consistency with the analysis of the physiological data, we ﬁt the simulated data using a subunit model with two subunit channels (even though the simulated cells only possess an excitatory channel). [sent-151, score-2.017]

57 Brieﬂy, after the STA and STC ﬁlters are estimated, they are weighted according to their predictive power and combined in excitatory and suppressive pools, E and S (we use cross-validation to determine the number of ﬁlters to use for each pool). [sent-153, score-0.182]

58 But as the data set increases in size, the subunit model rapidly improves, reaching near-perfect performance for modest spike counts. [sent-158, score-0.897]

59 The Rust-STC model also improves, but much more slowly; it requires more than an order of magnitude more data to achieve the same performance as the subunit model. [sent-159, score-0.829]

60 This … [Figure 3 residue, panels a) to c): convolutional subunit filters (0 to 175 ms), nonlinearity, and position map for the excitatory and suppressive channels; spike raster (trials); firing rate (ips) versus orientation (0 to 270) for measured, subunit, and Rust responses.] [sent-160, score-0.824]

61 Figure 3: Two-channel subunit model fit to physiological data from a macaque V1 cell. [sent-161, score-1.727]

62 (a) Fitted parameters for the excitatory (top row) and suppressive (bottom row) channels, including the space-time subunit filters (8 grayscale images, corresponding to different time frames), the nonlinearity, and the spatial weighting function $w_{m,n}$ that is used to combine the subunit responses. [sent-162, score-1.842]

63 (b) A raster showing spiking responses to 20 repeated presentations of an identical stimulus with the average spike count (black) and model prediction (blue) plotted above. [sent-163, score-0.49]

64 We conclude that directly ﬁtting the subunit model is much more efﬁcient in the use of data than using STC to estimate a subspace model. [sent-166, score-0.875]

65 Section 4.2, Physiological data from macaque V1: We presented spatio-temporal pixel noise to 38 cells recorded from V1 in anesthetized macaques (see [11] for details of experimental design). [sent-168, score-0.182]

66 The stimulus was a 16x16 grid with luminance values set by independent ternary white noise sequences refreshed at 40 Hz. [sent-169, score-0.209]

67 For 21 neurons we also presented 20 repeats of a sequence of 1000 stimulus frames as a validation set. [sent-170, score-0.274]

68 The model filters were assumed to respond over a 200 ms (8-frame) causal time window in which the stimulus most strongly affected the firing of the neurons; thus, model responses were derived from a stimulus vector with 2048 dimensions (16x16x8). [sent-171, score-0.503]
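The stimulus bookkeeping can be sketched as follows. The ternary levels (−1, 0, +1), the 400-frame length, and the variable names are assumptions; the 16x16 grid, 40 Hz refresh, and 8-frame window come from the text. At 40 Hz each frame lasts 25 ms, so 8 frames span 200 ms, and each stimulus vector has 16 × 16 × 8 = 2048 entries.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(5)
frame_rate_hz, n_lags, grid = 40, 8, 16
n_frames = 400                                    # 10 s of stimulus at 40 Hz (illustrative)

# Independent ternary white noise on a 16x16 grid, one frame per index along axis 0.
frames = rng.choice([-1, 0, 1], size=(n_frames, grid, grid))

# Each stimulus vector stacks one frame with the 7 frames that precede it.
windows = sliding_window_view(frames, n_lags, axis=0)   # (n_frames-7, 16, 16, 8)
stim = windows.reshape(windows.shape[0], -1)             # (n_frames-7, 2048)

window_ms = 1000 * n_lags / frame_rate_hz                # 8 frames * 25 ms
```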

69 Figure 3 shows the ﬁt of a 2-channel subunit model to data from a typical V1 cell. [sent-172, score-0.829]

70 Figure 3a illustrates the subunit kernels and their associated nonlinearities and spatial pooling maps, for both the excitatory channel (top row) and the suppressive channel (bottom row). [sent-173, score-1.289]

71 First, the model shows a symmetric, full-wave rectifying nonlinearity for the excitatory channel. [sent-176, score-0.241]

72 Second, the ﬁnal linear pooling for this channel is diffuse over space, eliciting a response that is invariant to the exact spatial position and phase of the stimulus. [sent-177, score-0.241]

73 A raster of spiking responses to twenty repetitions of a 5 s stimulus is depicted in Fig. 3b. [sent-181, score-0.357]

74 The subunit model acceptably ﬁts most of the cells we recorded in V1. [sent-185, score-0.959]

75 The ﬁtted subunit model also signiﬁcantly outperforms the Rust-STC model in terms of predicting responses to novel data. [sent-190, score-1.01]

76 Figure 4a shows the performance of the Rust-STC and subunit models for 21 V1 neurons, for both training data and test data on single trials. [sent-191, score-0.805]

77 On the training data, the Rust-STC model performs significantly better than the subunit model (Figure 4a; ⟨r_Rust⟩ = 0.… [sent-220, score-0.853]

78 For test data (which was not included in the data used to fit the models), the subunit model exhibits significantly better performance than the Rust-STC model (⟨r_Rust⟩ = 0.… [sent-225, score-0.853]

79 For the same stimulus, a subunit model with two channels and an 8x8x8 subunit kernel has only about 1200 parameters. [sent-231, score-1.691]
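One possible tally that lands near the quoted figure is sketched below. The breakdown is an assumption, not the authors' accounting: it takes 'valid' subunit positions ((16 − 8 + 1)² = 81 per channel), one pooling weight per position, a hypothetical 10 tent coefficients per channel, and a single baseline. For comparison, one unconstrained spatiotemporal filter over the full stimulus already has 16 × 16 × 8 = 2048 coefficients.

```python
def subunit_param_count(stim_hw=(16, 16), kernel=(8, 8, 8), n_tents=10, n_channels=2):
    """Hypothetical parameter tally for a multi-channel subunit model.

    Assumes 'valid' subunit positions, one pooling weight per position,
    n_tents nonlinearity coefficients per channel, and one shared baseline.
    """
    kh, kw, kt = kernel
    positions = (stim_hw[0] - kh + 1) * (stim_hw[1] - kw + 1)   # 9 * 9 = 81
    per_channel = kh * kw * kt + positions + n_tents             # 512 + 81 + 10
    return n_channels * per_channel + 1

full_filter_dims = 16 * 16 * 8   # coefficients in one unconstrained spatiotemporal filter
```

Under these assumptions the total is 2 × (512 + 81 + 10) + 1 = 1207; changing the number of tents moves it by tens of parameters, not by an order of magnitude.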

80 The subunit model performs well when compared to the Rust-STC model, but we were interested in obtaining a more absolute measure of performance. [sent-232, score-0.829]

81 We can estimate an upper bound on stimulus-driven model performance by implementing an empirical ‘oracle’ model that uses the average response over all but one of a set of repeated stimulus trials to predict the response on the remaining trial. [sent-234, score-0.321]

82 Over the 21 neurons with repeated stimulus data, we found that the subunit model achieved, on average, 76% of the performance of the oracle model (Figure 4b). [sent-235, score-1.119]
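The leave-one-out oracle, and the "fraction of oracle performance" quoted here, can be sketched as below. Single-trial correlation is assumed as the performance measure, matching the single-trial comparison described above; the function names are hypothetical.

```python
import numpy as np

def oracle_performance(trials):
    """Leave-one-out 'oracle': predict each trial by the mean of all the other trials.

    trials: array (n_trials, n_bins) of spike counts to repeated presentations.
    Returns the mean correlation coefficient across held-out trials.
    """
    n = trials.shape[0]
    total = trials.sum(axis=0)
    rs = []
    for i in range(n):
        prediction = (total - trials[i]) / (n - 1)      # average of the other trials
        rs.append(np.corrcoef(prediction, trials[i])[0, 1])
    return float(np.mean(rs))

def model_performance(trials, model_rate):
    """Mean single-trial correlation between a model's predicted rate and each trial."""
    return float(np.mean([np.corrcoef(model_rate, tr)[0, 1] for tr in trials]))
```

The fraction of oracle performance is then `model_performance(trials, rate) / oracle_performance(trials)`.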

83 Moreover, the cells that were least well ﬁt by the subunit model were also the cells that responded only weakly to the stimulus (Figure 4c). [sent-236, score-1.2]

84 We conclude that, for most cells, the ﬁtted subunit model explains a signiﬁcant fraction of the response that can be explained by any stimulus-driven model. [sent-237, score-0.881]

85 Compared with STA or STC, the model ﬁts are more accurate for a given amount of data, less sensitive to the choice of stimulus ensemble, and more interpretable in terms of biological mechanism. [sent-240, score-0.212]

86 Speciﬁcally, for neurons in area V2, we model the afferent V1 population as a collection of simple cells that tile visual space. [sent-243, score-0.342]

87 For neurons in area MT (V5), we use an afferent V1 population that also includes direction selective subunits, because the projections from V1 to MT are known to be sensitive to the direction of visual motion [31]. [sent-246, score-0.188]

88 We ﬁt these models to neural responses to textured stimuli that varied in contrast and local orientation content (for MT, the local elements also drift over time). [sent-248, score-0.236]

89 Our preliminary results show that the subunit model outperforms standard models for these higher order areas as well. [sent-249, score-0.848]

90 We are currently working to reﬁne and generalize the subunit model in a number of ways. [sent-250, score-0.829]

91 Real neurons also possess other forms of nonlinearity, such as the local gain control that has been observed in neurons throughout the visual and auditory systems [33]. [sent-255, score-0.236]

92 Linear and nonlinear spatial subunits in Y cat retinal ganglion cells, 1976. [sent-271, score-0.246]

93 Bipolar cells contribute to nonlinear spatial summation in the brisk-transient (Y) ganglion cell in mammalian retina. [sent-278, score-0.297]

94 Y-cell receptive ﬁeld and collicular projection of parasol ganglion cells in macaque monkey retina. [sent-293, score-0.272]

95 The two-dimensional spatial structure of simple receptive ﬁelds in cat striate cortex. [sent-310, score-0.181]

96 Excitatory and suppressive receptive ﬁeld subunits in awake monkey primary visual cortex (V1). [sent-361, score-0.343]

97 A quantitative explanation of responses to disparity-defined edges in macaque V2. [sent-383, score-0.209]

98 The nonlinear pathway of Y ganglion cells in the cat retina. [sent-470, score-0.202]

99 Characterizing responses of translation-invariant neurons to natural stimuli: maximally informative invariant dimensions. [sent-479, score-0.229]

100 Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys. [sent-496, score-0.21]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subunit', 0.805), ('stc', 0.284), ('responses', 0.157), ('stimulus', 0.149), ('lters', 0.133), ('cells', 0.111), ('suppressive', 0.107), ('subunits', 0.105), ('nonlinearity', 0.09), ('cell', 0.084), ('shifted', 0.076), ('excitatory', 0.075), ('neurons', 0.072), ('pooling', 0.071), ('spike', 0.068), ('channel', 0.068), ('receptive', 0.058), ('lter', 0.058), ('simulated', 0.058), ('stimuli', 0.052), ('rectifying', 0.052), ('sta', 0.052), ('response', 0.052), ('macaque', 0.052), ('ring', 0.052), ('visual', 0.051), ('spatial', 0.05), ('subspace', 0.046), ('rust', 0.045), ('afferent', 0.045), ('stage', 0.045), ('nonlinearities', 0.045), ('tting', 0.044), ('convolutional', 0.042), ('tent', 0.042), ('auditory', 0.041), ('physiological', 0.041), ('channels', 0.04), ('localized', 0.04), ('cat', 0.039), ('movshon', 0.039), ('neuroscience', 0.035), ('striate', 0.034), ('spiking', 0.034), ('physiology', 0.033), ('tl', 0.033), ('sensory', 0.032), ('copies', 0.031), ('complex', 0.03), ('spatially', 0.029), ('recti', 0.029), ('ganglion', 0.029), ('selectivity', 0.029), ('frames', 0.029), ('axes', 0.027), ('orientation', 0.027), ('oracle', 0.025), ('retina', 0.025), ('schwartz', 0.025), ('repeats', 0.024), ('model', 0.024), ('rrust', 0.024), ('rsubunit', 0.024), ('steerable', 0.024), ('vintch', 0.024), ('nonlinear', 0.023), ('quadrature', 0.023), ('mt', 0.023), ('biological', 0.022), ('white', 0.022), ('spatiotemporal', 0.022), ('monkey', 0.022), ('neuronal', 0.022), ('eigenvectors', 0.021), ('presentations', 0.021), ('sinusoidal', 0.021), ('ln', 0.021), ('canonical', 0.02), ('population', 0.02), ('repeated', 0.02), ('recorded', 0.019), ('tile', 0.019), ('luminance', 0.019), ('hubel', 0.019), ('ternary', 0.019), ('simoncelli', 0.019), ('areas', 0.019), ('spikes', 0.019), ('manifold', 0.019), ('ensemble', 0.019), ('squared', 0.019), ('neurophysiology', 0.018), ('leg', 0.018), ('bialek', 0.017), ('pools', 0.017), ('raster', 0.017), ('ips', 0.017), ('interpretable', 0.017), 
('kernel', 0.017), ('descent', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew D. Zaharia, J. Movshon, Eero P. Simoncelli

2 0.18784043 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we ﬁnd that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to signiﬁcant global distortions like large translations and scale. 1

3 0.18784043 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

4 0.14101185 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a speciﬁc pattern of recurrent connections learned by the model. 1

5 0.1385666 23 nips-2012-A lattice filter model of the visual pathway

Author: Karol Gregor, Dmitri B. Chklovskii

Abstract: Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina → lateral geniculate nucleus (LGN) → primary visual cortex, V1), we propose to model its function using lattice filters: signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals. In addition, lattice filters can model visual processing in insects. Therefore, the lattice filter is a useful abstraction that captures temporal aspects of visual processing. Our sensory organs face an ongoing barrage of stimuli from the world and must transmit as much information about them as possible to the rest of the brain [1]. This is a formidable task because, in sensory modalities such as vision, the dynamic range of natural stimuli (more than three orders of magnitude) greatly exceeds the dynamic range of relay neurons (less than two orders of magnitude) [2]. The reason why high-fidelity transmission is possible at all is that the continuity of objects in the physical world leads to correlations in natural stimuli, which imply redundancy. In turn, such redundancy can be eliminated by compression performed by the front end of the visual system, leading to the reduction of the dynamic range [3, 4]. A compression strategy appropriate for redundant natural stimuli is called predictive coding [5, 6, 7]. In predictive coding, a prediction of the incoming signal value is computed from past values delayed in the circuit.
This prediction is subtracted from the actual signal value and only the prediction error is transmitted. In the absence of transmission noise such compression is lossless, as the original signal could be decoded on the receiving end by inverting the encoder. If predictions are accurate, the dynamic range of the error is much smaller than that of the natural stimuli. Therefore, minimizing dynamic range using predictive coding reduces to optimizing prediction. Experimental support for viewing the front end of the visual system as a predictive encoder comes from the measurements of receptive fields [6, 7]. In particular, predictive coding suggests that, for natural stimuli, the temporal receptive fields should be biphasic and the spatial receptive fields center-surround. These predictions are borne out by experimental measurements in retinal ganglion cells [8], lateral geniculate nucleus (LGN) neurons [9], and fly second-order visual neurons called large monopolar cells (LMCs) [2]. In addition, the experimentally measured receptive fields vary with signal-to-noise ratio as would be expected from optimal prediction theory [6]. Furthermore, experimentally observed whitening of the transmitted signal [10] is consistent with removing correlated components from the incoming signals [11]. As natural stimuli contain correlations on time scales greater than a hundred milliseconds, experimentally measured receptive fields of LGN neurons are equally long [12]. Decorrelation over such long time scales requires equally long delays. How can such extended receptive fields be produced by biological neurons and synapses whose time constants are typically less than a hundred milliseconds [13]? The field of signal processing offers a solution to this problem in the form of a device called a lattice filter, which decorrelates signals in stages, sequentially adding longer and longer delays [14, 15, 16, 17].
Motivated by the cascade structure of visual systems [18], we propose to model decorrelation in them by lattice ﬁlters. Naturally, visual systems are more complex than lattice ﬁlters and perform many other operations. However, we show that the lattice ﬁlter model explains several existing observations in vertebrate and invertebrate visual systems and makes testable predictions. Therefore, we believe that lattice ﬁlters provide a convenient abstraction for modeling temporal aspects of visual processing. This paper is organized as follows. First, we brieﬂy summarize relevant results from linear prediction theory. Second, we explain the operation of the lattice ﬁlter in discrete and continuous time. Third, we compare lattice ﬁlter predictions with physiological measurements. 1 Linear prediction theory Despite the non-linear nature of neurons and synapses, the operation of some neural circuits in vertebrates [19] and invertebrates [20] can be described by a linear systems theory. The advantage of linear systems is that optimal circuit parameters may be obtained analytically and the results are often intuitively clear. Perhaps not surprisingly, the ﬁeld of signal processing relies heavily on the linear prediction theory, offering a convenient framework [15, 16, 17]. Below, we summarize the results from linear prediction that will be used to explain the operation of the lattice ﬁlter. Consider a scalar sequence y = {yt } where time t = 1, . . . , n. Suppose that yt at each time point depends on side information provided by vector zt . Our goal is to generate a series of linear predictions, yt from the vector zt , yt = w · zt . We deﬁne a prediction error as: ˆ ˆ et = yt − yt = yt − w · zt ˆ (1) and look for values of w that minimize mean squared error: e2 = 1 nt e2 = t t 1 nt (yt − w · zt )2 . 
(2) t The weight vector, w is optimal for prediction of sequence y from sequence z if and only if the prediction error sequence e = y − w · z is orthogonal to each component of vector z: ez = 0. (3) When the whole series y is given in advance, i.e. in the ofﬂine setting, these so-called normal equations can be solved for w, for example, by Gaussian elimination [21]. However, in signal processing and neuroscience applications, another setting called online is more relevant: At every time step t, prediction yt must be made using only current values of zt and w. Furthermore, after a ˆ prediction is made, w is updated based on the prediction yt and observed yt , zt . ˆ In the online setting, an algorithm called stochastic gradient descent is often used, where, at each time step, w is updated in the direction of negative gradient of e2 : t w →w−η w (yt − w · zt ) 2 . (4) This leads to the following weight update, known as least mean square (LMS) [15], for predicting sequence y from sequence z: w → w + ηet zt , (5) where η is the learning rate. The value of η represents the relative inﬂuence of more recent observations compared to more distant ones. The larger the learning rate the faster the system adapts to recent observations and less past it remembers. In this paper, we are interested in predicting a current value xt of sequence x from its past values xt−1 , . . . , xt−k restricted by the prediction order k > 0: xt = wk · (xt−1 , . . . , xt−k )T . ˆ 2 (6) This problem is a special case of the online linear prediction framework above, where yt = xt , zt = (xt−1 , . . . , xt−k )T . Then the gradient update is given by: w → wk + ηet (xt−1 , . . . , xt−k )T . (7) While the LMS algorithm can ﬁnd the weights that optimize linear prediction (6), the ﬁlter wk has a long temporal extent making it difﬁcult to implement with neurons and synapses. 
2 Lattice filters

One way to generate long receptive fields in circuits of biological neurons is to use a cascade architecture, known as the lattice filter, which calculates optimal linear predictions for temporal sequences and transmits prediction errors [14, 15, 16, 17]. In this section, we explain the operation of a discrete-time lattice filter, then adapt it to continuous-time operation.

2.1 Discrete-time implementation

The first stage of the lattice filter, Figure 1, calculates the error of the first-order optimal prediction (i.e. using only the preceding element of the sequence). The second stage uses the output of the first stage and calculates the error of the second-order optimal prediction (i.e. using only the two previous values), and so on. To make such stage-wise error computations possible, the lattice filter calculates at every stage not only the error of the optimal prediction of $x_t$ from the past values $x_{t-1}, \ldots, x_{t-k}$, called the forward error,

$$f_t^k = x_t - w^k \cdot (x_{t-1}, \ldots, x_{t-k})^T, \tag{8}$$

but, perhaps non-intuitively, also the error of the optimal prediction of a past value $x_{t-k}$ from the more recent values $x_{t-k+1}, \ldots, x_t$, called the backward error:

$$b_t^k = x_{t-k} - \tilde{w}^k \cdot (x_{t-k+1}, \ldots, x_t)^T, \tag{9}$$

where $w^k$ and $\tilde{w}^k$ are the weights of the optimal predictions. For example, the first stage of the filter calculates the forward error $f_t^1$ of the optimal prediction of $x_t$ from $x_{t-1}$, $f_t^1 = x_t - u^1 x_{t-1}$. It also calculates the backward error $b_t^1$ of the optimal prediction of $x_{t-1}$ from $x_t$, $b_t^1 = x_{t-1} - v^1 x_t$, Figure 1. Here, we assume that the coefficients $u^1$ and $v^1$ that give the optimal linear prediction are known, and we return to learning them below.

Each following stage of the lattice filter performs a stereotypic operation on its inputs, Figure 1. The $k$-th stage ($k > 1$) receives the forward error $f_t^{k-1}$ and the backward error $b_t^{k-1}$ from the previous stage. It delays the backward error by one time step and computes a forward error

$$f_t^k = f_t^{k-1} - u^k b_{t-1}^{k-1} \tag{10}$$

of the optimal linear prediction of $f_t^{k-1}$ from $b_{t-1}^{k-1}$. In addition, each stage computes a backward error

$$b_t^k = b_{t-1}^{k-1} - v^k f_t^{k-1} \tag{11}$$

of the optimal linear prediction of $b_{t-1}^{k-1}$ from $f_t^{k-1}$.

As can be seen in Figure 1, the lattice filter contains forward prediction error (top) and backward prediction error (bottom) branches, which interact at every stage via cross-links. The operation of the lattice filter can be characterized by the linear filters acting on the input $x$ to compute forward or backward errors of consecutive order. These are the so-called prediction-error filters (blue bars in Figure 1). Because of the delays in the backward error branch, the temporal extent of the filters grows from stage to stage. In the next section, we will argue that prediction-error filters correspond to the measured temporal receptive fields of neurons.

For a detailed comparison with physiological measurements we will use the following result for bi-phasic prediction-error filters, such as the ones in Figure 1. The first bar of the forward prediction-error filter has a larger weight, in absolute value, than the combined weights of the remaining coefficients of that filter. Similarly, in backward prediction-error filters, the last bar has a greater weight than the rest of them combined. This fact arises from the observation that forward prediction-error filters are minimum phase, while backward prediction-error filters are maximum phase [16, 17].

Figure 1: A discrete-time lattice filter performs stage-wise computation of forward and backward prediction errors. In the first stage, the optimal prediction of $x_t$ from $x_{t-1}$ is computed by delaying the input by one time step and multiplying it by $u^1$. The upper summation unit subtracts the predicted $x_t$ from the actual value and outputs the prediction error $f_t^1$. Similarly, the optimal prediction of $x_{t-1}$ from $x_t$ is computed by multiplying the input by $v^1$. The lower summation unit subtracts the optimal prediction from the actual value and outputs the backward error $b_t^1$. In each following stage $k$, the optimal prediction of $f_t^{k-1}$ from $b_{t-1}^{k-1}$ is computed by delaying $b_t^{k-1}$ by one time step and multiplying it by $u^k$. The upper summation unit subtracts the prediction from the actual $f_t^{k-1}$ and outputs the prediction error $f_t^k$. Similarly, the optimal prediction of $b_{t-1}^{k-1}$ from $f_t^{k-1}$ is computed by multiplying $f_t^{k-1}$ by $v^k$. The lower summation unit subtracts the optimal prediction from the actual value and outputs the backward error $b_t^k$. Black connections have unitary weights and red connections have learnable negative weights. One can view the forward and backward error calculations as applications of so-called prediction-error filters (blue) to the input sequence. Note that the temporal extent of the filters gets longer from stage to stage.

Next, we derive a learning rule for finding the optimal coefficients $u$ and $v$ in the online setting. The coefficient $u^k$ is used for predicting $f_t^{k-1}$ from $b_{t-1}^{k-1}$ to obtain the error $f_t^k$. By substituting $y_t = f_t^{k-1}$, $z_t = b_{t-1}^{k-1}$ and $e_t = f_t^k$ into (5), the update of $u^k$ becomes

$$u^k \to u^k + \eta\, f_t^k b_{t-1}^{k-1}. \tag{12}$$

Similarly, $v^k$ is updated by

$$v^k \to v^k + \eta\, b_t^k f_t^{k-1}. \tag{13}$$

Interestingly, the weight updates are given by the product of the activities of the outgoing and incoming nodes of the corresponding cross-links. Such updates are known as Hebbian learning rules and are thought to be used by biological neurons [22, 23].

Finally, we give a simple proof that, in the offline setting when the entire sequence $x$ is known, $f^k$ and $b^k$ given by equations (10) and (11) are indeed errors of optimal $k$-th order linear prediction. Let $D$ be the one-step time delay operator, $(Dx)_t = x_{t-1}$. The induction statement at $k$ is that $f^k$ and $b^k$ are the $k$-th order forward and backward errors of optimal linear prediction. This is equivalent to two conditions. First, $f^k$ and $b^k$ have the form $f^k = x - w_1^k Dx - \ldots - w_k^k D^k x$ and $b^k = D^k x - \tilde{w}_1^k D^{k-1} x - \ldots - \tilde{w}_k^k x$. Second, from the normal equations (3), they satisfy $\langle f^k D^i x \rangle = 0$ and $\langle D b^k D^i x \rangle = \langle b^k D^{i-1} x \rangle = 0$ for $i = 1, \ldots, k$.
That this is true for $k = 1$ follows directly from the definitions of $f^1$ and $b^1$. Now we assume that it is true for $k - 1 \geq 1$ and show that it is true for $k$. It is easy to see from the forms of $f^{k-1}$ and $b^{k-1}$, and from $f^k = f^{k-1} - u^k D b^{k-1}$, that $f^k$ has the correct form $f^k = x - w_1^k Dx - \ldots - w_k^k D^k x$. Regarding orthogonality, for $i = 1, \ldots, k-1$ we have $\langle f^k D^i x \rangle = \langle (f^{k-1} - u^k D b^{k-1}) D^i x \rangle = \langle f^{k-1} D^i x \rangle - u^k \langle (D b^{k-1}) D^i x \rangle = 0$, using the induction assumptions of orthogonality at $k-1$. For the remaining $i = k$, we note that $f^k$ is the error of the optimal linear prediction of $f^{k-1}$ from $D b^{k-1}$. Therefore $0 = \langle f^k D b^{k-1} \rangle = \langle f^k (D^k x - \tilde{w}_1^{k-1} D^{k-1} x - \ldots - \tilde{w}_{k-1}^{k-1} D x) \rangle = \langle f^k D^k x \rangle$, as desired. The $b^k$ case can be proven similarly.

2.2 Continuous-time implementation

Neuronal circuits operate in continuous time, so the last hurdle to modeling them with a lattice filter is the filter's discrete-time operation. To obtain a continuous-time implementation of the lattice filter, we cannot simply take the time step to zero, because the prediction-error filters would become infinitesimally short. Instead, we adapt the discrete-time lattice filter to continuous-time operation in two steps.

First, we introduce a discrete-time Laguerre lattice filter [24, 17], which uses Laguerre polynomials rather than the shift operator to generate its basis functions, Figure 2. The input signal passes through a leaky integrator whose leakage constant $\alpha$ defines a time scale distinct from the time step (14). The delay $D$ at every stage is replaced by an all-pass filter $L$ (15) with the same constant $\alpha$. This filter preserves the magnitude of every Fourier component of the input but shifts its phase in a frequency-dependent manner. It reduces to a single time-step delay when $\alpha = 0$. The optimality of a general discrete-time Laguerre lattice filter can be proven in the same way as for the discrete-time lattice filter, simply by replacing the operator $D$ with $L$ in the proof of section 2.1.
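The discrete-time lattice of section 2.1 can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes NumPy, zero initial conditions, a zero-padded delay operator, and a synthetic AR(2) input. All function names (`online_lattice`, `offline_lattice`, `delay`, `normalized_inner`) are hypothetical.

The online part runs eqs. (10) and (11) with the Hebbian updates (12) and (13). For this input, the cross-links should approach the partial autocorrelations of the signal: 0.5/0.7 for stage 1, 0.3 for stage 2, and 0 for stage 3.

The offline part sets each $u^k$ and $v^k$ to its least-squares value and checks the orthogonality conditions of the induction proof. The residual correlations are not exactly zero only because the shift identity $\langle D b\, D^i x \rangle = \langle b\, D^{i-1} x \rangle$ fails at the end of a finite sequence.

```python
import numpy as np


def delay(s, steps=1):
    """Delay operator D^steps with zero padding: (D s)_t = s_{t-1}."""
    return np.concatenate((np.zeros(steps), s[:len(s) - steps]))


def online_lattice(x, n_stages, eta):
    """Online lattice filter, eqs. (10)-(11), with Hebbian updates (12)-(13).

    f^0 = b^0 = x and all initial states are zero. Returns the learned
    cross-link weights (u[j], v[j] belong to stage k = j + 1) and the
    forward errors of every stage (row j of f holds f^j).
    """
    n = len(x)
    u = np.zeros(n_stages)
    v = np.zeros(n_stages)
    f = np.zeros((n_stages + 1, n))
    b = np.zeros((n_stages + 1, n))
    f[0] = x
    b[0] = x
    for t in range(1, n):
        for j in range(n_stages):               # stage k = j + 1
            b_del = b[j, t - 1]                 # delayed backward error b_{t-1}^{k-1}
            f_in = f[j, t]                      # forward error f_t^{k-1}
            f[j + 1, t] = f_in - u[j] * b_del   # eq. (10)
            b[j + 1, t] = b_del - v[j] * f_in   # eq. (11)
            u[j] += eta * f[j + 1, t] * b_del   # Hebbian update, eq. (12)
            v[j] += eta * b[j + 1, t] * f_in    # Hebbian update, eq. (13)
    return u, v, f


def offline_lattice(x, n_stages):
    """Offline variant used in the proof: each u^k, v^k is the least-squares
    (normal-equation) solution over the whole sequence. Returns f^K and b^K."""
    f = np.array(x, dtype=float)
    b = f.copy()
    for _ in range(n_stages):
        db = delay(b)                    # D b^{k-1}
        u = (f @ db) / (db @ db)         # best prediction of f^{k-1} from D b^{k-1}
        v = (db @ f) / (f @ f)           # best prediction of D b^{k-1} from f^{k-1}
        f, b = f - u * db, db - v * f    # eqs. (10)-(11); right side uses the old f
    return f, b


def normalized_inner(a, c):
    """|<a c>| divided by the norms of a and c (0 means orthogonal)."""
    return abs(a @ c) / np.sqrt((a @ a) * (c @ c))


# Usage on an assumed AR(2) test signal x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + noise.
rng = np.random.default_rng(0)
n = 30_000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()

u, v, f = online_lattice(x, n_stages=3, eta=1e-3)
# Expected (up to LMS noise): u^1 = v^1 = 0.5 / 0.7, u^2 = v^2 = 0.3, u^3 = v^3 = 0.

f3, b3 = offline_lattice(x, n_stages=3)
worst_f = max(normalized_inner(f3, delay(x, i)) for i in (1, 2, 3))  # <f^3 D^i x>
worst_b = max(normalized_inner(b3, delay(x, i)) for i in (0, 1, 2))  # <b^3 D^(i-1) x>
```

Stages are updated in order within each time step, so stage $k$ always sees the current $f_t^{k-1}$ and the previous step's $b_{t-1}^{k-1}$, as eqs. (10) and (11) require.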
Figure 2: Continuous-time lattice filter using Laguerre polynomials. Compared to the discrete-time version, it contains a leaky integrator, $L_0$ (16), and replaces delays with all-pass filters, $L$ (17).

Second, we obtain a continuous-time formulation of the lattice filter. We replace $t - 1 \to t - \delta t$, define the inverse time scale $\gamma = (1 - \alpha)/\delta t$, and take the limit $\delta t \to 0$ while keeping $\gamma$ fixed. As a result, $L_0$ and $L$ are given by:

Discrete time:

$$L_0(x)_t = \alpha L_0(x)_{t-1} + x_t \tag{14}$$

$$L(x)_t = \alpha \left( L(x)_{t-1} - x_t \right) + x_{t-1} \tag{15}$$

Continuous time:

$$\frac{dL_0(x)}{dt} = -\gamma L_0(x) + x \tag{16}$$

$$L(x) = x - 2\gamma L_0(x) \tag{17}$$

Representative impulse responses of the continuous Laguerre filter are shown in Figure 2. Note that, as in the discrete-time case, the area under the first (peak) phase is greater than the area under the second (rebound) phase in the forward branch, and the opposite is true in the backward branch. Moreover, the temporal extent of the rebound is greater than that of the peak in both branches, not only in the forward branch as in the basic discrete-time implementation. As will be seen in the next section, these predictions are confirmed by physiological recordings.

3 Experimental evidence for the lattice filter in visual pathways

In this section we demonstrate that physiological measurements from visual pathways in vertebrates and invertebrates are consistent with the predictions of the lattice filter model. For the purpose of modeling visual pathways, we identify summation units of the lattice filter with neurons and propose that neural activity represents forward and backward errors. In the fly visual pathway, neuronal activity is represented by continuously varying graded potentials. In the vertebrate visual system, all neurons starting with ganglion cells are spiking, and we identify their firing rate with the activity in the lattice filter.

3.1 Mammalian visual pathway

In mammals, visual processing is performed in stages.
In the retina, photoreceptors synapse onto bipolar cells, which in turn synapse onto retinal ganglion cells (RGCs). RGCs send axons to the LGN, where they synapse onto LGN relay neurons projecting to the primary visual cortex, V1. In addition to this feedforward pathway, at each stage there are local circuits involving (usually inhibitory) interneurons, such as horizontal and amacrine cells in the retina. Neurons of each class come in many types, which differ in their connectivity, morphology and physiological response. The bewildering complexity of these circuits has posed a major challenge to visual neuroscience.

[Figure 3 panels: Left, temporal filters of an RGC and an LGN cell (0–200 ms). Right, distributions of peak time (A), zero-crossing time (B) and rebound index (C), where the rebound index is the area of the impulse response after the zero crossing divided by the area before it.]

Figure 3: Electrophysiologically measured temporal receptive fields get progressively longer along the cat visual pathway. Left: A cat LGN cell (red) has a longer receptive field than a corresponding RGC cell (blue) (adapted from [12], which also reports population data). Right (A, B): The temporal receptive fields of simple cells in cat V1 are more extended than those of corresponding LGN cells, as quantified by the peak (A) and zero-crossing (B) times. Right (C): In the temporal receptive fields of cat LGN and V1 cells, the peak can be stronger or weaker than the rebound (adapted from [25]).

Here, we point out several experimental observations related to temporal processing in the visual system that are consistent with the lattice filter model.

First, measurements of temporal receptive fields demonstrate that they get progressively longer at each consecutive stage: i) LGN neurons have longer receptive fields than corresponding pre-synaptic ganglion cells [12], Figure 3 (left); ii) simple cells in V1 have longer receptive fields than corresponding pre-synaptic LGN neurons [25], Figure 3 (right, A and B). These observations are consistent with the progressively greater temporal extent of the prediction-error filters (blue plots in Figure 2).

Second, the weight of the peak (integrated area under the curve) may be either greater or less than that of the rebound, both in LGN relay cells [26] and in simple cells of V1 [25], Figure 3 (right, C). Neurons with peak weight exceeding that of the rebound are often referred to as non-lagged, while the others are known as lagged. Both types are found in cat [27, 28, 29] and monkey [30]. The reason for these names becomes clear from the response to a step stimulus, Figure 4 (top). By comparing experimentally measured receptive fields with those of the continuous lattice filter, Figure 4, we identify non-lagged neurons with the forward branch and lagged neurons with the backward branch.
Another way to characterize the step-stimulus response is whether the sign of the transient is the same as (non-lagged) or different from (lagged) that of the sustained response.

Third, measurements of cross-correlation between RGC and LGN cell spikes in lagged and non-lagged neurons reveal a difference in the transfer function, indicative of a difference in the underlying circuitry [31]. This is consistent with the backward branch circuit of the Laguerre lattice filter, Figure 2, being different from that of the forward branch (which results in a different transfer function). In particular, a combination of different glutamate receptors, such as AMPA and NMDA, as well as GABA receptors, is thought to be responsible for the observed responses in lagged cells [32]. However, further investigation of the corresponding circuitry, perhaps using connectomics technology, is desirable.

Fourth, the cross-link weights of the lattice filter can be learned using the Hebbian rules (12) and (13), which are biologically plausible [22, 23]. Interestingly, if these weights are learned sequentially, starting from the first stage, they do not need to be re-learned when additional stages are added or learned. This property maps naturally onto the stage-wise maturation of the mammalian visual pathway during development: first the retina, then the LGN, then V1. It implies that the more peripheral structures do not need to adapt to the maturation of the downstream ones.
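The transient-versus-sustained sign criterion can be illustrated with the first stage of the discrete-time lattice filter, i.e. eqs. (10) and (11) with $f^0 = b^0 = x$. The minimal sketch below assumes NumPy and illustrative coefficients $u^1 = v^1 = 0.5$; any values between 0 and 1 behave the same way. The function name `first_stage_step_response` is hypothetical. The forward error keeps the sign of its onset transient, as for non-lagged cells. The backward error starts negative and settles positive, as for lagged cells.

```python
import numpy as np


def first_stage_step_response(u1, v1, n=20, onset=5):
    """Forward and backward errors of the first lattice stage for a unit step:
    f_t^1 = x_t - u1 x_{t-1} and b_t^1 = x_{t-1} - v1 x_t."""
    x = np.zeros(n)
    x[onset:] = 1.0
    x_prev = np.concatenate(([0.0], x[:-1]))  # x_{t-1}, zero before the sequence starts
    f1 = x - u1 * x_prev
    b1 = x_prev - v1 * x
    return x, f1, b1


x, f1, b1 = first_stage_step_response(u1=0.5, v1=0.5)
# Forward error: transient 1 at onset, sustained 1 - u1 = 0.5 (same sign).
# Backward error: transient -v1 = -0.5 at onset, sustained 1 - v1 = 0.5 (opposite sign).
```

The sign flip in the backward branch follows directly from $b_t^1 = x_{t-1} - v^1 x_t$. At the step onset the delayed term is still zero, so only the negative cross-link contribution $-v^1$ is visible.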
Figure 4: Comparison of electrophysiologically measured responses of cat LGN cells with the continuous-time lattice filter model. Top: Experimentally measured temporal receptive fields and step-stimulus responses of LGN cells (adapted from [26]). Bottom: Typical examples of responses in the continuous-time lattice filter model. To model the non-lagged cell, the lattice filter coefficients were $u^1 = v^1 = 0.4$, $u^2 = v^2 = 0.2$ and $1/\gamma = 50$ ms. To model the lagged cell, they were $u^1 = v^1 = u^2 = v^2 = 0.2$ and $1/\gamma = 60$ ms. To model the photoreceptor contribution to the responses, an additional leaky integrator $L_0$ was added to the circuit of Figure 2.

While Hebbian rules are biologically plausible, Figure 2 may give the impression that they must apply to inhibitory cross-links. We point out that this circuit is meant to represent only the computation performed rather than its specific implementation in terms of neurons. The same linear computation can be performed by circuits with a different arrangement of the same components, so there are multiple implementations of the lattice filter. For example, the activity of non-lagged OFF cells may be seen as representing minus forward error. Then the cross-links between the non-lagged OFF pathway and the lagged ON pathway would be excitatory. In general, the classification of cells into lagged and non-lagged seems independent of their ON/OFF and X/Y classification [31, 28, 29], but see [33].

3.2 Insect visual pathway

In insects, two cell types, L1 and L2, both post-synaptic to photoreceptors, play an important role in visual processing. Physiological responses of L1 and L2 indicate that they decorrelate visual signals by subtracting their predictable parts. In fact, receptive fields of these neurons were used as the first examples of predictive coding in neuroscience [6].
Yet the numbers of synapses from photoreceptors to L1 and L2 are the same [34], and their physiological properties are similar. It has therefore been a mystery why insects have not just one but a pair of such seemingly redundant neurons per facet. Previously, it was suggested that L1 and L2 provide inputs to the two pathways that map onto the ON and OFF pathways in the vertebrate retina [35, 36]. Here, we put forward the hypothesis that the role of L1 and L2 in visual processing is similar to that of the two branches of the lattice filter. We do not incorporate the ON/OFF distinction in the effectively linear lattice filter model, but we anticipate that such a combined description will materialize in the future.

As argued in Section 2, in forward prediction-error filters the peak has greater weight than the rebound, while in backward prediction-error filters the opposite is true. This difference implies that, in response to a step stimulus, the sign of the sustained response relative to the initial transient differs between the branches. Indeed, Ca²⁺ imaging shows that the responses of L1 and L2 to a step stimulus differ as predicted by the lattice filter model [35], Figure 5b. Interestingly, the activity of L1 seems to represent minus forward error and that of L2 plus backward error, suggesting that the lattice filter cross-links are excitatory. To summarize, the predictions of the lattice filter model seem to be consistent with the physiological measurements in the fly visual system and may help us understand its operation.

[Figure 5, left panels: Stimulus, − Forward Error and Backward Error versus time (0–20 time steps).]

Figure 5: Response of the lattice filter and fruit fly LMCs to a step stimulus. Left: Responses of the first-order discrete-time lattice filter to a step stimulus. Right: Responses of fly L1 and L2 cells to a moving step stimulus (adapted from [35]).
The predicted and the experimentally measured responses have qualitatively the same shape: a transient followed by a sustained response. The sustained response has the same sign as the transient for the forward error and L1, and the opposite sign for the backward error and L2.

4 Discussion

Motivated by the cascade structure of the visual pathway, we propose to model its operation with the lattice filter. We demonstrate that the predictions of the continuous-time lattice filter model are consistent with the course of neural development and with physiological measurements in the LGN and V1 of cat and monkey, as well as in fly LMC neurons. Therefore, lattice filters may offer a useful abstraction for understanding aspects of temporal processing in the visual systems of vertebrates and invertebrates.

Previously, [11] proposed that lagged and non-lagged cells could be a result of rectification by spiking neurons. We agree with [11] that the LGN performs temporal decorrelation. However, our explanation relies on the cascade architecture rather than on non-linear processing and is, hence, fundamentally different. Our model generates the following predictions that are not obvious in [11]: i) not only are LGN receptive fields longer than those of RGCs, but V1 receptive fields are also longer than those of the LGN; ii) even a linear model can generate a difference in the peak/rebound ratio; iii) the circuit from RGC to LGN should be different for lagged and non-lagged cells, consistent with [31]; iv) the lattice filter circuit can self-organize using Hebbian rules, which gives a mechanistic explanation of receptive fields beyond the normative framework of [11].

In light of the redundancy reduction arguments given in the introduction, consider a system whose only goal is to compress incoming signals using a given number of lattice filter stages. After the compression is performed, such a system needs to transmit only one kind of prediction error, forward or backward. Therefore, having two channels, in the absence of noise, may seem redundant.
However, transmitting both forward and backward errors gives one the flexibility to continue decorrelation further by adding stages performing relatively simple operations.

We are grateful to D.A. Butts, E. Callaway, M. Carandini, D.A. Clark, J.A. Hirsch, T. Hu, S.B. Laughlin, D.N. Mastronarde, R.C. Reid, H. Rouault, A. Saul, L. Scheffer, F.T. Sommer and X. Wang for helpful discussions.

References

[1] F. Rieke, D. Warland, R.R. van Steveninck, and W. Bialek. Spikes: exploring the neural code. MIT Press, 1999.
[2] S.B. Laughlin. Matching coding, circuits, cells, and molecules to signals: general principles of retinal design in the fly's eye. Progress in Retinal and Eye Research, 13(1):165–196, 1994.
[3] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61(3):183, 1954.
[4] H. Barlow. Redundancy reduction revisited. Network: Computation in Neural Systems, 12(3):241–253, 2001.
[5] R.M. Gray. Linear Predictive Coding and the Internet Protocol. Now Publishers, 2010.
[6] M.V. Srinivasan, S.B. Laughlin, and A. Dubs. Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London. Series B. Biological Sciences, 216(1205):427–459, 1982.
[7] T. Hosoya, S.A. Baccus, and M. Meister. Dynamic predictive coding by the retina. Nature, 436:71, 2005.
[8] H.K. Hartline, H.G. Wagner, and E.F. MacNichol Jr. The peripheral origin of nervous activity in the visual system. Studies on excitation and inhibition in the retina: a collection of papers from the laboratories of H. Keffer Hartline, page 99, 1974.
[9] N.A. Lesica, J. Jin, C. Weng, C.I. Yeh, D.A. Butts, G.B. Stanley, and J.M. Alonso. Adaptation to stimulus contrast and correlations during natural visual stimulation. Neuron, 55(3):479–491, 2007.
[10] Y. Dan, J.J. Atick, and R.C. Reid. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. The Journal of Neuroscience, 16(10):3351–3362, 1996.
[11] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6(3):345–358, 1995.
[12] X. Wang, J.A. Hirsch, and F.T. Sommer. Recoding of sensory information across the retinothalamic synapse. The Journal of Neuroscience, 30(41):13567–13577, 2010.
[13] C. Koch. Biophysics of computation: information processing in single neurons. Oxford University Press, 2005.
[14] F. Itakura and S. Saito. On the optimum quantization of feature parameters in the PARCOR speech synthesizer. In Conference Record, 1972 International Conference on Speech Communication and Processing, Boston, MA, pages 434–437, 1972.
[15] B. Widrow and S.D. Stearns. Adaptive signal processing. Prentice-Hall, Englewood Cliffs, NJ, 1985.
[16] S. Haykin. Adaptive filter theory. Prentice-Hall, Englewood Cliffs, NJ, 2003.
[17] A.H. Sayed. Fundamentals of adaptive filtering. Wiley-IEEE Press, 2003.
[18] D.J. Felleman and D.C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991.
[19] X. Wang, F.T. Sommer, and J.A. Hirsch. Inhibitory circuits for visual processing in thalamus. Current Opinion in Neurobiology, 2011.
[20] S.B. Laughlin, J. Howard, and B. Blakeslee. Synaptic limitations to contrast coding in the retina of the blowfly Calliphora. Proceedings of the Royal Society of London. Series B. Biological Sciences, 231(1265):437–467, 1987.
[21] D.C. Lay. Linear Algebra and Its Applications. Addison-Wesley/Longman, New York/London, 2000.
[22] D.O. Hebb. The organization of behavior: A neuropsychological theory. Lawrence Erlbaum, 2002.
[23] O. Paulsen and T.J. Sejnowski. Natural patterns of activity and long-term synaptic plasticity. Current Opinion in Neurobiology, 10(2):172–180, 2000.
[24] Z. Fejzo and H. Lev-Ari. Adaptive Laguerre-lattice filters. IEEE Transactions on Signal Processing, 45(12):3006–3016, 1997.
[25] J.M. Alonso, W.M. Usrey, and R.C. Reid. Rules of connectivity between geniculate cells and simple cells in cat primary visual cortex. The Journal of Neuroscience, 21(11):4002–4015, 2001.
[26] D. Cai, G.C. DeAngelis, and R.D. Freeman. Spatiotemporal receptive field organization in the lateral geniculate nucleus of cats and kittens. Journal of Neurophysiology, 78(2):1045–1061, 1997.
[27] D.N. Mastronarde. Two classes of single-input X-cells in cat lateral geniculate nucleus. I. Receptive-field properties and classification of cells. Journal of Neurophysiology, 57(2):357–380, 1987.
[28] J. Wolfe and L.A. Palmer. Temporal diversity in the lateral geniculate nucleus of cat. Visual Neuroscience, 15(04):653–675, 1998.
[29] A.B. Saul and A.L. Humphrey. Spatial and temporal response properties of lagged and nonlagged cells in cat lateral geniculate nucleus. Journal of Neurophysiology, 64(1):206–224, 1990.
[30] A.B. Saul. Lagged cells in alert monkey lateral geniculate nucleus. Visual Neuroscience, 25:647–659, 2008.
[31] D.N. Mastronarde. Two classes of single-input X-cells in cat lateral geniculate nucleus. II. Retinal inputs and the generation of receptive-field properties. Journal of Neurophysiology, 57(2):381–413, 1987.
[32] P. Heggelund and E. Hartveit. Neurotransmitter receptors mediating excitatory input to cells in the cat lateral geniculate nucleus. I. Lagged cells. Journal of Neurophysiology, 63(6):1347–1360, 1990.
[33] J. Jin, Y. Wang, R. Lashgari, H.A. Swadlow, and J.M. Alonso. Faster thalamocortical processing for dark than light visual targets. The Journal of Neuroscience, 31(48):17471–17479, 2011.
[34] M. Rivera-Alba, S.N. Vitaladevuni, Y. Mischenko, Z. Lu, S. Takemura, L. Scheffer, I.A. Meinertzhagen, D.B. Chklovskii, and G.G. de Polavieja. Wiring economy and volume exclusion determine neuronal placement in the Drosophila brain. Current Biology, 21(23):2000–2005, 2011.
[35] D.A. Clark, L. Bursztyn, M.A. Horowitz, M.J. Schnitzer, and T.R. Clandinin. Defining the computational structure of the motion detector in Drosophila. Neuron, 70(6):1165–1177, 2011.
[36] M. Joesch, B. Schnell, S.V. Raghu, D.F. Reiff, and A. Borst. ON and OFF pathways in Drosophila motion vision. Nature, 468(7321):300–304, 2010.

6 0.11139596 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

7 0.10751697 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

8 0.10586009 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

9 0.097271457 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

10 0.089751132 190 nips-2012-Learning optimal spike-based representations

11 0.088637941 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

12 0.084064953 94 nips-2012-Delay Compensation with Dynamical Synapses

13 0.076016903 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

14 0.074961565 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

15 0.073451675 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

16 0.070295505 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

17 0.065568581 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

18 0.057522155 150 nips-2012-Hierarchical spike coding of sound

19 0.050581679 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

20 0.04849875 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.113), (1, 0.045), (2, -0.161), (3, 0.135), (4, 0.031), (5, 0.185), (6, -0.003), (7, -0.038), (8, 0.019), (9, -0.012), (10, -0.012), (11, -0.043), (12, -0.14), (13, -0.016), (14, 0.025), (15, -0.024), (16, 0.036), (17, 0.027), (18, -0.021), (19, 0.15), (20, -0.003), (21, 0.096), (22, 0.015), (23, 0.006), (24, -0.038), (25, -0.027), (26, -0.038), (27, -0.038), (28, 0.046), (29, 0.029), (30, 0.002), (31, -0.069), (32, -0.012), (33, 0.015), (34, 0.084), (35, 0.008), (36, -0.021), (37, -0.042), (38, 0.029), (39, 0.047), (40, 0.045), (41, -0.012), (42, 0.061), (43, 0.003), (44, -0.002), (45, 0.008), (46, -0.049), (47, 0.009), (48, 0.032), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94649529 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew Zaharia, J Movshon, Eero P. Simoncelli

2 0.79081345 23 nips-2012-A lattice filter model of the visual pathway

Author: Karol Gregor, Dmitri B. Chklovskii

3 0.70707917 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

Author: Shaul Druckmann, Tao Hu, Dmitri B. Chklovskii

Abstract: Early stages of sensory systems face the challenge of compressing information from numerous receptors onto a much smaller number of projection neurons, a so called communication bottleneck. To make more efficient use of limited bandwidth, compression may be achieved using predictive coding, whereby predictable, or redundant, components of the stimulus are removed. In the case of the retina, Srinivasan et al. (1982) suggested that feedforward inhibitory connections subtracting a linear prediction generated from nearby receptors implement such compression, resulting in biphasic center-surround receptive fields. However, feedback inhibitory circuits are common in early sensory circuits and furthermore their dynamics may be nonlinear. Can such circuits implement predictive coding as well? Here, solving the transient dynamics of nonlinear reciprocal feedback circuits through analogy to a signal-processing algorithm called linearized Bregman iteration we show that nonlinear predictive coding can be implemented in an inhibitory feedback circuit. In response to a step stimulus, interneuron activity in time constructs progressively less sparse but more accurate representations of the stimulus, a temporally evolving prediction. This analysis provides a powerful theoretical framework to interpret and understand the dynamics of early sensory processing in a variety of physiological experiments and yields novel predictions regarding the relation between activity and stimulus statistics.

4 0.68674326 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

5 0.65591764 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

6 0.65194422 62 nips-2012-Burn-in, bias, and the rationality of anchoring

7 0.65194422 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

8 0.6041677 94 nips-2012-Delay Compensation with Dynamical Synapses

9 0.59071481 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

10 0.51590067 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

11 0.46522537 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

12 0.44739106 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

13 0.43865415 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

14 0.42260483 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

15 0.41742927 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

16 0.41052002 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

17 0.40408802 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

18 0.4014723 190 nips-2012-Learning optimal spike-based representations

19 0.39192274 256 nips-2012-On the connections between saliency and tracking

20 0.38265419 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.038), (17, 0.054), (21, 0.096), (38, 0.106), (39, 0.015), (42, 0.017), (44, 0.018), (54, 0.03), (55, 0.046), (74, 0.046), (76, 0.081), (80, 0.081), (85, 0.204), (92, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77506185 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew Zaharia, J Movshon, Eero P. Simoncelli

2 0.6708734 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

3 0.65118164 23 nips-2012-A lattice filter model of the visual pathway

Author: Karol Gregor, Dmitri B. Chklovskii

4 0.64920467 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy.

5 0.64868635 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximal splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

6 0.64404386 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

7 0.63867396 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

8 0.63779372 60 nips-2012-Bayesian nonparametric models for ranked data

9 0.63685435 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

10 0.63590372 190 nips-2012-Learning optimal spike-based representations

11 0.62830484 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

12 0.62804306 193 nips-2012-Learning to Align from Scratch

13 0.62618864 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

14 0.62150866 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

15 0.62099242 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

16 0.61933577 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

17 0.61816931 94 nips-2012-Delay Compensation with Dynamical Synapses

18 0.61790466 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

19 0.61628342 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

20 0.61322153 152 nips-2012-Homeostatic plasticity in Bayesian spiking networks as Expectation Maximization with posterior constraints