nips nips2012 nips2012-62 knowledge-graph by maker-knowledge-mining

62 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning


Source: pdf

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. [sent-4, score-0.363]

2 Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. [sent-5, score-0.512]

3 In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. [sent-6, score-0.578]

4 Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. [sent-7, score-0.373]

5 Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. [sent-9, score-0.385]

6 1 Introduction Many algorithms are now available to learn hierarchical features from unlabeled image data. [sent-11, score-0.317]

7 Thus, an interesting open question is whether unsupervised feature learning algorithms are able to construct features, without the benefit of supervision, that can identify high-level concepts like frequently-occurring object classes. [sent-13, score-0.303]

8 We harvested 1.4 million image thumbnails from YouTube and extracted roughly 57 million 32-by-32 pixel patches at random locations and scales. [sent-18, score-0.456]

9 Finally, as is already common in feature learning work, we should aim to build features that incorporate invariance so that features respond not just to a specific pattern (e.g., an edge at a single position) but also to a range of its variations. [sent-28, score-0.491]

10 Unfortunately, these desiderata are difficult to achieve at once: current methods for building invariant hierarchies of features are difficult to scale up to train many thousands of features from our 57 million patch dataset on our cluster of 30 machines. [sent-33, score-0.682]

11 In this paper, we will propose a highly scalable combination of clustering algorithms for learning selective and invariant features that are capable of tackling this size of problem. [sent-34, score-0.562]

12 Surprisingly, we find that despite the simplicity of these algorithms we are nevertheless able to discover high-level features sensitive to the most commonly occurring object class present in our dataset: human faces. [sent-35, score-0.429]

13 In fact, we find that these features are better face detectors than a linear filter trained from labeled data, achieving up to 86% AUC compared to 77% on labeled validation data. [sent-36, score-0.505]

14 Thus, our results emphasize not only that unsupervised learning algorithms can discover object-selective features with no labeled data, but also that such features can potentially perform better than basic supervised detectors due to their deep architecture. [sent-37, score-0.625]

15 We will begin with a description of our algorithms for learning selective and invariant features, and explain their relationship to existing systems. [sent-40, score-0.299]

16 We will refer to these features as “simple cells” and “complex cells” respectively, in analogy to previous work and to biological cells with (very loosely) related response properties. [sent-44, score-0.58]

17 Following other popular systems [14, 12, 6, 5] we will then use these two algorithms to build alternating layers of simple cell and complex cell features. [sent-45, score-1.138]

18 Given the linear filters $D$, we then define the responses of the learned simple cell features as $s^{(i)} = g(a^{(i)})$, where $a^{(i)} = D^{\top} x^{(i)}$ and $g(\cdot)$ is a nonlinear activation function. [sent-59, score-0.857]
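
To make this step concrete, here is a minimal NumPy sketch of the response computation. The soft-threshold form of $g$ and the column-per-filter layout of $D$ are assumptions for illustration; the extract does not pin down the activation function used.

```python
import numpy as np

def simple_cell_responses(X, D, alpha=0.5):
    """Simple cell responses s^(i) = g(a^(i)), with a^(i) = D^T x^(i).

    X     : (n_samples, n_pixels) whitened input patches, one per row.
    D     : (n_pixels, n_filters) linear filters as unit-norm columns.
    alpha : threshold of the assumed soft-threshold activation
            g(a) = max(0, |a| - alpha).
    """
    A = X @ D                              # rows are a^(i) = D^T x^(i)
    return np.maximum(0.0, np.abs(A) - alpha)
```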

19 2 Learning Invariant Features (Complex Cells) To construct invariant complex cell features, a common approach is to create “pooling units” that combine the responses of lower-level simple cells. [sent-62, score-1.163]

20 Specifically, given a vector of simple cell responses $s^{(i)}$, we will train complex cell features whose responses are given by $c_j^{(i)} = \max_{k \in G_j} s_k^{(i)}$, where $G_j$ is a set that specifies which simple cells the j'th complex cell should pool over. [sent-64, score-2.736]

21 Thus, the complex cell $c_j$ is an invariant feature that responds significantly to any of the patterns represented by simple cells in its group. [sent-65, score-1.364]
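
In code, the pooling step is just a columnwise max over each group; a minimal sketch, assuming the groups are supplied as precomputed index lists:

```python
import numpy as np

def complex_cell_responses(S, groups):
    """Complex cell responses c_j = max_{k in G_j} s_k.

    S      : (n_samples, n_simple) simple cell responses.
    groups : list of integer index arrays, groups[j] = G_j.
    Returns an (n_samples, len(groups)) array of pooled responses.
    """
    return np.stack([S[:, G].max(axis=1) for G in groups], axis=1)
```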

22 Each group $G_j$ should specify a set of simple cells that are, in some sense, similar to one another. [sent-66, score-0.565]

23 In convolutional neural networks [12], for instance, each group is hard-coded to include translated copies of the same filter, resulting in complex cell responses $c_j$ that are invariant to small translations. [sent-67, score-0.997]

24 Some algorithms [6, 3] fix the groups $G_j$ ahead of time and then optimize the simple cell filters D so that the simple cells in each group share a particular form of statistical dependence. [sent-68, score-1.145]

25 In our system, we will use linear correlation of simple cell responses, $\mathbb{E}[a_k a_l]$, as our similarity metric, and construct groups $G_j$ that combine similar features according to this metric. [sent-69, score-0.933]

26 To construct the groups G, we will use a version of single-link agglomerative clustering to combine sets of features that have low dissimilarity according to d(k, l). [sent-72, score-0.462]

27 To construct a single group $G_0$, we begin by choosing a random simple cell filter, say $D^{(k)}$, as the first member. [sent-73, score-0.61]

28 We then search for candidate cells to be added to the group by computing d(k, l) for each simple cell filter $D^{(l)}$, adding $D^{(l)}$ to the group if d(k, l) is less than some limit τ. [sent-74, score-1.078]

29 The algorithm then continues to expand $G_0$ by adding any additional simple cells that are closer than τ to any one of the simple cells already in the group. [sent-75, score-0.99]

30 This procedure continues until there are no more cells to be added, or until the diameter of the group (the dissimilarity between the two furthest cells in the group) reaches a limit ∆. [sent-76, score-0.961]
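
The following sketch implements this greedy single-link growth for one seed. Taking d(k, l) to be Euclidean distance between unit-norm filter columns is an assumption, though it is consistent with the correlation metric above (orthogonal filters then sit at distance √2):

```python
import numpy as np

def grow_group(D, seed, tau, delta):
    """Grow one pooling group G_0 from a seed filter, as described above.

    D : (n_pixels, n_filters) array with unit-norm filter columns.
    Adds any filter within tau of some current member (single link), and
    rejects a candidate that would push the group diameter past delta.
    """
    G = D.T @ D                                      # Gram matrix of filters
    dist = np.sqrt(np.maximum(0.0, 2.0 - 2.0 * G))   # d(k, l) for unit norms
    group = [seed]
    changed = True
    while changed:
        changed = False
        for l in range(D.shape[1]):
            if l in group:
                continue
            d_to_group = dist[l, group]
            # single link: close to ANY member; diameter: far from NONE
            if d_to_group.min() < tau and d_to_group.max() <= delta:
                group.append(l)
                changed = True
    return sorted(group)
```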

31 This procedure can be executed quite rapidly, in parallel, for a large number of randomly chosen simple cells acting as “seed” cells, thus allowing us to train many complex cells at once. [sent-77, score-1.119]

32 Compared to the simple cell learning procedure, the computational cost is extremely small even for our rudimentary implementation. [sent-78, score-0.556]

33 This ensures that we do not end up with many groups that pool over very few simple cells (and hence yield complex cells $c_j$ that are not especially invariant). [sent-82, score-1.209]

34 We choose $\Delta = 1.5 > \sqrt{2}$ so that a complex cell group may include orthogonal filters but cannot grow without limit. [sent-91, score-0.654]

35 Other popular algorithms [6, 5, 3] for learning complex-cell features are designed to identify X and Y as features to be pooled together due to the correlation in their energies (scales). [sent-97, score-0.332]

36 By pooling together the responses of $a_1$ and $a_2$, a complex cell is able to detect an edge of fixed orientation invariant to small translations. [sent-99, score-1.039]

37 This model also makes sense for higher-level invariances where X and Y do not merely represent responses of linear filters on image patches but feature responses in a deep network. [sent-100, score-0.594]

38 Our combination of the simple cell and complex cell learning algorithms above tends to learn this same type of invariance. [sent-102, score-1.124]

39 From this picture it is clear what a subsequent application of our single-link clustering algorithm will do: it will try to string together the centroids around the “belt” that forms the invariant subspace and avoid connecting them to the (distant) centroids at the poles. [sent-106, score-0.395]

40 Max-pooling over the responses of these filters will result in a complex cell that responds consistently to points in the X–Y plane, but not in the Z direction; that is, we end up with an invariant feature detector very similar to those constructed by existing methods. [sent-107, score-0.932]

41 (c) A pooling unit identified by applying single-link clustering to the centroids (black links join pooled filters). [sent-114, score-0.285]

42 4 Feature Hierarchy Now that we have defined our simple and complex cell learning algorithms, we can use them to train alternating layers of selective and invariant features. [sent-117, score-1.05]

43 Our first layer of simple cell features is locally connected to 16 non-overlapping 8-by-8 pixel patches within the 32-by-32 pixel image. [sent-123, score-1.005]

44 These features are trained by building a dataset of 8-by-8 patches and passing them to our simple cell learning procedure to train 6400 first-layer filters $D \in \mathbb{R}^{64 \times 6400}$. [sent-124, score-0.955]
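
A minimal sketch of that training step follows, assuming the spherical (projection-weighted) K-means variant familiar from prior feature-learning work; the extract itself only says "K-means", so the exact update rule is an assumption:

```python
import numpy as np

def learn_filters_kmeans(X, n_filters=6400, n_iters=10, seed=0):
    """Learn unit-norm filters D from whitened 8x8 patches with K-means.

    X : (n_samples, 64) normalized, whitened patches.
    Returns D with shape (64, n_filters), one filter per column.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[1], n_filters))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iters):
        A = X @ D                                # projections onto filters
        assign = np.argmax(np.abs(A), axis=1)    # hard assignment
        for j in range(n_filters):
            members = assign == j
            if not members.any():
                continue                         # leave empty clusters alone
            # new centroid: projection-weighted sum of assigned patches
            D[:, j] = X[members].T @ A[members, j]
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)
    return D
```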

45 We apply our complex cell learning procedure to this bank of filters to find 128 pooling groups $G_1, G_2, \ldots, G_{128}$. [sent-125, score-0.812]

46 Using these results, we can extract our simple cell and complex cell features from each 8-by-8 pixel subpatch of the 32-by-32 image. [sent-129, score-1.295]

47 Specifically, the linear filters $D^{(i)}$ are used to extract the first-layer simple cell responses $s_i^{(p)} = g(D^{(i)\top} x^{(p)})$, where $x^{(p)}$, $p = 1, \ldots, 16$, are the 16 subpatches. [sent-130, score-0.742]

48 We then compute the complex cell feature responses $c_j^{(p)} = \max_{k \in G_j} s_k^{(p)}$ for each patch. [sent-133, score-0.821]

49 Once complete, we have an array of 128-by-4-by-4 = 2048 complex cell responses c representing each 32-by-32 image. [sent-134, score-0.722]

50 These responses are then used to form a new dataset from which to learn a second layer of simple cells with K-means. [sent-135, score-0.801]

51 We denote the second layer of learned filters as $\bar{D}$, and the second-layer simple cell responses as $\bar{s} = \bar{D}^{\top} c$. [sent-137, score-0.89]

52 Applying our complex cell learning procedure again to $\bar{D}$, we obtain pooling groups $\bar{G}$ and complex cells $\bar{c}$, defined analogously. [sent-138, score-1.35]
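
Taken together, the forward pass through the two-layer hierarchy can be sketched as follows, reusing the hypothetical helpers above. The 16 subpatches, 128 first-layer groups, and 2048-dimensional pooled representation follow the numbers in the text; the absence of a second-layer nonlinearity mirrors $\bar{s} = \bar{D}^{\top} c$ as written, which may differ from the full system:

```python
import numpy as np

def hierarchy_forward(image, D1, groups1, D2, groups2, g):
    """32x32 image -> second-layer complex cell responses c-bar.

    D1      : (64, n1) first-layer filters over 8x8 subpatches.
    groups1 : 128 first-layer pooling groups G_j.
    D2      : (128 * 16, n2) second-layer filters over pooled responses.
    groups2 : second-layer pooling groups G-bar.
    g       : elementwise first-layer activation function.
    """
    pooled = []
    for r in range(0, 32, 8):                 # 16 non-overlapping patches
        for col in range(0, 32, 8):
            x = image[r:r + 8, col:col + 8].reshape(-1)   # x^(p)
            s = g(D1.T @ x)                                # s^(p)
            pooled.append([s[G].max() for G in groups1])   # c^(p)
    c = np.asarray(pooled).reshape(-1)        # 128-by-4-by-4 = 2048 values
    s2 = D2.T @ c                             # s-bar = D-bar^T c
    return np.array([s2[G].max() for G in groups2])        # c-bar
```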

53 We cropped 57 million randomly selected 32-by-32 pixel patches from these images to form our unlabeled training set. [sent-143, score-0.486]

54 We will now visualize these features and check whether any of them have learned to identify an object class. [sent-146, score-0.405]

55 5.1 Low-Level Simple and Complex Cell Visualizations We visualize the learned low-level filters D and pooling groups G to verify that they are, in fact, similar to those learned by other well-known algorithms. [sent-148, score-0.368]

56 To visualize the learned complex cells we inspect the simple cell filters that belong to each of the pooling groups. [sent-152, score-1.366]

57 5.2 Higher-Level Simple and Complex Cells Finally, we inspect the learned higher-layer simple cell and complex cell features, $\bar{s}$ and $\bar{c}$, particularly to see whether any of them are selective for an object class. [sent-165, score-1.538]

58 The most commonly occurring object in these video thumbnails is human faces (even though we estimate that faces appear in far less than 1% of the data). [sent-166, score-0.521]

59 Thus we search through our learned features for cells that are selective for human faces at varying locations and scales. [sent-168, score-1.077]

60 To locate such features we use a dataset of labeled images: several hundred thousand non-face images as well as tens of thousands of known face images from the “Labeled Faces in the Wild” (LFW) dataset [4]. [sent-169, score-0.584]

61 To test whether any of the $\bar{s}$ simple cell features are selective for faces, we use each feature by itself as a “detector” on the labeled dataset: we compute the area under the precision-recall curve (AUC) obtained when each feature's response $\bar{s}_i$ is used as a simple classifier. [sent-170, score-1.043]
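
A minimal sketch of this screening step, assuming a matrix of feature responses on the labeled set and computing the precision-recall AUC as average precision (the extract does not specify the exact integration rule):

```python
import numpy as np

def pr_auc(scores, labels):
    """Area under the precision-recall curve (average precision) when a
    single feature's response is used as the classifier score.

    scores : (n_images,) feature responses; labels : 1 = face, 0 = non-face.
    """
    order = np.argsort(-scores)               # rank images by response
    y = labels[order].astype(float)
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float((precision * y).sum() / y.sum())

def rank_face_detectors(S, labels, k=16):
    """Rank features by how well each one alone detects faces."""
    aucs = np.array([pr_auc(S[:, j], labels) for j in range(S.shape[1])])
    top = np.argsort(-aucs)[:k]
    return top, aucs[top]
```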

62 Indeed, it turns out that there are a handful of high-level features that tend to be good detectors for faces. [sent-171, score-0.306]

63 We visualize 16 of the simple cell features identified by this procedure in Figure 4a along with a sampling of the image patches that activate the first of these cells strongly. [sent-173, score-1.412]

64 There it can be seen that these simple cells are selective for faces located at particular locations and scales. [sent-174, score-0.9]

65 Within each group the faces differ slightly due to the learned invariance provided by the complex cells in the lower layer (and thus the mean of each group of images is blurry). [sent-175, score-1.242]

66 Figure 3: (a) First layer simple cell filters learned by K-means. [sent-191, score-0.659]

67 (b) Sets of simple cell filters belonging to three pooling groups learned by our complex cell training algorithm. [sent-192, score-1.348]

68 (c) Precision-Recall curves showing selectivity for human faces of 5 low-level simple cells trained from a full 32-by-32 patch (red curves, bottom) versus 5 higher-level simple cells (green curves, top). [sent-193, score-1.422]

69 It may appear that this result could be obtained by applying our simple cell learning procedure directly to full 32-by-32 images without any attempts at incorporating local invariance. [sent-195, score-0.579]

70 The lower curves in Figure 3c are the precision-recall curves for the best 5 simple cells found in this way. [sent-198, score-0.595]

71 Clearly the higher-level features are dramatically better detectors than simple cells built directly from pixels (only 64% AUC). [sent-199, score-0.731]

72 We visualize the higher-level features by averaging together the 100 unlabeled images from our YouTube dataset that elicit the strongest activation. [sent-201, score-0.458]

73 These simple cells were trained by applying K-means to normalized, whitened 32-by-32 pixel patches from a smaller unlabeled set known to have a higher concentration of faces. [sent-202, score-0.871]

74 Due to this, a handful of centroids look roughly like face exemplars and act as simple “template matchers”. [sent-203, score-0.269]

75 Table 1: Area under PR curve for different cells on our face detection validation set. AUC: best 32-by-32 simple cell, 64%; best in $\bar{s}$, 86%; best in $\bar{c}$, 80%; supervised linear SVM, 77%. [sent-205, score-1.014]

76 (a) A collection of patches from our unlabeled dataset that maximally activate one of the high-level simple cells from $\bar{s}$. [sent-208, score-0.867]

77 (b) The mean of the top stimuli for a handful of face-selective cells in $\bar{s}$. [sent-209, score-0.497]

78 (c) Visualization of the face-selective cells that belong to one of the complex cells in $\bar{c}$ discovered by the single-link clustering algorithm applied to $\bar{D}$. [sent-210, score-1.07]

79 (d) A collection of unlabeled patches that elicit a strong response from the complex cell visualized in (c) — virtually all are faces, at a variety of scales and positions. [sent-211, score-0.869]

80 On the other hand, it does not perform as well as the higher level simple cells learned by our system even though it is likely the best possible linear detector. [sent-216, score-0.627]

81 Finally, we inspect the higher-level complex cells learned by applying the same agglomerative clustering procedure to the higher-level simple cell filters. [sent-217, score-1.318]

82 Due to the invariance introduced at the lower layers, two simple cells that detect faces at slightly different locations or scales will often have very similar filter weights, and thus we expect our algorithm to find and combine these simple cells into higher-level invariant features. [sent-218, score-1.724]

83 To visualize our higher-level complex cell features $\bar{c}$, we can simply look at the visualizations for all of the simple cells in each of the groups $\bar{G}$. [sent-219, score-1.419]

84 These visualizations show us the set of patches that strongly activate each simple cell, and hence also activate the complex cell. [sent-220, score-0.565]

85 The results of such a visualization for one group that was found to contain only face-selective cells are shown in Figure 4c. [sent-221, score-0.497]

86 A sampling of image patches collected from the unlabeled data that strongly activate the corresponding complex cell is shown in Figure 4d. [sent-223, score-0.945]

87 We see that the complex cell detects many faces but at a much wider variety of positions and scales compared to the simple cells, demonstrating that even “higher level” invariances are being captured, including scale invariance. [sent-224, score-0.95]

88 Benchmarked on our labeled set, this complex cell achieves 80.0% AUC. [sent-225, score-0.676]

89 This is somewhat worse than the very best simple cells, but still among the top 10 performing cells in the entire network. [sent-226, score-0.495]

90 These results suggest that the basic notions of invariance and selectivity that underpin popular feature learning algorithms may be sufficient to discover the kinds of high-level features that we desire, possibly including whole object classes robust to local and global variations. [sent-229, score-0.52]

91 Indeed, using simple implementations of selective and invariant features closely related to existing algorithms, we have found that it is possible to build features with high selectivity for a coherent, commonly occurring object class. [sent-230, score-0.88]

92 Though human faces occur only very rarely in our very large dataset, it is clear that the complex cell visualized in Figure 4d is adept at spotting them amongst tens of millions of images. [sent-231, score-0.877]

93 The advantage of this criterion is that it can determine which features to pool together even when the simple cell filters are orthogonal (where they would be too far apart for our algorithm to recognize their relationship). [sent-235, score-0.694]

94 This in turn allows us to construct features invariant to shifts along the manifold with little effort. [sent-242, score-0.383]

95 One of our key results, the unsupervised discovery of features selective for human faces, is fairly unique (though seen recently in the extremely large system of [11]). [sent-244, score-0.702]

96 For instance, [13] trained Deep Belief Network models that decomposed object classes like faces and cars into parts using a probabilistic max-pooling to gain translation invariance. [sent-246, score-0.368]

97 Even if we do not use the labels themselves, the labeled examples are, by construction, highly clustered: faces will be separated from other objects because there are no partial faces or random clutter. [sent-253, score-0.536]

98 Specifically, we found that higher level simple cells could learn to detect human faces without any supervision at all, and that our complex-cell learning procedure combined these into even higher-level invariances. [sent-265, score-0.875]

99 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-293, score-0.272]

100 Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. [sent-298, score-0.273]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cell', 0.443), ('cells', 0.427), ('filters', 0.277), ('faces', 0.222), ('invariant', 0.158), ('patches', 0.154), ('features', 0.153), ('complex', 0.141), ('selective', 0.141), ('responses', 0.138), ('pooling', 0.129), ('object', 0.111), ('unlabeled', 0.1), ('auc', 0.097), ('invariance', 0.096), ('youtube', 0.094), ('layer', 0.093), ('labeled', 0.092), ('detectors', 0.083), ('centroids', 0.081), ('thumbnails', 0.076), ('filter', 0.075), ('clustering', 0.075), ('gj', 0.075), ('activate', 0.072), ('group', 0.07), ('handful', 0.07), ('groups', 0.069), ('images', 0.068), ('simple', 0.068), ('cropped', 0.066), ('agglomerative', 0.066), ('supervision', 0.062), ('visualize', 0.06), ('unsupervised', 0.058), ('visualizations', 0.058), ('train', 0.056), ('discover', 0.056), ('learned', 0.055), ('selectivity', 0.052), ('feature', 0.052), ('million', 0.051), ('curves', 0.05), ('face', 0.05), ('cj', 0.047), ('invariances', 0.047), ('pixel', 0.047), ('dataset', 0.046), ('system', 0.046), ('extremely', 0.045), ('occurring', 0.044), ('inspect', 0.043), ('layers', 0.043), ('manifold', 0.043), ('locations', 0.042), ('hyv', 0.042), ('rinen', 0.042), ('whitened', 0.04), ('coding', 0.04), ('belt', 0.038), ('contractive', 0.038), ('isocontour', 0.038), ('rifai', 0.038), ('respond', 0.037), ('dissimilarity', 0.037), ('human', 0.037), ('gabor', 0.036), ('patch', 0.036), ('image', 0.035), ('harvested', 0.035), ('trained', 0.035), ('scalable', 0.035), ('emergence', 0.034), ('tens', 0.034), ('combine', 0.033), ('pictured', 0.033), ('elicit', 0.031), ('plane', 0.031), ('though', 0.031), ('deep', 0.03), ('pool', 0.03), ('detect', 0.03), ('bank', 0.03), ('wild', 0.03), ('construct', 0.029), ('scale', 0.029), ('topographic', 0.029), ('learn', 0.029), ('patterns', 0.028), ('ak', 0.028), ('sensitive', 0.028), ('identified', 0.027), ('thousand', 0.027), ('vastly', 0.027), ('concepts', 0.027), ('clutter', 0.026), ('identify', 0.026), ('curve', 0.026), ('ica', 0.026), ('vincent', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 62 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

2 1.0000004 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

3 0.20606898 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1

4 0.20006442 23 nips-2012-A lattice filter model of the visual pathway

Author: Karol Gregor, Dmitri B. Chklovskii

Abstract: Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina → lateral geniculate nucelus (LGN) → primary visual cortex, V1) we propose to model its function using lattice filters - signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals. In addition, lattice filters can model visual processing in insects. Therefore, lattice filter is a useful abstraction that captures temporal aspects of visual processing. Our sensory organs face an ongoing barrage of stimuli from the world and must transmit as much information about them as possible to the rest of the brain [1]. This is a formidable task because, in sensory modalities such as vision, the dynamic range of natural stimuli (more than three orders of magnitude) greatly exceeds the dynamic range of relay neurons (less than two orders of magnitude) [2]. The reason why high fidelity transmission is possible at all is that the continuity of objects in the physical world leads to correlations in natural stimuli, which imply redundancy. In turn, such redundancy can be eliminated by compression performed by the front end of the visual system leading to the reduction of the dynamic range [3, 4]. A compression strategy appropriate for redundant natural stimuli is called predictive coding [5, 6, 7]. In predictive coding, a prediction of the incoming signal value is computed from past values delayed in the circuit. This prediction is subtracted from the actual signal value and only the prediction error is transmitted. In the absence of transmission noise such compression is lossless as the original signal could be decoded on the receiving end by inverting the encoder. If predictions are accurate, the dynamic range of the error is much smaller than that of the natural stimuli. Therefore, minimizing dynamic range using predictive coding reduces to optimizing prediction. Experimental support for viewing the front end of the visual system as a predictive encoder comes from the measurements of receptive fields [6, 7]. In particular, predictive coding suggests that, for natural stimuli, the temporal receptive fields should be biphasic and the spatial receptive fields center-surround. These predictions are born out by experimental measurements in retinal ganglion cells, [8], lateral geniculate nucleus (LGN) neurons [9] and fly second order visual neurons called large monopolar cells (LMCs) [2]. In addition, the experimentally measured receptive fields vary with signal-to-noise ratio as would be expected from optimal prediction theory [6]. Furthermore, experimentally observed whitening of the transmitted signal [10] is consistent with removing correlated components from the incoming signals [11]. As natural stimuli contain correlations on time scales greater than hundred milliseconds, experimentally measured receptive fields of LGN neurons are equally long [12]. Decorrelation over such long time scales requires equally long delays. 
How can such extended receptive field be produced by 1 biological neurons and synapses whose time constants are typically less than hundred milliseconds [13]? The field of signal processing offers a solution to this problem in the form of a device called a lattice filter, which decorrelates signals in stages, sequentially adding longer and longer delays [14, 15, 16, 17]. Motivated by the cascade structure of visual systems [18], we propose to model decorrelation in them by lattice filters. Naturally, visual systems are more complex than lattice filters and perform many other operations. However, we show that the lattice filter model explains several existing observations in vertebrate and invertebrate visual systems and makes testable predictions. Therefore, we believe that lattice filters provide a convenient abstraction for modeling temporal aspects of visual processing. This paper is organized as follows. First, we briefly summarize relevant results from linear prediction theory. Second, we explain the operation of the lattice filter in discrete and continuous time. Third, we compare lattice filter predictions with physiological measurements. 1 Linear prediction theory Despite the non-linear nature of neurons and synapses, the operation of some neural circuits in vertebrates [19] and invertebrates [20] can be described by a linear systems theory. The advantage of linear systems is that optimal circuit parameters may be obtained analytically and the results are often intuitively clear. Perhaps not surprisingly, the field of signal processing relies heavily on the linear prediction theory, offering a convenient framework [15, 16, 17]. Below, we summarize the results from linear prediction that will be used to explain the operation of the lattice filter. Consider a scalar sequence y = {yt } where time t = 1, . . . , n. Suppose that yt at each time point depends on side information provided by vector zt . Our goal is to generate a series of linear predictions, yt from the vector zt , yt = w · zt . We define a prediction error as: ˆ ˆ et = yt − yt = yt − w · zt ˆ (1) and look for values of w that minimize mean squared error: e2 = 1 nt e2 = t t 1 nt (yt − w · zt )2 . (2) t The weight vector, w is optimal for prediction of sequence y from sequence z if and only if the prediction error sequence e = y − w · z is orthogonal to each component of vector z: ez = 0. (3) When the whole series y is given in advance, i.e. in the offline setting, these so-called normal equations can be solved for w, for example, by Gaussian elimination [21]. However, in signal processing and neuroscience applications, another setting called online is more relevant: At every time step t, prediction yt must be made using only current values of zt and w. Furthermore, after a ˆ prediction is made, w is updated based on the prediction yt and observed yt , zt . ˆ In the online setting, an algorithm called stochastic gradient descent is often used, where, at each time step, w is updated in the direction of negative gradient of e2 : t w →w−η w (yt − w · zt ) 2 . (4) This leads to the following weight update, known as least mean square (LMS) [15], for predicting sequence y from sequence z: w → w + ηet zt , (5) where η is the learning rate. The value of η represents the relative influence of more recent observations compared to more distant ones. The larger the learning rate the faster the system adapts to recent observations and less past it remembers. 
In this paper, we are interested in predicting a current value xt of sequence x from its past values xt−1 , . . . , xt−k restricted by the prediction order k > 0: xt = wk · (xt−1 , . . . , xt−k )T . ˆ 2 (6) This problem is a special case of the online linear prediction framework above, where yt = xt , zt = (xt−1 , . . . , xt−k )T . Then the gradient update is given by: w → wk + ηet (xt−1 , . . . , xt−k )T . (7) While the LMS algorithm can find the weights that optimize linear prediction (6), the filter wk has a long temporal extent making it difficult to implement with neurons and synapses. 2 Lattice filters One way to generate long receptive fields in circuits of biological neurons is to use a cascade architecture, known as the lattice filter, which calculates optimal linear predictions for temporal sequences and transmits prediction errors [14, 15, 16, 17]. In this section, we explain the operation of a discrete-time lattice filter, then adapt it to continuous-time operation. 2.1 Discrete-time implementation The first stage of the lattice filter, Figure 1, calculates the error of the first order optimal prediction (i.e. only using the preceding element of the sequence), the second stage uses the output of the first stage and calculates the error of the second order optimal prediction (i.e. using only two previous values) etc. To make such stage-wise error computations possible the lattice filter calculates at every stage not only the error of optimal prediction of xt from past values xt−1 , . . . , xt−k , called forward error, ftk = xt − wk · (xt−1 , . . . , xt−k )T , (8) but, perhaps non-intuitively, also the error of optimal prediction of a past value xt−k from the more recent values xt−k+1 , . . . , xt , called backward error: bk = xt−k − w k · (xt−k+1 , . . . , xt )T , t k where w and w k (9) are the weights of the optimal prediction. For example, the first stage of the filter calculates the forward error ft1 of optimal prediction of xt from xt−1 : ft1 = xt − u1 xt−1 as well as the backward error b1 of optimal prediction of xt−1 from t xt : b1 = xt−1 − v 1 xt , Figure 1. Here, we assume that coefficients u1 and v 1 that give optimal linear t prediction are known and return to learning them below. Each following stage of the lattice filter performs a stereotypic operation on its inputs, Figure 1. The k-th stage (k > 1) receives forward, ftk−1 , and backward, bk−1 , errors from the previous stage, t delays backward error by one time step and computes a forward error: ftk = ftk−1 − uk bk−1 t−1 (10) of the optimal linear prediction of ftk−1 from bk−1 . In addition, each stage computes a backward t−1 error k−1 k bt = bt−1 − v k ftk−1 (11) of the optimal linear prediction of bk−1 from ftk−1 . t−1 As can be seen in Figure 1, the lattice filter contains forward prediction error (top) and backward prediction error (bottom) branches, which interact at every stage via cross-links. Operation of the lattice filter can be characterized by the linear filters acting on the input, x, to compute forward or backward errors of consecutive order, so called prediction-error filters (blue bars in Figure 1). Because of delays in the backward error branch the temporal extent of the filters grows from stage to stage. In the next section, we will argue that prediction-error filters correspond to the measurements of temporal receptive fields in neurons. 
For detailed comparison with physiological measurements we will use the result that, for bi-phasic prediction-error filters, such as the ones in Figure 1, the first bar of the forward prediction-error filter has larger weight, by absolute value, than the combined weights of the remaining coefficients of the corresponding filter. Similarly, in backward predictionerror filters, the last bar has greater weight than the rest of them combined. This fact arises from the observation that forward prediction-error filters are minimum phase, while backward predictionerror filters are maximum phase [16, 17]. 3 Figure 1: Discrete-time lattice filter performs stage-wise computation of forward and backward prediction errors. In the first stage, the optimal prediction of xt from xt−1 is computed by delaying the input by one time step and multiplying it by u1 . The upper summation unit subtracts the predicted xt from the actual value and outputs prediction error ft1 . Similarly, the optimal prediction of xt−1 from xt is computed by multiplying the input by v 1 . The lower summation unit subtracts the optimal prediction from the actual value and outputs backward error b1 . In each following stage t k, the optimal prediction of ftk−1 from bk−1 is computed by delaying bk−1 by one time step and t t multiplying it by uk . The upper summation unit subtracts the prediction from the actual ftk−1 and outputs prediction error ftk . Similarly, the optimal prediction of bk−1 from ftk−1 is computed by t−1 multiplying it by uk . The lower summation unit subtracts the optimal prediction from the actual value and outputs backward error bk . Black connections have unitary weights and red connections t have learnable negative weights. One can view forward and backward error calculations as applications of so-called prediction-error filters (blue) to the input sequence. Note that the temporal extent of the filters gets longer from stage to stage. Next, we derive a learning rule for finding optimal coefficients u and v in the online setting. The uk is used for predicting ftk−1 from bk−1 to obtain error ftk . By substituting yt = ftk−1 , zt = bk−1 and t−1 t−1 et = ftk into (5) the update of uk becomes uk → uk + ηftk bk−1 . t−1 (12) Similarly, v k is updated by v k → v k + ηbk ftk−1 . (13) t Interestingly, the updates of the weights are given by the product of the activities of outgoing and incoming nodes of the corresponding cross-links. Such updates are known as Hebbian learning rules thought to be used by biological neurons [22, 23]. Finally, we give a simple proof that, in the offline setting when the entire sequence x is known, f k and bk , given by equations (10, 11), are indeed errors of optimal k-th order linear prediction. Let D be one step time delay operator (Dx)t = xt−1 . The induction statement at k is that f k and bk are k-th order forward and backward errors of optimal linear prediction which is equivalent to f k and bk k k being of the form f k = x−w1 Dx−. . .−wk Dk x and bk = Dk x−w1k Dk−1 x−. . .−wkk x and, from k i normal equations (3), satisfying f D x = 0 and Dbk Di x = bk Di−1 x = 0 for i = 1, . . . , k. That this is true for k = 1 directly follows from the definition of f 1 and b1 . Now we assume that this is true for k − 1 ≥ 1 and show it is true for k. It is easy to see from the forms of f k−1 and bk−1 k k and from f k = f k−1 − uk Dbk−1 that f k has the correct form f k = x − w1 Dx − . . . − wk Dk x. k i k−1 k k−1 Regarding orthogonality for i = 1, . . . 
, k − 1 we have f D x = (f − u Db )Di x = f k−1 Di x − uk (Dbk−1 )Di x = 0 using the induction assumptions of orhogonality at k − 1. For the remaining i = k we note that f k is the error of the optimal linear prediction of f k−1 from Dbk−1 k−1 and therefore 0 = f k Dbk−1 = f k (Dk x − w1k−1 Dk−1 x − . . . + wk−1 Dx) = f k Dk x as desired. The bk case can be proven similarly. 2.2 Continuous-time implementation The last hurdle remaining for modeling neuronal circuits which operate in continuous time with a lattice filter is its discrete-time operation. To obtain a continuous-time implementation of the lattice 4 filter we cannot simply take the time step size to zero as prediction-error filters would become infinitesimally short. Here, we adapt the discrete-time lattice filter to continous-time operation in two steps. First, we introduce a discrete-time Laguerre lattice filter [24, 17] which uses Laguerre polynomials rather than the shift operator to generate its basis functions, Figure 2. The input signal passes through a leaky integrator whose leakage constant α defines a time-scale distinct from the time step (14). A delay, D, at every stage is replaced by an all-pass filter, L, (15) with the same constant α, which preserves the magnitude of every Fourier component of the input but shifts its phase in a frequency dependent manner. Such all-pass filter reduces to a single time-step delay when α = 0. The optimality of a general discrete-time Laguerre lattice filter can be proven similarly to that for the discrete-time filter, simply by replacing operator D with L in the proof of section 2.1. Figure 2: Continuous-time lattice filter using Laguerre polynomials. Compared to the discretetime version, it contains a leaky integrator, L0 ,(16) and replaces delays with all-pass filters, L, (17). Second, we obtain a continuous-time formulation of the lattice filter by replacing t − 1 → t − δt, defining the inverse time scale γ = (1 − α)/δt and taking the limit δt → 0 while keeping γ fixed. As a result L0 and L are given by: Discrete time L0 (x)t L(x)t Continuous time = αL0 (x)t−1 + xt (14) = α(L(x)t−1 − xt ) + xt−1 (15) dL0 (x)/dt = −γL0 (x) + x L(x) = x − 2γL0 (x) (16) (17) Representative impulse responses of the continuous Laguerre filter are shown in Figure 2. Note that, similarly to the discrete-time case, the area under the first (peak) phase is greater than the area under the second (rebound) phase in the forward branch and the opposite is true in the backward branch. Moreover, the temporal extent of the rebound is greater than that of the peak not just in the forward branch like in the basic discrete-time implementation but also in the backward branch. As will be seen in the next section, these predictions are confirmed by physiological recordings. 3 Experimental evidence for the lattice filter in visual pathways In this section we demonstrate that physiological measurements from visual pathways in vertebrates and invertebrates are consistent with the predictions of the lattice filter model. For the purpose of modeling visual pathways, we identify summation units of the lattice filter with neurons and propose that neural activity represents forward and backward errors. In the fly visual pathway neuronal activity is represented by continuously varying graded potentials. In the vertebrate visual system, all neurons starting with ganglion cells are spiking and we identify their firing rate with the activity in the lattice filter. 
3.1 Mammalian visual pathway In mammals, visual processing is performed in stages. In the retina, photoreceptors synapse onto bipolar cells, which in turn synapse onto retinal ganglion cells (RGCs). RGCs send axons to the LGN, where they synapse onto LGN relay neurons projecting to the primary visual cortex, V1. In addition to this feedforward pathway, at each stage there are local circuits involving (usually inhibitory) inter-neurons such as horizontal and amacrine cells in the retina. Neurons of each class 5 come in many types, which differ in their connectivity, morphology and physiological response. The bewildering complexity of these circuits has posed a major challenge to visual neuroscience. Alonso et al. • Connections between LGN and Cortex J. Neurosci., June 1, 2001, 21(11):4002–4015 4009 Temporal Filter 1 0.5 0 -0.5 -1 RGC LGN 0 100 Time (ms) 200 Figure 7. Distribution of geniculate cells and simple cells with respect to the timing of their responses. The distribution of three parameters derived from impulse responses of geniculate and cortical neurons is shown. A, Peak time. B, Zero-crossing time. C, Rebound index. Peak time is the time with the strongest response in the first phase of the impulse response. Zero-crossing time is the time between the first and second phases. Rebound index is the area of the impulse response after the zero crossing divided by the area before the zero crossing. Only impulse responses with good signal to noise were included (Ͼ5 SD above baseline; n ϭ 169). Figure 3: Electrophysiologically measured temporal receptive fields get progressively longer along the cat visual pathway. Left: A cat LGN cell (red) has a longer receptive field than a corresponding RGC cell (blue) (adapted from [12] which also reports population data). Right (A,B): Extent of the temporal receptive fields of simple cells in cat V1 is greater than that of corresponding LGN cells as quantified by the peak (A) and zero-crossing (B) times. Right (C): In the temporal receptive fields of cat LGN and V1 cells the peak can be stronger or weaker than the rebound (adapted from [25]). simple cells and geniculate cells differed for all temporal parameters measured, there was considerable overlap between the distributions (Fig. 7). This overlap raises the following question: does connectivity depend on how well geniculate and cortical responses are matched with respect to time? For instance, do simple cells with fast subregions (early times to peak and early zero crossings) receive input mostly from geniculate cells with fast centers? Figure 8 illustrates the visual responses from a geniculate cell and a simple cell that were monosynaptically connected. A strong positive peak was observed in the correlogram (shown with a 10 msec time window to emphasize its short latency and fast rise time). In this case, an ON central subregion was well overlapped with an ON geniculate center (precisely at the peak of the subregion). Moreover, the timings of the visual responses from the overlapped subregion and the geniculate center were very similar (same onset, ϳ0 –25 msec; same peak, ϳ25–50 msec). It is worth noting that the two central subregions of the simple cell were faster and stronger than the two lateral subregions. The responses of the central subregions matched the timing of the geniculate center. In contrast, the timing of the lateral subregions resembled more closely the timing of the geniculate surround (both peaked at 25–50 msec). 
Unlike the example shown in Figure 8, a considerable number of geniculocortical pairs produced responses with different timing. For example, Figure 9 illustrates a case in which a geniculate center fully overlapped a strong simple-cell subregion of the same sign, but with slower timing (LGN onset, ϳ0 –25 msec; peak, ϳ25–50 msec; simple-cell onset, ϳ25–50 msec; peak, ϳ50 –75 msec). The cross-correlogram between this pair of neurons was flat, which indicates the absence of a monosynaptic connection (Fig. 9, top right). To examine the role of timing in geniculocortical connectivity, we measured the response time course from all cell pairs that met two criteria. First, the geniculate center overlapped a simple-cell subregion of the same sign (n ϭ 104). Second, the geniculate center overlapped the cortical subregion in a near-optimal position (relative overlap Ͼ 50%, n ϭ 47; see Materials and Methods; Fig. 5A). All these cell pairs had a high probability of being monosynaptically connected because of the precise match in receptive-field position and sign (31 of 47 were connected). The distributions of peak time, zero-crossing time, and rebound index from these cell pairs were very similar to the distributions from the entire sample (Fig. 7; see also Fig. 10 legend). The selected cell pairs included both presumed directional (predicted DI Ͼ 0.3, see Materials and Methods; 12/20 connected) and nondirectional (19/27 connected) simple cells. Most geniculate cells had small receptive fields (less than two simple-cell subregion widths; see Receptive-field sign), although five cells with larger receptive fields were also included (three connected). From the 47 cell pairs used in this analysis, those with similar response time courses had a higher probability of being connected (Fig. 10). In particular, cell pairs that had both similar peak time and zero-crossing time were all connected (n ϭ 12; Fig. 10 A). Directionally selective simple cells were included in all timing groups. For example, in Figure 10 A there were four, five, two, and one directionally selective cells in the time groups Ͻ20, 40, 60, and Ͼ60 msec, respectively. Similar results were obtained if we restricted our sample to geniculate centers overlapped with the dominant subregion of the simple cell (n ϭ 31). Interestingly, the efficacy and contributions of the connections seemed to depend little on the relative timing of the visual responses (Fig. 10, right). Although our sample of them was quite small, lagged cells are of considerable interest and therefore deserve comment. We recorded from 13 potentially lagged LGN cells whose centers were superimposed with a simple-cell subregion (eight with rebound indices between 1.2 and 1.5; five with rebound indices Ͼ1.9). Only seven of these pairs could be used for timing comparisons (in one pair the baseline of the correlogram had insufficient spikes; in three pairs the geniculate receptive fields were Here, we point out several experimental observations related to temporal processing in the visual system consistent with the lattice filter model. First, measurements of temporal receptive fields demonstrate that they get progressively longer at each consecutive stage: i) LGN neurons have longer receptive fields than corresponding pre-synaptic ganglion cells [12], Figure 3left; ii) simple cells in V1 have longer receptive fields than corresponding pre-synaptic LGN neurons [25], Figure 3rightA,B. 
These observation are consistent with the progressively greater temporal extent of the prediction-error filters (blue plots in Figure 2). Second, the weight of the peak (integrated area under the curve) may be either greater or less than that of the rebound both in LGN relay cells [26] and simple cells of V1 [25], Figure 3right(C). Neurons with peak weight exceeding that of rebound are often referred to as non-lagged while the others are known as lagged found both in cat [27, 28, 29] and monkey [30]. The reason for this becomes clear from the response to a step stimulus, Figure 4(top). By comparing experimentally measured receptive fields with those of the continuous lattice filter, Figure 4, we identify non-lagged neurons with the forward branch and lagged neurons with the backward branch. Another way to characterize step-stimulus response is whether the sign of the transient is the same (non-lagged) or different (lagged) relative to sustained response. Third, measurements of cross-correlation between RGCs and LGN cell spikes in lagged and nonlagged neurons reveals a difference of the transfer function indicative of the difference in underlying circuitry [30]. This is consistent with backward pathway circuit of the Laguerre lattice filter, Figure 2, being more complex then that of the forward path (which results in different transfer function). ” (or replacing ”more complex” with ”different”) Third, measurements of cross-correlation between RGCs and LGN cell spikes in lagged and nonlagged neurons reveals a difference of the transfer function indicative of the difference in underlying circuitry [31]. This is consistent with the backward branch circuit of the Laguerre lattice filter, Figure 2, being different then that of the forward branch (which results in different transfer function). In particular, a combination of different glutamate receptors such as AMPA and NMDA, as well as GABA receptors are thought to be responsible for observed responses in lagged cells [32]. However, further investigation of the corresponding circuitry, perhaps using connectomics technology, is desirable. Fourth, the cross-link weights of the lattice filter can be learned using Hebbian rules, (12,13) which are biologically plausible [22, 23]. Interestingly, if these weights are learned sequentially, starting from the first stage, they do not need to be re-learned when additional stages are added or learned. This property maps naturally on the fact that in the course of mammalian development the visual pathway matures in a stage-wise fashion - starting with the retina, then LGN, then V1 - and implying that the more peripheral structures do not need to adapt to the maturation of the downstream ones. 6 Figure 4: Comparison of electrophysiologically measured responses of cat LGN cells with the continuous-time lattice filter model. Top: Experimentally measured temporal receptive fields and step-stimulus responses of LGN cells (adapted from [26]). Bottom: Typical examples of responses in the continuous-time lattice filter model. Lattice filter coefficients were u1 = v 1 = 0.4, u2 = v 2 = 0.2 and 1/γ = 50ms to model the non-lagged cell and u1 = v 1 = u2 = v 2 = 0.2 and 1/γ = 60ms to model the lagged cell. To model photoreceptor contribution to the responses, an additional leaky integrator L0 was added to the circuit of Figure 2. While Hebbian rules are biologically plausible, one may get an impression from Figure 2 that they must apply to inhibitory cross-links. 
We point out that this circuit is meant to represent only the computation performed rather than the specific implementation in terms of neurons. As the same linear computation can be performed by circuits with a different arrangement of the same components, there are multiple implementations of the lattice filter. For example, activity of non-lagged OFF cells may be seen as representing minus forward error. Then the cross-links between the non-lagged OFF pathway and the lagged ON pathway would be excitatory. In general, classification of cells into lagged and non-lagged seems independent of their ON/OFF and X/Y classification [31, 28, 29], but see[33]. 3.2 Insect visual pathway In insects, two cell types, L1 and L2, both post-synaptic to photoreceptors play an important role in visual processing. Physiological responses of L1 and L2 indicate that they decorrelate visual signals by subtracting their predictable parts. In fact, receptive fields of these neurons were used as the first examples of predictive coding in neuroscience [6]. Yet, as the numbers of synapses from photoreceptors to L1 and L2 are the same [34] and their physiological properties are similar, it has been a mystery why insects, have not just one but a pair of such seemingly redundant neurons per facet. Previously, it was suggested that L1 and L2 provide inputs to the two pathways that map onto ON and OFF pathways in the vertebrate retina [35, 36]. Here, we put forward a hypothesis that the role of L1 and L2 in visual processing is similar to that of the two branches of the lattice filter. We do not incorporate the ON/OFF distinction in the effectively linear lattice filter model but anticipate that such combined description will materialize in the future. As was argued in Section 2, in forward prediction-error filters, the peak has greater weight than the rebound, while in backward prediction-error filters the opposite is true. Such difference implies that in response to a step-stimulus the signs of sustained responses compared to initial transients are different between the branches. Indeed, Ca2+ imaging shows that responses of L1 and L2 to step-stimulus are different as predicted by the lattice filter model [35], Figure 5b. Interestingly, the activity of L1 seems to represent minus forward error and L2 - plus backward error, suggesting that the lattice filter cross-links are excitatory. To summarize, the predictions of the lattice filter model seem to be consistent with the physiological measurements in the fly visual system and may help understand its operation. 7 Stimulus 1 0.5 0 0 5 10 15 20 5 10 15 20 5 10 time 15 20 − Forward Error 1 0 −1 0 Backward Error 1 0 −1 0 Figure 5: Response of the lattice filter and fruit fly LMCs to a step-stimulus. Left: Responses of the first order discrete-time lattice filter to a step stimulus. Right: Responses of fly L1 and L2 cells to a moving step stimulus (adapted from [35]). Predicted and the experimentally measured responses have qualitatively the same shape: a transient followed by sustained response, which has the same sign for the forward error and L1 and the opposite sign for the backward error and L2. 4 Discussion Motivated by the cascade structure of the visual pathway, we propose to model its operation with the lattice filter. We demonstrate that the predictions of the continuous-time lattice filter model are consistent with the course of neural development and the physiological measurement in the LGN, V1 of cat and monkey, as well as fly LMC neurons. 
4 Discussion

Motivated by the cascade structure of the visual pathway, we propose to model its operation with the lattice filter. We demonstrate that the predictions of the continuous-time lattice filter model are consistent with the course of neural development and with physiological measurements in the LGN and V1 of cat and monkey, as well as in fly LMC neurons. Therefore, lattice filters may offer a useful abstraction for understanding aspects of temporal processing in the visual systems of vertebrates and invertebrates.

Previously, [11] proposed that lagged and non-lagged cells could result from rectification by spiking neurons. Although we agree with [11] that the LGN performs temporal decorrelation, our explanation does not rely on non-linear processing but rather on the cascade architecture and, hence, is fundamentally different. Our model generates the following predictions that are not obvious in [11]: i) not only are LGN receptive fields longer than those of RGCs, but V1 receptive fields are also longer than those of the LGN; ii) even a linear model can generate a difference in the peak/rebound ratio; iii) the circuit from RGC to LGN should be different for lagged and non-lagged cells, consistent with [31]; iv) the lattice filter circuit can self-organize using Hebbian rules, which gives a mechanistic explanation of receptive fields beyond the normative framework of [11].

In light of the redundancy reduction arguments given in the introduction, we note that, if the only goal of the system were to compress incoming signals using a given number of lattice filter stages, then after the compression is performed only one kind of prediction error, forward or backward, needs to be transmitted. Therefore, having two channels may seem redundant in the absence of noise. However, transmitting both forward and backward errors gives one the flexibility to continue decorrelation by adding further stages performing relatively simple operations.

We are grateful to D.A. Butts, E. Callaway, M. Carandini, D.A. Clark, J.A. Hirsch, T. Hu, S.B. Laughlin, D.N. Mastronarde, R.C. Reid, H. Rouault, A. Saul, L. Scheffer, F.T. Sommer, and X. Wang for helpful discussions.

References

[1] F. Rieke, D. Warland, R.R. van Steveninck, and W. Bialek. Spikes: Exploring the Neural Code. MIT Press, 1999.
[2] S.B. Laughlin. Matching coding, circuits, cells, and molecules to signals: general principles of retinal design in the fly's eye. Progress in Retinal and Eye Research, 13(1):165–196, 1994.
[3] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61(3):183, 1954.
[4] H. Barlow. Redundancy reduction revisited. Network: Computation in Neural Systems, 12(3):241–253, 2001.
[5] R.M. Gray. Linear Predictive Coding and the Internet Protocol. Now Publishers, 2010.
[6] M.V. Srinivasan, S.B. Laughlin, and A. Dubs. Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London. Series B. Biological Sciences, 216(1205):427–459, 1982.
[7] T. Hosoya, S.A. Baccus, and M. Meister. Dynamic predictive coding by the retina. Nature, 436:71, 2005.
[8] H.K. Hartline, H.G. Wagner, and E.F. MacNichol Jr. The peripheral origin of nervous activity in the visual system. In Studies on Excitation and Inhibition in the Retina: A Collection of Papers from the Laboratories of H. Keffer Hartline, page 99, 1974.
[9] N.A. Lesica, J. Jin, C. Weng, C.I. Yeh, D.A. Butts, G.B. Stanley, and J.M. Alonso. Adaptation to stimulus contrast and correlations during natural visual stimulation. Neuron, 55(3):479–491, 2007.
[10] Y. Dan, J.J. Atick, and R.C. Reid. Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. The Journal of Neuroscience, 16(10):3351–3362, 1996.
[11] D.W. Dong and J.J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6(3):345–358, 1995.
[12] X. Wang, J.A. Hirsch, and F.T. Sommer. Recoding of sensory information across the retinothalamic synapse. The Journal of Neuroscience, 30(41):13567–13577, 2010.
[13] C. Koch. Biophysics of Computation: Information Processing in Single Neurons. Oxford University Press, 2005.
[14] F. Itakura and S. Saito. On the optimum quantization of feature parameters in the PARCOR speech synthesizer. In Conference Record, 1972 International Conference on Speech Communication and Processing, Boston, MA, pages 434–437, 1972.
[15] B. Widrow and S.D. Stearns. Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, NJ, 1985.
[16] S. Haykin. Adaptive Filter Theory. Prentice-Hall, Englewood Cliffs, NJ, 2003.
[17] A.H. Sayed. Fundamentals of Adaptive Filtering. Wiley-IEEE Press, 2003.
[18] D.J. Felleman and D.C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991.
[19] X. Wang, F.T. Sommer, and J.A. Hirsch. Inhibitory circuits for visual processing in thalamus. Current Opinion in Neurobiology, 2011.
[20] S.B. Laughlin, J. Howard, and B. Blakeslee. Synaptic limitations to contrast coding in the retina of the blowfly Calliphora. Proceedings of the Royal Society of London. Series B. Biological Sciences, 231(1265):437–467, 1987.
[21] D.C. Lay. Linear Algebra and Its Applications. Addison-Wesley/Longman, New York/London, 2000.
[22] D.O. Hebb. The Organization of Behavior: A Neuropsychological Theory. Lawrence Erlbaum, 2002.
[23] O. Paulsen and T.J. Sejnowski. Natural patterns of activity and long-term synaptic plasticity. Current Opinion in Neurobiology, 10(2):172–180, 2000.
[24] Z. Fejzo and H. Lev-Ari. Adaptive Laguerre-lattice filters. IEEE Transactions on Signal Processing, 45(12):3006–3016, 1997.
[25] J.M. Alonso, W.M. Usrey, and R.C. Reid. Rules of connectivity between geniculate cells and simple cells in cat primary visual cortex. The Journal of Neuroscience, 21(11):4002–4015, 2001.
[26] D. Cai, G.C. DeAngelis, and R.D. Freeman. Spatiotemporal receptive field organization in the lateral geniculate nucleus of cats and kittens. Journal of Neurophysiology, 78(2):1045–1061, 1997.
[27] D.N. Mastronarde. Two classes of single-input X-cells in cat lateral geniculate nucleus. I. Receptive-field properties and classification of cells. Journal of Neurophysiology, 57(2):357–380, 1987.
[28] J. Wolfe and L.A. Palmer. Temporal diversity in the lateral geniculate nucleus of cat. Visual Neuroscience, 15(4):653–675, 1998.
[29] A.B. Saul and A.L. Humphrey. Spatial and temporal response properties of lagged and nonlagged cells in cat lateral geniculate nucleus. Journal of Neurophysiology, 64(1):206–224, 1990.
[30] A.B. Saul. Lagged cells in alert monkey lateral geniculate nucleus. Visual Neuroscience, 25:647–659, 2008.
[31] D.N. Mastronarde. Two classes of single-input X-cells in cat lateral geniculate nucleus. II. Retinal inputs and the generation of receptive-field properties. Journal of Neurophysiology, 57(2):381–413, 1987.
[32] P. Heggelund and E. Hartveit. Neurotransmitter receptors mediating excitatory input to cells in the cat lateral geniculate nucleus. I. Lagged cells. Journal of Neurophysiology, 63(6):1347–1360, 1990.
[33] J. Jin, Y. Wang, R. Lashgari, H.A. Swadlow, and J.M. Alonso. Faster thalamocortical processing for dark than light visual targets. The Journal of Neuroscience, 31(48):17471–17479, 2011.
[34] M. Rivera-Alba, S.N. Vitaladevuni, Y. Mischenko, Z. Lu, S. Takemura, L. Scheffer, I.A. Meinertzhagen, D.B. Chklovskii, and G.G. de Polavieja. Wiring economy and volume exclusion determine neuronal placement in the Drosophila brain. Current Biology, 21(23):2000–2005, 2011.
[35] D.A. Clark, L. Bursztyn, M.A. Horowitz, M.J. Schnitzer, and T.R. Clandinin. Defining the computational structure of the motion detector in Drosophila. Neuron, 70(6):1165–1177, 2011.
[36] M. Joesch, B. Schnell, S.V. Raghu, D.F. Reiff, and A. Borst. ON and OFF pathways in Drosophila motion vision. Nature, 468(7321):300–304, 2010.

5 0.18784043 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew Zaharia, J. Movshon, Eero P. Simoncelli

Abstract: Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. These filters cannot be estimated using spike-triggered averaging (STA). Subspace methods such as spike-triggered covariance (STC) can recover multiple filters, but require substantial amounts of data, and recover an orthogonal basis for the subspace in which the filters reside rather than the filters themselves. Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the first linear stage is a set of shifted (‘convolutional’) copies of a common filter, and the first nonlinear stage consists of rectifying scalar nonlinearities that are identical for all filter outputs. We refer to these initial LN elements as the ‘subunits’ of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data, and apply it to both simulated and real neuronal data from primate V1. The subunit model significantly outperforms STA and STC in terms of cross-validated accuracy and efficiency. 1
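As a sketch of the model structure this abstract describes, the following toy code pools identically rectified responses of shifted copies of one common filter. It is a 1-D stand-in under our own assumptions: the filter, pooling weights, stimulus, and function names are illustrative, and the paper's actual fitting procedure for spike data is not shown.

```python
import numpy as np

def subunit_response(stimulus, subunit_filter, pooling_weights):
    """LN-LN cascade: convolve a common filter across positions (shifted
    copies), rectify every copy identically, then pool linearly."""
    drive = np.convolve(stimulus, subunit_filter, mode="valid")
    rectified = np.maximum(drive, 0.0)   # same scalar nonlinearity for all subunits
    return float(pooling_weights @ rectified)

rng = np.random.default_rng(0)
stimulus = rng.standard_normal(64)                     # 1-D stand-in stimulus
subunit_filter = np.array([-1.0, 2.0, -1.0])           # one shared subunit filter
pooling_weights = np.ones(stimulus.size - subunit_filter.size + 1)
print(subunit_response(stimulus, subunit_filter, pooling_weights))
```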

6 0.18255523 193 nips-2012-Learning to Align from Scratch

7 0.16895367 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

8 0.15243404 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

9 0.14046772 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

10 0.13508891 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

11 0.12944047 195 nips-2012-Learning visual motion in recurrent neural networks

12 0.11458063 344 nips-2012-Timely Object Recognition

13 0.11310365 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

14 0.11198852 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

15 0.10792045 197 nips-2012-Learning with Recursive Perceptual Representations

16 0.090304255 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

17 0.08389464 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

18 0.079541147 271 nips-2012-Pointwise Tracking the Optimal Regression Function

19 0.077645481 8 nips-2012-A Generative Model for Parts-based Object Segmentation

20 0.077512644 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.227), (1, 0.088), (2, -0.421), (3, 0.071), (4, 0.22), (5, 0.005), (6, -0.01), (7, -0.232), (8, 0.091), (9, -0.112), (10, 0.015), (11, -0.069), (12, -0.451), (13, -0.069), (14, 0.011), (15, -0.191), (16, 0.032), (17, 0.075), (18, -0.063), (19, 0.349), (20, 0.031), (21, 0.202), (22, 0.018), (23, -0.031), (24, -0.023), (25, -0.003), (26, 0.023), (27, -0.149), (28, -0.135), (29, 0.002), (30, 0.051), (31, -0.07), (32, -0.027), (33, 0.097), (34, -0.066), (35, -0.012), (36, 0.029), (37, 0.009), (38, -0.085), (39, 0.017), (40, -0.026), (41, -0.018), (42, 0.029), (43, 0.004), (44, -0.01), (45, -0.009), (46, 0.04), (47, 0.004), (48, -0.026), (49, -0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98236334 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

2 0.98236334 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

3 0.69678515 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

Author: Brett Vintch, Andrew Zaharia, J. Movshon, Eero P. Simoncelli

Abstract: Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. These filters cannot be estimated using spike-triggered averaging (STA). Subspace methods such as spike-triggered covariance (STC) can recover multiple filters, but require substantial amounts of data, and recover an orthogonal basis for the subspace in which the filters reside rather than the filters themselves. Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the first linear stage is a set of shifted (‘convolutional’) copies of a common filter, and the first nonlinear stage consists of rectifying scalar nonlinearities that are identical for all filter outputs. We refer to these initial LN elements as the ‘subunits’ of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data, and apply it to both simulated and real neuronal data from primate V1. The subunit model significantly outperforms STA and STC in terms of cross-validated accuracy and efficiency. 1

4 0.68696636 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

5 0.63735533 23 nips-2012-A lattice filter model of the visual pathway

Author: Karol Gregor, Dmitri B. Chklovskii

Abstract: Early stages of visual processing are thought to decorrelate, or whiten, the incoming temporally varying signals. Motivated by the cascade structure of the visual pathway (retina → lateral geniculate nucleus (LGN) → primary visual cortex, V1) we propose to model its function using lattice filters - signal processing devices for stage-wise decorrelation of temporal signals. Lattice filter models predict neuronal responses consistent with physiological recordings in cats and primates. In particular, they predict temporal receptive fields of two different types resembling so-called lagged and non-lagged cells in the LGN. Moreover, connection weights in the lattice filter can be learned using Hebbian rules in a stage-wise sequential manner reminiscent of the neuro-developmental sequence in mammals. In addition, lattice filters can model visual processing in insects. Therefore, the lattice filter is a useful abstraction that captures temporal aspects of visual processing.

6 0.62153083 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

7 0.57925457 193 nips-2012-Learning to Align from Scratch

8 0.47871488 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

9 0.36602116 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

10 0.36469045 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

11 0.34623459 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

12 0.32245287 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

13 0.31316543 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

14 0.30126897 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

15 0.29701787 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

16 0.28869596 195 nips-2012-Learning visual motion in recurrent neural networks

17 0.28745705 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

18 0.28355104 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

19 0.26954925 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

20 0.26908889 344 nips-2012-Timely Object Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.04), (17, 0.31), (21, 0.036), (38, 0.096), (42, 0.027), (54, 0.027), (55, 0.044), (74, 0.085), (76, 0.147), (80, 0.055), (92, 0.061)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95360988 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

Author: C. M. Niu, Sirish Nandyala, Won J. Sohn, Terence Sanger

Abstract: Our central goal is to quantify the long-term progression of pediatric neurological diseases, such as a typical 10-15 year progression of child dystonia. For this purpose, quantitative models are convincing only if they can provide multi-scale details ranging from neuron spikes to limb biomechanics. The models also need to be evaluated in hyper-time, i.e. significantly faster than real-time, for producing useful predictions. We designed a platform with digital VLSI hardware for multi-scale hyper-time emulations of human motor nervous systems. The platform is constructed on a scalable, distributed array of Field Programmable Gate Array (FPGA) devices. All devices operate asynchronously with 1 millisecond time granularity, and the overall system is accelerated to 365x real-time. Each physiological component is implemented using models from well-documented studies and can be flexibly modified, so the validity of the emulation can easily be assessed by neurophysiologists and clinicians. To maximize emulation speed, all calculations are implemented in combinational logic instead of clocked iterative circuits. This paper presents the methodology of building FPGA modules emulating a monosynaptic spinal loop. Emulated activities are qualitatively similar to real human data. Also discussed is the rationale of approximating neural circuitry by organizing neurons with sparse interconnections. In conclusion, our platform allows emulating pathological abnormalities such that motor symptoms will emerge and can be analyzed. It compels us to test the origins of childhood motor disorders and predict their long-term progressions.

1 Challenges of studying developmental motor disorders

There is currently no quantitative model of how a neuropathological condition, which mainly affects the function of neurons, ends up causing the functional abnormalities identified in clinical examinations. The gap in knowledge is particularly evident for disorders in developing human nervous systems, i.e. childhood neurological diseases. In these cases, the ultimate clinical effect of cellular injury is compounded by a complex interplay among the child's injury, development, behavior, experience, plasticity, etc. Qualitative insight has been provided by clinical experience into the association between particular types of injury and particular types of outcome. Their quantitative linkages, nevertheless, have yet to be created – neither in the clinic nor in cellular physiological tests. This discrepancy is significantly more prominent for individual child patients, which makes it very difficult to estimate the efficacy of treatment plans. In order to understand the consequences of injury and discover new treatments, it is necessary to create a modeling toolset with certain design guidelines, such that child neurological diseases can be quantitatively analyzed.

Perhaps more than any other organ, the brain necessarily operates on multiple spatial and temporal scales. On the one hand, it is the neurons that perform fundamental computations, but neurons have to interact with large-scale organs (ears, eyes, skeletal muscles, etc.) to achieve global functions. This multi-scale nature deserves particular attention for injuries, where the overall deficits depend on both the cellular effects of the injury and its propagated consequences. On the other hand, neural processes in developmental diseases usually operate on drastically different time scales, e.g. spinal reflex in milliseconds versus learning in years.
Thus when studying motor nervous systems, mathematical modeling is convincing only if it can provide multi-scale details ranging from neuron spikes to limb biomechanics; the models should also be evaluated with time granularity as small as 1 millisecond, while the evaluation needs to continue for trillions of cycles in order to cover years of life. It is particularly challenging to describe the multi-scale nature of the human nervous system when modeling childhood movement disorders. Note that for a child who suffered brain injury at birth, the full development of all motor symptoms may easily take more than 10 years. Therefore the millisecond-based model needs to be evaluated significantly faster than real-time; otherwise the model will fail to produce any useful predictions in time.

We have implemented realistic models for spiking motoneurons, sensory neurons, neural circuitry, muscle fibers and proprioceptors using VLSI and programmable logic technologies. All models are computed in Field Programmable Gate Array (FPGA) hardware at 365 times real-time, so one year of disease progression can be assessed after one day of emulation. This paper presents the methodology of building the emulation platform. The results demonstrate that our platform is capable of producing physiologically realistic multi-scale signals, which are usually scarce in experiments. Successful emulations enabled by this platform will be used to verify theories of neuropathology. New treatment mechanisms and drug effects can also be emulated before animal experiments or clinical trials.

2 Methodology of multi-scale neural emulation

[Figure 1: Illustration of the multi-scale nature of the motor nervous system. A) Human arm. B) Monosynaptic spinal loop. C) Inner structure of the muscle spindle: bag1, bag2 and chain fibers receive dynamic and static gamma input and produce primary and secondary output.]

The motor part of the human nervous system is responsible for maintaining body postures and generating voluntary movements. The multi-scale nature of the motor nervous system is demonstrated in Fig.1. When the elbow (Fig.1A) is maintaining a posture or performing a movement, a force is established by the involved muscle based on how much spiking excitation the muscle receives from its α-motoneurons (Fig.1B). The α-motoneurons are regulated by a variety of sensory input, part of which comes directly from the proprioceptors in the muscle. As the primary proprioceptor found in skeletal muscles, a muscle spindle is another complex system that has its own microscopic Multiple-Input-Multiple-Output structure (Fig.1C). Spindles continuously provide information about the length and lengthening speed of the muscle fiber. A muscle with its regulating motoneurons, sensory neurons and proprioceptors constitutes a monosynaptic spinal loop. This minimalist neurophysiological structure is used as an example for explaining the multi-scale hyper-time emulation in hardware. Additional structures can be added to the backbone set-up using similar methodologies.

2.1 Modularized architecture for multi-scale models

Decades of studies on neurophysiology provided an abundance of models characterizing different components of the human motor nervous system. The informational characteristics of physiological components allowed us to model them as functional structures, i.e. each converting input signals to certain outputs. In particular, within the monosynaptic spinal loop illustrated in Fig.1B, stretching the muscle will elicit a chain of physiological activities: muscle stretch ⇒ spindle ⇒ sensory neuron ⇒ synapse ⇒ motoneuron ⇒ muscle contraction. Adjacent components must have compatible interfaces, and the interfacing variables must also be physiologically realistic. In our design, each component is mathematically described in Table 1:

Table 1: Functional definition of neural models

COMPONENT   MATHEMATICAL DEFINITION
Neuron      $S(t) = f_{neuron}(I, t)$
Synapse     $I(t) = f_{synapse}(S, t)$
Muscle      $T(t) = f_{muscle}(S, L, \dot{L}, t)$
Spindle     $A(t) = f_{spindle}(L, \dot{L}, \Gamma_{dynamic}, \Gamma_{static}, t)$

All components are modeled as black-box functions that map the inputs to the outputs. The meanings of these mathematical definitions are explained below. This design allows existing physiological models to be easily inserted and switched. In all models the input signals are time-varying, e.g. I = I(t), L = L(t), etc. The argument t in input signals is omitted throughout this paper.
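One way to read Table 1 is as a set of composable function signatures chained along the reflex arc of Fig.1B. The following minimal Python sketch illustrates the idea with trivial stand-in functions — purely illustrative, since the platform realizes each box as an FPGA circuit (eqs. 1-10 below), not software:

# Trivial stand-ins for the Table 1 black boxes; the real models are eqs. (1)-(10).
f_spindle = lambda L, dL, gd, gs: max(0.0, 5.0 * L + 0.5 * dL)  # afferent drive A
f_neuron  = lambda I: 1 if I > 1.0 else 0                       # binary spike S
f_synapse = lambda S: 2.0 if S else 0.0                         # EPSC current I
f_muscle  = lambda S, L, dL: 10.0 * S                           # force T

def reflex_arc(L, dL, gamma_dyn=0.0, gamma_static=0.0):
    # Chain the components along the monosynaptic loop of Fig.1B:
    # stretch -> spindle -> sensory neuron -> synapse -> motoneuron -> muscle.
    A = f_spindle(L, dL, gamma_dyn, gamma_static)   # spindle afferent output
    S_sensory = f_neuron(A)                         # sensory neuron spike
    I = f_synapse(S_sensory)                        # synaptic current onto motoneuron
    S_motor = f_neuron(I)                           # motoneuron spike
    return f_muscle(S_motor, L, dL)                 # muscle force T

print(reflex_arc(L=0.5, dL=1.0))   # a stretch elicits force: prints 10.0

Because every component only sees its neighbors' interface variables, any single box can be swapped for a more detailed model without touching the rest of the chain — the property the text emphasizes.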
2.2 Selection of models for emulation

Models were selected in consideration of their computational cost and physiological verisimilitude, and whether they could be adapted to the mathematical form defined in Table 1.

Model of Neuron

The informational process for a neuron is to take the post-synaptic current I as input and produce a binary spike train S as output. The neuron model adopted in the emulation was developed by Izhikevich [1]:

$\dot{v} = 0.04v^2 + 5v + 140 - u + I$   (1)
$\dot{u} = a(bv - u)$   (2)

with the reset rule: if $v \geq 30$ mV, then $v \leftarrow c$ and $u \leftarrow u + d$, where a, b, c, d are free parameters tuned to achieve certain firing patterns. The membrane potential v directly determines the binary spike train S(t): S(t) = 1 if v ≥ 30, otherwise S(t) = 0. Note that v in the Izhikevich model is in millivolts and time t is in milliseconds, so the coefficients in eq.1 need to be adjusted in correspondence to SI units.

Model of Synapse

When a pre-synaptic neuron spikes, i.e. S(0) = 1, an excitatory synapse subsequently issues an Excitatory Post-Synaptic Current (EPSC) that drives the post-synaptic neuron. Neural recordings of hair cells in rats [2] provided evidence that the time profile of the EPSC can be well characterized by:

$I(t) = V_m \left( e^{-t/\tau_d} - e^{-t/\tau_r} \right)$ if $t \geq 0$; $I(t) = 0$ otherwise.   (3)

The key parameters in a synapse model are the time constants for rising ($\tau_r$) and decaying ($\tau_d$). In our emulation $\tau_r = 0.001$ s and $\tau_d = 0.003$ s.
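As a concrete illustration of eqs. 1-3, here is a minimal software sketch coupling the Izhikevich neuron to the double-exponential EPSC. The regular-spiking parameter values (a, b, c, d), the forward-Euler update and the current scaling are illustrative assumptions; the platform itself integrates with Backward Euler in hardware (Section 3):

import math

def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    # One dt-millisecond forward-Euler update of eqs. (1)-(2).
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u = u + dt * a * (b * v - u)
    if v >= 30.0:           # spike: reset v <- c, u <- u + d
        return c, u + d, 1
    return v, u, 0

def epsc(t, Vm=1.0, tau_r=0.001, tau_d=0.003):
    # Double-exponential EPSC of eq. (3); t in seconds.
    return Vm * (math.exp(-t / tau_d) - math.exp(-t / tau_r)) if t >= 0 else 0.0

# Drive one neuron with the EPSC elicited by a presynaptic spike at t = 0.
v, u = -65.0, -13.0         # resting state (u = b * v)
for step in range(50):      # 50 ms at 1 ms granularity
    I = 80.0 * epsc(step * 1e-3)          # illustrative current scaling
    v, u, spiked = izhikevich_step(v, u, I)
    if spiked:
        print("spike at t = %d ms" % step)

With these illustrative values the EPSC peaks within about 2 ms of the presynaptic spike; whether and how often the postsynaptic neuron fires depends on the assumed current scaling.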
Model of Muscle force and electromyograph (EMG)

The primary effect of skeletal muscle is to convert α-motoneuron spikes S into force T, depending on the muscle's instantaneous length L and lengthening speed $\dot{L}$. We used Hill's muscle model in the emulation, with parameter tuning described in [3]. Another measurable output of muscle is the electromyograph (EMG). EMG is the small skin current polarized by the motor unit action potential (MUAP) as it travels along muscle fibers. Models exist to describe the typical waveform picked up by surface EMG electrodes; in this project we chose to implement the one described in [4].

Model of Proprioceptor

The spindle is a sensory organ that provides the main source of proprioceptive information. As can be seen in Fig.1C, a spindle typically produces two afferent outputs (primary Ia and secondary II) according to its gamma fusimotor drives ($\Gamma_{dynamic}$ and $\Gamma_{static}$) and muscle states (L and $\dot{L}$). There are currently no closed-form models describing spindle functions, due to the spindle's significant nonlinearity. One representative model that numerically approximates the spindle dynamics was developed by Mileusnic et al. [5]. The model uses differential equations to characterize a typical cat soleus spindle. Eqs.4-10 present a subset of this model for one type of spindle fiber (bag1):

$\dot{x}_0 = \left( \frac{\Gamma_{dynamic}^2}{\Gamma_{dynamic}^2 + \Omega_{bag1}^2} - x_0 \right) / \tau$   (4)
$\dot{x}_1 = x_2$   (5)
$\dot{x}_2 = \frac{1}{M} \left[ T_{SR} - T_B - T_{PR} - \Gamma_1 x_0 \right]$   (6)

where

$T_{SR} = K_{SR} (L - x_1 - L_{SR0})$   (7)
$T_B = (B_0 + B_1 x_0) \cdot (x_1 - R) \cdot C_{SS} \cdot |x_2|^{0.3}$   (8)
$T_{PR} = K_{PR} (x_1 - L_{PR0})$   (9)
$C_{SS} = \frac{2}{1 + e^{-1000 x_2}} - 1$   (10)

Eqs.8 and 10 suggest that evaluating the spindle model requires multiplication and division as well as more complex arithmetic like polynomials and exponentials. The implementation details are described in Section 3.

2.3 Neuron connectivity with sparse interconnections

Although the number of spinal neurons (~1 billion) is significantly smaller than that of cortical neurons (~100 billion), a fully connected spinal network would still mean approximately 2 trillion synaptic endings [6]. Implementing such a huge number of synapses poses a major, if not insurmountable, challenge given limited hardware resources. In this platform we approximated the neural connectivity by sparsely connecting sensory neurons to motoneurons as parallel pathways; we do not attempt to introduce the full connectivity.

The rationale is that in a neural control system, the effect of a single neuron can be considered as mapping the current state x to a change in state $\dot{x}$ through a band-limited channel. Therefore when a collection of neurons fire stochastically, the probability of $\dot{x}$ depends on both x and the firing behavior s (s = 1 when spiking, otherwise s = 0) of each neuron:

$p(\dot{x} \mid x, s) = p(\dot{x} \mid s = 1)\, p(s = 1 \mid x) + p(\dot{x} \mid s = 0)\, p(s = 0 \mid x)$   (11)

Eq.11 is a master equation that determines a probability flow on the state. From the Kramers-Moyal expansion we can associate this probability flow with a partial differential equation:

$\frac{\partial}{\partial t} p(x, t) = \sum_{i=1}^{\infty} \left( -\frac{\partial}{\partial x} \right)^{i} \left[ D^{(i)}(x)\, p(x, t) \right]$   (12)

where $D^{(i)}(x)$ is a time-invariant term that modifies the change of the probability density based on its i-th gradient. Under certain conditions [7, 8], $D^{(i)}(x)$ for i > 2 all vanish, and therefore the probability flow can be described deterministically using a linear operator $\mathcal{L}$:

$\frac{\partial}{\partial t} p(x, t) = \left[ -\frac{\partial}{\partial x} D^{(1)}(x) + \frac{\partial^2}{\partial x^2} D^{(2)}(x) \right] p(x, t) = \mathcal{L}\, p(x, t)$   (13)

This means that various $\mathcal{L}$s can be superimposed to achieve complex system dynamics (illustrated in Fig.2A).

[Figure 2: Functions of a neuron population can be described as the combination of linear operators (A). Therefore the original neural function can be equivalently produced by sparsely connected neurons forming parallel pathways (B).]

As a consequence, the statistical effect of two fully connected neuron populations is equivalent to that of populations which are only sparsely connected, as long as the probability flow can be described by the same $\mathcal{L}$. For a movement task in particular, it is the statistical effect from the neuron ensemble onto skeletal muscles that determines the global behavior. Therefore we argue that it is feasible to approximate the spinal cord connectivity by sparsely interconnecting sensory and motor neurons (Fig.2B). Here a pool of homogeneous sensory neurons projects to another pool of homogeneous α-motoneurons. Pseudorandom noise is added to the input of all homogeneous neurons within a population. It is worth noting that this approximation significantly reduces the number of synapses that need to be implemented in hardware.
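The hardware saving from this sparse layout is easy to quantify. The sketch below counts synapses for the 1,024-to-1,024 projection used on each FPGA node (Section 4); the all-to-all wiring within each 128-neuron pathway is an illustrative assumption, as the text does not pin down the intra-pathway connectivity:

# Synapse counts for 1,024 sensory neurons (SN) projecting to 1,024 motoneurons (MN).
n_sn, n_mn = 1024, 1024
full = n_sn * n_mn                      # fully connected: every SN to every MN
print(f"fully connected:     {full:,} synapses")        # 1,048,576

# Sparse approximation: 8 parallel pathways of 128 SNs -> 128 MNs each,
# assuming all-to-all wiring *within* a pathway (an illustrative choice).
pathways, width = 8, 128
sparse = pathways * width * width
print(f"8 parallel pathways: {sparse:,} synapses")      # 131,072 (8x fewer)

# One-to-one wiring within each pathway would need only:
print(f"one-to-one pathways: {pathways * width:,} synapses")  # 1,024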
3 Hardware implementation on FPGA

We select the FPGA as the implementation device due to its inherent parallelism, which resembles the nervous system. The FPGA is favored over GPUs or clustered CPUs because it is relatively easy to network hundreds of nodes under flexible protocols. The platform is distributed on multiple nodes of Xilinx Spartan-6 devices. The interfacing among FPGAs and computers is created using the OpalKelly development board XEM6010. The dynamic range of variables is tight in the models of the Izhikevich neuron, the synapse and the EMG; this helps maintain the accuracy of the models even when they are evaluated in 32-bit fixed-point arithmetic. The spindle model, in contrast, requires floating-point arithmetic due to its wide dynamic range and complex calculations (see eqs. 4-10). Hyper-time computations with floating-point numbers are resource consuming and therefore need to be implemented with special attention.

3.1 Floating-point arithmetic in combinational logic

Our arithmetic implementations are compatible with the IEEE-754 standard. Typical floating-point arithmetic IP cores are either pipelined or based on iterative algorithms such as CORDIC, all of which require clocks to schedule the calculation. In our platform, no clock is provided for model evaluations, so all arithmetic needs to be executed in pure combinational logic. Taking advantage of combinational logic allows all model evaluations to be 1) fast — the evaluation time depends entirely on the propagation and settling time of signals, which is on the order of microseconds — and 2) parallel — each model is evaluated on its own circuit without waiting for any other results. Our implementations of the adder and multiplier are inspired by the open source project "Free Floating-Point Madness", available at http://www.hmc.edu/chips/. Please contact the authors of this paper if the modified code is needed.

Fast combinational floating-point division

Floating-point division is even more resource demanding than multiplication. We avoided directly implementing a dividing algorithm by approximating it with additions and multiplications. Our approach is inspired by an algorithm described in [9], which provides a good approximation of the inverse square root for any positive number x within one Newton-Raphson iteration:

$Q(x) = \frac{1}{\sqrt{x}} \approx \hat{x}\left(1.5 - \frac{x}{2}\hat{x}^2\right) \quad (x > 0)$   (14)

where $\hat{x}$ is an initial bit-level estimate of $1/\sqrt{x}$ [9]. Q(x) can be implemented using only floating-point adders and multipliers. Thereby any division with a positive divisor can be achieved if two blocks of Q(x) are concatenated:

$\frac{a}{b} = \frac{a}{\sqrt{b} \cdot \sqrt{b}} = a \cdot Q(b) \cdot Q(b) \quad (b > 0)$   (15)

This algorithm has been adjusted to also work with negative divisors (b < 0).
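A software sketch of eqs. 14-15, assuming Lomont's magic-constant seed [9] for the initial estimate $\hat{x}$; the float/integer bit reinterpretation stands in for what the FPGA realizes as combinational wiring:

import struct

def inv_sqrt(x: float) -> float:
    # Q(x) of eq. (14): bit-level seed [9] plus one Newton-Raphson step.
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # reinterpret float bits
    i = 0x5f3759df - (i >> 1)                          # magic-constant estimate
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    return y * (1.5 - 0.5 * x * y * y)                 # one refinement iteration

def divide(a: float, b: float) -> float:
    # a/b via eq. (15): two concatenated Q(b) blocks; the sign of a negative
    # divisor is factored out, per the adjustment mentioned in the text.
    sign = -1.0 if b < 0 else 1.0
    q = inv_sqrt(abs(b))
    return sign * a * q * q

print(divide(1.0, 3.0))   # ~0.3327 vs 1/3

For a = 1, b = 3 the result is within about 0.2% of the true quotient; a second Newton-Raphson step would tighten this further at the cost of one more multiply-add stage.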
Numerical integrators for differential equations

Evaluating the instantaneous states of differential equation models requires a fixed-step numerical integrator. Backward Euler's method was chosen to balance numerical error against FPGA usage:

$\dot{x} = f(x, t)$   (16)
$x_{n+1} = x_n + T f(x_{n+1}, t_{n+1})$   (17)

where T is the sampling interval and f(x, t) is the derivative function for the state variable x.
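Since $x_{n+1}$ appears on both sides of eq. 17, each step requires solving an implicit equation. The sketch below does so with a few fixed-point iterations — an assumed solver, as the text does not state how the hardware resolves the implicit step — and checks the result on exponential decay, where the exact Backward Euler update is $x_n / (1 + \lambda T)$:

def backward_euler_step(f, x_n, t_next, T, iters=3):
    # Solve x_{n+1} = x_n + T * f(x_{n+1}, t_{n+1}) by fixed-point iteration;
    # converges quickly when T is small (1 ms, as in the platform).
    x = x_n                               # initial guess: previous state
    for _ in range(iters):
        x = x_n + T * f(x, t_next)
    return x

# Example: dx/dt = -lam * x, whose exact implicit step is x / (1 + lam * T).
lam, T = 5.0, 1e-3
f = lambda x, t: -lam * x
x = 1.0
for n in range(3):
    x = backward_euler_step(f, x, (n + 1) * T, T)
print(x, 1.0 / (1 + lam * T) ** 3)        # iterated vs closed form: ~0.98515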
3.2 Asynchronous spike-based communication between FPGA chips

[Figure 3: Timing diagram of asynchronous spike-based communication between the Spike line, the Clock line, and the receiver's counter.]

FPGA nodes are networked by transferring 1-bit binary spikes to each other. Our design allows the sender and the receiver to operate on independent clocks without having to synchronize. The timing diagram of the spike-based communication is shown in Fig.3. The sender issues a Spike with a pulse width of $1/(365 \times F_{emu})$ second. Each Spike then triggers a counting event on the receiver, while each Clock first reads the accumulated spike count and subsequently clears the counter. Note that the phase difference between Spike and Clock is not predictable due to asynchronicity.

3.3 Serialize neuron evaluations within a homogeneous population

Different neuron populations are instantiated as standalone circuits. Within each population, however, the homogeneous neurons mentioned in Section 2.3 are evaluated in series in order to optimize FPGA usage. Within each FPGA node all modules operate with a central clock, which is the only source allowed to trigger any updating event. Therefore the maximal number of neurons that can be serialized ($N_{serial}$) is constrained by the following relationship:

$F_{fpga} = C \times N_{serial} \times 365 \times F_{emu}$   (18)

Here $F_{fpga}$ is the fastest clock rate that the FPGA can operate on; C = 4 is the minimal number of clock cycles needed for updating each state variable in the on-chip memory; $F_{emu}$ = 1 kHz is the time granularity of the emulation (1 millisecond); and $365 \times F_{emu}$ represents 365x real-time. Given that Xilinx Spartan-6 FPGA devices peak at a 200 MHz central clock frequency, the theoretical maximum number of neurons that can be serialized is

$N_{serial} \leq 200\,\mathrm{MHz} / (4 \times 365 \times 1\,\mathrm{kHz}) \approx 137$   (19)

In the current design we choose $N_{serial} = 128$.
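The serialization budget of eqs. 18-19 can be verified with a few lines of arithmetic; the sketch reproduces the ceiling of ~137 neurons and shows the clock headroom left by choosing $N_{serial} = 128$:

# Eq. (18): F_fpga = C * N_serial * 365 * F_emu
C      = 4        # clock cycles per state-variable update
F_emu  = 1_000    # Hz: 1 ms emulation granularity
F_fpga = 200e6    # Hz: Spartan-6 peak central clock

n_max = F_fpga / (C * 365 * F_emu)
print(f"max serializable neurons: {n_max:.1f}")                 # ~137.0, eq. (19)

# The design choice N_serial = 128 leaves clock headroom:
required = C * 128 * 365 * F_emu
print(f"clock needed for 128 neurons: {required / 1e6:.2f} MHz")  # 186.88 MHz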
4 Results: emulated activities of motor nervous system

Figure 4 shows the implemented monosynaptic spinal loop in schematics and in operation. Each FPGA node is able to emulate monosynaptic spinal loops consisting of 1,024 sensory and 1,024 motor neurons, i.e. 2,048 neurons in total. The spike-based asynchronous communication is successful between two FPGA nodes. Note that the emulation has to be significantly slowed down for on-line plotting; when the emulation is at full speed (365x real-time) the software front-end is not able to visualize the signals due to limited data throughput.

[Figure 4: The neural emulation platform in operation. Left: neural circuits implemented for each FPGA node, comprising 8 parallel pathways of 128 sensory neurons (SN) and 128 α-motoneurons (αMN) each, 2,048 neurons in total. Center: one working FPGA node. Right: two FPGA nodes networked using the asynchronous spiking protocol.]

The emulation platform successfully created multi-scale information when the muscle is externally stretched (Fig.5A). We also tested whether our emulated motor system is able to produce the recruitment order and size principles observed in real physiological data. It is well known that when a voluntary motor command is sent to the α-motoneuron pool, the motor units are recruited in an order such that small ones get recruited first, followed by the big ones [10]. The comparison between our results and real data is shown in Fig.5B, where the top panel shows 20 motor unit activities emulated using our platform, and the bottom panel shows decoded motor unit activities from real human EMG [11]. No qualitative difference was found.

[Figure 5: A) Multi-scale activities from emulation (stretch, spindle Ia, sensory post-synaptic current, motoneurons, muscle force, EMG) produced by each model when the muscle is sinusoidally stretched. B) Comparing the emulated motor unit recruitment order with real experimental data.]

5 Discussion and future work

We designed a hardware platform for emulating multi-scale motor nervous activities in hyper-time. We managed to use a single Xilinx Spartan-6 FPGA node to emulate monosynaptic spinal loops consisting of 2,048 neurons, associated muscles and proprioceptors. The neurons are organized as parallel pathways with sparse interconnections. The emulation is successfully accelerated to 365x real-time. The platform can be scaled by networking multiple FPGA nodes, which is enabled by an asynchronous spike-based communication protocol. The emulated monosynaptic spinal loops are capable of producing reflex-like activities in response to muscle stretch. Our results on motor unit recruitment order are compatible with the physiological data collected in real human subjects.

There is a question of whether this stochastic system turns out chaotic, especially with accumulated errors from the Backward Euler integrator. Note that the firing property of a neuron population is usually stable even with explicit noise [8], and spindle inputs are measured from real robots, so the integrator errors are corrected at every iteration. To our knowledge, the system is not critically sensitive to the initial conditions or integrator errors. This question, however, is both interesting and important for in-depth investigation in the future.

It has been shown [12] that replicating classic types of spinal interneurons (propriospinal, Ia-excitatory, Ia-inhibitory, Renshaw, etc.) is sufficient to produce stabilizing responses and rapid reaching movements in a wrist. Our platform will introduce those interneurons to describe the known spinal circuitry in further detail. Physiological models will also be refined as needed. For the purpose of modeling movement behavior or diseases, the Izhikevich model is a good balance between verisimilitude and computational cost. Nevertheless, when testing drug effects along disease progression, neuron models are expected to cover sufficient molecular details, including how neurotransmitters affect various ion channels. With advancing programmable semiconductor technology, we expect to upgrade our neuron model to Hodgkin-Huxley's. For the muscle models, Hill's type of model does not fit the muscle properties accurately enough when the muscle is being shortened; alternative models will be tested. Other studies showed that the functional dexterity of human limbs – especially in the hands – is critically enabled by the tendon configurations and joint geometry [13]. As a result, if our platform is used to understand whether known neurophysiology and biomechanics are sufficient to produce able and pathological movements, it will be necessary to use this platform to control human-like limbs. Since the emulation speed can be flexibly adjusted from arbitrarily slow to 365x real-time, when run at exactly 1x real-time the platform will function as a digital controller with a 1 kHz refresh rate.

The main purpose of the emulation is to learn how certain motor disorders progress during childhood development. This first requires the platform to reproduce motor symptoms that are compatible with clinical observations. For example, it has been suggested that muscle spasticity in rats is associated with decreased soma size of α-motoneurons [14], which presumably reduces the firing threshold of neurons. Thus when a lower firing threshold is introduced to the emulated motoneuron pool, EMG patterns similar to those in [15] should be observed. It is also necessary for the symptoms to evolve with neural plasticity. In the current version we presume that the structure of each component remains time-invariant. In future work, Spike Timing Dependent Plasticity (STDP) will be introduced such that all components are subject to temporal modification.

Acknowledgments

The authors thank Dr. Gerald Loeb for helping set up the emulation of spindle models. This project is supported by NIH NINDS grant R01NS069214-02.

References

[1] Izhikevich, E. M. Simple model of spiking neurons. IEEE Transactions on Neural Networks 14, 1569–1572 (2003).
[2] Glowatzki, E. & Fuchs, P. A. Transmitter release at the hair cell ribbon synapse. Nature Neuroscience 5, 147–154 (2002).
[3] Shadmehr, R. & Wise, S. P. A mathematical muscle model. In Supplementary documents for "Computational Neurobiology of Reaching and Pointing", 1–18 (MIT Press, Cambridge, MA, 2005).
[4] Fuglevand, A. J., Winter, D. A. & Patla, A. E. Models of recruitment and rate coding organization in motor-unit pools. Journal of Neurophysiology 70, 2470–2488 (1993).
[5] Mileusnic, M. P., Brown, I. E., Lan, N. & Loeb, G. E. Mathematical models of proprioceptors. I. Control and transduction in the muscle spindle. Journal of Neurophysiology 96, 1772–1788 (2006).
[6] Gelfan, S., Kao, G. & Ruchkin, D. S. The dendritic tree of spinal neurons. The Journal of Comparative Neurology 139, 385–411 (1970).
[7] Sanger, T. D. Neuro-mechanical control using differential stochastic operators. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, 4494–4497 (2010).
[8] Sanger, T. D. Distributed control of uncertain systems using superpositions of linear operators. Neural Computation 23, 1911–1934 (2011).
[9] Lomont, C. Fast inverse square root (2003). URL http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf.
[10] Henneman, E. Relation between size of neurons and their susceptibility to discharge. Science 126, 1345–1347 (1957).
[11] De Luca, C. J. & Hostage, E. C. Relationship between firing rate and recruitment threshold of motoneurons in voluntary isometric contractions. Journal of Neurophysiology 104, 1034–1046 (2010).
[12] Raphael, G., Tsianos, G. A. & Loeb, G. E. Spinal-like regulator facilitates control of a two-degree-of-freedom wrist. The Journal of Neuroscience 30, 9431–9444 (2010).
[13] Valero-Cuevas, F. J. et al. The tendon network of the fingers performs anatomical computation at a macroscopic scale. IEEE Transactions on Bio-medical Engineering 54, 1161–1166 (2007).
[14] Brashear, A. & Elovic, E. Spasticity: Diagnosis and Management (Demos Medical, 2010), 1st edn.
[15] Levin, M. F. & Feldman, A. G. The role of stretch reflex threshold regulation in normal and impaired motor control. Brain Research 657, 23–30 (1994).

same-paper 2 0.83336598 62 nips-2012-Burn-in, bias, and the rationality of anchoring

Author: Falk Lieder, Thomas Griffiths, Noah Goodman


3 0.83336598 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng

Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1

4 0.78724128 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Author: Yung-kyun Noh, Frank Park, Daniel D. Lee

Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.

5 0.74151498 283 nips-2012-Putting Bayes to sleep

Author: Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen

Abstract: We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such "sparse composite models" is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the "specialist" framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound.

6 0.68211251 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

7 0.64025861 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

8 0.62758482 23 nips-2012-A lattice filter model of the visual pathway

9 0.61053699 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

10 0.60703921 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

11 0.59730232 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

12 0.59708595 193 nips-2012-Learning to Align from Scratch

13 0.59266019 303 nips-2012-Searching for objects driven by context

14 0.58908641 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

15 0.58813524 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

16 0.58762991 345 nips-2012-Topic-Partitioned Multinetwork Embeddings

17 0.58466423 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

18 0.58400518 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

19 0.58209455 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

20 0.58134675 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks