nips nips2007 nips2007-182 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity in mimicking computations at deeper levels of the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from Ito & Komatsu’s study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
Reference: text
sentIndex sentText sentNum sentScore
1 Sparse deep belief net model for visual area V2 Honglak Lee Chaitanya Ekanadham Andrew Y. [sent-1, score-0.439]
2 Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. [sent-7, score-0.499]
3 We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. [sent-9, score-0.635]
4 Further, the second layer in our model encodes correlations of the first layer responses in the data. [sent-10, score-0.572]
5 This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features. [sent-13, score-0.538]
6 Much of this work appears to have been motivated by the hierarchical organization of the cortex, and indeed authors frequently compare their algorithms’ output to the oriented simple cell receptive fields found in visual area V1. [sent-16, score-0.541]
7 However, to our knowledge no serious attempt has been made to directly relate, such as through quantitative comparisons, the computations of these deep learning algorithms to areas deeper in the cortical hierarchy, such as to visual areas V2, V4, etc. [sent-20, score-0.466]
8 In this paper, we develop a sparse variant of Hinton’s deep belief network algorithm, and measure the degree to which it faithfully mimics biological measurements of V2. [sent-21, score-0.531]
9 Specifically, we take Ito & Komatsu [7]’s characterization of V2 in terms of its responses to a large class of angled bar stimuli, and quantitatively measure the degree to which the deep belief network algorithm generates similar responses. [sent-22, score-0.412]
10 In related work, several studies have compared models such as these, as well as nonhierarchical/non-deep learning algorithms, to the response properties of neurons in area V1. [sent-30, score-0.357]
11 A study by van Hateren and van der Schaaf [8] showed that the filters learned by independent components analysis (ICA) [9] on natural image data match very well with the classical receptive fields of V1 simple cells. [sent-31, score-0.441]
12 (Filters learned by sparse coding [10, 11] also similarly give responses similar to V1 simple cells. [sent-32, score-0.43]
13 ) Our work takes inspiration from the work of van Hateren and van der Schaaf, and represents a study that is done in a similar spirit, only extending the comparisons to a deeper area in the cortical hierarchy, namely visual area V2. [sent-33, score-0.523]
14 Features in early visual cortex (area V1): The selectivity of neurons for oriented bar stimuli in cortical area V1 has been well documented [12, 13]. [sent-35, score-0.814]
15 Many of these algorithms, such as [10, 9, 8, 6], compute an (approximately or exactly) sparse representation of the natural stimulus data. [sent-38, score-0.390]
16 Some hierarchical extensions of these models [15, 6, 16] are able to learn features that are more complex than simple oriented bars. [sent-40, score-0.239]
17 For example, hierarchical sparse models of natural images have accounted for complex cell receptive fields [17], topography [18, 6], colinearity and contour coding [19]. [sent-41, score-0.763]
18 Features in visual cortex area V2: It remains unknown to what extent the previously described algorithms can learn higher-order features that are known to be encoded further down the ventral visual pathway. [sent-44, score-0.392]
19 In addition, the response properties of neurons in cortical areas receiving projections from area V1 (e. [sent-45, score-0.434]
20 It is uncertain what types of stimuli cause V2 neurons to respond optimally [21]. [sent-48, score-0.400]
21 One V2 study by [22] reported that the receptive fields in this area were similar to those in the neighboring areas V1 and V4. [sent-49, score-0.247]
22 However, quantitative accounts of responses in area V2 are few in number. [sent-51, score-0.242]
23 In one of these studies, Ito and Komatsu [7] investigated how V2 neurons responded to angular stimuli. [sent-53, score-0.249]
24 They summarized each neuron’s response with a two-dimensional visualization of the stimuli set called an angle profile. [sent-54, score-0.54]
25 By making several axial measurements within the profile, the authors were able to compute various statistics about each neuron’s selectivity for angle width, angle orientation, and for each separate line component of the angle (see Figure 1). [sent-55, score-0.774]
26 Approximately 80% of the neurons responded to specific angle stimuli. [sent-56, score-0.472]
27 They found neurons that were selective for only one line component of their peak angle, as well as neurons selective for both line components. [sent-57, score-0.861]
28 These neurons yielded angle profiles resembling those of Cell 2 and Cell 5 in Figure 1, respectively. [sent-58, score-0.414]
29 In addition, several neurons exhibited a high degree of selectivity for their peak angle, producing angle profiles like that of Cell 1 in Figure 1. [sent-59, score-0.802]
30 No neurons were found that had more elongation in a diagonal axis than in the horizontal or vertical axes, indicating that neurons in V2 were not selective for angle width or orientation. [sent-60, score-0.815]
31 Therefore, an important conclusion made from [7] was that a V2 neuron’s response to an angle stimulus is highly dependent on its responses to each individual line component of the angle. [sent-61, score-0.546]
32 29 neurons had very small peak response areas and yielded profiles like that of Cell 1 in Figure 1(right), thus indicating a highly specific tuning to an angle stimulus. [sent-63, score-0.62]
33 Another study by Hegde and Van Essen [23] studied the responses of a population of V2 neurons to complex contour and grating stimuli. [sent-66, score-0.409]
34 They found several V2 neurons responding maximally for angles, and the distribution of peak angles for these neurons is consistent with that found by [7]. [sent-67, score-0.72]
35 In addition, several V2 neurons responded maximally for shapes such as intersections, tri-stars, fivepoint stars, circles, and arcs of varying length. [sent-68, score-0.36]
36 (D) The angle width remains constant as one moves along the indicated diagonal. (E) The angle orientation remains constant as one moves along the indicated diagonal. [sent-74, score-0.457]
37 Hinton et al. [1] proposed an algorithm for learning deep belief networks by treating each layer as a restricted Boltzmann machine (RBM) and greedily training the network one layer at a time, from the bottom up [24, 1]. [sent-83, score-0.756]
38 In many of these models (e.g., sparse coding [10, 11], ICA [9], heavy-tailed models [6], and energy-based models [2]), sparseness seems to play a key role in learning gabor-like filters. [sent-87, score-0.310]
39 We modify Hinton et al.’s learning algorithm to enable deep belief nets to learn sparse representations. [sent-89, score-0.508]
40 An RBM has a set of hidden units h, a set of visible units v, and symmetric connection weights between these two layers, represented by a weight matrix W. [sent-92, score-0.405]
41 The negative log probability of any state in the RBM is given by the following energy function: $-\log P(v, h) = E(v, h) = \frac{1}{2\sigma^2} \sum_i v_i^2 - \frac{1}{\sigma^2}\big(\sum_i c_i v_i + \sum_j b_j h_j + \sum_{i,j} v_i w_{ij} h_j\big)$. (1) [sent-94, score-0.499]
42 Here, $\sigma$ is a parameter, the $h_j$ are hidden unit variables, and the $v_i$ are visible unit variables. [sent-95, score-0.766]
43 Informally, the maximum likelihood parameter estimation problem corresponds to learning wij , ci and bj so as to minimize the energy of states drawn from the data distribution, and raise the energy of states that are improbable given the data. [sent-96, score-0.333]
44 Holding either h or v fixed, we can sample from the other as follows: $P(v_i \mid h) = \mathcal{N}\big(c_i + \sum_j w_{ij} h_j,\ \sigma^2\big)$ (2), and $P(h_j = 1 \mid v) = \mathrm{logistic}\big(\frac{1}{\sigma^2}(b_j + \sum_i w_{ij} v_i)\big)$ (3). [sent-98, score-0.510]
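As a concrete illustration, the two conditionals above can be written in a few lines of NumPy. This is a sketch with our own variable names (not from the paper's code): W is a visible-by-hidden weight matrix, c and b are the visible and hidden biases.

```python
import numpy as np

def sample_visible_given_hidden(h, W, c, sigma, rng):
    """Eq. (2): v_i | h ~ N(c_i + sum_j w_ij h_j, sigma^2)."""
    mean = c + W @ h
    return mean + sigma * rng.standard_normal(mean.shape)

def hidden_probs_given_visible(v, W, b, sigma):
    """Eq. (3): P(h_j = 1 | v) = logistic((b_j + sum_i w_ij v_i) / sigma^2)."""
    return 1.0 / (1.0 + np.exp(-(b + W.T @ v) / sigma**2))
```

One iteration of Gibbs sampling simply alternates between these two functions.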
45 It is also straightforward to formulate a sparse RBM with binary-valued visible units; for example, we can write the energy function as $E(v, h) = -\frac{1}{\sigma^2}\big(\sum_i c_i v_i + \sum_j b_j h_j + \sum_{i,j} v_i w_{ij} h_j\big)$ (see also [24]). [sent-100, score-1.075]
46 We also want hidden unit activations to be sparse; thus, we add a regularization term that penalizes a deviation of the expected activation of the hidden units from a (low) fixed level p. [sent-103, score-0.332]
47 More specifically, $w_{ij} := w_{ij} + \alpha\big(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}\big)$; $c_i := c_i + \alpha\big(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{recon}}\big)$; $b_j := b_j + \alpha\big(\langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{recon}}\big)$, where $\alpha$ is a learning rate and $\langle \cdot \rangle_{\text{recon}}$ is an expectation over the reconstruction data, estimated using one iteration of Gibbs sampling (as in Equations 2, 3). [sent-115, score-1.618]
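A single contrastive-divergence step with the sparsity regularization folded in might look like the following. This is a minimal NumPy sketch under our own naming and hyperparameters (the paper's actual implementation may differ); for simplicity, the sparsity penalty here acts only on the hidden biases.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_sparse_step(v_data, W, b, c, sigma=1.0, alpha=0.01, p=0.02, lam=1.0, rng=None):
    """One CD-1 update on a batch (rows of v_data), plus a penalty that
    nudges the mean hidden activation toward the target sparsity level p."""
    rng = rng if rng is not None else np.random.default_rng(0)

    # Positive phase: hidden probabilities given the data (Eq. 3), then a sample.
    ph_data = logistic((b + v_data @ W) / sigma**2)
    h = (rng.random(ph_data.shape) < ph_data).astype(float)

    # One Gibbs step back: mean-field reconstruction of v (Eq. 2), hiddens again.
    v_recon = c + h @ W.T
    ph_recon = logistic((b + v_recon @ W) / sigma**2)

    # Contrastive-divergence gradients, averaged over the batch.
    n = v_data.shape[0]
    W = W + alpha * (v_data.T @ ph_data - v_recon.T @ ph_recon) / n
    c = c + alpha * (v_data.mean(axis=0) - v_recon.mean(axis=0))
    b = b + alpha * (ph_data.mean(axis=0) - ph_recon.mean(axis=0))

    # Sparsity regularization: push E[h_j] toward the (low) target p.
    b = b + alpha * lam * (p - ph_data.mean(axis=0))
    return W, b, c
```

Repeatedly calling this on mini-batches trains one sparse-RBM layer.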
48 Learning deep networks using sparse RBMs: Once a layer of the network is trained, the parameters wij, bj, and ci are frozen, and the hidden unit values given the data are inferred. [sent-121, score-1.012]
49 [1] showed that by repeatedly applying such a procedure, one can learn a multilayered deep belief network. [sent-124, score-0.308]
50 In our experiments using natural images, we learn a network with two hidden layers, with each layer learned using the sparse RBM algorithm described in Section 3. [sent-126, score-0.659]
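The greedy stacking described above amounts to inference through the frozen first layer, with its activation probabilities serving as "data" for the second layer. A sketch (the function and parameter names are ours; each layer is a trained (W, b) pair):

```python
import numpy as np

def hidden_probs(v, W, b, sigma=1.0):
    """Activation probabilities of one trained sparse-RBM layer (Eq. 3)."""
    return 1.0 / (1.0 + np.exp(-(b + v @ W) / sigma**2))

def two_layer_responses(v_data, W1, b1, W2, b2):
    """Freeze layer 1, infer its hidden probabilities, and feed them as
    input to layer 2 -- the greedy layer-wise scheme described above."""
    h1 = hidden_probs(v_data, W1, b1)
    h2 = hidden_probs(h1, W2, b2)
    return h1, h2
```

The same forward pass is what produces the "model V2" responses analyzed later in the paper.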
51 4 We learned a sparse RBM with 69 visible units and 200 hidden units. [sent-130, score-0.485]
52 ) Many bases found by the algorithm roughly represent different “strokes” of which handwritten digits are comprised. [sent-133, score-0.284]
53 This is consistent (Figure 2: Bases learned from MNIST data.) Less formally, this regularization ensures that the “firing rate” of the model neurons (corresponding to the latent random variables hj) is kept at a certain (fairly low) level, so that the activations of the model neurons are sparse. [sent-134, score-0.654]
54 Figure 3: 400 first layer bases learned from the van Hateren natural image dataset, using our algorithm. [sent-146, score-0.719]
55 Figure 4: Visualization of 200 second layer bases (model V2 receptive fields), learned from natural images. [sent-147, score-0.73]
56 Each small group of 3-5 images (arranged in a row) shows one model V2 unit; the leftmost patch in the group is a visualization of the model V2 basis, obtained by taking a weighted linear combination of the first layer “V1” bases to which it is connected. [sent-148, score-0.688]
57 The next few patches in the group show the first layer bases that have the strongest weight connection to the model V2 basis. [sent-149, score-0.569]
58 with results obtained by applying different algorithms to learn sparse representations of this data set. [sent-150, score-0.242]
59 Learning from natural images: We also applied the algorithm to a training set of 14-by-14 natural image patches, taken from a dataset compiled by van Hateren. [sent-154, score-0.284]
60 We learned a sparse RBM model with 196 visible units and 400 hidden units. [sent-155, score-0.485]
61 The learned bases are shown in Figure 3; they are oriented, gabor-like bases and resemble the receptive fields of V1 simple cells. [sent-156, score-0.74]
62 3 Learning a two-layer model of natural images using sparse RBMs We further learned a two-layer network by stacking one sparse RBM on top of another (see Section 3. [sent-158, score-0.604]
63 Most other authors’ experiments to date using regular (non-sparse) RBMs, when trained on such data, seem to have learned relatively diffuse, unlocalized bases (ones that do not represent oriented edge filters). [sent-167, score-0.512]
64 While sensitive to the parameter settings and requiring a long training time, we found that it is possible in some cases to get a regular RBM to learn oriented edge filter bases as well. [sent-168, score-0.475]
65 But in our experiments, even in these cases we found that repeating this process to build a two layer deep belief net (see Section 4. [sent-169, score-0.495]
66 For example, the fraction of model V2 neurons that respond strongly to a pair of edges near right angles (formally, have peak angle in the range 60-120 degrees) was 2% for the regular RBM, whereas it was 17% for the sparse RBM (and Ito & Komatsu reported 22%). [sent-171, score-0.905]
67 For the results reported in this paper, we trained the second layer sparse RBM with real-valued visible units; however, the results were very similar when we trained the second layer sparse RBM with binary-valued visible units (except that the second layer weights became less sparse). [sent-174, score-1.406]
68 ) Bottom: Angle stimulus response profile for model V2 neurons in the top row. [sent-177, score-0.364]
69 As in Figure 1, darkened patches represent stimuli to which the model V2 neuron responds strongly; also, a small black square indicates the overall peak response. [sent-179, score-0.389]
70 By visualizing the second layer bases as shown in Figure 4, we observed bases that encoded co-linear first layer bases as well as edge junctions. [sent-181, score-1.356]
71 This shows that by extending the sparse RBM to two layers and using greedy learning, the model is able to learn bases that encode contours, angles, and junctions of edges. [sent-182, score-0.684]
72 To identify the “center” of each model neuron’s receptive field, we translate all stimuli densely over the 14x14 input image patch, and identify the position at which the maximum response is elicited. [sent-187, score-0.385]
73 All measures are then taken with all angle stimuli centered at this position. [sent-188, score-0.368]
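The centering procedure (densely translating each stimulus and keeping the offset that elicits the maximum response) can be sketched as follows; `respond` is a hypothetical callable, standing in for the two-layer forward pass, that maps a flattened patch to a scalar model response.

```python
import numpy as np

def best_center(stimulus, respond, patch_size=14):
    """Slide a small stimulus over every position of a patch_size x patch_size
    input and return the (row, col) offset eliciting the maximum response."""
    h, w = stimulus.shape
    best_val, best_pos = -np.inf, (0, 0)
    for r in range(patch_size - h + 1):
        for s in range(patch_size - w + 1):
            img = np.zeros((patch_size, patch_size))
            img[r:r + h, s:s + w] = stimulus
            val = respond(img.ravel())
            if val > best_val:
                best_val, best_pos = val, (r, s)
    return best_pos
```

All subsequent angle-profile measurements are then taken with stimuli placed at the returned offset.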
74 In other words, for each stimulus we compute the first hidden layer activation probabilities, then feed this probability as data to the second hidden layer and compute the activation probabilities again in the same manner. [sent-190, score-0.682]
75 Following a protocol similar to [7], we also eliminate from consideration the model neurons that do not respond strongly to corners and edges. [sent-191, score-0.343]
76 ) We see that all the V2 bases in Figure 5 have maximal response when its strongest V1-basis components are aligned with the stimulus. [sent-194, score-0.361]
77 Thus, some of these bases do indeed seem to encode edge junctions or crossings. [sent-195, score-0.414]
78 We also compute similar summary statistics as [7] (described in Figure 1(C,D,E)), that more quantitatively measure the distribution of V2 or model V2 responses to the different angle stimuli. [sent-196, score-0.337]
79 These stimuli are scaled such that about 5% of the V2 bases fire maximally to these random stimuli. [sent-212, score-0.491]
80 We then exclude the V2 bases that are maximally activated by these random stimuli from the subsequent analysis. [sent-213, score-0.586]
81 (Figure 6: distributions for the sparse DBN model versus Ito & Komatsu’s V2 data, plotted along the primary line axis, the angle width axis, and the angle orientation axis.) [sent-215, score-0.296]
84 Figure 7: Visualization of a number of model V2 neurons that maximally respond to various complex stimuli. [sent-239, score-0.399]
85 The V2 bases shown in the figures maximally respond to acute angles (left), obtuse angles (middle), and tri-stars and junctions (right). [sent-242, score-0.914]
86 Complex-shaped model V2 neurons: Our second experiment is a comparison to a subset of the results described in Hegde and van Essen [23]. [sent-244, score-0.267]
87 We generated a stimulus set comprising some of [23]’s complex-shaped stimuli: angles, single bars, tri-stars (three line segments that meet at a point), and arcs/circles. We measured the response of the second layer of our sparse RBM model to these stimuli. [sent-245, score-0.671]
88 We observe that many V2 bases are activated mainly by one of these different stimulus classes. [sent-246, score-0.426]
89 For example, some model V2 neurons activate maximally to single bars; some maximally activate to (acute or obtuse) angles; and others to tri-stars (see Figure 7). [sent-247, score-0.495]
90 Further, the number of V2 bases maximally activated by acute angles is significantly larger than the number maximally activated by obtuse angles, and the number of V2 bases that respond maximally to tri-stars is much smaller than both of the preceding cases. [sent-248, score-1.140]
91 6 Conclusions We presented a sparse variant of the deep belief network model. [sent-250, score-0.531]
92 More interestingly, the second layer captures a variety of both colinear (“contour”) features as well as corners and junctions which, in a quantitative comparison to measurements of V2 taken by Ito & Komatsu, appeared to give responses that were similar along several dimensions. [sent-252, score-0.439]
93 This by no means indicates that the cortex is a sparse RBM, but perhaps is more suggestive of contours, corners and junctions being fundamental to the statistics of natural images. [sent-253, score-0.44]
94 Nonetheless, we believe that these results also suggest that sparse deep learning algorithms, such as our sparse variant of deep belief nets, hold promise for modeling higher-order features such as might be computed in the ventral visual pathway in the cortex. (All the stimuli were 14-by-14 pixel image patches.) [sent-254, score-0.383] [sent-258, score-0.863]
96 Representation of angles embedded within contour stimuli in area v2 of macaque monkeys. [sent-308, score-0.485]
97 Independent component filters of natural images compared with simple cells in primary visual cortex. [sent-314, score-0.243]
98 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-332, score-0.37]
99 The orientation and direction selectivity of cells in macaque visual cortex. [sent-353, score-0.295]
100 A multi-layer sparse coding network learns contour coding from natural images. [sent-365, score-0.486]
wordName wordTfidf (topN-words)
[('rbm', 0.312), ('bases', 0.284), ('komatsu', 0.232), ('layer', 0.229), ('angle', 0.223), ('ito', 0.216), ('deep', 0.202), ('sparse', 0.2), ('neurons', 0.191), ('hj', 0.148), ('stimuli', 0.145), ('angles', 0.131), ('receptive', 0.125), ('responses', 0.114), ('maximally', 0.111), ('wij', 0.104), ('oriented', 0.103), ('vi', 0.103), ('peak', 0.096), ('bj', 0.096), ('hinton', 0.096), ('stimulus', 0.096), ('visualization', 0.095), ('units', 0.093), ('area', 0.089), ('pro', 0.087), ('junctions', 0.084), ('visual', 0.084), ('visible', 0.081), ('images', 0.08), ('cell', 0.079), ('response', 0.077), ('van', 0.076), ('layers', 0.074), ('contour', 0.071), ('selectivity', 0.069), ('coding', 0.069), ('dbn', 0.065), ('respond', 0.064), ('belief', 0.064), ('hidden', 0.064), ('tolerance', 0.062), ('elongation', 0.062), ('hateren', 0.062), ('hierarchical', 0.061), ('axis', 0.06), ('orientation', 0.059), ('obtuse', 0.058), ('responded', 0.058), ('corners', 0.057), ('patches', 0.056), ('cortex', 0.054), ('recon', 0.054), ('neuron', 0.053), ('contrastive', 0.052), ('ci', 0.051), ('hegde', 0.051), ('rbms', 0.051), ('acute', 0.051), ('macaque', 0.049), ('lters', 0.048), ('les', 0.048), ('learned', 0.047), ('elds', 0.046), ('edge', 0.046), ('raina', 0.046), ('activated', 0.046), ('natural', 0.045), ('selective', 0.044), ('width', 0.044), ('cortical', 0.044), ('learn', 0.042), ('activate', 0.041), ('energy', 0.041), ('quantitative', 0.039), ('darkened', 0.039), ('schaaf', 0.039), ('ventral', 0.039), ('promise', 0.039), ('activations', 0.039), ('osindero', 0.039), ('regularization', 0.038), ('image', 0.038), ('line', 0.036), ('informally', 0.036), ('unit', 0.034), ('cells', 0.034), ('der', 0.034), ('essen', 0.034), ('hyvarinen', 0.034), ('ranzato', 0.034), ('variant', 0.033), ('complex', 0.033), ('areas', 0.033), ('ica', 0.032), ('axes', 0.032), ('network', 0.032), ('trained', 0.032), ('deeper', 0.031), ('protocol', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 182 nips-2007-Sparse deep belief net model for visual area V2
Author: Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity in mimicking computations at deeper levels of the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from Ito & Komatsu’s study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
2 0.42420977 132 nips-2007-Modeling image patches with a directed hierarchy of Markov random fields
Author: Simon Osindero, Geoffrey E. Hinton
Abstract: We describe an efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets. The generative models can be learned one layer at a time and when learning is complete they have a very fast inference procedure for computing a good approximation to the posterior distribution in all of the hidden layers. Each hidden layer has its own MRF whose energy function is modulated by the top-down directed connections from the layer above. To generate from the model, each layer in turn must settle to equilibrium given its top-down input. We show that this type of model is good at capturing the statistics of patches of natural images. 1
3 0.31462699 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
Author: Geoffrey E. Hinton, Ruslan Salakhutdinov
Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.
4 0.23857762 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
Author: Marc'aurelio Ranzato, Y-lan Boureau, Yann L. Cun
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured. 1
5 0.20464022 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
6 0.18861519 164 nips-2007-Receptive Fields without Spike-Triggering
7 0.18083164 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
8 0.15913337 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
9 0.1534138 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
10 0.13959427 145 nips-2007-On Sparsity and Overcompleteness in Image Models
11 0.092547163 60 nips-2007-Contraction Properties of VLSI Cooperative Competitive Neural Networks of Spiking Neurons
12 0.087675899 81 nips-2007-Estimating disparity with confidence from energy neurons
13 0.083528958 25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence
14 0.077098548 36 nips-2007-Better than least squares: comparison of objective functions for estimating linear-nonlinear models
15 0.071586356 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
16 0.069797359 8 nips-2007-A New View of Automatic Relevance Determination
17 0.066817626 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
18 0.065110281 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
19 0.064001247 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
20 0.063456386 117 nips-2007-Learning to classify complex patterns using a VLSI network of spiking neurons
topicId topicWeight
[(0, -0.245), (1, 0.191), (2, 0.183), (3, -0.083), (4, -0.015), (5, 0.078), (6, -0.303), (7, 0.158), (8, 0.374), (9, 0.011), (10, 0.331), (11, -0.069), (12, 0.158), (13, -0.072), (14, 0.038), (15, 0.016), (16, 0.06), (17, -0.04), (18, -0.06), (19, 0.078), (20, 0.051), (21, -0.005), (22, -0.104), (23, -0.081), (24, 0.061), (25, -0.069), (26, -0.032), (27, 0.015), (28, -0.017), (29, -0.015), (30, -0.013), (31, 0.013), (32, -0.03), (33, -0.0), (34, 0.041), (35, -0.042), (36, 0.017), (37, 0.028), (38, 0.006), (39, 0.008), (40, 0.033), (41, -0.022), (42, 0.001), (43, -0.004), (44, 0.008), (45, -0.002), (46, 0.031), (47, 0.046), (48, 0.019), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.96174008 182 nips-2007-Sparse deep belief net model for visual area V2
Author: Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first layer responses in the data. Specifically, it picks up both colinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from the Ito & Komatsu’s study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling more higher-order features. 1
2 0.87052649 132 nips-2007-Modeling image patches with a directed hierarchy of Markov random fields
Author: Simon Osindero, Geoffrey E. Hinton
Abstract: We describe an efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets. The generative models can be learned one layer at a time and when learning is complete they have a very fast inference procedure for computing a good approximation to the posterior distribution in all of the hidden layers. Each hidden layer has its own MRF whose energy function is modulated by the top-down directed connections from the layer above. To generate from the model, each layer in turn must settle to equilibrium given its top-down input. We show that this type of model is good at capturing the statistics of patches of natural images. 1
3 0.75693989 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
Author: Marc'Aurelio Ranzato, Y-Lan Boureau, Yann LeCun
Abstract: Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have certain desirable properties (e.g. low dimension, sparsity, etc.). Others are based on approximating the data density by stochastically reconstructing the input from the representation. We describe a novel and efficient algorithm to learn sparse representations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured.
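The reconstruction-plus-sparsity idea can be shown with a bare-bones encoder/decoder pair: the encoder produces a code, a soft-thresholding step sparsifies it, and the decoder reconstructs the input. This is a simplified sketch of the general scheme, not the paper's specific machine; the shrinkage nonlinearity and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((256, 10))             # toy input "patches"
n_code, lam, lr = 20, 0.1, 0.1
We = 0.1 * rng.standard_normal((10, n_code))   # encoder weights
Wd = 0.1 * rng.standard_normal((n_code, 10))   # decoder weights

def shrink(Z):
    """Soft-thresholding: a simple stand-in for a sparsifying nonlinearity."""
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

def recon_error(We, Wd):
    return np.mean((X - shrink(X @ We) @ Wd) ** 2)

err0 = recon_error(We, Wd)
for _ in range(500):
    Z = shrink(X @ We)                 # sparse code from the encoder
    err = Z @ Wd - X                   # reconstruction residual
    Wd -= lr * Z.T @ err / len(X)
    # Encoder gradient, ignoring the piecewise-constant shrinkage derivative
    # (a crude simplification kept for brevity).
    We -= lr * X.T @ (err @ Wd.T) / len(X)
err_final = recon_error(We, Wd)
```

The trade-off the abstract proposes to measure is visible here: a larger `lam` gives a sparser (lower-information) code but a higher reconstruction error.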
4 0.60040498 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
Author: Geoffrey E. Hinton, Ruslan Salakhutdinov
Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.
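The core construction here, a Gaussian kernel applied to learned top-layer features rather than to raw inputs, can be sketched as follows. A fixed random nonlinear map stands in for the DBN's learned feature extractor, and the GP posterior mean is computed in closed form; this is a hedged illustration of the idea, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(A, B, length=1.0):
    """Squared-exponential (Gaussian) kernel between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length ** 2))

# Stand-in for the DBN's top-layer feature map phi(x): a fixed random projection
# followed by a nonlinearity (purely illustrative).
P = rng.standard_normal((5, 3))
phi = lambda X: np.tanh(X @ P)

X_train = rng.standard_normal((30, 5))
y_train = np.sin(X_train[:, 0])
X_test = rng.standard_normal((5, 5))

F_train, F_test = phi(X_train), phi(X_test)
K = rbf_kernel(F_train, F_train) + 1e-2 * np.eye(30)   # add observation noise
K_star = rbf_kernel(F_test, F_train)
y_pred = K_star @ np.linalg.solve(K, y_train)          # GP posterior mean
```

Swapping `phi` from the identity to a learned feature map is the whole trick: the kernel machinery is unchanged, only the space in which distances are measured differs.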
5 0.51040924 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: It has been shown that adapting a dictionary of basis functions to the statistics of natural images so as to maximize sparsity in the coefficients results in a set of dictionary elements whose spatial properties resemble those of V1 (primary visual cortex) receptive fields. However, the resulting sparse coefficients still exhibit pronounced statistical dependencies, thus violating the independence assumption of the sparse coding model. Here, we propose a model that attempts to capture the dependencies among the basis function coefficients by including a pairwise coupling term in the prior over the coefficient activity states. When adapted to the statistics of natural images, the coupling terms learn a combination of facilitatory and inhibitory interactions among neighboring basis functions. These learned interactions may offer an explanation for the function of horizontal connections in V1 in terms of a prior over natural images.
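The "pairwise coupling term in the prior over the coefficient activity states" is an Ising-like energy, which can be sampled with a tiny Gibbs sampler. The couplings and biases below are random placeholders, not learned from natural images; the sketch only shows the form of the prior.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 8
J = 0.5 * rng.standard_normal((n, n))
J = (J + J.T) / 2                  # symmetric pairwise couplings
np.fill_diagonal(J, 0)             # no self-coupling
b = -1.0 * np.ones(n)              # negative biases favor inactive (sparse) units

def energy(s):
    """Prior energy over binary activity states s_i in {0, 1}."""
    return -(b @ s) - 0.5 * (s @ J @ s)

# Gibbs sampling: resample one unit at a time given the rest.
s = rng.integers(0, 2, n).astype(float)
for _ in range(1000):
    i = rng.integers(n)
    field = b[i] + J[i] @ s        # local field (J's diagonal is zero)
    s[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-field)))
```

Positive entries of `J` act as facilitatory interactions (co-activating units lower the energy) and negative entries as inhibitory ones, mirroring the learned horizontal connections described in the abstract.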
6 0.44201159 164 nips-2007-Receptive Fields without Spike-Triggering
7 0.43127266 81 nips-2007-Estimating disparity with confidence from energy neurons
8 0.42390144 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
9 0.40467894 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
10 0.40209115 25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence
11 0.40206304 145 nips-2007-On Sparsity and Overcompleteness in Image Models
12 0.40095177 33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior
13 0.35931775 60 nips-2007-Contraction Properties of VLSI Cooperative Competitive Neural Networks of Spiking Neurons
14 0.3576262 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
15 0.31562987 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
16 0.31257102 130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes
17 0.27322441 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
18 0.26837885 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
19 0.24555589 115 nips-2007-Learning the 2-D Topology of Images
20 0.24365152 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
topicId topicWeight
[(5, 0.1), (13, 0.025), (16, 0.061), (21, 0.034), (34, 0.015), (35, 0.017), (47, 0.073), (83, 0.17), (85, 0.011), (87, 0.025), (90, 0.384)]
simIndex simValue paperId paperTitle
1 0.96714306 8 nips-2007-A New View of Automatic Relevance Determination
Author: David P. Wipf, Srikantan S. Nagarajan
Abstract: Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted ℓ1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation.
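The "series of re-weighted ℓ1 problems" can be sketched concretely: solve a weighted Lasso, then set each feature's weight inversely proportional to its current coefficient magnitude, and repeat. The inner solver below is plain ISTA, used here only for self-containment; all constants are illustrative, and this is not the paper's specific update rule.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy sparse regression: y = Phi @ w_true + noise, with w_true mostly zero.
n, m = 40, 20
Phi = rng.standard_normal((n, m))
w_true = np.zeros(m)
w_true[[2, 7, 11]] = [1.5, -2.0, 1.0]
y = Phi @ w_true + 0.01 * rng.standard_normal(n)

lam, eps = 0.1, 1e-3
weights = np.ones(m)                         # per-feature penalty weights
w = np.zeros(m)
L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the quadratic part

for _ in range(10):                          # outer re-weighting loop
    for _ in range(200):                     # inner weighted-L1 solve via ISTA
        g = Phi.T @ (Phi @ w - y)            # gradient of 0.5*||Phi w - y||^2
        z = w - g / L
        thr = lam * weights / L
        w = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    weights = 1.0 / (np.abs(w) + eps)        # small coefficients get penalized harder
```

The re-weighting step is what distinguishes this from a single Lasso solve: coefficients driven near zero acquire very large penalties and stay at zero, while genuinely relevant features are penalized less and less.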
2 0.9571104 85 nips-2007-Experience-Guided Search: A Theory of Attentional Control
Author: David Baldwin, Michael C. Mozer
Abstract: People perform a remarkable range of tasks that require search of the visual environment for a target item among distractors. The Guided Search model (Wolfe, 1994, 2007), or GS, is perhaps the best developed psychological account of human visual search. To prioritize search, GS assigns saliency to locations in the visual field. Saliency is a linear combination of activations from retinotopic maps representing primitive visual features. GS includes heuristics for setting the gain coefficient associated with each map. Variants of GS have formalized the notion of optimization as a principle of attentional control (e.g., Baldwin & Mozer, 2006; Cave, 1999; Navalpakkam & Itti, 2006; Rao et al., 2002), but every GS-like model must be ’dumbed down’ to match human data, e.g., by corrupting the saliency map with noise and by imposing arbitrary restrictions on gain modulation. We propose a principled probabilistic formulation of GS, called Experience-Guided Search (EGS), based on a generative model of the environment that makes three claims: (1) Feature detectors produce Poisson spike trains whose rates are conditioned on feature type and whether the feature belongs to a target or distractor; (2) the environment and/or task is nonstationary and can change over a sequence of trials; and (3) a prior specifies that features are more likely to be present for target than for distractors. Through experience, EGS infers latent environment variables that determine the gains for guiding search. Control is thus cast as probabilistic inference, not optimization. We show that EGS can replicate a range of human data from visual search, including data that GS does not address. 1
3 0.9548378 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
Author: Sergey Kirshner
Abstract: We utilize the ensemble of trees framework, a tractable mixture over a superexponential number of tree-structured distributions [1], to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas – multivariate distributions whose marginals are uniform on [0, 1]. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters of these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation amounts data on networks of rain stations.
4 0.95085216 184 nips-2007-Stability Bounds for Non-i.i.d. Processes
Author: Mehryar Mohri, Afshin Rostamizadeh
Abstract: The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, as in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weakens over time. It proves novel stability-based generalization bounds that hold even in this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application in the case of several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.
same-paper 5 0.92824888 182 nips-2007-Sparse deep belief net model for visual area V2
Author: Honglak Lee, Chaitanya Ekanadham, Andrew Y. Ng
Abstract: Motivated in part by the hierarchical organization of the cortex, a number of algorithms have recently been proposed that try to learn hierarchical, or “deep,” structure from unlabeled data. While several authors have formally or informally compared their algorithms to computations performed in visual area V1 (and the cochlea), little attempt has been made thus far to evaluate these algorithms in terms of their fidelity for mimicking computations at deeper levels in the cortical hierarchy. This paper presents an unsupervised learning model that faithfully mimics certain properties of visual area V2. Specifically, we develop a sparse variant of the deep belief networks of Hinton et al. (2006). We learn two layers of nodes in the network, and demonstrate that the first layer, similar to prior work on sparse coding and ICA, results in localized, oriented, edge filters, similar to the Gabor functions known to model V1 cell receptive fields. Further, the second layer in our model encodes correlations of the first-layer responses in the data. Specifically, it picks up both collinear (“contour”) features as well as corners and junctions. More interestingly, in a quantitative comparison, the encoding of these more complex “corner” features matches well with the results from Ito and Komatsu's study of biological V2 responses. This suggests that our sparse variant of deep belief networks holds promise for modeling higher-order features.
6 0.74011266 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions
7 0.7124005 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency
8 0.70011038 156 nips-2007-Predictive Matrix-Variate t Models
9 0.68220758 128 nips-2007-Message Passing for Max-weight Independent Set
10 0.67504126 185 nips-2007-Stable Dual Dynamic Programming
11 0.67354548 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
12 0.6732564 63 nips-2007-Convex Relaxations of Latent Variable Training
13 0.67304349 49 nips-2007-Colored Maximum Variance Unfolding
14 0.67161238 96 nips-2007-Heterogeneous Component Analysis
15 0.66882777 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
16 0.66835862 113 nips-2007-Learning Visual Attributes
17 0.66345382 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
18 0.66214085 7 nips-2007-A Kernel Statistical Test of Independence
19 0.65709639 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
20 0.65539187 187 nips-2007-Structured Learning with Approximate Inference