
158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks


Source: pdf

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1–2 (Abstract) We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. [sent-8, score-0.82] [sent-9, score-0.325]

3 The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. [sent-13, score-0.879]

4 To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. [sent-14, score-0.238]

5 To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. [sent-15, score-0.233]

6 We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. [sent-16, score-0.284]

7 Until recently, datasets of labeled images were relatively small — on the order of tens of thousands of images (e.g., NORB, Caltech-101/256, and CIFAR-10/100). [sent-21, score-0.385]

8 The new larger datasets include LabelMe [23], which consists of hundreds of thousands of fully-segmented images, and ImageNet [6], which consists of over 15 million labeled high-resolution images in over 22,000 categories. [sent-32, score-0.321]

9 Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). [sent-36, score-0.213]

10 The specific contributions of this paper are as follows: we trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions [2] and achieved by far the best results ever reported on these datasets. [sent-40, score-0.714]

11 We wrote a highly-optimized GPU implementation of 2D convolution and all the other operations inherent in training convolutional neural networks, which we make available publicly (footnote 1). [sent-41, score-0.654]

12 Our network contains a number of new and unusual features which improve its performance and reduce its training time, which are detailed in Section 3. [sent-42, score-0.26]

13 The size of our network made overfitting a significant problem, even with 1.2 million labeled training examples, so we used several effective techniques for preventing overfitting, which are described in Section 4. [sent-44, score-0.283]

14 Our final network contains five convolutional and three fully-connected layers, and this depth seems to be important: we found that removing any convolutional layer (each of which contains no more than 1% of the model’s parameters) resulted in inferior performance. [sent-45, score-1.365]

15 Our network takes between five and six days to train on two GTX 580 3GB GPUs. [sent-47, score-0.262]

16 (Section 2, The Dataset) ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. [sent-49, score-0.387]

17 The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool. [sent-50, score-0.213]

18 ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. [sent-52, score-0.202]

19 In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images. [sent-54, score-0.217]

20 On ImageNet, it is customary to report two error rates: top-1 and top-5, where the top-5 error rate is the fraction of test images for which the correct label is not among the five labels considered most probable by the model. [sent-57, score-0.387]
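
As a minimal illustration of the top-1/top-5 convention just described (numpy; the `scores` and `labels` arrays are hypothetical):

```python
import numpy as np

def topk_error(scores, labels, k):
    """Fraction of test images whose correct label is not among the k
    labels the model considers most probable.
    scores: (N, 1000) class scores, labels: (N,) integer class ids."""
    topk = np.argsort(scores, axis=1)[:, -k:]    # k highest-scoring classes
    hit = (topk == labels[:, None]).any(axis=1)  # correct label among them?
    return 1.0 - hit.mean()

# top1 = topk_error(scores, labels, k=1)
# top5 = topk_error(scores, labels, k=5)
```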

21 We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel. [sent-61, score-0.251]

22 So we trained our network on the (centered) raw RGB values of the pixels. [sent-62, score-0.21]
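
A sketch of that single preprocessing step; the (N, H, W, 3) array layout is an assumption, not from the paper:

```python
import numpy as np

def center_images(train_images):
    """Subtract the mean activity over the training set from each pixel,
    leaving the centered raw RGB values the network is trained on."""
    imgs = train_images.astype(np.float32)
    mean_image = imgs.mean(axis=0)         # per-pixel, per-channel mean
    return imgs - mean_image, mean_image   # keep the mean for test-time use
```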

23 (Section 3, The Architecture) The architecture of our network is summarized in Figure 2. [sent-63, score-0.237]

24 It contains eight learned layers — five convolutional and three fully-connected. [sent-64, score-0.781]

25 Deep convolutional neural networks with ReLUs train several times faster than their equivalents with tanh units. [sent-75, score-0.738]
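
For reference, the two nonlinearities being compared, as a trivial numpy sketch:

```python
import numpy as np

def relu(x):
    # Non-saturating: the gradient is 1 for every positive input,
    # so it does not shrink toward zero for large activations.
    return np.maximum(0.0, x)

def tanh_unit(x):
    # Saturating: the gradient 1 - tanh(x)^2 vanishes as |x| grows,
    # which slows gradient descent.
    return np.tanh(x)
```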

26 This is demonstrated in Figure 1, which shows the number of iterations required to reach 25% training error on the CIFAR-10 dataset for a particular four-layer convolutional network. [sent-76, score-0.666]

27 This plot shows that we would not have been able to experiment with such large neural networks for this work if we had used traditional saturating neuron models. [sent-77, score-0.774]

28 We are not the first to consider alternatives to traditional neuron models in CNNs. [sent-78, score-0.231]

29 [11] claim that the nonlinearity f (x) = |tanh(x)| works particularly well with their type of contrast normalization followed by local average pooling on the Caltech-101 dataset. [sent-80, score-0.263]

30 (Figure 1 caption) A four-layer convolutional neural network with ReLUs (solid line) reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons (dashed line). [sent-84, score-0.544]

31 The learning rates for each network were chosen independently to make training as fast as possible. [sent-85, score-0.29]

32 The magnitude of the effect demonstrated here varies with network architecture, but networks with ReLUs consistently learn several times faster than equivalents with saturating neurons. [sent-87, score-0.31]

33 It turns out that 1.2 million training examples are enough to train networks which are too big to fit on one GPU. [sent-90, score-0.321]

34 The parallelization scheme that we employ essentially puts half of the kernels (or neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers. [sent-93, score-0.215]

35 This means that, for example, the kernels of layer 3 take input from all kernel maps in layer 2. [sent-94, score-0.604]

36 However, kernels in layer 4 take input only from those kernel maps in layer 3 which reside on the same GPU. [sent-95, score-0.634]
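
A toy numpy sketch of this restricted connectivity, with a 1×1 convolution standing in for the real one (both helper names are hypothetical):

```python
import numpy as np

def conv_1x1(x, w):
    # Stand-in for real 2D convolution: mixes channels only.
    # x: (in_maps, H, W), w: (out_maps, in_maps) -> (out_maps, H, W)
    return np.tensordot(w, x, axes=([1], [0]))

def non_communicating_layer(x, w_gpu1, w_gpu2):
    """Each GPU holds half the kernels; in layers like layer 4, each
    half sees only the kernel maps produced on the same GPU."""
    half = x.shape[0] // 2
    top = conv_1x1(x[:half], w_gpu1)      # GPU 1 kernels, GPU 1 maps
    bottom = conv_1x1(x[half:], w_gpu2)   # GPU 2 kernels, GPU 2 maps
    return np.concatenate([top, bottom], axis=0)
```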

37 This scheme reduces our top-1 and top-5 error rates by 1.7% and 1.2%, respectively, as compared with a net with half as many kernels in each convolutional layer trained on one GPU. [sent-101, score-1.032]

38 (Footnote 2) The one-GPU net actually has the same number of kernels as the two-GPU net in the final convolutional layer. [sent-103, score-0.828]

39 This is because most of the net’s parameters are in the first fully-connected layer, which takes the last convolutional layer as input. [sent-104, score-0.71]

40 So to make the two nets have approximately the same number of parameters, we did not halve the size of the final convolutional layer (nor the fully-connected layers which follow). [sent-105, score-0.943]

41 This sort of response normalization implements a form of lateral inhibition inspired by the type found in real neurons, creating competition for big activities amongst neuron outputs computed using different kernels. [sent-113, score-0.242]

42 We applied this normalization after applying the ReLU nonlinearity in certain layers (see Section 3.5). [sent-116, score-0.371]

43 We also verified the effectiveness of this scheme on the CIFAR-10 dataset: a four-layer CNN achieved a 13% test error rate without normalization and 11% with normalization (footnote 3). [sent-123, score-0.296]
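
A sketch of this response-normalization scheme; the hyperparameters k=2, n=5, α=1e-4, β=0.75 are taken from the published paper, not from this excerpt:

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """b[i,x,y] = a[i,x,y] / (k + alpha * sum_j a[j,x,y]^2)^beta, where j
    runs over the n adjacent kernel maps around map i.
    a: (C, H, W) ReLU activities."""
    C = a.shape[0]
    sq = a ** 2
    b = np.empty_like(a)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C - 1, i + n // 2)
        b[i] = a[i] / (k + alpha * sq[lo:hi + 1].sum(axis=0)) ** beta
    return b
```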

44 (Section 3.4, Overlapping Pooling) Pooling layers in CNNs summarize the outputs of neighboring groups of neurons in the same kernel map. [sent-125, score-0.388]

45 To be more precise, a pooling layer can be thought of as consisting of a grid of pooling units spaced s pixels apart, each summarizing a neighborhood of size z × z centered at the location of the pooling unit. [sent-129, score-0.639]
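
A direct numpy rendering of that definition; the overlapping setting z = 3, s = 2 is from the published paper (z = s recovers traditional non-overlapping pooling):

```python
import numpy as np

def max_pool(a, z=3, s=2):
    """Pooling units spaced s pixels apart, each taking the max over a
    z x z neighborhood centered at its location. a: (C, H, W)."""
    C, H, W = a.shape
    out_h, out_w = (H - z) // s + 1, (W - z) // s + 1
    out = np.empty((C, out_h, out_w), dtype=a.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[:, i, j] = a[:, i*s:i*s+z, j*s:j*s+z].max(axis=(1, 2))
    return out
```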

46 We generally observe during training that models with overlapping pooling find it slightly more difficult to overfit. [sent-136, score-0.234]

47 As depicted in Figure 2, the net contains eight layers with weights; the first five are convolutional and the remaining three are fully-connected. [sent-139, score-0.877]

48 The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. [sent-140, score-0.203]

49 Our network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution. [sent-141, score-0.227]

50 The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU (see Figure 2). [sent-142, score-1.203]

51 The kernels of the third convolutional layer are connected to all kernel maps in the second layer. [sent-143, score-0.94]

52 The neurons in the fully-connected layers are connected to all neurons in the previous layer. [sent-144, score-0.509]

53 Response-normalization layers follow the first and second convolutional layers. [sent-145, score-0.74]

54 Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer. [sent-147, score-0.74]

55 The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer. [sent-148, score-0.507]

56 The network’s input is 150,528-dimensional, and the number of neurons in the network’s remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000. [sent-155, score-0.355]

57 The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5 × 5 × 48. [sent-157, score-1.549]

58 The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers. [sent-158, score-0.97]

59 The third convolutional layer has 384 kernels of size 3 × 3 × 256 connected to the (normalized, pooled) outputs of the second convolutional layer. [sent-159, score-1.378]

60 The fourth convolutional layer has 384 kernels of size 3 × 3 × 192, and the fifth convolutional layer has 256 kernels of size 3 × 3 × 192. [sent-160, score-1.72]
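
Collecting the layer sizes quoted in these entries into one spec; strides, padding, and the pooling/normalization placement described elsewhere in the text are omitted:

```python
# (name, num_kernels, kernel_size) -- kernel depth is the number of input
# maps each kernel sees, halved in the GPU-restricted layers.
CONV_LAYERS = [
    ("conv1", 96,  (11, 11, 3)),    # on the 224 x 224 x 3 input (Figure 3)
    ("conv2", 256, (5, 5, 48)),     # same-GPU maps only (48 of the 96)
    ("conv3", 384, (3, 3, 256)),    # connected to all conv2 maps
    ("conv4", 384, (3, 3, 192)),    # same-GPU maps only
    ("conv5", 256, (3, 3, 192)),    # same-GPU maps only
]
FC_LAYERS = [("fc6", 4096), ("fc7", 4096), ("fc8", 1000)]  # 1000-way softmax
```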

61 (Section 4, Reducing Overfitting) Our neural network architecture has 60 million parameters. [sent-162, score-0.376]

62 We employ two distinct forms of data augmentation, both of which allow transformed images to be produced from the original images with very little computation, so the transformed images do not need to be stored on disk. [sent-169, score-0.516]

63 In our implementation, the transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. [sent-170, score-0.251]

64 We do this by extracting random 224 × 224 patches (and their horizontal reflections) from the 256×256 images and training our network on these extracted patches (footnote 4). [sent-173, score-0.474]
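
A minimal sketch of this first form of augmentation (image array layout is assumed):

```python
import numpy as np

rng = np.random.default_rng()

def random_patch(image, size=224):
    """Random size x size patch of a 256 x 256 image, plus a random
    horizontal reflection."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = image[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]   # horizontal reflection
    return patch
```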

65 At test time, the network makes a prediction by extracting five 224 × 224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence ten patches in all), and averaging the predictions made by the network’s softmax layer on the ten patches. [sent-176, score-0.609]
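
The corresponding test-time procedure, sketched with a hypothetical `softmax_fn` standing in for a forward pass through the network:

```python
import numpy as np

def ten_patches(image, size=224):
    """Four corner patches, the center patch, and all five horizontal
    reflections -- ten patches in all."""
    h, w = image.shape[:2]
    ct, cl = (h - size) // 2, (w - size) // 2
    offsets = [(0, 0), (0, w - size), (h - size, 0),
               (h - size, w - size), (ct, cl)]
    patches = [image[t:t + size, l:l + size] for t, l in offsets]
    return patches + [p[:, ::-1] for p in patches]

def predict(image, softmax_fn):
    # Average the softmax predictions over the ten patches.
    return np.mean([softmax_fn(p) for p in ten_patches(image)], axis=0)
```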

66 To each training image, we add multiples of the found principal components, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. (Footnote 4: This is the reason why the input images in Figure 2 are 224 × 224 × 3-dimensional.) [sent-179, score-0.251]

67 Each αi is drawn only once for all the pixels of a particular training image until that image is used for training again, at which point it is re-drawn. [sent-183, score-0.361]
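
A sketch of this PCA color augmentation; the standard deviation 0.1 for the α's follows the published paper rather than this excerpt:

```python
import numpy as np

rng = np.random.default_rng()

def pca_color_shift(image, eigvecs, eigvals, sigma=0.1):
    """Add eigvecs @ [a1*l1, a2*l2, a3*l3] to every pixel, where the
    (l_i, eigvec_i) come from PCA on all training-set RGB values and
    each a_i ~ N(0, sigma) is drawn once per use of the image."""
    alphas = rng.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)      # one RGB offset per image
    return image.astype(np.float32) + shift   # broadcast over all pixels

# eigvals/eigvecs would come from, e.g.:
# cov = np.cov(all_pixels.reshape(-1, 3).T)
# eigvals, eigvecs = np.linalg.eigh(cov)
```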

68 (Section 4.2, Dropout) Combining the predictions of many different models is a very successful way to reduce test errors [1, 3], but it appears to be too expensive for big neural networks that already take several days to train. [sent-187, score-0.263]

69 We use dropout in the first two fully-connected layers of Figure 2. [sent-197, score-0.391]
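
A sketch of dropout as used here; the zeroing probability 0.5 and the matching test-time output scaling are the paper's choices:

```python
import numpy as np

rng = np.random.default_rng()

def dropout(h, p=0.5, train=True):
    """Zero each hidden neuron's output with probability p at training
    time; at test time keep all neurons but scale outputs by (1 - p),
    approximating an average over the exponentially many thinned nets."""
    if train:
        return h * (rng.random(h.shape) >= p)
    return h * (1.0 - p)
```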

70 The update rule for weight w was v_{i+1} := 0.9 · v_i − 0.0005 · ε · w_i − ε · ⟨∂L/∂w |_{w_i}⟩_{D_i}, followed by w_{i+1} := w_i + v_{i+1}, where ε is the learning rate and the angle brackets denote the gradient averaged over batch D_i. (Figure 3 caption: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images.) [sent-207, score-1.385]
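
The recovered update rule written out in code (the batch-averaged gradient is assumed computed elsewhere):

```python
def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """v <- 0.9 v - 0.0005 * lr * w - lr * grad ;  w <- w + v.
    Note the weight-decay term is scaled by the learning rate."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v
```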

71 The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. [sent-208, score-0.258]

72 We initialized the weights in each layer from a zero-mean Gaussian distribution with standard deviation 0.01. [sent-212, score-0.203]

73 We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1. [sent-214, score-0.59]

74 We initialized the neuron biases in the remaining layers with the constant 0. [sent-216, score-0.316]

75–76 We trained the network for roughly 90 cycles through the training set of 1.2 million images, which took five to six days on two NVIDIA GTX 580 3GB GPUs. [sent-221, score-0.319] [sent-222, score-0.192]

77 Our network achieves top-1 and top-5 test set error rates of 37.5% and 17.0%. [sent-224, score-0.297]

78 The best performance achieved during the ILSVRC-2010 competition was 47.1% and 28.2%, with an approach that averages the predictions produced from six sparse-coding models trained on different features [2], and since then the best published results are 45.7% and 25.7%, with an approach that averages the predictions of two classifiers trained on Fisher Vectors (FVs) computed from two types of densely-sampled features. [sent-229, score-0.194]

79 Since the ILSVRC-2012 test set labels are not publicly available, we cannot report test error rates for all the models that we tried. [sent-237, score-0.229]

80 Training one CNN, with an extra sixth convolutional layer over the last pooling layer, to classify the entire ImageNet Fall 2011 release (15M images, 22K categories), and then “fine-tuning” it on ILSVRC-2012 gives an error rate of 16.6%. [sent-248, score-0.959]

81 Since there is no established test set, our split necessarily differs from the splits used by previous authors, but this does not affect the results appreciably. (Table 2 caption: Comparison of error rates on ILSVRC-2012 validation and test sets.) [sent-268, score-0.221]

82 Our top-1 and top-5 error rates on this dataset are 67.4% and 40.9%, attained by the net described above but with an additional, sixth convolutional layer over the last pooling layer. [sent-275, score-0.964]

83 (Section 6.1, Qualitative Evaluations) Figure 3 shows the convolutional kernels learned by the network’s two data-connected layers. [sent-280, score-0.636]

84 The kernels on GPU 1 are largely color-agnostic, while the kernels on GPU 2 are largely color-specific. [sent-284, score-0.258]

85 (Footnote 5) The error rates without averaging predictions over ten patches as described in Section 4.1. [sent-286, score-0.203]

86 7 Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model. [sent-290, score-0.252]

87 (Right) Five ILSVRC-2010 test images in the first column. [sent-292, score-0.214]

88 The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image. [sent-293, score-0.545]

89 In the left panel of Figure 4 we qualitatively assess what the network has learned by computing its top-5 predictions on eight test images. [sent-294, score-0.282]

90 If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar. [sent-300, score-0.351]
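
A sketch of that retrieval criterion, assuming the 4096-D last-hidden-layer feature vectors have already been computed for the training set:

```python
import numpy as np

def most_similar(query_fv, train_fvs, k=6):
    """Indices of the k training images whose last-hidden-layer feature
    vectors are closest (Euclidean) to the query image's feature vector.
    train_fvs: (N, 4096), query_fv: (4096,)."""
    dists = np.linalg.norm(train_fvs - query_fv, axis=1)
    return np.argsort(dists)[:k]
```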

91 Figure 4 shows five images from the test set and the six images from the training set that are most similar to each of them according to this measure. [sent-301, score-0.514]

92 Notice that at the pixel level, the retrieved training images are generally not close in L2 to the query images in the first column. [sent-302, score-0.464]

93 We present the results for many more test images in the supplementary material. [sent-304, score-0.214]

94 This should produce a much better image retrieval method than applying autoencoders to the raw pixels [14], which does not make use of image labels and hence has a tendency to retrieve images with similar patterns of edges, whether or not they are semantically similar. [sent-306, score-0.413]

95 (Section 7, Discussion) Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. [sent-307, score-0.794]

96 It is notable that our network’s performance degrades if a single convolutional layer is removed. [sent-308, score-0.71]

97 For example, removing any of the middle layers results in a loss of about 2% for the top-1 performance of the network. [sent-309, score-0.233]

98 Thus far, our results have improved as we have made our network larger and trained it longer but we still have many orders of magnitude to go in order to match the infero-temporal pathway of the human visual system. [sent-312, score-0.264]

99 Ultimately we would like to use very large and deep convolutional nets on video sequences where the temporal structure provides very helpful information that is missing or far less obvious in static images. [sent-313, score-0.579]

100 Best practices for convolutional neural networks applied to visual document analysis. [sent-514, score-0.667]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('convolutional', 0.507), ('layers', 0.233), ('imagenet', 0.215), ('layer', 0.203), ('cnns', 0.199), ('images', 0.172), ('dropout', 0.158), ('gpu', 0.151), ('network', 0.148), ('gpus', 0.142), ('relus', 0.136), ('cnn', 0.131), ('kernels', 0.129), ('pooling', 0.125), ('neurons', 0.122), ('relu', 0.113), ('million', 0.108), ('net', 0.096), ('fvs', 0.09), ('ilsvrc', 0.09), ('ixy', 0.09), ('architecture', 0.089), ('image', 0.086), ('neuron', 0.083), ('training', 0.079), ('networks', 0.075), ('normalization', 0.073), ('deep', 0.072), ('rgb', 0.069), ('ve', 0.068), ('cire', 0.068), ('augmentation', 0.067), ('fth', 0.065), ('nonlinearity', 0.065), ('rates', 0.063), ('trained', 0.062), ('gtx', 0.06), ('competition', 0.057), ('pinto', 0.055), ('krizhevsky', 0.055), ('entered', 0.055), ('tanh', 0.055), ('preventing', 0.055), ('visual', 0.054), ('predictions', 0.051), ('scheme', 0.051), ('jarrett', 0.049), ('six', 0.049), ('object', 0.049), ('recognition', 0.048), ('saturating', 0.047), ('rate', 0.047), ('contest', 0.045), ('italics', 0.045), ('tting', 0.045), ('patches', 0.045), ('error', 0.044), ('test', 0.042), ('fourth', 0.042), ('eight', 0.041), ('fall', 0.041), ('pixel', 0.041), ('labeled', 0.041), ('equivalents', 0.04), ('wi', 0.039), ('achieved', 0.039), ('hinton', 0.038), ('labels', 0.038), ('toronto', 0.038), ('nair', 0.037), ('ilya', 0.037), ('val', 0.037), ('convolution', 0.037), ('arxiv', 0.036), ('dataset', 0.036), ('maps', 0.036), ('days', 0.035), ('deng', 0.035), ('half', 0.035), ('ections', 0.033), ('sixth', 0.033), ('unusual', 0.033), ('kernel', 0.033), ('averages', 0.032), ('connected', 0.032), ('momentum', 0.031), ('labelme', 0.031), ('sutskever', 0.031), ('specialization', 0.031), ('neural', 0.031), ('pixels', 0.031), ('reside', 0.03), ('units', 0.03), ('validation', 0.03), ('horizontal', 0.03), ('overlapping', 0.03), ('roughly', 0.03), ('train', 0.03), ('decay', 0.029), ('big', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

2 0.28813568 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

Author: Dan Ciresan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber

Abstract: We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or nonmembrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific postprocessing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer. 1

3 0.26346353 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

4 0.19017397 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1

5 0.1822565 193 nips-2012-Learning to Align from Scratch

Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-miller

Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method. 1

6 0.16545559 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

7 0.16320576 197 nips-2012-Learning with Recursive Perceptual Representations

8 0.15243404 62 nips-2012-Burn-in, bias, and the rationality of anchoring

9 0.15243404 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning

10 0.12556431 195 nips-2012-Learning visual motion in recurrent neural networks

11 0.11483499 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

12 0.11370772 170 nips-2012-Large Scale Distributed Deep Networks

13 0.11120221 190 nips-2012-Learning optimal spike-based representations

14 0.10891981 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

15 0.10639473 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

16 0.10541034 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

17 0.099630848 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

18 0.097764857 8 nips-2012-A Generative Model for Parts-based Object Segmentation

19 0.096816286 185 nips-2012-Learning about Canonical Views from Internet Image Collections

20 0.093447961 100 nips-2012-Discriminative Learning of Sum-Product Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.214), (1, 0.095), (2, -0.331), (3, 0.054), (4, 0.127), (5, 0.065), (6, -0.009), (7, -0.037), (8, -0.013), (9, -0.006), (10, 0.006), (11, 0.063), (12, -0.078), (13, 0.151), (14, -0.078), (15, -0.093), (16, -0.01), (17, -0.017), (18, 0.064), (19, -0.101), (20, -0.023), (21, -0.15), (22, 0.078), (23, 0.029), (24, 0.016), (25, -0.024), (26, -0.013), (27, 0.014), (28, 0.015), (29, -0.047), (30, 0.019), (31, -0.04), (32, -0.016), (33, -0.002), (34, 0.124), (35, 0.053), (36, 0.007), (37, 0.043), (38, 0.033), (39, 0.013), (40, -0.021), (41, -0.127), (42, -0.041), (43, 0.009), (44, -0.011), (45, 0.032), (46, 0.094), (47, -0.051), (48, 0.029), (49, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9648841 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

2 0.85626471 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

Author: Dan Ciresan, Alessandro Giusti, Luca M. Gambardella, Jürgen Schmidhuber

Abstract: We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or nonmembrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific postprocessing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer. 1

3 0.84542793 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

4 0.82332277 193 nips-2012-Learning to Align from Scratch

Author: Gary Huang, Marwan Mattar, Honglak Lee, Erik G. Learned-miller

Abstract: Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework. Through deep learning, we obtain features that can represent the image at differing resolutions based on network depth, and that are tuned to the statistics of the specific data being aligned. In addition, we modify the learning algorithm for the restricted Boltzmann machine by incorporating a group sparsity penalty, leading to a topographic organization of the learned filters and improving subsequent alignment results. We apply our method to the Labeled Faces in the Wild database (LFW). Using the aligned images produced by our proposed unsupervised algorithm, we achieve higher accuracy in face verification compared to prior work in both unsupervised and supervised alignment. We also match the accuracy for the best available commercial method. 1

5 0.81024855 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Author: Pietro D. Lena, Ken Nagata, Pierre F. Baldi

Abstract: Residue-residue contact prediction is a fundamental problem in protein structure prediction. Hower, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NNk , where i and j index the spatial coordinates of the contact ij map and k indexes “time”. The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components. 1

6 0.78099537 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

7 0.74136382 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

8 0.72105908 170 nips-2012-Large Scale Distributed Deep Networks

9 0.67142308 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

10 0.59089959 210 nips-2012-Memorability of Image Regions

11 0.5804323 8 nips-2012-A Generative Model for Parts-based Object Segmentation

12 0.57407683 65 nips-2012-Cardinality Restricted Boltzmann Machines

13 0.56932473 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

14 0.56809628 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

15 0.56705272 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

16 0.55943215 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

17 0.55703413 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

18 0.55651832 146 nips-2012-Graphical Gaussian Vector for Image Categorization

19 0.53883493 202 nips-2012-Locally Uniform Comparison Image Descriptor

20 0.53612775 197 nips-2012-Learning with Recursive Perceptual Representations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.023), (17, 0.012), (21, 0.038), (38, 0.106), (42, 0.022), (53, 0.012), (54, 0.017), (55, 0.4), (74, 0.065), (76, 0.094), (80, 0.068), (92, 0.072)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84737182 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1

2 0.8338443 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz

Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1

3 0.83355033 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

4 0.81956017 155 nips-2012-Human memory search as a random walk in a semantic network

Author: Joseph L. Austerweil, Joshua T. Abbott, Thomas L. Griffiths

Abstract: The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters. 1

5 0.75499165 211 nips-2012-Meta-Gaussian Information Bottleneck

Author: Melanie Rey, Volker Roth

Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possibles applications of IB to continuous data and provides a solution more robust to outliers. 1

6 0.67699641 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.65782726 95 nips-2012-Density-Difference Estimation

8 0.56191474 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

9 0.55449867 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

10 0.5475893 193 nips-2012-Learning to Align from Scratch

11 0.54640502 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

12 0.5443294 231 nips-2012-Multiple Operator-valued Kernel Learning

13 0.54396021 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

14 0.51727486 210 nips-2012-Memorability of Image Regions

15 0.51617998 298 nips-2012-Scalable Inference of Overlapping Communities

16 0.51435947 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines

17 0.51022369 188 nips-2012-Learning from Distributions via Support Measure Machines

18 0.50503582 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

19 0.50204027 190 nips-2012-Learning optimal spike-based representations

20 0.50197154 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding