nips nips2013 nips2013-83 knowledge-graph by maker-knowledge-mining

83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification


Source: pdf

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. [sent-5, score-0.251]

2 In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. [sent-7, score-0.302]

3 This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. [sent-8, score-0.22]

4 Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. [sent-9, score-0.303]

5 We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy. [sent-10, score-0.22]

6 [Sect. 1, Introduction] Discriminatively trained deep convolutional neural networks (CNN) [18] have recently achieved impressive state-of-the-art results in a number of areas, including, in particular, the visual recognition of categories in the ImageNet Large-Scale Visual Recognition Challenge [4]. [sent-11, score-0.28]

7 Indeed, several standard features and pipelines in computer vision, such as SIFT [19] and a spatial pyramid on Bag of visual Words (BoW) [16] can be seen as corresponding to layers of a standard CNN. [sent-15, score-0.387]

8 We then show how this representation can be modified to be used as a layer in a deeper architecture (Sect. [sent-24, score-0.266]

9 3) and how the latter can be discriminatively learnt to yield a deep Fisher network (Sect. [sent-25, score-0.209]

10 There exists a large variety of different encodings that can be used for this purpose, including the BoW [9, 29] encoding, sparse coding [33], and the FV encoding [20]. [sent-35, score-0.263]

11 Since FV was shown to outperform other encodings [6] and achieve very good performance on various image recognition benchmarks [21, 28], we use it as the basis of our framework. [sent-36, score-0.309]

12 Most encodings are designed to disregard the spatial location of features in order to be invariant to image transformations; in practice, however, retaining weak spatial information yields an improved classification performance. [sent-40, score-0.576]

13 This can be incorporated by dividing the image into regions, encoding each of them individually, and stacking the result in a composite higher-dimensional code, known as a spatial pyramid [16]. [sent-41, score-0.575]

14 The alternative, which does not increase the encoding dimensionality, is to augment the local features with their spatial coordinates [24]. [sent-42, score-0.317]
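
The two entries above describe the region-based (spatial pyramid) variant and the coordinate-augmentation alternative. As a concrete illustration, here is a minimal Python sketch of both, assuming local features with normalised (x, y) positions and a generic encode() function standing in for the FV encoder; the function names and the 1×1 + 2×2 grid are illustrative, not taken from the paper.

```python
import numpy as np

def spatial_pyramid_encode(feats, xy, encode, levels=(1, 2)):
    # xy holds per-feature positions normalised to [0, 1); encode() maps a
    # feature set to one vector. Region codes are concatenated, so the output
    # dimensionality grows with the number of pyramid cells.
    blocks = []
    for g in levels:                                     # e.g. 1x1 and 2x2 grids
        cell = np.minimum((xy * g).astype(int), g - 1)   # cell index per feature
        for i in range(g):
            for j in range(g):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                blocks.append(encode(feats[mask]))
    return np.concatenate(blocks)

def augment_with_xy(feats, xy):
    # The alternative mentioned above: append (x, y) to each local feature,
    # which leaves the dimensionality of the final encoding unchanged.
    return np.hstack([feats, xy])
```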

15 DNNs can be trained greedily, in a layer-by-layer manner, as in Restricted Boltzmann Machines [12] and (sparse) auto-encoders [3, 17], or by learning all layers simultaneously, which is relatively efficient if the layers are convolutional [18]. [sent-44, score-0.289]

16 For instance, dense feature encoding using the bag of visual words was considered as a single layer of a deep network in [1, 8, 32]. [sent-48, score-0.637]

17 [Sect. 2, Fisher vector encoding for image classification] The Fisher vector encoding φ of a set of features {xp} (e. [sent-49, score-0.453]

18 The encoding describes how the distribution of features of a particular image differs from the distribution fitted to the features of all training images. [sent-59, score-0.452]

19 The FV dimensionality is 2Kd, where K is the codebook size (the number of Gaussians in the GMM), and d is the dimensionality of the encoded feature vector. [sent-61, score-0.254]
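
To make the 2Kd figure concrete, below is a minimal numpy sketch of the (unnormalised) FV of one feature set under a diagonal-covariance GMM. It is a generic illustration of the encoding described here, not the authors' implementation; the usual SSR and L2 normalisation mentioned in later entries would be applied on top.

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma):
    # X: (N, d) local features; pi: (K,) mixture weights; mu, sigma: (K, d)
    # means and standard deviations of a diagonal-covariance GMM.
    # Output dimensionality is 2*K*d, as stated in the text.
    N, d = X.shape
    # soft-assignment (posterior) of each feature to each Gaussian
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]      # N x K x d
    logp = (np.log(pi)[None, :] - 0.5 * np.sum(diff ** 2, axis=2)
            - np.sum(np.log(sigma), axis=1)[None, :])
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # N x K
    # first- and second-order difference statistics per Gaussian
    phi1 = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(pi)[:, None])
    phi2 = (np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1)
            / (N * np.sqrt(2 * pi)[:, None]))
    return np.concatenate([phi1.ravel(), phi2.ravel()])              # 2*K*d
```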

20–23 [Figure 1 labels (diagram residue): classifier layer — one-vs-rest linear SVMs; 2nd Fisher layer (global pooling) — FV encoder, SSR & L2 norm.; 1st Fisher layer (with optional global pooling branched out) — FV encoder, L2 norm. & PCA, spatial stacking; 0th layer — dense feature extraction (SIFT, raw patches, …) from the input image. Figure 1, left: the Fisher network.] [sent-68 to sent-72]

24 6, making the conventional pipeline slightly deeper by injecting a single Fisher layer substantially improves the classification accuracy. [sent-76, score-0.377]

25 As can be seen from (1), the (unnormalised) FV encoding is additive with respect to image features, i. [sent-77, score-0.262]

26 the encoding of an image is an average of the individual encodings of its features. [sent-79, score-0.391]

27 Finally, the high-dimensional FV is usually coupled with a one-vs-rest linear SVM classifier, and together they form a conventional image classification pipeline [21] (see Fig. [sent-81, score-0.269]

28 [Sect. 3, Fisher layer] The conventional FV representation of an image (Sect. [sent-83, score-0.438]

29 SIFT) into a high-dimensional representation, and then aggregates these encodings into a single vector by global sum-pooling over the whole image (followed by normalisation). [sent-86, score-0.306]

30 This means that the representation describes the image in terms of the local patch features, and cannot capture more complex image structures. [sent-87, score-0.329]

31 Deep neural networks are able to model the feature hierarchies by passing an output of one feature computation layer as the input to the next one. [sent-88, score-0.361]

32 We adopt a similar approach here, and devise a feed-forward feature encoding layer (which we term a Fisher layer), which is based on off-the-shelf Fisher vector encoding. [sent-89, score-0.391]

33 The layers can then be stacked into a deep network, which we call a Fisher network. [sent-90, score-0.231]

34 The architecture of the l-th Fisher layer is depicted in Fig. [sent-91, score-0.266]

35 On the input, it receives dl-dimensional features (dl ∼ 10^2), densely computed over multiple scales on a regular image grid. [sent-93, score-0.467]

36 The layer then performs feed-forward feature transformation in three sub-layers. [sent-95, score-0.283]

37 The first one computes semi-local FV encodings by pooling the input features not from the whole image, but from a dense set of semi-local regions. [sent-96, score-0.412]

38 The resulting FVs form a new set of densely sampled features that are more discriminative than the input ones and less local, as they integrate information from larger image areas. [sent-97, score-0.359]

39 2) uses a layer-specific GMM with Kl components, so the dimensionality of each FV is 2Kl dl , which, considering that FVs are computed densely, might be too large for practical applications. [sent-99, score-0.256]

40 Therefore, we decrease the FV dimensionality by projection onto an hl-dimensional subspace using a discriminatively trained linear projection Wl ∈ R^(hl × 2Kl·dl). [sent-100, score-0.516]

41 In the second sub-layer, the spatially adjacent features are stacked in a 2 × 2 window, which produces a 4hl-dimensional dense feature representation. [sent-103, score-0.267]
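
A minimal sketch of this 2 × 2 stacking sub-layer, assuming the projected hl-dimensional descriptors are laid out on an (H, W) grid; the border handling is an illustrative choice, not necessarily the paper's.

```python
import numpy as np

def spatial_stack_2x2(F):
    # F: (H, W, h) grid of h-dimensional local descriptors. Each output
    # descriptor concatenates a 2x2 neighbourhood, giving shape (H-1, W-1, 4h).
    return np.concatenate([F[:-1, :-1], F[:-1, 1:],
                           F[1:, :-1],  F[1:, 1:]], axis=-1)
```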

42 [Figure 2 labels (diagram residue): dl-dimensional input features → compressed semi-local Fisher vector encoding (mixture of Kl Gaussians, projection Wl from 2Kl·dl to hl) → spatial stacking (2×2), giving 4hl dimensions → L2 norm.] [sent-106, score-0.615]

43 Left: the arrows illustrate the data flow through the layer; the dimensionality of densely computed features is shown next to the arrows. [sent-108, score-0.246]

44 Right: spatial pooling (the blue squares) and stacking (the red square) in sub-layers 1 and 2 respectively. [sent-109, score-0.402]

45 2); compared to the regions used in global or spatial pyramid pooling [20], these are smaller and sampled much more densely. [sent-113, score-0.309]

46 As a result, instead of a single FV, describing the whole image, the image is represented by a large number of densely computed semi-local FVs, each of which describes a spatially adjacent set of local features, computed by the previous layer. [sent-114, score-0.303]

47 Thus, the new feature representation can capture more complex image statistics with larger spatial support. [sent-115, score-0.31]

48 The high dimensionality of Fisher vectors, however, brings up the computational complexity issue, as storing and processing thousands of dense FVs per image (each of which is 2Kl dl -dimensional) is prohibitive at large scale. [sent-117, score-0.454]

49 We tackle this problem by employing discriminative dimensionality reduction for high-dimensional FVs, which makes the layer learning procedure supervised. [sent-118, score-0.425]

50 The dimensionality reduction is carried out using a linear projection Wl onto an hl -dimensional subspace. [sent-119, score-0.287]

51 As noted in [8], such encodings require large codebooks to produce discriminative feature representations. [sent-123, score-0.284]

52 2, FV encoders do not require large codebooks, and by employing supervised dimensionality reduction, we can preserve the discriminative ability of FV even after the projection onto a low-dimensional space, similarly to [10]. [sent-126, score-0.229]

53 3), an image is represented as a spatially dense set of low-dimensional multi-scale discriminative features. [sent-129, score-0.311]

54 To capture the spatial structure within each feature’s neighbourhood, we incorporate the stacking sub-layer, which concatenates the spatially adjacent features in a 2 × 2 window (Fig. [sent-131, score-0.425]

55 In practice, the Fisher layer computation is repeated at multiple scales by changing the pooling window size ql (the PCA projection in sub-layer 3 is the same for all scales). [sent-139, score-0.534]

56 This allows a single layer to capture multi-scale statistics, which is different from typical DNN architectures, which use a single pooling window size per layer. [sent-140, score-0.414]

57 The resulting dense multi-scale features, computed by the layer, form the input of the next layer (similarly to the dense multi-scale SIFT features). [sent-141, score-0.383]
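
As a rough illustration of sub-layer 1 with multiple pooling window sizes, the sketch below sum-pools per-location (unnormalised) FV contributions over dense q × q windows and applies the shared projection; it exploits the additivity of FVs noted in entries 25–26 above. The shapes, window sizes and stride are assumptions made for the example, not the paper's exact settings.

```python
import numpy as np

def semi_local_pool(phi, W, window_sizes=(5, 7, 9, 11), stride=2):
    # phi: (H, W, D) grid of per-location (unnormalised) FV contributions;
    # since FVs are additive, a semi-local FV is just the sum over a q x q
    # window. W: (h, D) discriminative projection, shared across all scales.
    H, Wd, D = phi.shape
    # 2-D prefix sums make every window sum an O(1) lookup
    S = np.zeros((H + 1, Wd + 1, D))
    S[1:, 1:] = np.cumsum(np.cumsum(phi, axis=0), axis=1)
    outputs = []
    for q in window_sizes:
        ys = np.arange(0, H - q + 1, stride)
        xs = np.arange(0, Wd - q + 1, stride)
        pooled = (S[np.ix_(ys + q, xs + q)] - S[np.ix_(ys + q, xs)]
                  - S[np.ix_(ys, xs + q)] + S[np.ix_(ys, xs)])
        outputs.append(pooled @ W.T)       # project 2*K*d -> h at each location
    return outputs                         # one (H', W', h) grid per scale
```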

58 6 we show that a multi-scale Fisher layer indeed brings an improvement, compared to a fixed pooling window size. [sent-143, score-0.414]

59 [Sect. 4, Fisher network] Our image classification pipeline, which we coin the Fisher network (shown in Fig. [sent-144, score-0.256]

60 1) is constructed by stacking several (at least one) Fisher layers (Sect. [sent-145, score-0.258]

61 3) on top of dense features, such as SIFT or raw image patches. [sent-146, score-0.219]

62 We call this layer the global Fisher layer, and it effectively computes a full-dimensional normalised Fisher vector encoding (the dimensionality reduction stage is omitted since the computed FV is directly used for classification). [sent-148, score-0.542]

63 The final layer is an off-the-shelf ensemble of one-vs-rest binary linear SVMs. [sent-149, score-0.232]
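
The last two entries describe the global Fisher layer's normalisation and the one-vs-rest linear SVM classifier. Below is a hedged sketch of that final stage using scikit-learn's LinearSVC (one-vs-rest by default) on randomly generated stand-in FVs; the data, dimensions and C value are placeholders, not the paper's.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ssr_l2(phi, eps=1e-12):
    # signed square-root (SSR) followed by L2 normalisation of each FV row
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    return phi / (np.linalg.norm(phi, axis=-1, keepdims=True) + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_fvs = rng.normal(size=(60, 128))       # stand-in global FVs (2*K*d dims)
    train_labels = rng.integers(0, 5, size=60)   # stand-in class labels
    clf = LinearSVC(C=1.0).fit(ssr_l2(train_fvs), train_labels)
    print(clf.predict(ssr_l2(train_fvs[:3])))
```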

64 Each subsequent Fisher layer is designed to capture more complex, higher-level image statistics, but the competitive performance of shallow FV-based frameworks [21] suggests that low-level SIFT features are already discriminative enough to distinguish between a number of image classes. [sent-152, score-0.768]

65 These image representations are then concatenated to produce a rich, multi-layer image descriptor. [sent-154, score-0.33]

66 [Sect. 4.1, Learning] The Fisher network is trained in a supervised manner, since each Fisher layer (apart from the global layer) depends on discriminative dimensionality reduction. [sent-157, score-0.487]

67 Here we discuss how the (non-global) Fisher layer can be efficiently trained in the large-scale scenario, and introduce two options for the projection learning objective. [sent-159, score-0.346]

68 3, we need to learn a discriminative projection W to significantly reduce the dimensionality of the densely-computed semi-local FVs. [sent-162, score-0.229]

69 One approach to discriminative dimensionality reduction learning consists in finding the projection onto a subspace where the image classes are as linearly separable as possible [10, 31]. [sent-172, score-0.428]

70 This corresponds to the bilinear class scoring function vc^T W ψ, where W is the linear projection which we seek to optimise and vc is the linear model (e. [sent-173, score-0.213]

71 The max-margin optimisation problem for W and the ensemble {vc} takes the following form: Σ_i Σ_{c ≠ c(i)} max( (vc − vc(i))^T W ψi + 1, 0 ) + (λ/2) Σ_c ‖vc‖_2^2 + (µ/2) ‖W‖_F^2, (2) where c(i) is the ground-truth class of an image i, and λ and µ are the regularisation constants. [sent-176, score-0.286]
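
A rough sketch of one way to optimise this bi-convex objective, alternating subgradient passes over {vc} (with W fixed) and over W (with {vc} fixed); the learning rate, regularisation values and update schedule are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np

def biconvex_projection_learning(Psi, y, num_classes, h,
                                 lam=1e-4, mu=1e-4, lr=1e-3, epochs=6, seed=0):
    # Psi: (n, D) descriptors (e.g. semi-local FVs); y: labels in
    # {0..num_classes-1}; W is the h x D projection, rows of V are the v_c.
    rng = np.random.default_rng(seed)
    n, D = Psi.shape
    W = 0.01 * rng.normal(size=(h, D))
    V = np.zeros((num_classes, h))
    for epoch in range(epochs):
        update_W = (epoch % 2 == 1)          # alternate the two convex blocks
        for i in rng.permutation(n):
            z = W @ Psi[i]                   # projected descriptor, shape (h,)
            margins = V @ z - V[y[i]] @ z + 1.0
            margins[y[i]] = 0.0
            viol = np.where(margins > 0)[0]  # classes violating the margin
            if update_W:                     # subgradient step in W, V fixed
                g = mu * W
                for c in viol:
                    g += np.outer(V[c] - V[y[i]], Psi[i])
                W -= lr * g
            else:                            # subgradient step in {v_c}, W fixed
                gV = lam * V
                gV[viol] += z
                gV[y[i]] -= len(viol) * z
                V -= lr * gV
    return W, V
```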

72 If a specific target dimensionality is required, PCA dimensionality reduction can be further applied to the classifier scores [10], but in our case we applied PCA after spatial stacking (Sect. [sent-184, score-0.517]

73 This suggests the fast computation procedure: each dl-dimensional input feature xp is first hard-assigned to a Gaussian k based on (3). [sent-194, score-0.31]

74 Then, the corresponding dl-dimensional differences Φk^(1)(p) and Φk^(2)(p) are computed and projected using the small hl × dl sub-matrices Wl^(k,1) and Wl^(k,2), which is fast. [sent-195, score-0.218]

75 The algorithm avoids computing the high-dimensional FVs and projecting them with the large matrix Wl ∈ R^(hl × 2Kl·dl), which would be prohibitive since the number of dense FVs is high. [sent-196, score-0.297]
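
Below is a hedged sketch of this fast projection under hard assignment: each feature touches only the small block of W belonging to its Gaussian. The shapes and the split of W into per-Gaussian sub-matrices W1, W2 are assumptions made for the illustration.

```python
import numpy as np

def project_fv_fast(X, pi, mu, sigma, W1, W2):
    # X: (N, d) local features; pi: (K,) mixture weights; mu, sigma: (K, d);
    # W1, W2: (K, h, d) per-Gaussian slices of the full h x 2Kd projection.
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]   # N x K x d
    logp = (np.log(pi)[None, :] - 0.5 * np.sum(diff ** 2, axis=2)
            - np.sum(np.log(sigma), axis=1)[None, :])
    k = np.argmax(logp, axis=1)          # hard assignment to one Gaussian each
    out = np.zeros(W1.shape[1])
    for p in range(X.shape[0]):
        d1 = diff[p, k[p]]               # first-order difference for Gaussian k
        d2 = d1 ** 2 - 1.0               # second-order difference
        out += W1[k[p]] @ d1 / np.sqrt(pi[k[p]])
        out += W2[k[p]] @ d2 / np.sqrt(2.0 * pi[k[p]])
    return out / X.shape[0]              # equals W @ FV under hard assignment
```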

76 [Table 1 fragment: columns compare the dimensionality-reduction scheme (classifier scores vs. bi-convex learning), spatial stacking, and L2 normalisation; the reported figure is top-1 accuracy (values truncated).] [sent-225, score-0.318]

77 Table 2: Evaluation of multi-scale pooling and multi-layer image description on the subset of ILSVRC-2010. [sent-233, score-0.289]

78 [Table 2 fragment: rows vary the pooling window size q1 (5 vs. {5, 7, 9, 11}) and the pooling stride δ1 (1 vs. 2), with and without the multi-layer descriptor; the reported figure is top-1 accuracy (values truncated).] [sent-239, score-0.339]

79 In our experiments, we used SIFT as the first layer of the network, followed by two Fisher layers (the second one is global, as explained in Sect. [sent-247, score-0.328]

80 Here we quantitatively assess the three sub-layers of a Fisher layer (Sect. [sent-250, score-0.232]

81 We compare the two proposed dimensionality reduction learning schemes (bi-convex learning and classifier scores), and also demonstrate the importance of spatial stacking and L2 normalisation. [sent-252, score-0.396]

82 As can be seen, both spatial stacking and L2 normalisation improve the performance, and dimensionality reduction via projection onto the space of SVM classifier scores performs on par with the projection learnt using the bi-convex formulation (2). [sent-254, score-0.669]

83 From Table 2 it is clear that using multiple pooling window sizes is beneficial compared to a single window size. [sent-260, score-0.229]

84 Also, the multi-layer image descriptor obtained by stacking globally pooled and normalised FVs, computed by the two Fisher layers, outperforms each of these FVs taken separately. [sent-262, score-0.425]

85 We also note that in this experiment, unlike the previous one, both Fisher layers utilized spatial coordinate augmentation of the input features, which leads to a noticeable boost in the shallow baseline performance (from 78. [sent-263, score-0.366]

86 Apart from our Fisher network, multi-scale pooling can be readily employed in convolutional networks. [sent-266, score-0.22]

87 [Sect. 6.2, Evaluation on ILSVRC-2010] Now that we have evaluated various Fisher layer configurations on a subset of ILSVRC, we assess the performance of our framework on the full ILSVRC-2010 dataset. [sent-268, score-0.232]

88 We use off-the-shelf SIFT and colour features [20] in the feature extraction layer, and demonstrate that significant improvements can be achieved by injecting a single Fisher layer into the conventional FV-based pipeline [23]. [sent-269, score-0.585]

89 The first Fisher layer uses a large number of GMM components Kl , since it was found to be beneficial for shallow FV encodings [23], used here as a baseline. [sent-272, score-0.442]

90 First, we note that the globally pooled Fisher vector, branched out of the first Fisher layer (which effectively corresponds to the conventional FV encoding [23]), results in better accuracy than reported in [23], which validates our implementation. [sent-275, score-0.5]

91 Using the 2nd Fisher layer on top of the 1st one leads to a significant performance improvement. [sent-276, score-0.232]

92 We also specify the dimensionality of SIFT-based image representations. [sent-279, score-0.238]

93 [Table fragment: pipeline settings compared — 1st Fisher layer, 2nd Fisher layer, multi-layer (1st and 2nd Fisher layers), and the Sánchez et al. baseline.] [sent-280, score-0.574]

94 We conclude that injecting a single intermediate layer leads to a significant performance boost (+4. [sent-312, score-0.286]

95 [Sect. 7, Conclusion] We have shown that Fisher vectors, a standard image encoding method, are amenable to being stacked in multiple layers, in analogy to the state-of-the-art deep neural network architectures. [sent-327, score-0.448]

96 Adding a single layer is in fact sufficient to significantly boost the performance of these shallow image encodings, bringing their performance closer to the state of the art in the large-scale classification scenario [14]. [sent-328, score-0.491]

97 The fact that off-the-shelf image representations can be simply and successfully stacked indicates that deep schemes may extend well beyond neural networks. [sent-329, score-0.311]

98 Modeling the spatial layout of images beyond spatial pyramids. [sent-500, score-0.236]

99 Beyond spatial pyramids: A new feature extraction framework with dense spatial sampling for image classification. [sent-557, score-0.511]

100 Linear spatial pyramid matching using sparse coding for image classification. [sent-566, score-0.331]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fv', 0.545), ('fisher', 0.427), ('fvs', 0.258), ('layer', 0.232), ('stacking', 0.162), ('image', 0.154), ('dl', 0.151), ('sift', 0.143), ('pooling', 0.135), ('encodings', 0.129), ('wl', 0.126), ('gmm', 0.111), ('xp', 0.108), ('encoding', 0.108), ('spatial', 0.105), ('layers', 0.096), ('deep', 0.095), ('cnn', 0.087), ('dimensionality', 0.084), ('features', 0.083), ('projection', 0.081), ('shallow', 0.081), ('classi', 0.07), ('encoder', 0.067), ('vc', 0.066), ('dense', 0.065), ('discriminative', 0.064), ('convolutional', 0.064), ('pipeline', 0.063), ('ilsvrc', 0.061), ('augmentation', 0.06), ('densely', 0.058), ('ssr', 0.053), ('conventional', 0.052), ('feature', 0.051), ('normalisation', 0.051), ('network', 0.051), ('window', 0.047), ('nchez', 0.047), ('hl', 0.046), ('pyramid', 0.046), ('svm', 0.046), ('reduction', 0.045), ('colour', 0.043), ('eccv', 0.042), ('perronnin', 0.041), ('discriminatively', 0.04), ('stacked', 0.04), ('codebooks', 0.04), ('ql', 0.039), ('pooled', 0.037), ('scores', 0.037), ('visual', 0.035), ('alternation', 0.035), ('codebook', 0.035), ('decorrelated', 0.035), ('cvpr', 0.035), ('er', 0.035), ('imagenet', 0.034), ('architecture', 0.034), ('trained', 0.033), ('cnns', 0.032), ('pca', 0.032), ('carried', 0.031), ('extraction', 0.031), ('injecting', 0.03), ('normalised', 0.029), ('vedaldi', 0.029), ('object', 0.029), ('spatially', 0.028), ('networks', 0.027), ('bow', 0.027), ('branched', 0.027), ('classemes', 0.027), ('parallelised', 0.027), ('rhl', 0.027), ('simonyan', 0.027), ('coding', 0.026), ('recognition', 0.026), ('neighbourhood', 0.026), ('compressed', 0.026), ('images', 0.026), ('training', 0.024), ('boost', 0.024), ('unnormalised', 0.023), ('global', 0.023), ('learnt', 0.023), ('accuracy', 0.022), ('globally', 0.022), ('representations', 0.022), ('stride', 0.022), ('pipelines', 0.022), ('bags', 0.022), ('dnn', 0.022), ('dnns', 0.022), ('gmms', 0.022), ('gpus', 0.022), ('employed', 0.021), ('computed', 0.021), ('local', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.

2 0.20385279 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

Author: Ying Liu, Alan Willsky

Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1

3 0.20089671 331 nips-2013-Top-Down Regularization of Deep Belief Networks

Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim

Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1

4 0.19117932 251 nips-2013-Predicting Parameters in Deep Learning

Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas

Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1

5 0.17619461 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks

Author: Michiel Hermans, Benjamin Schrauwen

Abstract: Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks, and more naturally capture the structure of time series. We show that they reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales. 1

6 0.13023511 75 nips-2013-Convex Two-Layer Modeling

7 0.12246006 5 nips-2013-A Deep Architecture for Matching Short Texts

8 0.11982576 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model

9 0.10285465 84 nips-2013-Deep Neural Networks for Object Detection

10 0.09616375 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors

11 0.092353679 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer

12 0.091681518 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking

13 0.091654226 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

14 0.087801062 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

15 0.085372023 64 nips-2013-Compete to Compute

16 0.083257392 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

17 0.082025126 30 nips-2013-Adaptive dropout for training deep neural networks

18 0.080526836 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking

19 0.078696251 166 nips-2013-Learning invariant representations and applications to face verification

20 0.078479126 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.152), (1, 0.098), (2, -0.165), (3, -0.114), (4, 0.13), (5, -0.144), (6, -0.091), (7, 0.061), (8, -0.05), (9, -0.099), (10, 0.052), (11, 0.029), (12, -0.008), (13, 0.03), (14, 0.013), (15, 0.019), (16, -0.017), (17, -0.078), (18, -0.041), (19, -0.015), (20, 0.054), (21, -0.014), (22, 0.081), (23, -0.045), (24, 0.003), (25, 0.004), (26, -0.063), (27, 0.013), (28, 0.089), (29, -0.018), (30, 0.044), (31, 0.007), (32, -0.088), (33, -0.035), (34, -0.044), (35, 0.138), (36, 0.022), (37, -0.016), (38, 0.007), (39, -0.057), (40, 0.131), (41, -0.041), (42, 0.027), (43, -0.075), (44, 0.061), (45, 0.076), (46, -0.07), (47, 0.11), (48, -0.185), (49, -0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94555891 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.

2 0.72456104 251 nips-2013-Predicting Parameters in Deep Learning

Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas

Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1

3 0.69991595 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks

Author: Michiel Hermans, Benjamin Schrauwen

Abstract: Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks, and more naturally capture the structure of time series. We show that they reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales. 1

4 0.6678896 331 nips-2013-Top-Down Regularization of Deep Belief Networks

Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim

Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1

5 0.65285653 84 nips-2013-Deep Neural Networks for Object Detection

Author: Christian Szegedy, Alexander Toshev, Dumitru Erhan

Abstract: Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost by a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC. 1

6 0.63238686 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising

7 0.58066678 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

8 0.56902468 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking

9 0.56635422 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model

10 0.53346413 5 nips-2013-A Deep Architecture for Matching Short Texts

11 0.53102744 85 nips-2013-Deep content-based music recommendation

12 0.4974294 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors

13 0.49122965 160 nips-2013-Learning Stochastic Feedforward Neural Networks

14 0.48339206 166 nips-2013-Learning invariant representations and applications to face verification

15 0.47914979 64 nips-2013-Compete to Compute

16 0.47585201 30 nips-2013-Adaptive dropout for training deep neural networks

17 0.46790266 75 nips-2013-Convex Two-Layer Modeling

18 0.46713048 200 nips-2013-Multi-Prediction Deep Boltzmann Machines

19 0.46637505 167 nips-2013-Learning the Local Statistics of Optical Flow

20 0.43580538 226 nips-2013-One-shot learning by inverting a compositional causal process


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.027), (33, 0.177), (34, 0.066), (41, 0.015), (49, 0.036), (56, 0.063), (70, 0.04), (85, 0.02), (89, 0.383), (93, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90336949 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

Author: George H. Chen, Stanislav Nikolov, Devavrat Shah

Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1

2 0.89891261 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow

Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximumlikelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1

3 0.87176931 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

4 0.85769141 91 nips-2013-Dirty Statistical Models

Author: Eunho Yang, Pradeep Ravikumar

Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1

same-paper 5 0.83982635 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the state-of-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy.

6 0.77872717 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

7 0.67166901 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

8 0.65768373 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

9 0.65546137 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

10 0.64849025 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

11 0.62882876 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking

12 0.62435752 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

13 0.62164754 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

14 0.61729729 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures

15 0.60973865 67 nips-2013-Conditional Random Fields via Univariate Exponential Families

16 0.60821092 284 nips-2013-Robust Spatial Filtering with Beta Divergence

17 0.60618782 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC

18 0.60532165 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

19 0.60531008 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

20 0.59570217 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables