nips nips2011 nips2011-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nobuyuki Morioka, Shin'ichi Satoh
Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems.
Reference: text
sentIndex sentText sentNum sentScore
1 Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. [sent-6, score-0.193]
2 For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. [sent-7, score-0.517]
3 However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. [sent-8, score-0.55]
4 By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. [sent-10, score-0.23]
5 We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. [sent-11, score-0.257]
6 The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems. [sent-12, score-0.264]
7 1 Introduction Recently, sparse coding [3, 18] has attracted much attention in computer vision research. [sent-13, score-0.324]
8 Its applications range from image denoising [23] to image segmentation [17] and image classification [10, 24], achieving state-of-the-art results. [sent-14, score-0.158]
9 Sparse coding interprets an input signal x ∈ R^{D×1} with a sparse vector u ∈ R^{K×1} whose linear combination with an overcomplete set of K bases (i. [sent-15, score-0.463]
10 , D ≪ K), also known as dictionary B ∈ R^{D×K}, reconstructs the input as precisely as possible. [sent-17, score-0.103]
11 Several efficient ℓ1 regularized sparse coding algorithms have been proposed [4, 14] and are adopted in visual recognition [10, 24]. [sent-19, score-0.427]
12 [24] compute the sparse codes of many local feature descriptors with sparse coding. [sent-21, score-0.367]
13 However, because the ℓ1 norm is convex but non-smooth, the sparse coding algorithm needs to be optimized iteratively until convergence. [sent-22, score-0.327]
14 Therefore, the local feature descriptor coding step becomes a major bottleneck for large-scale problems like visual recognition. [sent-23, score-0.4]
15 Specifically, we encode the distribution of each dimension of the sparse codes with the slice transform representation [9] and learn a piece-wise linear mapping function with the generalized lasso [21] that best fits, and thereby approximates, ℓ1 regularized sparse coding. [sent-27, score-0.569]
16 [24] and performing better than other fast algorithms that obtain sparse codes [22]. [sent-30, score-0.215]
17 While there have been several supervised dictionary learning methods for sparse coding to obtain more discriminative sparse representations [16, 25], they have not been evaluated on visual recognition with many object categories due to the computational challenges involved. [sent-31, score-0.631]
18 Therefore, in this paper, we focus on learning a fast approximation of sparse coding in an unsupervised manner. [sent-34, score-0.33]
19 The paper is organized as follows: Section 2 reviews some related work including the linear spatial pyramid combined with sparse coding and other fast algorithms to obtain sparse codes. [sent-35, score-0.549]
20 2 Related Work 2.1 Linear Spatial Pyramid Matching Using Sparse Coding This section reviews the linear spatial pyramid matching based on sparse coding by Yang et al. [sent-40, score-0.431]
21 Given a collection of N local feature descriptors randomly sampled from training images X = [x1 , x2 , . [sent-42, score-0.216]
22 Under the spatial pyramid matching framework, each image is divided into a set of sub-regions r = [r1 , r2 , . [sent-60, score-0.168]
23 Then, we compute the sparse solutions of all local feature descriptors appearing in each sub-region r_j, denoted as U_{r_j}, by $\min_{U_{r_j}} \|X_{r_j} - B U_{r_j}\|_2^2 + \lambda \|U_{r_j}\|_1$. (2) [sent-65, score-0.239]
24 The sparse solutions are max pooled for each sub-region and concatenated with other sub-regions to build a statistic of the image by $h = [\max(|U_{r_1}|), \max(|U_{r_2}|), \ldots]$. [sent-66, score-0.196]
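To make the pooling step concrete, here is a small sketch (not from the paper; the function and variable names are ours) that max-pools absolute sparse codes per sub-region and concatenates them into the image-level statistic h, in the spirit of Eqn. (2) and the pooling described above:

```python
import numpy as np

def pool_spatial_pyramid(codes, region_ids, num_regions):
    """Max-pool absolute sparse codes per sub-region and concatenate.

    codes      : (K, N) array, one sparse code per local descriptor
    region_ids : length-N array assigning each descriptor to a sub-region
    num_regions: number of sub-regions in the spatial pyramid
    """
    K, _ = codes.shape
    pooled = np.zeros((num_regions, K))
    for j in range(num_regions):
        in_region = (np.asarray(region_ids) == j)
        if in_region.any():
            pooled[j] = np.abs(codes[:, in_region]).max(axis=1)
    return pooled.reshape(-1)  # image-level statistic h

# toy usage: 1024 bases, 200 descriptors, 5 hypothetical sub-regions
codes = np.random.randn(1024, 200) * (np.random.rand(1024, 200) < 0.02)
h = pool_spatial_pyramid(codes, np.random.randint(0, 5, 200), 5)
```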
25 The main advantage of using sparse coding is that state-of-the-art results can be achieved with a simple linear classifier as reported in [24]. [sent-72, score-0.31]
26 However, the step of finding a sparse code for each local descriptor with sparse coding now becomes a major bottleneck. [sent-74, score-0.644]
27 Using the efficient sparse coding algorithm based on feature-sign search [14], the time to compute the solution for one local descriptor u is O(KZ) where Z is the number of non-zeros in u. [sent-75, score-0.498]
28 2.2 Predictive Sparse Decomposition Predictive sparse decomposition (PSD), described in [10, 11], is a feedforward network that applies a non-linear mapping function to linearly transformed input data to match the optimal sparse coding solution as accurately as possible. [sent-79, score-0.481]
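As a loose illustration only: one common parameterization of such a feedforward predictor is a learned linear transform followed by an element-wise soft-shrinkage non-linearity. The sketch below uses that generic form with made-up names; it is not the exact architecture of [10, 11].

```python
import numpy as np

def psd_style_encoder(x, W, beta):
    """Feedforward approximation of a sparse code: shrink(W x).

    x    : (D,) input descriptor
    W    : (K, D) learned linear filter bank
    beta : (K,) learned non-negative shrinkage thresholds
    """
    z = W @ x
    return np.sign(z) * np.maximum(np.abs(z) - beta, 0.0)  # soft thresholding
```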
29 Given training samples {x_i}_{i=1}^{N}, the parameters can be estimated either jointly or separately from the dictionary B. [sent-82, score-0.105]
30 Gregor and LeCun [7] later proposed a better, but iterative, approximation scheme for ℓ1 regularized sparse coding. [sent-88, score-0.181]
31 This is particularly useful in visual recognition that uses multiple feature types, as it automatically estimates the function form for each feature type from data. [sent-92, score-0.108]
32 We demonstrate this with two different local descriptor types in our experiments. [sent-93, score-0.174]
33 2.3 Locality-constrained Linear Coding Another notable work that overcomes the bottleneck of the local descriptor coding step is locality-constrained linear coding (LLC) proposed by Wang et al. [sent-95, score-0.545]
34 [22], a fast version of local coordinate coding [26]. [sent-96, score-0.267]
35 Given a local feature descriptor x_i, LLC searches for its M nearest dictionary bases; these nearest bases, stacked in columns, are denoted as B_{φ_i} ∈ R^{D×M}, where φ_i indicates the index list of the bases. [sent-97, score-0.692]
36 The final sparse code u_i is obtained by setting its elements indexed at φ_i to u_{φ_i}. [sent-102, score-0.183]
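The following sketch captures the flavour of this coding step under simplifying assumptions of ours: it selects the M nearest bases and solves a small regularized least squares on them, omitting LLC's locality-weighted regularizer and analytical shortcut.

```python
import numpy as np

def llc_style_code(x, B, M=5, reg=1e-4):
    """Approximate a descriptor with its M nearest dictionary bases.

    x : (D,) descriptor, B : (D, K) dictionary with unit-norm columns.
    Returns a length-K code that is non-zero only on the M selected bases.
    """
    K = B.shape[1]
    dists = np.linalg.norm(B - x[:, None], axis=0)   # distance to each basis
    phi = np.argsort(dists)[:M]                      # indices of the nearest bases
    Bphi = B[:, phi]                                 # (D, M) active sub-dictionary
    G = Bphi.T @ Bphi + reg * np.eye(M)              # small regularized least squares
    u_phi = np.linalg.solve(G, Bphi.T @ x)
    u = np.zeros(K)
    u[phi] = u_phi
    return u
```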
37 While it is fast, the resulting sparse solutions are not as discriminative as those obtained by sparse coding. [sent-105, score-0.298]
38 Some descriptors may need more bases for accurate representation, and others may need fewer bases for more distinctiveness. [sent-107, score-0.289]
39 In contrast, the number of bases selected by our post-refinement procedure, which handles the mutual inhibition, differs for each local descriptor. [sent-108, score-0.276]
40 We first learn a dictionary from a collection of local feature descriptors as given in Eqn. [sent-110, score-0.248]
41 Then, based on the slice transform representation, we fit a piece-wise linear mapping function with the generalized lasso to approximate the optimal sparse solutions of the local feature descriptors under ℓ1 regularized sparse coding. [sent-112, score-0.696]
42 3.1 Slice Transform Representation The slice transform representation was introduced by Hel-Or and Shaked [9] as a way to discretize a function space so as to fit a piece-wise linear function for the purpose of image denoising. [sent-115, score-0.102]
43 In this paper, we utilise the representation to approximate sparse coding, obtaining sparse codes for local feature descriptors as fast as possible. [sent-118, score-0.719]
44 Given a local descriptor x, we can linearly combine it with B to obtain z. [sent-119, score-0.174]
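As a rough sketch of how we read the slice transform machinery (helper names are ours, and we take z = Bᵀx as the linear projection): each value z_k is located in a quantization interval and represented by its bin index π(z_k) and interpolation weight r(z_k), which also yields the sparse row S_q(z_k).

```python
import numpy as np

def slice_transform_weights(z, q):
    """Locate each value of z within the bin boundaries q and return (pi, r).

    z : (K,) linear projections of a descriptor onto the bases
    q : (Q,) monotonically increasing bin boundaries
    pi: bin index such that q[pi-1] <= z <= q[pi]
    r : interpolation weight in [0, 1] inside that bin
    """
    pi = np.clip(np.searchsorted(q, z), 1, len(q) - 1)
    r = (z - q[pi - 1]) / (q[pi] - q[pi - 1])
    return pi, np.clip(r, 0.0, 1.0)

def slice_transform_row(z_k, q):
    """Sparse row S_q(z_k): two interpolation weights over the Q boundaries."""
    row = np.zeros(len(q))
    pi, r = slice_transform_weights(np.array([z_k]), q)
    row[pi[0] - 1] = 1.0 - r[0]
    row[pi[0]] = r[0]
    return row
```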
45 ℓ1-regularized sparse coding (L1-SC) in magenta (see Eqn. [sent-152, score-0.31]
46 , p_K} such that the resulting vector approximates the optimal sparse solution of x obtained by ℓ1 regularized sparse coding as much as possible. [sent-176, score-0.505]
47 (7) Hel-Or and Shaked [9] formulated the problem of learning each p_k as regularized least squares, either independently in the transform domain or jointly in the spatial domain. [sent-181, score-0.296]
48 Unlike their setting, we have a significantly larger number of bases, which makes joint optimization of all p_k difficult. [sent-182, score-0.242]
49 Moreover, since we are interested in approximating the sparse solutions which are in the transform domain, we learn each pk independently. [sent-183, score-0.325]
50 , x_N] ∈ R^{D×N} and their corresponding sparse solutions U = [u_1, u_2, . [sent-187, score-0.15]
51 , y_K] ∈ R^{K×N} obtained with ℓ1 regularized sparse coding, we have an optimization problem given as $\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|q - p_k\|_2^2$, (8) where S_k = S_q(z_k). [sent-193, score-0.636]
52 The regularization of the second term is essential to avoid singularity when computing the inverse; as a consequence, p_k is encouraged to align itself to q when few data samples are available. [sent-194, score-0.168]
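For reference, Eqn. (8) has the usual closed-form ridge solution; the snippet below is our own restatement with hypothetical variable names, assuming S_k is stored with one row per training sample and q is the vector defined above.

```python
import numpy as np

def fit_pk_ridge(Sk, yk, q, alpha):
    """Closed form of Eqn. (8): min ||yk - Sk pk||^2 + alpha ||q - pk||^2.

    Sk : (N, Q) slice-transform design matrix for basis k
    yk : (N,) optimal sparse coefficients for basis k
    q  : (Q,) bin boundaries acting as the prior on pk
    """
    Q = Sk.shape[1]
    A = Sk.T @ Sk + alpha * np.eye(Q)     # regularization avoids singularity
    b = Sk.T @ yk + alpha * q             # prior pulls pk toward q
    return np.linalg.solve(A, b)
```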
53 This might have been a reasonable prior for image denoising [9], but it is not desirable for the purpose of approximating sparse coding, as we would like to suppress most of the coefficients in u to zero. [sent-195, score-0.198]
54 Figure 1 shows the distribution of one dimension of sparse coefficients z obtained from a collection of SIFT descriptors and q does not look similar to the distribution. [sent-196, score-0.215]
55 This motivates us to look at the generalized lasso [21] as an alternative for obtaining a better fit of the distribution of the coefficients. [sent-197, score-0.099]
56 3.2 Generalized Lasso In the previous section, we argued that the regularized least squares stated in Eqn. [sent-199, score-0.092]
57 This naturally leads us to consider ℓ1 regularized sparse coding, also known as the lasso, which is formulated as: $\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|p_k\|_1$. (9) [sent-202, score-0.873]
58 It turns out that ℓ1 trend filtering [12], generally known as the generalized lasso [21], can overcome this problem. [sent-204, score-0.099]
59 This is expressed as $\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|D p_k\|_1$, (10) where D ∈ R^{(Q−2)×Q} is referred to as a penalty matrix and is defined as the banded second-difference matrix [sent-205, score-0.316]
60 $D = \begin{bmatrix} -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \end{bmatrix}$. (11) To solve the above optimization problem, we can turn it into the sparse coding problem [21]. [sent-215, score-0.31]
61 After some substitutions, where θ_1 = D p_k and θ_2 = A p_k (so that S_k p_k = S_k D̃^{-1} θ = S_{k1} θ_1 + S_{k2} θ_2), we see that θ_2 can be solved by θ_2 = (S_{k2}^⊤ S_{k2})^{-1} S_{k2}^⊤ (y_k − S_{k1} θ_1), given that θ_1 is solved already. [sent-225, score-0.169]
62 Now, to solve θ_1, we have the following sparse coding problem: $\min_{\theta_1} \|(I − P)y_k − (I − P)S_{k1} \theta_1\|_2^2 + \alpha \|\theta_1\|_1$, (12) where P = S_{k2}(S_{k2}^⊤ S_{k2})^{-1} S_{k2}^⊤. [sent-226, score-0.31]
63 Having computed both θ_1 and θ_2, we can recover the solution of p_k by D̃^{-1} θ. [sent-227, score-0.153]
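A rough sketch of this reduction is given below. The padding rows A, the plain ISTA inner solver, and all iteration and step-size choices are our own assumptions; D̃ denotes the invertible matrix obtained by stacking D and A.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, alpha, iters=500):
    """min_x ||b - A x||^2 + alpha ||x||_1 via iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2 + 1e-12        # spectral norm squared
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + A.T @ (b - A @ x) / L, alpha / (2 * L))
    return x

def fit_pk_generalized_lasso(Sk, yk, alpha):
    """Solve Eqn. (10) by the substitution described above (sketch)."""
    Q = Sk.shape[1]
    # second-difference penalty D (Q-2 x Q) with stencil (-1, 2, -1)
    D = np.zeros((Q - 2, Q))
    for i in range(Q - 2):
        D[i, i:i + 3] = [-1.0, 2.0, -1.0]
    # two extra rows A pin the first two entries so D-tilde is invertible
    A = np.vstack([np.eye(2), np.zeros((Q - 2, 2))]).T
    D_tilde = np.vstack([D, A])
    S_tilde = Sk @ np.linalg.inv(D_tilde)        # columns split into [Sk1 | Sk2]
    Sk1, Sk2 = S_tilde[:, :Q - 2], S_tilde[:, Q - 2:]
    P = Sk2 @ np.linalg.solve(Sk2.T @ Sk2, Sk2.T)            # projection matrix
    I = np.eye(len(yk))
    theta1 = lasso_ista((I - P) @ Sk1, (I - P) @ yk, alpha)  # Eqn. (12)
    theta2 = np.linalg.solve(Sk2.T @ Sk2, Sk2.T @ (yk - Sk1 @ theta1))
    return np.linalg.inv(D_tilde) @ np.concatenate([theta1, theta2])  # pk
```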
64 Given the learnt p, we can approximate the sparse solution of x by Eqn. [sent-229, score-0.146]
65 Thus, we can alternatively compute each component of û as follows: $\hat{u}_k = (1 − r(z_k)) \times p_k(\pi(z_k) − 1) + r(z_k) \times p_k(\pi(z_k))$, (13) whose time complexity becomes O(K). [sent-232, score-0.278]
66 In Eqn. (13), since we are essentially using p_k as a lookup table, the complexity is independent of Q. [sent-234, score-0.139]
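A sketch of this O(K) test-time lookup (names are ours; we again take z = Bᵀx as the linear projection and store the learned mappings as rows of a matrix P):

```python
import numpy as np

def glas_encode(x, B, P, q):
    """Approximate sparse code via Eqn. (13): per-basis piecewise-linear lookup.

    x : (D,) descriptor, B : (D, K) dictionary,
    P : (K, Q) learned mapping values, one row p_k per basis,
    q : (Q,) shared bin boundaries.
    """
    z = B.T @ x                                        # linear projections
    pi = np.clip(np.searchsorted(q, z), 1, len(q) - 1)
    r = np.clip((z - q[pi - 1]) / (q[pi] - q[pi - 1]), 0.0, 1.0)
    rows = np.arange(len(z))
    # interpolate between the two neighbouring lookup values, O(K) overall
    return (1.0 - r) * P[rows, pi - 1] + r * P[rows, pi]
```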
67 (3), it does not yet capture any “explaining away” effect where the corresponding coefficients of correlated bases are mutually inhibited to remove redundancy. [sent-237, score-0.103]
68 This is because each pk is estimated independently in the transform domain [9]. [sent-238, score-0.175]
69 3.3 Capturing Dependency Between Bases To handle the mutual inhibition between overcomplete bases, this section explains how to refine the sparse codes by solving regularized least squares on a significantly smaller active basis set. [sent-241, score-0.427]
70 Given a local descriptor x and its initial sparse code û estimated with the above method, we set the non-zero components of the code to be active. [sent-242, score-0.362]
71 By denoting the set of these active components as φ, we have û_φ and B_φ, which are the subsets of the sparse code and dictionary bases, respectively. [sent-243, score-0.353]
72 The intuition behind the above formulation is that the initial sparse code û is considered a good starting point for refinement, further reducing the reconstruction error by allowing redundant bases to compete against each other. [sent-247, score-0.263]
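A minimal sketch of such a refinement, assuming a ridge-regularized least squares restricted to the active bases (the paper's exact refinement objective may differ):

```python
import numpy as np

def refine_active_set(x, B, u_hat, reg=1e-2):
    """Re-estimate only the non-zero coefficients so correlated bases compete.

    x     : (D,) descriptor
    B     : (D, K) dictionary
    u_hat : (K,) initial GLAS code; its non-zero entries define the active set phi
    """
    u = np.zeros_like(u_hat)
    phi = np.flatnonzero(u_hat)
    if phi.size == 0:
        return u
    Bphi = B[:, phi]
    G = Bphi.T @ Bphi + reg * np.eye(phi.size)   # Gram matrix couples the active bases
    u[phi] = np.linalg.solve(G, Bphi.T @ x)
    return u
```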
73 We also report the time taken to process 1000 local descriptors for each method. [sent-313, score-0.152]
74 However, in our case, we do not preset the number of active bases; it is determined by the non-zero components of û. [sent-317, score-0.117]
75 To learn the mapping function, we have used 50,000 local descriptors as data samples. [sent-322, score-0.177]
76 LLC is locality-constrained linear coding proposed by Wang et al. [sent-329, score-0.193]
77 We also include KM which builds its codebook with k-means clustering and adopts hard-assignment as its local descriptor coding. [sent-334, score-0.174]
78 For all methods, exactly the same local feature descriptors, spatial max pooling technique and linear SVM are used, so that only the difference between the local feature descriptor coding techniques is compared. [sent-335, score-0.524]
79 In contrast, Local Self-Similarity computes correlation between a small image patch of interest and its surrounding region which captures the geometric layout of a local region. [sent-339, score-0.133]
80 For SIFT, GLAS+ is consistently better than GLAS, demonstrating the effectiveness of the mutual inhibition performed by the post-refinement procedure. [sent-348, score-0.105]
81 Both GLAS and GLAS+ perform better than other fast algorithms that produce sparse codes. [sent-349, score-0.167]
82 While sparse codes for both GLAS and GLAS+ are learned from the solutions of SC, the approximated codes are not exactly the same as the ones of SC. [sent-352, score-0.276]
83 Moreover, SC sometimes produces unstable codes due to the non-smooth convex property of the ℓ1 norm, as previously observed in [6]. [sent-353, score-0.094]
84 [Figure 2, panels (b) and (c): plot residue; curves for SC, RLS, GLAS and GLAS+ against alpha and against % of missing data (0–40%).] Figure 2: (a) Q, the number of bins to quantize the interval of each sparse code component. [sent-355, score-0.178]
85 (c) When some data samples are missing, GLAS is more robust than the regularized least squares given in Eqn. [sent-357, score-0.115]
86 approximates its sparse codes with a relatively smooth piece-wise linear mapping function learned with the generalized lasso (note that the ℓ1 norm penalizes changes in the shape of the function) and performs smooth post-refinement. [sent-359, score-0.351]
87 GLAS outperforms PSD, probably because the distribution of sparse codes is not captured well by a simple shrinkage function. [sent-362, score-0.216]
88 Table 1 also reports computational time taken to process 1000 local descriptors for each method. [sent-370, score-0.152]
89 We also validate if the generalized lasso given in Eqn. [sent-380, score-0.099]
90 The performance of PSD is close to KM and is outperformed by GLAS, suggesting the inadequate fitting of sparse codes. [sent-394, score-0.132]
91 We used SIFT to learn 1024 dictionary bases for each method. [sent-461, score-0.179]
92 5 Conclusion This paper has presented an approximation of ℓ1 sparse coding based on the generalized lasso, called GLAS. [sent-467, score-0.409]
93 This is further extended with the post-refinement procedure to handle mutual inhibition between bases, which is essential in an overcomplete setting. [sent-468, score-0.257]
94 We have also demonstrated the effectiveness of GLAS on two local descriptor types, namely SIFT and Local Self-Similarity, whereas LLC and PSD only perform well on one type. [sent-470, score-0.189]
95 GLAS is not restricted to approximating ℓ1 sparse coding only, but should be applicable to other variations of sparse coding in general. [sent-471, score-0.442]
96 For example, it may be interesting to try GLAS on Laplacian sparse coding [6], which achieves smoother sparse codes than ℓ1 sparse coding. [sent-472, score-0.637]
97 Optimally sparse representation in general (nonorthogonal) dictionaries via L1 minimization. [sent-489, score-0.132]
98 Fast inference in sparse coding algorithms with applications to object recognition. [sent-535, score-0.339]
99 Sparse coding with an overcomplete basis set: A strategy employed by V1? [sent-585, score-0.228]
100 Linear spatial pyramid matching using sparse coding for image classification. [sent-627, score-0.462]
wordName wordTfidf (topN-words)
[('glas', 0.838), ('coding', 0.178), ('llc', 0.148), ('sc', 0.143), ('pk', 0.139), ('psd', 0.137), ('sparse', 0.132), ('descriptor', 0.105), ('bases', 0.103), ('sift', 0.093), ('rls', 0.091), ('nement', 0.088), ('descriptors', 0.083), ('sq', 0.077), ('dictionary', 0.076), ('km', 0.073), ('local', 0.069), ('zk', 0.064), ('codes', 0.063), ('lasso', 0.059), ('inhibition', 0.056), ('overcomplete', 0.05), ('regularized', 0.049), ('image', 0.046), ('pyramid', 0.044), ('spatial', 0.043), ('yang', 0.042), ('recognition', 0.04), ('generalized', 0.04), ('dpk', 0.039), ('wxi', 0.039), ('yk', 0.038), ('transform', 0.036), ('shaked', 0.034), ('mutual', 0.034), ('sk', 0.033), ('slice', 0.033), ('cvpr', 0.03), ('images', 0.029), ('squares', 0.029), ('object', 0.029), ('ranzato', 0.028), ('code', 0.028), ('visual', 0.028), ('parametric', 0.027), ('reconstructs', 0.027), ('train', 0.026), ('satoh', 0.026), ('urj', 0.026), ('mapping', 0.025), ('missing', 0.023), ('ui', 0.023), ('bui', 0.023), ('kz', 0.023), ('adler', 0.023), ('nearest', 0.021), ('shrinkage', 0.021), ('extrapolate', 0.021), ('tip', 0.021), ('denoising', 0.02), ('pooling', 0.02), ('fast', 0.02), ('feature', 0.02), ('suspect', 0.02), ('nicta', 0.02), ('gg', 0.02), ('matching', 0.019), ('bk', 0.019), ('rd', 0.019), ('qq', 0.019), ('gregor', 0.019), ('patch', 0.018), ('interpolate', 0.018), ('solutions', 0.018), ('interval', 0.018), ('yu', 0.017), ('kavukcuoglu', 0.017), ('norm', 0.017), ('cients', 0.017), ('discriminative', 0.016), ('divided', 0.016), ('coef', 0.016), ('australian', 0.015), ('training', 0.015), ('solved', 0.015), ('et', 0.015), ('align', 0.015), ('scenes', 0.015), ('effectiveness', 0.015), ('performs', 0.015), ('testing', 0.015), ('argued', 0.014), ('qj', 0.014), ('attracted', 0.014), ('solution', 0.014), ('unstable', 0.014), ('category', 0.014), ('samples', 0.014), ('active', 0.014), ('procedure', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
Author: Nobuyuki Morioka, Shin'ichi Satoh
Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems.
2 0.14300501 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1
3 0.11510332 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
4 0.10652998 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
Author: Ioannis A. Gkioulekas, Todd Zickler
Abstract: We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. 1
5 0.094026178 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon
Abstract: We introduce P I C O D ES: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. P I C O D ES of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.
6 0.088239737 276 nips-2011-Structured sparse coding via lateral inhibition
7 0.077182598 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
8 0.074054979 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
9 0.072271429 259 nips-2011-Sparse Estimation with Structured Dictionaries
10 0.068941638 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
11 0.062443785 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
12 0.061956782 222 nips-2011-Prismatic Algorithm for Discrete D.C. Programming Problem
13 0.059567396 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
14 0.058283608 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
15 0.057665933 143 nips-2011-Learning Anchor Planes for Classification
16 0.05567085 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
17 0.055402409 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
18 0.052403357 168 nips-2011-Maximum Margin Multi-Instance Learning
19 0.048705366 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
20 0.048558831 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
topicId topicWeight
[(0, 0.133), (1, 0.092), (2, -0.053), (3, 0.01), (4, -0.005), (5, 0.103), (6, 0.086), (7, 0.153), (8, 0.011), (9, -0.011), (10, -0.018), (11, -0.005), (12, 0.03), (13, 0.067), (14, -0.03), (15, 0.009), (16, -0.015), (17, 0.015), (18, 0.016), (19, 0.016), (20, 0.093), (21, 0.002), (22, 0.081), (23, -0.024), (24, 0.032), (25, -0.013), (26, 0.037), (27, 0.028), (28, -0.032), (29, 0.017), (30, -0.077), (31, -0.021), (32, -0.035), (33, -0.014), (34, -0.137), (35, -0.076), (36, 0.114), (37, 0.076), (38, 0.05), (39, 0.055), (40, 0.04), (41, 0.079), (42, -0.055), (43, 0.055), (44, 0.043), (45, -0.097), (46, 0.031), (47, -0.032), (48, -0.008), (49, 0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.93080682 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
Author: Nobuyuki Morioka, Shin'ichi Satoh
Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems.
2 0.78154099 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1
3 0.63627201 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1
4 0.63096476 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
Author: Ioannis A. Gkioulekas, Todd Zickler
Abstract: We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. 1
5 0.61187446 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
6 0.57445943 143 nips-2011-Learning Anchor Planes for Classification
7 0.53831136 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
8 0.5366627 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
9 0.53087962 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
10 0.51223558 276 nips-2011-Structured sparse coding via lateral inhibition
11 0.50413752 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
12 0.48103523 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
13 0.46489155 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
14 0.42327067 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
15 0.42043465 293 nips-2011-Understanding the Intrinsic Memorability of Images
16 0.41788572 259 nips-2011-Sparse Estimation with Structured Dictionaries
17 0.41235849 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
18 0.40791115 168 nips-2011-Maximum Margin Multi-Instance Learning
19 0.40053141 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
topicId topicWeight
[(0, 0.028), (4, 0.038), (17, 0.013), (20, 0.034), (26, 0.037), (31, 0.049), (33, 0.073), (40, 0.161), (43, 0.04), (45, 0.145), (57, 0.043), (65, 0.098), (74, 0.058), (83, 0.041), (84, 0.019), (99, 0.024)]
simIndex simValue paperId paperTitle
1 0.83486086 107 nips-2011-Global Solution of Fully-Observed Variational Bayesian Matrix Factorization is Column-Wise Independent
Author: Shinichi Nakajima, Masashi Sugiyama, S. D. Babacan
Abstract: Variational Bayesian matrix factorization (VBMF) efficiently approximates the posterior distribution of factorized matrices by assuming matrix-wise independence of the two factors. A recent study on fully-observed VBMF showed that, under a stronger assumption that the two factorized matrices are column-wise independent, the global optimal solution can be analytically computed. However, it was not clear how restrictive the column-wise independence assumption is. In this paper, we prove that the global solution under matrix-wise independence is actually column-wise independent, implying that the column-wise independence assumption is harmless. A practical consequence of our theoretical finding is that the global solution under matrix-wise independence (which is a standard setup) can be obtained analytically in a computationally very efficient way without any iterative algorithms. We experimentally illustrate advantages of using our analytic solution in probabilistic principal component analysis. 1
same-paper 2 0.81815034 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
Author: Nobuyuki Morioka, Shin'ichi Satoh
Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems.
3 0.77415591 30 nips-2011-Algorithms for Hyper-Parameter Optimization
Author: James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl
Abstract: Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P (y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements. 1
4 0.76936853 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1
5 0.76393187 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
6 0.75512499 261 nips-2011-Sparse Filtering
7 0.75403929 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations
8 0.75312471 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
9 0.74996454 244 nips-2011-Selecting Receptive Fields in Deep Networks
10 0.74332315 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
11 0.73861402 180 nips-2011-Multiple Instance Filtering
12 0.72479385 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
13 0.71114683 165 nips-2011-Matrix Completion for Multi-label Image Classification
14 0.71048975 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
15 0.70685911 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
16 0.70608312 168 nips-2011-Maximum Margin Multi-Instance Learning
17 0.70259249 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
18 0.70152044 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
19 0.70059103 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
20 0.7003504 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model