nips nips2010 nips2010-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. [sent-2, score-0.133]
2 Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets. [sent-3, score-0.804]
3 1 Introduction Sparse coding has attracted significant attention in recent years because it has been shown to be effective for some classification problems [12, 10, 9, 13, 11, 14, 2, 5]. [sent-4, score-0.729]
4 In particular, it has been empirically observed that high-dimensional sparse coding plus linear classifier is successful for image classification tasks such as PASCAL 2009 [7, 15]. [sent-5, score-0.852]
5 Specifically, LCC learns a nonlinear function in high dimension by forming an adaptive set of basis functions on the data manifold, and it has nonlinear approximation power. [sent-7, score-0.372]
6 A recent extension of LCC with added local tangent directions [16] demonstrated the possibility to achieve locally quadratic approximation power when the underlying data manifold is relatively flat. [sent-8, score-0.32]
7 This also indicates that the nonlinear function approximation view of sparse coding not only yields deeper theoretical understanding of its success, but also leads to improved algorithms based on refined analysis. [sent-9, score-1.08]
8 This paper follows the same idea, where we propose a principled extension of single-layer sparse coding based on a theoretical analysis of a two-level coding scheme. [sent-10, score-1.591]
9 Such an extension draws a connection to deep belief networks (DBNs) [8], and hence we call this approach a deep coding network. [sent-12, score-0.937]
10 Hierarchical sparse coding has two main advantages over its single-layer counterpart. [sent-13, score-0.811]
11 Second, it is computationally more efficient than flat coding because we only need to look at locations in the second (or higher) layer corresponding to basis functions with nonzero coefficients in the first (or previous) layer. [sent-16, score-1.09]
12 Since sparse coding produces many zero coefficients, the hierarchical structure eliminates much of the coding computation. [sent-17, score-1.554]
13 Moreover, instead of fitting a single model with many variables as in a flat single-layer approach, our proposal of multi-layer coding requires fitting many small models separately, each with a small number of parameters. [sent-18, score-1.027]
14 2 Sparse Coding and Nonlinear Function Approximation This section reviews the nonlinear function approximation results of the single-layer coding scheme in [17], and then presents our multi-layer extension. [sent-22, score-0.958]
15 Since the result of [17] requires a modification of the traditional sparse coding scheme called local coordinate coding (LCC), our analysis will rely on a similar modification. [sent-23, score-1.838]
16 Specifically, it was theoretically shown in [17] that a specific coding scheme called Local Coordinate Coding can take advantage of the underlying data manifold geometric structure in order to learn a nonlinear function in high dimension and alleviate the curse of dimensionality problem. [sent-27, score-1.04]
17 The main idea of LCC, described in [17], is to locally embed points on the underlying data manifold into a lower dimensional space, expressed as coordinates with respect to a set of anchor points. [sent-28, score-0.168]
18 The main theoretical observation was relatively simple: it was shown in [17] that on the data manifold, a nonlinear function can be effectively approximated by a globally linear function with respect to the local coordinate coding. [sent-29, score-0.377]
19 For local constant approximation, the error term $\alpha \|x - x'\|$ is first order in $x - x'$; for local linear approximation, the error term $\beta \|x - x'\|^2$ is second order in $x - x'$; for local quadratic approximation, the error term $\nu \|x - x'\|^3$ is third order in $x - x'$. [sent-42, score-0.345]
20 Similar to the single-layer coordinate coding in [17], here we define a two-layer coordinate coding as follows. [sent-44, score-1.678]
21 Definition 2 (Coordinate Coding) A single-layer coordinate coding is a pair $(\gamma^1, C^1)$, where $C^1 \subset \mathbb{R}^d$ is a set of anchor points (aka basis functions), and $\gamma^1$ is a map of $x \in \mathbb{R}^d$ to $[\gamma^1_v(x)]_{v \in C^1} \in \mathbb{R}^{|C^1|}$ such that $\sum_{v \in C^1} \gamma^1_v(x) = 1$. [sent-46, score-0.979]
22 We write $h_{\gamma^1,C^1}(x) = \sum_{v \in C^1} \gamma^1_v(x)\, v$ for the induced approximation of $x$. A two-layer coordinate coding $(\gamma, C)$ consists of coordinate coding systems $\{(\gamma^1, C^1)\} \cup \{(\gamma^{2,v}, C^{2,v}) : v \in C^1\}$. [sent-48, score-1.678]
23 The pair $(\gamma^1, C^1)$ is the first-layer coordinate coding, and the $(\gamma^{2,v}, C^{2,v})$ are second-layer coordinate-coding pairs that refine the first-layer coding for every first-layer anchor point $v \in C^1$. [sent-49, score-1.799]
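To make the two-layer structure concrete, here is a minimal sketch (an illustration only, not the paper's implementation: the k-nearest-anchor coder, the shapes, and all names are assumptions) of how a two-layer code could be assembled; note that a second-layer codebook is only visited when its first-layer anchor receives a nonzero coefficient, which is also where the computational saving mentioned in the introduction comes from.

```python
import numpy as np

def knn_code(x, anchors, k=3):
    """Toy coordinate coding: put weight on the k nearest anchors, inversely
    proportional to distance and normalized to sum to one (so the sum-to-one
    requirement of the coordinate-coding definition holds)."""
    dist = np.linalg.norm(anchors - x, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + 1e-8)
    code = np.zeros(len(anchors))
    code[idx] = w / w.sum()
    return code

def two_layer_code(x, C1, C2):
    """C1: (L1, d) first-layer anchors; C2[j]: (L2, d) codebook refining anchor j.
    Second-layer codebooks are consulted only for anchors with nonzero weight."""
    g1 = knn_code(x, C1)
    g2 = {j: knn_code(x, C2[j]) for j in np.flatnonzero(g1)}
    return g1, g2

# Tiny usage example with random data; the shapes are purely illustrative.
rng = np.random.default_rng(0)
C1 = rng.normal(size=(8, 5))                                       # 8 first-layer anchors in R^5
C2 = {j: C1[j] + 0.1 * rng.normal(size=(4, 5)) for j in range(8)}  # 4 refinements per anchor
g1, g2 = two_layer_code(rng.normal(size=5), C1, C2)
```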
24 The performance of LCC is characterized in [17] using the following nonlinear function approximation result. [sent-50, score-0.153]
25 Lemma 1 (Single-layer LCC Nonlinear Function Approximation) Let $(\gamma^1, C^1)$ be an arbitrary single-layer coordinate coding scheme on $\mathbb{R}^d$. [sent-52, score-0.942]
26 We have for all $x \in \mathbb{R}^d$: $\big| f(x) - \sum_{v \in C^1} w_v \gamma^1_v(x) \big| \le \alpha \big\| x - h_{\gamma^1,C^1}(x) \big\| + \beta \sum_{v \in C^1} |\gamma^1_v(x)| \, \|v - x\|^2$, (1) where $w_v = f(v)$ for $v \in C^1$. [sent-54, score-0.286]
27 This result shows that a high dimensional nonlinear function can be globally approximated by a linear function with respect to the single-layer coding $[\gamma^1_v(x)]$, with unknown linear coefficients $[w_v]_{v \in C^1} = [f(v)]_{v \in C^1}$, where the approximation error on the right hand side is second order. [sent-55, score-0.97]
28 This bound directly suggests the following learning method: for each $x$, we use its coding $[\gamma^1_v(x)] \in \mathbb{R}^{|C^1|}$ as features. [sent-56, score-0.702]
29 We then learn a linear function of the form $\sum_{v} w_v \gamma^1_v(x)$ using a standard linear learning method such as SVM, where $[w_v]$ is the unknown coefficient vector to be learned. [sent-57, score-0.164]
30 The optimal coding can be learned using unlabeled data by optimizing the right hand side of (1) over unlabeled data. [sent-58, score-0.854]
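As a concrete illustration of this recipe — code each point, then fit a linear model on the codes — the sketch below uses a toy nearest-anchor coder and scikit-learn's LinearSVC on synthetic data; the coder, the data, and the anchor selection are placeholders rather than the paper's learned coding.

```python
import numpy as np
from sklearn.svm import LinearSVC

def knn_codes(X, anchors, k=3):
    """Toy stand-in for the learned map gamma^1: weights on the k nearest anchors,
    normalized to sum to one. X: (n, d), anchors: (L1, d) -> codes: (n, L1)."""
    codes = np.zeros((X.shape[0], anchors.shape[0]))
    for i, x in enumerate(X):
        dist = np.linalg.norm(anchors - x, axis=1)
        idx = np.argsort(dist)[:k]
        w = 1.0 / (dist[idx] + 1e-8)
        codes[i, idx] = w / w.sum()
    return codes

# Synthetic two-class data and "anchors" sampled from it, purely for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(2, 1, (100, 10))])
y = np.repeat([0, 1], 100)
anchors = X[rng.choice(len(X), 32, replace=False)]    # pretend these were learned

features = knn_codes(X, anchors)                      # [gamma^1_v(x)] used as features
clf = LinearSVC(C=1.0).fit(features, y)               # learns the coefficients w_v
print("training accuracy:", clf.score(features, y))
```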
31 Lemma 2 (Two-layer LCC Nonlinear Function Approximation) Let $(\gamma, C) = \{(\gamma^1, C^1)\} \cup \{(\gamma^{2,v}, C^{2,v}) : v \in C^1\}$ be an arbitrary two-layer coordinate coding on $\mathbb{R}^d$. [sent-62, score-0.839]
32 We have for all $x \in \mathbb{R}^d$: $\big| f(x) - \sum_{v \in C^1} w_v \gamma^1_v(x) - \sum_{v \in C^1} \gamma^1_v(x) \sum_{u \in C^{2,v}} w_{v,u} \gamma^{2,v}_u(x) \big| \le 0.5\,\alpha \sum_{v \in C^1} |\gamma^1_v(x)| \, \big\| x - h_{\gamma^{2,v},C^{2,v}}(x) \big\| + \nu \sum_{v \in C^1} |\gamma^1_v(x)| \, \|x - v\|^3$, (2) where $w_v = f(v)$ for $v \in C^1$ and $w_{v,u} = 0.\ldots$ (the definition of $w_{v,u}$ is truncated in the extracted text). [sent-64, score-0.143] [sent-66, score-0.143]
34 The coding can be learned from unlabeled data by minimizing the right hand side of (2) or (3). [sent-71, score-0.821]
35 Compared with the single-layer coding, we note that the second term on the right hand side of (1) is replaced by the third term on the right hand side of (2). [sent-72, score-0.242]
36 That is, the linear approximation power of the single-layer coding scheme (with a quadratic error term) becomes quadratic approximation power of the two-layer coding scheme (with a cubic error term). [sent-73, score-1.898]
37 The first term on the right hand side of (1) is replaced by the first two terms on the right hand side of (2). [sent-74, score-0.165]
38 If the manifold is relatively flat, then the error terms $\|x - h_{\gamma^1,C^1}(x)\|$ and $\|x - h_{\gamma^{2,v},C^{2,v}}(x)\|$ will be relatively small in comparison to the second term on the right hand side of (1). [sent-75, score-0.253]
39 In such a case, the two-layer coding scheme can potentially improve on the single-layer system significantly. [sent-76, score-0.805]
40 This result is similar to that of [16], where the second layer uses local PCA instead of another layer of nonlinear coding. [sent-77, score-0.753]
41 The bound in (2) shows the potential of the two-layer coding scheme in achieving higher order approximation power than single layer coding. [sent-80, score-1.206]
42 In such a case, the bound in (3) shows that the performance of the two-level coding is still comparable to that of the one-level coding scheme in (1). [sent-83, score-1.507]
43 This is the situation where the first layer is mainly used to partition the space (its own approximation accuracy being less important), while the main approximation power is achieved with the second layer. [sent-84, score-0.432]
44 The main advantage of two-layer coding in this case is to save computation. [sent-85, score-0.702]
45 This is because instead of solving a single layer coding system with many parameters, we can solve many smaller coding systems, each with a small number of parameters. [sent-86, score-1.706]
46 This is the situation when including nonlinearity in the second layer becomes useful, which means that the deep-coding network approach in this paper has some advantage over [16], which can only approximate a linear function with local PCA in the second layer. [sent-87, score-0.375]
47 While the two bounds (2) and (3) consider different scenarios depending on the relative size of the first layer versus the second layer, in reality it is difficult to differentiate and usually both bounds play a role at the same time. [sent-90, score-0.302]
48 $\ldots, v_{j,L_2}\}$, and $\gamma^{2,v_j}_{v_{j,k}}(X_i) = \gamma^i_{j,k}$, where $L_1$ is the size of the first-layer codebook, and $L_2$ is the size of each individual codebook at the second layer. [sent-100, score-0.238]
49 We take a layer-by-layer approach for training, where the second layer is regarded as a refinement of the first layer, which is consistent with Lemma 2. [sent-101, score-0.323]
50 In the first layer, we learn a simple sparse coding model with all data: $[\gamma^1, C^1] = \arg\min_{\gamma, v} \frac{1}{2}\sum_{i=1}^{n}\big\|X_i - \sum_{j=1}^{L_1}\gamma^i_j v_j\big\|^2$ subject to $\gamma^i_j \ge 0$, $\sum_j \gamma^i_j = 1$, $\|v_j\|^2 \le \kappa$, (4) where $\kappa$ is some constant, e.g. [sent-103, score-0.888]
51 For convenience, we not only enforce the sum-to-one constraint on the sparse coefficients, but also impose nonnegativity constraints so that $\sum_j |\gamma^i_j| = \sum_j \gamma^i_j = 1$ for all $i$. [sent-106, score-0.198]
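For each data point, the feasible set in (4) is therefore exactly the probability simplex. A minimal sketch of the standard sort-based Euclidean projection onto that set is given below; it only illustrates the constraint and is not the solver used in the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {g : g >= 0, sum(g) = 1}, via the usual
    sort-and-threshold algorithm (O(L log L) for a length-L coefficient vector)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

g = project_simplex(np.array([0.8, 0.5, -0.3]))   # -> [0.65, 0.35, 0.0]
assert abs(g.sum() - 1.0) < 1e-12 and (g >= 0).all()
```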
52 where $s$ is a scaling factor that balances the coding from the two different layers. [sent-119, score-0.702]
53 2 Multi-layer Extension The two-level coding scheme can be easily extended to the third and higher layers. [sent-121, score-0.864]
54 For example, at the third layer, for each base $v_{j,k}$, the third-layer coding is to solve the following weighted optimization: $[\gamma_3^{j,k}, C_3^{j,k}] = \arg\min_{\gamma, v} \frac{1}{2}\sum_{i=1}^{n}\gamma^i_{j,k}\big\|X_i - \sum_{l=1}^{L_3}\gamma^i_{j,k,l} v_{j,k,l}\big\|^2 + \lambda_3 \sum_{i,l}\gamma^i_{j,k,l}$, subject to $\gamma^i_{j,k,l} \ge 0$, $\|v_{j,k,l}\|^2 \le 1$. [sent-122, score-0.722]
55 3 Optimization The optimization problems in Equations (4) to (6) can be generally solved by alternating the following two steps: 1) given the current codebook estimate v, compute the optimal sparse coefficients γ; 2) given the new estimates of the sparse coefficients, optimize the codebooks. [sent-124, score-0.254]
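A minimal sketch of this alternation for the first-layer problem (4) is given below. It is a simplification for illustration only: the coefficient step uses projected gradient with a simplex projection, and the codebook step refits by least squares and then rescales basis norms, whereas the paper relies on active-set and projected Newton solvers; all names and step choices here are assumptions.

```python
import numpy as np

def project_simplex(v):
    # Sort-based projection onto {g >= 0, sum(g) = 1}, as in the earlier sketch.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1.0), 0.0)

def code_step(X, V, n_iter=50):
    """Step 1: with the codebook V (L1, d) fixed, update the codes G (n, L1)
    by projected gradient on 0.5 * ||X - G V||^2."""
    G = np.full((X.shape[0], V.shape[0]), 1.0 / V.shape[0])
    lr = 1.0 / (np.linalg.norm(V, 2) ** 2 + 1e-12)     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = (G @ V - X) @ V.T
        G = np.apply_along_axis(project_simplex, 1, G - lr * grad)
    return G

def codebook_step(X, G, kappa=1.0):
    """Step 2: with the codes fixed, refit the codebook by least squares, then
    rescale rows so that ||v_j||^2 <= kappa (a heuristic, not the exact constrained update)."""
    V, *_ = np.linalg.lstsq(G, X, rcond=None)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V * np.minimum(1.0, np.sqrt(kappa) / (norms + 1e-12))

def learn_first_layer(X, L1=16, n_outer=10, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), L1, replace=False)]       # initialize anchors from the data
    for _ in range(n_outer):
        G = code_step(X, V)
        V = codebook_step(X, G)
    return G, V
```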
56 The optimization problem in (5) contains only nonnegative constraints (but not the sum-to-one constraint), for which we employ a pathwise projected Newton (PPN) method [3] that optimizes a block of coordinates per iteration instead of one coordinate at a time in the active set method. [sent-131, score-0.278]
57 As a result, in typical sparse coding settings (for example, in the experiments that we will present shortly in Section 4), the PPN method is able to give the exact solution of a medium-size (e.g. [sent-132, score-0.811]
58 A significant advantage of the second layer optimization in our proposal is parallelization. [sent-139, score-0.325]
59 As shown in (5), the second-layer sparse coding is decomposed into L1 independent coding problems, and thus can be naturally parallelized. [sent-140, score-1.513]
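The sketch below illustrates that decomposition with Python's multiprocessing standing in for the Hadoop setup used in the experiments; the per-anchor fit_one_codebook routine is a deliberately crude placeholder (it just seeds a small codebook from the points that activate the anchor), and all names and shapes are assumptions.

```python
import numpy as np
from multiprocessing import Pool

def fit_one_codebook(args):
    """One of the L1 independent subproblems in (5): learn a small codebook for the
    data weighted by anchor j's first-layer coefficients. As a placeholder, we just
    sample L2 of the points that activate anchor j instead of actually optimizing."""
    j, X, gamma_j, L2, seed = args
    rng = np.random.default_rng(seed + j)
    active = np.flatnonzero(gamma_j > 1e-8)            # only these points matter for anchor j
    if active.size == 0:
        return j, np.zeros((L2, X.shape[1]))
    picked = rng.choice(active, size=min(L2, active.size), replace=False)
    return j, X[picked]

def fit_second_layer(X, Gamma1, L2=4, n_workers=4, seed=0):
    """Gamma1: (n, L1) first-layer codes. The L1 subproblems share no variables, so they
    can be farmed out to a local worker pool (or, at larger scale, to a Hadoop cluster).
    Run under `if __name__ == "__main__":` when executing as a script."""
    jobs = [(j, X, Gamma1[:, j], L2, seed) for j in range(Gamma1.shape[1])]
    with Pool(n_workers) as pool:
        return dict(pool.map(fit_one_codebook, jobs))  # anchor index -> its codebook
```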
60 1 MNIST dataset We first demonstrate the effectiveness of the proposed deep coding scheme on the popular MNIST benchmark data [1]. [sent-143, score-0.907]
61 In our experiments with the deep coding network, the entire training set is used to learn the first-layer coding, with a codebook of size 64. [sent-145, score-1.063]
62 For each of the 64 bases in the first layer, a second-layer codebook was learned – the deep coding scheme presented in the paper ensures that the codebook learning can be done independently. [sent-146, score-1.562]
63 We implemented a Hadoop parallel program that solved the 64 codebook learning tasks in about an hour – which would have taken 64 hours on a single machine. [sent-147, score-0.261]
64 This shows that easy parallelization is a very attractive aspect of the proposed deep coding scheme, especially for large scale problems. [sent-148, score-0.804]
65 Table 1 shows the performance of the deep coding network on MNIST compared to some previous coding schemes. [sent-149, score-1.532]
66 First, adding an extra layer yields a significant improvement in classification; e.g. [sent-151, score-0.302]
67 for L1 = 512, the classification error rate for single layer LCC is 2. [sent-153, score-0.324]
68 98% [16] (the extended LCC method in [16] may also be regarded as a two-layer method, but its second layer is linear); the two-layer coding scheme here significantly improves the performance, with a classification error rate of 1. [sent-155, score-1.471]
69 Second, the two-layer coding is less prone to overfitting than its single-layer counterpart. [sent-157, score-0.702]
70 In fact, for the single-layer coding, our experiment shows that further increasing the codebook size will cause overfitting (e. [sent-158, score-0.238]
71 In contrast, the performance of two-layer coding still improves when the second-layer codebook is as large as 512 (and the total codebook size is 64 × 512 = 32768, which is very high-dimensional considering that the total number of training examples is only 60,000). [sent-162, score-1.178]
72 This property is especially desirable when a high-dimensional representation is preferred, as in the case of using sparse coding plus a linear classifier for classification. [sent-163, score-0.811]
73 Figure 1 shows some first-layer bases and their associated second-layer bases. [sent-164, score-0.179]
74 We can see that the second-layer bases provide deeper details that help to further explain their first-layer parent basis; on the other hand, the parent first-layer basis provides an informative context for its child second-layer bases. [sent-165, score-0.743]
75 For the example in Figure 1 where the first-layer basis is like Digit 7, this basis can come from Digit 7, Digit 9, or even Digit 4. [sent-167, score-0.172]
76 Then, its second-layer bases help to further explain the meaning of the first-layer basis: in its associated second-layer bases, the first two bases in that row are parts of Digit 9 while the last basis in that row is a part of Digit ’4’. [sent-168, score-0.488]
77 Meanwhile, the first-layer 7-like basis provides important context for its second-layer part-like bases – without the first-layer basis, the fragmented parts (like the first two second-layer bases in that row) may not be very informative. [sent-169, score-0.444]
78 The zoomed-in details contained in deeper bases significantly help a classifier to resolve difficult examples, and interestingly, coarser details provide useful context for finer details. [sent-170, score-0.271]
79 Single-layer sparse coding entries (local coordinate coding, number of bases L1 = 512 or 1024) and two-layer sparse coding entries (L1 = 64, number of second-layer bases L2 = 64 or 128) make up the rows and columns of Table 1; the numeric error rates were lost in extraction. [sent-171, score-2.131] [sent-175, score-0.99]
81 Table 1: The classification error rate (in %) on the MNIST dataset with different sparse coding schemes. [sent-183, score-0.833]
82 Figure 1: Example of bases from a two-layer coding network on MNIST data (panels: first-layer bases and their associated second-layer bases). [sent-184, score-1.265]
83 The colorbar is the same for all images, but the range it represents differs from image to image – generally, the background color of an image represents zero, and the colors above and below that color represent positive and negative values, respectively. [sent-186, score-0.142]
84 Among different methods, one particularly effective approach is to use sparse coding to derive a codebook of low-level features (such as SIFT) and represent an image as a bag of visual words [15]. [sent-192, score-1.138]
85 Here, we intend to learn two-layer hierarchical codebooks instead of a single flat codebook for the bag-of-words image representation. [sent-193, score-0.404]
86 Then, the SIFT descriptors from all images (both training and validation images) were utilized to learn first-layer codebooks with different dimensions, L1 = 512, 1024 and 2048. [sent-195, score-0.137]
87 Then, given a first-layer codebook, for each basis in the codebook we learned its second-layer codebook of size 64 by solving the weighted optimization in (5). [sent-196, score-0.324]
88 Again, the second-layer codebook learning was done in parallel using Hadoop. [sent-197, score-0.261]
89 With the first-layer and second-layer codebooks, each SIFT feature was coded into a very high-dimensional space; using L1 = 1024 as an example, the coding dimension is very large (the exact figure quoted here was lost in extraction). [sent-198, score-2.682]
90 Table 2: Average precision (in %) of classification on the PASCAL07 dataset using different sparse coding schemes; the table compares single-layer sparse coding against two-layer sparse coding (L2 = 64) across first-layer dimensions L1 = 512, 1024, and 2048, but the numeric entries were lost in extraction. [sent-204, score-0.811]
91 It is clear that the two-layer sparse coding performs significantly better than its single-layer counterpart. [sent-209, score-0.811]
92 We would like to point out that, although we simply employed max-pooling in the experiments, it may not be the best pooling strategy for the hierarchical coding scheme presented in this paper. [sent-210, score-0.869]
93 We believe a better pooling scheme needs to take the hierarchical structure into account, but this remains an open problem and is part of our future work. [sent-211, score-0.167]
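For reference, the kind of max-pooling used in these experiments reduces the per-descriptor codes of one image to a single image-level vector by a coordinate-wise maximum; a minimal sketch is below (the shapes and the whole-image pooling region are illustrative assumptions — pooling could equally be done over spatial cells).

```python
import numpy as np

def max_pool(codes):
    """codes: (num_descriptors, code_dim) sparse codes of one image's SIFT descriptors.
    The coordinate-wise max over descriptors gives one image-level feature vector."""
    return codes.max(axis=0)

# Illustrative shapes only: 500 descriptors, a 1024-dimensional code per descriptor
# (a two-layer code would be far higher-dimensional).
rng = np.random.default_rng(0)
codes = rng.random((500, 1024)) * (rng.random((500, 1024)) < 0.01)  # mostly zeros
image_feature = max_pool(codes)     # shape (1024,), then fed to a linear classifier
```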
94 5 Conclusion This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. [sent-212, score-2.089]
95 The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. [sent-213, score-0.133]
96 There are two main advantages of multi-layer coding: it can potentially achieve better performance because the deeper layers provide more details and structure, and it is computationally more efficient because the coding is decomposed into smaller problems. [sent-214, score-0.831]
97 Experiments showed that the performance of two-layer coding can significantly improve on that of single-layer coding. [sent-215, score-0.702]
98 As for future directions, it will be interesting to explore the deep coding network with more than two layers. [sent-216, score-0.83]
99 However, for more complicated data, deeper coding with multiple layers may be an effective way for gaining finer and finer features. [sent-219, score-0.858]
100 Linear spatial pyramid matching using sparse coding for image classification. [sent-296, score-0.876]
wordName wordTfidf (topN-words)
[('coding', 0.702), ('layer', 0.302), ('lcc', 0.25), ('codebook', 0.238), ('bases', 0.179), ('wv', 0.143), ('coordinate', 0.137), ('sparse', 0.109), ('scheme', 0.103), ('deep', 0.102), ('nonlinear', 0.102), ('deeper', 0.092), ('basis', 0.086), ('mnist', 0.076), ('lipschitz', 0.075), ('nonnegative', 0.067), ('digit', 0.064), ('codebooks', 0.063), ('kai', 0.061), ('ner', 0.059), ('manifold', 0.058), ('anchor', 0.054), ('rd', 0.053), ('pascal', 0.052), ('approximation', 0.051), ('cients', 0.049), ('local', 0.047), ('sift', 0.046), ('coef', 0.046), ('hadoop', 0.044), ('ppn', 0.044), ('secondlayer', 0.044), ('yihong', 0.044), ('newton', 0.043), ('quadratic', 0.043), ('image', 0.041), ('hierarchical', 0.041), ('classi', 0.04), ('rajat', 0.039), ('traditional', 0.038), ('layers', 0.037), ('approximated', 0.036), ('tong', 0.036), ('alexis', 0.036), ('cookbook', 0.036), ('honglak', 0.036), ('tting', 0.035), ('hand', 0.034), ('unlabeled', 0.033), ('side', 0.032), ('projected', 0.032), ('lemma', 0.031), ('locally', 0.031), ('relatively', 0.031), ('dimension', 0.031), ('extension', 0.031), ('images', 0.03), ('hessian', 0.028), ('yu', 0.028), ('vj', 0.028), ('power', 0.028), ('battle', 0.027), ('effective', 0.027), ('network', 0.026), ('raina', 0.026), ('smoothness', 0.025), ('dimensional', 0.025), ('term', 0.025), ('theoretical', 0.024), ('pyramid', 0.024), ('pooling', 0.023), ('curse', 0.023), ('principled', 0.023), ('parallel', 0.023), ('descriptors', 0.023), ('proposal', 0.023), ('row', 0.022), ('smooth', 0.022), ('ected', 0.022), ('re', 0.022), ('constraints', 0.022), ('error', 0.022), ('andrew', 0.021), ('visual', 0.021), ('regarded', 0.021), ('learn', 0.021), ('parent', 0.02), ('proposes', 0.02), ('third', 0.02), ('employ', 0.02), ('higher', 0.02), ('right', 0.02), ('extended', 0.019), ('dennis', 0.019), ('popularized', 0.019), ('colorbar', 0.019), ('airplanes', 0.019), ('lewicki', 0.019), ('rstlayer', 0.019), ('xi', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
2 0.2266365 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
3 0.18488233 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having devicelevel energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage. 1
4 0.16856603 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
5 0.15931979 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun
Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1
6 0.15388536 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
7 0.15043983 268 nips-2010-The Neural Costs of Optimal Control
8 0.14065656 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
9 0.12739709 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
10 0.10829341 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
11 0.10572756 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
12 0.10571076 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
13 0.08888337 271 nips-2010-Tiled convolutional neural networks
14 0.088485666 37 nips-2010-Basis Construction from Power Series Expansions of Value Functions
15 0.087499008 96 nips-2010-Fractionally Predictive Spiking Neurons
16 0.084087811 133 nips-2010-Kernel Descriptors for Visual Recognition
17 0.082479075 99 nips-2010-Gated Softmax Classification
18 0.080111578 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
19 0.078721106 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
20 0.075831808 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
topicId topicWeight
[(0, 0.186), (1, 0.083), (2, -0.148), (3, -0.021), (4, 0.111), (5, -0.066), (6, 0.011), (7, 0.169), (8, -0.145), (9, -0.01), (10, 0.112), (11, -0.062), (12, 0.003), (13, -0.195), (14, -0.169), (15, -0.144), (16, -0.067), (17, 0.04), (18, -0.131), (19, -0.104), (20, 0.116), (21, -0.096), (22, -0.105), (23, -0.038), (24, -0.063), (25, -0.029), (26, -0.093), (27, -0.025), (28, -0.022), (29, 0.2), (30, -0.013), (31, 0.091), (32, -0.106), (33, 0.078), (34, -0.074), (35, 0.029), (36, 0.082), (37, 0.06), (38, -0.055), (39, -0.038), (40, 0.036), (41, -0.01), (42, 0.09), (43, 0.017), (44, 0.136), (45, -0.056), (46, 0.085), (47, -0.072), (48, -0.031), (49, -0.081)]
simIndex simValue paperId paperTitle
same-paper 1 0.97542661 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
2 0.82863462 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having devicelevel energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage. 1
3 0.77262533 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
4 0.67211068 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
5 0.6404897 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
6 0.58891881 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
7 0.57990313 271 nips-2010-Tiled convolutional neural networks
8 0.50964206 37 nips-2010-Basis Construction from Power Series Expansions of Value Functions
9 0.48880786 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
10 0.48540029 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
11 0.46099061 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
12 0.45151493 111 nips-2010-Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model
13 0.4118745 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
14 0.40808693 268 nips-2010-The Neural Costs of Optimal Control
15 0.37126002 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
16 0.36582682 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
17 0.35602009 99 nips-2010-Gated Softmax Classification
18 0.3322477 96 nips-2010-Fractionally Predictive Spiking Neurons
19 0.32412603 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation
20 0.31872773 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
topicId topicWeight
[(13, 0.041), (17, 0.014), (27, 0.056), (30, 0.047), (35, 0.381), (45, 0.205), (50, 0.031), (52, 0.033), (60, 0.035), (77, 0.035), (90, 0.034)]
simIndex simValue paperId paperTitle
1 0.89930969 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
Author: Jun Liu, Jieping Ye
Abstract: We consider the tree structured group Lasso where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover the structured sparsity, which is desirable for applications with some meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
2 0.82463205 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas
Author: Georg Langs, Yanmei Tie, Laura Rigolo, Alexandra Golby, Polina Golland
Abstract: Matching functional brain regions across individuals is a challenging task, largely due to the variability in their location and extent. It is particularly difficult, but highly relevant, for patients with pathologies such as brain tumors, which can cause substantial reorganization of functional systems. In such cases spatial registration based on anatomical data is only of limited value if the goal is to establish correspondences of functional areas among different individuals, or to localize potentially displaced active regions. Rather than rely on spatial alignment, we propose to perform registration in an alternative space whose geometry is governed by the functional interaction patterns in the brain. We first embed each brain into a functional map that reflects connectivity patterns during a fMRI experiment. The resulting functional maps are then registered, and the obtained correspondences are propagated back to the two brains. In application to a language fMRI experiment, our preliminary results suggest that the proposed method yields improved functional correspondences across subjects. This advantage is pronounced for subjects with tumors that affect the language areas and thus cause spatial reorganization of the functional regions. 1
same-paper 3 0.82452035 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
4 0.81068122 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method with emphasizing joint 2,1 -norm minimization on both loss function and regularization. The 2,1 -norm based loss function is robust to outliers in data points and the 2,1 norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression based objective makes the feature selection process more efficient. Our method has been applied into both genomic and proteomic biomarkers discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
5 0.72533536 149 nips-2010-Learning To Count Objects in Images
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically-attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amount of visual data. 1
6 0.68460989 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
7 0.66116941 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
8 0.64678228 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
9 0.63197976 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
10 0.62079155 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
11 0.61797309 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
12 0.61709088 181 nips-2010-Network Flow Algorithms for Structured Sparsity
13 0.6168558 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
14 0.61569375 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
15 0.6142534 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
16 0.61087012 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
17 0.61066777 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
18 0.60315675 258 nips-2010-Structured sparsity-inducing norms through submodular functions
19 0.60239077 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
20 0.60236657 5 nips-2010-A Dirty Model for Multi-task Learning