nips nips2013 nips2013-321 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.
Reference: text
sentIndex sentText sentNum sentScore
1 We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. [sent-13, score-0.589]
2 The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e. [sent-14, score-0.523]
3 By restricting the operators to be shift invariant, our approach can be thought as a way of learning sparsity-promoting convolutional operators. [sent-17, score-0.158]
4 Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. [sent-18, score-0.292]
5 We illustrate the proposed models on several experiments in music analysis and image processing applications. [sent-20, score-0.293]
6 Parsimony in the form of sparsity has been shown to be particularly useful in the fields of signal and image processing and machine learning. [sent-23, score-0.134]
7 Sparse models impose sparsity-promoting priors on the signal, which can be roughly categorized as synthesis or analysis. [sent-24, score-0.533]
8 Synthesis priors are generative, asserting that the signal is approximated well as a superposition of a small number of vectors from a (possibly redundant) synthesis dictionary. [sent-25, score-0.578]
9 Analysis priors, on the other hand, assume that the signal admits a sparse projection onto an analysis dictionary. [sent-26, score-0.22]
10 Many classes of signals, in particular, speech, music, and natural images, have been shown to be sparsely representable in overcomplete wavelet and Gabor frames, which have been successfully adopted as synthesis dictionaries in numerous applications [14]. [sent-27, score-0.568]
11 Analysis priors involving differential operators, of which total variation is a popular instance, have also been shown to be very successful in regularizing ill-posed image restoration problems [19]. [sent-28, score-0.212]
12 Despite the spectacular success of these axiomatically constructed synthesis and analysis operators, significant empirical evidence suggests that better performance is achieved when a data- or problem-specific dictionary is used instead of a predefined one. [sent-30, score-0.606]
13 Works [1, 16], followed by many others, demonstrated that synthesis dictionaries can be constructed to best represent training data by solving what is essentially a matrix factorization problem. [sent-31, score-0.531]
14 Despite the lack of convexity, many efficient dictionary learning procedures have been proposed. [sent-32, score-0.122]
15 This unsupervised or data-driven approach to synthesis dictionary learning is well-suited for reconstruction tasks such as image restoration. [sent-33, score-0.738]
16 For example, synthesis models with learned dictionaries have achieved excellent results in denoising [9, 13]. [sent-34, score-0.541]
17 However, in discriminative tasks such as classification, good data reconstruction is not necessarily required or even desirable. [sent-35, score-0.122]
18 While supervised analysis operator learning has mainly been used as a regularizer for inverse problems, e. [sent-38, score-0.333]
19 , denoising [5], we argue that it is often better suited for classification tasks than its synthesis counterpart, since the feature learning and the reconstruction are separated. [sent-40, score-0.586]
20 For the synthesis case, the task-oriented bilevel optimization problem is smooth and can be efficiently solved using stochastic gradient descent (SGD) [12]. [sent-42, score-0.71]
21 However, [12] heavily relies on the separability of the proximal operator of the ℓ1 norm, and thus cannot be extended to the analysis case, where the ℓ1 term is not separable. [sent-43, score-0.167]
22 The approach proposed in [17] formulates an analysis model with a smoothed ℓ1-type prior and uses implicit differentiation to obtain the gradients with respect to the dictionary that are required for the solution of the bilevel problem. [sent-44, score-0.57]
23 However, such approximate priors are known to produce inferior results compared to their exact counterparts. [sent-45, score-0.181]
24 This paper focuses on supervised learning of synthesis and analysis priors, making three main contributions: First, we consider a more general sparse model encompassing analysis and synthesis priors as particular cases, and formulate its supervised learning as a bilevel optimization problem. [sent-47, score-1.77]
25 We propose a new analysis technique, for which the (almost everywhere) smoothness of the proposed bilevel problem is shown, and its exact subgradients are derived. [sent-48, score-0.352]
26 We also show that the model can be extended to include a sensing matrix and a non-Euclidean metric in the data term, both of which can be learned as well. [sent-49, score-0.135]
27 Second, we show a systematic way of constructing fast fixed-complexity approximations to the solution of the proposed exact pursuit problem by unrolling a few iterations of the exact iterative solver into a feed-forward network, whose parameters are learned in the supervised regime. [sent-51, score-0.713]
28 The idea of deriving a fast approximation of sparse codes from an iterative algorithm has recently been successfully advocated in [11] for the synthesis model. [sent-52, score-0.673]
29 The fast approximation in this case assumes the form of a convolutional neural network. [sent-55, score-0.123]
30 2 Analysis, synthesis, and mixed sparse models. We consider a generalization of the Lasso-type [21, 22] pursuit problem $\min_{y} \frac{1}{2}\|M_1 x - M_2 y\|_2^2 + \lambda_1 \|\Omega y\|_1 + \frac{\lambda_2}{2} \|y\|_2^2$ (1), where x ∈ R^n, y ∈ R^k, M1, M2 are m × n and m × k, respectively, Ω is r × k, and λ1, λ2 > 0 are parameters. [sent-56, score-0.283]
31 Here, σt(z) = sign(z) · max{|z| − t, 0} denotes the element-wise soft thresholding (the proximal operator of the ℓ1 norm). [sent-63, score-0.129]
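To make the pursuit problem (1) and the soft-thresholding operator concrete, below is a minimal NumPy sketch of a generic ADMM solver for (1) obtained by splitting z = Ωy. The penalty parameter rho, the fixed iteration count, and all variable and function names are illustrative assumptions; this is a standard instance of the splitting the paper builds on, not its exact algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    # Element-wise soft thresholding: the proximal operator of the l1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pursuit_admm(x, M1, M2, Omega, lam1, lam2, rho=1.0, n_iter=100):
    # ADMM sketch for min_y 0.5*||M1 x - M2 y||^2 + lam1*||Omega y||_1 + 0.5*lam2*||y||^2,
    # using the splitting z = Omega y (not necessarily the paper's exact solver).
    k, r = M2.shape[1], Omega.shape[0]
    z, u = np.zeros(r), np.zeros(r)            # split variable and scaled dual
    A = M2.T @ M2 + lam2 * np.eye(k) + rho * Omega.T @ Omega
    b0 = M2.T @ (M1 @ x)                       # data-dependent term, fixed per input
    for _ in range(n_iter):
        y = np.linalg.solve(A, b0 + rho * Omega.T @ (z - u))   # quadratic y-update
        z = soft_threshold(Omega @ y + u, lam1 / rho)          # proximal z-update
        u = u + Omega @ y - z                                  # dual ascent
    return y
```

In the purely synthesis setting (M1 = I, M2 = D, Ω = I), the z-update reduces to soft thresholding of y itself, recovering the standard Lasso splitting.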
32 On the other hand, by setting M1, M2 = I and Ω a row-overcomplete dictionary (r > k), the standard sparse analysis model is obtained, which attempts to approximate the data vector x by another vector y in the same space admitting a sparse projection on Ω. [sent-67, score-0.402]
33 The analysis model can also be extended by adding an m × k sensing operator M2 = Φ, assuming that x is given in the m-dimensional measurement space. [sent-69, score-0.167]
34 This leads to popular analysis formulations of image deblurring, super-resolution, and other inverse problems. [sent-70, score-0.178]
35 Keeping both the analysis and the synthesis dictionaries and setting M2 = D, Ω̂ = [ΩD; I], leads to the mixed model. [sent-71, score-0.569]
36 Note that the reconstructed data vector is now obtained by x̂ = Dy with sparse y; as a result, the ℓ1 term is extended to make sparse the projection of x̂ on the analysis dictionary Ω, as well as to impose sparsity of y. [sent-72, score-0.444]
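A small sketch of how the mixed-model operators described in the last two sentences could be assembled from a synthesis dictionary D and an analysis dictionary Ω; the stacked form follows the reconstruction above, and the function name is hypothetical.

```python
import numpy as np

def mixed_model_operators(D, Omega):
    # Mixed model: M2 = D and a stacked analysis operator [Omega @ D; I], so the
    # l1 term in (1) penalizes both Omega @ x_hat (with x_hat = D y) and y itself.
    k = D.shape[1]
    M2 = D
    Omega_hat = np.vstack([Omega @ D, np.eye(k)])
    return M2, Omega_hat
```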
37 A particularly important family of analysis operators is obtained when the operator is restricted to be shift-invariant. [sent-75, score-0.195]
38 One of the most attractive properties of the pursuit problem (1) is convexity, which becomes strict for λ2 > 0. [sent-84, score-0.178]
39 3 Bilevel sparse models. A central focus of this paper is to develop a framework for supervised learning of the parameters in (1), collectively denoted by Θ = {M1, M2, D, Ω}, to achieve the best possible performance in a specific task such as reconstruction or classification. [sent-92, score-0.392]
40 In sharp contrast to the generative synthesis models, where data reconstruction can be enforced unsupervisedly, there is no trivial way for unsupervised training of analysis operators without restricting them to satisfy some external, frequently arbitrary, constraints. [sent-94, score-0.651]
41 The solution of the pursuit problem defines, therefore, an unambiguous deterministic map from the space of the observations to the space of the latent variables, which we denote by y∗ (x). [sent-102, score-0.213]
42 The goal of supervised learning is to select the Θ that minimizes the expectation over PX of some problem-specific loss function ℓ. [sent-104, score-0.207]
43 Problem (4) is a bilevel optimization problem [8], as we need to optimize the loss function ℓ, which in turn depends on the minimizer of (1). [sent-107, score-0.344]
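Schematically, the bilevel training amounts to the loop below: solve the lower-level pursuit (1) exactly for each sample, evaluate the task loss on its minimizer, and take a stochastic (sub)gradient step on Θ. The function loss_and_grads stands in for the gradient expressions derived later (Proposition 1) and is a hypothetical placeholder, as are the other names; this sketches the structure, not the paper's implementation.

```python
def train_bilevel(samples, params, pursuit, loss_and_grads, lr=1e-3, n_epochs=10):
    # samples: iterable of (x, target) pairs drawn from P_X
    # params:  dict holding Theta = {M1, M2, D, Omega}
    # pursuit: (x, params) -> y_star, the exact minimizer of the lower-level problem (1)
    # loss_and_grads: (y_star, target, params) -> (loss, dict of gradients w.r.t. Theta),
    #   a placeholder for the subgradients of Proposition 1
    for _ in range(n_epochs):
        for x, target in samples:
            y_star = pursuit(x, params)                         # lower-level problem
            _, grads = loss_and_grads(y_star, target, params)   # upper-level loss
            for name, g in grads.items():                       # SGD step on Theta
                params[name] -= lr * g
    return params
```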
44 As an example, let us examine the generic class of signal reconstruction problems, in which, as explained in Section 2, the matrix M2 = Φ plays the role of a linear degradation (e. [sent-108, score-0.126]
45 , blur and subsampling in case of image super-resolution problems), producing the degraded and, possibly, noisy observation x = Φy+n from the latent clean signal y. [sent-110, score-0.134]
46 While the supervised learning of the analysis operator has been considered for solving denoising problems [5, 17], here we address more general scenarios. [sent-113, score-0.341]
47 In particular, we argue that, when used along with metric learning, it is often better suited for classification tasks than its synthesis counterpart, because the non-generative nature of analysis models is more suitable for feature learning. [sent-114, score-0.525]
48 The gradients needed for the remaining model settings described in Section 2 can be obtained straightforwardly from M2 and Ω. [sent-129, score-0.144]
49 The first-order optimality conditions of (8) lead to the equalities $M_2^T (M_2 y_t^* - M_1 x) + t\,\Omega^T (\Omega y_t^* - z_t^*) + \lambda_2 y_t^* = 0$ (8) and $t (z_t^* - \Omega y_t^*) + \lambda_1 (\mathrm{sign}(z_t^*) + \alpha) = 0$ (9), where the sign of zero is defined as zero and α is a vector in R^r such that $\alpha_\Lambda = 0$ and $|\alpha_{\Lambda^c}| \le 1$. [sent-135, score-0.16]
50 t It has been shown that the solution of the synthesis [12], analysis [23], and generalized Lasso [22] regularization problems are all piecewise affine functions of the observations and the regularization parameter. [sent-137, score-0.519]
51 It can be shown that if λ1 is not a transition point of x, then a small perturbation in Ω, M1 , or M2 leaves Λ and the sign of the coefficients in the solution unchanged [12]. [sent-140, score-0.195]
52 Let $C = U H U^{-1}$ be the eigen-decomposition of C, with H a diagonal matrix with elements $h_i$, $1 \le i \le n$. [sent-149, score-0.125]
53 In the limit, $t h_i \to 0$ if $h_i = 0$ and $t h_i \to \infty$ otherwise, yielding $Q = \lim_{t \to \infty} Q_t = U \bar{H} U^{-1} B^{-1}$ with $\bar{H} = \mathrm{diag}\{\bar{h}_i\}$, $\bar{h}_i = \begin{cases} 0 & h_i \neq 0, \\ 1 & h_i = 0 \end{cases}$ (13). [sent-151, score-0.664]
54 We summarize our main result in Proposition 1 below, for which we define $\tilde{Q} = \lim_{t \to \infty} t Q_t = U \tilde{H} U^{-1} B^{-1}$ with $\tilde{H} = \mathrm{diag}\{\tilde{h}_i\}$, $\tilde{h}_i = \begin{cases} 1/h_i & h_i \neq 0, \\ 0 & h_i = 0 \end{cases}$ (14). [sent-154, score-0.5]
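The limit matrices in (13)-(14), as reconstructed above, translate directly into the following sketch; the tolerance used to decide which eigenvalues are treated as zero and the function name are assumptions.

```python
import numpy as np

def limit_matrices(B, C, tol=1e-12):
    # Build Q = lim Q_t and Q_tilde = lim t*Q_t from the eigen-decomposition
    # C = U H U^{-1}, following (13)-(14); assumes C is diagonalizable.
    h, U = np.linalg.eig(C)
    U_inv, B_inv = np.linalg.inv(U), np.linalg.inv(B)
    is_zero = np.abs(h) < tol
    h_bar = np.where(is_zero, 1.0, 0.0)                                # (13): 1 on ker(C), 0 elsewhere
    h_tilde = np.where(is_zero, 0.0, 1.0 / np.where(is_zero, 1.0, h))  # (14): 1/h_i off ker(C)
    Q = U @ np.diag(h_bar) @ U_inv @ B_inv
    Q_tilde = U @ np.diag(h_tilde) @ U_inv @ B_inv
    return Q, Q_tilde
```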
55 The network comprises K identical layers parameterized by the matrices A and B and the threshold vector t, and one output layer parameterized by the matrices U and V. [sent-156, score-0.135]
56 Obtaining the exact gradients given in Proposition 1 requires computing the eigendecomposition of C, which is in general computationally expensive. [sent-161, score-0.161]
57 The supervised model learning framework can be straightforwardly specialized to the shift-invariant case, in which filters γi in (2) are learned instead of a full matrix Ω. [sent-163, score-0.242]
58 4 Fast approximation. The discussed sparse models rely on an iterative optimization scheme such as ADMM, required to solve the pursuit problem (1). [sent-165, score-0.322]
59 From the perspective of the pursuit process, the minimization of (1) is merely a proxy to obtaining a highly non-linear map between the data vector x and the representation vector y (which can also be the “feature” vector ΩDy or the reconstructed data vector Dy, depending on the application). [sent-173, score-0.22]
60 Adopting ADMM, such a map can be expressed by unrolling a sufficient number K of iterations into a feed-forward network comprising K (identical) layers depicted in Figure 1, where the parameters A, B, U, V, and t, collectively denoted as Ψ, are prescribed by the ADMM iteration. [sent-174, score-0.301]
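As a concrete, necessarily hypothetical illustration of the unrolling, the sketch below folds the ADMM iteration from Section 2 into K identical layers parameterized by A, B, and the threshold vector t, followed by an output layer parameterized by U and V; the exact wiring of Figure 1 in the paper may differ, and all names are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def unrolled_encoder(x, A, B, U, V, t, K=10):
    # K identical layers (A, B, t) followed by an output layer (U, V); a plausible
    # unrolling of the ADMM iteration for (1), not necessarily the paper's exact network.
    c = A @ x                         # data-dependent input shared by all layers
    z = np.zeros(B.shape[0])
    u = np.zeros(B.shape[0])
    for _ in range(K):
        v = c + B @ (z - u) + u
        z = soft_threshold(v, t)      # element-wise thresholding with learned t
        u = v - z                     # dual update folded into the layer
    return U @ x + V @ (z - u)        # output layer producing the regressed code y
```

Once trained, A, B, U, V, and t are no longer tied to the ADMM-prescribed values and are adjusted jointly, as described in the following sentences.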
61 This approach was later extended to more elaborated structured sparse and low-rank models, with applications in audio separation and denoising [20]. [sent-181, score-0.249]
62 This is the first attempt to extend it to sparse analysis and mixed analysis-synthesis models. [sent-182, score-0.143]
63 The learning of the fast encoder is performed by plugging it into the training problem (4) in place of the exact encoder. [sent-183, score-0.152]
64 In the particular setting of a shift-invariant analysis model, the described neural network encoder assumes a structure resembling that of a convolutional network. [sent-189, score-0.213]
65 5 Experimental results and discussion. In what follows, we illustrate the proposed approaches in two experiments: single-image super-resolution (demonstrating a reconstruction problem) and polyphonic music transcription (demonstrating a classification problem). [sent-192, score-0.573]
66 Single-image super-resolution is an inverse problem in which a high-resolution image is reconstructed from its blurred and down-sampled version lacking the high-frequency details. [sent-195, score-0.182]
67 In [25], it has been demonstrated that prefiltering a high resolution image with a Gaussian kernel with σ = 0. [sent-197, score-0.136]
68 This models practical image decimation schemes very well, since allowing a certain amount of aliasing improves the visual perception. [sent-199, score-0.143]
69 As shown in [26], up-sampling the low-resolution image in this way produces an image that is very close to the pre-filtered high-resolution counterpart. [sent-202, score-0.272]
70 We also tested a convolutional neural network approximation of the third model, trained under similar conditions. [sent-207, score-0.124]
71 We observe that, on average, the supervised model outperforms A-DCT and TV by 1–3 dB PSNR. [sent-211, score-0.173]
72 While performing slightly worse than the exact supervised model, the neural network approximation is about ten times faster. [sent-212, score-0.319]
73 The goal of automatic music transcription is to obtain a musical score from an input audio signal. [sent-214, score-0.409]
74 This task is particularly difficult when the audio signal is polyphonic, i. [sent-215, score-0.13]
75 Like the majority of music and speech analysis techniques, music transcription typically operates on the magnitude of the audio time-frequency representation such as the short-time Fourier transform or constant-Q transform (CQT) [7] (adopted here). [sent-218, score-0.646]
76 Given a spectral frame x at some time, the transcription problem consists of producing a binary label vector p ∈ {−1, +1}^k, whose i-th element indicates the presence of the i-th pitch. [sent-219, score-0.158]
77 Table 1: PSNR in dB of different image super-resolution methods: bicubic interpolation (Bicubic), shift-invariant analysis models with TV and DCT priors (TV and A-DCT), supervised shift-invariant analysis model (SI-ADMM), and its fast approximation with K = 10 layers (SI-NN). [sent-266, score-0.641]
78 [Figure 2 plot residue; legend: Analysis ADMM, Analysis NN (K=1), Analysis NN (K=10), Nonneg. synthesis, Benetos & Dixon, Poliner & Ellis; axes: Number of iterations / layers (K), Precision (%).] [sent-268, score-0.564]
79 Figure 2: Left: Accuracy of the proposed analysis model (Analysis-ADMM) and its fast approximation (Analysis-NN) as a function of the number of iterations or layers K. [sent-269, score-0.653]
80 For reference, the accuracy of a nonnegative synthesis model as well as two leading methods [3, 18] is shown. [sent-270, score-0.446]
81 We use k = 88, corresponding to the span of the standard piano keyboard (MIDI pitches 21–108). [sent-273, score-0.253]
82 We used an analysis model with a square dictionary Ω and a square metric matrix M1 = M2 to produce the feature vector z = Ωy, which was then fed to a classifier of the form p = sign(Wz+b). [sent-274, score-0.201]
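A sketch of the per-frame transcription pipeline just described; pursuit_solver stands for any solver of (1) (exact or the fast feed-forward approximation), and the parameter-dictionary layout and names are assumptions for illustration.

```python
import numpy as np

def transcribe_frame(x_cqt, params, pursuit_solver):
    # Solve the analysis pursuit for one CQT magnitude frame, form the feature
    # z = Omega @ y, and apply the linear classifier p = sign(W z + b).
    y = pursuit_solver(x_cqt, params)
    z = params["Omega"] @ y                      # analysis features
    scores = params["W"] @ z + params["b"]       # one score per pitch (k = 88)
    return np.where(scores >= 0, 1, -1)          # binary pitch-activity labels
```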
83 The parameters Ω, M2, W, and b were trained using the logistic loss on the MAPS Disklavier dataset [10], which contains examples of polyphonic piano recordings with time-aligned ground truth. [sent-275, score-0.346]
84 The testing was performed on another annotated real piano dataset from [18]. [sent-276, score-0.144]
85 For comparison, we show the performance of a supervised nonnegative synthesis model and two leading methods [3, 18] evaluated in the same settings. [sent-278, score-0.619]
86 This measure is frequently used in the music analysis literature [3, 18]. [sent-280, score-0.204]
87 The supervised analysis model outperforms leading pitch transcription methods. [sent-281, score-0.417]
88 In this example, ten layers are enough to obtain a good representation, and the improvement from adding further layers becomes very marginal beyond this point. [sent-283, score-0.166]
89 We presented a bilevel optimization framework for the supervised learning of a superset of sparse analysis and synthesis models. [sent-285, score-1.026]
90 We also showed that in applications requiring low complexity or latency, a fast approximation to the exact solution of the pursuit problem can be achieved by a feed-forward architecture derived from truncated ADMM. [sent-286, score-0.314]
91 The obtained fast regressor can be initialized with the model parameters trained through the supervised bilevel framework, and tuned similarly to the training and adaptation of neural networks. [sent-287, score-0.488]
92 We observed that the structure of the network becomes essentially that of a convolutional network in the case of shift-invariant models. [sent-288, score-0.176]
93 The generative setting of the proposed approaches was demonstrated on an image restoration experiment, while the discriminative setting was tested in a polyphonic piano transcription experiment. [sent-289, score-0.636]
94 k-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. [sent-295, score-0.19]
95 A fast iterative shrinkage-thresholding algorithm for linear inverse problems. [sent-303, score-0.141]
96 Multiple-instrument polyphonic music transcription using a convolutive probabilistic model. [sent-311, score-0.492]
97 Learning l1-based analysis and synthesis sparsity priors using bi-level optimization. [sent-322, score-0.571]
98 Image denoising via sparse and redundant representations over learned dictionaries. [sent-351, score-0.2]
99 Multipitch estimation of piano sounds using a new probabilistic spectral smoothness principle. [sent-360, score-0.144]
100 Image super-resolution as sparse representation of raw image patches. [sent-452, score-0.194]
wordName wordTfidf (topN-words)
[('synthesis', 0.446), ('bilevel', 0.264), ('admm', 0.191), ('pursuit', 0.178), ('supervised', 0.173), ('polyphonic', 0.168), ('music', 0.166), ('zout', 0.163), ('sign', 0.16), ('transcription', 0.158), ('piano', 0.144), ('hi', 0.125), ('dictionary', 0.122), ('gradients', 0.111), ('pitches', 0.109), ('sparse', 0.105), ('mt', 0.101), ('bronstein', 0.096), ('image', 0.089), ('priors', 0.087), ('operators', 0.086), ('dictionaries', 0.085), ('audio', 0.085), ('qt', 0.085), ('layers', 0.083), ('bicubic', 0.082), ('bout', 0.082), ('thi', 0.082), ('reconstruction', 0.081), ('tv', 0.075), ('convolutional', 0.072), ('operator', 0.071), ('lters', 0.068), ('aviv', 0.066), ('tel', 0.066), ('unrolling', 0.066), ('latency', 0.062), ('denoising', 0.059), ('proximal', 0.058), ('sensing', 0.058), ('aliasing', 0.054), ('benetos', 0.054), ('bprev', 0.054), ('dct', 0.054), ('prev', 0.054), ('uh', 0.054), ('zin', 0.054), ('network', 0.052), ('encoder', 0.051), ('fast', 0.051), ('inverse', 0.051), ('exact', 0.05), ('pitch', 0.048), ('poliner', 0.048), ('dy', 0.047), ('resolution', 0.047), ('minimizer', 0.046), ('classi', 0.046), ('signal', 0.045), ('blurring', 0.044), ('elad', 0.044), ('inferior', 0.044), ('yk', 0.043), ('px', 0.042), ('reconstructed', 0.042), ('parsimony', 0.042), ('sprechmann', 0.042), ('metric', 0.041), ('discriminative', 0.041), ('everywhere', 0.04), ('counterpart', 0.04), ('psnr', 0.04), ('deconvolution', 0.04), ('coming', 0.039), ('iterative', 0.039), ('nn', 0.038), ('fp', 0.038), ('sapiro', 0.038), ('analysis', 0.038), ('lter', 0.037), ('adopted', 0.037), ('zj', 0.037), ('tp', 0.036), ('restoration', 0.036), ('learned', 0.036), ('solution', 0.035), ('iterations', 0.035), ('loss', 0.034), ('yj', 0.034), ('collectively', 0.033), ('straightforwardly', 0.033), ('images', 0.033), ('speech', 0.033), ('prescribed', 0.032), ('duke', 0.032), ('advocated', 0.032), ('unstructured', 0.032), ('projection', 0.032), ('positives', 0.031), ('proposition', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.
2 0.14869171 251 nips-2013-Predicting Parameters in Deep Learning
Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1
3 0.13247012 85 nips-2013-Deep content-based music recommendation
Author: Aaron van den Oord, Sander Dieleman, Benjamin Schrauwen
Abstract: Automatic music recommendation has become an increasingly relevant problem in recent years, since a lot of music is now sold and consumed digitally. Most recommender systems rely on collaborative filtering. However, this approach suffers from the cold start problem: it fails when no usage data is available, so it is not effective for recommending new and unpopular songs. In this paper, we propose to use a latent factor model for recommendation, and predict the latent factors from music audio when they cannot be obtained from usage data. We compare a traditional approach using a bag-of-words representation of the audio signals with deep convolutional neural networks, and evaluate the predictions quantitatively and qualitatively on the Million Song Dataset. We show that using predicted latent factors produces sensible recommendations, despite the fact that there is a large semantic gap between the characteristics of a song that affect user preference and the corresponding audio signal. We also show that recent advances in deep learning translate very well to the music recommendation setting, with deep convolutional neural networks significantly outperforming the traditional approach. 1
4 0.12432623 157 nips-2013-Learning Multi-level Sparse Representations
Author: Ferran Diego Andilla, Fred A. Hamprecht
Abstract: Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neurosciences image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows to optimize the decomposition over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world Calcium imaging data. 1
5 0.12143356 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
Author: Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani
Abstract: Biological tissue is often composed of cells with similar morphologies replicated throughout large volumes and many biological applications rely on the accurate identification of these cells and their locations from image data. Here we develop a generative model that captures the regularities present in images composed of repeating elements of a few different types. Formally, the model can be described as convolutional sparse block coding. For inference we use a variant of convolutional matching pursuit adapted to block-based representations. We extend the KSVD learning algorithm to subspaces by retaining several principal vectors from the SVD decomposition instead of just one. Good models with little cross-talk between subspaces can be obtained by learning the blocks incrementally. We perform extensive experiments on simulated images and the inference algorithm consistently recovers a large proportion of the cells with a small number of false positives. We fit the convolutional model to noisy GCaMP6 two-photon images of spiking neurons and to Nissl-stained slices of cortical tissue and show that it recovers cell body locations without supervision. The flexibility of the block-based representation is reflected in the variability of the recovered cell shapes. 1
6 0.10431458 75 nips-2013-Convex Two-Layer Modeling
7 0.10229409 65 nips-2013-Compressive Feature Learning
8 0.10156353 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
9 0.097379535 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
10 0.095757149 293 nips-2013-Sign Cauchy Projections and Chi-Square Kernel
11 0.09253414 331 nips-2013-Top-Down Regularization of Deep Belief Networks
12 0.080694005 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search
13 0.079446718 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
14 0.079017811 149 nips-2013-Latent Structured Active Learning
15 0.078935817 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
16 0.078539304 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
17 0.078501865 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
18 0.076629758 171 nips-2013-Learning with Noisy Labels
19 0.075860463 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
20 0.075737 301 nips-2013-Sparse Additive Text Models with Low Rank Background
topicId topicWeight
[(0, 0.233), (1, 0.096), (2, -0.023), (3, -0.027), (4, 0.038), (5, -0.04), (6, -0.11), (7, 0.07), (8, -0.037), (9, -0.057), (10, 0.016), (11, 0.036), (12, -0.007), (13, -0.031), (14, -0.066), (15, 0.001), (16, -0.025), (17, -0.016), (18, -0.025), (19, 0.012), (20, -0.028), (21, -0.011), (22, 0.076), (23, 0.013), (24, 0.057), (25, -0.04), (26, -0.013), (27, -0.025), (28, 0.085), (29, 0.002), (30, -0.046), (31, 0.128), (32, 0.032), (33, 0.121), (34, -0.059), (35, -0.006), (36, 0.047), (37, 0.018), (38, 0.029), (39, -0.08), (40, 0.03), (41, 0.047), (42, 0.046), (43, -0.013), (44, -0.1), (45, -0.009), (46, 0.035), (47, -0.064), (48, -0.017), (49, -0.089)]
simIndex simValue paperId paperTitle
same-paper 1 0.93661422 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.
2 0.7229268 65 nips-2013-Compressive Feature Learning
Author: Hristo S. Paskov, Robert West, John C. Mitchell, Trevor Hastie
Abstract: This paper addresses the problem of unsupervised feature learning for text data. Our method is grounded in the principle of minimum description length and uses a dictionary-based compression scheme to extract a succinct feature set. Specifically, our method finds a set of word k-grams that minimizes the cost of reconstructing the text losslessly. We formulate document compression as a binary optimization task and show how to solve it approximately via a sequence of reweighted linear programs that are efficient to solve and parallelizable. As our method is unsupervised, features may be extracted once and subsequently used in a variety of tasks. We demonstrate the performance of these features over a range of scenarios including unsupervised exploratory analysis and supervised text categorization. Our compressed feature space is two orders of magnitude smaller than the full k-gram space and matches the text categorization accuracy achieved in the full feature space. This dimensionality reduction not only results in faster training times, but it can also help elucidate structure in unsupervised learning tasks and reduce the amount of training data necessary for supervised learning. 1
3 0.69958001 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
Author: Haichao Zhang, David Wipf
Abstract: Typical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatiallyvarying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a simple non-uniform blind deblurring algorithm with a spatially-adaptive image penalty. Through an implicit normalization process, this penalty automatically adjust its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate on average without explicitly incorporating structureselection heuristics. The algorithm can be implemented using an optimization strategy that is virtually tuning-parameter free and simpler than existing methods, and likely can be applied in other settings such as dictionary learning. Detailed theoretical analysis and empirical comparisons on real images serve as validation.
4 0.67558247 350 nips-2013-Wavelets on Graphs via Deep Learning
Author: Raif Rustamov, Leonidas Guibas
Abstract: An increasing number of applications require processing of signals defined on weighted graphs. While wavelets provide a flexible tool for signal processing in the classical setting of regular domains, the existing graph wavelet constructions are less flexible – they are guided solely by the structure of the underlying graph and do not take directly into consideration the particular class of signals to be processed. This paper introduces a machine learning framework for constructing graph wavelets that can sparsely represent a given class of signals. Our construction uses the lifting scheme, and is based on the observation that the recurrent nature of the lifting scheme gives rise to a structure resembling a deep auto-encoder network. Particular properties that the resulting wavelets must satisfy determine the training objective and the structure of the involved neural networks. The training is unsupervised, and is conducted similarly to the greedy pre-training of a stack of auto-encoders. After training is completed, we obtain a linear wavelet transform that can be applied to any graph signal in time and memory linear in the size of the graph. Improved sparsity of our wavelet transform for the test signals is confirmed via experiments both on synthetic and real data. 1
5 0.67264146 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
Author: Siwei Lyu, Xin Wang
Abstract: Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to approximate a matrix with all nonnegative components into the product of two nonnegative matrices. In this work, we describe a new simple and efficient algorithm for multi-factor nonnegative matrix factorization (mfNMF) problem that generalizes the original NMF problem to more than two factors. Furthermore, we extend the mfNMF algorithm to incorporate a regularizer based on the Dirichlet distribution to encourage the sparsity of the components of the obtained factors. Our sparse mfNMF algorithm affords a closed form and an intuitive interpretation, and is more efficient in comparison with previous works that use fix point iterations. We demonstrate the effectiveness and efficiency of our algorithms on both synthetic and real data sets. 1
6 0.6709196 157 nips-2013-Learning Multi-level Sparse Representations
7 0.65987539 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
8 0.65851206 251 nips-2013-Predicting Parameters in Deep Learning
9 0.65296525 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
10 0.62224329 249 nips-2013-Polar Operators for Structured Sparse Estimation
11 0.61861855 202 nips-2013-Multiclass Total Variation Clustering
12 0.60784632 247 nips-2013-Phase Retrieval using Alternating Minimization
13 0.59409428 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
14 0.58201325 186 nips-2013-Matrix factorization with binary components
15 0.58168995 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
16 0.57419533 245 nips-2013-Pass-efficient unsupervised feature selection
17 0.56661654 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
18 0.56441998 85 nips-2013-Deep content-based music recommendation
19 0.56116271 119 nips-2013-Fast Template Evaluation with Vector Quantization
20 0.5572899 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
topicId topicWeight
[(2, 0.017), (16, 0.025), (33, 0.162), (34, 0.125), (35, 0.202), (36, 0.016), (41, 0.045), (49, 0.03), (56, 0.106), (70, 0.034), (85, 0.04), (89, 0.048), (93, 0.066), (95, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.8490352 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
Author: Pablo Sprechmann, Roee Litman, Tal Ben Yakar, Alexander M. Bronstein, Guillermo Sapiro
Abstract: In this paper, we propose a new computationally efficient framework for learning sparse models. We formulate a unified approach that contains as particular cases models promoting sparse synthesis and analysis type of priors, and mixtures thereof. The supervised training of the proposed model is formulated as a bilevel optimization problem, in which the operators are optimized to achieve the best possible performance on a specific task, e.g., reconstruction or classification. By restricting the operators to be shift invariant, our approach can be thought of as a way of learning sparsity-promoting convolutional operators. Leveraging recent ideas on fast trainable regressors designed to approximate exact sparse codes, we propose a way of constructing feed-forward networks capable of approximating the learned models at a fraction of the computational cost of exact solvers. In the shift-invariant case, this leads to a principled way of constructing a form of task-specific convolutional networks. We illustrate the proposed models on several experiments in music analysis and image processing applications.
2 0.84656799 73 nips-2013-Convex Relaxations for Permutation Problems
Author: Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre D'Aspremont
Abstract: Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We prove the equivalence between the seriation and the combinatorial 2-SUM problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we produce a convex relaxation for the 2-SUM problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences. 1
3 0.76904804 99 nips-2013-Dropout Training as Adaptive Regularization
Author: Stefan Wager, Sida Wang, Percy Liang
Abstract: Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset. 1
4 0.7684769 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
5 0.76721722 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
6 0.76567858 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
7 0.76452643 251 nips-2013-Predicting Parameters in Deep Learning
8 0.76364911 5 nips-2013-A Deep Architecture for Matching Short Texts
9 0.76291585 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
10 0.76276785 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
11 0.76271468 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
12 0.76211679 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
13 0.76084238 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
14 0.76067013 350 nips-2013-Wavelets on Graphs via Deep Learning
15 0.76046109 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
16 0.76043153 294 nips-2013-Similarity Component Analysis
17 0.76029211 318 nips-2013-Structured Learning via Logistic Regression
18 0.76019704 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
19 0.75851303 301 nips-2013-Sparse Additive Text Models with Low Rank Background
20 0.75837755 75 nips-2013-Convex Two-Layer Modeling