nips nips2009 nips2009-137 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Benjamin Culpepper, Bruno A. Olshausen
Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Learning transport operators for image manifolds, by Benjamin Culpepper and Bruno A. Olshausen. [sent-1, score-0.734]
2 Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. [sent-6, score-0.631]
3 The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. [sent-7, score-0.328]
4 The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. [sent-8, score-0.343]
5 The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. [sent-9, score-0.194]
6 1 Introduction It is well known that natural images occupy a small fraction of the space of all possible images. [sent-10, score-0.194]
7 Moreover, as images change over time in response to observer motion or changes in the environment, they trace out particular trajectories along manifolds in this space. [sent-11, score-0.478]
8 In this paper, we derive methods for learning these transport operators from data. [sent-13, score-0.488]
9 Rather than simply learning a mapping of individual data points to a low-dimensional space, we seek a compact representation of the entire manifold via the operators that traverse it. [sent-14, score-0.598]
10 We investigate a direct application of the Lie approach to invariance [1] utilizing a matrix exponential generative model for transforming images. [sent-15, score-0.162]
11 This is in contrast to previous methods that rely mainly upon a first-order Taylor series approximation of the matrix exponential [2,3], and bilinear models, in which the transformation variables interact multiplicatively with the input [4,5,6]. [sent-16, score-0.182]
12 It is also distinct from the class of methods that learn embeddings of manifolds from point cloud data [7,8,9,10]. [sent-17, score-0.205]
13 We share the goal of [12] of learning a model of the manifold which can then be generalized to new data. [sent-19, score-0.138]
14 Here we show how a particular class of transport operators for moving along manifolds may be learned from data. [sent-20, score-0.831]
15 Subsequently, we apply it to time-varying natural images and extrapolate along inferred trajectories to demonstrate super-resolution and temporal filling-in of missing video frames. [sent-22, score-0.496]
16 2 Problem formulation. Let us consider an image of the visual world at time $t$ as a point $x \in \mathbb{R}^N$, where the elements of x correspond to image pixels. [sent-23, score-0.221]
17 We describe the evolution of $x$ as $\dot{x} = Ax$ (1), where the matrix A is a linear operator capturing some action in the environment that transforms the image. [sent-24, score-0.209]
18 Such an action belongs to a family that occupies a subspace of $\mathbb{R}^{N \times N}$ given by $A = \sum_{m=1}^{M} \Psi_m c_m$ (2) for some $M \le N^2$ (usually $M \ll N^2$), with $\Psi_m \in \mathbb{R}^{N \times N}$. [sent-25, score-0.197]
19 The amount of a particular action from the dictionary Ψm that occurs is controlled by the corresponding cm . [sent-26, score-0.291]
20 At t = 0, a vision system takes an image x0 , and then makes repeated observations at intervals ∆t. [sent-27, score-0.146]
21 Given x0 , the solution to (1) traces out a continuously differentiable manifold of images given by xt = exp(At) x0 , which we observe periodically. [sent-28, score-0.405]
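As a quick sanity check on this closed-form solution (not from the paper; the operator and dimensions below are arbitrary stand-ins), the matrix-exponential trajectory can be compared against a generic ODE integrator:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
N = 5
A = 0.1 * rng.standard_normal((N, N))   # stand-in for a composed operator A
x0 = rng.standard_normal(N)

t = 2.0
x_closed = expm(A * t) @ x0             # closed-form point on the trajectory

# numerically integrate x' = A x over [0, t] and compare
sol = solve_ivp(lambda _, x: A @ x, (0.0, t), x0, rtol=1e-10, atol=1e-12)
assert np.allclose(x_closed, sol.y[:, -1], atol=1e-6)
```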
22 Our goal is to learn an appropriate set of bases, Ψ, that allow for a compact description of this set of transformations by training on many pairs of related observations. [sent-29, score-0.301]
23 This generative model for transformed images has a number of attractive properties. [sent-30, score-0.128]
24 First, it factors apart the time-varying image into an invariant part (the initial image, x0) and a variant part (the transformation, parameterized by the coefficient vector c), thus making explicit the underlying causes. [sent-31, score-0.139]
25 Second, the learned exponential operators are quite powerful in terms of modeling capacity, compared to their linear counterparts. [sent-32, score-0.413]
26 These points are related through an exponentiated matrix that itself is composed of a few basis elements, plus zero-mean white i.i.d. [sent-37, score-0.189]
27 Gaussian noise, $n$: $x_1 = T(c)\, x_0 + n$, where $T(c) = \exp\left(\sum_m \Psi_m c_m\right)$. [sent-40, score-0.158]
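A minimal sketch of this generative model; the sizes, sparsity level, and noise scale below are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
N, M = 9, 4                                   # illustrative sizes
Psi = 0.05 * rng.standard_normal((N, N, M))   # dictionary of N x N generators

def transport(c, Psi):
    """T(c) = exp(sum_m Psi_m c_m): compose the generator as in (2), then exponentiate."""
    A = np.tensordot(Psi, c, axes=([2], [0]))
    return expm(A)

c = np.zeros(M)
c[rng.choice(M, size=2, replace=False)] = rng.standard_normal(2)  # sparse coefficients
x0 = rng.standard_normal(N)
x1 = transport(c, Psi) @ x0 + 0.01 * rng.standard_normal(N)       # noisy related pair
```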
28 Then $\frac{\partial \exp(A)_{ij}}{\partial A_{kl}} = \sum_{\alpha\beta} F_{\alpha\beta}\, U_{i\alpha} V_{k\alpha} U_{l\beta} V_{j\beta}$ (6). The learning procedure: choose $M \le N^2$; initialize $\Psi$; while stopping criteria are not met: pick $x_0, x_1$; initialize $c$ to zeros; $c \leftarrow \arg\min_c E$; $\Delta\Psi = -\eta\, \partial E/\partial \Psi$; sort $\Psi_m$ by $\|\Psi_m\|_F$; $M \leftarrow \max m$ s.t. … [sent-60, score-0.115]
29 Steps 8-9 shrink the subspace spanned by the dictionary if one or more of the elements have shrunk sufficiently in norm. [sent-67, score-0.182]
30 After computing two intermediate terms P and Q, $P = U^\top (x_1 x_0^\top + x_0 x_0^\top T^\top)\, V$ (8) and $Q_{kl} = \sum_{\alpha\beta} V_{k\alpha} U_{l\beta} F_{\alpha\beta} P_{\alpha\beta}$ (9), the two partial derivatives for inference and learning are: $\partial E/\partial c_m = \sum_{kl} Q_{kl} \Psi_{klm} + \zeta\, \mathrm{sgn}(c_m)$ (10) and $\partial E/\partial \Psi_{klm} = Q_{kl} c_m + \gamma\, \Psi_{klm}$ (11). [sent-72, score-0.251]
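The quantities above can be assembled as follows; this is a sketch that assumes A is diagonalizable, uses the standard Daleckii-Krein form for F ($F_{\alpha\beta} = (e^{d_\alpha} - e^{d_\beta})/(d_\alpha - d_\beta)$, with $e^{d_\alpha}$ on the diagonal), and takes the sign and arrangement of P from a direct re-derivation of $\partial E/\partial A$ rather than from the extracted equation (8):

```python
import numpy as np

def model_gradients(c, Psi, x0, x1, zeta=0.1, gamma=0.1):
    """Gradients of E = 1/2||x1 - exp(A)x0||^2 + (gamma/2)||Psi||^2 + zeta||c||_1."""
    A = np.tensordot(Psi, c, axes=([2], [0]))
    d, U = np.linalg.eig(A)              # A = U diag(d) U^{-1}, complex in general
    V = np.linalg.inv(U).T               # the paper's V, so exp(A) = U diag(e^d) V^T
    ed = np.exp(d)
    diff = d[:, None] - d[None, :]
    same = np.isclose(diff, 0)
    # F[a,b] = (e^{d_a} - e^{d_b}) / (d_a - d_b), with the limit e^{d_a} on the diagonal
    F = np.where(same, ed[:, None], (ed[:, None] - ed[None, :]) / np.where(same, 1, diff))
    T = (U * ed) @ V.T                   # exp(A)
    r = x1 - T @ x0                      # residual
    P = U.T @ (-np.outer(r, x0)) @ V     # sign convention: re-derived, see lead-in
    Q = np.real(V @ (F * P) @ U.T)       # Q_{kl} = sum_{ab} V_{ka} U_{lb} F_{ab} P_{ab}
    dE_dc = np.tensordot(Q, Psi, axes=([0, 1], [0, 1])) + zeta * np.sign(c)
    dE_dPsi = Q[:, :, None] * c[None, None, :] + gamma * Psi
    return np.real(T), dE_dc, dE_dPsi
```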
31 4 Experiments on point sets We first test the model by applying it to simple datasets where the solutions are known: learning the topology of a sphere and a torus. [sent-74, score-0.255]
32 Second, we apply the model to learn the manifold of time-varying responses of complex oriented filters to a natural movie. [sent-75, score-0.431]
33 Related pairs of points on a torus are generated by choosing two angles θ0 , φ0 uniformly at random from [0, 2π]; two related angles θ1 , φ1 are produced by sampling from two von Mises distributions with means θ0 and φ0 , and concentration κ = 5 using the circular statistics toolbox of [15]. [sent-79, score-0.591]
34 Though parameterized by two angles, the coordinates of points on these surfaces are 3- and 4-dimensional; pairs of points $x_t$ for t = 0, 1 on the unit sphere are given by $x_t = (\sin\theta_t \cos\phi_t,\ \sin\theta_t \sin\phi_t,\ \cos\theta_t)$, and points on a torus by $x_t = (\cos\theta_t,\ \sin\theta_t,\ \cos\phi_t,\ \sin\phi_t)$. [sent-81, score-1.511]
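A sketch of this data generation, with NumPy's von Mises sampler standing in for the circular statistics toolbox of [15] (its output wraps to [-pi, pi], which is harmless here since only sines and cosines are used):

```python
import numpy as np

rng = np.random.default_rng(2)

def related_pair(kappa=5.0, surface="sphere"):
    """Sample a pair of nearby points on the unit sphere or the torus."""
    th0, ph0 = rng.uniform(0, 2 * np.pi, 2)
    th1 = rng.vonmises(th0, kappa)   # related angles concentrated around the originals
    ph1 = rng.vonmises(ph0, kappa)
    if surface == "sphere":
        f = lambda t, p: np.array([np.sin(t) * np.cos(p),
                                   np.sin(t) * np.sin(p),
                                   np.cos(t)])
    else:  # torus embedded in R^4
        f = lambda t, p: np.array([np.cos(t), np.sin(t), np.cos(p), np.sin(p)])
    return f(th0, ph0), f(th1, ph1)

x0, x1 = related_pair(surface="torus")
```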
35 Figure 2: Orbits of learned sphere operators. [sent-94, score-0.203]
36 (a) Three Ψm basis elements applied to points at the six poles of the sphere, (1, 0, 0), (0, 1, 0), (0, 0, 1), (−1, 0, 0), (0, −1, 0), and (0, 0, −1). [sent-95, score-0.187]
37 The orbits are generated by setting x0 to a pole, then plotting xt = exp(Ψm t) x0 for t = [−100, 100]. [sent-96, score-0.383]
38 (b) When superimposed on top of each other, the three sets of orbits clearly define the surface of a sphere. [sent-97, score-0.326]
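The orbit construction described in the caption can be sketched as follows, with a hypothetical generator (an infinitesimal rotation about the z-axis) in place of a learned Ψm:

```python
import numpy as np
from scipy.linalg import expm
import matplotlib.pyplot as plt

# Hypothetical learned generator: infinitesimal rotation about the z-axis
Psi_m = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 0.]])
x0 = np.array([1.0, 0.0, 0.0])           # a pole of the sphere

ts = np.linspace(-100, 100, 2000)
orbit = np.stack([expm(Psi_m * t) @ x0 for t in ts])   # x_t = exp(Psi_m t) x0

ax = plt.figure().add_subplot(projection="3d")
ax.plot(orbit[:, 0], orbit[:, 1], orbit[:, 2], lw=0.5)
plt.show()
```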
39 Figure 3: Orbits of learned torus operators. [sent-122, score-0.252]
40 Each row shows three projections of a Ψm basis element applied to a point on the surface of the torus. [sent-123, score-0.181]
41 The orbits shown are generated by setting x0 = (0, 1, 0, 1) then plotting xt = exp(Ψm t) x0 for t = [−1000, 1000] in projections constructed from each triplet of the four coordinates. [sent-124, score-0.383]
42 Figure 4: Learning transformations of oriented filter pairs across time. [sent-126, score-0.258]
43 The orbits of three complex filter outputs in response to a natural movie. [sent-127, score-0.338]
44 The blue points denote the complex output for each frame in the movie sequence and are linked to their neighbors via the blue line. [sent-128, score-0.35]
45 The points circled in red were observed by the model, and the red curve shows an extrapolation along the estimated trajectory. [sent-129, score-0.126]
46 For cases where topology can be recovered, the solution is robust to the settings of γ and ζ – changing either variable by an order of magnitude does not change the solution, though it may increase the number of learning steps required to get to it. [sent-139, score-0.197]
47 In cases where the topology cannot be recovered, the influence of the settings of γ and ζ on the solution is more subtle, as their relative values effectively trade off the importance of data reconstruction and the sparsity of the vector c. [sent-140, score-0.211]
48 Figure 2 shows orbits produced by applying each of the remaining Ψm operators to points on the sphere. [sent-145, score-0.583]
49 Similar experiments are successful for the torus; Figure 3 shows trajectories of the operators learned for the torus. [sent-146, score-0.407]
50 As an intermediate step towards modeling time-varying natural images, we investigate the model's ability to learn the response surface of a single complex oriented filter applied to a moving image. [sent-147, score-0.367]
51 A complex pyramid is built from each frame in a movie, and pairs of filter responses 1 to 4 frames apart are observed by the model. [sent-148, score-0.466]
52 Four 2x2 basis functions are learned in the manner described above. [sent-149, score-0.15]
53 Figure 4 shows three representative examples that illustrate how well the model is able to extrapolate from the solution estimated using the learned basis Ψ and complex responses from the same filter within a 4-frame time interval. [sent-150, score-0.444]
54 5 Experiments on movies In the image domain, our model has potential applications in temporal interpolation/filling in of video, super-resolution, compression, and geodesic distance estimation. [sent-152, score-0.157]
55 We apply the model to moving natural images and investigate the first three applications; the fourth will be the subject of future work. [sent-153, score-0.254]
56 Here we report on the ability of the model to learn transformations across time, as well as across scales of the Laplacian pyramid. [sent-154, score-0.192]
57 Our data consists of many short grayscale video sequences of Africa from the BBC. [sent-155, score-0.136]
58 5.1 Time We apply the model to natural movies by presenting it with patches of adjacent frame pairs. [sent-157, score-0.375]
59 Using an analytically generated infinitesimal shift operator, we first run a series of experiments to determine the effect of local minima on the recovery of a known displacement through the minimization of E. Figure 5: Shift operator learned from synthetically transformed natural images. [sent-158, score-0.417]
60 The operator Ψ1 is displayed as an array of weights that, for each output pixel, shows the strength of its connection to each input pixel. [sent-159, score-0.129]
61 Because of the $1/f^2$ falloff in the power spectrum of natural images, synthetic images with a wildly different distribution of spatial frequency content, such as uncorrelated noise, will not be properly shifted by this operator. [sent-161, score-0.318]
62 Figure 6: Interpolating between shifted images to temporally fill in missing video frames (legend: exponential vs. linear first-order Taylor approximation). [sent-172, score-0.45]
63 Two images x0 and x1 are generated by convolving an image of a diagonal line and a shifted diagonal line by a 3x3 Gaussian kernel with σ = 0. [sent-173, score-0.305]
64 The top row shows the sequence of images xt = exp(At) x0 for t = 0. [sent-175, score-0.229]
65 The bottom row shows the sequence of images xt = (I + At) x0 , that is, the first-order Taylor expansion of the matrix exponential, which performs poorly for shifts greater than one pixel. [sent-181, score-0.229]
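A small sketch of this comparison; the circular-shift generator built in the Fourier domain is an analytic stand-in for a learned operator, and the 1-D smooth bump stands in for the blurred-line images:

```python
import numpy as np
from scipy.linalg import expm

# Infinitesimal circular-shift generator built in the Fourier domain
# (odd N keeps A exactly real).
N = 33
k = np.fft.fftfreq(N, d=1.0 / N)          # integer frequency indices
F = np.fft.fft(np.eye(N), axis=0)         # DFT matrix
A = np.real(np.linalg.inv(F) @ np.diag(-2j * np.pi * k / N) @ F)

n = np.arange(N)
x0 = np.exp(-0.5 * ((n - 10.0) / 1.5) ** 2)   # smooth bump, like a blurred line

for t in [0.5, 1.0, 2.0, 3.0]:
    x_true = np.exp(-0.5 * ((n - 10.0 - t) / 1.5) ** 2)       # ideally shifted bump
    err_exp = np.linalg.norm(expm(A * t) @ x0 - x_true)       # full exponential
    err_lin = np.linalg.norm((np.eye(N) + A * t) @ x0 - x_true)  # first-order Taylor
    print(f"t={t}: exp err {err_exp:.4f}, Taylor err {err_lin:.4f}")
```

The first-order model degrades quickly once the shift exceeds a pixel, matching the behavior shown in Figure 6.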
66 Doing so requires a slight alteration to our inference algorithm: now we must solve a sequence of optimization problems on frame pairs convolved with a Gaussian kernel whose variance is progressively decreased. [sent-187, score-0.281]
67 At each step in the sequence, both frames are convolved by the kernel before a patch is selected. [sent-188, score-0.242]
68 For the first step, the c variables are initialized to zero; for subsequent steps they are initialized to the solution of the previous step. [sent-189, score-0.227]
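A sketch of the warm-started coarse-to-fine inference loop; the blur schedule, the 11x11 patch location, the Nelder-Mead solver, and the hypothetical dictionary Psi of shape (121, 121, M) are all illustrative assumptions:

```python
import numpy as np
from scipy.linalg import expm
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize

def infer_coarse_to_fine(frame0, frame1, Psi, sigmas=(4.0, 2.0, 1.0, 0.0),
                         rows=slice(20, 31), cols=slice(20, 31), zeta=0.1):
    """Warm-started inference over progressively sharper frame pairs."""
    c = np.zeros(Psi.shape[2])
    for s in sigmas:
        x0 = gaussian_filter(frame0, s)[rows, cols].ravel()  # blur, then cut patch
        x1 = gaussian_filter(frame1, s)[rows, cols].ravel()

        def energy(cv):
            A = np.tensordot(Psi, cv, axes=([2], [0]))
            r = x1 - expm(A) @ x0
            return 0.5 * r @ r + zeta * np.abs(cv).sum()

        # each stage starts from the previous stage's solution
        c = minimize(energy, c, method="Nelder-Mead").x
    return c
```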
69 For control purposes, the video for this experiment comes from a camera fly-over; thus, most of the motion in the scene is due to camera motion. [sent-191, score-0.292]
70 We apply the model to pairs of 11x11 patches, selected from random locations in the video, but discarding patches near the horizon where there is little or no motion. [sent-192, score-0.138]
71 We initialize M = 16; after learning, the basis function with the largest norm has the structure of a shift operator in the primary direction of motion taking place in the video sequence. [sent-193, score-0.517]
72 Using these 16 operators, we run inference on 1,000 randomly selected pairs of patches from a second video, not used during learning, and measure the quality of the reconstruction as the trajectory is used to predict into the future. [sent-194, score-0.25]
73 At 5 frames into the future, our model is able to maintain an average SNR of 7, compared to SNR 5 when a first-order Taylor approximation is used in place of the matrix exponential; for comparison, the average SNR for the identity transformation model on this data is 1. [sent-195, score-0.116]
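A sketch of this extrapolation test; defining SNR as an amplitude ratio is an assumption here, and `frames` is a hypothetical array of vectorized patches from consecutive frames:

```python
import numpy as np
from scipy.linalg import expm

def predict_snr(Psi, c, frames, t0=0, horizon=5):
    """Extrapolate x_{t0+k} = exp(kA) x_{t0} along the inferred trajectory and
    report SNR against the actual future patches (amplitude-ratio definition)."""
    A = np.tensordot(Psi, c, axes=([2], [0]))   # generator inferred for this pair
    snrs = []
    for k in range(1, horizon + 1):
        pred = expm(A * k) @ frames[t0]
        err = frames[t0 + k] - pred
        snrs.append(np.linalg.norm(frames[t0 + k]) / np.linalg.norm(err))
    return snrs
```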
74 Since the primary form of motion in these small patches is translation, we also train a single operator on artificially translated natural data to make clear that the model can learn this case completely. [sent-196, score-0.4]
75 For this last experiment we take a 360x360 pixel frame of our natural movie, and continuously translate the entire frame in the Fourier domain by a displacement chosen uniformly at random from [0, 3] pixels. [sent-197, score-0.549]
76 We then randomly select a 15x15 region on the interior of the pair of frames and use the two 225 pixel vectors as our x0 and x1 . [sent-198, score-0.157]
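A sketch of the continuous Fourier-domain translation and patch extraction described above; the patch location and the random stand-in frame are illustrative:

```python
import numpy as np

def fourier_translate(img, dy, dx):
    """Continuously translate an image by (dy, dx) pixels via a Fourier phase ramp."""
    H, W = img.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * phase))

rng = np.random.default_rng(3)
frame = rng.standard_normal((360, 360))    # stand-in for a 360x360 movie frame
d = rng.uniform(0, 3)                      # displacement drawn from [0, 3] pixels
shifted = fourier_translate(frame, 0.0, d)
r, c0 = 100, 100                           # arbitrary interior location
x0 = frame[r:r+15, c0:c0+15].ravel()       # 225-dimensional pixel vectors
x1 = shifted[r:r+15, c0:c0+15].ravel()
```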
77 We modify the objective function to be $E = \frac{1}{2}\|W \circ [x_1 - T(c)\, x_0]\|_2^2 + \frac{\gamma}{2}\sum_m \|\Psi_m\|_F^2 + \zeta \|c\|_1$ (12), where W is a binary windowing function that selects the central 9x9 region from a 15x15 patch; thus, the residual errors that come from new content translating into the patch are ignored. [sent-199, score-0.196]
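Equation (12) translates directly into code; a minimal sketch, assuming vectorized 15x15 patches and a hypothetical dictionary Psi of shape (225, 225, M):

```python
import numpy as np
from scipy.linalg import expm

def windowed_energy(c, Psi, x0, x1, gamma=0.1, zeta=0.1):
    """Eq. (12): residual restricted to the central 9x9 of a 15x15 patch."""
    W = np.zeros((15, 15))
    W[3:12, 3:12] = 1.0                              # binary window, central 9x9
    w = W.ravel()
    A = np.tensordot(Psi, c, axes=([2], [0]))
    r = w * (x1 - expm(A) @ x0)                      # masked residual
    return (0.5 * r @ r
            + 0.5 * gamma * np.sum(Psi ** 2)
            + zeta * np.abs(c).sum())
```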
78 After learning, the basis function Ψ1 (shown in Figure 5) is capable of translating natural images up to 3 pixels while maintaining an average SNR of 16 in the 9x9 center region. [sent-200, score-0.324]
79 Figure 6 shows how this operator is able to correctly interpolate between two measurements of a shifted image in order to temporally up-sample a movie. [sent-201, score-0.394]
80 5.2 Scale The model can also learn to transform between successive scales in the Laplacian pyramid built from a single frame of a video sequence. [sent-203, score-0.423]
81 Figure 7 depicts the system transforming an image patch from scale 2 to 3 of a 256x256 pixel image. [sent-204, score-0.413]
82 We initialize M = 100, but many basis elements shrink during learning; we use only the 16 Ψm with non-negligible norm to encode a scale change. [sent-205, score-0.284]
83 The basis Ψ is initialized to mean-zero white Gaussian noise with variance 0.01. [sent-206, score-0.2]
84 The same inference and learning procedures as described for the point sets are then run on pairs x0, x1 selected at random from the corpus of image sequences in the following way. [sent-207, score-0.152]
85 First, we choose a random frame from a random sequence, then up-sample and blur scale 3 of its Laplacian pyramid. [sent-208, score-0.237]
86 Second, we select an 8x8 patch from scale 2 (x1) and the corresponding patch of the up-sampled, blurred image (x0). [sent-209, score-0.38]
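A sketch of this pair construction; the Gaussian/Laplacian pyramid scheme and blur width below are assumptions, not the paper's exact pipeline:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_levels(frame, n=4, sigma=1.0):
    """Simple Gaussian/Laplacian pyramid: repeated blur-and-downsample,
    with each band defined as a level minus its up-sampled successor."""
    G = [frame]
    for _ in range(n):
        G.append(gaussian_filter(G[-1], sigma)[::2, ::2])
    up = lambda img: gaussian_filter(zoom(img, 2, order=1), sigma)
    return [G[k] - up(G[k + 1]) for k in range(n)]

def scale_pair(frame, r=8, c=8):
    L = laplacian_levels(frame)
    scale2, scale3 = L[2], L[3]
    up_blurred = gaussian_filter(zoom(scale3, 2, order=1), 1.0)  # up-sample and blur
    x1 = scale2[r:r+8, c:c+8].ravel()        # 8x8 target patch at scale 2
    x0 = up_blurred[r:r+8, c:c+8].ravel()    # corresponding up-blurred patch
    return x0, x1

frame = np.random.default_rng(4).standard_normal((256, 256))
x0, x1 = scale_pair(frame)
```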
87 Were it not for the highly structured manifold on which natural images live, the proposition of finding an operator that maps a blurred, subsampled image to its high-resolution original state would seem untenable. [sent-210, score-0.552]
88 6 Discussion and conclusions We have shown that it is possible to learn low-dimensional parameterizations of operators that transport along non-linear manifolds formed by natural images, both across time and scale. [sent-212, score-0.823]
89 Our focus thus far has been primarily on understanding the model and how to properly optimize its parameters. Figure 7: Learning transformations across scale; panels show (a) scale 3, (b) estimated scale 2, (c) actual scale 2, (d) error. [sent-213, score-0.18]
90 (a) Scale 3 of the Laplacian pyramid for a natural scene that we wish to code by describing how it transforms across scale, in terms of our learned dictionary. [sent-214, score-0.242]
91 (b) The estimated scale 2, computed by transforming 8x8 regions of the up-sampled and blurred scale 3. [sent-215, score-0.255]
92 Early attempts to model the manifold structure of images train on densely sampled point clouds and find an embedding into a small number of coordinates along the manifold. [sent-221, score-0.378]
93 However, such an approach does not actually constitute a model, since there is no function for mapping arbitrary points or for moving along the manifold. [sent-222, score-0.124]
94 Here, by learning operators that transport along the manifold, we have been able to learn a compact description of its structure. [sent-226, score-0.788]
95 Previous attempts to learn Lie group operators have focused on linear approximations. [sent-229, score-0.34]
96 Here we show that utilizing the full Lie operator/matrix exponential in learning, while computationally intensive, is tractable, even in the extremely high dimensional cases required by models of natural movies. [sent-230, score-0.164]
97 Our spectral decomposition is the key component that enables this, and, in combination with careful mitigation of local minima in the objective function using a coarse-to-fine technique, gives us the power to factor out large transformations from data. [sent-231, score-0.222]
98 One shortcoming of the approach described here is that transformations are modeled in the original pixel domain. [sent-232, score-0.22]
99 Potentially these transformations may be described more economically by working in a feature space, such as a sparse decomposition of the image. [sent-233, score-0.142]
100 (2004) Unsupervised learning of image manifolds by semidefinite programming. [sent-294, score-0.246]
wordName wordTfidf (topN-words)
[('operators', 0.29), ('orbits', 0.231), ('transport', 0.198), ('torus', 0.188), ('frame', 0.166), ('cm', 0.158), ('manifolds', 0.155), ('transformations', 0.142), ('sphere', 0.139), ('manifold', 0.138), ('video', 0.136), ('operator', 0.129), ('images', 0.128), ('lie', 0.124), ('snr', 0.122), ('topology', 0.116), ('klm', 0.113), ('qkl', 0.113), ('patch', 0.109), ('xt', 0.101), ('surface', 0.095), ('sin', 0.094), ('dictionary', 0.094), ('image', 0.091), ('basis', 0.086), ('angles', 0.086), ('shifted', 0.086), ('bilinear', 0.086), ('olshausen', 0.086), ('exp', 0.085), ('movie', 0.081), ('frames', 0.079), ('pixel', 0.078), ('motion', 0.078), ('patches', 0.077), ('culpepper', 0.075), ('displacement', 0.073), ('initialized', 0.073), ('laplacian', 0.073), ('scale', 0.071), ('pyramid', 0.071), ('cos', 0.07), ('natural', 0.066), ('movies', 0.066), ('cadieu', 0.066), ('transforming', 0.064), ('circular', 0.064), ('rao', 0.064), ('learned', 0.064), ('along', 0.064), ('taylor', 0.063), ('lter', 0.063), ('points', 0.062), ('pairs', 0.061), ('traverse', 0.06), ('moving', 0.06), ('exponential', 0.059), ('reconstruction', 0.057), ('trajectory', 0.055), ('oriented', 0.055), ('vision', 0.055), ('derivatives', 0.054), ('convolved', 0.054), ('trajectories', 0.053), ('berkeley', 0.053), ('ul', 0.051), ('plotting', 0.051), ('learn', 0.05), ('shift', 0.049), ('extrapolate', 0.049), ('blurred', 0.049), ('shrink', 0.049), ('coordinates', 0.048), ('compact', 0.048), ('apart', 0.048), ('interpolate', 0.047), ('toolbox', 0.044), ('multiply', 0.044), ('translating', 0.044), ('spectral', 0.044), ('content', 0.043), ('steps', 0.043), ('nitesimal', 0.043), ('vk', 0.042), ('white', 0.041), ('complex', 0.041), ('transforms', 0.041), ('temporally', 0.041), ('partial', 0.039), ('initialize', 0.039), ('elements', 0.039), ('camera', 0.039), ('saul', 0.039), ('utilizing', 0.039), ('action', 0.039), ('solution', 0.038), ('properly', 0.038), ('transformation', 0.037), ('stopping', 0.037), ('minima', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 137 nips-2009-Learning transport operators for image manifolds
Author: Benjamin Culpepper, Bruno A. Olshausen
Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. 1
2 0.1288034 133 nips-2009-Learning models of object structure
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1
3 0.12867709 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
Author: Kwang I. Kim, Florian Steinke, Matthias Hein
Abstract: Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and the lack of extrapolating power. Based on these observations, we propose to use the second-order Hessian energy for semi-supervised regression which overcomes both these problems. If the data lies on or close to a low-dimensional submanifold in feature space, the Hessian energy prefers functions whose values vary linearly with respect to geodesic distance. We first derive the Hessian energy for smooth manifolds and continue to give a stable estimation procedure for the common case where only samples of the underlying manifold are given. The preference of “linear” functions on manifolds renders the Hessian energy particularly suited for the task of semi-supervised dimensionality reduction, where the goal is to find a user-defined embedding function given some labeled points which varies smoothly (and ideally linearly) along the manifold. The experimental results suggest superior performance of our method compared with semi-supervised regression using Laplacian regularization or standard supervised regression techniques applied to this task. 1
5 0.11322125 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
Author: Dilip Krishnan, Rob Fergus
Abstract: The heavy-tailed distributions of gradients in natural scenes have proven effective priors for a range of problems such as denoising, deblurring and super-resolution. These distributions are well modeled by a hyper-Laplacian $p(x) \propto e^{-k|x|^{\alpha}}$, typically with 0.5 ≤ α ≤ 0.8. However, the use of sparse distributions makes the problem non-convex and impractically slow to solve for multi-megapixel images. In this paper we describe a deconvolution approach that is several orders of magnitude faster than existing techniques that use hyper-Laplacian priors. We adopt an alternating minimization scheme where one of the two phases is a non-convex problem that is separable over pixels. This per-pixel sub-problem may be solved with a lookup table (LUT). Alternatively, for two specific values of α, 1/2 and 2/3 an analytic solution can be found, by finding the roots of a cubic and quartic polynomial, respectively. Our approach (using either LUTs or analytic formulae) is able to deconvolve a 1 megapixel image in less than ∼3 seconds, achieving comparable quality to existing methods such as iteratively reweighted least squares (IRLS) that take ∼20 minutes. Furthermore, our method is quite general and can easily be extended to related image processing problems, beyond the deconvolution application demonstrated. 1
7 0.11186948 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks
8 0.10813817 176 nips-2009-On Invariance in Hierarchical Models
9 0.10739796 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
10 0.10308539 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters
11 0.10197473 211 nips-2009-Segmenting Scenes by Matching Image Composites
12 0.10187681 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out
13 0.0991356 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
14 0.095418237 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
15 0.095347323 151 nips-2009-Measuring Invariances in Deep Networks
16 0.095177829 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
17 0.093742706 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections
18 0.089680389 46 nips-2009-Bilinear classifiers for visual recognition
19 0.088255435 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
20 0.087708622 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification
topicId topicWeight
[(0, -0.281), (1, -0.06), (2, -0.034), (3, 0.052), (4, -0.028), (5, 0.147), (6, 0.116), (7, 0.076), (8, 0.184), (9, 0.009), (10, 0.035), (11, 0.114), (12, -0.078), (13, 0.058), (14, -0.031), (15, -0.066), (16, -0.069), (17, 0.095), (18, -0.067), (19, 0.021), (20, -0.023), (21, 0.043), (22, 0.076), (23, -0.032), (24, -0.009), (25, 0.052), (26, -0.018), (27, -0.025), (28, 0.029), (29, 0.093), (30, 0.042), (31, -0.051), (32, -0.028), (33, -0.035), (34, -0.011), (35, -0.01), (36, 0.019), (37, -0.02), (38, 0.002), (39, -0.065), (40, -0.0), (41, 0.128), (42, 0.081), (43, 0.136), (44, -0.025), (45, -0.003), (46, -0.013), (47, -0.02), (48, 0.036), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.95140338 137 nips-2009-Learning transport operators for image manifolds
Author: Benjamin Culpepper, Bruno A. Olshausen
Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. 1
2 0.70783895 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
Author: Dilip Krishnan, Rob Fergus
Abstract: The heavy-tailed distribution of gradients in natural scenes have proven effective priors for a range of problems such as denoising, deblurring and super-resolution. α These distributions are well modeled by a hyper-Laplacian p(x) ∝ e−k|x| , typically with 0.5 ≤ α ≤ 0.8. However, the use of sparse distributions makes the problem non-convex and impractically slow to solve for multi-megapixel images. In this paper we describe a deconvolution approach that is several orders of magnitude faster than existing techniques that use hyper-Laplacian priors. We adopt an alternating minimization scheme where one of the two phases is a non-convex problem that is separable over pixels. This per-pixel sub-problem may be solved with a lookup table (LUT). Alternatively, for two specific values of α, 1/2 and 2/3 an analytic solution can be found, by finding the roots of a cubic and quartic polynomial, respectively. Our approach (using either LUTs or analytic formulae) is able to deconvolve a 1 megapixel image in less than ∼3 seconds, achieving comparable quality to existing methods such as iteratively reweighted least squares (IRLS) that take ∼20 minutes. Furthermore, our method is quite general and can easily be extended to related image processing problems, beyond the deconvolution application demonstrated. 1
3 0.61454886 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
4 0.57487464 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
Author: Roy Anati, Kostas Daniilidis
Abstract: We present a system which constructs a topological map of an environment given a sequence of images. This system includes a novel image similarity score which uses dynamic programming to match images using both the appearance and relative positions of local features simultaneously. Additionally, an MRF is constructed to model the probability of loop-closures. A locally optimal labeling is found using Loopy-BP. Finally we outline a method to generate a topological map from loop closure data. Results, presented on four urban sequences and one indoor sequence, outperform the state of the art. 1
5 0.56201005 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification
Author: Sennay Ghebreab, Steven Scholte, Victor Lamme, Arnold Smeulders
Abstract: Contrast statistics of the majority of natural images conform to a Weibull distribution. This property of natural images may facilitate efficient and very rapid extraction of a scene's visual gist. Here we investigated whether a neural response model based on the Weibull contrast distribution captures visual information that humans use to rapidly identify natural scenes. In a learning phase, we measured EEG activity of 32 subjects viewing brief flashes of 700 natural scenes. From these neural measurements and the contrast statistics of the natural image stimuli, we derived an across-subject Weibull response model. We used this model to predict the EEG responses to 100 new natural scenes and estimated which scene the subject viewed by finding the best match between the model predictions and the observed EEG responses. In almost 90 percent of the cases our model accurately predicted the observed scene. Moreover, in most failed cases, the scene mistaken for the observed scene was visually similar to the observed scene itself. Similar results were obtained in a separate experiment in which 16 other subjects were presented with artificial occlusion models of natural images. Together, these results suggest that Weibull contrast statistics of natural images contain a considerable amount of visual gist information to warrant rapid image identification.
6 0.55256355 138 nips-2009-Learning with Compressible Priors
7 0.54439867 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks
8 0.54357415 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
9 0.53956574 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
10 0.53344649 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
11 0.51539433 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
12 0.51262951 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
13 0.5026747 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering
14 0.49418667 177 nips-2009-On Learning Rotations
15 0.48534977 211 nips-2009-Segmenting Scenes by Matching Image Composites
16 0.47766358 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
17 0.4719888 176 nips-2009-On Invariance in Hierarchical Models
18 0.46365187 151 nips-2009-Measuring Invariances in Deep Networks
19 0.46184477 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
20 0.46100914 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity
topicId topicWeight
[(7, 0.014), (21, 0.028), (24, 0.035), (25, 0.071), (35, 0.071), (36, 0.115), (39, 0.059), (57, 0.01), (58, 0.123), (60, 0.168), (61, 0.025), (71, 0.042), (81, 0.02), (86, 0.136), (91, 0.018)]
simIndex simValue paperId paperTitle
Author: Steven Chase, Andrew Schwartz, Wolfgang Maass, Robert A. Legenstein
Abstract: The control of neuroprosthetic devices from the activity of motor cortex neurons benefits from learning effects where the function of these neurons is adapted to the control task. It was recently shown that tuning properties of neurons in monkey motor cortex are adapted selectively in order to compensate for an erroneous interpretation of their activity. In particular, it was shown that the tuning curves of those neurons whose preferred directions had been misinterpreted changed more than those of other neurons. In this article, we show that the experimentally observed self-tuning properties of the system can be explained on the basis of a simple learning rule. This learning rule utilizes neuronal noise for exploration and performs Hebbian weight updates that are modulated by a global reward signal. In contrast to most previously proposed reward-modulated Hebbian learning rules, this rule does not require extraneous knowledge about what is noise and what is signal. The learning rule is able to optimize the performance of the model system within biologically realistic periods of time and under high noise levels. When the neuronal noise is fitted to experimental data, the model produces learning effects similar to those found in monkey experiments.
same-paper 2 0.89319676 137 nips-2009-Learning transport operators for image manifolds
Author: Benjamin Culpepper, Bruno A. Olshausen
Abstract: We describe an unsupervised manifold learning algorithm that represents a surface through a compact description of operators that traverse it. The operators are based on matrix exponentials, which are the solution to a system of first-order linear differential equations. The matrix exponents are represented by a basis that is adapted to the statistics of the data so that the infinitesimal generator for a trajectory along the underlying manifold can be produced by linearly composing a few elements. The method is applied to recover topological structure from low dimensional synthetic data, and to model local structure in how natural images change over time and scale. 1
3 0.81773007 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
4 0.81258726 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
Author: Gal Chechik, Uri Shalit, Varun Sharma, Samy Bengio
Abstract: Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. It stands in the core of classification methods like kernel machines, and is particularly useful for applications like searching for images that are similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, current approaches for learning similarity do not scale to large datasets, especially when imposing metric constraints on the learned similarity. We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of non-zero features. Scalability is achieved through online learning of a bilinear model over sparse representations using a large margin criterion and an efficient hinge loss cost. OASIS is accurate at a wide range of scales: on a standard benchmark with thousands of images, it is more precise than state-of-the-art methods, and faster by orders of magnitude. On 2.7 million images collected from the web, OASIS can be trained within 3 days on a single CPU. The nonmetric similarities learned by OASIS can be transformed into metric similarities, achieving higher precisions than similarities that are learned as metrics in the first place. This suggests an approach for learning a metric from data that is larger by orders of magnitude than was handled before. 1
5 0.79905427 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
Author: Piyush Rai, Hal Daume
Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1
6 0.79621559 70 nips-2009-Discriminative Network Models of Schizophrenia
7 0.79574627 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
8 0.79448694 119 nips-2009-Kernel Methods for Deep Learning
9 0.79186827 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
10 0.7911942 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
11 0.79031116 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
12 0.79005206 3 nips-2009-AUC optimization and the two-sample problem
13 0.78926122 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
14 0.7892217 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
15 0.78891683 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity
16 0.78477997 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
17 0.78376424 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
18 0.78368127 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
19 0.78345209 87 nips-2009-Exponential Family Graph Matching and Ranking
20 0.78255254 237 nips-2009-Subject independent EEG-based BCI decoding