nips nips2012 nips2012-159 knowledge-graph by maker-knowledge-mining

159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks


Source: pdf

Author: Junyuan Xie, Linli Xu, Enhong Chen

Abstract: We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoders (DA). We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. Our method’s performance in the image denoising task is comparable to that of KSVD, which is a widely used sparse coding technique. More importantly, in the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. Moreover, the proposed method does not need the information regarding the region that requires inpainting to be given a priori. Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. We also show that our new training scheme for DA is more effective and can improve the performance of unsupervised feature learning.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA). [sent-8, score-0.797]

2 We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. [sent-9, score-1.058]

3 Our method’s performance in the image denoising task is comparable to that of KSVD which is a widely used sparse coding technique. [sent-10, score-0.877]

4 More importantly, in blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. [sent-11, score-0.913]

5 Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. [sent-12, score-0.246]

6 Moreover, the proposed method does not need the information regarding the region that requires inpainting to be given a priori. [sent-13, score-0.541]

7 Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. [sent-14, score-0.95]

8 1 Introduction: Observed image signals are often corrupted by the acquisition channel or by artificial editing. [sent-16, score-0.323]

9 The goal of image restoration techniques is to restore the original image from a noisy observation of it. [sent-17, score-0.498]

10 Image denoising and inpainting are common image restoration problems that are both useful by themselves and important preprocessing steps of many other applications. [sent-18, score-1.337]

11 This paper focuses on image denoising and blind inpainting. [sent-20, score-0.917]

12 One approach is to transfer image signals to an alternative domain where they can be more easily separated from the noise [1, 2, 3]. [sent-22, score-0.324]

13 Another approach is to capture image statistics directly in the image domain. [sent-24, score-0.384]

14 Following this strategy, a family of models exploiting the (linear) sparse coding technique has drawn increasing attention recently [4, 5, 6, 7, 8, 9]. [sent-25, score-0.122]

15 Sparse coding methods reconstruct images from a sparse linear combination of an over-complete dictionary. [sent-26, score-0.214]

16 This learning step improves the performance of sparse coding significantly. [sent-28, score-0.122]

17 Image inpainting methods can be divided into two categories: non-blind inpainting and blind inpainting. [sent-31, score-1.262]

18 In non-blind inpainting, the regions that need to be filled in are provided to the algorithm a priori, whereas in blind inpainting, no information about the locations of the corrupted pixels is given and the algorithm must automatically identify the pixels that require inpainting. [sent-32, score-0.35]

19 The state-of-the-art non-blind inpainting algorithms can perform very well on removing text, doodles, or even very large objects [10, 11, 12]. [sent-33, score-0.541]

20 Some image denoising methods, after modification, can also be applied to non-blind image inpainting with state-of-the-art results [7]. [sent-34, score-1.47]

21 Recent research suggests, however, that non-linear, deep models can achieve superior performance in various real world problems. [sent-41, score-0.128]

22 In this paper, we propose to combine the advantageous “sparse” and “deep” principles of sparse coding and deep networks to solve the image denoising and blind inpainting problems. [sent-45, score-1.71]

23 The sparse variants of deep neural networks are expected to perform especially well in vision problems because they have a structure similar to that of the human visual cortex [17]. [sent-46, score-0.224]

24 Deep neural networks with many hidden layers were generally considered hard to train until a new training scheme was proposed: greedy layer-wise pre-training, which gives a better initialization of the network parameters before traditional back-propagation training [18, 19]. [sent-47, score-0.277]

25 We employ DA to perform pre-training in our method because it naturally lends itself to denoising and inpainting tasks. [sent-49, score-1.086]

26 DA is a two-layer neural network that tries to reconstruct the original input from a noisy version of it. [sent-50, score-0.159]

27 A series of DAs can be stacked to form a deep network called Stacked Denoising Auto-encoders (SDA) by using the hidden layer activation of the previous layer as input of the next layer. [sent-53, score-0.524]

28 In these settings, only the clean data is provided while the noisy version of it is generated during training by adding random Gaussian or Salt-and-Pepper noise to the clean data. [sent-55, score-0.511]

29 After training of one layer, only the clean data is passed on to the network to produce the clean training data for the next layer while the noisy data is discarded. [sent-56, score-0.569]

30 The noisy training data for the next layer is similarly constructed by randomly corrupting the generated clean training data. [sent-57, score-0.471]

31 For the image denoising and inpainting tasks, however, the choices of clean and noisy input are natural: they are set to be the desired image after denoising or inpainting and the observed noisy image respectively. [sent-58, score-3.049]

32 Therefore, we propose a new training scheme that trains the DA to reconstruct the clean image from the corresponding noisy observation. [sent-59, score-0.509]

33 After training of the first layer, the hidden layer activations of both the noisy input and the clean input are calculated to serve as the training data of the second layer. [sent-60, score-0.493]

34 Our experiments on the image denoising and inpainting tasks demonstrate that SDA is able to learn features that adapt to specific noises from white Gaussian noise to superimposed text. [sent-61, score-1.506]

35 Inspired by SDA’s ability to learn noise specific features in denoising tasks, we argue that in unsupervised feature learning problems the type of noise used can also affect the performance. [sent-62, score-0.805]

36 Specifically, instead of corrupting the input with arbitrarily chosen noise, a more sophisticated corruption process that agrees with the true noise distribution in the data can improve the quality of the learned features. [sent-63, score-0.293]

37 For example, when learning audio features, the variations of noise on different frequencies are usually different and sometimes correlated. [sent-64, score-0.14]

38 Hence instead of corrupting the training data with simple i. [sent-65, score-0.124]

39 Figure 1: Model architectures. (a) Denoising auto-encoder (DA) architecture. (b) Stacked sparse denoising auto-encoder architecture. [sent-71, score-0.598]

40 1 Problem Formulation: Assuming $x$ is the observed noisy image and $y$ is the original noise-free image, we can formulate the image corruption process as: $x = \eta(y)$. [sent-73, score-0.6]

41 Then, the denoising task’s learning objective becomes: $f = \arg\min_{f} \mathbb{E}_{y} \| f(x) - y \|_2^2$ (2). From this formulation, we can see that the task here is to find a function $f$ that best approximates $\eta^{-1}$. [sent-75, score-0.545]
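The formulation above can be illustrated with a minimal sketch. It assumes additive white Gaussian noise as one possible choice of η and uses the identity map as a trivial baseline f; the helper names `corrupt_gaussian` and `mean_squared_loss` are hypothetical and are not taken from the paper.

```python
import random

def corrupt_gaussian(y, sigma, rng):
    """One assumed choice of eta: add i.i.d. Gaussian noise to every pixel."""
    return [p + rng.gauss(0.0, sigma) for p in y]

def mean_squared_loss(f, pairs):
    """Empirical estimate of E ||f(x) - y||_2^2 over (noisy x, clean y) pairs."""
    total = 0.0
    for x, y in pairs:
        fx = f(x)
        total += sum((a - b) ** 2 for a, b in zip(fx, y))
    return total / len(pairs)

rng = random.Random(0)
# Toy "clean patches": 100 vectors of 16 pixel values in [0, 1).
clean = [[rng.random() for _ in range(16)] for _ in range(100)]
pairs = [(corrupt_gaussian(y, 0.1, rng), y) for y in clean]

# Loss of the do-nothing baseline f(x) = x; a useful f should score lower.
identity_loss = mean_squared_loss(lambda x: x, pairs)
```

Any learned denoiser can be passed as `f` in the same way, which makes the identity baseline a convenient sanity check.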

42 We can now treat the image denoising and inpainting problems in a unified framework by choosing appropriate η in different situations. [sent-76, score-1.278]

43 , N and $x_i$ be the corrupted version of the corresponding $y_i$. [sent-81, score-0.158]

44 1a: $h(x_i) = \sigma(W x_i + b)$ (3), $\hat{y}(x_i) = \sigma(W' h(x_i) + b')$ (4), where $\sigma(x) = (1 + \exp(-x))^{-1}$ is the sigmoid activation function, which is applied element-wise to vectors, $h_i$ is the hidden layer activation, $\hat{y}(x_i)$ is an approximation of $y_i$, and $\Theta = \{W, b, W', b'\}$ represents the weights and biases. [sent-83, score-0.226]
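As a rough illustration of Eqs. (3) and (4), the sketch below runs one forward pass of a single DA in plain Python. The layer sizes, the random initialization in `init_params`, and all function names are assumptions made for this example only.

```python
import math
import random

def sigmoid(v):
    """Logistic function sigma(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def affine_sigmoid(weights, bias, vec):
    """Compute sigma(weights @ vec + bias), with weights stored as a list of rows."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]

def da_forward(params, x):
    """Return (h(x), y_hat(x)) for one input vector, following Eqs. (3) and (4)."""
    W, b, W_prime, b_prime = params
    h = affine_sigmoid(W, b, x)                  # hidden activation, Eq. (3)
    y_hat = affine_sigmoid(W_prime, b_prime, h)  # reconstruction, Eq. (4)
    return h, y_hat

def init_params(n_visible, n_hidden, rng, scale=0.1):
    """Small random weights and zero biases (an assumed initialization)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_visible)] for _ in range(n_hidden)]
    W_prime = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_visible)]
    return W, [0.0] * n_hidden, W_prime, [0.0] * n_visible

rng = random.Random(0)
params = init_params(n_visible=16, n_hidden=32, rng=rng)
x = [rng.random() for _ in range(16)]
h, y_hat = da_forward(params, x)
```

Because the output layer is also a sigmoid, the reconstruction lies in (0, 1), so pixel values would need to be scaled to that range under this sketch.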

45 $\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \cdots$ (5). After training a DA, we can move on to training the next layer by using the hidden layer activation of the first layer as the input of the next layer. [sent-85, score-0.484]

46 Because directly processing the entire image is intractable, we instead draw overlapping patches from the image as our data objects. [sent-89, score-0.413]
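A minimal sketch of drawing overlapping patches, assuming a grey-scale image stored as a list of rows; the patch size, the stride, and the name `extract_patches` are illustrative choices and not values from the paper.

```python
def extract_patches(image, size, stride):
    """Return every size x size patch, flattened row-major, sampled every `stride` pixels."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patch = [image[r + i][c + j] for i in range(size) for j in range(size)]
            patches.append(patch)
    return patches

# Toy 6 x 6 image whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]

# A stride smaller than the patch size makes neighbouring patches overlap.
patches = extract_patches(image, size=4, stride=1)
```

With a 6 x 6 image, 4 x 4 patches and a stride of 1, there are 3 x 3 patch positions, so `patches` holds 9 vectors of 16 pixels each.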

47 In the training phase, the model is supplied with both the corrupted noisy image patches $x_i$, for i = 1, 2, . [sent-90, score-0.488]

48 After training, SSDA will be able to reconstruct the corresponding clean image given any noisy observation. [sent-94, score-0.433]

49 We regularize the hidden layer representation to be sparse by choosing a small ρ so that the KL-divergence term encourages the mean activation of the hidden units to be small. [sent-122, score-0.289]
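The sparsity term can be sketched as follows. The extracted text does not show the paper's exact expression, so this assumes the standard Bernoulli KL form commonly used for sparse auto-encoders, KL(ρ || ρ̂_j) summed over hidden units, where ρ̂_j is the mean activation of unit j; the name `kl_sparsity_penalty` is hypothetical.

```python
import math

def kl_sparsity_penalty(hidden_activations, rho):
    """Sum over hidden units of KL(rho || rho_hat_j), rho_hat_j = mean activation of unit j.

    Assumes every mean activation lies strictly between 0 and 1,
    which holds for sigmoid hidden units.
    """
    n_samples = len(hidden_activations)
    n_units = len(hidden_activations[0])
    penalty = 0.0
    for j in range(n_units):
        rho_hat = sum(h[j] for h in hidden_activations) / n_samples
        penalty += rho * math.log(rho / rho_hat)
        penalty += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return penalty

# Units whose mean activation already equals the target incur no penalty ...
sparse_batch = [[0.05, 0.05], [0.05, 0.05]]
# ... while units that are active half of the time are penalized.
dense_batch = [[0.5, 0.5], [0.5, 0.5]]
penalty_sparse = kl_sparsity_penalty(sparse_batch, rho=0.05)
penalty_dense = kl_sparsity_penalty(dense_batch, rho=0.05)
```

The penalty is zero only when each unit's mean activation equals ρ, so choosing a small ρ pushes the hidden units toward low average activity.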

50 After training of the first DA, we use $h(y_i)$ and $h(x_i)$ as the clean and noisy input respectively for the second DA. [sent-124, score-0.282]
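The pairing described above can be sketched as follows, assuming a trained first-layer encoder given by a weight matrix `W` and bias `b`. Both are placeholders here, and `encode` and `next_layer_pairs` are hypothetical helper names.

```python
import math

def encode(W, b, v):
    """First-layer hidden activation h(v) = sigma(W v + b)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * vj for w, vj in zip(row, v)) + bi)))
        for row, bi in zip(W, b)
    ]

def next_layer_pairs(W, b, pairs):
    """Map each (noisy x, clean y) pair to (h(x), h(y)) for training the second DA."""
    return [(encode(W, b, x), encode(W, b, y)) for x, y in pairs]

# Placeholder encoder: identity weights and zero bias, for illustration only.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
first_layer_pairs = [([0.0, 2.0], [0.0, 1.0])]
second_layer_pairs = next_layer_pairs(W, b, first_layer_pairs)
```

The point of the sketch is that both members of each pair pass through the same encoder, so the second DA again sees a noisy input and its clean target, rather than a freshly corrupted copy of the clean activations.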

51 We then initialize a deep network with the weights obtained from K stacked DAs. [sent-127, score-0.232]

52 The network has one input layer, one output layer and 2K − 1 hidden layers as shown in Fig. [sent-128, score-0.134]

53 3 Experiments We narrow our focus down to denoising and inpainting of grey-scale images, but there is no difficulty in generalizing to colored images. [sent-133, score-1.11]

54 We use a set of natural images collected from the web as our training set and standard testing images as the testing set. [sent-134, score-0.166]

55 We create noisy images from clean training and testing images by applying the function (1) to them. [sent-135, score-0.406]

56 Image patches are then extracted from both clean and noisy images to train SSDAs. [sent-136, score-0.312]

57 We employ Peak Signal to Noise Ratio (PSNR) to quantify denoising results: $10 \log_{10}(255^2 / \sigma_e^2)$, where $\sigma_e^2$ is the mean squared error. [sent-137, score-0.545]
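The PSNR formula above can be computed directly. The sketch below assumes 8-bit grey-scale pixel values stored as flat lists; the function name `psnr` and the infinite return value for identical images are choices made for this example.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak Signal to Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images: the ratio is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [100.0, 100.0, 100.0, 100.0]
estimate = [110.0, 110.0, 110.0, 110.0]
score = psnr(reference, estimate)  # MSE is 100, so about 28.13 dB
```

Higher values indicate a reconstruction closer to the reference, and every 10 dB corresponds to a tenfold reduction in mean squared error.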

58 PSNR is one of the standard indicators used for evaluating image denoising results. [sent-138, score-0.737]

59 1 Denoising White Gaussian Noise We first corrupt images with additive white Gaussian noise of various standard deviations. [sent-140, score-0.281]

60 For the proposed method, one SSDA model is trained for each noise level. [sent-141, score-0.144]

61 In the meantime, we try different patch sizes and find that a higher noise level generally requires a larger patch size. [sent-144, score-0.144]

62 Figure 2: Visual comparison of denoising results. [sent-149, score-0.56]

63 Results of images corrupted by white Gaussian noise with standard deviation σ = 50 are shown. [sent-150, score-0.309]

64 This indicates that although the reconstruction errors averaged over all pixels are the same, SSDA is better at denoising complex regions. [sent-162, score-0.601]

65 2 Image Inpainting. Figure 3: Visual comparison of inpainting results. [sent-164, score-0.556]

66 For the image inpainting task, we test our model on the text removal problem. [sent-165, score-0.804]

67 Both the training and testing sets are composed of images with superimposed text of various fonts and sizes from 18-pix to 36-pix. [sent-166, score-0.232]

68 Due to the lack of comparable blind inpainting algorithms, we compare our method to the non-blind KSVD inpainting algorithm [7], which significantly simplifies the problem by requiring knowledge of which pixels are corrupted and require inpainting. [sent-167, score-1.412]

69 We find that SSDA is able to eliminate text of small fonts completely while text of larger fonts is dimmed. [sent-170, score-0.152]

70 Non-blind inpainting is a well developed technology that works decently on the removal of small objects. [sent-172, score-0.595]

71 SSDA’s capability of blind inpainting of complex patterns is one of this paper’s major contributions. [sent-178, score-0.765]

72 3 Hidden Layer Feature Analysis: Traditionally when training denoising auto-encoders, the noisy training data is usually generated with an arbitrarily selected simple noise distribution, regardless of the characteristics of the specific training data [21]. [sent-191, score-0.908]

73 In real world problems, the clean training data is in fact usually subject to noise. [sent-193, score-0.189]

74 Hence, if we estimate the distribution of noise and exaggerate it to generate noisy training data, the resulting DA will learn to be more robust to noise in the input data and produce better features. [sent-194, score-0.374]

75 Inspired by SSDA’s ability to learn different features when trained on denoising different noise patterns, we argue that training denoising auto-encoders with noise patterns that fit to specific situations can also improve the performance of unsupervised feature learning. [sent-195, score-1.477]

76 We train DAs with different types of noise and then apply them to handwritten digits corrupted by the type of noise they are trained on as well as other types of noise. [sent-197, score-0.368]

77 We find that the highest classification accuracy on each type of noise is achieved by the DA trained to remove that type of noise. [sent-201, score-0.169]

78 This is not surprising since more information is utilized; however, it indicates that instead of arbitrarily corrupting the input with noise that follows a simple distribution and feeding it to the DA, more sophisticated methods that corrupt the input in more realistic ways can achieve better performance. [sent-202, score-0.323]

79 Learned Structure Unlike models relying on structural priors, our method’s denoising ability comes from learning. [sent-205, score-0.545]

80 Therefore SSDA’s ability to denoise and inpaint images is mostly the result of training. [sent-208, score-0.17]

81 With some modifications, it is possible to denoise audio signals or complete missing data (as a data preprocessing step) with SSDA. [sent-210, score-0.157]

82 2 Advantages and Limitations Traditionally, for complicated inpainting tasks, an inpainting mask that tells the algorithm which pixels correspond to noise and require inpainting is supplied a priori. [sent-212, score-1.8]

83 This makes our method a suitable choice for fully automatic and noise pattern specific image processing. [sent-215, score-0.306]

84 Generally speaking, however, SSDA can remove only the noise patterns it has seen in the training data. [sent-218, score-0.236]

85 SSDA would only be suitable in circumstances where the scope of denoising tasks is narrow, such as reconstructing images corrupted by a certain procedure. [sent-222, score-0.749]

86 5 Conclusion In this paper, we present a novel approach to image denoising and blind inpainting that combines sparse coding and deep neural networks pre-trained with denoising auto-encoders. [sent-223, score-2.255]

87 We propose a new training scheme for DA that makes it possible to denoise and inpaint images within a unified framework. [sent-224, score-0.246]

88 In the experiments, our method achieves performance comparable to that of a traditional linear sparse coding algorithm on the simple task of denoising additive white Gaussian noise. [sent-225, score-0.743]

89 Moreover, our non-linear approach successfully tackles the much harder problem of blind inpainting of complex patterns which, to the best of our knowledge, has not been addressed before. [sent-226, score-0.789]

90 We also show that the proposed training scheme is able to improve DA’s performance in the tasks of unsupervised feature learning. [sent-227, score-0.141]

91 In our future work, we would like to explore the possibility of adapting the proposed approach to various other applications such as denoising and inpainting of audio and video, image super-resolution and missing data completion. [sent-228, score-1.343]

92 An adaptive threshold method for image denoising based on wavelet domain. [sent-240, score-0.789]

93 Image denoising using scale mixtures of Gaussians in the wavelet domain. [sent-249, score-0.597]

94 A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. [sent-255, score-0.244]

95 Image denoising via sparse and redundant representations over learned dictionaries. [sent-280, score-0.618]

96 Region filling and object removal by exemplar-based image inpainting. [sent-308, score-0.229]

97 An image inpainting technique based on the fast marching method. [sent-319, score-0.733]

98 Robust locally linear analysis with applications to image denoising and blind inpainting. [sent-334, score-0.917]

99 Restoration of images corrupted by impulse noise using blind inpainting and l0 norm. [sent-338, score-1.028]

100 Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. [sent-379, score-1.234]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('denoising', 0.545), ('inpainting', 0.541), ('ssda', 0.296), ('image', 0.192), ('da', 0.185), ('blind', 0.18), ('ksvd', 0.148), ('clean', 0.136), ('noise', 0.114), ('deep', 0.111), ('corrupted', 0.094), ('sda', 0.092), ('stacked', 0.088), ('layer', 0.086), ('denoise', 0.074), ('noisy', 0.072), ('corrupting', 0.071), ('coding', 0.069), ('psnr', 0.065), ('images', 0.059), ('sparse', 0.053), ('training', 0.053), ('wavelet', 0.052), ('hidden', 0.051), ('activation', 0.048), ('patterns', 0.044), ('fonts', 0.042), ('restoration', 0.042), ('white', 0.042), ('yi', 0.041), ('impulse', 0.04), ('superimposed', 0.039), ('pixels', 0.038), ('removal', 0.037), ('inpaint', 0.037), ('portilla', 0.037), ('text', 0.034), ('reconstruct', 0.033), ('network', 0.033), ('tasks', 0.033), ('corrupt', 0.033), ('unsupervised', 0.032), ('corruption', 0.03), ('trained', 0.03), ('patches', 0.029), ('layers', 0.029), ('china', 0.028), ('testing', 0.027), ('visual', 0.027), ('audio', 0.026), ('feeding', 0.026), ('remove', 0.025), ('das', 0.025), ('supplied', 0.025), ('harder', 0.024), ('narrow', 0.024), ('dictionary', 0.023), ('xi', 0.023), ('scheme', 0.023), ('gaussian', 0.022), ('elad', 0.022), ('vincent', 0.022), ('missing', 0.022), ('input', 0.021), ('xu', 0.021), ('mairal', 0.02), ('learned', 0.02), ('peak', 0.02), ('sophisticated', 0.019), ('networks', 0.019), ('graphics', 0.019), ('convolutional', 0.019), ('traditionally', 0.019), ('acquisition', 0.019), ('reconstruction', 0.018), ('signals', 0.018), ('arbitrarily', 0.018), ('comparable', 0.018), ('scope', 0.018), ('yuan', 0.018), ('jain', 0.017), ('preprocessing', 0.017), ('various', 0.017), ('technology', 0.017), ('additive', 0.016), ('criminisi', 0.016), ('lena', 0.016), ('barbara', 0.016), ('kldivergence', 0.016), ('manzagol', 0.016), ('prochnow', 0.016), ('train', 0.016), ('boltzmann', 0.016), ('patch', 0.015), ('comparison', 0.015), ('bengio', 0.015), ('restores', 0.015), ('doctoral', 0.015), ('crafted', 0.015), ('erhan', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

2 0.15921924 150 nips-2012-Hierarchical spike coding of sound

Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli

Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.

3 0.148334 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

Author: Daniel Zoran, Yair Weiss

Abstract: Simple Gaussian Mixture Models (GMMs) learned from pixels of natural image patches have been recently shown to be surprisingly strong performers in modeling the statistics of natural images. Here we provide an in-depth analysis of this simple yet rich model. We show that such a GMM model is able to compete with even the most successful models of natural images in log likelihood scores, denoising performance and sample quality. We provide an analysis of what such a model learns from natural images as a function of number of mixture components including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models.
1 GMMs and natural image statistics models: Many models for the statistics of natural image patches have been suggested in recent years. Finding good models for natural images is important to many different research areas - computer vision, biological vision and neuroscience among others. Recently, there has been a growing interest in comparing different aspects of models for natural images such as log-likelihood and multi-information reduction performance, and much progress has been achieved [1, 2, 3, 4, 5, 6]. Out of these results there is one which is particularly interesting: simple, unconstrained Gaussian Mixture Models (GMMs) with a relatively small number of mixture components learned from image patches are extraordinarily good in modeling image statistics [6, 4]. This is a surprising result due to the simplicity of GMMs and their ubiquity. Another surprising aspect of this result is that many of the current models may be thought of as GMMs with an exponential or infinite number of components, having different constraints on the covariance structure of the mixture components.
In this work we study the nature of GMMs learned from natural image patches. We start with a thorough comparison to some popular and cutting edge image models. We show that indeed, GMMs are excellent performers in modeling natural image patches. We then analyze what properties of natural images these GMMs capture, their dependence on the number of components in the mixture and their relation to the structure of the world around us. Finally, we show that the learned GMM suggests a strong connection between natural image statistics and a simple variant of the dead leaves model [7, 8], explicitly modeling occlusions and explaining some of the success of GMMs in modeling natural images.

4 0.11672806 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

Author: Sanjeev Arora, Rong Ge, Ankur Moitra, Sushant Sachdeva

Abstract: We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form y = Ax + η where A is an unknown n × n matrix and x is a random variable whose components are independent and have a fourth moment strictly less than that of a standard Gaussian random variable and η is an n-dimensional Gaussian random variable with unknown covariance Σ. We give an algorithm that provably recovers A and Σ up to an additive ε and whose running time and sample complexity are polynomial in n and 1/ε. To accomplish this, we introduce a novel “quasi-whitening” step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one) and this is a crucial step in our algorithm, one that has been overlooked in previous attempts, and allows us to control the accumulation of error when we find the columns of A one by one via local search.

5 0.11560854 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy.

6 0.10541034 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

7 0.10226095 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

8 0.10081049 197 nips-2012-Learning with Recursive Perceptual Representations

9 0.084950276 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

10 0.082284011 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

11 0.077911623 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

12 0.077331029 193 nips-2012-Learning to Align from Scratch

13 0.073443778 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

14 0.067561232 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

15 0.066900246 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

16 0.066817701 86 nips-2012-Convex Multi-view Subspace Learning

17 0.066611312 210 nips-2012-Memorability of Image Regions

18 0.064202502 65 nips-2012-Cardinality Restricted Boltzmann Machines

19 0.061235249 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

20 0.060535911 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, 0.064), (2, -0.157), (3, 0.002), (4, 0.071), (5, 0.025), (6, -0.002), (7, -0.039), (8, -0.013), (9, -0.026), (10, 0.03), (11, 0.056), (12, -0.026), (13, 0.078), (14, -0.06), (15, -0.03), (16, 0.024), (17, -0.038), (18, 0.076), (19, -0.163), (20, 0.017), (21, -0.036), (22, -0.061), (23, 0.031), (24, 0.0), (25, 0.054), (26, -0.048), (27, 0.012), (28, 0.026), (29, 0.095), (30, -0.026), (31, 0.037), (32, 0.057), (33, -0.128), (34, -0.02), (35, -0.012), (36, 0.026), (37, 0.069), (38, 0.122), (39, -0.048), (40, 0.009), (41, 0.022), (42, -0.075), (43, -0.092), (44, 0.027), (45, 0.028), (46, 0.055), (47, 0.056), (48, -0.005), (49, -0.113)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94027519 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

2 0.78689271 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

Author: Daniel Zoran, Yair Weiss

Abstract: Simple Gaussian Mixture Models (GMMs) learned from pixels of natural image patches have been recently shown to be surprisingly strong performers in modeling the statistics of natural images. Here we provide an in-depth analysis of this simple yet rich model. We show that such a GMM model is able to compete with even the most successful models of natural images in log likelihood scores, denoising performance and sample quality. We provide an analysis of what such a model learns from natural images as a function of number of mixture components including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models.

1 GMMs and natural image statistics models

Many models for the statistics of natural image patches have been suggested in recent years. Finding good models for natural images is important to many different research areas: computer vision, biological vision and neuroscience among others. Recently, there has been a growing interest in comparing different aspects of models for natural images such as log-likelihood and multi-information reduction performance, and much progress has been achieved [1, 2, 3, 4, 5, 6]. Out of these results there is one which is particularly interesting: simple, unconstrained Gaussian Mixture Models (GMMs) with a relatively small number of mixture components learned from image patches are extraordinarily good in modeling image statistics [6, 4]. This is a surprising result due to the simplicity of GMMs and their ubiquity. Another surprising aspect of this result is that many of the current models may be thought of as GMMs with an exponential or infinite number of components, having different constraints on the covariance structure of the mixture components.
In this work we study the nature of GMMs learned from natural image patches. We start with a thorough comparison to some popular and cutting-edge image models. We show that indeed, GMMs are excellent performers in modeling natural image patches. We then analyze what properties of natural images these GMMs capture, their dependence on the number of components in the mixture and their relation to the structure of the world around us. Finally, we show that the learned GMM suggests a strong connection between natural image statistics and a simple variant of the dead leaves model [7, 8], explicitly modeling occlusions and explaining some of the success of GMMs in modeling natural images.

3 0.65785646 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation

Author: Ryan Kiros, Csaba Szepesvári

Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard-sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones.

4 0.63199055 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long-term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global image features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works.

5 0.58831286 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks

Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

6 0.58800501 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

7 0.58791572 193 nips-2012-Learning to Align from Scratch

8 0.58636886 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

9 0.55422044 202 nips-2012-Locally Uniform Comparison Image Descriptor

10 0.54495871 150 nips-2012-Hierarchical spike coding of sound

11 0.54196966 146 nips-2012-Graphical Gaussian Vector for Image Categorization

12 0.54143035 8 nips-2012-A Generative Model for Parts-based Object Segmentation

13 0.53456938 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

14 0.50881505 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

15 0.50074619 185 nips-2012-Learning about Canonical Views from Internet Image Collections

16 0.49790484 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

17 0.49555987 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

18 0.48736775 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

19 0.48309571 170 nips-2012-Large Scale Distributed Deep Networks

20 0.48057172 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.038), (17, 0.015), (21, 0.019), (38, 0.099), (39, 0.019), (42, 0.025), (54, 0.012), (55, 0.034), (74, 0.072), (76, 0.12), (80, 0.117), (87, 0.278), (92, 0.051)]
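
The listing does not say how each simValue is derived from these sparse (topicId, topicWeight) pairs. A common choice for comparing topic vectors is cosine similarity, so the following is only a minimal sketch under that assumption, not the method used to produce the numbers on this page. The function name `cosine_similarity` and the vector `other_paper` are hypothetical; `this_paper` is copied from the LDA weights above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs.

    Assumption: topics missing from a vector have weight 0.
    """
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


# LDA topic weights for this paper, copied from the listing above.
this_paper = [(0, 0.038), (17, 0.015), (21, 0.019), (38, 0.099), (39, 0.019),
              (42, 0.025), (54, 0.012), (55, 0.034), (74, 0.072), (76, 0.12),
              (80, 0.117), (87, 0.278), (92, 0.051)]

# Hypothetical topic vector for some other paper (made up for illustration).
other_paper = [(38, 0.20), (87, 0.50), (3, 0.30)]

score = cosine_similarity(this_paper, other_paper)
```

Under plain cosine similarity a vector compared with itself scores 1.0 up to rounding, whereas the same-paper rows in these lists carry lower values, so the page's scores evidently come from a different or approximate computation.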

similar papers list:

simIndex simValue paperId paperTitle

1 0.84271681 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

Author: Hiroki Terashima, Masato Okada

Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.

2 0.82042879 182 nips-2012-Learning Networks of Heterogeneous Influence

Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola

Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which cannot be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.

same-paper 3 0.77909011 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

Author: Junyuan Xie, Linli Xu, Enhong Chen

Abstract: We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA). We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. Our method’s performance in the image denoising task is comparable to that of KSVD, which is a widely used sparse coding technique. More importantly, in the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. Moreover, the proposed method does not need the information regarding the region that requires inpainting to be given a priori. Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. We also show that our new training scheme for DA is more effective and can improve the performance of unsupervised feature learning.

4 0.61040437 197 nips-2012-Learning with Recursive Perceptual Representations

Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell

Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high-dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous (often more complicated) methods on several vision and speech benchmarks.

5 0.60911512 168 nips-2012-Kernel Latent SVM for Visual Recognition

Author: Weilong Yang, Yang Wang, Arash Vahdat, Greg Mori

Abstract: Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision. However, a limitation of LSVMs is that they rely on linear models. For many computer vision tasks, linear models are suboptimal and nonlinear models learned with kernels typically perform much better. Therefore, it is desirable to develop the kernel version of LSVM. In this paper, we propose kernel latent SVM (KLSVM), a new learning framework that combines latent SVMs and kernel methods. We develop an iterative training algorithm to learn the model parameters. We demonstrate the effectiveness of KLSVM using three different applications in visual recognition. Our KLSVM formulation is very general and can be applied to solve a wide range of applications in computer vision and machine learning.

6 0.60557008 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

7 0.60450757 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

8 0.60265166 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

9 0.59988528 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

10 0.59675109 200 nips-2012-Local Supervised Learning through Space Partitioning

11 0.59671551 150 nips-2012-Hierarchical spike coding of sound

12 0.59549642 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

13 0.59533209 279 nips-2012-Projection Retrieval for Classification

14 0.59516346 193 nips-2012-Learning to Align from Scratch

15 0.595034 65 nips-2012-Cardinality Restricted Boltzmann Machines

16 0.59424263 188 nips-2012-Learning from Distributions via Support Measure Machines

17 0.59363484 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

18 0.59234619 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

19 0.59166461 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

20 0.59055346 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model