nips nips2008 nips2008-148 nips2008-148-reference knowledge-graph by maker-knowledge-mining

148 nips-2008-Natural Image Denoising with Convolutional Networks


Source: pdf

Author: Viren Jain, Sebastian Seung

Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters. 1 Background Low-level image processing tasks include edge detection, interpolation, and deconvolution. These tasks are useful both in themselves, and as a front-end for high-level visual tasks like object recognition. This paper focuses on the task of denoising, defined as the recovery of an underlying image from an observation that has been subjected to Gaussian noise. One approach to image denoising is to transform an image from pixel intensities into another representation where statistical regularities are more easily captured. For example, the Gaussian scale mixture (GSM) model introduced by Portilla and colleagues is based on a multiscale wavelet decomposition that provides an effective description of local image statistics [1, 2]. Another approach is to try and capture statistical regularities of pixel intensities directly using Markov random fields (MRFs) to define a prior over the image space. Initial work used handdesigned settings of the parameters, but recently there has been increasing success in learning the parameters of such models from databases of natural images [3, 4, 5, 6, 7, 8]. Prior models can be used for tasks such as image denoising by augmenting the prior with a noise model. Alternatively, an MRF can be used to model the probability distribution of the clean image conditioned on the noisy image. This conditional random field (CRF) approach is said to be discriminative, in contrast to the generative MRF approach. Several researchers have shown that the CRF approach can outperform generative learning on various image restoration and labeling tasks [9, 10]. CRFs have recently been applied to the problem of image denoising as well [5]. 1 The present work is most closely related to the CRF approach. Indeed, certain special cases of convolutional networks can be seen as performing maximum likelihood inference on a CRF [11]. The advantage of the convolutional network approach is that it avoids a general difficulty with applying MRF-based methods to image analysis: the computational expense associated with both parameter estimation and inference in probabilistic models. For example, naive methods of learning MRFbased models involve calculation of the partition function, a normalization factor that is generally intractable for realistic models and image dimensions. As a result, a great deal of research has been devoted to approximate MRF learning and inference techniques that meliorate computational difficulties, generally at the cost of either representational power or theoretical guarantees [12, 13]. Convolutional networks largely avoid these difficulties by posing the computational task within the statistical framework of regression rather than density estimation. Regression is a more tractable computation and therefore permits models with greater representational power than methods based on density estimation. This claim will be argued for with empirical results on the denoising problem, as well as mathematical connections between MRF and convolutional network approaches. 2 Convolutional Networks Convolutional networks have been extensively applied to visual object recognition using architectures that accept an image as input and, through alternating layers of convolution and subsampling, produce one or more output values that are thresholded to yield binary predictions regarding object identity [14, 15]. In contrast, we study networks that accept an image as input and produce an entire image as output. Previous work has used such architectures to produce images with binary targets in image restoration problems for specialized microscopy data [11, 16]. Here we show that similar architectures can also be used to produce images with the analog fluctuations found in the intensity distributions of natural images. Network Dynamics and Architecture A convolutional network is an alternating sequence of linear filtering and nonlinear transformation operations. The input and output layers include one or more images, while intermediate layers contain “hidden


reference text

[1] J. Portilla, V. Strela, M.J. Wainwright, E.P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Proc., 2003.

[2] S. Lyu, E.P. Simoncelli. Statistical modeling of images with fields of Gaussian scale mixtures. NIPS* 2006.

[3] S. Geman, D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Pattern Analysis and Machine Intelligence, 1984.

[4] S. Roth, M.J. Black. Fields of Experts: a framework for learning image priors. CVPR 2005.

[5] M.F. Tappen, C. Liu, E.H. Adelson, W.T. Freeman. Learning Gaussian Conditional Random Fields for Low-Level Vision. CVPR 2007.

[6] Y. Weiss, W.T. Freeman. What makes a good model of natural images? CVPR 2007.

[7] P. Gehler, M. Welling. Product of

[8] S.C. Zhu, Y. Wu, D. Mumford. Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling. International Journal of Computer Vision, 1998.

[9] S. Kumar, M. Hebert. Discriminative fields for modeling spatial dependencies in natural images. NIPS* 2004.

[10] X. He, R Zemel, M.C. Perpinan. Multiscale conditional random fields for image labeling. CVPR 2004.

[11] V. Jain, J.F. Murray, F. Roth, S. Turaga, V. Zhigulin, K.L. Briggman, M.N. Helmstaedter, W. Denk, H.S. Seung. Supervised Learning of Image Restoration with Convolutional Networks. ICCV 2007.

[12] S. Parise, M. Welling. Learning in markov random fields: An empirical study. Joint Stat. Meeting, 2005.

[13] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, C. Rother. A comparative study of energy minimization methods for markov random fields. ECCV 2006.

[14] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989.

[15] Y. LeCun, F.J. Huang, L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. CVPR 2004.

[16] F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, P.E. Barbano. Toward Automatic Phenotyping of Developing Embryos From Videos. IEEE Trans. Image Proc., 2005.

[17] G. Hinton, R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 2006.

[18] M. Ranzato, YL Boureau, Y. LeCun. Sparse feature learning for deep belief networks. NIPS* 2007.

[19] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle. Greedy Layer-Wise Training of Deep Networks. NIPS* 2006.

[20] D. Martin, C. Fowlkes, D. Tal, J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. ICCV 2001.

[21] S. Roth. High-order markov random fields for low-level vision. PhD Thesis, Brown Univ., 2007.

[22] H.S. Seung. Learning continuous attractors in recurrent networks. NIPS* 1997. 8