Statistical Modeling of Images with Fields of Gaussian Scale Mixtures (NIPS 2006, paper 182)
Author: Siwei Lyu, Eero P. Simoncelli
Abstract: The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures (GSMs). Here, we use this local description to construct a global field of Gaussian scale mixtures (FoGSM). Specifically, we model subbands of wavelet coefficients as the product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for FoGSM is feasible, and that samples drawn from an estimated FoGSM model have marginal and joint statistics similar to wavelet coefficients of photographic images. We develop an algorithm for image denoising based on the FoGSM model, and demonstrate substantial improvements over the current state-of-the-art denoising method based on the local GSM model.

Many successful methods in image processing and computer vision rely on statistical models for images, and it is thus of continuing interest to develop improved models, both in terms of their ability to precisely capture image structures and in terms of their tractability when used in applications. Constructing such a model is difficult, primarily because of the intrinsic high dimensionality of the space of images. Two simplifying assumptions are usually made to reduce model complexity. The first is Markovianity: the density of a pixel, conditioned on a small neighborhood, is assumed to be independent of the rest of the image. The second is homogeneity: the local density is assumed to be independent of its absolute position within the image. The set of models satisfying both of these assumptions constitutes the class of homogeneous Markov random fields (hMRFs).

Over the past two decades, studies of photographic images represented with multi-scale, multi-orientation image decompositions (loosely referred to as "wavelets") have revealed striking non-Gaussian regularities and inter- and intra-subband dependencies. For instance, wavelet coefficients generally have highly kurtotic marginal distributions [1, 2], and their amplitudes exhibit strong correlations with the amplitudes of nearby coefficients [3, 4]. One model that can capture the non-Gaussian marginal behaviors is a product of non-Gaussian scalar variables [5]. A number of authors have developed non-Gaussian MRF models based on this sort of local description [6, 7, 8], among which the recently developed fields-of-experts model [7] has demonstrated impressive denoising performance (albeit at an extremely high computational cost for learning model parameters).

An alternative model that can capture non-Gaussian local structure is a scale mixture model [9, 10, 11]. An important special case is the Gaussian scale mixture (GSM), which consists of a Gaussian random vector whose amplitude is modulated by a hidden scaling variable. The GSM model provides a particularly good description of local image statistics, and the Gaussian substructure of the model leads to efficient algorithms for parameter estimation and inference. Local GSM-based methods represent the current state of the art in image denoising [12]. The power of GSM models should be substantially improved when extended to describe more than a small neighborhood of wavelet coefficients. To this end, several authors have embedded local Gaussian mixtures into tree-structured MRF models [e.g., 13, 14].
In order to maintain tractability, these models are arranged such that coefficients are grouped in non-overlapping clusters, allowing a graphical probability model with no loops. Despite their global consistency, the artificially imposed cluster boundaries lead to substantial artifacts in applications such as denoising.

In this paper, we use a local GSM as the basis for a globally consistent and spatially homogeneous field of Gaussian scale mixtures (FoGSM). Specifically, the FoGSM is formulated as the product of two mutually independent MRFs: a positive multiplier field obtained by exponentiating a homogeneous Gaussian MRF (hGMRF), and a second hGMRF. We develop a parameter estimation procedure, and show that the model is able to capture important statistical regularities in the marginal and joint wavelet statistics of a photographic image. We apply the FoGSM to image denoising, demonstrating substantial improvement over the previous state-of-the-art results obtained with a local GSM model.

1 Gaussian scale mixtures

A GSM random vector x is formed as the product of a zero-mean Gaussian random vector u and an independent positive random variable z, as x \overset{d}{=} \sqrt{z}\,u, where \overset{d}{=} denotes equality in distribution. The density of x is determined by the covariance of the Gaussian vector, Σ, and the density of the multiplier, p_z(z), through the integral
$$
p(x) = \int_z \mathcal{N}_x(0, z\Sigma)\, p_z(z)\, dz \;\propto\; \int_z \frac{1}{\sqrt{z\,|\Sigma|}}\, \exp\!\left(-\frac{x^T \Sigma^{-1} x}{2z}\right) p_z(z)\, dz. \qquad (1)
$$
A key property of GSMs is that z determines the scale of the conditional covariance of x: conditioned on z, x is Gaussian with zero mean and covariance zΣ. In addition, the normalized variable x ⊘ √z is a zero-mean Gaussian with covariance matrix Σ.

The GSM model has been used to describe the marginal and joint densities of local clusters of wavelet coefficients, both within and across subbands [9], where the embedded Gaussian structure affords simple and efficient computation. This local GSM model has been used for denoising, by independently estimating each coefficient conditioned on its surrounding cluster [12]. This method achieves state-of-the-art performance, despite the fact that treating overlapping clusters as independent does not give rise to a globally consistent statistical model that satisfies all the local constraints.

2 Fields of Gaussian scale mixtures

In this section, we develop fields of Gaussian scale mixtures (FoGSM) as a framework for modeling wavelet coefficients of photographic images. Analogous to the local GSM model, we use a latent multiplier field to modulate a homogeneous Gaussian MRF (hGMRF). Formally, we define a FoGSM x as the product of two mutually independent MRFs,
$$
x \overset{d}{=} u \otimes \sqrt{z}, \qquad (2)
$$
where u is a zero-mean hGMRF, and z is a field of positive multipliers that control the local coefficient variances. The operator ⊗ denotes element-wise multiplication, and the square root is applied element-wise. Note that x has one-dimensional GSM marginal distributions, while its components have dependencies captured by the MRF structures of u and z. Analogous to the local GSM, when conditioned on z, x is an inhomogeneous GMRF:
$$
p(x \mid z) \;\propto\; \sqrt{\frac{|Q_u|}{\prod_i z_i}}\, \exp\!\left(-\tfrac{1}{2}\, x^T D(\sqrt{z})^{-1} Q_u D(\sqrt{z})^{-1} x\right)
= \sqrt{\frac{|Q_u|}{\prod_i z_i}}\, \exp\!\left(-\tfrac{1}{2}\, (x \oslash \sqrt{z})^T Q_u\, (x \oslash \sqrt{z})\right), \qquad (3)
$$
where Q_u is the inverse covariance matrix of u (also known as the precision matrix), D(·) denotes the operator that forms a diagonal matrix from an input vector, and ⊘ denotes element-wise division. Note also that the element-wise division of the two fields, x ⊘ √z, yields a hGMRF with precision matrix Q_u.
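As a small, self-contained illustration of the GSM construction above (not code from the paper), the following NumPy sketch draws samples x = √z·u with a hypothetical lognormal multiplier and an assumed covariance, and checks that x is heavy-tailed while the normalized variable x ⊘ √z is Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 100_000                 # neighborhood size, number of samples

# Assumed covariance for the Gaussian component u (illustration only)
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

u = rng.multivariate_normal(np.zeros(n), Sigma, size=N)    # Gaussian vectors u
z = np.exp(rng.normal(0.0, 1.0, size=N))                   # hypothetical lognormal multiplier
x = np.sqrt(z)[:, None] * u                                # GSM samples: x = sqrt(z) * u

def kurtosis(v):
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2

print("kurtosis of x[:, 0]            :", kurtosis(x[:, 0]))                  # well above 3: heavy tails
print("kurtosis of (x / sqrt(z))[:, 0]:", kurtosis(x[:, 0] / np.sqrt(z)))     # close to 3: Gaussian
```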
To complete the FoGSM model, we need to specify the structure of the multiplier field z. For tractability, we use another hGMRF as a substrate and map it into positive values by exponentiation, as was done in [10]. To be more specific, we model log(z) as a hGMRF with mean µ and precision matrix Q_z, where the log operator is applied element-wise, from which the density of z follows as:
$$
p_z(z) \;\propto\; \sqrt{|Q_z|}\, \prod_i \frac{1}{z_i}\, \exp\!\left(-\tfrac{1}{2}\, (\log z - \mu)^T Q_z\, (\log z - \mu)\right). \qquad (4)
$$
This is a natural extension of the univariate lognormal prior used previously for the scalar multiplier in the local GSM model [12].

The restriction to hGMRFs greatly simplifies computation with FoGSM. In particular, we take advantage of the fact that a 2D hGMRF with circular boundary handling has a sparse block-circulant precision matrix, with a generating kernel θ specifying its nonzero elements. A block-circulant matrix is diagonalized by the Fourier transform, and its multiplication with a vector corresponds to convolution with the kernel θ. Diagonalizability with a fixed and efficiently computed transform makes parameter estimation, sampling, and inference with a hGMRF substantially more tractable than with a general MRF. Readers are referred to [15] for a detailed description of hGMRFs.

Parameter estimation: The estimation of the latent multiplier field z and the model parameters (µ, Q_z, Q_u) may be achieved by maximizing log p(x, z; Q_u, Q_z, µ) with an iterative coordinate-ascent method, which is guaranteed to converge. Specifically, based on the statistical dependency structure of the FoGSM model, the following three steps are repeated until convergence:
$$
\begin{aligned}
\text{(i)}\;\; & z^{(t+1)} = \arg\max_z \Big[\log p\big(x \mid z; Q_u^{(t)}\big) + \log p\big(z; Q_z^{(t)}, \mu^{(t)}\big)\Big] \\
\text{(ii)}\;\; & Q_u^{(t+1)} = \arg\max_{Q_u} \log p\big(x \mid z^{(t+1)}; Q_u\big) \\
\text{(iii)}\;\; & \big(Q_z^{(t+1)}, \mu^{(t+1)}\big) = \arg\max_{Q_z, \mu} \log p\big(z^{(t+1)}; Q_z, \mu\big)
\end{aligned} \qquad (5)
$$
According to the FoGSM model structure, steps (ii) and (iii) correspond to maximum likelihood estimates of the parameters of the hGMRFs x ⊘ √z^{(t+1)} and log z^{(t+1)}, respectively. Because of this, both steps may be efficiently implemented by exploiting the diagonalization of the precision matrices with 2D Fourier transforms [15]. Step (i) in (5) may be implemented with conjugate gradient ascent [16].

To simplify description and computation, we introduce a new variable for the element-wise inverse square root of the multiplier: s = 1 ⊘ √z. The likelihood in (3) then becomes
$$
p(x \mid s) \;\propto\; \prod_i s_i\, \exp\!\left(-\tfrac{1}{2}\, (x \otimes s)^T Q_u\, (x \otimes s)\right)
= \prod_i s_i\, \exp\!\left(-\tfrac{1}{2}\, s^T D(x)\, Q_u\, D(x)\, s\right). \qquad (6)
$$
The density of s is obtained from (4), using the relation between densities of transformed variables, as
$$
p(s) \;\propto\; \prod_i \frac{1}{s_i}\, \exp\!\left(-\tfrac{1}{2}\, (2\log s + \mu)^T Q_z\, (2\log s + \mu)\right). \qquad (7)
$$
Combining (6) and (7), step (i) in (5) is equivalent to computing ŝ = argmax_s [log p(x|s; Q_u) + log p(s; Q_z, µ)], which simplifies to
$$
\hat{s} = \arg\min_s \left[\tfrac{1}{2}\, s^T D(x)\, Q_u\, D(x)\, s + \tfrac{1}{2}\, (2\log s + \mu)^T Q_z\, (2\log s + \mu)\right], \qquad (8)
$$
and the optimal ẑ is then recovered as ẑ = 1 ⊘ (ŝ ⊗ ŝ).
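To make the convolution identities above concrete, here is a small sketch (ours, not the authors' implementation) that evaluates the objective in (8) using only element-wise operations and 2D circular convolutions with the generating kernels θ_u and θ_z. Circular boundary handling and symmetric, positive-definite kernels are assumed, and all names are placeholders.

```python
import numpy as np

def circ_conv2(kernel, field):
    """Circular 2D convolution of `field` with a small centered `kernel`, via the FFT.
    This is multiplication by the block-circulant matrix generated by the kernel."""
    H, W = field.shape
    K = np.zeros((H, W))
    kh, kw = kernel.shape
    K[:kh, :kw] = kernel
    K = np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))   # place kernel center at (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(K) * np.fft.fft2(field)))

def objective_eq8(s, x, theta_u, theta_z, mu):
    """Objective of Eq. (8):
       0.5 * s^T D(x) Q_u D(x) s + 0.5 * (2 log s + mu)^T Q_z (2 log s + mu),
    with Q_u and Q_z applied as circular convolutions with their generating kernels."""
    xs = x * s                              # element-wise product x (x) s
    t = 2.0 * np.log(s) + mu
    return 0.5 * np.sum(xs * circ_conv2(theta_u, xs)) + 0.5 * np.sum(t * circ_conv2(theta_z, t))
```

With such an evaluator (and the gradient given next), step (i) can be carried out by any generic optimizer over the positive field s.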
We optimize (8) with conjugate gradient ascent [16]. Specifically, the negative gradient of the objective in (8) with respect to s is
$$
-\frac{\partial}{\partial s} \log\!\big[p(x \mid s)\, p(s)\big]
= D(x)\, Q_u\, D(x)\, s + 2\, D(s)^{-1} Q_z\, (2\log s + \mu)
= x \otimes \big(\theta_u \ast (x \otimes s)\big) + 2\,\big(\theta_z \ast (2\log s + \mu)\big) \oslash s,
$$
and the product of any vector h with the corresponding Hessian matrix can be computed as
$$
-\frac{\partial^2}{\partial s^2} \log\!\big[p(x \mid s)\, p(s)\big]\, h
= x \otimes \big(\theta_u \ast (x \otimes h)\big) + 4\,\big(\theta_z \ast (h \oslash s)\big) \oslash s - 2\,\big(\theta_z \ast (2\log s + \mu)\big) \otimes h \oslash (s \otimes s).
$$
Both operations can be expressed entirely in terms of element-wise operations (⊘ and ⊗) and 2D convolutions (∗) with the generating kernels θ_u and θ_z of the two precision matrices, which allows for efficient implementation.

3 Modeling photographic images

We have applied the FoGSM model to subbands of a multi-scale image representation known as the steerable pyramid [17]. This decomposition is a tight frame, constructed from oriented multi-scale derivative operators, and is overcomplete by a factor of 4K/3, where K is the number of orientation bands. Note that the marginal and joint statistics we describe are not specific to this decomposition, and are similar for other multi-scale oriented representations.

We fit a FoGSM model to each subband of a decomposed photographic image, using the algorithms described in the previous section. For the precision matrices Q_u and Q_z, we assumed a 5 × 5 Markov neighborhood (corresponding to a 5 × 5 convolution kernel), which was loosely chosen to optimize the tradeoff between accuracy and overfitting. Figure 1 shows the result of fitting a FoGSM model to an example subband of the "boat" image (left panel). The subband is decomposed into the product of the u field (middle panel) and the z field (right panel, in the logarithm domain), along with model parameters Q_u, µ, and Q_z (not shown). Visually, the changing spatial variances are captured by the estimated log z field, and the estimated u is much more homogeneous than the original subband, with a marginal distribution close to Gaussian.¹ However, the log z field still has a non-Gaussian marginal distribution and is spatially inhomogeneous, suggesting limitations of FoGSM for modeling photographic image wavelet coefficients (see Discussion).

Fig. 1. Decomposition of a subband x from the image "boat" (left) into the normalized subband u (middle) and the multiplier field z (right, in the logarithm domain). Each image is rescaled individually to fill the full range of grayscale intensities.

¹This "Gaussianizing" behavior was first noted in photographic images by Ruderman [18], who observed that image derivative measurements normalized by a local estimate of their standard deviation had approximately Gaussian marginal distributions.

The statistical dependencies captured by the FoGSM model can be further revealed by examining marginal and joint statistics of samples synthesized with the estimated model parameters. A sample from a FoGSM can be formed by multiplying samples of u and √z: the former is obtained by sampling from the hGMRF u, and the latter by element-wise exponentiation followed by an element-wise square root of a sample of the hGMRF log z. This procedure is again efficient for FoGSM, due to the use of hGMRFs as building blocks [15].
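The synthesis procedure just described can be sketched as follows (our own illustration, with placeholder 3 × 3 kernels rather than the fitted parameters from the paper): a stationary hGMRF with circulant precision is sampled by shaping white noise in the 2D Fourier domain, and a FoGSM sample is then formed as x = u ⊗ √z, per Eq. (2).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_hgmrf(theta, shape, mean=0.0):
    """Draw one sample of a stationary GMRF whose circulant precision matrix has
    generating kernel `theta`, using the 2D DFT that diagonalizes it."""
    H, W = shape
    K = np.zeros(shape)
    kh, kw = theta.shape
    K[:kh, :kw] = theta
    K = np.roll(K, (-(kh // 2), -(kw // 2)), axis=(0, 1))   # place kernel center at (0, 0)
    lam = np.real(np.fft.fft2(K))                           # eigenvalues of the precision matrix
    assert np.all(lam > 0), "kernel must define a positive-definite precision"
    e = rng.standard_normal(shape)                          # white Gaussian noise
    # ifft2(fft2(e) / sqrt(lam)) applies Q^{-1/2} to e, giving a sample with covariance Q^{-1}
    return mean + np.real(np.fft.ifft2(np.fft.fft2(e) / np.sqrt(lam)))

# Placeholder generating kernels (positive definite on the torus); not fitted values.
theta_u = np.array([[0., -1., 0.], [-1., 4.2, -1.], [0., -1., 0.]])
theta_z = np.array([[0., -1., 0.], [-1., 4.5, -1.], [0., -1., 0.]])

u = sample_hgmrf(theta_u, (128, 128))
log_z = sample_hgmrf(theta_z, (128, 128), mean=0.0)
x = u * np.sqrt(np.exp(log_z))          # FoGSM sample, Eq. (2): x = u (x) sqrt(z)
```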
Marginal distributions: We start by comparing the marginal distributions of the samples with those of the original subbands. Figure 2 shows empirical histograms, in the log domain, of a particular subband from four different photographic images (blue dot-dashed line), and those of samples synthesized from the FoGSM models learned from each corresponding subband (red solid line). For comparison, a Gaussian with the same standard deviation as the image subband is also displayed (red dashed line). Note that the synthesized samples have conspicuous non-Gaussian characteristics similar to the real subbands, exemplified by the high peak and heavy tails of the marginal distributions. On the other hand, they are typically less kurtotic than the real subbands. We believe this arises from the imprecise Gaussian approximation of log z (see Discussion).

Fig. 2. Empirical marginal log-distributions of coefficients from a multi-scale decomposition of photographic images (blue dot-dashed line), synthesized FoGSM samples from the same subband (red solid line), and a Gaussian with the same standard deviation (red dashed line); panels show the "Barbara", "boat", "house", and "peppers" images.

Joint distributions: In addition to one-dimensional marginal statistics, the FoGSM model is capable of capturing the joint behavior of wavelet coefficients. As described in [4, 9], wavelet coefficients of photographic images exhibit non-Gaussian dependencies. Shown in the first and third rows of Fig. 3 are empirical joint and conditional histograms for one subband of the "boat" image, for five pairs of coefficients corresponding to basis functions with spatial separations of ∆ = {1, 4, 32} samples, two orthogonal orientations, and two adjacent scales. Contour lines in the joint histograms are drawn at equal intervals of log probability. Intensities in the conditional histograms correspond to probability, except that each column is independently rescaled to fill the full range of intensity. For a pair of adjacent coefficients, we observe an elliptical joint distribution and a "bow-tie" shaped conditional distribution; the latter is indicative of strong non-Gaussian dependencies. For coefficients that are distant, the dependency becomes weaker and the corresponding joint and conditional histograms become more separable, as would be expected for two independent random variables.

Fig. 3. Examples of empirically observed distributions of wavelet coefficient pairs ("real" rows), compared with distributions of samples synthesized with the FoGSM model ("sim" rows), for pairs that are close (∆ = 1), near (∆ = 4), or far apart (∆ = 32), and for pairs from different orientations and scales. See text for details.

Random samples drawn from a FoGSM model, with parameters fitted to the corresponding subband, have statistical characteristics consistent with this general description of wavelet coefficients of photographic images. Shown in the second and fourth rows of Fig. 3 are the joint and conditional histograms of synthesized samples from the FoGSM model estimated from the same subband as in the first and third rows. Note that the joint and conditional histograms of the synthesized samples show a similar transition of spatial dependencies as the separation increases (columns 1, 2, and 3), suggesting that the FoGSM accounts well for pairwise joint dependencies of coefficients over a full range of spatial separations. On the other hand, the dependencies between subbands of different orientations and scales are not properly modeled by FoGSM (columns 4 and 5). This is especially true for subbands at different scales, which exhibit strong dependencies. The current FoGSM model does not capture those dependencies, since only spatial neighbors are used in order to exploit the 2D hGMRF structure (see Discussion).
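For completeness, the column-normalized conditional histograms described above can be computed as in the following sketch (ours; the arrays `a` and `b` stand for two co-located coefficient maps, e.g. a subband and a spatially shifted copy of it):

```python
import numpy as np

def conditional_histogram(a, b, nbins=51):
    """Joint histogram of coefficient pairs (a, b), with the counts for each a-bin
    rescaled independently to [0, 1], approximating the conditional histograms of Fig. 3."""
    H, a_edges, b_edges = np.histogram2d(a.ravel(), b.ravel(), bins=nbins)
    col_max = H.max(axis=1, keepdims=True)      # peak count within each a-bin (row of H)
    col_max[col_max == 0] = 1.0                 # leave empty a-bins as zeros
    return H / col_max, a_edges, b_edges

# Hypothetical usage: pair a subband x with a copy of itself shifted by Delta = 4 samples.
# x = ...  (2D array of wavelet coefficients from one subband)
# a, b = x[:, :-4], x[:, 4:]
# C, a_edges, b_edges = conditional_histogram(a, b)
```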
4 Application to image denoising

Let y = x + w be a wavelet subband of an image that has been corrupted with white Gaussian noise of known variance. In an overcomplete wavelet domain such as the steerable pyramid, the white Gaussian noise is transformed into correlated Gaussian noise w ∼ N_w(0, Σ_w), whose covariance Σ_w can be derived from the basis functions of the pyramid transform. With FoGSM as a prior over x, commonly used denoising methods involve expensive high-dimensional integration: for instance, the maximum a posteriori estimate, x̂_MAP = argmax_x log p(x|y), requires a high-dimensional integral over z, and the Bayesian least squares estimate, x̂_BLS = E(x|y), requires a double high-dimensional integral over x and z. Although it is possible to optimize these criteria using Markov chain Monte Carlo sampling or other approximations, we instead develop a more efficient deterministic algorithm that takes advantage of the hGMRF structure in the FoGSM model. Specifically, we compute
$$
(\hat{x}, \hat{z}, \hat{Q}_u, \hat{Q}_z, \hat{\mu}) = \arg\max_{x, z, Q_u, Q_z, \mu} \log p(x, z \mid y;\, Q_u, Q_z, \mu) \qquad (9)
$$
and take x̂ as the optimal denoised subband. Note that the model parameters are learned within the inference process rather than in a separate parameter-learning step. This strategy, known as partial optimal solution [19], greatly reduces the computational complexity.

We optimize (9) with coordinate ascent, iterating between maximizing each of (x, z, Q_u, Q_z, µ) while fixing the others. With fixed estimates of (z, Q_u, Q_z, µ), the optimization of x is
argmax_x log p(x, z|y; Q_u, Q_z, µ) = argmax_x [log p(x|z, y; Q_u) + log p(z|y; Q_u, Q_z, µ)],
which reduces to argmax_x log p(x|z, y; Q_u), since the second term is independent of x and can be dropped from the optimization. Given the Gaussian structure of x conditioned on z, this step is equivalent to a Wiener filter (linear in y). Fixing (x, Q_u, Q_z, µ), the optimization of z is
argmax_z log p(x, z|y; Q_u, Q_z, µ) = argmax_z [log p(y|x, z; Q_u) + log p(x, z; Q_u, Q_z, µ) − log p(y; Q_u, Q_z, µ)],
which reduces to argmax_z log p(x, z; Q_u, Q_z, µ): the first term is dropped because y is independent of z when conditioned on x, and the last term is dropped because it too is independent of z. Therefore, optimizing z given (x, Q_u, Q_z, µ) is equivalent to the first step of the algorithm in Section 2, which can be implemented with efficient gradient descent. Finally, given (x, z), the FoGSM model parameters (Q_u, Q_z, µ) are estimated from the hGMRFs x ⊘ √z and log z, as in the second and third steps of the algorithm in Section 2. However, to reduce the overall computation time, instead of a complete maximum likelihood estimation, these parameters are estimated with a maximum pseudo-likelihood procedure [20], which finds the parameters maximizing the product of all conditional distributions (which are 1D Gaussians in the GMRF case), followed by a projection onto the subspace of FoGSM parameters that yields positive definite precision matrices.

We tested this denoising method on a standard set of test images [12]. The noise-corrupted images were first decomposed into a steerable pyramid with multiple levels (5 levels for a 512 × 512 image and 4 levels for a 256 × 256 image) and 8 orientations. We assumed a FoGSM model for each subband, with a 5 × 5 neighborhood for the field u and a 1 × 1 neighborhood for the field log z. These sizes were chosen to provide a reasonable combination of performance and computational efficiency. We then estimated the optimal x with the algorithm described above, with the initial values of x and z computed from the subband denoised with the local GSM model [12].
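As a concrete illustration of the Wiener-filter x-step described earlier in this section, the sketch below computes the posterior mean of x given z and y on a small patch, under the simplifying assumption of white noise with known standard deviation (the paper handles the correlated noise covariance Σ_w of the overcomplete pyramid); all names and kernels are hypothetical.

```python
import numpy as np

def wiener_step(y, z, Q_u, sigma):
    """x-step of the coordinate ascent (9) on a small patch: given z, p(x | z, y) is
    Gaussian, so its maximum (= mean) is a Wiener-type linear estimate of x from y.
    White noise of standard deviation `sigma` is assumed here for simplicity."""
    s = 1.0 / np.sqrt(z)                       # element-wise inverse square root of z
    Q_xz = np.diag(s) @ Q_u @ np.diag(s)       # conditional precision of x given z, cf. Eq. (3)
    A = Q_xz + np.eye(len(y)) / sigma ** 2     # posterior precision of x given (z, y)
    return np.linalg.solve(A, y / sigma ** 2)  # posterior mean, linear in y

# Hypothetical toy usage on a 1D "patch" with a tridiagonal precision for u:
n = 64
Q_u = 2.2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # positive definite
rng = np.random.default_rng(2)
y = rng.standard_normal(n) * 3.0
x_hat = wiener_step(y, z=np.ones(n), Q_u=Q_u, sigma=1.0)
```

For full subbands, the dense solve above would be replaced by the FFT- and convolution-based machinery of Section 2; this patch-sized version only illustrates the Gaussian conjugacy underlying the step.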
Shown in Fig. 4 is an example of denoising the "boat" image corrupted with simulated additive white Gaussian noise of strength σ = 50, corresponding to a peak signal-to-noise ratio (PSNR) of 14.15 dB. We compare this with the local GSM method of [12], which, assuming a local GSM model for the neighborhood consisting of 3 × 3 spatial neighbors plus the parent in the next coarser scale, computes a Bayes least squares estimate of each coefficient conditioned on its surrounding neighborhood. The FoGSM denoising achieves a substantial improvement (+0.68 dB in PSNR) and is seen to exhibit better contrast and continuation of oriented features (see Fig. 4). On the other hand, FoGSM introduces some noticeable artifacts in low-contrast areas, caused by numerical instability at locations with small z.

Fig. 4. Denoising results using the local GSM [12] and FoGSM: original image; noisy image (σ = 50, PSNR = 14.15 dB); GSM-BLS (PSNR = 26.34 dB); FoGSM (PSNR = 27.01 dB). Performance is evaluated by the peak signal-to-noise ratio (PSNR), 20 log10(255/σe), where σe is the standard deviation of the error.

We find that the improvement in PSNR is consistent across photographic images and noise levels, as reported in Table 1. But even with a restricted neighborhood for the multiplier field, this PSNR improvement comes at a substantial computational cost. As a rough indication, running unoptimized MATLAB (version R14) code on a PowerPC G5 workstation with a 2.3 GHz processor and 16 GB of RAM, denoising a 512 × 512 image takes 4.5 hours on average (averaged over 5 images), and denoising a 256 × 256 image takes 2.4 hours on average (averaged over 2 images), to a convergence precision producing the reported results. Our preliminary investigation indicates that the slow running time is mainly due to the nature of coordinate ascent and the landscape of (9), which requires many iterations to converge.

5 Discussion

We have introduced fields of Gaussian scale mixtures as a flexible and efficient tool for modeling the statistics of wavelet coefficients of photographic images. We developed a feasible (although admittedly computationally costly) parameter estimation method, and showed that samples synthesized from the fitted FoGSM model are able to capture structures in the marginal and joint wavelet statistics of photographic images. Preliminary results of applying FoGSM to image denoising indicate substantial improvements over the state-of-the-art methods based on the local GSM model.

Although FoGSM has a structure similar to the local scale mixture models [9, 10], there is a fundamental difference between them. In FoGSM, hGMRF structures are enforced in u and log z, while the local scale mixture models impose minimal statistical structure on these variables. Because of this, our model easily extends to images of arbitrary size, while the local scale mixture models are essentially confined to describing small image patches (the curse of dimensionality and the increase in computational cost prevent one from scaling up the patch size). On the other hand, the close relation to Gaussian MRFs makes analysis and computation with FoGSM significantly easier than with other non-Gaussian MRF-based image models [6, 7, 5].

We envision, and are currently working on, a number of model improvements. First, the model should benefit from the introduction of more general Markov neighborhoods, including wavelet coefficients from subbands at other scales and orientations [4, 12], since the current model is clearly not accounting for these dependencies (see Fig. 3).
Secondly, the log transformation used to derive the multiplier field from a hGMRF is somewhat ad hoc, and we believe that substituting another nonlinear transformation (e.g., a power law [14]) might lead to a more accurate description of the image statistics. Thirdly, the current denoising method estimates model parameters during the process of denoising, which produces image-adaptive model parameters. We are exploring the possibility of using a set of generic model parameters learned a priori on a large set of photographic images, so that a generic FoGSM-based statistical model for all photographic images can be built. Finally, there exist residual inhomogeneous structures in the log z field (see Fig. 1) that can likely be captured by explicitly incorporating local orientation [21] or phase into the model. Finding tractable models and algorithms for handling such circular variables is challenging, but we believe their inclusion will result in substantial improvements in modeling and in denoising performance.

Table 1. Denoising results with FoGSM for different images and noise levels. Entries are PSNRs (20 log10(255/σe), where σe is the standard deviation of the error) of the denoised images; in parentheses are the PSNRs of the same images denoised with the local GSM model [12].

Image         σ = 10 (28.13)   σ = 25 (20.17)   σ = 50 (14.15)   σ = 100 (8.13)
Barbara       35.01 (34.01)    30.10 (29.07)    26.40 (25.45)    23.01 (22.61)
Flintstones   32.47 (31.78)    28.29 (27.48)    24.82 (24.02)    21.24 (20.49)
barco         35.05 (34.42)    30.44 (29.73)    27.36 (26.63)    24.44 (23.84)
house         35.63 (35.27)    31.64 (31.32)    28.51 (28.23)    25.33 (25.31)
boat          34.12 (33.58)    30.03 (29.34)    27.01 (26.35)    24.20 (23.79)
Lena          35.94 (35.60)    32.11 (31.70)    29.12 (28.62)    26.12 (25.77)
fingerprint   33.28 (32.45)    28.45 (27.44)    25.11 (24.13)    21.78 (21.21)
peppers       34.38 (33.73)    29.78 (29.18)    26.43 (25.93)    23.17 (22.80)

References

[1] P. J. Burt. Fast filter transforms for image processing. Comp. Graph. Image Proc., 16:20–51, 1981.
[2] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am., 4(12):2379–2394, 1987.
[3] B. Wegmann and C. Zetzsche. Statistical dependencies between orientation filter outputs used in a human vision based image code. In Proc. Visual Comm. and Image Proc., volume 1360, pages 909–922, 1990.
[4] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. on Image Proc., 8(12):1688–1701, 1999.
[5] Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton. Energy-based models for sparse overcomplete representations. J. of Machine Learning Res., 4:1235–1260, 2003.
[6] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. Int'l. J. Comp. Vis., 27(2):107–126, 1998.
[7] S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In IEEE Conf. on Comp. Vis. and Pat. Rec., volume 2, pages 860–867, 2005.
[8] P. Gehler and M. Welling. Products of "edge-perts". In Adv. in Neural Info. Proc. Systems (NIPS*05). MIT Press, 2006.
[9] M. J. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In Adv. Neural Info. Proc. Sys. (NIPS*99), volume 12, pages 855–861, May 2000.
[10] Y. Karklin and M. S. Lewicki. A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals. Neural Computation, 17(2):397–423, 2005.
[11] A. Hyvärinen, P. O. Hoyer, and M. Inki. Topographic ICA as a model of natural image statistics. In First IEEE Int'l. Workshop on Bio. Motivated Comp. Vis., London, UK, 2000.
[12] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. on Image Proc., 12(11):1338–1351, 2003.
[13] J. Romberg, H. Choi, and R. G. Baraniuk. Bayesian tree-structured image modeling using wavelet-domain hidden Markov models. IEEE Trans. on Image Proc., 10(7):303–347, 2001.
[14] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Appl. and Comp. Harm. Ana., 11(1):89–123, 2001.
[15] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, 2005.
[16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes. Cambridge University Press, 2nd edition, 2002.
[17] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In IEEE Int'l. Conf. on Image Proc., volume 3, pages 444–447, 1995.
[18] D. Ruderman. The statistics of natural images. Network: Comp. in Neural Sys., 5:598–605, 1994.
[19] M. Figueiredo and J. Leitão. Unsupervised image restoration and edge location using compound Gauss-Markov random fields and the MDL principle. IEEE Trans. on Image Proc., 6(8):1089–1122, 1997.
[20] J. Besag. On the statistical analysis of dirty pictures. J. of the Royal Stat. Soc., Series B, 48:259–302, 1986.
[21] D. K. Hammond and E. P. Simoncelli. Image denoising with an orientation-adaptive Gaussian scale mixture model. In Proc. 13th IEEE Int'l. Conf. on Image Proc., pages 1433–1436, October 2006.
Reference: text
sentIndex sentText sentNum sentScore
1 Here, we use this local description to construct a global field of Gaussian scale mixtures (FoGSM). [sent-4, score-0.18]
2 Specifically, we model subbands of wavelet coefficients as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. [sent-5, score-0.37]
3 We show that parameter estimation for FoGSM is feasible, and that samples drawn from an estimated FoGSM model have marginal and joint statistics similar to wavelet coefficients of photographic images. [sent-6, score-0.561]
4 We develop an algorithm for image denoising based on the FoGSM model, and demonstrate substantial improvements over current state-ofthe-art denoising method based on the local GSM model. [sent-7, score-0.546]
5 Over the past two decades, studies of photographic images represented with multi-scale multiorientation image decompositions (loosely referred to as “wavelets”) have revealed striking nonGaussian regularities and inter and intra-subband dependencies. [sent-14, score-0.378]
6 For instance, wavelet coefficients generally have highly kurtotic marginal distributions [1, 2], and their amplitudes exhibit strong correlations with the amplitudes of nearby coefficients [3, 4]. [sent-15, score-0.334]
7 One model that can capture the nonGaussian marginal behaviors is a product of non-Gaussian scalar variables [5]. [sent-16, score-0.107]
8 An alternative model that can capture non-Gaussian local structure is a scale mixture model [9, 10, 11]. [sent-18, score-0.146]
9 An important special case is Gaussian scale mixtures (GSM), which consists of a Gaussian random vector whose amplitude is modulated by a hidden scaling variable. [sent-19, score-0.098]
10 The GSM model provides a particularly good description of local image statistics, and the Gaussian substructure of the model leads to efficient algorithms for parameter estimation and inference. [sent-20, score-0.242]
11 Local GSM-based methods represent the current state-of-the-art in image denoising [12]. [sent-21, score-0.26]
12 The power of GSM models should be substantially improved when extended to describe more than a small neighborhood of wavelet coefficients. [sent-22, score-0.215]
13 To this end, several authors have embedded local Gaussian mixtures into tree-structured MRF models [e. [sent-23, score-0.111]
14 In this paper, we use a local GSM as a basis for a globally consistent and spatially homogeneous field of Gaussian scale mixtures (FoGSM). [sent-28, score-0.203]
15 Specifically, the FoGSM is formulated as the product of two mutually independent MRFs: a positive multiplier field obtained by exponentiating a homogeneous Gaussian MRF (hGMRF), and a second hGMRF. [sent-29, score-0.155]
16 We develop a parameter estimation procedure, and show that the model is able to capture important statistical regularities in the marginal and joint wavelet statistics of a photographic image. [sent-30, score-0.605]
17 We apply the FoGSM to image denoising, demonstrating substantial improvement over the previous state-of-the-art results obtained with a local GSM model. [sent-31, score-0.186]
18 1 Gaussian scale mixtures A GSM random vector x is formed as the product of a zero-mean Gaussian random vector u and an d √ d independent random variable z, as x = zu, where = denotes equality in distribution. [sent-32, score-0.117]
19 The GSM model has been used to describe the marginal and joint densities of local clusters of wavelet coefficients, both within and across subbands [9], where the embedded Gaussian structure affords simple and efficient computation. [sent-36, score-0.454]
20 This local GSM model has been be used for denoising, by independently estimating each coefficient conditioned on its surrounding cluster [12]. [sent-37, score-0.097]
21 This method achieves state-of-the-art performances, despite the fact that treating overlapping clusters as independent does not give rise to a globally consistent statistical model that satisfies all the local constraints. [sent-38, score-0.092]
22 2 Fields of Gaussian scale mixtures In this section, we develop fields of Gaussian scale mixtures (FoGSM) as a framework for modeling wavelet coefficients of photographic images. [sent-39, score-0.596]
23 Analogous to the local GSM model, we use a latent multiplier field to modulate a homogeneous Gaussian MRF (hGMRF). [sent-40, score-0.185]
24 Note that x has a one-dimensional GSM marginal distributions, while its components have dependencies captured by the MRF structures of u and z. [sent-43, score-0.145]
25 Decomposition of a subband from image “boat” (left) into the normalized subband u (middle) and the multiplier field z (right, in the logarithm domain). [sent-49, score-0.629]
26 Each image is rescaled individually to fill the full range of grayscale intensities. [sent-50, score-0.117]
27 To be more specific, we model log(z) as a hGMRF with mean µ and precision matrix Qz , where the log operator is applied element-wise, from which the density of z follows as: pz (z) ∝ |Qz | 1 exp − (log z − µ)T Qz (log z − µ) . [sent-52, score-0.198]
28 zi 2 i (4) This is a natural extension of the univariate lognormal prior used previously for the scalar multiplier in the local GSM model [12]. [sent-53, score-0.151]
29 Parameter estimation: The estimation of the latent multiplier field z and the model parameters (µ, Qz , Qu ) may be achieved by maximizing log p(x, z; Q u, Qz , µ) with an iterative coordinate-ascent method, which is guaranteed to converge. [sent-59, score-0.186]
30 2 (6) The joint density of s is obtained from (4), using the relations between densities of transformed variables, as 1 1 exp − (2 log s + µ)T Qz (2 log s + µ) . [sent-65, score-0.207]
31 (6) and (7), step (i) in (5) is equivalent to computing s = argmaxs log p(x|s; Qu ) + log p(s; Qz , µ), which is further simplified into: argmin s 1 T 1 s D (x) Qu D (x) s + (2 log s + µ)T Qz (2 log s + µ) . [sent-67, score-0.26]
32 2 2 (8) boat house peppers x x x x log p(x) Barbara Fig. [sent-68, score-0.198]
33 Empirical marginal log distributions of coefficients from a multi-scale decomposition of photographic images (blue dot-dashed line), synthesized FoGSM samples from the same subband (red solid line), and a Gaussian with the same standard deviation (red dashed line). [sent-70, score-0.726]
34 3 Modeling photographic images We have applied the FoGSM model to subbands of a multi-scale image representation known as a steerable pyramid [17]. [sent-75, score-0.555]
35 Note that the marginal and joint statistics we describe are not specific to this decomposition, and are similar for other multi-scale oriented representations. [sent-77, score-0.15]
36 We fit a FoGSM model to each subband of a decomposed photographic image, using the algorithms described in the previous section. [sent-78, score-0.457]
37 For precision matrices Q u and Qz , we assumed a 5 × 5 Markov neighborhood (corresponding to a 5 × 5 convolution kernel), which was loosely chosen to optimize the tradeoff between accuracy and overfitting. [sent-79, score-0.088]
38 Figure 1 shows the result of fitting a FoGSM model to an example subband from the “boat” image (left panel). [sent-80, score-0.345]
39 The subband is decomposed into the product of the u field (middle panel) and the z field (right panel, in the logarithm domain), along with model parameters Q u , µ and Qz (not shown). [sent-81, score-0.288]
40 Visually, the changing spatial variances are represented in the estimated log z field, and the estimated u is much more homogeneous than the original subband and has a marginal distribution close to Gaussian. [sent-82, score-0.432]
41 1 However, the log z field still has a non-Gaussian marginal distribution and is spatially inhomogeneous, suggesting limitations of FoGSM for modeling photographic image wavelet coefficients (see Discussion). [sent-83, score-0.611]
42 The statistical dependencies captured by the FoGSM model can be further revealed by examining marginal and joint statistics of samples synthesized with the estimated model parameters. [sent-84, score-0.355]
43 The former is obtained by sampling from hGMRF u, and the latter is obtained from the element-wise exponentiation followed by a element-wise square root operation of a sample of hGMRF log z. [sent-86, score-0.09]
44 Marginal distributions: We start by comparing the marginal distributions of the samples and the original subband. [sent-88, score-0.094]
45 close ∆ = 1 near ∆ = 4 far ∆ = 32 orientation scale real sim real sim Fig. [sent-90, score-0.113]
46 Examples of empirically observed distributions of wavelet coefficient pairs, compared with distributions from synthesized samples with the FoGSM model. [sent-92, score-0.277]
47 from four different photographic images (blue dot-dashed line), and those of the synthesized samples of FoGSM models learned from each corresponding subband (red solid line). [sent-94, score-0.576]
48 For comparison, a Gaussian with the same standard deviation as the image subband is also displayed (red dashed line). [sent-95, score-0.342]
49 Note that the synthesized samples have conspicuous non-Gaussian characteristics similar to the real subbands, exemplified by the high peak and heavy tails in the marginal distributions. [sent-96, score-0.169]
50 Joint distributions: In addition to one-dimensional marginal statistics, the FoGSM model is capable of capturing the joint behavior of wavelet coefficients. [sent-99, score-0.306]
51 As described in [4, 9], wavelet coefficients of photographic images present non-Gaussian dependencies. [sent-100, score-0.421]
52 3 are empirical joint and conditional histograms for one subband of the “boat” image, for five pairs of coefficients, corresponding to basis functions with spatial separations of ∆ = {1, 4, 32} samples, two orthogonal orientations and two adjacent scales. [sent-102, score-0.374]
53 Contour lines in the joint histogram are drawn at equal intervals of log probability. [sent-103, score-0.109]
54 For coefficients that are distant, the dependency becomes weaker and the corresponding joint and conditional histograms become more separable, as would be expected for two independent random variables. [sent-107, score-0.107]
55 Random samples drawn from a FoGSM model, with parameters fitted to the corresponding subband, have statistical characteristics consistent with the general description of wavelet coefficients of photographic images. [sent-108, score-0.444]
56 3 are the joint and conditional histograms of synthesized samples from the FoGSM model estimated from the same subband as in the first and the third rows. [sent-110, score-0.458]
57 Note that the joint and conditional histograms of the synthesized samples have similar transition of spatial dependencies as the separation increases (column 1,2 and 3), suggesting that the FoGSM accounts well for pairwise joint dependencies of coefficients over a full range of spatial separations. [sent-111, score-0.372]
58 On the other hand, the dependencies between subbands of different orientations and scales are not properly modeled by FoGSM (column 4 and 5). [sent-112, score-0.161]
59 This is especially true for subbands at different scales, which exhibit strong dependencies. [sent-113, score-0.121]
60 The current FoGSM model original image noisy image (σ = 50) (PSNR = 14. [sent-114, score-0.216]
61 4 Application to image denoising Let y = x+w be a wavelet subband of an image that has been corrupted with white Gaussian noise of known variance. [sent-122, score-0.794]
62 In an overcomplete wavelet domain such as steerable pyramid, the white Gaussian noise is transformed into correlated Gaussian noise w ∼ N w (0, Σw ), whose covariance Σ w can be derived from the basis functions of the pyramid transform. [sent-123, score-0.342]
63 Specifically, we compute (ˆ , z, Qu , Qz , µ) = argmaxx,z,Qu ,Qz ,µ log p(x, z|y; Q u, Qz , µ) x ˆ ˆ ˆ ˆ (9) ˆ and take x as the optimal denoised subband. [sent-126, score-0.114]
64 With fixed estimates of (z, Q u , Qz , µ), the optimization of x is argmaxx log p(x, z|y; Q u, Qz , µ) = argmaxx log p(x|z, y; Q u, Qz , µ) + log p(z|y; Q u, Qz , µ) , which reduces to argmax x log p(x|z, y; Q u), with the second term independent of x and can be dropped from optimization. [sent-130, score-0.386]
65 Fixing (x, Q u , Qz , µ), the optimization of z is argmaxz log p(x, z|y; Q u, Qz , µ)= argmaxz log p(y|x, z; Q u)+log p(x, z; Q u, Qz , µ)−log p(y; Qu , Qz , µ) , which is further reduced to argmax z log p(x, z; Qu , Qz , µ). [sent-132, score-0.297]
66 We tested this denoising method on a standard set of test images [12]. [sent-138, score-0.222]
67 The noise corrupted images were first decomposed these into a steerable pyramid with multiple levels (5 levels for a 512 × 512 image and 4 levels for a 256 × 256 image ) and 8 orientations. [sent-139, score-0.449]
68 We assumed a FoGSM model for each subband, with a 5 × 5 neighborhood for field u and a 1 × 1 neighborhood for field log z. [sent-140, score-0.169]
69 We then estimate the optimal x with the algorithm described previously, with the initial values of x and z computed from subband denoised with the local GSM model [12]. [sent-142, score-0.346]
70 4 is an example of denoising the “boat” image corrupted with simulated additive white Gaussian noise of strength σ = 50, corresponding to a peak-signal-to-noise-ratio (PSNR), of 14. [sent-144, score-0.297]
71 We compare this with the local GSM method in [12], which, assuming a local GSM model for the neighborhood consisting of 3 × 3 spatial neighbors plus parent in the next coarsest scale, computes a Bayes least squares estimate of each coefficient conditioned on its surrounding neighborhood. [sent-146, score-0.206]
72 We find that the improvement in terms of PSNR is consistent across photographic images and noise levels, as reported in Table 1. [sent-151, score-0.264]
73 But even with a restricted neighborhood for the multiplier field, this PSNR improvement does come at a substantial computational cost. [sent-152, score-0.161]
74 3 Ghz processor and 16 Gb RAM memory, using unoptimized MATLAB (version R14) code, denoising a 512 × 512 image takes on average 4. [sent-154, score-0.26]
75 5 hours (results averaging over 5 images), and denoising a 256×256 image takes on average 2. [sent-155, score-0.26]
76 5 Discussion We have introduced fields of Gaussian scale mixtures as a flexible and efficient tool for modeling the statistics of wavelet coefficients of photographic images. [sent-158, score-0.501]
77 We developed a feasible (although admittedly computationally costly) parameter estimation method, and showed that samples synthesized from the fitted FoGSM model are able to capture structures in the marginal and joint wavelet statistics of photographic images. [sent-159, score-0.658]
78 Preliminary results of applying FoGSM to image denoising indicate substantial improvements over the state-of-the-art methods based on the local GSM model. [sent-160, score-0.366]
79 Although FoGSM has a structure that is similar to the local scale mixture model [9, 10], there is a fundamental difference between them. [sent-161, score-0.124]
80 In FoGSM, hGMRF structures are enforced in u and log z, while the local scale mixture models impose minimal statistical structure on these variables. [sent-162, score-0.21]
81 Because of this, our model easily extends to images of arbitrary size, while the local scale mixture models are essentially confined to describing small image patches (the curse of dimensionality, and the increase in computational cost prevent one from scaling the patch size up). [sent-163, score-0.28]
82 On the other hand, the close relation to Gaussian MRF makes the analysis and computation of FoGSM significantly easier than other non-Gaussian MRF based image models [6, 7, 5]. [sent-164, score-0.097]
83 First, the model should benefit from the introduction of more general Markov neighborhoods, including wavelet coefficients from subbands at other scales and orientations [4, 12], since the current model is clearly not accounting for these dependencies (see Fig. [sent-166, score-0.379]
84 Secondly, the log transformation used to derive the multiplier field from a hGMRF is somewhat ad hoc, and we believe that substitution of another nonlinear transformation (e. [sent-168, score-0.145]
85 , a power law [14]) might lead to a more accurate description of the image statistics. [sent-170, score-0.13]
86 Thirdly, the current denoising method estimates model parameter during the process of denoising, which produces image adaptive model parameters. [sent-171, score-0.304]
87 We are exploring the possibility of using a set of generic model parameter learned a priori on a large set of photographic images, so that a generic statistical model for all photographic images based on FoGSM can be built. [sent-172, score-0.5]
88 Finally, there exist residual inhomogeneous structures in the log z field (see Fig. [sent-173, score-0.115]
89 1) that can likely be captured by explicitly incorporating local orientation [21] or phase into the model. [sent-174, score-0.087]
90 Finding tractable models and algorithms for handling such circular variables is challenging, but we believe their inclusion will result in substantial improvements in modeling and in denoising performance. [sent-175, score-0.259]
91 Shown in the table are PSNRs (20 log10 (255/σe ), where σe is the standard deviation of the error) of the denoised images, and in the parenthesis are the PSNRs of the same images denoised with a local GSM model [12]. [sent-250, score-0.247]
92 Statistical dependencies between orientation filter outputs used in human vision based image code. [sent-271, score-0.158]
93 Image compression via joint statistical characterization in the wavelet domain. [sent-281, score-0.239]
94 Fields of experts: a framework for learning image priors. [sent-311, score-0.097]
95 Topographic ICA as a model of natural image statistics. [sent-349, score-0.119]
96 Image denoising using scale mixtures of Gaussians in the wavelet domain. [sent-362, score-0.435]
97 Bayesian tree-structured image modeling using wavelet domain hidden Markov models. [sent-371, score-0.31]
98 Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. [sent-382, score-0.195]
99 Unsupervised image restoration and edge location using compound Gaussa Markov random fields and MDL principle. [sent-424, score-0.097]
100 Image denoising with an orientation-adaptive Gaussian scale mixture model. [sent-440, score-0.216]
wordName wordTfidf (topN-words)
[('fogsm', 0.607), ('qz', 0.367), ('gsm', 0.268), ('subband', 0.226), ('hgmrf', 0.198), ('coe', 0.198), ('photographic', 0.188), ('wavelet', 0.174), ('denoising', 0.163), ('qu', 0.148), ('psnr', 0.113), ('subbands', 0.099), ('image', 0.097), ('boat', 0.085), ('multiplier', 0.08), ('synthesized', 0.075), ('hgmrfs', 0.071), ('marginal', 0.066), ('log', 0.065), ('cients', 0.063), ('mixtures', 0.062), ('eld', 0.061), ('mrf', 0.06), ('images', 0.059), ('homogeneous', 0.056), ('steerable', 0.049), ('denoised', 0.049), ('local', 0.049), ('precision', 0.047), ('joint', 0.044), ('argmaxx', 0.042), ('argmaxz', 0.042), ('pyramid', 0.041), ('neighborhood', 0.041), ('dependencies', 0.04), ('substantial', 0.04), ('histograms', 0.04), ('erent', 0.039), ('gaussian', 0.038), ('scale', 0.036), ('regularities', 0.034), ('di', 0.033), ('int', 0.033), ('description', 0.033), ('pz', 0.031), ('samples', 0.028), ('gsms', 0.028), ('kurtotic', 0.028), ('peppers', 0.028), ('psnrs', 0.028), ('sim', 0.028), ('tractability', 0.028), ('inhomogeneous', 0.028), ('ascent', 0.028), ('simoncelli', 0.027), ('fields', 0.026), ('conditioned', 0.026), ('overcomplete', 0.026), ('elds', 0.025), ('exponentiation', 0.025), ('nongaussian', 0.025), ('dropped', 0.024), ('wainwright', 0.023), ('conditional', 0.023), ('markov', 0.023), ('multiplication', 0.023), ('amplitudes', 0.022), ('gmrf', 0.022), ('exhibit', 0.022), ('model', 0.022), ('structures', 0.022), ('orientations', 0.022), ('decomposed', 0.021), ('barbara', 0.021), ('statistical', 0.021), ('orientation', 0.021), ('modeling', 0.021), ('corrupted', 0.02), ('statistics', 0.02), ('oriented', 0.02), ('rescaled', 0.02), ('house', 0.02), ('mrfs', 0.02), ('deviation', 0.019), ('product', 0.019), ('estimation', 0.019), ('spatial', 0.019), ('circular', 0.018), ('argmax', 0.018), ('domain', 0.018), ('develop', 0.017), ('improvements', 0.017), ('mixture', 0.017), ('noise', 0.017), ('density', 0.017), ('captured', 0.017), ('gaussians', 0.016), ('red', 0.016), ('exp', 0.016), ('levels', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
Author: Siwei Lyu, Eero P. Simoncelli
Abstract: The local statistical properties of photographic images, when represented in a multi-scale basis, have been described using Gaussian scale mixtures (GSMs). Here, we use this local description to construct a global field of Gaussian scale mixtures (FoGSM). Specifically, we model subbands of wavelet coefficients as a product of an exponentiated homogeneous Gaussian Markov random field (hGMRF) and a second independent hGMRF. We show that parameter estimation for FoGSM is feasible, and that samples drawn from an estimated FoGSM model have marginal and joint statistics similar to wavelet coefficients of photographic images. We develop an algorithm for image denoising based on the FoGSM model, and demonstrate substantial improvements over current state-ofthe-art denoising method based on the local GSM model. Many successful methods in image processing and computer vision rely on statistical models for images, and it is thus of continuing interest to develop improved models, both in terms of their ability to precisely capture image structures, and in terms of their tractability when used in applications. Constructing such a model is difficult, primarily because of the intrinsic high dimensionality of the space of images. Two simplifying assumptions are usually made to reduce model complexity. The first is Markovianity: the density of a pixel conditioned on a small neighborhood, is assumed to be independent from the rest of the image. The second assumption is homogeneity: the local density is assumed to be independent of its absolute position within the image. The set of models satisfying both of these assumptions constitute the class of homogeneous Markov random fields (hMRFs). Over the past two decades, studies of photographic images represented with multi-scale multiorientation image decompositions (loosely referred to as “wavelets”) have revealed striking nonGaussian regularities and inter and intra-subband dependencies. For instance, wavelet coefficients generally have highly kurtotic marginal distributions [1, 2], and their amplitudes exhibit strong correlations with the amplitudes of nearby coefficients [3, 4]. One model that can capture the nonGaussian marginal behaviors is a product of non-Gaussian scalar variables [5]. A number of authors have developed non-Gaussian MRF models based on this sort of local description [6, 7, 8], among which the recently developed fields of experts model [7] has demonstrated impressive performance in denoising (albeit at an extremely high computational cost in learning model parameters). An alternative model that can capture non-Gaussian local structure is a scale mixture model [9, 10, 11]. An important special case is Gaussian scale mixtures (GSM), which consists of a Gaussian random vector whose amplitude is modulated by a hidden scaling variable. The GSM model provides a particularly good description of local image statistics, and the Gaussian substructure of the model leads to efficient algorithms for parameter estimation and inference. Local GSM-based methods represent the current state-of-the-art in image denoising [12]. The power of GSM models should be substantially improved when extended to describe more than a small neighborhood of wavelet coefficients. To this end, several authors have embedded local Gaussian mixtures into tree-structured MRF models [e.g., 13, 14]. 
In order to maintain tractability, these models are arranged such that coefficients are grouped in non-overlapping clusters, allowing a graphical probability model with no loops. Despite their global consistency, the artificially imposed cluster boundaries lead to substantial artifacts in applications such as denoising. In this paper, we use a local GSM as a basis for a globally consistent and spatially homogeneous field of Gaussian scale mixtures (FoGSM). Specifically, the FoGSM is formulated as the product of two mutually independent MRFs: a positive multiplier field obtained by exponentiating a homogeneous Gaussian MRF (hGMRF), and a second hGMRF. We develop a parameter estimation procedure, and show that the model is able to capture important statistical regularities in the marginal and joint wavelet statistics of a photographic image. We apply the FoGSM to image denoising, demonstrating substantial improvement over the previous state-of-the-art results obtained with a local GSM model. 1 Gaussian scale mixtures A GSM random vector x is formed as the product of a zero-mean Gaussian random vector u and an d √ d independent random variable z, as x = zu, where = denotes equality in distribution. The density of x is determined by the covariance of the Gaussian vector, Σ, and the density of the multiplier, p z (z), through the integral Nx (0, zΣ)pz (z)dz ∝ p(x) = z z xT Σ−1 x 1 pz (z)dz. exp − √ 2z z|Σ| (1) A key property of GSMs is that when z determines the scale of the conditional variance of x given z, which is a Gaussian variable with zero mean and covariance zΣ. In addition, the normalized variable √ x z is a zero mean Gaussian with covariance matrix Σ. The GSM model has been used to describe the marginal and joint densities of local clusters of wavelet coefficients, both within and across subbands [9], where the embedded Gaussian structure affords simple and efficient computation. This local GSM model has been be used for denoising, by independently estimating each coefficient conditioned on its surrounding cluster [12]. This method achieves state-of-the-art performances, despite the fact that treating overlapping clusters as independent does not give rise to a globally consistent statistical model that satisfies all the local constraints. 2 Fields of Gaussian scale mixtures In this section, we develop fields of Gaussian scale mixtures (FoGSM) as a framework for modeling wavelet coefficients of photographic images. Analogous to the local GSM model, we use a latent multiplier field to modulate a homogeneous Gaussian MRF (hGMRF). Formally, we define a FoGSM x as the product of two mutually independent MRFs, √ d x = u ⊗ z, (2) where u is a zero-mean hGMRF, and z is a field of positive multipliers that control the local coefficient variances. The operator ⊗ denotes element-wise multiplication, and the square root operation is applied to each component. Note that x has a one-dimensional GSM marginal distributions, while its components have dependencies captured by the MRF structures of u and z. Analogous to the local GSM, when conditioned on z, x is an inhomogeneous GMRF p(x|z) ∝ √ 1 |Qu | exp − xT D z zi 2 i −1 Qu D √ z −1 x = 1 |Qu | exp − (x zi 2 i √ T z) Qu (x √ z) , (3) where Qu is the inverse covariance matrix of u (also known as the precision matrix), and D(·) denotes the operator that form a diagonal matrix from an input vector. Note also that the element√ wise division of the two fields, x z, yields a hGMRF with precision matrix Q u . 
To complete the FoGSM model, we need to specify the structure of the multiplier field z. For tractability, we use another hGMRF as a substrate, and map it into positive values by exponentiation, as was done in [10].

Fig. 1. Decomposition of a subband x from the image “boat” (left) into the normalized subband u (middle) and the multiplier field z (right, shown as log z). Each image is rescaled individually to fill the full range of grayscale intensities.

To be more specific, we model log(z) as a hGMRF with mean µ and precision matrix Q_z, where the log operator is applied element-wise, from which the density of z follows as:

    p_z(z) ∝ √|Q_z| ∏_i z_i^{-1} exp( −(1/2) (log z − µ)^T Q_z (log z − µ) ).    (4)

This is a natural extension of the univariate lognormal prior used previously for the scalar multiplier in the local GSM model [12].

The restriction to hGMRFs greatly simplifies computation with FoGSM. In particular, we take advantage of the fact that a 2D hGMRF with circular boundary handling has a sparse block-circulant precision matrix whose nonzero elements are specified by a generating kernel θ. A block-circulant matrix is diagonalized by the Fourier transform, and its multiplication with a vector corresponds to convolution with the kernel θ. This diagonalizability with a fixed and efficiently computed transform makes parameter estimation, sampling, and inference with a hGMRF substantially more tractable than with a general MRF. Readers are referred to [15] for a detailed description of hGMRFs.

Parameter estimation: The estimation of the latent multiplier field z and the model parameters (µ, Q_z, Q_u) may be achieved by maximizing log p(x, z; Q_u, Q_z, µ) with an iterative coordinate-ascent method, which is guaranteed to converge. Specifically, based on the statistical dependency structure of the FoGSM model, the following three steps are repeated until convergence:

    (i)   z^(t+1) = argmax_z [ log p(x | z; Q_u^(t)) + log p(z; Q_z^(t), µ^(t)) ]
    (ii)  Q_u^(t+1) = argmax_{Q_u} log p(x | z^(t+1); Q_u)
    (iii) (Q_z^(t+1), µ^(t+1)) = argmax_{Q_z, µ} log p(z^(t+1); Q_z, µ)                (5)

According to the FoGSM model structure, steps (ii) and (iii) correspond to maximum likelihood estimates of the parameters of the hGMRFs x ⊘ √z^(t+1) and log z^(t+1), respectively (⊘ denotes element-wise division). Because of this, both steps may be efficiently implemented by exploiting the diagonalization of the precision matrices with 2D Fourier transforms [15]. Step (i) in (5) may be implemented with conjugate gradient ascent [16].

To simplify description and computation, we introduce a new variable for the element-wise inverse square root of the multiplier: s = 1 ⊘ √z. The likelihood in (3) then becomes:

    p(x | s) ∝ ∏_i s_i exp( −(1/2) (x ⊗ s)^T Q_u (x ⊗ s) ) = ∏_i s_i exp( −(1/2) s^T D(x) Q_u D(x) s ).    (6)

The joint density of s is obtained from (4), using the relation between densities of transformed variables, as

    p(s) ∝ ∏_i s_i^{-1} exp( −(1/2) (2 log s + µ)^T Q_z (2 log s + µ) ).    (7)

Combining (6) and (7), step (i) in (5) is equivalent to computing ŝ = argmax_s [ log p(x | s; Q_u) + log p(s; Q_z, µ) ], which further simplifies to

    argmin_s [ (1/2) s^T D(x) Q_u D(x) s + (1/2) (2 log s + µ)^T Q_z (2 log s + µ) ],    (8)

and the optimal z is then recovered as ẑ = 1 ⊘ (ŝ ⊗ ŝ).

Fig. 2. Empirical marginal log-distributions (log p(x) versus x) of coefficients from a multi-scale decomposition of the photographic images “Barbara”, “boat”, “house”, and “peppers” (blue dot-dashed line), of synthesized FoGSM samples from the same subband (red solid line), and of a Gaussian with the same standard deviation (red dashed line).
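To make the block-circulant machinery concrete, here is a minimal numpy sketch (not the authors' MATLAB code; kernel values, grid size, and µ are illustrative placeholders). It evaluates the objective in (8) together with the gradient and Hessian-vector product derived in the next paragraph, using only element-wise operations and FFT-based circular convolutions with the generating kernels; the same FFT yields the log-determinants needed for the maximum likelihood updates in steps (ii) and (iii).

```python
import numpy as np

N = 32
rng = np.random.default_rng(1)

def make_kernel(eps):
    """A small symmetric generating kernel (discrete Laplacian + eps), kept positive definite."""
    k = np.zeros((N, N))
    k[0, 0] = 4.0 + eps
    k[0, 1] = k[0, -1] = k[1, 0] = k[-1, 0] = -1.0
    return k

theta_u, theta_z = make_kernel(0.5), make_kernel(1.0)   # placeholder kernels for Q_u, Q_z
Fu, Fz = np.fft.fft2(theta_u), np.fft.fft2(theta_z)     # DFT of kernel = eigenvalues of Q
logdet_Qu = np.sum(np.log(Fu.real))                     # log|Q_u| for the ML steps (ii)-(iii)

def apply_Q(F_kernel, v):
    """Multiply by a block-circulant precision matrix = circular convolution with its kernel."""
    return np.fft.ifft2(F_kernel * np.fft.fft2(v)).real

x = rng.standard_normal((N, N))      # stand-in for a wavelet subband
mu = -1.0 * np.ones((N, N))          # stand-in mean of log z

def objective(s):                    # the quantity minimized in (8)
    xs, r = x * s, 2.0 * np.log(s) + mu
    return 0.5 * np.sum(xs * apply_Q(Fu, xs)) + 0.5 * np.sum(r * apply_Q(Fz, r))

def gradient(s):                     # negative gradient of log p(x|s)p(s)
    r = 2.0 * np.log(s) + mu
    return x * apply_Q(Fu, x * s) + 2.0 * apply_Q(Fz, r) / s

def hess_vec(s, h):                  # Hessian-vector product, again only convolutions
    r = 2.0 * np.log(s) + mu
    return (x * apply_Q(Fu, x * h)
            + 4.0 * apply_Q(Fz, h / s) / s
            - 2.0 * apply_Q(Fz, r) * h / (s * s))

# Finite-difference check of the gradient at a random positive s.
s0 = np.exp(0.1 * rng.standard_normal((N, N)))
d = rng.standard_normal((N, N))
eps = 1e-6
fd = (objective(s0 + eps * d) - objective(s0 - eps * d)) / (2 * eps)
print(fd, np.sum(gradient(s0) * d))  # the two numbers should agree closely
```

These three callables are what a conjugate gradient or Newton-type routine needs to carry out step (i).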
We then optimize (8) with conjugate gradient ascent [16]. Specifically, the negative gradient of the objective function in (8) with respect to s is

    −∂/∂s [ log p(x | s) p(s) ] = D(x) Q_u D(x) s + 2 D(s)^{-1} Q_z (2 log s + µ)
                                = x ⊗ (θ_u ⋆ (x ⊗ s)) + 2 (θ_z ⋆ (2 log s + µ)) ⊘ s,

and the multiplication of any vector h with the Hessian matrix can be computed as

    −∂²/∂s² [ log p(x | s) p(s) ] h = x ⊗ (θ_u ⋆ (x ⊗ h)) + 4 (θ_z ⋆ (h ⊘ s)) ⊘ s − 2 (θ_z ⋆ (2 log s + µ)) ⊗ h ⊘ (s ⊗ s).

Both operations can be expressed entirely in terms of element-wise operations (⊘ and ⊗) and 2D convolutions (⋆) with the generating kernels θ_u and θ_z of the two precision matrices, which allows for efficient implementation.

3 Modeling photographic images

We have applied the FoGSM model to subbands of a multi-scale image representation known as a steerable pyramid [17]. This decomposition is a tight frame, constructed from oriented multi-scale derivative operators, and is overcomplete by a factor of 4K/3, where K is the number of orientation bands. Note that the marginal and joint statistics we describe are not specific to this decomposition, and are similar for other multi-scale oriented representations.

We fit a FoGSM model to each subband of a decomposed photographic image, using the algorithms described in the previous section. For the precision matrices Q_u and Q_z, we assumed a 5 × 5 Markov neighborhood (corresponding to a 5 × 5 convolution kernel), which was loosely chosen to optimize the tradeoff between accuracy and overfitting. Figure 1 shows the result of fitting a FoGSM model to an example subband from the “boat” image (left panel). The subband is decomposed into the product of the u field (middle panel) and the z field (right panel, in the logarithm domain), along with model parameters Q_u, µ, and Q_z (not shown). Visually, the changing spatial variances are represented in the estimated log z field, and the estimated u is much more homogeneous than the original subband and has a marginal distribution close to Gaussian (this “Gaussianizing” behavior was first noted in photographic images by Ruderman [18], who observed that image derivative measurements normalized by a local estimate of their standard deviation had approximately Gaussian marginal distributions). However, the log z field still has a non-Gaussian marginal distribution and is spatially inhomogeneous, suggesting limitations of FoGSM for modeling photographic image wavelet coefficients (see Discussion).

The statistical dependencies captured by the FoGSM model can be further revealed by examining marginal and joint statistics of samples synthesized with the estimated model parameters. A sample from FoGSM can be formed by multiplying samples of u and √z. The former is obtained by sampling from the hGMRF u, and the latter is obtained by element-wise exponentiation, followed by an element-wise square root, of a sample of the hGMRF log z. This procedure is again efficient for FoGSM due to the use of hGMRFs as building blocks [15].

Marginal distributions: We start by comparing the marginal distributions of the samples and the original subband.

Fig. 3. Examples of empirically observed distributions of wavelet coefficient pairs, compared with distributions from synthesized samples from the FoGSM model (columns: spatial separations ∆ = 1 (close), ∆ = 4 (near), ∆ = 32 (far), orthogonal orientation, and adjacent scale; rows alternate between real subbands and simulated samples). See text for details.
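A sketch of the synthesis procedure described above, using the same FFT machinery (again with illustrative kernels, mean, and grid size rather than fitted values): draw u and log z as stationary hGMRFs with circular boundaries, exponentiate, take the element-wise square root, and multiply.

```python
import numpy as np

N = 128
rng = np.random.default_rng(2)

def make_kernel(eps):
    k = np.zeros((N, N))
    k[0, 0] = 4.0 + eps
    k[0, 1] = k[0, -1] = k[1, 0] = k[-1, 0] = -1.0
    return k

def sample_hgmrf(theta, rng):
    """Exact zero-mean draw from N(0, Q^{-1}) when Q is block-circulant with kernel theta."""
    lam = np.fft.fft2(theta).real                   # eigenvalues of Q (real and positive here)
    w = rng.standard_normal(theta.shape)
    return np.fft.ifft2(np.fft.fft2(w) / np.sqrt(lam)).real

u = sample_hgmrf(make_kernel(0.5), rng)             # the Gaussian factor u
log_z = sample_hgmrf(make_kernel(2.0), rng) - 1.0   # hGMRF log z with (illustrative) mean -1
x = u * np.sqrt(np.exp(log_z))                      # FoGSM sample: x = u (element-wise) sqrt(z)

# Log-domain marginal histogram of the synthesized field, of the kind compared in Fig. 2.
hist, edges = np.histogram(x, bins=101, density=True)
log_hist = np.log(np.maximum(hist, 1e-12))
```

With fitted kernels and mean in place of the hand-picked values, this mirrors the sampling route used for the comparisons that follow.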
Figure 2 shows empirical histograms, in the log domain, of a particular subband from four different photographic images (blue dot-dashed line), together with those of samples synthesized from the FoGSM models learned from each corresponding subband (red solid line). For comparison, a Gaussian with the same standard deviation as the image subband is also displayed (red dashed line). Note that the synthesized samples have conspicuous non-Gaussian characteristics similar to the real subbands, exemplified by the high peak and heavy tails of the marginal distributions. On the other hand, they are typically less kurtotic than the real subbands. We believe this arises from the imprecise Gaussian approximation of log z (see Discussion).

Joint distributions: In addition to one-dimensional marginal statistics, the FoGSM model is capable of capturing the joint behavior of wavelet coefficients. As described in [4, 9], wavelet coefficients of photographic images exhibit non-Gaussian dependencies. Shown in the first and third rows of Fig. 3 are empirical joint and conditional histograms for one subband of the “boat” image, for five pairs of coefficients, corresponding to basis functions with spatial separations of ∆ = {1, 4, 32} samples, two orthogonal orientations, and two adjacent scales. Contour lines in the joint histograms are drawn at equal intervals of log probability. Intensities in the conditional histograms correspond to probability, except that each column is independently rescaled to fill the full range of intensity. For a pair of adjacent coefficients, we observe an elliptical joint distribution and a “bow-tie” shaped conditional distribution. The latter is indicative of strong non-Gaussian dependencies. For coefficients that are distant, the dependency becomes weaker and the corresponding joint and conditional histograms become more separable, as would be expected for two independent random variables.

Random samples drawn from a FoGSM model, with parameters fitted to the corresponding subband, have statistical characteristics consistent with this general description of wavelet coefficients of photographic images. Shown in the second and fourth rows of Fig. 3 are the joint and conditional histograms of synthesized samples from the FoGSM model estimated from the same subband as in the first and third rows. Note that the joint and conditional histograms of the synthesized samples show a similar transition of spatial dependencies as the separation increases (columns 1, 2, and 3), suggesting that the FoGSM accounts well for pairwise joint dependencies of coefficients over a full range of spatial separations. On the other hand, the dependencies between subbands of different orientations and scales are not properly modeled by FoGSM (columns 4 and 5). This is especially true for subbands at different scales, which exhibit strong dependencies. The current FoGSM model does not exhibit those dependencies, since only spatial neighbors are used by the 2D hGMRFs (see Discussion).
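For reference, the pairwise statistics of Fig. 3 can be tabulated with a few lines of numpy. In this sketch, `band` stands for any 2D array of subband coefficients (a placeholder array is used here), and only the horizontal-separation case is shown:

```python
import numpy as np

def pair_histograms(band, delta, nbins=41):
    """Joint histogram of coefficient pairs separated by `delta` samples, plus the
    conditional histogram obtained by rescaling along one axis (approximately p(x2 | x1))."""
    a = band[:, :-delta].ravel()                    # coefficient x1
    b = band[:, delta:].ravel()                     # coefficient x2, offset horizontally
    lim = 4.0 * band.std()
    edges = np.linspace(-lim, lim, nbins + 1)
    joint, _, _ = np.histogram2d(a, b, bins=[edges, edges], density=True)
    cond = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
    return joint, cond

rng = np.random.default_rng(3)
band = rng.standard_normal((128, 128))              # placeholder; use a real or synthesized subband
for delta in (1, 4, 32):
    joint, cond = pair_histograms(band, delta)      # elliptical vs. increasingly separable structure
```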
Fig. 4. Denoising results using the local GSM [12] and FoGSM: original image; noisy image (σ = 50, PSNR = 14.15 dB); GSM-BLS (PSNR = 26.34 dB); FoGSM (PSNR = 27.01 dB). Performance is evaluated in peak signal-to-noise ratio (PSNR), 20 log10(255/σ_e), where σ_e is the standard deviation of the error.

4 Application to image denoising

Let y = x + w be a wavelet subband of an image that has been corrupted with white Gaussian noise of known variance. In an overcomplete wavelet domain such as the steerable pyramid, the white Gaussian noise is transformed into correlated Gaussian noise w ∼ N_w(0, Σ_w), whose covariance Σ_w can be derived from the basis functions of the pyramid transform. With FoGSM as a prior over x, commonly used denoising methods involve expensive high-dimensional integration: for instance, the maximum a posteriori estimate, x̂_MAP = argmax_x log p(x | y), requires a high-dimensional integral over z, and the Bayesian least squares estimate, x̂_BLS = E(x | y), requires a double high-dimensional integral over x and z. Although it is possible to optimize these criteria using Markov chain Monte Carlo sampling or other approximations, we instead develop a more efficient deterministic algorithm that takes advantage of the hGMRF structure in the FoGSM model. Specifically, we compute

    (x̂, ẑ, Q̂_u, Q̂_z, µ̂) = argmax_{x, z, Q_u, Q_z, µ} log p(x, z | y; Q_u, Q_z, µ)    (9)

and take x̂ as the optimal denoised subband. Note that the model parameters are learned within the inference process rather than in a separate parameter learning step. This strategy, known as partial optimal solution [19], greatly reduces the computational complexity.

We optimize (9) with coordinate ascent, iterating between maximizing each of (x, z, Q_u, Q_z, µ) while fixing the others. With fixed estimates of (z, Q_u, Q_z, µ), the optimization of x is

    argmax_x log p(x, z | y; Q_u, Q_z, µ) = argmax_x [ log p(x | z, y; Q_u, Q_z, µ) + log p(z | y; Q_u, Q_z, µ) ],

which reduces to argmax_x log p(x | z, y; Q_u), since the second term is independent of x and can be dropped from the optimization. Given the Gaussian structure of x given z, this step is equivalent to a Wiener filter (linear in y). Fixing (x, Q_u, Q_z, µ), the optimization of z is

    argmax_z log p(x, z | y; Q_u, Q_z, µ) = argmax_z [ log p(y | x, z; Q_u) + log p(x, z; Q_u, Q_z, µ) − log p(y; Q_u, Q_z, µ) ],

which further reduces to argmax_z log p(x, z; Q_u, Q_z, µ). Here, the first term was dropped since y is independent of z when conditioned on x, and the last term was dropped since it is also independent of z. Therefore, optimizing z given (x, Q_u, Q_z, µ) is equivalent to the first step of the algorithm in section 2, which can be implemented with efficient gradient descent. Finally, given (x, z), the FoGSM model parameters (Q_u, Q_z, µ) are estimated from the hGMRFs x ⊘ √z and log z, as in the second and third steps of the algorithm in section 2. However, to reduce the overall computation time, instead of a complete maximum likelihood estimation, these parameters are estimated with a maximum pseudo-likelihood procedure [20], which finds the parameters maximizing the product of all conditional distributions (which are 1D Gaussians in the GMRF case), followed by a projection onto the subspace of FoGSM parameters that results in positive definite precision matrices.

We tested this denoising method on a standard set of test images [12]. The noise-corrupted images were first decomposed into a steerable pyramid with multiple levels (5 levels for a 512 × 512 image and 4 levels for a 256 × 256 image) and 8 orientations. We assumed a FoGSM model for each subband, with a 5 × 5 neighborhood for the field u and a 1 × 1 neighborhood for the field log z. These sizes were chosen to provide a reasonable combination of performance and computational efficiency. We then estimate the optimal x with the algorithm described previously, with the initial values of x and z computed from the subbands denoised with the local GSM model [12].
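The x-step above is a large sparse linear solve. A minimal sketch (assuming, for simplicity, white subband noise of variance sigma² in place of the full Σ_w, and using placeholder values for θ_u, z, and y) implements it with conjugate gradients, together with the PSNR measure used for evaluation:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

N = 64
rng = np.random.default_rng(4)
sigma = 0.5

theta_u = np.zeros((N, N))                      # placeholder generating kernel for Q_u
theta_u[0, 0] = 4.5
theta_u[0, 1] = theta_u[0, -1] = theta_u[1, 0] = theta_u[-1, 0] = -1.0
Fu = np.fft.fft2(theta_u)

z = np.exp(rng.normal(-1.0, 0.5, (N, N)))       # stand-in multiplier field (held fixed)
y = rng.standard_normal((N, N))                 # stand-in noisy subband

def apply_A(v_flat):
    """(D(1/sqrt(z)) Q_u D(1/sqrt(z)) + I/sigma^2) v, with Q_u applied as a circular convolution."""
    v = v_flat.reshape(N, N)
    vs = v / np.sqrt(z)
    Qx = np.fft.ifft2(Fu * np.fft.fft2(vs)).real / np.sqrt(z)
    return (Qx + v / sigma**2).ravel()

A = LinearOperator((N * N, N * N), matvec=apply_A, dtype=float)
x_flat, info = cg(A, (y / sigma**2).ravel(), maxiter=200)
x_hat = x_flat.reshape(N, N)                    # Wiener-type estimate of the clean subband given z

def psnr(estimate, truth):
    """PSNR = 20 log10(255 / std of the error), the evaluation measure used here."""
    return 20.0 * np.log10(255.0 / np.std(estimate - truth))
```

With the correlated Σ_w of the overcomplete pyramid, the identity term would be replaced by Σ_w^{-1}, applied in the same matrix-free way.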
Shown in Fig. 4 is an example of denoising the “boat” image corrupted with simulated additive white Gaussian noise of strength σ = 50, corresponding to a peak signal-to-noise ratio (PSNR) of 14.15 dB. We compare this with the local GSM method of [12], which, assuming a local GSM model for the neighborhood consisting of 3 × 3 spatial neighbors plus the parent in the next coarsest scale, computes a Bayes least squares estimate of each coefficient conditioned on its surrounding neighborhood. The FoGSM denoising achieves a substantial improvement (+0.68 dB in PSNR) and is seen to exhibit better contrast and continuation of oriented features (see Fig. 4). On the other hand, FoGSM introduces some noticeable artifacts in low-contrast areas, caused by numerical instability at locations with small z. We find that the improvement in terms of PSNR is consistent across photographic images and noise levels, as reported in Table 1.

But even with a restricted neighborhood for the multiplier field, this PSNR improvement does come at a substantial computational cost. As a rough indication, running unoptimized MATLAB (version R14) code on a PowerPC G5 workstation with a 2.3 GHz processor and 16 GB of RAM, denoising a 512 × 512 image takes on average 4.5 hours (averaged over 5 images), and denoising a 256 × 256 image takes on average 2.4 hours (averaged over 2 images), to a convergence precision producing the reported results. Our preliminary investigation indicates that the slow running time is mainly due to the nature of coordinate ascent and the landscape of (9), which requires many iterations to converge.

5 Discussion

We have introduced fields of Gaussian scale mixtures as a flexible and efficient tool for modeling the statistics of wavelet coefficients of photographic images. We developed a feasible (although admittedly computationally costly) parameter estimation method, and showed that samples synthesized from the fitted FoGSM model are able to capture structures in the marginal and joint wavelet statistics of photographic images. Preliminary results of applying FoGSM to image denoising indicate substantial improvements over the state-of-the-art methods based on the local GSM model.

Although FoGSM has a structure that is similar to the local scale mixture models [9, 10], there is a fundamental difference between them. In FoGSM, hGMRF structures are enforced on u and log z, while the local scale mixture models impose minimal statistical structure on these variables. Because of this, our model easily extends to images of arbitrary size, while the local scale mixture models are essentially confined to describing small image patches (the curse of dimensionality and the increase in computational cost prevent one from scaling the patch size up). On the other hand, the close relation to Gaussian MRFs makes the analysis and computation of FoGSM significantly easier than for other non-Gaussian MRF based image models [6, 7, 5].

We envision, and are currently working on, a number of model improvements. First, the model should benefit from the introduction of more general Markov neighborhoods, including wavelet coefficients from subbands at other scales and orientations [4, 12], since the current model is clearly not accounting for these dependencies (see Fig. 3).
Secondly, the log transformation used to derive the multiplier field from a hGMRF is somewhat ad hoc, and we believe that substituting another nonlinear transformation (e.g., a power law [14]) might lead to a more accurate description of the image statistics. Thirdly, the current denoising method estimates the model parameters during the denoising process itself, which produces image-adaptive model parameters. We are exploring the possibility of using a set of generic model parameters learned a priori on a large set of photographic images, so that a generic FoGSM-based statistical model for all photographic images can be built. Finally, there exist residual inhomogeneous structures in the log z field (see Fig. 1) that can likely be captured by explicitly incorporating local orientation [21] or phase into the model. Finding tractable models and algorithms for handling such circular variables is challenging, but we believe their inclusion will result in substantial improvements in modeling and in denoising performance.

Table 1. Denoising results with FoGSM on different images and different noise levels. Shown in the table are PSNRs (20 log10(255/σ_e), where σ_e is the standard deviation of the error) of the denoised images; the values in parentheses are the PSNRs of the same images denoised with a local GSM model [12].

    Image         σ/PSNR: 10/28.13   25/20.17        50/14.15        100/8.13
    Barbara       35.01 (34.01)      30.10 (29.07)   26.40 (25.45)   23.01 (22.61)
    barco         35.05 (34.42)      30.44 (29.73)   27.36 (26.63)   24.44 (23.84)
    boat          34.12 (33.58)      30.03 (29.34)   27.01 (26.35)   24.20 (23.79)
    fingerprint   33.28 (32.45)      28.45 (27.44)   25.11 (24.13)   21.78 (21.21)
    Flintstones   32.47 (31.78)      28.29 (27.48)   24.82 (24.02)   21.24 (20.49)
    house         35.63 (35.27)      31.64 (31.32)   28.51 (28.23)   25.33 (25.31)
    Lena          35.94 (35.60)      32.11 (31.70)   29.12 (28.62)   26.12 (25.77)
    peppers       34.38 (33.73)      29.78 (29.18)   26.43 (25.93)   23.17 (22.80)

References

[1] P. J. Burt. Fast filter transforms for image processing. Comp. Graph. Image Proc., 16:20–51, 1981.
[2] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am., 4(12):2379–2394, 1987.
[3] B. Wegmann and C. Zetzsche. Statistical dependencies between orientation filter outputs used in human vision based image code. In Proc. Visual Comm. and Image Proc., volume 1360, pages 909–922, 1990.
[4] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. on Image Proc., 8(12):1688–1701, 1999.
[5] Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton. Energy-based models for sparse overcomplete representations. J. of Machine Learning Res., 4:1235–1260, 2003.
[6] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. Int'l. J. Comp. Vis., 27(2):107–126, 1998.
[7] S. Roth and M. J. Black. Fields of experts: a framework for learning image priors. In IEEE Conf. on Comp. Vis. and Pat. Rec., volume 2, pages 860–867, 2005.
[8] P. Gehler and M. Welling. Products of “edge-perts”. In Adv. in Neural Info. Proc. Systems (NIPS*05). MIT Press, 2006.
[9] M. J. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In Adv. Neural Info. Proc. Sys. (NIPS*99), volume 12, pages 855–861, May 2000.
[10] Y. Karklin and M. S. Lewicki. A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals. Neural Computation, 17(2):397–423, 2005.
[11] A. Hyvärinen, P. O. Hoyer, and M. Inki. Topographic ICA as a model of natural image statistics. In the First IEEE Int'l. Workshop on Bio. Motivated Comp. Vis., London, UK, 2000.
[12] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. on Image Proc., 12(11):1338–1351, 2003.
[13] J. Romberg, H. Choi, and R. G. Baraniuk. Bayesian tree-structured image modeling using wavelet domain hidden Markov models. IEEE Trans. on Image Proc., 10(7):303–347, 2001.
[14] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Appl. and Comp. Harm. Ana., 11(1):89–123, 2001.
[15] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, 2005.
[16] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes. Cambridge, 2nd edition, 2002.
[17] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In IEEE Int'l. Conf. on Image Proc., volume 3, pages 444–447, 1995.
[18] D. Ruderman. The statistics of natural images. Network: Comp. in Neural Sys., 5:598–605, 1994.
[19] M. Figueiredo and J. Leitão. Unsupervised image restoration and edge location using compound Gauss-Markov random fields and the MDL principle. IEEE Trans. on Image Proc., 6(8):1089–1122, 1997.
[20] J. Besag. On the statistical analysis of dirty pictures. J. of the Royal Stat. Soc., Series B, 48:259–302, 1986.
[21] D. K. Hammond and E. P. Simoncelli. Image denoising with an orientation-adaptive Gaussian scale mixture model. In Proc. 13th IEEE Int'l. Conf. on Image Proc., pages 1433–1436, October 2006.
2 0.10571314 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
3 0.098947078 128 nips-2006-Manifold Denoising
Author: Matthias Hein, Markus Maier
Abstract: We consider the problem of denoising a noisily sampled submanifold M in Rd , where the submanifold M is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with non-trivial high-dimensional noise. Moreover using the denoising algorithm as pre-processing method we can improve the results of a semi-supervised learning algorithm. 1
4 0.080859371 126 nips-2006-Logistic Regression for Single Trial EEG Classification
Author: Ryota Tomioka, Kazuyuki Aihara, Klaus-Robert Müller
Abstract: We propose a novel framework for the classification of single trial ElectroEncephaloGraphy (EEG), based on regularized logistic regression. Framed in this robust statistical framework no prior feature extraction or outlier removal is required. We present two variations of parameterizing the regression function: (a) with a full rank symmetric matrix coefficient and (b) as a difference of two rank=1 matrices. In the first case, the problem is convex and the logistic regression is optimal under a generative model. The latter case is shown to be related to the Common Spatial Pattern (CSP) algorithm, which is a popular technique in Brain Computer Interfacing. The regression coefficients can also be topographically mapped onto the scalp similarly to CSP projections, which allows neuro-physiological interpretation. Simulations on 162 BCI datasets demonstrate that classification accuracy and robustness compares favorably against conventional CSP based classifiers. 1
5 0.054556742 42 nips-2006-Bayesian Image Super-resolution, Continued
Author: Lyndsey C. Pickup, David P. Capel, Stephen J. Roberts, Andrew Zisserman
Abstract: This paper develops a multi-frame image super-resolution approach from a Bayesian view-point by marginalizing over the unknown registration parameters relating the set of input low-resolution views. In Tipping and Bishop’s Bayesian image super-resolution approach [16], the marginalization was over the superresolution image, necessitating the use of an unfavorable image prior. By integrating over the registration parameters rather than the high-resolution image, our method allows for more realistic prior distributions, and also reduces the dimension of the integral considerably, removing the main computational bottleneck of the other algorithm. In addition to the motion model used by Tipping and Bishop, illumination components are introduced into the generative model, allowing us to handle changes in lighting as well as motion. We show results on real and synthetic datasets to illustrate the efficacy of this approach.
6 0.050897606 167 nips-2006-Recursive ICA
7 0.050098851 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
8 0.050090641 57 nips-2006-Conditional mean field
9 0.049801156 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
10 0.045189906 130 nips-2006-Max-margin classification of incomplete data
11 0.043409754 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
12 0.043387879 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
13 0.043174066 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
14 0.042591609 179 nips-2006-Sparse Representation for Signal Classification
15 0.040116869 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
16 0.039888032 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
17 0.039686494 34 nips-2006-Approximate Correspondences in High Dimensions
18 0.039657678 45 nips-2006-Blind Motion Deblurring Using Image Statistics
19 0.039142758 66 nips-2006-Detecting Humans via Their Pose
20 0.037023507 75 nips-2006-Efficient sparse coding algorithms
topicId topicWeight
[(0, -0.132), (1, 0.012), (2, 0.08), (3, -0.021), (4, -0.021), (5, -0.046), (6, -0.008), (7, -0.02), (8, 0.002), (9, 0.022), (10, 0.055), (11, -0.025), (12, 0.039), (13, 0.037), (14, 0.029), (15, 0.089), (16, -0.003), (17, -0.008), (18, 0.061), (19, 0.064), (20, -0.036), (21, -0.017), (22, 0.066), (23, -0.076), (24, 0.099), (25, 0.052), (26, 0.083), (27, 0.016), (28, -0.083), (29, 0.057), (30, 0.004), (31, -0.019), (32, -0.067), (33, -0.054), (34, 0.022), (35, -0.005), (36, 0.021), (37, 0.035), (38, 0.053), (39, -0.077), (40, 0.007), (41, 0.022), (42, -0.054), (43, 0.012), (44, 0.187), (45, 0.126), (46, 0.102), (47, 0.148), (48, -0.034), (49, 0.124)]
simIndex simValue paperId paperTitle
same-paper 1 0.88492185 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
Author: Siwei Lyu, Eero P. Simoncelli
2 0.55317873 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); or as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature of Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
3 0.48251519 128 nips-2006-Manifold Denoising
Author: Matthias Hein, Markus Maier
Abstract: We consider the problem of denoising a noisily sampled submanifold M in R^d, where the submanifold M is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with non-trivial high-dimensional noise. Moreover, using the denoising algorithm as a pre-processing method, we can improve the results of a semi-supervised learning algorithm. 1
4 0.42532212 121 nips-2006-Learning to be Bayesian without Supervision
Author: Martin Raphan, Eero P. Simoncelli
Abstract: unkown-abstract
5 0.42051229 45 nips-2006-Blind Motion Deblurring Using Image Statistics
Author: Anat Levin
Abstract: We address the problem of blind motion deblurring from a single image, caused by a few moving objects. In such situations only part of the image may be blurred, and the scene consists of layers blurred to different degrees. Most existing blind deconvolution research concentrates on recovering a single blurring kernel for the entire image. However, in the case of different motions, the blur cannot be modeled with a single kernel, and trying to deconvolve the entire image with the same kernel will cause serious artifacts. Thus, the task of deblurring needs to involve segmentation of the image into regions with different blurs. Our approach relies on the observation that the statistics of derivative filters in images are significantly changed by blur. Assuming the blur results from a constant velocity motion, we can limit the search to one-dimensional box filter blurs. This enables us to model the expected derivative distributions as a function of the width of the blur kernel. Those distributions are surprisingly powerful in discriminating regions with different blurs. The approach produces convincing deconvolution results on real world images with rich texture.
6 0.38273227 43 nips-2006-Bayesian Model Scoring in Markov Random Fields
7 0.37919065 179 nips-2006-Sparse Representation for Signal Classification
8 0.36656043 42 nips-2006-Bayesian Image Super-resolution, Continued
9 0.36164993 201 nips-2006-Using Combinatorial Optimization within Max-Product Belief Propagation
10 0.347628 120 nips-2006-Learning to Traverse Image Manifolds
11 0.34027562 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
12 0.32982045 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
13 0.32871988 91 nips-2006-Hierarchical Dirichlet Processes with Random Effects
14 0.32528585 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
15 0.32418454 190 nips-2006-The Neurodynamics of Belief Propagation on Binary Markov Random Fields
16 0.31177929 16 nips-2006-A Theory of Retinal Population Coding
17 0.31089136 1 nips-2006-A Bayesian Approach to Diffusion Models of Decision-Making and Response Time
18 0.30535364 169 nips-2006-Relational Learning with Gaussian Processes
19 0.30356506 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds
20 0.30293772 153 nips-2006-Online Clustering of Moving Hyperplanes
topicId topicWeight
[(1, 0.089), (3, 0.018), (7, 0.069), (9, 0.043), (20, 0.019), (22, 0.077), (44, 0.068), (46, 0.354), (57, 0.074), (65, 0.025), (69, 0.024), (71, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.75319541 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
Author: Siwei Lyu, Eero P. Simoncelli
To complete the FoGSM model, we need to specify the structure of the multiplier field z. For tractability, we use another hGMRF as a substrate, and map it into positive values by exponentiation, as was done in [10].

Fig. 1. Decomposition of a subband from image "boat" (left, x) into the normalized subband u (middle) and the multiplier field z (right, in the logarithm domain). Each image is rescaled individually to fill the full range of grayscale intensities.

To be more specific, we model log(z) as a hGMRF with mean µ and precision matrix Q_z, where the log operator is applied element-wise, from which the density of z follows as:

p_z(z) ∝ √|Q_z| · ∏_i (1/z_i) · exp( −(1/2) (log z − µ)^T Q_z (log z − µ) ).   (4)

This is a natural extension of the univariate lognormal prior used previously for the scalar multiplier in the local GSM model [12].

The restriction to hGMRFs greatly simplifies computation with FoGSM. In particular, we take advantage of the fact that a 2D hGMRF with circular boundary handling has a sparse block-circulant precision matrix whose nonzero elements are specified by a generating kernel θ. A block-circulant matrix is diagonalized by the Fourier transform, and its multiplication with a vector corresponds to convolution with the kernel θ. This diagonalizability with a fixed and efficiently computed transform makes parameter estimation, sampling, and inference with a hGMRF substantially more tractable than with a general MRF. Readers are referred to [15] for a detailed description of hGMRFs.

Parameter estimation: The estimation of the latent multiplier field z and the model parameters (µ, Q_z, Q_u) may be achieved by maximizing log p(x, z; Q_u, Q_z, µ) with an iterative coordinate-ascent method, which is guaranteed to converge. Specifically, based on the statistical dependency structure of the FoGSM model, the following three steps are repeated until convergence:

(i)   z^(t+1) = argmax_z [ log p(x | z; Q_u^(t)) + log p(z; Q_z^(t), µ^(t)) ]
(ii)  Q_u^(t+1) = argmax_{Q_u} log p(x | z^(t+1); Q_u)
(iii) (Q_z^(t+1), µ^(t+1)) = argmax_{Q_z, µ} log p(z^(t+1); Q_z, µ)   (5)

According to the FoGSM model structure, steps (ii) and (iii) correspond to maximum likelihood estimates of the parameters of the hGMRFs x ⊘ √z^(t+1) (where ⊘ denotes element-wise division) and log z^(t+1), respectively. Because of this, both steps may be efficiently implemented by exploiting the diagonalization of the precision matrices with 2D Fourier transforms [15]. Step (i) in (5) may be implemented with conjugate gradient ascent [16].

To simplify description and computation, we introduce a new variable for the element-wise inverse square root of the multiplier: s = 1 ⊘ √z. The likelihood in (3) then becomes:

p(x | s) ∝ ( ∏_i s_i ) exp( −(1/2) (x ⊗ s)^T Q_u (x ⊗ s) ) = ( ∏_i s_i ) exp( −(1/2) s^T D(x) Q_u D(x) s ).   (6)

The joint density of s is obtained from (4), using the relation between densities of transformed variables, as

p(s) ∝ ( ∏_i 1/s_i ) exp( −(1/2) (2 log s + µ)^T Q_z (2 log s + µ) ).   (7)

Combining (6) and (7), step (i) in (5) is equivalent to computing ŝ = argmax_s [ log p(x | s; Q_u) + log p(s; Q_z, µ) ], which further simplifies to

argmin_s  (1/2) s^T D(x) Q_u D(x) s + (1/2) (2 log s + µ)^T Q_z (2 log s + µ),   (8)

and the optimal ẑ is then recovered as ẑ = 1 ⊘ (ŝ ⊗ ŝ).

Fig. 2. Empirical marginal log distributions of coefficients from a multi-scale decomposition of photographic images "Barbara", "boat", "house", and "peppers" (blue dot-dashed line), synthesized FoGSM samples from the same subband (red solid line), and a Gaussian with the same standard deviation (red dashed line).
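To make the block-circulant machinery above concrete, the following Python/NumPy sketch applies a precision matrix generated by a small kernel via FFT-based circular convolution, computes its log-determinant from the DFT eigenvalues (as needed for the maximum-likelihood steps (ii) and (iii)), and evaluates the objective in (8). This is an illustrative sketch rather than the authors' implementation; the function and variable names (embed_kernel, apply_q, theta_u, theta_z, mu) are ours, and the code assumes the generating kernels yield strictly positive DFT eigenvalues.

```python
import numpy as np

def embed_kernel(theta, shape):
    """Zero-pad the small generating kernel theta to the field size and shift its
    center to the origin, so its 2-D DFT gives the eigenvalues of the
    block-circulant matrix it generates."""
    H, W = shape
    kh, kw = theta.shape
    pad = np.zeros((H, W))
    pad[:kh, :kw] = theta
    return np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def apply_q(theta, v):
    """Multiply the block-circulant (precision) matrix generated by theta with a
    2-D field v: with circular boundaries this is a circular convolution, done via FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(embed_kernel(theta, v.shape)) * np.fft.fft2(v)))

def logdet_q(theta, shape):
    """log-determinant of the block-circulant matrix generated by theta, computed
    from its DFT eigenvalues (assumed strictly positive)."""
    return np.sum(np.log(np.real(np.fft.fft2(embed_kernel(theta, shape)))))

def objective_eq8(s, x, theta_u, theta_z, mu):
    """Objective of Eq. (8): 1/2 s'D(x)Qu D(x)s + 1/2 (2 log s + mu)'Qz(2 log s + mu)."""
    xs = x * s
    r = 2.0 * np.log(s) + mu
    return 0.5 * np.sum(xs * apply_q(theta_u, xs)) + 0.5 * np.sum(r * apply_q(theta_z, r))
```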
We then optimize (8) with conjugate gradient ascent [16]. Specifically, the negative gradient of the objective function in (8) with respect to s is

−∂/∂s log[ p(x | s) p(s) ] = D(x) Q_u D(x) s + 2 D(s)^{−1} Q_z (2 log s + µ) = x ⊗ (θ_u ∗ (x ⊗ s)) + 2 (θ_z ∗ (2 log s + µ)) ⊘ s,

and the multiplication of any vector h with the Hessian matrix can be computed as

−∂²/∂s² log[ p(x | s) p(s) ] · h = x ⊗ (θ_u ∗ (x ⊗ h)) + 4 (θ_z ∗ (h ⊘ s)) ⊘ s − 2 (θ_z ∗ (2 log s + µ)) ⊗ h ⊘ (s ⊗ s).

Both operations can be expressed entirely in terms of element-wise operations (⊗ and ⊘) and 2D convolutions (∗) with the generating kernels θ_u and θ_z of the two precision matrices, which allows for efficient implementation.
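A minimal sketch of these two operations, together with a finite-difference check of the gradient, is given below (Python/NumPy). The helper conv_circ re-implements the FFT-based convolution from the earlier sketch, the 3 × 3 kernels are illustrative rather than fitted, and keeping s positive during an actual line search is left to the optimizer; the functions could, for example, be passed as jac and hessp to a Newton-CG routine operating on the flattened field.

```python
import numpy as np

def conv_circ(theta, v):
    """Circular 2-D convolution of the field v with a small kernel theta via FFT;
    this equals multiplication by the block-circulant matrix generated by theta."""
    H, W = v.shape
    kh, kw = theta.shape
    pad = np.zeros((H, W))
    pad[:kh, :kw] = theta
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(pad) * np.fft.fft2(v)))

def objective_eq8(s, x, theta_u, theta_z, mu):
    xs, r = x * s, 2.0 * np.log(s) + mu
    return 0.5 * np.sum(xs * conv_circ(theta_u, xs)) + 0.5 * np.sum(r * conv_circ(theta_z, r))

def grad_eq8(s, x, theta_u, theta_z, mu):
    """Gradient of (8): x (*) Qu(x (*) s) + 2 Qz(2 log s + mu) / s, all element-wise."""
    r = 2.0 * np.log(s) + mu
    return x * conv_circ(theta_u, x * s) + 2.0 * conv_circ(theta_z, r) / s

def hessp_eq8(s, h, x, theta_u, theta_z, mu):
    """Hessian of (8) applied to an arbitrary field h, using only element-wise
    operations and convolutions with theta_u and theta_z."""
    r = 2.0 * np.log(s) + mu
    return (x * conv_circ(theta_u, x * h)
            + 4.0 * conv_circ(theta_z, h / s) / s
            - 2.0 * conv_circ(theta_z, r) * h / (s * s))

# Finite-difference sanity check of the gradient on random data.
rng = np.random.default_rng(0)
theta_u = np.array([[0., -.2, 0.], [-.2, 1., -.2], [0., -.2, 0.]])
theta_z = np.array([[0., -.1, 0.], [-.1, 1., -.1], [0., -.1, 0.]])
x = rng.standard_normal((32, 32))
s = np.exp(0.1 * rng.standard_normal((32, 32)))   # s must stay positive
d = rng.standard_normal((32, 32))
mu, eps = -1.0, 1e-5
num = (objective_eq8(s + eps * d, x, theta_u, theta_z, mu)
       - objective_eq8(s - eps * d, x, theta_u, theta_z, mu)) / (2 * eps)
print(num, np.sum(grad_eq8(s, x, theta_u, theta_z, mu) * d))   # should agree closely
```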
3 Modeling photographic images

We have applied the FoGSM model to subbands of a multi-scale image representation known as a steerable pyramid [17]. This decomposition is a tight frame, constructed from oriented multi-scale derivative operators, and is overcomplete by a factor of 4K/3, where K is the number of orientation bands. Note that the marginal and joint statistics we describe are not specific to this decomposition, and are similar for other multi-scale oriented representations.

We fit a FoGSM model to each subband of a decomposed photographic image, using the algorithms described in the previous section. For the precision matrices Q_u and Q_z, we assumed a 5 × 5 Markov neighborhood (corresponding to a 5 × 5 convolution kernel), which was loosely chosen to optimize the tradeoff between accuracy and overfitting. Figure 1 shows the result of fitting a FoGSM model to an example subband from the "boat" image (left panel). The subband is decomposed into the product of the u field (middle panel) and the z field (right panel, in the logarithm domain), along with the model parameters Q_u, µ and Q_z (not shown). Visually, the changing spatial variances are represented in the estimated log z field, and the estimated u is much more homogeneous than the original subband and has a marginal distribution close to Gaussian.¹ However, the log z field still has a non-Gaussian marginal distribution and is spatially inhomogeneous, suggesting limitations of FoGSM for modeling photographic image wavelet coefficients (see Discussion).

¹ This "Gaussianizing" behavior was first noted in photographic images by Ruderman [18], who observed that image derivative measurements that were normalized by a local estimate of their standard deviation had approximately Gaussian marginal distributions.

The statistical dependencies captured by the FoGSM model can be further revealed by examining marginal and joint statistics of samples synthesized with the estimated model parameters. A sample from FoGSM can be formed by multiplying samples of u and √z. The former is obtained by sampling from the hGMRF u, and the latter by element-wise exponentiation followed by an element-wise square root of a sample of the hGMRF log z. This procedure is again efficient for FoGSM due to the use of hGMRFs as building blocks [15].

Marginal distributions: We start by comparing the marginal distributions of the samples and the original subband. Figure 2 shows empirical histograms in the log domain of a particular subband from four different photographic images (blue dot-dashed line), and those of the synthesized samples of FoGSM models learned from each corresponding subband (red solid line). For comparison, a Gaussian with the same standard deviation as the image subband is also displayed (red dashed line). Note that the synthesized samples have conspicuous non-Gaussian characteristics similar to the real subbands, exemplified by the high peak and heavy tails in the marginal distributions. On the other hand, they are typically less kurtotic than the real subbands. We believe this arises from the imprecise Gaussian approximation of log z (see Discussion).

Joint distributions: In addition to one-dimensional marginal statistics, the FoGSM model is capable of capturing the joint behavior of wavelet coefficients. As described in [4, 9], wavelet coefficients of photographic images exhibit non-Gaussian dependencies. Shown in the first and third rows of Fig. 3 are empirical joint and conditional histograms for one subband of the "boat" image, for five pairs of coefficients, corresponding to basis functions with spatial separations of ∆ = {1, 4, 32} samples, two orthogonal orientations, and two adjacent scales. Contour lines in the joint histograms are drawn at equal intervals of log probability. Intensities in the conditional histograms correspond to probability, except that each column is independently rescaled to fill the full range of intensity. For a pair of adjacent coefficients, we observe an elliptical joint distribution and a "bow-tie" shaped conditional distribution. The latter is indicative of strong non-Gaussian dependencies. For coefficients that are distant, the dependency becomes weaker and the corresponding joint and conditional histograms become more separable, as would be expected for two independent random variables.

Fig. 3. Examples of empirically observed distributions of wavelet coefficient pairs (columns: close, ∆ = 1; near, ∆ = 4; far, ∆ = 32; orientation; scale), compared with distributions from samples synthesized with the FoGSM model (rows alternate real and simulated). See text for details.

Random samples drawn from a FoGSM model, with parameters fitted to the corresponding subband, have statistical characteristics consistent with the general description of wavelet coefficients of photographic images. Shown in the second and fourth rows of Fig. 3 are the joint and conditional histograms of synthesized samples from the FoGSM model estimated from the same subband as in the first and third rows. Note that the joint and conditional histograms of the synthesized samples show a similar transition of spatial dependencies as the separation increases (columns 1, 2 and 3), suggesting that the FoGSM accounts well for pairwise joint dependencies of coefficients over a full range of spatial separations. On the other hand, the dependencies between subbands of different orientations and scales are not properly modeled by FoGSM (columns 4 and 5). This is especially true for subbands at different scales, which exhibit strong dependencies. The current FoGSM model does not exhibit those dependencies, since only spatial neighbors are used to define the 2D hGMRFs (see Discussion).
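As a sketch of the synthesis procedure described above (sample the hGMRF u, sample the hGMRF log z, exponentiate and take the square root element-wise, and multiply), the following NumPy code draws a FoGSM sample by coloring white noise in the Fourier domain and then checks that its marginal is heavier-tailed than a Gaussian, as in Fig. 2. The kernels and the mean of log z are illustrative placeholders rather than fitted parameters, and the sampler assumes the precision kernels have strictly positive DFT eigenvalues.

```python
import numpy as np

def embed_kernel(theta, shape):   # same helper as in the earlier sketches
    H, W = shape
    kh, kw = theta.shape
    pad = np.zeros((H, W))
    pad[:kh, :kw] = theta
    return np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def sample_hgmrf(theta, shape, rng):
    """Zero-mean sample of a homogeneous GMRF whose block-circulant precision is
    generated by theta: color white noise by Q^{-1/2} in the Fourier domain
    (the DFT of the embedded kernel gives the eigenvalues of Q, assumed > 0)."""
    eig = np.real(np.fft.fft2(embed_kernel(theta, shape)))
    w = rng.standard_normal(shape)
    return np.real(np.fft.ifft2(np.fft.fft2(w) / np.sqrt(eig)))

def sample_fogsm(theta_u, theta_z, mu, shape, rng):
    """Draw x = u (*) sqrt(z): sample the hGMRF u, sample the hGMRF log z with
    mean mu, exponentiate, take the element-wise square root, and multiply."""
    u = sample_hgmrf(theta_u, shape, rng)
    z = np.exp(mu + sample_hgmrf(theta_z, shape, rng))
    return u * np.sqrt(z)

# Illustrative (not fitted) kernels; a fitted model would supply its own.
theta_u = np.array([[0., -.2, 0.], [-.2, 1., -.2], [0., -.2, 0.]])
theta_z = np.array([[0., -.1, 0.], [-.1, 1., -.1], [0., -.1, 0.]])
rng = np.random.default_rng(0)
x_syn = sample_fogsm(theta_u, theta_z, -2.0, (256, 256), rng)

excess_kurtosis = np.mean((x_syn - x_syn.mean())**4) / np.var(x_syn)**2 - 3.0
print(excess_kurtosis)   # > 0: heavier-tailed than a Gaussian, as in Fig. 2
```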
Fig. 4. Denoising results using the local GSM [12] and FoGSM (panels: original image; noisy image, σ = 50, PSNR = 14.15 dB; GSM-BLS, PSNR = 26.34 dB; FoGSM, PSNR = 27.01 dB). Performance is evaluated as peak signal-to-noise ratio (PSNR), 20 log10(255/σe), where σe is the standard deviation of the error.

4 Application to image denoising

Let y = x + w be a wavelet subband of an image that has been corrupted with white Gaussian noise of known variance. In an overcomplete wavelet domain such as the steerable pyramid, the white Gaussian noise is transformed into correlated Gaussian noise w ∼ N(0, Σ_w), whose covariance Σ_w can be derived from the basis functions of the pyramid transform. With FoGSM as a prior over x, commonly used denoising methods involve expensive high-dimensional integration: for instance, the maximum a posteriori estimate, x̂_MAP = argmax_x log p(x | y), requires a high-dimensional integral over z, and the Bayesian least squares estimate, x̂_BLS = E(x | y), requires a double high-dimensional integral over x and z. Although it is possible to optimize these criteria using Markov chain Monte Carlo sampling or other approximations, we instead develop a more efficient deterministic algorithm that takes advantage of the hGMRF structure in the FoGSM model. Specifically, we compute

(x̂, ẑ, Q̂_u, Q̂_z, µ̂) = argmax_{x, z, Q_u, Q_z, µ} log p(x, z | y; Q_u, Q_z, µ),   (9)

and take x̂ as the optimal denoised subband. Note that the model parameters are learned within the inference process rather than in a separate parameter learning step. This strategy, known as partial optimal solution [19], greatly reduces the computational complexity.

We optimize (9) with coordinate ascent, iterating between maximizing each of (x, z, Q_u, Q_z, µ) while fixing the others. With fixed estimates of (z, Q_u, Q_z, µ), the optimization of x is

argmax_x log p(x, z | y; Q_u, Q_z, µ) = argmax_x [ log p(x | z, y; Q_u, Q_z, µ) + log p(z | y; Q_u, Q_z, µ) ],

which reduces to argmax_x log p(x | z, y; Q_u), since the second term is independent of x and can be dropped from the optimization. Given the Gaussian structure of x conditioned on z, this step is equivalent to a Wiener filter (linear in y). Fixing (x, Q_u, Q_z, µ), the optimization of z is

argmax_z log p(x, z | y; Q_u, Q_z, µ) = argmax_z [ log p(y | x, z; Q_u) + log p(x, z; Q_u, Q_z, µ) − log p(y; Q_u, Q_z, µ) ],

which further reduces to argmax_z log p(x, z; Q_u, Q_z, µ). Here, the first term was dropped because y is independent of z when conditioned on x, and the last term was dropped because it is also independent of z. Therefore, optimizing z given (x, Q_u, Q_z, µ) is equivalent to the first step of the algorithm in Section 2, which can be implemented with efficient gradient descent. Finally, given (x, z), the FoGSM model parameters (Q_u, Q_z, µ) are estimated from the hGMRFs x ⊘ √z and log z, similar to the second and third steps in the algorithm of Section 2. However, to reduce the overall computation time, instead of a complete maximum likelihood estimation, these parameters are estimated with a maximum pseudo-likelihood procedure [20], which finds the parameters maximizing the product of all conditional distributions (1D Gaussians in the GMRF case), followed by a projection onto the subspace of FoGSM parameters that yields positive definite precision matrices.

We tested this denoising method on a standard set of test images [12]. The noise-corrupted images were first decomposed into a steerable pyramid with multiple levels (5 levels for a 512 × 512 image and 4 levels for a 256 × 256 image) and 8 orientations. We assumed a FoGSM model for each subband, with a 5 × 5 neighborhood for the field u and a 1 × 1 neighborhood for the field log z. These sizes were chosen to provide a reasonable combination of performance and computational efficiency. We then estimate the optimal x with the algorithm described previously, with the initial values of x and z computed from subbands denoised with the local GSM model [12].
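For concreteness, the x-step described above (a Wiener-type linear solve, given z) can be carried out with conjugate gradients without ever forming the precision matrix explicitly; the sketch below does this under the simplifying assumption of white subband noise of standard deviation sigma_w (the paper instead uses the correlated covariance Σ_w induced by the pyramid). Function and variable names are ours.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def denoise_x_step(y, z, theta_u, sigma_w):
    """Solve (D(1/sqrt(z)) Qu D(1/sqrt(z)) + I/sigma_w^2) x = y / sigma_w^2
    by conjugate gradients, applying Qu through FFT-based circular convolution.
    Simplified to white subband noise; a correlated Sigma_w would replace the
    diagonal noise term."""
    H, W = y.shape
    s = 1.0 / np.sqrt(z)

    kh, kw = theta_u.shape
    pad = np.zeros((H, W))
    pad[:kh, :kw] = theta_u
    q_hat = np.fft.fft2(np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1)))

    def matvec(v):
        v2d = v.reshape(H, W)
        qu_sv = np.real(np.fft.ifft2(q_hat * np.fft.fft2(s * v2d)))   # Qu D(s) v
        return (s * qu_sv + v2d / sigma_w**2).ravel()                 # + noise precision

    A = LinearOperator((H * W, H * W), matvec=matvec, dtype=float)
    x, info = cg(A, (y / sigma_w**2).ravel(), maxiter=200)
    return x.reshape(H, W)

# Tiny synthetic demonstration with illustrative inputs.
rng = np.random.default_rng(1)
theta_u = np.array([[0., -.2, 0.], [-.2, 1., -.2], [0., -.2, 0.]])
z = np.exp(rng.standard_normal((64, 64)))
y = rng.standard_normal((64, 64))
x_hat = denoise_x_step(y, z, theta_u, sigma_w=0.5)
```

Since the system matrix is symmetric positive definite (a GMRF precision plus a positive diagonal), plain CG is appropriate; a circulant preconditioner could presumably speed up convergence further.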
Shown in Fig. 4 is an example of denoising the "boat" image corrupted with simulated additive white Gaussian noise of strength σ = 50, corresponding to a peak signal-to-noise ratio (PSNR) of 14.15 dB. We compare this with the local GSM method in [12], which, assuming a local GSM model for the neighborhood consisting of 3 × 3 spatial neighbors plus the parent in the next coarsest scale, computes a Bayes least squares estimate of each coefficient conditioned on its surrounding neighborhood. The FoGSM denoising achieves a substantial improvement (+0.68 dB in PSNR) and is seen to exhibit better contrast and continuation of oriented features (see Fig. 4). On the other hand, FoGSM introduces some noticeable artifacts in low-contrast areas, which are caused by numerical instability at locations with small z. We find that the improvement in terms of PSNR is consistent across photographic images and noise levels, as reported in Table 1. But even with a restricted neighborhood for the multiplier field, this PSNR improvement does come at a substantial computational cost. As a rough indication, running on a PowerPC G5 workstation with a 2.3 GHz processor and 16 GB of RAM, using unoptimized MATLAB (version R14) code, denoising a 512 × 512 image takes on average 4.5 hours (averaged over 5 images), and denoising a 256 × 256 image takes on average 2.4 hours (averaged over 2 images), to a convergence precision producing the reported results. Our preliminary investigation indicates that the slow running time is mainly due to the nature of coordinate ascent and the landscape of (9), which requires many iterations to converge.

5 Discussion

We have introduced fields of Gaussian scale mixtures as a flexible and efficient tool for modeling the statistics of wavelet coefficients of photographic images. We developed a feasible (although admittedly computationally costly) parameter estimation method, and showed that samples synthesized from the fitted FoGSM model are able to capture structures in the marginal and joint wavelet statistics of photographic images. Preliminary results of applying FoGSM to image denoising indicate substantial improvements over the state-of-the-art methods based on the local GSM model.

Although FoGSM has a structure that is similar to the local scale mixture models [9, 10], there is a fundamental difference between them. In FoGSM, hGMRF structures are enforced in u and log z, while the local scale mixture models impose minimal statistical structure on these variables. Because of this, our model easily extends to images of arbitrary size, while the local scale mixture models are essentially confined to describing small image patches (the curse of dimensionality and the increase in computational cost prevent one from scaling the patch size up). On the other hand, the close relation to Gaussian MRFs makes the analysis and computation of FoGSM significantly easier than for other non-Gaussian MRF-based image models [6, 7, 5].

We envision, and are currently working on, a number of model improvements. First, the model should benefit from the introduction of more general Markov neighborhoods, including wavelet coefficients from subbands at other scales and orientations [4, 12], since the current model is clearly not accounting for these dependencies (see Fig. 3).
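Finally, a small evaluation harness matching the PSNR convention used in Fig. 4 and Table 1, together with a skeleton of the per-image experimental loop, might look like the following sketch; decompose, reconstruct and denoise_subband are placeholders standing in for a steerable-pyramid implementation and the per-subband FoGSM procedure of Section 4, not actual library calls.

```python
import numpy as np

def psnr(clean, estimate):
    """PSNR = 20 log10(255 / sigma_e), with sigma_e the standard deviation of the error."""
    err = np.asarray(estimate, float) - np.asarray(clean, float)
    return 20.0 * np.log10(255.0 / err.std())

# Input PSNRs quoted in the header of Table 1 for sigma = 10, 25, 50, 100:
for sigma in (10, 25, 50, 100):
    print(sigma, round(20.0 * np.log10(255.0 / sigma), 2))   # 28.13, 20.17, 14.15, 8.13

def denoise_image(img, sigma, decompose, reconstruct, denoise_subband, rng):
    """Skeleton of the experimental loop: add noise, decompose into subbands,
    denoise each subband with the FoGSM procedure, and reconstruct. The three
    callables are placeholders for a pyramid implementation and the per-subband
    FoGSM denoiser."""
    noisy = img + sigma * rng.standard_normal(img.shape)
    subbands, lowpass = decompose(noisy)
    cleaned = [denoise_subband(b, sigma) for b in subbands]
    return reconstruct(cleaned, lowpass)
```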
2 0.44027871 175 nips-2006-Simplifying Mixture Models through Function Approximation
Author: Kai Zhang, James T. Kwok
Abstract: Finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
3 0.43910593 20 nips-2006-Active learning for misspecified generalized linear models
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of expected reduction in generalization error after adding a new training data point, that allow us to optimize its sampling distribution through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable for all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels. 1
4 0.43809971 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); or as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature of Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
5 0.43795639 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
Author: Rey Ramírez, Jason Palmer, Scott Makeig, Bhaskar D. Rao, David P. Wipf
Abstract: The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical properties of this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms. 1
6 0.4375242 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
7 0.43744496 65 nips-2006-Denoising and Dimension Reduction in Feature Space
8 0.43506479 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
9 0.43453127 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
10 0.43400234 158 nips-2006-PG-means: learning the number of clusters in data
11 0.43277979 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
12 0.43216369 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
13 0.43205646 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
14 0.43205348 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
15 0.43042272 21 nips-2006-AdaBoost is Consistent
16 0.43014938 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
17 0.42982501 194 nips-2006-Towards a general independent subspace analysis
18 0.42961842 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
19 0.42934778 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
20 0.42921716 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds