jmlr jmlr2008 jmlr2008-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gustavo Camps-Valls, Juan Gutiérrez, Gabriel Gómez-Pérez, Jesús Malo
Abstract: Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the ε-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular εinsensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning. Keywords: image coding, non-linear perception models, statistical independence, support vector machines, insensitivity zone c 2008 Gustavo Camps-Valls, Juan Guti´ rrez, Gabriel G´ mez-P´ rez and Jes´ s Malo. e o e u ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 1. Problem Statement: The Diagonal Jacobian Condition Image coding schemes based on support vector machines (SVM) have been successfully introduced in the literature. SVMs have been used in the spatial domain (Robinson and Kecman, 2000), in the block-DCT domain (Robinson and Kecman, 2003), and in the wavelet domain (Ahmed, 2005; Jiao et al., 2005). These coding methods take advantage of the ability of the support vector regression (SVR) algorithm for function approximation using a small number of parameters (signal samples, or ¨ support vectors) (Smola and Scholkopf, 2004). In all current SVM-based image coding techniques, a representation of the image is described by the entropy-coded weights associated to the support vectors necessary to approximate the signal with a given accuracy. Relaxing the accuracy bounds reduces the number of needed support vectors. In a given representation domain, reducing the number of support vectors increases the compression ratio at the expense of bigger distortion (lower image quality). By applying the standard SVR formulation, a certain amount of distortion in each sample of the image representation is allowed. In the original formulation, scalar restrictions on the errors are introduced using a constant ε-insensitivity value for every sample. ´ Recently, this procedure has been refined by Gomez-P´ rez et al. (2005) using a profile-dependent e SVR (Camps-Valls et al., 2001) that considers a different ε for each sample or frequency. This frequency-dependent insensitivity, ε f , accounts for the fact that, according to simple (linear) perception models, not every sample in linear frequency domains (such as DCT or wavelets) contributes to the perceived distortion in the same way. Despite different domains have been proposed for SVM training (spatial domain, block-DCT and wavelets) and different ε insensitivities per sample have been proposed, in conventional SVR formulation, the particular distortions introduced by regression in the different samples are not coupled. In all the reported SVM-based image coding schemes, the RBF kernel is used and the penalization parameter is fixed to an arbitrarily large value. In this setting, considering n-sample signals as n-dimensional vectors, the SVR guarantees that the approximated vectors are confined in n-dimensional rectangles around the original vectors. These rectangles are just n-dimensional cubes in the standard formulation or they have certain elongation if different ε f are considered in each axis, f . Therefore, in all the reported SVM-based coding methods, these rectangles are always oriented along the axes of the (linear) image representation. According to this, a common feature of these (scalar-wise) approaches is that they give rise to decoupled distortions in each dimension. P´ rez-Cruz et al. (2002) proposed a hyperspherical insensitivity zone to correct the penalization e factor in each dimension of multi-output regression problems, but again, restrictions to each sample were still uncoupled. This scalar-wise strategy is not the best option in domains where the different dimensions of the image representation are not independent. For instance, consider the situation where actually independent components, r f , are obtained from a given image representation, y, applying some eventually non-linear transform, R: R y −→ r. In this case, SVM regression with scalar-wise error restriction makes sense in the r domain. However, the original y domain will not be suitable for the standard SVM regression unless the matrix ∇R is diagonal (up to any permutation of the dimensions, that is, only one non-zero element per row). Therefore, if transforms that achieve independence have non-diagonal Jacobian, scalar-wise restrictions in the original (coupled coefficients) domain y are not allowed. 50 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING y2 1 −0,5 −0,5 0,4 R= r2 y1 r1 R−1 = 0,83 −0,83 0,67 −1,67 Figure 1: Insensitivity regions in different representation domains, y (left) and r (right), related by a non-diagonal transform ∇R and its inverse ∇R−1 . Figure 1 illustrates this situation. The shaded region in the right plot (r domain) represents the n-dimensional box determined by the ε f insensitivities in each dimension ( f =1,2), in which a scalar-wise approach is appropriate due to independence among signal coefficients. Given that the particular ∇R transform is not diagonal, the corresponding shaded region in the left plot (the original y domain) is not aligned along the axes of the representation. This has negative implications: note that for the highlighted points, smaller distortions in both dimensions in the y domain (as implied by SVM with tighter but scalar ε f insensitivities) do not necessarily imply lying inside the insensitivity region in the final truly independent (and meaningful) r domain. Therefore, the original y domain is not suitable for the direct application of conventional SVM, and consequently, non-trivial coupled insensitivity regions are required. Summarizing, in the image coding context, the condition for an image representation y to be strictly suitable for conventional SVM learning is that the transform that maps the original representation y to an independent coefficient representation r must be locally diagonal. As will be reviewed below, independence among coefficients (and the transforms to obtain them) may be defined in both statistical and perceptual terms (Hyvarinen et al., 2001; Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). On the one hand, a locally diagonal relation to a statistically independent representation is desirable because independently induced distortions (as the conventional SVM approach does) will preserve the statistics of the distorted signal, that is, it will not introduce artificial-looking artifacts. On the other hand, a locally diagonal relation to a perceptually 51 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO independent representation is desirable because independently induced distortions do not give rise to increased subjective distortions due to non-trivial masking or facilitation interactions between the distortions in each dimension (Watson and Solomon, 1997). In this work, we show that conventional linear domains do not fulfill the diagonal Jacobian condition in either the statistical case or in the perceptual case. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains (Robinson and Kecman, 2003; G´ mez-P´ rez et al., 2005) and in a recently proposed non-linear perceptual domain o e that simultaneously reduces the statistical and the perceptual relations (Malo et al., 2006), thus, this non-linear perceptual domain is closer to fulfilling the proposed condition. The rest of the paper is structured as follows. Section 2 reviews the fact that linear coefficients of the image representations commonly used for SVM training are neither statistically independent nor perceptually independent. Section 3 shows that transforms for obtaining statistical and/or perceptual independence from linear domains have non-diagonal Jacobian. This suggests that there is room to improve the performance of conventional SVM learning reported in linear domains. In Section 4, we propose the use of a perceptual representation for SVM training because it strictly fulfills the diagonal Jacobian condition in the perceptual sense and increases the statistical independence among coefficients, bringing it closer to fulfilling the condition in the statistical sense. The experimental image coding results confirm the superiority of this domain for SVM training in Section 5. Section 6 presents the conclusions and final remarks. 2. Statistical and Perceptual Relations Among Image Coefficients Statistical independence among the coefficients of a signal representation refers to the fact that the joint PDF of the class of signals to be considered can be expressed as a product of the marginal PDFs in each dimension (Hyvarinen et al., 2001). Simple (second-order) descriptions of statistical dependence use the non-diagonal nature of the covariance matrix (Clarke, 1985; Gersho and Gray, 1992). More recent and accurate descriptions use higher-order moments, mutual information, or the non-Gaussian nature (sparsity) of marginal PDFs (Hyvarinen et al., 2001; Simoncelli, 1997). Perceptual independence refers to the fact that the visibility of errors in coefficients of an image may depend on the energy of neighboring coefficients, a phenomenon known in the perceptual literature as masking or facilitation (Watson and Solomon, 1997). Perceptual dependence has been formalized just up to second order, and this may be described by the non-Euclidean nature of the perceptual metric matrix (Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). 2.1 Statistical Relations In recent years, a variety of approaches, known collectively as “independent component analysis” (ICA), have been developed to exploit higher-order statistics for the purpose of achieving a unique linear solution for coefficient independence (Hyvarinen et al., 2001). The basis functions obtained when these methods are applied to images are spatially localized and selective for both orientation and spatial frequency (Olshausen and Field, 1996; Bell and Sejnowski, 1997). Thus, they are similar to basis functions of multi-scale wavelet representations. Despite its name, linear ICA does not actually produce statistically independent coefficients when applied to photographic images. Intuitively, independence would seem unlikely, since images are not formed from linear superpositions of independent patterns: the typical combination rule for the elements of an image is occlusion. Empirically, the coefficients of natural image decom52 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING | f | = 10.8 cpd | f | = 24.4 cpd −20 −10 −10 fy (cpd) −30 −20 fy (cpd) −30 0 0 10 10 20 20 30 30 −30 −20 −10 0 f (cpd) 10 20 30 −30 x −20 −10 0 f (cpd) 10 20 30 x Figure 2: Statistical interaction of two particular coefficients of the local Fourier Transform with their neighbors in a natural image database. The absolute value of the frequency of these coefficients is | f | = 10.8 and | f | = 24.4 cycles/degree (cpd). positions in spatially localized oscillating basis functions are found to be fairly well decorrelated (i.e., their covariance is almost zero). However, the amplitudes of coefficients at nearby spatial positions, orientations, and scales are highly correlated (even with orthonormal transforms) (Simoncelli, ´ 1997; Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003; Guti errez et al., 2006; Malo et al., 2006; Malo and Guti´ rrez, 2006). This suggests that achieving statistical e independence requires the introduction of non-linearities beyond linear ICA transforms. Figure 2 reproduces one of many results that highlight the presence of statistical relations of natural image coefficients in block PCA or linear ICA-like domains: the energy of spatially localized oscillating filters is correlated with the energy of neighboring filters in scale and orientation (see Guti´ rrez et al., 2006). A remarkable feature is that the interaction width increases with frequency, e as has been reported in other domains, for example, wavelets (Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003), and block-DCT (Malo et al., 2006). In order to remove the remaining statistical relations in the linear domains y, non-linear ICA methods are necessary (Hyvarinen et al., 2001; Lin, 1999; Karhunen et al., 2000; Jutten and Karhunen, 2003). Without lack of generality, non-linear ICA transforms can be schematically understood as a two-stage process (Malo and Guti´ rrez, 2006): e T R (( y hh x hh (( r, (1) R−1 T−1 where x is the image representation in the spatial domain, and T is a global unitary linear transform that removes second-order and eventually higher-order relations among coefficients in the spatial domain. Particular examples of T include block PCA, linear ICAs, DCT or wavelets. In the ICA literature notation, T is the separating matrix and T−1 is the mixing matrix. The second transform 53 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO R is an additional non-linearity that is introduced in order to remove the statistical relations that still remain in the y domain. 2.2 Perceptual Relations Perceptual dependence among coefficients in different image representations can be understood by using the current model of V1 cortex. This model can also be summarized by the two-stage (linear and non-linear) process described in Equation (1). In this perceptual case, T is also a linear filter bank applied to the original input image in the spatial domain. This filter bank represents the linear behavior of V1 neurons whose receptive fields happen to be similar to wavelets or linear ICA basis functions (Olshausen and Field, 1996; Bell and Sejnowski, 1997). The second transform, R, is a non-linear function that accounts for the masking and facilitation phenomena that have been reported in the linear y domain (Foley, 1994; Watson and Solomon, 1997). Section 3.2 gives a parametric expression for the second non-linear stage, R: the divisive normalization model (Heeger, 1992; Foley, 1994; Watson and Solomon, 1997). This class of models is based on psychophysical experiments assuming that the last domain, r, is perceptually Euclidean (i.e., perfect perceptual independence). An additional confirmation of this assumption is the success of (Euclidean) subjective image distortion measures defined in that domain (Teo and Heeger, 1994). Straightforward application of Riemannian geometry to obtain the perceptual metric matrix in other domains shows that the coefficients of linear domains x and y, or any other linear transform of them, are not perceptually independent (Epifanio et al., 2003). Figure 3 illustrates the presence of perceptual relations between coefficients when using linear block frequency or wavelet-like domains, y: the cross-masking behavior. In this example, the visibility of the distortions added on top of the background image made of periodic patterns has to be assessed. This is a measure of the sensitivity of a particular perceptual mechanism to distortions in that dimension, ∆y f , when mechanisms tuned to other dimensions are simultaneously active, that is, y f = 0, with f = f . As can be observed, low frequency noise is more visible in high frequency backgrounds than in low frequency backgrounds (e.g., left image). Similarly, high frequency noise is more visible in low frequency backgrounds than in high frequency ones (e.g., right image). That is to say, a signal of a specific frequency strongly masks the corresponding frequency analyzer, but it induces a smaller sensitivity reduction in the analyzers that are tuned to different frequencies. In other words, the reduction in sensitivity of a specific analyzer gets larger as the distance between the background frequency and the frequency of the analyzer gets smaller. The response of each frequency analyzer not only depends on the energy of the signal for that frequency band, but also on the energy of the signal in other frequency bands (cross-masking). This implies that a different amount of noise in each frequency band may be acceptable depending on the energy of that frequency band and on the energy of neighboring bands. This is what we have called perceptual dependence among different coefficients in the y domain. At this point, it is important to stress the similarity between the set of computations to obtain statistically decoupled image coefficients and the known stages of biological vision. In fact, it has been hypothesized that biological visual systems have organized their sensors to exploit the particular statistics of the signals they have to process. See Barlow (2001), Simoncelli and Olshausen (2001), and Simoncelli (2003) for reviews on this hypothesis. In particular, both the linear and the non-linear stages of the cortical processing have been successfully derived using redundancy reduction arguments: nowadays, the same class of linear 54 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 3 cpd 6 cpd 12 cpd 24 cpd Figure 3: Illustrative example of perceptual dependence (cross-masking phenomenon). Equal energy noise of different frequency content, 3 cycl/deg (cpd), 6 cpd, 12 cpd and 24 cpd, shown on top of a background image. Sampling frequency assumes that these images subtend an angle of 3 deg. stage T is used in transform coding algorithms and in vision models (Olshausen and Field, 1996; Bell and Sejnowski, 1997; Taubman and Marcellin, 2001), and new evidence supports the same idea for the second non-linear stage (Schwartz and Simoncelli, 2001; Malo and Guti´ rrez, 2006). e According to this, the statistical and perceptual transforms, R, that remove the above relations from the linear domains, y, would be very similar if not the same. 3. Statistical and Perceptual Independence Imply Non-diagonal Jacobian In this section, we show that both statistical redundancy reduction transforms (e.g., non-linear ICA) and perceptual independence transforms (e.g., divisive normalization), have non-diagonal Jacobian for any linear image representation, so they are not strictly suitable for conventional SVM training. 3.1 Non-diagonal Jacobian in Non-linear ICA Transforms One possible approach for dealing with global non-linear ICA is to act differentially by breaking the problem into local linear pieces that can then be integrated to obtain the global independent coefficient domain (Malo and Guti´ rrez, 2006). Each differential sub-problem around a particular e point (image) can be locally solved using the standard linear ICA methods restricted to the neighbors of that point (Lin, 1999). Using the differential approach in the context of a two-stage process such as the one in Equation (1), it can be shown that (Malo and Guti´ rrez, 2006): e r = r0 + Z x x0 T (x ) dx = r0 + Z x x0 ∇R(Tx ) T dx , (2) where T (x ) is the local separating matrix for a neighborhood of the image x , and T is the global separating matrix for the whole PDF. Therefore, the Jacobian of the second non-linear stage is: ∇R(y) = ∇R(Tx) = T (x) T−1 . 55 (3) ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO As local linear independent features around a particular image, x, differ in general from global linear independent features, that is, T (x) = T, the above product is not the identity nor diagonal in general. 3.2 Non-diagonal Jacobian in Non-linear Perceptual Transforms The current response model for the cortical frequency analyzers is non-linear (Heeger, 1992; Watson and Solomon, 1997). The outputs of the filters of the first linear stage, y, undergo a non-linear sigmoid transform in which the energy of each linear coefficient is weighted by a linear Contrast Sensitivity Function (CSF) (Campbell and Robson, 1968; Malo et al., 1997) and is further normalized by a combination of the energies of neighbor coefficients in frequency, r f = R(y) f = sgn(y f ) |α f y f |γ , β f + ∑n =1 h f f |α f y f |γ f (4) where α f (Figure 4[top left]) are CSF-like weights, β f (Figure 4[top right]) control the sharpness of the response saturation for each coefficient, γ is the so called excitation exponent, and the matrix h f f determines the interaction neighborhood in the non-linear normalization of the energy. This interaction matrix models the cross-masking behavior (cf. Section 2.2). The interaction in this matrix is assumed to be Gaussian (Watson and Solomon, 1997), and its width increases with the frequency. Figure 4[bottom] shows two examples of this Gaussian interaction for two particular coefficients in a local Fourier domain. Note that the width of the perceptual interaction neighborhood increases with the frequency in the same way as the width of the statistical interaction neighborhood shown in Figure 2. We used a value of γ = 2 in the experiments. Taking derivatives in the general divisive normalization model, Equation (4), we obtain ∇R(y) f f = sgn(y f )γ α f |α f y f |γ |α f y f |γ−1 α f |α f y f |γ−1 δf f − hf f β f + ∑n =1 h f f |α f y f |γ (β f + ∑n =1 h f f |α f y f |γ )2 f f , (5) which is not diagonal because of the interaction matrix, h, which describes the cross-masking between each frequency f and the remaining f = f . Note that the intrinsic non-linear nature of both the statistical and perceptual transforms, Equations (3) and (5), makes the above results true for any linear domain under consideration. Specifically, if any other possible linear domain for image representation is considered, y = T y, then the Jacobian of the corresponding independence transform, R , is ∇R (y ) = ∇R(y) T −1 , which, in general, will also be non-diagonal because of the non-diagonal and point-dependent nature of ∇R(y). To summarize, since no linear domain fulfills the diagonal Jacobian condition in either statistical or perceptual terms, the negative situation illustrated in Figure 1 may occur when using SVM in these domains. Therefore, improved results could be obtained if SVM learning were applied after some transform achieving independent coefficients, R. 4. SVM Learning in a Perceptually Independent Representation In order to confirm the above theoretical results (i.e., the unsuitability of linear representation domains for SVM learning) and to assess the eventual gain that can be obtained from training SVR 56 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 0.04 αf 0.015 βf 0.03 0.012 0.02 0.009 0.006 0.01 0.003 0 0 4 8 12 16 20 24 28 32 0 f (frequency cycles/degree) 4 8 | f | = 10.8 cpd 16 20 24 28 32 | f | = 24.4 cpd −30 −20 −20 −10 −10 f (cpd) −30 0 y 0 y f (cpd) 12 f (frequency cycles/degree) 10 10 20 20 30 −30 −20 −10 0 fx (cpd) 10 20 30 −30 30 −20 −10 0 fx (cpd) 10 20 30 Figure 4: Parameters of the perceptual model: α f (top left), β f (top right). Bottom figures represent perceptual interaction neighborhoods h f f of two particular coefficients of the local Fourier domain. in a more appropriate domain, we should compare the performance of SVRs in previously reported linear domains (e.g., block-DCT or wavelets) and in one of the proposed non-linear domains (either the statistically independent domain or the perceptually independent domain). Exploration of the statistical independence transform may have academic interest but, in its present formulation, it is not practical for coding purposes: direct application of non-linear ICA as in Equation (2) is very time-consuming for high dimensional vectors since lots of local ICA computations are needed to transform each block, and a very large image database is needed for a robust and significant computation of R. Besides, an equally expensive differential approach is also needed to compute the inverse R−1 for image decoding. In contrast, the perceptual non-linearity (and its inverse) are analytical. These analytical expressions are feasible for reasonable block sizes, and there are efficient iterative methods that can be used for larger vectors (Malo et al., 2006). In this paper, we explore the use of a psychophysically-based divisive normalized domain: first compute a block-DCT transform and then apply the divisive normalization model described above for each block. The results will be compared to the first competitive SVM coding results (Robinson 57 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO ´ and Kecman, 2003) and the posterior improvements reported by G omez-P´ rez et al. (2005), both e formulated in the linear block-DCT domain. As stated in Section 2, by construction, the proposed domain is perceptually Euclidean with perceptually independent components. The Euclidean nature of this domain has an additional benefit: the ε-insensitivity design is very simple because a constant value is appropriate due to the constant perceptual relevance of all coefficients. Thus, direct application of the standard SVR method is theoretically appropriate in this domain. Moreover, beyond its built-in perceptual benefits, this psychophysically-based divisive normalization has attractive statistical properties: it strongly reduces the mutual information between the final coefficients r (Malo et al., 2006). This is not surprising according to the hypothesis that try to explain the early stages of biological vision systems using information theory arguments (Barlow, 1961; Simoncelli and Olshausen, 2001). Specifically, dividing the energy of each linear coefficient by the energy of the neighbors, which are statistically related with it, cf. Figure 2, gives coefficients with reduced statistical dependence. Moreover, as the empirical non-linearities of perception have been reproduced using non-linear ICA in Equation (2) (Malo and Guti´ rrez, 2006), the empirical die visive normalization can be seen as a convenient parametric way to obtain statistical independence. 5. Performance of SVM Learning in Different Domains In this section, we analyze the performance of SVM-based coding algorithms in linear and nonlinear domains through rate-distortion curves and explicit examples for visual comparison. In addition, we discuss how SVM selects support vectors in these domains to represent the image features. 5.1 Model Development and Experimental Setup In the (linear) block-DCT domain, y, we use the method introduced by Robinson and Kecman (2003) (RKi-1), in which the SVR is trained to learn a fixed (low-pass) number of DCT coefficients ´ (those with frequency bigger than 20 cycl/deg are discarded); and the method proposed by G omezP´ rez et al. (2005) (CSF-SVR), in which the relevance of all DCT coefficients is weighted according e to the CSF criterion using an appropriately modulated ε f . In the non-linear domain, r, we use the SVR with constant insensitivity parameter ε (NL-SVR). In all cases, the block-size is 16×16, that is, y, r ∈ R256 . The behavior of JPEG standard is also included in the experiments for comparison purposes. As stated in Section 1, we used the RBF kernel and arbitrarily large penalization parameter in every SVR case. In all experiments, we trained the SVR models without the bias term, and modelled the absolute value of the DCT, y, or response coefficients, r. All the remaining free parameters (ε-insensitivity and Gaussian width of the RBF kernel σ) were optimized for all the considered models and different compression ratios. In the NL-SVM case, the parameters of the divisive normalization used in the experiments are shown in Figure 4. After training, the signal is described by the uniformly quantized Lagrange multipliers of the support vectors needed to keep the regression error below the thresholds ε f . The last step is entropy coding of the quantized weights. The compression ratio is controlled by a factor applied to the thresholds, ε f . 58 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 5.2 Model Comparison In order to assess the quality of the coded images, three different measures were used: the standard ´ (Euclidean) RMSE, the Maximum Perceptual Error (MPE) (Malo et al., 2000; G omez-P´ rez et al., e 2005; Malo et al., 2006) and the also perceptually meaningful Structural SIMilarity (SSIM) index (Wang et al., 2004). Eight standard 256×256 monochrome 8 bits/pix images were used in the experiments. Average rate-distortion curves are plotted in Figure 5 in the range [0.05, 0.6] bits/pix (bpp). According to these entropy-per-sample data, original file size was 64 KBytes in every case, while the compressed image sizes were in the range [0.4, 4.8] KBytes. This implies that the compression ratios were in the range [160:1, 13:1]. In general, a clear gain over standard JPEG is obtained by all SVM-based methods. According to the standard Euclidean MSE point of view, the performance of RKi-1 and CSF-SVR algorithms is basically the same (note the overlapped curves in Figure 5(a)). However, it is widely known that the MSE results are not useful to represent the subjective quality of images, as extensively reported elsewhere (Girod, 1993; Teo and Heeger, 1994; Watson and Malo, 2002). When using more appropriate (perceptually meaningful) quality measures (Figures 5(b)-(c)), the CSF-SVR obtains a certain advantage over the RKi-1 algorithm for all compression rates, which was already reported by G´ mez-P´ rez et al. (2005). In all measures, and for the whole considered entropy range, the o e proposed NL-SVR clearly outperforms all previously reported methods, obtaining a noticeable gain at medium-to-high compression ratios (between 0.1 bpp (80:1) and 0.3 bpp (27:1)). Taking into account that the recommended bit rate for JPEG is about 0.5 bpp, from Figure 5 we can also conclude that the proposed technique achieves the similar quality levels at a lower bit rate in the range [0.15, 0.3] bpp. Figure 6 shows representative visual results of the considered SVM strategies on standard images (Lena and Barbara) at the same bit rate (0.3 bpp, 27:1 compression ratio or 2.4 KBytes in 256×256 images). The visual inspection confirms that the numerical gain in MPE and SSIM shown in Figure 5 is also perceptually significant. Some conclusions can be extracted from this figure. ´ First, as previously reported by Gomez-P´ rez et al. (2005), RKi-1 leads to poorer (blocky) results e because of the crude approximation of the CSF (as an ideal low-pass filter) and the equal relevance applied to the low-frequency DCT-coefficients. Second, despite the good performance yielded by the CSF-SVR approach to avoid blocking effects, it is worth noting that high frequency details are smoothed (e.g., see Barbara’s scarf). These effects are highly alleviated by introducing SVR in the non-linear domain. See, for instance, Lena’s eyes, her hat’s feathers or the better reproduction of the high frequency pattern in Barbara’s clothes. Figure 7 shows the results obtained by all considered methods at a very high compression ratio for the Barbara image (0.05 bpp, 160:1 compression ratio or 0.4 KBytes in 256×256 images). This experiment is just intended to show the limits of methods performance since it is out of the recommended rate ranges. Even though this scenario is unrealistic, differences among methods are still noticeable: the proposed NL-SVR method reduces the blocky effects (note for instance that the face is better reproduced). This is due to a better distribution of support vectors in the perceptually independent domain. 5.3 Support Vector Distribution The observed different perceptual image quality obtained with each approach is a direct consequence of support vector distribution in different domains. Figure 8 shows a representative example 59 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 20 JPEG RKi−1 CSF−SVR NL−SVR 18 RMSE 16 14 12 10 8 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 22 JPEG RKi−1 CSF−SVR NL−SVR 20 18 16 MPE 14 12 10 8 6 4 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 0.85 0.8 SSIM 0.75 0.7 0.65 JPEG 0.6 RKi−1 CSF−SVR 0.55 NL−SVR 0.5 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 Figure 5: Average rate distortion curves over eight standard images (Lena, Barbara, Boats, Einstein, Peppers, Mandrill, Goldhill, Camera man) using objective and subjective measures for the considered JPEG (dotted) and the SVM approaches (RKi-1 dash-dotted, CSF-SVR dashed and NL-SVR solid). RMSE distortion (top), Maximum Perceptual Error, MPE (middle) (Malo et al., 2000; G´ mez-P´ rez et al., 2005; Malo et al., 2006), and Structural o e SIMilarity index, SSIM (bottom) (Wang et al., 2004). 60 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING Rki Rki SVR+CSF SVR+CSF NL+SVR NL+SVR Figure 6: Examples of decoded Lena (left) and Barbara (right) images at 0.3 bits/pix. From top to bottom: JPEG, RKi-1, CSF-SVR, and NL-SVR. 61 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO (a) (b) (c) (d) Figure 7: Examples of decoded Barbara images at a high compression ratio of 0.05 bits/pix (160:1) for (a) JPEG, (b) RKi-1, (c) CSF-SVR, and (d) NL-SVR. of the distribution of the selected support vectors by the RKi-1 and the CSF-SVR models working in the linear DCT domain, and the NL-SVM working in the perceptually independent non-linear domain r. Specifically, a block of Barbara’s scarf at different compression ratios is used for illustration purposes. The RKi-1 approach (Robinson and Kecman, 2003) uses a constant ε but, in order to consider the low subjective relevance of the high-frequency region, the corresponding coefficients are neglected. As a result, this approach only allocates support vectors in the low/medium frequency regions. The CSF-SVR approach uses a variable ε according to the CSF and gives rise to a more natural concentration of support vectors in the low/medium frequency region, which captures medium to high frequency details at lower compression rates (0.5 bits/pix). Note that the number of support vectors is bigger than in the RKi-1 approach, but it selects some necessary high-frequency coefficients to keep the error below the selected threshold. However, for bigger compression ratios (0.3 bits/pix), it misrepresents some high frequency, yet relevant, features (e.g., the peak from the stripes). The NL-SVM approach works in the non-linear transform domain, in which a more uniform coverage 62 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING (a) (b) (c) 0 200 10 100 0 200 5 5 15 0 0 20 30 25 20 10 20 30 fy (cpd) 10 0 fy (cpd) 30 10 fx (cpd) 0 200 5 10 100 0 200 5 10 100 15 25 10 30 20 30 f (cpd) y 10 0 30 10 0.5 20 30 25 20 5 1 0 0 20 30 0 1.5 15 15 0 fy (cpd) 30 0 fx (cpd) 20 25 20 0 fx (cpd) fx (cpd) 20 30 25 20 30 10 0.5 15 0 5 1 10 100 15 0 1.5 25 20 f (cpd) y 10 30 fy (cpd) 0 0 fx (cpd) fx (cpd) Figure 8: Signal in different domains and the selected support vectors by the SVM models in a block of the Barbara image at 0.3 bits/pix (top row) and 0.5 bits/pix (bottom row). Different domains are analyzed: (a) linear DCT using RKi-1, (b) linear DCT with CSF-SVM, and (c) non-linear perceptual domain with standard ε-SVM (NL-SVR). of the domain is done, accounting for richer (and perceptually independent) coefficients to perform efficient sparse signal reconstruction. It is important to remark that, for a given method (or domain), tightening ε f implies (1) considering more support vectors, and (2) an increase in entropy (top and bottom rows in Figure 8, 0.3 bpp to 0.5 bpp). However, note that the relevant measure is the entropy and not the number of support vectors: even though the number of selected support vectors in the r domain is higher, their variance is lower, thus giving rise to the same entropy after entropy coding. 6. Conclusions In this paper, we have reported a condition on the suitable domain for developing efficient SVM image coding schemes. The so-called diagonal Jacobian condition states that SVM regression with scalar-wise error restriction in a particular domain makes sense only if the transform that maps this domain to an independent coefficient representation is locally diagonal. We have demonstrated that, 63 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO in general, linear domains do not fulfill this condition because non-trivial statistical and perceptual inter-coefficient relations do exist in these domains. This theoretical finding has been experimentally confirmed by observing that improved compression results are obtained when SVM is applied in a non-linear perceptual domain that starts from the same linear domain used by previously reported SVM-based image coding schemes. These results highlight the relevance of an appropriate image representation choice before SVM learning. Further work is tied to the use of SVM-based coding schemes in statistically, rather than perceptually, independent non-linear ICA domains. In order to do so, local PCA instead of local ICA may be used in the local-to-global differential approach (Malo and Guti´ rrez, 2006) to speed up the e non-linear computation. Acknowledgments This work has been partly supported by the Spanish Ministry of Education and Science under grant CICYT TEC2006-13845/TCM and by the Generalitat Valenciana under grant GV-06/215. References R. Ahmed. Wavelet-based image compression using SVM learning and encoding techniques. Proc. 8th IASTED International Conference Computer Graphics and Imaging, 1:162–166, 2005. H. B. Barlow. Redundancy reduction revisited. Network: Comp. Neur. Sys., 12:241–253, 2001. H. B. Barlow. Possible principles underlying the transformation of sensory messages. In WA Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. A. J. Bell and T. J. Sejnowski. The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997. R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8(12):1688–1701, 1999. F. W. Campbell and J. G. Robson. Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3):551–566, August 1968. G. Camps-Valls, E. Soria-Olivas, J. P´ rez-Ruixo, A. Art´ s-Rodr´guez, F. P´ rez-Cruz, and e e ı e A. Figueiras-Vidal. A profile-dependent kernel-based regression for cyclosporine concentration prediction. In Neural Information Processing Systems (NIPS) – Workshop on New Directions in Kernel-Based Learning Methods, Vancouver, Canada, December 2001. R. J. Clarke. Transform Coding of Images. Academic Press, New York, 1985. I. Epifanio, J. Guti´ rrez, and J. Malo. Linear transform for simultaneous diagonalization of coe variance and perceptual metric matrix in image coding. Pattern Recognition, 36(8):1799–1811, August 2003. J. M. Foley. Human luminance pattern mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11(6):1710–1719, 1994. 64 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Press, Boston, 1992. B. Girod. What’s wrong with mean-squared error. In A. B. Watson, editor, Digital Images and Human Vision, pages 207–220. The MIT press, 1993. G. G´ mez-P´ rez, G. Camps-Valls, J. Guti´ rrez, and J. Malo. Perceptually adaptive insensitivity for o e e support vector machine image coding. IEEE Transactions on Neural Networks, 16(6):1574–1581, 2005. J. Guti´ rrez, F. J. Ferri, and J. Malo. Regularization operators for natural images based on nonlinear e perception models. IEEE Transactions on Image Processing, 15(1):189–2000, January 2006. D. J. Heeger. Normalization of cell responses in cat striate cortex. Vis. Neurosci., 9:181–198, 1992. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, New York, 2001. A. Hyvarinen, J. Hurri, and J. Vayrynenm. Bubbles: a unifying framework for low-level statistical properties of natural image sequences. Journal of the Optical Society of America A, 20(7), 2003. R. Jiao, Y. Li, Q. Wang, and B. Li. SVM regression and its application to image compression. In International Conference on Intelligent Computing, volume 3645, pages 747–756. Lecture Notes on Computer Science, 2005. C. Jutten and J. Karhunen. Advances in nonlinear blind source separation. Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 245– 256, 2003. J. Karhunen, S. Malaroiu, and M. Ilmoniemi. Local linear independent component analysis based on clustering. Intl. J. Neur. Syst., 10(6), December 2000. J. K. Lin. Factorizing probability density functions: Generalizing ICA. In Proc. of the First Intl. Workshop on ICA and Signal Separation, pages 313–318, Aussois, France, 1999. J. Malo and J. Guti´ rrez. V1 non-linear properties emerge from local-to-global non-linear ICA. e Network: Computation in Neural Systems, 17:85–102, 2006. J. Malo, A. M. Pons, A. Felipe, and J. M. Artigas. Characterization of human visual system threshold performance by a weighting function in the Gabor domain. Journal of Modern Optics, 44(1): 127–148, 1997. J. Malo, F. Ferri, J. Albert, J. Soret, and J. M. Artigas. The role of perceptual contrast non-linearities in image transform coding. Image & Vision Computing, 18(3):233–246, February 2000. J. Malo, J. Guti´ rrez, I. Epifanio, F. Ferri, and J. M. Artigas. Perceptual feed-back in multigrid moe tion estimation using an improved DCT quantization. IEEE Transactions on Image Processing, 10(10):1411–1427, October 2001. J. Malo, I. Epifanio, R. Navarro, and E. Simoncelli. Non-linear image representation for efficient perceptual coding. IEEE Transactions on Image Processing, 15(1):68–80, 2006. 65 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996. F. P´ rez-Cruz, G. Camps-Valls, E. Soria-Olivas, J. J. P´ rez-Ruixo, A. R. Figueiras-Vidal, and e e A. Art´ s-Rodr´guez. Multi-dimensional function approximation and regression estimation. In e ı International Conference on Artificial Neural Networks, ICANN2002, Madrid, Spain., Aug 2002. Lecture Notes in Computer Science. Springer–Verlag. J. Robinson and V. Kecman. The use of Support Vector Machines in image compression. In Proceedings of the International Conference on Engineering Intelligence Systems, EIS’2000, volume 36, pages 93–96, University of Paisley, Scotland, U.K., June 2000. J. Robinson and V. Kecman. Combining Support Vector Machine learning with the discrete cosine transform in image compression. IEEE Transactions Neural Networks, 14(4):950–958, July 2003. O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, 2001. E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13(2):144–149, 2003. E. P. Simoncelli. Statistical models for images: Compression, restoration and synthesis. In 31st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1997. E. P. Simoncelli and B. O. Olshausen. Natural image statistics and neural representation. Annual Review of Neuroscience, 24:1193–1216, 2001. A. J. Smola and B. Sch¨ lkopf. A tutorial on support vector regression. Statistics and Computing, o 14:199–222, 2004. D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Boston, 2001. P. C. Teo and D. J. Heeger. Perceptual image distortion. Proceedings of the First IEEE International Conference on Image Processing, 2:982–986, 1994. M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in analyzing and modeling natural images. Applied and Computational Harmonic Analysis, 11:89–123, 2001. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. A. B. Watson and J. Malo. Video quality measures based on the standard spatial observer. In Proceedings of the IEEE International Conference on Image Proceedings, volume 3, pages 41– 44, 2002. A. B. Watson and J. A. Solomon. A model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A, 14(9):2379–2391, September 1997. 66
Reference: text
sentIndex sentText sentNum sentScore
1 Enginyeria Electr` nica o Universitat de Val` ncia e 46100 Burjassot, Val` ncia, Spain e Juan Guti´ rrez e JUAN . [sent-4, score-0.318]
2 Inform` tica a Universitat de Val` ncia e 46100 Burjassot, Val` ncia, Spain e Gabriel G´ mez-P´ rez o e GABRIEL . [sent-7, score-0.252]
3 d’Optica Universitat de Val` ncia e 46100 Burjassot, Val` ncia, Spain e Editor: Donald Geman Abstract Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. [sent-13, score-0.698]
4 Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the ε-insensitivity zone in each dimension of the selected image representation domain. [sent-14, score-0.407]
5 Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. [sent-15, score-0.691]
6 These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. [sent-16, score-0.472]
7 Taking into account these relations would imply using non-rectangular εinsensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. [sent-17, score-0.226]
8 In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. [sent-18, score-0.362]
9 We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. [sent-19, score-0.548]
10 This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). [sent-20, score-1.069]
11 These results highlight the relevance of an appropriate choice of the image representation before SVM learning. [sent-21, score-0.214]
12 Keywords: image coding, non-linear perception models, statistical independence, support vector machines, insensitivity zone c 2008 Gustavo Camps-Valls, Juan Guti´ rrez, Gabriel G´ mez-P´ rez and Jes´ s Malo. [sent-22, score-0.528]
13 SVMs have been used in the spatial domain (Robinson and Kecman, 2000), in the block-DCT domain (Robinson and Kecman, 2003), and in the wavelet domain (Ahmed, 2005; Jiao et al. [sent-25, score-0.319]
14 These coding methods take advantage of the ability of the support vector regression (SVR) algorithm for function approximation using a small number of parameters (signal samples, or ¨ support vectors) (Smola and Scholkopf, 2004). [sent-27, score-0.17]
15 In all current SVM-based image coding techniques, a representation of the image is described by the entropy-coded weights associated to the support vectors necessary to approximate the signal with a given accuracy. [sent-28, score-0.563]
16 In a given representation domain, reducing the number of support vectors increases the compression ratio at the expense of bigger distortion (lower image quality). [sent-30, score-0.409]
17 By applying the standard SVR formulation, a certain amount of distortion in each sample of the image representation is allowed. [sent-31, score-0.244]
18 ´ Recently, this procedure has been refined by Gomez-P´ rez et al. [sent-33, score-0.186]
19 This frequency-dependent insensitivity, ε f , accounts for the fact that, according to simple (linear) perception models, not every sample in linear frequency domains (such as DCT or wavelets) contributes to the perceived distortion in the same way. [sent-36, score-0.275]
20 Despite different domains have been proposed for SVM training (spatial domain, block-DCT and wavelets) and different ε insensitivities per sample have been proposed, in conventional SVR formulation, the particular distortions introduced by regression in the different samples are not coupled. [sent-37, score-0.293]
21 In all the reported SVM-based image coding schemes, the RBF kernel is used and the penalization parameter is fixed to an arbitrarily large value. [sent-38, score-0.301]
22 Therefore, in all the reported SVM-based coding methods, these rectangles are always oriented along the axes of the (linear) image representation. [sent-41, score-0.301]
23 This scalar-wise strategy is not the best option in domains where the different dimensions of the image representation are not independent. [sent-45, score-0.273]
24 For instance, consider the situation where actually independent components, r f , are obtained from a given image representation, y, applying some eventually non-linear transform, R: R y −→ r. [sent-46, score-0.155]
25 Therefore, if transforms that achieve independence have non-diagonal Jacobian, scalar-wise restrictions in the original (coupled coefficients) domain y are not allowed. [sent-49, score-0.193]
26 Therefore, the original y domain is not suitable for the direct application of conventional SVM, and consequently, non-trivial coupled insensitivity regions are required. [sent-55, score-0.239]
27 Summarizing, in the image coding context, the condition for an image representation y to be strictly suitable for conventional SVM learning is that the transform that maps the original representation y to an independent coefficient representation r must be locally diagonal. [sent-56, score-0.648]
28 As will be reviewed below, independence among coefficients (and the transforms to obtain them) may be defined in both statistical and perceptual terms (Hyvarinen et al. [sent-57, score-0.507]
29 In this work, we show that conventional linear domains do not fulfill the diagonal Jacobian condition in either the statistical case or in the perceptual case. [sent-64, score-0.565]
30 This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains (Robinson and Kecman, 2003; G´ mez-P´ rez et al. [sent-65, score-0.3]
31 , 2005) and in a recently proposed non-linear perceptual domain o e that simultaneously reduces the statistical and the perceptual relations (Malo et al. [sent-66, score-0.955]
32 , 2006), thus, this non-linear perceptual domain is closer to fulfilling the proposed condition. [sent-67, score-0.5]
33 Section 2 reviews the fact that linear coefficients of the image representations commonly used for SVM training are neither statistically independent nor perceptually independent. [sent-69, score-0.367]
34 Section 3 shows that transforms for obtaining statistical and/or perceptual independence from linear domains have non-diagonal Jacobian. [sent-70, score-0.589]
35 In Section 4, we propose the use of a perceptual representation for SVM training because it strictly fulfills the diagonal Jacobian condition in the perceptual sense and increases the statistical independence among coefficients, bringing it closer to fulfilling the condition in the statistical sense. [sent-72, score-0.922]
36 The experimental image coding results confirm the superiority of this domain for SVM training in Section 5. [sent-73, score-0.362]
37 Perceptual independence refers to the fact that the visibility of errors in coefficients of an image may depend on the energy of neighboring coefficients, a phenomenon known in the perceptual literature as masking or facilitation (Watson and Solomon, 1997). [sent-81, score-0.753]
38 Perceptual dependence has been formalized just up to second order, and this may be described by the non-Euclidean nature of the perceptual metric matrix (Malo et al. [sent-82, score-0.407]
39 The basis functions obtained when these methods are applied to images are spatially localized and selective for both orientation and spatial frequency (Olshausen and Field, 1996; Bell and Sejnowski, 1997). [sent-89, score-0.229]
40 Intuitively, independence would seem unlikely, since images are not formed from linear superpositions of independent patterns: the typical combination rule for the elements of an image is occlusion. [sent-92, score-0.249]
41 In this perceptual case, T is also a linear filter bank applied to the original input image in the spatial domain. [sent-124, score-0.602]
42 The second transform, R, is a non-linear function that accounts for the masking and facilitation phenomena that have been reported in the linear y domain (Foley, 1994; Watson and Solomon, 1997). [sent-126, score-0.186]
43 This class of models is based on psychophysical experiments assuming that the last domain, r, is perceptually Euclidean (i. [sent-129, score-0.186]
44 An additional confirmation of this assumption is the success of (Euclidean) subjective image distortion measures defined in that domain (Teo and Heeger, 1994). [sent-132, score-0.336]
45 Straightforward application of Riemannian geometry to obtain the perceptual metric matrix in other domains shows that the coefficients of linear domains x and y, or any other linear transform of them, are not perceptually independent (Epifanio et al. [sent-133, score-0.826]
46 Figure 3 illustrates the presence of perceptual relations between coefficients when using linear block frequency or wavelet-like domains, y: the cross-masking behavior. [sent-135, score-0.593]
47 In this example, the visibility of the distortions added on top of the background image made of periodic patterns has to be assessed. [sent-136, score-0.316]
48 This is a measure of the sensitivity of a particular perceptual mechanism to distortions in that dimension, ∆y f , when mechanisms tuned to other dimensions are simultaneously active, that is, y f = 0, with f = f . [sent-137, score-0.538]
49 As can be observed, low frequency noise is more visible in high frequency backgrounds than in low frequency backgrounds (e. [sent-138, score-0.395]
50 Similarly, high frequency noise is more visible in low frequency backgrounds than in high frequency ones (e. [sent-141, score-0.367]
51 That is to say, a signal of a specific frequency strongly masks the corresponding frequency analyzer, but it induces a smaller sensitivity reduction in the analyzers that are tuned to different frequencies. [sent-144, score-0.278]
52 In other words, the reduction in sensitivity of a specific analyzer gets larger as the distance between the background frequency and the frequency of the analyzer gets smaller. [sent-145, score-0.314]
53 The response of each frequency analyzer not only depends on the energy of the signal for that frequency band, but also on the energy of the signal in other frequency bands (cross-masking). [sent-146, score-0.601]
54 This implies that a different amount of noise in each frequency band may be acceptable depending on the energy of that frequency band and on the energy of neighboring bands. [sent-147, score-0.34]
55 This is what we have called perceptual dependence among different coefficients in the y domain. [sent-148, score-0.407]
56 At this point, it is important to stress the similarity between the set of computations to obtain statistically decoupled image coefficients and the known stages of biological vision. [sent-149, score-0.181]
57 Equal energy noise of different frequency content, 3 cycl/deg (cpd), 6 cpd, 12 cpd and 24 cpd, shown on top of a background image. [sent-153, score-0.476]
58 Sampling frequency assumes that these images subtend an angle of 3 deg. [sent-154, score-0.164]
59 stage T is used in transform coding algorithms and in vision models (Olshausen and Field, 1996; Bell and Sejnowski, 1997; Taubman and Marcellin, 2001), and new evidence supports the same idea for the second non-linear stage (Schwartz and Simoncelli, 2001; Malo and Guti´ rrez, 2006). [sent-155, score-0.183]
60 e According to this, the statistical and perceptual transforms, R, that remove the above relations from the linear domains, y, would be very similar if not the same. [sent-156, score-0.455]
61 , divisive normalization), have non-diagonal Jacobian for any linear image representation, so they are not strictly suitable for conventional SVM training. [sent-162, score-0.267]
62 Note that the width of the perceptual interaction neighborhood increases with the frequency in the same way as the width of the statistical interaction neighborhood shown in Figure 2. [sent-178, score-0.6]
63 Note that the intrinsic non-linear nature of both the statistical and perceptual transforms, Equations (3) and (5), makes the above results true for any linear domain under consideration. [sent-181, score-0.5]
64 To summarize, since no linear domain fulfills the diagonal Jacobian condition in either statistical or perceptual terms, the negative situation illustrated in Figure 1 may occur when using SVM in these domains. [sent-183, score-0.529]
65 4 cpd −30 −20 −20 −10 −10 f (cpd) −30 0 y 0 y f (cpd) 12 f (frequency cycles/degree) 10 10 20 20 30 −30 −20 −10 0 fx (cpd) 10 20 30 −30 30 −20 −10 0 fx (cpd) 10 20 30 Figure 4: Parameters of the perceptual model: α f (top left), β f (top right). [sent-199, score-0.835]
66 Bottom figures represent perceptual interaction neighborhoods h f f of two particular coefficients of the local Fourier domain. [sent-200, score-0.447]
67 , block-DCT or wavelets) and in one of the proposed non-linear domains (either the statistically independent domain or the perceptually independent domain). [sent-203, score-0.387]
68 In contrast, the perceptual non-linearity (and its inverse) are analytical. [sent-206, score-0.407]
69 In this paper, we explore the use of a psychophysically-based divisive normalized domain: first compute a block-DCT transform and then apply the divisive normalization model described above for each block. [sent-209, score-0.231]
70 The results will be compared to the first competitive SVM coding results (Robinson 57 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO ´ and Kecman, 2003) and the posterior improvements reported by G omez-P´ rez et al. [sent-210, score-0.332]
71 As stated in Section 2, by construction, the proposed domain is perceptually Euclidean with perceptually independent components. [sent-212, score-0.465]
72 The Euclidean nature of this domain has an additional benefit: the ε-insensitivity design is very simple because a constant value is appropriate due to the constant perceptual relevance of all coefficients. [sent-213, score-0.523]
73 Moreover, beyond its built-in perceptual benefits, this psychophysically-based divisive normalization has attractive statistical properties: it strongly reduces the mutual information between the final coefficients r (Malo et al. [sent-215, score-0.504]
74 Performance of SVM Learning in Different Domains In this section, we analyze the performance of SVM-based coding algorithms in linear and nonlinear domains through rate-distortion curves and explicit examples for visual comparison. [sent-222, score-0.225]
75 In addition, we discuss how SVM selects support vectors in these domains to represent the image features. [sent-223, score-0.288]
76 , 2006) and the also perceptually meaningful Structural SIMilarity (SSIM) index (Wang et al. [sent-241, score-0.186]
77 When using more appropriate (perceptually meaningful) quality measures (Figures 5(b)-(c)), the CSF-SVR obtains a certain advantage over the RKi-1 algorithm for all compression rates, which was already reported by G´ mez-P´ rez et al. [sent-254, score-0.308]
78 The visual inspection confirms that the numerical gain in MPE and SSIM shown in Figure 5 is also perceptually significant. [sent-266, score-0.215]
79 ´ First, as previously reported by Gomez-P´ rez et al. [sent-268, score-0.218]
80 Figure 7 shows the results obtained by all considered methods at a very high compression ratio for the Barbara image (0. [sent-275, score-0.245]
81 This is due to a better distribution of support vectors in the perceptually independent domain. [sent-280, score-0.237]
82 3 Support Vector Distribution The observed different perceptual image quality obtained with each approach is a direct consequence of support vector distribution in different domains. [sent-282, score-0.59]
83 of the distribution of the selected support vectors by the RKi-1 and the CSF-SVR models working in the linear DCT domain, and the NL-SVM working in the perceptually independent non-linear domain r. [sent-320, score-0.33]
84 As a result, this approach only allocates support vectors in the low/medium frequency regions. [sent-323, score-0.164]
85 The CSF-SVR approach uses a variable ε according to the CSF and gives rise to a more natural concentration of support vectors in the low/medium frequency region, which captures medium to high frequency details at lower compression rates (0. [sent-324, score-0.367]
86 5 15 15 0 fy (cpd) 30 0 fx (cpd) 20 25 20 0 fx (cpd) fx (cpd) 20 30 25 20 30 10 0. [sent-333, score-0.226]
87 5 25 20 f (cpd) y 10 30 fy (cpd) 0 0 fx (cpd) fx (cpd) Figure 8: Signal in different domains and the selected support vectors by the SVM models in a block of the Barbara image at 0. [sent-335, score-0.478]
88 Different domains are analyzed: (a) linear DCT using RKi-1, (b) linear DCT with CSF-SVM, and (c) non-linear perceptual domain with standard ε-SVM (NL-SVR). [sent-338, score-0.582]
89 of the domain is done, accounting for richer (and perceptually independent) coefficients to perform efficient sparse signal reconstruction. [sent-339, score-0.331]
90 However, note that the relevant measure is the entropy and not the number of support vectors: even though the number of selected support vectors in the r domain is higher, their variance is lower, thus giving rise to the same entropy after entropy coding. [sent-343, score-0.172]
91 Conclusions In this paper, we have reported a condition on the suitable domain for developing efficient SVM image coding schemes. [sent-345, score-0.394]
92 The so-called diagonal Jacobian condition states that SVM regression with scalar-wise error restriction in a particular domain makes sense only if the transform that maps this domain to an independent coefficient representation is locally diagonal. [sent-346, score-0.32]
93 We have demonstrated that, 63 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO in general, linear domains do not fulfill this condition because non-trivial statistical and perceptual inter-coefficient relations do exist in these domains. [sent-347, score-0.537]
94 This theoretical finding has been experimentally confirmed by observing that improved compression results are obtained when SVM is applied in a non-linear perceptual domain that starts from the same linear domain used by previously reported SVM-based image coding schemes. [sent-348, score-0.984]
95 These results highlight the relevance of an appropriate image representation choice before SVM learning. [sent-349, score-0.214]
96 Wavelet-based image compression using SVM learning and encoding techniques. [sent-355, score-0.245]
97 Linear transform for simultaneous diagonalization of coe variance and perceptual metric matrix in image coding. [sent-411, score-0.631]
98 Perceptually adaptive insensitivity for o e e support vector machine image coding. [sent-436, score-0.282]
99 The role of perceptual contrast non-linearities in image transform coding. [sent-518, score-0.631]
100 Combining Support Vector Machine learning with the discrete cosine transform in image compression. [sent-567, score-0.224]
wordName wordTfidf (topN-words)
[('perceptual', 0.407), ('malo', 0.318), ('cpd', 0.306), ('rrez', 0.252), ('svr', 0.204), ('perceptually', 0.186), ('rez', 0.186), ('guti', 0.175), ('image', 0.155), ('distortions', 0.131), ('coding', 0.114), ('ica', 0.113), ('frequency', 0.113), ('simoncelli', 0.11), ('svm', 0.101), ('alo', 0.099), ('amps', 0.099), ('csf', 0.099), ('insensitivity', 0.099), ('domain', 0.093), ('jacobian', 0.091), ('compression', 0.09), ('bpp', 0.088), ('mage', 0.088), ('oding', 0.088), ('omain', 0.088), ('uitable', 0.088), ('coef', 0.085), ('cients', 0.085), ('omez', 0.084), ('jpeg', 0.084), ('uti', 0.084), ('barbara', 0.084), ('watson', 0.083), ('domains', 0.082), ('dct', 0.075), ('raining', 0.074), ('transform', 0.069), ('hyvarinen', 0.069), ('epifanio', 0.066), ('kecman', 0.066), ('ncia', 0.066), ('solomon', 0.066), ('divisive', 0.065), ('fx', 0.061), ('transforms', 0.057), ('energy', 0.057), ('robinson', 0.057), ('rki', 0.055), ('distortion', 0.053), ('signal', 0.052), ('images', 0.051), ('olshausen', 0.05), ('val', 0.05), ('relations', 0.048), ('conventional', 0.047), ('analyzer', 0.044), ('heeger', 0.044), ('karhunen', 0.044), ('mpe', 0.044), ('ssim', 0.044), ('independence', 0.043), ('fy', 0.043), ('interaction', 0.04), ('spatial', 0.04), ('lena', 0.037), ('representation', 0.036), ('subjective', 0.035), ('ful', 0.034), ('wavelets', 0.033), ('masking', 0.033), ('buccigrossi', 0.033), ('burjassot', 0.033), ('ferri', 0.033), ('gustavo', 0.033), ('insensitivities', 0.033), ('kbytes', 0.033), ('universitat', 0.033), ('zone', 0.033), ('reported', 0.032), ('normalization', 0.032), ('visibility', 0.03), ('visual', 0.029), ('diagonal', 0.029), ('support', 0.028), ('juan', 0.028), ('sensory', 0.028), ('facilitation', 0.028), ('backgrounds', 0.028), ('nl', 0.027), ('perception', 0.027), ('statistically', 0.026), ('localized', 0.025), ('america', 0.025), ('teo', 0.025), ('gabriel', 0.025), ('block', 0.025), ('bigger', 0.024), ('relevance', 0.023), ('vectors', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 73 jmlr-2008-On the Suitable Domain for SVM Training in Image Coding
Author: Gustavo Camps-Valls, Juan Gutiérrez, Gabriel Gómez-Pérez, Jesús Malo
Abstract: Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the ε-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular εinsensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning. Keywords: image coding, non-linear perception models, statistical independence, support vector machines, insensitivity zone c 2008 Gustavo Camps-Valls, Juan Guti´ rrez, Gabriel G´ mez-P´ rez and Jes´ s Malo. e o e u ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 1. Problem Statement: The Diagonal Jacobian Condition Image coding schemes based on support vector machines (SVM) have been successfully introduced in the literature. SVMs have been used in the spatial domain (Robinson and Kecman, 2000), in the block-DCT domain (Robinson and Kecman, 2003), and in the wavelet domain (Ahmed, 2005; Jiao et al., 2005). These coding methods take advantage of the ability of the support vector regression (SVR) algorithm for function approximation using a small number of parameters (signal samples, or ¨ support vectors) (Smola and Scholkopf, 2004). In all current SVM-based image coding techniques, a representation of the image is described by the entropy-coded weights associated to the support vectors necessary to approximate the signal with a given accuracy. Relaxing the accuracy bounds reduces the number of needed support vectors. In a given representation domain, reducing the number of support vectors increases the compression ratio at the expense of bigger distortion (lower image quality). By applying the standard SVR formulation, a certain amount of distortion in each sample of the image representation is allowed. In the original formulation, scalar restrictions on the errors are introduced using a constant ε-insensitivity value for every sample. ´ Recently, this procedure has been refined by Gomez-P´ rez et al. (2005) using a profile-dependent e SVR (Camps-Valls et al., 2001) that considers a different ε for each sample or frequency. This frequency-dependent insensitivity, ε f , accounts for the fact that, according to simple (linear) perception models, not every sample in linear frequency domains (such as DCT or wavelets) contributes to the perceived distortion in the same way. Despite different domains have been proposed for SVM training (spatial domain, block-DCT and wavelets) and different ε insensitivities per sample have been proposed, in conventional SVR formulation, the particular distortions introduced by regression in the different samples are not coupled. In all the reported SVM-based image coding schemes, the RBF kernel is used and the penalization parameter is fixed to an arbitrarily large value. In this setting, considering n-sample signals as n-dimensional vectors, the SVR guarantees that the approximated vectors are confined in n-dimensional rectangles around the original vectors. These rectangles are just n-dimensional cubes in the standard formulation or they have certain elongation if different ε f are considered in each axis, f . Therefore, in all the reported SVM-based coding methods, these rectangles are always oriented along the axes of the (linear) image representation. According to this, a common feature of these (scalar-wise) approaches is that they give rise to decoupled distortions in each dimension. P´ rez-Cruz et al. (2002) proposed a hyperspherical insensitivity zone to correct the penalization e factor in each dimension of multi-output regression problems, but again, restrictions to each sample were still uncoupled. This scalar-wise strategy is not the best option in domains where the different dimensions of the image representation are not independent. For instance, consider the situation where actually independent components, r f , are obtained from a given image representation, y, applying some eventually non-linear transform, R: R y −→ r. In this case, SVM regression with scalar-wise error restriction makes sense in the r domain. However, the original y domain will not be suitable for the standard SVM regression unless the matrix ∇R is diagonal (up to any permutation of the dimensions, that is, only one non-zero element per row). Therefore, if transforms that achieve independence have non-diagonal Jacobian, scalar-wise restrictions in the original (coupled coefficients) domain y are not allowed. 50 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING y2 1 −0,5 −0,5 0,4 R= r2 y1 r1 R−1 = 0,83 −0,83 0,67 −1,67 Figure 1: Insensitivity regions in different representation domains, y (left) and r (right), related by a non-diagonal transform ∇R and its inverse ∇R−1 . Figure 1 illustrates this situation. The shaded region in the right plot (r domain) represents the n-dimensional box determined by the ε f insensitivities in each dimension ( f =1,2), in which a scalar-wise approach is appropriate due to independence among signal coefficients. Given that the particular ∇R transform is not diagonal, the corresponding shaded region in the left plot (the original y domain) is not aligned along the axes of the representation. This has negative implications: note that for the highlighted points, smaller distortions in both dimensions in the y domain (as implied by SVM with tighter but scalar ε f insensitivities) do not necessarily imply lying inside the insensitivity region in the final truly independent (and meaningful) r domain. Therefore, the original y domain is not suitable for the direct application of conventional SVM, and consequently, non-trivial coupled insensitivity regions are required. Summarizing, in the image coding context, the condition for an image representation y to be strictly suitable for conventional SVM learning is that the transform that maps the original representation y to an independent coefficient representation r must be locally diagonal. As will be reviewed below, independence among coefficients (and the transforms to obtain them) may be defined in both statistical and perceptual terms (Hyvarinen et al., 2001; Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). On the one hand, a locally diagonal relation to a statistically independent representation is desirable because independently induced distortions (as the conventional SVM approach does) will preserve the statistics of the distorted signal, that is, it will not introduce artificial-looking artifacts. On the other hand, a locally diagonal relation to a perceptually 51 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO independent representation is desirable because independently induced distortions do not give rise to increased subjective distortions due to non-trivial masking or facilitation interactions between the distortions in each dimension (Watson and Solomon, 1997). In this work, we show that conventional linear domains do not fulfill the diagonal Jacobian condition in either the statistical case or in the perceptual case. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains (Robinson and Kecman, 2003; G´ mez-P´ rez et al., 2005) and in a recently proposed non-linear perceptual domain o e that simultaneously reduces the statistical and the perceptual relations (Malo et al., 2006), thus, this non-linear perceptual domain is closer to fulfilling the proposed condition. The rest of the paper is structured as follows. Section 2 reviews the fact that linear coefficients of the image representations commonly used for SVM training are neither statistically independent nor perceptually independent. Section 3 shows that transforms for obtaining statistical and/or perceptual independence from linear domains have non-diagonal Jacobian. This suggests that there is room to improve the performance of conventional SVM learning reported in linear domains. In Section 4, we propose the use of a perceptual representation for SVM training because it strictly fulfills the diagonal Jacobian condition in the perceptual sense and increases the statistical independence among coefficients, bringing it closer to fulfilling the condition in the statistical sense. The experimental image coding results confirm the superiority of this domain for SVM training in Section 5. Section 6 presents the conclusions and final remarks. 2. Statistical and Perceptual Relations Among Image Coefficients Statistical independence among the coefficients of a signal representation refers to the fact that the joint PDF of the class of signals to be considered can be expressed as a product of the marginal PDFs in each dimension (Hyvarinen et al., 2001). Simple (second-order) descriptions of statistical dependence use the non-diagonal nature of the covariance matrix (Clarke, 1985; Gersho and Gray, 1992). More recent and accurate descriptions use higher-order moments, mutual information, or the non-Gaussian nature (sparsity) of marginal PDFs (Hyvarinen et al., 2001; Simoncelli, 1997). Perceptual independence refers to the fact that the visibility of errors in coefficients of an image may depend on the energy of neighboring coefficients, a phenomenon known in the perceptual literature as masking or facilitation (Watson and Solomon, 1997). Perceptual dependence has been formalized just up to second order, and this may be described by the non-Euclidean nature of the perceptual metric matrix (Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). 2.1 Statistical Relations In recent years, a variety of approaches, known collectively as “independent component analysis” (ICA), have been developed to exploit higher-order statistics for the purpose of achieving a unique linear solution for coefficient independence (Hyvarinen et al., 2001). The basis functions obtained when these methods are applied to images are spatially localized and selective for both orientation and spatial frequency (Olshausen and Field, 1996; Bell and Sejnowski, 1997). Thus, they are similar to basis functions of multi-scale wavelet representations. Despite its name, linear ICA does not actually produce statistically independent coefficients when applied to photographic images. Intuitively, independence would seem unlikely, since images are not formed from linear superpositions of independent patterns: the typical combination rule for the elements of an image is occlusion. Empirically, the coefficients of natural image decom52 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING | f | = 10.8 cpd | f | = 24.4 cpd −20 −10 −10 fy (cpd) −30 −20 fy (cpd) −30 0 0 10 10 20 20 30 30 −30 −20 −10 0 f (cpd) 10 20 30 −30 x −20 −10 0 f (cpd) 10 20 30 x Figure 2: Statistical interaction of two particular coefficients of the local Fourier Transform with their neighbors in a natural image database. The absolute value of the frequency of these coefficients is | f | = 10.8 and | f | = 24.4 cycles/degree (cpd). positions in spatially localized oscillating basis functions are found to be fairly well decorrelated (i.e., their covariance is almost zero). However, the amplitudes of coefficients at nearby spatial positions, orientations, and scales are highly correlated (even with orthonormal transforms) (Simoncelli, ´ 1997; Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003; Guti errez et al., 2006; Malo et al., 2006; Malo and Guti´ rrez, 2006). This suggests that achieving statistical e independence requires the introduction of non-linearities beyond linear ICA transforms. Figure 2 reproduces one of many results that highlight the presence of statistical relations of natural image coefficients in block PCA or linear ICA-like domains: the energy of spatially localized oscillating filters is correlated with the energy of neighboring filters in scale and orientation (see Guti´ rrez et al., 2006). A remarkable feature is that the interaction width increases with frequency, e as has been reported in other domains, for example, wavelets (Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003), and block-DCT (Malo et al., 2006). In order to remove the remaining statistical relations in the linear domains y, non-linear ICA methods are necessary (Hyvarinen et al., 2001; Lin, 1999; Karhunen et al., 2000; Jutten and Karhunen, 2003). Without lack of generality, non-linear ICA transforms can be schematically understood as a two-stage process (Malo and Guti´ rrez, 2006): e T R (( y hh x hh (( r, (1) R−1 T−1 where x is the image representation in the spatial domain, and T is a global unitary linear transform that removes second-order and eventually higher-order relations among coefficients in the spatial domain. Particular examples of T include block PCA, linear ICAs, DCT or wavelets. In the ICA literature notation, T is the separating matrix and T−1 is the mixing matrix. The second transform 53 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO R is an additional non-linearity that is introduced in order to remove the statistical relations that still remain in the y domain. 2.2 Perceptual Relations Perceptual dependence among coefficients in different image representations can be understood by using the current model of V1 cortex. This model can also be summarized by the two-stage (linear and non-linear) process described in Equation (1). In this perceptual case, T is also a linear filter bank applied to the original input image in the spatial domain. This filter bank represents the linear behavior of V1 neurons whose receptive fields happen to be similar to wavelets or linear ICA basis functions (Olshausen and Field, 1996; Bell and Sejnowski, 1997). The second transform, R, is a non-linear function that accounts for the masking and facilitation phenomena that have been reported in the linear y domain (Foley, 1994; Watson and Solomon, 1997). Section 3.2 gives a parametric expression for the second non-linear stage, R: the divisive normalization model (Heeger, 1992; Foley, 1994; Watson and Solomon, 1997). This class of models is based on psychophysical experiments assuming that the last domain, r, is perceptually Euclidean (i.e., perfect perceptual independence). An additional confirmation of this assumption is the success of (Euclidean) subjective image distortion measures defined in that domain (Teo and Heeger, 1994). Straightforward application of Riemannian geometry to obtain the perceptual metric matrix in other domains shows that the coefficients of linear domains x and y, or any other linear transform of them, are not perceptually independent (Epifanio et al., 2003). Figure 3 illustrates the presence of perceptual relations between coefficients when using linear block frequency or wavelet-like domains, y: the cross-masking behavior. In this example, the visibility of the distortions added on top of the background image made of periodic patterns has to be assessed. This is a measure of the sensitivity of a particular perceptual mechanism to distortions in that dimension, ∆y f , when mechanisms tuned to other dimensions are simultaneously active, that is, y f = 0, with f = f . As can be observed, low frequency noise is more visible in high frequency backgrounds than in low frequency backgrounds (e.g., left image). Similarly, high frequency noise is more visible in low frequency backgrounds than in high frequency ones (e.g., right image). That is to say, a signal of a specific frequency strongly masks the corresponding frequency analyzer, but it induces a smaller sensitivity reduction in the analyzers that are tuned to different frequencies. In other words, the reduction in sensitivity of a specific analyzer gets larger as the distance between the background frequency and the frequency of the analyzer gets smaller. The response of each frequency analyzer not only depends on the energy of the signal for that frequency band, but also on the energy of the signal in other frequency bands (cross-masking). This implies that a different amount of noise in each frequency band may be acceptable depending on the energy of that frequency band and on the energy of neighboring bands. This is what we have called perceptual dependence among different coefficients in the y domain. At this point, it is important to stress the similarity between the set of computations to obtain statistically decoupled image coefficients and the known stages of biological vision. In fact, it has been hypothesized that biological visual systems have organized their sensors to exploit the particular statistics of the signals they have to process. See Barlow (2001), Simoncelli and Olshausen (2001), and Simoncelli (2003) for reviews on this hypothesis. In particular, both the linear and the non-linear stages of the cortical processing have been successfully derived using redundancy reduction arguments: nowadays, the same class of linear 54 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 3 cpd 6 cpd 12 cpd 24 cpd Figure 3: Illustrative example of perceptual dependence (cross-masking phenomenon). Equal energy noise of different frequency content, 3 cycl/deg (cpd), 6 cpd, 12 cpd and 24 cpd, shown on top of a background image. Sampling frequency assumes that these images subtend an angle of 3 deg. stage T is used in transform coding algorithms and in vision models (Olshausen and Field, 1996; Bell and Sejnowski, 1997; Taubman and Marcellin, 2001), and new evidence supports the same idea for the second non-linear stage (Schwartz and Simoncelli, 2001; Malo and Guti´ rrez, 2006). e According to this, the statistical and perceptual transforms, R, that remove the above relations from the linear domains, y, would be very similar if not the same. 3. Statistical and Perceptual Independence Imply Non-diagonal Jacobian In this section, we show that both statistical redundancy reduction transforms (e.g., non-linear ICA) and perceptual independence transforms (e.g., divisive normalization), have non-diagonal Jacobian for any linear image representation, so they are not strictly suitable for conventional SVM training. 3.1 Non-diagonal Jacobian in Non-linear ICA Transforms One possible approach for dealing with global non-linear ICA is to act differentially by breaking the problem into local linear pieces that can then be integrated to obtain the global independent coefficient domain (Malo and Guti´ rrez, 2006). Each differential sub-problem around a particular e point (image) can be locally solved using the standard linear ICA methods restricted to the neighbors of that point (Lin, 1999). Using the differential approach in the context of a two-stage process such as the one in Equation (1), it can be shown that (Malo and Guti´ rrez, 2006): e r = r0 + Z x x0 T (x ) dx = r0 + Z x x0 ∇R(Tx ) T dx , (2) where T (x ) is the local separating matrix for a neighborhood of the image x , and T is the global separating matrix for the whole PDF. Therefore, the Jacobian of the second non-linear stage is: ∇R(y) = ∇R(Tx) = T (x) T−1 . 55 (3) ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO As local linear independent features around a particular image, x, differ in general from global linear independent features, that is, T (x) = T, the above product is not the identity nor diagonal in general. 3.2 Non-diagonal Jacobian in Non-linear Perceptual Transforms The current response model for the cortical frequency analyzers is non-linear (Heeger, 1992; Watson and Solomon, 1997). The outputs of the filters of the first linear stage, y, undergo a non-linear sigmoid transform in which the energy of each linear coefficient is weighted by a linear Contrast Sensitivity Function (CSF) (Campbell and Robson, 1968; Malo et al., 1997) and is further normalized by a combination of the energies of neighbor coefficients in frequency, r f = R(y) f = sgn(y f ) |α f y f |γ , β f + ∑n =1 h f f |α f y f |γ f (4) where α f (Figure 4[top left]) are CSF-like weights, β f (Figure 4[top right]) control the sharpness of the response saturation for each coefficient, γ is the so called excitation exponent, and the matrix h f f determines the interaction neighborhood in the non-linear normalization of the energy. This interaction matrix models the cross-masking behavior (cf. Section 2.2). The interaction in this matrix is assumed to be Gaussian (Watson and Solomon, 1997), and its width increases with the frequency. Figure 4[bottom] shows two examples of this Gaussian interaction for two particular coefficients in a local Fourier domain. Note that the width of the perceptual interaction neighborhood increases with the frequency in the same way as the width of the statistical interaction neighborhood shown in Figure 2. We used a value of γ = 2 in the experiments. Taking derivatives in the general divisive normalization model, Equation (4), we obtain ∇R(y) f f = sgn(y f )γ α f |α f y f |γ |α f y f |γ−1 α f |α f y f |γ−1 δf f − hf f β f + ∑n =1 h f f |α f y f |γ (β f + ∑n =1 h f f |α f y f |γ )2 f f , (5) which is not diagonal because of the interaction matrix, h, which describes the cross-masking between each frequency f and the remaining f = f . Note that the intrinsic non-linear nature of both the statistical and perceptual transforms, Equations (3) and (5), makes the above results true for any linear domain under consideration. Specifically, if any other possible linear domain for image representation is considered, y = T y, then the Jacobian of the corresponding independence transform, R , is ∇R (y ) = ∇R(y) T −1 , which, in general, will also be non-diagonal because of the non-diagonal and point-dependent nature of ∇R(y). To summarize, since no linear domain fulfills the diagonal Jacobian condition in either statistical or perceptual terms, the negative situation illustrated in Figure 1 may occur when using SVM in these domains. Therefore, improved results could be obtained if SVM learning were applied after some transform achieving independent coefficients, R. 4. SVM Learning in a Perceptually Independent Representation In order to confirm the above theoretical results (i.e., the unsuitability of linear representation domains for SVM learning) and to assess the eventual gain that can be obtained from training SVR 56 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 0.04 αf 0.015 βf 0.03 0.012 0.02 0.009 0.006 0.01 0.003 0 0 4 8 12 16 20 24 28 32 0 f (frequency cycles/degree) 4 8 | f | = 10.8 cpd 16 20 24 28 32 | f | = 24.4 cpd −30 −20 −20 −10 −10 f (cpd) −30 0 y 0 y f (cpd) 12 f (frequency cycles/degree) 10 10 20 20 30 −30 −20 −10 0 fx (cpd) 10 20 30 −30 30 −20 −10 0 fx (cpd) 10 20 30 Figure 4: Parameters of the perceptual model: α f (top left), β f (top right). Bottom figures represent perceptual interaction neighborhoods h f f of two particular coefficients of the local Fourier domain. in a more appropriate domain, we should compare the performance of SVRs in previously reported linear domains (e.g., block-DCT or wavelets) and in one of the proposed non-linear domains (either the statistically independent domain or the perceptually independent domain). Exploration of the statistical independence transform may have academic interest but, in its present formulation, it is not practical for coding purposes: direct application of non-linear ICA as in Equation (2) is very time-consuming for high dimensional vectors since lots of local ICA computations are needed to transform each block, and a very large image database is needed for a robust and significant computation of R. Besides, an equally expensive differential approach is also needed to compute the inverse R−1 for image decoding. In contrast, the perceptual non-linearity (and its inverse) are analytical. These analytical expressions are feasible for reasonable block sizes, and there are efficient iterative methods that can be used for larger vectors (Malo et al., 2006). In this paper, we explore the use of a psychophysically-based divisive normalized domain: first compute a block-DCT transform and then apply the divisive normalization model described above for each block. The results will be compared to the first competitive SVM coding results (Robinson 57 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO ´ and Kecman, 2003) and the posterior improvements reported by G omez-P´ rez et al. (2005), both e formulated in the linear block-DCT domain. As stated in Section 2, by construction, the proposed domain is perceptually Euclidean with perceptually independent components. The Euclidean nature of this domain has an additional benefit: the ε-insensitivity design is very simple because a constant value is appropriate due to the constant perceptual relevance of all coefficients. Thus, direct application of the standard SVR method is theoretically appropriate in this domain. Moreover, beyond its built-in perceptual benefits, this psychophysically-based divisive normalization has attractive statistical properties: it strongly reduces the mutual information between the final coefficients r (Malo et al., 2006). This is not surprising according to the hypothesis that try to explain the early stages of biological vision systems using information theory arguments (Barlow, 1961; Simoncelli and Olshausen, 2001). Specifically, dividing the energy of each linear coefficient by the energy of the neighbors, which are statistically related with it, cf. Figure 2, gives coefficients with reduced statistical dependence. Moreover, as the empirical non-linearities of perception have been reproduced using non-linear ICA in Equation (2) (Malo and Guti´ rrez, 2006), the empirical die visive normalization can be seen as a convenient parametric way to obtain statistical independence. 5. Performance of SVM Learning in Different Domains In this section, we analyze the performance of SVM-based coding algorithms in linear and nonlinear domains through rate-distortion curves and explicit examples for visual comparison. In addition, we discuss how SVM selects support vectors in these domains to represent the image features. 5.1 Model Development and Experimental Setup In the (linear) block-DCT domain, y, we use the method introduced by Robinson and Kecman (2003) (RKi-1), in which the SVR is trained to learn a fixed (low-pass) number of DCT coefficients ´ (those with frequency bigger than 20 cycl/deg are discarded); and the method proposed by G omezP´ rez et al. (2005) (CSF-SVR), in which the relevance of all DCT coefficients is weighted according e to the CSF criterion using an appropriately modulated ε f . In the non-linear domain, r, we use the SVR with constant insensitivity parameter ε (NL-SVR). In all cases, the block-size is 16×16, that is, y, r ∈ R256 . The behavior of JPEG standard is also included in the experiments for comparison purposes. As stated in Section 1, we used the RBF kernel and arbitrarily large penalization parameter in every SVR case. In all experiments, we trained the SVR models without the bias term, and modelled the absolute value of the DCT, y, or response coefficients, r. All the remaining free parameters (ε-insensitivity and Gaussian width of the RBF kernel σ) were optimized for all the considered models and different compression ratios. In the NL-SVM case, the parameters of the divisive normalization used in the experiments are shown in Figure 4. After training, the signal is described by the uniformly quantized Lagrange multipliers of the support vectors needed to keep the regression error below the thresholds ε f . The last step is entropy coding of the quantized weights. The compression ratio is controlled by a factor applied to the thresholds, ε f . 58 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 5.2 Model Comparison In order to assess the quality of the coded images, three different measures were used: the standard ´ (Euclidean) RMSE, the Maximum Perceptual Error (MPE) (Malo et al., 2000; G omez-P´ rez et al., e 2005; Malo et al., 2006) and the also perceptually meaningful Structural SIMilarity (SSIM) index (Wang et al., 2004). Eight standard 256×256 monochrome 8 bits/pix images were used in the experiments. Average rate-distortion curves are plotted in Figure 5 in the range [0.05, 0.6] bits/pix (bpp). According to these entropy-per-sample data, original file size was 64 KBytes in every case, while the compressed image sizes were in the range [0.4, 4.8] KBytes. This implies that the compression ratios were in the range [160:1, 13:1]. In general, a clear gain over standard JPEG is obtained by all SVM-based methods. According to the standard Euclidean MSE point of view, the performance of RKi-1 and CSF-SVR algorithms is basically the same (note the overlapped curves in Figure 5(a)). However, it is widely known that the MSE results are not useful to represent the subjective quality of images, as extensively reported elsewhere (Girod, 1993; Teo and Heeger, 1994; Watson and Malo, 2002). When using more appropriate (perceptually meaningful) quality measures (Figures 5(b)-(c)), the CSF-SVR obtains a certain advantage over the RKi-1 algorithm for all compression rates, which was already reported by G´ mez-P´ rez et al. (2005). In all measures, and for the whole considered entropy range, the o e proposed NL-SVR clearly outperforms all previously reported methods, obtaining a noticeable gain at medium-to-high compression ratios (between 0.1 bpp (80:1) and 0.3 bpp (27:1)). Taking into account that the recommended bit rate for JPEG is about 0.5 bpp, from Figure 5 we can also conclude that the proposed technique achieves the similar quality levels at a lower bit rate in the range [0.15, 0.3] bpp. Figure 6 shows representative visual results of the considered SVM strategies on standard images (Lena and Barbara) at the same bit rate (0.3 bpp, 27:1 compression ratio or 2.4 KBytes in 256×256 images). The visual inspection confirms that the numerical gain in MPE and SSIM shown in Figure 5 is also perceptually significant. Some conclusions can be extracted from this figure. ´ First, as previously reported by Gomez-P´ rez et al. (2005), RKi-1 leads to poorer (blocky) results e because of the crude approximation of the CSF (as an ideal low-pass filter) and the equal relevance applied to the low-frequency DCT-coefficients. Second, despite the good performance yielded by the CSF-SVR approach to avoid blocking effects, it is worth noting that high frequency details are smoothed (e.g., see Barbara’s scarf). These effects are highly alleviated by introducing SVR in the non-linear domain. See, for instance, Lena’s eyes, her hat’s feathers or the better reproduction of the high frequency pattern in Barbara’s clothes. Figure 7 shows the results obtained by all considered methods at a very high compression ratio for the Barbara image (0.05 bpp, 160:1 compression ratio or 0.4 KBytes in 256×256 images). This experiment is just intended to show the limits of methods performance since it is out of the recommended rate ranges. Even though this scenario is unrealistic, differences among methods are still noticeable: the proposed NL-SVR method reduces the blocky effects (note for instance that the face is better reproduced). This is due to a better distribution of support vectors in the perceptually independent domain. 5.3 Support Vector Distribution The observed different perceptual image quality obtained with each approach is a direct consequence of support vector distribution in different domains. Figure 8 shows a representative example 59 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 20 JPEG RKi−1 CSF−SVR NL−SVR 18 RMSE 16 14 12 10 8 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 22 JPEG RKi−1 CSF−SVR NL−SVR 20 18 16 MPE 14 12 10 8 6 4 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 0.85 0.8 SSIM 0.75 0.7 0.65 JPEG 0.6 RKi−1 CSF−SVR 0.55 NL−SVR 0.5 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 Figure 5: Average rate distortion curves over eight standard images (Lena, Barbara, Boats, Einstein, Peppers, Mandrill, Goldhill, Camera man) using objective and subjective measures for the considered JPEG (dotted) and the SVM approaches (RKi-1 dash-dotted, CSF-SVR dashed and NL-SVR solid). RMSE distortion (top), Maximum Perceptual Error, MPE (middle) (Malo et al., 2000; G´ mez-P´ rez et al., 2005; Malo et al., 2006), and Structural o e SIMilarity index, SSIM (bottom) (Wang et al., 2004). 60 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING Rki Rki SVR+CSF SVR+CSF NL+SVR NL+SVR Figure 6: Examples of decoded Lena (left) and Barbara (right) images at 0.3 bits/pix. From top to bottom: JPEG, RKi-1, CSF-SVR, and NL-SVR. 61 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO (a) (b) (c) (d) Figure 7: Examples of decoded Barbara images at a high compression ratio of 0.05 bits/pix (160:1) for (a) JPEG, (b) RKi-1, (c) CSF-SVR, and (d) NL-SVR. of the distribution of the selected support vectors by the RKi-1 and the CSF-SVR models working in the linear DCT domain, and the NL-SVM working in the perceptually independent non-linear domain r. Specifically, a block of Barbara’s scarf at different compression ratios is used for illustration purposes. The RKi-1 approach (Robinson and Kecman, 2003) uses a constant ε but, in order to consider the low subjective relevance of the high-frequency region, the corresponding coefficients are neglected. As a result, this approach only allocates support vectors in the low/medium frequency regions. The CSF-SVR approach uses a variable ε according to the CSF and gives rise to a more natural concentration of support vectors in the low/medium frequency region, which captures medium to high frequency details at lower compression rates (0.5 bits/pix). Note that the number of support vectors is bigger than in the RKi-1 approach, but it selects some necessary high-frequency coefficients to keep the error below the selected threshold. However, for bigger compression ratios (0.3 bits/pix), it misrepresents some high frequency, yet relevant, features (e.g., the peak from the stripes). The NL-SVM approach works in the non-linear transform domain, in which a more uniform coverage 62 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING (a) (b) (c) 0 200 10 100 0 200 5 5 15 0 0 20 30 25 20 10 20 30 fy (cpd) 10 0 fy (cpd) 30 10 fx (cpd) 0 200 5 10 100 0 200 5 10 100 15 25 10 30 20 30 f (cpd) y 10 0 30 10 0.5 20 30 25 20 5 1 0 0 20 30 0 1.5 15 15 0 fy (cpd) 30 0 fx (cpd) 20 25 20 0 fx (cpd) fx (cpd) 20 30 25 20 30 10 0.5 15 0 5 1 10 100 15 0 1.5 25 20 f (cpd) y 10 30 fy (cpd) 0 0 fx (cpd) fx (cpd) Figure 8: Signal in different domains and the selected support vectors by the SVM models in a block of the Barbara image at 0.3 bits/pix (top row) and 0.5 bits/pix (bottom row). Different domains are analyzed: (a) linear DCT using RKi-1, (b) linear DCT with CSF-SVM, and (c) non-linear perceptual domain with standard ε-SVM (NL-SVR). of the domain is done, accounting for richer (and perceptually independent) coefficients to perform efficient sparse signal reconstruction. It is important to remark that, for a given method (or domain), tightening ε f implies (1) considering more support vectors, and (2) an increase in entropy (top and bottom rows in Figure 8, 0.3 bpp to 0.5 bpp). However, note that the relevant measure is the entropy and not the number of support vectors: even though the number of selected support vectors in the r domain is higher, their variance is lower, thus giving rise to the same entropy after entropy coding. 6. Conclusions In this paper, we have reported a condition on the suitable domain for developing efficient SVM image coding schemes. The so-called diagonal Jacobian condition states that SVM regression with scalar-wise error restriction in a particular domain makes sense only if the transform that maps this domain to an independent coefficient representation is locally diagonal. We have demonstrated that, 63 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO in general, linear domains do not fulfill this condition because non-trivial statistical and perceptual inter-coefficient relations do exist in these domains. This theoretical finding has been experimentally confirmed by observing that improved compression results are obtained when SVM is applied in a non-linear perceptual domain that starts from the same linear domain used by previously reported SVM-based image coding schemes. These results highlight the relevance of an appropriate image representation choice before SVM learning. Further work is tied to the use of SVM-based coding schemes in statistically, rather than perceptually, independent non-linear ICA domains. In order to do so, local PCA instead of local ICA may be used in the local-to-global differential approach (Malo and Guti´ rrez, 2006) to speed up the e non-linear computation. Acknowledgments This work has been partly supported by the Spanish Ministry of Education and Science under grant CICYT TEC2006-13845/TCM and by the Generalitat Valenciana under grant GV-06/215. References R. Ahmed. Wavelet-based image compression using SVM learning and encoding techniques. Proc. 8th IASTED International Conference Computer Graphics and Imaging, 1:162–166, 2005. H. B. Barlow. Redundancy reduction revisited. Network: Comp. Neur. Sys., 12:241–253, 2001. H. B. Barlow. Possible principles underlying the transformation of sensory messages. In WA Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. A. J. Bell and T. J. Sejnowski. The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997. R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8(12):1688–1701, 1999. F. W. Campbell and J. G. Robson. Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3):551–566, August 1968. G. Camps-Valls, E. Soria-Olivas, J. P´ rez-Ruixo, A. Art´ s-Rodr´guez, F. P´ rez-Cruz, and e e ı e A. Figueiras-Vidal. A profile-dependent kernel-based regression for cyclosporine concentration prediction. In Neural Information Processing Systems (NIPS) – Workshop on New Directions in Kernel-Based Learning Methods, Vancouver, Canada, December 2001. R. J. Clarke. Transform Coding of Images. Academic Press, New York, 1985. I. Epifanio, J. Guti´ rrez, and J. Malo. Linear transform for simultaneous diagonalization of coe variance and perceptual metric matrix in image coding. Pattern Recognition, 36(8):1799–1811, August 2003. J. M. Foley. Human luminance pattern mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11(6):1710–1719, 1994. 64 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Press, Boston, 1992. B. Girod. What’s wrong with mean-squared error. In A. B. Watson, editor, Digital Images and Human Vision, pages 207–220. The MIT press, 1993. G. G´ mez-P´ rez, G. Camps-Valls, J. Guti´ rrez, and J. Malo. Perceptually adaptive insensitivity for o e e support vector machine image coding. IEEE Transactions on Neural Networks, 16(6):1574–1581, 2005. J. Guti´ rrez, F. J. Ferri, and J. Malo. Regularization operators for natural images based on nonlinear e perception models. IEEE Transactions on Image Processing, 15(1):189–2000, January 2006. D. J. Heeger. Normalization of cell responses in cat striate cortex. Vis. Neurosci., 9:181–198, 1992. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, New York, 2001. A. Hyvarinen, J. Hurri, and J. Vayrynenm. Bubbles: a unifying framework for low-level statistical properties of natural image sequences. Journal of the Optical Society of America A, 20(7), 2003. R. Jiao, Y. Li, Q. Wang, and B. Li. SVM regression and its application to image compression. In International Conference on Intelligent Computing, volume 3645, pages 747–756. Lecture Notes on Computer Science, 2005. C. Jutten and J. Karhunen. Advances in nonlinear blind source separation. Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 245– 256, 2003. J. Karhunen, S. Malaroiu, and M. Ilmoniemi. Local linear independent component analysis based on clustering. Intl. J. Neur. Syst., 10(6), December 2000. J. K. Lin. Factorizing probability density functions: Generalizing ICA. In Proc. of the First Intl. Workshop on ICA and Signal Separation, pages 313–318, Aussois, France, 1999. J. Malo and J. Guti´ rrez. V1 non-linear properties emerge from local-to-global non-linear ICA. e Network: Computation in Neural Systems, 17:85–102, 2006. J. Malo, A. M. Pons, A. Felipe, and J. M. Artigas. Characterization of human visual system threshold performance by a weighting function in the Gabor domain. Journal of Modern Optics, 44(1): 127–148, 1997. J. Malo, F. Ferri, J. Albert, J. Soret, and J. M. Artigas. The role of perceptual contrast non-linearities in image transform coding. Image & Vision Computing, 18(3):233–246, February 2000. J. Malo, J. Guti´ rrez, I. Epifanio, F. Ferri, and J. M. Artigas. Perceptual feed-back in multigrid moe tion estimation using an improved DCT quantization. IEEE Transactions on Image Processing, 10(10):1411–1427, October 2001. J. Malo, I. Epifanio, R. Navarro, and E. Simoncelli. Non-linear image representation for efficient perceptual coding. IEEE Transactions on Image Processing, 15(1):68–80, 2006. 65 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996. F. P´ rez-Cruz, G. Camps-Valls, E. Soria-Olivas, J. J. P´ rez-Ruixo, A. R. Figueiras-Vidal, and e e A. Art´ s-Rodr´guez. Multi-dimensional function approximation and regression estimation. In e ı International Conference on Artificial Neural Networks, ICANN2002, Madrid, Spain., Aug 2002. Lecture Notes in Computer Science. Springer–Verlag. J. Robinson and V. Kecman. The use of Support Vector Machines in image compression. In Proceedings of the International Conference on Engineering Intelligence Systems, EIS’2000, volume 36, pages 93–96, University of Paisley, Scotland, U.K., June 2000. J. Robinson and V. Kecman. Combining Support Vector Machine learning with the discrete cosine transform in image compression. IEEE Transactions Neural Networks, 14(4):950–958, July 2003. O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, 2001. E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13(2):144–149, 2003. E. P. Simoncelli. Statistical models for images: Compression, restoration and synthesis. In 31st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1997. E. P. Simoncelli and B. O. Olshausen. Natural image statistics and neural representation. Annual Review of Neuroscience, 24:1193–1216, 2001. A. J. Smola and B. Sch¨ lkopf. A tutorial on support vector regression. Statistics and Computing, o 14:199–222, 2004. D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Boston, 2001. P. C. Teo and D. J. Heeger. Perceptual image distortion. Proceedings of the First IEEE International Conference on Image Processing, 2:982–986, 1994. M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in analyzing and modeling natural images. Applied and Computational Harmonic Analysis, 11:89–123, 2001. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. A. B. Watson and J. Malo. Video quality measures based on the standard spatial observer. In Proceedings of the IEEE International Conference on Image Proceedings, volume 3, pages 41– 44, 2002. A. B. Watson and J. A. Solomon. A model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A, 14(9):2379–2391, September 1997. 66
2 0.079948761 60 jmlr-2008-Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
Author: Kun Zhang, Laiwan Chan
Abstract: It is well known that solutions to the nonlinear independent component analysis (ICA) problem are highly non-unique. In this paper we propose the “minimal nonlinear distortion” (MND) principle for tackling the ill-posedness of nonlinear ICA problems. MND prefers the nonlinear ICA solution with the estimated mixing procedure as close as possible to linear, among all possible solutions. It also helps to avoid local optima in the solutions. To achieve MND, we exploit a regularization term to minimize the mean square error between the nonlinear mixing mapping and the best-fitting linear one. The effect of MND on the inherent trivial and non-trivial indeterminacies in nonlinear ICA solutions is investigated. Moreover, we show that local MND is closely related to the smoothness regularizer penalizing large curvature, which provides another useful regularization condition for nonlinear ICA. Experiments on synthetic data show the usefulness of the MND principle for separating various nonlinear mixtures. Finally, as an application, we use nonlinear ICA with MND to separate daily returns of a set of stocks in Hong Kong, and the linear causal relations among them are successfully discovered. The resulting causal relations give some interesting insights into the stock market. Such a result can not be achieved by linear ICA. Simulation studies also verify that when doing causality discovery, sometimes one should not ignore the nonlinear distortion in the data generation procedure, even if it is weak. Keywords: nonlinear ICA, regularization, minimal nonlinear distortion, mean square error, best linear reconstruction
3 0.048827346 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
4 0.046860103 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park
Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform
5 0.041405611 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
Author: Arthur D. Szlam, Mauro Maggioni, Ronald R. Coifman
Abstract: Harmonic analysis and diffusion on discrete data has been shown to lead to state-of-the-art algorithms for machine learning tasks, especially in the context of semi-supervised and transductive learning. The success of these algorithms rests on the assumption that the function(s) to be studied (learned, interpolated, etc.) are smooth with respect to the geometry of the data. In this paper we present a method for modifying the given geometry so the function(s) to be studied are smoother with respect to the modified geometry, and thus more amenable to treatment using harmonic analysis methods. Among the many possible applications, we consider the problems of image denoising and transductive classification. In both settings, our approach improves on standard diffusion based methods. Keywords: diffusion processes, diffusion geometry, spectral graph theory, image denoising, transductive learning, semi-supervised learning
6 0.041128725 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
7 0.039504331 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
8 0.035942297 87 jmlr-2008-Stationary Features and Cat Detection
9 0.033222634 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
10 0.033105105 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes
11 0.032696236 58 jmlr-2008-Max-margin Classification of Data with Absent Features
12 0.031334896 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
13 0.03047543 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
14 0.030411402 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
15 0.030364089 54 jmlr-2008-Learning to Select Features using their Properties
16 0.03031297 86 jmlr-2008-SimpleMKL
17 0.029601894 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
18 0.028816164 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
19 0.02711506 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
20 0.026122965 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
topicId topicWeight
[(0, 0.127), (1, -0.037), (2, -0.001), (3, -0.007), (4, 0.001), (5, 0.017), (6, 0.047), (7, 0.072), (8, -0.0), (9, 0.023), (10, -0.074), (11, 0.119), (12, 0.016), (13, -0.078), (14, -0.001), (15, 0.062), (16, 0.061), (17, -0.027), (18, 0.16), (19, 0.178), (20, 0.113), (21, 0.232), (22, 0.167), (23, -0.344), (24, -0.174), (25, 0.14), (26, 0.163), (27, 0.078), (28, 0.016), (29, -0.038), (30, -0.141), (31, -0.004), (32, -0.045), (33, 0.022), (34, 0.017), (35, -0.18), (36, -0.114), (37, 0.163), (38, -0.105), (39, 0.199), (40, 0.119), (41, 0.041), (42, 0.052), (43, -0.118), (44, -0.0), (45, 0.158), (46, 0.126), (47, -0.056), (48, 0.028), (49, -0.128)]
simIndex simValue paperId paperTitle
same-paper 1 0.96693176 73 jmlr-2008-On the Suitable Domain for SVM Training in Image Coding
Author: Gustavo Camps-Valls, Juan Gutiérrez, Gabriel Gómez-Pérez, Jesús Malo
Abstract: Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the ε-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular εinsensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning. Keywords: image coding, non-linear perception models, statistical independence, support vector machines, insensitivity zone c 2008 Gustavo Camps-Valls, Juan Guti´ rrez, Gabriel G´ mez-P´ rez and Jes´ s Malo. e o e u ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 1. Problem Statement: The Diagonal Jacobian Condition Image coding schemes based on support vector machines (SVM) have been successfully introduced in the literature. SVMs have been used in the spatial domain (Robinson and Kecman, 2000), in the block-DCT domain (Robinson and Kecman, 2003), and in the wavelet domain (Ahmed, 2005; Jiao et al., 2005). These coding methods take advantage of the ability of the support vector regression (SVR) algorithm for function approximation using a small number of parameters (signal samples, or ¨ support vectors) (Smola and Scholkopf, 2004). In all current SVM-based image coding techniques, a representation of the image is described by the entropy-coded weights associated to the support vectors necessary to approximate the signal with a given accuracy. Relaxing the accuracy bounds reduces the number of needed support vectors. In a given representation domain, reducing the number of support vectors increases the compression ratio at the expense of bigger distortion (lower image quality). By applying the standard SVR formulation, a certain amount of distortion in each sample of the image representation is allowed. In the original formulation, scalar restrictions on the errors are introduced using a constant ε-insensitivity value for every sample. ´ Recently, this procedure has been refined by Gomez-P´ rez et al. (2005) using a profile-dependent e SVR (Camps-Valls et al., 2001) that considers a different ε for each sample or frequency. This frequency-dependent insensitivity, ε f , accounts for the fact that, according to simple (linear) perception models, not every sample in linear frequency domains (such as DCT or wavelets) contributes to the perceived distortion in the same way. Despite different domains have been proposed for SVM training (spatial domain, block-DCT and wavelets) and different ε insensitivities per sample have been proposed, in conventional SVR formulation, the particular distortions introduced by regression in the different samples are not coupled. In all the reported SVM-based image coding schemes, the RBF kernel is used and the penalization parameter is fixed to an arbitrarily large value. In this setting, considering n-sample signals as n-dimensional vectors, the SVR guarantees that the approximated vectors are confined in n-dimensional rectangles around the original vectors. These rectangles are just n-dimensional cubes in the standard formulation or they have certain elongation if different ε f are considered in each axis, f . Therefore, in all the reported SVM-based coding methods, these rectangles are always oriented along the axes of the (linear) image representation. According to this, a common feature of these (scalar-wise) approaches is that they give rise to decoupled distortions in each dimension. P´ rez-Cruz et al. (2002) proposed a hyperspherical insensitivity zone to correct the penalization e factor in each dimension of multi-output regression problems, but again, restrictions to each sample were still uncoupled. This scalar-wise strategy is not the best option in domains where the different dimensions of the image representation are not independent. For instance, consider the situation where actually independent components, r f , are obtained from a given image representation, y, applying some eventually non-linear transform, R: R y −→ r. In this case, SVM regression with scalar-wise error restriction makes sense in the r domain. However, the original y domain will not be suitable for the standard SVM regression unless the matrix ∇R is diagonal (up to any permutation of the dimensions, that is, only one non-zero element per row). Therefore, if transforms that achieve independence have non-diagonal Jacobian, scalar-wise restrictions in the original (coupled coefficients) domain y are not allowed. 50 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING y2 1 −0,5 −0,5 0,4 R= r2 y1 r1 R−1 = 0,83 −0,83 0,67 −1,67 Figure 1: Insensitivity regions in different representation domains, y (left) and r (right), related by a non-diagonal transform ∇R and its inverse ∇R−1 . Figure 1 illustrates this situation. The shaded region in the right plot (r domain) represents the n-dimensional box determined by the ε f insensitivities in each dimension ( f =1,2), in which a scalar-wise approach is appropriate due to independence among signal coefficients. Given that the particular ∇R transform is not diagonal, the corresponding shaded region in the left plot (the original y domain) is not aligned along the axes of the representation. This has negative implications: note that for the highlighted points, smaller distortions in both dimensions in the y domain (as implied by SVM with tighter but scalar ε f insensitivities) do not necessarily imply lying inside the insensitivity region in the final truly independent (and meaningful) r domain. Therefore, the original y domain is not suitable for the direct application of conventional SVM, and consequently, non-trivial coupled insensitivity regions are required. Summarizing, in the image coding context, the condition for an image representation y to be strictly suitable for conventional SVM learning is that the transform that maps the original representation y to an independent coefficient representation r must be locally diagonal. As will be reviewed below, independence among coefficients (and the transforms to obtain them) may be defined in both statistical and perceptual terms (Hyvarinen et al., 2001; Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). On the one hand, a locally diagonal relation to a statistically independent representation is desirable because independently induced distortions (as the conventional SVM approach does) will preserve the statistics of the distorted signal, that is, it will not introduce artificial-looking artifacts. On the other hand, a locally diagonal relation to a perceptually 51 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO independent representation is desirable because independently induced distortions do not give rise to increased subjective distortions due to non-trivial masking or facilitation interactions between the distortions in each dimension (Watson and Solomon, 1997). In this work, we show that conventional linear domains do not fulfill the diagonal Jacobian condition in either the statistical case or in the perceptual case. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains (Robinson and Kecman, 2003; G´ mez-P´ rez et al., 2005) and in a recently proposed non-linear perceptual domain o e that simultaneously reduces the statistical and the perceptual relations (Malo et al., 2006), thus, this non-linear perceptual domain is closer to fulfilling the proposed condition. The rest of the paper is structured as follows. Section 2 reviews the fact that linear coefficients of the image representations commonly used for SVM training are neither statistically independent nor perceptually independent. Section 3 shows that transforms for obtaining statistical and/or perceptual independence from linear domains have non-diagonal Jacobian. This suggests that there is room to improve the performance of conventional SVM learning reported in linear domains. In Section 4, we propose the use of a perceptual representation for SVM training because it strictly fulfills the diagonal Jacobian condition in the perceptual sense and increases the statistical independence among coefficients, bringing it closer to fulfilling the condition in the statistical sense. The experimental image coding results confirm the superiority of this domain for SVM training in Section 5. Section 6 presents the conclusions and final remarks. 2. Statistical and Perceptual Relations Among Image Coefficients Statistical independence among the coefficients of a signal representation refers to the fact that the joint PDF of the class of signals to be considered can be expressed as a product of the marginal PDFs in each dimension (Hyvarinen et al., 2001). Simple (second-order) descriptions of statistical dependence use the non-diagonal nature of the covariance matrix (Clarke, 1985; Gersho and Gray, 1992). More recent and accurate descriptions use higher-order moments, mutual information, or the non-Gaussian nature (sparsity) of marginal PDFs (Hyvarinen et al., 2001; Simoncelli, 1997). Perceptual independence refers to the fact that the visibility of errors in coefficients of an image may depend on the energy of neighboring coefficients, a phenomenon known in the perceptual literature as masking or facilitation (Watson and Solomon, 1997). Perceptual dependence has been formalized just up to second order, and this may be described by the non-Euclidean nature of the perceptual metric matrix (Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). 2.1 Statistical Relations In recent years, a variety of approaches, known collectively as “independent component analysis” (ICA), have been developed to exploit higher-order statistics for the purpose of achieving a unique linear solution for coefficient independence (Hyvarinen et al., 2001). The basis functions obtained when these methods are applied to images are spatially localized and selective for both orientation and spatial frequency (Olshausen and Field, 1996; Bell and Sejnowski, 1997). Thus, they are similar to basis functions of multi-scale wavelet representations. Despite its name, linear ICA does not actually produce statistically independent coefficients when applied to photographic images. Intuitively, independence would seem unlikely, since images are not formed from linear superpositions of independent patterns: the typical combination rule for the elements of an image is occlusion. Empirically, the coefficients of natural image decom52 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING | f | = 10.8 cpd | f | = 24.4 cpd −20 −10 −10 fy (cpd) −30 −20 fy (cpd) −30 0 0 10 10 20 20 30 30 −30 −20 −10 0 f (cpd) 10 20 30 −30 x −20 −10 0 f (cpd) 10 20 30 x Figure 2: Statistical interaction of two particular coefficients of the local Fourier Transform with their neighbors in a natural image database. The absolute value of the frequency of these coefficients is | f | = 10.8 and | f | = 24.4 cycles/degree (cpd). positions in spatially localized oscillating basis functions are found to be fairly well decorrelated (i.e., their covariance is almost zero). However, the amplitudes of coefficients at nearby spatial positions, orientations, and scales are highly correlated (even with orthonormal transforms) (Simoncelli, ´ 1997; Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003; Guti errez et al., 2006; Malo et al., 2006; Malo and Guti´ rrez, 2006). This suggests that achieving statistical e independence requires the introduction of non-linearities beyond linear ICA transforms. Figure 2 reproduces one of many results that highlight the presence of statistical relations of natural image coefficients in block PCA or linear ICA-like domains: the energy of spatially localized oscillating filters is correlated with the energy of neighboring filters in scale and orientation (see Guti´ rrez et al., 2006). A remarkable feature is that the interaction width increases with frequency, e as has been reported in other domains, for example, wavelets (Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003), and block-DCT (Malo et al., 2006). In order to remove the remaining statistical relations in the linear domains y, non-linear ICA methods are necessary (Hyvarinen et al., 2001; Lin, 1999; Karhunen et al., 2000; Jutten and Karhunen, 2003). Without lack of generality, non-linear ICA transforms can be schematically understood as a two-stage process (Malo and Guti´ rrez, 2006): e T R (( y hh x hh (( r, (1) R−1 T−1 where x is the image representation in the spatial domain, and T is a global unitary linear transform that removes second-order and eventually higher-order relations among coefficients in the spatial domain. Particular examples of T include block PCA, linear ICAs, DCT or wavelets. In the ICA literature notation, T is the separating matrix and T−1 is the mixing matrix. The second transform 53 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO R is an additional non-linearity that is introduced in order to remove the statistical relations that still remain in the y domain. 2.2 Perceptual Relations Perceptual dependence among coefficients in different image representations can be understood by using the current model of V1 cortex. This model can also be summarized by the two-stage (linear and non-linear) process described in Equation (1). In this perceptual case, T is also a linear filter bank applied to the original input image in the spatial domain. This filter bank represents the linear behavior of V1 neurons whose receptive fields happen to be similar to wavelets or linear ICA basis functions (Olshausen and Field, 1996; Bell and Sejnowski, 1997). The second transform, R, is a non-linear function that accounts for the masking and facilitation phenomena that have been reported in the linear y domain (Foley, 1994; Watson and Solomon, 1997). Section 3.2 gives a parametric expression for the second non-linear stage, R: the divisive normalization model (Heeger, 1992; Foley, 1994; Watson and Solomon, 1997). This class of models is based on psychophysical experiments assuming that the last domain, r, is perceptually Euclidean (i.e., perfect perceptual independence). An additional confirmation of this assumption is the success of (Euclidean) subjective image distortion measures defined in that domain (Teo and Heeger, 1994). Straightforward application of Riemannian geometry to obtain the perceptual metric matrix in other domains shows that the coefficients of linear domains x and y, or any other linear transform of them, are not perceptually independent (Epifanio et al., 2003). Figure 3 illustrates the presence of perceptual relations between coefficients when using linear block frequency or wavelet-like domains, y: the cross-masking behavior. In this example, the visibility of the distortions added on top of the background image made of periodic patterns has to be assessed. This is a measure of the sensitivity of a particular perceptual mechanism to distortions in that dimension, ∆y f , when mechanisms tuned to other dimensions are simultaneously active, that is, y f = 0, with f = f . As can be observed, low frequency noise is more visible in high frequency backgrounds than in low frequency backgrounds (e.g., left image). Similarly, high frequency noise is more visible in low frequency backgrounds than in high frequency ones (e.g., right image). That is to say, a signal of a specific frequency strongly masks the corresponding frequency analyzer, but it induces a smaller sensitivity reduction in the analyzers that are tuned to different frequencies. In other words, the reduction in sensitivity of a specific analyzer gets larger as the distance between the background frequency and the frequency of the analyzer gets smaller. The response of each frequency analyzer not only depends on the energy of the signal for that frequency band, but also on the energy of the signal in other frequency bands (cross-masking). This implies that a different amount of noise in each frequency band may be acceptable depending on the energy of that frequency band and on the energy of neighboring bands. This is what we have called perceptual dependence among different coefficients in the y domain. At this point, it is important to stress the similarity between the set of computations to obtain statistically decoupled image coefficients and the known stages of biological vision. In fact, it has been hypothesized that biological visual systems have organized their sensors to exploit the particular statistics of the signals they have to process. See Barlow (2001), Simoncelli and Olshausen (2001), and Simoncelli (2003) for reviews on this hypothesis. In particular, both the linear and the non-linear stages of the cortical processing have been successfully derived using redundancy reduction arguments: nowadays, the same class of linear 54 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 3 cpd 6 cpd 12 cpd 24 cpd Figure 3: Illustrative example of perceptual dependence (cross-masking phenomenon). Equal energy noise of different frequency content, 3 cycl/deg (cpd), 6 cpd, 12 cpd and 24 cpd, shown on top of a background image. Sampling frequency assumes that these images subtend an angle of 3 deg. stage T is used in transform coding algorithms and in vision models (Olshausen and Field, 1996; Bell and Sejnowski, 1997; Taubman and Marcellin, 2001), and new evidence supports the same idea for the second non-linear stage (Schwartz and Simoncelli, 2001; Malo and Guti´ rrez, 2006). e According to this, the statistical and perceptual transforms, R, that remove the above relations from the linear domains, y, would be very similar if not the same. 3. Statistical and Perceptual Independence Imply Non-diagonal Jacobian In this section, we show that both statistical redundancy reduction transforms (e.g., non-linear ICA) and perceptual independence transforms (e.g., divisive normalization), have non-diagonal Jacobian for any linear image representation, so they are not strictly suitable for conventional SVM training. 3.1 Non-diagonal Jacobian in Non-linear ICA Transforms One possible approach for dealing with global non-linear ICA is to act differentially by breaking the problem into local linear pieces that can then be integrated to obtain the global independent coefficient domain (Malo and Guti´ rrez, 2006). Each differential sub-problem around a particular e point (image) can be locally solved using the standard linear ICA methods restricted to the neighbors of that point (Lin, 1999). Using the differential approach in the context of a two-stage process such as the one in Equation (1), it can be shown that (Malo and Guti´ rrez, 2006): e r = r0 + Z x x0 T (x ) dx = r0 + Z x x0 ∇R(Tx ) T dx , (2) where T (x ) is the local separating matrix for a neighborhood of the image x , and T is the global separating matrix for the whole PDF. Therefore, the Jacobian of the second non-linear stage is: ∇R(y) = ∇R(Tx) = T (x) T−1 . 55 (3) ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO As local linear independent features around a particular image, x, differ in general from global linear independent features, that is, T (x) = T, the above product is not the identity nor diagonal in general. 3.2 Non-diagonal Jacobian in Non-linear Perceptual Transforms The current response model for the cortical frequency analyzers is non-linear (Heeger, 1992; Watson and Solomon, 1997). The outputs of the filters of the first linear stage, y, undergo a non-linear sigmoid transform in which the energy of each linear coefficient is weighted by a linear Contrast Sensitivity Function (CSF) (Campbell and Robson, 1968; Malo et al., 1997) and is further normalized by a combination of the energies of neighbor coefficients in frequency, r f = R(y) f = sgn(y f ) |α f y f |γ , β f + ∑n =1 h f f |α f y f |γ f (4) where α f (Figure 4[top left]) are CSF-like weights, β f (Figure 4[top right]) control the sharpness of the response saturation for each coefficient, γ is the so called excitation exponent, and the matrix h f f determines the interaction neighborhood in the non-linear normalization of the energy. This interaction matrix models the cross-masking behavior (cf. Section 2.2). The interaction in this matrix is assumed to be Gaussian (Watson and Solomon, 1997), and its width increases with the frequency. Figure 4[bottom] shows two examples of this Gaussian interaction for two particular coefficients in a local Fourier domain. Note that the width of the perceptual interaction neighborhood increases with the frequency in the same way as the width of the statistical interaction neighborhood shown in Figure 2. We used a value of γ = 2 in the experiments. Taking derivatives in the general divisive normalization model, Equation (4), we obtain ∇R(y) f f = sgn(y f )γ α f |α f y f |γ |α f y f |γ−1 α f |α f y f |γ−1 δf f − hf f β f + ∑n =1 h f f |α f y f |γ (β f + ∑n =1 h f f |α f y f |γ )2 f f , (5) which is not diagonal because of the interaction matrix, h, which describes the cross-masking between each frequency f and the remaining f = f . Note that the intrinsic non-linear nature of both the statistical and perceptual transforms, Equations (3) and (5), makes the above results true for any linear domain under consideration. Specifically, if any other possible linear domain for image representation is considered, y = T y, then the Jacobian of the corresponding independence transform, R , is ∇R (y ) = ∇R(y) T −1 , which, in general, will also be non-diagonal because of the non-diagonal and point-dependent nature of ∇R(y). To summarize, since no linear domain fulfills the diagonal Jacobian condition in either statistical or perceptual terms, the negative situation illustrated in Figure 1 may occur when using SVM in these domains. Therefore, improved results could be obtained if SVM learning were applied after some transform achieving independent coefficients, R. 4. SVM Learning in a Perceptually Independent Representation In order to confirm the above theoretical results (i.e., the unsuitability of linear representation domains for SVM learning) and to assess the eventual gain that can be obtained from training SVR 56 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 0.04 αf 0.015 βf 0.03 0.012 0.02 0.009 0.006 0.01 0.003 0 0 4 8 12 16 20 24 28 32 0 f (frequency cycles/degree) 4 8 | f | = 10.8 cpd 16 20 24 28 32 | f | = 24.4 cpd −30 −20 −20 −10 −10 f (cpd) −30 0 y 0 y f (cpd) 12 f (frequency cycles/degree) 10 10 20 20 30 −30 −20 −10 0 fx (cpd) 10 20 30 −30 30 −20 −10 0 fx (cpd) 10 20 30 Figure 4: Parameters of the perceptual model: α f (top left), β f (top right). Bottom figures represent perceptual interaction neighborhoods h f f of two particular coefficients of the local Fourier domain. in a more appropriate domain, we should compare the performance of SVRs in previously reported linear domains (e.g., block-DCT or wavelets) and in one of the proposed non-linear domains (either the statistically independent domain or the perceptually independent domain). Exploration of the statistical independence transform may have academic interest but, in its present formulation, it is not practical for coding purposes: direct application of non-linear ICA as in Equation (2) is very time-consuming for high dimensional vectors since lots of local ICA computations are needed to transform each block, and a very large image database is needed for a robust and significant computation of R. Besides, an equally expensive differential approach is also needed to compute the inverse R−1 for image decoding. In contrast, the perceptual non-linearity (and its inverse) are analytical. These analytical expressions are feasible for reasonable block sizes, and there are efficient iterative methods that can be used for larger vectors (Malo et al., 2006). In this paper, we explore the use of a psychophysically-based divisive normalized domain: first compute a block-DCT transform and then apply the divisive normalization model described above for each block. The results will be compared to the first competitive SVM coding results (Robinson 57 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO ´ and Kecman, 2003) and the posterior improvements reported by G omez-P´ rez et al. (2005), both e formulated in the linear block-DCT domain. As stated in Section 2, by construction, the proposed domain is perceptually Euclidean with perceptually independent components. The Euclidean nature of this domain has an additional benefit: the ε-insensitivity design is very simple because a constant value is appropriate due to the constant perceptual relevance of all coefficients. Thus, direct application of the standard SVR method is theoretically appropriate in this domain. Moreover, beyond its built-in perceptual benefits, this psychophysically-based divisive normalization has attractive statistical properties: it strongly reduces the mutual information between the final coefficients r (Malo et al., 2006). This is not surprising according to the hypothesis that try to explain the early stages of biological vision systems using information theory arguments (Barlow, 1961; Simoncelli and Olshausen, 2001). Specifically, dividing the energy of each linear coefficient by the energy of the neighbors, which are statistically related with it, cf. Figure 2, gives coefficients with reduced statistical dependence. Moreover, as the empirical non-linearities of perception have been reproduced using non-linear ICA in Equation (2) (Malo and Guti´ rrez, 2006), the empirical die visive normalization can be seen as a convenient parametric way to obtain statistical independence. 5. Performance of SVM Learning in Different Domains In this section, we analyze the performance of SVM-based coding algorithms in linear and nonlinear domains through rate-distortion curves and explicit examples for visual comparison. In addition, we discuss how SVM selects support vectors in these domains to represent the image features. 5.1 Model Development and Experimental Setup In the (linear) block-DCT domain, y, we use the method introduced by Robinson and Kecman (2003) (RKi-1), in which the SVR is trained to learn a fixed (low-pass) number of DCT coefficients ´ (those with frequency bigger than 20 cycl/deg are discarded); and the method proposed by G omezP´ rez et al. (2005) (CSF-SVR), in which the relevance of all DCT coefficients is weighted according e to the CSF criterion using an appropriately modulated ε f . In the non-linear domain, r, we use the SVR with constant insensitivity parameter ε (NL-SVR). In all cases, the block-size is 16×16, that is, y, r ∈ R256 . The behavior of JPEG standard is also included in the experiments for comparison purposes. As stated in Section 1, we used the RBF kernel and arbitrarily large penalization parameter in every SVR case. In all experiments, we trained the SVR models without the bias term, and modelled the absolute value of the DCT, y, or response coefficients, r. All the remaining free parameters (ε-insensitivity and Gaussian width of the RBF kernel σ) were optimized for all the considered models and different compression ratios. In the NL-SVM case, the parameters of the divisive normalization used in the experiments are shown in Figure 4. After training, the signal is described by the uniformly quantized Lagrange multipliers of the support vectors needed to keep the regression error below the thresholds ε f . The last step is entropy coding of the quantized weights. The compression ratio is controlled by a factor applied to the thresholds, ε f . 58 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 5.2 Model Comparison In order to assess the quality of the coded images, three different measures were used: the standard ´ (Euclidean) RMSE, the Maximum Perceptual Error (MPE) (Malo et al., 2000; G omez-P´ rez et al., e 2005; Malo et al., 2006) and the also perceptually meaningful Structural SIMilarity (SSIM) index (Wang et al., 2004). Eight standard 256×256 monochrome 8 bits/pix images were used in the experiments. Average rate-distortion curves are plotted in Figure 5 in the range [0.05, 0.6] bits/pix (bpp). According to these entropy-per-sample data, original file size was 64 KBytes in every case, while the compressed image sizes were in the range [0.4, 4.8] KBytes. This implies that the compression ratios were in the range [160:1, 13:1]. In general, a clear gain over standard JPEG is obtained by all SVM-based methods. According to the standard Euclidean MSE point of view, the performance of RKi-1 and CSF-SVR algorithms is basically the same (note the overlapped curves in Figure 5(a)). However, it is widely known that the MSE results are not useful to represent the subjective quality of images, as extensively reported elsewhere (Girod, 1993; Teo and Heeger, 1994; Watson and Malo, 2002). When using more appropriate (perceptually meaningful) quality measures (Figures 5(b)-(c)), the CSF-SVR obtains a certain advantage over the RKi-1 algorithm for all compression rates, which was already reported by G´ mez-P´ rez et al. (2005). In all measures, and for the whole considered entropy range, the o e proposed NL-SVR clearly outperforms all previously reported methods, obtaining a noticeable gain at medium-to-high compression ratios (between 0.1 bpp (80:1) and 0.3 bpp (27:1)). Taking into account that the recommended bit rate for JPEG is about 0.5 bpp, from Figure 5 we can also conclude that the proposed technique achieves the similar quality levels at a lower bit rate in the range [0.15, 0.3] bpp. Figure 6 shows representative visual results of the considered SVM strategies on standard images (Lena and Barbara) at the same bit rate (0.3 bpp, 27:1 compression ratio or 2.4 KBytes in 256×256 images). The visual inspection confirms that the numerical gain in MPE and SSIM shown in Figure 5 is also perceptually significant. Some conclusions can be extracted from this figure. ´ First, as previously reported by Gomez-P´ rez et al. (2005), RKi-1 leads to poorer (blocky) results e because of the crude approximation of the CSF (as an ideal low-pass filter) and the equal relevance applied to the low-frequency DCT-coefficients. Second, despite the good performance yielded by the CSF-SVR approach to avoid blocking effects, it is worth noting that high frequency details are smoothed (e.g., see Barbara’s scarf). These effects are highly alleviated by introducing SVR in the non-linear domain. See, for instance, Lena’s eyes, her hat’s feathers or the better reproduction of the high frequency pattern in Barbara’s clothes. Figure 7 shows the results obtained by all considered methods at a very high compression ratio for the Barbara image (0.05 bpp, 160:1 compression ratio or 0.4 KBytes in 256×256 images). This experiment is just intended to show the limits of methods performance since it is out of the recommended rate ranges. Even though this scenario is unrealistic, differences among methods are still noticeable: the proposed NL-SVR method reduces the blocky effects (note for instance that the face is better reproduced). This is due to a better distribution of support vectors in the perceptually independent domain. 5.3 Support Vector Distribution The observed different perceptual image quality obtained with each approach is a direct consequence of support vector distribution in different domains. Figure 8 shows a representative example 59 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 20 JPEG RKi−1 CSF−SVR NL−SVR 18 RMSE 16 14 12 10 8 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 22 JPEG RKi−1 CSF−SVR NL−SVR 20 18 16 MPE 14 12 10 8 6 4 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 0.85 0.8 SSIM 0.75 0.7 0.65 JPEG 0.6 RKi−1 CSF−SVR 0.55 NL−SVR 0.5 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 Figure 5: Average rate distortion curves over eight standard images (Lena, Barbara, Boats, Einstein, Peppers, Mandrill, Goldhill, Camera man) using objective and subjective measures for the considered JPEG (dotted) and the SVM approaches (RKi-1 dash-dotted, CSF-SVR dashed and NL-SVR solid). RMSE distortion (top), Maximum Perceptual Error, MPE (middle) (Malo et al., 2000; G´ mez-P´ rez et al., 2005; Malo et al., 2006), and Structural o e SIMilarity index, SSIM (bottom) (Wang et al., 2004). 60 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING Rki Rki SVR+CSF SVR+CSF NL+SVR NL+SVR Figure 6: Examples of decoded Lena (left) and Barbara (right) images at 0.3 bits/pix. From top to bottom: JPEG, RKi-1, CSF-SVR, and NL-SVR. 61 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO (a) (b) (c) (d) Figure 7: Examples of decoded Barbara images at a high compression ratio of 0.05 bits/pix (160:1) for (a) JPEG, (b) RKi-1, (c) CSF-SVR, and (d) NL-SVR. of the distribution of the selected support vectors by the RKi-1 and the CSF-SVR models working in the linear DCT domain, and the NL-SVM working in the perceptually independent non-linear domain r. Specifically, a block of Barbara’s scarf at different compression ratios is used for illustration purposes. The RKi-1 approach (Robinson and Kecman, 2003) uses a constant ε but, in order to consider the low subjective relevance of the high-frequency region, the corresponding coefficients are neglected. As a result, this approach only allocates support vectors in the low/medium frequency regions. The CSF-SVR approach uses a variable ε according to the CSF and gives rise to a more natural concentration of support vectors in the low/medium frequency region, which captures medium to high frequency details at lower compression rates (0.5 bits/pix). Note that the number of support vectors is bigger than in the RKi-1 approach, but it selects some necessary high-frequency coefficients to keep the error below the selected threshold. However, for bigger compression ratios (0.3 bits/pix), it misrepresents some high frequency, yet relevant, features (e.g., the peak from the stripes). The NL-SVM approach works in the non-linear transform domain, in which a more uniform coverage 62 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING (a) (b) (c) 0 200 10 100 0 200 5 5 15 0 0 20 30 25 20 10 20 30 fy (cpd) 10 0 fy (cpd) 30 10 fx (cpd) 0 200 5 10 100 0 200 5 10 100 15 25 10 30 20 30 f (cpd) y 10 0 30 10 0.5 20 30 25 20 5 1 0 0 20 30 0 1.5 15 15 0 fy (cpd) 30 0 fx (cpd) 20 25 20 0 fx (cpd) fx (cpd) 20 30 25 20 30 10 0.5 15 0 5 1 10 100 15 0 1.5 25 20 f (cpd) y 10 30 fy (cpd) 0 0 fx (cpd) fx (cpd) Figure 8: Signal in different domains and the selected support vectors by the SVM models in a block of the Barbara image at 0.3 bits/pix (top row) and 0.5 bits/pix (bottom row). Different domains are analyzed: (a) linear DCT using RKi-1, (b) linear DCT with CSF-SVM, and (c) non-linear perceptual domain with standard ε-SVM (NL-SVR). of the domain is done, accounting for richer (and perceptually independent) coefficients to perform efficient sparse signal reconstruction. It is important to remark that, for a given method (or domain), tightening ε f implies (1) considering more support vectors, and (2) an increase in entropy (top and bottom rows in Figure 8, 0.3 bpp to 0.5 bpp). However, note that the relevant measure is the entropy and not the number of support vectors: even though the number of selected support vectors in the r domain is higher, their variance is lower, thus giving rise to the same entropy after entropy coding. 6. Conclusions In this paper, we have reported a condition on the suitable domain for developing efficient SVM image coding schemes. The so-called diagonal Jacobian condition states that SVM regression with scalar-wise error restriction in a particular domain makes sense only if the transform that maps this domain to an independent coefficient representation is locally diagonal. We have demonstrated that, 63 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO in general, linear domains do not fulfill this condition because non-trivial statistical and perceptual inter-coefficient relations do exist in these domains. This theoretical finding has been experimentally confirmed by observing that improved compression results are obtained when SVM is applied in a non-linear perceptual domain that starts from the same linear domain used by previously reported SVM-based image coding schemes. These results highlight the relevance of an appropriate image representation choice before SVM learning. Further work is tied to the use of SVM-based coding schemes in statistically, rather than perceptually, independent non-linear ICA domains. In order to do so, local PCA instead of local ICA may be used in the local-to-global differential approach (Malo and Guti´ rrez, 2006) to speed up the e non-linear computation. Acknowledgments This work has been partly supported by the Spanish Ministry of Education and Science under grant CICYT TEC2006-13845/TCM and by the Generalitat Valenciana under grant GV-06/215. References R. Ahmed. Wavelet-based image compression using SVM learning and encoding techniques. Proc. 8th IASTED International Conference Computer Graphics and Imaging, 1:162–166, 2005. H. B. Barlow. Redundancy reduction revisited. Network: Comp. Neur. Sys., 12:241–253, 2001. H. B. Barlow. Possible principles underlying the transformation of sensory messages. In WA Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. A. J. Bell and T. J. Sejnowski. The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997. R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8(12):1688–1701, 1999. F. W. Campbell and J. G. Robson. Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3):551–566, August 1968. G. Camps-Valls, E. Soria-Olivas, J. P´ rez-Ruixo, A. Art´ s-Rodr´guez, F. P´ rez-Cruz, and e e ı e A. Figueiras-Vidal. A profile-dependent kernel-based regression for cyclosporine concentration prediction. In Neural Information Processing Systems (NIPS) – Workshop on New Directions in Kernel-Based Learning Methods, Vancouver, Canada, December 2001. R. J. Clarke. Transform Coding of Images. Academic Press, New York, 1985. I. Epifanio, J. Guti´ rrez, and J. Malo. Linear transform for simultaneous diagonalization of coe variance and perceptual metric matrix in image coding. Pattern Recognition, 36(8):1799–1811, August 2003. J. M. Foley. Human luminance pattern mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11(6):1710–1719, 1994. 64 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Press, Boston, 1992. B. Girod. What’s wrong with mean-squared error. In A. B. Watson, editor, Digital Images and Human Vision, pages 207–220. The MIT press, 1993. G. G´ mez-P´ rez, G. Camps-Valls, J. Guti´ rrez, and J. Malo. Perceptually adaptive insensitivity for o e e support vector machine image coding. IEEE Transactions on Neural Networks, 16(6):1574–1581, 2005. J. Guti´ rrez, F. J. Ferri, and J. Malo. Regularization operators for natural images based on nonlinear e perception models. IEEE Transactions on Image Processing, 15(1):189–2000, January 2006. D. J. Heeger. Normalization of cell responses in cat striate cortex. Vis. Neurosci., 9:181–198, 1992. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, New York, 2001. A. Hyvarinen, J. Hurri, and J. Vayrynenm. Bubbles: a unifying framework for low-level statistical properties of natural image sequences. Journal of the Optical Society of America A, 20(7), 2003. R. Jiao, Y. Li, Q. Wang, and B. Li. SVM regression and its application to image compression. In International Conference on Intelligent Computing, volume 3645, pages 747–756. Lecture Notes on Computer Science, 2005. C. Jutten and J. Karhunen. Advances in nonlinear blind source separation. Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 245– 256, 2003. J. Karhunen, S. Malaroiu, and M. Ilmoniemi. Local linear independent component analysis based on clustering. Intl. J. Neur. Syst., 10(6), December 2000. J. K. Lin. Factorizing probability density functions: Generalizing ICA. In Proc. of the First Intl. Workshop on ICA and Signal Separation, pages 313–318, Aussois, France, 1999. J. Malo and J. Guti´ rrez. V1 non-linear properties emerge from local-to-global non-linear ICA. e Network: Computation in Neural Systems, 17:85–102, 2006. J. Malo, A. M. Pons, A. Felipe, and J. M. Artigas. Characterization of human visual system threshold performance by a weighting function in the Gabor domain. Journal of Modern Optics, 44(1): 127–148, 1997. J. Malo, F. Ferri, J. Albert, J. Soret, and J. M. Artigas. The role of perceptual contrast non-linearities in image transform coding. Image & Vision Computing, 18(3):233–246, February 2000. J. Malo, J. Guti´ rrez, I. Epifanio, F. Ferri, and J. M. Artigas. Perceptual feed-back in multigrid moe tion estimation using an improved DCT quantization. IEEE Transactions on Image Processing, 10(10):1411–1427, October 2001. J. Malo, I. Epifanio, R. Navarro, and E. Simoncelli. Non-linear image representation for efficient perceptual coding. IEEE Transactions on Image Processing, 15(1):68–80, 2006. 65 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996. F. P´ rez-Cruz, G. Camps-Valls, E. Soria-Olivas, J. J. P´ rez-Ruixo, A. R. Figueiras-Vidal, and e e A. Art´ s-Rodr´guez. Multi-dimensional function approximation and regression estimation. In e ı International Conference on Artificial Neural Networks, ICANN2002, Madrid, Spain., Aug 2002. Lecture Notes in Computer Science. Springer–Verlag. J. Robinson and V. Kecman. The use of Support Vector Machines in image compression. In Proceedings of the International Conference on Engineering Intelligence Systems, EIS’2000, volume 36, pages 93–96, University of Paisley, Scotland, U.K., June 2000. J. Robinson and V. Kecman. Combining Support Vector Machine learning with the discrete cosine transform in image compression. IEEE Transactions Neural Networks, 14(4):950–958, July 2003. O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, 2001. E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13(2):144–149, 2003. E. P. Simoncelli. Statistical models for images: Compression, restoration and synthesis. In 31st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1997. E. P. Simoncelli and B. O. Olshausen. Natural image statistics and neural representation. Annual Review of Neuroscience, 24:1193–1216, 2001. A. J. Smola and B. Sch¨ lkopf. A tutorial on support vector regression. Statistics and Computing, o 14:199–222, 2004. D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Boston, 2001. P. C. Teo and D. J. Heeger. Perceptual image distortion. Proceedings of the First IEEE International Conference on Image Processing, 2:982–986, 1994. M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in analyzing and modeling natural images. Applied and Computational Harmonic Analysis, 11:89–123, 2001. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. A. B. Watson and J. Malo. Video quality measures based on the standard spatial observer. In Proceedings of the IEEE International Conference on Image Proceedings, volume 3, pages 41– 44, 2002. A. B. Watson and J. A. Solomon. A model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A, 14(9):2379–2391, September 1997. 66
2 0.57163852 60 jmlr-2008-Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
Author: Kun Zhang, Laiwan Chan
Abstract: It is well known that solutions to the nonlinear independent component analysis (ICA) problem are highly non-unique. In this paper we propose the “minimal nonlinear distortion” (MND) principle for tackling the ill-posedness of nonlinear ICA problems. MND prefers the nonlinear ICA solution with the estimated mixing procedure as close as possible to linear, among all possible solutions. It also helps to avoid local optima in the solutions. To achieve MND, we exploit a regularization term to minimize the mean square error between the nonlinear mixing mapping and the best-fitting linear one. The effect of MND on the inherent trivial and non-trivial indeterminacies in nonlinear ICA solutions is investigated. Moreover, we show that local MND is closely related to the smoothness regularizer penalizing large curvature, which provides another useful regularization condition for nonlinear ICA. Experiments on synthetic data show the usefulness of the MND principle for separating various nonlinear mixtures. Finally, as an application, we use nonlinear ICA with MND to separate daily returns of a set of stocks in Hong Kong, and the linear causal relations among them are successfully discovered. The resulting causal relations give some interesting insights into the stock market. Such a result can not be achieved by linear ICA. Simulation studies also verify that when doing causality discovery, sometimes one should not ignore the nonlinear distortion in the data generation procedure, even if it is weak. Keywords: nonlinear ICA, regularization, minimal nonlinear distortion, mean square error, best linear reconstruction
3 0.27028906 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
Author: Ja-Yong Koo, Yoonkyung Lee, Yuwon Kim, Changyi Park
Abstract: The support vector machine has been successful in a variety of applications. Also on the theoretical front, statistical properties of the support vector machine have been studied quite extensively with a particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of the representation. These asymptotic results do not only help further our understanding of the support vector machine, but also they can be useful for related statistical inferences. Keywords: asymptotic normality, Bahadur representation, classification, convexity lemma, Radon transform
4 0.2330211 87 jmlr-2008-Stationary Features and Cat Detection
Author: François Fleuret, Donald Geman
Abstract: Most discriminative techniques for detecting instances from object categories in still images consist of looping over a partition of a pose space with dedicated binary classiÄ?Ĺš ers. The efÄ?Ĺš ciency of this strategy for a complex pose, that is, for Ä?Ĺš ne-grained descriptions, can be assessed by measuring the effect of sample size and pose resolution on accuracy and computation. Two conclusions emerge: (1) fragmenting the training data, which is inevitable in dealing with high in-class variation, severely reduces accuracy; (2) the computational cost at high resolution is prohibitive due to visiting a massive pose partition. To overcome data-fragmentation we propose a novel framework centered on pose-indexed features which assign a response to a pair consisting of an image and a pose, and are designed to be stationary: the probability distribution of the response is always the same if an object is actually present. Such features allow for efÄ?Ĺš cient, one-shot learning of pose-speciÄ?Ĺš c classiÄ?Ĺš ers. To avoid expensive scene processing, we arrange these classiÄ?Ĺš ers in a hierarchy based on nested partitions of the pose; as in previous work on coarse-to-Ä?Ĺš ne search, this allows for efÄ?Ĺš cient processing. The hierarchy is then â€?foldedâ€? for training: all the classiÄ?Ĺš ers at each level are derived from one base predictor learned from all the data. The hierarchy is â€?unfoldedâ€? for testing: parsing a scene amounts to examining increasingly Ä?Ĺš ner object descriptions only when there is sufÄ?Ĺš cient evidence for coarser ones. In this way, the detection results are equivalent to an exhaustive search at high resolution. We illustrate these ideas by detecting and localizing cats in highly cluttered greyscale scenes. Keywords: supervised learning, computer vision, image interpretation, cats, stationary features, hierarchical search
5 0.19368397 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
Author: Sungwook Yoon, Alan Fern, Robert Givan
Abstract: A number of today’s state-of-the-art planners are based on forward state-space search. The impressive performance can be attributed to progress in computing domain independent heuristics that perform well across many domains. However, it is easy to find domains where such heuristics provide poor guidance, leading to planning failure. Motivated by such failures, the focus of this paper is to investigate mechanisms for learning domain-specific knowledge to better control forward search in a given domain. While there has been a large body of work on inductive learning of control knowledge for AI planning, there is a void of work aimed at forward-state-space search. One reason for this may be that it is challenging to specify a knowledge representation for compactly representing important concepts across a wide range of domains. One of the main contributions of this work is to introduce a novel feature space for representing such control knowledge. The key idea is to define features in terms of information computed via relaxed plan extraction, which has been a major source of success for non-learning planners. This gives a new way of leveraging relaxed planning techniques in the context of learning. Using this feature space, we describe three forms of control knowledge—reactive policies (decision list rules and measures of progress) and linear heuristics—and show how to learn them and incorporate them into forward state-space search. Our empirical results show that our approaches are able to surpass state-of-the-art nonlearning planners across a wide range of planning competition domains. Keywords: planning, machine learning, knowledge representation, search
6 0.18471563 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
7 0.16716342 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
8 0.16024837 58 jmlr-2008-Max-margin Classification of Data with Absent Features
9 0.15302353 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
10 0.12251574 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
11 0.12244353 54 jmlr-2008-Learning to Select Features using their Properties
12 0.11989751 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
13 0.11741796 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
14 0.11434177 86 jmlr-2008-SimpleMKL
15 0.11325319 4 jmlr-2008-A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters
16 0.10247225 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
17 0.098692946 7 jmlr-2008-A Tutorial on Conformal Prediction
18 0.096055828 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
19 0.095423475 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
20 0.095405303 15 jmlr-2008-An Information Criterion for Variable Selection in Support Vector Machines (Special Topic on Model Selection)
topicId topicWeight
[(0, 0.021), (5, 0.034), (9, 0.014), (31, 0.01), (40, 0.035), (54, 0.023), (58, 0.022), (66, 0.039), (72, 0.562), (76, 0.011), (88, 0.081), (92, 0.022), (94, 0.034), (99, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.82527381 73 jmlr-2008-On the Suitable Domain for SVM Training in Image Coding
Author: Gustavo Camps-Valls, Juan Gutiérrez, Gabriel Gómez-Pérez, Jesús Malo
Abstract: Conventional SVM-based image coding methods are founded on independently restricting the distortion in every image coefficient at some particular image representation. Geometrically, this implies allowing arbitrary signal distortions in an n-dimensional rectangle defined by the ε-insensitivity zone in each dimension of the selected image representation domain. Unfortunately, not every image representation domain is well-suited for such a simple, scalar-wise, approach because statistical and/or perceptual interactions between the coefficients may exist. These interactions imply that scalar approaches may induce distortions that do not follow the image statistics and/or are perceptually annoying. Taking into account these relations would imply using non-rectangular εinsensitivity regions (allowing coupled distortions in different coefficients), which is beyond the conventional SVM formulation. In this paper, we report a condition on the suitable domain for developing efficient SVM image coding schemes. We analytically demonstrate that no linear domain fulfills this condition because of the statistical and perceptual inter-coefficient relations that exist in these domains. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains and in a recently proposed non-linear perceptual domain that simultaneously reduces the statistical and perceptual relations (so it is closer to fulfilling the proposed condition). These results highlight the relevance of an appropriate choice of the image representation before SVM learning. Keywords: image coding, non-linear perception models, statistical independence, support vector machines, insensitivity zone c 2008 Gustavo Camps-Valls, Juan Guti´ rrez, Gabriel G´ mez-P´ rez and Jes´ s Malo. e o e u ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 1. Problem Statement: The Diagonal Jacobian Condition Image coding schemes based on support vector machines (SVM) have been successfully introduced in the literature. SVMs have been used in the spatial domain (Robinson and Kecman, 2000), in the block-DCT domain (Robinson and Kecman, 2003), and in the wavelet domain (Ahmed, 2005; Jiao et al., 2005). These coding methods take advantage of the ability of the support vector regression (SVR) algorithm for function approximation using a small number of parameters (signal samples, or ¨ support vectors) (Smola and Scholkopf, 2004). In all current SVM-based image coding techniques, a representation of the image is described by the entropy-coded weights associated to the support vectors necessary to approximate the signal with a given accuracy. Relaxing the accuracy bounds reduces the number of needed support vectors. In a given representation domain, reducing the number of support vectors increases the compression ratio at the expense of bigger distortion (lower image quality). By applying the standard SVR formulation, a certain amount of distortion in each sample of the image representation is allowed. In the original formulation, scalar restrictions on the errors are introduced using a constant ε-insensitivity value for every sample. ´ Recently, this procedure has been refined by Gomez-P´ rez et al. (2005) using a profile-dependent e SVR (Camps-Valls et al., 2001) that considers a different ε for each sample or frequency. This frequency-dependent insensitivity, ε f , accounts for the fact that, according to simple (linear) perception models, not every sample in linear frequency domains (such as DCT or wavelets) contributes to the perceived distortion in the same way. Despite different domains have been proposed for SVM training (spatial domain, block-DCT and wavelets) and different ε insensitivities per sample have been proposed, in conventional SVR formulation, the particular distortions introduced by regression in the different samples are not coupled. In all the reported SVM-based image coding schemes, the RBF kernel is used and the penalization parameter is fixed to an arbitrarily large value. In this setting, considering n-sample signals as n-dimensional vectors, the SVR guarantees that the approximated vectors are confined in n-dimensional rectangles around the original vectors. These rectangles are just n-dimensional cubes in the standard formulation or they have certain elongation if different ε f are considered in each axis, f . Therefore, in all the reported SVM-based coding methods, these rectangles are always oriented along the axes of the (linear) image representation. According to this, a common feature of these (scalar-wise) approaches is that they give rise to decoupled distortions in each dimension. P´ rez-Cruz et al. (2002) proposed a hyperspherical insensitivity zone to correct the penalization e factor in each dimension of multi-output regression problems, but again, restrictions to each sample were still uncoupled. This scalar-wise strategy is not the best option in domains where the different dimensions of the image representation are not independent. For instance, consider the situation where actually independent components, r f , are obtained from a given image representation, y, applying some eventually non-linear transform, R: R y −→ r. In this case, SVM regression with scalar-wise error restriction makes sense in the r domain. However, the original y domain will not be suitable for the standard SVM regression unless the matrix ∇R is diagonal (up to any permutation of the dimensions, that is, only one non-zero element per row). Therefore, if transforms that achieve independence have non-diagonal Jacobian, scalar-wise restrictions in the original (coupled coefficients) domain y are not allowed. 50 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING y2 1 −0,5 −0,5 0,4 R= r2 y1 r1 R−1 = 0,83 −0,83 0,67 −1,67 Figure 1: Insensitivity regions in different representation domains, y (left) and r (right), related by a non-diagonal transform ∇R and its inverse ∇R−1 . Figure 1 illustrates this situation. The shaded region in the right plot (r domain) represents the n-dimensional box determined by the ε f insensitivities in each dimension ( f =1,2), in which a scalar-wise approach is appropriate due to independence among signal coefficients. Given that the particular ∇R transform is not diagonal, the corresponding shaded region in the left plot (the original y domain) is not aligned along the axes of the representation. This has negative implications: note that for the highlighted points, smaller distortions in both dimensions in the y domain (as implied by SVM with tighter but scalar ε f insensitivities) do not necessarily imply lying inside the insensitivity region in the final truly independent (and meaningful) r domain. Therefore, the original y domain is not suitable for the direct application of conventional SVM, and consequently, non-trivial coupled insensitivity regions are required. Summarizing, in the image coding context, the condition for an image representation y to be strictly suitable for conventional SVM learning is that the transform that maps the original representation y to an independent coefficient representation r must be locally diagonal. As will be reviewed below, independence among coefficients (and the transforms to obtain them) may be defined in both statistical and perceptual terms (Hyvarinen et al., 2001; Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). On the one hand, a locally diagonal relation to a statistically independent representation is desirable because independently induced distortions (as the conventional SVM approach does) will preserve the statistics of the distorted signal, that is, it will not introduce artificial-looking artifacts. On the other hand, a locally diagonal relation to a perceptually 51 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO independent representation is desirable because independently induced distortions do not give rise to increased subjective distortions due to non-trivial masking or facilitation interactions between the distortions in each dimension (Watson and Solomon, 1997). In this work, we show that conventional linear domains do not fulfill the diagonal Jacobian condition in either the statistical case or in the perceptual case. This theoretical result is experimentally confirmed by comparing SVM learning in previously reported linear domains (Robinson and Kecman, 2003; G´ mez-P´ rez et al., 2005) and in a recently proposed non-linear perceptual domain o e that simultaneously reduces the statistical and the perceptual relations (Malo et al., 2006), thus, this non-linear perceptual domain is closer to fulfilling the proposed condition. The rest of the paper is structured as follows. Section 2 reviews the fact that linear coefficients of the image representations commonly used for SVM training are neither statistically independent nor perceptually independent. Section 3 shows that transforms for obtaining statistical and/or perceptual independence from linear domains have non-diagonal Jacobian. This suggests that there is room to improve the performance of conventional SVM learning reported in linear domains. In Section 4, we propose the use of a perceptual representation for SVM training because it strictly fulfills the diagonal Jacobian condition in the perceptual sense and increases the statistical independence among coefficients, bringing it closer to fulfilling the condition in the statistical sense. The experimental image coding results confirm the superiority of this domain for SVM training in Section 5. Section 6 presents the conclusions and final remarks. 2. Statistical and Perceptual Relations Among Image Coefficients Statistical independence among the coefficients of a signal representation refers to the fact that the joint PDF of the class of signals to be considered can be expressed as a product of the marginal PDFs in each dimension (Hyvarinen et al., 2001). Simple (second-order) descriptions of statistical dependence use the non-diagonal nature of the covariance matrix (Clarke, 1985; Gersho and Gray, 1992). More recent and accurate descriptions use higher-order moments, mutual information, or the non-Gaussian nature (sparsity) of marginal PDFs (Hyvarinen et al., 2001; Simoncelli, 1997). Perceptual independence refers to the fact that the visibility of errors in coefficients of an image may depend on the energy of neighboring coefficients, a phenomenon known in the perceptual literature as masking or facilitation (Watson and Solomon, 1997). Perceptual dependence has been formalized just up to second order, and this may be described by the non-Euclidean nature of the perceptual metric matrix (Malo et al., 2001; Epifanio et al., 2003; Malo et al., 2006). 2.1 Statistical Relations In recent years, a variety of approaches, known collectively as “independent component analysis” (ICA), have been developed to exploit higher-order statistics for the purpose of achieving a unique linear solution for coefficient independence (Hyvarinen et al., 2001). The basis functions obtained when these methods are applied to images are spatially localized and selective for both orientation and spatial frequency (Olshausen and Field, 1996; Bell and Sejnowski, 1997). Thus, they are similar to basis functions of multi-scale wavelet representations. Despite its name, linear ICA does not actually produce statistically independent coefficients when applied to photographic images. Intuitively, independence would seem unlikely, since images are not formed from linear superpositions of independent patterns: the typical combination rule for the elements of an image is occlusion. Empirically, the coefficients of natural image decom52 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING | f | = 10.8 cpd | f | = 24.4 cpd −20 −10 −10 fy (cpd) −30 −20 fy (cpd) −30 0 0 10 10 20 20 30 30 −30 −20 −10 0 f (cpd) 10 20 30 −30 x −20 −10 0 f (cpd) 10 20 30 x Figure 2: Statistical interaction of two particular coefficients of the local Fourier Transform with their neighbors in a natural image database. The absolute value of the frequency of these coefficients is | f | = 10.8 and | f | = 24.4 cycles/degree (cpd). positions in spatially localized oscillating basis functions are found to be fairly well decorrelated (i.e., their covariance is almost zero). However, the amplitudes of coefficients at nearby spatial positions, orientations, and scales are highly correlated (even with orthonormal transforms) (Simoncelli, ´ 1997; Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003; Guti errez et al., 2006; Malo et al., 2006; Malo and Guti´ rrez, 2006). This suggests that achieving statistical e independence requires the introduction of non-linearities beyond linear ICA transforms. Figure 2 reproduces one of many results that highlight the presence of statistical relations of natural image coefficients in block PCA or linear ICA-like domains: the energy of spatially localized oscillating filters is correlated with the energy of neighboring filters in scale and orientation (see Guti´ rrez et al., 2006). A remarkable feature is that the interaction width increases with frequency, e as has been reported in other domains, for example, wavelets (Buccigrossi and Simoncelli, 1999; Wainwright et al., 2001; Hyvarinen et al., 2003), and block-DCT (Malo et al., 2006). In order to remove the remaining statistical relations in the linear domains y, non-linear ICA methods are necessary (Hyvarinen et al., 2001; Lin, 1999; Karhunen et al., 2000; Jutten and Karhunen, 2003). Without lack of generality, non-linear ICA transforms can be schematically understood as a two-stage process (Malo and Guti´ rrez, 2006): e T R (( y hh x hh (( r, (1) R−1 T−1 where x is the image representation in the spatial domain, and T is a global unitary linear transform that removes second-order and eventually higher-order relations among coefficients in the spatial domain. Particular examples of T include block PCA, linear ICAs, DCT or wavelets. In the ICA literature notation, T is the separating matrix and T−1 is the mixing matrix. The second transform 53 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO R is an additional non-linearity that is introduced in order to remove the statistical relations that still remain in the y domain. 2.2 Perceptual Relations Perceptual dependence among coefficients in different image representations can be understood by using the current model of V1 cortex. This model can also be summarized by the two-stage (linear and non-linear) process described in Equation (1). In this perceptual case, T is also a linear filter bank applied to the original input image in the spatial domain. This filter bank represents the linear behavior of V1 neurons whose receptive fields happen to be similar to wavelets or linear ICA basis functions (Olshausen and Field, 1996; Bell and Sejnowski, 1997). The second transform, R, is a non-linear function that accounts for the masking and facilitation phenomena that have been reported in the linear y domain (Foley, 1994; Watson and Solomon, 1997). Section 3.2 gives a parametric expression for the second non-linear stage, R: the divisive normalization model (Heeger, 1992; Foley, 1994; Watson and Solomon, 1997). This class of models is based on psychophysical experiments assuming that the last domain, r, is perceptually Euclidean (i.e., perfect perceptual independence). An additional confirmation of this assumption is the success of (Euclidean) subjective image distortion measures defined in that domain (Teo and Heeger, 1994). Straightforward application of Riemannian geometry to obtain the perceptual metric matrix in other domains shows that the coefficients of linear domains x and y, or any other linear transform of them, are not perceptually independent (Epifanio et al., 2003). Figure 3 illustrates the presence of perceptual relations between coefficients when using linear block frequency or wavelet-like domains, y: the cross-masking behavior. In this example, the visibility of the distortions added on top of the background image made of periodic patterns has to be assessed. This is a measure of the sensitivity of a particular perceptual mechanism to distortions in that dimension, ∆y f , when mechanisms tuned to other dimensions are simultaneously active, that is, y f = 0, with f = f . As can be observed, low frequency noise is more visible in high frequency backgrounds than in low frequency backgrounds (e.g., left image). Similarly, high frequency noise is more visible in low frequency backgrounds than in high frequency ones (e.g., right image). That is to say, a signal of a specific frequency strongly masks the corresponding frequency analyzer, but it induces a smaller sensitivity reduction in the analyzers that are tuned to different frequencies. In other words, the reduction in sensitivity of a specific analyzer gets larger as the distance between the background frequency and the frequency of the analyzer gets smaller. The response of each frequency analyzer not only depends on the energy of the signal for that frequency band, but also on the energy of the signal in other frequency bands (cross-masking). This implies that a different amount of noise in each frequency band may be acceptable depending on the energy of that frequency band and on the energy of neighboring bands. This is what we have called perceptual dependence among different coefficients in the y domain. At this point, it is important to stress the similarity between the set of computations to obtain statistically decoupled image coefficients and the known stages of biological vision. In fact, it has been hypothesized that biological visual systems have organized their sensors to exploit the particular statistics of the signals they have to process. See Barlow (2001), Simoncelli and Olshausen (2001), and Simoncelli (2003) for reviews on this hypothesis. In particular, both the linear and the non-linear stages of the cortical processing have been successfully derived using redundancy reduction arguments: nowadays, the same class of linear 54 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 3 cpd 6 cpd 12 cpd 24 cpd Figure 3: Illustrative example of perceptual dependence (cross-masking phenomenon). Equal energy noise of different frequency content, 3 cycl/deg (cpd), 6 cpd, 12 cpd and 24 cpd, shown on top of a background image. Sampling frequency assumes that these images subtend an angle of 3 deg. stage T is used in transform coding algorithms and in vision models (Olshausen and Field, 1996; Bell and Sejnowski, 1997; Taubman and Marcellin, 2001), and new evidence supports the same idea for the second non-linear stage (Schwartz and Simoncelli, 2001; Malo and Guti´ rrez, 2006). e According to this, the statistical and perceptual transforms, R, that remove the above relations from the linear domains, y, would be very similar if not the same. 3. Statistical and Perceptual Independence Imply Non-diagonal Jacobian In this section, we show that both statistical redundancy reduction transforms (e.g., non-linear ICA) and perceptual independence transforms (e.g., divisive normalization), have non-diagonal Jacobian for any linear image representation, so they are not strictly suitable for conventional SVM training. 3.1 Non-diagonal Jacobian in Non-linear ICA Transforms One possible approach for dealing with global non-linear ICA is to act differentially by breaking the problem into local linear pieces that can then be integrated to obtain the global independent coefficient domain (Malo and Guti´ rrez, 2006). Each differential sub-problem around a particular e point (image) can be locally solved using the standard linear ICA methods restricted to the neighbors of that point (Lin, 1999). Using the differential approach in the context of a two-stage process such as the one in Equation (1), it can be shown that (Malo and Guti´ rrez, 2006): e r = r0 + Z x x0 T (x ) dx = r0 + Z x x0 ∇R(Tx ) T dx , (2) where T (x ) is the local separating matrix for a neighborhood of the image x , and T is the global separating matrix for the whole PDF. Therefore, the Jacobian of the second non-linear stage is: ∇R(y) = ∇R(Tx) = T (x) T−1 . 55 (3) ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO As local linear independent features around a particular image, x, differ in general from global linear independent features, that is, T (x) = T, the above product is not the identity nor diagonal in general. 3.2 Non-diagonal Jacobian in Non-linear Perceptual Transforms The current response model for the cortical frequency analyzers is non-linear (Heeger, 1992; Watson and Solomon, 1997). The outputs of the filters of the first linear stage, y, undergo a non-linear sigmoid transform in which the energy of each linear coefficient is weighted by a linear Contrast Sensitivity Function (CSF) (Campbell and Robson, 1968; Malo et al., 1997) and is further normalized by a combination of the energies of neighbor coefficients in frequency, r f = R(y) f = sgn(y f ) |α f y f |γ , β f + ∑n =1 h f f |α f y f |γ f (4) where α f (Figure 4[top left]) are CSF-like weights, β f (Figure 4[top right]) control the sharpness of the response saturation for each coefficient, γ is the so called excitation exponent, and the matrix h f f determines the interaction neighborhood in the non-linear normalization of the energy. This interaction matrix models the cross-masking behavior (cf. Section 2.2). The interaction in this matrix is assumed to be Gaussian (Watson and Solomon, 1997), and its width increases with the frequency. Figure 4[bottom] shows two examples of this Gaussian interaction for two particular coefficients in a local Fourier domain. Note that the width of the perceptual interaction neighborhood increases with the frequency in the same way as the width of the statistical interaction neighborhood shown in Figure 2. We used a value of γ = 2 in the experiments. Taking derivatives in the general divisive normalization model, Equation (4), we obtain ∇R(y) f f = sgn(y f )γ α f |α f y f |γ |α f y f |γ−1 α f |α f y f |γ−1 δf f − hf f β f + ∑n =1 h f f |α f y f |γ (β f + ∑n =1 h f f |α f y f |γ )2 f f , (5) which is not diagonal because of the interaction matrix, h, which describes the cross-masking between each frequency f and the remaining f = f . Note that the intrinsic non-linear nature of both the statistical and perceptual transforms, Equations (3) and (5), makes the above results true for any linear domain under consideration. Specifically, if any other possible linear domain for image representation is considered, y = T y, then the Jacobian of the corresponding independence transform, R , is ∇R (y ) = ∇R(y) T −1 , which, in general, will also be non-diagonal because of the non-diagonal and point-dependent nature of ∇R(y). To summarize, since no linear domain fulfills the diagonal Jacobian condition in either statistical or perceptual terms, the negative situation illustrated in Figure 1 may occur when using SVM in these domains. Therefore, improved results could be obtained if SVM learning were applied after some transform achieving independent coefficients, R. 4. SVM Learning in a Perceptually Independent Representation In order to confirm the above theoretical results (i.e., the unsuitability of linear representation domains for SVM learning) and to assess the eventual gain that can be obtained from training SVR 56 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 0.04 αf 0.015 βf 0.03 0.012 0.02 0.009 0.006 0.01 0.003 0 0 4 8 12 16 20 24 28 32 0 f (frequency cycles/degree) 4 8 | f | = 10.8 cpd 16 20 24 28 32 | f | = 24.4 cpd −30 −20 −20 −10 −10 f (cpd) −30 0 y 0 y f (cpd) 12 f (frequency cycles/degree) 10 10 20 20 30 −30 −20 −10 0 fx (cpd) 10 20 30 −30 30 −20 −10 0 fx (cpd) 10 20 30 Figure 4: Parameters of the perceptual model: α f (top left), β f (top right). Bottom figures represent perceptual interaction neighborhoods h f f of two particular coefficients of the local Fourier domain. in a more appropriate domain, we should compare the performance of SVRs in previously reported linear domains (e.g., block-DCT or wavelets) and in one of the proposed non-linear domains (either the statistically independent domain or the perceptually independent domain). Exploration of the statistical independence transform may have academic interest but, in its present formulation, it is not practical for coding purposes: direct application of non-linear ICA as in Equation (2) is very time-consuming for high dimensional vectors since lots of local ICA computations are needed to transform each block, and a very large image database is needed for a robust and significant computation of R. Besides, an equally expensive differential approach is also needed to compute the inverse R−1 for image decoding. In contrast, the perceptual non-linearity (and its inverse) are analytical. These analytical expressions are feasible for reasonable block sizes, and there are efficient iterative methods that can be used for larger vectors (Malo et al., 2006). In this paper, we explore the use of a psychophysically-based divisive normalized domain: first compute a block-DCT transform and then apply the divisive normalization model described above for each block. The results will be compared to the first competitive SVM coding results (Robinson 57 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO ´ and Kecman, 2003) and the posterior improvements reported by G omez-P´ rez et al. (2005), both e formulated in the linear block-DCT domain. As stated in Section 2, by construction, the proposed domain is perceptually Euclidean with perceptually independent components. The Euclidean nature of this domain has an additional benefit: the ε-insensitivity design is very simple because a constant value is appropriate due to the constant perceptual relevance of all coefficients. Thus, direct application of the standard SVR method is theoretically appropriate in this domain. Moreover, beyond its built-in perceptual benefits, this psychophysically-based divisive normalization has attractive statistical properties: it strongly reduces the mutual information between the final coefficients r (Malo et al., 2006). This is not surprising according to the hypothesis that try to explain the early stages of biological vision systems using information theory arguments (Barlow, 1961; Simoncelli and Olshausen, 2001). Specifically, dividing the energy of each linear coefficient by the energy of the neighbors, which are statistically related with it, cf. Figure 2, gives coefficients with reduced statistical dependence. Moreover, as the empirical non-linearities of perception have been reproduced using non-linear ICA in Equation (2) (Malo and Guti´ rrez, 2006), the empirical die visive normalization can be seen as a convenient parametric way to obtain statistical independence. 5. Performance of SVM Learning in Different Domains In this section, we analyze the performance of SVM-based coding algorithms in linear and nonlinear domains through rate-distortion curves and explicit examples for visual comparison. In addition, we discuss how SVM selects support vectors in these domains to represent the image features. 5.1 Model Development and Experimental Setup In the (linear) block-DCT domain, y, we use the method introduced by Robinson and Kecman (2003) (RKi-1), in which the SVR is trained to learn a fixed (low-pass) number of DCT coefficients ´ (those with frequency bigger than 20 cycl/deg are discarded); and the method proposed by G omezP´ rez et al. (2005) (CSF-SVR), in which the relevance of all DCT coefficients is weighted according e to the CSF criterion using an appropriately modulated ε f . In the non-linear domain, r, we use the SVR with constant insensitivity parameter ε (NL-SVR). In all cases, the block-size is 16×16, that is, y, r ∈ R256 . The behavior of JPEG standard is also included in the experiments for comparison purposes. As stated in Section 1, we used the RBF kernel and arbitrarily large penalization parameter in every SVR case. In all experiments, we trained the SVR models without the bias term, and modelled the absolute value of the DCT, y, or response coefficients, r. All the remaining free parameters (ε-insensitivity and Gaussian width of the RBF kernel σ) were optimized for all the considered models and different compression ratios. In the NL-SVM case, the parameters of the divisive normalization used in the experiments are shown in Figure 4. After training, the signal is described by the uniformly quantized Lagrange multipliers of the support vectors needed to keep the regression error below the thresholds ε f . The last step is entropy coding of the quantized weights. The compression ratio is controlled by a factor applied to the thresholds, ε f . 58 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING 5.2 Model Comparison In order to assess the quality of the coded images, three different measures were used: the standard ´ (Euclidean) RMSE, the Maximum Perceptual Error (MPE) (Malo et al., 2000; G omez-P´ rez et al., e 2005; Malo et al., 2006) and the also perceptually meaningful Structural SIMilarity (SSIM) index (Wang et al., 2004). Eight standard 256×256 monochrome 8 bits/pix images were used in the experiments. Average rate-distortion curves are plotted in Figure 5 in the range [0.05, 0.6] bits/pix (bpp). According to these entropy-per-sample data, original file size was 64 KBytes in every case, while the compressed image sizes were in the range [0.4, 4.8] KBytes. This implies that the compression ratios were in the range [160:1, 13:1]. In general, a clear gain over standard JPEG is obtained by all SVM-based methods. According to the standard Euclidean MSE point of view, the performance of RKi-1 and CSF-SVR algorithms is basically the same (note the overlapped curves in Figure 5(a)). However, it is widely known that the MSE results are not useful to represent the subjective quality of images, as extensively reported elsewhere (Girod, 1993; Teo and Heeger, 1994; Watson and Malo, 2002). When using more appropriate (perceptually meaningful) quality measures (Figures 5(b)-(c)), the CSF-SVR obtains a certain advantage over the RKi-1 algorithm for all compression rates, which was already reported by G´ mez-P´ rez et al. (2005). In all measures, and for the whole considered entropy range, the o e proposed NL-SVR clearly outperforms all previously reported methods, obtaining a noticeable gain at medium-to-high compression ratios (between 0.1 bpp (80:1) and 0.3 bpp (27:1)). Taking into account that the recommended bit rate for JPEG is about 0.5 bpp, from Figure 5 we can also conclude that the proposed technique achieves the similar quality levels at a lower bit rate in the range [0.15, 0.3] bpp. Figure 6 shows representative visual results of the considered SVM strategies on standard images (Lena and Barbara) at the same bit rate (0.3 bpp, 27:1 compression ratio or 2.4 KBytes in 256×256 images). The visual inspection confirms that the numerical gain in MPE and SSIM shown in Figure 5 is also perceptually significant. Some conclusions can be extracted from this figure. ´ First, as previously reported by Gomez-P´ rez et al. (2005), RKi-1 leads to poorer (blocky) results e because of the crude approximation of the CSF (as an ideal low-pass filter) and the equal relevance applied to the low-frequency DCT-coefficients. Second, despite the good performance yielded by the CSF-SVR approach to avoid blocking effects, it is worth noting that high frequency details are smoothed (e.g., see Barbara’s scarf). These effects are highly alleviated by introducing SVR in the non-linear domain. See, for instance, Lena’s eyes, her hat’s feathers or the better reproduction of the high frequency pattern in Barbara’s clothes. Figure 7 shows the results obtained by all considered methods at a very high compression ratio for the Barbara image (0.05 bpp, 160:1 compression ratio or 0.4 KBytes in 256×256 images). This experiment is just intended to show the limits of methods performance since it is out of the recommended rate ranges. Even though this scenario is unrealistic, differences among methods are still noticeable: the proposed NL-SVR method reduces the blocky effects (note for instance that the face is better reproduced). This is due to a better distribution of support vectors in the perceptually independent domain. 5.3 Support Vector Distribution The observed different perceptual image quality obtained with each approach is a direct consequence of support vector distribution in different domains. Figure 8 shows a representative example 59 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO 20 JPEG RKi−1 CSF−SVR NL−SVR 18 RMSE 16 14 12 10 8 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 22 JPEG RKi−1 CSF−SVR NL−SVR 20 18 16 MPE 14 12 10 8 6 4 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 0.85 0.8 SSIM 0.75 0.7 0.65 JPEG 0.6 RKi−1 CSF−SVR 0.55 NL−SVR 0.5 0.1 0.2 0.3 0.4 Entropy (bits/pix) 0.5 0.6 Figure 5: Average rate distortion curves over eight standard images (Lena, Barbara, Boats, Einstein, Peppers, Mandrill, Goldhill, Camera man) using objective and subjective measures for the considered JPEG (dotted) and the SVM approaches (RKi-1 dash-dotted, CSF-SVR dashed and NL-SVR solid). RMSE distortion (top), Maximum Perceptual Error, MPE (middle) (Malo et al., 2000; G´ mez-P´ rez et al., 2005; Malo et al., 2006), and Structural o e SIMilarity index, SSIM (bottom) (Wang et al., 2004). 60 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING Rki Rki SVR+CSF SVR+CSF NL+SVR NL+SVR Figure 6: Examples of decoded Lena (left) and Barbara (right) images at 0.3 bits/pix. From top to bottom: JPEG, RKi-1, CSF-SVR, and NL-SVR. 61 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO (a) (b) (c) (d) Figure 7: Examples of decoded Barbara images at a high compression ratio of 0.05 bits/pix (160:1) for (a) JPEG, (b) RKi-1, (c) CSF-SVR, and (d) NL-SVR. of the distribution of the selected support vectors by the RKi-1 and the CSF-SVR models working in the linear DCT domain, and the NL-SVM working in the perceptually independent non-linear domain r. Specifically, a block of Barbara’s scarf at different compression ratios is used for illustration purposes. The RKi-1 approach (Robinson and Kecman, 2003) uses a constant ε but, in order to consider the low subjective relevance of the high-frequency region, the corresponding coefficients are neglected. As a result, this approach only allocates support vectors in the low/medium frequency regions. The CSF-SVR approach uses a variable ε according to the CSF and gives rise to a more natural concentration of support vectors in the low/medium frequency region, which captures medium to high frequency details at lower compression rates (0.5 bits/pix). Note that the number of support vectors is bigger than in the RKi-1 approach, but it selects some necessary high-frequency coefficients to keep the error below the selected threshold. However, for bigger compression ratios (0.3 bits/pix), it misrepresents some high frequency, yet relevant, features (e.g., the peak from the stripes). The NL-SVM approach works in the non-linear transform domain, in which a more uniform coverage 62 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING (a) (b) (c) 0 200 10 100 0 200 5 5 15 0 0 20 30 25 20 10 20 30 fy (cpd) 10 0 fy (cpd) 30 10 fx (cpd) 0 200 5 10 100 0 200 5 10 100 15 25 10 30 20 30 f (cpd) y 10 0 30 10 0.5 20 30 25 20 5 1 0 0 20 30 0 1.5 15 15 0 fy (cpd) 30 0 fx (cpd) 20 25 20 0 fx (cpd) fx (cpd) 20 30 25 20 30 10 0.5 15 0 5 1 10 100 15 0 1.5 25 20 f (cpd) y 10 30 fy (cpd) 0 0 fx (cpd) fx (cpd) Figure 8: Signal in different domains and the selected support vectors by the SVM models in a block of the Barbara image at 0.3 bits/pix (top row) and 0.5 bits/pix (bottom row). Different domains are analyzed: (a) linear DCT using RKi-1, (b) linear DCT with CSF-SVM, and (c) non-linear perceptual domain with standard ε-SVM (NL-SVR). of the domain is done, accounting for richer (and perceptually independent) coefficients to perform efficient sparse signal reconstruction. It is important to remark that, for a given method (or domain), tightening ε f implies (1) considering more support vectors, and (2) an increase in entropy (top and bottom rows in Figure 8, 0.3 bpp to 0.5 bpp). However, note that the relevant measure is the entropy and not the number of support vectors: even though the number of selected support vectors in the r domain is higher, their variance is lower, thus giving rise to the same entropy after entropy coding. 6. Conclusions In this paper, we have reported a condition on the suitable domain for developing efficient SVM image coding schemes. The so-called diagonal Jacobian condition states that SVM regression with scalar-wise error restriction in a particular domain makes sense only if the transform that maps this domain to an independent coefficient representation is locally diagonal. We have demonstrated that, 63 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO in general, linear domains do not fulfill this condition because non-trivial statistical and perceptual inter-coefficient relations do exist in these domains. This theoretical finding has been experimentally confirmed by observing that improved compression results are obtained when SVM is applied in a non-linear perceptual domain that starts from the same linear domain used by previously reported SVM-based image coding schemes. These results highlight the relevance of an appropriate image representation choice before SVM learning. Further work is tied to the use of SVM-based coding schemes in statistically, rather than perceptually, independent non-linear ICA domains. In order to do so, local PCA instead of local ICA may be used in the local-to-global differential approach (Malo and Guti´ rrez, 2006) to speed up the e non-linear computation. Acknowledgments This work has been partly supported by the Spanish Ministry of Education and Science under grant CICYT TEC2006-13845/TCM and by the Generalitat Valenciana under grant GV-06/215. References R. Ahmed. Wavelet-based image compression using SVM learning and encoding techniques. Proc. 8th IASTED International Conference Computer Graphics and Imaging, 1:162–166, 2005. H. B. Barlow. Redundancy reduction revisited. Network: Comp. Neur. Sys., 12:241–253, 2001. H. B. Barlow. Possible principles underlying the transformation of sensory messages. In WA Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961. A. J. Bell and T. J. Sejnowski. The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997. R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8(12):1688–1701, 1999. F. W. Campbell and J. G. Robson. Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3):551–566, August 1968. G. Camps-Valls, E. Soria-Olivas, J. P´ rez-Ruixo, A. Art´ s-Rodr´guez, F. P´ rez-Cruz, and e e ı e A. Figueiras-Vidal. A profile-dependent kernel-based regression for cyclosporine concentration prediction. In Neural Information Processing Systems (NIPS) – Workshop on New Directions in Kernel-Based Learning Methods, Vancouver, Canada, December 2001. R. J. Clarke. Transform Coding of Images. Academic Press, New York, 1985. I. Epifanio, J. Guti´ rrez, and J. Malo. Linear transform for simultaneous diagonalization of coe variance and perceptual metric matrix in image coding. Pattern Recognition, 36(8):1799–1811, August 2003. J. M. Foley. Human luminance pattern mechanisms: Masking experiments require a new model. Journal of the Optical Society of America A, 11(6):1710–1719, 1994. 64 O N THE S UITABLE D OMAIN FOR SVM T RAINING IN I MAGE C ODING A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Press, Boston, 1992. B. Girod. What’s wrong with mean-squared error. In A. B. Watson, editor, Digital Images and Human Vision, pages 207–220. The MIT press, 1993. G. G´ mez-P´ rez, G. Camps-Valls, J. Guti´ rrez, and J. Malo. Perceptually adaptive insensitivity for o e e support vector machine image coding. IEEE Transactions on Neural Networks, 16(6):1574–1581, 2005. J. Guti´ rrez, F. J. Ferri, and J. Malo. Regularization operators for natural images based on nonlinear e perception models. IEEE Transactions on Image Processing, 15(1):189–2000, January 2006. D. J. Heeger. Normalization of cell responses in cat striate cortex. Vis. Neurosci., 9:181–198, 1992. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, New York, 2001. A. Hyvarinen, J. Hurri, and J. Vayrynenm. Bubbles: a unifying framework for low-level statistical properties of natural image sequences. Journal of the Optical Society of America A, 20(7), 2003. R. Jiao, Y. Li, Q. Wang, and B. Li. SVM regression and its application to image compression. In International Conference on Intelligent Computing, volume 3645, pages 747–756. Lecture Notes on Computer Science, 2005. C. Jutten and J. Karhunen. Advances in nonlinear blind source separation. Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), pages 245– 256, 2003. J. Karhunen, S. Malaroiu, and M. Ilmoniemi. Local linear independent component analysis based on clustering. Intl. J. Neur. Syst., 10(6), December 2000. J. K. Lin. Factorizing probability density functions: Generalizing ICA. In Proc. of the First Intl. Workshop on ICA and Signal Separation, pages 313–318, Aussois, France, 1999. J. Malo and J. Guti´ rrez. V1 non-linear properties emerge from local-to-global non-linear ICA. e Network: Computation in Neural Systems, 17:85–102, 2006. J. Malo, A. M. Pons, A. Felipe, and J. M. Artigas. Characterization of human visual system threshold performance by a weighting function in the Gabor domain. Journal of Modern Optics, 44(1): 127–148, 1997. J. Malo, F. Ferri, J. Albert, J. Soret, and J. M. Artigas. The role of perceptual contrast non-linearities in image transform coding. Image & Vision Computing, 18(3):233–246, February 2000. J. Malo, J. Guti´ rrez, I. Epifanio, F. Ferri, and J. M. Artigas. Perceptual feed-back in multigrid moe tion estimation using an improved DCT quantization. IEEE Transactions on Image Processing, 10(10):1411–1427, October 2001. J. Malo, I. Epifanio, R. Navarro, and E. Simoncelli. Non-linear image representation for efficient perceptual coding. IEEE Transactions on Image Processing, 15(1):68–80, 2006. 65 ´ ´ ´ C AMPS -VALLS , G UTI E RREZ , G OMEZ -P E REZ AND M ALO B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996. F. P´ rez-Cruz, G. Camps-Valls, E. Soria-Olivas, J. J. P´ rez-Ruixo, A. R. Figueiras-Vidal, and e e A. Art´ s-Rodr´guez. Multi-dimensional function approximation and regression estimation. In e ı International Conference on Artificial Neural Networks, ICANN2002, Madrid, Spain., Aug 2002. Lecture Notes in Computer Science. Springer–Verlag. J. Robinson and V. Kecman. The use of Support Vector Machines in image compression. In Proceedings of the International Conference on Engineering Intelligence Systems, EIS’2000, volume 36, pages 93–96, University of Paisley, Scotland, U.K., June 2000. J. Robinson and V. Kecman. Combining Support Vector Machine learning with the discrete cosine transform in image compression. IEEE Transactions Neural Networks, 14(4):950–958, July 2003. O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, 2001. E. P. Simoncelli. Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13(2):144–149, 2003. E. P. Simoncelli. Statistical models for images: Compression, restoration and synthesis. In 31st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1997. E. P. Simoncelli and B. O. Olshausen. Natural image statistics and neural representation. Annual Review of Neuroscience, 24:1193–1216, 2001. A. J. Smola and B. Sch¨ lkopf. A tutorial on support vector regression. Statistics and Computing, o 14:199–222, 2004. D. S. Taubman and M. W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, Boston, 2001. P. C. Teo and D. J. Heeger. Perceptual image distortion. Proceedings of the First IEEE International Conference on Image Processing, 2:982–986, 1994. M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky. Random cascades on wavelet trees and their use in analyzing and modeling natural images. Applied and Computational Harmonic Analysis, 11:89–123, 2001. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. A. B. Watson and J. Malo. Video quality measures based on the standard spatial observer. In Proceedings of the IEEE International Conference on Image Proceedings, volume 3, pages 41– 44, 2002. A. B. Watson and J. A. Solomon. A model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A, 14(9):2379–2391, September 1997. 66
2 0.69090909 67 jmlr-2008-Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies
Author: Andreas Krause, Ajit Singh, Carlos Guestrin
Abstract: When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A-, D-, or E-optimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NP-complete. To address this issue, we describe a polynomial-time approximation that is within (1 − 1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two real-world data sets. Keywords: networks Gaussian processes, experimental design, active learning, spatial learning; sensor
3 0.32007653 83 jmlr-2008-Robust Submodular Observation Selection
Author: Andreas Krause, H. Brendan McMahan, Carlos Guestrin, Anupam Gupta
Abstract: In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to measure in order to most effectively predict spatial phenomena. Often, we want to select observations which are robust against a number of possible objective functions. Examples include minimizing the maximum posterior variance in Gaussian Process regression, robust experimental design, and sensor placement for outbreak detection. In this paper, we present the Submodular Saturation algorithm, a simple and efficient algorithm with strong theoretical approximation guarantees for cases where the possible objective functions exhibit submodularity, an intuitive diminishing returns property. Moreover, we prove that better approximation algorithms do not exist unless NP-complete problems admit efficient algorithms. We show how our algorithm can be extended to handle complex cost functions (incorporating non-unit observation cost or communication and path costs). We also show how the algorithm can be used to near-optimally trade off expected-case (e.g., the Mean Square Prediction Error in Gaussian Process regression) and worst-case (e.g., maximum predictive variance) performance. We show that many important machine learning problems fit our robust submodular observation selection formalism, and provide extensive empirical evaluation on several real-world problems. For Gaussian Process regression, our algorithm compares favorably with state-of-the-art heuristics described in the geostatistics literature, while being simpler, faster and providing theoretical guarantees. For robust experimental design, our algorithm performs favorably compared to SDP-based algorithms. c 2008 Andreas Krause, H. Brendan McMahan, Carlos Guestrin and Anupam Gupta. K RAUSE , M C M AHAN , G UESTRIN AND G UPTA Keywords: observation selection, experimental design, active learning, submodular functions, Gaussi
4 0.20955904 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
Author: Thomas G. Dietterich, Guohua Hao, Adam Ashenfelter
Abstract: Conditional random fields (CRFs) provide a flexible and powerful model for sequence labeling problems. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features and feature combinations. This paper describes a new algorithm for training CRFs via gradient tree boosting. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees, which provide compact representations of feature interactions. So the algorithm does not explicitly consider the potentially large parameter space. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially as in previous algorithms based on iterative scaling and gradient descent. Gradient tree boosting also makes it possible to use instance weighting (as in C4.5) and surrogate splitting (as in CART) to handle missing values. Experimental studies of the effectiveness of these two methods (as well as standard imputation and indicator feature methods) show that instance weighting is the best method in most cases when feature values are missing at random. Keywords: sequential supervised learning, conditional random fields, functional gradient, gradient tree boosting, missing values
5 0.18913674 96 jmlr-2008-Visualizing Data using t-SNE
Author: Laurens van der Maaten, Geoffrey Hinton
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets. Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling
6 0.18432024 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
8 0.18017802 57 jmlr-2008-Manifold Learning: The Price of Normalization
9 0.17990538 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
10 0.17982224 76 jmlr-2008-Optimization Techniques for Semi-Supervised Support Vector Machines
11 0.17819817 58 jmlr-2008-Max-margin Classification of Data with Absent Features
12 0.17736605 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
13 0.17617249 9 jmlr-2008-Active Learning by Spherical Subdivision
14 0.17549059 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
15 0.17507133 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
16 0.17178413 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression
17 0.17148024 86 jmlr-2008-SimpleMKL
18 0.17066996 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
19 0.17040963 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
20 0.17025518 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction