jmlr jmlr2010 jmlr2010-50 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding their straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising.
Reference: text
sentIndex sentText sentNum sentScore
1 Editor: Donald Geman. Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. [sent-13, score-0.716]
2 In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. [sent-16, score-1.238]
3 The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. [sent-18, score-0.576]
4 Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising. [sent-22, score-0.663]
5 1. Introduction. Denoising requires representing the distorted signal in a domain where signal and noise display sufficiently different behavior. [sent-25, score-0.497]
6 In image denoising, classical regularization techniques are used to impose smoothness in the spatial domain since noise is typically white (Banham and Katsaggelos, 1997). [sent-27, score-0.464]
7 Several image denoising methods working in the spatial domain have been presented in the literature, either based on splines (Takeda et al. [sent-30, score-0.473]
8 In fact, wavelet representations have been recognized as quite appropriate domains for image denoising. [sent-36, score-0.528]
9 Wavelet representations are convenient in image denoising because natural image samples have a very specific statistical behavior in this domain. [sent-37, score-0.535]
10 On the other hand, the combination of smooth regions with local, high contrast features, such as edges, gives rise to sparse activation of wavelet sensors. [sent-39, score-0.388]
11 It is well-known, however, that marginal models in the wavelet domain are not enough for a proper signal characterization: relevant relations among coefficients still remain after wavelet transforms (Simoncelli, 1999). [sent-43, score-1.022]
12 For instance, edges lead to strong coupling between the energy of neighboring wavelet coefficients of natural images. [sent-44, score-0.387]
13 These relations among wavelet coefficients have proven to be a key issue in applications such as image coding (Malo et al. [sent-45, score-0.685]
14 The use of these relations is in the roots of the most recent and successful image denoising approaches as well (Portilla et al. [sent-49, score-0.515]
15 In this case, more complex image models explicitly including the relations among coefficients have to be plugged and fitted into the Bayesian framework to obtain the image estimates. [sent-52, score-0.453]
16 Conversely, non-parametric approaches can include the above qualitative properties in an indirect way without the restriction of being analytically attached to particular image or noise models. [sent-63, score-0.371]
17 In this work we apply support vector regression (SVR) in a redundant (overcomplete) wavelet domain to the noisy image. [sent-65, score-0.473]
18 Its solution may be found for arbitrary noise sources even without knowing the functional form of the noise PDF since it can work with just noise histograms. [sent-69, score-0.651]
19 It is capable of taking into account the relations among wavelet coefficients of natural images through the use of a suitable kernel. [sent-72, score-0.647]
20 The proposed method does not assume independence among the signal coefficients in the wavelet domain, as opposed to Simoncelli (1999) and Figueiredo and Nowak (2001), nor an explicit model of signal relations, as done in Portilla et al. [sent-74, score-0.594]
21 Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising. [sent-76, score-0.663]
22 Non-explicit use of dependencies in local frequency domains for denoising was also introduced in Gutiérrez et al. [sent-80, score-0.398]
23 The idea of using SVR regularization in the wavelet domain for image denoising has been recently introduced in Kai Tick Chow and Lee (2001), Cheng et al. [sent-84, score-0.747]
24 In contrast, in this paper we address the following key issues: • Natural image features in redundant wavelet domains. [sent-88, score-0.5]
25 Interesting insight into the problem can be obtained by analyzing the mutual information between the coefficients of wavelet representations (Buccigrossi and Simoncelli, 1999; Liu and Moulin, 2001). [sent-89, score-0.426]
26 Even though this methodological framework is proposed in the context of achromatic image denoising, it can be readily extended to other denoising problems in which wavelet coefficients exhibit particular relations, such as in color or multispectral images, speech signals, etc. [sent-97, score-0.716]
27 In Section 2, we point out relevant signal features in redundant wavelet domains through mutual information measurements. [sent-99, score-0.605]
28 Section 5 shows the performance of the proposed method compared to standard denoising methods in the wavelet domain. [sent-102, score-0.568]
29 2. Features of Natural Images in the Steerable Wavelet Domain. The starting hypothesis for image denoising is that signal and noise display different characteristics and thus it is possible to separate them in a certain domain. [sent-106, score-0.677]
30 Natural images show non-trivial relationships among wavelet transform coefficients. [sent-107, score-0.491]
31 In the following, we review the reported statistical properties of natural images in orthogonal wavelet domains, and then analyze them in the redundant steerable wavelet domain selected in our implementation. [sent-108, score-1.031]
32 Specifically we will use mutual information (MI) to assess the statistical relations among wavelet coefficients of natural images as in Buccigrossi and Simoncelli (1999) and Liu and Moulin (2001). [sent-109, score-0.715]
33 2.1 Intraband Versus Interband Signal Relations in Orthogonal Wavelets. Dependencies among orthogonal wavelet coefficients were measured using mutual information in Liu and Moulin (2001). [sent-111, score-0.426]
34 The dependencies were studied at interband and intraband levels, and the results suggested that the mutual information between intraband neighbors is typically larger than the interband relations for several models and types of interaction. [sent-112, score-0.497]
35 This evidence suggests that the dependencies among spatially neighboring coefficients (intraband) in orthogonal wavelet descriptions are stronger than the interband dependencies. [sent-115, score-0.472]
36 2.2 Natural Image Relations in Steerable Wavelets. Redundant (non-orthogonal) wavelet representations may be better suited to image denoising applications since a redundant representation of the image features may make the signal's inherent relations clearer. [sent-117, score-1.178]
37 Despite the reported results on the relations of signal coefficients in orthogonal transforms, a number of questions have to be answered in the case of redundant representations, and in particular in the steerable wavelet domain. [sent-124, score-0.785]
38 The first question is particularly important since, even though the steerable transform may intensify the relations among signal coefficients, its redundant nature may also introduce relations which could be due to the transform but not to the signal. [sent-130, score-0.644]
39 2.2.1 Signal Relations Are Specific to the Signal. In our first test, following Liu and Moulin (2001), we computed the mutual information among steerable wavelet coefficients of the data set for different spatial, orientation, and scale distances. [sent-136, score-0.539]
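To make these measurements concrete, the following sketch (our reconstruction, not the authors' code) estimates the mutual information between a subband and a spatially shifted copy of itself from a two-dimensional joint histogram; the subband array, the offset, and the bin count are assumed interfaces.

    import numpy as np

    def mutual_information(a, b, bins=64):
        # MI (in bits) between two coefficient sets, estimated from a joint histogram.
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0  # skip empty bins to avoid log(0)
        return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

    def intraband_mi(subband, dy, dx, bins=64):
        # MI between each coefficient and its intraband neighbor at offset (dy, dx).
        h, w = subband.shape
        center = subband[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        neighbor = subband[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
        return mutual_information(center, neighbor, bins)

Sweeping intraband_mi over a grid of offsets for each oriented subband produces MI maps of the kind summarized in Fig. 2; interscale and interorientation dependencies can be measured analogously by pairing coefficients across subbands.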
40 Figure 1 (panels a-c): Comparison between the redundancy of natural image coefficients in the steerable wavelet representation (solid) and the redundancy due to the representation itself (dashed). [sent-148, score-0.648]
41 2.2.2 Intraband Signal Relations Dominate over Interscale or Orientation Relations. Moreover, the results show that intraband relations in the signal are also more important than interorientation or interscale relations. [sent-164, score-0.404]
42 Beyond consistency with previously reported results for orthogonal wavelet transforms (Buccigrossi and Simoncelli, 1999; Liu and Moulin, 2001), it has been observed that the relations are specific to the signal and not just induced by the transform. [sent-166, score-0.633]
43 A remarkable feature in Fig. 2 [top] is the specific spatial arrangement of these relations: the presence of oriented structures in natural images gives rise to strong anisotropic intraband relations in the different subbands. [sent-180, score-0.577]
44 These mutual information results match recently reported results on autocorrelation of intraband wavelet coefficients (Goossens et al. [sent-182, score-0.532]
45 Figure 2 (one panel per orientation α = 0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8): Mutual information among the central coefficient and its spatial neighbors in the same subband (intraband) in the steerable wavelet domain. [sent-185, score-0.661]
46 Summarizing, natural images have singular features in the steerable wavelet domain (Figs. [sent-188, score-0.634]
47 1 and 2): given a distorted image, enforcing these singular oriented relations among coefficients in every subband (with the appropriate kernels) will eventually preserve the natural signal relations and remove the noise. [sent-189, score-0.629]
48 Of course, the bigger the difference between the shape of the intraband relations in signal and noise, the better the results are expected to be. [sent-190, score-0.582]
49 3. Restoring Wavelet Relations with SVR. The effect of noise in the wavelet domain is to introduce artificial deviations from the original signal, hiding the natural relations among the coefficients (see an illustrative example in Fig. 3). [sent-192, score-0.894]
50 The original set of wavelet coefficients, x = T · i, has to be estimated from its noisy counterpart. Figure 3: Effect of noise on the wavelet coefficients. [sent-205, score-0.917]
51 Patch of a subband of a wavelet representation of the original image Barbara (left) and its noisy version (right). [sent-206, score-0.657]
52 Due to the observed strong intraband relations, we will apply the SVR in the wavelet domain to patches inside each subband. [sent-209, score-0.495]
53 Now, given input-output pairs {p_i, y_i}_{i=1}^N, where p_i are the wavelet indices and y_i are the corresponding noisy wavelet coefficients in a patch, we train the adaptive SVR (Camps-Valls et al. [sent-211, score-0.783]
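As a minimal sketch of this regression step, assuming scikit-learn and a precomputed Gram matrix over the patch positions (note that, unlike the adaptive SVR, the standard SVR below accepts only one scalar C and ε per fit rather than per-coefficient profiles):

    import numpy as np
    from sklearn.svm import SVR

    def denoise_patch(y_patch, gram, C, eps):
        # Regress the noisy coefficients y_i on their positions p_i through a
        # precomputed kernel; the regularized SVR fit is the denoised patch.
        svr = SVR(kernel='precomputed', C=C, epsilon=eps)
        svr.fit(gram, y_patch.ravel())  # gram: N x N kernel matrix over the p_i
        return svr.predict(gram).reshape(y_patch.shape)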
54 In the following, we restrict the range of possible values of these parameters, θ = (C_i, ε_i, K), in the particular case of image denoising in wavelet domains. Penalization factor. [sent-234, score-0.716]
55 In the denoising problem considered here, the signal variance substantially differs in each wavelet scale. [sent-236, score-0.686]
56 This profile k_i was obtained by averaging the standard deviation of wavelet coefficients over 100 images from the database used in Section 2. [sent-238, score-0.461]
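A sketch of how such a profile could be computed follows; steerable_transform, assumed to return the list of subband arrays of an image, is a stand-in for the actual pyramid implementation.

    import numpy as np

    def signal_std_profile(images, steerable_transform):
        # Average, over the image database, the per-subband standard deviation
        # of the wavelet coefficients: one k_i value per subband.
        per_image = [np.array([sb.std() for sb in steerable_transform(im)])
                     for im in images]
        return np.mean(per_image, axis=0)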
57 In redundant wavelet representations, this standard deviation is coefficient dependent. [sent-248, score-0.397]
58 The transformed standard deviations can be estimated either (1) empirically from noise samples, or (2) computed from the noise covariance matrix if it is known. [sent-252, score-0.402]
59 In the empirical case, noise samples can be experimentally obtained by applying the noise source to a large enough set of images, and writing the noise as in Eq. [sent-253, score-0.603]
60 If the noise covariance is known, the corresponding matrix in the selected wavelet domain can be obtained from the noise covariance matrix in the spatial domain, Σ_n, and the transform T, as T · Σ_n · T⊤ (Stark and Woods, 1994). [sent-256, score-0.905]
61 Equation (2) then reduces to ε_i = τ σ_n [diag(T · T⊤)_i]^{1/2}, (3) where σ_n² is the noise variance in the spatial domain and τ is a scaling factor to be adapted for each particular image and noise combination. [sent-259, score-0.634]
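The sketch below covers the three routes just described, under the simplifying assumption that the transform is available as an explicit matrix T whose rows are indexed by the wavelet coefficients (an idealization of a filter-bank implementation).

    import numpy as np

    def eps_profile_white(T, sigma_n, tau):
        # Eq. (3): eps_i = tau * sigma_n * sqrt(diag(T T^T)) for white noise.
        return tau * sigma_n * np.sqrt(np.einsum('ij,ij->i', T, T))

    def eps_profile_covariance(T, Sigma_n, tau):
        # Known spatial covariance: per-coefficient std from diag(T Sigma_n T^T).
        return tau * np.sqrt(np.einsum('ij,jk,ik->i', T, Sigma_n, T))

    def eps_profile_empirical(noise_samples, T, tau):
        # Empirical route: transform noise realizations (one per row) and take
        # the per-coefficient standard deviation.
        return tau * (noise_samples @ T.T).std(axis=0)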
62 In our case, we propose to take into account image coefficient relations by analyzing a large (representative) database and taking the (oriented) mutual information among samples as the core distance measure. [sent-272, score-0.373]
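One plausible construction, assuming the kernel value between two coefficient positions is read off a measured intraband MI map at their relative displacement (the paper's exact functional form is not reproduced in this summary):

    import numpy as np

    def mi_kernel(mi_map):
        # mi_map[dy, dx]: measured MI at spatial offset (dy, dx), with the zero
        # offset at the array center; normalized so that K(p, p) = 1.
        cy, cx = mi_map.shape[0] // 2, mi_map.shape[1] // 2
        m = mi_map / mi_map[cy, cx]
        def k(p, q):
            dy, dx = p[0] - q[0] + cy, p[1] - q[1] + cx
            inside = 0 <= dy < m.shape[0] and 0 <= dx < m.shape[1]
            return float(m[dy, dx]) if inside else 0.0
        return k

Because the MI maps of Fig. 2 are elongated along the subband orientation, such a kernel is anisotropic: coefficients aligned with the oriented structure count as closer than coefficients across it. Note that an empirically measured kernel of this kind is not guaranteed to be positive semidefinite and may require regularization in practice.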
63 The conclusion of this section is that in the case of image denoising in wavelet domains, an appropriate analysis of the signal variance, the noise variance, and the relations among the wavelet coefficients of the signal can be used to strongly reduce the dimensionality of the SVR parameter space. [sent-284, score-1.668]
64 However, as stated above, denoising methods usually assume that additional probabilistic information on the signal and noise is available: p(i) and p(n|i). [sent-292, score-0.529]
65 Equation (5) means that the selected SVR parameters are those that give rise to signal and noise estimates that are probabilistically similar to the true signal and noise, respectively. [sent-307, score-0.668]
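The exact expression of Eq. (5) is not reproduced in this extraction, but the sentence above is consistent with a criterion of the following form (our hedged reconstruction):

    \theta^{\star} = \arg\min_{\theta}\;
        D_{\mathrm{KL}}\big( p(\hat{x}_{\theta}) \,\|\, p(x) \big)
        + D_{\mathrm{KL}}\big( p(\hat{n}_{\theta}) \,\|\, p(n) \big),

where x̂_θ is the SVR signal estimate under parameters θ and n̂_θ = y − x̂_θ is the implied noise estimate.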
66 Note that this similarity does not require analytical models of the PDFs since it can be computed from histograms (or signal and noise samples). [sent-308, score-0.359]
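A histogram-based estimate of such a divergence could look like the following sketch; the bin count and the smoothing constant are arbitrary choices.

    import numpy as np

    def kld_from_samples(est, ref, bins=128, smooth=1e-12):
        # KLD(p_est || p_ref) from shared-bin histograms; no analytical PDF needed.
        lo, hi = min(est.min(), ref.min()), max(est.max(), ref.max())
        p, edges = np.histogram(est, bins=bins, range=(lo, hi), density=True)
        q, _ = np.histogram(ref, bins=bins, range=(lo, hi), density=True)
        w = np.diff(edges)
        p, q = p * w + smooth, q * w + smooth  # bin probabilities, smoothed
        return float(np.sum(p * np.log(p / q)))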
67 First, the noisy image is transformed by a steerable wavelet filter bank. [sent-317, score-0.664]
68 These SVRs use the profiles for the penalization factor and the insensitivity computed from signal and noise samples as described in Section 3. [sent-319, score-0.43]
69 The SVRs use the kernel based on MI that captures signal relations in the wavelet domain as described in Section 3. [sent-321, score-0.689]
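Putting the pieces together, the overall scheme could be sketched as follows; transform, inverse_transform, the per-subband (C, eps) values (scalars here, a simplification of the per-coefficient profiles), and the per-subband MI kernels are the assumed interfaces from the sketches above, and denoise_patch is the SVR step sketched earlier.

    import numpy as np

    def gram_matrix(k, shape):
        # Evaluate an index kernel on all coefficient-position pairs of a patch.
        idx = [(r, c) for r in range(shape[0]) for c in range(shape[1])]
        return np.array([[k(p, q) for q in idx] for p in idx])

    def svr_denoise(noisy, transform, inverse_transform,
                    C_prof, eps_prof, kernels, patch=16):
        # Steerable decomposition, then per-subband, per-patch SVR with the
        # MI kernel and (C, eps) profiles, then the inverse transform.
        out = []
        for b, sb in enumerate(transform(noisy)):
            rec = np.empty_like(sb)
            for i in range(0, sb.shape[0], patch):
                for j in range(0, sb.shape[1], patch):
                    y = sb[i:i + patch, j:j + patch]
                    K = gram_matrix(kernels[b], y.shape)
                    rec[i:i + patch, j:j + patch] = denoise_patch(
                        y, K, C_prof[b], eps_prof[b])
            out.append(rec)
        return inverse_transform(out)

Lowpass-residual handling, patch overlap, and boundary effects are deliberately omitted in this sketch.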
70 Moreover, we validate the proposed automatic procedure for SVR selection by considering examples with different noise sources, including non-additive and signal-dependent cases. [sent-328, score-0.367]
71 In the image denoising case, deviating from an appropriate solution along several parameter directions at once gives rise to solutions that combine the negative effects of the departure in each direction. [sent-343, score-0.388]
72 The comparison with the noisy patch of Fig. 3 qualitatively illustrates the usefulness of the proposed approach. Figure 5: Effect of SVR parameters on the noisy wavelet patch of Fig. 3. [sent-348, score-0.426]
73 Note that while KLD values are available in real situations (provided the noise histogram and a generic natural image histogram are known), distortion measures are not available since the original image is unknown. [sent-357, score-0.594]
74 Curves are shown for different kinds of (Gaussian and non-Gaussian) noise sources corrupting a particular image (details on the noise sources are given in Section 5). [sent-364, score-0.646]
75 These facts suggest that, in the Gaussian noise case, the proposed criterion is quite robust and provides consistent results: the higher the noise (red curves), the higher the ε zone minima. [sent-367, score-0.402]
76 Our algorithm is compared to several wavelet-based denoising methods using standard 256×256 images (‘Barbara’, ‘Boats’, ‘Lena’) with different levels and sources of degradation. [sent-379, score-0.361]
77 In the Gaussian noise case, two different noise variances are considered: σ_n² = 200 (black lines) and σ_n² = 400 (red lines). [sent-408, score-0.402]
78 Additive Gaussian Noise. Table 1 shows the distortion results for the three considered images and the two noise variances, σ_n² = 200 and σ_n² = 400. [sent-435, score-0.369]
79 It can be noticed that thresholding methods (HT, ST) and Bayesian generalizations not including signal relations in the model (BG, BL) show poor performance, producing images that are either grainy or corrupted by overly salient wavelet functions. [sent-541, score-0.736]
80 In the second case, the QMF wavelet domain was scalar-quantized using standard JPEG2000 bit allocation tables (Taubman and Marcellin, 2001). [sent-550, score-0.431]
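Such quantization noise can be realized as the difference between an image and its transform-quantize-reconstruct version; the uniform step below is a stand-in for the actual JPEG2000 bit-allocation tables.

    import numpy as np

    def quantization_noise(image, transform, inverse_transform, step=8.0):
        # n = T^{-1}(Q(T i)) - i, with a uniform per-subband quantizer
        # (a simplification of the JPEG2000 allocation).
        quantized = [step * np.round(sb / step) for sb in transform(image)]
        return inverse_transform(quantized) - image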
81 Figure 8: Visual results for the ‘Barbara’ image with JPEG quantization noise (Q = 7). [sent-577, score-0.391]
82 Another typical acquisition noise source is observed in infrared imaging cameras; it is a complex mixture of different noise sources. [sent-763, score-0.433]
83 Inspired by the observed characteristics of a representative number of images acquired by a commercial IR camera, the noise was modeled as a combination of four noise sources: low-variance Gaussian noise (σ_n² ≈ 50), ‘salt-and-pepper’ noise (with a percentage of corrupted pixels of about 0. [sent-772, score-0.932]
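A partial simulation of such a mixture might look as follows; only the two components named above are implemented, the salt-and-pepper fraction is a placeholder (the exact percentage is truncated in this extraction), and the remaining components (e.g., the striping mentioned elsewhere in the paper) are omitted.

    import numpy as np

    def ir_noise(image, rng, sigma2=50.0, sp_fraction=0.005):
        # Low-variance Gaussian noise plus salt-and-pepper outliers
        # (sp_fraction is a hypothetical value).
        noisy = image + rng.normal(0.0, np.sqrt(sigma2), image.shape)
        mask = rng.random(image.shape) < sp_fraction
        noisy[mask] = rng.choice([image.min(), image.max()], size=int(mask.sum()))
        return noisy

    # Usage: noisy = ir_noise(img.astype(float), np.random.default_rng(0))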
84 IRIS noise is difficult because the PSD and variance of each particular realization of the noise may substantially differ from the (estimated) averages. [sent-872, score-0.402]
85 5.5 Analysis of the Residuals. Further qualitative insight into the obtained solutions can be achieved by comparing the estimated and actual PDFs of signal and noise for the different methods and noise sources. [sent-876, score-0.542]
86 suitable for direct inspection because (1) actual noise histograms are quite different for the different noise sources, and (2) the estimated histograms strongly depend on the denoising method. [sent-901, score-0.692]
87 For instance, the quantization noise induced by JPEG/JPEG2000 follows a non-Gaussian, oriented joint distribution (the central dark area is actually an oriented ellipsoid), indicating correlation among noise samples. [sent-906, score-0.51]
88 For the case of JPEG2000, methods not considering signal relations dramatically underestimate the noise variance. [sent-909, score-0.476]
89 To conclude, methods assuming an (inadequate) Gaussian noise model do not match, in general, the noise distribution, so they should be reformulated for each particular noise source, which may be complicated or even impossible. [sent-913, score-0.603]
90 6. Conclusions. In this work, we proposed an alternative non-parametric way to take into account the relations among natural image wavelet coefficients for denoising: we used SVR in the wavelet domain to enforce these relations in the estimated signal. [sent-917, score-1.238]
91 The specific signal relations, which proved to be most relevant among intraband coefficients, are encoded in an anisotropic kernel based on mutual information computed from a representative image database. [sent-918, score-0.525]
92 An adaptive SVR with a different cost function per subband was developed: the subband-dependent ε_i and C_i are modeled by analyzing the particular signal and noise variances in a representative image database. [sent-919, score-0.598]
93 A KLD-based criterion was proposed to automatically select the SVR that best recovers the relevant wavelet coefficient relations of the true signal. [sent-921, score-0.515]
94 Table 4: 2D histograms of the residuals in the spatial domain for all methods (the noise itself and the residuals of HT, ST, BG, BL, GSM, and SVR) and noise sources (spatial Gaussian noise with σ_n² = 400, JPEG, JPEG2000, VS, IRIS); darker regions indicate higher probability. [sent-926]
95 Moreover, these results are an additional indication that the relation between local frequency coefficients is a salient natural image feature that should not be neglected in denoising applications. [sent-930]
96 Future work is tied to the incorporation of new information into the kernels: here we focused on signal relations in the kernel, but the particular structure of the noise could eventually be incorporated as well. [sent-931, score-0.476]
97 Removal of correlated noise by modeling the signal of interest in the wavelet domain. [sent-1091, score-0.677]
98 Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients. [sent-1165, score-0.529]
99 Image denoising using a scale mixture of Gaussians in the wavelet domain. [sent-1233, score-0.568]
100 Bayesian denoising of visual images in the wavelet domain. [sent-1244, score-0.702]
wordName wordTfidf (topN-words)
[('svr', 0.454), ('wavelet', 0.358), ('ssim', 0.272), ('denoising', 0.21), ('noise', 0.201), ('mage', 0.194), ('rrez', 0.166), ('gsm', 0.161), ('relations', 0.157), ('image', 0.148), ('signal', 0.118), ('rmse', 0.118), ('elations', 0.116), ('amps', 0.113), ('aparra', 0.113), ('steerable', 0.113), ('intraband', 0.106), ('subband', 0.106), ('simoncelli', 0.103), ('images', 0.103), ('jpeg', 0.098), ('uti', 0.097), ('alo', 0.097), ('kld', 0.097), ('svropt', 0.091), ('enoising', 0.087), ('spatial', 0.084), ('bl', 0.08), ('bg', 0.076), ('lena', 0.076), ('striping', 0.076), ('ernels', 0.075), ('barbara', 0.071), ('pdfs', 0.071), ('cients', 0.07), ('ht', 0.069), ('coef', 0.069), ('boats', 0.068), ('mutual', 0.068), ('insensitivity', 0.066), ('distortion', 0.065), ('guti', 0.06), ('pdf', 0.054), ('svrs', 0.053), ('sources', 0.048), ('pro', 0.048), ('iris', 0.047), ('noisy', 0.045), ('laparra', 0.045), ('portilla', 0.045), ('wavelets', 0.045), ('penalization', 0.045), ('st', 0.043), ('quantization', 0.042), ('histograms', 0.04), ('ncia', 0.039), ('redundant', 0.039), ('buccigrossi', 0.038), ('malo', 0.038), ('moulin', 0.038), ('anisotropic', 0.035), ('oriented', 0.033), ('kernels', 0.032), ('domain', 0.031), ('acquisition', 0.031), ('visual', 0.031), ('vertical', 0.031), ('transform', 0.03), ('cherkassky', 0.03), ('interband', 0.03), ('subbands', 0.03), ('valero', 0.03), ('figueiredo', 0.03), ('rise', 0.03), ('natural', 0.029), ('distorted', 0.029), ('val', 0.028), ('orientations', 0.027), ('tsang', 0.027), ('pyramid', 0.027), ('kernel', 0.025), ('kwok', 0.025), ('representative', 0.025), ('histogram', 0.024), ('patch', 0.023), ('adelson', 0.023), ('banham', 0.023), ('burjassot', 0.023), ('chalimourda', 0.023), ('ginneken', 0.023), ('goossens', 0.023), ('gustavo', 0.023), ('ignal', 0.023), ('interscale', 0.023), ('domains', 0.022), ('gaussian', 0.022), ('donoho', 0.022), ('qualitative', 0.022), ('coding', 0.022), ('pi', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
Author: Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol
Abstract: We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations. Keywords: deep learning, unsupervised feature learning, deep belief networks, autoencoders, denoising
3 0.073858112 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
Author: Gal Chechik, Varun Sharma, Uri Shalit, Samy Bengio
Abstract: Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large data sets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a data set with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, data sets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale data set, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale data sets that could not be handled before. Keywords: large scale, metric learning, image similarity, online learning ∗. Varun Sharma and Uri Shalit contributed equally to this work. †. Also at ICNC, The Hebrew University of Jerusalem, 91904, Israel. c 2010 Gal Chechik, Varun Sharma, Uri Shalit
4 0.056947071 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis
Author: Christoforos Christoforou, Robert Haralick, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase-locked methods, in which the amplitude of the signal is used as the feature for classification, that is, event related potentials; and second-order methods, in which the feature of interest is the power of the signal, that is, event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by assumptions regarding the underlying neural generators. Here we propose a method that provides an unified framework for the analysis of EEG, combining first and second-order spatial and temporal features based on a bilinear model. Evaluation of the proposed method on simulated data shows that the technique outperforms state-of-the art techniques for single-trial classification for a broad range of signal-to-noise ratios. Evaluations on human EEG—including one benchmark data set from the Brain Computer Interface (BCI) competition—show statistically significant gains in classification accuracy, with a reduction in overall classification error from 26%-28% to 19%. Keywords: regularization, classification, bilinear decomposition, neural signals, brain computer interface
5 0.052943937 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
Author: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
Abstract: Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets. Keywords: basis pursuit, dictionary learning, matrix factorization, online learning, sparse coding, sparse principal component analysis, stochastic approximations, stochastic optimization, nonnegative matrix factorization
6 0.043500591 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
7 0.040819276 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
8 0.036868609 69 jmlr-2010-Lp-Nested Symmetric Distributions
9 0.035602909 106 jmlr-2010-Stability Bounds for Stationary φ-mixing and β-mixing Processes
10 0.035268422 44 jmlr-2010-Graph Kernels
11 0.034465976 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
12 0.033334069 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
13 0.033210844 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
14 0.033159364 16 jmlr-2010-Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory
15 0.032560732 15 jmlr-2010-Approximate Tree Kernels
16 0.032356054 72 jmlr-2010-Matrix Completion from Noisy Entries
17 0.031892441 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
18 0.03100175 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm
19 0.030668098 47 jmlr-2010-Hilbert Space Embeddings and Metrics on Probability Measures
20 0.029522333 45 jmlr-2010-High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency
topicId topicWeight
[(0, -0.134), (1, 0.003), (2, -0.027), (3, 0.049), (4, -0.023), (5, 0.106), (6, 0.033), (7, -0.193), (8, 0.235), (9, 0.05), (10, -0.017), (11, 0.022), (12, -0.027), (13, -0.007), (14, -0.093), (15, -0.003), (16, 0.029), (17, 0.094), (18, -0.03), (19, -0.09), (20, 0.016), (21, -0.005), (22, -0.083), (23, -0.043), (24, 0.164), (25, 0.049), (26, 0.01), (27, 0.082), (28, -0.154), (29, 0.166), (30, 0.176), (31, 0.208), (32, -0.109), (33, 0.116), (34, -0.038), (35, 0.185), (36, 0.004), (37, 0.016), (38, 0.075), (39, 0.044), (40, 0.064), (41, -0.16), (42, 0.077), (43, -0.101), (44, 0.063), (45, -0.076), (46, 0.073), (47, -0.011), (48, -0.08), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.96719187 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
2 0.57425869 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
Author: Gal Chechik, Varun Sharma, Uri Shalit, Samy Bengio
Abstract: Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large data sets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a data set with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, data sets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale data set, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale data sets that could not be handled before. Keywords: large scale, metric learning, image similarity, online learning ∗. Varun Sharma and Uri Shalit contributed equally to this work. †. Also at ICNC, The Hebrew University of Jerusalem, 91904, Israel. c 2010 Gal Chechik, Varun Sharma, Uri Shalit
Author: Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol
Abstract: We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations. Keywords: deep learning, unsupervised feature learning, deep belief networks, autoencoders, denoising
4 0.43488449 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis
Author: Christoforos Christoforou, Robert Haralick, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase-locked methods, in which the amplitude of the signal is used as the feature for classification, that is, event related potentials; and second-order methods, in which the feature of interest is the power of the signal, that is, event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by assumptions regarding the underlying neural generators. Here we propose a method that provides an unified framework for the analysis of EEG, combining first and second-order spatial and temporal features based on a bilinear model. Evaluation of the proposed method on simulated data shows that the technique outperforms state-of-the art techniques for single-trial classification for a broad range of signal-to-noise ratios. Evaluations on human EEG—including one benchmark data set from the Brain Computer Interface (BCI) competition—show statistically significant gains in classification accuracy, with a reduction in overall classification error from 26%-28% to 19%. Keywords: regularization, classification, bilinear decomposition, neural signals, brain computer interface
5 0.30715063 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
Author: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
Abstract: Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets. Keywords: basis pursuit, dictionary learning, matrix factorization, online learning, sparse coding, sparse principal component analysis, stochastic approximations, stochastic optimization, nonnegative matrix factorization
6 0.26276362 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
7 0.24051848 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
9 0.20235723 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
10 0.18960612 61 jmlr-2010-Learning From Crowds
11 0.17999174 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
12 0.17452079 15 jmlr-2010-Approximate Tree Kernels
13 0.16824554 57 jmlr-2010-Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models
14 0.1658328 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
15 0.1656609 69 jmlr-2010-Lp-Nested Symmetric Distributions
16 0.16181387 98 jmlr-2010-Regularized Discriminant Analysis, Ridge Regression and Beyond
17 0.15577558 47 jmlr-2010-Hilbert Space Embeddings and Metrics on Probability Measures
19 0.14689292 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
20 0.14685881 94 jmlr-2010-Quadratic Programming Feature Selection
topicId topicWeight
[(4, 0.017), (8, 0.029), (21, 0.016), (24, 0.012), (32, 0.057), (33, 0.018), (34, 0.476), (36, 0.02), (37, 0.04), (75, 0.141), (81, 0.018), (85, 0.045), (96, 0.017)]
simIndex simValue paperId paperTitle
1 0.75087833 37 jmlr-2010-Evolving Static Representations for Task Transfer
Author: Phillip Verbancsics, Kenneth O. Stanley
Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.
same-paper 2 0.71169364 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
3 0.32487696 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
4 0.32437363 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
Author: Jitkomut Songsiri, Lieven Vandenberghe
Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization
5 0.32238439 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
Author: Pannagadatta K. Shivaswamy, Tony Jebara
Abstract: Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, text classification and on several other benchmark data sets. In addition, risk bounds are derived for the new formulation based on Rademacher averages. Keywords: support vector machines, kernel methods, large margin, Rademacher complexity
6 0.32141101 63 jmlr-2010-Learning Instance-Specific Predictive Models
7 0.32133952 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis
8 0.3206006 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
9 0.32013947 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
10 0.31980333 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
11 0.31932202 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
12 0.31916934 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
13 0.31828359 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
14 0.31789795 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
15 0.31659192 66 jmlr-2010-Linear Algorithms for Online Multitask Classification
16 0.31635794 40 jmlr-2010-Fast and Scalable Local Kernel Machines
17 0.31612107 109 jmlr-2010-Stochastic Composite Likelihood
18 0.31554881 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
19 0.31538951 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
20 0.31506217 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers