jmlr jmlr2010 jmlr2010-50 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding their straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising.
Reference: text
sentIndex sentText sentNum sentScore
1 Editor: Donald Geman. Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. [sent-13, score-0.716]
2 In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. [sent-16, score-1.238]
3 The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. [sent-18, score-0.576]
4 Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising. [sent-22, score-0.663]
5 1. Introduction. Denoising requires representing the distorted signal in a domain where signal and noise display sufficiently different behavior. [sent-25, score-0.497]
6 In image denoising, classical regularization techniques are used to impose smoothness in the spatial domain since noise is typically white (Banham and Katsaggelos, 1997). [sent-27, score-0.464]
7 Several image denoising methods working in the spatial domain have been presented in the literature, either based on splines (Takeda et al. [sent-30, score-0.473]
8 In fact, wavelet representations have been recognized as quite appropriate domains for image denoising. [sent-36, score-0.528]
9 Wavelet representations are convenient in image denoising because natural image samples have a very specific statistical behavior in this domain. [sent-37, score-0.535]
10 On the other hand, the combination of smooth regions with local, high contrast features, such as edges, gives rise to sparse activation of wavelet sensors. [sent-39, score-0.388]
11 It is well-known, however, that marginal models in the wavelet domain are not enough for a proper signal characterization: relevant relations among coefficients still remain after wavelet transforms (Simoncelli, 1999). [sent-43, score-1.022]
12 For instance, edges lead to strong coupling between the energy of neighboring wavelet coefficients of natural images. [sent-44, score-0.387]
13 These relations among wavelet coefficients have proven to be a key issue in applications such as image coding (Malo et al. [sent-45, score-0.685]
14 The use of these relations is in the roots of the most recent and successful image denoising approaches as well (Portilla et al. [sent-49, score-0.515]
15 In this case, more complex image models explicitly including the relations among coefficients have to be plugged and fitted into the Bayesian framework to obtain the image estimates. [sent-52, score-0.453]
16 Conversely, non-parametric approaches can include the above qualitative properties in an indirect way without the restriction of being analytically attached to particular image or noise models. [sent-63, score-0.371]
17 In this work we apply support vector regression (SVR) in a redundant (overcomplete) wavelet domain to the noisy image. [sent-65, score-0.473]
18 Its solution may be found for arbitrary noise sources even without knowing the functional form of the noise PDF since it can work with just noise histograms. [sent-69, score-0.651]
19 It is capable of taking into account the relations among wavelet coefficients of natural images through the use of a suitable kernel. [sent-72, score-0.647]
20 The proposed method does not assume independence among the signal coefficients in the wavelet domain, as opposed to Simoncelli (1999) and Figueiredo and Nowak (2001), nor an explicit model of signal relations, as done in Portilla et al. [sent-74, score-0.594]
21 Therefore, the proposed machine learning approach can be seen as a more flexible (model-free) alternative to the explicit description of wavelet coefficient relations for image denoising. [sent-76, score-0.663]
22 Non-explicit use of dependencies in local frequency domains for denoising was also introduced in Gutiérrez et al. [sent-80, score-0.398]
23 The idea of using SVR regularization in the wavelet domain for image denoising has been recently introduced in Kai Tick Chow and Lee (2001), Cheng et al. [sent-84, score-0.747]
24 In contrast, in this paper we address the following key issues: • Natural image features in redundant wavelet domains. [sent-88, score-0.5]
25 Interesting insight into the problem can be obtained by analyzing the mutual information between the coefficients of wavelet representations (Buccigrossi and Simoncelli, 1999; Liu and Moulin, 2001). [sent-89, score-0.426]
26 Even though this methodological framework is proposed in the context of achromatic image denoising, it can be readily extended to other denoising problems in which wavelet coefficients exhibit particular relations, such as in color or multispectral images, speech signals, etc. [sent-97, score-0.716]
27 In Section 2, we point out relevant signal features in redundant wavelet domains through mutual information measurements. [sent-99, score-0.605]
28 Section 5 shows the performance of the proposed method compared to standard denoising methods in the wavelet domain. [sent-102, score-0.568]
29 2. Features of Natural Images in the Steerable Wavelet Domain. The starting hypothesis for image denoising is that signal and noise display different characteristics and thus it is possible to separate them in a certain domain. [sent-106, score-0.677]
30 Natural images show non-trivial relationships among wavelet transform coefficients. [sent-107, score-0.491]
31 In the following, we review the reported statistical properties of natural images in orthogonal wavelet domains, and then analyze them in the redundant steerable wavelet domain selected in our implementation. [sent-108, score-1.031]
32 Specifically we will use mutual information (MI) to assess the statistical relations among wavelet coefficients of natural images as in Buccigrossi and Simoncelli (1999) and Liu and Moulin (2001). [sent-109, score-0.715]
33 2.1 Intraband Versus Interband Signal Relations in Orthogonal Wavelets. Dependencies among orthogonal wavelet coefficients were measured using mutual information in Liu and Moulin (2001). [sent-111, score-0.426]
34 The dependencies were studied at interband and intraband levels, and the results suggested that the mutual information between intraband neighbors is typically larger than the interband relations for several models and types of interaction. [sent-112, score-0.497]
35 This evidence suggests that the dependencies among spatially neighboring coefficients (intraband) in orthogonal wavelet descriptions are stronger than the interband dependencies. [sent-115, score-0.472]
36 2.2 Natural Image Relations in Steerable Wavelets. Redundant (non-orthogonal) wavelet representations may be better suited to image denoising applications since a redundant representation of the image features may make the signal's inherent relations clearer. [sent-117, score-1.178]
37 Despite the reported results on the relations of signal coefficients in orthogonal transforms, a number of questions have to be answered in the case of redundant representations, and in particular in the steerable wavelet domain. [sent-124, score-0.785]
38 The first question is particularly important since, even though the steerable transform may intensify the relations among signal coefficients, its redundant nature may also introduce relations which could be due to the transform but not to the signal. [sent-130, score-0.644]
39 2.2.1 Signal Relations Are Specific to the Signal. In our first test, following Liu and Moulin (2001), we computed the mutual information among steerable wavelet coefficients of the data set for different spatial, orientation, and scale distances. [sent-136, score-0.539]
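To make these measurements concrete, the following sketch (our reconstruction, not the authors' code) estimates the mutual information between a subband and a spatially shifted copy of itself from a two-dimensional joint histogram; the subband array, the offset, and the bin count are assumed interfaces.

    import numpy as np

    def mutual_information(a, b, bins=64):
        # MI (in bits) between two coefficient sets, estimated from a joint histogram.
        joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0  # skip empty bins to avoid log(0)
        return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

    def intraband_mi(subband, dy, dx, bins=64):
        # MI between each coefficient and its intraband neighbor at offset (dy, dx).
        h, w = subband.shape
        center = subband[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        neighbor = subband[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
        return mutual_information(center, neighbor, bins)

Sweeping intraband_mi over a grid of offsets for each oriented subband produces MI maps of the kind summarized in Fig. 2; interscale and interorientation dependencies can be measured analogously by pairing coefficients across subbands.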
40 Figure 1 (panels a-c): Comparison between the redundancy of natural image coefficients in the steerable wavelet representation (solid) and the redundancy due to the representation itself (dashed). [sent-148, score-0.648]
41 2.2.2 Intraband Signal Relations Dominate over Interscale or Orientation Relations. Moreover, the results show that intraband relations in the signal are also more important than interorientation or interscale relations. [sent-164, score-0.404]
42 Beyond consistency with previously reported results for orthogonal wavelet transforms (Buccigrossi and Simoncelli, 1999; Liu and Moulin, 2001), it has been observed that the relations are specific to the signal and not just induced by the transform. [sent-166, score-0.633]
43 A remarkable feature in Fig. 2 [top] is the specific spatial arrangement of these relations: the presence of oriented structures in natural images gives rise to strong anisotropic intraband relations in the different subbands. [sent-180, score-0.577]
44 These mutual information results match recently reported results on autocorrelation of intraband wavelet coefficients (Goossens et al. [sent-182, score-0.532]
45 Figure 2 (one panel per orientation α = 0, π/8, 2π/8, 3π/8, 4π/8, 5π/8, 6π/8, 7π/8): Mutual information among the central coefficient and its spatial neighbors in the same subband (intraband) in the steerable wavelet domain. [sent-185, score-0.661]
46 Summarizing, natural images have singular features in the steerable wavelet domain (Figs. [sent-188, score-0.634]
47 1 and 2): given a distorted image, enforcing these singular oriented relations among coefficients in every subband (with the appropriate kernels) will eventually preserve the natural signal relations and remove the noise. [sent-189, score-0.629]
48 Of course, the bigger the difference between the shape of the intraband relations in signal and noise, the better the results are expected to be. [sent-190, score-0.582]
49 3. Restoring Wavelet Relations with SVR. The effect of noise in the wavelet domain is to introduce artificial deviations from the original signal, hiding the natural relations among the coefficients (see an illustrative example in Fig. 3). [sent-192, score-0.894]
50 The original set of wavelet coefficients, x = T · i, has to be estimated from its noisy counterpart. Figure 3: Effect of noise on the wavelet coefficients. [sent-205, score-0.917]
51 Patch of a subband of a wavelet representation of the original image Barbara (left) and its noisy version (right). [sent-206, score-0.657]
52 Due to the observed strong intraband relations, we will apply the SVR in the wavelet domain to patches inside each subband. [sent-209, score-0.495]
53 Now, given input-output pairs {p_i, y_i}_{i=1}^N, where p_i are the wavelet indices and y_i are the corresponding noisy wavelet coefficients in a patch, we train the adaptive SVR (Camps-Valls et al. [sent-211, score-0.783]
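As a minimal sketch of this regression step, assuming scikit-learn and a precomputed Gram matrix over the patch positions (note that, unlike the adaptive SVR, the standard SVR below accepts only one scalar C and ε per fit rather than per-coefficient profiles):

    import numpy as np
    from sklearn.svm import SVR

    def denoise_patch(y_patch, gram, C, eps):
        # Regress the noisy coefficients y_i on their positions p_i through a
        # precomputed kernel; the regularized SVR fit is the denoised patch.
        svr = SVR(kernel='precomputed', C=C, epsilon=eps)
        svr.fit(gram, y_patch.ravel())  # gram: N x N kernel matrix over the p_i
        return svr.predict(gram).reshape(y_patch.shape)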
54 In the following, we restrict the range of possible values of these parameters, θ = (C_i, ε_i, K), in the particular case of image denoising in wavelet domains. Penalization factor. [sent-234, score-0.716]
55 In the denoising problem considered here, the signal variance substantially differs in each wavelet scale. [sent-236, score-0.686]
56 This profile k_i was obtained by averaging the standard deviation of wavelet coefficients over 100 images from the database used in Section 2. [sent-238, score-0.461]
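A sketch of how such a profile could be computed follows; steerable_transform, assumed to return the list of subband arrays of an image, is a stand-in for the actual pyramid implementation.

    import numpy as np

    def signal_std_profile(images, steerable_transform):
        # Average, over the image database, the per-subband standard deviation
        # of the wavelet coefficients: one k_i value per subband.
        per_image = [np.array([sb.std() for sb in steerable_transform(im)])
                     for im in images]
        return np.mean(per_image, axis=0)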
57 In redundant wavelet representations, this standard deviation is coefficient dependent. [sent-248, score-0.397]
58 The transformed standard deviations can be estimated either (1) empirically from noise samples, or (2) computed from the noise covariance matrix if it is known. [sent-252, score-0.402]
59 In the empirical case, noise samples can be experimentally obtained by applying the noise source to a large enough set of images, and writing the noise as in Eq. [sent-253, score-0.603]
60 If the noise covariance is known, the corresponding matrix in the selected wavelet domain can be obtained from the noise covariance matrix in the spatial domain, Σ_n, and the transform T, as T · Σ_n · T⊤ (Stark and Woods, 1994). [sent-256, score-0.905]
61 Equation (2) then reduces to ε_i = τ σ_n [diag(T · T⊤)_i]^{1/2}, (3) where σ_n² is the noise variance in the spatial domain and τ is a scaling factor to be adapted for each particular image and noise combination. [sent-259, score-0.634]
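The sketch below covers the three routes just described, under the simplifying assumption that the transform is available as an explicit matrix T whose rows are indexed by the wavelet coefficients (an idealization of a filter-bank implementation).

    import numpy as np

    def eps_profile_white(T, sigma_n, tau):
        # Eq. (3): eps_i = tau * sigma_n * sqrt(diag(T T^T)) for white noise.
        return tau * sigma_n * np.sqrt(np.einsum('ij,ij->i', T, T))

    def eps_profile_covariance(T, Sigma_n, tau):
        # Known spatial covariance: per-coefficient std from diag(T Sigma_n T^T).
        return tau * np.sqrt(np.einsum('ij,jk,ik->i', T, Sigma_n, T))

    def eps_profile_empirical(noise_samples, T, tau):
        # Empirical route: transform noise realizations (one per row) and take
        # the per-coefficient standard deviation.
        return tau * (noise_samples @ T.T).std(axis=0)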
62 In our case, we propose to take into account image coefficient relations by analyzing a large (representative) database and taking the (oriented) mutual information among samples as the core distance measure. [sent-272, score-0.373]
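One plausible construction, assuming the kernel value between two coefficient positions is read off a measured intraband MI map at their relative displacement (the paper's exact functional form is not reproduced in this summary):

    import numpy as np

    def mi_kernel(mi_map):
        # mi_map[dy, dx]: measured MI at spatial offset (dy, dx), with the zero
        # offset at the array center; normalized so that K(p, p) = 1.
        cy, cx = mi_map.shape[0] // 2, mi_map.shape[1] // 2
        m = mi_map / mi_map[cy, cx]
        def k(p, q):
            dy, dx = p[0] - q[0] + cy, p[1] - q[1] + cx
            inside = 0 <= dy < m.shape[0] and 0 <= dx < m.shape[1]
            return float(m[dy, dx]) if inside else 0.0
        return k

Because the MI maps of Fig. 2 are elongated along the subband orientation, such a kernel is anisotropic: coefficients aligned with the oriented structure count as closer than coefficients across it. Note that an empirically measured kernel of this kind is not guaranteed to be positive semidefinite and may require regularization in practice.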
63 The conclusion of this section is that in the case of image denoising in wavelet domains, an appropriate analysis of the signal variance, the noise variance, and the relations among the wavelet coefficients of the signal can be used to strongly reduce the dimensionality of the SVR parameter space. [sent-284, score-1.668]
64 However, as stated above, denoising methods usually assume that additional probabilistic information on the signal and noise is available: p(i) and p(n|i). [sent-292, score-0.529]
65 Equation (5) means that the selected SVR parameters are those that give rise to signal and noise estimates that are probabilistically similar to the true signal and noise, respectively. [sent-307, score-0.668]
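The exact expression of Eq. (5) is not reproduced in this extraction, but the sentence above is consistent with a criterion of the following form (our hedged reconstruction):

    \theta^{\star} = \arg\min_{\theta}\;
        D_{\mathrm{KL}}\big( p(\hat{x}_{\theta}) \,\|\, p(x) \big)
        + D_{\mathrm{KL}}\big( p(\hat{n}_{\theta}) \,\|\, p(n) \big),

where x̂_θ is the SVR signal estimate under parameters θ and n̂_θ = y − x̂_θ is the implied noise estimate.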
66 Note that this similarity does not require analytical models of the PDFs since it can be computed from histograms (or signal and noise samples). [sent-308, score-0.359]
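A histogram-based estimate of such a divergence could look like the following sketch; the bin count and the smoothing constant are arbitrary choices.

    import numpy as np

    def kld_from_samples(est, ref, bins=128, smooth=1e-12):
        # KLD(p_est || p_ref) from shared-bin histograms; no analytical PDF needed.
        lo, hi = min(est.min(), ref.min()), max(est.max(), ref.max())
        p, edges = np.histogram(est, bins=bins, range=(lo, hi), density=True)
        q, _ = np.histogram(ref, bins=bins, range=(lo, hi), density=True)
        w = np.diff(edges)
        p, q = p * w + smooth, q * w + smooth  # bin probabilities, smoothed
        return float(np.sum(p * np.log(p / q)))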
67 First, the noisy image is transformed by a steerable wavelet filter bank. [sent-317, score-0.664]
68 These SVRs use the profiles for the penalization factor and the insensitivity computed from signal and noise samples as described in Section 3. [sent-319, score-0.43]
69 The SVRs use the kernel based on MI that captures signal relations in the wavelet domain as described in Section 3. [sent-321, score-0.689]
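Putting the pieces together, the overall scheme could be sketched as follows; transform, inverse_transform, the per-subband (C, eps) values (scalars here, a simplification of the per-coefficient profiles), and the per-subband MI kernels are the assumed interfaces from the sketches above, and denoise_patch is the SVR step sketched earlier.

    import numpy as np

    def gram_matrix(k, shape):
        # Evaluate an index kernel on all coefficient-position pairs of a patch.
        idx = [(r, c) for r in range(shape[0]) for c in range(shape[1])]
        return np.array([[k(p, q) for q in idx] for p in idx])

    def svr_denoise(noisy, transform, inverse_transform,
                    C_prof, eps_prof, kernels, patch=16):
        # Steerable decomposition, then per-subband, per-patch SVR with the
        # MI kernel and (C, eps) profiles, then the inverse transform.
        out = []
        for b, sb in enumerate(transform(noisy)):
            rec = np.empty_like(sb)
            for i in range(0, sb.shape[0], patch):
                for j in range(0, sb.shape[1], patch):
                    y = sb[i:i + patch, j:j + patch]
                    K = gram_matrix(kernels[b], y.shape)
                    rec[i:i + patch, j:j + patch] = denoise_patch(
                        y, K, C_prof[b], eps_prof[b])
            out.append(rec)
        return inverse_transform(out)

Lowpass-residual handling, patch overlap, and boundary effects are deliberately omitted in this sketch.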
70 Moreover, we validate the proposed automatic procedure for SVR selection by considering examples with different noise sources, including non-additive and signal-dependent cases. [sent-328, score-0.367]
71 In the image denoising case, deviating from an appropriate solution along several parameter directions at once gives rise to solutions that combine the negative effects of the departure in each direction. [sent-343, score-0.388]
72 The comparison with the noisy patch of Fig. 3 qualitatively illustrates the usefulness of the proposed approach. Figure 5: Effect of SVR parameters on the noisy wavelet patch of Fig. 3. [sent-348, score-0.426]
73 Note that while KLD values are available in real situations (provided the noise histogram and a generic natural image histogram are known), distortion measures are not available since the original image is unknown. [sent-357, score-0.594]
74 Curves are shown for different kinds of (Gaussian and non-Gaussian) noise sources corrupting a particular image (details on the noise sources are given in Section 5). [sent-364, score-0.646]
75 These facts suggest that, in the Gaussian noise case, the proposed criterion is quite robust and provides consistent results: the higher the noise (red curves), the higher the ε zone minima. [sent-367, score-0.402]
76 Our algorithm is compared to several wavelet-based denoising methods using standard 256×256 images (‘Barbara’, ‘Boats’, ‘Lena’) with different levels and sources of degradation. [sent-379, score-0.361]
77 In the Gaussian noise case, two different noise variances are considered: σ_n² = 200 (black lines) and σ_n² = 400 (red lines). [sent-408, score-0.402]
78 Additive Gaussian Noise. Table 1 shows the distortion results for the three considered images and the two noise variances, σ_n² = 200 and σ_n² = 400. [sent-435, score-0.369]
79 It can be noticed that thresholding methods (HT, ST) and Bayesian generalizations not including signal relations in the model (BG, BL) show poor performance, producing images that are either grainy or corrupted by overly salient wavelet functions. [sent-541, score-0.736]
80 In the second case, the QMF wavelet domain was scalar-quantized using standard JPEG2000 bit allocation tables (Taubman and Marcellin, 2001). [sent-550, score-0.431]
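Such quantization noise can be realized as the difference between an image and its transform-quantize-reconstruct version; the uniform step below is a stand-in for the actual JPEG2000 bit-allocation tables.

    import numpy as np

    def quantization_noise(image, transform, inverse_transform, step=8.0):
        # n = T^{-1}(Q(T i)) - i, with a uniform per-subband quantizer
        # (a simplification of the JPEG2000 allocation).
        quantized = [step * np.round(sb / step) for sb in transform(image)]
        return inverse_transform(quantized) - image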
81 Figure 8: Visual results for the ‘Barbara’ image with JPEG quantization noise (Q = 7). [sent-577, score-0.391]
82 Another typical acquisition noise source is observed in infrared imaging cameras; it is a complex mixture of different noise sources. [sent-763, score-0.433]
83 Inspired by the observed characteristics of a representative number of images acquired by a commercial IR camera, the noise was modeled as a combination of four noise sources: low-variance Gaussian noise (σ_n² ≈ 50), ‘salt-and-pepper’ noise (with a percentage of corrupted pixels of about 0. [sent-772, score-0.932]
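A partial simulation of such a mixture might look as follows; only the two components named above are implemented, the salt-and-pepper fraction is a placeholder (the exact percentage is truncated in this extraction), and the remaining components (e.g., the striping mentioned elsewhere in the paper) are omitted.

    import numpy as np

    def ir_noise(image, rng, sigma2=50.0, sp_fraction=0.005):
        # Low-variance Gaussian noise plus salt-and-pepper outliers
        # (sp_fraction is a hypothetical value).
        noisy = image + rng.normal(0.0, np.sqrt(sigma2), image.shape)
        mask = rng.random(image.shape) < sp_fraction
        noisy[mask] = rng.choice([image.min(), image.max()], size=int(mask.sum()))
        return noisy

    # Usage: noisy = ir_noise(img.astype(float), np.random.default_rng(0))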
84 IRIS noise is difficult because the PSD and variance of each particular realization of the noise may substantially differ from the (estimated) averages. [sent-872, score-0.402]
85 5.5 Analysis of the Residuals. Further qualitative insight into the obtained solutions can be achieved by comparing the estimated and actual PDFs of signal and noise for the different methods and noise sources. [sent-876, score-0.542]
86 suitable for direct inspection because (1) actual noise histograms are quite different for the different noise sources, and (2) the estimated histograms strongly depend on the denoising method. [sent-901, score-0.692]
87 For instance, the quantization noise induced by JPEG/JPEG2000 follows a non-Gaussian, oriented joint distribution (the central dark area is actually an oriented ellipsoid), indicating correlation among noise samples. [sent-906, score-0.51]
88 For the case of JPEG2000, methods not considering signal relations dramatically underestimate the noise variance. [sent-909, score-0.476]
89 To conclude, methods assuming an (inadequate) Gaussian noise model do not match, in general, the noise distribution, so they should be reformulated for each particular noise source, which may be complicated or even impossible. [sent-913, score-0.603]
90 6. Conclusions. In this work, we proposed an alternative non-parametric way to take into account the relations among natural image wavelet coefficients for denoising: we used SVR in the wavelet domain to enforce these relations in the estimated signal. [sent-917, score-1.238]
91 The specific signal relations, which proved to be most relevant among intraband coefficients, are encoded in an anisotropic kernel based on mutual information computed from a representative image database. [sent-918, score-0.525]
92 An adaptive SVR with a different cost function per subband was developed: the subband-dependent ε_i and C_i are modeled by analyzing the particular signal and noise variances in a representative image database. [sent-919, score-0.598]
93 A KLD-based criterion was proposed to automatically select the SVR that best recovers the relevant wavelet coefficient relations of the true signal. [sent-921, score-0.515]
94 Table 4: 2D histograms of the residuals in the spatial domain for all methods (the noise itself and the residuals of HT, ST, BG, BL, GSM, and SVR) and noise sources (spatial Gaussian noise with σ_n² = 400, JPEG, JPEG2000, VS, IRIS); darker regions indicate higher probability. [sent-926]
95 Moreover, these results are an additional indication that the relation between local frequency coefficients is a salient natural image feature that should not be neglected in denoising applications. [sent-930]
96 Future work is tied to the incorporation of new information into the kernels: here we focused on signal relations in the kernel, but the particular structure of the noise could eventually be incorporated as well. [sent-931, score-0.476]
97 Removal of correlated noise by modeling the signal of interest in the wavelet domain. [sent-1091, score-0.677]
98 Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients. [sent-1165, score-0.529]
99 Image denoising using a scale mixture of Gaussians in the wavelet domain. [sent-1233, score-0.568]
100 Bayesian denoising of visual images in the wavelet domain. [sent-1244, score-0.702]
wordName wordTfidf (topN-words)
[('svr', 0.454), ('wavelet', 0.358), ('ssim', 0.272), ('denoising', 0.21), ('noise', 0.201), ('mage', 0.194), ('rrez', 0.166), ('gsm', 0.161), ('relations', 0.157), ('image', 0.148), ('signal', 0.118), ('rmse', 0.118), ('elations', 0.116), ('amps', 0.113), ('aparra', 0.113), ('steerable', 0.113), ('intraband', 0.106), ('subband', 0.106), ('simoncelli', 0.103), ('images', 0.103), ('jpeg', 0.098), ('uti', 0.097), ('alo', 0.097), ('kld', 0.097), ('svropt', 0.091), ('enoising', 0.087), ('spatial', 0.084), ('bl', 0.08), ('bg', 0.076), ('lena', 0.076), ('striping', 0.076), ('ernels', 0.075), ('barbara', 0.071), ('pdfs', 0.071), ('cients', 0.07), ('ht', 0.069), ('coef', 0.069), ('boats', 0.068), ('mutual', 0.068), ('insensitivity', 0.066), ('distortion', 0.065), ('guti', 0.06), ('pdf', 0.054), ('svrs', 0.053), ('sources', 0.048), ('pro', 0.048), ('iris', 0.047), ('noisy', 0.045), ('laparra', 0.045), ('portilla', 0.045), ('wavelets', 0.045), ('penalization', 0.045), ('st', 0.043), ('quantization', 0.042), ('histograms', 0.04), ('ncia', 0.039), ('redundant', 0.039), ('buccigrossi', 0.038), ('malo', 0.038), ('moulin', 0.038), ('anisotropic', 0.035), ('oriented', 0.033), ('kernels', 0.032), ('domain', 0.031), ('acquisition', 0.031), ('visual', 0.031), ('vertical', 0.031), ('transform', 0.03), ('cherkassky', 0.03), ('interband', 0.03), ('subbands', 0.03), ('valero', 0.03), ('figueiredo', 0.03), ('rise', 0.03), ('natural', 0.029), ('distorted', 0.029), ('val', 0.028), ('orientations', 0.027), ('tsang', 0.027), ('pyramid', 0.027), ('kernel', 0.025), ('kwok', 0.025), ('representative', 0.025), ('histogram', 0.024), ('patch', 0.023), ('adelson', 0.023), ('banham', 0.023), ('burjassot', 0.023), ('chalimourda', 0.023), ('ginneken', 0.023), ('goossens', 0.023), ('gustavo', 0.023), ('ignal', 0.023), ('interscale', 0.023), ('domains', 0.022), ('gaussian', 0.022), ('donoho', 0.022), ('qualitative', 0.022), ('coding', 0.022), ('pi', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
Author: Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol
Abstract: We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations. Keywords: deep learning, unsupervised feature learning, deep belief networks, autoencoders, denoising
3 0.073858112 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
Author: Gal Chechik, Varun Sharma, Uri Shalit, Samy Bengio
Abstract: Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large data sets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a data set with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, data sets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale data set, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale data sets that could not be handled before. Keywords: large scale, metric learning, image similarity, online learning ∗. Varun Sharma and Uri Shalit contributed equally to this work. †. Also at ICNC, The Hebrew University of Jerusalem, 91904, Israel. c 2010 Gal Chechik, Varun Sharma, Uri Shalit
4 0.056947071 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis
Author: Christoforos Christoforou, Robert Haralick, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase-locked methods, in which the amplitude of the signal is used as the feature for classification, that is, event related potentials; and second-order methods, in which the feature of interest is the power of the signal, that is, event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by assumptions regarding the underlying neural generators. Here we propose a method that provides an unified framework for the analysis of EEG, combining first and second-order spatial and temporal features based on a bilinear model. Evaluation of the proposed method on simulated data shows that the technique outperforms state-of-the art techniques for single-trial classification for a broad range of signal-to-noise ratios. Evaluations on human EEG—including one benchmark data set from the Brain Computer Interface (BCI) competition—show statistically significant gains in classification accuracy, with a reduction in overall classification error from 26%-28% to 19%. Keywords: regularization, classification, bilinear decomposition, neural signals, brain computer interface
5 0.052943937 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
Author: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
Abstract: Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets. Keywords: basis pursuit, dictionary learning, matrix factorization, online learning, sparse coding, sparse principal component analysis, stochastic approximations, stochastic optimization, nonnegative matrix factorization
6 0.043500591 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
7 0.040819276 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
8 0.036868609 69 jmlr-2010-Lp-Nested Symmetric Distributions
9 0.035602909 106 jmlr-2010-Stability Bounds for Stationary φ-mixing and β-mixing Processes
10 0.035268422 44 jmlr-2010-Graph Kernels
11 0.034465976 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
12 0.033334069 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
13 0.033210844 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
14 0.033159364 16 jmlr-2010-Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory
15 0.032560732 15 jmlr-2010-Approximate Tree Kernels
16 0.032356054 72 jmlr-2010-Matrix Completion from Noisy Entries
17 0.031892441 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
18 0.03100175 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm
19 0.030668098 47 jmlr-2010-Hilbert Space Embeddings and Metrics on Probability Measures
20 0.029522333 45 jmlr-2010-High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency
topicId topicWeight
[(0, -0.134), (1, 0.003), (2, -0.027), (3, 0.049), (4, -0.023), (5, 0.106), (6, 0.033), (7, -0.193), (8, 0.235), (9, 0.05), (10, -0.017), (11, 0.022), (12, -0.027), (13, -0.007), (14, -0.093), (15, -0.003), (16, 0.029), (17, 0.094), (18, -0.03), (19, -0.09), (20, 0.016), (21, -0.005), (22, -0.083), (23, -0.043), (24, 0.164), (25, 0.049), (26, 0.01), (27, 0.082), (28, -0.154), (29, 0.166), (30, 0.176), (31, 0.208), (32, -0.109), (33, 0.116), (34, -0.038), (35, 0.185), (36, 0.004), (37, 0.016), (38, 0.075), (39, 0.044), (40, 0.064), (41, -0.16), (42, 0.077), (43, -0.101), (44, 0.063), (45, -0.076), (46, 0.073), (47, -0.011), (48, -0.08), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.96719187 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
2 0.57425869 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
Author: Gal Chechik, Varun Sharma, Uri Shalit, Samy Bengio
Abstract: Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large data sets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a data set with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, data sets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale data set, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale data sets that could not be handled before. Keywords: large scale, metric learning, image similarity, online learning ∗. Varun Sharma and Uri Shalit contributed equally to this work. †. Also at ICNC, The Hebrew University of Jerusalem, 91904, Israel. c 2010 Gal Chechik, Varun Sharma, Uri Shalit
Author: Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol
Abstract: We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations. Keywords: deep learning, unsupervised feature learning, deep belief networks, autoencoders, denoising
4 0.43488449 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis
Author: Christoforos Christoforou, Robert Haralick, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase-locked methods, in which the amplitude of the signal is used as the feature for classification, that is, event related potentials; and second-order methods, in which the feature of interest is the power of the signal, that is, event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by assumptions regarding the underlying neural generators. Here we propose a method that provides an unified framework for the analysis of EEG, combining first and second-order spatial and temporal features based on a bilinear model. Evaluation of the proposed method on simulated data shows that the technique outperforms state-of-the art techniques for single-trial classification for a broad range of signal-to-noise ratios. Evaluations on human EEG—including one benchmark data set from the Brain Computer Interface (BCI) competition—show statistically significant gains in classification accuracy, with a reduction in overall classification error from 26%-28% to 19%. Keywords: regularization, classification, bilinear decomposition, neural signals, brain computer interface
5 0.30715063 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
Author: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
Abstract: Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets. Keywords: basis pursuit, dictionary learning, matrix factorization, online learning, sparse coding, sparse principal component analysis, stochastic approximations, stochastic optimization, nonnegative matrix factorization
6 0.26276362 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
7 0.24051848 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
9 0.20235723 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
10 0.18960612 61 jmlr-2010-Learning From Crowds
11 0.17999174 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
12 0.17452079 15 jmlr-2010-Approximate Tree Kernels
13 0.16824554 57 jmlr-2010-Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models
14 0.1658328 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
15 0.1656609 69 jmlr-2010-Lp-Nested Symmetric Distributions
16 0.16181387 98 jmlr-2010-Regularized Discriminant Analysis, Ridge Regression and Beyond
17 0.15577558 47 jmlr-2010-Hilbert Space Embeddings and Metrics on Probability Measures
19 0.14689292 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
20 0.14685881 94 jmlr-2010-Quadratic Programming Feature Selection
topicId topicWeight
[(4, 0.017), (8, 0.029), (21, 0.016), (24, 0.012), (32, 0.057), (33, 0.018), (34, 0.476), (36, 0.02), (37, 0.04), (75, 0.141), (81, 0.018), (85, 0.045), (96, 0.017)]
simIndex simValue paperId paperTitle
1 0.75087833 37 jmlr-2010-Evolving Static Representations for Task Transfer
Author: Phillip Verbancsics, Kenneth O. Stanley
Abstract: An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird’s eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.
same-paper 2 0.71169364 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
Author: Valero Laparra, Juan Gutiérrez, Gustavo Camps-Valls, Jesús Malo
Abstract: A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative non-explicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the Kullback-Leibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its non-parametric nature, the method can eventually cope with different noise sources without the need of an explicit re-formulation, as it is strictly necessary under parametric Bayesian formalisms. Results under several noise levels and noise sources show that: (1) the proposed method outperforms conventional wavelet methods that assume coefficient independence, (2) it is similar to state-of-the-art methods that do explicitly include these relations when the noise source is Gaussian, and (3) it gives better numerical and visual performance when more complex, realistic noise sources are considered. Therefore, the proposed machine learning approach can be seen as a mor
3 0.32487696 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
4 0.32437363 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
Author: Jitkomut Songsiri, Lieven Vandenberghe
Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization
5 0.32238439 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
Author: Pannagadatta K. Shivaswamy, Tony Jebara
Abstract: Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, text classification and on several other benchmark data sets. In addition, risk bounds are derived for the new formulation based on Rademacher averages. Keywords: support vector machines, kernel methods, large margin, Rademacher complexity
6 0.32141101 63 jmlr-2010-Learning Instance-Specific Predictive Models
7 0.32133952 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis
8 0.3206006 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
9 0.32013947 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
10 0.31980333 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
11 0.31932202 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
12 0.31916934 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
13 0.31828359 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
14 0.31789795 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
15 0.31659192 66 jmlr-2010-Linear Algorithms for Online Multitask Classification
16 0.31635794 40 jmlr-2010-Fast and Scalable Local Kernel Machines
17 0.31612107 109 jmlr-2010-Stochastic Composite Likelihood
18 0.31554881 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
19 0.31538951 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
20 0.31506217 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers