jmlr jmlr2005 jmlr2005-35 knowledge-graph by maker-knowledge-mining

35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning


Source: pdf

Author: Alain Rakotomamonjy, Stéphane Canu

Abstract: This work deals with a method for building a reproducing kernel Hilbert space (RKHS) from a Hilbert space with frame elements having special properties. Conditions on existence and a method of construction are given. Then, these RKHS are used within the framework of regularization theory for function approximation. Implications on semiparametric estimation are discussed and a multiscale scheme of regularization is also proposed. Results on toy and real-world approximation problems illustrate the effectiveness of such methods. Keywords: regularization, kernel, frames, wavelets

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 FR Perception, Systèmes et Information, CNRS FRE2645, INSA de Rouen, 76801 Saint Etienne du Rouvray, France. Editor: Alex Smola. Abstract: This work deals with a method for building a reproducing kernel Hilbert space (RKHS) from a Hilbert space with frame elements having special properties. [sent-5, score-1.012]

2 Implications on semiparametric estimation are discussed and a multiscale scheme of regularization is also proposed. [sent-8, score-0.555]

3 The purpose of this paper is to present a method for constructing an RKHS and its associated kernel by means of frame theory (Duffin and Schaeffer, 1952; Daubechies, 1992). [sent-31, score-0.854]

4 A frame of a Hilbert space represents any vector of the space as a linear combination of the frame elements. [sent-32, score-1.587]

5 But unlike a basis, a frame is not necessarily linearly independent, although it still achieves a stable representation. [sent-33, score-0.759]

6 Since a frame is a more general way to represent elements of a Hilbert space, it allows flexibility in the representation of any vector of the space. [sent-34, score-0.774]

7 By giving conditions for constructing arbitrary RKHS from frame elements, our goal is to widen the choice of kernel so that, in future applications, one can adapt the RKHS to the prior information available about the problem at hand. [sent-35, score-0.886]

8 After a short introduction about frame theory, we give conditions for a Hilbert space described by a frame to be an RKHS and then derive the corresponding kernel. [sent-38, score-1.554]

9 Section 5 discusses the implications of these results for regularization techniques and proposes an algorithm for multiscale approximation. [sent-40, score-0.415]

10 Frames and Reproducing Kernel Hilbert Spaces In this section, we give an introduction to frame theory that will be useful for the remainder of the paper. [sent-64, score-0.759]

11 Definition 1 A set of vectors {φn}n∈Γ is a frame of a Hilbert space H if there exist two constants A and B with ∞ > B ≥ A > 0 such that ∀f ∈ H, A ||f||²_H ≤ ∑n∈Γ |⟨f, φn⟩H|² ≤ B ||f||²_H. (4) [sent-67, score-0.78]

12 The frame is said to be tight if A and B are equal. [sent-68, score-0.782]

13 If the set {φn}n∈Γ satisfies the frame condition, then the frame operator U can be defined as U : H → ℓ²(Γ), f ↦ {⟨f, φn⟩H}n∈Γ. (5) [sent-69, score-1.557]

14 The reconstruction of f from its frame coefficients needs the definition of a dual frame. [sent-70, score-0.805]

15 Theorem 1 (Daubechies, 1992) Let {φn}n∈Γ be a frame of H with frame bounds A and B. [sent-72, score-1.518]

16 Let us define the dual frame {φ̄n}n∈Γ as φ̄n = (U∗U)⁻¹φn. [sent-73, score-0.805]

17 If the frame is tight, then φ̄n = (1/A)φn. [sent-75, score-0.782]

18 This theorem also shows that the dual frame {φ̄n}n∈Γ is a family of vectors which allows one to recover any f ∈ H; consequently, each vector of the frame and of the dual frame can be written as ∀m ∈ Γ, φ̄m = ∑n∈Γ ⟨φ̄m, φ̄n⟩H φn (9) and ∀m ∈ Γ, φm = ∑n∈Γ ⟨φm, φ̄n⟩H φn. (10) [sent-76, score-2.369]
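As a concrete illustration (our own sketch, not code from the paper), the definitions above can be checked numerically in the finite-dimensional case H = R^3 with NumPy; the toy frame below and its dimensions are assumptions made only for this example. The frame bounds A and B are the extreme eigenvalues of U∗U, and the dual frame comes from Theorem 1.

import numpy as np

# Hypothetical toy setup: H = R^3 with the usual inner product and a redundant
# frame of N = 4 vectors phi_n, stored as the rows of Phi.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((4, 3))

# Frame operator U: f -> (<f, phi_n>)_n is Phi @ f here, so U*U = Phi.T @ Phi.
S = Phi.T @ Phi
eigenvalues = np.linalg.eigvalsh(S)
A, B = eigenvalues.min(), eigenvalues.max()   # frame bounds of Definition 1

# Dual frame elements phibar_n = (U*U)^{-1} phi_n (Theorem 1), one per row.
Phi_dual = Phi @ np.linalg.inv(S)

# Reconstruction check: f = sum_n <f, phibar_n> phi_n for any f in H.
f = rng.standard_normal(3)
f_reconstructed = (Phi_dual @ f) @ Phi
print(A, B, np.allclose(f, f_reconstructed))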

19 According to this theorem and the above equations, one can note that an orthonormal basis of H is a special case of a frame where A = B = 1, φ̄n = φn, and ||φn|| = 1. [sent-77, score-0.796]

20 However, as stated by Daubechies (1992), frame redundancy can be statistically useful. [sent-78, score-0.759]

21 For the sake of simplicity, in the following we will call a Hilbert space H frameable if there exists a set of vectors of H that forms a frame of H. [sent-82, score-0.885]

22 2 A Reproducing Kernel Hilbert Space and Its Frame After this short introduction on frame theory, let us look at the conditions under which a frameable Hilbert space is also a reproducing kernel Hilbert space. [sent-85, score-1.075]

23 Theorem 5 Let H be a Hilbert space and {φn}n∈Γ be a frame of this space. [sent-94, score-0.78]

24 Hence the equation ∀f ∈ H, f = ∑n∈Γ ⟨f, φ̄n⟩H φn holds in H according to the frame property given in Equation (8) (Mallat, 1998; Daubechies, 1992). [sent-98, score-0.759]

25 Furthermore, according to Theorem (1) we know that {φ̄n}n∈Γ is another frame of H. [sent-108, score-0.759]

26 Furthermore, any f ∈ H can be expanded by means of the frame of H; thus, according to Equation (14): f(t) = ∑n∈Γ ⟨f, φ̄n⟩H φn(t) = ⟨f(·), ∑n∈Γ φ̄n(·)φn(t)⟩H (20), and since H is an RKHS, we have ∀t ∈ Ω, ∀f ∈ H, f(t) = ⟨f(·), K(·,t)⟩H. [sent-111, score-0.782]

27 These propositions show that a Hilbert space which can be described by its frame is, under general conditions, a reproducing kernel Hilbert space, and its reproducing kernel is given by a linear combination of products of its frame and dual frame elements. [sent-113, score-2.715]
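To make the kernel expansion concrete, here is a minimal sketch (ours, under the assumption of a finite frame sampled on a finite set of points) of how a Gram matrix can be assembled from the values of the frame and dual frame elements, following K(s,t) = ∑n φ̄n(s)φn(t); the random values standing in for real frame evaluations are purely illustrative.

import numpy as np

# Sketch: Gram matrix of the frame-based kernel K(s,t) = sum_n phibar_n(s) phi_n(t)
# on a finite sample.  phi_vals[n, i] = phi_n(x_i) and phibar_vals[n, i] = phibar_n(x_i)
# are assumed to have been evaluated beforehand (e.g. from a sampled wavelet frame).
def frame_kernel_gram(phi_vals, phibar_vals):
    # K[i, j] = sum_n phibar_n(x_i) * phi_n(x_j)
    return phibar_vals.T @ phi_vals

# Toy usage: a tight frame with A = 1, for which the dual equals the frame itself.
rng = np.random.default_rng(0)
phi_vals = rng.standard_normal((6, 20))     # 6 frame elements evaluated at 20 points
K = frame_kernel_gram(phi_vals, phi_vals)
print(K.shape)                              # (20, 20)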

28 However, these properties can also be easily shown owing to the frame representation. [sent-117, score-0.789]

29 Here, we are interested in constructing a reproducing kernel Hilbert space together with its frame and in discussing the implications of such a result in a functional estimation framework. [sent-125, score-1.022]

30 For that it suffices to prove that the frame {φn} satisfies the condition given in Theorem 5. [sent-165, score-0.759]

31 The set {φn}n=1,...,N is a frame of H and, owing to Theorem 1, the dual frame {φ̄n}n=1,...,N can be derived. [sent-169, score-1.594]

32 Figure 1: Examples of wavelet frame elements (left) and their dual elements (right). [sent-179, score-1.022]

33 Example 3 Consider a finite set of wavelets on R: ψj,k(t) = (1/√(a^j)) ψ((t − k·u0·a^j)/a^j), j ∈ Z : jmin ≤ j ≤ jmax, k ∈ Z : kmin ≤ k ≤ kmax, where (a, u0) ∈ R∗ × R and (jmin, jmax, kmin, kmax) ∈ Z⁴. [sent-185, score-0.419]

34 Figure (1) plots an example of wavelet frame and dual frame elements for a dilation j = −7. [sent-187, score-1.841]
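For illustration, the dilated and translated wavelet family of Example 3 can be generated numerically as sketched below (our own code); the Mexican-hat mother wavelet and the values chosen for a, u0 and the index ranges are assumptions, since the example does not fix them.

import numpy as np

# Sketch of the wavelet family psi_{j,k}(t) = a^{-j/2} psi((t - k*u0*a^j) / a^j).
def mexican_hat(t):
    # Assumed mother wavelet, used here only for illustration.
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def psi_jk(t, j, k, a=2.0, u0=1.0, psi=mexican_hat):
    scale = a ** j
    return psi((t - k * u0 * scale) / scale) / np.sqrt(scale)

t = np.linspace(0.0, 10.0, 500)
# A finite wavelet set with jmin <= j <= jmax and kmin <= k <= kmax.
frame_elements = np.vstack([psi_jk(t, j, k) for j in range(-2, 1) for k in range(0, 10)])
print(frame_elements.shape)      # one sampled frame element per row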

35 A finite set of functions {φn}n=1,...,N, if endowed with an adequate inner product, is always a frame of the space it spans (see step 2 of the proof of Theorem (8)). [sent-194, score-0.864]

36 This is not always true for an infinite set of functions and in this case, the frame condition given in Equation (4) and the boundedness of the evaluation functional in Equation (13) have to be verified. [sent-195, score-0.792]

37 The next examples are infinite-dimensional RKHS whose kernels are given explicitly by their frame elements. [sent-196, score-0.791]

38 Hence, {φn(x)}n∈N∗ is a tight frame of H with the frame constant A equal to 1. [sent-199, score-1.541]

39 Let us show that this frame verifies the condition given in Theorem (5) in order to prove that H is an RKHS. [sent-200, score-0.759]

40 Hence, according to the adjoint frame operator U∗ given in Equation (6), for any t ∈ Ω, the function ∑n∈N∗ φn(·)φn(t) is a well-defined function of H. [sent-203, score-0.815]

41 Since, according to the frame property given in Equation (8), each frame element can be expanded as φk = ∑n∈Γ ⟨φk, φ̄n⟩ φn, we have φ̄k = αk φk. [sent-220, score-1.541]

42 Hence, the frame and dual frame elements are such that, for any t ∈ Ω, we have ∑n (φ̄n(t))² < ∞. [sent-221, score-1.579]

43 Supposing that U is the frame operator of either a finite or an infinite dimensional RKHS, their kernel is based on the statement that the operator Q = U∗U is a symmetric positive definite operator and the Green function associated to this operator is a Mercer kernel. [sent-234, score-1.012]

44 Thus, the kernel they proposed, named the frame operator kernel, can be expanded with respect to the dual frame elements as K(s,t) = ∑n φ̄n(s)φ̄n(t). [sent-235, score-1.72]

45 More recently, Opfer (2004b) has shown that the kernel associated to an RKHS H can be expanded as K(s,t) = ∑n∈Γ φn(s)φn(t) if and only if the set of functions {φn}n∈Γ is a super tight frame (which is a tight frame with frame bounds equal to 1) of H. [sent-243, score-2.425]

46 This result is a particular case of Theorem (7) since for a super tight frame each dual frame element is φ̄n = φn. [sent-244, score-1.587]

47 Notice that methods for choosing the appropriate frame elements of the RKHS are not given here. [sent-263, score-0.774]

48 The space spanned by these frame elements, associated with the L2(R) inner product, is an RKHS. [sent-268, score-0.842]

49 • Since the conditions for obtaining a frameable RKHS hold mainly for finite dimensional spaces (although there may exist infinite dimensional Hilbert spaces whose frame elements satisfy the hypotheses of Theorem (5)), it is fairer to compare the frameable kernel to a finite dimensional kernel. [sent-272, score-1.174]

50 Conditions for constructing a frameable kernel are less restrictive since orthogonality of the frame elements is not needed. [sent-276, score-0.974]

51 One can note that for a tight frame or an orthonormal basis, the frameable kernel leads to the following expansion: K(s,t) = (1/A) ∑n=1..N ψn(s)ψn(t), since the dual frame elements are equal to the frame elements up to a multiplicative constant depending on the frame bound A. [sent-277, score-3.336]

52 Tightness of a frame is a very interesting property since in this case computing the dual frame is no longer needed. [sent-278, score-1.564]

53 However, unless we explicitly build the RKHS H so that it is spanned by a tight frame (as in example (5) or in Opfer (2004b)), tightness imposes more constraints on the frame elements than a general frame does. [sent-279, score-2.357]

54 Thus a tight frame of a space is harder to build than other frames of the same space. [sent-280, score-1.579]

55 Again in this case, the frame kernel expansion is similar to the Mercer kernel expansion. [sent-282, score-0.917]

56 The main difference between the finite and infinite dimensional cases lies in the fact that a finite set of functions {φn} is always a frame of the space it spans (provided that the latter is endowed with an adequate inner product). [sent-283, score-0.882]

57 The mapping is consequently strictly related to the frame elements {φn } and is implicitly defined by them. [sent-291, score-0.774]

58 • Besides, since the kernel has an expansion with respect to the frame elements, the solution of Equation (1) is easier to interpret. [sent-292, score-0.869]

59 Indeed, although the solution depends on the kernel expression, it can be rewritten as a linear combination of the frame elements. [sent-293, score-0.838]

60 Thus, with a frame-based kernel, one has to compute the dual frame elements (for instance, by means of an iterative algorithm such as the one described in Grochenig, 1993). [sent-297, score-0.805]

61 Hence, if the number N of frame elements describing the kernel and the number ℓ of data are large, building K rapidly becomes very time-consuming (of order N²·ℓ²). [sent-300, score-0.874]

62 Intuitively, one should use any vector that comes from “good” prior knowledge in the parametric part of the approximation, while leaving the other frame elements in the kernel expansion. [sent-322, score-0.88]
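A minimal sketch (ours) of what such a semiparametric fit can look like for squared loss: the solution f(x) = ∑i ci K(xi, x) + ∑j dj gj(x) is obtained from the standard block linear system in which the parametric part is left unpenalized; the Gaussian kernel and the sine/cosine basis below are illustrative assumptions, not the paper's setting.

import numpy as np

# Sketch of a semiparametric regularization-network fit with squared loss:
# minimize sum_i (y_i - f(x_i))^2 + lam * c' K c, with the basis g_j unpenalized.
def semiparametric_fit(K, G, y, lam):
    n, p = G.shape
    system = np.block([[K + lam * np.eye(n), G],
                       [G.T, np.zeros((p, p))]])
    rhs = np.concatenate([y, np.zeros(p)])
    sol = np.linalg.solve(system, rhs)
    return sol[:n], sol[n:]          # c (regularized kernel part), d (parametric part)

# Toy usage: the sine/cosine basis plays the role of the "good" prior knowledge.
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(50)
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)
G = np.column_stack([np.sin(x), np.cos(x)])
c, d = semiparametric_fit(K, G, y, lam=0.1)
print(d)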

63 Another perspective which follows directly from this finding is a technique of regularization that we call multiscale regularization, inspired by the multiresolution analysis of Mallat (1998). [sent-333, score-0.523]

64 According to the representer Theorem (9), fm−k(·) can be written fm−k(·) = ∑i=1..n ci,m−k Km−k(xi, ·) + ∑j∈∪l=0..m−k Γl dj,m−k gj(·) (28), and thus the overall solution of the so-called multiscale regularization is f̂(·) = ∑k=1..m ∑i=1..n ci,m−k Km−k(xi, ·) + ∑j∈Γ0 dj,0 gj(·). (29) [sent-369, score-0.415]

65 The solution f̂ of the multiscale regularization is the sum of different approximations on nested spaces. [sent-370, score-0.43]
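As a rough illustration (ours, not the paper's exact algorithm), a multiscale scheme in the spirit of Equations (28)-(29) can be sketched as one regularization-network fit per scale, each applied to the residual left by the coarser scales, with the per-scale solutions summed; the per-scale Gaussian kernels below simply stand in for kernels built from wavelet frame elements of different dilations.

import numpy as np

# Schematic multiscale regularization loop (simplifying assumptions, see above).
def ridge_fit(K, y, lam):
    # Coefficients c of f(.) = sum_i c_i K(x_i, .) for a regularization network.
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def multiscale_fit(grams, y, lam=1e-2):
    # grams: list of Gram matrices, one per scale, ordered from coarse to fine.
    residual, coefs = y.copy(), []
    for K in grams:
        c = ridge_fit(K, residual, lam)
        coefs.append(c)
        residual = residual - K @ c          # what is left goes to the finer scale
    return coefs

# Toy usage: a smooth trend plus fine "wiggles", fitted on two scales.
x = np.linspace(0, 10, 60)
y = np.sin(x) + 0.05 * np.sin(25 * x)
grams = [np.exp(-(x[:, None] - x[None, :])**2 / (2 * s**2)) for s in (2.0, 0.2)]
coefs = multiscale_fit(grams, y)
f_hat = sum(K @ c for K, c in zip(grams, coefs))
print(np.mean((f_hat - y)**2))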

66 Illustrations of the multiscale regularization algorithm on both toy and real-world problems are given in the next section. [sent-387, score-0.415]

67 These values are those proposed by Daubechies (Daubechies, 1992) so that a wavelet set is a frame of L2 (R). [sent-415, score-0.946]

68 For frame-based kernel, if necessary the dual frame is processed using Grochenig’s algorithm. [sent-421, score-0.805]

69 Using prior knowledge on the problem in this context does not improve performance (Sin/Sinc kernel or wavelet kernel compared to gaussian kernel). [sent-428, score-0.397]

70 A justification can be that such kernels use strong prior knowledge (the sin frame element) that is included in the kernel expansion, and thus this prior knowledge gets regularized as much as the other frame elements. [sent-429, score-1.761]

71 • Gaussian kernel and wavelet basis functions ψj,k(x) = (1/√(a^j)) ψ((x − k·u0·a^j)/a^j), j ∈ {0, 5} • Wavelet kernel and wavelet basis functions: these functions are the same as in the previous case but the kernel is built only with low-dilation wavelets (j = −10). [sent-435, score-0.997]

72 However, this is still of interest as using a wavelet kernel and basis functions does correspond to prior knowledge that can be reformulated as: “the function to be approximated contains a smooth structure (the sin part), irregular structures (the sinc part) and noise”. [sent-453, score-0.55]

73 Thus, prior knowledge on structures, which may be easier to obtain than prior knowledge on basis functions, can be easily exploited by means of a wavelet span and a wavelet kernel. [sent-455, score-0.512]

74 The learning machines are: regularization networks, SVM, semiparametric regularization and multiscale regularization. [sent-459, score-0.645]

75 For semiparametric regularization, the kernel and basis setting was built with a wavelet set given by ψj,k(x) = (1/√(a^j)) ψ((x − k·u0·a^j)/a^j). [sent-486, score-0.408]

76 Table 3: True mean-square-error generalization for regularization networks, SVM, semiparametric regularization networks, and multiscale regularization for f1 and f2. [sent-511, score-0.753]

77 The kernel is constructed from a set of wavelet frame elements of dilation jSPH, and the basis functions are given by another wavelet set described by jSPL. [sent-512, score-1.307]

78 For multiscale regularization, the setting of the nested spaces is the following: span{(1/√(a^j)) ψ((t − k·u0·a^j)/a^j), j = 5}, F0 = span{(1/√(a^j)) ψ((t − k·u0·a^j)/a^j), j = 0}, and F1 = span{(1/√(a^j)) ψ((t − k·u0·a^j)/a^j), j = −10}. [sent-513, score-0.527]

79 Comments and analysis of this experiment validating the concept of multiscale approximation are: - semiparametric 2 and multiscale approximation give the best mean-square error. [sent-518, score-0.736]

80 - multiscale approximation balances the loss of approximation due to error at each level (see Figure) against the flexibility of regularization; thus its performance is better than the semiparametric one when the multiscale structure of the signal is more pronounced. [sent-525, score-0.736]

81 In this experiment, it seems that leaving the space spanned by wavelets of dilation j = 0 in the parametric span (the space which is not regularized) leads to overfitting. [sent-536, score-0.406]

82 Figure (3) shows that multiscale and semiparametric algorithms achieve a better approximation of the “wiggles” than nonparametric methods without compromising smoothness in the regions of the functions where it is needed. [sent-540, score-0.429]

83 For this purpose, two models have been compared, the first one being a regularization network with a Gaussian kernel whereas the other one is a multiscale regularization algorithm with an orthogonal wavelet kernel. [sent-546, score-0.826]

84 The wavelet that has been used is a Symmlet wavelet with 4 vanishing moments (Mallat, 1998). [sent-547, score-0.374]

85 Table 4: Averaged normalized mean-square error of estimation of real-world time-series with a Gaussian and a wavelet multiscale regularization network. [sent-563, score-0.639]

86 It shows that for both time-series, the multiscale algorithm performs better than the gaussian regularization networks. [sent-566, score-0.434]

87 This figure shows that the best model for the gaussian regularization networks is rather a smooth model whereas the wavelet multiscale model is far less smooth. [sent-571, score-0.639]

88 Conclusions In this paper, we showed that an RKHS can be defined by its frame elements and conversely, one can construct an RKHS from a frame. [sent-576, score-0.774]

89 These conditions depend on the frame and the dual frame elements of the Hilbert space and, under some weak hypotheses, these conditions are easy to check (see example 5). [sent-579, score-1.63]

90 But the difficulty lies in one question: How to transform prior information on the learning problem into frame elements? [sent-623, score-0.776]

91 • reconstruction from frame elements has been shown to be more robust in presence of noise (Daubechies, 1992). [sent-625, score-0.774]

92 In fact, redundancy attenuates noise effects on the frame coefficients. [sent-626, score-0.759]

93 Thus, this is a good statistical argument for using frames with high redundancy. [sent-627, score-0.759]

94 However, this implies computing the dual frame and consequently a higher time complexity of the algorithm. [sent-628, score-0.805]

95 • a multiscale regularization algorithm has been sketched in this paper in order to take advantage of frame kernels. [sent-630, score-1.174]

96 We recall in this appendix a numerical method to process the dual frame of a frameable Hilbert space H with frame elements {φn }n∈Γ . [sent-637, score-1.705]

97 One can also write the operator S as S = U∗U, where U is the frame operator defined in Equations (5) and (6). [sent-639, score-0.837]

98 Then, in order to process numerically the dual frame of H , one has to apply this algorithm on each element of the frame. [sent-648, score-0.805]

99 One can note that the speed of convergence is highly dependent on frame bounds A and B. [sent-649, score-0.759]
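For reference, a minimal sketch (ours) of the classical frame algorithm, the basic iterative scheme of this kind whose convergence speed is governed by the frame bounds A and B; whether it matches Grochenig's accelerated variant exactly is not claimed, and the toy finite frame is an assumption made only for the example.

import numpy as np

# Classical frame algorithm: approximate phibar = S^{-1} phi with S = U*U,
# using only applications of S; the relaxation 2/(A+B) gives a contraction
# factor (B-A)/(B+A) per iteration, hence the dependence on the frame bounds.
def dual_frame_element(apply_S, phi, A, B, n_iter=500):
    relax = 2.0 / (A + B)
    g = relax * phi
    for _ in range(n_iter):
        g = g + relax * (phi - apply_S(g))
    return g

# Toy usage in R^3 with an explicit finite frame stored as the rows of Phi.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((5, 3))
S = Phi.T @ Phi
A, B = np.linalg.eigvalsh(S)[[0, -1]]
phibar_0 = dual_frame_element(lambda v: S @ v, Phi[0], A, B)
print(np.abs(phibar_0 - np.linalg.solve(S, Phi[0])).max())   # should be close to 0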

100 Tight frame expansions of multiscale reproducing kernels in Sobolev spaces. [sent-803, score-1.194]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('frame', 0.759), ('multiscale', 0.307), ('rkhs', 0.193), ('wavelet', 0.187), ('hilbert', 0.169), ('sinc', 0.15), ('semiparametric', 0.122), ('akotomamonjy', 0.112), ('eproducing', 0.112), ('rames', 0.112), ('regularization', 0.108), ('frameable', 0.105), ('reproducing', 0.096), ('anu', 0.094), ('egularization', 0.083), ('kernel', 0.079), ('dilation', 0.075), ('ernels', 0.07), ('spn', 0.067), ('sin', 0.066), ('opfer', 0.06), ('span', 0.052), ('dual', 0.046), ('daubechies', 0.045), ('aj', 0.042), ('earning', 0.041), ('operator', 0.039), ('basiron', 0.037), ('grochenig', 0.037), ('jmax', 0.037), ('jmin', 0.037), ('endowed', 0.035), ('hi', 0.033), ('functional', 0.033), ('amplitude', 0.033), ('kernels', 0.032), ('regards', 0.031), ('gao', 0.031), ('owing', 0.03), ('amato', 0.03), ('mallat', 0.03), ('hyperparameters', 0.029), ('cn', 0.028), ('spans', 0.027), ('pn', 0.026), ('fi', 0.026), ('mercer', 0.026), ('fm', 0.025), ('frames', 0.025), ('spanned', 0.025), ('parametric', 0.025), ('expanded', 0.023), ('tight', 0.023), ('ars', 0.022), ('atteia', 0.022), ('debnath', 0.022), ('duf', 0.022), ('hilb', 0.022), ('insa', 0.022), ('jsph', 0.022), ('jspl', 0.022), ('rakotomamonjy', 0.022), ('rouen', 0.022), ('seminorm', 0.022), ('inner', 0.022), ('exibility', 0.021), ('building', 0.021), ('space', 0.021), ('gi', 0.02), ('basis', 0.02), ('svms', 0.02), ('mt', 0.02), ('trend', 0.019), ('girosi', 0.019), ('gaussian', 0.019), ('nely', 0.019), ('alain', 0.019), ('berlinet', 0.019), ('sobolev', 0.019), ('tikhonov', 0.019), ('networks', 0.018), ('dimensional', 0.018), ('hypothesis', 0.018), ('estimation', 0.018), ('spaces', 0.017), ('orthonormal', 0.017), ('build', 0.017), ('prior', 0.017), ('adjoint', 0.017), ('engines', 0.017), ('constructing', 0.016), ('knowledge', 0.016), ('nested', 0.015), ('elements', 0.015), ('scholkopf', 0.015), ('irregular', 0.015), ('conditions', 0.015), ('agnan', 0.015), ('angewandte', 0.015), ('fur', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning

Author: Alain Rakotomamonjy, Stéphane Canu

Abstract: This work deals with a method for building a reproducing kernel Hilbert space (RKHS) from a Hilbert space with frame elements having special properties. Conditions on existence and a method of construction are given. Then, these RKHS are used within the framework of regularization theory for function approximation. Implications on semiparametric estimation are discussed and a multiscale scheme of regularization is also proposed. Results on toy and real-world approximation problems illustrate the effectiveness of such methods. Keywords: regularization, kernel, frames, wavelets

2 0.075471699 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

Author: Cheng Soon Ong, Alexander J. Smola, Robert C. Williamson

Abstract: This paper addresses the problem of choosing a kernel suitable for estimation with a support vector machine, hence further automating machine learning. This goal is achieved by defining a reproducing kernel Hilbert space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach. Keywords: learning the kernel, capacity control, kernel methods, support vector machines, representer theorem, semidefinite programming

3 0.06260623 47 jmlr-2005-Learning from Examples as an Inverse Problem

Author: Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini, Francesca Odone

Abstract: Many works related learning from examples to regularization techniques for inverse problems, emphasizing the strong algorithmic and conceptual analogy of certain learning algorithms with regularization algorithms. In particular it is well known that regularization schemes such as Tikhonov regularization can be effectively used in the context of learning and are closely related to algorithms such as support vector machines. Nevertheless the connection with inverse problem was considered only for the discrete (finite sample) problem and the probabilistic aspects of learning from examples were not taken into account. In this paper we provide a natural extension of such analysis to the continuous (population) case and study the interplay between the discrete and continuous problems. From a theoretical point of view, this allows to draw a clear connection between the consistency approach in learning theory and the stability convergence property in ill-posed inverse problems. The main mathematical result of the paper is a new probabilistic bound for the regularized least-squares algorithm. By means of standard results on the approximation term, the consistency of the algorithm easily follows. Keywords: statistical learning, inverse problems, regularization theory, consistency c 2005 Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini and Francesca Odone. D E V ITO , ROSASCO , C APONNETTO , D E G IOVANNINI AND O DONE

4 0.056844223 45 jmlr-2005-Learning Multiple Tasks with Kernel Methods

Author: Theodoros Evgeniou, Charles A. Micchelli, Massimiliano Pontil

Abstract: We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multi-task learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single task learning problem if a family of multi-task kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multi-task learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multi-task learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning particularly when there are many related tasks but few data per task. Keywords: multi-task learning, kernels, vector-valued functions, regularization, learning algorithms

5 0.053143844 64 jmlr-2005-Semigroup Kernels on Measures

Author: Marco Cuturi, Kenji Fukumizu, Jean-Philippe Vert

Abstract: We present a family of positive definite kernels on measures, characterized by the fact that the value of the kernel between two measures is a function of their sum. These kernels can be used to derive kernels on structured objects, such as images and texts, by representing these objects as sets of components, such as pixels or words, or more generally as measures on the space of components. Several kernels studied in this work make use of common quantities defined on measures such as entropy or generalized variance to detect similarities. Given an a priori kernel on the space of components itself, the approach is further extended by restating the previous results in a more efficient and flexible framework using the “kernel trick”. Finally, a constructive approach to such positive definite kernels through an integral representation theorem is proved, before presenting experimental results on a benchmark experiment of handwritten digits classification to illustrate the validity of the approach. Keywords: kernels on measures, semigroup theory, Jensen divergence, generalized variance, reproducing kernel Hilbert space

6 0.045331836 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

7 0.038498376 70 jmlr-2005-Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions

8 0.037352309 48 jmlr-2005-Learning the Kernel Function via Regularization

9 0.033344757 26 jmlr-2005-Diffusion Kernels on Statistical Manifolds

10 0.028870407 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization

11 0.027813101 36 jmlr-2005-Gaussian Processes for Ordinal Regression

12 0.026795158 41 jmlr-2005-Kernel Methods for Measuring Independence

13 0.025241125 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach

14 0.025103532 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels

15 0.023918172 30 jmlr-2005-Estimating Functions for Blind Separation When Sources Have Variance Dependencies

16 0.023174211 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

17 0.021595102 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial

18 0.021556778 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

19 0.021341588 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables

20 0.021068722 40 jmlr-2005-Inner Product Spaces for Bayesian Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.13), (1, 0.102), (2, 0.092), (3, 0.184), (4, 0.086), (5, 0.15), (6, 0.053), (7, -0.032), (8, 0.021), (9, 0.063), (10, 0.014), (11, 0.026), (12, 0.12), (13, 0.068), (14, -0.071), (15, -0.065), (16, -0.082), (17, 0.197), (18, -0.22), (19, 0.136), (20, -0.045), (21, -0.03), (22, -0.057), (23, -0.249), (24, 0.037), (25, 0.086), (26, 0.063), (27, 0.281), (28, 0.188), (29, -0.193), (30, -0.089), (31, 0.034), (32, -0.049), (33, -0.041), (34, 0.113), (35, 0.159), (36, -0.231), (37, 0.256), (38, -0.193), (39, 0.233), (40, -0.235), (41, 0.085), (42, 0.208), (43, -0.012), (44, -0.112), (45, 0.05), (46, 0.063), (47, 0.04), (48, -0.049), (49, -0.259)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95075589 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning

Author: Alain Rakotomamonjy, Stéphane Canu

Abstract: This work deals with a method for building a reproducing kernel Hilbert space (RKHS) from a Hilbert space with frame elements having special properties. Conditions on existence and a method of construction are given. Then, these RKHS are used within the framework of regularization theory for function approximation. Implications on semiparametric estimation are discussed and a multiscale scheme of regularization is also proposed. Results on toy and real-world approximation problems illustrate the effectiveness of such methods. Keywords: regularization, kernel, frames, wavelets

2 0.22745374 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

Author: Cheng Soon Ong, Alexander J. Smola, Robert C. Williamson

Abstract: This paper addresses the problem of choosing a kernel suitable for estimation with a support vector machine, hence further automating machine learning. This goal is achieved by defining a reproducing kernel Hilbert space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach. Keywords: learning the kernel, capacity control, kernel methods, support vector machines, representer theorem, semidefinite programming

3 0.20438293 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning

Author: Damien Ernst, Pierre Geurts, Louis Wehenkel

Abstract: Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (xt , ut , rt , xt+1 ) where xt denotes the system state at time t, ut the control action taken, rt the instantaneous reward obtained and xt+1 the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performances on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees. Keywords: batch mode reinforcement learning, regression trees, ensemble methods, supervised learning, fitted value iteration, optimal control

4 0.19345939 47 jmlr-2005-Learning from Examples as an Inverse Problem

Author: Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini, Francesca Odone

Abstract: Many works related learning from examples to regularization techniques for inverse problems, emphasizing the strong algorithmic and conceptual analogy of certain learning algorithms with regularization algorithms. In particular it is well known that regularization schemes such as Tikhonov regularization can be effectively used in the context of learning and are closely related to algorithms such as support vector machines. Nevertheless the connection with inverse problem was considered only for the discrete (finite sample) problem and the probabilistic aspects of learning from examples were not taken into account. In this paper we provide a natural extension of such analysis to the continuous (population) case and study the interplay between the discrete and continuous problems. From a theoretical point of view, this allows to draw a clear connection between the consistency approach in learning theory and the stability convergence property in ill-posed inverse problems. The main mathematical result of the paper is a new probabilistic bound for the regularized least-squares algorithm. By means of standard results on the approximation term, the consistency of the algorithm easily follows. Keywords: statistical learning, inverse problems, regularization theory, consistency c 2005 Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini and Francesca Odone. D E V ITO , ROSASCO , C APONNETTO , D E G IOVANNINI AND O DONE

5 0.17704117 64 jmlr-2005-Semigroup Kernels on Measures

Author: Marco Cuturi, Kenji Fukumizu, Jean-Philippe Vert

Abstract: We present a family of positive definite kernels on measures, characterized by the fact that the value of the kernel between two measures is a function of their sum. These kernels can be used to derive kernels on structured objects, such as images and texts, by representing these objects as sets of components, such as pixels or words, or more generally as measures on the space of components. Several kernels studied in this work make use of common quantities defined on measures such as entropy or generalized variance to detect similarities. Given an a priori kernel on the space of components itself, the approach is further extended by restating the previous results in a more efficient and flexible framework using the “kernel trick”. Finally, a constructive approach to such positive definite kernels through an integral representation theorem is proved, before presenting experimental results on a benchmark experiment of handwritten digits classification to illustrate the validity of the approach. Keywords: kernels on measures, semigroup theory, Jensen divergence, generalized variance, reproducing kernel Hilbert space

6 0.14941064 45 jmlr-2005-Learning Multiple Tasks with Kernel Methods

7 0.14512508 26 jmlr-2005-Diffusion Kernels on Statistical Manifolds

8 0.12057212 48 jmlr-2005-Learning the Kernel Function via Regularization

9 0.10770424 36 jmlr-2005-Gaussian Processes for Ordinal Regression

10 0.10411312 27 jmlr-2005-Dimension Reduction in Text Classification with Support Vector Machines

11 0.096448034 58 jmlr-2005-Multiclass Classification with Multi-Prototype Support Vector Machines

12 0.094145499 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

13 0.093453377 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels

14 0.092016518 25 jmlr-2005-Denoising Source Separation

15 0.087828673 30 jmlr-2005-Estimating Functions for Blind Separation When Sources Have Variance Dependencies

16 0.087302126 40 jmlr-2005-Inner Product Spaces for Bayesian Networks

17 0.085845999 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models

18 0.08581616 41 jmlr-2005-Kernel Methods for Measuring Independence

19 0.083620042 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

20 0.082644403 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.431), (13, 0.028), (17, 0.024), (19, 0.021), (36, 0.035), (37, 0.027), (42, 0.017), (43, 0.043), (47, 0.015), (52, 0.078), (59, 0.014), (70, 0.026), (76, 0.016), (88, 0.064), (90, 0.033), (94, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69138002 35 jmlr-2005-Frames, Reproducing Kernels, Regularization and Learning

Author: Alain Rakotomamonjy, Stéphane Canu

Abstract: This work deals with a method for building a reproducing kernel Hilbert space (RKHS) from a Hilbert space with frame elements having special properties. Conditions on existence and a method of construction are given. Then, these RKHS are used within the framework of regularization theory for function approximation. Implications on semiparametric estimation are discussed and a multiscale scheme of regularization is also proposed. Results on toy and real-world approximation problems illustrate the effectiveness of such methods. Keywords: regularization, kernel, frames, wavelets

2 0.27284402 33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

3 0.27140442 49 jmlr-2005-Learning the Kernel with Hyperkernels     (Kernel Machines Section)

Author: Cheng Soon Ong, Alexander J. Smola, Robert C. Williamson

Abstract: This paper addresses the problem of choosing a kernel suitable for estimation with a support vector machine, hence further automating machine learning. This goal is achieved by defining a reproducing kernel Hilbert space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach. Keywords: learning the kernel, capacity control, kernel methods, support vector machines, representer theorem, semidefinite programming

4 0.26985198 64 jmlr-2005-Semigroup Kernels on Measures

Author: Marco Cuturi, Kenji Fukumizu, Jean-Philippe Vert

Abstract: We present a family of positive definite kernels on measures, characterized by the fact that the value of the kernel between two measures is a function of their sum. These kernels can be used to derive kernels on structured objects, such as images and texts, by representing these objects as sets of components, such as pixels or words, or more generally as measures on the space of components. Several kernels studied in this work make use of common quantities defined on measures such as entropy or generalized variance to detect similarities. Given an a priori kernel on the space of components itself, the approach is further extended by restating the previous results in a more efficient and flexible framework using the “kernel trick”. Finally, a constructive approach to such positive definite kernels through an integral representation theorem is proved, before presenting experimental results on a benchmark experiment of handwritten digits classification to illustrate the validity of the approach. Keywords: kernels on measures, semigroup theory, Jensen divergence, generalized variance, reproducing kernel Hilbert space

5 0.26606289 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints

Author: Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall

Abstract: Many learning algorithms use a metric defined over the input space as a principal tool, and their performance critically depends on the quality of this metric. We address the problem of learning metrics using side-information in the form of equivalence constraints. Unlike labels, we demonstrate that this type of side-information can sometimes be automatically obtained without the need of human intervention. We show how such side-information can be used to modify the representation of the data, leading to improved clustering and classification. Specifically, we present the Relevant Component Analysis (RCA) algorithm, which is a simple and efficient algorithm for learning a Mahalanobis metric. We show that RCA is the solution of an interesting optimization problem, founded on an information theoretic basis. If dimensionality reduction is allowed within RCA, we show that it is optimally accomplished by a version of Fisher’s linear discriminant that uses constraints. Moreover, under certain Gaussian assumptions, RCA can be viewed as a Maximum Likelihood estimation of the within class covariance matrix. We conclude with extensive empirical evaluations of RCA, showing its advantage over alternative methods. Keywords: clustering, metric learning, dimensionality reduction, equivalence constraints, side information.

6 0.26535985 36 jmlr-2005-Gaussian Processes for Ordinal Regression

7 0.26287788 3 jmlr-2005-A Classification Framework for Anomaly Detection

8 0.26206613 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching

9 0.2618461 71 jmlr-2005-Variational Message Passing

10 0.26183203 39 jmlr-2005-Information Bottleneck for Gaussian Variables

11 0.25978938 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

12 0.25898319 45 jmlr-2005-Learning Multiple Tasks with Kernel Methods

13 0.25742498 44 jmlr-2005-Learning Module Networks

14 0.25517461 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels

15 0.25469449 9 jmlr-2005-Active Learning to Recognize Multiple Types of Plankton

16 0.25453612 24 jmlr-2005-Core Vector Machines: Fast SVM Training on Very Large Data Sets

17 0.25407609 48 jmlr-2005-Learning the Kernel Function via Regularization

18 0.25402704 47 jmlr-2005-Learning from Examples as an Inverse Problem

19 0.25328156 11 jmlr-2005-Algorithmic Stability and Meta-Learning

20 0.25244489 54 jmlr-2005-Managing Diversity in Regression Ensembles