jmlr jmlr2009 jmlr2009-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
Reference: text
sentIndex sentText sentNum sentScore
1 It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. [sent-6, score-0.512]
2 Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. [sent-7, score-0.827]
3 Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels 1. [sent-9, score-0.693]
4 Introduction In our recent work (Xu and Zhang, 2007), we studied characterizations of a refinable kernel which offers a convenient way of enlarging its reproducing kernel Hilbert space (RKHS). [sent-10, score-0.472]
5 Neither the Gaussian kernels nor kernels having finite dimensional feature spaces are refinable. [sent-15, score-0.406]
6 There is a bijective correspondence between the set of kernels on X and that of reproducing kernel Hilbert spaces (RKHS) on X. [sent-29, score-0.431]
7 For an existing kernel K, we call a kernel G satisfying (6) a refinement kernel for K. [sent-36, score-0.552]
8 If, in addition, HG contains HK as a proper subspace, then we call G a nontrivial refinement kernel for K. [sent-37, score-0.328]
9 There are two possible situations where one may desire to find a nontrivial refinement kernel G for K in (7). [sent-49, score-0.328]
10 In a recent paper (Xu and Zhang, 2007), we introduced a method of updating kernels via a composition of the kernel with a bijective mapping γ of the input space. [sent-57, score-0.387]
11 Specifically, with a selected positive constant λ, we define for a kernel K on X a new kernel G(x, y) := λK(γ(x), γ(y)), x, y ∈ X (8) and call K a γ-refinable kernel if (8) gives a nontrivial refinement kernel G for K. [sent-58, score-0.88]
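As a small illustration of (8) in code (a generic Python helper only; whether the resulting G is in fact a nontrivial refinement kernel for K is precisely what the characterizations discussed below decide):

```python
def gamma_refine(K, gamma, lam):
    """Form the candidate kernel G(x, y) = lam * K(gamma(x), gamma(y)) of (8).

    K is a kernel function of two arguments, gamma a bijective mapping of the
    input space, and lam a positive constant; this helper only builds G, it
    does not test whether G actually refines K.
    """
    return lambda x, y: lam * K(gamma(x), gamma(y))
```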
12 Various characterizations and many examples of refinable kernels were provided in Xu and Zhang (2007). [sent-59, score-0.263]
13 On the other hand, we know that kernels with a finite dimensional feature space, such as finite dot-product kernels (FitzGerald et al. [sent-64, score-0.406]
14 We would like to find nontrivial refinement kernels for them by considering general methods of updating kernels besides the particular one (8). [sent-67, score-0.55]
15 In particular, it will be shown that a nontrivial refinement kernel always exists if the input space contains infinitely many elements. [sent-73, score-0.328]
16 In Sections 5 and 6, we study refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels, respectively. [sent-74, score-0.624]
17 This is a big saving in comparison to the O(m^3) multiplications required if the linear system (12) is solved directly by the Cholesky factorization without using the refinement kernel method. [sent-130, score-0.289]
18 In other words, the use of the refinement kernel method reduces the number of multiplications from O(m^3) to O(m^2). [sent-131, score-0.265]
19 The refinement kernel learning method allows us to reduce the number of multiplications from O(m^3) to O(m^2) by using the known Cholesky factorization of the matrix corresponding to the old kernel K. [sent-169, score-0.473]
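The update scheme itself is the one the paper builds around the linear system (12) and is not reproduced in this summary. The sketch below (NumPy/SciPy, with variable names that are my own) only shows the generic mechanism behind the O(m^2) claim: once a Cholesky factor of the old regularized kernel matrix is available, a symmetric rank-one change can be absorbed in O(m^2) operations and the new system solved by two triangular solves, instead of refactorizing from scratch in O(m^3).

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_rank_one_update(L, v):
    """Given lower-triangular L with A = L @ L.T, return the Cholesky factor
    of A + v v^T in O(m^2) operations (classical LINPACK-style update)."""
    L = L.copy()
    v = v.astype(float).copy()
    m = v.size
    for k in range(m):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        if k + 1 < m:
            L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
            v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

# Illustrative use, assuming the refined system matrix differs from the old
# one by a rank-one term u u^T (an assumption of this sketch, not the paper's
# equations): reuse the old factor instead of refactorizing.
rng = np.random.default_rng(0)
m, lam = 50, 0.1
x = rng.normal(size=(m, 1))
K = np.exp(-(x - x.T) ** 2)                       # kernel matrix K[x]
L = np.linalg.cholesky(K + m * lam * np.eye(m))   # old factorization, O(m^3) once
u = rng.normal(size=m)
L_new = chol_rank_one_update(L, u)                # O(m^2) update
y = rng.normal(size=m)
c = solve_triangular(L_new.T, solve_triangular(L_new, y, lower=True), lower=False)
```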
20 Then G is a refinement kernel for K if and only if L := G − K is a kernel on X and HK ∩ HL = {0}. [sent-178, score-0.368]
21 If G is a refinement kernel for K then HL is the orthogonal complement of HK in HG , and G is a nontrivial refinement kernel for K if and only if L is not the zero kernel. [sent-179, score-0.512]
22 Aiming at a characterization convenient for use, we next characterize refinement kernels in terms of feature maps for kernels, since most kernels are identified with their feature maps. [sent-182, score-0.436]
23 Theorem 6 Suppose that K is a kernel on X with a feature map Φ : X → W satisfying (27) and G is a kernel on X with a feature map Φ′ : X → W′ that satisfies span Φ′(X) = W′. [sent-195, score-0.436]
24 Then G is a refinement kernel for K if and only if there exists a bounded linear operator T : W′ → W such that T Φ′(x) = Φ(x), x ∈ X, (28) and the adjoint operator T∗ : W → W′ of T is isometric. [sent-196, score-0.279]
25 Moreover, G is a nontrivial refinement kernel for K if and only if T in (28) is not injective. [sent-197, score-0.328]
26 By Theorem 6, G is a nontrivial refinement kernel for K if and only if there exists a bounded linear operator T : W → W such that λ^{1/2} T Φ(γ(x)) = Φ(x), x ∈ X, its adjoint T∗ is isometric from W to W, and T is not injective. [sent-203, score-0.414]
27 Therefore, we cannot get a nontrivial refinement kernel by (8) if W is of finite dimension: when W is finite dimensional, an isometric adjoint T∗ : W → W is bijective, which forces T to be injective. [sent-206, score-0.328]
28 Examples of nontrivial refinement kernels for kernels with a finite dimensional feature space will be provided in Section 6. [sent-209, score-0.55]
29 This result is crucial for our discussion later on translation invariant kernels and Hilbert-Schmidt kernels. [sent-211, score-0.421]
30 If µ ν then Kν is a nontrivial refinement kernel for Kµ if and only if ν(Y) − µ(Y) > 0. [sent-245, score-0.328]
31 If µ ν then Kν is a nontrivial refinement kernel for Kµ if and only if the operator T is not injective, that is, there exists f ∈ L2(Y, ν) such that ‖f‖L2(Y,ν) > 0 but ‖T f‖L2(Y,µ) = 0. [sent-269, score-0.362]
32 Existence of Refinement Kernels With characterizations in Lemma 4 and Theorem 6, we shall consider existence of nontrivial refinement kernels and properties of kernels preserved by the refinement process. [sent-274, score-0.61]
33 Lemma 9 A kernel K on X does not have a nontrivial refinement kernel if and only if HK = CX. [sent-276, score-0.512]
34 Proof If HK = CX then, since HG ⊆ CX for all kernels G on X, K does not have a nontrivial refinement kernel. [sent-277, score-0.347]
35 By Lemma 4, G is a nontrivial refinement kernel for K. [sent-281, score-0.328]
36 According to Lemma 9, one may expect that every kernel has a nontrivial refinement kernel since in general it should be impossible to impose an inner product on CX so that it becomes an RKHS. [sent-282, score-0.512]
37 Proposition 10 If the input space X has a finite cardinality, then a kernel K on X has a nontrivial refinement kernel if and only if K[X] is singular. [sent-284, score-0.512]
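A numerical illustration of Proposition 10 (using the standard identification of kernels on a finite input space with positive semidefinite matrices; the construction and variable names are mine, not the paper's): when K[X] is singular, adding the rank-one kernel built from a null vector of K[X] enlarges the RKHS while leaving the norms of the old functions unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
N, r = 6, 3
F = rng.normal(size=(N, r))
K = F @ F.T                               # singular PSD Gram matrix K[X] (rank r < N)
_, _, Vt = np.linalg.svd(K)
v = Vt[-1]                                # a unit vector in the null space of K[X]
G = K + np.outer(v, v)                    # candidate nontrivial refinement kernel

c = rng.normal(size=N)
f = K @ c                                 # a generic element of H_K
norm_K = c @ K @ c                        # ||f||^2 in H_K
norm_G = f @ np.linalg.pinv(G) @ f        # ||f||^2 in H_G
print(np.isclose(norm_K, norm_G))         # True: H_K sits isometrically inside H_G
```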
38 Theorem 11 If the input space X has an infinite cardinality, then every kernel on it has a nontrivial refinement kernel. [sent-296, score-0.328]
39 However, again by (3), for the function g ∈ CX defined by g(x) := K(x, x), x ∈ X, we observe for each n ∈ N that K(xn, xn) = |g(xn)| = |(g, K(·, xn))HK| ≤ ‖g‖HK ‖K(·, xn)‖HK = ‖g‖HK √K(xn, xn). [sent-304, score-0.32]
40 Proposition 12 If K is a strictly positive definite kernel on X and G is a refinement kernel for K, then G is also strictly positive definite. [sent-310, score-0.368]
41 Proof By Lemma 4, if G is a refinement kernel for K then G − K remains a kernel on X. [sent-311, score-0.368]
42 We call a kernel on X a continuous kernel if it is at the same time a continuous function on X × X. [sent-316, score-0.466]
43 Given a continuous kernel K on X, can we find a nontrivial refinement kernel G for K that is also continuous? [sent-317, score-0.561]
44 Theorem 13 If X is a metric space with an infinite cardinality, then every continuous kernel on X has a nontrivial continuous refinement kernel. [sent-320, score-0.426]
45 By the arguments used in the proof of Lemma 9, it suffices to prove that there is no continuous kernel K on X for which HK contains all the continuous functions on X. [sent-325, score-0.282]
46 We say that a function K : X × X → C is a universal kernel if it is a continuous kernel on X and for all compact subsets Z ⊆ X, span {K(·, x) : x ∈ Z } is dense in the Banach space C(Z ) of the continuous functions on Z . [sent-346, score-0.564]
47 Proposition 14 If K is a universal kernel on X, then any continuous refinement kernel for K is universal. [sent-350, score-0.417]
48 Proof Suppose that K is a universal kernel on X and G is a continuous refinement kernel for K. [sent-351, score-0.417]
49 By Theorems 11 and 13, it is reasonable to conjecture that there exist nontrivial refinement kernels for most kernels used in machine learning. [sent-362, score-0.55]
50 To verify this conjecture and present concrete examples, we shall discuss refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels in the next two sections. [sent-363, score-0.827]
51 Refinement of Translation Invariant Kernels In this section, we specify our input space as Rd , d ∈ N and investigate refinement kernels for translation invariant kernels on Rd . [sent-365, score-0.624]
52 In Section 5.2 we establish various characterizations of refinement for general translation invariant kernels. [sent-369, score-0.278]
53 Specifically, refinement of B-spline kernels, radial kernels and periodic kernels is studied in Sections 5. [sent-371, score-0.406]
54 5.1 Translation Invariant Kernels. A kernel K on Rd is said to be translation invariant if for all a ∈ Rd, K(x − a, y − a) = K(x, y), x, y ∈ Rd. [sent-378, score-0.402]
55 It can be seen from (3) and (39) that a translation invariant kernel K on Rd satisfies, for all a, b, x, y ∈ Rd, τa K(·, x) = K(· − a, x) = K(·, x + a) (40) and (τa K(·, y), τb K(·, x))HK = (K(·, y + a), K(·, x + b))HK = K(x + b, y + a). [sent-380, score-0.402]
56 It was established by Bochner (1959) that K is a continuous translation invariant kernel on Rd if and only if there exists a µ ∈ B(Rd) such that K(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ(ξ), x, y ∈ Rd, where (·, ·) denotes the standard inner product on Rd. [sent-382, score-0.451]
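As a concrete instance of Bochner's representation (a standard computation; the normalization Gσ(x, y) = e^{−|x−y|²/(2σ²)} is an assumption of this illustration, since the paper's definition of the Gaussian kernel is not reproduced in this summary):

\[
e^{-\frac{|x-y|^{2}}{2\sigma^{2}}}
  = \int_{\mathbb{R}^{d}} e^{i(x-y,\xi)}\, d\mu(\xi),
\qquad
d\mu(\xi) = \Big(\tfrac{\sigma^{2}}{2\pi}\Big)^{d/2} e^{-\frac{\sigma^{2}|\xi|^{2}}{2}}\, d\xi,
\]

so the Gaussian kernel corresponds to a measure that is absolutely continuous with respect to the Lebesgue measure and has an everywhere positive density.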
57 We present below a characterization of translation invariant kernels in terms of their RKHS. [sent-384, score-0.451]
58 Proposition 15 A kernel K on Rd is translation invariant if and only if for each a ∈ Rd , τa is an isomorphism on HK . [sent-386, score-0.433]
59 Proof Suppose that K is a translation invariant kernel on Rd . [sent-387, score-0.402]
60 5.2 Characterizations. Suppose that K is a continuous translation invariant kernel on Rd. [sent-407, score-0.451]
61 We are interested in constructing refinement kernels for K that are continuous and translation invariant as well. [sent-408, score-0.47]
62 Specifically, with a different measure µ′ ∈ B(Rd) we introduce a new kernel G(x, y) := ∫_{Rd} e^{i(x−y,ξ)} dµ′(ξ), x, y ∈ Rd, and characterize G being a refinement kernel for K in terms of a relation between µ and µ′. [sent-409, score-0.368]
63 The Lebesgue decomposition of measures with respect to the Lebesgue measure leads to a decomposition of the corresponding continuous translation invariant kernel. [sent-423, score-0.267]
64 Specifically, for a continuous translation invariant kernel K on Rd, we have the Lebesgue decomposition K = Kc + Ks, (42) where Kc(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµc(ξ), Ks(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµs(ξ), x, y ∈ Rd. [sent-424, score-0.451]
65 Likewise, for a continuous translation invariant kernel G on Rd, we also have its Lebesgue decomposition G = Gc + Gs, (43) where Gc(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ′c(ξ), Gs(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ′s(ξ), x, y ∈ Rd. [sent-425, score-0.451]
66 Lemma 17 If L is a continuous translation invariant kernel on Rd with a Lebesgue decomposition L = Lc + Ls, then HL is equal to the orthogonal direct sum of HLc and HLs, namely, HL = HLc ⊕ HLs. [sent-431, score-0.451]
67 Proposition 18 Suppose that K and G are continuous translation invariant kernels on Rd defined by (42) and (43). [sent-443, score-0.47]
68 By the assumption and Lemma 17, there exist two functions gc ∈ HGc and gs ∈ HGs such that f = gc + gs and ‖f‖²HK = ‖f‖²HG = ‖gc‖²HG + ‖gs‖²HG. [sent-455, score-0.561]
69 Theorem 20 Let K and G be continuous translation invariant kernels on Rd defined by (42) and (43). [sent-474, score-0.47]
70 The refinement kernel G is nontrivial for K if and only if (45) holds or µ′s(Rd) − µs(Rd) > 0. [sent-476, score-0.328]
71 Suppose that G is a continuous translation invariant kernel on Rd with a Lebesgue decomposition (43). [sent-481, score-0.451]
72 The kernel G is a nontrivial refinement kernel for K if and only if Gc = K and Gs = 0. [sent-483, score-0.512]
73 (48) In the next proposition, we show that K is a kernel on Rd and characterize refinement kernels for K. [sent-488, score-0.387]
74 Moreover, suppose that G is a continuous translation invariant kernel on Rd with a Lebesgue decomposition (43). [sent-492, score-0.488]
75 A particular example of (48) is the class of B-spline kernels (see, for example, Schölkopf and Smola, 2002, page 98, and the references therein), which are defined as in (48) with f0 being the characteristic function of a ball in Rd centered at the origin. [sent-504, score-0.267]
76 Theorem 23 Suppose that K is a nontrivial radial kernel defined by (49) and (50) with µ({0}) = 0. [sent-509, score-0.328]
77 Then a continuous translation invariant kernel G on Rd with a Lebesgue decomposition (43) is a refinement kernel for K if and only if Gc = K. [sent-510, score-0.635]
78 We remark that if µ({0}) > 0 then K is the sum of a constant kernel and a radial kernel satisfying the hypothesis of the last theorem. [sent-523, score-0.368]
79 Therefore, by Theorems 20 and 23, a continuous kernel G with a Lebesgue decomposition (43) is a refinement kernel for K if and only if Gc = K − µ({0}) and µ′s({0}) = µ({0}), since µ′s and µs must agree at the origin. [sent-525, score-0.417]
80 Corollary 24 A continuous translation invariant kernel G on Rd with a Lebesgue decomposition (43) is a refinement kernel for the Gaussian kernel Gσ if and only if Gc = Gσ. [sent-527, score-0.819]
81 It is a nontrivial refinement kernel for Gσ if and only if Gc = Gσ and Gs = 0. [sent-528, score-0.328]
82 The corollary above suggests that we may refine the Gaussian kernel Gσ by adding to it a kernel Gs defined by a singular measure on Rd. [sent-529, score-0.368]
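A numerical sketch of this suggestion (the particular singular measure, the point masses (ε/2)(δ_{ξ0} + δ_{−ξ0}) whose Bochner transform is Gs(x, y) = ε cos((x − y, ξ0)), is my choice of example and not necessarily the one used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma, eps = 3, 200, 1.0, 0.5
xi0 = rng.normal(size=d)
X = rng.normal(size=(m, d))

diff = X[:, None, :] - X[None, :, :]              # pairwise differences x - y
sq = np.sum(diff ** 2, axis=-1)
G_sigma = np.exp(-sq / (2 * sigma ** 2))          # Gaussian kernel G_sigma (assumed form)
G_s = eps * np.cos(diff @ xi0)                    # kernel of the singular (discrete) measure
G = G_sigma + G_s                                 # candidate refinement kernel

print(np.linalg.eigvalsh(G).min())                # >= -1e-10 up to round-off: G is a kernel
```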
83 By the last proposition, for c := [cn : n ∈ Zd] ∈ ℓ1(Zd) with cn ≥ 0 for each n ∈ Zd, we introduce the kernel fc(x − y) := ∑_{n∈Zd} cn e^{i(n,x−y)}, x, y ∈ Rd, (53) and set Ωc := {n ∈ Zd : cn > 0}. [sent-554, score-0.422]
84 Then fb is a nontrivial refinement kernel for fa if and only if Ωa is a proper subset of Ωb. Proof By Theorem 20, Hfa ⊆ Hfb if and only if a b. [sent-559, score-0.445]
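A minimal sketch (d = 1, with coefficient choices of my own) of such a pair: fb keeps the coefficients of fa on Ωa and adds strictly positive coefficients on new frequencies, so that Ωa is a proper subset of Ωb.

```python
import numpy as np

a = {0: 1.0, 1: 0.5, -1: 0.5}            # coefficients c_n of f_a on Omega_a
b = {**a, 2: 0.25, -2: 0.25}             # f_b agrees with f_a on Omega_a and adds frequencies

def f(coeffs, t):
    # f_c(t) = sum_{n in Omega_c} c_n e^{i n t}; real-valued here because the
    # coefficients are chosen symmetric in n
    return sum(c * np.exp(1j * n * t) for n, c in coeffs.items()).real

t = np.linspace(-np.pi, np.pi, 5)
print(f(a, t))
print(f(b, t))
```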
85 The translation invariant kernels Kc defined by (44) via a nonnegative function k ∈ L1 (Rd ) are of special interest. [sent-571, score-0.469]
86 We call them translation invariant kernels of continuous type. [sent-572, score-0.47]
87 For a continuous translation invariant kernel Kc defined by (44), we consider a refinement kernel Gc having the form Gc (x, y) = λKc (Dx, Dy), x, y ∈ Rd . [sent-580, score-0.635]
88 Then Gc is a nontrivial refinement kernel for Kc if and only if ∫_{Ωk \ (D^T)^{−1}Ωk} k(ξ) dξ > 0. [sent-585, score-0.328]
89 We then identify the nonnegative function g ∈ L1(Rd) in the definition (44) of the translation invariant kernel Gc of continuous type as follows: g(ξ) = λ k((D^T)^{−1} ξ), ξ ∈ Rd. [sent-587, score-0.499]
90 The Mercer theorem (see, for example, Cucker and Smale, 2002; Hochstadt, 1973; Mercer, 1909; Sun, 2005) in the theory of reproducing kernels indicates that (59) represents a large class of kernels. [sent-604, score-0.285]
91 Moreover, if a b then Kb is a nontrivial refinement kernel for Ka if and only if supp a is a proper subset of supp b. [sent-612, score-0.43]
92 We introduce three nonnegative functions ã, b̃, c̃ in ℓ1(N) by setting, for n ∈ N, ãn := an/(n^2 cn) if n ∈ supp a and 0 otherwise, b̃n := bn/(n^2 cn) if n ∈ supp b and 0 otherwise, and c̃n := 1/n^2 if n ∈ supp c and 0 otherwise. [sent-614, score-0.352]
93 For two nonnegative functions a, b defined on N satisfying max{lim sup_{n→∞} (an)^{1/n}, lim sup_{n→∞} (bn)^{1/n}} ≤ 1/R, (60) we define the kernels Ka(ξ, η) := ∑_{n∈N} an ξ^{n−1} η^{n−1}, Kb(ξ, η) := ∑_{n∈N} bn ξ^{n−1} η^{n−1}, ξ, η ∈ X. [sent-626, score-0.333]
94 Classical kernels such as the Bergman kernels and the Szegő kernels (see, for example, Saitoh, 1988) have the above form. [sent-627, score-0.609]
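For instance (a standard identity, stated here for real arguments with R = 1, both assumptions of this illustration), the choice a_n ≡ 1 gives a Szegő-type kernel and a_n = n a Bergman-type kernel:

\[
\sum_{n\in\mathbb{N}} \xi^{\,n-1}\eta^{\,n-1} = \frac{1}{1-\xi\eta},
\qquad
\sum_{n\in\mathbb{N}} n\,\xi^{\,n-1}\eta^{\,n-1} = \frac{1}{(1-\xi\eta)^{2}},
\qquad |\xi\eta| < 1.
\]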
95 For a nonnegative function h defined on N satisfying the condition ∑_{n∈N} hn P^d_{n−1}(1) < +∞, (62) we introduce a Schoenberg kernel on Sd by setting Sh(x, y) := ∑_{n∈N} hn P^d_{n−1}((x, y)), x, y ∈ Sd, (63) where (·, ·) denotes the inner product on Rd+1. [sent-638, score-0.358]
96 If a b then Sb is a nontrivial refinement kernel for Sa if and only if supp a is a proper subset of supp b. [sent-641, score-0.43]
97 Proof We shall write the kernels Sa , Sb in the form of Hilbert-Schmidt kernels and then apply Theorem 29. [sent-642, score-0.406]
98 For each n ∈ Z+, there exists a positive constant cn such that P^d_n((x, y)) = cn ∑_{j∈N^d_n} Y^n_j(x) Y^n_j(y), x, y ∈ Sd. [sent-648, score-0.264]
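A numerical illustration on S² ⊂ R³ (so that, up to normalization, the P^d_n may be taken to be the Legendre polynomials; the coefficient values below are mine): adding positive coefficients on new degrees, so that supp a is a proper subset of supp b, makes Sb − Sa itself a kernel on the sphere, which the code checks on random points.

```python
import numpy as np
from numpy.polynomial import Legendre

def schoenberg(h, X):
    # S_h(x, y) = sum_n h_n P_{n-1}((x, y)) evaluated on all pairs of rows of X
    t = X @ X.T                                   # inner products (x, y) in [-1, 1]
    return sum(hn * Legendre.basis(n)(t) for n, hn in enumerate(h))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # random points on the sphere S^2
h_a = [1.0, 0.5, 0.25]                            # supp a = {1, 2, 3}
h_b = h_a + [0.1, 0.05]                           # supp a is a proper subset of supp b
diff = schoenberg(h_b, X) - schoenberg(h_a, X)    # = 0.1 P_3 + 0.05 P_4 at the (x, y)
print(np.linalg.eigvalsh(diff).min())             # >= -1e-10: S_b - S_a is a kernel
```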
99 The kernels we consider are GA(x, y) := ∑_{j,k∈Nn} Ajk φk(x) φj(y), GB(x, y) := ∑_{j,k∈Nm} Bjk φk(x) φj(y), x, y ∈ X. [sent-687, score-0.253]
100 In both cases, if HGA ⊆ HGB then GB is a nontrivial refinement kernel for GA if and only if m > n. [sent-694, score-0.328]
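A sketch of the finite-feature situation (the block-diagonal extension of A to B and the choice φ_k(x) = x^k are assumptions made for illustration; the check below only verifies the necessary condition of Lemma 4 that L = GB − GA is itself a kernel):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 5, 100
A_half = rng.normal(size=(n, n))
A = A_half @ A_half.T                              # n x n PSD matrix
C_half = rng.normal(size=(m - n, m - n))
C = C_half @ C_half.T
B = np.zeros((m, m))
B[:n, :n] = A
B[n:, n:] = C                                      # m x m PSD matrix extending A

x = rng.uniform(-1, 1, size=N)
Phi = np.vstack([x ** k for k in range(m)]).T      # features phi_k(x) = x^k

G_A = Phi[:, :n] @ A @ Phi[:, :n].T                # Gram matrix of G_A
G_B = Phi @ B @ Phi.T                              # Gram matrix of G_B
print(np.linalg.eigvalsh(G_B - G_A).min())         # >= -1e-10: G_B - G_A is a kernel
```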
wordName wordTfidf (topN-words)
[('nement', 0.409), ('hk', 0.402), ('rd', 0.365), ('kernels', 0.203), ('kernel', 0.184), ('hg', 0.175), ('nontrivial', 0.144), ('zd', 0.143), ('borel', 0.143), ('hgc', 0.141), ('re', 0.134), ('gc', 0.133), ('translation', 0.128), ('efinement', 0.125), ('hkc', 0.117), ('eproducing', 0.106), ('au', 0.095), ('nable', 0.093), ('invariant', 0.09), ('ernels', 0.088), ('hang', 0.087), ('micchelli', 0.086), ('lebesgue', 0.082), ('multiplications', 0.081), ('xn', 0.08), ('hka', 0.078), ('hkb', 0.078), ('ei', 0.072), ('im', 0.071), ('xu', 0.071), ('kc', 0.071), ('nn', 0.069), ('cn', 0.069), ('span', 0.068), ('cx', 0.064), ('nm', 0.064), ('hgs', 0.063), ('hn', 0.063), ('jn', 0.063), ('characterizations', 0.06), ('rkhs', 0.057), ('gs', 0.054), ('bv', 0.054), ('nq', 0.053), ('supp', 0.051), ('continuous', 0.049), ('nonnegative', 0.048), ('ka', 0.048), ('hilbert', 0.047), ('hks', 0.047), ('everywhere', 0.047), ('reproducing', 0.044), ('fourier', 0.043), ('bn', 0.041), ('olkopf', 0.04), ('bochner', 0.04), ('ker', 0.039), ('wavelets', 0.039), ('cholesky', 0.039), ('sd', 0.039), ('hl', 0.038), ('theorem', 0.038), ('suppose', 0.037), ('proposition', 0.036), ('schoenberg', 0.036), ('kb', 0.036), ('operator', 0.034), ('fs', 0.034), ('aronszajn', 0.033), ('hlc', 0.031), ('hls', 0.031), ('kerc', 0.031), ('ndn', 0.031), ('walder', 0.031), ('fn', 0.031), ('fc', 0.031), ('isomorphism', 0.031), ('characterization', 0.03), ('dense', 0.03), ('fb', 0.03), ('predictor', 0.029), ('minimizer', 0.028), ('absolutely', 0.027), ('zhang', 0.027), ('conversely', 0.027), ('adjoint', 0.027), ('sb', 0.027), ('fa', 0.027), ('dt', 0.026), ('ran', 0.026), ('isometric', 0.025), ('nite', 0.025), ('jk', 0.025), ('subspace', 0.025), ('factorization', 0.024), ('jersey', 0.024), ('page', 0.024), ('hgb', 0.023), ('tietze', 0.023), ('ct', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.19725911 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
3 0.14002332 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
Author: Christian Rieger, Barbara Zwicknagl
Abstract: We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of ε- and ν-SVRs. Using these, we show how to adjust regularization parameters to get best possible approximation orders for regression. The results are illustrated by some numerical examples. Keywords: sampling inequality, radial basis functions, approximation theory, reproducing kernel Hilbert space, Sobolev space
4 0.12698738 68 jmlr-2009-Online Learning with Samples Drawn from Non-identical Distributions
Author: Ting Hu, Ding-Xuan Zhou
Abstract: Learning algorithms are based on samples which are often drawn independently from an identical distribution (i.i.d.). In this paper we consider a different setting with samples drawn according to a non-identical sequence of probability distributions. Each time a sample is drawn from a different distribution. In this setting we investigate a fully online learning algorithm associated with a general convex loss function and a reproducing kernel Hilbert space (RKHS). Error analysis is conducted under the assumption that the sequence of marginal distributions converges polynomially in the dual of a Hölder space. For regression with least square or insensitive loss, learning rates are given in both the RKHS norm and the L2 norm. For classification with hinge loss and support vector machine q-norm loss, rates are explicitly stated with respect to the excess misclassification error. Keywords: sampling with non-identical distributions, online learning, classification with a general convex loss, regression with insensitive loss and least square loss, reproducing kernel Hilbert space
5 0.12643465 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
6 0.094309926 38 jmlr-2009-Hash Kernels for Structured Data
7 0.089676306 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
8 0.074651837 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
9 0.056242328 16 jmlr-2009-Classification with Gaussians and Convex Loss
10 0.051507924 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
11 0.048894048 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
13 0.042702537 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models
14 0.041741427 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
15 0.040104471 90 jmlr-2009-Structure Spaces
16 0.039396778 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
17 0.039032008 23 jmlr-2009-Discriminative Learning Under Covariate Shift
18 0.037911542 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
19 0.037865996 67 jmlr-2009-Online Learning with Sample Path Constraints
20 0.036464915 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
topicId topicWeight
[(0, 0.213), (1, 0.039), (2, 0.1), (3, -0.117), (4, -0.347), (5, 0.171), (6, 0.156), (7, -0.267), (8, 0.024), (9, 0.069), (10, -0.02), (11, 0.055), (12, 0.216), (13, 0.004), (14, 0.029), (15, 0.043), (16, 0.003), (17, -0.047), (18, -0.092), (19, 0.023), (20, -0.04), (21, -0.137), (22, 0.119), (23, 0.023), (24, -0.008), (25, 0.068), (26, 0.015), (27, -0.074), (28, -0.022), (29, -0.15), (30, 0.096), (31, 0.034), (32, 0.045), (33, 0.04), (34, -0.031), (35, -0.07), (36, -0.044), (37, -0.023), (38, -0.068), (39, 0.019), (40, 0.056), (41, 0.033), (42, 0.015), (43, -0.063), (44, 0.077), (45, 0.021), (46, 0.0), (47, -0.044), (48, 0.074), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.98222518 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.80444533 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
3 0.67986763 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
Author: Christian Rieger, Barbara Zwicknagl
Abstract: We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of ε- and ν-SVRs. Using these, we show how to adjust regularization parameters to get best possible approximation orders for regression. The results are illustrated by some numerical examples. Keywords: sampling inequality, radial basis functions, approximation theory, reproducing kernel Hilbert space, Sobolev space
4 0.51951015 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
5 0.46188873 38 jmlr-2009-Hash Kernels for Structured Data
Author: Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, S.V.N. Vishwanathan
Abstract: We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs. Keywords: hashing, stream, string kernel, graphlet kernel, multiclass classification
6 0.4524526 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
8 0.3113977 68 jmlr-2009-Online Learning with Samples Drawn from Non-identical Distributions
10 0.25657907 16 jmlr-2009-Classification with Gaussians and Convex Loss
11 0.25600478 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
12 0.211943 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
13 0.21052612 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
14 0.18800877 67 jmlr-2009-Online Learning with Sample Path Constraints
15 0.18587606 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
17 0.18026838 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
18 0.17773038 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models
19 0.16714747 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
20 0.1650144 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences
topicId topicWeight
[(7, 0.03), (11, 0.065), (19, 0.028), (21, 0.037), (38, 0.016), (47, 0.023), (49, 0.313), (52, 0.043), (55, 0.049), (58, 0.038), (66, 0.129), (81, 0.032), (90, 0.062), (96, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.74849039 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.47424954 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
3 0.47165072 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
4 0.46911481 13 jmlr-2009-Bounded Kernel-Based Online Learning
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA2 . Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
6 0.45245907 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
7 0.45203203 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
8 0.44320273 9 jmlr-2009-Analysis of Perceptron-Based Active Learning
9 0.44198596 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
10 0.43630943 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
11 0.43600255 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
12 0.43330184 18 jmlr-2009-Consistency and Localizability
13 0.43274266 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
14 0.43201488 52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
15 0.43172053 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
16 0.4311488 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
17 0.43099192 55 jmlr-2009-Maximum Entropy Discrimination Markov Networks
18 0.42970794 14 jmlr-2009-CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning
19 0.42965665 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
20 0.42942446 37 jmlr-2009-Generalization Bounds for Ranking Algorithms via Algorithmic Stability