jmlr jmlr2009 jmlr2009-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
Reference: text
sentIndex sentText sentNum sentScore
1 It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. [sent-6, score-0.512]
2 Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. [sent-7, score-0.827]
3 Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels 1. [sent-9, score-0.693]
4 Introduction In our recent work (Xu and Zhang, 2007), we studied characterizations of a refinable kernel which offers a convenient way of enlarging its reproducing kernel Hilbert space (RKHS). [sent-10, score-0.472]
5 Neither the Gaussian kernels nor kernels having finite dimensional feature spaces are refinable. [sent-15, score-0.406]
6 There is a bijective correspondence between the set of kernels on X and that of reproducing kernel Hilbert spaces (RKHS) on X. [sent-29, score-0.431]
7 For an existing kernel K, we call a kernel G satisfying (6) a refinement kernel for K. [sent-36, score-0.552]
8 If, in addition, HG contains HK as a proper subspace, then we call G a nontrivial refinement kernel for K. [sent-37, score-0.328]
9 There are two possible situations where one may desire to find a nontrivial refinement kernel G for K in (7). [sent-49, score-0.328]
10 In a recent paper (Xu and Zhang, 2007), we introduced a method of updating kernels via a composition of the kernel with a bijective mapping γ of the input space. [sent-57, score-0.387]
11 Specifically, with a selected positive constant λ, we define for a kernel K on X a new kernel G(x, y) := λK(γ(x), γ(y)), x, y ∈ X (8) and call K a γ-refinable kernel if (8) gives a nontrivial refinement kernel G for K. [sent-58, score-0.88]
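As a small illustration of (8) in code (a generic Python helper only; whether the resulting G is in fact a nontrivial refinement kernel for K is precisely what the characterizations discussed below decide):

```python
def gamma_refine(K, gamma, lam):
    """Form the candidate kernel G(x, y) = lam * K(gamma(x), gamma(y)) of (8).

    K is a kernel function of two arguments, gamma a bijective mapping of the
    input space, and lam a positive constant; this helper only builds G, it
    does not test whether G actually refines K.
    """
    return lambda x, y: lam * K(gamma(x), gamma(y))
```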
12 Various characterizations and many examples of refinable kernels were provided in Xu and Zhang (2007). [sent-59, score-0.263]
13 On the other hand, we know that kernels with a finite dimensional feature space, such as finite dot-product kernels (FitzGerald et al. [sent-64, score-0.406]
14 We would like to find nontrivial refinement kernels for them by considering general methods of updating kernels besides the particular one (8). [sent-67, score-0.55]
15 In particular, it will be shown that a nontrivial refinement kernel always exists if the input space contains infinitely many elements. [sent-73, score-0.328]
16 In Sections 5 and 6, we study refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels, respectively. [sent-74, score-0.624]
17 This is a big saving in comparison to the O(m^3) multiplications required if the linear system (12) is solved directly by the Cholesky factorization without using the refinement kernel method. [sent-130, score-0.289]
18 In other words, the use of the refinement kernel method reduces the number of multiplications from O(m^3) to O(m^2). [sent-131, score-0.265]
19 The refinement kernel learning method allows us to reduce the number of multiplications from O(m^3) to O(m^2) by using the known Cholesky factorization of the matrix corresponding to the old kernel K. [sent-169, score-0.473]
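The update scheme itself is the one the paper builds around the linear system (12) and is not reproduced in this summary. The sketch below (NumPy/SciPy, with variable names that are my own) only shows the generic mechanism behind the O(m^2) claim: once a Cholesky factor of the old regularized kernel matrix is available, a symmetric rank-one change can be absorbed in O(m^2) operations and the new system solved by two triangular solves, instead of refactorizing from scratch in O(m^3).

```python
import numpy as np
from scipy.linalg import solve_triangular

def chol_rank_one_update(L, v):
    """Given lower-triangular L with A = L @ L.T, return the Cholesky factor
    of A + v v^T in O(m^2) operations (classical LINPACK-style update)."""
    L = L.copy()
    v = v.astype(float).copy()
    m = v.size
    for k in range(m):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        if k + 1 < m:
            L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
            v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

# Illustrative use, assuming the refined system matrix differs from the old
# one by a rank-one term u u^T (an assumption of this sketch, not the paper's
# equations): reuse the old factor instead of refactorizing.
rng = np.random.default_rng(0)
m, lam = 50, 0.1
x = rng.normal(size=(m, 1))
K = np.exp(-(x - x.T) ** 2)                       # kernel matrix K[x]
L = np.linalg.cholesky(K + m * lam * np.eye(m))   # old factorization, O(m^3) once
u = rng.normal(size=m)
L_new = chol_rank_one_update(L, u)                # O(m^2) update
y = rng.normal(size=m)
c = solve_triangular(L_new.T, solve_triangular(L_new, y, lower=True), lower=False)
```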
20 Then G is a refinement kernel for K if and only if L := G − K is a kernel on X and HK ∩ HL = {0}. [sent-178, score-0.368]
21 If G is a refinement kernel for K then HL is the orthogonal complement of HK in HG , and G is a nontrivial refinement kernel for K if and only if L is not the zero kernel. [sent-179, score-0.512]
22 Aiming at a characterization convenient for use, we next characterize refinement kernels in terms of feature maps for kernels, since most kernels are identified with their feature maps. [sent-182, score-0.436]
23 Theorem 6 Suppose that K is a kernel on X with a feature map Φ : X → W satisfying (27) and G is a kernel on X with a feature map Φ′ : X → W′ that satisfies span Φ′(X) = W′. [sent-195, score-0.436]
24 Then G is a refinement kernel for K if and only if there exists a bounded linear operator T : W′ → W such that T Φ′(x) = Φ(x), x ∈ X, (28) and the adjoint operator T∗ : W → W′ of T is isometric. [sent-196, score-0.279]
25 Moreover, G is a nontrivial refinement kernel for K if and only if T in (28) is not injective. [sent-197, score-0.328]
26 By Theorem 6, G is a nontrivial refinement kernel for K if and only if there exists a bounded linear operator T : W → W such that λ^{1/2} T Φ(γ(x)) = Φ(x), x ∈ X, its adjoint T∗ is isometric from W to W, and T is not injective. [sent-203, score-0.414]
27 Therefore, we cannot get a nontrivial refinement kernel by (8) if W is of finite dimension: when W is finite dimensional, an isometric adjoint T∗ : W → W is bijective, which forces T to be injective. [sent-206, score-0.328]
28 Examples of nontrivial refinement kernels for kernels with a finite dimensional feature space will be provided in Section 6. [sent-209, score-0.55]
29 This result is crucial for our discussion later on translation invariant kernels and Hilbert-Schmidt kernels. [sent-211, score-0.421]
30 If µ ν then Kν is a nontrivial refinement kernel for Kµ if and only if ν(Y) − µ(Y) > 0. [sent-245, score-0.328]
31 If µ ν then Kν is a nontrivial refinement kernel for Kµ if and only if the operator T is not injective, that is, there exists f ∈ L2(Y, ν) such that ‖f‖L2(Y,ν) > 0 but ‖T f‖L2(Y,µ) = 0. [sent-269, score-0.362]
32 Existence of Refinement Kernels With characterizations in Lemma 4 and Theorem 6, we shall consider existence of nontrivial refinement kernels and properties of kernels preserved by the refinement process. [sent-274, score-0.61]
33 Lemma 9 A kernel K on X does not have a nontrivial refinement kernel if and only if HK = CX. [sent-276, score-0.512]
34 Proof If HK = CX then, since HG ⊆ CX for all kernels G on X, K does not have a nontrivial refinement kernel. [sent-277, score-0.347]
35 By Lemma 4, G is a nontrivial refinement kernel for K. [sent-281, score-0.328]
36 According to Lemma 9, one may expect that every kernel has a nontrivial refinement kernel since in general it should be impossible to impose an inner product on CX so that it becomes an RKHS. [sent-282, score-0.512]
37 Proposition 10 If the input space X has a finite cardinality, then a kernel K on X has a nontrivial refinement kernel if and only if K[X] is singular. [sent-284, score-0.512]
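A numerical illustration of Proposition 10 (using the standard identification of kernels on a finite input space with positive semidefinite matrices; the construction and variable names are mine, not the paper's): when K[X] is singular, adding the rank-one kernel built from a null vector of K[X] enlarges the RKHS while leaving the norms of the old functions unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
N, r = 6, 3
F = rng.normal(size=(N, r))
K = F @ F.T                               # singular PSD Gram matrix K[X] (rank r < N)
_, _, Vt = np.linalg.svd(K)
v = Vt[-1]                                # a unit vector in the null space of K[X]
G = K + np.outer(v, v)                    # candidate nontrivial refinement kernel

c = rng.normal(size=N)
f = K @ c                                 # a generic element of H_K
norm_K = c @ K @ c                        # ||f||^2 in H_K
norm_G = f @ np.linalg.pinv(G) @ f        # ||f||^2 in H_G
print(np.isclose(norm_K, norm_G))         # True: H_K sits isometrically inside H_G
```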
38 Theorem 11 If the input space X has an infinite cardinality, then every kernel on it has a nontrivial refinement kernel. [sent-296, score-0.328]
39 However, again by (3), for the function g ∈ CX defined by g(x) := K(x, x), x ∈ X, we observe for each n ∈ N that K(xn, xn) = |g(xn)| = |(g, K(·, xn))HK| ≤ ‖g‖HK ‖K(·, xn)‖HK = ‖g‖HK √K(xn, xn). [sent-304, score-0.32]
40 Proposition 12 If K is a strictly positive definite kernel on X and G is a refinement kernel for K, then G is also strictly positive definite. [sent-310, score-0.368]
41 Proof By Lemma 4, if G is a refinement kernel for K then G − K remains a kernel on X. [sent-311, score-0.368]
42 We call a kernel on X a continuous kernel if it is at the same time a continuous function on X × X. [sent-316, score-0.466]
43 Given a continuous kernel K on X, can we find a nontrivial refinement kernel G for K that is also continuous? [sent-317, score-0.561]
44 Theorem 13 If X is a metric space with an infinite cardinality, then every continuous kernel on X has a nontrivial continuous refinement kernel. [sent-320, score-0.426]
45 By the arguments used in the proof of Lemma 9, it suffices to prove that there is no continuous kernel K on X for which HK contains all the continuous functions on X. [sent-325, score-0.282]
46 We say that a function K : X × X → C is a universal kernel if it is a continuous kernel on X and for all compact subsets Z ⊆ X, span {K(·, x) : x ∈ Z } is dense in the Banach space C(Z ) of the continuous functions on Z . [sent-346, score-0.564]
47 Proposition 14 If K is a universal kernel on X, then any continuous refinement kernel for K is universal. [sent-350, score-0.417]
48 Proof Suppose that K is a universal kernel on X and G is a continuous refinement kernel for K. [sent-351, score-0.417]
49 By Theorems 11 and 13, it is reasonable to conjecture that there exist nontrivial refinement kernels for most kernels used in machine learning. [sent-362, score-0.55]
50 To verify this conjecture and present concrete examples, we shall discuss refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels in the next two sections. [sent-363, score-0.827]
51 Refinement of Translation Invariant Kernels In this section, we specify our input space as Rd , d ∈ N and investigate refinement kernels for translation invariant kernels on Rd . [sent-365, score-0.624]
52 In Section 5.2 we establish various characterizations of refinement for general translation invariant kernels. [sent-369, score-0.278]
53 Specifically, refinement of B-spline kernels, radial kernels and periodic kernels is studied in Sections 5. [sent-371, score-0.406]
54 5.1 Translation Invariant Kernels. A kernel K on Rd is said to be translation invariant if for all a ∈ Rd, K(x − a, y − a) = K(x, y), x, y ∈ Rd. [sent-378, score-0.402]
55 It can be seen from (3) and (39) that a translation invariant kernel K on Rd satisfies, for all a, b, x, y ∈ Rd, τa K(·, x) = K(· − a, x) = K(·, x + a) (40) and (τa K(·, y), τb K(·, x))HK = (K(·, y + a), K(·, x + b))HK = K(x + b, y + a). [sent-380, score-0.402]
56 It was established by Bochner (1959) that K is a continuous translation invariant kernel on Rd if and only if there exists a µ ∈ B(Rd) such that K(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ(ξ), x, y ∈ Rd, where (·, ·) denotes the standard inner product on Rd. [sent-382, score-0.451]
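As a concrete instance of Bochner's representation (a standard computation; the normalization Gσ(x, y) = e^{−|x−y|²/(2σ²)} is an assumption of this illustration, since the paper's definition of the Gaussian kernel is not reproduced in this summary):

\[
e^{-\frac{|x-y|^{2}}{2\sigma^{2}}}
  = \int_{\mathbb{R}^{d}} e^{i(x-y,\xi)}\, d\mu(\xi),
\qquad
d\mu(\xi) = \Big(\tfrac{\sigma^{2}}{2\pi}\Big)^{d/2} e^{-\frac{\sigma^{2}|\xi|^{2}}{2}}\, d\xi,
\]

so the Gaussian kernel corresponds to a measure that is absolutely continuous with respect to the Lebesgue measure and has an everywhere positive density.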
57 We present below a characterization of translation invariant kernels in terms of their RKHS. [sent-384, score-0.451]
58 Proposition 15 A kernel K on Rd is translation invariant if and only if for each a ∈ Rd , τa is an isomorphism on HK . [sent-386, score-0.433]
59 Proof Suppose that K is a translation invariant kernel on Rd . [sent-387, score-0.402]
60 5.2 Characterizations. Suppose that K is a continuous translation invariant kernel on Rd. [sent-407, score-0.451]
61 We are interested in constructing refinement kernels for K that are continuous and translation invariant as well. [sent-408, score-0.47]
62 Specifically, with a different measure µ′ ∈ B(Rd) we introduce a new kernel G(x, y) := ∫_{Rd} e^{i(x−y,ξ)} dµ′(ξ), x, y ∈ Rd, and characterize G being a refinement kernel for K in terms of a relation between µ and µ′. [sent-409, score-0.368]
63 The Lebesgue decomposition of measures with respect to the Lebesgue measure leads to a decomposition of the corresponding continuous translation invariant kernel. [sent-423, score-0.267]
64 Specifically, for a continuous translation invariant kernel K on Rd, we have the Lebesgue decomposition K = Kc + Ks, (42) where Kc(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµc(ξ), Ks(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµs(ξ), x, y ∈ Rd. [sent-424, score-0.451]
65 Likewise, for a continuous translation invariant kernel G on Rd, we also have its Lebesgue decomposition G = Gc + Gs, (43) where Gc(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ′c(ξ), Gs(x, y) = ∫_{Rd} e^{i(x−y,ξ)} dµ′s(ξ), x, y ∈ Rd. [sent-425, score-0.451]
66 Lemma 17 If L is a continuous translation invariant kernel on Rd with a Lebesgue decomposition L = Lc + Ls, then HL is equal to the orthogonal direct sum of HLc and HLs, namely, HL = HLc ⊕ HLs. [sent-431, score-0.451]
67 Proposition 18 Suppose that K and G are continuous translation invariant kernels on Rd defined by (42) and (43). [sent-443, score-0.47]
68 By the assumption and Lemma 17, there exist two functions gc ∈ HGc and gs ∈ HGs such that f = gc + gs and ‖f‖²HK = ‖f‖²HG = ‖gc‖²HG + ‖gs‖²HG. [sent-455, score-0.561]
69 Theorem 20 Let K and G be continuous translation invariant kernels on Rd defined by (42) and (43). [sent-474, score-0.47]
70 The refinement kernel G is nontrivial for K if and only if (45) holds or µ′s(Rd) − µs(Rd) > 0. [sent-476, score-0.328]
71 Suppose that G is a continuous translation invariant kernel on Rd with a Lebesgue decomposition (43). [sent-481, score-0.451]
72 The kernel G is a nontrivial refinement kernel for K if and only if Gc = K and Gs = 0. [sent-483, score-0.512]
73 (48) In the next proposition, we show that K is a kernel on Rd and characterize refinement kernels for K. [sent-488, score-0.387]
74 Moreover, suppose that G is a continuous translation invariant kernel on Rd with a Lebesgue decomposition (43). [sent-492, score-0.488]
75 A particular example of (48) is the class of B-spline kernels (see, for example, Schölkopf and Smola, 2002, page 98, and the references therein), which are defined as in (48) with f0 being the characteristic function of a ball in Rd centered at the origin. [sent-504, score-0.267]
76 Theorem 23 Suppose that K is a nontrivial radial kernel defined by (49) and (50) with µ({0}) = 0. [sent-509, score-0.328]
77 Then a continuous translation invariant kernel G on Rd with a Lebesgue decomposition (43) is a refinement kernel for K if and only if Gc = K. [sent-510, score-0.635]
78 We remark that if µ({0}) > 0 then K is the sum of a constant kernel and a radial kernel satisfying the hypothesis of the last theorem. [sent-523, score-0.368]
79 Therefore, by Theorems 20 and 23, a continuous kernel G with a Lebesgue decomposition (43) is a refinement kernel for K if and only if Gc = K − µ({0}) and µ′s({0}) = µ({0}), since µ′s and µs must agree at the origin. [sent-525, score-0.417]
80 Corollary 24 A continuous translation invariant kernel G on Rd with a Lebesgue decomposition (43) is a refinement kernel for the Gaussian kernel Gσ if and only if Gc = Gσ. [sent-527, score-0.819]
81 It is a nontrivial refinement kernel for Gσ if and only if Gc = Gσ and Gs = 0. [sent-528, score-0.328]
82 The corollary above suggests that we may refine the Gaussian kernel Gσ by adding to it a kernel Gs defined by a singular measure on Rd. [sent-529, score-0.368]
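A numerical sketch of this suggestion (the particular singular measure, the point masses (ε/2)(δ_{ξ0} + δ_{−ξ0}) whose Bochner transform is Gs(x, y) = ε cos((x − y, ξ0)), is my choice of example and not necessarily the one used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma, eps = 3, 200, 1.0, 0.5
xi0 = rng.normal(size=d)
X = rng.normal(size=(m, d))

diff = X[:, None, :] - X[None, :, :]              # pairwise differences x - y
sq = np.sum(diff ** 2, axis=-1)
G_sigma = np.exp(-sq / (2 * sigma ** 2))          # Gaussian kernel G_sigma (assumed form)
G_s = eps * np.cos(diff @ xi0)                    # kernel of the singular (discrete) measure
G = G_sigma + G_s                                 # candidate refinement kernel

print(np.linalg.eigvalsh(G).min())                # >= -1e-10 up to round-off: G is a kernel
```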
83 By the last proposition, for c := [cn : n ∈ Zd] ∈ ℓ1(Zd) with cn ≥ 0 for each n ∈ Zd, we introduce the kernel fc(x − y) := ∑_{n∈Zd} cn e^{i(n,x−y)}, x, y ∈ Rd, (53) and set Ωc := {n ∈ Zd : cn > 0}. [sent-554, score-0.422]
84 Then fb is a nontrivial refinement kernel for fa if and only if Ωa is a proper subset of Ωb. Proof By Theorem 20, Hfa ⊆ Hfb if and only if a b. [sent-559, score-0.445]
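A minimal sketch (d = 1, with coefficient choices of my own) of such a pair: fb keeps the coefficients of fa on Ωa and adds strictly positive coefficients on new frequencies, so that Ωa is a proper subset of Ωb.

```python
import numpy as np

a = {0: 1.0, 1: 0.5, -1: 0.5}            # coefficients c_n of f_a on Omega_a
b = {**a, 2: 0.25, -2: 0.25}             # f_b agrees with f_a on Omega_a and adds frequencies

def f(coeffs, t):
    # f_c(t) = sum_{n in Omega_c} c_n e^{i n t}; real-valued here because the
    # coefficients are chosen symmetric in n
    return sum(c * np.exp(1j * n * t) for n, c in coeffs.items()).real

t = np.linspace(-np.pi, np.pi, 5)
print(f(a, t))
print(f(b, t))
```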
85 The translation invariant kernels Kc defined by (44) via a nonnegative function k ∈ L1 (Rd ) are of special interest. [sent-571, score-0.469]
86 We call them translation invariant kernels of continuous type. [sent-572, score-0.47]
87 For a continuous translation invariant kernel Kc defined by (44), we consider a refinement kernel Gc having the form Gc (x, y) = λKc (Dx, Dy), x, y ∈ Rd . [sent-580, score-0.635]
88 Then Gc is a nontrivial refinement kernel for Kc if and only if ∫_{Ωk \ (D^T)^{−1}Ωk} k(ξ) dξ > 0. [sent-585, score-0.328]
89 We then identify the nonnegative function g ∈ L1(Rd) in the definition (44) of the translation invariant kernel Gc of continuous type as follows: g(ξ) = λ k((D^T)^{−1} ξ), ξ ∈ Rd. [sent-587, score-0.499]
90 The Mercer theorem (see, for example, Cucker and Smale, 2002; Hochstadt, 1973; Mercer, 1909; Sun, 2005) in the theory of reproducing kernels indicates that (59) represents a large class of kernels. [sent-604, score-0.285]
91 Moreover, if a b then Kb is a nontrivial refinement kernel for Ka if and only if supp a is a proper subset of supp b. [sent-612, score-0.43]
92 We introduce three nonnegative functions ã, b̃, c̃ in ℓ1(N) by setting, for n ∈ N, ãn := an/(n^2 cn) if n ∈ supp a and 0 otherwise, b̃n := bn/(n^2 cn) if n ∈ supp b and 0 otherwise, and c̃n := 1/n^2 if n ∈ supp c and 0 otherwise. [sent-614, score-0.352]
93 For two nonnegative functions a, b defined on N satisfying max{lim sup_{n→∞} (an)^{1/n}, lim sup_{n→∞} (bn)^{1/n}} ≤ 1/R, (60) we define the kernels Ka(ξ, η) := ∑_{n∈N} an ξ^{n−1} η^{n−1}, Kb(ξ, η) := ∑_{n∈N} bn ξ^{n−1} η^{n−1}, ξ, η ∈ X. [sent-626, score-0.333]
94 Classical kernels such as the Bergman kernels and the Szegő kernels (see, for example, Saitoh, 1988) have the above form. [sent-627, score-0.609]
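For instance (a standard identity, stated here for real arguments with R = 1, both assumptions of this illustration), the choice a_n ≡ 1 gives a Szegő-type kernel and a_n = n a Bergman-type kernel:

\[
\sum_{n\in\mathbb{N}} \xi^{\,n-1}\eta^{\,n-1} = \frac{1}{1-\xi\eta},
\qquad
\sum_{n\in\mathbb{N}} n\,\xi^{\,n-1}\eta^{\,n-1} = \frac{1}{(1-\xi\eta)^{2}},
\qquad |\xi\eta| < 1.
\]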
95 For a nonnegative function h defined on N satisfying the condition ∑_{n∈N} hn P^d_{n−1}(1) < +∞, (62) we introduce a Schoenberg kernel on Sd by setting Sh(x, y) := ∑_{n∈N} hn P^d_{n−1}((x, y)), x, y ∈ Sd, (63) where (·, ·) denotes the inner product on Rd+1. [sent-638, score-0.358]
96 If a b then Sb is a nontrivial refinement kernel for Sa if and only if supp a is a proper subset of supp b. [sent-641, score-0.43]
97 Proof We shall write the kernels Sa , Sb in the form of Hilbert-Schmidt kernels and then apply Theorem 29. [sent-642, score-0.406]
98 For each n ∈ Z+, there exists a positive constant cn such that P^d_n((x, y)) = cn ∑_{j∈N^d_n} Y^n_j(x) Y^n_j(y), x, y ∈ Sd. [sent-648, score-0.264]
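A numerical illustration on S² ⊂ R³ (so that, up to normalization, the P^d_n may be taken to be the Legendre polynomials; the coefficient values below are mine): adding positive coefficients on new degrees, so that supp a is a proper subset of supp b, makes Sb − Sa itself a kernel on the sphere, which the code checks on random points.

```python
import numpy as np
from numpy.polynomial import Legendre

def schoenberg(h, X):
    # S_h(x, y) = sum_n h_n P_{n-1}((x, y)) evaluated on all pairs of rows of X
    t = X @ X.T                                   # inner products (x, y) in [-1, 1]
    return sum(hn * Legendre.basis(n)(t) for n, hn in enumerate(h))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # random points on the sphere S^2
h_a = [1.0, 0.5, 0.25]                            # supp a = {1, 2, 3}
h_b = h_a + [0.1, 0.05]                           # supp a is a proper subset of supp b
diff = schoenberg(h_b, X) - schoenberg(h_a, X)    # = 0.1 P_3 + 0.05 P_4 at the (x, y)
print(np.linalg.eigvalsh(diff).min())             # >= -1e-10: S_b - S_a is a kernel
```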
99 The kernels we consider are GA(x, y) := ∑_{j,k∈Nn} Ajk φk(x) φj(y), GB(x, y) := ∑_{j,k∈Nm} Bjk φk(x) φj(y), x, y ∈ X. [sent-687, score-0.253]
100 In both cases, if HGA ⊆ HGB then GB is a nontrivial refinement kernel for GA if and only if m > n. [sent-694, score-0.328]
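A sketch of the finite-feature situation (the block-diagonal extension of A to B and the choice φ_k(x) = x^k are assumptions made for illustration; the check below only verifies the necessary condition of Lemma 4 that L = GB − GA is itself a kernel):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 5, 100
A_half = rng.normal(size=(n, n))
A = A_half @ A_half.T                              # n x n PSD matrix
C_half = rng.normal(size=(m - n, m - n))
C = C_half @ C_half.T
B = np.zeros((m, m))
B[:n, :n] = A
B[n:, n:] = C                                      # m x m PSD matrix extending A

x = rng.uniform(-1, 1, size=N)
Phi = np.vstack([x ** k for k in range(m)]).T      # features phi_k(x) = x^k

G_A = Phi[:, :n] @ A @ Phi[:, :n].T                # Gram matrix of G_A
G_B = Phi @ B @ Phi.T                              # Gram matrix of G_B
print(np.linalg.eigvalsh(G_B - G_A).min())         # >= -1e-10: G_B - G_A is a kernel
```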
wordName wordTfidf (topN-words)
[('nement', 0.409), ('hk', 0.402), ('rd', 0.365), ('kernels', 0.203), ('kernel', 0.184), ('hg', 0.175), ('nontrivial', 0.144), ('zd', 0.143), ('borel', 0.143), ('hgc', 0.141), ('re', 0.134), ('gc', 0.133), ('translation', 0.128), ('efinement', 0.125), ('hkc', 0.117), ('eproducing', 0.106), ('au', 0.095), ('nable', 0.093), ('invariant', 0.09), ('ernels', 0.088), ('hang', 0.087), ('micchelli', 0.086), ('lebesgue', 0.082), ('multiplications', 0.081), ('xn', 0.08), ('hka', 0.078), ('hkb', 0.078), ('ei', 0.072), ('im', 0.071), ('xu', 0.071), ('kc', 0.071), ('nn', 0.069), ('cn', 0.069), ('span', 0.068), ('cx', 0.064), ('nm', 0.064), ('hgs', 0.063), ('hn', 0.063), ('jn', 0.063), ('characterizations', 0.06), ('rkhs', 0.057), ('gs', 0.054), ('bv', 0.054), ('nq', 0.053), ('supp', 0.051), ('continuous', 0.049), ('nonnegative', 0.048), ('ka', 0.048), ('hilbert', 0.047), ('hks', 0.047), ('everywhere', 0.047), ('reproducing', 0.044), ('fourier', 0.043), ('bn', 0.041), ('olkopf', 0.04), ('bochner', 0.04), ('ker', 0.039), ('wavelets', 0.039), ('cholesky', 0.039), ('sd', 0.039), ('hl', 0.038), ('theorem', 0.038), ('suppose', 0.037), ('proposition', 0.036), ('schoenberg', 0.036), ('kb', 0.036), ('operator', 0.034), ('fs', 0.034), ('aronszajn', 0.033), ('hlc', 0.031), ('hls', 0.031), ('kerc', 0.031), ('ndn', 0.031), ('walder', 0.031), ('fn', 0.031), ('fc', 0.031), ('isomorphism', 0.031), ('characterization', 0.03), ('dense', 0.03), ('fb', 0.03), ('predictor', 0.029), ('minimizer', 0.028), ('absolutely', 0.027), ('zhang', 0.027), ('conversely', 0.027), ('adjoint', 0.027), ('sb', 0.027), ('fa', 0.027), ('dt', 0.026), ('ran', 0.026), ('isometric', 0.025), ('nite', 0.025), ('jk', 0.025), ('subspace', 0.025), ('factorization', 0.024), ('jersey', 0.024), ('page', 0.024), ('hgb', 0.023), ('tietze', 0.023), ('ct', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.19725911 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
3 0.14002332 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
Author: Christian Rieger, Barbara Zwicknagl
Abstract: We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of ε- and ν-SVRs. Using these, we show how to adjust regularization parameters to get best possible approximation orders for regression. The results are illustrated by some numerical examples. Keywords: sampling inequality, radial basis functions, approximation theory, reproducing kernel Hilbert space, Sobolev space
4 0.12698738 68 jmlr-2009-Online Learning with Samples Drawn from Non-identical Distributions
Author: Ting Hu, Ding-Xuan Zhou
Abstract: Learning algorithms are based on samples which are often drawn independently from an identical distribution (i.i.d.). In this paper we consider a different setting with samples drawn according to a non-identical sequence of probability distributions. Each time a sample is drawn from a different distribution. In this setting we investigate a fully online learning algorithm associated with a general convex loss function and a reproducing kernel Hilbert space (RKHS). Error analysis is conducted under the assumption that the sequence of marginal distributions converges polynomially in the dual of a Hölder space. For regression with least square or insensitive loss, learning rates are given in both the RKHS norm and the L2 norm. For classification with hinge loss and support vector machine q-norm loss, rates are explicitly stated with respect to the excess misclassification error. Keywords: sampling with non-identical distributions, online learning, classification with a general convex loss, regression with insensitive loss and least square loss, reproducing kernel Hilbert space
5 0.12643465 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
6 0.094309926 38 jmlr-2009-Hash Kernels for Structured Data
7 0.089676306 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
8 0.074651837 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
9 0.056242328 16 jmlr-2009-Classification with Gaussians and Convex Loss
10 0.051507924 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
11 0.048894048 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
13 0.042702537 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models
14 0.041741427 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
15 0.040104471 90 jmlr-2009-Structure Spaces
16 0.039396778 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
17 0.039032008 23 jmlr-2009-Discriminative Learning Under Covariate Shift
18 0.037911542 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
19 0.037865996 67 jmlr-2009-Online Learning with Sample Path Constraints
20 0.036464915 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
topicId topicWeight
[(0, 0.213), (1, 0.039), (2, 0.1), (3, -0.117), (4, -0.347), (5, 0.171), (6, 0.156), (7, -0.267), (8, 0.024), (9, 0.069), (10, -0.02), (11, 0.055), (12, 0.216), (13, 0.004), (14, 0.029), (15, 0.043), (16, 0.003), (17, -0.047), (18, -0.092), (19, 0.023), (20, -0.04), (21, -0.137), (22, 0.119), (23, 0.023), (24, -0.008), (25, 0.068), (26, 0.015), (27, -0.074), (28, -0.022), (29, -0.15), (30, 0.096), (31, 0.034), (32, 0.045), (33, 0.04), (34, -0.031), (35, -0.07), (36, -0.044), (37, -0.023), (38, -0.068), (39, 0.019), (40, 0.056), (41, 0.033), (42, 0.015), (43, -0.063), (44, 0.077), (45, 0.021), (46, 0.0), (47, -0.044), (48, 0.074), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.98222518 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.80444533 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
3 0.67986763 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
Author: Christian Rieger, Barbara Zwicknagl
Abstract: We introduce a new technique for the analysis of kernel-based regression problems. The basic tools are sampling inequalities which apply to all machine learning problems involving penalty terms induced by kernels related to Sobolev spaces. They lead to explicit deterministic results concerning the worst case behaviour of ε- and ν-SVRs. Using these, we show how to adjust regularization parameters to get best possible approximation orders for regression. The results are illustrated by some numerical examples. Keywords: sampling inequality, radial basis functions, approximation theory, reproducing kernel Hilbert space, Sobolev space
4 0.51951015 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
5 0.46188873 38 jmlr-2009-Hash Kernels for Structured Data
Author: Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, S.V.N. Vishwanathan
Abstract: We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs. Keywords: hashing, stream, string kernel, graphlet kernel, multiclass classification
6 0.4524526 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
8 0.3113977 68 jmlr-2009-Online Learning with Samples Drawn from Non-identical Distributions
10 0.25657907 16 jmlr-2009-Classification with Gaussians and Convex Loss
11 0.25600478 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
12 0.211943 26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
13 0.21052612 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
14 0.18800877 67 jmlr-2009-Online Learning with Sample Path Constraints
15 0.18587606 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
17 0.18026838 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
18 0.17773038 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models
19 0.16714747 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
20 0.1650144 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences
topicId topicWeight
[(7, 0.03), (11, 0.065), (19, 0.028), (21, 0.037), (38, 0.016), (47, 0.023), (49, 0.313), (52, 0.043), (55, 0.049), (58, 0.038), (66, 0.129), (81, 0.032), (90, 0.062), (96, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.74849039 78 jmlr-2009-Refinement of Reproducing Kernels
Author: Yuesheng Xu, Haizhang Zhang
Abstract: We continue our recent study on constructing a refinement kernel for a given kernel so that the reproducing kernel Hilbert space associated with the refinement kernel contains that with the original kernel as a subspace. To motivate this study, we first develop a refinement kernel method for learning, which gives an efficient algorithm for updating a learning predictor. Several characterizations of refinement kernels are then presented. It is shown that a nontrivial refinement kernel for a given kernel always exists if the input space has an infinite cardinal number. Refinement kernels for translation invariant kernels and Hilbert-Schmidt kernels are investigated. Various concrete examples are provided. Keywords: reproducing kernels, reproducing kernel Hilbert spaces, learning with kernels, refinement kernels, translation invariant kernels, Hilbert-Schmidt kernels
2 0.47424954 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
3 0.47165072 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang
Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semi-inner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-inner-products, representer theorems
4 0.46911481 13 jmlr-2009-Bounded Kernel-Based Online Learning
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA2 . Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
6 0.45245907 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
7 0.45203203 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
8 0.44320273 9 jmlr-2009-Analysis of Perceptron-Based Active Learning
9 0.44198596 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
10 0.43630943 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
11 0.43600255 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
12 0.43330184 18 jmlr-2009-Consistency and Localizability
13 0.43274266 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
14 0.43201488 52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
15 0.43172053 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization
16 0.4311488 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
17 0.43099192 55 jmlr-2009-Maximum Entropy Discrimination Markov Networks
18 0.42970794 14 jmlr-2009-CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning
19 0.42965665 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
20 0.42942446 37 jmlr-2009-Generalization Bounds for Ranking Algorithms via Algorithmic Stability