jmlr jmlr2012 jmlr2012-96 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang
Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments
Reference: text
sentIndex sentText sentNum sentScore
1 This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. [sent-11, score-1.116]
2 The study is motivated by the need to update the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. [sent-12, score-0.402]
3 Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. [sent-14, score-0.395]
4 Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. [sent-15, score-0.538]
5 Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. [sent-17, score-0.461]
6 Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments 1. [sent-18, score-0.777]
7 Built upon the theory of scalar-valued reproducing kernels (Aronszajn, 1950), kernel methods have proven useful in single task learning (Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998). [sent-21, score-0.565]
8 These two considerations lead immediately to the notion of reproducing kernel Hilbert spaces (RKHS). [sent-32, score-0.402]
9 It is well-known that a bivariate function is a scalar-valued reproducing kernel if and only if it is representable as some inner product of the feature of inputs (Schölkopf and Smola, 2002). [sent-35, score-0.402]
10 Finally, finding a feature map and taking the inner product of the features of two inputs are equivalent to choosing a scalar-valued reproducing kernel and performing function evaluations of it. [sent-36, score-0.425]
11 We shall briefly review existing work on vector-valued RKHS and the associated operator-valued reproducing kernels. [sent-45, score-0.395]
12 The notion of matrix-valued or operator-valued reproducing kernels was also obtained in Burbea and Masani (1984). [sent-47, score-0.461]
13 Various characterizations and examples of universal operator-valued reproducing kernels were provided in Caponnetto et al. [sent-51, score-0.515]
14 , 2010) also examined basic operations of operator-valued reproducing kernels and extended the Bochner characterization of translation invariant reproducing kernels to the operator-valued case. [sent-55, score-1.018]
15 The purpose of this paper is to study the refinement relationship of two vector-valued reproducing kernels. [sent-56, score-0.298]
16 We say that a vector-valued reproducing kernel is a refinement of another kernel of the same type if the RKHS of the first kernel contains that of the second as a linear subspace and their norms coincide on the smaller space. [sent-57, score-0.61]
17 The study is motivated by the need to update a vector-valued reproducing kernel for multi-task machine learning when underfitting or overfitting occurs. [sent-59, score-0.402]
18 3 about the refinement of Hessian kernels and transformation kernels are unique, and the numerical experiments in Section 7 are novel. [sent-70, score-0.326]
19 By contrast, the discussion of general characterizations and finite-dimensional RKHS in Section 3, refinement of kernels defined by the integral of operator-valued kernels with respect to a scalar-valued measure in Section 4. [sent-71, score-0.443]
20 In Section 3, we shall present three general characterizations of the refinement relationship by examining the difference of two given kernels, the feature map representation of kernels, and the associated kernels on the extended input space. [sent-80, score-0.337]
21 Recall that most scalar-valued reproducing kernels are represented by integrals. [sent-81, score-0.461]
22 In the operator-valued case, we have two types of integral representations: the integral of operator-valued reproducing kernels with respect to a scalar-valued measure, and the integral of scalar-valued reproducing kernels with respect to an operator-valued measure. [sent-82, score-1.051]
23 As a key part of this paper, we shall investigate in Section 4 specifications of the general characterizations when the operator-valued reproducing kernels are given by such integrals. [sent-83, score-0.612]
24 Section 6 specifically treats the existence of nontrivial refinements and desirable properties of operator-valued reproducing kernels that can be preserved during the refinement process. [sent-85, score-0.535]
25 In Section 7, we perform three numerical simulations to show the effect of the refinement kernel method in updating operator-valued reproducing kernels for multi-task learning. [sent-86, score-0.565]
26 Kernel Refinement. To explain our motivation from multi-task learning in detail, we first recall the definition of operator-valued reproducing kernels. [sent-89, score-0.298]
27 An L (Λ)-valued reproducing kernel on X is a function K : X × X → L (Λ) such that K(x, y) = K(y, x)∗ for all x, y ∈ X, and such that for all xj ∈ X, ξj ∈ Λ, j ∈ Nn := {1, 2, ..., n}, ∑_{j=1}^n ∑_{k=1}^n (K(xj, xk)ξj, ξk)Λ ≥ 0. (1) [sent-96, score-0.402]
28 For each L (Λ)-valued reproducing kernel K on X, there exists a unique Hilbert space, denoted by HK, consisting of Λ-valued functions on X such that K(x, ·)ξ ∈ HK for all x ∈ X and ξ ∈ Λ, (2) and (f(x), ξ)Λ = (f, K(x, ·)ξ)HK for all f ∈ HK, x ∈ X, and ξ ∈ Λ. (3) [sent-100, score-0.402]
29 Conversely, for each Λ-valued RKHS on X, there exists a unique L (Λ)-valued reproducing kernel K on X that satisfies (2) and (3). [sent-104, score-0.402]
30 For this reason, we also call K the reproducing kernel (or kernel for short) of HK . [sent-105, score-0.506]
31 The bijective correspondence between L (Λ)-valued reproducing kernels and Λ-valued RKHS is central to the theory of vector-valued RKHS. [sent-106, score-0.461]
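A minimal numerical sketch of this definition, assuming the separable Gaussian kernel K(x, y) = S exp(−(x − y)²/2) with a strictly positive definite matrix S (the kernel form that reappears in the numerical experiments) and a few arbitrary sample points; condition (1) then amounts to positive semi-definiteness of the block Gram matrix [K(xj, xk)]:

```python
import numpy as np

def gaussian_operator_kernel(x, y, S):
    """Separable matrix-valued kernel K(x, y) = S * exp(-(x - y)^2 / 2)."""
    return S * np.exp(-0.5 * (x - y) ** 2)

def block_gram(xs, S):
    """Assemble the block Gram matrix [K(xj, xk)]_{j,k} as one large matrix."""
    n, m = S.shape[0], len(xs)
    G = np.zeros((n * m, n * m))
    for j, xj in enumerate(xs):
        for k, xk in enumerate(xs):
            G[j * n:(j + 1) * n, k * n:(k + 1) * n] = gaussian_operator_kernel(xj, xk, S)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + 0.1 * np.eye(3)          # strictly positive definite, illustrative choice
xs = np.array([-1.0, 0.0, 0.5, 2.0])   # arbitrary inputs

# Condition (1): sum_{j,k} (K(xj, xk) xi_j, xi_k)_Lambda >= 0 for all xi_j,
# equivalently the block Gram matrix is positive semi-definite.
print("smallest eigenvalue:", np.linalg.eigvalsh(block_gram(xs, S)).min())
```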
32 Given two L (Λ)-valued reproducing kernels K, G on X, we shall investigate in this paper the fundamental embedding relationship HK ⪯ HG, meaning that HK ⊆ HG and, for all f ∈ HK, ‖f‖HK = ‖f‖HG. [sent-107, score-0.558]
33 When one of the above situations happens, a remedy is to modify the reproducing kernel. [sent-126, score-0.298]
34 Specifically, one might want to find another L (Λ)-valued reproducing kernel G such that HK ⪯ HG when there is underfitting, or such that HG ⪯ HK when there is overfitting. [sent-127, score-0.402]
35 We shall verify in the last section through extensive numerical simulations that the refinement kernel method is indeed able to provide an appropriate update of an operator-valued reproducing kernel when underfitting or overfitting occurs. [sent-129, score-0.603]
36 Before moving on to the characterization of refinement of operator-valued reproducing kernels, we collect here notations that will be frequently used in the rest of the paper. [sent-130, score-0.317]
37 General Characterizations. The relationship between the RKHS of the sum of two operator-valued reproducing kernels and those of the summand kernels has been made clear in Theorem 1 on page 44 of Pedrick (1957). [sent-134, score-0.624]
38 Proposition 1 Let K, G be two L (Λ)-valued reproducing kernels on X. [sent-136, score-0.461]
39 Then HK ⪯ HG if and only if G − K is an L (Λ)-valued reproducing kernel on X and HK ∩ HG−K = {0}; and if HK ⪯ HG then HG−K ⪯ HG. [sent-137, score-0.402]
40 Every reproducing kernel has a feature map representation. [sent-139, score-0.425]
41 The following lemma is useful in identifying the RKHS of a reproducing kernel given by a feature map representation (5). [sent-142, score-0.425]
42 Lemma 2 If K is an L (Λ)-valued reproducing kernel on X given by (5) then HK = {Φ(·)∗ u : u ∈ W } with inner product (Φ(·)∗ u, Φ(·)∗ v)HK := (PΦ u, PΦ v)W , u, v ∈ W , where PΦ is the orthogonal projection of W onto WΦ := span{Φ(x)ξ : x ∈ X, ξ ∈ Λ}. [sent-143, score-0.424]
43 Theorem 3 Suppose that L (Λ)-valued reproducing kernels K and G are given by the feature maps Φ : X → L (Λ, W ) and Φ′ : X → L (Λ, W ′ ), respectively. [sent-145, score-0.461]
44 To illustrate the above useful results, we shall present a concrete example aiming at refining L (Λ)-valued reproducing kernels K with a finite-dimensional RKHS. [sent-149, score-0.58]
45 Thus, we see that K(x, y) = Φ(y)∗ Φ(x), x, y ∈ X, implying that K is an L (Λ)-valued reproducing kernel. [sent-169, score-0.298]
46 Introduce for each L (Λ)-valued reproducing kernel K on X a scalar-valued reproducing kernel K̃ on the extended input space X̃ := X × Λ by setting K̃((x, ξ), (y, η)) := (K(x, y)ξ, η)Λ , x, y ∈ X, ξ, η ∈ Λ. [sent-191, score-0.804]
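A small illustrative snippet of this scalarization, assuming the same separable matrix-valued Gaussian kernel as in the sketch above (an arbitrary choice, used only to evaluate the displayed formula):

```python
import numpy as np

def scalarized_kernel(K, x, xi, y, eta):
    """K~((x, xi), (y, eta)) = (K(x, y) xi, eta)_Lambda for an operator-valued K."""
    return eta @ (K(x, y) @ xi)

# Illustrative separable matrix-valued kernel on X = R with Lambda = R^2.
S = np.array([[2.0, 0.5], [0.5, 1.0]])
K = lambda x, y: S * np.exp(-0.5 * (x - y) ** 2)

xi, eta = np.array([1.0, -1.0]), np.array([0.0, 2.0])
print(scalarized_kernel(K, 0.3, xi, 1.2, eta))
```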
47 It appears by Proposition 6 that we do not have to bother studying refinement of operator-valued reproducing kernels. [sent-205, score-0.298]
48 Typical examples include the important translation-invariant operator-valued kernels and Hessian kernels to be considered in the next section. [sent-213, score-0.348]
49 We introduce another notation before returning to reproducing kernels. [sent-241, score-0.298]
50 To introduce our L (Λ)-valued reproducing kernels, we also let W be a Hilbert space and φ a mapping from X × Ω to L (Λ, W ) such that for each x ∈ X, φ(x, ·) belongs to both L2 (Ω, L (Λ, W ), dµ) and L2 (Ω, L (Λ, W ), dν). [sent-247, score-0.298]
51 We verify that (20) indeed defines an L (Λ)-valued reproducing kernel. [sent-282, score-0.298]
52 Proposition 8 With the above assumptions on Ψ and µ, the function K defined by (20) is an L (Λ)-valued reproducing kernel on X. [sent-283, score-0.402]
53 Since Ψ(·, ·, tl) is a scalar-valued reproducing kernel on X, [Ψ(xj, xk, tl) : j, k ∈ Nn] is a positive semi-definite matrix for each l ∈ Nm. [sent-291, score-0.43]
54 To investigate the refinement relationship, we shall consider a simplified version of (20) that covers a large class of operator-valued reproducing kernels. [sent-295, score-0.395]
55 By Proposition 8, K, G are L (Λ)-valued reproducing kernels on X. [sent-299, score-0.461]
56 Examples. We present in this section several concrete examples of refinement of operator-valued reproducing kernels. [sent-405, score-0.32]
57 1 Translation Invariant Reproducing Kernels. Let d ∈ N and K be an L (Λ)-valued reproducing kernel on Rd. [sent-408, score-0.402]
58 A celebrated characterization due to Bochner (1959) states that every continuous scalar-valued translation invariant reproducing kernel on Rd must be the Fourier transform of a finite nonnegative Borel measure on Rd , and vice versa. [sent-410, score-0.573]
59 To this end, we first investigate the structure of the RKHS of a translation invariant L (Λ)-valued reproducing kernel. [sent-416, score-0.375]
60 Let γ be an arbitrary measure in B (Rd , Λ) and L the associated translation invariant reproducing kernel defined by L(x, y) = ∫_{Rd} e^{i(x−y)·t} dγ(t), x, y ∈ Rd. [sent-417, score-0.499]
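For intuition, when γ is a discrete operator-valued measure γ = Σl Bl δ_{tl} with positive semi-definite atoms Bl (an assumed special case chosen purely for illustration), the integral reduces to a finite sum that can be evaluated directly:

```python
import numpy as np

def translation_invariant_kernel(x, y, ts, Bs):
    """L(x, y) = sum_l exp(i (x - y) . t_l) B_l, a discrete instance of the
    integral of exp(i (x - y) . t) with respect to an operator-valued measure."""
    diff = x - y
    return sum(np.exp(1j * diff @ t) * B for t, B in zip(ts, Bs))

# Illustrative atoms t_l in R^2 and positive semi-definite matrices B_l.
ts = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([-1.0, 0.0])]
Bs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]]), np.eye(2)]

x, y = np.array([0.2, -0.4]), np.array([1.0, 0.3])
print(translation_invariant_kernel(x, y, ts, Bs))
```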
61 Proof. We introduce for each ξ ∈ Λ two scalar-valued translation invariant reproducing kernels on Rd by setting Aa (x, y) := (La (x, y)ξ, ξ)Λ , x, y ∈ Rd , a ∈ {c, s}. [sent-429, score-0.538]
62 In this subsection, we shall consider special translation invariant reproducing kernels and establish the characterization of refinement using Theorem 7. [sent-498, score-0.654]
63 Let k be a continuously differentiable translation invariant reproducing kernel on Rd . [sent-499, score-0.479]
64 By the general integral representation (17) of operator-valued reproducing kernels, K defined by (44) is an L (Cd )-valued reproducing kernel on Rd. [sent-506, score-0.743]
65 Matrix-valued translation invariant reproducing kernels of the form (44) are useful for the development of divergence-free kernel methods for solving some special partial differential equations (see, for example, Lowitzsch, 2003; Wendland, 2009, and the references therein). [sent-507, score-0.642]
66 Another class of kernels constructed from the Hessian of a scalar-valued translation invariant reproducing kernel is widely applied to the learning of a multivariate function together with its gradient simultaneously (Mukherjee and Wu, 2006; Mukherjee and Zhou, 2006; Ying and Campbell, 2008). [sent-508, score-0.642]
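The explicit form (48) does not appear in the text above; as an assumed stand-in, the sketch below uses the standard gradient-learning construction K(x, y) = ∇x∇yᵀ k(x, y) for a Gaussian k, the kind of Hessian kernel employed in the references just cited:

```python
import numpy as np

def hessian_gaussian_kernel(x, y, sigma=1.0):
    """Matrix-valued kernel K(x, y) = grad_x grad_y^T k(x, y) for the Gaussian
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), commonly used to learn a
    multivariate function together with its gradient."""
    d = x - y
    k = np.exp(-(d @ d) / (2.0 * sigma ** 2))
    return (np.eye(len(x)) / sigma ** 2 - np.outer(d, d) / sigma ** 4) * k

x, y = np.array([0.5, -1.0, 0.0]), np.array([0.0, 0.3, 0.2])
print(hessian_gaussian_kernel(x, y))   # a 3 x 3 matrix-valued kernel evaluation
```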
67 We aim at refining matrix-valued reproducing kernels of the forms (44) and (48) in this subsection. [sent-514, score-0.461]
68 Then K, G, K, G are matrix-valued translation invariant reproducing kernels on Rd . [sent-518, score-0.538]
69 3 Transformation Reproducing Kernels. Let us consider a particular class of matrix-valued reproducing kernels whose universality was studied in Caponnetto et al. [sent-530, score-0.481]
70 To this end, we let k, g be two scalar-valued reproducing kernels on another input space Y and Tp be mappings from X to Y , p ∈ Nn . [sent-533, score-0.461]
71 It is known that K, G defined above in (51) are indeed L (Cn )-valued reproducing kernels (Caponnetto et al. [sent-535, score-0.461]
72 A more general case of refinement of transformation reproducing kernels is discussed below. [sent-557, score-0.461]
73 Proposition 20 Let Tp , S p be mappings from X to Y and k, g be scalar-valued reproducing kernels on Y . [sent-559, score-0.461]
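The displayed definition (51) is not quoted here; assuming the transformation-kernel form in which the (p, q) entry is k(Tp(x), Tq(y)) (one natural reading, not a verbatim reproduction), a minimal sketch with arbitrary illustrative transformations reads:

```python
import numpy as np

def transformation_kernel(x, y, transforms, k):
    """n x n matrix-valued kernel whose (p, q) entry is k(T_p(x), T_q(y))
    (an assumed form of the transformation kernels discussed above)."""
    n = len(transforms)
    return np.array([[k(transforms[p](x), transforms[q](y)) for q in range(n)]
                     for p in range(n)])

# Illustrative scalar kernel on Y = R and simple mappings T_p : X = R -> Y = R.
k = lambda u, v: np.exp(-0.5 * (u - v) ** 2)
transforms = [lambda x: x, lambda x: 2.0 * x + 1.0, lambda x: np.sin(x)]

print(transformation_kernel(0.4, -0.2, transforms, k))
```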
74 4 Finite Hilbert-Schmidt Reproducing Kernels. We consider refinement of finite Hilbert-Schmidt reproducing kernels in this subsection. [sent-564, score-0.461]
75 Let B j ,C j be invertible operators in L+ (Λ), n ≤ m ∈ N, and Ψ j , j ∈ Nm , be scalar-valued reproducing kernels on the input space X. [sent-565, score-0.504]
76 By the general integral representation (20) and Proposition 8, K, G defined in (57) above are L (Λ)-valued reproducing kernels on X. [sent-567, score-0.504]
77 Theorem 21 Let K, G be defined by (57), where Bj, Cj ∈ L+ (Λ) are invertible and Ψj, j ∈ Nm, are scalar-valued reproducing kernels on X satisfying (58). [sent-569, score-0.461]
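A concrete illustrative instance of such finite Hilbert-Schmidt kernels, assuming the form K(x, y) = Σj Ψj(x, y) Bj suggested by the text (the particular Ψj and Bj below are arbitrary choices, and G simply carries one more summand as a candidate refinement):

```python
import numpy as np

def hs_kernel(x, y, psis, Bs):
    """Finite Hilbert-Schmidt kernel K(x, y) = sum_j Psi_j(x, y) B_j with
    scalar-valued kernels Psi_j and positive operators B_j (assumed form)."""
    return sum(psi(x, y) * B for psi, B in zip(psis, Bs))

# Illustrative scalar kernels and positive-definite coefficient matrices.
psis = [lambda x, y: np.exp(-0.5 * (x - y) ** 2),
        lambda x, y: (1.0 + x * y) ** 2,
        lambda x, y: np.exp(-abs(x - y))]
Bs = [np.eye(2), np.array([[2.0, 1.0], [1.0, 2.0]]), 0.5 * np.eye(2)]

K = lambda x, y: hs_kernel(x, y, psis[:2], Bs[:2])   # n = 2 summands
G = lambda x, y: hs_kernel(x, y, psis, Bs)           # m = 3 summands, candidate refinement
print(K(0.1, 0.7))
print(G(0.1, 0.7))
```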
78 The above equation can be equivalently formulated as ∑_{k∈Jj} ∑_{i∈I} (wj, ei ⊗ fk)Λ⊗Wj (ei, Bj ξ)Λ (fk, φj(x))Wj = 0. By the denseness of φj(X) in Wj, ∑_{i∈I} (wj, ei ⊗ fk)Λ⊗Wj (ei, Bj ξ)Λ = 0 for all j ∈ Nn, k ∈ Jj and ξ ∈ Λ. [sent-587, score-0.343]
79 We thus have for all j ∈ Nn and k ∈ Jj that ∑_{i∈I} (wj, ei ⊗ fk)Λ⊗Wj ei = 0, which implies (wj, ei ⊗ fk)Λ⊗Wj = 0 for all j ∈ Nn, k ∈ Jj, i ∈ I. [sent-588, score-0.34]
80 Existence. This section is devoted to the existence of nontrivial refinement of operator-valued reproducing kernels. [sent-620, score-0.372]
81 Proposition 25 There does not exist a nontrivial refinement of an L (Λ)-valued reproducing kernel K on X if and only if HK = Λ^X, the set of all the functions from X to Λ. [sent-624, score-0.476]
82 If the cardinality of X is infinite then every L (Λ)-valued reproducing kernel on X has a nontrivial refinement. [sent-625, score-0.476]
83 A necessary condition for an L (Λ)-valued reproducing kernel on X to have no nontrivial refinements is that ∑_{j=1}^n ∑_{k=1}^n (K(xj, xk)ξj, ξk)Λ > 0 for all ξj ∈ Λ, j ∈ Nn, with ∑_{j=1}^n ‖ξj‖Λ > 0. [sent-628, score-0.504]
84 In the process of refining an operator-valued reproducing kernel, it is usually desirable to preserve favorable properties of the original kernel. [sent-653, score-0.298]
85 We shall show that this is feasible as far as continuity and universality of operator-valued reproducing kernels are concerned. [sent-654, score-0.578]
86 Let X be a metric space and K an L (Λ)-valued reproducing kernel that is continuous from X × X to L (Λ) when the latter is equipped with the operator norm. [sent-655, score-0.448]
87 Then every continuous L (Λ)-valued reproducing kernel on X has a nontrivial continuous refinement. [sent-664, score-0.52]
88 Lemma 28 Let K be a continuous L (Λ)-valued reproducing kernel on X with the feature map representation (5), where Φ : X → L (Λ, W ) is continuous. [sent-669, score-0.447]
89 Proposition 29 Let X be a metric space and K a continuous L (Λ)-valued reproducing kernel on X. [sent-682, score-0.424]
90 Numerical Experiments. We present in this final section three numerical experiments on the application of refinement of operator-valued reproducing kernels to multi-task learning. [sent-685, score-0.461]
91 To deal with the noise and have an acceptable generalization error, we use the following regularization network: min_{f ∈ HK} (1/m) ∑_{j=1}^m ‖f(xj) − ξj‖²Λ + σ‖f‖²HK, (68) where K is a chosen Λ-valued reproducing kernel on X. [sent-688, score-0.402]
92 On the other hand, when overfitting appears in the second experiment, we shall then find a Λ-valued reproducing kernel L on X such that HL ⪯ HK with the same purpose. [sent-691, score-0.499]
93 The L+ (Rn )-valued reproducing kernel that we shall use in the regularization network (68) is a Gaussian kernel K(x, y) := S exp(−(x − y)²/2), x, y ∈ R, where S ∈ L+ (Rn ) is strictly positive-definite. [sent-699, score-0.603]
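A minimal sketch of fitting the regularization network (68) with this matrix-valued Gaussian kernel, assuming the standard representer-theorem solution f(x) = Σj K(x, xj) cj with stacked coefficients solving (G + mσI)c = ξ, where G is the block Gram matrix; the training data below are synthetic placeholders rather than the paper's experimental data:

```python
import numpy as np

def K(x, y, S):
    """Matrix-valued Gaussian kernel K(x, y) = S * exp(-(x - y)^2 / 2)."""
    return S * np.exp(-0.5 * (x - y) ** 2)

def fit_regularization_network(xs, targets, S, sigma):
    """Solve min_f (1/m) sum_j ||f(xj) - xi_j||^2 + sigma ||f||_HK^2 via the
    representer theorem: f(x) = sum_j K(x, xj) c_j with (G + m*sigma*I) c = xi."""
    m, n = len(xs), S.shape[0]
    G = np.zeros((m * n, m * n))
    for j in range(m):
        for k in range(m):
            G[j * n:(j + 1) * n, k * n:(k + 1) * n] = K(xs[j], xs[k], S)
    c = np.linalg.solve(G + m * sigma * np.eye(m * n), targets.reshape(-1)).reshape(m, n)
    return lambda x: sum(K(x, xs[j], S) @ c[j] for j in range(m))

# Synthetic multi-task data: noisy samples of two related scalar tasks.
rng = np.random.default_rng(1)
xs = np.linspace(0.0, 3.0, 20)
targets = np.stack([np.sin(xs), np.cos(xs)], axis=1) + 0.1 * rng.standard_normal((20, 2))

S = np.array([[1.0, 0.3], [0.3, 1.0]])   # strictly positive definite, illustrative
f = fit_regularization_network(xs, targets, S, sigma=1e-2)
print(f(1.5))   # predicted vector-valued output at a new input
```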
94 More specifically, we plan to apply the refinement kernel method (table comparing kernel K and kernel L at n = 8 omitted). [sent-1185, score-0.312]
95 Conclusion and Discussion. The refinement relationship between two operator-valued reproducing kernels provides a promising way of updating kernels for multi-task machine learning when overfitting or underfitting occurs. [sent-1299, score-0.624]
96 Particular attention has been paid to the case when the kernels under investigation have a vector-valued integral representation, the most general form of operator-valued reproducing kernels. [sent-1301, score-0.504]
97 By the characterizations, we present concrete examples of refining translation invariant operator-valued reproducing kernels, the Hessian of the scalar-valued Gaussian kernel, and finite Hilbert-Schmidt operator-valued reproducing kernels. [sent-1302, score-0.695]
98 Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. [sent-1370, score-0.517]
99 Theory of reproducing kernels for Hilbert spaces of vector valued functions. [sent-1483, score-0.536]
100 On the inclusion relation of reproducing kernel Hilbert spaces. [sent-1536, score-0.402]
wordName wordTfidf (topN-words)
[('hk', 0.43), ('hg', 0.41), ('nement', 0.327), ('reproducing', 0.298), ('nn', 0.24), ('hang', 0.184), ('kernels', 0.163), ('efinement', 0.141), ('eproducing', 0.141), ('perator', 0.141), ('ek', 0.128), ('ernels', 0.121), ('nm', 0.119), ('rd', 0.112), ('kernel', 0.104), ('eg', 0.102), ('rkhs', 0.098), ('shall', 0.097), ('re', 0.093), ('tp', 0.084), ('ei', 0.082), ('hkc', 0.077), ('outliers', 0.077), ('valued', 0.075), ('nontrivial', 0.074), ('span', 0.069), ('hilbert', 0.065), ('el', 0.063), ('xu', 0.063), ('hgc', 0.058), ('hgs', 0.058), ('characterizations', 0.054), ('banach', 0.053), ('diestel', 0.051), ('kz', 0.051), ('uhl', 0.051), ('borel', 0.049), ('translation', 0.047), ('fk', 0.047), ('ran', 0.045), ('hks', 0.045), ('hkz', 0.045), ('proposition', 0.044), ('operators', 0.043), ('integral', 0.043), ('absolutely', 0.041), ('integrable', 0.04), ('carmeli', 0.038), ('denseness', 0.038), ('hlc', 0.038), ('hls', 0.038), ('tting', 0.037), ('gc', 0.036), ('zhang', 0.036), ('bk', 0.035), ('cn', 0.035), ('lebesgue', 0.035), ('nonnegative', 0.033), ('caponnetto', 0.033), ('bochner', 0.032), ('brought', 0.03), ('tq', 0.03), ('invariant', 0.03), ('xk', 0.028), ('instances', 0.028), ('xy', 0.028), ('kc', 0.027), ('hl', 0.027), ('china', 0.027), ('fn', 0.026), ('lim', 0.026), ('guangdong', 0.026), ('sees', 0.025), ('adjoint', 0.025), ('micchelli', 0.025), ('operator', 0.024), ('listed', 0.024), ('map', 0.023), ('minimizer', 0.023), ('orthogonal', 0.022), ('continuous', 0.022), ('fz', 0.022), ('cucker', 0.022), ('schatten', 0.022), ('mukherjee', 0.022), ('tk', 0.022), ('concrete', 0.022), ('hessian', 0.022), ('errors', 0.02), ('universality', 0.02), ('lc', 0.02), ('nements', 0.02), ('measure', 0.02), ('characterization', 0.019), ('guangzhou', 0.019), ('haizhang', 0.019), ('hla', 0.019), ('pedrick', 0.019), ('qinghui', 0.019), ('runge', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang
Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments
2 0.098515257 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
Author: Zhihua Zhang, Dehua Liu, Guang Dai, Michael I. Jordan
Abstract: Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs can not directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. The coherence function is derived by using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function in logistic regression. The limit of the coherence function at zero temperature corresponds to the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as “C -learning,” and we present efficient coordinate descent algorithms for the training of regularized C -learning models. Keywords: large-margin classifiers, hinge functions, logistic functions, coherence functions, C learning
3 0.095710211 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
Author: Lan Xue, Annie Qu
Abstract: The varying-coefficient model is flexible and powerful for modeling the dynamic changes of regression coefficients. It is important to identify significant covariates associated with response variables, especially for high-dimensional settings where the number of covariates can be larger than the sample size. We consider model selection in the high-dimensional setting and adopt difference convex programming to approximate the L0 penalty, and we investigate the global optimality properties of the varying-coefficient estimator. The challenge of the variable selection problem here is that the dimension of the nonparametric form for the varying-coefficient modeling could be infinite, in addition to dealing with the high-dimensional linear covariates. We show that the proposed varying-coefficient estimator is consistent, enjoys the oracle property and achieves an optimal convergence rate for the non-zero nonparametric components for high-dimensional data. Our simulations and numerical examples indicate that the difference convex algorithm is efficient using the coordinate decent algorithm, and is able to select the true model at a higher frequency than the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO and the smoothly clipped absolute deviation (SCAD) approaches. Keywords: coordinate decent algorithm, difference convex programming, L0 - regularization, large-p small-n, model selection, nonparametric function, oracle property, truncated L1 penalty
4 0.090212353 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression. Keywords: kernel methods, learning kernels, feature selection
5 0.086499676 100 jmlr-2012-Robust Kernel Density Estimation
Author: JooSeuk Kim, Clayton D. Scott
Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation
6 0.0748519 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
7 0.071891159 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
8 0.065026872 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
9 0.061258055 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
10 0.054976501 80 jmlr-2012-On Ranking and Generalization Bounds
11 0.053449422 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
12 0.053090803 4 jmlr-2012-A Kernel Two-Sample Test
13 0.053039961 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
14 0.049816646 44 jmlr-2012-Feature Selection via Dependence Maximization
15 0.049219977 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
16 0.047357142 59 jmlr-2012-Linear Regression With Random Projections
17 0.045301113 111 jmlr-2012-Structured Sparsity and Generalization
18 0.042274289 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning
19 0.041960247 20 jmlr-2012-Analysis of a Random Forests Model
20 0.041226786 97 jmlr-2012-Regularization Techniques for Learning with Matrices
topicId topicWeight
[(0, -0.184), (1, 0.093), (2, -0.052), (3, 0.149), (4, 0.1), (5, -0.041), (6, -0.169), (7, -0.017), (8, 0.023), (9, -0.009), (10, 0.069), (11, -0.079), (12, 0.025), (13, -0.007), (14, 0.155), (15, 0.188), (16, -0.044), (17, -0.055), (18, -0.11), (19, 0.065), (20, 0.08), (21, 0.006), (22, -0.085), (23, 0.062), (24, 0.15), (25, -0.01), (26, 0.144), (27, -0.105), (28, -0.136), (29, 0.162), (30, -0.107), (31, -0.203), (32, 0.052), (33, -0.068), (34, -0.166), (35, 0.188), (36, 0.032), (37, 0.042), (38, -0.061), (39, -0.074), (40, 0.087), (41, 0.097), (42, -0.187), (43, 0.05), (44, 0.0), (45, -0.127), (46, 0.092), (47, -0.028), (48, 0.013), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.96559453 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang
Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments
2 0.54331756 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment
Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression. Keywords: kernel methods, learning kernels, feature selection
3 0.48151481 100 jmlr-2012-Robust Kernel Density Estimation
Author: JooSeuk Kim, Clayton D. Scott
Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation
4 0.38343185 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
Author: Zhihua Zhang, Dehua Liu, Guang Dai, Michael I. Jordan
Abstract: Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs can not directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. The coherence function is derived by using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function in logistic regression. The limit of the coherence function at zero temperature corresponds to the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as “C -learning,” and we present efficient coordinate descent algorithms for the training of regularized C -learning models. Keywords: large-margin classifiers, hinge functions, logistic functions, coherence functions, C learning
5 0.37719682 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
Author: Tim van Erven, Mark D. Reid, Robert C. Williamson
Abstract: Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses where there are few existing results. We show that mixability is governed by the maximum eigenvalue of the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all calculations are for proper losses, we also show how to carry the results across to improper losses. Keywords: mixability, multiclass, prediction with expert advice, proper loss, learning rates
6 0.32483622 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
7 0.31987941 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
8 0.30728906 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
9 0.30579454 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
10 0.29576191 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
11 0.27090096 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
12 0.24932085 44 jmlr-2012-Feature Selection via Dependence Maximization
13 0.24051072 4 jmlr-2012-A Kernel Two-Sample Test
14 0.23509187 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
15 0.23315485 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
16 0.23036566 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions
17 0.22721906 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization
18 0.22422482 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
19 0.21480434 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
20 0.21392529 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
topicId topicWeight
[(7, 0.026), (21, 0.05), (26, 0.031), (29, 0.07), (32, 0.338), (35, 0.012), (49, 0.011), (56, 0.013), (57, 0.013), (67, 0.011), (69, 0.021), (75, 0.084), (77, 0.011), (79, 0.015), (92, 0.133), (96, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.77850568 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang
Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments
2 0.44687349 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
3 0.4468146 111 jmlr-2012-Structured Sparsity and Generalization
Author: Andreas Maurer, Massimiliano Pontil
Abstract: We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels. Keywords: empirical processes, Rademacher average, sparse estimation.
4 0.44532841 82 jmlr-2012-On the Necessity of Irrelevant Variables
Author: David P. Helmbold, Philip M. Long
Abstract: This work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family of learning algorithms for such sources when the relevant variables have a small advantage over random guessing. The main result is that algorithms relying predominately on irrelevant variables have error probabilities that quickly go to 0 in situations where algorithms that limit the use of irrelevant variables have errors bounded below by a positive constant. We also show that accurate learning is possible even when there are so few examples that one cannot determine with high confidence whether or not any individual variable is relevant. Keywords: feature selection, generalization, learning theory
5 0.43977824 46 jmlr-2012-Finite-Sample Analysis of Least-Squares Policy Iteration
Author: Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
Abstract: In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case when the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm. Keywords: Markov decision processes, reinforcement learning, least-squares temporal-difference, least-squares policy iteration, generalization bounds, finite-sample analysis
6 0.43956318 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
7 0.43905339 4 jmlr-2012-A Kernel Two-Sample Test
8 0.43766305 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions
9 0.4366855 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
10 0.4351877 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
11 0.43477166 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
12 0.43464261 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
13 0.43313015 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality
14 0.43279368 34 jmlr-2012-Dynamic Policy Programming
15 0.43234122 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class
16 0.43078372 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
17 0.4301855 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
18 0.42934394 13 jmlr-2012-Active Learning via Perfect Selective Classification
19 0.42920828 80 jmlr-2012-On Ranking and Generalization Bounds
20 0.42431945 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity