nips nips2010 nips2010-279 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
Reference: text
sentIndex sentText sentNum sentScore
1 Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. [sent-11, score-1.214]
2 So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. [sent-12, score-0.426]
3 We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. [sent-13, score-1.706]
4 We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. [sent-14, score-2.424]
5 SVMs using probability measures, e.g. histograms, as input samples have been used to analyze histogram data such as colored images, see [5, 11, 14, 12, 27, 29], and also [17] for nonextensive information theoretic kernels on measures. [sent-23, score-0.534]
6 Further examples are SVMs for text classification and web mining [15, 12, 16], and SVMs with kernels from computational biology, e. [sent-25, score-0.426]
7 Interestingly, in this analysis, the kernel and its reproducing kernel Hilbert space (RKHS) make it possible to completely decouple the statistical analysis of SVMs from the input space X. [sent-32, score-0.516]
8 For example, if one uses the hinge loss and a bounded measurable kernel whose RKHS H is separable and dense in L1(µ) for all distributions µ on X, then [31, Theorem 7. [sent-33, score-0.409]
9 In other words, independently of the input space X, the universal consistency of SVMs is well-understood modulo an approximation theoretical question, namely that of the denseness of H in all L1(µ). [sent-37, score-0.537]
10 For example, for compact X ⊂ Rd , [30] showed that, among a few others, the RKHSs of the Gaussian RBF kernels are universal, that is, they are dense in the space C(X) of continuous functions f : X → R. [sent-39, score-0.879]
11 14], it is then easy to conclude that these RKHSs are also dense in all L1(µ) for which µ has compact support. [sent-43, score-0.31]
12 This key result has been extended in a couple of different directions: For example, [18] establishes universality for more classes of kernels on compact X ⊂ Rd, whereas [32] shows the denseness of the Gaussian RKHSs in L1(µ) for all distributions µ on Rd. [sent-44, score-0.959]
13 Finally, [7, 8, 28, 29] show that universal kernels are closely related to so-called characteristic kernels that can be used to distinguish distributions. [sent-45, score-1.301]
14 Further work gives sufficient or necessary conditions for universality of kernels on arbitrary compact metric spaces X, and [32] further shows that the compact metric spaces are exactly the compact topological spaces on which there exist universal kernels. [sent-47, score-2.301]
15 However, neither these sufficient conditions for universality nor the proof of the existence of universal kernels can be used to construct universal kernels on compact metric spaces X ⊄ Rd. [sent-49, score-2.278]
16 In fact, to the best of our knowledge, no explicit example of such kernels has so far been presented. [sent-50, score-0.465]
17 In this paper we present the first explicit and constructive examples of universal kernels that live on compact metric spaces X ⊄ Rd. [sent-53, score-1.375]
18 To this end, we extend the definition of the Gaussian RBF kernels, or more generally, kernels that can be expressed by a Taylor series, from the Euclidean Rd to its infinite-dimensional counterpart ℓ2. [sent-56, score-0.426]
19 Indeed, the closed balls of ℓ2 are no longer (norm-)compact subsets of ℓ2 and hence we cannot expect universality on these balls. [sent-60, score-0.32]
20 To address this issue, one may be tempted to use the weak∗-topology on ℓ2, since in this topology the closed balls are both compact and metrizable, and thus universal kernels do exist on them. [sent-61, score-1.158]
21 However, the Taylor kernels are not among them because, basically, the inner product ⟨·,·⟩ of ℓ2 fails to be continuous with respect to the weak∗-topology, as the sequence of standard orthonormal basis vectors shows. [sent-62, score-0.563]
22 Since the inner product of ℓ2 is continuous with respect to the norm by virtue of the Cauchy-Schwarz inequality, it turns out that the Taylor kernels are continuous with respect to the norm topology. [sent-64, score-0.675]
23 Moreover, we will see that in this situation the Stone-Weierstraß argument of [30] yields a variety of universal kernels, including infinite-dimensional analogues of the Gaussian RBF kernels. [sent-65, score-0.833]
24 Unlike the finite-dimensional Euclidean spaces Rd and their compact subsets, however, the compact subsets of ℓ2 can hardly be viewed as natural examples of input spaces X. [sent-68, score-0.816]
25 Therefore, we go one step further by considering compact metric spaces X for which there exist a separable Hilbert space H and an injective and continuous map ρ : X → H. [sent-69, score-0.948]
26 Composing a Taylor kernel with such a map ρ then defines a universal kernel on X, and the same is true for the analogous definition of Gaussian-type RBF kernels. [sent-73, score-0.588]
27 Indeed, we will use this general result to present examples of Gaussian kernels defined on non-standard input spaces such as sets of probability measures. [sent-78, score-0.457]
28 Section 2 contains the main results and constructs examples of universal kernels based on our technique. [sent-81, score-0.83]
29 In particular, we show how to construct universal kernels on sets of probability measures and on sets of functions, the latter being interesting for signal processing. [sent-82, score-0.864]
30 Equivalently, k is a kernel if and only if there exist a Hilbert space H̃ and a map Φ̃ : X → H̃ such that k(x, x') = ⟨Φ̃(x), Φ̃(x')⟩_H̃ for all x, x' ∈ X. [sent-89, score-0.312]
31 Moreover, for a compact metric space (X, d), we write C(X) := {f : X → R | f continuous} for the space of continuous functions on X and equip this space with the usual supremum norm ‖·‖∞. [sent-95, score-0.765]
32 A kernel k on X is called universal if k is continuous and its RKHS H is dense in C(X). [sent-96, score-0.318]
33 The kernels we consider in this paper are constructed from functions K : [−r, r] → R that can be expressed by their Taylor series, that is, K(t) = Σ_{n=0}^∞ a_n t^n, t ∈ [−r, r]. [sent-99, score-0.452]
34 Such a function K defines a kernel K(⟨x, x'⟩) on the closed ball √r BRd := {x ∈ Rd : ‖x‖₂ ≤ √r} of radius √r, whenever all Taylor coefficients a_n are non-negative. [sent-102, score-0.287]
35 57] showed that Taylor kernels are universal if a_n > 0 for all n ≥ 0, while [21] notes that strict positivity on certain subsets of indices n suffices. [sent-106, score-0.466]
36 To go beyond Rd, we consider its infinite-dimensional and separable counterpart ℓ2 := {(w_j)_{j≥1} : ‖(w_j)‖²_{ℓ2} := Σ_{j≥1} w_j² < ∞}. [sent-110, score-0.215]
37 Our first main result shows that this extension leads to a kernel whose restrictions to compact subsets are universal if a_n > 0 for all n ∈ N0 := N ∪ {0}. [sent-113, score-0.278]
38 ii) If a_n > 0 for all n ∈ N0, then the restriction k|_{W×W} : W × W → R of the kernel k from (3) to an arbitrary compact set W ⊂ √r Bℓ2 is universal. [sent-117, score-0.238]
39 For a_n := 1/n!, the assumptions of Theorem 2.1 hold for all r > 0, and hence the resulting exponential kernel k(w, w') = exp(⟨w, w'⟩) is universal on every compact subset W of ℓ2. [sent-122, score-0.783]
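To make this concrete, here is a minimal numerical sketch (not taken from the paper) of evaluating a Taylor-type kernel on finite-dimensional truncations of ℓ2 sequences; the coefficient choice a_n = 1/n!, i.e. the exponential kernel, and all names below are illustrative assumptions.

```python
import numpy as np

def taylor_kernel(w, w_prime, coeffs):
    """k(w, w') = sum_n a_n <w, w'>^n for a finite list of Taylor
    coefficients a_0, ..., a_N; strict positivity of the a_n is the
    condition behind the universality statement above."""
    t = float(np.dot(w, w_prime))                 # inner product of truncated sequences
    return sum(a * t**n for n, a in enumerate(coeffs))

def exponential_kernel(w, w_prime):
    """Special case a_n = 1/n!, i.e. k(w, w') = exp(<w, w'>)."""
    return float(np.exp(np.dot(w, w_prime)))

# toy usage on two truncated sequences inside a small ball
w1 = np.array([0.5, 0.3, 0.1])
w2 = np.array([0.4, 0.2, 0.2])
print(taylor_kernel(w1, w2, coeffs=[1.0, 1.0, 0.5]))  # 1 + t + t^2/2
print(exponential_kernel(w1, w2))
```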
40 With regard to our first goal, namely explicit, constructive examples of universal kernels on X ⊄ Rd, this result is so far not really satisfying. [sent-127, score-0.83]
41 Theorem 2.2 Let X be a compact metric space and H be a separable Hilbert space such that there exists a continuous and injective map ρ : X → H. [sent-133, score-0.924]
42 ii) If a_n > 0 for all n ∈ N0, then the kernel k defined in (5) is a universal kernel. [sent-137, score-0.373]
43 iii) For σ > 0, the Gaussian-type RBF kernel kσ : X × X → R is a universal kernel, where kσ(x, x') := exp(−σ² ‖ρ(x) − ρ(x')‖²_H), x, x' ∈ X. [sent-138, score-0.402]
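A sketch of how part iii) can be used in practice, assuming one has an injective, continuous embedding of the inputs into a Hilbert space; the embedding `rho` below is a user-supplied placeholder standing in for ρ : X → H, and the identity embedding in the usage line is purely illustrative.

```python
import numpy as np

def gaussian_type_kernel(x, x_prime, rho, sigma=1.0):
    """k_sigma(x, x') = exp(-sigma^2 * ||rho(x) - rho(x')||^2),
    where rho plays the role of the injective, continuous map from
    the input space into a (finite-dimensional stand-in for a)
    separable Hilbert space."""
    diff = rho(x) - rho(x_prime)
    return float(np.exp(-sigma**2 * np.dot(diff, diff)))

# illustrative usage: inputs are already vectors, so rho is the identity
rho = lambda x: np.asarray(x, dtype=float)
print(gaussian_type_kernel([1.0, 2.0], [1.5, 1.0], rho, sigma=0.5))
```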
44 Indeed, [25] uses the fact that on Rd such kernels have an integral representation in terms of the Gaussian RBF kernels to show, see [25, Corollary 4. [sent-140, score-0.852]
45 9], that these kernels inherit approximation properties such as universality from the Gaussian RBF kernels. [sent-141, score-0.625]
46 Before we use this result to define explicit universal kernels, we point to a technical detail of Theorem 2. [sent-146, score-0.412]
47 To this end, let (X, dX) be an arbitrary metric space, H be a separable Hilbert space and ρ : X → H be an injective map. [sent-148, score-0.434]
48 We write V := ρ(X) and equip this space with the metric defined by the norm of H. [sent-149, score-0.275]
49 Moreover, since H is assumed to be separable, it is isometrically isomorphic to ℓ2, and hence there exists an isometric isomorphism I : H → ℓ2. [sent-153, score-0.234]
50 We write W := I(V) and equip this set with the metric defined by the norm of ℓ2. [sent-154, score-0.232]
51 Let us now assume that we have a kernel kW on W with RKHS HW and canonical feature map ΦW : W → HW. [sent-160, score-0.233]
52 Let us now assume that X is compact and that kW is one of the universal kernels considered in Theorem 2. [sent-166, score-1.037]
53 2 shows that kX is one of the universal kernels considered in Theorem 2. [sent-169, score-0.799]
54 Similarly, if the kernel kV on V is defined by kV(v, v') := kW(I(v), I(v')), then an analogous argument shows that kV is a universal kernel. [sent-172, score-0.373]
55 Note that for this construction it suffices to assume that ρ is injective, continuous and has a compact image V. [sent-174, score-0.35]
56 Then, in general, ρ is not a homeomorphism and the sets of continuous functions on X and V differ, even if we consider only the bounded continuous functions on X. [sent-181, score-0.365]
57 This difference makes it impossible to conclude the universality of kX from the universality of kV (or kW). [sent-186, score-0.436]
58 However, dρ(x, x') := ‖ρ(x) − ρ(x')‖_H defines a metric on X that generates the topology ρ⁻¹(τV) and, since ρ is isometric with respect to this new metric, we can conclude that (X, dρ) is a compact metric space. [sent-192, score-0.671]
59 2, and hence kX is universal with respect to the space C(X, dρ) of functions X → R that are continuous with respect to dρ. [sent-194, score-0.554]
60 In other words, while HX may fail to approximate every function that is continuous with respect to dX, it does approximate every function that is continuous with respect to dρ. [sent-195, score-0.224]
61 Let us now present some universal kernels of practical interest. [sent-198, score-0.799]
62 Example 1: universal kernels on the set of probability measures. [sent-202, score-0.799]
63 Let (Ω, dΩ) be a compact metric space, B(Ω) be its Borel σ-algebra, and X := M1(Ω) be the set of all Borel probability measures on Ω. [sent-203, score-0.416]
64 Moreover, (X, dX) is a compact metric space if and only if (Ω, dΩ) is a compact metric space, see [19, Thm. [sent-211, score-0.799]
65 In order to construct universal kernels on (X, dX ) with the help of Theorem 2. [sent-214, score-0.799]
66 2, we need to find separable Hilbert spaces H and injective, continuous embeddings ρ : X → H. [sent-216, score-0.403]
67 Let kΩ be a continuous kernel on Ω with RKHS HΩ and canonical feature map ΦΩ(ω) := kΩ(ω, ·), ω ∈ Ω. [sent-217, score-0.345]
68 3 Let (Ω, dΩ) be a complete separable metric space, H be a separable Banach space and Φ : Ω → H be a bounded, continuous function. [sent-234, score-0.579]
69 Note that this kernel is conceptually different from characteristic kernels on Ω. [sent-242, score-0.674]
70 Indeed, characteristic kernels live on Ω and their RKHSs consist of functions Ω → R, while the new kernel kσ lives on M1(Ω) and its RKHS consists of functions M1(Ω) → R. [sent-243, score-0.751]
71 In particular, the new kernels can be used to learn from distributions represented by histograms, densities or data, while characteristic kernels can only be used to check whether two such distributions are equal or not. [sent-246, score-0.536]
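A sketch of Example 1 for empirical measures: each sample is mapped to its kernel mean in the RKHS of a base kernel kΩ (a Gaussian kernel on Rd is an illustrative choice, not prescribed by the paper), the squared RKHS distance between the two means is computed from Gram matrices, and a Gaussian of that distance gives the kernel kσ on measures; the two samples may have different sizes.

```python
import numpy as np

def base_kernel(A, B, gamma=1.0):
    """Gaussian base kernel k_Omega on R^d (illustrative choice);
    returns the Gram matrix k_Omega(a_i, b_j)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mean_embedding_sq_dist(X, Y, gamma=1.0):
    """||mu_P - mu_Q||^2 in the RKHS of the base kernel, for empirical
    measures P, Q given by samples X (m x d) and Y (n x d)."""
    return (base_kernel(X, X, gamma).mean()
            - 2.0 * base_kernel(X, Y, gamma).mean()
            + base_kernel(Y, Y, gamma).mean())

def measure_kernel(X, Y, sigma=1.0, gamma=1.0):
    """Gaussian-type kernel on probability measures:
    k_sigma(P, Q) = exp(-sigma^2 * ||mu_P - mu_Q||^2)."""
    return float(np.exp(-sigma**2 * mean_embedding_sq_dist(X, Y, gamma)))

# samples of different sizes are fine
rng = np.random.default_rng(0)
P_sample = rng.normal(size=(30, 2))
Q_sample = rng.normal(loc=0.5, size=(50, 2))
print(measure_kernel(P_sample, Q_sample, sigma=1.0, gamma=0.5))
```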
72 Example 2: universal kernels based on Fourier transforms of probability measures. [sent-247, score-0.799]
73 Moreover, ρ : P ↦ P̂ is injective, and if a sequence (Pn) converges weakly to some P, then (P̂n) converges uniformly to P̂ on every compact subset of Rd. [sent-254, score-0.238]
74 Theorem 2.2 ensures that the following Gaussian-type kernel is universal and bounded: kσ(P, P') := exp(−σ² ‖P̂ − P̂'‖²_{L2(µ)}), P, P' ∈ M1(Ω). [sent-260, score-0.574]
75 Our new kernels make it possible to directly plug the empirical distributions into the kernel kÄŽƒ , even if these distributions do not have the same length. [sent-264, score-0.666]
76 Moreover, other techniques to convert empirical distributions into absolutely continuous distributions, such as kernel estimators derived via weighted averaging of rounded points (WARPing) and (averaged) histograms with different origins [20, 24], can be used in kσ, too. [sent-265, score-0.443]
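A sketch of Example 2 for empirical distributions on Rd: the measures are embedded via their empirical characteristic functions, the L2(µ) distance is approximated by Monte Carlo over frequencies drawn from a reference measure µ (taken here to be a standard Gaussian, an assumption of this sketch), and the kernel is a Gaussian of that distance.

```python
import numpy as np

def empirical_cf(X, T):
    """Empirical characteristic function hat{P}(t) = (1/m) sum_j exp(i <t, x_j>),
    evaluated at the frequency vectors in the rows of T."""
    return np.exp(1j * T @ X.T).mean(axis=1)

def fourier_kernel(X, Y, sigma=1.0, n_freq=500, seed=0):
    """k_sigma(P, P') = exp(-sigma^2 * ||hat{P} - hat{P'}||^2_{L2(mu)}),
    with the L2(mu) norm approximated by averaging over frequencies
    drawn from mu (here a standard Gaussian on R^d)."""
    d = X.shape[1]
    T = np.random.default_rng(seed).normal(size=(n_freq, d))
    diff = empirical_cf(X, T) - empirical_cf(Y, T)
    sq_dist = float(np.mean(np.abs(diff) ** 2))   # Monte Carlo estimate
    return float(np.exp(-sigma**2 * sq_dist))

rng = np.random.default_rng(1)
print(fourier_kernel(rng.normal(size=(40, 2)), rng.normal(loc=1.0, size=(60, 2))))
```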
77 In addition, let us assume that our input values xi ∈ X are functions taken from some compact set X ⊂ L2(µ). [sent-274, score-0.295]
78 This smoothing can often be described by a compact linear operator T : L2 ([0, 1]) → L2 ([0, 1]), e. [sent-276, score-0.238]
79 Hence, if we assume that the true signals are contained in the closed unit ball BL2([0,1]), then the observed, smoothed signals Tf are contained in a compact subset X of L2([0, 1]). [sent-279, score-0.31]
80 2, and hence the Gaussian-type kernel kσ(g, g') := exp(−σ² ‖g − g'‖²_{L2(µ)}), g, g' ∈ X, (11) defines a universal kernel on X. [sent-283, score-0.201]
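A sketch of the signal-processing kernel (11), assuming the observed signals are sampled on a common uniform grid over [0, 1] and µ is Lebesgue measure, so the L2 distance is approximated by a simple Riemann sum; the grid and signals below are illustrative.

```python
import numpy as np

def signal_kernel(g, g_prime, dt, sigma=1.0):
    """Gaussian-type kernel on discretized signals:
    k_sigma(g, g') = exp(-sigma^2 * ||g - g'||_{L2}^2), with the
    L2([0, 1]) norm approximated by a Riemann sum with step dt."""
    sq_dist = float(np.sum((g - g_prime) ** 2) * dt)
    return float(np.exp(-sigma**2 * sq_dist))

# two observed (smoothed) signals on a common grid
t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t) + 0.1 * t
print(signal_kernel(f1, f2, dt, sigma=2.0))
```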
81 3 Discussion The main goal of this paper was to provide an explicit construction of universal kernels that are defined [sent-287, score-0.838]
82 on arbitrary compact metric spaces which are not necessarily subsets of Rd. [sent-288, score-0.378]
83 There is still increasing interest in kernel methods, including support vector machines, on such input spaces, e. [sent-289, score-0.203]
84 As examples, we gave explicit universal kernels on the set of probability distributions and for signal processing. [sent-293, score-0.899]
85 One direction of further research may be to generalize our results to the case of non-compact metric spaces or to … [sent-294, score-0.243]
86 Then there exist constants c_j ∈ (0, ∞), j ∈ N0^N with |j| = n, such that for all summable sequences (b_i) ⊂ [0, ∞) we have (Σ_{i=1}^∞ b_i)^n = Σ_{j∈N0^N : |j|=n} c_j Π_{i=1}^∞ b_i^{j_i}. [sent-309, score-0.299]
87 Moreover, ℓ2 := ℓ2(N) is separable, and by using an orthonormal basis representation, it is further known that every separable Hilbert space is isometrically isomorphic to ℓ2. [sent-315, score-0.257]
88 The following result provides a method to construct Taylor kernels on closed balls in ℓ2. [sent-317, score-0.507]
89 Moreover, the map Φ : √r Bℓ2 → ℓ2(J) defined by Φ(w) := (√c_j Π_{i=1}^∞ w_i^{j_i})_{j∈J}, w ∈ √r Bℓ2, (12) is a feature map of k, where we use the convention 0^0 := 1. [sent-329, score-0.248]
90 The lemma above then shows that, for all j ∈ N0^N, there exists a constant c̃_j ∈ (0, ∞) such that k(w, w') = Σ_{j∈N0^N} a_{|j|} c̃_j Π_{i=1}^∞ (w_i w_i')^{j_i}. [sent-334, score-0.312]
91 Setting c_j := a_{|j|} c̃_j, we obtain that Φ defined in (12) is indeed a feature map of k. [sent-336, score-0.276]
92 4 Let W be a compact metric space and k be a continuous kernel on W with k(w, w) > 0 for all w ∈ W . [sent-341, score-0.705]
93 Suppose that we have an injective feature map Φ : W → ℓ2(J) of k, where J is some countable set. [sent-342, score-0.214]
94 For the multi-index j ∈ J that equals 1 at the i-th component and vanishes everywhere else we then have Φ_j(w) = √c_j w_i and Φ_j(w') = √c_j w_i', and hence Φ is injective. [sent-366, score-0.374]
95 Proof of Theorem 2.2: Since H is a separable Hilbert space, there exists an isometric isomorphism I : H → ℓ2. [sent-368, score-0.347]
96 Since ρ is continuous, V is the image of a compact set under a continuous map, and thus V is compact and the inverse of the bijective map I ∘ ρ : X → W is continuous. [sent-371, score-0.715]
97 Consequently, there is a one-to-one relationship between the continuous functions fX on X and the continuous functions fW on W, namely C(X) = C(W) ∘ I ∘ ρ, see also the discussion following (7). [sent-372, score-0.276]
98 Moreover, the fact that I : H → ℓ2 is an isometric isomorphism yields ⟨I(ρ(x)), I(ρ(x'))⟩_{ℓ2} = ⟨ρ(x), ρ(x')⟩_H for all x, x' ∈ X, and hence the kernel k considered in Theorem 2. [sent-373, score-0.298]
99 Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces. [sent-428, score-0.347]
100 On the relation between universality, characteristic kernels and RKHS embeddings of measures. [sent-622, score-0.548]
wordName wordTfidf (topN-words)
[('kernels', 0.426), ('universal', 0.373), ('hw', 0.249), ('compact', 0.238), ('universality', 0.199), ('kernel', 0.172), ('rkhs', 0.158), ('rb', 0.156), ('separable', 0.142), ('metric', 0.14), ('cj', 0.138), ('rd', 0.134), ('kw', 0.133), ('svms', 0.124), ('kx', 0.122), ('hilbert', 0.12), ('nn', 0.118), ('continuous', 0.112), ('rbf', 0.11), ('hx', 0.109), ('injective', 0.109), ('spaces', 0.103), ('fukumizu', 0.093), ('compactness', 0.083), ('taylor', 0.079), ('characteristic', 0.076), ('isometric', 0.072), ('ji', 0.069), ('bijective', 0.066), ('histograms', 0.066), ('theorem', 0.065), ('sriperumbudur', 0.062), ('denseness', 0.062), ('equip', 0.062), ('homeomorphism', 0.062), ('summable', 0.062), ('map', 0.061), ('kv', 0.059), ('hein', 0.056), ('steinwart', 0.056), ('reproducing', 0.055), ('isomorphism', 0.054), ('ep', 0.049), ('wi', 0.049), ('moreover', 0.049), ('consequently', 0.047), ('rkhss', 0.047), ('embeddings', 0.046), ('dx', 0.046), ('borel', 0.044), ('countable', 0.044), ('space', 0.043), ('nes', 0.043), ('sch', 0.042), ('bayreuth', 0.041), ('nonextensive', 0.041), ('rbrd', 0.041), ('stuttgart', 0.041), ('wj', 0.041), ('gretton', 0.041), ('balls', 0.041), ('subsets', 0.04), ('closed', 0.04), ('topology', 0.04), ('fourier', 0.039), ('explicit', 0.039), ('conclude', 0.038), ('measures', 0.038), ('lemma', 0.037), ('exists', 0.036), ('bji', 0.036), ('isometrically', 0.036), ('isomorphic', 0.036), ('colored', 0.036), ('pn', 0.035), ('distributions', 0.034), ('dense', 0.034), ('situation', 0.034), ('hush', 0.033), ('hilbertian', 0.033), ('dimensional', 0.032), ('ball', 0.032), ('examples', 0.031), ('input', 0.031), ('proposition', 0.031), ('write', 0.03), ('banach', 0.029), ('berlin', 0.029), ('exp', 0.029), ('modulo', 0.028), ('usual', 0.028), ('induction', 0.028), ('bi', 0.027), ('signal', 0.027), ('editors', 0.027), ('bounded', 0.027), ('functions', 0.026), ('inner', 0.025), ('live', 0.025), ('absolutely', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.29500949 250 nips-2010-Spectral Regularization for Support Estimation
Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo
Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
3 0.20818555 124 nips-2010-Inductive Regularized Learning of Kernel Functions
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
4 0.20731407 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself can not well describe how good a kernel is due to the negligence of the scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning linear combination of basis kernels it is also invariant to scalings of basis kernels and to the types (e.g., L1 or L2 ) of norm constraints on combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-art MKL approaches. 1
5 0.19297442 133 nips-2010-Kernel Descriptors for Visual Recognition
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
6 0.16261967 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification
7 0.16237916 280 nips-2010-Unsupervised Kernel Dimension Reduction
8 0.1368438 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
10 0.10306538 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
11 0.10258681 96 nips-2010-Fractionally Predictive Spiking Neurons
12 0.095176026 233 nips-2010-Scrambled Objects for Least-Squares Regression
13 0.091165416 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
14 0.085142002 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
15 0.08373642 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
16 0.080051899 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
17 0.076001197 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
18 0.075201951 222 nips-2010-Random Walk Approach to Regret Minimization
19 0.073658846 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
20 0.072725341 243 nips-2010-Smoothness, Low Noise and Fast Rates
topicId topicWeight
[(0, 0.195), (1, 0.066), (2, 0.109), (3, -0.025), (4, 0.244), (5, 0.104), (6, 0.323), (7, -0.045), (8, 0.078), (9, 0.079), (10, -0.029), (11, -0.038), (12, -0.159), (13, -0.029), (14, -0.032), (15, 0.025), (16, 0.052), (17, 0.02), (18, -0.017), (19, -0.06), (20, -0.042), (21, -0.041), (22, -0.019), (23, 0.003), (24, 0.014), (25, 0.021), (26, 0.042), (27, -0.025), (28, -0.124), (29, -0.1), (30, 0.061), (31, -0.029), (32, -0.059), (33, 0.014), (34, 0.019), (35, -0.007), (36, -0.095), (37, 0.056), (38, 0.029), (39, 0.003), (40, -0.009), (41, 0.053), (42, -0.041), (43, 0.018), (44, 0.05), (45, 0.017), (46, 0.036), (47, -0.007), (48, -0.095), (49, 0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.98143327 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.87308133 250 nips-2010-Spectral Regularization for Support Estimation
Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo
Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
3 0.80449325 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
4 0.78150165 124 nips-2010-Inductive Regularized Learning of Kernel Functions
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
5 0.72179317 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself can not well describe how good a kernel is due to the negligence of the scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning linear combination of basis kernels it is also invariant to scalings of basis kernels and to the types (e.g., L1 or L2 ) of norm constraints on combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-art MKL approaches. 1
6 0.68277431 133 nips-2010-Kernel Descriptors for Visual Recognition
7 0.66031188 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
8 0.65694189 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification
10 0.55424356 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
11 0.54797935 233 nips-2010-Scrambled Objects for Least-Squares Regression
12 0.50201136 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
13 0.47093496 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
14 0.45663881 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
15 0.42975184 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
16 0.41405782 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
17 0.38346395 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage
18 0.3685948 134 nips-2010-LSTD with Random Projections
19 0.36816701 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
20 0.35478371 222 nips-2010-Random Walk Approach to Regret Minimization
topicId topicWeight
[(13, 0.028), (27, 0.031), (30, 0.06), (45, 0.139), (50, 0.039), (52, 0.455), (60, 0.064), (77, 0.036), (90, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.8714326 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.86994368 231 nips-2010-Robust PCA via Outlier Pursuit
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
Author: Felipe Gerhard, Wulfram Gerstner
Abstract: Generalized Linear Models (GLMs) are an increasingly popular framework for modeling neural spike trains. They have been linked to the theory of stochastic point processes and researchers have used this relation to assess goodness-of-fit using methods from point-process theory, e.g. the time-rescaling theorem. However, high neural firing rates or coarse discretization lead to a breakdown of the assumptions necessary for this connection. Here, we show how goodness-of-fit tests from point-process theory can still be applied to GLMs by constructing equivalent surrogate point processes out of time-series observations. Furthermore, two additional tests based on thinning and complementing point processes are introduced. They augment the instruments available for checking model adequacy of point processes as well as discretized models. 1
4 0.814915 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
Author: Siwei Lyu
Abstract: Divisive normalization (DN) has been advocated as an effective nonlinear efficient coding transform for natural sensory signals with applications in biology and engineering. In this work, we aim to establish a connection between the DN transform and the statistical properties of natural sensory signals. Our analysis is based on the use of multivariate t model to capture some important statistical properties of natural sensory signals. The multivariate t model justifies DN as an approximation to the transform that completely eliminates its statistical dependency. Furthermore, using the multivariate t model and measuring statistical dependency with multi-information, we can precisely quantify the statistical dependency that is reduced by the DN transform. We compare this with the actual performance of the DN transform in reducing statistical dependencies of natural sensory signals. Our theoretical analysis and quantitative evaluations confirm DN as an effective efficient coding transform for natural sensory signals. On the other hand, we also observe a previously unreported phenomenon that DN may increase statistical dependencies when the size of pooling is small. 1
5 0.77823168 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun
Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1
6 0.54505163 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes
7 0.52458364 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
8 0.5140208 64 nips-2010-Distributionally Robust Markov Decision Processes
9 0.51186639 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
10 0.50896049 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.49913579 115 nips-2010-Identifying Dendritic Processing
12 0.49654314 96 nips-2010-Fractionally Predictive Spiking Neurons
13 0.49389318 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
14 0.49285331 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
15 0.49215323 250 nips-2010-Spectral Regularization for Support Estimation
16 0.49178904 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
17 0.49163336 117 nips-2010-Identifying graph-structured activation patterns in networks
18 0.4883768 233 nips-2010-Scrambled Objects for Least-Squares Regression
19 0.48837146 265 nips-2010-The LASSO risk: asymptotic results and real world examples
20 0.48438773 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification