nips nips2010 nips2010-279 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
Reference: text
sentIndex sentText sentNum sentScore
1 Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. [sent-11, score-1.214]
2 So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. [sent-12, score-0.426]
3 We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. [sent-13, score-1.706]
4 We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. [sent-14, score-2.424]
5 SVMs using probability measures, e.g. histograms, as input samples have been used to analyze histogram data such as colored images, see [5, 11, 14, 12, 27, 29], and also [17] for nonextensive information theoretic kernels on measures. [sent-23, score-0.534]
6 Further examples are SVMs for text classification and web mining [15, 12, 16], and SVMs with kernels from computational biology, e. [sent-25, score-0.426]
7 Interestingly, in this analysis, the kernel and its reproducing kernel Hilbert space (RKHS) make it possible to completely decouple the statistical analysis of SVMs from the input space X. [sent-32, score-0.516]
8 For example, if one uses the hinge loss and a bounded measurable kernel whose RKHS H is separable and dense in L1(µ) for all distributions µ on X, then [31, Theorem 7. [sent-33, score-0.409]
9 In other words, independently of the input space X, the universal consistency of SVMs is well-understood modulo an approximation theoretical question, namely that of the denseness of H in all L1(µ). [sent-37, score-0.537]
10 For example, for compact X ⊂ Rd , [30] showed that, among a few others, the RKHSs of the Gaussian RBF kernels are universal, that is, they are dense in the space C(X) of continuous functions f : X → R. [sent-39, score-0.879]
11 14], it is then easy to conclude that these RKHSs are also dense in all L1(µ) for which µ has compact support. [sent-43, score-0.31]
12 This key result has been extended in a couple of different directions: For example, [18] establishes universality for more classes of kernels on compact X ⊂ Rd, whereas [32] shows the denseness of the Gaussian RKHSs in L1(µ) for all distributions µ on Rd. [sent-44, score-0.959]
13 Finally, [7, 8, 28, 29] show that universal kernels are closely related to so-called characteristic kernels that can be used to distinguish distributions. [sent-45, score-1.301]
14 Further work gives sufficient or necessary conditions for universality of kernels on arbitrary compact metric spaces X, and [32] further shows that the compact metric spaces are exactly the compact topological spaces on which there exist universal kernels. [sent-47, score-2.301]
15 However, neither these sufficient conditions for universality nor the proof of the existence of universal kernels can be used to construct universal kernels on compact metric spaces X ⊄ Rd. [sent-49, score-2.278]
16 In fact, to the best of our knowledge, no explicit example of such kernels has so far been presented. [sent-50, score-0.465]
17 In this paper we present the first explicit and constructive examples of universal kernels that live on compact metric spaces X ⊄ Rd. [sent-53, score-1.375]
18 To this end, we extend the definition of the Gaussian RBF kernels, or more generally, kernels that can be expressed by a Taylor series, from the Euclidean Rd to its infinite-dimensional counterpart ℓ2. [sent-56, score-0.426]
19 Indeed, the closed balls of ℓ2 are no longer (norm-)compact subsets of ℓ2 and hence we cannot expect universality on these balls. [sent-60, score-0.32]
20 To address this issue, one may be tempted to use the weak∗-topology on ℓ2, since in this topology the closed balls are both compact and metrizable, and thus universal kernels do exist on them. [sent-61, score-1.158]
21 However, the Taylor kernels are not among them because, basically, the inner product ⟨·,·⟩ of ℓ2 fails to be continuous with respect to the weak∗-topology, as the sequence of standard orthonormal basis vectors shows. [sent-62, score-0.563]
22 Since the inner product of ℓ2 is continuous with respect to the norm by virtue of the Cauchy-Schwarz inequality, it turns out that the Taylor kernels are continuous with respect to the norm topology. [sent-64, score-0.675]
23 Moreover, we will see that in this situation the Stone-Weierstraß argument of [30] yields a variety of universal kernels, including infinite-dimensional analogues of the Gaussian RBF kernels. [sent-65, score-0.833]
24 Unlike the finite-dimensional Euclidean spaces Rd and their compact subsets, however, the compact subsets of ℓ2 can hardly be viewed as natural examples of input spaces X. [sent-68, score-0.816]
25 Therefore, we go one step further by considering compact metric spaces X for which there exist a separable Hilbert space H and an injective and continuous map ρ : X → H. [sent-69, score-0.948]
26 Composing a Taylor kernel with such a map ρ then defines a universal kernel on X, and the same is true for the analogous definition of Gaussian-type RBF kernels. [sent-73, score-0.588]
27 Indeed, we will use this general result to present examples of Gaussian kernels defined on non-standard input spaces such as sets of probability measures. [sent-78, score-0.457]
28 Section 2 contains the main results and constructs examples of universal kernels based on our technique. [sent-81, score-0.83]
29 In particular, we show how to construct universal kernels on sets of probability measures and on sets of functions, the latter being interesting for signal processing. [sent-82, score-0.864]
30 Equivalently, k is a kernel if and only if there exist a Hilbert space H̃ and a map Φ̃ : X → H̃ such that k(x, x') = ⟨Φ̃(x), Φ̃(x')⟩_H̃ for all x, x' ∈ X. [sent-89, score-0.312]
31 Moreover, for a compact metric space (X, d), we write C(X) := {f : X → R | f continuous} for the space of continuous functions on X and equip this space with the usual supremum norm ‖·‖∞. [sent-95, score-0.765]
32 A kernel k on X is called universal if k is continuous and its RKHS H is dense in C(X). [sent-96, score-0.318]
33 The kernels we consider in this paper are constructed from functions K : [−r, r] → R that can be expressed by their Taylor series, that is, K(t) = Σ_{n=0}^∞ a_n t^n, t ∈ [−r, r]. [sent-99, score-0.452]
34 Such a function K defines a kernel K(⟨x, x'⟩) on the closed ball √r BRd := {x ∈ Rd : ‖x‖₂ ≤ √r} of radius √r, whenever all Taylor coefficients a_n are non-negative. [sent-102, score-0.287]
35 57] showed that Taylor kernels are universal if a_n > 0 for all n ≥ 0, while [21] notes that strict positivity on certain subsets of indices n suffices. [sent-106, score-0.466]
36 To go beyond Rd, we consider its infinite-dimensional and separable counterpart ℓ2 := {(w_j)_{j≥1} : ‖(w_j)‖²_{ℓ2} := Σ_{j≥1} w_j² < ∞}. [sent-110, score-0.215]
37 Our first main result shows that this extension leads to a kernel whose restrictions to compact subsets are universal if a_n > 0 for all n ∈ N0 := N ∪ {0}. [sent-113, score-0.278]
38 ii) If a_n > 0 for all n ∈ N0, then the restriction k|_{W×W} : W × W → R of the kernel k from (3) to an arbitrary compact set W ⊂ √r Bℓ2 is universal. [sent-117, score-0.238]
39 For a_n := 1/n!, the assumptions of Theorem 2.1 hold for all r > 0, and hence the resulting exponential kernel k(w, w') = exp(⟨w, w'⟩) is universal on every compact subset W of ℓ2. [sent-122, score-0.783]
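To make this concrete, here is a minimal numerical sketch (not taken from the paper) of evaluating a Taylor-type kernel on finite-dimensional truncations of ℓ2 sequences; the coefficient choice a_n = 1/n!, i.e. the exponential kernel, and all names below are illustrative assumptions.

```python
import numpy as np

def taylor_kernel(w, w_prime, coeffs):
    """k(w, w') = sum_n a_n <w, w'>^n for a finite list of Taylor
    coefficients a_0, ..., a_N; strict positivity of the a_n is the
    condition behind the universality statement above."""
    t = float(np.dot(w, w_prime))                 # inner product of truncated sequences
    return sum(a * t**n for n, a in enumerate(coeffs))

def exponential_kernel(w, w_prime):
    """Special case a_n = 1/n!, i.e. k(w, w') = exp(<w, w'>)."""
    return float(np.exp(np.dot(w, w_prime)))

# toy usage on two truncated sequences inside a small ball
w1 = np.array([0.5, 0.3, 0.1])
w2 = np.array([0.4, 0.2, 0.2])
print(taylor_kernel(w1, w2, coeffs=[1.0, 1.0, 0.5]))  # 1 + t + t^2/2
print(exponential_kernel(w1, w2))
```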
40 With regard to our first goal, namely explicit, constructive examples of universal kernels on X ⊄ Rd, this result is so far not really satisfying. [sent-127, score-0.83]
41 Theorem 2.2 Let X be a compact metric space and H be a separable Hilbert space such that there exists a continuous and injective map ρ : X → H. [sent-133, score-0.924]
42 ii) If a_n > 0 for all n ∈ N0, then the kernel k defined in (5) is a universal kernel. [sent-137, score-0.373]
43 iii) For σ > 0, the Gaussian-type RBF kernel kσ : X × X → R is a universal kernel, where kσ(x, x') := exp(−σ² ‖ρ(x) − ρ(x')‖²_H), x, x' ∈ X. [sent-138, score-0.402]
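A sketch of how part iii) can be used in practice, assuming one has an injective, continuous embedding of the inputs into a Hilbert space; the embedding `rho` below is a user-supplied placeholder standing in for ρ : X → H, and the identity embedding in the usage line is purely illustrative.

```python
import numpy as np

def gaussian_type_kernel(x, x_prime, rho, sigma=1.0):
    """k_sigma(x, x') = exp(-sigma^2 * ||rho(x) - rho(x')||^2),
    where rho plays the role of the injective, continuous map from
    the input space into a (finite-dimensional stand-in for a)
    separable Hilbert space."""
    diff = rho(x) - rho(x_prime)
    return float(np.exp(-sigma**2 * np.dot(diff, diff)))

# illustrative usage: inputs are already vectors, so rho is the identity
rho = lambda x: np.asarray(x, dtype=float)
print(gaussian_type_kernel([1.0, 2.0], [1.5, 1.0], rho, sigma=0.5))
```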
44 Indeed, [25] uses the fact that on Rd such kernels have an integral representation in terms of the Gaussian RBF kernels to show, see [25, Corollary 4. [sent-140, score-0.852]
45 9], that these kernels inherit approximation properties such as universality from the Gaussian RBF kernels. [sent-141, score-0.625]
46 Before we use this result to define explicit universal kernels, we point to a technical detail of Theorem 2. [sent-146, score-0.412]
47 To this end, let (X, dX) be an arbitrary metric space, H be a separable Hilbert space and ρ : X → H be an injective map. [sent-148, score-0.434]
48 We write V := ρ(X) and equip this space with the metric defined by the norm of H. [sent-149, score-0.275]
49 Moreover, since H is assumed to be separable, it is isometrically isomorphic to ℓ2, and hence there exists an isometric isomorphism I : H → ℓ2. [sent-153, score-0.234]
50 We write W := I(V) and equip this set with the metric defined by the norm of ℓ2. [sent-154, score-0.232]
51 Let us now assume that we have a kernel kW on W with RKHS HW and canonical feature map ΦW : W → HW. [sent-160, score-0.233]
52 Let us now assume that X is compact and that kW is one of the universal kernels considered in Theorem 2. [sent-166, score-1.037]
53 2 shows that kX is one of the universal kernels considered in Theorem 2. [sent-169, score-0.799]
54 Similarly, if the kernel kV on V is defined by kV(v, v') := kW(I(v), I(v')), then an analogous argument shows that kV is a universal kernel. [sent-172, score-0.373]
55 Note that for this construction it suffices to assume that ρ is injective, continuous and has a compact image V. [sent-174, score-0.35]
56 Then, in general, ρ is not a homeomorphism and the sets of continuous functions on X and V differ, even if we consider only the bounded continuous functions on X. [sent-181, score-0.365]
57 This difference makes it impossible to conclude the universality of kX from the universality of kV (or kW). [sent-186, score-0.436]
58 However, dρ(x, x') := ‖ρ(x) − ρ(x')‖_H defines a metric on X that generates the topology ρ⁻¹(τV) and, since ρ is isometric with respect to this new metric, we can conclude that (X, dρ) is a compact metric space. [sent-192, score-0.671]
59 2, and hence kX is universal with respect to the space C(X, dρ) of functions X → R that are continuous with respect to dρ. [sent-194, score-0.554]
60 In other words, while HX may fail to approximate every function that is continuous with respect to dX, it does approximate every function that is continuous with respect to dρ. [sent-195, score-0.224]
61 Let us now present some universal kernels of practical interest. [sent-198, score-0.799]
62 Example 1: universal kernels on the set of probability measures. [sent-202, score-0.799]
63 Let (Ω, dΩ) be a compact metric space, B(Ω) be its Borel σ-algebra, and X := M1(Ω) be the set of all Borel probability measures on Ω. [sent-203, score-0.416]
64 Moreover, (X, dX) is a compact metric space if and only if (Ω, dΩ) is a compact metric space, see [19, Thm. [sent-211, score-0.799]
65 In order to construct universal kernels on (X, dX ) with the help of Theorem 2. [sent-214, score-0.799]
66 2, we need to find separable Hilbert spaces H and injective, continuous embeddings ρ : X → H. [sent-216, score-0.403]
67 Let kΩ be a continuous kernel on Ω with RKHS HΩ and canonical feature map ΦΩ(ω) := kΩ(ω, ·), ω ∈ Ω. [sent-217, score-0.345]
68 3 Let (Ω, dΩ) be a complete separable metric space, H be a separable Banach space and Φ : Ω → H be a bounded, continuous function. [sent-234, score-0.579]
69 Note that this kernel is conceptually different from characteristic kernels on Ω. [sent-242, score-0.674]
70 Indeed, characteristic kernels live on Ω and their RKHSs consist of functions Ω → R, while the new kernel kσ lives on M1(Ω) and its RKHS consists of functions M1(Ω) → R. [sent-243, score-0.751]
71 In particular, the new kernels can be used to learn from distributions represented by histograms, densities or data, while characteristic kernels can only be used to check whether two such distributions are equal or not. [sent-246, score-0.536]
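A sketch of Example 1 for empirical measures: each sample is mapped to its kernel mean in the RKHS of a base kernel kΩ (a Gaussian kernel on Rd is an illustrative choice, not prescribed by the paper), the squared RKHS distance between the two means is computed from Gram matrices, and a Gaussian of that distance gives the kernel kσ on measures; the two samples may have different sizes.

```python
import numpy as np

def base_kernel(A, B, gamma=1.0):
    """Gaussian base kernel k_Omega on R^d (illustrative choice);
    returns the Gram matrix k_Omega(a_i, b_j)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mean_embedding_sq_dist(X, Y, gamma=1.0):
    """||mu_P - mu_Q||^2 in the RKHS of the base kernel, for empirical
    measures P, Q given by samples X (m x d) and Y (n x d)."""
    return (base_kernel(X, X, gamma).mean()
            - 2.0 * base_kernel(X, Y, gamma).mean()
            + base_kernel(Y, Y, gamma).mean())

def measure_kernel(X, Y, sigma=1.0, gamma=1.0):
    """Gaussian-type kernel on probability measures:
    k_sigma(P, Q) = exp(-sigma^2 * ||mu_P - mu_Q||^2)."""
    return float(np.exp(-sigma**2 * mean_embedding_sq_dist(X, Y, gamma)))

# samples of different sizes are fine
rng = np.random.default_rng(0)
P_sample = rng.normal(size=(30, 2))
Q_sample = rng.normal(loc=0.5, size=(50, 2))
print(measure_kernel(P_sample, Q_sample, sigma=1.0, gamma=0.5))
```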
72 Example 2: universal kernels based on Fourier transforms of probability measures. [sent-247, score-0.799]
73 Moreover, ρ : P ↦ P̂ is injective, and if a sequence (Pn) converges weakly to some P, then (P̂n) converges uniformly to P̂ on every compact subset of Rd. [sent-254, score-0.238]
74 Theorem 2.2 ensures that the following Gaussian-type kernel is universal and bounded: kσ(P, P') := exp(−σ² ‖P̂ − P̂'‖²_{L2(µ)}), P, P' ∈ M1(Ω). [sent-260, score-0.574]
75 Our new kernels make it possible to directly plug the empirical distributions into the kernel kÄŽƒ , even if these distributions do not have the same length. [sent-264, score-0.666]
76 Moreover, other techniques to convert empirical distributions into absolutely continuous distributions, such as kernel estimators derived via weighted averaging of rounded points (WARPing) and (averaged) histograms with different origins [20, 24], can be used in kσ, too. [sent-265, score-0.443]
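A sketch of Example 2 for empirical distributions on Rd: the measures are embedded via their empirical characteristic functions, the L2(µ) distance is approximated by Monte Carlo over frequencies drawn from a reference measure µ (taken here to be a standard Gaussian, an assumption of this sketch), and the kernel is a Gaussian of that distance.

```python
import numpy as np

def empirical_cf(X, T):
    """Empirical characteristic function hat{P}(t) = (1/m) sum_j exp(i <t, x_j>),
    evaluated at the frequency vectors in the rows of T."""
    return np.exp(1j * T @ X.T).mean(axis=1)

def fourier_kernel(X, Y, sigma=1.0, n_freq=500, seed=0):
    """k_sigma(P, P') = exp(-sigma^2 * ||hat{P} - hat{P'}||^2_{L2(mu)}),
    with the L2(mu) norm approximated by averaging over frequencies
    drawn from mu (here a standard Gaussian on R^d)."""
    d = X.shape[1]
    T = np.random.default_rng(seed).normal(size=(n_freq, d))
    diff = empirical_cf(X, T) - empirical_cf(Y, T)
    sq_dist = float(np.mean(np.abs(diff) ** 2))   # Monte Carlo estimate
    return float(np.exp(-sigma**2 * sq_dist))

rng = np.random.default_rng(1)
print(fourier_kernel(rng.normal(size=(40, 2)), rng.normal(loc=1.0, size=(60, 2))))
```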
77 In addition, let us assume that our input values xi ∈ X are functions taken from some compact set X ⊂ L2(µ). [sent-274, score-0.295]
78 This smoothing can often be described by a compact linear operator T : L2 ([0, 1]) → L2 ([0, 1]), e. [sent-276, score-0.238]
79 Hence, if we assume that the true signals are contained in the closed unit ball BL2([0,1]), then the observed, smoothed signals Tf are contained in a compact subset X of L2([0, 1]). [sent-279, score-0.31]
80 2, and hence the Gaussian-type kernel kσ(g, g') := exp(−σ² ‖g − g'‖²_{L2(µ)}), g, g' ∈ X, (11) defines a universal kernel on X. [sent-283, score-0.201]
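A sketch of the signal-processing kernel (11), assuming the observed signals are sampled on a common uniform grid over [0, 1] and µ is Lebesgue measure, so the L2 distance is approximated by a simple Riemann sum; the grid and signals below are illustrative.

```python
import numpy as np

def signal_kernel(g, g_prime, dt, sigma=1.0):
    """Gaussian-type kernel on discretized signals:
    k_sigma(g, g') = exp(-sigma^2 * ||g - g'||_{L2}^2), with the
    L2([0, 1]) norm approximated by a Riemann sum with step dt."""
    sq_dist = float(np.sum((g - g_prime) ** 2) * dt)
    return float(np.exp(-sigma**2 * sq_dist))

# two observed (smoothed) signals on a common grid
t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t) + 0.1 * t
print(signal_kernel(f1, f2, dt, sigma=2.0))
```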
81 3 Discussion The main goal of this paper was to provide an explicit construction of universal kernels that are defined [sent-287, score-0.838]
82 on arbitrary compact metric spaces which are not necessarily subsets of Rd. [sent-288, score-0.378]
83 There is still increasing interest in kernel methods, including support vector machines, on such input spaces, e. [sent-289, score-0.203]
84 As examples, we gave explicit universal kernels on the set of probability distributions and for signal processing. [sent-293, score-0.899]
85 One direction of further research may be to generalize our results to the case of non-compact metric spaces or to … [sent-294, score-0.243]
86 Then there exist constants c_j ∈ (0, ∞), j ∈ N0^N with |j| = n, such that for all summable sequences (b_i) ⊂ [0, ∞) we have (Σ_{i=1}^∞ b_i)^n = Σ_{j∈N0^N : |j|=n} c_j Π_{i=1}^∞ b_i^{j_i}. [sent-309, score-0.299]
87 Moreover, ℓ2 := ℓ2(N) is separable, and by using an orthonormal basis representation, it is further known that every separable Hilbert space is isometrically isomorphic to ℓ2. [sent-315, score-0.257]
88 The following result provides a method to construct Taylor kernels on closed balls in ℓ2. [sent-317, score-0.507]
89 Moreover, the map Φ : √r Bℓ2 → ℓ2(J) defined by Φ(w) := (√c_j Π_{i=1}^∞ w_i^{j_i})_{j∈J}, w ∈ √r Bℓ2, (12) is a feature map of k, where we use the convention 0^0 := 1. [sent-329, score-0.248]
90 The lemma above then shows that, for all j ∈ N0^N, there exists a constant c̃_j ∈ (0, ∞) such that k(w, w') = Σ_{j∈N0^N} a_{|j|} c̃_j Π_{i=1}^∞ (w_i w_i')^{j_i}. [sent-334, score-0.312]
91 Setting c_j := a_{|j|} c̃_j, we obtain that Φ defined in (12) is indeed a feature map of k. [sent-336, score-0.276]
92 4 Let W be a compact metric space and k be a continuous kernel on W with k(w, w) > 0 for all w ∈ W . [sent-341, score-0.705]
93 Suppose that we have an injective feature map Φ : W → ℓ2(J) of k, where J is some countable set. [sent-342, score-0.214]
94 For the multi-index j ∈ J that equals 1 at the i-th component and vanishes everywhere else we then have Φ_j(w) = √c_j w_i and Φ_j(w') = √c_j w_i', and hence Φ is injective. [sent-366, score-0.374]
95 Proof of Theorem 2.2: Since H is a separable Hilbert space, there exists an isometric isomorphism I : H → ℓ2. [sent-368, score-0.347]
96 Since ρ is continuous, V is the image of a compact set under a continuous map, and thus V is compact and the inverse of the bijective map I ∘ ρ : X → W is continuous. [sent-371, score-0.715]
97 Consequently, there is a one-to-one relationship between the continuous functions fX on X and the continuous functions fW on W, namely C(X) = C(W) ∘ I ∘ ρ, see also the discussion following (7). [sent-372, score-0.276]
98 Moreover, the fact that I : H → ℓ2 is an isometric isomorphism yields ⟨I(ρ(x)), I(ρ(x'))⟩_{ℓ2} = ⟨ρ(x), ρ(x')⟩_H for all x, x' ∈ X, and hence the kernel k considered in Theorem 2. [sent-373, score-0.298]
99 Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces. [sent-428, score-0.347]
100 On the relation between universality, characteristic kernels and RKHS embeddings of measures. [sent-622, score-0.548]
wordName wordTfidf (topN-words)
[('kernels', 0.426), ('universal', 0.373), ('hw', 0.249), ('compact', 0.238), ('universality', 0.199), ('kernel', 0.172), ('rkhs', 0.158), ('rb', 0.156), ('separable', 0.142), ('metric', 0.14), ('cj', 0.138), ('rd', 0.134), ('kw', 0.133), ('svms', 0.124), ('kx', 0.122), ('hilbert', 0.12), ('nn', 0.118), ('continuous', 0.112), ('rbf', 0.11), ('hx', 0.109), ('injective', 0.109), ('spaces', 0.103), ('fukumizu', 0.093), ('compactness', 0.083), ('taylor', 0.079), ('characteristic', 0.076), ('isometric', 0.072), ('ji', 0.069), ('bijective', 0.066), ('histograms', 0.066), ('theorem', 0.065), ('sriperumbudur', 0.062), ('denseness', 0.062), ('equip', 0.062), ('homeomorphism', 0.062), ('summable', 0.062), ('map', 0.061), ('kv', 0.059), ('hein', 0.056), ('steinwart', 0.056), ('reproducing', 0.055), ('isomorphism', 0.054), ('ep', 0.049), ('wi', 0.049), ('moreover', 0.049), ('consequently', 0.047), ('rkhss', 0.047), ('embeddings', 0.046), ('dx', 0.046), ('borel', 0.044), ('countable', 0.044), ('space', 0.043), ('nes', 0.043), ('sch', 0.042), ('bayreuth', 0.041), ('nonextensive', 0.041), ('rbrd', 0.041), ('stuttgart', 0.041), ('wj', 0.041), ('gretton', 0.041), ('balls', 0.041), ('subsets', 0.04), ('closed', 0.04), ('topology', 0.04), ('fourier', 0.039), ('explicit', 0.039), ('conclude', 0.038), ('measures', 0.038), ('lemma', 0.037), ('exists', 0.036), ('bji', 0.036), ('isometrically', 0.036), ('isomorphic', 0.036), ('colored', 0.036), ('pn', 0.035), ('distributions', 0.034), ('dense', 0.034), ('situation', 0.034), ('hush', 0.033), ('hilbertian', 0.033), ('dimensional', 0.032), ('ball', 0.032), ('examples', 0.031), ('input', 0.031), ('proposition', 0.031), ('write', 0.03), ('banach', 0.029), ('berlin', 0.029), ('exp', 0.029), ('modulo', 0.028), ('usual', 0.028), ('induction', 0.028), ('bi', 0.027), ('signal', 0.027), ('editors', 0.027), ('bounded', 0.027), ('functions', 0.026), ('inner', 0.025), ('live', 0.025), ('absolutely', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.29500949 250 nips-2010-Spectral Regularization for Support Estimation
Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo
Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
3 0.20818555 124 nips-2010-Inductive Regularized Learning of Kernel Functions
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
4 0.20731407 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself can not well describe how good a kernel is due to the negligence of the scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning linear combination of basis kernels it is also invariant to scalings of basis kernels and to the types (e.g., L1 or L2 ) of norm constraints on combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-art MKL approaches. 1
5 0.19297442 133 nips-2010-Kernel Descriptors for Visual Recognition
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
6 0.16261967 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification
7 0.16237916 280 nips-2010-Unsupervised Kernel Dimension Reduction
8 0.1368438 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
10 0.10306538 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
11 0.10258681 96 nips-2010-Fractionally Predictive Spiking Neurons
12 0.095176026 233 nips-2010-Scrambled Objects for Least-Squares Regression
13 0.091165416 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
14 0.085142002 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
15 0.08373642 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
16 0.080051899 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
17 0.076001197 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
18 0.075201951 222 nips-2010-Random Walk Approach to Regret Minimization
19 0.073658846 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
20 0.072725341 243 nips-2010-Smoothness, Low Noise and Fast Rates
topicId topicWeight
[(0, 0.195), (1, 0.066), (2, 0.109), (3, -0.025), (4, 0.244), (5, 0.104), (6, 0.323), (7, -0.045), (8, 0.078), (9, 0.079), (10, -0.029), (11, -0.038), (12, -0.159), (13, -0.029), (14, -0.032), (15, 0.025), (16, 0.052), (17, 0.02), (18, -0.017), (19, -0.06), (20, -0.042), (21, -0.041), (22, -0.019), (23, 0.003), (24, 0.014), (25, 0.021), (26, 0.042), (27, -0.025), (28, -0.124), (29, -0.1), (30, 0.061), (31, -0.029), (32, -0.059), (33, 0.014), (34, 0.019), (35, -0.007), (36, -0.095), (37, 0.056), (38, 0.029), (39, 0.003), (40, -0.009), (41, 0.053), (42, -0.041), (43, 0.018), (44, 0.05), (45, 0.017), (46, 0.036), (47, -0.007), (48, -0.095), (49, 0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.98143327 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.87308133 250 nips-2010-Spectral Regularization for Support Estimation
Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo
Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.
3 0.80449325 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
4 0.78150165 124 nips-2010-Inductive Regularized Learning of Kernel Functions
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
5 0.72179317 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself can not well describe how good a kernel is due to the negligence of the scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning linear combination of basis kernels it is also invariant to scalings of basis kernels and to the types (e.g., L1 or L2 ) of norm constraints on combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-art MKL approaches. 1
6 0.68277431 133 nips-2010-Kernel Descriptors for Visual Recognition
7 0.66031188 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
8 0.65694189 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification
10 0.55424356 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
11 0.54797935 233 nips-2010-Scrambled Objects for Least-Squares Regression
12 0.50201136 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
13 0.47093496 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
14 0.45663881 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
15 0.42975184 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
16 0.41405782 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
17 0.38346395 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage
18 0.3685948 134 nips-2010-LSTD with Random Projections
19 0.36816701 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
20 0.35478371 222 nips-2010-Random Walk Approach to Regret Minimization
topicId topicWeight
[(13, 0.028), (27, 0.031), (30, 0.06), (45, 0.139), (50, 0.039), (52, 0.455), (60, 0.064), (77, 0.036), (90, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.8714326 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions if X ⊄ Rd. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of Rd. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
2 0.86994368 231 nips-2010-Robust PCA via Outlier Pursuit
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
Author: Felipe Gerhard, Wulfram Gerstner
Abstract: Generalized Linear Models (GLMs) are an increasingly popular framework for modeling neural spike trains. They have been linked to the theory of stochastic point processes and researchers have used this relation to assess goodness-of-fit using methods from point-process theory, e.g. the time-rescaling theorem. However, high neural firing rates or coarse discretization lead to a breakdown of the assumptions necessary for this connection. Here, we show how goodness-of-fit tests from point-process theory can still be applied to GLMs by constructing equivalent surrogate point processes out of time-series observations. Furthermore, two additional tests based on thinning and complementing point processes are introduced. They augment the instruments available for checking model adequacy of point processes as well as discretized models. 1
4 0.814915 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
Author: Siwei Lyu
Abstract: Divisive normalization (DN) has been advocated as an effective nonlinear efficient coding transform for natural sensory signals with applications in biology and engineering. In this work, we aim to establish a connection between the DN transform and the statistical properties of natural sensory signals. Our analysis is based on the use of multivariate t model to capture some important statistical properties of natural sensory signals. The multivariate t model justifies DN as an approximation to the transform that completely eliminates its statistical dependency. Furthermore, using the multivariate t model and measuring statistical dependency with multi-information, we can precisely quantify the statistical dependency that is reduced by the DN transform. We compare this with the actual performance of the DN transform in reducing statistical dependencies of natural sensory signals. Our theoretical analysis and quantitative evaluations confirm DN as an effective efficient coding transform for natural sensory signals. On the other hand, we also observe a previously unreported phenomenon that DN may increase statistical dependencies when the size of pooling is small. 1
5 0.77823168 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun
Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1
6 0.54505163 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes
7 0.52458364 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
8 0.5140208 64 nips-2010-Distributionally Robust Markov Decision Processes
9 0.51186639 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
10 0.50896049 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.49913579 115 nips-2010-Identifying Dendritic Processing
12 0.49654314 96 nips-2010-Fractionally Predictive Spiking Neurons
13 0.49389318 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
14 0.49285331 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
15 0.49215323 250 nips-2010-Spectral Regularization for Support Estimation
16 0.49178904 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
17 0.49163336 117 nips-2010-Identifying graph-structured activation patterns in networks
18 0.4883768 233 nips-2010-Scrambled Objects for Least-Squares Regression
19 0.48837146 265 nips-2010-The LASSO risk: asymptotic results and real world examples
20 0.48438773 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification