nips nips2010 nips2010-250 knowledge-graph by maker-knowledge-mining

250 nips-2010-Spectral Regularization for Support Estimation


Source: pdf

Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo

Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow one to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state-of-the-art machine learning algorithms for density support estimation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). [sent-10, score-0.168]

2 We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. [sent-11, score-0.867]

3 Completely regular kernels allow one to capture the relevant geometric and topological properties of an arbitrary probability space. [sent-12, score-0.232]

4 In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. [sent-13, score-0.448]

5 Numerical experiments show that spectral estimators compare favorably to state-of-the-art machine learning algorithms for density support estimation. [sent-14, score-0.387]

6 From a theoretical point of view the problem has usually been considered in the setting where the probability distribution has a density with respect to a known measure (for example the Lebesgue measure in Rd or the volume measure on a manifold). [sent-19, score-0.192]

7 Another kernel method, related to the one we discuss in this paper, is presented in [11]. [sent-22, score-0.245]

8 In this paper we consider a general scenario (see [18]) where the underlying model is a probability space (X, ρ) and we are given a (similarity) function K which is a reproducing kernel. [sent-27, score-0.322]

9 The geometry (and topology) in (X, ρ) is defined by the kernel K. [sent-34, score-0.245]

10 While this framework is abstract and poses new challenges, by assuming the similarity function to be a reproducing kernel we can make full use of the good computational properties of kernel methods and the powerful theory of reproducing kernel Hilbert spaces (RKHS) [2]. [sent-35, score-1.358]

11 Interestingly, the idea of using a reproducing kernel K to construct a metric on a set X is originally due to Schoenberg (see for example [4]). [sent-36, score-0.619]

12 If X is Rd (in fact any topological space), a natural candidate to define the region of interest is the notion of the support of a probability distribution, defined as the intersection of the closed subsets C of X such that ρ(C) = 1. [sent-42, score-0.334]

13 In an arbitrary probability space the support of the measure is not well defined since no topology is given. [sent-43, score-0.202]

14 The reproducing kernel K provides a way to solve this problem and also suggests a possible approach to model Xρ . [sent-44, score-0.529]

15 The first idea is to use the fact that under mild assumptions the kernel defines a metric on X [18], so that the concept of closed set, hence that of support, is well defined. [sent-45, score-0.47]

16 The second idea is to use the kernel to construct a function Fρ such that the level set corresponding to one is exactly the support Xρ; in this case we say that the RKHS associated with K separates the support Xρ. [sent-46, score-0.507]

17 By doing this we are in fact imposing an assumption on Xρ : given a kernel K, we can only separate certain sets. [sent-47, score-0.245]

18 • We prove that Fρ is uniquely defined by the null space of the integral operator associated to K. [sent-49, score-0.261]

19 Given that the integral operator (and its spectral properties) can be approximated by studying the kernel matrix on a sample, this result suggests a way to estimate the support empirically. [sent-50, score-0.714]

20 In this paper, we consider a spectral estimator based on a spectral regularization strategy, replacing the kernel matrix with its regularized version (Tikhonov regularization being one example). [sent-53, score-0.963]

21 • We introduce the notion of completely regular RKHS, which answers in the affirmative the question of whether there exist kernels that can separate the support of any distribution. [sent-54, score-0.448]

22 Examples of completely regular kernels are presented and results suggesting how they can be constructed are given. [sent-55, score-0.302]

23 The concept of completely regular RKHS plays a role similar to the concept of universal kernels in supervised learning, for example see [19]. [sent-56, score-0.407]

24 Finally, given the above results, we show that the regularized spectral estimator enjoys a universal consistency property: the correct support can be asymptotically recovered for any problem (that is any probability distribution). [sent-57, score-0.524]

25 In Section 2 we introduce the notion of completely regular kernels and their basic properties. [sent-59, score-0.348]

26 2 Completely regular reproducing kernel Hilbert spaces. In this section we introduce the notion of a completely regular reproducing kernel Hilbert space. [sent-63, score-1.555]

27 Such a space defines a geometry on a measurable space X which is compatible with the measurable structure. [sent-64, score-0.242]

28 The function is determined by the spectral projection associated with the null eigenvalue of the integral operator defined by the reproducing kernel. [sent-66, score-0.716]

29 We fix a complex reproducing kernel Hilbert space H on X with reproducing kernel K : X × X → C [2]. [sent-69, score-1.096]

30 For each function f ∈ H, the reproducing property f(x) = ⟨f, Kx⟩ holds for all x ∈ X. [sent-72, score-0.284]

31 When different reproducing kernel Hilbert spaces are considered, we denote by HK the reproducing kernel Hilbert space with reproducing kernel K. [sent-73, score-1.68]

32 Before giving the definition of completely regular RKHS, which is the key concept presented in this section, we need some preliminary definitions and results. [sent-74, score-0.289]

33 For example, if X = Rd and H is the reproducing kernel Hilbert space with the linear kernel K(x, t) = x · t, the sets separated by H are precisely the hyperplanes containing the origin. [sent-77, score-0.911]

34 Then (X, dK ) is a metric space and the sets separated by H are always dK closed, see Prop. [sent-90, score-0.197]

35 The next result shows that assuming the kernel to be measurable is enough to solve this problem. [sent-94, score-0.328]

36 Assume that Kx ≠ Kt if x ≠ t; then the sets separated by H are closed with respect to dK . [sent-96, score-0.207]
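
As a concrete illustration, the metric dK is presumably the distance induced by the canonical feature map, dK(x, t) = ∥Kx − Kt∥_H, which can be evaluated from kernel values alone since ∥Kx − Kt∥²_H = K(x, x) − 2 Re K(x, t) + K(t, t). A minimal sketch, using a Gaussian kernel only as an illustrative choice (not taken from the paper):

```python
import numpy as np

def gaussian_kernel(x, t, sigma=1.0):
    # Illustrative bounded kernel; any reproducing kernel with the stated
    # injectivity property (Kx != Kt whenever x != t) could be plugged in.
    return np.exp(-np.sum((x - t) ** 2) / (2.0 * sigma ** 2))

def d_K(x, t, kernel=gaussian_kernel):
    # d_K(x, t) = ||K_x - K_t||_H = sqrt(K(x, x) - 2 K(x, t) + K(t, t)).
    sq = kernel(x, x) - 2.0 * kernel(x, t) + kernel(t, t)
    return np.sqrt(max(sq, 0.0))   # guard against round-off below zero

print(d_K(np.zeros(2), np.array([1.0, 2.0])))
```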

37 Moreover, if H is separable and the kernel is measurable, then the sets separated by H are measurable. [sent-97, score-0.371]

38 Given the above premises, the following is the key definition that characterizes the reproducing kernel Hilbert spaces which are able to separate the largest family of subsets of X. [sent-98, score-0.615]

39 A reproducing kernel Hilbert space H with reproducing kernel K such that Kx ≠ Kt if x ≠ t is called completely regular if H separates all the subsets C ⊂ X which are closed with respect to the metric (2). [sent-100, score-1.676]

40 The term completely regular is borrowed from topology, where a topological space is called completely regular if, for any closed subset C and any point x0 ∉ C, there exists a continuous function f such that f(x0) ≠ 0 and f(x) = 0 for all x ∈ C. [sent-101, score-0.74]

41 In the supplementary material, several examples of completely regular reproducing kernel Hilbert spaces are given, as well as a discussion on how such spaces can be constructed. [sent-102, score-0.927]

42 A particular case is when X is already a metric space with a distance dX. (Footnote: considering complex-valued RKHS allows one to use the theory of the Fourier transform, and for practical problems we can simply consider real-valued kernels.) [sent-103, score-0.196]

43 If K is continuous with respect to dX , the assumption of complete regularity forces the metrics dK and dX to have the same closed subsets. [sent-105, score-0.174]

44 Furthermore, since the closed sets of X are independent of H, the complete regularity of H can be proved by showing that a suitable family of bump functions is contained in H. [sent-107, score-0.173]

45 Let X be a separable metric space with respect to a metric dX . [sent-109, score-0.308]

46 Assume that the kernel K is a continuous function with respect to dX and that the space H separates every subset C which is closed with respect to dX . [sent-110, score-0.543]

47 Then (i) The space H is separable and K is measurable with respect to the Borel σ-algebra generated by dX . [sent-111, score-0.211]

48 (ii) The metric dK defined by (2) is equivalent to dX , that is, a set is closed with respect to dK if and only if it is closed with respect to dX . [sent-112, score-0.366]

49 As a consequence of the above result, many classical reproducing kernel Hilbert spaces are completely regular. [sent-114, score-0.706]

50 For example, if X = Rd and H is the Sobolev space of order s with s > d/2, then H is completely regular. [sent-115, score-0.16]

51 In fact, a standard result of analysis ensures that, for any closed set C and any x0 ∉ C, there exists a smooth bump function f such that f(x0) = 1 and its support is contained in the complement of C. [sent-117, score-0.268]

52 Interestingly enough, if H is the reproducing kernel Hilbert space with the Gaussian kernel, it is known that the elements of H are analytic functions, see Cor. [sent-118, score-0.567]

53 Finally, the next result shows that the reproducing kernel can be normalized to one on the diagonal under the mild assumption that K(x, x) ≠ 0 for all x ∈ X. [sent-123, score-0.529]

54 Then the reproducing kernel Hilbert space with the normalized kernel K′(x, t) = K(x, t)/√(K(x, x) K(t, t)) separates the same sets as H. [sent-126, score-0.874]
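
A sketch of this normalization step at the level of Gram matrices; the formula for K′ follows the reconstruction above, so it is an assumption rather than a quotation:

```python
import numpy as np

def normalize_gram(K):
    # Normalized kernel K'(x, t) = K(x, t) / sqrt(K(x, x) K(t, t)),
    # assuming K(x, x) != 0; the result has ones on the diagonal.
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)
```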

55 In particular, we prove that both the Laplacian kernel K(x, y) = exp(−∥x − y∥_2 / √(2σ)) and the ℓ1-exponential kernel K(x, y) = exp(−∥x − y∥_1 / √(2σ)), defined on R^d, are completely regular for any σ > 0 and d ∈ N. [sent-128, score-0.49]

56 3 Spectral Algorithms for Learning the Support. In this section, we first discuss our framework and our main assumptions. [sent-129, score-0.259]
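
For reference, a sketch of Gram matrices for the two kernels mentioned above; the exact σ-scaling follows the reconstruction of sentence 55 and should be checked against the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist

def laplacian_gram(X, sigma=1.0):
    # K(x, y) = exp(-||x - y||_2 / sqrt(2*sigma)); rows of X are points in R^d.
    return np.exp(-cdist(X, X, metric="euclidean") / np.sqrt(2.0 * sigma))

def l1_exponential_gram(X, sigma=1.0):
    # K(x, y) = exp(-||x - y||_1 / sqrt(2*sigma)).
    return np.exp(-cdist(X, X, metric="cityblock") / np.sqrt(2.0 * sigma))
```

Both kernels satisfy K(x, x) = 1, consistent with the normalization assumption used later.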

57 Moreover we consider a reproducing kernel K satisfying the following assumption. [sent-139, score-0.529]

58 The reproducing kernel K is measurable and K(x, x) = 1, for all x ∈ X. [sent-141, score-0.612]

59 Moreover K defines a completely regular and separable RKHS H. [sent-142, score-0.316]

60 We endow X with the metric dK defined in (2), so that X becomes a separable metric space. [sent-143, score-0.237]

61 The assumption of complete regularity ensures that any closed subset is separated by H and, hence, is measurable by Prop. [sent-144, score-0.32]

62 Then we can define the support Xρ of the measure ρ, as the intersection of all the closed sets C ⊂ X, such that ρ(C) = 1. [sent-146, score-0.237]

63 To design an effective empirical estimator we develop a novel characterization of the support of an arbitrary distribution that we describe in the next section. [sent-151, score-0.2]

64 1 A New Characterization of the Support. The key observation towards defining a learning algorithm to estimate Xρ is that the projection Pρ can be expressed in terms of the integral operator defined by the kernel K. [sent-153, score-0.405]

65 Moreover, let T : H → H be the linear operator defined as T = ∫_X Kx ⊗ Kx dρ(x), where the integral converges in the Hilbert space of Hilbert-Schmidt operators on H (see for example [7] for the proof). [sent-155, score-0.198]

66 Using the reproducing property in H [2], it is straightforward to see that T is simply the integral operator with kernel K with domain and range in H. [sent-156, score-0.689]

67 Observe that in general Kx does not belong to the domain of T † and, if θ denotes the Heaviside function with θ(0) = 0, then spectral theory gives that Pρ = T † T = θ(T ). [sent-159, score-0.209]

68 The above observation is crucial as it gives a new characterization of the support of ρ in terms of the null space of T and the latter can be estimated from data. [sent-160, score-0.201]

69 The latter operator is an unbiased estimator of T . [sent-166, score-0.184]

70 In this paper we study a spectral filtering approach which replaces Tn† with an approximation gλ(Tn) obtained by filtering out the components corresponding to the small eigenvalues of Tn. [sent-170, score-0.318]

71 More precisely, if Tn = Σ_j σj vj ⊗ vj is a spectral decomposition of Tn, then gλ(Tn) = Σ_j gλ(σj) vj ⊗ vj. [sent-172, score-0.35]

72 Examples of algorithms that fall into the above class include iterative methods (akin to boosting) with gλ(σ) = Σ_{k=0}^{mλ} (1 − σ)^k, spectral cut-off gλ(σ) = (1/σ) 1_{σ>λ}(σ) + (1/λ) 1_{σ≤λ}(σ), and Tikhonov regularization gλ(σ) = 1/(σ + λ). [sent-178, score-0.209]
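
A sketch of the three filters applied entrywise to the eigenvalues; the function names are ours, and the choice mλ ≈ 1/λ for the iterative filter is an assumption, not the paper's prescription:

```python
import numpy as np

def g_tikhonov(sigma, lam):
    # Tikhonov: g_lambda(sigma) = 1 / (sigma + lambda).
    return 1.0 / (sigma + lam)

def g_cutoff(sigma, lam):
    # Spectral cut-off: 1/sigma above the cut, 1/lambda below it.
    return np.where(sigma > lam, 1.0 / np.maximum(sigma, lam), 1.0 / lam)

def g_iterative(sigma, lam, m_lam=None):
    # Truncated Neumann series sum_{k=0}^{m_lambda} (1 - sigma)^k.
    m_lam = int(np.ceil(1.0 / lam)) if m_lam is None else m_lam
    return sum((1.0 - sigma) ** k for k in range(m_lam + 1))
```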

73 One can see that the computation of Fn reduces to solving a simple finite dimensional problem involving the empirical kernel matrix defined by the training data. [sent-181, score-0.299]

74 A simple computation shows that Tn = (1/n) Sn∗ Sn and Sn Sn∗ = Kn is the n by n kernel matrix, where the (i, j)-entry is K(xi, xj). [sent-190, score-0.245]

75 Then it is easy to see that gλ(Tn)Tn = gλ(Sn∗ Sn/n) Sn∗ Sn/n = (1/n) Sn∗ gλ(Kn/n) Sn, so that Fn(x) = (1/n) kx^T gλ(Kn/n) kx, (6) where kx is the n-dimensional column vector kx = Sn Kx = (K(x1, x), ..., K(xn, x)). [sent-191, score-1.599]
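
Putting the pieces together, a sketch of the estimator in (6): diagonalize Kn/n once, apply a filter gλ to its eigenvalues, and evaluate Fn at new points. Function names and the Gaussian kernel in the usage example are ours, not the paper's:

```python
import numpy as np

def fit_spectral_support_estimator(X_train, kernel, lam, g=lambda s, l: 1.0 / (s + l)):
    # Fn(x) = (1/n) k_x^T g_lambda(Kn / n) k_x, with k_x = (K(x_1, x), ..., K(x_n, x)).
    n = len(X_train)
    s, V = np.linalg.eigh(kernel(X_train, X_train) / n)   # Kn/n = V diag(s) V^T
    G = (V * g(np.clip(s, 0.0, None), lam)) @ V.T         # g_lambda(Kn / n)

    def F_n(X_test):
        kx = kernel(X_train, X_test)                      # n x m, entries K(x_i, x)
        return np.einsum("im,im->m", kx, G @ kx) / n

    return F_n

# usage sketch with an illustrative Gaussian kernel
gauss = lambda A, B: np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / 2.0)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
F_n = fit_spectral_support_estimator(X, gauss, lam=1e-2)
print(F_n(X[:3]))   # values should approach 1 on the support as n grows and lambda shrinks
```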

76 Note that Equation (6) plays the role of a representer theorem for the spectral estimator, in the sense that it reduces the problem of finding an estimator in an infinite dimensional space to a finite dimensional problem. [sent-195, score-0.455]

77 4 Theoretical Analysis: Universal Consistency In this section we study the consistency property of spectral estimators. [sent-196, score-0.239]

78 We prove the results only for the filter corresponding to the classical Tikhonov regularization though the same results hold for the class of spectral filters described by Assumption 2. [sent-198, score-0.289]

79 Also note that the standard metrics used for support estimation (see for example [22, 5]) cannot be used in our analysis since they rely on the existence of a reference measure µ (usually the Lebesgue measure) and the assumption that ρ is absolutely continuous with respect to µ. [sent-201, score-0.255]

80 Let Fn be the estimator defined by Tikhonov regularization and choose a sequence λn so that lim_{n→∞} λn = 0 and limsup_{n→∞} (log n)/(λn √n) < +∞, (7) then lim_{n→+∞} sup_{x∈C} |Fn(x) − Fρ(x)| = 0 almost surely, (8) for every compact subset C of X. We add three comments. [sent-204, score-0.385]

81 Second, we observe that a natural choice would be the regularization defined by kernel PCA [11], which corresponds to truncating the generalized inverse of the kernel matrix at some cutoff parameter λ. [sent-206, score-0.601]

82 Third, we note that the uniform convergence of Fn to Fρ on compact subsets does not imply the convergence of the level sets of Fn to the corresponding level sets of Fρ , for example with respect to the standard Hausdorff distance among closed subsets. [sent-210, score-0.221]

83 If the sequence (τn)_{n∈N} converges to zero in such a way that limsup_{n→∞} sup_{x∈C} |Fn(x) − Fρ(x)| / τn ≤ 1 almost surely, (9) then lim_{n→+∞} dH(Xn ∩ C, Xρ ∩ C) = 0 almost surely, for any compact subset C. [sent-214, score-0.256]
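
For completeness, a sketch of the Hausdorff distance dH between two finite point sets, as one would use to compare a discretized estimated level set Xn ∩ C with a reference set:

```python
import numpy as np
from scipy.spatial.distance import cdist

def hausdorff_distance(A, B):
    # d_H(A, B) = max( max_a min_b d(a, b), max_b min_a d(a, b) )
    # for finite point sets A (n x d) and B (m x d).
    D = cdist(A, B)
    return max(D.min(axis=1).max(), D.min(axis=0).max())
```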

84 First, it is possible to show that, if the (normalized) kernel K is such that lim_{x′→∞} Kx(x′) = 0 for any x ∈ X (as happens for the Laplacian kernel), then Theorems 1 and 2 also hold by choosing C = X. [sent-216, score-0.245]

85 Again, due to space constraints, we only discuss spectral algorithms induced by Tikhonov regularization. [sent-220, score-0.247]

86 Following the discussion in the last section, Tikhonov regularization defines an estimator Fn(x) = kx^T (Kn + nλI)^{-1} kx, and a point is labeled as belonging to the support if Fn(x) ≥ 1 − τ. [sent-222, score-1.346]
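
A minimal sketch of this decision rule; note that (1/n)(Kn/n + λI)^{-1} = (Kn + nλI)^{-1}, so it is the Tikhonov special case of the filtered estimator above. Names and arguments are ours:

```python
import numpy as np

def tikhonov_support_rule(X_train, kernel, lam, tau):
    # Fn(x) = k_x^T (Kn + n*lambda*I)^{-1} k_x; accept x if Fn(x) >= 1 - tau.
    n = len(X_train)
    A = kernel(X_train, X_train) + n * lam * np.eye(n)

    def in_support(X_test):
        kx = kernel(X_train, X_test)                          # n x m, entries K(x_i, x)
        Fn = np.einsum("im,im->m", kx, np.linalg.solve(A, kx))
        return Fn >= 1.0 - tau                                # boolean support labels

    return in_support
```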

87 Note that in this case the algorithm requires choosing three parameters: the regularization parameter λ, the kernel width σ, and the threshold τ. [sent-233, score-0.395]

88 The kernel width is chosen as the median of the distribution of distances to the K-th nearest neighbor of each training set point, for K = 10. [sent-236, score-0.315]

89 Having fixed the kernel width, we choose the regularization parameter at the point of maximum curvature in the eigenvalue behavior (see Figure 1), the rationale being that beyond this value the eigenvalues are relatively small. [sent-237, score-0.434]
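
A sketch of the two heuristics. The kernel-width rule follows the description above; locating the "maximum curvature" of the eigenvalue decay as the largest discrete second difference of the log-eigenvalues is our own implementation choice, not the paper's recipe:

```python
import numpy as np
from scipy.spatial.distance import cdist

def kernel_width_heuristic(X, k=10):
    # Median over training points of the distance to the k-th nearest neighbor.
    D = np.sort(cdist(X, X), axis=1)
    return np.median(D[:, k])            # column 0 is a point's distance to itself

def lambda_heuristic(Kn):
    # Regularization parameter taken near the "knee" of the spectrum of Kn / n.
    s = np.sort(np.linalg.eigvalsh(Kn / len(Kn)))[::-1]
    s = s[s > 1e-12]
    knee = int(np.argmax(np.diff(np.log(s), n=2))) + 1
    return float(s[knee])
```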

90 For comparison we considered a Parzen window density estimator and a one-class SVM (1CSVM) as implemented in [6]. [sent-238, score-0.163]

91 Given a kernel width, an estimate of the probability distribution is computed and can be used to estimate the support by fixing a threshold τ′. [sent-241, score-0.415]
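
A sketch of this Parzen-window baseline with a Gaussian window (the window shape and the name of the threshold are our assumptions); the score is proportional to a density estimate, which is enough once τ′ is tuned, e.g. while tracing a ROC curve:

```python
import numpy as np

def parzen_support(X_train, X_test, sigma, tau_prime):
    # Average Gaussian kernel values around each test point, then threshold.
    sq = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    score = np.exp(-sq / (2.0 * sigma ** 2)).mean(axis=1)
    return score >= tau_prime
```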

92 For the one-class SVM we considered the Gaussian kernel, so that we have to fix the kernel width and a regularization parameter ν. [sent-242, score-0.423]

93 We fix the kernel width to be the same as that used by our estimator and fix ν = 0. [sent-243, score-0.415]

94 On the different tests performed on the MNIST data, the spectral algorithm always achieves results which are better, and often substantially better, than those of the other methods. [sent-250, score-0.209]

95 On the CBCL dataset the SVM provides the best result, but the spectral algorithm still provides competitive performance. [sent-251, score-0.209]

96 To overcome this problem we introduce a new notion of RKHS, which we call completely regular, that captures the relevant geometric properties of the probability distribution. [sent-254, score-0.168]

97 Then, the support of the distribution can be characterized in terms of the null space of the integral operator defined by the kernel and can be estimated using a spectral filtering approach. [sent-255, score-0.815]

98 Figure 2: ROC curves for the different estimators in three different tasks: digit 9 vs 4 (left), digit 1 vs 7 (center), CBCL (right). [sent-311, score-0.164]

99 Collectively compact operator approximation theory and applications to integral equations. [sent-348, score-0.212]

100 Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. [sent-398, score-0.618]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kx', 0.533), ('reproducing', 0.284), ('kernel', 0.245), ('spectral', 0.209), ('tn', 0.204), ('fn', 0.179), ('sn', 0.157), ('hilbert', 0.154), ('regular', 0.137), ('cbcl', 0.135), ('tikhonov', 0.13), ('completely', 0.122), ('rkhs', 0.12), ('parzen', 0.116), ('dk', 0.112), ('eigenvalues', 0.109), ('closed', 0.105), ('pc', 0.104), ('support', 0.1), ('estimator', 0.1), ('metric', 0.09), ('operator', 0.084), ('measurable', 0.083), ('dx', 0.082), ('regularization', 0.08), ('integral', 0.076), ('width', 0.07), ('separated', 0.069), ('falsepos', 0.066), ('oneclasssvm', 0.066), ('truepos', 0.066), ('null', 0.063), ('separates', 0.062), ('kn', 0.062), ('hausdorff', 0.058), ('separable', 0.057), ('spaces', 0.055), ('dimensional', 0.054), ('decay', 0.054), ('compact', 0.052), ('topological', 0.052), ('kt', 0.051), ('surely', 0.051), ('lim', 0.048), ('manifold', 0.048), ('notion', 0.046), ('universal', 0.045), ('fc', 0.045), ('infn', 0.044), ('maginitude', 0.044), ('rouen', 0.044), ('sezione', 0.044), ('toigo', 0.044), ('kernels', 0.043), ('mnist', 0.043), ('estimators', 0.043), ('auc', 0.04), ('regularized', 0.04), ('italy', 0.039), ('svm', 0.039), ('genova', 0.039), ('vito', 0.039), ('space', 0.038), ('vj', 0.037), ('regularity', 0.036), ('milano', 0.035), ('cn', 0.035), ('density', 0.035), ('laplacian', 0.034), ('valued', 0.034), ('jan', 0.034), ('respect', 0.033), ('rosasco', 0.033), ('smale', 0.033), ('digits', 0.033), ('xn', 0.033), ('measure', 0.032), ('contained', 0.032), ('rd', 0.032), ('digit', 0.032), ('topology', 0.032), ('roc', 0.032), ('di', 0.031), ('bump', 0.031), ('cutoff', 0.031), ('anomaly', 0.031), ('subsets', 0.031), ('sup', 0.03), ('precisely', 0.03), ('steinwart', 0.03), ('consistency', 0.03), ('concept', 0.03), ('supplementary', 0.029), ('considered', 0.028), ('compactly', 0.028), ('hs', 0.028), ('niyogi', 0.027), ('lebesgue', 0.027), ('belkin', 0.027), ('subset', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 250 nips-2010-Spectral Regularization for Support Estimation

Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo

Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow one to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state-of-the-art machine learning algorithms for density support estimation.

2 0.29500949 279 nips-2010-Universal Kernels on Non-Standard Input Spaces

Author: Andreas Christmann, Ingo Steinwart

Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd . Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp (PX ) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ⊂ Rd . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of Rd . We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1

3 0.24321038 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression

Author: Gilles Blanchard, Nicole Krämer

Abstract: We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the effective dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If this assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available. The order of the learning rates match state-of-the-art results that were recently obtained for least squares support vector machines and for linear regularization operators. 1

4 0.16579762 280 nips-2010-Unsupervised Kernel Dimension Reduction

Author: Meihong Wang, Fei Sha, Michael I. Jordan

Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.

5 0.15644631 124 nips-2010-Inductive Regularized Learning of Kernel Functions

Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon

Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1

6 0.13531169 133 nips-2010-Kernel Descriptors for Visual Recognition

7 0.12908834 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable

8 0.12743258 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

9 0.12510744 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

10 0.12114185 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls

11 0.11415188 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

12 0.10389563 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

13 0.087100074 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis

14 0.08312013 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

15 0.083051093 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

16 0.081317075 233 nips-2010-Scrambled Objects for Least-Squares Regression

17 0.077602774 281 nips-2010-Using body-anchored priors for identifying actions in single images

18 0.077429749 117 nips-2010-Identifying graph-structured activation patterns in networks

19 0.076634757 223 nips-2010-Rates of convergence for the cluster tree

20 0.074925311 146 nips-2010-Learning Multiple Tasks using Manifold Regularization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.211), (1, 0.063), (2, 0.102), (3, 0.01), (4, 0.233), (5, 0.094), (6, 0.263), (7, -0.071), (8, 0.067), (9, 0.075), (10, -0.017), (11, -0.069), (12, -0.152), (13, -0.081), (14, 0.018), (15, 0.001), (16, 0.13), (17, -0.003), (18, -0.011), (19, 0.01), (20, -0.112), (21, -0.058), (22, -0.068), (23, -0.002), (24, 0.131), (25, 0.013), (26, 0.02), (27, -0.021), (28, -0.16), (29, -0.039), (30, 0.028), (31, -0.005), (32, -0.01), (33, 0.075), (34, -0.028), (35, 0.127), (36, -0.147), (37, 0.037), (38, 0.043), (39, -0.024), (40, -0.029), (41, 0.048), (42, -0.019), (43, -0.022), (44, -0.072), (45, 0.071), (46, -0.051), (47, -0.085), (48, -0.043), (49, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96627408 250 nips-2010-Spectral Regularization for Support Estimation

Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo

Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.

2 0.84306765 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression

Author: Gilles Blanchard, Nicole Krämer

Abstract: We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the effective dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If this assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available. The order of the learning rates match state-of-the-art results that were recently obtained for least squares support vector machines and for linear regularization operators. 1

3 0.83207089 279 nips-2010-Universal Kernels on Non-Standard Input Spaces

Author: Andreas Christmann, Ingo Steinwart

Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd . Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp (PX ) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ⊂ Rd . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of Rd . We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1

4 0.72159541 280 nips-2010-Unsupervised Kernel Dimension Reduction

Author: Meihong Wang, Fei Sha, Michael I. Jordan

Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.

5 0.69008428 124 nips-2010-Inductive Regularized Learning of Kernel Functions

Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon

Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1

6 0.6357249 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

7 0.58128881 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

8 0.55764562 133 nips-2010-Kernel Descriptors for Visual Recognition

9 0.54826516 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls

10 0.5355649 233 nips-2010-Scrambled Objects for Least-Squares Regression

11 0.47462386 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance

12 0.47302899 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis

13 0.46608844 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

14 0.41658285 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions

15 0.41311067 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

16 0.40469968 223 nips-2010-Rates of convergence for the cluster tree

17 0.3861486 148 nips-2010-Learning Networks of Stochastic Differential Equations

18 0.38519534 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable

19 0.3827593 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

20 0.3791163 134 nips-2010-LSTD with Random Projections


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.048), (27, 0.033), (30, 0.054), (35, 0.02), (45, 0.167), (50, 0.027), (52, 0.072), (60, 0.072), (77, 0.031), (78, 0.01), (90, 0.39)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87827301 205 nips-2010-Permutation Complexity Bound on Out-Sample Error

Author: Malik Magdon-Ismail

Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.

same-paper 2 0.86060762 250 nips-2010-Spectral Regularization for Support Estimation

Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo

Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.

3 0.85649103 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring to make any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification. 1

4 0.8244592 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

5 0.74526513 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems

Author: Han Liu, Xi Chen

Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets. 1

6 0.71239638 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics

7 0.65874738 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

8 0.65448946 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

9 0.61997885 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

10 0.61653453 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case

11 0.61114544 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression

12 0.60428298 117 nips-2010-Identifying graph-structured activation patterns in networks

13 0.60163611 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

14 0.60116386 282 nips-2010-Variable margin losses for classifier design

15 0.59300286 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

16 0.58286357 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis

17 0.58052135 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

18 0.57842064 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

19 0.57634467 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification

20 0.57607293 243 nips-2010-Smoothness, Low Noise and Fast Rates