jmlr jmlr2012 jmlr2012-100 knowledge-graph by maker-knowledge-mining

100 jmlr-2012-Robust Kernel Density Estimation


Source: pdf

Author: JooSeuk Kim, Clayton D. Scott

Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, USA. Editor: Kenji Fukumizu. Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. [sent-4, score-0.652]

2 This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. [sent-5, score-0.612]

3 We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. [sent-6, score-0.431]

4 Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). [sent-7, score-0.789]

5 An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. [sent-8, score-0.138]

6 Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. [sent-9, score-0.093]

7 The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. [sent-10, score-0.529]

8 Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation. [sent-11, score-0.431]

9 Introduction. The kernel density estimator (KDE) is a well-known nonparametric estimator of univariate or multivariate densities, and numerous articles have been written on its properties, applications, and extensions (Silverman, 1986; Scott, 1992). [sent-12, score-0.624]

10 This paper addresses a method of nonparametric density estimation that generalizes the KDE, and exhibits robustness to contamination of the training sample. [sent-14, score-0.652]

11 Consider training data following a contamination model, X1 , . . . , Xn ∼ (1 − p) f0 + p f1 (i.i.d.), [sent-15, score-0.207]

12 where f0 is the “nominal” density to be estimated, f1 is the density of the contaminating distribution, and p < 1/2 is the proportion of contamination. [sent-18, score-0.533]
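
To make the contamination model concrete, here is a minimal sketch that draws a sample from (1 − p) f0 + p f1. The particular choices of f0 (a standard Gaussian), f1 (a much broader Gaussian of outliers), and p are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sample_contaminated(n, p=0.2, seed=None):
    """Draw n points from the mixture (1 - p) * f0 + p * f1.

    Illustrative choices: f0 = N(0, 1) is the nominal density,
    f1 = N(0, 10^2) plays the role of the contaminating density.
    """
    rng = np.random.default_rng(seed)
    from_f1 = rng.random(n) < p                              # which points are contamination
    x = rng.normal(0.0, 1.0, size=n)                         # nominal draws from f0
    x[from_f1] = rng.normal(0.0, 10.0, size=from_f1.sum())   # overwrite with draws from f1
    return x, from_f1

X, is_outlier = sample_contaminated(500, p=0.2, seed=0)
```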

13 The objective is to estimate f0 while making no parametric assumptions about the nominal or contaminating distributions. [sent-20, score-0.356]

14 Instead, we will focus on a set of nonparametric conditions that are reasonable in many practical applications. [sent-28, score-0.07]

15 As a motivating application, consider anomaly detection in a computer network. [sent-30, score-0.127]

16 For example, each Xi may record the volume of traffic along certain links in the network, at a certain instant in time (Chhabra et al. [sent-35, score-0.035]

17 If each measurement is collected when the network is in a nominal state, these data could be used to construct an anomaly detector by first estimating the density f0 of nominal measurements, and then thresholding that estimate at some level to obtain decision regions. [sent-37, score-0.806]
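
A minimal sketch of that detector construction, assuming some fitted density estimate is available (SciPy's gaussian_kde is used here purely as a stand-in; the 5% quantile threshold is likewise an illustrative choice):

```python
import numpy as np
from scipy.stats import gaussian_kde  # stand-in for any density estimate of f0

def make_detector(nominal_train, quantile=0.05):
    """Fit a density on (assumed) nominal data, then threshold it at a low quantile."""
    f_hat = gaussian_kde(nominal_train)                    # estimate of the nominal density f0
    level = np.quantile(f_hat(nominal_train), quantile)    # threshold level
    return lambda x: f_hat(np.atleast_1d(x)) < level       # True = flagged as anomalous

rng = np.random.default_rng(0)
detect = make_detector(rng.normal(size=500))
print(detect([0.0, 8.0]))  # typically [False, True]
```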

18 Hence, it is necessary to estimate the nominal density (or a level set thereof) from contaminated data. [sent-40, score-0.61]

19 Furthermore, the distributions of both nominal and anomalous measurements are potentially complex, and it is therefore desirable to avoid parametric models. [sent-41, score-0.348]

20 The proposed method achieves robustness by combining a traditional kernel density estimator with ideas from M-estimation (Huber, 1964; Hampel, 1974). [sent-42, score-0.612]

21 The KDE based on a translation invariant, positive semi-definite (PSD) kernel is interpreted as a sample mean in the reproducing kernel Hilbert space (RKHS) associated with the kernel. [sent-43, score-0.482]

22 Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). [sent-44, score-0.789]

23 We describe a kernelized iteratively re-weighted least squares (KIRWLS) algorithm to efficiently compute the RKDE, and provide necessary and sufficient conditions for the convergence of KIRWLS to the RKDE. [sent-45, score-0.138]

24 We also offer three arguments to support the claim that the RKDE robustly estimates the nominal density and its level sets. [sent-46, score-0.517]

25 First, we characterize the RKDE by a representer theorem. [sent-47, score-0.067]

26 This theorem shows that the RKDE is a weighted KDE, and the weights are smaller for more outlying data points. [sent-48, score-0.071]

27 Second, we study the influence function of the RKDE, and show through an exact formula and numerical results that the RKDE is less sensitive to contamination by outliers than the KDE. [sent-49, score-0.366]

28 Third, we conduct experiments on several benchmark data sets that demonstrate the improved performance of the RKDE, relative to competing methods, at both density estimation and anomaly detection. [sent-50, score-0.383]

29 One motivation for this work is that the traditional kernel density estimator is well-known to be sensitive to outliers. [sent-51, score-0.556]

30 Even without contamination, the standard KDE tends to overestimate the density in regions where the true density is low. [sent-52, score-0.477]

31 This has motivated several authors to consider variable kernel density estimators (VKDEs), which employ a data-dependent bandwidth at each data point (Breiman et al. [sent-53, score-0.46]

32 This bandwidth is adapted to be larger where the data are less dense, with the aim of decreasing the aforementioned bias. [sent-55, score-0.098]
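
One common way to realize such a data-dependent bandwidth is the square-root law of Abramson: compute a pilot KDE with a fixed bandwidth, then enlarge the bandwidth at points where the pilot density is low. The sketch below is a generic 1-d version in that spirit, not the exact construction of any of the cited works.

```python
import numpy as np

def vkde(x, X, sigma0):
    """Variable-bandwidth KDE at a scalar point x for 1-d data X (Abramson-style sketch)."""
    X = np.asarray(X, dtype=float)
    # pilot KDE with fixed bandwidth sigma0, evaluated at every training point
    d = X[:, None] - X[None, :]
    pilot = np.mean(np.exp(-0.5 * (d / sigma0) ** 2), axis=1) / (sigma0 * np.sqrt(2 * np.pi))
    g = np.exp(np.mean(np.log(pilot)))     # geometric mean of the pilot, a common normalizer
    sig = sigma0 * (pilot / g) ** -0.5     # larger bandwidth where the pilot density is low
    k = np.exp(-0.5 * ((x - X) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return k.mean()
```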

33 Such methods have been applied in outlier detection and computer vision applications (Comaniciu et al. [sent-56, score-0.041]

34 , 2007), and are one possible approach to robust nonparametric density estimation. [sent-58, score-0.418]

35 Density estimation with positive semi-definite kernels has been studied by several authors. [sent-60, score-0.071]

36 Vapnik and Mukherjee (2000) optimize a criterion based on the empirical cumulative distribution function over the class of weighted KDEs based on a PSD kernel. [sent-61, score-0.025]

37 Shawe-Taylor and Dolia (2007) provide a refined theoretical treatment of this approach. [sent-62, score-0.028]

38 (2008) adopt a different criterion based on Hilbert space embeddings of probability distributions. [sent-64, score-0.051]

39 Our approach is somewhat similar in that we attempt to match the mean of the empirical distribution in the RKHS, but our criterion is different. [sent-65, score-0.025]

40 These methods were also not designed with contaminated data in mind. [sent-66, score-0.141]

41 We show that the standard kernel density estimator can be viewed as the solution to a certain least squares problem in the RKHS. [sent-67, score-0.505]

42 The use of quadratic criteria in density estimation has also been previously developed. [sent-68, score-0.256]

43 optimizes the norm-squared in Hilbert space, whereas Kim (1995), Girolami and He (2003), Kim and Scott (2010) and Mahapatruni and Gray (2011) adopt the integrated squared error. [sent-70, score-0.026]

44 Once again, these methods are not designed for contaminated data. [sent-71, score-0.141]

45 Previous work combining robust estimation and kernel methods has focused primarily on supervised learning problems. [sent-72, score-0.322]

46 M-estimation applied to kernel regression has been studied by various authors (Christmann and Steinwart, 2007; Debruyne et al. [sent-73, score-0.168]

47 In unsupervised learning, a robust way of doing kernel principal component analysis, called spherical KPCA, has been proposed, which applies PCA to feature vectors projected onto a unit sphere around the spatial median in a kernel feature space (Debruyne et al. [sent-79, score-0.493]

48 The kernelized spatial depth was also proposed to estimate depth contours nonparametrically (Chen et al. [sent-81, score-0.317]

49 To our knowledge, the RKDE is the first application of M-estimation ideas in kernel density estimation. [sent-83, score-0.426]

50 In Section 2 we propose robust kernel density estimation. [sent-84, score-0.516]

51 In Section 3 we present a representer theorem for the RKDE. [sent-85, score-0.067]

52 Let X1 , . . . , Xn ∈ Rd be a random sample from a distribution F with a density f . [sent-98, score-0.225]

53 The kernel density estimate of f , also called the Parzen window estimate, is a nonparametric estimate given by fKDE(x) = (1/n) ∑_{i=1}^{n} kσ(x, Xi), where kσ is a kernel function with bandwidth σ. [sent-99, score-0.766]
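
A direct transcription of this formula, taking kσ to be an isotropic Gaussian kernel for concreteness (any kernel meeting the conditions stated next would do):

```python
import numpy as np

def k_sigma(x, xi, sigma):
    """Isotropic Gaussian kernel k_sigma(x, xi) for d-dimensional points (one concrete choice)."""
    x, xi = np.asarray(x, dtype=float), np.asarray(xi, dtype=float)
    d = x.size
    sq = np.sum((x - xi) ** 2)
    return np.exp(-0.5 * sq / sigma ** 2) / ((2.0 * np.pi) ** (d / 2) * sigma ** d)

def f_kde(x, X, sigma):
    """f_KDE(x) = (1/n) * sum_i k_sigma(x, X_i); X has shape (n, d), x has shape (d,)."""
    return float(np.mean([k_sigma(x, xi, sigma) for xi in X]))
```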

54 To ensure that fKDE(x) is a density, we assume the kernel function satisfies kσ( · , · ) ≥ 0 and ∫ kσ(x, · ) dx = 1. [sent-100, score-0.198]

55 We will also assume that kσ (x, x′ ) is translation invariant, in that kσ (x − z, x′ − z) = kσ (x, x′ ) for all x, x′ , and z. [sent-101, score-0.051]

56 Every PSD kernel kσ is associated with a unique Hilbert space of functions called its reproducing kernel Hilbert space (RKHS) which we will denote H , and kσ is called the reproducing kernel of H . [sent-108, score-0.694]

57 See Steinwart and Christmann (2008) for a thorough treatment of PSD kernels and RKHSs. [sent-110, score-0.068]

58 For our purposes, the critical property of H is the so-called reproducing property. [sent-111, score-0.095]
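
In its standard form, the reproducing property states that ⟨g, kσ( · , x)⟩_H = g(x) for every g ∈ H and every x; writing Φ(x) = kσ( · , x), it follows that ⟨Φ(x), Φ(x′)⟩_H = kσ(x, x′), which is what allows the norms and inner products below to be computed from kernel evaluations alone.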

59 We also note that, by translation invariance, the functions Φ(x) have constant norm in H , because ‖Φ(x)‖²_H = ⟨Φ(x), Φ(x)⟩_H = kσ(x, x) = kσ(0, 0). [sent-114, score-0.051]

60 fKDE = arg min_{g ∈ H} ∑_{i=1}^{n} ‖Φ(Xi) − g‖²_H . Being the solution of a least squares problem, the KDE is sensitive to the presence of outliers among the Φ(Xi)'s. [sent-118, score-0.204]
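
This least-squares view becomes computational through the kernel trick: for any g = ∑_{j} w_j Φ(Xj), the squared RKHS distance ‖Φ(Xi) − g‖²_H expands into Gram-matrix entries only. A small sketch (a Gaussian kernel is assumed for concreteness):

```python
import numpy as np

def kernel_matrix(X, sigma):
    """Gram matrix K[i, j] = k_sigma(X_i, X_j) for a Gaussian kernel; X has shape (n, d)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    d = X.shape[1]
    return np.exp(-0.5 * sq / sigma ** 2) / ((2.0 * np.pi) ** (d / 2) * sigma ** d)

def rkhs_dist_sq(K, w):
    """||Phi(X_i) - sum_j w_j Phi(X_j)||_H^2 for every i, using only the Gram matrix K."""
    # expansion: K_ii - 2 * sum_j w_j K_ij + sum_{j,l} w_j w_l K_jl
    return np.diag(K) - 2.0 * (K @ w) + float(w @ K @ w)
```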

61 To reduce the effect of outliers, we propose to use M-estimation (Huber, 1964) to find a robust sample mean of the Φ(Xi )’s. [sent-119, score-0.123]

62 For a robust loss function ρ(x) on x ≥ 0, the robust kernel density estimate is defined as fRKDE = arg min_{g ∈ H} ∑_{i=1}^{n} ρ(‖Φ(Xi) − g‖_H). (2) [sent-120, score-0.673]

63 Well-known examples of robust loss functions are Huber's or Hampel's ρ. [sent-121, score-0.123]
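
A minimal sketch of the kernelized IRWLS idea for solving (2): each iteration reweights the points by φ(x) = ψ(x)/x, with ψ = ρ′, evaluated at their current RKHS distance from the estimate, then resets the estimate to the weighted mean ∑_i w_i Φ(Xi), so the iterate is always a weighted KDE. Huber's loss is used as the concrete ρ; the kernel_matrix and rkhs_dist_sq helpers are the sketches given earlier, and the constants and stopping rule are illustrative rather than the paper's exact specification.

```python
import numpy as np

def huber_phi(x, a=1.0):
    """phi(x) = psi(x) / x for Huber's loss: equals 1 on [0, a] and a / x beyond a."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= a, 1.0, a / np.maximum(x, 1e-12))

def rkde_weights(X, sigma, a=1.0, n_iter=100, tol=1e-8):
    """KIRWLS-style iteration; returns weights w with f_RKDE(x) = sum_i w_i * k_sigma(x, X_i)."""
    K = kernel_matrix(X, sigma)            # Gram matrix (sketch defined above)
    n = K.shape[0]
    w = np.full(n, 1.0 / n)                # start from the ordinary KDE (uniform weights)
    for _ in range(n_iter):
        dist = np.sqrt(np.maximum(rkhs_dist_sq(K, w), 0.0))
        u = huber_phi(dist, a)             # down-weights outlying points
        w_new = u / u.sum()                # normalized weights -> weighted mean in the RKHS
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

Evaluating the resulting estimate at a point x then takes the weighted-KDE form ∑_i w_i kσ(x, Xi), consistent with the representer theorem mentioned above.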


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rkde', 0.497), ('kde', 0.414), ('density', 0.225), ('nominal', 0.21), ('contamination', 0.207), ('kernel', 0.168), ('fkde', 0.166), ('scott', 0.165), ('contaminated', 0.141), ('anomaly', 0.127), ('huber', 0.127), ('hampel', 0.124), ('kirwls', 0.124), ('robust', 0.123), ('outliers', 0.103), ('reproducing', 0.095), ('hilbert', 0.095), ('kernelized', 0.093), ('kim', 0.093), ('psd', 0.089), ('contaminating', 0.083), ('cott', 0.083), ('debruyne', 0.083), ('irwls', 0.083), ('jooseuk', 0.083), ('robustly', 0.082), ('robustness', 0.079), ('outlying', 0.071), ('umich', 0.071), ('ensity', 0.071), ('nonparametric', 0.07), ('estimator', 0.067), ('bandwidth', 0.067), ('representer', 0.067), ('clayton', 0.064), ('christmann', 0.064), ('anomalous', 0.064), ('rkhs', 0.058), ('sensitive', 0.056), ('stimation', 0.052), ('translation', 0.051), ('ernel', 0.049), ('song', 0.047), ('depth', 0.047), ('squares', 0.045), ('measurements', 0.045), ('steinwart', 0.045), ('outlier', 0.041), ('im', 0.041), ('kernels', 0.04), ('exhibits', 0.04), ('traditional', 0.04), ('cd', 0.038), ('uence', 0.038), ('dolia', 0.035), ('kpca', 0.035), ('kenji', 0.035), ('wellknown', 0.035), ('labor', 0.035), ('nonparametrically', 0.035), ('instant', 0.035), ('anomalies', 0.035), ('abramson', 0.035), ('chhabra', 0.035), ('estimate', 0.034), ('xi', 0.034), ('densities', 0.034), ('spatial', 0.034), ('yielding', 0.034), ('ideas', 0.033), ('girolami', 0.032), ('silverman', 0.032), ('abundant', 0.032), ('ann', 0.032), ('arbor', 0.032), ('scovel', 0.032), ('rd', 0.031), ('estimation', 0.031), ('aforementioned', 0.031), ('dx', 0.03), ('tedious', 0.029), ('fukumizu', 0.029), ('minority', 0.029), ('diffuse', 0.029), ('parametric', 0.029), ('treatment', 0.028), ('parzen', 0.027), ('contours', 0.027), ('articles', 0.027), ('overestimate', 0.027), ('thereof', 0.026), ('spatially', 0.026), ('acoustics', 0.026), ('traf', 0.026), ('adopt', 0.026), ('criterion', 0.025), ('intensive', 0.025), ('michigan', 0.025), ('laplacian', 0.025), ('offered', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 100 jmlr-2012-Robust Kernel Density Estimation

Author: JooSeuk Kim, Clayton D. Scott

Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation

2 0.087345466 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices

Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski

Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T 2 ) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions

3 0.086499676 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang

Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments

4 0.061162952 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Abstract: Sparse additive models are families of d-variate functions with the additive decomposition f ∗ = ∑ j∈S f j∗ , where S is an unknown subset of cardinality s ≪ d. In this paper, we consider the case where each univariate component function f j∗ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f ∗ based on kernels combined with ℓ1 -type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2 (P) and L2 (Pn ) norms over the class Fd,s,H of sparse additive models with each univariate function f j∗ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2 (P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much √ faster estimation rates are possible for any sparsity s = Ω( n), showing that global boundedness is a significant restriction in the high-dimensional setting. Keywords: sparsity, kernel, non-parametric, convex, minimax

5 0.060482126 4 jmlr-2012-A Kernel Two-Sample Test

Author: Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola

Abstract: We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distributionfree tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests. ∗. †. ‡. §. Also at Gatsby Computational Neuroscience Unit, CSML, 17 Queen Square, London WC1N 3AR, UK. This work was carried out while K.M.B. was with the Ludwig-Maximilians-Universit¨ t M¨ nchen. a u This work was carried out while M.J.R. was with the Graz University of Technology. Also at The Australian National University, Canberra, ACT 0200, Australia. c 2012 Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ lkopf and Alexander Smola. o ¨ G RETTON , B ORGWARDT, R ASCH , S CH OLKOPF AND S MOLA Keywords: kernel methods, two-sample test, uniform convergence bounds, schema matching, integral probability metric, hypothesis testing

6 0.055180389 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation

7 0.052888222 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

8 0.051363207 44 jmlr-2012-Feature Selection via Dependence Maximization

9 0.050528824 109 jmlr-2012-Stability of Density-Based Clustering

10 0.041455641 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

11 0.040643729 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment

12 0.040623564 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization

13 0.040457569 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

14 0.035479993 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection

15 0.032067042 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

16 0.031728487 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems

17 0.030149532 68 jmlr-2012-Minimax Manifold Estimation

18 0.02923944 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning

19 0.028438071 59 jmlr-2012-Linear Regression With Random Projections

20 0.028220262 80 jmlr-2012-On Ranking and Generalization Bounds


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, 0.052), (2, 0.04), (3, 0.133), (4, 0.096), (5, -0.031), (6, -0.121), (7, -0.016), (8, -0.001), (9, 0.005), (10, 0.066), (11, -0.099), (12, -0.074), (13, 0.038), (14, 0.088), (15, 0.2), (16, 0.011), (17, -0.132), (18, -0.018), (19, 0.051), (20, 0.112), (21, 0.093), (22, 0.019), (23, 0.024), (24, -0.019), (25, -0.066), (26, 0.071), (27, 0.064), (28, -0.093), (29, 0.233), (30, 0.142), (31, 0.019), (32, 0.048), (33, -0.144), (34, -0.122), (35, -0.028), (36, -0.205), (37, 0.14), (38, 0.149), (39, -0.24), (40, -0.068), (41, -0.066), (42, 0.018), (43, 0.101), (44, -0.048), (45, 0.102), (46, 0.03), (47, -0.039), (48, 0.213), (49, -0.086)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94867885 100 jmlr-2012-Robust Kernel Density Estimation

Author: JooSeuk Kim, Clayton D. Scott

Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation

2 0.62276334 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices

Author: Aharon Ben-Tal, Sahely Bhadra, Chiranjib Bhattacharyya, Arkadi Nemirovski

Abstract: In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically K is modeled as a positive affine combination of given positive semi definite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem into a deterministic conic quadratic problem which can be solved in principle by a polynomial time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditski and Nemirovski (2011). It achieves an O(1/T 2 ) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets show that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly. Keywords: robust optimization, uncertain classification, kernel functions

3 0.4592509 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

Author: Haizhang Zhang, Yuesheng Xu, Qinghui Zhang

Abstract: This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given kernel as a subspace. The study is motivated from the need of updating the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated. Keywords: vector-valued reproducing kernel Hilbert spaces, operator-valued reproducing kernels, refinement, embedding, translation invariant kernels, Hessian of Gaussian kernels, Hilbert-Schmidt kernels, numerical experiments

4 0.44398135 109 jmlr-2012-Stability of Density-Based Clustering

Author: Alessandro Rinaldo, Aarti Singh, Rebecca Nugent, Larry Wasserman

Abstract: High density clusters can be characterized by the connected components of a level set L(λ) = {x : p(x) > λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T = λ L(λ). In this paper, we study the behavior of a density level set estimate L(λ) and cluster tree estimate T based on a kernel density estimator with kernel bandwidth h. We define two notions of instability to measure the variability of L(λ) and T as a function of h, and investigate the theoretical properties of these instability measures. Keywords: clustering, density estimation, level sets, stability, model selection

5 0.35707939 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment

Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression. Keywords: kernel methods, learning kernels, feature selection

6 0.31634969 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation

7 0.27937889 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection

8 0.27603978 4 jmlr-2012-A Kernel Two-Sample Test

9 0.25824651 44 jmlr-2012-Feature Selection via Dependence Maximization

10 0.2443704 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

11 0.24247065 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

12 0.22930939 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

13 0.22023503 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

14 0.1903397 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

15 0.171827 74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization

16 0.15722837 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

17 0.15439779 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems

18 0.14999379 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox

19 0.1357238 59 jmlr-2012-Linear Regression With Random Projections

20 0.13468942 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.026), (26, 0.051), (29, 0.069), (56, 0.012), (67, 0.429), (69, 0.05), (75, 0.067), (77, 0.012), (79, 0.011), (92, 0.062), (96, 0.084)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69107926 100 jmlr-2012-Robust Kernel Density Estimation

Author: JooSeuk Kim, Clayton D. Scott

Abstract: We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. We interpret the KDE based on a positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via M-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the M-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection. Keywords: outlier, reproducing kernel Hilbert space, kernel trick, influence function, M-estimation

2 0.31646609 4 jmlr-2012-A Kernel Two-Sample Test

Author: Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola

Abstract: We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distributionfree tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests. ∗. †. ‡. §. Also at Gatsby Computational Neuroscience Unit, CSML, 17 Queen Square, London WC1N 3AR, UK. This work was carried out while K.M.B. was with the Ludwig-Maximilians-Universit¨ t M¨ nchen. a u This work was carried out while M.J.R. was with the Graz University of Technology. Also at The Australian National University, Canberra, ACT 0200, Australia. c 2012 Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ lkopf and Alexander Smola. o ¨ G RETTON , B ORGWARDT, R ASCH , S CH OLKOPF AND S MOLA Keywords: kernel methods, two-sample test, uniform convergence bounds, schema matching, integral probability metric, hypothesis testing

3 0.30877253 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

Author: Neil D. Lawrence

Abstract: We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.

4 0.30669621 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

Author: Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao

Abstract: Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem. Keywords: distributed computing, online learning, stochastic optimization, regret bounds, convex optimization

5 0.30635533 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

Author: Fei Yan, Josef Kittler, Krystian Mikolajczyk, Atif Tahir

Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓ p norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and compare closely the performance of ℓ p MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓ p MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓ p MK-FDA tends to outperform its SVM counterpart. Finally, we also discuss the connection between (MK-)FDA and (MK-)SVM, under the unified framework of regularised kernel machines. Keywords: multiple kernel learning, kernel fisher discriminant analysis, regularised least squares, support vector machines

6 0.30392176 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

7 0.30292761 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

8 0.30281785 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

9 0.30217123 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

10 0.30001602 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

11 0.2999396 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers

12 0.29794598 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

13 0.29718548 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets

14 0.29675129 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach

15 0.29615307 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

16 0.29594588 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models

17 0.29529458 106 jmlr-2012-Sign Language Recognition using Sub-Units

18 0.29443544 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression

19 0.29440472 82 jmlr-2012-On the Necessity of Irrelevant Variables

20 0.29359239 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices