jmlr jmlr2013 jmlr2013-32 knowledge-graph by maker-knowledge-mining

32 jmlr-2013-Differential Privacy for Functions and Functional Data


Source: pdf

Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman

Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15289, USA. Editor: Charles Elkan. Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. [sent-7, score-1.047]

2 We develop methods for releasing functions while preserving differential privacy. [sent-9, score-0.207]

3 Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. [sent-10, score-0.182]

4 When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. [sent-11, score-0.175]

5 As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. [sent-12, score-0.16]

6 Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space 1. [sent-13, score-0.327]

7 We want to release a summary of D without compromising the privacy of those individuals in the database. [sent-15, score-0.61]

8 One framework for defining privacy rigorously in such problems is differential privacy (Dwork et al. [sent-16, score-1.204]

9 Algorithms that preserve differential privacy have been developed for boosting, parameter estimation, clustering, logistic regression, SVM learning and many other learning tasks. [sent-22, score-0.693]

10 A concept that has been important in differential privacy is the “sensitivity” of the output (Dwork et al. [sent-31, score-0.717]

11 To establish privacy a Gaussian process may be added to the function with noise level calibrated to the “RKHS sensitivity” of the output. [sent-35, score-0.511]

12 , dn ∈ Rd are a sample from a distribution with density f then we can estimate the density with the kernel density estimator f (x) = (1/n) ∑i=1,…,n W (‖x − di ‖/h), x ∈ Rd , where W is a kernel (see, for instance, Wasserman, 2006) and h > 0 is the bandwidth parameter. [sent-44, score-0.326]
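A minimal sketch of the kernel density estimator described above, taking W to be the normalized Gaussian kernel (the choice of W, the bandwidth h = 0.3, and the test data are assumptions for illustration, not values from the paper):

```python
import numpy as np

def kde(x, data, h):
    """Isotropic Gaussian kernel density estimate at query points x.

    Implements f(x) = (1/n) * sum_i W(||x - d_i|| / h) with W the
    Gaussian kernel; the 1/(2*pi*h^2)^(d/2) factor normalizes each
    bump so the estimate integrates to one.
    """
    x = np.atleast_2d(x)          # (m, d) query points
    data = np.atleast_2d(data)    # (n, d) database rows
    n, d = data.shape
    sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(-1)  # (m, n) squared distances
    w = np.exp(-sq / (2.0 * h ** 2)) / (2.0 * np.pi * h ** 2) ** (d / 2)
    return w.mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=(500, 1))   # hypothetical database
grid = np.linspace(-4.0, 4.0, 9)[:, None]
density = kde(grid, sample, h=0.3)
```

On standard-normal data the estimate is largest near zero and nonnegative everywhere, as expected of a density estimate.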

13 We may then want to release a “version” of the density estimator f in a way which fulfills the criteria of differential privacy. [sent-46, score-0.359]

14 With a differentially private density estimator in hand, a large sample of data may be drawn from that density. [sent-49, score-0.414]

15 The release of such a sample would inherit the differential privacy properties of the density estimator: see, in particular, Wasserman and Zhou (2010) and Machanavajjhala et al. [sent-50, score-0.84]

16 This is a very attractive proposition, since a differentially private sample of data could be used as the basis for any number of statistical analyses which may have been brought to bear against the original data (for instance, exploratory data analysis, model fitting, etc). [sent-52, score-0.355]

17 The preferred method for density estimation in statistics is kernel density estimation. [sent-57, score-0.152]

18 The methods developed in this paper lead to a private kernel density estimator. [sent-58, score-0.325]

19 We give methods which in essence permit the user to request the evaluation of the private function at arbitrarily many input points. [sent-63, score-0.269]

20 These points may be specified a-priori, or after the construction of the function, or even adaptively based on the outputs given; it makes no difference to the privacy guarantee. [sent-64, score-0.511]

21 However we note that in the case when the points are specified a-priori (for example, a grid over the domain), the release of the function values corresponds to the release of a finite dimensional vector. [sent-65, score-0.226]

22 In this case we may regard this work as providing a conceptually clean way to determine the sensitivity of this vector so that standard finite dimensional privacy techniques may be applied. [sent-66, score-0.608]

23 After putting our contribution in the context of some related work, we introduce some notation and review the definition of differential privacy in Section 2. [sent-68, score-0.693]

24 We also give a demonstration of a technique to achieve the differential privacy for a vector valued output. [sent-69, score-0.757]

25 The theory for demonstrating differential privacy of functions is established in Section 3. [sent-70, score-0.626]

26 In Section 4 we apply the theory to the problems of kernel density estimation and kernel SVM learning. [sent-71, score-0.16]

27 1 Related Work There are a few lines of research in the differential privacy literature that are related to this work. [sent-75, score-0.693]

28 In addition to the foundational papers mentioned above there has already been interest in the differentially private release of functions such as support vector machines. [sent-76, score-0.435]

29 (2010) and Chaudhuri and Monteleoni (2011) demonstrated that a private approximation to a nonprivate kernel support vector machine could be made by considering a certain finite dimensional projection of the original function. [sent-78, score-0.305]

30 Here we give an alternate method for the release of the classification function (or regression function) which avoids this approximation, although we do so by employing a weaker type of differential privacy. [sent-80, score-0.281]

31 Our method in essence results in an infinite dimensional private function which is not necessarily characterized by any finite dimensional vector. [sent-81, score-0.301]

32 (2010) give techniques which output a differentially private contingency table. [sent-86, score-0.376]

33 The second approach is to perform some other kind of density estimation on the input data in a way which admits the differential privacy, and then to sample that density to generate synthetic data. [sent-91, score-0.278]

34 So far differentially private density estimation is considered in Smith (2011) for parametric models and Wasserman and Zhou (2010) for non-parametric estimation. [sent-92, score-0.384]

35 In this work we give a technique which is useful for the differentially private estimation of a density in arbitrary dimension, and which achieves the same convergence rate (up to constants) as the non-private estimator. [sent-93, score-0.405]

36 It is worth noting that the availability of a differentially private synthetic data set allows the recipient to compute whatever function he wishes on the data. [sent-94, score-0.336]

37 This may seem to obviate the need for methods to compute private kernel machines and other functions. [sent-96, score-0.277]

38 On the other hand a kernel support vector machine learned on the original data may converge to a good classifier with a relatively small number of samples, and so building a private version of that directly may lead to better classification performance. [sent-98, score-0.277]

39 Therefore although private synthetic data may be seen as a panacea for differential privacy, it is important to remember that it in essence taints all analyses with the curse of dimensionality. [sent-99, score-0.427]

40 Differential Privacy Here we recall the definition of differential privacy and introduce some notation. [sent-101, score-0.693]

41 We phrase the definition of differential privacy using this characterization of randomized algorithms. [sent-116, score-0.71]

42 Typically the above definition is called “approximate differential privacy” whenever β > 0, and “(α, 0)-differential privacy” is shortened to “α-differential privacy. [sent-118, score-0.202]

43 We note that an alternate notion of adjacency for databases also appears in some of the differential privacy literature. [sent-121, score-0.715]

44 The σ-field A is rarely mentioned in the literature on differential privacy but is actually quite important. [sent-124, score-0.693]

45 (2006a) give a technique to achieve approximate differential privacy for general vector valued outputs in which the “sensitivity” may be bounded. [sent-131, score-0.757]

46 We review this below, since the result is important in the demonstration of the privacy of our methods which output functions. [sent-132, score-0.535]

47 In demonstrating the differential privacy, we make use of the following lemma which is simply an explicit statement of an argument used in a proof by Dwork et al. [sent-135, score-0.182]

48 In our next result we show that approximate differential privacy is achieved via (4) when the output is a real vector, say vD = v(D) ∈ Rd , whose dimension does not depend on the database D. [sent-145, score-0.771]

49 We note that this is basically a re-statement of the well-known fact that the addition of appropriate Gaussian noise to a vector valued output will lead to approximate differential privacy. [sent-147, score-0.249]
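The Gaussian mechanism for vector outputs can be sketched as follows. The calibration constant c(β) = √(2 log(2/β)) mirrors the paper's notation, but the exact constant varies across analyses in the literature, and the histogram example and its sensitivity bound are illustrative assumptions:

```python
import numpy as np

def gaussian_mechanism(v, l2_sensitivity, alpha, beta, rng):
    """Release v plus Gaussian noise calibrated for (alpha, beta)-DP.

    sigma is proportional to the L2 sensitivity of v over adjacent
    databases, scaled by c(beta) / alpha.
    """
    c_beta = np.sqrt(2.0 * np.log(2.0 / beta))
    sigma = l2_sensitivity * c_beta / alpha
    return v + rng.normal(0.0, sigma, size=np.shape(v))

rng = np.random.default_rng(1)
true_mean = np.array([0.2, 0.5, 0.3])   # e.g. a histogram of proportions over 3 bins
# For a length-n database, such a histogram changes by at most sqrt(2)/n
# in L2 norm when a single row is replaced.
n = 1000
release = gaussian_mechanism(true_mean, np.sqrt(2.0) / n, alpha=1.0, beta=1e-6, rng=rng)
```

With n = 1000 the noise scale is on the order of 10⁻², so the released vector stays close to the true histogram.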

50 2 The Implications of Approximate Differential Privacy The above definitions provide a strong privacy guarantee in the sense that they aim to protect against an adversary having almost complete knowledge of the private database. [sent-166, score-0.762]

51 Specifically, an adversary knowing all but one of the data elements and having observed the output of a private procedure, will remain unable to determine the identity of the data element which is unknown to him. [sent-167, score-0.275]

52 , dn−1 ), and the private database by D = (d1 , . [sent-173, score-0.275]

53 First note that before observing the output of the private algorithm, the adversary could determine that the private database D lay in the set {(d1 , . [sent-177, score-0.566]

54 Thus, the private database comprises his data with one more element. [sent-181, score-0.275]

55 The above result follows immediately from noting that the rejection region of the test is a measurable set in the space and so obeys the constraint of the differential privacy. [sent-187, score-0.182]

56 Below, we demonstrate the approximate differential privacy of measures on function spaces, by considering random variables which correspond to evaluating the random function on a finite set of points. [sent-199, score-0.693]

57 We focus on the creation of algorithms for which the differential privacy holds over the field of cylinder sets, in the sense that, for all D ∼ D′ ∈ D , P( fD ∈ A) ≤ e^α P( fD′ ∈ A) + β, ∀A ∈ F0 . [sent-215, score-0.744]

58 First we note that satisfying (7) implies that the release of any finite evaluation of the function achieves the differential privacy. [sent-217, score-0.281]

59 The claimed privacy guarantee follows from (7). [sent-235, score-0.511]

60 In principle, if it were possible for a computer to release a complete description of the function fD then this result would demonstrate the privacy guarantee achieved by our algorithm. [sent-252, score-0.61]

61 Therefore, in the case of continuous functions, differential privacy over F0 leads to differential privacy throughout the Borel σ-field. [sent-256, score-1.386]

62 In summary, we find that if every finite dimensional projection of the released function satisfies differential privacy, then so does every countable-dimensional projection. [sent-257, score-0.259]

63 We now explore techniques which achieve the differential privacy over these σ-fields. [sent-258, score-0.693]

64 2 Differential Privacy via the Exponential Mechanism A straightforward means to output a function in a way which achieves the differential privacy is to make use of the so-called “exponential mechanism” of McSherry and Talwar (2007). [sent-260, score-0.717]

65 D∼D′ McSherry and Talwar (2007) demonstrate that such a technique achieves the α-differential privacy, which is strictly stronger than the (α, β)-differential privacy we consider here. [sent-266, score-0.532]
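A sketch of the McSherry-Talwar exponential mechanism over a finite candidate set G. The distance-based utilities, the candidate set, and the sensitivity bound are hypothetical; only the sampling rule P(g) ∝ exp(α·score(g)/(2·sensitivity)) is the mechanism itself:

```python
import numpy as np

def exponential_mechanism(scores, alpha, sensitivity, rng):
    """Sample an index from a finite candidate set G.

    Candidate g is drawn with probability proportional to
    exp(alpha * score(g) / (2 * sensitivity)), which gives
    alpha-differential privacy when `sensitivity` bounds how much any
    single score can change between adjacent databases.
    """
    scores = np.asarray(scores, dtype=float)
    logits = alpha * scores / (2.0 * sensitivity)
    logits -= logits.max()          # subtract the max for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return rng.choice(len(scores), p=p), p

rng = np.random.default_rng(2)
# Hypothetical utilities: negative distance from each candidate in G
# to the non-private target function.
utilities = [-0.1, -2.0, -5.0]
idx, probs = exponential_mechanism(utilities, alpha=2.0, sensitivity=1.0, rng=rng)
```

The closest candidate is exponentially more likely to be released, but never certain, which is what preserves privacy.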

66 The technique given above can be interpreted as outputting a discrete random variable, and fulfilling the privacy definition with respect to the σ-field consisting of the powerset of G. [sent-270, score-0.532]

67 This implies the privacy with respect to the cylinder sets, since the restriction of each cylinder set to the elements of G corresponds to some subset of G. [sent-271, score-0.613]

68 Therefore, with some additional technical machinery which we will illustrate next, it is possible to move from differentially private measures over vectors to those over functions. [sent-280, score-0.336]

69 We first demonstrate the technical machinery which allows us to move from finite dimensional distributions to distributions on the function space, and then we give differentially private measures on function spaces of one dimension. [sent-288, score-0.364]

70 Then the release of fD = fD + (∆c(β)/α) G is (α, β)-differentially private (with respect to the cylinder σ-field F ) whenever the vector of differences ( fD (x1 ) − fD′ (x1 ), . . . , fD (xn ) − fD′ (xn )) satisfies the sensitivity bound (8). [sent-305, score-0.391]

71 Thus for the vector obtained by evaluation of f at those points, differential privacy is demonstrated by Proposition 3 since (8) implies the sensitivity bound (5). [sent-322, score-0.746]
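On any finite set of query points the Gaussian process G reduces to a multivariate normal with covariance K(xᵢ, xⱼ), which is what makes the finite dimensional argument above work. A sketch of evaluating the released function this way; the kernel, query points, stand-in function, c(β) constant, and the value of ∆ (assumed to bound the RKHS sensitivity) are all illustrative assumptions:

```python
import numpy as np

def private_evaluations(f_values, points, kernel, delta, alpha, beta, rng):
    """Evaluate f_D + (Delta * c(beta) / alpha) * G at finitely many points.

    The GP restricted to `points` is a multivariate normal with
    covariance matrix K(x_i, x_j), so the released vector is f(x) plus
    correlated Gaussian noise.
    """
    c_beta = np.sqrt(2.0 * np.log(2.0 / beta))
    scale = delta * c_beta / alpha
    K = np.array([[kernel(x, y) for y in points] for x in points])
    noise = rng.multivariate_normal(np.zeros(len(points)), K)
    return np.asarray(f_values) + scale * noise

rng = np.random.default_rng(3)
gauss_kernel = lambda x, y: np.exp(-0.5 * (x - y) ** 2)
xs = np.linspace(0.0, 1.0, 5)
fd = np.sin(xs)                   # stand-in for the fitted function f_D
out = private_evaluations(fd, xs, gauss_kernel, delta=0.01, alpha=1.0, beta=1e-6, rng=rng)
```

Because the noise is drawn from the same GP regardless of which points are requested, the query points may be chosen a-priori or adaptively without affecting the guarantee, matching the discussion above.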

72 Combining this with the above gives the requisite privacy statement for all A ∈ F0 . [sent-340, score-0.511]

73 4 Functions in a Reproducing Kernel Hilbert Space When the family of functions lies in the reproducing kernel Hilbert space (RKHS) which corresponds to the covariance kernel of the Gaussian process, then establishing upper bounds of the form (8) is simple. [sent-343, score-0.194]

74 Corollary 9 For { fD : D ∈ D } ⊆ H , the release of fD = fD + (∆c(β)/α) G is (α, β)-differentially private (with respect to the cylinder σ-field) whenever ∆ ≥ sup ‖ fD − fD′ ‖H . [sent-369, score-0.314]

75 Examples We now give some examples in which the above technique may be used to construct private versions of functions in an RKHS. [sent-372, score-0.242]

76 Then fD − fD′ = (Kxn − Kxn′ )/(n(2πh2 )d/2 ) and ‖ fD − fD′ ‖2H = (K(xn , xn ) + K(xn′ , xn′ ) − 2K(xn′ , xn ))/(n(2πh2 )d/2 )2 ≤ 2/(n(2πh2 )d/2 )2 . If we release [sent-387, score-0.267]

77 fD = fD + (c(β)√2/(αn(2πh2 )d/2 )) G, where G is a sample path of a Gaussian process having mean zero and covariance K, then differential privacy is demonstrated by Corollary 9. [sent-388, score-0.715]
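Putting the pieces together, a sketch of a private kernel density release: the KDE lies in the RKHS of the Gaussian kernel, its RKHS sensitivity is bounded by √2/(n(2πh²)^(d/2)), and a mean-zero GP with that covariance is added at the matching scale. The constant c(β) = √(2 log(2/β)), the grid, and the test data are assumptions for illustration:

```python
import numpy as np

def private_kde_release(data, grid, h, alpha, beta, rng):
    """Return (KDE on grid, privatized KDE on grid).

    Adds a mean-zero Gaussian process with covariance
    K(x, y) = exp(-||x - y||^2 / (2 h^2)), restricted to the grid,
    scaled by c(beta) * sqrt(2) / (alpha * n * (2*pi*h^2)^(d/2)).
    """
    data = np.atleast_2d(data)
    grid = np.atleast_2d(grid)
    n, d = data.shape
    norm = (2.0 * np.pi * h ** 2) ** (d / 2)
    sq = ((grid[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    f = np.exp(-sq / (2.0 * h ** 2)).sum(axis=1) / (n * norm)   # the KDE itself
    # GP noise on the grid: covariance matrix K(x_i, x_j).
    gsq = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gsq / (2.0 * h ** 2))
    c_beta = np.sqrt(2.0 * np.log(2.0 / beta))
    scale = c_beta * np.sqrt(2.0) / (alpha * n * norm)
    g = rng.multivariate_normal(np.zeros(len(grid)), K)
    return f, f + scale * g

rng = np.random.default_rng(4)
sample = rng.normal(0.0, 1.0, size=(2000, 1))
grid = np.linspace(-3.0, 3.0, 7)[:, None]
f_hat, f_priv = private_kde_release(sample, grid, h=0.4, alpha=1.0, beta=1e-6, rng=rng)
```

The noise scale shrinks like 1/n, which is how the private estimator can match the non-private convergence rate up to constants, as claimed in the surrounding text.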

78 For the differentially private function it is easy to see that E ∫ ( fD (x) − f (x))2 dx = O(h4 + 1/(nhd )), and therefore, at least in terms of rates, no accuracy has been lost. [sent-392, score-0.336]

79 Figure 1: An example of a kernel density estimator (the solid curve) and the released version (the dashed curve). [sent-403, score-0.183]

80 1 N ON -I SOTROPIC K ERNELS The above demonstration of privacy also holds when the kernel is replaced by a non-isotropic Gaussian kernel. [sent-416, score-0.567]

81 In this case the kernel density estimate may take the form fD (x) = (1/(n(2π)d/2 |H|1/2 )) ∑i=1,…,n exp(−(1/2)(x − xi )T H −1 (x − xi )), x ∈ T, where H is a positive definite matrix and |H| is the determinant. [sent-417, score-0.158]

82 So long as H is fixed a-priori, privacy may be established by adding a Gaussian process having mean zero and covariance given by K(x, y) = exp(−(1/2)(x − y)T H −1 (x − y)). [sent-419, score-0.533]

83 As above, the sensitivity is upper bounded, as ‖ fD − fD′ ‖2H ≤ 2/(n(2π)d/2 |H|1/2 )2 , and therefore it satisfies the (α, β)-DP to release fD = fD + (c(β)√2/(αn(2π)d/2 |H|1/2 )) G. [sent-420, score-0.152]

84 In order to do this it is necessary to find a differentially private version of h and then to employ the composition property of differential privacy. [sent-427, score-0.518]

85 This technique may be amenable to private analysis via the “exponential mechanism,” however it would evidently require that T be a compact set which is known a-priori. [sent-430, score-0.242]

86 To make a private version h j we may use the technique of Dwork and Lei (2009) in which a differentially private algorithm for the interquartile range was developed. [sent-434, score-0.595]

87 2 Functions in a Sobolev Space The above technique worked easily since we chose a particular RKHS in which we knew the kernel density estimator to live. [sent-436, score-0.155]

88 Thus, for a family of functions in one dimension which lie in the Sobolev space H 1 , we may determine a noise level necessary to achieve the differential privacy by bounding the above quantity for the difference of two functions. [sent-445, score-0.728]
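A numerical sketch of bounding that quantity: compute an H¹[0, 1]-type norm of the difference of two fitted functions to calibrate the noise level. The particular inner product ⟨f, g⟩ = f(0)g(0) + ∫ f′g′ is one common choice and an assumption here (equivalent Sobolev inner products differ in constants), and the two "adjacent" fits are hypothetical:

```python
import numpy as np

def h1_norm(f, grid):
    """Numerical H^1[0,1]-type norm of f on a uniform grid.

    Approximates sqrt(f(0)^2 + integral of f'(x)^2 dx) using finite
    differences for f' and the trapezoid rule for the integral
    (implemented inline to avoid NumPy-version differences).
    """
    vals = f(grid)
    deriv = np.gradient(vals, grid)
    d2 = deriv ** 2
    quad = float(np.sum(np.diff(grid) * (d2[:-1] + d2[1:]) / 2.0))
    return float(np.sqrt(vals[0] ** 2 + quad))

grid = np.linspace(0.0, 1.0, 2001)
# Sensitivity proxy: H^1 norm of the difference of two fits computed on
# adjacent databases (hypothetical functions for illustration).
f1 = lambda x: np.sin(2.0 * np.pi * x)
f2 = lambda x: np.sin(2.0 * np.pi * x) + 0.01 * x
delta = h1_norm(lambda x: f1(x) - f2(x), grid)
```

Here the difference is −0.01x, whose H¹ norm under this inner product is exactly 0.01, so the numerical value serves as the ∆ in the noise calibration.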

89 f (x1 , . . . , xd ) = f1 (x1 ) · · · fd (xd ), where fi ∈ H 1 [0, 1]. [sent-450, score-0.629]

90 We revisit the example of a kernel density estimator (with an isotropic Gaussian kernel). [sent-458, score-0.154]

91 (2010) noted that these bounds are useful for establishing the noise level required for differential privacy of support vector machines. [sent-469, score-0.693]

92 They are also useful for our approach to privacy in a function space. [sent-470, score-0.511]

93 Figure 2: An example of a kernel density estimator (the solid curve) and the released version (the dashed curve). [sent-490, score-0.183]

94 The setup is the same as in Figure 1, but the privacy mechanism developed in Section 4. [sent-491, score-0.531]

95 In the bottom image are the same data points, with the predictions of the private kernel SVM. [sent-538, score-0.277]

96 Conclusion We have shown how to add random noise to a function in such a way that differential privacy is preserved. [sent-577, score-0.693]

97 Interesting future work will be to address the issue of lower bounds for private functions. [sent-579, score-0.221]

98 Specifically, we can ask: Given that we want to release a differentially private function, what is the least amount of noise that must necessarily be added in order to preserve differential privacy? [sent-580, score-0.617]

99 A simple and practical algorithm for differentially private data release. [sent-703, score-0.336]

100 Differentially private recommender systems: building privacy into the net. [sent-733, score-0.732]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fd', 0.629), ('privacy', 0.511), ('pd', 0.228), ('private', 0.221), ('differential', 0.182), ('differentially', 0.115), ('ld', 0.109), ('wasserman', 0.108), ('vd', 0.108), ('dwork', 0.102), ('ifferential', 0.1), ('inaldo', 0.1), ('rivacy', 0.1), ('release', 0.099), ('kx', 0.083), ('rkhs', 0.081), ('mcsherry', 0.071), ('kxi', 0.071), ('unctions', 0.071), ('kernel', 0.056), ('xn', 0.056), ('database', 0.054), ('sensitivity', 0.053), ('cylinder', 0.051), ('released', 0.049), ('density', 0.048), ('eld', 0.044), ('valued', 0.043), ('dpd', 0.043), ('chaudhuri', 0.042), ('owner', 0.041), ('reproducing', 0.041), ('kxl', 0.033), ('rt', 0.031), ('estimator', 0.03), ('adversary', 0.03), ('dimensional', 0.028), ('hardt', 0.028), ('mironov', 0.028), ('nissim', 0.028), ('talwar', 0.028), ('xi', 0.027), ('borel', 0.026), ('bertinet', 0.025), ('releasing', 0.025), ('proposition', 0.025), ('gaussian', 0.024), ('rd', 0.024), ('output', 0.024), ('bandwidth', 0.024), ('request', 0.024), ('essence', 0.024), ('bousquet', 0.022), ('covariance', 0.022), ('databases', 0.022), ('sobolev', 0.022), ('hilbert', 0.022), ('technique', 0.021), ('billingsley', 0.021), ('monteleoni', 0.021), ('nhd', 0.021), ('adler', 0.021), ('whenever', 0.02), ('mechanism', 0.02), ('isotropic', 0.02), ('functionals', 0.02), ('family', 0.019), ('gi', 0.019), ('rubinstein', 0.019), ('barak', 0.019), ('ramsay', 0.019), ('bear', 0.019), ('countable', 0.018), ('zhou', 0.018), ('randomized', 0.017), ('elisseeff', 0.017), ('agnan', 0.017), ('cylinders', 0.017), ('interquartile', 0.017), ('kxk', 0.017), ('kxn', 0.017), ('mcgregor', 0.017), ('privatized', 0.017), ('rob', 0.017), ('ci', 0.016), ('symposium', 0.016), ('conceptually', 0.016), ('contingency', 0.016), ('lay', 0.016), ('dn', 0.016), ('carnegie', 0.015), ('mellon', 0.015), ('nite', 0.015), ('minimizers', 0.015), ('larry', 0.015), ('dominating', 0.015), ('kasiviswanathan', 0.014), ('machanavajjhala', 0.014), ('raskhodnikova', 0.014), ('rinaldo', 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 32 jmlr-2013-Differential Privacy for Functions and Functional Data

Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman

Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space

2 0.42789534 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction

3 0.16992381 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods

Author: Robert Hable

Abstract: In supervised learning problems, global and local learning algorithms are used. In contrast to global learning algorithms, the prediction of a local learning algorithm in a testing point is only based on training data which are close to the testing point. Every global algorithm such as support vector machines (SVM) can be localized in the following way: in every testing point, the (global) learning algorithm is not applied to the whole training data but only to the k nearest neighbors (kNN) of the testing point. In case of support vector machines, the success of such mixtures of SVM and kNN (called SVM-KNN) has been shown in extensive simulation studies and also for real data sets but only little has been known on theoretical properties so far. In the present article, it is shown how a large class of regularized kernel methods (including SVM) can be localized in order to get a universally consistent learning algorithm. Keywords: machine learning, regularized kernel methods, localization, SVM, k-nearest neighbors, SVM-KNN

4 0.066324912 86 jmlr-2013-Parallel Vector Field Embedding

Author: Binbin Lin, Xiaofei He, Chiyuan Zhang, Ming Ji

Abstract: We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature. Keywords: manifold learning, isometry, vector field, covariant derivative, out-of-sample extension

5 0.061193813 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis

Author: Fang Han, Tuo Zhao, Han Liu

Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics

6 0.049590979 76 jmlr-2013-Nonparametric Sparsity and Regularization

7 0.0411397 57 jmlr-2013-Kernel Bayes' Rule: Bayesian Inference with Positive Definite Kernels

8 0.03937567 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

9 0.032056551 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification

10 0.025714317 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting

11 0.024772231 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

12 0.023432178 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

13 0.023108538 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

14 0.021109069 114 jmlr-2013-The Rate of Convergence of AdaBoost

15 0.020569585 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

16 0.02038466 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

17 0.020204538 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

18 0.019971475 96 jmlr-2013-Regularization-Free Principal Curve Estimation

19 0.01990829 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

20 0.019727532 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.149), (1, 0.061), (2, 0.064), (3, -0.123), (4, -0.02), (5, -0.224), (6, -0.064), (7, 0.733), (8, -0.087), (9, 0.078), (10, 0.176), (11, 0.095), (12, -0.079), (13, 0.031), (14, 0.126), (15, -0.036), (16, 0.013), (17, 0.06), (18, -0.017), (19, -0.018), (20, -0.024), (21, 0.021), (22, 0.077), (23, 0.001), (24, 0.023), (25, -0.028), (26, 0.03), (27, 0.003), (28, -0.011), (29, -0.006), (30, -0.002), (31, -0.02), (32, 0.029), (33, -0.0), (34, -0.024), (35, 0.036), (36, -0.008), (37, 0.003), (38, 0.007), (39, 0.021), (40, 0.002), (41, -0.024), (42, 0.024), (43, 0.011), (44, -0.012), (45, 0.028), (46, 0.019), (47, -0.0), (48, -0.021), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95514965 32 jmlr-2013-Differential Privacy for Functions and Functional Data

Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman

Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space

2 0.82517803 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction

3 0.33889619 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods

Author: Robert Hable

Abstract: In supervised learning problems, global and local learning algorithms are used. In contrast to global learning algorithms, the prediction of a local learning algorithm in a testing point is only based on training data which are close to the testing point. Every global algorithm such as support vector machines (SVM) can be localized in the following way: in every testing point, the (global) learning algorithm is not applied to the whole training data but only to the k nearest neighbors (kNN) of the testing point. In case of support vector machines, the success of such mixtures of SVM and kNN (called SVM-KNN) has been shown in extensive simulation studies and also for real data sets but only little has been known on theoretical properties so far. In the present article, it is shown how a large class of regularized kernel methods (including SVM) can be localized in order to get a universally consistent learning algorithm. Keywords: machine learning, regularized kernel methods, localization, SVM, k-nearest neighbors, SVM-KNN
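The localization scheme this abstract describes — apply the global learner only to the k nearest neighbors of each test point — can be sketched generically. The `fit` argument stands in for any global algorithm such as an SVM; the nearest-centroid learner below is a toy stand-in, not the paper's method.

```python
import numpy as np

def localized_predict(fit, X_train, y_train, x_test, k=10):
    """Localize a global learner: train it only on the k nearest
    neighbors of the test point, then predict there. `fit` maps
    (X, y) to a predictor function."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    idx = np.argsort(dists)[:k]               # k nearest neighbors
    model = fit(X_train[idx], y_train[idx])   # local training set only
    return model(x_test)

def centroid_fit(X, y):
    """Toy global learner (nearest-centroid) standing in for an SVM."""
    mu = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    return lambda x: min(mu, key=lambda c: np.linalg.norm(x - mu[c]))
```

Swapping `centroid_fit` for a regularized kernel method recovers the SVM-KNN construction whose consistency the paper studies.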

4 0.16156988 76 jmlr-2013-Nonparametric Sparsity and Regularization

Author: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri

Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods. Keywords: sparsity, nonparametric, variable selection, regularization, proximal methods, RKHS

5 0.15755931 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis

Author: Fang Han, Tuo Zhao, Han Liu

Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics
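The rank-based estimation step this abstract mentions can be sketched concretely: estimate Kendall's tau from the sample and map it to a Pearson-type correlation through the Gaussian-copula identity ρ = sin(πτ/2). This is a generic textbook sketch (the O(n²) estimator, ties contributing zero), not the CODA implementation.

```python
import math

def kendall_correlation(x, y):
    """Rank-based correlation: sample Kendall's tau mapped to a
    correlation via the Gaussian-copula identity rho = sin(pi*tau/2)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (x[i] - x[j]) * (y[i] - y[j])
            s += (p > 0) - (p < 0)          # +1 concordant, -1 discordant
    tau = s / (n * (n - 1) / 2)
    return math.sin(math.pi * tau / 2)
```

Applying this entrywise yields the robust covariance-matrix estimate on which the discriminant rule is built.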

6 0.1519628 86 jmlr-2013-Parallel Vector Field Embedding

7 0.15137838 57 jmlr-2013-Kernel Bayes' Rule: Bayesian Inference with Positive Definite Kernels

8 0.11389755 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

9 0.10780887 104 jmlr-2013-Sparse Single-Index Model

10 0.10656407 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification

11 0.10570558 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

12 0.10557286 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning

13 0.098832339 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

14 0.098757356 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

15 0.09548904 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

16 0.093406811 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

17 0.091740072 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

18 0.090436392 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

19 0.089340642 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

20 0.087488249 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.019), (5, 0.105), (6, 0.031), (10, 0.055), (13, 0.091), (20, 0.026), (23, 0.034), (48, 0.334), (53, 0.011), (68, 0.022), (70, 0.025), (75, 0.052), (85, 0.038), (87, 0.015), (93, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.70311993 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability

Author: Wei Sun, Junhui Wang, Yixin Fang

Abstract: Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. Its effectiveness is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data. Keywords: kappa coefficient, penalized regression, selection consistency, stability, tuning
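The stability measure this abstract is built on — agreement between the variable sets selected on different subsamples — reduces to Cohen's kappa over the p candidate variables. A minimal sketch of that coefficient (the tuning procedure around it is omitted):

```python
def selection_kappa(s1, s2, p):
    """Cohen's kappa between two variable-selection outcomes over p
    candidate variables; s1 and s2 are sets of selected indices."""
    a = len(s1 & s2)            # selected by both
    b = len(s1 - s2)            # selected only by the first
    c = len(s2 - s1)            # selected only by the second
    d = p - a - b - c           # selected by neither
    po = (a + d) / p            # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / (p * p)  # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 1.0
```

A tuning parameter whose selected sets agree across subsamples (kappa near 1) is then preferred, which is the selection-stability criterion the paper analyzes.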

same-paper 2 0.69968605 32 jmlr-2013-Differential Privacy for Functions and Functional Data

Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman


3 0.45025119 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction

4 0.37863034 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright

Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error (MSE) that decays as O(N^-1 + (N/m)^-2). Whenever m ≤ √N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O(N^-1 + (N/m)^-3), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O(N^-1 + (N/m)^-3/2), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 10^8 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling
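The first algorithm in this abstract is simple enough to sketch end to end for least squares: split the N samples across m "machines", solve locally, and average the m estimates (one vector of communication per machine). This is an illustrative sketch of the averaging scheme, not the paper's code.

```python
import numpy as np

def average_mixture_ols(X, y, m):
    """Average-mixture sketch for least squares: split the data into m
    even parts, solve OLS on each part, and average the m estimates."""
    parts_X = np.array_split(X, m)
    parts_y = np.array_split(y, m)
    thetas = [np.linalg.lstsq(Xi, yi, rcond=None)[0]
              for Xi, yi in zip(parts_X, parts_y)]
    return np.mean(thetas, axis=0)
```

The paper's analysis shows when this one-shot average matches the centralized estimator's rate, and its bootstrap variant improves the bias term.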

5 0.37331095 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

Author: Takafumi Kanamori, Akiko Takeda, Taiji Suzuki

Abstract: There are two main approaches to binary classification problems: the loss function approach and the uncertainty set approach. The loss function approach is widely used in real-world data analysis. Statistical decision theory has been used to elucidate its properties such as statistical consistency. Conditional probabilities can also be estimated by using the minimum solution of the loss function. In the uncertainty set approach, an uncertainty set is defined for each binary label from training samples. The best separating hyperplane between the two uncertainty sets is used as the decision function. Although the uncertainty set approach provides an intuitive understanding of learning algorithms, its statistical properties have not been sufficiently studied. In this paper, we show that the uncertainty set is deeply connected with the convex conjugate of a loss function. On the basis of the conjugate relation, we propose a way of revising the uncertainty set approach so that it will have good statistical properties such as statistical consistency. We also introduce statistical models corresponding to uncertainty sets in order to estimate conditional probabilities. Finally, we present numerical experiments, verifying that the learning with revised uncertainty sets improves the prediction accuracy. Keywords: loss function, uncertainty set, convex conjugate, consistency

6 0.37075445 76 jmlr-2013-Nonparametric Sparsity and Regularization

7 0.37044927 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

8 0.37007979 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

9 0.3698453 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

10 0.36968046 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

11 0.36957946 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

12 0.36909232 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

13 0.36900902 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

14 0.36698145 73 jmlr-2013-Multicategory Large-Margin Unified Machines

15 0.36687985 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

16 0.36638361 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

17 0.36593151 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression

18 0.36489981 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion

19 0.3644993 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

20 0.36416829 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso