jmlr jmlr2013 jmlr2013-5 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
5 Much of today's machine learning is performed on the vast amounts of personal information collected by private companies and government agencies about individuals: examples include user or customer behaviors, demographic surveys, and test results from experimental subjects or patients.
6 We study approximations to PCA which guarantee differential privacy, a cryptographically motivated definition of privacy (Dwork et al., 2006).
7 Differential privacy measures privacy risk by a parameter ε_p that bounds the log-likelihood ratio of the output of a (private) algorithm under two databases differing in a single individual.
8 The goal of this paper is to characterize the problem of differentially private PCA.
9 In order to understand the performance gap, we prove sample complexity bounds for the case k = 1 for SULQ and PPCA, as well as a general lower bound on the sample complexity for any differentially private algorithm.
10 Furthermore, we show that any differentially private algorithm requires Ω(d) samples.
11 These theoretical results suggest that our experiments demonstrate the limit of how well ε_p-differentially private algorithms can perform, and our experiments show that this gap should persist for general k.
12 However, we believe this is a consequence of the fact that we make minimal assumptions on the data; our results imply that, absent additional limitations on the data set, the sample complexity of differentially private PCA must grow linearly with the data dimension.
13 Differential privacy is a mathematical definition, but algorithms must be implemented on finite-precision machines.
14 A second set of issues is theoretical: while the privacy guarantees of PPCA hold for all k, our theoretical analysis of sample complexity applies only to k = 1, in which case the distances and angles between vectors are directly related.
15 Under differential privacy, the selection of k itself must be done in a differentially private manner.
16 1 Related Work. Differential privacy was first proposed by Dwork et al. (2006).
17 Differential privacy has been shown to have strong semantic guarantees (Dwork et al.).
18 In particular, it contrasts with so-called syntactic definitions of privacy (Sweeney, 2002; Machanavajjhala et al.).
19 There are several general approaches to constructing differentially private approximations to a desired algorithm or computation.
20 The exponential mechanism (McSherry and Talwar, 2007) can be used to perform differentially private selection based on a score function that measures the quality of different outputs.
21 Such mechanisms, together with the methods of Dwork and Lei (2009), have been used to approximate a variety of statistical, machine learning, and data mining tasks under differential privacy (Barak et al., 2007).
22 This paper deals with the problem of differentially private approximations to PCA.
23 Subsequent to our work, Kapralov and Talwar (2013) proposed a dynamic programming algorithm for differentially private low-rank matrix approximation which involves sampling from a distribution induced by the exponential mechanism.
24 Hardt and Roth (2013) show a method for guaranteeing (ε, δ)-differential privacy under an entry-wise neighborhood condition using the power method for calculating singular values.
25 Earlier work (2009) analyzed a differentially private data release scheme in which a random linear transformation is applied to the data to preserve differential privacy, and then measured how much this transformation affects the utility of a PCA of the data.
26 Later work (2012) shows that the JL transform of the data preserves differential privacy provided the minimum singular value of the data matrix is large.
27 A further study (2013) considers the problem of estimating the distance between data points with differential privacy using a random projection of the data points.
28 There has been significant work on other notions of privacy based on manipulating entries within the database (Sweeney, 2002; Machanavajjhala et al.).
29 For more details on these and other alternative notions of privacy, see the survey by Fung et al.
30 Our work differs from this line of work in that we offer differential privacy guarantees, and we release only the PCA subspace, not the actual data.
31 The data set is D = {x_1, . . . , x_n}, where each x_i corresponds to the private value of one individual, x_i ∈ R^d, and ‖x_i‖ ≤ 1 for all i.
32 We study approximations to PCA that preserve the privacy of the underlying data.
33 The notion of privacy that we use is differential privacy, which quantifies the privacy guaranteed by a randomized algorithm A applied to a data set D.
34 Definition 4. An algorithm A(B) taking values in a set T provides (ε_p, δ)-differential privacy if P(A(D) ∈ S) ≤ e^{ε_p} · P(A(D′) ∈ S) + δ for all measurable S ⊆ T and all data sets D and D′ differing in a single entry.
35 Here ε_p and δ are privacy parameters; smaller ε_p and δ ensure more privacy (Dwork et al.).
36 The second privacy guarantee is weaker; the parameter δ bounds the probability of failure and is typically chosen to be quite small.
37 In this paper we are interested in proving results on the sample complexity of differentially private algorithms that approximate PCA.
38 That is, for a given ε_p and ρ, how large must the number of individuals n in the data set be so that the algorithm is both ε_p-differentially private and a (ρ, η)-close approximation to PCA?
39 It is well known that as the number of individuals n grows, it is easier to guarantee the same level of privacy with relatively less noise or perturbation, and therefore the utility of the approximation also improves.
40 Our results characterize how the privacy ε_p and utility ρ scale with n, and the tradeoff between them for fixed n.
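As a minimal illustration of the calibration behind Definition 4 (this sketch is not from the paper; the function name is my own, and the standard Laplace mechanism is used in place of the PCA algorithms discussed here): for a scalar query such as the mean of bounded records, ε_p-differential privacy can be obtained by adding Laplace noise scaled to the query's sensitivity, and the noise needed shrinks as n grows, matching the intuition in sentence 39.

```python
import numpy as np

def private_mean(data, epsilon, bound=1.0, rng=None):
    """epsilon-differentially private mean of records in [-bound, bound].

    Changing one record moves the mean by at most 2*bound/n (the query's
    sensitivity), so Laplace noise with scale sensitivity/epsilon gives
    epsilon-differential privacy for this single query.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    sensitivity = 2.0 * bound / n
    return float(np.mean(data)) + rng.laplace(scale=sensitivity / epsilon)
```

Note that the noise scale is O(1/(n·ε)): with more individuals, the same privacy level costs less accuracy, which is exactly the sample-complexity tradeoff the paper studies for PCA.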
41 Algorithms and Results. In this section we describe differentially private techniques for approximating (2).
42 Both procedures are differentially private approximations to the top-k subspace: SULQ guarantees (ε_p, δ)-differential privacy and PPCA guarantees ε_p-differential privacy.
43 The SULQ method perturbs each entry of the empirical second-moment matrix A to ensure differential privacy and releases the top-k eigenvectors of this perturbed matrix.
44 This guarantees a weaker privacy definition known as (ε_p, δ)-differential privacy.
45 Algorithm 1: MOD-SULQ (input perturbation). Inputs: d × n data matrix X, privacy parameter ε_p, parameter δ. Outputs: d × k matrix V̂_k = [v̂_1 v̂_2 · · · v̂_k] with orthonormal columns. Step 1: set A = (1/n) XX^T.
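A minimal sketch of the input-perturbation idea in Algorithm 1 (the function name is my own; the noise scale β is taken as a given parameter here, whereas in the paper it is derived from ε_p and δ, and the exact symmetric-noise construction may differ):

```python
import numpy as np

def mod_sulq_sketch(X, beta, k, rng=None):
    """Input-perturbation PCA: noise the empirical second-moment matrix,
    then return the top-k eigenvectors of the noisy matrix.

    X    : d x n data matrix whose columns have norm at most 1.
    beta : standard deviation of the Gaussian perturbation (assumed
           pre-computed from the privacy parameters).
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    A = (X @ X.T) / n                     # empirical second-moment matrix
    E = rng.normal(scale=beta, size=(d, d))
    E = (E + E.T) / 2.0                   # symmetrize the noise matrix
    evals, evecs = np.linalg.eigh(A + E)
    order = np.argsort(evals)[::-1]       # eigenvalues in decreasing order
    return evecs[:, order[:k]]
```

With β = 0 this reduces to ordinary PCA; larger β buys privacy at the cost of eigenvector accuracy, which is the tradeoff the sample-complexity results below quantify.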
46 2 Exponential Mechanism. Our new method, PPCA, randomly samples a k-dimensional subspace from a distribution that ensures differential privacy and is biased towards high utility.
47 The algorithm and its privacy properties apply to general k < d, but our theoretical results on the utility focus on the special case k = 1.
48 By combining results on the exponential mechanism with properties of the PCA algorithm, we can show that this procedure is differentially private.
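For k = 1 the exponential mechanism samples a unit vector u with density proportional to exp((ε_p n/2)·uᵀAu), a Bingham distribution, which PPCA samples via a Gibbs sampler over the sphere. The sketch below (function name my own) illustrates only the exponential-mechanism weighting, restricted to a finite candidate set for simplicity; it is not the paper's sampler:

```python
import numpy as np

def exp_mech_top_direction(A, candidates, n, eps_p, rng=None):
    """Sample one of the candidate unit vectors u with probability
    proportional to exp((eps_p * n / 2) * u^T A u), i.e., the exponential
    mechanism with score q(u) = u^T A u."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([0.5 * eps_p * n * (c @ A @ c) for c in candidates])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

As n or ε_p grows, the distribution concentrates on the top eigenvector of A, which is why the utility of PPCA improves with more samples at a fixed privacy level.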
49 4 Main Results. Our theoretical results are sample complexity bounds for PPCA and MOD-SULQ, as well as a general lower bound on the sample complexity for any ε_p-differentially private algorithm.
50 These results show that PPCA is nearly optimal in terms of the scaling of the sample complexity with respect to the data dimension d, the privacy parameter ε_p, and the eigengap ∆.
51 We further show that MOD-SULQ requires more samples as a function of d, despite having a slightly weaker privacy guarantee.
52 Theorem 5. For the β in (5), Algorithm MOD-SULQ is (ε_p, δ)-differentially private.
53 The fact that these two algorithms are differentially private follows from some simple calculations.
54 Our first sample complexity result provides an upper bound on the number of samples required by PPCA to guarantee a certain level of privacy and accuracy.
55 Theorem 7 (Sample complexity of PPCA). If
n > (8λ_1 d)/(ε_p ∆(1−ρ)) · ( log(1/η)/d + 2 log( 4/((1−ρ²)∆) ) ),
then the top PCA direction v_1 and the output v̂_1 of PPCA with privacy parameter ε_p satisfy Pr( |⟨v̂_1, v_1⟩| > ρ ) ≥ 1 − η.
56 Theorem 8. For any ρ ≥ 1 − exp(−2·(…)), no ε_p-differentially private algorithm A can approximate PCA with expected utility greater than ρ on all databases with n points in dimension d having eigenvalue gap ∆, where n < (d/(ε_p ∆)) · max( 1, √((1−φ)/(80(1−ρ))) ).
57 Theorem 7 shows that if n scales like (d/(ε_p ∆(1−ρ))) log(1/(1−ρ²)), then PPCA produces an approximation v̂_1 that has correlation ρ with v_1, whereas Theorem 8 shows that n must scale like d/(ε_p ∆ √(1−ρ)) for any ε_p-differentially private algorithm.
58 Proof. Fix a privacy level ε_p, target correlation ρ, and probability η.
59 3 Lower Bound on Utility. We now turn to a general lower bound on the sample complexity for any differentially private approximation to PCA.
60 For such a collection, Lemma 12 shows that for any differentially private mechanism, the average correlation over the collection cannot be too large.
61 That is, any ε_p-differentially private mechanism cannot have high utility on all K data sets.
62 The following lemma gives an upper bound on the expected utility averaged over a set of databases which differ in a "small" number of elements.
63 If A is any ε_p-differentially private algorithm, then
∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/16)(1 − max_{i≠j} ⟨u_i, u_j⟩) ).
64 Proof. Let t = min_{i≠j} min( ‖u_i − u_j‖, ‖u_i + u_j‖ ), and let G_i be the "double cap" around ±u_i of radius t/2: G_i = {u : ‖u − u_i‖ < t/2} ∪ {u : ‖u + u_i‖ < t/2}.
65 Rewriting the norms in terms of inner products shows
∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − ((K−1)/(8K)) (1 − max_{i≠j} ⟨u_i, u_j⟩) ) ≤ K ( 1 − (1/16)(1 − max_{i≠j} ⟨u_i, u_j⟩) ).
66 , K}, for at least one database i, EA [| A (Di ), ui |] ≤ 1 − ln(K−1) εp points with top eigenvectors 1 1 − max ui , u j 16 for any ε p -differentially private algorithm. [sent-298, score-0.44]
67 This yields max_{i≠j} ⟨u_i, u_j⟩ ≤ 1 − (a−λ)²/(b² + (a−λ)²) (17); to obtain an upper bound on max_{i≠j} ⟨u_i, u_j⟩ we must therefore lower bound (a−λ)²/(b² + (a−λ)²).
68 Thus, for β ≤ ∆ ≤ 1/2, any ε_p-differentially private algorithm needs Ω( d/(ε_p ∆ √(1−ρ)) ) points to achieve expected inner product ρ on all data sets with eigengap ∆.
69 As the top eigenvector of the i-th database is u_i = w_i, we have max_{i≠j} |⟨u_i, u_j⟩| = max_{i≠j} |⟨w_i, w_j⟩| ≤ φ.
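The lower-bound argument relies on a packing of nearly orthogonal unit vectors (the w_i above). A quick numerical sketch of why such packings exist, using random directions (the function name is my own, and this illustrates the packing idea rather than the paper's explicit construction):

```python
import numpy as np

def random_near_orthogonal_packing(K, d, seed=0):
    """Draw K random unit vectors in R^d and report the largest pairwise
    |<u_i, u_j>|.  In high dimension this maximum is small, so random
    directions already give a packing with small phi."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(K, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # project onto the sphere
    G = np.abs(U @ U.T)
    np.fill_diagonal(G, 0.0)                        # ignore self inner products
    return U, float(G.max())
```

With K = 20 and d = 1000, the maximal pairwise inner product is typically around 0.1, i.e., the vectors are nearly orthogonal; no private algorithm can be simultaneously accurate on all K databases built from such directions.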
70 Theorem 15 provides a lower bound on the distance between the vector released by MOD-SULQ and the true top eigenvector, in terms of the privacy parameters ε_p and δ and the number of points n in the data set.
72 Theorem 15 (Utility bound for MOD-SULQ). Let d, n, and ε_p > 0 be given, and let β be given by Algorithm 1, so that the output of MOD-SULQ is (ε_p, δ)-differentially private for all data sets in R^d with n elements.
73 The interaction of MCMC procedures and differential privacy is a rich area for future research.
74 We choose k = 4 for kddcup, k = 8 for census, k = 10 for localization, and k = 11 for insurance; in each case, the utility q_F(V_k) of the top-k PCA subspace of the data matrix accounts for at least 80% of ‖A‖_F.
75 Data set sizes:
Data Set       #instances   #dimensions
kddcup         494,021      116
census         199,523      513
localization   164,860      44
insurance      9,822        150
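The utility measure q_F used throughout can be computed directly. A small sketch (the function name is my own; q_F(V) = tr(VᵀAV) is the captured-variance reading of the top-k subspace utility assumed here):

```python
import numpy as np

def qf_utility(A, V):
    """q_F(V) = trace(V^T A V): the variance of the data captured by the
    subspace spanned by the orthonormal columns of V."""
    return float(np.trace(V.T @ A @ V))
```

For A = diag(3, 2, 1) and V spanning the first two coordinates, q_F(V) = 5, while the best one-dimensional subspace attains 3; the experiments compare this quantity for private and non-private outputs.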
76 The "ideal" execution of PPCA provides ε_p-differential privacy, but because our implementation only approximates sampling from the Bingham distribution, we cannot guarantee that the implementation provides this privacy guarantee.
77 As noted by Mironov (2012), even current implementations of floating-point arithmetic may suffer from privacy problems, so there is still significant work to do between theory and implementation.
78 Figure 2(a) shows log F_ki(T) for k = 4 as a function of the iteration T for 40,000 steps of the Gibbs sampler on the kddcup data set.
79 log F_ki(T) also appears to have a larger variance for kddcup than for insurance; this is explained by the fact that the kddcup data set has a much larger number of samples, which makes its stationary distribution farther from the initial distribution of the sampler.
80 These plots illustrate how additional data can improve the utility of the output for a fixed privacy level ε_p.
81 As n increases, the dashed blue line indicating the utility of PPCA begins to approach q_F(V_k), the utility of the optimal subspace.
82 Our theoretical results suggest that the performance of differentially private PCA cannot be significantly improved over the performance of PPCA, but since those results hold for k = 1 they do not immediately apply here.
83 Figure 3: plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random subsets of the data and randomness in the algorithms; panel (b) shows kddcup (k = 4).
84 The top red line is the PCA algorithm without privacy constraints.
85 Figure 4: plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random subsets of the data and randomness in the algorithms; panel (b) shows insurance (k = 11).
86 The top red line is the PCA algorithm without privacy constraints.
87 The green and purple dashed lines are nearly indistinguishable for insurance but diverge for localization; they represent the utility from random projections and MOD-SULQ, respectively.
88 In this case the space spanned by the top two eigenvectors has most of the energy, so we chose k = 2 and plotted the utility q_F(·) for non-private PCA and MOD-SULQ with δ = 0.…
89 The privacy requirement relaxes as ε_p increases, and both MOD-SULQ and PPCA approach the utility of PCA without privacy constraints.
90 [Figure: utility q_F(U) versus the privacy parameter.]
91 Conclusion. In this paper we investigated the theoretical and empirical performance of differentially private approximations to PCA.
92 Because PPCA uses the exponential mechanism with q_F(·) as the utility function, it is not surprising that it performs well.
93 We furthermore showed that PPCA is nearly optimal, in that any differentially private approximation to PCA must use Ω(d) samples.
94 The description of differentially private algorithms assumes an ideal model of computation: real systems require additional security assumptions that have to be verified.
95 Numerical optimization methods are used in some privacy methods (Chaudhuri et al.).
96 For minimax lower bounds, it may be possible to construct a packing with respect to a general utility metric.
wordName wordTfidf (topN-words)
[('privacy', 0.558), ('ppca', 0.372), ('private', 0.242), ('qf', 0.19), ('differentially', 0.181), ('doi', 0.157), ('ear', 0.144), ('omponents', 0.144), ('sulq', 0.144), ('arwate', 0.137), ('haudhuri', 0.137), ('inha', 0.137), ('rincipal', 0.123), ('rivate', 0.123), ('utility', 0.122), ('pca', 0.118), ('url', 0.106), ('dwork', 0.105), ('mcsherry', 0.097), ('ptimal', 0.096), ('differential', 0.094), ('lgorithm', 0.075), ('insurance', 0.07), ('bingham', 0.068), ('chaudhuri', 0.067), ('vk', 0.063), ('subspace', 0.061), ('kamalika', 0.061), ('kddcup', 0.061), ('gibbs', 0.057), ('ui', 0.055), ('sampler', 0.054), ('fki', 0.053), ('talwar', 0.052), ('eigenvector', 0.051), ('mcmc', 0.049), ('hoff', 0.046), ('hardt', 0.045), ('log', 0.043), ('census', 0.041), ('packing', 0.04), ('nissim', 0.039), ('fi', 0.038), ('csisz', 0.038), ('eigengap', 0.038), ('blum', 0.037), ('localization', 0.037), ('mechanism', 0.036), ('zi', 0.036), ('sd', 0.035), ('cynthia', 0.035), ('eigenvectors', 0.035), ('qa', 0.035), ('http', 0.034), ('chain', 0.034), ('nt', 0.033), ('orthonormal', 0.033), ('brooks', 0.032), ('machanavajjhala', 0.032), ('mironov', 0.032), ('copies', 0.032), ('individuals', 0.032), ('anand', 0.03), ('frank', 0.03), ('kapralov', 0.03), ('kaushik', 0.03), ('roberts', 0.03), ('projections', 0.03), ('di', 0.029), ('database', 0.029), ('adam', 0.029), ('sensitivity', 0.028), ('bound', 0.027), ('databases', 0.027), ('subspaces', 0.027), ('perturbation', 0.027), ('june', 0.026), ('gi', 0.026), ('ny', 0.026), ('ilya', 0.026), ('nonprivate', 0.026), ('sphere', 0.026), ('markov', 0.025), ('top', 0.024), ('ln', 0.024), ('ai', 0.024), ('adds', 0.024), ('matrix', 0.024), ('stoc', 0.024), ('exp', 0.023), ('avrim', 0.023), ('ganta', 0.023), ('gareth', 0.023), ('kobbi', 0.023), ('narayan', 0.023), ('nina', 0.023), ('packings', 0.023), ('sarwate', 0.023), ('wichita', 0.023), ('gap', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
2 0.42789534 32 jmlr-2013-Differential Privacy for Functions and Functional Data
Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space
3 0.050006192 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression
Author: Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar
Abstract: We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method (PCA-OLS) is within a constant factor (namely 4) of the risk of ridge regression (RR). Keywords: risk inflation, ridge regression, pca
4 0.044969562 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
Author: Tony Cai, Jianqing Fan, Tiefeng Jiang
Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere
5 0.042821743 78 jmlr-2013-On the Learnability of Shuffle Ideals
Author: Dana Angluin, James Aspnes, Sarah Eisenstat, Aryeh Kontorovich
Abstract: PAC learning of unrestricted regular languages is long known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time if RP ≠ NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. Keywords: PAC learning, statistical queries, regular languages, deterministic finite automata, shuffle ideals, subsequences
6 0.041406725 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
7 0.040901355 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
8 0.037782591 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
9 0.036019143 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
10 0.035670649 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
11 0.035505306 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
12 0.034301884 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
13 0.034241159 96 jmlr-2013-Regularization-Free Principal Curve Estimation
14 0.03256477 104 jmlr-2013-Sparse Single-Index Model
15 0.03169056 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
16 0.031335346 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
17 0.030347755 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris
18 0.030000187 86 jmlr-2013-Parallel Vector Field Embedding
19 0.029214595 114 jmlr-2013-The Rate of Convergence of AdaBoost
20 0.02868633 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
topicId topicWeight
[(0, -0.176), (1, 0.045), (2, 0.058), (3, -0.082), (4, -0.022), (5, -0.194), (6, -0.065), (7, 0.653), (8, -0.09), (9, 0.159), (10, 0.245), (11, 0.07), (12, -0.158), (13, 0.054), (14, 0.17), (15, -0.041), (16, -0.034), (17, -0.037), (18, -0.086), (19, 0.014), (20, 0.019), (21, 0.003), (22, 0.063), (23, -0.043), (24, 0.025), (25, -0.008), (26, -0.006), (27, -0.005), (28, -0.021), (29, -0.027), (30, -0.056), (31, 0.032), (32, 0.004), (33, 0.06), (34, 0.016), (35, -0.035), (36, -0.014), (37, 0.013), (38, 0.019), (39, 0.063), (40, -0.004), (41, 0.025), (42, 0.003), (43, 0.055), (44, -0.024), (45, 0.003), (46, -0.01), (47, -0.011), (48, 0.011), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.91297871 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
2 0.89663261 32 jmlr-2013-Differential Privacy for Functions and Functional Data
3 0.18990934 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Author: Tony Cai, Wen-Xin Zhou
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed. Keywords: 1-bit matrix completion, low-rank matrix, max-norm, trace-norm, constrained optimization, maximum likelihood estimate, optimal rate of convergence
4 0.17595868 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
Author: Vinayak Rao, Yee Whye Teh
Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-ofthe-art MCMC samplers for these models. Keywords: Markov jump process, MCMC, Gibbs sampler, uniformization, Markov-modulated Poisson process, continuous-time Bayesian network
5 0.17481709 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
Author: Xiao-Tong Yuan, Tong Zhang
Abstract: This paper considers the sparse eigenvalue problem, which is to extract dominant (largest) sparse eigenvectors with at most k non-zero components. We propose a simple yet effective solution called truncated power method that can approximately solve the underlying nonconvex optimization problem. A strong sparse recovery result is proved for the truncated power method, and this theory is our key motivation for developing the new algorithm. The proposed method is tested on applications such as sparse principal component analysis and the densest k-subgraph problem. Extensive experiments on several synthetic and real-world data sets demonstrate the competitive empirical performance of our method. Keywords: sparse eigenvalue, power method, sparse principal component analysis, densest k-subgraph
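The truncation step this abstract describes is compact enough to sketch. Below is a minimal illustrative implementation under our own naming and defaults (a symmetric input matrix and random initialization are assumed; the authors' algorithm includes refinements, such as initialization strategies, not shown here):

```python
import numpy as np

def truncated_power_method(A, k, iters=100, seed=0):
    """Approximate the dominant k-sparse eigenvector of a symmetric matrix A.

    Each iteration applies one power step, keeps only the k largest-magnitude
    coordinates (the truncation step), and renormalizes.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        idx = np.argpartition(np.abs(y), -k)[-k:]  # indices of k largest |y_i|
        t = np.zeros(n)
        t[idx] = y[idx]
        norm = np.linalg.norm(t)
        if norm == 0.0:  # degenerate iterate; stop rather than divide by zero
            break
        x = t / norm
    return x
```

On a rank-one matrix built from a k-sparse vector, the iterate recovers that vector's support exactly, which is the sparse-recovery behavior the abstract refers to.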
6 0.15631942 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
7 0.15331429 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
8 0.14321356 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
9 0.14103274 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
10 0.13894053 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression
11 0.13631275 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation
12 0.1326248 78 jmlr-2013-On the Learnability of Shuffle Ideals
13 0.13259508 104 jmlr-2013-Sparse Single-Index Model
14 0.12984729 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
15 0.12894155 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
16 0.1282154 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
17 0.12726744 86 jmlr-2013-Parallel Vector Field Embedding
18 0.12699407 53 jmlr-2013-Improving CUR Matrix Decomposition and the Nystrom Approximation via Adaptive Sampling
19 0.12635671 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
20 0.12488854 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
topicId topicWeight
[(0, 0.021), (5, 0.098), (6, 0.037), (9, 0.015), (10, 0.076), (13, 0.4), (20, 0.029), (23, 0.028), (68, 0.021), (70, 0.021), (75, 0.056), (85, 0.028), (87, 0.04), (89, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.66960996 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
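For context, a common baseline in this literature is input perturbation: add a symmetric noise matrix to the empirical second-moment matrix before eigendecomposition. The sketch below illustrates that baseline only, not the near-optimal method this paper proposes, and `noise_scale` is left as a free parameter rather than calibrated to actual privacy parameters:

```python
import numpy as np

def noisy_pca(X, k, noise_scale, seed=0):
    """Illustrative input-perturbation baseline for private PCA (NOT the
    paper's proposed algorithm): perturb the d x d second-moment matrix
    with a symmetric Gaussian noise matrix, then take top-k eigenvectors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = (X.T @ X) / n
    E = rng.normal(scale=noise_scale, size=(d, d))
    E = (E + E.T) / 2.0           # symmetrize so the spectrum stays real
    vals, vecs = np.linalg.eigh(A + E)
    return vecs[:, -k:][:, ::-1]  # top-k eigenvectors, largest first
```

With `noise_scale=0` this reduces to ordinary PCA, which is a useful sanity check when experimenting with the privacy/utility trade-off.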
2 0.41931081 32 jmlr-2013-Differential Privacy for Functions and Functional Data
Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space
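The core idea here, releasing a function plus a Gaussian-process sample path, can be sketched for kernel density estimation on a grid. This is an illustrative sketch only: `noise_scale` stands in for the RKHS-sensitivity-calibrated level the paper derives, and the jitter term is a numerical convenience of ours:

```python
import numpy as np

def private_kde_on_grid(data, grid, bandwidth, noise_scale, seed=0):
    """Sketch: Gaussian KDE evaluated on a grid, plus one sample path of a
    Gaussian process whose covariance kernel matches the KDE's kernel.

    NOTE: a real differentially private release would set noise_scale via
    the RKHS sensitivity analysis; here it is a free parameter.
    """
    rng = np.random.default_rng(seed)
    # kernel density estimate at each grid point
    diffs = grid[:, None] - data[None, :]
    f = np.exp(-0.5 * (diffs / bandwidth) ** 2).mean(axis=1)
    f /= bandwidth * np.sqrt(2.0 * np.pi)
    # one GP sample path on the grid (jitter keeps the Cholesky stable)
    gd = grid[:, None] - grid[None, :]
    K = np.exp(-0.5 * (gd / bandwidth) ** 2) + 1e-6 * np.eye(len(grid))
    noise = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))
    return f + noise_scale * noise
```

Because the noise is a smooth GP path rather than independent per-point noise, the released curve stays function-like, which is the point of the construction.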
Author: Alberto N. Escalante-B., Laurenz Wiskott
Abstract: Supervised learning from high-dimensional data, for example, multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the final label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space and propose several types of training graphs. For classification, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient as well as when feature selection is difficult. Keywords: slow feature analysis, feature extraction, classification, regression, pattern recognition, training graphs, nonlinear dimensionality reduction, supervised learning, implicitly supervised, high-dimensional data, image analysis
4 0.33771756 59 jmlr-2013-Large-scale SVD and Manifold Learning
Author: Ameet Talwalkar, Sanjiv Kumar, Mehryar Mohri, Henry Rowley
Abstract: This paper examines the efficacy of sampling-based low-rank approximation techniques when applied to large dense kernel matrices. We analyze two common approximate singular value decomposition techniques, namely the Nyström and Column sampling methods. We present a theoretical comparison between these two methods, provide novel insights regarding their suitability for various tasks and present experimental results that support our theory. Our results illustrate the relative strengths of each method. We next examine the performance of these two techniques on the large-scale task of extracting low-dimensional manifold structure given millions of high-dimensional face images. We address the computational challenges of non-linear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. We present extensive experiments on learning low-dimensional embeddings for two large face data sets: CMU-PIE (35 thousand faces) and a web data set (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column sampling method for this task. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set. Keywords: low-rank approximation, manifold learning, large-scale matrix factorization
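The column-sampling construction behind the Nyström approximation is simple to state. A minimal sketch, assuming a PSD kernel matrix and uniform column sampling (the paper also analyzes other sampling regimes and the competing Column sampling estimator):

```python
import numpy as np

def nystrom(K, m, seed=0):
    """Rank-(at most m) Nystrom approximation of an n x n PSD kernel matrix K.

    Uniformly samples m columns; with C = K[:, idx] and W = K[idx, idx],
    the approximation is C @ pinv(W) @ C.T.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, idx]                       # n x m slice of sampled columns
    W = K[np.ix_(idx, idx)]             # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T
```

When K has rank at most m and the sampled columns span its column space, the reconstruction is exact; the interesting regime studied in the paper is the inexact one.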
5 0.33730978 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
Author: Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial, Klaus Obermayer
Abstract: Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance. Keywords: reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation ©2013 Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial and Klaus Obermayer.
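The slowness objective underlying SFA, which both this abstract and the GSFA abstract above build on, can be illustrated in the linear case: minimize the average squared time-difference of the extracted signal under unit-variance and decorrelation constraints, which reduces to a generalized eigenproblem. A minimal sketch (linear SFA only; the nonlinear, graph-based, and importance-sampled variants discussed in these papers are not shown):

```python
import numpy as np

def linear_sfa(X, k):
    """Linear SFA: find k weight vectors w minimizing the average squared
    time-difference of w @ x(t), subject to unit variance and decorrelation.

    Equivalent to the generalized eigenproblem A w = lam * B w, solved here
    by whitening with B and diagonalizing the whitened A.
    """
    Xc = X - X.mean(axis=0)
    dX = np.diff(Xc, axis=0)
    A = (dX.T @ dX) / len(dX)        # covariance of the time-differenced signal
    B = (Xc.T @ Xc) / len(Xc)        # covariance of the signal itself
    evals, E = np.linalg.eigh(B)
    Wh = E / np.sqrt(evals)          # whitening transform: Wh.T @ B @ Wh = I
    vals, V = np.linalg.eigh(Wh.T @ A @ Wh)
    return Wh @ V[:, :k]             # smallest eigenvalues = slowest features
```

On a linear mixture of a slow sinusoid and fast noise, the first extracted feature recovers the slow source up to sign and scale.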
6 0.33659595 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
7 0.33369172 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression
8 0.33338365 86 jmlr-2013-Parallel Vector Field Embedding
10 0.33169386 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
11 0.33159283 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
12 0.33118448 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning
13 0.33113784 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
14 0.33107728 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
15 0.33061033 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
16 0.33047512 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
17 0.33005992 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
18 0.3290554 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
19 0.32874888 22 jmlr-2013-Classifying With Confidence From Incomplete Information
20 0.32854524 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion