jmlr jmlr2013 jmlr2013-5 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
5 Much of today's machine learning is performed on the vast amounts of personal information collected by private companies and government agencies about individuals: examples include user or customer behaviors, demographic surveys, and test results from experimental subjects or patients.
6 We study approximations to PCA which guarantee differential privacy, a cryptographically motivated definition of privacy (Dwork et al., 2006).
7 Differential privacy measures privacy risk by a parameter ε_p that bounds the log-likelihood ratio of the output of a (private) algorithm under two databases differing in a single individual.
8 The goal of this paper is to characterize the problem of differentially private PCA.
9 In order to understand the performance gap, we prove sample complexity bounds for the case k = 1 for SULQ and PPCA, as well as a general lower bound on the sample complexity for any differentially private algorithm.
10 Furthermore, we show that any differentially private algorithm requires Ω(d) samples.
11 These theoretical results suggest that our experiments demonstrate the limit of how well ε_p-differentially private algorithms can perform, and our experiments show that this gap should persist for general k.
12 However, we believe this is a consequence of the fact that we make minimal assumptions on the data; our results imply that, absent additional limitations on the data set, the sample complexity of differentially private PCA must grow linearly with the data dimension.
13 Differential privacy is a mathematical definition, but algorithms must be implemented on finite-precision machines.
14 A second set of issues is theoretical: while the privacy guarantees of PPCA hold for all k, our theoretical analysis of sample complexity applies only to k = 1, in which case the distances and angles between vectors are directly related.
15 Under differential privacy, the selection of k itself must be done in a differentially private manner.
16 1 Related Work. Differential privacy was first proposed by Dwork et al. (2006).
17 Differential privacy has been shown to have strong semantic guarantees (Dwork et al.).
18 In particular, it contrasts with so-called syntactic definitions of privacy (Sweeney, 2002; Machanavajjhala et al.).
19 There are several general approaches to constructing differentially private approximations to a desired algorithm or computation.
20 The exponential mechanism (McSherry and Talwar, 2007) can be used to perform differentially private selection based on a score function that measures the quality of different outputs.
21 Such mechanisms, together with the methods of Dwork and Lei (2009), have been used to approximate a variety of statistical, machine learning, and data mining tasks under differential privacy (Barak et al., 2007).
22 This paper deals with the problem of differentially private approximations to PCA.
23 Subsequent to our work, Kapralov and Talwar (2013) proposed a dynamic programming algorithm for differentially private low-rank matrix approximation which involves sampling from a distribution induced by the exponential mechanism.
24 Hardt and Roth (2013) show a method for guaranteeing (ε, δ)-differential privacy under an entry-wise neighborhood condition using the power method for calculating singular values.
25 Earlier work (2009) analyzed a differentially private data release scheme in which a random linear transformation is applied to the data to preserve differential privacy, and then measured how much this transformation affects the utility of a PCA of the data.
26 Later work (2012) shows that the JL transform of the data preserves differential privacy provided the minimum singular value of the data matrix is large.
27 A further study (2013) considers the problem of estimating the distance between data points with differential privacy using a random projection of the data points.
28 There has been significant work on other notions of privacy based on manipulating entries within the database (Sweeney, 2002; Machanavajjhala et al.).
29 For more details on these and other alternative notions of privacy, see the survey by Fung et al.
30 Our work differs from this line of work in that we offer differential privacy guarantees, and we release only the PCA subspace, not the actual data.
31 The data set is D = {x_1, . . . , x_n}, where each x_i corresponds to the private value of one individual, x_i ∈ R^d, and ‖x_i‖ ≤ 1 for all i.
32 We study approximations to PCA that preserve the privacy of the underlying data.
33 The notion of privacy that we use is differential privacy, which quantifies the privacy guaranteed by a randomized algorithm A applied to a data set D.
34 Definition 4. An algorithm A(B) taking values in a set T provides (ε_p, δ)-differential privacy if P(A(D) ∈ S) ≤ e^{ε_p} · P(A(D′) ∈ S) + δ for all measurable S ⊆ T and all data sets D and D′ differing in a single entry.
35 Here ε_p and δ are privacy parameters; smaller ε_p and δ ensure more privacy (Dwork et al.).
36 The second privacy guarantee is weaker; the parameter δ bounds the probability of failure and is typically chosen to be quite small.
37 In this paper we are interested in proving results on the sample complexity of differentially private algorithms that approximate PCA.
38 That is, for a given ε_p and ρ, how large must the number of individuals n in the data set be so that the algorithm is both ε_p-differentially private and a (ρ, η)-close approximation to PCA?
39 It is well known that as the number of individuals n grows, it is easier to guarantee the same level of privacy with relatively less noise or perturbation, and therefore the utility of the approximation also improves.
40 Our results characterize how the privacy ε_p and utility ρ scale with n, and the tradeoff between them for fixed n.
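As a minimal illustration of the calibration behind Definition 4 (this sketch is not from the paper; the function name is my own, and the standard Laplace mechanism is used in place of the PCA algorithms discussed here): for a scalar query such as the mean of bounded records, ε_p-differential privacy can be obtained by adding Laplace noise scaled to the query's sensitivity, and the noise needed shrinks as n grows, matching the intuition in sentence 39.

```python
import numpy as np

def private_mean(data, epsilon, bound=1.0, rng=None):
    """epsilon-differentially private mean of records in [-bound, bound].

    Changing one record moves the mean by at most 2*bound/n (the query's
    sensitivity), so Laplace noise with scale sensitivity/epsilon gives
    epsilon-differential privacy for this single query.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    sensitivity = 2.0 * bound / n
    return float(np.mean(data)) + rng.laplace(scale=sensitivity / epsilon)
```

Note that the noise scale is O(1/(n·ε)): with more individuals, the same privacy level costs less accuracy, which is exactly the sample-complexity tradeoff the paper studies for PCA.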
41 Algorithms and Results. In this section we describe differentially private techniques for approximating (2).
42 Both procedures are differentially private approximations to the top-k subspace: SULQ guarantees (ε_p, δ)-differential privacy and PPCA guarantees ε_p-differential privacy.
43 The SULQ method perturbs each entry of the empirical second-moment matrix A to ensure differential privacy and releases the top-k eigenvectors of this perturbed matrix.
44 This guarantees a weaker privacy definition known as (ε_p, δ)-differential privacy.
45 Algorithm 1: MOD-SULQ (input perturbation). Inputs: d × n data matrix X, privacy parameter ε_p, parameter δ. Outputs: d × k matrix V̂_k = [v̂_1 v̂_2 · · · v̂_k] with orthonormal columns. Step 1: set A = (1/n) XX^T.
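A minimal sketch of the input-perturbation idea in Algorithm 1 (the function name is my own; the noise scale β is taken as a given parameter here, whereas in the paper it is derived from ε_p and δ, and the exact symmetric-noise construction may differ):

```python
import numpy as np

def mod_sulq_sketch(X, beta, k, rng=None):
    """Input-perturbation PCA: noise the empirical second-moment matrix,
    then return the top-k eigenvectors of the noisy matrix.

    X    : d x n data matrix whose columns have norm at most 1.
    beta : standard deviation of the Gaussian perturbation (assumed
           pre-computed from the privacy parameters).
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    A = (X @ X.T) / n                     # empirical second-moment matrix
    E = rng.normal(scale=beta, size=(d, d))
    E = (E + E.T) / 2.0                   # symmetrize the noise matrix
    evals, evecs = np.linalg.eigh(A + E)
    order = np.argsort(evals)[::-1]       # eigenvalues in decreasing order
    return evecs[:, order[:k]]
```

With β = 0 this reduces to ordinary PCA; larger β buys privacy at the cost of eigenvector accuracy, which is the tradeoff the sample-complexity results below quantify.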
46 2 Exponential Mechanism. Our new method, PPCA, randomly samples a k-dimensional subspace from a distribution that ensures differential privacy and is biased towards high utility.
47 The algorithm and its privacy properties apply to general k < d, but our theoretical results on the utility focus on the special case k = 1.
48 By combining results on the exponential mechanism with properties of the PCA algorithm, we can show that this procedure is differentially private.
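For k = 1 the exponential mechanism samples a unit vector u with density proportional to exp((ε_p n/2)·uᵀAu), a Bingham distribution, which PPCA samples via a Gibbs sampler over the sphere. The sketch below (function name my own) illustrates only the exponential-mechanism weighting, restricted to a finite candidate set for simplicity; it is not the paper's sampler:

```python
import numpy as np

def exp_mech_top_direction(A, candidates, n, eps_p, rng=None):
    """Sample one of the candidate unit vectors u with probability
    proportional to exp((eps_p * n / 2) * u^T A u), i.e., the exponential
    mechanism with score q(u) = u^T A u."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([0.5 * eps_p * n * (c @ A @ c) for c in candidates])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

As n or ε_p grows, the distribution concentrates on the top eigenvector of A, which is why the utility of PPCA improves with more samples at a fixed privacy level.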
49 4 Main Results. Our theoretical results are sample complexity bounds for PPCA and MOD-SULQ, as well as a general lower bound on the sample complexity for any ε_p-differentially private algorithm.
50 These results show that PPCA is nearly optimal in terms of the scaling of the sample complexity with respect to the data dimension d, the privacy parameter ε_p, and the eigengap ∆.
51 We further show that MOD-SULQ requires more samples as a function of d, despite having a slightly weaker privacy guarantee.
52 Theorem 5. For the β in (5), Algorithm MOD-SULQ is (ε_p, δ)-differentially private.
53 The fact that these two algorithms are differentially private follows from some simple calculations.
54 Our first sample complexity result provides an upper bound on the number of samples required by PPCA to guarantee a certain level of privacy and accuracy.
55 Theorem 7 (Sample complexity of PPCA). If
n > (8λ_1 d)/(ε_p ∆(1−ρ)) · ( log(1/η)/d + 2 log( 4/((1−ρ²)∆) ) ),
then the top PCA direction v_1 and the output v̂_1 of PPCA with privacy parameter ε_p satisfy Pr( |⟨v̂_1, v_1⟩| > ρ ) ≥ 1 − η.
56 Theorem 8. For any ρ ≥ 1 − exp(−2·(…)), no ε_p-differentially private algorithm A can approximate PCA with expected utility greater than ρ on all databases with n points in dimension d having eigenvalue gap ∆, where n < (d/(ε_p ∆)) · max( 1, √((1−φ)/(80(1−ρ))) ).
57 Theorem 7 shows that if n scales like (d/(ε_p ∆(1−ρ))) log(1/(1−ρ²)), then PPCA produces an approximation v̂_1 that has correlation ρ with v_1, whereas Theorem 8 shows that n must scale like d/(ε_p ∆ √(1−ρ)) for any ε_p-differentially private algorithm.
58 Proof. Fix a privacy level ε_p, target correlation ρ, and probability η.
59 3 Lower Bound on Utility. We now turn to a general lower bound on the sample complexity for any differentially private approximation to PCA.
60 For such a collection, Lemma 12 shows that for any differentially private mechanism, the average correlation over the collection cannot be too large.
61 That is, any ε_p-differentially private mechanism cannot have high utility on all K data sets.
62 The following lemma gives an upper bound on the expected utility averaged over a set of databases which differ in a "small" number of elements.
63 If A is any ε_p-differentially private algorithm, then
∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − (1/16)(1 − max_{i≠j} ⟨u_i, u_j⟩) ).
64 Proof. Let t = min_{i≠j} min( ‖u_i − u_j‖, ‖u_i + u_j‖ ), and let G_i be the "double cap" around ±u_i of radius t/2: G_i = {u : ‖u − u_i‖ < t/2} ∪ {u : ‖u + u_i‖ < t/2}.
65 Rewriting the norms in terms of inner products shows
∑_{i=1}^{K} E_A[ |⟨A(D_i), u_i⟩| ] ≤ K ( 1 − ((K−1)/(8K)) (1 − max_{i≠j} ⟨u_i, u_j⟩) ) ≤ K ( 1 − (1/16)(1 − max_{i≠j} ⟨u_i, u_j⟩) ).
66 , K}, for at least one database i, EA [| A (Di ), ui |] ≤ 1 − ln(K−1) εp points with top eigenvectors 1 1 − max ui , u j 16 for any ε p -differentially private algorithm. [sent-298, score-0.44]
67 This yields max_{i≠j} ⟨u_i, u_j⟩ ≤ 1 − (a−λ)²/(b² + (a−λ)²) (17); to obtain an upper bound on max_{i≠j} ⟨u_i, u_j⟩ we must therefore lower bound (a−λ)²/(b² + (a−λ)²).
68 Thus, for β ≤ ∆ ≤ 1/2, any ε_p-differentially private algorithm needs Ω( d/(ε_p ∆ √(1−ρ)) ) points to achieve expected inner product ρ on all data sets with eigengap ∆.
69 As the top eigenvector of the i-th database is u_i = w_i, we have max_{i≠j} |⟨u_i, u_j⟩| = max_{i≠j} |⟨w_i, w_j⟩| ≤ φ.
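The lower-bound argument relies on a packing of nearly orthogonal unit vectors (the w_i above). A quick numerical sketch of why such packings exist, using random directions (the function name is my own, and this illustrates the packing idea rather than the paper's explicit construction):

```python
import numpy as np

def random_near_orthogonal_packing(K, d, seed=0):
    """Draw K random unit vectors in R^d and report the largest pairwise
    |<u_i, u_j>|.  In high dimension this maximum is small, so random
    directions already give a packing with small phi."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(K, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # project onto the sphere
    G = np.abs(U @ U.T)
    np.fill_diagonal(G, 0.0)                        # ignore self inner products
    return U, float(G.max())
```

With K = 20 and d = 1000, the maximal pairwise inner product is typically around 0.1, i.e., the vectors are nearly orthogonal; no private algorithm can be simultaneously accurate on all K databases built from such directions.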
70 Theorem 15 provides a lower bound on the distance between the vector released by MOD-SULQ and the true top eigenvector, in terms of the privacy parameters ε_p and δ and the number of points n in the data set.
72 Theorem 15 (Utility bound for MOD-SULQ). Let d, n, and ε_p > 0 be given, and let β be given by Algorithm 1, so that the output of MOD-SULQ is (ε_p, δ)-differentially private for all data sets in R^d with n elements.
73 The interaction of MCMC procedures and differential privacy is a rich area for future research.
74 We choose k = 4 for kddcup, k = 8 for census, k = 10 for localization, and k = 11 for insurance; in each case, the utility q_F(V_k) of the top-k PCA subspace of the data matrix accounts for at least 80% of ‖A‖_F.
75 Data set sizes:
Data Set       #instances   #dimensions
kddcup         494,021      116
census         199,523      513
localization   164,860      44
insurance      9,822        150
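The utility measure q_F used throughout can be computed directly. A small sketch (the function name is my own; q_F(V) = tr(VᵀAV) is the captured-variance reading of the top-k subspace utility assumed here):

```python
import numpy as np

def qf_utility(A, V):
    """q_F(V) = trace(V^T A V): the variance of the data captured by the
    subspace spanned by the orthonormal columns of V."""
    return float(np.trace(V.T @ A @ V))
```

For A = diag(3, 2, 1) and V spanning the first two coordinates, q_F(V) = 5, while the best one-dimensional subspace attains 3; the experiments compare this quantity for private and non-private outputs.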
76 The "ideal" execution of PPCA provides ε_p-differential privacy, but because our implementation only approximates sampling from the Bingham distribution, we cannot guarantee that the implementation provides this privacy guarantee.
77 As noted by Mironov (2012), even current implementations of floating-point arithmetic may suffer from privacy problems, so there is still significant work to do between theory and implementation.
78 Figure 2(a) shows log F_ki(T) for k = 4 as a function of the iteration T for 40,000 steps of the Gibbs sampler on the kddcup data set.
79 log F_ki(T) also appears to have a larger variance for kddcup than for insurance; this is explained by the fact that the kddcup data set has a much larger number of samples, which makes its stationary distribution farther from the initial distribution of the sampler.
80 These plots illustrate how additional data can improve the utility of the output for a fixed privacy level ε_p.
81 As n increases, the dashed blue line indicating the utility of PPCA begins to approach q_F(V_k), the utility of the optimal subspace.
82 Our theoretical results suggest that the performance of differentially private PCA cannot be significantly improved over the performance of PPCA, but since those results hold for k = 1 they do not immediately apply here.
83 Figure 3: plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random subsets of the data and randomness in the algorithms; panel (b) shows kddcup (k = 4).
84 The top red line is the PCA algorithm without privacy constraints.
85 Figure 4: plot of the unnormalized utility q_F(U) versus the sample size n, averaged over random subsets of the data and randomness in the algorithms; panel (b) shows insurance (k = 11).
86 The top red line is the PCA algorithm without privacy constraints.
87 The green and purple dashed lines are nearly indistinguishable for insurance but diverge for localization; they represent the utility from random projections and MOD-SULQ, respectively.
88 In this case the space spanned by the top two eigenvectors has most of the energy, so we chose k = 2 and plotted the utility q_F(·) for non-private PCA and MOD-SULQ with δ = 0.…
89 The privacy requirement relaxes as ε_p increases, and both MOD-SULQ and PPCA approach the utility of PCA without privacy constraints.
90 [Figure: utility q_F(U) versus the privacy parameter.]
91 Conclusion. In this paper we investigated the theoretical and empirical performance of differentially private approximations to PCA.
92 Because PPCA uses the exponential mechanism with q_F(·) as the utility function, it is not surprising that it performs well.
93 We furthermore showed that PPCA is nearly optimal, in that any differentially private approximation to PCA must use Ω(d) samples.
94 The description of differentially private algorithms assumes an ideal model of computation: real systems require additional security assumptions that have to be verified.
95 Numerical optimization methods are used in some privacy methods (Chaudhuri et al.).
96 For minimax lower bounds, it may be possible to construct a packing with respect to a general utility metric.
wordName wordTfidf (topN-words)
[('privacy', 0.558), ('ppca', 0.372), ('private', 0.242), ('qf', 0.19), ('differentially', 0.181), ('doi', 0.157), ('ear', 0.144), ('omponents', 0.144), ('sulq', 0.144), ('arwate', 0.137), ('haudhuri', 0.137), ('inha', 0.137), ('rincipal', 0.123), ('rivate', 0.123), ('utility', 0.122), ('pca', 0.118), ('url', 0.106), ('dwork', 0.105), ('mcsherry', 0.097), ('ptimal', 0.096), ('differential', 0.094), ('lgorithm', 0.075), ('insurance', 0.07), ('bingham', 0.068), ('chaudhuri', 0.067), ('vk', 0.063), ('subspace', 0.061), ('kamalika', 0.061), ('kddcup', 0.061), ('gibbs', 0.057), ('ui', 0.055), ('sampler', 0.054), ('fki', 0.053), ('talwar', 0.052), ('eigenvector', 0.051), ('mcmc', 0.049), ('hoff', 0.046), ('hardt', 0.045), ('log', 0.043), ('census', 0.041), ('packing', 0.04), ('nissim', 0.039), ('fi', 0.038), ('csisz', 0.038), ('eigengap', 0.038), ('blum', 0.037), ('localization', 0.037), ('mechanism', 0.036), ('zi', 0.036), ('sd', 0.035), ('cynthia', 0.035), ('eigenvectors', 0.035), ('qa', 0.035), ('http', 0.034), ('chain', 0.034), ('nt', 0.033), ('orthonormal', 0.033), ('brooks', 0.032), ('machanavajjhala', 0.032), ('mironov', 0.032), ('copies', 0.032), ('individuals', 0.032), ('anand', 0.03), ('frank', 0.03), ('kapralov', 0.03), ('kaushik', 0.03), ('roberts', 0.03), ('projections', 0.03), ('di', 0.029), ('database', 0.029), ('adam', 0.029), ('sensitivity', 0.028), ('bound', 0.027), ('databases', 0.027), ('subspaces', 0.027), ('perturbation', 0.027), ('june', 0.026), ('gi', 0.026), ('ny', 0.026), ('ilya', 0.026), ('nonprivate', 0.026), ('sphere', 0.026), ('markov', 0.025), ('top', 0.024), ('ln', 0.024), ('ai', 0.024), ('adds', 0.024), ('matrix', 0.024), ('stoc', 0.024), ('exp', 0.023), ('avrim', 0.023), ('ganta', 0.023), ('gareth', 0.023), ('kobbi', 0.023), ('narayan', 0.023), ('nina', 0.023), ('packings', 0.023), ('sarwate', 0.023), ('wichita', 0.023), ('gap', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
2 0.42789534 32 jmlr-2013-Differential Privacy for Functions and Functional Data
Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space
3 0.050006192 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression
Author: Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar
Abstract: We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method (PCA-OLS) is within a constant factor (namely 4) of the risk of ridge regression (RR). Keywords: risk inflation, ridge regression, pca
4 0.044969562 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
Author: Tony Cai, Jianqing Fan, Tiefeng Jiang
Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere
5 0.042821743 78 jmlr-2013-On the Learnability of Shuffle Ideals
Author: Dana Angluin, James Aspnes, Sarah Eisenstat, Aryeh Kontorovich
Abstract: PAC learning of unrestricted regular languages is long known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time if RP ≠ NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. Keywords: PAC learning, statistical queries, regular languages, deterministic finite automata, shuffle ideals, subsequences
6 0.041406725 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
7 0.040901355 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
8 0.037782591 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
9 0.036019143 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
10 0.035670649 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
11 0.035505306 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
12 0.034301884 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
13 0.034241159 96 jmlr-2013-Regularization-Free Principal Curve Estimation
14 0.03256477 104 jmlr-2013-Sparse Single-Index Model
15 0.03169056 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
16 0.031335346 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
17 0.030347755 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris
18 0.030000187 86 jmlr-2013-Parallel Vector Field Embedding
19 0.029214595 114 jmlr-2013-The Rate of Convergence of AdaBoost
20 0.02868633 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
topicId topicWeight
[(0, -0.176), (1, 0.045), (2, 0.058), (3, -0.082), (4, -0.022), (5, -0.194), (6, -0.065), (7, 0.653), (8, -0.09), (9, 0.159), (10, 0.245), (11, 0.07), (12, -0.158), (13, 0.054), (14, 0.17), (15, -0.041), (16, -0.034), (17, -0.037), (18, -0.086), (19, 0.014), (20, 0.019), (21, 0.003), (22, 0.063), (23, -0.043), (24, 0.025), (25, -0.008), (26, -0.006), (27, -0.005), (28, -0.021), (29, -0.027), (30, -0.056), (31, 0.032), (32, 0.004), (33, 0.06), (34, 0.016), (35, -0.035), (36, -0.014), (37, 0.013), (38, 0.019), (39, 0.063), (40, -0.004), (41, 0.025), (42, 0.003), (43, 0.055), (44, -0.024), (45, 0.003), (46, -0.01), (47, -0.011), (48, 0.011), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.91297871 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
2 0.89663261 32 jmlr-2013-Differential Privacy for Functions and Functional Data
3 0.18990934 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Author: Tony Cai, Wen-Xin Zhou
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed. Keywords: 1-bit matrix completion, low-rank matrix, max-norm, trace-norm, constrained optimization, maximum likelihood estimate, optimal rate of convergence
4 0.17595868 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
Author: Vinayak Rao, Yee Whye Teh
Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-ofthe-art MCMC samplers for these models. Keywords: Markov jump process, MCMC, Gibbs sampler, uniformization, Markov-modulated Poisson process, continuous-time Bayesian network
5 0.17481709 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
Author: Xiao-Tong Yuan, Tong Zhang
Abstract: This paper considers the sparse eigenvalue problem, which is to extract dominant (largest) sparse eigenvectors with at most k non-zero components. We propose a simple yet effective solution called truncated power method that can approximately solve the underlying nonconvex optimization problem. A strong sparse recovery result is proved for the truncated power method, and this theory is our key motivation for developing the new algorithm. The proposed method is tested on applications such as sparse principal component analysis and the densest k-subgraph problem. Extensive experiments on several synthetic and real-world data sets demonstrate the competitive empirical performance of our method. Keywords: sparse eigenvalue, power method, sparse principal component analysis, densest k-subgraph
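The truncation step this abstract describes is compact enough to sketch. Below is a minimal illustrative implementation under our own naming and defaults (a symmetric input matrix and random initialization are assumed; the authors' algorithm includes refinements, such as initialization strategies, not shown here):

```python
import numpy as np

def truncated_power_method(A, k, iters=100, seed=0):
    """Approximate the dominant k-sparse eigenvector of a symmetric matrix A.

    Each iteration applies one power step, keeps only the k largest-magnitude
    coordinates (the truncation step), and renormalizes.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        idx = np.argpartition(np.abs(y), -k)[-k:]  # indices of k largest |y_i|
        t = np.zeros(n)
        t[idx] = y[idx]
        norm = np.linalg.norm(t)
        if norm == 0.0:  # degenerate iterate; stop rather than divide by zero
            break
        x = t / norm
    return x
```

On a rank-one matrix built from a k-sparse vector, the iterate recovers that vector's support exactly, which is the sparse-recovery behavior the abstract refers to.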
6 0.15631942 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
7 0.15331429 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
8 0.14321356 117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods
9 0.14103274 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
10 0.13894053 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression
11 0.13631275 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation
12 0.1326248 78 jmlr-2013-On the Learnability of Shuffle Ideals
13 0.13259508 104 jmlr-2013-Sparse Single-Index Model
14 0.12984729 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
15 0.12894155 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
16 0.1282154 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
17 0.12726744 86 jmlr-2013-Parallel Vector Field Embedding
18 0.12699407 53 jmlr-2013-Improving CUR Matrix Decomposition and the Nystrom Approximation via Adaptive Sampling
19 0.12635671 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
20 0.12488854 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
topicId topicWeight
[(0, 0.021), (5, 0.098), (6, 0.037), (9, 0.015), (10, 0.076), (13, 0.4), (20, 0.029), (23, 0.028), (68, 0.021), (70, 0.021), (75, 0.056), (85, 0.028), (87, 0.04), (89, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.66960996 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
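For context, a common baseline in this literature is input perturbation: add a symmetric noise matrix to the empirical second-moment matrix before eigendecomposition. The sketch below illustrates that baseline only, not the near-optimal method this paper proposes, and `noise_scale` is left as a free parameter rather than calibrated to actual privacy parameters:

```python
import numpy as np

def noisy_pca(X, k, noise_scale, seed=0):
    """Illustrative input-perturbation baseline for private PCA (NOT the
    paper's proposed algorithm): perturb the d x d second-moment matrix
    with a symmetric Gaussian noise matrix, then take top-k eigenvectors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = (X.T @ X) / n
    E = rng.normal(scale=noise_scale, size=(d, d))
    E = (E + E.T) / 2.0           # symmetrize so the spectrum stays real
    vals, vecs = np.linalg.eigh(A + E)
    return vecs[:, -k:][:, ::-1]  # top-k eigenvectors, largest first
```

With `noise_scale=0` this reduces to ordinary PCA, which is a useful sanity check when experimenting with the privacy/utility trade-off.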
2 0.41931081 32 jmlr-2013-Differential Privacy for Functions and Functional Data
Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space
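The core idea here, releasing a function plus a Gaussian-process sample path, can be sketched for kernel density estimation on a grid. This is an illustrative sketch only: `noise_scale` stands in for the RKHS-sensitivity-calibrated level the paper derives, and the jitter term is a numerical convenience of ours:

```python
import numpy as np

def private_kde_on_grid(data, grid, bandwidth, noise_scale, seed=0):
    """Sketch: Gaussian KDE evaluated on a grid, plus one sample path of a
    Gaussian process whose covariance kernel matches the KDE's kernel.

    NOTE: a real differentially private release would set noise_scale via
    the RKHS sensitivity analysis; here it is a free parameter.
    """
    rng = np.random.default_rng(seed)
    # kernel density estimate at each grid point
    diffs = grid[:, None] - data[None, :]
    f = np.exp(-0.5 * (diffs / bandwidth) ** 2).mean(axis=1)
    f /= bandwidth * np.sqrt(2.0 * np.pi)
    # one GP sample path on the grid (jitter keeps the Cholesky stable)
    gd = grid[:, None] - grid[None, :]
    K = np.exp(-0.5 * (gd / bandwidth) ** 2) + 1e-6 * np.eye(len(grid))
    noise = np.linalg.cholesky(K) @ rng.standard_normal(len(grid))
    return f + noise_scale * noise
```

Because the noise is a smooth GP path rather than independent per-point noise, the released curve stays function-like, which is the point of the construction.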
Author: Alberto N. Escalante-B., Laurenz Wiskott
Abstract: Supervised learning from high-dimensional data, for example, multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the final label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space and propose several types of training graphs. For classification, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient as well as when feature selection is difficult. Keywords: slow feature analysis, feature extraction, classification, regression, pattern recognition, training graphs, nonlinear dimensionality reduction, supervised learning, implicitly supervised, high-dimensional data, image analysis
4 0.33771756 59 jmlr-2013-Large-scale SVD and Manifold Learning
Author: Ameet Talwalkar, Sanjiv Kumar, Mehryar Mohri, Henry Rowley
Abstract: This paper examines the efficacy of sampling-based low-rank approximation techniques when applied to large dense kernel matrices. We analyze two common approximate singular value decomposition techniques, namely the Nyström and Column sampling methods. We present a theoretical comparison between these two methods, provide novel insights regarding their suitability for various tasks and present experimental results that support our theory. Our results illustrate the relative strengths of each method. We next examine the performance of these two techniques on the large-scale task of extracting low-dimensional manifold structure given millions of high-dimensional face images. We address the computational challenges of non-linear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. We present extensive experiments on learning low-dimensional embeddings for two large face data sets: CMU-PIE (35 thousand faces) and a web data set (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column sampling method for this task. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set. Keywords: low-rank approximation, manifold learning, large-scale matrix factorization
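The column-sampling construction behind the Nyström approximation is simple to state. A minimal sketch, assuming a PSD kernel matrix and uniform column sampling (the paper also analyzes other sampling regimes and the competing Column sampling estimator):

```python
import numpy as np

def nystrom(K, m, seed=0):
    """Rank-(at most m) Nystrom approximation of an n x n PSD kernel matrix K.

    Uniformly samples m columns; with C = K[:, idx] and W = K[idx, idx],
    the approximation is C @ pinv(W) @ C.T.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, idx]                       # n x m slice of sampled columns
    W = K[np.ix_(idx, idx)]             # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T
```

When K has rank at most m and the sampled columns span its column space, the reconstruction is exact; the interesting regime studied in the paper is the inexact one.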
5 0.33730978 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
Author: Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial, Klaus Obermayer
Abstract: Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation space should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modification to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modified SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance. Keywords: reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation ©2013 Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial and Klaus Obermayer.
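The slowness objective underlying SFA, which both this abstract and the GSFA abstract above build on, can be illustrated in the linear case: minimize the average squared time-difference of the extracted signal under unit-variance and decorrelation constraints, which reduces to a generalized eigenproblem. A minimal sketch (linear SFA only; the nonlinear, graph-based, and importance-sampled variants discussed in these papers are not shown):

```python
import numpy as np

def linear_sfa(X, k):
    """Linear SFA: find k weight vectors w minimizing the average squared
    time-difference of w @ x(t), subject to unit variance and decorrelation.

    Equivalent to the generalized eigenproblem A w = lam * B w, solved here
    by whitening with B and diagonalizing the whitened A.
    """
    Xc = X - X.mean(axis=0)
    dX = np.diff(Xc, axis=0)
    A = (dX.T @ dX) / len(dX)        # covariance of the time-differenced signal
    B = (Xc.T @ Xc) / len(Xc)        # covariance of the signal itself
    evals, E = np.linalg.eigh(B)
    Wh = E / np.sqrt(evals)          # whitening transform: Wh.T @ B @ Wh = I
    vals, V = np.linalg.eigh(Wh.T @ A @ Wh)
    return Wh @ V[:, :k]             # smallest eigenvalues = slowest features
```

On a linear mixture of a slow sinusoid and fast noise, the first extracted feature recovers the slow source up to sign and scale.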
6 0.33659595 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
7 0.33369172 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression
8 0.33338365 86 jmlr-2013-Parallel Vector Field Embedding
10 0.33169386 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
11 0.33159283 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
12 0.33118448 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning
13 0.33113784 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
14 0.33107728 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
15 0.33061033 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
16 0.33047512 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
17 0.33005992 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
18 0.3290554 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
19 0.32874888 22 jmlr-2013-Classifying With Confidence From Incomplete Information
20 0.32854524 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion