nips nips2012 nips2012-237 nips2012-237-reference knowledge-graph by maker-knowledge-mining

237 nips-2012-Near-optimal Differentially Private Principal Components

Source: pdf

Author: Kamalika Chaudhuri, Anand Sarwate, Kaushik Sinha

Abstract: Principal components analysis (PCA) is a standard tool for identifying good lowdimensional approximations to data sets in high dimension. Many current data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We demonstrate that on real data, there is a large performance gap between the existing method and our method. We show that the sample complexity for the two procedures differs in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. 1

reference text

[1] BARAK , B., C HAUDHURI , K., DWORK , C., K ALE , S., M C S HERRY, F., AND TALWAR , K. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS (2007), pp. 273– 282.

[2] B LUM , A., DWORK , C., M C S HERRY, F., AND N ISSIM , K. Practical privacy: the SuLQ framework. In PODS (2005), pp. 128–138.

[3] B LUM , A., L IGETT, K., AND ROTH , A. A learning theory approach to non-interactive database privacy. In STOC (2008), R. E. Ladner and C. Dwork, Eds., ACM, pp. 609–618.

[4] C HAUDHURI , K., AND H SU , D. Sample complexity bounds for differentially private learning. In COLT (2011).

[5] C HAUDHURI , K., AND H SU , D. Convergence rates for differentially private statistical estimation. In ICML (2012).

[6] C HAUDHURI , K., M ONTELEONI , C., AND S ARWATE , A. D. Differentially private empirical risk minimization. Journal of Machine Learning Research 12 (March 2011), 1069–1109.

[7] C HIKUSE , Y. Statistics on Special Manifolds. No. 174 in Lecture Notes in Statistics. Springer, New York, 2003.

[8] DWORK , C., K ENTHAPADI , K., M C S HERRY, F., M IRONOV, I., AND NAOR , M. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT (2006), vol. 4004, pp. 486–503.

[9] DWORK , C., M C S HERRY, F., N ISSIM , K., AND S MITH , A. Calibrating noise to sensitivity in private data analysis. In 3rd IACR Theory of Cryptography Conference, (2006), pp. 265–284.

[10] F RIEDMAN , A., AND S CHUSTER , A. Data mining with differential privacy. In KDD (2010), pp. 493– 502.

[11] F UNG , B. C. M., WANG , K., C HEN , R., AND Y U , P. S. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42, 4 (June 2010), 53 pages.

[12] G ANTA , S. R., K ASIVISWANATHAN , S. P., AND S MITH , A. Composition attacks and auxiliary information in data privacy. In KDD (2008), pp. 265–273.

[13] H AN , S., N G , W. K., AND Y U , P. Privacy-preserving singular value decomposition. In ICDE (29 2009-april 2 2009), pp. 1267 –1270.

[14] H ARDT, M., AND ROTH , A. Beating randomized response on incoherent matrices. In STOC (2012).

[15] H OFF , P. D. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. J. Comp. Graph. Stat. 18, 2 (2009), 438–456.

[16] K APRALOV, M., AND TALWAR , K. On differentially private low rank approximation. In Proc. of SODA (2013).

[17] K ASIVISWANATHAN , S. P., AND S MITH , A. A note on differential privacy: Deﬁning resistance to arbitrary side information. CoRR abs/0803.3946 (2008).

[18] L IU , K., K ARGUPTA , H., AND RYAN , J. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. Knowl. Data Eng. 18, 1 (2006), 92–106.

[19] M ACHANAVAJJHALA , A., K IFER , D., A BOWD , J. M., G EHRKE , J., AND V ILHUBER , L. Privacy: Theory meets practice on the map. In ICDE (2008), pp. 277–286.

[20] M C S HERRY, F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD Conference (2009), pp. 19–30.

[21] M C S HERRY, F., AND M IRONOV, I. Differentially private recommender systems: Building privacy into the netﬂix prize contenders. In KDD (2009), pp. 627–636.

[22] M C S HERRY, F., AND TALWAR , K. Mechanism design via differential privacy. In FOCS (2007), pp. 94– 103.

[23] M OHAMMED , N., C HEN , R., F UNG , B. C. M., AND Y U , P. S. Differentially private data release for data mining. In KDD (2011), pp. 493–501.

[24] N ISSIM , K., R ASKHODNIKOVA , S., AND S MITH , A. Smooth sensitivity and sampling in private data analysis. In STOC (2007), D. S. Johnson and U. Feige, Eds., ACM, pp. 75–84.

[25] S TEWART, G. On the early history of the singular value decomposition. SIAM Review 35, 4 (1993), 551–566.

[26] WASSERMAN , L., AND Z HOU , S. A statistical framework for differential privacy. JASA 105, 489 (2010).

[27] W ILLIAMS , O., AND M C S HERRY, F. Probabilistic inference and differential privacy. In NIPS (2010).

[28] Z HAN , J. Z., AND M ATWIN , S. Privacy-preserving support vector machine classiﬁcation. IJIIDS 1, 3/4 (2007), 356–385. 9