NIPS 2007, Paper 53: Compressed Regression
Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty
Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
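The compression procedure the abstract describes is easy to simulate: draw a random projection matrix Phi in R^{m x n}, release only the compressed pair (Phi X, Phi y), and run ℓ1-regularized regression (the lasso) on that pair. Below is a minimal sketch in Python, assuming Gaussian N(0, 1/n) entries for Phi and scikit-learn's Lasso; the dimensions n and p, the sparsity level s, the compressed sample size m, and the regularization weight alpha are illustrative choices, not values taken from the paper.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, s, m = 500, 50, 5, 100          # illustrative sizes, with m << n

    # Sparse ground-truth linear model y = X beta + noise, s nonzero coefficients.
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 2.0
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # Random Gaussian compression Phi (m x n) with N(0, 1/n) entries; only the
    # compressed pair (Phi X, Phi y) would be released, never the raw data.
    Phi = rng.standard_normal((m, n)) / np.sqrt(n)
    X_c, y_c = Phi @ X, Phi @ y

    # l1-regularized (lasso) regression on the compressed data.
    fit = Lasso(alpha=0.1).fit(X_c, y_c)
    support = np.flatnonzero(np.abs(fit.coef_) > 1e-6)
    print("recovered support:", support)  # sparsistence: ideally indices 0..s-1

With enough projections m, the lasso fit to (Phi X, Phi y) recovers the same support as a lasso fit to the original data, which is the "sparsistence" property the paper quantifies.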
[1] D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th Symposium on Principles of Database Systems, May 2001.
[2] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, August 2006.
[3] T. Dalenius. Towards a methodology for statistical disclosure control. Statistik Tidskrift, 15:429–444, 1977.
[4] D. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289–1306, April 2006.
[5] M. Duarte, M. Davenport, M. Wakin, and R. Baraniuk. Sparse signal detection from incoherent projections. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2006.
[6] G. Duncan and R. Pearson. Enhancing access to microdata while protecting confidentiality: Prospects for the future. Statistical Science, 6(3):219–232, August 1991.
[7] C. Dwork. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP 2006), pages 1–12, 2006.
[8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
[9] J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. J. Strauss, and R. N. Wright. Secure multiparty computation of approximations. ACM Trans. Algorithms, 2(3):435–472, 2006.
[10] E. Greenshtein and Y. Ritov. Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli, 10:971–988, 2004.
[11] J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. Yeh. Compressive sampling for signal classification. In Proc. Asilomar Conference on Signals, Systems, and Computers, October 2006.
[12] K. Liu, H. Kargupta, and J. Ryan. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. Knowledge and Data Engineering, 18(1), January 2006.
[13] T. L. Marzetta and B. M. Hochwald. Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading. IEEE Trans. Info. Theory, 45(1):139–157, January 1999.
[14] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Technical Report 720, Department of Statistics, UC Berkeley, 2006.
[15] M. Osborne, B. Presnell, and B. Turlach. On the lasso and its dual. J. Comp. and Graph. Stat., 9(2):319–337, 2000.
[16] A. P. Sanil, A. Karr, X. Lin, and J. P. Reiter. Privacy preserving regression modelling via distributed computation. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
[17] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267–288, 1996.
[18] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Info. Theory, 50(10):2231–2242, October 2004.
[19] M. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. Technical Report 709, Department of Statistics, UC Berkeley, May 2006.
[20] P. Zhao and B. Yu. On model selection consistency of lasso. J. Mach. Learn. Res., 7:2541–2563, 2006.