nips2012-244: reference knowledge graph
Author: Zhihua Zhang, Bojun Tu
Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. In particular, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.
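As a brief sketch of the construction summarized above (the notation here is ours; the paper's exact parameterization may differ): a subordinator $T = (T_t)_{t \ge 0}$ is a nondecreasing Lévy process characterized by $\mathbb{E}[e^{-s T_t}] = e^{-t\,\Phi(s)}$, where the Laplace exponent $\Phi$ has the form

\[
\Phi(s) = b\,s + \int_0^\infty \bigl(1 - e^{-s u}\bigr)\,\nu(du),
\]

with drift $b \ge 0$ and Lévy measure $\nu$. For the Gamma subordinator ($b = 0$, $\nu(du) = \alpha u^{-1} e^{-\beta u}\,du$), the Frullani integral yields the LOG penalty,

\[
\Phi(s) = \alpha \log\!\left(1 + \frac{s}{\beta}\right),
\]

while a compound Poisson subordinator with rate $\alpha$ and fixed jump size $\beta$ yields the EXP penalty,

\[
\Phi(s) = \alpha\bigl(1 - e^{-\beta s}\bigr).
\]

Evaluating either exponent at $s = |w|$ gives the corresponding penalty on a regression coefficient $w$. Since every Laplace exponent is nonnegative, nondecreasing, and concave with $\Phi(0) = 0$, the map $w \mapsto \Phi(|w|)$ is automatically a sparsity-inducing nonconvex penalty, which is why the subordinator viewpoint serves as a construction recipe rather than a case-by-case verification.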
[1] D. Applebaum. Lévy Processes and Stochastic Calculus. Cambridge University Press, Cambridge, UK, 2004.
[2] A. Armagan, D. Dunson, and J. Lee. Generalized double Pareto shrinkage. Technical report, Department of Statistical Science, Duke University, February 2011.
[3] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 82–90. Morgan Kaufmann, San Francisco, California, 1998.
[4] F. Caron and A. Doucet. Sparse Bayesian nonparametric regression. In Proceedings of the 25th International Conference on Machine Learning, page 88, 2008.
[5] V. Cevher. Learning with compressible priors. In Advances in Neural Information Processing Systems 22, pages 261–269, 2009.
[6] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348–1361, 2001.
[7] B. G. Osborne, T. Fearn, A. R. Miller, and S. Douglas. Application of near-infrared reflectance spectroscopy to compositional analysis of biscuits and biscuit dough. Journal of the Science of Food and Agriculture, 35(1):99–105, 1984.
[8] C. Gao, N. Wang, Q. Yu, and Z. Zhang. A feasible nonconvex relaxation approach to feature selection. In Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence (AAAI’11), 2011.
[9] P. J. Garrigues and B. A. Olshausen. Group sparse coding with a Laplacian scale mixture prior. In Advances in Neural Information Processing Systems 23, 2010.
[10] Z. Ghahramani, T. Griffiths, and P. Sollich. Bayesian nonparametric latent feature models. In World Meeting on Bayesian Statistics, 2006.
[11] J. E. Griffin and P. J. Brown. Bayesian adaptive Lassos with non-convex penalization. Technical report, University of Kent, 2010.
[12] A. Lee, F. Caron, A. Doucet, and C. Holmes. A hierarchical Bayesian framework for constructing sparsity-inducing priors. Technical report, University of Oxford, UK, 2010.
[13] R. Mazumder, J. Friedman, and T. Hastie. SparseNet: Coordinate descent with nonconvex penalties. Journal of the American Statistical Association, 106(495):1125–1138, 2011.
[14] J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, and B. D. Rao. Variational EM algorithms for non-Gaussian latent variable models. In Advances in Neural Information Processing Systems 18, 2006.
[15] N. G. Polson and J. G. Scott. Local shrinkage rules, Lévy processes, and regularized regression. Journal of the Royal Statistical Society, Series B, 74(2):287–311, 2012.
[16] K. Sato. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge, UK, 1999.
[17] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.
[18] M. K. Titsias. The infinite Gamma-Poisson feature model. In Advances in Neural Information Processing Systems 20, 2007.
[19] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping. Use of the zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3:1439–1461, 2003.
[20] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38:894–942, 2010.
[21] T. Zhang. Analysis of multi-stage convex relaxation for sparse regularization. Journal of Machine Learning Research, 11:1081–1107, 2010.
[22] Z. Zhang, S. Wang, D. Liu, and M. I. Jordan. EP-GIG priors and applications in Bayesian sparse learning. Journal of Machine Learning Research, 13:2031–2061, 2012.
[23] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4):1509–1533, 2008.