
35 jmlr-2012-EP-GIG Priors and Applications in Bayesian Sparse Learning


Source: pdf

Author: Zhihua Zhang, Shusen Wang, Dehua Liu, Michael I. Jordan

Abstract: In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, whose special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be expressed explicitly. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression.

Keywords: sparsity priors, scale mixtures of exponential power distributions, generalized inverse Gaussian distributions, expectation-maximization algorithms, iteratively reweighted minimization methods
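The abstract notes that the EM algorithms for these priors resemble iteratively reweighted ℓ2 or ℓ1 methods. As an illustration of that resemblance only, and not of the paper's EP-GIG algorithm itself, the sketch below works out the simplest special case named above: a Laplace prior written as a Gaussian scale mixture. Under that standard hierarchy the E-step weight is E[1/eta_j | b_j] = lam/|b_j| and the M-step is a weighted ridge (ℓ2) solve. The function name em_irls_laplace and the parameters lam and sigma2 are illustrative choices, not names taken from the paper.

```python
import numpy as np

def em_irls_laplace(X, y, lam=1.0, sigma2=1.0, n_iter=50, eps=1e-8):
    """MAP estimation under a Laplace prior expressed as a Gaussian
    scale mixture (one of the special cases mentioned in the abstract).

    E-step: w_j = E[1 / eta_j | b_j] = lam / |b_j|
    M-step: weighted ridge solve, i.e. an iteratively reweighted l2 update.
    """
    XtX, Xty = X.T @ X, X.T @ y
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares initialization
    for _ in range(n_iter):
        w = lam / np.maximum(np.abs(b), eps)      # E-step: per-coefficient weights
        b = np.linalg.solve(XtX + sigma2 * np.diag(w), Xty)  # M-step: weighted ridge
    return b

# Toy usage: a sparse regression problem with three active coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
b_true = np.zeros(20)
b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.1 * rng.standard_normal(100)
print(np.round(em_irls_laplace(X, y, lam=5.0, sigma2=0.01), 2))
```

On this toy problem the estimate concentrates on the three nonzero coefficients while the rest shrink toward zero. Replacing the Laplace-specific weight formula with a posterior expectation under a generalized inverse Gaussian mixing density would be the natural way to move toward the more general EP-GIG updates described in the abstract.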


reference text

D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36:99–102, 1974.

C. Archambeau and F. R. Bach. Sparse probabilistic projections. In Advances in Neural Information Processing Systems 21, 2009.

A. Armagan, D. Dunson, and J. Lee. Generalized double Pareto shrinkage. Technical report, Duke University Department of Statistical Science, February 2011.

G. E. P. Box and G. C. Tiao. Bayesian Inference in Statistical Analysis. John Wiley and Sons, New York, 1992.

L. Breiman and J. Friedman. Predicting multivariate responses in multiple linear regression (with discussion). Journal of the Royal Statistical Society, Series B, 59(1):3–54, 1997.

E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. The Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.

F. Caron and A. Doucet. Sparse Bayesian nonparametric regression. In Proceedings of the 25th International Conference on Machine Learning, pages 88–95, 2008.

C. M. Carvalho, N. G. Polson, and J. G. Scott. The horseshoe estimator for sparse signals. Biometrika, 97:465–480, 2010.

V. Cevher. Learning with compressible priors. In Advances in Neural Information Processing Systems 22, pages 261–269, 2009.

R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In The 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2008.

I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63(1):1–38, 2010.

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348–1361, 2001.

W. Feller. An Introduction to Probability Theory and Its Applications, volume II. John Wiley & Sons, second edition, 1971.

M. A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1150–1159, 2003.

W. Fu. Penalized regressions: the bridge vs. the lasso. Journal of Computational and Graphical Statistics, 7:397–416, 1998.

P. J. Garrigues and B. A. Olshausen. Group sparse coding with a Laplacian scale mixture prior. In Advances in Neural Information Processing Systems 22, 2010.

J. E. Griffin and P. J. Brown. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171–183, 2010a.

J. E. Griffin and P. J. Brown. Bayesian adaptive Lassos with non-convex penalization. Technical report, University of Kent, 2010b.

E. Grosswald. The Student t-distribution of any degree of freedom is infinitely divisible. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 36:103–109, 1976.

C. Hans. Bayesian lasso regression. Biometrika, 96:835–845, 2009.

D. Hunter and R. Li. Variable selection using MM algorithms. The Annals of Statistics, 33(4):1617–1642, 2005.

B. Jørgensen. Statistical Properties of the Generalized Inverse Gaussian Distribution. Lecture Notes in Statistics. Springer, New York, 1982.

H. Kiiveri. A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations. BMC Bioinformatics, 9:195, 2008.

M. Kyung, J. Gill, M. Ghosh, and G. Casella. Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5(2):369–412, 2010.

K. Lange and J. S. Sinsheimer. Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics, 2(2):175–198, 1993.

A. Lee, F. Caron, A. Doucet, and C. Holmes. A hierarchical Bayesian framework for constructing sparsity-inducing priors. Technical report, University of Oxford, UK, 2010.

R. Mazumder, J. Friedman, and T. Hastie. SparseNet: Coordinate descent with nonconvex penalties. Journal of the American Statistical Association, 106(495):1125–1138, 2011.

T. Park and G. Casella. The Bayesian Lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

N. G. Polson and J. G. Scott. Shrink globally, act locally: Sparse Bayesian regularization and prediction. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian Statistics 9. Oxford University Press, 2010.

N. G. Polson and J. G. Scott. Sparse Bayes estimation in non-Gaussian models via data augmentation. Technical report, University of Texas at Austin, July 2011.

N. G. Polson and J. G. Scott. Local shrinkage rules, Lévy processes, and regularized regression. Journal of the Royal Statistical Society, Series B, 74(2):287–311, 2012.

B. K. Sriperumbudur and G. R. G. Lanckriet. On the convergence of the concave-convex procedure. In Advances in Neural Information Processing Systems 22, 2009.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

M. West. On scale mixtures of normal distributions. Biometrika, 74:646–648, 1987.

J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping. Use of the zero-norm with linear models and kernel methods. Journal of Machine Learning Research, 3:1439–1461, 2003.

D. Wipf and S. Nagarajan. Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions. IEEE Journal of Selected Topics in Signal Processing, 4(2):317–329, 2010.

C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11:95–103, 1983.

M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2007.

H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006.

H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4):1509–1533, 2008.