nips2012-104 (NIPS 2012)
Source: pdf
Author: Yi Wu, David P. Wipf
Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I and Type II objectives to be expressed in either coefficient or hyperparameter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators emerge by assessing the problem in hyperparameter (variance) space, transferring natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can successfully recover sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations.
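To ground the Type I / Type II distinction described above, the following is a minimal sketch of the two cost functions in generic sparse-linear-model notation (the symbols y, Φ, λ, γ, and the penalty g are illustrative and may not match the paper's exact formulation). Given observations y = Φx + ε with noise variance λ, the Type I (MAP) estimate works in coefficient space with a separable sparsity penalty g:

\hat{x}_{\mathrm{I}} = \arg\min_{x} \; \frac{1}{\lambda}\,\|y - \Phi x\|_2^2 + \sum_i g(x_i),

while Type II (empirical Bayes) first marginalizes the coefficients and then minimizes the negative log marginal likelihood over the variance hyperparameters:

\hat{\gamma}_{\mathrm{II}} = \arg\min_{\gamma \ge 0} \; y^\top \Sigma_y^{-1} y + \log\det \Sigma_y,
\qquad \Sigma_y = \lambda I + \Phi \, \mathrm{diag}(\gamma) \, \Phi^\top,

with Type II coefficient estimates recovered from the posterior mean \hat{x}_{\mathrm{II}} = \mathrm{diag}(\hat{\gamma})\, \Phi^\top \Sigma_y^{-1} y. The dual-space framework of [22] re-expresses each objective in the other variable's space, which is what enables the λ estimation and sparsity-recovery analyses summarized in the abstract.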
[1] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[2] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[3] E. Candès, M. Wakin, and S. Boyd, “Enhancing sparsity by reweighted ℓ1 minimization,” J. Fourier Anal. Appl., vol. 14, no. 5, pp. 877–905, 2008.
[4] R. Chartrand and W. Yin, “Iteratively reweighted algorithms for compressive sensing,” Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2008.
[5] D.L. Donoho and M. Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proc. National Academy of Sciences, vol. 100, no. 5, pp. 2197–2202, March 2003.
[6] M.A.T. Figueiredo, “Adaptive sparseness using Jeffreys prior,” Advances in Neural Information Processing Systems 14, pp. 697–704, 2002.
[7] C. Févotte and S.J. Godsill, “Blind separation of sparse sources using Jeffreys inverse prior and the EM algorithm,” Proc. 6th Int. Conf. Independent Component Analysis and Blind Source Separation, Mar. 2006.
[8] M. Girolami, “A variational method for learning sparse and overcomplete representations,” Neural Computation, vol. 13, no. 11, pp. 2517–2532, 2001.
[9] I.F. Gorodnitsky and B.D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Transactions on Signal Processing, vol. 45, no. 3, pp. 600–616, March 1997.
[10] S. Ji, D. Dunson, and L. Carin, “Multi-task compressive sensing,” IEEE Trans. Signal Processing, vol. 57, no. 1, pp. 92–106, Jan 2009.
[11] K. Kreutz-Delgado, J. F. Murray, B.D. Rao, K. Engan, T.-W. Lee, and T.J. Sejnowski, “Dictionary learning algorithms for sparse representation,” Neural Computation, vol. 15, no. 2, pp. 349–396, February 2003.
[12] D.J.C. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.
[13] J. Mattout, C. Phillips, W.D. Penny, M.D. Rugg, and K.J. Friston, “MEG source localization under multiple constraints: An extended Bayesian framework,” NeuroImage, vol. 30, pp. 753–767, 2006.
[14] R.M. Neal, Bayesian Learning for Neural Networks, Springer-Verlag, New York, 1996.
[15] J.A. Palmer, D.P. Wipf, K. Kreutz-Delgado, and B.D. Rao, “Variational EM algorithms for non-Gaussian latent variable models,” Advances in Neural Information Processing Systems 18, pp. 1059–1066, 2006.
[16] B.D. Rao, K. Engan, S. F. Cotter, J. Palmer, and K. Kreutz-Delgado, “Subset selection in noise based on diversity measure minimization,” IEEE Trans. Signal Processing, vol. 51, no. 3, pp. 760–770, March 2003.
[17] M. Seeger and H. Nickisch, “Large scale Bayesian inference and experimental design for sparse linear models,” SIAM J. Imaging Sciences, vol. 4, no. 1, pp. 166–199, 2011.
[18] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267–288, 1996.
[19] M.E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[20] M.E. Tipping and A.C. Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” Ninth Int. Workshop on Artificial Intelligence and Statistics, Jan. 2003.
[21] D.P. Wipf, “Sparse estimation with structured dictionaries,” Advances in Neural Information Processing Systems 24, 2011.
[22] D.P. Wipf, B.D. Rao, and S. Nagarajan, “Latent variable Bayesian models for promoting sparsity,” IEEE Trans. Information Theory, vol. 57, no. 9, Sept. 2011.