
265 nips-2013-Reconciling "priors" & "priors" without prejudice?


Source: pdf

Author: Rémi Gribonval, Pierre Machart

Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization.
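
To make the abstract's central claim concrete, the display below sketches the additive white Gaussian denoising result of [8] that the paper generalizes; the notation (y, x, sigma, phi) is introduced here for illustration only and is not taken from this page. For an observation y = x + n with n ~ N(0, sigma^2 I) and a prior on x, the conditional mean (MMSE) estimator coincides with a penalized least-squares estimator for some penalty phi determined by the prior:

\[
  \psi(y) \;=\; \mathbb{E}[x \mid y]
  \;=\; \operatorname*{arg\,min}_{z} \; \tfrac{1}{2}\,\|y - z\|_2^2 \;+\; \phi(z).
\]

The paper's contributions then extend this identity to observations of the form y = Ax + n with colored Gaussian noise, and characterize when such a penalty phi can be convex, separable, or smooth.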


reference text

[1] Arthur E. Hoerl and Robert W. Kennard. Ridge regression: applications to nonorthogonal problems. Technometrics, 12(1):69–82, 1970.

[2] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[3] Matthieu Kowalski. Sparse regression using mixed norms. Applied and Computational Harmonic Analysis, 27(3):303–324, 2009.

[4] Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1–106, 2012.

[5] Rodolphe Jenatton, Guillaume Obozinski, and Francis Bach. Active set algorithm for structured sparsity-inducing norms. In OPT 2009: 2nd NIPS Workshop on Optimization for Machine Learning, 2009.

[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[7] Rémi Gribonval, Volkan Cevher, and Mike E. Davies. Compressible distributions for high-dimensional statistics. IEEE Transactions on Information Theory, 2012.

[8] Rémi Gribonval. Should penalized least squares regression be interpreted as maximum a posteriori estimation? IEEE Transactions on Signal Processing, 59(5):2405–2410, 2011.

[9] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. CORE Discussion Papers, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, 2010.

[10] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Sathiya Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning, pages 408–415, 2008.

[11] Pierre Machart, Thomas Peel, Liva Ralaivola, Sandrine Anthoine, and Hervé Glotin. Stochastic low-rank kernel learning for regression. In 28th International Conference on Machine Learning, 2011.

[12] Martin Raphan and Eero P. Simoncelli. Learning to be Bayesian without supervision. In Advances in Neural Information Processing Systems (NIPS*06). MIT Press, 2007.

[13] Rémi Gribonval and Pierre Machart. Reconciling "priors" & "priors" without prejudice? Research report RR-8366, INRIA, September 2013.