jmlr jmlr2011 jmlr2011-59 jmlr2011-59-reference knowledge-graph by maker-knowledge-mining

59 jmlr-2011-Learning with Structured Sparsity

Source: pdf

Author: Junzhou Huang, Tong Zhang, Dimitris Metaxas

Abstract: This paper investigates a learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea that has become popular in recent years. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. It is shown that if the coding complexity of the target signal is small, then one can achieve improved performance by using coding complexity regularization methods, which generalize the standard sparse regularization. Moreover, a structured greedy algorithm is proposed to efﬁciently solve the structured sparsity problem. It is shown that the greedy algorithm approximately solves the coding complexity optimization problem under appropriate conditions. Experiments are included to demonstrate the advantage of structured sparsity over standard sparsity on some real applications. Keywords: structured sparsity, standard sparsity, group sparsity, tree sparsity, graph sparsity, sparse learning, feature selection, compressive sensing

reference text

A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning Journal, 73:243–272, 2008. I. Johnstone B. Efron, T. Hastie and R. Tibshirani. Least angle regression. Annals of Statistics, 32: 407–499, 2004. F. R. Bach. Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179–1225, 2008. R. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model based compressive sensing. IEEE Transactions on Information Theory, 56:1982–2001, 2010. F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Aggregation for Gaussian regression. Annals of Statistics, 35:1674–1697, 2007. E. J. Candes and T. Tao. Decoding by linear programming. IEEE Transaction on Information Theory, 51:4203–4215, 2005. V. Cevher, M. F. Duarte, C. Hegde, and R. Baraniuk. Sparse signal recovery using markov random ﬁelds. In Proceeding of NIPS, 2009a. V. Cevher, P. Indyk, C. Hegde, and R. G. Baraniuk. Recovery of clustered sparse signals from compressive measurements. In Proceeding of the 8th International Conference on Sampling Theory and Applications, 2009b. C. Chesneau and M. Hebiri. Some theoretical results on the grouped variables Lasso. Mathematical Methods of Statistics, 17:317–326, 2008. T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991. A. d’Aspremont, F. Bach, and L. El Ghaoui. Optimal solutions for sparse principal component analysis. Journal of Machine Learning Research, 9:1269–1294, 2008. L. Daudet. Sparse and structured decomposition of audio signals in overcomplete spaces. In Proceeding of International Conference on Digital Audio Effects, 2004. D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52:1289–1306, 2006. D. Grimm, T. Netzer, and M. Schweighofer. A note on the representation of positive polynomials with structured sparsity. Archiv der Mathematik, 89:399–403, 2007. J. Haupt and R. Nowak. Signal reconstruction from noisy projections. IEEE Trans. Information Theory, 52:4036–4048, 2006. L. He and L. Carin. Exploiting structure in wavelet-based bayesian compressive sensing. IEEE Transaction on Signal Processing, 57:3488–3497, 2009a. L. He and L. Carin. Exploiting structure in compressive sensing with a jpeg basis. In Preprint, 2009b. 3410 L EARNING WITH S TRUCTURED S PARSITY J. Huang and T. Zhang. The beneﬁt of group sparsity. Annals of Statistics, 38(4):1978–2004, 2010. L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In Proceedings of ICML, 2009. R. Jenatton, J. Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, Tech Report: arXiv:0904, 2009. S. Ji, D. Dunson, and L. Carin. Multi-task compressive sensing. IEEE Transactions on Signal Processing, 2008. Accepted. V. Koltchinskii and M. Yuan. Sparse recovery in large ensembles of kernel machines. In Proceeding of COLT, 2008. M. Kowalski and B. Torresani. Structured sparsity: from mixed norms to structured shrinkage. In Workshop on Signal Processing with Adaptive Sparse Representations, 2009. K. Lounici, M. Pontil, A. B. Tsybakov, and S. A. Van De Geer. Taking advantage of sparsity in multi-task learnng. In Proceeding of COLT, 2009. A. Lozano, G. Swirszcz, and N. Abe. Group orthogonal matching pursuit for variable selection and prediction. In Proceedings of NIPS, 2009. Y. M. Lu and M. N. Do. A theory for sampling signals from a union of subspaces. IEEE Transactions on Signal Processing, 56(6):2334–2345, 2008. S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1999. Y. Nardi and A. Rinaldo. On the asymptotic properties of the group lasso estimator for linear models. Electronic Journal of Statistics, 2:605–633, 2008. G. Obozinski, M. J. Wainwright, and M. I. Jordan. Union support recovery in high-dimensional multivariate regression. Technical Report 761, UC Berkeley, 2008. G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge University Press, 1989. J. M. Shapiro. Embedded image coding using zerotrees of wavelet coefﬁcients. IEEE Transactions on Signal Processing, 41:3445–3462, 1993. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58:267–288, 1996. J. Tropp and A. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007. D. Wipf and B. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Transactions on Signal Processing, 55(7):3704–3716, 2007. M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of The Royal Statistical Society Series B, 68(1):49–67, 2006. 3411 H UANG , Z HANG AND M ETAXAS T. Zhang. Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Transactions on Information Theory, 57:4689–4708, 2011. P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. The Annals of Statistics, 37(6A):3468–3497, 2009. Z. Zivkovic and F. Heijden. Efﬁcient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters, 27(7):773–780, 2006. 3412