
258 nips-2011-Sparse Bayesian Multi-Task Learning


Source: pdf

Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau

Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show that the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods.
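To make the abstract's construction concrete, the following is a minimal sketch of a matrix-variate Gaussian scale mixture prior in the spirit of the scale mixtures of [3] and the matrix-variate notation of [16]. The symbols used here (a weight matrix W in R^{D x K} with D features and K tasks, per-feature scales gamma_d, and a task covariance Sigma) are assumed for illustration and need not match the paper's exact parameterization.

% Sketch of a matrix-variate Gaussian scale mixture prior (assumed notation).
% Rows of W index features, columns index tasks; the row covariance is a
% diagonal matrix of per-feature scales, the column covariance Sigma models
% correlations between tasks (low-rank in the paper).
\begin{align*}
  p(W \mid \gamma, \Sigma) &= \mathcal{MN}\bigl(W \mid 0,\ \operatorname{diag}(\gamma_1,\dots,\gamma_D),\ \Sigma\bigr), \\
  p(W \mid \Sigma) &= \int \mathcal{MN}\bigl(W \mid 0,\ \operatorname{diag}(\gamma),\ \Sigma\bigr)
        \prod_{d=1}^{D} p(\gamma_d)\, \mathrm{d}\gamma .
\end{align*}
% A small gamma_d shrinks the d-th row of W, so feature d is switched off
% jointly across all K tasks (group sparsity).

Per the abstract, the hyperparameters of the mixing densities p(gamma_d) are then set by type II maximum likelihood combined with approximate inference, which is how the amount of sparsity is learnt from the data rather than fixed in advance.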


reference text

[1] J. H. Albert and S. Chib. Bayesian analysis of binary and polychotomous response data. J.A.S.A., 88(422):669–679, 1993.

[2] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. JMLR, 6:1817–1853, 2005.

[3] D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society B, 36(1):99–102, 1974.

[4] C. Archambeau and F. Bach. Sparse probabilistic projections. In NIPS. MIT Press, 2008.

[5] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73:243–272, 2008.

[6] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. JMLR, 4:83–99, 2003.

[7] M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.

[8] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 1985.

[9] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.

[10] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58:1–37, June 2011.

[11] F. Caron and A. Doucet. Sparse Bayesian nonparametric regression. In ICML, pages 88–95. ACM, 2008.

[12] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[13] O. Chapelle, P. Shivaswamy, S. Vadrevu, K. Weinberger, Y. Zhang, and B. Tseng. Multi-task learning for boosting with application to web search ranking. In SIGKDD, pages 1189–1198, 2010.

[14] R. Chari, W. W. Lockwood, B. P. Coe, A. Chu, D. Macey, A. Thomson, J. J. Davies, C. MacAulay, and W. L. Lam. SIGMA: A system for integrative genomic microarray analysis of cancer genomes. BMC Genomics, 7:324, 2006.

[15] J. Chen, J. Liu, and J. Ye. Learning incoherent sparse and low-rank patterns from multiple tasks. In SIGKDD, pages 1179–1188. ACM, 2010.

[16] A. P. Dawid. Some matrix-variate distribution theory: Notational considerations and a Bayesian application. Biometrika, 68(1):265–274, 1981.

[17] A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In NIPS. 2002.

[18] T. Evgeniou, C. A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. JMLR, 6:615–637, 2005.

[19] M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on PAMI, 25:1150–1159, 2003.

[20] A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.

[21] D. Hernández-Lobato, J. M. Hernández-Lobato, T. Helleputte, and P. Dupont. Expectation propagation for Bayesian multi-task feature selection. In ECML-PKDD, pages 522–537, 2010.

[22] L. Jacob, F. Bach, and J.-P. Vert. Clustered multi-task learning: A convex formulation. In NIPS, pages 745–752. 2009.

[23] T. Jebara. Multitask sparsity via maximum entropy discrimination. JMLR, 12:75–110, 2011.

[24] B. Jørgensen. Statistical Properties of the Generalized Inverse Gaussian Distribution. Springer, 1982.

[25] D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4(3):415–447, 1992.

[26] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, 2008.

[27] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. MIT Press, 1998.

[28] P. Rai and H. Daumé III. Multi-label prediction via sparse infinite CCA. In NIPS, pages 1518–1526. 2009.

[29] P. Rai and H. Daumé III. Infinite predictor subspace models for multitask learning. In AISTATS, pages 613–620, 2010.

[30] S. Raman, T. J. Fuchs, P. J. Wild, E. Dahl, and V. Roth. The Bayesian group-Lasso for analyzing contingency tables. In ICML, pages 881–888, 2009.

[31] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In CVPR, pages 762–769. IEEE Computer Society, 2004.

[32] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183–2202, 2009.

[33] Y. Xue, D. Dunson, and L. Carin. The matrix stick-breaking process for flexible multi-task learning. In ICML, pages 1063–1070, 2007.

[34] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B, 68(1):49–67, 2006.

[35] Y. Zhang and J. Schneider. Learning multiple tasks with a sparse matrix-normal penalty. In NIPS, pages 2550–2558. 2010.