NIPS 2010, Paper 217
Authors: Yu Zhang, Dit-Yan Yeung, Qian Xu
Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity-enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data.
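To make the regularizer concrete, a standard form of the ℓ1,q-regularized multi-task objective is sketched below (a minimal illustration using the common convention in this literature that rows of the weight matrix W index features and columns index tasks; the paper's exact formulation, loss function, and parameterization may differ):

\min_{W \in \mathbb{R}^{d \times m}} \; \sum_{k=1}^{m} \sum_{i=1}^{n_k} L\big(y_i^k,\, \mathbf{x}_i^{k\top} \mathbf{w}_k\big) \;+\; \lambda \,\lVert W \rVert_{1,q},
\qquad
\lVert W \rVert_{1,q} \;=\; \sum_{j=1}^{d} \big\lVert \mathbf{w}^j \big\rVert_q \;=\; \sum_{j=1}^{d} \Big( \sum_{k=1}^{m} \lvert w_{jk} \rvert^q \Big)^{1/q},

where w_k denotes the k-th column of W (the weight vector for task k), w^j the j-th row (the weights of feature j across all tasks), and, in the limit q = ∞, ‖w^j‖_∞ = max_k |w_jk|. Because the outer sum is an ℓ1 norm over row norms, the penalty drives entire rows of W to zero, selecting a common subset of features for all tasks; setting q = 2 or q = ∞ recovers the ℓ1,2 and ℓ1,∞ regularizers discussed in the abstract.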
[1] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
[2] J. Bi, T. Xiong, S. Yu, M. Dundar, and R. B. Rao. An improved multi-task learning approach with applications in medical diagnosis. In ECML PKDD, 2008.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[4] E. Bonilla, K. M. A. Chai, and C. Williams. Multi-task Gaussian process prediction. In NIPS 20, 2008.
[5] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, 2008.
[6] J. Chen and X. Huo. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Transactions on Signal Processing, 54(12):4634–4643, 2006.
[7] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53(7):2477–2488, 2005.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[9] M. A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1150–1159, 2003.
[10] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall, 2nd edition, 2003.
[11] I. R. Goodman and S. Kotz. Multivariate θ-generalized normal distributions. Journal of Multivariate Analysis, 3(2):204–219, 1973.
[12] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall, 2000.
[13] A. K. Gupta and T. Varga. Matrix variate θ-generalized normal distribution. Transactions of the American Mathematical Society, 347(4):1429–1437, 1995.
[14] K. Lange, D. R. Hunter, and I. Yang. Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9(1):1–59, 2000.
[15] H. Liu, M. Palatucci, and J. Zhang. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In ICML, 2009.
[16] J. Liu, S. Ji, and J. Ye. Multi-task feature learning via efficient ℓ2,1-norm minimization. In UAI, 2009.
[17] G. Obozinski, B. Taskar, and M. Jordan. Multi-task feature selection. Technical report, Department of Statistics, University of California, Berkeley, June 2006.
[18] G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231–252, 2010.
[19] Y. Qi, T. P. Minka, R. W. Picard, and Z. Ghahramani. Predictive automatic relevance determination by expectation propagation. In ICML, 2004.
[20] A. Quattoni, X. Carreras, M. Collins, and T. Darrell. An efficient projection for ℓ1,∞ regularization. In ICML, 2009.
[21] A. A. Shabalin, H. Tjelmeland, C. Fan, C. M. Perou, and A. B. Nobel. Merging two gene-expression studies via cross-platform normalization. Bioinformatics, 24(9):1154–1160, 2008.
[22] D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D'Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub, and W. R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002.
[23] L. Sun, J. Liu, J. Chen, and J. Ye. Efficient recovery of jointly sparse vectors. In NIPS 22, 2009.
[24] B. A. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47(3):349–363, 2005.
[25] J. B. Welsh, L. M. Sapinoso, A. I. Su, S. G. Kern, J. Wang-Rodriguez, C. A. Moskaluk, F. H. Frierson, Jr., and G. M. Hampton. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research, 61(16):5974–5978, 2001.
[26] D. Wipf and S. Nagarajan. A new view of automatic relevance determination. In NIPS 20, 2007.
[27] D. P. Wipf and S. Nagarajan. Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions. IEEE Journal of Selected Topics in Signal Processing, 2010.
[28] T. Xiong, J. Bi, B. Rao, and V. Cherkassky. Probabilistic joint feature selection for multi-task learning. In SDM, 2007.
[29] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68(1):49–67, 2006.
[30] J. Zhang, Z. Ghahramani, and Y. Yang. Flexible latent variable models for multi-task learning. Machine Learning, 73(3):221–242, 2008.
[31] Y. Zhang and D.-Y. Yeung. A convex formulation for learning task relationships in multi-task learning. In UAI, 2010.
[32] Y. Zhang and D.-Y. Yeung. Multi-task learning using generalized t process. In AISTATS, 2010.