
73 jmlr-2012-Multi-task Regression using Minimal Penalties

Source: pdf

Author: Matthieu Solnon, Sylvain Arlot, Francis Bach

Abstract: In this paper we study the kernel multiple ridge regression framework, which we refer to as multi-task regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples.

Keywords: multi-task, oracle inequality, learning theory

reference text

Hirotugu Akaike. Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22:203–217, 1970.

Rie Kubota Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, December 2005. ISSN 1532-4435.

Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.

Sylvain Arlot. Model selection by resampling penalization. Electron. J. Stat., 3:557–624 (electronic), 2009. ISSN 1935-7524. doi: 10.1214/08-EJS196.

Sylvain Arlot and Francis Bach. Data-driven calibration of linear estimators with minimal penalties, July 2011. arXiv:0909.1884v2.

Sylvain Arlot and Pascal Massart. Data-driven calibration of penalties for least-squares regression. Journal of Machine Learning Research, 10:245–279 (electronic), 2009.

Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, May 1950.

Bart Bakker and Tom Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, December 2003. ISSN 1532-4435. doi: http://dx.doi.org/10.1162/153244304322765658.

Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probability Theory and Related Fields, 138:33–73, 2007.

Philip J. Brown and James V. Zidek. Adaptive multivariate ridge regression. The Annals of Statistics, 8(1):64–74, 1980. ISSN 00905364.

Rich Caruana. Multitask learning. Machine Learning, 28:41–75, July 1997. ISSN 0885-6125. doi: 10.1023/A:1007379606734.

Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

Gilles Gasso, Alain Rakotomamonjy, and Stéphane Canu. Recovering sparse signals with non-convex penalties and DC programming. IEEE Trans. Signal Processing, 57(12):4686–4698, 2009.

Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991. ISBN 9780521467131.

Laurent Jacob, Francis Bach, and Jean-Philippe Vert. Clustered multi-task learning: A convex formulation. Computing Research Repository, pages –1–1, 2008.

Matthieu Lerasle. Optimal model selection in density estimation. Ann. Inst. H. Poincaré Probab. Statist., 2011. ISSN 0246-0203. Accepted. arXiv:0910.1654.

Percy Liang, Francis Bach, Guillaume Bouchard, and Michael I. Jordan. Asymptotically optimal regularization in smooth parametric models. In Advances in Neural Information Processing Systems, 2010.

Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov, and Sara van de Geer. Oracle inequalities and optimal inference under group sparsity. Technical Report arXiv:1007.1771, Jul 2010. Comments: 37 pages.

Karim Lounici, Massimiliano Pontil, Sara van de Geer, and Alexandre Tsybakov. Oracle inequalities and optimal inference under group sparsity. The Annals of Statistics, 39(4):2164–2204, 2011.

Colin L. Mallows. Some comments on Cp. Technometrics, pages 661–675, 1973.

Guillaume Obozinski, Martin J. Wainwright, and Michael I. Jordan. Support union recovery in high-dimensional multivariate regression. The Annals of Statistics, 39(1):1–17, 2011.

Carl E. Rasmussen and Christopher K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, USA, December 2002.

Sebastian Thrun and Joseph O’Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. Proceedings of the 13th International Conference on Machine Learning, 1996.

Grace Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. ISBN 0-89871-244-0.

Tong Zhang. Learning bounds for kernel regression using effective data dimensionality. Neural Computation, 17(9):2077–2098, 2005.