jmlr jmlr2010 jmlr2010-66 jmlr2010-66-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Giovanni Cavallanti, Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We introduce new Perceptron-based algorithms for the online multitask binary classification problem. Under suitable regularity conditions, our algorithms are shown to improve on their baselines by a factor proportional to the number of tasks. We achieve these improvements using various types of regularization that bias our algorithms towards specific notions of task relatedness. More specifically, similarity among tasks is either measured in terms of the geometric closeness of the task reference vectors or as a function of the dimension of their spanned subspace. In addition to adapting to the online setting a mix of known techniques, such as the multitask kernels of Evgeniou et al., our analysis also introduces a matrix-based multitask extension of the p-norm Perceptron, which is used to implement spectral co-regularization. Experiments on real-world data sets complement and support our theoretical findings. Keywords: mistake bounds, perceptron algorithm, multitask learning, spectral regularization
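To make the multitask-kernel idea mentioned in the abstract concrete, below is a minimal sketch of a kernel Perceptron run over compound (instance, task) examples with a multitask kernel of the general Evgeniou et al. form, i.e., a shared inner-product term plus a task-specific term. The class name, the (a, b) parameterization, and all default values are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def multitask_kernel(x, task, xp, taskp, a=1.0, b=1.0):
    # Shared term (a) couples all tasks; the task-specific term (b) is added
    # only when the two examples belong to the same task.
    return (a + b * (task == taskp)) * np.dot(x, xp)

class KernelMultitaskPerceptron:
    # Hypothetical sketch: a kernel Perceptron over compound (instance, task)
    # examples; hyperparameters a, b are illustrative assumptions.
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b
        self.support = []  # (x, task, y) triples stored on mistaken rounds

    def _score(self, x, task):
        return sum(y * multitask_kernel(xs, ts, x, task, self.a, self.b)
                   for xs, ts, y in self.support)

    def predict(self, x, task):
        return 1 if self._score(x, task) >= 0 else -1

    def update(self, x, task, y):
        # Standard Perceptron rule: store the example only when the signed
        # margin y * score is non-positive (a prediction mistake).
        if y * self._score(x, task) <= 0:
            self.support.append((x, task, y))
            return True
        return False
```

In this parameterization, b = 0 makes all tasks share a single predictor, a = 0 decouples them into independent per-task Perceptrons, and intermediate values bias the learner toward solutions in which the task reference vectors are geometrically close, the regime the abstract describes.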
J. Abernethy, P.L. Bartlett, and A. Rakhlin. Multitask learning with expert advice. In Proceedings of the 20th Annual Conference on Learning Theory, pages 484–498. Springer, 2007.
A. Agarwal, A. Rakhlin, and P. Bartlett. Matrix regularization techniques for online multitask learning. Technical Report UCB/EECS-2008-138, EECS Department, University of California, Berkeley, 2008.
R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.
A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In Advances in Neural Information Processing Systems 19, pages 41–48. MIT Press, 2007.
A. Argyriou, C.A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multitask structure learning. In Advances in Neural Information Processing Systems 20, pages 25–32. Curran Associates, 2008.
A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31:167–175, 2003.
U. Brefeld, T. Gaertner, T. Scheffer, and S. Wrobel. Efficient co-regularised least squares regression. In Proceedings of the 23rd International Conference on Machine Learning. Omnipress, 2006.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
O. Dekel, P.M. Long, and Y. Singer. Online learning of multiple tasks with a shared loss. Journal of Machine Learning Research, 8:2233–2264, 2007.
ECML/PKDD Discovery Challenge, 2006. URL: www.ecmlpkdd2006.org/challenge.html.
T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:614–637, 2005.
Y. Freund and R.E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
A.J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3):173–210, 2001.
M. Herbster and G. Lever. Predicting the labelling of a graph via minimum p-seminorm interpolation. In Proceedings of the 22nd Annual Conference on Learning Theory. Omnipress, 2009.
M. Herbster, M. Pontil, and L. Wainer. Online learning over graphs. In Proceedings of the 22nd International Conference on Machine Learning, pages 305–312. Omnipress, 2005.
L. Hogben. Handbook of Linear Algebra. CRC Press, 2006.
R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
L. Jacob, F. Bach, and J.P. Vert. Clustered multi-task learning. In Advances in Neural Information Processing Systems 21, pages 745–752. Curran Associates, 2009.
A. Juditsky and A. Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces. Manuscript, 2009.
S. Kakade and D. Foster. Multi-view regression via canonical correlation analysis. In Proceedings of the 20th Annual Conference on Learning Theory, pages 82–96. Springer, 2007.
S. Kakade, S. Shalev-Shwartz, and A. Tewari. On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization. Manuscript, 2009.
J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45:301–329, 2001.
A.S. Lewis. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2(1):173–183, 1995.
N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, University of California at Santa Cruz, 1989.
G. Lugosi, O. Papaspiliopoulos, and G. Stoltz. Online multi-task learning with hard constraints. In Proceedings of the 22nd Annual Conference on Learning Theory. Omnipress, 2009.
J.R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, 1999.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, 7:117–139, 2006.
A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Nauka Publishers, 1978.
NIST, 2004. URL: trec.nist.gov/data/reuters/reuters.html.
V. Sindhwani and D.S. Rosenberg. An RKHS for multi-view learning and manifold co-regularization. In Proceedings of the 25th International Conference on Machine Learning, pages 976–983. Omnipress, 2008.
V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularized approach to semi-supervised learning. In Proceedings of the 22nd International Conference on Machine Learning Workshop on Learning with Multiple Views, 2005.
K. Tsuda, G. Raetsch, and M.K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projection. Journal of Machine Learning Research, 6:995–1018, 2005.
M.K. Warmuth. Winnowing subspaces. In Proceedings of the 24th International Conference on Machine Learning, pages 999–1006. Omnipress, 2007.
M.K. Warmuth. Kernelization of matrix updates. Manuscript, 2009.
M.K. Warmuth and D. Kuzmin. Online variance minimization. In Proceedings of the 19th Annual Conference on Learning Theory, pages 514–528. Springer, 2006.