64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss

Source: pdf

Author: Ofer Dekel, Philip M. Long, Yoram Singer

Abstract: We study the problem of learning multiple tasks in parallel within the online learning framework. On each online round, the algorithm receives an instance for each of the parallel tasks and responds by predicting the label of each instance. We consider the case where the predictions made on each round all contribute toward a common goal. The relationship between the various tasks is defined by a global loss function, which evaluates the overall quality of the multiple predictions made on each round. Specifically, each individual prediction is associated with its own loss value, and then these multiple loss values are combined into a single number using the global loss function. We focus on the case where the global loss function belongs to the family of absolute norms, and present several online learning algorithms for the induced problem. We prove worst-case relative loss bounds for all of our algorithms, and demonstrate the effectiveness of our approach on a large-scale multiclass-multilabel text categorization problem.

Keywords: online learning, multitask learning, multiclass multilabel classification, perceptron
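The idea of combining per-task losses into one number can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes linear predictors, binary labels in {-1, +1}, and the hinge loss as the individual loss, none of which the abstract specifies. The names `hinge_loss` and `global_loss` and all numeric values are hypothetical. The L_p norms used here are members of the family of absolute norms.

```python
import math

def hinge_loss(w, x, y):
    """Hinge loss of a linear predictor w on (x, y), with y in {-1, +1}.

    Assumption: the abstract does not name the individual loss;
    the hinge loss is used here only for illustration.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)

def global_loss(losses, p):
    """Combine per-task losses into one number with an L_p norm.

    p must be >= 1 or math.inf; every L_p norm is an absolute norm.
    """
    if p == math.inf:
        return max(losses)
    return sum(l ** p for l in losses) ** (1.0 / p)

# One online round with three parallel tasks (made-up values).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # one predictor per task
instances = [[2.0, 0.0], [0.0, 0.5], [1.0, 1.0]]  # one instance per task
labels = [1, 1, -1]                               # true labels, revealed after predicting

# Each task suffers its own loss on this round.
losses = [hinge_loss(w, x, y) for w, x, y in zip(weights, instances, labels)]

# The global loss summarizes the round; the choice of norm changes the emphasis.
for p in (1, 2, math.inf):
    print(f"p = {p}: global loss = {global_loss(losses, p):.4f}")
```

With p = 1 the global loss is the sum of the individual losses, so every task counts equally. With p = infinity only the worst-performing task on the round matters.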

reference text

J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. In Proceedings of the Sixteenth Annual Conference on Computational Learning Theory, 2003.
C. Bennett and R. Sharpley. Interpolation of Operators. Academic Press, 1998.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
O. Chapelle and Z. Harchaoui. A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems, volume 17, 2005.
K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive aggressive algorithms. Journal of Machine Learning Research, 7:551–585, Mar 2006.
T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, January 1995.
T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
D. P. Helmbold, J. Kivinen, and M. Warmuth. Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6):1291–1304, 1999.
R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In A. Smola, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press, 2000.
T. Heskes. Solving a huge number of similar tasks: A combination of multitask learning and a hierarchical Bayesian approach. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 233–241, 1998.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45(3):301–329, July 2001.
A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615–622, 1962.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).)
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004.