jmlr2009-13-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first bounded online algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as is demonstrated for ALMA2. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, in terms of both memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
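The projection idea in the abstract can be sketched concretely: on a mistake, instead of adding the new instance to the support set outright, first try to express its kernel function k(x_t, ·) as a linear combination of the kernel functions of the stored instances; only when the projection error exceeds a threshold is the instance actually stored. The sketch below is a minimal, illustrative reading of that idea, not the paper's exact Projectron update rule; the threshold name `eta`, the Gaussian kernel choice, and the small ridge term added for numerical stability are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

class ProjectronSketch:
    """Illustrative bounded kernel Perceptron in the spirit of Projectron."""

    def __init__(self, kernel=gaussian_kernel, eta=0.1):
        self.kernel, self.eta = kernel, eta
        self.support, self.alpha = [], []   # stored instances and coefficients

    def predict_score(self, x):
        return sum(a * self.kernel(s, x) for s, a in zip(self.support, self.alpha))

    def fit_one(self, x, y):
        if y * self.predict_score(x) > 0:
            return  # conservative update: no mistake, hypothesis unchanged
        if not self.support:
            self.support.append(x)
            self.alpha.append(float(y))
            return
        # Project k(x, .) onto the span of the current support set:
        # solve K d = k_t, where K is the Gram matrix of the support set.
        K = np.array([[self.kernel(a, b) for b in self.support] for a in self.support])
        k_t = np.array([self.kernel(s, x) for s in self.support])
        d = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k_t)
        delta2 = self.kernel(x, x) - k_t @ d   # squared projection error
        if delta2 <= self.eta ** 2:
            # Projected update: spread y's contribution over existing
            # coefficients, leaving the memory footprint unchanged.
            for i in range(len(self.alpha)):
                self.alpha[i] += y * d[i]
        else:
            self.support.append(x)
            self.alpha.append(float(y))
```

When the projection error is small, the mistaken instance changes only the coefficients of instances already stored, so the support set stops growing once it spans the relevant part of the feature space.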
References:
G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems 14, pages 409–415, 2000.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Tracking the best hyperplane with a simple budget Perceptron. In Proc. of the 19th Conference on Learning Theory, pages 483–498, 2006.
L. Cheng, S. V. N. Vishwanathan, D. Schuurmans, S. Wang, and T. Caelli. Implicit online learning with kernels. In Advances in Neural Information Processing Systems 19, pages 249–256, 2007.
M. Collins. Discriminative reranking for natural language parsing. In Proc. of the 17th International Conference on Machine Learning, pages 175–182, 2000.
K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.
K. Crammer, J. Kandola, and Y. Singer. Online classification on a budget. In Advances in Neural Information Processing Systems 16, 2003.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation, 14:641–668, 2002.
F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, New York, NY, USA, 2007.
O. Dekel, S. Shalev-Shwartz, and Y. Singer. The Forgetron: A kernel-based Perceptron on a budget. SIAM Journal on Computing, 37(5):1342–1372, 2007.
T. Downs, K. E. Gates, and A. Masters. Exact simplification of support vector solutions. Journal of Machine Learning Research, 2:293–297, 2001.
M. F. Duarte and Y. H. Hu. Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64:826–838, 2004.
Y. Engel, S. Mannor, and R. Meir. The kernel recursive least-squares algorithm. IEEE Transactions on Signal Processing, 52(8):2275–2285, 2004.
Y. Freund and R. E. Schapire. Large margin classification using the Perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
J. J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994.
S. S. Keerthi, D. DeCoste, and T. Joachims. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.
J. Kivinen, A. Smola, and R. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2004.
J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. In Advances in Neural Information Processing Systems 21, pages 905–912, 2008.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
L. Lamel, R. Kassel, and S. Seneff. Speech database development: Design and analysis. Report no. SAIC-86/1546, Proc. DARPA Speech Recognition Workshop, 1986.
F. Orabona. DOGMA: A MATLAB Toolbox for Online Learning, 2009. Software available at http://dogma.sourceforge.net.
F. Orabona, C. Castellini, B. Caputo, J. Luo, and G. Sandini. Indoor place recognition using online independent support vector machines. In Proc. of the British Machine Vision Conference 2007, pages 1090–1099, 2007.
J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 185–208. MIT Press, Cambridge, MA, USA, 1999.
D. Prokhorov. IJCNN 2001 neural network competition. Technical report, 2001.
F. Rosenblatt. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In Proc. of the 14th Conference on Computational Learning Theory, pages 416–426, London, UK, 2001. Springer-Verlag.
B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems 17, 2003.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proc. of the 21st International Conference on Machine Learning, pages 823–830, 2004.
J. Weston, A. Bordes, and L. Bottou. Online (and offline) on an even tighter budget. In Robert G. Cowell and Zoubin Ghahramani, editors, Proc. of AISTATS 2005, pages 413–420, 2005.