jmlr2009-13-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first bounded online algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as is demonstrated for ALMA2. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, in terms of both memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
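The projection idea in the abstract can be sketched concretely: on a mistake, instead of adding the new instance to the support set outright, first try to express its kernel function k(x_t, ·) as a linear combination of the kernel functions of the stored instances; only when the projection error exceeds a threshold is the instance actually stored. The sketch below is a minimal, illustrative reading of that idea, not the paper's exact Projectron update rule; the threshold name `eta`, the Gaussian kernel choice, and the small ridge term added for numerical stability are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

class ProjectronSketch:
    """Illustrative bounded kernel Perceptron in the spirit of Projectron."""

    def __init__(self, kernel=gaussian_kernel, eta=0.1):
        self.kernel, self.eta = kernel, eta
        self.support, self.alpha = [], []   # stored instances and coefficients

    def predict_score(self, x):
        return sum(a * self.kernel(s, x) for s, a in zip(self.support, self.alpha))

    def fit_one(self, x, y):
        if y * self.predict_score(x) > 0:
            return  # conservative update: no mistake, hypothesis unchanged
        if not self.support:
            self.support.append(x)
            self.alpha.append(float(y))
            return
        # Project k(x, .) onto the span of the current support set:
        # solve K d = k_t, where K is the Gram matrix of the support set.
        K = np.array([[self.kernel(a, b) for b in self.support] for a in self.support])
        k_t = np.array([self.kernel(s, x) for s in self.support])
        d = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k_t)
        delta2 = self.kernel(x, x) - k_t @ d   # squared projection error
        if delta2 <= self.eta ** 2:
            # Projected update: spread y's contribution over existing
            # coefficients, leaving the memory footprint unchanged.
            for i in range(len(self.alpha)):
                self.alpha[i] += y * d[i]
        else:
            self.support.append(x)
            self.alpha.append(float(y))
```

When the projection error is small, the mistaken instance changes only the coefficients of instances already stored, so the support set stops growing once it spans the relevant part of the feature space.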
References:
G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems 14, pages 409–415, 2000.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. Tracking the best hyperplane with a simple budget Perceptron. In Proc. of the 19th Conference on Learning Theory, pages 483–498, 2006.
L. Cheng, S. V. N. Vishwanathan, D. Schuurmans, S. Wang, and T. Caelli. Implicit online learning with kernels. In Advances in Neural Information Processing Systems 19, pages 249–256, 2007.
M. Collins. Discriminative reranking for natural language parsing. In Proc. of the 17th International Conference on Machine Learning, pages 175–182, 2000.
K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.
K. Crammer, J. Kandola, and Y. Singer. Online classification on a budget. In Advances in Neural Information Processing Systems 16, 2003.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation, 14:641–668, 2002.
F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, New York, NY, USA, 2007.
O. Dekel, S. Shalev-Shwartz, and Y. Singer. The Forgetron: A kernel-based Perceptron on a budget. SIAM Journal on Computing, 37(5):1342–1372, 2007.
T. Downs, K. E. Gates, and A. Masters. Exact simplification of support vector solutions. Journal of Machine Learning Research, 2:293–297, 2001.
M. F. Duarte and Y. H. Hu. Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64:826–838, 2004.
Y. Engel, S. Mannor, and R. Meir. The kernel recursive least-squares algorithm. IEEE Transactions on Signal Processing, 52(8):2275–2285, 2004.
Y. Freund and R. E. Schapire. Large margin classification using the Perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
J. J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994.
S. S. Keerthi, D. DeCoste, and T. Joachims. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.
J. Kivinen, A. Smola, and R. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2004.
J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. In Advances in Neural Information Processing Systems 21, pages 905–912, 2008.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
L. Lamel, R. Kassel, and S. Seneff. Speech database development: Design and analysis. Report no. SAIC-86/1546, Proc. DARPA Speech Recognition Workshop, 1986.
F. Orabona. DOGMA: A MATLAB Toolbox for Online Learning, 2009. Software available at http://dogma.sourceforge.net.
F. Orabona, C. Castellini, B. Caputo, J. Luo, and G. Sandini. Indoor place recognition using online independent support vector machines. In Proc. of the British Machine Vision Conference 2007, pages 1090–1099, 2007.
J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 185–208. MIT Press, Cambridge, MA, USA, 1999.
D. Prokhorov. IJCNN 2001 neural network competition. Technical report, 2001.
F. Rosenblatt. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In Proc. of the 14th Conference on Computational Learning Theory, pages 416–426, London, UK, 2001. Springer-Verlag.
B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems 17, 2003.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proc. of the 21st International Conference on Machine Learning, pages 823–830, 2004.
J. Weston, A. Bordes, and L. Bottou. Online (and offline) on an even tighter budget. In Robert G. Cowell and Zoubin Ghahramani, editors, Proc. of AISTATS 2005, pages 413–420, 2005.