
139 nips-2001-Online Learning with Kernels



Author: Jyrki Kivinen, Alex J. Smola, Robert C. Williamson

Abstract: We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular, we derive update equations for classification, regression, and novelty detection. The inclusion of the ν-trick allows us to give a robust parameterization. Moreover, unlike in batch learning, where the ν-trick only applies to the ε-insensitive loss function, we are able to derive general trimmed-mean types of estimators, such as for Huber's robust loss.
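The update equations the abstract refers to are stochastic gradient steps on a regularized loss, carried out in the RKHS so that the hypothesis stays a kernel expansion over past examples. The following is a minimal sketch of that idea for binary classification with a hinge loss and a Gaussian kernel; the class name, hyperparameters, and loss choice are illustrative assumptions, not the authors' exact algorithm.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature tuples (illustrative choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OnlineKernelClassifier:
    """Sketch of online gradient descent in an RKHS for binary
    classification with hinge loss. eta (step size), lam (regularizer)
    and gamma are assumed hyperparameters."""

    def __init__(self, eta=0.5, lam=0.01, gamma=1.0):
        self.eta, self.lam, self.gamma = eta, lam, gamma
        self.support = []  # stored examples x_i
        self.alpha = []    # their expansion coefficients

    def predict(self, x):
        """f(x) = sum_i alpha_i k(x_i, x): the kernel expansion."""
        return sum(a * gaussian_kernel(s, x, self.gamma)
                   for a, s in zip(self.alpha, self.support))

    def update(self, x, y):
        """One stochastic gradient step on the regularized hinge loss."""
        margin = y * self.predict(x)
        # Shrink all old coefficients: the effect of the regularizer.
        self.alpha = [(1 - self.eta * self.lam) * a for a in self.alpha]
        if margin < 1:  # hinge loss is active: add a new expansion term
            self.support.append(x)
            self.alpha.append(self.eta * y)
```

Because each update appends at most one term and uniformly shrinks the rest, the hypothesis remains a finite kernel expansion whose old coefficients decay geometrically, which is what makes the online scheme computationally tractable.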


reference text

[1] K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.

[2] G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 409–415. MIT Press, 2001.

[3] L. Csató and M. Opper. Sparse representation for Gaussian process models. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 444–450. MIT Press, 2001.

[4] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Technical report, Stanford University, Dept. of Statistics, 1998.

[5] C. Gentile. A new approximate maximal margin classification algorithm. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 500–506. MIT Press, 2001.

[6] T. Graepel, R. Herbrich, and R. C. Williamson. From margin to sparsity. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 210–216. MIT Press, 2001.

[7] Y. Guo, P. Bartlett, A. Smola, and R. C. Williamson. Norm-based regularization of boosting. Submitted to Journal of Machine Learning Research, 2001.

[8] M. Herbster. Learning additive models online with fast evaluating kernels. In Proc. 14th Annual Conference on Computational Learning Theory (COLT), pages 444–460. Springer, 2001.

[9] P. J. Huber. Robust statistics: a review. Annals of Mathematical Statistics, 43:1041, 1972.

[10] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82–95, 1971.

[11] J. Kivinen, A. J. Smola, and R. C. Williamson. Large margin classification for moving targets. Unpublished manuscript, 2001.

[12] Y. Li and P. M. Long. The relaxed online maximum margin algorithm. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 498–504. MIT Press, 1999.

[13] L. Mason, J. Baxter, P. L. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 221–246. MIT Press, Cambridge, MA, 2000.

[14] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001.

[15] B. Schölkopf, A. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000.

[16] V. Vapnik, S. Golowich, and A. Smola. Support vector method for function approximation, regression estimation, and signal processing. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 281– 287, Cambridge, MA, 1997. MIT Press.