Source: pdf
Author: Nicolò Cesa-Bianchi, Alex Conconi, Claudio Gentile
Abstract: In this paper we show that on-line algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary on-line learning algorithms. Furthermore, when applied to concrete on-line algorithms, our results yield tail bounds that in many cases are comparable to, or better than, the best known bounds.
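As a rough illustration of the on-line-to-batch theme the abstract describes (a minimal sketch under assumptions of our own, not the paper's specific construction or bound), the following Python snippet runs the Perceptron [17] once over an i.i.d. sample, records the fraction of on-line mistakes, and turns it into a risk bound for a randomly chosen intermediate hypothesis via the Azuma-Hoeffding inequality [3]. The function name online_to_batch_perceptron and the confidence parameter delta are illustrative, not taken from the paper.

import numpy as np

def online_to_batch_perceptron(X, y, delta=0.05, rng=None):
    """Illustrative on-line-to-batch conversion (hypothetical helper).

    Runs the Perceptron once over the labelled sample (X, y) with labels
    in {-1, +1}, keeps the intermediate hypotheses, and returns one of
    them drawn uniformly at random together with a crude tail bound on
    its expected risk."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)
    hypotheses = []            # w_0, ..., w_{n-1}: hypothesis used at round t
    mistakes = 0
    for t in range(n):
        hypotheses.append(w.copy())
        if y[t] * (w @ X[t]) <= 0:   # on-line (0/1) mistake at round t
            mistakes += 1
            w = w + y[t] * X[t]      # Perceptron update [17]
    avg_online_loss = mistakes / n
    # Azuma-Hoeffding [3]: with probability >= 1 - delta over the sample,
    # the average true risk of w_0, ..., w_{n-1} exceeds the average
    # on-line loss by at most sqrt(2 ln(1/delta) / n), since the per-round
    # 0/1 losses minus the corresponding risks form bounded martingale
    # differences.
    risk_bound = avg_online_loss + np.sqrt(2.0 * np.log(1.0 / delta) / n)
    # A hypothesis drawn uniformly at random from the intermediate ones
    # therefore has expected risk (over the draw) at most risk_bound.
    return hypotheses[rng.integers(n)], risk_bound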
[1] D. Angluin. Queries and concept learning. Machine Learning, 2(4), 319–342, 1988.
[2] K. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43, 211–246, 2001.
[3] K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19, 357–367, 1967.
Footnote 2: Using a slightly different linear regression algorithm, Forster and Warmuth [7] have proven a sharper bound on the expected relative loss. In particular, they have exhibited an algorithm computing a hypothesis whose relative risk, in expectation over the sample, is bounded by a sharper quantity (see [7] for the explicit bound).
[4] A. Blum, A. Kalai, and J. Langford. Beating the hold-out: bounds for k-fold and progressive cross-validation. In 12th COLT, pages 203–208, 1999.
[5] S. Boucheron, G. Lugosi, and P. Massart. A sharp concentration inequality with applications. Random Structures and Algorithms, 16, 277–292, 2000.
[6] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3), 427–485, 1997.
[7] J. Forster and M. K. Warmuth. Relative expected instantaneous loss bounds. In 13th COLT, 90–99, 2000.
[8] Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296, 1999.
[9] C. Gentile. The robustness of the p-norm algorithms. Manuscript, 2001. An extended abstract (co-authored with N. Littlestone) appeared in 12th COLT, 1–11, 1999.
[10] A. J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3), 173–210, 2001.
[11] D. Helmbold and M. K. Warmuth. On weak learning. JCSS, 50(3), 551–573, June 1995.
[12] A. Hoerl and R. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55–67, 1970.
[13] J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1), 1–64, 1997.
[14] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318, 1988.
[15] N. Littlestone. From on-line to batch learning. In 2nd COLT, 269–284, 1989.
[16] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2), 212–261, 1994.
[17] F. Rosenblatt. Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan Books, Washington, D.C., 1962.
[18] C. Saunders, A. Gammerman, and V. Vovk. Ridge Regression Learning Algorithm in Dual Variables. In 15th ICML, 1998.
[19] J. Shawe-Taylor, P. Bartlett, R. Williamson, and M. Anthony, Structural Risk Minimization over Data-dependent Hierarchies. IEEE Trans. IT, 44, 1926–1940, 1998.
[20] J. Shawe-Taylor and N. Cristianini, On the generalization of soft margin algorithms, 2000. NeuroCOLT2 Tech. Rep. 2000-082, http://www.neurocolt.org.
[21] V.N. Vapnik, Statistical learning theory. J. Wiley and Sons, NY, 1998.
[22] V. Vovk, Competitive on-line linear regression. In NIPS*10, 1998. Also: Tech. Rep. Department of Computer Science, Royal Holloway, University of London, CSD-TR97-13, 1997.
[23] R. C. Williamson, J. Shawe-Taylor, B. Schölkopf, and A. J. Smola. Sample Based Generalization Bounds. IEEE Trans. IT, to appear.