
94 nips-2009-Fast Learning from Non-i.i.d. Observations


Source: pdf

Author: Ingo Steinwart, Andreas Christmann

Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for several learning methods, including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case.
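
For readers who want the object of the abstract spelled out: the generic regularized empirical risk minimization estimator over a reproducing kernel Hilbert space H, with regularization parameter λ > 0 and loss L, can be written as below. This is a minimal sketch in the standard notation of [13], not a formula copied from the paper itself.

\[
  f_{D,\lambda} \;=\; \operatorname*{arg\,min}_{f \in H} \left( \lambda \, \|f\|_H^2 \;+\; \frac{1}{n} \sum_{i=1}^{n} L\bigl(x_i, y_i, f(x_i)\bigr) \right)
\]

Least squares SVMs are the special case L(x, y, t) = (y - t)^2, and the oracle inequality controls the excess risk of f_{D,λ} when the sample (x_1, y_1), ..., (x_n, y_n) is drawn from an α-mixing process rather than an i.i.d. one.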


reference text

[1] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Ann. Statist., 33:1497–1537, 2005.

[2] G. Blanchard, G. Lugosi, and N. Vayatis. On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res., 4:861–894, 2003.

[3] R. C. Bradley. Introduction to Strong Mixing Conditions. Vol. 1-3. Kendrick Press, Heber City, UT, 2007.

[4] J. Fan and Q. Yao. Nonlinear Time Series. Springer, New York, 2003.

[5] A. Irle. On consistency in nonparametric estimation under mixing conditions. J. Multivariate Anal., 60:123–147, 1997.

[6] W. S. Lee, P. L. Bartlett, and R. C. Williamson. The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory, 44:1974–1980, 1998.

[7] A. Lozano, S. Kulkarni, and R. Schapire. Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 819–826. MIT Press, Cambridge, MA, 2006.

[8] R. Meir. Nonparametric time series prediction through adaptive model selection. Mach. Learn., 39:5–34, 2000.

[9] D. S. Modha and E. Masry. Minimum complexity regression estimation with weakly dependent observations. IEEE Trans. Inform. Theory, 42:2133–2145, 1996.

[10] M. Mohri and A. Rostamizadeh. Stability bounds for non-i.i.d. processes. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1025–1032. MIT Press, Cambridge, MA, 2008.

[11] M. Mohri and A. Rostamizadeh. Rademacher complexity bounds for non-i.i.d. processes. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1097–1104. 2009.

[12] I. Steinwart. Two oracle inequalities for regularized boosting classifiers. Statistics and Its Interface, 2:271–284, 2009.

[13] I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.

[14] I. Steinwart and A. Christmann. Estimating conditional quantiles with the help of the pinball loss. Bernoulli, accepted with minor revision.

[15] I. Steinwart, D. Hush, and C. Scovel. Learning from dependent observations. J. Multivariate Anal., 100:175–194, 2009.

[16] I. Steinwart, D. Hush, and C. Scovel. Optimal rates for regularized least squares regression. In S. Dasgupta and A. Klivans, editors, Proceedings of the 22nd Annual Conference on Learning Theory, pages 79–93. 2009.

[17] H. Sun and Q. Wu. Regularized least square regression with dependent samples. Adv. Comput. Math., to appear.

[18] M. Vidyasagar. A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems. Springer, London, 2nd edition, 2003.

[19] Y.-L. Xu and D.-R. Chen. Learning rates of regularized regression for exponentially strongly mixing sequence. J. Statist. Plann. Inference, 138:2180–2189, 2008.

[20] B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab., 22:94–116, 1994.

[21] B. Zou and L. Li. The performance bounds of learning machines based on exponentially strongly mixing sequences. Comput. Math. Appl., 53:1050–1058, 2007.