Authors: Mehryar Mohri, Afshin Rostamizadeh
Abstract: This paper presents the first Rademacher complexity-based error bounds for non-i.i.d. settings, a generalization of similar existing bounds derived for the i.i.d. case. Our bounds hold in the scenario of dependent samples generated by a stationary β-mixing process, which is commonly adopted in many previous studies of non-i.i.d. settings. They benefit from the crucial advantages of Rademacher complexity over other measures of the complexity of hypothesis classes. In particular, they are data-dependent and measure the complexity of a class of hypotheses based on the training sample. The empirical Rademacher complexity can be estimated from such finite samples and leads to tighter generalization bounds. We also present the first margin bounds for kernel-based classification in this non-i.i.d. setting and briefly study their convergence.
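For readers unfamiliar with the quantity the abstract refers to, the empirical Rademacher complexity of a hypothesis class on a fixed sample of size m is the expectation, over independent uniform ±1 signs sigma_1, ..., sigma_m, of the supremum over hypotheses h of (1/m) sum_i sigma_i h(x_i). The short Python sketch below illustrates only this standard definition, not code or experiments from the paper; the hypothesis class, sample, and function names are made up for the example. It estimates the quantity for a small finite class by Monte Carlo sampling of the sign vectors.

import numpy as np

def empirical_rademacher(values, n_trials=2000, seed=0):
    # values: array of shape (num_hypotheses, m) holding h(x_i) for each
    # hypothesis h in a finite class and each point x_i of the fixed sample.
    # Returns a Monte Carlo estimate of
    #   E_sigma[ sup_h (1/m) * sum_i sigma_i * h(x_i) ].
    rng = np.random.default_rng(seed)
    _, m = values.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher (random sign) vector
        total += np.max(values @ sigma) / m       # supremum over the finite class
    return total / n_trials

# Hypothetical example: three threshold classifiers evaluated on 50 sample points.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
H = np.stack([np.where(x > t, 1.0, -1.0) for t in (-0.5, 0.0, 0.5)])
print(empirical_rademacher(H))

As the abstract notes, the contribution of the paper is to bound generalization error in terms of this sample-based quantity even when the sample is drawn from a stationary β-mixing process rather than i.i.d.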
[1] M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, UK, 1999.
[2] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
[3] A. Irle. On the consistency in nonparametric estimation under mixing assumptions. Journal of Multivariate Analysis, 60:123–147, 1997.
[4] V. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II, pages 443–459. Birkhäuser, 2000.
[5] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30, 2002.
[6] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.
[7] A. Lozano, S. Kulkarni, and R. Schapire. Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations. Advances in Neural Information Processing Systems, 18, 2006.
[8] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, 1989.
[9] R. Meir. Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1):5–34, 2000.
[10] M. Mohri and A. Rostamizadeh. Stability bounds for non-i.i.d. processes. Advances in Neural Information Processing Systems, 2007.
[11] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[12] I. Steinwart, D. Hush, and C. Scovel. Learning from dependent observations. Technical Report LA-UR06-3507, Los Alamos National Laboratory, 2007.
[13] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
[14] M. Vidyasagar. Learning and Generalization: with Applications to Neural Networks. Springer, 2003.
[15] B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. Annals of Probability, 22(1):94–116, 1994.