nips nips2001 nips2001-31 nips2001-31-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ralf Herbrich, Robert C. Williamson
Abstract: In contrast to standard statistical learning theory, which studies uniform bounds on the expected error, we present a framework that exploits the specific learning algorithm used. Motivated by the luckiness framework [8], we are also able to exploit the serendipity of the training sample. The main difference from previous approaches lies in the complexity measure: rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions that could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers, which results in a bound that exploits both the margin and the distribution of the data in feature space.
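To make the margin quantity mentioned in the abstract concrete, here is a minimal sketch (not the paper's bound or algorithm): it fits an approximation of the maximum margin linear classifier on synthetic data, assuming scikit-learn and NumPy are available, and computes the geometric margin min_i y_i⟨w, x_i⟩ / ||w||, the sample-dependent quantity that margin bounds of this kind depend on.

```python
# Illustrative sketch only; the synthetic data and large-C approximation
# are assumptions, not taken from the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable Gaussian clusters with labels in {-1, +1}.
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# A very large C makes the soft-margin SVM approximate the hard
# (maximum) margin solution on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

# Geometric margin: the smallest signed distance of a training point
# to the separating hyperplane; larger values give tighter bounds.
gamma = np.min(y * (X @ w + b)) / np.linalg.norm(w)
print(f"geometric margin on the sample: {gamma:.4f}")
```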
[1] M. Anthony and P. Bartlett. A Theory of Learning in Artificial Neural Networks. Cambridge University Press, 1999.
[2] O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 196-202. MIT Press, 2001.
[3] N. Cristianini, A. Elisseeff, and J. Shawe-Taylor. On optimizing kernel alignment. Technical Report NC2-TR-2001-087, NeuroCOLT, http://www.neurocolt.com, 2001.
[4] R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 224-230, Cambridge, MA, 2001. MIT Press.
[5] R. Herbrich and R. C. Williamson. Algorithmic luckiness. Technical report, Microsoft Research, 2002.
[6] N. Littlestone and M. Warmuth. Relating data compression and learnability. Technical report, University of California Santa Cruz, 1986.
[7] Y. Makovoz. Random approximants and neural networks. Journal of Approximation Theory, 85:98-109, 1996.
[8] J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926-1940, 1998.
[9] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.
[10] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-281, 1971.