nips nips2010 nips2010-74 nips2010-74-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola
Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernsteintype inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions. 1
[1] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization Bounds for the Area under the ROC Curve. Journal of Machine Learning Research, 6:393–425, 2005.
[2] M. A. Arcones. A bernstein-type inequality for u-statistics and u-processes. Statistics & probability letters, 22(3):239–247, 1995.
[3] J.-Y. Audibert, R. Munos, and C. Szepesv´ ri. Tuning bandit algorithms in stochastic environa ments. In ALT ’07: Proceedings of the 18th international conference on Algorithmic Learning Theory, pages 150–165, Berlin, Heidelberg, 2007. Springer-Verlag.
[4] S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification : A survey of some recent advances. ESAIM. P&S;, 9:323–375, 2005.
[5] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of online learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.
[6] S. Cl´ mencon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of u -statistics. e ¸ The Annals of Statistics, 36(2):844–874, April 2008.
[7] Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[8] J. H´ jek and Z. Sid´ k. Theory of Rank Tests. Academic Press, 1967. a a
[9] J. A. Hanley and B. J. Mcneil. The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology, 143(1):29–36, April 1982.
[10] W. Hoeffding. A Class of Statistics with Asymptotically Normal Distribution. Annals of Mathematical Statistics, 19(3):293–325, 1948.
[11] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
[12] O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Adv. in Neural Information Processing Systems NIPS 93, pages 59–66, 1993.
[13] A. Maurer and M. Pontil. Empirical bernstein bounds and sample-variance penalization. In COLT 09: Proc. of The 22nd Annual Conference on Learning Theory, 2009.
[14] V. Mnih, C. Szepesv´ ri, and J.-Y. Audibert. Empirical bernstein stopping. In ICML ’08: a Proceedings of the 25th international conference on Machine learning, pages 672–679, New York, NY, USA, 2008. ACM.
[15] C. Rudin and R. E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10:2193–2232, Oct 2009.
[16] R. J. Serfling. Approximation theorems of mathematical statistics. J. Wiley & Sons, 1980. 9