jmlr jmlr2012 jmlr2012-80 jmlr2012-80-reference knowledge-graph by maker-knowledge-mining

80 jmlr-2012-On Ranking and Generalization Bounds


Source: pdf

Author: Wojciech Rejchel

Abstract: The problem of ranking is to predict or guess the ordering between objects on the basis of their observed features. In this paper we consider ranking estimators that minimize the empirical convex risk. We prove generalization bounds for the excess risk of such estimators with rates that are faster than 1/√n. We apply our results to commonly used ranking algorithms, for instance boosting and support vector machines. Moreover, we study the performance of the considered estimators on real data sets.
Keywords: convex risk minimization, excess risk, support vector machine, empirical process, U-process
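To make the abstract's terminology concrete, the display below sketches the standard pairwise setup for convex risk minimization in ranking, in the spirit of the cited work of Clémençon, Lugosi, and Vayatis; the notation is illustrative and may differ from the paper's. For a ranking rule f and a convex surrogate loss φ (e.g., the hinge loss used by support vector machines or the exponential loss used by boosting), the convex risk and its empirical counterpart, a U-statistic over pairs of observations, are

% Sketch of pairwise convex risk minimization for ranking (illustrative notation, not quoted from the paper).
% A(f): population convex risk; A_n(f): empirical convex risk, a U-statistic over pairs of observations.
\[
  A(f) = \mathbb{E}\,\varphi\bigl((Y - Y')\, f(X, X')\bigr),
  \qquad
  A_n(f) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n}
           \varphi\bigl((Y_i - Y_j)\, f(X_i, X_j)\bigr),
\]

where (X_1, Y_1), ..., (X_n, Y_n) are i.i.d. copies of the pair (X, Y) and (X', Y') is an independent copy of (X, Y). A ranking estimator in this framework is a minimizer of A_n over a class of functions, and the generalization bounds described in the abstract control its excess risk A(f_n) - inf_f A(f), the quantity shown to decrease faster than 1/√n.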


reference text

R. Adamczak. Moment inequalities for U-statistics. Ann. Probab., 34:2288–2314, 2007.
S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. J. Machine Learning Research, 6:393–425, 2005.
M. A. Arcones and E. Giné. Limit theorems for U-processes. Ann. Probab., 21:1494–1542, 1993.
M. A. Arcones and E. Giné. U-processes indexed by Vapnik-Červonenkis classes of functions with applications to asymptotics and bootstrap of U-statistics with estimated parameters. Stochastic Process. Appl., 52:17–38, 1994.
P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Ann. Statist., 33:1497–1537, 2005.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.
G. Blanchard, G. Lugosi, and N. Vayatis. On the rates of convergence of regularized boosting classifiers. J. Machine Learning Research, 4:861–894, 2003.
G. Blanchard, O. Bousquet, and P. Massart. Statistical performance of support vector machines. Ann. Statist., 36:489–531, 2008.
A. Bose. Bahadur representation of Mm estimates. Ann. Statist., 26:771–777, 1998.
S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: P&S, 9:323–375, 2005.
C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and scoring using empirical risk minimization. In Proceedings of COLT, pages 1–15, 2005.
S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of U-statistics. Ann. Statist., 36:844–874, 2008.
C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.
P. Cortez, A. Cerdeira, F. Almeida, F. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47:547–553, 2009.
D. Cossock and T. Zhang. Subset ranking using regression. In Proceedings of the 19th Annual Conference on Learning Theory, 2006.
F. Cucker and S. Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39:1–49, 2002.
V. H. de la Peña and E. Giné. Decoupling: From Dependence to Independence. Springer-Verlag, New York, 1999.
E. Dimitriadou, K. Hornik, F. Leisch, D. Meyer, and A. Weingessel. e1071: Functions of the Department of Statistics (e1071). TU Wien, 2010. URL http://CRAN.R-project.org/package=e1071.
A. Frank and A. Asuncion. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2010. URL http://archive.ics.uci.edu/ml.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. of Computer and System Sciences, 55:119–139, 1997.
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. J. Machine Learning Research, 4:933–969, 2004.
R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, chapter 7. MIT Press, Cambridge, 2000.
W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293–325, 1948.
T. Joachims. Training linear SVMs in linear time. In Proceedings of the ACM KDD, pages 217–226, 2006.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of regularized boosting methods. Ann. Statist., 32:30–55, 2004.
P. Major. An estimate of the supremum of a nice class of stochastic integrals and U-statistics. Probab. Theory Related Fields, 134:489–537, 2006.
P. Massart. Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math., 9:245–303, 2000.
S. Mendelson. Improving the sample complexity using global data. IEEE Trans. Inform. Theory, 48:1977–1991, 2002.
S. Mendelson. A few notes on statistical learning theory. In Advanced Lectures on Machine Learning, chapter 1. Springer, 2003.
W. Niemiro. Asymptotics for M-estimators defined by convex minimization. Ann. Statist., 20:1514–1533, 1992.
W. Niemiro and W. Rejchel. Rank correlation estimators and their limiting distributions. Statistical Papers, 50:887–893, 2009.
D. Nolan and D. Pollard. U-processes: rates of convergence. Ann. Statist., 15:780–799, 1987.
A. Pakes and D. Pollard. Simulation and asymptotics of optimization estimators. Econometrica, 57:1027–1057, 1989.
D. Pollard. Convergence of Stochastic Processes. Springer, New York, 1984.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2009. URL http://www.R-project.org.
W. Rejchel. Ranking - convex risk minimization. In Proceedings of WASET, pages 172–178, 2009.
C. Rudin. Ranking with a p-norm push. In Proceedings of the 19th Annual Conference on Learning Theory, 2006.
C. Scovel and I. Steinwart. Fast rates for support vector machines using Gaussian kernels. Preprint, 2005.
C. Scovel and I. Steinwart. Fast rates for support vector machines using Gaussian kernels. Ann. Statist., 35:575–607, 2007.
R. P. Sherman. The limiting distributions of the maximum rank correlation estimator. Econometrica, 61:123–137, 1993.
I. Steinwart. On the influence of the kernel on the consistency of support vector machines. J. Machine Learning Research, 2:67–93, 2001.
M. Talagrand. Sharper bounds for Gaussian and empirical processes. Ann. Probab., 22:28–76, 1994.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag, New York, 1996.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
I. C. Yeh. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28:1797–1808, 1998.