jmlr2009-18 (reference record): Consistency and Localizability
Source: pdf
Author: Alon Zakai, Ya'acov Ritov
Abstract: We show that all consistent learning methods—that is, methods that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)—are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when shown only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as composed of two aspects, one local and one global. Keywords: consistency, local learning, regression, classification
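To make the localizability notion above concrete, here is a minimal sketch (in Python) of the comparison the abstract describes: train a method once on the full sample and once on only the part of the sample near a point x0, then compare the two responses at x0. Kernel ridge regression, the synthetic sine data, the radius r, and the probe point x0 are all illustrative assumptions, not the paper's formal construction.

    # Sketch of the localizability check described in the abstract: compare a
    # method's response at a point x0 when it is trained on the full sample
    # versus only on the part of the sample that is close to x0.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)

    # Synthetic regression data (an assumption for illustration):
    # Y = sin(2*pi*X) + noise, with X ~ Uniform[0, 1].
    n = 2000
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

    x0 = np.array([[0.5]])  # point at which the response is probed (assumed)
    r = 0.1                 # locality radius (assumed, for illustration)

    # Global fit: the method sees the entire training set.
    global_fit = KernelRidge(kernel="rbf", gamma=50.0, alpha=1e-3).fit(X, y)

    # Local fit: the method sees only training points within distance r of x0.
    near = np.abs(X[:, 0] - x0[0, 0]) <= r
    local_fit = KernelRidge(kernel="rbf", gamma=50.0, alpha=1e-3).fit(X[near], y[near])

    # For a localizable method the two responses at x0 should be close.
    print("global prediction at x0:", global_fit.predict(x0)[0])
    print("local prediction at x0:", local_fit.predict(x0)[0])

For a localizable method the two printed values should be close, and the paper's result is that consistency forces this agreement asymptotically (roughly, as the sample grows and the neighborhood shrinks appropriately).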
References:

C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11:11–73, 1997.

P. L. Bartlett and M. Traskin. AdaBoost is consistent. In Advances in Neural Information Processing Systems 19, pages 105–112. MIT Press, Cambridge, MA, 2007.

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

Y. Bengio, O. Delalleau, and N. Le Roux. The curse of highly variable functions for local kernel machines. In Advances in Neural Information Processing Systems 18, pages 107–114. MIT Press, Cambridge, MA, 2006.

L. Bottou and V. N. Vapnik. Local learning algorithms. Neural Computation, 4(6):888–900, 1992.

W. Cleveland and C. Loader. Smoothing by local regression: Principles and methods. Technical report, AT&T Bell Laboratories, Murray Hill, NJ, 1995. Available at http://citeseer.ist.psu.edu/194800.html.

L. Devroye. On the almost everywhere convergence of nonparametric regression function estimates. Annals of Statistics, 9:1310–1319, 1981.

L. Devroye and T. J. Wagner. Distribution-free consistency results in nonparametric discrimination and regression function estimation. Annals of Statistics, 8(2):231–239, 1980.

L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, NY, 1996.

Y. Freund and R. Schapire. A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence, 14(5):771–780, 1999.

L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin, 2002.

O. Kouropteva, O. Okun, and M. Pietikäinen. Supervised locally linear embedding algorithm for pattern recognition. Lecture Notes in Computer Science, 2652:386–394, 2003.

H. Li, L. Teng, W. Chen, and I. Shen. Supervised learning on local tangent space. In L. Györfi, editor, Lecture Notes in Computer Science, pages 546–551. Springer-Verlag, 2005.

G. Lugosi and K. Zeger. Nonparametric estimation via empirical risk minimization. IEEE Transactions on Information Theory, 41(3):677–687, 1995.

S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

G. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proceedings of the 15th International Conference on Machine Learning, pages 515–521. Morgan Kaufmann, San Francisco, CA, 1998.

A. J. Smola and B. Schoelkopf. A tutorial on support vector regression, 1998. Available at http://citeseer.ist.psu.edu/smola03tutorial.html.

I. Steinwart. Support vector machines are universally consistent. Journal of Complexity, 18:768–791, 2002.

C. J. Stone. Consistent nonparametric regression. Annals of Statistics, 5:595–645, 1977.

V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, NY, 1998.

V. N. Vapnik and L. Bottou. Local algorithms for pattern recognition and dependencies estimation. Neural Computation, 5(6):893–909, 1993.

A. Zakai and Y. Ritov. How local should a learning method be? In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 205–216, 2008.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–134, 2004.