jmlr jmlr2013 jmlr2013-117 jmlr2013-117-reference knowledge-graph by maker-knowledge-mining

117 jmlr-2013-Universal Consistency of Localized Versions of Regularized Kernel Methods


Source: pdf

Author: Robert Hable

Abstract: In supervised learning problems, global and local learning algorithms are used. In contrast to global learning algorithms, the prediction of a local learning algorithm in a testing point is only based on training data which are close to the testing point. Every global algorithm such as support vector machines (SVM) can be localized in the following way: in every testing point, the (global) learning algorithm is not applied to the whole training data but only to the k nearest neighbors (kNN) of the testing point. In case of support vector machines, the success of such mixtures of SVM and kNN (called SVM-KNN) has been shown in extensive simulation studies and also for real data sets but only little has been known on theoretical properties so far. In the present article, it is shown how a large class of regularized kernel methods (including SVM) can be localized in order to get a universally consistent learning algorithm. Keywords: machine learning, regularized kernel methods, localization, SVM, k-nearest neighbors, SVM-KNN


reference text

Heinz Bauer. Measure and Integration Theory. Walter de Gruyter & Co., Berlin, 2001. Kristin P. Bennett and Jennifer A. Blue. A support vector machine approach to decision trees. In Proceedings IEEE International Joint Conference on Neural Networks, 1998. Enrico Blanzieri and Anton Bryl. Instance-based spam filtering using SVM nearest neighbor classifier. In The 20th International FLAIRS Conference, pages 441–442, 2007a. Enrico Blanzieri and Anton Bryl. Evaluation of the highest probability SVM nearest neighbor classifier with variable relative error cost. In Fourth Conference on Email and Anti-Spam CEAS 2007, 2007b. URL http://www.ceas.cc/2007/papers/paper-42 upd.pdf. Enrico Blanzieri and Farid Melgani. Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46(6): 1804–1811, 2008. L´ on Bottou and Vladimir Vapnik. Local learning algorithms. Neural Computation, 4(6):888–900, e 1992. Fu Chang, Chien-Yang Guo, Xiao-Rong Lin, and Chi-Jen Lu. Tree decomposition for large-scale SVM problems. Journal of Machine Learning Research, 11:2935–2972, 2010. Haibin Cheng, Pang-Ning Tan, and Rong Jin. Localized support vector machine and its efficient algorithm. In Proceedings of the Seventh SIAM International Conference on Data Mining, 2007. URL http://www.siam.org/proceedings/datamining/2007/dm07 045cheng.pdf. Haibin Cheng, Pang-Ning Tan, and Rong Jin. Efficient algorithm for localized support vector machine. IEEE Transactions on Knowledge and Data Engineering, 22(4):537–549, 2010. Andreas Christmann and Robert Hable. Consistency of support vector machines using additive kernels for additive models. Computational Statistics and Data Analysis, 56:854–873, 2012. Andreas Christmann and Ingo Steinwart. Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli, 13(3):799–819, 2007. John B. Conway. A Course in Functional Analysis. Springer-Verlag, New York, 1985. 184 U NIVERSAL C ONSISTENCY OF L OCALIZED V ERSIONS OF R EGULARIZED K ERNEL M ETHODS Zdzislaw Denkowski, Stanislaw Mig´ rski, and Nikolas S. Papageorgiou. An Introduction to Nono linear Analysis: Theory. Kluwer Academic Publishers, Boston, 2003. Luc Devroye, L´ szl´ Gy¨ rfi, Adam Krzy˙ ak, and G´ bor Lugosi. On the strong universal consistency a o o z a of nearest neighbor regression function estimates. The Annals of Statistics, 22:1371–1385, 1994. Luc Devroye, L´ szl´ Gy¨ rfi, and G´ bor Lugosi. A Probabilistic Theory of Pattern Recognition. a o o a Springer, New York, 1996. Richard M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989 original. Nelson Dunford and Jacob T. Schwartz. Linear Operators. I. General Theory. Wiley-Interscience Publishers, New York, 1958. Jinqing Fan and Ir` ne Gijbels. Local Polynomial Modelling and its Applications. Chapman & Hall, e London, 1996. David H. Fremlin. Measure Theory. Vol. 4. Torres Fremlin, Colchester, 2006. L´ szl´ Gy¨ rfi, Michael Kohler, Adam Krzy˙ ak, and Harro Walk. A Distribution-free Theory of a o o z Nonparametric Regression. Springer, New York, 2002. Robert Hable and Andreas Christmann. On qualitative robustness of support vector machines. Journal of Multivariate Analysis, 102:993–1007, 2011. Steven G. Krantz and Harold R. Parks. Geometric Integration Theory. Birkh¨ user, Basel, 2008. a Kalyanapuram R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press Inc., New York, 1967. Bernhard Sch¨ lkopf and Alexander J. Smola. Learning with Kernels. MIT Press, Cambridge, 2002. o Nicola Segata and Enrico Blanzieri. Empirical assessment of classification accuracy of local SVM. In The 18th Annual Belgian-Dutch Conference on Machine Learning (Benelearn 2009), pages 47–55, Tilburg, Belgium, 2009. Nicola Segata and Enrico Blanzieri. Fast and scalable local kernel machines. Journal of Machine Learning Research, 11:1883–1926, 2010. Ingo Steinwart and Andreas Christmann. Support Vector Machines. Springer, New York, 2008. Vladimir Vapnik and L´ on Bottou. Local algorithms for pattern-recognition and dependencies estie mation. Neural Computation, 5(6):893–909, 1993. Donghui Wu, Kristin P. Bennett, Nello Cristianini, John Shawe-Taylor, and Royal Holloway. Large margin trees for induction and transduction. In Proceedings of International Conference on Machine Learning, pages 474–483, 1999. Alon Zakai. Towards a Theory of Learning in High-Dimensional Spaces. PhD thesis, The Hebrew University of Jerusalem, 2008. URL http://icnc.huji.ac.il/phd/theses/files/AlonZakai.pdf. 185 H ABLE Alon Zakai and Ya’acov Ritov. Consistency and localizability. Journal of Machine Learning Research, 10:827–856, 2009. Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), pages 2126–2136. IEEE Computer Society, 2006. 186