jmlr jmlr2009 jmlr2009-82 jmlr2009-82-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise while at the same time controlling overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine
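As a hedged illustration of the equivalence the abstract describes, the simplest non-kernelized ℓ2 case can be sketched as follows. The notation here (hyperplane (w, b), training samples (x_i, y_i), per-sample perturbations δ_i, perturbation budget c) is chosen for exposition only; the paper itself states the general uncertainty sets, norms, and regularity conditions precisely.

\[
\min_{w,b}\; \max_{\sum_{i=1}^{n} \|\delta_i\|_2 \le c}\;
\sum_{i=1}^{n} \Bigl[\,1 - y_i\bigl(\langle w,\, x_i - \delta_i\rangle + b\bigr)\Bigr]_+
\;=\;
\min_{w,b}\; c\,\|w\|_2 \;+\; \sum_{i=1}^{n} \Bigl[\,1 - y_i\bigl(\langle w,\, x_i\rangle + b\bigr)\Bigr]_+
\]

Here [u]_+ = max(u, 0) is the hinge loss. The regularizer on the right-hand side arises because \max_{\|\delta\|_2 \le c} \langle w, \delta\rangle = c\,\|w\|_2 (the ℓ2 norm is self-dual); for a general perturbation norm the regularizer becomes the corresponding dual norm of w, and exact equality, rather than a one-sided bound, holds under a mild non-separability assumption made precise in the paper.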
References:
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. The Journal of Machine Learning Research, 3:463–482, November 2002.
P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. The Annals of Statistics, 33(4):1497–1537, 2005.
A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operations Research Letters, 25(1):1–13, August 1999.
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1(1):23–34, 1992.
D. Bertsimas and A. Fertis. Personal correspondence, March 2008.
D. Bertsimas and M. Sim. The price of robustness. Operations Research, 52(1):35–53, January 2004.
C. Bhattacharyya. Robust classification of noisy data using second order cone programming approach. In Proceedings of the International Conference on Intelligent Sensing and Information Processing, pages 433–438, Chennai, India, 2004.
C. Bhattacharyya, L. R. Grate, M. I. Jordan, L. El Ghaoui, and I. S. Mian. Robust sparse hyperplane classifiers: Application to uncertain molecular profiling data. Journal of Computational Biology, 11(6):1073–1089, 2004a.
C. Bhattacharyya, K. S. Pannagadatta, and A. J. Smola. A second order cone programming formulation for classifying missing data. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS17), Cambridge, MA, 2004b. MIT Press.
J. Bi and T. Zhang. Support vector classification with input data uncertainty. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS17), Cambridge, MA, 2004. MIT Press.
C. M. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995. doi: 10.1162/neco.1995.7.1.108. URL http://www.mitpressjournals.org/doi/abs/10.1162/neco.1995.7.1.108.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 144–152, New York, NY, 1992.
O. Bousquet and A. Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
A. Christmann and I. Steinwart. On robustness properties of convex risk minimization methods for pattern recognition. The Journal of Machine Learning Research, 5:1007–1034, 2004.
A. Christmann and I. Steinwart. Consistency and robustness of kernel based regression in convex risk minimization. Bernoulli, 13(3):799–819, 2007.
A. Christmann and A. Van Messem. Bouligand derivatives and robustness of support vector machines for regression. The Journal of Machine Learning Research, 9:915–936, 2008.
C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:1–25, 1995.
R. Durrett. Probability: Theory and Examples. Duxbury Press, 2004.
L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18:1035–1064, 1997.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 171–203, Cambridge, MA, 2000. MIT Press.
A. Globerson and S. Roweis. Nightmare at test time: Robust learning by feature deletion. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 353–360, New York, NY, USA, 2006. ACM Press.
F. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346):383–393, 1974.
F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, New York, 1986.
P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
M. Kearns, Y. Mansour, A. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27:7–50, 1997.
V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1–50, 2002.
S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In UAI-2002: Uncertainty in Artificial Intelligence, pages 275–282, 2002.
G. R. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan. A robust minimax approach to classification. The Journal of Machine Learning Research, 3:555–582, 2003.
R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust Statistics: Theory and Methods. John Wiley & Sons, New York, 2006.
S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1-3):161–193, 2006.
T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. General conditions for predictivity in learning theory. Nature, 428(6981):419–422, 2004.
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, New York, 1987.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola. Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7:1283–1314, July 2006.
A. J. Smola, B. Schölkopf, and K. Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998.
I. Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.
I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
C. H. Teo, A. Globerson, S. Roweis, and A. J. Smola. Convex learning with invariances. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1489–1496, Cambridge, MA, 2008. MIT Press.
T. Trafalis and R. Gilbert. Robust support vector machines for classification and computational issues. Optimization Methods and Software, 22(1):187–198, February 2007.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York, 2000.
V. N. Vapnik and A. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974.
V. N. Vapnik and A. Chervonenkis. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3):260–284, 1991.
V. N. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:744–780, 1963.
H. Xu, C. Caramanis, and S. Mannor. Robust regression and Lasso. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1801–1808, 2009.