jmlr jmlr2009 jmlr2009-82 jmlr2009-82-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise while at the same time controlling overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine
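As a hedged illustration of the equivalence the abstract describes, the simplest non-kernelized ℓ2 case can be sketched as follows. The notation here (hyperplane (w, b), training samples (x_i, y_i), per-sample perturbations δ_i, perturbation budget c) is chosen for exposition only; the paper itself states the general uncertainty sets, norms, and regularity conditions precisely.

\[
\min_{w,b}\; \max_{\sum_{i=1}^{n} \|\delta_i\|_2 \le c}\;
\sum_{i=1}^{n} \Bigl[\,1 - y_i\bigl(\langle w,\, x_i - \delta_i\rangle + b\bigr)\Bigr]_+
\;=\;
\min_{w,b}\; c\,\|w\|_2 \;+\; \sum_{i=1}^{n} \Bigl[\,1 - y_i\bigl(\langle w,\, x_i\rangle + b\bigr)\Bigr]_+
\]

Here [u]_+ = max(u, 0) is the hinge loss. The regularizer on the right-hand side arises because \max_{\|\delta\|_2 \le c} \langle w, \delta\rangle = c\,\|w\|_2 (the ℓ2 norm is self-dual); for a general perturbation norm the regularizer becomes the corresponding dual norm of w, and exact equality, rather than a one-sided bound, holds under a mild non-separability assumption made precise in the paper.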
References:
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. The Journal of Machine Learning Research, 3:463–482, November 2002.
P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. The Annals of Statistics, 33(4):1497–1537, 2005.
A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operations Research Letters, 25(1):1–13, August 1999.
K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1(1):23–34, 1992.
D. Bertsimas and A. Fertis. Personal correspondence, March 2008.
D. Bertsimas and M. Sim. The price of robustness. Operations Research, 52(1):35–53, January 2004.
C. Bhattacharyya. Robust classification of noisy data using second order cone programming approach. In Proceedings of the International Conference on Intelligent Sensing and Information Processing, pages 433–438, Chennai, India, 2004.
C. Bhattacharyya, L. R. Grate, M. I. Jordan, L. El Ghaoui, and I. S. Mian. Robust sparse hyperplane classifiers: Application to uncertain molecular profiling data. Journal of Computational Biology, 11(6):1073–1089, 2004a.
C. Bhattacharyya, K. S. Pannagadatta, and A. J. Smola. A second order cone programming formulation for classifying missing data. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS17), Cambridge, MA, 2004b. MIT Press.
J. Bi and T. Zhang. Support vector classification with input data uncertainty. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS17), Cambridge, MA, 2004. MIT Press.
C. M. Bishop. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995. doi: 10.1162/neco.1995.7.1.108. URL http://www.mitpressjournals.org/doi/abs/10.1162/neco.1995.7.1.108.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 144–152, New York, NY, 1992.
O. Bousquet and A. Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
A. Christmann and I. Steinwart. On robustness properties of convex risk minimization methods for pattern recognition. The Journal of Machine Learning Research, 5:1007–1034, 2004.
A. Christmann and I. Steinwart. Consistency and robustness of kernel based regression in convex risk minimization. Bernoulli, 13(3):799–819, 2007.
A. Christmann and A. Van Messem. Bouligand derivatives and robustness of support vector machines for regression. The Journal of Machine Learning Research, 9:915–936, 2008.
C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:1–25, 1995.
R. Durrett. Probability: Theory and Examples. Duxbury Press, 2004.
L. El Ghaoui and H. Lebret. Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18:1035–1064, 1997.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 171–203, Cambridge, MA, 2000. MIT Press.
A. Globerson and S. Roweis. Nightmare at test time: Robust learning by feature deletion. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 353–360, New York, NY, USA, 2006. ACM Press.
F. Hampel. The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346):383–393, 1974.
F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, New York, 1986.
P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
M. Kearns, Y. Mansour, A. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. Machine Learning, 27:7–50, 1997.
V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1–50, 2002.
S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In UAI-2002: Uncertainty in Artificial Intelligence, pages 275–282, 2002.
G. R. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan. A robust minimax approach to classification. The Journal of Machine Learning Research, 3:555–582, 2003.
R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust Statistics: Theory and Methods. John Wiley & Sons, New York, 2006.
S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1-3):161–193, 2006.
T. Poggio, R. Rifkin, S. Mukherjee, and P. Niyogi. General conditions for predictivity in learning theory. Nature, 428(6981):419–422, 2004.
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, New York, 1987.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola. Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7:1283–1314, July 2006.
A. J. Smola, B. Schölkopf, and K. Müller. The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649, 1998.
I. Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.
I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
C. H. Teo, A. Globerson, S. Roweis, and A. J. Smola. Convex learning with invariances. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1489–1496, Cambridge, MA, 2008. MIT Press.
T. Trafalis and R. Gilbert. Robust support vector machines for classification and computational issues. Optimization Methods and Software, 22(1):187–198, February 2007.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer-Verlag, New York, 2000.
V. N. Vapnik and A. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974.
V. N. Vapnik and A. Chervonenkis. The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3):260–284, 1991.
V. N. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:744–780, 1963.
H. Xu, C. Caramanis, and S. Mannor. Robust regression and Lasso. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1801–1808, 2009.