174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs


Source: pdf

Author: Aharon Birnbaum, Shai Shalev-Shwartz

Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α)·L∗_γ + ϵ, where L∗_γ is the optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity are achievable using the hinge loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in time exp(Õ(1/γ)). An immediate question, which this paper tackles, is what is achievable for α ∈ (0, 1/γ). We derive positive results interpolating between the polynomial time for α = 1/γ and the exponential time for α = 0. In particular, we show that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time. Our results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model.
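To make the α = 1/γ regime concrete, the sketch below shows the standard hinge-loss surrogate approach the abstract alludes to: minimize the empirical hinge loss by projected subgradient descent over the ball of norm 1/γ, which (for unit-norm data) yields a halfspace with zero-one error roughly (1 + 1/γ)·L∗_γ + ϵ. This is a minimal illustration, not the paper's algorithm; the function name, step-size schedule, and iteration count are assumptions made for the example.

```python
import numpy as np

def learn_halfspace_hinge(X, y, gamma, n_iters=1000):
    """Projected subgradient descent on the empirical hinge loss
    over {w : ||w|| <= 1/gamma}. Assumes rows of X have norm <= 1
    and labels y are in {-1, +1}."""
    n, d = X.shape
    radius = 1.0 / gamma
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        margins = y * (X @ w)
        # Subgradient of (1/n) * sum_i max(0, 1 - y_i <w, x_i>):
        # only examples with margin < 1 contribute -y_i x_i.
        active = margins < 1.0
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        # Standard diminishing step size scaled to the ball radius.
        w -= (radius / np.sqrt(t)) * grad
        # Project back onto the ball of radius 1/gamma.
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return w
```

Usage: given unit-norm data X and labels y, compute w = learn_halfspace_hinge(X, y, gamma=0.1) and predict with np.sign(X @ w).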


References

S. Agmon. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3):382–392, 1954.
P. Auer and N. Cesa-Bianchi. On-line learning with malicious noise and the closure algorithm. Annals of Mathematics and Artificial Intelligence, 23(1):83–99, 1998.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138–156, 2006.
S. Ben-David and H. Simon. Efficient learning of linear perceptrons. In NIPS, 2000.
S. Ben-David, D. Pál, and S. Shalev-Shwartz. Agnostic online learning. In COLT, 2009.
S. Ben-David, D. Loker, N. Srebro, and K. Sridharan. Minimizing the misclassification error rate using a surrogate convex loss. In ICML, 2012.
E. Blais, R. O'Donnell, and K. Wimmer. Polynomial regression under arbitrary product distributions. In COLT, 2008.
O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. PhD thesis, Ecole Polytechnique, 2002.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
N. Cristianini and J. Shawe-Taylor. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.
C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
A. Kalai, A. R. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2005.
A. T. Kalai and R. Sastry. The Isotron algorithm: High-dimensional isotonic regression. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT), 2009.
A. R. Klivans, P. M. Long, and R. A. Servedio. Learning halfspaces with malicious noise. Journal of Machine Learning Research, 10:2715–2740, 2009.
P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. In NIPS, 2011.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing, MIT Press, 1988.)
B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.
R. A. Servedio. Smooth boosting and learning with malicious noise. Journal of Machine Learning Research, 4:633–648, 2003.
S. Shalev-Shwartz, O. Shamir, and K. Sridharan. Learning kernel-based halfspaces with the 0-1 loss. SIAM Journal on Computing, 40:1623–1646, 2011.
L. G. Valiant. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, pages 560–566, August 1985.
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32:56–85, 2004.