jmlr jmlr2009 jmlr2009-46 jmlr2009-46-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adam R. Klivans, Philip M. Long, Rocco A. Servedio
Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε:

• Learning origin-centered halfspaces in R^n with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε^2 / log(n/ε)). (The best previous result was Ω(ε / (n log(n/ε))^{1/4}).)

• Learning origin-centered halfspaces with respect to any isotropic log-concave distribution on R^n with malicious noise rate η = Ω(ε^3 / log^2(n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise.

We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on R^n in the presence of adversarial label noise at rate η = Ω(ε^3 / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not the example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε.

Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with "smooth" boosting.

Keywords: PAC learning, noise tolerance, malicious noise, agnostic learning, label noise, halfspace learning, linear classifiers

© 2009 Adam R. Klivans, Philip M. Long and Rocco A. Servedio.
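To make the abstract's mention of iterative PCA-based outlier removal concrete, here is a minimal sketch of that general idea: repeatedly find the direction of largest empirical variance and discard points whose squared projection along it is far larger than an isotropic distribution would allow. The function name, the threshold_factor and max_rounds parameters, and the stopping rule are illustrative assumptions, not the procedure or constants from the paper.

```python
import numpy as np

def pca_outlier_removal(X, threshold_factor=10.0, max_rounds=20):
    """Sketch of iterative outlier removal via the top principal direction.

    X is an (m, n) array of examples assumed to come (mostly) from an
    isotropic distribution; returns a boolean mask of retained points.
    The threshold and round limit are placeholders for illustration.
    """
    X = np.asarray(X, dtype=float)
    keep = np.ones(len(X), dtype=bool)
    for _ in range(max_rounds):
        Xk = X[keep]
        # Top principal direction of the retained sample (uncentered,
        # since the target halfspaces are origin-centered).
        _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
        v = Vt[0]
        proj_sq = (Xk @ v) ** 2
        # Under an isotropic distribution the mean squared projection
        # onto a unit vector is about 1; flag points far above that.
        cutoff = threshold_factor * max(1.0, float(np.mean(proj_sq)))
        bad = proj_sq > cutoff
        if not bad.any():
            break  # no direction with an abnormally heavy tail remains
        idx = np.flatnonzero(keep)
        keep[idx[bad]] = False
    return keep
```

In the paper this kind of filtering is interleaved with "smooth" boosting, which reweights examples only mildly and so limits how much influence the surviving corrupted points can exert; the sketch above shows only the filtering side.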
S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, pages 724–733, 1993.
A. Blum, A. Frieze, R. Kannan, and S. Vempala. A polynomial time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1/2):35–52, 1997.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
S. C. Brubaker. Extensions of Principal Components Analysis. PhD thesis, Georgia Institute of Technology, 2009.
N. H. Bshouty, Y. Li, and P. M. Long. Using the doubling dimension to analyze the generalization of learning algorithms. Journal of Computer & System Sciences, 75(6):323–335, 2009.
J. Dunagan and S. Vempala. Optimal outlier removal in high-dimensional spaces. Journal of Computer & System Sciences, 68(2):335–373, 2004.
V. Feldman, P. Gopalan, S. Khot, and A. Ponnuswami. New results for learning noisy parities and halfspaces. In Proceedings of the 47th IEEE Symposium on Foundations of Computer Science (FOCS), pages 563–576, 2006.
Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
D. Gavinsky. Optimally-smooth adaptive boosting and application to agnostic learning. Journal of Machine Learning Research, 4:101–117, 2003.
V. Guruswami and P. Raghavendra. Hardness of learning halfspaces with noise. In Proceedings of the 47th IEEE Symposium on Foundations of Computer Science (FOCS), pages 543–552, 2006.
D. Haussler, M. Kearns, N. Littlestone, and M. Warmuth. Equivalence of models for polynomial learnability. Information and Computation, 95(2):129–161, 1991.
I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics, 2002.
A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777–1805, 2008.
M. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993.
M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2/3):115–141, 1994.
A. Klivans and A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. In Proceedings of the 47th IEEE Symposium on Foundations of Computer Science (FOCS), pages 553–562, 2006.
N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285–318, 1987.
N. Littlestone. Redundant noisy attributes, attribute errors, and linear-threshold learning using Winnow. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, pages 147–156, 1991.
L. Lovász and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3):307–358, 2007.
W. Maass and G. Turán. How fast can a threshold gate learn? In Computational Learning Theory and Natural Learning Systems: Volume I: Constraints and Prospects, pages 381–414. MIT Press, 1994.
Y. Mansour and M. Parnas. Learning conjunctions with noise under product distributions. Information Processing Letters, 68(4):189–196, 1998.
A. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on Mathematical Theory of Automata, volume XII, pages 615–622, 1962.
D. Pollard. Convergence of Stochastic Processes. Springer Verlag, 1984.
F. Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
R. Servedio. Efficient Algorithms in Computational Learning Theory. PhD thesis, Harvard University, 2001.
R. Servedio. PAC analogues of Perceptron and Winnow via boosting the margin. Machine Learning, 47(2/3):133–151, 2002.
R. Servedio. Smooth boosting and learning with malicious noise. Journal of Machine Learning Research, 4:633–648, 2003.
J. Shawe-Taylor and N. Cristianini. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
M. Talagrand. Sharper bounds for Gaussian and empirical processes. Annals of Probability, 22:28–76, 1994.
L. Valiant. Learning disjunctions of conjunctions. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pages 560–566, 1985.
H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. Journal of Machine Learning Research, 2009. To appear.