Authors: Phil Long, Rocco Servedio
Abstract: We describe a simple algorithm that runs in time poly(n, 1/γ, 1/ε) and learns an unknown n-dimensional γ-margin halfspace to accuracy 1 − ε in the presence of malicious noise, when the noise rate is allowed to be as high as Θ(εγ√(log(1/γ))). Previous efficient algorithms could only learn to accuracy 1 − ε in the presence of malicious noise of rate at most Θ(εγ). Our algorithm does not work by optimizing a convex loss function. We show that no algorithm for learning γ-margin halfspaces that minimizes a convex proxy for misclassification error can tolerate malicious noise at a rate greater than Θ(εγ); this may partially explain why previous algorithms could not achieve the higher noise tolerance of our new algorithm.
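As a concrete illustration of the learning setting (not of the paper's algorithm), the sketch below simulates a malicious-noise example oracle for a γ-margin halfspace and runs the classic online Perceptron [5, 27] on the corrupted stream. All names here (make_target, malicious_oracle, perceptron) and the crude label-flipping adversary are illustrative assumptions: a true malicious adversary may return arbitrary (x, y) pairs, and the paper's algorithm is more sophisticated than this baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_target(n):
    """Random unit vector w*; the target halfspace is x -> sign(w* . x)."""
    w = rng.standard_normal(n)
    return w / np.linalg.norm(w)

def malicious_oracle(w_star, n, gamma, eta):
    """Toy stand-in for the malicious example oracle: with probability
    1 - eta, return a clean unit-norm example with margin |w*.x| >= gamma
    and its true label; with probability eta, the 'adversary' corrupts the
    example (here it merely flips the label, a weak special case)."""
    while True:
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)           # examples on the unit sphere
        if abs(w_star @ x) >= gamma:     # enforce the gamma-margin condition
            break
    y = np.sign(w_star @ x)
    if rng.random() < eta:               # malicious corruption
        y = -y
    return x, y

def perceptron(n, gamma, eta, T=20000):
    """Classic additive Perceptron updates [5, 26] on the noisy stream."""
    w_star = make_target(n)
    w = np.zeros(n)
    for _ in range(T):
        x, y = malicious_oracle(w_star, n, gamma, eta)
        if y * (w @ x) <= 0:             # mistake: update w <- w + y x
            w += y * x
    return w, w_star

if __name__ == "__main__":
    n, gamma = 50, 0.05
    for eta in [0.0, 0.005]:             # 0.005 is on the eps*gamma scale
        w, w_star = perceptron(n, gamma, eta)
        # estimate error on fresh clean gamma-margin examples
        errs = sum(np.sign(w @ x) != y
                   for x, y in (malicious_oracle(w_star, n, gamma, 0.0)
                                for _ in range(2000)))
        print(f"eta={eta:.4f}  test error ~ {errs/2000:.3f}")
```

Raising eta toward and beyond the Θ(εγ) scale in this sketch gives a feel for how noise erodes a margin-based baseline; the paper's contribution is an efficient algorithm that provably tolerates the larger rate Θ(εγ√(log(1/γ))).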
[1] J. Aslam and S. Decatur. Specification and simulation of statistical query algorithms for efficiency and noise tolerance. Journal of Computer and System Sciences, 56:191–208, 1998.
[2] P. Auer. Learning nested differences in the presence of malicious noise. Theor. Comp. Sci., 185(1):159–175, 1997.
[3] P. Auer and N. Cesa-Bianchi. On-line learning with malicious noise and the closure algorithm. Annals of Mathematics and Artificial Intelligence, 23:83–99, 1998.
[4] E. B. Baum and D. Haussler. What size net gives valid generalization? Neural Comput., 1:151–160, 1989.
[5] H. Block. The Perceptron: a model for brain functioning. Reviews of Modern Physics, 34:123–135, 1962.
[6] A. Blum. Random projection, margins, kernels, and feature-selection. In LNCS Volume 3940, pages 52–68, 2006.
[7] A. Blum and M.-F. Balcan. A discriminative model for semi-supervised learning. Journal of the ACM, 57(3), 2010.
[8] S. Decatur. Statistical queries and faulty PAC oracles. In Proc. 6th COLT, pages 262–268, 1993.
[9] C. Domingo and O. Watanabe. MadaBoost: a modification of AdaBoost. In Proc. 13th COLT, pages 180–189, 2000.
[10] D. Dubhashi and A. Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge, 2009.
[11] A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247–251, 1989.
[12] V. Feldman, P. Gopalan, S. Khot, and A. Ponnuswami. On agnostic learning of parities, monomials, and halfspaces. SIAM J. Comput., 39(2):606–645, 2009.
[13] W. Feller. Generalization of a probability limit theorem of Cramér. Trans. Am. Math. Soc., 54:361–372, 1943.
[14] Y. Freund and R. Schapire. Large margin classification using the Perceptron algorithm. In Proc. 11th COLT, pages 209–217, 1998.
[15] Y. Freund and R. Schapire. A short introduction to boosting. J. Japan. Soc. Artif. Intel., 14(5):771–780, 1999.
[16] D. Gavinsky. Optimally-smooth adaptive boosting and application to agnostic learning. JMLR, 4:101–117, 2003.
[17] C. Gentile and N. Littlestone. The robustness of the p-norm algorithms. In Proc. 12th COLT, pages 1–11, 1999.
[18] V. Guruswami and P. Raghavendra. Hardness of learning halfspaces with noise. SIAM J. Comput., 39(2):742–765, 2009.
[19] D. Haussler, M. Kearns, N. Littlestone, and M. Warmuth. Equivalence of models for polynomial learnability. Information and Computation, 95(2):129–161, 1991.
[20] M. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993.
[21] R. Khardon and G. Wachman. Noise tolerant variants of the perceptron algorithm. JMLR, 8:227–248, 2007.
[22] A. Klivans, P. Long, and R. Servedio. Learning halfspaces with malicious noise. JMLR, 10:2715–2740, 2009.
[23] P. Long and R. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78(3):287–304, 2010.
[24] Y. Mansour and M. Parnas. Learning conjunctions with noise under product distributions. Information Processing Letters, 68(4):189–196, 1998.
[25] R. Meir and G. Rätsch. An introduction to boosting and leveraging. In LNAI Advanced Lectures on Machine Learning, pages 118–183, 2003.
[26] A. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on Mathematical Theory of Automata, volume XII, pages 615–622, 1962.
[27] F. Rosenblatt. The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
[28] R. Servedio. Smooth boosting and learning with malicious noise. JMLR, 4:633–648, 2003.
[29] J. Shawe-Taylor, P. Bartlett, R. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.
[30] L. Valiant. Learning disjunctions of conjunctions. In Proc. 9th IJCAI, pages 560–566, 1985.
[31] V. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.