
171 nips-2013-Learning with Noisy Labels


Source: pdf

Author: Nagarajan Natarajan, Inderjit Dhillon, Pradeep Ravikumar, Ambuj Tewari

Abstract: In this paper, we theoretically study the problem of binary classification in the presence of random classification noise — the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional — the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of i.i.d. data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a remarkable consequence — methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with recently proposed methods for dealing with label noise on several benchmark datasets.
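
To make the first approach concrete, below is a minimal NumPy sketch of a noise-corrected surrogate loss of the kind the abstract describes: given assumed class-conditional flip rates rho_pos and rho_neg (with rho_pos + rho_neg < 1), the loss on the observed label is reweighted and the loss on the flipped label is subtracted, so that the expectation over the label noise matches the clean-label loss. The logistic surrogate, the function names, and the example noise rates are illustrative assumptions, not code from the paper.

```python
import numpy as np

def logistic_loss(margin):
    # Standard logistic surrogate ell(t, y) = log(1 + exp(-y * t)),
    # written in terms of the margin m = y * t; logaddexp keeps it stable.
    return np.logaddexp(0.0, -margin)

def unbiased_logistic_loss(score, noisy_label, rho_pos, rho_neg):
    """Noise-corrected logistic loss (sketch of an unbiased estimator).

    Assumes class-conditional noise: a true +1 label flips with probability
    rho_pos and a true -1 label with probability rho_neg, rho_pos + rho_neg < 1.
    The correction
        l~(t, y) = ((1 - rho_{-y}) * l(t, y) - rho_{y} * l(t, -y))
                   / (1 - rho_pos - rho_neg)
    makes the expected loss under the noisy label equal the clean-label loss.
    """
    rho_y = np.where(noisy_label == 1, rho_pos, rho_neg)      # flip rate of observed class
    rho_not_y = np.where(noisy_label == 1, rho_neg, rho_pos)  # flip rate of opposite class
    loss_obs = logistic_loss(noisy_label * score)    # l(t, y)
    loss_flip = logistic_loss(-noisy_label * score)  # l(t, -y)
    return ((1 - rho_not_y) * loss_obs - rho_y * loss_flip) / (1.0 - rho_pos - rho_neg)

# Tiny usage example: average corrected loss of a linear scorer on noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_clean = np.sign(X[:, 0] + 0.5 * X[:, 1])
flip = rng.random(200) < np.where(y_clean == 1, 0.3, 0.1)  # assumed noise rates
y_noisy = np.where(flip, -y_clean, y_clean)
scores = X @ np.array([1.0, 0.5])
print(unbiased_logistic_loss(scores, y_noisy, rho_pos=0.3, rho_neg=0.1).mean())
```

Minimizing the average of this corrected loss over the noisy sample is then ordinary empirical risk minimization with a modified loss; the second approach in the abstract instead keeps the surrogate intact and reweights positive and negative examples, which is how biased SVM and weighted logistic regression fit into the analysis.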


Reference text

D. Angluin and P. Laird. Learning from noisy examples. Mach. Learn., 2(4):343–370, 1988.
Javed A. Aslam and Scott E. Decatur. On the sample complexity of noise-tolerant learning. Inf. Process. Lett., 57(4):189–195, 1996.
Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
Shai Ben-David, Dávid Pál, and Shai Shalev-Shwartz. Agnostic online learning. In Proceedings of the 22nd Conference on Learning Theory, 2009.
Battista Biggio, Blaine Nelson, and Pavel Laskov. Support vector machines under adversarial label noise. Journal of Machine Learning Research - Proceedings Track, 20:97–112, 2011.
Tom Bylander. Learning linear threshold functions in the presence of classification noise. In Proc. of the 7th COLT, pages 340–347, NY, USA, 1994. ACM.
Nicolò Cesa-Bianchi, Eli Dichterman, Paul Fischer, Eli Shamir, and Hans Ulrich Simon. Sample-efficient strategies for learning in the presence of noise. J. ACM, 46(5):684–719, 1999.
Nicolò Cesa-Bianchi, Shai Shalev-Shwartz, and Ohad Shamir. Online learning of noisy data. IEEE Transactions on Information Theory, 57(12):7907–7931, 2011.
K. Crammer and D. Lee. Learning via Gaussian herding. In Advances in NIPS 23, pages 451–459, 2010.
Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. J. Mach. Learn. Res., 7:551–585, 2006.
Koby Crammer, Alex Kulesza, and Mark Dredze. Adaptive regularization of weight vectors. In Advances in NIPS 22, pages 414–422, 2009.
Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-weighted linear classification. In Proceedings of the Twenty-Fifth ICML, pages 264–271, 2008.
C. Elkan and K. Noto. Learning classifiers from only positive and unlabeled data. In Proc. of the 14th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 213–220, 2008.
Yoav Freund. A more robust boosting algorithm, 2009. Preprint arXiv:0905.2138 [stat.ML], available at http://arxiv.org/abs/0905.2138.
T. Graepel and R. Herbrich. The kernel Gibbs sampler. In Advances in NIPS 13, pages 514–520, 2000.
Roni Khardon and Gabriel Wachman. Noise tolerant variants of the perceptron algorithm. J. Mach. Learn. Res., 8:227–248, 2007.
Neil D. Lawrence and Bernhard Schölkopf. Estimating a kernel Fisher discriminant in the presence of label noise. In Proceedings of the Eighteenth ICML, pages 306–313, 2001.
Bing Liu, Yang Dai, Xiaoli Li, Wee Sun Lee, and Philip S. Yu. Building text classifiers using positive and unlabeled examples. In ICDM 2003, pages 179–186. IEEE, 2003.
Philip M. Long and Rocco A. Servedio. Random classification noise defeats all convex potential boosters. Mach. Learn., 78(3):287–304, 2010.
Yves Lucet. What shape is your conjugate? A survey of computational convex analysis and its applications. SIAM Rev., 52(3):505–542, August 2010. ISSN 0036-1445.
Naresh Manwani and P. S. Sastry. Noise tolerance under risk minimization. To appear in IEEE Trans. Syst. Man and Cybern. Part B, 2013. URL: http://arxiv.org/abs/1109.5231.
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. on Opt., 19(4):1574–1609, 2009.
David F. Nettleton, A. Orriols-Puig, and A. Fornells. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev., 33(4):275–306, 2010.
Clayton Scott. Calibrated asymmetric surrogate losses. Electronic J. of Stat., 6:958–992, 2012.
Clayton Scott, Gilles Blanchard, and Gregory Handy. Classification with asymmetric label noise: Consistency and maximal denoising. To appear in COLT, 2013.
G. Stempfel and L. Ralaivola. Learning kernel perceptrons on noisy data using random projections. In Algorithmic Learning Theory, pages 328–342. Springer, 2007.
G. Stempfel, L. Ralaivola, and F. Denis. Learning from noisy data using hyperplane sampling and sample averages. 2007.
Guillaume Stempfel and Liva Ralaivola. Learning SVMs from sloppily labeled data. In Proc. of the 19th Intl. Conf. on Artificial Neural Networks: Part I, pages 884–893. Springer-Verlag, 2009.
Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth ICML, pages 928–936, 2003.