jmlr jmlr2006 jmlr2006-81 jmlr2006-81-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paul W. Goldberg
Abstract: A classical approach in multi-class pattern classification is the following. Estimate the probability distributions that generated the observations for each label class, and then label new instances by applying the Bayes classifier to the estimated distributions. That approach provides more useful information than just a class label; it also provides estimates of the conditional distribution of class labels, in situations where there is class overlap. We would like to know whether it is harder to build accurate classifiers via this approach than by techniques that may process all data with distinct labels together. In this paper we make that question precise by considering it in the context of PAC learnability. We propose two restrictions on the PAC learning framework that are intended to correspond with the above approach, and consider their relationship with standard PAC learning. Our main restriction of interest leads to some interesting algorithms that show that the restriction is not stronger (more restrictive) than various other well-known restrictions on PAC learning. An alternative, slightly milder restriction turns out to be almost equivalent to unrestricted PAC learning. Keywords: computational learning theory, computational complexity, pattern classification
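The following is a minimal sketch of the classical approach described in the abstract: fit a separate class-conditional distribution (here, for illustration only, a diagonal Gaussian, which is an assumption and not part of the paper's setting) and a class prior from each label class's data alone, then apply the Bayes classifier to the estimated distributions. All function and variable names are hypothetical.

```python
import numpy as np

def fit_bayes_classifier(X, y):
    """For each label class, estimate a prior and a diagonal-Gaussian
    class-conditional density using only that class's examples."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-9,  # small floor avoids division by zero
        }
    return params

def predict_proba(params, x):
    """Apply the Bayes classifier to the estimated distributions and return
    the estimated conditional distribution over labels for instance x."""
    log_joint = {}
    for c, p in params.items():
        # log prior + log likelihood under the per-class Gaussian estimate
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"])
                                + (x - p["mean"]) ** 2 / p["var"])
        log_joint[c] = np.log(p["prior"]) + log_lik
    m = max(log_joint.values())
    unnorm = {c: np.exp(v - m) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def predict(params, x):
    """Hard label: the class maximising the estimated posterior."""
    proba = predict_proba(params, x)
    return max(proba, key=proba.get)

# Usage: two overlapping 2-D classes; where the classes overlap, the
# estimated label probabilities convey more than the hard label alone.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = fit_bayes_classifier(X, y)
query = np.array([0.75, 0.75])
print(predict_proba(model, query), predict(model, query))
```

The point of the sketch is that each class's distribution is estimated in isolation, which is the restriction the paper formalises, in contrast to discriminative techniques that process data from distinct labels together.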
E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
D. Angluin and P. Laird. Learning from noisy examples. Machine Learning, 2(4):343–370, 1988.
M. Anthony and N. Biggs. Computational Learning Theory. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge, 1992.
S. Ben-David and E. Dichterman. Learnability with restricted focus of attention guarantees noise-tolerance. In 5th International Workshop on Algorithmic Learning Theory, pages 248–259, 1994.
S. Ben-David and E. Dichterman. Learning with restricted focus of attention. Journal of Computer and System Sciences, 56(3):277–298, 1998.
C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
A. Blum. Separating distribution-free and mistake-bound learning models over the Boolean domain. SIAM Journal on Computing, 23(5):990–1000, 1994.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam's razor. Information Processing Letters, 24:377–380, 1987.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
M. Cryan, L. A. Goldberg, and P. W. Goldberg. Evolutionary trees can be learned in polynomial time in the two-state general Markov model. SIAM Journal on Computing, 31(2):375–397, 2001.
S. Dasgupta. Learning mixtures of Gaussians. In 40th IEEE Symposium on Foundations of Computer Science, pages 634–644, 1999.
F. Denis. PAC learning from positive statistical queries. In Algorithmic Learning Theory (ALT), 9th International Conference, volume 1501 of LNAI, pages 112–126. Springer, 1998.
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
Y. Freund and Y. Mansour. Estimating a mixture of two product distributions. In Proceedings of the 12th Workshop on Computational Learning Theory (COLT), pages 53–62. Morgan Kaufmann, 1999.
A. Frieze, M. R. Jerrum, and R. Kannan. Learning linear transformations. In Proceedings of the 37th IEEE Symposium on Foundations of Computer Science, pages 359–368, 1996.
V. Guruswami and A. Sahai. Multiclass learning, boosting, and error-correcting codes. In Proceedings of the 12th Workshop on Computational Learning Theory (COLT), pages 145–155, 1999.
D. Haussler, M. J. Kearns, N. Littlestone, and M. K. Warmuth. Equivalence of models for polynomial learnability. Information and Computation, 95(2):129–161, 1991.
D. Helmbold, R. Sloan, and M. K. Warmuth. Learning integer lattices. SIAM Journal on Computing, 21(2):240–266, 1992.
M. J. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983–1006, 1998.
M. J. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, R. E. Schapire, and L. Sellie. On the learnability of discrete distributions. In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing, pages 273–282, 1994.
M. J. Kearns and R. E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464–497, 1994.
F. Letouzey, F. Denis, and R. Gilleron. Learning from positive and unlabeled examples. In Proceedings of the 11th International Conference on Algorithmic Learning Theory, pages 71–85, 2000.
N. Palmer and P. W. Goldberg. PAC classification based on PAC estimates of label class distributions. Technical Report 411, University of Warwick, Dept. of Computer Science, 2005.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Proceedings of the 12th NIPS Conference, 2000.
R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, second edition, 2000.