
88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes


Source: pdf

Author: Dima Kuzmin, Manfred K. Warmuth

Abstract: Maximum concept classes of VC dimension d over n domain points have size $\binom{n}{\leq d}$, and this is an upper bound on the size of any concept class of VC dimension d over n points. We give a compression scheme for any maximum class that represents each concept by a subset of up to d unlabeled domain points and has the property that, for any sample of a concept in the class, the representative of exactly one of the concepts consistent with the sample is a subset of the domain of the sample. This allows us to compress any sample of a concept in the class to a subset of up to d unlabeled sample points such that this subset represents a concept consistent with the entire original sample. Unlike the previously known compression scheme for maximum classes (Floyd and Warmuth, 1995), which compresses to labeled subsets of the sample of size equal to d, our new scheme is tight in the sense that the number of possible unlabeled compression sets of size at most d equals the number of concepts in the class.

Keywords: compression schemes, VC dimension, maximum classes, one-inclusion graph
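The tightness claim rests on a simple counting identity: a maximum class of VC dimension d on n points has exactly $\binom{n}{\leq d} = \sum_{i=0}^{d} \binom{n}{i}$ concepts, which is also the number of subsets of the domain with at most d elements, i.e. the number of possible unlabeled representatives. The following minimal Python sketch (function names are illustrative, not from the paper) just evaluates both counts to show they coincide:

```python
from math import comb

def max_class_size(n: int, d: int) -> int:
    """Size of a maximum class of VC dimension d on n domain points:
    binom(n, <= d) = sum_{i=0}^{d} C(n, i)."""
    return sum(comb(n, i) for i in range(d + 1))

def num_unlabeled_compression_sets(n: int, d: int) -> int:
    """Number of subsets of the n-point domain with at most d elements,
    i.e. the number of possible unlabeled representatives."""
    return sum(comb(n, i) for i in range(d + 1))

if __name__ == "__main__":
    n, d = 10, 3
    print(max_class_size(n, d))                  # 176
    print(num_unlabeled_compression_sets(n, d))  # 176 -- one representative per concept
```

Since both quantities equal $\binom{n}{\leq d}$, a scheme that assigns each concept a distinct unlabeled set of size at most d uses every such set exactly once, which is the sense in which the scheme is tight.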


reference text

S. Ben-David and A. Litman. Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes. Discrete Applied Mathematics, 86:3–25, 1998.

F. Eaton. Private communication, 2005.

H. Edelsbrunner. Algorithms in Combinatorial Geometry, volume 10 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, New York, 1987. ISBN 038713722X.

S. Floyd. Space-Bounded Learning and the Vapnik-Chervonenkis Dimension. PhD thesis, U.C. Berkeley, December 1989. ICSI Tech Report TR-89-061.

S. Floyd and M. K. Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995.

L. Gurvits. Linear algebraic proofs of VC-dimension based inequalities. In Shai Ben-David, editor, EuroCOLT '97, Jerusalem, Israel, March 1997, pages 238–250. Springer-Verlag, March 1997.

D. Haussler, N. Littlestone, and M. K. Warmuth. Predicting {0, 1} functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994.

D. Helmbold, R. Sloan, and M. K. Warmuth. Learning integer lattices. SIAM Journal on Computing, 21(2):240–266, 1992.

J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.

Y. Li, P. M. Long, and A. Srinivasan. The one-inclusion graph algorithm is near optimal for the prediction model of learning. IEEE Transactions on Information Theory, 47(3):1257–1261, 2002.

N. Littlestone and M. K. Warmuth. Relating data compression and learnability. Unpublished manuscript, obtainable at http://www.cse.ucsc.edu/~manfred/pubs/T1.pdf, June 10, 1986.

M. Marchand and J. Shawe-Taylor. The Decision List Machine. In Advances in Neural Information Processing Systems 15, pages 921–928. MIT Press, Cambridge, MA, USA, 2003.

M. Marchand and J. Shawe-Taylor. The Set Covering Machine. Journal of Machine Learning Research, 3:723–746, 2002.

T. Neylon. Sparse Solutions to Linear Prediction Problems. PhD thesis, New York University, Courant Institute of Mathematical Sciences, May 2006a.

T. Neylon. Private communication, 2006b.

B. I. P. Rubinstein, P. Bartlett, and J. H. Rubinstein. Shifting: One-inclusion mistake bounds and sample compression. Technical Report UCB/EECS-2007-86, EECS Department, University of California, Berkeley, June 2007a. URL http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-86.html.

B. I. P. Rubinstein, P. L. Bartlett, and J. H. Rubinstein. Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1193–1200. MIT Press, Cambridge, MA, 2007b.

N. Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13:145–147, 1972.

V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.

V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.

U. von Luxburg, O. Bousquet, and B. Schölkopf. A compression approach to support vector model selection. Journal of Machine Learning Research, 5:293–323, April 2004.

M. K. Warmuth. Compressing to VC dimension many points. In Proceedings of the 16th Annual Conference on Learning Theory (COLT 03), Washington D.C., USA, August 2003. Springer. Open problem.

M. K. Warmuth. The optimal PAC algorithm. In Proceedings of the 17th Annual Conference on Learning Theory (COLT 04), Banff, Canada, July 2004. Springer. Open problem.

E. Welzl. Complete range spaces. Unpublished notes, 1987.