Author: Yi Li, Philip M. Long
Abstract: Given a set F of classifiers and a probability distribution over their domain, one can define a metric by taking the distance between a pair of classifiers to be the probability that they classify a random item differently. We prove bounds on the sample complexity of PAC learning in terms of the doubling dimension of this metric. These bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to a constant factor. We prove a bound that holds for any algorithm that outputs a classifier with zero training error whenever this is possible; this bound is in terms of the maximum of the doubling dimension and the VC-dimension of F, and strengthens the best known bound in terms of the VC-dimension alone. We show that there is no bound on the doubling dimension in terms of the VC-dimension of F (in contrast with the metric dimension).
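The metric in the abstract is concrete enough to compute. Below is a minimal Python sketch (our own illustration, not code from the paper) that estimates the distance d(f, g) = Pr[f(x) != g(x)] by sampling, in the halfspaces-under-the-uniform-distribution setting the abstract mentions; all function names and parameter choices are hypothetical.

# Monte Carlo sketch of the classifier metric d(f, g) = Pr[f(x) != g(x)].
# Hypothetical illustration: halfspaces through the origin under the
# uniform distribution on the unit sphere, where the exact distance is
# (angle between the normal vectors) / pi.
import numpy as np

def halfspace(w):
    """The classifier x -> sign(w . x), vectorized over rows of X."""
    return lambda X: np.sign(X @ w)

def uniform_sphere(rng, n, d):
    """n points drawn uniformly from the unit sphere in R^d."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def disagreement_distance(f, g, rng, n=200_000, d=5):
    """Estimate Pr[f(x) != g(x)] over the uniform sphere distribution."""
    X = uniform_sphere(rng, n, d)
    return float(np.mean(f(X) != g(X)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w1 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
    theta = 0.1  # angle between the two normal vectors, in radians
    w2 = np.array([np.cos(theta), np.sin(theta), 0.0, 0.0, 0.0])
    est = disagreement_distance(halfspace(w1), halfspace(w2), rng)
    print(f"estimated d = {est:.4f}, exact d = {theta / np.pi:.4f}")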
[1] K. Alexander. Rates of growth for weighted empirical processes. In Proc. of Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, volume 2, pages 475–493, 1985.
[2] N. Alon, J. H. Spencer, and P. Erdős. The Probabilistic Method. Wiley, 1992.
[3] M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
[4] P. Assouad. Plongements lipschitziens dans $\mathbb{R}^n$. Bull. Soc. Math. France, 111(4):429–448, 1983.
[5] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33(4):1497–1537, 2005.
[6] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. ICML, 2006.
[7] H. T. H. Chan, A. Gupta, B. M. Maggs, and S. Zhou. On hierarchical routing in doubling metrics. SODA, 2005.
[8] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
[9] D. Dubhashi and D. Ranjan. Balls and bins: A study in negative dependence. Random Structures & Algorithms, 13(2):99–124, Sept 1998.
[10] D. Dubhashi, V. Priebe, and D. Ranjan. Negative dependence through the FKG inequality. Technical Report RS-96-27, BRICS, 1996.
[11] R. M. Dudley. Central limit theorems for empirical measures. Annals of Probability, 6(6):899–929, 1978.
[12] R. M. Dudley. A course on empirical processes. Lecture Notes in Mathematics, 1097:2–142, 1984.
[13] A. Gupta, R. Krauthgamer, and J. R. Lee. Bounded geometries, fractals, and low-distortion embeddings. FOCS, 2003.
[14] S. Har-Peled and M. Mendel. Fast construction of nets in low dimensional metrics, and their applications. SICOMP, 35(5):1148–1184, 2006.
[15] D. Haussler. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217–232, 1995.
[16] K. Joag-Dev and F. Proschan. Negative association of random variables, with applications. The Annals of Statistics, 11(1):286–295, 1983.
[17] J. Kleinberg, A. Slivkins, and T. Wexler. Triangulation and embedding using small sets of beacons. FOCS, 2004.
[18] A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in functional spaces. American Mathematical Society Translations (Ser. 2), 17:277–364, 1961.
[19] J. Komlós, J. Pach, and G. Woeginger. Almost tight bounds on epsilon-nets. Discrete and Computational Geometry, 7:163–173, 1992.
[20] R. Krauthgamer and J. R. Lee. The black-box complexity of nearest neighbor search. ICALP, 2004.
[21] R. Krauthgamer and J. R. Lee. Navigating nets: simple algorithms for proximity search. SODA, 2004.
[22] F. Kuhn, T. Moscibroda, and R. Wattenhofer. On the locality of bounded growth. PODC, 2005.
[23] P. M. Long. On the sample complexity of PAC learning halfspaces against the uniform distribution. IEEE Transactions on Neural Networks, 6(6):1556–1559, 1995.
[24] P. M. Long. Using the pseudo-dimension to analyze approximation algorithms for integer programming. Proceedings of the Seventh International Workshop on Algorithms and Data Structures, 2001.
[25] P. M. Long. An upper bound on the sample complexity of PAC learning halfspaces with respect to the uniform distribution. Information Processing Letters, 87(5):229–234, 2003.
[26] S. Mendelson. Estimating the performance of kernel classes. Journal of Machine Learning Research, 4:759–771, 2003.
[27] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[28] A. Slivkins. Distance estimation and object location via rings of neighbors. PODC, 2005.
[29] K. Talwar. Bypassing the embedding: Approximation schemes and compact representations for low dimensional metrics. STOC, 2004.
[30] S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000.
[31] A. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.
[32] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.
[33] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[34] T. Zhang. Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory, 2006. To appear.

A  Proof of Lemma 9

Definition 10 ([16]) A collection $X_1, \ldots, X_n$ of random variables is negatively associated if, for every disjoint pair $I, J \subseteq \{1, \ldots, n\}$ of index sets and every pair $f : \mathbb{R}^{|I|} \to \mathbb{R}$ and $g : \mathbb{R}^{|J|} \to \mathbb{R}$ of non-decreasing functions, we have
$$E\big[f(X_i,\, i \in I)\, g(X_j,\, j \in J)\big] \le E\big[f(X_i,\, i \in I)\big]\, E\big[g(X_j,\, j \in J)\big].$$

Lemma 11 ([10]) If $S$ is chosen uniformly at random from among the subsets of $\{1, \ldots, s\}$ with exactly $m$ elements, and $X_i = 1$ if $i \in S$ and $X_i = 0$ otherwise, then $X_1, \ldots, X_s$ are negatively associated.

Lemma 12 ([9]) Collections $X_1, \ldots, X_n$ of negatively associated random variables satisfy Chernoff bounds: for any $\lambda > 0$,
$$E\left[\exp\left(\lambda \sum_{i=1}^{n} X_i\right)\right] \le \prod_{i=1}^{n} E\left[\exp(\lambda X_i)\right].$$

Proof of Lemma 9: Let $X_i \in \{0, 1\}$ indicate whether the $i$th element is included in the random subset $S$. By Lemma 11, $X_1, \ldots, X_s$ are negatively associated, and the quantity to be bounded is $\sum_{i=1}^{s} X_i$. Combining Lemma 12 with a standard Chernoff-Hoeffding bound (see Theorem 4.1 of [27]) completes the proof.
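To make this machinery concrete, here is a small numerical sanity check (our own sketch with hypothetical parameter choices, not from the paper): the indicators of a uniformly random m-element subset of {1, ..., s} are negatively associated by Lemma 11, so by Lemma 12 their sums obey the usual Chernoff bounds. The script compares an empirical upper tail against the standard multiplicative Chernoff bound (Theorem 4.1 of [27]).

# Sanity check of Lemmas 11-12 (illustrative sketch, not from the paper).
# U counts how many of the first t indices land in a uniformly random
# m-element subset S of {0, ..., s-1}; negative association plus Lemma 12
# give the multiplicative Chernoff bound
#   Pr[U >= (1 + delta) * mu] <= (e^delta / (1 + delta)^(1 + delta))^mu,
# with mu = E[U] = t * m / s.
import numpy as np

def tail_vs_chernoff(s=100, m=30, t=20, delta=0.5, trials=50_000, seed=0):
    rng = np.random.default_rng(seed)
    mu = t * m / s  # each index belongs to S with probability m / s
    hits = np.empty(trials)
    for k in range(trials):
        S = rng.choice(s, size=m, replace=False)
        hits[k] = np.sum(S < t)  # hits in the fixed index set {0, ..., t-1}
    empirical = float(np.mean(hits >= (1 + delta) * mu))
    bound = (np.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
    print(f"Pr[U >= {(1 + delta) * mu:.1f}]: empirical {empirical:.4f}"
          f" <= Chernoff bound {bound:.4f}")

if __name__ == "__main__":
    tail_vs_chernoff()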