Author: Maxim Raginsky, Alexander Rakhlin
Abstract: We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander's capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Giné and Koltchinskii up to constants and generalize analogous results of Massart and Nédélec. For active learning, we provide the first known lower bounds based on the capacity function rather than the disagreement coefficient.
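The abstract does not reproduce the definitions, so here is a brief sketch of the relationship it alludes to, using the standard notation of the active-learning literature (e.g. Hanneke [20]) as an assumption rather than the paper's exact formulation. For a hypothesis class $\mathcal{H}$, a target hypothesis $h^*$, and a marginal distribution $P$ on inputs, let

\[ B(h^*, r) = \{ h \in \mathcal{H} : P(h(X) \neq h^*(X)) \le r \}, \qquad \mathrm{DIS}(V) = \{ x : \exists\, h, h' \in V \text{ with } h(x) \neq h'(x) \}. \]

Alexander's capacity function is then the normalized measure of the disagreement region of a ball, and Hanneke's disagreement coefficient is its supremum over scales:

\[ \tau(r) = \frac{P\big(\mathrm{DIS}(B(h^*, r))\big)}{r}, \qquad \theta = \sup_{r > 0} \tau(r). \]

Since $\tau(r) \le \theta$ for every $r$, bounds stated in terms of $\tau$ at the relevant scale are finer than bounds stated in terms of the single number $\theta$; this is the sense in which the active-learning lower bounds are “based on the capacity function rather than the disagreement coefficient.”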
[1] K.S. Alexander. Rates of growth and sample moduli for weighted empirical processes indexed by sets. Probability Theory and Related Fields, 75(3):379–423, 1987.
[2] K.S. Alexander. The central limit theorem for weighted empirical processes indexed by sets. Journal of Multivariate Analysis, 22(2):313–339, 1987.
[3] S. M. Ali and S. D. Silvey. A general class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. Ser. B, 28:131–142, 1966.
[4] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In ICML ’06: Proceedings of the 23rd international conference on Machine learning, pages 65–72, New York, NY, USA, 2006. ACM.
[5] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML. ACM New York, NY, USA, 2009.
[6] M.V. Burnashev and K.S. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
[7] R. M. Castro and R. D. Nowak. Minimax bounds for active learning. IEEE Trans. Inform. Theory, 54(5):2339–2353, 2008.
[8] G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear classification and selective sampling under low noise conditions. Advances in Neural Information Processing Systems, 21, 2009.
[9] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[10] I. Csiszár. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar., 2:299–318, 1967.
[11] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In Advances in Neural Information Processing Systems, volume 20, page 2, 2007.
[12] A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247–261, 1989.
[13] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2):133–168, 1997.
[14] C. Gentile and D. P. Helmbold. Improved lower bounds for learning from noisy examples: an information-theoretic approach. Inform. Comput., 166:133–155, 2001.
[15] E. Giné and V. Koltchinskii. Concentration inequalities and asymptotic results for ratio type empirical processes. Ann. Statist., 34(3):1143–1216, 2006.
[16] A. Guntuboyina. Lower bounds for the minimax risk using f-divergences, and applications. IEEE Trans. Inform. Theory, 57(4):2386–2399, 2011.
[17] A. A. Gushchin. On Fano’s lemma and similar inequalities for the minimax risk. Theory of Probability and Mathematical Statistics, pages 29–42, 2003.
[18] T. S. Han and S. Verdú. Generalizing the Fano inequality. IEEE Trans. Inform. Theory, 40(4):1247–1251, 1994.
[19] S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th international conference on Machine learning, page 360. ACM, 2007.
[20] S. Hanneke. Rates of convergence in active learning. Ann. Statist., 39(1):333–361, 2011.
[21] T. Hegedűs. Generalized teaching dimensions and the query complexity of learning. In COLT ’95, pages 108–117, New York, NY, USA, 1995. ACM.
[22] M. Kääriäinen. Active learning in the non-realizable case. In ALT, pages 63–77, 2006.
[23] V. Koltchinskii. Rademacher complexities and bounding the excess risk of active learning. J. Machine Learn. Res., 11:2457–2485, 2010.
[24] P. Massart and É. Nédélec. Risk bounds for statistical learning. Ann. Statist., 34(5):2326–2366, 2006.
[25] R. D. Nowak. The geometry of generalized binary search. Preprint, October 2009.
[26] D. P. Palomar and S. Verdú. Lautum information. IEEE Trans. Inform. Theory, 54(3):964–975, March 2008.
[27] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Ann. Statist., 27(5):1564–1599, 1999.