Author: Liu Yang, Jaime Carbonell
Abstract: In many practical applications of active learning, it is more cost-effective to request labels in large batches, rather than one-at-a-time. This is because the cost of labeling a large batch of examples at once is often sublinear in the number of examples in the batch. In this work, we study the label complexity of active learning algorithms that request labels in a given number of batches, as well as the tradeoff between the total number of queries and the number of rounds allowed. We additionally study the total cost sufficient for learning, for an abstract notion of the cost of requesting the labels of a given number of examples at once. In particular, we find that for sublinear cost functions, it is often desirable to request labels in large batches (i.e., buying in bulk); although this may increase the total number of labels requested, it reduces the total cost required for learning.
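As a rough numerical illustration of the buy-in-bulk point above (a minimal sketch, not taken from the paper; the square-root cost function and the example figures are assumptions chosen only to make the tradeoff concrete), consider a per-batch cost c(m) = sqrt(m), which is sublinear in the batch size m:

# Hypothetical illustration of the buy-in-bulk tradeoff described in the abstract.
# The cost function and numbers below are assumptions, not the paper's model.
import math

def batch_cost(m):
    """Cost of requesting the labels of m examples at once (sublinear in m)."""
    return math.sqrt(m)

def total_cost(num_labels, num_batches):
    """Total cost of acquiring num_labels labels in num_batches equal batches."""
    per_batch = math.ceil(num_labels / num_batches)
    return num_batches * batch_cost(per_batch)

# Even if batching forces us to request somewhat more labels overall,
# the total cost can still be far lower than fully sequential querying.
sequential = total_cost(1000, 1000)   # one label per round: cost 1000.0
bulk       = total_cost(1200, 10)     # 20% more labels, but only 10 rounds: ~109.5
print(f"sequential: {sequential:.1f}, bulk: {bulk:.1f}")

Under this assumed cost function, sequential querying pays cost 1 per label for 1000 labels, while the bulk strategy requests 20% more labels in 10 rounds yet spends roughly a tenth as much, which is the sense in which sublinear batch costs favor buying in bulk.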
[1] V. S. Sheng and C. X. Ling. Feature value acquisition in testing: a sequential batch test algorithm. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[2] S. Chakraborty, V. Balasubramanian, and S. Panchanathan. An optimization based framework for dynamic batch mode active learning. In Advances in Neural Information Processing Systems, 2010.
[3] S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. Journal of Machine Learning Research, 10:281–299, 2009.
[4] S. Dasgupta. Coarse sample complexity bounds for active learning. In Advances in Neural Information Processing Systems 18, 2005.
[5] M. F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proceedings of the 23rd International Conference on Machine Learning, 2006.
[6] S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th International Conference on Machine Learning, 2007.
[7] S. Hanneke. Rates of convergence in active learning. The Annals of Statistics, 39(1):333–361, 2011.
[8] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[9] V. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, New York, 1982.
[10] M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
[11] E. Mammen and A. B. Tsybakov. Smooth discrimination analysis. The Annals of Statistics, 27:1808–1829, 1999.
[12] P. Massart and É. Nédélec. Risk bounds for statistical learning. The Annals of Statistics, 34(5):2326–2366, 2006.
[13] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593–2656, 2006.
[14] S. Hanneke. Activized learning: Transforming passive to active with improved label complexity. Journal of Machine Learning Research, 13(5):1469–1587, 2012.
[15] M.-F. Balcan, A. Broder, and T. Zhang. Margin based active learning. In Proceedings of the 20th Conference on Learning Theory, 2007.
[16] C. H. Papadimitriou and M. Sipser. Communication complexity. Journal of Computer and System Sciences, 28(2):260–269, 1984.
[17] P. Harsha, Y. Ishai, J. Kilian, K. Nissim, and S. Venkatesh. Communication versus computation. In The 31st International Colloquium on Automata, Languages and Programming, pages 745–756, 2004.