
39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach


Source: pdf

Author: Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz

Abstract: We study pool-based active learning of half-spaces. We revisit the aggressive approach for active learning in the realizable case, and show that it can be made efficient and practical, while also having theoretical guarantees under reasonable assumptions. We further show, both theoretically and experimentally, that it can be preferable to mellow approaches. Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin. While our analysis is focused on the realizable setting, we show that a simple heuristic allows using the same algorithm successfully for pools with low error as well. We further compare the aggressive approach to the mellow approach, and prove that there are cases in which the aggressive approach results in significantly better label complexity compared to the mellow approach. We demonstrate experimentally that substantial improvements in label complexity can be achieved using the aggressive approach, for both realizable and low-error settings. Keywords: active learning, linear classifiers, margin, adaptive submodularity
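To make the aggressive strategy concrete, below is a minimal sketch of pool-based active learning of a halfspace in Python. It is an illustration only, not the paper's algorithm: it uses the simple-margin heuristic in the spirit of Tong and Koller (2002, cited below), querying the pool point closest to the current separator. The function names are hypothetical, and the hard-margin surrogate (a linear SVM with large C) is an assumption made for this sketch.

import numpy as np
from sklearn.svm import SVC

def aggressive_active_learner(pool, oracle, budget, seed_idx):
    # pool: (n, d) float array; oracle(i) returns the label (+1/-1) of pool[i];
    # seed_idx: indices of a few initial queries (assumed to cover both classes);
    # budget: total number of label queries allowed.
    labeled = list(seed_idx)
    labels = [oracle(i) for i in labeled]
    clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin (realizable case)
    for _ in range(budget - len(labeled)):
        clf.fit(pool[labeled], labels)
        unlabeled = [i for i in range(len(pool)) if i not in set(labeled)]
        # Aggressive rule: query the pool point closest to the current
        # hyperplane, i.e., the one the hypothesis is least certain about.
        dist = np.abs(clf.decision_function(pool[unlabeled]))
        pick = unlabeled[int(np.argmin(dist))]
        labeled.append(pick)
        labels.append(oracle(pick))
    clf.fit(pool[labeled], labels)  # final hypothesis on all queried labels
    return clf

In the realizable, large-margin setting such a rule tends to shrink the version space quickly, which is the intuition behind aggressive querying; the paper's contribution is making aggressive selection efficient while retaining formal approximation guarantees.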


reference text

D. Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988.
E. M. Arkin, H. Meijer, J. S. B. Mitchell, D. Rappaport, and S. S. Skiena. Decision trees for geometric models. In Proceedings of the Ninth Annual Symposium on Computational Geometry, pages 369–378. ACM, 1993.
M. F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 65–72. ACM, 2006a.
M. F. Balcan, A. Blum, and S. Vempala. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning, 65(1):79–94, 2006b.
M. F. Balcan, A. Broder, and T. Zhang. Margin based active learning. In Learning Theory (COLT), pages 35–50, 2007.
A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56. ACM, 2009.
J. Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel Journal of Mathematics, 52(1):46–52, 1985.
G. Brightwell and P. Winkler. Counting linear extensions is #P-complete. In Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing, STOC '91, pages 175–181, 1991.
D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
S. Dasgupta. Analysis of a greedy active learning strategy. Advances in Neural Information Processing Systems, 17:337–344, 2005.
S. Dasgupta. Coarse sample complexity bounds for active learning. Advances in Neural Information Processing Systems, 18:235, 2006.
S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. In Learning Theory (COLT), pages 889–905, 2005.
S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. Advances in Neural Information Processing Systems, 20:353–360, 2007.
R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. Journal of Machine Learning Research, 13:255–279, 2012.
Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2):133–168, 1997.
E. Friedman. Active learning for smooth problems. In Proceedings of the 22nd Conference on Learning Theory (COLT), 2009.
R. Gilad-Bachrach, A. Navot, and N. Tishby. Query by committee made real. Advances in Neural Information Processing Systems, 19, 2005.
D. Golovin and A. Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. In Proceedings of the International Conference on Learning Theory (COLT), 2010.
S. Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th Annual International Conference on Machine Learning (ICML), 2007.
S. Hanneke. Rates of convergence in active learning. The Annals of Statistics, 39(1):333–361, 2011.
J. Håstad. On the size of weights for threshold gates. SIAM Journal on Discrete Mathematics, 7:484, 1994.
W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
R. Kannan, L. Lovász, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11(1):1–50, 1997.
L. Lovász. Hit-and-run mixes fast. Mathematical Programming, 86(3):443–461, 1999.
J. Matoušek. Lectures on Discrete Geometry, volume 212. Springer Verlag, 2002.
A. McCallum and K. Nigam. Employing EM and pool-based active learning for text classification. In Proceedings of the 15th International Conference on Machine Learning (ICML), pages 350–358, 1998.
S. Muroga, I. Toda, and S. Takasu. Theory of majority decision elements. Journal of the Franklin Institute, 271(5):376–418, 1961.
S. Sabato, N. Srebro, and N. Tishby. Tight sample complexity of large-margin learning. In Advances in Neural Information Processing Systems 23 (NIPS), pages 2038–2046, 2010.
H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 287–294. ACM, 1992.
C. E. Shannon. Probability of error for optimal codes in a Gaussian channel. Bell System Technical Journal, 38:611–656, 1959.
S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2:45–66, 2002.