NIPS 2008, paper 123 (reference knowledge-graph entry)
Author: Giovanni Cavallanti, Nicolò Cesa-Bianchi, Claudio Gentile
Abstract: We provide a new analysis of an efficient margin-based algorithm for selective sampling in classification problems. Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm. Our analysis reveals that, excluding logarithmic factors, the average risk of the selective sampler converges to the Bayes risk at rate N^{−(1+α)(2+α)/(2(3+α))}, where N denotes the number of queried labels and α > 0 is the exponent in the low noise condition. For all α > √3 − 1 ≈ 0.73 this convergence rate is asymptotically faster than the rate N^{−(1+α)/(2+α)} achieved by the fully supervised version of the same classifier, which queries all labels, and for α → ∞ the two rates exhibit an exponential gap. Experiments on textual data reveal that simple variants of the proposed selective sampler perform much better than popular and similarly efficient competitors.
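The crossover point quoted in the abstract follows from comparing the two exponents of N: the selective rate wins exactly when (1+α)(2+α)/(2(3+α)) > (1+α)/(2+α), i.e. when α² + 2α − 2 > 0, giving α > √3 − 1. A minimal sketch checking this numerically (the function names are illustrative, not from the paper):

```python
import math

def supervised_exponent(a):
    # exponent of N in the fully supervised rate N^{-(1+a)/(2+a)}
    return (1 + a) / (2 + a)

def selective_exponent(a):
    # exponent of N in the selective-sampling rate N^{-(1+a)(2+a)/(2(3+a))}
    return (1 + a) * (2 + a) / (2 * (3 + a))

crossover = math.sqrt(3) - 1  # ~ 0.732

# Below the crossover the supervised exponent is larger (faster decay);
# above it, the selective sampler's exponent takes over.
assert selective_exponent(crossover - 0.1) < supervised_exponent(crossover - 0.1)
assert selective_exponent(crossover + 0.1) > supervised_exponent(crossover + 0.1)
assert abs(selective_exponent(crossover) - supervised_exponent(crossover)) < 1e-12

# As a -> infinity the supervised exponent tends to 1, while the
# selective exponent grows without bound: the "exponential gap".
assert supervised_exponent(1e6) < 1.0
assert selective_exponent(1e6) > 1000.0
```

The unbounded growth of the selective exponent for large α is what the abstract describes as an exponential gap between the two rates.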
[1] D. Angluin. Queries revisited. In 12th ALT, pages 12–31. Springer, 2001.
[2] K.S. Azoury and M.K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001.
[3] M.F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In 23rd ICML, pages 65–72. ACM Press, 2006.
[4] M.F. Balcan, A. Broder, and T. Zhang. Margin-based active learning. In 20th COLT, pages 35–50. Springer, 2007.
[5] P.L. Bartlett, M.I. Jordan, and J.D. McAuliffe. Convexity, classification, and risk bounds. JASA, 101(473):138–156, 2006.
[6] G. Blanchard, O. Bousquet, and L. Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66:259–294, 2007.
[7] M.L. Braun. Accurate error bounds for the eigenvalues of the kernel matrix. JMLR, 7:2303–2328, 2006.
[8] C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In 17th ICML, pages 111–118. Morgan Kaufmann, 2000.
[9] R. Castro and R.D. Nowak. Minimax bounds for active learning. IEEE Trans. IT, 2008. To appear.
[10] N. Cesa-Bianchi, A. Conconi, and C. Gentile. Learning probabilistic linear-threshold classifiers via selective sampling. In 16th COLT, pages 373–387. Springer, 2003.
[11] N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Worst-case analysis of selective sampling for linear classification. JMLR, 7:1205–1230, 2006.
[12] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[13] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[14] D. Cohn, L. Atlas, and R. Ladner. Training connectionist networks with queries and selective sampling. In NIPS 2. MIT Press, 1990.
[15] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS 20, pages 353–360. MIT Press, 2008.
[16] S. Dasgupta, A. T. Kalai, and C. Monteleoni. Analysis of Perceptron-based active learning. In 18th COLT, pages 249–263. Springer, 2005.
[17] Y. Freund, S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2/3):133–168, 1997.
[18] R. Gilad-Bachrach, A. Navot, and N. Tishby. Query by committee made real. In NIPS 18. MIT Press, 2005.
[19] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[20] C. Monteleoni and M. Kääriäinen. Practical online active learning for classification. In 24th IEEE CVPR, pages 249–263. IEEE Computer Society Press, 2007.
[21] J. Shawe-Taylor, C.K.I. Williams, N. Cristianini, and J. Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Trans. IT, 51(7):2510–2522, 2005.
[22] I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35:575–607, 2007.
[23] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In 17th ICML, pages 999–1006. Morgan Kaufmann, 2000.
[24] A. Tsybakov. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135– 166, 2004.
[25] V. Vovk. Competitive on-line statistics. International Statistical Review, 69:213–248, 2001.
[26] Y. Ying and D.X. Zhou. Online regularized classification algorithms. IEEE Trans. IT, 52:4775–4788, 2006.