nips nips2006 nips2006-20 nips2006-20-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of the expected reduction in generalization error after adding a new training data point, which allow us to optimize the sampling distribution of that point through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable to all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels.
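To make the sequential setting concrete, the following is a minimal, generic sketch of sequential active learning for one GLM (logistic regression): starting from a few labeled points, the model is refit and the pool point with the highest predictive variance under the current fit is labeled next. This variance-based selection rule and the synthetic two-blob data are illustrative assumptions for this sketch; they are not the paper's estimator-based criterion or its convex sampling-distribution optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool: two Gaussian blobs in 2-D (labels 0 and 1).
n_pool = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n_pool // 2, 2)),
               rng.normal(+1.0, 1.0, (n_pool // 2, 2))])
y = np.concatenate([np.zeros(n_pool // 2), np.ones(n_pool // 2)])

def fit_logistic(X, y, n_iter=50, lam=0.1):
    """L2-regularized logistic regression via Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = np.clip(X @ w, -30, 30)          # guard against exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        g = X.T @ (p - y) + lam * w           # gradient of penalized NLL
        h = p * (1.0 - p)                     # Bernoulli GLM weights
        H = (X * h[:, None]).T @ X + lam * np.eye(X.shape[1])
        w -= np.linalg.solve(H, g)
    return w

# Seed with a few random labels, then query sequentially.
labeled = list(rng.choice(n_pool, size=5, replace=False))
for _ in range(20):
    w = fit_logistic(X[labeled], y[labeled])
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    var = p * (1.0 - p)                       # predictive variance p(1-p)
    var[labeled] = -np.inf                    # never re-query labeled points
    labeled.append(int(np.argmax(var)))       # label the most uncertain point

w = fit_logistic(X[labeled], y[labeled])
acc = np.mean((X @ w > 0) == (y == 1))        # accuracy on the full pool
```

The query rule here is plain uncertainty sampling; richer criteria (e.g., expected error reduction, as in reference [5], or the weighted estimators developed in this paper) replace the `var` score while the fit/query/refit loop stays the same.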
[1] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. J. Art. Intel. Res., 4:129–145, 1996.
[2] V. V. Fedorov. Theory of optimal experiments. Academic Press, 1972.
[3] P. Chaudhuri and P. A. Mykland. On efficient designing of nonlinear experiments. Stat. Sin., 5:421–440, 1995.
[4] S. Dasgupta. Coarse sample complexity bounds for active learning. In Adv. NIPS 18, 2006.
[5] N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proc. ICML, 2001.
[6] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. ACM Multimedia, 2001.
[7] M. Warmuth, G. Rätsch, M. Mathieson, J. Liao, and C. Lemmen. Active learning in the drug discovery process. In Adv. NIPS 14, 2002.
[8] X. Zhu, J. Lafferty, and Z. Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In Proc. ICML, 2003.
[9] A. I. Schein. Active Learning for Logistic Regression. Ph.D. dissertation, CIS Dept., U. Penn., 2005.
[10] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, 1989.
[11] T. Zhang and F. J. Oles. A probability analysis on the value of unlabeled data for classification problems. In Proc. ICML, 2000.
[12] O. Chapelle. Active learning for Parzen window classifier. In Proc. AISTATS, 2005.
[13] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inf., 90:227–244, 2000.
[14] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2003.
[15] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.
[16] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res., 2:243–264, 2001.
[17] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In Proc. ICML, 2000.
[18] K. Fukumizu. Active learning in multilayer perceptrons. In Adv. NIPS 8, 1996.
[19] T. Kanamori and H. Shimodaira. Active learning algorithm using the maximum weighted log-likelihood estimator. J. Stat. Plan. Inf., 116:149–162, 2003.
[20] T. Kanamori. Statistical asymptotic theory of active learning. Ann. Inst. Stat. Math., 54(3):459–475, 2002.
[21] H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–26, 1982.
[22] H. Akaike. A new look at the statistical model identification. IEEE Trans. Aut. Cont., 19:716–722, 1974.
[23] F. R. Bach. Active learning for misspecified generalized linear models. Technical Report N15/06/MM, Ecole des Mines de Paris, 2006.
[24] A. W. Van der Vaart. Asymptotic Statistics. Cambridge Univ. Press, 1998.
[25] D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.
[26] Y. Bengio and Y. Grandvalet. Semi-supervised learning by entropy minimization. In Adv. NIPS 17, 2005.