nips nips2006 nips2006-20 nips2006-20-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach
Abstract: Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of the expected reduction in generalization error after adding a new training data point, which allow us to optimize the sampling distribution of that point through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning which is applicable to all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in non-linear settings through the use of Mercer kernels.
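To make the sequential setting concrete, the following is a minimal, generic sketch of sequential active learning for one GLM (logistic regression): starting from a few labeled points, the model is refit and the pool point with the highest predictive variance under the current fit is labeled next. This variance-based selection rule and the synthetic two-blob data are illustrative assumptions for this sketch; they are not the paper's estimator-based criterion or its convex sampling-distribution optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool: two Gaussian blobs in 2-D (labels 0 and 1).
n_pool = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n_pool // 2, 2)),
               rng.normal(+1.0, 1.0, (n_pool // 2, 2))])
y = np.concatenate([np.zeros(n_pool // 2), np.ones(n_pool // 2)])

def fit_logistic(X, y, n_iter=50, lam=0.1):
    """L2-regularized logistic regression via Newton's method (IRLS)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = np.clip(X @ w, -30, 30)          # guard against exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        g = X.T @ (p - y) + lam * w           # gradient of penalized NLL
        h = p * (1.0 - p)                     # Bernoulli GLM weights
        H = (X * h[:, None]).T @ X + lam * np.eye(X.shape[1])
        w -= np.linalg.solve(H, g)
    return w

# Seed with a few random labels, then query sequentially.
labeled = list(rng.choice(n_pool, size=5, replace=False))
for _ in range(20):
    w = fit_logistic(X[labeled], y[labeled])
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
    var = p * (1.0 - p)                       # predictive variance p(1-p)
    var[labeled] = -np.inf                    # never re-query labeled points
    labeled.append(int(np.argmax(var)))       # label the most uncertain point

w = fit_logistic(X[labeled], y[labeled])
acc = np.mean((X @ w > 0) == (y == 1))        # accuracy on the full pool
```

The query rule here is plain uncertainty sampling; richer criteria (e.g., expected error reduction, as in reference [5], or the weighted estimators developed in this paper) replace the `var` score while the fit/query/refit loop stays the same.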
[1] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. J. Art. Intel. Res., 4:129–145, 1996.
[2] V. V. Fedorov. Theory of optimal experiments. Academic Press, 1972.
[3] P. Chaudhuri and P. A. Mykland. On efficient designing of nonlinear experiments. Stat. Sin., 5:421–440, 1995.
[4] S. Dasgupta. Coarse sample complexity bounds for active learning. In Adv. NIPS 18, 2006.
[5] N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proc. ICML, 2001.
[6] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. ACM Multimedia, 2001.
[7] M. Warmuth, G. Rätsch, M. Mathieson, J. Liao, and C. Lemmen. Active learning in the drug discovery process. In Adv. NIPS 14, 2002.
[8] X. Zhu, J. Lafferty, and Z. Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In Proc. ICML, 2003.
[9] A. I. Schein. Active Learning for Logistic Regression. Ph.D. dissertation, CIS Dept., U. Penn., 2005.
[10] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, 1989.
[11] T. Zhang and F. J. Oles. A probability analysis on the value of unlabeled data for classification problems. In Proc. ICML, 2000.
[12] O. Chapelle. Active learning for Parzen window classifier. In Proc. AISTATS, 2005.
[13] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Inf., 90:227–244, 2000.
[14] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2003.
[15] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.
[16] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res., 2:243–264, 2001.
[17] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In Proc. ICML, 2000.
[18] K. Fukumizu. Active learning in multilayer perceptrons. In Adv. NIPS 8, 1996.
[19] T. Kanamori and H. Shimodaira. Active learning algorithm using the maximum weighted log-likelihood estimator. J. Stat. Plan. Inf., 116:149–162, 2003.
[20] T. Kanamori. Statistical asymptotic theory of active learning. Ann. Inst. Stat. Math., 54(3):459–475, 2002.
[21] H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–26, 1982.
[22] H. Akaike. A new look at the statistical model identification. IEEE Trans. Aut. Cont., 19:716–722, 1974.
[23] F. R. Bach. Active learning for misspecified generalized linear models. Technical Report N15/06/MM, Ecole des Mines de Paris, 2006.
[24] A. W. Van der Vaart. Asymptotic Statistics. Cambridge Univ. Press, 1998.
[25] D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.
[26] Y. Bengio and Y. Grandvalet. Semi-supervised learning by entropy minimization. In Adv. NIPS 17, 2005.