
271 nips-2012-Pointwise Tracking the Optimal Regression Function


Source: pdf

Author: Yair Wiener, Ran El-Yaniv

Abstract: This paper examines the possibility of a ‘reject option’ in the context of least squares regression. It is shown that using rejection it is theoretically possible to learn ‘selective’ regressors that can ε-pointwise track the best regressor in hindsight from the same hypothesis class, while rejecting only a bounded portion of the domain. Moreover, the rejected volume vanishes with the training set size, under certain conditions. We then develop an efficient and exact implementation of these selective regressors for the case of linear regression. Empirical evaluation over a suite of real-world datasets corroborates the theoretical analysis and indicates that our selective regressors can provide a substantial advantage by reducing estimation error.
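
The following is a minimal sketch of the selective-regression idea described in the abstract (predict on part of the domain, reject elsewhere). It is not the paper's exact construction, which tracks disagreement among low-error hypotheses in the class; here, bootstrap least-squares fits serve as a rough, hypothetical proxy for that disagreement, and the names and parameters (eps, n_boot) are illustrative only.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares fit; returns coefficients including an intercept term."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def selective_predict(X_train, y_train, X_test, eps=0.5, n_boot=50, seed=0):
    """Predict with a 'reject option': abstain wherever bootstrap OLS fits
    disagree by more than eps at a test point (an illustrative stand-in for
    the hypothesis-class disagreement used in the paper)."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    Xb_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # bootstrap resample
        w = fit_ols(X_train[idx], y_train[idx])
        preds.append(Xb_test @ w)
    preds = np.stack(preds)                          # shape (n_boot, n_test)
    spread = preds.max(axis=0) - preds.min(axis=0)   # disagreement per test point
    accept = spread <= eps                           # reject where fits disagree
    y_hat = preds.mean(axis=0)
    return np.where(accept, y_hat, np.nan), accept   # NaN marks rejected points
```

In this toy version, shrinking eps rejects a larger portion of the domain in exchange for predictions on which the resampled fits agree closely, mirroring the coverage/accuracy trade-off analyzed in the paper.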


reference text

[1] V. Vapnik. Statistical learning theory. Wiley, New York, 1998.

[2] C.K. Chow. An optimum character recognition system using decision functions. IEEE Trans. Computer, 6(4):247–254, 1957.

[3] C.K. Chow. On optimum recognition error and reject trade-off. IEEE Trans. on Information Theory, 16(1):41–46, 1970.

[4] B. Kégl. Robust regression by boosting the median. In Learning Theory and Kernel Machines, pages 258–272, 2003.

[5] Ö. Ayşegül, G. Mehmet, A. Ethem, and H. Türkan. Machine learning integration for predicting the effect of single amino acid substitutions on protein stability. BMC Structural Biology, 9.

[6] R. El-Yaniv and Y. Wiener. On the foundations of noise-free selective classification. The Journal of Machine Learning Research, 11:1605–1641, 2010.

[7] R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. Journal of Machine Learning Research, 13:255–279, 2012.

[8] R. El-Yaniv and Y. Wiener. Agnostic selective classification. In Neural Information Processing Systems (NIPS), 2011.

[9] W.S. Lee. Agnostic Learning and Single Hidden Layer Neural Networks. PhD thesis, Australian National University, 1996.

[10] V.N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, 1999.

[11] R.M. Kil and I. Koo. Generalization bounds for the regression of real-valued functions. In Proceedings of the 9th International Conference on Neural Information Processing, volume 4, pages 1766–1770, 2002.

[12] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, pages 353–360, 2007.

[13] S. Hanneke. Theoretical Foundations of Active Learning. PhD thesis, Carnegie Mellon University, 2009.

[14] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56. ACM, 2009.

[15] J.E. Gentle. Numerical linear algebra for applications in statistics. Springer-Verlag, 1998.

[16] C.C. Chang and C.J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[17] S. Hanneke. Rates of convergence in active learning. The Annals of Statistics, 39(1):333–361, 2011.