nips nips2010 nips2010-142 nips2010-142-reference knowledge-graph by maker-knowledge-mining

142 nips-2010-Learning Bounds for Importance Weighting

Source: pdf

Author: Corinna Cortes, Yishay Mansour, Mehryar Mohri

Abstract: This paper presents an analysis of importance weighting for learning from ﬁnite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give both upper and lower bounds for generalization with bounded importance weights and, more signiﬁcantly, give learning guarantees for the more common case of unbounded importance weights under the weak assumption that the second moment is bounded, a condition related to the R´ nyi divergence of the training and test distributions. e These results are based on a series of novel and general bounds we derive for unbounded loss functions, which are of independent interest. We use these bounds to guide the deﬁnition of an alternative reweighting algorithm and report the results of experiments demonstrating its beneﬁts. Finally, we analyze the properties of normalized importance weights which are also commonly used.

reference text

[1] M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207 – 217, 1993. 8

[2] C. Arndt. Information Measures: Information and its Description in Science and Engineering. Signals and Communication Technology. Springer Verlag, 2004.

[3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. NIPS, 2007.

[4] S. N. Bernstein. Sur l’extension du th´ or` me limite du calcul des probabilit´ s aux sommes de e e e quantit´ s d´ pendantes. Mathematische Annalen, 97:1–59, 1927. e e

[5] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, pages 49–56, New York, NY, USA, 2009.

[6] S. Bickel, M. Br¨ ckner, and T. Scheffer. Discriminative learning for differing training and test u distributions. In ICML, pages 81–88, 2007.

[7] J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. Learning bounds for domain adaptation. NIPS 2007, 2008.

[8] C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh. Sample selection bias correction theory. In ALT, 2008.

[9] S. Dasgupta and P. M. Long. Boosting with diverse base classiﬁers. In COLT, 2003.

[10] H. Daum´ III and D. Marcu. Domain adaptation for statistical classiﬁers. Journal of Artiﬁcial e Intelligence Research, 26:101–126, 2006.

[11] M. Dud´k, R. E. Schapire, and S. J. Phillips. Correcting sample selection bias in maximum ı entropy density estimation. In NIPS, 2006.

[12] R. M. Dudley. A course on empirical processes. Lecture Notes in Math., 1097:2 – 142, 1984.

[13] R. M. Dudley. Universal Donsker classes and metric entropy. Annals of Probability, 14(4):1306 – 1326, 1987.

[14] C. Elkan. The foundations of cost-sensitive learning. In IJCAI, pages 973–978, 2001.

[15] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inf. Comput., 100(1):78–150, 1992.

[16] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Sch¨ lkopf. Correcting sample o selection bias by unlabeled data. In NIPS, volume 19, pages 601–608, 2006.

[17] J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. In ACL, 2007.

[18] J. S. Liu. Monte Carlo strategies in scientiﬁc computing. Springer, 2001.

[19] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009.

[20] A. Maurer and M. Pontil. Empirical bernstein bounds and sample-variance penalization. In COLT, Montr´ al, Canada, June 2009. Omnipress. e

[21] D. Pollard. Convergence of Stochastic Processess. Springer, New York, 1984.

[22] D. Pollard. Asymptotics via empirical processes. Statistical Science, 4(4):341 – 366, 1989.

[23] A. R´ nyi. On measures of information and entropy. In Proceedings of the 4th Berkeley Syme posium on Mathematics, Statistics and Probability, page 547561, 1960.

[24] H. Shimodaira. Improving predictive inference under covariate shift by weighting the loglikelihood function. Journal of Statistical Planning and Inference, 90(2):227–244, 2000.

[25] M. Sugiyama, S. Nakajima, H. Kashima, P. von B¨ nau, and M. Kawanabe. Direct importance u estimation with model selection and its application to covariate shift adaptation. In NIPS, 2008.

[26] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.

[27] V. N. Vapnik. Estimation of Dependences Based on Empirical Data, 2nd ed. Springer, 2006.

[28] J. von Neumann. Various techniques used in connection with random digits. Monte Carlo methods. Nat. Bureau Standards, 12:36–38, 1951.

[29] B. Zadrozny, J. Langford, and N. Abe. Cost-sensitive learning by cost-proportionate example weighting. In ICDM, 2003. 9