
131 nips-2006-Mixture Regression for Covariate Shift


Source: pdf

Author: Masashi Sugiyama, Amos J. Storkey

Abstract: In supervised learning it is typically presumed that the training and test points are drawn from the same distribution. In practice this assumption is commonly violated. The situation where the training and test data come from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation error. As yet the literature lacks a Bayesian generative perspective on this problem. This paper tackles this issue for regression models. Recent work on covariate shift can be understood in terms of mixture regression. Using this view, we obtain a general approach to regression under covariate shift, which reproduces previous work as a special case. The main advantages of this new formulation over previous models for covariate shift are that we no longer need to presume the test and training densities are known, that the regression and density estimation are combined into a single procedure, and that previous methods are reproduced as special cases of this procedure, shedding light on the implicit assumptions those methods make.
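To make the covariate-shift setting concrete, the following minimal sketch (Python/NumPy) fits a linear model by importance-weighted least squares, the previously proposed special case along the lines of Shimodaira [14] that the abstract says is reproduced by the mixture-regression formulation. It is not an implementation of the paper's own method: the Gaussian training and test densities and the synthetic data below are hypothetical and assumed known purely for illustration, whereas the paper's point is that the densities need not be known and are estimated jointly with the regression.

```python
import numpy as np

# Minimal sketch, NOT the paper's method: it illustrates covariate shift and
# the importance-weighted regression special case (cf. [14]).  The Gaussian
# training and test densities are hypothetical and assumed known here; the
# paper's formulation removes exactly that assumption.

rng = np.random.default_rng(0)

# Training inputs from p_train = N(0, 1); test inputs from a shifted
# p_test = N(1, 0.5^2).  The underlying regression function is unchanged.
x_train = rng.normal(loc=0.0, scale=1.0, size=200)
x_test = rng.normal(loc=1.0, scale=0.5, size=200)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=x_train.size)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Importance weights w(x) = p_test(x) / p_train(x), computable here only
# because both densities were assumed known.
w = gaussian_pdf(x_train, 1.0, 0.5) / gaussian_pdf(x_train, 0.0, 1.0)

# Importance-weighted least squares for a linear model y ~ a*x + b.
X = np.column_stack([x_train, np.ones_like(x_train)])
normal_matrix = X.T @ (w[:, None] * X)
normal_rhs = X.T @ (w * y_train)
a, b = np.linalg.solve(normal_matrix, normal_rhs)

# Predictions in the (shifted) test region: the weighted fit tracks the target
# near x = 1 better than an unweighted least-squares fit would.
y_test_pred = a * x_test + b
print("weighted fit: slope %.3f, intercept %.3f" % (a, b))
```

The weights up-weight training points that fall in the region where test inputs are likely, which is why the approach depends on knowing (or estimating) both densities; the mixture-regression view described in the abstract folds that density estimation into the regression itself.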


reference text

[1] P. Baldi, S. Brunak, and G. A. Stolovitzky. Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, 1998.

[2] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[3] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

[4] W.S. DeSarbo and W.L. Cron. A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5:249–282, 1988.

[5] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley Interscience, 2001.

[6] C. Fraley and A.E. Raftery. How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal, 41:578–588, 1998.

[7] J. J. Heckman. Sample selection bias as a specification error. Econometrica, 47:153–162, 1979.

[8] C. Hennig. Identifiability of models for clusterwise linear regressions. Journal of Classification, 17:273–296, 2000.

[9] R. Herbrich. Learning Kernel Classifiers. MIT Press, 2002.

[10] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, and G.E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79–87, 1991.

[11] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181–214, 1994.

[12] C. Keribin. Consistent estimation of the order of mixture models. Technical report, Université d’Evry-Val d’Essonne, Laboratoire Analyse et Probabilité, 1997.

[13] C.R. Shelton. Importance Sampling for Reinforcement Learning with Multiple Objectives. PhD thesis, Massachusetts Institute of Technology, 2001.

[14] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227–244, 2000.

[15] M. Sugiyama and K.-R. Müller. Input-dependent estimation of generalisation error under covariate shift. Statistics and Decisions, 23:249–279, 2005.

[16] H.G. Sung. Gaussian Mixture Regression and Classification. PhD thesis, Rice University, 2004.

[17] J.K. Vermunt. A general non-parametric approach to unobserved heterogeneity in the analysis of event history data. In J. Hagenaars and A. McCutcheon, editors, Applied Latent Class Models. Cambridge University Press, 2002.

[18] M. Wedel and W.S. DeSarbo. A mixture likelihood approach for generalised linear models. Journal of Classification, 12:21–55, 1995.

[19] B. Zadrozny. Learning and evaluating classifiers under sample selection bias. In Proceedings of ICML, 2004.