Authors: Steffen Bickel, Christoph Sawade, Tobias Scheffer
Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with little or no training data, and improves the overall prediction accuracy.
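The core idea of the abstract – reweighting a biased labeled sample so that it matches an unlabeled sample from the target distribution – can be sketched with a discriminative density-ratio estimate. The sketch below is illustrative only, not the paper's actual procedure: it assumes a single task, synthetic 1-D data, and a hand-rolled logistic regression. A classifier is trained to distinguish labeled from unlabeled points, and each labeled point x then receives the weight p(unlabeled | x) / p(labeled | x), which approximates the ratio of target to biased density.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, w=None, lr=0.1, steps=2000):
    """(Weighted) logistic regression by gradient descent; bias folded into X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    if w is None:
        w = np.ones(len(X))
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = sigmoid(Xb @ theta)
        theta -= lr * (Xb.T @ (w * (p - y)) / w.sum())
    return theta

# Biased labeled sample (hypothetical: questionnaire respondents cluster at x < 0).
X_lab = rng.normal(-1.0, 1.0, size=(300, 1))
y_lab = (X_lab[:, 0] > -1.0).astype(float)  # hypothetical labeling rule

# Large unlabeled sample reflecting the target distribution (centered at 0).
X_unl = rng.normal(0.0, 1.0, size=(300, 1))

# Discriminate "labeled vs. unlabeled"; the odds give the resampling weight.
X_all = np.vstack([X_lab, X_unl])
s = np.concatenate([np.zeros(len(X_lab)), np.ones(len(X_unl))])
theta_s = fit_logreg(X_all, s)
p_unl = sigmoid(np.hstack([X_lab, np.ones((len(X_lab), 1))]) @ theta_s)
weights = p_unl / (1.0 - p_unl)  # ~ p(target density) / p(biased density)

# Training on the weighted labeled sample approximates training on the target.
theta = fit_logreg(X_lab, y_lab, w=weights)
```

Labeled points that look like the target sample (here, larger x) receive larger weights, so the final classifier is fit as if the training data had been drawn from the target distribution.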