
110 nips-2007-Learning Bounds for Domain Adaptation


Source: pdf

Author: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman

Abstract: Empirical risk minimization offers well-known learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to a different target domain with very little training data. In this work we give uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risk. The bounds explicitly model the inherent trade-off between training on a large but inaccurate source data set and a small but accurate target training set. Our theory also gives results when we have multiple source domains, each of which may have a different number of instances, and we exhibit cases in which minimizing a non-uniform combination of source risks can achieve much lower target error than standard empirical risk minimization.
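
As a sketch of the objective the abstract describes (the notation here is illustrative and assumed, not quoted from the paper): with empirical risks \hat{\epsilon}_S(h) on the source sample and \hat{\epsilon}_T(h) on the target sample, the algorithms analyzed minimize the convex combination

    \hat{\epsilon}_\alpha(h) = \alpha \, \hat{\epsilon}_T(h) + (1 - \alpha) \, \hat{\epsilon}_S(h),   \alpha \in [0, 1],

and, in the multiple-source setting with k source samples, a weighted combination

    \hat{\epsilon}_{\alpha}(h) = \sum_{j=1}^{k} \alpha_j \, \hat{\epsilon}_j(h),   \alpha_j \ge 0,  \sum_{j=1}^{k} \alpha_j = 1,

where a non-uniform choice of the weights \alpha_j can, in the cases the abstract refers to, yield lower target error than standard (uniform) empirical risk minimization.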


reference text

[1] M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, 1999.

[2] P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463–482, 2002.

[3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.

[4] S. Ben-David, J. Gehrke, and D. Kifer. Detecting change in data streams. In VLDB, 2004.

[5] S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In ICML, 2007.

[6] J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boomboxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.

[7] C. Chelba and A. Acero. Adaptation of maximum entropy capitalizer: Little data can help a lot. In EMNLP, 2004.

[8] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. In NIPS, 2007.

[9] W. Dai, Q. Yang, G. Xue, and Y. Yu. Boosting for transfer learning. In ICML, 2007.

[10] J. Huang, A. Smola, A. Gretton, K. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS, 2007.

[11] J. Jiang and C. Zhai. Instance weighting for domain adaptation. In ACL, 2007.

[12] C. Leggetter and P. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9:171–185, 1995.

[13] X. Li and J. Bilmes. A Bayesian divergence prior for classifier adaptation. In AISTATS, 2007.

[14] A. Martinez. Recognition of partially occluded and/or imprecisely localized faces using a probabilistic approach. In CVPR, 2000.

[15] D. McAllester. Simplified PAC-Bayesian margin bounds. In COLT, 2003.

[16] V. Vapnik. Statistical Learning Theory. John Wiley, New York, 1998.

[17] P. Wu and T. Dietterich. Improving SVM accuracy by training on auxiliary data sources. In ICML, 2004.