Author: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman
Abstract: Empirical risk minimization offers well-known learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to a different target domain with very little training data. In this work we give uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risk. The bounds explicitly model the inherent trade-off between training on a large but inaccurate source data set and a small but accurate target training set. Our theory also gives results when we have multiple source domains, each of which may have a different number of instances, and we exhibit cases in which minimizing a non-uniform combination of source risks can achieve much lower target error than standard empirical risk minimization.
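As a concrete illustration of the objective described in the abstract, the sketch below minimizes a convex combination of source and target empirical risk for a linear classifier. It is a minimal example under assumptions, not the paper's algorithm: the logistic loss, the L2 penalty, and the names alpha, lam, and alpha_weighted_erm are illustrative choices made here, with alpha playing the role of the weight placed on the target risk.

```python
import numpy as np

def alpha_weighted_erm(X_src, y_src, X_tgt, y_tgt, alpha=0.5, lam=1e-2,
                       lr=0.1, epochs=200, seed=0):
    """Minimize a convex combination of source and target empirical risk.

    Illustrative objective (names and loss are assumptions, not the paper's notation):
        alpha * R_hat_target(w) + (1 - alpha) * R_hat_source(w) + lam * ||w||^2
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X_src.shape[1])

    def grad(X, y, w):
        # Gradient of the mean logistic loss, with labels y in {-1, +1}.
        margins = y * (X @ w)
        coef = -y / (1.0 + np.exp(margins))
        return (X.T @ coef) / len(y)

    for _ in range(epochs):
        # Convex combination of the two empirical-risk gradients, plus regularizer.
        g = alpha * grad(X_tgt, y_tgt, w) + (1 - alpha) * grad(X_src, y_src, w)
        w -= lr * (g + 2 * lam * w)
    return w

# Toy usage: a large source sample whose decision boundary is slightly
# shifted relative to a small target sample.
rng = np.random.default_rng(1)
X_src = rng.normal(size=(1000, 2)); y_src = np.sign(X_src[:, 0] + 0.3)
X_tgt = rng.normal(size=(20, 2));   y_tgt = np.sign(X_tgt[:, 0])
w = alpha_weighted_erm(X_src, y_src, X_tgt, y_tgt, alpha=0.7)
```

Setting alpha near 1 recovers target-only empirical risk minimization, while alpha near 0 trains almost entirely on the plentiful but shifted source data; the trade-off between the two is what the paper's bounds quantify.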
[1] M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, 1999.
[2] P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463–482, 2002.
[3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.
[4] S. Ben-David, J. Gehrke, and D. Kifer. Detecting change in data streams. In VLDB, 2004.
[5] S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In ICML, 2007.
[6] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.
[7] C. Chelba and A. Acero. Adaptation of maximum entropy capitalizer: Little data can help a lot. In EMNLP, 2004.
[8] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. In NIPS, 2007.
[9] W. Dai, Q. Yang, G. Xue, and Y. Yu. Boosting for transfer learning. In ICML, 2007.
[10] J. Huang, A. Smola, A. Gretton, K. Borgwardt, and B. Schoelkopf. Correcting sample selection bias by unlabeled data. In NIPS, 2007.
[11] J. Jiang and C. Zhai. Instance weighting for domain adaptation. In ACL, 2007.
[12] C. Leggetter and P. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9:171–185, 1995.
[13] X. Li and J. Bilmes. A Bayesian divergence prior for classification adaptation. In AISTATS, 2007.
[14] A. Martinez. Recognition of partially occluded and/or imprecisely localized faces using a probabilistic approach. In CVPR, 2000.
[15] D. McAllester. Simplified PAC-Bayesian margin bounds. In COLT, 2003.
[16] V. Vapnik. Statistical Learning Theory. John Wiley, New York, 1998.
[17] P. Wu and T. Dietterich. Improving SVM accuracy by training on auxiliary data sources. In ICML, 2004.