
NIPS 2012, Paper 142: Generalization Bounds for Domain Adaptation


Source: pdf

Authors: Chao Zhang, Lei Zhang, Jieping Ye

Abstract: In this paper, we provide a new framework for studying the generalization bounds of the learning process for domain adaptation. We consider two representative domain adaptation settings: domain adaptation with multiple sources, and domain adaptation combining source and target data. In particular, we use the integral probability metric to measure the difference between two domains. We then develop a specific Hoeffding-type deviation inequality and a symmetrization inequality for each setting to obtain the corresponding generalization bound based on the uniform entropy number. Using the resulting generalization bounds, we analyze the asymptotic convergence and the rate of convergence of the learning process for domain adaptation, and we discuss the factors that affect the asymptotic behavior of the learning process. Numerical experiments support our results.
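
For readers skimming only this summary, a minimal sketch of the two quantities the abstract leans on: the integral probability metric (in the sense of Zolotarev [23] and Müller [25]) and the classical Hoeffding inequality [21], of which the paper's deviation inequality is a domain-adaptation analogue. The symbols (F, P, Q, n, epsilon) are standard notation for illustration, not taken from the paper itself.

% A minimal, compilable sketch (assumed notation; not the paper's own statement).
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Integral probability metric between distributions P and Q, indexed by a
% class \mathcal{F} of real-valued functions (cf. [23], [25]):
\[
  D_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}}
  \bigl| \mathbb{E}_{P}[f] - \mathbb{E}_{Q}[f] \bigr|.
\]

% Classical Hoeffding inequality [21] for i.i.d. X_1, ..., X_n taking values
% in [a, b]; the paper develops deviation inequalities of this type for the
% domain adaptation setting:
\[
  \Pr\Bigl( \Bigl| \tfrac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X_1] \Bigr|
  \ge \varepsilon \Bigr)
  \;\le\; 2 \exp\Bigl( -\tfrac{2 n \varepsilon^{2}}{(b-a)^{2}} \Bigr).
\]

\end{document}

Different choices of the function class recover familiar distances: functions bounded by 1 give (twice) the total variation distance, and 1-Lipschitz functions give the Kantorovich (Wasserstein-1) metric [25].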


Reference text

[1] V.N. Vapnik (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks 10(5):988-999.

[2] O. Bousquet, S. Boucheron, and G. Lugosi (2004). Introduction to Statistical Learning Theory. In O. Bousquet et al. (eds.), Advanced Lectures on Machine Learning, 169-207.

[3] V.N. Vapnik (1998). Statistical Learning Theory. New York: John Wiley and Sons.

[4] A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM 36(4):929-965.

[5] A.W. van der Vaart and J.A. Wellner (2000). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.

[6] P.L. Bartlett, O. Bousquet, and S. Mendelson (2005). Local Rademacher Complexities. Annals of Statistics 33:1497-1537.

[7] Z. Hussain and J. Shawe-Taylor (2011). Improved Loss Bounds for Multiple Kernel Learning. Journal of Machine Learning Research - Proceedings Track 15:370-377.

[8] J. Jiang and C. Zhai (2007). Instance Weighting for Domain Adaptation in NLP. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), 264-271.

[9] J. Blitzer, M. Dredze, and F. Pereira (2007). Biographies, bollywood, boomboxes and blenders: Domain adaptation for sentiment classification. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), 440-447.

[10] S. Bickel, M. Brückner, and T. Scheffer (2007). Discriminative learning for differing training and test distributions. Proceedings of the 24th International Conference on Machine Learning (ICML), 81-88.

[11] P. Wu and T.G. Dietterich (2004). Improving SVM accuracy by training on auxiliary data sources. Proceedings of the Twenty-First International Conference on Machine Learning (ICML), 871-878.

[12] J. Blitzer, R. McDonald, and F. Pereira (2006). Domain adaptation with structural correspondence learning. Conference on Empirical Methods in Natural Language Processing (EMNLP), 120-128.

[13] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman (2010). A Theory of Learning from Different Domains. Machine Learning 79:151-175.

[14] K. Crammer, M. Kearns, and J. Wortman (2006). Learning from Multiple Sources. Advances in Neural Information Processing Systems (NIPS).

[15] K. Crammer, M. Kearns, and J. Wortman (2008). Learning from Multiple Sources. Journal of Machine Learning Research 9:1757-1774.

[16] Y. Mansour, M. Mohri, and A. Rostamizadeh (2008). Domain adaptation with multiple sources. Advances in Neural Information Processing Systems (NIPS), 1041-1048.

[17] Y. Mansour, M. Mohri, and A. Rostamizadeh (2009). Multiple Source Adaptation and the Rényi Divergence. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI).

[18] J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman (2007). Learning Bounds for Domain Adaptation. Advances in Neural Information Processing Systems (NIPS).

[19] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira (2006). Analysis of Representations for Domain Adaptation. Advances in Neural Information Processing Systems (NIPS), 137-144.

[20] Y. Mansour, M. Mohri, and A. Rostamizadeh (2009). Domain Adaptation: Learning Bounds and Algorithms. Conference on Learning Theory (COLT).

[21] W. Hoeffding (1963). Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 58(301):13-30.

[22] S. Mendelson (2003). A Few Notes on Statistical Learning Theory. Lecture Notes in Computer Science 2600:1-40.

[23] V.M. Zolotarev (1984). Probability Metrics. Theory of Probability and Its Applications 28(1):278-302.

[24] S.T. Rachev (1991). Probability Metrics and the Stability of Stochastic Models. John Wiley and Sons.

[25] A. Müller (1997). Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability 29(2):429-443.

[26] M.D. Reid and R.C. Williamson (2011). Information, Divergence and Risk for Binary Experiments. Journal of Machine Learning Research 12:731-817.

[27] B.K. Sriperumbudur, A. Gretton, K. Fukumizu, G.R.G. Lanckriet, and B. Schölkopf (2009). A Note on Integral Probability Metrics and φ-Divergences. CoRR abs/0901.2698.