nips2006-33-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira
Abstract: Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. In many situations, though, we have labeled training data for a source domain, and we wish to learn a classifier which performs well on a target domain with a different distribution. Under what conditions can we adapt a classifier trained on the source domain for use in the target domain? Intuitively, a good feature representation is a crucial factor in the success of domain adaptation. We formalize this intuition theoretically with a generalization bound for domain adaptation. Our theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justification for a recently proposed model. It also points toward a promising new model for domain adaptation: one which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.
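As a rough guide to the result summarized in the abstract, a bound of this kind has the following shape; the notation below is a sketch under assumed symbols, not quoted from the paper. For a hypothesis h in a class H, evaluated on source and target distributions as seen through a fixed feature representation:

    \epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; d_{\mathcal{H}}\!\left(\tilde{\mathcal{D}}_S, \tilde{\mathcal{D}}_T\right) \;+\; \lambda

Here \epsilon_S(h) and \epsilon_T(h) denote the source- and target-domain errors, d_{\mathcal{H}} is a classifier-induced divergence between the induced source and target distributions, and \lambda collects the error of the best single hypothesis on both domains. Read this way, the tradeoff mentioned in the abstract is between choosing a representation that keeps the source error small and one that keeps the induced divergence between domains small.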
[1] R. Arriaga and S. Vempala. An algorithmic theory of learning: Robust concepts and random projection. In FOCS, volume 40, 1999.
[2] T. Batu, L. Fortnow, R. Rubinfeld, W. Smith, and P. White. Testing that distributions are close. In FOCS, volume 41, pages 259–269, 2000.
[3] J. Baxter. Learning internal representations. In COLT ’95: Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 311–320, New York, NY, USA, 1995.
[4] S. Ben-David, N. Eiron, and P. Long. On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences, 66:496–514, 2003.
[5] S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. In COLT 2003: Proceedings of the Sixteenth Annual Conference on Computational Learning Theory, 2003.
[6] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In EMNLP, 2006.
[7] K. Crammer, M. Kearns, and J. Wortman. Learning from data of variable quality. In Neural Information Processing Systems (NIPS), Vancouver, Canada, 2005.
[8] W. Johnson and J. Lindenstrauss. Extension of Lipschitz mappings to Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
[9] D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Very Large Databases (VLDB), 2004.
[10] C. Manning. Foundations of Statistical Natural Language Processing. MIT Press, Boston, 1999.
[11] D. McClosky, E. Charniak, and M. Johnson. Reranking and self-training for parser adaptation. In ACL, 2006.
[12] M. Sugiyama and K. Mueller. Generalization error estimation under covariate shift. In Workshop on Information-Based Induction Sciences, 2005.
[13] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Advances in Neural Information Processing Systems, volume 17, 2005.
[14] V. Vapnik. Statistical Learning Theory. John Wiley, New York, 1998.
[15] T. Zhang. Solving large-scale linear prediction problems with stochastic gradient descent. In ICML, 2004.