
Distribution Matching for Transduction (NIPS 2009, paper 72)

Source: pdf

Authors: Novi Quadrianto, James Petterson, Alex J. Smola

Abstract: Many transductive inference algorithms assume that distributions over training and test estimates should be related, e.g. by providing a large margin of separation on both sets. We use this idea to design a transduction algorithm which can be used without modification for classification, regression, and structured estimation. At its heart we exploit the fact that for a good learner the distributions over the outputs on training and test sets should match. This is a classical two-sample problem which can be solved efficiently in its most general form by using distance measures in Hilbert space. It turns out that a number of existing heuristics can be viewed as special cases of our approach.
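The two-sample idea above can be illustrated with one such Hilbert-space distance, the maximum mean discrepancy (MMD) studied in [7]. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes scalar learner outputs, a Gaussian RBF kernel with a hand-picked bandwidth `sigma`, and the biased (V-statistic) estimator; `rbf_kernel` and `mmd2_biased` are hypothetical helper names. It only measures how far apart the output distributions on the training and test sets are; it does not fit a model.

```python
import math
from typing import Sequence


def rbf_kernel(a: float, b: float, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel on scalar outputs (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def mmd2_biased(train_out: Sequence[float], test_out: Sequence[float],
                sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two samples of outputs.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    i.e. the squared RKHS distance between the two empirical mean embeddings,
    so it is non-negative and zero when both samples are identical.
    """
    m, n = len(train_out), len(test_out)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    # Average kernel value within the training outputs.
    kxx = sum(rbf_kernel(a, b, sigma) for a in train_out for b in train_out) / (m * m)
    # Average kernel value within the test outputs.
    kyy = sum(rbf_kernel(a, b, sigma) for a in test_out for b in test_out) / (n * n)
    # Average cross-kernel value between training and test outputs.
    kxy = sum(rbf_kernel(a, b, sigma) for a in train_out for b in test_out) / (m * n)
    return kxx + kyy - 2.0 * kxy


if __name__ == "__main__":
    # Hypothetical real-valued classifier scores, for illustration only.
    train_scores = [-1.2, -0.8, 0.9, 1.1]
    test_matched = [-1.0, -0.9, 1.0, 1.2]  # same two modes as training
    test_shifted = [0.1, 0.2, 0.3, 0.2]    # collapsed near the boundary
    print(mmd2_biased(train_scores, test_matched))
    print(mmd2_biased(train_scores, test_shifted))
```

With these made-up scores, the test sample that keeps the two well-separated modes of the training outputs gives a much smaller value than the one concentrated near zero. In the spirit of the abstract, a transductive learner could add such a discrepancy as a penalty to its training objective; how the paper itself does this is not reproduced here.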

References

[1] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.

[2] T. Pham Dinh and L. Hoai An. A D.C. optimization algorithm for solving the trust-region subproblem. SIAM Journal on Optimization, 8(2):476–505, 1998.

[3] G. Druck, G.S. Mann, and A. McCallum. Learning from labeled features using generalized expectation criteria. In S.-H. Myaeng, D.W. Oard, F. Sebastiani, T.-S. Chua, and M.-K. Leong, editors, SIGIR, pages 595–602. ACM, 2008.

[4] A. Gammerman, V. Vovk, and V. Vapnik. Learning by transduction. In Proceedings of Uncertainty in AI, pages 148–155, Madison, Wisconsin, 1998.

[5] T. Gärtner, Q.V. Le, S. Burton, A. J. Smola, and S. V. N. Vishwanathan. Large-scale multiclass transduction. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 411–418, Cambridge, MA, 2006. MIT Press.

[6] J. Graça, K. Ganchev, and B. Taskar. Expectation maximization and posterior constraints. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, NIPS. MIT Press, 2007.

[7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample problem. Technical Report 157, MPI for Biological Cybernetics, 2008.

[8] T. Joachims. Transductive inference for text classification using support vector machines. In I. Bratko and S. Dzeroski, editors, Proc. Intl. Conf. Machine Learning, pages 200–209, San Francisco, 1999. Morgan Kaufmann Publishers.

[9] J. D. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic modeling for segmenting and labeling sequence data. In Proc. Intl. Conf. Machine Learning, volume 18, pages 282–289, San Francisco, CA, 2001. Morgan Kaufmann.

[10] Q.V. Le, A.J. Smola, T. Gärtner, and Y. Altun. Transductive Gaussian process regression with automatic model selection. In J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, editors, European Conference on Machine Learning, volume 4212 of LNAI, pages 306–317, 2006.

[11] A. McCallum and W. Li. Early results for named entity recognition with conditional random fields, feature induction and web enhanced lexicons. In CoNLL, 2003.

[12] Y. Nesterov and J.-P. Vial. Confidence level solutions for stochastic programming. Technical Report 2000/13, Université Catholique de Louvain, Center for Operations Research and Econometrics, 2000.

[13] E.F. Tjong Kim Sang and S. Buchholz. Introduction to the CoNLL-2000 shared task: Chunking. In Proc. Conf. Computational Natural Language Learning, pages 127–132, Lisbon, Portugal, 2000.

[14] V. Sindhwani and S.S. Keerthi. Large scale semi-supervised linear SVMs. In SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 477–484, New York, NY, USA, 2006. ACM Press.

[15] A. Zien, U. Brefeld, and T. Scheffer. Transductive support vector machines for structured variables. In ICML, pages 1183–1190, 2007.

[16] UCI repository, http://archive.ics.uci.edu/ml/

[17] LibSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/

[18] CRF++, http://chasen.org/~taku/software/CRF++

[19] Stochastic Gradient Descent code, http://leon.bottou.org/projects/sgd

[20] DMOZ ontology, http://www.dmoz.org