
130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization


Source: pdf

Author: Massih Amini, Nicolas Usunier, Cyril Goutte

Abstract: We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds, for example, to learning text classifiers from multilingual collections where documents are not available in all languages. In that case, Machine Translation (MT) systems may be used to translate each document into the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views. Our result uncovers a trade-off between the size of the training set, the number of views, and the quality of the view generating functions. As a consequence, we identify situations where it is more advantageous to use multiple views for learning instead of classical single-view learning. An extension of this framework is a natural way to leverage unlabeled multi-view data in semi-supervised learning. Experimental results on a subset of the Reuters RCV1/RCV2 collections support our findings by showing that additional views obtained from MT may significantly improve the classification performance in the cases identified by our trade-off.
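The setup the abstract describes can be sketched in a few lines: complete each missing view with a view-generating function, train one classifier per view, and combine the per-view decisions. The sketch below is illustrative only, not the paper's method: `generate_view` is a hypothetical noisy stand-in for an MT system, and a nearest-centroid classifier stands in for the SVMs used in the paper's experiments.

```python
import random

def generate_view(observed_view):
    # Hypothetical view-generating function: perturbs the observed view,
    # mimicking an imperfect machine translation of a document.
    return [x + random.gauss(0.0, 0.3) for x in observed_view]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid_fit(X, y):
    # Per-view classifier: nearest class centroid (stand-in for an SVM).
    cents = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

random.seed(0)
# Toy data: two classes, two views (think English / French bag-of-words);
# view 0 is always observed, view 1 is missing for roughly half the examples.
data = []
for label, mean in [(0, -1.0), (1, 1.0)]:
    for _ in range(20):
        v0 = [random.gauss(mean, 0.5), random.gauss(mean, 0.5)]
        v1 = ([random.gauss(mean, 0.5), random.gauss(mean, 0.5)]
              if random.random() < 0.5 else None)
        data.append((v0, v1, label))

# Complete the missing views, then train one classifier per view.
views0 = [v0 for v0, _, _ in data]
views1 = [v1 if v1 is not None else generate_view(v0) for v0, v1, _ in data]
labels = [y for _, _, y in data]
clf0 = centroid_fit(views0, labels)
clf1 = centroid_fit(views1, labels)

def predict(v0, v1):
    # Combine views by summing the per-view distances to each class centroid.
    if v1 is None:
        v1 = generate_view(v0)
    scores = {lab: sq_dist(clf0[lab], v0) + sq_dist(clf1[lab], v1)
              for lab in clf0}
    return min(scores, key=scores.get)

acc = sum(predict(v0, v1) == y for v0, v1, y in data) / len(data)
print(round(acc, 2))
```

On this well-separated toy data the combined classifier fits the training set almost perfectly; the paper's trade-off concerns the harder regime where the generated views are noisy enough that adding them can hurt.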


reference text

[1] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2003.

[2] J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. Learning bounds for domain adaptation. In NIPS, 2007.

[3] A. Blum and T. M. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, pages 92–100, 1998.

[4] K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. Journal of Machine Learning Research, 9:1757–1774, 2008.

[5] J. D. R. Farquhar, D. Hardoon, H. Meng, J. Shawe-Taylor, and S. Szedmak. Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems 18, pages 355–362, 2006.

[6] D. R. Hardoon, G. Leen, S. Kaski, and J. Shawe-Taylor, editors. NIPS workshop on learning from multiple sources, 2008.

[7] T. Joachims. Transductive inference for text classification using support vector machines. In ICML, pages 200–209, 1999.

[8] T. Joachims. Training linear SVMs in linear time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), pages 217–226, 2006.

[9] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In NIPS 15, pages 439–446, 2002.

[10] E. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. McGraw-Hill, New York, 1975.

[11] B. Leskes. The value of agreement, a new boosting algorithm. In COLT, pages 95–110, 2005.

[12] D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

[13] I. Muslea. Active learning with multiple views. PhD thesis, USC, 2002.

[14] Reuters. Corpus, volume 2, multilingual corpus, 1996-08-20 to 1997-08-19, 2005.

[15] N. Ueffing, M. Simard, S. Larkin, and J. H. Johnson. NRC’s PORTAGE system for WMT. In ACL-2007 Second Workshop on SMT, pages 185–188, 2007.

[16] X. Zhu. Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison, 2007.