nips nips2004 nips2004-136 nips2004-136-reference knowledge-graph by maker-knowledge-mining

136 nips-2004-On Semi-Supervised Classification


Source: pdf

Author: Balaji Krishnapuram, David Williams, Ya Xue, Lawrence Carin, Mário Figueiredo, Alexander J. Hartemink

Abstract: A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the co-training information. Active label query selection is performed using a mutual-information-based criterion that explicitly uses the unlabelled data and the co-training information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors.
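
The full model is developed in the paper itself; purely as a rough sketch of the kind of graph-based prior the abstract refers to, the Python fragment below places a Gaussian penalty f'Lf on the classifier outputs over the labelled and unlabelled points, where L is a k-nearest-neighbour graph Laplacian. The function names, the fixed weight `lam`, and the single-view setup are illustrative assumptions, not the authors' implementation: the actual method adjusts the corresponding tradeoffs with EM and additionally incorporates a co-training term across multiple feature views.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def graph_laplacian(X, sigma=1.0, k=10):
    """RBF affinity restricted to the k strongest edges per node,
    symmetrized, returned as an unnormalized Laplacian L = D - W."""
    D2 = cdist(X, X, "sqeuclidean")
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # zero out all but the k largest affinities in each row, then symmetrize
    weakest = np.argsort(W, axis=1)[:, :-k]
    for i, cols in enumerate(weakest):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W

def fit_semisupervised_logreg(X_lab, y_lab, X_unl, lam=1.0):
    """MAP logistic regression (labels in {-1, +1}) with a graph-based
    Gaussian prior on the outputs over labelled + unlabelled points.
    Hypothetical sketch: 'lam' is a fixed hyperparameter here, whereas the
    paper learns the analogous tradeoff automatically via EM."""
    X_all = np.vstack([X_lab, X_unl])
    L = graph_laplacian(X_all)
    n_lab = X_lab.shape[0]

    def neg_log_posterior(w):
        f_all = X_all @ w                 # outputs on all points
        f_lab = f_all[:n_lab]
        # logistic log-likelihood on the labelled subset
        ll = -np.sum(np.logaddexp(0.0, -y_lab * f_lab))
        # graph prior: penalize output differences across strong edges
        log_prior = -0.5 * lam * f_all @ L @ f_all
        return -(ll + log_prior)

    w0 = np.zeros(X_all.shape[1])
    return minimize(neg_log_posterior, w0, method="L-BFGS-B").x
```

With `lam` held fixed this reduces to graph-Laplacian-regularized logistic regression; the paper's contribution lies in balancing the labelled, unlabelled, and co-training terms automatically within an EM loop and in using that machinery for mutual-information-based active label queries.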


reference text

[1] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and regression on large graphs. In Proc. Computational Learning Theory – COLT’04, Banff, Canada, 2004.

[2] M. Belkin and P. Niyogi. Using manifold structure for partially labelled classification. In NIPS 15, MIT Press, Cambridge, MA, 2003.

[3] J. Bernardo and A. Smith. Bayesian Theory. J. Wiley & Sons, Chichester, UK, 1994.

[4] A. Blum and T. Mitchell. Combining labelled and unlabelled data with co-training. In Proc. Computational Learning Theory – COLT’98, Madison, WI, 1998.

[5] D. Böhning. Multinomial logistic regression algorithm. Annals Inst. Stat. Math., vol. 44, pp. 197–200, 1992.

[6] O. Chapelle, J. Weston, and B. Schölkopf. Cluster kernels for semi-supervised learning. In NIPS 15, MIT Press, Cambridge, MA, 2003.

[7] A. Corduneanu and T. Jaakkola. On information regularization. In Proc. Uncertainty in Artificial Intelligence – UAI’03, Acapulco, Mexico, 2003.

[8] M. Figueiredo. Adaptive sparseness using Jeffreys’ prior. In NIPS 14, MIT Press, 2002.

[9] T. Joachims. Transductive inference for text classification using support vector machines. In Int. Conf. Machine Learning – ICML’99, 1999.

[10] T. Joachims. Transductive learning via spectral graph partitioning. In ICML’03, 2003.

[11] K. Lange, D. Hunter, and I. Yang. Optimization transfer using surrogate objective functions. J. Computational and Graphical Statistics, vol. 9, pp. 1–59, 2000.

[12] G. Schohn and D. Cohn. Less is more: Active learning with support vector machines. In Int. Conf. Machine Learning – ICML’00, 2000.

[13] M. Seeger. Learning with labelled and unlabelled data. Tech. Rep., Institute for Adaptive and Neural Computation, University of Edinburgh, UK, 2001.

[14] M. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Research, vol. 1, pp. 211–244, 2001.

[15] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. J. Mach. Learn. Research, vol. 2, pp. 45–66, 2001.

[16] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Semi-supervised learning by maximizing smoothness. J. Mach. Learn. Research, 2004 (submitted).

[17] X. Zhu, J. Lafferty, and Z. Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML’03 Workshop on The Continuum from Labelled to Unlabelled Data in Mach. Learning, 2003.

[18] X. Zhu, J. Lafferty, and Z. Ghahramani. Semi-supervised learning: From Gaussian fields to Gaussian processes. Tech. Rep. CMU-CS-03-175, School of CS, CMU, 2003.