
164 nips-2004-Semi-supervised Learning by Entropy Minimization

Source: pdf

Authors: Yves Grandvalet, Yoshua Bengio

Abstract: We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which makes it possible to incorporate unlabeled data into standard supervised learning. Our approach includes other approaches to the semi-supervised problem as particular or limiting cases. A series of experiments illustrates that the proposed solution benefits from unlabeled data. The method rivals mixture models when the data are sampled from the distribution class spanned by the generative model. The performance is clearly in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to violations of the "cluster assumption". Finally, we illustrate that the method can be far superior to manifold learning in high-dimensional spaces.
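To make the idea of minimum entropy regularization concrete, here is a minimal sketch, assuming a binary logistic-regression model trained by plain full-batch gradient descent. It is not the authors' implementation: the function names (`fit`, `objective`, `predict`) and hyperparameters (`lam`, `lr`, `steps`) are hypothetical, and normalizing each term by its sample size is an additional assumption. The criterion minimized is the mean negative log-likelihood on labeled points plus `lam` times the mean entropy of the predicted posteriors on unlabeled points, so `lam` plays the role of the weighting of unlabeled data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    """Linear score z = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Estimated posterior P(y = 1 | x)."""
    return sigmoid(logit(w, b, x))


def _clip(p, eps=1e-12):
    # Keep probabilities away from 0 and 1 so the logs stay finite.
    return min(max(p, eps), 1.0 - eps)


def objective(w, b, labeled, unlabeled, lam):
    """Mean labeled negative log-likelihood + lam * mean unlabeled entropy."""
    nll = 0.0
    for x, y in labeled:
        p = _clip(predict(w, b, x))
        nll -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    ent = 0.0
    for x in unlabeled:
        p = _clip(predict(w, b, x))
        ent -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return nll / len(labeled) + lam * ent / max(len(unlabeled), 1)


def fit(labeled, unlabeled, lam=0.5, lr=0.1, steps=500):
    """Minimize objective() by gradient descent starting from w = 0, b = 0."""
    d = len(labeled[0][0])
    w, b = [0.0] * d, 0.0
    n_u = max(len(unlabeled), 1)
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        # Labeled term: d(NLL)/dz = p - y for each point.
        for x, y in labeled:
            g = (predict(w, b, x) - y) / len(labeled)
            gw = [gi + g * xi for gi, xi in zip(gw, x)]
            gb += g
        # Unlabeled term: binary entropy H(p) has dH/dz = -z * p * (1 - p).
        for x in unlabeled:
            z = logit(w, b, x)
            p = sigmoid(z)
            g = lam * (-z * p * (1.0 - p)) / n_u
            gw = [gi + g * xi for gi, xi in zip(gw, x)]
            gb += g
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b
```

With `lam=0` the unlabeled points are ignored and the sketch reduces to ordinary logistic regression on the labeled data; a larger `lam` tends to push the decision boundary away from regions where unlabeled points are dense, which is where the cluster assumption enters. Nothing in this sketch bounds the weights, and the entropy term rewards confident predictions, so a large `lam` or many steps can let the weights grow without limit.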

reference text

[1] M. R. Amini and P. Gallinari. Semi-supervised logistic regression. In 15th European Conference on Artificial Intelligence, pages 390–394. IOS Press, 2002.

[2] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 2nd edition, 1985.

[3] M. Brand. Structure learning in conditional probability models via an entropic prior and parameter extinction. Neural Computation, 11(5):1155–1182, 1999.

[4] V. Castelli and T. M. Cover. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans. on Information Theory, 42(6):2102–2117, 1996.

[5] Y. Grandvalet. Logistic regression for partial labels. In 9th Information Processing and Management of Uncertainty in Knowledge-based Systems – IPMU’02, pages 1935–1941, 2002.

[6] G. J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. Wiley, 1992.

[7] K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Ninth International Conference on Information and Knowledge Management, pages 86–93, 2000.

[8] K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2/3):135–167, 2000.

[9] T. J. O’Neill. Normal discrimination with unclassified observations. Journal of the American Statistical Association, 73(364):821–826, 1978.

[10] M. Seeger. Learning with labeled and unlabeled data. Technical report, Institute for Adaptive and Neural Computation, University of Edinburgh, 2002.

[11] M. Szummer and T. S. Jaakkola. Information regularization with partially labeled data. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.

[12] D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, 2004.

[13] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In 20th Int. Conf. on Machine Learning, pages 912–919, 2003.