nips2002-181-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Max Welling, Richard S. Zemel, Geoffrey E. Hinton
Abstract: Boosting algorithms and successful applications thereof abound for classification and regression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random field model by training them to improve classification performance between the data and an equal-sized sample of “negative examples” generated from the model’s current estimate of the data density. Training in each boosting round proceeds in three stages: First, we sample negative examples from the model’s current Boltzmann distribution. Next, a feature is trained to improve classification performance between data and negative examples. Finally, a coefficient is learned that determines the importance of this feature relative to ones already in the pool. Negative examples only need to be generated once to learn each new feature. The validity of the approach is demonstrated on binary digits and continuous synthetic data.
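The three-stage boosting round described above can be sketched in code. The following NumPy sketch is illustrative only, not the authors' implementation: the additive random-field form, the logistic-unit feature type, the single-flip Metropolis sampler (standing in for sampling from the model's Boltzmann distribution), and the helpers sample_negatives, train_feature, and fit_coefficient are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)


def feature_response(x, w):
    # Hypothetical feature type: a logistic unit over binary input vectors.
    return 1.0 / (1.0 + np.exp(-(x @ w)))


def unnormalized_log_p(x, features):
    # Assumed additive random-field model: weighted sum of feature responses.
    return sum(alpha * feature_response(x, w) for alpha, w in features)


def sample_negatives(features, n, dim, n_steps=50):
    # Stage 1: draw "negative examples" from the current model.
    # Crude single-flip Metropolis sampler used only as a stand-in.
    x = rng.integers(0, 2, size=(n, dim)).astype(float)
    for _ in range(n_steps):
        flip = rng.integers(0, dim, size=n)
        x_new = x.copy()
        x_new[np.arange(n), flip] = 1.0 - x_new[np.arange(n), flip]
        log_a = unnormalized_log_p(x_new, features) - unnormalized_log_p(x, features)
        accept = np.log(rng.random(n)) < log_a
        x[accept] = x_new[accept]
    return x


def train_feature(pos, neg, n_epochs=200, lr=0.5):
    # Stage 2: train one new feature to separate data from negatives
    # (plain logistic regression fit by gradient ascent).
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w


def fit_coefficient(pos, neg, w, grid=np.linspace(0.0, 5.0, 101)):
    # Stage 3: choose the new feature's coefficient by grid search on an
    # estimate of the log-likelihood gain: multiplying the model by
    # exp(alpha * f) changes the average data log-likelihood by
    # alpha * E_data[f] - log E_model[exp(alpha * f)], with the model
    # expectation estimated from the negative samples.
    f_pos, f_neg = feature_response(pos, w), feature_response(neg, w)
    gains = [a * f_pos.mean() - np.log(np.exp(a * f_neg).mean()) for a in grid]
    return grid[int(np.argmax(gains))]


# A few boosting rounds on toy binary data (random bits stand in for digits).
data = rng.integers(0, 2, size=(200, 8)).astype(float)
features = []                                  # pool of (coefficient, weights)
for _ in range(3):
    neg = sample_negatives(features, len(data), data.shape[1])
    w = train_feature(data, neg)
    alpha = fit_coefficient(data, neg, w)
    features.append((alpha, w))

Note that negatives are sampled once per round and reused for both the feature and its coefficient, matching the abstract's remark that negative examples only need to be generated once per new feature; the specific objectives used here are placeholders rather than the paper's.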
[1] Y. Freund and D. Haussler. Unsupervised learning of distributions of binary vectors using 2-layer networks. In Advances in Neural Information Processing Systems, volume 4, pages 912–919, 1992.
[2] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Technical report, Dept. of Statistics, Stanford University, 1998.
[3] J.H. Friedman. Greedy function approximation: A gradient boosting machine. Technical report, Dept. of Statistics, Stanford University, 1999.
[4] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[5] G.E. Hinton and A. Brown. Spiking Boltzmann machines. In Advances in Neural Information Processing Systems, volume 12, 2000.
[6] G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural Information Processing Systems, volume 14, 2002.
[7] L. Mason, J. Baxter, P. Bartlett, and M. Frean. Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems, volume 12, 2000.
[8] S. Della Pietra, V.J. Della Pietra, and J.D. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, 1997.
[9] S. Rosset and E. Segal. Boosting density estimation. In Advances in Neural Information Processing Systems, volume 15 (this volume), 2002.
[10] R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Computational Learning Theory, pages 80–91, 1998.
[11] S.C. Zhu, Z.N. Wu, and D. Mumford. Minimax entropy principle and its application to texture modeling. Neural Computation, 9(8):1627–1660, 1997.