nips2002-181-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Max Welling, Richard S. Zemel, Geoffrey E. Hinton
Abstract: Boosting algorithms and successful applications thereof abound for classification and regression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random field model by training them to improve classification performance between the data and an equal-sized sample of “negative examples” generated from the model’s current estimate of the data density. Training in each boosting round proceeds in three stages: First, we sample negative examples from the model’s current Boltzmann distribution. Next, a feature is trained to improve classification performance between data and negative examples. Finally, a coefficient is learned that determines the importance of this feature relative to ones already in the pool. Negative examples only need to be generated once to learn each new feature. The validity of the approach is demonstrated on binary digits and continuous synthetic data.
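The three-stage boosting round described above can be sketched in code. The following NumPy sketch is illustrative only, not the authors' implementation: the additive random-field form, the logistic-unit feature type, the single-flip Metropolis sampler (standing in for sampling from the model's Boltzmann distribution), and the helpers sample_negatives, train_feature, and fit_coefficient are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)


def feature_response(x, w):
    # Hypothetical feature type: a logistic unit over binary input vectors.
    return 1.0 / (1.0 + np.exp(-(x @ w)))


def unnormalized_log_p(x, features):
    # Assumed additive random-field model: weighted sum of feature responses.
    return sum(alpha * feature_response(x, w) for alpha, w in features)


def sample_negatives(features, n, dim, n_steps=50):
    # Stage 1: draw "negative examples" from the current model.
    # Crude single-flip Metropolis sampler used only as a stand-in.
    x = rng.integers(0, 2, size=(n, dim)).astype(float)
    for _ in range(n_steps):
        flip = rng.integers(0, dim, size=n)
        x_new = x.copy()
        x_new[np.arange(n), flip] = 1.0 - x_new[np.arange(n), flip]
        log_a = unnormalized_log_p(x_new, features) - unnormalized_log_p(x, features)
        accept = np.log(rng.random(n)) < log_a
        x[accept] = x_new[accept]
    return x


def train_feature(pos, neg, n_epochs=200, lr=0.5):
    # Stage 2: train one new feature to separate data from negatives
    # (plain logistic regression fit by gradient ascent).
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w


def fit_coefficient(pos, neg, w, grid=np.linspace(0.0, 5.0, 101)):
    # Stage 3: choose the new feature's coefficient by grid search on an
    # estimate of the log-likelihood gain: multiplying the model by
    # exp(alpha * f) changes the average data log-likelihood by
    # alpha * E_data[f] - log E_model[exp(alpha * f)], with the model
    # expectation estimated from the negative samples.
    f_pos, f_neg = feature_response(pos, w), feature_response(neg, w)
    gains = [a * f_pos.mean() - np.log(np.exp(a * f_neg).mean()) for a in grid]
    return grid[int(np.argmax(gains))]


# A few boosting rounds on toy binary data (random bits stand in for digits).
data = rng.integers(0, 2, size=(200, 8)).astype(float)
features = []                                  # pool of (coefficient, weights)
for _ in range(3):
    neg = sample_negatives(features, len(data), data.shape[1])
    w = train_feature(data, neg)
    alpha = fit_coefficient(data, neg, w)
    features.append((alpha, w))

Note that negatives are sampled once per round and reused for both the feature and its coefficient, matching the abstract's remark that negative examples only need to be generated once per new feature; the specific objectives used here are placeholders rather than the paper's.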
[1] Y. Freund and D. Haussler. Unsupervised learning of distributions of binary vectors using 2-layer networks. In Advances in Neural Information Processing Systems, volume 4, pages 912–919, 1992.
[2] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Technical report, Dept. of Statistics, Stanford University, 1998.
[3] J.H. Friedman. Greedy function approximation: A gradient boosting machine. Technical report, Dept. of Statistics, Stanford University, 1999.
[4] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[5] G.E. Hinton and A. Brown. Spiking Boltzmann machines. In Advances in Neural Information Processing Systems, volume 12, 2000.
[6] G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In Advances in Neural Information Processing Systems, volume 14, 2002.
[7] L. Mason, J. Baxter, P. Bartlett, and M. Frean. Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems, volume 12, 2000.
[8] S. Della Pietra, V.J. Della Pietra, and J.D. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, 1997.
[9] S. Rosset and E. Segal. Boosting density estimation. In Advances in Neural Information Processing Systems, volume 15 (this volume), 2002.
[10] R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Computational Learning Theory, pages 80–91, 1998.
[11] S.C. Zhu, Z.N. Wu, and D. Mumford. Minimax entropy principle and its application to texture modeling. Neural Computation, 9(8):1627–1660, 1997.