nips nips2001 nips2001-75 nips2001-75-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Brendan J. Frey, Nebojsa Jojic
Abstract: In previous work on “transformed mixtures of Gaussians” and “transformed hidden Markov models”, we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only inputs to the algorithm are the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speed-up is achieved through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For N × N images, learning C clusters under N rotations, N scales, N x-translations and N y-translations takes only (C + 2 log N)N² scalar operations per iteration. In contrast, the original algorithm takes CN⁶ operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320 × 240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm.
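The translation part of this speed-up can be illustrated concretely. Below is a minimal sketch (in Python/NumPy, not the authors' MATLAB code) of the FFT trick for evaluating the posterior over all circular translations of a cluster mean at once, assuming an isotropic Gaussian observation model; the function name translation_posterior and the variance parameter psi are illustrative assumptions, not names from the paper.

import numpy as np

def translation_posterior(z, mu, psi):
    # Posterior over all circular translations (dy, dx) of the cluster mean `mu`,
    # given the observed image `z`, under an isotropic Gaussian with variance `psi`.
    # corr[dy, dx] = sum_{y,x} z[y, x] * mu[(y - dy) % N1, (x - dx) % N2],
    # i.e. the cross-correlation of z with mu for every shift, computed with FFTs.
    corr = np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(np.fft.fft2(mu))))
    # Squared distance between z and every shifted mu; only the correlation
    # term depends on the shift.
    sq_dist = np.sum(z ** 2) + np.sum(mu ** 2) - 2.0 * corr
    log_post = -0.5 * sq_dist / psi      # unnormalized log posterior over shifts
    log_post -= log_post.max()           # stabilize the exponentiation
    post = np.exp(log_post)
    return post / post.sum()             # normalize over all shifts

# Example: an image that is a circular shift of the mean (plus noise) should
# put almost all posterior mass on that shift.
rng = np.random.default_rng(0)
mu = rng.normal(size=(64, 64))
z = np.roll(mu, shift=(5, 9), axis=(0, 1)) + 0.1 * rng.normal(size=(64, 64))
post = translation_posterior(z, mu, psi=0.01)
print(np.unravel_index(np.argmax(post), post.shape))   # expected: (5, 9)

Because the only shift-dependent term in the squared distance is a cross-correlation, all N² translations cost a few N × N FFTs, i.e. on the order of N² log N operations rather than the N⁴ of shifting the mean explicitly, which is where the log N factor in the per-iteration cost quoted above comes from.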
Figure 2: Learning a rotation, scale and translation invariant model on 320x240 video.

Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Frey, B. J. and Jojic, N. 2001. Transformation invariant clustering and dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence. To appear. Available at http://www.cs.utoronto.ca/~frey.
Hinton, G. E., Dayan, P., and Revow, M. 1997. Modeling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks, 8(1):65–74.
Jojic, N., Petrovic, N., Frey, B. J., and Huang, T. S. 2000. Transformed hidden Markov models: Estimating mixture models of images and inferring spatial transformations in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. 1998. An introduction to variational methods for graphical models. In Jordan, M. I., editor, Learning in Graphical Models. Kluwer Academic Publishers, Norwell, MA.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
Neal, R. M. and Hinton, G. E. 1998. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M. I., editor, Learning in Graphical Models, pages 355–368. Kluwer Academic Publishers, Norwell, MA.
Simard, P. Y., LeCun, Y., and Denker, J. 1993. Efficient pattern recognition using a new transformation distance. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann, San Mateo, CA.
Vasconcelos, N. and Lippman, A. 1998. Multiresolution tangent distance for affine-invariant classification. In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA.
Wolberg, G. and Zokai, S. 2000. Robust image registration using log-polar transform. In Proceedings of the IEEE International Conference on Image Processing, Vancouver, Canada.