nips2011-156-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antonio Torralba, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
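Read as a generative model, the construction described in the abstract can be summarized in a few lines. The sketch below is an illustrative reading of the compound HDP-DBM, not the paper's exact formulation: the base measure H, concentration parameters gamma, alpha_0, alpha_1, the super-category index s, basic-category index c, per-example mixture theta_n, and top-level activations h_n are assumed notation, with the nested-DP structure following the hierarchical Dirichlet process of Teh et al. [21].

% A minimal sketch of the compound HDP-DBM generative process (assumed notation).
% An HDP ties basic-level categories to super-categories and to a shared global
% measure, giving the prior over top-level DBM feature activity.
\begin{align*}
G^{(0)} &\sim \mathrm{DP}(\gamma, H) && \text{global measure over feature ``topics''} \\
G^{(s)} \mid G^{(0)} &\sim \mathrm{DP}(\alpha_0, G^{(0)}) && \text{one per super-category } s \\
G^{(c)} \mid G^{(s)} &\sim \mathrm{DP}(\alpha_1, G^{(s)}) && \text{one per basic category } c \text{ under } s \\
\theta_n \mid G^{(c)} &\sim G^{(c)} && \text{per-example mixture over shared topics} \\
h_n^{\mathrm{top}} &\sim p(\,\cdot \mid \theta_n) && \text{top-level DBM feature activities} \\
v_n \mid h_n^{\mathrm{top}} &\sim \mathrm{DBM}(v \mid h_n^{\mathrm{top}}) && \text{lower DBM layers generate the input}
\end{align*}

Under this sketch, because each G^{(c)} is centered on its super-category's measure, a new basic-level category inherits a well-formed prior over theta_n from the categories it is grouped with; this sharing across the hierarchy is what supports learning a novel concept from very few examples, as the abstract claims.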
[1] E. Bart, I. Porteous, P. Perona, and M. Welling. Unsupervised learning of visual taxonomies. In CVPR, pages 1–8, 2008.
[2] D. M. Blei, T. L. Griffiths, and M. I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2), 2010.
[3] K. R. Canini and T. L. Griffiths. Modeling human transfer learning with the hierarchical Dirichlet process. In NIPS 2009 workshop: Nonparametric Bayes, 2009.
[4] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, April 2006.
[5] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[6] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[7] C. Kemp, A. Perfors, and J. Tenenbaum. Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3):307–321, 2007.
[8] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Dept. of Computer Science, University of Toronto, 2009.
[9] B. Lake, R. Salakhutdinov, J. Gross, and J. Tenenbaum. One-shot learning of simple visual concepts. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.
[10] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10:1–40, 2009.
[11] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning, pages 609–616, 2009.
[12] M. A. Ranzato, Y. Boureau, and Y. LeCun. Sparse feature learning for deep belief networks. In Advances in Neural Information Processing Systems, 2008.
[13] A. Rodriguez, D. Dunson, and A. Gelfand. The nested Dirichlet process. Journal of the American Statistical Association, 103:1131–1144, 2008.
[14] R. R. Salakhutdinov and G. E. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 12, 2009.
[15] R. R. Salakhutdinov and G. E. Hinton. Replicated softmax: an undirected topic model. In Advances in Neural Information Processing Systems, volume 22, 2009.
[16] L. B. Smith, S. S. Jones, B. Landau, L. Gershkoff-Stowe, and L. Samuelson. Object name learning provides on-the-job training for attention. Psychological Science, 13(1):13–19, 2002.
[17] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Describing visual scenes using transformed objects and parts. International Journal of Computer Vision, 77(1–3):291–330, 2008.
[18] G. Taylor, G. E. Hinton, and S. T. Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems. MIT Press, 2006.
[19] Y. W. Teh and G. E. Hinton. Rate-coded restricted Boltzmann machines for face recognition. In Advances in Neural Information Processing Systems, volume 13, 2001.
[20] Y. W. Teh and M. I. Jordan. Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics: Principles and Practice. Cambridge University Press, 2010.
[21] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[22] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML. ACM, 2008.
[23] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large dataset for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.
[24] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[25] F. Xu and J. B. Tenenbaum. Word learning as Bayesian inference. Psychological Review, 114(2), 2007.
[26] L. Younes. On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates, March 17, 2000.