Gated Softmax Classification (NIPS 2010, paper 99)

Source: pdf

Author: Roland Memisevic, Christopher Zach, Marc Pollefeys, Geoffrey E. Hinton

Abstract: We describe a "log-bilinear" model that computes class probabilities by combining an input vector multiplicatively with a vector of binary latent variables. Even though the latent variables can take on exponentially many possible combinations of values, we can efficiently compute the exact probability of each class by marginalizing over the latent variables. This makes it possible to get the exact gradient of the log likelihood. The bilinear score functions are defined using a three-dimensional weight tensor, and we show that factorizing this tensor allows the model to encode invariances inherent in a task by learning a dictionary of invariant basis functions. Experiments on a set of benchmark problems show that this fully probabilistic model can achieve classification performance that is competitive with (kernel) SVMs, backpropagation, and deep belief nets.
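The exact marginalization mentioned in the abstract can be illustrated with a small sketch. It assumes the score for class y and latent vector h is the bilinear form sum over i, k of W[y][i][k] * x[i] * h[k], with no bias terms; under that assumption the sum over all binary h factorizes into a product of (1 + exp(a_yk)) terms, one per latent variable. The function names and the nested-list layout of the weight tensor are hypothetical choices for illustration, not taken from the paper.

```python
import itertools
import math


def log_class_scores(W, x):
    """Log of the unnormalized class scores with the binary latents summed out.

    W[y][i][k] is an assumed 3-way weight tensor (class y, input i, latent k).
    Summing exp(sum_ik W[y][i][k] * x[i] * h[k]) over all h in {0,1}^K equals
    prod_k (1 + exp(a_yk)), where a_yk = sum_i W[y][i][k] * x[i].
    """
    scores = []
    for W_y in W:
        n_latent = len(W_y[0])
        total = 0.0
        for k in range(n_latent):
            a = sum(W_y[i][k] * x[i] for i in range(len(x)))
            # Numerically stable softplus: log(1 + exp(a)).
            total += max(a, 0.0) + math.log1p(math.exp(-abs(a)))
        scores.append(total)
    return scores


def class_probabilities(W, x):
    """Softmax over the marginalized log scores; cost is linear in K."""
    s = log_class_scores(W, x)
    m = max(s)  # subtract the max before exponentiating for stability
    exps = [math.exp(v - m) for v in s]
    z = sum(exps)
    return [e / z for e in exps]


def brute_force_probabilities(W, x):
    """Reference check: enumerate all 2^K latent configurations explicitly."""
    unnorm = []
    for W_y in W:
        n_latent = len(W_y[0])
        total = 0.0
        for h in itertools.product([0, 1], repeat=n_latent):
            score = sum(W_y[i][k] * x[i] * h[k]
                        for i in range(len(x)) for k in range(n_latent))
            total += math.exp(score)
        unnorm.append(total)
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy example: 2 classes, 2 input dimensions, 3 latent variables.
W = [
    [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7]],
    [[-0.4, 0.8, 1.1], [0.0, -0.6, 0.9]],
]
x = [1.0, -2.0]
probs = class_probabilities(W, x)
```

The closed-form version touches each latent variable once per class, whereas the brute-force reference grows as 2^K; the two should agree up to floating-point error, which is what makes the exact log-likelihood gradient tractable in this kind of model.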

References

[1] Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Supervised dictionary learning. In Advances in Neural Information Processing Systems 21. 2009.

[2] Vinod Nair and Geoffrey Hinton. 3D object recognition with deep belief nets. In Advances in Neural Information Processing Systems 22. 2009.

[3] Roland Memisevic and Geoffrey Hinton. Learning to represent spatial transformations with factored higher-order Boltzmann machines. Neural Computation, 22(6):1473–92, 2010.

[4] Christopher Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[5] Adam Berger, Vincent Della Pietra, and Stephen Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.

[6] Geoffrey Hinton. To recognize shapes, first learn to generate images. Technical report, Toronto, 2006.

[7] Hugo Larochelle and Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. In ICML ’08: Proceedings of the 25th international conference on Machine learning, New York, NY, USA, 2008. ACM.

[8] Geoffrey Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.

[9] Roland Memisevic and Geoffrey Hinton. Unsupervised learning of image transformations. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007.

[10] Vinod Nair and Geoffrey Hinton. Implicit mixtures of restricted Boltzmann machines. In Advances in Neural Information Processing Systems 21. 2009.

[11] Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In ICML ’07: Proceedings of the 24th international conference on Machine learning, New York, NY, USA, 2007. ACM.

[12] Yoshua Bengio and Yann LeCun. Scaling learning algorithms towards AI. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors, Large-Scale Kernel Machines. MIT Press, 2007.

[13] Youngmin Cho and Lawrence Saul. Kernel methods for deep learning. In Advances in Neural Information Processing Systems 22. 2009.

[14] Jason Weston, Frédéric Ratle, and Ronan Collobert. Deep learning via semi-supervised embedding. In ICML ’08: Proceedings of the 25th international conference on Machine learning, New York, NY, USA, 2008. ACM.

[15] Bruno Olshausen, Charles Cadieu, Jack Culpepper, and David Warland. Bilinear models of natural images. In SPIE Proceedings: Human Vision Electronic Imaging XII, San Jose, 2007.

[16] Rajesh Rao and Dana Ballard. Efficient encoding of natural time varying images produces oriented spacetime receptive fields. Technical report, Rochester, NY, USA, 1997.

[17] Rajesh Rao and Daniel Ruderman. Learning Lie groups for invariant visual perception. In Advances in Neural Information Processing Systems 11. MIT Press, 1999.

[18] David Grimes and Rajesh Rao. Bilinear sparse coding for invariant vision. Neural Computation, 17(1):47–73, 2005.

[19] Joshua Tenenbaum and William Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.

[20] Honglak Lee, Chaitanya Ekanadham, and Andrew Ng. Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems 20. MIT Press, 2008.