
140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels


Source: pdf

Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun

Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers.
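To make the kind of analysis described in the abstract concrete, below is a minimal Python sketch of one way such a layer-wise measurement could work: build a Gaussian kernel on a layer's activations, extract the leading kernel principal components, and measure how well the labels are captured by their span. The arrays H (activations of one layer) and Y (one-hot labels), the kernel width sigma, and the component count d are illustrative assumptions; this is not a reproduction of the paper's exact estimator.

import numpy as np

def gaussian_kernel(X, sigma):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def layer_label_error(H, Y, sigma=1.0, d=10):
    """Residual error of fitting labels Y with the d leading kernel
    principal components of the Gaussian kernel on activations H.
    H: (n, p) activations of one layer; Y: (n, c) one-hot labels."""
    n = H.shape[0]
    K = gaussian_kernel(H, sigma)
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    Kc = J @ K @ J                          # center the kernel in feature space
    _, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :d]                # top-d kernel PCA directions
    Y_hat = U @ (U.T @ Y)                   # project labels onto their span
    return np.sum((Y - Y_hat) ** 2) / n

# Hypothetical usage: activations[l] holds the layer-l representation of
# the same n inputs. A decreasing error profile over l would indicate
# progressively better representations of the task, in line with the
# abstract's empirical claim.
# errors = [layer_label_error(H, Y, sigma=5.0, d=16) for H in activations]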


reference text

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems 19, pages 153–160. MIT Press, 2007.
Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
Mikio L. Braun. Accurate bounds for the eigenvalues of the kernel matrix. Journal of Machine Learning Research, 7:2303–2328, November 2006.
Mikio L. Braun, Joachim Buhmann, and Klaus-Robert Müller. On relevant dimensions in kernel feature spaces. Journal of Machine Learning Research, 9:1875–1908, August 2008.
R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In International Conference on Machine Learning (ICML), 2008.
Dumitru Erhan, Yoshua Bengio, Aaron C. Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11:625–660, 2010.
Ian Goodfellow, Quoc Le, Andrew Saxe, and Andrew Y. Ng. Measuring invariances in deep networks. In Advances in Neural Information Processing Systems 22, pages 646–654, 2009.
G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.
Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160:106–154, January 1962.
Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun. What is the best multi-stage architecture for object recognition? In Proceedings of the International Conference on Computer Vision (ICCV'09). IEEE, 2009.
Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, and Pascal Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10:1–40, 2009.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.
Hossein Mobahi, Ronan Collobert, and Jason Weston. Deep learning from temporal coherence in video. In Léon Bottou and Michael Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 737–744, Montreal, June 2009. Omnipress.
Noboru Murata, Shuji Yoshizawa, and Shun-ichi Amari. Network information criterion - determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks, 5:865–872, 1994.
Genevieve B. Orr and Klaus-Robert Müller, editors. Neural Networks: Tricks of the Trade (an outgrowth of a 1996 NIPS workshop), volume 1524 of Lecture Notes in Computer Science. Springer, 1998.
M. A. Ranzato, Fu J. Huang, Y. L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1–8, 2007.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Jason Weston, Frédéric Ratle, and Ronan Collobert. Deep learning via semi-supervised embedding. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 1168–1175, 2008.
Alexander Zien, Gunnar Rätsch, Sebastian Mika, Bernhard Schölkopf, Thomas Lengauer, and Klaus-Robert Müller. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9):799–807, 2000.