
Efficient Learning of Sparse Representations with an Energy-Based Model (NIPS 2006, paper 72)


Authors: Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra, Yann LeCun

Abstract: We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors” when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing, and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps.
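The two-phase procedure in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the energy is taken to be the squared distance between the code and the encoder output plus the squared reconstruction error, a plain logistic function stands in for the paper's sparsifying non-linearity, phase 1 is done by plain gradient descent on the code, and the dimensions, learning rates, and function names are hypothetical.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sparsify(z):
    """Assumed stand-in non-linearity: a plain logistic squashing each code unit."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def energy(x, z, W_enc, W_dec):
    """Assumed energy: code-prediction error plus reconstruction error."""
    pred = matvec(W_enc, x)           # linear encoder output
    rec = matvec(W_dec, sparsify(z))  # linear decoder applied to the squashed code
    e_code = sum((zi - pi) ** 2 for zi, pi in zip(z, pred))
    e_rec = sum((xi - ri) ** 2 for xi, ri in zip(x, rec))
    return e_code + e_rec

def infer_code(x, W_enc, W_dec, steps=50, lr=0.05):
    """Phase 1: gradient descent on the code, starting from the encoder output."""
    pred = matvec(W_enc, x)
    z = list(pred)
    for _ in range(steps):
        s = sparsify(z)
        rec = matvec(W_dec, s)
        err = [ri - xi for ri, xi in zip(rec, x)]  # reconstruction residual
        grad = []
        for j in range(len(z)):
            back = sum(W_dec[i][j] * err[i] for i in range(len(x)))
            slope = s[j] * (1.0 - s[j])  # derivative of the logistic
            grad.append(2.0 * (z[j] - pred[j]) + 2.0 * back * slope)
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

def update_parameters(x, z, W_enc, W_dec, lr=0.05):
    """Phase 2: one gradient step on both weight matrices, code held fixed."""
    pred = matvec(W_enc, x)
    s = sparsify(z)
    rec = matvec(W_dec, s)
    for j in range(len(z)):
        for i in range(len(x)):
            W_enc[j][i] += lr * 2.0 * (z[j] - pred[j]) * x[i]
            W_dec[i][j] += lr * 2.0 * (x[i] - rec[i]) * s[j]

# Toy setup: 4 inputs, 6 code units (overcomplete), small random weights.
random.seed(0)
n_in, n_code = 4, 6
W_enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]
x = [0.9, 0.1, 0.0, 0.7]

e_start = energy(x, matvec(W_enc, x), W_enc, W_dec)  # energy at the encoder's guess
z_star = infer_code(x, W_enc, W_dec)                 # phase 1
e_inferred = energy(x, z_star, W_enc, W_dec)
update_parameters(x, z_star, W_enc, W_dec)           # phase 2
e_updated = energy(x, z_star, W_enc, W_dec)
print(e_start, e_inferred, e_updated)
```

In this toy setting each phase should lower the energy for the given input: phase 1 by moving the code, phase 2 by moving the encoder and decoder weights toward that code. Starting the search at the encoder output reflects the abstract's requirement that the optimal code stay close to the encoder's prediction.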

reference text

[1] Lee, D.D. and Seung, H.S. (1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401:788-791.

[2] Hyvärinen, A. and Hoyer, P.O. (2001) A 2-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vision Research, 41:2413-2423.

[3] Olshausen, B.A. (2002) Sparse codes and spikes. In Probabilistic Models of the Brain: Perception and Neural Function, R.P.N. Rao, B.A. Olshausen and M.S. Lewicki (Eds.), MIT Press: 257-272.

[4] Teh, Y.W., Welling, M., Osindero, S. and Hinton, G.E. (2003) Energy-based models for sparse overcomplete representations. Journal of Machine Learning Research, 4:1235-1260.

[5] Lennie, P. (2003) The cost of cortical computation. Current Biology, 13:493-497.

[6] Simoncelli, E.P. (2005) Statistical modeling of photographic images. Academic Press 2nd ed.

[7] Hinton, G.E. and Zemel, R.S. (1994) Autoencoders, minimum description length, and Helmholtz free energy. Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro and J. Alspector (Eds.), Morgan Kaufmann: San Mateo, CA.

[8] Hinton, G.E. (2002) Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771-1800.

[9] Doi, E., Balcan, D.C. and Lewicki, M.S. (2006) A theoretical analysis of robust coding over noisy overcomplete channels. Advances in Neural Information Processing Systems 18, MIT Press.

[10] Olshausen, B.A. and Field, D.J. (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 37:3311-3325.

[11] Földiák, P. (1990) Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165-170.

[12] The Berkeley Segmentation Dataset. http://www.cs.berkeley.edu/projects/vision/grouping/segbench/

[13] The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

[14] Simard, P.Y., Steinkraus, D. and Platt, J.C. (2003) Best practices for convolutional neural networks. ICDAR.

[15] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324.

[16] Hinton, G.E., Osindero, S. and Teh, Y. (2006) A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.