Paper: nips2011-261, Sparse Filtering (NIPS 2011)
Source: pdf
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of ℓ2-normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities.
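The abstract's claim that the cost fits in a few lines is easy to illustrate. Below is a minimal NumPy sketch of the objective as the abstract describes it, the sparsity (ℓ1 norm) of ℓ2-normalized features. The function name, the soft-absolute constant eps, and the feature-then-example normalization order are assumptions made for illustration; this is not the authors' released MATLAB code.

    import numpy as np

    def sparse_filtering_cost(W, X, eps=1e-8):
        """Sketch of a sparse-filtering-style cost: the l1 sparsity of
        l2-normalized features. W is (n_features, n_inputs); X is
        (n_inputs, n_examples). eps gives a smooth absolute value
        (an assumption, not stated in the abstract)."""
        F = W @ X                 # linear features, one row per learned feature
        Fs = np.sqrt(F**2 + eps)  # soft absolute value, all entries > 0
        # Normalize each feature (row) to unit l2 norm across examples...
        Fr = Fs / np.linalg.norm(Fs, axis=1, keepdims=True)
        # ...then each example (column) to unit l2 norm across features.
        Fh = Fr / np.linalg.norm(Fr, axis=0, keepdims=True)
        return Fh.sum()           # l1 penalty on the normalized features

Minimizing this over W with an off-the-shelf quasi-Newton optimizer (the references below list minFunc [20]; scipy.optimize.minimize with L-BFGS-B and an analytic gradient would be a rough Python analogue) yields a single layer of features; per the abstract, deeper representations come from greedy layer-wise stacking.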
[1] G. E. Dahl, M. Ranzato, A. Mohamed, and G. E. Hinton. Phone recognition with the mean-covariance restricted Boltzmann machine. In NIPS, 2010.
[2] H. Lee, Y. Largman, P. Pham, and A. Y. Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In NIPS, 2009.
[3] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.
[4] M. A. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In CVPR, 2007.
[5] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. In CVPR, 2011.
[6] H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief net model for visual area V2. In NIPS, 2008.
[7] G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[8] P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, 2008.
[9] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings: Biological Sciences, 265(1394):359–366, 1998.
[10] A. J. Bell and T. J. Sejnowski. The "independent components" of natural scenes are edge filters. Vision Research, 37(23):3327–3338, December 1997.
[11] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.
[12] A. Hyvärinen, J. Hurri, and P. O. Hoyer. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Computational Imaging and Vision). Springer, 2nd printing edition, 2009.
[13] D. J. Field. What is the goal of sensory coding? Neural Computation, 6(4):559–601, July 1994.
[14] B. Willmore and D. J. Tolhurst. Characterizing the sparseness of neural codes. Network, 12(3):255–270, January 2001.
[15] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4:819–825, 2001.
[16] M. A. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In NIPS, 2006.
[17] A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.
[18] A. Treves and E. Rolls. What determines the capacity of autoassociative memories in the brain? Network: Computation in Neural Systems, 2:371–397, 1991.
[19] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In NIPS, 2006.
[20] M. Schmidt. minFunc. http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html, 2005.
[21] M. Ranzato and G. E. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In CVPR, 2010.
[22] U. Köster and A. Hyvärinen. A two-layer model of natural stimuli estimated with score matching. Neural Computation, 22(9):2308–2333, 2010.
[23] A. Saxe, M. Bhand, Z. Chen, P. W. Koh, B. Suresh, and A. Y. Ng. On random weights and unsupervised feature learning. In ICML, 2011.
[24] S. Petrov, A. Pauls, and D. Klein. Learning structured models for phone recognition. In Proc. of EMNLP-CoNLL, 2007.
[25] F. Sha and L. K. Saul. Large margin Gaussian mixture modeling for phonetic classification and recognition. In ICASSP, 2006.
[26] D. Yu, L. Deng, and A. Acero. Hidden conditional random field with distribution constraints for phone classification. In Interspeech, 2009.
[27] H. A. Chang and J. R. Glass. Hierarchical large-margin Gaussian mixture models for phonetic classification. In IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pages 272–277, 2007.
[28] W. E. Fisher, G. R. Doddington, and K. M. Goudie-Marshall. The DARPA speech recognition research database: specifications and status. 1986.
[29] P. Clarkson and P. J. Moreno. On the use of support vector machines for phonetic classification. In ICASSP, 2:585–588, 1999.
[30] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In ICML, 2009.
[31] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[32] M. Wainwright, O. Schwartz, and E. Simoncelli. Natural image statistics and divisive normalization: Modeling nonlinearity and adaptation in cortical neurons, 2001.
[33] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009.
[34] N. Pinto, D. D. Cox, and J. J. DiCarlo. Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1):e27, January 2008.
[35] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. JMLR, 5:1457–1469, 2004.