Learning with Recursive Perceptual Representations (NIPS 2012, paper 197)

Authors: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell

Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks, but they require high-dimensional feature spaces for good performance. Deep learning methods can find more compact representations, but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs, recursively transforming the original data manifold through a random projection of the weak prediction computed by each layer. Our method scales like linear SVMs, does not rely on any kernel computations or non-convex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of the data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, where we observe a consistent improvement over previous (often more complicated) methods on several vision and speech benchmarks.
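The abstract describes the stacking step only at a high level. The sketch below gives one minimal, binary-class reading of it and is not the authors' implementation. Each layer is a linear SVM trained by Pegasos-style subgradient descent, a solver chosen here only to keep the example dependency-free. Layer k+1 sees sigmoid(x + beta * W o), where o stacks the scores of all earlier layers and W is a fresh Gaussian random matrix. The sigmoid, the scale `beta`, the Gaussian projection and every function name (`train_linear_svm`, `fit_stack`, `predict`, and so on) are assumptions of this sketch.

```python
import math
import random

# Illustrative sketch only: a binary-class reading of the abstract, not the
# authors' code. Pegasos solver, sigmoid, beta and Gaussian W are assumptions.


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM fitted by Pegasos-style subgradient descent on the
    L2-regularised hinge loss. A constant 1.0 is appended to every sample,
    so the last weight acts as a bias. Labels must be -1 or +1."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def score(w, x):
    """Real-valued SVM output w . [x, 1] (the layer's weak prediction)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transform(x, o, W, beta):
    """Next-layer input: sigmoid(x + beta * W o), element-wise over x."""
    return [sigmoid(xj + beta * sum(wk * ok for wk, ok in zip(row, o)))
            for xj, row in zip(x, W)]


def fit_stack(X, y, n_layers=3, beta=0.5, seed=0):
    """Train n_layers linear SVMs. Layer 1 sees the raw data; every later
    layer sees the raw data shifted by a random projection of the scores
    of all earlier layers. Returns (beta, layers)."""
    rng = random.Random(seed)
    d = len(X[0])
    layers = []                   # (W or None, SVM weights) per layer
    feats = [list(x) for x in X]  # current layer's input features
    outs = [[] for _ in X]        # stacked scores of earlier layers, per sample
    for k in range(n_layers):
        W = None
        if k > 0:
            # Fresh Gaussian projection from the k stacked scores to d dims.
            W = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
            feats = [transform(x, o, W, beta) for x, o in zip(X, outs)]
        w = train_linear_svm(feats, y, seed=seed + k)
        layers.append((W, w))
        for f, o in zip(feats, outs):
            o.append(score(w, f))
    return beta, layers


def predict(model, x):
    """Replay the stored projections on one sample; sign of the top layer."""
    beta, layers = model
    o, feats = [], list(x)
    for W, w in layers:
        if W is not None:
            feats = transform(x, o, W, beta)
        o.append(score(w, feats))
    return 1 if o[-1] >= 0 else -1


X = [[-2.0, 0.0], [-1.5, 0.5], [-1.0, -0.5], [1.0, 0.5], [1.5, -0.5], [2.0, 0.0]]
y = [-1, -1, -1, 1, 1, 1]
model = fit_stack(X, y, n_layers=3)
print([predict(model, x) for x in X])
```

On this toy set the first layer alone already separates the classes. The extra layers are there only to show how the random projections are stored and replayed at prediction time. The projections are fixed rather than learned, so training remains a sequence of convex linear-SVM problems. This matches the abstract's claim that the method avoids kernel computations and non-convex optimization.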
