cvpr cvpr2013 cvpr2013-304 cvpr2013-304-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel. While progress in deep learning shows the importance of learning features through multiple layers, it is equally important to learn features through multiple paths. We propose Multipath Hierarchical Matching Pursuit (M-HMP), a novel feature learning architecture that combines a collection of hierarchical sparse features for image classification to capture multiple aspects of discriminative structures. Our building blocks are MI-KSVD, a codebook learning algorithm that balances the reconstruction error and the mutual incoherence of the codebook, and batch orthogonal matching pursuit (OMP); we apply them recursively at varying layers and scales. The result is a highly discriminative image representation that leads to large improvements over the state of the art on many standard benchmarks, e.g., Caltech-101, Caltech-256, MIT-Scenes, Oxford-IIIT Pet and Caltech-UCSD Bird-200.
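The two quantities the abstract names, the mutual incoherence of a codebook and sparse coding by orthogonal matching pursuit, can be illustrated with a small sketch. This is not the authors' MI-KSVD or batch OMP implementation: it is plain, unbatched OMP on a toy codebook, written in dependency-free Python. The function names (`normalize_atoms`, `mutual_coherence`, `solve`, `omp`) and the representation of the codebook as a list of atoms are assumptions made for illustration only.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize_atoms(atoms):
    """Scale each atom (a list of floats) to unit L2 norm."""
    return [[v / math.sqrt(dot(a, a)) for v in a] for a in atoms]

def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms.
    A lower value means a more incoherent codebook."""
    n = len(atoms)
    return max(abs(dot(atoms[i], atoms[j]))
               for i in range(n) for j in range(i + 1, n))

def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination
    with partial pivoting (assumes A is non-singular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def omp(atoms, signal, sparsity):
    """Plain OMP: greedily select the atom most correlated with the residual,
    then refit all selected coefficients by least squares (normal equations)."""
    residual = signal[:]
    support, coeffs = [], []
    for _ in range(sparsity):
        scores = [0.0 if j in support else abs(dot(a, residual))
                  for j, a in enumerate(atoms)]
        best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[best] < 1e-12:  # residual already explained
            break
        support.append(best)
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(gram, rhs)
        residual = [s - sum(c * atoms[j][d] for c, j in zip(coeffs, support))
                    for d, s in enumerate(signal)]
    return dict(zip(support, coeffs))

# Toy codebook (hypothetical): three axis-aligned atoms plus one diagonal atom.
atoms = normalize_atoms([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                         [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
coherence = mutual_coherence(atoms)
code = omp(atoms, [2.0, 0.0, -3.0], sparsity=2)
```

In this toy setup the diagonal atom overlaps with two axis-aligned atoms, so the coherence is well above zero. A codebook learner that penalizes mutual coherence, as the abstract describes for MI-KSVD, would be pushed away from such redundant atoms.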
[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006. 2
[2] L. Bo, X. Ren, and D. Fox. Kernel Descriptors for Visual Recognition. In NIPS, 2010. 1
[3] L. Bo, X. Ren, and D. Fox. Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms. In NIPS, 2011. 1, 2, 6
[4] L. Bo, X. Ren, and D. Fox. Unsupervised Feature Learning for RGB-D Based Object Recognition. In ISER, 2012. 2, 7
[5] Y. Boureau, N. Roux, F. Bach, J. Ponce, and Y. LeCun. Ask the Locals: Multi-Way Local Pooling for Image Recognition. In ICCV, 2011. 2, 6
[6] S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, and S. Belongie. Visual Recognition with Humans in the Loop. In ECCV, 2010. 7
[7] E. Candes and J. Romberg. Sparsity and Incoherence in Compressive Sampling. Inverse problems, 23:969, 2007. 2
[8] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The Devil is in the Details: an Evaluation of Recent Feature Encoding Methods. In BMVC, 2011. 6
[9] A. Coates and A. Ng. The Importance of Encoding versus Training with Sparse Coding and Vector Quantization. In ICML, 2011. 2, 6
[10] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object Detection with Discriminatively Trained Part-Based Models. IEEE PAMI, 32:1627–1645, 2010. 7
[11] P. Gehler and S. Nowozin. On Feature Combination for Multiclass Object Classification. In ICCV, 2009. 6
[12] G. Hinton, S. Osindero, and Y. Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7):1527–1554, 2006. 1, 2
[13] Z. Jiang, Z. Lin, and L. Davis. Learning a Discriminative Dictionary for Sparse Coding via Label Consistent K-SVD. In CVPR, 2011. 6
[14] F. Khan, J. van de Weijer, A. Bagdanov, and M. Vanrell. Portmanteau Vocabularies for Multi-cue Image Representations. In NIPS, 2011. 7
[15] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet Classification with Deep Convolutional Neural Networks. In NIPS, 2012. 1, 2
[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In CVPR, 2006. 4, 6
[17] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building High-Level Features Using Large Scale Unsupervised Learning. In ICML, 2012. 1, 2
[18] H. Lee, R. Grosse, R. Ranganath, and A. Ng. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In ICML, 2009. 2
[19] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative Learned Dictionaries for Local Image Analysis. In CVPR, 2008. 2
[20] S. McCann and D. Lowe. Local Naive Bayes Nearest Neighbor for Image Classification. In CVPR, 2012. 6
[21] B. Olshausen and D. Field. Emergence of Simple-cell Receptive Field Properties by Learning a Sparse Code for Natural Images. Nature, 381:607–609, 1996. 2
[22] M. Pandey and S. Lazebnik. Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models. In ICCV, 2011. 7
[23] S. N. Parizi, J. Oberlin, and P. Felzenszwalb. Reconfigurable Models for Scene Recognition. In CVPR, 2012. 7
[24] O. Parkhi, A. Vedaldi, A. Zisserman, and C. Jawahar. Cats and Dogs. In CVPR, 2012. 7
[25] Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. In The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pages 40–44, 1993. 3
[26] I. Ramirez, P. Sprechmann, and G. Sapiro. Classification and Clustering via Dictionary Learning with Structured Incoherence and Shared Features. In CVPR, 2010. 3
[27] K. Sohn, D. Jung, H. Lee, and A. Hero III. Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition. In ICCV, 2011. 6
[28] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Guo. Locality-constrained Linear Coding for Image Classification. In CVPR, 2010. 2, 6
[29] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. In CVPR, 2009. 6
[30] S. Yang, L. Bo, J. Wang, and L. Shapiro. Unsupervised Template Learning for Fine-grained Object Recognition. In NIPS, 2012. 7
[31] B. Yao, A. Khosla, and L. Fei-Fei. Combining Randomization and Discrimination for Fine-grained Image Categorization. In CVPR, 2011. 7
[32] K. Yu, Y. Lin, and J. Lafferty. Learning Image Representations from the Pixel Level via Hierarchical Sparse Coding. In CVPR, 2011. 1, 2, 6
[33] M. Zeiler, G. Taylor, and R. Fergus. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning. In ICCV, 2011. 2
[34] N. Zhang, R. Farrell, and T. Darrell. Pose Pooling Kernels for Sub-category Recognition. In CVPR, 2012. 7