
204 cvpr-2013-Histograms of Sparse Codes for Object Detection

Source: pdf

Author: Xiaofeng Ren, Deva Ramanan

Abstract: Object detection has seen huge progress in recent years, much thanks to the heavily-engineered Histograms of Oriented Gradients (HOG) features. Can we go beyond gradients and do better than HOG? We provide an affirmative answer by proposing and investigating a sparse representation for object detection, Histograms of Sparse Codes (HSC). We compute sparse codes with dictionaries learned from data using K-SVD, and aggregate per-pixel sparse codes to form local histograms. We intentionally keep true to the sliding window framework (with mixtures and parts) and only change the underlying features. To keep training (and testing) efficient, we apply dimension reduction by computing SVD on learned models, and adopt supervised training where latent positions of roots and parts are given externally, e.g., from a HOG-based detector. By learning and using local representations that are much more expressive than gradients, we demonstrate large improvements over the state of the art on the PASCAL benchmark for both root-only and part-based models.
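The feature pipeline summarized in the abstract (sparse-code each pixel's patch against a dictionary, then pool the codes into local histograms) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names `omp` and `hsc_features`, the 5x5 patch size, the 8x8 cells, hard assignment of pixels to cells, pooling of absolute code values, and per-cell L2 normalization. The dictionary in the demo is random; in the paper it is learned with K-SVD, which is not reproduced here.

```python
import numpy as np

def omp(D, x, k):
    """Greedy Orthogonal Matching Pursuit: code x with at most k atoms of D.

    D: (dim, n_atoms) dictionary with unit-norm columns. x: (dim,) signal.
    """
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    code = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        code[support] = coef
    return code

def hsc_features(image, D, patch=5, cell=8, k=1):
    """Pool absolute per-pixel sparse codes into one histogram per cell.

    Assumes a 2-D grayscale image, an odd patch size, and
    D.shape[0] == patch * patch. Hard binning is a simplification.
    """
    h, w = image.shape
    r = patch // 2
    n_cy, n_cx = h // cell, w // cell
    feats = np.zeros((n_cy, n_cx, D.shape[1]))
    for y in range(r, h - r):
        for x in range(r, w - r):
            cy, cx = y // cell, x // cell
            if cy >= n_cy or cx >= n_cx:
                continue  # pixel lies outside the last full cell
            p = image[y - r:y + r + 1, x - r:x + r + 1].astype(float).ravel()
            p -= p.mean()  # remove the patch mean before coding
            feats[cy, cx] += np.abs(omp(D, p, k))
    # L2-normalize each cell histogram (guarding against empty cells).
    norms = np.linalg.norm(feats, axis=2, keepdims=True)
    return feats / np.maximum(norms, 1e-12)

# Demo with a random stand-in dictionary (NOT a K-SVD-learned one).
rng = np.random.default_rng(0)
D_demo = rng.standard_normal((25, 50))
D_demo /= np.linalg.norm(D_demo, axis=0)
demo_feats = hsc_features(rng.random((16, 16)), D_demo, patch=5, cell=8, k=2)
print(demo_feats.shape)
```

The resulting array has one histogram per cell with one bin per dictionary atom, so it can stand in for a HOG-style cell grid inside a sliding-window detector, which matches the abstract's point that only the underlying features change.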

reference text

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.

[2] H. Azizpour and I. Laptev. Object detection using strongly-supervised deformable part models. In ECCV, 2012.

[3] L. Bo, X. Ren, and D. Fox. Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms. In Advances in Neural Information Processing Systems 24, 2011.

[4] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM international conference on Image and video retrieval, pages 401–408. ACM, 2007.

[5] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3d human pose annotations. In ICCV, pages 1365–1372. IEEE, 2009.

[6] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV, 2012.

[7] A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D. Wu, and A. Ng. Text detection and character recognition in scene images with unsupervised feature learning. In Document Analysis and Recognition (ICDAR), pages 440–445, 2011.

[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages I:886–893, 2005.

[9] M. Dikmen, D. Hoiem, and T. S. Huang. A data-driven method for feature transformation. In CVPR. IEEE, 2012.

[10] S. Divvala, A. Efros, and M. Hebert. How important are deformable parts in the deformable parts model? In ECCV Workshop on Parts and Attributes, 2012.

[11] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In British Machine Vision Conference, pages 1–11, 2009.

[12] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

[13] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. PAMI, 32(9):1627–1645, 2010.

[14] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://people.cs.uchicago.edu/~pff/latent-release4/.

[15] R. Girshick, P. Felzenszwalb, and D. McAllester. Object detection with grammar models. Advances in Neural Information Processing Systems 24, 2011.

[16] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.

[17] S. Hussain, W. Triggs, et al. Feature sets and dimensionality reduction for visual object detection. In British Machine Vision Conference, 2010.

[18] K. Kavukcuoglu, P. Sermanet, Y. Boureau, K. Gregor, M. Mathieu, and Y. LeCun. Learning convolutional feature hierarchies for visual recognition. In Advances in Neural Information Processing Systems 23, pages 1090–1098, 2010.

[19] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012.

[20] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 17–32, 2004.

[21] T. Malisiewicz, A. Gupta, and A. Efros. Ensemble of exemplar-svms for object detection and beyond. In ICCV, pages 89–96. IEEE, 2011.

[22] M. Ozuysal, M. Calonder, V. Lepetit, and P. Fua. Fast keypoint recognition using random ferns. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(3):448–461, 2010.

[23] D. Parikh and C. Zitnick. Finding the weakest link in person detectors. In CVPR, pages 1425–1432. IEEE, 2011.

[24] Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition. In The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pages 40–44, 1993.

[25] H. Pirsiavash, D. Ramanan, and C. Fowlkes. Bilinear classifiers for visual recognition. Advances in Neural Information Processing Systems 22, 1(2), 2009.

[26] X. Ren and L. Bo. Discriminatively trained sparse code gradients for contour detection. In Advances in Neural Information Processing Systems 25, 2012.

[27] R. Rubinstein, M. Zibulevsky, and M. Elad. Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit. Technical report, CS Technion, 2008.

[28] W. Schwartz, A. Kembhavi, D. Harwood, and L. Davis. Human detection using partial least squares analysis. In ICCV, pages 24–31. IEEE, 2009.

[29] H. Song, S. Zickler, T. Althoff, R. Girshick, M. Fritz, C. Geyer, P. Felzenszwalb, and T. Darrell. Sparselet models for efficient multiclass object detection. In ECCV, 2012.

[30] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, pages 606–613. IEEE, 2009.

[31] S. Vijayanarasimhan and K. Grauman. Efficient region search for object detection. In CVPR, pages 1401–1408, 2011.

[32] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In CVPR, pages 3485–3492, 2010.

[33] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, pages 1794–1801, 2009.

[34] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392. IEEE, 2011.

[35] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, pages 2879–2886. IEEE, 2012.

[36] X. Zhu, C. Vondrick, D. Ramanan, and C. Fowlkes. Do we need more training data or better models for object detection? In BMVC, 2012.