cvpr cvpr2013 cvpr2013-144 cvpr2013-144-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiang Chen, Zheng Song, Rogerio Feris, Ankur Datta, Liangliang Cao, Zhongyang Huang, Shuicheng Yan
Abstract: In recent years, efficiency of large-scale object detection has arisen as an important topic due to the exponential growth in the size of benchmark object detection datasets. Most current object detection methods focus on improving accuracy of large-scale object detection with efficiency being an afterthought. In this paper, we present the Efficient Maximum Appearance Search (EMAS) model, which is an order of magnitude faster than the existing state-of-the-art large-scale object detection approaches while maintaining comparable accuracy. Our EMAS model represents an image as an ensemble of densely sampled feature points with the proposed Pointwise Fisher Vector encoding method, so that the learnt discriminative scoring function can be applied locally. Consequently, the object detection problem is transformed into searching an image sub-area for maximum local appearance probability, thereby making EMAS an order of magnitude faster than the traditional detection methods. In addition, the proposed model is also suitable for incorporating global context at a negligible extra computational cost. EMAS can also incorporate fusion of multiple features, which greatly improves its performance in detecting multiple object categories. Our experiments show that the proposed algorithm can perform detection of 1000 object classes in less than one minute per image on the ImageNet ILSVRC2012 dataset and of 107 object classes in less than 5 seconds per image on the SUN09 dataset using a single CPU.
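The abstract describes detection as a search for the image sub-area with maximum accumulated local score, but does not spell out the search procedure. As an illustration only, the sketch below assumes each feature point has already been reduced to a scalar score on a regular grid (positive where the object is likely, negative elsewhere) and finds the best axis-aligned window with a Kadane-style maximum-sum sub-rectangle search. The function name `max_sum_subwindow`, the grid layout, and the choice of search algorithm are assumptions made for this example, not details taken from the paper.

```python
def max_sum_subwindow(scores):
    """Find the axis-aligned window with the largest total score.

    scores: non-empty list of equal-length rows of per-point scores
            (hypothetical input: one scalar score per grid location).
    Returns (best_sum, (top, left, bottom, right)) with inclusive bounds.
    """
    rows, cols = len(scores), len(scores[0])
    best_sum, best_box = float("-inf"), None

    for top in range(rows):
        # col_sums[c] accumulates scores[top..bottom][c] as bottom grows.
        col_sums = [0.0] * cols
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += scores[bottom][c]

            # 1-D maximum-sum interval (Kadane) over the column sums.
            run, run_start = 0.0, 0
            for c in range(cols):
                if run <= 0:
                    # A non-positive running sum cannot help; restart here.
                    run, run_start = col_sums[c], c
                else:
                    run += col_sums[c]
                if run > best_sum:
                    best_sum = run
                    best_box = (top, run_start, bottom, c)

    return best_sum, best_box
```

For a grid with R rows and C columns this sketch costs O(R²·C) time, independent of how many candidate windows exist, which is the kind of saving over exhaustive sliding-window scoring that a sub-area search formulation makes possible.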
[1] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR. (2005)
[2] Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Analysis and Machine Intelligence (2010)
[3] Chum, O., Zisserman, A.: An Exemplar Model for Learning Object Classes. In: CVPR. (2007)
[4] Felzenszwalb, P., Girshick, R.: Cascade object detection with deformable part models. In: CVPR. (2010)
[5] Lampert, C., Blaschko, M., Hofmann, T.: Beyond sliding windows: Object localization by efficient subwindow search. In: CVPR. (2008)
[6] Gunji, N., Higuchi, T., Yasumoto, K., Muraoka, H., Ushiku, Y., Harada, T., Kuniyoshi, Y.: Scalable Multiclass Object Categorization with Fisher Based Features. http://www.image-net.org/challenges/LSVRC/2012/iis.pdf
[7] Lampert, C., Blaschko, M.: Learning to Localize Objects with Structured Output Regression. In: ECCV. (2008)
[8] An, S., Peursum, P., Liu, W., Venkatesh, S.: Efficient subwindow search with submodular score functions. In: CVPR. (2011)
[9] An, S., Peursum, P., Liu, W., Venkatesh, S.: Efficient algorithms for subwindow search in object detection and localization. In: CVPR. (2009)
[10] Perronnin, F., Sánchez, J., Mensink, T.: Improving the Fisher Kernel for Large-Scale Image Classification. In: ECCV. (2010)
[11] Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision (2010)
[12] Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3d human pose annotations. In: CVPR. (2009)
[13] Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR. (2011)
[14] Yang, J., Yu, K., Gong, Y.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR. (2009)
[15] Wang, J., Yang, J., Yu, K., Lv, F., Huang, T.: Locality-constrained linear coding for image classification. In: CVPR. (2010)
[16] Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A.: Multiple Kernels for Object Detection. In: ICCV. (2009)
[17] Chatfield, K., Lempitsky, V., Vedaldi, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: BMVC. (2011)
[18] Zhou, X., Yu, K., Zhang, T.: Image classification using super-vector coding of local image descriptors. In: ECCV. (2010)
[19] Bentley, J.: Programming Pearls (2nd Edition). Addison-Wesley Professional (1999)
[20] Choi, M.J., Lim, J., Torralba, A.: Exploiting hierarchical context on a large database of object categories. In: CVPR. (2010)
[21] Lempitsky, V., Zisserman, A.: Learning to Count Objects in Images. In: NIPS. (2010)
[22] Song, Z., Chen, Q., Huang, Z., Hua, Y., Yan, S.: Contextualizing object detection and classification. In: CVPR. (2011)
[23] Zhu, L., Chen, Y., Yuille, A.: Learning a Hierarchical Deformable Template for Rapid Deformable Object Parsing. IEEE Trans. Pattern Analysis and Machine Intelligence (2010)
[24] Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet Classification with Deep Convolutional Neural Networks. In: NIPS. (2012)
[25] Csurka, G., Perronnin, F.: A Simple High Performance Approach to Semantic Segmentation. In: BMVC. (2008)
[26] Fischler, M.A., Elschlager, R.A.: The Representation and Matching of Pictorial Structures. IEEE Trans. Computers (1973)
[27] Russell, B., Torralba, A., Murphy, K.: LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision (2008)
[28] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR. (2009)
[29] Dubout, C., Fleuret, F.: Exact Acceleration of Linear Object Detectors. In: ECCV. (2012)
[30] Hsieh, C., Chang, K., Lin, C., Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: ICML. (2008)
[31] Song, H., Zickler, S., Althoff, T., Girshick, R., Fritz, M., Geyer, C., Felzenszwalb, P., Darrell, T.: Sparselet Models for Efficient Multiclass Object Detection. In: ECCV. (2012)
[32] Vedaldi, A., Fulkerson, B.: VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/ (2008)