
Predicting an Object Location Using a Global Image Representation (ICCV 2013)


Author: Jose A. Rodriguez Serrano, Diane Larlus

Abstract: We tackle the detection of prominent objects in images as a retrieval task: given a global image descriptor, we find the most similar images in an annotated dataset and transfer their object bounding boxes. We refer to this approach as data-driven detection (DDD), an alternative to sliding windows. Previous works have used similar notions but with task-independent similarities and representations, i.e., they were not tailored to the end goal of localization. This article proposes two contributions: (i) a metric learning algorithm and (ii) a representation of images as object probability maps, both optimized for detection. We show experimentally that these two contributions are crucial to DDD, do not require costly additional operations, and in some cases yield comparable or better results than state-of-the-art detectors despite conceptual simplicity and increased speed. As an application of prominent object detection, we improve fine-grained categorization by pre-cropping images with the proposed approach.
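
The retrieval-and-transfer idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function name `transfer_box`, the plain dot-product similarity (standing in for the learned metric), the toy 2-D descriptors (standing in for the global image representation), the normalized box format, and the choice to average the boxes of the top-k neighbors.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot-product similarity between two descriptors (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def transfer_box(query: Sequence[float],
                 train_descriptors: List[Sequence[float]],
                 train_boxes: List[Box],
                 k: int = 1) -> Box:
    """Average the boxes of the k training images most similar to the query."""
    if not train_descriptors or len(train_descriptors) != len(train_boxes):
        raise ValueError("need one box per training descriptor")
    k = max(1, min(k, len(train_descriptors)))
    # Rank training images by decreasing similarity to the query descriptor.
    ranked = sorted(range(len(train_descriptors)),
                    key=lambda i: dot(query, train_descriptors[i]),
                    reverse=True)
    top = [train_boxes[i] for i in ranked[:k]]
    # Transfer: coordinate-wise mean of the retrieved boxes.
    return tuple(sum(b[j] for b in top) / k for j in range(4))


# Toy annotated dataset (hypothetical values).
train_descriptors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
train_boxes = [(0.1, 0.1, 0.5, 0.5), (0.4, 0.4, 0.9, 0.9), (0.2, 0.2, 0.6, 0.6)]

predicted = transfer_box([0.9, 0.1], train_descriptors, train_boxes, k=1)
print(predicted)
```

With `k=1` the query simply inherits the box of its single nearest neighbor; larger `k` trades sharpness for robustness to a poor top match. The paper's two contributions (a metric learned for localization and probability-map representations) would replace the dot product and the toy descriptors in this sketch.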

reference text

[1] http://www.image-net.org/challenges/LSVRC/2012/. 6

[2] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In CVPR, 2010. 2

[3] B. Bai, J. Weston, D. Grangier, R. Collobert, K. Sadamasa, Y. Qi, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In CIKM, 2009. 3

[4] L. Bottou. Stochastic learning. In Advanced Lectures on Machine Learning, 2003. 4

[5] G. Csurka and F. Perronnin. An efficient approach to semantic segmentation. IJCV, 2011. 4, 5

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 1, 2

[7] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In ICML, 2007. 3

[8] K. Duan, D. Parikh, D. J. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012. 1

[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL VOC Challenge 2012 Results. 1, 2, 3, 5

[10] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2010. 1, 2, 5

[11] R. B. Girshick, P. F. Felzenszwalb, and D. McAllester. Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/. 5

[12] A. Y. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 2009. 2

[13] J. Hays and A. A. Efros. im2gps: estimating geographic information from a single image. In CVPR, 2008. 2

[14] S. Johnson and M. Everingham. Learning effective human pose estimation from inaccurate annotation. In CVPR, pages 1465–1472, 2011. 5

[15] D. Kuettel and V. Ferrari. Figure-ground segmentation by transferring window masks. In CVPR, June 2012. 2, 3

[16] N. Kumar, P. N. Belhumeur, A. Biswas, D. W. Jacobs, W. J. Kress, I. C. Lopez, and J. V. B. Soares. Leafsnap: A computer vision system for automatic plant species identification. In ECCV, 2012. 1

[17] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In ICCV, pages 739–746, 2009. 4

[18] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 5

[19] L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010. 4

[20] C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. IEEE PAMI, 2011. 2, 3

[21] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, 2008. 2

[22] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-svms for object detection and beyond. In ICCV, 2011. 2

[23] L. Marchesotti, C. Cifarelli, and G. Csurka. A framework for visual saliency detection with applications to image thumbnailing. In ICCV, 2009. 1, 2

[24] F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In CVPR, 2010. 3

[25] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010. 3, 5

[26] B. Russell, A. A. Efros, J. Sivic, B. Freeman, and A. Zisserman. Segmenting scenes by matching image composites. In NIPS, 2009. 2, 3

[27] B. Russell, A. Torralba, C. Liu, R. Fergus, and W. Freeman. Object recognition by scene alignment. In NIPS, 2008. 2, 3

[28] X. Shen, Z. Lin, J. Brandt, and Y. Wu. Mobile product image search by automatic query object extraction. In ECCV, 2012. 1

[29] M. Stark, J. Krause, B. Pepik, D. Meger, J. J. Little, B. Schiele, and D. Koller. Fine-grained categorization for 3d scene understanding. In BMVC, 2012. 1

[30] J. Tighe and S. Lazebnik. Superparsing: Scalable nonparametric image parsing with superpixels. In ECCV, 2010. 2, 3

[31] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE PAMI, 2008. 2

[32] Z. Tu and X. Bai. Auto-context and its application to high-level vision tasks and 3d brain image segmentation. PAMI, 32(10): 1744–1757, 2010. 4

[33] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011. 7

[34] K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009. 3

[35] B. Yao, G. Bradski, and L. Fei-Fei. A codebook-free and annotation-free approach for fine-grained image categorization. In CVPR, 2012. 1

[36] B. Yao, A. Khosla, and L. Fei-Fei. Combining randomization and discrimination for fine-grained image categorization. In CVPR, 2011. 1