cvpr cvpr2013 cvpr2013-67 cvpr2013-67-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman
Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously learning the part appearance and identifying the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative, and at a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well-constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.
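The abstract names entropy-rank curves without defining them. The sketch below illustrates one plausible reading, offered as an assumption and not as the authors' definition: sort a candidate part's detections by score and, for each rank k, compute the entropy of the class labels among the top k detections. Under this reading, a curve that stays low indicates a part that fires mostly on a few classes. The function name `entropy_rank_curve` and its inputs are hypothetical.

```python
import math
from collections import Counter


def entropy_rank_curve(scores, labels):
    """Entropy (in bits) of the class labels among the top-k detections, for k = 1..n.

    Assumption for illustration: `scores` are detector responses of one
    candidate part and `labels` are the class labels of the images the
    detections come from. Neither name comes from the source.
    """
    # Indices of the detections, highest score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    counts = Counter()
    curve = []
    for k, idx in enumerate(order, start=1):
        counts[labels[idx]] += 1
        # Shannon entropy of the empirical class distribution over the top k.
        h = sum(-(c / k) * math.log2(c / k) for c in counts.values())
        curve.append(h)
    return curve


if __name__ == "__main__":
    # The two strongest detections share a class, so the curve starts at zero.
    print(entropy_rank_curve([0.9, 0.8, 0.7, 0.6], ["a", "a", "b", "b"]))
```

With such curves, candidate parts could be ranked, for example, by the area under the curve, keeping those with the lowest values; that selection rule is likewise an assumption here.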
[1] B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows. In PAMI, 2012.
[2] Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9, 1997.
[3] Y. Amit and A. Trouvé. Generative models for labeling multi-object configurations in images. In Toward Category-Level Object Recognition. Springer, 2006.
[4] R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In Proc. CVPR, 2012.
[5] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In Proc. ICCV, 2009.
[6] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In Proc. BMVC, 2011.
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. CVPR, 2005.
[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, 88(2):303–338, June 2010.
[9] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2009.
[10] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2), 2004.
[11] M. A. Fischler and R. A. Elschlager. The representation and matching of pictorial structures. In IEEE Trans. on Computers, 1973.
[12] M. Gharbi, T. Malisiewicz, S. Paris, and F. Durand. A gaussian approximation of feature space for fast image similarity. Technical Report 2012-032, MIT CSAIL, 2012.
[13] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. In Proc. ECCV, 2012.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.
[15] L.-J. Li, H. Su, E. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In Proc. NIPS, 2010.
[16] D. G. Lowe. Object recognition from local scale-invariant features. In Proc. ICCV, 1999.
[17] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-SVMs for object detection and beyond. In Proc. ICCV, 2011.
[18] M. Pandey and S. Lazebnik. Scene recognition and weakly supervised object localization with deformable part-based models. In Proc. ICCV, 2011.
[19] S. Parizi, J. Oberlin, and P. Felzenszwalb. Reconfigurable models for scene recognition. In Proc. CVPR, 2012.
[20] O. Parkhi, A. Vedaldi, C. V. Jawahar, and A. Zisserman. The truth about cats and dogs. In Proc. ICCV, 2011.
[21] F. Perronnin, Y. Liu, J. Sanchez, and H. Poirier. Large-scale image retrieval with compressed fisher vectors. In Proc. CVPR, 2010.
[22] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, 2008.
[23] J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. Cambridge, 2000.
[24] A. Quattoni and A. Torralba. Recognizing indoor scenes. In Proc. CVPR, 2009.
[25] M. Raptis, I. Kokkinos, and S. Soatto. Discovering discriminative action parts from mid-level video representations. In Proc. CVPR, 2012.
[26] F. Sadeghi and M. F. Tappen. Latent pyramidal regions for recognizing scenes. In Proc. ECCV, 2012.
[27] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In Proc. ICML, pages 807–814, 2007.
[28] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In Proc. ECCV, 2012.
[29] S. Ullman, E. Sali, and M. Vidal-Naquet. A fragment-based approach to object representation and classification. In Intl. Workshop on Visual Form, 2001.
[30] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Segmentation as selective search for object recognition. In Proc. ICCV, 2011.
[31] J. C. van Gemert, J.-M. Geusebroek, C. J. Veenman, and A. W. M. Smeulders. Kernel codebooks for scene categorization. In Proc. ECCV, 2008.
[32] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proc. ICCV, 2009.
[33] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. In Proc. CVPR, 2010.
[34] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In Proc. CVPR, 2010.
[35] M. Weber, M. Welling, and P. Perona. Towards automatic discovery of object categories. In Proc. CVPR, volume 2, pages 101–108, 2000.
[36] J. Wu and J. Rehg. Centrist: A visual descriptor for scene categorization. In PAMI, 2011.
[37] J. Zhu, L.-J. Li, L. Fei-Fei, and E. Xing. Large margin learning of upstream scene understanding models. In Proc. NIPS, 2010.