iccv iccv2013 iccv2013-77 iccv2013-77-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhenyang Li, Efstratios Gavves, Koen E.A. van de Sande, Cees G.M. Snoek, Arnold W.M. Smeulders
Abstract: In this paper we aim for segmentation and classification of objects. We propose codemaps, a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Unlike existing linear decompositions, which emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification become locally decomposable. As a first novelty we introduce ℓ2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling, which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.
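The reordering of pooling and classification described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes sum pooling and a linear classifier, and the names `element_codes`, `codemap`, `gram` and `region_score` are hypothetical. Under those assumptions, the score of any region is the sum of per-element scores, and the ℓ2 norm of the pooled region encoding can be read off a precomputed Gram matrix of the lattice elements. Other steps of a real Fisher vector pipeline, such as power normalization, are ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 6 lattice elements (e.g. superpixels), each with a
# sum-pooled encoding of dimension 8, plus a linear classifier w.
n_elements, dim = 6, 8
element_codes = rng.normal(size=(n_elements, dim))
w = rng.normal(size=dim)

# Codemap: one classification score per lattice element.
codemap = element_codes @ w

# Gram matrix of element encodings, computed once and reused for the
# l2 norm of every region built from these elements.
gram = element_codes @ element_codes.T


def region_score(mask):
    """l2-normalized linear score of the region selected by a boolean mask."""
    raw = codemap[mask].sum()
    # ||sum_i x_i||^2 equals the sum of all pairwise dot products x_i . x_j.
    norm = np.sqrt(gram[np.ix_(mask, mask)].sum())
    return raw / norm


def region_score_direct(mask):
    """Reference: pool the region first, normalize, then classify."""
    pooled = element_codes[mask].sum(axis=0)
    return w @ (pooled / np.linalg.norm(pooled))


mask = np.array([True, False, True, True, False, False])
```

Both functions return the same value up to floating-point error. The decomposed version never forms the pooled region encoding, which is where the efficiency for many overlapping regions would come from in this simplified setting.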
[1] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. D. Bourdev, and J. Malik. Semantic segmentation using regions and parts. In CVPR, 2012.
[2] P. Arbelaez, M. Maire, C. C. Fowlkes, and J. Malik. From contours to regions: An empirical evaluation. In CVPR, 2009.
[3] L. Bo and C. Sminchisescu. Efficient match kernels between sets of features for visual recognition. In NIPS, 2009.
[4] Y.-L. Boureau, J. Ponce, and Y. Lecun. A theoretical analysis of feature pooling in visual recognition. In ICML, 2010.
[5] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic Segmentation with Second-Order Pooling. In ECCV, 2012.
[6] C. Carson, S. Belongie, H. Greenspan, and J. Malik. Blobworld: Image segmentation using expectation-maximization and its application to image querying. TPAMI, 2002.
[7] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV SLCV, 2004.
[8] G. Csurka and F. Perronnin. An efficient approach to semantic segmentation. IJCV, 2011.
[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
[10] M. Everingham, L. Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge 2012. www.pascalnetwork.org/challenges/VOC/voc2012/workshop/index.html.
[11] M. Everingham, L. Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. IJCV, 2010.
[12] K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. JMLR, 2007.
[13] H. Harzallah, F. Jurie, and C. Schmid. Combining efficient object localization and image classification. In ICCV, 2009.
[14] H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid. Aggregating local image descriptors into compact codes. TPAMI, 2012.
[15] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[16] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. Graph cut based inference with co-occurrence statistics. In ECCV, 2010.
[17] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. TPAMI, 2009.
[18] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[19] A. Levinshtein, A. Stere, K. N. Kutulakos, D. J. Fleet, S. J. Dickinson, and K. Siddiqi. Turbopixels: Fast superpixels using geometric flows. TPAMI, 2009.
[20] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[21] S. Maji and A. Berg. Max-margin additive classifiers for detection. In ICCV, 2009.
[22] P. Yadollahpour, D. Batra, and G. Shakhnarovich. Discriminative Re-ranking of Diverse Segmentations. In CVPR, 2013.
[23] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010.
[24] X. Ren and J. Malik. Learning a classification model for segmentation. In ICCV, 2003.
[25] B. Schölkopf and A. J. Smola. Learning with kernels. MIT Press, 2002.
[26] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[27] A. F. Smeaton, P. Over, and W. Kraaij. Evaluation campaigns and TRECVid. In MIR, 2006.
[28] J. R. R. Uijlings, A. W. M. Smeulders, and R. J. H. Scha. The visual extent of an object. IJCV, 2012.
[29] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. Selective search for object recognition. IJCV, 2013.
[30] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. TPAMI, 2010.
[31] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, and J. M. Geusebroek. Visual word ambiguity. TPAMI, 2010.
[32] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. TPAMI, 2011.
[33] A. Vezhnevets, V. Ferrari, and J. M. Buhmann. Weakly supervised structured output learning for semantic segmentation. In CVPR, 2012.
[34] S. Vijayanarasimhan and K. Grauman. Efficient region search for object detection. In CVPR, 2011.
[35] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.
[36] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 2007.