
421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition


Author: Peng Wang, Jingdong Wang, Gang Zeng, Weiwei Xu, Hongbin Zha, Shipeng Li

Abstract: In visual recognition tasks, the design of low-level image feature representations is fundamental. The advent of local patch features built from pixel attributes, such as SIFT and LBP, has precipitated dramatic progress. Recently, a kernel view of these features, called kernel descriptors (KDES) [1], generalized the feature design in an unsupervised fashion and yielded impressive results. In this paper, we present a supervised framework that embeds image-level label information into the design of patch-level kernel descriptors, which we call supervised kernel descriptors (SKDES). Specifically, we adopt the widely used bag-of-words (BOW) image classification pipeline and a large-margin criterion to learn the low-level patch representation, which makes the patch features much more compact and more discriminative than KDES. With this method, we achieve results competitive with state-of-the-art methods on several public datasets.
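The abstract describes image-level labels supervising a patch-level representation through a pooled, large-margin objective. The sketch below illustrates that general idea only; it is not the SKDES algorithm from the paper. The linear projection `W`, the mean pooling, the multiclass hinge loss, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: NOT the paper's SKDES method.
# Assumed (hypothetical) ingredients: a linear patch projection W,
# mean pooling over patches, and a multiclass hinge loss.

def project(W, x):
    """Map one patch feature x (length D) to a compact vector (length d) via W (d x D)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def pool_image(W, patches):
    """Average the projected patch features of one image into a single image feature."""
    projected = [project(W, p) for p in patches]
    return [sum(v[k] for v in projected) / len(projected) for k in range(len(W))]

def multiclass_hinge(scores, label, margin=1.0):
    """Large-margin loss: the true class score should exceed every other score by `margin`."""
    worst = max(margin + s - scores[label] for c, s in enumerate(scores) if c != label)
    return max(0.0, worst)

def image_loss(W, classifiers, patches, label):
    """Image-level hinge loss expressed as a function of the patch-level projection W."""
    z = pool_image(W, patches)
    scores = [sum(a * b for a, b in zip(theta, z)) for theta in classifiers]
    return multiclass_hinge(scores, label)
```

Because the loss depends on `W` only through the pooled image feature, minimizing it over `W` (for example by subgradient steps) would let image-level labels shape the patch-level projection, which is the kind of coupling the abstract refers to.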

reference text

[1] L. Bo, X. Ren, and D. Fox. Kernel descriptors for visual recognition. In NIPS, pages 244–252, 2010. 1, 2, 3, 5, 6, 7

[2] L. Bo and C. Sminchisescu. Efficient match kernel between sets of features for visual recognition. In NIPS, pages 135–143, 2009. 1, 5

[3] Y.-L. Boureau, N. L. Roux, F. Bach, J. Ponce, and Y. LeCun. Ask the locals: Multi-way local pooling for image recognition. In ICCV, pages 2651–2658, 2011. 1, 7

[4] M. Brown, G. Hua, and S. A. J. Winder. Discriminative learning of local image descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 33(1):43–57, 2011. 1

[5] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In ICML, pages 921–928, 2011. 1

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005. 1

[7] M. Dikmen, D. Hoiem, and T. S. Huang. A data driven method for feature transformation. In CVPR, pages 3314–3321, 2012. 1

[8] J. Feng, B. Ni, Q. Tian, and S. Yan. Geometric p-norm feature pooling for image classification. In CVPR, pages 2697–2704, 2011. 1, 7

[9] S. Gao, I. W.-H. Tsang, and L.-T. Chia. Kernel sparse representation for image classification and face recognition. In ECCV (4), pages 1–14, 2010. 6, 7

[10] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao. Local features are not lonely - laplacian sparse coding for image classification. In CVPR, pages 3555–3561, 2010. 1, 6, 7

[11] P. V. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, pages 221–228, 2009. 7

[12] Y. Jia, C. Huang, and T. Darrell. Beyond spatial pyramids: Receptive field learning for pooled image features. In CVPR, pages 3370–3377, 2012. 1, 6, 7

[13] Z. Jiang, G. Zhang, and L. S. Davis. Submodular dictionary learning for sparse coding. In CVPR, pages 3418–3425, 2012. 1, 7

[14] K. Kavukcuoglu, P. Sermanet, Y.-L. Boureau, K. Gregor, M. Mathieu, and Y. LeCun. Learning convolutional feature hierarchies for visual recognition. In NIPS, pages 1090–1098, 2010. 2

[15] Y. Ke and R. Sukthankar. Pca-sift: A more distinctive representation for local image descriptors. In CVPR (2), pages 506–513, 2004. 1

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012. 2

[17] R. Kwitt, N. Vasconcelos, and N. Rasiwasia. Scene recognition on the semantic manifold. In ECCV (4), pages 359–372, 2012. 6, 7

[18] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR (2), pages 2169–2178, 2006. 1, 5, 6, 7

[19] Y. LeCun, F. J. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In CVPR (2), pages 97–104, 2004. 2

[20] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, page 77, 2009. 2

[21] F.-F. Li, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell., 28(4):594–611, 2006. 7

[22] L.-J. Li and F.-F. Li. What, where and who? classifying events by scene and object recognition. In ICCV, pages 1–8, 2007. 6, 7

[23] L.-J. Li, H. Su, E. P. Xing, and F.-F. Li. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, pages 1378–1386, 2010. 6

[24] L. Liu, L. Wang, and X. Liu. In defense of soft-assignment coding. In ICCV, pages 2486–2493, 2011. 1, 6, 7

[25] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. 1

[26] J. Mairal, F. Bach, and J. Ponce. Task-driven dictionary learning. IEEE Trans. Pattern Anal. Mach. Intell., 34(4):791–804, 2012. 1

[27] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1615–1630, 2005. 1

[28] P. Natarajan, S. Wu, S. N. P. Vitaladevuni, X. Zhuang, U. Park, R. Prasad, and P. Natarajan. Multi-channel shape-flow kernel descriptors for robust video event detection and retrieval. In ECCV (2), pages 301–314, 2012. 2

[29] Z. Niu, G. Hua, X. Gao, and Q. Tian. Context aware topic model for scene recognition. In CVPR, pages 2743–2750, 2012. 6, 7

[30] S. N. Parizi, J. G. Oberlin, and P. F. Felzenszwalb. Reconfigurable models for scene recognition. In CVPR, pages 2775–2782, 2012. 7

[31] J. Philbin, M. Isard, J. Sivic, and A. Zisserman. Descriptor learning for efficient retrieval. In ECCV (3), pages 677–691, 2010. 2

[32] X. Ren, L. Bo, and D. Fox. Rgb-(d) scene labeling: Features and algorithms. In CVPR, pages 2759–2766, 2012. 2

[33] O. Russakovsky, Y. Lin, K. Yu, and L. Fei-Fei. Object-centric spatial pooling for image classification. In ECCV (2), pages 1–15, 2012. 6

[34] G. Sharma, F. Jurie, and C. Schmid. Discriminative spatial saliency for image classification. In CVPR, pages 3506–3513, 2012. 7

[35] K. Simonyan, A. Vedaldi, and A. Zisserman. Descriptor learning using convex optimisation. In ECCV (1), pages 243–256, 2012. 2, 4

[36] A. Tamrakar, S. Ali, Q. Yu, J. Liu, O. Javed, A. Divakaran, H. Cheng, and H. S. Sawhney. Evaluation of low-level features and their combinations for complex event detection in open source videos. In CVPR, pages 3681–3688, 2012. 1

[37] J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, pages 3360–3367, 2010. 1, 6, 7

[38] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244, 2009. 2, 3

[39] S. A. J. Winder and M. Brown. Learning local image descriptors. In CVPR, 2007. 2

[40] J. Wu and J. M. Rehg. Beyond the euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. In ICCV, pages 630–637, 2009. 6, 7

[41] L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11:2543–2596, 2010. 4

[42] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, pages 1794–1801, 2009. 1, 6, 7

[43] J. Yang, K. Yu, and T. S. Huang. Supervised translation-invariant sparse coding. In CVPR, pages 3517–3524, 2010. 1

[44] M. D. Zeiler, G. W. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. In ICCV, pages 2018–2025, 2011. 2, 7