133 nips-2010-Kernel Descriptors for Visual Recognition

Source: pdf

Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox

Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
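The equivalence claimed in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' code: it assumes hard-binned orientation histograms weighted by gradient magnitude, and it assumes the corresponding match kernel sums m(z) m(z') over all pixel pairs whose orientations fall in the same bin. The function names, the bin count, and the random test patches are all hypothetical choices made for illustration.

```python
import numpy as np

def orientation_bins(theta, n_bins=8):
    """Map angles in [0, 2*pi) to integer bin indices (hard binning)."""
    return np.floor(theta / (2 * np.pi) * n_bins).astype(int).ravel() % n_bins

def orientation_histogram(mag, theta, n_bins=8):
    """Magnitude-weighted orientation histogram of one patch."""
    return np.bincount(orientation_bins(theta, n_bins),
                       weights=mag.ravel(), minlength=n_bins)

def delta_match_kernel(mag_p, theta_p, mag_q, theta_q, n_bins=8):
    """Sum over all pixel pairs of m(z) * m(z') * [same orientation bin]."""
    bins_p = orientation_bins(theta_p, n_bins)
    bins_q = orientation_bins(theta_q, n_bins)
    same_bin = bins_p[:, None] == bins_q[None, :]
    weights = np.outer(mag_p.ravel(), mag_q.ravel())
    return float(np.sum(weights * same_bin))

# Two synthetic 8x8 patches with random gradient magnitudes and orientations.
rng = np.random.default_rng(0)
mag_p, mag_q = rng.random((8, 8)), rng.random((8, 8))
theta_p = rng.uniform(0, 2 * np.pi, (8, 8))
theta_q = rng.uniform(0, 2 * np.pi, (8, 8))

# Inner product of the two histograms versus the pairwise match kernel.
hist_dot = float(orientation_histogram(mag_p, theta_p)
                 @ orientation_histogram(mag_q, theta_q))
kernel_val = delta_match_kernel(mag_p, theta_p, mag_q, theta_q)
print(np.isclose(hist_dot, kernel_val))
```

Under these assumptions the two quantities agree up to floating-point error, because the histogram is an explicit feature map for the hard-binning kernel. Replacing the same-bin indicator with a smooth kernel over orientations is the kind of generalization the abstract refers to, though the exact kernels used in the paper are not reproduced here.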

reference text

[1] L. Bo and C. Sminchisescu. Efficient Match Kernel between Sets of Features for Visual Recognition. In NIPS, 2009.

[2] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[4] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.

[5] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.

[6] P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.

[7] K. Grauman and T. Darrell. The pyramid match kernel: discriminative classification with sets of image features. In ICCV, 2005.

[8] D. Haussler. Convolution kernels on discrete structures. Technical report, 1999.

[9] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009.

[10] K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun. Learning invariant features through topographic filter maps. In CVPR, 2009.

[11] R. Kondor and T. Jebara. A kernel between sets of vectors. In ICML, 2003.

[12] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.

[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[14] H. Lee, R. Grosse, R. Ranganath, and A. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML, 2009.

[15] F. Li, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE PAMI, 2006.

[16] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91–110, 2004.

[17] S. Lyu. Mercer kernels for object recognition with local features. In CVPR, 2005.

[18] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE PAMI, 27(10):1615–1630, 2005.

[19] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE PAMI, 24(7):971–987, 2002.

[20] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001.

[21] M. Ranzato, A. Krizhevsky, and G. Hinton. Factored 3-way restricted Boltzmann machines for modeling natural images. In AISTATS, 2010.

[22] M. Ranzato and G. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In CVPR, 2010.

[23] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.

[24] S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. Mathematics of the neural response. Foundations of Computational Mathematics, 10(1):67–91, 2010.

[25] A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE PAMI, 30(11):1958–1970, 2008.

[26] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Guo. Locality-constrained linear coding for image classification. In CVPR, 2010.

[27] J. Wu and J. Rehg. Beyond the euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. 2002.

[28] K. Yu, W. Xu, and Y. Gong. Deep learning with kernel regularization for visual recognition. In NIPS, 2008.