31 jmlr-2011: Efficient and Effective Visual Codebook Generation Using Additive Kernels

Source: pdf

Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg

Abstract: Common visual codebook generation methods used in the bag of visual words model, for example k-means or the Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that, with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. HIK codebooks consistently achieve 2–4% higher recognition accuracy than k-means codebooks on several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels, and it is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches.

Keywords: visual codebook, additive kernel, histogram intersection kernel
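
The abstract states that HIK is additive and that the histogram kernel k-means runs almost as fast as standard k-means. As a rough illustration of why additivity can help, here is a minimal Python sketch, assuming nonnegative histogram features. In kernel k-means, the squared feature-space distance from x to the mean of a cluster with n members is κ(x,x) - (2/n)·Σ_y κ(x,y) + (1/n²)·Σ_{y,z} κ(y,z). For HIK, the middle sum splits into per-dimension terms Σ_y min(x_d, y_d) that depend only on the scalar x_d, so they can be answered from sorted values and prefix sums. The names `ClusterTable` and `assign` are hypothetical, and this sorted-prefix-sum lookup is one illustrative way to exploit additivity, not necessarily the authors' procedure.

```python
from bisect import bisect_right
from itertools import accumulate


def hik(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def naive_cluster_sum(x, members):
    """Sum of HIK(x, y) over all members y; costs O(n * d) per query."""
    return sum(hik(x, y) for y in members)


class ClusterTable:
    """Hypothetical per-cluster lookup structure for an additive kernel.

    Because HIK is additive, sum_y HIK(x, y) = sum_d sum_y min(x[d], y[d]),
    and each inner sum depends only on the scalar x[d]. Sorting the members'
    values in every dimension and keeping prefix sums lets each inner sum be
    evaluated with one binary search, i.e. O(d log n) per query.
    """

    def __init__(self, members):
        self.n = len(members)
        self.sorted_vals = []
        self.prefix = []
        for d in range(len(members[0])):
            vals = sorted(y[d] for y in members)
            self.sorted_vals.append(vals)
            # prefix[k] is the sum of the k smallest values in dimension d.
            self.prefix.append([0.0] + list(accumulate(vals)))
        # Cluster constant (1/n^2) * sum_{y,z} HIK(y, z); it only changes when
        # the membership changes, so it is computed once per table.
        self.self_term = sum(self.cluster_sum(y) for y in members) / self.n ** 2

    def cluster_sum(self, x):
        """sum_y HIK(x, y) over the cluster, using the precomputed tables."""
        total = 0.0
        for d, q in enumerate(x):
            k = bisect_right(self.sorted_vals[d], q)  # members with value <= q
            # Values <= q contribute themselves; the other n - k contribute q.
            total += self.prefix[d][k] + q * (self.n - k)
        return total

    def distance_sq(self, x):
        """Squared feature-space distance ||phi(x) - m||^2 to the cluster mean."""
        return hik(x, x) - 2.0 * self.cluster_sum(x) / self.n + self.self_term


def assign(x, tables):
    """Kernel k-means assignment step: index of the nearest cluster mean."""
    return min(range(len(tables)), key=lambda j: tables[j].distance_sq(x))


if __name__ == "__main__":
    clusters = [
        [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]],
        [[0.7, 0.1, 0.2], [0.8, 0.1, 0.1]],
    ]
    tables = [ClusterTable(c) for c in clusters]
    print(assign([0.3, 0.3, 0.4], tables))  # prints 0
```

The naive sum is kept only for comparison: after an O(n·d log n) sort per cluster, each query costs O(d log n) instead of O(n·d), which is the kind of saving that makes additive-kernel k-means competitive with Euclidean k-means.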

reference text

Ankur Agarwal and Bill Triggs. Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1):15–27, 2008.
David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In 18th Symposium on Discrete Algorithms, pages 1027–1035, 2007.
Oren Boiman, Eli Shechtman, and Michal Irani. In defense of nearest-neighbor based image classification. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
Sabri Boughorbel, Jean-Philippe Tarel, and Nozha Boujemaa. Generalized histogram intersection kernel for image recognition. In Proc. Int'l Conf. on Image Processing, 2005.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1, pages 886–893, 2005.
Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
Mark Everingham, Andrew Zisserman, Christopher Williams, and Luc Van Gool. The PASCAL visual object classes challenge 2006 (VOC 2006) results, 2006.
Li Fei-Fei, Rob Fergus, and Pietro Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR 2004, Workshop on Generative-Model Based Vision, 2004.
Shenghua Gao, Ivor Wai-Hung Tsang, Liang-Tien Chia, and Peilin Zhao. Local features are not lonely – Laplacian sparse coding for image classification. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010.
Chih-Wei Hsu and Chih-Jen Lin. BSVM, 2006. Software available at http://www.csie.ntu.edu.tw/~cjlin/bsvm.
Frédéric Jurie and Bill Triggs. Creating efficient codebooks for visual recognition. In The IEEE Conf. on Computer Vision, volume 1, pages 604–610, 2005.
Svetlana Lazebnik and Maxim Raginsky. Supervised learning of quantizer codebooks by information loss minimization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 31(7):1294–1309, 2009.
Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume II, pages 2169–2178, 2006.
Li-Jia Li and Li Fei-Fei. What, where and who? Classifying events by scene and object recognition. In The IEEE Conf. on Computer Vision, 2007.
Jingen Liu and Mubarak Shah. Scene modeling using Co-Clustering. In The IEEE Conf. on Computer Vision, 2007.
David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
Subhransu Maji and Alexander C. Berg. Max-margin additive classifiers for detection. In The IEEE Conf. on Computer Vision, 2009.
Subhransu Maji, Alexander C. Berg, and Jitendra Malik. Classification using intersection kernel support vector machines is efficient. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
Frank Moosmann, Eric Nowak, and Frederic Jurie. Randomized clustering forests for image classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(9):1632–1646, 2008.
David Nistér and Henrik Stewénius. Scalable recognition with a vocabulary tree. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 2, pages 2161–2168, 2006.
Francesca Odone, Annalisa Barla, and Alessandro Verri. Building kernels from binary strings for image matching. IEEE Trans. Image Processing, 14(2):169–180, 2005.
Florent Perronnin. Universal and adapted vocabularies for generic visual categorization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(7):1243–1256, 2008.
James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
Ariadna Quattoni and Antonio Torralba. Recognizing indoor scenes. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.
Josef Sivic and Andrew Zisserman. Video Google: A text retrieval approach to object matching in videos. In The IEEE Conf. on Computer Vision, volume 2, pages 1470–1477, 2003.
Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11–32, 1991.
Tinne Tuytelaars and Cordelia Schmid. Vector quantizing feature space with a regular lattice. In The IEEE Conf. on Computer Vision, 2007.
Jan C. van Gemert, Jan-Mark Geusebroek, Cor J. Veenman, and Arnold W.M. Smeulders. Kernel codebooks for scene categorization. In European Conf. Computer Vision, 2008.
Andrea Vedaldi and Andrew Zisserman. Efficient additive kernels via explicit feature maps. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010.
Julia Vogel and Bernt Schiele. Semantic modeling of natural scenes for content-based image retrieval. International Journal of Computer Vision, 72(2):133–157, 2007.
Yair Weiss, Antonio Torralba, and Rob Fergus. Spectral hashing. In Advances in Neural Information Processing Systems 21, pages 1753–1760, 2009.
John Winn, Antonio Criminisi, and Thomas Minka. Object categorization by learned universal visual dictionary. In The IEEE Conf. on Computer Vision, volume 2, pages 1800–1807, 2005.
Jianxin Wu. A fast dual method for HIK SVM learning. In European Conf. Computer Vision, LNCS 6312, pages 552–565, 2010.
Jianxin Wu and James M. Rehg. Where am I: Place instance and category recognition using spatial PACT. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pages 1–8, 2008.
Jianxin Wu and James M. Rehg. Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. In The IEEE Conf. on Computer Vision, pages 630–637, 2009.
Jianxin Wu and James M. Rehg. CENTRIST: A visual descriptor for scene categorization. IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(8):1489–1501, 2011.
Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.
Liu Yang, Rong Jin, Rahul Sukthankar, and Frederic Jurie. Unifying discriminative visual codebook generation with classifier training for object category recognition. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2008.