nips nips2012 nips2012-42 nips2012-42-reference knowledge-graph by maker-knowledge-mining

42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search


Source: pdf

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1


reference text

[1] A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra. Clustering on the unit hypersphere using von Mises-Fisher distributions. JMLR, 2005.

[2] A. Bergamo, L. Torresani, and A. Fitzgibbon. Picodes: Learning a compact code for novelcategory recognition. NIPS, 2011.

[3] A. Broder. On the resemblance and containment of documents. Compression and Complexity of Sequences, 1997.

[4] M. S. Charikar. Similarity estimation techniques from rounding algorithms. STOC, 2002.

[5] X. Chen, B. Bai, Y. Qi, Q. Lin, and J. Carbonell. Sparse latent semantic analysis. SDM, 2011.

[6] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. CVPR, 2009.

[7] Y. Gong and S. Lazebnik. Iterative quantization: A Procrustean approach to learning binary codes. CVPR, 2011.

[8] J. He, R. Radhakrishnan, S.-F. Chang, and C. Bauer. Compact hashing with joint optimization of search accuracy and time. CVPR, 2011.

[9] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon. Spherical hashing. CVPR, 2012.

[10] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. CVPR, 2008.

[11] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. NIPS, 2009.

[12] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In ICCV, 2009.

[13] P. Li and C. Konig. Theory and applications of b-bit minwise hashing. Communications of the ACM, 2011.

[14] P. Li, A. Shrivastava, J. Moore, and C. Konig. Hashing algorithms for large-scale learning. NIPS, 2011.

[15] W. Liu, S. Kumar, and S.-F. Chang. Hashing with graphs. ICML, 2011.

[16] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. CVPR, 2012.

[17] C. D. Manning and H. Sch¨ tze. Foundations of statistical natural language processing. MIT u Press, 1999.

[18] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. ICML, 2011.

[19] F. Perronnin, J. Sanchez, , and Y. Liu. Large-scale image categorization with explicit data embedding. CVPR, 2010.

[20] M. Raginsky and S. Lazebnik. Locality sensitive binary codes from sift-invariant kernels. NIPS, 2009.

[21] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 2009.

[22] A. Shrivastava and P. Li. Fast near neighbor search in high-dimensional binary data. ECML, 2012.

[23] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. CVPR, 2008.

[24] J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for scalable image retrieval. CVPR, 2010.

[25] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. CVPR, 2010.

[26] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS, 2008.

[27] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from Abbey to Zoo. CVPR, 2010.

[28] G. K. Zipf. The psychobiology of language. Houghton-Mifflin, 1935. 9