nips nips2012 nips2012-329 nips2012-329-reference knowledge-graph by maker-knowledge-mining

329 nips-2012-Super-Bit Locality-Sensitive Hashing

Source: pdf

Author: Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Abstract: Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within (0, ⇡/2]. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve signiﬁcant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments. 1

reference text

[1] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Annual IEEE Symposium on Foundations of Computer Science, 2006.

[2] Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig. Syntactic clustering of the web. Computer Networks, 29(8-13):1157–1166, 1997.

[3] Moses Charikar. Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing, 2002.

[4] Yasuko Chikuse. Statistics on Special Manifolds. Springer, February 2003.

[5] Ondrej Chum, James Philbin, and Andrew Zisserman. Near duplicate image detection: min-hash and tf-idf weighting. In British Machine Vision Conference, 2008.

[6] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, 2004.

[7] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In International Conference on Very Large Databases, 1999.

[8] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisﬁability problems using semideﬁnite programming. Journal of the ACM, 42(6):1115–1145, 1995.

[9] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In ACM Symposium on Theory of Computing, 1998.

[10] Prateek Jain, Brian Kulis, and Kristen Grauman. Fast image search for learned metrics. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[11] Herv´ J´ gou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. e e IEEE Trans. Pattern Anal. Mach. Intell., 33(1):117–128, 2011.

[12] Brian Kulis and Trevor Darrell. Learning to hash with binary reconstructive embeddings. In Advances in Neural Information Processing Systems, 2009.

[13] Brian Kulis and Kristen Grauman. Kernelized locality-sensitive hashing for scalable image search. In IEEE International Conference on Computer Vision, 2009.

[14] Ping Li, Trevor Hastie, and Kenneth Ward Church. Improving random projections using marginal information. In COLT, pages 635–649, 2006.

[15] Ping Li, Trevor Hastie, and Kenneth Ward Church. Very sparse random projections. In KDD, pages 287–296, 2006.

[16] Ping Li and Arnd Christian K¨ nig. b-bit minwise hashing. In International World Wide Web Conference, o 2010.

[17] Ping Li, Anshumali Shrivastava, Joshua L. Moore, and Arnd Christian K¨ nig. Hashing algorithms for o large-scale learning. In Advances in Neural Information Processing Systems, 2011.

[18] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, and Shih-Fu Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081, 2012.

[19] Wei Liu, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Hashing with graphs. In ICML, pages 1–8, 2011.

[20] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[21] Yadong Mu, Jialie Shen, and Shuicheng Yan. Weakly-supervised hashing in kernel space. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[22] Antonio Torralba, Robert Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(11):1958–1970, 2008.

[23] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Sequential projection learning for hashing with compact codes. In International Conference on Machine Learning, 2010.

[24] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Semi-supervised hashing for large scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99(PrePrints), 2012.

[25] Yair Weiss, Antonio Torralba, and Robert Fergus. Spectral hashing. In Advances in Neural Information Processing Systems, 2008.

[26] Simon A. J. Winder and Matthew Brown. Learning local image descriptors. In IEEE Conference on Computer Vision and Pattern Recognition, 2007. 9