
Joint Inverted Indexing (ICCV 2013)


Authors: Yan Xia, Kaiming He, Fang Wen, Jian Sun

Abstract: Inverted indexing is a popular non-exhaustive solution to large-scale search. An inverted file is built by a quantizer such as k-means or a tree structure. Multiple inverted files, obtained from multiple independent random quantizers, have been found to achieve good recall and speed in practice. Instead of computing the multiple quantizers independently, we present a method that creates them jointly: it first optimizes all codewords of all quantizers together and then assigns these codewords to the individual quantizers. In experiments, this method shows significant improvement over various existing methods that use multiple independent quantizers. On the one-billion set of SIFT vectors, our method is faster and more accurate than a recent state-of-the-art inverted indexing method.
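The baseline the abstract describes (several inverted files built from independently seeded k-means quantizers, queried by probing a cell in each file) can be sketched in pure Python. This is a toy illustration under stated assumptions, not the authors' implementation. The function names (`kmeans`, `build_inverted_file`, `search`, `joint_quantizers_naive`) are hypothetical. Only the single nearest cell is probed per file, and candidates are re-ranked by exact distance. `joint_quantizers_naive` only gestures at the joint idea: one k-means over all M·K codewords, followed by a split into M groups. Its random split is a placeholder and is not the assignment rule proposed in the paper.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, p):
    """Index of the codeword in `centers` closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(centers[c], p))


def kmeans(data, k, iters, seed):
    """Plain Lloyd's k-means; a different seed gives a different random quantizer."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in data:
            groups[nearest(centers, p)].append(p)
        # Move each non-empty cell's codeword to the mean of its members;
        # empty cells keep their previous codeword.
        for c, members in groups.items():
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_inverted_file(data, centers):
    """Inverted file: codeword id -> list of ids of the points quantized to it."""
    lists = defaultdict(list)
    for i, p in enumerate(data):
        lists[nearest(centers, p)].append(i)
    return lists


def search(query, data, quantizers, files, topk=1):
    """Probe the query's nearest cell in every inverted file, take the union
    of those candidate lists, and re-rank the candidates by exact distance."""
    candidates = set()
    for centers, lists in zip(quantizers, files):
        candidates.update(lists.get(nearest(centers, query), []))
    return sorted(candidates, key=lambda i: sq_dist(data[i], query))[:topk]


def joint_quantizers_naive(data, m, k, iters, seed):
    """HYPOTHETICAL stand-in for 'optimize all codewords jointly, then assign
    them to quantizers': one k-means run with m*k codewords, followed by a
    random split into m quantizers of k codewords each. The random split is
    a placeholder, not the assignment rule from the paper."""
    pool = kmeans(data, m * k, iters, seed)
    random.Random(seed).shuffle(pool)
    return [pool[j * k:(j + 1) * k] for j in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random(), rng.random()] for _ in range(500)]
    # Baseline: four independently seeded k-means quantizers, 16 cells each.
    independent = [kmeans(data, k=16, iters=10, seed=s) for s in range(4)]
    # Toy joint variant: 4 x 16 codewords trained together, then split.
    joint = joint_quantizers_naive(data, m=4, k=16, iters=10, seed=0)
    for name, quantizers in [("independent", independent), ("joint", joint)]:
        files = [build_inverted_file(data, c) for c in quantizers]
        print(name, search(data[42], data, quantizers, files, topk=3))
```

Every database point lands in exactly one cell of each inverted file, so a query that equals a stored point always retrieves itself. What matters in practice is how often the true neighbours of a new query fall into the probed cells. Using several inverted files raises that recall, because a neighbour missed by one quantizer's cell may be caught by another's.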

References

[1] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger. Closest point search in lattices. IEEE Transactions on Information Theory, 48(8):2201–2214, 2002.

[2] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51:117–122, 2008.

[3] A. Babenko and V. S. Lempitsky. The inverted multi-index. In CVPR, 2012.

[4] R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463. 1999.

[5] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1):107–117, 1998.

[6] M. S. Charikar. Similarity estimation techniques from rounding algorithms. In ACM Symposium on Theory of Computing, pages 380–388, 2002.

[7] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. In STOC, 2008.

[8] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253–262, 2004.

[9] Y. Freund, S. Dasgupta, M. Kabra, and N. Verma. Learning the structure of manifolds using random projections. In NIPS, 2007.

[10] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 3:209–226, 1977.

[11] K. Fukunaga and P. M. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Computers, 24:750–753, 1975.

[12] T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. In CVPR, 2013.

[13] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In CVPR, 2011.

[14] R. Gray. Vector quantization. IEEE ASSP Magazine, 1(2):4–29, 1984.

[15] K. He, F. Wen, and J. Sun. K-means Hashing: an Affinity-Preserving Quantization Method for Learning Binary Compact Codes. In CVPR, 2013.

[16] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, pages 604–613, 1998.

[17] H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, pages 304–317, 2008.

[18] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. TPAMI, 33:117–128, 2011.

[19] H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: re-rank with source coding. In ICASSP, pages 861–864, 2011.

[20] H. Lejsek, F. H. Ásmundsson, B. T. Jónsson, and L. Amsaleg. NV-tree: An efficient disk-based index for approximate search in very large high-dimensional collections. TPAMI, 31(5):869–883, 2009.

[21] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91–110, 2004.

[22] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297. University of California Press, 1967.

[23] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP, pages 331–340, 2009.

[24] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In CVPR, 2006.

[25] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42:145–175, 2001.

[26] L. Paulevé, H. Jégou, and L. Amsaleg. Locality sensitive hashing: a comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 31:1348–1358, 2010.

[27] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.

[28] C. Silpa-Anan and R. Hartley. Optimised kd-trees for fast image descriptor matching. In CVPR, 2008.

[29] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.

[30] A. B. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In CVPR, 2008.

[31] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, pages 1753–1760, 2008.

[32] I. H. Witten, A. Moffat, and T. C. Bell. Managing gigabytes: compressing and indexing documents and images. Morgan Kaufmann, 1999.

[33] Z. Wu, Q. Ke, M. Isard, and J. Sun. Bundling features for large scale partial-duplicate web image search. In CVPR, 2009.