nips nips2011 nips2011-157 nips2011-157-reference knowledge-graph by maker-knowledge-mining

157 nips-2011-Learning to Search Efficiently in High Dimensions

Source: pdf

Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang

Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efﬁciency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efﬁciency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efﬁciency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efﬁcient large scale search. Our approach takes both search quality and computational cost into consideration. Speciﬁcally, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efﬁciently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efﬁciency. Experimental results show that our approach signiﬁcantly outperforms the start-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).

reference text

[1] J. S. Beis and D. G. Lowe. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In CVPR, pages 1000–1006, 1997.

[2] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry, pages 253–262, 2004.

[3] J. Friedman, J. Bentley, and R. Finkel. An algorithm for ﬁnding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226, 1977.

[4] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2):337–374, 2000.

[5] K. Fukunage and P. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers, 100(7):750–753, 1975.

[6] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518–529, 1999.

[7] J. He, W. Liu, and S.-F. Chang. Scalable similarity search with optimized kernel hashing. In KDD, 2010.

[8] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In CVPR, 2008.

[9] V. Jain and M. Varma. Learning to re-rank: query-dependent image re-ranking using click data. In WWW, pages 277–286, 2011.

[10] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. NIPS, 2009.

[11] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In ICCV, 2009.

[12] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and T. Huang. Large-scale image classiﬁcation: fast feature extraction and svm training. In CVPR, 2011.

[13] T.-Y. Liu. Learning to rank for information retrieval. In SIGIR, page 904, 2010.

[14] W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In ICML, 2011.

[15] F. Moosmann, B. Triggs, and F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. In NIPS, pages 985–992, 2006.

[16] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm conﬁguration. In VISSAPP, 2009.

[17] M. Naphade, J. Smith, J. Tesic, S. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis. Large-scale concept ontology for multimedia. IEEE Multimedia Magazine, 13(3):86–91, 2006.

[18] D. Nist´ r and H. Stew´ nius. Scalable recognition with a vocabulary tree. In CVPR, pages e e 2161–2168, 2006.

[19] R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969– 978, 2009.

[20] G. Shakhnarovich. Learning task-speciﬁc similarity. PhD thesis, Massachusetts Institute of Technology, 2005.

[21] G. Shakhnarovich, T. Darrell, and P. Indyk. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice. The MIT Press, 2006.

[22] C. Silpa-Anan and R. Hartley. Optimised kd-trees for fast image descriptor matching. In CVPR, 2008.

[23] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. PAMI, 30(11), 2008.

[24] J. Wang, O. Kumar, and S.-F. Chang. Semi-supervised hashing for scalable image retrieval. In CVPR, 2010.

[25] Weber, Roger, Schek, Hans J., and Blott, Stephen. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In VLDB, 1998.

[26] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2008.

[27] T. Yeh, J. J. Lee, and T. Darrell. Adaptive vocabulary forests br dynamic indexing and category learning. In ICCV, pages 1–8, 2007.

[28] X. Zhou, N. Cui, Z. Li, F. Liang, and T. S. Huang. Hierarchical gaussianization for image classiﬁcation. In ICCV, 2009. 9