nips nips2013 nips2013-169 nips2013-169-reference knowledge-graph by maker-knowledge-mining

169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

Source: pdf

Author: Leonid Boytsov, Bilegsaikhan Naidan

Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efﬁcient for the same rank approximation quality. 1

reference text

[1] G. Amato, F. Rabitti, P. Savino, and P. Zezula. Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inf. Syst., 21(2):192–227, Apr. 2003.

[2] G. Amato and P. Savino. Approximate similarity search in metric spaces using inverted ﬁles. In Proceedings of the 3rd international conference on Scalable information systems, InfoScale ’08, pages 28:1–28:10, ICST, Brussels, Belgium, Belgium, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).

[3] V. Athitsos, J. Alon, S. Sclaroff, and G. Kollios. BoostMap: A method for efﬁcient approximate similarity rankings. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II–268 – II–275 Vol.2, june-2 july 2004.

[4] J. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.

[5] L. Boytsov and B. Naidan. Engineering efﬁcient and effective Non-Metric Space Library. In N. Brisaboa, O. Pedreira, and P. Zezula, editors, Similarity Search and Applications, volume 8199 of Lecture Notes in Computer Science, pages 280–293. Springer Berlin Heidelberg, 2013.

[6] L. Cayton. Fast nearest neighbor retrieval for Bregman divergences. In Proceedings of the 25th international conference on Machine learning, ICML ’08, pages 112–119, New York, NY, USA, 2008. ACM.

[7] L. Cayton and S. Dasgupta. A learning framework for nearest neighbor search. Advances in Neural Information Processing Systems, 20, 2007.

[8] E. Ch´ vez, K. Figueroa, and G. Navarro. Effective proximity retrieval by ordering permutaa tions. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(9):1647 –1658, sept. 2008.

[9] E. Ch´ vez and G. Navarro. Probabilistic proximity search: Fighting the curse of dimensionality a in metric spaces. Information Processing Letters, 85(1):39–46, 2003.

[10] E. Ch´ vez, G. Navarro, R. Baeza-Yates, and J. L. Marroquin. Searching in metric spaces. ACM a Computing Surveys, 33(3):273–321, 2001.

[11] W. Dong, Z. Wang, W. Josephson, M. Charikar, and K. Li. Modeling LSH for performance tuning. In Proceedings of the 17th ACM conference on Information and knowledge management, CIKM ’08, pages 669–678, New York, NY, USA, 2008. ACM.

[12] O. Edsberg and M. L. Hetland. Indexing inexact proximity search with distance regression in pivot space. In Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP ’10, pages 51–58, New York, NY, USA, 2010. ACM.

[13] A. Esuli. Use of permutation preﬁxes for efﬁcient and scalable approximate similarity search. Inf. Process. Manage., 48(5):889–902, Sept. 2012.

[14] E. Gonzalez, K. Figueroa, and G. Navarro. Effective proximity retrieval by ordering permutations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(9):1647–1658, 2008.

[15] L. V. Hedges and J. L. Vevea. Fixed-and random-effects models in meta-analysis. Psychological methods, 3(4):486–504, 1998. 8

[16] G. Hjaltason and H. Samet. Properties of embedding methods for similarity searching in metric spaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(5):530–549, 2003.

[17] P. Indyk. Nearest neighbors in high-dimensional spaces. In J. E. Goodman and J. O’Rourke, editors, Handbook of discrete and computational geometry, pages 877–892. Chapman and Hall/CRC, 2004.

[18] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, STOC ’98, pages 604–613, New York, NY, USA, 1998. ACM.

[19] D. Jacobs, D. Weinshall, and Y. Gdalyahu. Classiﬁcation with nonmetric distances: Image retrieval and class representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(6):583–600, 2000.

[20] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efﬁcient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the 30th annual ACM symposium on Theory of computing, STOC ’98, pages 614–623, New York, NY, USA, 1998. ACM. ´

[21] H. Lejsek, F. Asmundsson, B. J´ nsson, and L. Amsaleg. NV-Tree: An efﬁcient disk-based o index for approximate search in very large high-dimensional collections. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(5):869 –883, may 2009.

[22] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efﬁcient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 950–961. VLDB Endowment, 2007.

[23] Y. Mu and S. Yan. Non-metric locality-sensitive hashing. In AAAI, 2010.

[24] T. Murakami, K. Takahashi, S. Serita, and Y. Fujii. Versatile probability-based indexing for approximate similarity search. In Proceedings of the Fourth International Conference on SImilarity Search and APplications, SISAP ’11, pages 51–58, New York, NY, USA, 2011. ACM.

[25] V. Pestov. Indexability, concentration, and {VC} theory. Journal of Discrete Algorithms, 13(0):2 – 18, 2012. Best Papers from the 3rd International Conference on Similarity Search and Applications (SISAP 2010).

[26] P. Ram, D. Lee, H. Ouyang, and A. G. Gray. Rank-approximate nearest neighbor search: Retaining meaning and speed in high dimensions. In Advances in Neural Information Processing Systems, pages 1536–1544, 2009.

[27] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., 2005.

[28] J. Uhlmann. Satisfying general proximity similarity queries with metric trees. Information Processing Letters, 40:175–179, 1991.

[29] J. Vermorel. Near neighbor search in metric and nonmetric space, 2005. http:// hal.archives-ouvertes.fr/docs/00/03/04/85/PDF/densitree.pdf last accessed on Nov 1st 2012.

[30] R. Weber, H. J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24th International Conference on Very Large Data Bases, pages 194–205. Morgan Kaufmann, August 1998.

[31] P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, SODA ’93, pages 311–321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics.

[32] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005.

[33] P. Zezula, P. Savino, G. Amato, and F. Rabitti. Approximate similarity retrieval with M-trees. The VLDB Journal, 7(4):275–293, Dec. 1998.

[34] Z. Zhang, B. C. Ooi, S. Parthasarathy, and A. K. H. Tung. Similarity search on Bregman divergence: towards non-metric indexing. Proc. VLDB Endow., 2(1):13–24, Aug. 2009. 9