nips nips2009 nips2009-198 nips2009-198-reference knowledge-graph by maker-knowledge-mining

198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions


Source: pdf

Author: Parikshit Ram, Dongryeol Lee, Hua Ouyang, Alexander G. Gray

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
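The core idea of rank control can be sketched with random sampling: if a uniform sample of size n misses all of the true τ nearest neighbors with probability at most δ, then the best point in the sample has rank at most τ with probability at least 1 − δ. The sketch below is a minimal illustration of that sampling argument only; the function name `rank_approx_nn` and the brute-force-over-a-sample structure are assumptions for exposition, not the paper's actual tree-based algorithm.

```python
import math

import numpy as np


def rank_approx_nn(query, data, tau, delta, seed=None):
    """Return (index, distance) of a point whose rank among the
    N points in `data` is at most `tau` with probability >= 1 - delta,
    by brute-force search over a uniform random sample (illustrative
    sketch; the paper combines sampling with tree traversal)."""
    rng = np.random.default_rng(seed)
    n_total = len(data)
    if tau >= n_total:
        # Any point satisfies the rank guarantee; search everything.
        n_sample = n_total
    else:
        # Sampling with replacement, the probability that none of the
        # true tau nearest neighbors appears in a sample of size n is
        # (1 - tau/N)**n; pick the smallest n making that <= delta.
        n_sample = min(
            n_total,
            math.ceil(math.log(delta) / math.log(1.0 - tau / n_total)),
        )
    idx = rng.choice(n_total, size=n_sample, replace=True)
    dists = np.linalg.norm(data[idx] - query, axis=1)
    best = int(idx[np.argmin(dists)])
    return best, float(np.linalg.norm(data[best] - query))
```

Note that the required sample size depends on the rank fraction τ/N and δ, but not on the dimensionality, which is what makes the rank-based guarantee attractive in high dimensions.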


reference text

[1] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

[2] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

[3] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, 2000.

[4] S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, December 2000.

[5] A. N. Papadopoulos and Y. Manolopoulos. Nearest Neighbor Search: A Database Perspective. Springer, 2005.

[6] N. Alon, M. B˘ doiu, E. D. Demaine, M. Farach-Colton, and M. T. Hajiaghayi. Ordinal Ema beddings of Minimum Relaxation: General Properties, Trees, and Ultrametrics. 2008.

[7] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When Is “Nearest Neighbor” Meaningful? Lecture Notes in Computer Science, pages 217–235, 1999.

[8] J. M. Hammersley. The Distribution of Distance in a Hypersphere. Annals of Mathematical Statistics, 21:447–452, 1950.

[9] J. H. Friedman, J. L. Bentley, and R. A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw., 3(3):209–226, September 1977.

[10] S. M. Omohundro. Five Balltree Construction Algorithms. Technical Report TR-89-063, International Computer Science Institute, December 1989.

[11] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer, 1985.

[12] A. Beygelzimer, S. Kakade, and J. Langford. Cover Trees for Nearest Neighbor. In Proceedings of the 23rd International Conference on Machine Learning, pages 97–104, 2006.

[13] A. G. Gray and A. W. Moore. ‘N-Body’ Problems in Statistical Learning. In NIPS, volume 4, pages 521–527, 2000.

[14] T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An Investigation of Practical Approximate Nearest Neighbor Algorithms. In Advances in Neural Information Processing Systems 17, pages 825–832, 2005.

[15] L. Cayton. Fast Nearest Neighbor Retrieval for Bregman Divergences. In Proceedings of the 25th International Conference on Machine Learning, pages 112–119, 2008.

[16] T. Liu, A. W. Moore, and A. G. Gray. Efficient Exact k-NN and Nonparametric Classification in High Dimensions. 2004.

[17] P. Ciaccia and M. Patella. PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces. In Proceedings of the 16th International Conference on Data Engineering, pages 244–255, 2000.

[18] A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In VLDB, pages 518–529, 1999.

[19] P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In STOC, pages 604–613, 1998.

[20] J. Sedransk and J. Meyer. Confidence Intervals for the Quantiles of a Finite Population: Simple Random and Stratified Simple Random Sampling. Journal of the Royal Statistical Society, pages 239–252, 1978.

[21] C. L. Blake and C. J. Merz. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/, 1998.

[22] Y. LeCun. MNIST dataset, 2000. http://yann.lecun.com/exdb/mnist/.