
64 nips-2011-Convergent Bounds on the Euclidean Distance


Source: pdf

Author: Yoonho Hwang, Hee-kap Ahn

Abstract: Given a set V of n vectors in d-dimensional space, we provide an efficient method for computing high-quality upper and lower bounds on the Euclidean distance between any pair of vectors in V. For this purpose, we define a distance measure, called the MS-distance, using the means and the standard deviations of the vectors in V. Once the mean and the standard deviation of every vector in V have been computed in O(dn) time, the MS-distance provides upper and lower bounds on the Euclidean distance between any pair of vectors in V in constant time. Furthermore, these bounds can be refined so that they converge monotonically to the exact Euclidean distance within d refinement steps. An analysis of a random sequence of refinement steps shows that the MS-distance provides very tight bounds after only a few refinement steps. The MS-distance can be used in various applications where the Euclidean distance measures the proximity or similarity between objects. We provide experimental results on nearest and farthest neighbor searches.
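To make the construction concrete, the Python sketch below reconstructs bounds of this form from the description in the abstract. It is a minimal sketch, not the authors' code: the squared-distance formulation, the function names (preprocess, ms_bounds, refine), and the coordinate-by-coordinate refinement order are all assumptions. Writing each vector as its mean plus a zero-mean remainder and applying the Cauchy-Schwarz inequality to the remainders yields constant-time bounds from the means and standard deviations alone; accounting for the remainders exactly, one coordinate per step, tightens the bounds monotonically until they collapse to the exact squared distance after d steps.

import numpy as np

def preprocess(V):
    """O(dn) preprocessing: per-vector mean and standard deviation.

    V is an (n, d) array of n vectors in d-dimensional space.
    """
    return V.mean(axis=1), V.std(axis=1)

def ms_bounds(mu_x, s_x, mu_y, s_y, d):
    """Constant-time bounds on the squared Euclidean distance.

    With x = mu_x * 1 + xc and y = mu_y * 1 + yc (1^T xc = 1^T yc = 0),
    ||x - y||^2 = d (mu_x - mu_y)^2 + d s_x^2 + d s_y^2 - 2 <xc, yc>,
    and Cauchy-Schwarz gives |<xc, yc>| <= d s_x s_y, which yields the
    interval returned below.
    """
    gap = (mu_x - mu_y) ** 2
    return d * (gap + (s_x - s_y) ** 2), d * (gap + (s_x + s_y) ** 2)

def refine(x, y):
    """Yield monotonically shrinking bounds on ||x - y||^2.

    After step i the centered inner product <xc, yc> is known exactly on
    the first i + 1 coordinates; Cauchy-Schwarz bounds the remaining tail.
    The intervals are nested and collapse to the exact squared distance
    after d steps.
    """
    d = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    # The exact squared distance equals const - 2 <xc, yc>.
    const = d * (x.mean() - y.mean()) ** 2 + xc @ xc + yc @ yc
    partial = 0.0              # exactly accumulated part of <xc, yc>
    tx, ty = xc @ xc, yc @ yc  # squared norms of the unseen tails
    for i in range(d):
        partial += xc[i] * yc[i]
        tx -= xc[i] ** 2
        ty -= yc[i] ** 2
        slack = np.sqrt(max(tx, 0.0) * max(ty, 0.0))  # clamp float error
        yield const - 2.0 * (partial + slack), const - 2.0 * (partial - slack)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.standard_normal((100, 64))
    mu, s = preprocess(V)
    d = V.shape[1]
    lo, hi = ms_bounds(mu[0], s[0], mu[1], s[1], d)
    exact = float(np.sum((V[0] - V[1]) ** 2))
    assert lo <= exact <= hi
    for lo, hi in refine(V[0], V[1]):
        assert lo - 1e-9 <= exact <= hi + 1e-9
    print(f"final interval [{lo:.6f}, {hi:.6f}] vs exact {exact:.6f}")

Taking square roots of the interval endpoints gives bounds on the Euclidean distance itself. Because the intervals from refine are nested at every step, a search procedure can stop refining a candidate as soon as its interval no longer overlaps the current best, which is what makes such bounds usable for pruning in nearest and farthest neighbor searches.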


Reference text

[1] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. University of California Press, 1967.

[2] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, NY, USA, 1995.

[3] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2:559–572, 1901.

[4] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18:509–517, September 1975.

[5] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Beatrice Yormark, editor, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD ’84, pages 47–57. ACM, 1984.

[6] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages 97–104, New York, NY, USA, 2006. ACM.

[7] D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, STOC ’02, pages 741–750, New York, NY, USA, 2002. ACM.

[8] Ö. Eğecioğlu and H. Ferhatosmanoğlu. Dimensionality reduction and similarity computation by inner product approximations. In Proceedings of the Ninth International Conference on Information and Knowledge Management, CIKM ’00, pages 219–226, New York, NY, USA, 2000. ACM.

[9] R. Weber, H. J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98, pages 194–205, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.

[10] H. V. Jagadish, B. C. Ooi, K. L. Tan, C. Yu, and R. Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems, 30:364–397, June 2005.

[11] UCI Machine Learning Archive. http://archive.ics.uci.edu/ml/.

[12] NIPS 2003 challenge on feature selection. http://clopinet.com/isabelle/projects/nips2003/.

[13] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is “nearest neighbor” meaningful? In Proceedings of the 7th International Conference on Database Theory, ICDT ’99, pages 217–235, London, UK, 1999. Springer.