
112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors


Source: pdf

Author: Yangqing Jia, Trevor Darrell

Abstract: Many applications in computer vision measure the similarity between images or image patches based on statistics such as oriented gradients. These are often modeled, implicitly or explicitly, with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate instead for a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
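The abstract's core argument can be sketched numerically. Under an i.i.d. noise model, the negative log-likelihood-ratio distance between two descriptors decomposes into per-dimension penalties determined by the assumed noise distribution: Gaussian noise yields the squared Euclidean distance, while Laplace noise yields the L1 distance. The paper's actual Gamma-compound-Laplace distance is not reproduced in this abstract, so the sketch below uses the simpler Laplace case purely as a stand-in to illustrate why a heavier-tailed model is less dominated by occasional large per-dimension deviations:

```python
import numpy as np

def euclidean_sq(x, y):
    # Squared Euclidean distance: the negative log-likelihood-ratio
    # distance under i.i.d. Gaussian noise (up to scale).
    d = x - y
    return float(np.dot(d, d))

def laplace_llr(x, y, b=1.0):
    # L1 distance: the negative log-likelihood-ratio distance under
    # i.i.d. Laplace(0, b) noise (up to scale). Large deviations are
    # penalized linearly rather than quadratically.
    return float(np.abs(x - y).sum() / b)

rng = np.random.default_rng(0)
x = rng.random(128)                               # a SIFT-like 128-d descriptor
y_inlier = x + 0.01 * rng.standard_normal(128)    # small noise in every dimension
y_outlier = x.copy()
y_outlier[0] += 1.0                               # one large (heavy-tailed) deviation

# Relative to the inlier match, the squared-Euclidean distance inflates
# the outlier match far more than the Laplace-based distance does.
l2_ratio = euclidean_sq(x, y_outlier) / euclidean_sq(x, y_inlier)
l1_ratio = laplace_llr(x, y_outlier) / laplace_llr(x, y_inlier)
```

Here `l2_ratio` is much larger than `l1_ratio`: the quadratic penalty lets a single heavy-tailed coordinate dominate the Euclidean distance, which is the failure mode the paper's heavy-tailed modeling addresses.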


reference text

[1] A Agarwal and H Daume III. Generative kernels for exponential families. In AISTATS, 2011.

[2] A Banerjee, S Merugu, I Dhillon, and J Ghosh. Clustering with Bregman divergences. JMLR, 6:1705– 1749, 2005.

[3] JT Barron and J Malik. High-frequency shape and albedo from shading using natural image statistics. In CVPR, 2011.

[4] O Boiman, E Shechtman, and M Irani. In defense of nearest-neighbor based image classification. In CVPR, 2008.

[5] P Chen, Y Chen, and M Rao. Metrics defined by Bregman divergences. Communications in Mathematical Sciences, 6(4):915–926, 2008.

[6] N Dalal and B Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[7] AC Davison. Statistical models. Cambridge Univ Press, 2003.

[8] J Huang, AB Lee, and D Mumford. Statistics of range images. In CVPR, 2000.

[9] S Ji, Y Xue, and L Carin. Bayesian compressive sensing. IEEE Trans. Signal Processing, 56(6):2346– 2356, 2008.

[10] Y Jia, M Salzmann, and T Darrell. Factorized latent spaces with structured sparsity. In NIPS, 2010.

[11] D Koller and N Friedman. Probabilistic graphical models. MIT press, 2009.

[12] B Kulis and T Darrell. Learning to hash with binary reconstructive embeddings. In NIPS, 2009.

[13] S Lazebnik, C Schmid, and J Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.

[14] D Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[15] J Mairal, F Bach, J Ponce, and G Sapiro. Online learning for matrix factorization and sparse coding. JMLR, 11:19–60, 2010.

[16] AW Moore. The anchors hierarchy: using the triangle inequality to survive high dimensional data. In UAI, 2000.

[17] A Oliva and A Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001.

[18] B Olshausen and D Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996.

[19] M Ozuysal and P Fua. Fast keypoint recognition in ten lines of code. In CVPR, 2007.

[20] J Portilla, V Strela, MJ Wainwright, and EP Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing, 12(11):1338–1351, 2003.

[21] M Riesenhuber and T Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2:1019–1025, 1999.

[22] G Shakhnarovich, P Viola, and T Darrell. Fast pose estimation with parameter-sensitive hashing. In ICCV, 2003.

[23] EP Simoncelli. Statistical models for images: compression, restoration and synthesis. In Asilomar Conference on Signals, Systems & Computers, 1997.

[24] Y Weiss and WT Freeman. What makes a good model of natural images? In CVPR, 2007.

[25] S Winder and M Brown. Learning local image descriptors. In CVPR, 2007.

[26] L Wolf, T Hassner, and Y Taigman. The one-shot similarity kernel. In ICCV, 2009.