nips nips2013 nips2013-296 nips2013-296-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marco Cuturi
Abstract: Optimal transport distances are a fundamental family of distances for probability measures and histograms of features. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost can quickly become prohibitive whenever the size of the support of these measures or the histograms’ dimension exceeds a few hundred. We propose in this work a new family of optimal transport distances that look at transport problems from a maximumentropy perspective. We smooth the classic optimal transport problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn’s matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transport solvers. We also show that this regularized distance improves upon classic optimal transport distances on the MNIST classification problem.
Ahuja, R., Magnanti, T., and Orlin, J. (1993). Network Flows: Theory, Algorithms and Applications. Prentice Hall. Avis, D. (1980). On the extreme rays of the metric cone. Canadian Journal of Mathematics, 32(1):126–144. Berg, C., Christensen, J., and Ressel, P. (1984). Harmonic Analysis on Semigroups. Number 100 in Graduate Texts in Mathematics. Springer Verlag. Bieling, J., Peschlow, P., and Martini, P. (2010). An efficient gpu implementation of the revised simplex method. In Parallel Distributed Processing, 2010 IEEE International Symposium on, pages 1–8. Brickell, J., Dhillon, I., Sra, S., and Tropp, J. (2008). The metric nearness problem. SIAM J. Matrix Anal. Appl, 30(1):375–396. Brualdi, R. A. (2006). Combinatorial matrix classes, volume 108. Cambridge University Press. Cover, T. and Thomas, J. (1991). Elements of Information Theory. Wiley & Sons. Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43(5):1470–1480. Erlander, S. and Stewart, N. (1990). The gravity model in transportation analysis: theory and extensions. Vsp. Ferradans, S., Papadakis, N., Rabin, J., Peyr´ , G., Aujol, J.-F., et al. (2013). Regularized discrete optimal e transport. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 1–12. Franklin, J. and Lorenz, J. (1989). On the scaling of multidimensional matrices. Linear Algebra and its applications, 114:717–735. Good, I. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. The Annals of Mathematical Statistics, pages 911–934. Grauman, K. and Darrell, T. (2004). Fast contour matching using approximate earth mover’s distance. In IEEE Conf. Vision and Patt. Recog., pages 220–227. Gudmundsson, J., Klein, O., Knauer, C., and Smid, M. (2007). Small manhattan networks and algorithmic applications for the earth movers distance. In Proceedings of the 23rd European Workshop on Computational Geometry, pages 174–177. Indyk, P. and Thaper, N. (2003). Fast image retrieval via embeddings. In 3rd International Workshop on Statistical and Computational Theories of Vision (at ICCV). Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev., 106:620–630. Knight, P. A. (2008). The sinkhorn-knopp algorithm: convergence and applications. SIAM Journal on Matrix Analysis and Applications, 30(1):261–275. Ling, H. and Okada, K. (2007). An efficient earth mover’s distance algorithm for robust histogram comparison. IEEE transactions on Patt. An. and Mach. Intell., pages 840–853. Naor, A. and Schechtman, G. (2007). Planar earthmover is not in l1 . SIAM J. Comput., 37(3):804–826. Pele, O. and Werman, M. (2009). Fast and robust earth mover’s distances. In ICCV’09. Rubner, Y., Guibas, L., and Tomasi, C. (1997). The earth movers distance, multi-dimensional scaling, and color-based image retrieval. In Proceedings of the ARPA Image Understanding Workshop, pages 661–668. Shirdhonkar, S. and Jacobs, D. (2008). Approximate earth movers distance in linear time. In CVPR 2008, pages 1–8. IEEE. Sinkhorn, R. (1967). Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly, 74(4):402–405. Villani, C. (2009). Optimal transport: old and new, volume 338. Springer Verlag. Wilson, A. G. (1969). The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy, pages 108–126. 9