jmlr2008-96
Source: pdf
Author: Laurens van der Maaten, Geoffrey Hinton
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets. Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling
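For readers who want to try the technique the abstract describes, the following is a minimal usage sketch, not the authors' original implementation: it uses scikit-learn's TSNE (an independent implementation of the algorithm in this paper), and the choice of the digits data set and the perplexity value are illustrative assumptions rather than specifics from the paper.

    # Minimal t-SNE usage sketch. Assumes scikit-learn is installed;
    # sklearn.manifold.TSNE is an independent implementation of this
    # paper's algorithm, not the authors' original code.
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    # 1797 handwritten-digit images, each a 64-dimensional vector.
    X, labels = load_digits(return_X_y=True)

    # Give each high-dimensional datapoint a location in a 2-D map.
    # perplexity sets the effective number of neighbors (typically 5-50).
    Y = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)

    print(Y.shape)  # (1797, 2): one map point per input datapoint

Plotting Y with points colored by labels typically shows the well-separated class clusters the abstract refers to.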
References:

W.E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quarterly of Applied Mathematics, 9:17–25, 1951.

G.D. Battista, P. Eades, R. Tamassia, and I.G. Tollis. Annotated bibliography on graph drawing. Computational Geometry: Theory and Applications, 4:235–282, 1994.

M. Belkin and P. Niyogi. Laplacian Eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, volume 14, pages 585–591, Cambridge, MA, USA, 2002. The MIT Press.

Y. Bengio. Learning deep architectures for AI. Technical Report 1312, Université de Montréal, 2007.

N. Biggs. Algebraic graph theory. In Cambridge Tracts in Mathematics, volume 67. Cambridge University Press, 1974.

H. Chernoff. The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, 68:361–368, 1973.

J.A. Cook, I. Sutskever, A. Mnih, and G.E. Hinton. Visualizing similarity data with a mixture of maps. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, volume 2, pages 67–74, 2007.

M.C. Ferreira de Oliveira and H. Levkowitz. From visual data exploration to visual data mining: A survey. IEEE Transactions on Visualization and Computer Graphics, 9(3):378–394, 2003.

V. de Silva and J.B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In Advances in Neural Information Processing Systems, volume 15, pages 721–728, Cambridge, MA, USA, 2003. The MIT Press.

P. Demartines and J. Hérault. Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8(1):148–154, 1997.

P. Doyle and L. Snell. Random walks and electric networks. In Carus Mathematical Monographs, volume 22. Mathematical Association of America, 1984.

D.R. Fokkema, G.L.G. Sleijpen, and H.A. van der Vorst. Jacobi–Davidson style QR and QZ algorithms for the reduction of matrix pencils. SIAM Journal on Scientific Computing, 20(1):94–125, 1999.

L. Grady. Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11):1768–1783, 2006.

G.E. Hinton and S.T. Roweis. Stochastic Neighbor Embedding. In Advances in Neural Information Processing Systems, volume 15, pages 833–840, Cambridge, MA, USA, 2002. The MIT Press.

G.E. Hinton and R.R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417–441, 1933.

R.A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1:295–307, 1988.

S. Kakutani. Markov processes and the Dirichlet problem. Proceedings of the Japan Academy, 21:227–233, 1945.

D.A. Keim. Designing pixel-oriented visualization techniques: Theory and applications. IEEE Transactions on Visualization and Computer Graphics, 6(1):59–78, 2000.

S. Lafon and A.B. Lee. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1393–1403, 2006.

J.A. Lee and M. Verleysen. Nonlinear dimensionality reduction of data manifolds with essential loops. Neurocomputing, 67:29–53, 2005.

J.A. Lee and M. Verleysen. Nonlinear dimensionality reduction. Springer, New York, NY, USA, 2007.

J.A. Lee, A. Lendasse, N. Donckers, and M. Verleysen. A robust nonlinear projection method. In Proceedings of the 8th European Symposium on Artificial Neural Networks, pages 13–20, 2000.

K.V. Mardia, J.T. Kent, and J.M. Bibby. Multivariate Analysis. Academic Press, 1979.

M. Meytlis and L. Sirovich. On the dimensionality of face space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(7):1262–1267, 2007.

B. Nadler, S. Lafon, R.R. Coifman, and I.G. Kevrekidis. Diffusion maps, spectral clustering and the reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis: Special Issue on Diffusion Maps and Wavelets, 21:113–127, 2006.

S.A. Nene, S.K. Nayar, and H. Murase. Columbia Object Image Library (COIL-20). Technical Report CUCS-005-96, Columbia University, 1996.

S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.

J.W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18(5):401–409, 1969.

L. Song, A.J. Smola, K. Borgwardt, and A. Gretton. Colored Maximum Variance Unfolding. In Advances in Neural Information Processing Systems, volume 21 (in press), 2007.

W.N. Street, W.H. Wolberg, and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. In Proceedings of the IS&T/SPIE International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861–870, 1993.

M. Szummer and T. Jaakkola. Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems, volume 14, pages 945–952, 2001.

J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

W.S. Torgerson. Multidimensional scaling I: Theory and method. Psychometrika, 17:401–419, 1952.

L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. Dimensionality reduction: A comparative review. Online preprint, 2008.

K.Q. Weinberger, F. Sha, and L.K. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the 21st International Conference on Machine Learning, 2004.

K.Q. Weinberger, F. Sha, Q. Zhu, and L.K. Saul. Graph Laplacian regularization for large-scale semidefinite programming. In Advances in Neural Information Processing Systems, volume 19, 2007.

C.K.I. Williams. On a connection between Kernel PCA and metric multidimensional scaling. Machine Learning, 46(1-3):11–19, 2002.

X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning, pages 912–919, 2003.