jmlr jmlr2013 jmlr2013-59 jmlr2013-59-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Ameet Talwalkar, Sanjiv Kumar, Mehryar Mohri, Henry Rowley
Abstract: This paper examines the efficacy of sampling-based low-rank approximation techniques when applied to large dense kernel matrices. We analyze two common approximate singular value decomposition techniques, namely the Nyström and Column sampling methods. We present a theoretical comparison between these two methods, provide novel insights regarding their suitability for various tasks, and present experimental results that support our theory. Our results illustrate the relative strengths of each method. We next examine the performance of these two techniques on the large-scale task of extracting low-dimensional manifold structure given millions of high-dimensional face images. We address the computational challenges of non-linear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. We present extensive experiments on learning low-dimensional embeddings for two large face data sets: CMU-PIE (35 thousand faces) and a web data set (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column sampling method for this task. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE data set.
Keywords: low-rank approximation, manifold learning, large-scale matrix factorization
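As background for the comparison the abstract describes, below is a minimal NumPy sketch of the two approximations applied to an n-by-n kernel matrix K, assuming uniform sampling of l columns. The function names, the RBF test kernel, and the sampling scheme are illustrative assumptions for this sketch, not the paper's exact experimental protocol; the Column sampling variant shown is the standard projection form built from the left singular vectors of the sampled columns.

```python
import numpy as np

def nystrom_approx(K, idx, k, tol=1e-12):
    """Rank-k Nystrom approximation of an n x n PSD kernel matrix K,
    built from the l sampled columns indexed by idx (l >= k)."""
    C = K[:, idx]                    # n x l block of sampled columns
    W = C[idx, :]                    # l x l intersection block
    vals, vecs = np.linalg.eigh(W)   # eigendecomposition (ascending order)
    order = np.argsort(vals)[::-1][:k]
    keep = order[vals[order] > tol]  # guard against near-zero eigenvalues
    # Rank-k pseudo-inverse of W from its top eigenpairs
    W_pinv = vecs[:, keep] @ np.diag(1.0 / vals[keep]) @ vecs[:, keep].T
    return C @ W_pinv @ C.T          # K ~= C W_k^+ C^T

def column_sampling_approx(K, idx, k):
    """Rank-k Column sampling approximation (projection form): project K
    onto the span of the top-k left singular vectors of the sampled columns."""
    C = K[:, idx]
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ (Uk.T @ K)           # K ~= U_k U_k^T K

# Toy usage on a synthetic RBF kernel (hypothetical test setup)
rng = np.random.default_rng(0)
n, l, k = 1000, 100, 20
X = rng.standard_normal((n, 5))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
idx = rng.choice(n, size=l, replace=False)  # uniform sampling w/o replacement
K_nys = nystrom_approx(K, idx, k)
K_col = column_sampling_approx(K, idx, k)
print(np.linalg.norm(K - K_nys, "fro") / np.linalg.norm(K, "fro"))
print(np.linalg.norm(K - K_col, "fro") / np.linalg.norm(K, "fro"))
```

One design asymmetry worth noting: the Nyström reconstruction above touches only the sampled n-by-l block C and its l-by-l intersection W, whereas the projection form of Column sampling shown here multiplies against the full matrix K.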
References:
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. OpenFst: A general and efficient weighted finite-state transducer library. In Conference on Implementation and Application of Automata, 2007.
M. Balasubramanian and E. L. Schwartz. The Isomap algorithm and topological stability. Science, 295, 2002.
M. Belkin and P. Niyogi. Laplacian Eigenmaps and spectral techniques for embedding and clustering. In Neural Information Processing Systems, 2001.
B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Conference on Learning Theory, 1992.
E. Chang, K. Zhu, H. Wang, H. Bai, J. Li, Z. Qiu, and H. Cui. Parallelizing support vector machines on distributed computers. In Neural Information Processing Systems, 2008.
C. Cortes, M. Mohri, and A. Talwalkar. On the impact of kernel approximation on learning accuracy. In Conference on Artificial Intelligence and Statistics, 2010.
V. de Silva and J. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In Neural Information Processing Systems, 2003.
A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. In Symposium on Discrete Algorithms, 2006.
I. Dhillon and B. Parlett. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algebra and its Applications, 387:1–28, 2004.
D. Donoho and C. Grimes. Hessian Eigenmaps: locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.
P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153–2175, 2005.
P. Drineas, R. Kannan, and M. W. Mahoney. Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM Journal on Computing, 36(1), 2006.
S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2:243–264, 2002.
C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):214–225, 2004.
A. Frieze, R. Kannan, and S. Vempala. Fast Monte Carlo algorithms for finding low-rank approximations. In Foundations of Computer Science, 1998.
G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 2nd edition, 1983. ISBN 0-8018-3772-3 (hardcover), 0-8018-3739-1 (paperback).
N. Halko, P. Martinsson, and J. Tropp. Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061v1 [math.NA], 2009.
J. Ham, D. D. Lee, S. Mika, and B. Schölkopf. A kernel view of the dimensionality reduction of manifolds. In International Conference on Machine Learning, 2004.
X. He, S. Yan, Y. Hu, and P. Niyogi. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):328–340, 2005.
T. Joachims. Making large-scale support vector machine learning practical. In Neural Information Processing Systems, 1999.
S. Kumar and H. Rowley. People Hopper. http://googleresearch.blogspot.com/2010/03/hopping-on-face-manifold-via-people.html, 2010.
S. Kumar, M. Mohri, and A. Talwalkar. On sampling-based approximate spectral decomposition. In International Conference on Machine Learning, 2009.
S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. Journal of Machine Learning Research, 2012.
Y. LeCun and C. Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. In Neural Information Processing Systems, 2004.
L. Mackey, A. Talwalkar, and M. I. Jordan. Divide-and-conquer matrix factorization. In Neural Information Processing Systems, 2011.
M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123–224, 2011.
G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In Neural Information Processing Systems, 2009.
E. Nyström. Über die praktische Auflösung von linearen Integralgleichungen mit Anwendungen auf Randwertaufgaben der Potentialtheorie. Commentationes Physico-Mathematicae, 4(15):1–52, 1928.
J. Platt. Fast training of Support Vector Machines using sequential minimal optimization. In Neural Information Processing Systems, 1999.
J. Platt. Fast embedding of sparse similarity graphs. In Neural Information Processing Systems, 2004.
S. Roweis and L. Saul. Nonlinear dimensionality reduction by Locally Linear Embedding. Science, 290(5500), 2000.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
B. Shaw and T. Jebara. Structure preserving embedding. In International Conference on Machine Learning, 2009.
T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. In Conference on Automatic Face and Gesture Recognition, 2002.
A. Talwalkar. Matrix Approximation for Large-scale Learning. Ph.D. thesis, Computer Science Department, Courant Institute, New York University, New York, NY, 2010.
A. Talwalkar and A. Rostamizadeh. Matrix coherence and the Nyström method. In Conference on Uncertainty in Artificial Intelligence, 2010.
A. Talwalkar, S. Kumar, and H. Rowley. Large-scale manifold learning. In Conference on Computer Vision and Pattern Recognition, 2008.
J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2000.
K. Weinberger and L. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI Conference on Artificial Intelligence, 2006.
C. K. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Neural Information Processing Systems, 2000.