
103 jmlr-2012-Sampling Methods for the Nyström Method


Source: pdf

Author: Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar

Abstract: The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method. Keywords: low-rank approximation, Nyström method, ensemble methods, large-scale learning
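As a quick illustration of the two techniques the abstract names, here is a minimal NumPy sketch of the standard Nyström approximation (sample m columns C of an n x n PSD kernel matrix K, take the m x m intersection block W, and form K_approx = C W^+ C^T) and of an ensemble variant that averages several independent runs. This is a sketch under stated assumptions, not the authors' code: the function names, the uniform column-sampling scheme, the uniform mixture weights, and the toy RBF kernel are illustrative choices.

```python
# Minimal sketch of the standard and ensemble Nystrom approximations.
# Assumes K is a symmetric PSD kernel matrix; all names here are illustrative.
import numpy as np


def nystrom(K, m, rng):
    """Standard Nystrom: K_approx = C @ pinv(W) @ C.T from m sampled columns."""
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # fixed (uniform) sampling
    C = K[:, idx]                               # n x m sampled columns
    W = C[idx, :]                               # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T


def ensemble_nystrom(K, m, p, rng):
    """Ensemble Nystrom with uniform mixture weights over p independent runs."""
    return sum(nystrom(K, m, rng) for _ in range(p)) / p


# Toy usage: Gaussian kernel on random data (bandwidth 1 is an arbitrary choice).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists)

err_single = np.linalg.norm(K - nystrom(K, m=50, rng=rng), "fro")
err_ensemble = np.linalg.norm(K - ensemble_nystrom(K, m=50, p=5, rng=rng), "fro")
print(f"single: {err_single:.3f}  ensemble: {err_ensemble:.3f}")
```

The uniform average above is the simplest member of the ensemble family; the paper also studies non-uniform mixture weights for combining the individual Nyström approximations.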


reference text

Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM, 54(2), 2007.
Sanjeev Arora, Elad Hazan, and Satyen Kale. A fast random sampling algorithm for sparsifying matrices. In Approx-Random, 2006.
Arthur Asuncion and David Newman. UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
Francis R. Bach and Michael I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
Francis R. Bach and Michael I. Jordan. Predictive low-rank decomposition for kernel methods. In International Conference on Machine Learning, 2005.
Christopher T. Baker. The Numerical Treatment of Integral Equations. Clarendon Press, Oxford, 1977.
Mohamed A. Belabbas and Patrick J. Wolfe. Spectral methods in machine learning and new strategies for very large datasets. Proceedings of the National Academy of Sciences of the United States of America, 106(2):369–374, January 2009. ISSN 1091-6490.
Mohamed A. Belabbas and Patrick J. Wolfe. On landmark selection and sampling in high-dimensional data analysis. arXiv:0906.4582v1 [stat.ML], 2009.
Bernhard E. Boser, Isabelle Guyon, and Vladimir N. Vapnik. A training algorithm for optimal margin classifiers. In Conference on Learning Theory, 1992.
Christos Boutsidis, Michael W. Mahoney, and Petros Drineas. An improved approximation algorithm for the column subset selection problem. In Symposium on Discrete Algorithms, 2009.
Roland Bunschoten. http://www.mathworks.com/matlabcentral/fileexchange/71-distance-m/, 1999.
Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
Emmanuel J. Candès and Terence Tao. The power of convex relaxation: near-optimal matrix completion. arXiv:0903.1476v1 [cs.IT], 2009.
Gavin Cawley and Nicola Talbot. Miscellaneous MATLAB software. http://theoval.cmp.uea.ac.uk/matlab/default.html#cholinc, 2004.
Corinna Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
Corinna Cortes, Mehryar Mohri, Dmitry Pechyony, and Ashish Rastogi. Stability of transductive regression algorithms. In International Conference on Machine Learning, 2008.
Corinna Cortes, Mehryar Mohri, and Ameet Talwalkar. On the impact of kernel approximation on learning accuracy. In Conference on Artificial Intelligence and Statistics, 2010.
Vin de Silva and Joshua Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. In Neural Information Processing Systems, 2003.
Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang. Matrix approximation and projective clustering via volume sampling. In Symposium on Discrete Algorithms, 2006.
Petros Drineas. Personal communication, 2008.
Petros Drineas and Michael W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153–2175, 2005.
Petros Drineas, Eleni Drinea, and Patrick S. Huggins. An experimental evaluation of a Monte-Carlo algorithm for SVD. In Panhellenic Conference on Informatics, 2001.
Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM Journal on Computing, 36(1), 2006.
Petros Drineas, Michael W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008.
Shai Fine and Katya Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2:243–264, 2002.
Charless Fowlkes, Serge Belongie, Fan Chung, and Jitendra Malik. Spectral grouping using the Nyström method. Transactions on Pattern Analysis and Machine Intelligence, 26(2):214–225, 2004.
Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In Foundations of Computer Science, 1998.
Gene Golub and Charles Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 2nd edition, 1983. ISBN 0-8018-3772-3 (hardcover), 0-8018-3739-1 (paperback).
Sergei A. Goreinov, Eugene E. Tyrtyshnikov, and Nickolai L. Zamarashkin. A theory of pseudoskeleton approximations. Linear Algebra and Its Applications, 261:1–21, 1997.
Genevieve Gorrell. Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing. In European Chapter of the Association for Computational Linguistics, 2006.
Ming Gu and Stanley C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing, 17(4):848–869, 1996.
Adam Gustafson, Evan Snitkin, Stephen Parker, Charles DeLisi, and Simon Kasif. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics, 7:265, 2006.
Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: stochastic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061v1 [math.NA], 2009.
Sariel Har-Peled. Low-rank matrix approximation in linear time. Manuscript, 2006.
Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3):307–323, 2006.
William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. Sampling techniques for the Nyström method. In Conference on Artificial Intelligence and Statistics, 2009a.
Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. On sampling-based approximate spectral decomposition. In International Conference on Machine Learning, 2009b.
Sanjiv Kumar, Mehryar Mohri, and Ameet Talwalkar. Ensemble Nyström method. In Neural Information Processing Systems, 2009c.
Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
Mu Li, James T. Kwok, and Bao-Liang Lu. Making large-scale Nyström approximation possible. In International Conference on Machine Learning, 2010.
Edo Liberty. Accelerated Dense Random Projections. Ph.D. thesis, Computer Science Department, Yale University, New Haven, CT, 2009.
Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
Rong Liu, Varun Jain, and Hao Zhang. Subsampling for efficient spectral mesh processing. In Computer Graphics International Conference, 2006.
Michael W. Mahoney and Petros Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697–702, 2009.
Evert J. Nyström. Über die praktische Auflösung von linearen Integralgleichungen mit Anwendungen auf Randwertaufgaben der Potentialtheorie. Commentationes Physico-Mathematicae, 4(15):1–52, 1928.
Marie Ouimet and Yoshua Bengio. Greedy spectral embedding. In Artificial Intelligence and Statistics, 2005.
Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: a probabilistic analysis. In Principles of Database Systems, 1998.
John C. Platt. Fast embedding of sparse similarity graphs. In Neural Information Processing Systems, 2004.
Vladimir Rokhlin, Arthur Szlam, and Mark Tygert. A randomized algorithm for principal component analysis. SIAM Journal on Matrix Analysis and Applications, 31(3):1100–1124, 2009.
Mark Rudelson and Roman Vershynin. Sampling from large matrices: an approach through geometric functional analysis. Journal of the ACM, 54(4):21, 2007.
Anthony F. Ruston. Auerbach's theorem and tensor products of Banach spaces. Mathematical Proceedings of the Cambridge Philosophical Society, 58:476–480, 1962.
Bernhard Schölkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
Terence Sim, Simon Baker, and Maan Bsat. The CMU pose, illumination, and expression database. In Conference on Automatic Face and Gesture Recognition, 2002.
Alex J. Smola. SVLab. http://alex.smola.org/data/svlab.tgz, 2000.
Alex J. Smola and Bernhard Schölkopf. Sparse greedy matrix approximation for machine learning. In International Conference on Machine Learning, 2000.
G. W. Stewart. Four algorithms for the efficient computation of truncated pivoted QR approximations to a sparse matrix. Numerische Mathematik, 83(2):313–323, 1999.
Ameet Talwalkar and Afshin Rostamizadeh. Matrix coherence and the Nyström method. In Conference on Uncertainty in Artificial Intelligence, 2010.
Ameet Talwalkar, Sanjiv Kumar, and Henry Rowley. Large-scale manifold learning. In Conference on Computer Vision and Pattern Recognition, 2008.
Mark Tygert. http://www.mathworks.com/matlabcentral/fileexchange/21524-principal-component-analysis, 2009.
Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Neural Information Processing Systems, 2000.
Kai Zhang and James T. Kwok. Density-weighted Nyström method for computing large kernel eigensystems. Neural Computation, 21(1):121–146, 2009.
Kai Zhang, Ivor Tsang, and James Kwok. Improved Nyström low-rank approximation and error analysis. In International Conference on Machine Learning, 2008.