
81 nips-2009-Ensemble Nyström Method


Source: pdf

Author: Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar

Abstract: A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices. We introduce a new family of algorithms based on mixtures of Nyström approximations, ensemble Nyström algorithms, that yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, or regression-based methods. We also present a theoretical analysis of these algorithms, including novel error bounds guaranteeing a better convergence rate than the standard Nyström method. Finally, we report results of extensive experiments with several data sets containing up to 1M points demonstrating the significant improvement over the standard Nyström approximation.
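
The sketch below illustrates the idea described in the abstract: several independent Nyström approximations ("experts") of the same kernel matrix are combined, here with the simplest of the weighting schemes mentioned (uniform averaging). All function and parameter names are illustrative, not taken from the paper, and the column-sampling scheme shown is only one possible choice; the exponential-weight and regression-based variants are omitted.

```python
# Minimal sketch of an ensemble Nystrom approximation with uniform weights.
# Assumed/illustrative names: nystrom_approximation, ensemble_nystrom, etc.
import numpy as np

def nystrom_approximation(K, sample_idx, rank):
    """Standard Nystrom low-rank approximation of a PSD kernel matrix K.

    K          : (n, n) kernel matrix
    sample_idx : indices of the sampled columns (landmark points)
    rank       : target rank of the approximation
    """
    C = K[:, sample_idx]                   # sampled columns, shape (n, m)
    W = K[np.ix_(sample_idx, sample_idx)]  # intersection block, shape (m, m)
    # Rank-restricted pseudo-inverse of W via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(W)
    order = np.argsort(eigvals)[::-1][:rank]
    vals = eigvals[order]
    inv_vals = np.zeros_like(vals)
    mask = vals > 1e-12                    # guard against near-zero eigenvalues
    inv_vals[mask] = 1.0 / vals[mask]
    W_pinv = (eigvecs[:, order] * inv_vals) @ eigvecs[:, order].T
    return C @ W_pinv @ C.T                # low-rank approximation of K

def ensemble_nystrom(K, num_experts, samples_per_expert, rank, seed=0):
    """Uniformly weighted mixture of independent Nystrom approximations."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    approx = np.zeros_like(K)
    for _ in range(num_experts):
        idx = rng.choice(n, size=samples_per_expert, replace=False)
        approx += nystrom_approximation(K, idx, rank)
    return approx / num_experts            # simple averaging over the experts
```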


reference text

[1] A. Asuncion and D. Newman. UCI machine learning repository, 2007.

[2] B. E. Boser, I. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In COLT, volume 5, pages 144–152, 1992.

[3] C. Cortes, M. Mohri, D. Pechyony, and A. Rastogi. Stability of transductive regression algorithms. In ICML, 2008.

[4] C. Cortes and V. N. Vapnik. Support-Vector Networks. Machine Learning, 20(3):273–297, 1995.

[5] P. Drineas and M. W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. JMLR, 6:2153–2175, 2005.

[6] C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 2004.

[7] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 2nd edition, 1983.

[8] A. Gustafson, E. Snitkin, S. Parker, C. DeLisi, and S. Kasif. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC:Genomics, 7:265, 2006.

[9] S. Kumar, M. Mohri, and A. Talwalkar. Sampling techniques for the Nyström method. In AISTATS, pages 304–311, 2009.

[10] Y. LeCun and C. Cortes. The MNIST database of handwritten digits, 2009.

[11] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.

[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.

[13] J. C. Platt. Fast embedding of sparse similarity graphs. In NIPS, 2004.

[14] C. Saunders, A. Gammerman, and V. Vovk. Ridge Regression Learning Algorithm in Dual Variables. In Proceedings of the ICML ’98, pages 515–521, 1998.

[15] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.

[16] T. Sim, S. Baker, and M. Bsat. The CMU PIE database. In Conference on Automatic Face and Gesture Recognition, 2002.

[17] A. Talwalkar, S. Kumar, and H. Rowley. Large-scale manifold learning. In CVPR, 2008.

[18] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In NIPS, pages 682–688, 2000.

[19] K. Zhang, I. Tsang, and J. Kwok. Improved Nyström low-rank approximation and error analysis. In ICML, pages 273–297, 2008.