NIPS 2013, Paper 224
Authors: Alessandro Rudi, Guillermo D. Canas, Lorenzo Rosasco
Abstract: A large number of algorithms in machine learning, from principal component analysis (PCA) and its non-linear (kernel) extensions to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated to the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator-theoretic approach that has broad applicability to spectral learning methods.
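As a rough illustration of the subspace-learning setting described in the abstract (a minimal sketch, not the paper's own algorithm or analysis), the following Python snippet estimates a k-dimensional subspace from samples via the eigenvectors of the empirical covariance and evaluates its empirical reconstruction error. The synthetic data, the choice k = 2, and the function names are illustrative assumptions.

import numpy as np

def pca_subspace(X, k):
    """Orthonormal basis (d x k) of the span of the top-k eigenvectors
    of the empirical covariance of the samples X (n x d)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / X.shape[0]            # empirical covariance operator
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, -k:]                  # keep the top-k eigenvectors

def reconstruction_error(X, U):
    """Mean squared distance of the centered samples to the subspace span(U)."""
    Xc = X - X.mean(axis=0)
    proj = Xc @ U @ U.T                     # orthogonal projection onto span(U)
    return np.mean(np.sum((Xc - proj) ** 2, axis=1))

# Hypothetical usage: samples concentrated near a 2-dimensional subspace of R^10.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(500, 10))
U = pca_subspace(X, k=2)
print(reconstruction_error(X, U))           # small if the 2-d model fits the data

The printed quantity is the empirical counterpart of the PCA reconstruction error; how such empirical quantities concentrate around their population values, under spectral assumptions on the covariance operator, is the kind of question the learning error estimates in the abstract address.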