
33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis


Source: pdf

Author: Simon Günter, Nicol N. Schraudolph, S. V. N. Vishwanathan

Abstract: We develop gain adaptation methods that improve the convergence of the kernel Hebbian algorithm (KHA) for iterative kernel PCA (Kim et al., 2005). KHA has a scalar gain parameter that is either held constant or decreased according to a predetermined annealing schedule, leading to slow convergence. We accelerate it by incorporating the reciprocal of the current estimated eigenvalues as part of a gain vector. An additional normalization term then allows us to eliminate a tuning parameter in the annealing schedule. Finally, we derive and apply stochastic meta-descent (SMD) gain vector adaptation (Schraudolph, 1999, 2002) in reproducing kernel Hilbert space to further speed up convergence. Experimental results on kernel PCA and spectral clustering of USPS digits, on motion-capture and image denoising, and on image super-resolution tasks confirm that our methods converge substantially faster than conventional KHA. To demonstrate scalability, we perform kernel PCA on the entire MNIST data set.

Keywords: step size adaptation, gain vector adaptation, stochastic meta-descent, kernel Hebbian algorithm, online learning
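The gain-vector idea in the abstract lends itself to a compact illustration: run Sanger's Hebbian rule on the kernel expansion coefficients, but give each component a step size proportional to the reciprocal of its current eigenvalue estimate. The NumPy sketch below is a hedged reconstruction, not the authors' exact KHA/et or SMD update: the Rayleigh-quotient eigenvalue estimator, the gain cap, and the 1/(1+epoch) annealing factor are simplifying assumptions made for this example.

```python
import numpy as np

def kha_reciprocal_gain(K, r, n_epochs=10, eta0=0.05, rng=None):
    """Kernel Hebbian algorithm with a per-component gain vector
    proportional to the reciprocal of each component's current
    eigenvalue estimate (the idea behind the paper's gain vector;
    the estimator, gain cap, and annealing here are stand-ins).

    K: centered n x n kernel matrix; r: number of components.
    Returns A (r x n) such that component i is w_i = sum_j A[i, j] phi(x_j).
    """
    n = K.shape[0]
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((r, n)) / n           # expansion coefficients
    for epoch in range(n_epochs):
        # Rayleigh-quotient eigenvalue estimates, recomputed once per
        # epoch to keep the sketch cheap (an illustrative proxy):
        AK = A @ K
        lam = np.abs(np.einsum('ij,ij->i', AK, AK)
                     / (n * np.einsum('ij,ij->i', AK, A) + 1e-12))
        # Reciprocal-eigenvalue gains, capped so slow components do not
        # get runaway step sizes, with a simple 1/(1+epoch) annealing:
        gains = eta0 * lam.max() / np.maximum(lam, 1e-3 * lam.max() + 1e-12)
        gains /= 1.0 + epoch
        for i in rng.permutation(n):
            y = A @ K[:, i]                       # outputs y = W phi(x_i)
            # Sanger-style rule: Hebbian term minus lower-triangular
            # decorrelation term, expressed on the coefficients A:
            grad = -np.tril(np.outer(y, y)) @ A
            grad[:, i] += y
            A += gains[:, None] * grad
    return A

# Example: 4 kernel principal components of toy 2-D data (RBF kernel).
X = np.random.default_rng(0).standard_normal((200, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
H = np.eye(200) - np.ones((200, 200)) / 200
K = H @ np.exp(-D2 / 2.0) @ H                     # centered kernel matrix
A = kha_reciprocal_gain(K, r=4)
```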


reference text

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.
Liang-Hwa Chen and Shyang Chang. An adaptive learning algorithm for principal component analysis. IEEE Transactions on Neural Networks, 6(5):1255–1263, 1995.
Christian Darken and John E. Moody. Towards faster stochastic gradient search. In John E. Moody, Stephen J. Hanson, and Richard Lippmann, editors, Advances in Neural Information Processing Systems, volume 4, pages 1009–1016. Morgan Kaufmann Publishers, 1992.
Athinodoros S. Georghiades, Peter N. Belhumeur, and David J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001. ISSN 0162-8828. doi: http://doi.ieeecomputersociety.org/10.1109/34.927464.
Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Frontiers in Applied Mathematics. SIAM, Philadelphia, 2000.
Thorsten Joachims. Making large-scale SVM learning practical. In Bernhard Schölkopf, Chris J. C. Burges, and Alex J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, 1999. MIT Press.
Juha Karhunen. Optimization criteria and nonlinear PCA neural networks. In IEEE World Congress on Computational Intelligence, volume 2, pages 1241–1246, 1994.
Juha Karhunen and Jyrki Joutsensalo. Representation and separation of signals using nonlinear PCA type learning. Neural Networks, 7(1):113–127, 1994.
Kwang In Kim, Matthias O. Franz, and Bernhard Schölkopf. Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351–1366, 2005.
Yann LeCun. MNIST handwritten digit database, 1998. URL http://www.research.att.com/~yann/ocr/mnist/.
Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, R. E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1:541–551, 1989.
Marina Meila. Comparing clusterings: An axiomatic view. In Proc. 22nd Intl. Conf. Machine Learning (ICML), pages 577–584, New York, NY, USA, 2005. ACM Press.
Sebastian Mika, Bernhard Schölkopf, Alex J. Smola, Klaus-Robert Müller, Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in feature spaces. In Michael S. Kearns, Sara A. Solla, and David A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 536–542. MIT Press, 1999.
David C. Munson, Jr. A note on Lena. IEEE Transactions on Image Processing, 5(1), 1996.
Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Thomas G. Dietterich, Suzanna Becker, and Zoubin Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, 2002.
Erkki Oja and Juha Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106(1):69–84, February 1985.
Barak A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.
John Platt. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Chris J. C. Burges, and Alex J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.
Herbert Robbins and Sutton Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.
Terrence D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward network. Neural Networks, 2:459–473, 1989.
Bernhard Schölkopf and Alex J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
Bernhard Schölkopf, Alex J. Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
Nicol N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738, 2002.
Nicol N. Schraudolph. Local gain adaptation in stochastic gradient descent. In Proc. Intl. Conf. Artificial Neural Networks, pages 569–574, Edinburgh, Scotland, 1999. IEE, London.
Nicol N. Schraudolph, Simon Günter, and S. V. N. Vishwanathan. Fast iterative kernel PCA. In Bernhard Schölkopf, John Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems, volume 19, Cambridge, MA, June 2007. MIT Press.
Therdsak Tangkuampien and David Suter. Human motion de-noising via greedy kernel principal component analysis filtering. In Proc. Intl. Conf. Pattern Recognition, 2006.
S. V. N. Vishwanathan, Nicol N. Schraudolph, and Alex J. Smola. Step size adaptation in reproducing kernel Hilbert space. Journal of Machine Learning Research, 7:1107–1133, June 2006.
Luca Zanni, Thomas Serafini, and Gaetano Zanghirati. Parallel software for training large scale support vector machines on multiprocessor systems. Journal of Machine Learning Research, 7:1467–1492, July 2006.