Author: Nicol N. Schraudolph, Simon Günter, S. V. N. Vishwanathan
Abstract: We introduce two methods to improve convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. KHA has a scalar gain parameter which is either held constant or decreased as 1/t, leading to slow convergence. Our KHA/et algorithm accelerates KHA by incorporating the reciprocal of the current estimated eigenvalues as a gain vector. We then derive and apply Stochastic Meta-Descent (SMD) to KHA/et; this further speeds convergence by performing gain adaptation in RKHS. Experimental results for kernel PCA and spectral clustering of USPS digits, as well as motion-capture and image de-noising problems, confirm that our methods converge substantially faster than conventional KHA.
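For concreteness, below is a minimal sketch of the idea the abstract describes: the Kernel Hebbian Algorithm (a kernelized Sanger rule) with per-component gains proportional to the reciprocal of running eigenvalue estimates, in the spirit of KHA/et. This is not the authors' implementation; the function name kha_et, the step size eta0, the exponential-moving-average eigenvalue estimator and its constants, and the 1e-8 floor are all illustrative assumptions, and the paper's exact gain schedule and the SMD extension are not reproduced here.

import numpy as np

def kha_et(K, r, eta0=0.05, epochs=50, seed=0):
    # Sketch: iterative kernel PCA via the Kernel Hebbian Algorithm
    # with per-component gains eta0 / lambda_hat_j (KHA/et-style).
    # K: (n, n) centered kernel matrix; r: number of components.
    # Returns A, the (r, n) matrix of RKHS expansion coefficients.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    A = rng.standard_normal((r, n)) / np.sqrt(n)
    lam = np.ones(r)                      # running eigenvalue estimates
    for _ in range(epochs):
        for i in rng.permutation(n):
            y = A @ K[:, i]               # projections of sample i onto current components
            e = np.zeros(n); e[i] = 1.0   # coefficient-space image of phi(x_i)
            # Sanger-style Hebbian update: reinforce the response y,
            # deflate each component by those extracted before it.
            grad = np.outer(y, e) - np.tril(np.outer(y, y)) @ A
            lam = 0.99 * lam + 0.01 * y**2        # EMA eigenvalue estimate (illustrative)
            A += (eta0 / np.maximum(lam, 1e-8))[:, None] * grad
    return A

Given coefficients A, the kernel principal component projections of the training data are the rows of A @ K; replacing the scalar gain eta0 with the vector eta0 / lam is what distinguishes this sketch from plain KHA.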
[1] T. D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward network. Neural Networks, 2:459–473, 1989.
[2] K. I. Kim, M. O. Franz, and B. Schölkopf. Iterative kernel principal component analysis for image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1351–1366, 2005.
[3] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[4] B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[5] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.
[6] L.-H. Chen and S. Chang. An adaptive learning algorithm for principal component analysis. IEEE Transactions on Neural Networks, 6(5):1255–1263, 1995.
[7] N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738, 2002.
[8] S. V. N. Vishwanathan, N. N. Schraudolph, and A. J. Smola. Step size adaptation in reproducing kernel Hilbert space. Journal of Machine Learning Research, 7:1107–1133, 2006.
[9] C. Darken and J. E. Moody. Towards faster stochastic gradient search. In J. E. Moody, S. J. Hanson, and R. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 1009–1016. Morgan Kaufmann Publishers, 1992.
[10] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. J. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1:541–551, 1989.
[11] S. Mika, B. Schölkopf, A. J. Smola, K.-R. Müller, M. Scholz, and G. Rätsch. Kernel PCA and de-noising in feature spaces. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 536–542. MIT Press, 1999.
[12] D. C. Munson, Jr. A note on Lena. IEEE Transactions on Image Processing, 5(1), 1996.
[13] A. Ng, M. Jordan, and Y. Weiss. Spectral clustering: Analysis and an algorithm (with appendix). In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, 2002.
[14] M. Meila. Comparing clusterings: an axiomatic view. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 577–584, New York, NY, USA, 2005. ACM Press.
[15] T. Tangkuampien and D. Suter. Human motion de-noising via greedy kernel principal component analysis filtering. In Proc. Intl. Conf. Pattern Recognition, 2006.