
NIPS 2007, paper 206: Topmoumoute Online Natural Gradient Algorithm


Source: pdf

Author: Nicolas Le Roux, Pierre-Antoine Manzagol, Yoshua Bengio

Abstract: Guided by the goal of obtaining an optimization algorithm that is both fast and yields good generalization, we study the descent direction maximizing the decrease in generalization error, or the probability of not increasing the generalization error. The surprising result is that, from both the Bayesian and frequentist perspectives, this can yield the natural gradient direction. Although that direction can be very expensive to compute, we develop an efficient, general, online approximation to natural gradient descent suited to large-scale problems. We report experimental results showing much faster convergence, both in computation time and in number of iterations, with TONGA (Topmoumoute Online Natural Gradient Algorithm) than with stochastic gradient descent, even on very large datasets.
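To make the idea in the abstract concrete, the following is a minimal sketch, on a toy least-squares problem, of the kind of update described: precondition the ordinary gradient by a damped covariance of the per-example gradients. This is not the authors' TONGA implementation (the paper's contribution is precisely to replace the explicit matrix solve below with an efficient online, low-rank approximation); the problem setup, hyperparameters, and variable names are illustrative assumptions.

    # Illustrative sketch (not the authors' code): a damped "natural gradient"
    # step built from the covariance of per-example gradients, on a toy linear
    # regression problem. The step size lr, damping eps, and the exact O(d^3)
    # solve are assumptions made for clarity.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: y = X w_true + noise
    n, d = 512, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + rng.standard_normal(n)

    w = np.zeros(d)
    lr, eps = 0.5, 0.1        # step size and damping, chosen for this toy problem

    for _ in range(100):
        residuals = X @ w - y
        grads = residuals[:, None] * X          # per-example gradients, shape (n, d)
        g = grads.mean(axis=0)                  # ordinary (batch) gradient

        # The uncentered covariance of the per-example gradients, plus damping,
        # plays the role of the metric C; the natural direction is C^{-1} g.
        C = grads.T @ grads / n + eps * np.eye(d)
        w -= lr * np.linalg.solve(C, g)

    print("distance to w_true:", np.linalg.norm(w - w_true))

In the paper, this per-parameter covariance is not inverted explicitly; it is maintained online in low-rank form so that the preconditioned direction can be obtained at a cost close to that of plain stochastic gradient descent.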


References

[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.

[2] S. Amari, H. Park, and K. Fukumizu. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12(6):1399–1409, 2000.

[3] L. Bottou. Stochastic learning. In O. Bousquet and U. von Luxburg, editors, Advanced Lectures on Machine Learning, number LNAI 3176 in Lecture Notes in Artificial Intelligence, pages 146–168. Springer Verlag, Berlin, 2004.

[4] R. Collobert. Large Scale Machine Learning. PhD thesis, Université de Paris VI, LIP6, 2004.

[5] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Twenty-fourth International Conference on Machine Learning (ICML’2007), 2007.

[6] Y. LeCun, L. Bottou, G. Orr, and K.-R. Müller. Efficient backprop. In G. Orr and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, pages 9–50. Springer, 1998.

[7] K. B. Petersen and M. S. Pedersen. The Matrix Cookbook, February 2006. Version 20051003.

[8] N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738, 2002.

[9] H. H. Yang and S. Amari. Natural gradient descent for training multi-layer perceptrons. Submitted to IEEE Transactions on Neural Networks, 1997.

[10] H. H. Yang and S. Amari. Complexity issues in natural gradient descent method for training multi-layer perceptrons. Neural Computation, 10(8):2137–2157, 1998.