nips2003-102: reference knowledge graph (maker-knowledge-mining)
Source: pdf
Authors: Léon Bottou, Yann Le Cun
Abstract: We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidence is presented.
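The abstract's claim is easiest to see under a fixed budget of gradient computations: a batch learner must repeatedly sweep a small reused training set, while an online learner can spend the same budget on fresh examples. Below is a minimal sketch, not from the paper; the least-squares model, step sizes, and sample sizes are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's experiment): equal budgets of
# single-example gradients, batch sweeps over a fixed subset vs. one online pass.
import numpy as np

rng = np.random.default_rng(0)
d, budget = 10, 100_000            # dimension; total single-example gradients
w_true = rng.normal(size=d)

def sample(n):                     # noisy linear-regression data stream
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

# Batch: 100 full-gradient sweeps over a fixed 1,000-example subset
# (100 x 1,000 = 100,000 single-example gradients, the same budget).
Xb, yb = sample(budget // 100)
w = np.zeros(d)
for _ in range(100):
    grad = Xb.T @ (Xb @ w - yb) / len(yb)
    w -= 0.1 * grad
print("batch  |w - w*|:", np.linalg.norm(w - w_true))

# Online: one pass, each of the 100,000 examples used exactly once.
w = np.zeros(d)
for t in range(1, budget + 1):
    x, y = sample(1)
    g = (x @ w - y) * x[0]         # gradient of the single-example squared loss
    w -= g / (t + 10)              # ~1/t decaying step size
print("online |w - w*|:", np.linalg.norm(w - w_true))
```

Under this equal budget the online pass typically lands closer to w_true: its statistical error shrinks with the number of distinct examples seen, while the batch learner, however well it optimizes, is limited by the sampling error of its fixed subset.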
References:
Amari, S. (1998). Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2):251–276.
Bottou, L. (1998). Online Algorithms and Stochastic Approximations, pages 9–42. In Saad, D., editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK.
Bottou, L. and Murata, N. (2002). Stochastic Approximations and Efficient Learning. In Arbib, M. A., editor, The Handbook of Brain Theory and Neural Networks, Second edition. The MIT Press, Cambridge, MA.
Bottou, L. and Le Cun, Y. (2003). Online Learning for Very Large Datasets. NEC Labs TR-2003L039. To appear: Applied Stochastic Models in Business and Industry. Wiley.
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Chapman & Hall, London.
Dennis, J. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Amari, S., Park, H., and Fukumizu, K. (2000). Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Neural Computation, 12(6):1399–1409.
Le Cun, Y., Bottou, L., Orr, G. B., and Müller, K.-R. (1998). Efficient BackProp. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science 1524. Springer Verlag.
Murata, N. and Amari, S. (1999). Statistical analysis of learning dynamics. Signal Processing, 74(1):3–28.
Vapnik, V. N. and Chervonenkis, A. (1974). Theory of Pattern Recognition (in Russian). Nauka.
Tsypkin, Ya. (1973). Foundations of the Theory of Learning Systems. Academic Press.

Footnote 5: Recall
\[
\mathbb{E}_t\!\left[\Phi_t \,\frac{\partial L}{\partial \theta}(z_t, \theta)\right]
  = \Phi_t \,\frac{\partial C}{\partial \theta}(\theta)
  = \Phi_t H \theta + o\!\left(|\theta|\right)
  = \theta + o\!\left(|\theta|\right).
\]
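For context, this identity is what drives the 1/t convergence rate of second-order online learning. A short derivation sketch, assuming (to my reading of the paper's setting) the update θ_{t+1} = θ_t − (1/t) Φ_t ∂L/∂θ(z_t, θ_t) with Φ_t → H⁻¹ and the optimum translated to θ* = 0:

```latex
% Sketch of the step footnote 5 supports. Assumed setting (not stated here):
% update theta_{t+1} = theta_t - (1/t) Phi_t (dL/dtheta)(z_t, theta_t),
% with Phi_t -> H^{-1} and the optimum translated to theta* = 0.
\[
\mathbb{E}_t\,\theta_{t+1}
  = \theta_t - \frac{1}{t}\,
      \mathbb{E}_t\!\left[\Phi_t \,\frac{\partial L}{\partial \theta}(z_t,\theta_t)\right]
  = \left(1 - \frac{1}{t}\right)\theta_t
    + o\!\left(\frac{|\theta_t|}{t}\right),
\]
% so each step contracts the expected distance to the optimum by a factor
% (1 - 1/t), which is exactly the contraction schedule that produces the
% 1/t decay of E|theta_t - theta*|^2 in stochastic approximation analyses.
```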