nips2003-102: reference knowledge graph (maker-knowledge-mining)
Source: pdf
Authors: Léon Bottou, Yann Le Cun
Abstract: We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidence is presented.
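The abstract's claim is easiest to see under a fixed budget of gradient computations: a batch learner must repeatedly sweep a small reused training set, while an online learner can spend the same budget on fresh examples. Below is a minimal sketch, not from the paper; the least-squares model, step sizes, and sample sizes are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's experiment): equal budgets of
# single-example gradients, batch sweeps over a fixed subset vs. one online pass.
import numpy as np

rng = np.random.default_rng(0)
d, budget = 10, 100_000            # dimension; total single-example gradients
w_true = rng.normal(size=d)

def sample(n):                     # noisy linear-regression data stream
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

# Batch: 100 full-gradient sweeps over a fixed 1,000-example subset
# (100 x 1,000 = 100,000 single-example gradients, the same budget).
Xb, yb = sample(budget // 100)
w = np.zeros(d)
for _ in range(100):
    grad = Xb.T @ (Xb @ w - yb) / len(yb)
    w -= 0.1 * grad
print("batch  |w - w*|:", np.linalg.norm(w - w_true))

# Online: one pass, each of the 100,000 examples used exactly once.
w = np.zeros(d)
for t in range(1, budget + 1):
    x, y = sample(1)
    g = (x @ w - y) * x[0]         # gradient of the single-example squared loss
    w -= g / (t + 10)              # ~1/t decaying step size
print("online |w - w*|:", np.linalg.norm(w - w_true))
```

Under this equal budget the online pass typically lands closer to w_true: its statistical error shrinks with the number of distinct examples seen, while the batch learner, however well it optimizes, is limited by the sampling error of its fixed subset.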
References:
Amari, S. (1998). Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2):251–276.
Bottou, L. (1998). Online Algorithms and Stochastic Approximations, pages 9–42. In Saad, D., editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK.
Bottou, L. and Murata, N. (2002). Stochastic Approximations and Efficient Learning. In Arbib, M. A., editor, The Handbook of Brain Theory and Neural Networks, Second edition. The MIT Press, Cambridge, MA.
Bottou, L. and Le Cun, Y. (2003). Online Learning for Very Large Datasets. NEC Labs TR-2003L039. To appear: Applied Stochastic Models in Business and Industry. Wiley.
Chambers, J. M. and Hastie, T. J. (1992). Statistical Models in S. Chapman & Hall, London.
Dennis, J. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Amari, S., Park, H., and Fukumizu, K. (2000). Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Neural Computation, 12(6):1399–1409.
Le Cun, Y., Bottou, L., Orr, G. B., and Müller, K.-R. (1998). Efficient BackProp. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science 1524. Springer Verlag.
Murata, N. and Amari, S. (1999). Statistical analysis of learning dynamics. Signal Processing, 74(1):3–28.
Vapnik, V. N. and Chervonenkis, A. (1974). Theory of Pattern Recognition (in Russian). Nauka.
Tsypkin, Ya. (1973). Foundations of the Theory of Learning Systems. Academic Press.

Footnote 5: Recall
\[
\mathbb{E}_t\!\left[\Phi_t \,\frac{\partial L}{\partial \theta}(z_t, \theta)\right]
  = \Phi_t \,\frac{\partial C}{\partial \theta}(\theta)
  = \Phi_t H \theta + o\!\left(|\theta|\right)
  = \theta + o\!\left(|\theta|\right).
\]
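For context, this identity is what drives the 1/t convergence rate of second-order online learning. A short derivation sketch, assuming (to my reading of the paper's setting) the update θ_{t+1} = θ_t − (1/t) Φ_t ∂L/∂θ(z_t, θ_t) with Φ_t → H⁻¹ and the optimum translated to θ* = 0:

```latex
% Sketch of the step footnote 5 supports. Assumed setting (not stated here):
% update theta_{t+1} = theta_t - (1/t) Phi_t (dL/dtheta)(z_t, theta_t),
% with Phi_t -> H^{-1} and the optimum translated to theta* = 0.
\[
\mathbb{E}_t\,\theta_{t+1}
  = \theta_t - \frac{1}{t}\,
      \mathbb{E}_t\!\left[\Phi_t \,\frac{\partial L}{\partial \theta}(z_t,\theta_t)\right]
  = \left(1 - \frac{1}{t}\right)\theta_t
    + o\!\left(\frac{|\theta_t|}{t}\right),
\]
% so each step contracts the expected distance to the optimum by a factor
% (1 - 1/t), which is exactly the contraction schedule that produces the
% 1/t decay of E|theta_t - theta*|^2 in stochastic approximation analyses.
```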