
339 nips-2013-Understanding Dropout


Source: pdf

Author: Pierre Baldi, Peter J. Sadowski

Abstract: Dropout is a relatively new algorithm for training neural networks which relies on stochastically “dropping out” neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characterized by three recursive equations, including the approximation of expectations by normalized weighted geometric means. We provide estimates and bounds for these approximations and corroborate the results with simulations. Among other results, we also show how dropout performs stochastic gradient descent on a regularized error function.
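The averaging property mentioned in the abstract (the ensemble average over dropout subnetworks is approximated by a normalized weighted geometric mean, which for a logistic unit reduces to a single deterministic forward pass with the expected input sum) can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper: a single logistic unit whose inputs are dropped independently with retention probability p; the weights w, inputs x, and p = 0.5 are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

n = 20                       # number of inputs feeding the unit
p = 0.5                      # probability of *keeping* each input
w = rng.normal(size=n)       # illustrative weights
x = rng.normal(size=n)       # illustrative input activations

# Monte Carlo over dropout masks: each Bernoulli(p) keep-mask defines one subnetwork.
num_samples = 200_000
masks = (rng.random((num_samples, n)) < p).astype(float)
S = masks @ (w * x)          # pre-activation of each sampled subnetwork
O = sigmoid(S)               # output of each sampled subnetwork

ensemble_mean = O.mean()     # Monte Carlo estimate of the true ensemble average E[O]

# Normalized geometric mean of the sampled outputs: G / (G + G'),
# where G and G' are the geometric means of O and of 1 - O.
log_G  = np.log(O).mean()
log_G1 = np.log(1.0 - O).mean()
nwgm = 1.0 / (1.0 + np.exp(log_G1 - log_G))

# For a logistic unit the NWGM over dropout configurations equals the output of the
# deterministic network run with the expected pre-activation p * (w . x).
deterministic = sigmoid(p * np.dot(w, x))

print(f"E[O] by Monte Carlo     : {ensemble_mean:.4f}")
print(f"NWGM of sampled outputs : {nwgm:.4f}")
print(f"sigmoid(p * w.x)        : {deterministic:.4f}")
```

Running the sketch, the NWGM of the sampled outputs coincides (up to sampling noise) with sigmoid(p * w.x), and both lie close to the Monte Carlo ensemble average, which is the approximation of expectations by normalized weighted geometric means that the abstract refers to.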


reference text

[1] P. Baldi and P. Sadowski. The Dropout Learning Algorithm. Artificial Intelligence, 2014. In press.

[2] E. F. Beckenbach and R. Bellman. Inequalities. Springer-Verlag, Berlin, 1965.

[3] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, TX, June 2010. Oral Presentation.

[4] L. Bottou. Online algorithms and stochastic approximations. In D. Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK, 1998.

[5] L. Bottou. Stochastic learning. In O. Bousquet and U. von Luxburg, editors, Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146–168. Springer-Verlag, Berlin, 2004.

[6] D. Cartwright and M. Field. A refinement of the arithmetic mean-geometric mean inequality. Proceedings of the American Mathematical Society, pages 36–38, 1978.

[7] G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. http://arxiv.org/abs/1207.0580, 2012.

[8] E. Neuman and J. Sándor. On the Ky Fan inequality and related inequalities I. Mathematical Inequalities and Applications, 5:49–56, 2002.

[9] E. Neuman and J. Sándor. On the Ky Fan inequality and related inequalities II. Bulletin of the Australian Mathematical Society, 72(1):87–108, 2005.

[10] N. Srivastava. Improving Neural Networks with Dropout. PhD thesis, University of Toronto, Toronto, Canada, 2013.

[11] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. Optimizing methods in statistics, pages 233–257, 1971.

[12] D. Warde-Farley, I. Goodfellow, P. Lamblin, G. Desjardins, F. Bastien, and Y. Bengio. pylearn2, 2011. http://deeplearning.net/software/pylearn2.