
334 nips-2013-Training and Analysing Deep Recurrent Neural Networks


Source: pdf

Author: Michiel Hermans, Benjamin Schrauwen

Abstract: Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has focused on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks and to capture the structure of time series more naturally. We show that these deep recurrent networks reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales.
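To make the architecture described in the abstract concrete, the following is a minimal NumPy sketch of the forward pass of such a stacked recurrent network, where each layer receives the hidden state of the layer below as its input at every time step. The tanh units, layer sizes, weight scales, and all names here are illustrative assumptions for exposition, not details taken from the paper.

import numpy as np

def stacked_rnn_forward(x_seq, weights):
    # x_seq:   array of shape (T, n_in), the input time series
    # weights: list of (W_in, W_rec, b) tuples, one per layer; layer l's
    #          input is the hidden state of layer l-1 (the raw input for l = 0)
    hs = [np.zeros(W_rec.shape[0]) for (_, W_rec, _) in weights]
    history = []
    for x_t in x_seq:
        inp = x_t
        for l, (W_in, W_rec, b) in enumerate(weights):
            # tanh units, as in a standard (Elman-style) recurrent layer
            hs[l] = np.tanh(W_in @ inp + W_rec @ hs[l] + b)
            inp = hs[l]  # pass this layer's state up as the next layer's input
        history.append([h.copy() for h in hs])
    return history  # hidden states of every layer at every time step

# Illustrative setup: 3 hidden layers of 50 units on a 10-dimensional input.
rng = np.random.default_rng(0)
sizes = [10, 50, 50, 50]
weights = [(rng.normal(0.0, 0.1, (sizes[l + 1], sizes[l])),
            rng.normal(0.0, 0.1, (sizes[l + 1], sizes[l + 1])),
            np.zeros(sizes[l + 1]))
           for l in range(len(sizes) - 1)]
states = stacked_rnn_forward(rng.normal(size=(20, sizes[0])), weights)

Because each higher layer only sees the (slower-changing) state of the layer below, the stack can, in principle, operate on progressively longer time scales, which is the hierarchy the abstract refers to.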


reference text

[1] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.

[2] L. Bottou and O. Bousquet. The tradeoffs of large-scale learning. Optimization for Machine Learning, page 351, 2011.

[3] W.-Y. Chen, Y.-F. Liao, and S.-H. Chen. Speech recognition with hierarchical recurrent neural networks. Pattern Recognition, 28(6):795–805, 1995.

[4] D. Ciresan, U. Meier, L. Gambardella, and J. Schmidhuber. Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, 22(12):3207–3220, 2010.

[5] S. El Hihi and Y. Bengio. Hierarchical recurrent neural networks for long-term dependencies. Advances in Neural Information Processing Systems, 8:493–499, 1996.

[6] S. Fernández, A. Graves, and J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, January 2007.

[7] J. Garofolo, National Institute of Standards and Technology (US), Linguistic Data Consortium, Information Science and Technology Office, and Defense Advanced Research Projects Agency (US). TIMIT Acoustic-Phonetic Continuous Speech Corpus. Linguistic Data Consortium, 1993.

[8] A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

[9] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[10] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504–507, 2006.

[11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[12] M. Hutter. The Human Knowledge Compression Prize, 2006.

[13] H. Jaeger. Long short-term memory in echo state networks: Details of a simulation study. Technical report, Jacobs University, 2012.

[14] M. Mahoney. Adaptive weighing of context models for lossless data compression. Technical report, Florida Tech., Melbourne, USA, 2005.

[15] J. Martens. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning, pages 735–742, 2010.

[16] J. Martens and I. Sutskever. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning, 2011.

[17] A. Mohamed, G. Dahl, and G. Hinton. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):14–22, 2012.

[18] C. E. Shannon. Prediction and entropy of printed English. Bell System Technical Journal, 30(1):50–64, 1951.

[19] I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning, pages 1017–1024, 2011.

[20] P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.