nips nips2010 nips2010-207 nips2010-207-reference knowledge-graph by maker-knowledge-mining

207 nips-2010-Phoneme Recognition with Large Hierarchical Reservoirs


Source: pdf

Author: Fabian Triefenbach, Azarakhsh Jalalvand, Benjamin Schrauwen, Jean-pierre Martens

Abstract: Automatic speech recognition has gradually improved over the years, but the reliable recognition of unconstrained speech is still not within reach. In order to achieve a breakthrough, many research groups are now investigating new methodologies that have potential to outperform the Hidden Markov Model technology that is at the core of all present commercial systems. In this paper, it is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology. In a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built. The system already achieves a state-of-the-art performance, and there is evidence that the margin for further improvements is still significant. 1
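The abstract refers to Reservoir Computing: a fixed random recurrent network (the reservoir) driven by acoustic feature frames, with only a linear readout trained to predict phoneme classes. As an illustration of that idea only (not the authors' system), the following is a minimal sketch of an echo state network with leaky-integrator neurons and a ridge-regression readout; all dimensions, constants, and the synthetic data are assumptions made for the example.

```python
# Minimal echo-state-network sketch (illustrative only, not the paper's code):
# a fixed random reservoir of leaky-integrator neurons driven by feature
# frames, with a linear readout trained by ridge regression to predict
# frame-level phoneme classes. Sizes and constants are assumed values.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_res, n_classes = 39, 500, 41          # feature dim, reservoir size, phone set (assumed)
leak_rate, spectral_radius, ridge = 0.3, 0.9, 1e-2

# Fixed random input and recurrent weights; rescale the recurrent weights to
# the desired spectral radius so the reservoir stays in a stable dynamic regime.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= spectral_radius / max(abs(np.linalg.eigvals(W_res)))

def run_reservoir(frames):
    """Collect reservoir states for a sequence of feature frames (T, n_in)."""
    x = np.zeros(n_res)
    states = np.empty((len(frames), n_res))
    for t, u in enumerate(frames):
        pre = np.tanh(W_in @ u + W_res @ x)
        x = (1.0 - leak_rate) * x + leak_rate * pre   # leaky integration
        states[t] = x
    return states

# Synthetic stand-in data: random "acoustic" frames with random phone labels.
frames = rng.standard_normal((2000, n_in))
labels = rng.integers(0, n_classes, 2000)
targets = np.eye(n_classes)[labels]                   # one-hot targets

# Ridge-regression readout: the only trained component of the system.
S = run_reservoir(frames)
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ targets)

predicted = np.argmax(run_reservoir(frames) @ W_out, axis=1)
print("frame accuracy on training data:", np.mean(predicted == labels))
```

Stacking several such reservoirs, each reading the previous one's outputs, gives the hierarchical arrangement the title alludes to; the sketch above shows only a single layer.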


reference text

[1] A. Robinson. An application of recurrent neural nets to phone probability estimation. IEEE Trans. on Neural Networks, 5:298–305, 1994.

[2] H. Bourlard and N. Morgan. Continuous speech recognition by connectionist statistical methods. IEEE Trans. on Neural Networks, 4:893–909, 1993.

[3] G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.

[4] A. Mohamed, G. Dahl, and G. Hinton. Deep belief networks for phone recognition. In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.

[5] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18:602–610, 2005.

[6] H. Jaeger. Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network approach (48 pp). Technical report, German National Research Center for Information Technology, 2002.

[7] W. Maass, T. Natschläger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560, 2002.

[8] D. Verstraeten, B. Schrauwen, M. D’Haene, and D. Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20:391–403, 2007.

[9] E. Antonelo, B. Schrauwen, and J. Van Campenhout. Generative modeling of autonomous robots and their environments using reservoir computing. Neural Processing Letters, 26(3):233–249, 2007.

[10] G. Holzmann and H. Hauser. Echo state networks with filter neurons and a delay & sum readout. Neural Networks, 23:244–256, 2010.

[11] D. Verstraeten, B. Schrauwen, and D. Stroobandt. Isolated word recognition using a liquid state machine. In Proceedings of the 13th European Symposium on Artificial Neural Networks (ESANN), pages 435–440, 2005.

[12] M. Skowronski and J. Harris. Automatic speech recognition using a predictive echo state network classifier. Neural Networks, 20(3):414–423, 2007.

[13] B. Schrauwen. A hierarchy of recurrent networks for speech recognition. In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.

[14] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, and N. Dahlgren. The DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. Technical report, National Institute of Standards and Technology, 1993.

[15] K.-F. Lee and H.-W. Hon. Speaker-independent phone recognition using hidden Markov models. IEEE Trans. on Acoustics, Speech and Signal Processing, 37:1641–1648, 1989.

[16] H. Jaeger, M. Lukosevicius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 20:335–352, 2007.

[17] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. on Acoustics, Speech and Signal Processing, 28:357–366, 1980.

[18] L. Van Immerseel and J.P. Martens. Pitch and voiced/unvoiced determination with an auditory model. Journal of the Acoustical Society of America, 91(6):3511–3526, June 1992.

[19] B. Schrauwen, L. Buesing, and R. Legenstein. Computational power and the order-chaos phase transition in reservoir computing. In Proc. Advances in Neural Information Processing Systems (NIPS), volume 21, pages 1425–1432, 2008.

[20] P. Schwarz, P. Matejka, and J. Cernocky. Hierarchical structures of neural networks for phoneme recognition. In Proc. International Conference on Acoustics, Speech and Signal Processing, pages 325–328, 2006.

[21] Linguistic Data Consortium. COMLEX English pronunciation lexicon, 2009.

[22] K. Demuynck, J. Roelens, D. Van Compernolle, and P. Wambacq. SPRAAK: An open source speech recognition and automatic annotation kit. In Proc. Interspeech, page 495, 2008.

[23] J. Ming and F.J. Smith. Improved phone recognition using Bayesian triphone models. IEEE Trans. on Acoustics, Speech and Signal Processing, ASSP, 1:409–412, 1998.