NIPS 2001, paper nips2001-173: reference list
Source: pdf
Author: S. Parveen, P. Green
Abstract: In the ‘missing data’ approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectro-temporal regions which are dominated by the speech source. The remaining regions are considered to be ‘missing’. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks (RNNs). In contrast to methods based on Hidden Markov Models (HMMs), RNNs allow us to make use of long-term time constraints and to let the two problems of classification with incomplete data and imputation of missing values interact. We report encouraging results on an isolated digit recognition task.
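The interaction between classification and imputation described in the abstract can be illustrated with a small recurrent sketch. This is a minimal, untrained illustration under stated assumptions, not the authors' architecture or training procedure. It assumes an Elman-style network (Elman, 1990), a binary reliability mask per spectral channel (1 = speech-dominated, 0 = missing), and a rule that replaces missing channels with the network's own prediction from the previous hidden state. The class `MissingDataElman`, its weight names, the random weights, and the choice of 11 output classes are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.5) -> Matrix:
    """Uniform random weights in [-scale, scale] (untrained, illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MissingDataElman:
    """Hypothetical Elman-style RNN that imputes missing spectral channels.

    At each frame, channels flagged as missing (mask == 0) are replaced by a
    prediction computed from the previous hidden state, and reliable channels
    (mask == 1) pass through unchanged. The filled frame then drives the
    recurrent update, so imputation and classification share the same state.
    """

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = rand_matrix(n_hidden, n_in, rng)        # filled input -> hidden
        self.w_rec = rand_matrix(n_hidden, n_hidden, rng)   # previous hidden (context) -> hidden
        self.w_imp = rand_matrix(n_in, n_hidden, rng)       # hidden -> predicted input frame
        self.w_cls = rand_matrix(n_classes, n_hidden, rng)  # hidden -> class scores

    def run(self, frames: List[Vector], masks: List[List[int]]) -> Tuple[List[Vector], Vector]:
        h = [0.0] * self.n_hidden  # zero context, so frame 0's missing channels are imputed as 0.0
        filled: List[Vector] = []
        for x, m in zip(frames, masks):
            # Predict the current frame from the context carried in h.
            guess = matvec(self.w_imp, h)
            # Keep reliable channels, substitute predictions for missing ones.
            x_tilde = [xi if mi else gi for xi, mi, gi in zip(x, m, guess)]
            filled.append(x_tilde)
            # Elman update: h_t = tanh(W_in x~_t + W_rec h_{t-1}).
            pre = [a + b for a, b in zip(matvec(self.w_in, x_tilde), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        # Softmax class posteriors from the final hidden state.
        scores = matvec(self.w_cls, h)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return filled, [e / total for e in exps]


if __name__ == "__main__":
    # Three frames of a toy 3-channel spectrum with a per-channel reliability mask.
    frames = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.0], [0.4, 0.7, 0.2]]
    masks = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
    net = MissingDataElman(n_in=3, n_hidden=4, n_classes=11)
    filled, probs = net.run(frames, masks)
    print(filled)
    print(probs)
```

Because each imputed frame is fed back into the recurrent update, the imputations depend on context accumulated over earlier frames, and the class decision is made from the filled-in sequence. Training (for example, backpropagation through time on both the imputation and classification outputs) is omitted from this sketch.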
References:
Ahmed, S. and Tresp, V. (1993). Some solutions to the missing feature problem in vision. Advances in Neural Information Processing Systems 5 (S. J. Hanson, J. D. Cowan and C. L. Giles, eds.), Morgan Kaufmann, San Mateo, CA, pp. 393-400.
Barker, J., Green, P. D. and Cooke, M. P. (2001). Linking auditory scene analysis and robust ASR by missing data techniques. Workshop on Innovation in Speech Processing 2001, Stratford-upon-Avon, UK.
Barker, J., Josifovski, L., Cooke, M. P. and Green, P. D. (2000a). Soft decisions in missing data techniques for robust automatic speech recognition. Accepted for ICSLP-2000, Beijing.
Barker, J., Cooke, M. P. and Ellis, D. P. W. (2000b). Decoding speech in the presence of other sound sources. Accepted for ICSLP-2000, Beijing.
Bourlard, H. and Morgan, N. (1998). Hybrid HMM/ANN systems for speech recognition: Overview and new research directions. In C. L. Giles and M. Gori (eds.), Adaptive Processing of Sequences and Data Structures, Lecture Notes in Artificial Intelligence, vol. 1387, pp. 389-417. Springer.
Cooke, M., Green, P., Josifovski, L. and Vizinho, A. (2001). Robust automatic speech recognition with missing and unreliable acoustic data. Submitted to Speech Communication, 24th June 1999.
Cooke, M. P., Morris, A. and Green, P. D. (1996). Recognising occluded speech. ESCA Tutorial and Workshop on the Auditory Basis of Speech Perception, Keele University, July 15-19.
Drygajlo, A. and El-Maliki, M. (1998). Speaker verification in noisy environment with combined spectral subtraction and missing data theory. Proc. ICASSP-98, vol. I, pp. 121-124.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Furui, S. (1997). Recent advances in robust speech recognition. Proc. ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, France, pp. 11-20.
Ghahramani, Z. and Jordan, M. I. (1994). Supervised learning from incomplete data via an EM approach. Advances in Neural Information Processing Systems 6 (J. D. Cowan, G. Tesauro and J. Alspector, eds.), Morgan Kaufmann, San Mateo, CA, pp. 120-129.
Gingras, F. and Bengio, Y. (1998). Handling Asynchronous or Missing Data with Recurrent Networks. International Journal of Computational Intelligence and Organizations, vol. 1, no. 3, pp. 154-163.
Jordan, M. I. (1988). Supervised learning and systems with excess degrees of freedom. Technical Report COINS TR 88-27, Massachusetts Institute of Technology.
Josifovski, L., Cooke, M., Green, P. and Vizinho, A. (1999). State based imputation of missing data for robust speech recognition and speech enhancement. Proc. Eurospeech’99, Budapest, vol. 6, pp. 2837-2840.
Leonard, R. G. (1984). A Database for Speaker-Independent Digit Recognition. Proc. ICASSP 84, vol. 3, p. 42.11.
Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, vol. 22, no. 1, pp. 1-15.
Morris, A., Josifovski, L., Bourlard, H., Cooke, M. P. and Green, P. D. (2000). A neural network for classification with incomplete data: application to robust ASR. ICSLP 2000, Beijing, China.
Pearce, D. and Hirsch, H.-G. (2000). The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. Proc. ICSLP 2000, vol. IV, pp. 29-32, Beijing, China.
Pedersen, M. W. (1997). Optimization of Recurrent Neural Networks for Time Series Modeling. PhD thesis, Technical University of Denmark.
Raj, B., Seltzer, M. and Stern, R. (2000). Reconstruction of damaged spectrographic features for robust speech recognition. ICSLP 2000.
Seung, H. S. (1997). Learning continuous attractors in Recurrent Networks. Proc. NIPS’97, pp. 654-660.
Vizinho, A., Green, P., Cooke, M. and Josifovski, L. (1999). Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: An integrated study. Proc. Eurospeech’99, Budapest, Sep. 1999, vol. 5, pp. 2407-2410.