nips nips2007 nips2007-210 nips2007-210-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, Santiago Fernández
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However, these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM-based system in both cases.
[1] J. S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogelman-Soulie and J. Herault, editors, Neurocomputing: Algorithms, Architectures and Applications, pages 227–236. Springer-Verlag, 1990.
[2] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proc. 23rd Int. Conf. on Machine Learning, Pittsburgh, USA, 2006.
[3] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602–610, June/July 2005.
[4] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
[5] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Comp., 9(8):1735–1780, 1997.
[6] J. Hu, S. G. Lim, and M. K. Brown. Writer independent on-line handwriting recognition using an HMM approach. Pattern Recognition, 33:133–147, 2000.
Figure 1: Sequential Jacobian for an excerpt from the IAM-OnDB, with raw inputs (left) and preprocessed inputs (right). For ease of visualisation, only the derivative with highest absolute value is plotted at each time step. The reconstructed image was created by plotting the pen coordinates recorded by the sensor. The individual strokes are alternately coloured red and black. For both representations, the Jacobian is plotted for the output corresponding to the label ‘i’ at the point when ‘i’ is emitted (indicated by the vertical dashed lines). Because bidirectional networks were used, the range of sensitivity extends in both directions from the dashed line. For the preprocessed data, the Jacobian is sharply peaked around the time when the output is emitted. For the raw data it is more spread out, suggesting that the network makes more use of long-range context. Note the spike in sensitivity to the very end of the raw input sequence: this corresponds to the delayed dot of the ‘i’.
[7] S. Jaeger, S. Manke, J. Reichert, and A. Waibel. On-line handwriting recognition: the NPen++ recognizer. Int. Journal on Document Analysis and Recognition, 3:169–180, 2001.
[8] S. Johansson, R. Atwell, R. Garside, and G. Leech. The tagged LOB corpus user’s manual. Norwegian Computing Centre for the Humanities, 1986.
[9] P. Lamere, P. Kwok, W. Walker, E. Gouvea, R. Singh, B. Raj, and P. Wolf. Design of the CMU Sphinx-4 decoder. In Proc. 8th European Conf. on Speech Communication and Technology, Aug. 2003.
[10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, Nov. 1998.
[11] Y. LeCun, F. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Proc. of CVPR’04. IEEE Press, 2004.
[12] M. Liwicki and H. Bunke. IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard. In Proc. 8th Int. Conf. on Document Analysis and Recognition, volume 2, pages 956–961, 2005.
[13] M. Liwicki, A. Graves, S. Fernández, H. Bunke, and J. Schmidhuber. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. In Proc. 9th Int. Conf. on Document Analysis and Recognition, Curitiba, Brazil, Sep. 2007.
[14] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45:2673–2681, Nov. 1997.
[15] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proc. 7th Int. Conf. on Document Analysis and Recognition, page 958, Washington, DC, USA, 2003. IEEE Computer Society.
[16] S. Young, N. Russell, and J. Thornton. Token passing: A simple conceptual model for connected speech recognition systems. Technical Report CUED/F-INFENG/TR38, Cambridge University Eng. Dept., 1989.
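The Figure 1 caption describes a sequential Jacobian: the derivative of one network output, at the time step where a label is emitted, with respect to every input at every time step, reduced to the single derivative of highest absolute value per time step for plotting. The sketch below illustrates that reduction only. It rests on several assumptions that are not from the paper: a toy bidirectional tanh RNN stands in for the bidirectional LSTM network, the weights and inputs are random, the derivatives are estimated by central finite differences rather than backpropagation, and all function names (`birnn_outputs`, `sequential_jacobian`, `max_abs_per_step`) are hypothetical.

```python
import math
import random


def birnn_outputs(xs, params):
    """Toy bidirectional tanh RNN (an assumed stand-in, not the paper's network).

    xs is a list of input vectors; returns one output vector per time step.
    """
    w_in_f, w_rec_f, w_in_b, w_rec_b, w_out = params
    n_hidden = len(w_rec_f)

    def run(seq, w_in, w_rec):
        h = [0.0] * n_hidden
        states = []
        for x in seq:
            h = [math.tanh(sum(w_in[j][i] * x[i] for i in range(len(x)))
                           + sum(w_rec[j][k] * h[k] for k in range(n_hidden)))
                 for j in range(n_hidden)]
            states.append(h)
        return states

    fwd = run(xs, w_in_f, w_rec_f)               # reads the past
    bwd = run(xs[::-1], w_in_b, w_rec_b)[::-1]   # reads the future
    return [[sum(w_out[o][j] * hj for j, hj in enumerate(f + b))
             for o in range(len(w_out))]
            for f, b in zip(fwd, bwd)]


def sequential_jacobian(xs, params, t_out, k, eps=1e-5):
    """Estimate d output[t_out][k] / d xs[t][i] for every t and i
    by central finite differences."""
    jac = []
    for t in range(len(xs)):
        row = []
        for i in range(len(xs[t])):
            plus = [list(x) for x in xs]
            minus = [list(x) for x in xs]
            plus[t][i] += eps
            minus[t][i] -= eps
            diff = (birnn_outputs(plus, params)[t_out][k]
                    - birnn_outputs(minus, params)[t_out][k])
            row.append(diff / (2 * eps))
        jac.append(row)
    return jac


def max_abs_per_step(jac):
    """Keep, per time step, the (signed) derivative of highest absolute value."""
    return [max(row, key=abs) for row in jac]


def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


# Arbitrary toy sizes and random data, chosen only for illustration.
rng = random.Random(0)
n_in, n_hidden, n_out, T = 3, 4, 2, 8
params = (random_matrix(n_hidden, n_in, rng),
          random_matrix(n_hidden, n_hidden, rng),
          random_matrix(n_hidden, n_in, rng),
          random_matrix(n_hidden, n_hidden, rng),
          random_matrix(n_out, 2 * n_hidden, rng))
xs = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(T)]

t_emit = 4  # hypothetical emission time of the label of interest
jac = sequential_jacobian(xs, params, t_out=t_emit, k=0)
curve = max_abs_per_step(jac)  # one value per time step, as plotted in a figure
```

Because the toy network reads the sequence in both directions, `curve` is generally non-zero both before and after `t_emit`, which mirrors the caption's remark that the range of sensitivity extends in both directions from the emission point. With a trained network one would normally obtain the same derivatives by backpropagation instead of finite differences.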