315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions


Source: pdf

Author: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven

Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.


reference text

[1] T. Brants, A. Popat, P. Xu, F. Och, and J. Dean. Large language models in machine translation. In EMNLP, 2007.

[2] T. Breuel. The OCRopus open source OCR system. In IS&T/SPIE 20th Annual Symposium, 2008.

[3] B. Carpenter. Scaling high-order character language models to gigabytes. In ACL Workshop on Software, 2005.

[4] X. Chen and A. Yuille. Detecting and reading text in natural scenes. In CVPR, 2004.

[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[6] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In NIPS, 2012.

[7] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.

[8] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In CVPR, 2010.

[9] V. Goel, A. Mishra, K. Alahari, and C. Jawahar. Whole is greater than sum of parts: Recognizing scene text words. In ICDAR, 2013.

[10] G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[11] C. Jacobs, P. Simard, P. Viola, and J. Rinker. Text recognition of low-resolution document images. In ICDAR, 2005.

[12] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. i Bigorda, S. Mestre, J. Mas, D. Mota, J. Almazan, and L. de las Heras. ICDAR 2013 Robust Reading Competition. In ICDAR, 2013.

[13] F. Kimura, T. Wakabayashi, S. Tsuruoka, and Y. Miyake. Improvement of handwritten Japanese character recognition using weighted direction code histogram. Pattern Recognition, 30(8):1329–1337, 1997.

[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[15] A. Mishra, K. Alahari, and C. Jawahar. Top-down and bottom-up cues for scene text recognition. In CVPR, 2012.

[16] V. Nair and G. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.

[17] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

[18] L. Neumann and J. Matas. A method for text localization and recognition in real-world images. In ACCV, 2010.

[19] W. Niblack. An Introduction to Digital Image Processing. Prentice Hall, 1986.

[20] T. Novikova, O. Barinova, P. Kohli, and V. Lempitsky. Large-lexicon attribute-consistent text recognition in natural images. In ECCV, 2012.

[21] Y. Pan, X. Hou, and C. Liu. Text localization in natural scene images based on conditional random field. In ICDAR, 2009.

[22] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

[23] D. Smith, J. Field, and E. Learned-Miller. Enforcing similarity constraints with integer programming for better scene text recognition. In CVPR, 2011.

[24] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.

[25] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, 2011.

[26] T. Wang, D. Wu, A. Coates, and A. Ng. End-to-end text recognition with convolutional neural networks. In ICPR, 2012.

[27] J. Weinman, E. Learned-Miller, and A. Hanson. Scene text recognition using similarity and a lexicon with sparse belief propagation. Pattern Analysis and Machine Intelligence, 31(10):1733–1746, 2009.