
382 cvpr-2013-Scene Text Recognition Using Part-Based Tree-Structured Character Detection

Source: pdf

Author: Cunzhao Shi, Chunheng Wang, Baihua Xiao, Yang Zhang, Song Gao, Zhong Zhang

Abstract: Scene text recognition has attracted great interest from the computer vision community in recent years. In this paper, we propose a novel scene text recognition method using part-based tree-structured character detection. Unlike the conventional multi-scale sliding-window character detection strategy, which does not make use of character-specific structure information, we use a part-based tree structure to model each type of character so as to detect and recognize the characters at the same time. For word recognition, we build a Conditional Random Field model on the potential character locations to incorporate the detection scores, spatial constraints and linguistic knowledge into one framework. The final word recognition result is obtained by minimizing the cost function defined on the random field. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method significantly outperforms state-of-the-art methods for both character detection and word recognition.
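The word recognition step described in the abstract (combining detection scores with linguistic knowledge and minimizing a cost function) can be illustrated with a much-simplified sketch. The abstract does not give the actual potentials, graph structure or inference procedure, so everything below is an assumption: the candidate character locations are taken to form a simple left-to-right chain, spatial constraints are omitted, the minimum is found by exact dynamic programming, and the names `min_cost_word`, `unary`, `pairwise` and `bigram_cost` as well as all numeric costs are hypothetical.

```python
def min_cost_word(unary, pairwise):
    """Minimize a chain-structured cost by dynamic programming.

    unary:    list with one dict per character location, mapping a
              candidate label to its cost (e.g. a negated detection score).
    pairwise: function (previous_label, label) -> cost, standing in for
              linguistic knowledge such as a bigram penalty.
    Returns (word, total_cost) for the cheapest labelling.
    """
    # best[label] = (cost, path) of the cheapest labelling ending in `label`
    best = {lab: (cost, [lab]) for lab, cost in unary[0].items()}
    for node in unary[1:]:
        new_best = {}
        for lab, u in node.items():
            # Pick the predecessor label that gives the lowest cost so far.
            prev_lab, (prev_cost, prev_path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + pairwise(kv[0], lab),
            )
            total = prev_cost + pairwise(prev_lab, lab) + u
            new_best[lab] = (total, prev_path + [lab])
        best = new_best
    cost, path = min(best.values(), key=lambda v: v[0])
    return "".join(path), cost


# Made-up costs for three character locations (lower is better).
unary = [
    {"c": 0.2, "e": 0.9},
    {"a": 0.5, "o": 0.4},
    {"t": 0.1, "l": 0.8},
]

# Hypothetical bigram penalties; unseen pairs get a default cost of 1.0.
BIGRAMS = {("c", "a"): 0.1, ("a", "t"): 0.1, ("c", "o"): 0.6, ("o", "t"): 0.5}


def bigram_cost(prev_label, label):
    return BIGRAMS.get((prev_label, label), 1.0)


word, cost = min_cost_word(unary, bigram_cost)
print(word, round(cost, 3))
```

In this toy example the second location slightly prefers "o" on its unary cost alone, but the pairwise term pulls the overall minimum towards "cat". A random field that is not a chain would generally need an approximate inference method rather than this exact recursion.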

reference text

[1] ABBYY FineReader 9.0. http://www.abbyy.com.

[2] X. Chen and A. Yuille. Detecting and reading text in natural scenes. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages II-366. IEEE, 2004.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886–893, 2005.

[4] T. de Campos, B. Babu, and M. Varma. Character recognition in natural images. In VISAPP, 2009.

[5] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.

[6] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79, 2005.

[7] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In IEEE 12th International Conference on Computer Vision (ICCV), pages 2106–2113. IEEE, 2009.

[8] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1568–1583, 2006.

[9] S. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), volume 2, pages 682–687, 2003.

[10] A. Mishra, K. Alahari, and C. V. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.

[11] A. Mishra, K. Alahari, and C. V. Jawahar. Top-down and bottom-up cues for scene text recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[12] L. Neumann and J. Matas. Real-time scene text localization and recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[13] A. Newell and L. Griffin. Multiscale histogram of oriented gradient descriptors for robust character recognition. In International Conference on Document Analysis and Recognition (ICDAR), pages 1085–1089. IEEE, 2011.

[14] N. Otsu. A threshold selection method from gray-level histograms. Automatica, 11:285–296, 1975.

[15] J. Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, 1988.

[16] A. Shahab, F. Shafait, and A. Dengel. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In International Conference on Document Analysis and Recognition (ICDAR), pages 1491–1496. IEEE, 2011.

[17] A. Stolcke et al. SRILM: an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, volume 2, pages 901–904, 2002.

[18] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In International Conference on Computer Vision (ICCV), 2011.

[19] K. Wang and S. Belongie. Word spotting in the wild. Computer Vision–ECCV, pages 591–604, 2010.

[20] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2879–2886, 2012.

[21] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1385–1392. IEEE, 2011.

[22] M. Yokobayashi and T. Wakahara. Segmentation and recognition of characters in scene images using selective binarization in color space and GAT correlation. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), pages 167–171. IEEE, 2005.