Multi-Oriented Scene Text Detection at Character Level
International Journal of Industrial Electronics Control and Optimization
Article 6, Volume 6, Issue 3, Azar (November/December) 2023, Pages 219-227
Article type: Research Article
DOI: 10.22111/ieco.2023.44026.1471
Authors
Mahdi Kazeminia*; Hamed Shahraki; Mehran Tamjidi
Velayat University
Abstract
Recent deep-learning-based scene text detection methods achieve superior performance on benchmark datasets. In this paper, we re-implement a state-of-the-art text detection method, Character Region Awareness For Text detection (CRAFT), which detects the individual characters in scene text images. CRAFT is a character-based method with several advantages for complex text: by detecting character units and estimating the area between adjacent characters, it can detect text of arbitrary shape. We improve the detection performance of this baseline through modifications to its architecture and a training scheme that takes advantage of an advanced optimizer. The improvements are validated on three benchmark datasets: ICDAR2013, ICDAR2015, and COCO-Text. Applying the pre-trained models to COCO-Text shows that CRAFT does not generalize to this dataset without fine-tuning. We also improve the ICDAR2015 model and evaluate it on the benchmark datasets. The evaluation results show higher precision and accuracy than the original pre-trained model, obtained with fewer training iterations.
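To illustrate how character-level scores become text boxes, the original CRAFT paper describes inference roughly as follows: a binary map marks every pixel whose region (character) score or affinity (between-character) score exceeds a threshold, connected-component labeling groups the marked pixels, and each component is enclosed by a rotated rectangle. The snippet below is a minimal pure-Python sketch of that grouping step, not the authors' implementation. The function name `group_characters`, the three threshold values, the extra filter on each component's peak region score, and the use of axis-aligned boxes instead of rotated rectangles are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical thresholds, chosen for illustration only.
LOW_TEXT = 0.4        # region score above which a pixel belongs to a character
LINK_THRESHOLD = 0.4  # affinity score above which a pixel links two characters
TEXT_THRESHOLD = 0.7  # minimum peak region score for a component to be kept


def group_characters(region, affinity, low_text=LOW_TEXT,
                     link_threshold=LINK_THRESHOLD,
                     text_threshold=TEXT_THRESHOLD):
    """Group character-level score maps into word boxes (simplified sketch).

    region, affinity: equally sized 2-D lists of scores in [0, 1].
    Returns axis-aligned boxes (x_min, y_min, x_max, y_max) in scan order.
    """
    h, w = len(region), len(region[0])
    # Binary map: a pixel is text if it lies on a character or on a link.
    mask = [[region[y][x] > low_text or affinity[y][x] > link_threshold
             for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Label one 4-connected component with a breadth-first flood fill.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed extra filter: drop components whose strongest
            # character response is weak.
            if max(region[y][x] for y, x in pixels) < text_threshold:
                continue
            xs = [x for _, x in pixels]
            ys = [y for y, _ in pixels]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


if __name__ == "__main__":
    # Two characters joined by an affinity link, plus one isolated character.
    region = [[0.0] * 10 for _ in range(5)]
    affinity = [[0.0] * 10 for _ in range(5)]
    for y in range(1, 4):
        for x in (1, 2, 4, 5, 8):
            region[y][x] = 0.9
    affinity[2][3] = 0.8
    print(group_characters(region, affinity))  # prints [(1, 1, 5, 3), (8, 1, 8, 3)]
```

In the usage example, the affinity pixel bridges the first two characters so they merge into one word box, while the distant character remains separate; with an all-zero affinity map every character would produce its own box. This is why estimating the area between characters lets a character-level detector recover words of arbitrary layout.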
Keywords
Deep Learning; Scene Text Detection; CRAFT