Tracking the latest progress in Scene Text Detection and Recognition: Must-read papers well organized with code and dataset
ICDAR 2003(IC03):
ICDAR 2011(IC11):
ICDAR 2013(IC13):
USTB-SV1K:
SVT:
SVT-P:
ICDAR 2015(IC15):
COCO-Text:
MSRA-TD500:
MLT 2017:
MLT 2019:
CTW:
RCTW-17:
ReCTS:
CUTE80:
Total-Text:
SCUT-CTW1500:
LSVT:
ArT:
Synth80k:
SynthText:
Comparison of Datasets
Datasets | Language | Images: Total | Images: Train | Images: Test | Text instances: Total | Text instances: Train | Text instances: Test | Shape: Horizontal | Shape: Arbitrary-Quadrilateral | Shape: Multi-oriented | Annotation: Char | Annotation: Word | Annotation: Text-Line |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
IC03 | English | 509 | 258 | 251 | 2266 | 1110 | 1156 | ✓ | ✕ | ✕ | ✕ | ✓ | ✕ |
IC11 | English | 484 | 229 | 255 | 1564 | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
IC13 | English | 462 | 229 | 233 | 1944 | 849 | 1095 | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
USTB-SV1K | English | 1000 | 500 | 500 | 2955 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
SVT | English | 350 | 100 | 250 | 725 | 211 | 514 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
SVT-P | English | 238 | ~ | ~ | 639 | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
IC15 | English | 1500 | 1000 | 500 | 17548 | 12318 | 5230 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
COCO-Text | English | 63686 | 43686 | 20000 | 145859 | 118309 | 27550 | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
MSRA-TD500 | English/Chinese | 500 | 300 | 200 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
MLT 2017 | Multi-lingual | 18000 | 7200 | 10800 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
MLT 2019 | Multi-lingual | 20000 | 10000 | 10000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
CTW | Chinese | 32285 | 25887 | 6398 | 1018402 | 812872 | 205530 | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
RCTW-17 | English/Chinese | 12514 | 11514 | 1000 | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
ReCTS | Chinese | 20000 | ~ | ~ | ~ | ~ | ~ | ✓ | ✓ | ✕ | ✓ | ✓ | ✕ |
CUTE80 | English | 80 | ~ | ~ | ~ | ~ | ~ | ✕ | ✕ | ✓ | ✕ | ✓ | ✓ |
Total-Text | English | 1525 | 1225 | 300 | 9330 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
CTW-1500 | English/Chinese | 1500 | 1000 | 500 | 10751 | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
LSVT | English/Chinese | 450000 | 430000 | 20000 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✓ |
ArT | English/Chinese | 10166 | 5603 | 4563 | ~ | ~ | ~ | ✓ | ✓ | ✓ | ✕ | ✓ | ✕ |
Synth80k | English | 80k | ~ | ~ | 8m | ~ | ~ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ |
SynthText | English | 800k | ~ | ~ | 6m | ~ | ~ | ✓ | ✓ | ✕ | ✕ | ✓ | ✕ |
[A] [IJCV-2020] Shangbang Long, Xin He, Cong Yao. Scene Text Detection and Recognition: The Deep Learning Era[J]. International Journal of Computer Vision, 2020, 1--24. arXiv
[B] [TPAMI-2015] Ye Q, Doermann D. Text detection and recognition in imagery: A survey[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(7): 1480-1500. paper
[C] [Frontiers-Comput. Sci-2016] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2016, 10(1): 19-36. paper
[D] [arXiv-2018] Long S, He X, Yao C. Scene Text Detection and Recognition: The Deep Learning Era[J]. arXiv preprint arXiv:1811.04256, 2018. paper
If you are interested in developing better scene text detection metrics, the references below may be useful.
[A] Wolf, Christian, and Jean-Michel Jolion. "Object count/area graphs for the evaluation of object detection and segmentation algorithms." International Journal of Document Analysis and Recognition (IJDAR) 8.4 (2006): 280-296. paper
[B] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. K. Ghosh, A. D. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015. paper
[C] Calarasanu, Stefania, Jonathan Fabrizio, and Severine Dubuisson. "What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions." Image and Vision Computing 46 (2016): 1-17. paper
[D] Shi, Baoguang, et al. "ICDAR2017 competition on reading Chinese text in the wild (RCTW-17)." 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017. paper
[E] Nayef, N; Yin, F; Bizid, I; et al. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, 1454–1459. IEEE. paper
[F] Dangla, Aliona, et al. "A first step toward a fair comparison of evaluation protocols for text detection algorithms." 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018. paper
[G] He, Mengchao and Liu, Yuliang, et al. ICPR2018 Contest on Robust Reading for Multi-Type Web images. ICPR 2018. paper
[H] Liu, Yuliang and Jin, Lianwen, et al. "Tightness-aware Evaluation Protocol for Scene Text Detection" Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019. paper code
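Most of the protocols referenced above build on the same core idea: match predicted regions to ground-truth regions by overlap, then report precision, recall, and their harmonic mean. The following is a minimal sketch of that idea, not the implementation of any specific protocol. It assumes axis-aligned boxes (the ICDAR 2015 benchmark uses quadrilaterals), a greedy one-to-one matching, an IoU threshold of 0.5, and no "don't care" regions. The names `iou` and `detection_scores` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: boxes are axis-aligned, given as (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    gt: List[Box], pred: List[Box], thresh: float = 0.5
) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, f_measure)."""
    matched_gt = set()
    true_pos = 0
    for p in pred:
        # Find the best still-unmatched ground-truth box for this prediction.
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            overlap = iou(p, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= thresh:
            matched_gt.add(best_j)
            true_pos += 1
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pred_boxes = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
    print(detection_scores(gt_boxes, pred_boxes))
```

In this toy example two of the three predictions match a ground-truth box, so precision is 2/3, recall is 1.0, and the F-measure is 0.8. A fixed threshold of this kind treats every match above the cutoff as equally good, regardless of how tightly the prediction fits the text.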
OCR | API | Free |
---|---|---|
Tesseract OCR Engine | × | √ |
Azure | √ | √ |
ABBYY | √ | √ |
OCR Space | √ | √ |
SODA PDF OCR | √ | √ |
Free Online OCR | √ | √ |
Online OCR | √ | √ |
Super Tools | √ | √ |
Online Chinese Recognition | √ | √ |
Calamari OCR | × | √ |
Tencent OCR | √ | × |
[1].Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
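Authors | Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
Affiliations | University of Science and Technology Beijing; Joint Laboratory of Artificial Intelligence, University of Science and Technology of China; Tencent Technology (Shenzhen)
Code | https://github.com/GXYM/DRRG
Note | CVPR 2020 Oral
Explainer | https://blog.csdn.net/SpicyCoder/article/details/105072570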
[2].ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
Authors | Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang
Affiliations | University of Science and Technology of China
Code | https://github.com/wangyuxin87/ContourNet
Explainer | https://zhuanlan.zhihu.com/p/135399747
[3].On Vocabulary Reliance in Scene Text Recognition
Authors | Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao
Affiliations | Megvii; China University of Mining and Technology; University of Rochester
[4].SCATTER: Selective Context Attentional Scene Text Recognizer
Authors | Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha
Affiliations | Amazon Web Services
[5].Towards Accurate Scene Text Recognition With Semantic Reasoning Networks
Authors | Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, Errui Ding
Affiliations | University of Chinese Academy of Sciences; Baidu; Chinese Academy of Sciences
Code | https://github.com/chenjun2hao/SRN.pytorch
[6].SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
Authors | Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang
Affiliations | Chinese Academy of Sciences; University of Chinese Academy of Sciences
Code | https://github.com/Pay20Y/SEED (coming soon)
[7].OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold
Authors | Mohamed Yousef, Tom E. Bishop
Affiliations | Intuition Machines, Inc
Code | https://github.com/IntuitionMachines/OrigamiNet
[8].ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network
Authors | Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Affiliations | South China University of Technology; University of Adelaide
Code | https://github.com/Yuliang-Liu/bezier_curve_text_spotting
Note | CVPR 2020 Oral
Explainer | https://zhuanlan.zhihu.com/p/146276834
Semi-supervised generation of variable-length handwritten text, used to enlarge text datasets and improve recognition accuracy
[9].ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation
Authors | Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman
Affiliations | Amazon Rekognition, Israel; Cornell University
Code | https://github.com/amzn/convolutional-handwriting-gan
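Synthesizing scene text with a rendering engine to add training samples and improve recognition accuracy
[10].UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World
Authors | Shangbang Long, Cong Yao
Affiliations | Carnegie Mellon University; Megvii
Code | https://jyouhou.github.io/UnrealText/
Explainer | https://zhuanlan.zhihu.com/p/137406773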
Image augmentation for handwritten and scene text recognition
[11].Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition
Authors | Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang
Affiliations | South China University of Technology; Alibaba
Code | https://github.com/Canjie-Luo/Text-Image-Augmentation
[12].STEFANN: Scene Text Editor Using Font Adaptive Neural Network
Authors | Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal
Affiliations | Indian Statistical Institute; Indian Institute of Technology
Code | https://github.com/prasunroy/stefann
Website | https://prasunroy.github.io/stefann/
Reconstructing documents from shredded paper, for forensic and criminal investigations
[13].Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning
Authors | Thiago M. Paixao, Rodrigo F. Berriel, Maria C. S. Boeres, Alessandro L. Koerich, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos
Affiliations | IFES, Brazil; UFES, Brazil; ETS, Canada
[14].SwapText: Image Based Texts Transfer in Scenes
Authors | Qiangpeng Yang, Jun Huang, Wei Lin
Affiliations | Alibaba
[15].What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images
Authors | Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen
Affiliations | University of Electronic Science and Technology of China
[16].Sequential Motif Profiles and Topological Plots for Offline Signature Verification
Authors | Elias N. Zois, Evangelos Zervas, Dimitrios Tsourounis, George Economou
Affiliations | University of West Attica; University of Patras
Feel free to dive in! Open an issue or submit PRs.
This project exists thanks to all the people who contribute.
Special thanks to @HCIILAB and @Jyouhou.
Copyright © 2020 MaiweiAI.cn @Charmve. All Rights Reserved.