Caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
This is a Caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
This project is modified from py-R-FCN; the inclined NMS and rotated-box generation components are imported from the EAST project. Thanks to the authors (@zxytim, @argman) for their help. Please cite this paper if you find this project useful.
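
Inclined NMS suppresses overlapping detections using the IoU of the rotated quadrilaterals themselves rather than of their axis-aligned bounding boxes. The block below is a minimal pure-Python sketch of that idea, not the component imported from EAST: the function names, the convex-quadrilateral assumption, and the default `iou_threshold=0.3` are all hypothetical choices made for illustration.

```python
def _signed_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _ccw(poly):
    """Return the vertices as floats in counter-clockwise order."""
    poly = [(float(x), float(y)) for x, y in poly]
    return poly if _signed_area(poly) >= 0 else poly[::-1]


def _inside(p, a, b):
    """True if point p lies on or to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _line_intersection(p, q, a, b):
    """Intersection of segment p-q with the infinite line through a and b."""
    dx1, dy1 = b[0] - a[0], b[1] - a[1]
    dx2, dy2 = q[0] - p[0], q[1] - p[1]
    denom = dx1 * dy2 - dy1 * dx2
    t = ((p[0] - a[0]) * dy2 - (p[1] - a[1]) * dx2) / denom
    return (a[0] + t * dx1, a[1] + t * dy1)


def convex_intersection(subject, clip):
    """Sutherland-Hodgman clipping of one convex polygon by another."""
    output = _ccw(subject)
    clip = _ccw(clip)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        vertices, output = output, []
        if not vertices:
            break
        prev = vertices[-1]
        for cur in vertices:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_line_intersection(prev, cur, a, b))
            prev = cur
    return output


def quad_iou(quad_a, quad_b):
    """IoU of two convex quadrilaterals given as lists of (x, y) vertices."""
    inter = convex_intersection(quad_a, quad_b)
    inter_area = abs(_signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(_signed_area(quad_a)) + abs(_signed_area(quad_b)) - inter_area
    return inter_area / union if union > 0 else 0.0


def inclined_nms(quads, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring quad, drop those overlapping it."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(quad_iou(quads[i], quads[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

`inclined_nms(quads, scores)` returns the indices of the quadrilaterals that survive, ordered by descending score. A production version would normally use a compiled polygon library for speed.
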
.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│ ├── caffemodel // trained caffe model
│ ├── icdar15_gt // ICDAR2015 groundtruth
│ ├── prototxt // caffe prototxt file
│ └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│ ├── submit // original submit file
│ ├── submit_zip // zip submit file
│ ├── snapshots
│ └── train
│ ├── VGG16.txt.*
│ └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│ ├── cfgs // train config yml
│ ├── data // cache file
│ ├── lib
│ ├── _init_path.py
│ ├── demo.py
│ ├── eval_icdar15.py // evaluate F-measure on the ICDAR2015 dataset
│ ├── test_net.py
│ └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│ ├── img_1.jpg
│ ├── img_2.jpg
│ ├── img_3.jpg
│ ├── img_4.jpg
│ └── img_5.jpg
└── test.sh // test script
The dataset directory should have this basic structure:
ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt // image list for the ICDAR2013+ICDAR2015 training sets, with the same raw data augmentation as in the paper
├── ICDAR2015
│ ├── augmentation // contains all augmented images
│ └── ImageSets/Main/test.txt // ICDAR2015 test images list
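
Before training, it can help to confirm that the dataset directory matches the layout above. The following is a small sketch and not part of this repository: the helper name `missing_entries` is hypothetical, and the expected entries are simply read off the tree (including the assumption that `merge_train.txt` sits directly under the root).

```python
from pathlib import Path

# Entries taken from the directory tree above; adjust if your layout differs.
EXPECTED_ENTRIES = [
    "ICDAR2013",
    "merge_train.txt",
    "ICDAR2015/augmentation",
    "ICDAR2015/ImageSets/Main/test.txt",
]


def missing_entries(devkit_root):
    """Return the expected entries that do not exist under devkit_root."""
    root = Path(devkit_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

For example, `missing_entries("ICDARdevkit_Root")` returns an empty list when every listed file and directory is present, and otherwise names what is missing.
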
It is highly recommended to use Docker to build the environment. For details on configuring Docker, see Running with Docker. If you are familiar with Docker, please run:
1. nvidia-docker-compose run --rm --service-ports rrcnn bash
2. bash ./demo.sh
If you are not familiar with Docker, please follow py-R-FCN to install Caffe.
cd src/lib && make
It is recommended to use a UNIX socket so that GUI windows from the Docker container can be displayed. Please open another terminal and type:
xhost + # you may need to run this each time you open a new terminal
# docker-compose.yml: mount the host volume /tmp/.X11-unix to the container volume /tmp/.X11-unix
# pass the DISPLAY variable to the container so the host X server can display images from the container
docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
bash ./demo.sh
bash ./test.sh
# please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
SCALES: [720, 1200]
MULTI_SCALES_NOC: True
# modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
# then run
bash ./test.sh
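
With `SCALES: [720, 1200]`, each test image is resized so that its short side matches each listed scale in turn. The sketch below shows how such a resize factor is typically computed in py-faster-rcnn-style test code. It is an assumption that this repository follows the same convention, and the `max_size` cap on the long side is a hypothetical parameter here, disabled by default.

```python
def short_side_scale(height, width, target_size, max_size=None):
    """Resize factor that brings the short image side to target_size.

    If max_size is given and the scaled long side would exceed it,
    the factor is reduced so that the long side equals max_size.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = float(target_size) / short_side
    if max_size is not None and round(scale * long_side) > max_size:
        scale = float(max_size) / long_side
    return scale


def multi_scale_factors(height, width, scales=(720, 1200), max_size=None):
    """One resize factor per test scale, in the order the scales are listed."""
    return [short_side_scale(height, width, s, max_size) for s in scales]
```

For a 1280x720 image, the 720 scale leaves the image unchanged and the 1200 scale enlarges it by a factor of 1200/720.
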
- Mine: ICDAR2013 + ICDAR2015 training sets with raw data augmentation, 15977 images in total.
- Paper: ICDAR2015 + 2000 focused scene text images they collected.
# Train for RRCNN4-TextBoxes-v2-OHEM
bash ./train.sh
Note: if you set USE_FLIPPED=True and USE_FLIPPED_QUAD=True, you will get roughly 31200 roidb entries.
Approaches | Anchor Scales | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure (mine VS paper) |
---|---|---|---|---|---|
R2CNN-2 | (4, 8, 16) | (7, 7) | Y | (720) | 71.12% VS 68.49% |
R2CNN-3 | (4, 8, 16) | (7, 7) | Y | (720) | 73.10% VS 74.29% |
R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720) | 74.14% VS 74.36% |
R2CNN-4 | (4, 8, 16, 32) | (7, 7) | Y | (720, 1200) | 79.05% VS 81.80% |
R2CNN-5 | (4, 8, 16, 32) | (7, 7) (11, 3) (3, 11) | Y | (720) | 74.34% VS 75.34% |
R2CNN-5 | (4, 8, 16, 32) | (7, 7) (11, 3) (3, 11) | Y | (720, 1200) | 78.70% VS 82.54% |
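
The F-measure reported in these tables is the harmonic mean of detection precision and recall. As a reminder of how the three quantities relate, here is a minimal sketch; it is not the code in `eval_icdar15.py`, and the function and argument names are hypothetical.

```python
def f_measure(num_matched, num_detections, num_ground_truth):
    """Harmonic mean of precision and recall from raw match counts."""
    precision = float(num_matched) / num_detections if num_detections else 0.0
    recall = float(num_matched) / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by trading recall for precision or the reverse.
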
Approaches | Anchor Scales | Aspect ratios | Pooled sizes | Inclined NMS | Test scales (short side) | F-measure |
---|---|---|---|---|---|---|
R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720) | 74.36% |
R2CNN-4 | (4, 8, 16, 32) | (0.5, 1, 2) | (7, 7) | Y | (720, 1200) | 81.80% |
R2CNN-4-TextBoxes-OHEM | (4, 8, 16, 32) | (0.5, 1, 2, 3, 5, 7, 10) | (7, 7) | Y | (720) | 76.53% |
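
The anchor scales and aspect ratios in the table determine how many anchor shapes the RPN evaluates at each feature-map location: one per (scale, ratio) pair. The sketch below enumerates idealized anchor shapes under stated assumptions: a base size of 16 pixels as in py-faster-rcnn, a ratio interpreted as height/width, and no rounding. The function name is hypothetical and the repository's own anchor generator may differ in these details.

```python
import math


def anchor_shapes(scales, ratios, base_size=16):
    """Return (width, height) for every (ratio, scale) combination.

    Each anchor keeps the area (base_size * scale) ** 2 while its
    height/width ratio is set to the requested value.
    """
    shapes = []
    for ratio in ratios:
        for scale in scales:
            area = float(base_size * scale) ** 2
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes
```

Under these assumptions the R2CNN-4 setting yields 4 x 3 = 12 anchor shapes per location, while the TextBoxes-style ratio set yields 4 x 7 = 28, which is why the longer ratio list covers elongated text lines more densely at a higher computational cost.
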
You can also try other backbones such as ResNet-50 or ResNet-101.