# Faster R-CNN / R-FCN :bulb: C++ version based on Caffe
## Special Features for This Caffe Repository
- Merge `Conv + BatchNorm + Scale` layers into one layer when those layers are frozen, to reduce memory: `examples/FRCNN/res50/gen_merged_model.py`. Script for merging ResNet models: `examples/FRCNN/merge_resnet.sh`.
- Special layers: Python layer, from Caffe PR #5294.
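Merging frozen `Conv + BatchNorm + Scale` works by folding the batch-norm statistics and scale parameters into the convolution weights and bias. A minimal NumPy sketch of the folding math (the function and variable names are illustrative, not the script's actual API):

```python
import numpy as np

def fold_bn_into_conv(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold frozen BatchNorm (+ Scale) parameters into conv weights.

    W: conv weights, shape (out_channels, ...); b: conv bias, shape (out_channels,)
    mean/var: BatchNorm running statistics; gamma/beta: Scale layer parameters.
    """
    scale = gamma / np.sqrt(var + eps)  # per-output-channel factor
    # broadcast the per-channel factor over the remaining weight dimensions
    W_folded = W * scale.reshape(-1, *([1] * (W.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded
```

Because the folded layer computes exactly what the three layers computed, inference output is unchanged while the BatchNorm/Scale blobs (and their memory) are dropped.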
### Data Preprocessing
- Data augmentation
## TODO list
This repository uses C++11 features, so make sure your compiler supports C++11.
Tested with CUDA 8.0/9.2, cuDNN 7.0, and NCCL v1 (#286916a).
GCC v5.4.0/7.3.1; versions lower than v5 are not supported. Python 2.7 is required for the Python scripts.
```shell
cd $CAFFE_ROOT
cp Makefile.config.example Makefile.config
# modify the content in Makefile.config to adapt to your system
# if you'd like to use VisualDL to log losses, set USE_VISUALDL to 1,
# and run: cd src/logger && make
make -j7
# extra: 'py' for the Python interface of Caffe
# extra: 'pyfrcnn' for the Python wrapper of the C++ API; you can use this for the demo
make pyfrcnn py
```
All of the following steps should be run from the `$CAFFE_ROOT` path.
The official Faster R-CNN code of the NIPS 2015 paper (written in MATLAB) is available here. It is worth noting that this repository uses commit 8ba1d26 as the base framework.

Running `sh examples/FRCNN/demo_frcnn.sh` will process five pictures in `examples/FRCNN/images` and put the results into `examples/FRCNN/results`.

Note: you should place the trained caffemodel into `models/FRCNN`, such as `ZF_faster_rcnn_final.caffemodel` for the ZF model.
The VOC 2007 trainval image list is provided at `examples/FRCNN/dataset/voc2007.trainval`. Link your VOCdevkit into the Caffe root: `ln -s $YOUR_VOCdevkit_Path $CAFFE_ROOT/VOCdevkit`. As shown in the VGG example `models/FRCNN/vgg16/train_val.proto`, the original pictures should appear at `$CAFFE_ROOT/VOCdevkit/VOC2007/JPEGImages/` (check `window_data_param` in `FrcnnRoiData`).
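For orientation, here is a hypothetical `FrcnnRoiData` fragment wiring these paths together. The field names follow Caffe's `WindowDataParameter`; the repository's actual prototxt (see `models/FRCNN/vgg16/train_val.proto`) may use different tops and fields:

```
layer {
  name: "data"
  type: "FrcnnRoiData"
  top: "data"
  window_data_param {
    source: "examples/FRCNN/dataset/voc2007.trainval"  # annotation list
    root_folder: "VOCdevkit/VOC2007/JPEGImages/"       # image directory
  }
}
```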
If you want to train Faster R-CNN on your own dataset, you may need to prepare a custom dataset list. The format is as below:

```
# image-id
image-name
number of boxes
label x1 y1 x2 y2 difficulty
...
```
Running `sh examples/FRCNN/zf/train_frcnn.sh` will start the training process on the VOC 2007 data using the ZF model.
The ImageNet pre-trained models can be found in this link.
If you use the provided training script, please make sure that `examples/FRCNN/convert_model.py` transforms the parameters of the `bbox_pred` layer by the mean and std values, because the regression values are normalized during training and must be recovered to obtain the final model.
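The recovery step amounts to scaling each `bbox_pred` output by the target std and shifting the bias by the mean. A hedged NumPy sketch of the idea (the function name and shapes are illustrative; the actual script operates on the caffemodel blobs):

```python
import numpy as np

def unnormalize_bbox_pred(W, b, means, stds):
    """Fold bbox-target normalization back into the bbox_pred layer.

    During training the regression targets are normalized as
    t' = (t - mean) / std, so the raw layer predicts t'. Scaling each
    output row by std and shifting the bias by mean makes the layer
    predict the original targets directly at test time.

    W: (num_outputs, num_inputs) weights; b: (num_outputs,) biases.
    means, stds: per-output normalization constants, length num_outputs.
    """
    W_new = W * stds[:, np.newaxis]
    b_new = b * stds + means
    return W_new, b_new
```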
Running `sh examples/FRCNN/zf/test_frcnn.sh` will evaluate the performance on the VOC 2007 test data using the trained ZF model.
The program uses a config file, named like `config.json`, to set parameters. Special parameters to be careful about:

- `data_jitter`: data augmentation; if set < 0, no jitter, hue, saturation, or exposure augmentation is applied
- `im_size_align`: set to the stride of the last conv layer of FPN (such as 64) to avoid Deconv shape problems; set to 0 to disable
- `bbox_normalize_targets`: do bbox normalization during training and un-normalization at test time (no need to convert the model weights before testing)
- `test_rpn_score_thresh`: can be set > 0 to speed up NMS at test time

Scripts and prototxts for the different models are listed in `examples/FRCNN`.
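A hypothetical `config.json` fragment illustrating these parameters (the values and value types shown are illustrative, not recommended settings):

```json
{
  "data_jitter": 0.2,
  "im_size_align": 0,
  "bbox_normalize_targets": true,
  "test_rpn_score_thresh": 0.05
}
```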
More details about the code in the `include` and `src` directories:

- `api/FRCNN`: demo and test API
- `caffe/FRCNN`: code related to Faster R-CNN
- `caffe/RFCN`: R-FCN
- `caffe/DeformConv`: Deformable Conv
- `caffe/SSD`: SSD
- `examples/YOLO`: YOLOv3 inference, including a converter script and a demo; pay attention to the Upsample layer usage
- `logger`: logger tools
- `modules` and `yaml-cpp`: Caffe module layers, which include the FPN layers etc.
- `python/frcnn`: pybind11 interface for the demo
- `caffe/ACTION_REC`: Two-Stream Convolutional Networks for Action Recognition in Videos
- `caffe/CTPN`: CTPN special layers for scene text detection
- `caffe/PR`: some layers from Caffe PRs

To stay synchronized with the official Caffe, rebase the dev branch.
- `frcnn_proposal_layer.cu` requires the header file `<cub/cub.cuh>`. CUB is a library shipped with the official CUDA Toolkit, usually found in `/usr/local/cuda/include/thrust/system/cuda/detail/`. You should add this path to your `Makefile.config` (try `locate cub.cuh` to find CUB on your system).
- `error: RPC failed; result=22, HTTP code = 0`: run `git config http.postBuffer 524288000` to increase the git buffer to 500 MB.
- Module layers are looked up via `CAFFE_LAYER_PATH`, then via the predefined `DEFAULT_LAYER_PATH` in the Makefile. If layers fail to load, try setting `CAFFE_LAYER_PATH` in your shell script; this can also happen when using pycaffe.
- Set `bg_thresh_lo` to 0 when using OHEM.

Caffe is released under the BSD 2-Clause license. The BAIR/BVLC reference models are released for unrestricted use.
Please cite the following papers in your publications if it helps your research:
@article{jia2014caffe,
Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
Journal = {arXiv preprint arXiv:1408.5093},
Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
Year = {2014}
}
@inproceedings{girshick2015fast,
title={Fast R-CNN},
author={Girshick, Ross},
booktitle={International Conference on Computer Vision},
pages={1440--1448},
year={2015}
}
@inproceedings{ren2015faster,
title={Faster {R-CNN}: Towards real-time object detection with region proposal networks},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
booktitle={Neural Information Processing Systems},
pages={91--99},
year={2015}
}
@article{ren2017faster,
title={Faster {R-CNN}: Towards real-time object detection with region proposal networks},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={39},
number={6},
pages={1137--1149},
year={2017},
publisher={IEEE}
}
@article{dai16rfcn,
Author = {Jifeng Dai and Yi Li and Kaiming He and Jian Sun},
Title = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
Journal = {arXiv preprint arXiv:1605.06409},
Year = {2016}
}
@article{dai17dcn,
Author = {Jifeng Dai and Haozhi Qi and Yuwen Xiong and Yi Li and Guodong Zhang and Han Hu and Yichen Wei},
Title = {Deformable Convolutional Networks},
Journal = {arXiv preprint arXiv:1703.06211},
Year = {2017}
}
@inproceedings{bodla2017soft,
Author = {Navaneeth Bodla and Bharat Singh and Rama Chellappa and Larry S. Davis},
Title = {Soft-NMS -- Improving Object Detection With One Line of Code},
Booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
Year = {2017}
}
@article{li2017light,
title={Light-Head R-CNN: In Defense of Two-Stage Object Detector},
author={Li, Zeming and Peng, Chao and Yu, Gang and Zhang, Xiangyu and Deng, Yangdong and Sun, Jian},
journal={arXiv preprint arXiv:1711.07264},
year={2017}
}
@inproceedings{cai18cascadercnn,
author = {Zhaowei Cai and Nuno Vasconcelos},
Title = {Cascade R-CNN: Delving into High Quality Object Detection},
booktitle = {CVPR},
Year = {2018}
}