PyTorch implementation of RetinaNet using EfficientNets as the feature extractor (backbone) network.
RetinaNet is the object detector described in Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár.
EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models. They were developed using AutoML and compound scaling: the authors first used the AutoML Mobile framework to develop a mobile-size baseline network, EfficientNet-B0, and then applied the compound scaling method to scale this baseline up to EfficientNet-B1 through B7.
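Compound scaling jointly scales network depth, width, and input resolution with a single coefficient φ. A minimal sketch of the rule from the EfficientNet paper (d = α^φ, w = β^φ, r = γ^φ, using the paper's grid-searched constants α=1.2, β=1.1, γ=1.15; the function name is illustrative, not part of this repo):

```python
# Compound scaling sketch: one coefficient phi scales depth, width,
# and resolution together. alpha/beta/gamma are the grid-searched
# constants from the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi: int):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The paper constrains alpha * beta^2 * gamma^2 ~ 2, so FLOPS roughly
# double for each unit increase of phi.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2  # approximately 1.92
```

Each B0–B7 variant corresponds to a larger φ, which is why deeper variants cost more compute for higher accuracy.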
EfficientNets achieve state-of-the-art accuracy on ImageNet with an order of magnitude better efficiency:
In the high-accuracy regime, EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet with 66M parameters and 37B FLOPS, being 8.4x smaller and 6.1x faster on CPU inference than the previous best, GPipe.
In the middle-accuracy regime, EfficientNet-B1 is 7.6x smaller and 5.7x faster on CPU inference than ResNet-152, with similar ImageNet accuracy.
Compared with the widely used ResNet-50, EfficientNet-B4 improves top-1 accuracy from 76.3% to 82.6% (+6.3%) under a similar FLOPS constraint.
For training:
python train.py "dataset/train_labels.csv" "dataset/test_labels.csv" "labels.csv" "b0" 100 1
The RetinaNet model uses an EfficientNet backbone. You can set the backbone via the model_type argument, which must be one of b0, b1, b2, b3, b4 or b5. Note that deeper models are more accurate but slower and use more memory.
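A sketch of how the model_type string could be validated and mapped to each variant's standard input resolution (the resolutions are the published B0–B5 values from the EfficientNet paper; the helper name is hypothetical and not necessarily how train.py does it):

```python
# Hypothetical helper: validate model_type and look up the standard
# input resolution for each EfficientNet variant (values from the paper).
RESOLUTIONS = {"b0": 224, "b1": 240, "b2": 260, "b3": 300, "b4": 380, "b5": 456}

def backbone_resolution(model_type: str) -> int:
    """Return the native input resolution for a supported backbone."""
    if model_type not in RESOLUTIONS:
        raise ValueError(
            f"model_type must be one of {sorted(RESOLUTIONS)}, got {model_type!r}"
        )
    return RESOLUTIONS[model_type]
```

The growing resolutions illustrate why b5 uses far more memory than b0 at the same batch size.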
For inference:
To visualize the network's detections, use inference.py:
python inference.py "model_path" "image_path" True
CSVGenerator provides an easy way to define your own datasets:
python create_csv_files.py
create_data_lists("Path to img_dir", "Path to json", "dataset")
# The dataset folder contains the created csv files.
The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. The expected format of each line is:
path/to/image.jpg,xmin,ymin,xmax,ymax,label
A full example:
/data/imgs/img_001.jpg,837,346,981,456,cat
/data/imgs/img_002.jpg,215,312,279,391,dog
/data/imgs/img_002.jpg,22,5,89,84,cat
This defines a dataset with 2 images: img_001.jpg contains one class (cat), and img_002.jpg contains two classes (dog and cat).
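The one-box-per-row format above can be grouped back into per-image box lists. A minimal sketch (the function name is illustrative; the repo's own loader may differ):

```python
import csv
from collections import defaultdict
from io import StringIO

def parse_annotations(fp):
    """Group one-box-per-row CSV annotations by image path."""
    boxes = defaultdict(list)
    for path, xmin, ymin, xmax, ymax, label in csv.reader(fp):
        boxes[path].append((int(xmin), int(ymin), int(xmax), int(ymax), label))
    return dict(boxes)

# The example rows from above: img_002.jpg appears twice, once per box.
sample = (
    "/data/imgs/img_001.jpg,837,346,981,456,cat\n"
    "/data/imgs/img_002.jpg,215,312,279,391,dog\n"
    "/data/imgs/img_002.jpg,22,5,89,84,cat\n"
)
anns = parse_annotations(StringIO(sample))
```

Note that coordinates are absolute pixel values (xmin, ymin, xmax, ymax), not normalized.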
The class name to ID mapping file should contain one mapping per line. Each line should use the following format (refer to labels.csv for a full example):
class_name,id
Indexing for classes starts at 1. Do not include a background class as it is implicit.
For example:
dog,1
cat,2
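Loading this mapping and enforcing the 1-based indexing rule can be sketched as follows (the helper name is hypothetical):

```python
import csv
from io import StringIO

def load_class_map(fp):
    """Read class_name,id rows; IDs must start at 1 (background is implicit)."""
    mapping = {name: int(idx) for name, idx in csv.reader(fp)}
    if min(mapping.values()) < 1:
        raise ValueError("class IDs start at 1; do not include a background class")
    return mapping

classes = load_class_map(StringIO("dog,1\ncat,2\n"))
```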
To enable multi-scale training, change Resize() to Resize(is_test=False, multi_scale=True, p=0.7) in train.py (line 29).
During the training phase:
is_test = False
multi_scale = True
p = 0.7 (the test image size will be used 30% of the time)
During the testing phase:
is_test = True
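The p=0.7 behaviour described above can be sketched as follows (a simplified stand-in for the repo's Resize transform; the test size and the set of training scales are illustrative assumptions):

```python
import random

# Simplified stand-in for the repo's Resize transform: with probability p,
# pick a random training scale; otherwise fall back to the fixed test size.
TEST_SIZE = 512                      # assumed fixed evaluation size
TRAIN_SIZES = [448, 512, 576, 640]   # illustrative multi-scale choices

def pick_size(is_test: bool, multi_scale: bool, p: float) -> int:
    if is_test or not multi_scale:
        return TEST_SIZE             # testing always uses the fixed size
    if random.random() < p:          # multi-scale 70% of the time when p=0.7
        return random.choice(TRAIN_SIZES)
    return TEST_SIZE                 # test size the remaining 30%
```

Training at varied scales while evaluating at a fixed size is a common trick to make the detector robust to object scale.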