Keras implementation of RetinaNet for object detection and visual relationship identification
This is a Keras implementation of RetinaNet for object detection, as described in Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár.
If this repository helps you in any way, show your love :heart: by putting a :star: on this project :v:
The RetinaNet used here is a single, unified network composed of a ResNet-50 backbone and two task-specific subnetworks. The backbone is an off-the-shelf convolutional network responsible for computing a feature map over the entire input image. The first subnetwork performs classification on the backbone's output; the second performs convolutional bounding-box regression. RetinaNet is a strong model for object detection, but getting it to work was a challenge: I underestimated the large number of classes and the size of the data set, yet with some tweaks was still able to land a bronze medal (top 20%) among 450 competitors. The benchmark file is included for reference, with the local scores for the predictions and the parameters used.
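The key ingredient from the paper is the focal loss, which down-weights well-classified examples so training focuses on hard ones. Below is a minimal NumPy sketch of the formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the repo's actual version lives in losses.py and operates on Keras tensors, so treat this only as an illustration of the math.

```python
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """NumPy sketch of the focal loss from Lin et al.

    y_true: binary labels (0/1); y_pred: predicted probabilities.
    With gamma=0 this reduces to alpha-weighted cross-entropy.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the probability assigned to the true class
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of easy (high-p_t) examples
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```

Increasing `gamma` suppresses the contribution of easy examples, which is what lets a dense one-stage detector cope with the extreme foreground/background imbalance.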
I focused on object detection and used a simple multi-class linear model for relationship prediction. Instead of the usual LSTM-based approach, I experimented with a Random Forest Classifier and a Multi Output Classifier from scikit-learn, just to show that an LSTM has no special intelligence behind it and is ultimately a statistical tool. The local classification scores backed this up, giving an accuracy greater than 90%. Since the visual-relationship score depends on how well the object detector performs, I was not able to push the score further, but with this model I still landed a bronze medal (top 30%) among 230 competitors.
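The scikit-learn setup mentioned above can be sketched as follows. The feature layout and label meanings here are illustrative assumptions (random data stands in for real box-pair features), not the repo's actual pipeline in classifier.py:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Illustrative features for a pair of detected boxes:
# [x1, y1, x2, y2, class_a, x1, y1, x2, y2, class_b]
# Illustrative targets: [relationship_id, attribute_id]
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = rng.integers(0, 3, size=(200, 2))  # two categorical outputs per sample

# MultiOutputClassifier fits one Random Forest per target column
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, y)
pred = clf.predict(X[:5])
print(pred.shape)  # (5, 2): one prediction per output column
```

The appeal of this setup is that each target (relationship, attribute) gets its own forest while sharing one feature representation, with no recurrent machinery required.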
The build was made for the Google AI Object Detection and Visual Relationship Kaggle challenge, so if you are using this project on Google's Open Images data set, follow the instructions below to run the module. The code is also written so that you can take individual modules and build a custom model to suit your needs; when you install the model this way, make sure you turn the imports into absolute imports or follow the folder structure shown below.
The code was initially run on an NVIDIA GeForce GTX 1050 Ti, but the model exploded in memory: the Open Images data set consists of 1,743,042 images across 500 classes with 12,195,144 bounding boxes, and every image was resized to 600 by 600. Resizing the images to a smaller size might have solved the issue, but I did not try it. Instead the code was run on an NVIDIA Tesla K80, where the model worked fine, and an NVIDIA Tesla P100 was used to convert the training model to an inference model. I would therefore recommend a K80 or a better GPU.
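This repo resizes every image to a fixed 600x600. A common memory-friendlier alternative (not what this code does) is an aspect-ratio-preserving resize that scales the shorter side to a target while capping the longer side; a small sketch of that scale computation, with assumed defaults of 600/1024:

```python
def compute_resize_scale(height, width, min_side=600, max_side=1024):
    """Return the scale factor for an aspect-ratio-preserving resize.

    Scales so the shorter side becomes min_side, unless that would push
    the longer side past max_side, in which case the longer side caps it.
    (Sketch of a common detector preprocessing step; this repo instead
    resizes to a fixed 600x600.)
    """
    smallest = min(height, width)
    largest = max(height, width)
    scale = min_side / smallest
    if largest * scale > max_side:
        scale = max_side / largest
    return scale
```

For example, a 300x400 image gets scale 2.0 (shorter side to 600), while a 500x1500 image is capped by its longer side at 1024/1500.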
main_dir
- challenge2018 (the folder containing data files for the challenge)
- images
  - train (consists of the train images)
  - test (consists of the test images)
- keras_retinanet (keras retinanet package)
  - callbacks
    - callbacks.py
  - models
    - classifier.py
    - model_backbone.py
    - resnet.py
    - retinanet.py
  - preprocessing
    - generator.py
    - image.py
    - open_images.py
  - trainer
    - convert_model.py
    - evaluate.py
    - model.py
    - task.py
  - utils
    - anchors.py
    - clean.py
    - freeze.py
    - initializers.py
    - layers.py
    - losses.py
To train, run task.py from the trainer folder:

task.py main_dir(path/to/main directory) dataset_type(oid)
First run convert_model.py to convert the training model to an inference model. Then run evaluate.py for evaluation. Evaluation defaults to both object detection and visual relationship identification; to select one of the two, add 'od' or 'vr' when calling evaluate.py.

convert_model.py main_dir(path/to/main directory) model_in(model name to be used to convert)
evaluate.py main_dir(path/to/main directory) model_in(model name to be used for evaluation)
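Putting the three steps together, a full run might look like the following. The script paths follow the folder structure above, and the model file names are hypothetical placeholders for whatever snapshots your training run produces:

```shell
# Train on Open Images ('oid' is the dataset_type)
python keras_retinanet/trainer/task.py /path/to/main_dir oid

# Convert a training snapshot to an inference model (file name is a placeholder)
python keras_retinanet/trainer/convert_model.py /path/to/main_dir training_model.h5

# Evaluate; append 'od' or 'vr' to restrict to one task
python keras_retinanet/trainer/evaluate.py /path/to/main_dir inference_model.h5 od
```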
callbacks.py:
classifier.py:
model_backbone.py:
resnet.py:
retinanet.py:
generator.py:
image.py:
open_images.py:
convert_model.py:
evaluate.py:
model.py:
task.py:
anchors.py:
clean.py:
freeze.py:
initializers.py:
layers.py:
losses.py:
This project is licensed under the MIT License - see the LICENSE file for details