Object localization in images using simple CNNs and Keras
This project shows how to localize objects in images by using simple convolutional neural networks.
Before getting started, we have to download a dataset and generate a CSV file containing the annotations (boxes).
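The annotation file could be produced like this (a minimal sketch; the file name and column layout are my assumptions, not the repository's exact format):

```python
import csv

# One row per annotated object: image path, box corners, class id.
# These example values are purely illustrative.
rows = [
    ("images/cat_001.jpg", 10, 20, 110, 200, 0),
    ("images/dog_042.jpg", 35, 15, 180, 220, 1),
]

with open("annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["path", "xmin", "ymin", "xmax", "ymax", "class"])
    writer.writerows(rows)
```

Each training script can then read the boxes back with `csv.reader` and resize them together with the images.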
First, let's look at YOLOv2's approach:
We proceed in the same way to build the object detector:
(e.g. the last `_inverted_res_block` for MobileNetv2). The code in this repository uses MobileNetv2, because it is faster than other models and its accuracy/speed trade-off can be adjusted: for example, if `alpha = 0.35` with a 96x96 input is not good enough, one can simply increase both values (see here for a comparison). If you use another architecture, change `preprocess_input` accordingly.
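A minimal sketch of this recipe (the regression head and `weights=None` are my assumptions to keep the example self-contained, not the repository's exact code; real training would load `weights="imagenet"`):

```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

IMAGE_SIZE = 96
ALPHA = 0.35  # width multiplier; raise both values if accuracy is too low

def create_model(image_size=IMAGE_SIZE, alpha=ALPHA):
    # Feature extractor without the classification top.
    base = MobileNetV2(input_shape=(image_size, image_size, 3),
                       include_top=False, alpha=alpha,
                       weights=None, pooling="avg")
    # Regression head: four normalized box coordinates in [0, 1].
    out = Dense(4, activation="sigmoid")(base.output)
    return Model(inputs=base.input, outputs=out)

model = create_model()
```

Swapping the backbone only requires replacing `MobileNetV2` and the matching `preprocess_input`.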
To train, run `python3 example_1/train.py`. Then set the path to the weights file (given by the last script) in `example_1/test.py` and run `python3 example_1/test.py`.
In the following images, red is the predicted box and green is the ground truth:
This time we have to run the scripts `example_2/train.py` and `example_2/test.py`.
In order to distinguish between classes, we have to modify the loss function. I'm using here `w_1*log((y_hat - y)^2 + 1) + w_2*FL(p_hat, p)`, where `w_1 = w_2 = 1` are two weights and `FL(p_hat, p) = -(0.9*(1 - p_hat)^2 * p*log(p_hat) + 0.1*p_hat^2 * (1 - p)*log(1 - p_hat))` is the focal loss.
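Written out in code, this loss might look as follows (a sketch with TensorFlow ops; the exact tensor shapes used in the repository may differ):

```python
import tensorflow as tf

def focal_loss(p, p_hat, eps=1e-7):
    # FL(p_hat, p) with the 0.9/0.1 class weights from the formula above
    p_hat = tf.clip_by_value(p_hat, eps, 1.0 - eps)  # guard against log(0)
    return -(0.9 * (1.0 - p_hat) ** 2 * p * tf.math.log(p_hat)
             + 0.1 * p_hat ** 2 * (1.0 - p) * tf.math.log(1.0 - p_hat))

def combined_loss(y, y_hat, p, p_hat, w1=1.0, w2=1.0):
    # w_1 * log((y_hat - y)^2 + 1), summed over the four box coordinates,
    # plus w_2 times the focal term for the class probability
    box = tf.reduce_sum(tf.math.log((y_hat - y) ** 2 + 1.0), axis=-1)
    return w1 * box + w2 * focal_loss(p, p_hat)
```

A perfect prediction drives both terms toward zero, and the `log(... + 1)` keeps the box term from exploding for large coordinate errors.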
Instead of using all 37 classes, the code will only output class 0 (images containing only class 0) or class 1 (images containing classes 1 to 36). However, it is easy to extend this to more classes: use categorical cross-entropy instead of the focal loss and try out different weights.
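The multi-class extension could be sketched like this (my assumption of how the swap would look, not code from the repository):

```python
import tensorflow as tf

NUM_CLASSES = 37

def multiclass_loss(y, y_hat, p, p_hat, w1=1.0, w2=1.0):
    # p is a one-hot vector over all 37 classes, p_hat a softmax output;
    # the focal term is replaced by categorical cross-entropy.
    box = tf.reduce_sum(tf.math.log((y_hat - y) ** 2 + 1.0), axis=-1)
    cls = tf.keras.losses.categorical_crossentropy(p, p_hat)
    return w1 * box + w2 * cls
```

The weights `w1` and `w2` would then need to be re-tuned, since the cross-entropy term scales differently from the binary focal term.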
In this example, we use a skip-net architecture similar to U-Net. For an in-depth explanation see my blog post.
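The skip-connection idea can be illustrated with a toy model (a minimal sketch, not the repository's actual architecture): deep, low-resolution features are upsampled and concatenated with earlier, high-resolution feature maps, as in U-Net.

```python
from tensorflow.keras import layers, Model

def tiny_skip_net(size=96):
    inp = layers.Input((size, size, 3))
    # Encoder: two downsampling stages
    e1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    d1 = layers.MaxPooling2D()(e1)                       # size/2
    e2 = layers.Conv2D(32, 3, padding="same", activation="relu")(d1)
    d2 = layers.MaxPooling2D()(e2)                       # size/4
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(d2)
    # Decoder: upsample and merge with the matching encoder stage
    u1 = layers.Concatenate()([layers.UpSampling2D()(b), e2])   # skip
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)
    u2 = layers.Concatenate()([layers.UpSampling2D()(c1), e1])  # skip
    out = layers.Conv2D(1, 1, activation="sigmoid")(u2)  # per-pixel mask
    return Model(inp, out)

model = tiny_skip_net()
```

The skips let the decoder reuse fine spatial detail that the pooling layers would otherwise discard.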
This example is based on the three YOLO papers. For an in-depth explanation see this blog post.
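The core idea of the YOLO papers can be sketched as follows (illustrative values; the grid size and target layout here are assumptions, not the repository's exact encoding): the image is divided into an S x S grid, and the cell containing a box's center is made responsible for predicting it.

```python
import numpy as np

def encode_target(box, S=7):
    """box = (cx, cy, w, h), all normalized to [0, 1]."""
    cx, cy, w, h = box
    target = np.zeros((S, S, 5), dtype=np.float32)
    col, row = int(cx * S), int(cy * S)  # responsible grid cell
    # Center offsets within the cell, width/height, and an objectness flag.
    target[row, col] = [cx * S - col, cy * S - row, w, h, 1.0]
    return target

t = encode_target((0.5, 0.5, 0.2, 0.3))
```

During training, the loss is only applied to cells whose objectness flag is set, plus a small no-object penalty elsewhere.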
Some knobs worth tuning:

- `example_4` contains data augmentation code; the same code can be added to the other examples.
- `ALPHA` and `IMAGE_SIZE` in `train_model.py` control the accuracy/speed trade-off of the backbone; increase them for better accuracy.
- `BATCH_SIZE` affects training speed and memory use.
- Decreasing `IMAGE_SIZE` and `ALPHA` shrinks the model for deployment.
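Put together, the constants at the top of `train_model.py` might look like this (illustrative defaults, not the repository's actual values):

```python
ALPHA = 0.35     # MobileNetV2 width multiplier: larger -> more accurate, slower
IMAGE_SIZE = 96  # input resolution: larger -> better IoU, slower
BATCH_SIZE = 32  # larger -> faster epochs, but more GPU memory
```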