I implemented a text detector using a classification data set that has no bounding-box annotations. The detector is based on a ResNet50 network and uses class activation mapping (CAM), the method proposed in Learning Deep Features for Discriminative Localization.
The procedure to build the detector is as follows:
Pre-trained weight files are stored at https://drive.google.com/drive/folders/1kj398ZW3zwk-KMDA0_Y5tq6Gr09oFDbZ. Using these files lets you skip the fine-tuning process.
activation_40 (Activation) (None, 14, 14, 1024) 0 add_13[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 14, 14, 1024) 9438208 activation_40[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 14, 14, 1024) 4096 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_50 (Activation) (None, 14, 14, 1024) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
cam_average_pooling (AveragePoo (None, 1, 1, 1024) 0 activation_50[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 1024) 0 cam_average_pooling[0][0]
__________________________________________________________________________________________________
cam_cls (Dense) (None, 2) 2050 flatten_2[0][0]
Running 1_train.py will do all of this.
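The head shown in the summary above (average pooling over the 14x14 map, flatten, then a 2-class dense layer) can be sketched in NumPy as follows. This is a minimal illustration rather than the code in 1_train.py: the channel count is reduced from 1024 to 8, the weights are random stand-ins, and the softmax on the output is an assumption, since the summary does not show the activation of cam_cls.

```python
import numpy as np


def classify(features, kernel, bias):
    """Average-pool a (H, W, C) feature map, then apply a dense layer and softmax."""
    pooled = features.mean(axis=(0, 1))      # (C,): global average pooling
    logits = pooled @ kernel + bias          # (num_classes,): dense layer
    exp = np.exp(logits - logits.max())      # numerically stable softmax (assumed activation)
    return exp / exp.sum()


rng = np.random.default_rng(0)
features = rng.random((14, 14, 8))    # stand-in for activation_50, 8 channels instead of 1024
kernel = rng.normal(size=(8, 2))      # stand-in for the cam_cls kernel
bias = np.zeros(2)                    # stand-in for the cam_cls bias

probs = classify(features, kernel, bias)
print(probs.shape)
```

With the real channel count, the dense layer holds 1024 * 2 + 2 = 2050 parameters, which matches the cam_cls row in the summary.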
activation_50 (Activation) (None, 14, 14, 1024) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
binear_up_sampling2d_1 (BinearU (None, 224, 224, 102 0 activation_50[0][0]
__________________________________________________________________________________________________
reshape_1 (Reshape) (None, 50176, 1024) 0 binear_up_sampling2d_1[0][0]
__________________________________________________________________________________________________
cam_cls (Dense) (None, 50176, 2) 2050 reshape_1[0][0]
__________________________________________________________________________________________________
reshape_2 (Reshape) (None, 224, 224, 2) 0 cam_cls[0][0]
Load the weights learned earlier into this network. The image classifier and the CAM network have different structures. However, because the layer names are set in advance, the same weight file can be used in both networks.
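The reason one weight file fits both structures is that a dense layer acts only on the last axis, so cam_cls has the same 2050 parameters whether its input is (None, 1024) or (None, 50176, 1024). The NumPy sketch below illustrates this under simplifying assumptions: 8 channels instead of 1024, random weights, and no upsampling step, so the map stays at 14x14. It also checks that averaging the per-position scores recovers the classifier's logits, which is the idea behind class activation mapping.

```python
import numpy as np


def dense(x, kernel, bias):
    """Apply a dense layer to the last axis of x."""
    return x @ kernel + bias


rng = np.random.default_rng(0)
h, w, c, n_cls = 14, 14, 8, 2
features = rng.random((h, w, c))        # stand-in for activation_50
kernel = rng.normal(size=(c, n_cls))    # shared cam_cls kernel
bias = rng.normal(size=n_cls)           # shared cam_cls bias

# Classifier path: average pooling first, then the dense layer.
cls_logits = dense(features.mean(axis=(0, 1)), kernel, bias)

# CAM path: reshape to (H*W, C), apply the same dense layer at every
# position, then reshape back to a (H, W, n_cls) map.
cam = dense(features.reshape(h * w, c), kernel, bias).reshape(h, w, n_cls)

# Averaging is linear, so the mean of the map equals the classifier logits.
print(np.allclose(cam.mean(axis=(0, 1)), cls_logits))
```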
Feeding an image into this network generates a text activation map.
Running 2_cam_plot.py will do all of this.
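The text above stops at the activation map and does not say how a bounding box is derived from it. One common approach, shown here purely as an assumption and not as what 2_cam_plot.py does, is to threshold the map relative to its maximum and take the box that encloses the remaining pixels. The function name box_from_activation and the 0.5 threshold are hypothetical.

```python
import numpy as np


def box_from_activation(act_map, rel_threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around pixels >= rel_threshold * max.

    Returns None when the map has no positive activation.
    """
    peak = act_map.max()
    if peak <= 0:
        return None
    ys, xs = np.nonzero(act_map >= rel_threshold * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Synthetic 224x224 map with one rectangular region of high activation.
act_map = np.zeros((224, 224))
act_map[60:100, 30:180] = 1.0

box = box_from_activation(act_map)
print(box)
```

This sketch encloses every pixel above the threshold in a single box, so an image with several separate text regions would need a connected-component step first.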