A 2-CNN pipeline for both detection (using bounding box regression) and classification of numbers in the SVHN dataset.
This is my (not very successful) attempt to do both detection and classification of numbers in the SVHN dataset using 2 CNNs.
This project contains 2 parts:
My original intention was that this would improve accuracy compared with simply feeding the entire SVHN image into one CNN and letting it predict all the digits in the image. However, the full pipeline reached only 51% exact-match accuracy (all digits of the number correct) and individual digit accuracies of 71%, 65%, 84% and 98% for the 1st, 2nd, 3rd and 4th digits respectively (predictions are limited to a maximum of 4 digits).
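To make the two kinds of accuracy concrete, here is a minimal sketch of how exact-match and per-position accuracy could be computed. The function names, the `BLANK` padding value and the toy data are assumptions for illustration; they are not taken from the project code.

```python
MAX_DIGITS = 4   # the pipeline predicts at most 4 digits per image
BLANK = -1       # hypothetical marker for "no digit at this position"


def pad(digits):
    """Pad (or truncate) a digit sequence to exactly MAX_DIGITS entries."""
    digits = list(digits)[:MAX_DIGITS]
    return digits + [BLANK] * (MAX_DIGITS - len(digits))


def sequence_accuracies(predicted, actual):
    """Return (exact_match_accuracy, per_position_accuracies).

    predicted, actual: lists of digit sequences, e.g. [[1, 3, 5], [2, 3]].
    """
    pred = [pad(p) for p in predicted]
    true = [pad(a) for a in actual]
    n = len(true)
    # A sample counts as an exact match only if every position agrees.
    exact = sum(p == t for p, t in zip(pred, true)) / n
    # Position i is scored independently of the other positions.
    per_position = [
        sum(p[i] == t[i] for p, t in zip(pred, true)) / n
        for i in range(MAX_DIGITS)
    ]
    return exact, per_position


# Toy data, not real model output.
predicted = [[1, 5, 2, 2], [1, 3, 5], [3, 2], [6]]
actual = [[1, 5, 0, 2], [1, 3, 5], [8, 6, 3], [7]]
exact, per_position = sequence_accuracies(predicted, actual)
print(exact, per_position)
```

In this sketch a position counts as correct when both the prediction and the label are blank. If the reported figures were computed the same way (an assumption), later positions would score higher partly because most house numbers are short.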
The bounding boxes in the images below are coordinates predicted by the detection CNN and the number prediction is done by the classification CNN.
| Image | Predicted value | Actual value |
|---|---|---|
| | 1522 | 1502 |
| | 135 | 135 |
| | 861 | 861 |
| | 348 | 348 |
| | 114 | 114 |
| | 23 | 23 |
The bounding boxes in the images below are coordinates predicted by the detection CNN and the number prediction is done by the classification CNN.
| Image | Predicted value | Actual value |
|---|---|---|
| | 32 | 863 |
| | 6 | 7 |
| | 8 | 26 |
| | 1 | 184 |
| | 1410 | 44 |
| | 27 | 6 |
construct_datasets.py
Uses the images downloaded from the SVHN dataset website, along with the .mat files describing the bounding boxes, to build a single table each for the train and test sets for easy use in the other files. If you don't want to run this file, download the .h5 files from the Google Drive link below.
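As a rough illustration of what one row of such a table might hold, the sketch below collapses the per-digit boxes of one image into a single box around the whole number. The field names `left`, `top`, `width`, `height` and `label` follow the SVHN annotations; the function name `build_row`, the output layout and the sample values are assumptions, and the actual script may organise the table differently.

```python
def build_row(filename, digit_boxes):
    """Collapse per-digit boxes into one table row for the whole number.

    digit_boxes: list of dicts with keys left, top, width, height, label.
    """
    left = min(b["left"] for b in digit_boxes)
    top = min(b["top"] for b in digit_boxes)
    right = max(b["left"] + b["width"] for b in digit_boxes)
    bottom = max(b["top"] + b["height"] for b in digit_boxes)
    # SVHN stores the digit 0 as label 10, so map it back to 0.
    digits = [int(b["label"]) % 10 for b in digit_boxes]
    return {
        "filename": filename,
        "left": left,
        "top": top,
        "width": right - left,
        "height": bottom - top,
        "num_digits": len(digits),
        "digits": digits,
    }


# Made-up annotation for a two-digit number "20".
boxes = [
    {"left": 10, "top": 5, "width": 8, "height": 20, "label": 2},
    {"left": 19, "top": 4, "width": 9, "height": 22, "label": 10},
]
row = build_row("example.png", boxes)
print(row)
```

Rows like this could then be collected into a list and written to an .h5 file, one file per split.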
train_digit_classification.py
Uses the processed .h5 files in the data folder to train a classification CNN.
train_digit_detection.py
Uses the processed .h5 files in the data folder to train a detection CNN.
combi_models.py
After training both networks, this file uses both networks to implement all the steps described in the pipeline section above.
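The overall flow of combining the two networks can be sketched as follows: predict a box, crop the image to it, then classify the crop. This is a simplified sketch, not the code in `combi_models.py`. The names `crop` and `predict_number`, the `(left, top, width, height)` box format and the stand-in `detect` and `classify` functions are all assumptions; in the real project those two callables would wrap the trained detection and classification CNNs.

```python
def crop(image, box):
    """Crop a row-major image (list of rows) to box = (left, top, width, height)."""
    left, top, width, height = box
    n_rows, n_cols = len(image), len(image[0])
    # Clamp to the image bounds, since a regressed box can fall partly outside.
    x0, y0 = max(0, int(left)), max(0, int(top))
    x1 = min(n_cols, int(left + width))
    y1 = min(n_rows, int(top + height))
    return [row[x0:x1] for row in image[y0:y1]]


def predict_number(image, detect, classify):
    """Run detection, crop to the predicted box, then classify the crop."""
    box = detect(image)
    patch = crop(image, box)
    digits = classify(patch)
    return box, "".join(str(d) for d in digits)


# Stand-ins for the two trained networks (hypothetical, for illustration only).
def detect(image):
    return (2, 1, 4, 3)


def classify(patch):
    # Returns the patch size as "digits" so the data flow is easy to follow.
    return [len(patch), len(patch[0])]


image = [[r * 10 + c for c in range(8)] for r in range(6)]
box, number = predict_number(image, detect, classify)
print(box, number)
```

Keeping the two models behind plain callables makes it easy to test the cropping logic on its own before plugging in the real networks.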
Weights for both CNNs and the .h5 files for the train and test datasets are available at the links below:
CNN Weights: https://drive.google.com/open?id=1vv7vzqzGjjUqjcCZYeX_NaGrqSU1Ami2
Dataset: https://drive.google.com/open?id=1KfVqQHjimQnXdzsCtQurwmTSpMe2mmA7
Python 3.5
All code was run on Amazon EC2 Deep Learning AMI version 7 (ami-139a476c)
I also tested this on my local Windows 10 PC with the following libraries: