Real-time PPE detection and tracking using YOLO v3 and deep_sort
In industry, especially manufacturing, Personal Protective Equipment (PPE) such as helmets (hard hats), safety harnesses, and goggles plays a very important role in ensuring the safety of workers. However, many accidents still occur because of negligence by workers as well as their supervisors. Supervisors can make mistakes because monitoring is monotonous and they may not be able to watch consistently. This project aims to use existing CCTV camera infrastructure to help supervisors monitor workers effectively by providing them with real-time alerts.
It detects persons without a helmet and displays the number of persons with and without helmets. It posts notifications in the message box for each camera. There is also a global message box where alerts from all cameras are displayed.
It also detects when a person it warned about earlier has since put on a helmet, and sends a notification for that as well.
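The notification behaviour described above can be sketched as a small per-person state table. This is only an illustration, assuming the tracker (deep_sort) supplies a stable integer ID per person and the detector supplies one of the label strings used below; the function names `update_alerts` and `count_people` and the message wording are hypothetical, not taken from the project code.

```python
from collections import Counter
from typing import Dict, List, Tuple

WITH_HELMET = "person with helmet"
WITHOUT_HELMET = "person without helmet"


def update_alerts(warned: Dict[int, bool],
                  detections: List[Tuple[int, str]]) -> List[str]:
    """Return new messages for one frame and update the warning state.

    `warned` maps a track ID to True while a "no helmet" warning is
    outstanding for that person. `detections` is a list of
    (track_id, label) pairs for the current frame.
    """
    messages = []
    for track_id, label in detections:
        already_warned = warned.get(track_id, False)
        if label == WITHOUT_HELMET and not already_warned:
            # First time this person is seen without a helmet: warn once.
            warned[track_id] = True
            messages.append(f"Person {track_id} is not wearing a helmet")
        elif label == WITH_HELMET and already_warned:
            # A previously warned person is now wearing a helmet.
            warned[track_id] = False
            messages.append(f"Person {track_id} is now wearing a helmet")
    return messages


def count_people(detections: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Return (number with helmet, number without helmet) for one frame."""
    counts = Counter(label for _, label in detections)
    return counts[WITH_HELMET], counts[WITHOUT_HELMET]
```

Keeping one `warned` dictionary per camera would feed the per-camera message box, while concatenating the returned messages from all cameras would feed the global one.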
Please note that this is still a work in progress; new ideas and contributions are welcome.
Using a conda environment is recommended. Follow these steps to get the code running:
full_yolo3_helmet_and_person.h5
conda env create -f environment.yml
Alternatively,
conda create --name helmet-detection --file requirements.txt
conda activate helmet-detection
python predict_gui.py -c config.json -n <number of cameras>
Note that the GUI supports at most 2 cameras.
To run the code without the GUI:
python predict.py -c config.json -n <number of cameras>
Here you can enter any number of cameras you want to use.
Data Collection
The dataset, containing images of people wearing helmets and people without helmets, was collected mostly from Google search. Some images show people applauding; those were collected from the Stanford 40 Action Dataset. Download images for training from train_image_folder.
Annotations
Annotation of each image was done in Pascal VOC format using the awesome lightweight annotation tool LabelImg for object detection. Download annotations from train_annot_folder.
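For readers unfamiliar with Pascal VOC, each image has one XML file listing its objects and bounding boxes. The sketch below reads such a file with the standard library. The sample XML, the file name `worker_001.jpg`, and the function name `parse_voc_annotation` are made up for illustration; the project's own parsing code may differ.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the layout LabelImg writes for Pascal VOC.
SAMPLE_XML = """<annotation>
  <filename>worker_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>person with helmet</name>
    <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>260</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>helmet</name>
    <bndbox><xmin>160</xmin><ymin>60</ymin><xmax>220</xmax><ymax>110</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Parse one Pascal VOC annotation into a plain dictionary."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Go through float() first in case coordinates are stored as "120.0".
            "xmin": int(float(box.findtext("xmin"))),
            "ymin": int(float(box.findtext("ymin"))),
            "xmax": int(float(box.findtext("xmax"))),
            "ymax": int(float(box.findtext("ymax"))),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(root.findtext("size/width")),
        "height": int(root.findtext("size/height")),
        "objects": objects,
    }


annotation = parse_voc_annotation(SAMPLE_XML)
```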
Organize the dataset into 4 folders: train_image_folder, train_annot_folder, valid_image_folder, and valid_annot_folder (the names used in the configuration file).
There is a one-to-one correspondence by file name between images and annotations. If the validation set is empty, the training set is automatically split into a training set and a validation set using a ratio of 0.8.
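The file-name pairing and the automatic split can be sketched as follows. This is a minimal illustration, assuming that the ratio 0.8 means 80% of the instances go to training and that the data is shuffled before splitting; the helper names `pair_by_stem` and `split_train_valid` are hypothetical.

```python
import random
from pathlib import PurePath


def pair_by_stem(image_names, annot_names):
    """Match images to annotations that share a file name without extension.

    Returns (pairs, missing), where `missing` lists images with no annotation.
    """
    annots = {PurePath(name).stem: name for name in annot_names}
    pairs, missing = [], []
    for image in image_names:
        stem = PurePath(image).stem
        if stem in annots:
            pairs.append((image, annots[stem]))
        else:
            missing.append(image)
    return pairs, missing


def split_train_valid(instances, train_ratio=0.8, seed=0):
    """Shuffle the instances and split them into (train, valid) lists."""
    items = list(instances)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    cut = int(train_ratio * len(items))
    return items[:cut], items[cut:]
```

Checking `missing` before training is a cheap way to catch images that were never annotated.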
The configuration file is a JSON file, which looks like this:
{
    "model" : {
        "min_input_size": 288,
        "max_input_size": 448,
        "anchors": [33,34, 52,218, 55,67, 92,306, 96,88, 118,158, 153,347, 209,182, 266,359],
        "labels": ["helmet","person with helmet","person without helmet"]
    },
    "train": {
        "train_image_folder": "train_image_folder/",
        "train_annot_folder": "train_annot_folder/",
        "cache_name": "helmet_train.pkl",
        "train_times": 8,
        "batch_size": 8,
        "learning_rate": 1e-4,
        "nb_epochs": 100,
        "warmup_epochs": 3,
        "ignore_thresh": 0.5,
        "gpus": "0,1",
        "grid_scales": [1,1,1],
        "obj_scale": 5,
        "noobj_scale": 1,
        "xywh_scale": 1,
        "tensorboard_dir": "logs",
        "saved_weights_name": "full_yolo3_helmet_and_person.h5",
        "debug": true
    },
    "valid": {
        "valid_image_folder": "",
        "valid_annot_folder": "",
        "cache_name": "",
        "valid_times": 1
    }
}
The model section defines the type of the model to construct as well as other parameters of the model, such as the input image size and the list of anchors. The labels setting lists the labels to be trained on. Only images that contain at least one of the listed labels are fed to the network; the remaining images are simply ignored. In this way, a dog detector can easily be trained on the VOC or COCO dataset by setting labels to ['dog'].
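To make the anchors and labels settings concrete, the sketch below reads the model section, groups the flat anchor list into (width, height) pairs, and filters annotations by label. It is an illustration only: `anchor_pairs` and `filter_by_labels` are hypothetical helpers, the annotation dictionaries use an assumed layout, and dropping unlisted objects from images that are kept is an assumption rather than documented behaviour.

```python
import json

CONFIG_TEXT = """
{
  "model": {
    "min_input_size": 288,
    "max_input_size": 448,
    "anchors": [33,34, 52,218, 55,67, 92,306, 96,88, 118,158, 153,347, 209,182, 266,359],
    "labels": ["helmet", "person with helmet", "person without helmet"]
  }
}
"""


def anchor_pairs(flat):
    """Group a flat [w0, h0, w1, h1, ...] list into (width, height) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def filter_by_labels(annotations, labels):
    """Keep only images that contain at least one object with a listed label.

    Assumption: objects with unlisted labels are dropped from kept images.
    """
    wanted = set(labels)
    kept = []
    for ann in annotations:
        objects = [obj for obj in ann["objects"] if obj["name"] in wanted]
        if objects:
            kept.append({**ann, "objects": objects})
    return kept


config = json.loads(CONFIG_TEXT)
pairs = anchor_pairs(config["model"]["anchors"])
```

With the values above, `pairs` holds 9 anchors, which matches the usual YOLO v3 layout of 3 anchors for each of its 3 detection scales.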
Download pretrained weights for backend at: backend.h5
These weights must be put in the root folder of the repository. They are the pretrained weights for the backend only and will be loaded during model creation. The code does not work without these weights.
python gen_anchors.py -c config.json
Copy the generated anchors printed on the terminal to the anchors setting in config.json.
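How gen_anchors.py computes the anchors is not described here. A common approach for YOLO-style detectors is k-means clustering over the (width, height) of the annotated boxes, using IoU instead of Euclidean distance to assign boxes to clusters. The following is a minimal pure-Python sketch of that general technique, not the script's actual implementation; all function names are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    return inter / (w * h + cw * ch - inter)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k distinct boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


def format_anchors(anchors):
    """Flatten anchors into the [w0, h0, w1, h1, ...] list used in config.json."""
    return [int(round(value)) for pair in anchors for value in pair]
```

For YOLO v3, `k` would be 9, and the box sizes would be taken from the annotations after scaling them to the network input size.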
python train.py -c config.json
By the end of this process, the code will write the weights of the best model to the file best_weights.h5 (or whatever name is specified in the "saved_weights_name" setting in config.json). Training stops when the loss on the validation set has not improved for 3 consecutive epochs.
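The stopping rule and the "best model" checkpoint can be illustrated without any deep learning framework. The sketch below replays a list of per-epoch validation losses and reports which epoch would have been saved and when training would stop. The function name `simulate_early_stopping` is hypothetical, and the real training script presumably relies on its framework's callbacks rather than a loop like this.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (best_epoch, last_epoch) for a sequence of validation losses.

    `best_epoch` is the epoch whose weights would be kept; `last_epoch` is
    the final epoch that runs before training stops. Epochs count from 0.
    Both values are None if `val_losses` is empty.
    """
    best_loss = float("inf")
    best_epoch = None
    last_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            # Improvement: this is where the best weights would be written.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, last_epoch
```

For example, with losses `[1.0, 0.8, 0.9, 0.85, 0.81, 0.7]` the weights from epoch 1 are kept and training stops after epoch 4, so the final value 0.7 is never reached.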
To run the code with the GUI:
python predict_gui.py -c config.json -n <number of cameras>
Note that the GUI supports at most 2 cameras.
To run the code without the GUI:
python predict.py -c config.json -n <number of cameras>
Here you can enter any number of cameras you want to use.
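The two commands share the `-c` and `-n` options and differ only in the camera limit. A minimal sketch of how such arguments could be parsed and validated with `argparse` is shown below. The long option names `--conf` and `--num_cam`, the default of one camera, and the helper `parse_args` are assumptions made for illustration; the actual scripts may handle their arguments differently.

```python
import argparse

MAX_GUI_CAMERAS = 2  # the GUI limit stated above


def parse_args(argv, gui):
    """Parse -c/-n and enforce the camera limit when running with the GUI."""
    parser = argparse.ArgumentParser(description="Helmet detection on camera feeds")
    parser.add_argument("-c", "--conf", required=True,
                        help="path to the configuration file")
    parser.add_argument("-n", "--num_cam", type=int, default=1,
                        help="number of cameras to read from")
    args = parser.parse_args(argv)
    if args.num_cam < 1:
        parser.error("the number of cameras must be at least 1")
    if gui and args.num_cam > MAX_GUI_CAMERAS:
        parser.error(f"the GUI supports at most {MAX_GUI_CAMERAS} cameras")
    return args
```

`parser.error` prints a usage message and exits with status 2, so an invalid camera count fails before any model weights are loaded.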