YOLO (all versions) implementation in Keras and TensorFlow 2.x
pip install git+https://github.com/unsignedrant/yolo-tf2
Verify installation
% yolotf2
Yolo-tf2 1.0
Usage:
yolotf2 <command> [options] [args]
Available commands:
train Create new or use existing dataset and train a model
detect Detect a folder of images or a video
Use yolotf2 <command> -h to see more info about a command
Use yolotf2 -h to display all command line options
yolo-tf2 was initially an implementation of YOLOv3 (You Only Look Once) covering training and inference; support for all YOLO versions was added in db2f889. YOLO is a state-of-the-art, real-time object detection system that is fast and accurate. The official repo is here. Many implementations support TensorFlow, but only a few support TensorFlow 2.x, and since I did not find one that suited my needs, I decided to create this version, which is flexible and customizable. It requires Python 3.10+, is not platform specific, and is MIT licensed.
flags | help | required | default |
---|---|---|---|
--anchors | Path to anchors .txt file | True | - |
--batch-size | Training/detection batch size | - | 8 |
--classes | Path to classes .txt file | True | - |
--input-shape | Input shape ex: (m, m, c) | - | (416, 416, 3) |
--iou-threshold | iou (intersection over union) threshold | - | 0.5 |
--masks | Path to masks .txt file | True | - |
--max-boxes | Maximum boxes per image | - | 100 |
--model-cfg | Yolo DarkNet configuration .cfg file | True | - |
--quiet | If specified, verbosity is set to False | - | - |
--score-threshold | Confidence score threshold | - | 0.5 |
--v4 | If yolov4 configuration is used, this should be specified | - | - |
flags | help | default |
---|---|---|
--dataset-name | Checkpoint/dataset prefix | - |
--delete-images | If specified, dataset images will be deleted upon being saved to tfrecord. | - |
--epochs | Number of training epochs | 100 |
--es-patience | Early stopping patience | - |
--image-dir | Path to folder containing images referenced by .xml labels | - |
--labeled-examples | Path to labels .csv file | - |
--learning-rate | Training learning rate | 0.001 |
--output-dir | Path to folder where training dataset / checkpoints / other data will be saved | . |
--shuffle-buffer-size | Dataset shuffle buffer | 512 |
--train-shards | Total number of .tfrecord files to split training dataset into | 1 |
--train-tfrecord | Path to training .tfrecord file | - |
--valid-frac | Validation dataset fraction | 0.1 |
--valid-shards | Total number of .tfrecord files to split validation dataset into | 1 |
--valid-tfrecord | Path to validation .tfrecord file | - |
--weights | Path to trained weights .tf or .weights file | - |
--xml-dir | Path to folder containing .xml labels in VOC format | - |
flags | help | required | default |
---|---|---|---|
--codec | Codec to use for predicting videos | - | mp4v |
--display-vid | Display video during detection | - | - |
--evaluation-examples | Path to .csv file with ground truth for evaluation of the trained model and mAP score calculation. | - | - |
--image-dir | A directory that contains images to predict | - | - |
--images | Paths of images to detect | - | - |
--output-dir | Path to directory for saving results | - | - |
--video | Path to video to predict | - | - |
--weights | Path to trained weights .tf or .weights file | True | - |
This feature was introduced to replace the old hard-coded model. Models are loaded directly from DarkNet .cfg files for convenience.
As of db2f889 DarkNet .cfg files are automatically converted to keras models.
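To give a feel for what loading a model from a DarkNet .cfg file involves, here is a minimal sketch of reading the file into an ordered list of blocks. It is not the parser used by yolo-tf2; `parse_darknet_cfg` and the sample configuration are hypothetical and only illustrate the file layout (bracketed section names followed by `key=value` lines).

```python
def parse_darknet_cfg(text):
    """Parse DarkNet .cfg text into an ordered list of (section_name, options)."""
    sections = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            # A new block such as [net], [convolutional] or [yolo].
            sections.append((line[1:-1], {}))
        elif '=' in line and sections:
            key, value = (part.strip() for part in line.split('=', 1))
            sections[-1][1][key] = value
    return sections


# Hypothetical, heavily shortened configuration used only for illustration.
example_cfg = """
[net]
width=416
height=416
channels=3

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[yolo]
mask = 6,7,8
classes=80
"""

sections = parse_darknet_cfg(example_cfg)
for name, options in sections:
    print(name, options)
```

A list of pairs is used instead of a dictionary (or `configparser`) because section names such as `[convolutional]` repeat many times in a real .cfg file and their order defines the network.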
The current code leverages features introduced in TensorFlow 2.x, including Keras models and tfrecord datasets.
Both options are available. Note that when using DarkNet weights, you must keep the same number of classes as COCO (80 classes), because transfer learning to models with a different number of classes is not currently supported.
There are 3 input options accepted by the api:
A .csv file similar to the one below is supported. Note that x0, y0, x1, y1 are x and y coordinates relative to their corresponding image width and height. For example:
image width = 1000
image height = 500
x0, y0 = 100, 300
x1, y1 = 120, 320
The relative coordinates are then x0, y0, x1, y1 = 0.1, 0.6, 0.12, 0.64 respectively.
image | object_name | object_index | x0 | y0 | x1 | y1 |
---|---|---|---|---|---|---|
/path/to/368.jpg | Car | 0 | 0.478423 | 0.57672 | 0.558036 | 0.699735 |
/path/to/368.jpg | Car | 0 | 0.540923 | 0.583333 | 0.574405 | 0.626984 |
/path/to/368.jpg | Car | 0 | 0.389881 | 0.574074 | 0.470982 | 0.683862 |
/path/to/368.jpg | Car | 0 | 0.447173 | 0.555556 | 0.497024 | 0.638889 |
/path/to/368.jpg | Street Sign | 1 | 0.946429 | 0.40873 | 0.991815 | 0.510582 |
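The conversion between pixel coordinates and the relative coordinates used in the .csv file can be sketched as follows. The helper names `to_relative` and `to_pixels` are hypothetical and are not part of the yolo_tf2 api; they only restate the arithmetic from the example above.

```python
def to_relative(x0, y0, x1, y1, image_width, image_height):
    """Convert pixel coordinates to coordinates relative to the image size."""
    return (
        x0 / image_width,
        y0 / image_height,
        x1 / image_width,
        y1 / image_height,
    )


def to_pixels(x0, y0, x1, y1, image_width, image_height):
    """Convert relative coordinates back to whole pixel coordinates."""
    return (
        round(x0 * image_width),
        round(y0 * image_height),
        round(x1 * image_width),
        round(y1 * image_height),
    )


# Values from the example: a 1000 x 500 image.
relative = to_relative(100, 300, 120, 320, image_width=1000, image_height=500)
print(relative)
print(to_pixels(*relative, image_width=1000, image_height=500))
```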
<annotation>
<folder>VOC2012</folder>
<filename>2007_000033.jpg</filename>
<source>
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
<image>flickr</image>
</source>
<size>
<width>500</width>
<height>366</height>
<depth>3</depth>
</size>
<segmented>1</segmented>
<object>
<name>aeroplane</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>9</xmin>
<ymin>107</ymin>
<xmax>499</xmax>
<ymax>263</ymax>
</bndbox>
</object>
<object>
<name>aeroplane</name>
<pose>Left</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>421</xmin>
<ymin>200</ymin>
<xmax>482</xmax>
<ymax>226</ymax>
</bndbox>
</object>
<object>
<name>aeroplane</name>
<pose>Left</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>325</xmin>
<ymin>188</ymin>
<xmax>411</xmax>
<ymax>223</ymax>
</bndbox>
</object>
</annotation>
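For illustration, a VOC-style .xml label like the one above can be turned into rows that follow the .csv layout (relative coordinates plus a class index). This is a sketch under assumptions, not the converter used by yolo-tf2: `voc_to_rows` is a hypothetical helper, and `object_index` is assumed to be the position of the class name in the classes list.

```python
import xml.etree.ElementTree as ET


def voc_to_rows(xml_text, class_names):
    """Return one dict per <object>, with box coordinates relative to image size."""
    root = ET.fromstring(xml_text)
    filename = root.findtext('filename')
    size = root.find('size')
    width = int(size.findtext('width'))
    height = int(size.findtext('height'))
    rows = []
    for obj in root.findall('object'):
        name = obj.findtext('name')
        box = obj.find('bndbox')
        rows.append({
            'image': filename,
            'object_name': name,
            # Assumption: the index is the position in the classes file.
            'object_index': class_names.index(name),
            'x0': int(box.findtext('xmin')) / width,
            'y0': int(box.findtext('ymin')) / height,
            'x1': int(box.findtext('xmax')) / width,
            'y1': int(box.findtext('ymax')) / height,
        })
    return rows


# Shortened version of the annotation shown above (first object only).
sample = """<annotation>
  <filename>2007_000033.jpg</filename>
  <size><width>500</width><height>366</height><depth>3</depth></size>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>9</xmin><ymin>107</ymin><xmax>499</xmax><ymax>263</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_rows(sample, class_names=['aeroplane'])
print(rows[0])
```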
.tfrecord files previously generated by the code can be reused. A typical feature map looks like:
{
    'image': tf.io.FixedLenFeature([], tf.string),
    'x0': tf.io.VarLenFeature(tf.float32),
    'y0': tf.io.VarLenFeature(tf.float32),
    'x1': tf.io.VarLenFeature(tf.float32),
    'y1': tf.io.VarLenFeature(tf.float32),
    'object_name': tf.io.VarLenFeature(tf.string),
    'object_index': tf.io.VarLenFeature(tf.int64),
}
A k-means algorithm finds the optimal sizes and generates anchors with process visualization.
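How yolo-tf2 implements this step internally is not shown here. As a rough sketch of the general technique, the code below clusters box (width, height) pairs with k-means, assuming the approach commonly used for YOLO anchors where similarity is measured by IoU rather than Euclidean distance. `iou_wh`, `kmeans_anchors` and the sample boxes are hypothetical, and no visualization is included.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    intersection = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - intersection
    return intersection / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_w = sum(b[0] for b in cluster) / len(cluster)
                mean_h = sum(b[1] for b in cluster) / len(cluster)
                new_centroids.append((mean_w, mean_h))
            else:
                # Keep the old centroid if no box was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical box sizes in pixels: three small and three large objects.
boxes = [(10, 12), (12, 10), (11, 11), (100, 120), (120, 100), (110, 110)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```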
Including:
You can visualize different stages of the program using my other repo, labelpix, a tool for drawing bounding boxes that can also display bounding boxes over images using csv files in the format mentioned here.
Evaluation is available through the detection api which supports mAP score calculation. A typical evaluation result looks like:
index | object_name | average_precision | actual | detections | true_positives | false_positives | combined |
---|---|---|---|---|---|---|---|
1 | Car | 0.825907 | 298 | 338 | 275 | 63 | 338 |
12 | Bus | 0.666667 | 3 | 2 | 2 | 0 | 2 |
6 | Palm Tree | 0.627774 | 122 | 93 | 82 | 11 | 93 |
7 | Trash Can | 0.555556 | 9 | 7 | 5 | 2 | 7 |
8 | Flag | 0.480867 | 14 | 8 | 7 | 1 | 8 |
2 | Traffic Lights | 0.296155 | 122 | 87 | 58 | 29 | 87 |
5 | Street Lamp | 0.289578 | 73 | 41 | 28 | 13 | 41 |
3 | Street Sign | 0.287331 | 93 | 52 | 35 | 17 | 52 |
9 | Fire Hydrant | 0.194444 | 6 | 3 | 2 | 1 | 3 |
4 | Pedestrian | 0.183942 | 130 | 56 | 35 | 21 | 56 |
0 | Delivery Truck | 0 | 1 | 0 | 0 | 0 | 0 |
10 | Road Block | 0 | 2 | 7 | 0 | 7 | 7 |
11 | Minivan | 0 | 3 | 0 | 0 | 0 | 0 |
13 | Bicycle | 0 | 4 | 1 | 0 | 1 | 1 |
14 | Pickup Truck | 0 | 2 | 0 | 0 | 0 | 0 |
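The counts in this table can be related to precision and recall with a few lines of arithmetic. The sketch below assumes that `actual` is the number of ground truth boxes and `detections` is the number of predicted boxes for a class; `precision_recall` is a hypothetical helper, not part of the yolo_tf2 api, and it does not reproduce the average precision calculation itself.

```python
def precision_recall(actual, detections, true_positives):
    """Precision = TP / detections, recall = TP / actual (0.0 when undefined)."""
    precision = true_positives / detections if detections else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Counts taken from the "Car" row of the table above.
precision, recall = precision_recall(actual=298, detections=338, true_positives=275)
print(f'Car: precision={precision:.3f}, recall={recall:.3f}')
```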
You can check my other repo, labelpix, a labeling tool that you can use to produce small datasets for experimentation. It supports .csv files in the format mentioned here and/or .xml files as here.
Detections can be performed on photos or videos using the detection api.
The following files are expected:
Object classes .txt file.
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
DarkNet model .cfg file
Anchors .txt file
10,13
16,30
33,23
30,61
62,45
59,119
116,90
156,198
373,326
Masks .txt file
6,7,8
3,4,5
0,1,2
Labeled examples, ONE of:
Training is available through yolo_tf2.train api. For more info about
other parameters, check the docstrings, available through help()
import yolo_tf2
yolo_tf2.train(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    model_cfg='/path/to/darknet/file.cfg',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    labeled_examples='/path/to/labeled_examples.csv',
    output_dir='/path/to/training-output-dir'
)
yolotf2 train --input-shape 608 608 3 --classes /path/to/classes.txt --model-cfg /path/to/darknet/file.cfg --anchors /path/to/anchors.txt --masks /path/to/masks.txt --labeled-examples /path/to/labeled_examples.csv --output-dir /path/to/training-output-dir
The following files are expected:
Object classes .txt file.
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
DarkNet model .cfg file
Anchors .txt file
10,13
16,30
33,23
30,61
62,45
59,119
116,90
156,198
373,326
Masks .txt file
6,7,8
3,4,5
0,1,2
Trained .tf or .weights file
The input to run detection on, any of:
Note: For yolov4 configuration, v4=True or --v4 should be specified.
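To show how the anchors and masks files listed above relate to each other, here is a small sketch that reads both and groups the anchors per mask line. It assumes the usual DarkNet convention, where each mask line lists the indices of the anchors used by one detection output layer; how yolo-tf2 reads these files internally is not shown here, and the helper names are hypothetical.

```python
def parse_int_rows(text):
    """Parse lines of comma-separated integers into a list of tuples."""
    return [
        tuple(int(value) for value in line.split(','))
        for line in text.splitlines()
        if line.strip()
    ]


def anchors_per_layer(anchors, masks):
    """Select, for each mask, the anchors whose indices it lists."""
    return [[anchors[index] for index in mask] for mask in masks]


# File contents copied from the anchors and masks examples above.
anchors_text = "10,13\n16,30\n33,23\n30,61\n62,45\n59,119\n116,90\n156,198\n373,326\n"
masks_text = "6,7,8\n3,4,5\n0,1,2\n"

layers = anchors_per_layer(parse_int_rows(anchors_text), parse_int_rows(masks_text))
for mask_line, layer_anchors in zip(masks_text.split(), layers):
    print(mask_line, '->', layer_anchors)
```

With these files, the first mask selects the three largest anchors and the last mask selects the three smallest.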
Detection is available through yolo_tf2.detect api. For more info about
other parameters, check the docstrings, available through help()
import yolo_tf2
yolo_tf2.detect(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/darknet/file.cfg',
    weights='/path/to/trained_weights.tf',
    images=['/path/to/image1', '/path/to/image2', ...],
    output_dir='detection-output'
)
yolotf2 detect --input-shape 608 608 3 --classes /path/to/classes.txt --model-cfg /path/to/darknet/file.cfg --anchors /path/to/anchors.txt --masks /path/to/masks.txt --weights /path/to/trained_weights.tf --images /path/to/image1 /path/to/image2 --output-dir /path/to/detection-output-dir
Notes:
To detect a video, video or --video needs to be passed instead.
For yolov4 configuration, v4=True or --v4 should be specified.
Evaluation is available through the very same detection api described in the previous section. The only difference is an additional parameter, evaluation_examples (or --evaluation-examples on the command line), which is a .csv file containing the actual labels of the images being detected. The names of the images passed are looked up in the actual labels, and if any of the filenames is not found, an error is raised. This means that if you do:
import yolo_tf2
yolo_tf2.detect(
    input_shape=(608, 608, 3),
    classes='/path/to/classes.txt',
    anchors='/path/to/anchors.txt',
    masks='/path/to/masks.txt',
    model_cfg='/path/to/darknet/file.cfg',
    weights='/path/to/trained_weights.tf',
    images=['/path/to/image1', '/path/to/image2', ...],
    output_dir='detection-output',
    evaluation_examples='/path/to/actual/examples'
)
then the evaluation_examples .csv file should look like:
image | object_name | object_index | x0 | y0 | x1 | y1 |
---|---|---|---|---|---|---|
/path/to/image1 | Car | 0 | 0.478423 | 0.57672 | 0.558036 | 0.699735 |
/path/to/image1 | Car | 0 | 0.540923 | 0.583333 | 0.574405 | 0.626984 |
/path/to/image1 | Car | 0 | 0.389881 | 0.574074 | 0.470982 | 0.683862 |
/path/to/image2 | Car | 0 | 0.447173 | 0.555556 | 0.497024 | 0.638889 |
/path/to/image2 | Street Sign | 1 | 0.946429 | 0.40873 | 0.991815 | 0.510582 |
Because images=['/path/to/image1', '/path/to/image2', ...] were passed, their actual labels must be provided. The same applies to the images contained in a directory if image_dir was passed instead.
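Since a missing label raises an error, it can be convenient to check the labels file before starting a long detection run. The sketch below is one way to do that and is not part of the yolo_tf2 api: `missing_labels` is a hypothetical helper, and it assumes the .csv file is comma-separated with a header row containing an `image` column, and that images are matched by their full path string.

```python
import csv
import io


def missing_labels(image_paths, labels_csv):
    """Return the image paths that have no row in the labels .csv file."""
    labeled = {row['image'] for row in csv.DictReader(labels_csv)}
    return [path for path in image_paths if path not in labeled]


# In-memory stand-in for an evaluation_examples .csv file.
labels = io.StringIO(
    'image,object_name,object_index,x0,y0,x1,y1\n'
    '/path/to/image1,Car,0,0.478423,0.57672,0.558036,0.699735\n'
    '/path/to/image2,Street Sign,1,0.946429,0.40873,0.991815,0.510582\n'
)

missing = missing_labels(['/path/to/image1', '/path/to/image3'], labels)
print(missing)
```

For a real file, pass an open file object, for example `open('/path/to/actual/examples', newline='')`, in place of the `io.StringIO` object.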
Contributions are what make the open source community such an amazing place to
learn, inspire, and create. Any contributions you make are greatly appreciated.
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
There are relevant cases in which the issues will be addressed and irrelevant ones that will be closed.
The following issues will be addressed.
The following issues will not be addressed and will be closed.
Distributed under the MIT License. See LICENSE for more information.
Give a ⭐️ if this project helped you!