YOLOv3 in PyTorch > ONNX > CoreML > TFLite
This release is a minor update implementing numerous bug fixes, feature additions and performance improvements from https://github.com/ultralytics/yolov5 to this repo. Models remain unchanged from v9.0 release.
The ultralytics/yolov3 repository is now divided into two branches:
$ git clone https://github.com/ultralytics/yolov3 # master branch (default)
$ git clone -b archive https://github.com/ultralytics/yolov3 # archive branch
** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.
Model | APval | APtest | AP50 | SpeedGPU | FPSGPU | params | FLOPS | |
---|---|---|---|---|---|---|---|---|
YOLOv3 | 43.3 | 43.3 | 63.0 | 4.8ms | 208 | 61.9M | 156.4B | |
YOLOv3-SPP | 44.3 | 44.3 | 64.6 | 4.9ms | 204 | 63.0M | 157.0B | |
YOLOv3-tiny | 17.6 | 34.9 | 34.9 | 1.7ms | 588 | 8.9M | 13.3B |
** APtest denotes COCO test-dev2017 server results, all other AP results denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or TTA. Reproduce mAP by python test.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
** SpeedGPU averaged over 5000 COCO val2017 images using a GCP n1-standard-16 V100 instance, and includes image preprocessing, FP16 inference, postprocessing and NMS. NMS is 1-2ms/img. Reproduce speed by python test.py --data coco.yaml --img 640 --conf 0.25 --iou 0.45
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
** Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce TTA by python test.py --data coco.yaml --img 832 --iou 0.65 --augment
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7
. To install run:
$ pip install -r requirements.txt
This release is a major update to the https://github.com/ultralytics/yolov3 repository that brings forward-compatibility with YOLOv5, and incorporates numerous bug fixes, feature additions and performance improvements from https://github.com/ultralytics/yolov5 to this repo.
The ultralytics/yolov3 repository is now divided into two branches:
$ git clone https://github.com/ultralytics/yolov3 # master branch (default)
$ git clone -b archive https://github.com/ultralytics/yolov3 # archive branch
** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.
Model | APval | APtest | AP50 | SpeedGPU | FPSGPU | params | FLOPS | |
---|---|---|---|---|---|---|---|---|
YOLOv3 | 43.3 | 43.3 | 63.0 | 4.8ms | 208 | 61.9M | 156.4B | |
YOLOv3-SPP | 44.3 | 44.3 | 64.6 | 4.9ms | 204 | 63.0M | 157.0B | |
YOLOv3-tiny | 17.6 | 34.9 | 34.9 | 1.7ms | 588 | 8.9M | 13.3B |
** APtest denotes COCO test-dev2017 server results, all other AP results denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or TTA. Reproduce mAP by python test.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
** SpeedGPU averaged over 5000 COCO val2017 images using a GCP n1-standard-16 V100 instance, and includes image preprocessing, FP16 inference, postprocessing and NMS. NMS is 1-2ms/img. Reproduce speed by python test.py --data coco.yaml --img 640 --conf 0.25 --iou 0.45
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
** Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce TTA by python test.py --data coco.yaml --img 832 --iou 0.65 --augment
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7
. To install run:
$ pip install -r requirements.txt
This is the final release of the darknet-compatible version of the https://github.com/ultralytics/yolov3 repository. This release is backwards-compatible with darknet *.cfg files for model configuration.
All pytorch (*.pt
) and darknet (*.weights
) models/backbones available are attached to this release in the Assets section below.
There are no breaking changes in this release.
https://cloud.google.com/deep-learning-vm/
Machine type: preemptible n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.14/hr), T4 ($0.11/hr), V100 ($0.74/hr) CUDA with Nvidia Apex FP16/32
HDD: 300 GB SSD
Dataset: COCO train 2014 (117,263 images)
Model: yolov3-spp.cfg
Command: python3 train.py --data coco2017.data --img 416 --batch 32
GPU | n | --batch-size |
img/s | epoch time |
epoch cost |
---|---|---|---|---|---|
K80 | 1 | 32 x 2 | 11 | 175 min | $0.41 |
T4 | 1 2 |
32 x 2 64 x 1 |
41 61 |
48 min 32 min |
$0.09 $0.11 |
V100 | 1 2 |
32 x 2 64 x 1 |
122 178 |
16 min 11 min |
$0.21 $0.28 |
2080Ti | 1 2 |
32 x 2 64 x 1 |
81 140 |
24 min 14 min |
- - |
Size | COCO mAP @0.5...0.95 |
COCO mAP @0.5 |
|
---|---|---|---|
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
320 | 14.0 28.7 30.5 37.7 |
29.1 51.8 52.3 56.8 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
416 | 16.0 31.2 33.9 41.2 |
33.0 55.4 56.9 60.6 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
512 | 16.6 32.7 35.6 42.6 |
34.9 57.7 59.5 62.4 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
608 | 16.6 33.1 37.0 43.1 |
35.4 58.2 60.7 62.8 |
This release requires PyTorch >= v1.4 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases
There are no breaking changes in this release.
--augment
argument in test.py and detect.py.--rect
argument in train.pyhttps://cloud.google.com/deep-learning-vm/
Machine type: preemptible n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.14/hr), T4 ($0.11/hr), V100 ($0.74/hr) CUDA with Nvidia Apex FP16/32
HDD: 300 GB SSD
Dataset: COCO train 2014 (117,263 images)
Model: yolov3-spp.cfg
Command: python3 train.py --data coco2017.data --img 416 --batch 32
GPU | n | --batch-size |
img/s | epoch time |
epoch cost |
---|---|---|---|---|---|
K80 | 1 | 32 x 2 | 11 | 175 min | $0.41 |
T4 | 1 2 |
32 x 2 64 x 1 |
41 61 |
48 min 32 min |
$0.09 $0.11 |
V100 | 1 2 |
32 x 2 64 x 1 |
122 178 |
16 min 11 min |
$0.21 $0.28 |
2080Ti | 1 2 |
32 x 2 64 x 1 |
81 140 |
24 min 14 min |
- - |
Size | COCO mAP @0.5...0.95 |
COCO mAP @0.5 |
|
---|---|---|---|
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
320 | 14.0 28.7 30.5 37.7 |
29.1 51.8 52.3 56.8 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
416 | 16.0 31.2 33.9 41.2 |
33.0 55.4 56.9 60.6 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
512 | 16.6 32.7 35.6 42.6 |
34.9 57.7 59.5 62.4 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
608 | 16.6 33.1 37.0 43.1 |
35.4 58.2 60.7 62.8 |
This release requires PyTorch >= v1.0.0 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases
There are no breaking changes in this release.
https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr)
HDD: 100 GB SSD
Dataset: COCO train 2014
GPUs | batch_size |
batch time | epoch time | epoch cost |
---|---|---|---|---|
(images) | (s/batch) | |||
1 K80 | 16 | 1.43s | 175min | $0.58 |
1 P4 | 8 | 0.51s | 125min | $0.58 |
1 T4 | 16 | 0.78s | 94min | $0.55 |
1 P100 | 16 | 0.39s | 48min | $0.39 |
2 P100 | 32 | 0.48s | 29min | $0.47 |
4 P100 | 64 | 0.65s | 20min | $0.65 |
1 V100 | 16 | 0.25s | 31min | $0.41 |
2 V100 | 32 | 0.29s | 18min | $0.48 |
4 V100 | 64 | 0.41s | 13min | $0.70 |
8 V100 | 128 | 0.49s | 7min | $0.80 |
This release requires PyTorch >= v1.0.0 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases
There are no breaking changes in this release.
https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr)
HDD: 100 GB SSD
Dataset: COCO train 2014
GPUs | batch_size |
batch time | epoch time | epoch cost |
---|---|---|---|---|
(images) | (s/batch) | |||
1 K80 | 16 | 1.43s | 175min | $0.58 |
1 P4 | 8 | 0.51s | 125min | $0.58 |
1 T4 | 16 | 0.78s | 94min | $0.55 |
1 P100 | 16 | 0.39s | 48min | $0.39 |
2 P100 | 32 | 0.48s | 29min | $0.47 |
4 P100 | 64 | 0.65s | 20min | $0.65 |
1 V100 | 16 | 0.25s | 31min | $0.41 |
2 V100 | 32 | 0.29s | 18min | $0.48 |
4 V100 | 64 | 0.41s | 13min | $0.70 |
8 V100 | 128 | 0.49s | 7min | $0.80 |
This release requires PyTorch >= v1.0.0 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases
There are no breaking changes in this release.
test.py
now natively outputs the same results as pycocotools to within 1% under most circumstances https://github.com/ultralytics/yolov3/issues/2
ultralytics/yolov3 with pycocotools |
darknet/yolov3 | |
---|---|---|
YOLOv3-320 | 51.8 | 51.5 |
YOLOv3-416 | 55.4 | 55.3 |
YOLOv3-608 | 58.2 | 57.9 |
sudo rm -rf yolov3 && git clone https://github.com/ultralytics/yolov3
# bash yolov3/data/get_coco_dataset.sh
sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools yolov3
cd yolov3
python3 test.py --save-json --conf-thres 0.001 --img-size 416
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
Image Total P R mAP
Calculating mAP: 100%|█████████████████████████████████| 157/157 [08:34<00:00, 2.53s/it]
5000 5000 0.0896 0.756 0.555
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.312
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.554
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.317
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.145
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.452
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.268
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.411
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.435
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.244
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.477
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
python3 test.py --save-json --conf-thres 0.001 --img-size 608 --batch-size 16
Namespace(batch_size=16, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
Image Total P R mAP
Calculating mAP: 100%|█████████████████████████████████| 313/313 [08:54<00:00, 1.55s/it]
5000 5000 0.0966 0.786 0.579
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.331
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.582
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.344
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.198
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.362
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.427
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.281
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.437
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.463
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.309
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577
--conf-thres 0.001
captures many boxes that all must be passed through NMS. On a V100 test.py runs in about 8 minuteshttps://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr)
HDD: 100 GB SSD
Dataset: COCO train 2014
GPUs | batch_size |
batch time | epoch time | epoch cost |
---|---|---|---|---|
(images) | (s/batch) | |||
1 K80 | 16 | 1.43s | 175min | $0.58 |
1 P4 | 8 | 0.51s | 125min | $0.58 |
1 T4 | 16 | 0.78s | 94min | $0.55 |
1 P100 | 16 | 0.39s | 48min | $0.39 |
2 P100 | 32 | 0.48s | 29min | $0.47 |
4 P100 | 64 | 0.65s | 20min | $0.65 |
1 V100 | 16 | 0.25s | 31min | $0.41 |
2 V100 | 32 | 0.29s | 18min | $0.48 |
4 V100 | 64 | 0.41s | 13min | $0.70 |
8 V100 | 128 | 0.49s | 7min | $0.80 |
This release requires PyTorch >= v1.0.0 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases
There are no breaking changes in this release.
test.py
optionally outputs pycocotools compatible json files now with the --save-json
flag, and computes official COCO mAP using pycocotools. Output is verified against official darknet results from https://arxiv.org/abs/1804.02767. https://github.com/ultralytics/yolov3/issues/2#issuecomment-434751531.ultralytics/yolov3 mAP | darknet mAP | |
---|---|---|
YOLOv3-320 | 51.3 | 51.5 |
YOLOv3-416 | 54.9 | 55.3 |
YOLOv3-608 | 57.9 | 57.9 |
https://cloud.google.com/deep-learning-vm/
Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
CPU platform: Intel Skylake
GPUs: 1-4 x NVIDIA Tesla P100
HDD: 100 GB SSD
GPUs | batch_size |
speed | COCO epoch |
---|---|---|---|
(P100) | (images) | (s/batch) | (min/epoch) |
1 | 16 | 0.54s | 66min |
2 | 32 | 0.99s | 61min |
4 | 64 | 1.61s | 49min |