YOLOv6 Versions

YOLOv6: a single-stage object detection framework dedicated to industrial applications.

0.4.1

7 months ago

Features

Release YOLOv6-Segmentation models at all scales.
Achieve state-of-the-art accuracy in real-time instance segmentation.

Performance of YOLOv6-seg models

Model	Size	Box mAP 0.5:0.95	Mask mAP 0.5:0.95	Speed T4 TRT FP16 b1 (FPS)	Params (M)	FLOPs (G)
YOLOv6-N	640	35.3	31.2	645	4.9	7.0
YOLOv6-S	640	44.0	38.0	292	19.6	27.7
YOLOv6-M	640	48.2	41.3	148	37.1	54.3
YOLOv6-L	640	51.1	43.7	93	63.6	95.5
YOLOv6-X	640	52.2	44.8	47	119.1	175.5

Table Notes

All checkpoints are trained from scratch on COCO for 300 epochs without distillation.
mAP and speed are evaluated on the COCO val2017 dataset at an input resolution of 640×640.
Speed is tested with TensorRT 8.5 on T4 without post-processing.
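The FPS column can be read as per-image latency. The following is a minimal sketch, assuming that batch-1 throughput is simply the inverse of per-image latency (no pipelining or overlap); the dictionary and function names are illustrative and not part of YOLOv6.

```python
# Batch-1 FPS figures copied from the YOLOv6-seg table above.
SEG_FPS_B1 = {
    "YOLOv6-N": 645,
    "YOLOv6-S": 292,
    "YOLOv6-M": 148,
    "YOLOv6-L": 93,
    "YOLOv6-X": 47,
}


def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to per-frame latency in ms.

    Assumption: at batch size 1, latency is the inverse of throughput.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for name, fps in SEG_FPS_B1.items():
        print(f"{name}: {fps_to_latency_ms(fps):.2f} ms per image")
```

Because the table excludes post-processing, end-to-end latency in a real pipeline would likely be higher than these figures.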

0.4.0

1 year ago

v4.0 release

Features

Release YOLOv6Lite models for mobile or CPU.
Update MBLABlock in the network structure.
Update YOLOv6Lite-face models for mobile or CPU.
Refactor the code and normalize convolution operators.

Performance of YOLOv6Lite models

Model	Size	mAP val 0.5:0.95	sm8350 (ms)	mt6853 (ms)	sdm660 (ms)	Params (M)	FLOPs (G)
YOLOv6Lite-S	320*320	22.4	7.99	11.99	41.86	0.55	0.56
YOLOv6Lite-M	320*320	25.1	9.08	13.27	47.95	0.79	0.67
YOLOv6Lite-L	320*320	28.0	11.37	16.20	61.40	1.09	0.87
YOLOv6Lite-L	320*192	25.0	7.02	9.66	36.13	1.09	0.52
YOLOv6Lite-L	224*128	18.9	3.63	4.99	17.76	1.09	0.24

Table Notes

We built a series of mobile models that vary in model size and input aspect ratio to support flexible deployment in different scenarios.
All checkpoints are trained for 400 epochs without distillation.
mAP and speed are evaluated on the COCO val2017 dataset, at the input resolution given in the Size column.
Speed is tested on MNN 2.3.0 AArch64 with 2 threads and arm82 acceleration. Each measurement uses 10 warm-up runs followed by 100 timed runs.
Qualcomm 888 (sm8350), Dimensity 720 (mt6853) and Qualcomm 660 (sdm660) represent high-end, mid-range and low-end chips respectively, and can serve as a reference for model performance across chip tiers.
Refer to Test NCNN Speed tutorial to reproduce the NCNN speed results of YOLOv6Lite.
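The warm-up and cycle counts in the notes above describe a common timing pattern. The following is a minimal sketch of that pattern in plain Python, not the MNN or NCNN benchmark tool itself; `benchmark` and `dummy_inference` are hypothetical names, and the dummy workload merely stands in for a real forward pass.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 10, cycles: int = 100) -> float:
    """Return the mean latency in ms over `cycles` timed calls.

    The first `warmup` calls are executed but not timed, so that caches,
    lazy initialization, and clock scaling do not skew the result.
    """
    for _ in range(warmup):
        run_once()

    timings_ms = []
    for _ in range(cycles):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings_ms)


def dummy_inference() -> int:
    # Hypothetical stand-in for a model forward pass.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"mean latency: {benchmark(dummy_inference):.3f} ms")
```

The defaults of 10 and 100 mirror the counts stated in the table notes; thread count and arm82 acceleration are settings of the inference runtime and are not modeled here.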

Performance of YOLOv6_MBLA models

Model	Size	mAP val 0.5:0.95	Speed T4 TRT FP16 b1 (FPS)	Speed T4 TRT FP16 b32 (FPS)	Params (M)	FLOPs (G)
YOLOv6-S-mbla	640	47.0^distill	300	424	11.6	29.8
YOLOv6-M-mbla	640	50.3^distill	168	216	26.1	66.7
YOLOv6-L-mbla	640	52.0^distill	129	154	46.3	118.2
YOLOv6-X-mbla	640	53.5^distill	78	94	78.8	199.0

Table Notes

Speed is tested with TensorRT 8.4.2.4 on T4.
The processes of model training, evaluation, and inference are the same as the original ones. For details, please refer to this README.

0.3.1

1 year ago

Features

Face detection and landmarks localization
Repulsion loss
Same-channel decoupled head (Dehead)

Performance on WIDER FACE

Model	Size	Easy	Medium	Hard	Speed T4 TRT FP16 b1 (FPS)	Speed T4 TRT FP16 b32 (FPS)	Params (M)	FLOPs (G)
YOLOv6-N	640	95.0	92.4	80.4	797	1313	4.63	11.35
YOLOv6-S	640	96.2	94.7	85.1	339	484	12.41	32.45
YOLOv6-M	640	97.0	95.3	86.3	188	240	24.85	70.59
YOLOv6-L	640	97.2	95.9	87.5	102	121	56.77	159.24

All checkpoints are fine-tuned from the COCO-pretrained model for 300 epochs without distillation.
mAP and speed are evaluated on the WIDER FACE dataset at an input resolution of 640×640.
Speed is tested with TensorRT 8.2 on T4.
Refer to Test speed tutorial to reproduce the speed results of YOLOv6.
Params and FLOPs of YOLOv6 are estimated on deployed models.

0.3.0

1 year ago

v3.0 release

Features

Release P6 models and update P5 models

Renew the neck of the detector with a BiC module and a SimCSPSPPF block.
Propose an anchor-aided training (AAT) strategy.
Introduce a new self-distillation strategy for the small YOLOv6 models.
Expand YOLOv6 and reach new SOTA performance on the COCO dataset.

Performance

Model	Size	mAP val 0.5:0.95	Speed T4 TRT FP16 b1 (FPS)	Speed T4 TRT FP16 b32 (FPS)	Params (M)	FLOPs (G)
YOLOv6-N	640	37.5	779	1187	4.7	11.4
YOLOv6-S	640	45.0	339	484	18.5	45.3
YOLOv6-M	640	50.0	175	226	34.9	85.8
YOLOv6-L	640	52.8	98	116	59.6	150.7

YOLOv6-N6	1280	44.9	228	281	10.4	49.8
YOLOv6-S6	1280	50.3	98	108	41.4	198.0
YOLOv6-M6	1280	55.2	47	55	79.6	379.5
YOLOv6-L6	1280	57.2	26	29	140.4	673.4
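The "0.5:0.95" in the mAP column follows the COCO convention of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the thresholds and the IoU test for a single box pair; it is not an mAP implementation, and the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1

# The ten IoU thresholds behind "mAP 0.5:0.95": 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matched_thresholds(pred: Box, gt: Box) -> List[float]:
    """Thresholds at which `pred` would count as a match for `gt`."""
    overlap = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    pred, gt = (0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 8.2)
    print(f"IoU = {iou(pred, gt):.2f}, matches at {matched_thresholds(pred, gt)}")
```

A prediction with a looser fit passes only the lower thresholds, which is why mAP 0.5:0.95 rewards tight localization more than AP at 0.5 alone.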

Performance of base models

Model	Size	mAP val 0.5:0.95	Speed T4 TRT FP16 b1 (FPS)	Speed T4 TRT FP16 b32 (FPS)	Speed T4 TRT INT8 b1 (FPS)	Speed T4 TRT INT8 b32 (FPS)	Params (M)	FLOPs (G)
YOLOv6-N-base	640	36.6	727	1302	814	1805	4.65	11.46
YOLOv6-S-base	640	45.3	346	525	487	908	13.14	30.6
YOLOv6-M-base	640	49.4	179	245	284	439	28.33	72.30
YOLOv6-L-base	640	51.1	116	157	196	288	59.61	150.89

0.2.1

1 year ago

v2.1 release

Features

Release base models

Use only regular convolution and ReLU activation functions.
Apply CSP (1/2 channel dim) blocks in the network structure, except for Nano base model.

Advantage:

Adopt a unified network structure and configuration, and the accuracy loss of the PTQ 8-bit quantization model is negligible, about 0.4%.
Suitable for users who are just getting started or who need to apply, optimize and deploy an 8-bit quantization model quickly and frequently.
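To make the 8-bit quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an assumption-laden illustration of the general technique, not the PTQ procedure that YOLOv6 or TensorRT actually applies; the function names and sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127].

    The scale maps the largest absolute value onto 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical weights
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"codes={codes}, scale={scale:.4f}, worst-case error={worst:.6f}")
```

The round-trip error per value is bounded by half the scale, which gives some intuition for why a network built from quantization-friendly operators can lose little accuracy at 8 bits.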

Shortcoming:

The accuracy on COCO is slightly lower than the v2.0 released models.

Performance

Model	Size	mAP val 0.5:0.95	Speed T4 TRT FP16 b1 (FPS)	Speed T4 TRT FP16 b32 (FPS)	Params (M)	FLOPs (G)
YOLOv6-N-base	640	35.6^400e	832	1249	4.3	11.1
YOLOv6-S-base	640	43.8^400e	373	531	11.5	27.6
YOLOv6-M-base	640	48.8^distill	179	246	27.7	68.4
YOLOv6-L-base	640	51.0^distill	115	153	58.5	144.0

0.2.0

1 year ago

v2.0 release

YOLOv6 offers a series of models for various industrial scenarios, including nano/tiny/s/m/l, whose architectures vary with model size for a better accuracy-speed trade-off. Several bag-of-freebies methods, such as self-distillation and more training epochs, further improve performance. For industrial deployment, we adopt QAT with channel-wise distillation and graph optimization to pursue extreme performance.
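As a rough illustration of the channel-wise distillation mentioned above, the sketch below follows the general idea of the technique: each channel's spatial activations are turned into a distribution with a temperature-scaled softmax, and the student is penalized by the KL divergence from the teacher's distribution. This is an assumption about the general method, not YOLOv6's training code; the function names, the flat per-channel layout, and the temperature default are illustrative.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax over one channel's spatial activations."""
    scaled = [x / temperature for x in xs]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def channel_wise_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Mean over channels of KL(teacher || student), scaled by temperature^2.

    `student` and `teacher` are lists of channels; each channel is a flat
    list of activations over spatial positions.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("student and teacher need the same, non-zero channel count")
    total = 0.0
    for s_channel, t_channel in zip(student, teacher):
        p_teacher = softmax(t_channel, temperature)
        p_student = softmax(s_channel, temperature)
        total += sum(
            pt * math.log(pt / ps)
            for pt, ps in zip(p_teacher, p_student)
            if pt > 0.0
        )
    return temperature ** 2 * total / len(student)


if __name__ == "__main__":
    teacher_map = [[2.0, 0.5, 0.1, 0.0], [0.0, 1.0, 3.0, 0.2]]
    student_map = [[1.5, 0.7, 0.2, 0.1], [0.1, 0.8, 2.5, 0.4]]
    print(f"loss = {channel_wise_distillation_loss(student_map, teacher_map):.6f}")
```

The loss is zero when student and teacher produce identical per-channel distributions and grows as they diverge; extreme activation gaps could underflow in this simplified version.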

New Features

Release M/L models and update N/T/S models with enhanced performance.⭐️ Benchmark
2x faster training time.
Fix the performance degradation when evaluating on 640x640 inputs.
Customized quantization methods. 🚀 Quantization Tutorial

0.1.0

1 year ago

v1.0 release

Features

YOLOv6 is a single-stage object detection framework dedicated to industrial application, with hardware-friendly efficient design and high performance, outperforming YOLOv5, YOLOX and PP-YOLOE.

YOLOv6-nano achieves 35.0 mAP on COCO val2017 dataset with 1242 FPS on T4 using TensorRT FP16 for bs32 inference, and YOLOv6-s achieves 43.1 mAP on COCO val2017 dataset with 520 FPS on T4 using TensorRT FP16 for bs32 inference.

Hardware-friendly Design for Backbone and Neck
Efficient Decoupled Head with SIoU Loss