Gluon CV Toolkit
GluonCV 0.10.0 release features a new Auto Module
designed to bootstrap training tasks with less code and effort:
simpler and better custom dataset loading experience with pandas DataFrame visualization. Comparing with obsolete code based dataset composition, it allows you to load arbitrary datasets faster and more reliable.
one liner fit
function with configuration file support(yaml configuration file)
built-in HPO support, for effortless tuning of hyper-parameters
This release includes a new module called gluoncv.auto
, with gluoncv.auto
you can access many high-level APIs such as data
, estimators
and tasks
.
auto.data
module is designed to load arbitrary web datasets you find on the internet, such as Kaggle competition datasets.
You may refer to this tutorial or check out the fully compatible d8 dataset for loading custom datasets.
The dataset has internal DataFrame
storage for easier access and analysis
similar for object detection:
In this release, we packed the following high-level estimators for training and predicting images for image classification and object detection.
fit
function:predict
, predict_proba
(for image classification), predict_feature
(for image classification)save
and load
.You may visit the tutorial website for more detailed examples.
In this release, the following auto tasks are supported and have been massively tested on many datasets to ensure HPO performance:
Comparing with pure algorithm-based estimators, the auto tasks provide identical APIs and functionalities but allow you to fit
with hyper-parameter optimization(HPO) with specified num_trials
and time_limit
. For object detection, it allows multiple algorithms(e.g., SSDEstimator and FasterRCNNEstimator) to be tuned as a categorical search space.
The tutorial is available here
GluonCV v0.9.0 starts to support PyTorch!
We want to make our toolkit agnostic to deep learning frameworks so that it is available for everyone. From this release, we start to support PyTorch. All PyTorch code and models are under torch
folder inside gluoncv
, arranged in the same hierarchy as before: model
, data
, nn
and utils
. model
folder contains our model zoo with model definitions, data
folder contains dataset definition and dataloader, nn
defines new operators and utils
provide utility functions to help model training, evaluation and visualization.
To get started, you can find installation instructions, model zoo and tutorials on our website. In order to make our toolkit easier to use and customize, we provide model definitions separately for each method without extreme abstraction and modularization. In this manner, you can play with each model without jumping across multiple files, and you can modify individual model implementation without affecting other models. At the same time, we adopt yaml
for easier configuration. We thrive to make our toolkit more user friendly for students and researchers.
We have 46 PyTorch models for video action recognition, with better I3D models, more recent TPN family, faster training (DDP support and multi-grid) and K700 pretrained weights. Finetuning and feature extraction can never be easier.
Details of our model zoo can be seen at here. In terms of models, we cover TSN, I3D, I3D_slow, R2+1D, Non-local, CSN, TSN and TPN. In terms of datasets, we cover Kinetics400, Kinetics700 and Something-something-v2. All of our models have similar or better performance compared to numbers reported in original paper.
We provide several tutorials to get you started, including how to make predictions using a pretrained model, how to extract video features from a pretrained model, how to finetune a model on your dataset, how to measure a model's flops/speed, and how to use our DDP framework.
Since video models are slow to train (due to slow IO and large model), we also support distributed dataparallel (DDP) training and multi-grid training. DDP can provide 2x speed up and multi-grid training can provide 3-4x speed up. Combining these two techniques can significantly shorten the training process. In addition, both techniques are provided as helper functions. You can easily add your model definitions to GluonCV (a single python file like this) and enjoy the speed brought by our framework. More details can be read in this tutorial.
Special thanks to @Arthurlxy @ECHO960 @zhreshold @yinweisu for their support in this release. Thanks to @coocoo90 for contributing the CSN and R2+1D models. And thanks to other contributors for the bug fixes and improvements.
GluonCV v0.8.0 features the popular depth estimation model Monodepth2, semantic segmentation models (DANet and FastSCNN), StyleGAN, and multiple usability improvements.
We provide GluonCV implementation of Monodepth2 and the results are fully reproducible. To try out on your own images, please see our demo tutorial. To train a Monodepth2 model on your own dataset, please see our dive deep tutorial.
Following table shows its performance on the KITTI dataset.
Name | Modality | Resolution | Abs. Rel. Error | delta < 1.25 | Hashtag |
---|---|---|---|---|---|
monodepth2_resnet18_kitti_stereo_640x192 1 | Stereo | 640x192 | 0.114 | 0.856 | 92871317 |
We include two new semantic segmentation models in this release, one is DANet, the other is FastSCNN.
Following table shows their performance on the Cityscapes validation set.
Model | Pre-Trained Dataset | Dataset | pixAcc | mIoU |
---|---|---|---|---|
danet_resnet50_citys | ImageNet | Cityscapes | 96.3 | 78.5 |
danet_resnet101_citys | ImageNet | Cityscapes | 96.5 | 80.1 |
fastscnn_citys | - | Cityscapes | 95.1 | 72.3 |
Our FastSCNN is an improved version from a recent paper using semi-supervised learning. To our best knowledge, 72.3
mIoU is the highest-scored implementation of FastSCNN and one of the best real-time semantic segmentation models.
A GluonCV implementation of StyleGAN "A Style-Based Generator Architecture for Generative Adversarial Networks": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/stylegan
train.py
(#1339)GluonCV 0.7 added our latest backbone network: ResNeSt, and the derived models for semantic segmentation and object detection. We achieve significant performance improvement on all three tasks.
GluonCV now provides the state-of-art image classification backbones that can be used by various downstream tasks. Our ResNeSt outperforms EfficientNet in accuracy-speed trade-off as shown in the following figures. You can now swap in our new ResNeSt in your research or product to get immediate performance improvement. Checkout the detail in our paper: ResNeSt: Split Attention Network
Here is a comparison between ResNeSt and EfficientNet. The average latency is computed using a single V100 on a p3dn.24xlarge machine with a batch size of 16.
Model | input size | top-1 acc (%) | avg latency (ms) | |
---|---|---|---|---|
SENet_154 | 224x224 | 81.26 | 5.07 | previous |
ResNeSt50 | 224x224 | 81.13 | 1.78 | v0.7 |
ResNeSt101 | 256x256 | 82.83 | 3.43 | v0.7 |
ResNeSt200 | 320x320 | 83.90 | 9.49 | v0.7 |
ResNeSt269 | 416x416 | 84.54 | 19.50 | v0.7 |
We add two new ResNeSt based Faster R-CNN model. Noted that our model is trained using 2x learning rate schedule instead of the 1x schedule used in our paper. Our two new models are 2-4% higher on COCO mAP than our previous best model “faster_rcnn_fpn_resnet101_v1d_coco”. Notebly, our ResNeSt-50 based model has a 4.1% higher mAP than our previous ResNet-101 based model.
Model | Backbone | mAP | |
---|---|---|---|
Faster R-CNN | ResNet-101 | 40.8 | previous |
Faster R-CNN | ResNeSt-50 | 42.7 | v0.7 |
Faster R-CNN | ResNeSt-101 | 44.9 | v0.7 |
We add ResNeSt-50 and ResNeSt-101 based DeepLabV3 for semantic segmentation task on ADE20K dataset. Our new models are 1-2.8% higher than our previous best. Similar to our detection result, ResNeSt-50 performs better than ResNet-101 based model. DeepLabV3 with ResNeSt-101 backbone achieves a new state-of-the-art of 46.9 mIoU on ADE20K validation set, which outperform previous best by more than 1%.
Model | Backbone | pixel Accuracy | mIoU | |
---|---|---|---|---|
DeepLabV3 | ResNet-101 | 81.1 | 44.1 | previous |
DeepLabV3 | ResNeSt-50 | 81.2 | 45.1 | v0.7 |
DeepLabV3 | ResNeSt-101 | 82.1 | 46.9 | v0.7 |
GluonCV v0.6.0 added more video classification models, added pose estimation models that are suitable for mobile inference, added quantized models for video classification and pose estimation, and we also included multiple usability and code improvements.
https://gluon-cv.mxnet.io/model_zoo/action_recognition.html
We now provide state-of-the-art video classification networks, such as I3D, I3D-Nonlocal and SlowFast. We have a complete model zoo over several widely adopted video datasets. We provide a general video dataloader (which can handle both frame format and raw video format). Users can do training, fine-tuning, prediction and feature extraction without writing complicate code. Just prepare a text file containing the video information is enough.
Below is the table of new models included in this release.
Name | Pretrained | Segments | Clip Length | Top-1 | Hashtag |
---|---|---|---|---|---|
inceptionv1_kinetics400 | ImageNet | 7 | 1 | 69.1 | 6dcdafb1 |
inceptionv3_kinetics400 | ImageNet | 7 | 1 | 72.5 | 8a4a6946 |
resnet18_v1b_kinetics400 | ImageNet | 7 | 1 | 65.5 | 46d5a985 |
resnet34_v1b_kinetics400 | ImageNet | 7 | 1 | 69.1 | 8a8d0d8d |
resnet50_v1b_kinetics400 | ImageNet | 7 | 1 | 69.9 | cc757e5c |
resnet101_v1b_kinetics400 | ImageNet | 7 | 1 | 71.3 | 5bb6098e |
resnet152_v1b_kinetics400 | ImageNet | 7 | 1 | 71.5 | 9bc70c66 |
i3d_inceptionv1_kinetics400 | ImageNet | 1 | 32 (64/2) | 71.8 | 81e0be10 |
i3d_inceptionv3_kinetics400 | ImageNet | 1 | 32 (64/2) | 73.6 | f14f8a99 |
i3d_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 74.0 | 568a722e |
i3d_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.1 | 6b69f655 |
i3d_nl5_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.2 | 3c0e47ea |
i3d_nl10_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.3 | bfb58c41 |
i3d_nl5_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.0 | fbfc1d30 |
i3d_nl10_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.1 | 59186c31 |
slowfast_4x16_resnet50_kinetics400 | ImageNet | 1 | 36 (64/1) | 75.3 | 9d650f51 |
slowfast_8x8_resnet50_kinetics400 | ImageNet | 1 | 40 (64/1) | 76.6 | d6b25339 |
slowfast_8x8_resnet101_kinetics400 | ImageNet | 1 | 40 (64/1) | 77.2 | fbde1a7c |
resnet50_v1b_ucf101 | ImageNet | 3 | 1 | 83.7 | d728ecc7 |
i3d_resnet50_v1_ucf101 | ImageNet | 1 | 32 (64/2) | 83.9 | 7afc7286 |
i3d_resnet50_v1_ucf101 | Kinetics400 | 1 | 32 (64/2) | 95.4 | 760d0981 |
resnet50_v1b_hmdb51 | ImageNet | 3 | 1 | 55.2 | 682591e2 |
i3d_resnet50_v1_hmdb51 | ImageNet | 1 | 32 (64/2) | 48.5 | 0d0ad559 |
i3d_resnet50_v1_hmdb51 | Kinetics400 | 1 | 32 (64/2) | 70.9 | 2ec6bf01 |
resnet50_v1b_sthsthv2 | ImageNet | 8 | 1 | 35.5 | 80ee0c6b |
i3d_resnet50_v1_sthsthv2 | ImageNet | 1 | 16 (32/2) | 50.6 | 01961e4c |
We include tutorials for how to fine-tune a pre-trained model on users' own dataset. https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html
We include tutorials for introducing a new efficient video reader, Decord. https://gluon-cv.mxnet.io/build/examples_action_recognition/decord_loader.html
We include tutorials for how to extract features from a pre-trained model. https://gluon-cv.mxnet.io/build/examples_action_recognition/feat_custom.html
We include tutorials for how to make predictions from a pre-trained model. https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html
We include tutorials for how to perform distributed training on deep video models. https://gluon-cv.mxnet.io/build/examples_distributed/distributed_slowfast.html
We include tutorials for how to prepare HMDB51 and Something-something-v2 dataset. https://gluon-cv.mxnet.io/build/examples_datasets/hmdb51.html https://gluon-cv.mxnet.io/build/examples_datasets/somethingsomethingv2.html
We will provide Kinetics600 and Kinetics700 pre-trained models in the next release, please stay tuned.
https://gluon-cv.mxnet.io/model_zoo/pose.html#mobile-pose-models
Model | OKS AP | OKS AP (with flip) | Hashtag |
---|---|---|---|
mobile_pose_resnet18_v1b | 66.2/89.2/74.3 | 67.9/90.3/75.7 | dd6644eb |
mobile_pose_resnet50_v1b | 71.1/91.3/78.7 | 72.4/92.3/79.8 | ec8809df |
mobile_pose_mobilenet1.0 | 64.1/88.1/71.2 | 65.7/89.2/73.4 | b399bac7 |
mobile_pose_mobilenetv2_1.0 | 63.7/88.1/71.0 | 65.0/89.2/72.3 | 4acdc130 |
mobile_pose_mobilenetv3_large | 63.7/88.9/70.8 | 64.5/89.0/72.0 | 1ca004dc |
mobile_pose_mobilenetv3_small | 54.3/83.7/59.4 | 55.6/84.7/61.7 | b1b148a9 |
By replacing the backbone network, and use pixel shuffle layer instead of deconvolution, we can have models that are very fast. These models are suitable for edge device applications, tutorials on deployment will come soon.
https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html Below CPU performance is benchmarked on AWS EC2 C5.12xlarge instance with 24 physical cores. Note that you will need nightly build of MXNet to properly use these new features.
Model | Dataset | Batch Size | Speedup (INT8/FP32) | FP32 Accuracy | INT8 Accuracy |
---|---|---|---|---|---|
simple_pose_resnet18_v1b | COCO Keypoint | 128 | 2.55 | 66.3 | 65.9 |
simple_pose_resnet50_v1b | COCO Keypoint | 128 | 3.50 | 71.0 | 70.6 |
simple_pose_resnet50_v1d | COCO Keypoint | 128 | 5.89 | 71.6 | 71.4 |
simple_pose_resnet101_v1b | COCO Keypoint | 128 | 4.07 | 72.4 | 72.2 |
simple_pose_resnet101_v1d | COCO Keypoint | 128 | 5.97 | 73.0 | 72.7 |
vgg16_ucf101 | UCF101 | 64 | 4.46 | 81.86 | 81.41 |
inceptionv3_ucf101 | UCF101 | 64 | 5.16 | 86.92 | 86.55 |
resnet18_v1b_kinetics400 | Kinetics400 | 64 | 5.24 | 63.29 | 63.14 |
resnet50_v1b_kinetics400 | Kinetics400 | 64 | 6.78 | 68.08 | 68.15 |
inceptionv3_kinetics400 | Kinetics400 | 64 | 5.29 | 67.93 | 67.92 |
For pose-estimation models, the accuracy metric is OKS AP w/o flip. Quantized 2D video action recognition models are calibrated with num-segments=3 (7 is for ResNet-based models).
GluonCV v0.5.0 added Video Action Recognition models, added AlphaPose, added MobileNetV3, added VPLR semantic segmentation models for driving scenes, added more Int8 quantized models for deployment, and we also included multiple usability improvements.
Model | Metric | 0.5 |
---|---|---|
vgg16_ucf101 | UCF101 Top-1 | 83.4 |
inceptionv3_ucf101 | UCF101 Top-1 | 88.1 |
inceptionv3_kinetics400 | Kinetics400 Top-1 | 72.5 |
alpha_pose_resnet101_v1b_coco | OKS AP (with flip) | 76.7/92.6/82.9 |
MobileNetV3_Large | ImageNet Top-1 | 75.32 |
MobileNetV3_Small | ImageNet Top-1 | 67.72 |
deeplab_v3b_plus_wideresnet_citys | Cityscapes mIoU | 83.5 |
https://gluon-cv.mxnet.io/model_zoo/action_recognition.html
Video Action Recognition in GluonCV is a complete application set, including model definition, training scripts, useful loss and metric functions. We also included some pre-trained models and usage tutorials.
Model | Pre-Trained Dataset | Clip Length | Num of Segments | Metric | Dataset | Accuracy |
---|---|---|---|---|---|---|
vgg16_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 81.5 |
vgg16_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 83.4 |
inceptionv3_ucf101 | ImageNet | 1 | 1 | Top-1 | UCF101 | 85.6 |
inceptionv3_ucf101 | ImageNet | 1 | 3 | Top-1 | UCF101 | 88.1 |
inceptionv3_kinetics400 | ImageNet | 1 | 3 | Top-1 | Kinetics400 | 72.5 |
The tutorial for how to prepare UCF101 and Kinetics400 dataset: https://gluon-cv.mxnet.io/build/examples_datasets/ucf101.html and https://gluon-cv.mxnet.io/build/examples_datasets/kinetics400.html .
The demo for using the pre-trained model to predict human actions: https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_ucf101.html.
The tutorial for how to train your own action recognition model: https://gluon-cv.mxnet.io/build/examples_action_recognition/dive_deep_ucf101.html.
More state-of-the-art models (I3D, SlowFast, etc.) are coming in the next release. Stay tuned.
https://gluon-cv.mxnet.io/model_zoo/pose.html#alphapose
Model | Dataset | OKS AP | OKS AP (with flip) |
---|---|---|---|
alpha_pose_resnet101_v1b_coco | COCO Keypoint | 74.2/91.6/80.7 | 76.7/92.6/82.9 |
The demo for using the pre-trained AlphaPose model: https://gluon-cv.mxnet.io/build/examples_pose/demo_alpha_pose.html.
https://gluon-cv.mxnet.io/model_zoo/classification.html#mobilenet
Model | Dataset | Top-1 | Top-5 | Top-1 (original paper) |
---|---|---|---|---|
MobileNetV3_Large | ImageNet | 75.3 | 92.3 | 75.2 |
MobileNetV3_Small | ImageNet | 67.7 | 87.5 | 67.4 |
https://gluon-cv.mxnet.io/model_zoo/segmentation.html#cityscapes-dataset
Model | Pre-Trained Dataset | Dataset | mIoU | iIoU |
---|---|---|---|---|
deeplab_v3b_plus_wideresnet_citys | ImageNet, Mapillary Vista | Cityscapes | 83.5 | 64.4 |
Improving Semantic Segmentation via Video Propagation and Label Relaxation ported in GluonCV. State-of-the-art method on several driving semantic segmentation benchmarks (Cityscapes, CamVid and KITTI), and generalizes well to other scenes.
https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html Below CPU performance is benchmarked on AWS EC2 C5.12xlarge instance with 24 physical cores. Note that you will need nightly build of MXNet to properly use these new features.
Model | Dataset | Batch Size | C5.12xlarge FP32 | C5.12xlarge INT8 | Speedup | FP32 Acc | INT8 Acc |
---|---|---|---|---|---|---|---|
FCN_resnet101 | VOC | 1 | 5.46 | 26.33 | 4.82 | 97.97% | 98.00% |
PSP_resnet101 | VOC | 1 | 3.96 | 10.63 | 2.68 | 98.46% | 98.45% |
Deeplab_resnet101 | VOC | 1 | 4.17 | 13.35 | 3.20 | 98.36% | 98.34% |
FCN_resnet101 | COCO | 1 | 5.19 | 26.22 | 5.05 | 91.28% | 90.96% |
PSP_resnet101 | COCO | 1 | 3.94 | 10.60 | 2.69 | 91.82% | 91.88% |
Deeplab_resnet101 | COCO | 1 | 4.15 | 13.56 | 3.27 | 91.86% | 91.98% |
For segmentation models, the accuracy metric is pixAcc. Usage of int8 quantized model is identical to standard GluonCV models, simple use suffix _int8
.
GluonCV v0.4 added Pose Estimation models, Int8 quantization for intel CPUs, added FPN Faster/Mask-RCNN, wide se/resnext models, and we also included multiple usability improvements.
We highly suggest to use GluonCV 0.4.0 with MXNet>=1.4.0 to avoid some dependency issues. For some specific tasks you may need MXNet nightly build. See https://gluon-cv.mxnet.io/index.html
Model | Metric | 0.4 |
---|---|---|
simple_pose_resnet152_v1b | OKS AP* | 74.2 |
simple_pose_resnet50_v1b | OKS AP* | 72.2 |
ResNext50_32x4d | ImageNet Top-1 | 79.32 |
ResNext101_64x4d | ImageNet Top-1 | 80.69 |
SE_ResNext101_32x4d | ImageNet Top-1 | 79.95 |
SE_ResNext101_64x4d | ImageNet Top-1 | 81.01 |
yolo3_mobilenet1.0_coco | COCO mAP | 28.6 |
* Using Ground-Truth person detection results
GluonCV is now integrated with Intel's vector neural network instruction(vnni) to accelerate model inference speed. Note that you will need a capable Intel Skylake CPU to see proper speed up ratio.
Model | Dataset | Batch Size | C5.18x FP32 | C5.18x INT8 | Speedup | FP32 Acc | INT8 Acc |
---|---|---|---|---|---|---|---|
resnet50_v1 | ImageNet | 128 | 122.02 | 276.72 | 2.27 | 77.21%/93.55% | 76.86%/93.46% |
mobilenet1.0 | ImageNet | 128 | 375.33 | 1016.39 | 2.71 | 73.28%/91.22% | 72.85%/90.99% |
ssd_300_vgg16_atrous_voc* | VOC | 224 | 21.55 | 31.47 | 1.46 | 77.4 | 77.46 |
ssd_512_vgg16_atrous_voc* | VOC | 224 | 7.63 | 11.69 | 1.53 | 78.41 | 78.39 |
ssd_512_resnet50_v1_voc* | VOC | 224 | 17.81 | 34.55 | 1.94 | 80.21 | 80.16 |
ssd_512_mobilenet1.0_voc* | VOC | 224 | 31.13 | 48.72 | 1.57 | 75.42 | 75.04 |
*nms_thresh=0.45, nms_topk=200
Usage of int8
quantized model is identical to standard GluonCV models, simple use suffix _int8
.
For example, use resnet50_v1_int8
as int8
quantized version of resnet50_v1
.
https://gluon-cv.mxnet.io/model_zoo/classification.html#pruned-resnet
Pruning channels of convolution layers is an very effective way to reduce model redundency which aims to speed up inference without sacrificing significant accuracy. GluonCV 0.4 has included several pruned resnets from original GluonCV SoTA ResNets for ImageNet.
Model | Top-1 | Top-5 | Hashtag | Speedup (to original ResNet) |
---|---|---|---|---|
resnet18_v1b_0.89 | 67.2 | 87.45 | 54f7742b | 2x |
resnet50_v1d_0.86 | 78.02 | 93.82 | a230c33f | 1.68x |
resnet50_v1d_0.48 | 74.66 | 92.34 | 0d3e69bb | 3.3x |
resnet50_v1d_0.37 | 70.71 | 89.74 | 9982ae49 | 5.01x |
resnet50_v1d_0.11 | 63.22 | 84.79 | 6a25eece | 8.78x |
resnet101_v1d_0.76 | 79.46 | 94.69 | a872796b | 1.8x |
resnet101_v1d_0.73 | 78.89 | 94.48 | 712fccb1 | 2.02x |
Scripts for pruning resnets will be release in the future.
A GluonCV SRGAN of "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network ": https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/srgan
Image-to-Image translation reproduced in GluonCV: https://github.com/dmlc/gluon-cv/tree/master/scripts/gan/cycle_gan
GluonCV implementation of https://arxiv.org/abs/1704.06904
https://gluon-cv.mxnet.io/model_zoo/pose.html
Human Pose Estimation in GluonCV is a complete application set, including model definition, training scripts, useful loss and metric functions. We also included some pre-trained models and usage tutorials.
Model | OKS AP | OKS AP (with flip) |
---|---|---|
simple_pose_resnet18_v1b | 66.3/89.2/73.4 | 68.4/90.3/75.7 |
simple_pose_resnet18_v1b | 52.8/83.6/57.9 | 54.5/84.8/60.3 |
simple_pose_resnet50_v1b | 71.0/91.2/78.6 | 72.2/92.2/79.9 |
simple_pose_resnet50_v1d | 71.6/91.3/78.7 | 73.3/92.4/80.8 |
simple_pose_resnet101_v1b | 72.4/92.2/79.8 | 73.7/92.3/81.1 |
simple_pose_resnet101_v1d | 73.0/92.2/80.8 | 74.2/92.4/82.0 |
simple_pose_resnet152_v1b | 72.4/92.1/79.6 | 74.2/92.3/82.1 |
simple_pose_resnet152_v1d | 73.4/92.3/80.7 | 74.6/93.4/82.1 |
simple_pose_resnet152_v1d | 74.8/92.3/82.0 | 76.1/92.4/83.2 |
Model | bbox/seg mAP | Caffe bbox/seg |
---|---|---|
faster_rcnn_fpn_resnet50_v1b_coco | 0.384/- | 0.379 |
faster_rcnn_fpn_bn_resnet50_v1b_coco | 0.393/- | - |
faster_rcnn_fpn_resnet101_v1d_coco | 0.412/- | 0.398/- |
maskrcnn_fpn_resnet50_v1b_coco | 0.392/0.353 | 0.386/0.345 |
maskrcnn_fpn_resnet101_v1d_coco | 0.423/0.377 | 0.409/0.364 |
reset_class
for reuse partial category knowledge so some task may not need to finetune models anymore: https://gluon-cv.mxnet.io/build/examples_detection/skip_fintune.html#sphx-glr-build-examples-detection-skip-fintune-py
Model | Metric | 0.2 | 0.3 | Reference |
---|---|---|---|---|
ResNet-50 | top-1 acc on ImageNet | 77.07% | 79.15% | 75.3% (Caffe impl) |
ResNet-101 | top-1 acc on ImageNet | 78.81% | 80.51% | 76.4% (Caffe impl) |
MobileNet 1.0 | top-1 acc on ImageNet | N/A | 73.28% | 70.9% (tensorflow impl) |
Faster-RCNN | mAP on COCO | N/A | 40.1% | 39.6% (Detectron) |
Yolo-v3 | mAP on COCO | N/A | 37.0% | 33.0% (paper) |
DeepLab-v3 | mIoU on VOC | N/A | 86.7% | 85.7% (paper) |
Mask-RCNN | mask AP on COCO | N/A | 33.1% | 32.8% (Detectron) |
For image classification:
and for object detection
All models are hybridiziable. They can be deployed without Python. See tutorials to deploy these models in C++.
We now provide a broader range of model families that are good for out of box usage and various research purposes.
Significantly more accurate than original paper. For example, we get 37.0% mAP on CoCo versus the original paper's 33.0%. The techniques we used will be included in a paper to be released later.
Accuracy now matches Caffe2 Detectron without FPN, e.g. 38.3% box AP and 33.1% mask AP on COCO with ResNet50.
FPN support will come in future versions.
Slightly more accurate than original paper. For example, we get 86.7% mIoU on voc versus the original paper's 85.7%.
Reproduced WGAN with ResNet
Provide a baseline model which achieved 93.1 best rank1 score on Market1501 dataset.
GluonCV 0.3.0 now depends on incubator-mxnet >= 1.3.0, please update mxnet according to installation guide to avoid compatibility issues.
Note: This release rely on some features of mxnet 1.3.0. You can early access these features by installing nightly build of mxnet.
You can update mxnet with pip:
pip install mxnet --upgrade --pre
# or
pip install mxnet-cu90 --upgrade --pre
Highlight: Much more accurate pre-trained ResNet models on ImageNet classification
These high accuracy models are updated to Gluon Model Zoo.
Highlight: Faster-RCNN model with training/testing scripts
Faster-RCNN
Train SSD on COCO dataset
Highlight: PSPNet for Semantic Segmentation
dilation
is an optionAdded the following datasets and usage tutorials
DilatedResnetV0
to ResNetV1b
GluonCV provides implementations of state-of-the-art (SOTA) deep learning algorithms in computer vision. It is designed for helping engineers, researchers, and students to quickly prototype products, validate new ideas, and learning computer vision.
Tutorials
Model Zoo
Training Scripts
Util functions