PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
Datasets & transform refactoring
* Hugging Face streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
* Monochrome input (e.g. `--input-size 1 224 224` or `--in-chans 1`) sets PIL image conversion appropriately in dataset
* Allow training without a validation set (`--val-split ''`) in train script
* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
* Support a `model_args` config entry; `model_args` will be passed as kwargs through to models on creation.
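For context, a hypothetical Hub `config.json` using a `model_args` entry might look like the sketch below. `architecture` and `num_classes` are typical timm Hub config fields; the `model_args` values shown are arbitrary illustrative ViT kwargs, not taken from any real model repo:

```json
{
  "architecture": "vit_base_patch16_224",
  "num_classes": 1000,
  "model_args": {
    "qkv_bias": false,
    "init_values": 1e-5
  }
}
```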
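The `--bce-sum` and `--bce-pos-weight` tweaks can be sketched in plain Python. This is a minimal illustration of the standard BCE-with-logits formula with a positive-class weight and an optional sum-over-classes reduction, not the train script's actual implementation:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets, pos_weight=1.0, sum_classes=False):
    """Binary cross-entropy over one sample's per-class logits.

    pos_weight scales the positive-target term (cf. --bce-pos-weight);
    sum_classes sums over the class dim instead of averaging (cf. --bce-sum).
    """
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        # standard BCE, with the positive term up-weighted
        loss = -(pos_weight * t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        losses.append(loss)
    total = sum(losses)
    return total if sum_classes else total / len(losses)
```

With zero logits each class term is `log(2)`, so summing rather than averaging scales the loss by the number of classes, which is why the reduction choice interacts with learning rate.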
* `vision_transformer.py` typing and doc cleanup by Laureηt
* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
* `convnext_xxlarge` weights
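QuickGELU is the sigmoid-based GELU approximation used by the original OpenAI CLIP, `x * sigmoid(1.702 * x)`. A minimal scalar sketch (illustrative only; timm's module version operates on tensors):

```python
import math

def quick_gelu(x: float) -> float:
    """QuickGELU: x * sigmoid(1.702 * x), a sigmoid approximation of GELU.

    Matching pretrained weights that were trained with it requires using it
    at inference too, even though fused exact GELU kernels are faster.
    """
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))
```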
* Add dynamic img size support to models in `vision_transformer.py`, `vision_transformer_hybrid.py`, `deit.py`, and `eva.py` w/o breaking backward compat.
  * Add `dynamic_img_size=True` to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
  * Add `dynamic_img_pad=True` to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
  * Passing a different `img_size` (interpolate pretrained embed weights once) on creation still works.
  * Changing `patch_size` (resize pretrained patch_embed weights once) on creation still works.
  * Example validation cmd: `python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True`
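The `dynamic_img_pad` behavior amounts to rounding each spatial dim up to the next multiple of the patch size and padding on the bottom/right. A minimal sketch of that arithmetic (illustrative, not timm's actual code):

```python
def pad_to_patch_multiple(height: int, width: int, patch_size: int):
    """Return (pad_bottom, pad_right) so both dims become divisible by
    patch_size; padding is applied to the bottom-right only."""
    pad_bottom = (patch_size - height % patch_size) % patch_size
    pad_right = (patch_size - width % patch_size) % patch_size
    return pad_bottom, pad_right
```

For the 255px input in the example command with a patch size of 16, this yields 1 pixel of bottom and right padding, giving a 256x256 padded input.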
* Add `--reparam` arg to `benchmark.py`, `onnx_export.py`, and `validate.py` to trigger layer reparameterization / fusion for models with any one of `reparameterize()`, `switch_to_deploy()` or `fuse()`
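Conceptually, such a flag boils down to probing the model for one of those fusion hooks and invoking it. A hedged sketch of that dispatch (the method names are the ones listed above; the in-place helper itself is illustrative, not timm's utility):

```python
def reparameterize_model(model) -> bool:
    """Call the first available fusion hook on the model, if any.

    Models may expose one of reparameterize(), switch_to_deploy() or
    fuse() to collapse train-time branches (e.g. parallel conv + BN
    paths) into plain inference-friendly layers.
    """
    for hook in ("reparameterize", "switch_to_deploy", "fuse"):
        fn = getattr(model, hook, None)
        if callable(fn):
            fn()
            return True  # model was reparameterized in place
    return False  # nothing to fuse
```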
* Example validation cmd to test non-square resize: `python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320`
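One way to arrive at `window_size=8,10` for the 256x320 input above: the pretrained model uses a square window of 7 at 224x224, and the window can be scaled proportionally with each input dim. An illustrative calculation (assumes the new sizes divide evenly, as they do here):

```python
def scaled_window_size(pretrained_window: int, pretrained_img: int, new_img: tuple):
    """Scale a square pretrained window to a (possibly non-square) input,
    keeping window size proportional to each spatial dim."""
    return tuple(pretrained_window * s // pretrained_img for s in new_img)
```

Here `7 * 256 / 224 = 8` and `7 * 320 / 224 = 10`, matching the `window_size=8,10` in the command.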
* Minor updates and bug fixes. New ResNeXt w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (`seresnextaa201d_32x8d.sw_in12k_ft_in1k_384`)
* Fix `selecsls*` model naming regression
* Added `seresnextaa201d_32x8d.sw_in12k_ft_in1k_384` weights (and `.sw_in12k` pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I'm aware of.