# Torch-atom

A basic and simple training framework for PyTorch, easy to extend for multi-network interaction.

[TOC]
## Quick Start

Train a model with a single command:

```bash
python main.py --model resnet18 --save_dir cifar100_resnet18 --config_path ./configs/cifar100.yml
```

For DDP training, just run the command below in a terminal:

```bash
sh run_ddp.sh
```
## Results

### CIFAR100

All CIFAR100 log files can be downloaded as a package here: pan.baidu, code: 3lp2
Network | Accuracy (%) | Log |
---|---|---|
resnet18 | 76.46 | pan.baidu code: ewnd |
resnet34 | 77.23 | pan.baidu code: dq4r |
resnet50 | 76.82 | pan.baidu code: 1e62 |
resnet101 | 77.32 | pan.baidu code: myfv |
vgg11_bn | 70.52 | pan.baidu code: 2pun |
vgg13_bn | 73.71 | pan.baidu code: 4vmm |
mobilenetV2 | 68.99 | pan.baidu code: e93w |
shufflenet | 71.17 | pan.baidu code: lnvy |
shufflenetV2 | 71.16 | pan.baidu code: vmi6 |
### CIFAR10

All CIFAR10 log files can be downloaded as a package here: pan.baidu, code: 3iqz
Network | Accuracy (%) | Log |
---|---|---|
resnet18 | 94.92 | pan.baidu code: a20j |
resnet34 | 94.80 | pan.baidu code: q8h1 |
resnet50 | 94.81 | pan.baidu code: f3wr |
resnet101 | 95.45 | pan.baidu code: d3i8 |
vgg11_bn | 92.21 | pan.baidu code: di45 |
vgg13_bn | 93.74 | pan.baidu code: su1z |
mobilenetV2 | 90.92 | pan.baidu code: todf |
shufflenet | 92.06 | pan.baidu code: 1xr2 |
shufflenetV2 | 91.61 | pan.baidu code: 8swu |
## Usage

**1. Import modules**

```python
from src import DatasetBuilder, TransformBuilder, ModelBuilder, LossBuilder, LossWrapper, OptimizerBuilder, SchedulerBuilder, MetricBuilder
from torch.utils.data import DataLoader
from src import Controller
from src.utils import AverageMeter
```

**2. Load your transform, dataset and dataloader**

```python
transform_name = 'cifar100_transform' # your transform function name
dataset_name = 'CIFAR100' # your dataset class name
train_transform, val_transform = TransformBuilder.load(transform_name)
trainset, trainset_config = DatasetBuilder.load(dataset_name=dataset_name, transform=train_transform, train=True)
valset, valset_config = DatasetBuilder.load(dataset_name=dataset_name, transform=val_transform, train=False)
train_loader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(valset, batch_size=128, shuffle=False, num_workers=4, pin_memory=True)
```

**3. Load model, loss wrapper, metrics, optimizer, schemes and controller**

```python
epochs = 20
model = ModelBuilder.load("resnet18", num_classes=100)
# with class weights: LossBuilder.load("CrossEntropyLoss", weight=torch.tensor([1.0, 2.0]).float())
loss_func1 = LossBuilder.load("CrossEntropyLoss")
# Only one loss function, cross entropy, set its weight to 1.0
loss_wrapper = LossWrapper([loss_func1], [1.0])
model = model.cuda()
loss_wrapper = loss_wrapper.cuda()
metric_functions = [MetricBuilder.load('Accuracy')]
optimizer, optimizer_param = OptimizerBuilder.load('Adam', model.parameters(), lr=0.1)
scheduler, scheduler_param = SchedulerBuilder.load("cosine_annealing_lr", optimizer, max_epoch=epochs)
controller = Controller(loss_wrapper=loss_wrapper, model=model, optimizer=optimizer)
```

**4. Train!**

```python
for epoch in range(epochs):
    # train
    model.train()
    loss_recorder = AverageMeter(type='scalar', name='total loss')
    loss_list_recorder = AverageMeter(type='tuple', num_scalar=1, names=["CrossEntropyLoss"])
    metric_list_recorder = AverageMeter(type='tuple', num_scalar=1, names=["Accuracy"])
    for (img, label) in train_loader:
        img = img.cuda()
        label = label.cuda()
        loss, loss_tuple, output_no_grad = controller.train_step(img, label)
        loss_recorder.update(loss.item(), img.size(0))
        loss_list_recorder.update(loss_tuple, img.size(0))
        metrics = tuple([func(output_no_grad, label) for func in metric_functions])
        metric_list_recorder.update(metrics, img.size(0))
    print(f"total loss:{loss_recorder.get_value()} loss_tuple:{loss_list_recorder} metrics:{metric_list_recorder}")

    # eval
    model.eval()
    # ...

    # step the learning rate scheduler once per epoch
    scheduler.step()
```
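The validation loop is elided above. A minimal sketch of one plausible version follows, reusing the metric functions and `AverageMeter` from the snippet above; calling `model(img)` directly is an assumption here, since the `Controller`'s evaluation API is not shown in this README:

```python
# a possible eval loop (sketch) -- adapt to Controller's real validation API
import torch

model.eval()
val_metric_recorder = AverageMeter(type='tuple', num_scalar=1, names=["Accuracy"])
with torch.no_grad():
    for img, label in val_loader:
        img, label = img.cuda(), label.cuda()
        output = model(img)
        metrics = tuple(func(output, label) for func in metric_functions)
        val_metric_recorder.update(metrics, img.size(0))
print(f"val metrics: {val_metric_recorder}")
```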
## Customization

### Customize Dataset

1. In the `src/datasets` directory, define your customized `mydataset.py` like `cifar.py`. Arguments such as `root`, `train` and `download` can be specified in `src/datasets/dataset_config.yml`. Note that the `transform` needs to be set in `transforms.py`; details can be found at Customize Transform. A minimal sketch is shown after this list.
2. In `src/datasets/dataset_builder.py`, please import your dataset class. For example, if the `MyDataset` class is defined in `mydataset.py`, add `from .mydataset import *` to `dataset_builder.py`.
3. In `configs/xxx.yml`, set `dataset.name` to `MyDataset`.
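For illustration, here is a minimal sketch of such a `mydataset.py`. The folder-per-class layout and the exact constructor arguments are assumptions; the real references are `cifar.py` and `src/datasets/dataset_config.yml`:

```python
# src/datasets/mydataset.py -- a hypothetical example, not the repo's actual code
import os
from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):
    # root / train / download mirror the keys in dataset_config.yml
    def __init__(self, root, train=True, transform=None, download=False):
        split_dir = os.path.join(root, "train" if train else "val")
        classes = sorted(os.listdir(split_dir))
        self.samples = [
            (os.path.join(split_dir, c, f), idx)
            for idx, c in enumerate(classes)
            for f in os.listdir(os.path.join(split_dir, c))
        ]
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```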
### Customize Transform

1. In `src/datasets/transforms.py`, define your transform function, named `my_transform_func`, which returns `train_transform` and `val_transform` (see the sketch after this list).
2. In `configs/xxx.yml`, set `dataset.transform_name` to `my_transform_func`.
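A minimal sketch of such a function; the augmentations and normalization statistics below are illustrative, not the repo's actual defaults:

```python
# src/datasets/transforms.py -- hypothetical custom transform function
import torchvision.transforms as T

def my_transform_func():
    # commonly used CIFAR100 statistics (illustrative choice)
    mean, std = (0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)
    train_transform = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean, std),
    ])
    val_transform = T.Compose([
        T.ToTensor(),
        T.Normalize(mean, std),
    ])
    return train_transform, val_transform
```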
### Customize Model

1. In the `src/models` directory, define your customized model, such as `my_model.py`, and define the module class `MyModel`; please refer to `resnet.py`. A minimal sketch follows this list.
2. In `src/models/model_builder.py`, import your model: `from .my_model import *` under the `try` branch, and `from my_model import *` under the `except` branch. This is just convenient for debugging.
3. In `configs/xxx.yml`, set `model['name']` to `MyModel`.
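A minimal sketch of such a `my_model.py`; the tiny architecture here is purely illustrative, and `resnet.py` remains the real reference:

```python
# src/models/my_model.py -- hypothetical example module
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)
```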
### Customize Loss Function

1. In the `src/losses` directory, you can define a customized loss Module, such as `CrossEntropyLoss` in `classification.py` (a sketch follows this list).
2. Import it in `loss_builder.py`. If you need to combine several losses, the `LossWrapper` Module in `src/losses/loss_wrapper.py` may meet the requirement.
3. In `configs/xxx.yml`, please add your loss names and weights into `train.criterion.names` and `train.criterion.loss_weights` respectively.
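As an illustration, here is a label-smoothing loss written in the same style; this class and its placement in `classification.py` are assumptions, since only `CrossEntropyLoss` is confirmed by this README:

```python
# src/losses/classification.py -- hypothetical extra loss Module
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingLoss(nn.Module):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, output, target):
        n = output.size(-1)
        log_probs = F.log_softmax(output, dim=-1)
        # spread `smoothing` mass over the non-target classes
        true_dist = torch.full_like(log_probs, self.smoothing / (n - 1))
        true_dist.scatter_(1, target.unsqueeze(1), 1.0 - self.smoothing)
        return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))
```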
### Customize Optimizer

1. In the `src/optimizer` directory you can find `optimizers.py`; please define your customized optimizer there. For example, `SGD` and `Adam` have already been defined: `parameters` and `lr` must be specified, and other params are passed through `*args, **kwargs`. Please refer to:
```python
# src/optimizer/optimizers.py
from torch import optim

def SGD(parameters, lr, *args, **kwargs) -> optim.Optimizer:
    optimizer = optim.SGD(parameters, lr=lr, *args, **kwargs)
    return optimizer
```
2. Other parameters, such as `weight_decay`, can be set in `src/optimizer/optimizer_config.yml`. Please refer to the yaml below; the `5e-4` format is fine, since it is converted in `src/optimizer/optimizer_builder.py`.
```yaml
# src/optimizer/optimizer_config.yml
SGD:
  momentum: 0.9
  weight_decay: 5e-4
  dampening: 0
  nesterov: False
```
3. In `configs/xxx.yml`, set `train['lr']`, and set `train['optimizer']` to `SGD`.
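Following that pattern, adding another optimizer is a one-liner. `AdamW` below is a hypothetical addition, not something this README confirms the repo ships:

```python
# src/optimizer/optimizers.py -- hypothetical extra entry in the same style
from torch import optim

def AdamW(parameters, lr, *args, **kwargs) -> optim.Optimizer:
    # betas, weight_decay, ... would come from optimizer_config.yml
    return optim.AdamW(parameters, lr=lr, *args, **kwargs)
```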
### Customize Learning Rate Scheduler

1. In `src/schemes/lr_schemes.py`, define your learning rate scheme function, named `my_scheduler`, which takes some params such as `optimizer`, `epochs` and so on (a sketch follows this list).
2. Scheduler parameters can be set in `src/schemes/scheme_config.yml`.
3. In `configs/xxx.yml`, set `train.schedule` to `my_scheduler`.
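A sketch of such a scheme function. The exact signature `SchedulerBuilder` expects is partly an assumption; `optimizer` plus `max_epoch` mirrors the quick-start call `SchedulerBuilder.load("cosine_annealing_lr", optimizer, max_epoch=epochs)`:

```python
# src/schemes/lr_schemes.py -- hypothetical custom scheme function
from torch.optim import lr_scheduler

def my_scheduler(optimizer, max_epoch, *args, **kwargs):
    # step decay at 50% and 75% of training (illustrative choice)
    milestones = [int(max_epoch * 0.5), int(max_epoch * 0.75)]
    return lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
```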
### Customize Metric

1. In the `src/metrics/` directory, define your metric, such as `Accuracy` in `accuracy.py`, which computes the metric between predictions and targets and returns a metric scalar (a sketch follows this list).
2. Import it in `metric_builder.py`, for example, `from .accuracy import *`.
3. In `configs/xxx.yml`, add your metrics into `train.metric.names`. While training, the checkpoint saving strategy refers to `train.metrics.key_metric_name` in `configs/xxx.yml`; more details can be found at Customize Checkpoint Saving Strategy.
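A sketch of a metric in this style, called as `func(output_no_grad, label)` like in the quick-start loop; whether the repo's `Accuracy` returns a percentage or a fraction is an assumption here:

```python
# src/metrics/accuracy.py -- illustrative top-1 accuracy
import torch

def Accuracy(output, target):
    # output: (N, C) logits without grad; target: (N,) class indices
    with torch.no_grad():
        pred = output.argmax(dim=1)
        return (pred == target).float().mean().item() * 100.0
```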
### Customize Training and Validation Step

In `src/controller.py`, please feel free to build your own training and validation step. `controller.train_step` returns `loss`, `loss_tuple` and `output_no_grad`, where `loss_tuple` and `output_no_grad` are only involved in logging; whether `loss` has a gradient or not depends on you.
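A minimal sketch of what `train_step` could look like given that contract; the `self.loss_wrapper(output, label)` call signature is an assumption, since this README only shows how `LossWrapper` is constructed:

```python
# src/controller.py -- sketch of the train_step contract described above
class Controller:
    def __init__(self, loss_wrapper, model, optimizer):
        self.loss_wrapper = loss_wrapper
        self.model = model
        self.optimizer = optimizer

    def train_step(self, img, label):
        output = self.model(img)
        # assumed: the wrapper returns (weighted total loss, per-loss tuple)
        loss, loss_tuple = self.loss_wrapper(output, label)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        # detached output is only used for logging and metrics
        return loss, loss_tuple, output.detach()
```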
### Customize Checkpoint Saving Strategy

`src/utils/netio.py` saves the best state dict according to `key_metric_name` and `strategy` in `configs/xxx.yml`. A checkpoint is also saved every `save_freq` epochs, which can be set in `configs/xxx.yml` as well.
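The comparison rule this implies could look like the following; the helper name and the `"max"`/`"min"` strategy values are assumptions for illustration:

```python
# sketch of the best-checkpoint rule described above (hypothetical helper)
def is_better(current, best, strategy="max"):
    # "max" suits metrics like Accuracy; "min" would suit losses
    return current > best if strategy == "max" else current < best
```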
## Acknowledgements

Torch-atom got ideas from and was developed based on the following projects:
## Citation

If you find this project useful in your research, please consider citing:
```bibtex
@misc{2022torchatom,
  title={Torch-atom: A basic and simple training framework for pytorch},
  author={Baitan Shao},
  howpublished = {\url{https://github.com/shaoeric/torch-atom}},
  year={2022}
}
```
## License

The MIT License
## Contribution

Please feel free to submit issues. :)