Official PyTorch implementation of "Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity" (ICLR'21 Oral)
This is the code for "Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity", accepted as an oral presentation at ICLR 2021 (paper). Parts of the code are borrowed from Puzzle Mix (link).
@inproceedings{
kim2021comixup,
title={Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity},
author={JangHyun Kim and Wonho Choo and Hosan Jeong and Hyun Oh Song},
booktitle={International Conference on Learning Representations},
year={2021}
}
This code was tested with
python 3.7.6
pytorch 1.7.0
torchvision 0.8.1
CUDA 11.1
cuDNN 7603
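One possible way to set up a matching environment with pip is sketched below; the exact torch wheel and CUDA pairing depends on your system, so treat this as an assumption rather than the tested setup.
pip install torch==1.7.0 torchvision==0.8.1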
In addition, the gco-wrapper package (https://github.com/Borda/pyGCO) is required.
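A minimal way to install it, assuming the pip package name given in the pyGCO repository (if that name differs, installing directly from the GitHub URL above with pip should also work):
pip install gco-wrapper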
We provide Co-Mixup training log files for (PreActResNet18, CIFAR-100, 300 epochs) in ./checkpoint. The last checkpoint achieves 80.19% clean test accuracy.
To download the model, install gdown and run
pip install gdown
gdown https://drive.google.com/uc?id=1awBkSLxQKHUry-jkbDB1aMRBgIn5aT3F -O ./checkpoint/cifar100_preactresnet18_eph300_comixup/checkpoint.pth.tar
To test the model, run
python main.py --evaluate --log_off --parallel False --resume ./checkpoint/cifar100_preactresnet18_eph300_comixup/checkpoint.pth.tar --data_dir ./data/cifar100/
Note that the CIFAR-100 dataset will be downloaded to --data_dir if it does not already exist.
Detailed descriptions of the arguments are provided in main.py. Below are some examples for reproducing the experimental results. The dataset will be downloaded to --data_dir and the results will be saved under --root_dir. If you want to run the code without saving results, set --log_off True.
python main.py --dataset cifar100 --data_dir ./data/cifar100/ --root_dir ./experiments/cifar100 --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --epochs 300 --schedule 100 200 --gammas 0.1 0.1 --comix True --parallel True --m_part 20 --m_block_num 4 --mixup_alpha 2.0 --clean_lam 1.0 --m_beta 0.32 --m_gamma 1.0 --m_thres 0.83 --m_eta 0.05 --m_omega 0.001
To train on CIFAR-10, set --labels_per_class 5000.
Co-Mixup runs in parallel with --parallel True (default). However, this requires additional GPU memory (the more partitions, the more CUDA processes, at roughly 1GB of GPU memory per process). If OOM occurs, set --parallel False or increase the partition size --m_part (see the example command after these notes). One can also modify the code to use different numbers of partitions and processes.
--workers has a significant impact on training time. We set it to 0 for CIFAR (using only the main thread) and 8 for Tiny-ImageNet.
--m_niter sets the number of iterations for the outer loop of Co-Mixup (e.g., --m_niter 3).
--clean_lam allows us to use a high --mixup_alpha. If --clean_lam is set to 0, --mixup_alpha should be decreased accordingly.
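As a concrete example, the lower-memory settings above could be combined as follows. This is a sketch based on the CIFAR-100 command earlier: only --parallel and --workers are changed, and all other hyperparameters are kept as-is.
python main.py --dataset cifar100 --data_dir ./data/cifar100/ --root_dir ./experiments/cifar100 --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --epochs 300 --schedule 100 200 --gammas 0.1 0.1 --comix True --parallel False --workers 0 --m_part 20 --m_block_num 4 --mixup_alpha 2.0 --clean_lam 1.0 --m_beta 0.32 --m_gamma 1.0 --m_thres 0.83 --m_eta 0.05 --m_omega 0.001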
The following process for downloading the Tiny-ImageNet dataset is forked from (link).
python load_data.py
python main.py --dataset tiny-imagenet-200 --data_dir ./data/tiny-imagenet-200 --root_dir ./experiments/tiny --labels_per_class 500 --arch preactresnet18 --learning_rate 0.2 --epochs 1200 --schedule 600 900 --gammas 0.1 0.1 --workers 8 --comix True --parallel True --m_part 20 --m_block_num 4 --mixup_alpha 2.0 --clean_lam 1.0 --m_beta 0.32 --m_gamma 1.0 --m_thres 0.83 --m_eta 0.05 --m_omega 0.001
For ImageNet experiments, please refer to ./comix-imagenet. We also provide test code for the localization and robustness experiments in ./comix-localization, where pretrained models are also available for download.
MIT License