[NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness. This code finetunes the explainability maps of Vision Transformers to improve the models' robustness.
06/05/2022: Added a HuggingFace Spaces demo.
The method applies loss functions directly to the explainability maps to encourage the model to focus mostly on the foreground of the image.
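To make the idea concrete, here is a minimal sketch of a foreground/background loss on a relevance map. It is not this repository's code. It assumes that the relevance map R and the binary segmentation mask S are flattened into equal-length lists with values in [0, 1], and that both terms are mean squared errors (background relevance pushed toward 0, foreground relevance pushed toward the mask). The function names are hypothetical.

```python
# Hypothetical sketch, not the repository's implementation.
# Assumes relevance and mask values are floats in [0, 1] and that both
# loss terms are mean squared errors.


def background_loss(relevance, mask):
    """Mean of (R * (1 - S))^2: relevance outside the mask is pushed to 0."""
    terms = [(r * (1.0 - s)) ** 2 for r, s in zip(relevance, mask)]
    return sum(terms) / len(terms)


def foreground_loss(relevance, mask):
    """Mean of (R * S - S)^2: relevance inside the mask is pushed to 1."""
    terms = [(r * s - s) ** 2 for r, s in zip(relevance, mask)]
    return sum(terms) / len(terms)


def relevance_loss(relevance, mask, lambda_background=2.0, lambda_foreground=0.3):
    """Weighted sum; the default weights mirror the values in this README's notes."""
    if len(relevance) != len(mask):
        raise ValueError("relevance map and mask must have the same length")
    return (lambda_background * background_loss(relevance, mask)
            + lambda_foreground * foreground_loss(relevance, mask))


if __name__ == "__main__":
    mask = [1.0, 0.0, 1.0, 0.0]       # foreground at positions 0 and 2
    relevance = [1.0, 1.0, 0.0, 0.0]  # half of the relevance sits on background
    print(relevance_loss(relevance, mask))
```

A map whose relevance coincides with the mask scores 0; relevance that leaks onto the background raises the loss, and with these weights background leakage is penalized more heavily than missed foreground.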
Using a short finetuning process with only 3 labeled examples per class from 500 classes, our method improves the robustness of ViT models across different model sizes and training techniques, even when data augmentation or regularization is applied.
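The data budget above works out to 3 × 500 = 1,500 labeled images. As an illustration only, the hypothetical helper below draws such a subset from a class-to-images mapping. How the 500 classes and their examples are actually chosen in this repository is not stated here, so uniform random sampling with a fixed seed is an assumption.

```python
import random


def sample_few_shot_subset(images_by_class, num_classes=500, per_class=3, seed=0):
    """Return (image_path, class_name) pairs: `per_class` images from each of
    `num_classes` randomly chosen classes. Hypothetical helper for illustration."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(sorted(images_by_class), num_classes)
    subset = []
    for cls in sorted(chosen):
        for path in rng.sample(images_by_class[cls], per_class):
            subset.append((path, cls))
    return subset


if __name__ == "__main__":
    # Toy stand-in for ImageNet: 1,000 classes with 10 image paths each.
    toy = {f"class_{c:04d}": [f"class_{c:04d}/img_{i}.jpg" for i in range(10)]
           for c in range(1000)}
    print(len(sample_few_shot_subset(toy)))
```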
Below are links to download the finetuned weights for the base models of ViT AugReg (the ViT variant provided in timm), vanilla ViT, and DeiT. These are the same weights used in our Colab notebook.
| Path | Description |
|---|---|
| AugReg-B | Finetuned ViT AugReg base model. |
| ViT-B | Finetuned vanilla ViT base model. |
| DeiT-B | Finetuned DeiT base model. |
pytorch==1.7.1
torchvision==0.8.2
timm==0.4.12
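A minimal sketch for checking an environment against the pins above, using the standard library's importlib.metadata (Python 3.8+). One assumption: the pin written as pytorch refers to the PyPI distribution named torch, which is the name checked here. The helper name is hypothetical.

```python
from importlib import metadata

# Pins from the list above; PyTorch's PyPI distribution name is "torch".
PINNED = {"torch": "1.7.1", "torchvision": "0.8.2", "timm": "0.4.12"}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA-specific wheels can carry a local suffix such as "+cu110".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```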
To use the ImageNet-S labeled data, download the ImageNetS919 dataset.
Alternatively, to generate segmentation maps without labels, clone the TokenCut project:
git clone https://github.com/YangtaoWANG95/TokenCut.git
Install the dependencies: Python 3.7, PyTorch 1.7.1, and CUDA 11.2 (refer to the official PyTorch installation instructions). If CUDA 10.2 is installed correctly, run:
pip install torch==1.7.1 torchvision==0.8.2
Followed by:
pip install -r TokenCut/requirements.txt
Use the following command to extract the segmentation maps:
python tokencut_generate_segmentation.py --img_path <PATH_TO_IMAGE> --out_dir <PATH_TO_OUTPUT_DIRECTORY>
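The command above takes a single --img_path. To process a whole folder, a hypothetical wrapper like the one below can generate one invocation per image. It only assembles the two flags already shown in the command, and assumes it runs from the directory that contains tokencut_generate_segmentation.py; the set of image extensions is also an assumption.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your data


def build_commands(image_paths, out_dir, script="tokencut_generate_segmentation.py"):
    """One argument list per image, using only the flags shown in the README."""
    commands = []
    for path in sorted(str(p) for p in image_paths):
        if Path(path).suffix.lower() in IMAGE_SUFFIXES:
            commands.append([sys.executable, script,
                             "--img_path", path, "--out_dir", str(out_dir)])
    return commands


def run_all(image_dir, out_dir, dry_run=True):
    """Print (dry run) or execute the TokenCut command for every image under image_dir."""
    for cmd in build_commands(Path(image_dir).rglob("*"), out_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing image
```

Calling run_all with dry_run=True first lets you inspect the generated commands before launching them.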
To finetune a pretrained ViT model, use the imagenet_finetune.py script. Note: uncomment the import line for the pretrained model you wish to finetune.
Usage example:
python imagenet_finetune.py --seg_data <PATH_TO_SEGMENTATION_DATA> --data <PATH_TO_IMAGENET> --gpu 0 --lr <LR> --lambda_seg <SEG> --lambda_acc <ACC> --lambda_background <BACK> --lambda_foreground <FORE>
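For repeated runs, the usage line can be assembled programmatically. The sketch below is hypothetical: it fills the loss weights with the values from the notes in this README and leaves the learning rate to the caller, since no value is given here. temperature is not among the flags in the usage line, so it is not passed.

```python
import shlex

# Loss weights taken from the notes in this README.
DEFAULT_WEIGHTS = {
    "lambda_seg": 0.8,
    "lambda_acc": 0.2,
    "lambda_background": 2,
    "lambda_foreground": 0.3,
}


def finetune_command(seg_data, data, lr, gpu=0, **weight_overrides):
    """Return the imagenet_finetune.py argument list (hypothetical helper)."""
    unknown = set(weight_overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown loss weights: {sorted(unknown)}")
    weights = {**DEFAULT_WEIGHTS, **weight_overrides}
    args = ["python", "imagenet_finetune.py",
            "--seg_data", seg_data, "--data", data,
            "--gpu", str(gpu), "--lr", str(lr)]
    for name, value in weights.items():
        args += [f"--{name}", str(value)]
    return args


if __name__ == "__main__":
    # The learning rate is a placeholder, not a recommended value.
    print(shlex.join(finetune_command("/path/to/seg", "/path/to/imagenet", lr=1e-6)))
```

Rejecting unknown keyword overrides guards against typos such as lambda_backgroud silently falling back to the default weight.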
Notes:
lambda_seg=0.8
lambda_acc=0.2
lambda_background=2
lambda_foreground=0.3
temperature=0.65 for DeiT-B
temperature=0.55 for DeiT-S
Note: uncomment the import line in the code that corresponds to the pretrained model you wish to finetune.
To finetune with the GradMask baseline, run the following command:
python imagenet_finetune_gradmask.py --seg_data <PATH_TO_SEGMENTATION_DATA> --data <PATH_TO_IMAGENET> --gpu 0 --lr <LR> --lambda_seg <SEG> --lambda_acc <ACC>
All hyperparameters for the different models can be found in section D of the supplementary material.
To finetune with the Right for the Right Reasons (RRR) baseline, run the following command:
python imagenet_finetune_rrr.py --seg_data <PATH_TO_SEGMENTATION_DATA> --data <PATH_TO_IMAGENET> --gpu 0 --lr <LR> --lambda_seg <SEG> --lambda_acc <ACC>
All hyperparameters for the different models can be found in section D of the supplementary material.
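For context, the Right for the Right Reasons penalty (Ross et al., 2017) is the squared input gradient of the summed log-probabilities, restricted to regions annotated as irrelevant (here assumed to be the background, i.e. the complement of the segmentation mask). The sketch below computes that term in closed form for a toy linear softmax classifier, purely to illustrate it; it is not taken from imagenet_finetune_rrr.py, and the function names are hypothetical.

```python
import math


def grad_sum_log_probs(W, x):
    """Gradient w.r.t. x of sum_k log softmax(W x)_k for a linear classifier.

    sum_k log p_k = sum_k z_k - K * logsumexp(z), with z = W x, so the
    gradient is sum_k W[k] - K * sum_j p_j W[j].
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    K = len(W)
    return [sum(row[d] for row in W) - K * sum(p * row[d] for p, row in zip(probs, W))
            for d in range(len(x))]


def rrr_penalty(W, x, irrelevant_mask):
    """Sum of squared input gradients inside the irrelevant (background) region."""
    grad = grad_sum_log_probs(W, x)
    return sum((a * g) ** 2 for a, g in zip(irrelevant_mask, grad))


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 0.0]]  # toy classifier: 2 classes, 2 input features
    x = [1.0, 0.0]
    print(rrr_penalty(W, x, irrelevant_mask=[1.0, 0.0]))
```

In a real model the gradient would come from automatic differentiation rather than a closed form; the masking and squaring step is the part that carries over.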
Download the evaluation datasets:
Run the following script to evaluate:
python imagenet_eval_robustness.py --data <PATH_TO_ROBUSTNESS_DATASET> --batch-size <BATCH_SIZE> --evaluate --checkpoint <PATH_TO_FINETUNED_CHECKPOINT>
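When sweeping over several robustness benchmarks, a small helper can append the dataset-specific switch to the command above. The sketch below is hypothetical: the dataset keys are invented labels, and it assumes, as the flag names suggest, that --isV2, --isObjectNet, and --isSI select ImageNet-v2, ObjectNet, and SI-Score handling, with no extra flag for any other dataset.

```python
# Hypothetical dataset labels mapped to the extra flag each one needs.
DATASET_FLAGS = {
    "imagenet-v2": ["--isV2"],
    "objectnet": ["--isObjectNet"],
    "si-score": ["--isSI"],
}


def eval_command(data, batch_size, checkpoint, dataset):
    """Build the imagenet_eval_robustness.py argument list for one dataset."""
    args = ["python", "imagenet_eval_robustness.py",
            "--data", data,
            "--batch-size", str(batch_size),
            "--evaluate",
            "--checkpoint", checkpoint]
    # Datasets without an entry (an assumption) get no extra flag.
    return args + DATASET_FLAGS.get(dataset, [])


if __name__ == "__main__":
    for name in ["imagenet-v2", "objectnet", "si-score"]:
        print(" ".join(eval_command(f"/data/{name}", 64, "/ckpt/finetuned.pth", name)))
```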
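The checkpoint parameter points to the finetuned weights to evaluate.
Add --isV2 when evaluating on ImageNet-v2.
Add --isObjectNet when evaluating on ObjectNet.
Add --isSI when evaluating on SI-Score.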
Our segmentation tests are based on the test in the official implementation of Transformer Interpretability Beyond Attention Visualization.
PYTHONPATH=./:$PYTHONPATH python SegmentationTest/imagenet_seg_eval.py --imagenet-seg-path <PATH_TO_gtsegs_ijcv.mat>
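Segmentation tests of this kind compare a binarized relevance map with the ground-truth mask. The sketch below shows pixel accuracy and a two-class mean IoU on flattened maps; the mean-value threshold, the metric choice, and the helper names are assumptions for illustration, not a description of imagenet_seg_eval.py.

```python
def binarize(relevance, threshold=None):
    """Foreground = 1 where relevance exceeds the threshold (default: the map's mean)."""
    if threshold is None:
        threshold = sum(relevance) / len(relevance)  # assumption: mean threshold
    return [1 if r > threshold else 0 for r in relevance]


def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def iou(pred, gt, cls):
    """Intersection over union for one class label (0 = background, 1 = foreground)."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    return inter / union if union else 1.0


def mean_iou(pred, gt):
    """Average IoU over the background and foreground classes."""
    return (iou(pred, gt, 0) + iou(pred, gt, 1)) / 2


if __name__ == "__main__":
    pred = binarize([0.9, 0.8, 0.1, 0.2])
    gt = [1, 0, 0, 0]
    print(pixel_accuracy(pred, gt), mean_iou(pred, gt))
```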
We would like to sincerely thank the authors for their great works.
If you make use of our work, please cite our paper:
@inproceedings{
chefer2022optimizing,
title={Optimizing Relevance Maps of Vision Transformers Improves Robustness},
author={Hila Chefer and Idan Schwartz and Lior Wolf},
booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022},
url={https://openreview.net/forum?id=upuYKQiyxa_}
}