Beyond INet

Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"

Overview

Abstract: Modern computer vision offers a great variety of models to practitioners, and selecting a model from multiple options for specific applications can be challenging. Conventionally, competing model architectures and training protocols are compared by their classification accuracy on ImageNet. However, this single metric does not fully capture performance nuances critical for specialized tasks. In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy, for both ConvNet and Vision Transformer architectures, each across supervised and CLIP training paradigms. Although our selected models have similar ImageNet accuracies and compute requirements, we find that they differ in many other aspects: types of mistakes, output calibration, transferability, and feature invariance, among others. This diversity in model characteristics, not captured by traditional metrics, highlights the need for more nuanced analysis when choosing among different models.

How to use this repo

This repository provides the necessary code and resources to reproduce the experiments detailed in our paper.

Inference code is located in the inference folder, which contains functionality for the various evaluations: calibration, ImageNet-X, PUG-ImageNet, robustness, shape/texture bias, and transformation invariance.

There are several entry points for running experiments:

  • main.py: Main script for running experiments; it covers robustness, ImageNet-X, invariance, and PUG-ImageNet.
  • inference/calibration.py: Script for model calibration evaluation.
  • inference/shape_texture.py: Script for analyzing shape and texture biases.

The models folder contains the model implementations. For supervised models we use DeiT3-B/16 and ConvNeXt-B, both pretrained on ImageNet-21K and finetuned on ImageNet-1K, with implementations taken from their original repositories; their checkpoints can be acquired from the same repositories.

For CLIP models we use the following models from OpenCLIP:

  • ConvNeXt-CLIP: convnext_base, pretrained: laion400m_s13b_b51k
  • ViT-CLIP: ViT-B-16, pretrained: laion400m_e31
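
For reference, both checkpoints can be loaded through the open_clip package using exactly these name/tag pairs; the snippet below is a minimal illustration, not code from this repo:

    import open_clip

    # Load ViT-CLIP (ViT-B-16 trained on LAION-400M); swap in
    # "convnext_base" / "laion400m_s13b_b51k" for ConvNeXt-CLIP.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-16", pretrained="laion400m_e31")
    tokenizer = open_clip.get_tokenizer("ViT-B-16")
    model.eval()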

Inference

Here we provide several examples of how to use the provided scripts to run the various evaluations.

By default, the following values are supported for the --model argument: deit3_21k, convnext_base_21k, vit_clip, convnext_clip.

Using main.py:

  1. Robustness Evaluation: here $data_path should point to the folder where easyrobust downloaded all the benchmarks.

    python3 main.py --model "deit3_21k" --experiment "robustness" --data_path $data_path
    
  2. ImageNet-X Evaluation (see the per-factor aggregation sketch after this list):

    python3 main.py --model "deit3_21k" --experiment "imagenet_x" --data_path $ImageNet_Val_path
    
  3. Invariance Evaluation (see the consistency-metric sketch after this list):

    python3 main.py --model "deit3_21k" --experiment "invariance" --data_path $ImageNet_Val_path
    
  4. PUG-ImageNet Evaluation:

    python3 main.py --model "deit3_21k" --experiment "pug_imagenet" --data_path $PUG_ImageNet_path
    
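For context on the ImageNet-X numbers: ImageNet-X annotates the ImageNet validation set with 16 human-labeled factors (pose, background, lighting, and so on), and results are typically reported as the per-factor error ratio relative to the model's overall error. A generic sketch of that aggregation follows; the data-frame columns are hypothetical, not the imagenet_x API:

    import pandas as pd

    def per_factor_error_ratios(df, factors):
        # df: one row per image, with a boolean `correct` column and one
        # boolean column per factor marking images that exhibit it.
        overall_error = 1.0 - df["correct"].mean()
        return pd.Series({
            f: (1.0 - df.loc[df[f], "correct"].mean()) / overall_error
            for f in factors
        })

A ratio above 1 means the model errs more often than average on images exhibiting that factor.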

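For the invariance experiment, a standard way to quantify transformation invariance is prediction consistency: the fraction of images whose top-1 prediction is unchanged by a transform (shift, scale, etc.). A minimal PyTorch sketch, illustrative only, using a horizontal shift as the example transform:

    import torch
    import torchvision.transforms.functional as TF

    @torch.no_grad()
    def prediction_consistency(model, loader, shift=16):
        # Fraction of images whose top-1 class survives a horizontal shift.
        consistent, total = 0, 0
        for images, _ in loader:
            shifted = TF.affine(images, angle=0.0, translate=[shift, 0],
                                scale=1.0, shear=[0.0])
            preds = model(images).argmax(dim=1)
            consistent += (preds == model(shifted).argmax(dim=1)).sum().item()
            total += images.size(0)
        return consistent / total
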
Using inference/calibration.py:

  • Model Calibration Evaluation: here $data_path should point either to the ImageNet validation set or to ImageNet-R.
    python3 inference/calibration.py \
        --model "deit3_21k" \
        --data_path $data_path
    
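Calibration results of this kind are usually summarized with the expected calibration error (ECE). The standard binned estimator is sketched below for reference; it is illustrative and not necessarily the exact computation in inference/calibration.py:

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=15):
        # confidences: max softmax probability per sample (shape [N]);
        # correct: 0/1 array marking whether the prediction was right.
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap
        return ece
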

Using inference/shape_texture.py:

  • Shape and Texture Bias Analysis:
    python3 inference/shape_texture.py
    
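Shape/texture bias is conventionally measured on cue-conflict images in the style of Geirhos et al., where each image combines the shape of one class with the texture of another. Shape bias is then the fraction of shape-consistent decisions among all decisions matching either cue; a minimal illustration, not the repo's exact implementation:

    import numpy as np

    def shape_bias(preds, shape_labels, texture_labels):
        # preds: predicted class per cue-conflict image; the two label
        # arrays give the class of the shape cue and the texture cue.
        shape_hits = np.sum(preds == shape_labels)
        texture_hits = np.sum(preds == texture_labels)
        return shape_hits / (shape_hits + texture_hits)
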

Installation

  1. Clone the repository:

    git clone https://github.com/kirill-vish/Beyond-INet.git
    
  2. Navigate to the project directory:

    cd Beyond-INet
    
  3. Create a new Conda environment and activate it:

    conda create --name beyond_imagenet python=3.10
    conda activate beyond_imagenet
    
  4. Install the required packages:

    pip install -r requirements.txt
    
  5. Install the package:

    python setup.py install
    
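After installation, a quick sanity check that the environment is usable (this assumes PyTorch is pinned in requirements.txt, which the inference scripts rely on):

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"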

Contact

Kirill Vishniakov ki.vishniakov at gmail.com

Acknowledgements

EasyRobust

Calibration library

ImageNet-X

PUG-ImageNet

Citation

If you found our work useful, please consider citing us.

@article{vishniakov2023convnet,
      title={ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy}, 
      author={Kirill Vishniakov and Zhiqiang Shen and Zhuang Liu},
      year={2023},
      eprint={2311.09215},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is licensed under the MIT License.
