Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision.
By: Hila Chefer (Tel-Aviv University and Google) and Sayak Paul (Hugging Face) (with Ron Mokady as a guest speaker)
We leverage 🤗 transformers, 🧨 diffusers, timm, and PyTorch for the code samples.
We provide all the code samples as Colab Notebooks so that no setup is needed locally to execute them.
We divide our tutorial into the following logical sections:
explainability: has notebooks that show how to generate explanations for the predictions of attention-based models (such as Vision Transformers).
CLIP_explainability.ipynb
Comparative_Transformer_explainability.ipynb
Transformer_explainability.ipynb
probing: has notebooks that probe into the representations learned by attention-based models (such as Vision Transformers).
dino_attention_maps.ipynb
mean_attention_distance.ipynb
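As a rough illustration of the kind of probing meant here, the sketch below computes a mean attention distance for a single attention head: the attention-weighted average pixel distance between each query patch and all key patches, averaged over queries. It is only a sketch under stated assumptions and is not taken from the notebooks: the function name, the plain nested-list input, the square patch grid, and the absence of a CLS token are all hypothetical simplifications.

```python
import math


def mean_attention_distance(attn, grid_size, patch_size=16):
    """Mean attention distance (in pixels) for one attention head.

    Hypothetical helper for illustration only.
    attn: N x N nested list of attention weights, N = grid_size ** 2,
          where each row (one query patch) sums to 1. Assumes the CLS
          token has already been dropped.
    """
    n = grid_size * grid_size
    if len(attn) != n or any(len(row) != n for row in attn):
        raise ValueError("attn must have shape (grid_size**2, grid_size**2)")

    # Pixel coordinates of every patch, in row-major order.
    coords = [
        ((i // grid_size) * patch_size, (i % grid_size) * patch_size)
        for i in range(n)
    ]

    total = 0.0
    for q in range(n):
        qy, qx = coords[q]
        for k in range(n):
            ky, kx = coords[k]
            # Weight the query-key distance by the attention it receives.
            total += attn[q][k] * math.hypot(qy - ky, qx - kx)

    # Average over query patches.
    return total / n


# A head that attends only to the query's own patch has distance 0.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(mean_attention_distance(identity, grid_size=2))
```

A head with a small value attends mostly to nearby patches, while a large value suggests more global attention.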
Below we provide links to all the Colab Notebooks:
The following notebooks were taken from their original repositories with the knowledge of their authors: