Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (ICCV 2021)
T2T-ViT pretrained models