ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch...
Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2021] You Only Look at One Sequence
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision ...
SOTA Semantic Segmentation Models in PyTorch
Repository of Vision Transformer with Deformable Attention (CVPR2022) an...
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Tr...
Vision-Centric BEV Perception: A Survey
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learni...
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Re...
Feed PDFs, docs, slides, web pages and more into GPT-4-Vision in one lin...
Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transfor...
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECC...