LAVIS - A One-stop Library for Language-Vision Intelligence
Official implementation of the paper "Grounding DINO: Marrying DINO with...
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Uni...
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation ...
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision...
Instruction Following Agents with Multimodal Transformers