Official implementation of the paper "Grounding DINO: Marrying DINO with...
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Uni...
Chinese version of CLIP which achieves Chinese cross-modal retrieval and...
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectur...
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive a...
Overview of Japanese LLMs (日本語LLMまとめ)
DriveLM: Driving with Graph Visual Question Answering
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLa...
PyTorch code for "Controlling Vision-Language Models for Universal Image...
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Official implementation of SEED-LLaMA (ICLR 2024).
CLIPort: What and Where Pathways for Robotic Manipulation
Multimodal Chinese LLaMA & Alpaca large language model (VisualCLA)
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation ...
💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain. (CVPR2021)