Best 37 Vision Language Open Source Projects

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-H...

💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain. (CVPR2021)

Tools for movie and video research

This is the third-party implementation of the paper Grounding DINO: Marr...

A Framework of Small-scale Large Multimodal Models

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foun...

[ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for L...

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrar...

Code for "Learning the Best Pooling Strategy for Visual Semantic Embeddi...

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise in...

PyTorch code for Language Models with Image Descriptors are Strong Few-S...

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions...

[ICCV 2023] Official implementation of "PØDA: Prompt-driven Zero-shot Do...

A detection/segmentation dataset with labels characterized by intricate ...

[CVPR 2023] Official repository of paper titled "CLIP2Protect: Protectin...