[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
A curated list of awesome vision and language resources (still under con...
[arXiv 2023] PointLLM: Empowering Large Language Models to Understand Po...
Conceptual 12M is a dataset containing (image-URL, caption) pairs collec...
Implementation of 'X-Linear Attention Networks for Image Captioning' [CV...
This repo lists relevant papers summarized in our survey paper: A Syste...
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-H...
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learn...
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for...
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robus...
Official Implementation of "GiT: Towards Generalist Vision Transformer t...
HPT - Open Multimodal LLMs from HyperGAI
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transfor...