[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V...
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-r...
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from ...
Docker image for LLaVA: Large Language and Vision Assistant