[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V...
An efficient, flexible and full-featured toolkit for fine-tuning large m...
A C#/.NET library to run LLM models (🦙LLaMA/LLaVA) on your local device...
ChatGPT's explosive popularity marks a key step toward AGI; this project aims to compile those open-source ChatGPT pla...
A one-stop data processing system to make data higher-quality, juicier, ...
ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA...
Pocket-Sized Multimodal AI for content understanding and generation acro...
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLa...
Open-source evaluation toolkit of large vision-language models (LVLMs), ...
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for...
A framework for small-scale large multimodal models
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robus...
Paddle Multimodal Integration and eXploration, supporting mainstream mul...