Large-scale Self-supervised Pre-training Across Tasks, Languages, and Mo...
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Per...
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document U...
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruc...
A family of lightweight multimodal models.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language M...
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Custom ComfyUI nodes for Vision Language Models, Large Language Models, ...
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object ...
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Merlin: Empowering Multimodal LLMs with Foresight Minds
Official Repo of Graphist
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal ...