Large-scale Self-supervised Pre-training Across Tasks, Languages, and Mo...
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Per...
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document U...
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruc...
A family of lightweight multimodal models.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language M...
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Custom ComfyUI nodes for Vision Language Models, Large Language Models, ...
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object ...
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
Merlin: Empowering Multimodal LLMs with Foresight Minds
Official Repo of Graphist
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal ...