✨✨Latest Papers and Datasets on Multimodal Large Langu...
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Per...
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, a...
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document U...
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to ...
A family of lightweight multimodal models.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language M...
Research Trends in LLM-guided Multimodal Learning.
Personal Project: MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Don...
[Paper][Preprint 2023] Making Large Language Models Perform Better in Kn...
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal...
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
From-scratch implementation of a vision-language model in pure PyTorch
Official implementation of "Gemini in Reasoning: Unveiling Commonsense i...