🤖 Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions!

Based on the open-source multi-modal model OpenFlamingo, we construct various visual instruction data from open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. We also train the language-model component of OpenFlamingo with language-only instruction data.

Joint training on visual and language instructions effectively improves the model's performance! For more details, please refer to our technical report.
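As a rough illustration of what joint training on mixed instruction data can look like, here is a minimal, hypothetical sketch of sampling batches that interleave visual and language-only instruction examples. The dataset entries, field names, and mixing ratio below are assumptions for illustration only, not the project's actual pipeline:

```python
# Illustrative sketch (NOT the project's actual code): jointly sampling
# visual and language-only instruction data into one training stream.
# Dataset entries, field names, and visual_ratio are assumed for clarity.
import random

visual_instructions = [
    {"image": "coco_000001.jpg",
     "instruction": "What is in the image?",
     "response": "A dog running on a beach."},
]
language_instructions = [
    {"image": None,
     "instruction": "Explain photosynthesis briefly.",
     "response": "Plants convert light into chemical energy."},
]

def sample_batch(batch_size=4, visual_ratio=0.5):
    """Draw a mixed batch; visual_ratio sets the visual/language split."""
    batch = []
    for _ in range(batch_size):
        pool = (visual_instructions if random.random() < visual_ratio
                else language_instructions)
        batch.append(random.choice(pool))
    return batch

if __name__ == "__main__":
    for sample in sample_batch():
        modality = "visual" if sample["image"] else "language-only"
        print(f"[{modality}] {sample['instruction']} -> {sample['response']}")
```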

You are welcome to join us!
