mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

The Powerful Multimodal LLM Family for OCR-free Document Understanding

Alibaba Group

News

  • 🔥🔥🔥 [2024.4.3] We have built demos of DocOwl1.5 on both ModelScope and HuggingFace 🤗, powered by DocOwl1.5-Omni. The source code for launching a local demo is also released in DocOwl1.5.
  • 🔥🔥 [2024.3.28] We have released the training data (DocStruct4M, DocDownstream-1.0, DocReason25K), code, and models (DocOwl1.5-stage1, DocOwl1.5, DocOwl1.5-Chat, DocOwl1.5-Omni) of mPLUG-DocOwl 1.5 on both HuggingFace 🤗 and ModelScope.
  • 🔥 [2024.3.20] We have released the arXiv paper of mPLUG-DocOwl 1.5, a SOTA 8B Multimodal LLM on OCR-free Document Understanding (DocVQA 82.2, InfoVQA 50.7, ChartQA 70.2, TextVQA 68.6).
  • [2024.01.13] Our Scientific Diagram Analysis dataset M-Paper is now available on both HuggingFace 🤗 and ModelScope. It contains 447k high-resolution diagram images with corresponding paragraph analysis.
  • [2023.10.13] The training data and models of mPLUG-DocOwl/UReader have been open-sourced.
  • [2023.10.10] Our paper UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model has been accepted by EMNLP 2023.
  • [2023.07.10] The demo of mPLUG-DocOwl on ModelScope is available.
  • [2023.07.07] We release the technical report and evaluation set of mPLUG-DocOwl.

Models

  • mPLUG-DocOwl1.5 (arXiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

  • mPLUG-PaperOwl (arXiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

  • UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

  • mPLUG-DocOwl (arXiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Online Demo

Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.

ModelScope

HuggingFace

Cases

[Images of example cases]

README Source: X-PLUG/mPLUG-DocOwl