mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
mPLUG-DocOwl1.5 (arXiv 2024) - mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
mPLUG-PaperOwl (arXiv 2023) - mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
UReader (EMNLP 2023) - UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
mPLUG-DocOwl (arXiv 2023) - mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Note: The HuggingFace demo is less stable than the ModelScope demo because GPUs in HuggingFace ZeroGPU Spaces are dynamically assigned.