Paddle Multimodal Integration and eXploration, supporting mainstream mul...
Famous Vision Language Models and Their Architectures