PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Uni...
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
Deep Modular Co-Attention Networks for Visual Question Answering
Recent Papers including Neural Symbolic Reasoning, Logical Reasoning, Vi...
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
PyTorch implementation of "Explainable and Explicit Visual Reasoning ove...
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision...
Visual Question Reasoning on General Dependency Tree