PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Uni...
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectur...
Bottom-up attention model for image captioning and VQA, based on Faster ...
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question...
X-modaler is a versatile and high-performance codebase for cross-modal a...
Bilinear attention networks for visual question answering
Deep Modular Co-Attention Networks for Visual Question Answering
PyTorch implementation of "Transparency by Design: Closing the Gap Betwe...
A lightweight, scalable, and general framework for visual question answe...
Strong baseline for visual question answering
This repo contains evaluation code for the paper "MMMU: A Massive Multi-...
MathVista: data, code, and evaluation for Mathematical Reasoning in Visu...
PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirection...
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey