A lightweight, scalable, and general framework for visual question answe...
Strong baseline for visual question answering
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robus...
Multimodal Question Answering in the Medical Domain: A summary of Existi...
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Networ...
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirection...
Implementation for the paper "Hierarchical Conditional Relation Networks...
Some computer vision papers I have read, covering image-to-text generation, weakly supervised segmentation, and more
[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment fo...
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Mil...
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to b...
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Improved Fusion of Visual and Language Representations by Dense Symmetri...
TensorFlow Implementation of Deeper LSTM + normalized CNN for Visual Ques...