All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (T...
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Mil...
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical E...
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning
source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper L...
A collection of multimodal datasets, and visual features for VQA and cap...
[CVPR20] Video Object Grounding using Semantic Roles in Language Descrip...
Code for CVPR'19 "Recursive Visual Attention in Visual Dialog"
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grou...
TensorFlow implementation of paper [CVPR2020] Image Search with Text Feed...
PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robo...
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Atte...