Video Foundation Models & Data for Multimodal Understanding
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirection...
A PyTorch implementation of VIOLET
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Q...
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Mil...
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions...
A simple yet interesting tool for chatting with video
Video Graph Transformer for Video Question Answering (ECCV'22)
ROCK model for Knowledge-Based VQA in Videos