[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Quest...
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirection...
Implementation for the paper "Hierarchical Conditional Relation Networks...
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Mil...
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions...
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning an...
A VideoQA dataset based on the videos from ActivityNet
Video Graph Transformer for Video Question Answering (ECCV'22)