[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transfor...
This repository contains my solutions to the assignments for Stanford's ...
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT f...
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirection...
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist...
An ever-growing playground of notebooks showcasing CLIP's impressive zer...
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grou...
Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learn...
A PyTorch implementation of VIOLET
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Ima...
PyTorch code for CVPR 2019 paper: The Regretful Agent: Heuristic-Aided N...
Evaluating Vision & Language Pretraining Models with Objects, Attributes...
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial ...
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2...
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Mil...