An end-to-end masked contrastive video-and-language pre-training framework