An official implementation of "CLIP4Clip: An Empirical Study of CLIP fo...
An end-to-end masked contrastive video-and-language pre-training framework