An official implementation for "CLIP4Clip: An Empirical Study of CLIP fo...
An official implementation for "UniVL: A Unified Video and Language Pre...
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Story-Based Retrieval with Contextual Embeddings. Largest freely availab...
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Represe...
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
An end-to-end masked contrastive video-and-language pre-training framework