Code for ALBEF: a new vision-language pre-training method
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Ima...
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal S...
Deep Cross-Modal Projection Learning for Image-Text Matching
The largest multilingual image-text classification dataset. It contains ...
A client library for LAION's effort to filter CommonCrawl with CLIP, bui...