LAVIS - A One-stop Library for Language-Vision Intelligence
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more
Multimodal-GPT
Code for ALBEF: a new vision-language pre-training method
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts"
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Oscar and VinVL
X-modaler is a versatile and high-performance codebase for cross-modal analytics
My Reading Lists of Deep Learning and Natural Language Processing
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations"
日本語LLMまとめ - Overview of Japanese LLMs
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses seamlessly integrated with object segmentation masks
Software for automatic monitoring in online proctoring