An open source implementation of CLIP.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and...
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and T...
The implementation of "Prismer: A Vision-Language Model with Multi-Task ...
A concise but complete implementation of CLIP with various experimental ...
A curated list of Visual Question Answering (VQA) (Image/Video Question An...
[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Tow...
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Pres...
A detection/segmentation dataset with labels characterized by intricate ...
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Dis...
PyTorch version of the HyperDenseNet deep neural network for multi-modal...
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Atte...
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Un...
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object R...