☁️ Build multimodal AI applications with cloud-native stack
🏄 Scalable embedding, reasoning, ranking for images and sentences with ...
a state-of-the-art-level open visual language model | multimodal pretrained model
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Foreca...
Codebase for ECCV18 "The Sound of Pixels"
TOMM2020 Dual-Path Convolutional Image-Text Embedding 🐾 https://a...
Tools for movie and video research
[CVPR2023] The code for "Position-guided Text Prompt for Vision-Languag...
Awesome Cross-modality Person Re-identification
PyTorch implementation of the paper "Semantically Tied Paired Cycle Cons...
Co-Separating Sounds of Visual Objects (ICCV 2019)
Demo code for visible thermal (cross-modality) person re-identification
[CVPR 2023] Diverse Embedding Expansion Network and Low-Light Cross-Moda...
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared P...