Best 55 Multi-Modal Open Source Projects

SALMONN: Speech Audio Language Music Open Neural Network

A curated list of Visual Question Answering (VQA) (Image/Video Question An...

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Susta...

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by La...

Open-source evaluation toolkit of large vision-language models (LVLMs), ...

Source code for "Taming Visually Guided Sound Generation" (Oral at the B...

Democratization of RT-2 "RT-2: New model translates vision and language ...

This repository collects papers for "A Survey on Knowledge Distillation ...

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-...

[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Mu...

Robust robotic localization and mapping, together with NavAbility(TM). ...

VLE: Vision-Language Encoder (VLE: a vision-language multimodal pre-trained model)

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Un...

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Pres...

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrar...