Embed arbitrary modalities (images, audio, documents, etc.) into large la...
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from ...
(NeurIPS 2022 CellSeg Challenge - 1st Place Winner) Open source code for "MEDI...
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to b...
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Implementation of MambaByte in "MambaByte: Token-free Selective State Sp...
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Atte...
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Represe...
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Select...
Latest Papers and Datasets on Visual Instruction Tuning
Official code for WACV 2021 paper - Compositional Learning of Image-Text...