☁️ Build multimodal AI applications with a cloud-native stack
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V...
🏄 Scalable embedding, reasoning, ranking for images and sentences with ...
✨✨ Latest Papers and Datasets on Multimodal Large Langu...
Simple command line tool for text to image generation using OpenAI's CLI...
Feed PDFs, URLs, Slides, YouTube, and more into Vision-Language models w...
Algorithms and Publications on 3D Object Tracking
Collaborative Diffusion (CVPR 2023)
Effortless plug-and-play optimizer to cut model training costs by 50%...
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint ...
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Unifying Voxel-based Representation with Transformer for 3D Object Detec...
This repo contains the official code of our work SAM-SLR which won the C...
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Disco...