☁️ Build multimodal AI applications with a cloud-native stack
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V...
🏄 Scalable embedding, reasoning, ranking for images and sentences with ...
✨✨ Latest Papers and Datasets on Multimodal Large Langu...
Simple command line tool for text to image generation using OpenAI's CLI...
Algorithms and Publications on 3D Object Tracking
Feed PDFs, docs, slides, web pages and more into GPT-4-Vision in one lin...
Collaborative Diffusion (CVPR 2023)
Effortless plug-and-play optimizer to cut model training costs by 50%...
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint ...
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Unifying Voxel-based Representation with Transformer for 3D Object Detec...
This repo contains the official code of our work SAM-SLR which won the C...
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Disco...