Foundational Model for Speech Recognition Tasks
Images to inference with no labeling (use foundation models to train sup...
A professional list of Deep Learning and Large (Language) Models (LM, LL...
Docker image for LLaVA: Large Language and Vision Assistant
Recent research papers about Foundation Models for Combinatorial Optimiz...
Code Base for MinD-Video
Official repo for fine-tuning SAM on custom medical images.
A general representation model across vision, audio, and language modalities.
[ICML 2024] A novel, efficient approach combining convolutional operatio...
Code for Neural Plasticity-Inspired Foundation Model for Observing the E...
World Model based Autonomous Driving Platform in CARLA
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal ...
A curated list of foundation models for vision and language tasks