Run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints.
Reference implementation of Mistral AI 7B v0.1 model.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
This project aims to share the technical principles behind large language models, along with hands-on practical experience.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
Sparsity-aware deep learning inference runtime for CPUs
Code examples and resources for DBRX, a large language model developed by Databricks.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms.
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac).
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads.
🦖 Stateful Serverless Framework for building Geo-distributed Edge AI Infra
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
RayLLM - LLMs on Ray
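Several of the servers listed above expose an OpenAI-compatible HTTP API, so one client works across them. A minimal sketch of building a request body for such an endpoint, assuming a local server and a model id of `mistral-7b` (both are illustrative, not from any specific project above):

```python
import json

# Assumed base URL for a locally running OpenAI-compatible server (illustrative).
BASE_URL = "http://localhost:8000/v1"

def chat_completion_payload(model, messages, **params):
    """Build the JSON body an OpenAI-compatible /chat/completions route expects."""
    body = {"model": model, "messages": messages}
    body.update(params)  # optional sampling parameters, e.g. temperature
    return json.dumps(body)

payload = chat_completion_payload(
    "mistral-7b",  # assumed model id; depends on what the server has loaded
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
print(payload)
```

POSTing this body to `BASE_URL + "/chat/completions"` with any HTTP client is all that differs between backends; the payload shape stays the same.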