Run any open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints.
Reference implementation of Mistral AI 7B v0.1 model.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
This project aims to share the technical principles behind large language models, along with hands-on practical experience.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
Sparsity-aware deep learning inference runtime for CPUs
Code examples and resources for DBRX, a large language model developed by Databricks.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms.
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac).
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads.
🦖 Stateful Serverless Framework for building Geo-distributed Edge AI Infra
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
RayLLM - LLMs on Ray
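Several of the servers listed above expose an OpenAI-compatible HTTP API, so one client works across them. A minimal sketch of building a request body for such an endpoint, assuming a local server and a model id of `mistral-7b` (both are illustrative, not from any specific project above):

```python
import json

# Assumed base URL for a locally running OpenAI-compatible server (illustrative).
BASE_URL = "http://localhost:8000/v1"

def chat_completion_payload(model, messages, **params):
    """Build the JSON body an OpenAI-compatible /chat/completions route expects."""
    body = {"model": model, "messages": messages}
    body.update(params)  # optional sampling parameters, e.g. temperature
    return json.dumps(body)

payload = chat_completion_payload(
    "mistral-7b",  # assumed model id; depends on what the server has loaded
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
print(payload)
```

POSTing this body to `BASE_URL + "/chat/completions"` with any HTTP client is all that differs between backends; the payload shape stays the same.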