Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
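The "single line of code" is the API base URL: Xinference serves an OpenAI-compatible API, so an app pointed at OpenAI can be redirected to a locally hosted model. A minimal sketch, assuming a local Xinference server on its default port 9997 and a model launched under the hypothetical name "my-llama" (adjust both for your setup); only the request construction is shown, no network call is made.

```python
import json

OPENAI_BASE_URL = "https://api.openai.com/v1"
XINFERENCE_BASE_URL = "http://localhost:9997/v1"  # the single line to change

def chat_request(base_url: str, model: str, user_message: str) -> tuple[str, bytes]:
    """Build the endpoint URL and JSON body for an OpenAI-style chat completion."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode()
    return url, body

url, body = chat_request(XINFERENCE_BASE_URL, "my-llama", "Hello!")
print(url)  # http://localhost:9997/v1/chat/completions
```

Because the request shape is unchanged, any OpenAI SDK or HTTP client works the same way once its base URL points at the Xinference endpoint.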
These are the changes in inference v0.11.0.
v0.11.0 introduced a breaking change: `model_engine` must now be specified when launching a model. Refer to Model Engine for more information.
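The breaking change above can be illustrated with a small sketch of how a launch request might now be assembled. The endpoint and field names here are assumptions for illustration, not the confirmed Xinference REST schema; only the `model_engine` requirement itself comes from the release notes.

```python
import json

def build_launch_payload(model_name: str, model_engine: str, **kwargs) -> str:
    """Serialize model-launch parameters, enforcing the v0.11.0 requirement
    that an inference engine (e.g. a hypothetical "vllm") is named explicitly."""
    if not model_engine:
        raise ValueError("model_engine is required as of v0.11.0")
    return json.dumps(
        {"model_name": model_name, "model_engine": model_engine, **kwargs}
    )

# Before v0.11.0 the engine was inferred; now it must be passed explicitly.
payload = build_launch_payload("qwen-chat", "vllm", model_size_in_billions=7)
```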
- `model_engine` for a clearer inference backend by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1466
- `model_engine` parameter for the launch process by @hainaweiben in https://github.com/xorbitsai/inference/pull/1367
- `__init__` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1400
- `auto-gptq` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1457
- `huggingface-hub` to pass CI, since it has some breaking changes by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1427
- `xinference-worker` by @amumu96 in https://github.com/xorbitsai/inference/pull/1397
- `model_engine` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1468
- `/v1/chat/completions` by @amumu96 in https://github.com/xorbitsai/inference/pull/1406
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.10.3...v0.11.0
These are the changes in inference v0.10.3.
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.10.2.post1...v0.10.3
These are the changes in inference v0.10.2.post1.
- `xinference-client` package depends on internal code by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1330
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.10.2...v0.10.2.post1
These are the changes in inference v0.10.2.
- `embedding` and `rerank` models by @yiboyasss in https://github.com/xorbitsai/inference/pull/1306
- `FlagEmbedding` in the CPU docker image by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1318
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.10.1...v0.10.2
These are the changes in inference v0.10.1.
- `cv2` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1217
- `opencv` issue in docker container by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1227
- `llama-cpp-python` `v0.2.58` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1242
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.10.0...v0.10.1
These are the changes in inference v0.10.0.
- `OmniLMM` chat model by @hainaweiben in https://github.com/xorbitsai/inference/pull/1171
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.9.4...v0.10.0
These are the changes in inference v0.9.4.
- `sglang` backend by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1161
- `best_of` from benchmark by @qinxuye in https://github.com/xorbitsai/inference/pull/1150
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.9.3...v0.9.4
These are the changes in inference v0.9.3.
- `llama-cpp-python` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1134
- `xinference registrations` and `xinference list` commands by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1140
- `ctrl+c` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1144
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.9.2...v0.9.3
These are the changes in inference v0.9.2.
- `n_gpu_layers` parameter for `llama-cpp-python` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1070
- `replica` on the running model page by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1093
- `CPU` when selecting `n_gpu` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1096
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.9.1...v0.9.2
These are the changes in inference v0.9.1.
- `quantization` when registering an LLM by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1040
- `xinference launch` command line by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1048
- `modelscope` by @ChengjieLi28 in https://github.com/xorbitsai/inference/pull/1066
- `max_token` defaulting to `16` instead of `1024` by @ZhangTianrong in https://github.com/xorbitsai/inference/pull/1061
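The `max_token` fix in the list above concerns the OpenAI-style default of 16 tokens silently truncating generations when the caller leaves the field unset. A minimal sketch of applying an explicit default instead, assuming the OpenAI field name `max_tokens`; the value 1024 matches the intended default named in the changelog entry.

```python
def with_max_tokens(request: dict, default: int = 1024) -> dict:
    """Return a copy of a completion request with max_tokens defaulted."""
    out = dict(request)
    out.setdefault("max_tokens", default)  # only applied if the caller omitted it
    return out

req = with_max_tokens({"model": "my-llama", "messages": []})
print(req["max_tokens"])  # 1024, instead of the 16-token OpenAI default
```

An explicitly supplied value is left untouched, so callers who really want short completions are unaffected.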
Full Changelog: https://github.com/xorbitsai/inference/compare/v0.9.0...v0.9.1