Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
--adapter-memory-fraction by @tgaddair in https://github.com/predibase/lorax/pull/306
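A minimal launch sketch showing where a flag like this would be passed. This is a hedged illustration, not the documented invocation: the image tag, port mapping, and model id below are placeholders, and the flag value is an example fraction.

```shell
# Hedged sketch: start a LoRAX server, reserving a fraction of GPU memory
# for dynamically loaded adapter weights via --adapter-memory-fraction.
# Image name, port, and --model-id value are illustrative placeholders.
docker run --gpus all -p 8080:80 \
  ghcr.io/predibase/lorax:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.1 \
  --adapter-memory-fraction 0.1
```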
max_total_tokens during warmup by @tgaddair in https://github.com/predibase/lorax/pull/286
Full Changelog: https://github.com/predibase/lorax/compare/v0.8.1...v0.9.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.8.0...v0.8.1
Full Changelog: https://github.com/predibase/lorax/compare/v0.7.0...v0.8.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.6.0...v0.7.0
prompt_tokens to the response by @tgaddair in https://github.com/predibase/lorax/pull/165
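A small sketch of what a client might do with this field. The response shape below is an assumption for illustration (a generate-style body with a details object containing prompt_tokens); only the prompt_tokens field name comes from the changelog entry above.

```python
import json

# Hypothetical serialized response from a generate endpoint; the overall
# shape and the other field names here are illustrative assumptions.
raw = json.dumps({
    "generated_text": "LoRAX serves many adapters at once.",
    "details": {
        "prompt_tokens": 7,
        "generated_tokens": 9,
        "finish_reason": "eos_token",
    },
})

def count_prompt_tokens(body: str) -> int:
    """Read prompt_tokens from a serialized response, defaulting to 0."""
    details = json.loads(body).get("details", {})
    return details.get("prompt_tokens", 0)

print(count_prompt_tokens(raw))  # -> 7
```

Exposing the prompt token count in the response lets clients meter usage per request without re-tokenizing the prompt themselves.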
Full Changelog: https://github.com/predibase/lorax/compare/v0.5.0...v0.6.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.4.1...v0.5.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.4.0...v0.4.1
Full Changelog: https://github.com/predibase/lorax/compare/v0.3.0...v0.4.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.2.1...v0.3.0
LoRAX is the open-source framework for serving hundreds of fine-tuned LLMs in production for the price of one.