Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
--adapter-memory-fraction by @tgaddair in https://github.com/predibase/lorax/pull/306
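A minimal launch sketch showing where a flag like this would be passed. This is a hedged illustration, not the documented invocation: the image tag, port mapping, and model id below are placeholders, and the flag value is an example fraction.

```shell
# Hedged sketch: start a LoRAX server, reserving a fraction of GPU memory
# for dynamically loaded adapter weights via --adapter-memory-fraction.
# Image name, port, and --model-id value are illustrative placeholders.
docker run --gpus all -p 8080:80 \
  ghcr.io/predibase/lorax:latest \
  --model-id mistralai/Mistral-7B-Instruct-v0.1 \
  --adapter-memory-fraction 0.1
```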
max_total_tokens during warmup by @tgaddair in https://github.com/predibase/lorax/pull/286
Full Changelog: https://github.com/predibase/lorax/compare/v0.8.1...v0.9.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.8.0...v0.8.1
Full Changelog: https://github.com/predibase/lorax/compare/v0.7.0...v0.8.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.6.0...v0.7.0
prompt_tokens to the response by @tgaddair in https://github.com/predibase/lorax/pull/165
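A small sketch of what a client might do with this field. The response shape below is an assumption for illustration (a generate-style body with a details object containing prompt_tokens); only the prompt_tokens field name comes from the changelog entry above.

```python
import json

# Hypothetical serialized response from a generate endpoint; the overall
# shape and the other field names here are illustrative assumptions.
raw = json.dumps({
    "generated_text": "LoRAX serves many adapters at once.",
    "details": {
        "prompt_tokens": 7,
        "generated_tokens": 9,
        "finish_reason": "eos_token",
    },
})

def count_prompt_tokens(body: str) -> int:
    """Read prompt_tokens from a serialized response, defaulting to 0."""
    details = json.loads(body).get("details", {})
    return details.get("prompt_tokens", 0)

print(count_prompt_tokens(raw))  # -> 7
```

Exposing the prompt token count in the response lets clients meter usage per request without re-tokenizing the prompt themselves.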
Full Changelog: https://github.com/predibase/lorax/compare/v0.5.0...v0.6.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.4.1...v0.5.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.4.0...v0.4.1
Full Changelog: https://github.com/predibase/lorax/compare/v0.3.0...v0.4.0
Full Changelog: https://github.com/predibase/lorax/compare/v0.2.1...v0.3.0
LoRAX is the open-source framework for serving hundreds of fine-tuned LLMs in production for the price of one.