Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.34...0.0.35
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.33...0.0.34
EngineArray Multi-Model [1/3] by @michaelfeil in https://github.com/michaelfeil/infinity/pull/200
BatchHandler into ModelWorker by @michaelfeil in https://github.com/michaelfeil/infinity/pull/202
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.32...0.0.33
You can now serve a model under an alias. This makes it easier to refer to the model when communicating with the API.
infinity_emb --served-model-name "your_nickname"
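As a rough illustration of how a client might refer to the alias, the sketch below builds a request whose `model` field carries the served name. It assumes the server exposes an OpenAI-style `/embeddings` route that accepts a JSON body with `model` and `input`; the base URL `http://localhost:7997` and the helper name `build_embedding_request` are assumptions made for this example, so check the server's /docs page for the routes your version actually provides.

```python
import json
import urllib.request


def build_embedding_request(base_url, model_alias, sentences):
    """Build (but do not send) a POST request for an OpenAI-style embeddings route.

    Hypothetical helper: the route name and payload shape are assumptions.
    """
    payload = {"model": model_alias, "input": list(sentences)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The base URL is an assumption; adjust host, port, and any URL prefix.
    req = build_embedding_request(
        "http://localhost:7997", "your_nickname", ["hello world"]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send the request, pass `req` to `urllib.request.urlopen` once the server is running.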
You can now preload models. This acts as a "download and load into RAM" test. Upon execution, all files are cached, which speeds up subsequent loads. For additional speedups, use --no-model-warmup to skip model warmup after loading.
infinity_emb --preload-only --model-name-or-path BAAI/bge-large-en-v1.5
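The preload step could also be scripted, for example as a smoke test before a deployment. The sketch below only assembles the command from the flags mentioned in these notes (with the model flag spelled `--model-name-or-path`); the helper names `build_preload_command` and `preload_succeeds` are hypothetical, and treating a zero exit status as success is an assumption about the CLI rather than documented behavior.

```python
import subprocess


def build_preload_command(model, skip_warmup=False):
    """Return the argv list for a preload-only run of the CLI."""
    cmd = ["infinity_emb", "--preload-only", "--model-name-or-path", model]
    if skip_warmup:
        # Skips the warmup pass after loading, as described in the notes.
        cmd.append("--no-model-warmup")
    return cmd


def preload_succeeds(model, skip_warmup=False):
    """Run the preload and report whether the process exited with status 0.

    Assumption: a failed download or load yields a non-zero exit status.
    Requires infinity_emb to be installed and on PATH.
    """
    result = subprocess.run(build_preload_command(model, skip_warmup), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command instead of running it, so this works without the CLI.
    print(" ".join(build_preload_command("BAAI/bge-large-en-v1.5", skip_warmup=True)))
```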
PRs
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.31...0.0.32
ENUM..TypeHint into a function by @michaelfeil in https://github.com/michaelfeil/infinity/pull/172
/docs and optional imports by @michaelfeil in https://github.com/michaelfeil/infinity/pull/175
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.30...0.0.31
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.29...0.0.30
This will be the last release with fastembed. fastembed and optimum provide similar capabilities, so please use optimum going forward.
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.28...0.0.29
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.27...0.0.28
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.26...0.0.27
Full Changelog: https://github.com/michaelfeil/infinity/compare/0.0.25...0.0.26