The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a4...v1.2.0a5
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a3...v1.2.0a4
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a2...v1.2.0a3
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a1...v1.2.0a2
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a0...v1.2.0a1
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.11...v1.2.0a0
SSE utils:

```python
import bentoml
from bentoml.io import SSE, Text, JSON

class MyRunnable(bentoml.Runnable):
    @bentoml.Runnable.method()
    def streaming(self, text):
        # Each yielded string is one SSE frame: a "data:" line ending in a blank line.
        yield "data: 1\n\n"
        yield "data: 12222222222222222222222222222\n\n"

runner = bentoml.Runner(MyRunnable)
svc = bentoml.Service("service", runners=[runner])

# Note: the handler must be async to consume the runner's async stream;
# IO descriptors are added here so the example runs as-is.
@svc.api(input=Text(), output=JSON())
async def infer(text: str) -> int:
    result = 0
    async for it in runner.streaming.async_stream(text):
        payload = SSE.from_iterator(it)
        result += int(payload.data)
    return result
```
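For context on the frames yielded above: Server-Sent Events is a line-based text protocol in which each event is a `data:` line terminated by a blank line. Below is a minimal, framework-free parser sketch to illustrate the framing; `parse_sse` is a hypothetical helper for illustration, not a BentoML API.

```python
def parse_sse(stream):
    """Yield the data payload of each SSE frame from an iterable of text chunks.

    Hypothetical helper for illustration; BentoML's SSE util plays a similar role.
    """
    buffer = ""
    for chunk in stream:
        buffer += chunk
        # A blank line ("\n\n") terminates one event frame.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            for line in frame.splitlines():
                if line.startswith("data:"):
                    yield line[len("data:"):].strip()

# Applied to the frames yielded by the streaming method above:
frames = ["data: 1\n\n", "data: 12222222222222222222222222222\n\n"]
assert list(parse_sse(frames)) == ["1", "12222222222222222222222222222"]
```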
Updated README.md by @shenxiangzhuang in https://github.com/bentoml/BentoML/pull/4301
Fixed `trust_remote_code` handling and added unit tests by @MingLiangDai in https://github.com/bentoml/BentoML/pull/4271
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.10...v1.1.11
Released a patch that sets the upper bound cattrs<23.2, since cattrs 23.2 breaks our whole serialisation process both upstream and downstream.
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.9...v1.1.10
`bentoml.transformers.import_model`: this API imports pretrained Transformers models directly from HuggingFace into the BentoML model store without loading the model into memory. It takes the model name in the BentoML store as the first argument and the `model_id` on the HuggingFace Hub as the second argument.

```python
import bentoml

bentomodel = bentoml.transformers.import_model("zephyr-7b-beta", "HuggingFaceH4/zephyr-7b-beta")
```
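As a follow-up sketch (not from the release notes): once imported, the model can be retrieved from the local store by tag and loaded into memory only when actually needed, using the standard BentoML model-store APIs.

```python
import bentoml

# Look up the imported model in the local BentoML store; this reads
# metadata only and does not load any weights.
bentomodel = bentoml.models.get("zephyr-7b-beta:latest")
print(bentomodel.tag, bentomodel.path)

# The weights are loaded into memory only at this point.
model = bentoml.transformers.load_model(bentomodel)
```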
`nvidia-ml-py`: BentoML now uses the official `nvidia-ml-py` package instead of `pynvml` to avoid conflicts with other packages.

`bentoml_configuration.yaml`: values in the form of `${ENV_VAR}` will be expanded at runtime to the value of the corresponding environment variable. Note that this only supports string types.
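A minimal sketch of what this looks like in `bentoml_configuration.yaml`, assuming a `BENTOML_HOST` environment variable is set; the key shown is just an illustrative string-typed option.

```yaml
# ${BENTOML_HOST} is expanded at runtime to the value of the BENTOML_HOST
# environment variable; only string-typed values support this expansion.
api_server:
  host: ${BENTOML_HOST}
```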
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.7...v1.1.9
Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.7...v1.1.8