BentoML Releases

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

v1.2.0a5

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a4...v1.2.0a5

v1.2.0a4

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a3...v1.2.0a4

v1.2.0a3

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a2...v1.2.0a3

v1.2.0a2

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a1...v1.2.0a2

v1.2.0a1

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.2.0a0...v1.2.0a1

v1.2.0a0

4 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.11...v1.2.0a0

v1.1.11

5 months ago

Bug fixes

  • Fix streaming of long payloads on remote runners: streamed responses now always yield complete text messages and follow the SSE (Server-Sent Events) protocol. SSE utilities are also provided:
import bentoml
from bentoml.io import SSE, Text

class MyRunnable(bentoml.Runnable):
    @bentoml.Runnable.method()
    def streaming(self, text):
        # each yielded chunk is a complete SSE message: "data: <payload>\n\n"
        yield "data: 1\n\n"
        yield "data: 12222222222222222222222222222\n\n"

runner = bentoml.Runner(MyRunnable)

svc = bentoml.Service("service", runners=[runner])

@svc.api(input=Text(), output=Text())
async def infer(text: str) -> str:
    result = 0
    # SSE.from_iterator parses the raw text stream into SSE payload objects
    async for payload in SSE.from_iterator(runner.streaming.async_stream(text)):
        result += int(payload.data)
    return str(result)
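
The fix guarantees that each yielded chunk is a complete SSE message rather than an arbitrary slice of the stream. For context, here is a minimal, BentoML-independent sketch of how such messages are framed and parsed; all names in it are illustrative only:

# A minimal sketch of SSE framing: messages are separated by a blank
# line ("\n\n"), and each "data:" field carries one payload chunk.
def iter_sse_data(stream: str):
    for frame in stream.split("\n\n"):
        for line in frame.splitlines():
            if line.startswith("data:"):
                yield line[len("data:"):].strip()

# mirrors the two messages yielded by the runnable above
chunks = list(iter_sse_data("data: 1\n\ndata: 12222222222222222222222222222\n\n"))
assert chunks == ["1", "12222222222222222222222222222"]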

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.10...v1.1.11

v1.1.10

6 months ago

Released a patch that sets an upper bound on the cattrs dependency (cattrs<23.2), because cattrs 23.2 breaks our entire serialization process both upstream and downstream.

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.9...v1.1.10

v1.1.9

6 months ago

  • Import Hugging Face Transformers Model: the bentoml.transformers.import_model API imports pretrained Transformers models directly from the Hugging Face Hub. It registers a model in the BentoML model store without loading the model into memory. The API takes the model name in the BentoML store as its first argument and the model_id on the Hugging Face Hub as its second argument.
import bentoml

bentomodel = bentoml.transformers.import_model("zephyr-7b-beta", "HuggingFaceH4/zephyr-7b-beta")
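
As a follow-up to the example above, the imported model can be looked up in the local model store without loading its weights. A minimal sketch, reusing the zephyr-7b-beta name from the import:

import bentoml

# fetch the stored model reference; this does not load the weights into memory
bentomodel = bentoml.models.get("zephyr-7b-beta:latest")
print(bentomodel.tag, bentomodel.path)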
  • Standardize with nvidia-ml-py: BentoML now uses the official nvidia-ml-py package instead of pynvml to avoid conflicts with other packages.
  • Define Environment Variable in Configuration: within bentoml_configuration.yaml, values of the form ${ENV_VAR} are expanded at runtime to the value of the corresponding environment variable. Note that only string values are supported; see the sketch after this list.
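
For illustration, a minimal bentoml_configuration.yaml sketch using this expansion; api_server.ssl.certfile is a string-typed BentoML configuration field, and SSL_CERT_PATH is an assumed environment variable:

# bentoml_configuration.yaml
api_server:
  ssl:
    certfile: ${SSL_CERT_PATH}  # replaced at runtime with the value of $SSL_CERT_PATH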

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.7...v1.1.9

v1.1.8

6 months ago

Full Changelog: https://github.com/bentoml/BentoML/compare/v1.1.7...v1.1.8