A scalable inference server for models optimized with OpenVINO™
The 2024.1 release brings a few improvements in the serving functionality, demo enhancements, and bug fixes.
Updated OpenVINO Runtime backend to 2024.1 Link
Added support for OpenVINO models with string data type on output. Together with the features introduced in 2024.0, OVMS can now support models with string inputs and outputs. That way you can take advantage of the tokenization built into the model as the first layer and rely on any post-processing embedded in the model which returns plain text. Check the universal sentence encoder demo and the image classification with string output demo
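For illustration, below is a minimal request sketch against the KServe REST API using the Python requests package; the address (localhost:8000), model name (my_model) and input tensor name (text) are placeholders, not values from this release.

# Hedged sketch: send a text string over the KServe REST API and read a string result back.
# Address, model name and tensor names are placeholders.
import requests

payload = {
    "inputs": [
        {"name": "text", "shape": [1], "datatype": "BYTES",
         "data": ["What is OpenVINO Model Server?"]}
    ]
}
resp = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
resp.raise_for_status()
# With a string output, the result arrives as BYTES data in the JSON response.
print(resp.json()["outputs"][0]["data"])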
Updated MediaPipe Python calculators to support relative paths for all related configuration and Python code files. Now the complete graph configuration folder can be deployed in an arbitrary path without any code changes. It is demonstrated in the updated text generation demo.
Extended KServe REST API support for MediaPipe graph endpoints. Now you can send the data in a KServe JSON body. Check how it is used in the text generation use case.
Added a demo showcasing a full RAG algorithm delegated entirely to the model server Link
Added a Red Hat UBI-based Dockerfile for the Python demos; usage is documented in the python demos
No breaking changes.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2024.1
- CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.1-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
The 2024.0 release includes a new version of the OpenVINO™ backend and several improvements in the serving functionality.
Support for the string input data type, including the tokenization extension. Link to demo
Batch Size AUTO and Shape AUTO are deprecated and will be removed. Use the Dynamic Model Shape feature instead.
No breaking changes.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2024.0
- CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.0-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
The 2023.3 is a major release that adds new features and numerous improvements.
Included a set of new demos using custom nodes implemented as Python code. They include LLM text generation, stable diffusion, and seq2seq translation.
Improvements in the demo highlighting video stream analysis. A simple client example can now process the video stream from a local camera, video file or RTSP stream. The data can be sent to the model server via unary gRPC calls or gRPC streaming.
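As a rough illustration of the streaming mode, here is a hedged sketch using the Triton gRPC client's streaming API; the address, graph name (my_graph), input/output names and input shape are placeholders, and the actual demo clients in the repository remain the reference.

# Hedged sketch of gRPC streaming inference with the Triton client library; all names are placeholders.
import numpy as np
import tritonclient.grpc as grpcclient

def on_result(result, error):
    # Called for every response arriving on the stream.
    if error is None:
        print(result.as_numpy("output"))

client = grpcclient.InferenceServerClient("localhost:9000")
client.start_stream(callback=on_result)
for _ in range(3):  # in the real demo, frames come from a camera, video file or RTSP stream
    frame = np.zeros((1, 3, 224, 224), dtype=np.float32)
    inp = grpcclient.InferInput("input", list(frame.shape), "FP32")
    inp.set_data_from_numpy(frame)
    client.async_stream_infer(model_name="my_graph", inputs=[inp])
client.stop_stream()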
Changes in the public release artifacts – the base image of the public model server images is now updated to Ubuntu 22.04 and RHEL 8.8. Public Docker images include support for Python custom nodes, but without custom Python dependencies. The public binary distribution of the model server also targets Ubuntu 22.04 and RHEL 8.8, but without Python support (it can be deployed on bare-metal hosts without Python installed). Check the building from source guide.
Improvements in the documentation https://docs.openvino.ai/2023.3/ovms_what_is_openvino_model_server.html
gRPC streaming support is out of preview and considered stable.
No breaking changes.
Batch Size AUTO and Shape AUTO are deprecated and will be removed. Use the Dynamic Model Shape feature instead.
OVMS now handles boolean parameters in the plugin config https://github.com/openvinotoolkit/model_server/pull/2197
Sporadic failures in the IrisTracking demo using gRPC stream are fixed https://github.com/openvinotoolkit/model_server/pull/2161
Fixed handling of the incorrect MediaPipe graphs producing multiple outputs with the same name https://github.com/openvinotoolkit/model_server/pull/2161
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2023.3
- CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2023.3-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
The 2023.2 is a major release with several new features and improvements.
No breaking changes.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2023.2
- CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.2-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
OpenVINO™ Model Server 2023.1
The 2023.1 is a major release with numerous improvements and changes.
GetModelMetadata implementation for MediaPipe graphs – calls to model metadata return information about the expected input and output names from the graph, with limitations on shape and datatype.
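For example, a minimal metadata query sketch with the ovmsclient library; the address and graph name (my_graph) are placeholders.

# Minimal sketch: read the metadata of a served MediaPipe graph; address and name are placeholders.
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")
metadata = client.get_model_metadata(model_name="my_graph")
# The response lists the expected input and output names; shape and datatype are limited for graphs.
print(metadata)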
Support for ov::Tensor, mediapipe::Image, and KServe ModelInfer Request/Response data formats on MediaPipe graph inputs and outputs – those capabilities simplify adoption of existing graphs which might expect the input and output data in many different formats. Now the data submitted to the KServe endpoint can be automatically deserialized to the expected type. The deserialization function is determined based on the naming convention in the graph input and output tags in the graph config. Check more details.
OpenVINOInferenceCalculator support for a range of input formats from ov::Tensor to tensorflow::Tensor and TfLite::Tensor – the OpenVINOInferenceCalculator has been created as a replacement for TensorFlow calculators. It can accept input data and return data in a range of possible formats. This simplifies swapping the inference-related nodes in existing graphs without changing the rest of the graph. Learn more about the calculators
Added the OVMS_ApiVersion call.
Changed the C-API dimension type from uint64_t to int64_t and dimCount from uint32_t to size_t; this is a breaking change.
Added support for scalars (shape []) and for dimensions with 0 size, like [0,234].
Support for models with the .tflite extension.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2023.1
- CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.1-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
The 2023.0 is a major release with numerous improvements and changes.
The ovmsclient library can be used to send string data to the model server. Check the code snippets. ovmsclient was created to avoid requiring the tensorflow package installation, allowing a smaller Python environment; the tensorflow package no longer conflicts and is fully optional.
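A minimal sketch of that usage, assuming a model with a string input named text served under the name my_model on localhost:9000 (all placeholders):

# Minimal sketch: send string data with ovmsclient; model name, input name and address are placeholders.
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")
result = client.predict(inputs={"text": ["a short sample sentence"]}, model_name="my_model")
print(result)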
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2023.0
- CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.0-gpu
- GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.
The 2022.3.0.1 version is a patch release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the C-API.
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2022.3.0.1
docker pull openvino/model_server:2022.3.0.1-gpu
or use the provided binary packages.
The 2022.3 version is a major release. It includes several new features, enhancements and bug fixes.
OpenVINO Model Server can now load TensorFlow models directly from the model repository. Converting to OpenVINO Intermediate Representation (IR) format with model optimizer is not required. This is a preview feature with several limitations. The model must be in a frozen graph format with .pb extension. Loaded models take advantage of all OpenVINO optimizations. Learn more about it and check this demo.
It is now possible to leverage the model management functionality in OpenVINO Model Server for local inference execution within an application. Just dynamically link the OVMS shared library to take advantage of its new C API and use internal model server functions in C/C++ applications. To learn more see the documentation and check this demo.
The KServe gRPC API implemented in OpenVINO Model Server has been extended to support both input and output in the form of tensor data and raw data. The output format is consistent with the input format. This extension enables using the Triton client library with OpenVINO Model Server to send inference requests. The input data can be prepared as vectors or encoded as JPEG/PNG and sent as bytes. Learn more about the current API and check the Python and C++ samples.
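For reference, a hedged sketch of such a request with the Triton client library over gRPC; the address, model name, tensor names and shape are placeholders.

# Hedged sketch: KServe gRPC inference via the Triton client library; names and shape are placeholders.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
inp = grpcclient.InferInput("input", list(data.shape), "FP32")
inp.set_data_from_numpy(data)  # the client packs this as raw bytes in the request
response = client.infer(model_name="my_model", inputs=[inp])
print(response.as_numpy("output").shape)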
The KServe REST API now has additional functionality that improves compatibility with the Triton Inference Server extension. It is now possible to send raw data in an HTTP request outside of the JSON content. Concatenated bytes can be interpreted by the model server depending on the header content. This makes it easy and quick to serialize data from numpy arrays/vectors and to send JPEG/PNG encoded images.
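As a rough illustration, a hedged sketch of sending a JPEG-encoded image as binary data with the Triton HTTP client; the address, model name, input/output names and the image path are placeholders.

# Hedged sketch: send a JPEG file as binary BYTES input over the KServe REST API; names are placeholders.
import numpy as np
import tritonclient.http as httpclient

with open("image.jpeg", "rb") as f:
    jpeg_bytes = f.read()

client = httpclient.InferenceServerClient("localhost:8000")
inp = httpclient.InferInput("input", [1], "BYTES")
inp.set_data_from_numpy(np.array([jpeg_bytes], dtype=np.object_), binary_data=True)
response = client.infer(model_name="my_model", inputs=[inp])
print(response.as_numpy("output"))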
OpenVINO Model Server has now official support for Intel® Data Center GPU Flex and Intel® Arc GPU cards. Learn more about using discrete GPU devices.
New client code samples to demonstrate KServe API usage. These samples illustrate typical data formats and scenarios. Check out the samples.
Python client code samples have been extended to include new API features for both the gRPC and REST interfaces.
OpenVINO Model Server can now be used also with NVIDIA GPU cards. Follow these steps to build the Model Server from sources, including the NVIDIA plugin from the openvino_contrib repo. Learn more about using the NVIDIA plugin
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2022.3
docker pull openvino/model_server:2022.3-gpu
or use the provided binary packages.
The 2022.2 version is a major release with the new OpenVINO backend API (Application Programming Interface).
Besides the TensorFlow Serving API, it is now possible to call the OpenVINO Model Server using the KServe API. The following gRPC methods are implemented: ModelInfer, ModelMetadata, ModelReady, ServerLive, ServerReady, and ServerMetadata. Inference execution supports input both in the raw_input_contents format and as InferTensorContents.
The same clients can be used to connect to the OpenVINO Model Server as to other KServe-compatible model servers. Check the samples using the Triton client library in Python.
Next to the TensorFlow Serving REST API, we have also implemented the KServe REST API. The following endpoints are functional (a readiness-check sketch follows the endpoint list):
v2
v2/health/live
v2/health/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/infer
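A minimal readiness-check sketch against these endpoints, assuming the REST interface listens on localhost:8000 and a model named my_model is served (both placeholders):

# Minimal sketch: probe server and model readiness over the KServe REST API; names are placeholders.
import requests

base = "http://localhost:8000"
print(requests.get(f"{base}/v2/health/live").status_code)            # 200 when the server is live
print(requests.get(f"{base}/v2/health/ready").status_code)           # 200 when the server is ready
print(requests.get(f"{base}/v2/models/my_model/ready").status_code)  # 200 when the model is ready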
Besides the standard tensor_data input format, the binary extension compatible with the Triton Inference Server is also implemented. That way the data can be sent as arrays in JSON or as JPEG/PNG encoded content.
Check how to connect to KServe in the samples using the Triton client library in Python.
OpenVINO Model Server can now expose metrics compatible with Prometheus format. Metrics can be enabled in the server configuration file or using a command line parameter. The following metrics are now available:
ovms_streams
ovms_current_requests
ovms_requests_success
ovms_requests_fail
ovms_request_time_us
ovms_inference_time_us
ovms_wait_for_infer_req_time_us
ovms_infer_req_queue_size
ovms_infer_req_active
Metrics can be integrated with Grafana reports or with a horizontal autoscaler.
Learn more about using metrics.
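A minimal scrape sketch, assuming metrics were enabled and the REST interface listens on localhost:8000 (the port is a placeholder):

# Minimal sketch: fetch Prometheus-format metrics from the model server REST port and filter one family.
import requests

metrics = requests.get("http://localhost:8000/metrics").text
for line in metrics.splitlines():
    if line.startswith("ovms_requests_success"):
        print(line)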
OpenVINO Model Server now includes a PaddlePaddle model importer. It is possible to deploy models trained in the PaddlePaddle framework directly into the model repository. Check the demo showing how to deploy and use the segmentation model ocrnet-hrnet-w48-paddle in PaddlePaddle format.
In several scenarios, the pipeline execution was improved to reduce data copy operations. This will be perceived as reduced latency and increased overall throughput.
Easier deployment of pipelines based on the exemplary custom nodes. Previously it was required to compile the custom node and mount it into the container during deployment. Now those libraries are added to the public Docker image. Demos including custom nodes now offer an option to use the precompiled version in the image or to build them from source. Check the demo of the horizontal text detection pipeline.
With this version, the model server initiates the gRPC and REST endpoints (if enabled) before the models are loaded. Before this change, an active network interface was acting as the readiness indicator. Now the server readiness and the models readiness can be checked using the dedicated endpoints according to the KServe API:
v2/health/ready
v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready
This makes it easier to monitor the state of models during the initialization phase.
This impacts custom node compatibility. Any custom nodes using OpenCV for custom image transformations might need to be recompiled. Check the recommended process for building the custom nodes in the Docker container in our examples.
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.2
or
docker pull openvino/model_server:2022.2-gpu
The 2022.1 version is a major release with the new OpenVINO backend API (Application Programming Interface). It includes several new features and a few breaking changes.
New features
Support for layout transposition with the --layout parameter, e.g. --layout NCHW:NHWC, which informs OVMS that the model natively accepts NHWC layout and that a preprocessing step with transposition from NCHW should be added to accept such inputs.
Breaking changes
The semantics of the --shape and --layout parameters changed. Previously, changing the layout of a model accepting NCHW input was expressed as --shape "(1,3,224,224)" --layout NHWC. Now both parameters should describe the target values, so with 2022.1 it should look like: --shape "(1,224,224,3)" --layout NHWC:NCHW.
Previously, when using --layout, the administrator was not required to know the underlying model layout because OpenVINO used NCHW by default. Now the parameter --layout NCHW informs OVMS that the model is using layout NCHW – the model both uses NCHW and accepts NCHW input.
Other changes:
Bug Fixes:
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.1
or
docker pull openvino/model_server:2022.1-gpu