Model Server Versions

A scalable inference server for models optimized with OpenVINO™

v2024.1

3 weeks ago

The 2024.1 release includes a few improvements in the serving functionality, demo enhancements, and bug fixes.

Changes and improvements

  • Updated OpenVINO Runtime backend to 2024.1. Link

  • Added support for OpenVINO models with string data type on output. Together with the features introduced in 2024.0, OVMS can now support models with both input and output of string type. That way you can take advantage of the tokenization built into the model as the first layer, and rely on any post-processing embedded in the model which returns just text. Check the universal sentence encoder demo and the image classification with string output demo. A minimal client sketch is shown after this list.

  • Updated MediaPipe python calculators to support relative paths for all related configuration and python code files. The complete graph configuration folder can now be deployed in an arbitrary path without any code changes. This is demonstrated in the updated text generation demo.

  • Extended support for the KServe REST API for MediaPipe graph endpoints. You can now send the data in a KServe JSON body. Check how it is used in the text generation use case and in the sketch after this list.

  • Added a demo showcasing a full RAG algorithm entirely delegated to the model server. Link

  • Added a RedHat UBI based Dockerfile for the python demos; usage is documented in the python demos.
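A minimal sketch of sending a string input in a KServe REST JSON body, for example to a MediaPipe graph endpoint or a model with string input/output. The port, servable name, and tensor names below are illustrative assumptions, not taken from the demos:

import requests

# Hypothetical servable name and input/output names; adjust to your deployment.
url = "http://localhost:8000/v2/models/my_string_model/infer"
payload = {
    "inputs": [
        {"name": "input", "shape": [1], "datatype": "BYTES", "data": ["What is OpenVINO?"]}
    ]
}
response = requests.post(url, json=payload)
# For servables with string output, the response carries the text in the output tensor data.
print(response.json()["outputs"][0]["data"])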

Breaking changes

No breaking changes.

Bug fixes

  • Improvements in error handling for invalid requests and incorrect configuration
  • Fixes in the demos and documentation

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2024.1 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.1-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2024.0

2 months ago

The 2024.0 release includes a new version of the OpenVINO™ backend and several improvements in the serving functionality.

Changes and improvements

  • Updated OpenVINO™ Runtime backend to 2024.0. Link
  • Extended the text generation demo to support multiple batch sizes with both streaming and unary clients. Link to demo
  • Added support for REST clients for servables based on MediaPipe graphs, including python pipeline nodes. Link to demo
  • Added additional MediaPipe calculators which can be reused in multiple image analysis scenarios. Link to new calculators
  • Added support for models with a string input data type, including a tokenization extension. Link to demo
  • Security-related updates in the versions of included dependencies.

Deprecation notices

Batch Size AUTO and Shape AUTO are deprecated and will be removed. Use the Dynamic Model Shape feature instead.

Breaking changes

No breaking changes.

Bug fixes

  • Improvements in error handling for invalid requests and incorrect configuration
  • Minor fixes in the demos and documentation

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2024.0 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.0-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2023.3

3 months ago

The 2023.3 is a major release with a new feature and numerous improvements.

Changes and improvements

  • Included a set of new demos using custom nodes implemented as python code. They include LLM text generation, stable diffusion, and seq2seq translation.

  • Improvements in the demo highlighting video stream analysis. A simple client example can now process a video stream from a local camera, a video file, or an RTSP stream. The data can be sent to the model server via unary gRPC calls or gRPC streaming.

  • Changes in the public release artifacts – the base image of the public model server images is now updated to Ubuntu 22.04 and RHEL 8.8. Public docker images include support for python custom nodes, but without custom python dependencies. The public binary distribution of the model server also targets Ubuntu 22.04 and RHEL 8.8, but without python support (it can be deployed on bare metal hosts without python installed). Check the building from source guide.

  • Improvements in the documentation https://docs.openvino.ai/2023.3/ovms_what_is_openvino_model_server.html

New Features (Preview)

  • Added support for serving MediaPipe graphs with custom nodes implemented as python code. This greatly simplifies exposing GenAI algorithms based on the Hugging Face and Optimum libraries. It can also be applied to arbitrary pre- and post-processing for AI solutions. Learn more about it

Stable Feature

gRPC streaming support is out of preview and considered stable.

Breaking changes

No breaking changes.

Deprecation notices

Batch Size AUTO and Shape AUTO are deprecated and will be removed. Use the Dynamic Model Shape feature instead.

Bug fixes

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2023.3 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2023.3-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2023.2

6 months ago

The 2023.2 is a major release with several new features and improvements.

Changes

  • Updated OpenVINO backend to version 2023.2.
  • MediaPipe framework has been updated to version 0.10.3.
  • Model API used in the OpenVINO Inference MediaPipe Calculator has been updated and included with all its features.

New Features

  • Introduced an extension of the KServe gRPC API with streaming on input and output. The extension is enabled for servables based on MediaPipe graphs. The MediaPipe graph is persistent in the scope of the user session, which improves processing performance and supports stateful graphs – for example, tracking algorithms. It also enables the use of source calculators. Check more details. A minimal client sketch follows this list.
  • Added a demo showcasing gRPC streaming with a MediaPipe graph. Check more details.
  • Added parameters for gRPC quota configuration and changed default gRPC channel arguments to add rate limits. This minimizes the risk of the service being impacted by an uncontrolled flow of requests. Check more details.
  • Updated python client requirements to match a wide range of python versions, from 3.7 to 3.11
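A rough sketch of a streaming client, assuming the stream endpoint is compatible with the Triton client library's gRPC streaming extension; the port, servable name, and tensor names are illustrative assumptions rather than values from the demo:

import numpy as np
import tritonclient.grpc as grpcclient

def callback(result, error):
    # Called for every response produced by the graph on the stream.
    if error is not None:
        print(error)
    else:
        print(result.as_numpy("output"))  # hypothetical output name

client = grpcclient.InferenceServerClient("localhost:9000")
client.start_stream(callback=callback)
for _ in range(10):
    frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for camera frames
    inputs = [grpcclient.InferInput("input", frame.shape, "FP32")]
    inputs[0].set_data_from_numpy(frame)
    client.async_stream_infer("my_graph", inputs)  # hypothetical graph name
client.stop_stream()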

Breaking changes

No breaking changes.

Bug fixes

  • Fixed handling of the situation when a MediaPipe graph is added with the same name as a previously loaded DAG.
  • Fixed the returned HTTP status code when a MediaPipe graph/DAG is not loaded yet (previously 404, now 503).
  • Corrected the error message returned via HTTP when using a method other than GET for the metadata endpoint - "Unsupported method".

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2023.2 - CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.2-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2023.1

8 months ago

OpenVINO™ Model Server 2023.1

The 2023.1 is a major release with numerous improvements and changes.

New Features

  • Improvements in Model Server with MediaPipe integration. In the previous version, the MediaPipe scheduler was included in OpenVINO Model Server as a preview. Now, the MediaPipe graph scheduler is added by default and officially supported. Check mediapipe in the model server documentation. This release includes the following improvements in running request calls to the graphs:
    • GetModelMetadata implementation for MediaPipe graphs – calls to model metadata return information about the expected input and output names from the graph, with limitations on shape and datatype
    • Support for data serialization and deserialization to a range of types: ov::Tensor, mediapipe::Image, KServe ModelInfer Request/Response – these capabilities simplify adoption of existing graphs which might expect the input and output data in many different formats. Data submitted to the KServe endpoint can now be automatically deserialized to the expected type. The deserialization function is determined based on the naming convention of the input and output tags in the graph config. Check more details.
    • OpenVINOInferenceCalculator support for a range of input formats from ov::Tensor to tensorflow::Tensor and TfLite::Tensor - the OpenVINOInferenceCalculator has been created as a replacement for the TensorFlow calculators. It can accept input data and return data in a range of possible formats. That simplifies swapping inference-related nodes in existing graphs without changing the rest of the graph. Learn more about the calculators
    • Added demos based on MediaPipe upstream graphs: holistic sensory analysis, object detection, iris detection
  • Improvements in C-API interface:
    • Added OVMS_ApiVersion call
    • Added support for C-API calls to DAG pipelines
    • Changed the data type in API calls for data shape from uint64_t to int64_t and dimCount from uint32_t to size_t; this is a breaking change
    • Added a call to servable (model, DAG) metadata and state
    • Added a call to get ServerMetadata
  • Improvements in error handling
  • Improvements in gRPC and REST status codes - the error statuses now include more meaningful and accurate information about the root cause
  • Support for models with scalars on input (empty shape) - the model server can be used with models whose input shape is represented by an empty list [] (a scalar).
  • Support for inputs with zero-size dimensions - the model server can now accept requests to dynamic shape models even with a 0-sized dimension, like [0,234]
  • Added support for TFLite models - OpenVINO Model Server can now directly serve models with the .tflite extension
  • Demo improvements:

Breaking changes

  • Changed the names of a few C-API functions. Check this commit

Bug fixes

  • Fixed the REST status code when an improper path is requested
  • The metrics endpoint now returns a correct response even with unsupported parameters

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2023.1 - CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.1-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2023.0

11 months ago

The 2023.0 is a major release with numerous improvements and changes.

New Features

  • Added an option to submit inference requests in the form of strings and to read the response also in the form of a string. This can currently be utilized via custom nodes and OpenVINO models with a CPU extension handling string data:
    • Using a custom node in a DAG pipeline which can perform string tokenization before passing it to the OpenVINO model - this is beneficial for models without a tokenization layer, to fully delegate that preprocessing to the model server.
    • Using a custom node in a DAG pipeline which can perform string detokenization of the model response to convert it to a string format - this can be beneficial for models without a detokenization layer, to fully delegate that postprocessing to the model server.
    • Both options above are demonstrated with a GPT model for the text generation demo.
    • For models with a tokenization layer, like universal-sentence-encoder, a CPU extension has been added which implements the sentencepiece tokenization layer. Users can pass a string to the model, which is automatically converted to the format needed by the CPU extension.
    • The option above is demonstrated in the universal-sentence-encoder model usage demo.
    • Added support for string input and output in ovmsclient - the ovmsclient library can be used to send string data to the model server. Check the code snippets and the sketch after this list.
  • Preview version of OVMS with the MediaPipe framework - it is possible to make calls to OpenVINO Model Server to perform MediaPipe graph processing. There are calculators performing OpenVINO inference via C-API calls from OpenVINO Model Server, and also calculators converting the ov::Tensor input format to the MediaPipe image format. That creates a foundation for creating arbitrary graphs. Check the model server integration with mediapipe documentation.
  • Extended C-API interface with ApiVersion and Metadata calls, C-API version is now 0.3.
  • Added support for saved_model format. Check how to create models repository. An example of such use case is in universal-sentence-encoder demo.
  • Added option to build the model server with NVIDIA plugin on UBI8 base image.
  • Virtual plugins AUTO, HETERO and MULTI are now supported with NVIDIA plugin.
  • In the DEBUG log_level, a message is included about the actual execution device for each inference request with the AUTO target_device. Learn more about the AUTO plugin.
  • Support for relative paths to the model files. The paths can now be relative to the config.json location. This simplifies deployments when the config.json is distributed together with the models repository.
  • Updated OpenCL drivers for the GPU device to version 23.13 (with Ubuntu22.04 base image).
  • Added an option to build OVMS on the base OS Ubuntu:22.04. This is an addition to the supported base OSes Ubuntu:20.04 and UBI8.7.
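As referenced in the string support item above, a minimal ovmsclient sketch for sending string data; the port, servable name, and input name are illustrative assumptions, not values from the demos:

from ovmsclient import make_grpc_client

# Hypothetical endpoint and servable; adjust to your deployment.
client = make_grpc_client("localhost:9000")
# A list of strings is converted by the client to the format expected by the server.
output = client.predict(inputs={"input": ["What is OpenVINO?"]}, model_name="usem")
print(output)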

Breaking changes

  • KServe API unification with the Triton implementation for handling string and encoded image formats: now every string or encoded image located in the binary extension (REST) or raw_input_contents (gRPC) needs to be preceded by 4 bytes (little endian) containing its size. The updated code snippets and samples. A short sketch of this framing follows this list.
  • Changed the default performance hint from THROUGHPUT to LATENCY. With the new default settings, the model server is adjusted for optimal execution and minimal latency at low concurrency. The default setting also minimizes memory consumption. For usage with high concurrency, it is recommended to adjust NUM_STREAMS or set the performance hint to THROUGHPUT explicitly. Read more in the performance tuning guide.
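A short sketch of the 4-byte little-endian length framing described above; how the framed bytes are placed inside a full request (raw_input_contents for gRPC or after the JSON part for REST) is not shown, and the input file name is a placeholder:

import struct

# Each string or encoded image must be preceded by its length as 4 bytes, little endian.
def frame_bytes(payload: bytes) -> bytes:
    return struct.pack("<I", len(payload)) + payload

with open("image.jpg", "rb") as f:  # hypothetical input file
    framed = frame_bytes(f.read())
# 'framed' can then be placed in raw_input_contents (gRPC) or in the REST binary extension.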

Bug fixes

  • The AUTO plugin starts serving models on CPU and switches to the GPU device after the model is compiled – this reduces the startup time for the model.
  • Fixed an image building error on MacOS and Ubuntu 22.
  • The ovmsclient python library is compatible with tensorflow in the same environment – ovmsclient was created to avoid the requirement of installing the tensorflow package, allowing a smaller python environment. Now the tensorflow package will not conflict with it, so it is fully optional.
  • Improved memory handling after unloading the models – the model server will now force releasing the memory after models are unloaded. Memory consumption reported by the model server process will be smaller in use cases where the models are frequently changed.

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2023.0 - CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.0-gpu - GPU and CPU device support with the image based on Ubuntu 22.04

or use the provided binary packages. The prebuilt image is also available on the Red Hat Ecosystem Catalog.

v2022.3.0.1

1 year ago

The 2022.3.0.1 version is a patch release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the C-API.

New Features

  • Added support for DAG pipelines to the inference execution method OVMS_Inference in the C API. The servableName parameter can be either a model name or a pipeline name
  • Added a debug log in the AUTO plugin execution to report which physical device is used - the AUTO plugin allocates the best available device for the model execution. For troubleshooting purposes, at the debug log level, the model server will report which device is used for each inference execution
  • Allowed enabling metrics collection via CLI parameters while using the configuration file. Metrics collection can be configured via CLI parameters or in the configuration file. Enabling the metrics in the CLI no longer blocks using the configuration file to define multiple models for serving.
  • Added client sample in Java to demonstrate KServe API usage.
  • Added client sample in Go to demonstrate KServe API usage.
  • Added client samples demonstrating asynchronous calls via KServe API.
  • Added a demo showcasing OVMS with GPT-J-6b model from Hugging Face.

Bug fixes

  • Fixed model server image building with NVIDIA plugin on a host with NVIDIA Container Toolkit installed.
  • Fixed the KServe API response to include the DAG pipeline name for calls to a DAG – based on the API definition, the response includes the servable name. In case of DAG processing, it now returns the pipeline name instead of an empty value.
  • The default number of gRPC and REST workers is now calculated correctly based on allocated CPU cores – when the model server is started in a docker container with constrained CPU allocation, the default number of frontend threads is set more efficiently.
  • Corrected reporting of the number of streams in the metrics when using non-CPU plugins – before this fix, a zero value was returned. That metric suggests the optimal number of active parallel inference calls for the best throughput performance.
  • Fixed handling model mapping with model reloads.
  • Fixed handling model mapping with dynamic shape/batch size.
  • ovmsclient is not causing conflicts with tensorflow-serving-api package installation in the same python environment.
  • Fixed debug image building.
  • Fixed C-API demo building.
  • Added security fixes.

Other changes:

  • Updated OpenCV version to 4.7 - opencv is an included dependency for image transformation in the custom nodes and for jpeg/png input decoding.
  • Lengthened the request waiting timeout during DAG reloads. On slower machines, the timeout was sporadically reached during DAG configuration reload, resulting in unsuccessful requests.
  • ovmsclient has more relaxed requirements related to numpy version.
  • Improved unit tests stability.
  • Improved documentation.

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2022.3.0.1
docker pull openvino/model_server:2022.3.0.1-gpu

or use the provided binary packages.

v2022.3

1 year ago

The 2022.3 version is a major release. It includes several new features, enhancements and bug fixes.

New Features

Import TensorFlow Models – preview feature

OpenVINO Model Server can now load TensorFlow models directly from the model repository. Converting to OpenVINO Intermediate Representation (IR) format with model optimizer is not required. This is a preview feature with several limitations. The model must be in a frozen graph format with .pb extension. Loaded models take advantage of all OpenVINO optimizations. Learn more about it and check this demo.

C API interface to the model server internal functions – preview feature

It is now possible to leverage the model management functionality in OpenVINO Model Server for local inference execution within an application. Just dynamically link the OVMS shared library to take advantage of its new C API and use internal model server functions in C/C++ applications. To learn more see the documentation and check this demo.

Extended KServe gRPC API

The KServe gRPC API implemented in OpenVINO Model Server has been extended to support both input and output in the format of tensor data as well as raw data. The output format is consistent with the input format. This extension enables using the Triton client library with OpenVINO Model Server to send inference requests. The input data can be prepared as vectors or encoded as jpeg/png and sent as bytes. Learn more about the current API and check the Python and C++ samples.

Extended KServe REST API

The KServe REST API now has additional functionality that improves compatibility with the Triton Inference Server extension. It is now possible to send raw data in an HTTP request outside of the JSON content. Concatenated bytes can be interpreted by the model server depending on the header content. It is easy and quick to serialize data from numpy/vectors and to send jpeg/png encoded images.
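A minimal sketch of using the Triton client library over REST with binary data placed outside of the JSON content; the port, model name, tensor names, and input file are illustrative assumptions:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
with open("image.jpg", "rb") as f:  # hypothetical encoded image
    data = np.array([f.read()], dtype=np.object_)
# binary_data=True sends the payload outside of the JSON body, using the binary extension.
inp = httpclient.InferInput("input", data.shape, "BYTES")
inp.set_data_from_numpy(data, binary_data=True)
result = client.infer("resnet", [inp])  # hypothetical model name
print(result.as_numpy("output").shape)  # hypothetical output name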

Added Support for Intel® Data Center GPU Flex and Intel® Arc GPU

OpenVINO Model Server has now official support for Intel® Data Center GPU Flex and Intel® Arc GPU cards. Learn more about using discrete GPU devices.

C++ Sample Inference Client Applications using KServe API

New client code samples to demonstrate KServe API usage. These samples illustrate typical data formats and scenarios. Check out the samples.

Extended Python Client Samples using KServe API

Python client code samples have been extended to include new API features for both the gRPC and REST interfaces.

Added integration with OpenVINO plugin for NVIDIA GPU

OpenVINO Model Server can now be used also with NVIDIA GPU cards. Follow these steps to build the Model Server from sources, including the NVIDIA plugin from the openvino_contrib repo. Learn more about using the NVIDIA plugin.

Breaking changes

  • The CLI parameter has been changed to reflect the interval time unit: custom_node_resources_cleaner_interval_seconds. The default value should be optimal for most use cases.
  • Temporarily there is no support for HDDL/NCS plugins. Support for those will come in the next release.

Deprecated functionality

  • Plugin config parameters from OpenVINO API 1.0 – OpenVINO models can be tuned using plugin config parameters. So far, the parameter names have been defined by OpenVINO API 1.0. It is recommended to start using the parameter names defined in OpenVINO API 2.0. In this release, old parameters are automatically translated to their new substitutions. Check the performance tuning guide and more info about the plugin parameters.

Bug fixes

  • Improved performance for DAG pipelines executed on GPU accelerators
  • The default number of performance tuning parameters was not calculated correctly inside docker containers with constrained CPU capacity. Now the number of optimal streams for THROUGHPUT mode is set based on the CPU allocation bound to the container.
  • Fixes in unit tests raising sporadic false positive errors.

Other changes:

  • Published binary package of OpenVINO Model Server which can be used in the deployments on baremetal hosts without Docker containers. See instructions for baremetal deployment.
  • Updated software dependencies and container base images

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:

docker pull openvino/model_server:2022.3
docker pull openvino/model_server:2022.3-gpu

or use the provided binary packages.

v2022.2

1 year ago

The 2022.2 version is a major release with the new OpenVINO backend API (Application Programming Interface).

New features

KServe gRPC API

Besides the TensorFlow Serving API, it is now possible to run calls to the OpenVINO Model Server using the KServe API. The following gRPC methods are implemented: ModelInfer, ModelMetadata, ModelReady, ServerLive, ServerReady and ServerMetadata. Inference execution supports input both in the raw_input_contents format and as InferTensorContents.

The same clients can be used to connect to the OpenVINO Model Server as with other KServe compatible model servers. Check the samples using the Triton client library in python.
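A minimal sketch of exercising these gRPC methods with the Triton client library; the port, model name, and tensor names are illustrative assumptions:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
print(client.is_server_live())              # ServerLive
print(client.is_server_ready())             # ServerReady
print(client.get_model_metadata("resnet"))  # ModelMetadata, hypothetical model name

data = np.zeros((1, 3, 224, 224), dtype=np.float32)
inp = grpcclient.InferInput("input", data.shape, "FP32")
inp.set_data_from_numpy(data)
result = client.infer("resnet", [inp])      # ModelInfer
print(result.as_numpy("output").shape)      # hypothetical output name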

KServe REST API – feature preview

Next to the TensorFlow Serving REST API, we have also implemented the KServe REST API. The following endpoints are functional:

v2
v2/health/live
v2/health/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/infer

Besides the standard input format as tensor_data, the binary extension compatible with the Triton Inference Server is also implemented.

That way, the data can be sent as arrays in json or as jpeg/png encoded content.

Check how to connect to KServe in the samples using the Triton client library in python.
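A minimal sketch of checking the listed health and readiness endpoints over REST; the port and model name are illustrative assumptions:

import requests

base = "http://localhost:8000"
print(requests.get(f"{base}/v2/health/live").status_code)          # 200 when the server is live
print(requests.get(f"{base}/v2/health/ready").status_code)         # 200 when the server is ready
print(requests.get(f"{base}/v2/models/resnet/ready").status_code)  # hypothetical model name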

Execution metrics – feature preview

OpenVINO Model Server can now expose metrics compatible with Prometheus format. Metrics can be enabled in the server configuration file or using a command line parameter. The following metrics are now available:

ovms_streams 
ovms_current_requests 
ovms_requests_success 
ovms_requests_fail 
ovms_request_time_us 
ovms_inference_time_us 
ovms_wait_for_infer_req_time_us 
ovms_infer_req_queue_size 
ovms_infer_req_active 

Metrics can be integrated with Grafana reports or with a horizontal autoscaler.

Learn more about using metrics.
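A minimal sketch of scraping the metrics, assuming they are exposed on the REST port under the /metrics path (port and path are assumptions here):

import requests

# Fetch Prometheus-format metrics and print the request counters.
text = requests.get("http://localhost:8000/metrics").text
for line in text.splitlines():
    if line.startswith(("ovms_requests_success", "ovms_requests_fail")):
        print(line)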

Direct support for PaddlePaddle models

OpenVINO Model Server now includes a PaddlePaddle model importer. It is possible to deploy models trained in the PaddlePaddle framework directly into the models repository. Check the demo showing how to deploy and use the ocrnet-hrnet-w48-paddle segmentation model in PaddlePaddle format.

Performance improvements in DAG execution

In several scenarios, pipeline execution was improved to reduce data copy operations. This will be perceived as reduced latency and increased overall throughput.

Exemplary custom nodes are included in the OpenVINO Model Server public docker image.

Previously, it was required to compile a custom node and mount it into the container during deployment. Now, those libraries are added to the public docker image. Demos including custom nodes now offer an option to use the precompiled version in the image or to build them from source. Check the demo of the horizontal text detection pipeline

Breaking changes

Changed the sequence of starting REST/gRPC endpoints vs initial loading of the models.

With this version, the model server initiates the gRPC and REST endpoints (if enabled) before the models are loaded. Before this change, an active network interface was acting as the readiness indicator. Now, the server readiness and model readiness can be checked using the dedicated endpoints, according to the KServe API:

v2/health/ready 
v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready 

This makes it easier to monitor the state of models during the initialization phase.

Updated the OpenCV version used in the model server to 4.6.0

This impacts custom node compatibility. Any custom nodes using OpenCV for custom image transformation should be recompiled. Check the recommended process for building the custom nodes in the docker container in our examples

Bug Fixes:

  • Minor fixes in logging
  • Fixed configuring warning log level
  • Fixes in documentation
  • Security fixes

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands: docker pull openvino/model_server:2022.2 or docker pull openvino/model_server:2022.2-gpu

v2022.1

2 years ago

The 2022.1 version is a major release with the new OpenVINO backend API (Application Programming Interface). It includes several new features and a few breaking changes.

New features

  • Support for dynamic shape in the models – allows configuring model inputs to accept a range of input shape dimensions and variable batch size. This enables sending predict requests with various image resolutions and batches. A minimal client sketch follows this list.
  • Model cache for faster loading and initialization – the cached files make the Model Server initialization faster when performing subsequent model loading. Cache files can be reused within the same Model Server version, target device, hardware, model, model version, model shape and plugin config parameters.
  • Support for double precision – OVMS now supports two additional precisions, FP64 and I64.
  • Extended API for the Directed Acyclic Graph scheduler custom nodes to include initialization and cleanup steps – this enables additional use cases where you can initialize resources in the DAG loading step instead of during each predict request. This, for example, allows avoiding dynamic allocation during custom node execution.
  • Easier deployment of models with layout from training frameworks – if a model has information about its layout, this information is preserved in OVMS. OpenVINO Model Optimizer can be instructed to save information about the model layout.
  • Arbitrary layout transpositions – added support for handling any layout transformation when loading models. This results in adding a preprocessing step before inference. It is configured using --layout NCHW:NHWC, which informs OVMS that the model natively accepts the NHWC layout and that a preprocessing step transposing from NCHW should be added to accept such inputs.
  • Support for models with batch size on an arbitrary dimension – the batch size in the layout can now be on any position in the model. Previously, when changing the model batch size, OVMS accepted the batch size only on the first dimension.
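As referenced in the dynamic shape item above, a minimal client sketch sending two different resolutions to the same servable with the ovmsclient library; the port, model name, and input name are illustrative assumptions:

import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")
# With a dynamic input shape configured on the server, the same servable accepts both resolutions.
for height, width in [(224, 224), (320, 320)]:
    image = np.random.rand(1, 3, height, width).astype(np.float32)
    output = client.predict(inputs={"input": image}, model_name="resnet")
    print("response received for input shape", image.shape)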

Breaking changes

  • Order of reshape and layout change operations during model initialization. In previous OVMS versions, the order was: first do the reshape, then apply the layout change. In this release, OVMS handles the order of operations for the user, and it is required to specify the expected final shape and the expected transposition to be added. If you wanted to change a model with original shape (1,3,200,200) and layout NCHW to handle a different layout & resolution, you had to set --shape "(1,3,224,224)" --layout NHWC. Now both parameters should describe the target values, so with 2022.1 it should look like: --shape "(1,224,224,3)" --layout NHWC:NCHW.
  • Layout parameter changes. Previously, when configuring a model with the --layout parameter, the administrator was not required to know the underlying model layout because OV used NCHW by default. Now, using the parameter --layout NCHW informs OVMS that the model is using the NCHW layout – the model both uses NCHW and accepts NCHW input.
  • Custom node code must include implementations of the new API methods. They might be dummy implementations if not needed. Additionally, all previous API functions must include an additional void* parameter.
  • In the DAG pipelines configuration, demultiplexing with a dynamic number of parallel operations is configurable with the parameter "dynamic_count" set to -1, besides the 0 used so far. This is more consistent with common conventions used, e.g., in model input shapes. Using 0 is now deprecated and support for it will be removed in following releases.

Other changes:

  • Updated the demo with a question answering use case – a BERT model demo with dynamic shape and variable length of the request content
  • Rearranged structure of the demos and client code examples.
  • Python client code examples both with the tensorflow-serving-api and the ovmsclient library.
  • Demos updated to use models with preserved layout and color format
  • Custom nodes updated to use the new API. The initialization step in the model zoo custom node uses memory buffer initialization to speed up execution.

Bug Fixes:

  • Fixed an issue with loading cloud stored models. Occasionally, a downloaded model would not load properly.
  • Fixes in documentation
  • Security fixes

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands: docker pull openvino/model_server:2022.1 or docker pull openvino/model_server:2022.1-gpu