ONNX Runtime Server provides TCP and HTTP/HTTPS REST APIs for ONNX model inference.
Linux: use the `download-onnxruntime-linux.sh` script. It downloads ONNX Runtime to `/usr/local/onnxruntime`, adds `/usr/local/onnxruntime/lib` to `/etc/ld.so.conf.d/onnxruntime.conf`, and runs `ldconfig`.

macOS:

```shell
brew install onnxruntime
```
Install the build dependencies.

Ubuntu/Debian:

```shell
sudo apt install cmake pkg-config libboost-all-dev libssl-dev

# optional, for NVIDIA GPU support
sudo apt install nvidia-cuda-toolkit nvidia-cudnn

# optional, for NVIDIA GPU support with Docker
sudo apt install nvidia-container-toolkit
```

macOS:

```shell
brew install cmake boost openssl
```
```shell
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
sudo cmake --install build --prefix /usr/local/onnxruntime-server
```
Use the `--model-dir` option to specify the directory where the models are located. Each model file must be placed at:

```
${model_dir}/${model_name}/${model_version}/model.onnx
```
| Files in `--model-dir` | Create session request body | Get/Execute session API URL path (after creation) |
|---|---|---|
| `model_name/model_version/model.onnx` | `{"model": "model_name", "version": "model_version"}` | `/api/sessions/model_name/model_version` |
| `sample/v1/model.onnx` | `{"model": "sample", "version": "v1"}` | `/api/sessions/sample/v1` |
| `sample/v2/model.onnx` | `{"model": "sample", "version": "v2"}` | `/api/sessions/sample/v2` |
| `other/20200101/model.onnx` | `{"model": "other", "version": "20200101"}` | `/api/sessions/other/20200101` |
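The mapping above is mechanical, so it can be sketched as a small helper. This is a hypothetical illustration, not part of the server; `session_routes` is a name invented here.

```python
from pathlib import PurePosixPath

def session_routes(model_file: str):
    """Derive the create-session request body and the session API URL
    from a model file path relative to --model-dir.
    Hypothetical helper for illustration only."""
    parts = PurePosixPath(model_file).parts
    if len(parts) != 3 or parts[2] != "model.onnx":
        raise ValueError("expected ${model_name}/${model_version}/model.onnx")
    name, version = parts[0], parts[1]
    body = {"model": name, "version": version}
    url = f"/api/sessions/{name}/{version}"
    return body, url

body, url = session_routes("sample/v1/model.onnx")
# body == {"model": "sample", "version": "v1"}
# url  == "/api/sessions/sample/v1"
```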
- Enable the TCP backend with the `--tcp-port` option.
- Enable the HTTP backend with the `--http-port` option.
- Enable the HTTPS backend with the `--https-port`, `--https-cert`, and `--https-key` options.
- Enable the Swagger API document with the `--swagger-url-path` option.
- Use the `-h` or `--help` option to see the full list of options.

If the `ONNX_SERVER_CONFIG_PRIORITY=env` environment variable is set, environment variables take priority over command-line options.
Within a Docker image, environment variables take priority.

| Option | Environment | Description |
|---|---|---|
| `--workers` | `ONNX_SERVER_WORKERS` | Worker thread pool size. Default: `4` |
| `--model-dir` | `ONNX_SERVER_MODEL_DIR` | Model directory path. The ONNX model files must be located at `${model_dir}/${model_name}/${model_version}/model.onnx`. Default: `models` |
| `--prepare-model` | `ONNX_SERVER_PREPARE_MODEL` | Pre-create some model sessions at server startup. Format: a space-separated list of `model_name:model_version` or `model_name:model_version(session_option, ...)`. Available session option: `cuda=device_id` (or `true`/`false`). e.g. `model1:v1 model2:v9 model1:v1(cuda=true) model2:v9(cuda=1)` |
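The `--prepare-model` value can be parsed as shown in this illustrative sketch; the server's own parser may differ in details, and `parse_prepare_model` is a name invented here.

```python
import re

def parse_prepare_model(value: str):
    """Parse a --prepare-model string such as
    'model1:v1 model2:v9(cuda=1)' into (name, version, options) tuples.
    Illustrative only; not the server's actual implementation."""
    entries = []
    for item in value.split():
        # model_name:model_version with an optional (opt=val, ...) suffix
        m = re.fullmatch(r"([^:()]+):([^:()]+?)(?:\(([^)]*)\))?", item)
        if not m:
            raise ValueError(f"bad entry: {item}")
        name, version, raw_opts = m.group(1), m.group(2), m.group(3)
        options = {}
        if raw_opts:
            for opt in raw_opts.split(","):
                key, _, val = opt.strip().partition("=")
                options[key] = val or "true"  # bare 'cuda' means cuda=true
        entries.append((name, version, options))
    return entries

parse_prepare_model("model1:v1(cuda=true) model2:v9(cuda=1)")
```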
| Option | Environment | Description |
|---|---|---|
| `--tcp-port` | `ONNX_SERVER_TCP_PORT` | Enable the TCP backend and set its port number. |
| `--http-port` | `ONNX_SERVER_HTTP_PORT` | Enable the HTTP backend and set its port number. |
| `--https-port` | `ONNX_SERVER_HTTPS_PORT` | Enable the HTTPS backend and set its port number. |
| `--https-cert` | `ONNX_SERVER_HTTPS_CERT` | SSL certificate file path for HTTPS. |
| `--https-key` | `ONNX_SERVER_HTTPS_KEY` | SSL private key file path for HTTPS. |
| `--swagger-url-path` | `ONNX_SERVER_SWAGGER_URL_PATH` | Enable the Swagger API document for the HTTP/HTTPS backend. The value cannot start with `/api/` or `/health`. If not specified, the Swagger document is not provided. e.g. `/swagger` or `/api-docs` |
| Option | Environment | Description |
|---|---|---|
| `--log-level` | `ONNX_SERVER_LOG_LEVEL` | Log level (`debug`, `info`, `warn`, `error`, `fatal`). |
| `--log-file` | `ONNX_SERVER_LOG_FILE` | Log file path. If not specified, logs are printed to stdout. |
| `--access-log-file` | `ONNX_SERVER_ACCESS_LOG_FILE` | Access log file path. If not specified, logs are printed to stdout. |
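The tables above follow a consistent naming pattern: each CLI option maps to an environment variable with the `ONNX_SERVER_` prefix. The sketch below captures that pattern as observed in the tables; it is an illustration, not an official API, and `env_name` is a name invented here.

```python
def env_name(option: str) -> str:
    """Map a CLI option like '--swagger-url-path' to its environment
    variable name, following the ONNX_SERVER_* pattern in the tables."""
    return "ONNX_SERVER_" + option.lstrip("-").replace("-", "_").upper()

env_name("--swagger-url-path")  # 'ONNX_SERVER_SWAGGER_URL_PATH'
```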
Available Docker image tags:

- `1.16.1-linux-cuda` (amd64)
- `1.16.1-linux-cpu` (amd64, arm64)

```shell
DOCKER_IMAGE=kibae/onnxruntime-server:1.16.1-linux-cuda # or kibae/onnxruntime-server:1.16.1-linux-cpu
docker pull ${DOCKER_IMAGE}

# simple HTTP backend
docker run --name onnxruntime_server_container -d --rm --gpus all \
  -p 80:80 \
  -v "/your_model_dir:/app/models" \
  -v "/your_log_dir:/app/logs" \
  -e "ONNX_SERVER_SWAGGER_URL_PATH=/api-docs" \
  ${DOCKER_IMAGE}
```
To enable the Swagger API document, launch the server with the `--swagger-url-path=/swagger/` option. It must be used together with the `--http-port` or `--https-port` option.

```shell
./onnxruntime_server --model-dir=YOUR_MODEL_DIR --http-port=8080 --swagger-url-path=/api-docs/
```

Then open `http://localhost:8080/api-docs/` in your browser.
```mermaid
%%{init: {
  'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
    actor A as Administrator
    box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
        participant SD as Disk
        participant SP as Process
    end
    actor C as Client

    Note right of A: You have 3 models to serve.
    A ->> SD: Copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"

    A ->> SP: Start server with the --prepare-model option
    activate SP
    Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models<br />--prepare-model="model_A:v1(cuda=0) model_A:v2(cuda=0)"
    SP -->> SD: Load model
    Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
    SD -->> SP: Model binary
    activate SP
    SP -->> SP: Create<br />onnxruntime<br />session
    deactivate SP
    deactivate SP

    rect rgb(100, 100, 100, 0.3)
        Note over SD, C: Execute Session
        C ->> SP: Execute session request
        activate SP
        Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
        activate SP
        SP -->> SP: Execute<br />onnxruntime<br />session
        deactivate SP
        SP ->> C: Execute session response
        deactivate SP
        Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
    end
```
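The execute-session request in the diagram uses one JSON key per model input, each holding a batch of rows. If your data arrives row by row, a small helper can assemble the columnar body. This is a hypothetical illustration: `build_execute_body` is invented here, and the input names `x`, `y`, `z` belong to the sample model in the diagram.

```python
import json

def build_execute_body(samples):
    """Turn a list of per-row input dicts into the columnar request
    body the execute-session API expects (illustrative helper).
    [{"x": [1], "y": [2]}, {"x": [2], "y": [3]}] ->
    '{"x": [[1], [2]], "y": [[2], [3]]}'"""
    body = {}
    for row in samples:
        for name, value in row.items():
            body.setdefault(name, []).append(value)
    return json.dumps(body)

build_execute_body([
    {"x": [1], "y": [2], "z": [3]},
    {"x": [2], "y": [3], "z": [4]},
    {"x": [3], "y": [4], "z": [5]},
])
```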
```mermaid
%%{init: {
  'sequence': {'noteAlign': 'left', 'mirrorActors': true}
}}%%
sequenceDiagram
    actor A as Administrator
    box rgb(0, 0, 0, 0.1) "ONNX Runtime Server"
        participant SD as Disk
        participant SP as Process
    end
    actor C as Client

    Note right of A: You have 3 models to serve.
    A ->> SD: Copy model files to disk.<br />"/var/models/model_A/v1/model.onnx"<br />"/var/models/model_A/v2/model.onnx"<br />"/var/models/model_B/20201101/model.onnx"

    A ->> SP: Start server
    Note right of A: onnxruntime_server<br />--http-port=8080<br />--model-dir=/var/models

    rect rgb(100, 100, 100, 0.3)
        Note over SD, C: Create Session
        C ->> SP: Create session request
        activate SP
        Note over SP, C: POST /api/sessions<br />{"model": "model_A", "version": "v1"}
        SP -->> SD: Load model
        Note over SD, SP: Load model from<br />"/var/models/model_A/v1/model.onnx"
        SD -->> SP: Model binary
        activate SP
        SP -->> SP: Create<br />onnxruntime<br />session
        deactivate SP
        SP ->> C: Create session response
        deactivate SP
        Note over SP, C: {<br />"model": "model_A",<br />"version": "v1",<br />"created_at": 1694228106,<br />"execution_count": 0,<br />"last_executed_at": 0,<br />"inputs": {<br />"x": "float32[-1,1]",<br />"y": "float32[-1,1]",<br />"z": "float32[-1,1]"<br />},<br />"outputs": {<br />"output": "float32[-1,1]"<br />}<br />}
        Note right of C: 👌 You can know the type and shape<br />of the input and output.
    end

    rect rgb(100, 100, 100, 0.3)
        Note over SD, C: Execute Session
        C ->> SP: Execute session request
        activate SP
        Note over SP, C: POST /api/sessions/model_A/v1<br />{<br />"x": [[1], [2], [3]],<br />"y": [[2], [3], [4]],<br />"z": [[3], [4], [5]]<br />}
        activate SP
        SP -->> SP: Execute<br />onnxruntime<br />session
        deactivate SP
        SP ->> C: Execute session response
        deactivate SP
        Note over SP, C: {<br />"output": [<br />[0.6492120623588562],<br />[0.7610487341880798],<br />[0.8728854656219482]<br />]<br />}
    end
```
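The create-session response describes each input and output with a compact type string such as `float32[-1,1]`, where `-1` marks a dynamic (batch) axis. A client might parse it as below; the format is inferred from the example response in the diagram, not from an official specification, and `parse_tensor_spec` is a name invented here.

```python
import re

def parse_tensor_spec(spec: str):
    """Parse a type string like 'float32[-1,1]' from the create-session
    response into (dtype, shape). -1 marks a dynamic axis.
    Format inferred from the example response; illustrative only."""
    m = re.fullmatch(r"(\w+)\[([-\d,]*)\]", spec)
    if not m:
        raise ValueError(f"unrecognized spec: {spec}")
    dtype = m.group(1)
    dims = m.group(2)
    shape = tuple(int(d) for d in dims.split(",")) if dims else ()
    return dtype, shape

parse_tensor_spec("float32[-1,1]")  # ('float32', (-1, 1))
```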