Triton TensorRT Inference CRAFT Pytorch

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including converters from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.


Advanced Triton Inference Pipeline for CRAFT (Character-Region Awareness For Text detection)

Overview

Implementation of a new inference pipeline for the CRAFT text detector (PyTorch) using NVIDIA Triton Inference Server.

Author

k9ele7en. Give it a star if you find some value in this repo.
Thank you.

License

[BSD-3-Clause License] The BSD 3-clause license allows almost unlimited freedom with the software as long as you include the BSD copyright and license notice in it (see the full license text).

Updates

13 Jul, 2021: Initial release; preparation scripts run well.

14 Jul, 2021: Inference on the Triton server runs well (single request); the TensorRT format gives the best performance.

Getting started

1. Install dependencies

Requirements

$ pip install -r requirements.txt

2. Install required environment for inference using Triton server

Check ./README_ENV.md for details. Tools/packages to install include:

  • TensorRT
  • Docker
  • nvidia-docker
  • PyCUDA ...

3. Training

The code for training is not included in this repository, as in the original ClovaAI repository.

4. Inference instruction using pretrained model

  • Download the trained models
Model name    Used datasets           Languages   Purpose                       Model Link
General       SynthText, IC13, IC17   Eng + MLT   For general purpose           Click
IC15          SynthText, IC15         Eng         For IC15 only                 Click
LinkRefiner   CTW1500                 -           Used with the General Model   Click
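
Once downloaded, the .pth weights load into the CRAFT model as in the original ClovaAI repo. A minimal sketch (the checkpoint filename is the General model's, and the prefix-stripping mirrors the original copyStateDict helper; both are assumptions about your local setup):

from collections import OrderedDict
import torch
from craft import CRAFT  # model definition from the original CRAFT repo

def copy_state_dict(state_dict):
    # Strip the "module." prefix that DataParallel adds to checkpoint keys
    return OrderedDict((k[7:] if k.startswith("module.") else k, v)
                       for k, v in state_dict.items())

net = CRAFT()
net.load_state_dict(copy_state_dict(
    torch.load("craft_mlt_25k.pth", map_location="cpu")))
net.eval()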

5. Model preparation before running the Triton server:

a. Triton Inference Server inference: see details in ./README_ENV.md
Initially, you need to run a shell (.sh) script to prepare the Model Repository; after that, you only need to run the Docker image when inferencing. The script gets things ready for the Triton server, covering these steps:

  • Convert the downloaded pretrained weights into multiple formats (a minimal sketch of the ONNX export step follows this list)
  • Place the converted model formats into Triton's Model Repository
  • Run the Triton Server image from NGC (pulling it first if not present)
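
For reference, here is a minimal sketch of the PyTorch -> ONNX step the script performs (the tensor names, dummy input shape, and opset version are assumptions; the repo's converter may differ):

import torch
from craft import CRAFT  # model definition from the original CRAFT repo

net = CRAFT()            # load weights as in section 4, then switch to eval mode
net.eval()
dummy = torch.randn(1, 3, 768, 768)  # NCHW dummy input for tracing
torch.onnx.export(
    net, dummy, "detec.onnx",
    input_names=["input"],
    output_names=["output", "feature"],  # CRAFT's forward returns (y, feature)
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}},
    opset_version=11,
)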

Check that the server is running correctly:

$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
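
The same check can be done from Python with the tritonclient package (a small sketch against the HTTP endpoint above):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_ready())            # True once the server is up
print(client.is_model_ready("detec_trt"))  # True once the model is READY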

Now everything is ready; start inference by:

  • Run the Docker image of the Triton server (replace the -v mount path with your full path to model_repository):
$ sudo docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /home/maverick911/repo/Triton-TensorRT-Inference-CRAFT-pytorch/model_repository:/models nvcr.io/nvidia/tritonserver:21.05-py3 tritonserver --model-repository=/models
...
+------------+---------+--------+
| Model      | Version | Status |
+------------+---------+--------+
| detec_onnx | 1       | READY  |
| detec_pt   | 1       | READY  |
| detec_trt  | 1       | READY  |
+------------+---------+--------+
I0714 00:37:55.265177 1 grpc_server.cc:4062] Started GRPCInferenceService at 0.0.0.0:8001
I0714 00:37:55.269588 1 http_server.cc:2887] Started HTTPService at 0.0.0.0:8000
I0714 00:37:55.312507 1 http_server.cc:2906] Started Metrics Service at 0.0.0.0:8002
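
For reference, Triton expects the mounted model repository to follow its standard layout: one directory per model, a config.pbtxt, and numbered version subdirectories (the model names match the log above; the file names are Triton's defaults for each format):

model_repository/
├── detec_trt/
│   ├── config.pbtxt
│   └── 1/model.plan
├── detec_pt/
│   ├── config.pbtxt
│   └── 1/model.pt
└── detec_onnx/
    ├── config.pbtxt
    └── 1/model.onnx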

Run inference with:

$ python infer_triton.py -m='detec_trt' -x=1 --test_folder='./images' -i='grpc' -u='localhost:8001'
Request 1, batch size 1
images/sample.jpg
elapsed time : 0.9521937370300293s
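
Under the hood, infer_triton.py sends requests through NVIDIA's tritonclient library; a minimal gRPC request looks roughly like this (the input/output tensor names and the dummy preprocessing are assumptions):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Preprocessed image: NCHW float32 (CRAFT's resize/normalization omitted here)
img = np.random.rand(1, 3, 768, 768).astype(np.float32)

inp = grpcclient.InferInput("input", list(img.shape), "FP32")
inp.set_data_from_numpy(img)
out = grpcclient.InferRequestedOutput("output")

result = client.infer(model_name="detec_trt", model_version="1",
                      inputs=[inp], outputs=[out])
score_maps = result.as_numpy("output")  # region/affinity score maps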

Output from Triton: (result image omitted)

Performance benchmarks: single image (sample.jpg), time in seconds

  • Triton server (gRPC / HTTP):

    Model format   gRPC (s)   HTTP (s)
    TensorRT       0.946      0.952
    Torchscript    1.244      1.098
    ONNX           1.052      1.060
  • Classic Pytorch: 1.319s

Arguments

  • -m: name of the model (one per format)
  • -x: version of the model
  • --test_folder: input image/folder
  • -i: protocol (HTTP/gRPC)
  • -u: URL for the corresponding protocol (HTTP: 8000, gRPC: 8001)
  • ... (Details in ./infer_triton.py)
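
For example, assuming -i accepts 'http', the same image can be run over HTTP against the ONNX copy of the model:

$ python infer_triton.py -m='detec_onnx' -x=1 --test_folder='./images' -i='http' -u='localhost:8000'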

Notes:

  • The error below is caused by an invalid dynamic input shape; check that the input image shape falls within the dynamic shape ranges in the config.
inference failed: [StatusCode.INTERNAL] request specifies invalid shape for input 'input' for detec_trt_0_gpu0. Error details: model expected the shape of dimension 2 to be between 256 and 1200 but received 1216
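
The 256-1200 range in that message comes from the optimization profile baked into the TensorRT engine at build time. Such a profile is typically set when building the engine, e.g. with trtexec (the values below mirror the error message, not necessarily the repo's actual build settings):

$ trtexec --onnx=detec.onnx --saveEngine=model.plan \
    --minShapes=input:1x3x256x256 \
    --optShapes=input:1x3x768x768 \
    --maxShapes=input:1x3x1200x1200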

b. Classic Pytorch (.pth) inference:

$ python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]

The result image and score maps will be saved to ./result by default.
