Adlik: Toolkit for Accelerating Deep Learning Inference

v1.2.0

4 months ago

Feature List

We forked the vLLM repository and added new features to accelerate LLM inference:

  • Support int8 inference.
  • Support int4 inference, with a throughput increase of 1.9-4.0x over the FP16 model.
  • Support FP8 KV cache, which not only simplifies the quantization and dequantization operations but also requires no extra GPU memory for storing scales. Throughput can reach up to 1.54x compared to running with this feature disabled (see the sketch below).
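
For illustration, a minimal sketch of how such options are typically switched on through vLLM's Python entry point, assuming the fork keeps upstream vLLM's API surface; the model name and flag values are placeholders, and the fork's exact flags may differ:

    from vllm import LLM, SamplingParams

    # Minimal sketch, assuming the fork keeps upstream vLLM's API surface.
    llm = LLM(
        model="meta-llama/Llama-2-7b-hf",  # placeholder model name
        quantization="awq",                # one of vLLM's low-bit weight formats
        kv_cache_dtype="fp8",              # keep the KV cache in FP8 instead of FP16
    )

    params = SamplingParams(temperature=0.8, max_tokens=128)
    outputs = llm.generate(["Explain KV-cache quantization briefly."], params)
    print(outputs[0].outputs[0].text)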

v1.1.0

10 months ago

Feature List

Model Optimizer

  • Support PTQ (post-training quantization) and QAT (quantization-aware training).
  • Support low-bit quantization and release W4A4 and W3A3 models in the Model Zoo.
  • The W4A4 and W3A3 models rank in the top two on Papers with Code's quantization list.
  • Export quantized models to TorchScript (JIT) and ONNX formats (see the sketch after this list).
  • Support automatic model pruning using AutoSlim.
  • Support model distillation.
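
Adlik's own optimizer interfaces are not reproduced here; the following is a minimal sketch of a PTQ-then-export flow using stock PyTorch, with a toy stand-in model:

    import torch
    from torch import nn

    # Toy stand-in network; the real optimizer works on full models.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 10),
    ).eval()

    # PTQ: wrap with quant/dequant stubs, observe activations, convert to int8.
    wrapped = torch.ao.quantization.QuantWrapper(model)
    wrapped.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
    prepared = torch.ao.quantization.prepare(wrapped)
    for _ in range(8):                        # placeholder calibration batches
        prepared(torch.randn(1, 3, 32, 32))
    int8_model = torch.ao.quantization.convert(prepared)

    # Export: TorchScript (JIT) for the quantized model, ONNX for the float one
    # (eager-mode quantized ops have limited ONNX export support).
    example = torch.randn(1, 3, 32, 32)
    torch.jit.trace(int8_model, example).save("model_int8.pt")
    torch.onnx.export(model, example, "model_fp32.onnx", opset_version=13)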

LLM

  • Support large language model inference on the Cloudblazer Yunsui t20.

v1.0.0

Release Date: 2022-12-20 Compatibility: The functional interfaces of Adlik r1.0.0 are compatible with previous releases.

Feature List

Compiler

  • Support compiling ONNX models into a data format that can be accelerated by the Cloudblazer Yunsui i20 with Enflame's Suixi 2.5 chip
  • Upgrade OpenVINO to 2022.3.0
  • Upgrade TensorFlow to 2.10.1
  • Upgrade TensorRT to 8.4.3.1
  • Upgrade Ubuntu to 20.04, Python to 3.8, and CUDA to 11.6 in the base Docker images

Inference Engine

  • Support model inference on Cloudblazer Yunsui i20
  • Upgrade OpenVINO to 2022.3.0
  • Upgrade TensorFlow to 2.10.1
  • Upgrade TensorRT to 8.4.3.1
  • Upgrade Ubuntu to 20.04, Python to 3.8, and CUDA to 11.6 in the base Docker images

Model Zoo

  • Add MobileNet V2 series models, whose accuracy on the ImageNet-1K dataset reaches 72.396%

Benchmark Test

  • Complete benchmark tests of YOLOv5m and Mask R-CNN on the Intel 8260 CPU, covering accuracy and performance indicators such as throughput

Fixed issues

v0.5.0

Release Date: 2022-6-21 Compatibility: The functional interfaces of Adlik r0.5 are compatible with previous releases.

Feature List

Model Optimizer

  • Support quantization and distillation of YOLOv5s models, achieving nearly a 2.5x inference performance improvement with the OpenVINO runtime (see the sketch below).
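
A minimal sketch of running such a quantized model with the OpenVINO 2022.x runtime; the IR path and input shape are placeholders:

    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("yolov5s_int8.xml")        # placeholder IR path
    compiled = core.compile_model(model, device_name="CPU")

    image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # dummy input
    result = compiled([image])[compiled.output(0)]
    print(result.shape)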

Model Zoo

  • A new repository that stores Adlik-optimized and compiled models, including ResNet series models and YOLOv5 series models.

Compiler

  • Support the compilation path from OneFlow to ONNX
  • OpenVINO upgraded to version 2022.1.0

Inference Engine

  • Support Torch runtime
  • OpenVINO upgraded to version 2022.1.0

Benchmark Test

  • Benchmark tests for the BERT model on the Intel 8260 CPU, including throughput and other performance indicators.

Fixed issues

v0.4.0

Release Date: 2021-12-02 Compatibility: The functional interfaces of Adlik r0.4 are compatible with previous releases.

Feature List

Compiler

  1. Adlik compiler supports OpenVINO INT8 quantization.
  2. Adlik compiler supports TensorRT INT8 quantization, including an extended quantization calibrator for TensorRT that reduces the accuracy drop caused by quantization (see the sketch after this list).
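
The extended calibrator itself is part of Adlik; as a sketch of the underlying mechanism, a generic TensorRT INT8 entropy calibrator looks roughly like this (the batch source and shapes are placeholders):

    import numpy as np
    import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        """Feeds calibration batches to TensorRT and caches the scale table."""

        def __init__(self, batches, cache_file="calib.cache"):
            super().__init__()
            self.batches = iter(batches)      # iterable of numpy arrays
            self.cache_file = cache_file
            self.device_mem = None

        def get_batch_size(self):
            return 1

        def get_batch(self, names):
            batch = next(self.batches, None)
            if batch is None:
                return None                   # signals calibration is finished
            if self.device_mem is None:
                self.device_mem = cuda.mem_alloc(batch.nbytes)
            cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
            return [int(self.device_mem)]

        def read_calibration_cache(self):
            try:
                with open(self.cache_file, "rb") as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(self.cache_file, "wb") as f:
                f.write(cache)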

Optimizer

  1. Support the multi-teacher distillation method, which uses multiple teacher networks for distillation optimization (see the sketch after this list).
  2. Support ZEN-NAS search enhancements, including parallel training, search acceleration, and fixes for bugs in the original implementation. Search time is reduced by about 15% while the search score is slightly improved, which increases training accuracy by 0.2%~1%.
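
A minimal sketch of a generic multi-teacher distillation loss; the temperature and weighting here are illustrative choices, not Adlik's exact recipe:

    import torch
    import torch.nn.functional as F

    def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                              T=4.0, alpha=0.7):
        # Average the teachers' softened probability distributions.
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
        # KL term matches the student to the averaged teachers;
        # the CE term keeps the student anchored to the true labels.
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      teacher_probs, reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce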

Inference Engine

  1. Support the Paddle Inference runtime. Paddle-format models no longer need to be converted through ONNX components; users can perform model inference directly in the Adlik environment (see the sketch after this list).
  2. Support inference on Intel TGL-U i5 devices, with benchmark tests completed for several models.
  3. Docker images for cloud-native environments support the newest versions of inference components, including: (1) OpenVINO 2021.4.582 (2) TensorFlow 2.6.2 (3) TensorRT 7.2.1.6 (4) TF Lite 2.4.0 (5) TVM 0.7 (6) Paddle Inference 2.1.2
  4. Introduce a C++ version of the Client API, which supports CMake and Bazel compilation and is convenient for deployment in C/C++ scenarios.
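
A minimal sketch of direct inference with the stock Paddle Inference API, which the Adlik runtime wraps; the model file names and input shape are placeholders:

    import numpy as np
    import paddle.inference as paddle_infer

    config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholders
    predictor = paddle_infer.create_predictor(config)

    input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input
    input_handle.reshape(list(data.shape))
    input_handle.copy_from_cpu(data)

    predictor.run()
    output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
    print(output_handle.copy_to_cpu().shape)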

Benchmark Test

  1. Complete benchmark tests of ResNet-50, YOLO v3/v4, Fast R-CNN, Mask R-CNN, and other models on Intel TGL-U i5 devices, including latency, throughput, and other performance indicators under GPU video decoding.
  2. Publish an MLPerf result for the Adlik-optimized BERT model.

Fixed issues

v0.3.0

Release Date: 2021-06-21 Compatibility: The functional interfaces of Adlik r0.3 are compatible with r0.2 and r0.1.

Feature List

Compiler

  1. Integrate deep learning frameworks including PaddlePaddle, Caffe, and MXNet
  2. Support compiling into TVM (see the sketch after this list)
  3. Support FP16 quantization for OpenVINO
  4. Support TVM auto scheduling
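
A minimal sketch of the kind of ONNX-to-TVM compilation this enables, using TVM's Relay frontend directly; the path, input name/shape, and target are placeholders:

    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("model.onnx")                 # placeholder path
    mod, params = relay.frontend.from_onnx(
        onnx_model, shape={"input": (1, 3, 224, 224)})   # assumed input name/shape

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)  # "llvm" = CPU target
    lib.export_library("model_tvm.so")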

Optimizer

  1. Specific optimization for YOLO V4
  2. Pruning, distillation and quantization for ResNet-50

Inference Engine

  1. Support TVM and TF-TRT runtimes
  2. Docker images for cloud-native environments support the newest versions of inference components, including:
  • OpenVINO (2021.1.110)
  • TensorFlow (2.4.0)
  • TensorRT (7.2.1.6)
  • TFLite (2.4.0)
  • TVM (0.7)

Benchmark Test

  1. Support Paddle models such as PaddleOCR, PP-YOLO, and PP-ResNet-50

Fixed issues

v0.2.0

Release Date: 2020-11-20 Compatibility: The functional interfaces of Adlik r0.2 are compatible with r0.1.

Feature List

New Model Compiler

  1. Support DAG generation for end-to-end compilation of models with different representations.
  2. Source representations: H5, Ckpt, Pb, Pth, ONNX, and SavedModel.
  3. Target representations: SavedModel, OpenVINO IR, TensorRT Plan, and TFLite.
  4. Support model quantization for TFLite and TensorRT.
  5. INT8 quantization for TFLite (see the sketch after this list).
  6. INT8 and FP16 quantization for TensorRT.
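
A minimal sketch of full-INT8 post-training quantization with the stock TFLite converter, illustrating the mechanism behind item 5; the SavedModel path and representative dataset are placeholders:

    import numpy as np
    import tensorflow as tf

    def representative_data():
        # Placeholder calibration samples; real use feeds training-like data.
        for _ in range(100):
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())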

Inference Engine

  1. Support hybrid scheduling of ML and DL inference jobs.
  2. Support image based deployment of Adlik compiler and inference engine in cloud native environment.
  3. Deployment and functionality have been tested on Docker (v19.03.12) and Kubernetes (v1.13).
  4. Support running Adlik on Raspberry Pi and Jetson Nano.
  5. Support the newest versions of OpenVINO (2021.1.110) and TensorFlow (2.3.1).

Benchmark Test

  1. Support benchmark tests for models including ResNet-50, Inception V3, YOLO v3, and BERT across the 4 devices and 5 runtimes supported by Adlik.

Fixed issues

v0.1.0

Release 0.1.0

Release Date: 2020-06-15 Compatibility: Because r0.1.0 is the first release of Adlik, compatibility is not a consideration.

Feature List

Model Compiler

  1. A new framework that is easy to extend and maintain.
  2. Compilation of models trained with Keras, TensorFlow, and PyTorch for better execution on CPU/GPU. The supported compilation paths and runtimes are listed below.

Training framework | Model format | Target runtime | Compiled format
-------------------|--------------|----------------|----------------
Keras              | h5           | TF Serving     | SavedModel
                   |              | OpenVINO       | IR
                   |              | TensorRT       | Plan
                   |              | TF-Lite        | tflite
TensorFlow         | Ckpt/Pb      | TF Serving     | SavedModel
                   |              | OpenVINO       | IR
                   |              | TensorRT       | Plan
                   |              | TF-Lite        | tflite
PyTorch            | pth          | OpenVINO       | IR
                   |              | TensorRT       | Plan

Training framework | Inference engine        | Hardware environment
-------------------|-------------------------|---------------------
Keras              | TensorFlow Serving-1.14 | CPU/GPU
                   | TensorFlow Serving-2.2  | CPU/GPU
                   | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU
                   | TensorRT-7              | GPU
                   | TF Lite-2.1             | CPU (x86/ARM)
TensorFlow         | TensorFlow Serving-1.14 | CPU/GPU
                   | TensorFlow Serving-2.2  | CPU/GPU
                   | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU
                   | TensorRT-7              | GPU
                   | TF Lite-2.1             | CPU (x86/ARM)
PyTorch            | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU

Model Optimizer

  1. Multi-node, multi-GPU training and pruning.
  2. Configurable implementation of filter pruning to achieve smaller inference models (see the sketch after this list).
  3. Small-batch dataset quantization for TF-Lite and TF-TRT.
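
A minimal sketch of the L1-norm filter ranking commonly used for filter pruning; this illustrates the technique, not Adlik's configurable implementation:

    import torch
    from torch import nn

    def rank_filters_l1(conv: nn.Conv2d, keep_ratio: float = 0.5):
        # Score each output filter by the L1 norm of its weights; keep the top ones.
        scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep = torch.argsort(scores, descending=True)[:n_keep]
        return torch.sort(keep).values        # indices of filters to keep

    conv = nn.Conv2d(16, 32, kernel_size=3)
    print(rank_filters_l1(conv, keep_ratio=0.25))  # e.g. 8 surviving filter indices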

Inference Engine

  1. Management of multiple models and multiple versions.
  2. HTTP/gRPC interfaces for the inference service (see the sketch after this section).
  3. Runtime scheduler that supports scheduling of multiple model instances.
  4. Integration of multiple DL inference runtimes, including TensorFlow Serving, OpenVINO, TensorRT, and TF Lite:

Inference engine        | Hardware environment
------------------------|---------------------
TensorFlow Serving-1.14 | CPU/GPU
TensorFlow Serving-2.2  | CPU/GPU
OpenVINO-2019           | CPU
TensorRT-6              | GPU
TensorRT-7              | GPU
TF Lite-2.1             | CPU (x86/ARM)

  5. Integration of dlib to support an ML runtime.
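
A hypothetical HTTP client call illustrating the general shape of an inference request; the endpoint, port, and payload layout here are assumptions, not Adlik's documented wire format:

    import requests

    payload = {"inputs": {"image": [[0.1, 0.2, 0.3]]}}   # placeholder tensor data
    resp = requests.post(
        "http://localhost:8500/v1/models/resnet50:predict",  # assumed endpoint
        json=payload,
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())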

Benchmark Test Framework for Deep Learning Models

  1. A containerized solution that automatically executes model compilation and packaging, loading of runtimes and models, startup of the inference service and client, and generation of test results.
  2. Supports all the compilers and runtimes that can be integrated into Adlik.
  3. Supported outputs: inference results, inference speed, and inference execution latency.