Adlik: Toolkit for Accelerating Deep Learning Inference

v1.2.0

4 months ago

Feature List

We forked the vLLM repository and added new features to accelerate LLM inference:

  • Support int8 inference.
  • Support int4 inference, with a throughput increase of 1.9-4.0x over the FP16 model.
  • Support FP8 KV cache, which not only simplifies the quantization and dequantization operations but also requires no extra GPU memory for storing scales. Throughput can reach up to 1.54x compared to running with this feature disabled (see the sketch below).
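
For illustration, a minimal sketch of how such options are typically switched on through vLLM's Python entry point, assuming the fork keeps upstream vLLM's API surface; the model name and flag values are placeholders, and the fork's exact flags may differ:

    from vllm import LLM, SamplingParams

    # Minimal sketch, assuming the fork keeps upstream vLLM's API surface.
    llm = LLM(
        model="meta-llama/Llama-2-7b-hf",  # placeholder model name
        quantization="awq",                # one of vLLM's low-bit weight formats
        kv_cache_dtype="fp8",              # keep the KV cache in FP8 instead of FP16
    )

    params = SamplingParams(temperature=0.8, max_tokens=128)
    outputs = llm.generate(["Explain KV-cache quantization briefly."], params)
    print(outputs[0].outputs[0].text)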

v1.1.0

10 months ago

Feature List

Model Optimizer

  • Support PTQ (post-training quantization) and QAT (quantization-aware training).
  • Support low-bit quantization and release W4A4 and W3A3 models in the Model Zoo.
  • The W4A4 and W3A3 models rank in the top two on Papers with Code's quantization list.
  • Export quantized models to TorchScript (JIT) and ONNX formats (see the sketch after this list).
  • Support automatic model pruning using AutoSlim.
  • Support model distillation.
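
Adlik's own optimizer interfaces are not reproduced here; the following is a minimal sketch of a PTQ-then-export flow using stock PyTorch, with a toy stand-in model:

    import torch
    from torch import nn

    # Toy stand-in network; the real optimizer works on full models.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 10),
    ).eval()

    # PTQ: wrap with quant/dequant stubs, observe activations, convert to int8.
    wrapped = torch.ao.quantization.QuantWrapper(model)
    wrapped.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
    prepared = torch.ao.quantization.prepare(wrapped)
    for _ in range(8):                        # placeholder calibration batches
        prepared(torch.randn(1, 3, 32, 32))
    int8_model = torch.ao.quantization.convert(prepared)

    # Export: TorchScript (JIT) for the quantized model, ONNX for the float one
    # (eager-mode quantized ops have limited ONNX export support).
    example = torch.randn(1, 3, 32, 32)
    torch.jit.trace(int8_model, example).save("model_int8.pt")
    torch.onnx.export(model, example, "model_fp32.onnx", opset_version=13)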

LLM

  • Support large language model inference on the Cloudblazer Yunsui t20.

v1.0.0

Release Date: 2022-12-20 Compatibility: The functional interfaces of Adlik r1.0.0 are compatible with previous releases.

Feature List

Compiler

  • Support compiling ONNX models into a data format that can be accelerated by the Cloudblazer Yunsui i20 with Enflame's Suixi 2.5 chip
  • Upgrade OpenVINO to 2022.3.0
  • Upgrade TensorFlow to 2.10.1
  • Upgrade TensorRT to 8.4.3.1
  • Upgrade Ubuntu to 20.04, Python to 3.8, and CUDA to 11.6 in the base Docker images

Inference Engine

  • Support model inference on Cloudblazer Yunsui i20
  • Upgrade OpenVINO to 2022.3.0
  • Upgrade TensorFlow to 2.10.1
  • Upgrade TensorRT to 8.4.3.1
  • Upgrade Ubuntu to 20.04, Python to 3.8, and CUDA to 11.6 in the base Docker images

Model Zoo

  • Add MobileNet V2 series models, whose accuracy on the ImageNet-1K dataset reaches 72.396%

Benchmark Test

  • Complete benchmark tests of YOLOv5m and Mask R-CNN on the Intel 8260 CPU, covering accuracy and performance indicators such as throughput

Fixed issues

v0.5.0

Release Date: 2022-6-21 Compatibility: The functional interfaces of Adlik r0.5 are compatible with previous releases.

Feature List

Model Optimizer

  • Support quantization and distillation of YOLOv5s models, achieving nearly a 2.5x inference performance improvement with the OpenVINO runtime (see the sketch below).
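
A minimal sketch of running such a quantized model with the OpenVINO 2022.x runtime; the IR path and input shape are placeholders:

    import numpy as np
    from openvino.runtime import Core

    core = Core()
    model = core.read_model("yolov5s_int8.xml")        # placeholder IR path
    compiled = core.compile_model(model, device_name="CPU")

    image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # dummy input
    result = compiled([image])[compiled.output(0)]
    print(result.shape)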

Model Zoo

  • A new repository that stores Adlik-optimized and compiled models, including ResNet series models and YOLOv5 series models.

Compiler

  • Support the compilation path from OneFlow to ONNX
  • OpenVINO upgraded to version 2022.1.0

Inference Engine

  • Support Torch runtime
  • OpenVINO upgraded to version 2022.1.0

Benchmark Test

  • Benchmark tests for the BERT model on the Intel 8260 CPU, including throughput and other performance indicators.

Fixed issues

v0.4.0

Release Date: 2021-12-02 Compatibility: The functional interfaces of Adlik r0.4 are compatible with previous releases.

Feature List

Compiler

  1. Adlik compiler supports OpenVINO INT8 quantization.
  2. Adlik compiler supports TensorRT INT8 quantization, including an extended quantization calibrator for TensorRT that reduces the accuracy drop caused by quantization (see the sketch after this list).
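
The extended calibrator itself is part of Adlik; as a sketch of the underlying mechanism, a generic TensorRT INT8 entropy calibrator looks roughly like this (the batch source and shapes are placeholders):

    import numpy as np
    import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
    import pycuda.driver as cuda
    import tensorrt as trt

    class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
        """Feeds calibration batches to TensorRT and caches the scale table."""

        def __init__(self, batches, cache_file="calib.cache"):
            super().__init__()
            self.batches = iter(batches)      # iterable of numpy arrays
            self.cache_file = cache_file
            self.device_mem = None

        def get_batch_size(self):
            return 1

        def get_batch(self, names):
            batch = next(self.batches, None)
            if batch is None:
                return None                   # signals calibration is finished
            if self.device_mem is None:
                self.device_mem = cuda.mem_alloc(batch.nbytes)
            cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
            return [int(self.device_mem)]

        def read_calibration_cache(self):
            try:
                with open(self.cache_file, "rb") as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(self.cache_file, "wb") as f:
                f.write(cache)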

Optimizer

  1. Support the multi-teacher distillation method, which uses multiple teacher networks for distillation optimization (see the sketch after this list).
  2. Support ZEN-NAS search enhancements, including parallel training, search acceleration, and fixes for bugs in the original implementation. Search time is reduced by about 15% while the search score is slightly improved, which increases training accuracy by 0.2%~1%.
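
A minimal sketch of a generic multi-teacher distillation loss; the temperature and weighting here are illustrative choices, not Adlik's exact recipe:

    import torch
    import torch.nn.functional as F

    def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                              T=4.0, alpha=0.7):
        # Average the teachers' softened probability distributions.
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
        # KL term matches the student to the averaged teachers;
        # the CE term keeps the student anchored to the true labels.
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      teacher_probs, reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce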

Inference Engine

  1. Support the Paddle Inference runtime. Paddle-format models no longer need to be converted through ONNX components; users can perform model inference directly in the Adlik environment (see the sketch after this list).
  2. Support inference on Intel TGL-U i5 devices, with benchmark tests completed for several models.
  3. Docker images for cloud-native environments support the newest versions of inference components, including: (1) OpenVINO 2021.4.582 (2) TensorFlow 2.6.2 (3) TensorRT 7.2.1.6 (4) TF Lite 2.4.0 (5) TVM 0.7 (6) Paddle Inference 2.1.2
  4. Introduce a C++ version of the Client API, which supports CMake and Bazel compilation and is convenient for deployment in C/C++ scenarios.
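
A minimal sketch of direct inference with the stock Paddle Inference API, which the Adlik runtime wraps; the model file names and input shape are placeholders:

    import numpy as np
    import paddle.inference as paddle_infer

    config = paddle_infer.Config("model.pdmodel", "model.pdiparams")  # placeholders
    predictor = paddle_infer.create_predictor(config)

    input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input
    input_handle.reshape(list(data.shape))
    input_handle.copy_from_cpu(data)

    predictor.run()
    output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
    print(output_handle.copy_to_cpu().shape)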

Benchmark Test

  1. Complete benchmark tests of ResNet-50, YOLO v3/v4, Fast R-CNN, Mask R-CNN, and other models on Intel TGL-U i5 devices, including latency, throughput, and other performance indicators under GPU video decoding.
  2. Publish an MLPerf result for the Adlik-optimized BERT model.

Fixed issues

v0.3.0

Release Date: 2021-06-21 Compatibility: The functional interfaces of Adlik r0.3 are compatible with r0.2 and r0.1.

Feature List

Compiler

  1. Integrate deep learning frameworks including PaddlePaddle, Caffe, and MXNet
  2. Support compiling into TVM (see the sketch after this list)
  3. Support FP16 quantization for OpenVINO
  4. Support TVM auto scheduling
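
A minimal sketch of the kind of ONNX-to-TVM compilation this enables, using TVM's Relay frontend directly; the path, input name/shape, and target are placeholders:

    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("model.onnx")                 # placeholder path
    mod, params = relay.frontend.from_onnx(
        onnx_model, shape={"input": (1, 3, 224, 224)})   # assumed input name/shape

    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)  # "llvm" = CPU target
    lib.export_library("model_tvm.so")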

Optimizer

  1. Specific optimization for YOLO V4
  2. Pruning, distillation and quantization for ResNet-50

Inference Engine

  1. Support TVM and TF-TRT runtimes
  2. Docker images for cloud-native environments support the newest versions of inference components, including:
  • OpenVINO (2021.1.110)
  • TensorFlow (2.4.0)
  • TensorRT (7.2.1.6)
  • TFLite (2.4.0)
  • TVM (0.7)

Benchmark Test

  1. Support Paddle models such as PaddleOCR, PP-YOLO, and PP-ResNet-50

Fixed issues

v0.2.0

Release Date: 2020-11-20 Compatibility: The functional interfaces of Adlik r0.2 are compatible with r0.1.

Feature List

New Model Compiler

  1. Support DAG generation for end-to-end compilation of models with different representations.
  2. Source representations: H5, Ckpt, Pb, Pth, ONNX, and SavedModel.
  3. Target representations: SavedModel, OpenVINO IR, TensorRT Plan, and TFLite.
  4. Support model quantization for TFLite and TensorRT.
  5. INT8 quantization for TFLite (see the sketch after this list).
  6. INT8 and FP16 quantization for TensorRT.
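
A minimal sketch of full-INT8 post-training quantization with the stock TFLite converter, illustrating the mechanism behind item 5; the SavedModel path and representative dataset are placeholders:

    import numpy as np
    import tensorflow as tf

    def representative_data():
        # Placeholder calibration samples; real use feeds training-like data.
        for _ in range(100):
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    with open("model_int8.tflite", "wb") as f:
        f.write(converter.convert())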

Inference Engine

  1. Support hybrid scheduling of ML and DL inference jobs.
  2. Support image based deployment of Adlik compiler and inference engine in cloud native environment.
  3. Deployment and functionality have been tested on Docker (v19.03.12) and Kubernetes (v1.13).
  4. Support running Adlik on Raspberry Pi and Jetson Nano.
  5. Support the newest versions of OpenVINO (2021.1.110) and TensorFlow (2.3.1).

Benchmark Test

  1. Support benchmark tests for models including ResNet-50, Inception V3, YOLO v3, and BERT across the 4 devices and 5 runtimes supported by Adlik.

Fixed issues

v0.1.0

Release 0.1.0

Release Date: 2020-06-15 Compatibility: Because r0.1.0 is the first release of Adlik, compatibility is not a consideration.

Feature List

Model Compiler

  1. A new framework that is easy to extend and maintain.
  2. Compilation of models trained with Keras, TensorFlow, and PyTorch for better execution on CPU/GPU. The supported compilation paths and runtimes are listed below.

Training framework | Model format | Target runtime | Compiled format
-------------------|--------------|----------------|----------------
Keras              | h5           | TF Serving     | SavedModel
                   |              | OpenVINO       | IR
                   |              | TensorRT       | Plan
                   |              | TF-Lite        | tflite
TensorFlow         | Ckpt/Pb      | TF Serving     | SavedModel
                   |              | OpenVINO       | IR
                   |              | TensorRT       | Plan
                   |              | TF-Lite        | tflite
PyTorch            | pth          | OpenVINO       | IR
                   |              | TensorRT       | Plan

Training framework | Inference engine        | Hardware environment
-------------------|-------------------------|---------------------
Keras              | TensorFlow Serving-1.14 | CPU/GPU
                   | TensorFlow Serving-2.2  | CPU/GPU
                   | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU
                   | TensorRT-7              | GPU
                   | TF Lite-2.1             | CPU (x86/ARM)
TensorFlow         | TensorFlow Serving-1.14 | CPU/GPU
                   | TensorFlow Serving-2.2  | CPU/GPU
                   | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU
                   | TensorRT-7              | GPU
                   | TF Lite-2.1             | CPU (x86/ARM)
PyTorch            | OpenVINO-2019           | CPU
                   | TensorRT-6              | GPU

Model Optimizer

  1. Multi-node, multi-GPU training and pruning.
  2. Configurable implementation of filter pruning to achieve smaller inference models (see the sketch after this list).
  3. Small-batch dataset quantization for TF-Lite and TF-TRT.
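
A minimal sketch of the L1-norm filter ranking commonly used for filter pruning; this illustrates the technique, not Adlik's configurable implementation:

    import torch
    from torch import nn

    def rank_filters_l1(conv: nn.Conv2d, keep_ratio: float = 0.5):
        # Score each output filter by the L1 norm of its weights; keep the top ones.
        scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep = torch.argsort(scores, descending=True)[:n_keep]
        return torch.sort(keep).values        # indices of filters to keep

    conv = nn.Conv2d(16, 32, kernel_size=3)
    print(rank_filters_l1(conv, keep_ratio=0.25))  # e.g. 8 surviving filter indices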

Inference Engine

  1. Management of multiple models and multiple versions.
  2. HTTP/gRPC interfaces for the inference service (see the sketch after this section).
  3. Runtime scheduler that supports scheduling of multiple model instances.
  4. Integration of multiple DL inference runtimes, including TensorFlow Serving, OpenVINO, TensorRT, and TF Lite:

Inference engine        | Hardware environment
------------------------|---------------------
TensorFlow Serving-1.14 | CPU/GPU
TensorFlow Serving-2.2  | CPU/GPU
OpenVINO-2019           | CPU
TensorRT-6              | GPU
TensorRT-7              | GPU
TF Lite-2.1             | CPU (x86/ARM)

  5. Integration of dlib to support an ML runtime.
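
A hypothetical HTTP client call illustrating the general shape of an inference request; the endpoint, port, and payload layout here are assumptions, not Adlik's documented wire format:

    import requests

    payload = {"inputs": {"image": [[0.1, 0.2, 0.3]]}}   # placeholder tensor data
    resp = requests.post(
        "http://localhost:8500/v1/models/resnet50:predict",  # assumed endpoint
        json=payload,
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())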

Benchmark Test Framework for Deep Learning Models

  1. A containerized solution that automatically executes model compilation and packaging, loading of runtimes and models, startup of the inference service and client, and generation of test results.
  2. Supports all the compilers and runtimes that can be integrated into Adlik.
  3. Supported outputs: inference results, inference speed, and inference execution latency.