Transformer Deploy Versions

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

v0.4.0

2 years ago
  • add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
  • refactor the Triton configuration generation (simplification)
  • add GPT-2 model documentation (notebook)
  • fix the CPU quantization benchmark (it was not using the quantized model)
  • fix a sentence-transformers bug
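
Decoder-based models like GPT-2 are served differently from encoder models because they generate tokens autoregressively, calling the model once per new token. A minimal sketch of that greedy decoding loop, where `next_token` is a hypothetical stand-in for a real model call (an ONNX Runtime or TensorRT session), not transformer-deploy code:

```python
# Sketch of the autoregressive (greedy) decoding loop run by decoder models
# such as GPT-2 at inference time. `next_token` is a toy stand-in for a real
# model forward pass returning the next token id.

def next_token(tokens):
    # Toy "model": deterministically emits the sum of the inputs modulo 50.
    return sum(tokens) % 50

def generate(prompt_tokens, max_new_tokens, eos_token=0):
    """Append one token per model call until EOS or the budget is reached."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == eos_token:
            break
    return tokens
```

Each iteration feeds the full sequence back into the model, which is why serving such models efficiently (e.g. with key/value caching) needs dedicated support in the inference engines.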

v0.3.0

2 years ago

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.2.0...v0.3.0

v0.2.0

2 years ago
  • support INT8 GPU quantization
  • add a tutorial on performing quantization end to end
  • add the QDQRoberta model
  • switch to ONNX opset 13
  • refactor the TensorRT engine creation
  • fix bugs
  • add auth token support (for private Hugging Face repositories)
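
To make the INT8 quantization item concrete, here is the arithmetic behind symmetric INT8 quantization: floats are scaled, rounded, and clamped to [-128, 127], then mapped back at (approximately) their original values. This is only an illustration of the principle, not transformer-deploy's API:

```python
# Illustration of symmetric INT8 quantization arithmetic: scale, round,
# clamp to the int8 range, then dequantize back to approximate floats.

def quantize_int8(values, scale):
    """Map float values to int8 using a symmetric per-tensor scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q_values, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in q_values]

weights = [0.5, -1.0, 0.126, 2.0]
# Symmetric scale: the largest magnitude maps to 127.
scale = max(abs(w) for w in weights) / 127

q = quantize_int8(weights, scale)
approx = dequantize_int8(q, scale)
```

The rounding error introduced here is the accuracy cost that quantization-aware models such as QDQRoberta (which carry explicit quantize/dequantize nodes) are designed to keep small.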

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.1.1...v0.2.0

v0.1.1

2 years ago
  • update Docker image
  • update documentation

v0.1.0

2 years ago
  • switch from a proof of concept to a library
  • add support for the TensorRT Python API (for best performance)
  • improve documentation (separate the Hugging Face Infinity material from the main docs, add benchmarks, etc.)
  • fix issues with mixed precision
  • add a license
  • add tests, GitHub Actions, a Makefile
  • change the way the Docker image is built

v0.0.1

2 years ago

All the scripts to reproduce https://medium.com/p/e1be0057a51c