Transformer Deploy Versions

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

v0.4.0

2 years ago
  • add support for decoder-based models (GPT-2) on both ONNX Runtime and TensorRT
  • refactor the Triton configuration generation (simplification)
  • add GPT-2 model documentation (notebook)
  • fix the CPU quantization benchmark (it was not using the quantized model)
  • fix a sentence-transformers bug
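
Decoder-based models like GPT-2 are served differently from encoder models because they generate tokens autoregressively, calling the model once per new token. A minimal sketch of that greedy decoding loop, where `next_token` is a hypothetical stand-in for a real model call (an ONNX Runtime or TensorRT session), not transformer-deploy code:

```python
# Sketch of the autoregressive (greedy) decoding loop run by decoder models
# such as GPT-2 at inference time. `next_token` is a toy stand-in for a real
# model forward pass returning the next token id.

def next_token(tokens):
    # Toy "model": deterministically emits the sum of the inputs modulo 50.
    return sum(tokens) % 50

def generate(prompt_tokens, max_new_tokens, eos_token=0):
    """Append one token per model call until EOS or the budget is reached."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == eos_token:
            break
    return tokens
```

Each iteration feeds the full sequence back into the model, which is why serving such models efficiently (e.g. with key/value caching) needs dedicated support in the inference engines.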

v0.3.0

2 years ago

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.2.0...v0.3.0

v0.2.0

2 years ago
  • support INT8 GPU quantization
  • add a tutorial on performing quantization end to end
  • add the QDQRoberta model
  • switch to ONNX opset 13
  • refactor the TensorRT engine creation
  • fix bugs
  • add auth token support (for private Hugging Face repositories)
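
To make the INT8 quantization item concrete, here is the arithmetic behind symmetric INT8 quantization: floats are scaled, rounded, and clamped to [-128, 127], then mapped back at (approximately) their original values. This is only an illustration of the principle, not transformer-deploy's API:

```python
# Illustration of symmetric INT8 quantization arithmetic: scale, round,
# clamp to the int8 range, then dequantize back to approximate floats.

def quantize_int8(values, scale):
    """Map float values to int8 using a symmetric per-tensor scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q_values, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in q_values]

weights = [0.5, -1.0, 0.126, 2.0]
# Symmetric scale: the largest magnitude maps to 127.
scale = max(abs(w) for w in weights) / 127

q = quantize_int8(weights, scale)
approx = dequantize_int8(q, scale)
```

The rounding error introduced here is the accuracy cost that quantization-aware models such as QDQRoberta (which carry explicit quantize/dequantize nodes) are designed to keep small.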

Full Changelog: https://github.com/ELS-RD/transformer-deploy/compare/v0.1.1...v0.2.0

v0.1.1

2 years ago
  • update Docker image
  • update documentation

v0.1.0

2 years ago
  • switch from a proof of concept to a library
  • add support for the TensorRT Python API (for best performance)
  • improve documentation (separate the Hugging Face Infinity material from the main docs, add benchmarks, etc.)
  • fix issues with mixed precision
  • add a license
  • add tests, GitHub Actions, a Makefile
  • change the way the Docker image is built

v0.0.1

2 years ago

All the scripts to reproduce https://medium.com/p/e1be0057a51c