AutoGPTQ Versions

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
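For orientation, here is a minimal sketch of the package's documented quantize-and-save flow; the model name and calibration text are illustrative placeholders, and exact signatures may vary between the versions listed below.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_name = "facebook/opt-125m"  # placeholder: any supported causal LM

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
# Calibration examples: a list of tokenized texts used by the GPTQ algorithm.
examples = [tokenizer("auto_gptq is an easy-to-use LLM quantization package.")]

quantize_config = BaseQuantizeConfig(
    bits=4,         # quantize weights to 4-bit
    group_size=128  # group size for quantization
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_name, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```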

v0.2.2

11 months ago
  • Fix the autogptq_cuda directory missing from the distribution file

v0.2.1

11 months ago

Fix installation from PyPI failing when the environment variable CUDA_VERSION is set.

v0.2.0

11 months ago

Happy International Children's Day! 🎈 At the age of LLMs and the dawn of AGI, may we always be curious like children, with vigorous energy and courage to explore the bright future.

Features Summary

A bunch of new features have been added in this version:

  • Optimized modules for faster inference: fused attention for llama and gptj, fused MLP for llama
  • Full CPU offloading
  • Multi-GPU inference with the Triton backend
  • Three new model types are supported: codegen, gpt_bigcode, and falcon
  • Support for downloading/uploading quantized models from/to the HF Hub (see the sketch after this list)
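As a sketch of how these features combine at load time: the keyword names below follow the project's documentation for this release line, but treat the exact signature and the Hub repo id as assumptions.

```python
from auto_gptq import AutoGPTQForCausalLM

# Load a quantized model straight from the HF Hub (repo id is a placeholder).
model = AutoGPTQForCausalLM.from_quantized(
    "someuser/llama-7b-4bit-gptq",            # placeholder Hub repo id
    use_triton=True,                          # Triton backend, which also enables multi-GPU inference
    inject_fused_attention=True,              # fused attention for llama and gptj
    inject_fused_mlp=True,                    # fused MLP for llama (Triton backend only)
    max_memory={0: "6GiB", "cpu": "16GiB"},   # spill remaining layers to CPU (CPU offloading)
)
```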

Change Log

Below is the detailed change log:

New Contributors

Following are the new contributors and their first PRs. Thank you very much for your love of auto_gptq and your contributions! ❤️

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.1.0...v0.2.0

v0.1.0

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.5...v0.1.0

v0.0.5

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.4...v0.0.5

v0.0.4

1 year ago

Big News

  • Triton is officially supported starting from this version (see the sketch below)!
  • Quick installation from PyPI using pip install auto-gptq is supported starting from this version!
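A minimal sketch of switching on the new backend; the model path is a placeholder, and the use_triton keyword follows the project's documented API, though its exact shape in this early version is an assumption.

```python
from auto_gptq import AutoGPTQForCausalLM

# use_triton=True routes the quantized matmul kernels through Triton
# instead of the CUDA extension.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",  # placeholder local directory
    device="cuda:0",
    use_triton=True,
)
```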

What's Changed

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.3...v0.0.4

v0.0.3

1 year ago

What's Changed

  • Fix a typo in README.md
  • Fix a problem where some models' max sequence length couldn't be retrieved
  • Fix a problem where some models require more positional arguments when forwarding through transformer layers
  • Fix a mismatch with GPTNeoxForCausalLM's lm_head

New Contributors

v0.0.2

1 year ago
  • Added the eval_tasks module to support evaluating a model's performance on predefined downstream tasks before and after quantization
  • Fixed some bugs when using the LLaMA model
  • Fixed some bugs when using models that require position_ids