AutoGPTQ Versions

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
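For orientation, here is a minimal sketch of the package's documented quantize-and-save flow; the model name and calibration text are illustrative placeholders, and exact signatures may vary between the versions listed below.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_name = "facebook/opt-125m"  # placeholder: any supported causal LM

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
# Calibration examples: a list of tokenized texts used by the GPTQ algorithm.
examples = [tokenizer("auto_gptq is an easy-to-use LLM quantization package.")]

quantize_config = BaseQuantizeConfig(
    bits=4,         # quantize weights to 4-bit
    group_size=128  # group size for quantization
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_name, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```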

v0.2.2

11 months ago
  • Fix the autogptq_cuda directory missing from the distribution file

v0.2.1

11 months ago

Fix installation from PyPI failing when the environment variable CUDA_VERSION is set.

v0.2.0

11 months ago

Happy International Children's Day! 🎈 At the age of LLMs and the dawn of AGI, may we always be curious like children, with vigorous energy and courage to explore the bright future.

Features Summary

A bunch of new features have been added in this version:

  • Optimized modules for faster inference: fused attention for llama and gptj, fused MLP for llama
  • Full CPU offloading
  • Multi-GPU inference with the Triton backend
  • Three new model types are supported: codegen, gpt_bigcode, and falcon
  • Support for downloading/uploading quantized models from/to the HF Hub (see the sketch after this list)
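As a sketch of how these features combine at load time: the keyword names below follow the project's documentation for this release line, but treat the exact signature and the Hub repo id as assumptions.

```python
from auto_gptq import AutoGPTQForCausalLM

# Load a quantized model straight from the HF Hub (repo id is a placeholder).
model = AutoGPTQForCausalLM.from_quantized(
    "someuser/llama-7b-4bit-gptq",            # placeholder Hub repo id
    use_triton=True,                          # Triton backend, which also enables multi-GPU inference
    inject_fused_attention=True,              # fused attention for llama and gptj
    inject_fused_mlp=True,                    # fused MLP for llama (Triton backend only)
    max_memory={0: "6GiB", "cpu": "16GiB"},   # spill remaining layers to CPU (CPU offloading)
)
```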

Change Log

Below is the detailed change log:

New Contributors

Following are the new contributors and their first PRs. Thank you very much for your love of auto_gptq and your contributions! ❤️

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.1.0...v0.2.0

v0.1.0

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.5...v0.1.0

v0.0.5

1 year ago

What's Changed

New Contributors

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.4...v0.0.5

v0.0.4

1 year ago

Big News

  • Triton is officially supported starting from this version (see the sketch below)!
  • Quick installation from PyPI using pip install auto-gptq is supported starting from this version!
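A minimal sketch of switching on the new backend; the model path is a placeholder, and the use_triton keyword follows the project's documented API, though its exact shape in this early version is an assumption.

```python
from auto_gptq import AutoGPTQForCausalLM

# use_triton=True routes the quantized matmul kernels through Triton
# instead of the CUDA extension.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",  # placeholder local directory
    device="cuda:0",
    use_triton=True,
)
```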

What's Changed

Full Changelog: https://github.com/PanQiWei/AutoGPTQ/compare/v0.0.3...v0.0.4

v0.0.3

1 year ago

What's Changed

  • Fix a typo in README.md
  • Fix a problem where some models' max sequence length couldn't be retrieved
  • Fix a problem where some models require more positional arguments when forwarding through transformer layers
  • Fix a mismatch with GPTNeoxForCausalLM's lm_head

New Contributors

v0.0.2

1 year ago
  • Added the eval_tasks module to support evaluating a model's performance on predefined downstream tasks before and after quantization
  • Fixed some bugs when using the LLaMA model
  • Fixed some bugs when using models that require position_ids