PyTorch Project Specification.
Calculate token/s & GPU memory requirement for any LLM. Supports llama....
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization tec...
An Open-Source Package for Deep Learning to Hash (DeepHash)
Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference wit...
QKeras: a quantization deep learning library for Tensorflow Keras
Awesome machine learning model compression research papers, tools, and l...
Infrastructures™ for Machine Learning Training/Inference in Production.
Neural network model repository for highly sparse and sparse-quantized m...
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed ...
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups ...
BEVFormer inference on TensorRT, including INT8 Quantization and Custom ...
Everything in Torch FX (torch.fx)
LLaMA/RWKV ONNX models, quantization, and test cases