PyTorch Project Specification.
Calculate token/s & GPU memory requirement for any LLM. Supports llama....
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization tec...
An Open-Source Package for Deep Learning to Hash (DeepHash)
Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference wit...
QKeras: a quantization deep learning library for Tensorflow Keras
Awesome machine learning model compression research papers, tools, and l...
Infrastructures™ for Machine Learning Training/Inference in Production.
Neural network model repository for highly sparse and sparse-quantized m...
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed ...
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups ...
BEVFormer inference on TensorRT, including INT8 Quantization and Custom ...
Everything in Torch FX (torch.fx)
LLaMA/RWKV ONNX models, quantization, and test cases