Best 21 Efficient Inference Open Source Projects

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Hua...

LLMCompiler: An LLM Compiler for Parallel Function Calling

Code for paper "AdderNet: Do We Really Need Multiplications in Deep Lea...

EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

SqueezeLLM: Dense-and-Sparse Quantization

Learning Efficient Convolutional Networks through Network Slimming, In I...

"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and...

Deep Face Model Compression

[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolu...

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Q...

[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Ba...

Soft Threshold Weight Reparameterization for Learnable Sparsity

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Explorations into some recent techniques surrounding speculative decoding