Efficient AI Backbones including GhostNet, TNT and MLP, developed by Hua...
LLMCompiler: An LLM Compiler for Parallel Function Calling
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Lea...
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
SqueezeLLM: Dense-and-Sparse Quantization
Learning Efficient Convolutional Networks through Network Slimming, In I...
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and...
Deep Face Model Compression
[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolu...
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Q...
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Ba...
Soft Threshold Weight Reparameterization for Learnable Sparsity
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Explorations into some recent techniques surrounding speculative decoding