[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
[ICLR 2020] Once for All: Train One Network and Specialize it for Effici...
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Ta...
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile...
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed ...
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language...
A DNN inference latency prediction toolkit for accurately modeling and p...
[CVPR 2020] ZeroQ: A Novel Zero Shot Quantization Framework
[ICML 2021, Oral] I-BERT: Integer-only BERT Quantization
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Q...
Efficient 3D Backbone Network for Temporal Modeling
[ICCV 2019] Harmonious Bottleneck on Two Orthogonal Dimensions, surpassi...
[KDD 2022] Learned Token Pruning for Transformers
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural N...