[NeurIPS'23] Speculative Decoding with Big Little Decoder
This repo implements Speculative Decoding with Big Little Decoder (BiLD) on top of the HuggingFace framework.
Check out the paper for more details.
Big Little Decoder is a simple framework that enables faster generative inference. It can accelerate text generation by ~2x without compromising generation quality across a variety of text generation scenarios. Furthermore, it is a simple plug-and-play solution that requires no training or architecture redesign.
Here's the key underlying idea:
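As described in the paper, a small model generates tokens autoregressively and hands control to the large model (fallback) when its top-token probability drops below the fallback threshold. The large model then scores the small model's not-yet-checked tokens in a single non-autoregressive pass. If any of those tokens is too far from the large model's own prediction (rollback threshold), that token and everything after it are discarded. The sketch below illustrates these two policies with greedy decoding in plain Python. It is not the code in `run_bild_translation.py`. The "models" are hypothetical functions mapping a token prefix to a next-token probability list, and `bild_generate`, `toy_small` and `toy_large` are illustrative names. The rollback distance is assumed to be the cross-entropy `-log p_L(y_S)` of the small model's token under the large model's prediction.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token prefix to a probability
# distribution (a list summing to 1) over the vocabulary for the next token.
NextTokenFn = Callable[[Sequence[int]], List[float]]


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda i: probs[i])


def bild_generate(small: NextTokenFn, large: NextTokenFn, prompt: Sequence[int],
                  max_new_tokens: int, fallback_threshold: float,
                  rollback_threshold: float) -> List[int]:
    """Greedy BiLD-style decoding sketch (no EOS handling, no KV cache)."""
    tokens = list(prompt)
    # Tokens before this index are the prompt or were checked/produced by the large model.
    checked = len(tokens)
    while len(tokens) - len(prompt) < max_new_tokens:
        probs = small(tokens)
        # Fallback policy: keep the small model's token only if it is confident.
        if max(probs) >= fallback_threshold:
            tokens.append(argmax(probs))
            continue

        # Fallback triggered: the small model's unsure token is dropped and the
        # large model re-scores every small-model token it has not seen yet.
        # A real implementation does this in one non-autoregressive forward
        # pass; calling `large` per position here is only for clarity.
        rolled_back = False
        for i in range(checked, len(tokens)):
            p_large = large(tokens[:i])
            # Rollback policy (assumed distance): -log p_L(y_S).
            distance = -math.log(max(p_large[tokens[i]], 1e-12))
            if distance > rollback_threshold:
                # Discard this token and everything after it, and use the
                # large model's prediction at this position instead.
                tokens = tokens[:i] + [argmax(p_large)]
                rolled_back = True
                break
        if not rolled_back:
            # No rollback: the large model predicts the next token itself.
            tokens.append(argmax(large(tokens)))
        checked = len(tokens)
    return tokens[len(prompt):]


# Toy usage with hand-written distributions over a 3-token vocabulary.
def toy_small(prefix: Sequence[int]) -> List[float]:
    if len(prefix) in (1, 2):
        return [0.1, 0.8, 0.1]  # confident: predicts token 1
    if len(prefix) == 3:
        return [0.4, 0.3, 0.3]  # unsure: triggers fallback when FB = 0.5
    return [0.1, 0.1, 0.8]      # confident: predicts token 2


def toy_large(prefix: Sequence[int]) -> List[float]:
    return [0.05, 0.05, 0.9]    # strongly disagrees with token 1


print(bild_generate(toy_small, toy_large, [0], max_new_tokens=4,
                    fallback_threshold=0.5, rollback_threshold=2.0))  # [2, 2, 2, 2]
```

In this sketch, raising the rollback threshold (e.g. to 5.0) makes the large model accept the small model's tokens more often, and with `fallback_threshold=0.0` the large model is never invoked, so the output is simply the small model's greedy decode.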
You need to prepare your own large and small models. You can either use HuggingFace's pretrained models or finetune them on your target tasks. Please refer to HuggingFace's official instructions for more details on loading and/or finetuning pretrained models.
We provide a script that evaluates BiLD on machine translation tasks: examples/pytorch/run_bild_translation.py.
BiLD evaluation command:
CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model bild --small [small_model_path] --large [large_model_path] \
--dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en --bild_rollback [RB] --bild_fallback [FB]
[small_model_path] and [large_model_path] are the paths to the small and the large model, respectively (prepared as a prerequisite). [RB] is the rollback threshold (values from 2 to 5 normally work well). [FB] is the fallback threshold, which can take a value from 0 to 1. For more details on these two hyperparameters, please refer to our paper.

We also provide a command for running the baseline model:
CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model [model_path] \
--dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en
[model_path] is the path to the baseline model (e.g., [small_model_path] or [large_model_path]).

We provide finetuned checkpoints that were used for the evaluations in our paper:
Dataset | Model | Link |
---|---|---|
IWSLT-2017-De-En | mT5-small | link |
IWSLT-2017-De-En | mT5-small (aligned) | link |
IWSLT-2017-De-En | mT5-large | link |
WMT-2014-De-En | mT5-small | link |
WMT-2014-De-En | mT5-small (aligned) | link |
WMT-2014-De-En | mT5-large | link |
XSUM | T5-small | link |
XSUM | T5-small (aligned) | link |
XSUM | T5-large | link |
CNNDM | T5-small | link |
CNNDM | T5-small (aligned) | link |
CNNDM | T5-large | link |