# Crosslingual Generalization through Multitask Finetuning

This repository provides an overview of all components used to create BLOOMZ, mT0 & xP3, introduced in the paper [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786).
Name | Explanation | Example models |
---|---|---|
xP3x | Mixture of 17 tasks in 277 languages with English prompts | WIP - Join us at Project Aya @C4AI to help! |
xP3 | Mixture of 13 training tasks in 46 languages with English prompts | BLOOMZ & mT0-13B |
xP3mt | Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English) | BLOOMZ-MT & mT0-13B-MT |
xP3all | xP3 plus our evaluation datasets, adding 3 further tasks for a total of 16 tasks in 46 languages with English prompts | |
xP3megds | Megatron-DeepSpeed processed version of xP3 | BLOOMZ |
P3 | Re-preprocessed version of the English-only P3 with 8 training tasks | BLOOMZ-P3 & mT0-13B-P3 |
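The datasets above store prompted examples as `inputs`/`targets` pairs. A rough sketch of how a prompt template turns a raw record into such a pair (the template wording and the record below are invented for illustration, not an actual promptsource template):

```python
# Illustrative sketch: applying an English prompt template to an XNLI-style
# record to produce an (inputs, targets) pair, as stored in xP3.
# The template text and record are hypothetical.
def apply_template(record):
    inputs = (
        f"{record['premise']}\nQuestion: Does this imply "
        f"\"{record['hypothesis']}\"? Yes, no, or maybe?"
    )
    # Map the integer label to the answer choices named in the prompt.
    targets = ["Yes", "Maybe", "No"][record["label"]]
    return {"inputs": inputs, "targets": targets}

record = {
    "premise": "The dog is sleeping on the sofa.",
    "hypothesis": "An animal is resting.",
    "label": 0,  # entailment
}
pair = apply_template(record)
print(pair["targets"])  # → Yes
```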
| Multitask finetuned on xP3. Recommended for prompting in English. |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| Multitask finetuned on xP3mt. Recommended for prompting in non-English. |||||||||||
| Finetuned Model | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models! |||||||||||
| Finetuned Model | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| Original pretrained checkpoints. Not recommended. |||||||||||
| Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
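The finetuned checkpoints are used like any other model on the HuggingFace Hub. A minimal sketch for a BLOOMZ model, assuming the standard `transformers` causal-LM API (the download is large, so the model call is gated behind an environment variable here; mT0 models would use the seq2seq classes instead):

```python
import os

# BLOOMZ & mT0 are finetuned on English prompts (xP3), so an English prompt
# around non-English content is the recommended style for these models.
prompt = "Translate to English: Je t'aime."

# Gated behind an env var: even bloomz-560m is a sizeable download.
if os.environ.get("RUN_BLOOMZ_DEMO"):
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "bigscience/bloomz-560m"
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt)
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))
```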
We have processed & uploaded xP3. If you want to recreate it, follow these steps:

1. Get the promptsource fork: for xP3mt, `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git`; for xP3, `git clone -b tr13 https://github.com/Muennighoff/promptsource.git`. Install it with `cd promptsource; pip install -e .`
2. Install the remaining packages: `pip install -q datasets iso-639`
3. Get the creation script & edit it if necessary: for xP3mt, set `USE_ENGLISH_PROMPTS = False` at the beginning; for xP3, set `USE_ENGLISH_PROMPTS = True` at the beginning.
4. Run the script, e.g. via `python prepare_xp3.py` or a SLURM script.
For xP3x, the new extension of xP3, the process is largely the same except:

1. Use the `xp3x` branch instead, i.e. `pip install git+https://github.com/Muennighoff/promptsource.git@xp3x`
2. The creation script is `create_xp3x.py`.

xP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).
To finetune BLOOM models on xP3 with Megatron-DeepSpeed:

1. Get the Megatron-DeepSpeed fork: `git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed` & follow its setup guide to create an environment with the necessary packages.
2. Set up the training data: download the `merged_{lang}.jsonl` files & preprocess them using the script here. We used the `xp3capmixnewcodelonglossseq` mixture.
3. Set up & adapt the training script. E.g. this is the script launched to train bloomz. Important parts of the script to modify are:
    - The `#SBATCH` variables, such as nodes, gpus, time, etc. Our SLURM guide is here.
    - `source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0`, which should point to your own conda environment set up via Megatron-DeepSpeed.
    - `TRAIN_DATA_PATH` & `VALID_DATA_PATH`, which point to files that in turn list your processed training & validation data. We provide our files in this repository (`xp3capmixnewcodelong_train.txt` & `xp3capmixnewcodelong_validation.txt`), but you will likely want to change the paths inside them. The percentage per language is based on each language's share of xP3, with code slightly upsampled.
4. To resume from a checkpoint with a changed configuration (e.g. a different parallelism layout), use the `--universal` flag in the script. We recommend saving a new checkpoint right after & then continuing training without `--universal`, which will be faster. This may also require the `--no-load-optim` & `--reset-progress` flags.
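The per-language percentages in those data files follow each language's share of xP3, with code slightly upsampled. A minimal sketch of that weighting logic (the byte counts and the upsampling factor below are invented for illustration, not the values used in the paper):

```python
# Sketch: derive normalized sampling weights proportional to per-language
# dataset sizes, upsampling one key (here "code") by a factor.
def sampling_weights(sizes, upsample=None, factor=1.0):
    """Return weights proportional to `sizes`, with the `upsample` key's
    size multiplied by `factor` before normalization."""
    adjusted = {
        lang: size * (factor if lang == upsample else 1.0)
        for lang, size in sizes.items()
    }
    total = sum(adjusted.values())
    return {lang: size / total for lang, size in adjusted.items()}

sizes = {"en": 60.0, "es": 20.0, "code": 10.0}  # hypothetical GB per language
weights = sampling_weights(sizes, upsample="code", factor=2.0)
print(weights)  # code's share doubles relative to its raw size
```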
To finetune mT5 models on xP3, follow the finetuning instructions here, making sure to use pretrained mT5 models & the xP3 dataset.
Evaluation results are all available in this repository: https://huggingface.co/datasets/bigscience/evaluation-results under the respective models. Below we explain how to run evaluation.
We evaluate the models with rank evaluation on XCOPA, XNLI, XStoryCloze & XWinograd:
1. Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git` & `cd promptsource; pip install -e .`
2. Get the t-zero fork: `git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git` & `cd t-zero; pip install -e .`
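Rank evaluation scores every answer choice with the model's log-likelihood and predicts the highest-scoring one; no free-form generation is involved. Stripped of the model call, the selection step is just an argmax over per-choice scores (the scores below are made up):

```python
def rank_classify(choice_log_likelihoods):
    """Return the index of the answer choice the model scores highest.

    In the real pipeline (promptsource + t-zero), each score is the
    log-probability the model assigns to that choice's tokens given the prompt.
    """
    return max(
        range(len(choice_log_likelihoods)),
        key=lambda i: choice_log_likelihoods[i],
    )

# Hypothetical per-choice scores for an XNLI example
# (choices: entailment, neutral, contradiction).
scores = [-4.2, -1.3, -7.9]
print(rank_classify(scores))  # → 1 (neutral)
```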
We evaluate generation on translation & summarization during training for validation:

1. Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource` & `cd promptsource; pip install -e .`
2. Get the lm-evaluation-harness fork: `git clone https://github.com/bigscience-workshop/lm-evaluation-harness`. The script for the 7.1B model, for example, is here.

We also evaluate code generation on HumanEval:

1. Get the code evaluation repository: `git clone https://github.com/loubnabnl/bloom-code-evaluation` & go through its setup.
2. Set `prepend_eos` to `False` in `code_eval.py` at the `complete_code` call, i.e. change `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs)` to `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs)`.
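HumanEval results are reported as pass@k, estimated from n generations per problem of which c pass the unit tests. The standard unbiased estimator (Chen et al., 2021) can be computed as:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn without replacement from n generations (c of them correct) passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 4 generations, 2 correct, a single draw passes half the time.
print(pass_at_k(4, 2, 1))  # → 0.5
```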
The plots & tables in the paper can be recreated from the following sources:

- xP3 taxonomy: `plotstables/xp3_taxonomy.drawio` & `plotstables/xp3_taxonomy.pdf`
- xP3 languages: `plotstables/xp3_languages.ipynb` & colab
- xP3 variants: `plotstables/xp3_variants.pdf` & drawings
- Generalization bar plot: `plotstables/xp3_generalization_bar.pdf` & colab
- Language generalization: `plotstables/lang_generalization` & colab
- Scale: `plotstables/scale.pdf` & colab
- Validation: `plotstables/validation.pdf` & colab
- Pretraining sizes: `plotstables/pretraining_sizes.pdf` & colab
- English task generalization: `plotstables/english_task_generalization.pdf` & colab
- Task generalization: `plotstables/task_generalization.pdf` & colab
- ROOTS & xP3 languages: `plotstables/roots_xp3_languages.pdf` & colab, requiring some of the files in `plotstables/contamination`
- Code examples: `plotstables/examples/bloom_code_example.py`, `plotstables/examples/bloom_code_light.pdf` & `plotstables/examples/bloomz_code_light.pdf`; the raw code files can be found here & here
- Generation examples: `plotstables/examples/*.pdf` & `plotstables/examples/generations.drawio`
- Generation lengths: `plotstables/compute_codegen_len.ipynb` for generations & `plotstables/countcode.py` for xP3
- Levenshtein distances: `plotstables/levenshtein.py`
If you use this work, please cite:

```bibtex
@article{muennighoff2022crosslingual,
  title={Crosslingual generalization through multitask finetuning},
  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
  journal={arXiv preprint arXiv:2211.01786},
  year={2022}
}
```