Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
Implementation of ST-MoE, the final incarnation of mixture of experts after years of research at Brain, in Pytorch. Will be largely a transcription of the official Mesh Tensorflow implementation. If you have any papers you think should be added, while I have my attention on mixture of experts, please open an issue.
@inproceedings{Zoph2022STMoEDS,
title = {ST-MoE: Designing Stable and Transferable Sparse Expert Models},
author = {Barret Zoph and Irwan Bello and Sameer Kumar and Nan Du and Yanping Huang and Jeff Dean and Noam M. Shazeer and William Fedus},
year = {2022}
}