SF MoE DG

GMoE could be the next backbone model for many kinds of generalization tasks.


Welcome to Generalizable Mixture-of-Experts for Domain Generalization

🔥 Our paper, Sparse Mixture-of-Experts are Domain Generalizable Learners, has been accepted to ICLR 2023 for oral presentation.

🔥 The GMoE-S/16 model currently ranks first on multiple DG datasets among models trained without extra pre-training data. (Our GMoE-S/16 is initialized from DeiT-S/16, which was pretrained only on ImageNet-1K 2012.)

Wondering why GMoEs have astonishing performance? 🤯 Let's investigate the generalization ability of the model architecture itself and see the great potential of the Sparse Mixture-of-Experts (MoE) architecture.
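For readers new to sparse MoE layers, here is a minimal sketch of generic top-k gated routing in plain Python. It is an illustration only and does not reproduce the GMoE router or the Tutel implementation; `sparse_moe_forward`, `gate_weights`, and the toy experts are hypothetical names invented for this example.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe_forward(
    token: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route one token to its top-k experts and mix their outputs.

    gate_weights[i] is the (hypothetical) gating vector of expert i;
    the routing score is its dot product with the token.
    """
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    gates = softmax([scores[i] for i in top])
    output = [0.0] * len(token)
    for gate, idx in zip(gates, top):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output


# Toy usage: three experts that scale the token by different factors.
toy_experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0)]
toy_gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
mixed = sparse_moe_forward([1.0, 1.0], toy_gates, toy_experts, k=2)
```

Only `k` of the experts run for a given token, so a sparse layer can hold many more parameters than it spends compute on per token.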


pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

python3 -m pip uninstall tutel -y
python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main

pip3 install -r requirements.txt


python3 -m domainbed.scripts.download \


Environment used for the main experiments in the paper, run on an NVIDIA V100 GPU:

	Python: 3.9.12
	PyTorch: 1.12.0+cu116
	Torchvision: 0.13.0+cu116
	CUDA: 11.6
	CUDNN: 8302
	NumPy: 1.19.5
	PIL: 9.2.0
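To compare a local setup against the versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: it reads installed package metadata through the standard library, assumes the packages were installed with pip under the distribution names `torch`, `torchvision`, `numpy`, and `Pillow`, and does not check Python, CUDA, or cuDNN.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the list above. "Pillow" is the PyPI distribution
# that provides the "PIL" module.
EXPECTED = {
    "torch": "1.12.0+cu116",
    "torchvision": "0.13.0+cu116",
    "numpy": "1.19.5",
    "Pillow": "9.2.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def compare_environment(
    expected: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (expected, installed, versions_match)."""
    report = {}
    for dist, want in expected.items():
        have = installed_version(dist)
        report[dist] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for dist, (want, have, ok) in compare_environment(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{dist:12s} expected {want:14s} installed {have or 'missing':14s} {status}")
```

A mismatch does not necessarily mean training will fail; it only flags a difference from the environment reported for the paper.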

Start Training

Train a model:

python3 -m domainbed.scripts.train \
       --algorithm GMOE \
       --dataset OfficeHome \
       --test_env 2
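Domain generalization results are usually reported with each domain held out in turn. The sketch below builds the training command above once per test environment. It assumes OfficeHome's four domains map to test environment indices 0 to 3, and the helper names `build_train_command` and `build_sweep` are hypothetical, not part of the repository.

```python
import shlex
import subprocess
from typing import List


def build_train_command(algorithm: str, dataset: str, test_env: int) -> List[str]:
    """Assemble the training command shown above as an argument list."""
    return [
        "python3", "-m", "domainbed.scripts.train",
        "--algorithm", algorithm,
        "--dataset", dataset,
        "--test_env", str(test_env),
    ]


def build_sweep(algorithm: str, dataset: str, num_envs: int) -> List[List[str]]:
    """One command per held-out environment, indices 0..num_envs-1."""
    return [build_train_command(algorithm, dataset, env) for env in range(num_envs)]


def run_sweep(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; launch it sequentially unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumption: OfficeHome has four domains, so test environments are 0..3.
    run_sweep(build_sweep("GMOE", "OfficeHome", num_envs=4), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to inspect them before launching long training jobs.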


We put hparams for each dataset into


You only need to choose --algorithm and --dataset; the optimal hparams for that dataset are loaded automatically.


This source code is released under the MIT license, included here.


The MoE module is built on Tutel MoE.

Open Source Agenda is not affiliated with "SF MoE DG" Project. README Source: Luodian/Generalizable-Mixture-of-Experts
