Implementation of Mega, the Single-head Attention with Multi-headed EMA (exponential moving average) architecture, which as of this release holds state-of-the-art results on the Long Range Arena benchmark.
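For readers unfamiliar with the "multi-headed EMA" part, here is a minimal pure-Python sketch of a damped exponential moving average applied per head and then mixed back into one sequence. It is an illustration only, not this package's API: the function names `damped_ema` and `multi_head_ema` and the parameters `alphas`, `deltas`, `betas`, and `etas` are hypothetical, and it assumes the recurrence `y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}` with a zero initial state.

```python
from typing import List


def damped_ema(xs: List[float], alpha: float, delta: float) -> List[float]:
    """Damped EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}, starting from 0."""
    ys = []
    prev = 0.0  # assumed zero initial state
    for x in xs:
        prev = alpha * x + (1.0 - alpha * delta) * prev
        ys.append(prev)
    return ys


def multi_head_ema(
    xs: List[float],
    alphas: List[float],
    deltas: List[float],
    betas: List[float],
    etas: List[float],
) -> List[float]:
    """Run one damped EMA per head, then mix the heads with weights `etas`."""
    heads = [
        # each head sees the input scaled by its own expansion weight `b`
        damped_ema([b * x for x in xs], a, d)
        for a, d, b in zip(alphas, deltas, betas)
    ]
    # weighted sum over heads at every time step
    return [sum(e * h[t] for e, h in zip(etas, heads)) for t in range(len(xs))]


if __name__ == "__main__":
    out = multi_head_ema(
        xs=[1.0, 1.0],
        alphas=[0.5, 1.0],
        deltas=[1.0, 1.0],
        betas=[1.0, 1.0],
        etas=[1.0, 1.0],
    )
    print(out)
```

Each head uses its own decay, so different heads can track the input at different time scales. In the sketch the EMA output is simply returned, and the single-head attention step is left out.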
Full Changelog: https://github.com/lucidrains/Mega-pytorch/compare/0.0.6...0.0.7