A list of efficient attention modules
My take on a practical implementation of Linformer for PyTorch.
Reproducing the Linear Multihead Attention introduced in the Linformer paper...
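
For orientation, here is a minimal single-head sketch of the idea behind Linformer-style linear attention: keys and values are projected along the sequence axis from length n down to a fixed k, so the attention matrix is n x k instead of n x n. This is an illustration only, not the code of any project listed here; the class name `LinearSelfAttention` and the parameters `seq_len` and `k` are hypothetical, and multi-head splitting, masking, and dropout are left out.

```python
import math

import torch
import torch.nn as nn


class LinearSelfAttention(nn.Module):
    """Single-head Linformer-style attention (illustrative sketch only)."""

    def __init__(self, dim: int, seq_len: int, k: int):
        super().__init__()
        self.scale = 1.0 / math.sqrt(dim)
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Learned projections that compress the sequence axis from n to k.
        # The initialization scale here is an assumption, not from the paper.
        self.proj_k = nn.Parameter(torch.randn(k, seq_len) / math.sqrt(seq_len))
        self.proj_v = nn.Parameter(torch.randn(k, seq_len) / math.sqrt(seq_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim) with n == seq_len
        q = self.to_q(x)                     # (batch, n, dim)
        keys = self.proj_k @ self.to_k(x)    # (k, n) @ (batch, n, dim) -> (batch, k, dim)
        values = self.proj_v @ self.to_v(x)  # (batch, k, dim)
        scores = (q @ keys.transpose(-2, -1)) * self.scale  # (batch, n, k)
        attn = scores.softmax(dim=-1)        # each query attends over k slots
        return attn @ values                 # (batch, n, dim)


layer = LinearSelfAttention(dim=16, seq_len=32, k=8)
out = layer(torch.randn(2, 32, 16))
print(out.shape)
```

Because the projection matrices have shape (k, seq_len), this sketch only accepts inputs of exactly the configured sequence length; shorter inputs would need padding.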