My take on a practical implementation of Linformer for PyTorch.
Have not pushed a release in a while; this is the latest working version, with two miscellaneous bugs fixed.
Added intermediate ff dimension
The model dimension can now differ in the intermediate layers. This change applies to the ff module, and only in the encoder. If the ff_intermediate flag
is not None, the layers will look like this:
channels -> ff_dim -> ff_intermediate (For layer 1)
ff_intermediate -> ff_dim -> ff_intermediate (For layers 2 to depth-1)
ff_intermediate -> ff_dim -> channels (For layer depth)
As opposed to
channels -> ff_dim -> channels (For all layers)
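The dimension flow above can be sketched with a small helper. Note this is an illustration, not the repo's actual code; the function name `ff_layer_dims` and its signature are made up for this example.

```python
def ff_layer_dims(channels, ff_dim, depth, ff_intermediate=None):
    """Hypothetical helper: (input, hidden, output) dims of each
    encoder ff block.

    With ff_intermediate set, only the first layer takes `channels` in
    and only the last layer maps back out to `channels`; every layer
    expands to ff_dim internally. With ff_intermediate=None, every
    layer is channels -> ff_dim -> channels.
    """
    dims = []
    for layer in range(depth):
        if ff_intermediate is None:
            dims.append((channels, ff_dim, channels))
        else:
            d_in = channels if layer == 0 else ff_intermediate
            d_out = channels if layer == depth - 1 else ff_intermediate
            dims.append((d_in, ff_dim, d_out))
    return dims

# With depth=3, channels=64, ff_dim=256, ff_intermediate=128:
# [(64, 256, 128), (128, 256, 128), (128, 256, 64)]
```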
The Linformer now supports convolution as a way to downsample the input, instead of relying on linear layers. This may reduce the number of parameters needed.
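To see why a convolution can be cheaper, here is a sketch (an assumption about the general idea, not the repo's exact implementation) comparing the two ways of shrinking a (batch, n, d) sequence of keys/values down to length k:

```python
import torch
import torch.nn as nn

n, k, d = 512, 64, 32
x = torch.randn(2, n, d)

# Linear projection along the sequence axis: an (n x k) weight, so the
# parameter count grows with the sequence length n.
proj = nn.Linear(n, k, bias=False)
y_linear = proj(x.transpose(1, 2)).transpose(1, 2)   # (2, k, d)

# Strided convolution over the sequence axis with kernel = stride = n // k:
# the parameter count depends only on d and the kernel size, not on n.
conv = nn.Conv1d(d, d, kernel_size=n // k, stride=n // k)
y_conv = conv(x.transpose(1, 2)).transpose(1, 2)     # (2, k, d)

linear_params = sum(p.numel() for p in proj.parameters())  # 512 * 64 = 32768
conv_params = sum(p.numel() for p in conv.parameters())    # 32*32*8 + 32 = 8224
```

Both produce a length-k sequence, but the convolution's cost no longer scales with n.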
Finished an encoder and a decoder module. Causal attention also works, when the causal=True
flag is set. Will update the README shortly...
Added masking to the Linformer. However, this is still a WIP, since masking cannot be done in the traditional sense, like in the "Attention Is All You Need" paper, because that would add the overhead of another (n,n)
matrix, which is infeasible.
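A short sketch of why the usual (n,n) mask does not transfer: after the Linformer projection, the score matrix is (n,k), and each of its k columns is a mixture of all n positions rather than a single one, so a triangular mask no longer makes sense. (The shapes below follow the paper's E projection; this is illustrative, not the repo's code.)

```python
import torch

n, k, d = 512, 64, 32
q = torch.randn(n, d)
kv = torch.randn(n, d)
E = torch.randn(n, k)                  # learned downsampling projection

scores_full = q @ kv.t()               # (n, n): standard attention scores
scores_lin = q @ (E.t() @ kv).t()      # (n, k): Linformer scores

# Standard attention: a triangular (n, n) causal mask lines up with the
# score matrix, one column per key position.
mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
masked_full = scores_full.masked_fill(~mask, float('-inf'))

# The same trick fails for the (n, k) scores: the shapes no longer line
# up, and even a resized mask would be meaningless, since every projected
# key column mixes past and future positions.
```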
The repo now supports an encoder and a decoder.
TODO: Masking
Fixed a bug with the sequencing of the Linformer. It should now train properly.
An LM model is now available, for language modeling tasks.
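Wrapping a sequence encoder for language modeling generally looks like the sketch below: token and positional embeddings in, per-position vocabulary logits out. The class name `SketchLM` and its constructor arguments are invented for illustration and do not reflect the repo's actual API.

```python
import torch
import torch.nn as nn

class SketchLM(nn.Module):
    """Minimal LM wrapper sketch: embeddings -> encoder -> vocab logits."""

    def __init__(self, num_tokens, seq_len, channels, encoder):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, channels)
        self.pos_emb = nn.Embedding(seq_len, channels)
        self.encoder = encoder                # e.g. a Linformer encoder
        self.to_logits = nn.Linear(channels, num_tokens)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        pos = torch.arange(tokens.shape[1], device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        x = self.encoder(x)                   # (batch, seq_len, channels)
        return self.to_logits(x)              # (batch, seq_len, num_tokens)

# nn.Identity() stands in for the real encoder, just to show the shapes.
lm = SketchLM(num_tokens=100, seq_len=16, channels=32, encoder=nn.Identity())
logits = lm(torch.randint(0, 100, (2, 16)))   # (2, 16, 100)
```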
Rebased the code so it looks better, and added the option to plot the MHAttention module as well as the Linformer module
Check out pull request #7 to see the changes.