🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFl...
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented...
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Mo...
End-to-end ASR/LM implementation with PyTorch
XLNet for generating language.
Clean baseline implementation of PPO using an episodic TransformerXL memory
Absolutely amazing SOTA Google Colab (Jupyter) Notebooks for creating/tr...
Transformer-XL with checkpoint loader
Custom CUDA kernel for {2, 3}D relative attention with PyTorch wrapper
[ACL'20] Highway Transformer: A Gated Transformer.