Kernl lets you run PyTorch transformer models several times faster on GP...
Fast, differentiable sorting and ranking in PyTorch
Row-major matmul optimization