Sequence-Parallel Attention for Long-Context LLM Training and Inference
Sequence-parallel attention adopting a hybrid Ulysses and ring attention approach. Supports GQA and packed QKV.
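The core idea behind the Ulysses half of the hybrid can be illustrated without any distributed machinery: an all-to-all exchange turns a sequence-sharded activation into a head-sharded one, so each rank can then run ordinary full-sequence attention over its subset of heads. The sketch below simulates that resharding in a single process with NumPy; the shapes, rank loop, and variable names are illustrative only (a real implementation would use `torch.distributed.all_to_all` across actual ranks).

```python
import numpy as np

# Hypothetical sizes: world size, sequence length, num heads, head dim.
P, S, H, D = 2, 8, 4, 3
x = np.arange(S * H * D, dtype=np.float32).reshape(S, H, D)

# Step 1: sequence-parallel layout -- rank r holds a contiguous slice
# of the sequence dimension, with all H heads.
seq_shards = [x[r * S // P:(r + 1) * S // P] for r in range(P)]

# Step 2: simulated all-to-all -- each rank sends head group g of its
# local sequence chunk to rank g. Afterward rank g holds the FULL
# sequence but only H // P heads, so it can attend over all positions.
head_shards = [
    np.concatenate(
        [shard[:, g * H // P:(g + 1) * H // P] for shard in seq_shards],
        axis=0,
    )
    for g in range(P)
]

# Sanity check: the exchange is just a re-sharding of the same tensor.
for g in range(P):
    assert np.array_equal(head_shards[g], x[:, g * H // P:(g + 1) * H // P])
```

The ring-attention half covers the complementary case: when the head count is too small to split further (as with GQA), the sequence stays sharded and key/value blocks circulate around a ring of ranks instead.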