Sequence-Parallel Attention for Long-Context LLM Training and Inference
Sequence-parallel attention adopting a hybrid Ulysses and ring attention approach. Supports GQA and packed QKV.
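The core idea behind the Ulysses half of the hybrid can be illustrated without any distributed machinery: an all-to-all exchange turns a sequence-sharded activation into a head-sharded one, so each rank can then run ordinary full-sequence attention over its subset of heads. The sketch below simulates that resharding in a single process with NumPy; the shapes, rank loop, and variable names are illustrative only (a real implementation would use `torch.distributed.all_to_all` across actual ranks).

```python
import numpy as np

# Hypothetical sizes: world size, sequence length, num heads, head dim.
P, S, H, D = 2, 8, 4, 3
x = np.arange(S * H * D, dtype=np.float32).reshape(S, H, D)

# Step 1: sequence-parallel layout -- rank r holds a contiguous slice
# of the sequence dimension, with all H heads.
seq_shards = [x[r * S // P:(r + 1) * S // P] for r in range(P)]

# Step 2: simulated all-to-all -- each rank sends head group g of its
# local sequence chunk to rank g. Afterward rank g holds the FULL
# sequence but only H // P heads, so it can attend over all positions.
head_shards = [
    np.concatenate(
        [shard[:, g * H // P:(g + 1) * H // P] for shard in seq_shards],
        axis=0,
    )
    for g in range(P)
]

# Sanity check: the exchange is just a re-sharding of the same tensor.
for g in range(P):
    assert np.array_equal(head_shards[g], x[:, g * H // P:(g + 1) * H // P])
```

The ring-attention half covers the complementary case: when the head count is too small to split further (as with GQA), the sequence stays sharded and key/value blocks circulate around a ring of ranks instead.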