Making large AI models cheaper, faster and more accessible
gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen
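For intuition on a gradient checkpointing ratio: instead of checkpointing every layer or none, only a fraction of layers recompute their activations in backward. The sketch below is a minimal generic-PyTorch illustration of that idea, not ColossalAI's shardformer implementation; the class name and arguments are hypothetical.

```python
import math
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class RatioCheckpointedStack(nn.Module):
    """Illustrative only: checkpoint a fraction of a layer stack."""

    def __init__(self, layers: nn.ModuleList, gradient_checkpointing_ratio: float = 0.5):
        super().__init__()
        self.layers = layers
        # Checkpoint the first ceil(ratio * n) layers; run the rest normally.
        self.num_ckpt = math.ceil(gradient_checkpointing_ratio * len(layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            if self.training and i < self.num_ckpt:
                # Activations inside `layer` are discarded after forward and
                # recomputed during backward, trading compute for memory.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x

# Usage sketch: checkpoint half of a 12-layer stack.
stack = RatioCheckpointedStack(
    nn.ModuleList(nn.Linear(64, 64) for _ in range(12)),
    gradient_checkpointing_ratio=0.5,
)
```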
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.6...v0.3.7
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.5...v0.3.6
strict=False, fix llama flash attention forward, add flop estimation by Megatron in llama benchmark (#5017) by Elsa Granger
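For context on the strict=False behavior referenced above: in plain PyTorch, load_state_dict(strict=False) loads whatever keys match and reports, rather than raises on, mismatches. A minimal illustration with a toy model and made-up checkpoint keys:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))

# A checkpoint that is missing some keys and carries an extra one.
state = {
    "0.weight": torch.zeros(8, 8),
    "0.bias": torch.zeros(8),
    "extra.weight": torch.zeros(1),
}

# strict=False loads the matching tensors instead of raising an error.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['1.weight', '1.bias']
print(result.unexpected_keys)  # ['extra.weight']
```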
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.5
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.3...v0.3.4
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.2...v0.3.3
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.1...v0.3.2
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.0...v0.3.1
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.8...v0.3.0
Dockerfile (#3006) by Saurav Maheshkar
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.7...v0.2.8
torch.where (#2822) by Boyuan Yao
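For readers unfamiliar with the op itself: torch.where(cond, a, b) selects elementwise from a where the condition is true and from b otherwise. A minimal example:

```python
import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# Elementwise select: take from `a` where cond is True, else from `b`.
print(torch.where(cond, a, b))  # tensor([ 1., 20.,  3.])
```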
torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
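Likewise, a quick reminder of the two ops touched here: torch.tanh is an elementwise activation, and torch.nn.Dropout zeroes activations at random during training while acting as the identity in eval mode:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 4)
drop = nn.Dropout(p=0.5)

drop.train()
y = drop(torch.tanh(x))  # roughly half the entries zeroed, survivors scaled by 1/(1-p)

drop.eval()
z = drop(torch.tanh(x))  # identity in eval mode: z equals torch.tanh(x)
print(y)
print(z)
```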
torch.nn.Embedding (#2760) by Boyuan Yao
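And for torch.nn.Embedding, a lookup table mapping integer token ids to dense vectors; the sizes below are arbitrary:

```python
import torch
import torch.nn as nn

# 1000-entry lookup table, each entry a 64-dimensional vector.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)

ids = torch.randint(0, 1000, (8, 16))  # batch of 8 sequences, 16 tokens each
vectors = emb(ids)
print(vectors.shape)                   # torch.Size([8, 16, 64])
```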
Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.5...v0.2.6