ColossalAI Versions

Making large AI models cheaper, faster and more accessible

v0.3.7

1 week ago

What's Changed

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

Shardformer

  • [shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
  • [shardformer] fix chatglm implementation (#5644) by Hongxin Liu
  • [shardformer] remove useless code (#5645) by flybird11111
  • [shardformer] update transformers (#5583) by Wang Binluo
  • [shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
  • [shardformer] refactor embedding resize (#5603) by flybird11111
  • [shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
  • [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
  • [shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
  • [shardformer]Fix lm parallel. (#5480) by flybird11111
  • [shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111
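
Most of the shardformer changes above are consumed through HybridParallelPlugin flags rather than called directly. Below is a minimal sketch of turning them on; the keyword names are assumptions drawn from the PR titles and the plugin's documented options, so verify them against the 0.3.7 docs.

```python
# Hedged sketch: enabling shardformer-backed optimizations via HybridParallelPlugin.
# Flag names are assumptions based on the entries above; check the release docs.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = HybridParallelPlugin(
    tp_size=2,                         # tensor parallel degree (must divide world size)
    pp_size=1,
    precision="bf16",
    enable_flash_attention=True,       # colo attention with custom mask support (#5510)
    enable_fused_normalization=True,
    enable_sequence_parallelism=True,  # sequence parallelism optimization (#5533)
)
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(LlamaConfig(num_hidden_layers=2, hidden_size=512))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)  # model is now sharded
```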

Fix

  • [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao

Coloattention

  • [coloattention]modify coloattention (#5627) by flybird11111

Feature

  • [Feature] Support LLaMA-3 CPT and ST (#5619) by Tong Li

Zero

  • [zero] support multiple (partial) backward passes (#5596) by Hongxin Liu
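
The zero entry above lifts the usual one-backward-per-step restriction of the ZeRO optimizer. A minimal sketch of what that enables, assuming the standard Booster/LowLevelZeroPlugin wrapping; the tiny model and losses are placeholders.

```python
# Hedged sketch: two (partial) backward passes feeding a single optimizer step
# under ZeRO, as allowed by #5596. Run under `colossalai run` / `torchrun`.
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model = nn.Linear(32, 32).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

loss_a = model(torch.randn(8, 32, device="cuda")).float().mean()
booster.backward(loss_a, optimizer)   # first (partial) backward
loss_b = model(torch.randn(8, 32, device="cuda")).float().mean()
booster.backward(loss_b, optimizer)   # second backward before any step
optimizer.step()                      # one step over the accumulated gradients
optimizer.zero_grad()
```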

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen

Colossalchat

  • [ColossalChat] Update RLHF V2 (#5286) by YeAnbang

Format

  • [format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.7...v0.3.6

v0.3.6

2 months ago

What's Changed

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Hotfix

  • [hotfix] fix stable diffusion inference bug. (#5289) by Youngon
  • [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
  • [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
  • [hotfix] fix typo change _descrption to _description (#5331) by digger yu
  • [hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
  • [hotfix] fix sd vit import error (#5420) by MickeyCHAN
  • [hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
  • [hotfix] fix variable type for top_p (#5313) by CZYCW

Eval-hotfix

  • [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) by Dongruixuan Li

Devops

  • [devops] fix extension building (#5427) by Hongxin Liu

Example

  • [example]add gpt2 benchmark example script. (#5295) by flybird11111
  • [example] reuse flash attn patch (#5400) by Hongxin Liu

Workflow

  • [workflow] added pypi channel (#5412) by Frank Lee

Setup

  • [setup] fixed nightly release (#5388) by Frank Lee

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo

Extension

  • [extension] hotfix jit extension setup (#5402) by Hongxin Liu

Llama

  • [llama] fix training and inference scripts (#5384) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.6...v0.3.5

v0.3.5

2 months ago

What's Changed

Llama

  • Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
  • [llama] fix memory issue (#5371) by Hongxin Liu
  • [llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
  • [llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
  • [llama] add flash attn patch for npu (#5362) by Hongxin Liu
  • [llama] update training script (#5360) by Hongxin Liu
  • [llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu

Lr-scheduler

  • [lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu

Gemini

  • [gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
  • [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
  • [gemini]fix gemini optimizer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
  • [gemini] gemini support extra-dp (#5043) by flybird11111
  • [gemini] gemini support tensor parallelism. (#4942) by flybird11111
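
Several gemini entries above (tensor parallelism, extra-dp) surface as new GeminiPlugin arguments. A minimal sketch, assuming the tp_size / extra_dp_size keyword names suggested by the PR titles; HybridAdam is used because Gemini expects its own Adam variants.

```python
# Hedged sketch: GeminiPlugin with the tensor-parallel / extra-dp options above.
# tp_size and extra_dp_size are assumed keyword names; verify against the docs.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = GeminiPlugin(
    precision="bf16",
    placement_policy="auto",   # let Gemini manage chunk placement
    tp_size=2,                 # assumed: tensor parallelism inside Gemini (#4942)
    extra_dp_size=1,           # assumed: extra data-parallel groups (#5043)
)
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel(GPT2Config(n_layer=2))
optimizer = HybridAdam(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

ids = torch.randint(0, 50257, (2, 128), device="cuda")
loss = model(input_ids=ids, labels=ids).loss
booster.backward(loss, optimizer)
optimizer.step()
```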

Fix

  • [fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Chat

  • [Chat] fix sft loss nan (#5345) by YeAnbang

Extension

  • [extension] fixed exception catch (#5342) by Frank Lee

Doc

  • [doc] added docs for extensions (#5324) by Frank Lee
  • [doc] add llama2-13B display (#5285) by Desperado-Jia
  • [doc] fix doc typo (#5256) by binmakeswell
  • [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
  • [doc] SwiftInfer release (#5236) by binmakeswell
  • [doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
  • [doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
  • [doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
  • [doc] Update required third-party library list for testing and torch compatibility checking (#5207) by Zhongkai Zhao
  • [doc] update pytorch version in documents. (#5177) by flybird11111
  • [doc] fix colossalqa document (#5146) by Michelle
  • [doc] updated paper citation (#5131) by Frank Lee
  • [doc] add moe news (#5128) by binmakeswell

Accelerator

  • Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
  • [accelerator] fixed npu api by FrankLeeeee
  • [accelerator] init the accelerator module (#5129) by Frank Lee

Workflow

  • [workflow] updated CI image (#5318) by Frank Lee
  • [workflow] fixed oom tests (#5275) by Frank Lee
  • [workflow] fixed incomplete bash command (#5272) by Frank Lee
  • [workflow] fixed build CI (#5240) by Frank Lee

Feat

  • [feat] refactored extension module (#5298) by Frank Lee

Nfc

  • [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
  • [nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
  • [nfc] fix typo change directoty to directory (#5111) by digger yu
  • [nfc] fix typo and author name (#5089) by digger yu
  • [nfc] fix typo in docs/ (#4972) by digger yu

Hotfix

  • [hotfix] fix 3d plugin test (#5292) by Hongxin Liu
  • [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
  • [hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
  • [hotfix] removed unused flag (#5242) by Frank Lee
  • [hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
  • [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
  • [hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
  • [hotfix] Support extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
  • [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
  • [hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang

Sync

  • Merge pull request #5278 from ver217/sync/npu by Frank Lee

Shardformer

  • [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

  • [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
  • [ci] fix shardformer tests. (#5255) by flybird11111
  • [ci] fixed ddp test (#5254) by Frank Lee
  • [ci] fixed booster test (#5251) by Frank Lee

Npu

  • [npu] change device to accelerator api (#5239) by Hongxin Liu
  • [npu] use extension for op builder (#5172) by Xuanlei Zhao
  • [npu] support triangle attention for llama (#5130) by Xuanlei Zhao
  • [npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
  • [npu] add npu support for gemini and zero (#5067) by Hongxin Liu

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format

  • [format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5115 (#5118) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5124 (#5125) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5088 (#5127) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5067 (#5072) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4926 (#5007) by github-actions[bot]

Colossal-llama-2

  • [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
  • [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen

Devops

  • [devops] update torch version in ci (#5217) by Hongxin Liu

Colossaleval

  • [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen

Colossalqa

  • [colossalqa] fix pangu api (#5170) by Michelle
  • [ColossalQA] refactor server and webui & add new feature (#5138) by Michelle

Plugin

  • [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111

Feature

  • [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) by Zian(Andy) Zheng
  • [Feature] Add document retrieval QA (#5020) by YeAnbang

Inference

  • [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
  • [inference] update examples and engine (#5073) by Xu Kai
  • [inference] Refactor inference architecture (#5057) by Xu Kai
  • [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai

Hotfix/hybridengine

  • [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
  • [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia

Example

  • [example] fix llama example's loss error when using gemini plugin (#5060) by flybird11111

Pipeline,shardformer

  • [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.5...v0.3.4

v0.3.4

6 months ago

What's Changed

Pipeline inference

  • [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
  • [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
  • [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Doc

  • [doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
  • [doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
  • Merge pull request #4889 from ppt0011/main by ppt0011
  • [doc] add reminder for issue encountered with hybrid adam by ppt0011
  • [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
  • Merge pull request #4858 from Shawlleyw/main by ppt0011
  • [doc] update slack link (#4823) by binmakeswell
  • [doc] add lazy init docs (#4808) by Hongxin Liu
  • Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
  • [doc] polish shardformer doc (#4779) by Baizhou Zhang
  • [doc] add llama2 domain-specific solution news (#4789) by binmakeswell

Hotfix

  • [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
  • [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
  • [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
  • [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
  • [hotfix] fix bug in sequence parallel test (#4887) by littsk
  • [hotfix] Correct several erroneous code comments (#4794) by littsk
  • [hotfix] fix norm type error in zero optimizer (#4795) by littsk
  • [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

  • [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

  • [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
  • [Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
  • [inference] add reference and fix some bugs (#4937) by Xu Kai
  • [inference] Add smoothquant for llama (#4904) by Xu Kai
  • [inference] add llama2 support (#4898) by Xu Kai
  • [inference]fix import bug and delete down useless init (#4830) by Jianghai

Test

  • [test] merge old components to test to model zoo (#4945) by Hongxin Liu
  • [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
  • Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
  • [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao

Refactor

  • [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Nfc

  • [nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
  • [nfc] fix minor typo in README (#4846) by Blagoy Simandoff
  • [NFC] polish code style (#4799) by Camille Zhong
  • [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) by Michelle

Format

  • [format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]

Gemini

  • [gemini] support gradient accumulation (#4869) by Baizhou Zhang
  • [gemini] support amp o3 for gemini (#4872) by Hongxin Liu

Kernel

  • [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

  • [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
  • [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk
  • [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen

Checkpointio

  • [checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
  • [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang
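
The checkpoint I/O entries above are reached through the booster's save/load helpers. A minimal sketch of writing a sharded checkpoint and reading it back, assuming the shard / size_per_shard keywords; the paths are placeholders.

```python
# Hedged sketch: sharded checkpointing through the Booster API (paths are placeholders).
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

booster = Booster(plugin=TorchDDPPlugin())
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

# Write the weights as shards with an index file, plus the optimizer state.
booster.save_model(model, "ckpt/model", shard=True, size_per_shard=1024)
booster.save_optimizer(optimizer, "ckpt/optimizer", shard=True)

# Restore into the boosted objects (e.g. after re-building them on a new run).
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optimizer")
```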

Infer

  • [infer] fix test bug (#4838) by Xu Kai
  • [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
  • [Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao

Misc

  • [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
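
The entry above adds last_epoch to CosineAnnealingWarmupLR so a resumed run can pick up the schedule at a known step. A minimal sketch, assuming the total_steps / warmup_steps constructor arguments used in the ColossalAI examples.

```python
# Hedged sketch: resuming CosineAnnealingWarmupLR at a given step via last_epoch.
import torch
import torch.nn as nn
from colossalai.nn.lr_scheduler import CosineAnnealingWarmupLR

model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Fresh run: 100 warmup steps, cosine decay over 10_000 total steps.
scheduler = CosineAnnealingWarmupLR(optimizer, total_steps=10_000, warmup_steps=100)

# Resumed run on the same optimizer: continue from step 4_000 instead of 0.
resumed = CosineAnnealingWarmupLR(
    optimizer, total_steps=10_000, warmup_steps=100, last_epoch=4_000
)
```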

Lazy

  • [lazy] support from_pretrained (#4801) by Hongxin Liu
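
The lazy entry above makes from_pretrained usable under lazy initialization, so checkpoints are not fully materialized on every rank before sharding. A minimal sketch, assuming LazyInitContext behaves as in the lazy-init tutorial; the model id is only a placeholder.

```python
# Hedged sketch: lazy from_pretrained, materialized by the booster plugin.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.lazy import LazyInitContext
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaForCausalLM

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

with LazyInitContext():
    # Parameters are recorded, not allocated, until the plugin materializes them.
    model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder id

optimizer = HybridAdam(model.parameters(), lr=1e-5)
booster = Booster(plugin=GeminiPlugin(precision="bf16"))
model, optimizer, *_ = booster.boost(model, optimizer)  # lazy params materialized here
```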

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.3

v0.3.3

7 months ago

What's Changed

Inference

  • [inference] chatglm2 infer demo (#4724) by Jianghai

Feature

  • [feature] add gptq for inference (#4754) by Xu Kai
  • [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

  • [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
  • [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Chat

  • [chat]: add lora merge weights config (#4766) by Wenhao Chen
  • [chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen

Doc

  • [doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
  • [doc] clean up outdated docs (#4765) by Hongxin Liu
  • Merge pull request #4757 from ppt0011/main by ppt0011
  • [doc] put native colossalai plugins first in description section by Pengtai Xu
  • [doc] add model examples for each plugin by Pengtai Xu
  • [doc] put individual plugin explanation in front by Pengtai Xu
  • [doc] explain suitable use case for each plugin by Pengtai Xu
  • [doc] explanation of loading large pretrained models (#4741) by Baizhou Zhang
  • [doc] polish shardformer doc (#4735) by Baizhou Zhang
  • [doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
  • [doc] Add user document for Shardformer (#4702) by Baizhou Zhang
  • [doc] fix llama2 code link (#4726) by binmakeswell
  • [doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
  • [doc] Update booster user documents. (#4669) by Baizhou Zhang

Shardformer

  • [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
  • [shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
  • [shardformer] update seq parallel document (#4730) by Bin Jia
  • [shardformer] update pipeline parallel document (#4725) by flybird11111
  • [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
  • [shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
  • [shardformer] update shardformer readme (#4689) by flybird11111
  • [shardformer]fix gpt2 double head (#4663) by flybird11111
  • [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
  • [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242

Misc

  • [misc] update pre-commit and run all files (#4752) by Hongxin Liu

Format

  • [format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]

Legacy

  • [legacy] clean up legacy code (#4743) by Hongxin Liu
  • Merge pull request #4738 from ppt0011/main by ppt0011
  • [legacy] remove deterministic data loader test by Pengtai Xu
  • [legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu

Example

  • [example] llama2 add fine-tune example (#4673) by flybird11111
  • [example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
  • [example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang

Hotfix

  • [hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
  • [hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang

Devops

  • [devops] fix concurrency group (#4667) by Hongxin Liu
  • [devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu

Pipeline

  • [pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang
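
The pipeline entry above touches booster.execute_pipeline, which drives a full pipelined step when the plugin enables pipeline parallelism. A minimal sketch with HybridParallelPlugin, assuming the argument names below (they follow the hybrid-parallel examples of this era).

```python
# Hedged sketch: one pipelined training step via booster.execute_pipeline.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = HybridParallelPlugin(tp_size=1, pp_size=2, num_microbatches=4, precision="fp16")
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel(GPT2Config(n_layer=4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = lambda outputs, inputs: outputs.loss  # HF models already compute the loss

model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

def batches():
    while True:
        ids = torch.randint(0, 50257, (8, 128), device="cuda")
        yield {"input_ids": ids, "attention_mask": torch.ones_like(ids), "labels": ids}

booster.execute_pipeline(batches(), model, criterion, optimizer, return_loss=True)
optimizer.step()   # since #4630 the optimizer argument is optional (e.g. for evaluation)
optimizer.zero_grad()
```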

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.3...v0.3.2

v0.3.2

8 months ago

What's Changed

Shardformer

  • Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
  • [shardformer] update shardformer readme (#4617) by flybird11111
  • [shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
  • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
  • [shardformer] Pytree fix (#4533) by Jianghai
  • [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
  • [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
  • [shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
  • [shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
  • [shardformer] fix opt test hanging (#4521) by flybird11111
  • [shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
  • [shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
  • [shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
  • [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
  • [shardformer] opt fix. (#4514) by flybird11111
  • [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
  • [shardformer] tests for 3d parallel (#4493) by Jianghai
  • [shardformer] chatglm support sequence parallel (#4482) by flybird11111
  • [shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
  • [shardformer] Pipeline/whisper (#4456) by Jianghai
  • [shardformer] bert support sequence parallel. (#4455) by flybird11111
  • [shardformer] bloom support sequence parallel (#4465) by flybird11111
  • [shardformer] support interleaved pipeline (#4448) by LuGY
  • [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
  • [shardformer] fix import by ver217
  • [shardformer] fix embedding by ver217
  • [shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
  • [shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
  • [shardformer] update tests for all optimization (#4413) by flybird11111
  • [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
  • [shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
  • [shardformer] test all optimizations (#4399) by flybird1111
  • [shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
  • [Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
  • [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
  • [shardformer] support Blip2 (#4243) by FoolPlayer
  • [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
  • [shardformer] pre-commit check files by klhhhhh
  • [shardformer] register without auto policy by klhhhhh
  • [shardformer] ChatGLM support layernorm sharding by klhhhhh
  • [shardformer] delete some file by klhhhhh
  • [shardformer] support chatglm without layernorm by klhhhhh
  • [shardformer] polish code by klhhhhh
  • [shardformer] polish chatglm code by klhhhhh
  • [shardformer] add test kit in model zoo for chatglm by klhhhhh
  • [shardformer] vit test finish and support by klhhhhh
  • [shardformer] added tests by klhhhhh
  • Feature/chatglm (#4240) by Kun Lin
  • [shardformer] support whisper (#4212) by FoolPlayer
  • [shardformer] support SAM (#4231) by FoolPlayer
  • Feature/vit support (#4182) by Kun Lin
  • [shardformer] support pipeline base vit model (#4284) by FoolPlayer
  • [shardformer] support inplace sharding (#4251) by Hongxin Liu
  • [shardformer] fix base policy (#4229) by Hongxin Liu
  • [shardformer] support lazy init (#4202) by Hongxin Liu
  • [shardformer] fix type hint by ver217
  • [shardformer] rename policy file name by ver217

Legacy

  • [legacy] move builder and registry to legacy (#4603) by Hongxin Liu
  • [legacy] move engine to legacy (#4560) by Hongxin Liu
  • [legacy] move trainer to legacy (#4545) by Hongxin Liu

Test

  • [test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
  • [test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
  • [test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
  • [test] skip some not compatible models by FoolPlayer
  • [test] add shard util tests by ver217
  • [test] update shardformer tests by ver217
  • [test] remove useless tests (#4359) by Hongxin Liu

Zero

  • [zero] hotfix master param sync (#4618) by Hongxin Liu
  • [zero]fix zero ckptIO with offload (#4529) by LuGY
  • [zero]support zero2 with gradient accumulation (#4511) by LuGY

Checkpointio

  • [checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
  • [checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu

Coati

  • Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
  • Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
  • [coati] update ci by ver217
  • [coati] add chatglm model (#4539) by yingliu-hpc

Doc

  • [doc] add llama2 benchmark (#4604) by binmakeswell
  • [DOC] hotfix/llama2news (#4595) by binmakeswell
  • [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
  • [doc] update Coati README (#4405) by Wenhao Chen
  • [doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
  • [doc] Fix gradient accumulation doc. (#4349) by flybird1111

Pipeline

  • [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
  • [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
  • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
  • [pipeline] add chatglm (#4363) by Jianghai
  • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
  • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
  • [pipeline] add unit test for 1f1b (#4303) by LuGY
  • [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang
  • [pipeline] add pipeline support for all T5 models (#4310) by Baizhou Zhang
  • [pipeline] test pure pipeline process using llama (#4218) by Jianghai
  • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) by Baizhou Zhang
  • [pipeline] reformat for unified design (#4283) by Jianghai
  • [pipeline] OPT model pipeline (#4258) by Jianghai
  • [pipeline] refactor gpt2 pipeline forwards (#4287) by Baizhou Zhang
  • [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) by Baizhou Zhang
  • [pipeline] finish bloom models pipeline and tests (#4223) by Jianghai
  • [pipeline] All bert models (#4233) by Jianghai
  • [pipeline] add pipeline forward for variants of gpt2 (#4238) by Baizhou Zhang
  • [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) by Baizhou Zhang
  • [pipeline] add bloom model pipeline (#4210) by Jianghai
  • [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) by Jianghai
  • [pipeline] Llama pipeline (#4205) by Jianghai
  • [pipeline] Bert pipeline for shardformer and its tests (#4197) by Jianghai
  • [pipeline] move bert related pipeline components to shardformer (#4187) by Jianghai
  • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) by Jianghai
  • [pipeline] update shardformer docstring by ver217
  • [pipeline] update shardformer policy by ver217
  • [pipeline] build bloom model and policy , revise the base class of policy (#4161) by Jianghai
  • [pipeline]add pipeline policy and bert forward (#4130) by Jianghai
  • [pipeline] add stage manager (#4093) by Hongxin Liu
  • [pipeline] refactor 1f1b schedule (#4115) by Hongxin Liu
  • [pipeline] implement p2p communication (#4100) by Hongxin Liu

Fix

  • [Fix] Fix compile error (#4357) by Mashiro
  • [fix] coloattention support flash attention 2 (#4347) by flybird1111

Devops

  • [devops] cancel previous runs in the PR (#4546) by Hongxin Liu
  • [devops] add large-scale distributed test marker (#4452) by Hongxin Liu

Example

  • [example] change accelerate version (#4431) by Tian Siyuan
  • [example] update streamlit 0.73.1 to 1.11.1 (#4386) by ChengDaqi2023
  • [example] add llama2 example (#4527) by Hongxin Liu

Shardformer/fix overlap bug

  • [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) by Bin Jia

Format

  • [format] applied code formatting on changed files in pull request 4479 (#4504) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4441 (#4445) by github-actions[bot]

Gemini

  • [gemini] improve compatibility and add static placement policy (#4479) by Hongxin Liu
  • [gemini] fix tensor storage cleaning in state dict collection (#4396) by Baizhou Zhang

Shardformer/sequence parallel

  • [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488) by Bin Jia
  • [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) by Bin Jia
  • [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) by Bin Jia

Chat

  • [chat] update config and prompt (#4139) by Michelle
  • [chat] fix bugs and add unit tests (#4213) by Wenhao Chen

Misc

  • [misc] update requirements by ver217
  • [misc] resolve code factor issues (#4433) by Hongxin Liu

Shardformer

  • [sharformer] add first version of policy of chatglm by klhhhhh

Hotfix

  • [hotfix] fix gemini and zero test (#4333) by Hongxin Liu
  • [hotfix] fix opt pipeline (#4293) by Jianghai
  • [hotfix] fix unsafe async comm in zero (#4404) by LuGY
  • [hotfix] update gradio 3.11 to 3.34.0 (#4329) by caption

Plugin

  • [plugin] add 3d parallel plugin (#4295) by Hongxin Liu

Bugs

  • [bugs] hot fix some testing bugs for new models (#4268) by Jianghai

Cluster

  • [cluster] add process group mesh (#4039) by Hongxin Liu

Kernel

  • [kernel] updated unittests for coloattention (#4389) by flybird1111

Coloattention

  • [coloattention] fix import error (#4380) by flybird1111

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.2...v0.3.1

v0.3.1

9 months ago

What's Changed

Chat

  • [chat] fix compute_approx_kl (#4338) by Wenhao Chen
  • [chat] removed cache file (#4155) by Frank Lee
  • [chat] use official transformers and fix some issues (#4117) by Wenhao Chen
  • [chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
  • [chat] refactor trainer class (#4080) by Wenhao Chen
  • [chat]: fix chat evaluation possible bug (#4064) by Michelle
  • [chat] refactor strategy class with booster api (#3987) by Wenhao Chen
  • [chat] refactor actor class (#3968) by Wenhao Chen
  • [chat] add distributed PPO trainer (#3740) by Hongxin Liu

Zero

  • [zero] optimize the optimizer step time (#4221) by LuGY
  • [zero] support shard optimizer state dict of zero (#4194) by LuGY
  • [zero] add state dict for low level zero (#4179) by LuGY
  • [zero] allow passing process group to zero12 (#4153) by LuGY
  • [zero]support no_sync method for zero1 plugin (#4138) by LuGY
  • [zero] refactor low level zero for shard evenly (#4030) by LuGY
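
Most zero entries above land in LowLevelZeroPlugin. A minimal sketch combining the zero-1 plugin with the no_sync context for gradient accumulation, assuming booster.no_sync accepts the model and the wrapped optimizer as in the booster documentation.

```python
# Hedged sketch: gradient accumulation with ZeRO-1, skipping gradient sync on
# the non-boundary micro-steps via booster.no_sync (argument order assumed).
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model = nn.Linear(256, 256).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

accum_steps = 4
for micro_step in range(accum_steps):
    x = torch.randn(8, 256, device="cuda")
    loss = model(x).float().mean() / accum_steps
    if micro_step < accum_steps - 1:
        with booster.no_sync(model, optimizer):   # no gradient sync on this micro-step
            booster.backward(loss, optimizer)
    else:
        booster.backward(loss, optimizer)         # sync on the boundary step
optimizer.step()
optimizer.zero_grad()
```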

Nfc

  • [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
  • [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
  • [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
  • [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
  • [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
  • [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
  • [NFC] fix: format (#4270) by dayellow
  • [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
  • [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
  • [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
  • [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
  • [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
  • [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
  • [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
  • [NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
  • [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
  • [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
  • [NFC] Fix format for mixed precision (#4253) by Jianghai
  • [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
  • [nfc] fix dim not defined and fix typo (#3991) by digger yu
  • [nfc] fix typo colossalai/zero (#3923) by digger yu
  • [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
  • [nfc] fix typo colossalai/nn (#3887) by digger yu
  • [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Example

  • Fix/format (#4261) by Michelle
  • [example] add llama pretraining (#4257) by binmakeswell
  • [example] fix bucket size in example of gpt gemini (#4028) by LuGY
  • [example] update ViT example using booster api (#3940) by Baizhou Zhang
  • Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
  • [example] update opt example using booster api (#3918) by Baizhou Zhang
  • [example] Modify palm example with the new booster API (#3913) by Liu Ziming
  • [example] update gemini examples (#3868) by jiangmingyan

Ci

  • [ci] support testmon core pkg change detection (#4305) by Hongxin Liu

Checkpointio

  • [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
  • Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
  • [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
  • [checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang

Lazy

  • [lazy] support init on cuda (#4269) by Hongxin Liu
  • [lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
  • [lazy] refactor lazy init (#3891) by Hongxin Liu

Kernels

  • [Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li

Docker

  • [docker] fixed ninja build command (#4203) by Frank Lee
  • [docker] added ssh and rdma support for docker (#4192) by Frank Lee

Dtensor

  • [dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
  • [dtensor] updated api and doc (#3845) by Frank Lee

Workflow

  • [workflow] show test duration (#4159) by Frank Lee
  • [workflow] added status check for test coverage workflow (#4106) by Frank Lee
  • [workflow] cover all public repositories in weekly report (#4069) by Frank Lee
  • [workflow] fixed the directory check in build (#3980) by Frank Lee
  • [workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
  • [workflow] added docker latest tag for release (#3920) by Frank Lee
  • [workflow] fixed workflow check for docker build (#3849) by Frank Lee

Cli

  • [cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu

Format

  • [format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]

Shardformer

  • [shardformer] added development protocol for standardization (#4149) by Frank Lee
  • [shardformer] made tensor parallelism configurable (#4144) by Frank Lee
  • [shardformer] refactored some doc and api (#4137) by Frank Lee
  • [shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
  • [shardformer] added embedding gradient check (#4124) by Frank Lee
  • [shardformer] import huggingface implicitly (#4101) by Frank Lee
  • [shardformer] integrate with data parallelism (#4103) by Frank Lee
  • [shardformer] supported fused normalization (#4112) by Frank Lee
  • [shardformer] supported bloom model (#4098) by Frank Lee
  • [shardformer] support vision transformer (#4096) by Kun Lin
  • [shardformer] shardformer support opt models (#4091) by jiangmingyan
  • [shardformer] refactored layernorm (#4086) by Frank Lee
  • [shardformer] Add layernorm (#4072) by FoolPlayer
  • [shardformer] supported fused qkv checkpoint (#4073) by Frank Lee
  • [shardformer] add linearconv1d test (#4067) by FoolPlayer
  • [shardformer] support module saving and loading (#4062) by Frank Lee
  • [shardformer] refactored the shardformer layer structure (#4053) by Frank Lee
  • [shardformer] adapted T5 and LLaMa test to use kit (#4049) by Frank Lee
  • [shardformer] add gpt2 test and layer class refactor (#4041) by FoolPlayer
  • [shardformer] supported T5 and its variants (#4045) by Frank Lee
  • [shardformer] adapted llama to the new API (#4036) by Frank Lee
  • [shardformer] fix bert and gpt downstream with new api (#4024) by FoolPlayer
  • [shardformer] updated doc (#4016) by Frank Lee
  • [shardformer] removed inplace tensor sharding (#4018) by Frank Lee
  • [shardformer] refactored embedding and dropout to parallel module (#4013) by Frank Lee
  • [shardformer] integrated linear 1D with dtensor (#3996) by Frank Lee
  • [shardformer] Refactor shardformer api (#4001) by FoolPlayer
  • [shardformer] fix an error in readme (#3988) by FoolPlayer
  • [Shardformer] Downstream bert (#3979) by FoolPlayer
  • [shardformer] shardformer support t5 model (#3994) by wukong1992
  • [shardformer] support llama model using shardformer (#3969) by wukong1992
  • [shardformer] Add dropout layer in shard model and refactor policy api (#3949) by FoolPlayer
  • [shardformer] Unit test (#3928) by FoolPlayer
  • [shardformer] Align bert value (#3907) by FoolPlayer
  • [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) by FoolPlayer
  • [shardformer] add Dropout layer support different dropout pattern (#3856) by FoolPlayer
  • [shardformer] update readme with modules implement doc (#3834) by FoolPlayer
  • [shardformer] refactored the user api (#3828) by Frank Lee
  • [shardformer] updated readme (#3827) by Frank Lee
  • [shardformer]: Feature/shardformer, add some docstring and readme (#3816) by FoolPlayer
  • [shardformer] init shardformer code structure (#3731) by FoolPlayer
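
For completeness, the shardformer entries above can also be exercised directly, without the booster. A minimal sketch following the shardformer README of this era; the ShardConfig argument names are assumptions and may have shifted in later releases.

```python
# Hedged sketch: applying ShardFormer directly to a HuggingFace model.
import colossalai
import torch.distributed as dist
from colossalai.shardformer import ShardConfig, ShardFormer
from transformers import BertForSequenceClassification

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

shard_config = ShardConfig(
    tensor_parallel_process_group=dist.group.WORLD,  # shard across all ranks
    enable_tensor_parallelism=True,
    enable_fused_normalization=True,
)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
sharded_model, shared_params = ShardFormer(shard_config=shard_config).optimize(model)
```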

Test

  • [test] fixed tests failed due to dtensor change (#4082) by Frank Lee
  • [test] fixed codefactor format report (#4026) by Frank Lee

Device

  • [device] support init device mesh from process group (#3990) by Frank Lee

Hotfix

  • [hotfix] fix import bug in checkpoint_io (#4142) by Baizhou Zhang
  • [hotfix]fix argument naming in docs and examples (#4083) by Baizhou Zhang

Doc

  • [doc] update and revise some typos and errs in docs (#4107) by Jianghai
  • [doc] add a note about unit-testing to CONTRIBUTING.md (#3970) by Baizhou Zhang
  • [doc] add lazy init tutorial (#3922) by Hongxin Liu
  • [doc] fix docs about booster api usage (#3898) by Baizhou Zhang
  • [doc]update moe chinese document. (#3890) by jiangmingyan
  • [doc] update document of zero with chunk. (#3855) by jiangmingyan
  • [doc] update nvme offload documents. (#3850) by jiangmingyan

Examples

  • [examples] copy resnet example to image (#4090) by Jianghai

Testing

  • [testing] move pytest to be inside the function (#4087) by Frank Lee

Gemini

  • Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching by Baizhou Zhang
  • [gemini] fix argument naming during chunk configuration searching by Baizhou Zhang
  • [gemini] fixed the gemini checkpoint io (#3934) by Frank Lee

Devops

  • [devops] fix build on pr ci (#4043) by Hongxin Liu
  • [devops] update torch version in compability test (#3919) by Hongxin Liu
  • [devops] hotfix testmon cache clean logic (#3917) by Hongxin Liu
  • [devops] hotfix CI about testmon cache (#3910) by Hongxin Liu
  • [devops] improving testmon cache (#3902) by Hongxin Liu

Sync

  • Merge pull request #4025 from hpcaitech/develop by Frank Lee
  • Merge pull request #3967 from ver217/update-develop by Frank Lee
  • Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer by FoolPlayer
  • Revert "[sync] sync feature/shardformer with develop" by Frank Lee
  • Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer by FoolPlayer
  • Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop by Frank Lee
  • Merge pull request #3915 from FrankLeeeee/update/develop by Frank Lee

Booster

  • [booster] make optimizer argument optional for boost (#3993) by Wenhao Chen
  • [booster] update bert example, using booster api (#3885) by wukong1992

Evaluate

  • [evaluate] support gpt evaluation with reference (#3972) by Yuanchen

Feature

  • Merge pull request #3926 from hpcaitech/feature/dtensor by Frank Lee

Evaluation

  • [evaluation] improvement on evaluation (#3862) by Yuanchen

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.1...v0.3.0

v0.3.0

11 months ago

What's Changed

Release

  • [release] bump to v0.3.0 (#3830) by Frank Lee

Nfc

  • [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
  • [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
  • [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
  • [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
  • [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
  • [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
  • [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
  • [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
  • [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
  • [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
  • [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
  • [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
  • [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
  • [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
  • [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
  • [NFC] polish code style (#3273) by Xuanlei Zhao
  • [NFC] policy colossalai/fx/proxy.py code style (#3269) by CZYCW
  • [NFC] polish code style (#3268) by Yuanchen
  • [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
  • [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
  • [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
  • [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
  • [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Doc

  • [doc] update document of gemini instruction. (#3842) by jiangmingyan
  • Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
  • [doc]fix by jiangmingyan
  • [doc] add warning about fsdp plugin (#3813) by Hongxin Liu
  • [doc] add removed change of config.py by jiangmingyan
  • [doc] add removed warning by jiangmingyan
  • [doc] update amp document by Mingyan Jiang
  • [doc] update gradient accumulation (#3771) by jiangmingyan
  • [doc] update gradient clipping document (#3778) by jiangmingyan
  • [doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
  • [doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
  • [doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
  • [doc] add tutorial for booster plugins (#3758) by Hongxin Liu
  • [doc] add tutorial for cluster utils (#3763) by Hongxin Liu
  • [doc] update hybrid parallelism doc (#3770) by jiangmingyan
  • [doc] update booster tutorials (#3718) by jiangmingyan
  • [doc] fix chat spelling error (#3671) by digger-yu
  • [Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
  • [doc] Fix typo under colossalai and doc(#3618) by digger-yu
  • [doc] .github/workflows/README.md (#3605) by digger-yu
  • [doc] fix setup.py typo (#3603) by digger-yu
  • [doc] fix op_builder/README.md (#3597) by digger-yu
  • [doc] Update .github/workflows/README.md (#3577) by digger-yu
  • [doc] Update 1D_tensor_parallel.md (#3573) by digger-yu
  • [doc] Update 1D_tensor_parallel.md (#3563) by digger-yu
  • [doc] Update README.md (#3549) by digger-yu
  • [doc] Update README-zh-Hans.md (#3541) by digger-yu
  • [doc] hide diffusion in application path (#3519) by binmakeswell
  • [doc] add requirement and highlight application (#3516) by binmakeswell
  • [doc] Add docs for clip args in zero optim (#3504) by YH
  • [doc] updated contributor list (#3474) by Frank Lee
  • [doc] polish diffusion example (#3386) by Jan Roudaut
  • [doc] add Intel cooperation news (#3333) by binmakeswell
  • [doc] added authors to the chat application (#3307) by Fazzie-Maqianli

Workflow

  • [workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
  • [workflow] fixed testmon cache in build CI (#3806) by Frank Lee
  • [workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
  • [workflow] enabled doc build from a forked repo (#3815) by Frank Lee
  • [workflow] enable testing for develop & feature branch (#3801) by Frank Lee
  • [workflow] fixed the docker build workflow (#3794) by Frank Lee

Booster

  • [booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
  • [booster] torch fsdp fix ckpt (#3788) by wukong1992
  • [booster] removed models that don't support fsdp (#3744) by wukong1992
  • [booster] support torch fsdp plugin in booster (#3697) by wukong1992
  • [booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
  • [booster] fix no_sync method (#3709) by Hongxin Liu
  • [booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
  • [booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
  • [booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
  • [booster] add low level zero plugin (#3594) by Hongxin Liu
  • [booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
  • [booster] implement Gemini plugin (#3352) by ver217
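
The booster entries above add up to one workflow: pick a plugin, boost the raw objects, and train through booster.backward; swapping the plugin changes the parallelism without touching the loop. A minimal sketch with the plugins introduced here; the toy model and shapes are placeholders.

```python
# Hedged sketch: the plugin-agnostic booster training loop built up above.
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import (GeminiPlugin, LowLevelZeroPlugin,
                                       TorchDDPPlugin, TorchFSDPPlugin)

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

# Swap in GeminiPlugin(), LowLevelZeroPlugin(), or TorchFSDPPlugin() here
# (Gemini expects its HybridAdam-style optimizers rather than plain SGD).
plugin = TorchDDPPlugin()
booster = Booster(plugin=plugin)

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
model, optimizer, *_ = booster.boost(model, optimizer)

for _ in range(3):
    x = torch.randn(16, 512, device="cuda")
    loss = model(x).pow(2).mean()
    booster.backward(loss, optimizer)   # plugin-aware backward (scaling, hooks, etc.)
    optimizer.step()
    optimizer.zero_grad()
```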

Docs

  • [docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

  • [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Docker

  • [Docker] Fix a couple of build issues (#3691) by Yanming W
  • Fix/docker action (#3266) by liuzeming

Api

  • [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Test

  • [test] fixed lazy init test import error (#3799) by Frank Lee
  • Update test_ci.sh by Camille Zhong
  • [test] refactor tests with spawn (#3452) by Frank Lee
  • [test] reorganize zero/gemini tests (#3445) by ver217
  • [test] fixed gemini plugin test (#3411) by Frank Lee

Format

  • [format] applied code formatting on changed files in pull request 3786 (#3787) by github-actions[bot]
  • [format] Run lint on colossalai.engine (#3367) by Hakjin Lee

Plugin

  • [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) by Hongxin Liu
  • [plugin] torch ddp plugin supports sharded model checkpoint (#3775) by Hongxin Liu

Chat

  • [chat] add performance and tutorial (#3786) by binmakeswell
  • [chat] fix bugs in stage 3 training (#3759) by Yuanchen
  • [chat] fix community example ray (#3719) by MisterLin1995
  • [chat] fix train_prompts.py gemini strategy bug (#3666) by zhang-yi-chi
  • [chat] PPO stage3 doc enhancement (#3679) by Camille Zhong
  • [chat] add opt attn kernel (#3655) by Hongxin Liu
  • [chat] typo accimulation_steps -> accumulation_steps (#3662) by tanitna
  • Merge pull request #3656 from TongLi3701/chat/update_eval by Tong Li
  • [chat] set default zero2 strategy (#3667) by binmakeswell
  • [chat] refactor model save/load logic (#3654) by Hongxin Liu
  • [chat] remove lm model class (#3653) by Hongxin Liu
  • [chat] refactor trainer (#3648) by Hongxin Liu
  • [chat] polish performance evaluator (#3647) by Hongxin Liu
  • Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu by Tong Li
  • [Chat] Remove duplicate functions (#3625) by ddobokki
  • [chat] fix enable single gpu training bug by zhang-yi-chi
  • [chat] polish code note typo (#3612) by digger-yu
  • [chat] update reward model sh (#3578) by binmakeswell
  • [chat] ChatGPT train prompts on ray example (#3309) by MisterLin1995
  • [chat] polish tutorial doc (#3551) by binmakeswell
  • [chat]add examples of training with limited resources in chat readme (#3536) by Yuanchen
  • [chat]: add vf_coef argument for PPOTrainer (#3318) by zhang-yi-chi
  • [chat] add zero2 cpu strategy for sft training (#3520) by ver217
  • [chat] fix stage3 PPO sample sh command (#3477) by binmakeswell
  • [Chat]Add Peft support & fix the ptx bug (#3433) by YY Lin
  • [chat]fix save_model(#3377) by Dr-Corgi
  • [chat]fix readme (#3429) by kingkingofall
  • [Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453) by Camille Zhong
  • [chat]fix sft training for bloom, gpt and opt (#3418) by Yuanchen
  • [chat] correcting a few obvious typos and grammars errors (#3338) by Andrew

Devops

  • [devops] fix doc test on pr (#3782) by Hongxin Liu
  • [devops] fix ci for document check (#3751) by Hongxin Liu
  • [devops] make build on PR run automatically (#3748) by Hongxin Liu
  • [devops] update torch version of CI (#3725) by Hongxin Liu
  • [devops] fix chat ci (#3628) by Hongxin Liu

Fix

  • [fix] Add init to fix import error when importing _analyzer (#3668) by Ziyue Jiang

Ci

  • [CI] fix typo with tests/ etc. (#3727) by digger-yu
  • [CI] fix typo with tests components (#3695) by digger-yu
  • [CI] fix some spelling errors (#3707) by digger-yu
  • [CI] Update test_sharded_optim_with_sync_bn.py (#3688) by digger-yu

Example

  • [example] add train resnet/vit with booster example (#3694) by Hongxin Liu
  • [example] add finetune bert with booster example (#3693) by Hongxin Liu
  • [example] fix community doc (#3586) by digger-yu
  • [example] reorganize for community examples (#3557) by binmakeswell
  • [example] remove redundant texts & update roberta (#3493) by mandoxzhang
  • [example] update roberta with newer ColossalAI (#3472) by mandoxzhang
  • [example] update examples related to zero/gemini (#3431) by ver217

Tensor

  • [tensor] Refactor handle_trans_spec in DistSpecManager by YH

Zero

  • [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173) by YH
  • [zero] reorganize zero/gemini folder structure (#3424) by ver217

Gemini

  • [gemini] accelerate inference (#3641) by Hongxin Liu
  • [gemini] state dict supports fp16 (#3590) by Hongxin Liu
  • [gemini] support save state dict in shards (#3581) by Hongxin Liu
  • [gemini] gemini supports lazy init (#3379) by Hongxin Liu

Misc

  • [misc] op_builder/builder.py (#3593) by digger-yu
  • [misc] add verbose arg for zero and op builder (#3552) by Hongxin Liu

Fx

  • [fx] fix meta tensor registration (#3589) by Hongxin Liu

Chatgpt

  • [chatgpt] Detached PPO Training (#3195) by csric
  • [chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223) by Camille Zhong

Lazyinit

  • [lazyinit] fix clone and deepcopy (#3553) by Hongxin Liu

Checkpoint

  • [checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479) by jiangmingyan
  • [checkpoint] support huggingface style sharded checkpoint (#3461) by jiangmingyan
  • [checkpoint] refactored the API and added safetensors support (#3427) by Frank Lee

Chat community

  • [Chat Community] Update README.md (fixed#3487) (#3506) by NatalieC323

Dreambooth

  • Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) by NatalieC323
  • [dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378) by NatalieC323

Autoparallel

  • [autoparallel]integrate auto parallel feature with new tracer (#3408) by YuliangLiu0306
  • [autoparallel] adapt autoparallel with new analyzer (#3261) by YuliangLiu0306

Moe

  • [moe] add checkpoint for moe models (#3354) by HELSON

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.0...v0.2.8

v0.2.8

1 year ago

What's Changed

Format

  • [format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]

Application

  • [application] updated the README (#3301) by Frank Lee

Chat

  • [chat]polish prompts training (#3300) by BlueRum
  • [chat]Update Readme (#3296) by BlueRum

Coati

  • [coati] fix inference profanity check (#3299) by ver217
  • [coati] inference supports profanity check (#3295) by ver217
  • [coati] add repetition_penalty for inference (#3294) by ver217
  • [coati] fix inference output (#3285) by ver217
  • [Coati] first commit (#3283) by Fazzie-Maqianli

Examples

  • [examples] polish AutoParallel readme (#3270) by YuliangLiu0306
  • [examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323

Fx

  • [fx] meta registration compatibility (#3253) by HELSON
  • [FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306

Booster

  • [booster] implemented the torch ddp + resnet example (#3232) by Frank Lee
  • [booster] implemented the cluster module (#3191) by Frank Lee
  • [booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
  • [booster] added the accelerator implementation (#3159) by Frank Lee
  • [booster] implemented mixed precision class (#3151) by Frank Lee

Ci

  • [CI] Fix pre-commit workflow (#3238) by Hakjin Lee

Api

  • [API] implement device mesh manager (#3221) by YuliangLiu0306
  • [api] implemented the checkpoint io module (#3205) by Frank Lee

Chatgpt

  • [chatgpt] add precision option for colossalai (#3233) by ver217
  • [chatgpt] unify datasets (#3218) by Fazzie-Maqianli
  • [chatgpt] support instruct training (#3216) by Fazzie-Maqianli
  • [chatgpt]add reward model code for deberta (#3199) by Yuanchen
  • [chatgpt]support llama (#3070) by Fazzie-Maqianli
  • [chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
  • [chatgpt]Reward Model Training Process update (#3133) by BlueRum
  • [chatgpt] fix trainer generate kwargs (#3166) by ver217
  • [chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
  • [chatgpt]update ci (#3087) by BlueRum
  • [chatgpt]Fix examples (#3116) by BlueRum
  • [chatgpt] fix lora support for gpt (#3113) by BlueRum
  • [chatgpt] type miss of kwargs (#3107) by hiko2MSP
  • [chatgpt] fix lora save bug (#3099) by BlueRum

Lazyinit

  • [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
  • [lazyinit] add correctness verification (#3147) by ver217
  • [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
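
The lazy-init entries above combine lazy tensors with DTensor and add correctness checks. The goal is to record module construction without allocating real weights until the parallel placement is known. The sketch below illustrates that idea with plain PyTorch's meta device (requires PyTorch 2.x); it is a conceptual stand-in, not the lazy tensor machinery these PRs refactor.

```python
import torch
import torch.nn as nn

# Build the module structure on the "meta" device: shapes only, no memory allocated.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))

print(next(model.parameters()).device)  # meta

# Materialize later, once the placement of each parameter is decided.
model = model.to_empty(device="cpu")    # allocates uninitialized storage
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)
```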

Auto

  • [auto] fix requirements typo for issue #3125 (#3209) by Yan Fang

Analyzer

Dreambooth

  • [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323

Auto-parallel

  • [auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

  • [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH
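
The zero entry above refactors ZeroContextConfig into a dataclass. The snippet below is a generic illustration of that kind of refactor; the field names and the validation rule are hypothetical, not the real ZeroContextConfig fields.

```python
from dataclasses import dataclass

@dataclass
class ZeroContextConfig:
    # Hypothetical fields, for illustration only.
    target_device: str = "cuda"
    replicated: bool = True
    shard_param: bool = False

    def __post_init__(self):
        # Dataclasses still allow validation hooks after auto-generated __init__.
        if self.shard_param and not self.replicated:
            raise ValueError("shard_param requires replicated=True in this sketch")

cfg = ZeroContextConfig(shard_param=True)
print(cfg)  # the auto-generated __repr__ is one benefit of the refactor
```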

Test

  • [test] fixed torchrec registration in model zoo (#3177) by Frank Lee
  • [test] fixed torchrec model test (#3167) by Frank Lee
  • [test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
  • [test] added transformers models to test model zoo (#3135) by Frank Lee
  • [test] added torchvision models to test model zoo (#3132) by Frank Lee
  • [test] added timm models to test model zoo (#3129) by Frank Lee
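
The test entries above grow the model zoo (torchrec, transformers, torchvision, timm) and fix its registration. A model zoo of this kind is usually just a name-to-builder registry; the sketch below shows that pattern generically, with hypothetical names rather than the project's actual model_zoo module.

```python
from typing import Callable, Dict
import torch.nn as nn

_MODEL_ZOO: Dict[str, Callable[[], nn.Module]] = {}

def register_model(name: str):
    """Decorator that adds a zero-argument model builder to the registry."""
    def wrapper(builder: Callable[[], nn.Module]):
        _MODEL_ZOO[name] = builder
        return builder
    return wrapper

@register_model("toy_mlp")
def build_toy_mlp() -> nn.Module:
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# A test suite can then iterate over every registered model uniformly.
for name, builder in _MODEL_ZOO.items():
    model = builder()
    print(name, sum(p.numel() for p in model.parameters()))
```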

Refactor

Tests

  • [tests] model zoo add torchaudio models (#3138) by ver217
  • [tests] diffuser models in model zoo (#3136) by HELSON

Docker

  • [docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar

Dtensor

  • [DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306

Workflow

  • [workflow] purged extension cache before GPT test (#3128) by Frank Lee

Autochunk

  • [autochunk] support complete benchmark (#3121) by Xuanlei Zhao

Tutorial

  • [tutorial] update notes for TransformerEngine (#3098) by binmakeswell

Nvidia

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.8...v0.2.7

v0.2.6

1 year ago

What's Changed

Release

Doc

  • [doc] moved doc test command to bottom (#3075) by Frank Lee
  • [doc] specified operating system requirement (#3019) by Frank Lee
  • [doc] update nvme offload doc (#3014) by ver217
  • [doc] add ISC tutorial (#2997) by binmakeswell
  • [doc] add deepspeed citation and copyright (#2996) by ver217
  • [doc] added reference to related works (#2994) by Frank Lee
  • [doc] update news (#2983) by binmakeswell
  • [doc] fix chatgpt inference typo (#2964) by binmakeswell
  • [doc] add env scope (#2933) by binmakeswell
  • [doc] added readme for documentation (#2935) by Frank Lee
  • [doc] removed read-the-docs (#2932) by Frank Lee
  • [doc] update installation for GPT (#2922) by binmakeswell
  • [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
  • [doc] fix GPT tutorial (#2860) by dawei-wang
  • [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
  • [doc] update OPT serving (#2804) by binmakeswell
  • [doc] update example and OPT serving link (#2769) by binmakeswell
  • [doc] add opt service doc (#2747) by Frank Lee
  • [doc] fixed a typo in GPT readme (#2736) by cloudhuang
  • [doc] updated documentation version list (#2730) by Frank Lee

Workflow

  • [workflow] fixed doc build trigger condition (#3072) by Frank Lee
  • [workflow] supported conda package installation in doc test (#3028) by Frank Lee
  • [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
  • [workflow] added auto doc test on PR (#2929) by Frank Lee
  • [workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

  • [booster] init module structure and definition (#3056) by Frank Lee

Example

  • [example] fix redundant note (#3065) by binmakeswell
  • [example] fixed opt model downloading from huggingface by Tomek
  • [example] add LoRA support (#2821) by Haofan Wang

Autochunk

  • [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Chatgpt

  • [chatgpt] change critic input as state (#3042) by wenjunyang
  • [chatgpt] fix readme (#3025) by BlueRum
  • [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
  • [chatgpt]fix inference model load (#2988) by BlueRum
  • [chatgpt] allow shard init and display warning (#2986) by ver217
  • [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
  • [chatgpt] making experience support dp (#2971) by ver217
  • [chatgpt]fix lora bug (#2974) by BlueRum
  • [chatgpt] fix inference demo loading bug (#2969) by BlueRum
  • [ChatGPT] fix README (#2966) by Fazzie-Maqianli
  • [chatgpt]add inference example (#2944) by BlueRum
  • [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
  • [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
  • [chatgpt] fix rm eval (#2829) by BlueRum
  • [chatgpt] add test checkpoint (#2797) by ver217
  • [chatgpt] update readme about checkpoint (#2792) by ver217
  • [chatgpt] strategy add prepare method (#2766) by ver217
  • [chatgpt] disable shard init for colossalai (#2767) by ver217
  • [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
  • [chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Dtensor

Hotfix

  • [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
  • [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
  • [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
  • [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
  • [hotfix] fix chunk size can not be divided (#2867) by HELSON
  • Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
  • [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
  • [hotfix] add correct device for fake_param (#2796) by HELSON
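
One hotfix above removes the math.prod dependency (math.prod only exists on Python ≥ 3.8). A drop-in replacement for older interpreters is the standard reduce-based equivalent shown below, not necessarily the exact code from #2837.

```python
from functools import reduce
import operator

def prod(iterable, start=1):
    """Equivalent of math.prod for Python < 3.8."""
    return reduce(operator.mul, iterable, start)

assert prod([2, 3, 4]) == 24
assert prod([]) == 1  # matches math.prod's empty-product convention
```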

Revert

  • [revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

  • [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

  • [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

  • [fx] remove deprecated algorithms (#2312) (#2313) by Super Daniel

Refactor

Kernel

  • [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Misc

  • [misc] add reference (#2930) by ver217

Autoparallel

  • [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
  • [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
  • [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
  • [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
  • [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
  • [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306

Zero

  • [zero] trivial zero optimizer refactoring (#2869) by YH
  • [zero] fix wrong import (#2777) by Boyuan Yao

Cli

  • [cli] handled version check exceptions (#2848) by Frank Lee

Triton

  • [triton] added copyright information for flash attention (#2835) by Frank Lee

Nfc

  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
  • [NFC] polish code format by binmakeswell
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) by xyupeng
  • [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) by Zirui Zhu
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) by Zangwei Zheng
  • [NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)

Example

Ci/cd

  • [CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.6...v0.2.5