ColossalAI Versions

Making large AI models cheaper, faster and more accessible

v0.3.7

1 week ago

What's Changed

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

Shardformer

  • [shardformer] refactor pipeline grad ckpt config (#5646) by Hongxin Liu
  • [shardformer] fix chatglm implementation (#5644) by Hongxin Liu
  • [shardformer] remove useless code (#5645) by flybird11111
  • [shardformer] update transformers (#5583) by Wang Binluo
  • [shardformer] fix pipeline grad ckpt (#5620) by Hongxin Liu
  • [shardformer] refactor embedding resize (#5603) by flybird11111
  • [shardformer] Sequence Parallelism Optimization (#5533) by Zhongkai Zhao
  • [shardformer] fix pipeline forward error if custom layer distribution is used (#5189) by Insu Jang
  • [shardformer] update colo attention to support custom mask (#5510) by Hongxin Liu
  • [shardformer]Fix lm parallel. (#5480) by flybird11111
  • [shardformer] fix gathering output when using tensor parallelism (#5431) by flybird11111
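
Most of the shardformer changes above are consumed through HybridParallelPlugin flags rather than called directly. Below is a minimal sketch of turning them on; the keyword names are assumptions drawn from the PR titles and the plugin's documented options, so verify them against the 0.3.7 docs.

```python
# Hedged sketch: enabling shardformer-backed optimizations via HybridParallelPlugin.
# Flag names are assumptions based on the entries above; check the release docs.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = HybridParallelPlugin(
    tp_size=2,                         # tensor parallel degree (must divide world size)
    pp_size=1,
    precision="bf16",
    enable_flash_attention=True,       # colo attention with custom mask support (#5510)
    enable_fused_normalization=True,
    enable_sequence_parallelism=True,  # sequence parallelism optimization (#5533)
)
booster = Booster(plugin=plugin)

model = LlamaForCausalLM(LlamaConfig(num_hidden_layers=2, hidden_size=512))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)  # model is now sharded
```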

Fix

  • [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao

Coloattention

  • [coloattention]modify coloattention (#5627) by flybird11111

Feature

  • [Feature] Support LLaMA-3 CPT and ST (#5619) by Tong Li

Zero

  • [zero] support multiple (partial) backward passes (#5596) by Hongxin Liu
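
The zero entry above lifts the usual one-backward-per-step restriction of the ZeRO optimizer. A minimal sketch of what that enables, assuming the standard Booster/LowLevelZeroPlugin wrapping; the tiny model and losses are placeholders.

```python
# Hedged sketch: two (partial) backward passes feeding a single optimizer step
# under ZeRO, as allowed by #5596. Run under `colossalai run` / `torchrun`.
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model = nn.Linear(32, 32).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

loss_a = model(torch.randn(8, 32, device="cuda")).float().mean()
booster.backward(loss_a, optimizer)   # first (partial) backward
loss_b = model(torch.randn(8, 32, device="cuda")).float().mean()
booster.backward(loss_b, optimizer)   # second backward before any step
optimizer.step()                      # one step over the accumulated gradients
optimizer.zero_grad()
```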

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen

Colossalchat

  • [ColossalChat] Update RLHF V2 (#5286) by YeAnbang

Format

  • [format] applied code formatting on changed files in pull request 5510 (#5517) by github-actions[bot]

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.7...v0.3.6

v0.3.6

2 months ago

What's Changed

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Hotfix

  • [hotfix] fix stable diffusion inference bug. (#5289) by Youngon
  • [hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (#5335) by digger yu
  • [hotfix] fix typo change enabel to enable under colossalai/shardformer/ (#5317) by digger yu
  • [hotfix] fix typo change _descrption to _description (#5331) by digger yu
  • [hotfix] fix typo of openmoe model source (#5403) by Luo Yihang
  • [hotfix] fix sd vit import error (#5420) by MickeyCHAN
  • [hotfix] Fix wrong import in meta_registry (#5392) by Stephan Kölker
  • [hotfix] fix variable type for top_p (#5313) by CZYCW

Eval-hotfix

  • [eval-hotfix] set few_shot_data to None when few shot is disabled (#5422) by Dongruixuan Li

Devops

  • [devops] fix extension building (#5427) by Hongxin Liu

Example

  • [example]add gpt2 benchmark example script. (#5295) by flybird11111
  • [example] reuse flash attn patch (#5400) by Hongxin Liu

Workflow

  • [workflow] added pypi channel (#5412) by Frank Lee

Setup

  • [setup] fixed nightly release (#5388) by Frank Lee

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo

Extension

  • [extension] hotfix jit extension setup (#5402) by Hongxin Liu

Llama

  • [llama] fix training and inference scripts (#5384) by Hongxin Liu

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.6...v0.3.5

v0.3.5

2 months ago

What's Changed

Llama

  • Merge pull request #5377 from hpcaitech/example/llama-npu by Frank Lee
  • [llama] fix memory issue (#5371) by Hongxin Liu
  • [llama] polish training script and fix optim ckpt (#5368) by Hongxin Liu
  • [llama] fix neftune & pbar with start_step (#5364) by Camille Zhong
  • [llama] add flash attn patch for npu (#5362) by Hongxin Liu
  • [llama] update training script (#5360) by Hongxin Liu
  • [llama] fix dataloader for hybrid parallel (#5358) by Hongxin Liu

Lr-scheduler

  • [lr-scheduler] fix load state dict and add test (#5369) by Hongxin Liu

Gemini

  • [gemini] fix param op hook when output is tuple (#5355) by Hongxin Liu
  • [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) by flybird11111
  • [gemini]fix gemini optimizer, saving Shardformer in Gemini got list assignment index out of range (#5085) by flybird11111
  • [gemini] gemini support extra-dp (#5043) by flybird11111
  • [gemini] gemini support tensor parallelism. (#4942) by flybird11111
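
Several gemini entries above (tensor parallelism, extra-dp) surface as new GeminiPlugin arguments. A minimal sketch, assuming the tp_size / extra_dp_size keyword names suggested by the PR titles; HybridAdam is used because Gemini expects its own Adam variants.

```python
# Hedged sketch: GeminiPlugin with the tensor-parallel / extra-dp options above.
# tp_size and extra_dp_size are assumed keyword names; verify against the docs.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = GeminiPlugin(
    precision="bf16",
    placement_policy="auto",   # let Gemini manage chunk placement
    tp_size=2,                 # assumed: tensor parallelism inside Gemini (#4942)
    extra_dp_size=1,           # assumed: extra data-parallel groups (#5043)
)
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel(GPT2Config(n_layer=2))
optimizer = HybridAdam(model.parameters(), lr=1e-4)
model, optimizer, *_ = booster.boost(model, optimizer)

ids = torch.randint(0, 50257, (2, 128), device="cuda")
loss = model(input_ids=ids, labels=ids).loss
booster.backward(loss, optimizer)
optimizer.step()
```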

Fix

  • [fix] remove unnecessary dp_size assert (#5351) by Wenhao Chen

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Chat

  • [Chat] fix sft loss nan (#5345) by YeAnbang

Extension

  • [extension] fixed exception catch (#5342) by Frank Lee

Doc

  • [doc] added docs for extensions (#5324) by Frank Lee
  • [doc] add llama2-13B display (#5285) by Desperado-Jia
  • [doc] fix doc typo (#5256) by binmakeswell
  • [doc] fix typo in Colossal-LLaMA-2/README.md (#5247) by digger yu
  • [doc] SwiftInfer release (#5236) by binmakeswell
  • [doc] add Colossal-LLaMA-2-13B (#5234) by binmakeswell
  • [doc] Make leaderboard format more uniform and good-looking (#5231) by JIMMY ZHAO
  • [doc] Update README.md of Colossal-LLAMA2 (#5233) by Camille Zhong
  • [doc] Update required third-party library list for testing and torch compatibility checking (#5207) by Zhongkai Zhao
  • [doc] update pytorch version in documents. (#5177) by flybird11111
  • [doc] fix colossalqa document (#5146) by Michelle
  • [doc] updated paper citation (#5131) by Frank Lee
  • [doc] add moe news (#5128) by binmakeswell

Accelerator

  • Merge pull request #5321 from FrankLeeeee/hotfix/accelerator-api by Frank Lee
  • [accelerator] fixed npu api by FrankLeeeee
  • [accelerator] init the accelerator module (#5129) by Frank Lee

Workflow

  • [workflow] updated CI image (#5318) by Frank Lee
  • [workflow] fixed oom tests (#5275) by Frank Lee
  • [workflow] fixed incomplete bash command (#5272) by Frank Lee
  • [workflow] fixed build CI (#5240) by Frank Lee

Feat

  • [feat] refactored extension module (#5298) by Frank Lee

Nfc

  • [NFC] polish applications/Colossal-LLaMA-2/colossal_llama2/tokenizer/init_tokenizer.py code style (#5228) by 李文军
  • [nfc] fix typo colossalai/shardformer/ (#5133) by digger yu
  • [nfc] fix typo change directoty to directory (#5111) by digger yu
  • [nfc] fix typo and author name (#5089) by digger yu
  • [nfc] fix typo in docs/ (#4972) by digger yu

Hotfix

  • [hotfix] fix 3d plugin test (#5292) by Hongxin Liu
  • [hotfix] Fix ShardFormer test execution path when using sequence parallelism (#5230) by Zhongkai Zhao
  • [hotfix]: add pp sanity check and fix mbs arg (#5268) by Wenhao Chen
  • [hotfix] removed unused flag (#5242) by Frank Lee
  • [hotfix] fixed memory usage of shardformer module replacement (#5122) by アマデウス
  • [Hotfix] Fix model policy matching strategy in ShardFormer (#5064) by Zhongkai Zhao
  • [hotfix]: modify create_ep_hierarchical_group and add test (#5032) by Wenhao Chen
  • [hotfix] Support extra_kwargs in ShardConfig (#5031) by Zhongkai Zhao
  • [hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) by littsk
  • [hotfix] fix grad accumulation plus clipping for gemini (#5002) by Baizhou Zhang

Sync

  • Merge pull request #5278 from ver217/sync/npu by Frank Lee

Shardformer

  • [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

  • [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (#5276) by flybird11111
  • [ci] fix shardformer tests. (#5255) by flybird11111
  • [ci] fixed ddp test (#5254) by Frank Lee
  • [ci] fixed booster test (#5251) by Frank Lee

Npu

  • [npu] change device to accelerator api (#5239) by Hongxin Liu
  • [npu] use extension for op builder (#5172) by Xuanlei Zhao
  • [npu] support triangle attention for llama (#5130) by Xuanlei Zhao
  • [npu] add npu support for hybrid plugin and llama (#5090) by Xuanlei Zhao
  • [npu] add npu support for gemini and zero (#5067) by Hongxin Liu

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format

  • [format] applied code formatting on changed files in pull request 5234 (#5235) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5115 (#5118) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5124 (#5125) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5088 (#5127) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 5067 (#5072) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4926 (#5007) by github-actions[bot]

Colossal-llama-2

  • [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
  • [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen

Devops

  • [devops] update torch version in ci (#5217) by Hongxin Liu

Colossaleval

  • [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen

Colossalqa

  • [colossalqa] fix pangu api (#5170) by Michelle
  • [ColossalQA] refactor server and webui & add new feature (#5138) by Michelle

Plugin

  • [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111

Feature

  • [FEATURE] Add Safety Eval Datasets to ColossalEval (#5095) by Zian(Andy) Zheng
  • [Feature] Add document retrieval QA (#5020) by YeAnbang

Inference

  • [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
  • [inference] update examples and engine (#5073) by Xu Kai
  • [inference] Refactor inference architecture (#5057) by Xu Kai
  • [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai

Hotfix/hybridengine

  • [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
  • [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia

Example

  • [example] fix llama example's loss error when using gemini plugin (#5060) by flybird11111

Pipeline,shardformer

  • [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.5...v0.3.4

v0.3.4

6 months ago

What's Changed

Pipeline inference

  • [Pipeline Inference] Merge pp with tp (#4993) by Bin Jia
  • [Pipeline inference] Combine kvcache with pipeline inference (#4938) by Bin Jia
  • [Pipeline Inference] Sync pipeline inference branch to main (#4820) by Bin Jia

Doc

  • [doc] add supported feature diagram for hybrid parallel plugin (#4996) by ppt0011
  • [doc]Update doc for colossal-inference (#4989) by Cuiqing Li (李崔卿)
  • Merge pull request #4889 from ppt0011/main by ppt0011
  • [doc] add reminder for issue encountered with hybrid adam by ppt0011
  • [doc] update advanced tutorials, training gpt with hybrid parallelism (#4866) by flybird11111
  • Merge pull request #4858 from Shawlleyw/main by ppt0011
  • [doc] update slack link (#4823) by binmakeswell
  • [doc] add lazy init docs (#4808) by Hongxin Liu
  • Merge pull request #4805 from TongLi3701/docs/fix by Desperado-Jia
  • [doc] polish shardformer doc (#4779) by Baizhou Zhang
  • [doc] add llama2 domain-specific solution news (#4789) by binmakeswell

Hotfix

  • [hotfix] fix the bug of repeatedly storing param group (#4951) by Baizhou Zhang
  • [hotfix] Fix the bug where process groups were not being properly released. (#4940) by littsk
  • [hotfix] fix torch 2.0 compatibility (#4936) by Hongxin Liu
  • [hotfix] fix lr scheduler bug in torch 2.0 (#4864) by Baizhou Zhang
  • [hotfix] fix bug in sequence parallel test (#4887) by littsk
  • [hotfix] Correct several erroneous code comments (#4794) by littsk
  • [hotfix] fix norm type error in zero optimizer (#4795) by littsk
  • [hotfix] change llama2 Colossal-LLaMA-2 script filename (#4800) by Chandler-Bing

Kernels

  • [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) by Cuiqing Li

Inference

  • [Inference] Dynamic Batching Inference, online and offline (#4953) by Jianghai
  • [Inference]ADD Bench Chatglm2 script (#4963) by Jianghai
  • [inference] add reference and fix some bugs (#4937) by Xu Kai
  • [inference] Add smoothquant for llama (#4904) by Xu Kai
  • [inference] add llama2 support (#4898) by Xu Kai
  • [inference]fix import bug and delete down useless init (#4830) by Jianghai

Test

  • [test] merge old components to test to model zoo (#4945) by Hongxin Liu
  • [test] add no master test for low level zero plugin (#4934) by Zhongkai Zhao
  • Merge pull request #4856 from KKZ20/test/model_support_for_low_level_zero by ppt0011
  • [test] modify model supporting part of low_level_zero plugin (including corresponding docs) by Zhongkai Zhao

Refactor

  • [Refactor] Integrated some lightllm kernels into token-attention (#4946) by Cuiqing Li

Nfc

  • [nfc] fix some typo with colossalai/ docs/ etc. (#4920) by digger yu
  • [nfc] fix minor typo in README (#4846) by Blagoy Simandoff
  • [NFC] polish code style (#4799) by Camille Zhong
  • [NFC] polish colossalai/inference/quant/gptq/cai_gptq/__init__.py code style (#4792) by Michelle

Format

  • [format] applied code formatting on changed files in pull request 4820 (#4886) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4908 (#4918) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4595 (#4602) by github-actions[bot]

Gemini

  • [gemini] support gradient accumulation (#4869) by Baizhou Zhang
  • [gemini] support amp o3 for gemini (#4872) by Hongxin Liu

Kernel

  • [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921) by Hongxin Liu

Feature

  • [feature] support no master weights option for low level zero plugin (#4816) by Zhongkai Zhao
  • [feature] Add clip_grad_norm for hybrid_parallel_plugin (#4837) by littsk
  • [feature] ColossalEval: Evaluation Pipeline for LLMs (#4786) by Yuanchen

Checkpointio

  • [checkpointio] hotfix torch 2.0 compatibility (#4824) by Hongxin Liu
  • [checkpointio] support unsharded checkpointIO for hybrid parallel (#4774) by Baizhou Zhang
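
The checkpoint I/O entries above are reached through the booster's save/load helpers. A minimal sketch of writing a sharded checkpoint and reading it back, assuming the shard / size_per_shard keywords; the paths are placeholders.

```python
# Hedged sketch: sharded checkpointing through the Booster API (paths are placeholders).
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

booster = Booster(plugin=TorchDDPPlugin())
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

# Write the weights as shards with an index file, plus the optimizer state.
booster.save_model(model, "ckpt/model", shard=True, size_per_shard=1024)
booster.save_optimizer(optimizer, "ckpt/optimizer", shard=True)

# Restore into the boosted objects (e.g. after re-building them on a new run).
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optimizer")
```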

Infer

  • [infer] fix test bug (#4838) by Xu Kai
  • [Infer] Serving example w/ ray-serve (multiple GPU case) (#4841) by Yuanheng Zhao
  • [Infer] Colossal-Inference serving example w/ TorchServe (single GPU case) (#4771) by Yuanheng Zhao

Misc

  • [misc] add last_epoch in CosineAnnealingWarmupLR (#4778) by Yan haixu
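
The entry above adds last_epoch to CosineAnnealingWarmupLR so a resumed run can pick up the schedule at a known step. A minimal sketch, assuming the total_steps / warmup_steps constructor arguments used in the ColossalAI examples.

```python
# Hedged sketch: resuming CosineAnnealingWarmupLR at a given step via last_epoch.
import torch
import torch.nn as nn
from colossalai.nn.lr_scheduler import CosineAnnealingWarmupLR

model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Fresh run: 100 warmup steps, cosine decay over 10_000 total steps.
scheduler = CosineAnnealingWarmupLR(optimizer, total_steps=10_000, warmup_steps=100)

# Resumed run on the same optimizer: continue from step 4_000 instead of 0.
resumed = CosineAnnealingWarmupLR(
    optimizer, total_steps=10_000, warmup_steps=100, last_epoch=4_000
)
```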

Lazy

  • [lazy] support from_pretrained (#4801) by Hongxin Liu
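
The lazy entry above makes from_pretrained usable under lazy initialization, so checkpoints are not fully materialized on every rank before sharding. A minimal sketch, assuming LazyInitContext behaves as in the lazy-init tutorial; the model id is only a placeholder.

```python
# Hedged sketch: lazy from_pretrained, materialized by the booster plugin.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.lazy import LazyInitContext
from colossalai.nn.optimizer import HybridAdam
from transformers import LlamaForCausalLM

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

with LazyInitContext():
    # Parameters are recorded, not allocated, until the plugin materializes them.
    model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder id

optimizer = HybridAdam(model.parameters(), lr=1e-5)
booster = Booster(plugin=GeminiPlugin(precision="bf16"))
model, optimizer, *_ = booster.boost(model, optimizer)  # lazy params materialized here
```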

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.4...v0.3.3

v0.3.3

7 months ago

What's Changed

Inference

  • [inference] chatglm2 infer demo (#4724) by Jianghai

Feature

  • [feature] add gptq for inference (#4754) by Xu Kai
  • [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577) by Cuiqing Li

Bug

  • [bug] Fix the version check bug in colossalai run when generating the cmd. (#4713) by littsk
  • [bug] fix get_default_parser in examples (#4764) by Baizhou Zhang

Chat

  • [chat]: add lora merge weights config (#4766) by Wenhao Chen
  • [chat]: update rm, add wandb and fix bugs (#4471) by Wenhao Chen

Doc

  • [doc] add shardformer doc to sidebar (#4768) by Baizhou Zhang
  • [doc] clean up outdated docs (#4765) by Hongxin Liu
  • Merge pull request #4757 from ppt0011/main by ppt0011
  • [doc] put native colossalai plugins first in description section by Pengtai Xu
  • [doc] add model examples for each plugin by Pengtai Xu
  • [doc] put individual plugin explanation in front by Pengtai Xu
  • [doc] explain suitable use case for each plugin by Pengtai Xu
  • [doc] explanation of loading large pretrained models (#4741) by Baizhou Zhang
  • [doc] polish shardformer doc (#4735) by Baizhou Zhang
  • [doc] add shardformer support matrix/update tensor parallel documents (#4728) by Baizhou Zhang
  • [doc] Add user document for Shardformer (#4702) by Baizhou Zhang
  • [doc] fix llama2 code link (#4726) by binmakeswell
  • [doc] add potential solution for OOM in llama2 example (#4699) by Baizhou Zhang
  • [doc] Update booster user documents. (#4669) by Baizhou Zhang

Shardformer

  • [shardformer] fix master param sync for hybrid plugin/rewrite unwrapping logic (#4758) by Baizhou Zhang
  • [shardformer] add custom policy in hybrid parallel plugin (#4718) by Xuanlei Zhao
  • [shardformer] update seq parallel document (#4730) by Bin Jia
  • [shardformer] update pipeline parallel document (#4725) by flybird11111
  • [shardformer] to fix whisper test failed due to significant accuracy differences. (#4710) by flybird11111
  • [shardformer] fix GPT2DoubleHeadsModel (#4703) by flybird11111
  • [shardformer] update shardformer readme (#4689) by flybird11111
  • [shardformer]fix gpt2 double head (#4663) by flybird11111
  • [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645) by flybird11111
  • [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624) by eric8607242

Misc

  • [misc] update pre-commit and run all files (#4752) by Hongxin Liu

Format

  • [format] applied code formatting on changed files in pull request 4743 (#4750) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4726 (#4727) by github-actions[bot]

Legacy

  • [legacy] clean up legacy code (#4743) by Hongxin Liu
  • Merge pull request #4738 from ppt0011/main by ppt0011
  • [legacy] remove deterministic data loader test by Pengtai Xu
  • [legacy] move communication and nn to legacy and refactor logger (#4671) by Hongxin Liu

Example

  • [example] llama2 add fine-tune example (#4673) by flybird11111
  • [example] add gpt2 HybridParallelPlugin example (#4653) by Bin Jia
  • [example] update vit example for hybrid parallel plugin (#4641) by Baizhou Zhang

Hotfix

  • [hotfix] Fix import error: colossal.kernel without triton installed (#4722) by Yuanheng Zhao
  • [hotfix] fix typo in hybrid parallel io (#4697) by Baizhou Zhang

Devops

  • [devops] fix concurrency group (#4667) by Hongxin Liu
  • [devops] fix concurrency group and compatibility test (#4665) by Hongxin Liu

Pipeline

  • [pipeline] set optimizer to optional in execute_pipeline (#4630) by Baizhou Zhang
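
The pipeline entry above touches booster.execute_pipeline, which drives a full pipelined step when the plugin enables pipeline parallelism. A minimal sketch with HybridParallelPlugin, assuming the argument names below (they follow the hybrid-parallel examples of this era).

```python
# Hedged sketch: one pipelined training step via booster.execute_pipeline.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import GPT2Config, GPT2LMHeadModel

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

plugin = HybridParallelPlugin(tp_size=1, pp_size=2, num_microbatches=4, precision="fp16")
booster = Booster(plugin=plugin)

model = GPT2LMHeadModel(GPT2Config(n_layer=4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = lambda outputs, inputs: outputs.loss  # HF models already compute the loss

model, optimizer, criterion, *_ = booster.boost(model, optimizer, criterion)

def batches():
    while True:
        ids = torch.randint(0, 50257, (8, 128), device="cuda")
        yield {"input_ids": ids, "attention_mask": torch.ones_like(ids), "labels": ids}

booster.execute_pipeline(batches(), model, criterion, optimizer, return_loss=True)
optimizer.step()   # since #4630 the optimizer argument is optional (e.g. for evaluation)
optimizer.zero_grad()
```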

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.3...v0.3.2

v0.3.2

8 months ago

What's Changed

Shardformer

  • Merge pull request #4612 from hpcaitech/feature/shardformer by Hongxin Liu
  • [shardformer] update shardformer readme (#4617) by flybird11111
  • [shardformer] Add overlap optional for HybridParallelPlugin (#4615) by Bin Jia
  • [shardformer] update bert finetune example with HybridParallelPlugin (#4584) by flybird11111
  • [shardformer] Pytree fix (#4533) by Jianghai
  • [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575) by Baizhou Zhang
  • [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) by Baizhou Zhang
  • [shardformer] fix submodule replacement bug when enabling pp (#4544) by Baizhou Zhang
  • [shardformer] support pp+tp+zero1 tests (#4531) by flybird11111
  • [shardformer] fix opt test hanging (#4521) by flybird11111
  • [shardformer] Add overlap support for gpt2 (#4535) by Bin Jia
  • [shardformer] fix emerged bugs after updating transformers (#4526) by Baizhou Zhang
  • [shardformer] zero1+pp and the corresponding tests (#4517) by Jianghai
  • [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506) by Baizhou Zhang
  • [shardformer] opt fix. (#4514) by flybird11111
  • [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix. (#4498) by flybird11111
  • [shardformer] tests for 3d parallel (#4493) by Jianghai
  • [shardformer] chatglm support sequence parallel (#4482) by flybird11111
  • [shardformer] support tp+zero for shardformer (#4472) by Baizhou Zhang
  • [shardformer] Pipeline/whisper (#4456) by Jianghai
  • [shardformer] bert support sequence parallel. (#4455) by flybird11111
  • [shardformer] bloom support sequence parallel (#4465) by flybird11111
  • [shardformer] support interleaved pipeline (#4448) by LuGY
  • [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446) by Baizhou Zhang
  • [shardformer] fix import by ver217
  • [shardformer] fix embedding by ver217
  • [shardformer] update bloom/llama/vit/chatglm tests (#4420) by flybird11111
  • [shardformer]update t5 tests for using all optimizations. (#4407) by flybird11111
  • [shardformer] update tests for all optimization (#4413) by flybird11111
  • [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395) by Baizhou Zhang
  • [shardformer]fix, test gpt2 for AMP+TP (#4403) by flybird11111
  • [shardformer] test all optimizations (#4399) by flybird1111
  • [shardformer] update shardformer to use flash attention 2 (#4392) by flybird1111
  • [Shardformer] Merge flash attention branch to pipeline branch (#4362) by flybird1111
  • [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366) by Baizhou Zhang
  • [shardformer] support Blip2 (#4243) by FoolPlayer
  • [shardformer] support ChatGLMForConditionalGeneration & add fusedlayernorm for vit by klhhhhh
  • [shardformer] pre-commit check files by klhhhhh
  • [shardformer] register without auto policy by klhhhhh
  • [shardformer] ChatGLM support layernorm sharding by klhhhhh
  • [shardformer] delete some file by klhhhhh
  • [shardformer] support chatglm without layernorm by klhhhhh
  • [shardformer] polish code by klhhhhh
  • [shardformer] polish chatglm code by klhhhhh
  • [shardformer] add test kit in model zoo for chatglm by klhhhhh
  • [shardformer] vit test finish and support by klhhhhh
  • [shardformer] added tests by klhhhhh
  • Feature/chatglm (#4240) by Kun Lin
  • [shardformer] support whisper (#4212) by FoolPlayer
  • [shardformer] support SAM (#4231) by FoolPlayer
  • Feature/vit support (#4182) by Kun Lin
  • [shardformer] support pipeline base vit model (#4284) by FoolPlayer
  • [shardformer] support inplace sharding (#4251) by Hongxin Liu
  • [shardformer] fix base policy (#4229) by Hongxin Liu
  • [shardformer] support lazy init (#4202) by Hongxin Liu
  • [shardformer] fix type hint by ver217
  • [shardformer] rename policy file name by ver217

Legacy

  • [legacy] move builder and registry to legacy (#4603) by Hongxin Liu
  • [legacy] move engine to legacy (#4560) by Hongxin Liu
  • [legacy] move trainer to legacy (#4545) by Hongxin Liu

Test

  • [test] fix gemini checkpoint and gpt test (#4620) by Hongxin Liu
  • [test] ignore gpt2 shardformer test (#4619) by Hongxin Liu
  • [test] Hotfix/fix some model test and refactor check util api (#4369) by Bin Jia
  • [test] skip some not compatible models by FoolPlayer
  • [test] add shard util tests by ver217
  • [test] update shardformer tests by ver217
  • [test] remove useless tests (#4359) by Hongxin Liu

Zero

  • [zero] hotfix master param sync (#4618) by Hongxin Liu
  • [zero]fix zero ckptIO with offload (#4529) by LuGY
  • [zero]support zero2 with gradient accumulation (#4511) by LuGY

Checkpointio

  • [checkpointio] support huggingface from_pretrained for all plugins (#4606) by Baizhou Zhang
  • [checkpointio] optimize zero optim checkpoint io (#4591) by Hongxin Liu

Coati

  • Merge pull request #4542 from hpcaitech/chatglm by yingliu-hpc
  • Merge pull request #4541 from ver217/coati/chatglm by yingliu-hpc
  • [coati] update ci by ver217
  • [coati] add chatglm model (#4539) by yingliu-hpc

Doc

  • [doc] add llama2 benchmark (#4604) by binmakeswell
  • [DOC] hotfix/llama2news (#4595) by binmakeswell
  • [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430) by Tian Siyuan
  • [doc] update Coati README (#4405) by Wenhao Chen
  • [doc] add Series A Funding and NeurIPS news (#4377) by binmakeswell
  • [doc] Fix gradient accumulation doc. (#4349) by flybird1111

Pipeline

  • [pipeline] 1f1b schedule receive microbatch size (#4589) by Hongxin Liu
  • [pipeline] rewrite bert tests and fix some bugs (#4409) by Jianghai
  • [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388) by Baizhou Zhang
  • [pipeline] add chatglm (#4363) by Jianghai
  • [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354) by Baizhou Zhang
  • [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324) by Jianghai
  • [pipeline] add unit test for 1f1b (#4303) by LuGY
  • [pipeline] fix return_dict/fix pure_pipeline_test (#4331) by Baizhou Zhang
  • [pipeline] add pipeline support for all T5 models (#4310) by Baizhou Zhang
  • [pipeline] test pure pipeline process using llama (#4218) by Jianghai
  • [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300) by Baizhou Zhang
  • [pipeline] reformat for unified design (#4283) by Jianghai
  • [pipeline] OPT model pipeline (#4258) by Jianghai
  • [pipeline] refactor gpt2 pipeline forwards (#4287) by Baizhou Zhang
  • [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245) by Baizhou Zhang
  • [pipeline] finish bloom models pipeline and tests (#4223) by Jianghai
  • [pipeline] All bert models (#4233) by Jianghai
  • [pipeline] add pipeline forward for variants of gpt2 (#4238) by Baizhou Zhang
  • [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224) by Baizhou Zhang
  • [pipeline] add bloom model pipeline (#4210) by Jianghai
  • [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208) by Jianghai
  • [pipeline] Llama pipeline (#4205) by Jianghai
  • [pipeline] Bert pipeline for shardformer and its tests (#4197) by Jianghai
  • [pipeline] move bert related pipeline components to shardformer (#4187) by Jianghai
  • [pipeline] add bert_for_pretraining bert_lmhead forward and policy (#4172) by Jianghai
  • [pipeline] update shardformer docstring by ver217
  • [pipeline] update shardformer policy by ver217
  • [pipeline] build bloom model and policy , revise the base class of policy (#4161) by Jianghai
  • [pipeline]add pipeline policy and bert forward (#4130) by Jianghai
  • [pipeline] add stage manager (#4093) by Hongxin Liu
  • [pipeline] refactor 1f1b schedule (#4115) by Hongxin Liu
  • [pipeline] implement p2p communication (#4100) by Hongxin Liu

Fix

  • [Fix] Fix compile error (#4357) by Mashiro
  • [fix] coloattention support flash attention 2 (#4347) by flybird1111

Devops

  • [devops] cancel previous runs in the PR (#4546) by Hongxin Liu
  • [devops] add large-scale distributed test marker (#4452) by Hongxin Liu

Example

  • [example] change accelerate version (#4431) by Tian Siyuan
  • [example] update streamlit 0.73.1 to 1.11.1 (#4386) by ChengDaqi2023
  • [example] add llama2 example (#4527) by Hongxin Liu

Shardformer/fix overlap bug

  • [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516) by Bin Jia

Format

  • [format] applied code formatting on changed files in pull request 4479 (#4504) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4441 (#4445) by github-actions[bot]

Gemini

  • [gemini] improve compatibility and add static placement policy (#4479) by Hongxin Liu
  • [gemini] fix tensor storage cleaning in state dict collection (#4396) by Baizhou Zhang

Shardformer/sequence parallel

  • [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488) by Bin Jia
  • [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460) by Bin Jia
  • [shardformer/sequence parallel] Cherry pick commit to new branch (#4450) by Bin Jia

Chat

  • [chat] update config and prompt (#4139) by Michelle
  • [chat] fix bugs and add unit tests (#4213) by Wenhao Chen

Misc

  • [misc] update requirements by ver217
  • [misc] resolve code factor issues (#4433) by Hongxin Liu

Shardformer

  • [sharformer] add first version of policy of chatglm by klhhhhh

Hotfix

  • [hotfix] fix gemini and zero test (#4333) by Hongxin Liu
  • [hotfix] fix opt pipeline (#4293) by Jianghai
  • [hotfix] fix unsafe async comm in zero (#4404) by LuGY
  • [hotfix] update gradio 3.11 to 3.34.0 (#4329) by caption

Plugin

  • [plugin] add 3d parallel plugin (#4295) by Hongxin Liu

Bugs

  • [bugs] hot fix some testing bugs for new models (#4268) by Jianghai

Cluster

  • [cluster] add process group mesh (#4039) by Hongxin Liu

Kernel

  • [kernel] updated unittests for coloattention (#4389) by flybird1111

Coloattention

  • [coloattention] fix import error (#4380) by flybird1111

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.2...v0.3.1

v0.3.1

9 months ago

What's Changed

Chat

  • [chat] fix compute_approx_kl (#4338) by Wenhao Chen
  • [chat] removed cache file (#4155) by Frank Lee
  • [chat] use official transformers and fix some issues (#4117) by Wenhao Chen
  • [chat] remove naive strategy and split colossalai strategy (#4094) by Wenhao Chen
  • [chat] refactor trainer class (#4080) by Wenhao Chen
  • [chat]: fix chat evaluation possible bug (#4064) by Michelle
  • [chat] refactor strategy class with booster api (#3987) by Wenhao Chen
  • [chat] refactor actor class (#3968) by Wenhao Chen
  • [chat] add distributed PPO trainer (#3740) by Hongxin Liu

Zero

  • [zero] optimize the optimizer step time (#4221) by LuGY
  • [zero] support shard optimizer state dict of zero (#4194) by LuGY
  • [zero] add state dict for low level zero (#4179) by LuGY
  • [zero] allow passing process group to zero12 (#4153) by LuGY
  • [zero]support no_sync method for zero1 plugin (#4138) by LuGY
  • [zero] refactor low level zero for shard evenly (#4030) by LuGY
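
Most zero entries above land in LowLevelZeroPlugin. A minimal sketch combining the zero-1 plugin with the no_sync context for gradient accumulation, assuming booster.no_sync accepts the model and the wrapped optimizer as in the booster documentation.

```python
# Hedged sketch: gradient accumulation with ZeRO-1, skipping gradient sync on
# the non-boundary micro-steps via booster.no_sync (argument order assumed).
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model = nn.Linear(256, 256).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)

accum_steps = 4
for micro_step in range(accum_steps):
    x = torch.randn(8, 256, device="cuda")
    loss = model(x).float().mean() / accum_steps
    if micro_step < accum_steps - 1:
        with booster.no_sync(model, optimizer):   # no gradient sync on this micro-step
            booster.backward(loss, optimizer)
    else:
        booster.backward(loss, optimizer)         # sync on the boundary step
optimizer.step()
optimizer.zero_grad()
```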

Nfc

  • [NFC] polish applications/Chat/coati/models/utils.py codestyle (#4277) by yuxuan-lou
  • [NFC] polish applications/Chat/coati/trainer/strategies/base.py code style (#4278) by Zirui Zhu
  • [NFC] polish applications/Chat/coati/models/generation.py code style (#4275) by RichardoLuo
  • [NFC] polish applications/Chat/inference/server.py code style (#4274) by Yuanchen
  • [NFC] fix format of application/Chat/coati/trainer/utils.py (#4273) by アマデウス
  • [NFC] polish applications/Chat/examples/train_reward_model.py code style (#4271) by Xu Kai
  • [NFC] fix: format (#4270) by dayellow
  • [NFC] polish runtime_preparation_pass style (#4266) by Wenhao Chen
  • [NFC] polish unary_elementwise_generator.py code style (#4267) by YeAnbang
  • [NFC] polish applications/Chat/coati/trainer/base.py code style (#4260) by shenggan
  • [NFC] polish applications/Chat/coati/dataset/sft_dataset.py code style (#4259) by Zheng Zangwei (Alex Zheng)
  • [NFC] polish colossalai/booster/plugin/low_level_zero_plugin.py code style (#4256) by 梁爽
  • [NFC] polish colossalai/auto_parallel/offload/amp_optimizer.py code style (#4255) by Yanjia0
  • [NFC] polish colossalai/cli/benchmark/utils.py code style (#4254) by ocd_with_naming
  • [NFC] policy applications/Chat/examples/ray/mmmt_prompt.py code style (#4250) by CZYCW
  • [NFC] polish applications/Chat/coati/models/base/actor.py code style (#4248) by Junming Wu
  • [NFC] polish applications/Chat/inference/requirements.txt code style (#4265) by Camille Zhong
  • [NFC] Fix format for mixed precision (#4253) by Jianghai
  • [nfc]fix ColossalaiOptimizer is not defined (#4122) by digger yu
  • [nfc] fix dim not defined and fix typo (#3991) by digger yu
  • [nfc] fix typo colossalai/zero (#3923) by digger yu
  • [nfc]fix typo colossalai/pipeline tensor nn (#3899) by digger yu
  • [nfc] fix typo colossalai/nn (#3887) by digger yu
  • [nfc] fix typo colossalai/cli fx kernel (#3847) by digger yu

Example

  • Fix/format (#4261) by Michelle
  • [example] add llama pretraining (#4257) by binmakeswell
  • [example] fix bucket size in example of gpt gemini (#4028) by LuGY
  • [example] update ViT example using booster api (#3940) by Baizhou Zhang
  • Merge pull request #3905 from MaruyamaAya/dreambooth by Liu Ziming
  • [example] update opt example using booster api (#3918) by Baizhou Zhang
  • [example] Modify palm example with the new booster API (#3913) by Liu Ziming
  • [example] update gemini examples (#3868) by jiangmingyan

Ci

  • [ci] support testmon core pkg change detection (#4305) by Hongxin Liu

Checkpointio

  • [checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin (#4302) by Baizhou Zhang
  • Next commit [checkpointio] Unsharded Optimizer Checkpoint for Gemini Plugin (#4141) by Baizhou Zhang
  • [checkpointio] sharded optimizer checkpoint for DDP plugin (#4002) by Baizhou Zhang
  • [checkpointio] General Checkpointing of Sharded Optimizers (#3984) by Baizhou Zhang

Lazy

  • [lazy] support init on cuda (#4269) by Hongxin Liu
  • [lazy] fix compatibility problem on torch 1.13 (#3911) by Hongxin Liu
  • [lazy] refactor lazy init (#3891) by Hongxin Liu

Kernels

  • [Kernels] added triton-implemented of self attention for colossal-ai (#4241) by Cuiqing Li

Docker

  • [docker] fixed ninja build command (#4203) by Frank Lee
  • [docker] added ssh and rdma support for docker (#4192) by Frank Lee

Dtensor

  • [dtensor] fixed readme file name and removed deprecated file (#4162) by Frank Lee
  • [dtensor] updated api and doc (#3845) by Frank Lee

Workflow

  • [workflow] show test duration (#4159) by Frank Lee
  • [workflow] added status check for test coverage workflow (#4106) by Frank Lee
  • [workflow] cover all public repositories in weekly report (#4069) by Frank Lee
  • [workflow] fixed the directory check in build (#3980) by Frank Lee
  • [workflow] cancel duplicated workflow jobs (#3960) by Frank Lee
  • [workflow] added docker latest tag for release (#3920) by Frank Lee
  • [workflow] fixed workflow check for docker build (#3849) by Frank Lee

Cli

  • [cli] hotfix launch command for multi-nodes (#4165) by Hongxin Liu

Format

  • [format] applied code formatting on changed files in pull request 4152 (#4157) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 4021 (#4022) by github-actions[bot]

Shardformer

  • [shardformer] added development protocol for standardization (#4149) by Frank Lee
  • [shardformer] made tensor parallelism configurable (#4144) by Frank Lee
  • [shardformer] refactored some doc and api (#4137) by Frank Lee
  • [shardformer] write an shardformer example with bert finetuning (#4126) by jiangmingyan
  • [shardformer] added embedding gradient check (#4124) by Frank Lee
  • [shardformer] import huggingface implicitly (#4101) by Frank Lee
  • [shardformer] integrate with data parallelism (#4103) by Frank Lee
  • [shardformer] supported fused normalization (#4112) by Frank Lee
  • [shardformer] supported bloom model (#4098) by Frank Lee
  • [shardformer] support vision transformer (#4096) by Kun Lin
  • [shardformer] shardformer support opt models (#4091) by jiangmingyan
  • [shardformer] refactored layernorm (#4086) by Frank Lee
  • [shardformer] Add layernorm (#4072) by FoolPlayer
  • [shardformer] supported fused qkv checkpoint (#4073) by Frank Lee
  • [shardformer] add linearconv1d test (#4067) by FoolPlayer
  • [shardformer] support module saving and loading (#4062) by Frank Lee
  • [shardformer] refactored the shardformer layer structure (#4053) by Frank Lee
  • [shardformer] adapted T5 and LLaMa test to use kit (#4049) by Frank Lee
  • [shardformer] add gpt2 test and layer class refactor (#4041) by FoolPlayer
  • [shardformer] supported T5 and its variants (#4045) by Frank Lee
  • [shardformer] adapted llama to the new API (#4036) by Frank Lee
  • [shardformer] fix bert and gpt downstream with new api (#4024) by FoolPlayer
  • [shardformer] updated doc (#4016) by Frank Lee
  • [shardformer] removed inplace tensor sharding (#4018) by Frank Lee
  • [shardformer] refactored embedding and dropout to parallel module (#4013) by Frank Lee
  • [shardformer] integrated linear 1D with dtensor (#3996) by Frank Lee
  • [shardformer] Refactor shardformer api (#4001) by FoolPlayer
  • [shardformer] fix an error in readme (#3988) by FoolPlayer
  • [Shardformer] Downstream bert (#3979) by FoolPlayer
  • [shardformer] shardformer support t5 model (#3994) by wukong1992
  • [shardformer] support llama model using shardformer (#3969) by wukong1992
  • [shardformer] Add dropout layer in shard model and refactor policy api (#3949) by FoolPlayer
  • [shardformer] Unit test (#3928) by FoolPlayer
  • [shardformer] Align bert value (#3907) by FoolPlayer
  • [shardformer] add gpt2 policy and modify shard and slicer to support (#3883) by FoolPlayer
  • [shardformer] add Dropout layer support different dropout pattern (#3856) by FoolPlayer
  • [shardformer] update readme with modules implement doc (#3834) by FoolPlayer
  • [shardformer] refactored the user api (#3828) by Frank Lee
  • [shardformer] updated readme (#3827) by Frank Lee
  • [shardformer]: Feature/shardformer, add some docstring and readme (#3816) by FoolPlayer
  • [shardformer] init shardformer code structure (#3731) by FoolPlayer
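
For completeness, the shardformer entries above can also be exercised directly, without the booster. A minimal sketch following the shardformer README of this era; the ShardConfig argument names are assumptions and may have shifted in later releases.

```python
# Hedged sketch: applying ShardFormer directly to a HuggingFace model.
import colossalai
import torch.distributed as dist
from colossalai.shardformer import ShardConfig, ShardFormer
from transformers import BertForSequenceClassification

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

shard_config = ShardConfig(
    tensor_parallel_process_group=dist.group.WORLD,  # shard across all ranks
    enable_tensor_parallelism=True,
    enable_fused_normalization=True,
)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
sharded_model, shared_params = ShardFormer(shard_config=shard_config).optimize(model)
```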

Test

  • [test] fixed tests failed due to dtensor change (#4082) by Frank Lee
  • [test] fixed codefactor format report (#4026) by Frank Lee

Device

  • [device] support init device mesh from process group (#3990) by Frank Lee

Hotfix

  • [hotfix] fix import bug in checkpoint_io (#4142) by Baizhou Zhang
  • [hotfix]fix argument naming in docs and examples (#4083) by Baizhou Zhang

Doc

  • [doc] update and revise some typos and errs in docs (#4107) by Jianghai
  • [doc] add a note about unit-testing to CONTRIBUTING.md (#3970) by Baizhou Zhang
  • [doc] add lazy init tutorial (#3922) by Hongxin Liu
  • [doc] fix docs about booster api usage (#3898) by Baizhou Zhang
  • [doc]update moe chinese document. (#3890) by jiangmingyan
  • [doc] update document of zero with chunk. (#3855) by jiangmingyan
  • [doc] update nvme offload documents. (#3850) by jiangmingyan

Examples

  • [examples] copy resnet example to image (#4090) by Jianghai

Testing

  • [testing] move pytest to be inside the function (#4087) by Frank Lee

Gemini

  • Merge pull request #4056 from Fridge003/hotfix/fix_gemini_chunk_config_searching by Baizhou Zhang
  • [gemini] fix argument naming during chunk configuration searching by Baizhou Zhang
  • [gemini] fixed the gemini checkpoint io (#3934) by Frank Lee

Devops

  • [devops] fix build on pr ci (#4043) by Hongxin Liu
  • [devops] update torch version in compability test (#3919) by Hongxin Liu
  • [devops] hotfix testmon cache clean logic (#3917) by Hongxin Liu
  • [devops] hotfix CI about testmon cache (#3910) by Hongxin Liu
  • [devops] improving testmon cache (#3902) by Hongxin Liu

Sync

  • Merge pull request #4025 from hpcaitech/develop by Frank Lee
  • Merge pull request #3967 from ver217/update-develop by Frank Lee
  • Merge pull request #3942 from hpcaitech/revert-3931-sync/develop-to-shardformer by FoolPlayer
  • Revert "[sync] sync feature/shardformer with develop" by Frank Lee
  • Merge pull request #3931 from FrankLeeeee/sync/develop-to-shardformer by FoolPlayer
  • Merge pull request #3916 from FrankLeeeee/sync/dtensor-with-develop by Frank Lee
  • Merge pull request #3915 from FrankLeeeee/update/develop by Frank Lee

Booster

  • [booster] make optimizer argument optional for boost (#3993) by Wenhao Chen
  • [booster] update bert example, using booster api (#3885) by wukong1992

Evaluate

  • [evaluate] support gpt evaluation with reference (#3972) by Yuanchen

Feature

  • Merge pull request #3926 from hpcaitech/feature/dtensor by Frank Lee

Evaluation

  • [evaluation] improvement on evaluation (#3862) by Yuanchen

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.1...v0.3.0

v0.3.0

11 months ago

What's Changed

Release

  • [release] bump to v0.3.0 (#3830) by Frank Lee

Nfc

  • [nfc] fix typo colossalai/ applications/ (#3831) by digger yu
  • [NFC]fix typo colossalai/auto_parallel nn utils etc. (#3779) by digger yu
  • [NFC] fix typo colossalai/amp auto_parallel autochunk (#3756) by digger yu
  • [NFC] fix typo with colossalai/auto_parallel/tensor_shard (#3742) by digger yu
  • [NFC] fix typo applications/ and colossalai/ (#3735) by digger-yu
  • [NFC] polish colossalai/engine/gradient_handler/__init__.py code style (#3329) by Ofey Chan
  • [NFC] polish colossalai/context/random/__init__.py code style (#3327) by yuxuan-lou
  • [NFC] polish colossalai/fx/tracer/_tracer_utils.py (#3323) by Michelle
  • [NFC] polish colossalai/gemini/paramhooks/_param_hookmgr.py code style by Xu Kai
  • [NFC] polish initializer_data.py code style (#3287) by RichardoLuo
  • [NFC] polish colossalai/cli/benchmark/models.py code style (#3290) by Ziheng Qin
  • [NFC] polish initializer_3d.py code style (#3279) by Kai Wang (Victor Kai)
  • [NFC] polish colossalai/engine/gradient_accumulation/_gradient_accumulation.py code style (#3277) by Sze-qq
  • [NFC] polish colossalai/context/parallel_context.py code style (#3276) by Arsmart1
  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule_v2.py code style (#3275) by Zirui Zhu
  • [NFC] polish colossalai/nn/_ops/addmm.py code style (#3274) by Tong Li
  • [NFC] polish colossalai/amp/__init__.py code style (#3272) by lucasliunju
  • [NFC] polish code style (#3273) by Xuanlei Zhao
  • [NFC] policy colossalai/fx/proxy.py code style (#3269) by CZYCW
  • [NFC] polish code style (#3268) by Yuanchen
  • [NFC] polish tensor_placement_policy.py code style (#3265) by Camille Zhong
  • [NFC] polish colossalai/fx/passes/split_module.py code style (#3263) by CsRic
  • [NFC] polish colossalai/global_variables.py code style (#3259) by jiangmingyan
  • [NFC] polish colossalai/engine/gradient_handler/_moe_gradient_handler.py (#3260) by LuGY
  • [NFC] polish colossalai/fx/profiler/experimental/profiler_module/embedding.py code style (#3256) by dayellow

Doc

  • [doc] update document of gemini instruction. (#3842) by jiangmingyan
  • Merge pull request #3810 from jiangmingyan/amp by jiangmingyan
  • [doc]fix by jiangmingyan
  • [doc] add warning about fsdp plugin (#3813) by Hongxin Liu
  • [doc] add removed change of config.py by jiangmingyan
  • [doc] add removed warning by jiangmingyan
  • [doc] update amp document by Mingyan Jiang
  • [doc] update gradient accumulation (#3771) by jiangmingyan
  • [doc] update gradient clipping document (#3778) by jiangmingyan
  • [doc] add deprecated warning on doc Basics section (#3754) by Yanjia0
  • [doc] add booster docstring and fix autodoc (#3789) by Hongxin Liu
  • [doc] add tutorial for booster checkpoint (#3785) by Hongxin Liu
  • [doc] add tutorial for booster plugins (#3758) by Hongxin Liu
  • [doc] add tutorial for cluster utils (#3763) by Hongxin Liu
  • [doc] update hybrid parallelism doc (#3770) by jiangmingyan
  • [doc] update booster tutorials (#3718) by jiangmingyan
  • [doc] fix chat spelling error (#3671) by digger-yu
  • [Doc] enhancement on README.md for chat examples (#3646) by Camille Zhong
  • [doc] Fix typo under colossalai and doc(#3618) by digger-yu
  • [doc] .github/workflows/README.md (#3605) by digger-yu
  • [doc] fix setup.py typo (#3603) by digger-yu
  • [doc] fix op_builder/README.md (#3597) by digger-yu
  • [doc] Update .github/workflows/README.md (#3577) by digger-yu
  • [doc] Update 1D_tensor_parallel.md (#3573) by digger-yu
  • [doc] Update 1D_tensor_parallel.md (#3563) by digger-yu
  • [doc] Update README.md (#3549) by digger-yu
  • [doc] Update README-zh-Hans.md (#3541) by digger-yu
  • [doc] hide diffusion in application path (#3519) by binmakeswell
  • [doc] add requirement and highlight application (#3516) by binmakeswell
  • [doc] Add docs for clip args in zero optim (#3504) by YH
  • [doc] updated contributor list (#3474) by Frank Lee
  • [doc] polish diffusion example (#3386) by Jan Roudaut
  • [doc] add Intel cooperation news (#3333) by binmakeswell
  • [doc] added authors to the chat application (#3307) by Fazzie-Maqianli

Workflow

  • [workflow] supported test on CUDA 10.2 (#3841) by Frank Lee
  • [workflow] fixed testmon cache in build CI (#3806) by Frank Lee
  • [workflow] changed to doc build to be on schedule and release (#3825) by Frank Lee
  • [workflow] enabled doc build from a forked repo (#3815) by Frank Lee
  • [workflow] enable testing for develop & feature branch (#3801) by Frank Lee
  • [workflow] fixed the docker build workflow (#3794) by Frank Lee

Booster

  • [booster] add warning for torch fsdp plugin doc (#3833) by wukong1992
  • [booster] torch fsdp fix ckpt (#3788) by wukong1992
  • [booster] removed models that don't support fsdp (#3744) by wukong1992
  • [booster] support torch fsdp plugin in booster (#3697) by wukong1992
  • [booster] add tests for ddp and low level zero's checkpointio (#3715) by jiangmingyan
  • [booster] fix no_sync method (#3709) by Hongxin Liu
  • [booster] update prepare dataloader method for plugin (#3706) by Hongxin Liu
  • [booster] refactor all dp fashion plugins (#3684) by Hongxin Liu
  • [booster] gemini plugin support shard checkpoint (#3610) by jiangmingyan
  • [booster] add low level zero plugin (#3594) by Hongxin Liu
  • [booster] fixed the torch ddp plugin with the new checkpoint api (#3442) by Frank Lee
  • [booster] implement Gemini plugin (#3352) by ver217
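
The booster entries above add up to one workflow: pick a plugin, boost the raw objects, and train through booster.backward; swapping the plugin changes the parallelism without touching the loop. A minimal sketch with the plugins introduced here; the toy model and shapes are placeholders.

```python
# Hedged sketch: the plugin-agnostic booster training loop built up above.
import colossalai
import torch
import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import (GeminiPlugin, LowLevelZeroPlugin,
                                       TorchDDPPlugin, TorchFSDPPlugin)

colossalai.launch_from_torch(config={})  # run under `colossalai run` / `torchrun`

# Swap in GeminiPlugin(), LowLevelZeroPlugin(), or TorchFSDPPlugin() here
# (Gemini expects its HybridAdam-style optimizers rather than plain SGD).
plugin = TorchDDPPlugin()
booster = Booster(plugin=plugin)

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
model, optimizer, *_ = booster.boost(model, optimizer)

for _ in range(3):
    x = torch.randn(16, 512, device="cuda")
    loss = model(x).pow(2).mean()
    booster.backward(loss, optimizer)   # plugin-aware backward (scaling, hooks, etc.)
    optimizer.step()
    optimizer.zero_grad()
```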

Docs

  • [docs] change placememt_policy to placement_policy (#3829) by digger yu

Evaluation

  • [evaluation] add automatic evaluation pipeline (#3821) by Yuanchen

Docker

  • [Docker] Fix a couple of build issues (#3691) by Yanming W
  • Fix/docker action (#3266) by liuzeming

Api

  • [API] add docstrings and initialization to apex amp, naive amp (#3783) by jiangmingyan

Test

  • [test] fixed lazy init test import error (#3799) by Frank Lee
  • Update test_ci.sh by Camille Zhong
  • [test] refactor tests with spawn (#3452) by Frank Lee
  • [test] reorganize zero/gemini tests (#3445) by ver217
  • [test] fixed gemini plugin test (#3411) by Frank Lee

Format

  • [format] applied code formatting on changed files in pull request 3786 (#3787) by github-actions[bot]
  • [format] Run lint on colossalai.engine (#3367) by Hakjin Lee

Plugin

  • [plugin] a workaround for zero plugins' optimizer checkpoint (#3780) by Hongxin Liu
  • [plugin] torch ddp plugin supports sharded model checkpoint (#3775) by Hongxin Liu

Chat

  • [chat] add performance and tutorial (#3786) by binmakeswell
  • [chat] fix bugs in stage 3 training (#3759) by Yuanchen
  • [chat] fix community example ray (#3719) by MisterLin1995
  • [chat] fix train_prompts.py gemini strategy bug (#3666) by zhang-yi-chi
  • [chat] PPO stage3 doc enhancement (#3679) by Camille Zhong
  • [chat] add opt attn kernel (#3655) by Hongxin Liu
  • [chat] typo accimulation_steps -> accumulation_steps (#3662) by tanitna
  • Merge pull request #3656 from TongLi3701/chat/update_eval by Tong Li
  • [chat] set default zero2 strategy (#3667) by binmakeswell
  • [chat] refactor model save/load logic (#3654) by Hongxin Liu
  • [chat] remove lm model class (#3653) by Hongxin Liu
  • [chat] refactor trainer (#3648) by Hongxin Liu
  • [chat] polish performance evaluator (#3647) by Hongxin Liu
  • Merge pull request #3621 from zhang-yi-chi/fix/chat-train-prompts-single-gpu by Tong Li
  • [Chat] Remove duplicate functions (#3625) by ddobokki
  • [chat] fix enable single gpu training bug by zhang-yi-chi
  • [chat] polish code note typo (#3612) by digger-yu
  • [chat] update reward model sh (#3578) by binmakeswell
  • [chat] ChatGPT train prompts on ray example (#3309) by MisterLin1995
  • [chat] polish tutorial doc (#3551) by binmakeswell
  • [chat]add examples of training with limited resources in chat readme (#3536) by Yuanchen
  • [chat]: add vf_coef argument for PPOTrainer (#3318) by zhang-yi-chi
  • [chat] add zero2 cpu strategy for sft training (#3520) by ver217
  • [chat] fix stage3 PPO sample sh command (#3477) by binmakeswell
  • [Chat]Add Peft support & fix the ptx bug (#3433) by YY Lin
  • [chat]fix save_model(#3377) by Dr-Corgi
  • [chat]fix readme (#3429) by kingkingofall
  • [Chat] fix the tokenizer "int too big to convert" error in SFT training (#3453) by Camille Zhong
  • [chat]fix sft training for bloom, gpt and opt (#3418) by Yuanchen
  • [chat] correcting a few obvious typos and grammars errors (#3338) by Andrew

Devops

  • [devops] fix doc test on pr (#3782) by Hongxin Liu
  • [devops] fix ci for document check (#3751) by Hongxin Liu
  • [devops] make build on PR run automatically (#3748) by Hongxin Liu
  • [devops] update torch version of CI (#3725) by Hongxin Liu
  • [devops] fix chat ci (#3628) by Hongxin Liu

Fix

  • [fix] Add init to fix import error when importing _analyzer (#3668) by Ziyue Jiang

Ci

  • [CI] fix typo with tests/ etc. (#3727) by digger-yu
  • [CI] fix typo with tests components (#3695) by digger-yu
  • [CI] fix some spelling errors (#3707) by digger-yu
  • [CI] Update test_sharded_optim_with_sync_bn.py (#3688) by digger-yu

Example

  • [example] add train resnet/vit with booster example (#3694) by Hongxin Liu
  • [example] add finetune bert with booster example (#3693) by Hongxin Liu
  • [example] fix community doc (#3586) by digger-yu
  • [example] reorganize for community examples (#3557) by binmakeswell
  • [example] remove redundant texts & update roberta (#3493) by mandoxzhang
  • [example] update roberta with newer ColossalAI (#3472) by mandoxzhang
  • [example] update examples related to zero/gemini (#3431) by ver217

Tensor

  • [tensor] Refactor handle_trans_spec in DistSpecManager by YH

Zero

  • [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. (#3173) by YH
  • [zero] reorganize zero/gemini folder structure (#3424) by ver217

Gemini

  • [gemini] accelerate inference (#3641) by Hongxin Liu
  • [gemini] state dict supports fp16 (#3590) by Hongxin Liu
  • [gemini] support save state dict in shards (#3581) by Hongxin Liu
  • [gemini] gemini supports lazy init (#3379) by Hongxin Liu

Misc

  • [misc] op_builder/builder.py (#3593) by digger-yu
  • [misc] add verbose arg for zero and op builder (#3552) by Hongxin Liu

Fx

  • [fx] fix meta tensor registration (#3589) by Hongxin Liu

Chatgpt

  • [chatgpt] Detached PPO Training (#3195) by csric
  • [chatgpt] add pre-trained model RoBERTa for RLHF stage 2 & 3 (#3223) by Camille Zhong

Lazyinit

  • [lazyinit] fix clone and deepcopy (#3553) by Hongxin Liu

Checkpoint

  • [checkpoint] Shard saved checkpoint need to be compatible with the naming format of hf checkpoint files (#3479) by jiangmingyan
  • [checkpoint] support huggingface style sharded checkpoint (#3461) by jiangmingyan
  • [checkpoint] refactored the API and added safetensors support (#3427) by Frank Lee

Chat community

  • [Chat Community] Update README.md (fixed#3487) (#3506) by NatalieC323

Dreambooth

  • Revert "[dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378)" (#3481) by NatalieC323
  • [dreambooth] fixing the incompatibity in requirements.txt (#3190) (#3378) by NatalieC323

Autoparallel

  • [autoparallel]integrate auto parallel feature with new tracer (#3408) by YuliangLiu0306
  • [autoparallel] adapt autoparallel with new analyzer (#3261) by YuliangLiu0306

Moe

  • [moe] add checkpoint for moe models (#3354) by HELSON

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.3.0...v0.2.8

v0.2.8

1 year ago

What's Changed

Format

  • [format] applied code formatting on changed files in pull request 3300 (#3302) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 3296 (#3298) by github-actions[bot]

Application

  • [application] updated the README (#3301) by Frank Lee

Chat

  • [chat]polish prompts training (#3300) by BlueRum
  • [chat]Update Readme (#3296) by BlueRum

Coati

  • [coati] fix inference profanity check (#3299) by ver217
  • [coati] inference supports profanity check (#3295) by ver217
  • [coati] add repetition_penalty for inference (#3294) by ver217
  • [coati] fix inference output (#3285) by ver217
  • [Coati] first commit (#3283) by Fazzie-Maqianli

Examples

  • [examples] polish AutoParallel readme (#3270) by YuliangLiu0306
  • [examples] Solving the diffusion issue of incompatibility issue#3169 (#3170) by NatalieC323

Fx

  • [fx] meta registration compatibility (#3253) by HELSON
  • [FX] refactor experimental tracer and adapt it with hf models (#3157) by YuliangLiu0306

Booster

  • [booster] implemented the torch ddp + resnet example (#3232) by Frank Lee
  • [booster] implemented the cluster module (#3191) by Frank Lee
  • [booster] added the plugin base and torch ddp plugin (#3180) by Frank Lee
  • [booster] added the accelerator implementation (#3159) by Frank Lee
  • [booster] implemented mixed precision class (#3151) by Frank Lee

Ci

  • [CI] Fix pre-commit workflow (#3238) by Hakjin Lee

Api

  • [API] implement device mesh manager (#3221) by YuliangLiu0306
  • [api] implemented the checkpoint io module (#3205) by Frank Lee

Chatgpt

  • [chatgpt] add precision option for colossalai (#3233) by ver217
  • [chatgpt] unify datasets (#3218) by Fazzie-Maqianli
  • [chatgpt] support instruct training (#3216) by Fazzie-Maqianli
  • [chatgpt]add reward model code for deberta (#3199) by Yuanchen
  • [chatgpt]support llama (#3070) by Fazzie-Maqianli
  • [chatgpt] add supervised learning fine-tune code (#3183) by pgzhang
  • [chatgpt]Reward Model Training Process update (#3133) by BlueRum
  • [chatgpt] fix trainer generate kwargs (#3166) by ver217
  • [chatgpt] fix ppo training hanging problem with gemini (#3162) by ver217
  • [chatgpt]update ci (#3087) by BlueRum
  • [chatgpt]Fix examples (#3116) by BlueRum
  • [chatgpt] fix lora support for gpt (#3113) by BlueRum
  • [chatgpt] type miss of kwargs (#3107) by hiko2MSP
  • [chatgpt] fix lora save bug (#3099) by BlueRum

Lazyinit

  • [lazyinit] combine lazy tensor with dtensor (#3204) by ver217
  • [lazyinit] add correctness verification (#3147) by ver217
  • [lazyinit] refactor lazy tensor and lazy init ctx (#3131) by ver217
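
The lazy-init entries above combine lazy tensors with DTensor and add correctness checks. The goal is to record module construction without allocating real weights until the parallel placement is known. The sketch below illustrates that idea with plain PyTorch's meta device (requires PyTorch 2.x); it is a conceptual stand-in, not the lazy tensor machinery these PRs refactor.

```python
import torch
import torch.nn as nn

# Build the module structure on the "meta" device: shapes only, no memory allocated.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096))

print(next(model.parameters()).device)  # meta

# Materialize later, once the placement of each parameter is decided.
model = model.to_empty(device="cpu")    # allocates uninitialized storage
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)
```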

Auto

  • [auto] fix requirements typo for issue #3125 (#3209) by Yan Fang

Analyzer

Dreambooth

  • [dreambooth] fixing the incompatibility in requirements.txt (#3190) by NatalieC323

Auto-parallel

  • [auto-parallel] add auto-offload feature (#3154) by Zihao

Zero

  • [zero] Refactor ZeroContextConfig class using dataclass (#3186) by YH
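
The zero entry above refactors ZeroContextConfig into a dataclass. The snippet below is a generic illustration of that kind of refactor; the field names and the validation rule are hypothetical, not the real ZeroContextConfig fields.

```python
from dataclasses import dataclass

@dataclass
class ZeroContextConfig:
    # Hypothetical fields, for illustration only.
    target_device: str = "cuda"
    replicated: bool = True
    shard_param: bool = False

    def __post_init__(self):
        # Dataclasses still allow validation hooks after auto-generated __init__.
        if self.shard_param and not self.replicated:
            raise ValueError("shard_param requires replicated=True in this sketch")

cfg = ZeroContextConfig(shard_param=True)
print(cfg)  # the auto-generated __repr__ is one benefit of the refactor
```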

Test

  • [test] fixed torchrec registration in model zoo (#3177) by Frank Lee
  • [test] fixed torchrec model test (#3167) by Frank Lee
  • [test] add torchrec models to test model zoo (#3139) by YuliangLiu0306
  • [test] added transformers models to test model zoo (#3135) by Frank Lee
  • [test] added torchvision models to test model zoo (#3132) by Frank Lee
  • [test] added timm models to test model zoo (#3129) by Frank Lee
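
The test entries above grow the model zoo (torchrec, transformers, torchvision, timm) and fix its registration. A model zoo of this kind is usually just a name-to-builder registry; the sketch below shows that pattern generically, with hypothetical names rather than the project's actual model_zoo module.

```python
from typing import Callable, Dict
import torch.nn as nn

_MODEL_ZOO: Dict[str, Callable[[], nn.Module]] = {}

def register_model(name: str):
    """Decorator that adds a zero-argument model builder to the registry."""
    def wrapper(builder: Callable[[], nn.Module]):
        _MODEL_ZOO[name] = builder
        return builder
    return wrapper

@register_model("toy_mlp")
def build_toy_mlp() -> nn.Module:
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# A test suite can then iterate over every registered model uniformly.
for name, builder in _MODEL_ZOO.items():
    model = builder()
    print(name, sum(p.numel() for p in model.parameters()))
```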

Refactor

Tests

  • [tests] model zoo add torchaudio models (#3138) by ver217
  • [tests] diffuser models in model zoo (#3136) by HELSON

Docker

  • [docker] Add opencontainers image-spec to Dockerfile (#3006) by Saurav Maheshkar

Dtensor

  • [DTensor] refactor dtensor with new components (#3089) by YuliangLiu0306

Workflow

  • [workflow] purged extension cache before GPT test (#3128) by Frank Lee

Autochunk

  • [autochunk] support complete benchmark (#3121) by Xuanlei Zhao

Tutorial

  • [tutorial] update notes for TransformerEngine (#3098) by binmakeswell

Nvidia

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.8...v0.2.7

v0.2.6

1 year ago

What's Changed

Release

Doc

  • [doc] moved doc test command to bottom (#3075) by Frank Lee
  • [doc] specified operating system requirement (#3019) by Frank Lee
  • [doc] update nvme offload doc (#3014) by ver217
  • [doc] add ISC tutorial (#2997) by binmakeswell
  • [doc] add deepspeed citation and copyright (#2996) by ver217
  • [doc] added reference to related works (#2994) by Frank Lee
  • [doc] update news (#2983) by binmakeswell
  • [doc] fix chatgpt inference typo (#2964) by binmakeswell
  • [doc] add env scope (#2933) by binmakeswell
  • [doc] added readme for documentation (#2935) by Frank Lee
  • [doc] removed read-the-docs (#2932) by Frank Lee
  • [doc] update installation for GPT (#2922) by binmakeswell
  • [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
  • [doc] fix GPT tutorial (#2860) by dawei-wang
  • [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
  • [doc] update OPT serving (#2804) by binmakeswell
  • [doc] update example and OPT serving link (#2769) by binmakeswell
  • [doc] add opt service doc (#2747) by Frank Lee
  • [doc] fixed a typo in GPT readme (#2736) by cloudhuang
  • [doc] updated documentation version list (#2730) by Frank Lee

Workflow

  • [workflow] fixed doc build trigger condition (#3072) by Frank Lee
  • [workflow] supported conda package installation in doc test (#3028) by Frank Lee
  • [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
  • [workflow] added auto doc test on PR (#2929) by Frank Lee
  • [workflow] moved pre-commit to post-commit (#2895) by Frank Lee

Booster

  • [booster] init module structure and definition (#3056) by Frank Lee

Example

  • [example] fix redundant note (#3065) by binmakeswell
  • [example] fixed opt model downloading from huggingface by Tomek
  • [example] add LoRA support (#2821) by Haofan Wang

Autochunk

  • [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao

Chatgpt

  • [chatgpt] change critic input as state (#3042) by wenjunyang
  • [chatgpt] fix readme (#3025) by BlueRum
  • [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
  • [chatgpt]fix inference model load (#2988) by BlueRum
  • [chatgpt] allow shard init and display warning (#2986) by ver217
  • [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
  • [chatgpt] making experience support dp (#2971) by ver217
  • [chatgpt]fix lora bug (#2974) by BlueRum
  • [chatgpt] fix inference demo loading bug (#2969) by BlueRum
  • [ChatGPT] fix README (#2966) by Fazzie-Maqianli
  • [chatgpt]add inference example (#2944) by BlueRum
  • [chatgpt]support opt & gpt for rm training (#2876) by BlueRum
  • [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
  • [chatgpt] fix rm eval (#2829) by BlueRum
  • [chatgpt] add test checkpoint (#2797) by ver217
  • [chatgpt] update readme about checkpoint (#2792) by ver217
  • [chatgpt] strategy add prepare method (#2766) by ver217
  • [chatgpt] disable shard init for colossalai (#2767) by ver217
  • [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
  • [chatgpt]fix train_rm bug with lora (#2741) by BlueRum

Dtensor

Hotfix

  • [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
  • [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
  • [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
  • [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
  • [hotfix] fix chunk size can not be divided (#2867) by HELSON
  • Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
  • [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
  • [hotfix] add correct device for fake_param (#2796) by HELSON
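
One hotfix above removes the math.prod dependency (math.prod only exists on Python ≥ 3.8). A drop-in replacement for older interpreters is the standard reduce-based equivalent shown below, not necessarily the exact code from #2837.

```python
from functools import reduce
import operator

def prod(iterable, start=1):
    """Equivalent of math.prod for Python < 3.8."""
    return reduce(operator.mul, iterable, start)

assert prod([2, 3, 4]) == 24
assert prod([]) == 1  # matches math.prod's empty-product convention
```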

Revert

  • [revert] recover "[refactor] restructure configuration files (#2977)" (#3022) by Frank Lee

Format

  • [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
  • [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]

Pipeline

  • [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang

Fx

  • [fx] remove deprecated algorithms (#2312) (#2313) by Super Daniel

Refactor

Kernel

  • [kernel] cached the op kernel and fixed version check (#2886) by Frank Lee

Misc

  • [misc] add reference (#2930) by ver217

Autoparallel

  • [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
  • [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
  • [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.where (#2822) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.tanh() and torch.nn.Dropout (#2773) by Boyuan Yao
  • [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
  • [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
  • [autoparallel] Patch meta information of torch.nn.Embedding (#2760) by Boyuan Yao
  • [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306

Zero

  • [zero] trivial zero optimizer refactoring (#2869) by YH
  • [zero] fix wrong import (#2777) by Boyuan Yao

Cli

  • [cli] handled version check exceptions (#2848) by Frank Lee

Triton

  • [triton] added copyright information for flash attention (#2835) by Frank Lee

Nfc

  • [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
  • [NFC] polish code format by binmakeswell
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) by xyupeng
  • [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) by Zirui Zhu
  • [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) by Zangwei Zheng
  • [NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)

Example

Ci/cd

  • [CI/CD] fix nightly release CD running on forked repo (#2812) by LuGY

Full Changelog: https://github.com/hpcaitech/ColossalAI/compare/v0.2.6...v0.2.5