InternLM Versions

Official release of InternLM2 7B and 20B base and chat models. 200K context support

v0.2.1dev20240102

4 months ago

What's Changed

fix(timeout): larger timeout by @JiaoPL in https://github.com/InternLM/InternLM/pull/495
feat(doc): add GPU memory info for 7B & 20B models by @li126com in https://github.com/InternLM/InternLM/pull/507
feat(model): add rope_base interface by @00INDEX in https://github.com/InternLM/InternLM/pull/512
Feat(QA): Check loss when swapping micro_num and micro_bsz && Check grad norm by @li126com in https://github.com/InternLM/InternLM/pull/510
Fix(QA): the py name in main is wrong by @li126com in https://github.com/InternLM/InternLM/pull/514
fix/feat: small fix and enhancement by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/515
test(workflow): add workflow for loss test and change trigger event by @kkscilife in https://github.com/InternLM/InternLM/pull/513
fix(ci): fix test model ckpt ci test by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/518
test(workflow): add unit test case by @kkscilife in https://github.com/InternLM/InternLM/pull/524
feat(storage): use multipart upload when using oss by @li126com in https://github.com/InternLM/InternLM/pull/520
Fix (QA checkpoint): fix test_model_checkpoint singleton import by @li126com in https://github.com/InternLM/InternLM/pull/526
fix(model): add IS_SEQUENCE_PARALLEL check for norm module by @yingtongxiong in https://github.com/InternLM/InternLM/pull/528
feat(model): add output embedding tf32 option by @JiaoPL in https://github.com/InternLM/InternLM/pull/523
feat(grad_norm): vocab grad norm profiling by @JiaoPL in https://github.com/InternLM/InternLM/pull/519
fix(data): fix the unpack for type_ids when use_flash_attn=False by @yingtongxiong in https://github.com/InternLM/InternLM/pull/516
fix(storage): unify the name of AK and SK by @li126com in https://github.com/InternLM/InternLM/pull/527
fix(test): fix type_ids unpack bug by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/530
feat(model): support llama model with checkpoint loading by @li126com in https://github.com/InternLM/InternLM/pull/532
fix(metric): add metric dtype control by @Pryest in https://github.com/InternLM/InternLM/pull/533
feat(ckpt): support auto resume in Volc and Ali by @li126com in https://github.com/InternLM/InternLM/pull/529
fix(sequence_parallel): fix norm all-reduce in seq_parallel when not overlapping by @yingtongxiong in https://github.com/InternLM/InternLM/pull/534
fix(pp): fix no-packed dataset load micro batch error by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/538
fix(model): change model_type LLAMA to LLAMA2 by @li126com in https://github.com/InternLM/InternLM/pull/539
fix(moe): fix moe zero mode bug by @blankde in https://github.com/InternLM/InternLM/pull/548
fix(grad_norm): token grad norm with tp by @JiaoPL in https://github.com/InternLM/InternLM/pull/547
test(workflow): change into reserved by @kkscilife in https://github.com/InternLM/InternLM/pull/550
fix(model): add ckpt_type constraint when loading ckpts by @li126com in https://github.com/InternLM/InternLM/pull/542
feat(logger): add tensorboard key value buffer by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/549
fix(metrics): remove redundant cuda memory in metric calculations by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/557
fix(lr_scheduler): fix when resuming lr_scheduler without loading optimizer by @gaoyang07 in https://github.com/InternLM/InternLM/pull/565

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20231121...v0.2.1dev20240102
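
The rope_base entry in this release (#512) presumably concerns the base of the rotary position embedding (RoPE). As a minimal sketch, assuming the standard RoPE formulation and not InternLM's actual code, the inverse frequencies can be derived from the base as shown here. The function name `rope_inv_freq` and its default value are hypothetical and chosen for illustration only.

```python
def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies: base ** (-2i / dim) for i in [0, dim / 2).

    Hypothetical helper for illustration; not taken from the InternLM code base.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


# Compare the frequencies for a small head dimension under two bases.
freqs_default = rope_inv_freq(8)
freqs_large = rope_inv_freq(8, base=1_000_000.0)
```

With a larger base, every frequency after the first becomes smaller, so positions rotate more slowly. This is one common way to adapt RoPE to longer contexts, although whether InternLM uses the interface for that purpose is not stated here.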

v0.2.1dev20231121

5 months ago

TBD

v0.2.1dev20230915

8 months ago

Highlights

fix a bug that could cause gradient overflow when total_steps is small
fix the rotary_emb.inv_freq KeyError in the convert2hf.py tool
add unit tests for the model

What's Changed

🚀 Features

feat(core/trainer.py): add more tgs metrics by @li126com in https://github.com/InternLM/InternLM/pull/310

🐞 Bug fixes

fix(convert2hf.py): fix the rotary_emb.inv_freq KeyError by @jiangtann in https://github.com/InternLM/InternLM/pull/299
fix(configs/7B_sft.py): model dtype float16 to bfloat16 by @huangting4201 in https://github.com/InternLM/InternLM/pull/302
fix(chat): fix stream_chat to return generator by @zhjunqin in https://github.com/InternLM/InternLM/pull/123

📚 Documentation

docs(doc/code-docs): update quickstart usage by @huangting4201 in https://github.com/InternLM/InternLM/pull/301
docs(doc/code-docs): add figure for training docs by @zigzagcai in https://github.com/InternLM/InternLM/pull/307

✅ Tests

tests(tests/test_model): add unit test for model by @li126com in https://github.com/InternLM/InternLM/pull/300
tests(tests/test_solver): add unit test for optimizer by @li126com in https://github.com/InternLM/InternLM/pull/303

🌐 Other

Known issues

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230909...v0.2.1dev20230915

v0.2.1dev20230909

8 months ago

What's Changed

fix(ckpt): fix snapshot none load error and remove file lock by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/298

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230908...v0.2.1dev20230909

v0.2.1dev20230908

8 months ago

Highlights

fix a bug that could produce NaN values when overlapping the gradient allreduce with the backward pass
support a timeout wrapper and runtime diagnosis
support a Chinese version of the readthedocs documentation
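
To illustrate what a timeout wrapper does in general, here is a minimal sketch built on the standard library. It is not the implementation from #286; the decorator name `with_timeout` and its behavior are assumptions made for this example.

```python
import concurrent.futures
import functools


def with_timeout(seconds):
    """Hypothetical decorator: raise TimeoutError if the call takes longer than `seconds`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                raise TimeoutError(f"{func.__name__} exceeded {seconds}s") from None
            finally:
                # Do not block on the worker thread; a timed-out call
                # keeps running in the background until it finishes.
                executor.shutdown(wait=False)

        return wrapper

    return decorator


@with_timeout(1.0)
def quick_add(a, b):
    return a + b


result = quick_add(2, 3)
```

A thread-based wrapper like this cannot forcibly stop the wrapped call. It only stops waiting for it, which is usually enough to surface a hang in a training job.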

What's Changed

🚀 Features

feat(monitor): add light monitor by @JiaoPL in https://github.com/InternLM/InternLM/pull/275
feat(utils): add timeout wrapper by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/286
feat: add runtime diagnosis by @sunpengsdu in https://github.com/InternLM/InternLM/pull/297

💥 Improvements

fix(storage): refactor and fix storage_manager api by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/281
Feat/sync grad use async op by @sunpengsdu in https://github.com/InternLM/InternLM/pull/277

🐞 Bug fixes

fix(doc/code-docs): autodoc shown error by @huangting4201 in https://github.com/InternLM/InternLM/pull/265
fix(eval): no need to check length of valid_dl when using streaming dataset by @00INDEX in https://github.com/InternLM/InternLM/pull/274
fix/broadcast should not in commu stream by @sunpengsdu in https://github.com/InternLM/InternLM/pull/276
fix(model): set tensor parallel attribute for mlp by @yingtongxiong in https://github.com/InternLM/InternLM/pull/271
feat(ckpt): checkpoint bug fixes and feature enhancements. by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/259
fix(ckpt): fix checkpoint reload bug by @SolenoidWGT in https://github.com/InternLM/InternLM/pull/282
fix(core/context): use dummy mode to generate random numbers in model construction by @blankde in https://github.com/InternLM/InternLM/pull/266
fix(monitor): add alert switch and refactor monitor config by @JiaoPL in https://github.com/InternLM/InternLM/pull/285
fix: fix the bug to do bcast in a stream by @sunpengsdu in https://github.com/InternLM/InternLM/pull/294

📚 Documentation

docs(*): add documentation and reST files for readthedocs by @zigzagcai in https://github.com/InternLM/InternLM/pull/272
docs(doc/code-docs): support zh cn readthedocs by @huangting4201 in https://github.com/InternLM/InternLM/pull/289
docs(fsdp): add training option for fsdp by @zaglc in https://github.com/InternLM/InternLM/pull/273
docs(doc/code-docs): refine profiler docs by @zigzagcai in https://github.com/InternLM/InternLM/pull/295

🌐 Other

Known issues

New Contributors

@JiaoPL made their first contribution in https://github.com/InternLM/InternLM/pull/275
@blankde made their first contribution in https://github.com/InternLM/InternLM/pull/266
@zigzagcai made their first contribution in https://github.com/InternLM/InternLM/pull/272
@zaglc made their first contribution in https://github.com/InternLM/InternLM/pull/273

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230901...v0.2.1dev20230908

v0.2.1dev20230901

8 months ago

Highlights

Support CentOS and Ubuntu Dockerfiles
Support runtime GPU FLOPS and NCCL allreduce speed tests
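
As a rough guide to how such speed tests are usually scored, the sketch below converts measured timings into throughput figures. It assumes the common conventions of about 2*m*n*k floating-point operations per matrix multiplication and the nccl-tests definition of allreduce bus bandwidth. Both function names are hypothetical and are not taken from #254.

```python
def matmul_tflops(m, n, k, seconds):
    """TFLOPS for an (m x k) @ (k x n) matmul, counted as 2*m*n*k operations."""
    return 2 * m * n * k / seconds / 1e12


def allreduce_busbw_gb_per_s(num_bytes, seconds, world_size):
    """Allreduce bus bandwidth in GB/s: algorithm bandwidth * 2 * (n - 1) / n."""
    algbw = num_bytes / seconds / 1e9
    return algbw * 2 * (world_size - 1) / world_size


# Example: a 4096^3 matmul finishing in 1 ms, and a 1 GB allreduce
# across 8 ranks finishing in 10 ms.
gpu_tflops = matmul_tflops(4096, 4096, 4096, 0.001)
bus_bandwidth = allreduce_busbw_gb_per_s(1e9, 0.01, 8)
```

Comparing these figures with the hardware's nominal peak is a common way to spot a slow GPU or a degraded interconnect.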

What's Changed

🚀 Features

Implement uniform_init for tensor by @Pryest in https://github.com/InternLM/InternLM/pull/252
Support centos and ubuntu dockerfile by @li126com in https://github.com/InternLM/InternLM/pull/220 https://github.com/InternLM/InternLM/pull/243
Support writer add_scalars for writing dict data by @huangting4201 in https://github.com/InternLM/InternLM/pull/257
Support runtime gpu flops and nccl allreduce speed test by @sunpengsdu in https://github.com/InternLM/InternLM/pull/254

💥 Improvements

🐞 Bug fixes

Fix StreamingDataset not having a len method by @00INDEX in https://github.com/InternLM/InternLM/pull/251
Fix argument missing in getting loss metrics by @MagicDevilZhang in https://github.com/InternLM/InternLM/pull/256
Fix the error that RotaryEmbedding is converted to a non-fp32 format during operation by @YWMditto in https://github.com/InternLM/InternLM/pull/239

📚 Documentation

Update readme structure by @huangting4201 in https://github.com/InternLM/InternLM/pull/240
Support readthedocs by @huangting4201 in https://github.com/InternLM/InternLM/pull/245 https://github.com/InternLM/InternLM/pull/264

🌐 Other

Known issues

v0.2.0

8 months ago

Features:

Support pipeline parallel, including interleaved and non-interleaved pipeline scheduler.
Support sequence parallel.
Support model evaluation.
Support tf32 with flash-attention.
Support tensorboard writer for recording training performance metrics.
Support a customized uniscale logger.
Support calculating the model's accuracy and perplexity metrics.
Support oss storage and checkpoint asynchronous uploading.
Support automatically loading the latest checkpoint.
Support checkpoint snapshot.
Support monitoring the status of training jobs and raising alerts on abnormal status.
Support torch profiler.
Support simple memory profiler.
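
For reference, the accuracy and perplexity metrics in the list above are conventionally defined as in the following sketch. It assumes natural-log per-token losses and an `ignore_index` of -100 for padded labels, which is a PyTorch convention. The function names are hypothetical and this is not InternLM's metric code.

```python
import math


def token_accuracy(predictions, labels, ignore_index=-100):
    """Fraction of non-ignored label positions where the prediction matches."""
    pairs = [(p, y) for p, y in zip(predictions, labels) if y != ignore_index]
    if not pairs:
        return 0.0
    return sum(p == y for p, y in pairs) / len(pairs)


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    if not token_nlls:
        raise ValueError("token_nlls must not be empty")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Two of three counted tokens are correct; the last label is ignored.
acc = token_accuracy([1, 2, 3, 4], [1, 2, 0, -100])
# A model that assigns probability 1/4 to every target token.
ppl = perplexity([math.log(4)] * 3)
```

The second example shows the usual reading of perplexity: a value of about 4 means the model is, on average, as uncertain as a uniform choice among four tokens.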

Optimizations:

Overlap the optimizer parameter broadcast with the model forward pass.
Overlap the optimizer's last-bucket gradient allreduce with the norm computation.

v0.1.0

8 months ago