InternLM Versions

Official release of the InternLM2 7B and 20B base and chat models, with 200K context support.

v0.2.1dev20240102

4 months ago

What's Changed

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20231121...v0.2.1dev20240102

v0.2.1dev20231121

5 months ago

TBD

v0.2.1dev20230915

8 months ago

Highlights

  • Fix a bug that could cause gradient overflow when total_steps is small
  • Fix the rotary_emb.inv_freq KeyError in the convert2hf.py tool
  • Add unit tests for the model

What's Changed

🚀 Features

🐞 Bug fixes

📚 Documentation

✅ Tests

🌐 Other

Known issues

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230909...v0.2.1dev20230915

v0.2.1dev20230909

8 months ago

What's Changed

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230908...v0.2.1dev20230909

v0.2.1dev20230908

8 months ago

Highlights

  • Fix a bug that could produce NaN values when gradient allreduce is overlapped with the backward pass
  • Support a timeout wrapper and runtime diagnosis
  • Support a Chinese version of the Read the Docs documentation
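
As a rough illustration of what a timeout wrapper does, here is a minimal Python sketch. It is not InternLM's implementation: the decorator name `with_timeout` and the thread-based approach are assumptions chosen for illustration only. It runs the wrapped call in a worker thread and raises `TimeoutError` if no result arrives within the limit.

```python
import concurrent.futures
import functools


def with_timeout(seconds):
    """Hypothetical decorator: raise TimeoutError if the call takes too long."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Run the call in a single worker thread so we can stop waiting on it.
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = executor.submit(func, *args, **kwargs)
            try:
                return future.result(timeout=seconds)
            except concurrent.futures.TimeoutError:
                raise TimeoutError(
                    f"{func.__name__} exceeded {seconds}s"
                ) from None
            finally:
                # Do not block on the worker; note that the thread itself
                # is not killed and keeps running until func returns.
                executor.shutdown(wait=False)

        return wrapper

    return decorator
```

A thread-based wrapper only stops waiting; it cannot interrupt the underlying call. A real training framework would likely need a stronger mechanism for hung collective operations.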

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

📚 Documentation

🌐 Other

Known issues

New Contributors

Full Changelog: https://github.com/InternLM/InternLM/compare/v0.2.1dev20230901...v0.2.1dev20230908

v0.2.1dev20230901

8 months ago

Highlights

  • Support CentOS and Ubuntu Dockerfiles
  • Support runtime GPU FLOPS and NCCL allreduce speed tests

What's Changed

🚀 Features

💥 Improvements

🐞 Bug fixes

📚 Documentation

🌐 Other

Known issues

v0.2.0

8 months ago

Features:

  1. Support pipeline parallelism, including interleaved and non-interleaved pipeline schedulers.
  2. Support sequence parallelism.
  3. Support model evaluation.
  4. Support TF32 with flash-attention.
  5. Support a TensorBoard writer for recording training performance metrics.
  6. Support a customized uniscale logger.
  7. Support calculating the model's accuracy and perplexity metrics.
  8. Support OSS storage and asynchronous checkpoint uploading.
  9. Support automatically loading the latest checkpoint.
  10. Support checkpoint snapshots.
  11. Support monitoring the status of training jobs and raising alarms on abnormal status.
  12. Support the torch profiler.
  13. Support a simple memory profiler.
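
For readers unfamiliar with the accuracy and perplexity metrics in item 7, the sketch below shows how they are commonly defined. It is an illustrative, standard-library-only example, not the code InternLM uses; the function name `accuracy_and_perplexity` and its inputs are hypothetical. Accuracy is the fraction of positions where the predicted token equals the target, and perplexity is the exponential of the mean negative log-likelihood per token.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_perplexity(
    token_log_probs: Sequence[float],
    predicted_ids: Sequence[int],
    target_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (accuracy, perplexity) for one evaluated sequence.

    token_log_probs[i] is the natural-log probability the model assigned
    to the correct token target_ids[i] (hypothetical input layout).
    """
    if not target_ids:
        raise ValueError("need at least one target token")
    if not (len(token_log_probs) == len(predicted_ids) == len(target_ids)):
        raise ValueError("all inputs must have the same length")

    # Token-level accuracy: share of positions predicted correctly.
    correct = sum(p == t for p, t in zip(predicted_ids, target_ids))
    accuracy = correct / len(target_ids)

    # Perplexity: exp of the mean negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    perplexity = math.exp(mean_nll)
    return accuracy, perplexity
```

For example, a model that assigns probability 0.5 to every correct token has a perplexity of 2, regardless of how many of its top predictions are right.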

Optimizations:

  1. Overlap the optimizer's parameter broadcast with the model forward pass.
  2. Overlap the allreduce of the optimizer's last gradient bucket with the norm computation.

v0.1.0

8 months ago