PyTorch Lightning Versions

Pretrain, finetune, and deploy AI models on multiple GPUs or TPUs with zero code changes.

2.0.4

10 months ago

App

Fixed

  • Bumped several dependencies to address security vulnerabilities.

Fabric

Fixed

  • Fixed validation of parameters of plugins.precision.MixedPrecision (#17687)
  • Fixed an issue with HPU imports leading to performance degradation (#17788)

PyTorch

Changed

  • Changes to the NeptuneLogger (#16761):
    • It now supports neptune-client 0.16.16 and neptune >=1.0, and the log() method has been replaced with append() and extend().
    • It now accepts a namespace Handler as an alternative to Run for the run argument. This means you can call it like NeptuneLogger(run=run["some/namespace"]) to log everything to the some/namespace/ location of the run; see the sketch below.
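
A minimal sketch of the new namespace-handler form, assuming neptune >= 1.0 is installed and NEPTUNE_API_TOKEN / NEPTUNE_PROJECT are set in the environment:

    import neptune
    from lightning.pytorch.loggers import NeptuneLogger

    # init_run() reads the API token and project from the environment
    run = neptune.init_run()
    # passing the namespace handler logs everything under "training/" in the run
    logger = NeptuneLogger(run=run["training"])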

Fixed

  • Fixed validation of parameters of plugins.precision.MixedPrecisionPlugin (#17687)
  • Fixed deriving default map location in LightningModule.load_from_checkpoint when there is an extra state (#17812)

Contributors

@akreuzer, @awaelchli, @borda, @jerome-habana, @kshitij12345

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.3

11 months ago

App

Added

  • Added the property LightningWork.public_ip that exposes the public IP of the LightningWork instance (#17742); see the sketch below
  • Added the missing python-multipart dependency (#17244)
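
A hedged sketch of reading both addresses from inside a work; MyWork is a hypothetical subclass:

    from lightning.app import LightningWork

    class MyWork(LightningWork):
        def run(self):
            # public_ip is new in this release; internal_ip now returns the
            # private/internal address (see the fix below)
            print(f"public: {self.public_ip}, internal: {self.internal_ip}")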

Changed

  • Made type hints public (#17100)

Fixed

  • Fixed LightningWork.internal_ip, which was mistakenly exposing the public IP; it now returns the private/internal IP address (#17742)
  • Fixed resolution of the latest version in CLI (#17351)
  • Fixed a property being raised instead of returned (#17595)
  • Fixed retrieving the project (#17617, #17666)

Fabric

Added

  • Added support for Callback registration through entry points (#17756)
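
A sketch of how a third-party package might register a callback via entry points; the group name "lightning.fabric.callbacks" and all package names here are assumptions, so check the Fabric docs for the exact group introduced by #17756:

    # setup.py of a hypothetical plugin package
    from setuptools import setup

    setup(
        name="my-fabric-plugin",
        entry_points={
            # group name is an assumption; see the Fabric docs
            "lightning.fabric.callbacks": [
                "my_callback = my_fabric_plugin:MyCallback",
            ],
        },
    )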

Changed

  • Made type hints public (#17100)
  • Added support for compiling a module after it was set up by Fabric (#17529)
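
A minimal sketch of the now-supported order of operations, assuming torch >= 2.0:

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="cpu", devices=1)
    fabric.launch()

    model = torch.nn.Linear(4, 4)
    model = fabric.setup(model)
    # compiling after setup is supported as of this release (#17529)
    compiled = torch.compile(model)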

Fixed

  • Fixed computing the next version folder in CSVLogger (#17139)
  • Fixed inconsistent settings for FSDP Precision (#17670)

PyTorch

Changed

  • Made type hints public (#17100)

Fixed

  • CombinedLoader only starts DataLoader workers when necessary when operating in sequential mode (#17639)
  • Fixed a potential bug with uploading model checkpoints to Neptune.ai by uploading files from stream (#17430)
  • Fixed signature inspection of decorated hooks (#17507)
  • The WandbLogger no longer flattens dictionaries in the hyperparameters logged to the dashboard (#17574)
  • Fixed computing the next version folder in CSVLogger (#17139)
  • Fixed a formatting issue when the filename in ModelCheckpoint contained metrics that were substrings of each other (#17610); see the sketch after this list
  • Fixed WandbLogger ignoring the WANDB_PROJECT environment variable (#16222)
  • Fixed inconsistent settings for FSDP Precision (#17670)
  • Fixed an edge case causing overlapping samples in DDP when no global seed is set (#17713)
  • Fall back to a module-availability check for mlflow (#17467)
  • Fixed the learning rate finder's handling of the maximum number of validation batches (#17636)
  • Fixed checkpoint loading when invoked from multiple threads (#17678)
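
For the ModelCheckpoint formatting fix above, a small sketch of the kind of filename template that was affected ("loss" is a substring of "val_loss"):

    from lightning.pytorch.callbacks import ModelCheckpoint

    # templates mixing metrics that are substrings of each other now format
    # correctly instead of one metric clobbering the other
    checkpoint = ModelCheckpoint(filename="{epoch}-{loss:.2f}-{val_loss:.2f}")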

Contributors

@adamjstewart, @AleksanderWWW, @awaelchli, @baskrahmer, @bkiat1123, @borda, @carmocca, @ethanwharris, @leng-yue, @lightningforever, @manangoel99, @mukhery, @Quasar-Kim, @water-vapor, @yurijmikhalevich

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.2

1 year ago

App

Fixed

  • Fixed Lightning App when used with remote storage (#17426)
  • Fixed AppState and the Streamlit example (#17452)

Fabric

Changed

  • Enabled precision autocast for LightningModule step methods in Fabric (#17439)
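
A hedged sketch of the pattern this enables; it assumes a CUDA device and uses a hypothetical LitModel:

    import torch
    from lightning.fabric import Fabric
    from lightning.pytorch import LightningModule

    class LitModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 4)

        def training_step(self, batch, batch_idx):
            # with #17439 this runs inside Fabric's autocast context
            # when mixed precision is enabled
            return self.layer(batch).sum()

    fabric = Fabric(precision="16-mixed", accelerator="cuda", devices=1)
    fabric.launch()
    model = fabric.setup(LitModel())
    loss = model.training_step(torch.randn(2, 4, device=fabric.device), 0)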

Fixed

  • Fixed an issue with LightningModule.*_step methods bypassing the DDP/FSDP wrapper (#17424)
  • Fixed device handling in Fabric.setup() when the model has no parameters (#17441)

PyTorch

Fixed

  • Fixed an issue where Model.load_from_checkpoint("checkpoint.ckpt", map_location=map_location) would always return a model on the CPU (#17308); see the sketch after this list
  • Fixed syncing of module states during non-fit stages (#17370)
  • Fixed an issue that caused num_nodes not to be set correctly for FSDPStrategy (#17438)
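
A minimal sketch of the map_location fix above; LitModel is a hypothetical LightningModule subclass:

    import torch
    from my_project import LitModel  # hypothetical

    # with the fix, the returned model lives on the requested device
    # instead of always being moved to CPU
    model = LitModel.load_from_checkpoint(
        "checkpoint.ckpt",
        map_location=torch.device("cuda:0"),
    )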

Contributors

@awaelchli, @borda, @carmocca, @ethanwharris, @ryan597, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

1.9.5

1 year ago

App

Changed

  • Added healthz endpoint to plugin server (#16882)
  • System customizations are now synced for job runs (#16932)

Fabric

Changed

  • Let TorchCollective work on the torch.distributed WORLD process group by default (#16995)

Fixed

  • Fixed handling of _cuda_clearCublasWorkspaces on teardown (#16907)
  • Improved the error message for installing tensorboard or tensorboardx (#17053)

PyTorch

Changed

  • Changes to the NeptuneLogger (#16761):
    • It now supports neptune-client 0.16.16 and neptune >=1.0, and the log() method has been replaced with append() and extend().
    • It now accepts a namespace Handler as an alternative to Run for the run argument. This means you can call it like NeptuneLogger(run=run["some/namespace"]) to log everything to the some/namespace/ location of the run.
  • Allowed sys.argv and args in LightningCLI (#16808)
  • Moved the HPU broadcast override to the HPU strategy file (#17011)

Removed

  • Removed registration of ShardedTensor state dict hooks in LightningModule.__init__ with torch>=2.1 (#16892)
  • Removed the lightning.pytorch.core.saving.ModelIO class interface (#16974)

Fixed

  • Fixed num_nodes not being set for DDPFullyShardedNativeStrategy (#17160)
  • Fixed parsing the precision config for inference in DeepSpeedStrategy (#16973)
  • Fixed the availability check for rich that prevented Lightning from being imported in Google Colab (#17156)
  • Fixed handling of _cuda_clearCublasWorkspaces on teardown (#16907)
  • The psutil package is now required for CPU monitoring (#17010)
  • Improved the error message for installing tensorboard or tensorboardx (#17053)

Contributors

@awaelchli, @belerico, @carmocca, @colehawkins, @dmitsf, @Erotemic, @ethanwharris, @kshitij12345, @borda

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.1.post0

1 year ago

App

Fixed

  • Fixed frontend hosts when running with multi-process in the cloud (#17324)

Fabric

No changes.


PyTorch

Fixed

  • Made the is_picklable function more robust (#17270)

Contributors

@leng-yue, @ethanwharris, @Borda, @awaelchli, @carmocca

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.1

1 year ago

App

No changes.


Fabric

Changed

  • Generalized Optimizer validation to accommodate both FSDP 1.x and 2.x (#16733)

PyTorch

Changed

  • Pickling the LightningModule no longer pickles the Trainer (#17133)
  • Generalized Optimizer validation to accommodate both FSDP 1.x and 2.x (#16733)
  • Disabled torch.inference_mode when running with torch.compile in PyTorch 2.0 (#17215)

Fixed

  • Fixed issue where pickling the module instance would fail with a DataLoader error (#17130)
  • Fixed WandbLogger not showing "best" aliases for model checkpoints when ModelCheckpoint(save_top_k>0) is used (#17121); see the sketch after this list
  • Fixed the availability check for rich that prevented Lightning from being imported in Google Colab (#17156)
  • Fixed parsing the precision config for inference in DeepSpeedStrategy (#16973)
  • Fixed issue where torch.compile would fail when logging to WandB (#17216)
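
A sketch of the setup affected by the "best" alias fix above; the project name is a placeholder:

    from lightning.pytorch import Trainer
    from lightning.pytorch.callbacks import ModelCheckpoint
    from lightning.pytorch.loggers import WandbLogger

    # log_model="all" uploads checkpoints as W&B artifacts; with the fix, the
    # best-scoring checkpoint carries the "best" alias again when save_top_k > 0
    logger = WandbLogger(project="my-project", log_model="all")
    checkpoint = ModelCheckpoint(monitor="val_loss", save_top_k=2)
    trainer = Trainer(logger=logger, callbacks=[checkpoint])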

Contributors

@Borda, @williamFalcon, @lightningforever, @adamjstewart, @carmocca, @tshu-w, @saryazdi, @parambharat, @awaelchli, @colehawkins, @woqidaideshi, @md-121, @yhl48, @gkroiz, @idc9, @speediedan

If we forgot someone due to not matching commit email with GitHub account, let us know :]

2.0.0

1 year ago

1.9.4

1 year ago

App

Removed

  • Removed implicit ui testing with testing.run_app_in_cloud in favor of headless login and app selection (#16741)

Fabric

Added

  • Added Fabric(strategy="auto") support (#16916)

Fixed

  • Fixed edge cases in parsing device ids using NVML (#16795)
  • Fixed DDP spawn hang on TPU Pods (#16844)
  • Fixed an error when passing find_usable_cuda_devices(num_devices=-1) (#16866)
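
A one-liner showing the call fixed above:

    from lightning.fabric.accelerators import find_usable_cuda_devices

    # num_devices=-1 requests all usable CUDA devices (the case fixed in #16866)
    devices = find_usable_cuda_devices(num_devices=-1)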

PyTorch

Added

  • Added Fabric(strategy="auto") support. It will choose DDP over DDP-spawn, contrary to strategy=None (default) (#16916)

Fixed

  • Fixed DDP spawn hang on TPU Pods (#16844)
  • Fixed edge cases in parsing device ids using NVML (#16795)
  • Fixed backwards compatibility for lightning.pytorch.utilities.parsing.get_init_args (#16851)

Contributors

@ethanwharris, @carmocca, @awaelchli, @justusschock, @dtuit, @Liyang90

If we forgot someone due to not matching commit email with GitHub account, let us know :]

1.9.3

1 year ago

App

Fixed

  • Fixed the lightning open command and improved redirects (#16794)

Fabric

Fixed

  • Fixed an issue causing a wrong environment plugin to be selected when accelerator=tpu and devices > 1 (#16806)
  • Fixed parsing of defaults for --accelerator and --precision in Fabric CLI when accelerator and precision are set to non-default values in the code (#16818)

PyTorch

Fixed

  • Fixed an issue causing a wrong environment plugin to be selected when accelerator=tpu and devices > 1 (#16806)

Contributors

@ethanwharris, @carmocca, @awaelchli, @borda, @tchaton, @yurijmikhalevich

If we forgot someone due to not matching commit email with GitHub account, let us know :]