Pretrain, finetune, and deploy AI models on multiple GPUs and TPUs with zero code changes.
---

## App

### Added

- Added `rm`: Delete files from your Cloud Platform Filesystem
- Added `lightning connect data` to register data connection to private s3 buckets (#16738)

## PyTorch

### Fixed

- Fixed an issue with `min_epochs` or `min_steps` (#16719)
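For context, a minimal sketch of the `Trainer` stopping-criteria arguments involved in that fix; the values shown are illustrative only:

```python
# Minimal sketch of the min_epochs / min_steps stopping criteria.
from pytorch_lightning import Trainer

# Train for at least 5 epochs, even if early stopping would trigger sooner.
trainer = Trainer(min_epochs=5, max_epochs=20)

# Or bound training by optimizer steps instead of epochs.
trainer = Trainer(min_steps=1_000, max_steps=5_000)
```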
Contributors: @akihironitta, @awaelchli, @borda, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
---

## App

- Added the `lightning open` command (#16482)
- Added Cloud Platform Filesystem commands:
  - `ls`: List files from your Cloud Platform Filesystem
  - `cd`: Change the current directory within your Cloud Platform filesystem (terminal session based)
  - `pwd`: Return the current folder in your Cloud Platform Filesystem
  - `cp`: Copy files between your Cloud Platform Filesystem and the local filesystem
- Support for `cd` into non-existent folders (#16645)
- Support for `cp` (upload) at project level (#16631)
- Support for `ls` and `cp` (download) at project level (#16622)
- Added `lightning connect data` to register data connection to s3 buckets (#16670)
- Changed the default `LightningClient(retry=False)` to `retry=True` (#16382); see the sketch after this list
- Renamed `lightning.app.components.LiteMultiNode` to `lightning.app.components.FabricMultiNode` (#16505)
- Renamed the `lightning connect` command to `lightning connect app` for consistency (#16670)
- Fixed an issue with `lightning cp` (#16626)
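A small hedged sketch of the new client default; it assumes the app framework's import path for `LightningClient`:

```python
# Hedged sketch: LightningClient now retries by default (#16382).
from lightning.app.utilities.network import LightningClient

client = LightningClient()               # equivalent to LightningClient(retry=True)
one_shot = LightningClient(retry=False)  # opt out of automatic retries
```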
## Fabric

### Fixed

- Fixed error handling for the `accelerator="mps"` and `ddp` strategy pairing (#16455)
- Fixed strict availability check for the `torch_xla` requirement (#16476)
- Fixed an import error when `torch.distributed` is not available (#16658)

## PyTorch

### Fixed

- Fixed an unintended limitation on calling `save_hyperparameters` on mixin classes that don't subclass `LightningModule`/`LightningDataModule` (#16369)
- Fixed an issue with `MLFlowLogger` logging the wrong keys with `.log_hyperparams()` (#16418)
- Fixed an issue with `MLFlowLogger` where long values are truncated (#16451)
- Fixed strict availability check for the `torch_xla` requirement (#16476)
- Fixed an import error when `torch.distributed` is not available (#16658)

Contributors: @akihironitta, @awaelchli, @borda, @BrianPulfer, @ethanwharris, @hhsecond, @justusschock, @Liyang90, @RuRo, @senarvi, @shenoynikhil, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
---

## App

- `DeviceStatsMonitor` (#16002)
- Renamed `lightning_app.components.serve.gradio` to `lightning_app.components.serve.gradio_server` (#16201)
- Fixed a `relpath` bug on Windows (#16164)
- Fixed a `LooseVersion` issue (#16162)
- Fixed a bug where `lightning login` with env variables would not correctly save the credentials (#16339)
## Fabric

### Added

- Added `Fabric.launch()` to programmatically launch processes, e.g. in a Jupyter notebook (#14992); see the sketch after this list
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
- Added the `lightning_fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
- Added Logger support:
  - `Fabric(loggers=...)` to support different Logger frameworks in Fabric
  - `Fabric.log` for logging scalars using multiple loggers
  - `Fabric.log_dict` for logging a dictionary of multiple metrics at once
  - `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
  - Support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
  - Access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning_fabric.loggers.TensorBoardLogger` (#16121)
- Added `lightning_fabric.loggers.CSVLogger` (#16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
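Taken together, a minimal hedged sketch of these Fabric APIs; the model, data, and hyperparameters are placeholders, not anything from the release:

```python
# A minimal sketch of the new Fabric APIs; model, data, and values
# here are placeholders, not part of the release notes.
import torch
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger


class PrintCallback:
    def on_train_end(self):
        print("training finished")


fabric = Fabric(
    accelerator="auto",
    devices=1,
    loggers=CSVLogger("logs"),    # Fabric(loggers=...)
    callbacks=[PrintCallback()],  # Fabric(callbacks=...)
)
fabric.launch()  # programmatic launch, e.g. from a Jupyter notebook

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model = fabric.setup_module(model)              # model first ...
optimizer = fabric.setup_optimizers(optimizer)  # ... then the optimizer

batch = torch.randn(8, 32, device=fabric.device)
loss = model(batch).sum()
fabric.backward(loss)
optimizer.step()
optimizer.zero_grad(set_to_none=True)  # consistent across all strategies

fabric.log("loss", loss)                           # scalar to all loggers
fabric.log_dict({"loss": loss.item(), "lr": 0.1})  # several metrics at once
fabric.call("on_train_end")                        # emit event to callbacks
```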
### Changed

- Renamed the class `LightningLite` to `Fabric` (#15932, #15938)
- The `Fabric.run()` method is no longer abstract (#14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101); a sketch follows the Fixed section below
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)

### Removed

- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)

### Fixed

- Fixed an issue with the `DistributedSampler` (#16101)
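As referenced in the Changed list, a short hedged sketch of the dataloader behavior; it reuses the `fabric` object from the previous snippet and a stand-in dataset:

```python
# Hedged sketch: Fabric-managed dataloaders (#16101). Assumes `fabric`
# from the previous snippet; the dataset is a stand-in.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 32))
loader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=8, shuffle=True))

# Under a distributed strategy, the returned wrapper now calls
# sampler.set_epoch() for you, so shuffling stays in sync across epochs.
for epoch in range(3):
    for (x,) in loader:
        pass  # training step goes here
```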
## PyTorch

### Added

- Added support for native logging of `MetricCollection` with enabled compute groups (#15580)
- Added support for custom artifact names in `pl.loggers.WandbLogger` (#16173)
- Added support for DDP with `LRFinder` (#15304)
- Added the `pl.utilities.upgrade_checkpoint` script (#15333)
- Added the argument `ax` to `.lr_find().plot()` to enable writing to a user-defined axes in a matplotlib figure (#15652)
- Added the `log_model` parameter to `MLFlowLogger` (#9187); see the sketch after the Changed section
- Added a warning when `self.log(..., logger=True)` is called without a configured logger (#15814)
- Added `LightningCLI` support for optimizers and learning-rate schedulers via callable type dependency injection (#15869); a sketch follows this list
- Added support for activation checkpointing for the `DDPFullyShardedNativeStrategy` strategy (#15826)
- Added the option to set `DDPFullyShardedNativeStrategy(cpu_offload=True|False)` via a bool instead of needing to pass a configuration object (#15832)
- `LightningModule.configure_optimizers` (#16189)
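A hedged sketch of the callable dependency-injection pattern for optimizers; the module is a placeholder and the type hint follows the feature description rather than a verbatim example:

```python
# Hedged sketch: LightningCLI injecting an optimizer factory (#15869).
# The model itself is a placeholder.
from typing import Callable, Iterable

import torch
from torch.optim import Optimizer
import pytorch_lightning as pl
from pytorch_lightning.cli import LightningCLI


class LitModel(pl.LightningModule):
    def __init__(self, optimizer: Callable[[Iterable], Optimizer] = torch.optim.Adam):
        super().__init__()
        self.optimizer = optimizer
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        # The CLI injects a callable that builds the optimizer from parameters.
        return self.optimizer(self.parameters())


if __name__ == "__main__":
    # The optimizer class and its arguments can then be chosen from the
    # command line / config file instead of being hard-coded.
    LightningCLI(LitModel)
```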
### Changed

- Switched from `tensorboard` to `tensorboardx` in the `TensorBoardLogger` (#15728)
- `LightningModule.load_from_checkpoint` now automatically upgrades the loaded checkpoint if it was produced in an old version of Lightning (#15237)
- `Trainer.{validate,test,predict}(ckpt_path=...)` no longer restores the `Trainer.global_step` and `trainer.current_epoch` values from the checkpoints; from now on, only `Trainer.fit` will restore them (#15532)
- The `ModelCheckpoint.save_on_train_epoch_end` attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (#15300)
- The `MLFlowLogger` now logs hyperparameters and metrics in batched API calls (#15915)
- Overriding the `on_train_batch_{start,end}` hooks in conjunction with taking a `dataloader_iter` in the `training_step` no longer errors out and instead shows a warning (#16062)
- Moved `tensorboardX` to extra dependencies; the `CSVLogger` is used by default (#16349)
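As referenced in the Added list, a small hedged sketch combining the `log_model` parameter with the batched-logging change; the experiment name is arbitrary:

```python
# Hedged sketch: MLFlowLogger with checkpoint logging enabled (#9187).
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

mlf_logger = MLFlowLogger(experiment_name="demo", log_model=True)
trainer = Trainer(logger=mlf_logger, max_epochs=1)
# Hyperparameters and metrics are now sent in batched API calls (#15915),
# so runs with many logged values make far fewer requests to the server.
```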
### Deprecated

- Deprecated the `description`, `env_prefix` and `env_parse` parameters in `LightningCLI.__init__`, in favour of giving them through `parser_kwargs` (#15651)
- Deprecated `pytorch_lightning.profiler` in favor of `pytorch_lightning.profilers` (#16059)
- Deprecated `Trainer(auto_select_gpus=...)` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147); see the sketch after this section
- Deprecated `pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus}` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147)
- `nvidia/apex` deprecation (#16039):
  - Deprecated `pytorch_lightning.plugins.NativeMixedPrecisionPlugin` in favor of `pytorch_lightning.plugins.MixedPrecisionPlugin`
  - Deprecated the `LightningModule.optimizer_step(using_native_amp=...)` argument
  - Deprecated the `Trainer(amp_backend=...)` argument
  - Deprecated the `Trainer.amp_backend` property
  - Deprecated the `Trainer(amp_level=...)` argument
  - Deprecated the `pytorch_lightning.plugins.ApexMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
  - Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation (#16141):
  - Deprecated `Trainer(strategy="horovod")`
  - Deprecated the `HorovodStrategy` class
- Deprecated `pytorch_lightning.lite.LightningLite` in favor of `lightning.fabric.Fabric` (#16314)
- `FairScale` deprecation, in favor of PyTorch's FSDP implementation (#16353):
  - Deprecated the `pytorch_lightning.overrides.fairscale.LightningShardedDataParallel` class
  - Deprecated the `pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded.DDPShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy` class
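As referenced above, a hedged sketch of the suggested replacement for `auto_select_gpus`; the device count is arbitrary:

```python
# Hedged sketch: replacement for Trainer(auto_select_gpus=True) (#16147).
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import find_usable_cuda_devices

devices = find_usable_cuda_devices(2)  # e.g. [0, 1] if two GPUs are free
trainer = Trainer(accelerator="cuda", devices=devices)
```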
### Removed

- Removed the deprecated `pytorch_lightning.utilities.memory.get_gpu_memory_map` in favor of `pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats` (#15617)
- Removed the deprecated `pytorch_lightning.profiler.base.AbstractProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed the deprecated `pytorch_lightning.profiler.base.BaseProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed the deprecated `pytorch_lightning.utilities.meta` module (#16038)
- Removed the deprecated `LightningDeepSpeedModule` (#16041)
- Removed the deprecated `pytorch_lightning.accelerators.GPUAccelerator` in favor of `pytorch_lightning.accelerators.CUDAAccelerator` (#16050)
- Removed the deprecated `pytorch_lightning.profiler.*` classes in favor of `pytorch_lightning.profilers` (#16059)
- Removed the deprecated `pytorch_lightning.utilities.cli` module in favor of `pytorch_lightning.cli` (#16116)
- Removed the deprecated `pytorch_lightning.loggers.base` module in favor of `pytorch_lightning.loggers.logger` (#16120)
- Removed the deprecated `pytorch_lightning.loops.base` module in favor of `pytorch_lightning.loops.loop` (#16142)
- Removed the deprecated `pytorch_lightning.core.lightning` module in favor of `pytorch_lightning.core.module` (#16318)
- Removed the deprecated `pytorch_lightning.callbacks.base` module in favor of `pytorch_lightning.callbacks.callback` (#16319)
- Removed the deprecated `Trainer.reset_train_val_dataloaders()` in favor of `Trainer.reset_{train,val}_dataloader` (#16131)
- Removed support for `LightningCLI(seed_everything_default=None)` (#16131)
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)
### Fixed

- Enhanced `reduce_boolean_decision` to accommodate `any`-analogous semantics expected by the `EarlyStopping` callback (#15253)
- Fixed an issue where the `interval` key of the scheduler would be ignored during manual optimization, making the LearningRateMonitor callback fail to log the learning rate (#16308)
- Fixed `MLFlowLogger` not finalizing correctly when status code 'finished' was passed (#16340)

Contributors: @1SAA, @akihironitta, @AlessioQuercia, @awaelchli, @bipinKrishnan, @Borda, @carmocca, @dmitsf, @erhoo82, @ethanwharris, @Forbu, @hhsecond, @justusschock, @lantiga, @lightningforever, @Liyang90, @manangoel99, @mauvilsa, @nicolai86, @nohalon, @rohitgr7, @schmidt-jake, @speediedan, @yMayanand

If we forgot someone due to not matching commit email with GitHub account, let us know :]
---

## App

### Added

- Added partial support for the fastapi `Request` annotation in `configure_api` handlers (#16047)
- Added a `work.delete` method to delete the work (#16103)
- Added a `display_name` property to LightningWork for the cloud (#16095)
- Added `ColdStartProxy` to the AutoScaler (#16094)
- Added status endpoint, enable `ready` (#16075)
- Implemented `ready` for components (#16129)
### Changed

- The default `start_method` for creating Work processes locally on macOS is now 'spawn' (previously 'fork') (#16089)
- The utility `lightning.app.utilities.cloud.is_running_in_cloud` now returns `True` during the loading of the app locally when running with `--cloud` (#16045)
- `True` (#16009)
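A tiny hedged sketch of the utility whose behavior changed; the printed messages are placeholders:

```python
# Hedged sketch of the utility whose behavior changed in #16045.
from lightning.app.utilities.cloud import is_running_in_cloud

if is_running_in_cloud():
    # Since #16045 this is also True while the app is being loaded
    # locally with the --cloud flag.
    print("running in (or deploying to) the cloud")
else:
    print("running locally")
```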
### Fixed

- Fixed `PythonServer` messaging "Your app has started" (#15989)
- Fixed a bug where the `AutoScaler` would fail with `min_replica=0` (#16092)
- Fixed an issue with the `AutoScaler` UI (#16128)
- Fixed an issue with `streamlit` (#16139)

Full Changelog: https://github.com/Lightning-AI/lightning/compare/1.8.5.post0...1.8.6
---

### Fixed

- Fixed an issue with `self.lightningignore` (#16080)

Full Changelog: https://github.com/Lightning-AI/lightning/compare/1.8.5...1.8.5.post0
---

## App

### Added

- Added `Lightning{Flow,Work}.lightningignores` attributes to programmatically ignore files before uploading to the cloud (#15818)
- Added a default `.lightningignore` that ignores `venv` (#16056)

### Fixed

- Fixed the `DDPStrategy` import in the app framework (#16029)
- Fixed `AutoScaler` raising an exception when a non-default cloud compute is specified (#15991)

Full Changelog: https://github.com/Lightning-AI/lightning/compare/1.8.4.post0...1.8.5
---

### Fixed

- Fixed an issue with `L.app.structures` (#15964)
- Fixed the `XLAProfiler` not recording anything due to mismatching of action names (#15885)

Full Changelog: https://github.com/Lightning-AI/lightning/compare/1.8.4...1.8.4.post0
---

## App

### Added

- Added the `code_dir` argument to tracer run (#15771)
- Added the CLI command `lightning run model` to launch a `LightningLite` accelerated script (#15506)
- Added the CLI command `lightning delete app` to delete a lightning app on the cloud (#15783)
- Added the `AutoScaler` component (#15769)
- Added the property `ready` of the LightningFlow to inform when the `Open App` should be visible (#15921); see the sketch after this list
- Added the private work attribute `_start_method` to customize how to start the works (#15923)
- Added a `configure_layout` method to the `LightningWork`, which can be used to control how the work is handled in the layout of a parent flow (#15926)
- Added support for `lightning run app organization/name` (#15941)
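As referenced above, a hedged sketch of the `ready` property; the flow and work are minimal placeholders:

```python
# Hedged sketch of LightningFlow.ready (#15921); the components are
# placeholders, not from the release notes.
import lightning as L


class DemoWork(L.LightningWork):
    def run(self):
        print("work is running")


class RootFlow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        self.work = DemoWork()

    @property
    def ready(self) -> bool:
        # The "Open App" button becomes visible once this returns True.
        return self.work.url != ""

    def run(self):
        self.work.run()


app = L.LightningApp(RootFlow())
```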
### Changed

- The `MultiNode` components now warn the user when running with `num_nodes > 1` locally (#15806)
- Show a message when `BuildConfig(requirements=[...])` is passed but a `requirements.txt` file is already present in the Work (#15799)
- Show a message when `BuildConfig(dockerfile="...")` is passed but a `Dockerfile` file is already present in the Work (#15799)

### Removed

- Removed the `SingleProcessRuntime` (#15933)

### Fixed

- Fixed the `enable_spawn` method of the `WorkRunExecutor` (#15812)
- Fixed a bug where using `L.app.structures` would cause multiple apps to be opened and fail with an error in the cloud (#15911)
- Fixed `ImportError` on Multinode if package not present (#15963)

## PyTorch

### Fixed

- Fixed `shuffle=False` having no effect when using DDP/DistributedSampler (#15931)
- Fixed `fit_loop.restarting` to be `False` for lr finder (#15620)
- Fixed `torch.jit.script`-ing a LightningModule causing an unintended error message about the deprecated `use_amp` property (#15947)

Full Changelog: https://github.com/Lightning-AI/lightning/compare/1.8.3...1.8.4