Pretrain, finetune, and deploy AI models on multiple GPUs and TPUs with zero code changes.
- Fixed an issue causing the `_FabricOptimizer.state` to remain outdated after loading with `load_state_dict` (#18488); a sketch of this save/load pattern follows this list
- Fixed setting the `log_model` parameter in `WandbLogger` via the `LightningCLI` (#18458)
- Fixed the display of `v_num` in the progress bar when running with `Trainer(fast_dev_run=True)` (#18491)
- Fixed `UnboundLocalError` when running with `python -O` (#18496)
- … when running with `Trainer(fast_dev_run=True)` (#18550)

Contributors: @awaelchli, @borda, @justusschock, @SebastianGer
If we forgot someone due to not matching commit email with GitHub account, let us know :]
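The `_FabricOptimizer` fix above concerns optimizer state restored through Fabric's wrapper. A minimal sketch of that round trip, assuming Lightning 2.0; the model, optimizer, and checkpoint path are illustrative, not taken from the release notes:

```python
import torch
from lightning.fabric import Fabric

# Set up a model/optimizer pair through Fabric, which wraps the
# optimizer in a _FabricOptimizer.
fabric = Fabric(accelerator="cpu", devices=1)
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Save and restore the state in place; with the fix, the wrapper's
# state stays in sync with the underlying optimizer after loading.
state = {"model": model, "optimizer": optimizer}
fabric.save("checkpoint.ckpt", state)  # "checkpoint.ckpt" is a placeholder path
fabric.load("checkpoint.ckpt", state)
```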
- Removed `_handle_is_headless` calls in the app run loop (#18362)
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are replicated for each process now (#18238); see the configuration sketch after this list
- Fixed `fabric.no_backward_sync` with XLA strategies (#17761)
- Fixed FSDP full-precision `param_dtype` training (`16-mixed`, `bf16-mixed` and `32-true` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
- Fixed FSDP full-precision `param_dtype` training (`16-mixed` and `bf16-mixed` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
- … `experiment` property defined (#18093)
- … `MLFlowLogger` for logging artifacts to the MLFlow server (#18395)
- Fixed redundant `iter()` call to dataloader when checking dataloading configuration (#18415)
- Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
- Properly managed `fetcher.done` with `dataloader_iter` (#18376)

Contributors: @awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
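For context on the `ddp_spawn`/CPU entries above, this is the kind of configuration they describe; a minimal sketch (not from the release notes), assuming a machine that can run two CPU processes:

```python
from lightning.pytorch import Trainer

# Spawn two CPU processes; with the fix, parameters are replicated per
# process rather than silently shared, at the cost of extra memory.
trainer = Trainer(accelerator="cpu", devices=2, strategy="ddp_spawn", max_epochs=1)
```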
- Removed `lightning.pdb`; import `lightning.app.pdb` instead (#18177)
- Fixed `Fabric.all_reduce()` not performing an inplace operation for all backends consistently (#18235)
- Added `LightningOptimizer.refresh()` to update the `__dict__` in case the optimizer it wraps has changed its internal state (#18280)
- Fixed a `Missing folder` exception when using a Google Storage URL as a `default_root_dir` (#18088)
- … `None`) (#18267)
- Fixed an issue that could cause the `LightningOptimizer` wrapper returned by `LightningModule.optimizers()` to have different internal state than the optimizer it wraps (#18280); see the manual-optimization sketch after this list

Contributors: @0x404, @awaelchli, @bilelomrani1, @borda, @ethanwharris, @nisheethlahoti
If we forgot someone due to not matching commit email with GitHub account, let us know :]
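The two #18280 entries above concern the `LightningOptimizer` wrapper that `LightningModule.optimizers()` returns during manual optimization. A minimal sketch of where that wrapper appears; the module itself is illustrative:

```python
import torch
from lightning.pytorch import LightningModule

class ManualModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)
        self.automatic_optimization = False  # opt into manual optimization

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()  # returns a LightningOptimizer wrapper
        loss = self.layer(batch).sum()
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()  # with the fix, wrapper and wrapped optimizer stay in sync

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```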
- Fixed handling of a `None` request in the file orchestration queue (#18111)
- Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844); a usage sketch follows this list
- Fixed `LightningCLI` not saving correctly `seed_everything` when `run=True` and `seed_everything=True` (#18056)
- Fixed handling of `_FaultTolerantMode` when loading an old checkpoint that pickled the enum (#18094)

Contributors: @awaelchli, @lantiga, @mauvilsa, @shihaoyin
If we forgot someone due to not matching commit email with GitHub account, let us know :]
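The `log_graph` fix above applies when a Fabric-wrapped module is passed to the logger. A minimal sketch, assuming `tensorboard` is installed; the model and log directory are illustrative:

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.loggers import TensorBoardLogger

logger = TensorBoardLogger(root_dir="logs")
fabric = Fabric(accelerator="cpu", loggers=logger)

# fabric.setup() returns a _FabricModule wrapper around the module.
model = fabric.setup(torch.nn.Linear(4, 4))

# With the fix, log_graph unwraps the _FabricModule before tracing the graph.
logger.log_graph(model, torch.randn(1, 4))
```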
- Fixed a bug that caused the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997)
- … `WandbLogger` (#17818)
- Fixed a bug that caused the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Added the `map_location` argument for the `LightningDataModule.load_from_checkpoint` function (#17950); see the sketch after this list
- … `neptune-client` (#17939)

Contributors: @anio, @awaelchli, @borda, @ethanwharris, @lantiga, @nicolai86, @rjarun8, @schmidt-ai, @schuhschuh, @wouterzwerink, @yurijmikhalevich
If we forgot someone due to not matching commit email with GitHub account, let us know :]
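The `map_location` addition above mirrors the argument that `LightningModule.load_from_checkpoint` already accepted. A minimal sketch; the checkpoint path is a placeholder and `BoringDataModule` is just a stand-in from Lightning's demo classes:

```python
import torch
from lightning.pytorch.demos.boring_classes import BoringDataModule

# Remap tensors in the checkpoint onto the CPU regardless of the
# device they were saved from. "datamodule.ckpt" is a placeholder path.
dm = BoringDataModule.load_from_checkpoint(
    "datamodule.ckpt",
    map_location=torch.device("cpu"),
)
```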
- Added `plugins.precision.MixedPrecision` (#17687)
- Updated `NeptuneLogger` (#16761):
  - Replaced the `log()` method with `append()` and `extend()`.
  - Added support for passing a `Handler` as an alternative to `Run` for the `run` argument. This means that you can call it like `NeptuneLogger(run=run["some/namespace"])` to log everything to the `some/namespace/` location of the run. A usage sketch follows this list.
- Deprecated `plugins.precision.MixedPrecisionPlugin` (#17687)
- Fixed `LightningModule.load_from_checkpoint` when there is an extra state (#17812)

Contributors: @akreuzer, @awaelchli, @borda, @jerome-habana, @kshitij12345
If we forgot someone due to not matching commit email with GitHub account, let us know :]
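The `NeptuneLogger` entry above changes how a run is passed in. A minimal sketch of the namespace-`Handler` usage it describes, assuming `neptune` 1.x; the project name is a placeholder and the API token is read from the environment:

```python
import neptune
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

# Indexing a Run yields a namespace Handler; the logger then writes
# everything under the training/lightning/ location of the run.
run = neptune.init_run(project="my-workspace/my-project")  # placeholder project
logger = NeptuneLogger(run=run["training/lightning"])
trainer = Trainer(logger=logger, max_epochs=1)
```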