optimizer & lr scheduler & loss function collections in PyTorch
The major version is updated! (v2.12.0 -> v3.0.0) (#164)
Many optimizers, learning rate schedulers, and objective functions are in pytorch-optimizer.
Currently, pytorch-optimizer supports 67 optimizers (+ bitsandbytes), 11 lr schedulers, and 13 loss functions, and has reached about 4 ~ 50K downloads / month (peak: 75K downloads / month)!
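To give a sense of how the collection is used, here is a minimal sketch; load_optimizer and get_supported_optimizers are assumed from the project's public API, so check the README for the exact interface:

```python
import torch
from torch import nn

# Assumed public helpers: load_optimizer resolves an optimizer class by name,
# get_supported_optimizers lists everything available.
from pytorch_optimizer import get_supported_optimizers, load_optimizer

model = nn.Linear(10, 1)

print(len(get_supported_optimizers()))  # roughly the "67 optimizers" mentioned above

optimizer_class = load_optimizer('adamp')  # look up an optimizer by name
optimizer = optimizer_class(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```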
The reason for updating the major version from v2 to v3 is that I think it's a good time to ship the recent implementations (the last update was about 7 months ago) and to start pivoting to new concepts, like training utilities, while maintaining the original features (e.g. optimizers).
Also, rich test cases, benchmarks, and examples are on the list!
Finally, thanks for using pytorch-optimizer, and feel free to make any requests :)
Implement REX lr scheduler. (#217, #222)
Implement Aida optimizer. (#220, #221)
Implement WSAM optimizer. (#213, #216)
Implement GaLore optimizer. (#224, #228)
Implement Adalite optimizer. (#225, #229)
Implement bSAM optimizer. (#212, #233)
Implement Schedule-Free optimizer. (#230, #233) (a usage sketch follows this list)
Implement EMCMC. (#231, #233)
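The Schedule-Free optimizer replaces the lr schedule with an averaging scheme, so the optimizer has to be switched between its training and evaluation parameter states. A minimal sketch, assuming the class name ScheduleFreeAdamW and the train() / eval() switching convention of the reference Schedule-Free implementation:

```python
import torch
from torch import nn

from pytorch_optimizer import ScheduleFreeAdamW  # class name assumed

model = nn.Linear(10, 1)
optimizer = ScheduleFreeAdamW(model.parameters(), lr=1e-3)

for _ in range(100):
    optimizer.train()  # assumed API: switch to the training parameter state
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

optimizer.eval()  # assumed API: switch to the averaged weights before evaluation / checkpointing
```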
Drop Python 3.7 support officially. (#221)
Update bitsandbytes to 0.43.0. (#228)
Add Ranger21 optimizer document. (#214, #215)
Fix WSAM optimizer paper link. (#219)
Thanks to @sdbds, @i404788.
Support bitsandbytes optimizer. (#211)
pip3 install pytorch-optimizer[bitsandbytes]
Supported bitsandbytes optimizers: bnb_adagrad8bit, bnb_adam8bit, bnb_adamw8bit, bnb_lion8bit, bnb_lamb8bit, bnb_lars8bit, bnb_rmsprop8bit, bnb_sgd8bit.
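With the extra installed, the 8-bit variants should be usable like any other optimizer in the collection; a sketch assuming they can be resolved by the bnb_* names above via load_optimizer and that a CUDA device is available:

```python
import torch
from torch import nn

from pytorch_optimizer import load_optimizer  # assumed helper; see the project README

model = nn.Linear(1024, 1024).cuda()  # bitsandbytes 8-bit optimizers operate on CUDA tensors

# 'bnb_adamw8bit' is one of the names listed above; lookup-by-name is assumed here.
optimizer = load_optimizer('bnb_adamw8bit')(model.parameters(), lr=1e-4)

loss = model(torch.randn(4, 1024, device='cuda')).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```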
Build the docs with mkdocs and the material theme. (#204, #206)
Set Optimizer.rectify to False in the AdaBelief optimizer. (#203) (see the sketch below)
Add a DynamicLossScaler test case.
Thanks to @georg-wolflein.
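For the rectify change above, a sketch of what toggling the flag looks like; the keyword name is taken from the entry and assumed to be a plain constructor argument:

```python
from torch import nn

from pytorch_optimizer import AdaBelief

model = nn.Linear(10, 1)

# rectify controls the RAdam-style variance rectification; per the entry above it is
# set to False, but it can still be enabled explicitly (kwarg usage assumed).
optimizer = AdaBelief(model.parameters(), lr=1e-3, rectify=True)
```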
Hessian estimation in BaseOptimizer, supporting several sampling distributions (e.g. Gaussian, Rademacher). (#176, #177)
Support AdamD feature for AdaHessian optimizer. (#177) (see the sketch below)
Thanks to @i404788.
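For the AdaHessian items above, a usage sketch: the Hutchinson-style Hessian estimate (drawn from, e.g., Gaussian or Rademacher noise) needs the autograd graph kept alive during backward. The class name and calling pattern follow the usual AdaHessian convention and are assumptions here:

```python
import torch
from torch import nn

from pytorch_optimizer import AdaHessian  # class name assumed

model = nn.Linear(10, 1)
optimizer = AdaHessian(model.parameters(), lr=1e-1)

loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward(create_graph=True)  # keep the graph so Hessian-vector products can be estimated
optimizer.step()
optimizer.zero_grad(set_to_none=True)
```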
Rename amsbound, amsgrad terms into ams_bound. (#149) (see the sketch below)
Move pytorch_optimizer.experimental under pytorch_optimizer.*.experimental.
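After the rename, the AMSGrad/AMSBound-style bound is toggled through a single ams_bound flag; a sketch assuming AdaBound is one of the optimizers exposing it:

```python
from torch import nn

from pytorch_optimizer import AdaBound

model = nn.Linear(10, 1)

# The old amsbound / amsgrad keywords were unified into ams_bound (see the entry above);
# final_lr is AdaBound's target lr and is assumed here for illustration.
optimizer = AdaBound(model.parameters(), lr=1e-3, final_lr=0.1, ams_bound=True)
```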
Fix the update in the Lion optimizer. (#135)
Fix the momentum_buffer in the SGDP optimizer. (#139)