🚀 Feature
Weight decay is used very often. A common strategy is to rely implicitly on the learning rate scheduler to do so, or to simply shrink the weights at the end of each iteration by a constant multiplicative factor. However, one could want to use strategies different from this. In that case, we could have weight decay schedulers that modify the weights using a syntax and grammar similar to the learning rate schedulers already available.
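As a minimal sketch of the proposed usage (the class name MultiplicativeWDScheduler is an assumption made here for illustration, not an existing torch.optim API), such a scheduler could be stepped alongside the optimizer, exactly like the existing lr schedulers:

```python
import torch
from torch import nn

# Hypothetical weight decay scheduler mirroring the lr_scheduler API.
# It rescales the weight_decay of each param group on every step() call.
class MultiplicativeWDScheduler:
    def __init__(self, optimizer, factor):
        self.optimizer = optimizer
        self.factor = factor

    def step(self):
        for group in self.optimizer.param_groups:
            group["weight_decay"] *= self.factor

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-2)
wd_scheduler = MultiplicativeWDScheduler(optimizer, factor=0.95)

for epoch in range(3):
    # ... per-batch forward/backward and optimizer.step() would go here ...
    wd_scheduler.step()  # shrink the weight decay coefficient once per epoch
```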
Motivation
The AdamW code from #21250 does not include a weight decay scheduler, as mentioned here.
There are also currently a few issues and pull requests about weight decay schedulers; see below.
Alternatives
- Decoupled Weight Decay Regularization in optimizers (added adamw and sgdw among others) #4429 suggests modifying the optimizer logic to accept a new parameter weight_decay specifying the constant multiplicative factor to use.
- Decoupled Weight Decay Regularization #3740, Implement AdamW optimizer #21250, and adding AdamW optimizer #22163 introduce variations on Adam and other optimizers with a corresponding built-in weight decay. Add SGDR, SGDW, AdamW and AdamWR #3790 requests that some of these be supported.
- We could instead have a new "weight_decay_type" option on those optimizers to switch between common strategies, as sketched below.
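To make the last alternative concrete, here is a rough sketch of the two strategies such an option could switch between (the option name weight_decay_type and the helper apply_weight_decay are assumptions from this proposal, not existing PyTorch APIs):

```python
import torch

@torch.no_grad()
def apply_weight_decay(param, lr, wd, weight_decay_type="decoupled"):
    # "l2": fold wd * param into the gradient before the update (coupled, classic L2 penalty).
    # "decoupled": shrink the weights by a constant factor, independent of the gradient (AdamW-style).
    if weight_decay_type == "l2":
        param.grad.add_(param, alpha=wd)
    elif weight_decay_type == "decoupled":
        param.mul_(1 - lr * wd)

p = torch.randn(3, requires_grad=True)
p.grad = torch.zeros_like(p)
apply_weight_decay(p, lr=0.1, wd=1e-2, weight_decay_type="decoupled")
```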