Status: Open
Labels: module: optimizer (Related to torch.optim), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
Recently, Ilya Loshchilov and Frank Hutter published two papers.

The first, SGDR: Stochastic Gradient Descent with Warm Restarts, introduces a learning rate schedule that anneals the learning rate along a cosine curve and periodically restarts it at the start of each new training period. With only a few extra hyperparameters it improves considerably on state-of-the-art results. It has already been added to TensorFlow as tf.train.cosine_decay.
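For reference, a minimal sketch of the cosine schedule with warm restarts described in the paper (the function name `sgdr_lr` and the hyperparameter names `eta_max`, `eta_min`, `T_0`, `T_mult` are only illustrative, not a proposed PyTorch API):

```python
import math

def sgdr_lr(epoch, eta_max=0.1, eta_min=0.0, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts, following the
    schedule described in the SGDR paper. `epoch` may be fractional so the
    rate can also be annealed within an epoch."""
    # Walk forward through restart periods until `epoch` falls inside one.
    T_i, T_cur = T_0, float(epoch)
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))

# Hypothetical usage with a plain SGD optimizer:
# for epoch in range(100):
#     for group in optimizer.param_groups:
#         group["lr"] = sgdr_lr(epoch)
#     train_one_epoch(...)
```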
Their more recent work, Fixing Weight Decay Regularization in Adam, corrects a widespread misunderstanding about combining Adam with weight decay: adding an L2 penalty to the loss is not equivalent to decaying the weights, because the penalty's gradient gets rescaled by Adam's adaptive denominator. The proposed fix is to decouple the weight decay from the gradient-based update, and it has been acknowledged by the author of Adam.
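A minimal sketch of the distinction, assuming a hand-rolled single-tensor Adam step (`adam_step`, `state`, and the `decoupled` flag are illustrative names, not torch.optim internals):

```python
import torch

def adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=1e-2, decoupled=True):
    """One Adam update on parameter tensor `p` (modified in place).

    decoupled=False reproduces the common "Adam + L2" behaviour the paper
    criticises; decoupled=True decays the weights directly, as proposed.
    """
    beta1, beta2 = betas
    state["step"] += 1
    with torch.no_grad():
        if not decoupled:
            # L2 regularisation: the decay term passes through the moment
            # estimates and is rescaled by the adaptive denominator below.
            grad = grad + weight_decay * p

        exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

        bias_c1 = 1 - beta1 ** state["step"]
        bias_c2 = 1 - beta2 ** state["step"]
        denom = (exp_avg_sq / bias_c2).sqrt().add_(eps)

        if decoupled:
            # Decoupled weight decay: shrink the weights directly, so the
            # decay is not rescaled by `denom`.
            p.mul_(1 - lr * weight_decay)

        p.addcdiv_(exp_avg / bias_c1, denom, value=-lr)

# Hypothetical usage for a single parameter tensor after loss.backward():
# state = {"step": 0, "exp_avg": torch.zeros_like(p), "exp_avg_sq": torch.zeros_like(p)}
# adam_step(p, p.grad, state)
```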
I think PyTorch should add both of these features as well.
cc @vincentqb