Add SGDR, SGDW, AdamW and AdamWR #3790

@onlytailei

Description

Recently, Ilya Loshchilov and Frank Hutter published two papers that are relevant here.
SGDR: Stochastic Gradient Descent with Warm Restarts introduces a learning rate schedule that anneals the rate with a cosine curve within each training period and then restarts it; with only a few hyperparameters it improves considerably on state-of-the-art results. It has already been added to TensorFlow as tf.train.cosine_decay (a sketch of the schedule is included below).
Their more recent work, Fixing Weight Decay Regularization in Adam, corrects a widespread misunderstanding about using Adam together with weight decay: adding an L2 term to the gradient is not equivalent to weight decay for adaptive methods, so the decay should be decoupled from the gradient-based update (a sketch of the decoupled update is also included below). The fix has been endorsed by the author of Adam.
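
For the first paper, here is a minimal sketch of the SGDR schedule (cosine annealing with warm restarts); the function name sgdr_lr and its defaults are purely illustrative, not a proposed API:

```python
import math

def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    T_0 is the length of the first period; each later period is T_mult
    times longer. Names and defaults here are illustrative only.
    """
    # Locate the current period and the position inside it.
    T_i, T_cur = T_0, epoch
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    # Cosine decay from eta_max down to eta_min within the current period.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))

# With T_0=10 and T_mult=2, the rate jumps back to eta_max at epochs 0, 10, 30, 70, ...
for epoch in range(0, 31, 5):
    print(epoch, round(sgdr_lr(epoch), 4))
```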

I think PyTorch should add these features as well.
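
In particular, the decoupled weight decay change is small. Here is a rough sketch of a single AdamW-style update on one parameter tensor, using in-place tensor ops; the function name adamw_step and its state layout are assumptions for illustration, not the eventual torch.optim interface:

```python
import torch

def adamw_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """One Adam update with decoupled weight decay (the AdamW idea).

    The key difference from "Adam + L2": the decay term is not folded into
    the gradient (so it is not rescaled by the adaptive denominator); it is
    applied directly to the weights.
    """
    beta1, beta2 = betas
    state['step'] += 1
    step = state['step']
    exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']

    # Standard Adam moment estimates on the raw gradient (no L2 term added).
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt().add_(eps)

    # Adam step ...
    p.addcdiv_(exp_avg / bias1, denom, value=-lr)
    # ... then weight decay applied directly to the weights (the decoupling).
    p.mul_(1 - lr * weight_decay)

# Usage sketch: state = {'step': 0, 'exp_avg': torch.zeros_like(p), 'exp_avg_sq': torch.zeros_like(p)}
```

Combining this update rule with the SGDR schedule above gives the AdamWR variant from the paper; applying the same decoupling to plain SGD gives SGDW.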

cc @vincentqb

Metadata

    Labels

    module: optimizer (Related to torch.optim)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
