MultiStepLR does not return good lr after load_state_dict #29697

@vadimkantorov

Description

The param_groups' lr values cannot be trusted if the optimizer state is not restored (and skipping that restore can be a deliberate choice, because optimizer buffers can double the checkpoint size).
Yet in this line they are trusted whenever last_epoch falls between milestones: https://github.com/pytorch/pytorch/blob/master/torch/optim/lr_scheduler.py#L389
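Roughly what that step-based path does, as I read it (my paraphrase, not the exact library source): unless last_epoch lands exactly on a milestone, it just hands back whatever lr is already stored in the param_groups:

```python
# Paraphrase of the step-based path (a sketch, not the exact source):
# the current param_group lrs are trusted unless last_epoch hits a milestone.
def step_lrs(optimizer, milestones, gamma, last_epoch):
    if last_epoch not in milestones:
        return [group['lr'] for group in optimizer.param_groups]
    return [group['lr'] * gamma for group in optimizer.param_groups]
```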

The closed-form formula is correct, since it uses bisect_right to recompute the lr from scratch, but for some reason it is not called here.
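For comparison, the closed-form path recomputes everything from the base lrs, roughly like this (again a sketch under my reading of the source):

```python
from bisect import bisect_right

# Sketch of the closed-form computation: scale each base lr by gamma once per
# milestone that last_epoch has already passed, regardless of what the
# param_groups currently hold.
def closed_form_lrs(base_lrs, milestones, gamma, last_epoch):
    return [base_lr * gamma ** bisect_right(milestones, last_epoch)
            for base_lr in base_lrs]
```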

I'm not sure whether this is a problem with an assumed up-to-date lr in the optimizer's param_groups, a problem with the scheduler API, or something else...

My use case: I use MultiStepLR to drop the learning rate at certain iteration milestones. After loading a checkpoint, the iteration index is restored but the optimizer state is not, so I was relying on MultiStepLR to recompute the lr from the last_epoch field, which is well past all the milestones, meaning the lr should already have been dropped. A minimal sketch of the scenario is below.
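The numbers and the way last_epoch gets restored here are illustrative, not my actual training code:

```python
import torch

model = torch.nn.Linear(4, 4)

# Pretend this is the state saved before a restart, taken well past all
# milestones (illustrative values).
saved_opt = torch.optim.SGD(model.parameters(), lr=0.1)
saved_sched = torch.optim.lr_scheduler.MultiStepLR(
    saved_opt, milestones=[10, 20], gamma=0.1)
saved_sched.last_epoch = 25
sched_state = saved_sched.state_dict()

# After the restart: a fresh optimizer (optimizer state intentionally not
# restored), then the scheduler state is loaded back.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 20], gamma=0.1)
scheduler.load_state_dict(sched_state)

# Expected roughly 0.1 * 0.1 ** 2 == 0.001 (two milestones passed), but the
# param_group lr stays at 0.1 because nothing recomputes it from last_epoch.
print(optimizer.param_groups[0]['lr'])
```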

cc @vincentqb @ezyang

Metadata

Assignees

No one assigned

    Labels

    module: optimizer (Related to torch.optim), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
