Load scheduler state when resuming training #2788
Merged
This pull request addresses learning rate schedulers when resuming training. Currently, schedulers are simply rebuilt from the config file, so when training continues from an existing snapshot the scheduler restarts at epoch 0 and the learning rate is not adapted (as mentioned in #2784).
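For illustration, a minimal plain-PyTorch sketch of that behaviour is below; the `StepLR` settings are made up for the example and are not this repository's config. Rebuilding the optimizer and scheduler from the config alone forgets the completed epochs and reports the initial learning rate again:

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Train for 7 epochs before the snapshot is written.
for _ in range(7):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # ~[0.01]: the rate after one decay at epoch 5

# Resuming by rebuilding optimizer and scheduler from the config alone:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
print(scheduler.get_last_lr())  # [0.1]: the schedule has restarted at epoch 0
```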
In this pull request, the code is updated to:
- A `load_scheduler_state_dict` key is added to the runner configuration. When it is `True`, loading a snapshot with a saved scheduler to continue training will load the scheduler state dict.
- Users who edit the scheduler parameters before resuming can set `load_scheduler_state_dict: false` so the saved state dict doesn't overwrite their edited parameters.
- The `self.starting_epoch` value is used to set the `last_epoch` in the scheduler, so the way the learning rate is scheduled matches the epochs printed in the logs.
- Tests were added to validate that the expected learning rates are applied.
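A rough sketch of what this resume logic can look like with plain PyTorch is below. The helper name `build_scheduler`, the `cfg`/`snapshot` dictionaries, the `LambdaLR` choice, and the epoch arithmetic are illustrative assumptions, not the repository's actual runner code:

```python
import torch

def build_scheduler(optimizer, cfg, snapshot=None, starting_epoch=0):
    # Hypothetical helper, not the repository's runner API.
    def decay(epoch):
        return cfg["gamma"] ** (epoch // cfg["step_size"])

    if snapshot is not None and cfg.get("load_scheduler_state_dict", True):
        # Default path: restore the scheduler exactly as it was saved.
        scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, decay)
        scheduler.load_state_dict(snapshot["scheduler"])
        return scheduler

    # `load_scheduler_state_dict: false`: keep the (possibly edited) config
    # parameters but align the schedule with the resumed epoch so it matches
    # the epoch numbers printed in the logs. The exact off-by-one convention
    # depends on how the runner counts epochs.
    for group in optimizer.param_groups:
        group.setdefault("initial_lr", group["lr"])  # needed when last_epoch != -1
    return torch.optim.lr_scheduler.LambdaLR(
        optimizer, decay, last_epoch=starting_epoch - 1
    )

# Example usage with an assumed snapshot layout, for illustration only:
# snapshot = torch.load("snapshot.pt")
# scheduler = build_scheduler(optimizer, cfg, snapshot, snapshot["epoch"])
```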