
Conversation

@n-poulsen
Contributor

This pull request addresses how learning rate schedulers are handled when resuming training. Currently, schedulers are simply rebuilt from the config file, so when continuing to train from an existing snapshot, the scheduler restarts at epoch 0 and the learning rate is not adapted (as mentioned in #2784).
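For illustration, a minimal sketch with a plain PyTorch StepLR (not DeepLabCut's actual scheduler setup) of what happens when the scheduler is rebuilt from the config on resume:

```python
import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# First run: the learning rate decays twice over 20 epochs.
for _ in range(20):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # ~0.001 after two decays

# Resuming by rebuilding the scheduler from the config: it restarts at epoch 0,
# so training continues from the base learning rate instead of the decayed one.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
print(scheduler.get_last_lr())  # [0.1]
```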

In this pull request, the code is updated to:

  • save the scheduler state dicts in snapshots
  • when resuming training, try to load the state dict for the scheduler
    • if successful, set the optimizer's learning rate to the last learning rate from the scheduler
  • a load_scheduler_state_dict key is added to the runner configuration
    • the default is True: loading a snapshot with a saved scheduler to continue training will load the scheduler state dict
    • however, users might edit the scheduler's parameters and want the updated schedule to be used when continuing training
    • in that case, they need to set load_scheduler_state_dict: false so the state dict doesn't overwrite their edited parameters
    • the self.starting_epoch value is used to set last_epoch in the scheduler, so the learning rate schedule matches the epochs printed in the logs (a minimal sketch of this flow is shown after the list)
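
A minimal sketch of that flow in plain PyTorch (the snapshot layout and the save_snapshot / resume_from_snapshot helper names are illustrative, not the actual DeepLabCut runner code):

```python
import torch


def save_snapshot(path, model, optimizer, scheduler, epoch):
    """Persist model, optimizer and scheduler state so training can resume."""
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),  # the new piece: scheduler state is saved too
        },
        path,
    )


def resume_from_snapshot(path, model, optimizer, scheduler, load_scheduler_state_dict=True):
    """Restore training state; optionally skip the scheduler state dict."""
    snapshot = torch.load(path)
    model.load_state_dict(snapshot["model"])
    optimizer.load_state_dict(snapshot["optimizer"])

    starting_epoch = snapshot["epoch"]
    if load_scheduler_state_dict and "scheduler" in snapshot:
        scheduler.load_state_dict(snapshot["scheduler"])
        # Set the optimizer's learning rate to the scheduler's last learning rate,
        # so the first resumed epoch continues where the previous run stopped.
        for group, lr in zip(optimizer.param_groups, scheduler.get_last_lr()):
            group["lr"] = lr
    else:
        # load_scheduler_state_dict: false -> keep the (possibly edited) scheduler
        # parameters, but align the schedule with the epochs printed in the logs.
        scheduler.last_epoch = starting_epoch
    return starting_epoch
```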

Tests were added to validate that the expected learning rates are applied.
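
A hypothetical check in that spirit, reusing the save_snapshot / resume_from_snapshot sketch above (not the PR's actual test code): the learning rate after resuming should match that of an uninterrupted run at the same epoch.

```python
import torch
from torch.optim.lr_scheduler import StepLR


def test_resumed_lr_matches_uninterrupted_run(tmp_path):
    def make_run():
        model = torch.nn.Linear(4, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        return model, optimizer, StepLR(optimizer, step_size=10, gamma=0.1)

    # Uninterrupted reference run: 15 epochs.
    _, _, ref_scheduler = make_run()
    for _ in range(15):
        ref_scheduler.step()

    # Interrupted run: 10 epochs, snapshot, then resume for 5 more epochs.
    model, optimizer, scheduler = make_run()
    for _ in range(10):
        scheduler.step()
    save_snapshot(tmp_path / "snap.pt", model, optimizer, scheduler, epoch=10)

    model, optimizer, scheduler = make_run()
    resume_from_snapshot(tmp_path / "snap.pt", model, optimizer, scheduler)
    for _ in range(5):
        scheduler.step()

    assert scheduler.get_last_lr() == ref_scheduler.get_last_lr()
```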

@n-poulsen added the enhancement (New feature or request) and DLC3.0🔥 labels on Nov 14, 2024
@n-poulsen merged commit b9a80bd into pytorch_dlc on Nov 21, 2024
@n-poulsen deleted the niels/save_scheduler_state_dict branch on November 21, 2024 at 16:09
xiu-cs pushed a commit to xiu-cs/DeepLabCut that referenced this pull request on Nov 25, 2024