[BC-breaking] Feature/#2465 lr scheduler attach events #2496
vfdev-5 merged 19 commits into pytorch:master
Conversation
ignite/handlers/param_scheduler.py
Outdated
```diff
  milestones_values.append((warmup_duration, init_lr))

- lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history)
+ lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history, use_legacy=True)
```
Can we check whether we could use `use_legacy=False`? Or what is blocking here?
If `use_legacy=False`, the result looks like the examples below.

- example1

```python
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers.param_scheduler import create_lr_scheduler_with_warmup


def dummy_update(engine, batch):
    pass


tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
torch_lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)
trainer = Engine(dummy_update)
scheduler = create_lr_scheduler_with_warmup(
    torch_lr_scheduler,
    warmup_start_value=0.0,
    warmup_end_value=0.1,
    warmup_duration=3,
)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)


@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])


_ = trainer.run([0] * 8, max_epochs=1)
```

- output1

```
0.0
0.05
0.1
0.01
0.01
0.01
0.01
0.001
```
- example2

```python
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers.param_scheduler import create_lr_scheduler_with_warmup


def dummy_update(engine, batch):
    pass


tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.1)
torch_lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)
trainer = Engine(dummy_update)
scheduler = create_lr_scheduler_with_warmup(
    torch_lr_scheduler,
    warmup_start_value=0.0,
    warmup_end_value=0.1,
    warmup_duration=3,
)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)


@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])


_ = trainer.run([0] * 8, max_epochs=1)
```

- output2

```
0.0
0.05
0.1
0.1
0.1
0.1
0.010000000000000002
0.010000000000000002
```
The initial value of StepLR (0.1 or 0.01) is repeated one extra time beyond step_size (= 3).
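The off-by-one can be modeled without ignite or torch at all. Below is a toy stand-in for StepLR (a hypothetical `ToyStepLR`, not the real torch class) showing that a wrapper which applies the scheduler's initial lr once on its own, before delegating, makes the initial value appear step_size + 1 times:

```python
class ToyStepLR:
    """Toy stand-in for torch.optim.lr_scheduler.StepLR:
    decays base_lr by gamma every step_size steps."""

    def __init__(self, base_lr, step_size, gamma):
        self.base_lr = base_lr
        self.step_size = step_size
        self.gamma = gamma
        self.last_epoch = 0

    def get_lr(self):
        return self.base_lr * self.gamma ** (self.last_epoch // self.step_size)

    def step(self):
        self.last_epoch += 1


sched = ToyStepLR(base_lr=0.01, step_size=3, gamma=0.1)

values = [sched.get_lr()]  # the wrapper applies the initial lr once on its own
for _ in range(4):         # then the wrapped scheduler runs normally
    values.append(sched.get_lr())
    sched.step()

print(values[:4])  # → [0.01, 0.01, 0.01, 0.01]: four 0.01s instead of three
```

The fifth value is the first decayed lr (approximately 0.001), matching the pattern seen in output1 above.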
Thanks for the details @yuta0821, can you please debug this a bit more and say explicitly which scheduler is responsible for adding the extra LR value (0.1 or 0.01)? I'm not quite sure I understand why exactly this happens. Thanks!
@vfdev-5 Thanks a lot for your comment!
Consider the case where warmup_start_value=0.0, warmup_end_value=0.1, warmup_duration=3.
In this case, `milestones_values = [(0, 0.0), (2, 0.1)]`.
If the initial value of lr in the optimizer is different from warmup_end_value, it is necessary to append the initial value to the end of the milestones. Therefore, `milestones_values = [(0, 0.0), (2, 0.1), (3, initial value of lr)]`.
This is because the LRScheduler updates the lr starting from the last value of milestones_values.
After that, the following code is executed, resulting in the initial value of lr being repeated:
```python
super(LRScheduler, self).__call__(engine, name)
self.lr_scheduler.last_epoch += 1  # type: ignore[attr-defined]
```

Even if the initial value of lr in the optimizer is equal to warmup_end_value, the initial value of lr will still be applied one extra time.
In the end, since the first `__call__` of LRScheduler runs with reference to the last value of milestones_values, the last value of milestones_values and the initial value of LRScheduler are duplicated.
If we fix this bug without falling back to use_legacy=True, we may have to change a lot of code, such as the code related to PiecewiseLinear, which is beyond the scope of this PR.
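The milestone construction described here can be sketched as a standalone helper (a simplified illustration with a hypothetical `build_warmup_milestones` function, not ignite's actual code):

```python
def build_warmup_milestones(warmup_start_value, warmup_end_value, warmup_duration, init_lr):
    # Linear warmup from warmup_start_value at event 0
    # to warmup_end_value at event warmup_duration - 1.
    milestones_values = [(0, warmup_start_value), (warmup_duration - 1, warmup_end_value)]
    # If the optimizer's initial lr differs from the warmup end value,
    # append it so the wrapped scheduler takes over from its own initial lr.
    if init_lr != warmup_end_value:
        milestones_values.append((warmup_duration, init_lr))
    return milestones_values


print(build_warmup_milestones(0.0, 0.1, 3, 0.01))
# → [(0, 0.0), (2, 0.1), (3, 0.01)]
```

With init_lr equal to warmup_end_value the extra milestone is not appended, yet (as noted above) the wrapped scheduler still re-applies its initial lr once.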
@yuta0821 sorry for the delay and thanks for the explanation! I haven't checked it in detail but will do so as soon as possible on my side (~4-5 days).
@sdesrozis can you help with that if you have some bandwidth?
It performs the same operation as use_legacy, but wouldn't it be preferable to add the argument skip_initial_value as a variable to be used by the internal function?
Yeah, expected rather than excepted 😅
As far as I understand, using use_legacy=False, the first lr comes from the optimizer. Whatever schedulers are used, the scheduler concatenation will produce a repetition at each joint.
Having an internal option as you suggested sounds good to me. I mean, renaming use_legacy to skip_initial_value is fine. Although, we have to keep use_legacy for the users.
What do you think?
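The "repetition at each joint" point can be illustrated with plain lists (toy numbers taken from the warmup examples earlier in this thread; no ignite code involved):

```python
# Each phase's lr sequence ends on the value the next phase starts from,
# so a naive concatenation duplicates the lr at every joint.
warmup_phase = [0.0, 0.05, 0.1]      # warmup ends at 0.1
step_phase = [0.1, 0.1, 0.1, 0.01]   # StepLR re-applies 0.1 before decaying

combined = warmup_phase + step_phase
print(combined)
# → [0.0, 0.05, 0.1, 0.1, 0.1, 0.1, 0.01]: 0.1 appears four times, not three
```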
@sdesrozis
Sorry for having to consult with you on this.
It seems that this problem can be solved by setting the internal variable keep_first_lr=True only when create_lr_scheduler_with_warmup is called, as shown below, to store the initial value in LRScheduler.
```python
class LRScheduler(ParamScheduler):
    def __init__(
        self,
        lr_scheduler: _LRScheduler,
        save_history: bool = False,
        use_legacy: bool = False,
        keep_first_lr: bool = False,
    ):
        ...
        if keep_first_lr:
            self.lr_scheduler._get_lr_called_within_step = True  # type: ignore[attr-defined]
            self.first_lr = self.lr_scheduler.get_lr()
            self.lr_scheduler._get_lr_called_within_step = False  # type: ignore[attr-defined]

    def get_param(self) -> Union[float, List[float]]:
        """Method to get current optimizer's parameter value"""
        # Emulate context manager for pytorch>=1.4
        if hasattr(self, "first_lr"):
            lr_list = self.first_lr
            del self.first_lr
        else:
            ...


def create_lr_scheduler_with_warmup(...):
    ...
    if isinstance(lr_scheduler, _LRScheduler):
        init_lr = param_group["lr"]
        lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history, keep_first_lr=True)
    else:
        init_lr = lr_scheduler.get_param()
```

I am running the existing tests now. I will commit once all tests pass! -> Done!
It seems that the tests are ko...
Sorry, I added `type: ignore[attr-defined]`!
@yuta0821 Thanks very much for this work. I have to think about the API though.

@sdesrozis Yes, I agree, I don't want to add more args without any thought. However, I haven't yet figured out how to control the …
My thoughts are that the target would be having the following code working fine:

```python
from torch.optim.lr_scheduler import StepLR

scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(scheduler1)
scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(scheduler2)
scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4])
```

Whether this requires

```python
scheduler2 = LRScheduler(scheduler2)
scheduler2.keep_first_lr = True
```

I don't know yet... Extra question: what happens if …
@sdesrozis I'd like to confirm the behavior of the following code:

```python
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers.param_scheduler import ConcatScheduler, LRScheduler

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
torch_scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(torch_scheduler1, use_legacy=True)
torch_scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(torch_scheduler2, use_legacy=True)
scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4])


def dummy_update(engine, batch):
    pass


trainer = Engine(dummy_update)


@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])


trainer.add_event_handler(Events.ITERATION_COMPLETED, scheduler)
_ = trainer.run([0] * 9, max_epochs=1)
```
OK, I understand what the message means. Thank you for the clear explanation. Surely, we must enable …
@yuta0821 the output looks correct to me. What happens if using `use_legacy=False`?
@vfdev-5
If `use_legacy=False` is set, the code becomes the following:

```python
import torch
from torch.optim.lr_scheduler import StepLR

from ignite.engine import Engine, Events
from ignite.handlers.param_scheduler import ConcatScheduler, LRScheduler

tensor = torch.zeros([1], requires_grad=True)
optimizer = torch.optim.SGD([tensor], lr=0.01)
torch_scheduler1 = StepLR(optimizer, step_size=3, gamma=0.1)
scheduler1 = LRScheduler(torch_scheduler1, use_legacy=False)
torch_scheduler2 = StepLR(optimizer, step_size=3, gamma=0.01)
scheduler2 = LRScheduler(torch_scheduler2, use_legacy=False)
scheduler = ConcatScheduler(schedulers=[scheduler1, scheduler2], durations=[4])


def dummy_update(engine, batch):
    pass


trainer = Engine(dummy_update)
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)


@trainer.on(Events.ITERATION_COMPLETED)
def print_lr():
    print(optimizer.param_groups[0]["lr"])


_ = trainer.run([0] * 9, max_epochs=1)
```
@yuta0821 sorry for the delay on this PR, I'm still missing a complete understanding of the problem here. I'll try to figure out what happens exactly on my side and try to suggest a way to handle it. Adding an argument for internal usage is not a good design.
No, sir, not at all. Thanks a lot for your help. If I can contribute in any way, I'll be happy to do so.
@yuta0821 I understand the problem and why you need to introduce `keep_first_lr`. I see here two ways to move forward; one of them is:

```python
lr_scheduler = LRScheduler(lr_scheduler, save_history=save_history)
lr_scheduler.lr_scheduler.last_epoch += 1
```
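The effect of bumping `last_epoch` can be sketched with a toy scheduler (a hypothetical `ToyStepLR` mimicking StepLR's decay rule; not ignite or torch code):

```python
class ToyStepLR:
    """Toy StepLR: lr = base_lr * gamma ** (last_epoch // step_size)."""

    def __init__(self, base_lr, step_size, gamma):
        self.base_lr = base_lr
        self.step_size = step_size
        self.gamma = gamma
        self.last_epoch = 0

    def get_lr(self):
        return self.base_lr * self.gamma ** (self.last_epoch // self.step_size)


sched = ToyStepLR(base_lr=0.01, step_size=3, gamma=0.1)
sched.last_epoch += 1  # pre-bump: skip the value the warmup phase already emitted

lrs = []
for _ in range(3):
    lrs.append(sched.get_lr())
    sched.last_epoch += 1

print(lrs)  # 0.01 now appears only twice before the first decay
```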
Thank you for your analysis. Your summary is exactly what I was thinking.
@yuta0821 I'm trying to figure out how to merge your work into master step by step.
@vfdev-5 Thank you for your reply!
@yuta0821 I pushed a few commits to get this PR merged. Basically, it is one of your suggestions. Currently, I do not have a better way to fix it. Let's keep it as it is.
Thank you for your review.
I understand. If I come up with a better way to fix it, I will create a new issue.
Fixes #2465
Description:
Based on the discussion in #2465, I updated the code so that the LRScheduler will attach to Events.STARTED.
Main Contribution
- Added `use_legacy` to class `LRScheduler` and an explanation of it
- Changed the `__call__` method and `simulate_values`, such as changing the timing of the scheduler's `__call__` method
- Checked that `use_legacy` works

Matters for consultation
Currently the following part is implemented using `use_legacy=True`, but I feel it is not a very smart implementation. However, in the current implementation, we need to set the initial learning rate at the end of each milestone, and the change for issue #2465 will probably result in one extra initial learning rate. I couldn't think of an easy way to fix this. I think that if we don't take `use_legacy=True`, we probably need to change the code in many places.

I ran the following tests and they pass.
Check list: