🐛 Describe the bug
I am aware of similar posts that already report this issue:
- ReduceLROnPlateau will fail when add new parameter group to the optimizer #20997
- Poor support of Optimizer.add_param_group #53712
- And a Lightning issue: How to properly configure lr schedulers when using fine tuning and DDP? Lightning-AI/pytorch-lightning#8727
But it seems that the error should be caught sooner to help the user understand what's going on.
Using ReduceLROnPlateau (class ReduceLROnPlateau in pytorch/torch/optim/lr_scheduler.py, line 913 at 044a8e3):
If we call Optimizer.add_param_group on the optimizer attached to an already-instantiated ReduceLROnPlateau, the method _reduce_lr(self, epoch), once called, will throw an IndexError: list index out of range at the line new_lr = max(old_lr * self.factor, self.min_lrs[i]), which tries to access an index of min_lrs that does not exist.
This typically happens during fine-tuning, when we unfreeze some layers of a network and add a new group of parameters to the optimizer attached to the scheduler.
The min_lrs attribute of the ReduceLROnPlateau instance was populated at construction time from the number of groups the optimizer had at that point (the length of param_groups), and it is not updated when a group is added afterwards.
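For illustration, here is a minimal sketch reproducing the failure (the modules and hyperparameters below are arbitrary placeholders, not the actual training setup):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Placeholder modules standing in for a frozen backbone and a trainable head.
backbone = torch.nn.Linear(4, 4)
head = torch.nn.Linear(4, 2)

optimizer = SGD(head.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=0)

# Later, e.g. when unfreezing the backbone during fine-tuning:
optimizer.add_param_group({"params": backbone.parameters(), "lr": 0.01})

# scheduler.min_lrs still has length 1 while optimizer.param_groups now has length 2.
scheduler.step(1.0)  # first metric only sets `best`
scheduler.step(1.0)  # no improvement -> _reduce_lr() -> IndexError: list index out of range
```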
To make it clearer that the problem comes from the fact that len(self.min_lrs) != len(optimizer.param_groups), we could notify the user with a ValueError raised from a property:
```python
class ReduceLROnPlateau:
    def __init__(self, optimizer, mode='min', factor=0.1, patience=10,
                 threshold=1e-4, threshold_mode='rel', cooldown=0,
                 min_lr=0, eps=1e-8, verbose=False):
        ...
        self.optimizer = optimizer
        self._min_lrs = None
        self.min_lrs = min_lr  # goes through the setter below

    @property
    def min_lrs(self):
        if len(self._min_lrs) != len(self.optimizer.param_groups):
            raise ValueError(
                "expected `min_lrs` length of {}, got {}. The number of elements in "
                "`min_lrs` must match the length of the {}'s `param_groups`. Set the "
                "`min_lrs` of the scheduler each time the optimizer's method "
                "add_param_group() is called.".format(
                    len(self.optimizer.param_groups), len(self._min_lrs),
                    self.optimizer.__class__))
        return self._min_lrs

    @min_lrs.setter
    def min_lrs(self, min_lrs):
        if isinstance(min_lrs, (list, tuple)):
            if len(min_lrs) != len(self.optimizer.param_groups):
                raise ValueError("expected {} min_lrs, got {}".format(
                    len(self.optimizer.param_groups), len(min_lrs)))
            self._min_lrs = list(min_lrs)
        else:
            self._min_lrs = [min_lrs] * len(self.optimizer.param_groups)
    ...
```

This is a general idea; it could even be handled by extending the list to match the length of param_groups instead of raising an error.
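In the meantime, a possible user-side workaround (just a sketch, and it relies on the scheduler's min_lrs attribute, which is internal rather than a documented API) is to extend min_lrs manually whenever a param group is added:

```python
# Workaround sketch: keep scheduler.min_lrs in sync with optimizer.param_groups.
optimizer.add_param_group({"params": backbone.parameters(), "lr": 0.01})

missing = len(optimizer.param_groups) - len(scheduler.min_lrs)
if missing > 0:
    # Reuse the last min_lr for the new group(s); adjust as needed.
    scheduler.min_lrs.extend([scheduler.min_lrs[-1]] * missing)
```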
Versions
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.2 (main, Feb 9 2023, 12:03:02) [Clang 14.0.0 (clang-1400.0.29.102)] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Pro
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.0
[pip3] pytorch-lightning==2.0.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.2
[conda] Could not collect