🐛 Describe the bug
I am aware of similar posts that already report this issue:
- ReduceLROnPlateau will fail when add new parameter group to the optimizer #20997
- Poor support of Optimizer.add_param_group #53712
- And a Lightning issue: How to properly configure lr schedulers when using fine tuning and DDP? Lightning-AI/pytorch-lightning#8727
But it seems that the error should be caught sooner to help the user understand what's going on.
Using ReduceLROnPlateau (class ReduceLROnPlateau in pytorch/torch/optim/lr_scheduler.py, line 913 at 044a8e3):
If we call Optimizer.add_param_group on the optimizer attached to an already-instantiated ReduceLROnPlateau, the method _reduce_lr(self, epoch), once called, will throw an IndexError: list index out of range at the line new_lr = max(old_lr * self.factor, self.min_lrs[i]), which tries to access an index of min_lrs that does not exist.
This typically happens during fine-tuning, when we unfreeze some layers of a network and add a new group of parameters to the optimizer attached to the scheduler.
The min_lrs attribute of the ReduceLROnPlateau instance was populated at construction time from the number of groups the optimizer had at that point (the length of param_groups), and it is not updated when a group is added afterwards.
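For illustration, here is a minimal sketch reproducing the failure (the modules and hyperparameters below are arbitrary placeholders, not the actual training setup):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Placeholder modules standing in for a frozen backbone and a trainable head.
backbone = torch.nn.Linear(4, 4)
head = torch.nn.Linear(4, 2)

optimizer = SGD(head.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=0)

# Later, e.g. when unfreezing the backbone during fine-tuning:
optimizer.add_param_group({"params": backbone.parameters(), "lr": 0.01})

# scheduler.min_lrs still has length 1 while optimizer.param_groups now has length 2.
scheduler.step(1.0)  # first metric only sets `best`
scheduler.step(1.0)  # no improvement -> _reduce_lr() -> IndexError: list index out of range
```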
To make it clearer that the problem comes from the fact that len(self.min_lrs) != len(optimizer.param_groups), we could notify the user with a ValueError raised from a property:
```python
class ReduceLROnPlateau:
    def __init__(self, optimizer, mode='min', factor=0.1, patience=10,
                 threshold=1e-4, threshold_mode='rel', cooldown=0,
                 min_lr=0, eps=1e-8, verbose=False):
        ...
        self.optimizer = optimizer
        self._min_lrs = None
        self.min_lrs = min_lr  # goes through the setter below

    @property
    def min_lrs(self):
        if len(self._min_lrs) != len(self.optimizer.param_groups):
            raise ValueError(
                "expected `min_lrs` length of {}, got {}. The number of elements in "
                "`min_lrs` must match the length of the {}'s `param_groups`. Set the "
                "`min_lrs` of the scheduler each time the optimizer's method "
                "add_param_group() is called.".format(
                    len(self.optimizer.param_groups), len(self._min_lrs),
                    self.optimizer.__class__))
        return self._min_lrs

    @min_lrs.setter
    def min_lrs(self, min_lrs):
        if isinstance(min_lrs, (list, tuple)):
            if len(min_lrs) != len(self.optimizer.param_groups):
                raise ValueError("expected {} min_lrs, got {}".format(
                    len(self.optimizer.param_groups), len(min_lrs)))
            self._min_lrs = list(min_lrs)
        else:
            self._min_lrs = [min_lrs] * len(self.optimizer.param_groups)
    ...
```

This is a general idea; it could even be handled by extending the list to match the length of param_groups instead of raising an error.
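In the meantime, a possible user-side workaround (just a sketch, and it relies on the scheduler's min_lrs attribute, which is internal rather than a documented API) is to extend min_lrs manually whenever a param group is added:

```python
# Workaround sketch: keep scheduler.min_lrs in sync with optimizer.param_groups.
optimizer.add_param_group({"params": backbone.parameters(), "lr": 0.01})

missing = len(optimizer.param_groups) - len(scheduler.min_lrs)
if missing > 0:
    # Reuse the last min_lr for the new group(s); adjust as needed.
    scheduler.min_lrs.extend([scheduler.min_lrs[-1]] * missing)
```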
Versions
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.2 (main, Feb 9 2023, 12:03:02) [Clang 14.0.0 (clang-1400.0.29.102)] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Pro
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.0
[pip3] pytorch-lightning==2.0.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.2
[conda] Could not collect