Description
In the optimizer's param_groups, the order of the 'params' entries (i.e. the order in which the parameters were passed to the optimizer's __init__) matters. The relevant snippet in load_state_dict is:
id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}
state = {id_map.get(k, k): v for k, v in state_dict['state'].items()}
If the parameters are passed to the optimizer in a different order when loading, this pairing silently maps each parameter to the wrong saved state.
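To see why, here is a minimal sketch of the pairing with made-up ids and placeholder parameter objects; it is purely positional:

from itertools import chain

# Placeholder data: ids stored in the checkpoint vs. current Parameter objects.
saved_groups = [{'params': [101, 102]}]
groups = [{'params': ['p_a', 'p_b']}]

id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}
print(id_map)  # {101: 'p_a', 102: 'p_b'} -- reverse one list and the mapping swaps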
Consider this model (optimized with, say, Adam):

import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.p1 = nn.Linear(2, 3, bias=False)
        self.p2 = nn.Linear(3, 4, bias=False)
After saving the optimizer's state, if the order in which the parameters are defined in the model changes, i.e. if I change the class to

self.p2 = nn.Linear(3, 4, bias=False)
self.p1 = nn.Linear(2, 3, bias=False)

then the loaded optimizer's state for p1 is mapped to p2 and vice versa. I tried this and it does happen, which is wrong, and training cannot proceed (step() will, rightly so, raise an error).
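For completeness, a self-contained reproduction sketch (the layer sizes follow the model above; the single training step and the printed shapes are my own additions for illustration):

import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super(ModelA, self).__init__()
        self.p1 = nn.Linear(2, 3, bias=False)
        self.p2 = nn.Linear(3, 4, bias=False)

class ModelB(nn.Module):
    # Same two layers, declared in the opposite order.
    def __init__(self):
        super(ModelB, self).__init__()
        self.p2 = nn.Linear(3, 4, bias=False)
        self.p1 = nn.Linear(2, 3, bias=False)

model_a = ModelA()
opt_a = torch.optim.Adam(model_a.parameters())
model_a.p2(model_a.p1(torch.randn(5, 2))).sum().backward()
opt_a.step()                 # populates exp_avg / exp_avg_sq for each weight
ckpt = opt_a.state_dict()

model_b = ModelB()
opt_b = torch.optim.Adam(model_b.parameters())
opt_b.load_state_dict(ckpt)  # loads without complaint, but the states are swapped

for p, s in opt_b.state.items():
    print(p.shape, s['exp_avg'].shape)  # shapes disagree: p1's buffers sit on p2 and vice versa

A subsequent opt_b.step() (after a backward pass) then fails with a size mismatch, as described above.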
The nn.Module class is robust to this because its state_dict is keyed by parameter names rather than by position.
IMO the optimizer should also use parameter names instead of ids, rather than relying on the order in which the parameters were supplied at construction time.
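As a rough sketch of what a name-keyed scheme could look like (save_opt_state_by_name and load_opt_state_by_name are hypothetical helpers, not an existing PyTorch API, and param-group hyperparameters are ignored for brevity):

def save_opt_state_by_name(optimizer, model):
    # Key each parameter's optimizer state by its name in the model
    # instead of by its id/position (hypothetical helper).
    param_to_name = {p: name for name, p in model.named_parameters()}
    return {param_to_name[p]: s for p, s in optimizer.state.items()}

def load_opt_state_by_name(optimizer, model, named_state):
    # Restore per-parameter state by matching names, so the order in which
    # parameters were handed to the optimizer no longer matters.
    name_to_param = dict(model.named_parameters())
    for name, s in named_state.items():
        optimizer.state[name_to_param[name]] = s

With something like this, swapping p1 and p2 in the class definition would no longer swap their exp_avg/exp_avg_sq buffers, mirroring how nn.Module.load_state_dict already matches tensors by name.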
Corresponding PyTorch-Discuss post
cc @vincentqb