Hi, I downloaded the checkpoint file linked in the README and tried to train it for one more epoch to observe how it behaves. However, calling optimizer.load_state_dict(ckpt["optimizer"]) raises an error saying that the loaded state dict has a different number of parameter groups. Is there some difference between the code in this repository and the code you used to produce the checkpoint?
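
For reference, here is a minimal sketch of how the mismatch could be inspected. It assumes both state dicts follow the usual PyTorch optimizer layout, with a "param_groups" list whose entries each hold a "params" list. The helper names `summarize_param_groups` and `compare_param_groups` are hypothetical, and the two dicts at the bottom are made-up stand-ins for optimizer.state_dict() and ckpt["optimizer"], not values from the actual checkpoint.

```python
def summarize_param_groups(state_dict):
    """Return the number of parameters in each group of an optimizer state dict."""
    return [len(group["params"]) for group in state_dict["param_groups"]]


def compare_param_groups(current, loaded):
    """Compare the group layout of the live optimizer with the checkpointed one."""
    current_sizes = summarize_param_groups(current)
    loaded_sizes = summarize_param_groups(loaded)
    return {
        "current_groups": len(current_sizes),
        "loaded_groups": len(loaded_sizes),
        "current_sizes": current_sizes,
        "loaded_sizes": loaded_sizes,
        # Same number of groups and same size per group
        "match": current_sizes == loaded_sizes,
    }


# Stand-in dicts (assumption): in practice these would be
# optimizer.state_dict() and ckpt["optimizer"].
current_state = {"state": {}, "param_groups": [{"lr": 0.1, "params": [0, 1, 2, 3]}]}
loaded_state = {
    "state": {},
    "param_groups": [
        {"lr": 0.1, "params": [0, 1]},
        {"lr": 0.01, "params": [2, 3]},
    ],
}

report = compare_param_groups(current_state, loaded_state)
print(report)
```

If the group counts differ, as in this made-up case, the optimizer in the current code was probably constructed with a different grouping of parameters than the one that was saved.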