remove hardcoded seeds from trainer #424
Conversation
Thanks for flagging this and proposing a fix @tstadel! The reason why we set the seeds there was to gain full reproducibility of training with checkpointing vs. without. I agree that they shouldn't be hardcoded to a specific value (=39), but do you see a particular downside to having them there? We could, for example, add a "seed" arg to the Trainer() and use this value instead.
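The proposed "seed" arg could look roughly like this. A minimal sketch only: the Trainer signature, attribute name, and default value here are illustrative, not the project's actual API, and a real implementation would seed numpy and torch as well.

```python
import random


class Trainer:
    """Illustrative stub; the real Trainer takes many more arguments."""

    def __init__(self, seed=42):
        # Seed once at construction instead of hardcoding 39 inside train().
        # A real implementation would also seed numpy and torch here.
        self.seed = seed
        random.seed(seed)
```

Seeding once at construction keeps runs reproducible while letting users vary the seed per experiment.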
Thanks for the explanation! Nevertheless, I have to say it just looks pretty odd to me to set the seed in every step. Even if we introduce an arg to the Trainer, the per-step calls would still be there. Is it simply too cumbersome to restore the seeds stored in the checkpoint after fast-forwarding, or is there another reason we can't do that?
It's not really complex, but it would probably make the code a bit less readable, as we no longer have access to the checkpoint after the initial loading via the Trainer's class method. One option would be to pass the checkpoint path as an attribute to the Trainer instance and then load the random states as done here: Lines 477 to 482 in 0098819.

Thinking more about this, I agree that the current situation is not ideal, as recent research also shows considerable variance in downstream-task performance across different seeds. This kind of "seed hacking" (trying out different seeds) is not possible right now. Thus, my suggestion is a short-term fix now plus a proper midterm solution.
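Storing and restoring the random states via the checkpoint could be sketched like this. This is an assumption-laden sketch: the checkpoint is modeled as a plain dict, the key name is made up, and numpy/torch states are omitted for brevity.

```python
import random


def save_rng_state(checkpoint):
    # Store the current RNG state next to the model weights.
    # A real version would also capture numpy and torch states.
    checkpoint["python_rng_state"] = random.getstate()
    return checkpoint


def restore_rng_state(checkpoint):
    # Resume the exact random stream instead of re-seeding every step.
    random.setstate(checkpoint["python_rng_state"])
```

With this, fast-forwarding after a restart continues the same random stream as an uninterrupted run, which is what the per-step seeding was approximating.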
Let me know if you are willing to work on the midterm solution.
Yes, I'd like to work on the midterm solution. I'll try to find some time for this by the end of the week.
Perfect, thanks! I am merging this one with the short-term fix now, and we can tackle the midterm solution in a separate PR.
resolves #423 by removing calls to set_all_seeds in Trainer.train()
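For context, a set_all_seeds-style helper typically seeds every RNG source the training loop touches. This is a hedged reconstruction for illustration, not the repository's exact code; the numpy/torch lines are left as comments so the sketch stays dependency-free.

```python
import random


def set_all_seeds(seed):
    """Hedged reconstruction of a seed-setting helper; not the repo's code."""
    random.seed(seed)
    # A real version would also seed the other RNG sources, e.g.:
    # np.random.seed(seed)               # numpy
    # torch.manual_seed(seed)            # torch CPU RNG
    # torch.cuda.manual_seed_all(seed)   # torch GPU RNGs
```

Removing the per-step calls to such a helper (this PR) and restoring RNG states from the checkpoint instead (the midterm solution) gives the same reproducibility without pinning every step to one seed.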