This repository was archived by the owner on Apr 8, 2025. It is now read-only.

remove hardcoded seeds from trainer#424

Merged
tholor merged 1 commit into deepset-ai:master from tstadel:hardcoded_seeds_fix
Jun 30, 2020

Conversation

@tstadel
Member

@tstadel tstadel commented Jun 25, 2020

resolves #423 by removing calls to set_all_seeds in Trainer.train()

@tholor
Member

tholor commented Jun 26, 2020

Thanks for flagging this and proposing a fix @tstadel !

The reason why we set the seeds there was to gain full reproducibility of training with checkpointing vs. without.
If you load a checkpoint, we "fast-forward" the DataLoader to the batch where you stopped in your last run (e.g. step 100). During this fast-forwarding, the random state will change, as there are operations that modify it (e.g. finding a random next sentence). So even though we load the random states from the checkpoint, they will be altered after the fast-forwarding.
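
A minimal illustration (plain Python `random`, not FARM's actual DataLoader code) of why the skip drifts the RNG state, and why restoring the saved state after the skip would fix it:

```python
import random

random.seed(0)
saved_state = random.getstate()  # what a checkpoint would have stored

# "Fast-forward" past 100 already-seen batches; each one draws from the
# RNG (e.g. when picking a random next sentence for LM pretraining).
for _ in range(100):
    random.random()

# The global RNG state has drifted away from the checkpointed one.
assert random.getstate() != saved_state

# Restoring the saved state after the skip brings the RNG back in sync.
random.setstate(saved_state)
assert random.getstate() == saved_state
```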

I agree that they shouldn't be hardcoded to specific values (= 39), but do you see a particular downside to having them there? We could, for example, add a "seed" arg to the Trainer() and use this value instead.

@tstadel
Member Author

tstadel commented Jun 26, 2020

Thanks for the explanation!

Nevertheless, I have to say it just looks pretty odd to me to set the seed in every step... Even if we introduce an arg to the Trainer, calls to set_all_seeds() from upstream code would effectively be ignored. So in order to reproduce results from such code without having to change it, the seed setting within train() should be optional.
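
A hypothetical sketch of that idea (the `seed` parameter, its `None` default, and the simplified `set_all_seeds` here are assumptions, not FARM's actual API): when `seed` is `None`, `train()` leaves the RNGs alone, so an upstream `set_all_seeds()` call keeps its effect.

```python
import random


def set_all_seeds(seed: int) -> None:
    # Sketch only: the real helper would also seed numpy and torch.
    random.seed(seed)


class Trainer:
    def __init__(self, seed=None):
        # seed=None (the default) means: do not touch the RNGs inside
        # train(), respecting any set_all_seeds() call by upstream code.
        self.seed = seed

    def train(self):
        if self.seed is not None:
            set_all_seeds(self.seed)
        # ... training loop would run here ...
        return random.random()  # stand-in for a seed-dependent result
```

With `Trainer(seed=39)` the old behaviour is available explicitly; with the default, reproducibility stays in the caller's hands.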

Is it simply too cumbersome to restore the seeds stored in the checkpoint after fast-forwarding or is there another reason we can't do that?

@tholor
Member

tholor commented Jun 30, 2020

Is it simply too cumbersome to restore the seeds stored in the checkpoint after fast-forwarding or is there another reason we can't do that?

It's not really complex, but it would probably make the code a bit less readable, as we don't have access to the checkpoint anymore after the initial loading via the Trainer's class method. One option would be to pass the checkpoint path as an attribute to the Trainer instance and then load the random states like here:

FARM/farm/train.py

Lines 477 to 482 in 0098819

numpy_rng_state = trainer_checkpoint["numpy_rng_state"]
numpy.random.set_state(numpy_rng_state)
rng_state = trainer_checkpoint["rng_state"]
cuda_rng_state = trainer_checkpoint["cuda_rng_state"]
torch.set_rng_state(rng_state)
torch.cuda.set_rng_state(cuda_rng_state)

Thinking more about this, I agree that the current situation is not ideal, as recent research also shows considerable variance in performance across different seeds on downstream tasks. This kind of "seed hacking" is not possible right now.

Thus, my suggestion is to:

  • short-term: get rid of these seeds
  • mid/long-term: load the random states from the checkpoint after fast-forwarding
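
The mid-term idea can be sketched like this (hypothetical function name; only Python's `random` is shown, while the real fix would also restore the numpy and torch CPU/CUDA states from the checkpoint, as in the train.py excerpt above):

```python
import random


def resume_from_checkpoint(trainer_checkpoint, data_loader, resume_step):
    """Sketch of the mid-term fix (not FARM's actual API)."""
    # 1) Fast-forward: consume the batches already seen in the last run.
    #    This may draw random numbers and perturb the global RNG state.
    for _step, _batch in zip(range(resume_step), data_loader):
        pass

    # 2) Only now restore the random state saved in the checkpoint,
    #    overwriting whatever the fast-forwarding did to it.
    random.setstate(trainer_checkpoint["rng_state"])
```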

Let me know if you are willing to work on the midterm solution.

@tstadel
Member Author

tstadel commented Jun 30, 2020

Yes, I'd like to work on the midterm solution. I'm trying to find some time for it before the end of the week.

@tholor
Member

tholor commented Jun 30, 2020

Perfect, thanks! I am merging this one with the short-term fix now, and we can tackle the midterm solution in a separate PR.

@tholor tholor merged commit daa6291 into deepset-ai:master Jun 30, 2020


Development

Successfully merging this pull request may close these issues.

Trainer fixes all seeds to 39
