Add interrupt/terminate option to Trainers & Evaluators #4554

@holgerroth

Is your feature request related to a problem? Please describe.
It should be possible to abort or finalize a running Trainer by calling the API (rather than pressing Ctrl+C). This will be helpful if the Trainer needs to be executed remotely, such as in federated learning (FL) scenarios.

Describe the solution you'd like
Add abort() and finalize() functions to the Trainer class (or potentially its base class). Note that finalize() should terminate training completely, while abort() should allow training to resume later from where it was aborted, by calling run() again.

For example, an ignite-based Trainer supporting abort() and finalize() calls could be implemented as follows (currently used in MONAI-FL's MonaiAlgo class; private repo, contact me if you need access):

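    from ignite.engine import Events

    # Methods of a wrapper class that holds an ignite Engine as self.trainer.
    def abort(self):
        # ask the engine to stop after the current iteration
        self.trainer.terminate()
        # save the current dataloader iterator for the next round
        self.trainer.state.dataloader_iter = self.trainer._dataloader_iter

        if self.trainer.state.iteration % self.trainer.state.epoch_length == 0:
            # if current iteration is end of 1 epoch, manually trigger epoch completed event
            self.trainer._fire_event(Events.EPOCH_COMPLETED)

    def finalize(self):
        # stop training completely; nothing is saved for a later resume
        self.trainer.terminate()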
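
To make the intended difference between abort() and finalize() concrete, here is a minimal, library-free sketch. ToyTrainer and all of its attributes (iteration, finalized, the on_iteration callback) are hypothetical names chosen for illustration; this is not MONAI or ignite code, and the real implementation would likely differ. It only assumes the semantics described above: abort() pauses so that a later run() resumes, while finalize() ends training for good.

```python
import threading


class ToyTrainer:
    """Hypothetical stand-in for a Trainer; not part of MONAI or ignite."""

    def __init__(self, max_epochs, epoch_length):
        self.max_epochs = max_epochs
        self.epoch_length = epoch_length
        self.iteration = 0        # global counter, preserved across run() calls
        self.finalized = False
        self._stop = threading.Event()  # can be set from another thread

    def run(self, on_iteration=None):
        if self.finalized:
            raise RuntimeError("trainer was finalized and cannot be resumed")
        self._stop.clear()
        total = self.max_epochs * self.epoch_length
        while self.iteration < total and not self._stop.is_set():
            self.iteration += 1   # placeholder for one training step
            if on_iteration is not None:
                on_iteration(self)
        return self.iteration

    def abort(self):
        """Stop after the current iteration; a later run() continues from there."""
        self._stop.set()

    def finalize(self):
        """Stop after the current iteration and refuse any further run()."""
        self.finalized = True
        self._stop.set()


trainer = ToyTrainer(max_epochs=2, epoch_length=5)
# Abort from inside the loop once three iterations are done.
trainer.run(on_iteration=lambda t: t.abort() if t.iteration == 3 else None)
# Calling run() again picks up at iteration 4 and finishes the remaining steps.
trainer.run()
```

In a remote or FL setting, abort() or finalize() would typically be called from a different thread than the one executing run(), which is why the sketch uses a threading.Event rather than a plain boolean flag.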

Describe alternatives you've considered
n/a

Additional context
n/a

Labels: enhancement (New feature or request)
