
Engine interrupt/continue feature as generator #2682

Merged
vfdev-5 merged 16 commits into pytorch:master from
vfdev-5:feature-engine-interrupt-continue-as-generator
Sep 1, 2022

@vfdev-5 vfdev-5 commented Aug 31, 2022

Description:

  • Engine interrupt/continue feature as generator

Usage:

import logging

from ignite.engine import Engine, Events
from ignite.utils import setup_logger

data = range(10)
max_epochs = 3

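def check_input_data(e, b):
    # Use the engine passed to the process function rather than the global `engine`.
    print(f"Epoch {e.state.epoch}, Iter {e.state.iteration} | data={b}")
    # The batch must match the position in `data`, even after a resume.
    i = (e.state.iteration - 1) % len(data)
    assert b == data[i]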

engine = Engine(check_input_data)
engine.logger = setup_logger("engine", level=logging.WARNING)

can_interrupt = True

@engine.on(Events.ITERATION_COMPLETED(every=6))
def call_interrupt():
    if can_interrupt:
        engine.interrupt()


print("\nStart engine run with interruptions:")
state = engine.run(data, max_epochs=max_epochs)
print("1 Engine run is interrupted at ", state.epoch, state.iteration)
state = engine.run(data, max_epochs=max_epochs)
print("2 Engine run is interrupted at ", state.epoch, state.iteration)
state = engine.run(data, max_epochs=max_epochs)
print("3 Engine run is interrupted at ", state.epoch, state.iteration)

# Disable interruptions so that the next call resumes and runs to completion.
can_interrupt = False

state = engine.run(data, max_epochs=max_epochs)
print("Engine ended the run at ", state.epoch, state.iteration)

print("Compare to a run without interruptions:")
state = engine.run(data, max_epochs=max_epochs)
print("Engine ended the run at ", state.epoch, state.iteration)
Output:
Start engine run with interruptions:
Epoch 1, Iter 1 | data=0
Epoch 1, Iter 2 | data=1
Epoch 1, Iter 3 | data=2
Epoch 1, Iter 4 | data=3
Epoch 1, Iter 5 | data=4
Epoch 1, Iter 6 | data=5
1 Engine run is interrupted at  1 6
Epoch 1, Iter 7 | data=6
Epoch 1, Iter 8 | data=7
Epoch 1, Iter 9 | data=8
Epoch 1, Iter 10 | data=9
Epoch 2, Iter 11 | data=0
Epoch 2, Iter 12 | data=1
2 Engine run is interrupted at  2 12
Epoch 2, Iter 13 | data=2
Epoch 2, Iter 14 | data=3
Epoch 2, Iter 15 | data=4
Epoch 2, Iter 16 | data=5
Epoch 2, Iter 17 | data=6
Epoch 2, Iter 18 | data=7
3 Engine run is interrupted at  2 18
Epoch 2, Iter 19 | data=8
Epoch 2, Iter 20 | data=9
Epoch 3, Iter 21 | data=0
Epoch 3, Iter 22 | data=1
Epoch 3, Iter 23 | data=2
Epoch 3, Iter 24 | data=3
Epoch 3, Iter 25 | data=4
Epoch 3, Iter 26 | data=5
Epoch 3, Iter 27 | data=6
Epoch 3, Iter 28 | data=7
Epoch 3, Iter 29 | data=8
Epoch 3, Iter 30 | data=9
Engine ended the run at  3 30
Compare to a run without interruptions:
Epoch 1, Iter 1 | data=0
Epoch 1, Iter 2 | data=1
Epoch 1, Iter 3 | data=2
Epoch 1, Iter 4 | data=3
Epoch 1, Iter 5 | data=4
Epoch 1, Iter 6 | data=5
Epoch 1, Iter 7 | data=6
Epoch 1, Iter 8 | data=7
Epoch 1, Iter 9 | data=8
Epoch 1, Iter 10 | data=9
Epoch 2, Iter 11 | data=0
Epoch 2, Iter 12 | data=1
Epoch 2, Iter 13 | data=2
Epoch 2, Iter 14 | data=3
Epoch 2, Iter 15 | data=4
Epoch 2, Iter 16 | data=5
Epoch 2, Iter 17 | data=6
Epoch 2, Iter 18 | data=7
Epoch 2, Iter 19 | data=8
Epoch 2, Iter 20 | data=9
Epoch 3, Iter 21 | data=0
Epoch 3, Iter 22 | data=1
Epoch 3, Iter 23 | data=2
Epoch 3, Iter 24 | data=3
Epoch 3, Iter 25 | data=4
Epoch 3, Iter 26 | data=5
Epoch 3, Iter 27 | data=6
Epoch 3, Iter 28 | data=7
Epoch 3, Iter 29 | data=8
Epoch 3, Iter 30 | data=9
Engine ended the run at  3 30
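
To illustrate the idea behind the behaviour shown above, here is a minimal sketch of a run loop written as a generator. It is not the code from this PR: `MiniEngine`, its attributes (`epoch`, `iteration`, `can_interrupt`) and the `step` function are hypothetical names made up for the example. The assumption is only that pausing at a `yield` keeps the loop position, so a later `run()` call can continue from the next batch.

```python
class MiniEngine:
    """Toy engine (hypothetical, NOT ignite's implementation)."""

    def __init__(self, process_fn):
        self.process_fn = process_fn
        self.epoch = 0
        self.iteration = 0
        self.can_interrupt = True      # user-controlled switch used by `step` below
        self._should_interrupt = False
        self._gen = None               # live generator while a run is paused

    def interrupt(self):
        # Request a pause after the current iteration.
        self._should_interrupt = True

    def _run_as_gen(self, data, max_epochs):
        while self.epoch < max_epochs:
            self.epoch += 1
            for batch in data:
                self.iteration += 1
                self.process_fn(self, batch)
                if self._should_interrupt:
                    self._should_interrupt = False
                    yield  # pause here; the loop position is kept by the generator

    def run(self, data, max_epochs):
        if self._gen is None:
            # No paused run: start from scratch.
            self.epoch, self.iteration = 0, 0
            self._gen = self._run_as_gen(data, max_epochs)
        try:
            next(self._gen)    # run until the next pause ...
        except StopIteration:
            self._gen = None   # ... or until the run is complete
        return self.epoch, self.iteration


def step(e, batch):
    # Ask for a pause every 6 iterations while interruptions are enabled.
    if e.can_interrupt and e.iteration % 6 == 0:
        e.interrupt()


engine = MiniEngine(step)
paused_at = [engine.run(range(10), max_epochs=3) for _ in range(3)]
engine.can_interrupt = False
finished_at = engine.run(range(10), max_epochs=3)
print(paused_at, finished_at)
```

With these settings the toy engine pauses at the same (epoch, iteration) pairs as in the output above and then finishes the run once interruptions are disabled.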
  • Run time measurements vs stable
  1. Empty engine:
import time
from ignite.engine import Engine

engine = Engine(lambda e, b: time.sleep(0.01))

%%timeit -r 3 -n 5
engine.run(epoch_length=50, max_epochs=10)
  • v0.4.9 : 5.73 s ± 4.88 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)
  • This PR: 5.73 s ± 30.2 ms per loop (mean ± std. dev. of 3 runs, 5 loops each)
  2. Cifar10 example

torchrun --nproc_per_node=2 main.py run --backend="nccl"

  • master : avg 2m 35 s

  • This PR: avg 2m 35 s
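
For reference, a measurement in the style of `%%timeit -r 3 -n 5` can be approximated outside IPython with the standard-library `timeit` module. This is only a sketch: `workload` and `time_per_loop` are hypothetical names, and `workload` is a short sleep standing in for `engine.run(epoch_length=50, max_epochs=10)` so that the snippet runs without ignite installed.

```python
import statistics
import time
import timeit


def workload():
    # Stand-in for: engine.run(epoch_length=50, max_epochs=10)
    time.sleep(0.001)


def time_per_loop(fn, repeat=3, number=5):
    # timeit.repeat returns one total time per run, each covering `number` calls.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    # Mean and population standard deviation of the per-loop time across runs.
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


mean_s, std_s = time_per_loop(workload)
print(f"{mean_s * 1e3:.2f} ms ± {std_s * 1e3:.2f} ms per loop "
      f"(mean ± std. dev. of 3 runs, 5 loops each)")
```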

  • Make implementation more bullet-proof

  • Keep previous Engine for backward compatibility (BC)

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: engine Engine module label Aug 31, 2022

@Nic-Ma Nic-Ma left a comment

Thanks for the quick enhancement.
Seems like a non-breaking change, looks good to me.

Thanks.

@holgerroth

Thank you @vfdev-5 for building this feature. Looks good!

@vfdev-5 vfdev-5 merged commit b17ee25 into pytorch:master Sep 1, 2022
@vfdev-5 vfdev-5 deleted the feature-engine-interrupt-continue-as-generator branch September 1, 2022 14:27