
Add random delay during multiple job launch to break tie #1243

Merged
wyli merged 2 commits into Project-MONAI:master from IsaacYangSLA:add_random_delay_during_concurrent_job_launching
Nov 17, 2020


@IsaacYangSLA
Contributor

Signed-off-by: Isaac Yang [email protected]

Fixes #1236.

Description

GPU idle detection can hit a race condition when multiple jobs are launched at nearly the same time. This PR adds a random delay before launch to break the tie.

The logs from the four jobs in the reported failed pipeline show that all of them were assigned GPUs 0 and 3. This means all four reached the detection code at the same time, when none of them was yet using a GPU, so each job saw the same GPUs as idle.

The number 16 can be made larger to reduce the probability of two jobs being launched side by side.
The number 60 gives the unit test code time to start allocating GPU memory, and gives the other instances time to notice that GPU usage.
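
The idea described above can be sketched in Python. This is only an illustration, not the code changed in this PR: it assumes the delay is a random slot index below 16 multiplied by 60 seconds, and the names `pick_launch_delay`, `collision_probability`, `launch_after_random_delay`, and the `detect_idle_gpus` callback are hypothetical.

```python
import random
import time

DELAY_SLOTS = 16    # the "16" from the description: number of possible delay slots
SLOT_SECONDS = 60   # the "60" from the description: seconds per slot


def pick_launch_delay(rng=random, slots=DELAY_SLOTS, slot_seconds=SLOT_SECONDS):
    """Return a random delay in seconds, a whole multiple of slot_seconds."""
    return rng.randrange(slots) * slot_seconds


def collision_probability(jobs, slots=DELAY_SLOTS):
    """Probability that at least two of `jobs` jobs draw the same slot.

    Assumes each job picks its slot independently and uniformly.
    """
    p_all_distinct = 1.0
    for i in range(jobs):
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


def launch_after_random_delay(detect_idle_gpus, rng=random, sleep=time.sleep):
    """Sleep for a random delay, then run the (hypothetical) detection callback."""
    sleep(pick_launch_delay(rng))
    return detect_idle_gpus()
```

Under these assumptions, `collision_probability(4)` is about 0.33, so four jobs sharing 16 slots still have roughly a one-in-three chance that at least two draw the same slot. Raising the slot count to 64 lowers that to about 0.09, which is one way to read the remark that 16 can be larger.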

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh --codeformat --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Contributor

@wyli wyli left a comment

thanks!

@wyli wyli merged commit 870258b into Project-MONAI:master Nov 17, 2020
@IsaacYangSLA IsaacYangSLA deleted the add_random_delay_during_concurrent_job_launching branch November 17, 2020 17:25
wyli added a commit to wyli/MONAI that referenced this pull request Apr 5, 2021
wyli added a commit to wyli/MONAI that referenced this pull request Apr 5, 2021
@wyli wyli mentioned this pull request Apr 5, 2021
1 task
wyli added a commit to wyli/MONAI that referenced this pull request Apr 5, 2021

Development

Successfully merging this pull request may close these issues.

cron integration test memory error

2 participants