
3686 Skip workflow run if data is empty or the specified epoch_length is 0#3690

Merged
wyli merged 7 commits into Project-MONAI:dev from Nic-Ma:3686-skip-workflow-run
Jan 21, 2022

Conversation

@Nic-Ma (Contributor) commented Jan 20, 2022

Fixes #3686 .

Description

This PR enhances the workflow to skip the run if the data is empty or the specified epoch_length is 0.
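As a rough illustration of the kind of guard this PR describes, the sketch below shows an early-return check on the effective epoch length before entering the training/evaluation loop. The `run` function and its parameters here are hypothetical stand-ins, not MONAI's actual `Workflow` API.

```python
import warnings

def run(data_loader, epoch_length=None):
    """Sketch of a skip-run guard (illustrative names, not MONAI's real API).

    Returns True if the workflow ran, False if it was skipped.
    """
    # Derive epoch_length from the data when not given explicitly.
    if epoch_length is None:
        epoch_length = len(data_loader)

    # Skip the run entirely instead of iterating an empty loader.
    if epoch_length == 0:
        warnings.warn("no data provided or epoch_length is 0, skipping the run.")
        return False

    # ... the normal engine loop would go here ...
    return True
```

A caller can then treat the return value as a "did anything run" flag, e.g. `run([])` skips while `run([batch1, batch2])` proceeds.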

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma (Contributor, Author) commented Jan 20, 2022

/black

@Nic-Ma (Contributor, Author) commented Jan 20, 2022

/build

@Nic-Ma Nic-Ma requested review from ericspod, rijobro and wyli January 20, 2022 16:27
@SachidanandAlle (Contributor) left a comment

LGTM. Should be good if we have a test case for multi-GPU as well.

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

Hi @ericspod @SachidanandAlle ,

Thanks for the review.
I spent a lot of time today adding support for the multi-GPU training case. I tried several approaches, but none can guarantee that our distributed communication logic works correctly with fewer ranks than configured; this is PyTorch/NCCL behavior, and we can't dynamically add or remove ranks. For example, some handlers or metrics need every rank to run the same logic to all-gather the result; otherwise, the program will hang.
I have added unit tests for multi-GPU training and expanded the warning message for the hanging case.

Thanks.
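The hang described above happens because a collective call only completes when every rank participates. The toy sketch below illustrates this with plain `threading` standing in for distributed ranks, and a `threading.Barrier` standing in for a collective such as `all_gather`; it is an analogy, not actual `torch.distributed` code.

```python
import threading

def make_demo(num_ranks, skipping_rank=None):
    """Simulate ranks reaching (or skipping) a collective call.

    A Barrier with a short timeout stands in for dist.all_gather: if any
    "rank" skips it (e.g. because its data was empty), the remaining ranks
    time out instead of completing -- mirroring the NCCL hang.
    """
    barrier = threading.Barrier(num_ranks, timeout=0.5)
    results = {}

    def rank_fn(rank):
        if rank == skipping_rank:
            return  # this rank skips the collective, like an empty-data worker
        try:
            barrier.wait()  # stands in for the all-gather collective
            results[rank] = "ok"
        except threading.BrokenBarrierError:
            results[rank] = "hang"  # timed out waiting for the missing rank

    threads = [threading.Thread(target=rank_fn, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With all ranks participating, every rank finishes; if one rank skips the barrier, the others block until the timeout, which is why the PR warns rather than silently skipping individual ranks in distributed runs.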

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

/black

@Nic-Ma (Contributor, Author) commented Jan 21, 2022

/build

@wyli wyli merged commit e96dcca into Project-MONAI:dev Jan 21, 2022
wyli pushed a commit that referenced this pull request Jan 21, 2022
… is 0 (#3690)

* [DLMED] check 0 length

Signed-off-by: Nic Ma <[email protected]>

* [DLMED] add dist tests

Signed-off-by: Nic Ma <[email protected]>


Development

Successfully merging this pull request may close these issues.

Skip workflow run if no data provided

4 participants