
Adding Thread Worker Option to ThreadDataLoader #4252

Merged
ericspod merged 25 commits into Project-MONAI:dev from ericspod:thread_dataloader_extension
Jun 6, 2022

Conversation

@ericspod
Member

Signed-off-by: Eric Kerfoot [email protected]

Description

Adds the ability to run workers in ThreadDataLoader as threads instead of processes. This is intended as a fix for the issues we have on Windows with its process spawning semantics.
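To illustrate the idea, here is a minimal standard-library sketch of building batches in worker threads rather than worker processes. It is not the implementation in this PR: `load_batch`, `iter_batches_threaded`, and `SquaredDataset` are hypothetical names made up for the example. The toy dataset holds a lambda, which the standard `pickle` module cannot serialize, so it could not be handed to spawn-based process workers, but threads can use it directly because they share the parent's memory.

```python
from concurrent.futures import ThreadPoolExecutor


def load_batch(dataset, start, batch_size):
    """Collect one batch of items starting at index `start`."""
    stop = min(start + batch_size, len(dataset))
    return [dataset[i] for i in range(start, stop)]


def iter_batches_threaded(dataset, batch_size, num_workers=2):
    """Yield batches in order, building them in worker threads (hypothetical helper)."""
    starts = range(0, len(dataset), batch_size)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in submission order, so batch order is stable.
        # Threads share the parent's memory, so the dataset is never pickled.
        yield from pool.map(lambda s: load_batch(dataset, s, batch_size), starts)


class SquaredDataset:
    """Toy dataset holding a lambda, which the standard pickle module cannot serialize."""

    def __init__(self, n):
        self.n = n
        self.transform = lambda x: x * x

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return self.transform(i)


batches = list(iter_batches_threaded(SquaredDataset(5), batch_size=2))
print(batches)
```

The trade-off is that anything the workers mutate is also shared, which is where the thread-safety concerns raised in this conversation come from.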

Status

Work in progress

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma
Contributor

Nic-Ma commented May 10, 2022

Hi @ericspod ,

I didn't check the details of this PR as it's still WIP, but I want to raise a concern:
We didn't enable multi-threading for ThreadDataLoader because most of MONAI's random transforms are not thread-safe.
What do you think about it?
CC @wyli @rijobro

Thanks in advance.

@ericspod
Member Author

I'm still dealing with an error that shows up as the test failures. For transforms that are thread-safe this would be an enhancement. Those that aren't safe are marked by inheriting from ThreadUnsafe, but that may not be sufficient: none of the random transforms are thread-safe because they share random state. This was meant to be a potential fix for some Windows issues that I wanted to put out there, but I'll leave it as a WIP for now.
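The shared-random-state problem can be sketched with the standard library alone. This is an illustration under stated assumptions, not MONAI code: `run_workers` is a hypothetical helper and `random.Random` stands in for a transform's random state. With one generator shared by all threads, which worker receives which value depends on thread scheduling; with one seeded generator per worker, each worker's sequence is fixed.

```python
import random
import threading


def run_workers(make_rng, num_workers=4, draws=1000):
    """Run `num_workers` threads, each drawing `draws` values from make_rng(worker_id)."""
    results = [None] * num_workers

    def work(worker_id):
        rng = make_rng(worker_id)
        results[worker_id] = [rng.random() for _ in range(draws)]

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


base_seed = 0  # arbitrary seed chosen for the example

# Shared state: all workers draw from one generator, so the per-worker
# sequences depend on how the threads happen to interleave.
shared = random.Random(base_seed)
shared_run = run_workers(lambda worker_id: shared)

# Per-worker state: each worker owns a generator seeded from its id,
# so its sequence does not depend on scheduling.
per_worker_a = run_workers(lambda worker_id: random.Random(base_seed + worker_id))
per_worker_b = run_workers(lambda worker_id: random.Random(base_seed + worker_id))
print(per_worker_a == per_worker_b)
```

Giving each worker its own generator is only one possible approach and is not something this PR implements.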

@wyli
Contributor

wyli commented May 26, 2022

@ericspod @Nic-Ma I tried this PR with the fast training test. It works fine and is slightly faster on my desktop (34s vs 36s). This is how I ran it:

diff --git a/tests/test_integration_fast_train.py b/tests/test_integration_fast_train.py
index 4dbb70b..8271a47 100644
--- a/tests/test_integration_fast_train.py
+++ b/tests/test_integration_fast_train.py
@@ -151,8 +151,8 @@ class IntegrationFastTrain(DistTestCase):
         train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=8)
         val_ds = CacheDataset(data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=5)
         # disable multi-workers because `ThreadDataLoader` works with multi-threads
-        train_loader = ThreadDataLoader(train_ds, num_workers=0, batch_size=4, shuffle=True)
-        val_loader = ThreadDataLoader(val_ds, num_workers=0, batch_size=1)
+        train_loader = ThreadDataLoader(train_ds, num_workers=2, use_thread_workers=True, batch_size=4, shuffle=True)
+        val_loader = ThreadDataLoader(val_ds, num_workers=2, use_thread_workers=True, batch_size=1)
 
         loss_function = DiceCELoss(to_onehot_y=True, softmax=True, squared_pred=True, batch=True)
         model = UNet(

I think we should merge this one...

@ericspod
Member Author

We do know there is a thread-safety issue with many transforms that use the random state. Things can sometimes train faster, but there will be race conditions that may preclude reproducibility. I need time to consider possible solutions to this. The purpose of this addition was to permit faster operation in some cases, but it also allows us to debug transform sequences in one process with a single worker thread.

@ericspod ericspod marked this pull request as ready for review June 5, 2022 19:08
@ericspod ericspod enabled auto-merge (squash) June 5, 2022 19:53
Contributor

@Nic-Ma Nic-Ma left a comment


Overall it looks good to me.
I put some comments inline.

Thanks.

@wyli
Contributor

wyli commented Jun 6, 2022

/build

@ericspod ericspod merged commit 22924f5 into Project-MONAI:dev Jun 6, 2022
@ericspod ericspod deleted the thread_dataloader_extension branch June 6, 2022 23:35

3 participants