SharedCacheDataset#5308
Conversation
Signed-off-by: myron <[email protected]>
I've added unit tests. Please review, thanks.
Hi @myron , Thanks very much for the implementation of this feature.
Thanks in advance.
Hi @Nic-Ma , thank you for the feedback.
There is only one similar method. Subclassing would be too messy and confusing, since the use case is different, and modifying CacheDataset would make it very long (and prone to bugs).
Yes, I have been using it in both spawn and fork modes.
This class is designed for caching on CPU only; it is not designed for GPU, so I don't see a reason to test for it.
It's true there is some overhead with shared memory, but it's definitely faster than not caching at all.
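A minimal, hypothetical sketch (not the PR's actual code) of the CPU-only shared-memory caching idea discussed above, using only the Python standard library: a preprocessed sample is pickled into a named `multiprocessing.shared_memory` block by one process, and any worker (forked or spawned) can attach to the same block by name without a per-process copy. The function names and the sample dict are illustrative.

```python
import pickle
from multiprocessing import shared_memory


def cache_sample(sample) -> shared_memory.SharedMemory:
    """Serialize a preprocessed sample into a named shared-memory block."""
    payload = pickle.dumps(sample)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    return shm  # shm.name is what you would hand to worker processes


def load_sample(name: str):
    """Attach to an existing block by name (what a worker process would do)."""
    shm = shared_memory.SharedMemory(name=name)
    # pickle stops at its STOP opcode, so any trailing padding bytes are ignored
    sample = pickle.loads(bytes(shm.buf))
    shm.close()
    return sample


owner = cache_sample({"image": [0.1, 0.2], "label": 1})
restored = load_sample(owner.name)
owner.close()
owner.unlink()  # free the segment once no process needs it any more
```

In a real dataset class, `cache_sample` would run once per item on first access, and the DataLoader workers would only ever call `load_sample`.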
Signed-off-by: myron <[email protected]>
I ran a comparison of SharedCacheDataset vs CacheDataset in terms of time, using a 3D segmentation task (416 training images, 104 validation), a data size evenly divisible for 8-GPU training. Setup: 8 GPUs (Nvidia A40 48GB x 8, Ampere), latest PyTorch, MONAI 1.0.0 dev branch.
Timing, first epoch:
Second epoch (everything non-random is already cached):
No difference in time on subsequent epochs. If there is any overhead from using shared memory, it's probably hidden within the DataLoader's parallel loads (which prefetch in parallel with the GPU runs). So overall SharedCacheDataset seems to have all the advantages: it caches on the fly, with no need for special pre-splitting of the data (which limits randomness). PS: this is an experiment on a particular 3D tumor segmentation task from CT; on other tasks the timing could be slightly different.
Agree with @Nic-Ma, if it's always better the changes should be made to `CacheDataset`.
Hi @myron , cool, glad to see your test results. Could you please help change it to update `CacheDataset`? Thanks in advance.
@Nic-Ma , thanks. CacheDataset is quite a bit different; if I update it, it will break many old use cases. Also, perhaps someone prefers to use CacheDataset as is (as it works for GPU caching too). For num_workers, I used 3, but we could investigate its performance later. Can we merge this please, to unblock the overall progress? This is a new class, so it won't break anything. In the future we can consider more performance tests, or think about merging it with other classes.
This is a nice feature, and none of us wants to block it... @Nic-Ma and I just wanted to ensure the API is clean and maintainable. Simply copying and modifying existing classes will add to the technical debt. Hope we are not creating trouble for you; perhaps we can invite other people to collaborate on this if you don't have time.
@wyli I understand what you're saying, and I agree that we need a clean, maintainable API. So if you'd like to invite other people to collaborate here and they do it, that would be great. My only request is that we merge it soon, as we need it for Auto3Dseg to unblock that work as soon as possible. PS: this PR has been waiting for 7 days already. The related issue #3843 has been open for 8 months (since February), and it doesn't look like we have much interest from other people in collaborating or contributing to it. I want to avoid the situation where it takes forever to bring any novel developments into our system. Since this class is very useful and doesn't conflict with anything existing, my preference is to merge it now; after that, for sure, people can have a thorough discussion about how to improve the cleanliness of the API or unify some classes if they want.
Hi @myron , OK, let me try to investigate this feature more. Can I commit to this PR directly, or should I create another PR based on this one? Thanks.
Hi @myron , I took a deep look at the solution in this PR and have some questions:
CC @ericspod @wyli for more discussion. Thanks in advance.
Do be aware that
I don't know; I don't use torch.launch personally. I always start DistributedDataParallel from within the code itself, so that I can allocate the shared memory and provide it to the individual/spawned processes.
I'm not sure either. I do use 'fork' usually with DDP, but I don't initialize anything major before launching DDP.
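The "allocate shared memory in the parent, then launch the workers" pattern described above can be sketched with the standard library alone. This is an illustrative stand-in, not the PR's code: a real run would hand the buffer (or its name) to DDP ranks, and it assumes a fork-capable platform such as Linux.

```python
import multiprocessing as mp

# Assumption: 'fork' start method is available (Linux), matching the mode above.
ctx = mp.get_context("fork")
shared = ctx.Array("d", 4)  # allocated in the parent, before any worker is launched


def fill(rank: int, buf) -> None:
    # each worker ("rank") writes its part into the shared buffer;
    # this stands in for filling cached samples
    buf[rank] = float(rank * 100)


procs = [ctx.Process(target=fill, args=(r, shared)) for r in range(4)]
for p in procs:
    p.start()
for p in procs:
    p.join()

values = list(shared)  # the parent sees every worker's writes in the same memory
```

Because the buffer exists before the processes start, no worker needs to pickle or copy it; they all map the same memory.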
Signed-off-by: myron <[email protected]>
I found a way to initialize the cache without a master process, by initializing it on the first rank and broadcasting it. Caveat: in multi-GPU, all processes must create the SharedDataset to be able to broadcast the cache, otherwise there will be a deadlock. But with this approach we don't need to initialize the cache in the master process, and it doesn't need to be an input parameter at all. @Nic-Ma what do you think?
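A single-process sketch of that rank-0-init-plus-broadcast idea, with threads standing in for DDP ranks and a `Barrier` standing in for the collective (a real implementation would use something like `torch.distributed.broadcast_object_list`). All names here are illustrative; it also shows why every rank must participate: the barrier, like a broadcast, blocks until all ranks arrive.

```python
import threading

WORLD_SIZE = 4
cache = {}                                # filled by rank 0 only
barrier = threading.Barrier(WORLD_SIZE)   # models the collective broadcast point
results = [None] * WORLD_SIZE


def rank_main(rank: int) -> None:
    if rank == 0:
        # only the first rank pays the cost of building the cache
        cache.update({i: i * i for i in range(3)})
    # every rank must reach this point; if one rank never creates the
    # dataset (and so never joins the collective), this blocks forever,
    # which mirrors the deadlock caveat above
    barrier.wait()
    results[rank] = dict(cache)  # after the collective, all ranks see the cache


threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The upside shown here matches the comment above: no master process, and the cache never has to be an input parameter.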
Fixes #5308 (PR #5365).

### Description
This PR added runtime cache support in the `CacheDataset`.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not applicable items -->
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Wenqi Li <[email protected]>
Signed-off-by: Behrooz <[email protected]>
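The "runtime cache" idea from the description can be illustrated with a minimal, hypothetical stand-in (`RuntimeCacheDataset` below is illustrative, not MONAI's actual class): instead of precomputing the whole cache up front, each item runs its deterministic transform on first access and serves the stored result on later epochs.

```python
class RuntimeCacheDataset:
    """Toy dataset that caches deterministic transform results at runtime."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform          # the deterministic part of the pipeline
        self._cache = [None] * len(data)    # filled lazily, on first access

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if self._cache[index] is None:      # first epoch: compute and store
            self._cache[index] = self.transform(self.data[index])
        return self._cache[index]           # later epochs: cache hit, no recompute


calls = []


def expensive(x):
    calls.append(x)      # record each real transform invocation
    return x * 10


ds = RuntimeCacheDataset([1, 2, 3], expensive)
first = [ds[i] for i in range(3)]    # triggers three transform calls
second = [ds[i] for i in range(3)]   # served entirely from the cache
```

In the real class the cached entries would additionally live in memory shared across workers; here a plain list keeps the sketch self-contained.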
This introduces a new `SharedCacheDataset(Dataset)` class to address #3843.
It has several improvements / changes compared to the `CacheDataset`: https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L690
I tested it and have used it for many tasks already in DDP / multi-GPU.
I'll add unit tests too if it looks good.

### Types of changes
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.