Patch dataset seems extremely inefficient in training

Hello! Thank you for this incredible library! I'm loving my experience with MONAI so far.
**Is your feature request related to a problem? Please describe.**
But I've noticed a severe inefficiency in the way PatchDataset is written. If used with RandSpacialCropSamplesd, as it is in the example, it takes **more** time to sample the same number of patches, than if using RandSpacialCropd with a simple Dataset. That's due to how the patches are sampled: for every patch, first, **all** patches are resampled, then a patch is chosen with the corresponding index, while the other sampled patches are just thrown away. So, this creates two problems
1. First, obviously, it is unnecessarily slow, re-sampling patches every time, so it makes no sense to use it to speed up training.
2. Second, it eats up more memory than it should. One could easily implement this with generators, so that the memory footprint would be basically the same as in sampling a single patch.

**Describe the solution you'd like**
I wish PatchDataset would store a generator inside, which would yield patches. On StopIteration, one would store newly-sampled patches in the generator. Also, please add a flag, allowing to turn RandSpacialCropSamplesd into a generator, so that it wouldn't store all sampled patches in memory. It would be slower than storing all of them in memory, but it would greatly reduce the memory footprint (to the footprint of sampling a single patch).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patch dataset seems extremely inefficient in training #6585

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Patch dataset seems extremely inefficient in training #6585

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions