Skip to content

Patch dataset seems extremely inefficient in training #6585

@Ilykuleshov

Description

@Ilykuleshov

Hello! Thank you for this incredible library! I'm loving my experience with MONAI so far.
Is your feature request related to a problem? Please describe.
But I've noticed a severe inefficiency in the way PatchDataset is written. If used with RandSpacialCropSamplesd, as it is in the example, it takes more time to sample the same number of patches, than if using RandSpacialCropd with a simple Dataset. That's due to how the patches are sampled: for every patch, first, all patches are resampled, then a patch is chosen with the corresponding index, while the other sampled patches are just thrown away. So, this creates two problems

  1. First, obviously, it is unnecessarily slow, re-sampling patches every time, so it makes no sense to use it to speed up training.
  2. Second, it eats up more memory than it should. One could easily implement this with generators, so that the memory footprint would be basically the same as in sampling a single patch.

Describe the solution you'd like
I wish PatchDataset would store a generator inside, which would yield patches. On StopIteration, one would store newly-sampled patches in the generator. Also, please add a flag, allowing to turn RandSpacialCropSamplesd into a generator, so that it wouldn't store all sampled patches in memory. It would be slower than storing all of them in memory, but it would greatly reduce the memory footprint (to the footprint of sampling a single patch).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions