Collate error on the key 'img_transforms' of dictionary data. #2116

@ambroslins

Description

Describe the bug
When using CacheDataset with multiple workers and the persistent_workers option for the DataLoader, the following exception is raised in the second epoch:

  File "C:\Users\Ambros\Dev\monai-issue\env\lib\site-packages\monai\data\utils.py", line 277, in list_data_collate
    raise RuntimeError(re_str)
RuntimeError: each element in list of batch should be of equal size
Collate error on the key 'img_transforms' of dictionary data.

MONAI hint: if your transforms intentionally create images of different shapes, creating your `DataLoader` with `collate_fn=pad_list_data_collate` might solve this problem (check its documentation).

see full output

To Reproduce
I wrote a minimal example based on unet_training_dict.py from the tutorials repository.

Steps to reproduce the behavior:

  1. Go to https://gist.github.com/ambroslins/9b079eac659c3483c6fceba5fcee8859#file-unet_training_dict-py and download unet_training_dict.py
  2. Install environment
  3. Run command: python unet_training_dict.py

Expected behavior
I don't expect this exception to be raised; the same code worked fine on MONAI version 0.4.0.

Environment
pip freeze

> python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.5.2
Numpy version: 1.20.2
Pytorch version: 1.8.1+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: feb3a334b7bbf302b13a6da80e0b022a4cf75a4e

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.2.0
Tensorboard version: 2.5.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.9.1+cu111
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.60.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
`psutil` required for `print_system_info`

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA GeForce RTX 2070 SUPER
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 40
GPU 0 Total memory (GB): 8.0
GPU 0 CUDA capability (maj.min): 7.5

Additional context
The problem occurs only when CacheDataset is used and persistent_workers=True is set for the DataLoader.
I tried to debug the problem and set a breakpoint on monai/data/utils.py:256.
In the second epoch, the transforms recorded for one of the elements in the batch had changed:
(screenshot)
I don't expect RandAffined and ToTensord to be applied twice. I think this is what causes the exception in default_collate, but I don't know what causes the transforms to be duplicated.
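
To make the observation above easier to check without a debugger, here is a minimal sketch of a helper that reports how many transform records each batch element carries under the 'img_transforms' key and flags the elements that differ from the majority. The helper names (transform_trace_lengths, find_outliers) are hypothetical and not part of MONAI, and the sample batch uses plain strings as simplified stand-ins for the real transform records. It relies on the assumption that the collate error is triggered by these per-element lists having unequal lengths.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def transform_trace_lengths(
    batch: Sequence[Dict[str, Any]], key: str = "img_transforms"
) -> List[int]:
    """Return the number of recorded transforms for each batch element."""
    return [len(item.get(key, [])) for item in batch]


def find_outliers(
    batch: Sequence[Dict[str, Any]], key: str = "img_transforms"
) -> List[int]:
    """Return indices of elements whose record count differs from the majority."""
    lengths = transform_trace_lengths(batch, key)
    if not lengths:
        return []
    most_common_len, _ = Counter(lengths).most_common(1)[0]
    return [i for i, n in enumerate(lengths) if n != most_common_len]


# Simplified stand-in data (assumption): the second element carries the
# RandAffined/ToTensord records twice, as described in the report.
batch = [
    {"img_transforms": ["RandAffined", "ToTensord"]},
    {"img_transforms": ["RandAffined", "ToTensord", "RandAffined", "ToTensord"]},
    {"img_transforms": ["RandAffined", "ToTensord"]},
]

print(transform_trace_lengths(batch))
print(find_outliers(batch))
```

Calling find_outliers on the list of samples handed to the collate function, just before collation, should point at the element whose transforms were recorded twice.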


Labels: bug
