Collate error on the key 'img_transforms' of dictionary data. #2116
Description
Describe the bug
When using CacheDataset together with multiple workers and the persistent_workers option of the DataLoader, the following exception is raised in the second epoch:
File "C:\Users\Ambros\Dev\monai-issue\env\lib\site-packages\monai\data\utils.py", line 277, in list_data_collate
raise RuntimeError(re_str)
RuntimeError: each element in list of batch should be of equal size
Collate error on the key 'img_transforms' of dictionary data.
MONAI hint: if your transforms intentionally create images of different shapes, creating your `DataLoader` with `collate_fn=pad_list_data_collate` might solve this problem (check its documentation).
To Reproduce
I wrote a minimal example based on unet_training_dict.py from the tutorials repository.
Steps to reproduce the behavior:
- Go to https://gist.github.com/ambroslins/9b079eac659c3483c6fceba5fcee8859#file-unet_training_dict-py and download unet_training_dict.py
- Install environment
- Run command:
python unet_training_dict.py
Expected behavior
I don't expect this exception to be raised; the same code worked fine on MONAI version 0.4.0.
Environment
pip freeze
> python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.5.2
Numpy version: 1.20.2
Pytorch version: 1.8.1+cu111
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: feb3a334b7bbf302b13a6da80e0b022a4cf75a4e
Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.2.0
Tensorboard version: 2.5.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.9.1+cu111
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.60.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
`psutil` required for `print_system_info`
================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA GeForce RTX 2070 SUPER
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 40
GPU 0 Total memory (GB): 8.0
GPU 0 CUDA capability (maj.min): 7.5
Additional context
The problem occurs only if the CacheDataset is used and persistent_workers=True for the DataLoader.
I tried to debug the problem by setting a breakpoint at monai/data/utils.py:256.
In the second epoch, the transforms recorded for one of the elements in the batch had changed: RandAffined and ToTensord each appeared twice.

I don't expect RandAffined and ToTensord to be applied twice. I think this duplication is what causes the exception in default_collate, but I don't know what causes the transforms to be duplicated.
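
To illustrate why a duplicated transform list would produce this particular error, here is a minimal pure-Python sketch. It is not MONAI or PyTorch code: `collate_records` is a hypothetical, simplified stand-in for the length check a default collate function performs on list-valued entries, and the strings are placeholders for the real transform records stored under 'img_transforms'.

```python
from typing import Any, Dict, List, Sequence, Tuple


def collate_records(batch: Sequence[Dict[str, List[Any]]], key: str) -> List[Tuple[Any, ...]]:
    """Hypothetical stand-in for collating a list-valued key across a batch."""
    lists = [item[key] for item in batch]
    expected = len(lists[0])
    # Every sample must carry the same number of records under this key.
    if not all(len(records) == expected for records in lists):
        raise RuntimeError("each element in list of batch should be of equal size")
    # Group the i-th record of every sample together.
    return list(zip(*lists))


# First epoch (assumed): every sample records each transform once.
first_epoch = [
    {"img_transforms": ["RandAffined", "ToTensord"]},
    {"img_transforms": ["RandAffined", "ToTensord"]},
]

# Second epoch (as observed at the breakpoint): one sample lists them twice.
second_epoch = [
    {"img_transforms": ["RandAffined", "ToTensord"]},
    {"img_transforms": ["RandAffined", "ToTensord", "RandAffined", "ToTensord"]},
]

print(collate_records(first_epoch, "img_transforms"))
try:
    collate_records(second_epoch, "img_transforms")
except RuntimeError as err:
    print(f"collate failed: {err}")
```

The sketch only reproduces the symptom (lists of unequal length under one key); it says nothing about why the records get duplicated in the first place.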