Error when multiple processes access the same non-existent persistent cache #2374

@wyli

Is your feature request related to a problem? Please describe.
The persistent dataset first checks whether the cache directory exists and then creates it if needed:

MONAI/monai/data/dataset.py

Lines 163 to 165 in feb3a33

if self.cache_dir is not None:
    if not self.cache_dir.exists():
        self.cache_dir.mkdir(parents=True)

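These steps can hit a race condition in a multi-process context: two processes may both find that the directory does not exist, and the slower one then fails in `mkdir` with `FileExistsError`, as in the run below.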

python -m tests.test_persistentdataset
persistent 1
persistent 0
create /var/folders/6f/fdkl7m0x7sz3nj_t7p3ccgz00000gp/T/tmpu578spse/test
create /var/folders/6f/fdkl7m0x7sz3nj_t7p3ccgz00000gp/T/tmpu578spse/test
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/py37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/anaconda3/envs/py37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "MONAI/tests/utils.py", line 296, in run_process
    raise e
  File "MONAI/tests/utils.py", line 287, in run_process
    func(*args, **kwargs)
  File "MONAI/tests/utils.py", line 471, in _call_original_func
    return f(*args, **kwargs)
  File "MONAI/tests/test_persistentdataset.py", line 166, in test_mp_dataset
    ds = PersistentDataset(items, transform=_InplaceXform(), cache_dir=cache_dir)
  File "MONAI/monai/data/dataset.py", line 172, in __init__
    self.cache_dir.mkdir(parents=True)
  File "/usr/local/anaconda3/envs/py37/lib/python3.7/pathlib.py", line 1273, in mkdir
    self._accessor.mkdir(self, mode)
FileExistsError: [Errno 17] File exists: '/var/folders/6f/fdkl7m0x7sz3nj_t7p3ccgz00000gp/T/tmpu578spse/test'
F
======================================================================
FAIL: test_mp_dataset (__main__.TestDistCreateDataset)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "MONAI/tests/utils.py", line 343, in _wrapper
    assert results.get(), "Distributed call failed."
AssertionError: Distributed call failed.

----------------------------------------------------------------------
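
One possible way to avoid the race, shown here only as a minimal sketch and not as the fix adopted in MONAI, is to drop the separate `exists()` check and let `pathlib.Path.mkdir` tolerate an already existing directory via `exist_ok=True`. The helper name `ensure_cache_dir` and the `demo` function are hypothetical, and threads stand in for the worker processes of the failing test to keep the example short.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ensure_cache_dir(cache_dir):
    """Hypothetical helper (not part of MONAI): create cache_dir if it is missing."""
    path = Path(cache_dir)
    # No separate exists() check: with exist_ok=True, mkdir does not raise
    # FileExistsError when another worker has already created the directory.
    path.mkdir(parents=True, exist_ok=True)
    return path


def demo(num_workers=8):
    """Call ensure_cache_dir concurrently on one not-yet-existing path."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test"
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(ensure_cache_dir, [target] * num_workers))
        # True when the directory exists and every worker returned the same path.
        return target.is_dir() and all(r == target for r in results)


if __name__ == "__main__":
    print(demo())
```

Note that `exist_ok=True` only suppresses the error when the existing path is a directory; if a regular file already occupies the path, `mkdir` still raises `FileExistsError`.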

Labels: bug (Something isn't working)
