
Issues with file permissions when using Dask #725

@olliestephenson

Description


tl;dr: Problems with Dask being unable to acquire a workspace lock on a path can be solved by changing the path that Dask uses for scratch data.

Description of the problem

When using Dask to do parallel processing with MintPy (as described here: https://mintpy.readthedocs.io/en/latest/dask/), I have been running into problems related to file permissions.

The problem arises during the invert_network stage of MintPy processing, where Dask is used to split the job up over many CPUs. At that point I start getting the errors shown below.

Full error message
Here is an example error message, which repeats many times:

------- start parallel processing using Dask -------
input Dask cluster type: local
initiate Dask cluster
distributed.diskutils - ERROR - Could not acquire workspace lock on path: /marmot-nobak/olstephe/InSAR/Makran/T115a/mintpy/process_stack_full_time_small_region_1cpu/dask-worker-space/worker-efk6dn5q.dirlock .Continuing without lock. This may result in workspaces not being cleaned up
Traceback (most recent call last):
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/diskutils.py", line 61, in __init__
    with workspace._global_lock():
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 196, in __enter__
    self.acquire()
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 190, in acquire
    self._lock.acquire(self._timeout, self._retry_period)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 119, in acquire
    lock.acquire(timeout, retry_period)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 163, in acquire
    _lock_file_blocking(self._file)
  File "/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/site-packages/distributed/locket.py", line 59, in _lock_file_blocking
    fcntl.flock(file_.fileno(), fcntl.LOCK_EX)
OSError: [Errno 37] No locks available
/home/olstephe/apps/miniconda3/envs/mintpy/lib/python3.8/contextlib.py:120: UserWarning: Creating scratch directories is taking a surprisingly long time. This is often due to running workers on a network file system. Consider specifying a local-directory to point workers to write scratch data to a local disk.
  next(self.gen)

In my case, the issue is possibly related to how the specific disk I'm trying to use is mounted. It is resolved by getting Dask to use a different location for writing scratch data.

We can do this by creating a YAML file for Dask in the ~/.config/dask/ directory (i.e. ~/.config/dask/dask.yaml) and adding the following line to that file:

temporary-directory: /tmp # Directory for local disk like /tmp, /scratch, or /local

In this case we use the /tmp directory, but the best choice will depend on your system. Dask will create a dask-worker-space directory in /tmp, and put a directory for each worker within it. If others are using the same machine, they may already have created a dask-worker-space directory in /tmp that you won't have permissions for. In that case you can just create a personal directory for storing Dask worker data (e.g. temporary-directory: /tmp/my_dask_dir in the YAML file).
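As a concrete sketch of the steps above (the path /tmp/my_dask_dir is just an example; pick any local, non-network disk on your system):

```shell
# Create a personal scratch directory on a local disk (example path)
mkdir -p /tmp/my_dask_dir

# Write the Dask config file pointing workers at it
mkdir -p ~/.config/dask
cat > ~/.config/dask/dask.yaml <<'EOF'
temporary-directory: /tmp/my_dask_dir  # local disk, not a network mount
EOF
```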

This resolved the issue for me.
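For reference, the same setting can also be applied programmatically with Dask's configuration API instead of editing the YAML file, e.g. at the top of a script before any cluster is created; the path here is again just an example, and values set this way are process-local and take precedence over the YAML file:

```python
import dask

# Point Dask's scratch space at a local disk before the cluster starts
# (example path; adjust for your system)
dask.config.set({"temporary-directory": "/tmp/my_dask_dir"})

print(dask.config.get("temporary-directory"))
```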

See other relevant issues on GitHub:
dask/distributed#2113
dask/distributed#2496

System information

  • Operating system: Red Hat Enterprise Linux 8.5
  • Python environment: conda
  • Version of MintPy: v1.3.2 (release date 2021-11-21)

Thanks to @yunjunz for previous help with this.
