Closed
Labels
enhancement: Improve existing functionality or make things work better
Description
Use case
This has been raised offline by a power user.
Their workers have limited disk space, frequently less than the amount of RAM. At the moment, the user has completely disabled spilling, because the spill file grows until it occupies all available disk space; at that point OSErrors are raised and data is lost.
Proposed behaviour
- If the spill-to-disk limit is hit, stop spilling to disk and keep data in memory.
- The resulting buildup of memory will then pause the worker.
Proposed design
- Add a line to the dask config to put an upper limit on the size of the spill file.
- Keep track of the current size of the spill files on disk. This may not be exactly the same as the size of the keys, due to discrepancies between sizeof() and pickle output. Note that this measurement can be done without I/O by intercepting the calls to zict.File.__setitem__.
- Ahead of spilling, add the sizeof() of the key to be spilled to the current size of the spill files on disk. If the maximum size would be exceeded, log a warning and don't spill. If memory pressure keeps building up, this will in turn cause the worker to eventually reach the pause threshold.
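The design above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual worker code: `BoundedSpillStore`, `max_spill_bytes` and `try_spill` are hypothetical names, a plain dict stands in for the on-disk zict.File mapping, and `sys.getsizeof` stands in for dask's sizeof().

```python
import logging
import pickle
import sys

logger = logging.getLogger(__name__)


class BoundedSpillStore:
    """Hypothetical stand-in for the slow (on-disk) layer of the spill buffer."""

    def __init__(self, max_spill_bytes):
        self.max_spill_bytes = max_spill_bytes  # would come from the dask config
        self.disk = {}          # key -> pickled bytes; a dict stands in for files
        self.bytes_on_disk = 0  # running total, updated on every write/delete

    def __setitem__(self, key, payload):
        # Interception point: account for the bytes actually written,
        # without stat-ing the filesystem.
        self.bytes_on_disk += len(payload) - len(self.disk.get(key, b""))
        self.disk[key] = payload

    def __delitem__(self, key):
        self.bytes_on_disk -= len(self.disk.pop(key))

    def try_spill(self, key, value):
        """Return True if the value was spilled, False if it stays in memory."""
        estimate = sys.getsizeof(value)  # assumption: stand-in for sizeof()
        if self.bytes_on_disk + estimate > self.max_spill_bytes:
            logger.warning("Spill limit reached; keeping %r in memory", key)
            return False
        self[key] = pickle.dumps(value)
        return True
```

In this sketch the pre-spill check uses the in-memory size estimate, while the running total uses the pickled size, which mirrors the discrepancy mentioned above. A caller that gets False back would leave the value in memory, so continued memory pressure would eventually reach the pause threshold.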