Open
Labels
enhancement, memory
Description
Use case
This was raised offline by a power user. Their containerization / virtualization system allows changing the amount of RAM available on the host on the fly. They would like to do so and then change the memory_limit of the Worker without restarting it. Everything derived from it (target, spill, pause, and terminate thresholds) should be recalculated as well.
Current state of the art
- It is incidentally possible to update spill and pause thresholds on the fly on the worker.
- Updating the target threshold does nothing.
- Nannies are not easily accessible from the client so the terminate threshold can't be changed.
- Updating the memory_limit incidentally works for the purpose of recalculating the absolute spill and pause, but not for target or terminate.
Proposed implementation
Calling

    def set_memory_limit(dask_worker, n):
        dask_worker.memory_limit = n

    client.run(set_memory_limit, n, workers=[...])

should just work. As this is a niche use case, a dedicated Client API is probably overkill.
Target, spill, pause and terminate thresholds must automatically be recalculated.
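To illustrate what "automatically recalculated" could mean, here is a minimal standalone sketch. It is not the Worker's actual implementation: `MemoryThresholds` and its `absolute()` method are hypothetical names, and the default fractions are assumed to match the usual `distributed.worker.memory.*` config defaults. The point is that absolute thresholds are derived from `memory_limit` on every read rather than cached at init time.

```python
from dataclasses import dataclass


@dataclass
class MemoryThresholds:
    """Hypothetical holder for a memory limit and its threshold fractions."""

    memory_limit: int  # bytes
    # Assumed fractions of memory_limit; not read from any real config here
    target: float = 0.60
    spill: float = 0.70
    pause: float = 0.80
    terminate: float = 0.95

    def absolute(self) -> dict[str, int]:
        """Compute absolute thresholds in bytes from the current memory_limit."""
        fractions = {
            "target": self.target,
            "spill": self.spill,
            "pause": self.pause,
            "terminate": self.terminate,
        }
        return {
            name: round(self.memory_limit * fraction)
            for name, fraction in fractions.items()
        }


thresholds = MemoryThresholds(memory_limit=4000)
before = thresholds.absolute()

# Changing the limit is all that is needed; nothing else has to be updated
thresholds.memory_limit = 1000
after = thresholds.absolute()
```

Anything that caches an absolute value (such as the target passed to the SpillBuffer at construction) would break this property, which is why the target threshold needs special handling.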
Notes
- The target threshold % is multiplied by the memory_limit in Worker.__init__ to get an absolute target and then used to build the zict SpillBuffer. Need to find a way to change the zict target on the fly; this likely will require an upstream patch or at the very least additional upstream unit tests.
- The terminate threshold is stored on the Nanny, so some sort of Worker->Nanny RPC will be necessary.
- A reduction in the memory limit may send the worker immediately above the terminate threshold. There must be some sort of algorithm that lets it sit in paused state for a while instead. e.g. disable terminate entirely for X seconds (configurable) after a reduction.
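One possible shape for the grace period mentioned in the last note, as a rough sketch only. `TerminateGuard`, `on_limit_changed` and `should_terminate` are hypothetical names and do not exist on the Nanny today; the clock is injectable purely so the behaviour can be tested without sleeping.

```python
import time


class TerminateGuard:
    """Hypothetical helper: suppress termination for a while after a limit reduction."""

    def __init__(self, grace_period: float, clock=time.monotonic):
        self.grace_period = grace_period  # seconds; would be configurable
        self._clock = clock
        self._disabled_until = float("-inf")  # terminate is enabled initially

    def on_limit_changed(self, old_limit: int, new_limit: int) -> None:
        # Only a reduction can push the worker above the threshold unexpectedly
        if new_limit < old_limit:
            self._disabled_until = self._clock() + self.grace_period

    def should_terminate(self, memory_used: int, terminate_threshold: int) -> bool:
        # During the grace period the worker is left alone (it would sit in
        # paused state and get a chance to spill or release memory)
        if self._clock() < self._disabled_until:
            return False
        return memory_used > terminate_threshold


# Usage with a fake clock so the example is deterministic
now = [0.0]
guard = TerminateGuard(grace_period=30.0, clock=lambda: now[0])

guard.on_limit_changed(old_limit=1000, new_limit=500)
during_grace = guard.should_terminate(memory_used=960, terminate_threshold=475)

now[0] = 31.0  # grace period has elapsed
after_grace = guard.should_terminate(memory_used=960, terminate_threshold=475)
```

An increase of the limit deliberately does not touch the grace period, since it can only move the worker further away from the terminate threshold.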
Acceptance criteria
- A straightforward way to change the memory_limit is clearly documented and covered by unit tests
- Target, spill, pause and terminate thresholds are recalculated automatically. This is covered by unit tests.
- Explicit management for the use case of reduction causing the worker to suddenly exceed the terminate threshold is implemented, documented, and covered by unit tests. As this is a new feature, it is reasonable to leave this last point as a separate, later PR.