Scheduler gather should warn or abort requests if data is too large #7964

@fjetter

Description

By default, Client.compute or Client.gather will proxy the data fetching through the scheduler via Scheduler.gather.
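
For illustration, a minimal sketch of both paths from the client side (the scheduler address is hypothetical; the direct= keyword on Client.gather is the existing way to bypass the scheduler path):

```python
from distributed import Client

client = Client("tcp://scheduler:8786")  # hypothetical address

futures = client.map(lambda x: x ** 2, range(1000))

# Default path: the result bytes are proxied through the scheduler
# (Scheduler.gather) before reaching the client.
results = client.gather(futures)

# Fetching directly from the workers avoids buffering on the scheduler.
results = client.gather(futures, direct=True)
```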

Particularly for novice users (but also for careless or under-caffeinated veterans), an accidental compute call can kill the scheduler, since the scheduler collects the data without any further checks.
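
A typical way this happens (the dataset path is hypothetical; any sufficiently large collection will do):

```python
import dask.dataframe as dd
from distributed import Client

client = Client()  # local cluster for illustration

# Hypothetical dataset; any sufficiently large collection triggers this.
df = dd.read_parquet("s3://bucket/large-dataset/")

# Easy to do by accident: this materializes the entire dataframe on the
# client, and by default every partition passes through the scheduler first.
result = df.compute()
```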

At this point the scheduler already knows how large the requested keys are and could abort the request by raising an appropriate exception, instead of fetching the data until it ultimately runs out of memory.
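
A minimal sketch of what such a check could look like, assuming a hypothetical helper invoked at the start of Scheduler.gather; the limit, exception type, and helper name are illustrative, while TaskState.nbytes is the size bookkeeping the scheduler already maintains:

```python
from dask.utils import format_bytes, parse_bytes

# Illustrative limit; not an existing configuration value.
GATHER_LIMIT = parse_bytes("2 GiB")


class GatherSizeError(Exception):
    """Raised when a gather request exceeds the configured limit."""


def check_gather_size(scheduler, keys, limit=GATHER_LIMIT):
    # The scheduler already records the size of each finished task in
    # TaskState.nbytes, so the total can be computed before fetching anything.
    total = sum(
        max(scheduler.tasks[key].nbytes, 0)
        for key in keys
        if key in scheduler.tasks
    )
    if total > limit:
        raise GatherSizeError(
            f"Refusing to gather {format_bytes(total)} through the scheduler "
            f"(limit: {format_bytes(limit)}). Consider gathering directly "
            "from the workers (Client.gather(..., direct=True)) or reducing "
            "the result size."
        )
    return total
```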

While the end result in both situations is that the computation is lost, the UX is substantially better: the user receives immediate and hopefully actionable feedback and can resolve the situation without killing the cluster.

Even for large requests that would make it through the scheduler, I would like to see an informative warning message on the client side. For instance, whenever the data exceeds a certain threshold, the client could issue a warning informing the user that a fetch of X GB is currently in progress.
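
A sketch of what that client-side warning could look like; the threshold and helper name are hypothetical:

```python
import warnings

from dask.utils import format_bytes, parse_bytes

# Hypothetical threshold; not an existing configuration value.
WARN_THRESHOLD = parse_bytes("1 GiB")


def maybe_warn_large_fetch(nbytes, threshold=WARN_THRESHOLD):
    """Warn on the client when a gather exceeds the threshold."""
    if nbytes > threshold:
        warnings.warn(
            f"A fetch of {format_bytes(nbytes)} from the cluster is in "
            "progress. Large transfers through the scheduler can destabilize "
            "it; consider Client.gather(..., direct=True) or keeping the "
            "data on the cluster with persist().",
            UserWarning,
        )
```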

Metadata

    Labels

    enhancement: Improve existing functionality or make things work better
    stability: Issue or feature related to cluster stability (e.g. deadlock)
