Utility functions to debug cluster_dump #5873

@fjetter

Client.dump_cluster_state dumps the entire state of the scheduler and all workers. For an active cluster, the dump typically runs to hundreds of thousands of lines when printed in human-readable form.

To work with this state dump artifact, one typically needs to write custom scripts, grep the logs, etc. Many of these operations can be standardized, and we should keep a few functions around to help us analyze these dumps.

  • get_tasks_in_state(state: str, worker: bool=False) - Return all TaskState objects that are currently in a given state on the scheduler (or on the workers if worker is True)
  • story - Get a global story for a key or stimulus ID, similar to Client.story (see "Support collecting cluster-wide story for a key or stimulus ID", #5872)
  • missing_workers - Get names and addresses of all workers that are known to the scheduler but for which the dump contains no logs
  • ...
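
To make the first and third proposals concrete, here is a minimal sketch of what such helpers might look like. It is not an existing API. It assumes the dump has already been loaded into a plain dict with a "scheduler" entry (holding "tasks" and "workers" mappings) and a top-level "workers" entry keyed by worker address; that layout, the `dump` parameter, and the sample data are assumptions made for illustration, and the helpers return plain dicts rather than TaskState objects.

```python
from __future__ import annotations

from typing import Any


def get_tasks_in_state(
    dump: dict[str, Any], state: str, worker: bool = False
) -> list[dict]:
    """Return the task records in the given state.

    Looks at the scheduler's tasks by default, or at the tasks of every
    worker present in the dump if ``worker`` is True.
    ASSUMPTION: each task record is a dict with a "state" field.
    """
    if worker:
        found = []
        for worker_state in dump.get("workers", {}).values():
            for task in worker_state.get("tasks", {}).values():
                if task.get("state") == state:
                    found.append(task)
        return found
    scheduler_tasks = dump["scheduler"]["tasks"]
    return [t for t in scheduler_tasks.values() if t.get("state") == state]


def missing_workers(dump: dict[str, Any]) -> list[tuple[Any, str]]:
    """Return (name, address) for workers the scheduler knows about
    but that have no entry in the dump's top-level "workers" mapping.
    """
    known = dump["scheduler"]["workers"]
    dumped = dump.get("workers", {})
    return [
        (info.get("name"), address)
        for address, info in known.items()
        if address not in dumped
    ]


# Hypothetical, hand-written dump used only to exercise the helpers.
example_dump = {
    "scheduler": {
        "tasks": {
            "x": {"key": "x", "state": "processing"},
            "y": {"key": "y", "state": "memory"},
        },
        "workers": {
            "tcp://10.0.0.1:1234": {"name": "w1"},
            "tcp://10.0.0.2:1234": {"name": "w2"},
        },
    },
    "workers": {
        "tcp://10.0.0.1:1234": {
            "tasks": {
                "x": {"key": "x", "state": "executing"},
                "y": {"key": "y", "state": "memory"},
            }
        }
    },
}
```

Taking the already-loaded dump as an argument keeps these helpers independent of the on-disk format, so the same functions would work for either serialization.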

The above functions should accept the cluster dump artifact in both yaml and msgpack format.
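
One way to support both formats is a small loader that dispatches on the file suffix and returns a plain dict for the helpers to consume. The sketch below is an assumption-laden illustration: the function name `load_cluster_dump` is hypothetical, and the suffixes it recognizes (.yaml, .yml, .msgpack.gz, .msgpack) are guesses at how the artifact is named on disk. It relies on the third-party PyYAML and msgpack packages, imported lazily so that only the decoder actually needed must be installed.

```python
import gzip
from pathlib import Path


def load_cluster_dump(path):
    """Load a cluster dump file into a plain Python object.

    ASSUMPTION: the format can be inferred from the file suffix.
    """
    name = Path(path).name
    if name.endswith((".yaml", ".yml")):
        import yaml  # PyYAML, third-party

        with open(path) as f:
            return yaml.safe_load(f)
    if name.endswith(".msgpack.gz"):
        import msgpack  # third-party

        with gzip.open(path, "rb") as f:
            # strict_map_key=False tolerates non-string map keys
            return msgpack.unpack(f, strict_map_key=False)
    if name.endswith(".msgpack"):
        import msgpack  # third-party

        with open(path, "rb") as f:
            return msgpack.unpack(f, strict_map_key=False)
    raise ValueError(f"Unrecognized cluster dump format: {name!r}")
```

Separating loading from analysis means a new serialization format would only require another branch here, not changes to every helper.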
