
feat: Introduce a global mechanism for persisting, checkpointing, and disabling AQE #554

@SauronShepherd

Description

This proposal aims to add a centralized, global mechanism that tracks every DataFrame persisted and/or checkpointed by the different GraphFrames algorithms, so that all of them rely on one common tool for consistency and efficiency.
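As a rough illustration of the tracking idea, here is a minimal Python sketch of such a registry. Everything in it is hypothetical: the class name PersistenceRegistry, its methods, and the assumption that checkpoint directories live on the local filesystem (HDFS or object storage would need the Hadoop FileSystem API instead). It only requires that tracked objects expose unpersist(), which PySpark DataFrames do.

```python
import shutil
from typing import Any, Dict, List


class PersistenceRegistry:
    """Hypothetical registry of DataFrames that GraphFrames persisted or checkpointed.

    Tracked objects only need an ``unpersist()`` method, as PySpark DataFrames have.
    """

    def __init__(self) -> None:
        self._persisted: Dict[str, Any] = {}
        self._checkpoint_dirs: List[str] = []

    def register_persisted(self, name: str, df: Any) -> Any:
        """Track an already-persisted DataFrame under a unique name and return it."""
        if name in self._persisted:
            raise ValueError(f"a DataFrame named {name!r} is already registered")
        self._persisted[name] = df
        return df

    def register_checkpoint_dir(self, path: str) -> None:
        """Remember a checkpoint directory (once) so clear_all() can delete it."""
        if path not in self._checkpoint_dirs:
            self._checkpoint_dirs.append(path)

    def persisted_names(self) -> List[str]:
        """Names of the DataFrames currently tracked, sorted for stable output."""
        return sorted(self._persisted)

    def clear_all(self) -> None:
        """Unpersist every tracked DataFrame, then delete the checkpoint directories.

        Assumes local-filesystem checkpoint directories; HDFS or object storage
        would need the Hadoop FileSystem API instead of shutil.
        """
        for df in self._persisted.values():
            df.unpersist()
        self._persisted.clear()
        for path in self._checkpoint_dirs:
            shutil.rmtree(path, ignore_errors=True)
        self._checkpoint_dirs.clear()


# A single module-level instance is what would make the registry "global".
REGISTRY = PersistenceRegistry()
```

In PySpark terms, an algorithm would call something like REGISTRY.register_persisted("cc_vertices", vertices.persist()) (both names illustrative), register whatever directory it passed to spark.sparkContext.setCheckpointDir, and a single clear_all() call would then release everything.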

Nice-to-haves:

  • Methods that centralize DataFrame persistence, checkpointing, and the temporary disabling of AQE (Adaptive Query Execution).
  • Persisted DataFrames should set the internal tableName in Spark's CacheManager, so that Spark does not perform an internal explain (see SPARK-50992) and so that they are clearly identified in the Spark UI.
  • A global configuration setting for the storage level, ensuring uniform persistence levels for all DataFrames.
  • A clearAll method that unpersists all DataFrames persisted internally by GraphFrames and deletes the checkpoint directory.
  • Some "nice" documentation ;)
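The temporary AQE disabling could look like the following context manager. It is a sketch assuming a PySpark-style session whose spark.conf offers get and set (PySpark's RuntimeConfig does); the name aqe_disabled is hypothetical. It saves the current value of spark.sql.adaptive.enabled, sets it to false, and restores the saved value on exit, even if the body raises.

```python
from contextlib import contextmanager
from typing import Any, Iterator

AQE_KEY = "spark.sql.adaptive.enabled"


@contextmanager
def aqe_disabled(spark: Any) -> Iterator[None]:
    """Disable Adaptive Query Execution for the duration of a ``with`` block.

    Saves the session's current value, sets it to "false", and restores the
    saved value on exit, even if the block raises.
    """
    previous = spark.conf.get(AQE_KEY)
    spark.conf.set(AQE_KEY, "false")
    try:
        yield
    finally:
        spark.conf.set(AQE_KEY, previous)
```

Because this is a session-wide setting, other queries running concurrently on the same session would also see AQE disabled. The action that triggers the computation should presumably run inside the with block, since the flag matters when Spark plans the query.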
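For named cache entries and a global storage level, one public-API route (assumed here, not taken from the proposal) is to register the DataFrame as a temporary view and cache it with CACHE TABLE ... OPTIONS ('storageLevel' ...), which should record the view name as the cache entry's table name. DEFAULT_STORAGE_LEVEL, cache_statement, persist_named, and unpersist_named are hypothetical names; a Scala implementation inside GraphFrames might call the CacheManager directly instead.

```python
import re
from typing import Any, Optional

# Hypothetical global setting: one storage level for every DataFrame GraphFrames caches.
DEFAULT_STORAGE_LEVEL = "MEMORY_AND_DISK"

_VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_VALID_LEVEL = re.compile(r"^[A-Z0-9_]+$")


def cache_statement(name: str, storage_level: Optional[str] = None) -> str:
    """Build the SQL that caches temp view `name` at the given (or global) level."""
    level = storage_level or DEFAULT_STORAGE_LEVEL
    if not _VALID_NAME.match(name):
        raise ValueError(f"invalid view name: {name!r}")
    if not _VALID_LEVEL.match(level):
        raise ValueError(f"invalid storage level: {level!r}")
    return f"CACHE TABLE {name} OPTIONS ('storageLevel' '{level}')"


def persist_named(spark: Any, df: Any, name: str, storage_level: Optional[str] = None) -> Any:
    """Cache `df` so that its cache entry carries `name` as the table name."""
    statement = cache_statement(name, storage_level)  # validate before touching the catalog
    df.createOrReplaceTempView(name)
    spark.sql(statement)  # CACHE TABLE is eager unless LAZY is given
    return spark.table(name)  # reads go through the cached, named entry


def unpersist_named(spark: Any, name: str) -> None:
    """Remove the cache entry and the temp view created by persist_named."""
    spark.catalog.uncacheTable(name)
    spark.catalog.dropTempView(name)
```

Validating the name and the level before building the SQL string keeps arbitrary text out of the statement, and reading DEFAULT_STORAGE_LEVEL at call time means changing that one setting affects every later call.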

Related Issues:
