This proposal aims to add a centralized global mechanism that tracks all DataFrames persisted and/or checkpointed across different GraphFrames algorithms, serving as a common tool for consistency and efficiency.
Nice-to-haves:
Methods to centralize DataFrame persistence, checkpointing, and the temporary disabling of AQE (Adaptive Query Execution).
Persisted DataFrames should set the internal tableName in Spark's Cache Manager to prevent Spark from performing an internal explain (see SPARK-50992), and should be clearly identifiable in the Spark UI.
A global configuration setting for the storage level, ensuring uniform persistence levels for all DataFrames.
A clearAll method to unpersist all internally persisted DataFrames in GraphFrames and delete the checkpoint directory.
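
To make the items above concrete, here is a minimal sketch of what such a registry could look like. It is an illustration only, not an existing GraphFrames API: `PersistenceRegistry`, `clear_all`, and `aqe_disabled` are hypothetical names. The sketch is duck-typed: it assumes DataFrame-like objects exposing `persist(storage_level)` and `unpersist()`, and a config object exposing `get(key, default)` and `set(key, value)`, as PySpark's `DataFrame` and `spark.conf` do. It also assumes the checkpoint directory is on the local filesystem, and it does not cover setting the tableName in the Cache Manager, which depends on Spark internals.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


class PersistenceRegistry:
    """Hypothetical registry tracking every DataFrame persisted by algorithms."""

    def __init__(self, storage_level="MEMORY_AND_DISK", checkpoint_dir=None):
        # One global storage level for all tracked DataFrames.
        # With PySpark this would be e.g. StorageLevel.MEMORY_AND_DISK.
        self.storage_level = storage_level
        # Assumption: a local path; a real implementation would likely
        # need to go through the Hadoop FileSystem API instead.
        self.checkpoint_dir = Path(checkpoint_dir) if checkpoint_dir else None
        self._persisted = {}

    def persist(self, name, df):
        """Persist df at the global storage level and remember it by name."""
        persisted = df.persist(self.storage_level)
        self._persisted[name] = persisted
        return persisted

    def tracked(self):
        """Names of all DataFrames currently tracked."""
        return sorted(self._persisted)

    def clear_all(self):
        """Unpersist every tracked DataFrame and delete the checkpoint dir."""
        for df in self._persisted.values():
            df.unpersist()
        self._persisted.clear()
        if self.checkpoint_dir is not None and self.checkpoint_dir.exists():
            shutil.rmtree(self.checkpoint_dir)


@contextmanager
def aqe_disabled(conf):
    """Temporarily turn AQE off, restoring the previous value on exit."""
    key = "spark.sql.adaptive.enabled"
    # Assumption: if the key was never set, treat the previous value as "true".
    previous = conf.get(key, "true")
    conf.set(key, "false")
    try:
        yield
    finally:
        conf.set(key, previous)
```

With PySpark, usage might look like `registry.persist("cc_edges", edges_df)` inside an algorithm and `with aqe_disabled(spark.conf): ...` around the steps that need AQE off, followed by a single `registry.clear_all()` once the results have been materialized. Keeping the storage level on the registry rather than on each call is what gives every DataFrame the same persistence level.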