For connected component computation, we need to replace manual Parquet checkpointing with DataFrame.checkpoint().
- Simplify the code by leveraging the built-in DataFrame.checkpoint() API, available since Spark 2.1 (with localCheckpoint() added in 2.3).
- Reduce potential correctness issues and maintenance burden, especially those related to S3 eventual consistency, manual path management, and explicit reloads.
- Align with Spark best practices: checkpoint() can be triggered eagerly (the default) or lazily via its eager flag, depending on workflow needs.