For connected component computation, we need to replace manual Parquet checkpointing with DataFrame.checkpoint().
- Simplify the code by leveraging the built-in DataFrame.checkpoint() API, available since Spark 2.1 (with localCheckpoint() added in 2.3).
- Reduce potential correctness issues and maintenance burden, especially those related to S3 eventual consistency, manual path management, and explicit reloads.
- Align with Spark best practices: checkpoint() can be triggered eagerly (the default) or lazily via its eager flag, depending on workflow needs.