feat: reduce memory consumption of cycles detection #731
What changes were proposed in this pull request?
Before sending any messages with the current sequences, we filter out every sequence whose starting vertex is greater than the destination vertex of the edge.
Explanation.
At the moment, the algorithm finds all three rotations of the same cycle: [1, 2, 3], [2, 3, 1], and [3, 1, 2]. This creates a huge memory load during iterations as well as in the result. Because we are dealing with cycles, it is easy to return only [1, 2, 3] by checking on each iteration that the first vertex of the current path is the smallest among all of its vertices. To reduce shuffles, I added a check before sending a message (with the list of current paths) that the starting vertex of each path has an ID no greater than the destination's (equality is exactly the case where the path returns to its start and the cycle closes); paths that fail the check are filtered out. Since every vertex of a cycle eventually appears as a destination, only the rotation rooted at the cycle's smallest vertex survives, as shown in the sketch below.
This reduces memory consumption across iterations (fewer stored sequences, lighter messages, etc.), shrinks the returned DataFrame, and spares users from having to deduplicate cycles manually.
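A minimal sketch of the filtering step, assuming Spark DataFrames (the column names `path` and `dst` and the helper name are hypothetical, not the actual graphframes internals):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, element_at}

// Keep only candidate paths whose starting vertex id does not exceed the
// id of the edge's destination vertex. Equality is retained because it is
// the case where the path returns to its start and the cycle closes;
// every other rotation, e.g. [2, 3, 1] or [3, 1, 2], is dropped.
def keepMinimalRotations(messages: DataFrame): DataFrame =
  messages.filter(element_at(col("path"), 1) <= col("dst"))
```

Because the filter runs before the message is sent, the dropped paths never enter the shuffle, which is where the savings in message weight come from.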
Why are the changes needed?
Closes #730
P.S. To avoid merge conflicts, I will update the docs as part of #725.