@SemyonSinchenko
Collaborator

What changes were proposed in this pull request?

Before sending any messages with the current sequences, we filter out every sequence whose starting vertex is greater than the destination vertex of the edge.

Explanation.
At the moment, the algorithm finds every rotation of the same cycle: [1, 2, 3], [2, 3, 1] and [3, 1, 2]. This creates a heavy memory load both during iterations and in the result. Because we are talking about cycles, it is easy to return only [1, 2, 3] by checking on each iteration that the first vertex of the current path is the smallest among all of its vertices. To reduce shuffles, I added a check before sending a message (with the list of current paths) that the starting vertex of each path is less (by ID) than the destination. Paths that do not pass this check are filtered out.
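The idea can be illustrated with a minimal, single-machine sketch in plain Python. This is not the GraphFrames implementation: the names `should_send` and `find_unique_cycles` are hypothetical, the message passing is simulated with dictionaries, and the check is assumed to keep a path when its starting vertex is not greater than the destination (so a path can still close back on its own start).

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (src, dst)


def should_send(path: List[int], dst: int) -> bool:
    """Assumed pre-send check: drop paths whose start is greater than dst."""
    return path[0] <= dst


def find_unique_cycles(edges: List[Edge], max_iter: int = 10) -> List[List[int]]:
    """Toy iterative search that reports each cycle once, smallest vertex first."""
    vertices = {v for edge in edges for v in edge}
    # paths[v] holds the simple paths that currently end at vertex v.
    paths: Dict[int, List[List[int]]] = {v: [[v]] for v in vertices}
    cycles: List[List[int]] = []
    for _ in range(max_iter):
        inbox: Dict[int, List[List[int]]] = {v: [] for v in vertices}
        for src, dst in edges:
            for path in paths[src]:
                if not should_send(path, dst):
                    continue  # filtered out before the "message" is sent
                if dst == path[0]:
                    cycles.append(path)  # path closed on its starting vertex
                elif dst not in path:
                    inbox[dst].append(path + [dst])  # extend the simple path
        paths = inbox
        if not any(paths.values()):
            break  # nothing left to propagate
    return cycles


triangle = [(1, 2), (2, 3), (3, 1)]
unique = find_unique_cycles(triangle)
```

On the triangle above, the paths starting at 2 and 3 are dropped as soon as they would be sent to a smaller vertex, so only the rotation that starts at vertex 1 survives.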

This reduces memory consumption during iterations (fewer stored sequences, lighter messages, etc.), reduces the size of the returned DataFrame, and no longer requires users to deduplicate cycles manually.
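For context, manual deduplication on the user side would look roughly like the following sketch. The helper names `canonical_rotation` and `dedupe_cycles` are hypothetical, and the sketch assumes each cycle is a list of distinct, comparable vertex IDs.

```python
from typing import List, Set, Tuple


def canonical_rotation(cycle: List[int]) -> Tuple[int, ...]:
    """Rotate the cycle so that its smallest vertex comes first."""
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])


def dedupe_cycles(cycles: List[List[int]]) -> List[List[int]]:
    """Keep one representative per set of rotations, in first-seen order."""
    seen: Set[Tuple[int, ...]] = set()
    result: List[List[int]] = []
    for cycle in cycles:
        key = canonical_rotation(cycle)
        if key not in seen:
            seen.add(key)
            result.append(list(key))
    return result


rotations = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
deduplicated = dedupe_cycles(rotations)
```

With the filter applied inside the algorithm, this post-processing step should no longer be necessary.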

Why are the changes needed?

Close #730

P.S. To avoid merge conflicts, I will update the docs as part of #725.

@codecov-commenter

codecov-commenter commented Oct 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.01%. Comparing base (a2d2a91) to head (67d0472).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
- Coverage   86.60%   84.01%   -2.59%     
==========================================
  Files          63       65       +2     
  Lines        2881     3028     +147     
  Branches      321      360      +39     
==========================================
+ Hits         2495     2544      +49     
- Misses        386      484      +98     

@james-willis
Collaborator

Any concern about this potentially breaking consumers?

@SemyonSinchenko
Collaborator Author

Any concern about this potentially breaking consumers?

The feature has not been released yet, so it's the perfect time to change it.

@SemyonSinchenko SemyonSinchenko merged commit 8f94c12 into graphframes:main Oct 27, 2025
5 checks passed
@SemyonSinchenko SemyonSinchenko deleted the 730-unique-cycles branch October 27, 2025 16:05

Development

Successfully merging this pull request may close these issues.

feat: only unique cycles in detectyng cycles

3 participants