Skip to content

Conversation

@james-willis
Copy link
Collaborator

What changes were proposed in this pull request?

Enable setting the connected components algorithm from spark configurations.

Why are the changes needed?

spark configurations are a standard way to control implementation details across a spark runtime. This pattern will help runtime environment owners control how their users use graphframes

Copy link
Collaborator

@rjurney rjurney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@SauronShepherd
Copy link
Contributor

latm (= looks asymmetric to me)

Why allow only the algorithm? Why not also checkpointInterval, broadcastThreshold, and intermediateStorageLevel?
Why not extend this to all the other classes in GraphFrames?

@rjurney
Copy link
Collaborator

rjurney commented Apr 3, 2025

I like where you're going with this...

@james-willis
Copy link
Collaborator Author

Why allow only the algorithm?

Because I'm lazy and greedy.

And I'm gauging interest in this kind of approach.

@james-willis james-willis changed the title Allow setting default ConnectedComponents algorithm from spark config Allow setting default ConnectedComponents configurations from spark config Apr 3, 2025
@SauronShepherd
Copy link
Contributor

Why allow only the algorithm?

Because I'm lazy and greedy.

Didn't know lazy people actually propose new things ... 😂

@SemyonSinchenko
Copy link
Collaborator

@james-willis jfyi, you can use a pre-commit hook to be sure that the code is properly formatted: https://github.com/graphframes/graphframes/blob/master/CONTRIBUTING.md#styleguides

Alternatively, just run ./build/sbt root/scalafmtAll

@SauronShepherd
Copy link
Contributor

Hey, and what about the Spark Connect feature in Graphframes? isn't it affected also by this change?

@james-willis
Copy link
Collaborator Author

Hey, and what about the Spark Connect feature in Graphframes? isn't it affected also by this change?

The new interface here is just spark config so I don't think there is any need to change anything about the connect integration but if you think there is some change to be made please point me in the right direction.

@SemyonSinchenko
Copy link
Collaborator

Hey, and what about the Spark Connect feature in Graphframes? isn't it affected also by this change?

It isnt, there is no problem, it is just an alternative way of passing configs.

@SemyonSinchenko
Copy link
Collaborator

@james-willis in the latest changes we introduced a new mixin (WithAlgorithmChoice) to avoid code repeating that was added to ConnectedComponents. So, instead of redefining the default you should probably just call a setter.

@james-willis
Copy link
Collaborator Author

Bogged down at work. will get back to this eventually.

@james-willis james-willis force-pushed the cc-conf-algo branch 3 times, most recently from 4fc31ce to de50969 Compare May 1, 2025 20:28
Copy link
Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work @james-willis !

@SemyonSinchenko SemyonSinchenko merged commit 0b761ab into graphframes:master May 12, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants