feat: Spark 4.0 support #608
Conversation
The failing test is the only thing I couldn't figure out.

I've faced it already; let me take a look.
SemyonSinchenko left a comment
I love it! Thanks a lot, @Kimahriman, that is beautiful work! I left a few comments, and I will work on fixing the py-connect test in a separate branch (I think it is related to the different order of rows in the Spark-Arrow-Python conversion).
src/test/scala-spark-3/org/apache/spark/sql/graphframes/SparkTestShims.scala (review comment, resolved)
@Kimahriman Can you please take the changes from #610? There are only a few lines of code... It will just be faster, because we still have some governance issues and merging #610 may be slow. I think the problem is that PySpark Connect in 4.x does not preserve the order of rows after the Arrow conversion. Thanks in advance!
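For context, a minimal sketch of what an order-insensitive assertion could look like in a PySpark test; the helper name and structure are illustrative assumptions, not the project's actual test code:

```python
# Illustrative helper (an assumption, not the project's real test code):
# compare DataFrame results without depending on row order, since Spark
# Connect in 4.x does not guarantee the ordering of collected rows.
def assert_same_rows(actual_df, expected_rows):
    # Sort both sides into a canonical order before comparing, so the
    # assertion is insensitive to partition and Arrow-batch ordering.
    assert sorted(actual_df.collect()) == sorted(expected_rows)
```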
Thought about that too, but unfortunately that doesn't appear to be the case. With that update:
Figured it out: it was a difference in the number of partitions of the initial edge DataFrame. There must be something non-deterministic about the power iteration clustering?
Wow! Nice work! Yes, most probably it is not deterministic. Actually, it is just a wrapper on top of Spark ML.
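A hypothetical sketch of pinning the edge DataFrame's partition count so the clustering result is reproducible in a test; the GraphFrames calls below are sketched from the public API, not copied from the PR:

```python
# Sketch under assumptions: `vertices` and `edges` are existing DataFrames,
# and the powerIterationClustering signature is as documented upstream.
from graphframes import GraphFrame

edges_fixed = edges.repartition(4)  # fix the partition count explicitly
g = GraphFrame(vertices, edges_fixed)
clusters = g.powerIterationClustering(k=2, maxIter=20, weightCol="weight")
```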
I also tried to fix some issues with the shading/assembly that I noticed while testing.
Putting classes from the root into connect is intentional, to allow users to specify a single dependency and have a ready-to-use Connect plugin instead of manually adding both connect and root. To be honest, I'm not sure what the best way to do it would be; I was trying to avoid user-facing ClassNotFoundExceptions that are tricky to understand and resolve.
SemyonSinchenko left a comment
LGTM! Great work, @Kimahriman!
I will leave this open until the end of Sunday. @rjurney @SauronShepherd, please notify me if you need more time to review.
Based on the publishM2,
And that is another big problem of GraphFrames: I came to the project hoping to get some Scala experience; I was 100% not ready to work on the build/publish system, just because I do not have enough experience for it. But someone had to do it; otherwise, all the new contributors would soon have lost motivation if we weren't able to make releases... Anyway, thanks a lot, I will take a look! It would also be very cool if you shared how you found it. I tried to run
Yeah, that's basically what I did. They only appeared on the published connect JAR because of
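Since the discussion is about classes showing up in the published artifact, here is one illustrative way to inspect a locally published JAR for unexpected classes; the path is a placeholder, not the project's real artifact location:

```python
# Illustrative check: list the classes inside a published JAR (e.g. under
# ~/.m2 after publishM2) to spot entries that should not have been shaded in.
import zipfile

jar_path = "path/to/published/graphframes-connect.jar"  # hypothetical path
with zipfile.ZipFile(jar_path) as jar:
    for name in jar.namelist():
        if name.endswith(".class"):
            print(name)  # eyeball, or filter for unexpected packages
```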
What changes were proposed in this pull request?
Use .remote('local') in the fixture instead of wrapper start/stop scripts (see the sketch below)

Why are the changes needed?
To support the latest Spark version.
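A minimal sketch of what such a Connect-based test fixture could look like, assuming pytest; the fixture name and scope are illustrative, not the PR's actual code:

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def connect_spark():
    # .remote("local") starts an in-process Spark Connect server backed by
    # a local master, so no wrapper start/stop scripts are needed.
    spark = SparkSession.builder.remote("local").getOrCreate()
    yield spark
    spark.stop()
```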