
Move forward to datafusion 52 #291

Merged
gabotechs merged 16 commits into datafusion-contrib:main from pydantic:dfd-df52
Jan 15, 2026

marc-pydantic (Contributor) commented Jan 13, 2026

This PR moves everything to DataFusion 52, which was released yesterday/today (depending on where you are!). All dependencies shared with DataFusion are upgraded to the same versions the DataFusion 52 release uses, to minimize friction. Other than that, it has been just a minor patch so far.

The big issue is with ballista: it depends transitively on a different version of the native lzma library, because a different apache-avro version is among its dependencies. There's no way to reconcile this due to C's lack of namespaces, so I commented out ballista for now. That's why this PR is still a draft; please let me know how to proceed!

Update: I reviewed all cargo insta snapshots, but a second pair of eyes can't hurt. From what I understand, these are exclusively changes in the optimizer that cause fewer CoalesceBatchesExec and RoundRobinBatch nodes to be emitted.

gabotechs (Collaborator) commented Jan 14, 2026

🤔 The ballista dependency is in fact a problem. Let me create a separate PR removing ballista.

gabotechs (Collaborator) commented Jan 14, 2026

gabotechs (Collaborator) commented

Done, hopefully this will make your life easier

marc-pydantic (Contributor, Author) commented Jan 14, 2026

Awesome, let me try to get this PR working then!

marc-pydantic marked this pull request as ready for review January 14, 2026 16:28
gabotechs (Collaborator) commented

I think the fact that CoalesceBatchesExec has been embedded into RepartitionExec is causing problems...

Previously, we marked the CoalesceBatchesExec node as the point below which a network boundary should be placed, but that node has now disappeared... Solving that might be a bit complex, as it will require changes to the planning logic.

If you are down for doing it, I'm more than happy to guide you, otherwise I can make a PR to this one with the necessary changes.

gabotechs (Collaborator) commented Jan 15, 2026

I took a look and the necessary changes for fixing the tests are pretty nasty. I don't think it's fair to throw you into the depths of the distributed planner, so I made a new PR to this one fixing the remaining issues:

pydantic#2

marc-pydantic (Contributor, Author) commented

Thank you, merged in. I'll give whatever's left broken a shot tomorrow - or should #294 be merged first?

gabotechs (Collaborator) commented

If CI passes, let’s pull this one in directly and just do follow-ups.

Thanks for handling this!

gabotechs (Collaborator) left a comment

Thanks @marc-pydantic !

gabotechs merged commit 9707b00 into datafusion-contrib:main Jan 15, 2026
7 checks passed