build(deps): update Arrow/Parquet to 52.0, object-store to 0.10#10765
build(deps): update Arrow/Parquet to 52.0, object-store to 0.10#10765andygrove merged 19 commits intoapache:mainfrom
52.0, object-store to 0.10#10765Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
|
Okay, the one last thing is to update Cargo.toml after arrow |
|
Epic! |
Signed-off-by: Ruihang Xia <[email protected]>
|
@waynexia It looks like you need to run |
| common::collect(merged_aggregate.execute(0, task_ctx.clone())?).await?; | ||
| // enlarge memory limit in spill mode | ||
| let task_ctx = if spill { | ||
| new_spill_ctx(2, 2600) |
There was a problem hiding this comment.
Should this be configurable? Also could you add a comment explaining what is happening here?
There was a problem hiding this comment.
This change has two stages to my understanding. The first partial aggr stage has a smaller limit, to meet the expectation that the partial plan has early emit.
datafusion/datafusion/physical-plan/src/aggregates/mod.rs
Lines 1504 to 1516 in 6846513
And the new lines enlarge the limit after that "early emit" check. Otherwise the merge aggr would fall because of insufficient memory. (at line 1554)
let result = common::collect(merged_aggregate.execute(0, task_ctx)?).await?;The root cause of this change is somehow the memory requirement becomes larger. From my investigation, the biggest change compared to the previous is from the RawTable in GroupValuesPrimitive -- it needs about 1000 bytes more:
There was a problem hiding this comment.
I hadn't realized that these literal values were just in tests ... thanks for the explanation
| ) -> Result<()> { | ||
| let task_ctx = if spill { | ||
| new_spill_ctx(2, 3200) | ||
| new_spill_ctx(2, 4200) |
There was a problem hiding this comment.
Again, seems like we need to make something configurable here?
There was a problem hiding this comment.
I might be misunderstanding, do you mean the change in 3a33755 🤔 ?
|
I merged up from main and updated the lock file |
alamb
left a comment
There was a problem hiding this comment.
Thanks @waynexia -- I agree with @andygrove 's concern about the changes to spilling -- I think we should address them before merging but otherwise this looks really nice to me
🙏
Co-authored-by: Andrew Lamb <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
|
Thanks for reviewing! |
|
@waynexia I was inspecting the dependency tree for delta-rs and it seems the dependency didn't get bumped for object-store in |
|
Sorry for causing this! Opened #10848 to fix it. |
|
All good, just wanted to bring it to your attention before 39 formally drops :) |
FYI @andygrove -- maybe this means we should make another RC for 39.0.0 🤔 We could also potentially make a 39.1.0 release too with this dependency upgrade fix #10848 |
…pache#10765) * fix compile on default feature config Signed-off-by: Ruihang Xia <[email protected]> * fix test of common, functions, optimizer and physical-expr Signed-off-by: Ruihang Xia <[email protected]> * fix other tests Signed-off-by: Ruihang Xia <[email protected]> * fix one last test Signed-off-by: Ruihang Xia <[email protected]> * fix clippy warnings Signed-off-by: Ruihang Xia <[email protected]> * fix datafusion-cli Signed-off-by: Ruihang Xia <[email protected]> * switch to git deps Signed-off-by: Ruihang Xia <[email protected]> * regen proto file Signed-off-by: Ruihang Xia <[email protected]> * fix pyo3 feature Signed-off-by: Ruihang Xia <[email protected]> * fix slt Signed-off-by: Ruihang Xia <[email protected]> * fix symmetric hash join cases Signed-off-by: Ruihang Xia <[email protected]> * update integration result Signed-off-by: Ruihang Xia <[email protected]> * fix up spill test Signed-off-by: Ruihang Xia <[email protected]> * shift to the released packages Signed-off-by: Ruihang Xia <[email protected]> * Update cargo.lock * Update datafusion/optimizer/src/analyzer/type_coercion.rs Co-authored-by: Andrew Lamb <[email protected]> * update document Signed-off-by: Ruihang Xia <[email protected]> * move memory limit to parameter pos Signed-off-by: Ruihang Xia <[email protected]> --------- Signed-off-by: Ruihang Xia <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #.
Rationale for this change
51.0.0, tonic to0.11#961352.0.0arrow-rs#5688What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?