fix: gc string view arrays in RepartitionExec #20500

Draft
Samyak2 wants to merge 2 commits into apache:main from Samyak2:fix-repartition-string-view-counting
@Samyak2 commented Feb 23, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • If any StringViewArray columns are present in the repartitioned input, we gc (compact) them so that the same underlying string view buffer is not counted multiple times by memory tracking.
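
To illustrate the over-counting, here is a minimal std-only sketch. It does not use arrow's real `StringViewArray`; `ToyViewArray`, `take`, `reported_size` and `gc` are hypothetical stand-ins that only model the idea of views pointing into a shared buffer. The assumption is that each repartitioned output keeps a reference to the whole input buffer, so a size check per output charges that buffer again, while a gc copies only the referenced bytes.

```rust
use std::sync::Arc;

/// Toy stand-in for a string view array (NOT arrow's real layout):
/// each view is an (offset, len) pair into a shared data buffer.
struct ToyViewArray {
    buffer: Arc<Vec<u8>>,
    views: Vec<(usize, usize)>,
}

impl ToyViewArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            views.push((buffer.len(), v.len()));
            buffer.extend_from_slice(v.as_bytes());
        }
        Self { buffer: Arc::new(buffer), views }
    }

    /// Models a `take`: copies the selected views but shares the buffer.
    fn take(&self, indices: &[usize]) -> Self {
        Self {
            buffer: Arc::clone(&self.buffer),
            views: indices.iter().map(|&i| self.views[i]).collect(),
        }
    }

    /// Naive accounting: the whole buffer is charged to every array
    /// that references it, plus the space for the views themselves.
    fn reported_size(&self) -> usize {
        self.buffer.len() + self.views.len() * std::mem::size_of::<(usize, usize)>()
    }

    /// Compaction: copy only the referenced bytes into a fresh buffer.
    fn gc(&self) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(self.views.len());
        for &(off, len) in &self.views {
            views.push((buffer.len(), len));
            buffer.extend_from_slice(&self.buffer[off..off + len]);
        }
        Self { buffer: Arc::new(buffer), views }
    }

    fn value(&self, i: usize) -> &str {
        let (off, len) = self.views[i];
        std::str::from_utf8(&self.buffer[off..off + len]).unwrap()
    }
}

fn main() {
    let input = ToyViewArray::from_strs(&[
        "a fairly long string value",
        "another long string value",
        "yet another long value",
        "the last long string value",
    ]);
    // Hash repartitioning into 4 outputs, modelled as one `take` per output.
    let parts: Vec<ToyViewArray> = (0..4).map(|i| input.take(&[i])).collect();
    let before: usize = parts.iter().map(|p| p.reported_size()).sum();
    let compacted: Vec<ToyViewArray> = parts.iter().map(|p| p.gc()).collect();
    let after: usize = compacted.iter().map(|p| p.reported_size()).sum();
    // The values are unchanged by compaction; only the accounting differs.
    assert_eq!(compacted[2].value(0), "yet another long value");
    println!("reported before gc: {before} bytes, after gc: {after} bytes");
}
```

In this toy model the "before" total grows with the number of outputs because every output is charged the full input buffer, whereas the "after" total is roughly the size of the data itself. The trade-off is the extra copy performed by the compaction.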

Are these changes tested?

Checked that this fixes the issue reported in #20491

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Feb 23, 2026
Fixes apache#20491

- Took the fix from `ExternalSorter` introduced in apache#14823
- If any `StringViewArray` columns are present in the repartitioned
  input, we gc them to reduce duplicate tracking of the same string view
  buffer.
- Fixes memory over-counting when there's a `RepartitionExec` above a partial
  aggregation on a `StringViewArray` column.
The gc is not needed for round-robin repartitioning.
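
The commit message does not spell out why round robin is exempt. One plausible reading, stated here as an assumption, is that round robin forwards each input batch whole to a single output, so no buffer ends up referenced by several outputs. The sketch below uses hypothetical toy types (`ToyBatch`, `round_robin`, `hash_like`, `charged_bytes`), not DataFusion's actual partitioner, to contrast the two cases.

```rust
use std::sync::Arc;

/// Hypothetical toy batch: only the shared data buffer matters here.
struct ToyBatch {
    buffer: Arc<Vec<u8>>,
}

/// Sum of buffer sizes when every output charges what it references.
fn charged_bytes(outputs: &[Vec<ToyBatch>]) -> usize {
    outputs.iter().flatten().map(|b| b.buffer.len()).sum()
}

/// Round robin: batch i moves, whole, to output i % n.
fn round_robin(batches: Vec<ToyBatch>, n: usize) -> Vec<Vec<ToyBatch>> {
    let mut outputs: Vec<Vec<ToyBatch>> = (0..n).map(|_| Vec::new()).collect();
    for (i, b) in batches.into_iter().enumerate() {
        outputs[i % n].push(b);
    }
    outputs
}

/// Hash-style split (assumed behaviour): every output receives a piece
/// of each batch, and each piece still references the input buffer.
fn hash_like(batches: Vec<ToyBatch>, n: usize) -> Vec<Vec<ToyBatch>> {
    let mut outputs: Vec<Vec<ToyBatch>> = (0..n).map(|_| Vec::new()).collect();
    for b in batches {
        for out in outputs.iter_mut() {
            out.push(ToyBatch { buffer: Arc::clone(&b.buffer) });
        }
    }
    outputs
}

fn main() {
    // Two input batches of 1000 bytes each, split across 4 outputs.
    let make = || -> Vec<ToyBatch> {
        (0..2)
            .map(|_| ToyBatch { buffer: Arc::new(vec![0u8; 1000]) })
            .collect()
    };
    println!("round robin charged: {}", charged_bytes(&round_robin(make(), 4)));
    println!("hash-like charged:   {}", charged_bytes(&hash_like(make(), 4)));
}
```

Under these assumptions the round-robin total equals the real data size, while the hash-style total is multiplied by the number of outputs, which is the case the gc targets.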
@Samyak2 Samyak2 force-pushed the fix-repartition-string-view-counting branch from 6a1547d to 471de13 Compare February 25, 2026 07:44
Development

Successfully merging this pull request may close these issues.

Over-counting of memory in aggregation + repartition over Utf8View/StringViewArray