[Repo Assist] perf: optimise Frame.AggregateRowsBy to avoid per-group sub-frame construction by github-actions[bot] · Pull Request #669 · fslaborg/Deedle

github-actions · 2026-03-19T02:59:56Z

🤖 This is an automated PR from Repo Assist, created in response to the maintainer's request on issue #269.

Closes #269

What and Why

Frame.aggregateRowsBy was slow at scale because the previous implementation called NestRowsBy, which built a full sub-Frame (materialising vector relocations for all columns) for every distinct group. The result was then assembled row-by-row via fromRows. For a 100 000-row frame with few distinct groups this was essentially O(n × numColumns) in vector work.

The approach in the issue comments (FoldColumnBy / AggregateColumnBy) pointed directly to the fix: do one pass to bucket rows by group key, then extract the aggregate data column-by-column.

How

Single scan — builds a Dictionary(obj[], int) (using HashIdentity.Structural for element-wise key equality) that records the ordered groups and the row offsets belonging to each group. O(n).
Per aggregate column — for each column in aggBy, the sub-series passed to aggFunc is built directly from the source column's values at those row offsets. No sub-frame construction; no vector relocation of unrelated columns.
Assembly — each group's result is packed as an ObjectSeries<'C> (same shape the original fromRows path expected), so FrameUtils.fromRows still handles column-type inference exactly as before.

This replaces G × C vector relocations (G groups, C total columns) with G × A series constructions (A aggregate columns only).

Test Status

All 676 existing tests pass:

Passed! - Failed: 0, Passed: 676, Skipped: 0, Total: 676

The existing test Can aggregate rows by key pairs with missing item in pairs exercises multi-column grouping with missing values and continues to pass.

Trade-offs

The fromRows assembly at the end still iterates over rows; for very wide result frames with many aggregate columns there is still some overhead. A future optimisation could avoid fromRows entirely by constructing the output frame column-by-column (requiring typed column construction per column key).
Behaviour is otherwise identical to the previous implementation.

Generated by Repo Assist for issue #269 · ◷

To install this agentic workflow, run
gh aw add githubnext/agentics/workflows/repo-assist.md@30f2254f2a7a944da1224df45d181a3f8faefd0d

…frames The previous implementation called NestRowsBy which materialised a full sub-Frame (all columns, vector relocation) for every distinct group, and then assembled the result row-by-row via fromRows. For a frame with many rows and few distinct groups this became O(n * numColumns) in vector work. The new implementation: 1. Extracts the group-by column values in a single pass using a Dictionary<obj[], int> with structural equality (HashIdentity.Structural), recording the ordered groups and the row offsets belonging to each group. 2. For each aggregate column, builds a sub-series directly from the source column's values at those offsets — no sub-frame construction needed. 3. Assembles the result rows as ObjectSeries (one per group) and hands them to FrameUtils.fromRows exactly as before, preserving the existing type- inference behaviour for result column types. This eliminates the NestRowsBy bottleneck (which created G * C vector relocations for G groups and C columns) and replaces it with a single O(n) scan plus G * A series constructions for A aggregate columns. Fixes #269 Co-authored-by: Copilot <[email protected]>

github-actions Bot added automation performance repo-assist labels Mar 19, 2026

ci: trigger checks

a521ddd

github-actions Bot mentioned this pull request Mar 19, 2026

Slow aggregations #269

Closed

dsyme marked this pull request as ready for review March 19, 2026 10:22

dsyme merged commit 0c1305c into master Mar 19, 2026
2 checks passed

dsyme deleted the repo-assist/fix-issue-269-aggregaterowsby-perf-ca8ad268bcd8d2b4 branch March 19, 2026 10:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Repo Assist] perf: optimise Frame.AggregateRowsBy to avoid per-group sub-frame construction#669

[Repo Assist] perf: optimise Frame.AggregateRowsBy to avoid per-group sub-frame construction#669
dsyme merged 2 commits intomasterfrom
repo-assist/fix-issue-269-aggregaterowsby-perf-ca8ad268bcd8d2b4

github-actions Bot commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

github-actions Bot commented Mar 19, 2026

What and Why

How

Test Status

Trade-offs

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant