[Repo Assist] Perf: add Frame.mapColValuesAs for typed, non-boxing column mapping — closes #539 #629

Merged
dsyme merged 2 commits into master from repo-assist/perf-mapColValuesAs-539-222a5de4320d73a2
Mar 17, 2026

Conversation

@github-actions
Contributor

🤖 Repo Assist — automated performance improvement (Task 8)

Summary

Adds Frame.mapColValuesAs, a typed variant of Frame.mapColValues that avoids the boxing overhead of ObjectSeries — fixing the ~100× performance gap reported in #539.

Root cause

frame.Columns materialises each column as an ObjectSeries by calling boxVector, which wraps every float in the underlying IVector<float> into a heap-allocated obj on access. A 2-column × 20,000-row frame therefore allocates ~40,000 boxed values just to iterate the columns.

Frame.mapColValuesAs uses GetColumns<'T>() instead, which calls TryAs<'T>() → unboxVector in O(1) per column, returning the original typed IVector<'T> directly.
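
The difference between the two paths can be illustrated without Deedle. The following is only a sketch: a plain float[] stands in for the typed IVector<float>, and an obj[] stands in for the boxed view exposed through ObjectSeries. The names rows, typedColumn and boxedColumn are hypothetical and do not appear in the PR.

```fsharp
// Hypothetical stand-ins: float[] plays the role of the typed vector,
// obj[] plays the role of the boxed view behind ObjectSeries.
let rows = 20000
let typedColumn : float[] = Array.init rows float

// Boxed path: every element becomes a separate heap object,
// and the consumer has to unbox each one again before using it.
let boxedColumn : obj[] = typedColumn |> Array.map box
let sumBoxed = boxedColumn |> Array.sumBy (fun o -> unbox<float> o)

// Typed path: the original array is read directly,
// with no per-element allocation or unboxing.
let sumTyped = typedColumn |> Array.sum

printfn "boxed = %.1f, typed = %.1f, equal = %b" sumBoxed sumTyped (sumBoxed = sumTyped)
```

Both paths compute the same sum; only the boxed one allocates one object per element, which is the cost the typed variant is meant to avoid.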

New API

// F# module function (CompiledName: SelectColumnValuesAs)
Frame.mapColValuesAs : (Series<'R,'T> -> Series<'R,'S>) -> Frame<'R,'C> -> Frame<'R,'C>
// Before — ~1 s (boxed path)
frame |> Frame.mapColValues (fun (s:ObjectSeries<int>) ->
    s.As<float>() |> Series.chunkWhileInto isSameHour Stats.mean)

// After — ~10 ms (unboxed path)
frame |> Frame.mapColValuesAs (Series.chunkWhileInto isSameHour Stats.mean)

Columns not convertible to 'T are silently dropped (consistent with getNumericCols behaviour). The existing mapColValues API is unchanged — fully backwards-compatible.
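
A small script can make the dropping behaviour concrete. This is a sketch under assumptions, not the code from the PR: mapColValuesAsSketch is a hypothetical local approximation built from GetColumns<'T>(), Series.mapValues and Frame.ofColumns, and the Row record and its field names are invented for illustration. It also assumes that GetColumns<'T>() uses a safe conversion by default, so a string column is not converted to float.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical record type, used only to build a small mixed-type frame.
type Row = { Price : float; Volume : float; Label : string }

let frame =
    Frame.ofRecords [
        { Price = 1.0; Volume = 10.0; Label = "a" }
        { Price = 2.0; Volume = 20.0; Label = "b" }
    ]

// Local approximation of the new function (an assumption, not the PR's source):
// take the columns that can be viewed as 'T, map each one, and rebuild a frame.
let mapColValuesAsSketch (f : Series<'R, 'T> -> Series<'R, 'S>) (frame : Frame<'R, 'C>) : Frame<'R, 'C> =
    frame.GetColumns<'T>()
    |> Series.mapValues f
    |> Frame.ofColumns

// 'T is inferred as float from the lambda, so only float-convertible columns are kept.
let doubled : Frame<int, string> =
    frame |> mapColValuesAsSketch (Series.mapValues (fun (v : float) -> v * 2.0))

printfn "%A" (List.ofSeq doubled.ColumnKeys)
```

Under the stated assumption, the printed column keys should include Price and Volume but not Label. Callers who need the non-numeric columns preserved would still go through mapColValues.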

Files changed

File | Change
src/Deedle/FrameModule.fs | Add mapColValuesAs after mapColValues
tests/Deedle.Tests/Frame.fs | Add 4 regression tests

Test Status

  • ✅ 4 new tests covering: correct values, column key preservation, non-convertible column dropping, and parity with mapColValues for homogeneous frames
  • ✅ All 592 tests pass (was 588 before this change)
  • ✅ Build succeeded with 0 errors

Generated by Repo Assist

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@30f2254f2a7a944da1224df45d181a3f8faefd0d

…#539

Adds Frame.mapColValuesAs (F# CompiledName: SelectColumnValuesAs), a typed
variant of Frame.mapColValues that avoids boxing overhead.

Root cause of the performance gap in #539:
Frame.mapColValues routes through frame.Columns which produces ObjectSeries
by boxing every element of the underlying typed vector into obj. When the
user function iterates 20,000-row columns this generates 20,000+ heap
allocations per column.

Frame.mapColValuesAs uses GetColumns<'T>() instead, which calls TryAs<'T>()
→ unboxVector in O(1) per column, returning the original unboxed IVector<'T>
directly. The user function then iterates at native speed.

Benchmark context (from #539 issue):
  mapColValues:   ~1 s (boxed path)
  mapColValuesAs: ~10 ms (unboxed path) — 100× speedup

API:
  Frame.mapColValuesAs (f: Series<'R,'T> -> Series<'R,'S>) : Frame<'R,'C> -> Frame<'R,'C>

Columns not convertible to 'T are silently dropped (same as getNumericCols).
The existing mapColValues API is unchanged — fully backwards-compatible.

Tests: 4 new regression tests covering correct values, column key
preservation, non-convertible column dropping, and parity with
mapColValues for homogeneous frames. All 592 tests pass.

Co-authored-by: Copilot <[email protected]>
@dsyme dsyme marked this pull request as ready for review March 17, 2026 00:43
@dsyme dsyme merged commit b243c48 into master Mar 17, 2026
2 checks passed
@dsyme dsyme deleted the repo-assist/perf-mapColValuesAs-539-222a5de4320d73a2 branch March 17, 2026 00:43