…#539

Adds Frame.mapColValuesAs (F# CompiledName: SelectColumnValuesAs), a typed variant of Frame.mapColValues that avoids boxing overhead.

Root cause of the performance gap in #539: Frame.mapColValues routes through frame.Columns, which produces ObjectSeries by boxing every element of the underlying typed vector into obj. When the user function iterates 20,000-row columns, this generates 20,000+ heap allocations per column.

Frame.mapColValuesAs uses GetColumns<'T>() instead, which calls TryAs<'T>() → unboxVector in O(1) per column, returning the original unboxed IVector<'T> directly. The user function then iterates at native speed.

Benchmark context (from #539 issue):
mapColValues: ~1 s (boxed path)
mapColValuesAs: ~10 ms (unboxed path), a ~100× speedup

API:
Frame.mapColValuesAs (f: Series<'R,'T> -> Series<'R,'S>) : Frame<'R,'C> -> Frame<'R,'C>

Columns not convertible to 'T are silently dropped (same as getNumericCols). The existing mapColValues API is unchanged and fully backwards-compatible.

Tests: 4 new regression tests covering correct values, column key preservation, non-convertible column dropping, and parity with mapColValues for homogeneous frames. All 592 tests pass.

Co-authored-by: Copilot <[email protected]>
🤖 Repo Assist — automated performance improvement (Task 8)
Summary
Adds Frame.mapColValuesAs, a typed variant of Frame.mapColValues that avoids the boxing overhead of ObjectSeries, fixing the ~100× performance gap reported in #539.

Root cause

frame.Columns materialises each column as an ObjectSeries by calling boxVector, which wraps every float in the underlying IVector<float> into a heap-allocated obj on access. A 2-column × 20,000-row frame therefore allocates ~40,000 boxed values just to iterate the columns. Frame.mapColValuesAs uses GetColumns<'T>() instead, which calls TryAs<'T>() → unboxVector in O(1) per column, returning the original typed IVector<'T> directly.

New API
Frame.mapColValuesAs (f: Series<'R,'T> -> Series<'R,'S>) : Frame<'R,'C> -> Frame<'R,'C>

Columns not convertible to 'T are silently dropped (consistent with getNumericCols behaviour). The existing mapColValues API is unchanged and fully backwards-compatible.

Files changed
src/Deedle/FrameModule.fs: mapColValuesAs added after mapColValues
tests/Deedle.Tests/Frame.fs

Test Status

Parity with mapColValues for homogeneous frames
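
The boxing cost named in the root cause can be illustrated without Deedle. The sketch below is an assumption-laden stand-in, not the library's code: plain arrays play the role of the typed IVector<float> and of the boxed values an ObjectSeries consumer sees, and the names typed, boxedCol, sumTyped and sumBoxed are hypothetical. It boxes eagerly for simplicity, whereas the description above says Deedle boxes on access.

```fsharp
// Illustrative only: plain arrays stand in for Deedle's typed and boxed vectors.
let rows = 20000

// Typed storage: one contiguous block of floats (analogous to IVector<float>).
let typed : float[] = Array.init rows float

// Boxed storage: each element is wrapped in its own heap object
// (analogous to the obj values seen when iterating an ObjectSeries).
let boxedCol : obj[] = typed |> Array.map box

// Boxing the same value twice yields two distinct heap objects.
let distinctBoxes = not (obj.ReferenceEquals(box 1.0, box 1.0))

// The typed path reads floats directly.
let sumTyped = Array.sum typed

// The boxed path has to unbox every element before using it.
let sumBoxed = boxedCol |> Array.sumBy (fun o -> unbox<float> o)

printfn "typed = %.1f, boxed = %.1f, distinct boxes = %b" sumTyped sumBoxed distinctBoxes
```

Both paths compute the same sum; the difference is that the boxed path creates one heap object per element, which is the overhead the typed variant is meant to avoid.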