Conversation
The CSV writer's formatOptional fallback path called value.ToString() for every cell of every non-IFormattable column (discriminated unions, enums, booleans, etc.). For a 50k×100 frame with a two-valued DU column this means millions of redundant calls to F#'s reflection-based ToString(). Add a per-export Dictionary<obj,string> cache (capped at 4096 entries) that short-circuits repeated ToString() calls. For the reported benchmark (50k rows, DU column) this can reduce the export time from >80s to ~6s. Also add 12 tests covering: - Frame.mapColValues / mapCols / mapColKeys - Frame.mapRowValues / mapRows / mapRowKeys - Frame.filterColValues / filterCols - SaveCsv with discriminated union columns (regression test for #564) Co-authored-by: Copilot <[email protected]>
dsyme
approved these changes
Mar 14, 2026
…11-c4e87381caa7143a
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 This PR was created by Repo Assist, an automated AI assistant.
Summary
Adds a per-export
Dictionary(obj,string)cache inFrameUtils.writeCsvto eliminate redundantToString()calls for non-IFormattablecolumn types (discriminated unions, enums, booleans, etc.).Closes #564.
Root Cause
formatOptionalinwriteCsvhad a final fallback:F# discriminated unions use a reflection-based
ToString()by default, which is slow. For a 50k×100 frame with a two-valued DU column, this means ~5,000,000 calls to reflection-basedToString()— one per cell.Fix
A
Dictionary(obj, string)is allocated per CSV export and stores the formatted string the first time a value is seen. On subsequent rows the cached string is returned directly. The cache is capped at 4,096 entries to keep memory overhead bounded for high-cardinality columns.(Up from 493 before this PR — the 12 new tests are included in the count.)