Add memory usage diff to SPMI reports #124943
Conversation
Add a new 'memorydiff' command to the SuperPMI tooling alongside 'asmdiffs' and 'tpdiff'. This command measures JIT arena allocation differences (the `BytesAllocated` metric) between the base and diff JITs for all contexts and reports aggregate statistics, including P90 per-context allocation ratios.

Key changes:
- superpmi.py: add `SuperPMIReplayMemoryDiff` class, `aggregate_memory_diff_metrics()`, `write_memorydiff_markdown_summary()`, subparser, setup_args, main dispatch, and summarize support for 'memorydiff'
- superpmi_diffs.py: add `do_memorydiff()` method to the `Diff` class and wire it into dispatch
- superpmi_diffs_setup.py: add 'memorydiff' to the valid types; use checked JITs
- superpmi_diffs_summarize.py: add 'memorydiff' consolidation support

The command leverages the existing `BytesAllocated` JITMETADATAMETRIC that is already automatically written to the SPMI details CSV. No C++ changes are needed.

Co-authored-by: Copilot <[email protected]>
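The P90 aggregation described above can be sketched in a few lines. This is a hedged illustration, not the actual `aggregate_memory_diff_metrics()` implementation; the column names `"Base BytesAllocated"`/`"Diff BytesAllocated"` are assumed from the SPMI details CSV's `Base X` / `Diff X` naming convention.

```python
import csv
import math

def p90_allocation_ratio(details_csv_path):
    """Compute the 90th-percentile per-context BytesAllocated ratio (diff/base).

    Sketch only: column names are assumed from the SPMI details CSV
    convention ("Base <metric>" / "Diff <metric>" pairs).
    """
    ratios = []
    with open(details_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            base = float(row["Base BytesAllocated"])
            diff = float(row["Diff BytesAllocated"])
            if base > 0:  # skip contexts with no base allocation
                ratios.append(diff / base)
    if not ratios:
        return None
    ratios.sort()
    # Nearest-rank P90: the smallest ratio >= 90% of contexts.
    idx = max(0, math.ceil(0.9 * len(ratios)) - 1)
    return ratios[idx]
```

A ratio above 1.0 at P90 means at least 10% of contexts allocate more under the diff JIT.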
Pull request overview
Adds a new “memorydiff” mode to the CoreCLR SuperPMI diff infrastructure, enabling CI to report JIT memory allocation deltas alongside existing asm diffs and throughput diffs.
Changes:
- Extend the SuperPMI scripts (`superpmi.py`, `superpmi_diffs*.py`) to run and summarize a new `memorydiff` diff type.
- Update JIT metrics reporting so `BytesAllocated` is available (and reported) for memory-diffing scenarios.
- Wire `memorydiff` through the Helix project and Azure Pipelines templates and produce consolidated markdown reports.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/scripts/superpmi_diffs_summarize.py | Recognize and consolidate memorydiff summaries into the overall markdown report. |
| src/coreclr/scripts/superpmi_diffs_setup.py | Allow memorydiff as a diff type and ensure release assets are required for it. |
| src/coreclr/scripts/superpmi_diffs.py | Add memorydiff execution path and capture its JSON summary output. |
| src/coreclr/scripts/superpmi.py | Add memorydiff subcommand, implement replay+aggregation logic, and add JSON summarization support. |
| src/coreclr/scripts/superpmi-diffs.proj | Download Helix result artifacts for memorydiff runs. |
| src/coreclr/jit/jitmetadata.cpp | Make JitMetrics::report() available outside DEBUG to support metric reporting in release builds. |
| src/coreclr/jit/fginline.cpp | Always merge inlinee metrics into root metrics (not DEBUG-only). |
| src/coreclr/jit/compiler.cpp | Ensure BytesAllocated is set and Metrics.report() is called (outside DEBUG) so the metric is surfaced. |
| eng/pipelines/coreclr/templates/superpmi-diffs-job.yml | Add memorydiff handling to job dependencies/artifact usage. |
| eng/pipelines/coreclr/templates/run-superpmi-diffs-job.yml | Pass correct setup args for memorydiff jobs. |
| eng/pipelines/coreclr/superpmi-diffs.yml | Add CI matrix entries to run memorydiff alongside existing diff types. |
```python
# This is the summary file name and location written by superpmi.py. If the file
# exists, remove it to ensure superpmi.py doesn't create a numbered version.
overall_json_memorydiff_summary_file = os.path.join(self.spmi_location, "memorydiff_summary.json")
if os.path.isfile(overall_json_memorydiff_summary_file):
    os.remove(overall_json_memorydiff_summary_file)

overall_json_memorydiff_summary_file_target = os.path.join(self.log_directory, "superpmi_memorydiff_summary_{}.json".format(self.target))
self.summary_json_files.append((overall_json_memorydiff_summary_file, overall_json_memorydiff_summary_file_target))
```
Diff.summarize() writes a JSON placeholder string when the expected summary JSON doesn’t exist. For the new memorydiff flow, superpmi_diffs_summarize.py will pass these files to superpmi.py summarize, which expects a specific JSON tuple shape for memorydiff and will throw if it receives a placeholder string. Consider updating the memorydiff path to ensure a valid (possibly empty) memorydiff_summary.json is always produced, or adjust the placeholder generation to emit the correct JSON structure for memorydiff.
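One way to address this review comment is to treat any non-list payload as "no data" when loading a summary file. This is a minimal sketch of the guard, assuming the summarizer expects a list of per-collection entries; the function name and payload shape are illustrative, not the actual superpmi.py code.

```python
import json

def load_metricdiff_summary(path):
    """Return the parsed summary list, or None for a missing/placeholder file.

    Diff.summarize() may write a bare placeholder string instead of the
    JSON list/tuple shape the metricdiff summarizer expects; treating any
    non-list payload as "no data" avoids crashing on tuple unpacking.
    """
    try:
        with open(path) as f:
            payload = json.load(f)
    except (OSError, json.JSONDecodeError):
        return None
    # A real summary is a list of per-collection entries; a placeholder
    # string (or any other shape) is reported as absent.
    return payload if isinstance(payload, list) else None
```

Callers can then simply skip `None` results instead of special-casing placeholder text.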
Extracts JIT changes from #124943 into a standalone PR (per <a href="https://github.com/dotnet/runtime/pull/124943#discussion_r2863316022">reviewer request</a>) to prepare jitrollingbuilds with the new functionality first.

## Description

### JIT: `DOTNET_JitReportMetrics` config

`Metrics.report(this)` was called unconditionally in every DEBUG compilation, making O(metrics) JIT-EE calls per method even when no consumer needed them. Adds `DOTNET_JitReportMetrics` (default `0`) to gate reporting uniformly across all build configurations; SuperPMI special-cases this config to always return `1`.

- **`jitconfigvalues.h`** — adds `RELEASE_CONFIG_INTEGER(JitReportMetrics, "JitReportMetrics", 0)`
- **`jitmetadata.cpp`** — moves `JitMetadata::report`, `reportValue`, and `JitMetrics::report` out of `#ifdef DEBUG` so they compile in release builds (required since SuperPMI `memorydiff` uses release JIT binaries); `dump`/`mergeToRoot` remain debug-only
- **`compiler.cpp`**
  - Gates `Metrics.BytesAllocated` computation behind the `JitReportMetrics` check (avoids arena page-walk overhead when metrics aren't consumed)
  - Gates `Metrics.report(this)` behind `JitConfig.JitReportMetrics()` — unified across DEBUG and release builds (no longer unconditional in DEBUG):

    ```cpp
    if (JitConfig.JitReportMetrics())
    {
        Metrics.report(this);
    }
    ```

- **`superpmi/jithost.cpp`** — special-cases `JitReportMetrics` in `JitHost::getIntConfigValue` to always return `1`, next to the existing `SuperPMIMethodContextNumber` special case
- **`superpmi-shim-collector/jithost.cpp`** — excludes `JitReportMetrics` from collection (same pattern as `SuperPMIMethodContextNumber`) so it is not recorded into method contexts

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: EgorBo <[email protected]>
Co-authored-by: Egor Bogatov <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: jakobbotsch <[email protected]>
Couple of comments/questions:
Is there a source for variance? I'd expect it to be fully deterministic. The most recent report is no-diff: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1314409&view=ms.vss-build-web.run-extensions-tab
…issues

- Rename memorydiff to metricdiff across all files per Jakob's suggestion to support diffing more metrics in the future
- Guard the print_superpmi_success_result call to only run when metrics are available (fixes a crash when the details CSV is missing)
- Handle placeholder JSON files in summarize_json_summaries to avoid tuple unpacking errors on empty/placeholder summaries
- Update the comment in superpmi_diffs_summarize.py to include the metricdiff filename format

Co-authored-by: Copilot <[email protected]>
Is it working? I see nothing under "Details", and looking at one console log shows: (It's possible the

One source of variance is that we use hash maps where keys are memory addresses, so those will change from run to run and can change how the keys get bucketed. But if it turns out not to be a problem then that's great.
Move Metrics.report() call outside of #ifdef DEBUG block so JIT metrics (BytesAllocated, PerfScore, etc.) are reported in Release builds. The merge of PR dotnet#124975 accidentally placed the call inside an existing #ifdef DEBUG region. Co-authored-by: Copilot <[email protected]>
Remove P90 ratio and Overall/MinOpts/FullOpts pivots. Instead, track multiple metrics per collection: BytesAllocated, StackAllocatedRefClasses, StackAllocatedBoxedValueClasses, StackAllocatedArrays, LoopsCloned, CseCount. Each metric with a diff gets its own collapsible section in the markdown summary with Base/Diff/PDIFF columns. Co-authored-by: Copilot <[email protected]>
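The per-metric collapsible sections described above can be sketched as follows. This is an illustration of the markdown shape (Base/Diff/PDIFF columns inside a `<details>` block), not the actual `write_metricdiff_markdown_summary()` code; the function name and row layout are assumptions.

```python
def write_metric_section(out, metric, rows):
    """Append one collapsible markdown section for a single metric.

    `rows` is a list of (collection, base, diff) tuples; a sketch of the
    report shape described above, not the superpmi.py implementation.
    """
    out.append("<details><summary>{}</summary>\n".format(metric))
    out.append("|Collection|Base|Diff|PDIFF|")
    out.append("|---|---|---|---|")
    for name, base, diff in rows:
        if base == 0:
            pdiff = "N/A"  # relative change is undefined for a zero base
        else:
            pdiff = "{:+.2f}%".format((diff - base) / base * 100)
        out.append("|{}|{}|{}|{}|".format(name, base, diff, pdiff))
    out.append("\n</details>")
```

Collapsing each metric keeps the summary readable when many metrics diff at once.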
The long summary now always shows a collapsible Details table with Base/Diff/PDIFF for all metrics across all collections, even when there are no significant diffs. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Metrics that already appear in the diff summary section are not repeated in the details section below. Co-authored-by: Copilot <[email protected]>
Remove hardcoded METRIC_DIFF_METRICS list. Instead, discover all metric columns dynamically from the CSV header (Base X / Diff X pairs). This automatically picks up any new metrics added to jitmetadatalist.h. Float metrics like PerfScore are handled correctly. Co-authored-by: Copilot <[email protected]>
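The dynamic discovery described above amounts to intersecting the `Base X` and `Diff X` column names in the CSV header. A minimal sketch, assuming the SPMI details CSV's column-naming convention:

```python
def discover_metric_columns(header):
    """Return metric names that appear as both 'Base X' and 'Diff X' columns.

    Sketch of the dynamic-discovery approach: any metric added to
    jitmetadatalist.h shows up as a new Base/Diff column pair and is
    picked up automatically.
    """
    bases = {h[len("Base "):] for h in header if h.startswith("Base ")}
    diffs = {h[len("Diff "):] for h in header if h.startswith("Diff ")}
    return sorted(bases & diffs)
```

Columns that appear on only one side (or non-metric columns like the context number) are ignored.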
- Track Successful/Missing/Failing compiles separately for base and diff sides (fixes incorrect counts when results differ between JITs)
- Rename remaining 'memory diff' references to 'metric diff' in logs, comments, and method names for consistency with the metricdiff command

Co-authored-by: Copilot <[email protected]>
… diffs

- Force JitReportMetrics=0 for tpdiff and asmdiffs to avoid throughput measurement noise from metric reporting
- Show +infinity (red) in the PDIFF column when base is 0 and diff > 0, instead of a misleading 0.00%

Co-authored-by: Copilot <[email protected]>
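The second point boils down to a small formatting rule for the PDIFF column. A hedged sketch of that behavior (function name is illustrative):

```python
def format_pdiff(base, diff):
    """Relative difference string for the PDIFF column.

    When base is 0 and diff > 0 the true relative change is unbounded,
    so report "+∞" instead of a misleading 0.00% (the behavior described
    in the commit message above).
    """
    if base == 0:
        return "+∞" if diff > 0 else "0.00%"
    return "{:+.2f}%".format((diff - base) / base * 100)
```

The explicit sign makes regressions and improvements distinguishable at a glance.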
```python
memory_diffs = []
```
The variable name memory_diffs suggests this mode is allocations-only, but the code aggregates arbitrary metrics discovered from the details CSV. Consider renaming it (and related tuple element names) to metric_diffs/metrics_diffs to better match what’s being tracked.
Co-authored-by: Copilot <[email protected]>
```python
The object is responsible for replaying the mch files given to the
instance of the class and measuring JIT memory allocation differences.
```
The new metricdiff replay class docstring still says it measures “JIT memory allocation differences”, but the implementation discovers and aggregates all metrics present in the -details CSV (not just allocations). Updating the docstring will help avoid confusion when interpreting metricdiff output and when extending the feature in the future.
```suggestion
The object is responsible for replaying the MCH files given to the
instance of the class and discovering, aggregating, and comparing
all metrics reported in the SuperPMI -details CSV output.
```
```python
""" Replay SuperPMI collections measuring memory allocation differences.

Returns:
    (bool) True on success; False otherwise
"""
```
This method docstring says metricdiff is “measuring memory allocation differences”, but the code aggregates all metrics found in the -details CSV. Please update the wording so it matches what the mode actually reports.

Example: