[cDAC] Convert cDAC dump tests to run on Helix #124782
max-charlamb wants to merge 4 commits into dotnet:main
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Pull request overview
This PR converts the cDAC dump tests from running as ADO pipeline stages to running on Helix infrastructure. The change simplifies the pipeline by replacing separate DumpCreation and DumpTest stages with a single CdacDumpTests stage that generates dumps and runs tests on Helix machines.
Changes:
- Adds Helix infrastructure for remote test execution
- Enables windows_arm64 platform support
- Consolidates dump generation and testing into a single Helix-based stage
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cdac-dump-helix.proj | New Helix SDK project that orchestrates dump generation and test execution on Helix; defines commands, queues, and work items |
| Microsoft.Diagnostics.DataContractReader.DumpTests.csproj | Adds PrepareHelixPayload target to stage test artifacts and debuggees; adds XUnitConsoleRunner package reference |
| DumpTests.targets | Adds BuildDebuggeesOnly target for building debuggees without generating dumps locally |
| runtime-diagnostics.yml | Replaces two-stage ADO workflow (DumpCreation + DumpTest) with single CdacDumpTests stage using Helix |
<_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
<_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
The exact equality conditions won't handle debuggees with DumpTypes="Heap;Full" (which is documented as supported in Debuggees/Directory.Build.props line 14). A debuggee with semicolon-separated dump types would be excluded from both _HeapDebuggee and _FullDebuggee lists.
To fix this, the DumpTypes metadata needs to be split into individual items first before filtering. A common MSBuild pattern for this is:
- Create an intermediate ItemGroup that transforms each debuggee×dumptype combination into separate items
- Then filter those items by dump type name
Alternatively, use Contains checks, but ensure they don't have false positives (e.g., check for the value equaling the type or containing it surrounded by semicolons).
Suggested change:
- <_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
- <_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
+ <_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap' Or $([System.String]::Copy(';%(DumpTypes);').Contains(';Heap;'))" DumpDir="heap" MiniDumpType="2" />
+ <_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full' Or $([System.String]::Copy(';%(DumpTypes);').Contains(';Full;'))" DumpDir="full" MiniDumpType="4" />
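The false-positive concern with a bare Contains check can be illustrated outside MSBuild. The following is a minimal shell sketch, not code from this PR: `has_dump_type` is a hypothetical helper, and `FullHeap` is a made-up value used only to show where a plain substring test would go wrong.

```shell
#!/bin/sh
# has_dump_type LIST TYPE (hypothetical helper, not part of the PR)
# Succeeds only when TYPE is a whole entry of the semicolon-separated LIST.
# Wrapping both sides in ';' is the same idea as the ';%(DumpTypes);' check.
has_dump_type() {
    case ";$1;" in
        *";$2;"*) return 0 ;;
        *) return 1 ;;
    esac
}

has_dump_type "Heap;Full" "Heap" && echo "Heap;Full requests a heap dump"
has_dump_type "Heap;Full" "Full" && echo "Heap;Full requests a full dump"
# A bare substring test would match here; the delimiter-wrapped test does not.
has_dump_type "FullHeap" "Heap" || echo "FullHeap does not request a heap dump"
```

With the delimiters in place, the separate exact-equality clause is redundant, since `;Heap;` already contains `;Heap;`.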
-bl:$(Build.SourcesDirectory)/artifacts/log/DumpTestPayload.binlog
displayName: 'Prepare Helix Payload'
- powershell: |
$testhostDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*" | Select-Object -First 1 -ExpandProperty FullName
The PowerShell script uses a wildcard pattern net* to find the testhost directory and selects the first match. This could be fragile if multiple testhost directories exist (e.g., from previous builds or different configurations). Consider using a more specific pattern that includes the target architecture and OS, similar to how DumpTests.targets constructs the TestHostDir path (line 44): $(NetCoreAppCurrent)-$(HostOS)-$(_TestHostConfig)-$(_HostArch).
Suggested change:
- $testhostDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*" | Select-Object -First 1 -ExpandProperty FullName
+ $testhostDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*-$(osGroup)-$(_BuildConfig)-$(archType)" | Select-Object -First 1 -ExpandProperty FullName
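An alternative to narrowing the pattern is to refuse to continue when the match is ambiguous. The sketch below shows that idea in POSIX shell rather than PowerShell; `pick_single_dir` is a hypothetical helper and the directory names are invented for the demonstration, so treat it as an illustration of the check, not as the pipeline's script.

```shell
#!/bin/sh
# pick_single_dir PARENT PATTERN (hypothetical helper, not part of the PR)
# Prints the one directory under PARENT whose name matches PATTERN and fails
# when there are zero or several matches, instead of silently taking the first.
pick_single_dir() {
    count=0
    match=""
    for dir in "$1"/$2; do
        [ -d "$dir" ] || continue
        count=$((count + 1))
        match=$dir
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one match for $2, found $count" >&2
        return 1
    fi
    printf '%s\n' "$match"
}

# Demonstration with made-up directory names in a scratch folder.
root=$(mktemp -d)
mkdir "$root/netX-linux-Release-x64" "$root/netX-linux-Debug-x64"
pick_single_dir "$root" 'net*' || echo "broad pattern is ambiguous"
pick_single_dir "$root" 'net*-linux-Release-x64'
rm -rf "$root"
```

Failing loudly on an ambiguous match trades convenience for a clearer error when stale testhost folders are left over from earlier builds.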
<!-- Map TargetOS + TargetArchitecture to Helix queues -->
<PropertyGroup Condition="'$(HelixTargetQueues)' == ''">
<HelixTargetQueues Condition="'$(TargetOS)' == 'windows' AND '$(TargetArchitecture)' == 'x64'">Windows.11.Amd64.Client.Open</HelixTargetQueues>
Can we keep all Helix queue references in the yaml files under eng? The Helix queue names get updated regularly, and having them spread throughout the repo will make those updates harder.
Force-pushed from d82438f to 3f3aefe.
Force-pushed from 3f3aefe to d1ce702.
<_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
<_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
<_AllDebuggeeMetadata Include="@(_HeapDebuggee);@(_FullDebuggee)" />
</ItemGroup>
<ItemGroup>
<_MetadataLines Include="<Project>" />
<_MetadataLines Include=" <ItemGroup>" />
<_MetadataLines Include="@(_AllDebuggeeMetadata->' <_Debuggee Include="%(Identity)" DumpDir="%(DumpDir)" MiniDumpType="%(MiniDumpType)" />')" />
The condition logic doesn't correctly handle debuggees with semicolon-separated DumpTypes (e.g., "Heap;Full"). The string comparison '%(DumpTypes)' == 'Heap' will fail when DumpTypes contains "Heap;Full", causing those debuggees to be excluded from both _HeapDebuggee and _FullDebuggee.
To fix this, you need to split the DumpTypes metadata first, similar to how DumpTests.targets handles this in the _GenerateDumpsForDebuggee target (lines 105-108). One approach would be to create separate items for each dump type before filtering:
<ItemGroup>
<_DebuggeeTypeExpanded Include="@(_DebuggeeWithTypes->Metadata('DumpTypes'))" DebuggeeNameoriginal="%(Identity)" />
</ItemGroup>
Then filter based on the expanded items. Alternatively, use the MSBuild Contains function if available, though note that MSBuild item metadata conditions don't support the Contains function directly in this context.
Suggested change:
- <_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
- <_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
- <_AllDebuggeeMetadata Include="@(_HeapDebuggee);@(_FullDebuggee)" />
- </ItemGroup>
- <ItemGroup>
- <_MetadataLines Include="<Project>" />
- <_MetadataLines Include=" <ItemGroup>" />
- <_MetadataLines Include="@(_AllDebuggeeMetadata->' <_Debuggee Include="%(Identity)" DumpDir="%(DumpDir)" MiniDumpType="%(MiniDumpType)" />')" />
+ <_DebuggeeTypeExpanded Include="@(_DebuggeeWithTypes->'%(DumpTypes)')" DebuggeeName="%(_DebuggeeWithTypes.Identity)" />
+ </ItemGroup>
+ <ItemGroup>
+ <_HeapDebuggee Include="@(_DebuggeeTypeExpanded)" Condition="'%(Identity)' == 'Heap'">
+ <DebuggeeName>%(DebuggeeName)</DebuggeeName>
+ <DumpDir>heap</DumpDir>
+ <MiniDumpType>2</MiniDumpType>
+ </_HeapDebuggee>
+ <_FullDebuggee Include="@(_DebuggeeTypeExpanded)" Condition="'%(Identity)' == 'Full'">
+ <DebuggeeName>%(DebuggeeName)</DebuggeeName>
+ <DumpDir>full</DumpDir>
+ <MiniDumpType>4</MiniDumpType>
+ </_FullDebuggee>
+ <_AllDebuggeeMetadata Include="@(_HeapDebuggee);@(_FullDebuggee)" />
+ </ItemGroup>
+ <ItemGroup>
+ <_MetadataLines Include="<Project>" />
+ <_MetadataLines Include=" <ItemGroup>" />
+ <_MetadataLines Include="@(_AllDebuggeeMetadata->' <_Debuggee Include="%(DebuggeeName)" DumpDir="%(DumpDir)" MiniDumpType="%(MiniDumpType)" />')" />
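The intended result of the expansion, one entry per debuggee and dump type, can be sketched in a few lines of shell. This only illustrates the target behavior and is not the MSBuild implementation: `expand_dump_types` and the debuggee name `SampleDebuggee` are hypothetical, while the Heap to heap/2 and Full to full/4 mapping is taken from the snippet above.

```shell
#!/bin/sh
# expand_dump_types NAME TYPES (hypothetical helper, not part of the PR)
# Prints one "NAME DumpDir MiniDumpType" row per entry in the
# semicolon-separated TYPES list. The body runs in a subshell so the
# IFS and globbing changes do not leak into the caller.
expand_dump_types() (
    IFS=';'
    set -f  # no filename expansion while splitting
    for dump_type in $2; do
        case "$dump_type" in
            Heap) echo "$1 heap 2" ;;
            Full) echo "$1 full 4" ;;
            *) echo "unknown dump type: $dump_type" >&2 ;;
        esac
    done
)

expand_dump_types "SampleDebuggee" "Heap;Full"
# prints:
# SampleDebuggee heap 2
# SampleDebuggee full 4
```

A debuggee that declares a single type produces a single row, so the same code path covers `Heap`, `Full`, and `Heap;Full`.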
Force-pushed from 3f93840 to 569cce9.
<!-- Pre-commands: enable dump generation and set dump root for tests -->
<ItemGroup Condition="'$(TargetOS)' == 'windows'">
<!-- Allow heap dump generation with the unsigned locally-built DAC -->
<HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f" />
The Windows Helix pre-command that sets DisableAuxProviderSignatureCheck can fail on machines without admin rights. Unlike the existing MSBuild target in DumpTests.targets (which is opt-in and ignores failures), Helix will treat a failing pre-command as a work item failure. Consider making this opt-in (pipeline-controlled) and/or ensuring the command cannot fail the work item (e.g., ignore exit code and emit a diagnostic message instead).
Suggested change:
- <HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f" />
+ <HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f || echo Failed to set DisableAuxProviderSignatureCheck (non-fatal)" />
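The `|| echo` idiom in the suggestion turns a failing command into a logged warning. The same best-effort pattern is sketched here in POSIX shell rather than cmd; `run_best_effort` is a hypothetical wrapper, and `sh -c 'exit 5'` merely stands in for a registry write that is denied.

```shell
#!/bin/sh
# run_best_effort DESCRIPTION COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND, logs a warning if it fails, and always returns success so
# that an optional setup step cannot fail the surrounding work item.
run_best_effort() {
    description=$1
    shift
    "$@" && return 0
    status=$?
    echo "warning: $description failed with exit code $status (ignored)" >&2
    return 0
}

# 'sh -c "exit 5"' simulates a setup command that is not permitted to run.
run_best_effort "registry write" sh -c 'exit 5'
echo "work item continues"
```

Logging the original exit code keeps the failure visible in the work item output even though it no longer affects the result.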
jobTemplate: /eng/pipelines/common/global-build-job.yml
buildConfig: release
platforms: ${{ parameters.cdacDumpPlatforms }}
shouldContinueOnError: true
shouldContinueOnError: true for the whole CdacDumpTests platform matrix means failures in the build/Helix submission steps can be downgraded to SucceededWithIssues instead of failing the job. This risks the pipeline not gating on dump test failures (unlike the previous DumpTest stage which explicitly failed when tests had issues). Consider removing shouldContinueOnError, or adding a final explicit failure step when the job status is SucceededWithIssues so failures still block the stage as intended.
Suggested change:
- shouldContinueOnError: true
Replace the DumpCreation + DumpTest ADO stages in runtime-diagnostics.yml with a single CdacDumpTests stage that builds debuggees on ADO and sends dump generation + test execution to Helix machines.

Flow: ADO builds runtime + debuggees, prepares a Helix payload containing test DLLs, debuggee binaries, and auto-generated dump type metadata. On Helix, each debuggee is run to produce a crash dump, then xunit tests validate the dumps.

Changes:
- Add BuildDebuggeesOnly target to DumpTests.targets using Exec with dotnet build (ensures implicit NuGet restore, matching _GenerateLocalDump)
- Add PrepareHelixPayload target + XUnitConsoleRunner package to csproj; copies tests + debuggees, generates debuggee-metadata.props from GetDumpTypes for dynamic debuggee discovery
- Create cdac-dump-helix.proj Helix SDK project that imports the generated metadata, builds per-OS dump generation + xunit test execution commands
- Update runtime-diagnostics.yml with single CdacDumpTests stage
- Add windows_arm64 platform support

Co-authored-by: Copilot <[email protected]>
Force-pushed from 569cce9 to 3d682f9.
<PropertyGroup Condition="'$(TargetOS)' == 'windows'">
<_TestCommands>@(_SourcePlatform->'set "CDAC_DUMP_ROOT=%HELIX_WORKITEM_PAYLOAD%\dumps\%(Identity)" & %HELIX_CORRELATION_PAYLOAD%\dotnet.exe exec --runtimeconfig %HELIX_WORKITEM_PAYLOAD%\tests\Microsoft.Diagnostics.DataContractReader.DumpTests.runtimeconfig.json --depsfile %HELIX_WORKITEM_PAYLOAD%\tests\Microsoft.Diagnostics.DataContractReader.DumpTests.deps.json %HELIX_WORKITEM_PAYLOAD%\tests\xunit.console.dll %HELIX_WORKITEM_PAYLOAD%\tests\Microsoft.Diagnostics.DataContractReader.DumpTests.dll -xml testResults_%(Identity).xml -nologo', ' & ')</_TestCommands>
</PropertyGroup>
<PropertyGroup Condition="'$(TargetOS)' != 'windows'">
<_TestCommands>@(_SourcePlatform->'CDAC_DUMP_ROOT=$HELIX_WORKITEM_PAYLOAD/dumps/%(Identity) $HELIX_CORRELATION_PAYLOAD/dotnet exec --runtimeconfig $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.DumpTests.runtimeconfig.json --depsfile $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.DumpTests.deps.json $HELIX_WORKITEM_PAYLOAD/tests/xunit.console.dll $HELIX_WORKITEM_PAYLOAD/tests/Microsoft.Diagnostics.DataContractReader.DumpTests.dll -xml testResults_%(Identity).xml -nologo', ' %3B ')</_TestCommands>
</PropertyGroup>
The x-plat test work item chains multiple xunit runs with & (Windows) / ; (Unix). This means the overall work item exit code will be the last xunit invocation’s exit code, so failures from earlier source platforms can be masked if a later run passes. If this is intended to be caught via Helix's xUnit reporter, ensure the reporter is guaranteed to ingest all result files; otherwise aggregate failures explicitly (e.g., track an error flag per run and return non-zero at the end) or split into multiple Helix work items (one per source platform).
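The masking effect and the "track an error flag per run" remedy can be reproduced with two stand-in commands. This is a minimal shell sketch under stated assumptions: `run_all` is a hypothetical helper, and `false` and `true` stand in for a failing and a passing xunit invocation.

```shell
#!/bin/sh
# run_all COMMAND... (hypothetical helper, not part of the PR)
# Runs every command string, remembers whether any of them failed, and
# returns non-zero at the end so that an early failure is not hidden by a
# later success.
run_all() {
    failed=0
    for cmd in "$@"; do
        sh -c "$cmd" || failed=1
    done
    return "$failed"
}

# Chaining with ';' reports only the last command's exit code.
sh -c 'false; true' && echo "chained: reported success despite a failure"
# Aggregating reports failure if any run failed.
run_all 'false' 'true' || echo "aggregated: reported failure"
```

Every command still runs, so all result files are produced, but the final exit code now reflects the worst outcome rather than the last one.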
<!-- Pre-commands: enable dump generation -->
<ItemGroup Condition="'$(TargetOS)' == 'windows'">
<!-- Allow heap dump generation with the unsigned locally-built DAC -->
<HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f" />
The Helix pre-command that writes DisableAuxProviderSignatureCheck uses reg add ... without ignoring failures. Since this is a machine-wide HKLM write that may require elevation, a failure here can abort dump generation and prevent any dumps from being uploaded. Make this step best-effort (ignore errors) or otherwise ensure it won't fail the work item when the registry write isn't permitted.
Suggested change:
- <HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f" />
+ <HelixPreCommand Include="reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpSettings" /v DisableAuxProviderSignatureCheck /t REG_DWORD /d 1 /f || echo DisableAuxProviderSignatureCheck registry write failed (ignored)" />
jobTemplate: /eng/pipelines/common/global-build-job.yml
buildConfig: release
platforms: ${{ parameters.cdacDumpPlatforms }}
shouldContinueOnError: true
These new Helix-based stages set shouldContinueOnError: true, which makes the send-to-helix-inner-step.yml invocation run with continueOnError: true. Unlike the previous ADO-based DumpTest flow, there’s no final “fail if tests failed” step, so Helix failures can leave the job/stage as SucceededWithIssues instead of failing the pipeline. If dump test failures are meant to be gating, consider removing shouldContinueOnError: true here or reintroducing an explicit failure step when Agent.JobStatus == SucceededWithIssues.
Suggested change:
- shouldContinueOnError: true
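If `shouldContinueOnError` stays, the explicit failure step mentioned above could be a short script placed at the end of the job. The sketch below assumes the script runs as an Azure Pipelines step, where the predefined variable Agent.JobStatus is exposed to scripts as the `AGENT_JOBSTATUS` environment variable; `fail_on_issues` is a hypothetical helper, and the fallback to `Succeeded` exists only so the sketch also runs outside a pipeline.

```shell
#!/bin/sh
# fail_on_issues STATUS (hypothetical helper, not part of the PR)
# Treats every job status other than "Succeeded" as a failure, including
# "SucceededWithIssues", which is what continueOnError produces.
fail_on_issues() {
    case "$1" in
        Succeeded) return 0 ;;
        *)
            echo "job status is '$1'; failing the job" >&2
            return 1
            ;;
    esac
}

# AGENT_JOBSTATUS is set by the pipeline agent; default only for local runs.
fail_on_issues "${AGENT_JOBSTATUS:-Succeeded}" || echo "this step would fail the job"
fail_on_issues "SucceededWithIssues" || echo "SucceededWithIssues is treated as a failure"
```

In a real step the script would exit with the function's status instead of echoing, so the job ends as failed rather than SucceededWithIssues.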
<!-- Copy xunit console runner files to tests directory -->
<ItemGroup>
<_XunitConsoleFiles Include="$([System.IO.Path]::GetDirectoryName('$(XunitConsoleNetCoreAppPath)'))\*" />
PrepareHelixPayload copies xunit console runner files using a hardcoded \* path separator (...GetDirectoryName('$(XunitConsoleNetCoreAppPath)'))\*). When this target runs on Linux/macOS build agents, the backslash can produce an invalid glob and skip copying the runner. Use NormalizePath/NormalizeDirectory (or $(MSBuildThisFileDirectory)-style joins) so the include pattern is OS-agnostic.
Suggested change:
- <_XunitConsoleFiles Include="$([System.IO.Path]::GetDirectoryName('$(XunitConsoleNetCoreAppPath)'))\*" />
+ <_XunitConsoleFiles Include="$([MSBuild]::NormalizeDirectory($([System.IO.Path]::GetDirectoryName('$(XunitConsoleNetCoreAppPath)'))))*" />
<_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
<_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
The generated debuggee-metadata.props only emits metadata when %(DumpTypes) is exactly Heap or exactly Full. However DumpTypes is documented/used elsewhere as supporting Heap;Full (generate both). If any debuggee sets Heap;Full, it will be omitted from the Helix metadata and its dumps/tests won't run. Consider splitting %(DumpTypes) on ; and emitting one _Debuggee item per dump type.
Suggested change:
- <_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Heap'" DumpDir="heap" MiniDumpType="2" />
- <_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="'%(DumpTypes)' == 'Full'" DumpDir="full" MiniDumpType="4" />
+ <_HeapDebuggee Include="@(_DebuggeeWithTypes)" Condition="$([System.String]::Copy('%(DumpTypes)').Contains('Heap'))" DumpDir="heap" MiniDumpType="2" />
+ <_FullDebuggee Include="@(_DebuggeeWithTypes)" Condition="$([System.String]::Copy('%(DumpTypes)').Contains('Full'))" DumpDir="full" MiniDumpType="4" />
Replace the DumpCreation + DumpTest ADO stages in runtime-diagnostics.yml with a single CdacDumpTests stage that builds, prepares a Helix payload, and sends dump generation + test execution to Helix machines.
Changes: