dotnetup retries to download manifest content - flaky test fixes#53102
dotnetup retries to download manifest content - flaky test fixes#53102nagilson wants to merge 7 commits intodotnet:release/dnupfrom
dotnetup retries to download manifest content - flaky test fixes#53102Conversation
also trying to run CI to see if network issues are reoccuring
dotnetrelease library doesn't allow custom http clients - does it not respect proxy?
DOTNET_TESTHOOK_DEFAULT_INSTALL_PATH is no longer used, we pass the manifest path into the app args. Also, this sets the cwd for all of the process, not the sub dotnetup process. This was a poor choice and prone to cause race conditions on CWD setting / unsetting for concurrent test
dotnetup retries to download manifest content - flaky test fixe s
dotnetup retries to download manifest content - flaky test fixe sdotnetup retries to download manifest content - flaky test fixes
There was a problem hiding this comment.
Pull request overview
This pull request addresses flaky test issues in dotnetup by adding retry logic for network operations and improving test isolation. The changes aim to make CI builds more resilient to transient network failures when downloading .NET release manifests.
Changes:
- Added retry logic with linear backoff to
ReleaseManifest.GetReleasesIndex()to handle transient network failures when fetching the releases index - Simplified
TestEnvironmentclass by removing process-wide state mutations (environment variables and current directory changes), relying instead on explicit path parameters for proper test isolation - Configured CI pipeline with permissive network isolation policy to allow external network access needed for downloading release manifests
- Added documentation guideline about using the existing
dotnetuplabel for issues
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Installer/Microsoft.Dotnet.Installation/Internal/ReleaseManifest.cs | Adds retry logic with linear backoff for ProductCollection.GetAsync() calls to handle transient HttpRequestException failures |
| test/dotnetup.Tests/Utilities/TestEnvironment.cs | Removes process-wide state mutations (CWD and environment variables) to improve test isolation and parallelization safety |
| .vsts-dnup-ci.yml | Adds permissive network isolation policy to allow external network access and removes trailing whitespace |
| .github/copilot-instructions.md | Adds guideline to use existing dotnetup label rather than creating new label variants |
| catch (HttpRequestException ex) when (attempt < MaxRetryCount) | ||
| { | ||
| lastException = ex; | ||
| Thread.Sleep(RetryDelayMilliseconds * attempt); // Linear backoff | ||
| } | ||
| } | ||
|
|
||
| throw new HttpRequestException( | ||
| $"Failed to fetch the releases index after {MaxRetryCount} attempts.", lastException); |
There was a problem hiding this comment.
The retry logic has a flaw: when the final attempt (attempt 3) throws an HttpRequestException, the catch clause won't match because the condition is attempt < MaxRetryCount (3 < 3 is false). This means the exception will propagate without being wrapped in the informative error message with retry count.
To fix this, either:
- Change the catch condition to not include the attempt check, and instead catch the exception on all attempts, storing it in lastException each time, then check if we should retry after the catch block
- Add a separate catch block without the when clause outside the loop to handle the final attempt's exception
Option 1 is cleaner and matches the pattern used in other retry implementations in the codebase.
…com/nagilson/sdk into nagilson-dnup-release-network-retry
also trying to run CI to see if network issues are reoccuring