Copilot AI commented Dec 12, 2025

Fix retry logic in DownloadWithRetries function

Issue Analysis

The DownloadWithRetries function in eng/download-source-built-archive.sh has a bug where it doesn't actually retry downloads for most error conditions. When curl fails with errors like exit code 92 (HTTP/2 stream error), the function immediately returns instead of retrying, leading to intermittent build failures.

Root Cause

Before the fix, every branch of the case statement returned immediately, so the surrounding retry loop never ran a second attempt:

  • Exit code 22 (HTTP error, including 404): return 22 immediately ✓
  • All other codes: return 1 immediately ✗ BUG!
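
The effect of that default branch can be reproduced without any network access. The sketch below is a stand-in, not the code from eng/download-source-built-archive.sh: `flaky_download` and `buggy_retry_sketch` are hypothetical names, and the five-iteration loop is assumed from the "up to 5 attempts" mentioned in this description.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a curl call that always fails with exit code 92.
attempts=0
flaky_download() {
  attempts=$((attempts + 1))
  return 92
}

# Assumed shape of the pre-fix logic: a retry loop whose case statement
# returns from the function on every non-zero exit code.
buggy_retry_sketch() {
  local attempt exitCode
  for attempt in 1 2 3 4 5; do
    exitCode=0
    flaky_download || exitCode=$?
    case $exitCode in
      0) return 0 ;;
      22) return 22 ;;
      *) return 1 ;;  # leaves the function, so attempt 2 is never reached
    esac
  done
  return 1
}

rc=0
buggy_retry_sketch || rc=$?
echo "attempts=$attempts rc=$rc"  # prints "attempts=1 rc=1"
```

Only one attempt is made even though the loop is written for five, which matches the single curl failure in the reported log.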

Solution Implemented

Fixed the case statement by changing the default case to sleep and continue the loop instead of returning:

case $exitCode in
  22)
    # HTTP error (including 404) - don't retry
    return 22
    ;;
  *)
    # For all other errors (including partial transfers), sleep and retry
    sleep 3
    ;;
esac

This ensures:

  • Exit code 22 (HTTP error, including 404): Return immediately without retry (correct behavior)
  • All other errors (18, 92, 6, etc.): Sleep 3 seconds and continue the retry loop (up to 5 attempts)
  • Case statement structure preserved for future extensibility
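
Put together with a retry loop, the fixed behavior might look like the following. This is a minimal sketch under stated assumptions, not the actual function: `fetch_once`, `download_with_retries_sketch`, `RETRY_DELAY`, `MAX_ATTEMPTS`, and the curl flags are all illustrative choices, while the 3-second delay and 5-attempt limit are taken from this description.

```shell
#!/usr/bin/env bash
# Sketch only: names and curl flags are assumptions, not the real script.
RETRY_DELAY="${RETRY_DELAY:-3}"    # seconds between attempts (3 in this PR)
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"  # attempt limit (5 in this PR)

# Hypothetical single-attempt download.
# With --fail, curl exits with code 22 on HTTP responses of 400 or above.
fetch_once() {
  local url="$1" outputDir="$2"
  (cd "$outputDir" && curl --fail --location --silent --show-error --remote-name "$url")
}

download_with_retries_sketch() {
  local url="$1" outputDir="$2"
  local attempt exitCode
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    exitCode=0
    fetch_once "$url" "$outputDir" || exitCode=$?
    case $exitCode in
      0)
        return 0
        ;;
      22)
        # HTTP error (including 404) - don't retry
        return 22
        ;;
      *)
        # Any other error: wait, then let the loop try again
        if ((attempt < MAX_ATTEMPTS)); then
          sleep "$RETRY_DELAY"
        fi
        ;;
    esac
  done
  return 1  # every attempt failed with a retryable error
}
```

A caller can keep distinguishing "not found" from "failed after retries" by checking for 22, for example `download_with_retries_sketch "$archiveUrl" "$outputDir" || rc=$?`.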

Impact

  • ✓ Transient network errors (HTTP/2 stream errors, connection failures, etc.) will now be retried up to 5 times
  • ✓ 404 errors still fail immediately (no wasted retry attempts)
  • ✓ Case statement structure maintained for adding additional special cases in the future
  • ✓ Should resolve intermittent build failures reported in the issue
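
To illustrate what "adding additional special cases" could mean, here is a hedged sketch of the same case structure with one extra non-retryable branch. The function name `classify_exit_code_sketch` and the branch for exit code 3 (which curl uses for a malformed URL) are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Returns 0 when the caller should sleep and retry, otherwise the code to fail with.
# Intended for non-zero curl exit codes only; success is handled by the caller.
classify_exit_code_sketch() {
  local exitCode="$1"
  case $exitCode in
    22)
      # HTTP error (including 404) - don't retry, as in this PR
      return 22
      ;;
    3)
      # Hypothetical extra case, NOT in this PR: a malformed URL
      # cannot be fixed by retrying
      return 3
      ;;
    *)
      # Everything else (18, 92, 6, etc.) is treated as transient
      return 0
      ;;
  esac
}
```

Keeping the case statement means a new rule like this is a three-line addition rather than a restructuring of the loop.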

Changes

  • Analyze the issue and understand the retry logic bug
  • Fix the retry logic in case statement's default branch
  • Keep case statement structure as requested in code review
  • Verify the fix works correctly
  • Commit changes
Original prompt

This section details the original issue you should resolve

<issue_title>dotnet-symbols-.*.tar.gz not found building SB</issue_title>
<issue_description>## Error

  tar (child): /__w/1/s/artifacts/obj/init-source-only//dotnet-symbols-all-*.tar.gz: Cannot open: No such file or directory
  tar (child): Error is not recoverable: exiting now
  tar: Child returned status 2
  tar: Error is not recoverable: exiting now
/__w/1/s/eng/init-source-only.proj(109,5): error MSB3073: The command "tar -xzf /__w/1/s/artifacts/obj/init-source-only//dotnet-symbols-all-*.tar.gz" exited with code 2.

Looks intermittent, seen once so far.

Links

Affected Releases

  • 10.0.200</issue_description>

<agent_instructions>See if you can spot and fix an issue with the DownloadWithRetries that is preventing the code in

echo " Downloading $label from $displayUrl..."
if DownloadWithRetries "$archiveUrl" "$outputDir"; then
  downloadSucceeded=true
  break
else
  local downloadResult=$?
  # Only continue to next URL if it was a 404
  if [[ $downloadResult -ne 22 ]]; then
    echo " ERROR: Failed to download $displayUrl"
    return 1
  fi
  echo " Not found, trying next location..."
fi
from correctly retrying.

This is the reported error:

 27 1152M   27  312M    0     0  2443k      0  0:08:02  0:02:10  0:05:52     0
 27 1152M   27  312M    0     0  2406k      0  0:08:10  0:02:12  0:05:58     0
 27 1152M   27  312M    0     0  2370k      0  0:08:17  0:02:14  0:06:03     0
 27 1152M   27  312M    0     0  2335k      0  0:08:25  0:02:16  0:06:09     0
 27 1152M   27  312M    0     0  2302k      0  0:08:32  0:02:18  0:06:14     0
 27 1152M   27  312M    0     0  2269k      0  0:08:39  0:02:20  0:06:19     0
 27 1152M   27  312M    0     0  2237k      0  0:08:47  0:02:22  0:06:25     0
 27 1152M   27  312M    0     0  2206k      0  0:08:54  0:02:24  0:06:30     0
 27 1152M   27  312M    0     0  2198k      0  0:08:56  0:02:25  0:06:31     0
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
  ERROR: Failed to download https://ci.dot.net/public/source-build/Private.SourceBuilt.Artifacts.10.0.100-rtm.25523.111.almalinux.8-x64.tar.gz
tar (child): /__w/_temp/shared-components/Private.SourceBuilt.SharedComponents.*.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

If you do not see any issues, do not change any code and make a comment on the PR</agent_instructions>

Comments on the Issue (you are @copilot in this section)

<comment_new>@MichaelSimons
@NikolaMilosavljevic, can you take a look? TIA</comment_new>
<comment_new>@NikolaMilosavljevic
Not sure when shared symbols archive gets downloaded in 2xx builds. @mthalman do you know?</comment_new>
<comment_new>@mthalman
Symbols are included in the PSB artifacts tarball from 1xx that gets downloaded in `Download Shared Component Artifacts`. You can see there was an error in that [step](https://dev.azure.com/dnceng/internal/_build/results?buildId=2859088&view=logs&j=8802692e-c42f-5cd7-c577-7149df984d8d&t=5f62c26e-3d0a-52f9-3a2b-932cae79b44f&s=9b073b6c-199e-5782-2adb-83c30ea48e21):
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
  ERROR: Failed to download https://ci.dot.net/public/source-build/Private.SourceBuilt.Artifacts.10.0.100-rtm.25523.111.almalinux.8-x64.tar.gz

Looks like just a network problem but that step should be updated to fail on such an error.</comment_new>
<comment_new>@mmitche
It would be good to add retries to that so we can avoid the network errors.</comment_new>
<comment_new>@mthalman

> It would be good to add retries to that so we can avoid the network errors.

That is the intention here:

if DownloadWithRetries "$archiveUrl" "$outputDir"; then

But apparently that's not working.</comment_new>



Copilot AI changed the title from "[WIP] Fix intermittent issue with dotnet-symbols tar file not found" to "Fix DownloadWithRetries to actually retry on network errors" Dec 12, 2025
Copilot AI requested a review from mmitche December 12, 2025 17:02
@mmitche mmitche marked this pull request as ready for review December 12, 2025 17:09
@mmitche mmitche requested a review from a team as a code owner December 12, 2025 17:09
Copilot AI review requested due to automatic review settings December 12, 2025 17:09
@mmitche mmitche requested a review from a team as a code owner December 12, 2025 17:09
@mmitche mmitche enabled auto-merge (squash) December 12, 2025 17:09

Copilot AI left a comment

Pull request overview

This PR fixes a critical bug in the DownloadWithRetries function that prevented it from actually retrying downloads on most network errors. The original code had a logic flaw where the default case in the error-handling case statement returned immediately instead of continuing the retry loop, causing intermittent build failures when transient network errors such as HTTP/2 stream errors occurred.

Key Changes:

  • Fixed the default case in the error-handling case statement to sleep and continue retrying instead of returning immediately
  • Preserved the special handling for HTTP 404 errors (exit code 22) to return immediately without retry
  • Ensured transient network errors (like exit code 92 - HTTP/2 stream errors) now properly retry up to 5 times with 3-second delays

@mmitche mmitche merged commit 29afcbc into main Dec 12, 2025
17 checks passed
@mmitche mmitche deleted the copilot/fix-dotnet-symbols-error branch December 12, 2025 22:14
mmitche commented Dec 15, 2025

/backport to release/10.0.1xx

@github-actions
Started backporting to release/10.0.1xx (link to workflow run)

@github-actions
@mmitche backporting to release/10.0.1xx failed, the patch most likely resulted in conflicts. Please backport manually!

git am output
$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Creating an empty commit: Initial plan
Applying: Fix retry logic in DownloadWithRetries to properly retry on network errors
Using index info to reconstruct a base tree...
M	eng/download-source-built-archive.sh
Falling back to patching base and 3-way merge...
Auto-merging eng/download-source-built-archive.sh
CONFLICT (content): Merge conflict in eng/download-source-built-archive.sh
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 Fix retry logic in DownloadWithRetries to properly retry on network errors
Error: The process '/usr/bin/git' failed with exit code 128

Link to workflow output

mmitche added a commit to mmitche/dotnet that referenced this pull request Dec 15, 2025
mmitche added a commit to mmitche/dotnet that referenced this pull request Dec 15, 2025
mmitche added a commit that referenced this pull request Dec 15, 2025
mmitche added a commit that referenced this pull request Dec 15, 2025
Development

Successfully merging this pull request may close these issues.

dotnet-symbols-.*.tar.gz not found building SB

4 participants