Skip to content

GHA/windows: work around Git for Windows perf regression#15380

Closed
vszakats wants to merge 4 commits intocurl:masterfrom
vszakats:gha-vcpkg-test
Closed

GHA/windows: work around Git for Windows perf regression#15380
vszakats wants to merge 4 commits intocurl:masterfrom
vszakats:gha-vcpkg-test

Conversation

@vszakats
Copy link
Copy Markdown
Member

@vszakats vszakats commented Oct 23, 2024

Fix the significant perf regression for vcpkg jobs by switching to the
MSYS2 shell environment from Git for Windows. This env is already used
for old-mingw-w64 job that remained unaffected by this issue.

The issue began with the windows-runner update 20241015.1.0. It bumped
Git for Windows from Git 2.46.2.windows.1 to Git 2.47.0.windows.1. GfW
bumped its MSYS2 components, including msys-2.0.dll. That's Cygwin
code, which may have contributed to this. Pipes were involved and
runtests.pl relies on pipes heavily in parallel mode. (The issue was
not seen with parallel tests disabled, in retrospect.)

This is useful as a permanent solution too. It drop GfW as a dependency
and makes Windows jobs use one less shell/env flavour.

Long term it might help to use native Windows Perl to avoid the MSYS
layer completely, if there is a way to make that work.

Assortment of possibly related links:
https://cygwin.com/pipermail/cygwin/2024-August/256398.html
cygwin/cygwin@f78009c
cygwin/cygwin@7f3c225

actions/runner-images#10843
git-for-windows/git#5199
git-for-windows/msys2-runtime#75
git-for-windows/msys2-runtime@7913a41
git-for-windows/msys2-runtime@555afcb
cygwin/cygwin@1c5f4dc

Follow-up to c33174d #15364
Follow-up to 1e03059 #15356

@vszakats vszakats marked this pull request as draft October 23, 2024 09:06
@github-actions github-actions Bot added Windows Windows-specific CI Continuous Integration labels Oct 23, 2024
@vszakats
Copy link
Copy Markdown
Member Author

vszakats commented Oct 23, 2024

It looks like the devastating perf regression traces back to Git for Windows.
Possibly not the shell itself, but something brought by its MSYS2 env,
or tool?

It would be nice to know if this was a bugfix, that for example aimed to
fix stability, or just a regression. Was it something local to GfW or brought
from upstream MSYS2 that may hit that also (we'll see).

Either way I think we are in a better place with using one less shell flavour,
and rely on native MSYS2 in vcpkg jobs like we already do in others. This
exact setup is used for old-mingw-w64 jobs.

What's curious is that GfW is the default when using "shell: bash", and
nobody else hit this? Halving performance is something noticeable. Maybe
we are hitting an uncommon combination of tools / features.

Digging more:

Maybe something v2.47.0.windows.2 would fix it?
actions/runner-images#10843

It looks like there was a big update for msys-2.0.dll, which is something
capable of causing what we saw.
git-for-windows/git#5199
git-for-windows/msys2-runtime#75
git-for-windows/msys2-runtime@7913a41
git-for-windows/msys2-runtime@555afcb
cygwin/cygwin@1c5f4dc

And more. This seems to hit closer than the above, reporting and
fixing a big perf drop when using pipes:

https://cygwin.com/pipermail/cygwin/2024-August/256398.html
cygwin/cygwin@f78009c
cygwin/cygwin@7f3c225

@vszakats vszakats changed the title try a different shell 1 GHA/windows: workaround for Git for Windows regression Oct 23, 2024
@vszakats vszakats marked this pull request as ready for review October 23, 2024 10:44
@vszakats vszakats changed the title GHA/windows: workaround for Git for Windows regression GHA/windows: work around Git for Windows regression Oct 23, 2024
@vszakats vszakats changed the title GHA/windows: work around Git for Windows regression GHA/windows: work around Git for Windows perf regression Oct 23, 2024
@vszakats vszakats closed this in 5f9411f Oct 23, 2024
@vszakats vszakats deleted the gha-vcpkg-test branch October 23, 2024 11:01
vszakats added a commit that referenced this pull request Nov 26, 2024
…null-dereference`

- GHA/windows: switch mingw-w64 UWP CI job to use UCRT.
  `msvcr120_app` was missing `getch()` for example.
  Follow-up to f988842 #15637
  This job tests compiling for UWP correctly, but the the resulting
  `curl.exe` still doesn't look like a correct UWP app, now exiting
  on startup with: `curl: error initializing curl library`.

- tool_getpass: restore `getch()` for UWP builds.
  Follow-up to f988842 #15637

- schannel: silence `-Werror=null-dereference` warning in mingw-w64 UWP:
  ```
  lib/vtls/schannel_verify.c: In function 'Curl_verify_host':
  lib/vtls/schannel_verify.c:558:33: error: null pointer dereference [-Werror=null-dereference]
    558 |     for(i = 0; i < alt_name_info->cAltEntry; ++i) {
        |                    ~~~~~~~~~~~~~^~~~~~~~~~~
  lib/vtls/schannel_verify.c:559:50: error: null pointer dereference [-Werror=null-dereference]
    559 |       PCERT_ALT_NAME_ENTRY entry = &alt_name_info->rgAltEntry[i];
        |                                     ~~~~~~~~~~~~~^~~~~~~~~~~~
  ```
  Ref: https://github.com/curl/curl/actions/runs/12022656065/job/33515255397?pr=15638#step:19:27
  Follow-up to 9640a8e #15421

- GHA/windows: fix `find` command in MSVC job step.
  Follow-up to 5f9411f #15380

- GHA/windows: drop unnecessary `windowsappcompat` lib from mingw-w64
  UWP job. Also drop related MSYS2 package.

- GHA/windows: cmake 3.31.0 still invokes `windres` with wrong options
  with mingw-w64 UPW. Update curl version in comment accordingly.

- GHA/windows: tidy up mingw-w64 UWP spec logic, limit it to gcc.

- GHA/windows: update comments on `curl.exe` UWP startup errors.

Closes #15638
vszakats added a commit that referenced this pull request Feb 7, 2025
Today GHA Windows runner images (all versions) deployed an upgrade
(20250127.1.0 -> 20250203.1.0) that upgraded the default MSYS2, which
now seems to feature the October 2024 issue that caused curl runtests
run times increasing ~2.5x. It also causes test987 to fail, and vcpkg
jobs hitting their time limits and fail. Reliability also got a hit.

In October this issue came with a Git for Windows upgrade, and likely
the MSYS2 runtime update within it. It affected vcpkg jobs only, and
I mitigated it by switching them to use the default MSYS2 shell and
runtime (at `C:\msys64`):

5f9411f #15380

After today's update this mitigation no longer works. The issue also
affects `dl-mingw` jobs now, though to a lesser extent than vcpkg ones.

Tried switching back to Git for Windows which received several updates
since October, but the performance issue is still present.

I managed to mitigate the slowdown in vcpkg by lowering test parallelism
to `-j4` (from `-j8`), after which the jobs are about *half the speed*
than before, and fit their time limits. `dl-mingw` builds run slower by
1-1.5 minutes per job, they were already using `-j4`.

Example jobs:

Before (ALL GOOD):
https://github.com/curl/curl/actions/runs/13167230443/job/36750175428 installed MSYS2, mingw (-j8): 3m50s (OK)
https://github.com/curl/curl/actions/runs/13167230443/job/36750158662 default MSYS2, dl-mingw (-j4): 4m22s (OK)
https://github.com/curl/curl/actions/runs/13167230443/job/36750163392 default MSYS2, vcpkg (-j8): 3m27s (OK)
runner: https://github.com/actions/runner-images/blob/win22/20250127.1/images/windows/Windows2022-Readme.md
C:\msys64:
System: MSYS_NT-10.0-20348 fv-az1115-916 3.5.4-0bc1222b.x86_64 2024-12-05 09:27 UTC x86_64 Msys
msys2/msys2-runtime@0bc1222b

After:
https://github.com/curl/curl/actions/runs/13186498273/job/36809747078 installed MSYS2, mingw (-j8): 3m48s (OK)
https://github.com/curl/curl/actions/runs/13186498273/job/36809728481 default MSYS2, dl-mingw (-j4): 5m56s (SLOW)
https://github.com/curl/curl/actions/runs/13186498273/job/36809736429 default MSYS2, vcpkg (-j8): 9m1s (SLOW)
runner: https://github.com/actions/runner-images/blob/win22/20250203.1/images/windows/Windows2022-Readme.md
C:\msys64:
System: MSYS_NT-10.0-20348 fv-az1115-498 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys
msys2/msys2-runtime@2644508f

windows-2025 image:
C:\msys64:
System: MSYS_NT-10.0-26100 fv-az2043-515 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys

windows-2019 image:
C:\msys64:
System: MSYS_NT-10.0-17763 fv-az1434-677 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys

This PR:
final: https://github.com/curl/curl/actions/runs/13186498273/job/36809736429 GfW, vcpkg (*-j4*): ~7m (SLOW)
test: https://github.com/curl/curl/actions/runs/13187992987/job/36814644852?pr=16217, GfW, vcpkg (-j8): ~11m (SLOWER)

Before and after (unused) Git for Windows (SLOW as tested in this PR):
C:\Program Files\Git
System: MINGW64_NT-10.0-20348 fv-az1760-186 3.5.4-395fda67.x86_64 2024-11-25 09:49 UTC x86_64 Msys
msys2/msys2-runtime@395fda67 (fork)

Before and after (used) MSYS2 installed via msys2/setup-msys2 (OK):
D:\a\_temp\msys64
System: MINGW64_NT-10.0-20348 fv-az836-378 3.5.4-0bc1222b.x86_64 2024-12-05 09:27 UTC x86_64 Msys

Perl pipe issue report from October, still open:
msys2/msys2-runtime#230
ARM deadlock fixed by GfW 2.47.1(1), but for x86_64, on a quick glance:
msys2/msys2-runtime@290bea9
Possibly interesting:
msys2/msys2-autobuild#62

Closes #16217
vszakats added a commit to vszakats/curl that referenced this pull request Feb 8, 2025
vszakats added a commit to vszakats/curl that referenced this pull request Feb 8, 2025
vszakats added a commit to vszakats/curl that referenced this pull request Feb 8, 2025
vszakats added a commit that referenced this pull request Feb 8, 2025
Use the last known good release of Git for Windows by installing it
manually. It restores `runtests.pl` performance to the levels before
the October 2024 and this week's fallouts which gradually deployed
MSYS2 runtimes with pipe/signal/concurrency issues.

Also:
- restore vcpkg job's test parallelism to `-j8` (from `-j4`).
- keep using the default shell for jobs not running tests.
  To avoid the unnecessary Git for Windows install overhead.

Upsides:
- good performance again.
- easy to experiment with any version.

Downsides:
- installing the Git for Windows package takes 15-30 seconds.
- we're pinned to an old package version.
- no canary to tell when the issue is fixed on the runner images.

Unknown:
- stability. (no MSYS2 runtimes were ever stable and it's difficult
  to quantify if a version improves or worsens stability/flakiness, and
  intermittent env failures.

Follow-up to 1bf774d #16217
Follow-up to 5f9411f #15380

Closes #16265
vszakats added a commit that referenced this pull request Feb 21, 2025
We recently switched to a known good version of Git for Windows to avoid
the MSYS2/Cygwin runtime performance regression.

MSYS2 is closer to the source of the MSYS2/Cygwin projects. Its known
good version is newer. Installing the downgrade is faster and safer. It
also allows to restore the scripts to their original iteration, making
the workaround easier to drop once the perf issue is fixed upstream.

Therefore, switch back to using MSYS2, and install the runtime downgrade
before running curl tests.

Also disable `pacman`'s `CheckSpace` for best performance.

Jeremy identified to the root cause of the perf regression in this
Cygwin commit (from 2024-09-17):
https://cygwin.com/git/?p=newlib-cygwin.git;a=commit;h=c7fe29f5cb85242ae2607945762f7e0b9af02513

Co-authored-by: Jeremy Drake
Patch: jeremyd2019@95a404e
Ref: #16217 (comment)
Ref: #16217 (comment)

Follow-up to 116950a #16265
Follow-up to 1bf774d #16217
Follow-up to 5f9411f #15380

Closes #16424
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
Fix the significant perf regression for vcpkg jobs by switching to the
MSYS2 shell environment from Git for Windows. This env is already used
for old-mingw-w64 job that remained unaffected by this issue.

The issue began with the windows-runner update 20241015.1.0. It bumped
Git for Windows from Git 2.46.2.windows.1 to Git 2.47.0.windows.1. GfW
bumped its MSYS2 components, including `msys-2.0.dll`. That's Cygwin
code, which may have contributed to this. Pipes were involved and
`runtests.pl` relies on pipes heavily in parallel mode. (The issue was
not seen with parallel tests disabled, in retrospect.)

This is useful as a permanent solution too. It drop GfW as a dependency
and makes Windows jobs use one less shell/env flavour.

Long term it might help to use native Windows Perl to avoid the MSYS
layer completely, if there is a way to make that work.

Assortment of possibly related links:
https://cygwin.com/pipermail/cygwin/2024-August/256398.html
cygwin/cygwin@f78009c
cygwin/cygwin@7f3c225

actions/runner-images#10843
git-for-windows/git#5199
git-for-windows/msys2-runtime#75
git-for-windows/msys2-runtime@7913a41
git-for-windows/msys2-runtime@555afcb
cygwin/cygwin@1c5f4dc

Follow-up to c33174d curl#15364
Follow-up to 1e03059 curl#15356

Closes curl#15380
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
…null-dereference`

- GHA/windows: switch mingw-w64 UWP CI job to use UCRT.
  `msvcr120_app` was missing `getch()` for example.
  Follow-up to f988842 curl#15637
  This job tests compiling for UWP correctly, but the the resulting
  `curl.exe` still doesn't look like a correct UWP app, now exiting
  on startup with: `curl: error initializing curl library`.

- tool_getpass: restore `getch()` for UWP builds.
  Follow-up to f988842 curl#15637

- schannel: silence `-Werror=null-dereference` warning in mingw-w64 UWP:
  ```
  lib/vtls/schannel_verify.c: In function 'Curl_verify_host':
  lib/vtls/schannel_verify.c:558:33: error: null pointer dereference [-Werror=null-dereference]
    558 |     for(i = 0; i < alt_name_info->cAltEntry; ++i) {
        |                    ~~~~~~~~~~~~~^~~~~~~~~~~
  lib/vtls/schannel_verify.c:559:50: error: null pointer dereference [-Werror=null-dereference]
    559 |       PCERT_ALT_NAME_ENTRY entry = &alt_name_info->rgAltEntry[i];
        |                                     ~~~~~~~~~~~~~^~~~~~~~~~~~
  ```
  Ref: https://github.com/curl/curl/actions/runs/12022656065/job/33515255397?pr=15638#step:19:27
  Follow-up to 9640a8e curl#15421

- GHA/windows: fix `find` command in MSVC job step.
  Follow-up to 5f9411f curl#15380

- GHA/windows: drop unnecessary `windowsappcompat` lib from mingw-w64
  UWP job. Also drop related MSYS2 package.

- GHA/windows: cmake 3.31.0 still invokes `windres` with wrong options
  with mingw-w64 UPW. Update curl version in comment accordingly.

- GHA/windows: tidy up mingw-w64 UWP spec logic, limit it to gcc.

- GHA/windows: update comments on `curl.exe` UWP startup errors.

Closes curl#15638
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
Today GHA Windows runner images (all versions) deployed an upgrade
(20250127.1.0 -> 20250203.1.0) that upgraded the default MSYS2, which
now seems to feature the October 2024 issue that caused curl runtests
run times increasing ~2.5x. It also causes test987 to fail, and vcpkg
jobs hitting their time limits and fail. Reliability also got a hit.

In October this issue came with a Git for Windows upgrade, and likely
the MSYS2 runtime update within it. It affected vcpkg jobs only, and
I mitigated it by switching them to use the default MSYS2 shell and
runtime (at `C:\msys64`):

5f9411f curl#15380

After today's update this mitigation no longer works. The issue also
affects `dl-mingw` jobs now, though to a lesser extent than vcpkg ones.

Tried switching back to Git for Windows which received several updates
since October, but the performance issue is still present.

I managed to mitigate the slowdown in vcpkg by lowering test parallelism
to `-j4` (from `-j8`), after which the jobs are about *half the speed*
than before, and fit their time limits. `dl-mingw` builds run slower by
1-1.5 minutes per job, they were already using `-j4`.

Example jobs:

Before (ALL GOOD):
https://github.com/curl/curl/actions/runs/13167230443/job/36750175428 installed MSYS2, mingw (-j8): 3m50s (OK)
https://github.com/curl/curl/actions/runs/13167230443/job/36750158662 default MSYS2, dl-mingw (-j4): 4m22s (OK)
https://github.com/curl/curl/actions/runs/13167230443/job/36750163392 default MSYS2, vcpkg (-j8): 3m27s (OK)
runner: https://github.com/actions/runner-images/blob/win22/20250127.1/images/windows/Windows2022-Readme.md
C:\msys64:
System: MSYS_NT-10.0-20348 fv-az1115-916 3.5.4-0bc1222b.x86_64 2024-12-05 09:27 UTC x86_64 Msys
msys2/msys2-runtime@0bc1222b

After:
https://github.com/curl/curl/actions/runs/13186498273/job/36809747078 installed MSYS2, mingw (-j8): 3m48s (OK)
https://github.com/curl/curl/actions/runs/13186498273/job/36809728481 default MSYS2, dl-mingw (-j4): 5m56s (SLOW)
https://github.com/curl/curl/actions/runs/13186498273/job/36809736429 default MSYS2, vcpkg (-j8): 9m1s (SLOW)
runner: https://github.com/actions/runner-images/blob/win22/20250203.1/images/windows/Windows2022-Readme.md
C:\msys64:
System: MSYS_NT-10.0-20348 fv-az1115-498 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys
msys2/msys2-runtime@2644508f

windows-2025 image:
C:\msys64:
System: MSYS_NT-10.0-26100 fv-az2043-515 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys

windows-2019 image:
C:\msys64:
System: MSYS_NT-10.0-17763 fv-az1434-677 3.5.7-2644508f.x86_64 2025-01-30 09:08 UTC x86_64 Msys

This PR:
final: https://github.com/curl/curl/actions/runs/13186498273/job/36809736429 GfW, vcpkg (*-j4*): ~7m (SLOW)
test: https://github.com/curl/curl/actions/runs/13187992987/job/36814644852?pr=16217, GfW, vcpkg (-j8): ~11m (SLOWER)

Before and after (unused) Git for Windows (SLOW as tested in this PR):
C:\Program Files\Git
System: MINGW64_NT-10.0-20348 fv-az1760-186 3.5.4-395fda67.x86_64 2024-11-25 09:49 UTC x86_64 Msys
msys2/msys2-runtime@395fda67 (fork)

Before and after (used) MSYS2 installed via msys2/setup-msys2 (OK):
D:\a\_temp\msys64
System: MINGW64_NT-10.0-20348 fv-az836-378 3.5.4-0bc1222b.x86_64 2024-12-05 09:27 UTC x86_64 Msys

Perl pipe issue report from October, still open:
msys2/msys2-runtime#230
ARM deadlock fixed by GfW 2.47.1(1), but for x86_64, on a quick glance:
msys2/msys2-runtime@290bea9
Possibly interesting:
msys2/msys2-autobuild#62

Closes curl#16217
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
Use the last known good release of Git for Windows by installing it
manually. It restores `runtests.pl` performance to the levels before
the October 2024 and this week's fallouts which gradually deployed
MSYS2 runtimes with pipe/signal/concurrency issues.

Also:
- restore vcpkg job's test parallelism to `-j8` (from `-j4`).
- keep using the default shell for jobs not running tests.
  To avoid the unnecessary Git for Windows install overhead.

Upsides:
- good performance again.
- easy to experiment with any version.

Downsides:
- installing the Git for Windows package takes 15-30 seconds.
- we're pinned to an old package version.
- no canary to tell when the issue is fixed on the runner images.

Unknown:
- stability. (no MSYS2 runtimes were ever stable and it's difficult
  to quantify if a version improves or worsens stability/flakiness, and
  intermittent env failures.

Follow-up to 1bf774d curl#16217
Follow-up to 5f9411f curl#15380

Closes curl#16265
pps83 pushed a commit to pps83/curl that referenced this pull request Apr 26, 2025
We recently switched to a known good version of Git for Windows to avoid
the MSYS2/Cygwin runtime performance regression.

MSYS2 is closer to the source of the MSYS2/Cygwin projects. Its known
good version is newer. Installing the downgrade is faster and safer. It
also allows to restore the scripts to their original iteration, making
the workaround easier to drop once the perf issue is fixed upstream.

Therefore, switch back to using MSYS2, and install the runtime downgrade
before running curl tests.

Also disable `pacman`'s `CheckSpace` for best performance.

Jeremy identified to the root cause of the perf regression in this
Cygwin commit (from 2024-09-17):
https://cygwin.com/git/?p=newlib-cygwin.git;a=commit;h=c7fe29f5cb85242ae2607945762f7e0b9af02513

Co-authored-by: Jeremy Drake
Patch: jeremyd2019@95a404e
Ref: curl#16217 (comment)
Ref: curl#16217 (comment)

Follow-up to 116950a curl#16265
Follow-up to 1bf774d curl#16217
Follow-up to 5f9411f curl#15380

Closes curl#16424
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Continuous Integration Windows Windows-specific

Development

Successfully merging this pull request may close these issues.

1 participant