[DomainsOnly] Jobs fail with GLIBC version not found #140631

@izaitsevfb

Current Status

ongoing

Error looks like

Run actions/checkout@v3
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)

(log source: https://github.com/pytorch/torchtune/actions/runs/11826665430/job/32953216074#step:5:26)

failure example
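
To see at a glance which symbol versions the runner's Node 20 binary is asking for, the loader errors above can be summarized with a small script. This is only a sketch: the regular expression is written against the exact message shape shown in this issue, and the helper names (`missing_versions`, `highest_required`) are made up for illustration.

```python
import re
from collections import defaultdict

# Matches dynamic-loader errors of the form seen in this issue:
#   <binary>: <library>: version `<FAMILY>_<x.y[.z]>' not found (required by ...)
PATTERN = re.compile(r": \S+: version `([A-Z]+)_([\d.]+)' not found")

# Log excerpt copied from the issue text.
SAMPLE_LOG = """\
/__e/node20/bin/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)
/__e/node20/bin/node: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /__e/node20/bin/node)
"""


def missing_versions(log_text):
    """Map each symbol family (e.g. GLIBC) to its sorted missing version tuples."""
    found = defaultdict(set)
    for family, version in PATTERN.findall(log_text):
        found[family].add(tuple(int(part) for part in version.split(".")))
    return {family: sorted(versions) for family, versions in found.items()}


def highest_required(log_text):
    """Return the highest missing version per family as a dotted string."""
    return {
        family: ".".join(str(part) for part in versions[-1])
        for family, versions in missing_versions(log_text).items()
    }


if __name__ == "__main__":
    for family, version in sorted(highest_required(SAMPLE_LOG).items()):
        print(f"{family}: needs at least {version}")
```

Versions are compared as integer tuples rather than strings so that, for example, 3.4.21 sorts above 3.4.9.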

Incident timeline (all times Pacific)

started: Wed Nov 13 ≈12pm
detected: Wed Nov 13 ≈3pm

User impact

Some Nova workflows may fail with the error above. Affected domain libraries: torchvision, torchaudio, data, torchtune.

Root cause

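GitHub removed Node 16 from the Actions runner in the v2.321.0 release. As the log above shows, actions/checkout@v3 now runs on the runner's bundled Node 20 binary, which requires newer glibc and libstdc++ symbol versions than the affected job environments provide. From GitHub's announcement of the release: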

FYI, a new runner release is created v2.321.0. I am going to rollout this runner slowly through all the rings.
What's special about this runner release is we removed node16 from this runner package and upgrade the dotnet sdk from dotnet 6 to dotnet 8
We need to fully remove node16 from our actions runner to secure our ecosystems and upgrade .net to version 8. Both of these break support with older clib versions of linux, mainly centos7
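
Whether a given job environment is exposed can be checked by comparing its glibc version against what the Node 20 binary asks for. The sketch below assumes 2.28 as the threshold, because that is the highest GLIBC symbol version named in the error log; the function names are hypothetical. CentOS 7 ships glibc 2.17, so it fails this check.

```python
import platform

# Assumption: 2.28 is the highest GLIBC symbol version in the error log,
# used here as the minimum the runner's bundled Node 20 binary needs.
REQUIRED_GLIBC = "2.28"


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(installed, required=REQUIRED_GLIBC):
    """Return True if the installed glibc version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def check_current_host():
    """Check the glibc this interpreter is linked against; None if not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return glibc_satisfies(version)


if __name__ == "__main__":
    print("CentOS 7 (glibc 2.17) new enough:", glibc_satisfies("2.17"))
    print("This host new enough:", check_current_host())
```

Run inside the job container rather than on the runner host, since the error comes from the libraries under the container's /lib64.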

Mitigation

ongoing

Prevention/followups

TBD

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

Labels: ci: sev, high priority, module: rocm, triaged
