[CI] XPU linux ci test has flaky error with sccache service #143585

@chuanqi129

Some torch XPU CI test jobs fail intermittently in sccache because the S3 token has expired. See https://github.com/pytorch/pytorch/actions/runs/12374649660/job/34540161843#step:14:3461

sccache: error: Server startup failed: cache storage failed to read: Unexpected (permanent) at read => S3Error { code: "ExpiredToken", message: "The provided token has expired.", resource: "", request_id: "NMJ4H2V91GQ7S2BZ" }
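To separate this kind of credential failure from a genuine build or test failure, a CI wrapper could inspect the sccache error line. The following is only a minimal sketch: `classify_sccache_error` and the category strings are hypothetical names, not part of sccache or the PyTorch CI scripts, and the pattern assumes the error keeps the `S3Error { code: "...", message: "..." }` shape shown above.

```python
import re

# Hypothetical helper (assumption): not part of sccache or the PyTorch CI scripts.
# Matches the S3Error fragment as it appears in the log line quoted above.
S3_ERROR_RE = re.compile(
    r'S3Error \{ code: "(?P<code>[^"]+)", message: "(?P<message>[^"]*)"'
)


def classify_sccache_error(line: str) -> str:
    """Return a coarse category for an sccache error line."""
    match = S3_ERROR_RE.search(line)
    if match is None:
        return "unknown"
    if match.group("code") == "ExpiredToken":
        # Credentials need refreshing; the build itself is not at fault.
        return "credentials-expired"
    return "s3-error"


log_line = (
    'sccache: error: Server startup failed: cache storage failed to read: '
    'Unexpected (permanent) at read => S3Error { code: "ExpiredToken", '
    'message: "The provided token has expired.", resource: "", '
    'request_id: "NMJ4H2V91GQ7S2BZ" }'
)

print(classify_sccache_error(log_line))
```

A job that gets `credentials-expired` could be flagged as an infrastructure failure rather than a test failure. How the token should actually be refreshed on the self-hosted runner is outside the scope of this sketch.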

The workflow is https://github.com/pytorch/pytorch/blob/main/.github/workflows/_xpu-test.yml, which runs on the self-hosted runner linux.idc.xpu with Docker containers.

cc @seemethere @malfet @pytorch/pytorch-dev-infra @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

Labels

intel: This tag is for PR from Intel
module: ci: Related to continuous integration
module: infra: Relates to CI infrastructure
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

Status

Done
