Bots sometimes time out waiting for scheduling. #74997

@dnfield

I've seen this several times now on #74922 and a couple of other recent PRs, particularly on Windows bots.

For example: https://ci.chromium.org/ui/p/flutter/builders/try/Windows%20tool_tests/9570/steps?succeeded=true&debug=false

AFAICT, what's happening is that the "sharding" bot gets scheduled and starts. It spawns 3 more builders and waits. Those 3 builders take most of the ~1 hr the original bot has to run just to get scheduled and start. The original bot then sees it is out of time, cancels the other builders, and is marked as a failure.
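To make the failure mode above concrete, here is a toy model in Python. It is only a sketch: the `Shard` class, the `parent_outcome` function, and all the minute values are made up for illustration and are not taken from the actual LUCI configuration. The only number borrowed from the description is the roughly 60-minute budget of the parent bot.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    queue_minutes: float  # hypothetical: time spent waiting for a free bot
    run_minutes: float    # hypothetical: time spent actually running tests


def parent_outcome(shards, parent_timeout_minutes=60.0):
    """Return 'success' if every shard finishes inside the parent's budget."""
    # Assumption: all shards are triggered together when the parent starts,
    # so the parent effectively waits for the slowest queue + run time.
    slowest = max(s.queue_minutes + s.run_minutes for s in shards)
    return "success" if slowest <= parent_timeout_minutes else "timeout"


# Same test run times, different scheduling delays (illustrative values).
healthy = [Shard(2, 25), Shard(3, 30), Shard(1, 28)]
starved = [Shard(50, 25), Shard(55, 30), Shard(48, 28)]

print(parent_outcome(healthy))
print(parent_outcome(starved))
```

In this model the tests themselves take the same time in both cases; only the queue time differs, which is enough to flip the parent from passing to timing out.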

I think we need some combination of:

  • More bots
  • Fewer shards (or more shards so they finish/fail faster?)
  • No timeout on shard waiting bots
  • No dedicated bot waiting for the sharded bots.
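As a rough illustration of one way the "no timeout on shard waiting bots" idea might be read, the sketch below only charges a shard's run time against the budget and ignores the time it spends queued. The `should_cancel` function and the shape of `shard_states` are hypothetical and do not correspond to any existing recipe or LUCI API.

```python
def should_cancel(shard_states, now, run_budget_minutes=60.0):
    """Decide whether the waiting bot should give up on its shards.

    shard_states: list of dicts with a 'started_at' key holding the minute
    at which the shard started running, or None while it is still pending.
    """
    for state in shard_states:
        started = state["started_at"]
        if started is None:
            # Still queued: queue time is not counted against the budget.
            continue
        if now - started > run_budget_minutes:
            return True
    return False


# A shard that has been pending for two hours does not trigger a cancel...
print(should_cancel([{"started_at": None}], now=120))
# ...but one that has been running for more than the budget does.
print(should_cancel([{"started_at": 50}], now=120))
```

A policy like this would still need some upper bound on pending time, otherwise the waiting bot could sit indefinitely if no shard is ever scheduled.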

/cc @godofredoc @keyonghan @CaseyHillers

I've seen this on several PRs recently.

Labels

  • c: contributor-productivity (Team-specific productivity, code health, technical debt.)
  • infra: metrics (Infrastructure metrics-related issues)
  • team-infra (Owned by Infrastructure team)
