Serialize cluster image builds and split CLI tests into separate CI job by chrisguidry · Pull Request #338 · chrisguidry/docket

chrisguidry · 2026-02-14T22:31:26Z

Two changes to improve CI reliability:

Serialize cluster image builds with file lock

The AlreadyExists fix in #337 handled one symptom of parallel xdist workers racing to build the same cluster image, but there's a second failure mode showing up in CI:

https://github.com/chrisguidry/docket/actions/runs/22025132964/job/63640478732

When concurrent builds target the same tag, the Docker SDK's build() completes successfully in the daemon, then tries to inspect the resulting image by its short ID. If another worker's build re-tagged the image in the meantime, the first image ID gets orphaned and the inspect 404s. This knocked out 485 of 686 tests in the cluster job.

Rather than catching yet another exception type, this serializes the builds with fcntl.flock so only one worker builds at a time. The others wait for the lock, then find the image already built and skip the build. This eliminates both the AlreadyExists and ImageNotFound races structurally.
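The lock-then-check pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/_container.py: `build_once`, `simulate_workers`, and the lock file name are hypothetical, and threads stand in for xdist worker processes. It assumes a Unix platform, since `fcntl` is not available on Windows.

```python
import fcntl
import tempfile
import threading
import time
from pathlib import Path
from typing import Callable


def build_once(
    lock_path: Path,
    is_built: Callable[[], bool],
    build: Callable[[], None],
) -> bool:
    """Run build() under an exclusive file lock unless the image already exists.

    Returns True if this caller performed the build, False if another
    worker had already built it by the time the lock was acquired.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if is_built():
                return False
            build()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def simulate_workers(worker_count: int = 4) -> int:
    """Hypothetical stand-in for xdist workers racing to build one image."""
    builds: list[int] = []

    def is_built() -> bool:
        return bool(builds)

    def build() -> None:
        time.sleep(0.05)  # pretend the image build takes a while
        builds.append(1)

    with tempfile.TemporaryDirectory() as tmp:
        lock_path = Path(tmp) / "cluster-image.lock"  # hypothetical name
        workers = [
            threading.Thread(target=build_once, args=(lock_path, is_built, build))
            for _ in range(worker_count)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()

    # Number of builds that actually ran.
    return len(builds)
```

Because the existence check happens only after the lock is held, every waiter sees the result of the first build and never starts a second one against the same tag.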

Split CLI tests into separate CI job

Cluster CI jobs consistently run right at the 4-minute timeout, and when any test runs slightly slow the whole job gets cancelled. This has been showing up in roughly a third of recent CI runs:

https://github.com/chrisguidry/docket/actions/runs/22025359927/job/63641245074

The 91 CLI tests are subprocess-based and don't exercise backend-specific behavior — they spawn python -m docket ... processes and check output. Running them against every Python x Backend combination (30 matrix entries) is wasted effort.

This moves CLI tests to their own job that varies by Python version but uses a single Redis backend (8.0). The main test matrix now passes --ignore=tests/cli so cluster/valkey/memory jobs only run the tests that actually care about the backend. Local pytest runs are unaffected.
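As a rough back-of-the-envelope check of the savings, the arithmetic can be sketched like this. It assumes every one of the 30 matrix entries previously ran all 91 CLI tests, and that the new CLI job has one entry per Python version; the five versions are inferred from the cli-python-3.10 through cli-python-3.14 coverage flags reported on this PR, not from the workflow file itself.

```python
# Figures taken from the PR description.
cli_tests = 91
matrix_entries_before = 30

# Assumption: one CLI job entry per Python version, inferred from the
# cli-python-* coverage flags reported on this PR.
python_versions = ["3.10", "3.11", "3.12", "3.13", "3.14"]

cli_runs_before = cli_tests * matrix_entries_before
cli_runs_after = cli_tests * len(python_versions)
cli_runs_saved = cli_runs_before - cli_runs_after

print(f"CLI test executions per CI run: {cli_runs_before} -> {cli_runs_after}")
print(f"Executions avoided: {cli_runs_saved}")
```

Under those assumptions the CLI tests execute 455 times per CI run instead of 2730.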


github-actions · 2026-02-14T22:31:47Z

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/22026488352/artifacts/5513446843

codecov-commenter · 2026-02-14T22:33:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (aace0e1) to head (1be2690).

Additional details and impacted files

@@             Coverage Diff             @@
##             main      #338      +/-   ##
===========================================
+ Coverage   98.63%   100.00%   +1.36%     
===========================================
  Files         103        99       -4     
  Lines       10391      3090    -7301     
  Branches      497        28     -469     
===========================================
- Hits        10249      3090    -7159     
+ Misses        126         0     -126     
+ Partials       16         0      -16

Flag	Coverage Δ
cli-python-3.10	`100.00% <ø> (?)`
cli-python-3.11	`100.00% <ø> (?)`
cli-python-3.12	`100.00% <ø> (?)`
cli-python-3.13	`100.00% <ø> (?)`
cli-python-3.14	`100.00% <ø> (?)`
python-3.10	`100.00% <100.00%> (+1.36%)`	⬆️
python-3.11	`98.05% <100.00%> (+0.76%)`	⬆️
python-3.12	`100.00% <100.00%> (+1.36%)`	⬆️
python-3.13	`100.00% <100.00%> (+1.36%)`	⬆️
python-3.14	`100.00% <100.00%> (+1.37%)`	⬆️


Files with missing lines	Coverage Δ
src/docket/cli/_support.py	`100.00% <ø> (ø)`
tests/cli/run.py	`100.00% <ø> (ø)`
tests/cli/test_iterate_with_timeout.py	`100.00% <ø> (ø)`
tests/cli/test_snapshot.py	`100.00% <ø> (ø)`
tests/test_docket_registration.py	`100.00% <100.00%> (ø)`

... and 97 files with indirect coverage changes


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1538e9f043


chatgpt-codex-connector · 2026-02-14T22:34:24Z

tests/_container.py

 including single-node Redis, Redis Cluster, and Valkey variants.
 """

+import fcntl


Guard Unix-only fcntl import for cross-platform tests

This introduces a hard dependency on fcntl, which is unavailable on Windows; because tests/conftest.py imports tests._container unconditionally, pytest collection now fails immediately on Windows even for non-cluster or memory-backend runs. If contributors are expected to run tests cross-platform (the project metadata is OS-independent), this should be behind a platform guard or use a portable locking fallback.
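One way the suggested platform guard could look is sketched below. This is only an illustration of the reviewer's suggestion, not what the PR adopted: `exclusive_build_lock` is a hypothetical helper, and falling back to running without any lock on platforms that lack `fcntl` is an assumption, not a portable locking implementation.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator

try:
    import fcntl  # available on Unix only
except ImportError:  # for example, on Windows
    fcntl = None


@contextlib.contextmanager
def exclusive_build_lock(lock_path: Path) -> Iterator[bool]:
    """Hold an exclusive flock for the duration of the with-block.

    Yields True when the lock is actually held, and False when fcntl is
    unavailable and the block runs unserialized (assumed fallback).
    """
    if fcntl is None:
        yield False
        return
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with exclusive_build_lock(Path(tmp) / "build.lock") as locked:
            print("serialized" if locked else "no locking on this platform")
```

With the import guarded this way, importing the module no longer fails during pytest collection on platforms without `fcntl`; only the serialization is skipped there.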


Cluster CI jobs consistently run right at the 4-minute timeout, and when any test runs slightly slow the whole job gets cancelled. This has been showing up in roughly a third of recent CI runs:

https://github.com/chrisguidry/docket/actions/runs/22025359927/job/63641245074

The 91 CLI tests are subprocess-based and don't exercise backend-specific behavior: they spawn `python -m docket ...` processes and check output. Running them against every Python x Backend combination (30 matrix entries) is wasted effort.

This moves CLI tests to their own job that varies by Python version but uses a single Redis backend (8.0). The main test matrix now passes `--ignore=tests/cli` so cluster/valkey/memory jobs only run the tests that actually care about the backend. Local `pytest` runs are unaffected: they still use the `pyproject.toml` coverage config, run all tests, and cover everything.

Co-Authored-By: Claude Opus 4.6 <[email protected]>


The old .coveragerc-memory was missing three files from its omit list that pyproject.toml had (tests/_container.py, tests/_key_leak_checker.py, src/docket/_prometheus_exporter.py). Memory backend uploads included those files with partial coverage, dragging the Codecov project total to 98.63%. The new .coveragerc-core has a consistent omit list, and this also adds an ignore section to codecov.yml so Codecov never counts these files regardless of what coverage.py reports.

Co-Authored-By: Claude Opus 4.6 <[email protected]>


The first CI run of the split showed core tests failing at 99% because `tests/cli/run.py` and `tests/cli/waiting.py` leaked through the `test_*.py` glob, and `register_collection` in `docket.py` was only tested via CLI tests.

Changes:
- Add `--cov-fail-under=100` to pyproject.toml as the local default
- Scope CLI job coverage to just `cli.py`, `_cli_support.py`, and `tests/cli/` so 100% is achievable for that slice
- Widen coveragerc-core omit to `tests/cli/*.py` (not just `test_*.py`)
- Add `test_register_collection` to core tests
- Remove stale `.coveragerc-memory` reference in `tests/cli/run.py`

Co-Authored-By: Claude Opus 4.6 <[email protected]>
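The glob leak described in this commit message can be illustrated with the standard library's `fnmatch`. This is only an approximation: coverage.py applies its own matching rules to omit patterns, and the exact patterns in .coveragerc-core are not shown here, so `narrow_pattern` and `wide_pattern` are hypothetical reconstructions.

```python
from fnmatch import fnmatchcase

# Helper modules named in the commit message, plus one real test file.
helper_modules = ["tests/cli/run.py", "tests/cli/waiting.py"]
test_module = "tests/cli/test_snapshot.py"

# Hypothetical omit patterns, before and after the widening.
narrow_pattern = "tests/cli/test_*.py"
wide_pattern = "tests/cli/*.py"

# Helpers do not start with "test_", so the narrow pattern misses them
# and they remain in the core coverage measurement.
leaked = [path for path in helper_modules if not fnmatchcase(path, narrow_pattern)]

# The wide pattern omits the helpers and the test files alike.
still_measured = [
    path
    for path in helper_modules + [test_module]
    if not fnmatchcase(path, wide_pattern)
]

print("leaked through narrow pattern:", leaked)
print("still measured with wide pattern:", still_measured)
```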

--cov expects packages/directories, not individual files, so scoping coverage to just cli.py and _cli_support.py doesn't work with subprocess coverage. Instead, the CLI job uploads coverage for everything and Codecov merges it with core test uploads to enforce 100% across the board. Co-Authored-By: Claude Opus 4.6 <[email protected]>

The previous attempt used --cov with file paths, but --cov only accepts packages/directories. Now using --cov=src/docket --cov=tests/cli with an explicit omit list in .coveragerc-cli that excludes all non-CLI source. Only cli.py, _cli_support.py, and tests/cli/ helper modules are measured. Co-Authored-By: Claude Opus 4.6 <[email protected]>

--cov accepts package/module names, not file paths. Using --cov=docket.cli --cov=docket._cli_support scopes coverage to just those modules without needing a .coveragerc-cli omit list to maintain. Co-Authored-By: Claude Opus 4.6 <[email protected]>

The `--cov=docket.cli` approach triggered a beartype circular import because coverage.py imports the module early to find its path. Moving CLI into `src/docket/cli/` lets us use `--cov=src/docket/cli` instead, which is just a directory lookup — no imports, no beartype drama. Also sets the stage for splitting CLI into multiple files down the road. Co-Authored-By: Claude Opus 4.6 <[email protected]>

These tests cover `cli/_support.py` code that's now excluded from core coverage. Without moving them, the `StopAsyncIteration` break path wasn't covered by the CLI job (it was covered by core tests, but core omits CLI source). Co-Authored-By: Claude Opus 4.6 <[email protected]>

chatgpt-codex-connector bot reviewed Feb 14, 2026


chrisguidry changed the title ~~Serialize cluster image builds with file lock~~ Serialize cluster image builds and split CLI tests into separate CI job Feb 14, 2026

chrisguidry and others added 9 commits February 14, 2026 18:13

Rename main test job to "Core Tests"

3abbe9c

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Fix coveragerc-core to exclude all tests/cli/*.py

5c246fb

Co-Authored-By: Claude Opus 4.6 <[email protected]>

chrisguidry merged commit f410dfb into main Feb 15, 2026
83 checks passed

chrisguidry deleted the serialize-cluster-build branch February 15, 2026 00:22

chrisguidry merged 11 commits into main from serialize-cluster-build
