
Serialize cluster image builds and split CLI tests into separate CI job #338

Merged
chrisguidry merged 11 commits into main from serialize-cluster-build on Feb 15, 2026

@chrisguidry (Owner) commented Feb 14, 2026

Two changes to improve CI reliability:

Serialize cluster image builds with file lock

The `AlreadyExists` fix in #337 handled one symptom of parallel xdist workers racing to build the same cluster image, but there's a second failure mode showing up in CI:

https://github.com/chrisguidry/docket/actions/runs/22025132964/job/63640478732

When concurrent builds target the same tag, the Docker SDK's `build()` completes successfully in the daemon, then tries to inspect the resulting image by its short ID. If another worker's build re-tagged the image in the meantime, the first image ID gets orphaned and the inspect 404s. This knocked out 485 of 686 tests in the cluster job.

Rather than catching yet another exception type, this serializes the builds with `fcntl.flock` so only one worker builds at a time. The others wait and then find the image already built. This eliminates both the `AlreadyExists` and `ImageNotFound` races structurally.
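The lock-then-recheck pattern described above can be sketched as follows. This is a minimal illustration, not the code from `tests/_container.py`: `build_once`, `simulate_workers`, the lock file name, and the image tag are all hypothetical, and threads stand in for xdist worker processes (each thread opens the lock file itself, so `flock` still serializes them).

```python
import fcntl
import tempfile
import threading
import time
from pathlib import Path
from typing import Callable


def build_once(
    lock_path: Path,
    is_built: Callable[[], bool],
    build: Callable[[], None],
) -> bool:
    """Run `build` under an exclusive flock; return True if this caller built."""
    # Opening the file gives each caller its own open file description, so
    # flock() contends across processes and across separately-opening threads.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # blocks until acquired
        try:
            # Re-check under the lock: a worker that waited finds the image ready.
            if is_built():
                return False
            build()
            return True
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def simulate_workers(count: int) -> tuple[list[bool], int]:
    """Race `count` threads (stand-ins for xdist workers) to build one tag."""
    tag = "hypothetical-cluster-image:latest"
    built: set[str] = set()
    build_calls: list[str] = []
    results: list[bool] = []

    def fake_build() -> None:
        build_calls.append(tag)
        time.sleep(0.05)  # widen the window in which an unlocked race would occur
        built.add(tag)

    with tempfile.TemporaryDirectory() as tmp:
        lock_path = Path(tmp) / "cluster-image.lock"

        def worker() -> None:
            results.append(build_once(lock_path, lambda: tag in built, fake_build))

        threads = [threading.Thread(target=worker) for _ in range(count)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    return results, len(build_calls)
```

The re-check inside the lock is what makes waiting workers skip the build instead of rebuilding and re-tagging the image, so `simulate_workers(4)` should report a single build.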

Split CLI tests into separate CI job

Cluster CI jobs consistently run right at the 4-minute timeout, and when any test runs slightly slow the whole job gets cancelled. This has been showing up in roughly a third of recent CI runs:

https://github.com/chrisguidry/docket/actions/runs/22025359927/job/63641245074

The 91 CLI tests are subprocess-based and don't exercise backend-specific behavior: they spawn `python -m docket ...` processes and check output. Running them against every Python x Backend combination (30 matrix entries) is wasted effort.

This moves CLI tests to their own job that varies by Python version but uses a single Redis backend (8.0). The main test matrix now passes `--ignore=tests/cli` so cluster/valkey/memory jobs only run the tests that actually care about the backend. Local `pytest` runs are unaffected.
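The arithmetic behind the split, as a small sketch: the 91 CLI tests and the 30-entry matrix come from the description above, the five Python versions match the `cli-python-3.10` through `cli-python-3.14` Codecov flags on this PR, and the `pytest_args` helper (including passing `tests/cli` for the CLI job) is a hypothetical illustration rather than the actual workflow file.

```python
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14"]
MAIN_MATRIX_ENTRIES = 30  # every Python x Backend combination
CLI_TESTS = 91


def pytest_args(job: str) -> list[str]:
    """Hypothetical per-job pytest arguments for the split described above."""
    if job == "main":
        # Backend matrix: skip the subprocess-based CLI tests entirely.
        return ["--ignore=tests/cli"]
    if job == "cli":
        # Single Redis backend, one entry per Python version: only CLI tests.
        return ["tests/cli"]
    raise ValueError(f"unknown job: {job!r}")


# CLI test executions per CI run, before and after the split.
cli_runs_before = CLI_TESTS * MAIN_MATRIX_ENTRIES
cli_runs_after = CLI_TESTS * len(PYTHON_VERSIONS)
```

That works out to 2,730 CLI test executions per CI run before the split and 455 after, with none of them landing in the cluster jobs that were hitting the timeout.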

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions

github-actions bot commented Feb 14, 2026

📚 Documentation has been built for this PR!

You can download the documentation directly here:
https://github.com/chrisguidry/docket/actions/runs/22026488352/artifacts/5513446843

@codecov-commenter

codecov-commenter commented Feb 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (aace0e1) to head (1be2690).

Additional details and impacted files


@@             Coverage Diff             @@
##             main      #338      +/-   ##
===========================================
+ Coverage   98.63%   100.00%   +1.36%     
===========================================
  Files         103        99       -4     
  Lines       10391      3090    -7301     
  Branches      497        28     -469     
===========================================
- Hits        10249      3090    -7159     
+ Misses        126         0     -126     
+ Partials       16         0      -16     
| Flag | Coverage Δ | |
|---|---|---|
| cli-python-3.10 | 100.00% <ø> (?) | |
| cli-python-3.11 | 100.00% <ø> (?) | |
| cli-python-3.12 | 100.00% <ø> (?) | |
| cli-python-3.13 | 100.00% <ø> (?) | |
| cli-python-3.14 | 100.00% <ø> (?) | |
| python-3.10 | 100.00% <100.00%> (+1.36%) | ⬆️ |
| python-3.11 | 98.05% <100.00%> (+0.76%) | ⬆️ |
| python-3.12 | 100.00% <100.00%> (+1.36%) | ⬆️ |
| python-3.13 | 100.00% <100.00%> (+1.36%) | ⬆️ |
| python-3.14 | 100.00% <100.00%> (+1.37%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/docket/cli/_support.py | 100.00% <ø> (ø) |
| tests/cli/run.py | 100.00% <ø> (ø) |
| tests/cli/test_iterate_with_timeout.py | 100.00% <ø> (ø) |
| tests/cli/test_snapshot.py | 100.00% <ø> (ø) |
| tests/test_docket_registration.py | 100.00% <100.00%> (ø) |

... and 97 files with indirect coverage changes



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1538e9f043


including single-node Redis, Redis Cluster, and Valkey variants.
"""

import fcntl


P2: Guard Unix-only `fcntl` import for cross-platform tests

This introduces a hard dependency on `fcntl`, which is unavailable on Windows; because `tests/conftest.py` imports `tests._container` unconditionally, pytest collection now fails immediately on Windows even for non-cluster or memory-backend runs. If contributors are expected to run tests cross-platform (the project metadata is OS-independent), this should be behind a platform guard or use a portable locking fallback.
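A minimal sketch of the platform guard the review suggests, assuming that simply skipping the lock where `fcntl` is missing is acceptable; a true portable fallback (for example `msvcrt.locking` on Windows) would be the alternative. `exclusive_lock` is a hypothetical helper, not code from this PR.

```python
import contextlib
from typing import IO, Iterator

try:
    import fcntl
except ImportError:  # e.g. Windows, which has no fcntl module
    fcntl = None  # type: ignore[assignment]


@contextlib.contextmanager
def exclusive_lock(lock_file: IO[str]) -> Iterator[bool]:
    """Hold an exclusive flock on lock_file; yield whether a real lock was taken."""
    if fcntl is None:
        # No lock available on this platform: proceed unlocked.
        yield False
        return
    fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
    try:
        yield True
    finally:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
```

Importing `fcntl` inside a `try` keeps test collection working everywhere, while Unix runs still get real serialization.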


Cluster CI jobs consistently run right at the 4-minute timeout, and when
any test runs slightly slow the whole job gets cancelled. This has been
showing up in roughly a third of recent CI runs:

https://github.com/chrisguidry/docket/actions/runs/22025359927/job/63641245074

The 91 CLI tests are subprocess-based and don't exercise backend-specific
behavior — they spawn `python -m docket ...` processes and check output.
Running them against every Python x Backend combination (30 matrix
entries) is wasted effort.

This moves CLI tests to their own job that varies by Python version but
uses a single Redis backend (8.0). The main test matrix now passes
`--ignore=tests/cli` so cluster/valkey/memory jobs only run the tests
that actually care about the backend.

Local `pytest` runs are unaffected — they still use the `pyproject.toml`
coverage config, run all tests, and cover everything.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
chrisguidry changed the title from "Serialize cluster image builds with file lock" to "Serialize cluster image builds and split CLI tests into separate CI job" on Feb 14, 2026
chrisguidry and others added 9 commits February 14, 2026 18:13
The old .coveragerc-memory was missing three files from its omit list
that pyproject.toml had (tests/_container.py, tests/_key_leak_checker.py,
src/docket/_prometheus_exporter.py). Memory backend uploads included
those files with partial coverage, dragging the Codecov project total
to 98.63%.

The new .coveragerc-core has a consistent omit list, and this also adds an
ignore section to codecov.yml so Codecov never counts these files
regardless of what coverage.py reports.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
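A check of the kind that would have caught the drift between the two omit lists, sketched with the stdlib `configparser`. The reference set holds the three files named in the commit message above; the stale config contents are hypothetical, and `coveragerc_omit`/`missing_omits` are illustrative helpers, not part of the repository.

```python
import configparser


def coveragerc_omit(text: str) -> set[str]:
    """Return the [run] omit patterns from .coveragerc-style INI text."""
    parser = configparser.ConfigParser(interpolation=None)  # patterns may contain '%'
    parser.read_string(text)
    raw = parser.get("run", "omit", fallback="")
    # Split on newlines and commas so either list style is handled.
    return {
        pattern.strip()
        for line in raw.splitlines()
        for pattern in line.split(",")
        if pattern.strip()
    }


def missing_omits(candidate: set[str], reference: set[str]) -> set[str]:
    """Patterns the reference config omits that the candidate config does not."""
    return reference - candidate


# The three files the commit says .coveragerc-memory was missing.
REFERENCE_OMIT = {
    "tests/_container.py",
    "tests/_key_leak_checker.py",
    "src/docket/_prometheus_exporter.py",
}

# Hypothetical contents for an out-of-date config file.
STALE_RC = """\
[run]
omit =
    tests/cli/test_*.py
"""

drift = missing_omits(coveragerc_omit(STALE_RC), REFERENCE_OMIT)
```

Running a comparison like this in CI would flag all three files before a partial-coverage upload could drag the project total down.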
The first CI run of the split showed core tests failing at 99% because
`tests/cli/run.py` and `tests/cli/waiting.py` leaked through the
`test_*.py` glob, and `register_collection` in `docket.py` was only
tested via CLI tests.

Changes:
- Add `--cov-fail-under=100` to pyproject.toml as the local default
- Scope CLI job coverage to just `cli.py`, `_cli_support.py`, and
  `tests/cli/` so 100% is achievable for that slice
- Widen coveragerc-core omit to `tests/cli/*.py` (not just `test_*.py`)
- Add `test_register_collection` to core tests
- Remove stale `.coveragerc-memory` reference in `tests/cli/run.py`

Co-Authored-By: Claude Opus 4.6 <[email protected]>
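Why the helper modules leaked: `run.py` and `waiting.py` don't match a `test_*.py` pattern. The sketch below uses the stdlib `fnmatch` as a stand-in for coverage.py's own pattern matching (which also deals with absolute paths and `**`); for these simple relative paths it should reach the same verdict.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "tests/cli/test_*.py"  # omit pattern before the fix
NEW_PATTERN = "tests/cli/*.py"       # widened omit pattern

PATHS = [
    "tests/cli/test_snapshot.py",  # test module: omitted under either pattern
    "tests/cli/run.py",            # helper module: leaked under the old pattern
    "tests/cli/waiting.py",        # helper module: leaked under the old pattern
]

for path in PATHS:
    # fnmatchcase avoids platform-dependent case folding.
    old = fnmatchcase(path, OLD_PATTERN)
    new = fnmatchcase(path, NEW_PATTERN)
    print(f"{path}: omitted before={old}, omitted after={new}")
```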
--cov expects packages/directories, not individual files, so scoping
coverage to just cli.py and _cli_support.py doesn't work with subprocess
coverage. Instead, the CLI job uploads coverage for everything and
Codecov merges it with core test uploads to enforce 100% across the
board.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The previous attempt used --cov with file paths, but --cov only accepts
packages/directories. Now using --cov=src/docket --cov=tests/cli with an
explicit omit list in .coveragerc-cli that excludes all non-CLI source.
Only cli.py, _cli_support.py, and tests/cli/ helper modules are measured.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
--cov accepts package/module names, not file paths. Using
--cov=docket.cli --cov=docket._cli_support scopes coverage to just
those modules without needing a .coveragerc-cli omit list to maintain.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The `--cov=docket.cli` approach triggered a beartype circular import
because coverage.py imports the module early to find its path. Moving
CLI into `src/docket/cli/` lets us use `--cov=src/docket/cli` instead,
which is just a directory lookup — no imports, no beartype drama.

Also sets the stage for splitting CLI into multiple files down the road.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
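The commit attributes the beartype circular import to coverage.py importing the module early. The sketch below shows why a dotted module name can trigger imports while a directory path cannot, assuming the lookup behaves like `importlib.util.find_spec`, which is documented to import the parent package of a dotted name first. `demo_pkg` is a throwaway package created for the demo, and this is not a reproduction of coverage.py's internals.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Parent package whose import leaves a marker file behind (a stand-in for
# import-time work in a real package's __init__.py).
PARENT_INIT = (
    "from pathlib import Path\n"
    "Path(__file__).with_name('IMPORTED').touch()\n"
)


def compare_lookups() -> tuple[bool, bool]:
    """Return (imported after a directory check, imported after a module lookup)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        (pkg / "cli").mkdir(parents=True)
        (pkg / "__init__.py").write_text(PARENT_INIT)
        (pkg / "cli" / "__init__.py").write_text("")
        marker = pkg / "IMPORTED"

        # Directory-style target (like --cov=src/docket/cli): a filesystem check only.
        if not (pkg / "cli").is_dir():
            raise RuntimeError("demo package was not created")
        after_dir_check = marker.exists()

        # Module-style target (like --cov=docket.cli): find_spec on a dotted
        # name imports the parent package, running its __init__.py.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.util.find_spec("demo_pkg.cli")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
        after_module_lookup = marker.exists()

    return after_dir_check, after_module_lookup
```

`compare_lookups()` should return `(False, True)`: only the module-style lookup executes the parent package's `__init__.py`, which is where import-time side effects such as beartype hooks would run.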
These tests cover `cli/_support.py` code that's now excluded from core
coverage. Without moving them, the `StopAsyncIteration` break path
wasn't covered by the CLI job (it was covered by core tests, but core
omits CLI source).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@chrisguidry chrisguidry merged commit f410dfb into main Feb 15, 2026
83 checks passed
@chrisguidry chrisguidry deleted the serialize-cluster-build branch February 15, 2026 00:22

2 participants