Skip to content

ci: speed up schema crash test (CSafeLoader + xdist-safe aggregation)#3873

Merged
jlowin merged 4 commits intomainfrom
ci/schema-crash-nightly
Apr 12, 2026
Merged

ci: speed up schema crash test (CSafeLoader + xdist-safe aggregation)#3873
jlowin merged 4 commits intomainfrom
ci/schema-crash-nightly

Conversation

@jlowin
Copy link
Copy Markdown
Member

@jlowin jlowin commented Apr 12, 2026

Follow-up to #3826.

The real-world schema crash test was running ~700 provider-level pytest cases serially, taking 25-40 minutes on PRs touching schema-to-type code. Three changes speed it up and make parallel execution correct:

  1. CSafeLoader instead of yaml.safe_load. Profiling showed ~80% of wall-clock was pure-Python YAML parsing — safe_load() doesn't pick up libyaml automatically. Switching to the C loader with the same safety semantics gives ~8x speedup per file.

  2. -n auto in the pytest command. GitHub's standard ubuntu runners have 2 cores, so this gives ~2x on top of the YAML win.

  3. Disk-backed result persistence + pytest_sessionfinish aggregation. The previous aggregate test (test_z_aggregate_crash_rate) read a module-level _results dict, which broke under xdist because each worker is a separate process. Now each per-provider test writes its counts to SCHEMA_CRASH_RESULTS_DIR, and a pytest_sessionfinish hook in conftest.py reads them all on the xdist master after workers finish. Baseline violations set session.exitstatus to fail the run.

Net: ~3-5 minutes expected in CI (down from ~25-40). One incidental observation for @strawgate — on my local run the TypeErrors count dropped from the previous baseline of 388 to 0. The JSON round-trip should produce identical input regardless of YAML loader, so it's worth a second look to confirm nothing silently changed.

@marvin-context-protocol marvin-context-protocol Bot added enhancement Improvement to existing functionality. For issues and smaller PR improvements. tests labels Apr 12, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 417c9ea883

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

notify:
name: Notify on failure
needs: [schema_crash_test]
if: failure() && github.event.pull_request == null
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict failure issue creation to main-branch runs

The notify job currently runs on any failed workflow_dispatch execution because its condition only checks failure() and github.event.pull_request == null. Since manual dispatch can target non-main branches, a contributor’s failed branch run will open/update the shared "schema crash test failing" issue with a "main branch" title/body, creating false alarms and potentially masking real main failures until a later main success closes it. Add the same github.ref == 'refs/heads/main' guard used by close-on-success so only main-branch failures create this issue.

Useful? React with 👍 / 👎.

The real-world schema crash test ran ~700 provider cases serially, taking
25-40 minutes wall-clock on PRs touching json_schema code. Parallelizing
with -n auto uses the runner's cores for 4-8x speedup.
@jlowin jlowin force-pushed the ci/schema-crash-nightly branch from 417c9ea to 80bccc6 Compare April 12, 2026 18:13
@jlowin jlowin changed the title ci: run schema crash test nightly instead of per-PR ci: parallelize schema crash test with -n auto Apr 12, 2026
The real-world schema crash test was dominated by pure-Python YAML parsing
(~80% of wall-clock on AWS/GitHub/Google providers). Switching to libyaml's
CSafeLoader gives ~8x speedup per file with identical safety guarantees.

Prior to this change, -n auto broke the aggregate baseline check because
the module-level _results accumulator only saw results from whichever xdist
worker happened to run test_z_aggregate_crash_rate. Each per-provider test
now persists its counts to SCHEMA_CRASH_RESULTS_DIR, and aggregation moves
to a pytest_sessionfinish hook in conftest.py — that hook runs on the xdist
master after all workers finish, giving a correct cross-worker total.
@jlowin jlowin changed the title ci: parallelize schema crash test with -n auto ci: speed up schema crash test (CSafeLoader + xdist-safe aggregation) Apr 12, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f4a28ee0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

RUN_REAL_WORLD_SCHEMA_TEST: "1"
OPENAPI_DIRECTORY_PATH: /tmp/openapi-directory
run: uv run pytest tests/utilities/json_schema_type/test_real_world_schemas.py -m integration -v -s --timeout-method=thread
run: uv run pytest tests/utilities/json_schema_type/test_real_world_schemas.py -m integration -v -n auto --timeout-method=thread
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent xdist workers from racing on repo setup

Running this workflow with -n auto makes multiple workers call test_provider_schemas concurrently, and each call still executes _ensure_repo() (in tests/utilities/json_schema_type/test_real_world_schemas.py) against the shared OPENAPI_DIRECTORY_PATH (/tmp/openapi-directory). When the checkout is not already at the pinned commit, _ensure_repo() does shutil.rmtree(CLONE_DIR) and reclones, so two workers can delete/clone the same directory at once and intermittently raise git clone/git CalledProcessError, turning the job flaky before any schema assertions run.

Useful? React with 👍 / 👎.

Under -n auto, multiple xdist workers can enter _ensure_repo() concurrently
and race on shutil.rmtree + git clone against the shared CLONE_DIR. An
fcntl.flock around the rmtree/clone path serializes the reclone, and a
lock-free fast path returns early when HEAD already matches the pinned
commit (covers the steady state where the workflow's clone step has
already prepared the directory).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce268f4486

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


from __future__ import annotations

import fcntl
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard fcntl import on non-POSIX platforms

test_real_world_schemas.py now imports fcntl at module import time, but fcntl is unavailable on Windows. Our default CI (.github/workflows/run-tests.yml, matrix includes windows-latest) runs pytest tests -m "not integration ...", which still imports test modules during collection, so this raises ModuleNotFoundError before marker deselection and can fail the whole Windows test job on unrelated PRs. Please make the lock implementation conditional (or defer import) so collection succeeds on Windows.

Useful? React with 👍 / 👎.

@marvin-context-protocol
Copy link
Copy Markdown
Contributor

Test Failure Analysis

Summary: The Windows unit test job fails during pytest collection because test_real_world_schemas.py now has a bare top-level import fcntl, and fcntl is a Unix-only module not available on Windows.

Root Cause: The PR adds import fcntl at line 15 of tests/utilities/json_schema_type/test_real_world_schemas.py to support file locking in _ensure_repo(). Even though this test file only contains integration-marked tests (which are excluded from the Windows unit test run), pytest must still collect (import) the module to filter by marker — and that import fails with ModuleNotFoundError: No module named 'fcntl'.

Suggested Solution: Guard the import so it doesn't fail on Windows. Since fcntl is only used in _ensure_repo() (which never runs on Windows because the integration test is Linux-only), a no-op fallback is safe:

In tests/utilities/json_schema_type/test_real_world_schemas.py, replace:

import fcntl

with:

try:
    import fcntl as _fcntl
    _HAS_FLOCK = True
except ImportError:
    _HAS_FLOCK = False  # Windows — no file locking available

And update _ensure_repo() to skip locking on Windows:

with open(lock_path, "w") as lf:
    if _HAS_FLOCK:
        _fcntl.flock(lf.fileno(), _fcntl.LOCK_EX)
    try:
        ...
    finally:
        if _HAS_FLOCK:
            _fcntl.flock(lf.fileno(), _fcntl.LOCK_UN)

This matches the existing precedent in the same file where signal.SIGALRM is guarded with use_alarm = hasattr(signal, "SIGALRM").

Detailed Analysis

Failing job: Tests: Python 3.10 on windows-latest (job ID 70992524754)

Error log excerpt:

ERROR collecting tests/utilities/json_schema_type/test_real_world_schemas.py
ImportError while importing test module '...\test_real_world_schemas.py'.
tests\utilities\json_schema_type\test_real_world_schemas.py:15: in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'

The test was introduced/extended as an integration test, but since pytest loads all test modules during collection (to read their markers), the top-level import fcntl fails on Windows even though none of these integration tests are actually selected for running.

The signal.SIGALRM usage in the same file is already correctly guarded (use_alarm = hasattr(signal, "SIGALRM")), so the same pattern applies here.

Related Files
  • tests/utilities/json_schema_type/test_real_world_schemas.py — adds import fcntl at module level (line 15 in PR diff); this is the file that needs the conditional import guard
  • tests/utilities/json_schema_type/conftest.py — new file added in this PR; no Windows-incompatible imports

fcntl is POSIX-only; importing at module level breaks pytest collection on
Windows (the integration test is skipped on Windows but collection still
imports the module to read its markers). Move the import inside
_ensure_repo, which only runs on POSIX.
@jlowin jlowin merged commit 7f80f78 into main Apr 12, 2026
9 checks passed
@jlowin jlowin deleted the ci/schema-crash-nightly branch April 12, 2026 20:34
@strawgate
Copy link
Copy Markdown
Collaborator

strawgate commented Apr 12, 2026

I think I had two prs that fixed bugs in this area so depending on merge order the type issues may be fixed and we can update the baseline

Will look in a bit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality. For issues and smaller PR improvements. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants