Add real-world schema crash test (232K schemas from APIs.guru)#3826
Add real-world schema crash test (232K schemas from APIs.guru)#3826
Conversation
Test Failure Analysis(Updated to reflect latest workflow run — previous analysis re: ruff lint violations is no longer current.) Summary: The Root Cause: A test in This PR is not the cause. The most recent push to Suggested Solution: No action needed on this PR. The failure is pre-existing on Detailed AnalysisFailed job: Timeout stack trace from this run: Comparison with most recent main branch failure (run 24256002384, April 10): Same root cause, different test — confirming this is flaky infrastructure, not a code regression. All other jobs (Ubuntu Python 3.10, Ubuntu Python 3.13, integration tests, MCP conformance tests, lowest-direct dependencies) passed cleanly on this PR run. Related Files
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 49a94be1b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pytest.mark.skipif( | ||
| not CLONE_DIR.exists(), |
There was a problem hiding this comment.
Remove skip gate that blocks real-world crash test execution
The module-level skipif(not CLONE_DIR.exists()) causes this test to skip on a fresh checkout, so _ensure_repo() never runs and the repo is never cloned despite the test/docs saying it will. In practice this means CI and most local runs silently skip the regression baseline, so crash-rate regressions are not detected unless the directory is pre-created manually.
Useful? React with 👍 / 👎.
49a94be to
fce60b1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fce60b179a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| paths: | ||
| - "src/fastmcp/utilities/json_schema_type.py" | ||
| - "src/fastmcp/utilities/json_schema.py" | ||
| - "src/fastmcp/utilities/openapi/**" | ||
| - "src/fastmcp/server/providers/openapi/**" |
There was a problem hiding this comment.
Add client mixin path to PR workflow trigger
The new workflow only runs on pull_request when files under the listed paths change, but src/fastmcp/client/mixins/tools.py is included for push and omitted for pull_request. That means a PR that changes this OpenAPI-related client path can merge without running the crash baseline, and regressions are only caught after merge when push runs on main. Include the same path in the PR filter so this gate works pre-merge.
Useful? React with 👍 / 👎.
fce60b1 to
ff9668f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff9668fbe1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if CLONE_DIR.exists(): | ||
| import shutil | ||
|
|
||
| shutil.rmtree(CLONE_DIR) |
There was a problem hiding this comment.
Guard removal of arbitrary existing clone path
_ensure_repo() recursively deletes CLONE_DIR whenever it exists but is not already at the pinned commit, without validating that the path is actually an openapi-directory checkout. Since CLONE_DIR comes from OPENAPI_DIRECTORY_PATH, a misconfigured value (for example, a parent directory like /tmp) would cause unrelated files to be deleted during a test run. Please fail safely (or require explicit opt-in) instead of unconditional rmtree on any existing path.
Useful? React with 👍 / 👎.
| signal.alarm(0) | ||
| signal.signal(signal.SIGALRM, old_handler) |
There was a problem hiding this comment.
Preserve pytest timeout alarm when resetting per-schema alarm
The per-schema cleanup calls signal.alarm(0), which cancels any existing process alarm, including the pytest.mark.timeout(1200) guard on POSIX runs that use SIGALRM. Because the code only restores the handler (not the prior timer), the global test timeout can be silently disabled after the first schema, so a later non-interruptible hang may run until the workflow/job timeout instead of failing at 1200s.
Useful? React with 👍 / 👎.
b08f26c to
4567d69
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4567d6998a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| @pytest.mark.integration | ||
| @pytest.mark.timeout(300, method="thread") | ||
| @pytest.mark.parametrize("provider", _collect_providers()) |
There was a problem hiding this comment.
Defer provider parametrization until repo is ensured
The provider list is computed at collection time via @pytest.mark.parametrize("provider", _collect_providers()), but _ensure_repo() (which clones/checks out the pinned commit) only runs inside the test body. This means the parametrization can be built from the wrong checkout (or from no checkout at all), so in RUN_REAL_WORLD_SCHEMA_TEST=1 fresh-checkout runs pytest collects an empty provider set ([NOTSET]), and with a non-pinned local clone it can miss providers that exist in the pinned dataset. As a result, the per-provider timeout guard is silently skipped or incomplete even though the test is intended to give per-provider coverage/progress.
Useful? React with 👍 / 👎.
4567d69 to
fc6bf28
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc6bf2816d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # In CI (RUN_REAL_WORLD_SCHEMA_TEST=1), _ensure_repo clones automatically. | ||
| # Locally, skip unless the repo is already cloned to avoid a surprise 200MB download. | ||
| _run_in_ci = os.environ.get("RUN_REAL_WORLD_SCHEMA_TEST") == "1" | ||
| _skip_locally = not _run_in_ci and not CLONE_DIR.exists() |
There was a problem hiding this comment.
Skip local run when clone path exists but is not repo
_skip_locally only checks CLONE_DIR.exists(), so any existing path at /tmp/openapi-directory (including an unrelated directory or file) disables the skip and runs the test, after which _ensure_repo() raises RuntimeError because the path is not an openapi-directory clone. This turns a test documented as “skipped by default unless cloned” into a hard failure in normal local pytest runs when that path happens to exist, which is a common /tmp collision case.
Useful? React with 👍 / 👎.
fc6bf28 to
31e99dd
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31e99dd3de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
|
|
||
| # Accumulator: per-provider tests store results here, final test reads them. | ||
| _results: dict[str, ProviderResult] = {} |
There was a problem hiding this comment.
Make aggregate crash baseline xdist-safe
This module aggregates provider results in the process-local _results dict, so when the test is run with pytest -n auto (xdist), each worker builds a separate partial view and test_z_aggregate_crash_rate only sees whatever ran on that single worker. In practice this can either skip the aggregate check (_results empty on that worker) or fail the >200_000 assertion with partial totals, so the regression baseline is not reliable under parallel execution.
Useful? React with 👍 / 👎.
31e99dd to
b56818d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b56818dece
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| str(k) if not isinstance(k, str) else k: _normalize_yaml_types(v) | ||
| for k, v in obj.items() |
There was a problem hiding this comment.
Normalize
required entries when coercing YAML keys
Converting every dict key to str here changes property names (e.g. YAML 200: becomes '200') but list scalars are left as-is, so required: [200] remains an int. In _create_dataclass, required detection uses prop_name in required, so '200' no longer matches 200 and a required field is silently treated as optional. This weakens validation for YAML-loaded schemas with unquoted numeric keys instead of preserving their JSON semantics.
Useful? React with 👍 / 👎.
Integration test that runs json_schema_to_type against 232K schemas from 4,120 real-world OpenAPI specs (APIs.guru openapi-directory). Snapshots crash counts as regression baselines so future changes can't silently increase the crash rate. Current baseline (openapi-directory@f7207cf0): TypeErrors: 2,342 (datetime serialization) SchemaErrors: 273 (invalid regexes in specs) Timeouts: 0 Other: 0 Skipped unless openapi-directory is cloned locally. Run with: pytest -m integration tests/.../test_real_world_schemas.py 🤖 Generated with Claude Code Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
b56818d to
2915cba
Compare
jlowin
left a comment
There was a problem hiding this comment.
Well-structured crash regression suite — good isolation from the main test run, and the YAML type normalization fix is a nice bonus.
Summary
Adds an integration test that runs
json_schema_to_typeagainst every schema definition in the APIs.guru openapi-directory — 232,092 schemas from 4,120 real-world OpenAPI specs.The test snapshots crash counts as regression baselines so future changes can't silently increase the crash rate. As we fix crash patterns, we ratchet the baselines down.
Skipped by default — requires a local clone of the openapi-directory repo. Run with:
Takes ~13 minutes on a MacBook Pro. Each schema gets a 5-second SIGALRM timeout to catch infinite loops.
🤖 Generated with Claude Code