Skip to content

Add real-world schema crash test (232K schemas from APIs.guru)#3826

Merged
jlowin merged 1 commit intomainfrom
test/real-world-schema-crash-test
Apr 11, 2026
Merged

Add real-world schema crash test (232K schemas from APIs.guru)#3826
jlowin merged 1 commit intomainfrom
test/real-world-schema-crash-test

Conversation

@strawgate
Copy link
Copy Markdown
Collaborator

@strawgate strawgate commented Apr 10, 2026

Summary

Adds an integration test that runs json_schema_to_type against every schema definition in the APIs.guru openapi-directory — 232,092 schemas from 4,120 real-world OpenAPI specs.

The test snapshots crash counts as regression baselines so future changes can't silently increase the crash rate. As we fix crash patterns, we ratchet the baselines down.

Schemas tested:  232,092
TypeErrors:      2,342  (e.x. datetime serialization in defaults)
SchemaErrors:      273  (e.x. invalid regexes in specs — not our code)
Timeouts:            0
Other errors:        0
Total crashes:   2,615  (1.13%)

Skipped by default — requires a local clone of the openapi-directory repo. Run with:

git clone --depth 1 https://github.com/APIs-guru/openapi-directory /tmp/openapi-directory
uv run pytest tests/utilities/json_schema_type/test_real_world_schemas.py -m integration -s

Takes ~13 minutes on a MacBook Pro. Each schema gets a 5-second SIGALRM timeout to catch infinite loops.

🤖 Generated with Claude Code

@marvin-context-protocol marvin-context-protocol Bot added enhancement Improvement to existing functionality. For issues and smaller PR improvements. tests openapi Related to OpenAPI integration, parsing, or code generation features. labels Apr 10, 2026
@marvin-context-protocol
Copy link
Copy Markdown
Contributor

marvin-context-protocol Bot commented Apr 10, 2026

Test Failure Analysis

(Updated to reflect latest workflow run — previous analysis re: ruff lint violations is no longer current.)

Summary: The Tests: Python 3.10 on windows-latest job failed due to a test timeout in tests/client/test_streamable_http.py. This is a pre-existing, intermittent flaky failure on Windows that is unrelated to this PR.

Root Cause: A test in tests/client/test_streamable_http.py timed out while blocked in the Windows asyncio I/O selector (select.select()). This is a known class of flakiness on Windows Python 3.10 where asyncio's SelectSelector (used instead of Linux's epoll) can hang when a server-side ASGI handler returns without completing the HTTP response (ERROR: ASGI callable returned without completing response). The test process stalls waiting for data that never arrives, eventually hitting the 5-second pytest-timeout.

This PR is not the cause. The most recent push to main (run 24256002384, April 10) had the identical failure pattern — Tests: Python 3.10 on windows-latest timed out in tests/cli/test_run.py for the same reason. The failing test varies between runs, confirming this is intermittent infrastructure-level flakiness, not a regression from this PR.

Suggested Solution: No action needed on this PR. The failure is pre-existing on main and not introduced by these changes. Re-running the CI job will likely pass (or fail in a different test, consistent with the flaky pattern).

Detailed Analysis

Failed job: Tests: Python 3.10 on windows-latest (job ID 70872661074)

Timeout stack trace from this run:

tests\client\test_streamable_http.py +++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Captured stdout ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO:     127.0.0.1:51818 - "POST /mcp HTTP/1.1" 200 OK
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Captured stderr ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR:    ASGI callable returned without completing response.
...
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\selectors.py", line 315, in _select
    r, w, x = select.select(r, w, w, timeout)
+++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++

Comparison with most recent main branch failure (run 24256002384, April 10):

tests\cli\test_run.py ... Timeout
  File "C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\selectors.py", line 315, in _select
    r, w, x = select.select(r, w, w, timeout)

Same root cause, different test — confirming this is flaky infrastructure, not a code regression.

All other jobs (Ubuntu Python 3.10, Ubuntu Python 3.13, integration tests, MCP conformance tests, lowest-direct dependencies) passed cleanly on this PR run.

Related Files
  • tests/client/test_streamable_http.py — the test that timed out; not modified by this PR
  • tests/utilities/json_schema_type/test_real_world_schemas.py — the new file added by this PR (skipped by default, requires local openapi-directory clone)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49a94be1b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +37 to +38
pytest.mark.skipif(
not CLONE_DIR.exists(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove skip gate that blocks real-world crash test execution

The module-level skipif(not CLONE_DIR.exists()) causes this test to skip on a fresh checkout, so _ensure_repo() never runs and the repo is never cloned despite the test/docs saying it will. In practice this means CI and most local runs silently skip the regression baseline, so crash-rate regressions are not detected unless the directory is pre-created manually.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from 49a94be to fce60b1 Compare April 10, 2026 21:16
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fce60b179a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +16 to +20
paths:
- "src/fastmcp/utilities/json_schema_type.py"
- "src/fastmcp/utilities/json_schema.py"
- "src/fastmcp/utilities/openapi/**"
- "src/fastmcp/server/providers/openapi/**"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add client mixin path to PR workflow trigger

The new workflow only runs on pull_request when files under the listed paths change, but src/fastmcp/client/mixins/tools.py is included for push and omitted for pull_request. That means a PR that changes this OpenAPI-related client path can merge without running the crash baseline, and regressions are only caught after merge when push runs on main. Include the same path in the PR filter so this gate works pre-merge.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from fce60b1 to ff9668f Compare April 10, 2026 21:24
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff9668fbe1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +84 to +87
if CLONE_DIR.exists():
import shutil

shutil.rmtree(CLONE_DIR)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard removal of arbitrary existing clone path

_ensure_repo() recursively deletes CLONE_DIR whenever it exists but is not already at the pinned commit, without validating that the path is actually an openapi-directory checkout. Since CLONE_DIR comes from OPENAPI_DIRECTORY_PATH, a misconfigured value (for example, a parent directory like /tmp) would cause unrelated files to be deleted during a test run. Please fail safely (or require explicit opt-in) instead of unconditional rmtree on any existing path.

Useful? React with 👍 / 👎.

Comment on lines +179 to +180
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve pytest timeout alarm when resetting per-schema alarm

The per-schema cleanup calls signal.alarm(0), which cancels any existing process alarm, including the pytest.mark.timeout(1200) guard on POSIX runs that use SIGALRM. Because the code only restores the handler (not the prior timer), the global test timeout can be silently disabled after the first schema, so a later non-interruptible hang may run until the workflow/job timeout instead of failing at 1200s.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch 5 times, most recently from b08f26c to 4567d69 Compare April 11, 2026 00:12
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4567d6998a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@pytest.mark.integration
@pytest.mark.timeout(300, method="thread")
@pytest.mark.parametrize("provider", _collect_providers())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Defer provider parametrization until repo is ensured

The provider list is computed at collection time via @pytest.mark.parametrize("provider", _collect_providers()), but _ensure_repo() (which clones/checks out the pinned commit) only runs inside the test body. This means the parametrization can be built from the wrong checkout (or from no checkout at all), so in RUN_REAL_WORLD_SCHEMA_TEST=1 fresh-checkout runs pytest collects an empty provider set ([NOTSET]), and with a non-pinned local clone it can miss providers that exist in the pinned dataset. As a result, the per-provider timeout guard is silently skipped or incomplete even though the test is intended to give per-provider coverage/progress.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from 4567d69 to fc6bf28 Compare April 11, 2026 00:49
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc6bf2816d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# In CI (RUN_REAL_WORLD_SCHEMA_TEST=1), _ensure_repo clones automatically.
# Locally, skip unless the repo is already cloned to avoid a surprise 200MB download.
_run_in_ci = os.environ.get("RUN_REAL_WORLD_SCHEMA_TEST") == "1"
_skip_locally = not _run_in_ci and not CLONE_DIR.exists()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip local run when clone path exists but is not repo

_skip_locally only checks CLONE_DIR.exists(), so any existing path at /tmp/openapi-directory (including an unrelated directory or file) disables the skip and runs the test, after which _ensure_repo() raises RuntimeError because the path is not an openapi-directory clone. This turns a test documented as “skipped by default unless cloned” into a hard failure in normal local pytest runs when that path happens to exist, which is a common /tmp collision case.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from fc6bf28 to 31e99dd Compare April 11, 2026 01:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31e99dd3de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".



# Accumulator: per-provider tests store results here, final test reads them.
_results: dict[str, ProviderResult] = {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make aggregate crash baseline xdist-safe

This module aggregates provider results in the process-local _results dict, so when the test is run with pytest -n auto (xdist), each worker builds a separate partial view and test_z_aggregate_crash_rate only sees whatever ran on that single worker. In practice this can either skip the aggregate check (_results empty on that worker) or fail the >200_000 assertion with partial totals, so the regression baseline is not reliable under parallel execution.

Useful? React with 👍 / 👎.

@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from 31e99dd to b56818d Compare April 11, 2026 02:24
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b56818dece

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +82 to +83
str(k) if not isinstance(k, str) else k: _normalize_yaml_types(v)
for k, v in obj.items()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize required entries when coercing YAML keys

Converting every dict key to str here changes property names (e.g. YAML 200: becomes '200') but list scalars are left as-is, so required: [200] remains an int. In _create_dataclass, required detection uses prop_name in required, so '200' no longer matches 200 and a required field is silently treated as optional. This weakens validation for YAML-loaded schemas with unquoted numeric keys instead of preserving their JSON semantics.

Useful? React with 👍 / 👎.

Integration test that runs json_schema_to_type against 232K schemas
from 4,120 real-world OpenAPI specs (APIs.guru openapi-directory).
Snapshots crash counts as regression baselines so future changes
can't silently increase the crash rate.

Current baseline (openapi-directory@f7207cf0):
  TypeErrors:   2,342 (datetime serialization)
  SchemaErrors:   273 (invalid regexes in specs)
  Timeouts:         0
  Other:            0

Skipped unless openapi-directory is cloned locally.
Run with: pytest -m integration tests/.../test_real_world_schemas.py

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@strawgate strawgate force-pushed the test/real-world-schema-crash-test branch from b56818d to 2915cba Compare April 11, 2026 02:53
Copy link
Copy Markdown
Member

@jlowin jlowin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured crash regression suite — good isolation from the main test run, and the YAML type normalization fix is a nice bonus.

@jlowin jlowin merged commit 4685599 into main Apr 11, 2026
10 checks passed
@jlowin jlowin deleted the test/real-world-schema-crash-test branch April 11, 2026 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality. For issues and smaller PR improvements. openapi Related to OpenAPI integration, parsing, or code generation features. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants