Skip to content

fix: strip titles from bare-metadata nodes (Gemini 2.5 Flash)#3881

Merged
jlowin merged 4 commits intomainfrom
fix/gemini-title-stripping
Apr 13, 2026
Merged

fix: strip titles from bare-metadata nodes (Gemini 2.5 Flash)#3881
jlowin merged 4 commits intomainfrom
fix/gemini-title-stripping

Conversation

@jlowin
Copy link
Copy Markdown
Member

@jlowin jlowin commented Apr 12, 2026

Follow-up to #3861.

Two edge cases Codex flagged on the original Gemini title-stripping PR that were left open:

Bare-metadata nodes leak through. Pydantic emits {"title": "X"} for Any-typed fields — no type, no properties, nothing to make the heuristic fire. Gemini 2.5 Flash rejects those with MALFORMED_FUNCTION_CALL, meaning the original fix didn't actually resolve the bug for Pydantic schemas that include any Any-typed field. The heuristic now strips title when every remaining key is a pure JSON Schema metadata keyword (title, description, deprecated, readOnly, writeOnly), which covers the Pydantic-Any case without touching user property dicts.

Literal values get corrupted. The traversal recursed into default, const, examples, and enum unconditionally. A user default shaped like {"title": "My Dashboard", "type": "vis"} could silently have its "title" field interpreted as schema metadata and stripped. These keywords carry literal user data, not sub-schemas, so the traversal now skips them.

# Before (Pydantic Any field): Gemini still crashes
{"properties": {"anyfield": {"title": "Anyfield"}}}  # title survived
# After: title stripped, Gemini accepts

# Before (user default with metadata-shaped dict):
{"default": {"title": "X", "type": "Y"}}  # title silently removed
# After: default preserved as-is

The "property literally named title" case Codex raised is already handled correctly by the existing isinstance(title, str) guard — when properties["title"] is a dict (a sub-schema), it isn't stripped. Covered by an existing regression test (test_title_pruning_preserves_parameter_named_title).

Two follow-ups to #3861's Gemini 2.5 Flash title-stripping fix:

1. Pydantic emits bare {"title": "X"} schemas for Any-typed fields with no
   sibling type/properties. The original heuristic required a schema keyword
   before stripping, so those bare nodes leaked through and Gemini still hit
   MALFORMED_FUNCTION_CALL. Now we also strip when every remaining key is a
   pure metadata key (title/description/deprecated/readOnly/writeOnly).

2. The traversal no longer recurses into default, const, examples, or enum.
   Those hold literal user values — a user default of
   {"title": "X", "type": "Y"} was at risk of having its keys
   interpreted as schema metadata.
@marvin-context-protocol marvin-context-protocol Bot added the bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. label Apr 12, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b2a8db49e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/fastmcp/utilities/json_schema.py Outdated
# Keywords whose values are literal user data, not sub-schemas. Skipping
# recursion here prevents `default: {"title": "X"}` from losing the "title"
# data value because it happens to look metadata-shaped.
_LITERAL_KEYWORDS = frozenset({"default", "const", "examples", "enum"})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include example in literal-keyword skip list

The new traversal guard skips literal payload keywords via _LITERAL_KEYWORDS, but omitting example means prune_titles=True still descends into example values and strips user data fields named title. In practice, schemas produced with Pydantic json_schema_extra={"example": {"title": "Dashboard", ...}} now have that example mutated during compression, which is a regression from the previous behavior and contradicts the goal of preserving literal values.

Useful? React with 👍 / 👎.

Two follow-ups after the first round:

1. Skipping recursion on literal-keyword keys broke schemas where a user
   property is literally named 'enum' or 'default' — the traversal stopped
   at the property name and never reached the sub-schema's $ref, so
   referenced $defs got pruned as unused (AssertionError in
   test_all_primitive_field_types). Split arbitrary-key dicts (properties,
   patternProperties, $defs, definitions, dependentSchemas) into their
   own branch so their values are always traversed as sub-schemas
   regardless of key name.

2. Added 'example' (OpenAPI/Swagger 2.0 singular) alongside 'examples'
   (JSON Schema draft 7+) to the literal-keyword list. Codex flagged that
   Pydantic's json_schema_extra={'example': {'title': '...'}} was still
   getting its literal user data stripped.

3. Updated test_lambda snapshot — the whole point of this PR is that bare
   {'title': 'X'} nodes (Pydantic Any fields) now get pruned, so the
   snapshot's {'x': {'title': 'X'}} is now {'x': {}}.
@marvin-context-protocol
Copy link
Copy Markdown
Contributor

Test Failure Analysis

Summary: The test test_all_primitive_field_types in tests/client/test_elicitation.py fails across all Python versions because the new _LITERAL_KEYWORDS skip in traverse_and_clean accidentally treats a dataclass property named enum as the JSON Schema enum keyword, preventing its $ref from being tracked — so $defs/DataEnum gets pruned as "unused" while the reference remains.

Root Cause: In src/fastmcp/utilities/json_schema.py, the PR adds:

_LITERAL_KEYWORDS = frozenset({"default", "const", "examples", "enum"})

and then skips recursion into any dict key matching that set:

if key in _LITERAL_KEYWORDS:
    continue

The problem: this check runs while iterating over every dict — including a schema's properties sub-dict where keys are user-defined property names, not JSON Schema keywords. The failing test has a dataclass field enum: DataEnum, so the properties dict contains:

"enum": {"$ref": "#/$defs/DataEnum"}

When traverse_and_clean iterates over that properties dict, it sees the key "enum", hits _LITERAL_KEYWORDS, and skips recursing into {"$ref": "#/$defs/DataEnum"}. The $ref collector never sees that node, "DataEnum" is never added to root_refs, is_def_used("DataEnum") returns False, and the definition is pruned. The surviving $ref then points to nothing, causing the PointerToNowhere error at runtime.

Suggested Solution: In src/fastmcp/utilities/json_schema.py, handle the properties key specially before the _LITERAL_KEYWORDS check so that its values (property sub-schemas) are always recursed into:

for key, value in node.items():
    if skip_defs_section and key == "$defs":
        continue

    # Properties dict keys are user-defined property names, not schema keywords.
    # Recurse into each property sub-schema directly, bypassing keyword checks.
    if key == "properties" and isinstance(value, dict):
        for prop_schema in value.values():
            traverse_and_clean(prop_schema, current_def_name, depth=depth + 1)
        continue

    # Don't descend into keywords that carry literal data, not sub-schemas
    if key in _LITERAL_KEYWORDS:
        continue

    if key in ["allOf", "oneOf", "anyOf"] and isinstance(value, list):
        ...

The same bug would affect properties named "const", "default", or "examples" (any name that collides with _LITERAL_KEYWORDS), so this is the correct structural fix rather than just removing "enum" from the set.

Detailed Analysis

Failing test: tests/client/test_elicitation.py:486test_all_primitive_field_types

Error (from job logs):

fastmcp.exceptions.ToolError: PointerToNowhere: '/$defs/DataEnum' does not exist within
{'properties': {'integer': {'type': 'integer'}, ..., 'enum': {'$ref': '#/$defs/DataEnum'}},
 'required': [...], 'type': 'object'}

The schema reaching the validator has a $ref pointing to #/$defs/DataEnum but no $defs key at all — DataEnum was removed by the unused-definition pruner in compress_schema.

Trace through the new code (_single_pass_optimize in json_schema.py, Phase 2):

  1. Top-level schema: {"type": "object", "properties": {..., "enum": {"$ref": "#/$defs/DataEnum"}}, "$defs": {"DataEnum": {...}}}
  2. Traversal visits properties value (the property map).
  3. Inside the property map, key = "enum" (property name), value = {"$ref": "#/$defs/DataEnum"}.
  4. New check: "enum" in _LITERAL_KEYWORDSTrue → continue — the sub-schema is never visited.
  5. root_refs never receives "DataEnum".
  6. Phase 4: is_def_used("DataEnum")FalseDataEnum deleted from $defs.
  7. Empty $defs removed from schema.
  8. Schema now has a dangling $refPointerToNowhere at validation time.

Jobs that failed:

  • Tests with lowest-direct dependencies (job 70995129518) — Python 3.10
  • Tests: Python 3.10 on windows-latest (job 70995129524)
  • Tests: Python 3.13 on ubuntu-latest (job 70995129530)
  • Tests: Python 3.10 on ubuntu-latest (job 70995129536)
Related Files
  • src/fastmcp/utilities/json_schema.py — contains _single_pass_optimize / traverse_and_clean; this is where the fix should be applied (the for key, value in node.items() loop, roughly line 420–465 in the PR branch)
  • src/fastmcp/server/elicitation.py:324get_elicitation_schema calls compress_schema(schema) (without prune_titles, so the title changes in this PR don't directly affect the elicitation path — only the prune_defs logic is in play here)
  • tests/client/test_elicitation.py:486 — failing test test_all_primitive_field_types

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 055f50ea45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/fastmcp/utilities/json_schema.py Outdated
and isinstance(node["title"], str) # type: ignore
and (
any(k in node for k in _SCHEMA_KEYWORDS)
or all(k in _METADATA_KEYS for k in node)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit metadata-only title stripping to schema contexts

The new all(k in _METADATA_KEYS for k in node) branch removes title from any dict that happens to have metadata-like keys, even when that dict is user extension payload rather than a schema node. For example, with json_schema_extra={"x-ui": {"title": "Dashboard", "description": "..."}}, compress_schema(..., prune_titles=True) now mutates x-ui.title to nothing, which is a regression introduced by this commit (previous logic required structural schema keywords before pruning).

Useful? React with 👍 / 👎.

Codex flagged that the bare-metadata title-strip heuristic fires on any
dict whose keys are all metadata — including user extension payloads like
json_schema_extra={'x-ui': {'title': 'Dashboard', 'description': '...'}},
where the payload is opaque to us but happens to look metadata-shaped.

Fix: thread an in_schema flag through traverse_and_clean. We only apply
title/additionalProperties cleanups when we arrived via a known schema
position (root, properties/$defs/patternProperties values, items,
allOf/anyOf/oneOf/prefixItems elements, or single-schema keywords like
if/then/else/contains/propertyNames). Unknown keys (vendor extensions,
arbitrary user dicts) still get traversed for $ref collection, but
in_schema=False so no cleanups run inside them.

Also classified the schema-valued keywords explicitly (SUBSCHEMA_VALUE_KEYS
and SUBSCHEMA_LIST_KEYS) to route traversal correctly.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1d2a27594

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +552 to +556
# Unknown keys (user extensions like `x-ui`, vendor
# metadata, etc.) — descend for $ref collection but mark
# in_schema=False so cleanups don't touch user payloads.
traverse_and_clean(
value, current_def_name, depth=depth + 1, in_schema=False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Traverse unlisted schema keywords in schema mode

The fallback path now recurses unknown keys with in_schema=False, which means any valid JSON Schema keyword missing from the hardcoded sets (for example draft-07 dependencies and additionalItems) is treated as opaque payload instead of a subschema. In those branches, compress_schema(..., prune_titles=True) and prune_additional_properties=True stop applying cleanups, whereas the parent implementation still cleaned them, so this introduces a regression for schemas that rely on those standard keywords.

Useful? React with 👍 / 👎.

Codex flagged that unlisted sub-schema keywords now fall into the
in_schema=False branch and skip cleanups entirely. Add the missing
standard keywords:

- dependencies (draft-07): values may be sub-schemas (list values
  short-circuit in the list branch, so it's safe in _SUBSCHEMA_MAP_KEYS)
- additionalItems (draft-07): single sub-schema, precursor to
  unevaluatedItems
- contentSchema (2019-09+): single sub-schema for typed string payloads
@jlowin jlowin merged commit fa9ad6c into main Apr 13, 2026
9 checks passed
@jlowin jlowin deleted the fix/gemini-title-stripping branch April 13, 2026 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant