
Add search_result_serializer hook and serialize_tools_for_output_markdown#3337

Merged
jlowin merged 3 commits into PrefectHQ:main from MagnusS0:feat/code-mode-markdown-serializer
Mar 1, 2026

Conversation

@MagnusS0
Contributor

@MagnusS0 MagnusS0 commented Feb 28, 2026

Description

While migrating the OpenBB MCP server to FastMCP v3 I was playing around with CodeMode and noticed that search results were eating a lot of the context savings it's supposed to provide. The LLM gets back full JSON tool definitions (same as list_tools), which is fine for simple tools but balloons fast with anything schema-heavy.

My solution is a search_result_serializer hook on BaseSearchTransform (and CodeMode), so you can swap in whatever serialization makes sense for your use case. The built-in serialize_tools_for_output_markdown strips the JSON boilerplate and renders just what the LLM actually needs to pick and call a tool. In my simple benchmark across an 11-tool catalog with some complex OpenBB-style schemas, it cut search result tokens by ~65-70% (benchmark script). Default behavior is unchanged; this is fully opt-in.

from fastmcp import FastMCP
from fastmcp.experimental.transforms import CodeMode
from fastmcp.server.transforms.search import serialize_tools_for_output_markdown

mcp = FastMCP("Server", transforms=[
    CodeMode(search_result_serializer=serialize_tools_for_output_markdown)
])

The markdown output for a tool looks like this:

### create_document

Create a new document in the workspace with the given metadata.

**Parameters**
- `title` (string, required)
- `content` (string, required)
- `tags` (string[], required)
- `author` (string, required)
- `published` (boolean)
- `parent_id` (string?)

**Returns**
- `value` (object)

Both built-in serializers (serialize_tools_for_output_json and serialize_tools_for_output_markdown) are exported from fastmcp.server.transforms.search and work on standalone search transforms too, not just CodeMode.
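For illustration, the markdown shape above could be produced by something like this minimal standalone sketch (hypothetical code showing the general rendering approach, not the library's actual serializer implementation):

```python
# Illustrative sketch only: a minimal re-implementation of the kind of
# markdown rendering described above, not FastMCP's actual serializer.
def tool_to_markdown(name: str, description: str, schema: dict) -> str:
    lines = [f"### {name}", "", description, "", "**Parameters**"]
    required = set(schema.get("required", []))
    for param, spec in schema.get("properties", {}).items():
        type_ = spec.get("type", "any")
        suffix = ", required" if param in required else ""
        lines.append(f"- `{param}` ({type_}{suffix})")
    return "\n".join(lines)

md = tool_to_markdown(
    "create_document",
    "Create a new document in the workspace.",
    {
        "properties": {"title": {"type": "string"}, "published": {"type": "boolean"}},
        "required": ["title"],
    },
)
# md contains "### create_document" followed by the parameter bullets.
```

The point is simply that a few lines of formatting replace a full JSON Schema dump, which is where the token savings come from.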

Generated with Claude Code

Contributors Checklist

Review Checklist

  • I have self-reviewed my changes
  • My Pull Request is ready for review

…down

Adds a `search_result_serializer` hook to `BaseSearchTransform` (and by
extension `BM25SearchTransform` and `RegexSearchTransform`) so callers
can control how search results are serialized before being returned to
the LLM. The same hook is available on `CodeMode`.

Adds a new built-in `serialize_tools_for_output_markdown` serializer that
renders tool definitions as compact markdown (~65-70% fewer tokens than
the default JSON format). The existing JSON serializer is now also public
as `serialize_tools_for_output_json`. Both are exported from
`fastmcp.server.transforms.search`.

🤖 Generated with Claude Code
@marvin-context-protocol marvin-context-protocol Bot added enhancement Improvement to existing functionality. For issues and smaller PR improvements. server Related to FastMCP server implementation or server-side functionality. labels Feb 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d70767e79


CodeMode now falls back to the search transform's own serializer when no
explicit search_result_serializer is set, so a pre-configured transform
(e.g. BM25SearchTransform(search_result_serializer=...)) is no longer
silently ignored.

_schema_section now distinguishes between a missing properties key (single
unnamed value) and an empty properties dict (zero-argument tool), rendering
the latter as "*(no parameters)*" instead of a fake value argument.
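The distinction can be sketched like this (hypothetical code mirroring the described behavior, not the actual `_schema_section` implementation):

```python
# Hypothetical sketch of the missing-vs-empty `properties` distinction
# described above (not the real _schema_section code).
def params_section(schema: dict) -> str:
    if "properties" not in schema:
        # No properties key at all: treat as a single unnamed value.
        return "- `value` (object)"
    if not schema["properties"]:
        # Properties key present but empty: a zero-argument tool.
        return "*(no parameters)*"
    return "\n".join(f"- `{name}`" for name in schema["properties"])
```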

🤖 Generated with Claude Code
@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: The CI failure is a pre-existing flaky timeout test unrelated to this PR's changes. The same test (TestTimeout::test_timeout_client_timeout_does_not_override_tool_call_timeout_if_lower) has failed across multiple unrelated branches on the same day.

Root Cause: tests/client/test_sse.py::TestTimeout::test_timeout_client_timeout_does_not_override_tool_call_timeout_if_lower uses very tight timing thresholds (client timeout=0.1s, tool sleep of 0.03s, per-call timeout of 2s). Under CI load, the connection initialization itself — the __aenter__ / initialize() handshake — consumes the 0.1s client timeout before the actual call_tool is ever invoked. The error is thrown during session setup (mcp/shared/session.py:294), not during the tool call.

This PR (feat/code-mode-markdown-serializer) did not touch tests/client/test_sse.py or any timeout-related client code. An earlier run of this exact branch (22528048506) passed all tests cleanly.

Suggested Solution: Re-run the failed job — this is a CI infrastructure flake. The underlying test itself should be fixed separately by either relaxing its timing margins or marking it with @pytest.mark.flaky to tolerate intermittent failures.

Detailed Analysis

Failing test (tests/client/test_sse.py:188-200):

async def test_timeout_client_timeout_does_not_override_tool_call_timeout_if_lower(
    self, sse_server: str
):
    async with Client(
        transport=SSETransport(sse_server),
        timeout=0.1,   # <-- only 100ms for entire connection
    ) as client:
        await client.call_tool("sleep", {"seconds": 0.03}, timeout=2)

Error from logs:

E  mcp.shared.exceptions.McpError: Timed out while waiting for response to ClientRequest. Waited 0.1 seconds.
   .venv/lib/python3.10/site-packages/mcp/shared/session.py:294: McpError

The timeout fires at client.py:476 during self._session_state.initialize_result = await self.session.initialize(), which is the MCP session handshake that happens inside async with Client(...) as client:. The 0.1s budget is fully consumed by the SSE handshake under CI load, never reaching call_tool.

Same test failed on unrelated branches today:

  • Run 22525457770 (branch claude/review-issue-3035-E6W3s) — same error, same test
  • Run 22528355900 (this PR) — same error, same test

This PR's changes are confined to:

  • src/fastmcp/experimental/transforms/code_mode.py
  • src/fastmcp/server/transforms/search/
  • docs/servers/transforms/code-mode.mdx

None of these files are involved in the failing test.

Related Files
  • /home/runner/work/fastmcp/fastmcp/tests/client/test_sse.py — contains the flaky test (lines 188-200)
  • /home/runner/work/fastmcp/fastmcp/src/fastmcp/client/client.py — where timeout fires during initialize() (line 476)
  • /home/runner/work/fastmcp/fastmcp/src/fastmcp/experimental/transforms/code_mode.py — PR changes (unrelated)
  • /home/runner/work/fastmcp/fastmcp/src/fastmcp/server/transforms/search/base.py — PR changes (unrelated)

🤖 Generated with Claude Code

@MagnusS0
Contributor Author

A more general comment on token usage and search tools, maybe for a follow-up PR.
From a context perspective, I have found the two-step search approach taken by mcp-cli:

  1. candidate selection (name + short description)
  2. full schema for only 1-2 finalists

to be a lot more token-efficient; it also scales better when you have a lot of tools.

Could be something like:

# default `summary`: name, short description, tags/category
# optional `detail="full"` for current behavior

search(query, detail="summary")

# return full input/output schema for selected tool(s)
get_schema(name | [name1, name2, ...])

@jlowin
Member

jlowin commented Mar 1, 2026

Thanks @MagnusS0 this is really solid work! The serializer hook and the markdown rendering are both well thought out, and I totally agree with the direction you're going here. Your follow-up comment about two-stage search actually resolves some tension I've been feeling about how search results scale with complex tool catalogs, so thanks for surfacing that.

Here's where I think this should land architecturally: we'll split the current search tool into a progressive disclosure flow —

  • search returns just names and short descriptions, super lightweight, with an option to get more detail
  • get_schema takes tool name(s) and returns abbreviated markdown (your serializer) by default, with an option to get the full JSON schema when the LLM needs it
  • execute stays as-is

The LLM regulates its own token budget: scan candidates cheaply, pull schema detail only for the tools it actually cares about. A sensible default would let users err toward more detail on simple servers (where two calls are worse than one due to latency) or less detail on complex servers (where two scoped calls are probably more efficient overall).

Since you already suggested this as a follow-up, I'd love to merge this as-is and then quickly follow up with a PR that implements the staged approach on top. You've already done the heavy lifting here with the serializer — the rest is reshaping the tool surface around it. This will all land together before it ships. Ideally targeting later this week or next.

@jlowin jlowin added this to the 3.1 milestone Mar 1, 2026
@jlowin jlowin merged commit 1e72f24 into PrefectHQ:main Mar 1, 2026
6 checks passed
@MagnusS0
Contributor Author

MagnusS0 commented Mar 1, 2026

Thanks @jlowin, really glad it landed well and that you like the direction!

One more thought for the staged approach, since you're designing the tool surface now anyway: for MCPs or converted APIs with, say, 10–20 domains, even a lightweight search over all candidates can get noisy. A list_categories() call as a cheap first step would let the agent scope before it does anything else:

list_categories()   # ["equity", "sec", "etf", ...]
search("price", category="equity", detail="summary")   # scoped candidates
get_schema("price_history")   # full detail

This could map to MCP tags if tools already have them, so there's no extra annotation burden. For servers where tags aren't set, maybe don't expose the tool at all.
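Deriving categories from existing tool tags could be as simple as this (hypothetical helper names; assumes each tool carries a set of tags):

```python
# Hypothetical sketch: derive search categories from MCP tool tags,
# so servers that already tag their tools get scoping for free.
def list_categories(tools: dict[str, set[str]]) -> list[str]:
    """tools maps tool name -> set of tags."""
    categories: set[str] = set()
    for tags in tools.values():
        categories |= tags
    return sorted(categories)

def search_in_category(tools: dict[str, set[str]], category: str) -> list[str]:
    """Restrict candidate selection to one category before ranking."""
    return sorted(name for name, tags in tools.items() if category in tags)

tools = {
    "price_history": {"equity"},
    "balance_sheet": {"equity", "fundamentals"},
    "filing_search": {"sec"},
}
print(list_categories(tools))               # ['equity', 'fundamentals', 'sec']
print(search_in_category(tools, "equity"))  # ['balance_sheet', 'price_history']
```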

One pattern I've hit personally: sub-categories (e.g. equity → {price, fundamentals}). Probably niche, but if it's easy to expose as an opt-in, say a custom categories mapping on the transform, it would be very nice.

Just flagging this before the surface is locked in; it's easier to bake in now than retrofit later.

@jlowin
Member

jlowin commented Mar 1, 2026

So there is this interesting fallback (we actually started with it, but it's too complicated for simple cases) where you can use the call tool to actually learn what the tools are. We could even expose the schema as a data structure and allow an LLM to process it arbitrarily. I think it's an interesting extreme case. Maybe we find a way to support a spectrum of tools with some signature. I'll think on it!



Development

Successfully merging this pull request may close these issues.

CodeMode search results inflate context for tools with complex schemas

2 participants