Add `search_result_serializer` hook and `serialize_tools_for_output_markdown` #3337
Conversation
Adds a `search_result_serializer` hook to `BaseSearchTransform` (and by extension `BM25SearchTransform` and `RegexSearchTransform`) so callers can control how search results are serialized before being returned to the LLM. The same hook is available on `CodeMode`. Adds a new built-in `serialize_tools_for_output_markdown` serializer that renders tool definitions as compact markdown (~65-70% fewer tokens than the default JSON format). The existing JSON serializer is now also public as `serialize_tools_for_output_json`. Both are exported from `fastmcp.server.transforms.search`.

🤖 Generated with Claude Code
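For orientation, a minimal sketch of opting into the markdown serializer. The serializer name, the import path `fastmcp.server.transforms.search`, and the `search_result_serializer` keyword all come from this PR; the transform's import path is an assumption.

```python
from fastmcp.server.transforms.search import serialize_tools_for_output_markdown

# Import path for the transform class is assumed; adjust to wherever
# BM25SearchTransform lives in your FastMCP version.
from fastmcp.server.transforms import BM25SearchTransform

# Opt in: search results returned to the LLM are rendered as compact
# markdown instead of the default full JSON tool definitions.
search = BM25SearchTransform(
    search_result_serializer=serialize_tools_for_output_markdown,
)
```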
`CodeMode` now falls back to the search transform's own serializer when no explicit `search_result_serializer` is set, so a pre-configured transform (e.g. `BM25SearchTransform(search_result_serializer=...)`) is no longer silently ignored. `_schema_section` now distinguishes between a missing `properties` key (single unnamed value) and an empty `properties` dict (zero-argument tool), rendering the latter as "*(no parameters)*" instead of a fake `value` argument.

🤖 Generated with Claude Code
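A rough sketch of the two behaviors described above. Only `search_result_serializer`, `_schema_section`, and the built-in JSON serializer are names from this PR; the function shapes and variable names here are illustrative, not the literal implementation.

```python
from fastmcp.server.transforms.search import serialize_tools_for_output_json

def resolve_serializer(code_mode_serializer, transform_serializer):
    # Illustrative fallback order: an explicit CodeMode setting wins,
    # then the transform's own serializer, then the default JSON rendering.
    return (
        code_mode_serializer
        or transform_serializer
        or serialize_tools_for_output_json
    )

def schema_section(schema: dict) -> str:
    # Roughly the _schema_section distinction described above:
    props = schema.get("properties")
    if props is None:
        return "single unnamed value"   # missing key entirely
    if not props:
        return "*(no parameters)*"      # empty dict: zero-argument tool
    return "\n".join(f"- `{name}`" for name in props)
```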
**Test Failure Analysis**

**Summary:** The CI failure is a pre-existing flaky timeout test unrelated to this PR's changes. The same test failed on unrelated branches today.

**Root Cause:** This PR's changes are confined to the search transform and serializer code; none of those files are involved in the failing test.

**Suggested Solution:** Re-run the failed job — this is a CI infrastructure flake. The underlying test itself should be fixed separately by either relaxing its timing margins or marking it as flaky.

**Detailed Analysis**

Failing test:

```python
async def test_timeout_client_timeout_does_not_override_tool_call_timeout_if_lower(
    self, sse_server: str
):
    async with Client(
        transport=SSETransport(sse_server),
        timeout=0.1,  # <-- only 100ms for entire connection
    ) as client:
        await client.call_tool("sleep", {"seconds": 0.03}, timeout=2)
```

Error from logs: the 100ms client connection timeout fires before the call completes on a slow CI runner. The same test failed on unrelated branches today.

🤖 Generated with Claude Code
More general comment when it comes to token usage and search tools, maybe a follow-up PR. A two-stage flow would be a lot more token efficient and would scale better when you have a lot of tools. Could be something like:

```python
# default `summary`: name, short description, tags/category
# optional `detail="full"` for current behavior
search(query, detail="summary")

# return full input/output schema for selected tool(s)
get_schema(name | [name1, name2, ...])
```
Thanks @MagnusS0, this is really solid work! The serializer hook and the markdown rendering are both well thought out, and I totally agree with the direction you're going here. Your follow-up comment about two-stage search actually resolves some tension I've been feeling about how search results scale with complex tool catalogs, so thanks for surfacing that.

Here's where I think this should land architecturally: we'll split the current search tool into a progressive disclosure flow — a lightweight summary search plus a schema lookup. The LLM regulates its own token budget — scan candidates cheaply, pull schema detail only for the tools it actually cares about. We'll want a default that lets users err toward more detail on simple servers (where two calls is worse than one due to latency) or less detail on complex servers (where two search calls are probably more efficient overall).

Since you already suggested this as a follow-up, I'd love to merge this as-is and then quickly follow up with a PR that implements the staged approach on top. You've already done the heavy lifting here with the serializer — the rest is reshaping the tool surface around it. This will all land together before it ships, ideally targeting later this week or next.
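To make that default concrete, a purely hypothetical knob; neither the parameter name nor the values are real API in this PR.

```python
from fastmcp import CodeMode  # import path assumed

# Hypothetical `search_detail` setting, illustration only:
CodeMode(search_detail="full")     # simple servers: one call, full schemas
CodeMode(search_detail="summary")  # complex servers: cheap scan, then get_schema
```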
Thanks @jlowin, really glad it landed well and that you like the direction!

One more thought for the staged approach, since you're designing the tool surface now anyway: for MCPs or converted APIs with, say, 10–20 domains, even a lightweight search over all candidates can get noisy. A `list_categories()` call as a cheap first step would let the agent scope before it does anything else:

```python
list_categories()                                     # ["equity", "sec", "etf", ...]
search("price", category="equity", detail="summary")  # scoped candidates
get_schema("price_history")                           # full detail
```

This could map to MCP tags if tools already have them, so no extra annotation burden. For servers where tags aren't set, maybe don't expose the tool at all. One pattern I've hit personally: sub-categories (e.g. equity → {price, fundamentals}). Probably niche, but if it's easy to expose as an opt-in, say a custom categories mapping on the transform (sketched below), it would be very nice. Just flagging this before the surface is locked in; it's easier to bake in now than to retrofit later.
So there is this interesting fallback (we actually started with it, but it's too complicated for simple cases) where you can use the call tool to actually learn what the tools are. Like, we could actually expose the schema as a data structure and allow an LLM to process it arbitrarily. I think it's an interesting extreme case. Maybe we find a way to support a spectrum of tools with some signature. I'll think on it!
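A tiny sketch of that extreme, where the sandbox gets the raw catalog as plain data and the model writes whatever filtering it wants; `get_tool_catalog` is an invented name for illustration.

```python
def get_tool_catalog() -> list[dict]:
    """Invented helper: would return every tool's full definition."""
    ...

# The LLM filters the catalog itself instead of calling a search tool:
catalog = get_tool_catalog() or []
price_tools = [
    t["name"] for t in catalog
    if "price" in t.get("description", "").lower()
]
```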
Description
While migrating the OpenBB MCP server to FastMCP v3 I was playing around with `CodeMode` and noticed that search results were eating a lot of the context savings it's supposed to provide. The LLM gets back full JSON tool definitions (same as `list_tools`), which is fine for simple tools but balloons fast with anything schema-heavy.

My solution is a `search_result_serializer` hook on `BaseSearchTransform` (and `CodeMode`), so you can swap in whatever serialization makes sense for your use case. The built-in `serialize_tools_for_output_markdown` strips the JSON boilerplate and renders just what the LLM actually needs to pick and call a tool. In my simple benchmark across an 11-tool catalog with some complex OpenBB-style schemas, it cut search result tokens by ~65-70% (benchmark script). Default behavior is unchanged, fully opt-in.

The markdown output for a tool looks like this:
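Roughly, the shape is a tool name, its description, and a parameter list. The block below is an illustrative sketch of that shape (tool and field names invented), not the serializer's exact output:

```markdown
### price_history

Get historical price data for a symbol.

**Parameters:**
- `symbol` (string, required): ticker symbol, e.g. "AAPL"
- `start_date` (string, optional): ISO start date
```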
Both built-in serializers (`serialize_tools_for_output_json` and `serialize_tools_for_output_markdown`) are exported from `fastmcp.server.transforms.search` and work on standalone search transforms too, not just `CodeMode`.

🤖 Generated with Claude Code
Contributors Checklist
Review Checklist