Fix session persistence across tool calls in multi-server MCPConfigTransport #3330

Merged

jlowin merged 10 commits into PrefectHQ:main from jer805:fix-session-persistence on Mar 2, 2026

Conversation

@jer805 (Contributor) commented on Feb 28, 2026

Root cause

MCPConfigTransport.connect_session() created ProxyClient instances but never connected them before passing them to create_proxy(). As a result, _create_client_factory() saw them as disconnected and took the client.new() path, creating a fresh connection (and a new session ID) for every tool call.
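
In sketch form, the pre-fix factory logic behaved roughly like this (hedged; names are simplified and this is not the exact FastMCP code):

def _create_client_factory(base_client):
    def factory():
        if base_client.is_connected():
            return base_client  # reuse the live session
        # base_client was never connected, so every call landed here,
        # minting a fresh client (and a fresh session ID) per tool call
        return base_client.new()
    return factory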

Fix

Session persistence (config.py): _create_proxy() now connects each client via an AsyncExitStack before mounting it on the composite server. StatefulProxyClient is used instead of ProxyClient because its context-restoring handler wrappers prevent stale ContextVars in the reused session's receive-loop task; this fixes elicitation, logging, and progress forwarding, all of which silently failed with ProxyClient during testing.

Note that a new INFO message is printed to the user when reconnecting to a session; we might consider downgrading this to DEBUG.

Since StatefulProxyClient.__aexit__ is intentionally a no-op (designed for the new_stateful() use case), enter_async_context alone cannot clean up the connection. Instead, the client is connected manually via await client.__aenter__(), and explicit cleanup callbacks are pushed onto the AsyncExitStack in LIFO order: transport.close() is pushed first (runs last), then client._disconnect(force=True) is pushed second (runs first). This ensures the client is force-disconnected before the underlying transport is closed when the stack unwinds.
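
As a hedged illustration of that ordering (the client and transport objects are placeholders; the real code differs in detail):

from contextlib import AsyncExitStack

async def connect_with_cleanup(client, transport, stack: AsyncExitStack):
    # StatefulProxyClient.__aexit__ is a no-op, so enter_async_context()
    # would never tear the connection down; connect manually instead.
    await client.__aenter__()
    # AsyncExitStack unwinds in LIFO order:
    stack.push_async_callback(transport.close)                 # pushed first, runs LAST
    stack.push_async_callback(client._disconnect, force=True)  # pushed second, runs FIRST
    return client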

Context restoration (proxy.py): _restore_request_context() now also restores _current_context so get_context() works in the receive-loop task. Following existing patterns, (RequestContext, weakref[FastMCP]) is stashed instead of the Context instance (whose properties are ContextVar-dependent and would resolve stale values), and a fresh Context is constructed at restore time after request_ctx has been fixed.
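
A hedged sketch of the stash-and-restore pattern (the ContextVars below are illustrative stand-ins for the real ones, and make_context is a hypothetical factory for a fresh Context):

import weakref
from contextvars import ContextVar

request_ctx: ContextVar = ContextVar("request_ctx")
_current_context: ContextVar = ContextVar("_current_context")

def stash(server):
    # Stash the raw RequestContext plus a weakref to the server -- not the
    # Context instance, whose properties read ContextVars lazily and would
    # resolve stale values in the reused session's receive-loop task.
    return (request_ctx.get(), weakref.ref(server))

def _restore_request_context(stashed, make_context):
    req_ctx, server_ref = stashed
    request_ctx.set(req_ctx)  # fix request_ctx first
    server = server_ref()
    if server is not None:
        # Only now build a fresh Context, so get_context() resolves
        # correctly in the receive-loop task.
        _current_context.set(make_context(server))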

Test fixes (test_mcp_config.py): test_multi_server_timeout_propagation replaced its wraps= spy with a full AsyncMock to match the new async _create_proxy signature and to avoid the real stdio connections that caused hangs.
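
Roughly, the new test shape looks like this (hedged; the patch target and surrounding assertions are assumptions, not the literal diff):

from unittest.mock import AsyncMock, patch

async def test_multi_server_timeout_propagation():
    # A full AsyncMock is awaitable (matching the now-async _create_proxy)
    # and never opens real stdio transports, so the test cannot hang on
    # subprocess startup.
    with patch(
        "fastmcp.client.transports.config.MCPConfigTransport._create_proxy",
        new_callable=AsyncMock,
    ) as create_proxy:
        ...  # drive the transport and assert the timeout reaches the proxy
        create_proxy.assert_awaited()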

Contributors Checklist

Review Checklist

  • I have self-reviewed my changes
  • My Pull Request is ready for review

@marvin-context-protocol (bot) added the bug, client, and server labels on Feb 28, 2026
@jer805 marked this pull request as ready for review on February 28, 2026 at 17:22

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae71307f01


Comment thread: src/fastmcp/client/transports/config.py
@jlowin (Member) left a comment

@jlowin merged commit f2dd2e5 into PrefectHQ:main on Mar 2, 2026
7 checks passed

@marvin-context-protocol (bot) commented:

Test Failure Analysis

Summary: One timing-sensitive test failed in the "Tests with lowest-direct dependencies" CI job — test_ping_loop_sends_pings_at_interval asserted at least 2 pings were sent in 350ms with a 50ms interval, but only 1 was delivered.

Root Cause: tests/server/middleware/test_ping.py:176 uses anyio.move_on_after(0.35) to bound a 50ms-interval ping loop, then asserts send_ping.call_count >= 2. On a loaded CI runner (or with an older anyio version in the lowest-dependencies matrix), the actual sleep durations stretch past the nominal 50ms, so only one ping fires before the 350ms deadline. The failure is unrelated to the changes in this PR.

Suggested Solution: Make the test less sensitive to wall-clock timing by either:

  1. Increase the time window (simplest fix) — raise move_on_after from 0.35 to 2.0 seconds in test_ping_loop_sends_pings_at_interval. With a 50ms interval that gives ~40 expected pings, so a very slow runner would need to be 20× slower than expected to fail.

  2. Mock anyio.sleep (most robust) — patch fastmcp.server.middleware.ping.anyio.sleep to advance a counter and cancel after N calls, removing the wall-clock dependency entirely (a sketch follows below).

The companion test test_ping_loop_cleans_up_on_cancellation uses move_on_after(0.1) with the same 50ms interval and could suffer the same fate — worth fixing both at the same time.
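
A hedged sketch of option 2 (the _StopLoop helper and the three-iteration count are invented for illustration; mock_session is assumed to be an AsyncMock):

from unittest.mock import AsyncMock, patch

import anyio

from fastmcp.server.middleware.ping import PingMiddleware

class _StopLoop(Exception):
    """Raised by the fake sleep to end the loop deterministically."""

async def test_ping_loop_sends_pings_at_interval():
    middleware = PingMiddleware(interval_ms=50)
    mock_session = AsyncMock()
    calls = 0

    async def fake_sleep(_seconds):
        nonlocal calls
        calls += 1
        if calls > 3:
            raise _StopLoop  # end the loop after three sleep/ping iterations
        await anyio.lowlevel.checkpoint()  # yield control, no wall-clock wait

    with patch("fastmcp.server.middleware.ping.anyio.sleep", fake_sleep):
        try:
            await middleware._ping_loop(mock_session, session_id=1)
        except _StopLoop:
            pass

    # Exactly three pings, regardless of runner load or anyio version.
    assert mock_session.send_ping.call_count == 3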

Detailed Analysis

Failing test: tests/server/middleware/test_ping.py::TestPingLoop::test_ping_loop_sends_pings_at_interval

Log excerpt:

async def test_ping_loop_sends_pings_at_interval(self):
    middleware = PingMiddleware(interval_ms=50)
    ...
    with anyio.move_on_after(0.35):
        await middleware._ping_loop(mock_session, session_id)

    # Should have sent at least 2 pings in 350ms with 50ms interval
>   assert mock_session.send_ping.call_count >= 2
E   AssertionError: assert 1 >= 2

_ping_loop implementation (src/fastmcp/server/middleware/ping.py:63):

async def _ping_loop(self, session: Any, session_id: int) -> None:
    try:
        while True:
            await anyio.sleep(self.interval_ms / 1000)  # sleep FIRST, then ping
            await session.send_ping()
    finally:
        self._active_sessions.discard(session_id)

Note the sleep-first pattern: the loop sleeps 50ms, sends a ping, sleeps again, etc. With a 350ms window the math says ~7 pings, but the "lowest-direct dependencies" matrix job (which pins minimum compatible versions) ran on a busy runner and the cumulative sleep drift consumed almost the entire window after just one iteration.

Only the "lowest-direct dependencies" job failed (1 fail / 4619 pass); all other matrix jobs passed.

Related Files
  • tests/server/middleware/test_ping.py — failing test at line 161 (test_ping_loop_sends_pings_at_interval) and a potentially fragile test at line 178 (test_ping_loop_cleans_up_on_cancellation)
  • src/fastmcp/server/middleware/ping.py — _ping_loop implementation (line 63)


Labels

  • bug: Something isn't working. Reports of errors, unexpected behavior, or broken functionality.
  • client: Related to the FastMCP client SDK or client-side functionality.
  • server: Related to FastMCP server implementation or server-side functionality.


Development

Successfully merging this pull request may close these issues.

StreamableHttpTransport creates new session for each connection in MCPConfigTransport
