Graceful degradation for multi-server proxy setup #3546
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef2a0de915
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
-        except Exception:
-            # Clean up any transports created before the failure
-            for t in self._transports:
-                await t.close()
-            self._transports = []
-            raise
+        except Exception:  # Broad catch is intentional: failure modes
+            # are diverse (OSError, TimeoutError, RuntimeError, etc.)
+            # and the whole point is to skip any server that can't connect.
+            logger.warning(
+                "Failed to connect to MCP server %r, skipping",
+                name,
+                exc_info=True,
+            )
+            continue
```
Preserve config errors instead of skipping the server
This `except Exception` is broader than the graceful-degradation goal: `_create_proxy()` also performs local setup such as `ToolTransform(...)`, and that code raises deterministic config errors (for example, duplicate rename targets raise `ValueError` in `ToolTransform.__init__`). With this change, a multi-server config with one malformed server and one healthy server now succeeds with the bad server silently omitted, instead of surfacing the misconfiguration to the caller. That makes bad configs much harder to detect and debug.
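To make the suggested narrowing concrete, here is a minimal sketch that separates the network connection from local transform setup, so deterministic config errors still propagate. `build_proxies`, `connect`, and `apply_transforms` are hypothetical stand-ins rather than actual fastmcp API, and the exception tuple is illustrative only:

```python
import logging

logger = logging.getLogger(__name__)

async def build_proxies(servers: dict, connect, apply_transforms) -> dict:
    """Sketch: skip only connection failures; let config errors surface."""
    proxies = {}
    for name, server in servers.items():
        try:
            transport = await connect(name, server)  # network I/O: may fail
        except (OSError, TimeoutError):  # illustrative, not exhaustive
            logger.warning(
                "Failed to connect to MCP server %r, skipping",
                name,
                exc_info=True,
            )
            continue
        # Local, deterministic setup runs outside the catch, so a malformed
        # config (e.g. duplicate rename targets) raises to the caller.
        proxies[name] = apply_transforms(name, server, transport)
    return proxies
```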
Intentional — the whole point of graceful degradation is to skip any server that can't connect, and the failure modes are too diverse to enumerate meaningfully (`OSError`, `TimeoutError`, `RuntimeError`, MCP errors, etc.). The warning log with `exc_info=True` ensures config errors are visible, not silent.
Test Failure Analysis
Summary: The Windows Python 3.10 test job failed with a timeout in test_mcp_config.py.
Root Cause: The failure is not caused by this PR's logic changes; the PR only modifies the MCPConfigTransport setup path and its tests, and the hang itself is a Windows subprocess-spawn issue. However, the new tests added by this PR in test_mcp_config.py spawn subprocesses without the timeout marker that the existing subprocess tests carry, so the hang stalls the whole job instead of failing fast.
Suggested Solution: add @pytest.mark.timeout(15) to the new subprocess-spawning tests, matching the existing pattern (sketched after the detailed analysis below).
Detailed Analysis
Failing job: Windows, Python 3.10.
Timeout log excerpt: 15 tests passed and the 16th timed out. The 16th test in the file is the new test_multi_server_partial_failure. The stack trace shows the test waiting in asyncio's selector event loop for the subprocess connection — a Windows subprocess spawn issue.
Existing timeout pattern in test_mcp_config.py — the existing subprocess tests carry the decorator:

```python
@pytest.mark.timeout(15)
async def test_multi_server_config_transport(tmp_path: Path):
```

The new tests lack this:

```python
async def test_multi_server_partial_failure(tmp_path: Path, server_order: dict):
    # spawns a Python subprocess — no timeout marker
```

All other jobs passed: Python 3.10 on Ubuntu, Python 3.13 on Ubuntu, Integration tests, and Tests with lowest-direct dependencies.
Related Files: test_mcp_config.py (new tests), config.py (proxy setup).
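A minimal sketch of the suggested fix, assuming the repo's existing pytest-timeout setup; the test body is elided, and `server_order` is the PR's fixture, shown only for the signature:

```python
import pytest
from pathlib import Path

# Suggested fix (sketch): give the new subprocess-spawning test the same
# timeout guard as the existing ones, so a Windows spawn hang fails fast
# instead of stalling the whole job.
@pytest.mark.timeout(15)
async def test_multi_server_partial_failure(tmp_path: Path, server_order: dict):
    ...  # body unchanged; spawns a Python subprocess
```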
Force-pushed from ef2a0de to 6b24ee4.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b24ee413c
| "Failed to connect to MCP server %r, skipping", | ||
| name, | ||
| exc_info=True, | ||
| ) | ||
| except Exception: | ||
| # Clean up any transports created before the failure | ||
| for t in self._transports: | ||
| await t.close() | ||
| self._transports = [] | ||
| raise | ||
| continue |
Disconnect partially connected backends before skipping them
If a transforming server fails after `_create_proxy()` has already connected it — for example, a `tools` mapping that renames two backend tools to the same target makes `ToolTransform(...)` raise at config.py:183-197 — this handler only logs and continues. In that case the disconnect callbacks already pushed onto the shared `AsyncExitStack` do not run until the composite client exits, so the skipped server's subprocess/HTTP session stays alive for the caller's entire session even though that backend was omitted from the router. A sketch of one way to release those resources follows.
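For illustration, a hedged sketch of one possible cleanup, assuming `_create_proxy()` registers its disconnects on whatever stack it is handed; `mount_server` and `create_proxy` are hypothetical stand-ins, not fastmcp API:

```python
from contextlib import AsyncExitStack

async def mount_server(shared_stack: AsyncExitStack, name: str, server, create_proxy):
    # Stage disconnects privately; if create_proxy raises, exiting the
    # `async with` unwinds `staging`, closing the partially connected
    # subprocess/HTTP session before the caller logs and skips this server.
    async with AsyncExitStack() as staging:
        proxy = await create_proxy(staging, name, server)  # may raise
        # Success: move the disconnects to the shared stack so they run at
        # session end, the same lifecycle as other mounted servers.
        shared_stack.push_async_callback(staging.pop_all().aclose)
        return proxy
```

The `pop_all()` transfer means a fully mounted server keeps exactly the lifecycle the author describes below, while a failed one is torn down immediately.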
The `AsyncExitStack` handles cleanup when the session exits — the subprocess isn't leaked, just kept alive until session end. This is the same lifecycle as successfully mounted servers. The edge case (`ToolTransform` raising after connection) is narrow enough that the mild resource waste doesn't warrant additional complexity.
`MCPConfigTransport` currently fails the entire proxy setup if any single upstream server can't connect — even when other servers in the config are perfectly healthy. This contradicts the graceful-failure philosophy that `AggregateProvider` already implements at runtime, where individual provider failures are logged and skipped.

The setup phase now matches that philosophy: each `_create_proxy()` call is individually wrapped, failures are logged at WARNING level with full tracebacks, and the composite is built from whichever servers succeed. If all servers fail, a `ConnectionError` is raised rather than silently returning an empty composite.

Closes #3533
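For readers skimming the diff, a minimal sketch of the setup loop described above; `create_proxy` and `compose` are hypothetical stand-ins for `_create_proxy` and the composite construction, not the actual implementation:

```python
import logging

logger = logging.getLogger(__name__)

async def connect_all(servers: dict, create_proxy, compose):
    mounted = {}
    for name, server in servers.items():
        try:
            mounted[name] = await create_proxy(name, server)
        except Exception:
            # Graceful degradation: log the full traceback and move on.
            logger.warning(
                "Failed to connect to MCP server %r, skipping",
                name,
                exc_info=True,
            )
    if servers and not mounted:
        # All upstreams failed: surface it rather than returning an
        # empty composite.
        raise ConnectionError("Failed to connect to any MCP server")
    return compose(mounted)
```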