
fix: skip max_completion_tokens when maxTokens is None#3284

Merged
jlowin merged 1 commit into PrefectHQ:main from eon01:fix/sampling-handler-max-completion-tokens on Feb 24, 2026

Conversation

Contributor

@eon01 commented Feb 24, 2026

Description

When maxTokens is not specified by the caller, the OpenAI sampling handler was sending "max_completion_tokens": null explicitly in the API request. OpenAI treats an explicit null differently from an omitted key: it returns an empty response with finish_reason: "stop" and no content, which causes the handler to raise ValueError("No content in response from completion").

Fix: guard max_completion_tokens with if params.maxTokens is not None, consistent with how temperature and stopSequences are already handled in the same block.
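
For illustration, here is a minimal sketch of the guarded parameter construction. The function name and surrounding structure are hypothetical; only the if params.maxTokens is not None guard reflects the actual change:

```python
from typing import Any


def build_request_kwargs(params: Any, model: str, messages: list[dict]) -> dict:
    """Sketch: build chat-completion kwargs, omitting options the caller left unset."""
    kwargs: dict = {"model": model, "messages": messages}
    if params.temperature is not None:
        kwargs["temperature"] = params.temperature
    if params.stopSequences:
        kwargs["stop"] = params.stopSequences
    # The fix: include max_completion_tokens only when maxTokens was provided.
    # Sending an explicit null makes OpenAI return an empty completion.
    if params.maxTokens is not None:
        kwargs["max_completion_tokens"] = params.maxTokens
    return kwargs
```

Omitting the key lets the API fall back to its default, whereas an explicit null triggers the empty-response behavior described above.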

This is a follow-up to the rename fix in #3252.

Contributors Checklist

Review Checklist

  • I have self-reviewed my changes
  • My Pull Request is ready for review

Copilot AI review requested due to automatic review settings February 24, 2026 09:08
@marvin-context-protocol (bot) added the bug and client labels Feb 24, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a bug in the OpenAI sampling handler where sending an explicit null value for max_completion_tokens causes the API to return empty responses. The fix ensures that when maxTokens is not specified (None), the parameter is omitted from the API request entirely, which is the expected behavior.

Changes:

  • Modified the OpenAI sampling handler to conditionally include max_completion_tokens only when params.maxTokens is not None, making it consistent with how temperature and stopSequences are handled.

Comment thread: src/fastmcp/client/sampling/handlers/openai.py
Member

@jlowin left a comment


Thanks!

@jlowin merged commit 549f48b into PrefectHQ:main on Feb 24, 2026
12 of 13 checks passed
@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: One test failed on Python 3.13 in the "Run unit tests" step: tests/server/middleware/test_rate_limiting.py::TestRateLimitingMiddlewareIntegration::test_rate_limiting_with_different_operations — expected a ToolError to be raised but none was raised.

Root Cause: The test is timing-sensitive and incorrectly calibrated for Python 3.13. It configures RateLimitingMiddleware(max_requests_per_second=9.0, burst_capacity=5) and expects the 3rd explicit call_tool to hit the rate limit. However, the token bucket starts with 5 tokens and the high refill rate (9/sec) means any measurable wall-clock time between async calls replenishes tokens. The number of implicit MCP protocol requests (e.g., initialize, list_tools) made during async with Client(...) likely differs between Python 3.10 and 3.13, leaving one extra token available on 3.13 — causing the 3rd call to succeed instead of raising.

Suggested Solution: Fix test_rate_limiting_with_different_operations to use a near-zero refill rate so that token replenishment is negligible during the test, and set burst_capacity to exactly the number of expected requests (based on how many implicit protocol requests the client makes); see the sketch after this list:

  • File: tests/server/middleware/test_rate_limiting.py (lines 383–384)
  • Change: Replace RateLimitingMiddleware(max_requests_per_second=9.0, burst_capacity=5) with something like RateLimitingMiddleware(max_requests_per_second=0.001, burst_capacity=4) (or whichever burst value exhausts after init + list_tools + 2 calls on all Python versions)
  • Why: A near-zero refill rate makes timing irrelevant; the bucket simply counts requests without replenishing during the test
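
A sketch of the recalibrated setup; the burst_capacity value is an assumption and must be verified against the number of implicit protocol requests the client actually makes:

```python
# Sketch of the proposed recalibration. burst_capacity=4 assumes exactly
# 2 implicit requests (initialize + list_tools) plus the 2 explicit calls
# that should succeed, so the 3rd explicit call reliably hits the limit.
rate_limit_server.add_middleware(
    RateLimitingMiddleware(
        max_requests_per_second=0.001,  # near-zero refill: timing becomes irrelevant
        burst_capacity=4,
    )
)
```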

This failure is unrelated to the max_completion_tokens fix in this PR — it's a pre-existing flaky test that happened to fail on this CI run for Python 3.13.

Detailed Analysis

Failing test (tests/server/middleware/test_rate_limiting.py:381):
```python
async def test_rate_limiting_with_different_operations(self, rate_limit_server):
    rate_limit_server.add_middleware(
        RateLimitingMiddleware(max_requests_per_second=9.0, burst_capacity=5)
    )
    async with Client(rate_limit_server) as client:
        await client.call_tool("quick_action", {"message": "test"})
        await client.call_tool("heavy_computation")
        # Should be rate limited regardless of operation type
        with pytest.raises(ToolError, match="Rate limit exceeded"):  # <-- DID NOT RAISE
            await client.call_tool("batch_process", {"items": ["a", "b", "c"]})
```

Log excerpt:
```
FAILED tests/server/middleware/test_rate_limiting.py::TestRateLimitingMiddlewareIntegration::test_rate_limiting_with_different_operations
Failed: DID NOT RAISE <class 'fastmcp.exceptions.ToolError'>
= 1 failed, 4379 passed, 1 skipped, 1 xfailed, 6 warnings in 104.53s (0:01:44) =
```

Token bucket logic (src/fastmcp/server/middleware/rate_limiting.py:38):
```python
async def consume(self, tokens: int = 1) -> bool:
    async with self._lock:
        now = time.time()
        elapsed = now - self.last_refill
        # Add tokens based on elapsed time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        ...
```

With refill_rate=9.0, just ~0.11 seconds between requests restores a full token. A comparable test (test_sliding_window_rate_limiting) comments # 1 init + 1 list_tools + 3 calls + 1 to fail for max_requests=6, suggesting 5 requests are expected before the failing one. But the rate-limited version uses burst_capacity=5 with only 2 explicit calls, so on Python 3.13 the third call still finds a token available and succeeds.
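
A back-of-envelope simulation under the test's settings makes the sensitivity concrete; the inter-request gaps below are assumed values, not measurements:

```python
# Hypothetical walk-through of the token bucket with the test's settings.
capacity, refill_rate = 5.0, 9.0
tokens = capacity
for gap in (0.0, 0.02, 0.05, 0.11):  # assumed gaps between requests, in seconds
    tokens = min(capacity, tokens + gap * refill_rate)  # time-based refill
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    print(f"gap={gap:.2f}s tokens_left={tokens:.2f} allowed={allowed}")
# A ~0.11 s gap restores nearly a full token (0.11 * 9.0 = 0.99), so any
# measurable delay between async calls keeps the bucket from draining.
```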

Only fails on: Python 3.13 / ubuntu-latest
Passes on: Python 3.10 / ubuntu-latest, Python 3.10 / windows-latest

Related Files
  • tests/server/middleware/test_rate_limiting.py — contains the flaky test; lines 381–394 are the failing test case
  • src/fastmcp/server/middleware/rate_limiting.py — TokenBucketRateLimiter.consume() at line 38 performs time-based refill that makes the test timing-sensitive


Development

Successfully merging this pull request may close these issues.

OpenAI handler fails with GPT-5 and O1 models - uses max_tokens instead of max_completion_tokens
