Description
Hi,
Based on some tests, I noticed that the OpenAI sampling handler in FastMCP fails with certain OpenAI models (GPT-5, the o1 series) because it always sends the max_tokens parameter, while these models require max_completion_tokens instead.
Is this a bug?
Is there existing documentation about which models are officially supported for sampling?
Example Code
```python
import asyncio

from dotenv import load_dotenv
from fastmcp.client.sampling.handlers.openai import OpenAISamplingHandler
from mcp.types import CreateMessageRequestParams, SamplingMessage, TextContent

load_dotenv()


async def main():
    handler = OpenAISamplingHandler(default_model="gpt-4o")
    messages = [
        SamplingMessage(role="user", content=TextContent(type="text", text="Hi"))
    ]
    params = CreateMessageRequestParams(messages=messages, maxTokens=300)
    result = await handler(messages, params, context=None)  # type: ignore
    print("LLM Response:", result)
    # ===> Hello! How can I assist you today?

    models = [
        "o1",
        "gpt-5-mini",
        "gpt-5",
    ]
    for model in models:
        try:
            handler = OpenAISamplingHandler(default_model=model)
            messages = [
                SamplingMessage(
                    role="user", content=TextContent(type="text", text="Hi")
                )
            ]
            params = CreateMessageRequestParams(messages=messages, maxTokens=300)
            result = await handler(messages, params, context=None)  # type: ignore
            print("LLM Response:", result)
        except Exception as e:
            print(f"Error with model {model}: {e}")
            # ===> 'max_tokens' is not supported with this model.
            #      Use 'max_completion_tokens' instead.

# The problem:
# server/.venv/lib/python3.12/site-packages/fastmcp/client/sampling/handlers/openai.py
# blindly uses max_tokens for ALL models

asyncio.run(main())
```
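For illustration, here is a minimal sketch of the kind of model-aware dispatch the handler could do before calling the OpenAI API. The helper name and the prefix list are my assumptions, not FastMCP's actual API; the real fix would live inside `fastmcp/client/sampling/handlers/openai.py`:

```python
def completion_token_kwargs(model: str, max_tokens: int) -> dict:
    """Return the token-limit keyword argument expected by an OpenAI model.

    Reasoning-style models (o1/o3 families, GPT-5) reject ``max_tokens``
    and require ``max_completion_tokens``; older chat models still accept
    ``max_tokens``. Prefix matching here is a heuristic, not an official
    compatibility list.
    """
    reasoning_prefixes = ("o1", "o3", "gpt-5")
    if model.startswith(reasoning_prefixes):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}


# The handler could then splat the result into the chat-completions call:
# client.chat.completions.create(model=model, messages=..., **completion_token_kwargs(model, 300))
```
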
Version Information
FastMCP version: 3.0.0
MCP version: 1.26.0
Python version: 3.12.11
Platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.39
FastMCP root path: xxxx/.venv/lib/python3.12/site-packages