
Add cache_control support to OpenAI provider for Claude models #3333

@HikaruEgashira

Description


Please explain the motivation behind the feature request.
OpenAI-compatible services such as LiteLLM support cache_control for Claude models, which reduces API costs through prompt caching. The OpenRouter provider already includes this optimization for Anthropic models, but the OpenAI provider does not, so this cost saving is missed. For reference, the OpenRouter provider's implementation:

```rust
/// Update the request when using an Anthropic model.
/// For Anthropic models, we can enable prompt caching to save cost. Since OpenRouter exposes an
/// OpenAI-compatible endpoint, we need to modify the OpenAI request to carry the Anthropic
/// cache_control field.
fn update_request_for_anthropic(original_payload: &Value) -> Value {
    let mut payload = original_payload.clone();
    if let Some(messages_spec) = payload
        .as_object_mut()
        .and_then(|obj| obj.get_mut("messages"))
        .and_then(|messages| messages.as_array_mut())
    {
        // Add "cache_control" to the last and second-to-last "user" messages.
        // During each turn, we mark the final message with cache_control so the conversation can be
        // incrementally cached. The second-to-last user message is also marked for caching with the
        // cache_control parameter, so that this checkpoint can read from the previous cache.
        let mut user_count = 0;
        for message in messages_spec.iter_mut().rev() {
            if message.get("role") == Some(&json!("user")) {
                if let Some(content) = message.get_mut("content") {
                    if let Some(content_str) = content.as_str() {
                        *content = json!([{
                            "type": "text",
                            "text": content_str,
                            "cache_control": { "type": "ephemeral" }
                        }]);
                    }
                }
                user_count += 1;
                if user_count >= 2 {
                    break;
                }
            }
        }

        // Update the system message to have the cache_control field.
        if let Some(system_message) = messages_spec
            .iter_mut()
            .find(|msg| msg.get("role") == Some(&json!("system")))
        {
            if let Some(content) = system_message.get_mut("content") {
                if let Some(content_str) = content.as_str() {
                    *system_message = json!({
                        "role": "system",
                        "content": [{
                            "type": "text",
                            "text": content_str,
                            "cache_control": { "type": "ephemeral" }
                        }]
                    });
                }
            }
        }
    }
    if let Some(tools_spec) = payload
        .as_object_mut()
        .and_then(|obj| obj.get_mut("tools"))
        .and_then(|tools| tools.as_array_mut())
    {
        // Add "cache_control" to the last tool spec, if any, so that all tool
        // definitions are cached as a single prefix.
        if let Some(last_tool) = tools_spec.last_mut() {
            if let Some(function) = last_tool
                .get_mut("function")
                .and_then(|f| f.as_object_mut())
            {
                function.insert("cache_control".to_string(), json!({ "type": "ephemeral" }));
            }
        }
    }
    payload
}
```

Describe the solution you'd like
Add cache_control functionality to the OpenAI provider when the model name contains "claude":

  1. Add cache_control markers to system messages
  2. Add cache_control markers to the last two user messages
  3. Add cache_control markers to the last tool definition

Describe alternatives you've considered

  1. Apply universally: Could cause issues with non-Claude models
  2. Configuration-based: Adds unnecessary complexity
  3. Model name detection: Practical approach for LiteLLM and similar services
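For the detection alternative, a case-insensitive substring check is enough to cover the model ids services like LiteLLM typically route (the helper name and example ids are illustrative):

```rust
/// Illustrative check: true for any model id containing "claude",
/// e.g. "claude-sonnet-4" or a routed id like "anthropic/claude-3-5-sonnet".
fn is_claude_model(model: &str) -> bool {
    model.to_ascii_lowercase().contains("claude")
}

fn main() {
    assert!(is_claude_model("anthropic/claude-3-5-sonnet"));
    assert!(is_claude_model("Claude-Opus-4"));
    assert!(!is_claude_model("gpt-4o"));
    println!("ok");
}
```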

Additional context

  • OpenRouter provider already implements this pattern in update_request_for_anthropic

  • Backwards-compatible since unsupported services ignore unknown fields

  • I have verified this does not duplicate an existing feature request

Labels: enhancement (New feature or request)