Add cache_control support to OpenAI provider for Claude models #3333
Description
Please explain the motivation behind the feature request.
OpenAI-compatible services such as LiteLLM support cache_control for Claude models, reducing API costs through prompt caching. The OpenRouter provider already includes this optimization for Anthropic models, but the OpenAI provider does not, so requests routed through it miss out on those cost savings.
goose/crates/goose/src/providers/openrouter.rs
Lines 128 to 199 in 6d06909
```rust
/// Update the request when using an Anthropic model.
/// For Anthropic models, we can enable prompt caching to save cost. Since OpenRouter is an
/// OpenAI-compatible endpoint, we need to modify the OpenAI request to carry the Anthropic
/// cache_control field.
fn update_request_for_anthropic(original_payload: &Value) -> Value {
    let mut payload = original_payload.clone();
    if let Some(messages_spec) = payload
        .as_object_mut()
        .and_then(|obj| obj.get_mut("messages"))
        .and_then(|messages| messages.as_array_mut())
    {
        // Add "cache_control" to the last and second-to-last "user" messages.
        // During each turn, we mark the final message with cache_control so the conversation can be
        // incrementally cached. The second-to-last user message is also marked for caching with the
        // cache_control parameter, so that this checkpoint can read from the previous cache.
        let mut user_count = 0;
        for message in messages_spec.iter_mut().rev() {
            if message.get("role") == Some(&json!("user")) {
                if let Some(content) = message.get_mut("content") {
                    if let Some(content_str) = content.as_str() {
                        *content = json!([{
                            "type": "text",
                            "text": content_str,
                            "cache_control": { "type": "ephemeral" }
                        }]);
                    }
                }
                user_count += 1;
                if user_count >= 2 {
                    break;
                }
            }
        }

        // Update the system message to have a cache_control field.
        if let Some(system_message) = messages_spec
            .iter_mut()
            .find(|msg| msg.get("role") == Some(&json!("system")))
        {
            if let Some(content) = system_message.get_mut("content") {
                if let Some(content_str) = content.as_str() {
                    *system_message = json!({
                        "role": "system",
                        "content": [{
                            "type": "text",
                            "text": content_str,
                            "cache_control": { "type": "ephemeral" }
                        }]
                    });
                }
            }
        }
    }

    if let Some(tools_spec) = payload
        .as_object_mut()
        .and_then(|obj| obj.get_mut("tools"))
        .and_then(|tools| tools.as_array_mut())
    {
        // Add "cache_control" to the last tool spec, if any. This means that all tool definitions
        // will be cached as a single prefix.
        if let Some(last_tool) = tools_spec.last_mut() {
            if let Some(function) = last_tool.get_mut("function") {
                function
                    .as_object_mut()
                    .unwrap()
                    .insert("cache_control".to_string(), json!({ "type": "ephemeral" }));
            }
        }
    }
    payload
}
```
Describe the solution you'd like
Add cache_control functionality to the OpenAI provider when the model name contains "claude":
- Add cache_control markers to system messages
- Add cache_control markers to the last two user messages
- Add cache_control markers to the last tool definition
Describe alternatives you've considered
- Apply universally: could cause issues with non-Claude models that do not expect cache_control fields
- Configuration-based: a dedicated setting adds unnecessary complexity
- Model name detection: a practical approach for LiteLLM and similar services
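As a sketch, the model-name detection itself can stay trivial; the helper name below is hypothetical:

```rust
/// Hypothetical helper: decide whether to add Anthropic cache_control
/// markers based on the configured model name. A case-insensitive
/// substring check also covers the prefixed names LiteLLM and similar
/// gateways use, such as "anthropic/claude-3-opus".
fn should_add_cache_control(model: &str) -> bool {
    model.to_lowercase().contains("claude")
}
```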
Additional context
- OpenRouter provider already implements this pattern in update_request_for_anthropic
- Backwards-compatible since unsupported services ignore unknown fields
- I have verified this does not duplicate an existing feature request