OpenRouter supports server‑side prompt caching via the cache_control parameter (docs: https://openrouter.ai/docs/guides/best-practices/prompt-caching).
This can significantly reduce token costs by caching the static prefix (system prompt + injected workspace files) across requests.
Currently, OpenClaw implements this for direct Anthropic access via cacheRetention, but OpenRouter requests don't include cache_control for some providers, missing potential savings of ~10‑12k tokens per turn. Some providers cache automatically, but others don't.
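For reference, a minimal sketch of what a request with a cache_control breakpoint looks like, assuming the Anthropic-style multipart content format that OpenRouter forwards; the model name and prompt text are illustrative:

```typescript
// Sketch of an OpenRouter chat request with a cache_control breakpoint.
// Everything up to and including the marked content part becomes the
// cacheable static prefix (system prompt + injected workspace files).
type CacheControl = { type: "ephemeral"; ttl?: "1h" };
type ContentPart = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "system" | "user"; content: ContentPart[] };

const request: { model: string; messages: Message[] } = {
  model: "anthropic/claude-3.5-sonnet", // any cache-capable model
  messages: [
    {
      role: "system",
      content: [
        { type: "text", text: "You are a coding assistant." },
        {
          type: "text",
          text: "<injected workspace files>",
          // Breakpoint: the provider caches everything up to here.
          cache_control: { type: "ephemeral" },
        },
      ],
    },
    { role: "user", content: [{ type: "text", text: "Refactor src/index.ts" }] },
  ],
};
```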
Proposed solution:
- Add cache_control parameter to OpenRouter provider configuration
- Compute hash of static prompt prefix (system prompt + injected files)
- Insert cache_control breakpoint after static prefix
- Track and reuse cache IDs across requests with same prefix
Configuration example:
```
{
  agents: {
    defaults: {
      models: {
        "openrouter/openai/chatgpt-4o": {
          params: {
            cache_control: { type: "ephemeral", ttl: "1h" }
          }
        }
      }
    }
  }
}
```
Benefits:
- Reduces token burn for identical prefixes across turns
- Works across sessions if the prefix is unchanged
- Compatible with the OpenRouter‑supported providers that honor cache_control (Anthropic, OpenAI, Gemini, DeepSeek, etc.)
- Follows the existing pattern from the Anthropic cacheRetention implementation
Implementation complexity:
Low‑medium (~100‑200 LoC)
2. Code Changes Sketch
Let me create a minimal patch for packages/gateway/src/providers/openrouter.ts (based on inferred structure):
```typescript
// Hypothetical implementation - needs actual code inspection
import { createHash } from 'node:crypto';

interface OpenRouterCacheControl {
  type: 'ephemeral';
  ttl?: '1h';
}

interface OpenRouterContentPart {
  type: string;
  text: string;
  cache_control?: OpenRouterCacheControl;
}

interface OpenRouterMessage {
  role: string;
  content: OpenRouterContentPart[];
}

interface OpenRouterRequest {
  messages: OpenRouterMessage[];
  cache_id?: string;
  cache_control?: OpenRouterCacheControl; // Top-level for some providers
}

class OpenRouterProvider {
  private prefixCache = new Map<string, string>(); // prefix hash -> cache_id

  async createChatCompletion(request: OpenRouterRequest, config: { params?: { cache_control?: OpenRouterCacheControl } }) {
    const { cache_control } = config.params || {};
    if (cache_control) {
      // 1. Compute hash of the static prefix (system prompt + injected files)
      const prefixHash = this.computePrefixHash(request.messages);
      // 2. Check for an existing cache ID
      const cacheId = this.prefixCache.get(prefixHash);
      if (cacheId) {
        request.cache_id = cacheId;
      } else {
        // 3. Insert a cache_control breakpoint after the static prefix
        this.insertCacheControl(request.messages, cache_control);
      }
    }

    const response = await this.sendToOpenRouter(request);

    // 4. Store the new cache ID from the response
    if (cache_control && response.cache_id && !request.cache_id) {
      const prefixHash = this.computePrefixHash(request.messages);
      this.prefixCache.set(prefixHash, response.cache_id);
    }
    return response;
  }

  private computePrefixHash(messages: OpenRouterMessage[]): string {
    // Identify the static parts (system messages + injected file content)
    // and hash them for change detection.
    const staticText = messages
      .filter((msg) => msg.role === 'system')
      .map((msg) => msg.content.map((c) => c.text).join(''))
      .join('');
    return createHash('sha256').update(staticText).digest('hex');
  }

  private insertCacheControl(
    messages: OpenRouterMessage[],
    cache_control: OpenRouterCacheControl,
  ): void {
    // Mark the last text part of the first system message so that
    // everything up to and including it becomes the cached prefix.
    for (const msg of messages) {
      if (msg.role === 'system' && msg.content?.length) {
        const lastContent = msg.content[msg.content.length - 1];
        if (lastContent.type === 'text') {
          lastContent.cache_control = cache_control;
          break;
        }
      }
    }
  }

  // Existing transport method (elided in this sketch)
  private async sendToOpenRouter(request: OpenRouterRequest): Promise<any> {
    throw new Error('not shown');
  }
}
```
Additional considerations:
- Need to check minimum token requirements per provider
- Handle multiple cache_control breakpoints for Anthropic (max 4)
- Clear cache when workspace files change
- Add metrics to track cache hits/savings
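On the first consideration: providers only cache prefixes above a minimum size, so a guard before inserting the breakpoint avoids wasted cache writes. A minimal sketch, with illustrative thresholds and a hypothetical `shouldInsertBreakpoint` helper (verify current limits against each provider's docs):

```typescript
// Hypothetical guard: only insert a cache_control breakpoint when the
// static prefix is likely to meet the provider's minimum cacheable size.
// Thresholds are illustrative; Anthropic documents ~1024 tokens for most
// models (higher for some), and OpenAI auto-caches at >=1024 tokens.
const MIN_CACHEABLE_TOKENS: Record<string, number> = {
  anthropic: 1024,
  openai: 1024,
};

// Crude estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function shouldInsertBreakpoint(provider: string, prefixText: string): boolean {
  const min = MIN_CACHEABLE_TOKENS[provider];
  if (min === undefined) return false; // unknown provider: skip caching
  return estimateTokens(prefixText) >= min;
}
```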