Commit e46e32b

feat: expose prompt-cache runtime context to context engines (#62179)
* Context engine: plumb prompt cache runtime context

  Add a typed prompt-cache payload to the context-engine runtime context and populate it from the embedded runner's resolved retention, last-call usage, cache-break observation, and cache-touch metadata. Also pass the same payload through the retry compaction runtime context when a run attempt already has it.

  Regeneration-Prompt: |
    Expose OpenClaw prompt-cache telemetry to context engines in a narrow, additive way without changing compaction policy. Keep the public change on the OpenClaw side only: add a typed promptCache payload to the context-engine runtime context, thread it into afterTurn, and also into compact, where the existing run loop already has the data cheaply available. Use OpenClaw's resolved cache retention, not raw config. Use last-call usage for the new payload, not accumulated retry or tool-loop totals. Reuse the existing prompt-cache observability result and tracked change causes instead of inventing a new heuristic. If cache-touch metadata is already available from the cache-TTL bookkeeping, include it; do not invent expiry timestamps for providers where OpenClaw cannot know them confidently. Keep the interface backward-compatible for engines that ignore the new field. Add focused tests around the existing attempt/context-engine helpers and the compaction runtime-context propagation path rather than broad new integration coverage.

* Agents: fix prompt-cache afterTurn usage

  Regeneration-Prompt: |
    Fix PR #62179 so context-engine prompt-cache metadata uses only the current attempt's usage. The review comment pointed out that early exits could reuse a prior turn's assistant usage when no new assistant message was produced. Restrict the prompt-cache lastCallUsage lookup to assistant messages added after prePromptMessageCount, and fall back to current-attempt usage totals instead of stale snapshot history.

    Also repair the PR's new context-engine test typings and add a regression test for the stale prior-turn case. The doctor-state-integrity and config/talk modules were already broken on origin/main by import-only issues, but those issues blocked build/check and the gateway-watch regression harness, so include the minimum unblocking imports as well.

* Agents: document prompt-cache context

* Agents: address prompt-cache review feedback

* Doctor: drop unused isRecord import
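The commit message above describes a backward-compatible promptCache payload threaded into afterTurn and compact. As a rough illustration only, here is how a context engine might consume it; the type shapes and the `cacheReadRatio` helper are assumptions modeled on the payloads exercised in this PR's tests, not OpenClaw's actual exports.

```typescript
// Local stand-in types, inferred from the test payloads in this PR.
type PromptCacheUsage = { input: number; cacheRead: number; total: number };

type PromptCacheInfo = {
  retention: string;
  lastCallUsage: PromptCacheUsage;
  observation: { broke: boolean; cacheRead: number };
  lastCacheTouchAt?: number;
};

type RuntimeContext = { trigger?: string; promptCache?: PromptCacheInfo };

// Engines that ignore promptCache keep working unchanged; engines that opt in
// can, for example, derive a cache-read ratio from the last call's usage.
function cacheReadRatio(ctx: RuntimeContext): number | null {
  const usage = ctx.promptCache?.lastCallUsage;
  if (!usage || usage.total <= 0) return null;
  return usage.cacheRead / usage.total;
}

const ratio = cacheReadRatio({
  trigger: "overflow",
  promptCache: {
    retention: "short",
    lastCallUsage: { input: 150000, cacheRead: 32000, total: 182000 },
    observation: { broke: false, cacheRead: 32000 },
    lastCacheTouchAt: 1_700_000_000_000,
  },
});
console.log(ratio !== null && ratio > 0.17 && ratio < 0.18); // true
console.log(cacheReadRatio({})); // null: field absent, engine falls back
```

Because the field is optional and additive, an engine that never reads `promptCache` sees an unchanged runtime context.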
1 parent: dac7288 · commit: e46e32b

13 files changed: 589 additions & 48 deletions

CHANGELOG.md (1 addition, 0 deletions)

@@ -27,6 +27,7 @@ Docs: https://docs.openclaw.ai
 - Memory/wiki: add an opt-in `context.includeCompiledDigestPrompt` flag so memory prompt supplements can append a compact compiled wiki snapshot for legacy prompt assembly and context engines that explicitly consume memory prompt sections. Thanks @vincentkoc.
 - Plugin SDK/context engines: pass `availableTools` and `citationsMode` into `assemble()`, and expose `buildMemorySystemPromptAddition(...)` so non-legacy context engines can adopt the active memory prompt path without reimplementing it. Thanks @vincentkoc.
 - Providers/inferrs: add string-content compatibility for stricter OpenAI-compatible chat backends, document `inferrs` setup with a full config example, and add troubleshooting guidance for local backends that pass direct probes but fail on full agent-runtime prompts.
+- Agents/context engine: expose prompt-cache runtime context to context engines and keep current-turn prompt-cache usage aligned with the active attempt instead of stale prior-turn assistant state. (#62179) Thanks @jalehman.
 
 ### Fixes
 
src/agents/pi-embedded-runner/cache-ttl.test.ts (56 additions, 1 deletion)

@@ -25,7 +25,7 @@ vi.mock("../../plugins/provider-runtime.js", async () => {
   };
 });
 
-import { isCacheTtlEligibleProvider } from "./cache-ttl.js";
+import { isCacheTtlEligibleProvider, readLastCacheTtlTimestamp } from "./cache-ttl.js";
 
 describe("isCacheTtlEligibleProvider", () => {
   it("allows anthropic", () => {
@@ -85,3 +85,58 @@ describe("isCacheTtlEligibleProvider", () => {
     ).toBe(true);
   });
 });
+
+describe("readLastCacheTtlTimestamp", () => {
+  it("returns the latest matching timestamp for the active provider/model", () => {
+    const sessionManager = {
+      getEntries: () => [
+        {
+          type: "custom",
+          customType: "openclaw.cache-ttl",
+          data: {
+            timestamp: 1_700_000_000_000,
+            provider: "anthropic",
+            modelId: "claude-sonnet-4-5",
+          },
+        },
+        {
+          type: "custom",
+          customType: "openclaw.cache-ttl",
+          data: {
+            timestamp: 1_700_000_001_000,
+            provider: "google",
+            modelId: "gemini-3.1-pro-preview",
+          },
+        },
+      ],
+    };
+
+    expect(
+      readLastCacheTtlTimestamp(sessionManager, {
+        provider: "Anthropic",
+        modelId: "Claude-Sonnet-4-5",
+      }),
+    ).toBe(1_700_000_000_000);
+  });
+
+  it("ignores unscoped cache-ttl entries when a context filter is requested", () => {
+    const sessionManager = {
+      getEntries: () => [
+        {
+          type: "custom",
+          customType: "openclaw.cache-ttl",
+          data: {
+            timestamp: 1_700_000_000_000,
+          },
+        },
+      ],
+    };
+
+    expect(
+      readLastCacheTtlTimestamp(sessionManager, {
+        provider: "anthropic",
+        modelId: "claude-sonnet-4-5",
+      }),
+    ).toBeNull();
+  });
+});

src/agents/pi-embedded-runner/cache-ttl.ts (34 additions, 1 deletion)

@@ -12,6 +12,11 @@ export type CacheTtlEntryData = {
   modelId?: string;
 };
 
+type CacheTtlContext = {
+  provider?: string;
+  modelId?: string;
+};
+
 export function isCacheTtlEligibleProvider(
   provider: string,
   modelId: string,
@@ -39,7 +44,32 @@ export function isCacheTtlEligibleProvider(
   );
 }
 
-export function readLastCacheTtlTimestamp(sessionManager: unknown): number | null {
+function normalizeCacheTtlKey(value: string | undefined): string | undefined {
+  return value?.trim().toLowerCase();
+}
+
+function matchesCacheTtlContext(
+  data: Partial<CacheTtlEntryData> | undefined,
+  context: CacheTtlContext | undefined,
+): boolean {
+  if (!context) {
+    return true;
+  }
+  const expectedProvider = normalizeCacheTtlKey(context.provider);
+  if (expectedProvider && normalizeCacheTtlKey(data?.provider) !== expectedProvider) {
+    return false;
+  }
+  const expectedModelId = normalizeCacheTtlKey(context.modelId);
+  if (expectedModelId && normalizeCacheTtlKey(data?.modelId) !== expectedModelId) {
+    return false;
+  }
+  return true;
+}
+
+export function readLastCacheTtlTimestamp(
+  sessionManager: unknown,
+  context?: CacheTtlContext,
+): number | null {
   const sm = sessionManager as { getEntries?: () => CustomEntryLike[] };
   if (!sm?.getEntries) {
     return null;
@@ -53,6 +83,9 @@ export function readLastCacheTtlTimestamp(sessionManager: unknown): number | nul
       continue;
    }
    const data = entry?.data as Partial<CacheTtlEntryData> | undefined;
+    if (!matchesCacheTtlContext(data, context)) {
+      continue;
+    }
    const ts = typeof data?.timestamp === "number" ? data.timestamp : null;
    if (ts && Number.isFinite(ts)) {
      last = ts;
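The matching semantics added in this file can be exercised standalone. This sketch copies `normalizeCacheTtlKey` and `matchesCacheTtlContext` out of the diff above (with local stand-in types) to show that provider/model comparison is case-insensitive after trimming, and that an unscoped entry fails a scoped filter.

```typescript
// Local stand-ins for the types in cache-ttl.ts.
type CacheTtlEntryData = { timestamp?: number; provider?: string; modelId?: string };
type CacheTtlContext = { provider?: string; modelId?: string };

// Keys are trimmed and lowercased before comparison.
function normalizeCacheTtlKey(value: string | undefined): string | undefined {
  return value?.trim().toLowerCase();
}

function matchesCacheTtlContext(
  data: Partial<CacheTtlEntryData> | undefined,
  context: CacheTtlContext | undefined,
): boolean {
  if (!context) {
    return true; // no filter requested: every entry matches
  }
  const expectedProvider = normalizeCacheTtlKey(context.provider);
  if (expectedProvider && normalizeCacheTtlKey(data?.provider) !== expectedProvider) {
    return false;
  }
  const expectedModelId = normalizeCacheTtlKey(context.modelId);
  if (expectedModelId && normalizeCacheTtlKey(data?.modelId) !== expectedModelId) {
    return false;
  }
  return true;
}

const entry = { provider: "anthropic", modelId: "claude-sonnet-4-5" };
console.log(
  matchesCacheTtlContext(entry, { provider: "Anthropic", modelId: "Claude-Sonnet-4-5" }),
); // true: case-insensitive match
console.log(matchesCacheTtlContext({}, { provider: "anthropic" })); // false: unscoped entry
```

This is why the tests above pass `provider: "Anthropic"` in mixed case and still match the lowercase entry, while the entry with no provider/model data returns null under a scoped lookup.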

src/agents/pi-embedded-runner/extensions.ts (4 additions, 1 deletion)

@@ -59,7 +59,10 @@ function buildContextPruningFactory(params: {
     contextWindowTokens: resolveContextWindowTokens(params),
     isToolPrunable: makeToolPrunablePredicate(settings.tools),
     dropThinkingBlocks: transcriptPolicy.dropThinkingBlocks,
-    lastCacheTouchAt: readLastCacheTtlTimestamp(params.sessionManager),
+    lastCacheTouchAt: readLastCacheTtlTimestamp(params.sessionManager, {
+      provider: params.provider,
+      modelId: params.modelId,
+    }),
   });
 
   return contextPruningExtension;

src/agents/pi-embedded-runner/run.overflow-compaction.test.ts (52 additions, 0 deletions)

@@ -124,6 +124,58 @@ describe("runEmbeddedPiAgent overflow compaction trigger routing", () => {
     );
   });
 
+  it("threads prompt-cache runtime context into overflow compaction", async () => {
+    mockedRunEmbeddedAttempt
+      .mockResolvedValueOnce(
+        makeAttemptResult({
+          promptError: makeOverflowError(),
+          promptCache: {
+            retention: "short",
+            lastCallUsage: {
+              input: 150000,
+              cacheRead: 32000,
+              total: 182000,
+            },
+            observation: {
+              broke: false,
+              cacheRead: 32000,
+            },
+            lastCacheTouchAt: 1_700_000_000_000,
+          },
+        }),
+      )
+      .mockResolvedValueOnce(makeAttemptResult({ promptError: null }));
+    mockedCompactDirect.mockResolvedValueOnce(
+      makeCompactionSuccess({
+        summary: "Compacted session",
+        tokensBefore: 150000,
+        tokensAfter: 80000,
+      }),
+    );
+
+    await runEmbeddedPiAgent(overflowBaseRunParams);
+
+    expect(mockedCompactDirect).toHaveBeenCalledWith(
+      expect.objectContaining({
+        runtimeContext: expect.objectContaining({
+          trigger: "overflow",
+          promptCache: expect.objectContaining({
+            retention: "short",
+            lastCallUsage: expect.objectContaining({
+              input: 150000,
+              cacheRead: 32000,
+            }),
+            observation: expect.objectContaining({
+              broke: false,
+              cacheRead: 32000,
+            }),
+            lastCacheTouchAt: 1_700_000_000_000,
+          }),
+        }),
+      }),
+    );
+  });
+
   it("passes observed overflow token counts into compaction when providers report them", async () => {
     const overflowError = new Error(
       '400 {"type":"error","error":{"type":"invalid_request_error","message":"prompt is too long: 277403 tokens > 200000 maximum"}}',

src/agents/pi-embedded-runner/run.timeout-triggered-compaction.test.ts (25 additions, 0 deletions)

@@ -40,6 +40,19 @@ describe("timeout-triggered compaction", () => {
     mockedRunEmbeddedAttempt.mockResolvedValueOnce(
       makeAttemptResult({
         timedOut: true,
+        promptCache: {
+          retention: "short",
+          lastCallUsage: {
+            input: 150000,
+            cacheRead: 32000,
+            total: 182000,
+          },
+          observation: {
+            broke: false,
+            cacheRead: 32000,
+          },
+          lastCacheTouchAt: 1_700_000_000_000,
+        },
         lastAssistant: {
           usage: { input: 150000 },
         } as never,
@@ -67,6 +80,18 @@ describe("timeout-triggered compaction", () => {
         force: true,
         compactionTarget: "budget",
         runtimeContext: expect.objectContaining({
+          promptCache: expect.objectContaining({
+            retention: "short",
+            lastCallUsage: expect.objectContaining({
+              input: 150000,
+              cacheRead: 32000,
+            }),
+            observation: expect.objectContaining({
+              broke: false,
+              cacheRead: 32000,
+            }),
+            lastCacheTouchAt: 1_700_000_000_000,
+          }),
           trigger: "timeout_recovery",
           attempt: 1,
           maxAttempts: 2,

src/agents/pi-embedded-runner/run.ts (2 additions, 0 deletions)

@@ -803,6 +803,7 @@ export async function runEmbeddedPiAgent(
         extraSystemPrompt: params.extraSystemPrompt,
         ownerNumbers: params.ownerNumbers,
       }),
+      ...(attempt.promptCache ? { promptCache: attempt.promptCache } : {}),
       runId: params.runId,
       trigger: "timeout_recovery",
       diagId: timeoutDiagId,
@@ -944,6 +945,7 @@ export async function runEmbeddedPiAgent(
         extraSystemPrompt: params.extraSystemPrompt,
         ownerNumbers: params.ownerNumbers,
       }),
+      ...(attempt.promptCache ? { promptCache: attempt.promptCache } : {}),
       runId: params.runId,
       trigger: "overflow",
       ...(observedOverflowTokens !== undefined

src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts (32 additions, 25 deletions)

@@ -1,4 +1,8 @@
 import type { OpenClawConfig } from "../../../config/config.js";
+import type {
+  ContextEnginePromptCacheInfo,
+  ContextEngineRuntimeContext,
+} from "../../../context-engine/types.js";
 import type {
   PluginHookAgentContext,
   PluginHookBeforeAgentStartResult,
@@ -11,7 +15,6 @@ import { buildActiveMusicGenerationTaskPromptContextForSession } from "../../mus
 import { prependSystemPromptAdditionAfterCacheBoundary } from "../../system-prompt-cache-boundary.js";
 import { resolveEffectiveToolFsWorkspaceOnly } from "../../tool-fs-policy.js";
 import { buildActiveVideoGenerationTaskPromptContextForSession } from "../../video-generation-task-status.js";
-import type { CompactEmbeddedPiSessionParams } from "../compact.js";
 import { buildEmbeddedCompactionRuntimeContext } from "../compaction-runtime-context.js";
 import { log } from "../logger.js";
 import { shouldInjectHeartbeatPromptForTrigger } from "./trigger-policy.js";
@@ -179,28 +182,32 @@ export function buildAfterTurnRuntimeContext(params: {
   >;
   workspaceDir: string;
   agentDir: string;
-}): Partial<CompactEmbeddedPiSessionParams> {
-  return buildEmbeddedCompactionRuntimeContext({
-    sessionKey: params.attempt.sessionKey,
-    messageChannel: params.attempt.messageChannel,
-    messageProvider: params.attempt.messageProvider,
-    agentAccountId: params.attempt.agentAccountId,
-    currentChannelId: params.attempt.currentChannelId,
-    currentThreadTs: params.attempt.currentThreadTs,
-    currentMessageId: params.attempt.currentMessageId,
-    authProfileId: params.attempt.authProfileId,
-    workspaceDir: params.workspaceDir,
-    agentDir: params.agentDir,
-    config: params.attempt.config,
-    skillsSnapshot: params.attempt.skillsSnapshot,
-    senderIsOwner: params.attempt.senderIsOwner,
-    senderId: params.attempt.senderId,
-    provider: params.attempt.provider,
-    modelId: params.attempt.modelId,
-    thinkLevel: params.attempt.thinkLevel,
-    reasoningLevel: params.attempt.reasoningLevel,
-    bashElevated: params.attempt.bashElevated,
-    extraSystemPrompt: params.attempt.extraSystemPrompt,
-    ownerNumbers: params.attempt.ownerNumbers,
-  });
+  promptCache?: ContextEnginePromptCacheInfo;
+}): ContextEngineRuntimeContext {
+  return {
+    ...buildEmbeddedCompactionRuntimeContext({
+      sessionKey: params.attempt.sessionKey,
+      messageChannel: params.attempt.messageChannel,
+      messageProvider: params.attempt.messageProvider,
+      agentAccountId: params.attempt.agentAccountId,
+      currentChannelId: params.attempt.currentChannelId,
+      currentThreadTs: params.attempt.currentThreadTs,
+      currentMessageId: params.attempt.currentMessageId,
+      authProfileId: params.attempt.authProfileId,
+      workspaceDir: params.workspaceDir,
+      agentDir: params.agentDir,
+      config: params.attempt.config,
+      skillsSnapshot: params.attempt.skillsSnapshot,
+      senderIsOwner: params.attempt.senderIsOwner,
+      senderId: params.attempt.senderId,
+      provider: params.attempt.provider,
+      modelId: params.attempt.modelId,
+      thinkLevel: params.attempt.thinkLevel,
+      reasoningLevel: params.attempt.reasoningLevel,
+      bashElevated: params.attempt.bashElevated,
+      extraSystemPrompt: params.attempt.extraSystemPrompt,
+      ownerNumbers: params.attempt.ownerNumbers,
+    }),
+    ...(params.promptCache ? { promptCache: params.promptCache } : {}),
+  };
 }
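A note on the `...(params.promptCache ? { promptCache: params.promptCache } : {})` pattern used in this file and in run.ts: the conditional spread omits the key entirely when no payload exists, rather than setting it to `undefined`, which keeps the runtime context backward-compatible for engines that inspect keys. A minimal sketch with hypothetical local types (not OpenClaw's real ones):

```typescript
// Hypothetical stand-in types for illustration only.
type RuntimeContext = { sessionKey: string; promptCache?: { retention: string } };

function buildRuntimeContext(
  base: { sessionKey: string },
  promptCache?: { retention: string },
): RuntimeContext {
  return {
    ...base,
    // Conditional spread: the promptCache key only exists when a payload does.
    ...(promptCache ? { promptCache } : {}),
  };
}

const without = buildRuntimeContext({ sessionKey: "s1" });
const withCache = buildRuntimeContext({ sessionKey: "s1" }, { retention: "short" });
console.log("promptCache" in without); // false: key omitted, not set to undefined
console.log(withCache.promptCache?.retention); // "short"
```

Writing `promptCache: params.promptCache` instead would add a `promptCache: undefined` entry, which is visible to `"promptCache" in ctx` checks and key iteration in existing engines.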
