Summary
OpenClaw's ContextEngine.assemble() contract says the engine returns ordered messages ready for the model under a token budget, but the embedded runner still hard-gates prompt submission on a larger pre-assembly message view (unwindowedMessages). With lossless-claw/LCM as the context engine, this can force repeated precheck overflow, compaction retries, and eventually a session reset even when the assembled context should be the prompt-authoritative view.
This is related to earlier compaction/context drift reports such as #69838 and #50065, but the failure mode here is narrower: context-engine assembly can be applied to activeSession.messages, then bypassed by the preemptive overflow precheck.
Environment observed
- OpenClaw package: 2026.4.24
- Live install symlink: /opt/homebrew/lib/node_modules/openclaw -> /Users/lume/repos/openclaw-pr70071-rebase
- Live OpenClaw head: 39199f8e42
- Active context engine slot: lossless-claw
- Active model: openai-codex/gpt-5.5
- Configured model context window: 258000
- Runtime reserve: 50000
- Effective prompt budget before reserve in logs: 208000
Confidence matrix
- 0.94 - primary cause: OpenClaw precheck treats the pre-assembly message view as a hard prompt-admission gate even after context-engine assembly has produced a smaller prompt-ready view.
- 0.86 - runtime failure timeline: Apr 28-29 logs show repeated gpt-5.5 precheck overflow, compaction retries, and a hard reset to a new session.
- 0.85 - contributing cause: aggregate tool-result bulk repeatedly pushed prompt estimates over budget; recovery worked in some turns but not in the fatal loop.
- 0.80 - user-visible forgetfulness cause: compaction failure resets the session and clears token/cache/accounting fields; continuity then depends on the context engine and recall tools.
- 0.75 - contributing pressure: gpt-5.5's 258000-token context window plus a 50000-token reserve leaves only 208000 prompt tokens, making this bug much easier to trigger than under larger-window models.
- 0.70 - contributing pressure: repeated bootstrap/system/context injection and large workspace guidance add prompt bulk; logs show AGENTS.md and SOUL.md truncation around 40k chars.
- 0.65 - related cache/retry risk: the deployed branch does not appear to include the full CLI prompt-build drain-cache retry fix from commit 4225db3a7b; this is likely adjacent for CLI/session-expired forgetfulness, though the fatal overflow evidence here is in the embedded runner.
- 0.65 - related recall risk: Cortex recall was effectively absent in the same window, which can amplify perceived forgetfulness after a reset, but this is separate from the overflow precheck bug.
Source evidence
ContextEngine.assemble() is defined as the prompt-ready assembly point:
src/context-engine/types.ts: AssembleResult.messages are the "Ordered messages to use as model context" and assemble() "Returns an ordered set of messages ready for the model."
The embedded runner snapshots the pre-assembly messages, calls the context engine, and replaces the active session messages with the assembled output:
src/agents/pi-embedded-runner/run/attempt.ts
- snapshots unwindowedContextEngineMessagesForPrecheck = activeSession.messages.slice() before assembly
- calls assembleAttemptContextEngine(...)
- assigns activeSession.agent.state.messages = assembled.messages when assembly returns a different array
The precheck then receives both the assembled active messages and the pre-assembly snapshot:
src/agents/pi-embedded-runner/run/attempt.ts
shouldPreemptivelyCompactBeforePrompt({ messages: activeSession.messages, unwindowedMessages: unwindowedContextEngineMessagesForPrecheck, ... })
The precheck intentionally chooses the larger estimate:
src/agents/pi-embedded-runner/run/preemptive-compaction.ts
- if unwindowedEstimatedPromptTokens > estimatedPromptTokens, it replaces the estimate and messagesForPressure with unwindowedMessages
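The contested logic can be sketched in a few lines. This is a simplified reconstruction, not the repo's actual implementation: the option names follow the report, but the `estimateTokens` heuristic and the exact options/result shapes are assumptions.

```typescript
type Message = { role: string; content: string };

// Hypothetical token estimator (~4 chars per token), stand-in for the real one.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function shouldPreemptivelyCompactBeforePrompt(opts: {
  messages: Message[]; // assembled, prompt-ready view
  unwindowedMessages?: Message[]; // raw pre-assembly snapshot
  promptBudgetBeforeReserve: number;
}): { compact: boolean; estimatedPromptTokens: number; messagesForPressure: Message[] } {
  let estimatedPromptTokens = estimateTokens(opts.messages);
  let messagesForPressure = opts.messages;
  if (opts.unwindowedMessages) {
    const unwindowedEstimatedPromptTokens = estimateTokens(opts.unwindowedMessages);
    // The contested step: the raw pre-assembly snapshot overrides the assembled
    // view whenever its estimate is larger, so prompt admission is gated on
    // pre-assembly size even after assembly produced a smaller prompt.
    if (unwindowedEstimatedPromptTokens > estimatedPromptTokens) {
      estimatedPromptTokens = unwindowedEstimatedPromptTokens;
      messagesForPressure = opts.unwindowedMessages;
    }
  }
  return {
    compact: estimatedPromptTokens > opts.promptBudgetBeforeReserve,
    estimatedPromptTokens,
    messagesForPressure,
  };
}
```

Under this shape, a 100-token assembled prompt paired with a 210k-token raw snapshot still trips compaction against a 208k budget, which matches the failure described above.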
A current unit test locks in this behavior:
src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts
- test name: uses the larger unwindowed message estimate when context engine assembly windows history
A targeted local test run passed, confirming the behavior is expected by the current test suite:
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/pi-embedded-runner/run/preemptive-compaction.test.ts
Test Files 1 passed (1)
Tests 10 passed (10)
Runtime evidence
The fatal Apr 29 loop looked like this:
2026-04-29T01:24:27.357+07:00 [context-overflow-precheck] route=compact_then_truncate estimatedPromptTokens=284165 promptBudgetBeforeReserve=208000 overflowTokens=76165 toolResultReducibleChars=227371 sessionFile=.../7fa8806a-4ad4-4b57-8978-e7d08a6bfdc2.jsonl
2026-04-29T01:24:36.133+07:00 [context-overflow-precheck] route=compact_only estimatedPromptTokens=215976 promptBudgetBeforeReserve=208000 overflowTokens=7976 toolResultReducibleChars=0 sessionFile=.../7fa8806a-4ad4-4b57-8978-e7d08a6bfdc2.jsonl
2026-04-29T01:24:47.337+07:00 [context-overflow-precheck] route=compact_only estimatedPromptTokens=215976 promptBudgetBeforeReserve=208000 overflowTokens=7976 toolResultReducibleChars=0 sessionFile=.../7fa8806a-4ad4-4b57-8978-e7d08a6bfdc2.jsonl
2026-04-29T01:24:48.125+07:00 [context-overflow-precheck] route=compact_only estimatedPromptTokens=215976 promptBudgetBeforeReserve=208000 overflowTokens=7976 toolResultReducibleChars=0 sessionFile=.../7fa8806a-4ad4-4b57-8978-e7d08a6bfdc2.jsonl
2026-04-29T01:24:48.172+07:00 Auto-compaction failed (Context overflow: prompt too large for the model (precheck).). Restarting session agent:main:main -> 4ff1673e-4b2c-406a-b6e4-03c0beb54b25 and retrying.
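The route and overflow values in these lines are internally consistent, which can be checked with a small sketch. The function below is illustrative, not the repo's code; only the numbers come from the log lines, and the route-selection rule is an assumption inferred from them.

```typescript
// Hypothetical reconstruction of the overflow-route choice visible in the logs:
// overflow is estimate minus budget, and the route depends on whether there is
// reducible tool-result bulk left to truncate.
function chooseOverflowRoute(opts: {
  estimatedPromptTokens: number;
  promptBudgetBeforeReserve: number;
  toolResultReducibleChars: number;
}): { route: "none" | "compact_only" | "compact_then_truncate"; overflowTokens: number } {
  const overflowTokens = opts.estimatedPromptTokens - opts.promptBudgetBeforeReserve;
  if (overflowTokens <= 0) return { route: "none", overflowTokens: 0 };
  const route = opts.toolResultReducibleChars > 0 ? "compact_then_truncate" : "compact_only";
  return { route, overflowTokens };
}
```

Plugging in the first log line (284165 estimated, 208000 budget, 227371 reducible chars) yields compact_then_truncate with 76165 overflow tokens; the later lines (215976 estimated, 0 reducible chars) yield compact_only with 7976, matching the stuck loop above.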
The new session begins with an injected reset message:
Context limit exceeded. I've reset our conversation to start fresh - please try again.
The active LCM database state after the reset showed much smaller assembled context than the raw stored history:
- active conversation had about 56k messages / 20M stored message tokens
- current context_items were compact: summaries + recent messages, around tens of thousands of tokens
- maintenance row reported current_token_count=87767 against token_budget=258000
Caveat: the LCM DB tracks sessionKey=agent:main:main through reset, so the exact LCM assembled output for the pre-reset 7fa... failure is not preserved as a separate per-session row. The code-path contradiction above does not depend on that caveat.
Possible causes to investigate
- The current precheck conflates two different concepts: prompt-admission size and raw-history maintenance pressure.
- Context engines that perform real summarizing/retrieval assembly need their assembled result to be prompt-authoritative by default.
- If OpenClaw needs to detect raw-history debt, that should be a separate maintenance signal, not an unconditional prompt blocker.
- Post-compaction recovery can flip from compact_then_truncate to compact_only; after that, remaining tool-result cleanup may be skipped even though it was part of the original pressure.
- Session reset after precheck compaction failure causes user-visible forgetfulness and clears accounting/cache fields.
- Bootstrap/context injection volume and the small effective gpt-5.5 prompt budget make the failure easier to trigger.
- Cortex/recall unavailability after reset can make the continuity loss more visible, though it is not the primary overflow trigger.
Suggested fix
Make context-engine assembly prompt-authoritative for prompt admission by default:
- Precheck should estimate against assembled.messages plus system prompt and user prompt.
- Prefer assembled.estimatedTokens when the engine provides a trustworthy estimate, or add a flag/capability for trust level.
- If the host still wants to monitor pre-assembly/raw-history pressure, expose that as a separate maintenance/debt signal.
- Add a regression test where a context engine assembles a small valid prompt from a large pre-assembly history and precheck allows the prompt instead of raising Context overflow: prompt too large for the model (precheck).
- Consider a ContextEngineInfo capability such as assemblyIsPromptAuthoritative or a result-level metadata field that distinguishes assembled, fallback-live, and emergency outputs.
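One possible shape for that capability and metadata, sketched here as a suggestion: only assemblyIsPromptAuthoritative and the three output kinds are named above, every other field is illustrative.

```typescript
// Hypothetical capability flag on the engine's info record.
interface ContextEngineInfo {
  name: string;
  // When true, the host gates prompt admission on assemble() output alone.
  assemblyIsPromptAuthoritative?: boolean;
}

// Hypothetical result-level metadata distinguishing real assembly from
// fallback paths, so the host can decide how much to trust the output.
type AssemblyKind = "assembled" | "fallback-live" | "emergency";

interface AssembleResultMeta {
  kind: AssemblyKind;
  estimatedTokens?: number; // engine-provided estimate, if trustworthy
}

const info: ContextEngineInfo = {
  name: "lossless-claw",
  assemblyIsPromptAuthoritative: true,
};
const meta: AssembleResultMeta = { kind: "assembled", estimatedTokens: 87767 };
```

With this split, the precheck could trust assembled output from engines that declare the flag, and fall back to today's conservative behavior for fallback-live or emergency results.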
Suggested regression shape
- Mock a context engine whose assemble() returns a small messages array under budget.
- Provide a much larger pre-assembly message array that would overflow if checked directly.
- Run the embedded attempt precheck path.
- Assert prompt submission is allowed, and raw-history pressure is reported only as maintenance debt.
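The steps above can be sketched with plain assertions rather than the repo's vitest harness. The ContextEngine shape and the prompt-authoritative precheck below are hypothetical stand-ins for the proposed behavior, not existing APIs.

```typescript
type Message = { role: string; content: string };

interface ContextEngine {
  assemble(history: Message[], tokenBudget: number): { messages: Message[]; estimatedTokens: number };
}

// Mock engine: keeps only the last message, standing in for real
// summarizing/retrieval assembly that windows a large history.
const mockEngine: ContextEngine = {
  assemble: (history) => {
    const messages = history.slice(-1);
    return { messages, estimatedTokens: Math.ceil(messages[0].content.length / 4) };
  },
};

// Proposed precheck: admit the prompt based on the assembled view only, and
// report raw-history size as a separate maintenance signal instead of a blocker.
function precheckPromptAuthoritative(opts: {
  assembled: { messages: Message[]; estimatedTokens: number };
  rawHistory: Message[];
  promptBudgetBeforeReserve: number;
}): { allowPrompt: boolean; maintenanceDebtTokens: number } {
  const rawTokens = opts.rawHistory.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
  return {
    allowPrompt: opts.assembled.estimatedTokens <= opts.promptBudgetBeforeReserve,
    maintenanceDebtTokens: Math.max(0, rawTokens - opts.assembled.estimatedTokens),
  };
}
```

A regression in this shape would feed the mock engine a history far over budget and assert that the prompt is admitted while the debt signal is nonzero, which is exactly the case the current unwindowed precheck rejects.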
Impact
Long-running context-engine sessions can reset even though the context engine has already compacted or assembled a prompt-ready view. This causes avoidable prompt failures, compaction loops, cache disruption, and user-visible memory loss.