Summary
agent/embedded session-repair logic leaves session JSONL files ending on a role=assistant message after "repair", which then resubmits to Anthropic with assistant prefill — Anthropic rejects with HTTP 400 (This model does not support assistant message prefill). When the agent's failover chain has only one candidate, this surfaces as a user-visible error and can wedge the embedded agent in a silent loop.
Environment
- OpenClaw: latest (running via LaunchAgent on macOS)
- Model:
anthropic/claude-opus-4-7
- Platform: Darwin 25.3.0 arm64, Node v25.8.1
Repro / Observed Behaviour
- Session JSONL ends with
role=assistant (e.g. previous run interrupted before the next user turn was appended).
agent/embedded attempts repair on session resume.
- Repair rewrites the assistant message but file still ends on
role=assistant.
- Request submitted to Anthropic ends with assistant content → HTTP 400
format error.
- Failover decision =
surface_error because primary is the only candidate.
- Loop continues until process is SIGTERM'd.
Real-world Impact
Out of 877 session files on this gateway, 250 (~28%) had this corruption. Yesterday alone: 300 prefill rejections + 49 repair attempts. Today before manual restart (~7h window): 84 + 14. Continuous, not one-off.
Suggested Fix
Either:
- Drop trailing assistant entries during repair until the file ends on a
role=user turn, OR
- Append a synthetic user "(continue)" turn before resubmission.
Also worth considering: a session-janitor cron that quarantines any JSONL whose last message entry isn't role=user, run weekly or on startup.
Workaround (current)
- Quarantined 250 corrupt sessions to
~/.openclaw/agents/<id>/sessions/_quarantine_YYYY-MM-DD/ with a MANIFEST.
- Added
fallbacks array to agents.defaults.model so single-candidate 4xx no longer surfaces directly.
Logs (sanitized)
06:51:36 agent/embedded repair → session e9256c82-...
06:51:38 Anthropic 400: "This model does not support assistant message prefill"
06:51:38 failover decision: surface_error (1 candidate)
06:51:53→06:53:21 90s silence, embedded lane stuck
06:53:21 SIGTERM (manual)
Happy to provide the quarantine MANIFEST or full logs offline if useful.
Summary
agent/embeddedsession-repair logic leaves session JSONL files ending on arole=assistantmessage after "repair", which then resubmits to Anthropic with assistant prefill — Anthropic rejects with HTTP 400 (This model does not support assistant message prefill). When the agent's failover chain has only one candidate, this surfaces as a user-visible error and can wedge the embedded agent in a silent loop.Environment
anthropic/claude-opus-4-7Repro / Observed Behaviour
role=assistant(e.g. previous run interrupted before the next user turn was appended).agent/embeddedattempts repair on session resume.role=assistant.formaterror.surface_errorbecause primary is the only candidate.Real-world Impact
Out of 877 session files on this gateway, 250 (~28%) had this corruption. Yesterday alone: 300 prefill rejections + 49 repair attempts. Today before manual restart (~7h window): 84 + 14. Continuous, not one-off.
Suggested Fix
Either:
role=userturn, ORAlso worth considering: a session-janitor cron that quarantines any JSONL whose last
messageentry isn'trole=user, run weekly or on startup.Workaround (current)
~/.openclaw/agents/<id>/sessions/_quarantine_YYYY-MM-DD/with a MANIFEST.fallbacksarray toagents.defaults.modelso single-candidate 4xx no longer surfaces directly.Logs (sanitized)
Happy to provide the quarantine MANIFEST or full logs offline if useful.