Description
Summary
When a tool result contains content that the Gemini API rejects (e.g., large base64/minified JavaScript from reading a dist file), the session becomes permanently stuck in a 400 error loop. Every subsequent API call fails, including the calls triggered by /new and /reset, because the full session history (containing the offending content) is sent to the API before the command is processed.
There is no circuit-breaker, auto-recovery, or tool result size guard to prevent or recover from this state.
Root Cause
Three missing safeguards combine to create an unrecoverable failure:
- No tool result size limit or content sanitization: session-tool-result-guard.ts appends tool results directly to session history without any truncation or filtering. A single exec tool call that reads a large dist file (containing base64 source maps, minified JS, etc.) can inject hundreds of KB of content that the Gemini API treats as invalid.
- No circuit breaker for consecutive API errors: pi-embedded-runner/run.ts handles context overflow errors (via isContextOverflowError) and auth failover errors, but has no handling for repeated 400 "invalid argument" errors. The session keeps retrying with the same poisoned history indefinitely.
- The /new command cannot break the loop: /new does reset the session at the routing layer (commands-core.ts lines 65-107), but in group channels the incoming Discord message first triggers an agent turn on the existing session. That turn sends the poisoned history to the API, gets a 400 error, and the /new is never processed. The session stays stuck.
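To make the missing circuit breaker concrete, here is a minimal TypeScript sketch of a per-session counter for consecutive 400 errors. ConsecutiveErrorBreaker, recordSuccess, and recordError are hypothetical names, not existing OpenClaw APIs. The sketch assumes the runner can see the HTTP status of a failed call and already knows whether the error is a context overflow, which run.ts currently determines with isContextOverflowError.

```typescript
// Hypothetical per-session circuit breaker for repeated HTTP 400 errors.
// Not an OpenClaw API; names and thresholds are illustrative assumptions.
class ConsecutiveErrorBreaker {
  private readonly streaks = new Map<string, number>();
  private readonly maxConsecutive: number;

  constructor(maxConsecutive: number) {
    this.maxConsecutive = maxConsecutive;
  }

  /** A successful API call clears the error streak for that session. */
  recordSuccess(sessionId: string): void {
    this.streaks.delete(sessionId);
  }

  /**
   * Records a failed API call. Returns true when the session has hit
   * maxConsecutive non-overflow 400 errors in a row, meaning it should be
   * recovered (compacted, trimmed, or rotated) instead of retried with
   * the same history.
   */
  recordError(sessionId: string, status: number, isOverflow: boolean): boolean {
    if (status !== 400 || isOverflow) {
      // Overflow and auth errors already have their own handling;
      // in this sketch they break the streak rather than count toward it.
      this.streaks.delete(sessionId);
      return false;
    }
    const streak = (this.streaks.get(sessionId) ?? 0) + 1;
    if (streak >= this.maxConsecutive) {
      this.streaks.delete(sessionId); // start fresh after tripping
      return true;
    }
    this.streaks.set(sessionId, streak);
    return false;
  }
}
```

With a threshold of 3, the breaker would have tripped on the third 400 error instead of letting 19 accumulate over several hours. What "recover" should mean (compact, trim, or rotate) is left to the options listed under Proposed Fixes.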
Steps to Reproduce
- Start a Discord group session using google-gemini-cli/gemini-3-flash-preview.
- Have the agent read a large compiled JavaScript file via the exec tool (e.g., a bundled dist file with inline source maps).
- The tool result (containing base64/minified content) is stored in session history.
- Every subsequent API call fails with: Cloud Code Assist API error (400): Request contains an invalid argument.
- Incoming Discord messages are appended to the session, but all API calls fail.
- Sending /new in the channel also fails: the command triggers an API call with the poisoned history before the session reset can take effect.
- The session is permanently broken.
Evidence
From session JSONL (2a47c269):
```
# Line 779: Tool result with base64/minified content from dist file read
role=toolResult content=,CAAC,EAAE,MAAMA,EAAE,CAAC,EAAE,YAAY,GAAG,EAAE,EAAE...
# Line 780: First 400 error; every API call after this fails
role=assistant stopReason=error errorMessage="Cloud Code Assist API error (400): Request contains an invalid argument."
# Lines 782-817: 19 consecutive 400 errors over 3+ hours, interspersed with user messages that cannot be processed
```
Session was at 925k/1049k tokens (88%) with 817 messages. The only recovery was manually archiving the session file and restarting the gateway.
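Finding the offending entry by hand in an 817-message transcript is tedious. A small scanner could report oversized tool results and the first error turn. This sketch assumes each JSONL line is an object with role, content, and stopReason fields, mirroring the fields shown in the excerpt above. The actual OpenClaw transcript schema may nest them differently, and scanTranscript is a hypothetical helper.

```typescript
// Hypothetical transcript entry shape; the real OpenClaw JSONL schema may
// differ (for example, content may be an array of blocks, not a string).
interface TranscriptEntry {
  role?: string;
  content?: unknown;
  stopReason?: string;
}

interface PoisonReport {
  largeToolResults: { line: number; chars: number }[];
  firstErrorLine: number | null;
  errorCount: number;
}

function scanTranscript(jsonl: string, maxToolResultChars: number): PoisonReport {
  const report: PoisonReport = { largeToolResults: [], firstErrorLine: null, errorCount: 0 };
  jsonl.split("\n").forEach((raw, i) => {
    const lineNo = i + 1; // 1-based, like the line numbers in the excerpt
    if (raw.trim() === "") return;
    let entry: TranscriptEntry;
    try {
      entry = JSON.parse(raw) as TranscriptEntry;
    } catch {
      return; // skip malformed lines rather than abort the scan
    }
    if (entry.role === "toolResult") {
      // Measure serialized size so string and block-array content both count.
      const chars = JSON.stringify(entry.content ?? "").length;
      if (chars > maxToolResultChars) report.largeToolResults.push({ line: lineNo, chars });
    }
    if (entry.role === "assistant" && entry.stopReason === "error") {
      report.errorCount += 1;
      if (report.firstErrorLine === null) report.firstErrorLine = lineNo;
    }
  });
  return report;
}
```

A flagged tool result immediately followed by the first error line is the pattern shown at lines 779-780 above.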
Impact
- Critical: renders the session completely unusable, with no self-recovery
- User sees repeated Cloud Code Assist API error (400) messages in Discord
- /new and /reset commands cannot break the loop
- Manual file system intervention required (delete/archive session JSONL)
- Affects any session where the agent reads large binary/minified content via tools
Proposed Fixes
1. Circuit-breaker for consecutive API errors (high priority)
In pi-embedded-runner/run.ts, add error tracking per session:
```ts
// After N consecutive non-overflow 400 errors on the same session,
// auto-compact (drop oldest messages) or auto-rotate to a new session
if (isConsecutive400Error(errorText) && consecutiveErrorCount >= MAX_CONSECUTIVE_ERRORS) {
  // Option A: Auto-compact: drop messages before the last successful exchange
  // Option B: Auto-rotate: create a new session and notify the user
  // Option C: Trim the last N tool results and retry
}
```
2. Tool result size guard (medium priority)
In session-tool-result-guard.ts, add truncation before persisting:
```ts
const MAX_TOOL_RESULT_CHARS = 100_000; // ~25k tokens
if (resultText.length > MAX_TOOL_RESULT_CHARS) {
  resultText =
    resultText.slice(0, MAX_TOOL_RESULT_CHARS) +
    `\n\n[Truncated: output was ${resultText.length} chars, limit is ${MAX_TOOL_RESULT_CHARS}]`;
}
```
3. /new bypass for poisoned sessions (medium priority)
Process /new and /reset commands at the message routing layer before triggering an agent turn, so they work even when the session history is invalid.
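Here is a minimal sketch of that ordering. It assumes the router sees the raw message text before any agent turn is scheduled. routeIncoming, SessionStore, and the AgentTurn callback are hypothetical stand-ins, not OpenClaw's actual routing code in commands-core.ts.

```typescript
// Hypothetical routing-layer sketch: session-control commands are handled
// before any agent turn starts, so they never send the (possibly poisoned)
// history to the model. These names are illustrative, not OpenClaw APIs.
interface SessionStore {
  reset(sessionId: string): void;
}

type AgentTurn = (sessionId: string, text: string) => Promise<string>;

const RESET_COMMANDS = new Set(["/new", "/reset"]);

async function routeIncoming(
  sessionId: string,
  text: string,
  store: SessionStore,
  runAgentTurn: AgentTurn,
): Promise<string> {
  // Look only at the first token. A leading @mention, common in group
  // channels, is not stripped in this sketch.
  const command = (text.trim().split(/\s+/)[0] ?? "").toLowerCase();
  if (RESET_COMMANDS.has(command)) {
    store.reset(sessionId); // no model call, so an invalid history cannot block it
    return "Session reset.";
  }
  return runAgentTurn(sessionId, text);
}
```

The key property is that the reset branch makes no provider call, so a history the API rejects cannot prevent the reset.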
4. Content sanitization for known-bad patterns (low priority)
Strip or replace content that is known to cause issues with specific providers (e.g., large base64 blobs, source maps, minified JS bundles) before persisting tool results.
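One way to sketch this is a pair of regex replacements applied before a tool result is persisted: one for inline source maps and one for long unbroken runs of base64/VLQ characters, like the ,CAAC,EAAE,... excerpt in the evidence. The patterns and the 200-character threshold are assumptions for illustration, not a documented list of what the Gemini API rejects. sanitizeToolOutput is a hypothetical name.

```typescript
// Hypothetical sanitizer for tool output before it is written to history.
// Patterns and thresholds are illustrative assumptions, not provider rules.

// Inline source maps, e.g. //# sourceMappingURL=data:application/json;base64,...
const INLINE_SOURCE_MAP = /\/\/[#@] sourceMappingURL=data:\S+/g;

// Long unbroken runs of base64 characters, plus ',' and ';' so that
// source-map "mappings" strings (comma-separated VLQ) are caught too.
const LONG_BASE64_RUN = /[A-Za-z0-9+\/=,;]{200,}/g;

function sanitizeToolOutput(text: string): string {
  return text
    .replace(
      INLINE_SOURCE_MAP,
      (m) => `//# sourceMappingURL=[inline source map removed: ${m.length} chars]`,
    )
    .replace(LONG_BASE64_RUN, (m) => `[base64-like data removed: ${m.length} chars]`);
}
```

Replacing matches with a length-annotated placeholder, rather than deleting them silently, keeps the transcript honest about what was removed. The run pattern can misfire on legitimate data (for example, a long comma-separated numeric row with no spaces), so it would likely be paired with the size guard from fix 2 rather than relied on alone.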
Workaround
Manually archive or delete the poisoned session file:
```shell
mkdir -p ~/.openclaw/agents/main/sessions/_archived
mv ~/.openclaw/agents/main/sessions/<session-id>.jsonl ~/.openclaw/agents/main/sessions/_archived/
# Restart gateway
pkill -USR1 -f "openclaw.*gateway"
```
Environment
- OpenClaw version: 2026.2.6-3
- Channel: Discord (group)
- Model: google-gemini-cli/gemini-3-flash-preview
- Platform: macOS Darwin 25.2.0 (arm64) + Linux (same behavior on both)
Related Issues
- #8946 Session should auto-recover when corrupted tool response makes history invalid (same fundamental problem, different trigger)
- #11291 [Bug]: Tool Call Formatting Issue + Context Overflow on Model Switch (same error loop pattern, filed today)
- #9672 [Bug] Single-session compaction orphans tool_result blocks, permanently breaking session
- #6202 [Bug]: Read tool inlines base64 images in session transcripts, causing context overflow (related: no size guard on tool results)
- #3014 Orphaned tool_result causes API 400 error after terminated tool_use (same 400 error loop)
- #3154 Context overflow error does not trigger automatic session recovery
- #5430 🐛 Session Corruption Bug: Terminated Tool Calls Break All Subsequent Requests