🐛 Session Corruption Bug: Terminated Tool Calls Break All Subsequent Requests #5430

@LeLe1110

Summary

When a tool call is abnormally terminated (e.g., process killed, timeout), it leaves a corrupted record in the session history file (.jsonl). This corrupted record causes ALL subsequent API requests to fail with error code 2013, completely breaking chat functionality for that session.

Severity

Critical - Renders the entire chat session unusable. User sees only truncated responses (e.g., single letter "A") with no error indication in the UI.

Steps to Reproduce

  1. Start a chat session that triggers a tool call
  2. Forcefully terminate the process during tool execution (e.g., pkill -9 openclaw-gateway)
  3. Restart the gateway service
  4. Try sending any new message in the same session

Expected Behavior

  • The system should gracefully handle incomplete tool calls
  • Corrupted session data should be auto-repaired or isolated
  • Error messages should be clearly displayed to the user

Actual Behavior

  • Every API request fails with: invalid params, tool result's tool id(call_function_w2bitop211ji_1) not found (2013)
  • UI shows only the first character of the error ("A") instead of the actual response
  • Session becomes permanently broken until manual cleanup

Error Details

From session file (.jsonl):

{
  "type": "message",
  "id": "29c9121d",
  "message": {
    "role": "assistant",
    "content": [
      {
        "type": "toolCall",
        "id": "call_function_w2bitop211ji_1",
        "name": "write",
        "arguments": {
          "path": "/Users/macmini_no1/clawd/HEARTBEAT.md",
          "content": "# Heartbeat.md\n\nKeep this file empty (or with only comments) to skip heartbeat API calls.\nAdd"
        },
        "partialJson": "{\"path\": \"/Users/macmini_no1/clawd/HEARTBEAT.md\", \"content\": \"# Heartbeat.md\\n\\nKeep this file empty (or with only comments) to skip heartbeat API calls.\\nAdd"
      }
    ],
    "stopReason": "error",
    "errorMessage": "terminated"
  }
}

From gateway logs:

[feishu] deliver called: text=LLM request rejected: invalid params, tool result's tool id(call_function_w2bitop211ji_1) not found

From API response:

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "invalid params, tool result's tool id(call_function_w2bitop211ji_1) not found (2013)"
  }
}

Impact

  • User Experience: Completely broken chat with no clear error indication
  • Data Loss: Users may lose their entire conversation history when forced to delete the session file
  • Debugging: Extremely difficult to diagnose without examining session .jsonl files

Environment

  • OpenClaw Version: 2026.1.29
  • Provider: MiniMax (api.minimax.io/anthropic)
  • Model: MiniMax-M2.1
  • OS: macOS Darwin 25.2.0
  • Session File: ~/.openclaw/agents/main/sessions/*.jsonl

Current Workaround

# 1. Back up (move aside) the corrupted session file
cd ~/.openclaw/agents/main/sessions
mv <session-id>.jsonl <session-id>.jsonl.backup

# 2. Restart the gateway (prefer plain SIGTERM over -9: a SIGKILL during a tool call is what causes this corruption)
pkill openclaw-gateway
openclaw-gateway &

# 3. The gateway creates a new, clean session (the conversation history is lost)

Suggested Fixes

1. Session Validation on Load (Recommended)

  • Validate session .jsonl file on startup
  • Remove or quarantine corrupted entries
  • Log warning about data corruption
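A minimal sketch of validation on load, in Python (the implementation language of OpenClaw is not stated in this report). Note that the record shown under Error Details is valid JSON: the damage is semantic (an assistant message with `stopReason: "error"` that still carries a `toolCall`), so a parse check alone would not catch it. The sketch assumes each .jsonl line is one entry shaped like that record, and that a tool result is stored as a message with `role: "toolResult"` and a `toolCallId` field; that result shape is a guess, not taken from this report. `validate_session`, `repair_session`, and the `.quarantine` sidecar file are hypothetical names.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("session_validation")


def _message(entry: dict) -> dict:
    """Return the inner message of a 'message' entry, or {} for anything else."""
    msg = entry.get("message") if entry.get("type") == "message" else None
    return msg if isinstance(msg, dict) else {}


def incomplete_tool_call_ids(entry: dict) -> set[str]:
    """Ids of toolCall blocks in an assistant message that stopped with an error."""
    msg = _message(entry)
    if msg.get("role") != "assistant" or msg.get("stopReason") != "error":
        return set()
    return {
        b["id"]
        for b in msg.get("content") or []
        if isinstance(b, dict) and b.get("type") == "toolCall" and "id" in b
    }


def validate_session(path: Path) -> tuple[list[str], list[str]]:
    """Split a session .jsonl into raw lines to keep and raw lines to quarantine."""
    kept: list[str] = []
    quarantined: list[str] = []
    dropped_ids: set[str] = set()  # tool call ids removed so far
    with path.open(encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\n")
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                entry = None
            if not isinstance(entry, dict):
                log.warning("%s:%d: unparseable entry quarantined", path, lineno)
                quarantined.append(line)
                continue
            bad_ids = incomplete_tool_call_ids(entry)
            msg = _message(entry)
            # ASSUMPTION: tool results look like {"role": "toolResult", "toolCallId": ...}.
            orphan = msg.get("role") == "toolResult" and msg.get("toolCallId") in dropped_ids
            if bad_ids or orphan:
                dropped_ids |= bad_ids
                log.warning("%s:%d: incomplete tool call (or its result) quarantined", path, lineno)
                quarantined.append(line)
                continue
            kept.append(line)
    return kept, quarantined


def repair_session(path: Path) -> int:
    """Move bad lines to <name>.quarantine and rewrite the session file atomically."""
    kept, quarantined = validate_session(path)
    if quarantined:
        sidecar = path.with_name(path.name + ".quarantine")
        with sidecar.open("a", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in quarantined)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
        tmp.replace(path)  # atomic rename when tmp and path share a filesystem
    return len(quarantined)
```

Running `repair_session(path)` before a session is used would drop the broken call together with any result that references it, while keeping the raw lines in a sidecar file instead of deleting them, so the rest of the conversation history survives.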

2. Tool Call Cleanup

  • Detect incomplete tool calls in conversation history
  • Remove orphaned tool_use blocks (and tool_result blocks without a matching tool_use) before sending to the API
  • Add timeout handling for tool execution
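The provider here is MiniMax's Anthropic-compatible endpoint, so by the time a request is built the history presumably takes the Anthropic Messages shape, where an assistant `tool_use` block (with `id`) must be answered by a user `tool_result` block (with `tool_use_id`). Below is a minimal sketch of the cleanup idea under that assumption; `drop_orphaned_tool_blocks` is a hypothetical helper, not an OpenClaw function. It works in both directions because the error in this issue is the reverse case: a tool result whose tool id has no matching call.

```python
from typing import Any

Message = dict[str, Any]


def _blocks(msg: Message) -> list[dict]:
    """Content blocks of a message; plain-string content has none."""
    content = msg.get("content")
    return content if isinstance(content, list) else []


def drop_orphaned_tool_blocks(messages: list[Message]) -> list[Message]:
    """Drop tool_use blocks with no tool_result and tool_result blocks with no tool_use.

    Returns a new list; the input messages are not mutated.
    """
    use_ids = {b.get("id") for m in messages for b in _blocks(m) if b.get("type") == "tool_use"}
    result_ids = {
        b.get("tool_use_id") for m in messages for b in _blocks(m) if b.get("type") == "tool_result"
    }
    paired = use_ids & result_ids

    cleaned: list[Message] = []
    for msg in messages:
        if not isinstance(msg.get("content"), list):
            cleaned.append(msg)  # string content cannot hold tool blocks
            continue
        kept = [
            b
            for b in msg["content"]
            if not (b.get("type") == "tool_use" and b.get("id") not in paired)
            and not (b.get("type") == "tool_result" and b.get("tool_use_id") not in paired)
        ]
        if kept:  # a message left with no blocks is dropped entirely
            cleaned.append({**msg, "content": kept})
    return cleaned
```

This matches ids across the whole history rather than checking that each result sits in the turn immediately after its call, and it does not merge adjacent same-role turns that can appear once a message is dropped; a real fix would likely need both.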

3. Error Handling Improvements

  • Catch error 2013 specifically and trigger session repair
  • Display meaningful error in UI instead of truncated "A"
  • Provide user option to reset session from UI
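For the error-handling item, a sketch of recognizing this specific failure from the provider message. The API message quoted under Error Details begins with "invalid params", which suggests 2013 may be a general invalid-parameter code rather than one reserved for orphaned tool ids, so the sketch requires both the code and the tool-id text before deciding to repair. The regular expressions match only the wording seen in this issue; `classify_provider_error` and `should_repair_session` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Matches the wording seen in this issue only; other providers will differ.
_ORPHAN_RE = re.compile(r"tool result's tool id\((?P<tool_id>[^)]+)\) not found")
_CODE_RE = re.compile(r"\((?P<code>\d+)\)\s*$")  # trailing "(2013)"


class ProviderError(NamedTuple):
    code: Optional[int]
    orphaned_tool_id: Optional[str]


def classify_provider_error(message: str) -> ProviderError:
    """Extract the numeric error code and any orphaned tool id from the message."""
    code_match = _CODE_RE.search(message)
    orphan_match = _ORPHAN_RE.search(message)
    return ProviderError(
        code=int(code_match.group("code")) if code_match else None,
        orphaned_tool_id=orphan_match.group("tool_id") if orphan_match else None,
    )


def should_repair_session(message: str) -> bool:
    """Repair only when code 2013 comes with an orphaned tool id."""
    err = classify_provider_error(message)
    return err.code == 2013 and err.orphaned_tool_id is not None
```

The extracted id could be handed to a session repair step, and the full message shown in the UI instead of a single character. The gateway log line above lacks the trailing "(2013)", so this check would need to run on the raw API response rather than on that log text.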

4. Session Health Check

  • Add periodic validation of active sessions
  • Auto-repair or flag corrupted sessions
  • Implement session recovery mechanism

Related Files

  • Session history: ~/.openclaw/agents/main/sessions/*.jsonl
  • Session state: ~/.openclaw/agents/main/sessions/sessions.json
  • Gateway logs: ~/.openclaw/logs/gateway.log

Additional Context

This bug was discovered after:

  1. User reported "chat不响应" (chat not responding)
  2. Git diff showed that the last operation was a terminated write tool call
  3. Analysis of session file revealed corrupted tool call record
  4. Every subsequent message failed with error 2013
  5. The only fix was to delete the session file and restart

The UI displaying only "A" (first character) instead of the full error message made this extremely difficult to diagnose.


Reproduction Rate: 100% when tool call is terminated abnormally
Priority: High - Breaks core functionality with no user-facing error handling

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
