LLM calls have no timeout control; slow model responses cause a complete agent hang #55065
Closed
Description
Problem
When an LLM responds slowly or stops responding entirely, OpenClaw has no timeout mechanism to abort the request. The entire agent lane stays blocked, and the agent becomes unresponsive to any user interaction.
Real-world Incident
Timeline from logs:
- 12:12:12 - Agent reply completed
- 12:12:18 - `lane=session:agent:main:main waitedMs=25871` (main session waited ~26s)
- 12:13:03 - User sent a new message, which was blocked (no response)
- 12:17:05 - User sent `/stop` (4 minutes later)
- 12:17:22 - `lane=nested waitedMs=299249` (cumulative block: ~299 seconds)
Root cause analysis:
- 12:13:03 to 12:17:05: about 4 minutes with no log output
- This was not a `sessions_send` 30s timeout issue
- The real cause: the LLM call got stuck; the model returned no tokens
- The entire lane was occupied, so the main session could not respond to new messages
- Only `/stop` could recover the agent
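The issue does not show how OpenClaw schedules work on a lane. Assuming a lane behaves like a FIFO queue that runs one task at a time (an assumption suggested only by the `lane=` and `waitedMs` log fields), the TypeScript sketch below illustrates why a single LLM call that never settles blocks every message queued behind it. `SerialLane` and `run` are hypothetical names, not OpenClaw APIs.

```typescript
// Hypothetical model of a serialized lane: queued tasks run strictly one at a time.
// This is NOT OpenClaw's implementation; it only illustrates the blocking behavior.
class SerialLane {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>, onStart?: (waitedMs: number) => void): Promise<T> {
    const queuedAt = Date.now();
    const result = this.tail.then(() => {
      // Analogous to the `waitedMs` log field: time spent queued behind earlier tasks.
      onStart?.(Date.now() - queuedAt);
      return task();
    });
    // The next task starts only after this one settles (resolves or rejects).
    // If this task never settles, nothing queued after it ever starts.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Usage: a hung LLM call blocks every message queued behind it on the same lane.
const mainLane = new SerialLane();
void mainLane.run(() => new Promise<string>(() => {})); // model never returns a token
void mainLane.run(async () => "reply to the user's new message"); // never starts
```

Under this model, the only ways out are for the stuck task to settle on its own or for something external (here, `/stop`) to tear it down, which matches the incident timeline.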
Expected Behavior
- LLM calls should have an idle timeout mechanism
- If the model returns no tokens within a specified time, the request should be aborted with a user-friendly error message
- The agent should remain responsive to user messages
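A minimal sketch of the requested idle timeout, assuming the provider response is exposed as an `AsyncIterable` of tokens and that the underlying request can be cancelled through a standard `AbortController` (both are assumptions; the issue does not describe OpenClaw's provider interface). `withIdleTimeout` and `LlmIdleTimeoutError` are hypothetical names. The timer is re-armed after every token, so a long response that keeps streaming is never cut off; only silence longer than `idleMs` aborts the call.

```typescript
// Hypothetical error surfaced to the user when the model goes silent.
class LlmIdleTimeoutError extends Error {
  constructor(readonly idleMs: number) {
    super(`No response from the model for ${idleMs / 1000}s; the request was aborted. Please try again.`);
    this.name = "LlmIdleTimeoutError";
  }
}

// Wraps a token stream so that each token must arrive within idleMs of the previous one.
async function* withIdleTimeout<T>(
  stream: AsyncIterable<T>,
  idleMs: number,
  controller: AbortController,
): AsyncGenerator<T, void, undefined> {
  const iterator = stream[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Rejects if no token arrives within idleMs; a fresh timer is armed for every token.
    const idle = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        controller.abort(); // lets the HTTP client cancel the request, if it listens to the signal
        reject(new LlmIdleTimeoutError(idleMs));
      }, idleMs);
    });
    // Whichever settles first wins; the timer is always cleared afterwards.
    const result = await Promise.race([iterator.next(), idle]).finally(() => clearTimeout(timer));
    if (result.done) return;
    yield result.value;
  }
}

// Usage sketch (hypothetical names):
// for await (const token of withIdleTimeout(modelStream, 60_000, new AbortController())) { ... }
```

An idle timeout fits this failure better than a fixed total deadline: long generations that keep producing tokens are legitimate, whereas in the incident the model returned no tokens at all. The caller would catch `LlmIdleTimeoutError`, show its message to the user, and release the lane.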
Actual Behavior
- LLM calls have no timeout limit
- When the model is slow or unresponsive, the entire agent is blocked
- The user can only recover via `/stop`
- No error message is shown to the user
Impact
- All LLM providers: Any model can encounter network issues or server-side delays
- All users: It is unpredictable when it happens, and the experience is extremely poor when it does
- Multi-agent scenarios: One blocked agent may affect communication with other agents