
feat: add LLM idle timeout for streaming responses#55072

Merged
obviyus merged 3 commits into openclaw:main from liuy:feat/llm-idle-timeout
Mar 30, 2026

Conversation

@liuy
Contributor

@liuy liuy commented Mar 26, 2026

Fixes #55065

Summary

Adds an idle timeout mechanism for LLM streaming responses. If the model doesn't return any token within the specified timeout, the request is aborted with a user-friendly error message.

Changes

  1. New configuration option: agents.defaults.llm.idleTimeoutSeconds

    • 0 = disabled (never timeout)
    • > 0 = timeout in seconds
    • Default: 60 seconds
  2. Core implementation: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts

    • resolveLlmIdleTimeoutMs(): resolves timeout from config
    • streamWithIdleTimeout(): wraps stream function with idle timeout using Promise.race
  3. Integration: Modified attempt.ts to wrap streamFn with idle timeout

  4. Tests: llm-idle-timeout.test.ts with 13 test cases covering:

    • Config resolution (8 tests)
    • Stream wrapper behavior (5 tests, including timeout scenario)
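The resolution logic in item 2 can be sketched roughly as follows. This is a minimal sketch, not the PR's exact code: the real `resolveLlmIdleTimeoutMs()` reads the agent config tree, and the cap value used here is purely an assumption for illustration.

```typescript
// Minimal sketch of config resolution (not the PR's exact implementation).
// 0 disables the timeout; undefined falls back to the 60-second default.
const DEFAULT_IDLE_TIMEOUT_SECONDS = 60;
const MAX_IDLE_TIMEOUT_SECONDS = 3600; // assumed cap, for illustration only

function resolveLlmIdleTimeoutMs(idleTimeoutSeconds?: number): number {
  const seconds = idleTimeoutSeconds ?? DEFAULT_IDLE_TIMEOUT_SECONDS;
  if (seconds <= 0) return 0; // 0 = disabled (never timeout)
  return Math.min(seconds, MAX_IDLE_TIMEOUT_SECONDS) * 1000;
}
```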

Configuration Example

```json
{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 60
      }
    }
  }
}
```

User Experience

Before: Agent hangs indefinitely when LLM is unresponsive, user must use `/stop`

After: After 60s (configurable) of no response, user sees:
```
⏱️ LLM idle timeout (60s): no response from model
```

@openclaw-barnacle bot added the agents (Agent runtime and tooling) and size: M labels Mar 26, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR adds a configurable idle timeout for LLM streaming responses, aborting with a descriptive error message if no token is received within the threshold (default 60 s). The feature is well-scoped and fits cleanly into the existing attempt.ts retry loop via a thin wrapper around streamFn.

Key implementation decisions look correct:

  • The per-next() timer pattern (`clearTimer()` → `createTimeoutPromise()` → `Promise.race`) correctly resets the idle clock on each received chunk, rather than timing the overall request duration.
  • clearTimer() is invoked on all three exit paths of next() (done, non-done, and catch), preventing lingering timers.
  • controller.abort(error) is called inside the timeout callback so upstream abort handling preserves the user-friendly error message (LLM idle timeout (60s): …) via the extended makeAbortError in attempt.ts.
  • Zod schema uses .nonnegative() and the JSON schema uses minimum: 0, correctly allowing 0 to opt out of the feature.
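The timer-per-next() pattern described in the bullets above can be sketched as follows. The name and signature are illustrative, not the PR's exact API; in particular, the `onTimeout` callback stands in for the `controller.abort(error)` call described in the review.

```typescript
// Illustrative sketch of the timer-per-next() + Promise.race pattern.
function withIdleTimeout<T>(
  source: AsyncIterator<T>,
  idleTimeoutMs: number,
  onTimeout: (err: Error) => void,
): AsyncIterator<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const clearTimer = () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };

  const createTimeoutPromise = () =>
    new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        const err = new Error(
          `LLM idle timeout (${idleTimeoutMs / 1000}s): no response from model`,
        );
        onTimeout(err); // per the review, the PR calls controller.abort(error) here
        reject(err);
      }, idleTimeoutMs);
    });

  return {
    async next() {
      try {
        // Race each chunk against a fresh timer, so the idle clock resets on
        // every received chunk rather than timing the whole request.
        const result = await Promise.race([source.next(), createTimeoutPromise()]);
        clearTimer(); // covers both the done and non-done exit paths
        return result;
      } catch (err) {
        clearTimer(); // third exit path: the stream (or the timer) threw
        throw err;
      }
    },
  };
}
```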

One minor style nit: streamSimple is imported as a value on line 2 of llm-idle-timeout.ts but is used only in the type position ReturnType<typeof streamSimple> — it should be import type.

Confidence Score: 5/5

Safe to merge — all previous P1 concerns (empty abort body, timer leak on non-done path, schema rejecting 0) are resolved in this version.

The only remaining finding is a P2 style issue (value import used only for its type). All core logic — timer lifecycle, abort propagation, config validation — is correct and covered by 13 tests.

No files require special attention.

Important Files Changed

  • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts — New file implementing idle timeout for LLM streaming via timer-per-next() + Promise.race. Logic is sound: the timer is cleared on all three return paths (done, non-done, throw), controller.abort is called on timeout, and resolveLlmIdleTimeoutMs correctly converts seconds to ms with capping. One style nit: streamSimple is imported as a value but used only for its type.
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts — 13 test cases covering config resolution edge cases and stream wrapper behavior, including timeout firing and controller abort verification. Coverage is solid.
  • src/agents/pi-embedded-runner/run/attempt.ts — Integration wires streamWithIdleTimeout into the attempt loop when idleTimeoutMs > 0, and extends makeAbortError to preserve the original Error message when the abort reason is already an Error instance.
  • src/config/zod-schema.agent-defaults.ts — Uses .nonnegative() (allowing 0 to disable), correctly aligned with the runtime behavior of resolveLlmIdleTimeoutMs.
  • src/config/schema.base.generated.ts — JSON schema uses minimum: 0 (inclusive) for idleTimeoutSeconds, consistent with the Zod schema and runtime behavior.
  • src/config/types.agent-defaults.ts — Adds AgentLlmConfig type with idleTimeoutSeconds?: number and wires it into AgentDefaultsConfig. Straightforward type scaffolding.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/llm-idle-timeout.ts
Line: 2

Comment:
**Value import used only for its type**

`streamSimple` is imported as a value but used exclusively as a type in `ReturnType<typeof streamSimple>` on line 52. This should be `import type` to avoid a potential runtime side-effect load of `@mariozechner/pi-ai`. Projects using `verbatimModuleSyntax: true` would also require this.

Alternatively, you can drop the dependency on `streamSimple` entirely by deriving the type from `StreamFn`'s own return type (e.g. `Awaited<ReturnType<StreamFn>>`), which keeps the module self-contained.

```suggestion
import type { streamSimple } from "@mariozechner/pi-ai";
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (3): Last reviewed commit: "fix: stop rewriting abort reason names i..."


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d7237f65d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@liuy
Contributor Author

liuy commented Mar 26, 2026

Thanks for the thorough review! Both issues have been fixed in the latest commit:

  1. Schema: Changed `.positive()` → `.nonnegative()` in zod-schema.agent-defaults.ts. The generated schema (schema.base.generated.ts) was also regenerated and now has minimum: 0 instead of exclusiveMinimum: 0.

  2. Abort propagation: The signal parameter has been replaced with controller: AbortController. On timeout, we now call controller.abort(error) to properly cancel the underlying request.

@greptile-apps Could you re-review the latest commit 8e4335cbb5? The fixes are on the same branch.

@liuy liuy force-pushed the feat/llm-idle-timeout branch from 8e4335c to 3fc715e Compare March 26, 2026 10:20
@liuy liuy force-pushed the feat/llm-idle-timeout branch from 3fc715e to 5451a87 Compare March 26, 2026 10:23

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5451a871a4


@liuy liuy force-pushed the feat/llm-idle-timeout branch 2 times, most recently from 2fc137d to 81d5f75 Compare March 26, 2026 12:36
@liuy
Contributor Author

liuy commented Mar 26, 2026

Addressed all review comments:

  • Changed .positive() to .nonnegative() to allow idleTimeoutSeconds: 0 (disabled)
  • Added controller.abort(error) to cancel network requests on timeout
  • Modified makeAbortError() to preserve original error message ("LLM idle timeout..." instead of "aborted")
  • Added clearTimer() before returning non-done chunks
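The makeAbortError tweak in the third bullet can be approximated as follows; this is a sketch, and the real function in attempt.ts may construct a richer error.

```typescript
// If the abort reason is already an Error (e.g. the idle-timeout error passed
// to controller.abort(error)), surface it as-is instead of a generic "aborted".
function makeAbortError(reason?: unknown): Error {
  if (reason instanceof Error) return reason;
  return new Error("aborted");
}
```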

@greptile-apps @chatgpt-codex-connector

@liuy liuy force-pushed the feat/llm-idle-timeout branch from dd5759e to 81d5f75 Compare March 27, 2026 07:32

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd5759e7a0


@liuy liuy force-pushed the feat/llm-idle-timeout branch from 81d5f75 to 895bf06 Compare March 27, 2026 07:55

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 077f9623c7


@liuy liuy force-pushed the feat/llm-idle-timeout branch from 077f962 to f7a72b7 Compare March 27, 2026 08:22
@liuy
Contributor Author

liuy commented Mar 29, 2026

@greptileai please rescore

@obviyus
Contributor

obviyus commented Mar 30, 2026

Patched the idle-timeout path.

  • streamWithIdleTimeout() now reports idle stalls through the existing runner abort path instead of aborting the raw controller directly.
  • Idle timeouts now set normal timeout state, stop the active session, and reuse the standard cleanup path.
  • Kept the change small: one callback seam in the wrapper, one handoff back into abortRun(true, error).
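The seam can be pictured like this; abortRun's signature and the handler shape are assumptions based on the description above, not the merged code.

```typescript
// The wrapper no longer calls controller.abort() itself; on an idle stall it
// hands the error back to the runner, which reuses the standard cleanup path.
type AbortRun = (timedOut: boolean, error: Error) => void;

function makeIdleTimeoutHandler(abortRun: AbortRun): (error: Error) => void {
  return (error) => abortRun(true, error);
}
```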

Verified with:

  • pnpm test -- src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts
  • pnpm tsgo

@obviyus obviyus self-assigned this Mar 30, 2026
liuy and others added 2 commits March 30, 2026 08:00
Problem: When LLM stops responding, the agent hangs for ~5 minutes with no feedback.
Users had to use /stop to recover.

Solution: Add idle timeout detection for LLM streaming responses.
@obviyus obviyus force-pushed the feat/llm-idle-timeout branch from d52d7e5 to 2e7f201 Compare March 30, 2026 02:30
Contributor

@obviyus obviyus left a comment


Reviewed latest changes; landing now.

@obviyus obviyus merged commit 6f09a68 into openclaw:main Mar 30, 2026
9 checks passed
@obviyus
Contributor

obviyus commented Mar 30, 2026

Landed on main.

Thanks @liuy.



Development

Successfully merging this pull request may close these issues.

LLM calls have no timeout control, slow model responses cause complete agent hang
