fix(agents): add drain deadline and reduce announce timeouts#10355
fix(agents): add drain deadline and reduce announce timeouts#10355pycckuu wants to merge 3 commits intoopenclaw:mainfrom
Conversation
| const DRAIN_DEADLINE_MS = 120_000; | ||
| const deadline = Date.now() + DRAIN_DEADLINE_MS; | ||
| while ((queue.items.length > 0 || queue.droppedCount > 0) && Date.now() < deadline) { | ||
| await waitForQueueDebounce(queue); |
There was a problem hiding this comment.
Deadline causes busy loop
When the drain hits the deadline, the finally block immediately calls scheduleAnnounceDrain(key) again if there are still queued/dropped items. Since each new drain recomputes deadline = Date.now() + 120_000, this never actually stops draining and can spin a tight loop if waitForQueueDebounce(queue) returns quickly (e.g., debounceMs=0). This defeats the goal of preventing unbounded blocking and can peg the event loop while the underlying send() is slow/failing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce-queue.ts
Line: 92:95
Comment:
**Deadline causes busy loop**
When the drain hits the `deadline`, the `finally` block immediately calls `scheduleAnnounceDrain(key)` again if there are still queued/dropped items. Since each new drain recomputes `deadline = Date.now() + 120_000`, this never actually stops draining and can spin a tight loop if `waitForQueueDebounce(queue)` returns quickly (e.g., `debounceMs=0`). This defeats the goal of preventing unbounded blocking and can peg the event loop while the underlying `send()` is slow/failing.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Addressed in 1df48c671 — after drain deadline, remaining items are now discarded (lines 159-162) instead of rescheduling, which prevents the infinite loop.
| // Advance time in steps so the async drain loop can interleave properly. | ||
| // Each step resolves one send (50s) and lets the loop iterate. | ||
| for (let step = 0; step < 6; step++) { | ||
| await vi.advanceTimersByTimeAsync(25_000); | ||
| } |
There was a problem hiding this comment.
Flaky fake-timer test
This test doesn't reliably allow the drain loop to reach the timeout path: send() awaits a 50s timer, but you only advance 25_000ms per step (total 150s). Depending on how many times the loop is awaiting the 50s timer vs. debounce, sendCount may end up being 0 or 5, making the >0 && <5 assertion flaky. Consider advancing enough time to deterministically complete N sends (e.g., advance 50s per expected send) and then advance past the 120s deadline, or explicitly await drain completion before asserting.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce-queue.test.ts
Line: 76:80
Comment:
**Flaky fake-timer test**
This test doesn't reliably allow the drain loop to reach the timeout path: `send()` awaits a 50s timer, but you only advance `25_000ms` per step (total 150s). Depending on how many times the loop is awaiting the 50s timer vs. debounce, `sendCount` may end up being 0 or 5, making the `>0 && <5` assertion flaky. Consider advancing enough time to deterministically complete N sends (e.g., advance 50s per expected send) and then advance past the 120s deadline, or explicitly await drain completion before asserting.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Addressed in 1df48c671 and 6faafc4a1 — added vi.resetModules() for test isolation, tightened assertion to exact toHaveBeenCalledTimes(3), and added a dedicated "discards remaining items on timeout without rescheduling" test that verifies no reschedule fires after deadline.
|
Pushed follow-up commit 1df48c671 addressing review feedback:
|
When multiple sub-agents complete simultaneously, the announce queue drain loop could block the main session lane indefinitely — each announce triggers a full agent turn (deliver: true, expectFinal: true) and they process sequentially with no overall time limit. - Add 120s deadline to scheduleAnnounceDrain() while loop - Reduce per-announce callGateway timeout from 60s to 30s - Log diagnostic when drain times out with items remaining Refs openclaw#10334
After drain deadline, clear remaining items instead of rescheduling a fresh drain with a new 120s window. Also improve test isolation with vi.resetModules() and tighten assertions to exact send count. Refs openclaw#10334
- send throws: verifies error logging and queue recovery for subsequent enqueues - timeout discards: confirms no reschedule loop after deadline (items cleared) - single item: boundary case for minimal queue Refs openclaw#10334
6faafc4 to
43e2463
Compare
Summary
Prevents the gateway from becoming unresponsive when multiple sub-agents complete and try to announce results back to the main session (#10334). Three targeted changes:
Changes
src/agents/subagent-announce-queue.tsDRAIN_DEADLINE_MS = 120_000constant andDate.now() < deadlinecheck to the while loop inscheduleAnnounceDrain()defaultRuntime.error()when drain exits due to timeoutfinallyblock from rescheduling a fresh drain with a new 120s window (infinite loop)src/agents/subagent-announce.tstimeoutMsfrom60_000to30_000in bothsendAnnounce()andrunSubagentAnnounceFlow()direct send pathsrc/agents/subagent-announce-queue.test.ts(new)Test plan
pnpm test src/agents/subagent-announce-queue.test.tspnpm test src/agents/subagent-announce.format.test.tsCloses #10334