fix: wrap waitForCompactionRetry() in abortable() to prevent lane deadlock on timeout #13347
Closed
smartleos wants to merge 1 commit into openclaw:main
…dlock on timeout

When an embedded run times out during the post-reply compaction phase, abortRun(true) fires but the abort signal never reaches waitForCompactionRetry() because it is a bare await, not wrapped in abortable(). This causes the finally cleanup block to never execute, permanently blocking the affected DM lane, leaking the session in "processing" state, and leaving a zombie run in the active count.

The fix wraps waitForCompactionRetry() in the existing abortable() helper (already used for activeSession.prompt() in the same scope), so the abort signal properly interrupts the compaction wait and allows the finally block to run clearActiveEmbeddedRun().

Fixes openclaw#13341
bfc1ccb to f92900f
Member
Closing as duplicate of #12227. If this is incorrect, please contact us.
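Summary
- Wrap waitForCompactionRetry() in the existing abortable() helper so the embedded run timeout's abort signal can interrupt the compaction wait
- Without this, a timeout during compaction permanently blocks the affected DM lane, leaks the session in processing state, and leaves a zombie run in the active count

Problem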
In src/agents/pi-embedded-runner/run/attempt.ts, the main prompt call, activeSession.prompt(), is correctly wrapped in abortable(), but the compaction wait immediately after is not: waitForCompactionRetry() is a bare await.

When the run timeout fires during compaction (e.g., when the OpenAI Batch API is slow), abortRun(true) signals the runAbortController, but waitForCompactionRetry() never sees it. The finally block with clearActiveEmbeddedRun() never executes.

Result: the session is stuck in processing, the run is never cleared, and the lane task never completes, so the DM channel is permanently dead until the gateway is restarted.

Fix
One-line change: wrap waitForCompactionRetry() in abortable() (already in scope, already used for the prompt).

The existing catch block already handles AbortError correctly.

Test plan
- … (pnpm test)
- abortable() immediately rejects when the signal is already aborted (covers the case where the timeout fired before reaching the compaction wait)
- abortable() rejects via the abort listener when the signal fires during the wait (covers the case where the timeout fires while compaction is in flight)
- finally block runs in both cases, calling clearActiveEmbeddedRun() and unsubscribe()

Fixes #13341
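The two abortable() cases in the test plan can be illustrated with a minimal sketch. The real helper in attempt.ts is not shown on this page, so the abortable() below is an assumed shape written only for illustration; makeAbortError(), neverSettles(), and runBothCases() are hypothetical names.

```typescript
// Assumed shape of an abort-aware wrapper; NOT the actual helper from attempt.ts.
function abortable<T>(promise: Promise<T>, signal: AbortSignal): Promise<T> {
  // Case 1: the timeout already fired before this wait was reached.
  if (signal.aborted) {
    return Promise.reject(makeAbortError());
  }
  return new Promise<T>((resolve, reject) => {
    // Case 2: the timeout fires while the wrapped promise is still pending.
    const onAbort = () => reject(makeAbortError());
    signal.addEventListener("abort", onAbort, { once: true });
    promise.then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (err) => {
        signal.removeEventListener("abort", onAbort);
        reject(err);
      },
    );
  });
}

// Hypothetical helper: builds an error whose name marks it as an abort.
function makeAbortError(): Error {
  const err = new Error("aborted");
  err.name = "AbortError";
  return err;
}

// Hypothetical stand-in for a compaction wait that never settles on its own.
function neverSettles(): Promise<void> {
  return new Promise<void>(() => {});
}

// Runs both test-plan cases and records how each wait ended.
async function runBothCases(): Promise<string[]> {
  const outcomes: string[] = [];

  // Signal aborted before the wait starts.
  const early = new AbortController();
  early.abort();
  try {
    await abortable(neverSettles(), early.signal);
    outcomes.push("resolved");
  } catch (err) {
    outcomes.push((err as Error).name);
  }

  // Signal aborted while the wait is in flight.
  const late = new AbortController();
  setTimeout(() => late.abort(), 10);
  try {
    await abortable(neverSettles(), late.signal);
    outcomes.push("resolved");
  } catch (err) {
    outcomes.push((err as Error).name);
  }

  return outcomes;
}
```

With a bare await on neverSettles(), neither case would ever reach its catch block, which is the hang the PR describes.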
Note
AI-assisted PR. The fix was identified through production log analysis and verified by reading the source. The one-line change connects two existing, well-tested mechanisms (abortable() and waitForCompactionRetry()) that were simply not wired together.

Greptile Overview
Greptile Summary
This PR changes the embedded runner's post-prompt compaction wait to be abort-aware by wrapping waitForCompactionRetry() with the existing local abortable() helper in src/agents/pi-embedded-runner/run/attempt.ts. This makes the compaction-wait phase respond to the same runAbortController signal used for the main activeSession.prompt(...) call, so run timeouts/aborts can propagate through the compaction wait and reliably reach the outer finally cleanup that unsubscribes and clears the active embedded run/lane handle.

Confidence Score: 5/5
… isRunnerAbortError, and non-abort errors are still rethrown, so behavior outside the timeout/abort path remains unchanged.
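The control flow described in this review (abort propagates through the wait, abort errors are handled, other errors are rethrown, and the finally cleanup always runs) can be sketched roughly as follows. Everything here is a hypothetical stand-in: activeRuns, runAttempt(), and this compact abortable() are assumptions for illustration, not the code in attempt.ts.

```typescript
// Hypothetical registry standing in for the runner's active-run bookkeeping.
const activeRuns = new Set<string>();

// Compact abort-aware wrapper (assumed shape, not the runner's real helper).
function abortable<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => {
      const err = new Error("run aborted");
      err.name = "AbortError";
      reject(err);
    };
    if (signal.aborted) {
      onAbort();
      return;
    }
    signal.addEventListener("abort", onAbort, { once: true });
    work.then(resolve, reject);
  });
}

// Simulates one run attempt whose compaction wait stalls forever.
async function runAttempt(runId: string, timeoutMs: number): Promise<string> {
  const runAbortController = new AbortController();
  const timer = setTimeout(() => runAbortController.abort(), timeoutMs);
  activeRuns.add(runId);
  try {
    // Stand-in for a compaction wait that never settles by itself.
    const stalledCompaction = new Promise<void>(() => {});
    await abortable(stalledCompaction, runAbortController.signal);
    return "completed";
  } catch (err) {
    // Abort errors are treated as a timeout; anything else is rethrown.
    if (err instanceof Error && err.name === "AbortError") {
      return "timed out";
    }
    throw err;
  } finally {
    // Reachable only because the wait above can be interrupted.
    clearTimeout(timer);
    activeRuns.delete(runId);
  }
}
```

In this sketch, replacing the abortable(...) call with a bare await on stalledCompaction would leave runId in activeRuns forever, mirroring the zombie run and blocked lane reported in the PR.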