fix(embedded-runner): abort compaction wait on timeout#15449
fix(embedded-runner): abort compaction wait on timeout#15449wangai-studio wants to merge 4 commits intoopenclaw:mainfrom
Conversation
| let finished = false; | ||
| void runPromise.finally(() => { | ||
| finished = true; | ||
| }); | ||
|
|
||
| try { | ||
| await waitCalled; | ||
| await vi.advanceTimersByTimeAsync(timeoutMs + 1); | ||
| await Promise.resolve(); | ||
|
|
||
| // Expect runner to end on timeout instead of hanging in compaction wait. | ||
| expect(finished).toBe(true); | ||
| expect(clearActiveEmbeddedRun).toHaveBeenCalledTimes(1); | ||
| } finally { |
There was a problem hiding this comment.
Flaky completion assertion
This test doesn’t await runPromise and instead asserts finished (set via runPromise.finally) after advancing fake timers and a single Promise.resolve(). With Vitest fake timers, the timeout callback + async finally cleanup (where clearActiveEmbeddedRun is called) can take additional microtask flushes, so the assertion can fail even when the fix works. Consider awaiting the promise (or awaiting the .finally() you attach) rather than relying on the finished flag/microtask timing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.compaction-timeout.test.ts
Line: 347:360
Comment:
**Flaky completion assertion**
This test doesn’t await `runPromise` and instead asserts `finished` (set via `runPromise.finally`) after advancing fake timers and a single `Promise.resolve()`. With Vitest fake timers, the timeout callback + async `finally` cleanup (where `clearActiveEmbeddedRun` is called) can take additional microtask flushes, so the assertion can fail even when the fix works. Consider awaiting the promise (or awaiting the `.finally()` you attach) rather than relying on the `finished` flag/microtask timing.
How can I resolve this? If you propose a fix, please make it concise.When compaction enters a retry and the agent never emits the expected end events, waitForCompactionRetry() can hang forever. This prevents the attempt cleanup from running and keeps the active run state stuck in processing. Wrap the compaction wait in the runner abort controller and add a regression test. Test: pnpm vitest src/agents/pi-embedded-runner/run
bba5521 to
f5280fa
Compare
bfc1ccb to
f92900f
Compare
…n-wait # Conflicts: # src/agents/pi-embedded-runner/run/attempt.ts
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
3 similar comments
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
21 similar comments
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
|
Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch. |
Summary
Context / Bug
runEmbeddedAttemptawaitedwaitForCompactionRetry()after the prompt completes. If auto-compaction enters a retry state and the expected end events never arrive, this wait can remain pending indefinitely.Because the wait wasn’t tied to the run abort signal, a
timeoutMsabort could still leave the attempt hanging before itsfinallycleanup (clearActiveEmbeddedRun(...)). This manifests as sessions stuck inprocessingand diagnosticstuck sessionspam, and (for Telegram) no reply until a gateway restart clears in-memory state.Fix
Make the compaction retry wait abortable:
src/agents/pi-embedded-runner/run/attempt.ts:await abortable(waitForCompactionRetry())Evidence (Sanitized)
From a Telegram “no reply” incident (PII removed):
(Full sanitized excerpt available in my incident notes; happy to provide more if needed.)
Test Plan
pnpm checkpnpm buildpnpm vitest src/agents/pi-embedded-runner/runAI-Assisted
AI-assisted (Codex). I reviewed the code and understand the change.
Greptile Overview
Greptile Summary
This PR makes the post-prompt
waitForCompactionRetry()wait abortable by wrapping it in the attempt’s abort signal, preventing embedded runs from hanging pasttimeoutMsand skippingclearActiveEmbeddedRun(...)cleanup.It also adds a regression test that simulates a compaction retry wait that never resolves and asserts the attempt exits on timeout and still runs cleanup. The behavior change is localized to the embedded runner attempt flow (
src/agents/pi-embedded-runner/run/attempt.ts), specifically the section after the prompt completes where compaction retry waits previously could block indefinitely.Confidence Score: 4/5
Last reviewed commit: bba5521
(3/5) Reply to the agent's comments like "Can you suggest a fix for this @greptileai?" or ask follow-up questions!