fix(embedded-runner): abort compaction wait on timeout#15449

Closed
wangai-studio wants to merge 4 commits into openclaw:main from wangai-studio:fix/abort-compaction-wait

@wangai-studio commented Feb 13, 2026

Summary

  • Fix a hang where an embedded attempt can get stuck waiting for compaction retry even after the run timeout/abort fires.
  • Add a regression test that reproduces the stuck compaction wait and asserts cleanup still runs.

Context / Bug

runEmbeddedAttempt awaited waitForCompactionRetry() after the prompt completed. If auto-compaction entered a retry state and the expected end events never arrived, this wait could remain pending indefinitely.

Because the wait wasn’t tied to the run abort signal, a timeoutMs abort could still leave the attempt hanging before its finally cleanup (clearActiveEmbeddedRun(...)). This manifests as sessions stuck in processing, repeated "stuck session" diagnostic log lines, and (for Telegram) no reply until a gateway restart clears the in-memory state.

Fix

Make the compaction retry wait abortable:

  • src/agents/pi-embedded-runner/run/attempt.ts: await abortable(waitForCompactionRetry())
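
As an illustration only, here is a minimal sketch of what an abortable wait could look like. The PR shows only the call site, so makeAbortable, waitForCompactionRetryStub, and runAttemptSketch are hypothetical names invented for this example, and the real abortable helper in attempt.ts may be implemented differently.

```typescript
// Hypothetical helper factory: returns a wrapper that rejects as soon as
// `signal` aborts, even if the wrapped promise never settles.
function makeAbortable(signal: AbortSignal) {
  return function abortable<T>(promise: Promise<T>): Promise<T> {
    if (signal.aborted) {
      return Promise.reject(new Error("aborted"));
    }
    return new Promise<T>((resolve, reject) => {
      const onAbort = () => reject(new Error("aborted"));
      signal.addEventListener("abort", onAbort, { once: true });
      promise.then(
        (value) => {
          signal.removeEventListener("abort", onAbort);
          resolve(value);
        },
        (err) => {
          signal.removeEventListener("abort", onAbort);
          reject(err);
        },
      );
    });
  };
}

// Stand-in for a compaction retry wait whose end events never arrive.
function waitForCompactionRetryStub(): Promise<void> {
  return new Promise<void>(() => {});
}

// Simplified attempt flow: a timeout aborts the run, and the finally block
// (standing in for clearActiveEmbeddedRun(...)) still executes.
async function runAttemptSketch(
  timeoutMs: number,
): Promise<{ aborted: boolean; cleanedUp: boolean }> {
  const controller = new AbortController();
  const abortable = makeAbortable(controller.signal);
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const state = { aborted: false, cleanedUp: false };
  try {
    await abortable(waitForCompactionRetryStub());
  } catch {
    state.aborted = true;
  } finally {
    clearTimeout(timer);
    state.cleanedUp = true;
  }
  return state;
}
```

Without the abortable wrapper, the await in this sketch would never settle and the finally block would never run, which mirrors the hang described under Context / Bug.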

Evidence (Sanitized)

From a Telegram “no reply” incident (PII removed):

2026-02-12T13:01:08.804Z [agent/embedded] embedded run compaction start: runId=79b5f8f5-6107-46c9-8742-d5d34553eff1
2026-02-12T13:01:36.406Z [agent/embedded] embedded run compaction retry: runId=79b5f8f5-6107-46c9-8742-d5d34553eff1
2026-02-12T13:05:46.490Z [agent/embedded] embedded run timeout: runId=79b5f8f5-6107-46c9-8742-d5d34553eff1 sessionId=7f174700-813c-4fbc-a6f6-09cd04d67dbe timeoutMs=600000
2026-02-12T13:06:14.587Z [diagnostic] stuck session: sessionId=7f174700-813c-4fbc-a6f6-09cd04d67dbe sessionKey=unknown state=processing age=628s queueDepth=0

(Full sanitized excerpt available in my incident notes; happy to provide more if needed.)
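
The timing in the excerpt can be checked with a little arithmetic. The sketch below uses only the timestamps, timeoutMs, and age values from the log lines above; the variable names are made up for illustration.

```typescript
// Timestamps copied from the sanitized log excerpt.
const timeline = {
  compactionStart: "2026-02-12T13:01:08.804Z",
  compactionRetry: "2026-02-12T13:01:36.406Z",
  runTimeout: "2026-02-12T13:05:46.490Z",
  stuckSession: "2026-02-12T13:06:14.587Z",
};

function msBetween(from: string, to: string): number {
  return Date.parse(to) - Date.parse(from);
}

// Time spent in the retry wait before the run timeout fired: 250084 ms.
const retryToTimeoutMs = msBetween(timeline.compactionRetry, timeline.runTimeout);

// The session was still reported as processing 28097 ms after the timeout.
const timeoutToStuckMs = msBetween(timeline.runTimeout, timeline.stuckSession);

// Two independent estimates of when the run started: timeout minus
// timeoutMs=600000, and the diagnostic timestamp minus age=628s.
const startFromTimeout = Date.parse(timeline.runTimeout) - 600_000;
const startFromAge = Date.parse(timeline.stuckSession) - 628_000;

// The estimates agree to within 97 ms (age is logged in whole seconds).
const startEstimateDiffMs = startFromAge - startFromTimeout;
```

The two start estimates agreeing suggests the stuck session is the same run that timed out, and that the session stayed in processing for roughly 28 seconds after the abort fired.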

Test Plan

  • pnpm check
  • pnpm build
  • pnpm vitest src/agents/pi-embedded-runner/run

AI-Assisted

AI-assisted (Codex). I reviewed the code and understand the change.

Greptile Overview

Greptile Summary

This PR makes the post-prompt waitForCompactionRetry() wait abortable by tying it to the attempt’s abort signal, so embedded runs no longer hang past timeoutMs or skip the clearActiveEmbeddedRun(...) cleanup.

It also adds a regression test that simulates a compaction retry wait that never resolves and asserts the attempt exits on timeout and still runs cleanup. The behavior change is localized to the embedded runner attempt flow (src/agents/pi-embedded-runner/run/attempt.ts), specifically the section after the prompt completes where compaction retry waits previously could block indefinitely.

Confidence Score: 4/5

  • This PR is close to safe to merge, with the main risk being a potentially flaky regression test under fake timers.
  • The production change is small and correctly ties the compaction retry wait to the run abort signal. The primary remaining concern is the new test’s timing/awaiting pattern, which may intermittently fail in CI due to fake-timers microtask flushing rather than real logic regressions.
  • src/agents/pi-embedded-runner/run/attempt.compaction-timeout.test.ts

Last reviewed commit: bba5521

@openclaw-barnacle bot added the labels agents (Agent runtime and tooling) and size: M on Feb 13, 2026

@greptile-apps bot left a comment

2 files reviewed, 1 comment

Comment on lines +347 to +360
  let finished = false;
  void runPromise.finally(() => {
    finished = true;
  });

  try {
    await waitCalled;
    await vi.advanceTimersByTimeAsync(timeoutMs + 1);
    await Promise.resolve();

    // Expect runner to end on timeout instead of hanging in compaction wait.
    expect(finished).toBe(true);
    expect(clearActiveEmbeddedRun).toHaveBeenCalledTimes(1);
  } finally {
Flaky completion assertion

This test doesn’t await runPromise and instead asserts finished (set via runPromise.finally) after advancing fake timers and a single Promise.resolve(). With Vitest fake timers, the timeout callback + async finally cleanup (where clearActiveEmbeddedRun is called) can take additional microtask flushes, so the assertion can fail even when the fix works. Consider awaiting the promise (or awaiting the .finally() you attach) rather than relying on the finished flag/microtask timing.
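
The microtask-timing concern can be reproduced without Vitest. The following is a sketch under stated assumptions: attemptWithAsyncCleanup and compareAssertionStyles are hypothetical stand-ins, not the code under review, and the number of promise hops in the real attempt is not known from this page.

```typescript
// Stand-in for an attempt whose abort handling and cleanup each need
// their own promise hop before the returned promise settles.
async function attemptWithAsyncCleanup(log: string[]): Promise<void> {
  try {
    await Promise.resolve(); // hop: the abort is observed
    throw new Error("aborted");
  } catch {
    // the sketch swallows the abort error
  } finally {
    await Promise.resolve(); // hop: an async cleanup step
    log.push("cleanup");
  }
}

async function compareAssertionStyles(): Promise<{
  flagAfterOneTick: boolean;
  flagAfterAwait: boolean;
  cleanupCalls: number;
}> {
  const log: string[] = [];
  let finished = false;
  const done = attemptWithAsyncCleanup(log).finally(() => {
    finished = true;
  });

  // Pattern from the reviewed test: a single microtask flush, then read the flag.
  await Promise.resolve();
  const flagAfterOneTick = finished;

  // Suggested pattern: await the attached .finally() promise itself.
  await done;
  const flagAfterAwait = finished;

  return { flagAfterOneTick, flagAfterAwait, cleanupCalls: log.length };
}
```

In this sketch the flag is still false after one flush and true only after awaiting the promise, so asserting after `await done` does not depend on how many hops the cleanup path happens to take.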

When compaction enters a retry and the agent never emits the expected end events, waitForCompactionRetry() can hang forever. This prevents the attempt cleanup from running and keeps the active run state stuck in processing.

Wrap the compaction wait in the runner abort controller and add a regression test.

Test: pnpm vitest src/agents/pi-embedded-runner/run
…n-wait

# Conflicts:
#	src/agents/pi-embedded-runner/run/attempt.ts
@openclaw-barnacle bot added the following labels on Feb 20, 2026: channel: bluebubbles, channel: discord, channel: googlechat, channel: imessage, channel: matrix, channel: msteams, channel: nextcloud-talk, channel: nostr, channel: signal, channel: slack, channel: telegram, channel: tlon, channel: voice-call, channel: whatsapp-web, channel: zalo, channel: zalouser, app: web-ui, gateway (Gateway runtime), extensions: diagnostics-otel, extensions: llm-task
@openclaw-barnacle

Closing this PR because it looks dirty (too many unrelated commits). Please recreate the PR from a clean branch.

Labels

agents (Agent runtime and tooling), app: web-ui, channel: bluebubbles, channel: discord, channel: feishu, channel: googlechat, channel: imessage, channel: irc, channel: matrix, channel: msteams, channel: nextcloud-talk, channel: nostr, channel: signal, channel: slack, channel: telegram, channel: tlon, channel: twitch, channel: voice-call, channel: whatsapp-web, channel: zalo, channel: zalouser, cli (CLI command changes), commands (Command implementations), docker (Docker and sandbox tooling), extensions: device-pair, extensions: diagnostics-otel, extensions: llm-task, extensions: lobster, extensions: memory-lancedb, extensions: phone-control, gateway (Gateway runtime), scripts (Repository scripts), size: XL
